The accurate prediction of maximum erosion depth in riverbeds is crucial for the early protection of bank slopes. In this study, K-means clustering analysis was used for outlier identification, and feature selection produced Plan 1 with six influential features; Plan 2 comprised the features selected by existing methods. Regression models were built using Support Vector Regression, Random Forest Regression (RF Regression), and eXtreme Gradient Boosting on the sample data of Plan 1 and Plan 2. To enhance accuracy, a Stacking method with a feed-forward neural network as the meta-learner was introduced. Model performance was evaluated using the root mean squared error, mean absolute error, mean absolute percentage error, and the coefficient of determination (R2). The results demonstrate that the three models performed better on Plan 1 than on Plan 2, with improvements in R2 of 0.0025, 0.0423, and 0.0205, respectively. Among the three regression models in Plan 1, RF Regression performs best with an R2 of 0.9149, which is still lower than the 0.9389 achieved by the Stacking fusion model. Compared with the existing formulas, the Stacking model exhibits superior predictive performance. This study verifies the effectiveness of combining clustering analysis, feature selection, and the Stacking method in predicting maximum scour depth in bends, providing a novel approach for bank protection design.

  • Feature selection was applied in this study, and the selected features differ from those used in existing studies.

  • The three regression models demonstrate that the features selected in this study are superior to those selected in existing studies.

  • The Stacking model was developed and compared with the existing methods, and the results show that the Stacking model is more accurate.

Natural rivers, particularly in mountainous regions, generally exhibit a sinuous course. Even in sections that appear partially straight, the local topography of the riverbed can induce curvature in the main channel. Consequently, investigating the hydrodynamics of these bends and the associated scouring mechanisms has become a pivotal focus within the realm of fluvial dynamics. For instance, numerous roads in mountainous areas are constructed parallel to river banks, where the roadbed acts as an embankment for the meandering mountain river. However, the soil supporting these embankments is susceptible to erosion by the flowing water, ultimately culminating in embankment collapse and subsequent damage to the road infrastructure.

For concave banks, the dominant factor producing deep scour is bend circulation (Odgaard 1984). As a river flows through a bend, the water is subject not only to gravity but also to centrifugal force, so the water surface at the concave bank rises above that at the convex bank. A vertical circulation develops near the banks, with water descending along the concave side and rising along the convex side. This circulation carries the relatively sediment-poor surface water toward the concave bank, where it plunges downward, while the sediment-laden bottom water moves toward the convex bank and rises. The result is an asymmetric sediment transport pattern, compounded by the erosive action of the flow on the concave bank: the slope collapses, and the resulting debris is carried by the near-bed current toward the convex bank. Consequently, the scour depth at the concave bank is markedly greater than over the rest of the riverbed.

There have been several studies on predicting maximum scour depth. Thorne (1989) derived an equation from data collected on the Red River in Louisiana; Thorne & Abt (1993) found that empirical predictions agreed better with measured values than estimates based on theoretical analyses of flow dynamics and sediment equilibrium at bends. USACE (1994) provided a graphical correlation for determining the design scour depth of a bend, while Maynord (1996) presented his own equation together with a safe design curve after excluding laboratory data. With the advancement of computing, several techniques have emerged for forecasting the maximum scour depth of bends through simulation. Ling (2006) employed a BP neural network to predict the maximum scour depth at river bends; although the limited amount of data and the model's limited accuracy constrain the approach, the results still outperform empirical formulas. Rousseau et al. (2016) compared six widely used numerical simulation methods and observed that, despite similar computational speeds, the simulations failed to yield accurate water depths and did not ensure result precision. In a recent study, Froehlich (2020) used 202 sets of measured data from Maynord, Jackson, Thorne, and Abt to develop an artificial neural network (ANN) model for predicting the maximum scour depth at bends in sandy riverbeds, and introduced a novel approach for establishing an upper bound on the maximum scour depth.

These methods are applicable to some extent to natural rivers, but they have limitations. First, they consider too few variables and do not quantify the influence each variable has on the maximum scour depth. Moreover, they rely on dimensionless analysis, which, while analytically convenient, sacrifices the independence of the individual variables. These constraints potentially undermine their predictive capacity.

Over the past few years, the utilization of artificial intelligence (AI) in various engineering domains has been prominent for the development of predictive models encompassing diverse natural variables. Ehteram et al. (2020) integrated the multilayer perceptron (MLP) model with colliding bodies' optimization (CBO). In their study, sediment size, wave characteristics, and pipeline geometry were employed as inputs for the proposed models. The MLP-CBO model outperformed regression models and empirical models in predicting pipeline scour rates. Parsaie et al. (2021) established multiple models including support vector machine (SVM) and multivariate adaptive regression splines (MARS) to predict the piezometric head and seepage discharge in an earth dam. The results showcased excellent performance of these models in prediction, particularly, the MARS model exhibiting the highest accuracy. Tofiq et al. (2022) utilized various AI techniques to develop several highly accurate prediction models for river streamflow in the Aswan High Dam.

In this study, several machine learning methods are applied to the prediction of maximum scour depth in river bends. K-means clustering analysis was adopted to identify and remove outliers, and feature selection was conducted to identify the six features most influential on maximum erosion depth, referred to as Plan 1. To highlight the importance of feature selection, the features selected by existing methods were also included as Plan 2 for comparison. Along with three traditional regression models, Support Vector Regression (SVR), Random Forest Regression (RF Regression), and eXtreme Gradient Boosting (XGBoost), the Stacking method was introduced and a Stacking model was built with the aim of improving prediction accuracy. Furthermore, to preserve the independence of each variable, the data undergo normalization.

Existing formulas and methods

Some of the existing methods for predicting maximum scour depth are listed here.

Thorne (1989) derived an equation from data collected from the Red River in Louisiana:
(1)
Maynord (1996) extended this relationship to both large and small rivers by introducing an additional term, resulting in an alternative formula:
(2)
Froehlich (2020) employed a comprehensive approach by utilizing 202 sets of measured data obtained from Maynord, Jackson, Thorne, and Abt. The input values Rc/W, W/Dmnc, and , along with the output value η = Dmxb/Dmnc − 1, were fed into an ANN model to predict the maximum scour depth at the bend in the sandy river bed. Figure 1 provides a clear understanding of the definition of several variables.
Figure 1

Section parameters of river bends and upstream junctions.


Database

Sample data from 230 river measurement sets can be obtained from Thorne & Abt (1993). Each set consists of nine features, namely Rc, Mw, W, Dmnc, v, I, f, Q, and S, with one output variable – maximum scour depth Dmxb. Table 1 displays the statistics of these variables.

Table 1

Summary of river data

          Rc         Mw         W         Dmnc  v     I      f     Q          S     Dmxb
Mean      1,474.15   4,312.24   588.56    3.57  1.81  1.89   0.14  3,222.27   1.35  7.90
Std       3,188.75   8,094.28   1,580.27  1.70  0.54  2.76   0.21  6,320.54   0.39  4.47
Minimum   3.48       24.36      4.40      0.42  0.50  0.05   0.00  7.10       1.01  0.81
Maximum   21,250.00  47,600.00  8,490.00  6.94  4.55  21.47  2.69  29,500.00  5.32  21.25

Rc, centerline radius of bend; Mw, meander wavelength; W, water surface width at the upstream end of bend; Dmnc, mean channel depth at upstream crossing; v, cross-section average velocity at the upstream crossing point for bankfull flow conditions; I, slope; f, friction factor; Q, quantity of flow; S, sinuosity; Dmxb, maximum water depth in bend.

Data outlier processing

Cluster analysis is a common machine learning technique that identifies possible subclasses in the data, carves them out, and reveals the intrinsic structure of the data. One widely used clustering algorithm is K-means (Ikotun et al. 2022), which randomly selects k initial points in the space of the clustering variables as cluster centroids, then assigns each sample to its nearest centroid and updates the centroids, yielding a flat, nonhierarchical clustering solution. In this study, the K-means algorithm was employed to reduce data noise and detect outliers by dividing the data into subclasses. The algorithm judges the division of clusters by minimizing the sum of squared errors (SSE), given in Equation (3).
\mathrm{SSE} = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^{2} \qquad (3)
where Ci represents the ith class, p denotes a sample point in class Ci, and mi denotes the centroid of class Ci.
The main focus when applying the K-means algorithm is to identify the optimal k-value, i.e., the number of clusters. To achieve this, one can utilize the elbow method (Syakur et al. 2018), which involves visualizing the relationship between the k-value and the SSE, as shown in Figure 2.
Figure 2

Elbow curve.

The inflection point of the curve in Figure 2 occurs at k = 2, indicating that increasing k beyond this point has little effect on the SSE; the optimal number of clusters is therefore 2. After clustering, 218 of the original 230 datasets belong to one class, while the remaining 12 form another. These 12 datasets, all from the lower Ganges, either possess characteristics that genuinely differ from the rest of the data or are outliers caused by measurement errors or other factors. Therefore, only the 218 datasets belonging to the main class are analyzed further.
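As a sketch, the outlier screening described above can be reproduced with scikit-learn's KMeans; the synthetic data below merely mimic a dominant class plus a small outlying class and are not the river dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the river data: one dense main class plus a few outliers
main = rng.normal(0.0, 1.0, size=(218, 6))
outliers = rng.normal(8.0, 1.0, size=(12, 6))
X = np.vstack([main, outliers])

# Elbow method: SSE (inertia_) versus k
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)}

# Cluster with the k chosen at the elbow (k = 2 in this study)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Keep only the majority cluster, mirroring the 218/12 split in the text
major = np.bincount(labels).argmax()
X_clean = X[labels == major]
print(X_clean.shape)
```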

To mitigate the issues of redundant information and poor generalization caused by an excessive number of features, feature selection is imperative (Li et al. 2017). This study employs two techniques, namely, Pearson's correlation analysis and feature importance assessment, to identify the features that have the most significant impact on maximum scour depth. These methods are based on distinct principles, thereby preventing the limitations of a single linear correlation evaluation. By combining the results of both calculations, the most crucial features are selected for further analysis.

Pearson's correlation analysis of features

The Pearson correlation coefficient is a statistical measure that quantifies the strength of the linear relationship between two variables. It ranges between −1 and 1, where a value of −1 indicates a perfect negative correlation, 0 indicates no linear correlation, and 1 indicates a perfect positive correlation. The expression for the Pearson correlation coefficient is as follows (Deng et al. 2004):
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^{2}} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^{2}}} \qquad (4)
where r denotes the correlation coefficient, Xi and Yi denote the variables, and X̄ and Ȳ denote the means of the two variables, respectively.
In this study, Pearson's correlation analysis was conducted on all features to determine their correlations with one another and with the output variable. The results, shown in Figure 3, indicate that the highest correlation among the input features is between Dmnc and Q, with a coefficient of 0.92, while the feature most correlated with Dmxb is Q, with a coefficient of 0.93. This suggests that the discharge has the greatest linear influence on the scour depth.
Figure 3

Pearson's correlation analysis.


Further analysis revealed that among the relationships between Dmxb and all the features, Rc, Mw, W, Dmnc, and Q were positively correlated with Dmxb, while v, I, f, and S were negatively correlated. Features with an absolute correlation coefficient between 0 and 0.4 were considered weakly or not correlated, indicating that their influence on Dmxb is minimal. Therefore, based on the results of the Pearson correlation analysis, it may be appropriate to simplify the model by discarding v, f, and S features.
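The screening rule above can be sketched with pandas; the column names follow Table 1, the 0.4 threshold is the one stated in the text, and the data are random placeholders in which only Q and W are made to drive Dmxb:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
cols = ["Rc", "Mw", "W", "Dmnc", "v", "I", "f", "Q", "S", "Dmxb"]
df = pd.DataFrame(rng.normal(size=(218, 10)), columns=cols)
# Make two features strongly related to Dmxb for illustration
df["Dmxb"] = 2.0 * df["Q"] + 1.5 * df["W"] + 0.1 * rng.normal(size=218)

corr = df.corr(method="pearson")["Dmxb"].drop("Dmxb")
# Features with |r| below 0.4 are considered weakly or not correlated
weak = corr[corr.abs() < 0.4].index.tolist()
strong = corr[corr.abs() >= 0.4].index.tolist()
print(strong, weak)
```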

Assessment of the importance of features

Extremely Randomized Trees (Extra-Trees) (Geurts et al. 2006) is a widely used algorithm that assesses the significance of features (Alswaina & Elleithy 2018) and falls under the bagging method in machine learning. It is based on the conventional random forest approach, where split nodes are constructed by randomly selecting features. However, Extra-Trees constructs each split node of a tree by initially gathering a random number of features and then selecting the best node features using an impurity index. The impurity index (G) for a node is calculated as follows:
(5)
where ui represents the attribute of judgement for a given node and vij denotes the corresponding value associated with that attribute. NS is the total number of training samples at the current node, with Xleft and Xright denoting the sets of training samples belonging to the left and right nodes, respectively. Finally, ȳleft and ȳright represent the average values of the sample target variables of the left and right children of the current node.
The fundamental principle of using Extra-Trees to assess feature significance is the frequency with which a feature is chosen as a split point. On this basis, the importance of each feature affecting the Dmxb value was evaluated, as depicted in Figure 4. Although the analytical approach differs from Pearson's correlation, the v, f, and S features again rank lowest in importance.
Figure 4

Feature importance assessment.

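Importances of the kind plotted in Figure 4 can be obtained from scikit-learn's ExtraTreesRegressor; the data below are random placeholders in which columns 7 and 2 (standing in for Q and W) carry the signal:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(218, 9))   # nine candidate features, as in Table 1
# Target driven mainly by columns 7 and 2, plus a little noise
y = 3.0 * X[:, 7] + X[:, 2] + 0.1 * rng.normal(size=218)

et = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X, y)
# Rank features from most to least important
ranking = np.argsort(et.feature_importances_)[::-1]
print(ranking[:2])   # the two dominant features
```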

The selection of features was carried out by combining the two methods, Pearson's correlation analysis and Extra-Trees feature importance assessment. The six features with the greatest influence on the dependent variable Dmxb were selected in order of significance: Q, W, Dmnc, Rc, Mw, and I. The standardized values of these six features were recorded as the input values for Plan 1, while the features considered in existing studies, including Rc/W, W/Dmnc, and , were recorded as the input values for Plan 2; the output value for both Plan 1 and Plan 2 is Dmxb/Dmnc.

Regression modeling theory

In this study, three regression algorithms, namely, SVR, RF Regression, and XGBoost, are utilized as training models. The computational principles of these models are presented below.

SVR theory

SVR (Cortes & Vapnik 1995; Vapnik 1995) is a popular branch of SVMs that has been extensively used in water resources research (Liu et al. 2014; Roushangar & Koosheh 2015). The primary objective of SVR is to identify an optimal hyperplane that minimizes the distance to the sample point farthest from the hyperplane, as depicted in Figure 5.
Figure 5

Schematic diagram of the SVR method.

SVR works as follows. Suppose there are samples E = {(xi, yi) | i = 1, 2, …, l}, xi ∈ Rn, yi ∈ R, where Rn is an n-dimensional Euclidean space and R is the set of real numbers. The data x in the sample set are mapped to the high-dimensional space F by a nonlinear mapping ϕ, and linear regression is performed in the space F using Equation (6):
f(x) = \omega \cdot \phi(x) + b \qquad (6)
where ω is the vector of regression coefficients, ω ∈ F, and b is the bias term. On either side of f(x), a band of width ε is created, and only the support vectors, i.e., samples outside the band, affect the model. To determine the parameters ω and b, it is necessary both to maximize the margin and to minimize the loss. The cost function of the SVR is
R(\omega) = \frac{1}{2} \lVert \omega \rVert^{2} + C \sum_{i=1}^{l} L_{\varepsilon}(y_i, f(x_i))
where L_{\varepsilon}(y, f(x)) = \max(0, \lvert y - f(x) \rvert - \varepsilon) is the ε-insensitive loss.
Slack variables ξi and ξi* are added to allow some samples to lie outside the band, together with a penalty factor C:
\min R(\omega, \xi_i, \xi_i^{*}) = \frac{1}{2} \lVert \omega \rVert^{2} + C \sum_{i=1}^{l} (\xi_i + \xi_i^{*}) \qquad (7)

Solving Equation (7) with the Lagrange multiplier method leads to the dual form, expressed in terms of the training sample points, from which the linear regression function can be derived as
\omega = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*}) \, \phi(x_i) \qquad (8)
f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*}) \, K(x_i, x) + b \qquad (9)
where αi and αi* are the solutions that minimize R(ω, ξi, ξi*), and K(xi, x) = ϕ(xi) · ϕ(x) is the kernel function.
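A minimal ε-SVR fit with the RBF kernel and the Plan 1 hyperparameters of Table 2 (C = 50, gamma = 0.05, epsilon = 0.35) might look like the following; the data are synthetic placeholders:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = X @ np.array([1.0, 0.5, 0.2, 0.1, 0.05, 0.3]) + 0.1 * rng.normal(size=200)

# Standardize the inputs, then fit epsilon-SVR; samples inside the epsilon
# tube do not become support vectors and do not affect the model
model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=50, gamma=0.05, epsilon=0.35))
model.fit(X, y)
print(round(model.score(X, y), 3))   # training R2
```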

RF Regression theory

RF Regression (Breiman 2001), a parallel ensemble algorithm, employs a decision tree as its base learner. By implementing bagging and the random subspace method (RSM) to randomize sample selection and feature selection, it achieves stronger generalization than a single decision tree.

The random forest model draws K bootstrap samples (random sampling with replacement) from the original training set, each with the same size as the original set, and trains one decision tree on each sample. At every node, m (m < M) features are randomly selected from the M features as the candidate splitting features, and the feature and split point that minimize the prediction error are chosen. This step is repeated until the tree can no longer be split, and the final prediction is the average over all K decision trees. The random forest computation process is shown in Figure 6.
Figure 6

Flow chart of RF Regression calculation.

The regression tree splitting criterion chooses the input variable with the lowest Gini Index (Wang et al. 2016):
(10)
where f(tX(xi), j) is the proportion of samples with the value xi belonging to leaf j at node t. The predicted value of an observation is calculated by averaging over all the trees.
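The bagging-and-averaging behavior can be sketched with scikit-learn's RandomForestRegressor, using the Plan 1 hyperparameters of Table 2; the data and the max_features value are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
y = X.sum(axis=1) + 0.1 * rng.normal(size=200)

# Bagging + random subspace: each tree sees a bootstrap sample and,
# at every split, a random subset of the features (max_features = m < M)
rf = RandomForestRegressor(n_estimators=20, max_depth=11,
                           max_features=3, random_state=0)
rf.fit(X, y)

# The forest prediction is the average over all individual trees
pred_manual = np.mean([t.predict(X) for t in rf.estimators_], axis=0)
print(np.allclose(pred_manual, rf.predict(X)))
```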

XGBoost theory

The XGBoost (Chen & Guestrin 2016) model is an integrated boosting algorithm that uses forward stagewise additive modeling based on Gradient Boosting (Friedman 2001). Unlike the random forest model, where the base learners are independent, the base learners of XGBoost are interrelated. Given a dataset with n samples, m features, and K trees in the model, the output for the ith sample is
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \qquad (11)
where F is the space of regression trees, q denotes the structure of each tree, T is the number of leaf nodes, and w is the vector of leaf weights in the decision tree model.
The objective function is shown below:
Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (12)
where n is the number of samples and l(yi, ŷi) represents the prediction error for the ith sample. Ω(fk) is the regularization term representing the complexity of the tree, with the expression shown below:
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} \qquad (13)
where γ is the penalty coefficient on the number of leaves T and λ is the regularization factor.
Using the greedy algorithm, each new weak learner is built so as to maximize the reduction of the objective function. Applying a second-order Taylor expansion, taking the first-order derivative with respect to wj, and setting it to zero gives the optimal solution of the objective function:
w_j^{*} = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \qquad (14)
where gi and hi are the first- and second-order derivatives of the loss for sample i, and Ij is the set of samples assigned to leaf j.
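As a worked instance of the optimal leaf weight, assuming squared-error loss (so that gi = ŷi − yi and hi = 1), the computation can be checked numerically:

```python
import numpy as np

# Gradient statistics for squared-error loss: g_i = yhat_i - y_i, h_i = 1
y = np.array([3.0, 2.0, 4.0, 5.0])
y_pred = np.array([2.5, 2.5, 3.0, 4.0])   # predictions from the previous boosting round
g = y_pred - y
h = np.ones_like(y)

lam = 1.0  # regularization factor lambda
# Optimal weight of a leaf containing all four samples:
# w* = -sum(g) / (sum(h) + lambda), from setting the derivative of the
# second-order objective to zero
w_star = -g.sum() / (h.sum() + lam)
print(w_star)   # 0.4
```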

Grid search finds the optimal model parameters by exhaustively evaluating every combination in a given parameter grid. The optimal parameters for the three regression models were determined through a comprehensive grid search, and the selected values are listed in Table 2.
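A grid search of the kind used here can be sketched with scikit-learn's GridSearchCV; the grid values are illustrative, not the full grid used in the study:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 6))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=150)

# Exhaustively try every combination in the grid with cross-validation
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 50], "gamma": [0.05, 0.1], "epsilon": [0.1, 0.35]},
    cv=5, scoring="r2",
)
grid.fit(X, y)
print(grid.best_params_)
```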

Table 2

Parameter settings for each model

Model          Plan 1                Plan 2
SVR            C = 50                C = 135.5
               gamma = 0.05          gamma = 0.05
               epsilon = 0.35        epsilon = 0.3
               kernel = "rbf"        kernel = "rbf"
RF Regression  max_depth = 11        max_depth = 5
               n_estimators = 20     n_estimators = 10
XGBoost        max_depth = 3         max_depth = 11
               n_estimators = 10     n_estimators = 40
               learning_rate = 0.5   learning_rate = 0.5

The evaluation of each regression model's test results was conducted using a set of commonly used indicators, defined below, where yi represents the observed value, ŷi denotes the predicted value of the model, and n refers to the size of the sample. Smaller values of root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) indicate higher predictive accuracy of the model (Handelman et al. 2019; Abed et al. 2023).

  • (1)
    RMSE: RMSE is the sample standard deviation of the differences between predicted and observed values, describing the dispersion of the sample.
    \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}} \qquad (15)
  • (2)
    MAE: MAE is the average of absolute errors between predicted values and observed values. It is a linear score in which all individual differences carry equal weight.
    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \qquad (16)
  • (3)
    MAPE: MAPE is a measure of relative error that uses absolute values to avoid the offsetting of positive and negative errors.
    \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \qquad (17)
  • (4)
    Coefficient of determination (R2): R2 is typically used for comparing the performance of different models. Its values range from 0 to 1, and the closer it is to 1, the better the model explains the data variance.
    R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} \qquad (18)
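The four indicators can be computed directly from their definitions; the sample values below are illustrative:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE and R2 as defined in Equations (15)-(18)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, mape, r2

rmse, mae, mape, r2 = evaluate([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.0, 8.0])
print(rmse, mae, mape, r2)
```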

Stacking fusion model

The present study employs a Stacking approach (Pavlyshenko 2018) to build a fusion model that combines the advantages of the three distinct models, thereby improving the prediction of maximum scour depth.

The Stacking method, first proposed by Wolpert (1992), is an integrated learning approach that combines three major advantages: good model performance, strong interpretability, and applicability to complex data. As one of the most practical fusion methods, it has gained widespread recognition. Its principle is relatively simple: it consists of two layers of algorithms. The first layer, known as level 0, contains one or more base learners, typically models with high complexity and strong learning ability. The second layer, referred to as level 1, contains only one meta-learner, for which models with strong generalization ability are usually selected. During training, the data are first fed into level 0, where each base learner produces its prediction results; these form a new matrix that is then fed into the level 1 meta-learner for training. The model structure is visualized in Figure 7.
Figure 7

Principle flow chart of the Stacking method.


The pre-processed data are standardized, and the training and test sets are randomly split in an 8:2 ratio. The training set is then fed into the base learner model for training. To address the issue of small sample size, cross-validation can be utilized to expand the data as the base learner trains.

This study employs a fivefold cross-validation approach (Berrar 2019), whereby the training data are partitioned into five subsets, with four subsets used for training and one for validation. This process is repeated five times to ensure the robustness of the model. During cross-validation, the validation set is not used for training, and the prediction results on this set are used to evaluate the generalization ability of the model. The resulting predictions from each of the five models correspond to five subsets of the original training set, thereby enabling all the data to be predicted simultaneously upon the completion of cross-validation. Each base learner is trained separately and outputs its respective predictions, which are then merged to form a new feature matrix comprising 186 rows and 3 columns as the training set for the meta-learner. During each cross-validation, a prediction is obtained for the test dataset. The predictions from the five cross-validations are then averaged, and the resulting predictions from the test sets of the three models are combined into a new matrix, which serves as the test set features for the meta-learner. Following the training of the base learners, the newly generated feature matrix is fed into the meta-learner for further training.
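The construction of the meta-features can be sketched with scikit-learn's cross_val_predict, which returns exactly one out-of-fold prediction per training sample for each base learner. As a simplification of the procedure described above, each base learner is refit once on the full training set to produce the test-set features (rather than averaging the five per-fold test predictions), and GradientBoostingRegressor stands in for XGBoost:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(218, 6))   # 218 samples, six Plan 1 features
y = X @ np.array([1.0, 0.5, 0.3, 0.2, 0.1, 0.4]) + 0.1 * rng.normal(size=218)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

bases = [SVR(), RandomForestRegressor(random_state=0),
         GradientBoostingRegressor(random_state=0)]

# Level 0: fivefold out-of-fold predictions form the meta-learner's training set
meta_train = np.column_stack(
    [cross_val_predict(m, X_tr, y_tr, cv=5) for m in bases])
# For the test set, each base learner is refit on the full training data
meta_test = np.column_stack(
    [m.fit(X_tr, y_tr).predict(X_te) for m in bases])
print(meta_train.shape, meta_test.shape)
```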

In this study, the meta-learner chosen was the feed-forward neural network (FNN) (Rumelhart et al. 1986; Kwok & Yeung 1997), known for its strong nonlinear modeling capability and its ability to fit nonlinear data accurately; the FNN also demonstrated excellent generalization. Taking Plan 1 as an example, the FNN model was composed of one input layer, three hidden layers, and one output layer. The hidden layers comprise eight, six, and four neurons, respectively. The topology of the FNN model is depicted in Figure 8; every node performs two operations, a weighted summation and an activation.
Figure 8

Topology diagram of the FNN.

The mathematical representation of the input and output values of the hidden and output layers is as follows (Svozil et al. 1997):
a_k = \sum_{i} w_{ki} x_i + b_k \qquad (19)
z_k = f(a_k) \qquad (20)
y = f\left( \sum_{j=1}^{h} w_j z_j + b \right) \qquad (21)
where h is the number of neurons in the hidden layer, xi is the input value of this layer (the feature matrix obtained by training the base learners), wki and wj are the connection weights between the neurons, bk and b are the thresholds, and f(x) is the activation function, chosen in this paper as the sigmoid function. ak and zk are the input and output values of the hidden layer, respectively. The expression of the sigmoid function is shown below:
f(x) = \frac{1}{1 + e^{-x}} \qquad (22)
The training procedure consists of two distinct phases: forward propagation and backward propagation. During the forward propagation phase, the input data traverse through the input and hidden layers, ultimately reaching the output layer. This process does not alter the model's parameters. Subsequently, the error is computed by comparing the output with the correct value. The error is then propagated back through the original pathway via backward propagation, wherein the model's parameters are adjusted to minimize the error. The mean squared error (MSE) function is employed to quantify the error, as depicted in the equation below:
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2} \qquad (23)
where yi denotes the measured value, ŷi represents the predicted value generated by the model, and n signifies the sample size.
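A forward pass through Equations (19)-(22) can be written directly in NumPy; the weights below are random stand-ins, whereas a trained meta-learner would obtain them by backpropagation, and the three inputs correspond to the three base-learner predictions:

```python
import numpy as np

def sigmoid(x):
    # Equation (22)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(7)
sizes = [3, 8, 6, 4, 1]   # 3 base-learner outputs in, one scour-depth ratio out
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

def forward(x):
    """Forward propagation: a_k = W x + b (Eq. 19), z_k = f(a_k) (Eq. 20)."""
    z = np.asarray(x, float)
    for W, b in zip(weights[:-1], biases[:-1]):
        z = sigmoid(W @ z + b)
    W, b = weights[-1], biases[-1]
    return (W @ z + b)[0]   # output layer value

print(forward([0.2, 0.4, 0.6]))
```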

The two models, designated as Plan 1 and Plan 2, were trained independently, with their respective parameters outlined in Table 3.

Table 3

Parameter setting of the FNN model

Model   Plan 1                                   Plan 2
FNN     Activation = 'sigmoid'                   Activation = 'sigmoid'
        Number of hidden layers: 3               Number of hidden layers: 3
        Number of neurons per layer: 8, 6, 4, 1  Number of neurons per layer: 8, 6, 4, 1

Results

Comparison of each model results

The predicted maximum scour depths were computed for each base learner and for the Stacking fusion model, and then compared with the measured maximum scour depths. The output of each model is Dmxb/Dmnc, from which Dmxb is obtained, enabling assessment of each model's performance in predicting Dmxb. The results for each evaluation metric are presented in Table 4.

Table 4

Comparison of each model

Model           RMSE             MAE              MAPE             R2
                Plan 1   Plan 2  Plan 1   Plan 2  Plan 1   Plan 2  Plan 1   Plan 2
SVR             1.1528   1.1650  0.9058   0.8615  0.1123   0.1111  0.8845   0.8820
RF Regression   0.9896   1.2106  0.7768   0.9739  0.0927   0.1232  0.9149   0.8726
XGBoost         1.3954   1.4777  1.0183   1.1226  0.1197   0.1468  0.8307   0.8102
Stacking model  0.8385   0.9591  0.6647   0.7537  0.0819   0.0999  0.9389   0.9200

The performance of each model was assessed using RMSE, MAE, MAPE, and R2. In essence, a model's predictive performance is deemed better as its coefficient of determination approaches 1 and as its RMSE, MAE, and MAPE values decrease.

Table 4 presents the results of the evaluation metrics for all models in Plan 1, with coefficients of determination exceeding 0.8 for all models. The RF Regression model achieved the highest accuracy with an R2 value of 0.9149, followed by the SVR model with an R2 value of 0.8845, while the XGBoost model had the lowest R2 value of 0.8307. In Plan 2, the XGBoost model also had the lowest R2 value among the SVR, RF Regression, and XGBoost models.

Although overall the four evaluation metrics yield consistent results for assessing model performance, there are a few exceptions: from the RMSE and R2 metrics of the SVR model in Plan 1 and Plan 2, it can be observed that its predictive performance of Plan 1 is superior to that of Plan 2. However, its MAE and MAPE values of Plan 1 are larger than those of Plan 2, contradicting the previous conclusion. This discrepancy can be attributed to the fact that MAE and MAPE reflect the model's mean error on the sample, whereas R2 and RMSE focus more on the fit of the model to the sample variance.

Overall, in Plan 1, the RF Regression model performs the best in predicting the maximum scour depth, followed by the SVR and XGBoost models. In Plan 2, the SVR model demonstrates the best performance, followed by the RF Regression model and XGBoost model. Figure 9 illustrates the predictions of each model on the test set compared to the true values and the absolute error values.
Figure 9

Comparison of predicted values of each model.


The Stacking model, which combines the SVR, RF Regression, and XGBoost models as base learners with the FNN model as meta-learner, was examined against the three base learners individually. The Stacking model exhibits superior prediction accuracy, with an R2 of 0.9389. This surpasses the R2 value of the best base learner, RF Regression, by 0.0240 and exceeds that of the weakest base learner, XGBoost, by 0.1082. The three error metrics of the Stacking model are also smaller than those of the base learners, underscoring the optimization achieved through Stacking.
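A stacking ensemble of this shape can be sketched with scikit-learn; here GradientBoostingRegressor stands in for XGBoost, MLPRegressor for the FNN meta-learner, and the hyperparameters and synthetic data are illustrative assumptions, not the study's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 6))  # six standardized features, synthetic
y = X @ np.array([2.0, -1.0, 0.5, 1.5, -0.5, 1.0]) + rng.normal(0, 0.1, 200)

stack = StackingRegressor(
    estimators=[
        ("svr", SVR(C=10.0)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
    ],
    final_estimator=MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    cv=5,  # out-of-fold base-learner predictions feed the meta-learner
)
stack.fit(X, y)
score = stack.score(X, y)  # R2 on the training data
```

The `cv` argument matters: the meta-learner is trained on out-of-fold base-learner predictions, which guards against the meta-learner simply memorizing overfitted base-learner outputs.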

Comparison of the Stacking model and other methods

The predictions of Plan 1's Stacking model were compared on the test dataset with three established methods: the Thorne formula, the Maynord formula, and the ANN model. The prediction results were again evaluated using the four metrics RMSE, MAE, MAPE, and R2. The Thorne formula requires that Rc/W exceed 2, so the analysis and comparison were conducted only on the 35 test-set samples satisfying this criterion. The findings are shown in Figure 10, and the value of each evaluation metric is presented in Table 5.
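The Rc/W > 2 screening step amounts to a simple filter over the test set; a sketch with hypothetical rows (the values and field layout are illustrative, not the study's data):

```python
# Each row: (Rc, W, measured_depth) — hypothetical values for illustration.
test_rows = [
    (120.0, 40.0, 3.2),   # Rc/W = 3.0  -> kept
    (60.0, 35.0, 2.8),    # Rc/W ~ 1.71 -> excluded
    (200.0, 50.0, 4.1),   # Rc/W = 4.0  -> kept
]

# The Thorne formula applies only where Rc/W > 2, so other samples are excluded.
eligible = [row for row in test_rows if row[0] / row[1] > 2]
```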
Table 5

Comparison of the Stacking model and other methods

Method | R2 | RMSE | MAE | MAPE
Thorne formula | 0.8851 | 1.1900 | 0.8806 | 0.1281
Maynord formula | 0.9253 | 0.9594 | 0.7674 | 0.1013
ANN model | 0.9227 | 0.9760 | 0.8160 | 0.1122
Stacking model | 0.9456 | 0.8189 | 0.6475 | 0.0826
Figure 10

Comparison of each formula and model.


In the scatter plot of predicted against measured values, the proximity of each data point to the y = x line indicates how closely the predicted and measured values agree. Among the three other methods, the Thorne formula deviates most from the line of perfect fit, y = x, while the scatter points of the ANN model and the Maynord formula lie closer to the y = x line on the whole.

The Stacking model outperforms all other methodologies across the 35 datasets, as its predicted values demonstrate the highest degree of concurrence with the measured values overall.

The bar chart presents the evaluation metrics for all of the forecasting methods. Numerically, the Stacking model surpasses the others in forecasting accuracy, with an R2 exceeding 0.94. The Maynord formula follows, with an R2 greater than 0.92 and approaching 0.93, while the ANN model is slightly inferior but comparable. The Thorne formula fails to achieve an R2 above 0.9.

Considering the three error indicators, RMSE, MAE, and MAPE, the Stacking model yields the lowest value of each metric among all the models and formulas, followed in turn by the Maynord formula, the ANN model, and the Thorne formula. These error indicators are consistent with the R2 ranking above, further substantiating the models' relative predictive efficacy.

Discussion

The comparison between Plan 1 and Plan 2 reveals that the predictive performance of each model in Plan 1 generally surpasses that of Plan 2, with the most significant improvement observed in the RF Regression model, whose coefficient of determination increased by 0.0423. This could be attributed to several factors. First, the existing formulas and methods combine the raw variables into the composite features Rc/B, W/Dmnc, and , which reduces the independence of each feature. Second, these features may not be the most appropriate indicators of maximum scour depth; other factors that exert a greater influence on maximum scour depth may not be taken into account. To address this issue, this study conducted a comprehensive analysis based on Pearson's correlation analysis and the Extra-Trees model and selected a few key features as independent variables, which were standardized and entered into the regression models. Notably, each regression model predicted maximum scour depth more accurately with these features than with the features selected by the existing methods.
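The two-stage screening described above can be sketched as follows, using synthetic data in place of the study's samples; scikit-learn's ExtraTreesRegressor stands in for the Extra-Trees model, and the target coefficients are invented:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)
names = ["Rc", "Mw", "W", "Dmnc", "v", "I", "f", "Q", "S"]  # the nine raw features
X = rng.uniform(size=(218, 9))
# Synthetic target: invented dependence on Q, W, and Dmnc plus noise
y = 2.0 * X[:, 7] + 1.0 * X[:, 2] - 0.5 * X[:, 3] + rng.normal(0, 0.1, 218)

# Pearson correlation magnitude of each feature with the target
pearson = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(9)]

# Extra-Trees impurity-based feature importances
et = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = et.feature_importances_

# Rank features by importance and keep the top six
top6 = [names[j] for j in np.argsort(importances)[::-1][:6]]
```

In practice the two rankings are inspected together; a feature kept in Plan 1 should score well on both the correlation and the importance criterion.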

Among the four regression models considered in this study, namely, the Stacking model, SVR model, RF Regression model, and XGBoost model, the Stacking model exhibits the highest R2 value in Plan 1, at 0.9389. This represents an improvement of 0.0240 over the leading base learner, RF Regression, and of 0.1082 over the weakest, XGBoost. Thus, the Stacking model demonstrates superior predictive capability.

Both Plan 1 and Plan 2 corroborate the efficacy of employing the FNN model as the meta-learner for the Stacking model in this investigation. The Stacking model amalgamates the strengths of multiple models, yielding enhanced prediction outcomes.

Furthermore, the SVR model performs comparably in Plan 1 and Plan 2, whereas the other three models perform better in Plan 1 than in Plan 2. Specifically, the RF Regression model, XGBoost model, and Stacking model achieved increases in R2 of 0.0423, 0.0205, and 0.0189, respectively. The comparison between Plan 1 and Plan 2 demonstrates the significance of feature selection in predicting maximum scour depth. It further highlights that the key features chosen through Pearson's correlation analysis and the Extra-Trees model enable each regression model to predict the maximum scour depth more accurately than the features selected by existing prediction methods.

In comparison to the existing formulas, the Stacking model demonstrates superior predictive performance on the 35 samples. Its R2 exceeds that of the best-performing Maynord formula by 0.0203, that of the moderately performing ANN model by 0.0229, and that of the weakest Thorne formula by 0.0605. This conclusion follows from both the scatter plots and the evaluation metrics. The performance can be attributed to the selection of pivotal features for analysis and to the Stacking method, which combines the strengths of multiple regression models.

In the present investigation, a rigorous outlier detection approach utilizing K-means cluster analysis was employed to identify and remove outliers from a total of 230 datasets, resulting in 218 remaining datasets for further analysis. Subsequently, through meticulous consideration, six distinctive features (namely, Q, W, Dmnc, Rc, Mw, and I) were selected from a pool of nine features (Rc, Mw, W, Dmnc, v, I, f, Q, and S) using Pearson's correlation analysis and Extra-Trees feature importance evaluation. Notably, these chosen features differ from those examined in previous studies, which are Rc/B, W/Dmnc, and . To predict the maximum scour depth accurately, three regression algorithms, namely, SVR, RF Regression, and XGBoost, were employed for training and prediction purposes. To enhance the predictive accuracy, a Stacking model was constructed, with the aforementioned three models serving as base learners and the FNN model functioning as the meta-learner. The standardized values of the six selected features were recorded as input values for Plan 1, while the features considered in existing studies were recorded as input values for Plan 2, facilitating a comprehensive comparative analysis.
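A distance-based K-means outlier screen of the kind described here can be sketched with scikit-learn; the cluster count, the threshold rule, and the synthetic data are all illustrative assumptions rather than the study's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(230, 6))  # 230 samples, six features (synthetic)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Distance of each sample to its assigned cluster centre
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag the most distant samples as outliers (3-sigma rule is an illustrative choice)
threshold = dist.mean() + 3 * dist.std()
kept = X[dist <= threshold]
```

Samples lying unusually far from every cluster centre are treated as outliers and removed before model training, mirroring the 230-to-218 reduction reported above.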

The evaluation of the training models, Plan 1 and Plan 2, involved the assessment of their prediction results using four metrics: RMSE, MAE, MAPE, and R2. A total of 44 datasets out of the original 218 were utilized for this purpose. Furthermore, the prediction outcomes of the Stacking models, derived from both Plan 1 and Plan 2, were compared against three other models: SVR, RF Regression, and XGBoost. Among the three fundamental models, the RF Regression model performs the best in Plan 1, while the SVR model performs the best in Plan 2. For Plan 1, the value of R2 of the RF Regression model is 0.9149, and the RMSE, MAE, and MAPE values are 0.9896, 0.7768, and 0.0927, respectively. For Plan 2, the R2 value of the SVR model is 0.8820, and the RMSE, MAE, and MAPE values are 1.1650, 0.8615, and 0.1111, respectively. On the other hand, the worst-performing model is the XGBoost model, with R2 values of only 0.8307 for Plan 1 and 0.8102 for Plan 2. However, both in Plan 1 and Plan 2, the performance of these models falls short compared to the Stacking model. The Stacking model has R2 values of 0.9389 and 0.9200 for Plan 1 and Plan 2, respectively. The corresponding RMSE, MAE, and MAPE values are 0.8385 and 0.9591, 0.6647 and 0.7537, and 0.0819 and 0.0999, respectively.

Upon comparing Plan 1 and Plan 2, it was observed that the prediction results obtained from each base learner within Plan 1, as well as the Stacking model itself, outperformed those of Plan 2. This superiority can be attributed to the more comprehensive set of features considered in Plan 1, which surpassed the limited scope of features in Plan 2.

The Stacking model was compared with the other methods, and the prediction results for the 35 datasets in the test set showed that the Stacking model outperformed the other methods, with an R2 of 0.9456, while the R2 of each of the other methods was below 0.93. This superior performance can be attributed, in part, to the removal of outliers through K-means clustering analysis prior to model construction; reducing the interference of outliers during model training improved the accuracy of predictions. Another contributing factor is the careful selection of six features: Q, W, Dmnc, Rc, Mw, and I, chosen based on Pearson's correlation analysis and Extra-Trees feature importance assessment. These features capture the influences on maximum scour depth more comprehensively than those used by the other methods, and their inclusion greatly enhanced the model's predictive capability. Furthermore, the Stacking method employed in this study incorporates the advantages of multiple regression models, making it particularly well suited for predicting maximum scour depth. However, the model accuracy in this study has not yet reached a completely satisfactory level, fundamentally because the database contains only a few hundred samples, which is not sufficient. In addition, although there are nine original features in the samples, other important factors, such as the properties of the soil at river bends, were not taken into account. A larger database would therefore allow the method employed in this study to provide more accurate predictions.

In conclusion, the combination of K-means clustering, feature selection, and Stacking method for predicting the maximum scour depth of bends has yielded promising results. This novel approach not only provides valuable insights into the design of concave bank protection but also offers a new avenue for future research in this field.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abed M., Imteaz M. A., Ahmed A. N. & Huang Y. F. 2023 A novel application of transformer neural network (TNN) for estimating pan evaporation rate. Applied Water Science 13 (2), 31.

Berrar D. 2019 Encyclopedia of Bioinformatics and Computational Biology (Shoba Ranganathan, Michael Gribskov, Kenta Nakai, Christian Schönbach, eds.), Vol. 1, pp. 542–545. Elsevier, Amsterdam, The Netherlands.

Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.

Chen T. & Guestrin C. 2016 XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 785–794.

Cortes C. & Vapnik V. N. 1995 Support-vector networks. Machine Learning 20 (3), 273–297.

Deng L., Pei J., Ma J. & Lee D. L. 2004 A rank sum test method for informative gene discovery. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 410–419.

Ehteram M., Ahmed A. N., Ling L., Fai C. M., Latif S. D., Afan H. A., Banadkooki F. B. & El-Shafie A. 2020 Pipeline scour rates prediction-based model utilizing a multilayer perceptron-colliding body algorithm. Water 12 (3), 902.

Friedman J. 2001 Greedy function approximation: A gradient boosting machine. Annals of Statistics 29 (5), 1189–1232.

Froehlich D. C. 2020 Neural network prediction of maximum scour in bends of sand-bed rivers. Journal of Hydraulic Engineering 146 (10).

Geurts P., Ernst D. & Wehenkel L. 2006 Extremely randomized trees. Machine Learning 63 (1), 3–42.

Handelman G. S., Kok H. K., Chandra R. V., Razavi A. H., Huang S., Brooks M., Lee M. J. & Asadi H. 2019 Peering into the black box of artificial intelligence: Evaluation metrics of machine learning methods. American Journal of Roentgenology 212 (1), 38–43.

Ikotun A. M., Ezugwu A. E., Abualigah L., Abuhaija B. & Heming J. 2022 K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of Big Data. Information Sciences 622, 178–210.

Li J., Cheng K., Wang S., Morstatter F., Trevino R. P., Tang J. & Liu H. 2017 Feature selection: A data perspective. ACM Computing Surveys (CSUR) 50 (6), 1–45.

Ling J., Cui B. & Zhao H. 2006 BP neural network model-based prediction of maximal scour-depth at river bends. Journal of Tongji University 34 (8), 1040.

Maynord S. T. 1996 Toe-scour estimation in stabilized bendways. Journal of Hydraulic Engineering 122 (8), 460–464.

Odgaard A. J. 1984 Flow and bed topography in alluvial channel bend. Journal of Hydraulic Engineering 110 (4), 521–536.

Parsaie A., Haghiabi A. H., Latif S. D. & Tripathi R. P. 2021 Predictive modelling of piezometric head and seepage discharge in earth dam using soft computational models. Environmental Science and Pollution Research 28 (43), 60842–60856.

Pavlyshenko B. 2018 Using stacking approaches for machine learning models. In: 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP). IEEE, pp. 255–258.

Roushangar K. & Koosheh A. 2015 Evaluation of GA-SVR method for modeling bed load transport in gravel-bed rivers. Journal of Hydrology 527, 1142–1152.

Rousseau Y. Y., Biron P. M. & Wiel M. J. V. D. 2016 Sensitivity of simulated flow fields and bathymetries in meandering channels to the choice of a morphodynamic model. Earth Surface Processes & Landforms 41 (9), 1169–1184.

Rumelhart D. E., Hinton G. E. & Williams R. J. 1986 Learning representations by back-propagating errors. Nature 323 (6088), 533–536.

Svozil D., Kvasnicka V. & Pospichal J. 1997 Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems 39 (1), 43–62.

Syakur M. A., Khotimah B. K., Rochman E. M. S. & Satoto B. D. 2018 Integration k-means clustering method and elbow method for identification of the best customer profile cluster. In: IOP Conference Series: Materials Science and Engineering, Vol. 336. IOP Publishing, p. 012017.

Thorne C. R. 1989 Bank Processes on the Red River Between Index, Arkansas and Shreveport, Louisiana. Queen Mary College, London, UK.

Thorne C. R. & Abt S. R. 1993 Velocity and scour prediction in river bends.

Tofiq Y. M., Latif S. D., Ahmed A. N., Kumar P. & El-Shafie A. 2022 Optimized model inputs selections for enhancing river streamflow forecasting accuracy using different artificial intelligence techniques. Water Resources Management 36 (15), 5999–6016.

USACE 1994 Hydraulic Design of Flood Control Channels. Engineer Manual No. 1110-2-1601. USACE, Washington, DC.

Vapnik V. 1995 The Nature of Statistical Learning Theory. Springer, New York.

Wolpert D. H. 1992 Stacked generalization. Neural Networks 5 (2), 241–259.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).