## Abstract

The accurate prediction of maximum erosion depth in riverbeds is crucial for early protection of bank slopes. In this study, *K*-means clustering analysis was used for outlier identification and feature selection, resulting in Plan 1 with six influential features. Plan 2 included the features selected by existing methods. Regression models were built using Support Vector Regression, Random Forest Regression (RF Regression), and eXtreme Gradient Boosting on sample data from Plan 1 and Plan 2. To enhance accuracy, a Stacking method with a feed-forward neural network as the meta-learner was introduced. Model performance was evaluated using root mean squared error, mean absolute error, mean absolute percentage error, and *R*^{2} coefficients. The results demonstrate that the performance of the three models in Plan 1 outperformed that of Plan 2, with improvements in *R*^{2} values of 0.0025, 0.0423, and 0.0205, respectively. Among the three regression models in Plan 1, RF Regression performs the best with an *R*^{2} value of 0.9149, which is nonetheless lower than the 0.9389 achieved by the Stacking fusion model. Compared to the existing formulas, the Stacking model exhibits superior predictive performance. This study verifies the effectiveness of combining clustering analysis, feature selection, and the Stacking method in predicting maximum scour depth in bends, providing a novel approach for bank protection design.

## HIGHLIGHTS

Feature selection was applied in this study, and the selected features differ from those used in existing studies.

The three regression models demonstrate that the features selected in this study are superior to those selected in existing studies.

The Stacking model was developed and compared with the existing methods, and the results show that the Stacking model is more accurate.

## INTRODUCTION

Natural rivers, particularly in mountainous regions, generally exhibit a sinuous course. Even in sections that appear partially straight, the local topography of the riverbed can induce curvature in the main channel. Consequently, investigating the hydrodynamics of these bends and the associated scouring mechanisms has become a pivotal focus within the realm of fluvial dynamics. For instance, numerous roads in mountainous areas are constructed parallel to river banks, where the roadbed acts as an embankment for the meandering mountain river. However, the soil supporting these embankments is susceptible to erosion by the flowing water, ultimately culminating in embankment collapse and subsequent damage to the road infrastructure.

In the realm of concave embankments, the paramount factor engendering profound scour is the phenomenon of bend circulation (Odgaard 1984). When a river meanders through a curvature, it experiences not only the pull of gravity but also the centrifugal forces at play. As a consequence, the surface of the concave embankment rises above that of its convex counterpart. A vertical flow pattern ensues close to the embankments, with water cascading down the concave side and ascending along the convex side, thus giving rise to a circulating motion. This circulation induces the migration of relatively less sandy surface water toward the concave embankment, where it descends vigorously, while the more sandy bottom water gravitates toward the convex embankment, surging upward with fervor. Such dynamics yield an asymmetry in the transportation of sediment, compounded by the erosive action of the flow on the concave embankment, causing the slope to crumble and the resulting detritus to be carried by the subsurface current toward the convex embankment. Consequently, a pronounced scour depth manifests itself on the concave embankment in comparison to the remaining expanse of the riverbed.

There have been several studies on predicting maximum scour depth. Thorne (1989) employed data derived from the Red River in Louisiana to derive an equation; Thorne & Abt (1993) discovered that empirical prognostications yielded superior concordance with measured values compared to estimates based on theoretical analyses of flow dynamics and sediment equilibrium at bends. USACE (1994) furnished a graphical correlation for determining the optimal design scour depth of a bend, while Maynord (1996) presented his equation alongside a secure design curve, having excluded laboratory data. With the advancement of computer software, several techniques have emerged for forecasting the maximum scour depth of bends through computer simulations. In the study by Ling (2006), a BP neural network was employed to predict the maximum scour depth at the bends of rivers. However, the limited amount of data used and the training model's limited accuracy pose constraints. Nevertheless, the obtained results still outperform empirical formulas. Rousseau *et al.* (2016) conducted a comparative analysis of six widely utilized numerical simulation methods and observed that despite their similar computational speeds, these simulations failed to yield accurate water depths and did not ensure result precision. In a recent study, Froehlich (2020) harnessed 202 sets of measured data from Maynord, Jackson, Thorne, and Abt to develop an artificial neural network (ANN) model capable of predicting the utmost scour depth at bends in sandy riverbeds and introduced a novel approach to establish an upper threshold for the maximum scour depth.

These methodologies exhibit a certain degree of applicability to the intricate realm of dataset rivers; however, they are not without their limitations. Primarily, these methodologies suffer from an insufficient consideration of variables and fail to elucidate the magnitude of influence each variable bears on the maximum scour depth. Moreover, these methodologies rely on dimensionless analysis, which, while convenient for analytical purposes, may engender a loss of autonomy among individual variables. These constraints potentially undermine the prognostic capacity of said methodologies.

Over the past few years, the utilization of artificial intelligence (AI) in various engineering domains has been prominent for the development of predictive models encompassing diverse natural variables. Ehteram *et al.* (2020) integrated the multilayer perceptron (MLP) model with colliding bodies' optimization (CBO). In their study, sediment size, wave characteristics, and pipeline geometry were employed as inputs for the proposed models. The MLP-CBO model outperformed regression models and empirical models in predicting pipeline scour rates. Parsaie *et al.* (2021) established multiple models including support vector machine (SVM) and multivariate adaptive regression splines (MARS) to predict the piezometric head and seepage discharge in an earth dam. The results showcased excellent performance of these models in prediction, particularly, the MARS model exhibiting the highest accuracy. Tofiq *et al.* (2022) utilized various AI techniques to develop several highly accurate prediction models for river streamflow in the Aswan High Dam.

In this study, several machine learning methods are applied to the prediction of maximum scour depth in river bends: *K*-means clustering analysis was adopted to identify and remove outliers, and feature selection was conducted to identify the six features most influential on maximum erosion depth, referred to as Plan 1. Furthermore, to highlight the importance of feature selection, the features selected by existing methods, denoted Plan 2, were also included in the subsequent study for comparison. Along with developing three traditional regression models, Support Vector Regression (SVR), Random Forest Regression (RF Regression), and eXtreme Gradient Boosting (XGBoost), the Stacking method was introduced and a Stacking model was built with the aim of improving prediction accuracy. Furthermore, to ensure the independence of each variable, the data undergo a process of normalization.

## METHODS

### Existing formulas and methods

Some of the existing methods for predicting maximum scour depth are listed here.

In the ANN approach of Froehlich (2020), the dimensionless inputs *R _{c}*/*W* and *W*/*D _{mnc}*, together with a third dimensionless variable, along with the output value *η* = *D _{mxb}*/*D _{mnc}* − 1, were fed into an ANN model to predict the maximum scour depth at the bend in the sandy river bed. Figure 1 provides a clear understanding of the definition of several variables.

### Database

Sample data from 230 river measurement sets can be obtained from Thorne & Abt (1993). Each set consists of nine features, namely *R _{c}*, *M _{w}*, *W*, *D _{mnc}*, *v*, *I*, *f*, *Q*, and *S*, with one output variable, the maximum scour depth *D _{mxb}*. Table 1 displays the statistics of these variables.

_{mxb}. | R
. _{c} | M
. _{w} | W
. | D
. _{mnc} | v
. | I
. | f
. | Q
. | S
. | D
. _{mxb} |
---|---|---|---|---|---|---|---|---|---|---|

Mean | 1,474.15 | 4,312.24 | 588.56 | 3.57 | 1.81 | 1.89 | 0.14 | 3,222.27 | 1.35 | 7.90 |

Std | 3,188.75 | 8,094.28 | 1,580.27 | 1.70 | 0.54 | 2.76 | 0.21 | 6,320.54 | 0.39 | 4.47 |

Minimum | 3.48 | 24.36 | 4.40 | 0.42 | 0.50 | 0.05 | 0.00 | 7.10 | 1.01 | 0.81 |

Maximum | 21,250.00 | 47,600.00 | 8,490.00 | 6.94 | 4.55 | 21.47 | 2.69 | 29,500.00 | 5.32 | 21.25 |


*R _{c}*, centerline radius of bend; *M _{w}*, meander wavelength; *W*, water surface width at the upstream end of bend; *D _{mnc}*, mean channel depth at upstream crossing; *v*, cross-section average velocity at the upstream crossing point for bankfull flow conditions; *I*, slope; *f*, friction factor; *Q*, quantity of flow; *S*, sinuosity; *D _{mxb}*, maximum water depth in bend.

#### Data outlier processing

The *K*-means algorithm (Ikotun *et al.* 2022) randomly selects *k* initial points in the space of clustering variables as cluster centroids. It then calculates the distance between each sample point and the centroids to allocate the points accordingly, resulting in a deterministic and nonhierarchical clustering solution. In this study, the *K*-means algorithm was employed to reduce data noise and detect outliers by dividing the data into different subclasses. The algorithm judges the division of clusters by minimizing the sum of squared error (SSE), which is given in Equation (3):

$$\mathrm{SSE} = \sum_{i=1}^{k}\sum_{p \in C_i}\lVert p - m_i \rVert^2 \quad (3)$$

where *C _{i}* represents the *i*th class, *p* denotes the sample points in class *C _{i}*, and *m _{i}* denotes the center of mass of class *C _{i}*.

A key step in applying the *K*-means algorithm is to identify the optimal *k*-value, i.e., the number of clusters.

To determine the optimal number of clusters, the elbow method is employed by plotting the relationship between the *k*-value and the SSE, as shown in Figure 2. The inflection point on the curve occurs at *k* = 2, indicating that increasing *k* beyond this point will have little impact on decreasing SSE. Thus, the optimal number of clusters is determined to be 2. After performing cluster analysis, it is observed that 218 out of the original 230 datasets belong to one class, while the remaining 12 datasets form another class. It is possible that these 12 datasets, all from the lower Ganges, either possess unique characteristics that differ from the rest of the data and thus belong to a separate class or are outliers due to measurement errors or other factors. Therefore, only the 218 datasets belonging to the main class are analyzed.
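The elbow procedure just described can be sketched in Python. This is a minimal illustration on synthetic two-group data, with scikit-learn's `KMeans` standing in for the paper's implementation and the 218/12 split mimicked by construction; the data themselves are an assumption, not the river measurements.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the 230 records: a dense main group of 218 points
# plus a small offset group of 12 playing the role of potential outliers.
main = rng.normal(0.0, 1.0, size=(218, 2))
odd = rng.normal(8.0, 1.0, size=(12, 2))
X = np.vstack([main, odd])

def sse_curve(X, k_max=6):
    """SSE (inertia) for k = 1..k_max, used to locate the elbow."""
    return [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, k_max + 1)]

sse = sse_curve(X)
# The elbow sits at k = 2: the SSE drop from k=1 to k=2 dominates the curve.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_sizes = sorted(np.bincount(km.labels_).tolist())
```

On such well-separated data, the SSE decrease beyond *k* = 2 is marginal, which is exactly the elbow signature used to fix the number of clusters.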

To mitigate the issues of redundant information and poor generalization caused by an excessive number of features, feature selection is imperative (Li *et al.* 2017). This study employs two techniques, namely, Pearson's correlation analysis and feature importance assessment, to identify the features that have the most significant impact on maximum scour depth. These methods are based on distinct principles, thereby preventing the limitations of a single linear correlation evaluation. By combining the results of both calculations, the most crucial features are selected for further analysis.

#### Pearson's correlation analysis of features

The Pearson correlation coefficient is calculated as follows (*et al.* 2004):

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^{2}}\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^{2}}}$$

where *r* denotes the correlation coefficient, *X _{i}* and *Y _{i}* denote the variables, and $\bar{X}$ and $\bar{Y}$ denote the means of the two variables, respectively.

The strongest correlation among the features is between *D _{mnc}* and *Q*, with a correlation coefficient of 0.92. Similarly, the highest correlation between *D _{mxb}* and the features is between *D _{mxb}* and *Q*, with a correlation coefficient of 0.93. This suggests that the size of the flow has the greatest influence on the linearity of the scour depth.

Further analysis revealed that among the relationships between *D _{mxb}* and all the features, *R _{c}*, *M _{w}*, *W*, *D _{mnc}*, and *Q* were positively correlated with *D _{mxb}*, while *v*, *I*, *f*, and *S* were negatively correlated. Features with an absolute correlation coefficient between 0 and 0.4 were considered weakly or not correlated, indicating that their influence on *D _{mxb}* is minimal. Therefore, based on the results of the Pearson correlation analysis, it may be appropriate to simplify the model by discarding the *v*, *f*, and *S* features.
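As a concrete illustration, the coefficient defined above can be computed directly. The helper below is a sketch: the function name, toy series, and screening helper are ours, not from the paper; only the |*r*| < 0.4 threshold follows the text.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, following the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# A noiseless increasing relation gives r = 1; a decreasing one gives r = -1.
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

def is_weak(r, threshold=0.4):
    """Screening rule from the paper: |r| < 0.4 counts as weak/no correlation."""
    return abs(r) < threshold
```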

#### Assessment of the importance of features

Extra-Trees (Geurts *et al.* 2006) is a widely used algorithm that assesses the significance of features (Alswaina & Elleithy 2018) and falls under the bagging method in machine learning. It is based on the conventional random forest approach, where split nodes are constructed by randomly selecting features. However, Extra-Trees constructs each split node of a tree by initially gathering a random number of features and then selecting the best node features using an impurity index. The impurity index (*G*) for a node is calculated as follows:

$$G(u_i, v_{ij}) = \frac{1}{N_S}\left[\sum_{x \in X_{\text{left}}}\left(y_x - \bar{y}_{\text{left}}\right)^{2} + \sum_{x \in X_{\text{right}}}\left(y_x - \bar{y}_{\text{right}}\right)^{2}\right]$$

where *u _{i}* represents the attribute of judgement for a given node and *v _{ij}* denotes the corresponding value associated with said attribute. *N _{S}* is the total number of training samples pertaining to the current node, with *X*_{left} and *X*_{right} denoting the sets of training samples belonging to the left and right nodes, respectively. Finally, $\bar{y}_{\text{left}}$ and $\bar{y}_{\text{right}}$ represent the average values of the sample target variables of the left and right children of the current node.

Based on the Extra-Trees model, the importance of each feature to the *D _{mxb}* value is evaluated, as depicted in Figure 4. Despite the divergent analytical approaches employed, it is noteworthy that the *v*, *f*, and *S* features continue to occupy the bottom rungs of the importance hierarchy.
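A minimal sketch of this importance assessment uses scikit-learn's `ExtraTreesRegressor` on synthetic data; the five stand-in features and their coefficients are assumptions chosen so that the ranking is known in advance.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)
# Five illustrative features; only the first two drive the target,
# with feature 0 three times as influential as feature 1.
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=300)

model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

importances = model.feature_importances_       # normalized to sum to 1
ranking = list(np.argsort(importances)[::-1])  # most important feature first
```

The resulting ranking recovers the constructed ordering, mirroring how the paper ranks the nine measured features against *D _{mxb}*.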

The selection of features was carried out through a combination of two methods, namely, Pearson's correlation analysis and Extra-Trees for feature importance assessment. The top six features that exhibited the greatest influence on the dependent variable *D _{mxb}* were selected in order of their significance: *Q*, *W*, *D _{mnc}*, *R _{c}*, *M _{w}*, and *I*. The standardized values of the six selected features were recorded as input values for Plan 1, while the features considered in existing studies, including *R _{c}*/*B*, *W*/*D _{mnc}*, and a third dimensionless variable, were recorded as input values for Plan 2; the output values for both Plan 1 and Plan 2 are calculated as *D _{mxb}*/*D _{mnc}*.

_{mnc}### Regression modeling theory

In this study, three regression algorithms, namely, SVR, RF Regression, and XGBoost, are utilized as training models. The computational principles of these models are presented below.

#### SVR theory

SVR is a regression technique built on the support vector machine (Ding *et al.* 2014; Roushangar & Koosheh 2015). The primary objective of SVR is to identify an optimal hyperplane that minimizes the distance to the sample point farthest from the hyperplane, as depicted in Figure 5.

*SVR works as follows*: Suppose there are samples *E* = {(*x _{i}*, *y _{i}*) | *i* = 1, 2, …, *l*}, *x _{i}* ∈ *R ^{n}*, *y _{i}* ∈ *R*, where *R ^{n}* is an *n*-dimensional Euclidean space and *R* is the set of real numbers. The data *x* in the sample set are mapped to a high-dimensional space *F* by a nonlinear mapping *ϕ*, and linear regression is performed in the space *F* using Equation (6):

$$f(x) = \omega^{\mathrm{T}}\phi(x) + b \quad (6)$$

where *ω* is the vector of regression coefficients, *ω* ∈ *F*, and *b* is the bias term. On either side of the *f*(*x*) function, a spacing band is created with a spacing of *ε*, and only the support vectors, i.e., the samples outside the spacing band, affect the model. To determine the parameters *ω* and *b*, it is necessary not only to maximize the spacing *ε* but also to minimize the loss. The cost function of the SVR is

$$\min_{\omega,\, b}\ \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^{*}\right) \quad (7)$$

where *C* is the penalty factor and $\xi_i$, $\xi_i^{*}$ are slack variables measuring the deviations beyond the *ε*-band.

Solving Equation (7) using the Lagrange multiplier method leads to the dual form.
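An illustrative SVR fit with the RBF kernel and the Plan 1 hyperparameters later reported in Table 2 (`C = 50`, `gamma = 0.05`, `epsilon = 0.35`) can be sketched as follows; the one-dimensional synthetic data are an assumption for the sketch, not the river dataset.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
# One-dimensional illustrative data: a linear trend plus small noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.5 * X[:, 0] + 0.1 * rng.normal(size=200)

# Epsilon-insensitive RBF SVR with the Plan 1 settings; inputs are
# standardized first, as the paper does for all features.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=50, gamma=0.05, epsilon=0.35),
)
model.fit(X, y)
r2_in_sample = model.score(X, y)

# Samples that fall inside the epsilon-band are not support vectors,
# so the number of support vectors is well below the sample size.
n_sv = len(model.named_steps["svr"].support_)
```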

#### RF Regression theory

RF Regression (Breiman 2001), a parallel integrated algorithm, employs a decision tree as its base learner. Implementing the idea of bagging and random subspace method (RSM) for the randomization of sample selection and feature selection, it enhances the generalization ability of the model compared to a separate decision tree.

The RF Regression model generates *K* sampled training sets with the same capacity as the original training set by drawing *K* independent random samples with replacement from the original training set, and uses these sampled training sets to train *K* decision trees, each with *m* (*m* < *M*) features randomly selected from the *M* features at its nodes as the set of splitting features at the current node. The RF Regression model selects the feature and splitting point that minimize the prediction error to split, repeats this step until the decision tree cannot be split further, and the final prediction is averaged over all decision trees. The random forest computation process is shown in Figure 6.

Within each tree, *f*(*t _{X(xi)}*, *j*) denotes the proportion of samples with the value *x _{i}* belonging to leaf *j* at node *t* (*et al.* 2016). The predicted value of an observation is calculated by averaging over all the trees:

$$\hat{y}(x) = \frac{1}{K}\sum_{k=1}^{K} h_k(x)$$

where *h _{k}*(*x*) is the prediction of the *k*th tree.
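The bagging-and-averaging behavior can be verified directly with scikit-learn's `RandomForestRegressor`, here configured with the Plan 1 settings from Table 2; the six-feature synthetic data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
# Six illustrative features; only the first two drive the response.
X = rng.normal(size=(300, 6))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

# Plan 1 settings from Table 2: 20 bootstrap-trained trees, depth <= 11.
rf = RandomForestRegressor(n_estimators=20, max_depth=11, random_state=0)
rf.fit(X, y)

x_new = X[:5]
ensemble_pred = rf.predict(x_new)
# The forest's output is the plain average of its individual trees.
per_tree = np.stack([tree.predict(x_new) for tree in rf.estimators_])
manual_pred = per_tree.mean(axis=0)
```

Averaging the 20 per-tree outputs by hand reproduces `rf.predict` exactly, which is the final averaging step described above.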

#### XGBoost theory

Suppose there are *n* samples, *m* features, and *K* trees in the model; the output for the *i*th sample is

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

where each *f _{k}* corresponds to a regression tree, *q* denotes the structure of each tree, *T* is the number of leaf nodes, and *w* is the weight of each leaf node in the decision tree model.

The objective function to be minimized combines the prediction error and a regularization term:

$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K}\Omega\left(f_k\right)$$

where *n* is the number of samples and $l(y_i, \hat{y}_i)$ represents the prediction error for the *i*th sample. Ω(*f _{k}*) is the regular term representing the complexity of the tree, and its expression is shown below:

$$\Omega\left(f_k\right) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^{2}$$

where *γ* penalizes the number of leaves *T* and *λ* is the regularization factor.

By taking the derivative of the objective with respect to each leaf weight *w _{j}* and setting the derivative to zero, the optimal solution of the objective function is found as

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}$$

where *G _{j}* and *H _{j}* are the sums of the first- and second-order derivatives of the loss over the samples assigned to leaf *j*.
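The optimal leaf weight is easy to check numerically. For squared-error loss the first derivative is $g_i = \hat{y}_i - y_i$ and the second is $h_i = 1$, so the sketch below (function name and toy residuals are ours) computes $w_j^{*} = -G_j/(H_j + \lambda)$ for one leaf.

```python
import numpy as np

def optimal_leaf_weight(y_true, y_pred, lam):
    """w_j* = -G_j / (H_j + lambda) for squared-error loss, where
    g_i = y_pred_i - y_true_i and h_i = 1 for every sample in the leaf."""
    g_sum = float(np.sum(np.asarray(y_pred) - np.asarray(y_true)))  # G_j
    h_sum = float(len(y_true))                                      # H_j
    return -g_sum / (h_sum + lam)

# Three samples falling in one leaf, all currently predicted as 2.0.
y_true = np.array([3.0, 4.0, 5.0])
y_pred = np.array([2.0, 2.0, 2.0])
w_plain = optimal_leaf_weight(y_true, y_pred, lam=0.0)   # mean residual
w_shrunk = optimal_leaf_weight(y_true, y_pred, lam=1.0)  # shrunk toward 0
```

With *λ* = 0 the weight is just the mean residual; *λ* > 0 shrinks it toward zero, which is how the regularization factor tempers each tree's contribution.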

_{j}Grid search is a method of finding the optimal model parameters by traversing a given combination of parameters. The optimal parameters for the three regression models were determined through a comprehensive grid search method, with the selected parameters listed in Table 2.

| Model | Parameter setting (Plan 1) | Parameter setting (Plan 2) |
|---|---|---|
| SVR | C = 50; gamma = 0.05; epsilon = 0.35; kernel = "rbf" | C = 135.5; gamma = 0.05; epsilon = 0.3; kernel = "rbf" |
| RF Regression | max_depth = 11; n_estimators = 20 | max_depth = 5; n_estimators = 10 |
| XGBoost | max_depth = 3; n_estimators = 10; learning_rate = 0.5 | max_depth = 11; n_estimators = 40; learning_rate = 0.5 |

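A grid search of the kind used to produce Table 2 can be sketched with scikit-learn's `GridSearchCV`; the candidate grid below is patterned on the SVR row of Table 2, but the data and the exact candidate values are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(4)
# Illustrative data: a noisy linear response on three features.
X = rng.normal(size=(150, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=150)

# Candidate values patterned on the SVR row of Table 2.
param_grid = {"C": [1, 50, 135.5], "gamma": [0.01, 0.05], "epsilon": [0.3, 0.35]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="r2")
search.fit(X, y)  # exhaustively tries all 3 * 2 * 2 = 12 combinations

best_params = search.best_params_
best_score = search.best_score_
```

Exhaustive traversal of the grid, scored by cross-validated *R*^{2}, is exactly the "comprehensive grid search" the paragraph describes.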

The evaluation of each regression model's test results was conducted using a set of commonly used indicators, defined below, where *y _{i}* represents the observed value, $\hat{y}_i$ denotes the predicted value of the model, and *n* refers to the size of the sample. Smaller values of the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) indicate higher predictive accuracy of the model (Handelman *et al.* 2019; Abed *et al.* 2023).

- (1) $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$
- (2) $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert$
- (3) $\mathrm{MAPE} = \dfrac{1}{n}\sum_{i=1}^{n}\left\lvert\dfrac{y_i - \hat{y}_i}{y_i}\right\rvert$
- (4) $R^{2} = 1 - \dfrac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$
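The four indicators can be implemented in a few lines; this sketch defines them exactly as above, with MAPE as a fraction (matching the magnitudes reported in Tables 4 and 5), and the toy vectors are ours.

```python
import numpy as np

def rmse(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)))  # as a fraction

def r2(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy check: every prediction is off by exactly 0.5.
y = [2.0, 4.0, 6.0, 8.0]
yhat = [2.5, 3.5, 6.5, 7.5]
```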

### Stacking fusion model

The present investigation employs a Stacking approach (Pavlyshenko 2018) to formulate a fusion model that amalgamates the advantages of three distinct models, thereby enhancing the prognostication of maximum scour depth.

The pre-processed data are standardized, and the training and test sets are randomly split in an 8:2 ratio. The training set is then fed into the base learner model for training. To address the issue of small sample size, cross-validation can be utilized to expand the data as the base learner trains.

This study employs a fivefold cross-validation approach (Berrar 2019), whereby the training data are partitioned into five subsets, with four subsets used for training and one for validation. This process is repeated five times to ensure the robustness of the model. During cross-validation, the validation set is not used for training, and the prediction results on this set are used to evaluate the generalization ability of the model. The resulting predictions from each of the five models correspond to five subsets of the original training set, thereby enabling all the data to be predicted simultaneously upon the completion of cross-validation. Each base learner is trained separately and outputs its respective predictions, which are then merged to form a new feature matrix comprising 186 rows and 3 columns as the training set for the meta-learner. During each cross-validation, a prediction is obtained for the test dataset. The predictions from the five cross-validations are then averaged, and the resulting predictions from the test sets of the three models are combined into a new matrix, which serves as the test set features for the meta-learner. Following the training of the base learners, the newly generated feature matrix is fed into the meta-learner for further training.
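The fold scheme just described can be sketched end-to-end. This is an illustration under stated substitutions: synthetic data in place of the river records, scikit-learn's `GradientBoostingRegressor` standing in for XGBoost, and `MLPRegressor` standing in for the FNN meta-learner.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)
# Synthetic stand-in for the cleaned river records.
X = rng.normal(size=(250, 6))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] ** 2 + 0.1 * rng.normal(size=250)

# 8:2 train/test split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

bases = [
    make_pipeline(StandardScaler(),
                  SVR(kernel="rbf", C=50, gamma=0.05, epsilon=0.35)),
    RandomForestRegressor(n_estimators=20, max_depth=11, random_state=0),
    GradientBoostingRegressor(n_estimators=40, max_depth=3, random_state=0),
]

# Fivefold out-of-fold predictions form the meta-learner's training features.
Z_tr = np.column_stack([cross_val_predict(m, X_tr, y_tr, cv=5) for m in bases])
# Each base learner is then refit on the full training set to score the test set.
Z_te = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in bases])

# Small feed-forward network with sigmoid hidden units as the meta-learner.
meta = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8, 6, 4), activation="logistic",
                 max_iter=5000, random_state=0),
)
meta.fit(Z_tr, y_tr)
stack_r2 = meta.score(Z_te, y_te)
```

Here each base learner is simply refit on the full training set before scoring the test set, a common variant of the per-fold averaging the paper describes; the out-of-fold construction of `Z_tr` is the step that lets every training sample contribute a meta-feature without leakage.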

A feed-forward neural network (FNN) was chosen as the meta-learner (*et al.* 1986; Kwok & Yeung 1997), renowned for its formidable nonlinear modeling capabilities and its ability to accurately fit nonlinear data. In addition, the FNN model demonstrated excellent generalization aptitude. Using Plan 1 as an example, the FNN model was composed of one input layer, three hidden layers, and one output layer. The hidden layers comprise eight, six, and four neurons, respectively, and the topology of the FNN model is depicted in Figure 8; two functions are performed at every node.

Each node first forms a weighted sum of its inputs and then applies an activation function (*et al.* 1997):

$$a_k = \sum_{i=1}^{h} w_{ki}x_i + b_k, \qquad z_k = f\left(a_k\right)$$

where *h* is the number of neurons in the hidden layer; *x _{i}* is the input value of this layer, which is the feature matrix obtained by training the base learners; *w _{ki}* and *w _{j}* are the connection weights between the neurons; *b _{k}* and *b* are the thresholds; and *f*(*x*) is the activation function, for which the sigmoid function is chosen in this paper. *a _{k}* and *z _{k}* are the input and output values of the hidden layer, respectively. The expression of the sigmoid function is

$$f(x) = \frac{1}{1 + e^{-x}}$$

The network is trained by minimizing the mean squared error

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$$

where *y _{i}* denotes the measured value, $\hat{y}_i$ represents the predicted value generated by the model, and *n* signifies the sample size.

The two models, designated as Plan 1 and Plan 2, were trained independently, with their respective parameters outlined in Table 3.

| Model | Parameter setting (Plan 1) | Parameter setting (Plan 2) |
|---|---|---|
| FNN | Activation = 'sigmoid'; number of hidden layers: 3; neurons per layer: 8, 6, 4, 1 | Activation = 'sigmoid'; number of hidden layers: 3; neurons per layer: 8, 6, 4, 1 |


## RESULTS AND DISCUSSION

### Results

#### Comparison of each model results

The anticipated maximum scour depths were computed individually for each base learner and the Stacking fusion model and subsequently juxtaposed with the measured maximum scour depths. The output value of the model is *D _{mxb}*/*D _{mnc}*, which enables the prediction of *D _{mxb}* and thereby the assessment of each model's performance in predicting *D _{mxb}*. The outcome of each evaluation metric is presented in Table 4.

| Model | RMSE (Plan 1) | RMSE (Plan 2) | MAE (Plan 1) | MAE (Plan 2) | MAPE (Plan 1) | MAPE (Plan 2) | *R*^{2} (Plan 1) | *R*^{2} (Plan 2) |
|---|---|---|---|---|---|---|---|---|
| SVR | 1.1528 | 1.1650 | 0.9058 | 0.8615 | 0.1123 | 0.1111 | 0.8845 | 0.8820 |
| RF Regression | 0.9896 | 1.2106 | 0.7768 | 0.9739 | 0.0927 | 0.1232 | 0.9149 | 0.8726 |
| XGBoost | 1.3954 | 1.4777 | 1.0183 | 1.1226 | 0.1197 | 0.1468 | 0.8307 | 0.8102 |
| Stacking model | 0.8385 | 0.9591 | 0.6647 | 0.7537 | 0.0819 | 0.0999 | 0.9389 | 0.9200 |


The predicted maximum scour depths for each base learner and the Stacking fusion model were calculated separately and compared with the measured maximum scour depths. The evaluation metrics used to assess the performance of each model were RMSE, MAE, MAPE, and *R*^{2}: the predictive performance of a model is deemed better as its coefficient of determination approaches 1 and as its RMSE, MAE, and MAPE values decrease.

Table 4 presents the results of the evaluation metrics for all models in Plan 1, with coefficients of determination exceeding 0.8 for all models. The RF Regression model achieved the highest accuracy with an *R*^{2} value of 0.9149, followed by the SVR model with an *R*^{2} value of 0.8845, while the XGBoost model had the lowest *R*^{2} value of 0.8307. In Plan 2, the XGBoost model also had the lowest *R*^{2} value among the SVR, RF Regression, and XGBoost models.

Although overall the four evaluation metrics yield consistent results for assessing model performance, there are a few exceptions: from the RMSE and *R*^{2} metrics of the SVR model in Plan 1 and Plan 2, it can be observed that its predictive performance of Plan 1 is superior to that of Plan 2. However, its MAE and MAPE values of Plan 1 are larger than those of Plan 2, contradicting the previous conclusion. This discrepancy can be attributed to the fact that MAE and MAPE reflect the model's mean error on the sample, whereas *R*^{2} and RMSE focus more on the fit of the model to the sample variance.

The Stacking model, an amalgamation of the SVR, RF Regression, XGBoost models, along with the FNN model, was meticulously examined in comparison to the SVR, RF Regression, and XGBoost models individually. Evidently, the Stacking model exhibits superior prediction accuracy when compared to the three underlying base learner models. This is substantiated by its *R*^{2}, which attains a remarkable value of 0.9389. Notably, this surpasses the *R*^{2} value of the most proficient base learner, RF Regression, by 0.0240 and exceeds that of the least proficient base learner, XGBoost, by a substantial margin of 0.1082. Furthermore, the values of the three error metrics for the Stacking model are also comparatively smaller than those obtained from the base learners, thereby underscoring the evident optimization achieved through the employment of the Stacking model.

#### Comparison of the Stacking model and other methods

The Stacking model was further compared with the Thorne formula, the Maynord formula, and the ANN model in terms of *R*^{2}, RMSE, MAE, and MAPE. It is worth noting that the Thorne formula necessitates adherence to the prerequisite that *R _{c}*/*W* exceeds 2 for the sample data. Consequently, meticulous analysis and comparison were conducted solely on the subset of 35 datasets from the test dataset that satisfied this criterion. The ensuing findings are exhibited in Figure 10, and the outcome of each evaluation metric is presented in Table 5.

| | *R*^{2} | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Thorne formula | 0.8851 | 1.1900 | 0.8806 | 0.1281 |
| Maynord formula | 0.9253 | 0.9594 | 0.7674 | 0.1013 |
| ANN model | 0.9227 | 0.9760 | 0.8160 | 0.1122 |
| Stacking model | 0.9456 | 0.8189 | 0.6475 | 0.0826 |


In the scatter plot depicting predicted values against measured values, the proximity of each data point to the *y* = *x* line indicates the similarity between the predicted and measured values. Among the three other methods, the Thorne formula exhibits the greatest deviation from the line of perfect fit, represented by *y* = *x*. Conversely, the scatter points corresponding to the ANN model and the Maynord formula exhibit closer proximity to the *y* = *x* line when compared to those generated by the Thorne formula on the whole.

The Stacking model outperforms all other methodologies across the 35 datasets, as its predicted values demonstrate the highest degree of concurrence with the measured values overall.

The presented bar chart exhibits the assessment indicators for all the different forecasting methods. Through rigorous numerical analysis, it is evident that the Stacking model surpasses all others in terms of forecasting accuracy. With *R*^{2} exceeding 0.94, this model demonstrates exceptional predictive capabilities. Following closely behind is the Maynord formula, boasting an *R*^{2} greater than 0.92 and approaching 0.93, while the ANN model, although slightly inferior, demonstrates comparable performance. Conversely, the Thorne formula fails to achieve an *R*^{2} surpassing 0.9.

When considering the three error indicators, namely, RMSE, MAE, and MAPE, it becomes apparent that the Stacking model yields the lowest value for each of these metrics among all the models and formulas. The Maynord formula, the ANN model, and the Thorne formula subsequently follow suit, albeit with marginally higher error indicators. Notably, these three error indicators align harmoniously with the aforementioned *R*^{2}, further substantiating the models' predictive efficacy.

### Discussion

The comparison between Plan 1 and Plan 2 reveals that the predictive performance of each model in Plan 1 surpasses that of Plan 2 generally, with the most significant improvement observed in the RF Regression model, where the coefficient of determination has increased by 0.0423. This could be attributed to several factors. First, the existing formulas and methods have undergone a process between the features *R _{c}*/

*B*,

*W*/

*D*, and , resulting in reduced independence of each feature. Secondly, these features may not be the most appropriate indicators of maximum scour depth, and there could be other factors that exert a greater influence on maximum scour depth, which are not taken into account. To address this issue, this study conducted a comprehensive analysis based on Pearson's correlation analysis and the Extra-Trees model and selected a few key features as independent variables, which were standardized and entered into the regression models. Notably, each regression model predicted maximum scour depth more accurately than the features selected by the existing methods.

Among the quartet of regression models considered in this study, namely, the Stacking model, SVR model, RF Regression model, and XGBoost model, the Stacking model exhibits the highest *R*^{2} value in Plan 1, at an impressive 0.9389. This represents a significant improvement of 0.0240 over the best-performing base learner, the RF Regression model, and a remarkable enhancement of 0.1082 over the weakest-performing XGBoost model. Thus, it is evident that the Stacking model demonstrates superior predictive capabilities.

Both Plan 1 and Plan 2 corroborate the efficacy of employing the FNN model as the meta-learner for the Stacking model in this investigation. The Stacking model amalgamates the strengths of multiple models, yielding enhanced prediction outcomes.
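The Stacking workflow can be sketched with deliberately simple stand-ins: base learners produce leave-one-out (out-of-fold) predictions on the training set, and a meta-learner is then fitted on those predictions. Here the base learners are a 1-D least-squares line and a nearest-neighbour average, and the meta-learner is a least-squares blend fitted by gradient descent; the study's actual base learners are SVR, RF Regression, and XGBoost with an FNN meta-learner, so every learner and name below is a hypothetical stand-in:

```python
def fit_line(xs, ys):
    """Base learner 1: 1-D least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def predict_line(model, x):
    a, b = model
    return a * x + b

def predict_knn(xs, ys, x, k=2):
    """Base learner 2: mean target of the k nearest training points."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return sum(ys[i] for i in order[:k]) / k

def stack_predict(xs, ys, x_new):
    """Stacking: out-of-fold base predictions -> fitted meta blend."""
    # Leave-one-out predictions from each base learner.
    p1, p2 = [], []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        p1.append(predict_line(fit_line(tx, ty), xs[i]))
        p2.append(predict_knn(tx, ty, xs[i]))
    # Meta-learner: blend weights fitted by gradient descent on the
    # out-of-fold predictions (the paper uses an FNN here instead).
    w1 = w2 = 0.5
    for _ in range(2000):
        g1 = g2 = 0.0
        for a, b, y in zip(p1, p2, ys):
            err = w1 * a + w2 * b - y
            g1 += err * a
            g2 += err * b
        w1 -= 0.001 * g1 / len(ys)
        w2 -= 0.001 * g2 / len(ys)
    # Base learners are refitted on the full training set for new inputs.
    base1 = predict_line(fit_line(xs, ys), x_new)
    base2 = predict_knn(xs, ys, x_new)
    return w1 * base1 + w2 * base2
```

Fitting the meta-learner on out-of-fold rather than in-sample base predictions is the key design choice: it prevents the meta-learner from simply rewarding whichever base learner overfits the training set most.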

Furthermore, the SVR model of Plan 1 exhibits predictive performance comparable to that of its Plan 2 counterpart. The other three models, however, clearly surpass their Plan 2 counterparts: the RF Regression, XGBoost, and Stacking models achieved increases in the *R*^{2} value of 0.0423, 0.0205, and 0.0189, respectively. The comparison between Plan 1 and Plan 2 demonstrates the significance of feature selection in predicting maximum scour depth. It further highlights that the pivotal features chosen through Pearson's correlation analysis and the Extra-Trees model enable each regression model to predict the maximum scour depth more accurately than the features selected by existing prediction methods.

In comparison to the existing formulas, the Stacking model demonstrates superior predictive performance among the 35 samples. Its *R*^{2} outperforms the best-performing Maynord formula by 0.0203, surpasses the moderately performing ANN model by 0.0229, and exceeds the comparatively weakest Thorne formula by 0.0605. This conclusion emerges from a careful consideration of both intuitive observations and rigorous evaluation criteria. Notably, this exceptional performance can be attributed to the astute selection of pivotal features for analysis as well as the proficient utilization of the Stacking method, which seamlessly amalgamates the strengths inherent in multiple regression models.

## CONCLUSION

In the present investigation, a rigorous outlier detection approach utilizing *K*-means cluster analysis was employed to identify and remove outliers from a total of 230 datasets, resulting in 218 remaining datasets for further analysis. Subsequently, through meticulous consideration, six distinctive features (namely, *Q*, *W*, *D _{mnc}*, *R _{c}*, *M _{w}*, and *I*) were selected from a pool of nine features (*R _{c}*, *M _{w}*, *W*, *D _{mnc}*, *v*, *I*, *f*, *Q*, and *S*) using Pearson's correlation analysis and Extra-Trees feature importance evaluation. Notably, these chosen features differ from those examined in previous studies, such as *R _{c}*/*B* and *W*/*D _{mnc}*. To predict the maximum scour depth accurately, three regression algorithms, namely, SVR, RF Regression, and XGBoost, were employed for training and prediction purposes. To enhance the predictive accuracy, a Stacking model was constructed, with the aforementioned three models serving as base learners and the FNN model functioning as the meta-learner. The standardized values of the six selected features were recorded as input values for Plan 1, while the features considered in existing studies were recorded as input values for Plan 2, facilitating a comprehensive comparative analysis.

The evaluation of the training models, Plan 1 and Plan 2, involved the assessment of their prediction results using four metrics: RMSE, MAE, MAPE, and *R*^{2}. A total of 44 datasets out of the original 218 were utilized for this purpose. Furthermore, the prediction outcomes of the Stacking models, derived from both Plan 1 and Plan 2, were compared against three other models: SVR, RF Regression, and XGBoost. Among the three fundamental models, the RF Regression model performs the best in Plan 1, while the SVR model performs the best in Plan 2. For Plan 1, the value of *R*^{2} of the RF Regression model is 0.9149, and the RMSE, MAE, and MAPE values are 0.9896, 0.7768, and 0.0927, respectively. For Plan 2, the *R*^{2} value of the SVR model is 0.8820, and the RMSE, MAE, and MAPE values are 1.1650, 0.8615, and 0.1111, respectively. On the other hand, the worst-performing model is the XGBoost model, with *R*^{2} values of only 0.8307 for Plan 1 and 0.8102 for Plan 2. However, both in Plan 1 and Plan 2, the performance of these models falls short compared to the Stacking model. The Stacking model has *R*^{2} values of 0.9389 and 0.9200 for Plan 1 and Plan 2, respectively. The corresponding RMSE, MAE, and MAPE values are 0.8385 and 0.9591, 0.6647 and 0.7537, and 0.0819 and 0.0999, respectively.

Upon comparing Plan 1 and Plan 2, it was observed that the prediction results obtained from each base learner within Plan 1, as well as the Stacking model itself, outperformed those of Plan 2. This superiority can be attributed to the more comprehensive set of features considered in Plan 1, which surpassed the limited scope of features in Plan 2.

The Stacking model was compared with the other methods, and the prediction results for the 35 datasets in the test set showed that the Stacking model outperformed the other methods, with an *R*^{2} of 0.9490, while the other methods' *R*^{2} values were all below 0.93. This superior performance can be attributed, in part, to the removal of outliers through *K*-means clustering analysis prior to model construction. By reducing the interference of outliers during model training, the accuracy of predictions was improved. Another factor contributing to the Stacking model's success is the careful selection of six features: *Q*, *W*, *D _{mnc}*, *R _{c}*, *M _{w}*, and *I*. These features were chosen based on Pearson's correlation analysis and Extra-Trees feature importance assessment, and they were found to have a more comprehensive influence on the maximum scour depth than the features used by other methods; their inclusion greatly enhanced the model's predictive capabilities. Furthermore, the Stacking method employed in this study incorporates the advantages of multiple regression models, which makes it particularly well suited for predicting the maximum scour depth: by leveraging the strengths of different regression models, it combines their merits and yields more accurate predictions. However, the model accuracy in this study has not yet reached a completely satisfactory level. The fundamental reason is that the database contains only a few hundred samples, which is not sufficient. In addition, although there are nine original features in the samples, other important factors, such as the properties of the soil at river bends, were not taken into account. Therefore, a larger database in the future should allow the method employed in this study to provide more accurate predictions.
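The outlier-removal step can be illustrated with a compact 1-D sketch: *K*-means assigns each sample to its nearest centroid, and samples whose distance to their centroid is anomalously large are flagged. The study clusters full feature vectors and its exact rejection rule is not reproduced here; the mean-plus-*z*-standard-deviations cut and all names below are illustrative:

```python
def kmeans_outliers(points, n_iter=50, z=2.0):
    """1-D K-means with k=2 (centroids initialised at min and max), then
    flag points whose distance to their centroid exceeds
    mean + z * std of all within-cluster distances."""
    c = [min(points), max(points)]
    for _ in range(n_iter):
        groups = [[], []]
        for p in points:
            groups[0 if abs(p - c[0]) <= abs(p - c[1]) else 1].append(p)
        # Recompute centroids; keep the old one if a cluster empties out.
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    dists = [min(abs(p - c[0]), abs(p - c[1])) for p in points]
    mean = sum(dists) / len(dists)
    std = (sum((d - mean) ** 2 for d in dists) / len(dists)) ** 0.5
    return [p for p, d in zip(points, dists) if d > mean + z * std]
```

On a sample with two tight clusters and one stray value, the stray value is the only point flagged, which mirrors how the 230 original datasets were reduced to 218 before training.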

In conclusion, the combination of *K*-means clustering, feature selection, and Stacking method for predicting the maximum scour depth of bends has yielded promising results. This novel approach not only provides valuable insights into the design of concave bank protection but also offers a new avenue for future research in this field.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.
