ABSTRACT
Weirs are designed to stabilize river beds, control grade, and raise upstream water levels. The failure of these structures is primarily due to local scour at the structure site. Consequently, an accurate estimate of the likely scour depth at the structure is critical for the safety and economy of weir design. This study proposes machine learning models for scour depth prediction at submerged weirs by introducing advanced gradient boosting algorithms, namely gradient boosting (GB), categorical boosting (CatBoost), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost). A database of 308 cases was collected for model calibration and evaluation. The results demonstrate that the GB algorithm is highly accurate, with coefficients of determination of 0.99610 and 0.96222 for the training and testing datasets, respectively. The GB model outperforms other models developed in the literature, such as support vector regression, decision tree, and ridge models. A sensitivity analysis determined that the morphological jump parameter is the most significant factor, whereas the normal flow depth on the equilibrium bed slope is the least significant factor, in predicting the scour depth (ds) at submerged weirs.
HIGHLIGHTS
Proposes machine learning (ML) models, namely gradient boosting (GB), categorical boosting (CatBoost), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost), for scour depth prediction at submerged weirs.
The GB model demonstrates superior performance for scour depth prediction at submerged weirs.
Rigorous statistical metrics validate the robustness of ML-based predictions.
INTRODUCTION
Weirs, also known as bed sills, are river training structures used to raise the upstream water level, stabilize the bed, and reduce flow velocity. During high-flow events, the weir is completely submerged, and scouring occurs both upstream and downstream of it. The erosive process during weir overflow may significantly deepen local scour downstream and cause severe damage to these structures (Guan 2015). When designing hydraulic structures, it is therefore essential to ascertain the scour depth accurately in order to ensure a proper hydraulic design. Previous research has demonstrated that scouring is governed mostly by the riverbed materials, the bed geometry, the geometry of the pier and the duration of flow, and the fluid characteristics; as a result, individual characteristics have different effects on scour depth (Najafzadeh et al. 2016a). Various empirical formulations for scour depth prediction employing effective parameters such as velocity, sediment size, Froude number, and specific sediment weight have been proposed (Vanoni 2006). While empirical formulas are simple to employ, they may either underestimate or overestimate the scour depth, and, owing to the complexity of the scouring process, they cannot provide a physical understanding of it. Although numerous empirical formulations for modeling the scouring process have been presented, none has achieved sufficient performance because of the interaction of non-linearity, non-stationarity, and stochasticity (Parsaie et al. 2019). Consequently, a reliable and accurate physical model capable of simulating the mechanism of the scouring phenomenon has yet to be developed. Because of these limitations, as well as the need to improve prediction capabilities, various studies have examined novel approaches for improving conventional analysis based on physical properties (Hong et al. 2012; Wei et al. 2022; Wei et al. 2023; Li et al. 2024).
Several laboratory studies have been carried out to assess the scour depth downstream of grade-control structures. Bormann & Julien (1991) investigated the scour depth downstream of grade-control structures using large-scale flume tests. D'Agostino & Ferro (2004) analyzed the relationship between upstream head, weir height, and scour depth.
Artificial intelligence (AI) models have demonstrated their potential in multidisciplinary engineering applications (Li et al. 2021; Luo et al. 2022; Meng et al. 2023; Shi et al. 2023; Zhang et al. 2024). Advances in AI models in the hydraulic engineering discipline have enabled researchers to improve their accuracy when simulating specific phenomena. AI techniques have been used in hydraulic engineering to accurately estimate the scouring process downstream of or adjacent to hydraulic structures such as bridge piers, ski-jump buckets, and abutments (Sharafati et al. 2021). Scour depth prediction is a critical issue for preserving hydraulic structures, and different efforts have been initiated in the literature to approach the problem. For example, Sharafati et al. (2021) presented a comprehensive review of the ML methods applied to equilibrium scour depth prediction, in which artificial neural networks (ANN), support vector regression (SVR), and M5 tree models were the most used methods. Sharafati et al. (2020) used fuzzy neural networks and particle swarm optimization algorithms for the prediction of scour depth downstream of a sluice gate, achieving an R2 of 94% and a root mean squared error (RMSE) of 44%. Furthermore, Parsaie et al. (2019) adopted the SVR algorithm for the prediction of scour depth underneath river pipelines, which achieved good results in comparison with the ANN and adaptive neuro-fuzzy inference system (ANFIS) algorithms. Rashki Ghaleh Nou et al. (2019) proposed a self-adaptive extreme learning machine for scour depth prediction around submerged weirs, which showed more reliable performance than SVR and ANN. Muzzammil et al. (2015) used gene expression programming (GEP) to calculate the scour depth at bridge piers in cohesive soil, showing improved prediction performance compared with Chaudhari's empirical formulations. Najafzadeh et al. (2016a, 2016b) estimated the maximal scour depth around bridge piers using GEP, model tree, and evolutionary polynomial regression (EPR), taking debris flow into account; according to their report, the EPR models demonstrated superior accuracy compared to the other models. Tao et al. (2021) put forth a framework that combines the extreme gradient boosting model with a genetic algorithm optimizer (XGBoost–GA). The genetic algorithm (GA) is employed in a hybrid approach to address the hyperparameter optimization challenge in the XGBoost model, in addition to identifying the influential input predictors of ds. The XGBoost–GA model was developed by incorporating fifteen physical parameters of submerged weirs and had an excellent degree of predictability, as evidenced by its maximum coefficient of determination (R2 = 0.933) and minimum root mean square error (RMSE = 0.014 m). This field is still being researched and investigated. The innovations and main contributions of this paper are as follows:
1. Providing an accurate and efficient gradient boosting (GB) model for predicting scour depth (ds) at submerged weirs.
2. Examining the prediction accuracy of the best GB model against that of existing machine learning (ML) models, such as SVR, linear regression (LR), decision tree (DT), and ridge models in the literature using performance metrics.
3. Checking the correlations among the parameters of ds and conducting a sensitivity analysis to assess the influence of each input parameter on the ds at submerged weirs.
MATERIALS AND METHODS
Scour around the submerged weirs
Scour model sketch for a sequence of weirs in mountain streams with steep slopes.
Data collection and correlation analysis
A total of 308 data points are used for developing the proposed models (see Supplementary file, Table S1). Data were taken from previously published studies compiled in Guan (2015); readers may refer to Guan (2015) for further details. Table 1 presents the mean, standard deviation, kurtosis, skewness, minimum, maximum, and count of the input and output parameters. A lower standard deviation, as for Q, indicates that the values lie mostly close to the mean, whereas a higher standard deviation, as for d90, shows that the values are more spread out (Sun et al. 2015; Edjabou et al. 2017; Sun et al. 2018a; Jiang et al. 2024; Liu et al. 2024; Zi et al. 2024). Skewness, which can be positive, zero, negative, or undefined, quantifies the degree of asymmetry of the probability distribution of a real-valued random variable about its mean (Sun et al. 2018b; Sharma & Ojha 2020; Rong et al. 2022). Additionally, according to Brown and Greene, kurtosis usually lies between −10 and +10 and helps characterize the shape, i.e., the tail weight, of a probability distribution (Brown & Greene 2006; Wang et al. 2017; Zhou et al. 2022; Wang et al. 2024). The kurtosis values for q and a1 are negative, ranging between −0.6 and −0.2 (flatter, platykurtic distributions), whereas the rest are positive (more peaked, leptokurtic distributions) (Benson 1993; Lee & Ahn 2019). The correlations among the different parameters in the dataset are presented in Table 2. The Pearson correlation coefficient (r) was used to correlate all of the investigated input and target variables. Each cell in the table contains a correlation coefficient quantifying the degree of association between two parameters; these coefficients range from −1 to 1, indicating the strength and direction of the correlation. From Table 2, it is clear that the parameter a1 (r = 0.854) has a very strong positive correlation and the parameter hu (r = −0.004) has a very weak negative correlation with ds. The relationships between the input and output parameters, ordered from strongly positive to very weakly negative, are shown in Table 2. As a result, no parameters were removed from the scour depth prediction under the submerged weir.
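As a minimal illustration of this screening step, the sketch below reproduces the descriptive statistics and Pearson correlations with pandas, assuming the 308-case dataset is available as a CSV file; the file name scour_data.csv and column spellings are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical file; columns follow Tables 1 and 2: Q, q, B, b, hu, S0, Seq,
# a1, Hs, L, d16, d50, d90, SI, rho_s, ds
df = pd.read_csv("scour_data.csv")

# Descriptive statistics as in Table 1
print(df.describe())   # mean, std, min, max, count
print(df.skew())       # skewness per column
print(df.kurtosis())   # (excess) kurtosis per column

# Pearson correlation matrix as in Table 2; the ds column gives r with the target
corr = df.corr(method="pearson")
print(corr["ds"].sort_values(ascending=False))
```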
Table 1. Dataset parameter statistics

| Statistic | Q | q | B | b | hu | S0 | Seq | a1 | Hs | L | d16 | d50 | d90 | SI | ρs | ds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.02249 | 0.03892 | 0.61968 | 0.61968 | 0.02265 | 0.05630 | 0.02320 | 0.04165 | 0.08599 | 1.71766 | 2.19605 | 7.81981 | 14.63182 | 1.81146 | 2649.22078 | 0.13495 |
| Standard deviation | 0.01217 | 0.01203 | 0.47146 | 0.47146 | 0.02709 | 0.02836 | 0.01874 | 0.02414 | 0.02329 | 1.07764 | 0.88231 | 2.15049 | 8.53155 | 1.24110 | 3.87634 | 0.06857 |
| Kurtosis | 8.80083 | −0.51114 | 11.01298 | 11.01298 | 9.70729 | 1.34812 | 1.51905 | −0.26140 | 2.81797 | 6.73825 | 14.40440 | 3.55293 | 4.20653 | 6.95659 | 21.06707 | −1.00049 |
| Skewness | 2.56582 | −0.32630 | 3.55377 | 3.55377 | 3.28167 | 0.42015 | 0.73911 | 0.64477 | 0.77819 | 2.15831 | 3.39181 | −2.31181 | 1.89967 | 2.97103 | −4.78856 | 0.42547 |
| Minimum | 0.004 | 0.007 | 0.3 | 0.3 | 0.013 | 0.006 | 0 | 0.004 | 0.025 | 0.5 | 1 | 1.8 | 2.3 | 1.15 | 2630 | 0.024 |
| Maximum | 0.081 | 0.061 | 2.44 | 2.44 | 0.14 | 0.148 | 0.104 | 0.115 | 0.175 | 6.5 | 6.3 | 8.7 | 40 | 5.88 | 2650 | 0.298 |
| Count | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 | 308 |
Table 2. Correlation among different parameters in the dataset

| | Q | q | B | b | hu | S0 | Seq | a1 | Hs | L | d16 | d50 | d90 | SI | ρs | ds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q | 1 | | | | | | | | | | | | | | | |
| q | 0.230 | 1 | | | | | | | | | | | | | | |
| B | 0.844 | −0.292 | 1 | | | | | | | | | | | | | |
| b | 0.844 | −0.292 | 1.000 | 1 | | | | | | | | | | | | |
| hu | 0.741 | −0.328 | 0.906 | 0.906 | 1 | | | | | | | | | | | |
| S0 | −0.315 | 0.020 | −0.329 | −0.329 | −0.482 | 1 | | | | | | | | | | |
| Seq | −0.327 | −0.196 | −0.224 | −0.224 | −0.308 | 0.464 | 1 | | | | | | | | | |
| a1 | 0.132 | 0.297 | −0.012 | −0.012 | −0.118 | 0.534 | 0.037 | 1 | | | | | | | | |
| Hs | 0.936 | 0.515 | 0.647 | 0.647 | 0.585 | −0.352 | −0.364 | 0.180 | 1 | | | | | | | |
| L | 0.520 | −0.229 | 0.668 | 0.668 | 0.715 | −0.586 | −0.134 | 0.024 | 0.4458 | 1 | | | | | | |
| d16 | 0.816 | 0.048 | 0.758 | 0.758 | 0.657 | −0.376 | −0.232 | 0.027 | 0.7564 | 0.465 | 1 | | | | | |
| d50 | 0.164 | 0.442 | −0.087 | −0.087 | −0.346 | 0.588 | 0.390 | 0.362 | 0.2377 | −0.409 | 0.210 | 1 | | | | |
| d90 | −0.171 | −0.182 | −0.090 | −0.090 | −0.227 | 0.778 | 0.475 | 0.186 | −0.3032 | −0.432 | −0.320 | 0.447 | 1 | | | |
| SI | −0.225 | −0.389 | −0.029 | −0.029 | −0.084 | 0.622 | 0.366 | 0.063 | −0.4094 | −0.295 | −0.413 | 0.098 | 0.933 | 1 | | |
| ρs | 0.128 | 0.235 | 0.008 | 0.008 | 0.067 | 0.342 | 0.224 | 0.219 | 0.2301 | −0.146 | 0.159 | 0.565 | 0.292 | 0.098 | 1 | |
| ds | 0.261 | 0.428 | 0.058 | 0.058 | −0.004 | 0.208 | 0.037 | 0.854 | 0.3684 | 0.251 | 0.133 | 0.249 | −0.124 | −0.235 | 0.150 | 1 |
Methodology
Extreme gradient boosting
Categorical boosting


Adaptive boosting
Gradient boosting
GB is an ensemble method in which multiple weak models are developed and then combined to improve overall performance. GB uses gradient descent to minimize the loss function associated with a given model, and weak learners are incorporated into the model through an iterative procedure. The final prediction is the combined output of all weak learners, determined by a gradient optimization procedure whose objective is to reduce the overall error of the strong learner (Aurélien 2019; Islam & Amin 2020). The gradient component is an essential part of gradient boosters: instead of being applied to the parameters of the weaker models, gradient descent is applied to the output of the model. The GB approach is thus a generalized form of the gradient descent technique, obtained by modifying both the gradient and the loss function (Ngo et al. 2023).
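To make this procedure concrete, the following is a minimal from-scratch sketch of gradient boosting for a squared-error loss: each shallow regression tree is fitted to the negative gradient of the loss (the current residuals), and the scaled sum of the trees forms the strong learner. The hyperparameter values mirror Table 3; this is an illustration of the idea, not the exact implementation used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gb_fit(X, y, n_trees=110, learning_rate=0.1, max_depth=9):
    """Gradient boosting for squared-error loss (hyperparameters mirror Table 3)."""
    f0 = float(np.mean(y))                    # initial constant prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                   # negative gradient of 0.5*(y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, min_samples_split=5)
        tree.fit(X, residual)                 # weak learner fitted to the gradient
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gb_predict(X, f0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred
```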
Performance evaluation
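The models are evaluated with R2, RMSE, MAE, MAPE, MAD, and KGE (see Table 4). Below is a minimal sketch of these metrics, assuming their standard definitions and treating MAD as the mean absolute deviation, which is consistent with the identical MAE and MAD values reported in Table 4.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Statistical metrics used in this study, under standard definitions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    r = np.corrcoef(y_true, y_pred)[0, 1]          # linear correlation
    alpha = np.std(y_pred) / np.std(y_true)        # variability ratio (for KGE)
    beta = np.mean(y_pred) / np.mean(y_true)       # bias ratio (for KGE)
    return {
        "R2": 1 - np.sum(err**2) / np.sum((y_true - np.mean(y_true))**2),
        "RMSE": np.sqrt(np.mean(err**2)),
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100 * np.mean(np.abs(err / y_true)),
        "MAD": np.mean(np.abs(err)),               # mean absolute deviation
        "KGE": 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2),
    }
```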
RESULTS AND DISCUSSION
The proposed models that predict the scour depth were developed using the Orange software. The predictor variables were provided via an input set (x) defined by x = [Q, q, hu, d50, d16, d90, a1, S0, Seq, L, SI, b, B, ρs, Hs], while the target variable (y) is the scour depth (ds) at a submerged weir. Every modeling stage requires selecting suitable sizes for the training and testing datasets; consequently, 80% (246 cases) of the total data were employed to generate the models, while the remaining 20% (62 cases) were used to test the developed models. The proposed models were tuned through trial and error to obtain the hyperparameter values giving the most accurate prediction of ds at the submerged weir. The tuning parameters for the models were selected and then changed during the trials until the best values, presented in Table 3, were obtained; a code sketch of this setup follows Table 3.
Table 3. Hyperparameter optimization results

| Model | Optimized hyperparameter values |
|---|---|
| XGBoost | Number of trees = 20; Learning rate = 0.5; Regularization (λ) = 0.011; Limit depth of individual trees = 10 |
| GB | Number of trees = 110; Learning rate = 0.1; Limit depth of individual trees = 9; Do not split subsets smaller than 5 |
| AdaBoost | Number of estimators = 95; Learning rate = 1; Classification algorithm = SAMME.R; Regression loss function = Linear |
| CatBoost | Number of trees = 30; Learning rate = 0.5; Regularization (λ) = 3; Limit depth of individual trees = 10 |
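The study itself was carried out in Orange; as a hedged equivalent, the sketch below reproduces the 80/20 split and the four tuned models with scikit-learn, xgboost, and catboost. The mapping of the Orange settings in Table 3 onto these libraries' arguments is our assumption, not the authors' exact configuration, and df is the data frame from the correlation sketch above.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

X, y = df.drop(columns="ds"), df["ds"]
# 80% (246 cases) for training, 20% (62 cases) for testing
X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, train_size=0.8, random_state=42)

models = {
    "XGBoost": XGBRegressor(n_estimators=20, learning_rate=0.5,
                            reg_lambda=0.011, max_depth=10),
    "GB": GradientBoostingRegressor(n_estimators=110, learning_rate=0.1,
                                    max_depth=9, min_samples_split=5),
    "AdaBoost": AdaBoostRegressor(n_estimators=95, learning_rate=1.0, loss="linear"),
    "CatBoost": CatBoostRegressor(iterations=30, learning_rate=0.5,
                                  l2_leaf_reg=3, depth=10, verbose=False),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, evaluate(y_ts, model.predict(X_ts)))  # evaluate() from the metrics sketch
```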
Comparing the effectiveness of constructed models
Table 4. Results of rank analysis based on performance indices

| Parameter | XGBoost TR | XGBoost TS | GB TR | GB TS | AdaBoost TR | AdaBoost TS | CatBoost TR | CatBoost TS |
|---|---|---|---|---|---|---|---|---|
| R2 | 0.99605 | 0.95984 | 0.99610 | 0.96222 | 0.98956 | 0.96539 | 0.98985 | 0.94567 |
| Score | 3 | 2 | 4 | 3 | 1 | 4 | 2 | 1 |
| RMSE | 0.00422 | 0.01442 | 0.00419 | 0.01399 | 0.00686 | 0.01339 | 0.00676 | 0.01677 |
| Score | 3 | 2 | 4 | 3 | 1 | 4 | 2 | 1 |
| MAE | 0.00175 | 0.01123 | 0.00142 | 0.01084 | 0.00405 | 0.01035 | 0.00499 | 0.01201 |
| Score | 3 | 2 | 4 | 3 | 2 | 4 | 1 | 1 |
| MAPE | 1.74427 | 12.73715 | 1.42102 | 12.81763 | 4.62190 | 10.82858 | 4.78914 | 16.03426 |
| Score | 3 | 3 | 4 | 2 | 2 | 4 | 1 | 1 |
| MAD | 0.00175 | 0.01123 | 0.00142 | 0.01084 | 0.00405 | 0.01035 | 0.00499 | 0.01201 |
| Score | 3 | 2 | 4 | 3 | 2 | 4 | 1 | 1 |
| KGE | 0.996 | 0.976 | 0.997 | 0.970 | 0.966 | 0.964 | 0.987 | 0.913 |
| Score | 3 | 4 | 4 | 3 | 1 | 2 | 2 | 1 |
| Subtotal | 18 | 15 | 24 | 17 | 9 | 22 | 9 | 6 |
| Total score | 33 | | 37 | | 31 | | 15 | |
| Rank | 2 | | 1 | | 3 | | 4 | |
Note. TR, training; TS, testing.
The predicted versus actual scour depth (ds) (a) XGBoost, (b) GB, (c) AdaBoost, and (d) CatBoost models based on the training dataset.
The predicted versus actual scour depth (ds) (a) XGBoost, (b) GB, (c) AdaBoost, and (d) CatBoost models based on the testing dataset.
Comparison of the proposed models' results in the training dataset: (a) XGBoost, (b) GB, (c) AdaBoost, and (d) CatBoost in predicting scour depth values.
Comparison of the proposed models' results in the testing dataset: (a) XGBoost, (b) GB, (c) AdaBoost, and (d) CatBoost in predicting scour depth values.
Comparison of the developed models with available ML models in the literature
Comparative performance of the developed models with the available models in the literature.
Taylor diagram
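A Taylor diagram summarizes each model by three linked statistics: the standard deviation of its predictions, their correlation with the observations, and the centered RMS difference. A minimal sketch of these quantities follows (the polar plot itself is omitted):

```python
import numpy as np

def taylor_stats(y_true, y_pred):
    """Quantities displayed on a Taylor diagram."""
    sigma_obs, sigma_mod = np.std(y_true), np.std(y_pred)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    # law-of-cosines relation underlying the diagram's geometry
    crmsd = np.sqrt(sigma_obs**2 + sigma_mod**2 - 2 * sigma_obs * sigma_mod * r)
    return {"sigma_model": sigma_mod, "r": r, "cRMSD": crmsd, "sigma_obs": sigma_obs}
```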
Rank analysis
A frequently employed approach to assessing model performance is rank analysis, in which models are scored on their performance index values. In the present methodology, a score of n (in this study, n = 4, the number of computational models) is assigned to the model with the best value of a performance parameter, while a score of 1 is assigned to the model with the worst value. This scoring is applied independently to the training and testing outcomes. For error metrics such as RMSE, MAE, and MAPE, the model achieving the lowest error in a given phase receives the highest score (n = 4); for R2 and KGE, the model with the highest value does. The final score of each model is computed by summing the scores obtained in both the training (TR) and testing (TS) phases, and the models are ranked in descending order of total score, with the model achieving the greatest score regarded as the most efficient overall. The details of the score analysis are presented in Table 4, where it is evident that the GB model achieved the highest total score (37), followed by XGBoost (33), AdaBoost (31), and CatBoost (15).
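As a compact sketch of this scoring scheme, the snippet below ranks the models for one phase; reapplying it to the training table and summing the two results reproduces the totals in Table 4.

```python
import pandas as pd

def rank_scores(metric_table, higher_is_better):
    """Rows = metrics, columns = models; best model per metric scores n, worst 1."""
    total = pd.Series(0.0, index=metric_table.columns)
    for metric, values in metric_table.iterrows():
        # ascending ranks give 1 to the smallest value, n to the largest
        total += values.rank(ascending=higher_is_better[metric], method="min")
    return total

# Testing-phase R2 and RMSE taken from Table 4, as a usage example
ts = pd.DataFrame({"XGBoost": [0.95984, 0.01442], "GB": [0.96222, 0.01399],
                   "AdaBoost": [0.96539, 0.01339], "CatBoost": [0.94567, 0.01677]},
                  index=["R2", "RMSE"])
print(rank_scores(ts, {"R2": True, "RMSE": False}))
# -> XGBoost 4.0, GB 6.0, AdaBoost 8.0, CatBoost 2.0 (matching the TS scores above)
```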
Sensitivity analysis






CONCLUSIONS
In this research study, ML algorithms, namely XGBoost, GB, AdaBoost, and CatBoost, were used to predict the ds under the submerged weir. The performance of the developed models was evaluated using statistical metrics, namely R2, RMSE, MAE, MAPE, MAD, and KGE, and compared to recently developed models in the literature such as XGBoost–GA, LR, and SVR. The main findings based on the results are as follows:
(1) The Pearson correlation coefficient showed that the morphological jump (a1) (r = 0.854) has a very strong positive correlation and the normal flow depth on the equilibrium bed slope (hu) (r = −0.004) has a very weak negative correlation with the scour depth (ds) for the dataset studied.
(2) The newly proposed models, i.e., XGBoost, GB, AdaBoost, and CatBoost, outperform the XGBoost–GA, XGBoost-Grid, SVR, LR, DT, and ridge models recently developed in the literature, showing less deviation between actual and predicted values in terms of errors in both the training and testing sets.
(3) Results showed that the R2 values of all four ML algorithms (i.e., XGBoost, GB, AdaBoost, and CatBoost) were larger than 0.90 for both the training and testing datasets, indicating that the proposed approach is able to predict the ds under a submerged weir with satisfactory accuracy. Furthermore, the GB model performs best, with R2 = 0.99610, RMSE = 0.00419, MAE = 0.00142, MAPE = 1.42102, KGE = 0.997, and MAD = 0.00142 for the training set. Based on the scatter plots of actual and predicted values, the GB model exhibited a better fit to the actual data, indicating its potential for broader application in scour depth prediction.
(4) The sensitivity analysis indicated that a1 has the highest sensitivity index (0.961), whereas hu has the lowest sensitivity index (0.573), with respect to their contributions to the scour depth prediction.
The predictions of the presented models are more accurate and reliable when the models interpolate, rather than extrapolate, the input values; therefore, the models should not be applied to input parameter values outside the ranges covered by this study. It should also be noted that the accuracy and reliability of ML algorithms are affected by the dataset, such as the number and kind of samples. Therefore, additional samples should be collected and more effective models suggested in the future.
ACKNOWLEDGEMENTS
The authors are thankful to the Deanship of Graduate Studies and Scientific Research at Najran University for funding this work under the Easy Funding Program, grant code (NU/EFP/SERC/13/132).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.