Contraction scour depth is affected by numerous uncertain factors; although many traditional empirical formulas have been proposed in past research, their prediction accuracy is generally low. In recent years, with advancements in machine learning (ML) technology, these techniques have been able to accurately capture the nonlinear characteristics of scour-depth data. However, in pursuit of higher prediction accuracy, researchers have explored a wide range of ML models that require various combinations of input parameters. These input parameter combinations often lack reliability, and the models themselves have poor interpretability, exacerbating the ‘black-box effect.’ Therefore, this study constructs a scour depth prediction model using principal component analysis (PCA)-enhanced support vector regression (SVR), combined with the interpretability method of SHapley Additive exPlanations (SHAP). The results show that the SVR model's predictions are highly consistent with physical experimental laws and that the model primarily identifies features that are strongly linearly correlated with the dependent variables (scour depth and SHAP values). The application of PCA enhances this correlation, and with the CC-PCA-4 input parameter combination, the SVR model achieves high accuracy (R2 = 0.971, mean absolute percentage error = 7.54%). Moreover, its comprehensive evaluation in terms of stability, accuracy, and conservativeness surpasses that of other ML models and empirical formulas.

  • Traditional correlation algorithms rank the principal component analysis (PCA)-processed data consistently by the absolute values of their correlation coefficients.

  • SHAP analysis indicates that the support vector regression model's predictions are consistent with the physical laws of contraction scour.

  • PCA enhances the linear relationship between independent and dependent variables, thereby improving the overall performance of the model.

In river flows, both artificial and natural structures, such as bridge abutments and landslides, reduce the flow cross-sectional area. The reduced width significantly accelerates the flow, thereby increasing the bed shear stress in the narrowed riverbed and creating a high risk of scour or erosion. Accurate calculation of contraction scour depth is therefore crucial for risk assessment and the safe operation of these structures.

Straub (1934) conducted the earliest documented study on contraction scour, deriving a correlation for predicting scour depth under dynamic conditions from the continuity equations for water flow and sediment. Subsequent expansions and modifications of this correlation were made by many researchers (Ashida 1963; Laursen 1963; Komura 1966; Gill 1981; Webby 1984; Lim 1993; Lim & Cheng 1998; Raikar 2004; Briaud et al. 2005; Oliveto & Marino 2019; Nowroozpour & Ettema 2021). Researchers have proposed various criteria for classifying contraction channels. Komura (1966) defined a channel as a long contraction when the ratio of the contraction section length to the uncontracted channel width (L/b1) exceeds 1, whereas Raikar (2004) and Webby (1984) specified a threshold ratio of 2, above which the channel is regarded as a long contraction. Further research indicates that an increase in the length of the contraction section reduces and stabilizes the shear stress on the riverbed. Additionally, the L/b1 ratio affects scour morphology, thereby influencing the depth of scour (Oliveto & Marino 2019).

In hydraulic engineering, scouring is generally classified into two types: clear water scour and live-bed scour. Extensive flume experiments under controlled laboratory conditions have established a robust set of empirical formulas, which, as refined by Dey & Raikar (2005), offer accurate predictions of the maximum equilibrium scour depth. Despite these advancements, previous studies have often oversimplified the complexity of the problem in their empirical models, limiting their applicability in intricate scenarios (Le et al. 2024). Recently, Lagasse et al. (2021) conducted extensive research and evaluation on the applicability of contraction and pier scour formulas and provided a relatively rare field case study (i.e., the U.S. Highway 287 crossing of Spring Creek in Fort Collins, Colorado), marking a critical step toward the practical application of these formulas.

Building on recent artificial intelligence (AI) developments, several researchers are now using machine learning (ML) or artificial neural network (ANN/Backpropagation Neural Network (BP)) approaches in this field. Najafzadeh et al. (2016) used an adaptive neuro-fuzzy inference system (ANFIS) and support vector machine (SVM) to predict scour depths, with the ANFIS observed to outperform both SVM and empirical formulas in accuracy. Sensitivity analysis identified the contraction ratio (b2/b1, detailed in Figure 1) as the key factor affecting scour depth in long contraction zones. Drawing on the success of the ANN and genetic programming in predicting scour depth at spillways and around cylindrical bridge piers (Azamathulla et al. 2008; Guven et al. 2009), Raikar et al. (2016) used these methods on non-cohesive sediments under clear water conditions, achieving high accuracy with single-layer ANN and genetic algorithm (GA)-3 models. Then, principal component analysis (PCA) was used to interpret the choice of non-dimensional parameters in the GA models. Harasti et al. (2023) employed a similar approach to analyze factors affecting scour depth in large sandy riverbeds, effectively reducing data dimensionality to pinpoint key variables. Annad et al. (2021) indicated that empirical formulas fail to generalize across all sediments due to reliance on a single calculation method. They categorized soil into seven classes by particle size and used PCA to identify optimal variables for each class.
Figure 1

(a) Schematic diagram of a long rectangular channel under equilibrium scour conditions (Ghazvinei et al. 2012; Lagasse et al. 2021). (b) Side view.

As research progressed, Najafzadeh et al. (2018) expanded the range of predictive methods by using gene expression programming (GEP), M5 tree (MT), and evolutionary polynomial regression (EPR) to predict maximum equilibrium scour depth. Moreover, enhancing prediction accuracy while reducing the number of input variables has become central to these studies. Najafzadeh et al. (2016) conducted a variable sensitivity analysis with the ANFIS by sequentially removing variables and assessing the impact on model performance, assuming that a significant decrease in prediction accuracy indicates high importance of the removed variable. More recently, absolute values of correlation coefficients have frequently been adopted to assess variable importance, combined with ML and ensemble techniques to generate numerous hybrid models and select the optimized model that best matches different input combinations. For instance, Sharafati et al. (2021) combined five metaheuristic algorithms with the ANFIS model to develop a reliable and robust prediction model that broadly reflects the relationship between the input parameters and the target scour depth. The Dagging-Iterative Classifier Optimizer hybrid model of Khosravi et al. (2021) was developed by integrating the Dagging (DA) and Random Subspace (RS) algorithms with five distinct ML models. Recent advances have highlighted the significant role of ML in sediment scour prediction, particularly in estimating scour depth around bridge piers. Eini et al. (2023) provided a comprehensive summary of various ML models and proposed an XGBoost hybrid model enhanced by optimization algorithms, which markedly improved the accuracy of scour depth predictions around circular bridge piers; SHapley Additive exPlanation (SHAP) analysis was employed to interpret the variable importance in the model. In addition, Eini et al. (2024) advanced this approach by combining Bayesian optimization with SVM and XGBoost, achieving enhanced prediction accuracy for scour depth around different pier shapes and demonstrating the substantial potential of ML in scour prediction.

However, the inputs of the aforementioned models are based on traditional correlation measures such as Pearson, Spearman, and Kendall. Constrained by the distributional assumptions of these algorithms, the rankings of variable importance differ across methods, and the numerous resulting models have limited interpretability. In recent years, to mitigate the ‘black-box effect’ of prediction models, researchers have adopted interpretable feature selection algorithms such as SHAP or Sobel Operator (SOBEL) analysis for ordering and combining input variables (Guo et al. 2024), given that SHAP results do not rely on any particular model. Once variable importance is identified, developing models that can capture these importances becomes a new challenge. Studies on hybrid dimensionality reduction algorithms have confirmed that feature reduction significantly improves the generalization capability and computational efficiency of ML models (Laghrissi et al. 2021; Chang et al. 2022; Pandey et al. 2023). Jia et al. (2022) elaborated on common feature selection and extraction methods and their applications, including PCA, kernel PCA (KPCA), linear discriminant analysis (LDA), non-negative matrix factorization (NMF), and t-distributed stochastic neighbor embedding (tSNE). They emphasized that, during the identification and learning process, adopting appropriate, reliable, and practical feature reduction techniques is crucial for identifying data characteristics and improving accuracy (Cao et al. 2003; Anowar et al. 2021). PCA, as a linear dimensionality reduction technique, transforms high-dimensional data into a lower-dimensional space, effectively mitigating the ‘curse of dimensionality’ by reducing the number of features. Given that the experimental data are sample-based, they are expected to follow the central limit theorem and approximate a Gaussian distribution, aligning well with PCA's assumptions regarding data distribution. This makes PCA especially suitable for handling the high-dimensional, sparse data in this study, as it provides effective denoising, redundancy removal, and computational efficiency. While alternative methods such as KPCA and tSNE can also perform dimensionality reduction, their data distribution assumptions and limited feature interpretability make them less suitable for this study (Jia et al. 2022). Additionally, PCA offers a scalable approach for handling additional data features that may emerge in future observational datasets. To further capture the nonlinear relationships in the data after dimensionality reduction, support vector regression (SVR) with a Gaussian kernel was selected as the ML model. SVR leverages kernel functions to map data into a higher-dimensional feature space, enabling the accurate capture of nonlinear patterns with greater precision and robustness. With fewer hyperparameters, it is easier to optimize and, owing to its inherent regularization, avoids overfitting on limited samples, supporting improved generalizability when the dataset is expanded in the future (Eini et al. 2024).

Therefore, we developed an interpretable scour depth prediction model based on PCA dimensionality reduction and SVR. The traditional variable importance rankings were unified using the PCA algorithm, and the SVR model was optimized through Bayesian grid search and k-fold cross-validation to learn the features of the data. Meanwhile, SHAP was used to reveal the patterns the model learned from the dataset. Additionally, to comprehensively evaluate the predictive performance of the optimized model, it was compared with models from the literature in terms of metrics and uncertainty analysis, and its accuracy and conservatism were compared with empirical formulas.

As shown in Figure 1, the local scour resulting from channel contraction is referred to as contraction scour. Previous experimental studies have categorized the parameters that influence scour depth into three main groups: fluid parameters, sediment characteristics, and geometric features. Consequently, the functional relationship of scour depth is expressed, as shown in Equation (1):
(1) $d_s = f(U_1, U_c, \nu, g, \sigma_g, d_{50}, h_1, b_1, b_2, L, \rho, \rho_s)$

where $U_1$ is the uniform flow velocity in the uncontracted section; $U_c$ is the critical velocity of the sediment; $\nu$ is the kinematic viscosity of water; $g$ is the gravitational acceleration with the value of 9.81 m/s²; $\sigma_g$ denotes the standard deviation of the sediment particle size distribution; $d_{50}$ is the median particle size of the sediment; $h_1$ is the depth of uniform inflow; $b_1$ is the width of the uncontracted channel; $b_2$ is the width of the contracted channel; $L$ is the length of the contraction channel section; and $\rho$ and $\rho_s$ are the water and sediment densities, respectively.
With the newly incorporated parameter L0, which has not been considered previously, the non-dimensional (ND) form of Equation (1), derived from the Buckingham π theorem, can be expressed as:
(2) $d_{s0} = f(L_0, d_0, Fr_0, h_0, b_0, \sigma_g, U_1/U_c, S)$

where $L_0 = L/b_1$; $b_0 = b_2/b_1$; and $Fr_0$ is the densimetric Froude number. $d_0$, $h_0$, and $d_{s0}$ are the non-dimensional forms of the median particle size $d_{50}$, the approach flow depth $h_1$, and the scour depth $d_s$, respectively; and $S = \rho_s/\rho$ denotes the relative density (specific gravity) of the sediment.

Table 1 summarizes the distribution of available datasets from contraction channel clear water scour experiments (Laursen 1963; Komura 1966; Gill 1981; Webby 1984; Lim 1993; Dey & Raikar 2005) in literature. The summary includes additional details not found in earlier literature, such as the contraction channel length (L) for each of the 182 experimental samples (Lagasse et al. 2021).

Table 1

The ranges of input variables in the experiments

Parameter | Training | Testing
L0 | 1.6667–10 | 1.6667–10
d0 | 8.75 × 10−4–0.02375 | 8.75 × 10−4–0.02375
Fr0 | 1.1434–3.3497 | 1.6081–3.3497
h0 | 0.0360–0.2288 | 0.0509–0.2277
b0 | 0.25–0.7 | 0.25–0.7
σg | 1.065–3.6 | 1.08–3.6
S | 2.59–2.65 | 2.59–2.928
U1/UC | 0.392–1 | 0.517–0.974
ds0 | 0.0132–0.2483 | 0.0267–0.2567

Note: Lim's data are based on the portions listed by Dey & Raikar (2005). All data have been reviewed and confirmed to have been measured under dynamic equilibrium conditions and to be free of scale effects (Laursen 1963; Komura 1966; Gill 1981; Webby 1984; Lim 1993; Dey & Raikar 2005).

Modeling method steps

Standard practice in ML involves splitting the dataset into training and testing sets in an 80–20% ratio (Tola et al. 2023). This study employs Bayesian grid search to tune the model's hyperparameters (regularization parameter C and kernel parameter γ, with values ranging from 0.001 to 10) (Deng et al. 2023). Considering the limited dataset size, we used a five-fold cross-validation method: one fold serves as the validation set and the remaining folds as the training set, and the model's fitting metrics are evaluated. This process is repeated until each fold has been used as the validation set once, after which the optimal hyperparameters are applied to the testing set (Deng et al. 2023). Mean squared error (MSE) is selected as the objective function (the search loop takes only 600 steps), and Figure 2 illustrates the model development process. Additionally, SHAP analysis is adopted to assess the rationality of the current scour prediction model.
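A minimal sketch of this tuning loop is given below, assuming scikit-learn and scikit-optimize as the toolchain (the paper does not name its software); `X` and `y` are placeholders for the dimensionless inputs and the scour depth ds0.

```python
# Hedged sketch of the 80-20 split, Bayesian search over C and gamma in
# [0.001, 10], five-fold CV with an MSE objective, and 600 search steps.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from skopt import BayesSearchCV
from skopt.space import Real

X, y = np.random.rand(182, 8), np.random.rand(182)  # stand-ins for the 182 samples

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

search = BayesSearchCV(
    SVR(kernel='rbf', epsilon=1e-4),
    {'C': Real(1e-3, 10, prior='log-uniform'),
     'gamma': Real(1e-3, 10, prior='log-uniform')},
    n_iter=600,                          # 'the loop takes only 600 steps'
    cv=5,                                # five-fold cross-validation
    scoring='neg_mean_squared_error',    # MSE objective
    random_state=0)
search.fit(X_tr, y_tr)
print(search.best_params_)               # optimal C and gamma, applied to the testing set
```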
Figure 2

Flowchart of the methodology.

Dimensionality reduction algorithms

Principal component analysis

PCA is a statistical method that uses an orthogonal transformation to convert correlated variables into linearly uncorrelated variables called principal components (Jolliffe & Cadima 2016). As a widely used data processing technique, PCA has applications across various fields of hydraulic engineering (Kim et al. 2021; Laghrissi et al. 2021). This method is also employed to reduce data dimensionality while preserving significant variability (variance) within the data. Assuming X is an n × d matrix representing n samples and d features, PCA seeks an orthogonal transformation matrix P to achieve the following transformation:
(3) $Y = XP$

where $Y$ is the transformed data matrix, and the column vectors of $P$ are the eigenvectors of the covariance matrix of $X$.
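As a concrete illustration of Equation (3), the following numpy sketch (an assumed implementation, mirroring the Table 2 settings of normalization plus a covariance matrix) computes the principal components by eigendecomposition.

```python
# Eigendecomposition-based PCA: normalize X, form the covariance matrix, and
# project onto the leading eigenvectors (the columns of P) to obtain Y = XP.
import numpy as np

def pca_transform(X, n_components):
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)   # normalization (Table 2)
    cov = np.cov(Xn, rowvar=False)              # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]           # rank PCs by explained variance
    P = eigvecs[:, order[:n_components]]        # transformation matrix P
    return Xn @ P                               # transformed data matrix Y
```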

Other dimensionality reduction algorithms

In addition to PCA, several other dimensionality reduction techniques are utilized to handle complex data structures. These include KPCA, tSNE, LDA, and NMF. (1) KPCA extends PCA to nonlinear data structures by using a Kernel trick to project data into a higher-dimensional space, enabling the capture of complex patterns (Schölkopf et al. 1997); (2) tSNE excels in visualizing high-dimensional data by preserving local relationships through probabilistic distributions, making it suitable for exploratory data analysis (Gisbrecht et al. 2015); (3) LDA differentiates itself by focusing on maximizing class separability, which is highly beneficial in supervised learning scenarios where class labels are known (Park & Park 2008); (4) NMF is particularly used in contexts like image processing and text mining, where it decomposes non-negative data into simpler, meaningful components, facilitating topic identification and feature extraction (Lee & Seung 1999).

The parameter settings for the five dimensionality reduction algorithms mentioned above are detailed in Table 2.

Table 2

Configuration of the five dimensionality reduction algorithms

Algorithm | Preprocessing/configuration
PCA | Normalization; covariance matrix
KPCA | Gaussian kernel = 1
tSNE | Perplexity = 30; algorithm = exact; exaggeration = 4; learning rate = 200
LDA | Discrimination type = linear
NMF | Algorithm = alternating least squares; convergence threshold = 10−4
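For reference, approximate scikit-learn counterparts of these configurations are sketched below. The option names in Table 2 suggest MATLAB toolboxes, so these are assumed equivalents rather than the exact setup: scikit-learn's NMF uses coordinate-descent or multiplicative-update solvers rather than alternating least squares, and its LDA requires discrete class labels.

```python
# Assumed scikit-learn analogues of the Table 2 settings (not the study's code).
from sklearn.decomposition import PCA, KernelPCA, NMF
from sklearn.manifold import TSNE
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

reducers = {
    'PCA':  PCA(),                               # covariance-based, on normalized data
    'KPCA': KernelPCA(kernel='rbf', gamma=1.0),  # Gaussian kernel = 1
    'tSNE': TSNE(perplexity=30, method='exact',
                 early_exaggeration=4, learning_rate=200),
    'LDA':  LinearDiscriminantAnalysis(),        # linear discrimination; needs labels
    'NMF':  NMF(tol=1e-4),                       # convergence threshold = 1e-4
}
```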

Support vector regression

SVM is a supervised learning model that solves classification problems by identifying the optimal hyperplane. SVR, a variant of SVM, effectively addresses nonlinear regression problems (Goel & Pal 2009; Goel 2015; Najafzadeh et al. 2016). Its goal is to derive a linear support vector regression function $f(x) = w^{T}\phi(x) + b$ that approximates the target values, where $w$ represents the weight vector and $b$ is the bias. After transformation via the Lagrange dual method, the final model takes the form:

(4) $f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b$

where $\alpha_i$ and $\alpha_i^{*}$ denote the Lagrange multipliers; $K(x_i, x)$ denotes the radial basis function (RBF), $K(x_i, x) = \exp(-\gamma \lVert x_i - x \rVert^{2})$; and the epsilon parameter is set to 0.0001.
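Equation (4) can be checked numerically; the sketch below (assuming scikit-learn, with placeholder data) rebuilds the prediction from the stored dual coefficients (αi − αi*), the support vectors, and the bias.

```python
# After fitting, SVR stores (alpha_i - alpha_i*) in dual_coef_; the prediction is
# the RBF kernel expansion over the support vectors plus the intercept b.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

X, y = np.random.rand(60, 8), np.random.rand(60)      # placeholder data
svr = SVR(kernel='rbf', gamma=0.5, C=1.0, epsilon=1e-4).fit(X, y)

K = rbf_kernel(X[:5], svr.support_vectors_, gamma=0.5)
manual = K @ svr.dual_coef_.ravel() + svr.intercept_  # Equation (4), by hand
assert np.allclose(manual, svr.predict(X[:5]))
```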

SHAP analysis

SHAP is an advanced model interpretation method used to address the ‘black-box effect’ in ML models. It provides an explanation for model decisions by calculating the average marginal contribution of each feature to the model prediction, known as the Shapley value. The sign of the Shapley value (positive or negative) indicates the direction in which the feature affects the prediction result, while the magnitude of the absolute value reflects the importance of the feature: the larger the absolute value, the more significant the influence of the feature, and thus the more important the feature is considered. The application of SHAP has expanded to various technical and engineering fields, including the scour domain (Kim et al. 2024), demonstrating its broad applicability and practical value.
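A hedged sketch of how these Shapley values can be obtained for a kernel-based regressor follows, using SHAP's model-agnostic KernelExplainer (the paper does not state which explainer was used).

```python
# Mean absolute SHAP values give the importance ranking used in Figure 4(b).
import numpy as np
import shap
from sklearn.svm import SVR

X, y = np.random.rand(100, 8), np.random.rand(100)   # placeholder data
svr = SVR(kernel='rbf', epsilon=1e-4).fit(X, y)

explainer = shap.KernelExplainer(svr.predict, shap.sample(X, 50))
sv = explainer.shap_values(X)                        # per-sample marginal contributions
print(np.abs(sv).mean(axis=0))                       # feature importance
```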

Model evaluation

To evaluate the model performances, six evaluation metrics are used: coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), mean absolute percentage error (MAPE), and Pearson's correlation coefficient (CC) (Kim et al. 2024; Kumar et al. 2024).
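These metrics can be computed directly from the predictions; a minimal numpy sketch follows (y_o observed, y_p predicted).

```python
# Evaluation metrics used in this study, written out explicitly.
import numpy as np

def evaluate(y_o, y_p):
    e = y_p - y_o
    return {
        'R2':   1 - np.sum(e**2) / np.sum((y_o - y_o.mean())**2),
        'RMSE': np.sqrt(np.mean(e**2)),
        'MAE':  np.mean(np.abs(e)),
        'MBE':  np.mean(e),                        # negative => underestimation
        'MAPE': 100 * np.mean(np.abs(e / y_o)),
        'CC':   np.corrcoef(y_o, y_p)[0, 1],
    }
```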

Correlation ranking and interpretation

Table 3 summarizes the relative sensitivity of the scour depth to different input variables, derived from different criteria in the literature. Raikar et al. (2016) used PCA, selecting the factor with the largest loading in each principal component (PC) as its representative variable to explain the model results (e.g., PC1 = F1ec). However, since each PC is a linear combination of all original dimensionless quantities, it is difficult to distinguish the dominant variables. Najafzadeh et al. (2016) evaluated the sensitivity of the results to the input variables by sequentially excluding each variable from the ANFIS model ensemble; a variable is recognized as highly important if its removal leads to a significant drop in prediction accuracy. Although this method provides a preliminary assessment, the mechanisms by which variables influence the outcomes remain unexplained, and the interpretability is highly dependent on the selected ML model. Though not explicitly stated, it is inferred that Sharafati et al. (2021) used the Pearson correlation method (Table 3 – current investigation before PCA) and observed that the contraction ratio b0 is the most important factor, exhibiting a negative correlation with the results. This observation is consistent with previous studies, i.e., the larger the b0 = b2/b1, the less concentrated the flow and the smaller the scouring capacity. Khosravi et al. (2021) also found that b0 is negatively correlated with the scour depth, but their results showed that σg is positively correlated with the scour depth, contrary to the known ‘shielding effect’ theory, which suggests that an increase in the sediment non-uniformity coefficient should reduce scour depth. Previous studies have consistently found that h0 is positively correlated with scour depth, in accordance with experiments suggesting that larger h0 values are more likely to generate large-scale vortex structures, thereby increasing scour depth. However, the importance of h0 varies across studies, and no uniform significance has been assigned; this variation could be attributed to changes in h0 during actual scour processes, as noted by Lagasse et al. (2021).

Table 3

Ranking of different correlation coefficient algorithms between variables and scour depth

Previous research
Khosravi et al. (2021) | Fr0 | b0 | h0 | σg | U1/UC | d0 | – | –
CC | −0.624 | −0.578 | 0.563 | 0.331 | 0.33 | 0.292 | – | –
Sharafati et al. (2021) | b0 | h0 | σg | U1/UC | d0 | Fr0 | S | –
CC | −0.578 | 0.347 | −0.271 | 0.263 | 0.259 | 0.124 | – | –
Najafzadeh et al. (2016) | h0 | d0 | b0 | σg | Fr0 | U1/UC | – | –
Sensitivity by ANFIS | 0.6 | 0.84 | 0.84 | 0.85 | 0.85 | 0.91 | – | –
Raikar et al. (2016) | F1ec (PC1) | d0 (PC2) | b0 (PC3) | h0 (PC4) | Fr0 (PC5) | – | – | –
Variance by PCA | 38.7% | 28.1% | 22.7% | 8.3% | 2.2% | – | – | –
Current investigation before PCA
Parameter | b0 | h0 | σg | U1/UC | d0 | S | Fr0 | L0
CC of Pearson | −0.578 | 0.343 | −0.329 | 0.279 | 0.26 | −0.229 | 0.125 | 0.117
Parameter | b0 | σg | d0 | L0 | h0 | S | U1/UC | Fr0
CC of Spearman | −0.615 | −0.308 | 0.299 | −0.293 | 0.288 | −0.25 | 0.158 | 0.062
Parameter | b0 | L0 | σg | d0 | h0 | S | U1/UC | Fr0
CC of Kendall | −0.484 | −0.239 | −0.226 | 0.214 | 0.204 | −0.203 | 0.103 | 0.039
SHAP | b0 | h0 | σg | Fr0 | d0 | U1/UC | S | L0
SHAP values | 0.029 | 0.016 | 0.012 | 0.009 | 0.008 | 0.004 | 0.003 | 0.002
Current investigation after PCA
Parameter | PC3 | PC5 | PC4 | PC8 | PC7 | PC2 | PC6 | PC1
CC of Pearson | 0.649 | 0.628 | 0.128 | 0.099 | 0.078 | 0.072 | 0.049 | 0.047
CC of Spearman | 0.705 | 0.631 | 0.183 | 0.174 | 0.098 | 0.079 | 0.063 | 0.029
CC of Kendall | 0.523 | 0.458 | 0.161 | 0.136 | 0.063 | 0.051 | 0.046 | 0.031
Parameter | PC5 | PC3 | PC1 | PC4 | PC6 | PC2 | PC8 | PC7
SHAP values | 0.031 | 0.029 | 0.007 | 0.005 | 0.004 | 0.004 | 0.003 | 0.001
Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8
Variance by PCA | 33.1% | 27.5% | 14.5% | 11.4% | 8.3% | 4.9% | 0.2% | 0.1%

Given the discrepancies in ranking order and the lack of interpretability, which may increase the difficulty of matching inputs with appropriate ML models, it is necessary to carefully review the consistency with which commonly used correlation algorithms identify variable importance before the variables are fed into ML models. The ‘current investigation before PCA’ section of Table 3 lists the relative significances indicated by the correlation coefficients of the input variables derived from the Pearson, Spearman, and Kendall algorithms. Except for b0, which is consistently observed to possess the negative correlation coefficient with the highest magnitude, the relative significances of the remaining variables vary among the algorithms. This phenomenon may stem from the assumptions of the traditional Pearson, Spearman, and Kendall algorithms, which, respectively, require linear relationships, monotonic relationships, and association consistency; these assumptions are difficult to satisfy for datasets with small sample sizes.

Therefore, this study adopted the more interpretable SHAP variable importance ranking (Deng et al. 2023), which, after tuning on the training set, identifies the weights of the input parameters through SVR; the larger the weight (i.e., the SHAP value), the more important the variable. The results (Table 3 – current investigation before PCA) show that the importance of b0, h0, and σg is consistent with the Pearson results, indicating that SVR tends to capture linear relationships (even when using the RBF kernel). To further verify these linear correlations, it is necessary to select algorithms that can enhance the correlation between the input variables and the scour depth and to denoise the independent variable data as much as possible for subsequent analysis. Considering all factors, copula entropy, which requires no data distribution assumptions, was selected to measure how strongly each dimensionality reduction algorithm enhances the correlation. According to Ma & Sun (2011), the larger the copula entropy, the stronger the correlation between independent and dependent variables. The copula entropies of the different dimensionality reduction algorithms are illustrated in Figure 3. Without dimensionality reduction (ND), the total copula entropy of all dimensionless variables is only 0.872, whereas after PCA processing the correlation is enhanced to 6.609, a 7.58-fold increase and the highest among the five dimensionality reduction algorithms. Therefore, PCA is used for the subsequent data processing.
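Because copula entropy equals the negative mutual information between variables (Ma & Sun 2011), the Figure 3 comparison can be approximated with a nearest-neighbor mutual-information estimate; the sketch below uses scikit-learn's estimator as an assumed stand-in for the entropy estimator actually employed.

```python
# Proxy for the copula-entropy screening: total estimated dependence between the
# (raw or PCA-processed) inputs and the scour depth ds0.
from sklearn.feature_selection import mutual_info_regression

def total_dependence(X, y):
    return mutual_info_regression(X, y, random_state=0).sum()

# e.g., compare total_dependence(X, y) with total_dependence(pca_transform(X, 8), y)
```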
Figure 3

Copula entropy of different dimensionality reduction algorithms.

As shown in the ‘Current investigation after PCA’ section of Table 3, after PCA processing, the importance rankings of traditional correlation algorithms have been unified, with the importance of PCs decreasing in the order of PC3, PC5, PC4, PC8, PC7, PC2, PC6, and PC1. Although the SHAP rankings of PCs show slight inconsistencies, three of the top four rankings are consistent with traditional algorithms, which might be due to the use of the RBF kernel in SVR.

To further explore the impact of the various variables on scour depth, this study uses SHAP visualization analysis on the data with and without PCA processing (as shown in Figure 4). As shown in Figure 4(b), the summary plot displays the SHAP values of each feature across all samples. Unlike traditional variable importance measures (such as the Pearson coefficient), which only provide a correlation coefficient, the game theory-based SHAP analysis more accurately quantifies each feature's contribution to the prediction of scour depth (with red to blue indicating actual feature values from high to low). According to the average SHAP values, the feature importance ranks from highest to lowest as follows: b0 (0.029) > h0 (0.016) > σg (0.012) > Fr0 (0.009) > d0 (0.007) > U1/UC (0.004) > S (0.003) > L0 (0.002); that is, b0 is the most important feature and L0 the least important. As the contraction ratio b0 (b2/b1) decreases, the unit-width flow increases, leading to more concentrated energy and a significantly enhanced impact on scour depth, which is consistent with the findings of Nowroozpour & Ettema (2021). When L0 increases from 2 to 4, its inhibitory effect on scour increases, possibly due to the transfer of the control-section water depth from h3 to h2 (h2 < h3) in longer contractions (Lagasse et al. 2021). However, the data in the range of 4–10 are insufficient, and the scattered distribution near 10 makes it difficult to draw conclusions, necessitating further experimental validation. This observation is consistent with the physical interpretation provided in the first half of Section 4.1.
Figure 4

SHAP analysis. (a) Dependence plot for each physical input feature in the SVR model without PCA processing. (b) SHAP summary plot based on the SVR, showing the importance ranking of each physical feature. (c) SHAP summary plot showing the importance ranking of PCs derived from PCA-processed input features. (d) Dependence plot for the PCs in the SVR model after PCA processing of input features. All plots are based on training set data.

For h0, the SHAP value increases nearly linearly from −0.04 to 0.02 as the feature value increases, which aligns with the observed differences in wall shear stress contributions between shallow-water effects at lower depths and the formation of large-scale vortices, such as horseshoe or necklace vortices, at greater depths. The development of these vortex structures is a significant factor in increasing scour depth (Eini et al. 2024), and the current data indicate that this trend shows no signs of diminishing. An increase in particle gradation σg reduces scour depth, which aligns with the actual physical process. The impact of Fr0 on scour depth is similar to that of h0, increasing linearly, following the mechanism described by Raikar et al. (2016) using the excess Froude number (F1ec), which is based on the difference between the contraction flow velocity and the critical sediment entrainment velocity: when this excess exceeds 0, scour depth increases significantly. However, the current data involve only clear water scour, and more data are required for live-bed scour. The slight decrease of d0 near zero is consistent with Dey & Raikar's (2005) experimental findings and is due to variations in the bed shear stress requirements caused by the transitional properties of the Shields curve. For gravel, in order to keep the flow velocity U1 close to UC, U1 was significantly increased (thus increasing U1/UC), greatly enhancing the bed shear stress in the contraction zone and thereby increasing scour potential, leading ds0 to increase at a constant rate. Although the sample size for the density ratio S is limited, its negative correlation with scour depth aligns with experimental laws. Before PCA processing, SHAP visualization analysis reveals significant linear relationships between the top-ranked variables, such as b0 (CC = −0.933), h0 (CC = 0.952), and σg (CC = −0.952), and their SHAP values (Figure 4(a)), with the trends of influence consistent with the physical interpretation (Section 4.1). In contrast, the lower-ranked variables S and L0 did not show clear linear relationships. (Previous research indicates that high S values reduce sediment scour, but most natural sediments cluster around an S value of 2.65, suggesting a need for more diverse data.) Additionally, longer L values may shift the flow control from downstream to upstream, a hypothesis that requires further validation with larger datasets (Lagasse et al. 2021). These observations indicate a high correlation between the top-ranked variables identified by the traditional correlation algorithms and their SHAP values in the ‘current investigation before PCA’ section of Table 3.

To further support this point, when the data are processed by PCA (as shown in Figure 4(d)), SHAP visualization analysis shows that as variables such as PC3 (CC = 0.977) and PC5 (CC = 0.994) increase, the corresponding SHAP values exhibit significant linear growth. These variables consistently rank high in the traditional algorithms, indicating a strong linear contribution of PC3 and PC5 to scour depth. This indirectly confirms that PCA's linearization of the data features makes it easier for SVR to identify them. To verify this directly, Section 4.2 inputs variables based on the correlation rankings to explore their specific impact on prediction accuracy.

Performance of the model

Although it has been established that SVR tends to identify linear relationships between the independent variables and the SHAP values, the specific impact of the number of input variables on prediction accuracy remains unknown. Therefore, this study designed four schemes (combinations of input variables) based on the ‘current investigation after PCA’ section of Table 3: Scheme 1 (CC-PCA) uses the PCA-processed data ranked by the traditional correlation algorithms, namely Pearson, Spearman, and Kendall; Scheme 2 (SHAP-PCA) ranks the PCA-processed data by SHAP importance; Scheme 3 (cumulative-PCA) ranks by the PCA cumulative variance contribution rate; and Scheme 4 (SHAP) ranks the original data, without any dimensionality reduction, by SHAP. Starting from the most significant variable, the remaining variables are fed into the model cumulatively, with training and testing carried out after each variable is added; the resulting R2 values are plotted in Figure 5(a). Under the traditional correlation algorithms, the PCA-processed data achieve the highest prediction accuracy on the test set (R2 = 0.971) when the number of PCs reaches 4 (PC3, PC5, PC4, and PC8). However, as the number of inputs increases further, the accuracy decreases significantly. This indicates that adding subsequent variables with only weak linear correlation to the output (ds0) merely increases the complexity of the data sample set without improving the accuracy of the SVR model. When SHAP reorders the PCA-processed data, the highest R2 = 0.926 is likewise achieved with four inputs (PC5, PC3, PC1, and PC4) (Figure 5(b)). This reflects SVR's strong ability to capture linear correlation, because SHAP's importance identification relies more on the magnitude of the SHAP values. However, excessively high SHAP values for a few sample points may inflate the overall importance of those variables, as shown for PC1 in Figure 4(c) and 4(d), where 11 discrete high-value data points stand out prominently.
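The cumulative-input experiment behind Figure 5 can be sketched as follows; `tune` stands for the Bayesian tuning routine shown earlier, and the index list is a placeholder for one of the four ranking schemes.

```python
# Add features one at a time in a given importance order, retune the SVR, and
# record R2 on the training and testing sets (the curves of Figure 5).
from sklearn.metrics import r2_score

def cumulative_r2(ranking, X_tr, y_tr, X_te, y_te, tune):
    scores = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]                             # top-k ranked inputs
        model = tune(X_tr[:, cols], y_tr)
        scores.append((k,
                       r2_score(y_tr, model.predict(X_tr[:, cols])),
                       r2_score(y_te, model.predict(X_te[:, cols]))))
    return scores

# Scheme 1 (CC-PCA) order from Table 3 (PC3, PC5, PC4, PC8, PC7, PC2, PC6, PC1):
# cumulative_r2([2, 4, 3, 7, 6, 1, 5, 0], X_tr, y_tr, X_te, y_te, tune)
```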
Figure 5

R2 values of different input component numbers. (a)–(d) are Schemes 1–4. t, training set; T, testing set.

The method of cumulatively inputting variables ranked by the conventional PCA variance contribution rate did not yield ideal results (as shown in Figure 5(c)), reaching a maximum of R2 = 0.936 when five PCs were input; adding more PCs had a negligible effect on accuracy. This is mainly because PC3, PC4, and PC5 are already included among the first five PCs.

If PCA is not performed and the dimensionless variables ranked by SHAP analysis are input directly in order of importance, R2 = 0.955 can be achieved with only b0, h0, and σg as inputs. However, as more variables are added, the accuracy slightly decreases, confirming the physical experimental observation that adding more physical variables does not necessarily lead to better results (e.g., the Reynolds number is commonly ignored in dimensionless analyses of fully turbulent flows). Most studies also omit d0 and U1/UC when fitting data with empirical formulas, as outlined in Section 4.3. Nevertheless, including these variables can at least enhance the stability of the SVR predictions. Moreover, the SHAP value distribution for scour depth is consistent with conventional theoretical knowledge of the factors influencing scour depth, indicating that the SVR model's predictions are reasonable (the Laursen formula in Section 4.3 also uses the b0 and h0 variables).

To pursue higher accuracy and stability, a more detailed analysis was conducted on the four models with the highest R2 in Figure 5. Figure 6 shows scatter plots comparing the predicted and observed values of these models. The predictions of the CC-PCA-4 model show good consistency with the observations (Figure 6(a)), while the SHAP-PCA-4 model exhibits more pronounced underestimation (Figure 6(b)). The cumulative-PCA-5 model shows slightly lower accuracy (R2 = 0.936, Figure 6(c)). Although the top three variables ranked by the SHAP method perform well on the test set (R2 = 0.955), the discrepancy with the training set (R2 = 0.904) might lead to instability, as shown in Figure 6(d).
Figure 6

Scatter plots of predicted vs. observed scour depth for the four optimal ML models. (a)–(d) are Schemes 1–4.
To evaluate the reasonableness of the model's prediction distribution and its closeness to the actual distribution, Figure 7 shows the box plots comparing the predicted and experimentally observed results. In Figure 7(b), the SVR model with CC-PCA-4 inputs predicts maximum, mean, and median values closest to the actual values. Considering the principle that it is better to overestimate rather than underestimate scour depth (Khosravi et al. 2021), only the CC-PCA-4 model meets this criterion.
Figure 7

Box plots comparing the SVR model with four different input feature combination schemes to the observed data distribution on the training and testing sets. (a) Training set and (b) testing set.

Further uncertainty analysis (see Table 4) revealed that all models tend to underestimate (negative MBE). Additionally, the standard deviation of the prediction errors is defined as $S_e = \sqrt{\sum_{i=1}^{n}(e_i - \bar{e})^2/(n-1)}$, where $e_i = P_i - O_i$, $P_i$ is the predicted value and $O_i$ is the observed value, and the 95% confidence interval (95% CI) is calculated as $\mathrm{MBE} \pm 1.96\,S_e$ to provide precise estimates (Ebtehaj et al. 2017). The results show that the CC-PCA-4 model has the smallest and narrowest uncertainty interval ([−0.01816, 0.01483]), further confirming its superior predictive performance.
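Under these definitions the Table 4 quantities follow directly from the prediction errors; the sketch below reproduces the CC-PCA-4 row (bandwidth = 2 × 1.96 × Se; 95% CI = MBE ± 1.96 Se).

```python
# Uncertainty statistics of Table 4 from predicted/observed pairs.
import numpy as np

def uncertainty(y_obs, y_pred):
    e = y_pred - y_obs
    mbe = e.mean()                        # mean bias error
    se = e.std(ddof=1)                    # sample std of the errors
    half = 1.96 * se                      # half-width of the 95% interval
    return mbe, se, 2 * half, (mbe - half, mbe + half)

# e.g., for CC-PCA-4: MBE = -0.00167, Se = 0.00842 -> CI = [-0.01816, 0.01483]
```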

Table 4

Uncertainty analysis parameters for various inputs of the SVR model

Model input | MBE | Se | Bandwidth | 95% CI
CC-PCA-4 | −0.00167 | 0.008415814 | 0.032989 | [−0.01816, 0.01483]
SHAP-PCA-4 | −0.00639 | 0.008822983 | 0.034585 | [−0.02368, 0.01091]
Cumulative-5 | −0.00458 | 0.009129544 | 0.035787 | [−0.02247, 0.01331]
SHAP-3 | −0.00112 | 0.008843427 | 0.034666 | [−0.01845, 0.01622]

To provide a more comprehensive evaluation of the model, a sensitivity analysis was added. The sensitivity of the output to the input variables was analyzed using the best SVR model: multiple training datasets were generated by removing each input variable in turn, and the results were evaluated using the R2 metric. The findings in Table 5 indicate that b0 plays a critical role in predicting scour depth compared with the other input variables, as its removal yielded the lowest R2 on the test set. For σg and h0, however, the sensitivity ranking differs slightly from the importance derived through SHAP analysis. This is understandable, as the two variables are close in both sensitivity and importance: while SHAP values measure each feature's marginal contribution to the model output, eliminating a feature during hyperparameter optimization changes the interactions among the remaining features, and these complex interactions may lead to different sensitivity results. For the other variables, the sensitivity ranking is consistent with the SHAP analysis (Figure 4(b)).
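The leave-one-variable-out procedure behind Table 5 can be sketched as follows (again with `tune` as the assumed tuning routine and placeholder arrays from the earlier sketches).

```python
# Sketch of the Table 5 sensitivity study: drop each dimensionless input in turn,
# re-optimize the SVR on the reduced feature set, and record R2 on both sets.
from sklearn.metrics import r2_score

names = ['b0', 'h0', 'sigma_g', 'Fr0', 'd0', 'U1/UC', 'S', 'L0']

for j, name in enumerate(names):
    keep = [i for i in range(len(names)) if i != j]      # all columns except j
    model = tune(X_tr[:, keep], y_tr)
    print(name,
          r2_score(y_tr, model.predict(X_tr[:, keep])),  # R2 (training)
          r2_score(y_te, model.predict(X_te[:, keep])))  # R2 (testing)
```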

Table 5

Sensitivity analysis study utilizing SVR

Input combination | Parameter removed | R2 (training) | R2 (testing)
h0, σg, Fr0, d0, U1/UC, S, L0 | b0 | 0.4656 | 0.2857
b0, σg, Fr0, d0, U1/UC, S, L0 | h0 | 0.9721 | 0.8267
b0, h0, Fr0, d0, U1/UC, S, L0 | σg | 0.8355 | 0.8191
b0, h0, σg, d0, U1/UC, S, L0 | Fr0 | 0.9050 | 0.8480
b0, h0, σg, Fr0, U1/UC, S, L0 | d0 | 0.9429 | 0.8847
b0, h0, σg, Fr0, d0, S, L0 | U1/UC | 0.9696 | 0.9218
b0, h0, σg, Fr0, d0, U1/UC, L0 | S | 0.9803 | 0.9259
b0, h0, σg, Fr0, d0, U1/UC, S | L0 | 0.9843 | 0.9524
Table 6

Statistical parameters of the testing dataset for different models from previous works and the current study

Reference | Model | CC | R2 | RMSE | MAPE (%)
Current | CC-PCA-4-SVR | 0.9863 | 0.971 | 0.0104 | 7.54
Current | CC-PCA-6-BP | 0.9368 | 0.8709 | 0.0181 | 18.61
Current | CC-PCA-7-GA-BP | 0.9475 | 0.8957 | 0.0162 | 11.2
Current | CC-PCA-5-PSO-BP | 0.9402 | 0.8808 | 0.0174 | 12.74
Current | CC-PCA-6-ELM | 0.9524 | 0.905 | 0.0155 | 14.57
Current | CC-PCA-6-RBF | 0.9666 | 0.9286 | 0.0134 | 11.11
Khosravi et al. (2021) | DA-ICO-4 | 0.95 | – | 0.01 | –
Sharafati et al. (2021) | ANFIS-BBO-M2 | 0.9619 | – | 0.0158 | –
Najafzadeh et al. (2018) | EPR | 0.9030 | 0.903 | 0.0263 | –
Najafzadeh et al. (2016) | ANFIS | 0.8900 | – | 0.0281 | 27.54
Raikar et al. (2016) | GA-3 | – | 0.955 | – | 13.2
Laursen (1963) | Equation (5) | 0.82 | 0.6724 | 0.0211 | 27.58
Komura (1966) | Equation (6) | 0.7944 | 0.631 | 0.0889 | 141.09
Gill (1981) | Equation (7) | 0.8226 | 0.6766 | 0.123 | 189.44
Lim (1993) | Equation (8) | 0.8295 | 0.6881 | 0.0937 | 144.94

Note: Bold values indicate the optimal model (CC-PCA-4-SVR).

Comparison with previous studies

Khosravi et al. (2021) evaluated five models – isotonic regression (ISOR), sequential minimal optimization (SMO), iterative classifier optimizer (ICO), locally weighted learning (LWL), and least median of squares regression (LMS) – combined with the DA and RS optimization algorithms, finding that the DA-ICO-4 model performed best. Sharafati et al. (2021) applied five optimization algorithms – ant colony optimization (ACO), biogeographic-based optimization (BBO), GA, invasive weeds optimization (IWO), and teaching–learning-based optimization (TLBO) – to optimize the ANFIS model, with ANFIS-BBO-M2 showing the best performance. Najafzadeh et al. (2016) used a base version of ANFIS, which outperformed a simple hyperparameter-tuned SVM (CC = 0.88); they subsequently applied GEP (CC = 0.89), MT (CC = 0.874), and EPR (CC = 0.903), with EPR yielding relatively better results. Raikar et al. (2016) tested ANN (BP) models with single and double hidden layers, as well as GA. Pursuing a lower MSE, the single-layer network (4-16-1 structure) achieved an R2 of only 0.808, inferior to the GA's R2 of 0.955.

To further verify the superiority of the models in this study, five additional ML models commonly used for pier scour, downstream scour, and other sediment scour problems were added: BP, particle swarm optimization (PSO)-BP, GA-BP, RBF, and extreme learning machine (ELM) (Ebtehaj et al. 2018; Riahi-Madvar et al. 2019; Guguloth et al. 2024). Following the approach of Raikar et al. (2016), a single hidden layer was used, with only the number of units in this layer tuned as a hyperparameter. Detailed hyperparameter settings are provided in the Appendix (Tables A and B), and the training, validation, and testing process is consistent with that shown in Figure 2. Table 6 presents a comparison between the optimal model of this study, the previously best-performing models, and other benchmarks; each baseline model was run across the four schemes shown in Figure 5, and the best-performing variant was selected for evaluation. Table 6 also lists the performance metrics of commonly used empirical formulas and the other ML models. The CC-PCA-4-SVR model performs best in terms of correlation coefficient (CC = 0.9863) and mean absolute percentage error (MAPE = 7.54%). The other five benchmark models do not perform as well as SVR. Interestingly, RBF outperforms the ANN models, indirectly confirming that using a Gaussian kernel under the central limit theorem assumption is appropriate and feasible. Using RMSE as the criterion, the current model's error is 4% lower than that reported by Khosravi et al. (2021), while the RMSEs of the other models exceed it by 52% (Sharafati et al. 2021), 153% (Najafzadeh et al. 2018), 170% (Najafzadeh et al. 2016), 103% (Laursen 1963, Equation (5)), 755% (Komura 1966, Equation (6)), 1,083% (Gill 1981, Equation (7)), and 800% (Lim 1993, Equation (8)). Although Raikar et al. (2016) did not report RMSE, their MAPE (13.2%) is 1.75 times that of the current model.
(5)
(6)
(7)
(8)
where .
While the CC-PCA-4 input combination with the SVR model produces results close to the line of perfect agreement with minimal error and outperforms the other models in the literature, it lacks conservativeness (see the MBE metric in Table 4). This can be improved through scaling adjustments (Shahriar et al. 2022): ensuring that structures do not fail due to scour requires sufficient safety redundancy. A safety factor k is therefore defined and multiplied by the predicted value; MAPE is then calculated against the actual values, and the proportion of cases in which the scaled prediction exceeds the actual value is defined as ‘conservatism’ (range: 0–100%), as shown in Figure 8. As k increases from 1 to 2.5, MAPE increases slowly. Except for the Laursen (1963) formula, the empirical formulas have errors exceeding 100%, which Laursen's formula reaches only at k = 2. In terms of accuracy, therefore, Laursen's (1963) formula is the most suitable among the empirical formulas. In terms of conservativeness, the models of Komura (1966), Gill (1981), and Lim (1993) reach up to 100%, but engineering applications require a balance between safety and economic efficiency. When k = 1.2, the prediction error of the CC-PCA-4-SVR model is only 22.1%, with a conservatism of about 95% (94.6%), a commonly used reliability threshold in engineering, whereas the conservativeness of Laursen's (1963) formula is only 56.76%, lacking the necessary safety redundancy. Therefore, k = 1.2 can be selected for the current model, striking a balance between accuracy and safe design.
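The safety-factor analysis of Figure 8 reduces to scaling the predictions and recomputing two statistics; a minimal sketch follows (definitions as stated above; at k = 1.2 the study reports MAPE = 22.1% and conservatism = 94.6%).

```python
# Scale predictions by a safety factor k, then compute MAPE against observations
# and 'conservatism' (the share of scaled predictions at or above the observed
# scour depth). y_obs and y_pred are placeholder arrays.
import numpy as np

def conservatism_curve(y_obs, y_pred, ks=np.arange(1.0, 2.51, 0.1)):
    rows = []
    for k in ks:
        scaled = k * y_pred
        mape = 100 * np.mean(np.abs(scaled - y_obs) / y_obs)
        conservatism = 100 * np.mean(scaled >= y_obs)    # % of safe estimates
        rows.append((round(k, 2), mape, conservatism))
    return rows
```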
Figure 8

Comparison of prediction error and conservatism between the SVR with CC-PCA-4 inputs and empirical formulas.

This research proposes a PCA-enhanced SVR model to predict contraction scour depth. A total of 182 laboratory samples were used to construct the model. SHAP analysis was used to explain the prediction mechanism of the model, and its effectiveness was evaluated against a priori knowledge. The use of PCA allows the ML model to achieve high accuracy without extensive searches over input-variable combinations. Finally, the model's predictive performance was validated by comparison with existing models and empirical formulas in terms of uncertainty, predictive distribution, accuracy, and conservatism. Overall, this study validates the feasibility of using an interpretable ML approach to predict scour depth and provides an explanation and comprehensive evaluation of its mechanism. The detailed conclusions of this study are as follows:

  • 1. SHAP analysis shows that the SVR model tends to identify variables that are strongly linearly correlated with scour depth and SHAP values. The identified key variables, such as b0, h0, and σg, have influence trends consistent with the physical experimental laws.

  • 2. PCA dimensionality reduction enhanced the correlation between the independent variables (PCs) and the dependent variables (scour depth and SHAP values), making the importance rankings produced by the traditional correlation algorithms consistent.

  • 3. The SVR model outperforms other models and empirical formulas in accuracy. With the top four PCA-processed correlated input parameters (CC-PCA-4 scheme), the R2 value is 0.971, and MAPE is 7.54%. Its results show a narrower uncertainty interval ([–0.01816, 0.01483]). Conservatism analysis shows that when the safety factor k = 1.2, the model's prediction error (MAPE) is 22.1% while maintaining a high conservatism of 94.6%.

Despite the effectiveness of the model demonstrated in this study, there remain certain limitations regarding data scale and complexity. Data acquisition costs are exceedingly high, whether from physical sediment model experiments, numerical simulations, or field measurements. This constraint is one reason why current research predominantly relies on laboratory data. Given these conditions, a feasible approach for rapid dataset expansion is to establish an automated experimental and monitoring platform to address data gaps in density ratios and contraction segment lengths. Additionally, we aim to integrate more statistical metrics into the existing data types through this platform, such as time-varying turbulence energy, sediment concentration, and time-varying scour depth, thereby extending the data range to include live-bed scour and extreme flood conditions. We also plan to collect more field measurement cases and, based on this enriched dataset, develop a physics-based online support vector machine that can enable dynamic prediction, assessment, and early warning, providing a reliable tool for infrastructure safety and health monitoring.

The authors comply with the ethical requirements.

The authors all consent to participate in the paper editing.

The authors all consent to the publication of the paper.

The authors wish to acknowledge the financial support by the National Natural Science Foundation of China (Grant Nos. 52179060, 52209081, and 52479060).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Annad, M., Lefkir, A., Mammar-kouadri, M. & Bettahar, I. (2021) Development of a local scour prediction model clustered by soil class, Water Practice and Technology, 16 (4), 1159–1172. https://doi.org/10.2166/wpt.2021.065.

Anowar, F., Sadaoui, S. & Selim, B. (2021) Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE), Computer Science Review, 40, 100378. https://doi.org/10.1016/j.cosrev.2021.100378.

Ashida, K. (1963) Study on the stable channel through constrictions, Disaster Prevention Research Institute Annuals, 6, 312–327.

Azamathulla, H. M., Deo, M. C. & Deolalikar, P. B. (2008) Alternative neural networks to estimate the scour below spillways, Advances in Engineering Software, 39 (8), 689–698. https://doi.org/10.1016/j.advengsoft.2007.07.004.

Briaud, J.-L., Chen, H.-C., Li, Y., Nurtjahyo, P. & Wang, J. (2005) SRICOS-EFA method for contraction scour in fine-grained soils, Journal of Geotechnical and Geoenvironmental Engineering, 131 (10), 1283–1294. https://doi.org/10.1061/(ASCE)1090-0241(2005)131:10(1283).

Cao, L. J., Chua, K. S., Chong, W. K., Lee, H. P. & Gu, Q. M. (2003) A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine, Neurocomputing, 55 (1), 321–336. https://doi.org/10.1016/S0925-2312(03)00433-8.

Chang, H.-M., Xu, Y., Chen, S.-S. & He, Z. (2022) Enhanced understanding of osmotic membrane bioreactors through machine learning modeling of water flux and salinity, Science of the Total Environment, 838, 156009. https://doi.org/10.1016/j.scitotenv.2022.156009.

Deng, Y., Zhang, D., Zhang, D., Wu, J. & Liu, Y. (2023) A hybrid ensemble machine learning model for discharge coefficient prediction of side orifices with different shapes, Flow Measurement and Instrumentation, 91, 102372. https://doi.org/10.1016/j.flowmeasinst.2023.102372.

Dey, S. & Raikar, R. V. (2005) Scour in long contractions, Journal of Hydraulic Engineering, 131 (12), 1036–1049. https://doi.org/10.1061/(ASCE)0733-9429(2005)131:12(1036).

Ebtehaj, I., Sattar, A. M. A., Bonakdari, H. & Zaji, A. H. (2017) Prediction of scour depth around bridge piers using self-adaptive extreme learning machine, Journal of Hydroinformatics, 19 (2), 207–224. https://doi.org/10.2166/hydro.2016.025.

Ebtehaj, I., Bonakdari, H., Moradi, F., Gharabaghi, B. & Khozani, Z. S. (2018) An integrated framework of extreme learning machines for predicting scour at pile groups in clear water condition, Coastal Engineering, 135, 1–15. https://doi.org/10.1016/j.coastaleng.2017.12.012.

Eini, N., Bateni, S. M., Jun, C., Heggy, E. & Band, S. S. (2023) Estimation and interpretation of equilibrium scour depth around circular bridge piers by using optimized XGBoost and SHAP, Engineering Applications of Computational Fluid Mechanics, 17 (1), 2244558. https://doi.org/10.1080/19942060.2023.2244558.

Eini, N., Janizadeh, S., Bateni, S. M., Jun, C. & Kim, Y. (2024) Estimating equilibrium scour depth around non-circular bridge piers using interpretable hybrid machine learning models, Ocean Engineering, 312, 119246. https://doi.org/10.1016/j.oceaneng.2024.119246.

Ghazvinei, P. T., Mohamed, T. A., Ghazali, A. H. & Huat, B. K. (2012) Scour hazard assessment and bridge abutment instability analysis, Electronic Journal of Geotechnical Engineering, 17, 2213–2224.

Gill, M. A. (1981) Bed erosion in rectangular long contraction, Journal of the Hydraulics Division, 107 (3), 273–284. https://doi.org/10.1061/JYCEAJ.0005626.

Gisbrecht, A., Schulz, A. & Hammer, B. (2015) Parametric nonlinear dimensionality reduction using kernel t-SNE, Neurocomputing, 147, 71–82. https://doi.org/10.1016/j.neucom.2013.11.045.

Goel, A. (2015) Predicting bridge pier scour depth with SVM, International Journal of Civil and Environmental Engineering, 9 (2), 211–216.

Goel, A. & Pal, M. (2009) Application of support vector machines in scour prediction on grade-control structures, Engineering Applications of Artificial Intelligence, 22 (2), 216–223. https://doi.org/10.1016/j.engappai.2008.05.008.

Guguloth, S., Pandey, M. & Pal, M. (2024) Application of hybrid AI models for accurate prediction of scour depths under submerged circular vertical jet, Journal of Hydrologic Engineering, 29 (3), 04024010. https://doi.org/10.1061/JHYEFF.HEENG-6149.

Guo, G., Liu, Y., Cao, Z., Zhang, D. & Zhao, X. (2024) Interpretable GBDT model-based multi-objective optimization analysis for the lateral inlet/outlet design in pumped-storage power stations, Journal of Hydroinformatics, 26 (5), 1189–1205. https://doi.org/10.2166/hydro.2024.304.

Guven, A., Azamathulla, H. M. & Zakaria, N. A. (2009) Linear genetic programming for prediction of circular pile scour, Ocean Engineering, 36 (12–13), 985–991. https://doi.org/10.1016/j.oceaneng.2009.05.010.

Harasti, A., Gilja, G., Adžaga, N. & Žic, M. (2023) Analysis of variables influencing scour on large sand-bed rivers conducted using field data, Applied Sciences, 13 (9), 5365. https://doi.org/10.3390/app13095365.

Jia, W., Sun, M., Lian, J. & Hou, S. (2022) Feature dimensionality reduction: a review, Complex & Intelligent Systems, 8 (3), 2663–2693. https://doi.org/10.1007/s40747-021-00637-x.

Jolliffe, I. T. & Cadima, J. (2016) Principal component analysis: a review and recent developments, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374 (2065), 20150202. https://doi.org/10.1098/rsta.2015.0202.

Khosravi, K., Safari, M. J. S. & Cooper, J. R. (2021) Clear-water scour depth prediction in long channel contractions: application of new hybrid machine learning algorithms, Ocean Engineering, 238, 109721. https://doi.org/10.1016/j.oceaneng.2021.109721.

Kim, G.-B., Hwang, C.-I. & Choi, M.-R. (2021) PCA-based multivariate LSTM model for predicting natural groundwater level variations in a time-series record affected by anthropogenic factors, Environmental Earth Sciences, 80 (18), 657. https://doi.org/10.1007/s12665-021-09957-0.

Kim, T., Shahriar, A. R., Lee, W.-D. & Gabr, M. A. (2024) Interpretable machine learning scheme for predicting bridge pier scour depth, Computers and Geotechnics, 170, 106302. https://doi.org/10.1016/j.compgeo.2024.106302.

Komura, S. (1966) Equilibrium depth of scour in long constrictions, Journal of the Hydraulics Division, 92 (5), 17–37. https://doi.org/10.1061/JYCEAJ.0001504.

Kumar, S., Oliveto, G., Deshpande, V., Agarwal, M. & Rathnayake, U. (2024) Forecasting of time-dependent scour depth based on bagging and boosting machine learning approaches, Journal of Hydroinformatics, 26 (8), 1906–1928. https://doi.org/10.2166/hydro.2024.047.

Lagasse, P. F., Ettema, R., DeRosset, W. M., Nowroozpour, A. & Clopper, P. E. (2021) Revised Clear-Water and Live-Bed Contraction Scour Analysis. National Cooperative Highway Research Program, National Academies of Sciences, Engineering, and Medicine. Washington, DC: Transportation Research Board. https://doi.org/10.17226/26198.

Laghrissi, F., Douzi, S., Douzi, K. & Hssina, B. (2021) Intrusion detection systems using long short-term memory (LSTM), Journal of Big Data, 8 (1), 65. https://doi.org/10.1186/s40537-021-00448-4.

Laursen, E. M. (1963) An analysis of relief bridge scour, Journal of the Hydraulics Division, 89 (3), 93–118. https://doi.org/10.1061/JYCEAJ.0000896.

Le, X.-H., Thu Hien, L. T., Ho, H. V. & Lee, G. (2024) Benchmarking the performance and uncertainty of machine learning models in estimating scour depth at sluice outlets, Journal of Hydroinformatics, 26 (7), 1572–1588. https://doi.org/10.2166/hydro.2024.297.

Lee, D. D. & Seung, H. S. (1999) Learning the parts of objects by non-negative matrix factorization, Nature, 401 (6755), 788–791. https://doi.org/10.1038/44565.

Lim, S. Y. (1993) Clear water scour in long contractions, Proceedings of the Institution of Civil Engineers – Water Maritime and Energy, 101 (6), 93–98.

Lim, S. Y. & Cheng, N.-S. (1998) Scouring in long contractions, Journal of Irrigation and Drainage Engineering, 124 (5), 258–261. https://doi.org/10.1061/(ASCE)0733-9437(1998)124:5(258).

Ma, J. & Sun, Z. (2011) Mutual information is copula entropy, Tsinghua Science & Technology, 16 (1), 51–54. https://doi.org/10.1016/S1007-0214(11)70008-6.

Najafzadeh, M., Etemad-Shahidi, A. & Lim, S. Y. (2016) Scour prediction in long contractions using ANFIS and SVM, Ocean Engineering, 111, 128–135. https://doi.org/10.1016/j.oceaneng.2015.10.053.

Najafzadeh, M., Shiri, J. & Rezaie-Balf, M. (2018) New expression-based models to estimate scour depth at clear water conditions in rectangular channels, Marine Georesources & Geotechnology, 36 (2), 227–235. https://doi.org/10.1080/1064119X.2017.1303009.

Nowroozpour, A. & Ettema, R. (2021) Observations from contraction–scour experiments conducted with a large rectangular channel, Journal of Hydraulic Engineering, 147 (8), 04021025. https://doi.org/10.1061/(ASCE)HY.1943-7900.0001903.

Oliveto, G. & Marino, M. C. (2019) Morphological patterns at river contractions, Water, 11 (8), 1683. https://doi.org/10.3390/w11081683.

Pandey, M., Karbasi, M., Jamei, M., Malik, A. & Pu, J. H. (2023) A comprehensive experimental and computational investigation on estimation of scour depth at bridge abutment: emerging ensemble intelligent systems, Water Resources Management, 37 (9), 3745–3767. https://doi.org/10.1007/s11269-023-03525-w.

Park, C. H. & Park, H. (2008) A comparison of generalized linear discriminant analysis algorithms, Pattern Recognition, 41 (3), 1083–1097. https://doi.org/10.1016/j.patcog.2007.07.022.

Raikar, R. V. (2004) Local and General Scour of Gravel Beds. PhD thesis, Department of Civil Engineering, Indian Institute of Technology, Kharagpur, India.

Raikar, R. V., Wang, C.-Y., Shih, H.-P. & Hong, J.-H. (2016) Prediction of contraction scour using ANN and GA, Flow Measurement and Instrumentation, 50, 26–34. https://doi.org/10.1016/j.flowmeasinst.2016.06.006.

Riahi-Madvar, H., Dehghani, M., Seifi, A., Salwana, E., Shamshirband, S., Mosavi, A. & Chau, K. (2019) Comparative analysis of soft computing techniques RBF, MLP, and ANFIS with MLR and MNLR for predicting grade-control scour hole geometry, Engineering Applications of Computational Fluid Mechanics, 13 (1), 529–550. https://doi.org/10.1080/19942060.2019.1618396.

Schölkopf, B., Smola, A. & Müller, K.-R. (1997) Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M. & Nicoud, J.-D. (eds.) Artificial Neural Networks – ICANN'97. Berlin, Heidelberg: Springer, pp. 583–588. https://doi.org/10.1007/BFb0020217.

Shahriar, A., Gabr, M., Montoya, B. & Ortiz, A. (2022) Estimating live-bed local scour around bridge piers in cohesionless sediments: applicability and bias of selected models, Canadian Geotechnical Journal, 60 (4), 471–487. https://doi.org/10.1139/cgj-2022-0122.

Sharafati, A., Haghbin, M., Torabi, M. & Yaseen, Z. M. (2021) Assessment of novel nature-inspired fuzzy models for predicting long contraction scouring and related uncertainties, Frontiers of Structural and Civil Engineering, 15 (3), 665–681. https://doi.org/10.1007/s11709-021-0713-0.

Straub, L. G. (1934) Effect of channel-contraction works upon regimen of movable bed-streams, Eos, Transactions American Geophysical Union, 15 (2), 454–463. https://doi.org/10.1029/TR015i002p00454.

Tola, S., Tinoco, J., Matos, J. C. & Obrien, E. (2023) Scour detection with monitoring methods and machine learning algorithms – a critical review, Applied Sciences, 13 (3), 1661. https://doi.org/10.3390/app13031661.

Webby, M. G. (1984) General scour at contraction, RRU Bulletin, 73, 109–118.