## Abstract

Scour depth estimation is essential in water-related engineering problems. Scouring below spillways may endanger a dam's stability and can even lead to dam failure, which in turn has undesirable environmental consequences. Hence, reliable and accurate estimation of scour depth below spillways is an important research topic. For this purpose, published and reliable prototype data on scour depth below ski-jump bucket spillways ($D_s$) were used to develop data-driven models. This study employed two widely used decision tree (DT) methods, the M5 model tree (M5MT) and the classification and regression tree (CART), as well as multivariate adaptive regression splines (MARS), to estimate $D_s$. The proposed methods provide explicit equations that are straightforward to apply for estimating scour depth. The developed formulas were assessed quantitatively with three common statistical metrics: root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (CC). Moreover, comparison with previous approaches in the literature indicated the efficacy of the suggested methods. The results revealed that MARS was the best approach for estimating scour depth.

## HIGHLIGHTS

Predictive equations were developed to estimate the scour depth below ski-jump bucket spillways.

White-box data-driven models were evaluated in this study.

The MARS model provided more accurate results when compared to decision tree methods and empirical formulas for scour depth prediction.

Field measurements were used in this study.

## INTRODUCTION

Scouring at spillways, bridges, piers, culverts, and other hydraulic structures is a critical issue and one of the most important problems in hydraulic engineering (Samadi *et al.* 2014; Malik *et al.* 2021; Chou & Nguyen 2022; Daneshfaraz *et al.* 2022; Kartal & Emiroglu 2022). Scouring below spillways can jeopardize a dam's safety, and dam failure causes severe damage to the economy, the environment, and human life downstream. Therefore, scour below spillways should be monitored and quantified. Modeling scour depth below spillways is one of the most significant challenges in hydraulic engineering research, and estimating scour depth is essential for dam design and for assessing a dam's operational safety. Because of the nonlinear behavior and stochastic nature of the scouring process, various data-driven methods have been proposed for modeling scour depth (Homaei & Najafzadeh 2020; Pandey *et al.* 2020; Ahmadianfar *et al.* 2022; Devi & Kumar 2022; Homaei & Najafzadeh 2022; Nimbalkar *et al.* 2022; Rathod & Manekar 2022). More generally, data-driven models have been widely and successfully used for modeling water-related problems (Mojaradi *et al.* 2018; Ghasemi *et al.* 2022).

A literature review indicated that applications of data-driven approaches for modeling scour downstream of ski-jump bucket spillways can be classified into two groups, according to whether the models were developed from experimental results or from field measurements. Because measuring scour depth downstream of dams is difficult, little field data has been published. Nevertheless, Azmathullah *et al.* (2006) highlighted the importance of field measurements for accurately modeling scour depth. They recommended using prototype data for scour depth estimation because such data represent natural conditions more accurately than experiments conducted under controlled conditions, which are influenced by scale effects. Developing data-driven approaches based on field measurements is therefore worthwhile for predicting scour depth.

Regarding experimental work, the following studies used data-driven models to estimate scour downstream of ski-jump bucket spillways. Azmathullah *et al.* (2005) implemented an artificial neural network (ANN) to predict scour hole characteristics and showed that ANN outperformed traditional regression methods. Agarwal *et al.* (2010) employed locally weighted projection regression (LWPR) and found it more efficient than ANN. Goyal & Ojha (2011) reported better performance for support vector machines (SVM) and the M5 model tree (M5MT) than for ANN. Najafzadeh *et al.* (2014) showed that combining the group method of data handling (GMDH) with the back-propagation (BP) algorithm was more accurate than ANN, genetic programming, the adaptive neuro-fuzzy inference system (ANFIS), and conventional equations. Noori *et al.* (2017) modeled scour hole dimensions using the granular computing (GrC) method and demonstrated its potential. Nou *et al.* (2021) combined ANFIS with particle swarm optimization (PSO) and demonstrated the superiority of ANFIS-PSO over stand-alone ANFIS. Sun *et al.* (2021) combined support vector regression (SVR) with the fruit fly optimization algorithm (FOA), which improved the accuracy of scour depth prediction compared with stand-alone SVR.

Concerning field measurements, the following studies estimated scour downstream of ski-jump bucket spillways using data-driven approaches. Azmathullah *et al.* (2006) and Azmathullah *et al.* (2008a) found that ANN and ANFIS predicted scour depth better than traditional formulas. Guven & Azamathulla (2012) derived mathematical expressions using gene expression programming (GEP) to estimate normalized scour depth. Sammen *et al.* (2020) hybridized ANN with Harris hawks optimization (ANN-HHO), PSO (ANN-PSO), and a genetic algorithm (ANN-GA) to predict normalized scour depth and showed that ANN-HHO was more efficient than ANN-PSO and ANN-GA.

The literature review indicated that fewer data-driven models have been developed from prototype data than from experimental data. To the authors' knowledge, no published study has applied multivariate adaptive regression splines (MARS) or decision tree (DT) approaches with prototype data and nondimensional parameters to estimate scour depth below ski-jump bucket spillways. Therefore, this study used field measurements of scour depth below ski-jump bucket spillways to develop MARS and two well-known DT approaches, the M5MT and classification and regression tree (CART) algorithms. The effectiveness of the proposed models was investigated, and their results were compared with previous approaches using statistical analysis and graphical evaluation. A key feature of the proposed methods is that, unlike black-box data-driven methods such as ANN, they yield explicit equations for predicting scour depth. Such predictive formulas are convenient for practical engineering applications and may help reduce potential safety risks.

## MATERIALS AND METHODS

This section describes the field measurements used for scour depth estimation and gives a brief overview of the data-driven methods used to model scour depth, namely MARS, CART, and M5MT.

### Field measurements and existing approaches

Only a limited number of field measurements of scour depth below spillways have been reported in the literature. This study used the published data on scour below ski-jump bucket spillways reported by Azamathulla *et al.* (2008b), who provided the head between the upper water level (reservoir level) and the tailwater level (m), the discharge intensity *q* (m³/s/m), and the scour depth $D_s$ (m) for 82 field measurements at various dams.

Several traditional formulas have been proposed for predicting scour depth below spillways. Table 1 lists some of these formulas for calculating scour depth (Mason & Arumugam 1985; Azamathulla *et al.* 2008b; Kumar & Sreeja 2012; Azamathulla 2013; Khatsuria 2013).

| Approach | Formula |
|---|---|
| Veronese-(B) (1937) | |
| Wu (1973) | |
| Martins-(B) (1975) | |
| Taraimovich (1978) | |
| Sofrelec (1980) | |
| Incyth (1982) | |
| CWPRS (1986) | |
| Azmathullah et al. (2006) | |
| Kumar & Sreeja (2012) | |

where $Fr$ is the Froude number and $g$ is the acceleration due to gravity.

### Multivariate adaptive regression splines (MARS)

The MARS approach uses piecewise linear functions to model nonlinear systems while reducing the complexity of formulating the problem. MARS estimates an output parameter as a linear combination of basis functions (BFs), each of which describes the relationship between the input and the output over part of the input range. MARS constructs an explicit equation for the output parameter ($y$) in the following general form (Sihag *et al.* 2021):

$$y = c_0 + \sum_{m=1}^{M} c_m B_m(x)$$

where $c_0$ is a constant, $x$ is the input variable, $c_m$ is the coefficient of the $m$th BF, and $M$ is the total number of BFs. $B_m(x)$ is the $m$th BF, which is defined as follows (Yonesi *et al.* 2022):

$$B_m(x) = \max(0,\, x - t_m) \quad \text{or} \quad B_m(x) = \max(0,\, t_m - x)$$

where $t_m$ is the knot (breakpoint) of the $m$th BF.
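As a minimal illustration of the general form above, the Python sketch below evaluates a single-input MARS model built from hinge (truncated linear) BFs. The helper names `hinge` and `mars_predict`, the coefficients, and the knots are hypothetical and chosen only for illustration; they are not the MARS equation fitted in this study.

```python
import numpy as np


def hinge(x, knot, direction):
    """Hinge (truncated linear) basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, direction * (x - knot))


def mars_predict(x, c0, terms):
    """Evaluate y = c0 + sum_m c_m * B_m(x) for a single-input MARS model.

    terms is a list of (coefficient, knot, direction) tuples, one per BF.
    """
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, c0, dtype=float)
    for coef, knot, direction in terms:
        y += coef * hinge(x, knot, direction)
    return y


# Hypothetical model with two BFs sharing a knot at 0.1 (illustration only).
demo_terms = [(2.0, 0.1, +1), (-1.5, 0.1, -1)]
print(mars_predict([0.0, 0.1, 0.3], c0=0.5, terms=demo_terms))
```

Because each BF is zero on one side of its knot, a single-input MARS equation is piecewise linear and can be read as a small set of if-then rules, one per interval between knots.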

The MARS model is constructed in two steps: a forward pass and a backward pass. In the forward pass, BFs are added to the model stepwise, which generally produces an overfitted model. In the backward pass, the least important BFs are removed according to the generalized cross-validation (GCV) criterion.
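The backward (pruning) pass can be sketched for a single input as follows. This is a simplified illustration under stated assumptions: the candidate BFs are hinge pairs at fixed, hypothetical knots rather than knots found by a forward pass; the GCV uses one common form, with an effective number of parameters of $n_{terms} + d(n_{terms}-1)/2$ and $d = 3$; the data are synthetic; and the helper names `gcv`, `fit_ls`, and `backward_prune` are hypothetical.

```python
import numpy as np


def gcv(y, y_hat, n_terms, penalty=3.0):
    """One common form of the generalized cross-validation (GCV) criterion.

    n_terms counts the intercept plus the retained BFs. The effective number
    of parameters is n_terms + penalty * (n_terms - 1) / 2 and must stay
    below the number of samples for the criterion to be meaningful.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    c_eff = n_terms + penalty * (n_terms - 1) / 2.0
    mse = np.mean((y - y_hat) ** 2)
    return mse / (1.0 - c_eff / n) ** 2


def fit_ls(B, cols, y):
    """Least-squares fit of y on an intercept plus the BF columns `cols`."""
    X = np.column_stack([np.ones(len(y))] + [B[:, k] for k in cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # tolerates collinear columns
    return X @ coef


def backward_prune(B, y, penalty=3.0):
    """Greedy backward elimination of BF columns by GCV.

    B holds candidate BFs (one per column) from a forward pass. Returns the
    column indices of the subset with the lowest GCV and that GCV value.
    """
    active = list(range(B.shape[1]))
    best_set = list(active)
    best_score = gcv(y, fit_ls(B, active, y), len(active) + 1, penalty)
    while active:
        trials = []
        for j in active:  # try removing each remaining BF in turn
            subset = [k for k in active if k != j]
            score = gcv(y, fit_ls(B, subset, y), len(subset) + 1, penalty)
            trials.append((score, subset))
        score, active = min(trials, key=lambda t: t[0])
        if score < best_score:
            best_score, best_set = score, list(active)
    return best_set, best_score


# Synthetic single-input data with one true hinge at x = 0.5 (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 0.3 + 2.0 * np.maximum(0.0, x - 0.5) + rng.normal(0.0, 0.01, x.size)

# Six candidate BFs: hinge pairs at three hypothetical knots.
knots = [0.25, 0.5, 0.75]
B = np.column_stack([np.maximum(0.0, s * (x - t)) for t in knots for s in (1.0, -1.0)])
kept, score = backward_prune(B, y)
print("kept BF columns:", kept, "GCV:", round(float(score), 6))
```

The GCV denominator penalizes each retained BF, so a smaller subset is preferred whenever removing a BF barely increases the training error.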

### Decision trees (DTs)

Decision tree (DT) algorithms provide a set of logical rules used for classification and regression problems. The main idea of DTs for solving a complex problem is to divide the input domain into several subdomains and build a specialized model for each subdomain (Enayati *et al.* 2022; Torabi *et al.* 2022). Combining these local models reduces the complexity of the problem and enhances the predictive capability and accuracy of the model. A DT is expressed as a hierarchical, inverted tree-like structure in which split rules are stored at the internal nodes and predictive models at the terminal nodes. Depending on the DT algorithm, the data at each internal node are divided into two or more groups; a binary DT creates two branches at each internal node. To grow a DT, a splitting criterion is applied during tree development. For regression trees such as M5MT, this criterion treats the standard deviation of the target values reaching a node as a measure of error and, for each candidate attribute at that node, calculates the expected reduction in this error.
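The standard-deviation-reduction (SDR) criterion described above can be sketched for a single input as follows. The function names and toy data are hypothetical; the sketch only scans midpoints between sorted input values and keeps the split with the largest SDR.

```python
import numpy as np


def sdr(y, left_mask):
    """Standard deviation reduction (SDR) of a binary split of targets y."""
    y = np.asarray(y, dtype=float)
    left, right = y[left_mask], y[~left_mask]
    weighted_sd = (left.size * left.std() + right.size * right.std()) / y.size
    return y.std() - weighted_sd


def best_split_sdr(x, y):
    """Scan midpoints between sorted unique x values; return (threshold, SDR)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)  # sorted unique values
    best_threshold, best_score = None, -np.inf
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = 0.5 * (lo + hi)
        score = sdr(y, x <= threshold)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data with a clear jump in the target between x = 0.03 and 0.2.
x_toy = [0.01, 0.02, 0.03, 0.2, 0.3, 0.4]
y_toy = [0.30, 0.32, 0.31, 1.20, 1.25, 1.30]
print(best_split_sdr(x_toy, y_toy))
```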

A CART is a binary DT that can be used for classification and regression problems (Breiman 1984). Each internal node splits the data into two groups by a simple if-then rule based on a single variable (Kamranzad *et al.* 2013). Splits are chosen to maximize the homogeneity of the response within each group, that is, to minimize the total deviation. Another popular binary DT is the M5 model tree (M5MT), introduced by Quinlan (1992) as a DT model used exclusively for regression problems and numerical prediction. M5MT represents knowledge as a tree of if-then rules defined by the splitting variables and their ranges, with multivariate linear regression models at the leaves.

The input variable used at each internal node of the M5MT is the one that yields the greatest expected reduction in error, measured by the standard deviation of the output parameter (Khosravi *et al.* 2022). The model at a terminal node can be expressed as:

$$O = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$$

where $O$ is the output parameter, $a_0, a_1, \dots, a_n$ are the coefficients of the multiple linear regression model, and $x_1, \dots, x_n$ are the input variables that contribute to the prediction of the output parameter.

In summary, M5MT and CART are popular and widely used binary DTs for prediction. Their advantages include a simple and understandable structure, low computational cost, and visual representation. The major difference between them is that M5MT fits multivariate linear functions at the terminal nodes, whereas CART assigns constant numerical values. The resulting rules are clear and easy to apply. More details about the M5MT and CART algorithms can be found in Wang & Witten (1997) and Breiman (1984).
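The difference between constant leaves (CART-style) and linear leaves (M5-style) can be illustrated with a depth-1 tree on a single input. The sketch below is a hypothetical simplification: the split threshold is given rather than searched, and no pruning or smoothing is applied.

```python
import numpy as np


def fit_one_split_tree(x, y, threshold, leaf="constant"):
    """Fit a depth-1 regression tree on a single input with a given split.

    leaf="constant": each leaf stores the mean target (CART-style leaf).
    leaf="linear": each leaf stores a least-squares line a0 + a1*x
    (M5-style leaf; needs at least two distinct x values per leaf).
    Returns a function that predicts for new x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    models = {}
    for side, mask in (("left", x <= threshold), ("right", x > threshold)):
        if leaf == "constant":
            models[side] = (y[mask].mean(), 0.0)
        else:
            slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
            models[side] = (intercept, slope)

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        a0_left, a1_left = models["left"]
        a0_right, a1_right = models["right"]
        return np.where(x_new <= threshold,
                        a0_left + a1_left * x_new,
                        a0_right + a1_right * x_new)

    return predict


# Hypothetical toy data that are exactly linear on each side of x = 0.1.
x_toy = [0.01, 0.02, 0.03, 0.2, 0.3, 0.4]
y_toy = [0.32, 0.34, 0.36, 1.0, 1.1, 1.2]
cart_like = fit_one_split_tree(x_toy, y_toy, 0.1, leaf="constant")
m5_like = fit_one_split_tree(x_toy, y_toy, 0.1, leaf="linear")
print(cart_like([0.01, 0.4]), m5_like([0.01, 0.4]))
```

Both trees share the same split, but only the linear leaves follow the trend within each subdomain, which is the flexibility difference noted above.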

## MODEL DEVELOPMENT FOR ESTIMATION OF SCOUR DEPTH

Training and testing datasets are essential for constructing data-driven models. Approximately 80% of the dataset was used for training, and the remaining 20% was used for testing. Dimensionless parameters were used to develop the proposed models: one dimensionless input parameter, combining the discharge intensity, the head, and the gravitational acceleration, and the normalized scour depth were taken as the input and output variables, respectively. The earlier studies by Guven & Azamathulla (2012) and Sammen *et al.* (2020) likewise used dimensionless parameters to predict scour depth from field data. The statistical characteristics of the input and output variables are listed in Table 2.

| Parameter | Train Min | Train Max | Train Avg | Test Min | Test Max | Test Avg | All Min | All Max | All Avg |
|---|---|---|---|---|---|---|---|---|---|
| Input variable | 0.0040 | 4.4699 | 0.1869 | 0.0088 | 0.1850 | 0.0696 | 0.0040 | 4.4699 | 0.1669 |
| Output variable | 0.0572 | 6.3500 | 0.7315 | 0.1687 | 1.2936 | 0.6323 | 0.0572 | 6.3500 | 0.7146 |
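The train, test, and pooled averages in Table 2 can be cross-checked with a pooled-mean calculation. The sketch below searches for the training-set size that reproduces both "All dataset" averages from the rounded train and test averages, assuming the 82 field measurements form the full dataset. The result is an inference from rounded table values, not a split size stated by the authors.

```python
# Rounded averages from Table 2 (training, testing, and all data).
train_avg = {"input": 0.1869, "output": 0.7315}
test_avg = {"input": 0.0696, "output": 0.6323}
all_avg = {"input": 0.1669, "output": 0.7146}
N_TOTAL = 82  # number of field measurements in the dataset


def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two groups combined, from their means and sizes."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


def consistent_splits(tol=1e-4):
    """Training-set sizes whose pooled means match both 'All dataset' averages.

    tol allows for the rounding of the tabulated averages to four decimals.
    """
    matches = []
    for n_train in range(1, N_TOTAL):
        n_test = N_TOTAL - n_train
        if all(abs(pooled_mean(train_avg[k], n_train, test_avg[k], n_test) - all_avg[k]) <= tol
               for k in all_avg):
            matches.append(n_train)
    return matches


print(consistent_splits())  # -> [68]
```

A 68/14 split corresponds to roughly 83% and 17% of the data, which is consistent with an approximate 80/20 division of 82 samples.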

The MARS model yielded a simple mathematical expression with two BFs for predicting scour depth.

The number of BFs and the value of the GCV criterion are important in developing the MARS equation for estimating scour depth. As stated previously, the MARS model is developed in two steps. Here, 12 BFs were generated in the first (forward) step, and 10 of them were removed in the second (pruning) step, yielding a final MARS equation with 2 BFs. The GCV value of the final MARS equation was 0.07617.

As shown in Figure 1, a single split divides the problem domain into two terminal nodes, each containing a constant value, so the CART tree stops growing with two rules. The least squared deviation (LSD) impurity measure was used for the splitting rules and as the goodness-of-fit criterion. The predicted value at each terminal node is the weighted average of the target values of the training records in that node.
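A minimal sketch of a single CART-style split under the LSD criterion, for one input, is shown below. The function names and toy data are hypothetical; case weights default to 1, in which case each leaf value reduces to the plain mean of its targets.

```python
import numpy as np


def weighted_lsd(y, w):
    """Weighted least-squared-deviation impurity and weighted mean of a node."""
    mean = np.average(y, weights=w)
    return float(np.sum(w * (y - mean) ** 2)), float(mean)


def best_split_lsd(x, y, w=None):
    """Single-input CART-style split minimizing the summed LSD of two children.

    Returns (threshold, left_value, right_value), where each leaf value is the
    weighted mean of the targets in that leaf.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    values = np.unique(x)
    best = None
    for lo, hi in zip(values[:-1], values[1:]):
        t = 0.5 * (lo + hi)
        left = x <= t
        cost_l, mean_l = weighted_lsd(y[left], w[left])
        cost_r, mean_r = weighted_lsd(y[~left], w[~left])
        if best is None or cost_l + cost_r < best[0]:
            best = (cost_l + cost_r, t, mean_l, mean_r)
    return best[1:]


threshold, left_value, right_value = best_split_lsd(
    [0.01, 0.02, 0.03, 0.2, 0.3, 0.4], [0.30, 0.32, 0.31, 1.20, 1.25, 1.30]
)
print(threshold, left_value, right_value)
```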

As shown in Figure 2, M5MT divided the input domain into two subdomains and provided linear regression models at the two terminal nodes for estimating scour depth. M5MT grows a regression tree by recursive splitting, treating the standard deviation of the target values that reach a node as a measure of the error at that node. The method then applies a pruning procedure to avoid overfitting, followed by a smoothing procedure that compensates for the discontinuities between adjacent linear models in the pruned tree. The smoothed and pruned M5MT was generated from the training data, and the coefficients of its linear models were obtained by least squares.
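The smoothing step can be illustrated with the rule described in the M5 literature (Quinlan 1992; Wang & Witten 1997): the value passed up from a child node, $p$, is blended with the prediction of the parent node's linear model, $q$, as $p' = (np + kq)/(n + k)$, where $n$ is the number of training samples at the child and $k$ is a smoothing constant. The sketch assumes $k = 15$, a commonly used default (the value used in this study is not reported); the function name and node predictions are hypothetical.

```python
def m5_smooth(path_predictions, path_counts, k=15.0):
    """Smooth an M5 leaf prediction on the way back up to the root.

    path_predictions: linear-model predictions for one sample, ordered from
        the leaf up to the root, e.g. [leaf, parent, ..., root].
    path_counts: number of training samples reaching each of those nodes,
        in the same order.
    At each step the value passed up, p, is blended with the parent model's
    prediction, q, as p' = (n * p + k * q) / (n + k), where n is the number
    of training samples at the child node.
    """
    p = path_predictions[0]
    for i in range(1, len(path_predictions)):
        n = path_counts[i - 1]
        q = path_predictions[i]
        p = (n * p + k * q) / (n + k)
    return p


# Hypothetical two-level path: the leaf model predicts 1.20 from 30 training
# samples; the root model predicts 0.90 for the same input.
print(m5_smooth([1.20, 0.90], [30, 68]))
```

Leaves with few training samples are pulled more strongly toward their parent's model, which reduces jumps between neighboring linear models.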

Compared with the constant values that CART assigns to its terminal nodes, the linear models at the M5MT terminal nodes make M5MT more flexible for scour depth estimation. Both regression trees such as CART and M5MT are used to solve regression problems; the main difference is that the leaves of a regression tree hold constant values, whereas the leaves of M5MT hold linear models that predict numeric values from the inputs of a given sample.

As Figures 1 and 2 show, the CART and M5MT models have similar structures, and the splitting value of the input variable was 0.139 in both. This splitting value is determined by optimizing the fit to the training dataset, that is, by minimizing the estimation error for the training data, and it does not necessarily have physical significance. Previous researchers have highlighted this point (Bhattacharya *et al.* 2007; Bonakdar & Etemad-Shahidi 2011; Samadi *et al.* 2014).

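## PERFORMANCE CRITERIA OF THE DEVELOPED MODELS

The performance of the developed models was evaluated using the correlation coefficient (CC), root mean square error (RMSE), and mean absolute error (MAE):

$$CC = \frac{\sum_{i=1}^{n}(O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n}(O_i - \bar{O})^2 \sum_{i=1}^{n}(P_i - \bar{P})^2}}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|$$

where $O_i$ and $P_i$ are the observed and predicted values, $\bar{O}$ and $\bar{P}$ are their means, and the number of data is *n*.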
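The three indices can be computed directly from observed and predicted values. The Python sketch below implements the definitions above with NumPy (function names are hypothetical) and applies them to the eight measured values and MARS predictions listed in Table 6; because only eight samples are used, the resulting values are illustrative and not comparable with Tables 3 to 5.

```python
import numpy as np


def cc(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    d_obs, d_pred = obs - obs.mean(), pred - pred.mean()
    return float(np.sum(d_obs * d_pred) / np.sqrt(np.sum(d_obs**2) * np.sum(d_pred**2)))


def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(obs - pred)))


# Measured values and MARS predictions for the eight samples of Table 6.
measured = [0.422, 1.555, 1.038, 1.263, 0.300, 0.344, 0.556, 0.223]
mars = [0.430, 1.028, 0.950, 1.222, 0.311, 0.372, 0.567, 0.282]
print(f"CC = {cc(measured, mars):.4f}, RMSE = {rmse(measured, mars):.4f}, "
      f"MAE = {mae(measured, mars):.4f}")
```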

## RESULTS AND DISCUSSION

The statistical indices of the data-driven models are listed in Table 3.

| Model | CC | RMSE | MAE |
|---|---|---|---|
| MARS (Train) | 0.9584 | 0.2545 | 0.1761 |
| MARS (Test) | 0.7734 | 0.2374 | 0.1429 |
| M5MT (Train) | 0.9531 | 0.2716 | 0.1887 |
| M5MT (Test) | 0.7579 | 0.2643 | 0.1853 |
| CART (Train) | 0.6896 | 0.6631 | 0.2925 |
| CART (Test) | 0.6246 | 0.3413 | 0.2450 |

For the training dataset, MARS achieved CC = 0.9584, RMSE = 0.2545, and MAE = 0.1761, compared with CC = 0.9531, RMSE = 0.2716, and MAE = 0.1887 for M5MT and CC = 0.6896, RMSE = 0.6631, and MAE = 0.2925 for CART. MARS therefore outperformed M5MT and CART in the training stage. For the testing dataset, MARS (CC = 0.7734, RMSE = 0.2374, MAE = 0.1429) was again more accurate than M5MT (CC = 0.7579, RMSE = 0.2643, MAE = 0.1853) and CART (CC = 0.6246, RMSE = 0.3413, MAE = 0.2450). Based on the statistical indices for both the training and testing datasets, MARS performed clearly better than the M5MT and CART algorithms.

Like the compact DT models, the MARS equation can be expressed as if-then rules: it provided three rules for estimating scour depth, depending on the value of the input variable. The two DT algorithms, M5MT and CART, had similar structures with the same splitting value (0.139), and each provided two if-then rules for scour depth prediction. However, whereas CART assigns a constant value to each terminal node, M5MT fits linear regression models, which makes M5MT more flexible and improved its predictive power relative to CART. The decision rules obtained from both DT methods use a single variable, the input variable, for scour depth prediction: the appropriate rule is selected based on the value of this variable, and scour depth is then computed directly. These rules are therefore easy to use for computing scour depth.

On the other hand, MARS provided more rules (three) than M5MT and CART (two each), which gave it more flexibility for estimating scour depth. Overall, the MARS equation performed better than the decision rules obtained from the M5MT and CART methods. The MARS model, as the best predictive formula, was further compared with the data-driven models reported by Guven & Azamathulla (2012) and Sammen *et al.* (2020). Table 4 summarizes the statistical indices of MARS, ANN-HHO, and GEP for scour depth estimation.

| Approach | Category | CC | RMSE | MAE |
|---|---|---|---|---|
| MARS (Present study) | Training | 0.9584 | 0.2545 | 0.1761 |
| MARS (Present study) | Testing | 0.7734 | 0.2374 | 0.1429 |
| ANN-HHO (Sammen et al. 2020) | Training | 0.9557 | 0.2626 | 0.1791 |
| ANN-HHO (Sammen et al. 2020) | Testing | 0.7765 | 0.2538 | 0.1760 |
| GEP (Guven & Azamathulla 2012) | Training | 0.9564 | 0.3606 | 0.1957 |
| GEP (Guven & Azamathulla 2012) | Testing | 0.7813 | 0.2582 | 0.1826 |

As shown in Table 4, in the training phase MARS had CC = 0.9584, RMSE = 0.2545, and MAE = 0.1761, compared with CC = 0.9557, RMSE = 0.2626, and MAE = 0.1791 for ANN-HHO and CC = 0.9564, RMSE = 0.3606, and MAE = 0.1957 for GEP. In the testing phase, MARS also had the lowest errors (RMSE = 0.2374, MAE = 0.1429) compared with ANN-HHO (RMSE = 0.2538, MAE = 0.1760) and GEP (RMSE = 0.2582, MAE = 0.1826), although its testing CC (0.7734) was slightly lower than those of ANN-HHO (0.7765) and GEP (0.7813). Based mainly on the error measures, MARS can therefore be considered the most accurate of these data-driven models for estimating scour depth.
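The comparison above can be made explicit by ranking the testing-phase indices of Table 4. This short Python sketch (hypothetical helper names) ranks the three models on each index and computes the relative RMSE reduction of MARS with respect to ANN-HHO and GEP.

```python
# Testing-phase indices from Table 4.
test_metrics = {
    "MARS": {"CC": 0.7734, "RMSE": 0.2374, "MAE": 0.1429},
    "ANN-HHO": {"CC": 0.7765, "RMSE": 0.2538, "MAE": 0.1760},
    "GEP": {"CC": 0.7813, "RMSE": 0.2582, "MAE": 0.1826},
}


def rank(metric, higher_is_better):
    """Model names ordered from best to worst on one index."""
    return sorted(test_metrics, key=lambda name: test_metrics[name][metric],
                  reverse=higher_is_better)


def relative_reduction(metric, reference, model="MARS"):
    """Percentage reduction of an error index for `model` relative to `reference`."""
    ref = test_metrics[reference][metric]
    return 100.0 * (ref - test_metrics[model][metric]) / ref


for metric, higher_is_better in (("CC", True), ("RMSE", False), ("MAE", False)):
    print(metric, rank(metric, higher_is_better))
for reference in ("ANN-HHO", "GEP"):
    print(f"RMSE reduction of MARS vs {reference}: {relative_reduction('RMSE', reference):.1f}%")
```

MARS ranks first on RMSE and MAE but last on CC; its testing RMSE is about 6.5% lower than that of ANN-HHO and about 8.1% lower than that of GEP.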

In addition, unlike ANN-HHO, MARS provides a simple mathematical expression: scour depth is estimated by substituting the input value into the MARS equation, without requiring any software or programming knowledge. Next, the proposed data-driven methods were compared with traditional formulas for estimating scour depth; the statistical indices, computed for the entire dataset, are presented in Table 5.

| Approach | CC | RMSE | MAE |
|---|---|---|---|
| MARS (Present study) | 0.9526 | 0.2517 | 0.1705 |
| M5MT (Present study) | 0.9463 | 0.2703 | 0.1881 |
| CART (Present study) | 0.6746 | 0.6201 | 0.2844 |
| Veronese-(B) (1937) | 0.9392 | 0.5332 | 0.3880 |
| Wu (1973) | 0.9426 | 0.3427 | 0.2064 |
| Martins-(B) (1975) | 0.9479 | 0.3097 | 0.2098 |
| Taraimovich (1978) | 0.8871 | 0.4348 | 0.2404 |
| Sofrelec (1980) | 0.9479 | 0.8268 | 0.4583 |
| Incyth (1982) | 0.9415 | 0.2873 | 0.2002 |
| CWPRS (1986) | 0.9480 | 0.2928 | 0.2014 |
| Azmathullah et al. (2006) | 0.9416 | 0.3261 | 0.2027 |
| Kumar & Sreeja (2012) | 0.8723 | 0.7890 | 0.5825 |

The error values indicated that, for the entire dataset, the Incyth formula had the lowest RMSE and MAE among the traditional formulas. Comparing the statistical indices of the Incyth formula (CC = 0.9415, RMSE = 0.2873, MAE = 0.2002) with those of MARS (CC = 0.9526, RMSE = 0.2517, MAE = 0.1705) showed that MARS performed best in estimating scour depth. The other proposed data-driven approach, M5MT (CC = 0.9463, RMSE = 0.2703, MAE = 0.1881), was also slightly better than the Incyth formula.

Figures 4–8 present scatter plots of the measured versus predicted scour depths.

The scatter plots and outcomes of the proposed data-driven models clearly show that MARS performed best in estimating scour depth in both the training and testing stages, and the graphical evaluation confirms the accuracy of MARS. Finally, Table 6 provides example outputs of the proposed MARS, M5MT, and CART models.

| Sample No. | Input variable | Measured | MARS | M5MT | CART |
|---|---|---|---|---|---|
| 1 | 0.044 | 0.422 | 0.430 | 0.435 | 0.404 |
| 2 | 0.153 | 1.555 | 1.028 | 0.960 | 1.569 |
| 3 | 0.141 | 1.038 | 0.950 | 0.943 | 1.569 |
| 4 | 0.185 | 1.263 | 1.222 | 1.002 | 1.569 |
| 5 | 0.019 | 0.300 | 0.311 | 0.368 | 0.404 |
| 6 | 0.032 | 0.344 | 0.372 | 0.403 | 0.404 |
| 7 | 0.071 | 0.556 | 0.567 | 0.512 | 0.404 |
| 8 | 0.013 | 0.223 | 0.282 | 0.352 | 0.404 |
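The CART column of Table 6 can be checked against the two-rule tree with a splitting value of 0.139. The sketch below assumes that samples exactly at the split value fall in the left branch and uses the leaf constants as printed (rounded to three decimals) in Table 6; the names are hypothetical.

```python
# Inputs and CART predictions for the eight samples listed in Table 6.
inputs = [0.044, 0.153, 0.141, 0.185, 0.019, 0.032, 0.071, 0.013]
cart_table6 = [0.404, 1.569, 1.569, 1.569, 0.404, 0.404, 0.404, 0.404]

SPLIT = 0.139       # splitting value reported for the CART and M5MT trees
LEFT_LEAF = 0.404   # leaf constants as printed (rounded) in Table 6
RIGHT_LEAF = 1.569


def cart_rule(x):
    """Two-rule CART predictor; assumes x == SPLIT goes to the left branch."""
    return LEFT_LEAF if x <= SPLIT else RIGHT_LEAF


reproduced = [cart_rule(x) for x in inputs]
print(reproduced == cart_table6)  # -> True
```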

As Table 6 shows, the scour depths predicted by MARS were the closest to the measured values in seven of the eight samples. Previous studies by Samadi & Jabbar (2012), Samadi *et al.* (2015), Haghiabi (2017), Rezaie-Balf (2019), Samadi *et al.* (2020), Samadi *et al.* (2021), and Najafzadeh & Oliveto (2022) have also shown the potential of the MARS model for estimating scour depth.

## SUMMARY AND CONCLUSIONS

Scouring downstream of a dam is one of the main factors affecting dam stability and can have adverse environmental effects. Because scouring can endanger the safety of a dam and related structures, accurate and reliable scour depth estimation is an important topic in water and hydraulic engineering. This study used three white-box data-driven models, based on two popular decision tree methods (M5MT and CART) and the MARS method, to generate explicit equations for scour depth estimation. Field measurements of scour depth below ski-jump bucket spillways were used to develop the models. The statistical assessments showed that MARS was the best predictive model for estimating scour depth.

The proposed data-driven approaches provide explicit expressions with which the scour depth below ski-jump bucket spillways can be computed simply. The reasonable and effective performance of the MARS model indicates that it is a promising data-driven method for estimating scour depth, which is critical in the design, construction, and stability assessment of dams. The expressions provided by the proposed methods are transparent and can be applied without specialized knowledge of data-driven modeling. These simple rules, generated by the MARS and DT algorithms, can help engineers approximate scour depth quickly and accurately. The findings of this study support the applicability of the suggested approaches for modeling scour depth.

## FUNDING

This study did not receive funding from any sources.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict of interest.

## REFERENCES

*Studies in Computational Intelligence, vol 1043*).

**29**, 72839–72852. DOI: https://doi.org/10.1007/s11356-022-20989-2.