## Abstract

Riprap stones are frequently applied to protect rivers and channels against erosion processes. Many empirical equations have been proposed in the past to estimate the unit discharge at the failure condition of riprap layers. However, these equations lack generality due to the limited range of experimental variables. To overcome these shortcomings, support vector machine (SVM), multivariate adaptive regression splines (MARS), and random forest (RF) techniques have been applied in this study to estimate the approach densimetric Froude number at the incipient motion of riprap stones. Riprap stone size, streambank slope, uniformity coefficient of the riprap layer stones, specific density of stones, and thickness of the riprap layer have been considered as controlling variables. Quantitative performances of the artificial intelligence (AI) models have been assessed by several statistical measures: coefficient of correlation (R), root mean square error (RMSE), mean absolute error (MAE), and scatter index (SI). The statistical performance of the AI models indicated that the SVM model with radial basis function (RBF) kernel performed better (SI = 0.37) than the MARS (SI = 0.75) and RF (SI = 0.63) techniques. The proposed AI models also performed better than existing empirical equations. A parametric study demonstrated that the erosion-critical stone-referred Froude number (*F _{s,c}*) is mainly controlled by the streambank slope.

## INTRODUCTION

Rock armor (commonly known as riprap) has been used in hydraulic engineering to protect hydraulic structures such as bridge piers, grade-control structures, bridge abutments, culvert outlets, end sills of stilling basins, ski-jump bucket spillways, dam embankments, and channel beds, which are exposed to scour and erosion processes (e.g., Borah 1989; Froehlich 1995; Lauchlan & Melville 2001; Dey & Barbhuiya 2004; Eli & Gray 2008; Hiller *et al.* 2018). The stability of ripraps is a significant factor in their design. The unit discharge of the overtopping flow, the gradation and shape of riprap stones, and the bed and bank slope of waterways strongly affect the stability of ripraps (e.g., Ullmann & Abt 2000; Thornton *et al.* 2008; Eli & Gray 2008). Underestimation of these effective variables may increase the possibility of scouring or failure of the armored rock layer. In contrast, overestimation of these important elements increases the cost of the project (Thornton *et al.* 2014). For instance, accurate estimation of the stone sizes enhances the stability of ripraps, especially when they are vulnerable to overtopping (Thornton *et al.* 2014). Hence, a large number of studies have investigated riprap stability on steep slopes for different hydraulic conditions and gradations of riprap stones (Hartung & Scheuerlein 1970; Abt *et al.* 1987; Wittler & Abt 1997; Ullmann & Abt 2000; Gallegos 2001; Eli & Gray 2008; Hiller *et al.* 2018, 2019).

In the effort to quantify the overtopping phenomenon, many empirical equations, extracted from experimental observations, have been proposed to estimate the unit discharge at the failure condition of a riprap layer for various streambank slopes and bed sediment properties (Thornton *et al.* 2014). However, these equations lack generalization due to the limited range of experimental variables and hence do not extend to a wide range of hydraulic conditions (Thornton *et al.* 2014; Najafzadeh *et al.* 2018). Moreover, these empirical relationships are based on traditional regression approaches that cannot robustly capture the non-linear relationship between the key variables at the incipient motion of riprap stones.

Due to the above-mentioned restrictions, artificial intelligence (AI) approaches have been recently employed to accurately estimate the riprap stone size. Najafzadeh *et al.* (2018) used evolutionary algorithm-based formulations to predict the size of riprap stones in overtopping flows. From their research, it was found that the utilized AI models could provide more accurate predictions.

Recently, AI-based data classification and machine learning methods have been employed for forecasting the groundwater table (Giustolisi 2006; Amaranto *et al.* 2018), evaluating the condition of sewer networks (Caradot *et al.* 2018), estimating chlorophyll-a concentration in surface waters (Yajima & Derot 2017), predicting short-term water demand (Antunes *et al.* 2018), estimating suspended sediment concentration in rivers (Babovic 2009), run-off forecasting (Babovic 2005; Adamowski *et al.* 2012; Meshgi *et al.* 2015), predicting the standardized precipitation index (Komasi *et al.* 2018), the shear strength of soil (Pham *et al.* 2018), and longitudinal dispersion coefficients in rivers (Haghiabi 2016). Among these studies, support vector machine (SVM), multivariate adaptive regression splines (MARS), and random forest (RF) are the most robust machine learning models that have been applied to various problems in water engineering, and they were selected here because of their remarkable advantages. The most notable characteristic of SVM is its high potential for generalizing from small training datasets; additionally, SVM does not get stuck in local optima, unlike artificial neural networks. RF is generally fast to build and has the capability to automatically select among a large number of input variables; for quite a few datasets, RF is able to produce a highly precise classifier. As a merit, the MARS technique does not require an a priori functional relationship between independent and dependent variables and, in addition, the relationships given by the MARS model are additive and interactive. In the case of the incipient motion of a riprap under overtopping flows, there are more than 20 empirical equations obtained by experimental investigations. Each empirical equation was extracted from particular experimental conditions and a limited range of experimental variables.
Khan & Ahmad (2011) collected previous experimental data and presented a multiple regression equation over all the available datasets. Even though their equation had the highest precision in comparison to previous empirical equations, Khan & Ahmad (2011) performed only a limited validation of it: the equation was not subsequently tested on new experimental datasets, and its accuracy was not verified through randomized partitioning of the data into calibration (training) and validation (testing) sets, so its generalization capability cannot be established. During the recent half-century, data-acquisition systems have been employed to obtain information about physical processes, and with the emergence of contemporary sciences these systems have delivered increasingly accurate and reliable results. Furthermore, data acquisition can be combined with machine learning models to obtain a more reliable recognition of behavioral patterns for various phenomena with engineering applications. There is no denying that the use of machine learning models in the prediction of various variables can efficiently cover the limitations of empirical techniques.

With this study, there is no claim to clarify the overtopping phenomenon in a comprehensive manner; rather, the aims are to: (i) emphasize the limitations of the current empirical equations; (ii) highlight the key dimensionless variables controlling the overtopping phenomenon; and (iii) provide predictive models that, even though more structurally complex, are more accurate. To the best of the authors' knowledge, the three powerful AI models SVM, MARS, and RF have not previously been used in designing the size of riprap stones. In addition, new contributions in terms of AI modeling methodology are provided in this study. Namely, the insufficient number of reliable empirical equations for estimating the unit discharge at the failure state, which is employed in the design of the riprap stone size, can jeopardize the stability of bed slopes in waterways, rivers, and channels, and regression-based equations still suffer from a high level of inaccuracy. More specifically, in such investigations the results from the AI models must be connected to the physics of the problem, and the only way to show this link is to examine the AI results for consistency. This implies that the general patterns recognized between input and output variables should be conceptually investigated, so that agreement between these patterns and the experimental studies reported in the literature can be verified. In this study, it is shown how AI models can be reliable techniques that recognize the existing general behavior of the input–output system. Therefore, experimental datasets from the literature are used to assess the performance of the SVM, MARS, and RF techniques in the prediction of the dimensionless overtopping discharge at the riprap failure condition. A parametric study is conducted to illustrate the consistency of the AI models' results in riprap design. Finally, results from the AI techniques are compared with those obtained from the empirical equations.

## A SURVEY OF EXPERIMENTAL AND FIELD STUDIES

where *q _{f}*, *S*, *D*_{50}, and *G _{s}* denote the unit discharge at failure (or incipient motion) of a riprap layer, the bed (or embankment) slope, the median riprap stone size, and the specific density of the riprap stone, respectively. Specifically, *G _{s}* is the ratio of the riprap stone density (*ρ _{s}*) to the water density (*ρ*). In Equation (1), *q _{f}* and *D*_{50} are in ft²/s and ft, respectively.

*D*_{50} and *q _{f}* are in feet and ft²/s, respectively.

In Robinson *et al.*'s (1998) study, a rock chute with a riprap layer of angular geometry was designed. They presented the following empirical equations predicting the unit discharge at the failure condition, on the basis of the geometric characteristics of a riprap layer, where the median riprap stone size (*D*_{50}) is expressed in mm and *q _{f}* in m²/s.

Through these experiments, the slope of the riprap varied from 0.29 to 0.67. Additionally, the *D*_{50} values were between 0.03 and 0.05 m and the riprap stone density *ρ _{s}* was equal to 2,610 kg/m³. A relationship was recommended in which *F _{s,c}* is the stone-referred densimetric Froude number, *θ* is the slope angle, and Δ is equal to *G _{s}* − 1 = (*ρ _{s}* − *ρ*)/*ρ*. Hereafter, the terminology adopted for *F _{s,c}* is in harmony with the definition provided by Siebel (2007).

_{s,c}*D*

_{50}and

*q*are in cm and L/m·s, respectively. Khan & Ahmad (2011) inferred from previous experimental works the following regression-based equation for the unit discharge at the riprap failure state:where is the uniformity coefficient of riprap stones and

_{f}*t*(in mm) is the thickness of riprap layer. Moreover,

*q*is in m

_{f}^{2}/s and

*D*

_{50}in mm. Then, Khan & Ahmad (2011) asserted that is a function of the physical proprieties of the riprap stones.

Abt *et al.* (2013) compared 21 practical formulations. From their study, a power regression relationship was drawn, capable of comparing the observed values of *D*_{50} against the predicted ones for both the comprehensive dataset and its subsets over a lower range (*D*_{50} < 5.1 cm), a middle range (5.1 cm ≤ *D*_{50} ≤ 25.4 cm), and an upper range (*D*_{50} > 25.4 cm). They found that the equation proposed by Thornton *et al.* (2014), based on 102 experimental datasets and using the same units of measurement as Equation (7), provided the highest level of accuracy; in particular, Equation (8) exhibited more accurate predictions than Equation (7).

Finally, Hiller *et al.* (2018) conducted a field study of a large-scale riprap layer with *D*_{50} = 0.37 m. A corresponding experimental setup at a scale of 1:6.5 was also constructed, with a slope of 1:1.5 (vertical:horizontal), in order to investigate the stability criterion, the packing density of the riprap stones, and illustrative flow patterns. Their study revealed an interesting similarity between field and laboratory results when considering the stone-referred densimetric Froude number *F _{s,c}*.

## DATA DESCRIPTIONS

Controlling the unit discharge at the incipient motion of riprap stones can mitigate the occurrence of the overtopping phenomenon and, additionally, an accurate estimation of this variable can increase the stability of a riprap layer subject to overtopping. The stability of riprap stones depends on the maximum value of the unit discharge, the roughness height, the specific gravity, the riprap stone size, and the embankment slope (Isbash 1936; Hartung & Scheuerlein 1970; Mishra 1998; Abt *et al.* 2008). More specifically, according to previous experimental works, the unit discharge at failure (or incipient motion) of a riprap layer (*q _{f}*) is a function (*ψ*) of the bed (or embankment) slope (*S*), the median riprap stone size (*D*_{50}), the uniformity coefficient of the riprap stones (*C _{u}*), the riprap layer thickness (*t*), the stone density (*ρ _{s}*), and the water density (*ρ*) (Abt *et al.* 2013):

*q _{f}* = *ψ*(*S*, *D*_{50}, *C _{u}*, *t*, *ρ _{s}*, *ρ*) (9)

Figure 1 provides a scheme of a riprap embankment protection exposed to overtopping flow and the description of the main variables in Equation (9).

On the basis of dimensional analysis (the Buckingham theorem), Equation (9) can be expressed as (Hiller *et al.* 2018):

*F _{s,c}* = *q _{f}*/√(*g* Δ *D*_{50}³) = *Φ*(*S*, *C _{u}*, *t*/*D*_{50}) (10)

where *g* is the gravitational acceleration. In AI applications to hydraulic engineering problems, especially sediment transport, the use of dimensionless parameters has given more accurate estimations than dimensional variables (e.g., Azamathulla *et al.* 2005; Samadi *et al.* 2014; Khan *et al.* 2018; Sharafati *et al.* 2018). Furthermore, the use of a stone-referred Froude number is a reasonable choice for predicting the unit discharge of a riprap layer at the failure state, as noted in the literature (Siebel 2007).
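Assuming the stone-referred densimetric Froude number is defined as *F _{s,c}* = *q _{f}*/√(*g* Δ *D*_{50}³), consistent with Siebel (2007) and Hiller *et al.* (2018), it can be computed directly from the dimensional quantities. A minimal sketch (function and variable names are ours, not the study's), cross-checked against the Maynord (1992) row of Table 1:

```python
import math

def densimetric_froude(q_f, d50, delta, g=9.81):
    """Stone-referred densimetric Froude number: q_f / sqrt(g * delta * d50^3).

    q_f   : unit discharge at failure (m^2/s)
    d50   : median riprap stone size (m)
    delta : relative submerged density G_s - 1 (dimensionless)
    """
    return q_f / math.sqrt(g * delta * d50 ** 3)

# Cross-check against the Maynord (1992) row of Table 1:
# q_f = 0.75 m^2/s, D50 = 1.52 cm, delta = 1.65 -> F_sc close to 99.49.
print(round(densimetric_froude(0.75, 0.0152, 1.65), 1))  # 99.5
```

The agreement with the tabulated value (99.49) suggests this is indeed the dimensionless group used in Table 1.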

The three dimensionless parameters on the right-hand side of Equation (10) were used as inputs to the SVM, MARS, and RF models. The explored ranges of the dimensional variables are given in Table 1. In this work, 102 experimental data points collected from the literature were considered. The raw (i.e., unprocessed) data were taken exactly as measured in the experimental works. The datasets cover a wide range of experimental conditions, from small- to very large-scale laboratory studies, and Table 1 reports the nature of the data (e.g., laboratory or field experiments) for each literature source. The influence of the different scales on the accuracy of the AI approaches and empirical equations was assumed negligible; admittedly, this assumption can reduce the generalization capacity of the AI models, as noted in previous literature (Najafzadeh *et al.* 2018). The experimental dataset was divided into two parts: 75% of the data (76 data points) was used to train the AI models, and the remaining 25% (26 data points) was used to test them. Overall, empirical equations for predicting the discharge at the failure state are in non-dimensional form and, additionally, the graphical representation of dimensionless parameters against a non-dimensional effective parameter (i.e., a design curve) is of high interest to engineers. Accordingly, non-dimensional parameters were used to run the AI models in order to remove the effect of the input–output scale of the experimental data on model performance. The use of dimensionless parameters not only increases the applicability of traditional equations from the experimental to the field scale, but also makes the estimations more reliable. Furthermore, three important assumptions hold for the considered experimental datasets: first, the approach flow was fully turbulent; second, the effect of the channel side-walls was negligible; and third, the flow at the flow–riprap interface was fully turbulent.
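The 75/25 partition described above can be sketched as follows; the random seed and shuffling scheme are assumptions, since the paper does not state how the split was randomized:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility (our choice)

n_samples = 102                       # experimental data points in this study
indices = rng.permutation(n_samples)  # random shuffle of sample indices

n_train = 76                          # 75% of 102, as stated in the text
train_idx, test_idx = indices[:n_train], indices[n_train:]

print(len(train_idx), len(test_idx))  # 76 26
```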

Authors | *D*_{50} (cm) | *q _{f}* (m²/s) | *C _{u}* | Δ | *S* | φ (°) | *t* (cm) | *t*/*D*_{50} | *F _{s,c}* |
---|---|---|---|---|---|---|---|---|---|

Abt et al. (1987) | 2.59–5.59 | 0.21–0.66 | 1.75–2.09 | 1.72 | 0.01–0.02 | 38–40 | 7.77–16.77 | 3 | 12.15–29.20 |

Abt & Johnson (1991) | 2.60–15.70 | 0.03–0.42 | 1.75–2.30 | 1.65–1.72 | 0.01–0.20 | 38–42 | 7.54–31.20 | 2–3 | 0.92–8.13 |

Maynord (1992) | 1.52 | 0.75 | 2.07 | 1.65 | 0.002 | 36 | 2.54 | 1.67 | 99.49 |

Wittler (1994) | 8.13–8.38 | 0.103–0.291 | 1.56–5.33 | 1.52–1.70 | 0.05–0.20 | 41 | 24.39–25.14 | 3 | 1.08–3.18 |

Mishra (1998) | 27.10–65.50 | 0.204–0.929 | 1.52–1.90 | 1.65 | 0.5 | 42 | 53.11–122.48 | 1.58–1.96 | 0.23–0.43 |

Robinson et al. (1998) | 1.5–27.8 | 0.003–1.626 | 1.25–1.73 | 1.54–1.82 | 0.02–0.40 | 36–42 | 3.00–55.60 | 2 | 0.39–5.63 |

Siebel (2007) | 5.2–7.3 | 0.050–0.282 | 2 | 1.65 | 0.10–0.33 | 40–41 | 15.98–16.03 | 2.19–3.08 | 0.93–3.55 |

Thornton et al. (2008) | 9.91–12.19 | 0.467–0.697 | 1.54–1.86 | 1.65 | 0.005 | 41 | 19.82–24.38 | 2 | 3.66–4.02 |

Thornton et al. (2012) | 1.98–12.19 | 0.026–0.562 | 1.54–1.86 | 1.69 | 0.006–0.500 | 38–41 | 3.98–24.38 | 2 | 1.15–29.00 |

*Note*: φ is the angle of repose of the sediment.

The results from the empirical equations are compared with those of the proposed AI models in the testing stage. Furthermore, histograms for all the variables are illustrated in Figures 2(a)–2(f). These histograms represent the frequency distributions of the main variables controlling the processes under study, providing a compact and effective summary of the characteristics of the literature data; incidentally, this analysis might also prove useful in planning future experimental research. The number of classes was selected so as to obtain reasonable displays. In general, the number of classes mainly depends on the number of observations (although the amount of scatter or dispersion in the data also matters); typically, between 5 and 20 bins is satisfactory in most cases, and choosing a number of bins approximately equal to the square root of the number of observations often works well in practice. Since the number of observations in Figures 2(a)–2(f) is always the same (102), a fixed number of classes (equal to 9) was adopted for each variable. For each variable, the frequency represents the number of times an outcome occurs in the dataset, relative to the total number of observations. From these histograms it follows that the explored range of the embankment slope *S* is satisfactorily large (from 0.002 to 0.50), with an approximately uniform frequency distribution within it. Conversely, the frequency distribution of the other variable of particular interest, the uniformity coefficient *C _{u}*, is clearly asymmetric and positively skewed. The tail of the distribution extends to the largest value of 5.33, but the majority of tests (around 90%) were conducted with *C _{u}* around 2.0, which implies the use of bed sediments only slightly different from uniform ones (i.e., *C _{u}* < 1.5). It is worth noting that the relative riprap layer thickness, *t*/*D*_{50}, was also not adequately explored in the literature, with values almost always around 2–3, as can be seen from Table 1. Finally, Figure 2(e) shows that the specific density of the riprap layer, *G _{s}*, is practically always around 2.65, as one would expect for natural bed sediments. However, for future research, it would be interesting to test synthetic materials with *G _{s}* significantly different from 2.65, in order to definitively assess the role of the stone-referred densimetric Froude number *F _{s,c}*.

## MODELS AND METHODS

In this section, basic definitions of the SVM, MARS, and RF models are briefly introduced; more details can be found in the literature (e.g., Vapnik 1995; Giustolisi 2006). Afterwards, the proposed models are developed using the database extracted from the experimental studies.

### Support vector machine (SVM)

The SVM is a powerful supervised learning technique that provides reliable and robust predictions. SVM minimizes the expected computational error by means of the structural risk minimization (SRM) principle, thereby reducing the risk of overfitting. In machine learning, a generalized model must be selected from a finite training sample; consequently, overfitting may occur, in which the model becomes too strongly tailored to the particularities of the training dataset and generalizes poorly to new data. SRM counteracts this problem by balancing the model's complexity against its success in fitting the training data. SVM maps the input data of the training phase into a higher-dimensional feature space (Vapnik 1995; Yu *et al.* 2004; Amaranto *et al.* 2018; Antunes *et al.* 2018).

Consider a set of input–output pairs (**x**_{i}, *y _{i}*), *i* = 1, …, *N*, where *N* is the sample size of the data. These pairs are used to conduct the training stage. The regression function, *ϕ*, is generally expressed as (Komasi *et al.* 2018):

*ϕ*(**x**) = ⟨**w**, **x**⟩ + *b* (11)

where **w** and *b* are the weighting vector in the feature space (with dimension *o*) and the bias term, respectively, and ⟨·, ·⟩ denotes the inner product. Moreover, by adding the empirical risk function, Equation (11) is converted into the minimization problem (Komasi *et al.* 2018):

minimize (1/2)‖**w**‖² + *C* Σ_{i=1}^{N} (*ξ _{i}* + *ξ _{i}**) (12)

subject to *y _{i}* − ⟨**w**, **x**_{i}⟩ − *b* ≤ *ε* + *ξ _{i}*, ⟨**w**, **x**_{i}⟩ + *b* − *y _{i}* ≤ *ε* + *ξ _{i}**, *ξ _{i}*, *ξ _{i}** ≥ 0 (13)

where *C* denotes a positive constant defining the penalty for the model's computational error, and *ξ _{i}*, *ξ _{i}** are slack variables measuring the deviation of the observed (actual) values from the boundary of the *ε*-insensitive band. In SVM, quadratic programming (QP), one of the most efficient techniques, is applied to solve the non-linear optimization problem of Equations (12) and (13) with linear constraints. With the QP method, Equations (12) and (13) are rearranged as (Pham *et al.* 2018):

*ϕ*(**x**) = Σ_{i=1}^{N} (*α _{i}* − *α _{i}**) *k*(**x**_{i}, **x**) + *b* (14)

where *α _{i}* and *α _{i}** are the Lagrange multipliers, *N* is the sample size, and *k*(·, ·) is the kernel function, which is defined as the inner product of the feature-space mapping functions *Φ*:

*k*(**x**_{i}, **x**_{j}) = ⟨*Φ*(**x**_{i}), *Φ*(**x**_{j})⟩ (15)
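As a concrete illustration of a kernel function evaluation (the RBF kernel from Table 2), a kernel value can be computed directly; the *γ* value below is arbitrary, chosen only for the example:

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel: k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    diff = np.asarray(x_i, float) - np.asarray(x_j, float)
    return np.exp(-gamma * np.dot(diff, diff))

# Identical points give k = 1; distant points decay towards 0.
print(rbf_kernel([0.1, 2.0, 3.0], [0.1, 2.0, 3.0], gamma=1.0))  # 1.0
```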

The input variables in SVM are transformed through kernel function-based formulations (radial basis function, sigmoid, linear, and polynomial); in this mapping process, the kernelized inputs define a non-linear problem. The kernel functions tested in this study are shown in Table 2. Some SVM parameters need to be tuned, such as the regularization parameter (*C*), the kind of kernel function, and the kernel parameters (*r*, *d*, *γ*) (Amaranto *et al.* 2018). With reference to this study, the optimal magnitudes of the parameters for each type of kernel function are given in Table 3. All the kernel functions were tested, and their mean squared errors (MSEs) in the training stage were compared to select the best performance. Table 3 indicates that the RBF (MSE = 1.27) and polynomial (MSE = 1.41) kernels achieved comparatively lower computational errors in predicting *F _{s,c}* than the linear (MSE = 3.36) and sigmoid (MSE = 4.41) kernel functions. Since the SVM with the RBF kernel was also more accurate than that with the polynomial kernel, the RBF kernel function was adopted in this study. Moreover, when building the SVM, K-fold cross-validation was used to reduce the possibility of overfitting; various K-fold numbers (3, 5, 8, and 10) were tested and K-fold = 10 gave the best performance in terms of accuracy.

Kernel type | Formulation |
---|---|

Linear | *k*(**x**_{i}, **x**_{j}) = **x**_{i}·**x**_{j} |

Radial basis function (RBF) | *k*(**x**_{i}, **x**_{j}) = exp(−*γ*‖**x**_{i} − **x**_{j}‖²), *γ* > 0 |

Polynomial | *k*(**x**_{i}, **x**_{j}) = (*γ* **x**_{i}·**x**_{j} + *r*)^{d}, *γ* > 0 |

Sigmoid | *k*(**x**_{i}, **x**_{j}) = tanh(*γ* **x**_{i}·**x**_{j} + *r*) |

Kernel functions | Setting parameters | MSE (dimensionless variables) |
---|---|---|

Linear | *γ* = 0.0164; *C* = 149.071 | 3.36 |

Radial basis function (RBF) | *γ* = 2564.67; *C* = 58.548 | 1.27 |

Polynomial | *r* = 1.66; *γ* = 0.1312; *d* = 5; *C* = 73.061 | 1.41 |

Sigmoid | *γ* = 0.1560; *r* = 1.85; *C* = 54.210 | 4.41 |
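A kernel-tuning loop of this kind can be sketched with scikit-learn; this is an assumption (the paper does not name its software), and the data and parameter grids below are illustrative placeholders, not the study's values:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the (S, C_u, t/D50) -> F_sc training set;
# the real experimental values come from the sources in Table 1.
rng = np.random.default_rng(0)
X = rng.uniform([0.002, 1.25, 1.58], [0.50, 5.33, 3.08], size=(76, 3))
y = 0.2 * X[:, 0] ** -0.5 + X[:, 2]  # placeholder relationship, not the paper's

# RBF-kernel SVR tuned by 10-fold cross-validation over (C, gamma),
# mirroring the K-fold strategy described above (grids are illustrative).
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1.0, 10.0, 100.0], "gamma": [0.1, 1.0, 10.0]},
    cv=10,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

With real data, the cross-validated grid would be widened around promising values rather than kept this coarse.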

### Multivariate adaptive regression splines (MARS)

To enhance the accuracy of MARS, a backward pruning technique is employed to remove unnecessary basis functions (BFs) using the generalized cross-validation (GCV) criterion, computed as:

GCV = [(1/*N*) Σ_{i=1}^{N} (*y _{i}* − *ŷ _{i}*)²] / [1 − *PF*/*N*]²

where *PF* denotes the penalty factor, which is calculated as:

*PF* = *P* + *d _{e}* (*P* − 1)/2

in which *d _{e}* and *P* denote the determination parameter and the number of BFs, respectively.

The MARS model relates the input vector *O* to the output vector *T* through a sum of BFs plus an error term, which represents the specific pattern of the error predicted by the model. The function being estimated by the BFs is built from pieces that are, at the simplest, linear (or low-order polynomial) with a smooth trend. For the piecewise-linear case, the basic formulation is max(0, *x* − *u*), where a knot exists at the value *u*, so that:

max(0, *x* − *u*) = *x* − *u* if *x* > *u*, and 0 otherwise.

The MARS technique then generates a linear combination of BFs:

*f*(*x*) = *v*_{0} + Σ_{m=1}^{P} *v _{m}* BF_{m}(*x*)

in which BF_{m}(*x*) is a basis function and *v _{m}* denotes the constant coefficients of the BFs, which are estimated via the least square method (LSM).

The final model output is obtained as a linear combination of BFs. To assess the comparative significance of the input vectors and the BFs, analysis of variance (ANOVA) decomposition is applied (Haghiabi 2016). In the present investigation, 17 BFs were adjusted for the prediction of *F _{s,c}*, as listed in Table 4. Results of the ANOVA decomposition of the proposed MARS technique are given in Table 5; the GCV values in the third column provide information about the comparative significance of the corresponding ANOVA functions. The best model extracted from MARS for the prediction of *F _{s,c}* is a linear combination of these BFs, with coefficients estimated by the least square method.

BF | Equation |
---|---|

BF1 | max(0, *S* − 0.04) |

BF2 | max(0, 2 − *t*/*D*_{50}) × max(0, 0.2 − *S*) |

BF3 | max(0, *t*/*D*_{50} − 2.9) |

BF4 | max(0, *S* − 0.04) × max(0, *t*/*D*_{50} − 2.9) |

BF5 | max(0, *S* − 0.04) × max(0, 2.9 − *t*/*D*_{50}) |

BF6 | max(0, 0.05 − *S*) |

BF7 | max(0, *S* − 0.05) × max(0, *t*/*D*_{50} − 2.9) |

BF8 | max(0, *S* − 0.05) × max(0, 2.9 − *t*/*D*_{50}) |

BF9 | max(0, 2.9 − *t*/*D*_{50}) × max(0, *C*_{u} − 1.65) |

BF10 | max(0, 2.9 − *t*/*D*_{50}) × max(0, 1.65 − *C*_{u}) |

BF11 | max(0, *C*_{u} − 1.54) |

BF12 | max(0, *S* − 0.1) |

BF13 | max(0, 0.1 − *S*) |

BF14 | max(0, *C*_{u} − 2.3) |

BF15 | max(0, 2.3 − *C*_{u}) × max(0, *t*/*D*_{50} − 2.39) |

BF16 | max(0, 2.3 − *C*_{u}) × max(0, 2.39 − *t*/*D*_{50}) |

BF17 | max(0, 1.75 − *C*_{u}) × max(0, 0.04 − *S*) |
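The hinge-type BFs in Table 4 are straightforward to evaluate; a minimal sketch (helper names are ours) for BF1 and BF2:

```python
def hinge(x, knot, sign=+1.0):
    """Piecewise-linear MARS basis function: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))

# BF1 and BF2 from Table 4 (knots as tabulated; the coefficients of the
# final MARS model are fitted by least squares and are not shown here).
def bf1(S):
    return hinge(S, 0.04)                                      # max(0, S - 0.04)

def bf2(t_over_d50, S):
    return hinge(t_over_d50, 2.0, -1.0) * hinge(S, 0.2, -1.0)  # max(0, 2 - t/D50) * max(0, 0.2 - S)

print(round(bf1(0.10), 2))       # 0.06
print(round(bf2(1.5, 0.1), 2))   # 0.05
```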

MARS technique by dimensionless variables

Function | Standard deviation | GCV | Number of basis functions | Variables |
---|---|---|---|---|

1 | 6.54 | 261.97 | 1 | *t*/*D*_{50} |

2 | 3.81 | 74732.21 | 4 | *S* |

3 | 3.48 | 110.35 | 2 | *C _{u}* |

4 | 10.96 | 601.93 | 5 | *t*/*D*_{50} and *S* |

5 | 5.16 | 106.58 | 4 | *t*/*D*_{50} and *C _{u}* |

6 | 2.05 | 32.28 | 1 | *S* and *C _{u}* |

Intrinsically, the MARS technique is capable of producing a polynomial (quadratic) regression based on spline concepts. However, in this study, all 17 BFs have simple formulations, and all the input variables used in them (*S*, *C _{u}*, *t*/*D*_{50}) are easy to acquire. As will be shown in the following, the statistical benchmarks indicated that the MARS technique performed more satisfactorily than the empirical equations. Additionally, the results of the MARS model adapt flexibly to changes in the ranges of the inputs. In this regard, two important issues were considered when running MARS: (1) preserving the physical meaning (consistency) of the results and (2) obtaining the highest level of accuracy in comparison with the empirical equations. Finally, the proposed MARS model is not meant to replace the traditional equations, in which the physical essence is most explicit; rather, the joint use of this AI model and empirical models can lead to highly reliable results.

### Random forest (RF)

In the RF model, the input variables are converted to splitting parameters and their corresponding values are obtained. At this step, the impurity of the child nodes is evaluated and the best splitting parameter is selected among the input variables using the Gini index (GI). This index is a benchmark of how much each input variable contributes to the homogeneity (or impurity) of the nodes and leaves in the resulting RF model. Each time a particular variable is used to split a node, the GI of the child nodes is computed and compared to that of the original node. Furthermore, in the tree structure of the RF approach, the final decision is obtained by averaging the outputs, after assessing the fit of the single trees within the bagging technique. The bias of the bagged trees equals that of the single trees, whereas the variance decreases as the correlation among the trees decreases (Antunes *et al.* 2018). The development of an RF relies on a tree-growing technique governed by a random vector (ϕ), such that the tree estimator *λ*(*X*, ϕ) yields numerical predictions. To evaluate the performance of RF, the mean squared error (E) associated with each numerical estimator *λ*(*X*) is expressed as (Breiman 2001):

E = E_{X,Y}[*Y* − *λ*(*X*)]²

Basically, the larger the forest, the more accurate the results. However, the benefit of growing additional trees diminishes: beyond a certain point, the gain in precision from training more trees becomes smaller than the computational cost of training them. RFs are ensemble techniques, implying an average over a number of trees, in the same way one might estimate the average of a real-valued random variable from a sample. In this case, for 102 data series, a forest with 10 trees can perform more accurately than one with 500 trees because of statistical variance; if accuracy were to degrade systematically as trees are added, something would be wrong with the implementation. Typical values for the number of trees are 10, 30, or 100, and only in very few applications does the benefit of more than 300 trees outweigh the cost of training them. In this study, a forest of 10 trees achieved the highest accuracy among the tree counts tested.
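The averaging step at the heart of bagging can be caricatured in a few lines of Python. Here each "tree" is reduced to the mean of one bootstrap resample, which is an assumption made purely for illustration; a real RF fits a full decision tree per resample, but the ensemble rule (average over trees) is the same.

```python
import random

def bagged_mean(sample, n_trees, rng):
    """Toy bagging ensemble: each 'tree' is the mean of one bootstrap
    resample; the ensemble output is the average over all trees."""
    preds = []
    for _ in range(n_trees):
        boot = [rng.choice(sample) for _ in sample]   # bootstrap resample
        preds.append(sum(boot) / len(boot))           # one tree's prediction
    return sum(preds) / n_trees                       # ensemble average
```

Running this with 10 versus 500 trees on a small sample shows both estimates landing near the sample mean, illustrating why, past a modest forest size, extra trees buy little accuracy.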

## RESULTS AND DISCUSSION

*N* is the number of observations, and the other symbols are as previously defined. In terms of quantification, RMSE is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data; in general, a lower RMSE is better. SI is calculated by dividing RMSE by the mean of the observations; it expresses RMSE as a percentage of the mean observation, i.e., the expected relative error of the parameter.
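The four statistical measures used throughout this study follow directly from their standard definitions and can be sketched as a small self-contained Python function (the function name is ours, not from the paper):

```python
import math

def goodness_of_fit(obs, pred):
    """Return (R, RMSE, MAE, SI) for paired observed/predicted values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    si = rmse / mean_obs                       # scatter index: RMSE / mean(obs)
    # Pearson correlation coefficient R
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((o - mean_obs) ** 2 for o in obs) *
                        sum((p - mean_pred) ** 2 for p in pred))
    return r, rmse, mae, si
```

Note that R is insensitive to constant offsets (a uniformly biased model can still have R = 1), which is why RMSE, MAE, and SI are needed alongside it, as the discussion below confirms.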

### Quantitative comparisons of the proposed AI models

Quantitative comparisons to investigate the performance of SVM-RBF, MARS, and RF were carried out for both the training and testing phases. Table 6 presents the statistical results of the models' performance. In the training phase, the values of R (0.99) and RMSE (1.11) given by MARS indicated a higher level of precision than SVM-RBF (R = 0.99 and RMSE = 1.62) and RF (R = 0.98 and RMSE = 3.61). Moreover, the MARS technique, introduced as a set of linear and quadratic relationships, estimated *F _{s,c}* with MAE = 0.27 and the lowest SI (0.25), compared with SVM-RBF (MAE = 0.31 and SI = 0.37) and RF (MAE = 0.23 and SI = 0.81). Table 6 also indicates that the proposed SVM technique with RBF kernel function has a higher level of accuracy than the RF approach. Overall, the R-values for the training phase differed only marginally from one another and, consequently, may not be a suitable basis for quantitative comparison of performance, whereas the other statistical parameters provide more informative measures of the models' performance.

Training stage

| Model | R | RMSE | MAE | SI |
|---|---|---|---|---|
| SVM-RBF | 0.99 | 1.62 | 0.31 | 0.37 |
| MARS | 0.99 | 1.11 | 0.27 | 0.25 |
| RF | 0.98 | 3.61 | 0.23 | 0.81 |

Testing stage

| Model | R | RMSE | MAE | SI |
|---|---|---|---|---|
| *AI models* | | | | |
| SVM-RBF | 0.98 | 1.17 | 0.58 | 0.37 |
| MARS | 0.92 | 2.32 | 0.31 | 0.75 |
| RF | 0.89 | 1.93 | 0.37 | 0.63 |
| *Empirical equations* | | | | |
| Olivier (1967) | 0.94 | 1.45 | 0.36 | 0.48 |
| Abt & Johnson (1991) | 0.91 | 2.41 | 0.37 | 0.77 |
| Sommer (1997) | 0.94 | 6.01 | 1.77 | 1.45 |
| Robinson *et al.* (1998) | −0.58 | 4.57 | 0.47 | 1.32 |
| Dornack (2001) | 0.93 | 3.95 | 0.53 | 1.18 |
| Siebel (2007) | 0.94 | 1.79 | 0.37 | 0.52 |
| Khan & Ahmad (2011) | 0.95 | 1.67 | 0.43 | 0.53 |
| Thornton *et al.* (2014) | 0.95 | 1.38 | 0.75 | 0.40 |


Through the testing stage, the SVM-RBF model gave a more accurate prediction of *F _{s,c}* with regard to RMSE (1.17) and SI (0.37) than MARS (RMSE = 2.32 and SI = 0.75) and RF (RMSE = 1.93 and SI = 0.63). Similarly, the R-values indicated a slight superiority of SVM-RBF over the other two AI models. With respect to R and MAE, the MARS approach, with R = 0.92 and MAE = 0.31, predicted *F _{s,c}* more accurately than RF (R = 0.89 and MAE = 0.37). Even though the MARS model provided stone-referred densimetric Froude number, *F _{s,c}*, values with a relatively lower accuracy level than SVM-RBF, Equation (21) is more practicable and easier to use than the SVM-RBF and RF approaches.

### Qualitative comparisons of the proposed AI models

Figures 3(a)–3(f) illustrate the graphical performance of the AI models used in the current investigation at both the training and testing phases. At the training stage, the SVM-RBF and MARS techniques had the best performance for the extreme value of *F _{s,c}* = 99.49 (Figures 3(a) and 3(b)), while the RF technique showed a relatively large underestimation (Figure 3(c)). For *F _{s,c}* around 30, the SVM-RBF and RF models behaved similarly, showing slight underprediction within the allowable error range, whereas the MARS approach showed both underprediction and overprediction. Furthermore, the three AI models performed best for *F _{s,c}* < 10. At the testing stage, Figure 3 indicates that, for observed values of *F _{s,c}* between 1 and 2, SVM-RBF overpredicted *F _{s,c}* remarkably (Figure 3(d)). Figure 3(e) shows that MARS had the best performance for *F _{s,c}* < 5; moreover, MARS underpredicted and overpredicted in the ranges of *F _{s,c}* 5–10 and 10–15, respectively. From Figure 3(f), for *F _{s,c}* < 2, the RF model had a relatively acceptable performance with slight overprediction; for *F _{s,c}* just over 6, RF showed remarkable overprediction, whereas for *F _{s,c}* of approximately 10 it underpredicted.

### Comparative study of the empirical equations' performance

In this section, the efficiency of the considered empirical equations (Equations (1)–(8)) was investigated using the testing datasets. According to Table 6, Equation (1), suggested by Olivier (1967), had a clear superiority in estimating *F _{s,c}* over the other empirical equations, with RMSE = 1.45 and SI = 0.48. In contrast, Equation (3), proposed by Sommer (1997), predicted *F _{s,c}* with a higher computational error than the other equations, showing significant overprediction with RMSE = 6.01 and SI = 1.45. Moreover, Siebel's (2007) equation achieved the second rank in accuracy, with MAE = 0.37 and SI = 0.52. With respect to the RMSE and SI criteria, Siebel's (2007) equation (Equation (6)) performed better than Equation (4) by Robinson *et al.* (1998) (RMSE = 4.57 and SI = 1.32) and almost the same as Equation (7) by Khan & Ahmad (2011) (RMSE = 1.67 and SI = 0.53). According to Table 6, Equation (2), proposed by Abt & Johnson (1991), provided *F _{s,c}* predictions with relatively higher accuracy than Equation (5) by Dornack (2001) (RMSE = 3.95 and SI = 1.18). In addition, the RMSE and SI values obtained by Thornton *et al.*'s (2014) equation demonstrate that it estimates *F _{s,c}* more accurately than Dornack's (2001) equation.

In terms of qualitative comparisons, Figures 4(a)–4(h) show the performance of the empirical equations, with significant over- and underpredictions. As can be seen in Figure 4(a), Olivier's (1967) equation (Equation (1)) had a permissible level of performance, whereas Abt & Johnson's (1991) equation slightly overpredicted the lower values of *F _{s,c}*, as illustrated in Figure 4(b).

*F _{s,c}* values by Sommer's (1997) equation suffered from remarkable overpredictions (Figure 4(c)); on the contrary, Figure 4(d) illustrates the opposite trend for Robinson *et al.*'s (1998) equation. As shown in Figure 4(e), Dornack's (2001) equation exhibits large underpredictions for *F _{s,c}* greater than 2. In fact, Equation (5), proposed by Dornack (2001), depends only on the slope of the riprap layer, with a range of *S* from 0.29 to 0.67. According to Figures 4(f) and 4(g), both Siebel's (2007) and Khan & Ahmad's (2011) equations had comparatively acceptable, though still improvable, performance. Ultimately, Figure 4(h) shows that the equation by Thornton *et al.* (2014) is prone to relatively high overpredictions for *F _{s,c}* < 4.

In the final analysis, therefore, the equation proposed by Sommer (1997) appears quite conservative. Conversely, the equations proposed by Robinson *et al.* (1998) and Dornack (2001) lead to considerable underpredictions already from values of *F _{s,c}* > 2 (i.e., low *D _{50}* values). Perhaps this is linked to the objectives pursued by these authors in their studies: rock-fill dam spillways, in the case of Dornack (2001), and rock chutes, in the case of Robinson *et al.* (1998). Both cases imply a focus on high *D _{50}* values and hence more restricted values of *F _{s,c}*. To a lesser extent, the equation suggested by Abt & Johnson (1991) also leads to significant underpredictions, but only for *F _{s,c}* > 10. A plausible reason may be the experimental range of *F _{s,c}* values explored by the authors (from 0.92 to 8.13, as shown in Table 1), on which their equation was calibrated.

_{s,c}## PARAMETRIC STUDY

In this part of the research, the effects of *S* on *F _{s,c}* were investigated for different ranges of *S* itself (Table 7). As regards the RMSE and SI criteria, for *S* values between 0.002 and 0.080, the SVM-RBF and MARS models had the same performance in predicting *F _{s,c}*. Conversely, RF showed lower accuracy (RMSE = 6.70 and SI = 0.52) than the other two AI techniques. For *S* ranging from 0.100 to 0.167, the RF model achieved higher accuracy (RMSE = 0.33 and SI = 0.17) than SVM-RBF (RMSE = 0.84 and SI = 0.45) and MARS (RMSE = 0.53 and SI = 0.28). For slope values in the ranges 0.20–0.25 and 0.30–0.50, the RF technique behaved similarly to the 0.10–0.17 range. In addition, Figures 5(a)–5(c) indicate that all the proposed AI models predicted a downward trend of *F _{s,c}* versus *S*. This trend is in good agreement with the experimental findings of several researchers: experimental studies by Knauss (1979), Whittaker & Jäggi (1986), Palt (2002), and Siebel (2007) were conducted for *F _{s,c}* = 0.5–7.0 and *S* = 0.005–0.550. From these ranges and the performance of the AI models, it can be inferred that the simulated variations of *F _{s,c}* versus *S* are rational and preserve the consistency of the results.

| AI models | S = 0.002–0.008 | S = 0.10–0.17 | S = 0.20–0.25 | S = 0.30–0.50 |
|---|---|---|---|---|
| SVM-RBF | RMSE = 2.90, SI = 0.23 | RMSE = 0.84, SI = 0.45 | RMSE = 0.46, SI = 0.33 | RMSE = 0.42, SI = 0.43 |
| MARS | RMSE = 2.90, SI = 0.24 | RMSE = 0.53, SI = 0.28 | RMSE = 0.44, SI = 0.32 | RMSE = 0.39, SI = 0.40 |
| RF | RMSE = 6.70, SI = 0.52 | RMSE = 0.33, SI = 0.17 | RMSE = 0.27, SI = 0.20 | RMSE = 0.38, SI = 0.33 |


In addition to the slope *S*, the control exerted by the uniformity coefficient of the riprap stones, *C _{u}*, and by the relative thickness of the riprap layer, *t*/*D _{50}*, over the stone-referred densimetric Froude number *F _{s,c}* was explored. This analysis is based on the MARS model proposed in this study (Equation (21)), varying either *C _{u}* or *t*/*D _{50}* while keeping the other independent variables constant. In particular, it was found that the greater *C _{u}* (i.e., the non-uniformity of the riprap material), the greater the resistance of the riprap layer to erosion. This result is in harmony with some empirical equations from the literature (e.g., Equations (7) and (8)), but the dependence of *F _{s,c}* on *C _{u}* would appear more reliable in this study because it is based on a much more wide-ranging dataset. An analogous trend was found for *t*/*D _{50}* (i.e., the greater *t*/*D _{50}*, the greater the resistance of the riprap layer to erosion), as is reasonable to expect from a physical point of view. This result is, notably, in contradiction with Equations (7) and (8) from the literature, according to which the probability of riprap failure would, counterintuitively, increase with increasing thickness *t* of the riprap layer.

## EVALUATION OF THE PROPOSED TECHNIQUES USING DISCREPANCY ANALYSIS

On the basis of Equation (27), if DR is exactly (or roughly) equal to 1, the estimated *F _{s,c}* values coincide with the observed *F _{s,c}* values. If DR is larger than 1, the AI model overpredicts the *F _{s,c}* values; if DR is smaller than 1, the model underpredicts them (Noori *et al.* 2009).
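Assuming DR is the ratio of predicted to observed values (consistent with the over/underprediction reading above), the summary statistics reported in Table 8 can be computed as follows; the function name is ours, for illustration only.

```python
def dr_stats(obs, pred):
    """Discrepancy-ratio statistics: DR = predicted / observed, where
    DR = 1 is a perfect estimate, DR > 1 overprediction, DR < 1
    underprediction."""
    dr = [p / o for o, p in zip(obs, pred)]
    n = len(dr)
    avg = sum(dr) / n
    var = sum((d - avg) ** 2 for d in dr) / n   # population variance
    return {"average": avg, "minimum": min(dr),
            "maximum": max(dr), "variance": var}
```

A low variance of DR (as obtained by MARS in Table 8) means the ratio stays near its average across the whole testing set, i.e., the error is consistent rather than erratic.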

In the current investigation, the testing-stage results of the AI models and empirical equations were used to calculate the DR values. Table 8 reports several statistical parameters of the DR values. From Table 8, the MARS approach achieved the minimum variance compared with the SVM-RBF and RF models. Furthermore, the average of the DR values calculated with Sommer's (1997) equation (Equation (3)) indicates the lowest accuracy among the empirical equations. In the case of Olivier's (1967) equation, the average and variance of the DR values show relatively better performance than Dornack's (2001) equation. Table 8 indicates that the average and variance for Abt & Johnson's (1991) equation are practically the same as for Olivier's (1967) equation. Moreover, the average of the DR values given by Thornton *et al.*'s (2014) equation indicates higher accuracy than Equation (3) by Sommer (1997). For qualitative comparisons of the DR index, the variations of DR versus *S* for the AI techniques and the conventional equations are shown in Figures 6(a) and 6(b), respectively. Figure 6(a) shows that almost all the DR values for the AI models lie between 0.5 and 1.5, and that all the proposed models exhibited only modest underprediction or overprediction; for instance, the AI models overpredicted the *F _{s,c}* values for *S* around 0.125. Furthermore, most of the points are concentrated around the perfect-agreement line DR = 1. Figure 6(b) clearly illustrates that Equation (3) by Sommer (1997) predicted the *F _{s,c}* values with remarkable overprediction compared with the other traditional equations. This result corroborates what has previously been said about the rather conservative nature of Sommer's (1997) equation.

| DR statistics | Average | Minimum | Maximum | Variance |
|---|---|---|---|---|
| *AI models* | | | | |
| SVM-RBF | 1.04 | 0.31 | 2.53 | 0.32 |
| MARS | 1.15 | 0.44 | 2.53 | 0.12 |
| RF | 1.22 | 0.57 | 4.00 | 0.42 |
| *Empirical equations* | | | | |
| Olivier (1967) | 1.20 | 0.38 | 3.00 | 0.44 |
| Abt & Johnson (1991) | 1.19 | 0.42 | 3.56 | 0.44 |
| Sommer (1997) | 2.70 | 1.58 | 8.19 | 2.24 |
| Robinson *et al.* (1998) | 0.64 | 0.02 | 3.01 | 0.39 |
| Dornack (2001) | 0.95 | 0.12 | 4.36 | 0.86 |
| Siebel (2007) | 0.76 | 0.38 | 2.41 | 0.21 |
| Khan & Ahmad (2011) | 1.10 | 0.22 | 3.92 | 0.63 |
| Thornton *et al.* (2014) | 1.59 | 0.26 | 3.93 | 0.75 |


## CONCLUSIONS

This study aimed to evaluate the erosion-critical stone-referred densimetric Froude number (*F _{s,c}*) at the failure condition of the riprap layer for various streambank slopes using three data-mining approaches: MARS, SVM-RBF, and RF. Five input variables were extracted from experimental works in order to develop the AI approaches. Generally, the following conclusions can be drawn from the current investigation:

- Statistical performance at both the training and testing stages demonstrated that the SVM-RBF model provided *F _{s,c}* values with a higher level of accuracy than the MARS model (a set of BFs) and the RF technique. Furthermore, Equation (21), given by the MARS technique, was a more precise soft-computing tool than the other regression-based models.
- Results of the empirical equations indicated that Equations (3)–(5) performed worse than the proposed machine learning approaches in terms of all the statistical criteria considered in this study. Equations (1), (6), and (7) exhibited relatively more acceptable precision in estimating *F _{s,c}* than Equations (3)–(5).
- Quantitative and qualitative variations of *F _{s,c}* versus the slope *S* indicated that the findings of the AI approaches were in permissible agreement with the preceding experimental investigations carried out by Siebel (2007), preserving the consistency of the results.
- DR analysis proved that the *F _{s,c}* values predicted by the AI techniques lie within a permissible error bound, in contrast with the empirical equations, which produced large over- or underestimations.

## REFERENCES
