Riprap stones are frequently used to protect rivers and channels against erosion. Many empirical equations have been proposed to estimate the unit discharge at the failure of riprap layers. However, these equations lack generality because they were derived from a limited range of experimental variables. To overcome these shortcomings, support vector machine (SVM), multivariate adaptive regression splines (MARS), and random forest (RF) techniques have been applied in this study to estimate the stone-referred densimetric Froude number at the incipient motion of riprap stones. Riprap stone size, streambank slope, uniformity coefficient of the riprap stones, specific density of the stones, and thickness of the riprap layer have been considered as controlling variables. The performance of the artificial intelligence (AI) models has been assessed quantitatively with several statistical measures, including the coefficient of correlation (R), root mean square error (RMSE), mean absolute error (MAE), and scatter index (SI). The SVM model with a radial basis function (RBF) kernel performed better (SI = 0.37) than the MARS (SI = 0.75) and RF (SI = 0.63) techniques, and the proposed AI models performed better than existing empirical equations. A parametric study demonstrated that the erosion-critical stone-referred Froude number (Fs,c) is mainly controlled by the streambank slope.
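The four statistics named above can be computed as in the following minimal Python sketch. The formulas are not given in the text, so the code assumes the definitions most common in hydraulic studies, in particular SI = RMSE divided by the mean of the observed values. The function name `evaluation_metrics` is illustrative.

```python
import math

def evaluation_metrics(observed, predicted):
    """Return R, RMSE, MAE and SI for paired observed and predicted values.

    Assumes SI = RMSE / mean(observed), a common definition of the scatter index.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Pearson correlation coefficient between observations and predictions
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    # Error statistics
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    si = rmse / mean_o
    return {"R": r, "RMSE": rmse, "MAE": mae, "SI": si}

metrics = evaluation_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```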

Rock armor (commonly known as riprap) has been used in hydraulic engineering to protect structures exposed to scour and erosion. These include bridge piers, grade-control structures, bridge abutments, culvert outlets, end sills of stilling basins, ski-jump bucket spillways, dam embankments, and channel beds (e.g., Borah 1989; Froehlich 1995; Lauchlan & Melville 2001; Dey & Barbhuiya 2004; Eli & Gray 2008; Hiller et al. 2018). Riprap stability is a key design criterion. The unit discharge of the overtopping flow, the gradation and shape of the riprap stones, and the bed and bank slopes of waterways strongly affect riprap stability (e.g., Ullmann & Abt 2000; Thornton et al. 2008; Eli & Gray 2008). Underestimating these variables may increase the risk of scour or failure of the armor layer. Overestimating them, in contrast, increases the cost of the project (Thornton et al. 2014). For instance, accurately estimating the stone size enhances the stability of ripraps, especially when they are vulnerable to overtopping (Thornton et al. 2014). Hence, many studies have investigated riprap stability on steep slopes for different hydraulic conditions and riprap stone gradations (Hartung & Scheuerlein 1970; Abt et al. 1987; Wittler & Abt 1997; Ullmann & Abt 2000; Gallegos 2001; Eli & Gray 2008; Hiller et al. 2018, 2019).

To quantify the overtopping phenomenon, many empirical equations derived from experimental observations have been proposed. They estimate the unit discharge at the failure of a riprap layer for various streambank slopes and bed sediment properties (Thornton et al. 2014). However, these equations lack generality because of the limited range of experimental variables, and hence do not extend to a wide range of hydraulic conditions (Thornton et al. 2014; Najafzadeh et al. 2018). Moreover, these empirical relationships are based on traditional regression approaches. Such approaches cannot robustly capture the non-linear relationships between the key variables at the incipient motion of riprap stones.

Because of these restrictions, artificial intelligence (AI) approaches have recently been employed to estimate the riprap stone size more accurately. Najafzadeh et al. (2018) used evolutionary algorithm-based formulations to predict the size of riprap stones in overtopping flows. They found that these AI models provided more accurate predictions than the empirical equations.

Recently, AI-based data classification and machine learning methods have been employed for a range of water engineering tasks:

- forecasting groundwater tables (Giustolisi 2006; Amaranto et al. 2018)
- evaluating the condition of sewer networks (Caradot et al. 2018)
- estimating chlorophyll-a concentration in surface waters (Yajima & Derot 2017)
- predicting short-term water demand (Antunes et al. 2018)
- estimating suspended sediment concentration in rivers (Babovic 2009)
- forecasting run-off (Babovic 2005; Adamowski et al. 2012; Meshgi et al. 2015)
- predicting the standardized precipitation index (Komasi et al. 2018)
- estimating the shear strength of soil (Pham et al. 2018)
- estimating longitudinal dispersion coefficients in rivers (Haghiabi 2016)

These studies indicate that SVM, MARS, and RF are among the most robust machine learning models applied to water engineering problems, and they were selected here for their advantages. The most notable feature of SVM is its ability to generalize well from a small training dataset. In addition, unlike artificial neural networks, SVM does not get stuck in local optima. RF is generally fast to build and can automatically select the relevant variables from a large number of inputs, and for many datasets it produces highly accurate classifiers. MARS does not require an assumed functional relationship between the independent and dependent variables, and the relationships it yields can include both additive and interaction terms. For the incipient motion of riprap under overtopping flows, more than 20 empirical equations have been obtained from experimental investigations. Each was derived from specific experimental conditions and a limited range of experimental variables.
Khan & Ahmad (2011) collected previous experimental data and fitted a multiple regression equation to all the available datasets. Their equation was more precise than previous empirical equations. However, Khan & Ahmad (2011) only validated it on the data used to derive it: it was not subsequently tested against new experimental datasets, and its accuracy was not otherwise checked. Consequently, its generalization capability cannot be established, because the data were not randomized and partitioned into calibration (training) and validation (testing) subsets.

Over the last half-century, data acquisition systems have been employed to obtain information about many processes, and with scientific advances they have provided increasingly accurate and reliable results. Furthermore, the data they produce can be analyzed by machine learning models to recognize the behavioral patterns of various engineering phenomena more reliably. Machine learning models can therefore help overcome some limitations of empirical techniques in predicting a variety of variables.

This study does not claim to explain the overtopping phenomenon comprehensively. Rather, it aims to:

(i) emphasize the limitations of the current empirical equations;
(ii) highlight the key dimensionless variables controlling the overtopping phenomenon; and
(iii) provide predictive models that, although structurally more complex, are more accurate.

To the best of the authors' knowledge, SVM, MARS, and RF have not previously been used to design the size of riprap stones, so this study also contributes to AI modeling methodology. There are too few reliable empirical equations for estimating the unit discharge at failure, which is used to design the riprap stone size. This shortage can jeopardize the stability of bed slopes in waterways, rivers, and channels, and regression-based equations still show a high level of inaccuracy.

Moreover, the results of the AI models must be connected with the physics of the problem, and the only way to establish this link is to check the AI results for consistency. The general patterns between input and output variables should therefore be examined conceptually, so that their agreement with the experimental studies in the literature can be verified. This study suggests how AI models can be reliable techniques that recognize the general input–output behavior of the system. Accordingly, experimental datasets from the literature are used to assess the performance of SVM, MARS, and RF in predicting the dimensionless overtopping discharge at riprap failure. A parametric study is then conducted to illustrate the consistency of the AI models' results for riprap design.
Finally, results from the AI techniques are compared to those obtained from the empirical equations.

Riprap design approaches are used to protect hydraulic structures exposed to erosion. This section surveys the literature. The original form of each literature model is preserved, even where this entails imperial and US customary units, because transforming the original formulas could make them harder for experts and readers to identify.

A basic design method relies on hydraulic calculations to define the flow-field properties. The average riprap stone size can then be obtained from design curves in engineering guidelines (Walters 1982). A wide range of experimental and field studies has been conducted to develop engineering manuals that fully account for erosion caused by floods. Olivier (1967) experimentally investigated a protective method to support the body of a rock-fill dam, considering the influence of the riprap on the seepage flow and on the water flow profile through the dam. He suggested the following empirical equation for the unit discharge at the incipient motion of riprap stones:
(1)
in which qf, S, D50, and Gs denote, respectively:

- the unit discharge at failure (or incipient motion) of a riprap layer;
- the bed (or embankment) slope;
- the median riprap stone size;
- the specific density of the riprap stones.

Specifically, Gs is the ratio of the riprap stone density (ρs) to the water density (ρ). In Equation (1), qf and D50 are in ft²/s and ft, respectively.
Abt & Johnson (1991) constructed a near-full-scale physical model to investigate the protection of embankments exposed to overtopping flows. In their experiments, the embankment slopes varied from 1 to 20% and were protected by riprap layers with median stone sizes from 1 to 6 inches. They concluded that incipient motion of the riprap stones occurred at roughly 74% of the unit discharge at failure, and they derived the following equation:
(2)
where D50 and qf are in feet and ft²/s, respectively.
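Equations (1) and (2) are kept in US customary units, so their inputs and outputs must be converted before they can be compared with SI-based formulas. The sketch below shows the conversions: 1 ft = 0.3048 m exactly, so 1 ft²/s = 0.09290304 m²/s. It also applies the roughly 74% ratio between incipient-motion and failure discharge reported by Abt & Johnson (1991). The helper names and the example values are hypothetical.

```python
FT_TO_M = 0.3048                      # exact by definition
FT2_PER_S_TO_M2_PER_S = FT_TO_M ** 2  # 1 ft^2/s = 0.09290304 m^2/s
INCIPIENT_TO_FAILURE_RATIO = 0.74     # approximate ratio reported by Abt & Johnson (1991)

def ft_to_m(length_ft):
    """Convert a length (e.g. D50) from feet to metres."""
    return length_ft * FT_TO_M

def ft2s_to_m2s(q_ft2s):
    """Convert a unit discharge from ft^2/s to m^2/s."""
    return q_ft2s * FT2_PER_S_TO_M2_PER_S

def incipient_discharge(q_failure, ratio=INCIPIENT_TO_FAILURE_RATIO):
    """Estimate the unit discharge at incipient motion from that at failure."""
    return ratio * q_failure

# Hypothetical example: D50 = 6 in (0.5 ft) and q_f = 2.0 ft^2/s
d50_m = ft_to_m(0.5)
q_f_m2s = ft2s_to_m2s(2.0)
print(d50_m, q_f_m2s, incipient_discharge(q_f_m2s))
```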
Sommer (1997) proposed an empirical (dimensionless) relationship for the unit discharge at failure based on the embankment slope:
(3)
Robinson et al. (1998) designed a rock chute lined with angular riprap stones. From the geometric characteristics of the riprap layer, they derived the following empirical equations for the unit discharge at failure:
(4)
where the median riprap stone size (D50) is in mm and qf is in m²/s.
For rock-fill dam spillways, Dornack (2001) proposed an empirical equation to estimate Fs,c. In his experiments, the riprap slope varied from 0.29 to 0.67, D50 ranged from 0.03 to 0.05 m, and the stone density ρs was 2,610 kg/m³. The following relationship was recommended:
(5)
where:

- Fs,c is the stone-referred densimetric Froude number;
- the angle appearing in Equation (5) is the slope angle;
- Δ = Gs − 1 = (ρs − ρ)/ρ.

The notation Fs,c used hereafter is consistent with the definition given by Siebel (2007).
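The quantities Δ and Fs,c can be evaluated as below. The sketch assumes the commonly used stone-referred form Fs,c = qf/√(gΔD50³) in SI units. This form reproduces, for example, the value of about 99.5 listed for Maynord (1992) in Table 1 (qf = 0.75 m²/s, D50 = 1.52 cm, Δ = 1.65). A water density of 1,000 kg/m³ is also assumed, and the function names are illustrative.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2
RHO_WATER = 1000.0  # assumed water density, kg/m^3

def relative_density(rho_s, rho=RHO_WATER):
    """Delta = Gs - 1 = (rho_s - rho) / rho."""
    return (rho_s - rho) / rho

def stone_froude_number(q_f, d50, delta, g=G):
    """Assumed definition Fs = q_f / sqrt(g * Delta * D50**3); q_f in m^2/s, d50 in m."""
    return q_f / math.sqrt(g * delta * d50 ** 3)

delta_dornack = relative_density(2610.0)              # stones used by Dornack (2001)
fs_maynord = stone_froude_number(0.75, 0.0152, 1.65)  # Maynord (1992) values from Table 1
print(delta_dornack, round(fs_maynord, 1))
```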
Later, Siebel (2007) used a large-scale physical model to derive a regression-based equation for the unit discharge under overtopping conditions:
(6)
In Equation (6), D50 and qf are in cm and L/(m·s), respectively. From previous experimental data, Khan & Ahmad (2011) derived the following regression-based equation for the unit discharge at riprap failure:
(7)
where Cu is the uniformity coefficient of the riprap stones and t (in mm) is the thickness of the riprap layer. qf is in m²/s and D50 in mm. Khan & Ahmad (2011) then asserted that is a function of the physical properties of the riprap stones.
Using comprehensive datasets compiled from the literature, Abt et al. (2013) compared 21 practical formulations. They fitted power regression relationships between observed and predicted values of D50, both for the full dataset and for three size ranges:

- lower range: D50 < 5.1 cm;
- middle range: 5.1 cm ≤ D50 ≤ 25.4 cm;
- upper range: D50 > 25.4 cm.

They found that the following equation, proposed by Thornton et al. (2014) on the basis of 102 experimental data points:
(8)
provided the highest accuracy, with the same units of measurement as Equation (7). In particular, Equation (8) gave more accurate predictions than Equation (7).
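The three size classes used by Abt et al. (2013) translate directly into a small helper, sketched below under the hypothetical name `d50_range_class`. The boundaries of 5.1 cm and 25.4 cm correspond to about 2 in and exactly 10 in.

```python
def d50_range_class(d50_cm):
    """Classify a median stone size (cm) into the ranges used by Abt et al. (2013)."""
    if d50_cm < 5.1:
        return "lower"
    if d50_cm <= 25.4:       # 5.1 cm <= D50 <= 25.4 cm
        return "middle"
    return "upper"

for d50 in (2.59, 12.19, 65.50):
    print(d50, d50_range_class(d50))
```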

Finally, Hiller et al. (2018) conducted a field study of a large-scale riprap layer with D50 = 0.37 m. They complemented it with a corresponding 1:6.5-scale laboratory model with a slope of 1:1.5 (vertical:horizontal) to investigate the stability criterion, the packing density of the riprap stones, and the flow patterns. When expressed in terms of the stone-referred densimetric Froude number Fs,c, their field and laboratory results proved remarkably similar.

The overtopping phenomenon depends mainly on the physical properties of the sediments, the hydraulic gradient, and the flow discharge (Abt et al. 2008). Accurately estimating the unit discharge at the incipient motion of riprap stones helps mitigate the effects of overtopping and can increase the stability of a riprap layer subject to overtopping. The stability of riprap stones depends on the maximum unit discharge, the roughness height, the specific gravity of the stones, the riprap stone size, and the embankment slope (Isbash 1936; Hartung & Scheuerlein 1970; Mishra 1998; Abt et al. 2008). According to previous experimental works (Abt et al. 2013), the unit discharge at failure (or incipient motion) of a riprap layer (qf) is a function (ψ) of:

- the bed (or embankment) slope (S);
- the median riprap stone size (D50);
- the uniformity coefficient of the riprap stones (Cu);
- the riprap layer thickness (t);
- the stone density (ρs);
- the water density (ρ).

This relationship can be written as:
$$q_f = \psi\left(S,\, D_{50},\, C_u,\, t,\, \rho_s,\, \rho\right) \qquad (9)$$
Figure 1 provides a scheme of a riprap embankment protection exposed to overtopping flow and the description of the main variables in Equation (9).
Figure 1

Schematic diagram of riprap stones for streambank protection.

The variables in Equation (9) are those that appear most frequently in the empirical equations given in the literature (Abt et al. 2013). By dimensional analysis based on the Buckingham Π theorem, Equation (9) can be expressed as (Hiller et al. 2018):
(10)
In AI applications to hydraulic engineering, especially to sediment transport problems, dimensionless parameters have given more accurate estimates than dimensional variables (e.g., Azamathulla et al. 2005; Samadi et al. 2014; Khan et al. 2018; Sharafati et al. 2018). Furthermore, as reported in the literature (Siebel 2007), a stone-referred Froude number is a reasonable choice for predicting the unit discharge of a riprap layer at failure.
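To confirm that the model inputs and output are dimensionless, one can add up their exponents of length and time. The sketch below does this with exact fractions, assuming Fs,c = qf/√(gΔD50³) as the stone-referred densimetric Froude number. Mass does not enter these groups, so only length and time are tracked. The dictionary keys are illustrative names.

```python
from fractions import Fraction as F

# Dimensions as exponents of (length L, time T); mass is not needed for these groups.
DIMS = {
    "q_f": (2, -1),   # unit discharge, m^2/s
    "g": (1, -2),     # gravitational acceleration, m/s^2
    "D50": (1, 0),    # median stone size, m
    "t": (1, 0),      # riprap layer thickness, m
    "Delta": (0, 0),  # relative submerged density, dimensionless
}

def group_dimensions(powers):
    """Return the (L, T) exponents of a product of variables raised to the given powers."""
    length = sum(F(p) * DIMS[name][0] for name, p in powers.items())
    time = sum(F(p) * DIMS[name][1] for name, p in powers.items())
    return (length, time)

# Fs = q_f * g^(-1/2) * Delta^(-1/2) * D50^(-3/2)
fs_powers = {"q_f": 1, "g": F(-1, 2), "Delta": F(-1, 2), "D50": F(-3, 2)}
print(group_dimensions(fs_powers))   # (Fraction(0, 1), Fraction(0, 1))
```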

The three dimensionless parameters on the right-hand side of Equation (10) were used as inputs to the SVM, MARS, and RF models. The explored ranges of the dimensional variables are given in Table 1. In this work, 102 experimental data points collected from the literature were considered, and the raw (unprocessed) data were used as measured in the original experiments. The data cover a wide range of experimental conditions, from small-scale to large- and very-large-scale laboratory studies. Table 1 indicates the nature of the data (e.g., laboratory or field experiments) for each literature source. The influence of the different scales on the accuracy of the AI approaches and empirical equations was considered negligible, although scale effects can reportedly reduce the generalization capacity of AI models (Najafzadeh et al. 2018). The experimental dataset was divided into two parts: 75% of the data (76 data points) was used to train the AI models, and the remaining 25% (26 data points) was used to test them.

Empirical equations for the discharge at failure are generally in non-dimensional form. Graphical representations of a dimensionless parameter against the non-dimensional controlling parameters (i.e., design curves) are also of great interest to engineers. Accordingly, non-dimensional parameters were used to run the AI models, which eliminates the effects of the input–output scale (experimental data) on their performance. Dimensionless parameters also make traditional equations more transferable from the experimental to the field scale, and make the estimates more reliable. Three further conditions apply to the considered experimental datasets: first, the approach flow was fully turbulent; second, the effect of the channel side-walls was negligible; and third, the flow at the flow–riprap interface was fully turbulent.
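The 75%/25% partition can be reproduced as in this sketch, which assumes a simple random shuffle with a fixed seed. The text does not state how the 76 training and 26 testing points were drawn, so the seed and the function name are hypothetical.

```python
import random

def split_train_test(n, train_fraction=0.75, seed=None):
    """Randomly partition indices 0..n-1 into training and testing subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * n)   # 0.75 * 102 = 76.5, truncated to 76
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_train_test(102, seed=42)   # seed chosen arbitrarily
print(len(train_idx), len(test_idx))  # 76 26
```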

Table 1

Explored ranges of dimensional and non-dimensional variables from laboratory and field data considered in this study

| Authors | D50 (cm) | qf (m²/s) | Cu | Δ = Gs − 1 | S | Angle of repose (°) | t (cm) | t/D50 | Fs,c |
|---|---|---|---|---|---|---|---|---|---|
| Abt et al. (1987) | 2.59–5.59 | 0.21–0.66 | 1.75–2.09 | 1.72 | 0.01–0.02 | 38–40 | 7.77–16.77 | – | 12.15–29.20 |
| Abt & Johnson (1991) | 2.60–15.70 | 0.03–0.42 | 1.75–2.30 | 1.65–1.72 | 0.01–0.20 | 38–42 | 7.54–31.20 | 2–3 | 0.92–8.13 |
| Maynord (1992) | 1.52 | 0.75 | 2.07 | 1.65 | 0.002 | 36 | 2.54 | 1.67 | 99.49 |
| Wittler (1994) | 8.13–8.38 | 0.103–0.291 | 1.56–5.33 | 1.52–1.70 | 0.05–0.20 | 41 | 24.39–25.14 | – | 1.08–3.18 |
| Mishra (1998) | 27.10–65.50 | 0.204–0.929 | 1.52–1.90 | 1.65 | 0.5 | 42 | 53.11–122.48 | 1.58–1.96 | 0.23–0.43 |
| Robinson et al. (1998) | 1.5–27.8 | 0.003–1.626 | 1.25–1.73 | 1.54–1.82 | 0.02–0.40 | 36–42 | 3.00–55.60 | – | 0.39–5.63 |
| Siebel (2007) | 5.2–7.3 | 0.050–0.282 | – | 1.65 | 0.10–0.33 | 40–41 | 15.98–16.03 | 2.19–3.08 | 0.93–3.55 |
| Thornton et al. (2008) | 9.91–12.19 | 0.467–0.697 | 1.54–1.86 | 1.65 | 0.005 | 41 | 19.82–24.38 | – | 3.66–4.02 |
| Thornton et al. (2012) | 1.98–12.19 | 0.026–0.562 | 1.54–1.86 | 1.69 | 0.006–0.500 | 38–41 | 3.98–24.38 | – | 1.15–29.00 |

A dash (–) indicates a missing value.

Note: the angle of repose refers to the sediment.

The results of the empirical equations are compared with those of the proposed AI models in the testing stage. Histograms of all the variables are shown in Figures 2(a)–2(f). They show the frequency distributions of the main variables controlling the processes under study and provide a compact summary of the characteristics of the literature data. This analysis may also be useful in planning future experimental research.

The number of classes was selected to give reasonable displays. In general, it depends mainly on the number of observations, although the scatter or dispersion of the data also matters, and between 5 and 20 bins is typically satisfactory. A number of bins approximately equal to the square root of the number of observations often works well in practice. Since every histogram in Figures 2(a)–2(f) is based on the same number of observations (102), a fixed number of classes (9) was used for each variable. For each variable, the frequency is the number of occurrences of an outcome in the dataset relative to the total number of observations.

These histograms show that the explored range of the embankment slope S is satisfactorily large (from 0.002 to 0.50), with an approximately uniform frequency distribution. Conversely, the frequency distribution of the other variable of particular interest, the uniformity coefficient Cu, is markedly asymmetric and positively skewed. Its tail extends to a maximum of 5.33, but most tests (around 90%) were conducted with Cu around 2.0. Such stones are only slightly less uniform than uniform material (i.e., Cu < 1.5).
It is also worth noting that the relative riprap layer thickness, t/D50, has not been adequately explored in the literature: as Table 1 shows, its values are almost always around 2–3. Finally, Figure 2(e) shows that the specific density of the riprap stones, Gs, is practically always around 2.65, as expected for natural sediments. Future research could test synthetic materials with Gs significantly different from 2.65 to conclusively assess the role of the stone-referred densimetric Froude number Fs,c.
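The binning rules discussed above can be sketched as follows. The function computes relative frequencies over equal-width classes and, if no number of bins is given, falls back on the square-root rule. For 102 observations that rule gives 10 bins, close to the 9 classes adopted for Figure 2. The function name is illustrative.

```python
import math

def relative_frequency_histogram(values, n_bins=None):
    """Equal-width histogram returning bin edges and relative frequencies."""
    if not values:
        raise ValueError("values must not be empty")
    if n_bins is None:
        n_bins = max(1, round(math.sqrt(len(values))))  # square-root rule
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            k = 0
        else:
            # the maximum value is assigned to the last bin
            k = min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [c / len(values) for c in counts]

edges, freqs = relative_frequency_histogram(list(range(10)), n_bins=5)
print(freqs)
```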

Figure 2

Histograms of the variables considered in this study: (a) embankment slope, S; (b) uniformity coefficient of riprap stones, Cu; (c) unit discharge at failure (or incipient motion) of riprap layer, qf; (d) thickness of riprap layer, t; (e) specific density of riprap stones, Gs; and (f) median riprap stone size, D50.


This section briefly introduces the SVM, MARS, and RF models; more details can be found in the literature (e.g., Vapnik 1995; Giustolisi 2006). It then describes how the proposed models were developed from the experimental datasets.

Support vector machine (SVM)

SVM is a powerful supervised learning technique that provides reliable and robust predictions. It minimizes the expected error by means of the structural risk minimization (SRM) principle, which reduces the risk of overfitting. SRM comes from machine learning theory. A model must generally be selected from a finite data sample, so it may overfit: it becomes too closely tailored to the particularities of the training dataset and generalizes poorly to new data. SRM reduces this risk by balancing the model's complexity against its success at fitting the training data. SVM maps the training inputs into a higher-dimensional feature space (Vapnik 1995; Yu et al. 2004; Amaranto et al. 2018; Antunes et al. 2018).

Consider a training set of N input–output pairs, where N is the sample size: each input is a vector of explanatory variables and each output is the corresponding observed value. The regression function, ϕ, is generally expressed as (Komasi et al. 2018):
(11)
where w is the weight vector in the feature space of dimension o, b is the bias term, and ⟨·,·⟩ denotes the inner product. Adding an empirical risk term converts Equation (11) into the following minimization problem (Komasi et al. 2018):
(12)
(13)
where C > 0 is a constant that sets the penalty on the model's training errors. The slack variables measure how far the observed (actual) values lie beyond the boundaries of the insensitive zone. SVM solves this problem (Equations (12) and (13)), which has a quadratic objective and linear constraints, by quadratic programming (QP), one of the most efficient optimization techniques. For solution by QP, Equations (12) and (13) are rewritten in dual form as (Pham et al. 2018):
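The sketch below assumes the standard ε-insensitive support vector regression formulation for Equations (12) and (13). For given w and b, the smallest slack variables that satisfy the constraints of a sample add up to max(0, |y − f(x)| − ε). The objective is then ½‖w‖² plus C times the total slack. For readability, the sketch uses a linear model f(x) = w·x + b (identity feature map). The function names are hypothetical.

```python
def eps_insensitive_loss(y, f, eps):
    """max(0, |y - f| - eps): zero inside the insensitive tube of half-width eps."""
    return max(0.0, abs(y - f) - eps)

def svr_primal_objective(w, b, xs, ys, C, eps):
    """0.5 * ||w||^2 + C * total slack for a linear model f(x) = w.x + b.

    Standard epsilon-SVR is assumed: for fixed w and b, the minimal slack
    variables of each sample add up to its epsilon-insensitive loss.
    """
    norm_sq = sum(wi * wi for wi in w)
    total_slack = 0.0
    for x, y in zip(xs, ys):
        f = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_slack += eps_insensitive_loss(y, f, eps)
    return 0.5 * norm_sq + C * total_slack

print(svr_primal_objective(w=[1.0], b=0.0, xs=[[1.0], [2.0]], ys=[1.0, 3.0], C=2.0, eps=0.5))
```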
(14)
(15)
where the two sets of coefficients attached to the training samples are the Lagrange multipliers and N is the sample size. k(·) is the kernel function, defined as the inner product of the feature-space mappings of two inputs:
(16)
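Once the dual problem of Equations (14) and (15) has been solved, a prediction is a kernel-weighted sum over the training points. The sketch below assumes the usual ε-SVR form f(x) = Σ(αi − αi*) k(xi, x) + b, where αi and αi* are the two sets of Lagrange multipliers. The support vectors, multipliers, and bias in the example are hypothetical values, not results from this study.

```python
import math

def svr_predict(x, support_vectors, alpha, alpha_star, b, kernel):
    """Dual-form SVR prediction: sum_i (alpha_i - alpha_i*) * k(x_i, x) + b."""
    return sum(
        (a - a_star) * kernel(sv, x)
        for sv, a, a_star in zip(support_vectors, alpha, alpha_star)
    ) + b

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(u, v)))

# Hypothetical support vectors in (S, Cu, t/D50) space and hypothetical multipliers
svs = [[0.05, 1.7, 2.0], [0.20, 1.9, 3.0]]
prediction = svr_predict([0.05, 1.7, 2.0], svs, alpha=[0.7, 0.0],
                         alpha_star=[0.2, 0.3], b=0.1, kernel=rbf)
print(prediction)
```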

In SVM, the input variables are mapped through kernel functions (radial basis function, sigmoid, linear, or polynomial), which allows the non-linear relationship between inputs and output to be modeled. The kernel functions considered are listed in Table 2, and all of them were tested to identify the best one. Several SVM parameters must be tuned: the regularization parameter (C), the type of kernel function, and the kernel parameters (r, d, γ) (Amaranto et al. 2018). Table 3 gives the optimal parameter values obtained in this study for each kernel type; the kernels were compared by their mean squared errors (MSEs) in the training stage. The RBF (MSE = 1.27) and polynomial (MSE = 1.41) kernels predicted Fs,c with lower errors than the linear (MSE = 3.36) and sigmoid (MSE = 4.41) kernels, and the RBF kernel was the most accurate of all. Hence, the RBF kernel was used in the SVM model. Moreover, K-fold cross-validation was used to reduce the risk of overfitting. Several values of K (3, 5, 8, and 10) were tested, and K = 10 generally gave the best accuracy.
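The K-fold procedure mentioned above can be sketched as follows. The training data are split into K folds; a model is fitted on K − 1 of them and scored on the held-out fold, and the squared errors are pooled over all folds. The `fit` and `predict` callables are placeholders for any regressor, for instance an SVM with a given kernel and C. The paper does not detail its implementation, so this is only an illustration of how candidate settings could be compared by their cross-validated MSE.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of nearly equal size."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_mse(xs, ys, fit, predict, k=10):
    """Pooled mean squared error over k held-out folds (shuffle the data beforehand)."""
    squared_error = 0.0
    for test in k_fold_indices(len(xs), k):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            squared_error += (ys[i] - predict(model, xs[i])) ** 2
    return squared_error / len(xs)

# Placeholder regressor: predicts the mean of the training targets
fit_mean = lambda X, Y: sum(Y) / len(Y)
predict_mean = lambda model, x: model
print([len(f) for f in k_fold_indices(76, 10)])  # [8, 8, 8, 8, 8, 8, 7, 7, 7, 7]
```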

Table 2

A variety of kernel functions used in the SVM model

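| Kernel type | Formulation |
|---|---|
| Linear | k(xi, xj) = xi · xj |
| Radial basis function (RBF) | k(xi, xj) = exp(−γ‖xi − xj‖²), γ > 0 |
| Polynomial | k(xi, xj) = (γ xi · xj + r)^d, γ > 0 |
| Sigmoid | k(xi, xj) = tanh(γ xi · xj + r) |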
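For reference, the four kernel types of Table 2 can be written as below. The sketch assumes the widely used LIBSVM-style parameterizations: linear u·v, RBF exp(−γ‖u − v‖²), polynomial (γ u·v + r)^d, and sigmoid tanh(γ u·v + r). These match the parameter names γ, r, and d in Table 3 but may differ in detail from the authors' implementation.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def rbf_kernel(u, v, gamma):
    # gamma > 0
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def polynomial_kernel(u, v, gamma, r, d):
    # gamma > 0, integer degree d
    return (gamma * dot(u, v) + r) ** d

def sigmoid_kernel(u, v, gamma, r):
    return math.tanh(gamma * dot(u, v) + r)

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v, 1.0, 1.0, 2), rbf_kernel(u, u, 0.5))  # 11.0 144.0 1.0
```

With γ = 2564.67 (Table 3), the RBF kernel already drops to about 0.0016 for two inputs 0.05 apart in Euclidean distance. In the raw input units, it therefore responds only to very close neighbours.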
Table 3

Performance of different types of kernel functions used in SVM model for training stage

| Kernel function | Setting parameters (dimensionless variables) | MSE |
|---|---|---|
| Linear | γ = 0.0164; C = 149.071 | 3.36 |
| Radial basis function (RBF) | γ = 2564.67; C = 58.548 | 1.27 |
| Polynomial | r = 1.66; γ = 0.1312; d = –; C = 73.061 | 1.41 |
| Sigmoid | γ = 0.1560; r = 1.85; C = 54.210 | 4.41 |

Multivariate adaptive regression splines (MARS)

MARS, a non-parametric regression technique, reduces the complexity of non-linear systems by fitting a set of piecewise linear splines (segments) between the system variables. It requires no prior assumption about the relationship between the inputs and outputs of the system. The point at the end of a segment, called a knot, marks both the end of one region of the data and the start of the next segment. MARS searches stepwise to construct basis functions (BFs), and an adaptive regression procedure selects the knot locations. The technique works in two phases. In the first (forward) phase, the model generates candidate BFs and their knots. The second (backward) phase removes the BFs that contribute least to the model's performance (Adamowski et al. 2012). To improve the accuracy of MARS, this pruning of unnecessary BFs is guided by the generalized cross-validation (GCV) criterion, computed as:
(17)
where PF denotes the penalty factor which is calculated as:
(18)
in which de and P denote the determination parameter and the number of BFs, respectively.
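The sketch below assumes, for Equations (17) and (18), the GCV form used by common MARS implementations such as the R package earth: GCV = (RSS/N)/(1 − PF/N)². There, PF = P + de·(P − 1)/2, P counts the model terms including the intercept, and de is a penalty per knot (typically 2 for additive and 3 for interaction models). This is an illustration, not necessarily the exact penalty used by the authors.

```python
def gcv(observed, predicted, n_terms, penalty=3.0):
    """Generalized cross-validation score of a fitted MARS model.

    Assumes PF = n_terms + penalty * (n_terms - 1) / 2, where n_terms counts
    the basis functions plus the intercept.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    pf = n_terms + penalty * (n_terms - 1) / 2
    if pf >= n:
        raise ValueError("effective number of parameters must be smaller than N")
    return (rss / n) / (1.0 - pf / n) ** 2

# Under these assumptions, 17 BFs plus an intercept give PF = 18 + 3 * 17 / 2 = 43.5
print(gcv([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], n_terms=1))
```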
To create the BFs stepwise, an input vector O must first be assigned. The output vector T is related to O through a function H(O), estimated by the BFs, plus an error term that represents the prediction error of the model. The BFs are generally linear (or low-order polynomial) functions with smooth trends, and piecewise linear regression is used to keep the polynomial degree low. The basic piecewise linear function is the hinge function max(0, O − u), which has a knot at u. Its values are determined as:
$$\max(0,\, O-u) = \begin{cases} O-u, & O > u \\ 0, & O \le u \end{cases} \qquad (19)$$
MARS technique can generate BFs with linear combination as:
$$H(O) = v_0 + \sum_{m=1}^{P} v_m\, BF_m(O) \qquad (20)$$
in which BF_m(O) is the m-th basis function and v_m are the constant coefficients of the BFs, estimated by the least squares method (LSM).
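In essence, Equations (19) and (20) evaluate hinge functions and sum them with fitted weights. The minimal sketch below shows this mechanism. The knot at S = 0.04 is borrowed from Table 4, but the intercept and coefficients are hypothetical placeholders, not the fitted values of Equation (21).

```python
def hinge(x, knot):
    """Right hinge (x - u)+ = max(0, x - u)."""
    return max(0.0, x - knot)

def mirror_hinge(x, knot):
    """Left hinge (u - x)+ = max(0, u - x)."""
    return max(0.0, knot - x)

def mars_predict(inputs, basis_functions, coefficients, intercept):
    """H(O) = v0 + sum_m v_m * BF_m(O)."""
    return intercept + sum(v * bf(inputs) for bf, v in zip(basis_functions, coefficients))

# Hypothetical two-term model with a hinge pair on the slope S at the knot 0.04
bfs = [lambda o: hinge(o["S"], 0.04), lambda o: mirror_hinge(o["S"], 0.04)]
coefs = [10.0, -5.0]   # placeholder weights, not the coefficients of Equation (21)
print(mars_predict({"S": 0.10}, bfs, coefs, intercept=1.0))
print(mars_predict({"S": 0.02}, bfs, coefs, intercept=1.0))
```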
At the final stage, MARS yields a simplified formulation, H(O), as a combination of BFs. Analysis of variance (ANOVA) decomposition is applied to assess the relative importance of the input variables and the BFs (Haghiabi 2016). In the present investigation, 17 BFs were obtained for the prediction of Fs,c (Table 4). Table 5 gives the results of the ANOVA decomposition of the proposed MARS model. The GCV values in its third column indicate the relative importance of the corresponding ANOVA function. The best MARS model for the prediction of Fs,c is expressed as:
(21)
Table 4

BFs extracted from MARS model using dimensionless variables

| BF | Equation |
|---|---|
| BF1 | max(0, S − 0.04) |
| BF2 | max(0, 2 − t/D50) × max(0, 0.2 − S) |
| BF3 | max(0, t/D50 − 2.9) |
| BF4 | max(0, S − 0.04) × max(0, t/D50 − 2.9) |
| BF5 | max(0, S − 0.04) × max(0, 2.9 − t/D50) |
| BF6 | max(0, 0.05 − S) |
| BF7 | max(0, S − 0.05) × max(0, t/D50 − 2.9) |
| BF8 | max(0, S − 0.05) × max(0, 2.9 − t/D50) |
| BF9 | max(0, 2.9 − t/D50) × max(0, Cu − 1.65) |
| BF10 | max(0, 2.9 − t/D50) × max(0, 1.65 − Cu) |
| BF11 | max(0, Cu − 1.54) |
| BF12 | max(0, S − 0.1) |
| BF13 | max(0, 0.1 − S) |
| BF14 | max(0, Cu − 2.3) |
| BF15 | max(0, 2.3 − Cu) × max(0, t/D50 − 2.39) |
| BF16 | max(0, 2.3 − Cu) × max(0, 2.39 − t/D50) |
| BF17 | max(0, 1.75 − Cu) × max(0, 0.04 − S) |
Table 5

Results of ANOVA decomposition for the proposed MARS models

MARS technique by dimensionless variables
Function   Standard deviation   GCV   Number of basis functions   Variables
6.54 261.97 t/D50 
3.81 74732.21 S 
3.48 110.35 Cu 
10.96 601.93 t/D50 and S 
5.16 106.58 t/D50 and Cu 
2.05 32.28 S and Cu 
6.54 261.97 t/D50 

Intrinsically, the MARS technique is capable of producing a polynomial regression (of quadratic form) based on spline concepts. In this study, however, all 17 BFs have simple formulations, and all the input variables used in them (S, Cu and t/D50) are easy to obtain. As shown in the following, the statistical benchmarks indicated that the MARS technique performed more satisfactorily than the empirical equations in terms of MAE and of the variance of the discrepancy ratio. In addition, the MARS model adapts flexibly to changes in the ranges of the inputs. In this regard, two important issues were considered when running MARS: (1) preserving the physical meaning and consistency of the results and (2) achieving the highest possible accuracy in comparison with the empirical equations. Finally, it should be stressed that the proposed MARS model is not intended to replace the traditional equations, whose physical basis is more explicit; rather, the joint use of this AI model and the empirical equations can lead to more reliable results.
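Because the BFs of Table 4 are simple hinge products, they can be coded directly, as in the sketch below. It assumes that Equation (21) has the additive form of Equation (20), i.e., Fs,c = v0 + Σ v_p BF_p. The coefficients v_p of Equation (21) are not reproduced here, so `mars_predict` must be supplied with them by the user. The function names and the argument `t_d50` (standing for t/D50) are hypothetical.

```python
def h(a):
    """Hinge max(0, a)."""
    return max(0.0, a)


def table4_basis(S, Cu, t_d50):
    """BF1..BF17 of Table 4 for slope S, uniformity coefficient Cu and
    relative layer thickness t/D50 (passed as t_d50)."""
    return [
        h(S - 0.04),                      # BF1
        h(2 - t_d50) * h(0.2 - S),        # BF2
        h(t_d50 - 2.9),                   # BF3
        h(S - 0.04) * h(t_d50 - 2.9),     # BF4
        h(S - 0.04) * h(2.9 - t_d50),     # BF5
        h(0.05 - S),                      # BF6
        h(S - 0.05) * h(t_d50 - 2.9),     # BF7
        h(S - 0.05) * h(2.9 - t_d50),     # BF8
        h(2.9 - t_d50) * h(Cu - 1.65),    # BF9
        h(2.9 - t_d50) * h(1.65 - Cu),    # BF10
        h(Cu - 1.54),                     # BF11
        h(S - 0.1),                       # BF12
        h(0.1 - S),                       # BF13
        h(Cu - 2.3),                      # BF14
        h(2.3 - Cu) * h(t_d50 - 2.39),    # BF15
        h(2.3 - Cu) * h(2.39 - t_d50),    # BF16
        h(1.75 - Cu) * h(0.04 - S),       # BF17
    ]


def mars_predict(v, S, Cu, t_d50):
    """v0 + sum_p v_p * BF_p; v must hold the 18 coefficients v0..v17 of Eq. (21)."""
    bfs = table4_basis(S, Cu, t_d50)
    if len(v) != len(bfs) + 1:
        raise ValueError("expected 18 coefficients (v0..v17)")
    return v[0] + sum(vp * bf for vp, bf in zip(v[1:], bfs))


# Which BFs are active for S = 0.2, Cu = 2.0, t/D50 = 2.0?
print([i + 1 for i, bf in enumerate(table4_basis(0.2, 2.0, 2.0)) if bf > 0])
```

For a given input only a subset of the BFs is non-zero, which is what makes the MARS expression easy to evaluate and interpret by hand.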

Random forest (RF)

The RF model is an ensemble learning model that provides an efficient solution for high-dimensional problems. Basically, RF is a tree-based ensemble technique in which each tree depends on a set of randomly drawn variables, and the forest is built from a large number of regression trees whose outputs are combined (Breiman 2001; Caradot et al. 2018). In the RF model, the input variables serve as candidate splitting parameters, and the corresponding split values are determined. In this step, the impurity of the child nodes is evaluated, and the best splitting parameter is selected among the input variables using the Gini index (GI). This index measures how each input variable contributes to the homogeneity (or impurity) of the nodes and leaves in the resulting RF model. Each time a particular variable is used to split a node, the GI of the child nodes is computed and compared with that of the parent node. (For regression trees, as used here to predict the continuous variable Fs,c, node impurity is commonly measured by the within-node variance, i.e., the mean squared error, which plays the role that the GI plays in classification.) Furthermore, in the RF approach the final prediction is obtained by averaging the outputs of the individual trees, each fitted to a bootstrap sample of the data (bagging). The bias of the bagged trees is the same as that of a single tree, whereas the variance decreases as the correlation among the trees decreases (Antunes et al. 2018). Growing the forest relies on a random vector ϕ, such that each tree predictor λ(X, ϕ) takes numerical values. To evaluate the performance of RF, the mean squared generalization error (E) of any numerical predictor λ(X) is expressed as (Breiman 2001):
$$E = \mathbb{E}_{X,Y}\left[\left(Y - \lambda(X)\right)^{2}\right] \tag{22}$$
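The bagging-and-averaging procedure described above can be illustrated with the minimal from-scratch sketch below (Python with NumPy). It is not the implementation used in this study, which the text does not specify. Splits minimize the within-node sum of squared errors, the usual impurity measure for regression trees. The tree depth, the minimum leaf size and the number of candidate inputs per split (`m_try`) are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np


def grow_tree(X, y, rng, max_depth=4, min_leaf=3, m_try=None):
    """Regression tree grown by greedy reduction of the within-node sum of
    squared errors. Nodes: ("leaf", value) or ("split", j, thr, left, right)."""
    n, p = X.shape
    if max_depth == 0 or n < 2 * min_leaf or np.all(y == y[0]):
        return ("leaf", float(y.mean()))
    candidates = rng.choice(p, size=m_try or p, replace=False)  # random input subset
    parent_sse = float(((y - y.mean()) ** 2).sum())
    best = None
    for j in candidates:
        for thr in np.unique(X[:, j])[:-1]:
            go_left = X[:, j] <= thr
            n_left = int(go_left.sum())
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            yl, yr = y[go_left], y[~go_left]
            sse = float(((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, thr, go_left)
    if best is None or best[0] >= parent_sse:
        return ("leaf", float(y.mean()))
    _, j, thr, go_left = best
    kw = dict(max_depth=max_depth - 1, min_leaf=min_leaf, m_try=m_try)
    return ("split", j, thr,
            grow_tree(X[go_left], y[go_left], rng, **kw),
            grow_tree(X[~go_left], y[~go_left], rng, **kw))


def predict_tree(node, x):
    while node[0] == "split":
        _, j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node[1]


def fit_forest(X, y, n_trees=10, seed=0, **tree_kw):
    """Bagging: each tree is grown on a bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))
        forest.append(grow_tree(X[idx], y[idx], rng, **tree_kw))
    return forest


def predict_forest(forest, X):
    """RF output: the average of the individual tree predictions."""
    return np.array([np.mean([predict_tree(t, x) for t in forest]) for x in X])


# Toy check (hypothetical data): the response steps from 1 to 3 at x0 = 0.5
rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] > 0.5, 3.0, 1.0)
forest = fit_forest(X, y, n_trees=10)
print(predict_forest(forest, np.array([[0.1, 0.5], [0.9, 0.5]])))
```

With `m_try` left at its default, every split considers all inputs and the method reduces to plain bagging. Setting `m_try` below the number of inputs adds the random input selection that decorrelates the trees.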

In general, the more trees a forest contains, the more accurate its results. However, the gain diminishes as the number of trees increases; i.e., at a certain point the improvement in RF precision obtained from training more trees becomes smaller than the additional computation time needed to train them. RFs are ensemble techniques that average over many trees, much as the mean of a real-valued random variable can be estimated by averaging a sample of it. In this study, with 102 data series, a forest of 10 trees performed more accurately than one of 500 trees. Such a difference can be attributed to statistical variance; if it occurred systematically, it would indicate a problem with the implementation. Typical values for the number of trees are 10, 30 or 100, and only in very few practical cases does the benefit of using more than 300 trees outweigh the cost of training them. In this study, the forest with 10 trees was the most accurate among the forest sizes tested.
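The diminishing return from adding trees follows from a standard result for averages of correlated variables. If each of B trees has prediction variance σ² and the pairwise correlation between trees is ρ, the variance of their average is ρσ² + (1 − ρ)σ²/B. The sketch below evaluates this expression for the tree counts mentioned above. The values σ² = 1 and ρ = 0.3 are hypothetical, not estimates from this study's data.

```python
def ensemble_variance(sigma2, rho, n_trees):
    """Variance of the average of n_trees identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    if n_trees < 1:
        raise ValueError("n_trees must be at least 1")
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


# Hypothetical values: unit tree variance, correlation 0.3 between trees
for b in (1, 10, 30, 100, 300, 500):
    print(b, round(ensemble_variance(1.0, 0.3, b), 4))
```

With these values the variance falls from 1.0 for a single tree to 0.37 for 10 trees. It then approaches the floor ρσ² = 0.3 only slowly, which is why very large forests rarely repay their training cost. The expression also shows why reducing the correlation between trees, and not only adding trees, lowers the variance.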

The results obtained from the AI models and the traditional methods are presented in this section. For quantitative comparison, a variety of statistical parameters have been used to evaluate the performance of AI-based machine learning models in various applications (Babovic & Keijzer 2000; Keijzer & Babovic 2002; Chadalawada & Babovic 2017). To benchmark the performance of the techniques proposed in this study, four widely used statistical measures are considered, namely the coefficient of correlation (R), root mean square error (RMSE), mean absolute error (MAE) and scatter index (SI). They are defined as follows:
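$$R=\frac{\sum_{i=1}^{N}\left(O_i-\bar{O}\right)\left(P_i-\bar{P}\right)}{\sqrt{\sum_{i=1}^{N}\left(O_i-\bar{O}\right)^{2}\,\sum_{i=1}^{N}\left(P_i-\bar{P}\right)^{2}}} \tag{23}$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i-O_i\right)^{2}} \tag{24}$$
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|P_i-O_i\right| \tag{25}$$
$$\mathrm{SI}=\frac{\mathrm{RMSE}}{\bar{O}} \tag{26}$$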
in which N is the number of observations; the remaining symbols (observed and predicted values of Fs,c and their means) are self-explanatory. RMSE is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data; in general, a lower RMSE indicates a better model. SI is calculated by dividing the RMSE by the mean of the observations, so it expresses the RMSE as a fraction (or percentage) of the mean observed value, i.e., the expected relative error of the predicted parameter.
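Equations (23)–(26) can be computed as in the following sketch. It assumes the standard Pearson form of R and takes SI as the RMSE divided by the mean observed value, as stated above. The function name and the toy data are hypothetical.

```python
import math


def performance_metrics(observed, predicted):
    """R, RMSE, MAE and SI (Eqs. (23)-(26)) for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_o * ss_p)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(p - o) for o, p in zip(observed, predicted)) / n
    si = rmse / mean_o  # RMSE relative to the mean observation
    return {"R": r, "RMSE": rmse, "MAE": mae, "SI": si}


# A constant offset of +1: R = 1, RMSE = 1, MAE = 1, SI = 1/3
print(performance_metrics([2.0, 4.0], [3.0, 5.0]))
```

The example shows why R alone is insufficient: a constant bias leaves R at 1 while RMSE, MAE and SI register the error.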

Quantitative comparisons of the proposed AI models

Quantitative comparisons of the performance of SVM-RBF, MARS and RF were carried out for both the training and testing phases; Table 6 presents the results. In the training phase, the R (0.99) and RMSE (1.11) values of MARS indicated a higher level of precision than those of SVM-RBF (R = 0.99 and RMSE = 1.62) and RF (R = 0.98 and RMSE = 3.61). Moreover, MARS, formulated as a set of linear and quadratic relationships, estimated Fs,c with the lowest SI (0.25) compared with SVM-RBF (0.37) and RF (0.81). In terms of MAE, MARS (0.27) outperformed SVM-RBF (0.31), although RF yielded a slightly lower MAE (0.23). Table 6 also indicates that the SVM model with the RBF kernel function was more accurate than the RF approach in terms of R, RMSE and SI. Overall, the R values of the training phase differed only marginally, so they may not be a suitable basis for comparing the models, whereas the other statistical parameters provide more useful information about model performance.

Table 6

Evaluation of the proposed models' performance using dimensionless variables

Model                            R    RMSE     MAE      SI
Training stage
  AI models
    SVM-RBF                   0.99    1.62    0.31    0.37
    MARS                      0.99    1.11    0.27    0.25
    RF                        0.98    3.61    0.23    0.81
Testing stage
  AI models
    SVM-RBF                   0.98    1.17    0.58    0.37
    MARS                      0.92    2.32    0.31    0.75
    RF                        0.89    1.93    0.37    0.63
  Empirical equations
    Olivier (1967)            0.94    1.45    0.36    0.48
    Abt & Johnson (1991)      0.91    2.41    0.37    0.77
    Sommer (1997)             0.94    6.01    1.77    1.45
    Robinson et al. (1998)   −0.58    4.57    0.47    1.32
    Dornack (2001)            0.93    3.95    0.53    1.18
    Siebel (2007)             0.94    1.79    0.37    0.52
    Khan & Ahmad (2011)       0.95    1.67    0.43    0.53
    Thornton et al. (2014)    0.95    1.38    0.75    0.40

In the testing stage, the SVM-RBF model predicted Fs,c more accurately in terms of RMSE (1.17) and SI (0.37) than MARS (RMSE = 2.32 and SI = 0.75) and RF (RMSE = 1.93 and SI = 0.63), although its MAE (0.58) was the highest of the three. Similarly, the R values indicated a slight superiority of SVM-RBF (R = 0.98) over the other two AI models. With respect to R and MAE, MARS (R = 0.92 and MAE = 0.31) predicted Fs,c more accurately than RF (R = 0.89 and MAE = 0.37). Even though MARS provided the stone-referred densimetric Froude number, Fs,c, with a lower accuracy than SVM-RBF, Equation (21) is an explicit expression and is therefore more practical and easier to use than the SVM-RBF and RF approaches.

Qualitative comparisons of the proposed AI models

Figures 3(a)–3(f) illustrate the graphical performance of the AI models in the training and testing phases. In the training stage, SVM-RBF and MARS performed best for the extreme value of Fs,c = 99.49 (Figures 3(a) and 3(b)), while RF showed relatively large underestimation (Figure 3(c)). For Fs,c around 30, SVM-RBF and RF behaved similarly, showing slight underprediction within the allowable error range, whereas MARS showed both under- and overprediction. Furthermore, all three AI models performed best for Fs,c < 10. In the testing stage, SVM-RBF markedly overpredicted Fs,c for observed values between 1 and 2 (Figure 3(d)). Figure 3(e) shows that MARS performed best for Fs,c < 5, whereas it underpredicted in the range Fs,c = 5–10 and overpredicted in the range 10–15. From Figure 3(f), RF performed relatively acceptably for Fs,c < 2, with slight overprediction; it markedly overpredicted Fs,c values just over 6 and underpredicted values of approximately 10.

Figure 3

Qualitative performance of AI models considered in this study: (a) SVM-RBF; (b) MARS; and (c) RF for training stage; (d) SVM-RBF; (e) MARS; and (f) RF for testing stage.


Comparative study of the empirical equations performance

In this section, the efficiency of the empirical equations considered (Equations (1) to (8)) was investigated using the testing datasets. According to Table 6, Equation (1), suggested by Olivier (1967), was among the most accurate empirical equations for estimating Fs,c (RMSE = 1.45, SI = 0.48 and MAE = 0.36, the lowest MAE among them); Equation (8) by Thornton et al. (2014) gave a slightly lower RMSE (1.38) and SI (0.40) but a considerably higher MAE (0.75). In contrast, Equation (3), proposed by Sommer (1997), predicted Fs,c with the largest errors, showing significant overprediction (RMSE = 6.01 and SI = 1.45). Siebel's (2007) equation (Equation (6)) also performed well (MAE = 0.37 and SI = 0.52). With respect to RMSE and SI, it performed better than Equation (4) by Robinson et al. (1998) (RMSE = 4.57 and SI = 1.32) and almost the same as Equation (7) by Khan & Ahmad (2011) (RMSE = 1.67 and SI = 0.53). According to Table 6, Equation (2), proposed by Abt & Johnson (1991), predicted Fs,c with relatively higher accuracy (RMSE = 2.41 and SI = 0.77) than Equation (5) by Dornack (2001) (RMSE = 3.95 and SI = 1.18). Likewise, the RMSE (1.38) and SI (0.40) of Thornton et al.’s (2014) equation show that it estimates Fs,c clearly more accurately than Dornack's (2001) equation.

In qualitative terms, Figures 4(a)–4(h) show the performance of the empirical equations, with significant over- and underpredictions. As can be seen in Figure 4(a), Olivier's (1967) equation (Equation (1)) performed acceptably, whereas Abt & Johnson's (1991) equation slightly overpredicted the lower values of Fs,c (Figure 4(b)). The Fs,c values given by Sommer's (1997) equation suffered from marked overprediction (Figure 4(c)); conversely, Figure 4(d) shows the opposite trend for Robinson et al.’s (1998) equation. As shown in Figure 4(e), Dornack's (2001) equation markedly underpredicted Fs,c values greater than 2. Indeed, Equation (5), proposed by Dornack (2001), depends only on the slope of the riprap layer, with a range of application of S = 0.29–0.67. According to Figures 4(f) and 4(g), both Siebel's (2007) and Khan & Ahmad's (2011) equations performed comparatively acceptably, though with room for improvement. Finally, Figure 4(h) shows that the equation by Thornton et al. (2014) tends to produce relatively large overpredictions for Fs,c < 4.

Figure 4

Qualitative performance of empirical models considered in this study: (a) Olivier (1967); (b) Abt & Johnson (1991); (c) Sommer (1997); (d) Robinson et al. (1998); (e) Dornack (2001); (f) Siebel (2007); (g) Khan & Ahmad (2011); and (h) Thornton et al. (2014).


In the final analysis, therefore, the equation proposed by Sommer (1997) appears quite conservative. Conversely, the equations proposed by Robinson et al. (1998) and Dornack (2001) lead to considerable underpredictions already for Fs,c > 2 (i.e., for low values of D50). This may be linked to the objectives of these authors' studies: rock-fill dam spillways in the case of Dornack (2001) and rock chutes in the case of Robinson et al. (1998). Both applications focus on high values of D50 and hence on a more restricted range of Fs,c. To a lesser extent, the equation suggested by Abt & Johnson (1991) also leads to significant underpredictions, but only for Fs,c > 10. A plausible reason is the experimental range of Fs,c explored by these authors (0.92 to 8.13, as shown in Table 1), on which their equation was calibrated.

In this part of the research, the effects of S on Fs,c were investigated for different ranges of S (Table 7). In terms of RMSE and SI, SVM-RBF and MARS performed practically the same in predicting Fs,c for S values between 0.002 and 0.080, whereas RF was less accurate (RMSE = 6.70 and SI = 0.52) than the other two AI techniques. For S ranging from 0.100 to 0.167, RF achieved higher accuracy (RMSE = 0.33 and SI = 0.17) than SVM-RBF (RMSE = 0.84 and SI = 0.45) and MARS (RMSE = 0.53 and SI = 0.28). For slopes in the ranges 0.20–0.25 and 0.30–0.50, RF showed the same tendency as in the range 0.10–0.17. In addition, Figures 5(a)–5(c) indicate that all the proposed AI models predict a decreasing trend of Fs,c with increasing S. This trend is in good agreement with the experimental findings of several researchers: experimental studies by Knauss (1979), Whittaker & Jäggi (1986), Palt (2002) and Siebel (2007) were conducted for Fs,c = 0.5–7.0 and S = 0.005–0.550. From these ranges and the performance of the AI models, it can be inferred that the simulated variations of Fs,c with S are physically reasonable and consistent with the experimental evidence.
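Range-wise statistics such as those in Table 7 amount to filtering the testing samples by slope before computing RMSE and SI. The following is a minimal sketch, assuming inclusive range limits; the function name and the toy values are hypothetical.

```python
import math


def metrics_by_slope_range(slopes, observed, predicted, ranges):
    """RMSE and SI of the samples whose slope S lies in each [lo, hi] range."""
    table = {}
    for lo, hi in ranges:
        idx = [i for i, s in enumerate(slopes) if lo <= s <= hi]
        if not idx:
            continue  # no testing samples in this range
        sq_err = sum((predicted[i] - observed[i]) ** 2 for i in idx)
        rmse = math.sqrt(sq_err / len(idx))
        mean_obs = sum(observed[i] for i in idx) / len(idx)
        table[(lo, hi)] = {"n": len(idx), "RMSE": rmse, "SI": rmse / mean_obs}
    return table


slopes = [0.12, 0.15, 0.22, 0.40]
obs = [2.0, 4.0, 1.0, 0.8]
pred = [3.0, 5.0, 1.0, 0.9]
print(metrics_by_slope_range(slopes, obs, pred, [(0.10, 0.17), (0.20, 0.25), (0.30, 0.50)]))
```

Because SI normalizes by the mean observation within each range, it allows a fairer comparison across slope ranges whose typical Fs,c values differ by an order of magnitude.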

Table 7

Statistical results of AI models' performance for various ranges of bed slope

AI model   Statistic   S = 0.002–0.008   S = 0.10–0.17   S = 0.20–0.25   S = 0.30–0.50
SVM-RBF    RMSE        2.90              0.84            0.46            0.42
           SI          0.23              0.45            0.33            0.43
MARS       RMSE        2.90              0.53            0.44            0.39
           SI          0.24              0.28            0.32            0.40
RF         RMSE        6.70              0.33            0.27            0.38
           SI          0.52              0.17            0.20            0.33
Figure 5

Variation of Fs,c versus S extracted from: (a) SVM-RBF, (b) MARS, and (c) RF models.


In addition to the slope S, the influence of the uniformity coefficient of the riprap stones, Cu, and of the relative thickness of the riprap layer, t/D50, on the stone-referred densimetric Froude number Fs,c was explored. This was done with the MARS model proposed in this study (Equation (21)) by varying either Cu or t/D50 while keeping the other independent variables constant. In particular, it was found that the greater Cu (i.e., the less uniform the riprap material), the greater the resistance of the riprap layer to erosion. This result agrees with some empirical equations from the literature (e.g., Equations (7) and (8)), but the dependence of Fs,c on Cu found here appears more reliable because it is based on a much wider-ranging dataset. An analogous trend was found for t/D50 (i.e., the greater t/D50, the greater the resistance of the riprap layer to erosion), as is reasonable to expect from a physical point of view. This result contradicts Equations (7) and (8) from the literature, according to which the probability of riprap failure would, counterintuitively, increase with the thickness t of the riprap layer.
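The one-at-a-time procedure described above (vary Cu or t/D50, keep the other inputs fixed) can be sketched as follows. The model passed in is a placeholder. `toy_model` below is a hypothetical linear stand-in chosen only to exercise the code; it is not Equation (21), whose coefficients are not reproduced here.

```python
def one_at_a_time(model, baseline, name, values):
    """Sweep input `name` over `values` while holding the other inputs at
    their `baseline` values; returns (value, model output) pairs."""
    if name not in baseline:
        raise KeyError(name)
    results = []
    for value in values:
        inputs = dict(baseline, **{name: value})  # copy; baseline is untouched
        results.append((value, model(**inputs)))
    return results


def trend(pairs):
    """'increasing', 'decreasing' or 'mixed' over consecutive outputs."""
    diffs = [b[1] - a[1] for a, b in zip(pairs, pairs[1:])]
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "mixed"


# Hypothetical stand-in for Eq. (21); its coefficients are NOT from the paper.
def toy_model(S, Cu, t_d50):
    return 1.0 + 0.5 * Cu + 0.3 * t_d50 - 4.0 * S


base = {"S": 0.1, "Cu": 2.0, "t_d50": 2.5}
print(trend(one_at_a_time(toy_model, base, "Cu", [1.5, 2.0, 2.5, 3.0])))
print(trend(one_at_a_time(toy_model, base, "S", [0.05, 0.1, 0.2, 0.3])))
```

Passing the fitted MARS expression in place of `toy_model` would reproduce the kind of sweep described above; with a non-additive model the detected trend may depend on the chosen baseline.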

In the previous part of this study, the MARS model was found to show some superiority over the other AI approaches in the training stage. Statistical criteria such as Equations (23)–(26) allow one to select the most accurate AI technique, but they only quantify the overall magnitude of the error. They do not reveal how the errors are distributed, for example whether a model tends to over- or underpredict. It is therefore important to also examine the distribution of the errors of the proposed AI techniques. In the current study, a statistical parameter known as the discrepancy ratio (DR) is used to provide more in-depth information about model performance. DR can be expressed as:
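$$\mathrm{DR}=\frac{\left(F_{s,c}\right)_{\mathrm{predicted}}}{\left(F_{s,c}\right)_{\mathrm{observed}}} \tag{27}$$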

On the basis of Equation (27), if DR is equal (or close) to 1, the predicted Fs,c values match the observed values. If DR is larger than 1, the AI model overpredicts Fs,c, and if DR is smaller than 1, it underpredicts Fs,c (Noori et al. 2009).
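DR statistics of the kind reported in Table 8 can be computed as in the sketch below. Whether Table 8 reports the population or the sample variance is not stated, so the code assumes the population variance. The 0.5–1.5 band mirrors the range discussed for Figure 6(a), and the function names are hypothetical.

```python
from statistics import mean, pvariance


def discrepancy_ratios(observed, predicted):
    """DR = predicted / observed for each sample (Eq. (27))."""
    if any(o <= 0 for o in observed):
        raise ValueError("observed values must be positive")
    return [p / o for o, p in zip(observed, predicted)]


def dr_summary(observed, predicted, band=(0.5, 1.5)):
    """Average, extremes and variance of DR, plus over/underprediction counts."""
    dr = discrepancy_ratios(observed, predicted)
    lo, hi = band
    return {
        "average": mean(dr),
        "minimum": min(dr),
        "maximum": max(dr),
        "variance": pvariance(dr),  # population variance (assumption)
        "overpredicted": sum(d > 1 for d in dr),
        "underpredicted": sum(d < 1 for d in dr),
        "share_in_band": sum(lo <= d <= hi for d in dr) / len(dr),
    }


print(dr_summary([2.0, 4.0, 5.0], [2.0, 6.0, 4.0]))
```

An average DR close to 1 can hide large compensating errors, so the variance and the share of points within the band are read together with the average.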

In the current investigation, the testing-stage results of the AI models and the empirical equations were used to calculate DR values, whose statistics are summarized in Table 8. The MARS approach achieved the lowest DR variance (0.12) compared with the SVM-RBF (0.32) and RF (0.42) models. Among the empirical equations, the average DR of Sommer's (1997) equation (Equation (3); 2.70) indicates the lowest accuracy. The DR variance of Olivier's (1967) equation (0.44) was about half that of Dornack's (2001) equation (0.86), although Dornack's average DR (0.95) was closer to 1 than Olivier's (1.20). Table 8 also indicates that the average and variance of DR for Abt & Johnson's (1991) equation are practically the same as for Olivier's (1967) equation. Moreover, the average DR of Thornton et al.’s (2014) equation (1.59) was closer to 1 than that of Equation (3) by Sommer (1997). For a qualitative comparison, the variations of DR with S for the AI techniques and the conventional equations are shown in Figures 6(a) and 6(b), respectively. Figure 6(a) shows that almost all the DR values of the AI models lie between 0.5 and 1.5, i.e., all the proposed models showed only limited under- or overprediction. For instance, the AI models overpredicted Fs,c for S around 0.125. Furthermore, most of the points are concentrated around the line of perfect agreement, DR = 1. Figure 6(b) clearly shows that Equation (3) by Sommer (1997) overpredicted Fs,c markedly compared with the other traditional equations, which corroborates the rather conservative nature of this equation noted above.

Table 8

Statistical results of DR values for the AI models and empirical equations

DR statistics                  Average   Minimum   Maximum  Variance
AI models
    SVM-RBF                       1.04      0.31      2.53      0.32
    MARS                          1.15      0.44      2.53      0.12
    RF                            1.22      0.57      4.00      0.42
Empirical equations
    Olivier (1967)                1.20      0.38      3.00      0.44
    Abt & Johnson (1991)          1.19      0.42      3.56      0.44
    Sommer (1997)                 2.70      1.58      8.19      2.24
    Robinson et al. (1998)        0.64      0.02      3.01      0.39
    Dornack (2001)                0.95      0.12      4.36      0.86
    Siebel (2007)                 0.76      0.38      2.41      0.21
    Khan & Ahmad (2011)           1.10      0.22      3.92      0.63
    Thornton et al. (2014)        1.59      0.26      3.93      0.75
Figure 6

Variation of DR versus S for: (a) AI models and (b) empirical models.


This study aimed to evaluate the non-dimensional unit discharge (Fs,c) at the failure condition of riprap layers on various streambank slopes using three data-mining approaches, namely MARS, SVM-RBF and RF. Five input variables were extracted from experimental works to develop the AI approaches. The main conclusions of the current investigation are as follows:

  • In the testing stage, the SVM-RBF model provided Fs,c values with higher accuracy (in terms of R, RMSE and SI) than the MARS model, formulated as a set of BFs, and the RF technique, whereas MARS was the most accurate model in the training stage. Furthermore, Equation (21), the explicit expression obtained from MARS, gave a lower MAE than all the empirical regression-based equations.

  • Equations (3)–(5) performed worse than the proposed machine learning approaches with respect to most of the statistical criteria considered in this study, whereas Equations (1), (6) and (7) estimated Fs,c with relatively more acceptable precision than Equations (3)–(5).

  • The quantitative and qualitative variations of Fs,c with the slope S predicted by the AI approaches were in acceptable agreement with previous experimental investigations, such as that of Siebel (2007), which supports the physical consistency of the results.

  • The DR analysis showed that the Fs,c values predicted by the AI techniques lie mostly within an acceptable error band (DR of roughly 0.5–1.5), whereas the empirical equations produced larger over- or underestimations.

Abt, S. R. & Johnson, T. L. 1991 Riprap design for overtopping flow. J. Hydraul. Eng-ASCE 117 (8), 959–972.
Abt, S. R., Khattak, M. S., Nelson, J. D., Ruff, J. F., Shaikh, A., Wittler, R. J., Lee, D. W. & Hinkle, N. E. 1987 Development of Riprap Design Criteria by Riprap Testing in Flumes: Phase 1. U.S. Nuclear Regulatory Commission Report NUREG/CR-4651, May, Washington, DC, USA, 109 pp.
Abt, S. R., Thornton, C. I., Gallegos, H. A. & Ullmann, C. M. 2008 Round-shaped riprap stabilization in overtopping flow. J. Hydraul. Eng-ASCE 134 (8), 1035–1041.
Abt, S. R., Thornton, C. I., Scholl, B. A. & Bender, T. R. 2013 Evaluation of overtopping riprap design relationships. J. Amer. Water Resour. Ass. 49 (4), 923–937.
Amaranto, A., Munoz-Arriola, F., Corzo, G., Solomatine, D. P. & Meyer, G. 2018 Semi-seasonal groundwater forecast using data-driven models in an irrigated cropland. J. Hydroinform. 20 (6), 1227–1246.
Antunes, A., Andrade-Campos, A., Sardinha-Lourenço, A. & Oliveira, M. S. 2018 Short-term water demand forecasting using machine learning techniques. J. Hydroinform. 20 (6), 1343–1366.
Azamathulla, H. M., Deo, M. C. & Deolaikar, P. B. 2005 Neural networks for estimation of scour downstream of a ski-jump bucket. J. Hydraul. Eng-ASCE 131 (10), 898–908.
Babovic, V. 2005 Data mining in hydrology. Hydrol. Process. 19 (7), 1511–1515.
Babovic, V. & Keijzer, M. 2000 Genetic programming as a model induction engine. J. Hydroinform. 2 (1), 35–60.
Borah, D. K. 1989 Scour-depth prediction under armoring conditions. J. Hydraul. Eng-ASCE 115 (10), 1421–1425.
Breiman, L. 2001 Random forests. Mach. Learn. 45 (1), 5–32.
Caradot, N., Riechel, M., Fesneau, M., Hernandez, N., Torres, A., Sonnenberg, H., Eckert, E., Lengemen, N., Waschnewski, J. & Rouault, P. 2018 Practical benchmarking of statistical and machine learning models for prediction of sewer pipes in Berlin, Germany. J. Hydroinform. 20 (5), 1131–1147.
Dey, S. & Barbhuiya, A. K. 2004 Clear-water scour at abutments in thinly armored beds. J. Hydraul. Eng-ASCE 130 (7), 622–634.
Dornack, S. 2001 Überströmbare Dämme – Beitrag zur Bemessung von Deckwerken aus Bruchsteinen (Overtoppable Dams – A Contribution to the Design of Riprap). Dissertation, Technische Universität Dresden, Germany.
Eli, R. N. & Gray, D. D. 2008 Hydraulic performance of a steep single layer riprap drainage channel. J. Hydraul. Eng-ASCE 134 (11), 1651–1655.
Froehlich, D. C. 1995 Armor-limited clear-water contraction scour at bridges. J. Hydraul. Eng-ASCE 121 (6), 490–493.
Gallegos, H. 2001 Design Criteria for Rounded/Angular Rock Riprap in Overtopping Flow. Master's thesis, Colorado State University, Fort Collins, CO, USA.
Hartung, D. C. & Scheuerlein, H. 1970 Design of overflow rockfill dams. In: Proceedings of the 10th International Congress of Large Dams, Montreal, Canada, 1, pp. 587–598.
Hiller, P. H., Aberle, J. & Lia, L. 2018 Displacements as failure origin of placed riprap on steep slopes. J. Hydraul. Res. 56 (2), 141–155.
Hiller, P. H., Lia, L. & Aberle, J. 2019 Field and model tests of riprap on steep slopes exposed to overtopping. J. Appl. Water Eng. Res. 7 (2), 103–117.
Isbash, S. 1936 Construction of Dams by Dumping Stones Into Flowing Water. U.S. Army Engineer District, Eastport, ME, USA.
Keijzer, M. & Babovic, V. 2002 Declarative and preferential bias in GP-based scientific discovery. Genet. Program. Evol. Mach. 3 (1), 41–79.
Khan, D. & Ahmad, Z. 2011 Stabilization of angular-shaped riprap under overtopping flows. World Acad. Sci. Eng. Tech. 5 (11), 550–554.
Khan, M., Tufail, M., Azamathulla, H. M., Ahmad, I. & Muhammad, N. 2018 Genetic functions-based modelling for pier scour depth prediction in coarse bed streams. P. I. Civil Eng.-Wat. M. 171 (5), 225–240.
Knauss, J. 1979 Computation of maximum discharge at overflow rockfill dams (a comparison of different model test results). In: Proceedings of the 13th ICOLD Congress, Q.50–R.9, New Delhi, India, pp. 143–160.
Lauchlan, C. S. & Melville, B. W. 2001 Riprap protection at bridge piers. J. Hydraul. Eng-ASCE 127 (5), 412–418.
Maynord, S. T. 1992 Riprap stability: Studies in near-prototype size laboratory channel. Technical Rep. HL-92-5, Waterways Experiment Station, Vicksburg, MS.
Mishra, S. K. 1998 Riprap Design of Overtopped Embankments. PhD dissertation, Department of Civil Engineering, Colorado State University, Fort Collins, CO, USA.
Najafzadeh, M., Rezaie-Balf, M. & Tafarojnoruz, A. 2018 Prediction of riprap stone size under overtopping flow using data-driven models. Int. J. River Basin Manage. 16 (4), 505–512.
Olivier, H. 1967 Through and overflow rockfill dams – new design techniques. P. I. Civil Eng. 36 (3), 433–471.
Palt, S. M. 2002 Entwicklung eines Dimensionierungskonzepts für naturnahe raue Rampen anhand von Naturuntersuchungen in Gebirgsflüssen (Development of a Design Concept for Near-Natural Rough Ramps Based on Field Investigations in Mountain Rivers). In: Proceedings of the International Symposium Moderne Methoden und Konzepte im Wasserbau, Band 2, 7–9 October 2002, Zürich, Switzerland.
Pham, B. T., Son, L. H., Hoang, T.-A., Nguyen, D.-M. & Bui, D. T. 2018 Prediction of shear strength of soft soil using machine learning methods. Catena 166, 181–191.
Robinson, K. M., Rice, C. E. & Kadavy, K. C. 1998 Design of rock chutes. Trans. Amer. Soc. Agr. Engr. 41 (3), 621–626.
Sharafati, A., Yasa, R. & Azamathulla, H. M. 2018 Assessment of stochastic approaches in prediction of wave-induced pipeline scour depth. J. Pipeline Syst. Eng. 9 (4), 04018024.
Sommer, P. 1997 Überströmbare Deckwerke (Overtoppable erosion protections). Institut für Wasserbau und Kulturtechnik, Versuchsanstalt für Wasserbau, Universität Karlsruhe, Karlsruhe, Germany.
Thornton, C., Abt, S. R., Clopper, C., Scholl, B. N. & Cox, A. L. 2012 Rock Stability Testing in Overtopping Flow – 2012 (Hydraulics Laboratory Technical Report 2012-1). Engineering Research Center, Colorado State University, Fort Collins, CO.
Thornton, C. I., Cox, A. L. & Turner, M. D. 2008 Las Vegas Wash Sloped Rock-Weir Study. Report Prepared for the Southern Nevada Water Authority. Colorado State University, Fort Collins, CO, USA.
Thornton, C., Steven, R., Abt, F., Bryan, N. & Theodore, R. 2014 Enhanced stone sizing for overtopping flow. J. Hydraul. Eng-ASCE 140 (4), 06014005.
Ullmann, C. M. & Abt, S. R. 2000 Stability of rounded riprap in overtopping flow. In: Joint Conference on Water Resource Engineering and Water Resources Planning and Management, Minneapolis, MN, USA.
Vapnik, V. 1995 The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA.
Walters, W. H. 1982 Rock Riprap Design Methods and Their Applicability to Long-Term Protection of Uranium Mill Tailings Impoundments. Pacific Northwest Laboratory Operated by Battelle Memorial Institute, Richland, WA, USA.
Whittaker, J. & Jäggi, M. 1986 Blockschwellen, vol. 91, Laboratory for Hydraulics, Hydrology and Glaciology. ETH Zürich, Switzerland.
Wittler, R. J. 1994 Mechanics of Riprap in Overtopping Flow. PhD thesis, Colorado State University, Fort Collins, CO.
Wittler, R. J. & Abt, S. R. 1997 Riprap Design for Full Spectrum Overtopping Flows. U.S. Bureau of Reclamation, Denver, CO, USA.
Yu, X., Liong, S.-Y. & Babovic, V. 2004 EC-SVM approach for real-time hydrologic forecasting. J. Hydroinform. 6 (3), 209–223.