## Abstract

Riprap stones are frequently applied to protect rivers and channels against erosion processes. Many empirical equations have been proposed in the past to estimate the unit discharge at the failure condition of riprap layers. However, these equations lack generality due to the limited range of experimental variables. To overcome these shortcomings, support vector machine (SVM), multivariate adaptive regression splines (MARS), and random forest (RF) techniques have been applied in this study to estimate the approach densimetric Froude number at the incipient motion of riprap stones. Riprap stone size, streambank slope, uniformity coefficient of the riprap layer stones, specific density of stones, and thickness of the riprap layer have been considered as controlling variables. Quantitative performances of the artificial intelligence (AI) models have been assessed by several statistical measures: coefficient of correlation (R), root mean square error (RMSE), mean absolute error (MAE), and scatter index (SI). The statistical performance of the AI models indicated that the SVM model with radial basis function (RBF) kernel performed better (SI = 0.37) than the MARS (SI = 0.75) and RF (SI = 0.63) techniques. The proposed AI models also performed better than existing empirical equations. A parametric study demonstrated that the erosion-critical stone-referred Froude number (*F _{s,c}*) is mainly controlled by the streambank slope.

## INTRODUCTION

Rock armor (commonly known as riprap) has been used in hydraulic engineering to protect hydraulic structures such as bridge piers, grade-control structures, bridge abutments, culvert outlets, end sills of stilling basins, ski-jump bucket spillways, dam embankments, and channel beds, which are exposed to scour and erosion processes (e.g., Borah 1989; Froehlich 1995; Lauchlan & Melville 2001; Dey & Barbhuiya 2004; Eli & Gray 2008; Hiller *et al.* 2018). The stability of ripraps is a significant factor in their design. The unit discharge of the overtopping flow, the gradation and shape of riprap stones, and the bed and bank slope of waterways strongly affect the stability of ripraps (e.g., Ullmann & Abt 2000; Thornton *et al.* 2008; Eli & Gray 2008). Underestimation of these effective variables may increase the possibility of scouring or failure of the armored rock layer. In contrast, overestimation of these important elements increases the cost of the project (Thornton *et al.* 2014). For instance, accurate estimation of the stone sizes enhances the stability of ripraps, especially when they are vulnerable to overtopping (Thornton *et al.* 2014). Hence, a large number of studies have investigated riprap stability on steep slopes for different hydraulic conditions and gradations of riprap stones (Hartung & Scheuerlein 1970; Abt *et al.* 1987; Wittler & Abt 1997; Ullmann & Abt 2000; Gallegos 2001; Eli & Gray 2008; Hiller *et al.* 2018, 2019).

In the effort to quantify the overtopping phenomenon, many empirical equations, extracted from experimental observations, have been proposed to estimate the unit discharge at the failure condition of a riprap layer for various streambank slopes and bed sediment properties (Thornton *et al.* 2014). However, these equations lack generalization due to the limited range of experimental variables and hence do not extend to a wide range of hydraulic conditions (Thornton *et al.* 2014; Najafzadeh *et al.* 2018). Moreover, these empirical relationships are based on traditional regression approaches that cannot robustly capture the non-linear relationship between the key variables at the incipient motion of riprap stones.

Due to the above-mentioned restrictions, artificial intelligence (AI) approaches have been recently employed to accurately estimate the riprap stone size. Najafzadeh *et al.* (2018) used evolutionary algorithm-based formulations to predict the size of riprap stones in overtopping flows. From their research, it was found that the utilized AI models could provide more accurate predictions.

Recently, AI-based data classification and machine learning methods have been employed for forecasting the groundwater table (Giustolisi 2006; Amaranto *et al.* 2018), evaluating the condition of sewer networks (Caradot *et al.* 2018), estimating chlorophyll-a concentration in surface waters (Yajima & Derot 2017), predicting short-term water demand (Antunes *et al.* 2018), estimating suspended sediment concentration in rivers (Babovic 2009), run-off forecasting (Babovic 2005; Adamowski *et al.* 2012; Meshgi *et al.* 2015), predicting the standardized precipitation index (Komasi *et al.* 2018), the shear strength of soil (Pham *et al.* 2018), and longitudinal dispersion coefficients in rivers (Haghiabi 2016). Among these studies, support vector machine (SVM), multivariate adaptive regression splines (MARS), and random forest (RF) are the most robust machine learning models that have been applied to various problems in water engineering, and they were selected here because of their remarkable advantages. The most notable characteristic of SVM is its high potential for generalizing from small training datasets; additionally, SVM does not get stuck in local optima, unlike artificial neural networks. RF is generally fast to build and has the capability to automatically select among a large number of input variables; for quite a few datasets, RF is able to produce a highly precise classifier. As a merit, the MARS technique does not require an a priori functional relationship between independent and dependent variables and, in addition, the relationships given by the MARS model are additive and interactive. In the case of the incipient motion of a riprap under overtopping flows, there are more than 20 empirical equations obtained by experimental investigations. Each empirical equation was extracted from particular experimental conditions and a limited range of experimental variables.
Khan & Ahmad (2011) collected previous experimental data and presented a multiple regression equation over all the available datasets. Even though their equation had the highest precision in comparison to previous empirical equations, Khan & Ahmad (2011) performed only a limited validation of it: the equation was not subsequently tested on new experimental datasets, and its accuracy was not verified through randomized partitioning of the data into calibration (training) and validation (testing) sets, so its generalization capability cannot be established. During the recent half-century, data-acquisition systems have been employed to obtain information about physical processes, and with the emergence of contemporary sciences these systems have delivered increasingly accurate and reliable results. Furthermore, data acquisition can be combined with machine learning models to obtain a more reliable recognition of behavioral patterns for various phenomena with engineering applications. There is no denying that the use of machine learning models in the prediction of various variables can efficiently cover the limitations of empirical techniques.

With this study, there is no claim to clarify the overtopping phenomenon in a comprehensive manner; rather, the aims are to: (i) emphasize the limitations of the current empirical equations; (ii) highlight the key dimensionless variables controlling the overtopping phenomenon; and (iii) provide predictive models that, even though more structurally complex, are more accurate. To the best of the authors' knowledge, the three powerful AI models SVM, MARS, and RF have not previously been used in designing the size of riprap stones. In addition, new contributions in terms of AI modeling methodology are provided in this study. Namely, the insufficient number of reliable empirical equations for estimating the unit discharge at the failure state, which is employed in the design of the riprap stone size, can jeopardize the stability of bed slopes in waterways, rivers, and channels, and regression-based equations still suffer from a high level of inaccuracy. More specifically, in such investigations the results from the AI models must be connected to the physics of the problem, and the only way to show this link is to examine the AI results for consistency. This implies that the general patterns recognized between input and output variables should be conceptually investigated, so that agreement between these patterns and the experimental studies reported in the literature can be verified. In this study, it is shown how AI models can be reliable techniques that recognize the existing general behavior of the input–output system. Therefore, experimental datasets from the literature are used to assess the performance of the SVM, MARS, and RF techniques in the prediction of the dimensionless overtopping discharge at the riprap failure condition. A parametric study is conducted to illustrate the consistency of the AI models' results in riprap design. Finally, results from the AI techniques are compared with those obtained from the empirical equations.

## A SURVEY OF EXPERIMENTAL AND FIELD STUDIES

where *q _{f}*, *S*, *D*_{50}, and *G _{s}* denote the unit discharge at failure (or incipient motion) of a riprap layer, the bed (or embankment) slope, the median riprap stone size, and the specific density of the riprap stone, respectively. Specifically, *G _{s}* is the ratio of the riprap stone density (*ρ _{s}*) to the water density (*ρ*). In Equation (1), *q _{f}* and *D*_{50} are in ft²/s and ft, respectively.

*D*_{50} and *q _{f}* are in feet and ft²/s, respectively.

In Robinson *et al.*'s (1998) study, a rock chute with a riprap layer of angular geometry was designed. They presented the following empirical equations predicting the unit discharge at the failure condition, on the basis of the geometric characteristics of a riprap layer, where the median riprap stone size (*D*_{50}) is expressed in mm and *q _{f}* in m²/s.

Through these experiments, the slope of the riprap varied from 0.29 to 0.67. Additionally, the *D*_{50} values were between 0.03 and 0.05 m and the riprap stone density *ρ _{s}* was equal to 2,610 kg/m³. A relationship was recommended in which *F _{s,c}* is the stone-referred densimetric Froude number, *θ* is the slope angle, and Δ is equal to *G _{s}* − 1 = (*ρ _{s}* − *ρ*)/*ρ*. Hereafter, the terminology adopted for *F _{s,c}* is in harmony with the definition provided by Siebel (2007).

_{s,c}*D*

_{50}and

*q*are in cm and L/m·s, respectively. Khan & Ahmad (2011) inferred from previous experimental works the following regression-based equation for the unit discharge at the riprap failure state:where is the uniformity coefficient of riprap stones and

_{f}*t*(in mm) is the thickness of riprap layer. Moreover,

*q*is in m

_{f}^{2}/s and

*D*

_{50}in mm. Then, Khan & Ahmad (2011) asserted that is a function of the physical proprieties of the riprap stones.

Abt *et al.* (2013) compared 21 practical formulations. From their study, a power regression relationship was drawn, capable of comparing the observed values of *D*_{50} against the predicted ones for both the comprehensive dataset and its subsets over a lower range (*D*_{50} < 5.1 cm), a middle range (5.1 cm ≤ *D*_{50} ≤ 25.4 cm), and an upper range (*D*_{50} > 25.4 cm). They found that the equation proposed by Thornton *et al.* (2014), based on 102 experimental datasets and using the same units of measurement as Equation (7), provided the highest level of accuracy; in particular, Equation (8) exhibited more accurate predictions than Equation (7).

Finally, Hiller *et al.* (2018) conducted a field study of a large-scale riprap layer with *D*_{50} = 0.37 m. A corresponding experimental setup at a scale of 1:6.5 was also constructed, with a slope of 1:1.5 (vertical:horizontal), in order to investigate the stability criterion, the packing density of the riprap stones, and illustrative flow patterns. Their study revealed an interesting similarity between field and laboratory results when considering the stone-referred densimetric Froude number *F _{s,c}*.

## DATA DESCRIPTIONS

Controlling the unit discharge at the incipient motion of riprap stones can mitigate the occurrence of the overtopping phenomenon and, additionally, an accurate estimation of this variable can increase the stability of a riprap layer subject to overtopping. The stability of riprap stones depends on the maximum value of the unit discharge, the roughness height, the specific gravity, the riprap stone size, and the embankment slope (Isbash 1936; Hartung & Scheuerlein 1970; Mishra 1998; Abt *et al.* 2008). More specifically, according to previous experimental works, the unit discharge at failure (or incipient motion) of a riprap layer (*q _{f}*) is a function (*ψ*) of the bed (or embankment) slope (*S*), the median riprap stone size (*D*_{50}), the uniformity coefficient of the riprap stones (*C _{u}*), the riprap layer thickness (*t*), the stone density (*ρ _{s}*), and the water density (*ρ*) (Abt *et al.* 2013):

*q _{f}* = *ψ*(*S*, *D*_{50}, *C _{u}*, *t*, *ρ _{s}*, *ρ*) (9)

Figure 1 provides a scheme of a riprap embankment protection exposed to overtopping flow and the description of the main variables in Equation (9).

On the basis of dimensional analysis (the Buckingham theorem), Equation (9) can be expressed as (Hiller *et al.* 2018):

*F _{s,c}* = *q _{f}*/√(*g* Δ *D*_{50}³) = *Φ*(*S*, *C _{u}*, *t*/*D*_{50}) (10)

where *g* is the gravitational acceleration. In AI applications to hydraulic engineering problems, especially sediment transport, the use of dimensionless parameters has given more accurate estimations than dimensional variables (e.g., Azamathulla *et al.* 2005; Samadi *et al.* 2014; Khan *et al.* 2018; Sharafati *et al.* 2018). Furthermore, the use of a stone-referred Froude number is a reasonable choice for predicting the unit discharge of a riprap layer at the failure state, as noted in the literature (Siebel 2007).
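Assuming the stone-referred densimetric Froude number is defined as *F _{s,c}* = *q _{f}*/√(*g* Δ *D*_{50}³), consistent with Siebel (2007) and Hiller *et al.* (2018), it can be computed directly from the dimensional quantities. A minimal sketch (function and variable names are ours, not the study's), cross-checked against the Maynord (1992) row of Table 1:

```python
import math

def densimetric_froude(q_f, d50, delta, g=9.81):
    """Stone-referred densimetric Froude number: q_f / sqrt(g * delta * d50^3).

    q_f   : unit discharge at failure (m^2/s)
    d50   : median riprap stone size (m)
    delta : relative submerged density G_s - 1 (dimensionless)
    """
    return q_f / math.sqrt(g * delta * d50 ** 3)

# Cross-check against the Maynord (1992) row of Table 1:
# q_f = 0.75 m^2/s, D50 = 1.52 cm, delta = 1.65 -> F_sc close to 99.49.
print(round(densimetric_froude(0.75, 0.0152, 1.65), 1))  # 99.5
```

The agreement with the tabulated value (99.49) suggests this is indeed the dimensionless group used in Table 1.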

The three dimensionless parameters on the right-hand side of Equation (10) were used as inputs to the SVM, MARS, and RF models. The explored ranges of the dimensional variables are given in Table 1. In this work, 102 experimental data points collected from the literature were considered. The raw (i.e., unprocessed) data were taken exactly as measured in the experimental works. The datasets cover a wide range of experimental conditions, from small- to very large-scale laboratory studies, and Table 1 reports the nature of the data (e.g., laboratory or field experiments) for each literature source. The influence of the different scales on the accuracy of the AI approaches and empirical equations was assumed negligible; admittedly, this assumption can reduce the generalization capacity of the AI models, as noted in previous literature (Najafzadeh *et al.* 2018). The experimental dataset was divided into two parts: 75% of the data (76 data points) was used to train the AI models, and the remaining 25% (26 data points) was used to test them. Overall, empirical equations for predicting the discharge at the failure state are in non-dimensional form and, additionally, the graphical representation of dimensionless parameters against a non-dimensional effective parameter (i.e., a design curve) is of high interest to engineers. Accordingly, non-dimensional parameters were used to run the AI models in order to remove the effect of the input–output scale of the experimental data on model performance. The use of dimensionless parameters not only increases the applicability of traditional equations from the experimental to the field scale, but also makes the estimations more reliable. Furthermore, three important assumptions hold for the considered experimental datasets: first, the approach flow was fully turbulent; second, the effect of the channel side-walls was negligible; and third, the flow at the flow–riprap interface was fully turbulent.
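The 75/25 partition described above can be sketched as follows; the random seed and shuffling scheme are assumptions, since the paper does not state how the split was randomized:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility (our choice)

n_samples = 102                       # experimental data points in this study
indices = rng.permutation(n_samples)  # random shuffle of sample indices

n_train = 76                          # 75% of 102, as stated in the text
train_idx, test_idx = indices[:n_train], indices[n_train:]

print(len(train_idx), len(test_idx))  # 76 26
```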

Authors | *D*_{50} (cm) | *q _{f}* (m²/s) | *C _{u}* | Δ | *S* | φ (°) | *t* (cm) | *t*/*D*_{50} | *F _{s,c}* |
---|---|---|---|---|---|---|---|---|---|

Abt et al. (1987) | 2.59–5.59 | 0.21–0.66 | 1.75–2.09 | 1.72 | 0.01–0.02 | 38–40 | 7.77–16.77 | 3 | 12.15–29.20 |

Abt & Johnson (1991) | 2.60–15.70 | 0.03–0.42 | 1.75–2.30 | 1.65–1.72 | 0.01–0.20 | 38–42 | 7.54–31.20 | 2–3 | 0.92–8.13 |

Maynord (1992) | 1.52 | 0.75 | 2.07 | 1.65 | 0.002 | 36 | 2.54 | 1.67 | 99.49 |

Wittler (1994) | 8.13–8.38 | 0.103–0.291 | 1.56–5.33 | 1.52–1.70 | 0.05–0.20 | 41 | 24.39–25.14 | 3 | 1.08–3.18 |

Mishra (1998) | 27.10–65.50 | 0.204–0.929 | 1.52–1.90 | 1.65 | 0.5 | 42 | 53.11–122.48 | 1.58–1.96 | 0.23–0.43 |

Robinson et al. (1998) | 1.5–27.8 | 0.003–1.626 | 1.25–1.73 | 1.54–1.82 | 0.02–0.40 | 36–42 | 3.00–55.60 | 2 | 0.39–5.63 |

Siebel (2007) | 5.2–7.3 | 0.050–0.282 | 2 | 1.65 | 0.10–0.33 | 40–41 | 15.98–16.03 | 2.19–3.08 | 0.93–3.55 |

Thornton et al. (2008) | 9.91–12.19 | 0.467–0.697 | 1.54–1.86 | 1.65 | 0.005 | 41 | 19.82–24.38 | 2 | 3.66–4.02 |

Thornton et al. (2012) | 1.98–12.19 | 0.026–0.562 | 1.54–1.86 | 1.69 | 0.006–0.500 | 38–41 | 3.98–24.38 | 2 | 1.15–29.00 |

*Note*: φ is the angle of repose of the sediment.

The results from the empirical equations are compared with those of the proposed AI models in the testing stage. Furthermore, histograms for all the variables are illustrated in Figures 2(a)–2(f). These histograms represent the frequency distributions of the main variables controlling the processes under study, providing a compact and effective summary of the characteristics of the literature data; incidentally, this analysis might also prove useful in planning future experimental research. The number of classes was selected so as to obtain reasonable displays. In general, the number of classes mainly depends on the number of observations (although the amount of scatter or dispersion in the data also matters); typically, between 5 and 20 bins is satisfactory in most cases, and choosing a number of bins approximately equal to the square root of the number of observations often works well in practice. Since the number of observations in Figures 2(a)–2(f) is always the same (102), a fixed number of classes (equal to 9) was adopted for each variable. For each variable, the frequency represents the number of times an outcome occurs in the dataset, relative to the total number of observations. From these histograms it follows that the explored range of the embankment slope *S* is satisfactorily large (from 0.002 to 0.50), with an approximately uniform frequency distribution within it. Conversely, the frequency distribution of the other variable of particular interest, the uniformity coefficient *C _{u}*, is clearly asymmetric and positively skewed. The tail of the distribution extends to the largest value of 5.33, but the majority of tests (around 90%) were conducted with *C _{u}* around 2.0, which implies the use of bed sediments only slightly different from uniform ones (i.e., *C _{u}* < 1.5). It is worth noting that the relative riprap layer thickness, *t*/*D*_{50}, was also not adequately explored in the literature, with values almost always around 2–3, as can be seen from Table 1. Finally, Figure 2(e) shows that the specific density of the riprap layer, *G _{s}*, is practically always around 2.65, as one would expect for natural bed sediments. However, for future research, it would be interesting to test synthetic materials with *G _{s}* significantly different from 2.65, in order to definitively assess the role of the stone-referred densimetric Froude number *F _{s,c}*.

## MODELS AND METHODS

In this section, basic definitions of the SVM, MARS, and RF models are briefly introduced; more details can be found in the literature (e.g., Vapnik 1995; Giustolisi 2006). Afterwards, the proposed models are developed using the database extracted from the experimental studies.

### Support vector machine (SVM)

The SVM is a powerful supervised learning technique that provides reliable and robust predictions. SVM minimizes the expected computational error by means of the structural risk minimization (SRM) principle, thereby reducing the risk of overfitting. In machine learning, a generalized model must be selected from a finite training sample; consequently, overfitting may occur, in which the model becomes too strongly tailored to the particularities of the training dataset and generalizes poorly to new data. SRM counteracts this problem by balancing the model's complexity against its success in fitting the training data. SVM maps the input data of the training phase into a higher-dimensional feature space (Vapnik 1995; Yu *et al.* 2004; Amaranto *et al.* 2018; Antunes *et al.* 2018).

Consider a set of input–output pairs (**x**_{i}, *y _{i}*), *i* = 1, …, *N*, where *N* is the sample size of the data. These pairs are used to conduct the training stage. The regression function, *ϕ*, is generally expressed as (Komasi *et al.* 2018):

*ϕ*(**x**) = ⟨**w**, **x**⟩ + *b* (11)

where **w** and *b* are the weighting vector in the feature space (with dimension *o*) and the bias term, respectively, and ⟨·, ·⟩ denotes the inner product. Moreover, by adding the empirical risk function, Equation (11) is converted into the minimization problem (Komasi *et al.* 2018):

minimize (1/2)‖**w**‖² + *C* Σ_{i=1}^{N} (*ξ _{i}* + *ξ _{i}**) (12)

subject to *y _{i}* − ⟨**w**, **x**_{i}⟩ − *b* ≤ *ε* + *ξ _{i}*, ⟨**w**, **x**_{i}⟩ + *b* − *y _{i}* ≤ *ε* + *ξ _{i}**, *ξ _{i}*, *ξ _{i}** ≥ 0 (13)

where *C* denotes a positive constant defining the penalty for the model's computational error, and *ξ _{i}*, *ξ _{i}** are slack variables measuring the deviation of the observed (actual) values from the boundary of the *ε*-insensitive band. In SVM, quadratic programming (QP), one of the most efficient techniques, is applied to solve the non-linear optimization problem of Equations (12) and (13) with linear constraints. With the QP method, Equations (12) and (13) are rearranged as (Pham *et al.* 2018):

*ϕ*(**x**) = Σ_{i=1}^{N} (*α _{i}* − *α _{i}**) *k*(**x**_{i}, **x**) + *b* (14)

where *α _{i}* and *α _{i}** are the Lagrange multipliers, *N* is the sample size, and *k*(·, ·) is the kernel function, which is defined as the inner product of the feature-space mapping functions *Φ*:

*k*(**x**_{i}, **x**_{j}) = ⟨*Φ*(**x**_{i}), *Φ*(**x**_{j})⟩ (15)
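As a concrete illustration of a kernel function evaluation (the RBF kernel from Table 2), a kernel value can be computed directly; the *γ* value below is arbitrary, chosen only for the example:

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel: k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    diff = np.asarray(x_i, float) - np.asarray(x_j, float)
    return np.exp(-gamma * np.dot(diff, diff))

# Identical points give k = 1; distant points decay towards 0.
print(rbf_kernel([0.1, 2.0, 3.0], [0.1, 2.0, 3.0], gamma=1.0))  # 1.0
```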

The input variables in SVM are transformed through kernel function-based formulations (radial basis function, sigmoid, linear, and polynomial); in this mapping process, the kernelized inputs define a non-linear problem. The kernel functions tested in this study are shown in Table 2. Some SVM parameters need to be tuned, such as the regularization parameter (*C*), the kind of kernel function, and the kernel parameters (*r*, *d*, *γ*) (Amaranto *et al.* 2018). With reference to this study, the optimal magnitudes of the parameters for each type of kernel function are given in Table 3. All the kernel functions were tested, and their mean squared errors (MSEs) in the training stage were compared to select the best performance. Table 3 indicates that the RBF (MSE = 1.27) and polynomial (MSE = 1.41) kernels achieved comparatively lower computational errors in predicting *F _{s,c}* than the linear (MSE = 3.36) and sigmoid (MSE = 4.41) kernel functions. Since the SVM with the RBF kernel was also more accurate than that with the polynomial kernel, the RBF kernel function was adopted in this study. Moreover, when building the SVM, K-fold cross-validation was used to reduce the possibility of overfitting; various K-fold numbers (3, 5, 8, and 10) were tested and K-fold = 10 gave the best performance in terms of accuracy.

Kernel type | Formulation |
---|---|

Linear | *k*(**x**_{i}, **x**_{j}) = **x**_{i}·**x**_{j} |

Radial basis function (RBF) | *k*(**x**_{i}, **x**_{j}) = exp(−*γ*‖**x**_{i} − **x**_{j}‖²), *γ* > 0 |

Polynomial | *k*(**x**_{i}, **x**_{j}) = (*γ* **x**_{i}·**x**_{j} + *r*)^{d}, *γ* > 0 |

Sigmoid | *k*(**x**_{i}, **x**_{j}) = tanh(*γ* **x**_{i}·**x**_{j} + *r*) |

Kernel functions | Setting parameters | MSE (dimensionless variables) |
---|---|---|

Linear | *γ* = 0.0164; *C* = 149.071 | 3.36 |

Radial basis function (RBF) | *γ* = 2564.67; *C* = 58.548 | 1.27 |

Polynomial | *r* = 1.66; *γ* = 0.1312; *d* = 5; *C* = 73.061 | 1.41 |

Sigmoid | *γ* = 0.1560; *r* = 1.85; *C* = 54.210 | 4.41 |
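A kernel-tuning loop of this kind can be sketched with scikit-learn; this is an assumption (the paper does not name its software), and the data and parameter grids below are illustrative placeholders, not the study's values:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the (S, C_u, t/D50) -> F_sc training set;
# the real experimental values come from the sources in Table 1.
rng = np.random.default_rng(0)
X = rng.uniform([0.002, 1.25, 1.58], [0.50, 5.33, 3.08], size=(76, 3))
y = 0.2 * X[:, 0] ** -0.5 + X[:, 2]  # placeholder relationship, not the paper's

# RBF-kernel SVR tuned by 10-fold cross-validation over (C, gamma),
# mirroring the K-fold strategy described above (grids are illustrative).
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1.0, 10.0, 100.0], "gamma": [0.1, 1.0, 10.0]},
    cv=10,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

With real data, the cross-validated grid would be widened around promising values rather than kept this coarse.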

### Multivariate adaptive regression splines (MARS)

To enhance the accuracy of MARS, a backward pruning technique is employed to remove unnecessary basis functions (BFs) using the generalized cross-validation (GCV) criterion, computed as:

GCV = [(1/*N*) Σ_{i=1}^{N} (*y _{i}* − *ŷ _{i}*)²] / [1 − *PF*/*N*]²

where *PF* denotes the penalty factor, which is calculated as:

*PF* = *P* + *d _{e}* (*P* − 1)/2

in which *d _{e}* and *P* denote the determination parameter and the number of BFs, respectively.

The MARS model relates the input vector *O* to the output vector *T* through a sum of BFs plus an error term, which represents the specific pattern of the error predicted by the model. The function being estimated by the BFs is built from pieces that are, at the simplest, linear (or low-order polynomial) with a smooth trend. For the piecewise-linear case, the basic formulation is max(0, *x* − *u*), where a knot exists at the value *u*, so that:

max(0, *x* − *u*) = *x* − *u* if *x* > *u*, and 0 otherwise.

The MARS technique then generates a linear combination of BFs:

*f*(*x*) = *v*_{0} + Σ_{m=1}^{P} *v _{m}* BF_{m}(*x*)

in which BF_{m}(*x*) is a basis function and *v _{m}* denotes the constant coefficients of the BFs, which are estimated via the least square method (LSM).

The final model output is obtained as a linear combination of BFs. To assess the comparative significance of the input vectors and the BFs, analysis of variance (ANOVA) decomposition is applied (Haghiabi 2016). In the present investigation, 17 BFs were adjusted for the prediction of *F _{s,c}*, as listed in Table 4. Results of the ANOVA decomposition of the proposed MARS technique are given in Table 5; the GCV values in the third column provide information about the comparative significance of the corresponding ANOVA functions. The best model extracted from MARS for the prediction of *F _{s,c}* is a linear combination of these BFs, with coefficients estimated by the least square method.

BF | Equation |
---|---|

BF1 | max(0, *S* − 0.04) |

BF2 | max(0, 2 − *t*/*D*_{50}) × max(0, 0.2 − *S*) |

BF3 | max(0, *t*/*D*_{50} − 2.9) |

BF4 | max(0, *S* − 0.04) × max(0, *t*/*D*_{50} − 2.9) |

BF5 | max(0, *S* − 0.04) × max(0, 2.9 − *t*/*D*_{50}) |

BF6 | max(0, 0.05 − *S*) |

BF7 | max(0, *S* − 0.05) × max(0, *t*/*D*_{50} − 2.9) |

BF8 | max(0, *S* − 0.05) × max(0, 2.9 − *t*/*D*_{50}) |

BF9 | max(0, 2.9 − *t*/*D*_{50}) × max(0, *C*_{u} − 1.65) |

BF10 | max(0, 2.9 − *t*/*D*_{50}) × max(0, 1.65 − *C*_{u}) |

BF11 | max(0, *C*_{u} − 1.54) |

BF12 | max(0, *S* − 0.1) |

BF13 | max(0, 0.1 − *S*) |

BF14 | max(0, *C*_{u} − 2.3) |

BF15 | max(0, 2.3 − *C*_{u}) × max(0, *t*/*D*_{50} − 2.39) |

BF16 | max(0, 2.3 − *C*_{u}) × max(0, 2.39 − *t*/*D*_{50}) |

BF17 | max(0, 1.75 − *C*_{u}) × max(0, 0.04 − *S*) |
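The hinge-type BFs in Table 4 are straightforward to evaluate; a minimal sketch (helper names are ours) for BF1 and BF2:

```python
def hinge(x, knot, sign=+1.0):
    """Piecewise-linear MARS basis function: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))

# BF1 and BF2 from Table 4 (knots as tabulated; the coefficients of the
# final MARS model are fitted by least squares and are not shown here).
def bf1(S):
    return hinge(S, 0.04)                                      # max(0, S - 0.04)

def bf2(t_over_d50, S):
    return hinge(t_over_d50, 2.0, -1.0) * hinge(S, 0.2, -1.0)  # max(0, 2 - t/D50) * max(0, 0.2 - S)

print(round(bf1(0.10), 2))       # 0.06
print(round(bf2(1.5, 0.1), 2))   # 0.05
```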

MARS technique by dimensionless variables

Function | Standard deviation | GCV | Number of basis functions | Variables |
---|---|---|---|---|

1 | 6.54 | 261.97 | 1 | *t*/*D*_{50} |

2 | 3.81 | 74732.21 | 4 | *S* |

3 | 3.48 | 110.35 | 2 | *C _{u}* |

4 | 10.96 | 601.93 | 5 | *t*/*D*_{50} and *S* |

5 | 5.16 | 106.58 | 4 | *t*/*D*_{50} and *C _{u}* |

6 | 2.05 | 32.28 | 1 | *S* and *C _{u}* |

Intrinsically, the MARS technique is capable of producing a polynomial (quadratic) regression based on spline concepts. However, in this study, all 17 BFs have simple formulations, and all the input variables used in them (*S*, *C _{u}*, *t*/*D*_{50}) are easy to acquire. As will be shown in the following, the statistical benchmarks indicated that the MARS technique performed more satisfactorily than the empirical equations. Additionally, the results of the MARS model adapt flexibly to changes in the ranges of the inputs. In this regard, two important issues were considered when running MARS: (1) preserving the physical meaning (consistency) of the results and (2) obtaining the highest level of accuracy in comparison with the empirical equations. Finally, the proposed MARS model is not meant to replace the traditional equations, in which the physical essence is most explicit; rather, the joint use of this AI model and empirical models can lead to highly reliable results.

### Random forest (RF)

In the RF model, the input variables are converted to splitting parameters and their corresponding values are obtained. At this step, the impurity of the child nodes is evaluated and the best splitting parameter is selected among the input variables using the Gini index (GI). This index is a benchmark of how much each input variable contributes to the homogeneity (or impurity) of the nodes and leaves in the resulting RF model. Each time a particular variable is used to split a node, the GI of the child nodes is computed and compared to that of the original node. Furthermore, in the tree structure of the RF approach, the final decision is obtained by averaging the outputs, after assessing the fit of the single trees within the bagging technique. The bias of the bagged trees equals that of the single trees, whereas the variance decreases as the correlation among the trees decreases (Antunes *et al.* 2018). The development of an RF relies on a tree-growing technique governed by a random vector (ϕ), such that the tree estimator *λ*(*X*, ϕ) yields numerical predictions. To evaluate the performance of RF, the mean squared error (E) associated with each numerical estimator *λ*(*X*) is expressed as (Breiman 2001):

E = E_{X,Y}[*Y* − *λ*(*X*)]²

Basically, the larger the forest, the more accurate the results. However, the benefit of growing additional trees diminishes: beyond a certain point, the gain in precision from training more trees becomes smaller than the computational cost of training them. RFs are ensemble techniques, implying an average over a number of trees, in the same way one might estimate the average of a real-valued random variable from a sample. In this case, for 102 data series, a forest with 10 trees can perform more accurately than one with 500 trees because of statistical variance; if accuracy were to degrade systematically as trees are added, something would be wrong with the implementation. Typical values for the number of trees are 10, 30, or 100, and only in very few applications does the benefit of more than 300 trees outweigh the cost of training them. In this study, a forest of 10 trees achieved the highest accuracy among the tree counts tested.
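The averaging step at the heart of bagging can be caricatured in a few lines of Python. Here each "tree" is reduced to the mean of one bootstrap resample, which is an assumption made purely for illustration; a real RF fits a full decision tree per resample, but the ensemble rule (average over trees) is the same.

```python
import random

def bagged_mean(sample, n_trees, rng):
    """Toy bagging ensemble: each 'tree' is the mean of one bootstrap
    resample; the ensemble output is the average over all trees."""
    preds = []
    for _ in range(n_trees):
        boot = [rng.choice(sample) for _ in sample]   # bootstrap resample
        preds.append(sum(boot) / len(boot))           # one tree's prediction
    return sum(preds) / n_trees                       # ensemble average
```

Running this with 10 versus 500 trees on a small sample shows both estimates landing near the sample mean, illustrating why, past a modest forest size, extra trees buy little accuracy.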

## RESULTS AND DISCUSSION

*N* is the number of observations, and the other symbols are as previously defined. In terms of quantification, RMSE is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data; in general, a lower RMSE is better. SI is calculated by dividing RMSE by the mean of the observations; it expresses RMSE as a percentage of the mean observation, i.e., the expected relative error of the parameter.
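The four statistical measures used throughout this study follow directly from their standard definitions and can be sketched as a small self-contained Python function (the function name is ours, not from the paper):

```python
import math

def goodness_of_fit(obs, pred):
    """Return (R, RMSE, MAE, SI) for paired observed/predicted values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    si = rmse / mean_obs                       # scatter index: RMSE / mean(obs)
    # Pearson correlation coefficient R
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((o - mean_obs) ** 2 for o in obs) *
                        sum((p - mean_pred) ** 2 for p in pred))
    return r, rmse, mae, si
```

Note that R is insensitive to constant offsets (a uniformly biased model can still have R = 1), which is why RMSE, MAE, and SI are needed alongside it, as the discussion below confirms.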

### Quantitative comparisons of the proposed AI models

Quantitative comparisons to investigate the performance of SVM-RBF, MARS, and RF were carried out for both the training and testing phases. Table 6 presents the statistical results of the models' performance. In the training phase, the values of R (0.99) and RMSE (1.11) given by MARS indicated a higher level of precision than SVM-RBF (R = 0.99 and RMSE = 1.62) and RF (R = 0.98 and RMSE = 3.61). Moreover, the MARS technique, introduced as a set of linear and quadratic relationships, estimated *F _{s,c}* with MAE = 0.27 and the lowest SI (0.25), compared with SVM-RBF (MAE = 0.31 and SI = 0.37) and RF (MAE = 0.23 and SI = 0.81). Table 6 also indicates that the proposed SVM technique with RBF kernel function has a higher level of accuracy than the RF approach. Overall, the R-values for the training phase differed only marginally from one another and, consequently, may not be a suitable basis for quantitative comparison of performance, whereas the other statistical parameters provide more informative measures of the models' performance.

Training stage

| Model | R | RMSE | MAE | SI |
|---|---|---|---|---|
| SVM-RBF | 0.99 | 1.62 | 0.31 | 0.37 |
| MARS | 0.99 | 1.11 | 0.27 | 0.25 |
| RF | 0.98 | 3.61 | 0.23 | 0.81 |

Testing stage

| Model | R | RMSE | MAE | SI |
|---|---|---|---|---|
| *AI models* | | | | |
| SVM-RBF | 0.98 | 1.17 | 0.58 | 0.37 |
| MARS | 0.92 | 2.32 | 0.31 | 0.75 |
| RF | 0.89 | 1.93 | 0.37 | 0.63 |
| *Empirical equations* | | | | |
| Olivier (1967) | 0.94 | 1.45 | 0.36 | 0.48 |
| Abt & Johnson (1991) | 0.91 | 2.41 | 0.37 | 0.77 |
| Sommer (1997) | 0.94 | 6.01 | 1.77 | 1.45 |
| Robinson *et al.* (1998) | −0.58 | 4.57 | 0.47 | 1.32 |
| Dornack (2001) | 0.93 | 3.95 | 0.53 | 1.18 |
| Siebel (2007) | 0.94 | 1.79 | 0.37 | 0.52 |
| Khan & Ahmad (2011) | 0.95 | 1.67 | 0.43 | 0.53 |
| Thornton *et al.* (2014) | 0.95 | 1.38 | 0.75 | 0.40 |


Through the testing stage, the SVM-RBF model gave a more accurate prediction of *F _{s,c}* with regard to RMSE (1.17) and SI (0.37) than MARS (RMSE = 2.32 and SI = 0.75) and RF (RMSE = 1.93 and SI = 0.63). Similarly, the R-values indicated a slight superiority of SVM-RBF over the other two AI models. With respect to R and MAE, the MARS approach, with R = 0.92 and MAE = 0.31, predicted *F _{s,c}* more accurately than RF (R = 0.89 and MAE = 0.37). Even though the MARS model provided stone-referred densimetric Froude number, *F _{s,c}*, values with a relatively lower accuracy level than SVM-RBF, Equation (21) is more practicable and easier to use than the SVM-RBF and RF approaches.

### Qualitative comparisons of the proposed AI models

Figures 3(a)–3(f) illustrate the graphical performance of the AI models used in the current investigation at both the training and testing phases. At the training stage, the SVM-RBF and MARS techniques had the best performance for the extreme value of *F _{s,c}* = 99.49 (Figures 3(a) and 3(b)), while the RF technique showed a relatively large underestimation (Figure 3(c)). For *F _{s,c}* around 30, the SVM-RBF and RF models behaved similarly, showing slight underprediction within the allowable error range, whereas the MARS approach showed both underprediction and overprediction. Furthermore, the three AI models performed best for *F _{s,c}* < 10. At the testing stage, Figure 3 indicates that, for observed values of *F _{s,c}* between 1 and 2, SVM-RBF overpredicted *F _{s,c}* remarkably (Figure 3(d)). Figure 3(e) shows that MARS had the best performance for *F _{s,c}* < 5; moreover, MARS underpredicted and overpredicted in the ranges of *F _{s,c}* 5–10 and 10–15, respectively. From Figure 3(f), for *F _{s,c}* < 2, the RF model had a relatively acceptable performance with slight overprediction; for *F _{s,c}* just over 6, RF showed remarkable overprediction, whereas for *F _{s,c}* of approximately 10 it underpredicted.

### Comparative study of the empirical equations' performance

In this section, the efficiency of the considered empirical equations (Equations (1)–(8)) was investigated using the testing datasets. According to Table 6, Equation (1), suggested by Olivier (1967), had a clear superiority in estimating *F _{s,c}* over the other empirical equations, with RMSE = 1.45 and SI = 0.48. In contrast, Equation (3), proposed by Sommer (1997), predicted *F _{s,c}* with a higher computational error than the other equations, showing significant overprediction with RMSE = 6.01 and SI = 1.45. Moreover, Siebel's (2007) equation achieved the second rank in accuracy, with MAE = 0.37 and SI = 0.52. With respect to the RMSE and SI criteria, Siebel's (2007) equation (Equation (6)) performed better than Equation (4) by Robinson *et al.* (1998) (RMSE = 4.57 and SI = 1.32) and almost the same as Equation (7) by Khan & Ahmad (2011) (RMSE = 1.67 and SI = 0.53). According to Table 6, Equation (2), proposed by Abt & Johnson (1991), provided *F _{s,c}* predictions with relatively higher accuracy than Equation (5) by Dornack (2001) (RMSE = 3.95 and SI = 1.18). In addition, the RMSE and SI values obtained by Thornton *et al.*'s (2014) equation demonstrate that it estimates *F _{s,c}* more accurately than Dornack's (2001) equation.

In terms of qualitative comparisons, Figures 4(a)–4(h) show the performance of the empirical equations, with significant over- and underpredictions. As can be seen in Figure 4(a), Olivier's (1967) equation (Equation (1)) had a permissible level of performance, whereas Abt & Johnson's (1991) equation slightly overpredicted the lower values of *F _{s,c}*, as illustrated in Figure 4(b).

*F _{s,c}* values by Sommer's (1997) equation suffered from remarkable overpredictions (Figure 4(c)); on the contrary, Figure 4(d) illustrates the opposite trend for Robinson *et al.*'s (1998) equation. As shown in Figure 4(e), Dornack's (2001) equation exhibits large underpredictions for *F _{s,c}* greater than 2. In fact, Equation (5), proposed by Dornack (2001), depends only on the slope of the riprap layer, with a range of *S* from 0.29 to 0.67. According to Figures 4(f) and 4(g), both Siebel's (2007) and Khan & Ahmad's (2011) equations had comparatively acceptable, though still improvable, performance. Ultimately, Figure 4(h) shows that the equation by Thornton *et al.* (2014) is prone to relatively high overpredictions for *F _{s,c}* < 4.

In the final analysis, therefore, the equation proposed by Sommer (1997) appears quite conservative. Conversely, the equations proposed by Robinson *et al.* (1998) and Dornack (2001) lead to considerable underpredictions already from values of *F _{s,c}* > 2 (i.e., low *D _{50}* values). Perhaps this is linked to the objectives pursued by these authors in their studies: rock-fill dam spillways, in the case of Dornack (2001), and rock chutes, in the case of Robinson *et al.* (1998). Both cases imply a focus on high *D _{50}* values and hence more restricted values of *F _{s,c}*. To a lesser extent, the equation suggested by Abt & Johnson (1991) also leads to significant underpredictions, but only for *F _{s,c}* > 10. A plausible reason may be the experimental range of *F _{s,c}* values explored by the authors (from 0.92 to 8.13, as shown in Table 1), on which their equation was calibrated.

_{s,c}## PARAMETRIC STUDY

In this part of the research, the effects of *S* on *F _{s,c}* were investigated for different ranges of *S* itself (Table 7). As regards the RMSE and SI criteria, for *S* values between 0.002 and 0.080, the SVM-RBF and MARS models had the same performance in predicting *F _{s,c}*. Conversely, RF showed lower accuracy (RMSE = 6.70 and SI = 0.52) than the other two AI techniques. For *S* ranging from 0.100 to 0.167, the RF model achieved higher accuracy (RMSE = 0.33 and SI = 0.17) than SVM-RBF (RMSE = 0.84 and SI = 0.45) and MARS (RMSE = 0.53 and SI = 0.28). For slope values in the ranges 0.20–0.25 and 0.30–0.50, the RF technique behaved similarly to the 0.10–0.17 range. In addition, Figures 5(a)–5(c) indicate that all the proposed AI models predicted a downward trend of *F _{s,c}* versus *S*. This trend is in good agreement with the experimental findings of several researchers: experimental studies by Knauss (1979), Whittaker & Jäggi (1986), Palt (2002), and Siebel (2007) were conducted for *F _{s,c}* = 0.5–7.0 and *S* = 0.005–0.550. From these ranges and the performance of the AI models, it can be inferred that the simulated variations of *F _{s,c}* versus *S* are rational and preserve the consistency of the results.

| AI models | S = 0.002–0.008 | S = 0.10–0.17 | S = 0.20–0.25 | S = 0.30–0.50 |
|---|---|---|---|---|
| SVM-RBF | RMSE = 2.90, SI = 0.23 | RMSE = 0.84, SI = 0.45 | RMSE = 0.46, SI = 0.33 | RMSE = 0.42, SI = 0.43 |
| MARS | RMSE = 2.90, SI = 0.24 | RMSE = 0.53, SI = 0.28 | RMSE = 0.44, SI = 0.32 | RMSE = 0.39, SI = 0.40 |
| RF | RMSE = 6.70, SI = 0.52 | RMSE = 0.33, SI = 0.17 | RMSE = 0.27, SI = 0.20 | RMSE = 0.38, SI = 0.33 |


In addition to the slope *S*, the control exerted by the uniformity coefficient of the riprap stones, *C _{u}*, and by the relative thickness of the riprap layer, *t*/*D _{50}*, over the stone-referred densimetric Froude number *F _{s,c}* was explored. This analysis is based on the MARS model proposed in this study (Equation (21)), varying either *C _{u}* or *t*/*D _{50}* while keeping the other independent variables constant. In particular, it was found that the greater *C _{u}* (i.e., the non-uniformity of the riprap material), the greater the resistance of the riprap layer to erosion. This result is in harmony with some empirical equations from the literature (e.g., Equations (7) and (8)), but the dependence of *F _{s,c}* on *C _{u}* would appear more reliable in this study because it is based on a much more wide-ranging dataset. An analogous trend was found for *t*/*D _{50}* (i.e., the greater *t*/*D _{50}*, the greater the resistance of the riprap layer to erosion), as is reasonable to expect from a physical point of view. This result is, notably, in contradiction with Equations (7) and (8) from the literature, according to which the probability of riprap failure would, counterintuitively, increase with increasing thickness *t* of the riprap layer.

## EVALUATION OF THE PROPOSED TECHNIQUES USING DISCREPANCY ANALYSIS

On the basis of Equation (27), if DR is exactly (or roughly) equal to 1, the estimated *F _{s,c}* values coincide with the observed *F _{s,c}* values. If DR is larger than 1, the AI model overpredicts the *F _{s,c}* values; if DR is smaller than 1, the model underpredicts them (Noori *et al.* 2009).
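Assuming DR is the ratio of predicted to observed values (consistent with the over/underprediction reading above), the summary statistics reported in Table 8 can be computed as follows; the function name is ours, for illustration only.

```python
def dr_stats(obs, pred):
    """Discrepancy-ratio statistics: DR = predicted / observed, where
    DR = 1 is a perfect estimate, DR > 1 overprediction, DR < 1
    underprediction."""
    dr = [p / o for o, p in zip(obs, pred)]
    n = len(dr)
    avg = sum(dr) / n
    var = sum((d - avg) ** 2 for d in dr) / n   # population variance
    return {"average": avg, "minimum": min(dr),
            "maximum": max(dr), "variance": var}
```

A low variance of DR (as obtained by MARS in Table 8) means the ratio stays near its average across the whole testing set, i.e., the error is consistent rather than erratic.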

In the current investigation, the testing-stage results of the AI models and empirical equations were used to calculate the DR values. Table 8 reports several statistical parameters of the DR values. From Table 8, the MARS approach achieved the minimum variance compared with the SVM-RBF and RF models. Furthermore, the average of the DR values calculated with Sommer's (1997) equation (Equation (3)) indicates the lowest accuracy among the empirical equations. In the case of Olivier's (1967) equation, the average and variance of the DR values show relatively better performance than Dornack's (2001) equation. Table 8 indicates that the average and variance for Abt & Johnson's (1991) equation are practically the same as for Olivier's (1967) equation. Moreover, the average of the DR values given by Thornton *et al.*'s (2014) equation indicates higher accuracy than Equation (3) by Sommer (1997). For qualitative comparisons of the DR index, the variations of DR versus *S* for the AI techniques and the conventional equations are shown in Figures 6(a) and 6(b), respectively. Figure 6(a) shows that almost all the DR values for the AI models lie between 0.5 and 1.5, and that all the proposed models exhibited only modest underprediction or overprediction; for instance, the AI models overpredicted the *F _{s,c}* values for *S* around 0.125. Furthermore, most of the points are concentrated around the perfect-agreement line DR = 1. Figure 6(b) clearly illustrates that Equation (3) by Sommer (1997) predicted the *F _{s,c}* values with remarkable overprediction compared with the other traditional equations. This result corroborates what has previously been said about the rather conservative nature of Sommer's (1997) equation.

| DR statistics | Average | Minimum | Maximum | Variance |
|---|---|---|---|---|
| *AI models* | | | | |
| SVM-RBF | 1.04 | 0.31 | 2.53 | 0.32 |
| MARS | 1.15 | 0.44 | 2.53 | 0.12 |
| RF | 1.22 | 0.57 | 4.00 | 0.42 |
| *Empirical equations* | | | | |
| Olivier (1967) | 1.20 | 0.38 | 3.00 | 0.44 |
| Abt & Johnson (1991) | 1.19 | 0.42 | 3.56 | 0.44 |
| Sommer (1997) | 2.70 | 1.58 | 8.19 | 2.24 |
| Robinson *et al.* (1998) | 0.64 | 0.02 | 3.01 | 0.39 |
| Dornack (2001) | 0.95 | 0.12 | 4.36 | 0.86 |
| Siebel (2007) | 0.76 | 0.38 | 2.41 | 0.21 |
| Khan & Ahmad (2011) | 1.10 | 0.22 | 3.92 | 0.63 |
| Thornton *et al.* (2014) | 1.59 | 0.26 | 3.93 | 0.75 |


## CONCLUSIONS

This study aimed to evaluate the erosion-critical stone-referred densimetric Froude number (*F _{s,c}*) at the failure condition of the riprap layer for various streambank slopes using three data-mining approaches: MARS, SVM-RBF, and RF. Five input variables were extracted from experimental works in order to develop the AI approaches. Generally, the following conclusions can be drawn from the current investigation:

- Statistical performance at both the training and testing stages demonstrated that the SVM-RBF model provided *F _{s,c}* values with a higher level of accuracy than the MARS model (a set of BFs) and the RF technique. Furthermore, Equation (21), given by the MARS technique, was a more precise soft-computing tool than the other regression-based models.
- Results of the empirical equations indicated that Equations (3)–(5) performed worse than the proposed machine learning approaches in terms of all the statistical criteria considered in this study. Equations (1), (6), and (7) exhibited relatively more acceptable precision in estimating *F _{s,c}* than Equations (3)–(5).
- Quantitative and qualitative variations of *F _{s,c}* versus the slope *S* indicated that the findings of the AI approaches were in permissible agreement with the preceding experimental investigations carried out by Siebel (2007), preserving the consistency of the results.
- DR analysis proved that the *F _{s,c}* values predicted by the AI techniques lie within a permissible error bound, in contrast with the empirical equations, which produced large over- or underestimations.

## REFERENCES
