Abstract
Accurate prediction of a breached dam's peak outflow is a significant factor for flood risk analysis. In this study, the capability of Support Vector Machine and Kernel Extreme Learning Machine as kernel-based approaches and Gene Expression Programming method was assessed in breached dam peak outflow prediction. Two types of modeling were considered. First, only dam reservoir height and volume at the failure time were used as the input combinations (state 1). Then, soil characteristics were added to input combinations to investigate particularly the impact of soil characteristics (state 2). Results showed that the use of only soil characteristics did not lead to a desired accuracy; however, adding soil characteristics to input combinations (state 2) improved the models' accuracy up to 40%. The outcome of the applied models was also compared with existing empirical equations and it was found the applied models yielded better results. Sensitivity analysis results showed that dam height had the most important role in the peak outflow prediction, while the strength parameters did not have significant impacts. Furthermore, for assessing the best-applied model dependability, uncertainty analysis was used and the results indicated that the SVM model had an allowable degree of uncertainty in peak outflow modelling.
HIGHLIGHTS
Some novelties of the present research can be summarized as follows:
The applicability and accuracy of two meta models, namely Gene Expression Programming (GEP) and Support Vector Machine (SVM), are assessed for predicting the peak outflow from breached embankment dams.
Two different scenarios are developed based on dam reservoir height and volume at the time of failure and soil characteristics.
In addition to the geometrical characteristics of the dam, the impact of soil characteristics on modelling the peak outflow is investigated.
The outcomes of the SVM and GEP models are also compared with the existing empirical equations and it is shown that the intelligence models yield better results.
INTRODUCTION
The main purpose of dam building is to enhance the quality of human life by providing drinking water, power generation, navigation, flood control and so on. However, dam failure can cause catastrophic flooding and consequently presents a high risk to human life and property located downstream. In order to prevent and mitigate such hazards, dam owners and agencies responsible for dam safety carefully study, analyze and inspect dams to identify significant failure modes. Overtopping and piping are the most frequently encountered failure modes causing breach of embankment dams (Wahl 2010). The breach parameters (time of failure, breach width, and peak outflow) are crucial in dam risk assessment, and accurate prediction of these parameters remains a challenging task. The peak outflow rate due to dam failure is an important parameter in predicting the inundation water level for emergency planning and flood mitigation. Historical dam failure cases reveal a considerable range of financial, structural, and human losses. For instance, the South Fork Dam in Pennsylvania, USA, failed in 1889 due to overtopping, killing 2200 people and causing significant property losses (Singh & Scarlatos 1988). According to Wahl (2010), the accuracy of flood outflow assessment and related damages strongly depends on the proper computing of the outflow hydrograph caused by a dam break. In the past decades, accurate prediction of peak outflow has attracted much interest among investigators. Several experiments have been performed on cohesive and non-cohesive embankments to study the outflow hydrograph due to a dam break (e.g. Coleman et al. 2002; Gaucher et al. 2010). Kirkpatrick (1997) used data from 13 failed dams and six hypothetical failures to propose a relationship between the peak outflow and water depth. Several regression relationships have been developed to model the peak discharge of breached earth dams, as reviewed below.
Wahl (1998, 2004) compiled previous case studies to propose several empirical relations for estimating dam-breach parameters and the peak discharge. Pierce et al. (2010) built on the Wahl (1998) breach criteria using information about new case studies and performed regression analyses on the composite database. Duricic et al. (2013) proposed a model using the kriging approach to predict peak outflow. Hooshyaripor et al. (2013) derived statistical expressions to predict peak outflow based on observed data and on synthetic data generated using a copula method. Considering the complexity and uncertainty of the dam break phenomenon, the results of the classical models are not always reliable. In fact, limited databases, untested model assumptions and a lack of field data often make the predictive accuracy of these models questionable. Finding reliable approaches for modeling dam failure is critical for the assessment of flood risk, and improving the quality of this modeling has been addressed by multiple research works (e.g. Thornton et al. 2011; Froehlich 2016). A number of researchers have developed prediction models using traditional statistical regression methods and artificial intelligence techniques. Multivariate regression analysis has been a common approach in recent years for prediction of peak discharge (e.g. Froehlich 1995a; Pierce et al. 2010; Thornton et al. 2011). Differential evolutionary algorithms, which can effectively solve complex optimization problems, have also been used (Deng et al. 2020a, 2020b; Song et al. 2021a, 2021b).
So far, meta model methods [e.g. Artificial Neural Networks (ANNs), Neuro-Fuzzy models (NF), Genetic Programming (GP), Support Vector Machine (SVM), and Kernel Extreme Learning Machine (KELM)] have been used for modelling complex hydraulic and hydrologic phenomena. Some examples of their applications are total bedload prediction (Chang et al. 2012), suspended sediment concentration prediction (Kisi et al. 2012), urban flash flood forecasting (Yan et al. 2018), and development of stage-discharge curves (Azamathulla et al. 2011). Among the meta model approaches, SVM and GEP methods have been used in long-term prediction of lake water levels (Lan 2014), statistical downscaling of watershed precipitation (Hashmi et al. 2011), sediment transport modeling in sewers (Roushangar & Ghasempour 2017), and annual rainfall-runoff forecasting (Wang et al. 2018).
In artificial intelligence models we are looking for a learning machine capable of finding an accurate approximation of a natural phenomenon, as well as expressing it in the form of an interpretable equation. However, this bias towards interpretability creates several new issues. The computer-generated hypotheses should take advantage of the already existing body of knowledge about the domain in question. However, the method by which we express our knowledge and make it available to a learning machine remains rather unclear (Babovic 2009). Machine learning, a branch of artificial intelligence, deals with representation and generalization using the data learning technique (Sun & Li 2020). Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on instances of unseen data; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory. There is a wide variety of machine learning tasks and successful applications (Mitchell 1997). In general, the task of a machine learning algorithm can be described as follows: given a set of input variables and the associated output variable(s), the objective is learning a functional relationship for the input-output variables set. It should be noted that due to the black-box nature of artificial intelligence models, the learned relationship between the inputs and output is not revealed. These methods typically do not really represent the physics of a modeled process; they are just devices used to capture relationships between the relevant input and output variables. 
However, when the interrelationships among the relevant variables are poorly understood, when the size and shape of the ultimate solution are difficult to determine, and when conventional mathematical analysis methods do not (or cannot) provide analytical solutions, these methods can predict the variable of interest more accurately.
Estimation of the probable flood under uncertain hydrologic conditions and routing of the flood wave through downstream rivers can provide invaluable information for decision makers. However, the accuracy of the assessment of flood outflow and corresponding damage relies heavily on the appropriate calculation of the outflow hydrograph caused by a dam break. The more accurate the input hydrograph, the more precise the outputs of the routing models, which significantly affect the ultimate risk management plan. Nowadays, meta model techniques are widely used for estimating the key parameters (e.g. peak value) of the input hydrograph. Drawing upon previous data mining models, this paper aims to enhance the quality of estimation of dam failure outflows using two kernel-based models (SVM and KELM) and one heuristic method (GEP) under two scenarios. In the first scenario, only dam reservoir height and volume at failure time were considered as input combinations. In the second scenario, soil characteristics were added to the input combinations and their impact on the peak outflow modeling was assessed. Then, the most effective variables in the prediction process were investigated using sensitivity analysis. The prediction performance of the developed models was then compared with a number of previously developed models as benchmarks. Furthermore, Monte Carlo uncertainty analysis was applied to investigate the dependability of the models used.
MATERIALS AND METHODS
Data collection and sample data set
Several data sets of well-documented dam failure cases are available in the literature. In this study, three types of data sets were used: first, 93 embankment dam failure data sets collected from a variety of sources (i.e. Froehlich 1995a, 1995b; Wahl 1998; Xu & Zhang 2009; Pierce et al. 2010); second, the experimental data from Amini et al. (2011); and third, data generated using Breach software. The ranges of the relevant parameters in the data sets used in this study are given in Table 1, in which Vw is the volume of water behind the dam at failure, Hw is the height of water behind the dam at failure, Qp is the peak outflow, c is the cohesive strength, φ is the internal friction angle, d50 is the median diameter of the soil material, γ is the unit weight of the soil, and n is the porosity.
The range of experimental data
Embankment dam failure data: parameter | Range | Experimental data: parameter | Range | Breach software data: parameter | Range |
---|---|---|---|---|---|
Vw (×10⁶ m³) | 0.0037–660 | Vw (m³) | 0.53–1.64 | Vw (acre-ft) | 104,800–248,037 |
Hw (m) | 1.37–44 | Hw (m) | 0.13–0.41 | Hw (ft) | 82–150 |
Qp (m³/s) | 2.12–16,800 | Qp (m³/s) | 0.027–0.067 | Qp (cfs) | 373,518–3,411,232 |
c (lb/ft²) | 100–250 |  |  |  |  |
φ | 5–40 |  |  |  |  |
d50 (mm) | 0.03–1 |  |  |  |  |
γ (lb/ft³) | 80–160 |  |  |  |  |
n | 0.25–0.4 |  |  |  |  |
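Because the Breach software data are reported in imperial units while the other two data sets use SI units, a conversion step is needed before the data sets can be compared on a common scale. The following is a minimal sketch of such a conversion using standard conversion factors; the function names are illustrative and the source does not state whether or how the authors converted the data.

```python
# Standard conversion factors (exact or rounded to the digits shown).
FT_TO_M = 0.3048                 # 1 ft in m (exact)
ACRE_FT_TO_M3 = 1233.48184       # 1 acre-ft in m^3
CFS_TO_M3S = 0.028316846592      # 1 ft^3/s in m^3/s


def feet_to_metres(value_ft):
    return value_ft * FT_TO_M


def acre_feet_to_cubic_metres(value_acre_ft):
    return value_acre_ft * ACRE_FT_TO_M3


def cfs_to_cubic_metres_per_second(value_cfs):
    return value_cfs * CFS_TO_M3S


# Ranges of the Breach software data as listed in the data table.
breach_ranges_si = {
    "Hw (m)": tuple(feet_to_metres(v) for v in (82, 150)),
    "Vw (m3)": tuple(acre_feet_to_cubic_metres(v) for v in (104800, 248037)),
    "Qp (m3/s)": tuple(cfs_to_cubic_metres_per_second(v) for v in (373518, 3411232)),
}

for name, (low, high) in breach_ranges_si.items():
    print(f"{name}: {low:.4g} to {high:.4g}")
```

In SI units the Breach reservoir heights (about 25 m to 46 m) overlap the upper part of the range of the historical dam failure data.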
Kernel-based methods
Kernel-based (KB) approaches are relatively new methods used for classification and regression. They are based on statistical learning theory and can be used for modeling complex and non-linear phenomena. Two important kernel-based approaches are KELM and SVM, which work with different kernel types: linear, polynomial, radial basis function (RBF) and sigmoid functions in SVM, and linear, polynomial, and RBF functions in KELM.
Kernel Extreme Learning Machine (KELM)



The matrix K is identified as the target matrix of the hidden layers of the neural network. Huang et al. (2012) also introduced kernel functions in the design of ELM. A number of kernel functions are now used in the design of ELM, such as linear, radial basis, normalized polynomial, and polynomial kernel functions. An ELM designed with kernel functions is named KELM. For more detail about KELM, readers are referred to Huang et al. (2012).
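As an illustration of the kernel formulation of Huang et al. (2012), the KELM output for a new input x can be written as f(x) = [K(x, x1), ..., K(x, xN)] (I/C + Ω)⁻¹ T, where Ω is the kernel matrix of the training inputs, T is the vector of training targets and C is a regularization coefficient. The sketch below implements this with an RBF kernel. It is not the authors' implementation: the class name, the parameter values and the synthetic data are assumptions made only for demonstration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix with K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class KELMRegressor:
    """Kernel ELM: beta = (I / C + K)^-1 y, prediction = K(x, X_train) @ beta."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization coefficient (illustrative value)
        self.gamma = gamma  # RBF width parameter (illustrative value)

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = K.shape[0]
        # Solve the regularized linear system instead of forming an inverse.
        self.beta = np.linalg.solve(np.eye(n) / self.C + K, y)
        return self

    def predict(self, X):
        K_new = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K_new @ self.beta


# Synthetic, normalized inputs standing in for (Hw, Vw); not data from the study.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = X[:, 0] ** 1.2 * X[:, 1] ** 0.3

model = KELMRegressor(C=1e3, gamma=2.0).fit(X, y)
predictions = model.predict(X)
train_rmse = float(np.sqrt(np.mean((predictions - y) ** 2)))
print(f"training RMSE on synthetic data: {train_rmse:.4f}")
```

Solving the linear system once at training time is what makes KELM fast compared with iterative training: no hidden-layer weights are tuned, and only C and the kernel parameter need to be selected.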
Support Vector Machine

Gene Expression Programming (GEP)
Gene Expression Programming (GEP) was developed by Ferreira (2001) using fundamental principles of Genetic Algorithms (GA) and Genetic Programming (GP). One of the benefits of GEP is that genetic diversity is created simply, because the genetic operators work at the chromosome level. Another benefit of GEP is its unique multi-genic nature, which permits the development of more complex programs consisting of several subprograms. Like GA, GEP mimics biological evolution to produce a computer program that simulates a particular phenomenon (Ferreira 2001). A GEP algorithm starts by choosing five elements: the function set, terminal set, fitness function, control parameters, and stopping condition.
The predicted values are compared with the actual values in the subsequent step, and when results within an a priori error tolerance are found, the GEP process is terminated. If the desired accuracy is not achieved, some chromosomes are selected by the roulette wheel sampling method and mutated to obtain new chromosomes. Once the desired fitness score is obtained, the process is terminated and the chromosomes are decoded to give the best solution of the problem (Teodorescu & Sherwood 2008).
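The selection and mutation step described above can be sketched as follows. This is a simplified illustration rather than the GEP software used in the study: each chromosome holds a single gene, only the four arithmetic operators are included, and the fitness scores are random placeholders where a real run would evaluate each decoded expression against the data. The head and tail sizes follow the standard GEP rule tail = h(n − 1) + 1, with head size h = 7 and maximum arity n = 2.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]   # subset of the function set, for brevity
TERMINALS = ["Hw", "Vw"]           # scenario 1 input variables
HEAD = 7                           # head size, as in the GEP settings
TAIL = HEAD * (2 - 1) + 1          # tail = h(n - 1) + 1 with max arity n = 2
MUTATION_RATE = 0.044              # mutation rate, as in the GEP settings


def random_gene(rng):
    """Head may hold functions or terminals; tail holds terminals only."""
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(HEAD)]
    tail = [rng.choice(TERMINALS) for _ in range(TAIL)]
    return head + tail


def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for chromosome, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def mutate(gene, rng):
    """Point mutation that keeps the head/tail structure valid."""
    mutated = list(gene)
    for i in range(len(mutated)):
        if rng.random() < MUTATION_RATE:
            pool = (FUNCTIONS + TERMINALS) if i < HEAD else TERMINALS
            mutated[i] = rng.choice(pool)
    return mutated


rng = random.Random(42)
population = [random_gene(rng) for _ in range(30)]        # 30 chromosomes
fitness = [rng.uniform(0.1, 1.0) for _ in population]     # placeholder scores
next_generation = [
    mutate(roulette_select(population, fitness, rng), rng) for _ in population
]
print(len(next_generation), "chromosomes of length", len(next_generation[0]))
```

Restricting tail positions to terminals guarantees that every mutated gene still decodes to a syntactically valid expression tree, which is the main practical advantage of the GEP encoding over plain GP.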
Performance criteria
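The three criteria reported in the results tables are the correlation coefficient (R), the determination coefficient (DC) and the root mean square error (RMSE). The sketch below uses their commonly adopted definitions: R as the Pearson correlation, DC as one minus the ratio of the squared error to the variance of the observations, and RMSE as the square root of the mean squared error. These forms are assumed here because the defining equations are not reproduced in this text, and the sample values are purely illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def correlation(observed, predicted):
    """Pearson correlation coefficient R."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def determination(observed, predicted):
    """Determination coefficient DC = 1 - SSE / SST."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


# Illustrative values only, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"R = {correlation(observed, predicted):.3f}")
print(f"DC = {determination(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Under these definitions a perfect model gives R = 1, DC = 1 and RMSE = 0, and DC can be much lower than R when predictions are well correlated with the observations but biased, which is consistent with the low DC values reported for the empirical equations.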







Simulation and models development
In this study, as shown in Figure 2, two scenarios were considered for predicting the peak outflow. In scenario 1, the dam reservoir height and volume at failure time were used as input parameters. Models developed according to this scenario were tested using the dam failure data available in the literature and the experimental data from Amini et al. (2011). For scenario 2, soil characteristics were added to the input parameters of scenario 1 to examine their impact on the models' accuracy. Scenario 2 models were tested using 82 data series generated by Breach software. Table 2 shows the models developed in this study. The data sets were divided into training (75% of the data) and testing (25% of the data) sets (Deng et al. 2020c).
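The data preparation described above (choosing an input combination and dividing the records 75/25) can be sketched as follows. The input combinations mirror the models defined for the two scenarios; the random shuffle, the seed and the synthetic records are assumptions made for illustration, since the text does not state how records were assigned to the training and testing sets.

```python
import random

# Input combinations of the scenario 1 (H) and scenario 2 (S) models.
MODEL_INPUTS = {
    "H(I)": ["Vw"],
    "H(II)": ["Hw"],
    "H(III)": ["Hw", "Vw"],
    "S(I)": ["Hw", "Vw"],
    "S(II)": ["Hw", "Vw", "d50"],
    "S(III)": ["Hw", "Vw", "c", "phi"],
    "S(IV)": ["c", "phi", "d50"],
    "S(V)": ["Hw", "Vw", "phi", "d50"],
    "S(VI)": ["Hw", "Vw", "c", "phi", "d50"],
}


def split_train_test(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def select_inputs(records, model):
    """Build the input matrix for one model from a list of record dictionaries."""
    names = MODEL_INPUTS[model]
    return [[record[name] for name in names] for record in records]


# 82 synthetic placeholder records; values are random, not Breach results.
rng = random.Random(1)
records = [
    {
        "Hw": rng.uniform(82, 150),
        "Vw": rng.uniform(104800, 248037),
        "c": rng.uniform(100, 250),
        "phi": rng.uniform(5, 40),
        "d50": rng.uniform(0.03, 1.0),
    }
    for _ in range(82)
]

train, test = split_train_test(records)
X_train = select_inputs(train, "S(V)")
print(len(train), "training records,", len(test), "testing records")
```

With 82 records and integer truncation, this split gives 61 training and 21 testing records; a different rounding rule would shift one record between the two sets.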
SVM, KELM, and GEP developed models
Scenario 1 model | Input variable(s) | Scenario 2 model | Input variable(s) |
---|---|---|---|
H(I) | Vw | S(I) | Hw, Vw |
H(II) | Hw | S(II) | Hw, Vw, d50 |
H(III) | Hw, Vw | S(III) | Hw, Vw, c, φ |
 |  | S(IV) | c, φ, d50 |
 |  | S(V) | Hw, Vw, φ, d50 |
 |  | S(VI) | Hw, Vw, c, φ, d50 |
SVM, KELM, and GEP models development
Each artificial intelligence method has its own settings and parameters, and the optimal values of these parameters must be determined to achieve the desired results. In SVM and KELM, selecting the appropriate type of kernel function is important. In this section, the SVM and KELM methods were evaluated using model H(III) of scenario 1 in order to select the best kernel function for each method. The performance criteria obtained with different kernel types are shown in Figure 3. The results show that modeling with the RBF kernel function gives more desirable outcomes than the other kernel functions. Accordingly, the RBF kernel was used for the subsequent SVM and KELM models. For GEP, the basic arithmetic operators (+, −, ×, /) and several mathematical functions (x², x³, √, ∛) were applied as the function set. For each pre-defined number of genes, GEP models were evolved until the fitness function remained unchanged for 10,000 runs. Then, the GEP model's parameters were optimized. All genetic operators used are listed in Table 3.
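For reference, the four kernel types compared for SVM (of which the linear, polynomial and RBF types are also used in KELM) can be written as simple functions of two input vectors. The sketch below uses the parameterization common to widely used SVM libraries; the parameter names (gamma, coef0, degree) and their default values are assumptions, as the text does not report the kernel parameters used.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


# Two illustrative (normalized) input vectors, e.g. (Hw, Vw).
x, z = (1.0, 2.0), (2.0, 0.0)
for name, value in [
    ("linear", linear_kernel(x, z)),
    ("polynomial", polynomial_kernel(x, z)),
    ("RBF", rbf_kernel(x, z, gamma=0.5)),
    ("sigmoid", sigmoid_kernel(x, z, gamma=0.5)),
]:
    print(f"{name}: {value:.4f}")
```

Unlike the other three, the RBF kernel depends only on the distance between the two inputs and is bounded between 0 and 1, which is one reason it is often a robust default for regression on normalized data.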
Parameters of GEP models used in this study
Description of parameter . | Setting of parameter . | Description of parameter . | Setting of parameter . |
---|---|---|---|
Function set | +, −, ×, /, x², x³, √, ∛ | Fitness function error type | RMSE |
Chromosomes | 30 | Mutation rate | 0.044 |
Head size | 7 | Inversion, IS and RIS transposition rate | 0.1 |
Number of genes | 3 | One and two-point recombination rate | 0.3 |
Linking function | Addition | Gene recombination and transposition rate | 0.1 |
Statistics parameters for different kernel functions of test series for model H(III).
RESULTS AND DISCUSSION
Modelling based on the dam reservoir height and volume at failure time (scenario 1)
Statistical parameters of developed models for scenario 1
Data set | Model | Method | Train R | Train DC | Train RMSE | Test R | Test DC | Test RMSE |
---|---|---|---|---|---|---|---|---|
Dam failures data | H(I) | SVM | 0.732 | 0.660 | 0.059 | 0.708 | 0.625 | 0.063 |
Dam failures data | H(I) | KELM | 0.721 | 0.648 | 0.061 | 0.697 | 0.614 | 0.065 |
Dam failures data | H(I) | GEP | 0.706 | 0.614 | 0.061 | 0.683 | 0.581 | 0.068 |
Dam failures data | H(II) | SVM | 0.755 | 0.702 | 0.054 | 0.716 | 0.658 | 0.060 |
Dam failures data | H(II) | KELM | 0.744 | 0.689 | 0.056 | 0.705 | 0.646 | 0.062 |
Dam failures data | H(II) | GEP | 0.729 | 0.667 | 0.058 | 0.691 | 0.612 | 0.064 |
Dam failures data | H(III) | SVM | 0.891 | 0.850 | 0.042 | 0.870 | 0.787 | 0.049 |
Dam failures data | H(III) | KELM | 0.884 | 0.841 | 0.043 | 0.871 | 0.781 | 0.051 |
Dam failures data | H(III) | GEP | 0.883 | 0.833 | 0.045 | 0.833 | 0.756 | 0.053 |
Experimental data | H(I) | SVM | 0.659 | 0.561 | 0.074 | 0.630 | 0.531 | 0.079 |
Experimental data | H(I) | KELM | 0.647 | 0.554 | 0.075 | 0.621 | 0.521 | 0.081 |
Experimental data | H(I) | GEP | 0.635 | 0.522 | 0.077 | 0.608 | 0.494 | 0.085 |
Experimental data | H(II) | SVM | 0.680 | 0.597 | 0.069 | 0.637 | 0.559 | 0.075 |
Experimental data | H(II) | KELM | 0.671 | 0.586 | 0.070 | 0.627 | 0.541 | 0.078 |
Experimental data | H(II) | GEP | 0.656 | 0.567 | 0.072 | 0.615 | 0.523 | 0.079 |
Experimental data | H(III) | SVM | 0.752 | 0.627 | 0.068 | 0.744 | 0.612 | 0.065 |
Experimental data | H(III) | KELM | 0.751 | 0.627 | 0.068 | 0.741 | 0.613 | 0.630 |
Experimental data | H(III) | GEP | 0.755 | 0.625 | 0.069 | 0.738 | 0.608 | 0.066 |
(a) Comparison of observed and predicted dam failure peak outflow for superior model based on dam failure data and (b) expression trees for the best GEP model.
Modelling based on Breach software data (scenario 2)
In scenario 2, several models were defined based on dam reservoir height and volume at failure time and on soil characteristics, to investigate the impact of soil properties on the performance of the models. The 82 data series extracted from Breach software were used to carry out the peak outflow prediction. Table 5 and Figure 5(a) show the results of scenario 2. Among all developed models, model S(VI), with input parameters Hw, Vw, d50, c, and φ, performed best. Model S(V), with parameters Hw, Vw, d50, and φ, showed approximately the same accuracy. Comparing these two models indicates that cohesive strength does not have a significant impact on the models' accuracy. The results also showed that using only soil characteristics (model S(IV)) did not lead to the desired accuracy. However, adding soil characteristics (c, d50, and φ) to the input combination (Hw, Vw) improved the models' accuracy. The parameter d50 increased modeling efficiency by up to 5%, while the parameter c increased modeling efficiency by only 2%. Adding the combination of c and φ improved the models' accuracy by up to 40%. Therefore, among the soil characteristics, φ was the most effective. In general, dam reservoir height and volume at failure time and soil characteristics all had an impact on the peak outflow prediction.
Statistical parameters of developed models for scenario 2
Model | Method | Train R | Train DC | Train RMSE | Test R | Test DC | Test RMSE |
---|---|---|---|---|---|---|---|
S(I) | SVM | 0.955 | 0.893 | 0.091 | 0.925 | 0.881 | 0.105 |
S(I) | KELM | 0.941 | 0.877 | 0.094 | 0.911 | 0.865 | 0.109 |
S(I) | GEP | 0.922 | 0.848 | 0.098 | 0.893 | 0.838 | 0.112 |
S(II) | SVM | 0.961 | 0.921 | 0.082 | 0.948 | 0.892 | 0.101 |
S(II) | KELM | 0.947 | 0.904 | 0.085 | 0.934 | 0.876 | 0.105 |
S(II) | GEP | 0.927 | 0.856 | 0.089 | 0.915 | 0.842 | 0.108 |
S(III) | SVM | 0.988 | 0.967 | 0.045 | 0.978 | 0.951 | 0.064 |
S(III) | KELM | 0.973 | 0.950 | 0.047 | 0.963 | 0.934 | 0.066 |
S(III) | GEP | 0.953 | 0.899 | 0.049 | 0.944 | 0.884 | 0.071 |
S(IV) | SVM | 0.643 | 0.397 | 0.188 | 0.588 | 0.312 | 0.195 |
S(IV) | KELM | 0.633 | 0.390 | 0.195 | 0.579 | 0.306 | 0.202 |
S(IV) | GEP | 0.620 | 0.369 | 0.205 | 0.567 | 0.290 | 0.209 |
S(V) | SVM | 0.989 | 0.984 | 0.035 | 0.982 | 0.975 | 0.055 |
S(V) | KELM | 0.974 | 0.966 | 0.036 | 0.967 | 0.955 | 0.055 |
S(V) | GEP | 0.981 | 0.980 | 0.037 | 0.981 | 0.974 | 0.056 |
S(VI) | SVM | 0.991 | 0.985 | 0.034 | 0.983 | 0.976 | 0.054 |
S(VI) | KELM | 0.976 | 0.967 | 0.035 | 0.968 | 0.974 | 0.055 |
S(VI) | GEP | 0.988 | 0.981 | 0.036 | 0.980 | 0.975 | 0.055 |
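The percentage improvements quoted for scenario 2 are not defined explicitly in the text. One reading that is consistent with the tabulated values is the relative reduction in test RMSE of the SVM models when a soil parameter is added. The sketch below computes this under that assumption; the function name is illustrative.

```python
# Test RMSE of the SVM models for scenario 2, taken from the results table.
TEST_RMSE_SVM = {
    "S(I)": 0.105,    # Hw, Vw
    "S(II)": 0.101,   # Hw, Vw, d50
    "S(III)": 0.064,  # Hw, Vw, c, phi
    "S(V)": 0.055,    # Hw, Vw, phi, d50
    "S(VI)": 0.054,   # Hw, Vw, c, phi, d50
}


def rmse_reduction_percent(baseline, improved):
    """Relative RMSE reduction of `improved` with respect to `baseline`, in %."""
    return 100.0 * (baseline - improved) / baseline


effect_d50 = rmse_reduction_percent(TEST_RMSE_SVM["S(I)"], TEST_RMSE_SVM["S(II)"])
effect_c_phi = rmse_reduction_percent(TEST_RMSE_SVM["S(I)"], TEST_RMSE_SVM["S(III)"])
effect_c = rmse_reduction_percent(TEST_RMSE_SVM["S(V)"], TEST_RMSE_SVM["S(VI)"])

print(f"adding d50 to (Hw, Vw): {effect_d50:.1f}%")
print(f"adding c and phi to (Hw, Vw): {effect_c_phi:.1f}%")
print(f"adding c to (Hw, Vw, phi, d50): {effect_c:.1f}%")
```

Under this reading the reductions are roughly 4%, 39% and 2%, which is in line with the quoted figures of up to 5%, up to 40% and only 2%.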
(a) Comparison of observed and predicted dam failure peak outflow for model S(VI), (b) statistical RMSE parameter obtained from sensitivity analysis, and (c) uncertainty analysis for the SVM-best models.
Uncertainty analysis
To quantify the uncertainty of the superior SVM models, an uncertainty analysis was carried out. According to Abbaspour et al. (2007), uncertainty is a result-dependent factor that represents the range of values a modelling result can attain.
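A common way to express such uncertainty, following the indices of Abbaspour et al. (2007), is through the 95% prediction uncertainty band (95PPU, bounded by the 2.5th and 97.5th percentiles of the Monte Carlo outputs), the percentage of observations bracketed by that band, and the average band width divided by the standard deviation of the observations (d-factor). The sketch below assumes these indices were the ones used; the ensemble is generated from synthetic noise as a stand-in for repeated model runs and is not data from the study.

```python
import random
import statistics


def percentile(values, q):
    """Linearly interpolated percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def uncertainty_indices(observed, ensemble):
    """ensemble[k][i] is the prediction of Monte Carlo run k for observation i."""
    n = len(observed)
    lower = [percentile([run[i] for run in ensemble], 2.5) for i in range(n)]
    upper = [percentile([run[i] for run in ensemble], 97.5) for i in range(n)]
    bracketed = sum(1 for o, lo, up in zip(observed, lower, upper) if lo <= o <= up)
    bracketed_percent = 100.0 * bracketed / n
    mean_width = statistics.mean(up - lo for lo, up in zip(lower, upper))
    d_factor = mean_width / statistics.stdev(observed)
    return bracketed_percent, d_factor


# Synthetic stand-in: 500 "runs" scattered around normalized observations.
rng = random.Random(7)
observed = [0.20, 0.40, 0.50, 0.70, 0.90]
ensemble = [[o + rng.gauss(0.0, 0.05) for o in observed] for _ in range(500)]

bracketed_percent, d_factor = uncertainty_indices(observed, ensemble)
print(f"observations inside 95PPU: {bracketed_percent:.0f}%")
print(f"d-factor: {d_factor:.2f}")
```

A model is usually regarded as having acceptable uncertainty when most observations fall inside the band while the d-factor stays small, ideally below one.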

Comparison of the artificial intelligence models with the classical equations
In this study, two scenarios were considered for predicting the dam failure peak outflow. In the first scenario, only dam reservoir height and volume at the time of failure were used as inputs, and the results showed that the combination of Hw and Vw led to better outcomes. In the second scenario, soil characteristics were added to the combination of Hw and Vw, and the results showed that the model with parameters Hw, Vw, d50, c, and φ was the superior model. The accuracy of the best proposed models of the considered scenarios was compared with that of some semi-theoretical dam failure peak outflow formulae available in the literature to evaluate the performance of the applied approaches. The results of the comparison for each scenario and testing procedure are presented in Table 6 and Figure 6. According to the three evaluation criteria (R, DC, RMSE) shown in Table 6, the values estimated by the applied models were more accurate than those of the equations. It should be noted that the existing equations were developed for specific conditions; their application is therefore limited to the cases for which they were developed, and they did not show uniform results under different conditions. This can be seen in Figure 6, which shows that the results obtained from the equations differ from each other and from the measured data, and are less precise than those of the models proposed in this study. Among all equations, the MacDonald and Langridge-Monopolis equation yielded the best results. For both scenarios, the classical equations underestimated the peak outflow in most cases, whereas the artificial intelligence models were more accurate. The applied intelligence approaches had the highest correlation with measured data and the lowest error, which shows the high capability of the SVM, KELM, and GEP methods in predicting the peak outflow.
Owing to the black-box nature of artificial intelligence models, the learned relationship between input and output parameters is not apparent; however, these methods can significantly reduce the complexity of the system, identify the effective parameters and achieve the desired result.
Comparison of prediction from proposed equations and developed models
Data | Method | Equation or model | R | DC | RMSE |
---|---|---|---|---|---|
Scenario 1 | Evans | Qp = 0.72 Vw^0.53 | 0.798 | 0.387 | 0.166 |
Scenario 1 | MacDonald and Langridge-Monopolis | Qp = 1.154 (Hw Vw)^0.412 | 0.779 | 0.574 | 0.138 |
Scenario 1 | Costa | Qp = 0.763 (Hw Vw)^0.42 | 0.731 | 0.419 | 0.162 |
Scenario 1 | Froehlich | Qp = 0.607 Hw^1.24 Vw^0.295 | 0.769 | 0.218 | 0.205 |
Scenario 1 | SVM | H(III) | 0.870 | 0.787 | 0.049 |
Scenario 1 | KELM | H(III) | 0.871 | 0.781 | 0.051 |
Scenario 1 | GEP | H(III) | 0.833 | 0.756 | 0.052 |
Scenario 2 | Evans | Qp = 0.72 Vw^0.53 | 0.678 | 0.352 | 0.199 |
Scenario 2 | MacDonald and Langridge-Monopolis | Qp = 1.154 (Hw Vw)^0.412 | 0.703 | 0.522 | 0.165 |
Scenario 2 | Costa | Qp = 0.763 (Hw Vw)^0.42 | 0.760 | 0.403 | 0.186 |
Scenario 2 | Froehlich | Qp = 0.607 Hw^1.24 Vw^0.295 | 0.823 | 0.211 | 0.235 |
Scenario 2 | SVM | S(I) | 0.925 | 0.881 | 0.105 |
Scenario 2 | KELM | S(I) | 0.911 | 0.865 | 0.109 |
Scenario 2 | GEP | S(I) | 0.893 | 0.838 | 0.112 |
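The four empirical equations used as benchmarks are simple power laws of Hw and Vw and can be evaluated directly. The sketch below encodes them as listed in the comparison table, assuming SI units (Hw in m, Vw in m³, Qp in m³/s), which is the usual convention for these formulas but is not stated explicitly in the text. The example dam is hypothetical.

```python
def evans(vw):
    """Evans: Qp = 0.72 Vw^0.53."""
    return 0.72 * vw ** 0.53


def macdonald_langridge_monopolis(hw, vw):
    """MacDonald and Langridge-Monopolis: Qp = 1.154 (Hw Vw)^0.412."""
    return 1.154 * (hw * vw) ** 0.412


def costa(hw, vw):
    """Costa: Qp = 0.763 (Hw Vw)^0.42."""
    return 0.763 * (hw * vw) ** 0.42


def froehlich(hw, vw):
    """Froehlich: Qp = 0.607 Hw^1.24 Vw^0.295."""
    return 0.607 * hw ** 1.24 * vw ** 0.295


# Hypothetical dam: 10 m water height and 1e6 m^3 stored volume at failure.
hw, vw = 10.0, 1.0e6
estimates = {
    "Evans": evans(vw),
    "MacDonald and Langridge-Monopolis": macdonald_langridge_monopolis(hw, vw),
    "Costa": costa(hw, vw),
    "Froehlich": froehlich(hw, vw),
}
for name, qp in estimates.items():
    print(f"{name}: Qp = {qp:.0f} m3/s")
```

Running the four formulas on the same inputs makes the spread between them visible, which illustrates why equations calibrated on different case histories do not give uniform results.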
Scatter plot of proposed equations and best model of SVM and GEP models, (a) scenario 1, (b) scenario 2.
CONCLUSIONS
Dam failure is a catastrophic phenomenon that can create many serious problems. In this paper, three data-driven predictive models, namely SVM, KELM, and GEP, were used to predict the peak outflow of a breached embankment dam. To find an effective model, two scenarios were considered in the modelling process. In scenario 1, Qp was assumed to depend on dam reservoir height and volume at the failure time. In scenario 2, soil characteristics were added to the input combinations and their impact on modelling the peak outflow was investigated. The results were then compared with previous research works.
The results showed that scenario 2, which considered both reservoir parameters (i.e. height and volume) at the failure time and soil characteristics as inputs for modelling peak outflow, led to more accurate predictions than scenario 1, which used only combinations of the reservoir height and volume parameters as inputs. This confirmed the impact of soil properties on modelling peak outflow. In scenario 1, the model with input parameters Hw and Vw yielded the best results, and in scenario 2, the model with parameters Hw, Vw, d50, c, and φ performed better than the others. However, the model with parameters Hw, Vw, d50, and φ showed approximately the same results, and the cohesive strength parameter had little impact on improving the efficiency of the models. Using only soil characteristics did not lead to the desired accuracy; however, adding soil characteristics (c, d50, and φ) to the input combination (Hw and Vw) improved the accuracy of the models.
The sensitivity analysis showed that Hw was the most important variable in the peak outflow modelling. The dependability of the SVM-based model was also investigated using uncertainty analysis, and the results showed that the SVM model had an allowable degree of uncertainty in modelling the dam failure peak outflow. A comparison was also made between the artificial intelligence methods and several selected empirical formulas. The results showed the superior performance of the applied intelligence methods over all the semi-empirical equations in estimating the peak outflow of breached embankment dams.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.