Accurate prediction of a breached dam's peak outflow is a significant factor in flood risk analysis. In this study, the capability of Support Vector Machine and Kernel Extreme Learning Machine as kernel-based approaches, and of the Gene Expression Programming method, was assessed in breached dam peak outflow prediction. Two types of modeling were considered. First, only dam reservoir height and volume at the failure time were used as the input combinations (state 1). Then, soil characteristics were added to the input combinations to investigate their impact (state 2). Results showed that the use of only soil characteristics did not lead to the desired accuracy; however, adding soil characteristics to the input combinations (state 2) improved the models' accuracy by up to 40%. The outcome of the applied models was also compared with existing empirical equations, and it was found that the applied models yielded better results. Sensitivity analysis results showed that dam height had the most important role in the peak outflow prediction, while the strength parameters did not have significant impacts. Furthermore, to assess the dependability of the best applied model, uncertainty analysis was used, and the results indicated that the SVM model had an allowable degree of uncertainty in peak outflow modelling.

Some novelties of the present research can be summarized as follows:

  • The applicability and accuracy of two different meta models, namely Gene Expression Programming (GEP) and Support Vector Machine (SVM), are assessed for predicting the peak outflow from breached embankment dams.

  • Two different scenarios are developed based on dam reservoir height and volume at the time of failure and soil characteristics.

  • In addition to the geometrical characteristics of the dam, the impact of soil characteristics on modelling the peak outflow is investigated.

  • The outcomes of the SVM and GEP models are also compared with the existing empirical equations and it is shown that the intelligence models yield better results.

The main purpose of dam building is to enhance the quality of human life by providing drinking water, power generation, navigation, flood control and so on. However, dam failure can cause catastrophic flooding and consequently presents a high risk to human life and property located downstream. In order to prevent and mitigate such a hazard, dam owners and agencies responsible for dam safety carefully study, analyze and inspect dams to identify significant failure modes. Overtopping and piping are the most frequently encountered failure modes causing breach of embankment dams (Wahl 2010). The breach parameters (time of failure, breach width, and peak outflow) are crucial in dam risk assessments. Accurate prediction of such parameters remains a challenging task. The peak outflow rate due to dam failure is an important parameter in predicting the inundation water level for emergency program planning and flood mitigation. The historical dam failure cases reveal a considerable range of financial, structural, and human losses. For instance, the South Fork Dam in Pennsylvania, USA, failed in 1889 due to overtopping, killing 2200 people and causing significant property losses (Singh & Scarlatos 1988). According to Wahl (2010), the accuracy of flood outflow assessment and related damages strongly depends on the proper computing of the outflow hydrograph caused by a dam break. In the past decades, accurate prediction of peak outflow has been of much interest among many investigators. Several experiments have been performed on cohesive and non-cohesive embankments to study the outflow hydrograph due to a dam break (e.g. Coleman et al. 2002; Gaucher et al. 2010). Kirkpatrick (1997) applied the data of 13 failed dams and six hypothetical failures to offer a relationship between the peak outflow and water depth. A review of the contemporary regression relationships was developed to model the breached earth dam's peak discharge.
Wahl (1998, 2004) presented a collection of the previous case studies to propose several empirical relations for estimating dam-breach incidents and the peak discharge. Pierce et al. (2010) developed the Wahl (1998) breach criteria using information about new case studies and performed regression analyses on the composite database. Duricic et al. (2013) proposed a model using the kriging approach to predict peak outflow. Hooshyaripor et al. (2013) derived statistical expressions to predict peak outflow based on observed data and generated synthetic data using a copula method. Considering the complexity and uncertainty of the dam break phenomenon, the results of the classical models are not always reliable. In fact, limited databases, untested model assumptions and a lack of field data often make the predictive accuracy of these models questionable. Finding reliable approaches for modeling dam failure is critical for the assessment of flood risk. Improving the quality of this modeling has been addressed by multiple research works (e.g. Thornton et al. 2011; Froehlich 2016). A number of researchers have developed prediction models using traditional statistical regression methods and artificial intelligence techniques. Application of various multivariate regression analyses has been a common approach in recent years for prediction of peak discharge, such as Froehlich (1995a), Pierce et al. (2010) and Thornton et al. (2011). Differential evolutionary algorithms have also been used, which can effectively solve complex optimization problems (Deng et al. 2020a, 2020b; Song et al. 2021a, 2021b).

So far, the meta model methods [e.g. Artificial Neural Networks (ANNs), Neuro-Fuzzy models (NF), Genetic Programming (GP), Support Vector Machine (SVM), and Kernel Extreme Learning Machine (KELM)] have been used for modelling complex hydraulic and hydrologic phenomena. Some examples of the applications of meta model approaches are total bedload prediction (Chang et al. 2012), suspended sediment concentration prediction (Kisi et al. 2012), urban flash flood forecasting (Yan et al. 2018), and development of stage-discharge curves (Azamathulla et al. 2011). Among other meta model approaches, SVM and GEP methods have been used in long-term prediction of lake water levels (Lan 2014), statistical downscaling of watershed precipitation (Hashmi et al. 2011), sediment transport modeling in sewers (Roushangar & Ghasempour 2017), and annual rainfall-runoff forecasting (Wang et al. 2018).

In artificial intelligence models we are looking for a learning machine capable of finding an accurate approximation of a natural phenomenon, as well as expressing it in the form of an interpretable equation. However, this bias towards interpretability creates several new issues. The computer-generated hypotheses should take advantage of the already existing body of knowledge about the domain in question. However, the method by which we express our knowledge and make it available to a learning machine remains rather unclear (Babovic 2009). Machine learning, a branch of artificial intelligence, deals with representation and generalization using the data learning technique (Sun & Li 2020). Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on instances of unseen data; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory. There is a wide variety of machine learning tasks and successful applications (Mitchell 1997). In general, the task of a machine learning algorithm can be described as follows: given a set of input variables and the associated output variable(s), the objective is learning a functional relationship for the input-output variables set. It should be noted that due to the black-box nature of artificial intelligence models, the learned relationship between the inputs and output is not revealed. These methods typically do not really represent the physics of a modeled process; they are just devices used to capture relationships between the relevant input and output variables. 
However, when the interrelationships among the relevant variables are poorly understood, finding the size and shape of the ultimate solution is difficult, and conventional mathematical analysis methods do not (or cannot) provide analytical solutions; these methods can predict the variable of interest with more accuracy.

Estimation of the probable flood under uncertain hydrologic conditions and routing the flood wave through downstream rivers could provide invaluable information for decision makers. However, the accuracy of assessment of flood outflow and corresponding damage are heavily reliant on the appropriate calculation of the outflow hydrograph caused by a dam break. Obviously, the higher the accuracy of the input hydrograph, the more precise outputs would result in routing models that significantly affect the ultimate risk management plan. Nowadays, meta model techniques are widely used by many researchers for estimation of the key parameters (e.g. peak value) of the input hydrograph. Drawing upon the knowledge of the previous data mining models, this paper aims to enhance the quality of estimation of dam failure outflows using two kernel-based models (SVM and KELM) and one heuristic method (GEP) under two scenarios. In the first scenario, only dam reservoir height and volume at failure time were considered as input combinations. In the second scenario, soil characteristics were added to input combinations and the impact of soil characteristics on the peak outflow modeling was assessed. Then, the most effective variables in the prediction process were investigated using sensitivity analysis. The prediction performance of the developed models was then compared with a number of previously developed models as benchmarks. Furthermore, Monte Carlo uncertainty analysis was applied to investigate the dependability of the used models.

Data collection and sample data set

Several data sets of well documented cases of dam failures are available in the literature. In this study, three types of data sets were used, first: 93 embankment dam failures data sets, collected from a variety of sources (i.e. Froehlich 1995a, 1995b; Wahl 1998; Xu & Zhang 2009; Pierce et al. 2010), second: the experimental data from Amini et al. (2011), and third: the data generated using Breach software. The ranges of the relevant parameters in the data sets used in this study are given in Table 1, in which Vw is the volume of the water behind the dam at failure, Hw is the height of the water behind the dam at failure, Qp is the peak outflow, c is the cohesive strength, φ is the internal friction angle, d50 is the median diameter of soil material, γ is the unit weight of the soil, and n is the porosity.

Table 1

The range of experimental data

| Parameter (available embankment dam failure data) | Amount | Parameter (experimental data) | Amount | Parameter (data generated by Breach software) | Amount |
|---|---|---|---|---|---|
| Vw (×10^6 m^3) | 0.0037–660 | Vw (m^3) | 0.53–1.64 | Vw (acres.ft) | 104,800–248,037 |
| Hw (m) | 1.37–44 | Hw (m) | 0.13–0.41 | Hw (ft) | 82–150 |
| Qp (m^3/s) | 2.12–16,800 | Qp (m^3/s) | 0.027–0.067 | Qp (cfs) | 373,518–3,411,232 |
| | | | | c (lb/ft^2) | 100–250 |
| | | | | φ | 5–40 |
| | | | | d50 (mm) | 0.03–1 |
| | | | | γ (lb/ft^3) | 80–160 |
| | | | | n | 0.25–0.4 |

Kernel-based methods

Kernel-based (KB) approaches are relatively new methods used for classification and regression purposes. They are based on statistical learning theory and can be used for modeling complex and non-linear phenomena. Two important kernel-based approaches are KELM and SVM, which work with different kernel types: linear, polynomial, radial basis function (RBF) and sigmoid functions in SVM, and linear, polynomial, and RBF in KELM.
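The kernel types named above can be written down compactly. The sketch below uses the common textbook forms of the four kernels; the parameter names `gamma`, `coef0` and `degree` and their default values are illustrative assumptions, not settings reported in this study.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=2):
    # (gamma * <u, v> + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    # exp(-gamma * ||u - v||^2): equals 1 for identical inputs, decays with distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # tanh(gamma * <u, v> + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)
```

Each function maps a pair of input vectors, for example normalized (Hw, Vw) pairs, to a similarity value; the kernel choice determines which non-linear relationships the model can represent.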

Kernel Extreme Learning Machine (KELM)

Extreme Learning Machine (ELM) is a training method for Single Layer Feed Forward Neural Networks (SLFFNN) initially introduced by Huang et al. (2006). In an SLFFNN trained this way, the input weights connected to the hidden neurons and the hidden layer biases are chosen randomly, while the output weights are determined analytically. This approach also performs better and learns faster than older learning methods (Huang et al. 2006) because, unlike traditional techniques that involve many parameters to set up, modelling a complex problem with it does not need much human intervention to reach optimal parameters. The standard single-layer neural network with N arbitrary samples (ai, bi), M hidden neurons, and the activation function f(a) is described as follows:
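$$a_i = [a_{i1}, a_{i2}, \ldots, a_{in}]^T \in \mathbb{R}^n \tag{1}$$
$$b_i = [b_{i1}, b_{i2}, \ldots, b_{im}]^T \in \mathbb{R}^m \tag{2}$$
$$\sum_{i=1}^{M} \beta_i f(w_i \cdot a_j + c_i) = O_j, \quad j = 1, \ldots, N \tag{3}$$
where $w_i = [w_{i1}, \ldots, w_{in}]^T$ is the weight vector that joins the input layer to the hidden layer, and $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the weight vector that joins the hidden layer to the target layer. $c_i$ denotes the hidden neuron biases. The general SLFFNN with M hidden neurons and the activation function f(a) can approximate the N samples with zero average error, $\sum_{j=1}^{N} \lVert O_j - b_j \rVert = 0$, in which:
$$\sum_{i=1}^{M} \beta_i f(w_i \cdot a_j + c_i) = b_j, \quad j = 1, \ldots, N \tag{4}$$
Equation (4) can be summarized as:
$$K\beta = B \tag{5}$$
$$K = \begin{bmatrix} f(w_1 \cdot a_1 + c_1) & \cdots & f(w_M \cdot a_1 + c_M) \\ \vdots & \ddots & \vdots \\ f(w_1 \cdot a_N + c_1) & \cdots & f(w_M \cdot a_N + c_M) \end{bmatrix}_{N \times M} \tag{6}$$
$$\beta = [\beta_1^T, \ldots, \beta_M^T]^T_{M \times m}, \quad B = [b_1^T, \ldots, b_N^T]^T_{N \times m} \tag{7}$$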

The matrix K is known as the hidden layer output matrix of the neural network. Huang et al. (2012) also introduced kernel functions in the design of ELM. A number of kernel functions are now used in the design of ELM, such as linear, radial basis, normalized polynomial, and polynomial kernel functions. An ELM designed with kernel functions is called KELM. For more detail about KELM, readers are referred to Huang et al. (2012).
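As a rough illustration of how a kernel-based ELM produces a prediction, the sketch below follows the closed-form solution given by Huang et al. (2012), in which the output for a new input x is the kernel vector [K(x, x_1), ..., K(x, x_N)] multiplied by (I/C + Ω)^(-1) T, with Ω the kernel matrix of the training inputs and T the training targets. The RBF kernel, the regularization value `C`, the `gamma` value and the toy data are assumptions made for illustration only; this is not the implementation used in the study.

```python
import math

def rbf(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kelm_fit(X, t, C=1e6, gamma=1.0):
    """Return alpha = (I/C + Omega)^(-1) t, where Omega[i][j] = K(x_i, x_j)."""
    n = len(X)
    A = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, t)

def kelm_predict(X_train, alpha, x, gamma=1.0):
    """f(x) = sum_i alpha_i * K(x, x_i)."""
    return sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X_train))

# Toy one-dimensional data (hypothetical values, already normalized)
X = [[0.0], [0.5], [1.0]]
t = [0.0, 0.25, 1.0]
alpha = kelm_fit(X, t, C=1e8, gamma=2.0)
fitted = [kelm_predict(X, alpha, x, gamma=2.0) for x in X]
```

With a large `C` the regularization term is small, so the fitted values reproduce the training targets almost exactly; smaller values of `C` trade training accuracy for smoother predictions.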

Support Vector Machine

One of the important kernel-based approaches is SVM, which works with different kernel types such as linear, polynomial, radial basis function (RBF) and sigmoid functions. Such a model is capable of adapting itself to predict any variable of interest given sufficient inputs. The SVM can model non-linear decision boundaries, and there are many kernels to choose from. It is also fairly robust against overfitting, especially in high-dimensional space. An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space. The method is based on the concept of an optimal hyperplane that separates samples of two classes with the widest possible gap between them (see Figure 1). The appropriate selection of kernel type is the most important step in SVM because of its direct impact on training and classification precision. For regression, the purpose of SVM is to find the flattest function that has the minimum deviation from the observed targets for all training data. The SVM formulation for regression is:
$$f(x) = w \cdot \varphi(x) + b \tag{8}$$
where φ(x) is a nonlinear function of the input x, b is the bias, and the vector w is the weight factor. The coefficients of Equation (8) are estimated by minimizing the regularized risk function expressed below:
$$R(C) = C \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon(t_i, y_i) + \frac{1}{2} \lVert w \rVert^2 \tag{9}$$
where
$$L_\varepsilon(t_i, y_i) = \begin{cases} 0, & |t_i - y_i| \le \varepsilon \\ |t_i - y_i| - \varepsilon, & \text{otherwise} \end{cases} \tag{10}$$
Here C is the cost factor, which sets the trade-off between the weight factor and the approximation error; ɛ is the radius of the tube within which the regression function must lie; and $L_\varepsilon(t_i, y_i)$ is the loss function, in which yi is the forecasted value and ti is the desired value in period i. According to Equation (10), if the predicted value lies outside the ɛ tube, the loss is the amount by which the absolute error exceeds ɛ; inside the tube the loss is zero.
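The role of ɛ and C can be checked numerically. The sketch below assumes the standard ɛ-insensitive loss (zero inside the tube, absolute error minus ɛ outside) and a regularized risk of the form C times the mean loss plus half the squared norm of w; the function names and the sample numbers are hypothetical.

```python
def eps_insensitive_loss(t, y, eps):
    """Zero inside the eps-tube, otherwise the excess of |t - y| over eps."""
    return max(0.0, abs(t - y) - eps)

def regularized_risk(targets, preds, w, C, eps):
    """C * (mean eps-insensitive loss) + 0.5 * ||w||^2."""
    n = len(targets)
    empirical = sum(eps_insensitive_loss(t, y, eps)
                    for t, y in zip(targets, preds)) / n
    flatness = 0.5 * sum(wi * wi for wi in w)
    return C * empirical + flatness

# A prediction 0.05 away from the target is not penalized when eps = 0.1,
# while one 0.3 away contributes 0.3 - 0.1 = 0.2 to the loss.
inside = eps_insensitive_loss(1.0, 1.05, 0.1)
outside = eps_insensitive_loss(1.0, 1.3, 0.1)
risk = regularized_risk([1.0, 2.0], [1.3, 2.0], w=[3.0, 4.0], C=10.0, eps=0.1)
```

A larger C makes errors outside the tube more expensive relative to the flatness term, so the fitted function follows the training data more closely.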
Figure 1

Data classification and support vectors.


Gene Expression Programming (GEP)

Gene Expression Programming (GEP) was developed by Ferreira (2001) using the fundamental principles of Genetic Algorithms (GA) and Genetic Programming (GP). One benefit of GEP is that genetic diversity is simple to create, because the genetic operators work at the chromosome level. Another benefit is its unique multigenic nature, which permits the evolution of more complex programs consisting of several subprograms. Like GA, GEP mimics biological evolution to create a computer program that simulates a particular phenomenon (Ferreira 2001). A GEP algorithm starts by choosing five elements: the function set, terminal set, fitness function, control parameters, and stopping condition.

The predicted values are then compared with the actual values, and when results that satisfy an a priori error tolerance are found, the GEP process is terminated. If the desired accuracy is not achieved, some chromosomes are selected by the roulette wheel sampling method and mutated to obtain new chromosomes. Once the desired fitness score is obtained, the process is terminated and the chromosomes are decoded to give the best solution of the problem (Teodorescu & Sherwood 2008).
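The selection and mutation step described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the chromosome encoding as a list of symbols, the head length of 3, the conversion of RMSE into a fitness score via 1/(1 + RMSE), and the example chromosomes are all assumptions. The mutation respects the usual GEP rule that tail positions hold terminals only.

```python
import random

def roulette_select(population, fitness, k, rng):
    """Draw k individuals with probability proportional to fitness (higher is better)."""
    total = sum(fitness)
    chosen = []
    for _ in range(k):
        r = rng.uniform(0.0, total)
        cumulative = 0.0
        pick = population[-1]  # fallback if r lands exactly on the upper edge
        for individual, f in zip(population, fitness):
            cumulative += f
            if r < cumulative:
                pick = individual
                break
        chosen.append(pick)
    return chosen

def mutate(chromosome, head_len, functions, terminals, rate, rng):
    """Point mutation: head positions may take any symbol, tail positions only terminals."""
    mutated = []
    for pos, symbol in enumerate(chromosome):
        if rng.random() < rate:
            pool = functions + terminals if pos < head_len else terminals
            symbol = rng.choice(pool)
        mutated.append(symbol)
    return mutated

rng = random.Random(42)
functions = ["+", "-", "*", "/"]
terminals = ["Hw", "Vw"]
# Two single-gene chromosomes: head of length 3 followed by a tail of length 4
population = [["+", "*", "Hw", "Vw", "Hw", "Hw", "Vw"],
              ["*", "Hw", "Vw", "Hw", "Vw", "Vw", "Hw"]]
rmse = [0.05, 0.10]                          # hypothetical errors of the two chromosomes
fitness = [1.0 / (1.0 + e) for e in rmse]    # lower RMSE gives higher fitness
parents = roulette_select(population, fitness, k=2, rng=rng)
offspring = [mutate(p, 3, functions, terminals, 0.044, rng) for p in parents]
```

In a full GEP run this step would sit inside a loop that decodes each chromosome into an expression tree, evaluates its error on the training data, and stops once the error tolerance or the stopping condition is met.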

Performance criteria

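To investigate the performance of the applied methods, three statistical parameters were used: Correlation Coefficient (R), Determination Coefficient (DC), and Root Mean Square Error (RMSE), which are expressed as follows:
$$R = \frac{\sum_{i=1}^{N} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{N} (O_i - \bar{O})^2 \sum_{i=1}^{N} (P_i - \bar{P})^2}}, \quad DC = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}, \quad RMSE = \sqrt{\frac{\sum_{i=1}^{N} (O_i - P_i)^2}{N}} \tag{11}$$
where $O_i$, $P_i$, $\bar{O}$, $\bar{P}$, and N are the observed values, predicted values, mean observed values, mean predicted values and number of data samples, respectively. It should be noted that predicting the parameter of interest from raw (non-normalized) data can result in undesirable predictions. So, in the current study, all input variables were scaled to fall in the range 0.1–1 to eliminate the influence of the variability of the absolute magnitudes of the data. To this end, the following equation was used:
$$n_{norm} = 0.1 + 0.9 \times \frac{n - n_{min}}{n_{max} - n_{min}} \tag{12}$$
where $n_{norm}$, n, $n_{max}$, and $n_{min}$ are the normalized, the original, the maximum and the minimum amounts of parameter n, respectively.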
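For readers who want to reproduce the criteria, the sketch below computes R, DC and RMSE in their standard forms and scales a variable linearly into the range 0.1–1, as described above. The function names and the sample values are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation_coefficient(obs, pred):
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

def determination_coefficient(obs, pred):
    mo = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def normalize(values, low=0.1, high=1.0):
    """Linear scaling of values into [low, high]."""
    vmin, vmax = min(values), max(values)
    return [low + (high - low) * (v - vmin) / (vmax - vmin) for v in values]

observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]   # perfectly correlated but biased by +1
r = correlation_coefficient(observed, shifted)
dc = determination_coefficient(observed, shifted)
err = rmse(observed, shifted)
scaled = normalize([2.0, 4.0, 6.0])
```

The biased example shows why R alone is not sufficient: the correlation is perfect, yet DC and RMSE both reveal the systematic offset.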

Simulation and models development

The proper choice of the input combinations is the crucial step in modelling using an intelligent approach. According to Nourani et al. (2012), dam reservoir height and volume at failure time can be considered as the important factors in dam breach peak discharge modelling. Based on Wahl (2004) and Nourani et al. (2012) the important variables which affect the peak outflow are:
(13)

In this study, according to Figure 2, two scenarios were considered for predicting the peak outflow. In scenario 1, the dam reservoir height and volume at failure time were used as input parameters. Models developed according to this scenario were tested using the available dam failure data in the literature and the experimental data from Amini et al. (2011). For scenario 2, soil characteristics were added to the input parameters of scenario 1 to examine the impact of soil characteristics on the models' accuracy. Scenario 2 models were tested using 82 data generated by Breach software. Table 2 shows the models developed in this study. The used data sets were divided into two parts of training (75% of the data) and testing (25% of the data) sets (Deng et al. 2020c).

Table 2

SVM, KELM, and GEP developed models

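| Scenario 2: input variable(s) | Scenario 2: model | Scenario 1: input variable(s) | Scenario 1: model |
|---|---|---|---|
| Hw, Vw | S(I) | Vw | H(I) |
| Hw, Vw, d50 | S(II) | Hw | H(II) |
| Hw, Vw, c, φ | S(III) | Hw, Vw | H(III) |
| c, φ, d50 | S(IV) | | |
| Hw, Vw, φ, d50 | S(V) | | |
| Hw, Vw, c, φ, d50 | S(VI) | | |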
Figure 2

Schematic view of a dam breach peak outflow modelling and testing.


SVM, KELM, and GEP models development

Each artificial intelligence method has its own settings and parameters, and the optimal values of these parameters should be determined to achieve the desired results. In designing SVM and KELM models, the selection of the appropriate type of kernel function is important. In this section, the SVM and KELM methods were evaluated using model H(III) of scenario 1 in order to select the best kernel function for each model. The results of the performance criteria for different kernel types are shown in Figure 3. The results show that modeling with the RBF kernel function gives more desirable outcomes than the other kernel functions. Accordingly, the RBF kernel was used as the core tool for the subsequent SVM and KELM models. For GEP, the basic arithmetic operators (+, −, ×, /) and several mathematical functions (x², x³, √, ∛) were applied as the function set. GEP models were evolved until the fitness function remained unchanged for 10,000 runs for each pre-defined number of genes. Then, the GEP model's parameters were optimized. All genetic operators used are listed in Table 3.

Table 3

Parameters of GEP models used in this study

| Description of parameter | Setting of parameter | Description of parameter | Setting of parameter |
|---|---|---|---|
| Function set | +, −, *, /, x², x³, √, ∛ | Fitness function error type | RMSE |
| Chromosomes | 30 | Mutation rate | 0.044 |
| Head size | | Inversion, IS and RIS transposition rate | 0.1 |
| Number of genes | | One and two-point recombination rate | 0.3 |
| Linking function | Addition | Gene recombination and transposition rate | 0.1 |
Figure 3

Statistics parameters for different kernel functions of test series for model H(III).


Modelling based on the dam reservoir height and volume at failure time (scenario 1)

The models in scenario 1 were developed based on the available data sets of dam failures, considering Hw, Vw, or both as input variables. The defined models were tested with the SVM, KELM, and GEP methods to carry out the dam failure peak outflow prediction. In the modeling process, two data series were used: the available dam failure data in the literature and the experimental data from Amini et al. (2011). The obtained results are shown in Figure 4(a) and listed in Table 4. Based on the statistical parameters (RMSE, R, and DC), it could be deduced that for both data series, among the developed models, model H(III), which uses both Hw and Vw as input parameters, gave more desirable outcomes than the other models in modeling the peak outflow. This confirmed the importance of both dam reservoir height and volume at failure time in the peak outflow estimation process. It could also be seen that the impact of variable Hw on the modelling process was greater than that of variable Vw. The salient feature of GEP is that it can provide an explicit equation for the studied parameter. GEP chromosomes are usually composed of one or more genes. Each gene codes for a sub-expression tree (sub-ET), and the sub-ETs interact with one another to form a more complex multi-subunit ET. Figure 4(b) shows the ETs for model H(III), in which the parameters d0 and d1 are Vw and Hw, respectively. In this study, the sub-ETs were linked by the addition operator. The mathematical expression of the best GEP model is as follows:
(14)
Table 4

Statistical parameters of developed models for scenario 1

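| Data set | Model | Method | Train R | Train DC | Train RMSE | Test R | Test DC | Test RMSE |
|---|---|---|---|---|---|---|---|---|
| Dam failures data | H(I) | SVM | 0.732 | 0.660 | 0.059 | 0.708 | 0.625 | 0.063 |
| | | KELM | 0.721 | 0.648 | 0.061 | 0.697 | 0.614 | 0.065 |
| | | GEP | 0.706 | 0.614 | 0.061 | 0.683 | 0.581 | 0.068 |
| | H(II) | SVM | 0.755 | 0.702 | 0.054 | 0.716 | 0.658 | 0.060 |
| | | KELM | 0.744 | 0.689 | 0.056 | 0.705 | 0.646 | 0.062 |
| | | GEP | 0.729 | 0.667 | 0.058 | 0.691 | 0.612 | 0.064 |
| | H(III) | SVM | 0.891 | 0.850 | 0.042 | 0.870 | 0.787 | 0.049 |
| | | KELM | 0.884 | 0.841 | 0.043 | 0.871 | 0.781 | 0.051 |
| | | GEP | 0.883 | 0.833 | 0.045 | 0.833 | 0.756 | 0.053 |
| Experimental data | H(I) | SVM | 0.659 | 0.561 | 0.074 | 0.630 | 0.531 | 0.079 |
| | | KELM | 0.647 | 0.554 | 0.075 | 0.621 | 0.521 | 0.081 |
| | | GEP | 0.635 | 0.522 | 0.077 | 0.608 | 0.494 | 0.085 |
| | H(II) | SVM | 0.680 | 0.597 | 0.069 | 0.637 | 0.559 | 0.075 |
| | | KELM | 0.671 | 0.586 | 0.070 | 0.627 | 0.541 | 0.078 |
| | | GEP | 0.656 | 0.567 | 0.072 | 0.615 | 0.523 | 0.079 |
| | H(III) | SVM | 0.752 | 0.627 | 0.068 | 0.744 | 0.612 | 0.065 |
| | | KELM | 0.751 | 0.627 | 0.068 | 0.741 | 0.613 | 0.630 |
| | | GEP | 0.755 | 0.625 | 0.069 | 0.738 | 0.608 | 0.066 |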
Figure 4

(a) Comparison of observed and predicted dam failure peak outflow for superior model based on dam failure data and (b) expression trees for the best GEP model.


Modelling based on Breach software data (scenario 2)

In scenario 2, several models were defined based on dam reservoir height and volume at failure time together with soil characteristics, to investigate the impact of soil properties on the performance of the models. The 82 data series extracted from Breach software were used to carry out the peak outflow prediction. Table 5 and Figure 5(a) show the results of scenario 2. From the obtained results, it could be seen that among all developed models, model S(VI), with input parameters Hw, Vw, d50, c, and φ, performed best. Model S(V), with parameters Hw, Vw, d50, and φ, showed approximately the same accuracy. Comparing these two models indicates that cohesive strength does not have a significant impact on enhancing the models' accuracy. The results also showed that using only soil characteristics could not lead to the desired accuracy (model S(IV)). However, adding soil characteristics (c, d50, and φ) to the input combination (Hw, Vw) improved the models' accuracy. It could be seen that d50 increased modeling efficiency by up to 5%, while the c parameter increased modeling efficiency by only 2%. Adding the combination of c and φ improved the models' accuracy by up to 40%. Therefore, among the soil characteristics, φ was the most effective. In general, it could be stated that dam reservoir height and volume at failure time and soil characteristics all had an impact on the peak outflow prediction.
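The text does not state which metric the quoted percentages refer to. As a rough cross-check, the sketch below assumes they are relative reductions in the test-set RMSE of the SVM models reported in Table 5; under that assumption the computed values are close to the quoted figures. The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage reduction of an error measure when moving from 'before' to 'after'."""
    return (before - after) / before * 100.0

# Test-set RMSE of the SVM models, taken from Table 5
test_rmse_svm = {"S(I)": 0.105, "S(II)": 0.101, "S(III)": 0.064,
                 "S(V)": 0.055, "S(VI)": 0.054}

# Adding d50 to (Hw, Vw): S(I) -> S(II)
d50_gain = relative_reduction(test_rmse_svm["S(I)"], test_rmse_svm["S(II)"])
# Adding c and phi to (Hw, Vw): S(I) -> S(III)
c_phi_gain = relative_reduction(test_rmse_svm["S(I)"], test_rmse_svm["S(III)"])
# Adding c to (Hw, Vw, phi, d50): S(V) -> S(VI)
c_gain = relative_reduction(test_rmse_svm["S(V)"], test_rmse_svm["S(VI)"])

print(f"d50: {d50_gain:.1f}%  c and phi: {c_phi_gain:.1f}%  c: {c_gain:.1f}%")
```

Under this reading, d50 gives a reduction of about 4%, the combination of c and φ about 39%, and c alone about 2%, which is consistent with the "up to 5%", "up to 40%" and "only 2%" statements above.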

Table 5

Statistical parameters of developed models for scenario 2

| Model | Method | Train R | Train DC | Train RMSE | Test R | Test DC | Test RMSE |
|---|---|---|---|---|---|---|---|
| S(I) | SVM | 0.955 | 0.893 | 0.091 | 0.925 | 0.881 | 0.105 |
| | KELM | 0.941 | 0.877 | 0.094 | 0.911 | 0.865 | 0.109 |
| | GEP | 0.922 | 0.848 | 0.098 | 0.893 | 0.838 | 0.112 |
| S(II) | SVM | 0.961 | 0.921 | 0.082 | 0.948 | 0.892 | 0.101 |
| | KELM | 0.947 | 0.904 | 0.085 | 0.934 | 0.876 | 0.105 |
| | GEP | 0.927 | 0.856 | 0.089 | 0.915 | 0.842 | 0.108 |
| S(III) | SVM | 0.988 | 0.967 | 0.045 | 0.978 | 0.951 | 0.064 |
| | KELM | 0.973 | 0.950 | 0.047 | 0.963 | 0.934 | 0.066 |
| | GEP | 0.953 | 0.899 | 0.049 | 0.944 | 0.884 | 0.071 |
| S(IV) | SVM | 0.643 | 0.397 | 0.188 | 0.588 | 0.312 | 0.195 |
| | KELM | 0.633 | 0.390 | 0.195 | 0.579 | 0.306 | 0.202 |
| | GEP | 0.620 | 0.369 | 0.205 | 0.567 | 0.290 | 0.209 |
| S(V) | SVM | 0.989 | 0.984 | 0.035 | 0.982 | 0.975 | 0.055 |
| | KELM | 0.974 | 0.966 | 0.036 | 0.967 | 0.955 | 0.055 |
| | GEP | 0.981 | 0.980 | 0.037 | 0.981 | 0.974 | 0.056 |
| S(VI) | SVM | 0.991 | 0.985 | 0.034 | 0.983 | 0.976 | 0.054 |
| | KELM | 0.976 | 0.967 | 0.035 | 0.968 | 0.974 | 0.055 |
| | GEP | 0.988 | 0.981 | 0.036 | 0.980 | 0.975 | 0.055 |
Figure 5

(a) Comparison of observed and predicted dam failure peak outflow for model S(VI), (b) statistical RMSE parameter obtained from sensitivity analysis, and (c) uncertainty analysis for the SVM-best models.

To evaluate the impact of the different parameters on the peak outflow, a sensitivity analysis was carried out for the best SVM model. The impact of each parameter was assessed by omitting it from the input set and calculating the RMSE criterion. Figure 5(b) shows the sensitivity analysis results for model S(VI). This figure clearly shows that in peak outflow prediction Hw was the most influential parameter and c was the least significant. Therefore, model S(V) was selected as the superior model. The mathematical expression of GEP for model S(V) is as follows:
(15)

Uncertainty analysis

In order to find the uncertainty of the superior SVM models, uncertainty analysis is done. According to Abbaspour et al. (2007), uncertainty is a result-dependent factor that represents the range of values a modelling result can attain.

In this study, the Monte Carlo uncertainty analysis method is used. In this procedure, two factors are used to test the robustness and analyse the model's uncertainty. The first is the percentage of the studied outputs that fall within the 95PPU band, and the second is the average distance between the upper (XU) and lower (XL) uncertainty bands (Noori et al. 2015). To this end, the considered model is developed many times (1,000 in the present study), and the empirical cumulative probability distribution of the model outputs is calculated. XL and XU are then taken as the 2.5% and 97.5% probabilities of the cumulative distribution, respectively. For a proper confidence level, two conditions should be met: first, the 95PPU band should bracket most of the observations, and second, the average distance between the upper (97.5% level) and lower (2.5% level) limits of the 95PPU should be smaller than the standard deviation of the measured data (Abbaspour et al. 2007). These two indices are applied to account for input uncertainties. To evaluate the average width of the confidence interval band, the band width indicator was suggested by Abbaspour et al. (2007) as follows:
$$d\text{-factor} = \frac{\bar{d}_x}{\sigma_x}, \quad \bar{d}_x = \frac{1}{k} \sum_{i=1}^{k} (X_U - X_L)_i \tag{16}$$
where σx and $\bar{d}_x$ are the standard deviation of the observed data and the confidence band's average width, respectively. The percentage of the data within the 95% confidence band is calculated as:
$$\text{Bracketed by 95PPU} = \frac{1}{k} \, \mathrm{Count}\left(K \mid X_L \le X_{reg} \le X_U\right) \times 100 \tag{17}$$
where 95PPU denotes the 95% predicted uncertainty, k is the number of observed data and Xreg is the current registered data. The obtained results are shown in Figure 5(c). Based on the values obtained for the d-factor and 95PPU, all observed and predicted values were within the 95PPU band for both the train and test sets. It was also found that the d-factors for the train and test datasets were smaller than the standard deviation of the observed data. Therefore, it could be concluded that modeling the dam failure peak outflow with the SVM model led to an allowable degree of uncertainty.
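The two uncertainty indices can be computed from an ensemble of model runs as sketched below. This is an illustration under stated assumptions: the ensemble is a list of prediction lists (one per Monte Carlo run), the 2.5% and 97.5% limits are obtained by linear interpolation between ranked values, and the population standard deviation is used for the observed data. The three-run toy ensemble stands in for the 1,000 runs used in the study.

```python
import statistics

def percentile(sorted_vals, q):
    """Quantile q in [0, 1] by linear interpolation between closest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def uncertainty_indices(observed, ensemble):
    """Return (d-factor, percentage of observations bracketed by the 95PPU band).

    ensemble[r][i] is the prediction of Monte Carlo run r for sample i.
    """
    k = len(observed)
    lower, upper = [], []
    for i in range(k):
        vals = sorted(run[i] for run in ensemble)
        lower.append(percentile(vals, 0.025))   # X_L
        upper.append(percentile(vals, 0.975))   # X_U
    mean_width = sum(u - l for l, u in zip(lower, upper)) / k
    d_factor = mean_width / statistics.pstdev(observed)
    inside = sum(1 for x, l, u in zip(observed, lower, upper) if l <= x <= u)
    return d_factor, 100.0 * inside / k

# Hypothetical data: three runs that shift the observations by -0.5, 0 and +0.5
observed = [1.0, 2.0, 3.0]
ensemble = [[o + shift for o in observed] for shift in (-0.5, 0.0, 0.5)]
d_factor, bracketed = uncertainty_indices(observed, ensemble)
```

A small d-factor together with a high bracketed percentage indicates a narrow band that still contains most observations, which is the condition described above.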

Comparison of the artificial intelligence models with the classical equations

In this study, two scenarios were considered for predicting the dam failure peak outflow. In the first scenario, only the dam reservoir height and volume at the time of failure were used as inputs, and the results showed that the combination of Hw and Vw led to better outcomes. In the second scenario, soil characteristics were added to the combination of Hw and Vw, and the results showed that the model with the parameters Hw, Vw, d50, c, and φ was the superior model. The accuracy of the best proposed models of the two scenarios was compared with that of some semi-theoretical dam failure peak outflow formulae available in the literature to evaluate the performance of the applied approaches. The comparison results for each scenario and the testing procedure are presented in Table 6 and Figure 6. According to the three evaluation criteria (R, DC, RMSE) listed in Table 6, the applied models gave more accurate estimates than the equations. It should be noted that the existing equations were developed for specific conditions; their application is therefore limited to cases similar to those of their development, and they do not give uniform results under different conditions. This can be seen in Figure 6, which shows that the results obtained from the equations differ from each other and from the measured data, and are less precise than those of the models proposed in this study. Among the equations, the MacDonald and Langridge-Monopolis equation yielded better results, with the highest DC and the lowest RMSE in both scenarios (Table 6). For both scenarios, the classical equations underestimated the peak outflow in most cases, whereas the artificial intelligence models were more accurate, showing the highest correlation with the measured data and the lowest error. This demonstrates the high capability of the SVM, KELM, and GEP methods in predicting the peak outflow.
Owing to the black-box nature of artificial intelligence models, the learned relationship between the input and output parameters is not explicit; however, these methods can significantly reduce the complexity of the system, identify the effective parameters and achieve the desired result.
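
As an illustration of how such a comparison can be set up, the sketch below evaluates the four empirical equations listed in Table 6 and scores predictions with R, DC, and RMSE. It is a minimal example, not the authors' code. Several points are assumptions: DC is taken to be the Nash–Sutcliffe-type determination coefficient, Hw is assumed to be in metres and Vw in cubic metres, and the three records in the demo are invented for illustration and are not the dataset of this study.

```python
import math


# Empirical peak-outflow equations, with coefficients as listed in Table 6.
def evans(hw, vw):
    return 0.72 * vw ** 0.53


def macdonald_langridge_monopolis(hw, vw):
    return 1.154 * (hw * vw) ** 0.412


def costa(hw, vw):
    return 0.763 * (hw * vw) ** 0.42


def froehlich(hw, vw):
    return 0.607 * hw ** 1.24 * vw ** 0.295


def pearson_r(observed, predicted):
    """Correlation coefficient R between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def dc(observed, predicted):
    """Determination coefficient (assumed Nash-Sutcliffe form)."""
    mo = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mo) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical records (Hw [m], Vw [m^3], observed Qp [m^3/s]); illustrative only.
    records = [(5.0, 2.0e5, 300.0), (12.0, 1.5e6, 1500.0), (30.0, 2.0e7, 9000.0)]
    observed = [qp for _, _, qp in records]
    equations = (evans, macdonald_langridge_monopolis, costa, froehlich)
    for equation in equations:
        predicted = [equation(hw, vw) for hw, vw, _ in records]
        print(
            f"{equation.__name__}: R={pearson_r(observed, predicted):.3f}, "
            f"DC={dc(observed, predicted):.3f}, RMSE={rmse(observed, predicted):.1f}"
        )
```

The same three scoring functions can be applied to the outputs of any data-driven model, so that the equations and the models are ranked on an identical basis.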

Table 6

Comparison of predictions from the proposed equations and the developed models

| Method | Equation or input combination | R | DC | RMSE |
|---|---|---|---|---|
| Scenario 1 data | | | | |
| Evans | Qp = 0.72 Vw^0.53 | 0.798 | 0.387 | 0.166 |
| MacDonald and Langridge-Monopolis | Qp = 1.154 (Hw Vw)^0.412 | 0.779 | 0.574 | 0.138 |
| Costa | Qp = 0.763 (Hw Vw)^0.42 | 0.731 | 0.419 | 0.162 |
| Froehlich | Qp = 0.607 (Hw^1.24 × Vw^0.295) | 0.769 | 0.218 | 0.205 |
| SVM | H(III) | 0.870 | 0.787 | 0.049 |
| KELM | H(III) | 0.871 | 0.781 | 0.051 |
| GEP | H(III) | 0.833 | 0.756 | 0.052 |
| Scenario 2 data | | | | |
| Evans | Qp = 0.72 Vw^0.53 | 0.678 | 0.352 | 0.199 |
| MacDonald and Langridge-Monopolis | Qp = 1.154 (Hw Vw)^0.412 | 0.703 | 0.522 | 0.165 |
| Costa | Qp = 0.763 (Hw Vw)^0.42 | 0.760 | 0.403 | 0.186 |
| Froehlich | Qp = 0.607 (Hw^1.24 × Vw^0.295) | 0.823 | 0.211 | 0.235 |
| SVM | S(I) | 0.925 | 0.881 | 0.105 |
| KELM | S(I) | 0.911 | 0.865 | 0.109 |
| GEP | S(I) | 0.893 | 0.838 | 0.112 |
Figure 6

Scatter plots of the proposed equations and the best SVM and GEP models: (a) scenario 1, (b) scenario 2.


Dam failure is a catastrophic phenomenon that can create many serious problems. In this paper, three data-driven predictive models, namely SVM, KELM, and GEP, were used to predict the peak outflow of a breached embankment dam. To find an effective model, two scenarios were considered in the modelling process. In scenario 1, Qp was assumed to depend on the dam reservoir height and volume at the failure time. In scenario 2, soil characteristics were added to the input combinations and their impact on modelling the peak outflow was investigated. The results were then compared with those of previous research works.

The results showed that scenario 2, which used both the reservoir parameters at the failure time (i.e. height and volume) and soil characteristics as inputs for modelling the peak outflow, led to more accurate predictions than scenario 1, which used only combinations of the reservoir height and volume. This confirmed the impact of soil properties on modelling the peak outflow. In scenario 1, the model with the input parameters Hw and Vw yielded better results, and in scenario 2, the model with the parameters Hw, Vw, d50, c, and φ performed better than the others. However, the model with the parameters Hw, Vw, d50, and φ showed approximately the same results, indicating that the cohesive strength parameter had little impact on improving the efficiency of the models. Based on the obtained results, using only soil characteristics did not lead to the desired accuracy; however, adding soil characteristics (c, d50, and φ) to the input combinations (Hw and Vw) improved the accuracy of the models.

The sensitivity analysis showed that Hw was the most important variable in modelling the peak outflow. The dependability of the SVM-based model was also investigated using uncertainty analysis; according to the results, the SVM model had an allowable degree of uncertainty in modelling the dam failure peak outflow. A comparison was also made between the artificial intelligence methods and several selected empirical formulas. The results showed the superior performance of the applied artificial intelligence methods over all the semi-empirical equations in estimating the peak outflow of breached embankment dams.

All relevant data are included in the paper or its Supplementary Information.

Abbaspour K. C., Yang J., Maximov I., Siber R., Bogner K., Mieleitner J., Zobrist J. & Srinivasan R. 2007 Modelling hydrology and water quality in the prealpine/alpine Thur watershed using SWAT. Journal of Hydrology 333 (2), 413–430.
Amini A. B., Nourani V. & Hakimzadeh H. 2011 Application of artificial intelligence tools to estimate peak outflow from earth dam breach. The International Journal of Earth Sciences and Engineering 4 (6), 243–246.
Azamathulla H. M., Ghani A. A., Leow C. S., Chang C. K. & Zakaria N. A. 2011 Gene-expression programming for the development of a stage-discharge curve of the Pahang River. Journal of Water Resources Management 25 (11), 2901–2916.
Babovic V. 2009 Introducing knowledge into learning based on genetic programming. Journal of Hydroinformatics 11 (3–4), 181–193.
Chang C. K., Azamathulla H. M., Zakaria N. A. & Ghani A. A. 2012 Appraisal of soft computing techniques in prediction of total bed material load in tropical rivers. Journal of Earth System Science 121 (1), 125–133.
Coleman S., Andrews D. & Webby M. G. 2002 Overtopping breaching of noncohesive homogeneous embankments. Journal of Hydraulic Engineering 128 (9), 829–838.
Deng W., Xu J., Song Y. & Zhao H. 2020a Differential evolution algorithm with wavelet basis function and optimal mutation strategy for complex optimization problem. Applied Soft Computing 15, 106724.
Deng W., Xu J., Gao X. Z. & Zhao H. 2020b An enhanced MSIQDE algorithm with novel multiple strategies for global optimization problems. IEEE Transactions on Systems, Man, and Cybernetics: Systems. doi:10.1109/TSMC.2020.3030792.
Deng W., Liu H., Xu J., Zhao H. & Song Y. 2020c An improved quantum-inspired differential evolution algorithm for deep belief network. IEEE Transactions on Instrumentation and Measurement. doi:10.1109/TIM.2020.2983233.
Duricic J., Erdik T. & Van Gelder P. 2013 Predicting peak breach discharge due to embankment dam failure. Journal of Hydroinformatics 15 (4), 1361–1376.
Ferreria C. 2001 Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems 13 (2), 87–129.
Froehlich D. C. 2016 Predicting peak discharge from gradually breached embankment dam. Hydraulic Engineering 22 (8), 07017008.
Froehlich D. C. 1995a Peak outflow from breached embankment dam. Journal of Water Resources Planning and Management 121 (1), 90–97.
Froehlich D. C. 1995b Embankment dam breach parameters revisited. In: Water Resources Engineering, Proc., ASCE Conf. on Water Resources Engineering, New York, pp. 887–891.
Gaucher J., Marche C. & Mahdi T. 2010 Experimental investigation of the hydraulic erosion of noncohesive compacted soils. Journal of Hydraulic Engineering 136 (11), 901–913.
Hashmi M. Z., Shamseldin A. Y. & Melville B. W. 2011 Statistical downscaling of watershed precipitation using Gene Expression Programming (GEP). Environmental Modelling & Software 26 (12), 1639–1646.
Hooshyaripor F., Tahershamsi A. & Golian S. 2013 Application of copula method and neural networks for predicting peak outflow from breached embankments. Journal of Hydro Environmental Research 8 (3), 292–303.
Huang G. B., Zhu Q. Y. & Siew C. K. 2006 Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501.
Huang G. B., Zhou H., Ding X. & Zhang R. 2012 Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42 (2), 513–529.
Kirkpatrick G. W. 1997 Evaluation Guidelines for Spillway Adequacy. The Evaluation of Dam Safety. In: Engineering Foundation Conference, ASCE, Pacific Grove, California, pp. 395–414.
Kisi O., HosseinzadehDalir A., Cimen M. & Shiri J. 2012 Suspended sediment modeling using genetic programming and soft computing techniques. Journal of Hydrology 450–451, 48–58.
Lan Y. 2014 Forecasting performance of support vector machine for the Poyang Lake's water level. Journal of Water Science & Technology 70 (9), 1488–1495.
Mitchell T. M. 1997 Machine Learning. McGraw-Hill, New York, NY.
Noori R., Deng Z., Kiaghadi A. & Kachoosangi F. T. 2015 How reliable are ANN, ANFIS, and SVM techniques for predicting longitudinal dispersion coefficient in natural rivers? Hydraulic Engineering 142 (1), 04015039.
Nourani V., Hakimzadeh H. & BabaeyanAmini A. 2012 Implementation of artificial neural network technique in the simulation of dam breach hydrograph. Journal of Hydroinformatics 14 (2), 478–496.
Pierce M. W., Thornton C. I. & Abt S. R. 2010 Predicting peak outflow from breached embankment dams. Journal of Hydraulic Engineering 15 (5), 338–349.
Roushangar K. & Ghasempour R. 2017 Prediction of non-cohesive sediment transport in circular channels in deposition and limit of deposition states using SVM. Water Science and Technology: Water Supply 17 (2), 537–551.
Singh V. P. & Scarlatos P. D. 1988 Analysis of gradual earth dam failure. Journal of Hydraulic Engineering 114 (1), 21–42.
Song Y., Wu D., Deng W., Gao X. Z., Li T., Zhang B. & Li Y. 2021a MPPCEDE: multi-population parallel co-evolutionary differential evolution for parameter optimization. Energy Conversion and Management 228, 113661.
Song Y., Wu D., Wagdy Mohamed A., Zhou X., Zhang B. & Deng W. 2021b Enhanced success history adaptive DE for parameter optimization of photovoltaic models. Complexity 2021, 6660115.
Teodorescu L. & Sherwood D. 2008 High energy physics event selection with gene expression programming. Computer Physics Communications 178 (6), 409–419.
Thornton C. I., Pierce M. W. & Abt S. R. 2011 Enhanced predictions for peak outflow from breached embankment dams. Hydraulic Engineering 16 (1), 81–88.
Wahl T. L. 1998 Prediction of embankment dam breach parameters: a literature review and needs assessment. Rep. No. DSO-98-004, Bureau of Reclamation, U.S. Department of the Interior, Denver, 60.
Wahl T. L. 2004 Uncertainty of predictions of embankment dam breach parameters. Journal of Hydraulic Engineering 130 (5), 389–397.
Wahl T. L. 2010 Dam breach modeling an overview of analysis methods. In: Joint Federal Interagency Conference on Sedimentation and Hydrologic Modeling, June, Las Vegas, NV.
Wang W. C., Xu D. M., Chau K. W. & Chen S. 2018 Improved annual rainfall-runoff forecasting using PSO–SVM model based on EEMD. Journal of Hydroinformatics 15 (4), 1377–1390.
Xu Y. & Zhang L. M. 2009 Breaching parameters for earth and rockfill dams. Journal of Geotechnical and Geoenvironmental Engineering 135 (12), 1957–1970.
Yan J., Jin J., Chen F., Yu G., Yin H. & Wang W. 2018 Urban flash flood forecast using support vector machine and numerical simulation. Journal of Hydroinformatics 20 (1), 221–231.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).