Abstract
The discharge coefficient (Cd) plays a key role in determining the discharge capacity of weirs. In this study, the support vector machine (SVM) and gene expression programming (GEP) algorithms were assessed for predicting the Cd of the piano key weir (PKW), rectangular labyrinth weir (RLW), and trapezoidal labyrinth weir (TLW) from a gathered experimental data set. Using dimensional analysis, various combinations of hydraulic and geometric non-dimensional parameters were derived as model inputs. The superior model for the SVM and the GEP predictor for PKW, RLW, and TLW included ,
and
respectively. The results showed that both algorithms are capable of predicting the discharge coefficient, but the performance indices (RMSE, R2, and Cd(DDR)max) illustrated the superiority of the GEP over the SVM. The sensitivity analysis determined that the most effective parameters for PKW, RLW, and TLW in predicting the discharge coefficient are
,
, and Fr respectively.
HIGHLIGHTS
Three different types of weirs have been studied in this paper.
Two algorithms, SVM and GEP, have been implemented to predict the discharge coefficient of three weirs.
Eighteen combinations of dimensionless parameters have been tested to achieve optimum prediction of the discharge coefficient.
An equation for a superior model has been extracted to simulate discharge coefficient.
INTRODUCTION
Weirs are hydraulic structures used for water surface control, flow measurement, and passing excess water volume. The shape and length of the weir crest strongly influence the overflow capacity. Because of their longer crest, nonlinear (labyrinth-type) weirs can convey more flow than linear weirs for an equal hydraulic head. Their crest is usually triangular, rectangular, trapezoidal, or arched in plan. The functional differences among labyrinth weirs are expressed through the discharge coefficient, Cd, which depends on hydraulic and geometric parameters. Research in recent decades has indicated that, because of the complexity and diversity of the parameters affecting Cd, numerical simulation has been developed to predict the discharge coefficient. The data-driven (soft computing) approach is another technique that has been used to predict Cd from experimental data through train and test phases. These models are capable of extracting complex and hidden relationships among dependent and independent variables (Samadi et al. 2014, 2015, 2020). In this regard, researchers have used the artificial neural network (ANN), group method of data handling (GMDH), gene expression programming (GEP), support vector machine (SVM), and adaptive neuro-fuzzy inference system (ANFIS).
ASCE (2000) considers the SVM as a nonlinear mapping process for predicting the discharge coefficient. Kumar et al. (2011) and Juma et al. (2014) demonstrated the advantage of the ANN over regression models for predicting the Cd of a sharp-crested weir with a triangular plan. Kisi (2013) and Emiroglu & Kisi (2013) reported that the ANFIS estimates Cd with desirable accuracy. Ebtehaj et al. (2015a, 2015b, 2015c) utilized data-mining methods to determine the Cd of side weirs located on the sidewalls of rectangular channels, and their results showed acceptable accuracy for these models. Azamathulla et al. (2016) used the SVM to predict the Cd of a side weir. The determination coefficients for the train and test phases were 0.96 and 0.93, respectively, indicating an accurate simulation. Parsaei & Haghiabi (2017) performed a comparative study using the GMDH, the multivariate adaptive regression splines (MARS), and the SVM to predict the discharge coefficient of a combined weir-gate. Their results confirmed that all models are capable of prediction and that the SVM outperforms the other methods. Norouzi et al. (2019) compared the SVM and the ANN for predicting the discharge coefficient of a trapezoidal labyrinth weir. Their study showed that both models perform well and that the SVM is clearly superior. Azimi et al. (2019) employed support vector regression (SVR) to predict the Cd of a side weir in a trapezoidal channel and reported that the SVR model simulates Cd with acceptable accuracy. Mohammed & Sharifi (2020) proposed using the GEP to predict the discharge coefficient of an oblique side weir. Zhou et al. (2015), Zaji et al. (2016), Zaji & Bonakdari (2017), Roushangar et al. (2017), Nadiri et al. (2018), Majedi Asl & Fuladipanah (2019), Sadeghfam et al. (2019), Kumar et al. (2019), and Roushangar et al. (2021) also evaluated the discharge coefficient using data-driven methods.
Given the significance of data-driven methods, this paper examines the application of the GEP and SVM algorithms to estimate the discharge coefficient of three different weirs, i.e. piano key, rectangular labyrinth, and trapezoidal labyrinth. Further, a new regression equation is extracted to predict the discharge coefficient. Finally, sensitivity analysis is used to rank the parameters by their effect on Cd.
METHODS AND MATERIALS
Experimental models description
Two models, SVM and GEP, were used to predict the Cd of the piano key weir (PKW), rectangular labyrinth weir (RLW), and trapezoidal labyrinth weir (TLW). Experimental data were gathered from Rostami et al. (2018) and Seamons (2014). Figures 1 and 2 show the geometric characteristics of the weirs.
Discharge coefficient function
Rostami et al. (2018) performed their experiments in a rectangular flume of length 10 m, width 0.3 m, and height 0.6 m at the Khuzestan Water and Power Authority Laboratory, Iran. The range of experimental geometric characteristics of the weir has been presented in Table 1. In this table, W is the width of each cycle, P is the weir height, L is the effective length of the weir, Ts is the weir thickness, Wi and Wo are the width of inlet and outlet keys of the weir, respectively, and N is the number of keys in the weir.
Table 1. The range of geometric characteristics of the piano key weir (Rostami et al. 2018)
Model | W | W/P | L (cm) | Ts (mm) | N | Wi/Wo | Number of experiments |
---|---|---|---|---|---|---|---|
Piano key | 15 | 2, 2.5, 3 | 148.5 | 5 | 2 | 1.25 | 36 |
Piano key | 30 | 2, 2.5, 3 | 148.5 | 5 | 1 | 1.25 | 36 |
Table 2. The range of hydraulic and geometric variables of weirs (Seamons 2014)
Variables | Range | Values |
---|---|---|
Q (m3/s) | 0.02–0.64 | – |
![]() | 0.051–0.835 | – |
Cd | 0.309–0.684 | – |
W (m) | – | 1.15, 1.23 |
N | – | 2 |
α (degree) | – | 12, 15 |
tw (m) | – | 0.04 |
P (m) | – | 0.3, 0.38, 0.46 |
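The discharge coefficient equation for this subsection did not survive in the text. As a hedged illustration only, the sketch below assumes the conventional head-discharge relation for labyrinth-type weirs, Q = (2/3) Cd L sqrt(2g) Ht^1.5, and inverts it for Cd. The function names, the symbol Ht for the total upstream head, and the sample numbers are assumptions made for illustration; they are not taken from the experiments.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def discharge(cd, crest_length, total_head, g=G):
    """Discharge Q (m^3/s) from the assumed relation
    Q = (2/3) * Cd * L * sqrt(2 g) * Ht**1.5."""
    return (2.0 / 3.0) * cd * crest_length * math.sqrt(2.0 * g) * total_head ** 1.5


def discharge_coefficient(q, crest_length, total_head, g=G):
    """Invert the assumed relation to recover Cd from a measured Q."""
    if crest_length <= 0 or total_head <= 0:
        raise ValueError("crest length and total head must be positive")
    return q / ((2.0 / 3.0) * crest_length * math.sqrt(2.0 * g) * total_head ** 1.5)


def head_ratio(total_head, weir_height):
    """Dimensionless head over weir height, a typical model input."""
    return total_head / weir_height


# Hypothetical values: Cd = 0.5, L = 4.0 m, Ht = 0.1 m, P = 0.3 m.
q_example = discharge(0.5, 4.0, 0.1)
cd_back = discharge_coefficient(q_example, 4.0, 0.1)
ratio = head_ratio(0.1, 0.3)
```

Inverting the relation this way is how a measured discharge and head would be turned into one Cd data point; the dimensionless ratio is the kind of input the data-driven models take.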
Overview of Support Vector Machine (SVM)


Some of the most important kernel functions are presented in Table 3. In this table, γ, r, and d are the kernel parameters. The generalization performance (estimation accuracy) of the SVM depends on a good setting of its meta-parameters, namely C, ε, and the kernel parameters. The choices of γ, r, and d control the complexity of the prediction (regression) model. Of all kernel functions, the radial basis function (RBF) is the most widely used and is recommended by notable researchers. In the RBF, σ is the function parameter. The characteristic parameters of the SVM, i.e., ε and C, are optimized, and the value of the setting parameter is determined by the trade-off between the minimum fitting error and the complexity of the estimated function. The user determines the SVM parameters by trial and error. In ε-insensitive regression, ε is the radius of the tube around the regression function within which deviations are not penalized.
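The kernel expressions of Table 3 are not legible in this copy, so the following is a minimal sketch of the four kernels in their common textbook forms, together with the ε-insensitive loss. The exact notation used by the authors may differ (for instance, the RBF is often written with σ, where γ = 1/(2σ²)); the function names and default values are assumptions for illustration.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, r=0.0, d=3):
    # (gamma * <x, y> + r) ** d
    return (gamma * dot(x, y) + r) ** d


def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2); gamma = 1 / (2 sigma^2) in the sigma form
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    # tanh(gamma * <x, y> + r)
    return math.tanh(gamma * dot(x, y) + r)


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)
```

With ε = 0.1, a prediction that misses a measured Cd by 0.05 incurs no loss, while a miss of 0.3 is penalized by about 0.2; the parameter C then weights the sum of these penalties against model complexity.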
Overview of gene expression programming
Invented by Ferreira (2001, 2006), GEP is an evolutionary procedure that works on a genotype (linear chromosomes) expressed as a phenotype (expression trees). It is used for solving complex real-world problems (Azamathullah 2012; Samadi et al. 2021). GEP evolves computer programs of different shapes and lengths that are encoded in linear chromosomes of fixed size. The final step is the translation process, in which the chromosome information is decoded into an expression tree. The GEP algorithm involves four main steps: (1) initialize the population by creating the chromosomes (individuals); (2) identify a suitable fitness function to evaluate the best individual; (3) conduct genetic operations to modify the individuals to achieve the optimal solution in the next generation; (4) check the stop conditions. The flowchart of the GEP algorithm is illustrated in Figure 3.
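The translation step can be sketched in a few lines. The example below assumes the usual breadth-first (Karva) reading of a gene into an expression tree and links several genes by addition, in line with the "+" linking function used later in the GEP settings. The symbol set, the function names, and the sample genes are illustrative assumptions, not the chromosomes evolved in this study.

```python
import math
import operator

# Number of arguments per function symbol; any other symbol is a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}
FUNCS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "Q": math.sqrt,  # "Q" denotes the square root
}


def decode(gene):
    """Read a gene breadth-first into a tree of [symbol, children] nodes."""
    root = [gene[0], []]
    queue = [root]  # nodes whose children have not been assigned yet
    pos = 1
    while queue:
        node = queue.pop(0)
        for _ in range(ARITY.get(node[0], 0)):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root  # symbols beyond the last one used form the non-coding tail


def evaluate(node, env):
    """Evaluate a decoded tree for the terminal values given in env."""
    symbol, children = node
    if symbol in FUNCS:
        return FUNCS[symbol](*(evaluate(child, env) for child in children))
    return env[symbol]


def evaluate_chromosome(genes, env):
    """Link the sub-expressions of all genes by addition."""
    return sum(evaluate(decode(gene), env) for gene in genes)


values = {"a": 5.0, "b": 1.0, "c": 2.0, "d": 2.0}
single = evaluate(decode("Q*-+abcd"), values)       # sqrt((a - b) * (c + d))
linked = evaluate_chromosome(["+ab", "*ab"], values)  # (a + b) + (a * b)
```

Because the tree is built breadth-first, any fixed-length gene with enough terminals in its tail decodes to a valid expression, which is why genetic operators can modify chromosomes freely.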
Evaluation criteria
For better judgment and visualization, the Gaussian function of the DDR values can be calculated and plotted in standard normal distribution format. To do so, the DDR values are first standardized, and the normalized value of DDR, Cd(DDR), is then calculated with the Gaussian function. In general, the more closely the error distribution clusters around the centerline and the larger the maximum value of Cd(DDR), the higher the accuracy.
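Since the defining equations of this subsection are not reproduced in this copy, the sketch below shows one plausible way to compute the criteria. It assumes that R2 is the squared Pearson correlation (consistent with the correlation coefficients and R2 values reported in the results), that DDR is the ratio of predicted to observed Cd minus one, and that Cd(DDR) is the normal density evaluated at the standardized DDR. All three definitions and all names are assumptions, not the authors' stated formulas.

```python
import math


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def ddr(observed, predicted):
    """Assumed definition: predicted / observed - 1 for each data point."""
    return [p / o - 1.0 for o, p in zip(observed, predicted)]


def standardized_gaussian(values):
    """Return (Z scores, Gaussian density at each point, peak of the density)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0.0:
        raise ValueError("all DDR values are identical; the density is undefined")
    scale = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    z = [(v - mean) / sigma for v in values]
    q = [scale * math.exp(-0.5 * zi * zi) for zi in z]
    return z, q, scale


# Hypothetical measured and predicted Cd values.
obs = [0.4, 0.5, 0.6]
rough = [0.42, 0.5, 0.57]
tight = [0.401, 0.5, 0.599]
_, _, peak_rough = standardized_gaussian(ddr(obs, rough))
_, _, peak_tight = standardized_gaussian(ddr(obs, tight))
```

Under these assumptions the peak of the curve equals 1/(σ√(2π)), so a tighter spread of DDR values gives a taller, narrower bell, which is why a larger maximum is read as higher accuracy.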
RESULTS AND DISCUSSION
Different combinations of dimensionless parameters were used to build the candidate models for predicting the discharge coefficient of the PKW, RLW, and TLW with the SVM and GEP algorithms. These combinations are presented in Table 4.
Table 3. Kernel functions
Kernel function name | Function |
---|---|
Linear | ![]() |
Polynomial function | ![]() |
Radial basis function | ![]() |
Sigmoid function | ![]() |
Table 4. Various combinations of dimensionless parameters to predict discharge coefficient of PKW, RLW, and TLW
Model number | PKW, dimensionless Π | RLW, dimensionless Π | TLW, dimensionless Π |
---|---|---|---|
Model 1 | ![]() | ![]() | ![]() |
Model 2 | ![]() | ![]() | ![]() |
Model 3 | ![]() | ![]() | ![]() |
Model 4 | ![]() | ![]() | ![]() |
Model 5 | ![]() | ![]() | ![]() |
Model 6 | | | ![]() |
Model 7 | | | ![]() |
Model 8 | | | ![]() |
SVM solver
A total of 72 data points were used for modeling the PKW, of which 75% were assigned to the train phase and 25% to the test phase. The SVM results for the PKW showed that model 1 performs best among the five models. For the optimum model, the setting parameters C, γ, and ε were 40, 4, and 0.1, giving R2 = 0.9785 and RMSE = 0.02418 for the train phase and R2 = 0.9789 and RMSE = 0.027 for the test phase. These results were obtained with the RBF kernel function. Figure 4 shows a scatter plot of measured and predicted values of Cd and the agreement of the data points during the train and test phases for model 1. The distribution of predicted and observed data around the fit line and their agreement in both phases indicate the high accuracy of the SVM model. Further statistical indices are presented in Table 5 and demonstrate the agreement between measured and predicted values of the discharge coefficient. All indices have very close values in the train and test phases.
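The train and test shares translate directly into record counts. The sketch below is a minimal shuffled split, assuming the partition was random (the text does not say how records were assigned); the function name and the seed are hypothetical.

```python
import random


def train_test_split(records, train_share, seed=0):
    """Shuffle a copy of the records and cut it at the requested share."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_share * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Record indices stand in for the 72 PKW experiments.
pkw_train, pkw_test = train_test_split(range(72), 0.75)
print(len(pkw_train), len(pkw_test))
```

For the PKW this gives 54 training and 18 test records. The same helper with shares of 0.73 for 33 RLW records and 0.80 for 140 TLW records gives 24/9 and 112/28, matching the counts listed in the SVM summary tables.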
Table 5. Summary of the SVM performance assessment for PKW
Statistical indices | Train phase, measured | Train phase, predicted | Test phase, measured | Test phase, predicted |
---|---|---|---|---|
Number of data | 54 | | 18 | |
Correlation coefficient | 0.9892 | | 0.9894 | |
Mean | 0.62 | 0.63 | 0.58 | 0.58 |
Maximum | 0.930 | 1 | 0.89 | 0.86 |
Minimum | 0.34 | 0.37 | 0.38 | 0.38 |
STDEV | 0.1680 | 0.1627 | 0.1829 | 0.1733 |
Figure 5 shows the standardized normal distribution of Cd(DDR) values for the SVM model in the train and test phases. The maximum values of Cd(DDR) for the train and test phases were 8.153 and 8.698, respectively, which shows that the model performs slightly better in the test phase than in the train phase.
A sensitivity analysis was performed for model 1 (Table 6) to evaluate the impact of the input variables on the estimated discharge coefficient. In this analysis, one parameter at a time was omitted from the model inputs, the model was rerun, and its accuracy was re-evaluated. The omitted parameter whose removal caused the largest loss of precision and the largest increase in error was rated as the most significant. The results reveal that omitting the parameter in the last row of Table 6 increased the RMSE to 0.1569 and 0.1905 and decreased R2 to 0.1151 and 0.0127 in the train and test phases, respectively, so this is the most sensitive parameter for the SVM predictor. The second and third most effective parameters are the other dimensionless parameter listed in Table 6 and N, respectively.
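The omit-one procedure described above can be sketched generically. In the example below a one-nearest-neighbour predictor stands in for the SVM or GEP model purely to keep the code self-contained; the predictor, the synthetic data, and the feature names "a" and "b" are all assumptions for illustration and carry no results from this study.

```python
import math


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nearest_neighbour_predict(train_x, train_y, test_x):
    """Stand-in model: return the target of the closest training row."""
    predictions = []
    for x in test_x:
        best = min(
            range(len(train_x)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
        )
        predictions.append(train_y[best])
    return predictions


def omit_one_sensitivity(names, train_x, train_y, test_x, test_y,
                         predict=nearest_neighbour_predict):
    """Test RMSE with all inputs ("None" omitted) and with each input dropped."""
    def score(keep):
        tr = [[row[j] for j in keep] for row in train_x]
        te = [[row[j] for j in keep] for row in test_x]
        return rmse(test_y, predict(tr, train_y, te))

    all_columns = list(range(len(names)))
    results = {"None": score(all_columns)}
    for j, name in enumerate(names):
        results[name] = score([k for k in all_columns if k != j])
    return results


def rank_by_impact(results):
    """Inputs ordered from the largest to the smallest increase in RMSE."""
    base = results["None"]
    omitted = [name for name in results if name != "None"]
    return sorted(omitted, key=lambda name: results[name] - base, reverse=True)


# Synthetic data: the target depends strongly on "a" and weakly on "b".
rows = [[i / 20, (i % 4) / 4] for i in range(20)]
target = [a + 0.1 * b for a, b in rows]
train_x, train_y = rows[0::2], target[0::2]
test_x, test_y = rows[1::2], target[1::2]

results = omit_one_sensitivity(["a", "b"], train_x, train_y, test_x, test_y)
ranking = rank_by_impact(results)
```

Here dropping "a" degrades the test RMSE far more than dropping "b", so "a" is ranked first; the sensitivity tables in this section are read the same way, with the row showing the largest RMSE and lowest R2 identifying the most influential input.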
Table 6. Sensitivity analysis of the SVM model for PKW
Top model | Omitted variable | Training phase R2 | Training phase RMSE | Test phase R2 | Test phase RMSE |
---|---|---|---|---|---|
Model 1 | None | 0.9785 | 0.0248 | 0.9789 | 0.0270 |
Model 1 | ![]() | 0.9112 | 0.0502 | 0.8699 | 0.0657 |
Model 1 | N | 0.9511 | 0.0369 | 0.9602 | 0.0366 |
Model 1 | ![]() | 0.1151 | 0.1569 | 0.0127 | 0.1905 |
The next simulation of Cd is for the RLW, for which 33 data points are available. The train and test phases account for 73% and 27% of the data, respectively. The first model in Table 4 performed best in predicting the discharge coefficient. The setting parameters C, γ, and ε were 60, 1.4, and 0.1, respectively. The values of (R2, RMSE) for the train and test phases were (0.9745, 0.0208) and (0.9834, 0.0138), respectively. Figure 6 presents a scatter plot of observed versus predicted Cd and a plot of the discharge coefficient against data point number for the train and test stages. The data are only slightly scattered around the fit line and agree closely in both phases, indicating acceptable performance of the SVM model. Table 7 summarizes the statistical indices for the SVM with the superior combination of parameters. The very small differences among the indices confirm the good simulation by the SVM model.
Table 7. Summary of the SVM performance assessment for RLW
Statistical indices | Train phase, measured | Train phase, predicted | Test phase, measured | Test phase, predicted |
---|---|---|---|---|
Number of data | 24 | | 9 | |
Correlation coefficient | 0.9871 | | 0.9917 | |
Mean | 0.43 | 0.43 | 0.37 | 0.38 |
Maximum | 0.64 | 0.62 | 0.57 | 0.56 |
Minimum | 0.27 | 0.26 | 0.26 | 0.26 |
STDEV | 0.1255 | 0.1298 | 0.0955 | 0.0955 |
Figure 7 illustrates Cd(DDR) versus ZDDR for the superior model in the train and test stages. The maximum Cd(DDR) values for the train and test stages are 8.256 and 14.255, respectively. These values indicate better performance of the SVM in the test phase than in the train phase. A sensitivity analysis was performed for model 1 (Table 8). Omitting the parameter in the last row of Table 8 changes the model performance dramatically: R2 decreases from 0.9745 to 0.0636 and from 0.9834 to 0.3682, and RMSE increases from 0.0208 to 0.1209 and from 0.0138 to 0.1019, in the train and test stages, respectively. This parameter therefore has the greatest effect on the prediction.
Table 8. Sensitivity analysis of the SVM for the RLW
Top model | Omitted variable | Training phase R2 | Training phase RMSE | Test phase R2 | Test phase RMSE |
---|---|---|---|---|---|
Model 1 | None | 0.9745 | 0.0208 | 0.9834 | 0.0138 |
Model 1 | ![]() | 0.9223 | 0.0344 | 0.9598 | 0.0331 |
Model 1 | ![]() | 0.0636 | 0.1209 | 0.3682 | 0.1019 |
Standardized normal distribution graph of DDR values for SVM models of RLW.
The discharge coefficient of the TLW is the third case modeled with the SVM. The train and test phases account for 80% and 20% of the 140 observed data points, respectively. Among the eight combinations in Table 4, combination 3 agreed best with the measured discharge coefficients. This model includes Fr and three further dimensionless parameters (listed in Table 10). The setting parameters C, γ, and ε of the SVM algorithm were 100, 1, and 0.1, respectively. The values of (R2, RMSE) for the train and test phases were (0.9896, 0.009) and (0.9886, 0.0144). The measured versus predicted values of Cd are illustrated in Figure 8, which reflects the high values of the performance indices. All measured and predicted data are scattered closely around the 1:1 line, and the agreement of the datasets in the train and test stages indicates high correlation and low error. The statistical characteristics of the predicted and measured values of Cd are summarized in Table 9 and indicate good SVM performance for the TLW. Figure 9 illustrates the distribution of standardized Cd(DDR) versus ZDDR for the TLW. The maximum values of Cd(DDR) for the train and test stages are 25.724 and 33.524, respectively. The better result for the test phase supports the validity of the model. The sensitivity analysis (Table 10) shows that Fr is the most effective parameter, because omitting it from the modeling process causes the largest decrease in R2 and the largest increase in RMSE. The remaining three parameters have smaller effects, as the other rows of Table 10 show.
Table 9. Summary of the SVM performance assessment for TLW
Statistical indices | Train phase, measured | Train phase, predicted | Test phase, measured | Test phase, predicted |
---|---|---|---|---|
Number of data | 112 | | 28 | |
Correlation coefficient | 0.9947 | | 0.9942 | |
Mean | 0.55 | 0.55 | 0.54 | 0.54 |
Maximum | 0.70 | 0.68 | 0.69 | 0.67 |
Minimum | 0.34 | 0.34 | 0.40 | 0.38 |
STDEV | 0.0795 | 0.0792 | 0.0801 | 0.0914 |
Table 10. Sensitivity analysis of the SVM for the TLW
Top model | Omitted variable | Training phase R2 | Training phase RMSE | Test phase R2 | Test phase RMSE |
---|---|---|---|---|---|
Model 3 | None | 0.9896 | 0.009 | 0.9886 | 0.0114 |
Model 3 | Fr | 0.8978 | 0.027 | 0.899 | 0.0309 |
Model 3 | ![]() | 0.9721 | 0.0133 | 0.9717 | 0.0176 |
Model 3 | ![]() | 0.953 | 0.0183 | 0.9602 | 0.0197 |
Model 3 | ![]() | 0.9123 | 0.023 | 0.9132 | 0.0265 |
Standardized normal distribution graph of DDR values for SVM models of TLW.
GEP solver
This section presents the GEP simulation results for the three weirs. Three PKW models from Table 4 were examined, and the first gave the best output with the chromosome properties presented in Table 11. The train and test phases account for 75% and 25% of the data, respectively. The RMSE, used as the fitness function error of the GEP, is 0.0297 and 0.0425 for the train and test phases, respectively. The expression tree of the GEP model, including its mathematical functions and operators, is presented in Figure 10.
Table 11. Parameter values used to predict Cd in PKW
Parameters | Values |
---|---|
Head size | 10 |
Number of chromosomes | 30 |
Number of genes | 3 |
Mutation rate | 0.044 |
Inversion rate | 0.1 |
One-point recombination rate | 0.3 |
Two-point recombination rate | 0.3 |
Gene recombination rate | 0.1 |
Gene transposition rate | 0.1 |
IS transposition rate | 0.1 |
RIS transposition rate | 0.1 |
Fitness function error type | RMSE |
Linking function | + |


Table 12. Summary of the GEP performance assessment for the PKW
Statistical indices | Train phase, measured | Train phase, predicted | Test phase, measured | Test phase, predicted |
---|---|---|---|---|
Number of data | 54 | | 17 | |
Correlation coefficient | 0.9859 | | 0.9883 | |
Mean | 0.64 | 0.64 | 0.53 | 0.56 |
Maximum | 0.93 | 0.90 | 0.71 | 0.75 |
Minimum | 0.36 | 0.29 | 0.35 | 0.30 |
STDEV | 0.1756 | 0.1676 | 0.1267 | 0.1494 |
The variation of observed and predicted values of Cd in the train and test phases is displayed in Figure 11. The agreement of the datasets in both phases indicates that the GEP model performance is acceptable. Figure 12 illustrates the standardized normal distribution of the DDR for the PKW using the GEP algorithm. The maximum values of Cd(DDR) for the train and test phases are 7.531 and 9.305, respectively. Although both bell curves are similarly centered on the vertical axis, the larger maximum for the test stage indicates better performance in the test phase than in the train phase. The sensitivity analysis (Table 13) for the GEP shows that the parameter in the last row of the table has the highest impact, because omitting it changes the accuracy indices drastically: in the test phase, RMSE increases from 0.0425 to 0.2411 and R2 drops from 0.9767 to 0.1403.
Table 13. Sensitivity analysis of the GEP model for PKW
Top model | Omitted variable | Training phase R2 | Training phase RMSE | Test phase R2 | Test phase RMSE |
---|---|---|---|---|---|
Model 1 | None | 0.9719 | 0.0297 | 0.9767 | 0.0425 |
Model 1 | ![]() | 0.8998 | 0.0702 | 0.9265 | 0.0584 |
Model 1 | N | 0.9107 | 0.0453 | 0.9301 | 0.0462 |
Model 1 | ![]() | 0.1261 | 0.2207 | 0.1403 | 0.2411 |
Standardized normal distribution graph of DDR values for GEP models of PKW.
The discharge coefficient of the RLW was the second case modeled using the GEP. The chromosome parameters are presented in Table 14. The values of RMSE for the train and test phases were 0.01179 and 0.026, respectively. The expression tree of the GEP predictor is illustrated in Figure 13.
Table 14. Parameter values used to predict Cd in RLW
Parameters | Values |
---|---|
Head size | 10 |
Number of chromosomes | 30 |
Number of genes | 3 |
Mutation rate | 0.04 |
Inversion rate | 0.1 |
One-point recombination rate | 0.3 |
Two-point recombination rate | 0.3 |
Gene recombination rate | 0.1 |
Gene transposition rate | 0.1 |
IS transposition rate | 0.1 |
RIS transposition rate | 0.1 |
Fitness function error type | RMSE |
Linking function | + |



Table 15. Summary of the GEP performance assessment for the RLW
Statistical indices | Train phase, measured | Train phase, predicted | Test phase, measured | Test phase, predicted |
---|---|---|---|---|
Number of data | 24 | | 9 | |
Correlation coefficient | 0.9935 | | 0.9923 | |
Mean | 0.41 | 0.41 | 0.42 | 0.41 |
Maximum | 0.59 | 0.59 | 0.64 | 0.66 |
Minimum | 0.26 | 0.24 | 0.26 | 0.23 |
STDEV | 0.1065 | 0.1065 | 0.1396 | 0.1557 |
Table 16. Sensitivity analysis of the GEP for the RLW
Top model | Omitted variable | Training phase R2 | Training phase RMSE | Test phase R2 | Test phase RMSE |
---|---|---|---|---|---|
Model 1 | None | 0.9870 | 0.01179 | 0.9846 | 0.0260 |
Model 1 | ![]() | 0.9389 | 0.03965 | 0.9581 | 0.02987 |
Model 1 | N | 0.9630 | 0.03805 | 0.9706 | 0.0131 |
Model 1 | ![]() | 0.1362 | 0.2597 | 0.1321 | 0.2623 |
Standardized normal distribution graph of DDR values for GEP models of RLW.
The simulation of the discharge coefficient for TLW has been performed based on the chromosome characteristics mentioned in Table 17. The values of RMSE for the train and the test phases were 0.00849 and 0.0823, respectively. Figure 10 presents the tree expression of the GEP modeling for TLW.
Table 17. Parameter values used to predict Cd in TLW
Parameters | Values |
---|---|
Head size | 9 |
Number of chromosomes | 33 |
Number of genes | 4 |
Mutation rate | 0.04 |
Inversion rate | 0.1 |
One-point recombination rate | 0.3 |
Two-point recombination rate | 0.3 |
Gene recombination rate | 0.1 |
Gene transposition rate | 0.1 |
IS transposition rate | 0.1 |
RIS transposition rate | 0.1 |
Fitness function error type | RMSE |
Linking function | + |






Standardized normal distribution graph of DDR values for GEP models of TLW.
Table 18. Summary of the GEP performance assessment for the TLW
Statistical indices | Train phase, measured | Train phase, predicted | Test phase, measured | Test phase, predicted |
---|---|---|---|---|
Number of data | 74 | | 21 | |
Correlation coefficient | 0.9956 | | 0.9983 | |
Mean | 0.54 | 0.54 | 0.55 | 0.63 |
Maximum | 0.70 | 0.69 | 0.62 | 0.72 |
Minimum | 0.34 | 0.34 | 0.40 | 0.46 |
STDEV | 0.0901 | 0.0888 | 0.0767 | 0.0906 |
Table 19. Sensitivity analysis of the GEP for the TLW
Top model | Omitted variable | Training phase R2 | Training phase RMSE | Test phase R2 | Test phase RMSE |
---|---|---|---|---|---|
Model 3 | None | 0.9912 | 0.00849 | 0.9966 | 0.0823 |
Model 3 | Fr | 0.7021 | 0.01245 | 0.7158 | 0.01198 |
Model 3 | ![]() | 0.9649 | 0.0102 | 0.9685 | 0.0095 |
Model 3 | ![]() | 0.9769 | 0.0134 | 0.9775 | 0.0118 |
Model 3 | ![]() | 0.8257 | 0.0157 | 0.8657 | 0.0127 |
CONCLUSION
This research work investigates the capability of the SVM and the GEP algorithms in predicting the discharge coefficient of PKW, RLW, and TLW using gathered experimental data sets. Furthermore, a regression equation was extracted for each weir to simulate and predict the Cd based on effective dimensionless parameters. Finally, a sensitivity test was performed to rank the parameters by their effect on the discharge coefficient. Both algorithms have the potential to simulate discharge coefficients with acceptable accuracy. The results can be summed up as follows:
Both the SVM and the GEP algorithms perform well in predicting the discharge coefficient of the PKW according to the statistical performance indices, but a comparison of the maximum values of Cd(DDR) indicates the superiority of the GEP over the SVM.
The analysis results confirm the capability of the SVM and the GEP in simulating the discharge coefficient of the RLW. The superiority of the GEP over the SVM is reflected in its larger maximum values of Cd(DDR).
The results of statistical indices and the maximum values of Cd(DDR) indicate the remarkable superiority of the GEP over the SVM for TLW.
The sensitivity analysis suggests that the SVM and the GEP are capable of predicting the discharge coefficient of nonlinear weir well using hydraulic and geometric parameters. The most influential parameter for PKW and RLW was
. The most significant parameter for the TLW was Fr.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.