Abstract
In this study, a support vector machine (SVM) and three optimization algorithms are used to develop a discharge coefficient (Cd) prediction model for the semi-circular side weir (SCSW). Dimensional analysis gave the model inputs as the ratio of the flow depth at the upstream weir crest point to the diameter (h1/D), the ratio of main channel width to diameter (B/D), the ratio of side weir height to diameter (P/D), and the Froude number upstream of the side weir (Fr), with Cd as the model output. The sensitivity coefficients of the dimensionless parameters with respect to Cd were calculated using Sobol's method. The results show that the SVM optimized by the genetic algorithm (GA-SVM) has high prediction accuracy and generalization ability; its average and maximum errors were 0.08 and 2.47%, respectively, which are about 95.72 and 60.86% lower than those of the traditional empirical model. The first-order sensitivity coefficients S1 of h1/D, B/D, P/D, and Fr were 0.35, 0.07, 0.13, and 0.02, and the global sensitivity coefficients Si were 0.63, 0.25, 0.30, and 0.32, respectively. h1/D has a significant effect on Cd. In particular, when h1/D < 0.24 and either 0.48 < Fr < 0.58 or 0.67 < Fr < 0.72, the discharge capacity of the SCSW is relatively large.
HIGHLIGHTS
We developed an effective and high-accuracy model for predicting the Cd of SCSW.
The importance of dimensionless parameters on Cd was quantified by Sobol's method.
We explored the flow characteristics of the semi-circular side weir.
INTRODUCTION
As one of the most common diversion structures, side weirs are used for flow control in drainage networks, irrigation systems, and wastewater channels (Zahiri et al. 2013). In recent years, as extreme weather has become more frequent and storm floods have increased significantly, side weirs have been widely installed in sewer networks and irrigation systems to divert excess flow from one channel to another (Uyumaz et al. 2014). Semi-circular labyrinth side weirs are widely used because of their long overflow front, stable overflow structure, and ease of sediment removal. However, because flow over a semi-circular side weir (SCSW) is spatially varied, many parameters affect the discharge coefficient (Cd). Therefore, accurately evaluating how different factors influence Cd, and how Cd varies with them, is important for the design and operation of this structure.
At present, most scholars use traditional empirical methods to check the discharge capacity of SCSWs. Haghshenas & Vatankhah (2021) proposed discharge calculation equations for the SCSW, in which the mean and maximum errors of the best model were 1.87 and 6.31%, respectively. Mamand & Raheem (2018) used SPSS software to fit empirical equations for the SCSW, and the coefficients of determination (R2) of the multivariate linear regression and multivariate power regression forms were 0.8498 and 0.8584, respectively. Khalili & Honar (2017) derived a calculation equation for the Cd of the SCSW from model experiments and dimensional analysis, and showed that the Cd of the SCSW was higher than that of the rectangular side weir. However, the discharge is affected by the plan position of the weir sill, the shape of the weir, the upstream and downstream flow conditions, and the different flow resistances generated, which leads to different expressions for Cd that are not convenient for users. In addition, empirical equations for the discharge coefficient are limited by the specific datasets used to fit them, by their simplified treatment of parameter interactions, and by high uncertainty and numerous assumptions (Tao et al. 2022). As a result, the physical relationships among the parameters are insufficiently exploited and the calculation accuracy is limited.
In recent years, many scholars have attempted to use soft computing techniques to overcome the heavy calculation and inconvenient use of empirical equations (Haghbin & Sharafati 2022; Shen et al. 2022; Gharehbaghi et al. 2023; Parsaie et al. 2023; Seyedian et al. 2023; Yarahmadi et al. 2023). Jamei et al. (2021) developed three linear models for predicting the Cd of triangular side orifices. Their research shows that the intelligent model can accurately evaluate the discharge capacity of the side orifices under free-flow conditions. Tao et al. (2022) used three machine learning models to predict the Cd of gates under free-flow and submerged-flow conditions. The results show that the models have higher accuracy for the free-flow condition. Ismael et al. (2021) used neural network technology to predict the Cd of inclined cylindrical weirs with different diameters; in the testing stage, the root mean square error (RMSE) of the radial basis function network model was 9 and 41% lower than that of the cascade-forward neural network and the back-propagation neural network (BPNN), respectively. However, with the wide application of intelligent models to weir flow, it has gradually been found that this technology suffers from problems such as overfitting and a tendency to fall into local optima. Therefore, researchers began to tune the hyperparameters of the model with optimization algorithms to improve its forecast accuracy and stability. For example, Haghbin et al. (2022) developed a hybrid data-driven approach to evaluate the Cd of step spillways, and the optimized model improved the performance index to 86.13%. Pradeep & Samui (2022) used a neural network combined with a hybrid optimization algorithm to predict rock strain, and the results showed that the optimized model was better than other single models in the training and testing phases.
Chen et al. (2022) predicted the discharge coefficient of streamlined weirs, and the results showed that hybrid deep data-driven algorithms provide more accurate results than the classical ones. Simsek et al. (2023) used an artificial neural network (ANN) to predict the discharge coefficient of a trapezoidal broad-crested weir; the results showed that including the Froude number significantly improves the performance of the models in estimating Cd values, and that the ANN method was more successful in determining Cd than other methods. Balouchi & Rakhshandehroo (2018) used soft computing models to evaluate the discharge coefficient of a combined weir-gate, and the multilayer perceptron was considered superior; it had better statistical indices of RMSE, mean absolute error (MAE), and R2 (0.027, 0.022, and 0.984, respectively).
However, because of the large discharge and complex physical parameters of the SCSW, its prediction model needs to be both highly accurate and stable. According to the current literature, a high-precision Cd prediction model for the SCSW has not yet been developed. Therefore, developing an accurate and stable prediction model for the Cd of the SCSW is an important aim of this study. In addition, there is great interest in the interactions between model inputs and outputs. Zhang et al. (2013) used Sobol's method to analyze the sensitivity of potential hydrological processes under different hydrological models and climatic conditions. Nossent et al. (2011) successfully applied the Sobol sensitivity method to the prioritization of input parameters of complex environmental models. However, most scholars pay more attention to the stability and accuracy of weir flow prediction models, and the interactions and variation relationships between input parameters and discharge coefficients have not been explored in depth. Hence, this paper not only establishes a discharge coefficient prediction model for the SCSW but also provides a new method for accurately calculating the discharge of the structure. More importantly, building on previous work, the influence of the dimensionless parameters on the discharge coefficient is quantified, which fills the research gap in this area.
In summary, this study aims to systematically evaluate the effects of the hydraulic parameters of SCSWs on the Cd. First, the particle swarm optimization (PSO) algorithm, genetic algorithm (GA), and sparrow search algorithm (SSA) are used to optimize the hyperparameters C and γ of the support vector machine (SVM), establishing three different models for predicting the Cd of SCSWs. Then, the accuracy and generalization ability of the intelligent and traditional empirical models are compared using different performance indexes. On this basis, Sobol's method is used to explore the interactions between the hydraulic parameters and Cd and to analyze how Cd varies with them. The sensitivity of Cd to the different hydraulic parameters is quantified to provide an essential reference basis for the design and promotion of SCSWs.
DATA AND MODELS
Experimental data
Table 1. Statistical parameters of the experimental data
Statistical parameters | B/D | P/D | h1/D | Fr | Cd |
---|---|---|---|---|---|
Maximum | 1 | 0.6 | 0.469 | 0.815 | 0.780 |
Minimum | 0.625 | 0.125 | 0.156 | 0.174 | 0.565 |
Mean | 0.799 | 0.299 | 0.304 | 0.433 | 0.663 |
Middle quartile | 0.833 | 0.250 | 0.305 | 0.420 | 0.652 |
SD | 0.155 | 0.136 | 0.083 | 0.153 | 0.056 |
Dimensional analysis
Support vector machine
In this study, the dataset is small, the sample uncertainty is high, and the relationships among the sample parameters are highly nonlinear, so a fast and robust model is needed. SVM is a powerful supervised learning technique that can provide reliable and robust predictions (Najafzadeh & Oliveto 2020) and was therefore selected. PSO and GA are long-established optimization algorithms, whereas the SSA is a more recent swarm intelligence algorithm; comparing the hyperparameters obtained by the three algorithms on the same dataset allows the stability and reliability of the model to be better assessed.
SVM and PSO
In PSO, the position of each particle encodes a candidate pair of SVM parameters, C and γ. The population size and number of iterations are set first, and the particle positions and velocities are initialized. The randomly generated C and γ are then input into the SVM model for training, with the mean square error of model cross-validation (CVmse) used as the fitness function. The particle with the minimum fitness gives the optimal position at that iteration, and the optimization algorithm ends when the number of iterations reaches the set value.
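The PSO search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the swarm size, iteration count, inertia factor, acceleration constants, and speed range follow Table 2, while the search bounds, the function names, and the fitness function `toy_cv_mse` are hypothetical. In practice the fitness would be the CVmse of an SVM trained with the candidate C and γ.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=30,
                 inertia=0.9, c1=2.0, c2=2.0, v_max=1.0, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds and velocities in [-v_max, v_max].
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))                # clamp speed
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))   # stay in bounds
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:                 # update personal best
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:                # update global best
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_cv_mse(params):
    """Hypothetical stand-in for CVmse: a smooth bowl with its minimum at (3, 5)."""
    c, gamma = params
    return (c - 3.0) ** 2 + (gamma - 5.0) ** 2


# Assumed search ranges for C and gamma; the paper does not state them here.
best_params, best_fit = pso_minimize(toy_cv_mse, bounds=[(0.1, 10.0), (0.01, 10.0)])
```

Replacing `toy_cv_mse` with a function that trains an SVM and returns its cross-validated mean square error would turn this sketch into a PSO-SVM search of the kind described.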
SVM and GA
The GA is an adaptive optimization method with a global search capability that applies guided random search to an encoded representation of the parameter space, in which each candidate solution is an individual. The key technology of the algorithm consists of five elements: encoding of parameters, initialization of the population, design of the fitness function, design of the genetic operations, and setting of the control parameters (Li & Kong 2014). Through continuous evolution from generation to generation, an optimally adapted individual can eventually be obtained. The GA has the advantages of global optimality, implicit parallelism, high stability, and wide availability (Li & Kong 2014; Guan et al. 2021).
The basic steps of the GA are as follows:
- (1) Encoding: Before searching, the GA represents the solution data in the solution space as genotypic string structure data in the genetic space, and the different combinations of these string structure data constitute the different points.
- (2) Initial population generation: N initial string structure data are randomly generated. Each string is called an individual, the N individuals form a population, and the GA uses these N strings as initial points to start evolution.
- (3) Fitness evaluation: Fitness indicates the strengths and weaknesses of individuals or solutions. The fitness function is defined in different ways for different problems.
Finally, the optimal solution is obtained by three basic operations: selection, crossover, and mutation.
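The steps above can be sketched as a small binary-coded GA. This is an illustrative sketch under assumptions rather than the authors' code: the population size, number of iterations, crossover probability, and mutation probability follow Table 2, whereas the 16-bit encoding, tournament selection, single-point crossover, elitism, the search bounds, and the fitness function `toy_cv_mse` are hypothetical choices made here.

```python
import random

BITS = 16  # bits used to encode each parameter (assumption)


def decode(bits, bounds):
    """Map a bit string to real parameter values inside `bounds`."""
    params = []
    for k, (lo, hi) in enumerate(bounds):
        chunk = bits[k * BITS:(k + 1) * BITS]
        value = int("".join(map(str, chunk)), 2)
        params.append(lo + (hi - lo) * value / (2 ** BITS - 1))
    return params


def ga_minimize(fitness, bounds, pop_size=20, n_gen=30,
                p_cross=0.5, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n_bits = BITS * len(bounds)

    def cost(ind):
        return fitness(decode(ind, bounds))

    def select(pop):
        # Binary tournament: the fitter of two random individuals survives.
        a, b = rng.sample(pop, 2)
        return a if cost(a) < cost(b) else b

    # (1) encoding and (2) random initial population of N bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)  # (3) fitness evaluation

    for _ in range(n_gen):
        children = [best[:]]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            p1, p2 = select(pop)[:], select(pop)[:]
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                if rng.random() < p_mut:  # mutation: flip one random bit
                    child[rng.randrange(n_bits)] ^= 1
                children.append(child)
        pop = children[:pop_size]
        best = min(pop, key=cost)
    return decode(best, bounds), cost(best)


def toy_cv_mse(params):
    """Hypothetical stand-in for CVmse with its minimum at (3, 5)."""
    c, gamma = params
    return (c - 3.0) ** 2 + (gamma - 5.0) ** 2


# Assumed search ranges for C and gamma.
ga_params, ga_fit = ga_minimize(toy_cv_mse, bounds=[(0.1, 10.0), (0.01, 10.0)])
```

Because of the elitism step, the best fitness in this sketch never gets worse from one generation to the next.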
SVM and SSA
Sobol's sensitivity analysis method
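As a rough illustration of how first-order and total-order Sobol indices are commonly estimated, the sketch below uses a pick-and-freeze Monte Carlo scheme with two independent sample matrices (a Saltelli-type first-order estimator and a Jansen-type total-order estimator). It is not the authors' procedure: the function `toy_model`, the sample size, and the uniform [0, 1] input ranges are assumptions. In the study itself, the model would be the trained Cd predictor and the inputs would be h1/D, B/D, P/D, and Fr.

```python
import random


def sobol_indices(model, n_inputs, n_samples=4096, seed=0):
    """Estimate first-order and total-order Sobol indices for inputs ~ U(0, 1)."""
    rng = random.Random(seed)
    # Two independent sample matrices A and B.
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]

    all_f = fA + fB
    mean = sum(all_f) / len(all_f)
    var = sum((y - mean) ** 2 for y in all_f) / len(all_f)

    first, total = [], []
    for i in range(n_inputs):
        # A with column i taken from B: only input i changes between fA and fAB.
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First-order effect; centering fB by the mean reduces estimator noise.
        v_i = sum((yb - mean) * (yab - ya)
                  for ya, yb, yab in zip(fA, fB, fAB)) / n_samples
        # Total effect, including all interactions that involve input i.
        t_i = sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n_samples)
        first.append(v_i / var)
        total.append(t_i / var)
    return first, total


def toy_model(x):
    """Hypothetical additive model; the fourth input has no effect."""
    return 4.0 * x[0] + 2.0 * x[1] + 1.0 * x[2] + 0.0 * x[3]


first_order, total_order = sobol_indices(toy_model, n_inputs=4)
```

For a purely additive model such as `toy_model`, the first-order and total-order indices of each input coincide up to sampling noise. A total-order index that is much larger than the first-order index indicates that the input acts mainly through interactions with other inputs.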
Evaluation index
RESULTS AND DISCUSSION
Model comparison
In this study, 109 groups of experimental data were selected as the training set and the remaining 46 groups were used as the testing set. h1/D, B/D, P/D, and Fr were used as model inputs and Cd as the model output. The global optimization of the SVM hyperparameters C and γ was performed by three optimization algorithms, PSO, GA, and SSA; the specific parameter settings of each model are shown in Table 2, and the performance indexes of all models are shown in Tables 3 and 4. When the SVM model is used to calculate the Cd of the SCSW, the RMSE, MAPE, SD, and R were 0.047, 0.076, 0.073, and 0.897 in the training phase and 0.045, 0.072, 0.062, and 0.926 in the testing phase, respectively. The PSO-SVM, GA-SVM, and SSA-SVM are clearly superior to the SVM in every evaluation index in both the training and testing phases, indicating that all three optimization algorithms can effectively improve the performance of the SVM through global optimization search.
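The sketch below shows how several of the performance indexes reported here are commonly computed. The definitions are assumptions, since the evaluation formulas are not reproduced in this text: RMSE, MAPE, and the correlation coefficient R use their standard forms, SI is taken as RMSE divided by the mean observed value, and Bias as the mean of predicted minus observed. SD is omitted because its definition is not given. The sample values are hypothetical, not experimental data.

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, MAPE (%), R, SI, and Bias for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 / n * sum(abs(e) / abs(o) for e, o in zip(errors, observed))
    bias = sum(errors) / n    # assumed definition: mean(predicted - observed)
    si = rmse / mean_o        # assumed definition: scatter index

    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    return {"RMSE": rmse, "MAPE": mape, "R": r, "SI": si, "Bias": bias}


# Hypothetical Cd values used only to exercise the function.
cd_observed = [0.60, 0.65, 0.70, 0.75]
cd_predicted = [0.61, 0.64, 0.71, 0.74]
scores = evaluate(cd_observed, cd_predicted)
```

Computing the same indexes separately on the training and testing sets makes it possible to compare fitting accuracy with generalization ability.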
Table 2. Parameter settings of each model
Model | Parameter | Value | C | γ |
---|---|---|---|---|
PSO-SVM | Particle swarm size | 20 | 0.1 | 6.72 |
 | Number of iterations | 30 | | |
 | Inertia factor | 0.9 | | |
 | Acceleration constants | 2 | | |
 | Speed range | [−1,1] | | |
GA-SVM | Population size | 20 | 4.05 | 4.40 |
 | Number of iterations | 30 | | |
 | Crossover probability | 0.5 | | |
 | Mutation probability | 0.1 | | |
SSA-SVM | Number of sparrows | 20 | 0.1 | 4.33 |
 | Number of iterations | 30 | | |
 | Warning value ST | 0.6 | | |
 | Proportion of discoverers | 0.7 | | |
 | Proportion of detectors | 0.2 | | |
Table 3. Performance indexes of the models in the training phase
Model | RMSE | MAPE (%) | SD | R | SI | Bias |
---|---|---|---|---|---|---|
SVM | 0.047 | 0.076 | 0.073 | 0.897 | 0.071 | 0.0130 |
PSO-SVM | 0.021 | 0.053 | 0.043 | 0.961 | 0.031 | 0.0028 |
GA-SVM | 0.014 | 0.037 | 0.041 | 0.987 | 0.022 | 0.0008 |
SSA-SVM | 0.019 | 0.048 | 0.044 | 0.967 | 0.024 | 0.0031 |
Table 4. Performance indexes of the models in the testing phase
Model | RMSE | MAPE (%) | SD | R | SI | Bias |
---|---|---|---|---|---|---|
SVM | 0.045 | 0.072 | 0.062 | 0.926 | 0.069 | 0.0120 |
PSO-SVM | 0.017 | 0.016 | 0.046 | 0.953 | 0.026 | 0.0010 |
GA-SVM | 0.009 | 0.008 | 0.043 | 0.965 | 0.014 | 0.0004 |
SSA-SVM | 0.016 | 0.016 | 0.047 | 0.949 | 0.031 | 0.0008 |
Comparison with empirical equations
Quantitative analysis of parameters
Parameter sensitivity analysis
Analysis of discharge characteristics
CONCLUSION
To achieve accurate water measurement and reasonable distribution of water resources in small channels, the semi-circular labyrinth side weir is used as an efficient control structure with a large discharge capacity. In this study, PSO-SVM, GA-SVM, and SSA-SVM optimization models were developed based on SVM. Then, Sobol's method was introduced to calculate the sensitivity coefficients of the dimensionless parameters h1/D, B/D, P/D, and Fr with respect to Cd. This paper evaluates the effect of different factors on the discharge capacity of the SCSW. The parameter ranges corresponding to different discharge capacities are proposed, and the variation of Cd with the different parameters is analyzed. The following conclusions were drawn.
- (1) GA-SVM can be used as an efficient and high-accuracy prediction model for the Cd of the SCSW. In the training phase, R = 0.987, MAPE = 0.037%, RMSE = 0.014, SD = 0.041, SI = 0.022, and Bias = 0.0008; in the testing phase, 91.31% of the prediction errors were below 2%. The model has high generalization ability, stability, and prediction accuracy, and it effectively solves the problems of large computational complexity and difficult coefficient correction in traditional empirical models.
- (2) The quantitative analysis showed that the S1 of h1/D, B/D, P/D, and Fr were 0.35, 0.07, 0.13, and 0.02, and the Si were 0.63, 0.25, 0.30, and 0.32, respectively. h1/D was the most important parameter affecting Cd, the effect of Fr on Cd after interacting with other parameters was second only to that of h1/D, and Cd decreased as h1/D increased. The larger D was, the more pronounced the decreasing trend of Cd became. As the diameter of the side weir increases, the lateral flow will increase significantly in the subcritical flow regime.
- (3) When h1/D < 0.24 and either 0.48 < Fr < 0.58 or 0.67 < Fr < 0.72, the Cd of the SCSW is relatively large. Meanwhile, when Fr < 0.50 and 0.40 < h1/D < 0.47, the Cd of the SCSW is relatively small. This can provide an important reference basis for the application of the SCSW in practical engineering.
In addition, in this study, the width of the main channel is constant. Therefore, it is necessary to further explore the influence of the width change of the main channel on the discharge coefficient of the SCSW.
FUNDING
This study was partly supported by the National Natural Science Foundation-sponsored project (grant 52079107), the Natural Science Basic Research Project of Shaanxi Province (grant 2023-JC-QN-0395), and the General Special Scientific Research Project of Shaanxi Province (grant 22JK0470).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.