An ISaDE algorithm combined with support vector regression for estimating discharge coefficient of W-planform weirs

Various shapes of weirs, such as rectangular, trapezoidal, circular, and triangular plan forms, are used to adjust and measure the flow rate in irrigation networks. The discharge coefficient (Cd) of weirs, as the key hydraulic parameter, involves the combined effects of the geometric and hydraulic parameters and is used to compute the flow rate over the weirs. In this study, a hybrid ISaDE-SVR method is proposed to estimate the Cd of sharp-crested W-planform weirs. ISaDE is a high-performance optimization algorithm for estimating the nonlinear parameters in different phenomena. The ISaDE algorithm is used to improve the performance of SVR by finding optimal values for the SVR's parameters. To test and validate the proposed model, the experimental datasets of Kumar et al. and Ghodsian were utilized. Six different input scenarios are presented to estimate the Cd. Based on the modeling results, the proposed hybrid method estimates the Cd in terms of H/P, Lw/Wmc, and Lc/Wc. For the superior method, R², RMSE, MAPE, and δ are obtained as 0.982, 0.006, 0.612, and 0.843, respectively. The improvement in comparison with GMDH, ANFIS and SVR is 3.6%, 1.2% and 1.5%, respectively, in terms of R².


INTRODUCTION
Weirs are among the most essential components of water transmission networks, due to the necessity of determining the flow rate in channels and the allocated amount for consumers. These are essential hydraulic structures for controlling the flow and water level, which can be utilized to increase the height of the water surface level and thereby provide the required water heights to divert the flow to lateral channels.
Also, these structures are utilized as flow-measuring devices in crucial applications in rivers and open channels for their safe operation (Emami et al. ).
The length and shape of the weir crest are among the effective parameters in the flow rates over the weir, and numerous studies have been carried out on the effect of geometric and hydraulic parameters of weirs on the discharge coefficient (Cd). The use of non-linear planar weirs with a specific shape such as triangular, trapezoidal, piano key, circular and parabolic, which are known as labyrinth weirs, is one of the effective ways to increase the flow over a specified width.
Some notable approaches have recently been proposed to predict the Cd. The support vector machine (SVM) is a machine-learning technique that is widely used for classification and regression by means of a separating hyperplane. The benefits of SVM, especially its high classification accuracy when its parameters are tuned, together with its wide range of real-world applications, have led to its ever-increasing popularity in the last decade. The artificial neural network (ANN), a well-known computing technique, is inspired by the biological neural system and is used to solve complicated problems. ANNs have self-learning capabilities with task-specific rules, and they reach better results as more data become available. Genetic programming (GP) is a subset of machine learning and, like all evolutionary algorithms (EAs), works by an iterative process. EAs are generally used to discover solutions to challenging real-world problems. GP starts by setting a goal function and then uses the Darwinian principle of evolution to generate a set of candidate solutions. The results obtained by Karami et al. () showed that the SVM method was superior to the other two methods with RMSE = 0.0059. Haghiabi et al. () obtained the Cd of the triangular labyrinth weir by using the ANFIS and multi-layer perceptron (MLP) models. An adaptive neuro-fuzzy inference system (ANFIS) is based upon a Takagi-Sugeno fuzzy inference system; this model was first introduced in the early 1990s.
The remaining parts are organized as follows. Problem Definition defines the problem that this paper is focused on. Methodology describes the proposed methodology. The experiments are given in the fourth section. Discussion of results is provided in the fifth section. Finally, the sixth section concludes the paper and lists some interesting future directions.

PROBLEM DEFINITION
The purpose of determining the Cd is to investigate the performance of the weir. The Cd of labyrinth weirs is calculated using Equation (1) (Bagheri & Heidarpour ). The discharge coefficient of the W-planform weir and its effective parameters are related by Equation (2), where Wmc is the main channel width, Wc is the width of one cycle of the weir, Lw is the total length of the weir, Lc is the length of one cycle, g is the acceleration due to gravity, σ is the surface tension, μ is the dynamic viscosity of the fluid and ρ is the specific mass. Dimensional analysis of the discharge coefficient yields the functional relationship given as Equation (3).

METHODOLOGY
In the dataset D, $O_i$ is the ith data object and $y_i$ is the label that is assigned to $O_i$. In the SVR algorithm, each data object $O_i \in D$ is considered as a point in m-dimensional space. The goal is to create a prediction model from training data by finding a hyperplane that differentiates the data objects into separate groups. This hyperplane is calculated based on a few data points, known as support vectors. In other words, SVR aims to maximize the minimum distance of data points from a separator hyperplane by solving an optimization problem. Kernel functions that can be used in SVR include the radial basis function (RBF) and the polynomial basis function. In this paper, the RBF kernel function is used in the SVR due to its high performance and easy configuration compared with other kernel functions.
The precise tuning of C, γ and ε is an important task for increasing the prediction performance of the SVR with the RBF kernel. To optimize these parameters, we introduce the ISaDE algorithm, which is described in the next section.
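For orientation, the roles of C, γ and ε can be read from the textbook ε-insensitive SVR formulation with an RBF kernel, sketched below. This is the standard form and is assumed here rather than taken from the paper; the weight vector $w$, bias $b$, feature map $\phi$ and slack variables $\xi_i, \xi_i^*$ are notation introduced only for this illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \\
\text{subject to}\quad & y_i - \left(w^{\top}\phi(O_i) + b\right) \le \varepsilon + \xi_i, \\
& \left(w^{\top}\phi(O_i) + b\right) - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\quad \xi_i^{*} \ge 0,\qquad i = 1,\dots,n, \\[4pt]
K(O_i, O_j) &= \exp\!\left(-\gamma\,\lVert O_i - O_j\rVert^{2}\right)
\end{aligned}
```

In this form, C weights the penalty on deviations larger than ε, ε sets the width of the tolerance tube around the regression function, and γ controls how quickly the RBF kernel decays with distance.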

Self-adaptive differential evolution algorithm
Self-adaptive differential evolution (SaDE) is a well-known iterative optimization algorithm (Brest et al. ). SaDE is an evolutionary, multi-agent algorithm in which each agent updates its position using three operators: mutation, crossover, and selection. The flowchart of the SaDE algorithm is shown in Figure 2.
For an m-dimensional optimization problem, the initial population is an $N_{Pop} \times m$ matrix defined as follows:
$$P = [X_1, X_2, \dots, X_{N_{Pop}}]^{T}$$
where each individual $X_i \in P$ is an m-dimensional vector defined as follows:
$$X_i = [x_{i,1}, x_{i,2}, \dots, x_{i,m}]$$
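The population layout described above can be sketched in a few lines of Python. Uniform random initialization within per-dimension bounds is an assumption made for this illustration, and the function name `init_population` and the search range are hypothetical.

```python
import random

def init_population(n_pop, bounds, rng):
    """Build an n_pop x m population; row i is the individual X_i.

    Assumption: each coordinate is drawn uniformly from its (low, high) bound.
    """
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_pop)]

bounds = [(-5.12, 5.12)] * 3  # illustrative search range with m = 3
population = init_population(6, bounds, random.Random(0))
```

Each row of `population` corresponds to one individual and each column to one decision variable.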

Mutation
In the mutation step, a mutant individual $M_i$ is produced for $X_i$ as follows:
$$M_i = X_{r_1} + F\,(X_{r_2} - X_{r_3})$$
in which $r_1, r_2, r_3 \in [1, N_{Pop}]$ are random indices that identify the individuals taking part in the mutation, and $F \in [0, 1]$ is a real number that adjusts the amplification of the difference between $X_{r_2}$ and $X_{r_3}$. In standard SaDE, $F_i$ is defined as
$$F_i^{k+1} = \begin{cases} F_{lo} + \zeta_1 F_{hi}, & \text{if } \zeta_2 < \Delta_1 \\ F_i^{k}, & \text{otherwise} \end{cases}$$
where $\zeta_i \in [0, 1]$ is a uniform random variable, and $\Delta_1$ is a probability value adjusting the control factor F. In the simulations, $F_{lo} = 0.1$, $F_{hi} = 0.9$ and $\Delta_1 = 0.1$. F is a random value in the range [0, 1].
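The mutation step can be illustrated with a minimal Python sketch. It assumes the classical DE/rand/1 rule and the self-adaptive update of F from Brest et al., with the parameter values reported in the text; drawing three distinct indices that differ from i is a common convention assumed here, and the function names are hypothetical.

```python
import random

F_LO, F_HI, DELTA_1 = 0.1, 0.9, 0.1  # values reported for the simulations

def adapt_f(f_old, rng):
    """With probability DELTA_1 resample F as F_LO + zeta * F_HI, else keep it."""
    if rng.random() < DELTA_1:
        return F_LO + rng.random() * F_HI
    return f_old

def mutate(population, i, f, rng):
    """Return the mutant X_r1 + f * (X_r2 - X_r3) for individual i.

    Assumption: r1, r2, r3 are distinct and different from i.
    """
    candidates = [j for j in range(len(population)) if j != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]

rng = random.Random(1)
population = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
f_i = adapt_f(0.5, rng)
mutant = mutate(population, 0, f_i, rng)
```

The population needs at least four individuals so that three distinct partners can be drawn for each target.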

Crossover
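In the crossover phase, for each individual $X_i$, the algorithm computes a trial vector $T_i$, defined as follows:
$$T_i = [t_{i,1}, t_{i,2}, \dots, t_{i,m}]$$
where $t_{i,j} \in T_i$ is defined as
$$t_{i,j} = \begin{cases} m_{i,j}, & \text{if } r(j) \le CR \text{ or } j = rn(i) \\ x_{i,j}, & \text{otherwise} \end{cases}$$
in which $r(j) \in [0, 1]$ is a uniform random variable, and $rn(i) \in \{1, \dots, m\}$ is a random index that guarantees that $T_i$ gets at least one element from vector $M_i$. $CR \in [0, 1]$ indicates the crossover constant, which is defined as
$$CR_i^{k+1} = \begin{cases} \zeta_3, & \text{if } \zeta_4 < \Delta_2 \\ CR_i^{k}, & \text{otherwise} \end{cases}$$
where $\zeta_i$ is a uniform random variable in the interval [0, 1], and $\Delta_2$ is a probability value adjusting the crossover constant CR. In the simulations, $\Delta_2$ is set to 0.1.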
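A minimal Python sketch of the crossover phase follows. It assumes the usual binomial crossover of differential evolution with one forced index, and a self-adaptive CR update in the style of Brest et al.; the function names `adapt_cr` and `crossover` are hypothetical.

```python
import random

DELTA_2 = 0.1  # value reported for the simulations

def adapt_cr(cr_old, rng):
    """With probability DELTA_2 resample CR uniformly in [0, 1], else keep it."""
    if rng.random() < DELTA_2:
        return rng.random()
    return cr_old

def crossover(x, mutant, cr, rng):
    """Build the trial vector: take mutant[j] when r(j) <= cr or j is the forced index."""
    forced = rng.randrange(len(x))  # guarantees at least one element from the mutant
    return [m if (rng.random() <= cr or j == forced) else xi
            for j, (xi, m) in enumerate(zip(x, mutant))]

trial_all = crossover([0, 0, 0], [1, 1, 1], 1.0, random.Random(0))
trial_one = crossover([0, 0, 0], [1, 1, 1], 0.0, random.Random(1))
```

With CR = 1 the trial vector equals the mutant, while with CR = 0 only the forced position is inherited from it.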

Selection
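To obtain the next generation, each individual $X_i$ is updated as follows:
$$X_i^{k+1} = \begin{cases} T_i, & \text{if } f(T_i) < f(X_i^{k}) \\ X_i^{k}, & \text{otherwise} \end{cases}$$
where f denotes the fitness function to be minimized. That is, if the trial vector $T_i$ obtains a better fitness than $X_i$, then $X_i$ is set to $T_i$; otherwise the old value $X_i$ is maintained.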
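The selection rule amounts to a greedy one-to-one replacement, which the following Python sketch illustrates. Minimization is assumed, and the `sphere` cost function is only a stand-in for the actual fitness.

```python
def select(x, trial, fitness):
    """Keep the trial vector only if its fitness is strictly lower (better)."""
    return trial if fitness(trial) < fitness(x) else x

def sphere(x):
    """Illustrative cost function: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

survivor = select([1.0, 1.0], [0.5, 0.5], sphere)  # the trial is better and replaces x
```

Because a parent is only ever replaced by a better trial vector, the best fitness in the population never worsens from one generation to the next.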

Improved SaDE (ISaDE)
As mutation controls the exploration of the algorithm, it is the core operator of SaDE. Improving this operator increases the search capability of the algorithm. In this study, an improved version of the mutation operator is proposed. The new mutation operator is defined in terms of k, the current iteration, I, the maximum number of iterations, and $c_i$, the ith chaotic variable generated based on the Logistic chaotic map. The reason for using a Logistic map is that it shows good chaotic characteristics, represents better randomness than other maps, and helps the algorithm to explore points that are scattered around the search space as far as possible. The variable $c_i$ is computed as follows:
$$c_i^{k+1} = a\,c_i^{k}\,(1 - c_i^{k})$$
where a is a scalar value, $c_i^{k}$ is the kth chaotic number in the chaotic sequence, and k is the index of the chaotic sequence.
The initial value of $c_i^{k}$ is in the range (0, 1), provided that $c_i^{k} \notin \{0, 0.25, 0.5, 0.75, 1\}$. In this work, a = 4 is used. Equation (14) increases the exploration power of the algorithm such that different points of the search space are explored. The crossover and selection operators are left unchanged. Figure 3 illustrates the flowchart of the ISaDE algorithm.
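The chaotic sequence can be generated with a short Python sketch of the Logistic map with a = 4. Only the sequence generation is shown; how $c_i$ enters the new mutation operator is not reproduced here, and the function name `logistic_sequence` is hypothetical.

```python
def logistic_sequence(c0, length, a=4.0):
    """Return `length` values of the Logistic map c_{k+1} = a * c_k * (1 - c_k).

    The seed must lie in (0, 1) and avoid 0.25, 0.5 and 0.75, which lead to
    fixed points or to zero and would make the sequence degenerate.
    """
    if not 0.0 < c0 < 1.0 or c0 in (0.25, 0.5, 0.75):
        raise ValueError("seed must be in (0, 1) and not 0.25, 0.5 or 0.75")
    sequence = [c0]
    for _ in range(length - 1):
        sequence.append(a * sequence[-1] * (1.0 - sequence[-1]))
    return sequence

chaos = logistic_sequence(0.3, 5)  # starts 0.3, 0.84, 0.5376, ...
```

For example, a seed of 0.5 maps to 1 and then to 0, after which the sequence stays at 0, which is why such seeds are excluded.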

The proposed ISaDE-SVR method
As mentioned before, the prediction performance of the SVR with the RBF kernel is highly dependent on the three parameters C, γ, and ε. In other words, selecting optimal values for these parameters improves the speed of training and increases the prediction performance as well. To find the optimal values of the parameters and the optimum feature selection, we used the ISaDE optimization algorithm. Figure 4 shows the proposed ISaDE-SVR approach for SVR parameter selection and feature extraction. This algorithm provides a better training effect and improves the prediction accuracy.
A detailed description of the ISaDE-SVR follows.

Generate initial population
Since the feature subset selection and the SVR's parameter optimization should be addressed simultaneously, each candidate solution in the population is composed of a feature permutation and a parameter combination: the features $f_j$ and three floating-point numbers $p_1$, $p_2$, $p_3$, which are candidate values for the three parameters C, γ, and ε. These values are generated randomly. The range of values for $p_1$ is [0, 100], and for $p_2$ and $p_3$ it is [0, 1]. Each feature $f_j$ is a binary variable whose value is 1 when the feature is considered in the model and 0 when it is ignored.
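The candidate encoding can be illustrated with the Python sketch below. Placing the three parameters before the feature mask is an assumption made for this example, since the ordering is not recoverable from the text, and the names `random_candidate` and `decode` are hypothetical.

```python
import random

def random_candidate(n_features, rng):
    """Draw p1 in [0, 100], p2 and p3 in [0, 1], then a binary feature mask.

    Assumption: parameters come first, features after; the source only states
    that both parts are stored in the same candidate solution.
    """
    params = [rng.uniform(0, 100), rng.uniform(0, 1), rng.uniform(0, 1)]
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    return params + mask

def decode(candidate):
    """Split a candidate into SVR parameters and the indices of selected features."""
    c, gamma, epsilon = candidate[:3]
    selected = [j for j, f in enumerate(candidate[3:]) if f == 1]
    return {"C": c, "gamma": gamma, "epsilon": epsilon, "features": selected}

candidate = random_candidate(4, random.Random(0))
settings = decode([10.0, 0.5, 0.1, 1, 0, 1])
```

Here `settings` selects the first and third features and sets C = 10, γ = 0.5 and ε = 0.1.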

Computing the fitness function
The fitness of each solution is evaluated using the mean squared error (MSE) of five-fold cross-validation for the SVR. The fitness function is defined as
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{X}_i\right)^{2}$$
where $X_i$ is the observed value, $\hat{X}_i$ is the predicted value, and n is the total number of data in the dataset.
An individual with a smaller MSE is preferable. The cross-validation strategy is used to prevent under-fitting or over-fitting in the ISaDE-SVR approach. In n-fold cross-validation, the training set is divided into n equal subsets. At each iteration, one of the subsets is taken in turn as the testing set and the remaining n − 1 subsets are used as the training set in the SVR method; this procedure is repeated until each subset has been used for validation once.
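The cross-validated fitness can be sketched in plain Python as follows. To keep the example self-contained, the SVR is replaced by a hypothetical stand-in model `fit_mean` that always predicts the training mean; the interleaved fold assignment is also an assumption made for illustration.

```python
def k_fold_mse(xs, ys, fit, k=5):
    """Mean squared error over k folds; `fit(train_x, train_y)` returns a predictor."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved, near-equal folds
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        predict = fit(train_x, train_y)  # train on the other k - 1 folds
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return total / n

def fit_mean(train_x, train_y):
    """Stand-in for the SVR: always predicts the mean of the training targets."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

xs = list(range(10))
ys = [0.0, 1.0] * 5
fitness = k_fold_mse(xs, ys, fit_mean, k=5)
```

In an actual run, `fit` would train an SVR with the C, γ and ε values and the feature subset carried by the candidate solution being evaluated.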
Until the termination conditions are met, three operators including mutation, crossover and selection are iteratively carried out to update the population.

Evaluation of ISaDE algorithm
To evaluate the performance of the proposed ISaDE algorithm, 14 test problems were used. These functions include nine unimodal and four multimodal functions (Table 1). One of the algorithms used for comparison is modeled on salps in the oceans; it updates the positions of salps using their local and global knowledge. Table 2 shows the multiproblem-based pairwise comparison among the algorithms using the Wilcoxon signed-rank test (Derrac et al. ).
The comparison is performed based on the mean cost values obtained over 30 simulation runs. This test is considered to statistically evaluate the performance of the algorithm when solving several benchmark functions.
The results testify that the ISaDE is more successful than its counterparts in converging to the global optimum of the benchmark functions, with a significance value α = 0.05. Overall, the ISaDE outperforms the other algorithms in terms of convergence rate and solution quality.
Due to the excellent performance of the ISaDE in solving numerical optimization problems, we used it to predict the discharge coefficient of labyrinth weirs. We split the dataset (Table 3) into two sets: a training set and a test set. A total of 180 records are randomly selected for the training set, and the remainder form the test set.

Input parameters
To investigate the most appropriate input parameters, six different input combinations were evaluated (Table 4). To select the most effective input parameters, all input parameters are first considered for the development of the ISaDE-SVR model; then one of the input parameters is removed from the inputs and the model is re-trained and tested with the same structure. In the evaluation criteria, $p_i$ is the predicted Cd, $e_i$ is the observed Cd, and $\bar{p}$ and $\bar{e}$ are the average predicted and observed Cd values, respectively.
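Three of the evaluation criteria named in the paper can be sketched in Python using their common textbook definitions, which are assumed here because the original formulas are not reproduced. R² is taken as the squared Pearson correlation between predicted and observed values, and the δ index is omitted since its definition cannot be recovered from the text.

```python
import math

def rmse(pred, obs):
    """Root mean squared error between predicted and observed values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, obs)) / len(obs))

def mape(pred, obs):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    return 100.0 / len(obs) * sum(abs((e - p) / e) for p, e in zip(pred, obs))

def r_squared(pred, obs):
    """Squared Pearson correlation between predicted and observed values."""
    p_bar = sum(pred) / len(pred)
    e_bar = sum(obs) / len(obs)
    cov = sum((e - e_bar) * (p - p_bar) for p, e in zip(pred, obs))
    var_e = sum((e - e_bar) ** 2 for e in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_e * var_p)

observed = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
scores = (rmse(predicted, observed), mape(predicted, observed), r_squared(predicted, observed))
```

The example shows why several criteria are reported together: predictions that are exactly twice the observations are perfectly correlated (R² = 1) yet have a large RMSE and a MAPE of 100%.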

Investigating the effect of input scenarios
The performance of the proposed hybrid method was evaluated with six different input scenarios to obtain the Cd of the W-planform weir. According to Table 5, the model with the inputs (H/P, Lw/Wmc, Lc/Wc) has the best results, with R² = 0.982 and RMSE = 0.006 in the testing stage. The Ω index is plotted for the different input models in Figure 10.
The maximum, minimum, and average values of the Ω index for the six models are presented in Table 8. Among the six models, the largest value of the Ω index is obtained for model Π5. Therefore, models Π2, Π4, and Π6 predict the discharge coefficient close to the observed values compared with the other combinations. The H/P, Lw/Wmc, and Lc/Wc parameters were found to be the most effective parameters in the estimation of Cd. Figure 11 and Table 9

CONCLUSION
In this study, a novel hybrid ISaDE method, namely ISaDE-SVR, was used to predict the Cd of W-planform weirs. For this purpose, six models were introduced by combining different input parameters for finding the discharge coefficient. However, the performance of ISaDE-SVR is far from ideal. Therefore, an interesting direction for future work is to combine the ISaDE algorithm with artificial neural networks and neuro-fuzzy models to improve the performance of Cd prediction. Another research direction is to apply the ISaDE-SVR method to other optimization problems to evaluate its potential and disadvantages.