## Abstract

In the present research, three different data-driven models (DDMs) are developed to predict the discharge coefficient of streamlined weirs (*C*_{dstw}). Machine-learning methods (MLMs) and intelligent optimization models (IOMs), namely Random Forest (RF), Adaptive Neuro-Fuzzy Inference System (ANFIS), and gene expression programming (GEP), are employed for the prediction of *C*_{dstw}. To identify the input variables for these DDMs, the most effective of the potential parameters on *C*_{dstw}, including geometric features of streamlined weirs, relative eccentricity (*λ*), downstream slope angle (*β*), and water head over the crest of the weir (*h*_{1}), are determined by applying the Buckingham π-theorem and cosine amplitude analyses. By changing the architectures and fundamental parameters of these approaches, many scenarios are defined to obtain the best estimation results. According to statistical metrics and scatter plots, the GEP model is identified as the superior method for estimating *C*_{dstw} with high performance and accuracy. It yields an *R*^{2} of 0.97, a Total Grade (TG) of 20, an RMSE of 0.032, and an MAE of 0.024. In addition, the values of *C*_{dstw} given by the mathematical equation that GEP generates in the best scenario are compared with the corresponding measured ones, and the differences are within 0–10%.

## HIGHLIGHT

- The discharge coefficient of streamlined weirs (*C*_{dstw}) was analyzed using intelligent models.
- Random forest (RF), adaptive neuro-fuzzy inference system (ANFIS), and gene expression programming (GEP) methods are used and compared for predicting *C*_{dstw}.
- The GEP model has been identified as a superior method for estimating *C*_{dstw} with high performance and accuracy.

## NOMENCLATURE

- *C*_{dstw}: discharge coefficient of streamlined weirs (–)
- RF: random forest
- ANFIS: adaptive neuro-fuzzy inference system
- GEP: gene expression programming
- ANNs: artificial neural networks
- RMSE: root mean square error (m)
- MAE: mean absolute error (–)
- MBE: mean bias error (–)
- CC: Lin's concordance correlation coefficient
- TG: total grade
- *R*^{2}: determination coefficient (–)
- *L*_{w}: length of the weir (cm)
- *W*_{w}: height of the weir (cm)
- *b*_{w}: crest width
- *H*_{1}: total head over the crest of the weir (cm)
- *h*_{1}: water head over the crest of the weir (cm)
- *Y*_{1}: upstream flow depth (cm)
- *Y*_{2}: flow depth over the weir crest (cm)
- *Y*_{3}: downstream flow depth (cm)
- *Q*: discharge (L/s)
- *λ*: relative eccentricity (–)
- *β*: downstream slope angle
- *K*: number of discrete trees in random forest
- *N*: number of datasets
- *x*_{i}: observed values of *C*_{dstw}
- *y*_{i}: predicted values of *C*_{dstw}
- *μ*_{x}: average of observed values of *C*_{dstw}
- *μ*_{y}: average of estimated values of *C*_{dstw}
- *σ*_{x}: standard deviation of observed values of *C*_{dstw}
- *σ*_{y}: standard deviation of estimated values of *C*_{dstw}

## INTRODUCTION

Weirs diffuse rainwater and thus reduce erosion. They also slow the flow so that water infiltrates into the soil, which helps to build groundwater reserves available for agricultural use. Precise control of the water released over weirs is an effective method of precision irrigation (Kröger *et al.* 2011). Weirs are among the most common hydraulic structures worldwide and are also widely used to improve and develop artificial irrigation in barren valley areas. They are normally divided into three main groups, namely short-crested weirs, sharp-crested weirs, and broad-crested weirs. Short-crested weirs are further classified into three types, namely streamlined weirs, circular-crested weirs, and overflow (Ogee) weirs (Bos 1976). Streamlined weirs are a specific kind of short-crested weir inspired by the airfoil concept. Compared with other kinds of weirs, they offer a more stable free surface with fewer fluctuations and, in particular, a high discharge coefficient (*C*_{d}).

The discharge coefficient (*C*_{d}) accounts for the energy losses that are not considered in the derivation of the head–discharge relationship, such as turbulence conditions arising from surface tension, viscous effects, and three-dimensional flow structures ahead of the weir plate (Aydin *et al.* 2011). An accurate value of *C*_{d} has a significant impact on assessing the discharge over weirs. Therefore, it is essential to compute *C*_{d} correctly.

Numerous investigations have been performed on different forms of weirs, most of them focusing on *C*_{d}. In this respect, many experimentally based formulations were proposed early on to determine *C*_{d} in open channels. Rajaratnam & Muralidhar (1971) performed a variety of precise measurements of velocity and pressure fields in the curved flow near the crest of a rectangular sharp-crested weir. They remarked that the measurements would be useful for improving the assumptions made for curved open-channel flow. Saadatnejadgharahassanlou *et al.* (2017, 2020) investigated experimentally and numerically the hydraulic characteristics of a special form of sharp-crested V-notch weir (SCVW). According to their results, the SCVW outperformed normal weirs. Salmasi (2018) evaluated the impacts of downstream submergence and apron elevation on *C*_{d} of Ogee weirs. The results demonstrated that the head–discharge relationship was fairly independent of the downstream submergence when submergence levels were smaller than 0.8. Furthermore, *C*_{d} depended on the spillway crest and the vertical distance between the weir height and the downstream apron. Haghiabi *et al.* (2018) inspected the hydraulic characteristics of a circular-crested stepped weir. They affirmed that this kind of weir might dissipate up to 90% of the flow energy. Abdollahi *et al.* (2017) simulated the flow field around labyrinth side weirs with guide vanes using OpenFOAM software. They reported that the peak *C*_{d} was attained when the vane plates were placed perpendicular to the flow direction at the downstream end of the weirs across the main channel. Sutopo *et al.* (2022) examined the impact of spillway width on flow elevation at the weir crest on the basis of the design flood discharge for the Probable Maximum Flood (PMF) return period, using hydrological flood routing at the Cacaban Dam (Indonesia). The outcomes confirmed that increasing the spillway crest width lowered the flow elevation at the spillway crest and consequently increased the outflow discharge. Yamini *et al.* (2020) evaluated the effect of hydrodynamic flow on flip bucket spillways for flood control in large dam reservoirs. They presented an equation to determine the pressure distribution, particularly the position of maximum dynamic pressure on the bed of flip buckets with a large radius, as a function of bucket geometry and flow depth.

Rao & Rao (1973) examined the performance of streamlined weirs (termed hydrofoil weirs) in both submerged and free overflow conditions. The experiments confirmed a higher *C*_{d} than for other kinds of weirs. Bagheri & Kabiri-Samani (2020a, 2020b) performed extensive experimental and numerical work to evaluate the characteristics of streamlined weirs. Experimental results for steady flow showed that the upstream flow heads on streamlined weirs with different relative eccentricities were nearly constant, which indicated an almost fixed *C*_{d} as the relative eccentricity changed.

The literature review shows that most researchers proposed nonlinear mathematical equations to compute *C*_{d} approximately. In these equations, *C*_{d} was expressed in terms of independent variables. Because of the nonlinearity involved, the recommended equations for *C*_{d} typically have definite limitations. Given the high variability of *C*_{d}, the limited ability of empirical formulas, and the importance of *C*_{d}, researchers have turned to nonlinear schemes such as machine-learning models (MLMs) and intelligent optimization models (IOMs). MLMs and IOMs are known as powerful alternative techniques for solving ill-defined engineering problems and are particularly suited to capturing the complex, nonlinear behavior of *C*_{d}.

Among artificial intelligence (AI) algorithms, the random forest (RF) algorithm, the adaptive neuro-fuzzy inference system (ANFIS), and gene expression programming (GEP) have a particular status. The RF algorithm was initially suggested by Breiman (1996) and is considered an excellent method with a simple and flexible structure that is cost-effective to calibrate and accurate in forecasting. It is an advanced form of the decision tree (DT) algorithm built on ensemble concepts, which uses numerous DTs to map relationships among highly nonlinear variables in big datasets and thereby solve various complicated engineering problems (Breiman 2001). GEP is a biologically inspired IOM that is well able to model parameters with nonlinear relationships. It has been widely employed for forecasting *C*_{d} of diverse weirs. The fuzzy logic (FL) method is particularly significant in modeling and controlling highly intricate nonlinear systems (Zadeh 1993). ANFIS, presented by Jang (1993), is a combination of fuzzy systems and artificial neural networks. It is a multilayer feed-forward network in which each neuron performs a specific function on the received signals. Square and circle node symbols are used to represent different properties of adaptive learning. To achieve the desired input–output mapping, the adaptive parameters are updated on the basis of the hybrid learning rule, which combines back-propagation gradient descent and the least-squares method (Jang 1993; Hanbay *et al.* 2009).

Recently, different AI methods have been developed to predict the discharge coefficient of diverse weirs, such as labyrinth weirs (Norouzi *et al.* 2019; Zounemat-Kermani *et al.* 2019), gated piano key weirs (Akbari *et al.* 2019), side weirs on converging channels (Zarei *et al.* 2020), oblique weirs (Norouzi *et al.* 2020), rectangular and suppressed broad-crested weirs (Nourani *et al.* 2021), and SCVW (Gharehbaghi & Ghasemlounia 2022). Salmasi *et al.* (2013) examined *C*_{d} of the compound rectangular broad-crested weir (BCW) using AI approaches. The results confirmed that GEP was more precise than the other AI methods. Roushangar *et al.* (2018) computed *C*_{d} of stepped spillways under skimming and nappe flow regimes. The results affirmed that GEP had strong potential for modeling *C*_{d} from data acquired from physical models. Salazar & Crookston (2018, 2019) estimated *C*_{d} of arced labyrinth weirs using various MLMs with input variables including the head over the crest, the angle of the cycle arc, and the angle of the cycle sidewall. They fed linearity to their models and reported the superiority of RF over the other applied methods for forecasting *C*_{d} in that application area. Kumar *et al.* (2020) and Aein *et al.* (2020) appraised *C*_{d} of the combined weir-gate and the piano key weir, respectively, using several MLMs under different flow conditions. They declared that the best agreement between measured and forecasted values of *C*_{d} was obtained by the RF method. Chen *et al.* (2022) developed different traditional and hybrid machine-learning–deep-learning (ML-DL) algorithms to forecast the discharge coefficient of streamlined weirs (*C*_{dstw}). The results confirmed that the proposed three-layer hierarchical DL algorithm, comprising a convolutional layer followed by two gated recurrent unit (GRU) layers and hybridized with the linear regression (LR) method (i.e., LR-CGRU), markedly outperformed the algebraic equations presented by Bagheri & Kabiri-Samani (2020a) and Carollo & Ferro (2021).

Although computational fluid dynamics (CFD) has received considerable attention from both industry and academia for estimating variables in fluid domains, it suffers from computationally demanding processes and requires a deep theoretical understanding of fluid mechanics (Gharehbaghi 2016; Saadatnejadgharahassanlou *et al.* 2017; Dasineh *et al.* 2021; Gharehbaghi & Ghasemlounia 2022). In contrast to overflow and circular-crested weirs, investigations of flow features over streamlined weirs are scarce, and most characteristics of streamlined weirs are still unknown. To overcome the restrictions of empirical relationships and CFD models concerning the geometric and hydraulic parameters for the discharge coefficient based on experimental or hydraulic models, in the present research, *C*_{dstw} in steady, aerated, and free overflow conditions in an open channel is predicted using RF, ANFIS, and GEP approaches. To this end, by combining several geometric and hydraulic parameters that affect the hydraulic operation of streamlined weirs and by tuning the structures and key hyperparameters of these methods, several scenarios are defined. All key hyperparameters are chosen via a trial-and-error procedure to achieve the ideal structure of each method. In this regard, 120 observations are used in the mentioned approaches to evaluate *C*_{dstw} with regard to the dimensionless parameters that affect its estimation. The main contributions of this work are as follows:

- To identify the most effective variables on *C*_{dstw} among a list of potential geometric and hydraulic parameters using preprocessing methods.
- To develop suitable AI methods to compute *C*_{dstw} in steady, aerated, and free overflow situations in an open channel using the most effective variables specified.
- To determine optimal hyperparameter values and model architectures via the algorithm-tuning process for better configuration and to decrease the effect of underfitting or overfitting problems.
- To compare the developed models against the experimental results and identify the optimum method through statistical evaluation metrics.

The remainder of the paper is organized as follows. Section 2 presents the experimental framework and measuring process. Section 3 presents the theoretical approach for the head–discharge equation for short-crested weirs and the Joukowsky transform function in streamlined weirs. Section 4 presents the application of dimensional analysis in finding potential parameters on *C*_{dstw}. Section 5 presents a sensitivity analysis to pick the most significant predictor variables on *C*_{dstw} for the proposed data-driven models (DDMs). Section 6 presents an overview of the DDMs developed to estimate *C*_{dstw}. Section 7 presents the performance evaluation metrics used to compare the models' performance in estimating *C*_{dstw}. Section 8 presents the statistical and graphical validation of the developed DDMs. Section 9 presents a performance comparison of the developed methods using a model scoring procedure. The last section concludes the current research.

## EXPERIMENTAL FRAMEWORK

*b*_{w}) of 0.4 m, and were placed at 4.5 m downstream from the channel inlet. Free steady flow conditions were applied so that the downstream flow was supercritical in all tests. Features of experimental models are specified in Figure 1. The geometric features and the downstream slope angle of the weirs, *β*, are computed using Joukowsky transform function (Bagheri & Kabiri-Samani 2020a).

In Figure 1, *Y*_{1}, *Y*_{2}, and *Y*_{3} are the upstream flow depth, the flow depth over the weir crest, and the downstream flow depth (cm), respectively. *Y*_{1} and *Y*_{3} are gauged at the sections nearest to the structure where the curvature of the streamlines is negligible. *L*_{w} and *W*_{w} are the length and height of the weir (cm), respectively. Moreover, *h*_{1} and *H*_{1} are the water head and total head over the crest of the weir (cm), respectively. All discharges (*Q*) were gauged using an electromagnetic flowmeter in the measuring basin with a precision of ±0.5%. The dimensions of the streamlined weirs and the range of applied observation data are provided in Table 1.

*C*_{dstw} | *h*_{1} (cm) | *β* (°) | *Q* (L/s) | *λ* (–) | *L*_{w} (cm) |
---|---|---|---|---|---|
0.87–1.31 | 4–21.6 | 0, 30, 60 | 5.2–77.7 | 0.25, 0.5, 1, 1.25 | 40.2, 53.3, 64.1, 71.2 |

Interested readers can refer to Bagheri & Kabiri-Samani (2020a) for more details about the experimental setup.

## THEORETICAL APPROACHES

*b*_{w} and *H*_{1} are the weir width and the total head over the weir crest (*H*_{1} = *h*_{1} + *v*^{2}/(2*g*)), respectively. Moreover, *v* indicates the approach velocity, and *g* is the gravity acceleration (m/s^{2}). From an empirical standpoint, to gauge *H*_{1} directly is not feasible. Because *v*^{2}/(2*g*) ≪ *h*_{1}, it is assumed that *H*_{1} ≈ *h*_{1}. In other words, the differences among the upstream heads (*h*_{1} = *Y*_{1} – *W*_{w}) are very slight, which is in order of a few millimeters.
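As a rough numerical check of the approximation *H*_{1} ≈ *h*_{1}, the following minimal sketch computes the velocity head and a discharge coefficient from one set of readings. The head–discharge relation used, *Q* = *C*_{d} (2/3) (2*g*/3)^{1/2} *b*_{w} *H*_{1}^{3/2}, is the common short-crested weir form and is an assumption here, since the equation itself is not reproduced in the text above. The function names and the sample readings (including the weir height) are hypothetical.

```python
import math

G = 9.81  # gravity acceleration (m/s^2)


def total_head(h1, y1, q, b_w):
    """Return (H1, velocity head) in metres.

    h1: water head over the crest (m), y1: upstream depth (m),
    q: discharge (m^3/s), b_w: channel width (m).
    """
    v = q / (b_w * y1)              # mean approach velocity
    velocity_head = v ** 2 / (2 * G)
    return h1 + velocity_head, velocity_head


def discharge_coefficient(q, b_w, head):
    """Cd from the ASSUMED short-crested weir relation (not taken from the source)."""
    return q / ((2.0 / 3.0) * math.sqrt(2.0 * G / 3.0) * b_w * head ** 1.5)


# Hypothetical readings: Q = 30 L/s, b_w = 0.4 m, h1 = 0.12 m, W_w = 0.30 m.
q, b_w, h1, w_w = 0.030, 0.4, 0.12, 0.30
H1, vh = total_head(h1, h1 + w_w, q, b_w)
cd_h1 = discharge_coefficient(q, b_w, h1)   # using H1 ≈ h1
cd_H1 = discharge_coefficient(q, b_w, H1)   # using the total head
```

For these illustrative numbers the velocity head is under 2 mm, so replacing *H*_{1} by *h*_{1} changes the computed coefficient only slightly.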

As mentioned above, streamlined weirs are inspired by the airfoil concept. In the current paper, 12 streamlined weir prototypes are constructed using the Joukowsky transform function to conduct the experimental work. In practice, the Joukowsky transform function is employed to evaluate the main factors in the design of an airfoil. In this function, the relative eccentricity *λ* is a significant parameter that describes the weir geometry. Interested readers can refer to Bagheri & Kabiri-Samani (2020a) for more information about the concept of the Joukowsky transform function.

## DIMENSIONAL ANALYSIS

*C*_{dstw}. Based on Figure 1, potential parameters that affect *C*_{dstw} can be depicted as:

where *b*_{w} refers to the channel width (m), *β* describes the slope angle of the downstream weir (degree), *λ* is the relative eccentricity, *S*_{o} is the slope of the main channel bed, *ρ*_{w} is the water density (kg/m^{3}), *σ*_{w} is the surface tension of water (kg/s^{2}), *μ*_{w} is the dynamic viscosity (kg/(s·m)), *n*_{r} is the roughness coefficient of the main channel (s/m^{1/3}), and *ɛ*_{r} is the surface roughness. Borghei *et al.* (1999) reported that the impacts of *S*_{o}, *n*_{r}, *μ*_{w}, *ɛ*_{r}, and *σ*_{w} on basic flow particles were very slight. Furthermore, surface tension is significant only at small nappe heights, so it is neglected here because the lowest nappe height over the weir in the current paper is 20 mm. Thus, the set of dimensionless equations attained is as follows:

where *π*_{1}, *π*_{2}, *π*_{3}, *π*_{4}, *π*_{5}, *π*_{6}, and *π*_{7} are dimensionless sets and *f* is a functional sign. By applying the Buckingham π-theorem and using the features of dimensional analysis, the non-dimensional relationships in functional form can be depicted as follows:

*C*_{dstw}. The fifth and the sixth expressions on the right-hand side indicate the Weber number (*W*_{b}) and the Reynolds number (*R*_{e}), respectively. The impacts of the Weber number and the Reynolds number can be ignored except for small values of *h*_{1} (Rao & Shukla 1971). Consequently, *C*_{dstw} over the studied weir would be expressed as follows:

$$C_{dstw} = f\left(\frac{h_1}{W_w},\ \frac{b_w}{L_w},\ \beta,\ \lambda\right) \tag{6}$$

To assess the variation of *C*_{dstw} and the hydraulic features of streamlined weirs, different experimental values of *h*_{1}/*W*_{w}, *b*_{w}/*L*_{w}, *β*, and *λ* are tested. Equation (6) can be used to predict *C*_{dstw} of streamlined weirs by the aforementioned DDM approaches.

## DATA AND SENSITIVITY ANALYSIS

Because the performance of any simulation method in accurately predicting the target parameter depends chiefly on a suitable choice of predictor variables, an unsuitable choice could adversely affect the ability of the method. Thus, in the current research, the most significant predictor variables for estimating *C*_{dstw} by the DDMs are selected using cosine amplitude sensitivity analysis.

*et al.* 2014): where *I*_{i} and *O*_{j} are input and target parameters, respectively, and *N* is the number of data. The *R*_{ij} value is in the range of 0 to 1 and defines the strength of the relationship between each predictor and the target in Equation (6) (Figure 2). According to Figure 2, because of the high *R*_{ij} values of *h*_{1}/*W*_{w}, *b*_{w}/*L*_{w}, *β*, and *λ* (over 0.5), these parameters have a noticeable effect on *C*_{dstw} and cannot be ignored, so they are taken as input variables for estimating *C*_{dstw} by the DDMs. Statistical attributes of the parameters used in Equation (6) are presented in Table 2. As seen in Table 2, *C*_{dstw} has the most unstable behavior, so robust and precise methods are needed to analyze and estimate *C*_{dstw}.
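The cosine amplitude strength can be sketched in a few lines. The formula below, *R*_{ij} = Σ *I*_{i}*O*_{j} / (Σ *I*_{i}^{2} · Σ *O*_{j}^{2})^{1/2} with sums over the *N* data, is the form in which the method is usually stated and is assumed here because the equation is not reproduced above. The function name and the sample values are hypothetical and are not the experimental data.

```python
import math


def cosine_amplitude(inputs, outputs):
    """R_ij = sum(I*O) / sqrt(sum(I^2) * sum(O^2)) for one input/target pair."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    num = sum(i * o for i, o in zip(inputs, outputs))
    den = math.sqrt(sum(i * i for i in inputs) * sum(o * o for o in outputs))
    return num / den


# Hypothetical values, for illustration only (not the experimental data).
h1_over_ww = [0.2, 0.5, 0.9, 1.4]
cd = [0.95, 1.05, 1.15, 1.25]
r = cosine_amplitude(h1_over_ww, cd)
```

For non-negative data the result lies between 0 and 1, and a series compared with itself gives exactly 1, which matches the stated range of *R*_{ij}.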

Statistical indices | *h*_{1}/*W*_{w} | *λ* | *β* | *b*_{w}/*L*_{w} | *C*_{dstw} |
---|---|---|---|---|---|
Mean | 0.71 | 0.56 | 15 | 0.01 | 1.1 |
Minimum | 0.13 | 0.13 | 0 | 0.01 | 0.87 |
Maximum | 3.41 | 1 | 60 | 0.01 | 1.31 |
Std. deviation | 0.59 | 0.34 | 23.01 | 0 | 0.11 |
Coefficient of variation | 0.82 | 0.6 | 1.53 | 0.21 | 0.1 |
Skewness | 2.39 | 0.23 | 1.14 | 0.22 | −0.02 |

## MODELING

Because the relationship between *C*_{dstw} and the parameters in Equation (6) is complicated and nonlinear, precise modeling and analysis are required to handle the data series. Accordingly, ANFIS, RF, and GEP methods are developed to evaluate *C*_{dstw}. Both sides of Equation (6) are first normalized to zero mean and unit variance, as recommended by Lawrence *et al.* (1997). Then, 70% (84) of the recorded data are randomly assigned to the training phase and the other 30% (36) to the testing phase.
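The preprocessing described above can be illustrated with a minimal sketch of zero-mean, unit-variance scaling followed by a random 70/30 partition of 120 records. The function names, the random seed, and the use of the population standard deviation are assumptions made for illustration; the sample values are hypothetical.

```python
import random
import statistics


def standardize(values):
    """Scale a column to zero mean and unit variance (population std, assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def split_indices(n, train_fraction=0.7, seed=0):
    """Randomly partition range(n) into training and testing index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * n)
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = split_indices(120)       # 84 training, 36 testing records
z = standardize([0.87, 1.0, 1.1, 1.2, 1.31])   # hypothetical C_dstw values
```

In practice the scaling statistics are often computed on the training subset only and then reused for the testing subset; the text does not say which variant was applied.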

### Overview of the RF model

RF is a well-developed and widely used ensemble technique that can be regarded as a forest of numerous simple decision trees (DTs) grown in parallel. RF is suitable for both forecasting and classification problems (Cutler *et al.* 2007).

The RF algorithm generates many decision trees by repeatedly varying the factors that affect the target parameter, and then combines all trees for the prediction task. As the number of trees grows, the impact of overfitting and the error rate decrease. It uses a bagging process to pick random samples of parameters for the training dataset (Trigila *et al.* 2015). To relate the different parameters, it first partitions the dataset and then produces roots and leaf nodes in a downward path (Diaz-Uriarte & De Andres 2006). Each node and leaf specifies features that describe queries about the input and target parameters, and the leaves of the trees provide the set of responses (Al-Juboori 2019).

*m*) and the number of trees grown (*K*) which represents for every discrete tree, yet affects strongly the exactness of results (Breiman 2001). The RF estimation process can be explained by the following equation as (Breiman 2001): where *K* is the number of discrete trees in the forest. Figure 3 demonstrates the flowchart of the RF model used to predict *C*_{dstw}.

#### Model development

Because there is no standard for predetermining a suitable value of *K* for a given dataset, its optimal value must be found by trial and error to obtain the ideal structure of an RF model. Several scenarios with different step sizes are considered for *K*. The *R*^{2} and RMSE of each scenario are used as evaluation metrics, and the scenario with the maximum *R*^{2} and minimum RMSE is taken as the optimal one. In the current study, the inputs are the experimental data (i.e., *h*_{1}/*W*_{w}, *b*_{w}/*L*_{w}, *β*, and *λ*) and the target is *C*_{dstw}; thus, the value of *m* is set to 4.
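The idea of averaging *K* bagged trees and choosing *K* by trial and error can be sketched with a deliberately simplified stand-in. The example below uses one-split trees (stumps) on a single synthetic feature rather than full trees on the four inputs, and the candidate *K* values, function names, and data are all hypothetical. It is not the configuration used in the study.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single feature."""
    best = None
    for t in sorted(set(xs))[:-1]:   # candidate thresholds; both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                 # all x identical: predict the mean
        m = statistics.fmean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def fit_forest(xs, ys, k, seed=0):
    """Bagging: each of the k stumps is trained on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(k):
        sample = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return trees


def predict(trees, x):
    """Forest output = average of the K individual tree outputs."""
    return sum(tree(x) for tree in trees) / len(trees)


def rmse(obs, est):
    return (sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs)) ** 0.5


# Synthetic data (not the experimental set): y rises linearly with x.
xs = [i / 20 for i in range(40)]
ys = [0.9 + 0.2 * x for x in xs]
scores = {}
for k in (1, 10, 100):               # trial-and-error over candidate K values
    forest = fit_forest(xs, ys, k)
    scores[k] = rmse(ys, [predict(forest, x) for x in xs])
best_k = min(scores, key=scores.get)
```

A real application would score each candidate *K* on held-out data and would also track *R*^{2}, as described above; the sketch scores on the training data only to stay short.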

### Overview of the ANFIS

*et al.* 2004; Firat & Güngör 2007; Yurdusev & Firat 2009). The ANFIS model is formed from five layers. There are two kinds of FIS, the Sugeno-Takagi FIS and the Mamdani FIS. The most significant difference between them is the form of the consequent. The consequent in the Sugeno-Takagi FIS is either a linear equation, termed a 'first-order Sugeno FIS', or a constant coefficient, termed a 'zero-order Sugeno FIS' (Jang 1993). Figure 4 demonstrates the structure of ANFIS (Yurdusev & Firat 2009). For the first-order Takagi-Sugeno FIS, a common rule set with two inputs *x* and *y*, two if-then rules, and one output *f* can be written as follows (Jang 1993):

**Rule 1:** If *x* is *A*_{1} and *y* is *B*_{1}, then *f*_{1} = *p*_{1}*x* + *q*_{1}*y* + *r*_{1}

**Rule 2:** If *x* is *A*_{2} and *y* is *B*_{2}, then *f*_{2} = *p*_{2}*x* + *q*_{2}*y* + *r*_{2}

In ANFIS, the rules are regular, but the form and number of membership functions (MFs) are optimized. To implement ANFIS, the number of input datasets must be less than six (Kisi & Sanikhani 2015).
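How the two rules combine into a single output can be illustrated with a minimal sketch of first-order Sugeno inference. It assumes Gaussian membership functions, the product operator for "and", and a firing-strength-weighted average of the rule outputs, which is the usual ANFIS arrangement (Jang 1993). All parameter values and function names are hypothetical.

```python
import math


def gaussmf(x, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


def sugeno_two_rules(x, y, mf_a, mf_b, consequents):
    """First-order Sugeno output for two rules of the form
    'If x is A_i and y is B_i then f_i = p_i*x + q_i*y + r_i'."""
    # Firing strength of each rule: product of its membership grades (assumed T-norm).
    w = [gaussmf(x, *mf_a[i]) * gaussmf(y, *mf_b[i]) for i in range(2)]
    # Rule outputs from the linear consequents.
    f = [p * x + q * y + r for (p, q, r) in consequents]
    # Weighted average of the rule outputs (normalised firing strengths).
    return (w[0] * f[0] + w[1] * f[1]) / (w[0] + w[1])


# Hypothetical membership and consequent parameters, for illustration only.
mf_a = [(0.0, 1.0), (1.0, 1.0)]                     # (centre, sigma) of A1, A2
mf_b = [(0.0, 1.0), (1.0, 1.0)]                     # (centre, sigma) of B1, B2
consequents = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]    # (p_i, q_i, r_i)
out = sugeno_two_rules(0.5, 0.5, mf_a, mf_b, consequents)
```

During training, the hybrid rule adjusts the membership parameters by gradient descent and the consequent parameters (*p*_{i}, *q*_{i}, *r*_{i}) by least squares; the sketch covers only the forward pass.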

#### Model development

In this study, ANFIS toolbox in MATLAB 2019b with Sugeno-Takagi FIS model is applied to predict *C*_{dstw}. In this regard, the dimensionless independent experimental parameters *β*, and *λ* are used as the input variables.

To organize the input datasets and create fuzzy rules in the ANFIS model, there are two general schemes, subtractive clustering (SC) and grid partition (GP). In this research, the GP method is used to generate the FIS. To obtain an appropriate ANFIS structure, suitable MFs and an optimal number of MFs for both the input and output datasets should be employed. However, the appropriate MFs and their optimal number have to be determined by trial and error. Accordingly, several different scenarios are defined by the user via a trial-and-error process to achieve the ideal structure.

In total, eight different MFs for the input parameters, namely Trimf (triangular), Trapmf (trapezoidal), Gbellmf (generalized bell), Gaussmf (Gaussian), Gauss2mf (two-sided Gaussian), Pimf (Pi-shaped), Dsigmf (difference of two sigmoids), and Psigmf (product of two sigmoids), are employed to develop various scenarios. In all scenarios, the number of MFs for each input variable is set to 3 and a linear MF is selected for the output variable. Additionally, to identify the nonlinear input and linear output parameters when training the FIS, the hybrid algorithm is used as the optimization method with 100 epochs and zero error tolerance. The performance of the scenarios in the testing stage is evaluated by statistical metrics.

### Overview of GEP

GEP is an iterative evolutionary intelligence algorithm introduced by Ferreira (2001) and derived from the concept of Darwinian evolution, with sufficient ability to predict complex relationships. In this technique, the best population is selected; otherwise, a fresh population is generated until the ideal population is reached.

The separation of the genome and the expression tree into distinct entities with different functions gives the algorithm a great ability to adapt, which allows it to meaningfully outperform existing evolutionary methods.

The creation of the initial population is the first stage of the GEP algorithm. This is done randomly or with some knowledge of the problem. Subsequently, the chromosomes are expressed as expression trees. The results are assessed through a fitness function to determine the suitability of a solution. When a satisfactory value is reached, the evolution is stopped and the best solution is reported. If the stop conditions are not fulfilled, the best individual of the current generation is retained. The procedure is repeated for a given number of generations so that an optimum outcome is obtained (Ferreira 2001).

#### GEP model development

In this investigation, the GeneXpro Tools 4.0 program is employed to implement the GEP model. The values of *C*_{dstw} are estimated using GEP with the following steps (Mehdizadeh *et al.* 2017):

1. In the first step, RMSE is selected as the fitness function.
2. The second step is to determine the variables and the function set used to produce the chromosomes. The predictor variables in Equation (6) (i.e., *h*_{1}/*W*_{w}, *b*_{w}/*L*_{w}, *β*, and *λ*) are selected as input variables, and the target variable is *C*_{dstw} in Equation (6). The function set comprises the four basic arithmetic operators {+, –, /, ×} and several mathematical functions, including *x*^{2}, *x*^{3}, *e*^{x}, etc.
3. In the third step, the core structure of the chromosomes, such as head size, number of genes, and number of chromosomes, is outlined.
4. In the fourth step, a linking function is employed to link the expression trees and relate the sub-expressions. Addition, subtraction, multiplication, and division are tested.
5. Finally, a Maximum Fitness criterion equal to 5,000 is specified as the stop criterion.

In this study, by tuning the number of chromosomes (NC), kind of linking function (LF), and head size (HS) as hyperparameters, numerous scenarios are distinctly listed. It is necessary to mention that the ideal value of these hyperparameters is obtained through a trial-and-error method to accomplish an ideal GEP configuration.

## PERFORMANCE EVALUATION METRICS

*R*^{2}), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), and Lin's concordance correlation coefficient (CC) are employed to compare the models' performance as follows: where *N* is the number of data, and *x*_{i} and *y*_{i} are the observed and forecasted values of *C*_{dstw}, respectively. Smaller values of RMSE, MAE, and MBE together with larger values of *R*^{2} denote better forecasting performance (Kisi 2007).
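The five metrics can be computed as in the minimal sketch below. Because the defining equations are not reproduced above, the forms used are assumptions based on the common definitions: *R*^{2} as the squared Pearson correlation, MBE as the mean of (*y*_{i} − *x*_{i}), and Lin's CC as 2cov(*x*, *y*)/(*σ*_{x}^{2} + *σ*_{y}^{2} + (*μ*_{x} − *μ*_{y})^{2}). The function name and the sample values are hypothetical.

```python
import math
import statistics


def evaluate(obs, est):
    """Return R2, RMSE, MAE, MBE and Lin's CC for observed x_i and estimated y_i."""
    n = len(obs)
    mu_x, mu_y = statistics.fmean(obs), statistics.fmean(est)
    var_x = sum((x - mu_x) ** 2 for x in obs) / n
    var_y = sum((y - mu_y) ** 2 for y in est) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(obs, est)) / n
    err = [y - x for x, y in zip(obs, est)]
    return {
        "R2": cov ** 2 / (var_x * var_y),   # squared Pearson correlation (assumed form)
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MAE": sum(abs(e) for e in err) / n,
        "MBE": sum(err) / n,                # > 0 means overestimation on average
        "CC": 2 * cov / (var_x + var_y + (mu_x - mu_y) ** 2),
    }


# Hypothetical values, not the experimental data.
observed = [0.90, 1.00, 1.10, 1.20, 1.30]
estimated = [0.92, 1.01, 1.12, 1.19, 1.31]
metrics = evaluate(observed, estimated)
```

With this sign convention a positive MBE indicates that the model overestimates the observations on average.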

## RESULTS AND DISCUSSION

### Experimental results

Based on the experimental results at *β* = 0, lowering *λ* decreased the weir height and accordingly decreased *Y*_{1}. Nevertheless, the differences among *h*_{1} were very slight. Thus, for a given *Q*, *h*_{1} was almost constant as *λ* changed. Unlike *h*_{1}, *Y*_{3} and *Y*_{2} increased as *λ* was lowered. Furthermore, the ratio of increased to some extent by lowering *λ*. Its average for the streamlined weirs was approximately 0.75, while for circular-crested weirs, it was around 0.7 (Jaeger 1956). The reason was that, by lowering *λ*, the structure and the streamline curvature on the weir crest decreased (Bagheri & Kabiri-Samani 2020a). As the streamline curvature decreased, the compression of the streamlines declined, and *Y*_{2} increased accordingly (Bagheri & Kabiri-Samani 2020a).

On the basis of the experimental results at *β* = 30° and 60° (base-block), increasing the weir height for *λ* = 1 substantially intensified the turbulence of the flow downstream of the weir. However, increasing the height of the other weirs with *λ* < 1 did not strikingly change the hydraulic behavior of the downstream flow (Bagheri & Kabiri-Samani 2020a). Besides, for streamlined weirs with small *Q* and *λ* < 1, the flow passed over a virtual surface into the downstream and an air pocket was trapped below the lower nappe of the flow profile (Bagheri & Kabiri-Samani 2020a). As *Q* rose, the air pocket was eliminated and the weir base was progressively submerged. A rotational flow region developed near the weir base-block and, consequently, a non-turbulent surface was generated. Interested readers can also refer to the previous study by Bagheri & Kabiri-Samani (2020a) for more details about the experimental results.

### Validation of the RF model

Here, after several trials with different numbers of trees and of variables at each node, *K* = 500 was found to give comparatively better results. The values of the statistical indices for *C*_{dstw} in the calibration and validation stages under the optimal scenario of the RF model are given in Table 3.

Statistical metrics (dimensionless)

Stage | *R*^{2} | RMSE | MAE | MBE | CC |
---|---|---|---|---|---|
Calibration | 0.98 | 0.0175 | 0.0147 | 381 × 10^{−6} | 0.95 |
Validation | 0.96 | 0.0234 | 0.0192 | 0.0016 | 0.93 |

The positive value of MBE signifies that the model overestimates the corresponding observed values. It can also be inferred that the RF model estimates *C*_{dstw} with high precision in both the calibration and validation stages, although it is slightly more accurate in the calibration stage.

*C*_{dstw} signifies the high performance of the model. Figure 6 illustrates the comparison between measured and predicted *C*_{dstw} in the validation stage under the optimal scenario of the RF model. The statistical indices and the visual analysis of Figure 6 confirm the high capability of the RF model structure under the optimum scenario for computing the target with high precision and agreement.

### Validation of the ANFIS model

*C*_{dstw} by the ANFIS method under the optimal scenario in the testing stage.

Statistical metrics (dimensionless)

Stage | *R*^{2} | RMSE | MAE | MBE | CC |
---|---|---|---|---|---|
Calibration | 0.91 | 0.0306 | 0.0259 | 1.19 × 10^{−5} | 0.95 |
Validation | 0.79 | 0.0688 | 0.0544 | 0.0286 | 0.94 |

### Validation of the GEP model

The parameter values and genetic operator rates used by GEP to estimate *C*_{dstw} under the optimal scenario are presented in Table 5. These are customary GEP settings and have a perceptible influence on the performance of the method.

Parameter | Value |
---|---|
Number of genes | 3 |
Mutation | 0.044 |
Inversion | 0.1 |
One-point recombination | 0.3 |
Two-point recombination | 0.3 |
Gene recombination | 0.1 |
Gene transposition | 0.1 |
Insertion sequence (IS) transposition | 0.1 |
Root insertion sequence (RIS) transposition | 0.1 |

The values of the statistical metrics obtained by GEP for the examined scenarios in the validation stage are given in Table 6, wherein NC specifies the number of chromosomes, LF the kind of linking function, and HS the head size; the bold values signify the attributes of the optimal scenario.

Statistical metrics (dimensionless): RMSE, *R*^{2}, CC, MAE

LF | NC | HS | RMSE | *R*^{2} | CC | MAE |
---|---|---|---|---|---|---|
Addition | 30 | 8 | 0.18 | 0.55 | 0.74 | 0.17 |
Addition | 33 | 7 | 0.069 | 0.96 | 0.98 | 0.056 |
Addition | 35 | 6 | 0.036 | 0.92 | 0.96 | 0.029 |
Subtraction | 30 | 8 | 0.052 | 0.91 | 0.95 | 0.044 |
Subtraction | 33 | 7 | 0.13 | 0.46 | 0.68 | 0.107 |
Subtraction | 35 | 6 | 0.032 | 0.93 | 0.96 | 0.025 |
Multiplication | 30 | 8 | 0.033 | 0.91 | 0.95 | 0.024 |
Multiplication | 33 | 7 | 0.04 | 0.91 | 0.95 | 0.031 |
**Multiplication** | **35** | **6** | **0.032** | **0.97** | **0.96** | **0.024** |
Division | 30 | 8 | 0.051 | 0.88 | 0.93 | 0.04 |
Division | 33 | 7 | 0.04 | 0.91 | 0.95 | 0.033 |
Division | 35 | 6 | 0.039 | 0.97 | 0.98 | 0.032 |
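
To illustrate the role of the linking function (LF) listed in Table 6: in GEP, each gene of a multigenic chromosome decodes to a sub-expression, and the LF combines the gene outputs into a single prediction. The sketch below assumes three gene outputs that have already been evaluated; the helper name `link_genes` and the numeric values are hypothetical and are not taken from the study.

```python
from functools import reduce
import operator

# The four linking functions examined in Table 6.
LINKING_FUNCTIONS = {
    "Addition": operator.add,
    "Subtraction": operator.sub,
    "Multiplication": operator.mul,
    "Division": operator.truediv,
}


def link_genes(gene_outputs, linking_function):
    """Combine already-evaluated gene outputs left to right with the chosen LF."""
    if not gene_outputs:
        raise ValueError("at least one gene output is required")
    return reduce(LINKING_FUNCTIONS[linking_function], gene_outputs)


# Hypothetical outputs of three genes for a single input sample.
gene_outputs = [1.2, 0.9, 1.1]
for name in LINKING_FUNCTIONS:
    print(name, link_genes(gene_outputs, name))
```

Because the LF changes how the sub-expressions interact, the same gene outputs can give very different predictions, which is consistent with the spread of metrics across LF choices in Table 6.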

*C*_{dstw} with sufficient precision. Figure 8 shows the comparison of measured and modeled *C*_{dstw} by GEP under the ideal scenario in the testing stage. Consistent with Figure 8, owing to its high capability, the GEP model can suitably capture the fluctuation trend of the observed *C*_{dstw}.

*C*_{dstw} by GEP under the optimal scenario in the validation stage. According to the fit line equation in the scatter plot, and assuming that the equation is *y* = *a*_{0}*x* + *a*_{1}, the slope *a*_{0} and the intercept *a*_{1} are, respectively, near to 1 and 0 with a satisfactory value of *R*^{2}.
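
The fit-line check described above can be sketched with an ordinary least-squares line through the (measured, predicted) pairs: a slope close to 1 and an intercept close to 0 indicate that the predictions track the measurements. The function name `fit_line` and the sample values are hypothetical; this is a minimal illustration, not the procedure used to produce the scatter plot.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have equal length of at least 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measured (x) and predicted (y) discharge coefficients.
measured = [0.85, 0.95, 1.05, 1.15]
predicted = [0.86, 0.94, 1.06, 1.14]
slope, intercept = fit_line(measured, predicted)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```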

The *C*_{dstw} relationship generated by GEP under the ideal scenario is presented in Equation (14).

As can be seen, Equation (14) is very complex owing to the intrinsic non-linearity of the relationships among the geometry of the streamlined weirs, the channel, the flow conditions, and *C*_{dstw}. The results emphasize the significance of an appropriate input selection process for balancing the complexity and precision of the model. The computed *C*_{dstw} values are compared with the corresponding measured ones; the differences are within 0–10%, with *R*^{2} = 0.97. Hence, it can be deduced that the suggested equation can be utilized as a multivariate mathematical relationship for the initial estimation of *C*_{dstw} with sufficient accuracy in the hydraulic engineering field.
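
A comparison of this kind can be sketched as a percentage difference between each computed and measured value, followed by a check that every difference falls inside the stated band. The definition used below, |computed − measured| / |measured| × 100, is an assumption about how the differences were expressed; the function names and sample values are hypothetical.

```python
def relative_differences_percent(measured, computed):
    """Absolute percentage difference of each computed value from its measurement."""
    if len(measured) != len(computed):
        raise ValueError("measured and computed must have equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    return [abs(c - m) / abs(m) * 100.0 for m, c in zip(measured, computed)]


def all_within(differences, upper_percent=10.0):
    """True if every percentage difference is at or below the upper bound."""
    return all(d <= upper_percent for d in differences)


# Hypothetical measured and computed discharge coefficients.
measured_cd = [1.00, 0.80, 1.10]
computed_cd = [1.05, 0.76, 1.12]
diffs = relative_differences_percent(measured_cd, computed_cd)
print([round(d, 2) for d in diffs], all_within(diffs))
```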

## PERFORMANCE COMPARISON OF THE METHODS DEVELOPED

*C*_{dstw} is carefully chosen using the model scoring procedure recommended by Vaheddoost *et al.* (2016). In this procedure, the success grade (SG) and the failure grade (FG) of performance, as the pivotal criteria, are expressed as follows:

TG obtained by Equations (19)–(20) is presented in Table 7.

Model | *R*^{2} | CC | PI | RMSE | SG | FG | TG |
---|---|---|---|---|---|---|---|
RF | 0.96 | 0.95 | 1.09 | 0.0175 | 19.79 | −0.492 | 19.30 |
ANFIS | 0.79 | 0.94 | 1.024 | 0.068 | 17.94 | −10.76 | 7.17 |
GEP | 0.97 | 0.96 | 1.07 | 0.032 | 20 | 0 | 20 |

According to the TG values in Table 7, the GEP model is selected as the superior approach for the prediction of *C*_{dstw}. RF ranks second and ANFIS third; GEP therefore outperforms the other two models and is considered the most accurate method.
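
The ranking can be reproduced from the SG and FG columns of Table 7. The TG values there agree, to within rounding, with TG = SG + FG (for example, 19.79 − 0.492 ≈ 19.30 for RF); this additive relation is inferred here from the table, not quoted from the scoring equations, and the sketch does not attempt to compute SG or FG themselves. The names `total_grade` and `rank_models` are hypothetical.

```python
# (SG, FG) pairs copied from Table 7.
scores = {
    "RF": (19.79, -0.492),
    "ANFIS": (17.94, -10.76),
    "GEP": (20.0, 0.0),
}


def total_grade(success_grade, failure_grade):
    """Total grade, assuming TG = SG + FG (inferred from Table 7)."""
    return success_grade + failure_grade


def rank_models(model_scores):
    """Model names sorted from highest to lowest total grade."""
    return sorted(
        model_scores,
        key=lambda name: total_grade(*model_scores[name]),
        reverse=True,
    )


for name in rank_models(scores):
    print(f"{name}: TG = {total_grade(*scores[name]):.2f}")
```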

## CONCLUSION

In the present research, experimental data of streamlined weirs with *β* values of 0°, 30°, and 60° from the study of Bagheri & Kabiri-Samani (2020a) were employed for investigation. The experiments were performed on large physical models under steady, aerated, and free overflow conditions in an open channel. As an alternative to CFD techniques for forecasting *C*_{dstw}, three different DDMs (RF, ANFIS, and GEP) were developed under diverse geometric and hydraulic conditions. The main findings of the present study are as follows:

- Based on the experimental results at *β* = 0, lowering *λ* led to a decrease in the weir height and *Y*_{1}, but an increase in *Y*_{3}, *Y*_{2}, and the ratio *Y*_{2}/*h*_{1}. Moreover, at *β* = 30° and 60° (base-block), increasing the weir height for *λ* = 1 considerably augmented the disturbance of the flow downstream of the streamlined weir, whereas for *λ* < 1 it did not demonstrably vary the hydraulic condition of the flow downstream of the weir.
- Using the Buckingham π-theorem and cosine amplitude (*R*_{ij}) analyses as a preprocessing method confirmed that *h*_{1}/*W*_{w}, *b*_{w}/*L*_{w}, *β*, and *λ* have a significant impact on *C*_{dstw}; consequently, they were considered as input variables for estimating *C*_{dstw} by the developed DDMs, among which *b*_{w}/*L*_{w} was the most significant one.
- The performances of the three employed models were evaluated using statistical metrics and the model scoring procedure. In line with the model grading values, the GEP model was confirmed as the most precise technique for computing *C*_{dstw}, with RMSE = 0.032, MAE = 0.024, *R*^{2} = 0.97, and CC = 0.96.

Although the current investigation assessed the ability of standalone AI methods for predicting *C*_{dstw}, future studies could employ other kinds of MLMs and IOMs through hybrid approaches. The application of surrogate modeling methods that have been successful in other fields of engineering, such as polynomial chaos expansion and Kriging (Amini *et al.* 2021; Hariri-Ardebili *et al.* 2021), can also be investigated in future work. Their results can be compared with those of the current study so that the best method can be identified. Moreover, although all variables affecting *C*_{dstw} were scrutinized in the current research, its outcomes cannot be extended to other structures.

## ACKNOWLEDGEMENTS

We are grateful to the Research Council of Shahid Chamran University of Ahvaz for financial support (GN: SCU.WH1401.7209).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.