## ABSTRACT

Accurate calculation of flow discharge for sluice gates is essential in irrigation, water supply, and structure safety. Discharge measurement methods that require the flow regime to be distinguished beforehand are inconvenient in practice. In this study, a novel approach that considers both free and submerged flow is proposed. The energy–momentum method was employed to derive the coefficient of discharge. Subsequently, the discharge coefficient was determined through experiments performed on a physical model of a vertical sluice gate with a broad-crested weir. Feature engineering, incorporating dimensional analysis, feature construction, and correlation-based selection, was performed. The best subset regression method was employed to develop regression equations of the discharge coefficient with the generated features. The derived formula was applied to compute the discharge coefficient of the vertical sluice gate and determine the flow discharge. The accuracy of the adopted method was assessed by comparing it with recent studies on submerged flow, and the results demonstrate that the developed approach achieves a high level of accuracy in calculating flow discharge. The coefficient of determination for the calculated flow rate is 0.993, and the root mean square percentage error is 5.04%.

## HIGHLIGHTS

The feature combination with efficient interpretability of free-submerged flow discharge coefficient was proposed.

The computational equation applicable to free-submerged flow regimes based on the EM method was established with high accuracy.

The process of feature selection and calculation is not complicated and has potential for automated calibration.

## INTRODUCTION

Gates and weirs are commonly employed devices for regulating water levels and for controlling and measuring flow rates in open channels. The accuracy of flow measurement plays a crucial role in the gate operation of hydraulic structures within open channels or estuaries. Moreover, it has a profound impact on water supply, irrigation, flood control, sediment transport, navigation, and even the safety of hydraulic structures (Donnelly *et al.* 2022, 2024; Noori *et al.* 2022). An approach extensively utilized to determine flow discharge is the classical energy–momentum (EM) method, which is applicable to free or submerged flow through the gate. The basis of the methodology can be traced back to Wóycicki's doctoral thesis, and it gained widespread recognition after Henderson's publication (Wóycicki 1931; Henderson 1966). Some scholars also established stage–discharge relationships by employing an analysis that incorporated the theorem of dimensional analysis and incomplete self-similarity theory (Ferro 2000). Ferro (2018) also used the momentum and mass equations to establish the stage–discharge equation for the sharp-crested gate. This equation is characterized by a momentum coefficient, which can be estimated empirically from the ratio between the orifice height and the upstream water depth. Theoretically, the discharge coefficient for both free and submerged flow can be represented by the equation derived from the EM method. The discharge coefficient can be characterized by a number of factors, such as the energy loss coefficient *k* and the contraction coefficient *C*_{c}. These parameters play a crucial role in understanding and predicting flow behaviour. The coefficient of energy loss is commonly acknowledged to be influenced by hydraulic characteristics, such as the boundary layer and turbulent flow.
Meanwhile, the contraction coefficient is closely associated with geometric features, such as those of channels and gates (Belaud *et al.* 2009; Cassan & Belaud 2012; Castro-Orgaz *et al.* 2013). The value of *C*_{c} in submerged flow is comparable to that in free flow at a small opening, irrespective of the submergence. However, this does not hold true for a large opening. At a large gate opening, the contraction coefficient can exceed 0.6 when the flow is adequately submerged. These conditions typically result in large deviations between the predicted and measured discharge (Belaud *et al.* 2009). In a gate with a broad-crested weir, the variations in the flow capacity can be attributed to the interaction between the flow over the weir and the wall jets. Belaud *et al.* (2012) proposed incorporating the Boussinesq and Coriolis coefficients into the equation in order to enhance the accuracy of the discharge coefficient prediction, and Bijankhan *et al.* (2017) demonstrated that the Coriolis and pressure loss coefficients can improve the discharge calculation. With the improvement of computer technology, the contraction and energy loss coefficients can be determined from the data through constrained optimization algorithms. This approach eliminates the need to empirically assume constant values for unknown coefficients in formulas (Habibzadeh *et al.* 2011). In general, the discharge coefficient is widely recognized as a key parameter that reflects various characteristics of the incoming flow within a channel, such as turbulence, viscosity, non-uniform velocity distribution, and velocity head (Bijankhan *et al.* 2017). Therefore, various factors can be utilized as initial features to establish a discharge coefficient calibration model, such as channel width, upstream and downstream water levels, gate opening, Reynolds number, or Manning's roughness coefficient.
The model can be established by directly calibrating the discharge coefficient or calculating the discharge coefficient through calibrating the contraction and energy loss coefficients.

With advances in computational efficiency, researchers have effectively harnessed machine learning in investigation of the discharge coefficient. The computer algorithm has led to substantial enhancements in calibration accuracy compared to traditional regression methods. Multivariate Adaptive Regression Splines (MARS), Artificial Neural Networks (ANN), Symbolic Regression (SR), and various other algorithms have been employed in the study of discharge calculation for gates (Vaheddoost *et al.* 2021; Shakouri *et al.* 2023). Researchers have also compared the predictive accuracy using the same sets of data with different algorithms, such as Support Vector Machines (SVM), ANNs, and Generalized Regression Neural Networks (GRNNs). The objective of these studies is to determine the algorithm that produces the most accurate predictions in discharge calculations (Salmasi *et al.* 2021; Khosravi *et al.* 2022). Research findings on the optimal algorithm for discharge calculation exhibit variations due to factors such as feature selection, parameter tuning, comparison criteria, and algorithm implementation. As researchers attempt to enhance the accuracy of models, they often encounter a trade-off between accuracy and complexity in the formula used. Some studies propose formulas with numerous terms and intricate mapping relationships, occasionally involving nested functions. It is crucial to consider the practical implications of such complex formulations, as they may not be readily applicable to engineering scenarios. Currently, most methods for discharge calculation require the distinction of free or submerged flow regimes, and there is no highly accurate unified calculation method available. However, the advancement of algorithms has provided an opportunity to develop a high-precision computational model for calculating discharge in both free and submerged orifice flow.

The objective of this study is to propose an accurate calculation method for measuring the discharge of free-submerged orifice flow with a broad-crested weir. The prototype of the experimental model is located in a tidal river, which means that the gate flow is influenced by the outer river. Therefore, a considerable portion of the measured data belongs to submerged flow. Besides, it is necessary to fully consider the features that affect flow regime and discharge capability. Therefore, features related to contraction coefficient and energy loss coefficient were extracted. These features were subsequently utilized to construct a high-dimensional feature space based on polynomial theory. In order to streamline the feature space, correlation analysis was employed. Finally, the optimal outcome was determined through the implementation of the best subset regression.

## MATERIALS AND METHODS

### Experimental setup

The experiments comprise measurements of inflow discharge and water levels at various locations upstream and downstream. A triangular weir was utilized to accurately measure the inflow discharge in the experimental setup. The water was pumped out and regulated through the stilling well before passing through the triangular weir. This method of controlling the intake flow ensured the precision of the experiment and enhanced the accuracy of the discharge measurements. A total of 24 measurement points were positioned along the upstream and downstream reaches. Water levels at each measurement point were read with a precision of 0.1 mm. Three readings were taken at each point, and the stable value was recorded as the water level at that point. The data collected in the experiment encompassed a wide range of flow rates, gate openings, and tailwater levels. The aim was to simulate the various conditions that could potentially occur during the operation of the project. Each experimental group consisted of four sets of tailwater levels, four sets of flow rates, and five sets of gate openings. The tailwater levels for each set are 0.39, 2.20, 3.56, and 4.95 m. The corresponding inflow rates for each set are 200, 500, 875, and 1,100 m^{3}/s. The gate openings for each set are 0.3 m, 0.5 m, 1.0 m, 1.5 m, and fully open. A total of 80 experiments were conducted, and 36 sets were identified as suitable for analysing the orifice flow (Table 1).

Run | Q (m^{3}/s) | U_{0} (m/s) | H_{0} (m) | h_{0} (m) | H (m) | h (m) | D (m) | w (m) | h_{t} (m) |
---|---|---|---|---|---|---|---|---|---|
S-1 | 162 | 0.225 | 3.31 | 2.26 | 4.81 | 3.76 | 90 | 0.5 | 2.2 |
S-2 | 187 | 0.320 | 2.39 | 0.89 | 3.89 | 2.39 | 90 | 0.5 | 0.84 |
S-3 | 200 | 0.219 | 4.59 | 1.41 | 6.09 | 2.91 | 90 | 0.3 | / |
S-4 | 200 | 0.267 | 3.5 | 1.79 | 5 | 3.29 | 90 | 0.5 | / |
S-5 | 200 | 0.333 | 2.5 | 2.08 | 4 | 3.58 | 90 | 1 | / |
S-6 | 200 | 0.357 | 2.24 | 2.08 | 3.74 | 3.58 | 90 | 1.5 | / |
S-7 | 200 | 0.241 | 4.04 | 3.54 | 5.54 | 5.04 | 90 | 1 | 3.56 |
S-8 | 200 | 0.251 | 3.82 | 3.61 | 5.32 | 5.11 | 90 | 1.5 | 3.56 |
S-9 | 200 | 0.256 | 3.71 | 3.61 | 5.21 | 5.11 | 90 | 2 | 3.56 |
S-10 | 200 | 0.267 | 3.49 | 1.43 | 4.99 | 2.93 | 90 | 0.5 | 2.2 |
S-11 | 200 | 0.341 | 2.41 | 1.95 | 3.91 | 3.45 | 90 | 1 | 2.2 |
S-12 | 200 | 0.357 | 2.23 | 2.03 | 3.73 | 3.53 | 90 | 1.5 | 2.2 |
S-13 | 200 | 0.365 | 2.15 | 2.07 | 3.65 | 3.57 | 90 | 2 | 2.2 |
S-14 | 200 | 0.452 | 1.45 | 0.07 | 2.95 | 1.57 | 90 | 0.5 | 0.39 |
S-15 | 200 | 0.617 | 0.66 | 0.35 | 2.16 | 1.85 | 90 | 1 | 0.39 |
S-16 | 351 | 0.434 | 3.89 | 2.21 | 5.39 | 3.71 | 90 | 1 | 2.2 |
S-17 | 463 | 0.744 | 2.65 | 0.89 | 4.15 | 2.39 | 90 | 1 | 0.84 |
S-18 | 500 | 0.543 | 4.64 | 3.13 | 6.14 | 4.63 | 90 | 1.5 | 3.56 |
S-19 | 500 | 0.613 | 3.94 | 1.54 | 5.44 | 3.04 | 90 | 1 | 2.2 |
S-20 | 500 | 0.751 | 2.94 | 1.87 | 4.44 | 3.37 | 90 | 1.5 | 2.2 |
S-21 | 500 | 0.815 | 2.59 | 2.09 | 4.09 | 3.59 | 90 | 2 | 2.2 |
S-22 | 500 | 1.339 | 0.99 | 0.43 | 2.49 | 1.93 | 90 | 1.5 | 0.39 |
S-23 | 585 | 0.700 | 4.07 | 2.23 | 5.57 | 3.73 | 90 | 1.5 | 2.2 |
S-24 | 631 | 0.918 | 3.08 | 0.94 | 4.58 | 2.44 | 90 | 1.3 | 0.84 |
S-25 | 875 | 0.893 | 5.03 | 1.79 | 6.53 | 3.29 | 90 | 1.5 | 2.2 |
S-26 | 875 | 1.162 | 3.52 | 2.18 | 5.02 | 3.68 | 90 | 2 | 2.2 |
S-27 | 875 | 0.854 | 5.33 | 3.23 | 6.83 | 4.73 | 90 | 2 | 3.56 |
S-28 | 1,100 | 1.155 | 4.85 | 3.63 | 6.35 | 5.13 | 90 | 3 | 3.56 |
F-1 | 500 | 0.697 | 3.28 | −0.1 | 4.78 | 1.4 | 90 | 1 | 0.39 |
F-2 | 875 | 0.852 | 5.35 | −0.1 | 6.85 | 1.4 | 90 | 1.5 | 0.39 |
F-3 | 875 | 1.311 | 2.95 | 0.33 | 4.45 | 1.83 | 90 | 2 | 0.39 |
F-4 | 1,100 | 1.497 | 3.4 | 0.68 | 4.9 | 2.18 | 90 | 2.5 | 0.39 |
VS-1 | 461 | 1.229 | 1 | 0.49 | 2.5 | 1.99 | 90 | 1.46 | 0.39 |
VS-2 | 653.3 | 0.831 | 3.74 | 2.24 | 5.24 | 3.74 | 90 | 1.65 | 2.2 |
VF-1 | 653.3 | 1.015 | 2.79 | 0.94 | 4.29 | 2.44 | 90 | 1.46 | 0.84 |
VF-2 | 673 | 0.963 | 3.16 | 0.5 | 4.66 | 2 | 90 | 1.46 | 0.39 |

*Note:* Runs S-1 to S-28 and F-1 to F-4 correspond to submerged and free flow, respectively; VS-1 to VS-2 and VF-1 to VF-2 correspond to submerged and free flow used for validation, respectively. *Q* is discharge; *U*_{0} is upstream velocity; *w* is gate opening; *H*_{0} is upstream water level; *h*_{0} is downstream water level; *H* is the upstream head over the weir; *h* is the downstream head over the weir; *D* is the net width of the gate; *h*_{t} is tailgate water level.

### Discharge calculation

*et al.*2021):

It is seen from Equations (1)–(5) that the calculation of flow discharge depends on the discharge coefficient, gate opening, net width of the gate, and upstream head over the weir, and that the parameters affecting the discharge coefficient consist of the contraction coefficient *C*_{c}, the energy loss coefficient *k*, and other dimensionless variables. The contraction and energy loss coefficients can be determined by employing constrained optimization algorithms. Consequently, two calibration methods for the discharge coefficient emerge: one involves the direct calibration of the discharge coefficient, and the other entails the separate calibration of the contraction and energy loss coefficients, followed by calculation of the discharge coefficient using Equation (2). This study adopts the former method to develop a calculation model of the discharge coefficient for both free and submerged flow conditions, because it simplifies the calculation and is easier to use.
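As a minimal sketch of the direct calibration route, the discharge coefficient can be back-calculated from measured runs. The relation used below, Q = C_d · D · w · sqrt(2gH), is an assumption: it is the usual orifice-type form and matches the variables named in the text (discharge coefficient, gate opening, net width, upstream head over the weir), but it is not reproduced from the paper's Equation (1). The function names are hypothetical; the two runs are taken from Table 1.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def discharge(cd, width, opening, head):
    """Assumed orifice-type relation: Q = Cd * D * w * sqrt(2 g H)."""
    return cd * width * opening * math.sqrt(2.0 * G * head)


def discharge_coefficient(q, width, opening, head):
    """Invert the assumed relation to obtain Cd from a measured run."""
    return q / (width * opening * math.sqrt(2.0 * G * head))


# (Q, D, w, H) for one submerged and one free-flow run from Table 1
runs = {
    "S-1": (162.0, 90.0, 0.5, 4.81),
    "F-1": (500.0, 90.0, 1.0, 4.78),
}

cd = {name: discharge_coefficient(*values) for name, values in runs.items()}
for name, value in cd.items():
    print(name, round(value, 3))
```

Under this assumed relation the submerged run yields a noticeably smaller coefficient than the free-flow run, which is the behaviour a unified free-submerged model has to capture.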

### Regression analysis

The regression analysis method involves extracting initial features from the physical processes of flow through the sluice gate, constructing high-dimensional features, applying correlation analysis, and performing best subset regression. Correlation analysis was chosen to streamline the feature space, which enhances the reliability of the model and reduces the computing time. The final step entails constructing an optimal discharge coefficient model using the best subset regression approach. The stepwise procedure is outlined as follows:

- (1) The selection of the initial features is based on the physical significance of the discharge coefficient equation, specifically focusing on the energy loss coefficient and the contraction coefficient. The Pi-theorem was employed to establish dimensionless features. Furthermore, correlation analysis and collinearity diagnosis were performed to identify relationships among the features and streamline the feature space.

- (2) The dimensionless features were then utilized to construct high-dimensional features, which include higher-order and interaction terms. Subsequently, the correlation coefficients were computed between the new features and the discharge coefficient. The discharge coefficient is the dependent variable, while the other features are independent variables. If the absolute value of the correlation coefficient between two features is greater than 0.9, the feature with the higher correlation with the discharge coefficient is retained, while the other is removed.

- (3) The regression model was derived using best subset regression, which involves considering all possible combinations of predictor variables. Various criteria such as the Akaike information criterion (AIC) and *R*^{2} were employed to assess the quality of each model. These criteria were computed for each model, and the one with the most favourable values was selected as the optimal discharge coefficient model. Furthermore, computed discharge values and residuals were compared with the findings of previous studies to evaluate the accuracy and practicality of the proposed method.

### Theory background

#### Preprocessing

Preprocessing the raw data can improve data quality and enhance model performance (Habib *et al.* 2023; Yeganeh-Bakhtiary *et al.* 2023). Data preprocessing typically involves data cleaning and feature engineering, which aim to improve the predictive performance of features and to provide faster and more cost-effective features that help explain the underlying process that generated the data. In this study, initial features were extracted based on existing research and physical processes, and dimensionless features were constructed using the Pi-theorem.

#### Correlation analysis

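The correlation coefficient *r* is calculated as follows:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$

where $x_i$ and $y_i$ represent the features; *n* is the sample size; and $\bar{x}$ and $\bar{y}$ represent the sample means of the two variables.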

The correlation coefficient, denoted as *r*, is a measure that ranges from −1 to 1. A value closer to 1 or −1 indicates a stronger linear relationship between the variables, while a value closer to 0 suggests a weak linear relationship. This allows for the identification and removal of inefficient features that exhibit high linear correlation.
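The correlation-based selection can be sketched as follows, under stated assumptions. The linear correlation coefficient is computed between every pair of features; features are then visited in descending order of their absolute correlation with the target, and a feature is kept only if its absolute correlation with every feature already kept does not exceed the 0.9 threshold. The greedy visiting order and the names `pearson_r` and `filter_features` are illustrative choices, and the data are synthetic.

```python
import math


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_features(features, target, threshold=0.9):
    """Keep the feature most correlated with the target out of each
    group of mutually highly correlated features (|r| > threshold)."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson_r(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson_r(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Synthetic example: x2 is nearly collinear with x1, x3 is not
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "x1": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "x2": [1.2, 1.8, 3.1, 4.2, 4.9, 6.1],
    "x3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
kept = filter_features(features, target)
print(kept)
```

Ranking by correlation with the target before filtering makes the outcome independent of the order in which the features happen to be stored.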

#### High-dimensional feature

*n*is the sample size; , represent polynomial coefficient.
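A minimal sketch of how higher-order and interaction terms can be generated from a set of dimensionless features is given below. The polynomial degree (2), the set of terms produced (all products of up to two base features), and the function name `expand_features` are assumptions for illustration; the exact term set used in the study may differ.

```python
from itertools import combinations_with_replacement


def expand_features(features, degree=2):
    """Return base features plus all products of up to `degree` of them
    (squares and pairwise interaction terms for degree=2)."""
    names = sorted(features)
    n_samples = len(next(iter(features.values())))
    expanded = {}
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            values = [1.0] * n_samples
            for name in combo:
                values = [v * x for v, x in zip(values, features[name])]
            expanded["*".join(combo)] = values
    return expanded


# Two hypothetical dimensionless features with three samples each
base = {"a": [1.0, 2.0, 3.0], "b": [2.0, 0.5, 4.0]}
expanded = expand_features(base)
print(sorted(expanded))
```

With *m* base features and degree 2, this produces *m* + *m*(*m* + 1)/2 candidate features, which is why a correlation filter is useful before any exhaustive subset search.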

#### Best subset regression

The best subset regression method is a commonly utilized approach in the field of predictive modeling. It entails fitting a model for every possible combination of features and then selecting the most effective model based on its predictive performance. However, the computing demands of the method grow exponentially with the number of features: for *p* candidate features, approximately 2^{*p*} models must be fitted. It is therefore necessary to perform a correlation analysis first, to identify and remove inefficient features that exhibit high linear correlation. The regression analysis is then performed on the remaining features, which reduces the computational burden of the exhaustive search.
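The exhaustive search can be sketched in a few lines, assuming ordinary least squares with an intercept and the Gaussian-error form of the AIC, n·ln(RSS/n) + 2k, which equals 2K − 2ln(L) up to an additive constant. The helper names (`solve`, `fit_ols`, `best_subset`) and the synthetic data are hypothetical, and the small pure-Python solver stands in for a linear algebra library.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Least squares with intercept via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def aic(rss, n, k):
    """Gaussian-error AIC up to a constant; guarded against rss == 0."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * k


def best_subset(features, y, max_size=None):
    """Fit every feature combination and return the lowest-AIC model."""
    names = sorted(features)
    best = None
    for size in range(1, (max_size or len(names)) + 1):
        for subset in combinations(names, size):
            beta, rss = fit_ols([features[s] for s in subset], y)
            score = aic(rss, len(y), len(subset) + 1)
            if best is None or score < best[0]:
                best = (score, subset, beta)
    return best


# Synthetic data: y depends on x1 and x2 only; x3 is an unrelated feature
features = {
    "x1": [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0],
    "x2": [1.0, 0.2, 0.7, 0.3, 0.9, 0.1, 0.6, 0.4],
    "x3": [0.3, 0.8, 0.6, 0.1, 0.5, 0.9, 0.2, 0.7],
}
noise = [0.002, -0.001, 0.001, -0.002, 0.001, 0.002, -0.001, -0.002]
y = [0.6 - 0.1 * a + 0.05 * b + e
     for a, b, e in zip(features["x1"], features["x2"], noise)]
best = best_subset(features, y)
print(best[1])
```

Three features give only seven candidate models here; the same loop over 19 features would visit more than half a million subsets, which illustrates why the feature space is reduced beforehand.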

#### Performance evaluation of models

In the process of model production, the Akaike information criterion (AIC), the condition number (Cond.), the coefficient of determination (*R*^{2}), and the root mean square percentage error (RMSPE) were set as evaluation criteria.

The AIC is defined as:

$$\mathrm{AIC} = 2K - 2\ln(L)$$

where *K* is the number of parameters to be estimated in the model; *L* is the likelihood function.

AIC is a measure that penalizes the number of parameters in a model by introducing a penalty term of 2*K*. It is used to control the risk of overfitting when the differences in likelihood functions are not significant. A lower AIC value indicates a lower likelihood of overfitting. In general, the model with the minimum AIC value among multiple models is considered to have the best fit, and this approach is known as the minimum AIC estimation method.

*R*^{2}, also known as the coefficient of determination, is a descriptive statistical measure that quantifies the goodness of fit of a model to the observed data. It represents the proportion of the variability in the dependent variable that can be explained by the model. The formula is:

$$R^{2} = 1 - \frac{\sum_{i}(y_i-\hat{y}_i)^{2}}{\sum_{i}(y_i-\bar{y})^{2}}$$

where $y_i$ denotes the observed values; $\hat{y}_i$ represents the fitted values; and $\bar{y}$ represents the mean value of the observed values.

The *R*^{2} statistic serves as a measure of the model's ability to account for the variation in the observed data. The range of *R*^{2} is between 0 and 1. When *R*^{2} is closer to 1, it indicates that the model can effectively explain the variability in the observed data, demonstrating high feature coverage and goodness of fit.
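The two accuracy measures reported for the discharge can be computed as in the sketch below. The *R*^{2} function follows the definition of the coefficient of determination given above; the RMSPE form, 100 · sqrt(mean(((observed − fitted) / observed)²)), is the common definition and is an assumption here, since the study's own RMSPE formula is not shown. The observed and fitted discharges are hypothetical values.

```python
import math


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmspe(observed, fitted):
    """Root mean square percentage error in percent (assumed form)."""
    squared = [((o - f) / o) ** 2 for o, f in zip(observed, fitted)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and fitted discharges (m^3/s)
observed = [200.0, 500.0, 875.0, 1100.0]
fitted = [210.0, 490.0, 880.0, 1080.0]
print(round(r_squared(observed, fitted), 4), round(rmspe(observed, fitted), 2))
```

Because RMSPE weights each error relative to the observed value, low-discharge runs influence it more strongly than they influence *R*^{2}, so the two measures can rank models differently.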

### Selection of parameters and dimensional analysis

During the process of feature selection, it is crucial to conduct the evaluations and selections that accurately capture the intrinsic characteristics of the hydraulic system. The model's accuracy, interpretability, and reliability can be significantly enhanced by incorporating relevant features. Therefore, it is necessary to understand the physical role played by the discharge coefficient when water flows through a gate. It is possible to select features closely related to the research objective based on the understanding of physical characteristics, thereby improving the performance of the model. It is important to ensure that these features can be generalized to other scenarios as well.

*et al.* 2011; Castro-Orgaz *et al.* 2013). In contrast to free flow, submerged flow demonstrates diminished energy loss, thereby supporting the utilization of the downstream Reynolds number as a potential means to characterize these phenomena. Therefore, it is advisable to consider the Reynolds number at the gate opening and in the downstream channel as the initial features of the energy loss coefficient *k*:

*et al.* 2009), lip shape, gate type, and water depth of the approaching channel. Consequently, the contraction coefficient may be influenced by geometric features such as channel width *b* and gate opening *w*, as well as the flow conditions. In empirical techniques, the downstream depth often carries greater significance. Besides the geometric factors, the contraction coefficient varies with the relative opening of the gate at different submergence conditions. The water level difference between the upstream and downstream sections is also utilized because of the downstream influence in a tidal river. As a result, the statement might be interpreted as:

*et al.* 2021), the final expression can be formulated as follows:

## RESULTS

### High-dimensional features and correlation analysis

The correlation coefficients of the features were examined against a threshold of 0.9, based on the findings presented in Figure 2(a). Where two features were highly correlated, the feature with the higher correlation with the discharge coefficient was retained, while the other was removed. As a result of this assessment, one feature was excluded and the remaining dimensionless features were retained. The higher-order terms and interaction terms were constructed from the filtered features using Equation (7). A total of 47 features were obtained, correlation coefficients were then computed, and a part of the result is shown in Figure 2(b). In order to optimize the efficacy of the features, shorten the calculation time, ensure model coverage, and mitigate concerns about collinearity, it becomes important to pre-select a specific subset of features. This can be achieved by sequentially retaining the features that exhibit the highest correlation with *C*_{d}, while eliminating other features whose correlation coefficient with a retained feature exceeds 0.9. The outcome of the feature selection process is depicted in Figure 2(c).

Figure 2(c) illustrates the 19 remaining features. Some of these features demonstrate a strong correlation with *C*_{d}, reaching 0.91. To optimize the search space, the explanatory method of feature correlation selection was employed in place of the conventional heuristic methods typically utilized in general feature selection algorithms. This approach effectively mitigates the influence of the initial model and the selection order on the algorithm. As a result, the best subset regression method can be employed to conduct the regression analysis.

### Calculation of discharge coefficient and flow rate

No. | AIC | Cond. | R^{2} | RMSPE (%) |
---|---|---|---|---|
1 | −138.8 | 96.1 | 0.964 | 5.3 |
2 | −136.9 | 99.8 | 0.957 | 10.5 |
3 | −136.4 | 97.7 | 0.960 | 8.4 |

No. | AIC | Cond. | R^{2} | RMSPE (%) |
---|---|---|---|---|
1 | −138.8 | 96.1 | 0.964 | 5.3 |
2 | −134.9 | 99.1 | 0.960 | 7.0 |
3 | −125.5 | 83.9 | 0.945 | 7.0 |

The validation of Equation (18) and the calculated discharge are shown in Figure 3. In Figure 3, black dots represent the observed values of the discharge coefficient, while coloured dots represent the fitted values of the discharge coefficient. The colour of the dots corresponds to the magnitude of the observed flow rate, and the size of the dots represents the relative error in the flow rate calculation. The left axis scale represents the distribution range of the discharge coefficient. Equation (18) was validated with two sets of free flow data (VF-1, VF-2) and two sets of submerged flow data (VS-1, VS-2), and the discharge validation result is shown in Figure 3(a). The *R*^{2} value for the discharge is 0.991 and the RMSPE is 5.32%, indicating high accuracy and a good fit of the equation. Other fitted results are shown in Figure 3(b)–3(d). The *R*^{2} for the discharge coefficient is 0.964, with an RMSPE of 5.28%, while the *R*^{2} for the calculated discharge of free-submerged flow is 0.993 and the RMSPE is 5.04%. In the fitted results, 75.0% of the data have an error within 5%, and 91.7% of the data have an error within 10%. It is worth noting that when the discharge coefficient value is relatively large, errors ranging from 5 to 15% occur more frequently. For the discharge calculation, the *R*^{2} is 0.950 with an RMSPE of 7.46% in free flow, while the *R*^{2} is 0.999 and the RMSPE is 5.01% in submerged flow. This outcome indicates that the model is more suitable for submerged flow than for free flow, although both flow conditions are calculated with high accuracy.

The calculation method for free flow has reached a high level of maturity, whereas research pertaining to submerged flow is still in progress. In order to evaluate the accuracy of the proposed free-submerged flow model, the present study compares recent research findings on submerged flow with the outcomes obtained from the free-submerged calculation, as presented in Table 4. Table 4 presents the research results of different calibration methods for submerged flow in the past two years. Shakouri *et al.* (2023) utilized the experimental data from Sepúlveda and Bijankhan *et al.* in their study (Toepfer 2008; Bijankhan *et al.* 2017; Shakouri *et al.* 2023). They employed SR as the algorithm, and the calibration approach in Scenario 1 (Sc.1) was as follows: values of *k* and *C*_{c} were computed using a constrained optimization algorithm, and the calibration models of *k* and *C*_{c} were each established using SR. The *C*_{d} was then calculated using Equation (2), and the flow rate was determined using Equation (1). The results indicate that the *R*^{2} is 0.992 and the RMSPE is 5.71% in the discharge calculation. For Scenario 2, the method was to calibrate the *C*_{d} coefficient directly. The flow rate was then calculated using Equation (1). The final results produced a value of 0.991 for *R*^{2} and a value of 5.82% for RMSPE. Vaheddoost *et al.* (2021) utilized the experimental data from Sepúlveda and Rajaratnam and employed the adaptive spline method for *k* calibration (Rajaratnam & Subramanya 1967; Toepfer 2008; Vaheddoost *et al.* 2021). Based on the calibrated *k* values, the value of *C*_{c} was calculated using an empirical relationship formula, followed by computation of the discharge coefficient and flow rate. The *R*^{2} was found to be 0.986, and the RMSPE was 3.10%.

Flow condition | Method | R^{2} | RMSPE (%) | Details |
---|---|---|---|---|
Free-submerged | This study | 0.993 | 5.04 | C_{d}: calibration; Q: Equation (1) |
Free | This study | 0.950 | 7.46 | C_{d}: calibration; Q: Equation (1) |
Submerged | This study | 0.999 | 5.01 | C_{d}: calibration; Q: Equation (1) |
Submerged | Shakouri et al. (2023) Sc.1 | 0.992 | 5.71 | k, C_{c}: calibration; C_{d}: Equation (2); Q: Equation (1) |
Submerged | Shakouri et al. (2023) Sc.4 | 0.991 | 5.82 | C_{d}: calibration; Q: Equation (1) |
Submerged | Vaheddoost et al. (2021) Sc.1 | 0.986 | 3.10 | k, C_{c}: calibration; C_{d}: Equation (1); Q: Equation (1) |

In this study, a direct calibration of the discharge coefficient in both free and submerged flow was employed. This method resulted in an *R*^{2} of 0.993, which is the highest among the four methods considered, and an RMSPE of 5.04%, which is lower than the 5.71% and 5.82% of the two scenarios of Shakouri *et al.* (2023) but higher than the 3.10% of Vaheddoost *et al.* (2021) (Table 4). Overall, the proposed model demonstrates relative stability in terms of error, with only a few data points exhibiting relative errors that exceed 10%, which leaves room for further improvement. For submerged flow alone, the calculation obtained an *R*^{2} of 0.999 and an RMSPE of 5.01%. The result demonstrates that the model is well suited to submerged gate flow with a broad-crested weir. Although the *R*^{2} of free flow is only 0.950, the RMSPE of 7.46% is within the permitted error range. Besides, the number of terms in Equation (18) is not large, and the mapping relationships are relatively simple. The process of feature selection and calculation is not complicated and has the potential for automated calibration. Therefore, it can be concluded that the proposed method for calculating the free-submerged flow discharge through the gate with a broad-crested weir is not only conducive to practical applications but also demonstrates a high level of computational accuracy.

## DISCUSSION

The discharge coefficient is determined by the initial features, which were then utilized to construct high-dimensional features. Once the discharge coefficient function is known, the flow discharge in the vertical sluice gate with a broad-crested weir can be determined by Equation (1) for both submerged and free flow. The evolution and innovation of algorithms have greatly contributed to improving the accuracy of flow calculations. At the same time, subcritical phenomena and air entrainment in flow through gates have remained an enduring and classical area of study in modern hydraulics. However, the generalization ability of the model merits further research with respect to the dimensionless parameters and the functional relationship.
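The best subset regression step referred to above can be sketched as an exhaustive search over feature subsets. The example below is an illustration under stated assumptions, not the calibration code of this study: the feature names `x0` to `x3` are placeholders for constructed dimensionless features, the data are synthetic, and the adjusted $R^{2}$ is assumed as the selection criterion.

```python
from itertools import combinations

import numpy as np

def best_subset(X, y, names, max_terms=3):
    """Fit ordinary least squares on every feature subset of size 1..max_terms
    and return the subset with the highest adjusted R^2 (assumed criterion)."""
    n = len(y)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    best = None
    for k in range(1, max_terms + 1):
        for cols in combinations(range(X.shape[1]), k):
            # Design matrix: intercept column plus the selected features.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            ss_res = float(np.sum((y - A @ coef) ** 2))
            adj_r2 = 1.0 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))
            if best is None or adj_r2 > best["adj_r2"]:
                best = {
                    "features": [names[c] for c in cols],
                    "coef": coef,
                    "adj_r2": adj_r2,
                }
    return best

# Synthetic stand-in data: the "discharge coefficient" depends on x0 and x2 only.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(60, 4))
y = 0.6 + 0.2 * X[:, 0] - 0.1 * X[:, 2] + rng.normal(0.0, 0.005, size=60)

result = best_subset(X, y, names=["x0", "x1", "x2", "x3"])
print(result["features"], round(result["adj_r2"], 4))
```

An exhaustive search is only practical for a modest number of candidate features, which is one reason a correlation-based pre-selection is useful before the regression step.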

When compared with research results from the past two years, Equation (18) is not suitable for the data collected by Shakouri *et al.* (2023) and Vaheddoost *et al.* (2021); similarly, the discharge coefficient equations produced by Shakouri *et al.* (2023) and Vaheddoost *et al.* (2021) are not suitable for the experimental data in this paper. One of the reasons is a significant disparity in the magnitude of the ratio of gate opening to channel width. This disparity arises from differences in geometric features between irrigation projects, shipping hubs, and other hydraulic structures. Therefore, the method of constructing dimensionless parameters deserves further improvement so that it can capture the geometric features of different types of hydraulic engineering.
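One simple way to detect this kind of mismatch before applying a calibrated equation to another data set is to check whether the new data fall inside the calibration range of each dimensionless feature. The sketch below is a hypothetical illustration: the feature name `a_over_B` (gate opening divided by channel width) and all numerical values are assumptions made for the example, not values from this study or from the cited data sets.

```python
def features_outside_range(calibration, candidate):
    """Return, for each feature, the calibration min/max and the candidate
    values that fall outside it. Features fully inside the range are omitted."""
    flagged = {}
    for name, values in calibration.items():
        lo, hi = min(values), max(values)
        outside = [v for v in candidate.get(name, []) if v < lo or v > hi]
        if outside:
            flagged[name] = (lo, hi, outside)
    return flagged

# Hypothetical ratios of gate opening to channel width.
calibration_data = {"a_over_B": [0.010, 0.015, 0.020, 0.030]}
external_data = {"a_over_B": [0.025, 0.120, 0.200]}

print(features_outside_range(calibration_data, external_data))
```

A non-empty result indicates that the equation would be extrapolated for that feature, in which case reduced accuracy should be expected.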

The algorithm and the form of the formula need to be chosen carefully from the perspective of application, and the EM method requires further improvement. In the EM method, the formula for $C_{d}$ expressed in terms of $C_{c}$ and *k* is quite complicated, and the method has its own conditions of application. The calculation of $C_{d}$ based on the algorithm-derived calibration of $C_{c}$ and *k* is overly complicated. The multitude of terms and the intricate mapping relationships would limit the generalization ability and reliability of the formula, even though it can achieve a relatively high level of accuracy in discharge prediction.

The results obtained in this study are limited to a single experimental dataset. To establish a more reliable model, new experiments should be conducted to expand the range of the data. The application of big data, reasonable feature engineering, and evolutionary machine learning algorithms is recommended as a future research direction for modeling.

## CONCLUSIONS

The accurate calculation of flow rate is crucial for a broad range of applications, and the sluice gate with broad-crested weir is one of the most commonly used structures in flow control and regulation. While there are well-established methods for calculating the flow rate under free-flow conditions and extensive research on submerged flow conditions, there is a lack of methods that can be applied to both free and submerged flow conditions. In this study, we conducted physical model experiments and developed a discharge coefficient calculation model using the EM method, construction of high-dimensional features, correlation analysis, and best subset regression. The analysis yielded the following conclusions:

- (1) This paper provides a discharge measurement model for both free and submerged flow in sluice gates with a broad-crested weir, based on the direct calibration of the discharge coefficient. The EM method, high-dimensional feature construction, correlation analysis, and best subset regression were utilized in the model.

- (2) The downstream channel Reynolds number and the water level difference between upstream and downstream were introduced in the model as initial features to enhance the information coverage for explaining the discharge coefficient. The improvement in feature selection is reflected in the final calculation equation, which means the two dimensionless parameters can be considered effective features in future research.

- (3) The results of the model indicate a high level of accuracy. The $R^{2}$ for the discharge coefficient is 0.964 and the RMSPE is 5.30%. In the discharge calculation, the $R^{2}$ reaches 0.993 and the RMSPE is 5.04% for free-submerged flow. Considering each flow regime separately, the calculation of submerged flow obtained an $R^{2}$ of 0.999 and an RMSPE of 5.01%, while the $R^{2}$ for free flow is 0.950 and the RMSPE is 7.46%. The prediction of free flow is not as good as that of submerged flow but is still within the permitted error range.

- (4) The process of feature selection and calculation is not complicated and has potential for automated calibration. The accuracy of the model is comparable to that of recent studies on submerged flow.

- (5) The results obtained in this study are constrained to a single data set. More experiments on free flow can be conducted to establish a more reliable free-submerged flow model. New data sets from different types of hydraulic engineering with different geometric features can be collected to enhance the reliability and generalization ability of the model. In addition, feature engineering and evolutionary algorithms are recommended as directions for future research.

## FUNDING

The work presented in this paper was financially supported by the National Key R&D Program of China (Grant No. 2023YFC3209501) and the National Natural Science Foundation of China (Nos U2040215 and 52079080).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Instrumentation, Model Identification and Control of an Experimental Irrigation Canal* (Doctoral Thesis)

*Wassersprung, Deckwalze und Ausfluss Unter Einer Schütze*