ABSTRACT
This study investigates the discharge coefficient (Cd) of labyrinth sluice gates, a modern gate design with complex flow characteristics. To accurately estimate Cd, regression techniques (linear regression and stepwise polynomial regression) and machine learning methods (gene expression programming (GEP), decision table, KStar, and M5Prime) were employed. A dataset of 187 experimental results, incorporating dimensionless variables of internal angle (θ), cycle number (N), and water depth contraction ratio (H/G), was used to train and evaluate the models. The results demonstrate the superiority of GEP in predicting Cd, achieving a coefficient of determination (R2) of 97.07% and a mean absolute percentage error of 2.87%. To assess the relative importance of each variable, a sensitivity analysis was conducted. The results revealed that the H/G has the most significant impact on Cd, followed by the internal head angle (θ). The cycle number (N) was found to have a relatively insignificant effect. These findings offer valuable insights into the design and operation of labyrinth sluice gates, contributing to improved water resource management and flood control.
HIGHLIGHTS
The discharge coefficient of labyrinth sluice gates was estimated.
Regression techniques (linear regression and stepwise polynomial regression) and machine learning methods (gene expression programming [GEP], decision table, KStar, and M5Prime) were employed.
The results demonstrate the superiority of GEP in predicting Cd, achieving a coefficient of determination (R2) of 97.07% and a mean absolute percentage error of 2.87%.
INTRODUCTION
A sluice gate is an essential hydraulic structure because it serves multiple functions, such as flow measurement, flood control, irrigation and industrial distribution, and navigation management. It is vital in various facilities, including canals, rivers, dams, and wastewater treatment plants, because it can accurately control water flow under different conditions. The standard configuration of a traditional sluice gate is a planar wall placed across the flow direction. However, various attempts have been made to optimize gate design on the basis of investigations into the hydraulic characteristics of sluice gates with diverse configurations under different flow conditions (Shivapur et al. 2005; Mohammed & Khaleel 2013; Mansoor 2014; Salmasi & Abraham 2020a; Wang & Diao 2021; Daneshfaraz et al. 2022; Abbaszadeh et al. 2023). To assess the hydraulic performance of such structures, the discharge coefficient must be determined; it is the key indicator on which design optimization is based.
In recent years, a novel design known as the labyrinth shape has been adopted for various hydraulic structures such as weirs, spillways, and sluice gates. It provides a longer flow path than the traditional form, thereby increasing the discharge coefficient, discharge capacity, and energy dissipation (Sadeq Maatooq & Yaseen Ojaimi 2014; Alfatlawi et al. 2023; Daneshfaraz et al. 2023; Hashem et al. 2024). As mentioned earlier, the discharge coefficient is crucial in optimizing the design of hydraulic structures and must be estimated precisely, particularly for complex configurations such as labyrinth forms. Conventional theoretical and experimental methods have limited applicability because they are laborious, time-consuming, and costly (Silva & Rijo 2017; Lauria et al. 2020; Salmasi & Abraham 2020a; Steppert et al. 2021; Wang & Diao 2021; Salmasi et al. 2022; YoosefDoost & Lubitz 2022; Abbaszadeh et al. 2023; Hashem et al. 2024). Moreover, these techniques may fail to capture the details of intricate flow patterns and the distinct hydraulic behaviors of structures with complex configurations.
Recently, machine learning (ML) approaches have spread rapidly across various fields, including hydraulic engineering, offering promising opportunities to solve complex problems owing to their innate capacity to learn from data and capture interaction patterns. Consequently, complex and nonlinear systems can now be simulated efficiently with ML tools. ML offers a promising way to build precise and flexible models that can estimate discharge coefficients and other key hydraulic characteristics (Silva & Rijo 2017; Lauria et al. 2020; Salmasi & Abraham 2020a; Steppert et al. 2021; Wang & Diao 2021; Salmasi & Abraham 2022; YoosefDoost & Lubitz 2022; Abbaszadeh et al. 2023; Fatehi-Nobarian et al. 2023; Mohammed & Sharifi 2023; Fatehi-Nobarian & Moradinia 2024; Hashem et al. 2024; Mohammed & Sihag 2024).
Various ML techniques have been actively investigated in recent years, building on previous studies that employed genetic programming for expert systems to predict the coefficient of discharge and other hydraulic properties of sluice gates (Salmasi & Abraham 2020b). Many ML algorithms, including gradient boosting machine, random forest, generalized linear models, generalized regression neural network (GRNN), Gaussian process regression, and random tree, have been studied by researchers (Ghorbani et al. 2020; Salmasi et al. 2020). For instance, Naisheng et al. (2021) analyzed ice entrainment through a slide gate using a stacking ensemble model, which combined a support vector machine for classification and the principal component analysis method for dimensionality reduction. Using a long short-term memory (LSTM) neural network, Ho et al. (2022) indicated that their method outperformed more basic classifiers by carefully choosing pertinent input features and discarding unnecessary data. To maximize the operating efficiency of sluice gates, they also used LSTM models for short-term forecasting of water levels over several time steps and obtained accurate results. Yan et al. (2023) reported that convolutional neural networks (CNNs) outperform other conventional techniques, such as genetic programming and neuro-fuzzy inference systems, achieving a coefficient of determination (R2) of up to 90% with reduced computational expense. Further studies have focused on ML model optimization. Zheng et al. (2023) adopted a backpropagation (BP) neural network to create a flow prediction model for a measurement and control gate. By incorporating optimization algorithms such as particle swarm optimization and the genetic algorithm, they improved prediction accuracy and error distribution compared with conventional techniques and unoptimized BP models.
Most recently, several combinations of ML techniques have been developed. For instance, to estimate the coefficient of discharge of labyrinth weirs, Emami et al. (2023) employed a hybrid model that combines eXtreme Gradient Boosting with a novel optimization algorithm, Linear Population Size Reduction Success-History-Based Adaptive Differential Evolution (LSHADE). They demonstrated that the hybrid model outperforms other techniques such as artificial neural networks, gene expression programming (GEP), adaptive neuro-fuzzy inference system with firefly algorithm, and self-adaptive evolutionary extreme learning machine, which points to the potential of integrating various ML techniques to improve accuracy in hydraulic engineering applications.
This study is concerned with estimating the coefficient of discharge of a labyrinth sluice gate using various ML techniques and regression models. The main objective was to introduce a reliable model that could accurately predict the discharge coefficient of this nonstandard structure. To this end, various statistical metrics were used to thoroughly assess each model's performance.
MATERIALS AND METHODS
Experimental model
Channel length (m) | Channel width B (m) | Channel height (m) | Internal angle θ (degree) | US water head H (cm) | Gate opening height G (cm) |
---|---|---|---|---|---|
10 | 0.3 | 0.5 | 45°–90° | 7.2–45 | 2–5 |
Sluice gate discharge coefficient Cd | Number of labyrinth cycles N | Projection length of the labyrinth section l | Discharge rate Q (L3/T) | Total folded length L (m) | |
0.547–1.343 | 1–2 | 0.3–0.7839 | 5.7–26.1 | 0.3 |
Dimensionless parameters
Various factors interact to determine the flow characteristics beneath a sluice gate, which are measured by the discharge rate Q (L3/T) and flow velocity V (L/T). The key characteristics of the gate's geometry are the total folded length L (L), the dimensionless number of labyrinth cycles N, the projection length of the labyrinth section l (L), and the opening height G (L). Further factors are the fluid properties, including the water density ρ (M/L3), and the upstream water conditions, represented by the upstream water head H (L). The combined effect of these parameters must be taken into account to predict and control the flow beneath sluice gates accurately.
Cd is the coefficient of discharge of the labyrinth sluice gate, g is the gravitational acceleration (L/T2), and the other symbols are as defined above.
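The discharge equation itself did not survive in this text. As a rough illustration only, the sketch below assumes the conventional free-flow sluice gate relation Q = Cd · W · G · √(2gH) and solves it for Cd. The choice of the characteristic width W (channel width or folded length) is an assumption, as are the function names and the input values, which are hypothetical and not taken from the experimental dataset.

```python
import math

G_ACCEL = 9.81  # gravitational acceleration in m/s^2


def discharge_coefficient(q_m3s, width_m, opening_m, head_m):
    """Solve the assumed relation Q = Cd * W * G * sqrt(2 g H) for Cd.

    width_m is the characteristic width W; whether the study uses the
    channel width or the folded length here is an assumption.
    """
    return q_m3s / (width_m * opening_m * math.sqrt(2.0 * G_ACCEL * head_m))


def contraction_ratio(head_m, opening_m):
    """Dimensionless water depth contraction ratio H/G."""
    return head_m / opening_m


# Hypothetical inputs in SI units (illustrative, not measured data).
cd = discharge_coefficient(q_m3s=0.0261, width_m=0.3, opening_m=0.05, head_m=0.45)
ratio = contraction_ratio(head_m=0.45, opening_m=0.05)
print(f"Cd = {cd:.3f}, H/G = {ratio:.1f}")
```

Keeping all inputs in SI units avoids the unit mix (cm for H and G, m for widths) that appears in the experimental table.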
Data collection
No. | Indicator | Angle θ (rad) | N | H/G | Cd |
---|---|---|---|---|---|
1. | Maximum value | 1.57 | 2 | 22.5 | 1.343764 |
2. | Minimum value | 0.785 | 1 | 2.075 | 0.547681 |
3. | Average | 1.081649 | 1.481283 | 7.060918 | 0.803087 |
4. | Standard deviation | 0.31623 | 0.500991 | 4.61159 | 0.176994 |
Review of regression methods and ML techniques
This study investigated several regression and ML techniques to find the best model for predicting Cd. The techniques used were linear regression (LR), stepwise polynomial regression (SPR), GEP, KStar (K*), M5Prime (M5P), and decision table (DT), as indicated in Table 3. The three input features described earlier (θ, N, and H/G) were used for the analysis.
Category | Method | Abbreviation |
---|---|---|
Regression | Linear regression | LR |
Regression | Stepwise polynomial regression | SPR |
Gene algorithms | Gene expression programming | GEP |
Rules algorithms | Decision table | DT |
Lazy algorithms | KStar | K* |
Trees algorithms | M5Prime | M5P |
LR model
Stepwise polynomial regression model
This method, called stepwise polynomial regression (SPR), builds equations to describe relationships between data points. It starts with a simple equation and then refines it by adding or removing factors according to how much they improve the fit. This step-by-step process helps identify the key factors that most influence the outcome. There are different approaches within SPR, but popular ones add or remove one factor at a time, either adding important factors first (forward selection) or removing unimportant ones first (backward elimination). Some approaches combine these techniques (Flom & Cassell 2007). Although forward selection and backward elimination are common in SPR, they have drawbacks: they do not consider how adding or removing one factor might affect other factors already included. For example, a seemingly important factor chosen first in forward selection might become less significant as more factors are added.
Similarly, a factor removed early in backward elimination might become relevant once others are excluded. To address this limitation, SPR uses a modified forward selection process. At each step, it checks the significance of all previously chosen factors. If any no longer meets the retention criterion, the method switches to backward elimination, removing variables one by one until all remaining ones are statistically significant. It then switches back to forward selection to search for further important factors. This back-and-forth approach helps SPR build a more robust model by constantly reevaluating the influence of each factor in the presence of the others (Neter 1983).
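The forward-backward procedure described above can be sketched as follows. This is a minimal illustration rather than the software used in the study: it assumes partial F-statistics with hypothetical entry and removal thresholds (`f_in`, `f_out`) in place of p-values, and it uses only the standard library, so the least-squares fit is solved through the normal equations.

```python
def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations (A^T A | A^T y), solved by Gauss-Jordan.
    m = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(p):
            if r != k:
                factor = m[r][k] / m[k][k]
                m[r] = [u - factor * v for u, v in zip(m[r], m[k])]
    beta = [m[k][p] / m[k][k] for k in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(x_cols, y, terms, term):
    """F-statistic for `term`, given the other entries of `terms`."""
    full = ols_rss([x_cols[k] for k in terms], y)
    reduced = ols_rss([x_cols[k] for k in terms if k != term], y)
    dof = len(y) - len(terms) - 1
    return (reduced - full) / (full / dof)


def stepwise(x_cols, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Forward selection with a backward check after every addition."""
    selected = []
    for _ in range(max_steps):
        candidates = [k for k in range(len(x_cols)) if k not in selected]
        scores = {k: partial_f(x_cols, y, selected + [k], k) for k in candidates}
        if not scores or max(scores.values()) < f_in:
            break  # no remaining term is worth adding
        selected.append(max(scores, key=scores.get))
        # Backward step: drop terms that lost significance.
        while len(selected) > 1:
            f_vals = {k: partial_f(x_cols, y, selected, k) for k in selected}
            worst = min(f_vals, key=f_vals.get)
            if f_vals[worst] >= f_out:
                break
            selected.remove(worst)
    return sorted(selected)
```

Setting `f_out` slightly below `f_in` is the usual guard against a term being added and removed in an endless cycle. For polynomial regression, the squared and interaction terms would simply be supplied as extra candidate columns.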
GEP model
GEP, introduced by Ferreira as an improvement on genetic algorithms, offers a faster route to finding solutions (de Almeida Peres et al. 2011). In experiments, GEP converges on solutions more quickly than older methods. Its transparency is a further advantage, as GEP directly manipulates the program's building blocks (chromosomes). GEP works iteratively. First, it generates random program instructions (chromosomes). These instructions are then converted into tree structures (expression trees) that represent the actual programs. Each program is evaluated for its performance (fitness), and successful ones are chosen to create new variations (offspring) with potentially improved performance through genetic operations. This cycle continues for a set number of generations or until a program with the desired outcome is found (Ferreira 2004).
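To make the chromosome-to-tree conversion concrete, the sketch below decodes a single gene written as a linear string of symbols into an expression tree, filling the tree level by level, and then evaluates it. The symbol set (`Q` for square root, terminals `a` to `d`) and the helper names are illustrative assumptions, not the configuration used in this study.

```python
import math
import operator
from collections import deque

# Function symbols with their arity; any other symbol is a terminal.
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    "Q": (1, math.sqrt),
}


def decode(gene):
    """Breadth-first decoding of a gene string into [symbol, children] nodes."""
    root = [gene[0], []]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS.get(node[0], (0, None))[0]
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root  # symbols after the last one consumed are simply unused


def evaluate(node, values):
    """Recursively evaluate an expression tree for given terminal values."""
    symbol, children = node
    if symbol not in FUNCTIONS:
        return values[symbol]
    func = FUNCTIONS[symbol][1]
    return func(*(evaluate(child, values) for child in children))


# "Q*+-abcd" encodes sqrt((a + b) * (c - d)).
tree = decode("Q*+-abcd")
print(evaluate(tree, {"a": 1.0, "b": 2.0, "c": 5.0, "d": 2.0}))
```

Because decoding stops once every function has its arguments, trailing symbols in a gene are ignored, which is why genetic operators can modify a chromosome without producing an invalid program.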
M5Prime (M5P) (M5 prime decision tree regression algorithm)
M5P trees, developed by Quinlan (1992), offer a distinctive approach to regression problems. They build a decision tree in which each leaf models a specific segment of the data with a simple LR model. Unlike methods that require the data to be discretized beforehand, M5P trees can handle continuous values and complex relationships directly, which allows them to discover even subtle patterns. To prevent overly complex trees, M5P employs a two-step process: it first builds a large tree and then prunes it by replacing subtrees with simpler LR models. M5P trees are, therefore, an effective tool for handling regression tasks.
KStar (K*) (instance-based classifier algorithm)
The K* method overcomes limitations of traditional instance-based classifiers when used for regression tasks (Steppert et al. 2021). It measures the similarity between data points with an entropy-based metric that expresses the likelihood of transforming one point into another. This approach handles symbolic data, real numbers, and even missing values, all of which are frequent problems in practical applications. In contrast to other methods that struggle with these complexities, K* provides a theoretically sound and consistent method of data analysis, making it a valuable tool for regression problems (Painuli et al. 2014).
Decision table
A DT uses labeled training examples to determine the most likely class for newly collected data. Its essential components are the prelabeled examples with their corresponding labels and the conditions that define the classification criteria. Based on these criteria, the DT looks for an exact match when classifying a new, unlabeled instance. If a match is found, the label of the matching instance is assigned. If no match is found, the DT assigns the table's most prevalent class. Although the underlying mathematics of error estimation is more involved, the essential idea is that DTs use labeled examples and predefined conditions to predict labels for new data (Ayati et al. 2019).
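The lookup-with-fallback behavior described above can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: it omits the feature-selection search that decides which attributes form the table, and the attribute values and labels in the example are hypothetical.

```python
from collections import Counter


def build_table(rows, labels):
    """Group training labels by their exact attribute combination."""
    table = {}
    for row, label in zip(rows, labels):
        table.setdefault(tuple(row), []).append(label)
    # Fallback: the most prevalent label in the whole training set.
    default = Counter(labels).most_common(1)[0][0]
    return table, default


def predict(table, default, row):
    """Return the majority label of the matching entry, else the default."""
    matches = table.get(tuple(row))
    if matches is None:
        return default
    return Counter(matches).most_common(1)[0][0]


# Hypothetical training examples: (angle class, cycle number) -> label.
rows = [("45", 1), ("45", 1), ("90", 2), ("60", 1)]
labels = ["low", "low", "high", "low"]
table, default = build_table(rows, labels)
print(predict(table, default, ("90", 2)))  # exact match in the table
print(predict(table, default, ("75", 2)))  # no match, falls back to default
```

For a numeric target such as Cd, the same structure would apply with the majority label replaced by the mean of the matching target values.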
In summary, the ML methods GEP, DT, K*, and M5P were chosen for evaluating the coefficient of discharge of a labyrinth sluice gate because of their ability to handle complex relationships and noise and to produce interpretable models.
Statistical assessment
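The defining equations of this section did not survive in this text. The sketch below therefore assumes the conventional definitions of the indicators reported later for the models (Pearson R, MAE, RMSE, RAE, and RRSE), together with the mean absolute percentage error quoted in the abstract. The function name and sample values are illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """Common error indicators under their conventional definitions."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # Spread of the observations around their mean (naive predictor).
    abs_dev = sum(abs(a - mean_a) for a in actual)
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    sq_dev_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    return {
        "pearson_r": cov / math.sqrt(sq_dev * sq_dev_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / abs_dev,
        "rrse_pct": 100.0 * math.sqrt(sq_err / sq_dev),
        "mape_pct": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Illustrative values only, not data from the study.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

RAE and RRSE express the model error relative to a predictor that always returns the mean of the observations, so values well below 100% indicate that the model improves on that baseline.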
RESULTS AND DISCUSSION
Derived LR and SPR models
No. | Term | Coefficient | p-value |
---|---|---|---|
1. | Constant | 0.9776 | 0 |
2. | θ | −0.2977 | 0 |
3. | N | −0.0368 | 0.025 |
4. | H/G | 0.02841 | 0 |
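The LR equation is not reproduced in this text, but the coefficients tabulated above imply a linear form. The sketch below assumes Cd = 0.9776 − 0.2977θ − 0.0368N + 0.02841(H/G), with θ in radians as in the data summary; the function name and the sample inputs are illustrative.

```python
def cd_linear(theta_rad, n_cycles, h_over_g):
    """Linear regression estimate of Cd from the tabulated coefficients."""
    return (0.9776
            - 0.2977 * theta_rad
            - 0.0368 * n_cycles
            + 0.02841 * h_over_g)


# Illustrative inputs inside the experimental ranges.
print(round(cd_linear(theta_rad=1.0, n_cycles=1, h_over_g=5.0), 5))
```

The estimate should only be used inside the experimental ranges of the inputs, since a linear fit gives no guarantee outside them.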
Equation (15) establishes a nonlinear relationship between the response variable and the three investigated factors and shows that most of the investigated variables substantially influence the coefficient of discharge, Cd. As shown in Table 5, the linear terms of θ and H/G significantly affect Cd, with p-values less than 0.05, whereas the variable N has an insignificant effect on Cd, indicated by a p-value greater than 0.05. Moreover, the quadratic terms for θ and H/G also show a statistically significant impact on Cd. Additionally, the interaction between θ and H/G is statistically significant for Cd, while the interaction between N and H/G is not, as indicated by a p-value greater than 0.05.
No. | Term type | Term | Coefficient | p-value |
---|---|---|---|---|
1. | Constant | | 0.8361 | 0.000 |
2. | Linear | θ | −0.543 | 0.000 |
3. | Linear | N | −0.0155 | 0.295 |
4. | Linear | H/G | 0.10224 | 0.000 |
5. | Square | θ × θ | 0.2033 | 0.001 |
6. | Square | H/G × H/G | −0.001669 | 0.000 |
7. | Interaction | θ × H/G | −0.03884 | 0.000 |
8. | Interaction | N × H/G | −0.00271 | 0.136 |
Note. Terms with p-values greater than 0.05 (N and N × H/G) have an insignificant effect.
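Equation (15) is not reproduced in this text. Assuming it is the second-order polynomial formed from all the tabulated terms (the authors' final equation may omit the insignificant ones), it can be evaluated as in the sketch below, with θ in radians. The function name and sample inputs are illustrative.

```python
def cd_polynomial(theta_rad, n_cycles, h_over_g):
    """Second-order estimate of Cd assembled from the tabulated coefficients."""
    return (0.8361
            - 0.543 * theta_rad
            - 0.0155 * n_cycles
            + 0.10224 * h_over_g
            + 0.2033 * theta_rad ** 2
            - 0.001669 * h_over_g ** 2
            - 0.03884 * theta_rad * h_over_g
            - 0.00271 * n_cycles * h_over_g)


# Illustrative inputs inside the experimental ranges.
print(round(cd_polynomial(theta_rad=1.0, n_cycles=1, h_over_g=5.0), 6))
```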
Derivation of the GEP model
The target value for fitness case j is Tj, the range of selection is M, Ct is the total number of fitness cases, and the value returned by chromosome i for fitness case j is C(i,j). The absolute error |C(i,j) − Tj| (the precision) is set to zero if it is less than or equal to 0.01. Note that the system is capable of determining its ideal solution when given this type of fitness function (Ciftci et al. 2009). The second crucial step involves selecting the set of terminals (T) and the set of functions (F) to construct the chromosomes. The terminal set comprises the independent variables, namely T = {θ}, T = {N}, T = {H/G} and T = {θ, N, H/G}. Determining the optimal function set is more difficult, but a well-founded estimate can be made that covers all essential functions, as documented in de Almeida Peres et al. (2011). In this context, the function set is deliberately chosen to incorporate the four fundamental arithmetic operators (‘+’ addition, ‘−’ subtraction, ‘×’ multiplication, and ‘÷’ division) alongside a selection of basic mathematical functions: powers (X2, X3, X4, X5), square root (Sqrt), power of 10 (Pow10), natural logarithm (Ln), base-10 logarithm (Log), and cubic root (3Rt).
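The fitness equation is missing from this text. The sketch below assumes the absolute-error fitness that matches the symbols defined above, namely the sum over all fitness cases of M − |C(i,j) − Tj|, with errors at or below the precision treated as zero. The selection range of 100 and the sample values are hypothetical.

```python
def absolute_error_fitness(returned, targets, selection_range=100.0, precision=0.01):
    """Fitness of one chromosome over all fitness cases.

    returned: values C(i,j) produced by the chromosome
    targets:  target values Tj
    """
    total = 0.0
    for c_ij, t_j in zip(returned, targets):
        error = abs(c_ij - t_j)
        if error <= precision:
            error = 0.0  # within precision counts as an exact hit
        total += selection_range - error
    return total


# Three fitness cases: exact, within precision, and off by 0.5.
print(absolute_error_fitness([1.0, 2.005, 3.5], [1.0, 2.0, 3.0]))
```

Under this definition the maximum attainable fitness is Ct × M, reached when every case is matched to within the precision.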
Selecting the chromosomal tree's structure, namely the length of the head and the quantity of genes, is the third and most important step. At first, two distinct head lengths and a single gene were used in the GEP model. Then, as each model's training and testing results were tracked, the number of genes and head lengths gradually increased in each run. The study's number of genes and head length were ascertained after multiple trials, as shown in Table 6.
No. | Parameter definition | GEP |
---|---|---|
1. | Function set | +, −, *, /, X2, X3, X4, X5, EXP, Sqrt, Pow10, Ln, Log, 3Rt |
2. | Number of chromosomes | 64 |
3. | Head size | 7 |
4. | Number of genes | 3 |
5. | Linking function | Addition |
6. | Generation without change | 2,000 |
7. | Number of tries | 3 |
8. | Max complexity (Genes) | 5 |
9. | Mutation rate | 0.00138 |
10. | Inversion rate | 0.00546 |
11. | One-point recombination rate | 0.00277 |
12. | Two-point recombination rate | 0.00277 |
13. | Gene recombination rate | 0.00277 |
14. | Gene transposition rate | 0.00277 |
Derived ML models
Table 7 presents the performance evaluation metrics (Pearson R, MAE, RMSE, RAE, and RRSE) for the different models in the training and testing phases. These metrics are based on the difference between the actual and predicted values. The analysis shows that the K* model estimates Cd (the coefficient of discharge) better than the other models during the training phase on most indicators: it has the highest Pearson R and the lowest MAE and RAE, while its RMSE and RRSE are marginally higher than those of M5P. Additionally, the results indicate that the M5P model outperforms the DT model in predicting Cd. The models can be ranked from best to worst during the training stage as K*, M5P, and DT.
Statistical parameters | K* | M5P | DT |
---|---|---|---|
Training set | | | |
Pearson R | 0.9875 | 0.9748 | 0.9673 |
MAE | 0.0291 | 0.0307 | 0.0325 |
RMSE | 0.0391 | 0.039 | 0.0427 |
RAE | 21.39% | 22.57% | 23.85% |
RRSE | 22.76% | 22.72% | 25.35% |
Testing set | | | |
Pearson R | 0.97 | 0.9731 | 0.7534 |
MAE | 0.0436 | 0.0348 | 0.0709 |
RMSE | 0.0579 | 0.0422 | 0.1234 |
RAE | 29.95% | 23.92% | 48.69% |
RRSE | 31.72% | 23.11% | 67.54% |
With the lowest MAE of 0.0348, RMSE of 0.0422, RAE of 23.92%, and RRSE of 23.11%, and the highest Pearson R of 0.9731, the M5P model performs best during the testing phase. During testing, the models rank, from best to worst, as M5P, K*, and DT. According to these results, the M5P and K* models can predict Cd accurately for this dataset.
ACCURACY ASSESSMENT OF MODELS
Various methodologies can be employed to evaluate the derived models. In the current study, the models are evaluated using common performance metrics and the Taylor diagram.
Taylor diagram
The Taylor diagram is a useful visual aid for evaluating the performance of prediction models. By displaying each model's closeness to the reference point, which represents the actual values, it makes it possible to identify the most accurate and dependable model (Taylor 2001; Band et al. 2021). Three parameters determine a model's location on the Taylor diagram: the correlation coefficient (represented by the radial lines), the standard deviation (shown on the horizontal and vertical axes), and the RMSE (shown by the circular lines centered at the reference point). The most accurate model is the one closest to the reference point (Band et al. 2021).
Performance metrics
To assess a model's predictive accuracy, the most prevalent performance metrics are the coefficient of determination R2, adjusted R2, predicted R2, p-value, and F-value; Table 8 reports these for each model. Table 7 presents the outcomes of the three ML techniques (K*, M5P, and DT) employed to estimate Cd. From a statistical standpoint, ML techniques with high R2 values and low error measures generally exhibit strong performance (Harith et al. 2021, 2023, 2024b). The results of a single-factor analysis of variance (ANOVA) indicate a significant difference among the various ML techniques. The statistical analysis in Table 7 demonstrates that the K* and M5P algorithms yield high-quality predictions, whereas the DT algorithm is less accurate in fitting the data and generating predictions.
The order of accuracy of the ML techniques in predicting Cd, from highest to lowest, is K*, M5P, and DT.
In the comparison between all techniques used in this study, GEP outperforms all other models, while DT achieves the lowest accuracy. The significance of the techniques was assessed with the F-test and the p-value. The p-values for all techniques were reported as 0.000, that is, below 0.05, which confirms their significance. By convention, a model is considered significant if it has a high F-value (Gharehbaghi et al. 2023). The F-values for GEP, K*, SPR, M5P, LR, and DT were 6120.88, 4247.12, 257.81, 3409.59, 118.54, and 562.18, respectively, as shown in Table 8, confirming the significance of all techniques.
Type of statistical function | LR | SPR | GEP | K* | M5P | DT |
---|---|---|---|---|---|---|
R2, % | 78.05 | 94.95 | 97.07 | 95.83 | 94.85 | 75.24 |
Adjusted R2, % | 77.39 | 94.58 | 97.05 | 95.80 | 94.83 | 75.11 |
Predicted R2, % | 74.94 | 92.60 | 97.00 | 95.71 | 94.73 | 74.89 |
Difference between adj. R2 and Pred. R2 | 2.45 | 1.98 | 0.05 | 0.09 | 0.10 | 0.22 |
p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
F-value | 118.54 | 257.81 | 6120.88 | 4247.12 | 3409.59 | 562.18 |
The adequacy of the fit can be evaluated by inspecting the proportion of variance explained (R2) and confirming that the difference between the predicted R2 and the adjusted R2 does not exceed 20% (Harith 2023; Harith et al. 2024c). The R2 values were 97.07, 95.83, 94.95, 94.85, 78.05, and 75.24% for the GEP, K*, SPR, M5P, LR, and DT models, respectively. The corresponding residual (unexplained) variances were 2.93, 4.17, 5.05, 5.15, 21.95, and 24.76%, so the GEP, K*, SPR, and M5P models captured a high proportion of the variation, while LR and DT left more than a fifth unexplained.
Overall, the statistical analysis confirms the strong performance of the GEP, K*, SPR, and M5P models, indicating a close correlation between the Cd values obtained from experiments and those computed by the models.
Sensitivity analysis
For the input variable xi, the computed Cd values at maximum and minimum are denoted as fmax(xi) and fmin(xi), respectively, while the other input variables are held constant at their average values. Figure 8 presents the results of the sensitivity analysis (SA) of Cd with respect to internal angle (θ), cycle number (N), and water depth contraction ratio (H/G). As shown in Figure 8, the water depth contraction ratio is the predominant parameter, with an SA index of approximately 63.42%, followed by 31.96% for the internal head angle and 4.62% for the cycle number, indicating that the cycle number has only a minor effect on Cd values.
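The SA equations are not reproduced in this text. The sketch below assumes the commonly used range-based index, in which the span of variable i is fmax(xi) − fmin(xi) and its SA index is that span as a percentage of the sum of all spans. The toy model, the function name, and the grid resolution are hypothetical.

```python
def sensitivity_indices(model, ranges, means, steps=50):
    """Range-based SA index (%) of each input, others held at their means.

    model:  callable taking the inputs as keyword arguments
    ranges: {name: (minimum, maximum)} for each input
    means:  {name: average value} for each input
    """
    spans = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for k in range(steps + 1):
            point = dict(means)  # every other input stays at its average
            point[name] = low + (high - low) * k / steps
            outputs.append(model(**point))
        spans[name] = max(outputs) - min(outputs)
    total = sum(spans.values())
    return {name: 100.0 * span / total for name, span in spans.items()}


# Toy model for illustration only: the output depends three times more on a.
result = sensitivity_indices(
    lambda a, b: 3.0 * a + b,
    ranges={"a": (0.0, 1.0), "b": (0.0, 1.0)},
    means={"a": 0.5, "b": 0.5},
)
print(result)
```

Scanning a grid rather than evaluating only the two endpoints keeps the span correct when the model is not monotonic in a given input.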
Comparison with previous study
CONCLUSIONS
This study successfully applied regression and ML techniques to predict the discharge coefficient (Cd) of labyrinth sluice gates. The results demonstrate the superiority of ML models, particularly GEP, in accurately forecasting Cd based on fundamental gate properties.
An SA revealed that the water depth contraction ratio (H/G) is the most influential variable, accounting for 63% of the total variation in Cd. The internal head angle (θ) contributes 32%, while the cycle number (N) has a relatively minor impact of 5%. These findings provide valuable insights for the design and operation of labyrinth sluice gates, enabling hydraulic engineers to accurately estimate Cd and optimize gate performance.
While the GEP model offers exceptional accuracy, expanding the training dataset with a wider range of experimental conditions would further enhance its versatility and applicability. This study contributes to the advancement of water resource management and flood control by providing a reliable and efficient tool for predicting the discharge characteristics of labyrinth sluice gates.
FUNDING
This research received no funding.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.