ABSTRACT
Tyrolean weirs can be an effective solution to floating debris and sediment deposition in run-of-river hydropower stations. To improve the efficiency and accuracy of calculating this structure's water intake capacity, the integrated (ensemble) learning algorithm random forest (RF), the firefly algorithm (FA), and the exponential distribution optimizer (EDO) are used to develop prediction models for the Tyrolean weir discharge coefficient Cd and water capture capacity (qw)i/(qw)T. Sobol's method and SHAP theory are introduced to analyze the above parameters quantitatively and qualitatively. The results show that EDO-RF is the optimal prediction model for the Tyrolean weir's discharge coefficient and that the Froude number Fr has the greatest influence on the Cd prediction results: when Fr < 30, Fr has a negative influence on the model prediction results, which grows as L/e increases; when Fr > 30, Fr has a positive influence, which also grows as L/e increases. FA-RF is the optimal prediction model for the Tyrolean weir water capture capacity (qw)i/(qw)T, with the ratio of bar length to bar spacing L/e having the greatest influence: when L/e < 20, L/e has a negative influence on the model prediction results, which grows as Fr increases; when L/e > 20, L/e has a positive influence, which also grows as Fr increases.
HIGHLIGHTS
The calculation model of Tyrolean weir discharge coefficient and water capture capacity is established.
The influence of Tyrolean weir hydraulic parameters on discharge coefficient and water capture capacity is quantified.
The discharge mechanism of Tyrolean weir is revealed.
INTRODUCTION
Tyrolean weirs first appeared in a stream water intake in the Alps (Arbolí & Polimanti 2018). Nowadays, the structure is widely used in hydroelectric engineering projects, especially in areas with steep riverbed slopes. With the increase in extreme weather, rain and flood disasters have been frequent in recent years. As a result, sediment, stones, and floating objects easily enter the hydropower system, damaging the hydro-mechanical equipment, especially the turbine blades (Özkaya 2015). This contributes significantly to unequal water distribution and increases the cost of equipment maintenance in hydropower systems. Tyrolean weirs can effectively solve this problem, as the structure separates the sediment deposits and transports water directly to the hydropower system.
Therefore, many scholars have studied the hydraulic characteristics of the Tyrolean weir. Yılmaz (2010) investigated the effects of Tyrolean weir rack angle (θ = 14.5°, 9.6°, and 4.8°) and bar spacing (3, 6, and 10 mm) on the water capture efficiency and discharge coefficient of this structure based on physical modeling experiments. He also derived a dimensionless formula for the discharge coefficient and analyzed the variation of the discharge coefficient Cd with each dimensionless parameter. On this basis, Yıldız et al. (2021) carried out further model experiments for rack angles of θ = 25° and 18° and compared and verified the experimental results with FLOW-3D software. They also gave expressions for the discharge coefficient at the two angles, with coefficients of determination R2 = 0.838 and 0.825, respectively. Racks (2012) conducted similar experiments, which showed that the discharge coefficient Cd generally increases with the Froude number Fr, and that the water capture capacity (qw)i/(qw)T depends mainly on the ratio of bar length to bar spacing L/e and on Fr. In addition, for the design of Tyrolean weirs and related problems, Gufler et al. (2023) proposed a new semi-analytical method, and Maraş (2014) proposed theoretical and experimental methods. The above studies show that scholars are currently focusing on the discharge coefficient and water capture efficiency of the Tyrolean weir.
However, the discharge in a Tyrolean weir is divided into directed discharge (qw)i and diverted discharge (qw)d, and (qw)d is mainly related to parameters such as rack slope, bar length, distance between bars, and bar diameter. This makes the structure's discharge mechanism much more complicated than that of other weir types. Therefore, there is an urgent need for an in-depth study of the discharge characteristics of this structure. Many scholars have used soft computing methods to analyze and calculate the discharge characteristics of weirs (Bijanvand et al. 2023; Kartal et al. 2024; Rahmanshahi et al. 2024; Seyedian et al. 2023; Seyedian & Kisi 2024). Balahang & Ghodsian (2023) conducted a triangular sharp side weir model experiment to obtain the primary dataset. Then, a particle swarm optimization-support vector regression model was used to perform the sensitivity analysis of the dimensionless parameters affecting the discharge coefficient of this structure. The results showed that the sensitivity coefficient of Fr was the largest, and the sensitivity coefficient of the weir crest angle θ was the smallest. Emami et al. (2023) evaluated the discharge coefficient of a pseudo-cosine labyrinth weir using a hybrid machine learning algorithm (LXGB) and seven input combinations; the best model achieved R2 = 0.971, Root Mean Square Error (RMSE) = 0.014, and Nash-Sutcliffe Efficiency coefficient (NSE) = 0.97. Iqbal & Ghani (2024) evaluated the prediction ability of four intelligent models and different parameter combinations for the Piano Key Weir, and the results showed that the artificial neural network (ANN-22) had the best performance. Meanwhile, sensitivity analysis showed that the ratio of head above the crest to weir height h/P was the most influential parameter for Cd. Majedi-Asl et al. (2024) compared the prediction accuracy of four models for the discharge coefficient of the labyrinth weir, and the results showed the ANN model's high performance, with evaluation metrics of R2 = 0.998, D.C. coefficient = 0.996, and RMSE = 0.006. Similarly, the ratio of total head to weir height HT/P had the greatest influence on the structure's discharge capacity. Hosseini et al. (2016) optimized the construction cost of the labyrinth spillway design and concluded that the differential evolution algorithm and the genetic algorithm could reduce the construction cost by 19.3 and 16.6%, respectively.
Besides, to explain how each input parameter affects the prediction results of a model, the critical features linking the input parameters to the prediction results need to be mined. Many scholars have carried out thorough research on this issue. Chadalawada et al. (2020) and Herath et al. (2021) developed physically meaningful rainfall–runoff models based on genetic programming algorithms and the Machine Induction Knowledge Augmented – System Hydrologique Asiatique framework to help better understand hydrological processes. Cai et al. (2022) proposed a novel approach that incorporates the water balance mechanism, and this methodology provided a promising direction for interpretable studies in hydrologic modeling. Jiang et al. (2022) explained the internal mechanisms of Long Short-Term Memory (LSTM) networks using two approaches. They showed that 70.7% of the investigated catchments were dominated by a single flooding mechanism (11.9% snowmelt-dominated, 34.4% recent rainfall-dominated, and 24.4% historical rainfall-dominated), with the remaining 29.3% dominated by a mixed mechanism. Also, Cai et al. (2024) used this approach to show that temperature-related features were large-scale groundwater drought drivers, increasing in importance from 56.1 to 63.1% as the drought event neared from 6 months to 15 days. Zhan et al. (2024) proposed a sequential identification method to improve the interpretability of the resulting partial differential equations. Shen et al. (2022) quantified the discharge coefficient of triangular side orifices using Sobol's method and the sparrow search algorithm-back propagation neural network. Li et al. (2024) quantified the contribution of the dimensionless parameters of a semi-circular labyrinth side weir to the discharge coefficient. Fu et al. (2024) introduced a new hybrid learning method together with the SHapley Additive exPlanations (SHAP) value and obtained the significance of different data features in the prediction process. Oyeyi et al. (2024) verified the effectiveness of the model parameters using XGBoost and SHAP. However, there is a lack of research on the discharge mechanism underlying the complex hydraulic characteristics of a Tyrolean weir under supercritical flow, in particular on how each dimensionless parameter affects the discharge coefficient and water capture efficiency.
This study was conducted to explain the Tyrolean weir's complex discharge mechanism and the influence of physical parameters on water capture efficiency under supercritical flow. The integrated learning model RF is adopted, and its hyperparameters are searched by the firefly algorithm (FA) and the exponential distribution optimizer (EDO), which not only mitigates model overfitting and the difficulty of obtaining the optimal combination of parameters, but also reduces the time and computational cost of manual parameter tuning. Sobol's method and SHAP theory were introduced to quantify the influence of the model input parameters, improving the interpretability of the model prediction results. Finally, the sensitivity analysis of each dimensionless parameter was conducted to reveal the prediction mechanism of the discharge coefficient and the water capture capacity calculation model.
MATERIAL AND METHODS
Dimensional analysis for water intake capacity of Tyrolean weirs
Data collection
The dataset is derived from Tyrolean weir model experiments (Yılmaz 2010; Yıldız et al. 2021), and the physical model system consists of components such as the main channel, side channel, inlet pipe, and rack. The diameter of the rack bars is 1 cm, and the bar spacings are e1 = 3 mm, e2 = 6 mm, and e3 = 10 mm. Yılmaz (2010) tested rack slope angles of 4.8°, 9.6°, and 14.5°; Yıldız et al. (2021) tested rack slope angles of 18° and 25°. The dataset characteristics are shown in Table 1.
Parameters | Fr | L/e | e (mm) | θ (°) | Cd | (qw)i/(qw)T |
---|---|---|---|---|---|---|
Min | 1.37 | 5 | 3 | 4.8 | 0.08 | 0.21 |
Max | 103.79 | 133.33 | 10 | 25 | 0.95 | 1.00 |
Mean | 29.39 | 32.73 | 6.33 | 12.94 | 0.44 | 0.84 |
Median | 19.43 | 25 | 6.00 | 14.50 | 0.43 | 0.96 |
Standard deviation | 27.10 | 28.71 | 2.87 | 6.25 | 0.15 | 0.22 |
Integrated learning model
Integrated learning, more commonly called ensemble learning, improves a model's predictive and generalization ability by combining multiple separate base learning algorithms (Hansen & Salamon 1990). The main idea is to use the prediction results of multiple weak learners to make up for the weaknesses of a single learner, thus obtaining a strong learner with higher predictive ability. Integrated learning can be categorized as homogeneous or heterogeneous according to the composition of the base learners.
Random forest
Random forest (RF) obtains the final prediction by building multiple decision trees and combining the predictions from all of them (Breiman 2001). It is not a separate machine learning method, but an integrated algorithm based on decision trees. The algorithm's core idea is to aggregate multiple learners so that, even if one learner makes an incorrect prediction, the others can correct or reduce the error. Thus, the overfitting of a single learner is diminished, and stronger generalization ability and higher prediction accuracy are obtained. In constructing the decision trees, RF employs two randomization methods: bootstrap sampling of the training data and random sampling of the candidate features. The bootstrap method generates m training sets, and a decision tree is constructed from each set. When a node is split, a portion of the features is randomly sampled, the optimal split is sought among the sampled features, and this split is applied to the node.
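The two randomization steps can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: the toy data, the function names, and the settings (n_trees, max_features, max_depth) are hypothetical and chosen only to show bootstrap sampling, per-node feature sampling, and averaging of tree predictions.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(values):
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) with the lowest summed squared error, or None."""
    best, best_cost = None, float("inf")
    for f in feature_ids:
        # Dropping the largest value keeps the right-hand side non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best

def build_tree(rows, targets, max_features, max_depth, rng):
    if max_depth == 0 or len(set(targets)) == 1:
        return avg(targets)  # leaf: mean target of the node
    # Randomization 2: only a random subset of features is tried at this node.
    feature_ids = rng.sample(range(len(rows[0])), max_features)
    split = best_split(rows, targets, feature_ids)
    if split is None:
        return avg(targets)
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [targets[i] for i in left],
                       max_features, max_depth - 1, rng),
            build_tree([rows[i] for i in right], [targets[i] for i in right],
                       max_features, max_depth - 1, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, targets, n_trees, max_features, max_depth, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Randomization 1: bootstrap sample (drawn with replacement) per tree.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                 max_features, max_depth, rng))
    return forest

def predict_forest(forest, row):
    return avg([predict_tree(tree, row) for tree in forest])

# Hypothetical toy data: the target depends on the first feature only.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(80)]
targets = [2.0 * r[0] for r in rows]
forest = fit_forest(rows, targets, n_trees=30, max_features=2, max_depth=3)
pred_low = predict_forest(forest, [0.1, 0.5, 0.5])
pred_high = predict_forest(forest, [0.9, 0.5, 0.5])
```

In this sketch, n_trees and max_features play the role of the two RF hyperparameters that the optimization algorithms search over.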
Firefly algorithm optimizing random forests (FA-RF)
The main hyperparameters of random forests are the number of decision trees and the number of features considered at each node. Adjusting these parameters by empirical trial and error is very time-consuming and labor-intensive. Therefore, FA is used to optimize the above hyperparameters. In this algorithm, each firefly corresponds to a candidate solution to the problem. The global optimization process moves each firefly so that the swarm clusters toward the brightest firefly. There are two critical parameters in the FA: brightness and attractiveness. Brightness reflects the quality of a firefly's position and determines the direction of its movement. Attractiveness determines the distance the firefly moves. The swarm is updated repeatedly according to firefly brightness and attractiveness to reach the optimization goal (Gandomi et al. 2011; Johari et al. 2013).
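The movement rule can be sketched as follows, assuming the commonly used form of the firefly update in which attractiveness decays as beta0·exp(-gamma·r²) with the distance r between two fireflies. The objective here is a hypothetical toy function; in an FA-RF setting it would be replaced by the validation error of an RF trained with the candidate hyperparameters. The parameter values (beta0, gamma, alpha, and the decay factor) are illustrative assumptions, not the settings of this study.

```python
import math
import random

def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=40,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimise `objective`; a lower cost means a brighter firefly."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_f = pos[best_i][:], cost[best_i]
    history = [best_f]  # best cost found so far, one entry per iteration
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    for d, (lo, hi) in enumerate(bounds):
                        step = (beta * (pos[j][d] - pos[i][d])
                                + alpha * (rng.random() - 0.5))
                        pos[i][d] = min(hi, max(lo, pos[i][d] + step))
                    cost[i] = objective(pos[i])
                    if cost[i] < best_f:
                        best_x, best_f = pos[i][:], cost[i]
        alpha *= 0.97  # gradually reduce the random walk
        history.append(best_f)
    return best_x, best_f, history

def sphere(x):
    # Hypothetical stand-in for the model error to be minimised.
    return sum(v * v for v in x)

best_x, best_f, history = firefly_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

For integer hyperparameters such as the number of trees, the continuous positions would additionally need to be rounded before the model is evaluated.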
Exponential distribution optimizer optimizing random forests (EDO-RF)
The exponential distribution optimizer (EDO), as a novel meta-heuristic algorithm, differs from swarm intelligence optimization algorithms in that it is mainly based on the exponential probability distribution in mathematics (Abdel-Basset et al. 2023). In the exploitation phase, several statistical features of the exponential distribution are used, such as memorylessness, exponential rate, standard variance, and mean. In addition, a bootstrapping scheme is used to shorten the search for the global optimum. EDO uses memorylessness, a guiding solution, and exponential variance to update the current solution. Meanwhile, the only hyperparameter that needs to be tuned is the switch probability. The algorithm has a robust optimization-seeking ability and fast convergence speed, which makes it a good solution for practical engineering.
SHAP theory
SHAP theory is introduced to further explore the interpretability of the Tyrolean weir optimal prediction model and to analyze each dimensionless parameter's effect on discharge coefficient and water capture capacity. The SHAP value can explain the model's parameters at both the global and local levels (Lundberg & Lee 2017). It obtains the influence of each variable on the prediction results by calculating the contribution of each feature to the model output. Therefore, to explore the key influential features of discharge coefficient and water intake capacity, this paper applies SHAP theory to the optimal model to explain how the prediction varies with each dimensionless parameter.
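The idea behind the SHAP value can be illustrated by computing exact Shapley values for a small model. The sketch below is a simplified illustration rather than the SHAP library's algorithm: it assumes that a feature "absent" from a coalition is replaced by a single baseline value, and the four-input toy model, the sample x, and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption of this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    # Hypothetical model with four inputs and one interaction term.
    return z[0] + 2.0 * z[1] + z[2] * z[3]

x = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for x and the prediction for the baseline, and a positive (negative) value means the feature pushes the prediction above (below) the baseline, which is how the sign of a SHAP value is read in the analysis of parameter importance.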
Model indicators
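The indicators reported for the models in this paper are RMSE, MAPE, R2, and WIA. A minimal sketch of how such indicators are commonly computed is given below, assuming the standard definitions: R2 is taken as 1 - SS_res/SS_tot and WIA as Willmott's index of agreement. Whether the study uses exactly these forms is an assumption, and the observed and predicted values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def wia(obs, pred):
    """Willmott's index of agreement."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
scores = {
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r2(observed, predicted),
    "WIA": wia(observed, predicted),
}
```

Lower RMSE and MAPE and values of R2 and WIA closer to 1 indicate better agreement between predictions and observations.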
RESULTS AND DISCUSSION
Model comparison
To address the problem of calculating the Tyrolean weir's discharge coefficient Cd and water capture capacity (qw)i/(qw)T, a total of 674 data samples are collected in this study, and the experimental data are divided into a model training set and a testing set at a ratio of 7:3. For the Tyrolean weir discharge coefficient Cd, the model input parameters are the rack angle θ, the ratio of bar length to bar spacing L/e, the ratio of bar centroid spacing to bar spacing a/e, and the Froude number Fr, and the model output parameter is Cd. When the prediction models RF, FA-RF, and EDO-RF reach their optimum, the hyperparameters (number of decision trees and number of nodes) are 300 and 6, 18 and 1, and 29 and 1, respectively. For the Tyrolean weir water capture capacity (qw)i/(qw)T, the model input parameters are the rack angle θ, the ratio of bar length to bar spacing L/e, the ratio of bar centroid spacing to bar spacing a/e, and the Froude number Fr, and the model output parameter is (qw)i/(qw)T. When the prediction models RF, FA-RF, and EDO-RF are optimal, the hyperparameters (number of decision trees and number of nodes) are 50 and 1, 27 and 1, and 33 and 1, respectively.
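As an illustration of the 7:3 division, the following sketch shuffles the indices of the 674 samples and cuts them into a training and a testing part. The random seed and the rounding of the training-set size are assumptions; the paper does not state how the split was drawn.

```python
import random

def split_indices(n_samples, train_fraction=0.7, seed=42):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed choice
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(674)
```

With this rounding rule, the 674 samples yield 472 training and 202 testing samples.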
Analysis of parameter importance
As seen from Figure 6, when the feature value of Fr gradually increases, its corresponding SHAP value is greater than zero and increases with the feature value, indicating that Fr has a positive influence on the predicted value of Cd. When the feature value of Fr gradually decreases, its corresponding SHAP value is less than zero and decreases with the feature value, indicating that the negative effect of Fr on the Cd prediction results is also very significant. For L/e, when the feature value gradually increases, its corresponding SHAP value is less than zero, but the overall change is minor, indicating that the negative effect of L/e on the Cd prediction results is not significant. When the feature value gradually decreases, the corresponding SHAP value is greater than zero, indicating that L/e positively affects the Cd prediction results and that this effect is stronger than the negative one. For a/e and θ, when the feature values increase, the corresponding SHAP values are greater than zero, which positively affects the Cd prediction results. When the feature values decrease, the corresponding SHAP values are less than zero, which negatively affects the Cd prediction results.
Parameter sensitivity analysis
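The first-order and global (total) sensitivity coefficients of Sobol's method can be estimated by Monte Carlo sampling. The sketch below uses the widely used Saltelli estimator for the first-order index and the Jansen estimator for the total index, and assumes independent inputs scaled to the unit interval. The toy model is hypothetical; in practice it would be replaced by the trained prediction model with inputs rescaled to their experimental ranges.

```python
import random

def sobol_indices(model, dim, n_samples=100_000, seed=0):
    """Monte Carlo estimates of first-order and total Sobol indices.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    f_a, f_b = [], []
    f_ab = [[] for _ in range(dim)]
    for _ in range(n_samples):
        a = [rng.random() for _ in range(dim)]
        b = [rng.random() for _ in range(dim)]
        f_a.append(model(a))
        f_b.append(model(b))
        for i in range(dim):
            ab = a[:]      # copy of sample a ...
            ab[i] = b[i]   # ... with only input i taken from sample b
            f_ab[i].append(model(ab))
    mean_a = sum(f_a) / n_samples
    variance = sum((y - mean_a) ** 2 for y in f_a) / n_samples
    first, total = [], []
    for i in range(dim):
        # Saltelli estimator of the first-order index.
        s1 = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab[i]))
        first.append(s1 / n_samples / variance)
        # Jansen estimator of the total (global) index.
        st = sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab[i]))
        total.append(st / (2 * n_samples) / variance)
    return first, total

def toy_model(x):
    # Hypothetical additive model: exact indices are 0.2 and 0.8.
    return x[0] + 2.0 * x[1]

first_order, total_order = sobol_indices(toy_model, dim=2)
```

For an additive model such as this one, the first-order and total indices coincide; a total index noticeably larger than the first-order index, as reported for the parameters in this study, indicates interaction with other inputs.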
CONCLUSION
This paper uses the integrated learning models RF, FA-RF, and EDO-RF to predict the discharge coefficient and water capture capacity of the Tyrolean weir. First, the dimensionless parameters affecting the discharge coefficient and water capture capacity are obtained by the Buckingham-Π theorem. Furthermore, the FA-RF and EDO-RF prediction models are developed based on the optimization algorithms and the experimental dataset. Then, based on the optimal models, Sobol's method and SHAP theory are introduced to quantitatively and qualitatively analyze the discharge coefficient and water capture capacity, and the following conclusions are obtained:
(1) EDO-RF is the optimal prediction model for the Tyrolean weir discharge coefficient with RMSE = 0.041, MAPE = 8.442%, R2 = 0.947, and WIA = 0.977. FA-RF is the optimal prediction model for Tyrolean weir water capture capacity (qw)i/(qw)T with RMSE = 0.049, MAPE = 5.424%, R2 = 0.950, and WIA = 0.984 in the testing phase.
(2) The first-order sensitivity coefficients of the rack angle θ, the ratio of the bar length to the bar distance L/e, the ratio of the bar centroid distance to the bar distance a/e, and the Froude number Fr on the output parameter Cd are 0.136, 0.221, 0.143, and 0.357, respectively, and the global sensitivity coefficients are 0.211, 0.307, 0.210, and 0.453, respectively, which indicates that Fr has the most significant effect on Cd, followed by L/e, while the effects of θ and a/e are relatively small. The first-order sensitivity coefficients of θ, L/e, a/e, and Fr on the output parameter (qw)i/(qw)T are 0.014, 0.571, 0.009, and 0.225, respectively, and the global sensitivity coefficients are 0.033, 0.737, 0.038, and 0.382, respectively, indicating that L/e has the most significant effect on (qw)i/(qw)T, followed by Fr, while the effects of a/e and θ are relatively small.
(3) When Fr < 30, the larger the value of L/e, the more significant the negative effect of Fr on the Cd prediction results. When Fr > 30, the larger the value of L/e, the more significant the positive effect of Fr on the Cd prediction results. When L/e < 20, the larger the value of Fr, the more significant the negative effect of L/e on the (qw)i/(qw)T prediction results. When L/e > 20, the larger the value of Fr, the more significant the positive effect of L/e on the (qw)i/(qw)T prediction results.
The limitation of this study is that the datasets are all from physical model experiments, and collecting engineering-measured data should be considered to validate the model's generalizability and applicability in future studies.
ACKNOWLEDGEMENTS
This study was supported by the National Natural Science Foundation of China (Grant Nos. 52079122 and 52379080).
AUTHORSHIP CONTRIBUTIONS
G. S. developed the methodology and rendered support in formal analysis, wrote the original draft, investigated the data, and edited the article. Y. L. rendered support in data curation, wrote the manuscript, and edited the article. A. P. developed the methodology, resources, and validation. W. W. investigated the article, wrote and reviewed the whole article, and rendered support in funding acquisition. Y. W. visualized and validated the article. Z. M. developed the language polishing and investigated the article.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.