ABSTRACT
The uneven velocity distribution that forms at a lateral inlet/outlet poses a significant risk of damage to the trash racks. A sound design of the inlet/outlet structure must consider two quantities: the average velocity (Vm) and the coefficient of unevenness (Uc). This paper develops an optimization framework that combines an interpretable Gradient Boosting Decision Tree (SOBOL-GBDT) with the Non-dominated Sorting Genetic Algorithm II (NSGA-II). CFD simulations of 125 conditions generate the dataset, and GBDT is then used to establish a nonlinear mapping between the input parameters, namely the vertical (α) and horizontal (β) diffusion angles, the diffusion segment length (LD), and the channel area (CA), and the objectives Uc and Vm. The SOBOL analysis reveals that, in Uc prediction, CA and α play more significant roles than β and LD. In addition, GBDT captures the interactive effects of the input parameters better than the other machine learning models considered. A multi-objective optimization framework based on GBDT-NSGA-II is then developed. The framework computes the Pareto front and selects the best solution with a pseudo-weight method. The results show that the framework markedly reduces flow separation in the diffusion segment and improves the normalized velocity distribution. The SOBOL-GBDT-NSGA-II framework thus supports a rational and effective design of the inlet/outlet.
HIGHLIGHTS
Sobol indices can effectively explain the effect of individual variables and their interactions on the predictive model.
Within multi-objective optimization problems, the GBDT-NSGA-II framework can effectively identify optimal solutions.
In its optimal shape, the lateral inlet/outlet has smaller flow separation areas in the channel, which also lowers the likelihood of flow-induced vibration of the trash racks.
INTRODUCTION
To achieve the goals of peak carbon emissions and carbon neutrality (Wei et al. 2022; Yang et al. 2022), China is actively and systematically advancing the use of renewable and clean energy. Pumped-storage hydropower plays a vital role in renewable energy systems. As an effective method of energy storage, it is crucial for electricity regulation because of its high load-balancing efficiency, reliability as a backup power source, long service life, and large storage capacity. The lateral inlet/outlet is one of the key components of a pumped-storage power station (PSPS) and is designed for bidirectional flow to meet varying power demand. Under inflow conditions, water contracts as it flows from the reservoir through the inlet/outlet into the tunnel; under outflow conditions, water expands as it flows from the tunnel back into the reservoir. The bidirectional flow creates complex fluid dynamics within the inlet/outlet, which can damage the trash rack, reduce the efficiency of power generation and, in severe cases, even damage the turbines (Tsikata et al. 2009; Raynal et al. 2013; Xue et al. 2023). Damage to the trash rack is governed mainly by two factors: (1) large turbulence fluctuations in the flow cause long-term fatigue damage to the trash rack (Nguyen & Naudascher 1991), and (2) high local flow velocities can lead to resonance damage when the frequency of the Kármán vortex street matches the natural frequency of the trash rack (Naudascher & Wang 1993). In response to these challenges, the design of lateral inlets/outlets should strictly control the average velocity Vm and the unevenness coefficient Uc at the trash rack section to prevent damage (Gao et al. 2018; NDRC 2019; Zhu et al. 2023).
The design of the lateral inlet/outlet primarily focuses on the flow velocity distribution at the trash rack. Sun et al. (2007) measured the flow velocity distribution at the trash rack under different diversion pier arrangements in lateral inlets/outlets and proposed methods to deal with bending and negative flow velocities. Ye & Gao (2011) used the realizable k-ɛ model for numerical simulation of lateral inlets/outlets, analyzed flow distribution, head loss, and inlet vortices, and compared these with experimental results. Terrier et al. (2014) conducted hydraulic model experiments to study surge wave issues in the tailrace tunnels of PSPS and analyzed head loss during sudden starts and stops. Zhu et al. (2023) performed CFD simulations to derive the flow velocity distribution and turbulence intensity at the trash rack, compared these with results from a Laser Doppler Velocimeter (LDV), and analyzed the causes of fatigue damage to the trash rack. Gao et al. (2018) used Response Surface Methodology (RSM) to construct mappings from the influencing factors of the inlet/outlet to the head loss and the unevenness coefficient, and then performed multi-objective optimization with NSGA-II. However, RSM relies on quadratic polynomial fitting, which is suitable only for simple datasets, whereas machine learning (ML) models can handle more complex, nonlinear data and are therefore applicable in a wider range of scenarios.
ML models are advancing rapidly in the field of water science. Examples include SVR (Roushangar et al. 2018; Mia & Dhar 2019; Yu et al. 2023) for dam safety monitoring, discharge coefficient prediction, and optimization of hard turning of steel; RF (Hong et al. 2012) for predicting scour depth; GBDT (Qiu et al. 2021) for forecasting ground vibration; and XGBOOST (Sen et al. 2023) for predicting global temperature anomalies. However, ML models are often viewed as 'black boxes' whose internal decision-making processes are not fully understood. This has led to growing interest in interpretability analysis, which aims to expose their internal mechanisms and establish the credibility of their results. The SHapley Additive exPlanations (SHAP) method was the first to be widely adopted. Mo et al. (2023) analyzed the importance of different features in cavity water and length models using the XGBOOST-SHAP model. Karakaş et al. (2023) improved the accuracy and interpretability of ML model predictions for the bearing capacity of closed and open piles using SHAP values. While SHAP focuses on the impact of variables acting independently and emphasizes the contribution of individual sample points, SOBOL analysis can also quantify the interactive effects among variables. Zouhri et al. (2022) enhanced the robustness of SVMs against feature uncertainty by introducing the SOBOL analysis method. Xu et al. (2023) combined ML models with SOBOL analysis to propose a global sensitivity analysis approach for assessing the safety and effectiveness of dam structural parameters. Because it reveals how the target variables depend on the input variables, interpretability analysis therefore offers valuable insights during the design phase and also sets the stage for multi-objective optimization.
Currently, NSGA-II is widely used to solve multi-objective optimization problems. For example, Komasi & Goudarzi (2021) used NSGA-II to design a water network consisting of 12 observation points and significantly reduced the number of monitoring sites. Dey et al. (2019) used numerical simulation to study flow and heat transfer around square and circular shapes and applied NSGA-II to find the optimal roundness of the cylinder corner. Zhao et al. (2023) combined RSM with NSGA-II to optimize lithium batteries, reducing the maximum temperature difference and pressure difference while enhancing the PCM liquid phase fraction. In this study, because preventing damage to the trash rack requires the lateral inlet/outlet to meet target values of both Vm and Uc, a multi-objective optimization analysis is carried out.
To achieve the multi-objective optimization of the lateral inlet/outlet, this paper is organized as follows. First, CFD simulations are performed for 125 different scenarios. The SOBOL-GBDT method is then used to establish a nonlinear mapping between the influencing factors and the objective functions Vm and Uc. The importance of each factor is analyzed through first-order and second-order SOBOL indices, taking into account that the individual and interactive effects vary with CA. The model's predictive accuracy and interpretability results are compared with those of SVR, RF, and XGBOOST models to confirm the reliability of the GBDT model. Finally, the SOBOL-GBDT-NSGA-II multi-objective optimization framework is applied to obtain the Pareto front solution set. Using a pseudo-weight method, the optimal solutions for two design scenarios, S1 and S2, are identified. The paper also analyzes the convergence of the algorithm and the distribution characteristics of the variable space based on these solutions.
CONSTRUCTION OF THE LATERAL INLET/OUTLET MODEL
Model parameters of the inlet/outlet
The experiment, as shown in Figure 1(b), uses a scale model ratio of 1:40. The inlet/outlet is made of acrylic glass. Under the condition of full discharge in pumping mode, the flow rate is 262 m3/s, with an average velocity of 4.11 m/s. Flow velocities in each channel are measured using an Acoustic Doppler Velocimeter (ADV), which has a measurement accuracy of 1 × 10−3 m/s. In the inflow condition, water enters from a water tower into a triangular weir, which controls the flow rate with high precision. The water then flows into the reservoir and finally enters the inlet/outlet and downstream tunnel. The experiment mainly measures the flow velocity across the trash rack cross-section, as depicted in Figure 1(c). Time-averaged velocities at five points along the vertical direction of each channel are measured for validation of the numerical simulation.
Numerical setup and validation
| Flow direction | Flow region | Inlet flow rate (m3/s) | Inflow velocity (m/s) | Outlet water level (m) |
|---|---|---|---|---|
| Inflow | Contracted flow | 261 | 4.11 | 740 |
| Outflow | Diffusive flow | 261 | 4.11 | 740 |
MULTI-OBJECTIVE OPTIMIZATION BASED ON THE GBDT MODEL
Gradient boosting decision tree
Here, f(x) denotes the final model obtained by summing the outputs of all the decision trees. The term Cm represents the contribution of the m-th tree, while I(x ∈ Rm) is an indicator function that determines if the input x falls within the region Rm covered by the m-th tree.
Here, hm(x) is the updated model after adding the m-th tree, hm−1(x) is the model from the previous iteration, Cmj is the optimal value that minimizes the loss for the j-th region Qmj, and I(x ∈ Qmj) is an indicator function assigning Cmj to the inputs falling within the region Qmj.
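For reference, a form consistent with the symbols defined above can be written as follows. This is a sketch based on the standard GBDT formulation rather than a quotation; M (the number of trees) and J (the number of leaf regions of the m-th tree) are assumed names.

```latex
f(x) = \sum_{m=1}^{M} C_m \, I(x \in R_m)

h_m(x) = h_{m-1}(x) + \sum_{j=1}^{J} C_{mj} \, I(x \in Q_{mj})
```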
Sobol interpretability analysis
Sobol analysis (Sobol 2001) is a widely used method for sensitivity analysis. It assesses the impact of different input parameters on the output of a model and is particularly suitable for complex models in which the input parameters interact or act nonlinearly on the output. In Sobol analysis, the model is assumed to be representable as y = f(x), where x = (x1,x2,…,xn) are the model's input parameters and y is the output. The core of the method is variance decomposition, which breaks down the total variance of the model output into contributions from the different input variables. These contributions are expressed through Sobol indices, of which two types are used here:
First-order index Si, which represents the contribution of a single variable xi to the output variance V(y), defined as Si = Vi/V(y), where Vi is the variance of y due to xi alone. Second-order index Sij, which accounts for the effect of interactions between two variables xi and xj, defined as Sij = Vij/V(y), where Vij represents the variance of y due to the interactive effects of xi and xj.
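Written out, the decomposition behind these two indices takes the following form. This is the standard Sobol variance decomposition, which assumes independent inputs; the higher-order terms indicated by the dots are not used in this study.

```latex
V(y) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots + V_{1,2,\ldots,n}

S_i = \frac{V_i}{V(y)}, \qquad S_{ij} = \frac{V_{ij}}{V(y)}, \qquad
\sum_{i} S_i + \sum_{i<j} S_{ij} + \cdots = 1
```

A first-order sum well below 1 therefore signals that interactions carry a substantial share of the output variance.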
The method developed by Sobol was further enhanced by Saltelli et al. (2010), who suggested algorithmic extensions for calculating the sensitivity indices. The Saltelli extension incorporates new strategies for generating sample matrices and is applicable to models with a large number of input variables. Taking a model with m input variables as an example, a sample matrix of size n(2m + 2) is generated, where n is a base sample size to optimize computational efficiency. For example, with a base sample size of n = 1,024 and m = 4 input variables, the sample matrix would consist of 10,240 samples.
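The sample count and the first-order estimator can be illustrated with a minimal, dependency-free sketch. It assumes independent, uniformly distributed inputs and uses a pick-freeze estimator of the kind discussed by Saltelli et al. (2010); the function names and the toy model are illustrative assumptions, not the implementation used in this study.

```python
import random


def saltelli_sample_count(n, m, second_order=True):
    """Model evaluations needed: n(2m + 2) with second-order indices, n(m + 2) without."""
    return n * (2 * m + 2) if second_order else n * (m + 2)


def first_order_sobol(model, bounds, n, seed=0):
    """Estimate first-order Sobol indices with a pick-freeze scheme.

    model  : callable taking a list of m inputs and returning a float
    bounds : list of (low, high) pairs; inputs are assumed independent and uniform
    n      : base sample size
    """
    rng = random.Random(seed)
    m = len(bounds)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Two independent base sample matrices A and B
    A = [draw() for _ in range(n)]
    B = [draw() for _ in range(n)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]

    # Total output variance estimated from the pooled evaluations
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((v - mean) ** 2 for v in pooled) / (len(pooled) - 1)

    indices = []
    for i in range(m):
        acc = 0.0
        for a, b, fa, fb in zip(A, B, fA, fB):
            ab = list(a)
            ab[i] = b[i]  # row of A with column i taken from B
            acc += (fb - mean) * (model(ab) - fa)
        indices.append(acc / n / var)
    return indices


# Toy additive model (hypothetical): y = x1 + 2*x2 on [0, 1]^2,
# whose exact first-order indices are 0.2 and 0.8.
toy_indices = first_order_sobol(lambda x: x[0] + 2.0 * x[1],
                                [(0.0, 1.0), (0.0, 1.0)], n=20000, seed=1)
```

With n = 1,024 and m = 4, `saltelli_sample_count` reproduces the 10,240 samples quoted above.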
NSGA-II and Pareto-front solution set
NSGA-II (Non-dominated Sorting Genetic Algorithm II; Deb et al. 2002), developed by Kalyanmoy Deb and co-workers in 2002, is an algorithm designed for multi-objective optimization problems. It improves upon NSGA in handling problems with multiple conflicting objectives. The algorithm is distinguished by its non-dominated sorting process, which classifies solutions by their dominance level. This is critical for identifying the solutions closest to the ideal outcome when objectives compete. In addition, NSGA-II uses a crowding distance mechanism, which maintains a diverse set of solutions and prevents them from concentrating in one region of the solution space.
The Pareto front (Poulos et al. 2001), in the context of multi-objective optimization, is a key concept intimately linked with NSGA-II's objectives. It represents a set of non-dominated solutions, meaning that no other solution in the set is superior in all objectives (Safari et al. 2018). The Pareto front forms a conceptual boundary that maps the distribution of these optimal solutions, showcasing the optimal trade-offs among different objectives. This boundary is crucial for decision-makers as it guides them in identifying the most effective solutions that strike a balance between competing objectives. NSGA-II aims to approximate this Pareto front as closely as possible (Babajamali et al. 2022), providing a comprehensive set of balanced solutions for complex optimization problems.
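The two ingredients described above, dominance-based filtering and crowding distance, can be sketched in a few lines. This is a simplified illustration for minimization problems with made-up objective values, not the NSGA-II implementation used in this study.

```python
import math


def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated subset of points, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def crowding_distance(front):
    """Crowding distance of each point in a front (same order as the input)."""
    n = len(front)
    dist = [0.0] * n
    for k in range(len(front[0])):
        # Sort indices along objective k; boundary points are always kept
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


# Illustrative objective vectors (f1, f2); values are hypothetical
points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(points)
distances = crowding_distance(front)
```

Here (3, 4) is dominated by (2, 3) and (5, 5) by (1, 5), so three points remain on the front; the interior point receives a finite crowding distance while the two extremes receive an infinite one.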
In multi-objective optimization, the main components include input variables and objective functions. This study is dedicated to finding the optimal shape for lateral inlet/outlet. Therefore, vertical diffusion angle, horizontal diffusion angle, length of diffusion segment, and flow channel area are selected as variables. The objective function is adjusted to reduce average flow velocity and achieve as uniform a velocity distribution as possible across the trash rack section. To address the multi-objective optimization problem effectively, the Pymoo package in Python was utilized. The algorithm was configured with a population size of 100 and iterated across 120 generations.
Objective function and dataset
This paper aims to provide a rational design for the inlet/outlet and to enrich the database for research on ML and multi-objective optimization. Five values are selected for the vertical diffusion angle (α): 3°, 4°, 5°, 6.12°, and 7°; five for the horizontal diffusion angle (β): 27°, 30°, 33.77°, 37°, and 41°; and five for the diffusion segment length (LD): 32, 37, 42, 47, and 52 m. The ranges of the input parameters are listed in Table 2, and CFD simulations are performed for the resulting 125 cases.
| | α (°) | β (°) | LD (m) | CA (m2) | Uc | Vm (m/s) | Samples |
|---|---|---|---|---|---|---|---|
| Data | 3.00, 4.00, 5.00, 6.12, 7.00 | 27.00, 30.00, 33.77, 37.00, 41.00 | 32, 37, 42, 47, 52 | 52.99–166.74 | 1.28–5.35 | 0.50–1.54 | 125 |
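The count of 125 cases is consistent with a full factorial combination of the five values of α, β, and LD. The sketch below enumerates such a design; treating the dataset as a full factorial is an assumption inferred from 5 × 5 × 5 = 125, and CA is not enumerated here because only its range is reported.

```python
from itertools import product

# Sampled levels of the three geometric parameters
ALPHA = [3.0, 4.0, 5.0, 6.12, 7.0]       # vertical diffusion angle (deg)
BETA = [27.0, 30.0, 33.77, 37.0, 41.0]   # horizontal diffusion angle (deg)
LD = [32.0, 37.0, 42.0, 47.0, 52.0]      # diffusion segment length (m)

# Every combination of one level per parameter (assumed full factorial)
cases = [
    {"alpha": a, "beta": b, "LD": ld}
    for a, b, ld in product(ALPHA, BETA, LD)
]
```

Each dictionary in `cases` would correspond to one CFD run under this assumption.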
RESULTS AND DISCUSSION
Establishment of interpretable ML models
Optimization of ML models
Evaluations of the performance of the traditional ML models including SVR (Vapnik 1999), RF (Breiman 2001), and XGBOOST (Chen & Guestrin 2016) together with GBDT are carried out. With 80% of the data allocated as the training set and the remaining 20% as the test set, Bayesian Optimization (BO) (Bergstra et al. 2011) was employed to fine-tune the hyperparameters of each ML model and the results are listed in Table 3. The predictive performance was indicated using common metrics: R2, RMSE, and MAE (Mia & Dhar 2019; Mo et al. 2023), and Table 4 provides a comparison of model performance before and after optimization on the test set. The results indicate that the optimized models have better performance, hence the analysis proceeded with these optimized models.
| Objective | Model | Hyperparameter 1 | Hyperparameter 2 | Hyperparameter 3 |
|---|---|---|---|---|
| Uc | SVR-BO | C = 98.56 | γ = scale | ε = 0.04 |
| Uc | RF-BO | n_estimators = 948 | max_depth = 40 | min_samples_leaf = 1 |
| Uc | GBDT-BO | n_estimators = 660 | max_depth = 10 | learning_rate = 0.17 |
| Uc | XGBOOST-BO | n_estimators = 718 | max_depth = 5 | learning_rate = 0.11 |
| Vm | SVR-BO | C = 69.51 | γ = scale | ε = 0.01 |
| Vm | RF-BO | n_estimators = 279 | max_depth = 30 | min_samples_leaf = 1 |
| Vm | GBDT-BO | n_estimators = 327 | max_depth = 8 | learning_rate = 0.29 |
| Vm | XGBOOST-BO | n_estimators = 952 | max_depth = 4 | learning_rate = 0.13 |
| Model | Uc test R2 | Uc test RMSE | Uc test MAE | Vm test R2 | Vm test RMSE | Vm test MAE |
|---|---|---|---|---|---|---|
| SVR | 0.939 | 0.204 | 0.161 | 0.924 | 0.058 | 0.046 |
| SVR-BO | 0.945 | 0.193 | 0.154 | 0.995 | 0.015 | 0.010 |
| RF | 0.940 | 0.202 | 0.162 | 0.995 | 0.015 | 0.011 |
| RF-BO | 0.947 | 0.190 | 0.146 | 0.995 | 0.015 | 0.010 |
| GBDT | 0.972 | 0.137 | 0.101 | 0.997 | 0.012 | 0.009 |
| GBDT-BO | 0.987 | 0.093 | 0.077 | 0.997 | 0.012 | 0.009 |
| XGBOOST | 0.959 | 0.168 | 0.127 | 0.987 | 0.024 | 0.020 |
| XGBOOST-BO | 0.974 | 0.133 | 0.101 | 0.993 | 0.017 | 0.013 |
In Uc prediction, GBDT-BO showed the best performance, with R2, RMSE, and MAE values of 0.987, 0.093, and 0.077, respectively, followed by XGBOOST-BO with R2, RMSE, and MAE values of 0.974, 0.133, and 0.101. SVR-BO had the poorest performance: compared with GBDT-BO, its R2 was 4.2% lower, while its RMSE and MAE were 54.4 and 100% higher, respectively.
In Vm prediction, the R2 values for SVR-BO, RF-BO, GBDT-BO, and XGBOOST-BO were 0.995, 0.995, 0.997, and 0.993, respectively, indicating strong predictive ability on the test data. GBDT-BO had the smallest error, with RMSE and MAE of 0.012 and 0.009, whereas XGBOOST-BO had the largest error, with RMSE and MAE 41.6 and 44.4% higher than those of GBDT-BO, respectively.
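For readers who wish to recompute such comparisons, the three metrics and the relative differences quoted above can be written as a short sketch. The function names are illustrative, and the sample values in the first example are hypothetical rather than taken from the dataset.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_increase(value, reference):
    """Percentage by which value exceeds reference."""
    return 100.0 * (value - reference) / reference


# Hypothetical targets and predictions, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
example_metrics = (r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))

# Vm test errors of XGBOOST-BO relative to GBDT-BO, using the tabulated values
rmse_gap = relative_increase(0.017, 0.012)
mae_gap = relative_increase(0.013, 0.009)
```

The last two lines give roughly 41.7% and 44.4%, in line with the figures reported for Vm.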
Sobol interpretability analysis
With the ML models established, the interactions and significance of the input variables are investigated further. The Sobol method is applied to calculate first- and second-order Sobol indices, which reveal the significance of individual variables and the interactive effect of pairs of variables in the prediction process. A larger Vm has a relatively mild effect on the trash rack, since failure then requires long-term accumulation of fatigue, whereas an increase in Uc significantly increases flow-induced vibration and causes large deformation of the lower and middle parts of the trash rack (Nguyen & Naudascher 1991). The analysis therefore targets the output Uc: the relative significance of the influencing factors α, β, LD, and CA is investigated to guide the optimization of the lateral inlet/outlet design.
Tables 5 and 6 present the first-order (Si) and second-order (Sij) Sobol indices of the ML models, which correspond to the individual and interactive effects of the input variables, respectively. Considering first the Si values listed in Table 5, CA is predominant in the SVR and RF models, with SCA of about 0.97, whereas XGBOOST shows a lower SCA and higher Si for α, β, and LD, indicating that these variables matter more in that model. In GBDT, SCA decreases further while Sα and Sβ increase, and SLD remains comparable to that of XGBOOST.
| Model | Sα | Sβ | SLD | SCA |
|---|---|---|---|---|
| SVR | 0.000 | 0.000 | 0.025 | 0.972 |
| RF | 0.012 | 0.000 | 0.001 | 0.978 |
| GBDT | 0.244 | 0.156 | 0.049 | 0.511 |
| XGBOOST | 0.113 | 0.121 | 0.057 | 0.709 |
| Model | α,β | α,LD | α,CA | β,LD | β,CA | LD,CA |
|---|---|---|---|---|---|---|
| SVR | 0.000 | 0.006 | 0.091 | 0.033 | 0.000 | 0.000 |
| RF | 0.000 | 0.000 | 0.079 | 0.013 | 0.001 | 0.000 |
| GBDT | 0.010 | 0.000 | 0.217 | 0.020 | 0.140 | 0.040 |
| XGBOOST | 0.020 | 0.000 | 0.132 | 0.030 | 0.114 | 0.000 |
As for the pairwise interactive effects listed in Table 6, SVR and RF show a noticeable Sij only between α and CA, with magnitudes below 0.1, whereas both GBDT and XGBOOST have Sα,CA and Sβ,CA greater than 0.1, indicating that these models capture interactions between α and CA and between β and CA.
Because the design of an inlet/outlet usually starts from a given flow velocity, which is determined by CA, 200 CA values are selected between 52.99 and 166.74 m2 for analysis. For each of these 200 CA values in turn, the Sobol method is used to calculate the relative significance of α, β, and LD in the ML model at that CA. This approach provides a rational reference for designing the inlet/outlet structure at a specified CA.
To reflect the generalization performance of the models, samples were taken during the iterative process described above. In each iteration, the ML models predicted the outcomes for the 10,240 samples drawn and the predictions were averaged. The variation of Uc with CA obtained from each ML model was compared with the original dataset, as shown in Figure 6(e). The figure indicates that SVR had the poorest predictive performance, fitting well only for CA between 72 and 137 m2, followed by RF and XGBOOST, which showed errors near the maximum and minimum CA. GBDT agrees best with the original dataset.
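The per-CA averaging described above can be sketched as follows. Even spacing of the 200 CA values, uniform sampling of α, β, and LD within their sampled ranges, and the stand-in function `toy_uc` are all assumptions made for illustration; in practice the trained surrogate would take the place of `toy_uc`, and 10,240 samples per CA would be used instead of the small n chosen here.

```python
import random

# Ranges of alpha (deg), beta (deg), and LD (m); uniform sampling is an assumption
BOUNDS = [(3.0, 7.0), (27.0, 41.0), (32.0, 52.0)]


def ca_grid(lo=52.99, hi=166.74, count=200):
    """Evenly spaced CA values between lo and hi (even spacing is an assumption)."""
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def mean_prediction_at_ca(model, ca, n, seed=0):
    """Average the model output over n random (alpha, beta, LD) samples at fixed CA."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        alpha, beta, ld = [rng.uniform(lo, hi) for lo, hi in BOUNDS]
        total += model(alpha, beta, ld, ca)
    return total / n


def toy_uc(alpha, beta, ld, ca):
    """Hypothetical stand-in for a trained surrogate; not the model of this study."""
    return 0.02 * ca + 0.1 * alpha


grid = ca_grid()
curve = [mean_prediction_at_ca(toy_uc, ca, n=256) for ca in grid]
```

Plotting `curve` against `grid` would give the kind of Uc versus CA comparison discussed for Figure 6(e).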
Multi-objective optimization analysis of inlet/outlet
The results in Figure 6(e) indicate that increasing CA increases Uc, which raises the risk of damage from flow-induced vibration. However, increasing CA also decreases Vm, and lower velocities with less turbulent energy reduce the risk of fatigue damage. Because the design of a lateral inlet/outlet involves two critical outputs, Uc and Vm, and four input parameters, α, β, LD, and CA, a multi-objective optimization approach is necessary to reach an optimal design. In this work, the GBDT model is adopted to establish the relationship between the input variables α, β, LD, and CA and the objectives Uc and Vm. Two constrained scenarios, S1 and S2, are considered: S1 requires Vm below 1.2 m/s, whereas S2 keeps CA constant and optimizes by adjusting α, β, and LD.
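How a velocity limit such as that of S1 and a pseudo-weight selection could act on a set of Pareto-optimal candidates is sketched below. The pseudo-weight formula follows the commonly used definition attributed to Deb; the candidate values, the equal target weights, and the nearest-weight selection rule are assumptions for illustration, not results or settings of this study.

```python
def pseudo_weights(front):
    """Pseudo-weight vector for each point of a front (objectives minimized).

    Assumes every objective takes at least two distinct values on the front.
    """
    n_obj = len(front[0])
    f_min = [min(p[k] for p in front) for k in range(n_obj)]
    f_max = [max(p[k] for p in front) for k in range(n_obj)]
    weights = []
    for p in front:
        # Normalized distance of each objective from its worst value on the front
        norm = [(f_max[k] - p[k]) / (f_max[k] - f_min[k]) for k in range(n_obj)]
        total = sum(norm)
        weights.append([v / total for v in norm])
    return weights


def select_by_pseudo_weight(front, target):
    """Index of the point whose pseudo-weights are closest to the target weights."""
    scores = [
        sum(abs(w - t) for w, t in zip(row, target))
        for row in pseudo_weights(front)
    ]
    return scores.index(min(scores))


# Hypothetical Pareto candidates as (Vm in m/s, Uc); not values from this study
candidates = [(0.80, 3.0), (0.90, 2.4), (1.00, 2.0), (1.30, 1.9)]

# S1-type constraint: keep only designs with Vm below 1.2 m/s
feasible = [p for p in candidates if p[0] < 1.2]

# Equal preference for both objectives (assumed target weights)
chosen = feasible[select_by_pseudo_weight(feasible, target=(0.5, 0.5))]
```

A point that is best in one objective receives a pseudo-weight of 1 for it, so target weights of (0.5, 0.5) favor a compromise design near the middle of the front.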
Algorithm convergence analysis
Analysis of patterns in objective and variable space
| Case | α (°) | β (°) | LD (m) | CA (m2) | Vm (m/s) | Uc |
|---|---|---|---|---|---|---|
| Original | 6.12 | 33.77 | 42.00 | 101.25 | 0.803 | 3.18 |
| S1 | 3.29 | 28.80 | 49.60 | 88.76 | 0.937 | 2.13 |
| S2 | 3.67 | 32.10 | 50.14 | 101.78 | 0.804 | 2.59 |
Analysis of optimization results
Discussion
This study carried out an interpretability analysis of the ML models, providing crucial insights into the impact of input features on model predictions (Altmann et al. 2010; Mi et al. 2020). Most researchers (Karakaş et al. 2023; Mo et al. 2023; Zhao et al. 2024) conduct interpretability analysis of ML models with the Shapley value method, which calculates the SHAP values of the input variables to analyze feature importance. However, these values represent only the individual effect of each feature on the model. This paper employs the SOBOL method to calculate first- and second-order SOBOL indices, which reveal that the results are influenced not only by the individual action of features but also by the interactions among them, especially in the GBDT and XGBOOST models, as shown in Table 6. The SOBOL method can reveal the impact of variables or variable combinations on model output, serving as a powerful tool for analyzing high-dimensional data and nonlinear models. From an engineering perspective, the quantified relative significance of the input variables at different CA provides an intuitive reference for the design of the inlet/outlet.
The current investigation also has limitations. Because CFD simulations are time-consuming, the range of the input variables is restricted. Future work should extend the investigated parameter range to obtain an ML model with better general applicability. Moreover, various multi-objective optimization algorithms exist in the literature, and their relative performance when applied to the design of the lateral inlet/outlet is worth evaluating.
CONCLUSION
In a PSPS, the lateral inlet/outlet carries complex bidirectional flow. The design must effectively reduce excessive turbulence intensity in the flow channels and prevent the resonance damage that occurs when the flow frequency matches the natural frequency of the trash rack. Given the multiple factors that affect the overall design, a robust optimization method is needed to optimize Vm and Uc at the trash rack simultaneously. To address these concerns, this paper proposes the SOBOL-GBDT-NSGA-II framework for optimizing the structure of the lateral inlet/outlet. The principal findings of this study are as follows:
The GBDT model accurately captures the nonlinear mapping between the input variables, namely the vertical diffusion angle (α), horizontal diffusion angle (β), diffusion segment length (LD), and channel area (CA), and the outputs Uc and Vm. The precision of the GBDT results is further enhanced through BO. For Uc, for instance, the optimized GBDT achieves an R2 of 0.987, an RMSE of 0.093, and an MAE of 0.077. These metrics demonstrate the accuracy and reliability of the GBDT model, which surpasses the SVR, RF, and XGBOOST models.
In the GBDT model, SCA is considerably larger than Sα, Sβ, and SLD, indicating that CA has the greatest impact on the Uc prediction model. Meanwhile, Sα and Sβ have non-negligible magnitudes, denoting a discernible impact of α and β on the GBDT model. In the other ML models, notably SVR and RF, the trivial magnitudes of Sα and Sβ indicate a much lower influence of these variables. Moreover, the GBDT model exhibits significant values of both Sα,CA and Sβ,CA, indicating a pronounced effect of these pairwise interactions on the model, whereas SVR and RF capture only a weak interaction between α and CA (Sij below 0.1). GBDT is therefore better at extracting interactive information from the variables, which helps it capture complex, nonlinear relationships between the inputs and the output. As CA increases, α has the greatest influence on the model, followed by β, while LD has the least. This pattern shows that changes in CA can significantly affect the relative importance of the design parameters in predicting Uc, which is crucial for optimizing the design of lateral inlets/outlets in hydraulic structures.
The SOBOL-GBDT-NSGA-II framework is used to identify the optimal Pareto front solutions for two design scenarios, S1 and S2, with the aim of minimizing both Vm and Uc. Compared with the original case, the optimal designs in both S1 and S2 have lower α and β and higher LD; CA is also lower in S1, while it is held essentially constant in S2. This optimization approach can provide guidance for similar projects and help determine appropriate parameter ranges. The multi-objective optimization accounts for the mutual influence of the two objectives. Through this framework, flow separation in the lateral inlet/outlet is significantly reduced. In S1, Uc is reduced by 33% while Vm increases by 16.7%, and in S2, Uc decreases by 18.5% with Vm essentially unchanged. This effectively lowers the likelihood of resonance damage to the trash rack.
FUNDING
This research is supported by the National Natural Science Foundation of China (Grant Nos 52179060, 51909024, and 52209081).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.