## ABSTRACT

The uneven velocity distribution formed at the lateral inlet/outlet poses a significant risk of damaging the trash racks. Reasonable design of the inlet/outlet structures requires the consideration of two major aspects: the average velocity (*V*_{m}) and the coefficient of unevenness (*U*_{c}). This paper develops an optimization framework that combines an interpretable Gradient Boosting Decision Tree (SOBOL-GBDT) with a Non-dominated Sorting Genetic Algorithm (NSGA-II). CFD simulations of 125 conditions are performed to generate the dataset, and GBDT is then used to establish a nonlinear mapping between the input parameters, namely the vertical (*α*) and horizontal (*β*) diffusion angles, diffusion segment length (*L*_{D}), and channel area (*C*_{A}), and the objectives *U*_{c} and *V*_{m}. The SOBOL analysis reveals that in *U*_{c} prediction, *C*_{A} and *α* play more significant roles in the model development than *β* and *L*_{D}. In addition, GBDT is observed to capture the interactive effects of the input parameters better than the other machine learning models. Subsequently, a multi-objective optimization framework using GBDT-NSGA-II is developed. The framework calculates the optimal Pareto front and determines the best solution using a pseudo-weight method. The results demonstrate that this framework significantly reduces flow separation in the diffusion segment and improves the normalized velocity distribution. The SOBOL-GBDT-NSGA-II framework facilitates a rational and effective design of the inlet/outlet.

## HIGHLIGHTS

Sobol indices can effectively explain the effects of individual variables and their interactions on the predictive model.

Within multi-objective optimization problems, the GBDT-NSGA-II framework can effectively identify optimal solutions.

With the optimal shape, the lateral inlet/outlet reduces the flow separation areas in the channel, simultaneously diminishing the likelihood of flow-induced vibrations in the trash racks.

## INTRODUCTION

To achieve the goals of peak carbon emissions and carbon neutrality (Wei *et al.* 2022; Yang *et al.* 2022), China is actively and systematically advancing the development and utilization of renewable and clean energy. Among these, pumped-storage hydropower technology plays a vital role in renewable energy systems. As an effective method of energy storage, it is crucial for electricity regulation due to its high load-balancing efficiency, reliability as a backup power source, long service life, and large storage capacity. The lateral inlet/outlet is one of the key components in pumped-storage power stations (PSPS), designed for bidirectional flow to meet varied power demand. During inflow conditions, water is contracted and flows from the reservoir through the inlet/outlet into the tunnel; during outflow conditions, water is expanded and flows from the tunnel back into the reservoir. The bidirectional flow creates complex fluid dynamics within the inlet/outlet, which can damage the trash rack, affecting the efficiency of power generation and, in severe cases, even damaging the turbines (Tsikata *et al.* 2009; Raynal *et al.* 2013; Xue *et al.* 2023). Damage to the trash rack is mainly influenced by the following two factors: (1) large turbulence fluctuations in the water flow cause long-term fatigue damage to the trash rack (Nguyen & Naudascher 1991), and (2) high local flow speeds can lead to resonance damage when the frequency of the Karman vortex street matches the natural frequency of the trash rack (Naudascher & Wang 1993). In response to these challenges, the design of lateral inlets/outlets should strictly control *V*_{m} and *U*_{c} at the trash rack section to prevent damage (Gao *et al.* 2018; NDRC 2019; Zhu *et al.* 2023).

The design of the lateral inlet/outlet primarily focuses on the flow velocity distribution at the trash rack. Sun *et al.* (2007) measured the flow velocity distribution at the trash rack under different diversion pier arrangements in lateral inlets/outlets and proposed methods to deal with bending and negative flow velocities. Ye & Gao (2011) used the realizable *k*-*ε* model for numerical simulation of lateral inlets/outlets, analyzed flow distribution, head loss, and inlet vortices, and compared these with experimental results. Terrier *et al.* (2014) conducted hydraulic model experiments to study surge wave issues in the tailrace tunnels of PSPS and analyzed head loss during sudden starts and stops. Zhu *et al.* (2023) performed CFD simulations to derive the flow velocity distribution and turbulence intensity at the trash rack, compared these with results from a Laser Doppler Velocimeter (LDV), and analyzed the causes of fatigue damage to the trash rack. Gao *et al.* (2018) used Response Surface Methodology (RSM) to construct mappings from the influencing factors in the inlet/outlet to head loss and the unevenness coefficient, and performed multi-objective optimization with NSGA-II. However, RSM relies on quadratic polynomial fitting, which is suitable only for simple datasets, whereas machine learning (ML) models can handle more complex, nonlinear data and are applicable to a wider range of scenarios.

ML models are rapidly advancing in the field of water science. Examples include SVR (Roushangar *et al.* 2018; Mia & Dhar 2019; Yu *et al.* 2023) for dam safety monitoring, discharge coefficient prediction, and optimization of hard turning of steel; RF (Hong *et al.* 2012) for predicting scour depth; GBDT (Qiu *et al.* 2021) for forecasting ground vibration; and XGBOOST (Sen *et al.* 2023) for predicting global temperature anomalies. However, ML models are often viewed as 'black boxes', with their internal decision-making processes not fully understood. This has led to a growing interest in interpretability analysis to clarify their internal mechanisms and improve the credibility of their results. The SHapley Additive exPlanations (SHAP) method was the first to be widely adopted. Mo *et al.* (2023) analyzed the importance of different features in cavity water and length models using the XGBOOST-SHAP model. Karakaş *et al.* (2023) improved the accuracy and interpretability of ML model predictions for the bearing capacity of closed and open piles using SHAP values. While SHAP focuses on the impact of variables acting independently and emphasizes the contribution of individual sample points, the interactive effects among multiple variables can be derived from SOBOL analysis. Zouhri *et al.* (2022) enhanced the robustness of SVMs against feature uncertainty by introducing the SOBOL analysis method. Xu *et al.* (2023) combined ML models with SOBOL analysis to propose a global sensitivity analysis approach for assessing the safety and effectiveness of dam structural parameters. Therefore, by revealing the dependence of the target variables on the input variables, interpretability analysis not only offers valuable insights during the design phase but also sets the stage for multi-objective optimization analysis.

Currently, NSGA-II is widely used in solving multi-objective optimization problems. For example, Komasi & Goudarzi (2021) used NSGA-II to design a water network consisting of 12 observation points and significantly reduced the number of monitoring sites. Dey *et al.* (2019) employed numerical simulation to study flow and heat transfer around square and circular shapes and used NSGA-II for optimization, successfully finding the optimal roundness of the cylinder corner. Zhao *et al.* (2023) combined RSM with NSGA-II for optimizing lithium batteries, effectively reducing the maximum temperature difference and pressure difference while enhancing the PCM liquid phase fraction. In this study, because preventing damage to the trash rack requires the lateral inlet/outlet to reach the desired *V*_{m} and *U*_{c} simultaneously, a multi-objective optimization analysis is carried out.

To achieve the multi-objective optimization of the lateral inlet/outlet, this paper is organized as follows. First, CFD simulations are performed for 125 different scenarios. Subsequently, the SOBOL-GBDT method is used to establish a nonlinear mapping between the influencing factors and the objective functions *V*_{m} and *U*_{c}. The importance of each factor is analyzed through SOBOL first-order and second-order indices, considering that the individual and interactive effects vary at different *C*_{A}. The model's predictive accuracy and interpretability analysis are then compared with the results from SVR, RF, and XGBOOST models, confirming the reliability of the GBDT model. Finally, the SOBOL-GBDT-NSGA-II multi-objective optimization framework is applied to obtain the Pareto front solution set. Using a pseudo-weight method, the optimal solutions for two design scenarios, S1 and S2, are identified. The paper also analyzes the convergence of the algorithm and the distribution characteristics of the variable space based on these solutions.

## CONSTRUCTION OF THE LATERAL INLET/OUTLET MODEL

### Model parameters of the inlet/outlet

The original design of the lateral inlet/outlet has vertical (*α*) and horizontal (*β*) diffusion angles of 6.12° and 33.774°, respectively. The lateral inlet/outlet connects to the reservoir and the water conveyance system via an open channel and a tunnel, respectively. Water flows into the inlet/outlet for power generation and flows out for water pumping. The height (*H*) and width (*B*) of the channel are 13.5 m and 7.5 m, respectively, with a channel area (*C*_{A} = *B* × *H*) of 101.25 m^{2}. The diameter (*D*) of the tunnel is 9 m, and the length of the diffusion section (*L*_{D}) is 42 m.

The experiment, as shown in Figure 1(b), uses a scale model ratio of 1:40. The inlet/outlet is made of acrylic glass. Under the condition of full discharge in pumping mode, the flow rate is 262 m^{3}/s, with an average velocity of 4.11 m/s. Flow velocities in each channel are measured using an Acoustic Doppler Velocimeter (ADV), which has a measurement accuracy of 1 × 10^{−3} m/s. In the inflow condition, water enters from a water tower into a triangular weir, which controls the flow rate with high precision. The water then flows into the reservoir and finally enters the inlet/outlet and downstream tunnel. The experiment mainly measures the flow velocity across the trash rack cross-section, as depicted in Figure 1(c). Time-averaged velocities at five points along the vertical direction of each channel are measured for validation of the numerical simulation.

### Numerical setup and validation

| Flow direction | Flow region | Inlet flow rate (m^{3}/s) | Inflow velocity (m/s) | Outlet water level (m) |
|---|---|---|---|---|
| Inflow | Contracted flow | 261 | 4.11 | 740 |
| Outflow | Diffusive flow | 261 | 4.11 | 740 |

… *k*-*ε* turbulence model (Shaheed *et al.* 2019). Figure 3 illustrates the global and local mesh structures for these components. To improve computational efficiency and accuracy, a structured meshing technique is employed using ICEM. The boundary layer is resolved by 10 grid layers, with the first layer next to the wall set to a resolution of 0.01 m. The mass, momentum, and *k*-*ε* transport equations are solved using the finite volume method. To deal with the coupling between pressure and velocity in the Navier–Stokes equations, the PISO algorithm is applied. Spatial discretization is achieved through a second-order upwind scheme, while time discretization adopts a first-order implicit approach. The Volume of Fluid (VOF) method is utilized to track the free surface. A time step of 0.001 s is employed, with a convergence criterion of 10^{−5} at each step. Time-averaging of the flow starts once the flow rates at the lateral inlet/outlet have stabilized.

## MULTI-OBJECTIVE OPTIMIZATION BASED ON THE GBDT MODEL

### Gradient boosting decision tree

The initial model *h*_{0}(*x*), which forms the foundation of the ensemble, is defined by minimizing the loss function *L*(*y*, *h*(*x*)). The ensemble model *f*(*x*) is represented as:

$$f(x) = \sum_{m} C_{m}\, I(x \in R_{m})$$

Here, *f*(*x*) denotes the final model obtained by summing the outputs of all the decision trees. The term *C*_{m} represents the contribution of the *m*-th tree, while *I*(*x* ∈ *R*_{m}) is an indicator function that determines if the input *x* falls within the region *R*_{m} covered by the *m*-th tree.

At each iteration, the model is updated as:

$$h_{m}(x) = h_{m-1}(x) + \sum_{j} C_{mj}\, I(x \in Q_{mj})$$

Here, *h*_{m}(*x*) is the updated model after adding the *m*-th tree, *h*_{m−1}(*x*) is the model from the previous iteration, *C*_{mj} is the optimal value that minimizes the loss for the *j*-th region *Q*_{mj}, and *I*(*x* ∈ *Q*_{mj}) is an indicator function assigning *C*_{mj} to the inputs falling within the region *Q*_{mj}.
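The additive update described above can be illustrated with a minimal sketch. It assumes a squared-error loss, a single input variable, and depth-1 trees (two regions per tree); the names `fit_stump`, `fit_gbdt`, and `predict`, the shrinkage factor `learning_rate`, and the toy data are illustrative assumptions rather than the configuration used in this study.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: pick the threshold whose two leaf means
    (the per-region constants) minimise the squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split leaves both regions non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        c_left = sum(left) / len(left)
        c_right = sum(right) / len(right)
        sse = (sum((r - c_left) ** 2 for r in left)
               + sum((r - c_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, c_left, c_right)
    return best[1], best[2], best[3]


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.5):
    """Boosting loop: start from a constant model and add one stump per round."""
    h0 = sum(ys) / len(ys)  # the constant that minimises squared-error loss
    preds = [h0] * len(xs)
    trees = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, c_left, c_right = fit_stump(xs, residuals)
        trees.append((t, c_left, c_right))
        preds = [p + learning_rate * (c_left if x <= t else c_right)
                 for x, p in zip(xs, preds)]
    return h0, trees, learning_rate


def predict(model, x):
    """Sum the initial constant and the shrunken leaf value of every tree."""
    h0, trees, lr = model
    return h0 + sum(lr * (cl if x <= t else cr) for t, cl, cr in trees)


# Toy step-shaped data (hypothetical, for illustration only).
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
model = fit_gbdt(xs, ys)
```

Each round fits the current residuals, so the correction added by a tree plays the role of the region constants in the update formula; production libraries add deeper trees, regularisation, and multi-feature splits on top of this loop.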

### Sobol interpretability analysis

Sobol analysis (Sobol 2001) is a widely used method for sensitivity analysis. It is primarily employed to assess the impact of different input parameters on the output of a model. This analysis is particularly suitable for complex models, especially where there are interdependent or nonlinear relationships between the model's input parameters. In Sobol analysis, it is assumed that the model can be represented as *y* = *f*(*x*), where *x* = (*x*_{1}, *x*_{2}, …, *x*_{n}) are the model's input parameters, and *y* is the output. The core of Sobol analysis is variance decomposition, which breaks down the total variance of the model output into contributions from different input variables. This method is primarily implemented through Sobol indices, which can be categorized into two types:

First-order index *S*_{i}, which represents the contribution of a single variable *x*_{i} to the output variance *V*(*y*), defined as *S*_{i} = *V*_{i}/*V*(*y*), where *V*_{i} is the variance of *y* due to *x*_{i} alone. Second-order index *S*_{ij}, which accounts for the effect of interactions between two variables *x*_{i} and *x*_{j}, defined as *S*_{ij} = *V*_{ij}/*V*(*y*), where *V*_{ij} represents the variance of *y* due to the interactive effects of *x*_{i} and *x*_{j}.

The method developed by Sobol was further enhanced by Saltelli *et al.* (2010), who suggested algorithmic extensions for calculating the sensitivity indices. The Saltelli extension incorporates new strategies for generating sample matrices and is applicable to models with a large number of input variables. Taking a model with *m* input variables as an example, a sample matrix of size *n*(2*m* + 2) is generated, where *n* is a base sample size chosen to balance accuracy and computational cost. For example, with a base sample size of *n* = 1,024 and *m* = 4 input variables, the sample matrix consists of 10,240 samples.
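The variance decomposition behind the first- and second-order indices can be sketched as follows. As a simplifying assumption, the indices are computed exactly over a small full-factorial grid instead of with Saltelli sampling; the function names, the toy model, and its grid levels are hypothetical and serve only to show how *V*_{i} and *V*_{ij} are separated. The sample-count helper reproduces the *n*(2*m* + 2) rule quoted above.

```python
from itertools import product


def saltelli_sample_count(n, m):
    """Sample matrix size n(2m + 2) for base size n and m input variables."""
    return n * (2 * m + 2)


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def conditional_variance(f, levels, fixed):
    """Variance of the conditional mean E[y | x_k for k in fixed] over the grid."""
    groups = {}
    for point in product(*levels):
        key = tuple(point[k] for k in fixed)
        groups.setdefault(key, []).append(f(point))
    # On a full-factorial grid every group has the same size, so the
    # unweighted variance of the group means is the correct quantity.
    return variance([sum(g) / len(g) for g in groups.values()])


def sobol_indices(f, levels):
    """First-order indices S_i and second-order indices S_ij on a grid."""
    total = variance([f(p) for p in product(*levels)])
    n = len(levels)
    v_single = [conditional_variance(f, levels, (i,)) for i in range(n)]
    first = [v / total for v in v_single]
    second = {}
    for i in range(n):
        for j in range(i + 1, n):
            v_pair = conditional_variance(f, levels, (i, j))
            # Remove the individual effects to isolate the interaction term.
            second[(i, j)] = (v_pair - v_single[i] - v_single[j]) / total
    return first, second


def toy_model(x):
    """Hypothetical model: two additive terms plus one interaction (x0 * x2)."""
    return x[0] + 2 * x[1] + x[0] * x[2]


levels = [[-1, 0, 1]] * 3
first, second = sobol_indices(toy_model, levels)
print(saltelli_sample_count(1024, 4))  # 10240
```

For this toy model the first-order indices of the three inputs and the single non-zero interaction index sum to one, which is the property that lets the indices be read as shares of the output variance.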

### NSGA-II and Pareto-front solution set

NSGA-II (Deb *et al.* 2002) (Non-dominated Sorting Genetic Algorithm II), developed by Kalyanmoy Deb and others in 2002, is an advanced algorithm designed for multi-objective optimization problems. It improves upon NSGA by effectively addressing complex issues with multiple conflicting objectives. The algorithm is distinguished by its non-dominated sorting process, which classifies solutions based on their dominance levels. This is critical for identifying solutions that are closest to the ideal outcome in scenarios with competing objectives. Additionally, NSGA-II integrates a crowding distance mechanism, ensuring a diverse range of solutions and preventing the over-concentration of solutions in a particular region of the solution space.
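The two mechanisms named above, non-dominated sorting and crowding distance, can be sketched in a few lines for a minimisation problem. This is a simplified illustration: the fronts are peeled off one at a time rather than with the bookkeeping of the original fast non-dominated sort, and the function names and sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(points):
    """Split point indices into fronts; front 0 is the non-dominated set."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in a front (larger = less crowded)."""
    dist = {i: 0.0 for i in front}
    n_obj = len(points[front[0]])
    for k in range(n_obj):
        ordered = sorted(front, key=lambda i: points[i][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        # Boundary solutions are always kept, so they get infinite distance.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (hi - lo)
    return dist


# Hypothetical two-objective values, both to be minimised.
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(points)
distances = crowding_distance(points, fronts[0])
```

In the full algorithm, selection prefers lower front numbers first and, within a front, larger crowding distances, which is what keeps the population spread along the front.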

The Pareto front (Poulos *et al.* 2001), in the context of multi-objective optimization, is a key concept intimately linked with NSGA-II's objectives. It represents a set of non-dominated solutions, meaning that no other solution in the set is superior in all objectives (Safari *et al.* 2018). The Pareto front forms a conceptual boundary that maps the distribution of these optimal solutions, showcasing the optimal trade-offs among different objectives. This boundary is crucial for decision-makers as it guides them in identifying the most effective solutions that strike a balance between competing objectives. NSGA-II aims to approximate this Pareto front as closely as possible (Babajamali *et al.* 2022), providing a comprehensive set of balanced solutions for complex optimization problems.

In multi-objective optimization, the main components include input variables and objective functions. This study is dedicated to finding the optimal shape for lateral inlet/outlet. Therefore, vertical diffusion angle, horizontal diffusion angle, length of diffusion segment, and flow channel area are selected as variables. The objective function is adjusted to reduce average flow velocity and achieve as uniform a velocity distribution as possible across the trash rack section. To address the multi-objective optimization problem effectively, the Pymoo package in Python was utilized. The algorithm was configured with a population size of 100 and iterated across 120 generations.

### Objective function and dataset

To limit the average flow velocity (*v*) at the trash rack section (Zhu *et al.* 2023), it is necessary to design the inlet/outlet properly so that *v* is kept within a reasonable range (Administration 2019), which is usually less than 1.2 m/s. During the design, the maximum *v*, which corresponds to the most hazardous condition, is chosen as the target function *V*_{m}:

$$V_{m} = \max_{i}\left(v_{i}\right)$$

where *i* indicates the channel number.

The uniformity of the velocity distribution is characterized by the coefficient of unevenness *U*_{c}, defined as:

$$U_{c} = \frac{v_{i,\max}}{v_{i}}$$

where *v*_{i} represents the average flow velocity, and *v*_{i,max} denotes the maximum local velocity at the trash rack section for channel *i*. In the design of the lateral inlet/outlet, it is necessary to meet the requirements for both *V*_{m} and *U*_{c} simultaneously.
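The two objectives defined above can be evaluated from point velocities as in the following sketch. Several choices here are assumptions made for illustration: the channel-averaged velocity is taken as the arithmetic mean of the measured points, the section-level *U*_{c} is taken as the largest per-channel value, and the velocity values are invented.

```python
def channel_mean(local_velocities):
    """Channel-averaged velocity, assuming equally weighted measuring points."""
    return sum(local_velocities) / len(local_velocities)


def average_velocity_objective(channels):
    """V_m: the largest channel-averaged velocity over all channels."""
    return max(channel_mean(v) for v in channels)


def unevenness_coefficient(local_velocities):
    """Per-channel U_c: maximum local velocity over channel-averaged velocity."""
    return max(local_velocities) / channel_mean(local_velocities)


def unevenness_objective(channels):
    """Section-level U_c, taken here as the worst channel (an assumption)."""
    return max(unevenness_coefficient(v) for v in channels)


# Hypothetical point velocities (m/s) at five heights in two channels.
channels = [
    [0.5, 1.0, 1.5, 1.0, 1.0],
    [0.75, 0.75, 0.75, 0.75, 0.75],
]
v_m = average_velocity_objective(channels)
u_c = unevenness_objective(channels)
meets_velocity_limit = v_m < 1.2  # 1.2 m/s limit quoted in the text
```

A perfectly uniform channel gives a coefficient of 1, so values well above 1 flag the concentrated jets that the optimization tries to suppress.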

The flow rate in each channel (*Q*_{i}) under specific inflow conditions is governed by four variables: *α*, *β*, *L*_{D}, and *L*_{S}. As illustrated in Figure 1, these variables represent the vertical and horizontal diffusion angles, diffusion segment length, and spacing between diversion piers, respectively. Hence, the average velocity (*v*_{i}) is determined as the ratio of the flow rate (*Q*_{i}) to the channel cross-sectional area (*C*_{A}), expressed as *v*_{i} = *Q*_{i}/*C*_{A}. When *L*_{S} remains constant, the simplified equation becomes

$$\left(U_{c},\, V_{m}\right) = f\left(\alpha,\, \beta,\, L_{D},\, C_{A}\right)$$

This paper aims to provide a rational design for the inlet/outlet, as well as to enrich the database for research on ML and multi-objective optimization. Specifically, five values have been selected for the vertical diffusion angle (*α*): 3°, 4°, 5°, 6.12°, and 7°; and five values for the horizontal diffusion angle (*β*): 27°, 30°, 33.77°, 37°, and 41°. Additionally, the diffusion segment length (*L*_{D}) takes values of 32, 37, 42, 47, and 52 m. The ranges of the input parameters are listed in Table 2, and CFD simulations are performed for these 125 cases.
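The 125 cases follow from combining the five levels of each of the three geometric variables. A minimal sketch of that full-factorial enumeration is given below; the dictionary keys are illustrative names, and *C*_{A} is omitted because its per-case values are not listed in the text (Table 2 gives only its range).

```python
from itertools import product

# Levels quoted in the text.
ALPHA_DEG = [3.0, 4.0, 5.0, 6.12, 7.0]       # vertical diffusion angle
BETA_DEG = [27.0, 30.0, 33.77, 37.0, 41.0]   # horizontal diffusion angle
L_D_M = [32.0, 37.0, 42.0, 47.0, 52.0]       # diffusion segment length

# One dictionary per CFD case: every combination of the three variables.
cases = [
    {"alpha": a, "beta": b, "L_D": length}
    for a, b, length in product(ALPHA_DEG, BETA_DEG, L_D_M)
]

original_design = {"alpha": 6.12, "beta": 33.77, "L_D": 42.0}
print(len(cases))  # 125
```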

| | *α* (°) | *β* (°) | *L*_{D} (m) | *C*_{A} (m^{2}) | *U*_{c} | *V*_{m} (m/s) | Samples |
|---|---|---|---|---|---|---|---|
| Data | 3.00, 4.00, 5.00, 6.12, 7.00 | 27.00, 30.00, 33.77, 37.00, 41.00 | 32, 37, 42, 47, 52 | 52.99–166.74 | 1.28–5.35 | 0.50–1.54 | 125 |

## RESULTS AND DISCUSSION

### Establishment of interpretable ML models

#### Optimization of ML models

Evaluations of the performance of the traditional ML models including SVR (Vapnik 1999), RF (Breiman 2001), and XGBOOST (Chen & Guestrin 2016) together with GBDT are carried out. With 80% of the data allocated as the training set and the remaining 20% as the test set, Bayesian Optimization (BO) (Bergstra *et al.* 2011) was employed to fine-tune the hyperparameters of each ML model and the results are listed in Table 3. The predictive performance was indicated using common metrics: *R*^{2}, RMSE, and MAE (Mia & Dhar 2019; Mo *et al.* 2023), and Table 4 provides a comparison of model performance before and after optimization on the test set. The results indicate that the optimized models have better performance, hence the analysis proceeded with these optimized models.
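The evaluation protocol described above, an 80/20 split followed by *R*^{2}, RMSE, and MAE on the test set, can be sketched with the standard library alone. The shuffling scheme and seed are assumptions, since the text does not state how the split was drawn, and the sample values are invented to exercise the metric functions.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# 125 cases split 80/20 gives 100 training and 25 test samples.
train, test = train_test_split(list(range(125)))

# Hypothetical targets and predictions, used only to show the metrics.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = (r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```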

| Objective | Model | Hyperparameters | | |
|---|---|---|---|---|
| *U*_{c} | SVR-BO | C = 98.56 | γ = scale | ε = 0.04 |
| | RF-BO | n_estimators = 948 | max_depth = 40 | min_samples_leaf = 1 |
| | GBDT-BO | n_estimators = 660 | max_depth = 10 | learning_rate = 0.17 |
| | XGBOOST-BO | n_estimators = 718 | max_depth = 5 | learning_rate = 0.11 |
| *V*_{m} | SVR-BO | C = 69.51 | γ = scale | ε = 0.01 |
| | RF-BO | n_estimators = 279 | max_depth = 30 | min_samples_leaf = 1 |
| | GBDT-BO | n_estimators = 327 | max_depth = 8 | learning_rate = 0.29 |
| | XGBOOST-BO | n_estimators = 952 | max_depth = 4 | learning_rate = 0.13 |

| Model | *U*_{c} test *R*^{2} | *U*_{c} test RMSE | *U*_{c} test MAE | *V*_{m} test *R*^{2} | *V*_{m} test RMSE | *V*_{m} test MAE |
|---|---|---|---|---|---|---|
| SVR | 0.939 | 0.204 | 0.161 | 0.924 | 0.058 | 0.046 |
| SVR-BO | 0.945 | 0.193 | 0.154 | 0.995 | 0.015 | 0.010 |
| RF | 0.940 | 0.202 | 0.162 | 0.995 | 0.015 | 0.011 |
| RF-BO | 0.947 | 0.190 | 0.146 | 0.995 | 0.015 | 0.010 |
| GBDT | 0.972 | 0.137 | 0.101 | 0.997 | 0.012 | 0.009 |
| GBDT-BO | 0.987 | 0.093 | 0.077 | 0.997 | 0.012 | 0.009 |
| XGBOOST | 0.959 | 0.168 | 0.127 | 0.987 | 0.024 | 0.020 |
| XGBOOST-BO | 0.974 | 0.133 | 0.101 | 0.993 | 0.017 | 0.013 |

In *U*_{c} prediction, GBDT-BO showed the best performance, with *R*^{2}, RMSE, and MAE values of 0.987, 0.093, and 0.077, respectively, followed by the XGBOOST-BO model with *R*^{2}, RMSE, and MAE values of 0.974, 0.133, and 0.101. The SVR-BO model had the poorest performance, with an *R*^{2} 4.2% lower and RMSE and MAE 107.5% and 100% higher, respectively, than those of the GBDT-BO model (Table 4).

In *V*_{m} prediction, the *R*^{2} values for SVR-BO, RF-BO, GBDT-BO, and XGBOOST-BO were 0.995, 0.995, 0.997, and 0.993, respectively, indicating strong predictive ability on the test data. GBDT-BO presents the smallest error, with RMSE and MAE of 0.012 and 0.009, whereas XGBOOST-BO had the largest error, with RMSE and MAE 41.6% and 44.4% higher than those of GBDT-BO, respectively.

#### Sobol interpretability analysis

With the established ML model, the interactions and significance of the input variables in relation to the model are further investigated. The Sobol method is applied to calculate first- and second-order Sobol indices, revealing the significance of individual variables and the interactive effect of two variables during the prediction process. A larger *V*_{m} has a relatively mild effect on the trash rack, since failure requires the long-term accumulation of fatigue, whereas an increase in *U*_{c} leads to a significant increase in flow-induced vibration and causes large deformation of the lower and middle parts of the trash rack (Nguyen & Naudascher 1991). Therefore, taking *U*_{c} as the target output, the relative significance of the influencing factors *α*, *β*, *L*_{D}, and *C*_{A} is investigated to guide the optimization of the lateral inlet/outlet design.

Tables 5 and 6 present the first-order (*S*_{i}) and second-order (*S*_{ij}) Sobol indices of the ML models, which correspond to the individual and interactive effects of the input variables, respectively. Focusing first on the *S*_{i} listed in Table 5, *C*_{A} is predominant in the SVR and RF models, with *S*_{CA} of about 0.97, whereas XGBOOST presents a lower *S*_{CA} and higher *S*_{i} for *α*, *β*, and *L*_{D}, indicating the increased importance of these variables in the model development. In GBDT-BO, *S*_{CA} decreases further, accompanied by increases in *S*_{α} and *S*_{β}, while *S*_{LD} remains of similar magnitude.

| Model | *S*_{α} | *S*_{β} | *S*_{LD} | *S*_{CA} |
|---|---|---|---|---|
| SVR | 0.000 | 0.000 | 0.025 | 0.972 |
| RF | 0.012 | 0.000 | 0.001 | 0.978 |
| GBDT | 0.244 | 0.156 | 0.049 | 0.511 |
| XGBOOST | 0.113 | 0.121 | 0.057 | 0.709 |

| Model | *α*, *β* | *α*, *L*_{D} | *α*, *C*_{A} | *β*, *L*_{D} | *β*, *C*_{A} | *L*_{D}, *C*_{A} |
|---|---|---|---|---|---|---|
| SVR | 0.000 | 0.006 | 0.091 | 0.033 | 0.000 | 0.000 |
| RF | 0.000 | 0.000 | 0.079 | 0.013 | 0.001 | 0.000 |
| GBDT | 0.010 | 0.000 | 0.217 | 0.020 | 0.140 | 0.040 |
| XGBOOST | 0.020 | 0.000 | 0.132 | 0.030 | 0.114 | 0.000 |

As for the interactive effects of variable pairs listed in Table 6, a noticeable *S*_{ij} exists only between *α* and *C*_{A} in SVR and RF, with magnitudes of less than 0.1, whereas both GBDT and XGBOOST have *S*_{α,CA} and *S*_{β,CA} greater than 0.1, indicating that possible interactive effects between *α* and *C*_{A} and between *β* and *C*_{A} are captured by these models.

Considering that designs of the inlet/outlet are usually based on a given flow velocity that is determined by *C*_{A}, a range for *C*_{A} is selected between 52.99 and 166.74 m^{2}, within which 200 *C*_{A} values are chosen for analysis. By sequentially studying these 200 *C*_{A} values, the Sobol method is employed in each iteration to calculate the relative significance of *α*, *β*, and *L*_{D} in the ML model development at the specific *C*_{A} value. This approach provides a rational reference for designing the inlet/outlet structure for a selected *C*_{A} value.

Figure 6 shows the relative significance of *α*, *β*, and *L*_{D} with respect to changes in *C*_{A}. Results in Figure 6(a) indicate that in the SVR prediction, *L*_{D} is the least important input, while both *α* and *β* present a stronger effect. The relative importance of these inputs remains invariant with the change of *C*_{A}. The higher significance of *α* compared with *L*_{D} can also be observed in the RF, GBDT, and XGBOOST models, as shown in Figures 6(b)–6(d), respectively, whereas the significance of *β* lies in between them. Overall, these results indicate that *α* plays the most significant role whereas *L*_{D} is the least important parameter in the *U*_{c} prediction.

To reflect the generalization performance of the models, samples were taken during the iterative process mentioned above. The ML models then predicted and averaged the outcomes for the 10,240 samples drawn in each iteration. The variation of *U*_{c} derived from each ML model with respect to *C*_{A} was obtained and compared with the original dataset, as shown in Figure 6(e). The figure indicates that SVR had the poorest predictive performance, fitting well only when *C*_{A} was between 72 and 137 m^{2}, followed by RF and XGBOOST, which show errors near the maximum and minimum *C*_{A}. Lastly, the results from GBDT present the best agreement with the original data.

### Multi-objective optimization analysis of inlet/outlet

Results in Figure 6(e) indicate that an increase in *C*_{A} leads to an increase in *U*_{c}, which raises the risk of damage due to flow-induced vibrations. However, an increase in *C*_{A} also causes *V*_{m} to decrease, suggesting that lower velocities with less turbulent energy reduce the risk of fatigue damage. Given the multiple critical outputs (*U*_{c} and *V*_{m}) and input parameters (*α*, *β*, *L*_{D}, and *C*_{A}) involved in the design of lateral inlets/outlets, a multi-objective optimization approach is necessary to achieve an optimal design. In this work, the GBDT model is adopted to establish the relationship between the input variables *α*, *β*, *L*_{D}, *C*_{A} and the objectives *U*_{c}, *V*_{m}. During the optimization, two constrained scenarios, S1 and S2, are considered: S1 requires *V*_{m} to remain below 1.2 m/s, whereas S2 keeps *C*_{A} constant and performs the optimization by adjusting *α*, *β*, and *L*_{D}.

#### Algorithm convergence analysis

#### Analysis of patterns in objective and variable space

The Pareto-optimal solution sets for S1 and S2 present the trade-off between *U*_{c} and *V*_{m}. For these solutions, any improvement in one objective leads to the deterioration of the other; such solutions are known as non-dominated solutions. It is observed that the range of the solution set for S1 is significantly larger than that for S2, indicating that S2 has stricter constraints in the model development. The selected point (marked as a red cross) represents the optimal solution achievable in the design phase, obtained using pseudo-weights (Kalyanmoy 2001). In this process, the weights for *U*_{c} and *V*_{m} are 0.7 and 0.3, respectively, indicating a greater emphasis on mitigating the adverse effects caused by *U*_{c} during the selection process. The objective function values for S1, S2, and the original design are presented in Table 7. In S1, *U*_{c} decreases sharply, by 33% compared to the original design, at the cost of a 16.7% increase in *V*_{m}. S1 thus accepts a moderate increase in *V*_{m} in exchange for a lower *U*_{c}, preventing the flow-induced vibration of the trash rack caused by excessive *U*_{c}. In S2, *V*_{m} remains essentially unchanged at 0.804 m/s, while *U*_{c} decreases by 18.5% compared to the original design.
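The pseudo-weight selection can be sketched as follows. Each solution on the front receives a weight vector describing how close it sits to the best value of each objective, and the solution whose vector is nearest to the target weights (0.7 for *U*_{c}, 0.3 for *V*_{m}) is chosen. The use of an L1 distance, the function names, and the front values are assumptions for illustration; they are not results from this study.

```python
def pseudo_weights(front):
    """Pseudo-weight vector of each solution on a minimisation front.

    Assumes at least two solutions and a non-zero range in every objective.
    """
    n_obj = len(front[0])
    f_min = [min(p[k] for p in front) for k in range(n_obj)]
    f_max = [max(p[k] for p in front) for k in range(n_obj)]
    weights = []
    for p in front:
        # 1 means the solution attains the best value of that objective.
        closeness = [(f_max[k] - p[k]) / (f_max[k] - f_min[k])
                     for k in range(n_obj)]
        total = sum(closeness)
        weights.append([c / total for c in closeness])
    return weights


def select_by_pseudo_weights(front, target):
    """Index of the solution whose pseudo-weights are closest to the target."""
    distances = [
        sum(abs(w - t) for w, t in zip(ws, target))
        for ws in pseudo_weights(front)
    ]
    return min(range(len(front)), key=distances.__getitem__)


# Hypothetical non-dominated (U_c, V_m) pairs, both minimised.
front = [(2.0, 1.10), (2.2, 0.95), (2.6, 0.85), (3.0, 0.80)]
chosen = select_by_pseudo_weights(front, target=(0.7, 0.3))
```

With these invented values the second solution is selected: it gives up a little of the lowest attainable *U*_{c} for a clear gain in *V*_{m}, which is the kind of compromise a 0.7/0.3 weighting favours.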

| | *α* (°) | *β* (°) | *L*_{D} (m) | *C*_{A} (m^{2}) | *V*_{m} (m/s) | *U*_{c} |
|---|---|---|---|---|---|---|
| Original | 6.12 | 33.77 | 42.00 | 101.25 | 0.803 | 3.18 |
| S1 | 3.29 | 28.80 | 49.60 | 88.76 | 0.937 | 2.13 |
| S2 | 3.67 | 32.10 | 50.14 | 101.78 | 0.804 | 2.59 |
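The percentage changes discussed in the text can be checked directly against the tabulated values. The short sketch below recomputes them; the helper name `relative_change` and the dictionary layout are illustrative.

```python
def relative_change(new, old):
    """Signed percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


# Values from the table of original and optimised designs.
original = {"beta": 33.77, "V_m": 0.803, "U_c": 3.18}
s1 = {"beta": 28.80, "V_m": 0.937, "U_c": 2.13}
s2 = {"beta": 32.10, "V_m": 0.804, "U_c": 2.59}

changes = {
    name: {key: relative_change(design[key], original[key]) for key in original}
    for name, design in (("S1", s1), ("S2", s2))
}

for name, values in changes.items():
    print(name, {key: round(value, 1) for key, value in values.items()})
```

Negative values are reductions relative to the original design, so the reduction in *U*_{c} and the accompanying rise in *V*_{m} for S1 appear with opposite signs.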

In S1, the optimal values of *α* are predominantly distributed between 3° and 4°, and in S2 the range becomes 3° to 4.5°. Both represent a significant reduction from the original value of 6.12°. The optimal *β* in S1 and S2 is reduced by 14.7% and 4.9%, respectively, from the original 33.77°. The reductions in *α* and *β* aim to prevent flow separation, thus reducing *U*_{c}. The range of *C*_{A} is more varied in S1 but concentrated between 95 and 110 m^{2} in S2, with optimal values of 88.76 m^{2} for S1 and 101.78 m^{2} for S2. *L*_{D} is concentrated around 45–50 m in S1 and 49–51 m in S2, compared to the original 42 m, with optimal values of 49.6 and 50.14 m. The optimization strategies, reducing *α*, *β*, and *C*_{A} while increasing *L*_{D}, are evident in Figures 9(i), 9(j), 9(g), and 9(l), effectively lowering *U*_{c} while controlling *V*_{m}.

#### Analysis of optimization results

The *x*-directional fluid velocity distributions in Channel 1 are shown in Figure 10. Under outflow conditions, the original scheme experienced flow separation after the diffusion section, with the main flow concentrated near the middle and bottom. Compared with the original case, S1 and S2 significantly reduced the area of flow separation, resulting in a more uniform velocity distribution across the trash rack section. Under inflow conditions, characterized by contracting flow, the differences in flow patterns were relatively minor.

*C _{A}*. In Channel 1, the maximum normalized flow velocity in S1 is 1.57, a 29.2% reduction compared with the original case, while that in S2 is 2.16, a decrease of only 2.7%. Under inflow conditions, the velocity distribution across the trash rack section is similar for the different schemes; hence, these results are not presented.

### Discussion

This study carried out an interpretability analysis of the ML models, providing crucial insights into the impact of input features on model predictions (Altmann *et al.* 2010; Mi *et al.* 2020). Most researchers (Karakaş *et al.* 2023; Mo *et al.* 2023; Zhao *et al.* 2024) conduct interpretability analysis of ML models using the Shapley value method, which calculates the SHAP values of input variables to analyze feature importance. However, these values only represent the individual effect of features on the model. This paper employs the SOBOL method to calculate first- and second-order SOBOL indices, revealing that the results are influenced not only by the individual action of features but also by the interactions among features, especially in the GBDT and XGBOOST models, as shown in Table 6. The SOBOL method can reveal the impact of variables or variable combinations on model output, serving as a powerful tool for analyzing high-dimensional data and nonlinear models. From an engineering perspective, the quantified relative significance of the input variables under different *C _{A}* provides an intuitive reference for the design of the inlet/outlet.
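To illustrate what first- and second-order indices measure, the sketch below estimates them with a plain Monte Carlo pick-freeze scheme: two independent sample matrices are drawn, selected input columns are swapped between them, and the resulting covariance is divided by the output variance. The toy function `toy_model`, the uniform input range, the sample size, and all function names are assumptions for illustration only; in the setting of this paper, the trained predictor and the actual ranges of *α*, *β*, *L _{D}*, and *C _{A}* would take their place, and the estimator shown is not necessarily the one used by the authors.

```python
import random


def sobol_indices(model, n_inputs, n_samples=40_000, seed=0):
    """Monte Carlo estimate of first- and second-order Sobol indices."""
    rng = random.Random(seed)

    def draw():
        # Independent inputs, uniform on [-0.5, 0.5] (an assumption of this sketch).
        return [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]

    A = [draw() for _ in range(n_samples)]
    B = [draw() for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    outputs = fA + fB
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / len(outputs)

    def closed_variance(cols):
        # Variance explained jointly by the inputs in `cols`:
        # rows of A with those columns replaced by the matching columns of B.
        total = 0.0
        for a, b, ya, yb in zip(A, B, fA, fB):
            mixed = [b[i] if i in cols else a[i] for i in range(n_inputs)]
            total += yb * (model(mixed) - ya)
        return total / n_samples

    single = [closed_variance({i}) for i in range(n_inputs)]
    first = [v / variance for v in single]
    second = {}
    for i in range(n_inputs):
        for j in range(i + 1, n_inputs):
            # Pure interaction: joint effect minus the two individual effects.
            second[(i, j)] = (closed_variance({i, j}) - single[i] - single[j]) / variance
    return first, second


def toy_model(x):
    # Hypothetical stand-in for a trained regressor: one additive term, one interaction.
    return x[0] + 3.0 * x[1] * x[2]


first, second = sobol_indices(toy_model, n_inputs=3)
print("first-order :", [round(s, 3) for s in first])
print("second-order:", {pair: round(s, 3) for pair, s in second.items()})
```

For this toy function, inputs 1 and 2 have first-order indices near zero yet a large joint second-order index, which is exactly the kind of interaction that an analysis limited to individual effects would miss.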

However, the current investigation also has limitations. Because CFD simulations are time-consuming, the range of input variables is restricted. Future work should extend the investigated parameter range to obtain an ML model with better general applicability. Moreover, various multi-objective optimization algorithms exist in the literature, and their relative performance in optimizing the lateral inlet/outlet design is worth evaluating.

## CONCLUSION

In the PSPS, complex bidirectional flow is present in the lateral inlet/outlet. The design process must effectively reduce excessive turbulence intensity in the flow channels and prevent resonance damage caused by the water flow frequency coinciding with the trash rack frequency. Given the multiple factors that affect the overall design, a robust optimization method is needed to optimize *V _{m}* and *U _{c}* at the trash rack simultaneously. To address these concerns, this paper proposes the SOBOL-GBDT-NSGA-II framework for optimizing the inlet/outlet structure. The principal findings of this study are as follows:

The GBDT model accurately predicts the nonlinear mapping between the input variables, namely the vertical diffusion angle (*α*), horizontal diffusion angle (*β*), diffusion segment length (*L _{D}*), and channel area (*C _{A}*), and the outputs *U _{c}* and *V _{m}*. Additionally, the precision of GBDT's results is enhanced through BO. For instance, for *U _{c}*, the optimized GBDT achieves an *R*^{2} of 0.987, an RMSE of 0.093, and an MAE of 0.077. These metrics demonstrate the high accuracy and reliability of the GBDT model, which surpasses the SVR, RF, and XGBOOST models.

In the GBDT model, *S _{CA}* is significantly larger than *S _{α}*, *S _{β}*, and *S _{LD}*, indicating that *C _{A}* has the greatest impact on the *U _{c}* prediction model. Meanwhile, non-negligible magnitudes of *S _{α}* and *S _{β}* are observed, denoting a discernible impact of *α* and *β* on the GBDT model. In other ML models, notably SVR and RF, the trivial magnitudes of *S _{α}* and *S _{β}* indicate a much lower influence on the model. Moreover, the GBDT model exhibits significant values for both *S _{α,CA}* and *S _{β,CA}*, indicating a pronounced effect of these pairwise interactions on the model's performance, whereas no interactive effect between the input variables is captured by SVR and RF. This indicates that GBDT is better at extracting interaction information from the variables, facilitating the capture of complex, nonlinear relationships between the input variables and the output. As *C _{A}* increases, *α* has the greatest influence on the model, followed by *β*, while *L _{D}* has the least. This pattern reflects how changes in *C _{A}* can significantly affect the relative importance of the design parameters in predicting *U _{c}*, which is crucial for optimizing the design of lateral inlets/outlets in hydraulic structures.

The SOBOL-GBDT-NSGA-II framework is used to identify the Pareto-optimal solutions for two design scenarios, S1 and S2, aiming to minimize both *V _{m}* and *U _{c}*. The optimal conditions in both S1 and S2 feature lower *α*, *β*, and *C _{A}* and higher *L _{D}* than the original case. This optimization approach can provide guidance for similar projects and help determine appropriate parameter ranges. The multi-objective optimization accounts for the mutual influence of the two objectives. Through this framework, flow separation in the lateral inlet/outlet is significantly reduced. In S1, *U _{c}* is reduced by 33% while *V _{m}* increases by 16.7%, and in S2, *U _{c}* decreases by 18.5% with *V _{m}* remaining essentially unchanged. This effectively lowers the likelihood of resonance damage to the trash rack.

## FUNDING

This research is supported by the National Natural Science Foundation of China (Grant Nos 52179060, 51909024, and 52209081).

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Water Power& Dam Construction August*