## Abstract

In contrast to traditional black box machine learning models, white box models can achieve high prediction accuracy while also accurately evaluating and explaining the prediction results. In this research, the cavity water depth and cavity length of aeration facilities are predicted based on Extreme Gradient Boosting (XGBoost) and a Bayesian optimization technique. The Shapley Additive Explanation (SHAP) method is then utilized to explain the prediction results. This study demonstrates how SHAP can rank all features and feature interaction terms according to the significance of the input features. The XGBoost–SHAP white box model can reasonably explain the prediction results of XGBoost both globally and locally and can achieve prediction accuracy comparable to the black box model. The cavity water depth and cavity length white box models developed in this study have promising applications in the shape optimization of aeration facilities and the improvement of model experiments.

## HIGHLIGHTS

SHAP can accurately evaluate the prediction results of XGBoost.

SHAP considers the role of both single features and interactive features.

Bayesian optimization can significantly improve XGBoost performance.

Local interpretation can visualize the impact of all features.

The cavity water depth is more complex to predict than the cavity length.

## INTRODUCTION

In high-head discharge structures, cavitation erosion is often caused by high flow velocity, high pressure, and large flow rate, which erodes and damages the flow surface of the structure (Glazov 1984; Wu & Chao 2011). Currently, engineers use several measures to reduce cavitation damage, including designing flow structures with appropriate shapes, ensuring the flatness of flow surfaces, and implementing aeration schemes to reduce cavitation damage. Experimental studies show that the aeration scheme is easy to operate and significantly reduces cavitation damage, giving it the best comprehensive performance among these measures (Pfister & Hager 2010; Bai *et al.* 2016). Cavity water depth and cavity length are key indicators for measuring the damage-reduction effect of aeration facilities: a longer cavity and a lower backwater level indicate more efficient aeration, resulting in an improved damage-reduction effect (Glazov 1984; Wu & Ruan 2008). Many factors affect the cavity water depth and the cavity length, including the shape of the aeration facility and the flow conditions, so it is difficult to directly observe and analyze the main influencing factors (Brujan & Matsumoto 2012; Tsuru *et al.* 2017).

At present, machine learning has many applications in the field of water conservancy projects, such as the prediction of weir discharge coefficients, scour depth, and water quality assessment (Parsaie *et al.* 2017; Azamathulla *et al.* 2019). Relevant research shows that machine learning algorithms can achieve good predictive performance in water conservancy applications. Machine learning models are divided into black box models and white box models. Black box models include Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), etc., and most machine learning research results are based on black box models (Azimi *et al.* 2016; Pan *et al.* 2022). Some black box models can achieve excellent fitting and prediction results in classification and regression problems. For example, some scholars introduced a genetic algorithm (GA) and Bayesian optimization to adjust the hyperparameters of XGBoost and found that the *R*^{2} (coefficient of determination) of the model's predictions was as high as 0.941 (Gu *et al.* 2022; Kim *et al.* 2022). However, the internal mechanism of black box models is so complex that it is difficult for researchers to explain the prediction results, making the results unconvincing (Mi *et al.* 2020). White box models, also called interpretable machine learning models, overcome the poor interpretability of black box models. They fall into two categories: intrinsically interpretable machine learning models and ex-post interpretable machine learning models. Intrinsically interpretable models include the Generalized Additive Model (GAM), the Explainable Boosting Machine (EBM), etc.; GAM, however, ignores the interaction terms of the input features, and its prediction accuracy is too low (Agarwal *et al.* 2021).
Some intrinsically interpretable machine learning models can achieve excellent prediction results and make reasonable explanations for those results, such as GAMI-Net (Yang *et al.* 2021). In addition, there are interpretable machine learning models based on physical meaning. Research has demonstrated that the use of high-level concepts aids in evolving equations that are easier for domain specialists to interpret (Babovic 2009). To simulate groundwater levels, scholars have proposed embedding physical constraints into machine learning models; research shows that the physically constrained hybrid model exhibits better adaptability and generalization than pure deep learning models (Cai *et al.* 2021, 2022). In the field of rainfall–runoff simulation, there are also physically meaningful machine learning models such as Model Induction Knowledge Augmented-System Hydrologique Asiatique (MIKA-SHA) and Machine Learning Rainfall–Runoff Model Induction (ML-RR-MI), which can help hydrologists better understand catchment dynamics (Chadalawada *et al.* 2020; Herath *et al.* 2021). Ex-post interpretable machine learning models interpret the prediction results of black box models through post-hoc analysis methods such as the Partial Dependence Plot (PDP), Accumulated Local Effect (ALE), and Shapley Additive Explanation (SHAP) (Maxwell *et al.* 2021; Mangalathu *et al.* 2022). SHAP is an ex-post interpretation method that draws on game theory: by calculating the marginal contribution of every input feature and feature interaction term in the model, namely the Shapley Value, it measures the influence of features and interaction terms and thereby explains the black box model (Meddage *et al.* 2022).

In this study, the cavity water depth and cavity length prediction model based on XGBoost–SHAP was established by collecting experimental data. The model uses Bayesian optimization to search the hyperparameters of the model, uses SHAP to explain the global and local interpretation of the model prediction results, and analyzes the rationality of the interpretation results according to the experimental conclusions.

## METHODS

### Extreme gradient boosting

XGBoost is an ensemble learning algorithm based on gradient-boosted decision trees. Its objective function consists of a loss term and a regularization term:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K}\Omega(f_k)$$

where *n* denotes the total number of samples, $y_i$ denotes the actual value of the *i*th sample, $\hat{y}_i$ denotes the predicted value of the *i*th sample, $f_k$ denotes the model of the *k*th tree, and $\Omega(f_k)$ denotes the complexity of the *k*th tree.

At iteration *t*, a second-order Taylor expansion is performed on the objective function:

$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$$

where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$. Removing the constant term, the objective function is further simplified:

$$Obj^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$$
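Under squared-error loss, the gradients and Hessians in the expansion above are $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$, and minimizing the simplified objective over a single leaf gives the closed-form weight $w^* = -\sum_i g_i / (\sum_i h_i + \lambda)$. A minimal pure-Python sketch (the data values and $\lambda$ are illustrative, not from this study):

```python
# Illustrative one-step view of the simplified objective under squared-error loss.
# Assumption: g_i = y_hat - y (first derivative), h_i = 1 (second derivative).
lam = 1.0                     # L2 penalty on leaf weights (part of Omega)
y = [1.0, 1.5, 2.0]           # observed values falling in one leaf
y_hat = [1.2, 1.2, 1.2]       # predictions of the previous t-1 trees

g = [p - t for p, t in zip(y_hat, y)]   # gradients
h = [1.0] * len(y)                      # Hessians

# Minimizing sum(g_i*w + 0.5*h_i*w^2) + 0.5*lam*w^2 over the leaf weight w
w_star = -sum(g) / (sum(h) + lam)
```

The regularization term $\lambda$ shrinks the leaf weight toward zero, which is how $\Omega$ controls model complexity.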

### Bayesian optimization

The fundamental principle of Bayesian optimization is to estimate the posterior distribution of the objective function using the observed data and Bayes' theorem, and then choose the next sample hyperparameter combination based on this distribution. The data of the preceding sample points are fully utilized, and the optimization process finds the parameter that optimizes the overall improvement of the result by learning the shape of the objective function (Sui & Yu 2020). Bayes' theorem is:

$$p(f \mid D_{1:t}) = \frac{p(D_{1:t} \mid f)\, p(f)}{p(D_{1:t})}$$

where *f* denotes the unknown objective function; $D_{1:t} = \{(x_1, y_1), \ldots, (x_t, y_t)\}$ denotes the observed set, with $y_t = f(x_t) + \varepsilon_t$, where $x_t$ denotes the observed vector and $\varepsilon_t$ denotes the observation error; $p(D_{1:t} \mid f)$ denotes the likelihood distribution of *y*; $p(f)$ is the prior probability distribution of *f*; $p(D_{1:t})$ denotes the marginal likelihood distribution of *f*; and $p(f \mid D_{1:t})$ denotes the posterior probability distribution of *f*.

The most commonly used kernel function for Gaussian processes is the radial basis function (RBF), which measures the distance between any two points to generate a covariance matrix. Its basic form is:

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{(x_i - x_j)^2}{2\ell^2}\right)$$

where $\sigma_f$ and $\ell$ are hyperparameters of the Gaussian kernel, and $x_i$ and $x_j$ are different values of a hyperparameter. The Prior Function and the Acquisition Function are the two main processes in Bayesian optimization. The Prior Function process primarily employs Gaussian process regression, as illustrated in Figure 1.
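As an illustration of how the RBF kernel builds the covariance matrix, the following is a minimal pure-Python sketch (the function and variable names are ours, not from any specific library):

```python
import math

def rbf(x1, x2, sigma_f=1.0, length=1.0):
    """RBF kernel: k(x1, x2) = sigma_f^2 * exp(-(x1 - x2)^2 / (2 * length^2))."""
    return sigma_f ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))

# Covariance matrix over a few sampled values of one hyperparameter
xs = [0.1, 0.5, 0.9]
K = [[rbf(a, b) for b in xs] for a in xs]
```

The matrix is symmetric with $\sigma_f^2$ on the diagonal, and entries decay as the two hyperparameter values move apart, which is what lets the Gaussian process interpolate between previously evaluated points.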

### Shapley Additive Explanation

SHAP draws on the idea of game theory to interpret black box models; the Shapley Value is another name for the marginal contributions of different features and interactions (Mangalathu *et al.* 2022). The Shapley Value can evaluate the impact of each feature on the model outcomes and, in turn, provide an explanation for the black box model's prediction outcomes. For a single sample *x*, the ex-post interpretation model *g* is written as:

$$g(x) = \phi_0 + \sum_{i=1}^{M}\phi_i$$

where *M* is the number of features in the black box model *f*; $\phi_0$ is the average of the predicted values over all samples of *f*, also known as the base value; and $\phi_i$ is the Shapley value of the *i*th feature, which must be calculated and is the core of the whole SHAP method. The ex-post interpretation model *g* needs to satisfy the following three properties:

- 1. Local accuracy: the predicted value of model *g* for a single sample is equal to the predicted value of the black box model for that sample, $g(x) = f(x)$.
- 2. Missingness: missing sample features have no bearing on the ex-post interpretation model *g*; for a single sample with missing values, $\phi_i = 0$.
- 3. Consistency: when the complexity of the model changes, such as from RF to XGBoost, the $\phi_i$ of a feature changes with the contribution of that feature in the new model for a single sample.

In 1953, Lloyd Shapley demonstrated that the three aforementioned properties have a sole solution, the Shapley value in model *g*, that is:

$$\phi_i = \sum_{S \subseteq M \setminus \{x_i\}} \frac{|S|!\,(|M| - |S| - 1)!}{|M|!}\left[f_x(S \cup \{x_i\}) - f_x(S)\right] \tag{12}$$

When feature $x_i$ joins the model and when it does not, under various feature combinations, the model outputs change, as shown by the expected value in Equation (12). Here *M* denotes the feature set; *S* denotes a feature subset of $M \setminus \{x_i\}$; the value of *S* has many cases, corresponding to the various feature combinations; $f_x(S \cup \{x_i\})$ and $f_x(S)$ denote the output results of the model under the various feature combinations when $x_i$ is added to the model and when it is not; $\frac{|S|!\,(|M|-|S|-1)!}{|M|!}$ denotes the probability corresponding to each feature combination; '| |' denotes the number of elements in a set; and '!' denotes the factorial. Because the exact calculation of the Shapley Value is very time-consuming, numerous approximate calculation techniques have been created. Different approximation techniques, such as TreeSHAP for tree models, DeepSHAP for neural networks, and the model-independent KernelSHAP, have been developed in accordance with the characteristics of the black box model to be interpreted (Mi *et al.* 2020). TreeSHAP is the most popular of these because of its fast calculation and accurate interpretation of black box model predictions (Meddage *et al.* 2022). Based on TreeSHAP, this study proposes an interpretable machine learning model.
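The Shapley formula can be evaluated exactly for a small toy model by brute-force enumeration of feature subsets; this is the computation that TreeSHAP accelerates for tree ensembles. A minimal sketch (the toy linear model and its coefficients are illustrative assumptions):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, M):
    """Exact Shapley values by enumerating all feature subsets (Equation (12))."""
    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                total += weight * (f(set(S) | {i}, x) - f(set(S), x))
        phi.append(total)
    return phi

# Toy additive model: a present feature j contributes coefs[j] * x[j]; absent ones 0.
coefs = [2.0, -1.0, 0.5]
def f(subset, x):
    return sum(coefs[j] * x[j] for j in subset)

x = [1.0, 3.0, 4.0]
phi = shapley_values(f, x, 3)
# Local accuracy: base value plus all Shapley values equals the full-model output.
assert abs(f(set(), x) + sum(phi) - f({0, 1, 2}, x)) < 1e-9
```

The brute force is exponential in the number of features, which is exactly why the approximate methods named above exist.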

## EXPERIMENTAL DATA AND EVALUATION INDEX

### Experiments and data

In the experiments, *i* is the bottom slope of the flume, Δ is the height of the ridge, *Q* is the flow rate, *I* is the slope of the flip bucket, *V* is the flow velocity on the ridge, *h* is the water depth on the ridge, Fr is the Froude number, *L* is the cavity length, *θ* is the impact angle of the water tongue, and *d* is the cavity water depth. Among them, *i*, Δ, *I*, and *Q* are preset values for different test conditions, and the remaining variables are observed values. A total of 270 sets of experimental data were collected from the two groups of experiments. The data range is shown in Table 1.

**Table 1** Range of the experimental data

| *i* | Δ (cm) | *Q* (L/s) | *I* | *V* (m/s) | *h* (cm) | Fr | *L* (cm) | *θ* (°) | *d* (cm) |
|---|---|---|---|---|---|---|---|---|---|
| 0.077 | 1.0–3.0 | 1.7–5.2 | 0.1–0.2 | 1.00–1.75 | 1.50–3.40 | 2.46–3.27 | 4.9–20.2 | 6.0–20.0 | 0.40–1.85 |
| 0.087 | 1.0–4.0 | 1.7–142.7 | 0.1–0.2 | 1.07–5.60 | 1.35–8.50 | 2.55–6.13 | 5.0–71.7 | 6.0–20.0 | 0.00–1.70 |
| 0.096 | 1.0–3.0 | 1.7–5.2 | 0.1–0.2 | 1.07–1.75 | 1.25–3.25 | 2.62–3.90 | 5.2–26.5 | 5.5–19.0 | 0.40–1.30 |
| 0.105 | 1.5–4.0 | 14.4–141.7 | 0.1–0.2 | 1.70–5.90 | 2.80–8.00 | 3.27–6.67 | 13.7–73.0 | 7.8–14.1 | 0.00–2.60 |
| 0.122 | 1.5–4.0 | 13.4–138.0 | 0.1–0.2 | 1.80–5.90 | 2.50–7.80 | 3.61–7.20 | 14.4–75.0 | 7.5–13.6 | 0.00–2.50 |


### Prediction effect analysis of different models

Results from previous studies demonstrate that the XGBoost ensemble learning method performs better than SVM and RF in regression prediction for complex nonlinear problems (Gu *et al.* 2022). In this study, a black box model based on XGBoost was established to predict the cavity water depth and the cavity length. The hyperparameters of XGBoost are searched using the Bayesian optimization technique, and the four main hyperparameters n_estimators, max_depth, colsample_bytree, and min_child_weight are optimized.

The performance of the prediction models is measured using the coefficient of determination (*R*^{2}), the root mean square error (RMSE), and the mean absolute error (MAE). The indicators are calculated as follows:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where *n* is the total number of samples, $y_i$ is the real observed value, $\bar{y}$ is the average of the real observed values, and $\hat{y}_i$ is the predicted value. SSE is the residual sum of squares, the error between the estimated values and the real values, reflecting the goodness of fit of the model; SST is the total sum of squares, namely the error between the mean and the real values, reflecting the deviation from the mathematical expectation; and MSE is the mean square error.
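The three indicators can be sketched directly from their definitions (pure Python; the sample values are illustrative, not data from this study):

```python
import math

def r2_rmse_mae(y_true, y_pred):
    """R^2, RMSE, and MAE computed from their definitions."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((t - y_bar) ** 2 for t in y_true)              # total sum of squares
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / n)                                # RMSE = sqrt(MSE)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

# Illustrative values only
r2, rmse, mae = r2_rmse_mae([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```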

The *R*^{2} scores of the model's training set, the test set with Bayesian optimization, and the test set without Bayesian optimization were compared under different test set ratios. Bayesian optimization improves the cavity water depth model's performance by about 1%. Because the *R*^{2} score of the cavity length model's training set is already high, Bayesian optimization brings a relatively small improvement to that model, but there is still a 0.4% gain. In conclusion, because Bayesian optimization can run an iterative, memory-based search over multiple hyperparameters simultaneously, it is faster and more efficient than classical grid search and random search for model hyperparameter optimization.

Tables 2 and 3 record the *R*^{2}, RMSE, and MAE values of the cavity water depth and the cavity length prediction models in five test series. Among these, the validation set records the index values of the five-fold cross validation combined with Bayesian optimization, and the test set records the final predictive performance of the model. As can be observed, the test sets of the two models have the greatest *R*^{2} scores when the proportion of the training set is 70%: the *R*^{2} of the cavity water depth is 0.919, and the *R*^{2} of the cavity length is as high as 0.987. The RMSE and MAE of the two models are also relatively small at this training set ratio, showing that the model error is relatively minor. Combined with *R*^{2}, this demonstrates that relatively good prediction precision is attained when the two models use 70% of the data for training. The models' optimized hyperparameters are displayed in Table 4.
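The five-fold cross-validation protocol used alongside Bayesian optimization can be sketched as a partition of sample indices (a stdlib sketch of the protocol, not the study's actual splitting code):

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Partition sample indices 0..n-1 into k shuffled, disjoint folds and
    return (train_indices, validation_indices) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

splits = k_fold_indices(270, k=5)   # 270 experimental samples, as in this study
```

Each hyperparameter combination proposed by the optimizer is scored as the average validation metric over the five folds, which keeps the test set untouched until the final evaluation.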

**Table 2** Performance of the cavity water depth prediction model under different test set proportions

| Proportion of the test set (%) | *R*^{2} (train) | RMSE (train) | MAE (train) | *R*^{2} (validation) | RMSE (validation) | MAE (validation) | *R*^{2} (test) | RMSE (test) | MAE (test) |
|---|---|---|---|---|---|---|---|---|---|
| 40 | 0.968 | 0.128 | 0.096 | 0.948 | 0.195 | 0.141 | 0.921 | 0.197 | 0.145 |
| 35 | 0.970 | 0.122 | 0.093 | 0.934 | 0.186 | 0.140 | 0.915 | 0.212 | 0.157 |
| 30 | 0.967 | 0.127 | 0.093 | 0.915 | 0.236 | 0.177 | 0.919 | 0.208 | 0.157 |
| 25 | 0.967 | 0.131 | 0.095 | 0.900 | 0.223 | 0.164 | 0.899 | 0.222 | 0.167 |
| 20 | 0.969 | 0.125 | 0.095 | 0.880 | 0.218 | 0.154 | 0.881 | 0.245 | 0.170 |


**Table 3** Performance of the cavity length prediction model under different test set proportions

| Proportion of the test set (%) | *R*^{2} (train) | RMSE (train) | MAE (train) | *R*^{2} (validation) | RMSE (validation) | MAE (validation) | *R*^{2} (test) | RMSE (test) | MAE (test) |
|---|---|---|---|---|---|---|---|---|---|
| 40 | 0.992 | 0.090 | 0.071 | 0.987 | 0.114 | 0.086 | 0.985 | 0.124 | 0.091 |
| 35 | 0.991 | 0.093 | 0.074 | 0.980 | 0.119 | 0.094 | 0.987 | 0.116 | 0.088 |
| 30 | 0.992 | 0.090 | 0.071 | 0.994 | 0.079 | 0.064 | 0.987 | 0.116 | 0.086 |
| 25 | 0.992 | 0.087 | 0.069 | 0.997 | 0.077 | 0.061 | 0.988 | 0.108 | 0.083 |
| 20 | 0.993 | 0.086 | 0.067 | 0.992 | 0.091 | 0.068 | 0.986 | 0.091 | 0.080 |


**Table 4** Optimized hyperparameters of the two models

| Predicted label | max_depth | n_estimators | colsample_bytree | min_child_weight |
|---|---|---|---|---|
| Cavity water depth | 3.00 | 20.00 | 0.90 | 5.37 |
| Cavity length | 6.57 | 20.00 | 0.50 | 3.04 |


## RESULTS AND DISCUSSION

In the previous section, a black box prediction model for the cavity water depth and the cavity length was created on the basis of XGBoost and Bayesian optimization, and the optimal hyperparameter combination and training set ratio of the model were examined. In this section, the model's prediction outcomes are explained both globally and locally using SHAP, and the validity of the interpretation results is examined by comparing them with the experimental findings.

### Global explanation

In the global explanation, the importance of each feature is measured by the average absolute Shapley value over all samples:

$$I_j = \frac{1}{n}\sum_{i=1}^{n}\left|\phi_j^{(i)}\right|$$

where *n* is the total number of samples and $|\phi_j^{(i)}|$ is the absolute Shapley value of a single sample. Figure 4 ranks the feature importance for the cavity water depth and the cavity length. Figure 4(a) shows that the main variables influencing the prediction of the cavity water depth are the impact angle of the water tongue *θ*, the flow rate *Q*, the flow velocity *V*, the bottom slope of the flume *i*, and the height of the ridge Δ, which is consistent with the physical model experiments (Chun-ying 2010). In comparison to variables such as *Q* and *V*, *θ* is relatively flexible and, in theory, has the biggest impact on the cavity backwater (Li-heng 2006). Fr and *I*, on the other hand, have less impact on the cavity water depth. *V*, *Q*, and Fr are the primary determining parameters for the cavity length; theoretically, the length of the cavity and the ejection distance of the water tongue are directly related to the flow velocity on the ridge.

A smaller *V* value corresponds to a smaller Shapley value, indicating a stronger negative impact on the cavity water depth and a smaller corresponding cavity water depth. The positive influence on the cavity length becomes stronger as *V* increases, showing that the cavity length grows as the flow velocity increases. The flow rate *Q*, the Froude number Fr, and the flip bucket slope *I* produce similar effects on the cavity length, in close agreement with the experimental findings.
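The importance ranking above follows directly from the mean absolute Shapley values; a minimal sketch (the per-sample values below are hypothetical, chosen only to illustrate the ranking):

```python
def shap_feature_importance(shap_values, names):
    """Global importance: mean absolute Shapley value per feature, ranked descending."""
    n = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-sample Shapley values for three features
names = ["theta", "Q", "V"]
shap_values = [[0.22, -0.10, 0.05],
               [-0.18, 0.12, -0.02],
               [0.25, -0.08, 0.04]]
ranking = shap_feature_importance(shap_values, names)
```

Taking the absolute value before averaging matters: a feature whose positive and negative contributions cancel across samples would otherwise look unimportant.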

Equation (18) produces a pure interaction effect by deducting the feature's main effect. An *M* × *M* matrix is created when the SHAP interaction values of all features are calculated, where *M* is the total number of features.
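For two features, the interaction value collapses to half the second-order finite difference of the model output, which makes the decomposition easy to verify on a toy model (the coefficients and the reference value of 0 for absent features are illustrative assumptions):

```python
# Toy model f(x1, x2) = a*x1 + b*x2 + c*x1*x2; absent features take reference value 0.
a, b, c = 1.0, 2.0, 3.0
x1, x2 = 2.0, 5.0

def v(s):
    """Model output when only the features in coalition s are present."""
    x1_ = x1 if 1 in s else 0.0
    x2_ = x2 if 2 in s else 0.0
    return a * x1_ + b * x2_ + c * x1_ * x2_

# For M = 2, the pure interaction is half the second-order finite difference.
phi_12 = (v({1, 2}) - v({1}) - v({2}) + v(set())) / 2

# Shapley values, and main effects obtained by deducting the interaction share
phi_1 = 0.5 * (v({1}) - v(set())) + 0.5 * (v({1, 2}) - v({2}))
phi_2 = 0.5 * (v({2}) - v(set())) + 0.5 * (v({1, 2}) - v({1}))
main_1 = phi_1 - phi_12
main_2 = phi_2 - phi_12
```

The main effects sit on the diagonal of the interaction matrix, the pure interaction is split over the two off-diagonal entries, and the whole matrix sums to the model output change.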

SHAP interaction diagrams between pairs of features are used to examine how *Q* and *V*, and *θ* and *V*, interact in the two prediction models. Figures 7(a) and 7(b) show that the cavity length is positively associated with *Q* and that, as *Q* grows, the corresponding *V* also increases. When *Q* is too small or too large, the SHAP value for the cavity water depth is low or even negative, indicating that these values can hinder the formation of cavity water accumulation. In this experiment, the SHAP values corresponding to several features show an evident mutation when *Q* approaches 60–80 L/s. According to some research, for a given flow condition, aerator size, and bottom slope, a critical value must be met in order to maintain the stability of the cavity; if the flow condition falls below the critical value, the cavity will vanish (Bai *et al.* 2016). This suggests that the critical range of *Q* in this experiment is between 60 and 80 L/s. The feature interaction diagram of *θ* and *V* shows that when *θ* is greater than 11°, a smaller or larger *V* causes serious cavity ponding, indicating that keeping the impact angle of the water tongue below 11° is one of the critical conditions for reducing the harm of cavity ponding in this test facility. Figure 7(d) demonstrates that when *θ* is between 6° and 10°, a *V* greater than 4.0 m/s can completely eliminate the effect of cavity backwater without interfering with the formation of the cavity length, providing the aeration facilities with ideal hydraulic conditions for efficient operation.

### Local explanation

Local explanation analyzes individual samples, for example, those in which *θ* is greater than 11°, which influences the aeration effect of the aeration facility. Researchers can better comprehend the extent to which the related features have an effect by using the local interpretation diagram. According to Figure 8(a), the prediction result is positively impacted by 0.22 when *θ* is 15°, demonstrating that a larger *θ* is the primary contributor to the development of cavity water accumulation. Second, a ridge height of 3 cm also has a positive effect of 0.19 on the cavity water depth. The greatest obstacle to the development of cavity backwater, as shown in Figure 8(c), is a smaller *θ*. Local interpretations of the cavity length are shown in Figures 8(b) and 8(d). As can be observed, smaller *V* and *Q* are the major factors hindering the creation of cavity length, whereas larger *V* and *Q* are the major factors promoting it, which is consistent with the conclusion of the global interpretation.
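A local explanation of a single sample amounts to sorting its Shapley contributions by magnitude and checking local accuracy (base value plus contributions equals the prediction). A minimal sketch; the numbers below are hypothetical, not values from Figure 8:

```python
def local_explanation(base_value, contributions):
    """Sort one sample's contributions by magnitude and reconstruct the prediction
    (local accuracy: base value + sum of contributions = model output)."""
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    prediction = base_value + sum(contributions.values())
    return ordered, prediction

# Hypothetical Shapley contributions for one cavity-water-depth sample
phi = {"theta": 0.22, "delta": 0.19, "Q": -0.07, "V": 0.03}
ordered, pred = local_explanation(base_value=0.85, contributions=phi)
```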

## CONCLUSION

This study proposes an XGBoost–SHAP model for predicting the cavity water depth and cavity length of aeration facilities. Unlike intrinsically interpretable machine learning models, such as physics-informed machine learning models, XGBoost–SHAP belongs to the category of ex-post interpretability models. However, the results demonstrate that it can still effectively explain the nonlinear relationships between the influencing factors and the cavity water depth and cavity length. The XGBoost–SHAP model proposed in this study provides a novel approach for exploring interpretable machine learning models. The main conclusions are as follows:

- a. A Bayesian optimization algorithm is used to adjust the four hyperparameters of XGBoost, and *R*^{2}, RMSE, and MAE are used as indicators to measure the performance of the model. The results show that the model's performance was considerably improved by Bayesian optimization: the *R*^{2} score of the cavity water depth model increased by about 1%, and the *R*^{2} score of the cavity length model increased by about 0.4%.
- b. Global interpretation results show that the main factors affecting the cavity water depth and the cavity length are not identical. The impact angle of the water tongue *θ* is the most important factor affecting the cavity water depth, while the flow velocity *V* is the most important factor affecting the cavity length. Interpretation of the interaction terms shows that the aeration facility obtains a larger cavity length and eliminates the effect of the cavity water depth when *θ* is in the range of 6°–10° and *V* is greater than 4.0 m/s. The global interpretation results are basically consistent with the results of the aeration experiment.
- c. Local interpretation can sort the features according to their weights for individual samples and show the influence of all features on the prediction results. It shows that the XGBoost–SHAP model can predict the corresponding cavity water depth and cavity length for any given hydraulic conditions and aeration facility size. The XGBoost–SHAP model established in this study captures the nonlinear relationship between the hydraulic conditions and the size of the aeration facility. It can also be used to optimize the aeration experiment scheme and reduce the time and space costs of physical experiments.

## ACKNOWLEDGEMENTS

The authors would like to thank Dr Ganggui Guo for helping collect data for this work. This work was supported by the National Natural Science Foundation of China [grant number 52079107]. It is also supported by the Natural Science Foundation of Shaanxi Province [grant number 2023-JC-QN-0395] and the Natural Science Foundation of Shaanxi Provincial Department of Education [grant number 22JK0470].

## ETHICAL APPROVAL

This article does not contain any studies with human participants or animals performed by any of the authors.

## INFORMED CONSENT

Informed consent was obtained from all individual participants included in the study.

## AUTHOR CONTRIBUTIONS

T.M. performed the methodology, conceptualized the study, did formal analysis, did investigation, did data curation, wrote the original draft; S.L. did project administration, supervised the study, collected resources, wrote, reviewed, and edited the article; G.L. acquired funds, did project administration, supervised the study, collected resources, wrote, reviewed, and edited the article.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Study on Hydraulic Characteristics of Aerator Under Small Slope and Low Froude Number*

*Experimental Investigation on the Backwater in Cavity Pocket of Flow Aeration Types to Prevent Cavitation Erosion*