Abstract
In contrast to traditional black box machine learning models, white box models can achieve high prediction accuracy while also evaluating and explaining their predictions. In this research, the cavity water depth and cavity length of aeration facilities are predicted using Extreme Gradient Boosting (XGBoost) combined with a Bayesian optimization technique. The Shapley Additive Explanation (SHAP) method is then used to explain the prediction results. This study demonstrates how SHAP ranks all features and feature interaction terms according to the significance of the input features. The XGBoost–SHAP white box model reasonably explains the predictions of XGBoost both globally and locally while achieving prediction accuracy comparable to black box models. The cavity water depth and cavity length white box models developed in this study have promising applications in the shape optimization of aeration facilities and the improvement of model experiments.
HIGHLIGHTS
SHAP can accurately evaluate the prediction results of XGBoost.
SHAP considers the role of both single features and interactive features.
Bayesian optimization can significantly improve XGBoost performance.
Local interpretation can visualize the impact of all features.
The cavity water depth is more complex than the cavity length.
INTRODUCTION
In high-head hydraulic structures, high flow velocity, high pressure, and large discharge often cause cavitation erosion, which damages the flow surface of the structure (Glazov 1984; Wu & Chao 2011). Engineers currently adopt several measures to reduce cavitation damage, including designing flow-passage structures with appropriate shapes, ensuring the flatness of flow surfaces, and installing aeration facilities. Experimental studies show that the aeration scheme is easy to operate, markedly reduces erosion, and offers the best overall performance among these measures (Pfister & Hager 2010; Bai et al. 2016). Cavity water depth and cavity length are key indicators of the erosion-reduction effect of aeration facilities: a longer cavity and a lower backwater level indicate more efficient aeration and thus a better erosion-reduction effect (Glazov 1984; Wu & Ruan 2008). Many factors affect the cavity water depth and the cavity length, including the shape of the aeration facility and the flow conditions, so it is difficult to directly observe and analyze the main influencing factors (Brujan & Matsumoto 2012; Tsuru et al. 2017).
Machine learning has been widely applied in hydraulic engineering, for example in predicting weir discharge coefficients and scour depth and in water quality assessment (Parsaie et al. 2017; Azamathulla et al. 2019). These studies show that machine learning algorithms can achieve good predictive performance in such applications. Machine learning models can be divided into black box and white box models. Black box models include Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), etc., and most published machine learning studies are based on them (Azimi et al. 2016; Pan et al. 2022). Some black box models achieve excellent fitting and prediction results in classification and regression problems. For example, some scholars introduced genetic algorithms (GA) and Bayesian optimization to tune the hyperparameters of XGBoost and found that the R2 (coefficient of determination) of the predictions reached 0.941 (Gu et al. 2022; Kim et al. 2022). However, the internal mechanism of black box models is so complex that researchers struggle to explain the predictions, which makes the results less convincing (Mi et al. 2020). White box models, also called interpretable machine learning models, overcome this poor interpretability. They fall into two categories: intrinsically interpretable models and ex-post interpretable models. Intrinsically interpretable models include the Generalized Additive Model (GAM) and the Explainable Boosting Machine (EBM); GAM, however, ignores the interaction terms of input features, and its prediction accuracy is low (Agarwal et al. 2021).
Some intrinsically interpretable machine learning models, such as GAMI-Net, can achieve excellent prediction results while providing reasonable explanations (Yang et al. 2021). There are also interpretable machine learning models grounded in physical meaning. Research has demonstrated that the use of high-level concepts aids in evolving equations that are easier for domain specialists to interpret (Babovic 2009). To simulate groundwater levels, scholars have proposed embedding physical constraints into machine learning models; the physically constrained hybrid model exhibits better adaptability and generalization than pure deep learning models (Cai et al. 2021, 2022). In rainfall–runoff simulation, physically meaningful machine learning models such as Model Induction Knowledge Augmented-System Hydrologique Asiatique (MIKA-SHA) and Machine Learning Rainfall–Runoff Model Induction (ML-RR-MI) help hydrologists better understand catchment dynamics (Chadalawada et al. 2020; Herath et al. 2021). Ex-post interpretable models explain the predictions of black box models through post-hoc analysis methods such as the Partial Dependence Plot (PDP), Accumulated Local Effect (ALE), and Shapley Additive Explanation (SHAP) (Maxwell et al. 2021; Mangalathu et al. 2022). SHAP is an ex-post interpretation method that draws on game theory: it computes the marginal contribution (the Shapley value) of every input feature and feature interaction term to measure their influence and thereby explain the black box model (Meddage et al. 2022).
In this study, cavity water depth and cavity length prediction models based on XGBoost–SHAP were established from collected experimental data. Bayesian optimization is used to search the model hyperparameters, SHAP provides global and local explanations of the predictions, and the rationality of the explanations is analyzed against the experimental conclusions.
METHODS
Extreme gradient boosting
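The equations of this section were not reproduced in the extracted text. As an illustrative sketch only (not the authors' implementation, and omitting XGBoost's regularization and second-order terms), the additive-tree idea behind gradient boosting can be shown with depth-1 regression trees fitted to residuals under a squared-error objective:

```python
import numpy as np

def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to residuals r by brute-force split search."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def boost(X, y, n_estimators=50, learning_rate=0.3):
    """Additive model: each new stump fits the residuals of the current ensemble."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_estimators):
        j, t, lv, rv = fit_stump(X, y - pred)
        pred += learning_rate * np.where(X[:, j] <= t, lv, rv)
        stumps.append((j, t, lv, rv))
    return y.mean(), stumps

def predict(base, stumps, X, learning_rate=0.3):
    pred = np.full(len(X), base)
    for j, t, lv, rv in stumps:
        pred += learning_rate * np.where(X[:, j] <= t, lv, rv)
    return pred

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2            # nonlinear synthetic target
base, stumps = boost(X, y)
resid = y - predict(base, stumps, X)
print(float(np.sqrt((resid ** 2).mean())))         # training RMSE shrinks as trees are added
```

In practice the study's models would be trained with the `xgboost` library rather than this toy loop; the sketch only shows why boosting captures nonlinear feature effects that a single tree would miss.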
Bayesian optimization
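The derivation for this section was lost in extraction. As a minimal sketch of the idea (assuming a one-dimensional search space, a zero-mean Gaussian process surrogate with an RBF kernel, and an expected-improvement acquisition; the synthetic `objective` stands in for an expensive cross-validation loss):

```python
import math
import numpy as np

def rbf(a, b, ls=0.2):
    """RBF kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Gaussian process posterior mean/std (zero prior mean, unit prior variance)."""
    K_inv = np.linalg.inv(rbf(x_obs, x_obs) + noise * np.eye(len(x_obs)))
    Ks = rbf(x_obs, x_new)
    mu = Ks.T @ K_inv @ y_obs
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ K_inv, Ks)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, y_best):
    """EI acquisition for minimization."""
    z = (y_best - mu) / sigma
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (y_best - mu) * cdf + sigma * pdf

def objective(x):
    """Stand-in for an expensive CV loss over one hyperparameter (true minimum at 0.65)."""
    return (x - 0.65) ** 2

grid = np.linspace(0, 1, 201)
x_obs = np.array([0.1, 0.5, 0.9])            # initial design points
y_obs = objective(x_obs)
for _ in range(12):                          # BO loop: fit surrogate, maximize EI, evaluate
    mu, sigma = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))
best = float(x_obs[np.argmin(y_obs)])
print(best)                                  # close to the true minimizer 0.65
```

The same fit-acquire-evaluate loop is what a Bayesian optimizer runs over the XGBoost hyperparameter space, trading a few cheap surrogate fits for many expensive model trainings.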
Shapley Additive Explanation


- 1. Local accuracy. The predicted value of the explanation model $g$ for a single sample equals the predicted value of the black box model $f$ for that sample: $f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i$.
- 2. Missingness. A feature missing from a single sample has no bearing on the ex-post interpretation model $g$: $x'_i = 0 \Rightarrow \phi_i = 0$.
- 3. Consistency. If the model changes (for example, from RF to XGBoost) so that a feature's marginal contribution increases or stays the same, its attribution $\phi_i$ does not decrease. In 1953, Lloyd Shapley showed that the Shapley value $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M-|S|-1)!}{M!}\left[f_x(S \cup \{i\}) - f_x(S)\right]$ is the unique attribution satisfying these three properties.
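The local accuracy property can be checked numerically. The following sketch (illustrative, not SHAP's optimized TreeExplainer) computes exact Shapley values for a small hypothetical model by enumerating all feature subsets, replacing "missing" features with the background mean, and verifies that the attributions sum to the model output:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values: weighted marginal contributions over all feature subsets.
    Features absent from a subset S are replaced by the background mean."""
    M = len(x)
    base = background.mean(axis=0)

    def val(S):
        z = base.copy()
        z[list(S)] = x[list(S)]
        return f(z)

    phi = np.zeros(M)
    for i in range(M):
        rest = [j for j in range(M) if j != i]
        for k in range(M):
            for S in combinations(rest, k):
                w = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                phi[i] += w * (val(S + (i,)) - val(S))
    return phi, f(base)

f = lambda z: 3 * z[0] + z[1] * z[2]             # toy model with an interaction term
rng = np.random.default_rng(1)
background = rng.normal(size=(100, 3))
x = np.array([1.0, 2.0, 0.5])
phi, phi0 = shapley_values(f, x, background)
# Local accuracy: phi0 + sum(phi) reproduces f(x) exactly
print(float(phi0 + phi.sum()), float(f(x)))
```

Exact enumeration costs O(2^M) model evaluations, which is why the SHAP library uses tree-specific algorithms for models like XGBoost.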
EXPERIMENTAL DATA AND EVALUATION INDEX
Experiments and data
Experimental data
| i | Δ (cm) | Q (L/s) | I | V (m/s) | h (cm) | Fr | L (cm) | θ (°) | d (cm) |
|---|---|---|---|---|---|---|---|---|---|
| 0.077 | 1.0–3.0 | 1.7–5.2 | 0.1–0.2 | 1.00–1.75 | 1.50–3.40 | 2.46–3.27 | 4.9–20.2 | 6.0–20.0 | 0.40–1.85 |
| 0.087 | 1.0–4.0 | 1.7–142.7 | 0.1–0.2 | 1.07–5.60 | 1.35–8.50 | 2.55–6.13 | 5.0–71.7 | 6.0–20.0 | 0.00–1.70 |
| 0.096 | 1.0–3.0 | 1.7–5.2 | 0.1–0.2 | 1.07–1.75 | 1.25–3.25 | 2.62–3.90 | 5.2–26.5 | 5.5–19.0 | 0.40–1.30 |
| 0.105 | 1.5–4.0 | 14.4–141.7 | 0.1–0.2 | 1.70–5.90 | 2.80–8.00 | 3.27–6.67 | 13.7–73.0 | 7.8–14.1 | 0.00–2.60 |
| 0.122 | 1.5–4.0 | 13.4–138.0 | 0.1–0.2 | 1.80–5.90 | 2.50–7.80 | 3.61–7.20 | 14.4–75.0 | 7.5–13.6 | 0.00–2.50 |
Prediction effect analysis of different models
Results from previous studies demonstrate that the XGBoost ensemble learning method outperforms SVM and RF in regression on complex nonlinear problems (Gu et al. 2022). In this study, a black box model based on XGBoost was established to predict the cavity water depth and the cavity length. The hyperparameters of XGBoost are searched with the Bayesian optimization technique, and the four main hyperparameters n_estimators, max_depth, colsample_bytree, and min_child_weight are optimized.
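The tuning loop can be sketched as follows. The search ranges below are hypothetical examples (the study's actual ranges are not given here), and a cheap random search over a synthetic score stands in for the Bayesian optimizer and for training XGBoost, so the sketch runs without `xgboost` installed:

```python
import random

# Hypothetical search ranges for the four tuned hyperparameters (illustrative only).
space = {
    "n_estimators":     lambda: random.randint(10, 200),
    "max_depth":        lambda: random.randint(2, 10),
    "colsample_bytree": lambda: random.uniform(0.3, 1.0),
    "min_child_weight": lambda: random.uniform(1.0, 10.0),
}

def cv_score(params):
    """Stand-in for five-fold cross-validated R2 of an XGBoost model.
    A real objective would train xgboost.XGBRegressor with these params and
    return the mean validation R2; this synthetic surface merely rewards a
    plausible region so the loop is runnable for demonstration."""
    return (1.0
            - 0.002  * abs(params["max_depth"] - 4)
            - 0.0005 * abs(params["n_estimators"] - 60)
            - 0.01   * abs(params["colsample_bytree"] - 0.8)
            - 0.002  * abs(params["min_child_weight"] - 5))

random.seed(0)
best_params, best_score = None, -float("inf")
for _ in range(200):          # random search as a cheap proxy for the Bayesian optimizer
    params = {k: draw() for k, draw in space.items()}
    score = cv_score(params)
    if score > best_score:
        best_params, best_score = params, score
print(best_params, round(best_score, 3))
```

A Bayesian optimizer replaces the blind `draw()` with an acquisition-guided proposal, which matters when each `cv_score` call costs a full five-fold training.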


R2 score plots of different test set proportions: (a) prediction of cavity water depth and (b) prediction of cavity length.
Tables 2 and 3 record the R2, RMSE, and MAE values of the cavity water depth and cavity length prediction models at five test-set ratios. The validation set records the index values from five-fold cross-validation combined with Bayesian optimization, and the test set records the final predictive performance of the model. Both models achieve their highest test-set R2 when the training set proportion is 70%: the R2 for the cavity water depth is 0.919, and the R2 for the cavity length reaches 0.987. The RMSE and MAE of both models are also relatively small at this ratio, indicating a small model error. Combined with R2, this shows that relatively good prediction accuracy is attained when both models use 70% of the data for training. The optimized hyperparameters are listed in Table 4.
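The three evaluation indices can be computed directly; a minimal sketch with hypothetical values (not data from the study's tables):

```python
import numpy as np

def r2(y, p):
    """Coefficient of determination: 1 - SSE/SST."""
    return 1 - ((y - p) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def rmse(y, p):
    """Root mean square error."""
    return float(np.sqrt(((y - p) ** 2).mean()))

def mae(y, p):
    """Mean absolute error."""
    return float(np.abs(y - p).mean())

y = np.array([1.5, 2.8, 0.9, 3.4, 2.1])   # hypothetical measured cavity water depths (cm)
p = np.array([1.4, 2.9, 1.1, 3.2, 2.0])   # hypothetical model predictions (cm)
print(round(float(r2(y, p)), 3), round(rmse(y, p), 3), round(mae(y, p), 3))
# → 0.972 0.148 0.14
```

R2 measures the fraction of variance explained, while RMSE and MAE report the error in the target's own units, which is why the tables list all three.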
Index values of cavity water depth at different test set ratios
| Test set (%) | Training R2 | Training RMSE | Training MAE | Validation R2 | Validation RMSE | Validation MAE | Testing R2 | Testing RMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|---|
| 40 | 0.968 | 0.128 | 0.096 | 0.948 | 0.195 | 0.141 | 0.921 | 0.197 | 0.145 |
| 35 | 0.970 | 0.122 | 0.093 | 0.934 | 0.186 | 0.140 | 0.915 | 0.212 | 0.157 |
| 30 | 0.967 | 0.127 | 0.093 | 0.915 | 0.236 | 0.177 | 0.919 | 0.208 | 0.157 |
| 25 | 0.967 | 0.131 | 0.095 | 0.900 | 0.223 | 0.164 | 0.899 | 0.222 | 0.167 |
| 20 | 0.969 | 0.125 | 0.095 | 0.880 | 0.218 | 0.154 | 0.881 | 0.245 | 0.170 |
Index values of cavity length at different test set ratios
| Test set (%) | Training R2 | Training RMSE | Training MAE | Validation R2 | Validation RMSE | Validation MAE | Testing R2 | Testing RMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|---|
| 40 | 0.992 | 0.090 | 0.071 | 0.987 | 0.114 | 0.086 | 0.985 | 0.124 | 0.091 |
| 35 | 0.991 | 0.093 | 0.074 | 0.980 | 0.119 | 0.094 | 0.987 | 0.116 | 0.088 |
| 30 | 0.992 | 0.090 | 0.071 | 0.994 | 0.079 | 0.064 | 0.987 | 0.116 | 0.086 |
| 25 | 0.992 | 0.087 | 0.069 | 0.997 | 0.077 | 0.061 | 0.988 | 0.108 | 0.083 |
| 20 | 0.993 | 0.086 | 0.067 | 0.992 | 0.091 | 0.068 | 0.986 | 0.091 | 0.080 |
Hyperparameters of the model
| Predicted label | max_depth | n_estimators | colsample_bytree | min_child_weight |
|---|---|---|---|---|
| Cavity water depth | 3.00 | 20.00 | 0.90 | 5.37 |
| Cavity length | 6.57 | 20.00 | 0.50 | 3.04 |
RESULTS AND DISCUSSION
In the previous section, a black box prediction model for the cavity water depth and the cavity length was built on XGBoost and Bayesian optimization, and the optimal hyperparameter combination and training set ratio were determined. In this section, the model's predictions are explained both globally and locally with SHAP, and the validity of the interpretation results is examined by comparing them with the experimental findings.
Global explanation

Mean (|SHAP value|) (average impact on model output magnitude): (a) mean (|SHAP value|) of cavity water depth and (b) mean (|SHAP value|) of cavity length.
SHAP summary plot: (a) SHAP value of cavity water depth and (b) SHAP value of cavity length.
Equation (18) produces a pure interaction effect by subtracting the feature's main effect. Computing the SHAP interaction values of all features yields a matrix of dimension M × M, where M is the total number of features.
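For a purely multiplicative model, the entire attribution lands in the off-diagonal interaction entries of that matrix, which a tiny exact computation makes concrete (an illustrative sketch of the idea, not the SHAP library's implementation; the model and background are hypothetical):

```python
# Exact SHAP interaction value for a two-feature model f(x1, x2) = x1 * x2
# with a zero background value for "missing" features.
def v(S, x):
    """Value function: features outside the subset S are set to the background value 0."""
    z = [xi if i in S else 0.0 for i, xi in enumerate(x)]
    return z[0] * z[1]

x = (2.0, 3.0)
# Shapley value of feature 0: average marginal contribution over both orderings.
phi_0 = 0.5 * (v({0}, x) - v(set(), x)) + 0.5 * (v({0, 1}, x) - v({1}, x))
# Pairwise interaction value (with M = 2, the only conditioning subset is empty).
phi_01 = 0.5 * (v({0, 1}, x) - v({0}, x) - v({1}, x) + v(set(), x))
main_0 = phi_0 - phi_01       # main-effect (diagonal) entry for feature 0
print(phi_0, phi_01, main_0)  # → 3.0 3.0 0.0
```

The main effect is exactly zero here: all of feature 0's Shapley value comes from its interaction with feature 1, which is the separation Equation (18) formalizes.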
Interaction diagram of feature values: (a) interaction values for predicting cavity water depth and (b) interaction values for predicting cavity length.
Plot of SHAP feature dependence: (a) SHAP interaction values for Q and V of cavity water depth; (b) SHAP interaction values for Q and V of cavity length; (c) SHAP interaction values for θ and V of cavity water depth; and (d) SHAP interaction values for θ and V of cavity length.
Local explanation
Local interpretation diagram: (a) sample local interpretation map when cavity water depth is 1.399 cm; (b) sample local interpretation map when cavity length is 17.413 cm; (c) sample local interpretation map when cavity water depth is 0.8 cm; and (d) sample local interpretation map when cavity length is 54.407 cm.
CONCLUSION
This study proposes an XGBoost–SHAP model for predicting the cavity water depth and cavity length of aeration facilities. Unlike intrinsically interpretable machine learning models, such as physics-informed models, XGBoost–SHAP belongs to the category of ex-post interpretability models. Nevertheless, the results demonstrate that it can effectively explain the nonlinear relationships between the influencing factors and the cavity water depth and cavity length. The XGBoost–SHAP model proposed in this study offers a novel approach for exploring interpretable machine learning models. The main conclusions are as follows:
- a.
A Bayesian optimization algorithm is used to tune four hyperparameters of XGBoost, with R2, RMSE, and MAE as indicators of model performance. The results show that the model's performance improved considerably after applying Bayesian optimization: the R2 score of the cavity water depth model increased by about 1%, and that of the cavity length model by about 0.4%.
- b.
Global interpretation results show that the main factors affecting the cavity water depth and the cavity length are not identical. The impact angle of the water tongue θ is the most important factor affecting the cavity water depth, while the flow velocity V is the most important factor affecting the cavity length. Interpretation of the interaction terms shows that the aeration facility obtains a larger cavity length and eliminates the adverse effect of the cavity water depth when θ is in the range of 6°–10° and V is greater than 4.0 m/s. The global interpretation results are basically consistent with the results of the aeration experiments.
- c.
Local interpretation sorts the features of a sample by their weights and shows the contribution of every feature to the prediction. It demonstrates that the XGBoost–SHAP model can predict the cavity water depth and cavity length for any given hydraulic conditions and aeration facility size. The model captures the nonlinear relationship between the hydraulic conditions and the size of the aeration facility; it can also be used to optimize aeration experiment schemes and reduce the time and space costs of physical experiments.
ACKNOWLEDGEMENTS
The authors would like to thank Dr Ganggui Guo for helping collect data for this work. This work was supported by the National Natural Science Foundation of China [grant number 52079107]. It is also supported by the Natural Science Foundation of Shaanxi Province [grant number 2023-JC-QN-0395] and the Natural Science Foundation of Shaanxi Provincial Department of Education [grant number 22JK0470].
ETHICAL APPROVAL
This article does not contain any studies with human participants or animals performed by any of the authors.
INFORMED CONSENT
Informed consent was obtained from all individual participants included in the study.
AUTHOR CONTRIBUTIONS
T.M. performed the methodology, conceptualized the study, did formal analysis, did investigation, did data curation, wrote the original draft; S.L. did project administration, supervised the study, collected resources, wrote, reviewed, and edited the article; G.L. acquired funds, did project administration, supervised the study, collected resources, wrote, reviewed, and edited the article.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.