ABSTRACT
Accurate streamflow estimation is critical for water resources management. In this study, multiple tree-based models (decision tree, random forest, light gradient boosting machine, and gradient boosting machine (GBM)) are employed to estimate streamflow in the Cau River basin at Thac Rieng station, which serves as an illustrative example. Six indices, namely the correlation coefficient (r), Nash–Sutcliffe efficiency (NSE), Kling–Gupta efficiency (KGE), root mean square error (RMSE), mean absolute error (MAE), and mean error (ME), are used to quantitatively assess model performance. First, using the available hydro-meteorological data from 1/1/1960 to 31/12/1981, the hyper-parameters of each tree-based model are determined. The results show that all four models reproduced the observed streamflow well, with dimensionless errors (r, NSE, and KGE) ranging from 0.84 to 0.99 and dimensional errors (RMSE, MAE, and ME) of less than 10 m3/s (approximately 3.6% of the observed streamflow magnitude). Second, the GBM model was found to be the most suitable for simulating observed streamflow in the studied river basin. Third, the four models were applied to estimate streamflow in the period from 1/1/1982 to 31/12/2022, revealing similar magnitudes of the statistical indicators for observed and estimated streamflow. Finally, the capability of the four models in simulating streamflow is discussed.
HIGHLIGHTS
Four tree-based models were investigated for daily streamflow estimation in the Cau River basin.
The performance of four tree-based models was assessed using six statistical indices.
The gradient boosting machine model was found to be the most suitable for simulating observed streamflow.
Explainable machine learning identifies time-lagged water levels and rainfall as the most significant variables.
LIST OF ABBREVIATIONS
- ANN
artificial neural networks
- DL
deep learning
- DT
decision tree
- GBM
gradient boosting machine
- GP
Gaussian process
- KGE
Kling–Gupta efficiency
- LightGBM
light gradient boosting machine
- MAE
mean absolute error
- ME
mean error
- ML
machine learning
- NSE
Nash–Sutcliffe efficiency
- r
correlation coefficient
- RF
random forest
- RMSE
root mean square error
- SHAP
SHapley Additive exPlanations
- XAI
explainable artificial intelligence
- XGB
extreme gradient boosting
INTRODUCTION
Accurate streamflow estimation is essential for water resources management, flood forecasting, and hydropower generation (Pham Van & Nguyen-Van 2022). Hydrological processes inherently exhibit complex and highly non-linear relationships among components such as precipitation, evaporation, transpiration, runoff, infiltration, groundwater, surface water, and soil moisture, posing significant challenges for reliable streamflow estimation (Dasgupta et al. 2024). Although scholars worldwide have made many efforts to find optimal methods for improving streamflow estimation, further work is still needed because present-day hydrological processes are affected by human economic activities, land-cover and land-use change, and climate change, which make streamflow forecasting increasingly difficult.
Traditionally, physics-based models, which represent physical processes with mathematical equations, have been the main tool for streamflow estimation (Won et al. 2023; Imhoff et al. 2024). These models integrate various physical processes, such as precipitation, evaporation, infiltration, and surface runoff, to simulate how water moves through a watershed (Pophillat et al. 2021; Liu et al. 2024; Plunge et al. 2024). Key components typically include equations for water balance, continuity, and momentum, which account for the interactions between rainfall, soil moisture, vegetation, and topography (Beven 2012). By incorporating detailed spatial and temporal data, physics-based models can provide high-resolution predictions of streamflow that capture the complexities of natural hydrological systems. This makes them particularly valuable for assessing the impact of land use changes, climate variability, and water management practices on river flow, and for supporting effective water resource planning and flood risk management (Bal et al. 2021; Alawi & Özkul 2023). However, physics-based models require extensive and high-quality data inputs, which can be difficult and expensive to obtain, especially in data-scarce regions (Yang et al. 2020). In addition, the complexity of these models often results in high computational demands and long processing times (Zahura et al. 2020). The need for precise calibration and validation against observed data can also be challenging and time-consuming. These limitations can hinder their practical application, especially for real-time streamflow forecasting and in regions where data availability is limited (Ivanov et al. 2021).
Currently, data-driven models, such as machine learning (ML) and deep learning (DL) models, have emerged as powerful tools for streamflow prediction by leveraging large datasets to learn the complex patterns and non-linear relationships in hydrological data (Sit et al. 2020; Hauswirth et al. 2021; Tripathy & Mishra 2023). Unlike traditional physics-based models, ML approaches do not require explicit physical equations; instead, they use historical data to train algorithms to make accurate predictions. Data-driven approaches such as artificial neural networks (Uhrig 1995), long short-term memory networks (Hochreiter & Schmidhuber 1997), and decision tree (DT)-based models have demonstrated superior performance in the hydrological field. Kedam et al. (2024) demonstrated the efficacy of DT-based models in their study on streamflow forecasting in the Narmada river basin, particularly highlighting the robustness of the random forest (RF) model for streamflow estimation. Furthermore, other tree-based models have also been shown to perform well for streamflow prediction. Szczepanek (2022) evaluated the performance of three ML models, namely extreme gradient boosting (XGB), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost), for streamflow prediction in a mountainous catchment. The results showed that, when the hyper-parameters of the ML algorithms were optimized, the LightGBM model performed best among the three algorithms. From the results of the above-mentioned studies, it can be observed that data-driven algorithms generally show high performance for streamflow prediction. Nevertheless, ML algorithms are not without limitations. Although they require fewer types of input data than physical models, they are notorious for requiring large amounts of data for training (Chia et al. 2022; Cui et al. 2022), which restricts their application in watersheds with scarce data or sparse gauge density. Moreover, because these models learn purely from data rather than from physical laws, their predictions do not follow physical principles and ignore the physical factors governing hydrological processes. This can be considered a fundamental weakness of ML models (Núñez et al. 2023).
To overcome the above limitations, explainable artificial intelligence (XAI) is becoming an indispensable tool for interpreting ML models (Jiang et al. 2024; Perera et al. 2024). One of the key tools in XAI is SHapley Additive exPlanations (SHAP), which provides interpretable insights into the predictions made by complex models. In the context of streamflow estimation, SHAP can help hydrologists and water resources managers understand the contribution of each predictor variable, such as precipitation, evaporation, water level, and historical streamflow data, to the model's output (Sushanth et al. 2023; Huang & Zhang 2024; Wang et al. 2024). By quantifying the impact of individual features, SHAP enhances the transparency of ML models, making it easier to validate and trust their predictions (Lundberg & Lee 2017). This is particularly crucial for decision-making processes in water management, where understanding the underlying factors driving model predictions can lead to more informed and effective strategies. Moreover, SHAP can aid in identifying and mitigating potential model biases, ensuring that the predictions are not only accurate but also fair and reliable.
The motivation for the current study stems from the need to enhance the interpretability and reliability of DT-based models used for streamflow estimation. While these models, such as RF and the gradient boosting machine (GBM), are known for their robustness and accuracy, their complexity often hinders their applicability. By incorporating SHAP into the analysis, the present study also aims to unravel the decision-making process of these models. SHAP provides a clear understanding of how each input variable influences the model's predictions, enabling the identification of the key drivers of streamflow changes. This study, therefore, seeks to bridge the gap between high predictive performance and interpretability, ultimately contributing to more transparent and effective hydrological forecasting.
METHODOLOGY
General flowchart diagram of estimating streamflow in the present study.
Decision tree
DT regression belongs to the supervised class of ML methods used for predicting continuous values (Quinlan 1986). It operates by splitting the dataset into subsets based on feature values, forming a tree structure that consists of nodes, branches, and leaves. The root node represents the entire dataset and initiates the first split. Internal nodes represent decision rules based on specific feature values, and splitting continues until stopping criteria, such as a maximum tree depth or a minimum number of samples per leaf, are met. The leaf nodes, which are the terminal points of the tree, provide the predicted continuous values. The DT algorithm for regression aims to minimize the variance within each subset to determine the best splits. The variance of a dataset D containing N samples is calculated as the average of the squared differences between each target value $y_i$ and the mean target value $\bar{y}$, i.e., $\mathrm{Var}(D) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2$.
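As an illustration of this split criterion, the following minimal sketch (not the authors' implementation; the example values are hypothetical) selects the threshold on a single feature that maximizes the reduction in size-weighted variance:

```python
# Illustrative sketch of variance-based split selection for DT regression.
import numpy as np

def variance(y):
    """Var(D): mean of squared differences from the mean target value."""
    return float(np.mean((y - np.mean(y)) ** 2)) if len(y) else 0.0

def best_split(x, y):
    """Return the threshold on feature x that maximizes the reduction in
    size-weighted variance of the target y (the 'best split')."""
    parent = variance(y)
    best_thr, best_gain = None, 0.0
    for thr in np.unique(x):
        left, right = y[x <= thr], y[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(y)
        gain = parent - weighted
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain

# Example: split a water level feature (m) to explain streamflow (m3/s)
h = np.array([93.9, 94.1, 94.6, 95.2, 96.0])
q = np.array([3.0, 5.0, 18.0, 60.0, 150.0])
print(best_split(h, q))
```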
Random forest
Gradient boosting machine
Light gradient boosting machine
LightGBM, which is another boosting ML algorithm, is a highly efficient and fast implementation of the gradient boosting framework specifically designed for large-scale and high-performance regression and classification problems. Like the traditional GBM, LightGBM builds an ensemble of DTs in a sequential manner, where each tree corrects the errors of its predecessors. However, LightGBM introduces several key innovations that set it apart from standard GBM, including the use of histogram-based algorithms for faster training, leaf-wise tree growth instead of level-wise growth, and advanced techniques for handling categorical features. These enhancements enable LightGBM to achieve faster training times and better scalability while maintaining high accuracy.
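A minimal usage sketch is given below, assuming the lightgbm Python package and synthetic data; the hyper-parameter values shown are illustrative, not those used in this study:

```python
# Illustrative LightGBM regressor highlighting the features described above.
import numpy as np
from lightgbm import LGBMRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 4))                                  # stand-in predictors
y = 50 * X[:, 1] + 10 * X[:, 0] + rng.normal(0, 1, 500)   # synthetic target

model = LGBMRegressor(
    n_estimators=200,    # number of sequential boosting rounds (trees)
    num_leaves=31,       # leaf-wise growth is capped by leaf count, not depth
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    max_bin=255,         # histogram-based split finding
)
model.fit(X, y)
print(model.predict(X[:5]))
```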
Bayesian optimization (BO) for hyper-parameters

Explainable artificial intelligence
Dimensionless and dimensional errors
Note that the dimensionless and dimensional errors mentioned above provide a quantitative evaluation of the models' ability to simulate measured streamflow. For example, r and NSE assess the ratio of residual variance (or noise) to the variance in the observations, offering valuable insights into the comparison between observed and predicted values. KGE is a reliable goodness-of-fit metric for comparing estimations to measurements. Meanwhile, RMSE reflects the error magnitude in the same units as the data, making it useful for analyzing results. The optimal value for KGE, r, and NSE is 1, while RMSE and MAE are optimal when close to zero.
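For reference, since the defining equations are not reproduced in this section, the conventional forms of the six indices are given below (a standard formulation, assumed here: $O_i$ is the observed and $P_i$ the estimated streamflow at time step $i$, $\bar{O}$ and $\bar{P}$ are their means, $\sigma_O$ and $\sigma_P$ their standard deviations, and $N$ is the number of time steps):

$$
r=\frac{\sum_{i=1}^{N}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^2}\,\sqrt{\sum_{i=1}^{N}(P_i-\bar{P})^2}},\qquad
\mathrm{NSE}=1-\frac{\sum_{i=1}^{N}(P_i-O_i)^2}{\sum_{i=1}^{N}(O_i-\bar{O})^2},
$$

$$
\mathrm{KGE}=1-\sqrt{(r-1)^2+\left(\frac{\sigma_P}{\sigma_O}-1\right)^2+\left(\frac{\bar{P}}{\bar{O}}-1\right)^2},
$$

$$
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i-O_i)^2},\qquad
\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\lvert P_i-O_i\rvert,\qquad
\mathrm{ME}=\frac{1}{N}\sum_{i=1}^{N}(P_i-O_i).
$$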
EXPERIMENTAL DESIGNS AND MODEL SETUP
The Cau River Basin
Collection of meteo-hydrological data
Meteo-hydrological stations and various statistical properties of collected data
Name | Variable | Data collection period | Mean value | Range | Standard deviation | Coefficient of variation | Correlation coefficient^a
---|---|---|---|---|---|---|---
Bac Kan | Evaporation, Z (mm) | 1960–2022 | 2.163 | 0–9.2 | −0.125 | 0.489 | −0.149
Bac Kan | Rainfall, X (mm) | 1960–2022 | 4.111 | 0–304.9 | 0.356 | 2.923 | 0.609
Thac Rieng | Rainfall, X (mm) | 1960–2022 | 3.691 | 0–201.6 | 0.325 | 2.973 | 0.564
Thac Rieng | Water level, H (m) | 1960–2022 | 94.60 | 93.83–100 | 0.390 | 0.004 | 0.820
Thac Rieng | Streamflow, Q (m3/s) | 1960–1981, 2018–2022 | 18.4 | 2.22–517 | 24.04 | 1.305 | 1.0
^a Correlation coefficient with streamflow at Thac Rieng, calculated using the available data in the collection period.
Time series of: (a) evaporation at Bac Kan, (b) rainfall at Bac Kan, (c) rainfall at Thac Rieng, (d) water level at Thac Rieng, and (e) streamflow at Thac Rieng during the data collection period.
Model setup
Multiple DT-based models, namely DT, RF, GBM, and LightGBM, were employed to estimate streamflow at the location of interest in the studied river. To identify the hyper-parameters of these models, the BO algorithm presented in Section 2 was employed together with a k-fold cross-validation strategy. BO enhances model performance by systematically searching the hyper-parameter space, while k-fold cross-validation ensures robust and reliable model validation by splitting the dataset into k = 5 subsets and training the model k times, each time using a different subset as the validation set and the remaining subsets as the training set. Note that the 5-fold cross-validation strategy was implemented to avoid potential over-training issues and to ensure robust model performance. In addition, a maximum of 30 BO iterations was set, as in many previous studies (e.g., Zhang et al. 2023; Tran et al. 2024).
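The sketch below illustrates this combination of BO with 5-fold cross-validation and a cap of 30 iterations. It assumes the scikit-optimize package (BayesSearchCV) as one possible implementation and uses synthetic placeholder data; it is not the authors' exact configuration:

```python
# Illustrative Bayesian optimization of GBM hyper-parameters with 5-fold CV.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from skopt import BayesSearchCV
from skopt.space import Integer, Real

rng = np.random.default_rng(0)
X = rng.random((1000, 4))                     # stand-in predictors
y = 40 * X[:, 2] + rng.normal(0, 1, 1000)     # stand-in streamflow target

search = BayesSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    search_spaces={
        "n_estimators": Integer(50, 300),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "max_depth": Integer(2, 10),
    },
    n_iter=30,                                # cap of 30 BO iterations, as above
    cv=5,                                     # 5-fold cross-validation
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```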
In each investigated model, two groups of input data were tested. The first group includes data from individual stations: (i) rainfall at Bac Kan, (ii) rainfall at Thac Rieng, and (iii) water level at Thac Rieng. The second group consists of combined meteo-hydrological data from multiple stations: (i) rainfall at Thac Rieng and Bac Kan, (ii) rainfall at Thac Rieng and Bac Kan together with evaporation at Bac Kan, and (iii) evaporation and rainfall at Bac Kan together with rainfall and water level at Thac Rieng. These two input groups resulted in six input data scenarios, which were tested with each of the tree-based models, as summarized in Table 2. The abbreviations (XBK,T), (XTR,T), (HTR,T), (XBT,T), (X2Z,T), and (XZH,T) refer to the following input data: rainfall at Bac Kan; rainfall at Thac Rieng; water level at Thac Rieng; rainfall at both Bac Kan and Thac Rieng; rainfall at Thac Rieng and Bac Kan together with evaporation at Bac Kan; and rainfall at Thac Rieng and Bac Kan combined with evaporation at Bac Kan and water level at Thac Rieng, respectively. The subscript 'T' denotes the training step, while 'V' denotes the testing step.
Different scenarios of input data when using multiple DT-based models
Abbreviation | Rainfall at Bac Kan | Rainfall at Thac Rieng | Water level at Thac Rieng | Evaporation at Bac Kan | Step
---|---|---|---|---|---
RF | | | | |
RF(XBK,T) | ✓ | | | | Training
RF(XTR,T) | | ✓ | | | Training
RF(HTR,T) | | | ✓ | | Training
RF(XBT,T) | ✓ | ✓ | | | Training
RF(X2Z,T) | ✓ | ✓ | | ✓ | Training
RF(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
RF(XBK,V) | ✓ | | | | Testing
RF(XTR,V) | | ✓ | | | Testing
RF(HTR,V) | | | ✓ | | Testing
RF(XBT,V) | ✓ | ✓ | | | Testing
RF(X2Z,V) | ✓ | ✓ | | ✓ | Testing
RF(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
DT | | | | |
DT(XBK,T) | ✓ | | | | Training
DT(XTR,T) | | ✓ | | | Training
DT(HTR,T) | | | ✓ | | Training
DT(XBT,T) | ✓ | ✓ | | | Training
DT(X2Z,T) | ✓ | ✓ | | ✓ | Training
DT(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
DT(XBK,V) | ✓ | | | | Testing
DT(XTR,V) | | ✓ | | | Testing
DT(HTR,V) | | | ✓ | | Testing
DT(XBT,V) | ✓ | ✓ | | | Testing
DT(X2Z,V) | ✓ | ✓ | | ✓ | Testing
DT(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
LightGBM | | | | |
LG(XBK,T) | ✓ | | | | Training
LG(XTR,T) | | ✓ | | | Training
LG(HTR,T) | | | ✓ | | Training
LG(XBT,T) | ✓ | ✓ | | | Training
LG(X2Z,T) | ✓ | ✓ | | ✓ | Training
LG(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
LG(XBK,V) | ✓ | | | | Testing
LG(XTR,V) | | ✓ | | | Testing
LG(HTR,V) | | | ✓ | | Testing
LG(XBT,V) | ✓ | ✓ | | | Testing
LG(X2Z,V) | ✓ | ✓ | | ✓ | Testing
LG(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
GBM | | | | |
GBM(XBK,T) | ✓ | | | | Training
GBM(XTR,T) | | ✓ | | | Training
GBM(HTR,T) | | | ✓ | | Training
GBM(XBT,T) | ✓ | ✓ | | | Training
GBM(X2Z,T) | ✓ | ✓ | | ✓ | Training
GBM(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
GBM(XBK,V) | ✓ | | | | Testing
GBM(XTR,V) | | ✓ | | | Testing
GBM(HTR,V) | | | ✓ | | Testing
GBM(XBT,V) | ✓ | ✓ | | | Testing
GBM(X2Z,V) | ✓ | ✓ | | ✓ | Testing
GBM(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
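The scenario definitions in Table 2 can be encoded compactly, for example as feature-column lists; the sketch below uses hypothetical column names, and the actual dataset layout in the study may differ:

```python
# Hypothetical mapping from scenario abbreviations to input columns.
scenarios = {
    "XBK": ["rain_bac_kan"],
    "XTR": ["rain_thac_rieng"],
    "HTR": ["level_thac_rieng"],
    "XBT": ["rain_bac_kan", "rain_thac_rieng"],
    "X2Z": ["rain_bac_kan", "rain_thac_rieng", "evap_bac_kan"],
    "XZH": ["rain_bac_kan", "rain_thac_rieng", "evap_bac_kan", "level_thac_rieng"],
}
# e.g., X = df[scenarios["XZH"]] would select the richest input set for a model run.
```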
The training step uses input and target output data in the period from 01/01/1960 to 06/03/1975, covering approximately 70% of the collected data. The testing step uses meteo-hydrological data from 07/03/1975 to 31/12/1981. This approach ensures that the models are properly trained, while BO is leveraged to fine-tune the hyper-parameters effectively. After that, the six dimensionless and dimensional errors are calculated for quantitative comparisons between computed and measured streamflows. These errors provide a thorough evaluation of the models' performance, ensuring that the computed streamflows are both accurate and reliable. Detailed results and error values for the four investigated models are presented in the next section.
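A minimal sketch of this chronological split (approximately 70/30) is shown below, assuming the data are held in a pandas DataFrame with a daily DatetimeIndex; the frame and column names here are synthetic placeholders, not the study's dataset:

```python
# Illustrative chronological train/test split for the 1960-1981 record.
import numpy as np
import pandas as pd

idx = pd.date_range("1960-01-01", "1981-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"rain_thac_rieng": rng.gamma(0.5, 8, len(idx)),
                   "level_thac_rieng": 94 + rng.random(len(idx)),
                   "streamflow": rng.gamma(1.0, 18, len(idx))}, index=idx)

train = df.loc[:"1975-03-06"]      # ~70% of the record, as in the study
test = df.loc["1975-03-07":]       # remaining ~30% for testing

X_train, y_train = train.drop(columns="streamflow"), train["streamflow"]
X_test, y_test = test.drop(columns="streamflow"), test["streamflow"]
print(len(train), len(test))
```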
RESULTS AND DISCUSSION
Input selection and optimization of hyper-parameters
Statistical errors when using multiple DT-based models
Abbreviation | KGE | r | NSE | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%)
---|---|---|---|---|---|---|---|---|---
RF(XBK,T) | 0.762 | 0.885 | 0.774 | 11.55 | 2.23 | 7.02 | 1.36 | −0.20 | −0.04 |
RF(XTR,T) | 0.699 | 0.853 | 0.714 | 12.99 | 2.51 | 7.59 | 1.47 | −0.18 | −0.03 |
RF(HTR,T) | 0.826 | 0.919 | 0.840 | 9.72 | 1.88 | 2.68 | 0.52 | 0.02 | 0.00 |
RF(XBT,T) | 0.786 | 0.904 | 0.808 | 10.65 | 2.06 | 6.49 | 1.25 | −0.18 | −0.03 |
RF(X2Z,T) | 0.826 | 0.940 | 0.874 | 8.63 | 1.67 | 4.95 | 0.96 | −0.31 | −0.06 |
RF(XZH,T) | 0.912 | 0.985 | 0.964 | 4.59 | 0.89 | 1.36 | 0.26 | − 0.028 | − 0.005 |
RF(XBK,V) | 0.692 | 0.767 | 0.582 | 15.93 | 5.77 | 9.69 | 3.51 | 1.41 | 0.51 |
RF(XTR,V) | 0.603 | 0.699 | 0.483 | 17.73 | 6.42 | 10.37 | 3.76 | 1.43 | 0.52 |
RF(HTR,V) | 0.858 | 0.971 | 0.931 | 6.49 | 2.35 | 2.28 | 0.83 | − 0.02 | − 0.01 |
RF(XBT,V) | 0.726 | 0.782 | 0.604 | 15.50 | 5.62 | 9.30 | 3.37 | 1.36 | 0.49 |
RF(X2Z,V) | 0.760 | 0.799 | 0.632 | 14.95 | 5.42 | 8.80 | 3.19 | 0.70 | 0.25 |
RF(XZH,V) | 0.894 | 0.971 | 0.937 | 6.17 | 2.24 | 2.25 | 0.81 | − 0.048 | − 0.02 |
DT(XBK,T) | 0.891 | 0.922 | 0.850 | 9.40 | 1.82 | 6.16 | 1.19 | −0.15 | −0.03 |
DT(XTR,T) | 0.848 | 0.891 | 0.795 | 11.01 | 2.13 | 6.88 | 1.33 | −0.13 | −0.03 |
DT(HTR,T) | 0.971 | 0.979 | 0.958 | 4.97 | 0.96 | 1.76 | 0.34 | 0.03 | 0.01 |
DT(XBT,T) | 0.912 | 0.938 | 0.879 | 8.45 | 1.63 | 5.62 | 1.09 | −0.11 | −0.02 |
DT(X2Z,T) | 0.914 | 0.939 | 0.882 | 8.33 | 1.61 | 5.48 | 1.06 | −0.11 | −0.02 |
DT(XZH,T) | 0.960 | 0.996 | 0.982 | 0.49 | 0.09 | 0.22 | 0.04 | 0.005 | 0.001 |
DT(XBK,V) | 0.647 | 0.661 | 0.356 | 19.78 | 7.17 | 10.64 | 3.86 | 1.47 | 0.53 |
DT(XTR,V) | 0.602 | 0.606 | 0.200 | 22.05 | 7.99 | 11.43 | 4.14 | 1.09 | 0.39 |
DT(HTR,V) | 0.872 | 0.902 | 0.814 | 10.63 | 3.85 | 2.52 | 0.91 | 0.20 | 0.07 |
DT(XBT,V) | 0.675 | 0.687 | 0.319 | 20.34 | 7.37 | 10.57 | 3.83 | 0.86 | 0.31 |
DT(X2Z,V) | 0.568 | 0.634 | 0.049 | 24.03 | 8.71 | 10.44 | 3.78 | 0.68 | 0.25 |
DT(XZH,V) | 0.805 | 0.916 | 0.838 | 9.92 | 3.59 | 2.60 | 0.94 | 0.088 | 0.03 |
LG(XBK,T) | 0.722 | 0.841 | 0.702 | 13.25 | 2.56 | 7.63 | 1.47 | −0.21 | −0.04 |
LG(XTR,T) | 0.625 | 0.780 | 0.602 | 15.32 | 2.96 | 8.36 | 1.62 | −0.20 | −0.04 |
LG(HTR,T) | 0.778 | 0.862 | 0.741 | 12.35 | 2.39 | 3.21 | 0.62 | 0.03 | 0.01 |
LG(XBT,T) | 0.744 | 0.863 | 0.738 | 12.43 | 2.40 | 7.11 | 1.38 | −0.17 | −0.03 |
LG(X2Z,T) | 0.785 | 0.897 | 0.797 | 10.95 | 2.12 | 6.18 | 1.19 | −0.17 | −0.03 |
LG(XZH,T) | 0.924 | 0.972 | 0.942 | 5.84 | 1.13 | 2.05 | 0.40 | 0.006 | 0.001 |
LG(XBK,V) | 0.682 | 0.744 | 0.544 | 16.65 | 6.03 | 9.97 | 3.61 | 1.29 | 0.47 |
LG(XTR,V) | 0.595 | 0.669 | 0.434 | 18.55 | 6.72 | 10.77 | 3.90 | 1.18 | 0.43 |
LG(HTR,V) | 0.883 | 0.980 | 0.951 | 5.48 | 1.98 | 2.37 | 0.86 | − 0.09 | − 0.03 |
LG(XBT,V) | 0.731 | 0.759 | 0.558 | 16.40 | 5.94 | 9.70 | 3.52 | 1.03 | 0.37 |
LG(X2Z,V) | 0.751 | 0.775 | 0.583 | 15.91 | 5.77 | 9.12 | 3.30 | 0.87 | 0.32 |
LG(XZH,V) | 0.906 | 0.957 | 0.913 | 7.28 | 2.64 | 2.56 | 0.93 | − 0.563 | − 0.20 |
GBM(XBK,T) | 0.800 | 0.921 | 0.838 | 9.78 | 1.89 | 7.15 | 1.38 | −1.61 | −0.31 |
GBM(XTR,T) | 0.678 | 0.798 | 0.635 | 14.68 | 2.84 | 8.41 | 1.63 | −0.22 | −0.04 |
GBM(HTR,T) | 0.914 | 0.976 | 0.949 | 5.47 | 1.06 | 2.09 | 0.40 | − 0.55 | − 0.11 |
GBM(XBT,T) | 0.760 | 0.858 | 0.734 | 12.53 | 2.42 | 7.56 | 1.46 | −0.24 | −0.05 |
GBM(X2Z,T) | 0.901 | 0.962 | 0.923 | 6.72 | 1.30 | 4.71 | 0.91 | −0.26 | −0.05 |
GBM(XZH,T) | 0.913 | 0.963 | 0.928 | 6.53 | 1.26 | 2.84 | 0.55 | − 0.062 | − 0.012 |
GBM(XBK,V) | 0.701 | 0.734 | 0.523 | 17.03 | 6.17 | 10.33 | 3.74 | −0.26 | −0.09 |
GBM(XTR,V) | 0.583 | 0.678 | 0.452 | 18.25 | 6.61 | 10.66 | 3.86 | 1.43 | 0.52 |
GBM(HTR,V) | 0.888 | 0.963 | 0.923 | 6.84 | 2.48 | 2.74 | 0.99 | − 0.61 | − 0.22 |
GBM(XBT,V) | 0.708 | 0.765 | 0.577 | 16.03 | 5.81 | 9.61 | 3.48 | 1.39 | 0.50 |
GBM(X2Z,V) | 0.780 | 0.800 | 0.626 | 15.06 | 5.46 | 8.73 | 3.16 | 0.57 | 0.21 |
GBM(XZH,V) | 0.913 | 0.963 | 0.925 | 6.74 | 2.44 | 2.83 | 1.02 | 0.049 | 0.02 |
Bold values represent good model performance.
Radar plot for: (a) dimensionless and (b) dimensional errors in the training step.
Radar plot for: (a) dimensionless and (b) dimensional errors in the testing step.
Comparison between estimated and observed streamflow when using (a) RF for the training step, (b) RF for the testing step, (c) DT for the training step, (d) DT for the testing step, (e) LG for the training step, (f) LG for the testing step, (g) GBM for the training step, and (h) GBM for the testing step.
When rainfall at Bac Kan station was used as input data for the RF model (i.e., RF(XBK,T) and RF(XBK,V)), the model demonstrated moderate performance during both the training and testing phases. The dimensionless errors (KGE, r, and NSE) range from 0.58 to 0.89, while the dimensional errors (RMSE, MAE, and ME) vary between −0.20 and 15.9 m3/s, corresponding to −0.04 to 5.77% of the observed streamflow magnitude at Thac Rieng. Slightly smaller dimensionless errors and slightly larger dimensional errors are observed when rainfall data at Thac Rieng are used as input to the RF model. A good agreement between observed and estimated streamflows is achieved when the water level is used as input data, with dimensionless errors ranging from 0.83 to 0.97 and dimensional errors varying from 0.02 to 9.72 m3/s. These results align with the correlation coefficients between streamflow at Thac Rieng and the meteo-hydrological data at the individual stations, as reported in Table 1. The model reproduced the observed streamflow very accurately when additional meteo-hydrological data were used, such as (i) rainfall at both Bac Kan and Thac Rieng, (ii) rainfall at both stations combined with evaporation at Bac Kan, or (iii) rainfall and evaporation at Bac Kan combined with the water level at Thac Rieng. The dimensionless errors range from 0.60 to 0.99, while dimensional errors vary between −0.31 and 15.5 m3/s. These findings suggest that the RF model using the second input data group tends to outperform the model using only meteo-hydrological data from an individual station. In other words, a good agreement between observed and estimated streamflows is also obtained when more comprehensive meteo-hydrological data are used. In addition, among the six scenarios tested, the combination of (i) water level and rainfall at Thac Rieng and (ii) rainfall and evaporation at Bac Kan results in the most accurate streamflow estimates.
The DT model provides a good estimation of the observed streamflow during the training phase when using the first input data group. The dimensionless errors (KGE, r, and NSE) range from 0.80 to 0.98, indicating that the model accurately represents both the magnitude and the variation trends of streamflow during the training period. The dimensional errors (RMSE, MAE, and ME) are less than 11 m3/s, which is approximately only 2.13% of the observed streamflow magnitude at the station. In the testing phase, the dimensionless errors range from 0.20 to 0.90, while the dimensional errors vary from 0.2 to 22.05 m3/s. When the second input data group is applied, high dimensionless accuracy and low dimensional errors are observed in the training phase, with dimensionless errors ranging from 0.88 to 0.99 and dimensional errors from 0.005 to 8.45 m3/s. In the testing phase, the dimensionless errors range from 0.05 to 0.92, and the dimensional errors vary between 0.088 and 24.03 m3/s. While the DT model performs well in the training phase, its performance is noticeably weaker during the testing phase. Among the six input scenarios investigated, the DT model simulates the observed streamflow effectively when using either the water level at Thac Rieng or the combination of (i) water level and rainfall at Thac Rieng and (ii) evaporation and rainfall at Bac Kan.
The LightGBM model represents the measured streamflow acceptably during the training phase when using the first and second input data groups. The dimensionless errors range between 0.60 and 0.97, while dimensional errors vary from −0.21 to 15.32 m3/s (see Table 3). In the testing phase, similar results are obtained regardless of whether the first or second input data group is applied. Specifically, dimensionless errors range from 0.43 to 0.96, while dimensional errors range from −0.56 to 18.55 m3/s (see Table 3). Among the different input scenarios investigated, the LightGBM model showed the best performance in both the training and testing phases when meteorological and hydrological data from both Bac Kan and Thac Rieng were used as input (see Figures 4 and 5). In this case, dimensionless errors exceeded 0.92, and dimensional errors were less than 5.84 m3/s. These results are consistent with those obtained from the RF and DT models, showing that the more input data provided, the more accurate the streamflow estimates.
In the training phase of the GBM model, there is a very good agreement between observed and estimated streamflow when using both the first and second input data groups, especially in cases using either the water level at Thac Rieng or a combination of (i) water level and rainfall at Thac Rieng and (ii) rainfall and evaporation at Bac Kan. When using the first input data group, dimensionless errors range from 0.64 to 0.98, while dimensional errors are less than 14.68 m3/s, corresponding to 2.84% of the observed streamflow magnitude. With the second input data group, dimensionless errors range from 0.73 to 0.96, and dimensional errors vary from −0.26 to 12.53 m3/s (see Table 3). In the testing phase, similar values for both dimensionless and dimensional errors are obtained when using either the first or second input data group (see Table 3). Furthermore, among the six input scenarios, the GBM model showed the best performance in both the training and testing phases when using the combination of rainfall and evaporation at Bac Kan together with rainfall and the water level at Thac Rieng (see Figures 4 and 5).
In general, across the RF, DT, LightGBM, and GBM models investigated, the best agreement between observed and estimated streamflow is achieved when using all meteo-hydrological data at both Thac Rieng and Bac Kan (Figure 6). Using the BO algorithm, the optimal hyper-parameter value for the best-performing DT model was identified as max_depth = 22. The optimal hyper-parameter values for the best-performing RF model were n_estimators = 77, max_features = 0.27, max_depth = 26, and min_sample_split = 3. For the LightGBM model, the optimal hyper-parameters were n_estimators = 179, max_depth = 13, and min_sample_split = 2. The best results of the GBM model correspond to the hyper-parameter values n_estimators = 89, lr = 0.09, max_depth = 3, and min_child_weight = 1. These models also perform well, with high accuracy and low dimensional and dimensionless errors, when the water level at Thac Rieng is used as input. Reliable streamflow estimates, with acceptable errors, are obtained when using rainfall data alone or in combination with evaporation data. The results from all four models demonstrate that the DT-based models (RF, DT, LightGBM, and GBM) using combined input data tend to outperform models using data from individual stations.
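For illustration only, the reported optimal values could be plugged into common open-source implementations as sketched below. The library choices (scikit-learn, lightgbm, xgboost) and the mapping of parameter names (e.g., min_sample_split to scikit-learn's min_samples_split, or to LightGBM's min_child_samples) are assumptions, not the authors' exact configuration:

```python
# Illustrative instantiation of the four tuned models (assumed libraries/mappings).
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

dt = DecisionTreeRegressor(max_depth=22)
rf = RandomForestRegressor(n_estimators=77, max_features=0.27,
                           max_depth=26, min_samples_split=3)
lgbm = LGBMRegressor(n_estimators=179, max_depth=13,
                     min_child_samples=2)                 # assumed analogue of min_sample_split
gbm = XGBRegressor(n_estimators=89, learning_rate=0.09,   # lr = 0.09 above
                   max_depth=3, min_child_weight=1)
# Each model would then be fitted on the XZH training inputs,
# e.g., gbm.fit(X_train, y_train), and evaluated on the testing period.
```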
Comparison of best results obtained in each model
Multiple tree-based models in simulating streamflow including (a) scatters of observed data versus simulated results and (b) time series for the training step.
Multiple tree-based models in simulating streamflow including (a) scatters of observed data versus simulated results and (b) time series for the testing step.
Extreme event capturing
Results of multiple tree-based models in estimating extreme high and low streamflow: (a) a log scale for exceedance probability and (b) a log scale for streamflow.
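A minimal sketch of how such an exceedance-probability (flow duration) comparison can be produced is given below, using synthetic observed and simulated series rather than the study data:

```python
# Illustrative empirical exceedance-probability curves on log scales.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
q_obs = rng.gamma(1.0, 18, 5000)                 # stand-in observed streamflow (m3/s)
q_sim = q_obs * rng.normal(1.0, 0.15, 5000)      # stand-in simulated streamflow

def exceedance(q):
    q_sorted = np.sort(q)[::-1]                  # descending flows
    p = np.arange(1, len(q) + 1) / (len(q) + 1)  # Weibull plotting position
    return p, q_sorted

for label, q in [("observed", q_obs), ("simulated", q_sim)]:
    p, qs = exceedance(q)
    plt.plot(p, qs, label=label)
plt.xscale("log"); plt.yscale("log")
plt.xlabel("Exceedance probability (-)"); plt.ylabel("Streamflow (m3/s)")
plt.legend(); plt.show()
```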
Application for streamflow reconstruction
Different statistical characteristics of observed and estimated streamflow
Model | Collected period | Mean value (m3/s) | Range (m3/s) | Standard deviation (m3/s) | Coefficient of variation
---|---|---|---|---|---
Observation | 1960–1981, 2018–2022 | 18.4 | 2.22–517 | 24.04 | 1.305
RF | 1960–2022 | 20.0 | 2.32–370.0 | 22.60 | 1.128
DT | 1960–2022 | 19.9 | 2.22–517.0 | 25.25 | 1.270
LG | 1960–2022 | 20.2 | 0.03–393.2 | 24.15 | 1.196
GBM | 1960–2022 | 20.0 | 3.85–495.2 | 23.73 | 1.188
Time series of estimated streamflow in the period from 1960 to 2022.
Interpretation of the final model
SHAP analysis showing feature importance for predicting streamflow at Thac Rieng: (a) mean SHAP values indicate the water level and (b) SHAP summary plot shows how feature values impact predictions.
Figure 11(b) adds further depth to this analysis by illustrating not only the importance of these features but also how their actual values affect the predicted streamflow. The x-axis shows the SHAP values, which quantify the positive or negative impact each feature has on the model's output. The color gradient, with blue representing lower values and red representing higher values, indicates the actual magnitude of each feature. For example, HTRt−2, the most important feature, shows that higher water levels (red) lead to positive SHAP values, meaning they increase the predicted streamflow. Conversely, lower water levels (blue) result in negative SHAP values, decreasing the predicted streamflow. This makes intuitive sense from a hydrological perspective: higher water levels two time steps earlier indicate increased storage in the river system, which translates into higher downstream flows after a time lag. A similar relationship is observed with XTRt−1, where higher rainfall values (red) push the predicted streamflow upwards, and lower rainfall values (blue) push it downwards. Rainfall at Thac Rieng has a relatively immediate effect on the streamflow, as it contributes directly to surface runoff and river levels. Rainfall at Bac Kan (XBK) also exhibits this pattern but with a narrower spread of SHAP values, suggesting that while rainfall at Bac Kan affects streamflow at Thac Rieng, it does so with less variability. The more distant location of Bac Kan and the delayed effects of upstream rainfall reduce its influence compared to local rainfall at Thac Rieng. Evaporation at Bac Kan (ZBK) shows the smallest impact, regardless of whether evaporation values are high or low. This indicates that evaporation, while part of the overall water balance, contributes minimally to the short-term prediction of streamflow.
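For readers wishing to reproduce this type of analysis, a minimal SHAP sketch for a tree-based regressor is given below; the feature names mirror those discussed above, but the data, lag structure, and model are synthetic placeholders rather than the study's final model:

```python
# Illustrative SHAP analysis for a tree-based streamflow regressor.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "HTR_t-2": 94 + rng.random(800),     # lagged water level at Thac Rieng
    "XTR_t-1": rng.gamma(0.5, 8, 800),   # lagged rainfall at Thac Rieng
    "XBK_t-1": rng.gamma(0.5, 8, 800),   # lagged rainfall at Bac Kan
    "ZBK_t-1": rng.random(800) * 9,      # lagged evaporation at Bac Kan
})
y = 60 * (X["HTR_t-2"] - 94) + 2 * X["XTR_t-1"] + rng.normal(0, 1, 800)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)               # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)

shap.summary_plot(shap_values, X, plot_type="bar")  # mean |SHAP| per feature (Fig. 11(a)-style)
shap.summary_plot(shap_values, X)                   # beeswarm of per-sample impacts (Fig. 11(b)-style)
```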
CONCLUSIONS
In this study, different tree-based ML models, namely DT, RF, LightGBM, and GBM, were investigated to estimate streamflow in river basins and were specifically applied to the Cau River at the Thac Rieng location to illustrate their performance. The performance of these models was evaluated for six input scenarios to assess their robustness and predictive capabilities in streamflow estimation. In each model, the hyper-parameters were optimized using the BO tool, which allowed for more efficient and effective optimization compared with traditional grid or random search methods.
The results showed that while all four models performed satisfactorily in estimating streamflow, there were clear differences in their accuracy and reliability. The ensemble methods, particularly RF and GBM, consistently outperformed the single-tree DT model. Among the ensemble methods, GBM emerged as the best-performing model, offering superior predictive power across the various data scenarios. The dimensionless statistical errors (KGE, r, and NSE) of streamflow range from 0.81 to 0.99, while the dimensional errors were smaller than 3.59% of the observed streamflow magnitude. The GBM model's ability to handle non-linear relationships among the different components of streamflow processes, combined with its robustness to overfitting, makes it particularly well-suited for streamflow estimation tasks.
The use of XAI provided valuable insights into the factors most critical for accurately estimating streamflow, identifying time-lagged water levels and rainfall as the most influential variables. This additional layer of understanding helped validate the model output and enhanced confidence in the model's practical applicability. The results also demonstrate that ensemble-based ML algorithms, especially GBM, are highly effective for streamflow estimation in complex hydrological systems such as the Cau River basin. The integration of XAI techniques also adds significant value, making the models not only more accurate but also more interpretable and transparent, which is essential for practical applications in water resource management.
The results indicated that the water level at Thac Rieng and the rainfall at the two stations are the primary factors influencing the accuracy of the estimated streamflow in the studied river basin. This finding is expected to assist both researchers and decision-makers in applying hydrological and hydraulic models in the future.
ACKNOWLEDGEMENTS
The author extends sincere thanks to the editor and two anonymous reviewers for their insightful suggestions and comments.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.