Accurate streamflow estimation is critical for water resources management. In this study, multiple tree-based models (decision tree, random forest, light gradient boosting machine, and gradient boosting machine (GBM)) are employed to estimate streamflow in the Cau River basin at Thac Rieng station, which serves as an illustrative example. Six indices, namely the correlation coefficient (r), Nash–Sutcliffe efficiency (NSE), Kling–Gupta efficiency (KGE), root mean square error (RMSE), mean absolute error (MAE), and mean error (ME), are used to quantitatively assess model performance. First, using the available hydro-meteorological data from 1/1/1960 to 31/12/1981, the hyper-parameters of each tree-based model are determined. The results show that the four models reproduce the observed streamflow well, with dimensionless errors (r, NSE, and KGE) ranging from 0.84 to 0.99 and dimensional errors (RMSE, MAE, and ME) of less than 10 m3/s (approximately 3.6% of the observed streamflow magnitude). Second, the GBM model is found to be the most suitable for simulating observed streamflow in the studied river basin. Third, the four models are applied to estimate streamflow in the period from 1/1/1982 to 31/12/2022, revealing similar magnitudes of the statistical indicators of observed and estimated streamflow. The capability of the four models in simulating streamflow is finally discussed.

  • Four tree-based models were investigated for daily streamflow estimation in the Cau River basin.

  • The performance of four tree-based models was assessed using six statistical indices.

  • The gradient boosting machine model was found to be the most suitable for simulating observed streamflow.

  • Explainable machine learning identifies time-lagged water levels and rainfall as the most significant variables.

ANN: artificial neural networks
DL: deep learning
DT: decision tree
GBM: gradient boosting machine
GP: Gaussian process
KGE: Kling–Gupta efficiency
LightGBM: light gradient boosting machine
MAE: mean absolute error
ME: mean error
ML: machine learning
NSE: Nash–Sutcliffe efficiency
r: correlation coefficient
RF: random forest
RMSE: root mean square error
SHAP: SHapley Additive exPlanations
XAI: explainable machine learning
XGB: extreme gradient boosting

Accurate streamflow estimation is significant for water resources management, flood forecasting, and hydropower generation (Pham Van & Nguyen-Van 2022). Hydrological processes inherently exhibit complex and highly non-linear relationships among components such as precipitation, evaporation, transpiration, runoff, infiltration, groundwater, surface water, and soil moisture, posing significant challenges for reliable streamflow estimation (Dasgupta et al. 2024). Although scholars around the world have made many efforts to find optimal methods for improving streamflow estimation, these efforts must continue because present-day hydrological processes are affected by human economic activities, land-cover and land-use change, and climate change, all of which make streamflow forecasting increasingly difficult.

Traditionally, physics-based models, which represent physical processes with mathematical equations, have been the main tool for streamflow estimation (Won et al. 2023; Imhoff et al. 2024). These models integrate various physical processes, such as precipitation, evaporation, infiltration, and surface runoff, to simulate how water moves through a watershed (Pophillat et al. 2021; Liu et al. 2024; Plunge et al. 2024). Key components typically include equations for water balance, continuity, and momentum, which account for the interactions between rainfall, soil moisture, vegetation, and topography (Beven 2012). By incorporating detailed spatial and temporal data, physics-based models can provide high-resolution predictions of streamflow, capturing the complexities of natural hydrological systems. This makes them particularly valuable for assessing the impact of land use changes, climate variability, and water management practices on river flow, and for aiding effective water resource planning and flood risk management (Bal et al. 2021; Alawi & Özkul 2023). However, physics-based models require extensive and high-quality data inputs, which can be difficult and expensive to obtain, especially in data-scarce regions (Yang et al. 2020). Moreover, the complexity of these models often results in high computational demands and long processing times (Zahura et al. 2020). Additionally, the need for precise calibration and validation against observed data can be challenging and time-consuming. These limitations can hinder their practical application, especially for real-time streamflow forecasting and in regions where data availability is limited (Ivanov et al. 2021).

Currently, data-driven models, such as machine learning (ML) and deep learning (DL), have emerged as powerful tools for streamflow prediction by leveraging large datasets to learn the complex patterns and non-linear relationships in hydrological data (Sit et al. 2020; Hauswirth et al. 2021; Tripathy & Mishra 2023). Unlike traditional physics-based models, ML approaches do not require explicit physical equations; instead, they use historical data to train algorithms to make accurate predictions. Data-driven approaches such as artificial neural networks (Uhrig 1995), long short-term memory (Hochreiter & Schmidhuber 1997), and decision tree (DT)-based models have demonstrated superior performance in the hydrological field. Kedam et al. (2024) demonstrated the efficacy of DT-based models in their study on streamflow forecasting in the Narmada river basin, particularly highlighting the robustness of the random forest (RF) model for streamflow estimation. Furthermore, other tree-based models have also been shown to perform well for streamflow prediction. Szczepanek (2022) evaluated the performance of three ML models, namely extreme gradient boosting (XGB), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost), for streamflow prediction in a mountainous catchment. The results showed that, when the hyper-parameters of the ML algorithms were optimized, the LightGBM model performed best among the three algorithms. From the above-mentioned studies, it can be observed that data-based algorithms generally show high performance for streamflow prediction. Nevertheless, ML algorithms are not without limitations. Although they require less input information than physical models, they are notorious for requiring large amounts of data for training (Chia et al. 2022; Cui et al. 2022), which is a drawback when they are applied to watersheds with scarce data or sparse gauge density. In addition, their predictions do not follow physical principles and largely ignore the physical mechanisms of hydrological processes, which can be considered a fundamental weakness of ML models (Núñez et al. 2023).

To overcome the above limitations, explainable artificial intelligence (XAI) is becoming an indispensable tool for exploiting ML models (Jiang et al. 2024; Perera et al. 2024). One of the key tools in XAI is SHapley Additive exPlanations (SHAP), which provides interpretable insights into the predictions made by complex models. In the context of streamflow estimation, SHAP can help hydrologists and water resources managers understand the contribution of each predictor variable, such as precipitation, evaporation, water level, and historical streamflow data, to the model's output (Sushanth et al. 2023; Huang & Zhang 2024; Wang et al. 2024). By quantifying the impact of individual features, SHAP enhances the transparency of ML models, making it easier to validate and trust their predictions (Lundberg & Lee 2017). This is particularly crucial for decision-making processes in water management, where understanding the underlying factors driving model predictions can lead to more informed and effective strategies. Moreover, SHAP can aid in identifying and mitigating potential model biases, ensuring that the predictions are not only accurate but also fair and reliable.

The motivation for our current study stems from the need to enhance the interpretability and reliability of DT-based models used for streamflow estimation. While these models, such as RF and gradient boosting machine (GBM), are known for their robustness and accuracy, their complexity often hinders their applicability. By incorporating SHAP into the analysis, the present study aims to unravel the decision-making process of these models. SHAP provides a clear understanding of how each input variable influences the model's predictions, enabling the identification of the key drivers of streamflow changes. This study, therefore, seeks to bridge the gap between high predictive performance and interpretability, ultimately contributing to more transparent and effective hydrological forecasting.

The general flowchart for streamflow estimation used in this study is shown in Figure 1, comprising (i) the pre-processing stage, which involves the collection of meteo-hydrological data and the identification of input variable groups and output data, (ii) the processing stage, where multiple tree-based models (DT, RF, LightGBM, and GBM) are trained and tested using 70 and 30% of the dataset, respectively, followed by the selection of suitable models, and (iii) the post-processing stage, which focuses on applying the models to estimate streamflow. Note that hyper-parameters are optimized using Bayesian optimization (BO), with the root mean square error (RMSE) of streamflow as the objective function. Six indices, including both dimensionless and dimensional errors, are employed for quantitative comparison of the estimated streamflow with observed data. In addition, SHAP analysis is performed to evaluate the influence of each input variable on the model's performance.
Figure 1. General flowchart diagram of estimating streamflow in the present study.

Decision tree

DT regression belongs to the supervised class of ML methods used for predicting continuous values (Quinlan 1986). It operates by splitting the dataset into subsets based on feature values, forming a tree structure that consists of nodes, branches, and leaves. The root node represents the entire dataset and initiates the first split. Internal nodes represent decision rules based on specific feature values, and splitting continues until stopping criteria, such as the minimum number of samples per leaf, are met. The leaf nodes, which are the terminal points of the tree, provide the predicted continuous values. The DT algorithm for regression determines the best splits by minimizing the variance within each subset. The variance of a dataset D is calculated as the average of the squared differences between each target value $y_i$ and the mean target value $\bar{y}$.
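For completeness, this variance criterion and the resulting split rule can be written as follows (a standard formulation consistent with the description above; the split notation is introduced here for illustration):

$$\mathrm{Var}(D) = \frac{1}{|D|}\sum_{i \in D}\left(y_i - \bar{y}\right)^{2}, \qquad (j^{*}, s^{*}) = \arg\min_{j,\,s}\left[\frac{|D_L|}{|D|}\,\mathrm{Var}(D_L) + \frac{|D_R|}{|D|}\,\mathrm{Var}(D_R)\right],$$

where $D_L$ and $D_R$ are the subsets obtained by splitting on feature $j$ at threshold $s$.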

Random forest

RF is an ensemble learning method used for both classification and regression tasks that constructs multiple DTs during training and outputs the aggregate of their predictions (Nguyen et al. 2021). For regression, the RF model builds several DTs on different subsets of the data, introducing randomness by selecting a random subset of features at each split and by training each tree on a random sample of data obtained by bootstrapping (random sampling with replacement). This ensemble approach aims to enhance predictive accuracy and reduce overfitting compared to a single DT. The final prediction for a given input is obtained by averaging the predictions from all individual trees, as in the following equation:
$$\hat{y}(x) = \frac{1}{T}\sum_{t=1}^{T} h_t(x) \qquad (1)$$
where T is the total number of trees and $h_t(x)$ is the prediction of the tth tree for the input x. The averaging process reduces the variance of the model, leading to more accurate and reliable predictions.

Gradient boosting machine

GBM is an advanced ensemble learning method used for regression and classification tasks, which builds a model in a stage-wise manner by sequentially adding DTs to minimize the loss function (Nguyen et al. 2023). Unlike RF, which trains multiple DTs independently and combines their predictions, GBM builds trees one at a time, with each new tree focusing on the most difficult-to-predict cases, thereby improving overall accuracy. For regression, the prediction for a given input is the sum of the predictions from all the trees, with each tree fitted to the residual errors of the preceding trees. The prediction $\hat{y}$ of GBM is given by the following equation:
$$\hat{y}(x) = \sum_{m=1}^{M} \alpha_m\, h_m(x) \qquad (2)$$
where M is the number of trees, $h_m(x)$ is the prediction of the mth tree for the input x, and $\alpha_m$ is the learning rate that scales the contribution of each tree. This additive model reduces both bias and variance, leading to highly accurate and robust predictions. The key difference between GBM and RF lies in the approach, i.e., GBM focuses on sequentially improving the model by addressing errors, whereas RF relies on averaging multiple independently trained trees to enhance predictive performance.

Light gradient boosting machine

LightGBM, which is another boosting ML algorithm, is a highly efficient and fast implementation of the gradient boosting framework specifically designed for large-scale and high-performance regression and classification problems. Like the traditional GBM, LightGBM builds an ensemble of DTs in a sequential manner, where each tree corrects the errors of its predecessors. However, LightGBM introduces several key innovations that set it apart from standard GBM, including the use of histogram-based algorithms for faster training, leaf-wise tree growth instead of level-wise growth, and advanced techniques for handling categorical features. These enhancements enable LightGBM to achieve faster training times and better scalability while maintaining high accuracy.
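As a minimal illustration of how the four tree-based regressors described above can be instantiated in Python (a generic sketch with placeholder hyper-parameters, not the configuration used in this study):

```python
# Minimal sketch of the four tree-based regressors using scikit-learn and the
# LightGBM package; hyper-parameter values are placeholders for illustration only.
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from lightgbm import LGBMRegressor

models = {
    "DT": DecisionTreeRegressor(max_depth=10, random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0),
    "GBM": GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0),
    "LightGBM": LGBMRegressor(n_estimators=100, learning_rate=0.1, random_state=0),
}

# X_train holds the meteo-hydrological predictors and y_train the daily streamflow
# at Thac Rieng (m3/s); both are assumed to be prepared beforehand.
# for name, model in models.items():
#     model.fit(X_train, y_train)
#     q_estimated = model.predict(X_test)
```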

BO for hyper-parameters

BO is a method for optimizing expensive-to-evaluate functions that do not have an explicit analytical form, making it particularly useful in scientific research. The process relies on constructing a probabilistic surrogate model of the objective function, most commonly using a Gaussian process (GP). The GP models the objective function f(x) by providing a posterior distribution over functions based on the observed data points $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, where $y_i = f(x_i) + \varepsilon$ and $\varepsilon$ represents Gaussian noise. The GP is defined by its mean function μ(x) and covariance function k(x, x′). Given the observed data, the posterior mean μ(x|D) and variance σ²(x|D) at a new point x are given by:
$$\mu(x \mid \mathcal{D}) = k(x, X)\left(K + \sigma_{\varepsilon}^{2} I\right)^{-1} \mathbf{y} \qquad (3)$$
$$\sigma^{2}(x \mid \mathcal{D}) = k(x, x) - k(x, X)\left(K + \sigma_{\varepsilon}^{2} I\right)^{-1} k(X, x) \qquad (4)$$
where X is the set of observed input points, K is the covariance matrix computed over X with entries k(xi, xj), y is the vector of observed outputs, k(x, X) is the vector of covariances between the new point x and the observed inputs, and $\sigma_{\varepsilon}^{2}$ is the variance of the Gaussian noise. The acquisition function α(x|D) determines the next point xnext to evaluate, balancing exploration and exploitation. A common choice is the expected improvement (EI) acquisition function, which is defined as follows:
$$\alpha_{\mathrm{EI}}(x \mid \mathcal{D}) = \mathbb{E}\left[\max\left(f(x) - f(x^{+}),\, 0\right)\right] \qquad (5)$$
where f(x+) is the best observed value. The EI can be computed as:
$$\mathrm{EI}(x) = \left(\mu(x \mid \mathcal{D}) - f(x^{+}) - \zeta\right)\Phi(Z) + \sigma(x \mid \mathcal{D})\,\phi(Z) \qquad (6)$$
where Z is calculated as follows:
$$Z = \frac{\mu(x \mid \mathcal{D}) - f(x^{+}) - \zeta}{\sigma(x \mid \mathcal{D})} \qquad (7)$$
where Φ and ϕ are the cumulative distribution function and probability density function of the standard normal distribution, respectively. The term ζ is the parameter that controls the tradeoff between exploration and exploitation. BO iteratively updates the GP model and the acquisition function with each new observation, converging efficiently to the global optimum of the objective function with a minimal number of evaluations. This process is particularly advantageous in scientific applications, where it is essential to optimize resource usage while achieving high precision in optimization outcomes.
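A compact sketch of how the EI acquisition in Equations (5)-(7) can be evaluated over a GP surrogate is given below; it uses scikit-learn's GaussianProcessRegressor and scipy, and all variable names are placeholders introduced here for illustration.

```python
# Hedged sketch of the expected-improvement acquisition over a GP surrogate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(x_cand, gp: GaussianProcessRegressor, f_best, zeta=0.01):
    """EI at candidate points x_cand given the best observed objective value f_best
    (written for maximization; a minimization problem can be negated before fitting)."""
    mu, sigma = gp.predict(x_cand, return_std=True)   # posterior mean and std, Eqs (3)-(4)
    sigma = np.maximum(sigma, 1e-12)                   # guard against division by zero
    z = (mu - f_best - zeta) / sigma                   # Equation (7)
    return (mu - f_best - zeta) * norm.cdf(z) + sigma * norm.pdf(z)  # Equation (6)
```

The candidate with the largest EI then becomes the next point to evaluate in the BO loop.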

Explainable artificial intelligence

Explainable machine learning (XAI) focuses on creating models that are interpretable and transparent, which is crucial in fields where accountability, fairness, and trust are paramount, such as healthcare, finance, legal systems, and hydrology. In hydrology, XAI enhances the understanding and reliability of predictive models used for water resource management, flood forecasting, and climate change impact assessments. One of the most powerful techniques for explainability in ML is SHAP, a framework based on Shapley values from cooperative game theory, which fairly distributes the model's prediction among the features. The core of SHAP is the Shapley value formula, which ensures that the sum of feature attributions equals the model prediction, providing consistent and interpretable explanations. The Shapley value $\phi_i$ for feature i is calculated as follows:
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f\left(S \cup \{i\}\right) - f(S)\right] \qquad (8)$$
where F is the set of all features, S is a subset of F that does not include feature i, f(S) is the model's prediction using the feature subset S, and |S| is the number of features in subset S. SHAP offers both global interpretability, revealing the overall impact of each feature on the model, and local interpretability, explaining individual predictions (Lundberg & Lee 2017). Its model-agnostic nature allows it to be applied to any ML model, including complex models such as DL. Additionally, SHAP provides various visualization tools, such as summary plots, dependence plots, and force plots, to intuitively illustrate feature impacts and interactions. In hydrology, XAI can be used to determine the impact of various factors (e.g., precipitation, evaporation, and water level) on hydrological phenomena such as streamflow and flood risks. For example, SHAP can be used to elucidate why a model predicted a high risk of flooding in a specific area by highlighting the contributing factors, such as recent rainfall patterns, soil moisture levels, and upstream flow. This transparency fosters trust in the model and enables water resources managers and policymakers to make informed decisions. Thus, SHAP is an indispensable tool in explainable machine learning, ensuring transparency and reliability in artificial intelligence (AI) systems, and its application in hydrology significantly enhances the interpretability and trustworthiness of predictive models in managing and mitigating water-related risks.
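For tree ensembles such as those used here, SHAP values can be computed efficiently with the TreeExplainer of the shap package; the sketch below is illustrative, and `fitted_model` and `X_test` are placeholder names rather than objects from this study.

```python
# Hedged sketch of SHAP attribution for a fitted tree-based regressor.
import shap

explainer = shap.TreeExplainer(fitted_model)      # exact Shapley values for tree ensembles
shap_values = explainer.shap_values(X_test)       # one attribution per feature per sample

shap.summary_plot(shap_values, X_test)                    # beeswarm view (cf. Figure 11(b))
shap.summary_plot(shap_values, X_test, plot_type="bar")   # mean |SHAP| ranking (cf. Figure 11(a))
```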

Dimensionless and dimensional errors

Dimensionless and dimensional errors, including Kling–Gupta efficiency (KGE), correlation coefficient (r), Nash–Sutcliffe efficiency (NSE), RMSE, mean absolute error (MAE), and mean error (ME), are employed to quantitatively assess models' performance (Nash & Sutcliffe 1970; Knoben et al. 2019). These dimensionless and dimensional errors are calculated as follows:
$$\mathrm{KGE} = 1 - \sqrt{\left(r - 1\right)^{2} + \left(\frac{\sigma_{sim}}{\sigma_{obs}} - 1\right)^{2} + \left(\frac{Q_{sim,m}}{Q_{obs,m}} - 1\right)^{2}} \qquad (9)$$
$$r = \frac{\sum_{i=1}^{N}\left(Q_{obs,i} - Q_{obs,m}\right)\left(Q_{sim,i} - Q_{sim,m}\right)}{\sqrt{\sum_{i=1}^{N}\left(Q_{obs,i} - Q_{obs,m}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(Q_{sim,i} - Q_{sim,m}\right)^{2}}} \qquad (10)$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(Q_{obs,i} - Q_{sim,i}\right)^{2}}{\sum_{i=1}^{N}\left(Q_{obs,i} - Q_{obs,m}\right)^{2}} \qquad (11)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q_{sim,i} - Q_{obs,i}\right)^{2}} \qquad (12)$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Q_{sim,i} - Q_{obs,i}\right| \qquad (13)$$
$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(Q_{sim,i} - Q_{obs,i}\right) \qquad (14)$$
where Qobs and Qsim represent the measured and computed streamflows, respectively; i is the time step; N denotes the length of the time series data; Qobs,m and Qsim,m stand for the mean measured and computed streamflows, respectively; and σobs and σsim are, respectively, the standard deviations of the measured and computed streamflows.

Note that the dimensionless and dimensional errors mentioned above provide a quantitative evaluation of the models' ability to simulate measured streamflow. For example, r and NSE assess the ratio of residual variance (or noise) to the variance in observations, offering valuable insights into the comparison between observed and predicted values. KGE is a reliable goodness-of-fit metric for comparing estimations to measurements. Meanwhile, RMSE reflects the error magnitude in the same units as the data, making it useful for analyzing results. The optimal value for KGE, r, and NSE is 1, while RMSE and MAE are optimal when close to zero.
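The six metrics in Equations (9)-(14) can be computed directly from the observed and simulated series; the following is a minimal sketch assuming 1-D numpy arrays of daily streamflow.

```python
# Sketch of the six evaluation metrics used in this study (Equations (9)-(14)).
import numpy as np

def evaluate(q_obs, q_sim):
    r = np.corrcoef(q_obs, q_sim)[0, 1]
    kge = 1.0 - np.sqrt((r - 1.0) ** 2
                        + (q_sim.std() / q_obs.std() - 1.0) ** 2
                        + (q_sim.mean() / q_obs.mean() - 1.0) ** 2)
    nse = 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)
    rmse = np.sqrt(np.mean((q_sim - q_obs) ** 2))
    mae = np.mean(np.abs(q_sim - q_obs))
    me = np.mean(q_sim - q_obs)
    return {"KGE": kge, "r": r, "NSE": nse, "RMSE": rmse, "MAE": mae, "ME": me}
```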

The Cau River Basin

The river of interest is the Cau River, which extends from its source in the Tam Tao mountains (at an elevation of 1,326 m) to the Thac Rieng location in Bac Kan province (see Figure 2). The drainage area of this reach is approximately 712 km2, part of a total river basin of 6,030 km2; the latter extends from the river's source to the Pha Lai confluence in the Thai Binh River system. In the upper region, the river features a narrow, steep riverbed with a slope of approximately 10%. In the middle region, which extends from Cho Chu to Thac Huong waterfall, the river has an observed slope of about 0.5% and a considerably wider channel. In the downstream area, from Thac Huong to the Pha Lai confluence, the river width ranges from 70 to 150 m, with a slope of 0.1%. Streamflow in the Cau River varies significantly both throughout the year and between the flood and dry seasons. The flood season normally starts in June and ends between September and October, depending on the location within the basin. During this period, the streamflow volume accounts for about 80–85% of the total annual volume. Maximum streamflow values occur in June, July, and August, and these values account for about 50–60% of the annual streamflow. The dry season typically lasts from November to May. Low streamflow values often occur from January to March, with the lowest value, around 1.6–2.5% of the annual streamflow, observed in February.
Figure 2. Map of the river of interest and meteo-hydrological locations.

Collection of meteo-hydrological data

Meteo-hydrological data were collected for the period from 01/01/1960 to 31/12/2022 (Table 1). These meteo-hydrological data include records of evaporation and rainfall at Bac Kan as well as of rainfall, water level, and streamflow at Thac Rieng station, on a daily timescale. In this study, the streamflow data at Thac Rieng serve as the target output, while the remaining meteo-hydrological data are utilized as input for the different DT-based models (DT, RF, LightGBM, and GBM). The hyper-parameters of each model are determined during the training and testing steps using the BO algorithm. Figure 3 illustrates the daily records of the collected meteo-hydrological data, revealing that rainfall at Bac Kan ranges from 0 to 305 mm, while evaporation varies between 0 and 9.2 mm. A maximum streamflow of 517 m3/s is observed at Thac Rieng, while the mean, standard deviation, skewness, and kurtosis of the daily flow during the collected period (i.e., 1960–1981 and 2018–2022) are 18.43 m3/s, 24.04 m3/s, 5.55, and 61.59, respectively.
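The summary statistics reported above (and in Table 1) can be reproduced from the daily series; a minimal sketch, assuming `q` is a pandas Series of daily streamflow at Thac Rieng, is:

```python
# Sketch of the summary statistics of the daily streamflow record.
import pandas as pd

def summarize(q: pd.Series) -> dict:
    return {
        "mean": q.mean(),
        "std": q.std(),
        "skewness": q.skew(),
        "kurtosis": q.kurt(),   # pandas returns excess (Fisher) kurtosis by default
        "min": q.min(),
        "max": q.max(),
    }
```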
Table 1

Meteo-hydrological stations and various statistical properties of collected data

Name | Variable | Data collection period | Mean value | Range | Standard deviation | Coefficient of variation | Correlation coefficient(a)
Bac Kan | Evaporation, Z (mm) | 1960–2022 | 2.163 | 0–9.2 | −0.125 | 0.489 | −0.149
Bac Kan | Rainfall, X (mm) | 1960–2022 | 4.111 | 0–304.9 | 0.356 | 2.923 | 0.609
Thac Rieng | Rainfall, X (mm) | 1960–2022 | 3.691 | 0–201.6 | 0.325 | 2.973 | 0.564
Thac Rieng | Water level, H (m) | 1960–2022 | 94.60 | 93.83–100 | 0.390 | 0.004 | 0.820
Thac Rieng | Streamflow, Q (m3/s) | 1960–1981, 2018–2022 | 18.4 | 2.22–517 | 24.04 | 1.305 | 1.0

(a) Correlation coefficient is calculated using the available data in the collected period.

Figure 3. Time series of: (a) evaporation at Bac Kan, (b) rainfall at Bac Kan, (c) rainfall at Thac Rieng, (d) water level at Thac Rieng, and (e) streamflow at Thac Rieng during the data collection period.

Model setup

Multiple DT-based models, namely DT, RF, GBM, and LightGBM, were employed to estimate streamflow at the location of interest in the studied river. In order to identify the hyper-parameters of these models, the BO algorithm presented in Section 2 was employed with a k-fold cross-validation strategy. BO enhances the model's performance by systematically searching the hyper-parameter space, while k-fold cross-validation ensures robust and reliable model validation by splitting the dataset into k = 5 subsets and training the model k times, each time using a different subset as the validation set and the remaining subsets as the training set. Note that a 5-fold cross-validation strategy was implemented to avoid potential over-training issues and ensure robust model performance. In addition, a maximum of 30 BO iterations was set, as in many previous studies (e.g., Zhang et al. 2023; Tran et al. 2024).
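One possible way to implement this search is with scikit-optimize's GP-based `gp_minimize`, using the cross-validated RMSE as the objective; the sketch below is illustrative only (the search space, the GBM example, and the names `X_train` and `y_train` are assumptions, not the study's exact setup).

```python
# Hedged sketch of Bayesian hyper-parameter optimization with 5-fold CV and an RMSE objective.
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

space = [Integer(2, 30, name="max_depth"),
         Integer(50, 300, name="n_estimators"),
         Real(0.01, 0.3, name="learning_rate")]

@use_named_args(space)
def objective(**params):
    model = GradientBoostingRegressor(random_state=0, **params)
    scores = cross_val_score(model, X_train, y_train, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()   # gp_minimize minimizes the mean CV RMSE

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(result.x, result.fun)  # best hyper-parameter set and its RMSE
```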

In each investigated model, two groups of input data were tested. The first input data group includes data from individual stations: (i) rainfall at Bac Kan, (ii) rainfall at Thac Rieng, and (iii) water level at Thac Rieng. The second input data group consists of combined meteo-hydrological data from multiple stations: (i) rainfall at Thac Rieng and Bac Kan, (ii) rainfall at Thac Rieng and Bac Kan together with evaporation at Bac Kan, and (iii) evaporation and rainfall at Bac Kan as well as rainfall and water level at Thac Rieng. These two input groups resulted in six input data scenarios, which were tested using the multiple tree-based models, as summarized in Table 2. The abbreviations (XBK,T), (XTR,T), (HTR,T), (XBT,T), (X2Z,T), and (XZH,T) refer, respectively, to the following input data: rainfall at Bac Kan; rainfall at Thac Rieng; water level at Thac Rieng; rainfall at both Bac Kan and Thac Rieng; rainfall at Thac Rieng and Bac Kan together with evaporation at Bac Kan; and rainfall at Thac Rieng and Bac Kan combined with evaporation at Bac Kan and water level at Thac Rieng. The subscript 'T' denotes the training step, while 'V' denotes the testing step.
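A sketch of how these scenarios can be assembled from the daily records is given below; the column names, the lag depth of up to two days (suggested by the time-lagged features discussed in the SHAP analysis), and the DataFrame `df` are assumptions introduced for illustration.

```python
# Hedged sketch of building the six input scenarios, with optional time-lagged predictors.
import pandas as pd

def add_lags(df: pd.DataFrame, cols, max_lag=2) -> pd.DataFrame:
    out = df.copy()
    for col in cols:
        for lag in range(1, max_lag + 1):
            out[f"{col}_t-{lag}"] = out[col].shift(lag)   # lagged copies of each predictor
    return out

# df columns (assumed): X_BK (rainfall Bac Kan), X_TR (rainfall Thac Rieng),
# H_TR (water level Thac Rieng), Z_BK (evaporation Bac Kan), Q_TR (target streamflow)
scenarios = {
    "XBK": ["X_BK"],
    "XTR": ["X_TR"],
    "HTR": ["H_TR"],
    "XBT": ["X_BK", "X_TR"],
    "X2Z": ["X_BK", "X_TR", "Z_BK"],
    "XZH": ["X_BK", "X_TR", "Z_BK", "H_TR"],
}

lagged = add_lags(df, ["X_BK", "X_TR", "Z_BK", "H_TR"]).dropna()
X_xzh = lagged[[c for c in lagged.columns
                if any(c.startswith(v) for v in scenarios["XZH"])]]
y = lagged["Q_TR"]
```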

Table 2

Different scenarios of input data when using multiple DT-based models

Abbreviation | Rainfall at Bac Kan | Rainfall at Thac Rieng | Water level at Thac Rieng | Evaporation at Bac Kan | Step
RF
RF(XBK,T) | ✓ | | | | Training
RF(XTR,T) | | ✓ | | | Training
RF(HTR,T) | | | ✓ | | Training
RF(XBT,T) | ✓ | ✓ | | | Training
RF(X2Z,T) | ✓ | ✓ | | ✓ | Training
RF(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
RF(XBK,V) | ✓ | | | | Testing
RF(XTR,V) | | ✓ | | | Testing
RF(HTR,V) | | | ✓ | | Testing
RF(XBT,V) | ✓ | ✓ | | | Testing
RF(X2Z,V) | ✓ | ✓ | | ✓ | Testing
RF(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
DT
DT(XBK,T) | ✓ | | | | Training
DT(XTR,T) | | ✓ | | | Training
DT(HTR,T) | | | ✓ | | Training
DT(XBT,T) | ✓ | ✓ | | | Training
DT(X2Z,T) | ✓ | ✓ | | ✓ | Training
DT(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
DT(XBK,V) | ✓ | | | | Testing
DT(XTR,V) | | ✓ | | | Testing
DT(HTR,V) | | | ✓ | | Testing
DT(XBT,V) | ✓ | ✓ | | | Testing
DT(X2Z,V) | ✓ | ✓ | | ✓ | Testing
DT(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
LightGBM
LG(XBK,T) | ✓ | | | | Training
LG(XTR,T) | | ✓ | | | Training
LG(HTR,T) | | | ✓ | | Training
LG(XBT,T) | ✓ | ✓ | | | Training
LG(X2Z,T) | ✓ | ✓ | | ✓ | Training
LG(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
LG(XBK,V) | ✓ | | | | Testing
LG(XTR,V) | | ✓ | | | Testing
LG(HTR,V) | | | ✓ | | Testing
LG(XBT,V) | ✓ | ✓ | | | Testing
LG(X2Z,V) | ✓ | ✓ | | ✓ | Testing
LG(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing
GBM
GBM(XBK,T) | ✓ | | | | Training
GBM(XTR,T) | | ✓ | | | Training
GBM(HTR,T) | | | ✓ | | Training
GBM(XBT,T) | ✓ | ✓ | | | Training
GBM(X2Z,T) | ✓ | ✓ | | ✓ | Training
GBM(XZH,T) | ✓ | ✓ | ✓ | ✓ | Training
GBM(XBK,V) | ✓ | | | | Testing
GBM(XTR,V) | | ✓ | | | Testing
GBM(HTR,V) | | | ✓ | | Testing
GBM(XBT,V) | ✓ | ✓ | | | Testing
GBM(X2Z,V) | ✓ | ✓ | | ✓ | Testing
GBM(XZH,V) | ✓ | ✓ | ✓ | ✓ | Testing

The training step uses the input and target output in the period from 01/01/1960 to 06/03/1975, covering approximately 70% of the data collection. The testing step utilizes meteo-hydrological data from 07/03/1975 to 31/12/1981. This comprehensive approach ensures that the models are properly trained, leveraging BO to fine-tune hyper-parameters effectively. After that, six dimensionless and dimensional errors are calculated for quantitative comparison between computed and measured streamflows. These errors provide a thorough evaluation of the models' performance, ensuring that the computed streamflows are both accurate and reliable. Detailed results and error values for the four investigated models are presented in the next section.

Input selection and optimization of hyper-parameters

As mentioned previously, each DT-based model (RF, DT, LightGBM, and GBM) was evaluated using six different input scenarios. The BO algorithm was also employed for identifying the optimal hyper-parameter values. Table 3 summarizes the detailed errors (KGE, r, NSE, RMSE, MAE, and ME) for the six different input scenarios in both the training and testing steps of each model, while the radar plots for both dimensionless and dimensional errors are depicted in Figure 4 for the training step and Figure 5 for the testing step.
Table 3

Statistical errors when using multiple DT-based models

Abbreviation | KGE | r | NSE | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%)
RF(XBK,T) 0.762 0.885 0.774 11.55 2.23 7.02 1.36 −0.20 −0.04 
RF(XTR,T) 0.699 0.853 0.714 12.99 2.51 7.59 1.47 −0.18 −0.03 
RF(HTR,T) 0.826 0.919 0.840 9.72 1.88 2.68 0.52 0.02 0.00 
RF(XBT,T) 0.786 0.904 0.808 10.65 2.06 6.49 1.25 −0.18 −0.03 
RF(X2Z,T) 0.826 0.940 0.874 8.63 1.67 4.95 0.96 −0.31 −0.06 
RF(XZH,T) 0.912 0.985 0.964 4.59 0.89 1.36 0.26 − 0.028 − 0.005 
RF(XBK,V) 0.692 0.767 0.582 15.93 5.77 9.69 3.51 1.41 0.51 
RF(XTR,V) 0.603 0.699 0.483 17.73 6.42 10.37 3.76 1.43 0.52 
RF(HTR,V) 0.858 0.971 0.931 6.49 2.35 2.28 0.83 − 0.02 − 0.01 
RF(XBT,V) 0.726 0.782 0.604 15.50 5.62 9.30 3.37 1.36 0.49 
RF(X2Z,V) 0.760 0.799 0.632 14.95 5.42 8.80 3.19 0.70 0.25 
RF(XZH,V) 0.894 0.971 0.937 6.17 2.24 2.25 0.81 − 0.048 − 0.02 
DT(XBK,T) 0.891 0.922 0.850 9.40 1.82 6.16 1.19 −0.15 −0.03 
DT(XTR,T) 0.848 0.891 0.795 11.01 2.13 6.88 1.33 −0.13 −0.03 
DT(HTR,T) 0.971 0.979 0.958 4.97 0.96 1.76 0.34 0.03 0.01 
DT(XBT,T) 0.912 0.938 0.879 8.45 1.63 5.62 1.09 −0.11 −0.02 
DT(X2Z,T) 0.914 0.939 0.882 8.33 1.61 5.48 1.06 −0.11 −0.02 
DT(XZH,T) 0.960 0.996 0.982 0.49 0.09 0.22 0.04 0.005 0.001 
DT(XBK,V) 0.647 0.661 0.356 19.78 7.17 10.64 3.86 1.47 0.53 
DT(XTR,V) 0.602 0.606 0.200 22.05 7.99 11.43 4.14 1.09 0.39 
DT(HTR,V) 0.872 0.902 0.814 10.63 3.85 2.52 0.91 0.20 0.07 
DT(XBT,V) 0.675 0.687 0.319 20.34 7.37 10.57 3.83 0.86 0.31 
DT(X2Z,V) 0.568 0.634 0.049 24.03 8.71 10.44 3.78 0.68 0.25 
DT(XZH,V) 0.805 0.916 0.838 9.92 3.59 2.60 0.94 0.088 0.03 
LG(XBK,T) 0.722 0.841 0.702 13.25 2.56 7.63 1.47 −0.21 −0.04 
LG(XTR,T) 0.625 0.780 0.602 15.32 2.96 8.36 1.62 −0.20 −0.04 
LG(HTR,T) 0.778 0.862 0.741 12.35 2.39 3.21 0.62 0.03 0.01 
LG(XBT,T) 0.744 0.863 0.738 12.43 2.40 7.11 1.38 −0.17 −0.03 
LG(X2Z,T) 0.785 0.897 0.797 10.95 2.12 6.18 1.19 −0.17 −0.03 
LG(XZH,T) 0.924 0.972 0.942 5.84 1.13 2.05 0.40 0.006 0.001 
LG(XBK,V) 0.682 0.744 0.544 16.65 6.03 9.97 3.61 1.29 0.47 
LG(XTR,V) 0.595 0.669 0.434 18.55 6.72 10.77 3.90 1.18 0.43 
LG(HTR,V) 0.883 0.980 0.951 5.48 1.98 2.37 0.86 − 0.09 − 0.03 
LG(XBT,V) 0.731 0.759 0.558 16.40 5.94 9.70 3.52 1.03 0.37 
LG(X2Z,V) 0.751 0.775 0.583 15.91 5.77 9.12 3.30 0.87 0.32 
LG(XZH,V) 0.906 0.957 0.913 7.28 2.64 2.56 0.93 − 0.563 − 0.20 
GBM(XBK,T) 0.800 0.921 0.838 9.78 1.89 7.15 1.38 −1.61 −0.31 
GBM(XTR,T) 0.678 0.798 0.635 14.68 2.84 8.41 1.63 −0.22 −0.04 
GBM(HTR,T) 0.914 0.976 0.949 5.47 1.06 2.09 0.40 − 0.55 − 0.11 
GBM(XBT,T) 0.760 0.858 0.734 12.53 2.42 7.56 1.46 −0.24 −0.05 
GBM(X2Z,T) 0.901 0.962 0.923 6.72 1.30 4.71 0.91 −0.26 −0.05 
GBM(XZH,T) 0.913 0.963 0.928 6.53 1.26 2.84 0.55 − 0.062 − 0.012 
GBM(XBK,V) 0.701 0.734 0.523 17.03 6.17 10.33 3.74 −0.26 −0.09 
GBM(XTR,V) 0.583 0.678 0.452 18.25 6.61 10.66 3.86 1.43 0.52 
GBM(HTR,V) 0.888 0.963 0.923 6.84 2.48 2.74 0.99 − 0.61 − 0.22 
GBM(XBT,V) 0.708 0.765 0.577 16.03 5.81 9.61 3.48 1.39 0.50 
GBM(X2Z,V) 0.780 0.800 0.626 15.06 5.46 8.73 3.16 0.57 0.21 
GBM(XZH,V) 0.913 0.963 0.925 6.74 2.44 2.83 1.02 0.049 0.02 

Bold values represent good model performance.

Figure 4. Radar plot for: (a) dimensionless and (b) dimensional errors in the training step.

Figure 5. Radar plot for: (a) dimensionless and (b) dimensional errors in the testing step.

Figure 6. Comparison between estimated and observed streamflow when using (a) RF for the training step, (b) RF for the testing step, (c) DT for the training step, (d) DT for the testing step, (e) LG for the training step, (f) LG for the testing step, (g) GBM for the training step, and (h) GBM for the testing step.

When rainfall at Bac Kan station was used as input data for the RF model (e.g., RF(XBK,T) and RF(XBK,V)), the model demonstrated moderate performance during both the training and testing phases. The dimensionless errors (KGE, r, and NSE) range from 0.58 to 0.89, while the dimensional errors (RMSE, MAE, and ME) vary between −0.20 and 15.9 m3/s, corresponding to −0.04 to 5.77% of the observed streamflow magnitude at Thac Rieng. Slightly smaller dimensionless errors and slightly larger dimensional errors are observed when rainfall data at Thac Rieng are used as input in the RF model. A good agreement between observed and estimated streamflows is achieved when the water level is used as input data, with dimensionless errors ranging from 0.83 to 0.97 and dimensional errors varying from 0.02 to 9.72 m3/s. These results align with the correlation coefficients between streamflow at Thac Rieng and the meteo-hydrological data at individual stations, as reported in Table 1. The model reproduced the observed streamflow very accurately when additional meteo-hydrological data were used, such as (i) rainfall at both Bac Kan and Thac Rieng, (ii) rainfall at both stations combined with evaporation at Bac Kan, or (iii) rainfall and evaporation at Bac Kan combined with water level at Thac Rieng. The dimensionless errors range from 0.60 to 0.99, while the dimensional errors vary between −0.31 and 15.5 m3/s. These findings suggest that the RF model using the second input data group tends to outperform models using only meteo-hydrological data at individual stations. In other words, a good agreement between observed and estimated streamflows is also obtained when more comprehensive meteo-hydrological data are used. In addition, among the six scenarios tested, the combination of (i) water level and rainfall at Thac Rieng and (ii) rainfall and evaporation at Bac Kan results in the most accurate streamflow estimates compared to the other scenarios.

The DT model provides a good estimate of the observed streamflow during the training phase when using the first input data group. The dimensionless errors (KGE, r, and NSE) range from 0.80 to 0.98, indicating that the model accurately represents both the values and the variation trends of streamflow during the training period. The dimensional errors (RMSE, MAE, and ME) are less than 11 m3/s, which is approximately only 2.13% of the observed streamflow magnitude at the station. In the testing phase, the dimensionless errors range from 0.20 to 0.90, while the dimensional errors vary from 0.2 to 22.05 m3/s. When the second input data group is applied, high dimensionless accuracy and low dimensional errors are observed in the training phase, with dimensionless errors ranging from 0.88 to 0.99 and dimensional errors from 0.005 to 8.45 m3/s. In the testing phase, the dimensionless errors range from 0.05 to 0.92, and the dimensional errors vary between 0.088 and 24.03 m3/s. While the DT model performs well in the training phase, its performance is slightly weaker during the testing phase. Among the six input scenarios investigated, the DT model simulates the observed streamflow effectively when using either the water level at Thac Rieng or a combination of (i) water level and rainfall at Thac Rieng and (ii) evaporation and rainfall at Bac Kan.

The LightGBM model represents the measured streamflow acceptably during the training phase when using either the first or the second input data group. The dimensionless errors range between 0.60 and 0.97, while the dimensional errors vary from −0.21 to 15.32 m3/s (see Table 3). In the testing phase, similar results are obtained regardless of whether the first or second input data group is applied. Specifically, dimensionless errors range from 0.43 to 0.96, while dimensional errors range from −0.56 to 18.55 m3/s (see Table 3). Among the different input scenarios investigated, the LightGBM model showed the best performance in both the training and testing phases when the meteorological and hydrological data from both Bac Kan and Thac Rieng were used as input (see Figures 4 and 5). In this case, dimensionless errors exceeded 0.92, and dimensional errors were less than 5.84 m3/s. These results are consistent with those obtained from the RF and DT models, showing that the more input data provided, the more accurate the streamflow estimates.

In the training phase of the GBM model, there is a very good agreement between observed and estimated streamflow when using both the first and second input data groups, especially in cases using either the water level at Thac Rieng or a combination of (i) water level and rainfall at Thac Rieng and (ii) rainfall and evaporation at Bac Kan. When using the first input data group, dimensionless errors range from 0.64 to 0.98, while dimensional errors are less than 14.68 m3/s, corresponding to 2.84% of the observed streamflow magnitude. With the second input data group, dimensionless errors range from 0.73 to 0.96, and dimensional errors vary from −0.26 to 12.53 m3/s (see Table 3). In the testing phase, similar values for both dimensionless and dimensional errors are obtained when using either the first or second input data group (see Table 3). Furthermore, among the six input scenarios, the GBM model showed the best performance in both the training and testing phases when using the combination of rainfall and evaporation at Bac Kan together with rainfall and the water level at Thac Rieng (see Figures 4 and 5).

In general, across the RF, DT, LightGBM, and GBM models investigated, the best agreement between observed and estimated streamflow is achieved when using all meteo-hydrological data at both Thac Rieng and Bac Kan (Figure 6). Using the BO algorithm, the optimal hyper-parameter value for the best-performing DT model was identified as max_depth = 22. The optimal hyper-parameter values for the best-performing RF model were n_estimators = 77, max_features = 0.27, max_depth = 26, and min_sample_split = 3. In the LightGBM model, the optimal hyper-parameters were n_estimators = 179, max_depth = 13, and min_sample_split = 2. The best results of the GBM model correspond to the hyper-parameter values n_estimators = 89, lr = 0.09, max_depth = 3, and min_child_weight = 1. These models also perform well, with high accuracy and low dimensional and dimensionless errors, when the water level at Thac Rieng is used as input. Reliable streamflow estimates, with acceptable errors, are obtained when rainfall data are used alone or in combination with evaporation data. The results demonstrate that the four DT-based models (RF, DT, LightGBM, and GBM) using combined input data tend to outperform models using data from individual stations.

Comparison of best results obtained in each model

Figure 7 shows the comparison between observed and estimated streamflow for the best-performing configuration of each model (RF, DT, LightGBM, and GBM). It is clearly observed that in the training step (corresponding to the period from 01/01/1960 to 6/3/1975), all models reproduce the observed streamflow well during both dry and wet seasons across different years when using rainfall, evaporation, and water level as input data (see Table 3). Among the four models investigated, the GBM model demonstrates the best performance. When using this model, the dimensionless statistical errors (KGE, r, and NSE) of streamflow range from 0.96 to 0.99, indicating that the model accurately represents both the observations and their variability. Furthermore, the dimensional statistical errors (RMSE, MAE, and ME) vary between 0.005 and 0.5 m3/s, and these errors are less than 0.1% of the observed magnitude of streamflow at the station.
Figure 7. Multiple tree-based models in simulating streamflow including (a) scatters of observed data versus simulated results and (b) time series for the training step.
During the testing period (from 7/3/1975 to 31/12/1981), all models accurately reproduced the observed streamflow, as shown in Figure 8. The values of the dimensionless statistical errors range from 0.81 to 0.97, while the dimensional errors of streamflow range from −0.56 to 9.92 m3/s, which is approximately 0.02 to 3.59% of the observed streamflow magnitude (see Table 3).
Figure 8. Multiple tree-based models in simulating streamflow including (a) scatters of observed data versus simulated results and (b) time series for the testing step.

Extreme event capturing

Figure 9 shows the performance of the four DT-based models in capturing extremely high and low streamflow over the collected period (from 01/01/1960 to 31/12/1981). The results show that each model is able to compute streamflow under different conditions. For extremely high streamflow, the RF, LightGBM, and GBM models tend to underestimate the observations, while the DT model provides the best fit to the observed streamflow. In terms of extremely low streamflow, the RF, LightGBM, and GBM models show a slight overestimation of the observed values. The DT model also performs well in reproducing observed low streamflow. In general, among the different DT-based models investigated, the DT model provides the most accurate representation of the observed streamflow over the long period, for both extremely high and low streamflow conditions.
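The exceedance-probability comparison underlying Figure 9 can be sketched as follows; this is an illustrative flow-duration-curve computation (Weibull plotting position, shown here on log-log axes), with `q_obs` and `q_sim` assumed to be numpy arrays of daily streamflow for the same period.

```python
# Hedged sketch of an exceedance-probability (flow duration) comparison.
import numpy as np
import matplotlib.pyplot as plt

def exceedance(q):
    q_sorted = np.sort(q)[::-1]                                    # descending flows
    prob = np.arange(1, len(q_sorted) + 1) / (len(q_sorted) + 1)   # Weibull plotting position
    return prob, q_sorted

p_obs, q_o = exceedance(q_obs)
p_sim, q_s = exceedance(q_sim)
plt.loglog(p_obs, q_o, "k.", label="Observed")
plt.loglog(p_sim, q_s, "-", label="Simulated")
plt.xlabel("Exceedance probability")
plt.ylabel("Streamflow (m$^3$/s)")
plt.legend()
plt.show()
```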
Figure 9. Results of multiple tree-based models in estimating extreme high and low streamflow: (a) a log scale for exceedance probability and (b) a log scale for streamflow.

Application for streamflow reconstruction

The four DT-based models (i.e., RF, DT, LightGBM, and GBM) were applied to estimate streamflow during the missing data period (from 01/01/1982 to 31/12/2017). Figure 10 illustrates the time series of streamflow for the entire collected period from 1/1/1960 to 31/12/2022, including the aforementioned missing data period. It is clear that the simulated streamflow shows a temporal variation that aligns with changes in both rainfall and water level. High streamflow is observed during periods of heavy rainfall and high water levels across different years of the considered period. In addition, high streamflow also occurs in the wet seasons, while low streamflow is obtained in the dry seasons. Table 4 summarizes various statistical properties (e.g., range, mean, standard deviation, and coefficient of variation) of the estimated streamflow when the different DT-based models were applied. All models show statistical properties similar to those obtained from the observed streamflow.
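Reconstruction itself reduces to applying the trained model to the predictors available over the ungauged years; a minimal sketch, in which `fitted_model` and `X_missing` (the 1982-2017 feature matrix, indexed by date) are placeholders, is:

```python
# Hedged sketch of streamflow reconstruction over the missing data period.
import pandas as pd

q_reconstructed = fitted_model.predict(X_missing)   # daily streamflow estimates, m3/s
reconstruction = pd.Series(q_reconstructed, index=X_missing.index, name="Q_est")
print(reconstruction.describe())                     # mean, std, min/max comparable to Table 4
```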
Table 4

Different statistical characteristics of observed and estimated streamflow

Model | Collected period | Mean value | Range | Standard deviation | Coefficient of variation
Observation | 1960–1981, 2018–2022 | 18.4 | 2.22–517 | 24.04 | 1.305
RF | 1960–2022 | 20.0 | 2.32–370.0 | 22.60 | 1.128
DT | 1960–2022 | 19.9 | 2.22–517.0 | 25.25 | 1.270
LG | 1960–2022 | 20.2 | 0.03–393.2 | 24.15 | 1.196
GBM | 1960–2022 | 20.0 | 3.85–495.2 | 23.73 | 1.188
Figure 10. Time series of estimated streamflow in the period from 1960 to 2022.

Interpreted final model

The SHAP analysis provided in Figure 11 demonstrates the relative importance of the input features – rainfall and evaporation at Bac Kan, and rainfall and water levels at Thac Rieng – in predicting streamflow at Thac Rieng for the XGB model (i.e., the model with the highest performance among those compared). The results show that the most influential feature is the water level at Thac Rieng from two time steps prior (HTRt−2), followed by rainfall at Thac Rieng from one time step prior (XTRt−1), and the current water level at Thac Rieng (HTRt). The lagged water level at Thac Rieng (HTRt−2) is particularly influential because it captures the cumulative effects of previous rainfall and water inflows. Due to hydrological inertia, past water levels often take time to fully manifest in downstream flow, making this feature the most significant predictor. Rainfall at Thac Rieng (XTRt−1), though also highly influential, has a more immediate impact on streamflow, as it directly contributes to surface runoff, raising river levels quickly but not with the same sustained influence as water levels over time. While the current water level (HTRt) and the lagged water level from one time step ago (HTRt−1) are also important, their impacts are slightly less significant compared to HTRt−2 because they provide less information about delayed hydrological responses. Rainfall at Bac Kan (XBK) shows some influence, particularly rainfall from two time steps prior (XBKt−2), but its impact is smaller, likely due to the distance between Bac Kan and Thac Rieng, which attenuates the effects of rainfall from Bac Kan on the streamflow at Thac Rieng. Evaporation at Bac Kan (ZBK), on the other hand, has the least influence on streamflow predictions, as evaporation processes generally have a slower, more indirect effect on river systems compared to direct inputs such as rainfall.
Figure 11. SHAP analysis showing feature importance for predicting streamflow at Thac Rieng: (a) mean SHAP values indicate the water level and (b) SHAP summary plot shows how feature values impact predictions.

Figure 11(b) adds further depth to this analysis by illustrating not only the importance of these features but also how their actual values affect the predicted streamflow. The x-axis shows the SHAP values, which quantify the positive or negative impact each feature has on the model's output. The color gradient, with blue representing lower values and red representing higher values, indicates the actual magnitude of each feature. For example, HTRt−2, the most important feature, shows that higher water levels (red) lead to positive SHAP values, meaning they increase the predicted streamflow. Conversely, lower water levels (blue) result in negative SHAP values, decreasing the predicted streamflow. This makes intuitive sense from a hydrological perspective: higher water levels from two time steps earlier indicate increased storage in the river system, which translates into higher downstream flows after a time lag. A similar relationship is observed with XTRt−1, where higher rainfall values (red) push the predicted streamflow upwards, and lower rainfall values (blue) push it downwards. Rainfall at Thac Rieng has a relatively immediate effect on the streamflow, as it contributes directly to surface runoff and river levels. The rainfall at Bac Kan (XBK) also exhibits this pattern but with a narrower spread of SHAP values, suggesting that while rainfall in Bac Kan impacts streamflow at Thac Rieng, it does so with less variability. The more distant location of Bac Kan and the delayed effects of upstream rainfall reduce its influence compared to local rainfall at Thac Rieng. Evaporation at Bac Kan (ZBK) shows the smallest impact, regardless of whether evaporation values are high or low. This indicates that evaporation, while part of the overall water balance, contributes minimally to the short-term prediction of streamflow.

In this study, different tree-based ML models, namely DT, RF, LightGBM, and GBM, were investigated to estimate streamflow in river basins, and they were specifically applied to the Cau River at the Thac Rieng location to illustrate their performance. The performance of these models was evaluated for six input scenarios to assess their robustness and predictive capabilities in streamflow estimation. In each model, hyper-parameters were optimized using the BO tool, which allowed for more efficient and effective optimization compared to traditional or random search methods.

The results showed that while all four models performed satisfactorily in estimating streamflow, there were clear differences in their accuracy and reliability. The ensemble methods, particularly RF and GBM, consistently outperformed single-tree models like the DT. Among the ensemble methods, GBM emerged as the best-performing model, offering superior predictive power across the various data scenarios. The dimensionless statistical errors (KGE, r, and NSE) of streamflow range from 0.81 to 0.99, while the dimensional errors of streamflow were smaller than 3.59% of the observed streamflow magnitude. The model's ability to handle non-linear relationships among the different components of streamflow processes, combined with its robustness to overfitting, made it particularly well-suited for streamflow estimation tasks.

The use of XAI provided valuable insights into the factors most critical for accurately estimating streamflow, identifying time-lagged water levels and rainfall as the most influential variables. This additional layer of understanding helped validate the model output and enhanced confidence in the model's practical applicability. The results also demonstrate that ensemble-based ML algorithms, especially XGBoost, are highly effective for streamflow estimation in complex hydrological systems like the Cau River basin. The integration of XAI techniques also adds significant value, making the models not only more accurate but also more interpretable and transparent, which is essential for practical applications in water resource management.

The results indicated that the water level at Thac Rieng and meteorological-hydrological data are the primary factors influencing the accuracy of estimated streamflow in the studied river basin. This finding is expected to assist both researchers and decision-makers in using hydrological and hydraulic models in the future.

The author extends sincere thanks to the editor and two anonymous reviewers for their insightful suggestions and comments.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Alawi, S. A. & Özkul, S. (2023) Evaluation of land use/land cover datasets in hydrological modelling using the SWAT model, H2Open Journal, 6 (1), 63–74. https://doi.org/10.2166/h2oj.2023.062.
Bal, M., Dandpat, A. K. & Naik, B. (2021) Hydrological modeling with respect to impact of land-use and land-cover change on the runoff dynamics in Budhabalanga River basing using ArcGIS and SWAT model, Remote Sensing Applications: Society and Environment, 23, 100527. https://doi.org/10.1016/j.rsase.2021.100527.
Beven, K. J. (2012) Rainfall-Runoff Modelling: The Primer. Cumbria, UK: John Wiley & Sons, p. 457.
Chia, M. Y., Huang, Y. F. & Koo, C. H. (2022) Resolving data-hungry nature of machine learning reference evapotranspiration estimating models using inter-model ensembles with various data management schemes, Agricultural Water Management, 261, 107343. https://doi.org/10.1016/j.agwat.2021.107343.
Cui, Z., Gao, T., Talamadupula, K. & Ji, Q. (2022) Knowledge-augmented deep learning and its applications: a survey. In: Song, Y. (ed.) IEEE Transactions on Neural Networks and Learning Systems, pp. 1–20. Piscataway, NJ, USA: IEEE. Available at: https://api.semanticscholar.org/CorpusID:254125341.
Dasgupta, R., Das, S., Banerjee, G. & Mazumdar, A. (2024) Revisit hydrological modeling in ungauged catchments comparing regionalization, satellite observations, and machine learning approaches, HydroResearch, 7, 15–31. https://doi.org/10.1016/j.hydres.2023.11.001.
Hauswirth, S. M., Bierkens, M. F. B., Beijk, V. & Wanders, N. (2021) The potential of data driven approaches for quantifying hydrological extremes, Advances in Water Resources, 155, 104017. https://doi.org/10.1016/j.advwatres.2021.104017.
Hochreiter, S. & Schmidhuber, J. (1997) Long short-term memory, Neural Computation, 9 (8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Huang, F. & Zhang, X. (2024) A new interpretable streamflow prediction approach based on SWAT-BiLSTM and SHAP, Environmental Science and Pollution Research, 31, 23896–23908. https://doi.org/10.1007/s11356-024-32725-z.
Imhoff, R. O., Buitink, J., van Verseveld, W. J. & Weerts, A. H. (2024) A fast high resolution distributed hydrological model for forecasting, climate scenarios and digital twin applications using wflow_sbm, Environmental Modelling & Software, 179, 106099. https://doi.org/10.1016/j.envsoft.2024.106099.
Ivanov, V. Y., Xu, D., Chase Dwelle, M., Sargsyan, K., Wright, D. B., Katopodes, N., Kim, J., Tran, V. N., Warnock, A., Fatichi, S., Burlando, P., Caporali, E., Restrepo, P., Sanders, B. F., Chaney, M. M., Nunes, A. M. B., Nardi, F., Vivoni, E. R., Istanbulluoglu, E., Bisht, G. & Bras, R. L. (2021) Breaking down the computational barriers to real-time urban flood forecasting, Geophysical Research Letters, 48 (20), e2021GL093585. https://doi.org/10.1029/2021GL093585.
Jiang, S., Sweet, L.-b., Blougouras, G., Brenning, A., Li, W., Reichstein, M., Denzler, J., Shangguan, W., Yu, G., Huang, F. & Zscheischler, J. (2024) How interpretable machine learning can benefit process understanding in the geosciences, Earth's Future, 12 (7), e2024EF004540. https://doi.org/10.1029/2024EF004540.
Kedam, N., Tiwari, D. K., Kumar, V., Khedher, K. M. & Salem, M. A. (2024) River stream flow prediction through advanced machine learning models for enhanced accuracy, Results in Engineering, 22, 102215. https://doi.org/10.1016/j.rineng.2024.102215.
Knoben, W. J., Freer, J. E. & Woods, R. A. (2019) Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrology and Earth System Sciences, 23 (10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019.
Liu, J., Koch, J., Stisen, S., Troldborg, L. & Schneider, R. J. M. (2024) A national-scale hybrid model for enhanced streamflow estimation – consolidating a physically based hydrological model with long short-term memory (LSTM) networks, Hydrology and Earth System Sciences, 28 (13), 2871–2893. https://doi.org/10.5194/hess-28-2871-2024.
Lundberg, S. M. & Lee, S.-I. (2017) A unified approach to interpreting model predictions, Advances in Neural Information Processing Systems. Long Beach, CA, USA: 31st Conference on Neural Information Processing Systems (NIPS 2017). https://doi.org/10.48550/arXiv.1705.07874.
Nash, J. E. & Sutcliffe, J. V. (1970) River flow forecasting through conceptual models part I – a discussion of principles, Journal of Hydrology, 10 (3), 282–290. https://doi.org/10.1016/0022-1694(70)90255-6.
Nguyen, G. V., Le, X.-H., Van, L. N., Jung, S., Yeon, M. & Lee, G. (2021) Application of random forest algorithm for merging multiple satellite precipitation products across South Korea, Remote Sensing, 13 (20), 4033. https://doi.org/10.3390/rs13204033.
Nguyen, G. V., Le, X.-H., Van, L. N., May, D. T. T., Jung, S. & Lee, G. (2023) Machine learning approaches for reconstructing gridded precipitation based on multiple source products, Journal of Hydrology: Regional Studies, 48, 101475. https://doi.org/10.1016/j.ejrh.2023.101475.
Perera, U. A. K. K., Coralage, D. T. S., Ekanayake, I. U., Alawatugoda, J. & Meddage, D. P. P. (2024) A new frontier in streamflow modeling in ungauged basins with sparse data: a modified generative adversarial network with explainable AI, Results in Engineering, 21, 101920. https://doi.org/10.1016/j.rineng.2024.101920.
Pham Van, C. & Nguyen-Van, G. (2022) Three different models to evaluate water discharge: an application to a river section at Vinh Tuy location in the Lo river basin, Vietnam, Journal of Hydro-Environment Research, 40, 38–50. https://doi.org/10.1016/j.jher.2021.12.002.
Plunge, S., Schürz, C., Čerkasova, N., Strauch, M. & Piniewski, M. (2024) SWAT+ model setup verification tool: SWATdoctR, Environmental Modelling & Software, 171, 105878. https://doi.org/10.1016/j.envsoft.2023.105878.
Quinlan, J. R. (1986) Induction of decision trees, Machine Learning, 1, 81–106. https://doi.org/10.1007/BF00116251.
Sit, M., Demiray, B. Z., Xiang, Z., Ewing, G. J., Sermet, Y. & Demir, I. (2020) A comprehensive review of deep learning applications in hydrology and water resources, Water Science and Technology, 82 (12), 2635–2670. https://doi.org/10.2166/wst.2020.369.
Sushanth, K., Mishra, A., Mukhopadhyay, P. & Singh, R. (2023) Real-time streamflow forecasting in a reservoir-regulated river basin using explainable machine learning and conceptual reservoir module, Science of The Total Environment, 861, 160680. https://doi.org/10.1016/j.scitotenv.2022.160680.
Szczepanek, R. (2022) Daily streamflow forecasting in mountainous catchment using XGBoost, LightGBM and CatBoost, Hydrology, 9, 226. https://doi.org/10.3390/hydrology9120226.
Tran, N. V., Ivanov, V. Y., Nguyen, G. T., Anh, T. N., Nguyen, P. H., Kim, D.-H. & Kim, J. (2024) A deep learning modeling framework with uncertainty quantification for inflow-outflow predictions for cascade reservoirs, Journal of Hydrology, 629, 130608. https://doi.org/10.1016/j.jhydrol.2024.130608.
Tripathy, K. P. & Mishra, A. K. (2023) Deep learning in hydrology and water resources disciplines: concepts, methods, applications, and research directions, Journal of Hydrology, 628, 130458. https://doi.org/10.1016/j.jhydrol.2023.130458.
Uhrig, R. E. (1995) Introduction to artificial neural networks. In: Lynn, M. S. (ed.) Proceedings of IECON'95 – 21st Annual Conference on IEEE Industrial Electronics, Vol. 1. Orlando, FL, USA: IEEE, pp. 33–37.
Wang, Z., Xu, N., Bao, X., Wu, J. & Cui, X. (2024) Spatio-temporal deep learning model for accurate streamflow prediction with multi-source data fusion, Environmental Modelling & Software, 178, 106091. https://doi.org/10.1016/j.envsoft.2024.106091.
Yang, S., Yang, D., Chen, J., Santisirisomboon, J., Lu, W. & Zhao, B. (2020) A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data, Journal of Hydrology, 590, 125206. https://doi.org/10.1016/j.jhydrol.2020.125206.
Zahura, F. T., Goodall, J. L., Sadler, J. M., Shen, Y., Morsy, M. M. & Behl, M. (2020) Training machine learning surrogate models from a high-fidelity physics based model: application for real-time street-scale flood prediction in an urban coastal community, Water Resources Research, 56 (10), e2019WR027038. https://doi.org/10.1029/2019WR027038.
Zhang, L., Dou, H., Zhang, K., Huang, R., Lin, X., Wu, S., Zhang, R., Zhang, C. & Zheng, S. (2023) CNN-LSTM model optimized by Bayesian optimization for predicting single-well production in water flooding reservoir, Geofluids, 5467956, 16. https://doi.org/10.1155/2023/5467956.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC 4.0), which permits copying, adaptation and redistribution for non-commercial purposes, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc/4.0/).