ABSTRACT
This study evaluates the performance of a decision tree model (XGBoost), a time series model (LSTM), and a semi-distributed hydrological model (HBV) in simulating daily runoff in a watershed with significant snowmelt contributions and intensive human activity. Daily runoff data from the Shiribeshi-Toshibetsu basin in Japan, spanning the period from 1998 to 2015, were used for model training, with data from 2016 to 2019 used for validation. Comparative analysis reveals that the XGBoost model outperforms the LSTM and HBV models, achieving a Nash–Sutcliffe Efficiency (NSE) of 0.94, compared to 0.85 and 0.84 for the LSTM and HBV models, respectively. Additionally, Partial Dependence Plots (PDPs) were used to investigate the nonlinear impacts of climatic factors on runoff, identifying precipitation, average temperature, and wind speed as the most significant drivers. Specifically, the relationship between temperature and runoff follows a non-monotonic pattern: runoff initially increases with rising temperatures, then decreases, and subsequently increases again. These findings underscore the robustness of decision tree models in handling complex hydrological data and highlight the importance of understanding the intricate interactions between climatic variables and runoff. This study provides scientific support for water resource management in similar watersheds under changing climatic conditions.
HIGHLIGHTS
XGBoost outperforms LSTM and HBV models in predicting daily runoff in a basin influenced by snowmelt and human activity.
PDPs reveal complex interactions between climate variables and runoff.
Insights into water resource management under climate change scenarios.
INTRODUCTION
Runoff is integral to regional water cycles (Hobeichi et al. 2022), and its accurate prediction is essential for effective water resource management (Cho & Kim 2022). The complex interactions among climate patterns, land cover characteristics, soil properties, and topography result in diverse and intricate responses of runoff to these driving factors (Zhang et al. 2020; Yang et al. 2021). Although current research has significantly improved the accuracy of runoff prediction, the challenge persists in basins with substantial snowmelt contributions and intense human activities, where the nonlinear behavior of runoff complicates the correlation between precipitation and runoff (Farsi & Mahjouri 2019; Sexstone et al. 2020). Therefore, it is imperative to further explore and refine models tailored to such basins to address the challenges of runoff prediction under these complex conditions.
Recent years have seen significant advances in runoff prediction, resulting in numerous research findings (Busico et al. 2020; Hu et al. 2021; Yang et al. 2024). Runoff prediction methods fall into three main categories: statistical and empirical analysis, process simulation and physical modeling, and machine learning. Statistical methods, such as regression analysis (Zhang et al. 2016) and the cumulative curve method (Kong et al. 2016), are commonly used for establishing functional relationships between influencing variables and runoff. For example, Sharifi et al. (2017) predicted daily runoff in the Amameh watershed using multiple linear regression, improving prediction accuracy through optimized input variable combinations. Although this approach is convenient for handling data and directly assessing the impact of long-term observations, it relies on extensive empirical data and may not fully reveal the deeper physical mechanisms of hydrological processes (Wang et al. 2022). Process simulation and physical modeling employ models based on physical principles to simulate and predict runoff, covering complex equations of natural processes like moisture movement, evaporation, and precipitation (Mirus et al. 2011). This allows researchers to deeply understand the physical mechanisms of hydrological processes and their drivers, assessing the long-term impact of climate and land-use changes on runoff. For example, Clark et al. (2015) highlighted that physically-based hydrological models are particularly effective in capturing the complexities of water flow, storage, and distribution in a catchment, offering valuable insights into underlying hydrological mechanisms. While this method provides insight into deeper mechanisms, it requires a large number of region-specific parameters, with model accuracy depending on precise parameter settings (Yassin et al. 2019).
Numerous studies have affirmed the applicability of machine learning methods in the hydrological domain and discussed their comparison with traditional hydrological models (Rahman et al. 2022). Machine learning has achieved results consistent with or even superior to conventional methods in runoff simulation (Xiang et al. 2020), river water temperature prediction (Rehana & Rajesh 2023), water quality forecasting (Chen et al. 2020a), and groundwater level simulation (Cai et al. 2021). For example, comprehensive comparison studies have demonstrated that Long Short-Term Memory (LSTM) networks outperform traditional Artificial Neural Networks (ANN) on a daily scale, especially when considering hydrological hysteresis effects, with Root Mean Square Error (RMSE) improving by 27% (Mao et al. 2021). Another study showed that machine learning multi-model combination (MMC) methods improved runoff simulation performance by 45% compared to the best single model and over 100% compared to the ensemble mean (Zaherpour et al. 2019). However, a principal limitation of traditional machine learning models is their ‘black box’ nature, characterized by a lack of transparency and interpretability in their prediction and decision-making processes (Chai et al. 2024). Interpretability ensures that model decisions are transparent and comprehensible, enhancing trust while revealing biases and improving model fairness, safety, and reliability (Başağaoğlu et al. 2022). The Partial Dependence Plot (PDP) method helps mitigate this issue (Li et al. 2020). A PDP offers a visual means to enhance model interpretability by showing the average impact of a feature on the predicted outcome (Shortridge et al. 2016).
XGBoost is a tree-based ensemble machine learning algorithm widely recognized for its efficiency and ability to handle complex nonlinear relationships (Hao & Bai 2023). LSTM, a type of recurrent neural network (RNN), excels in capturing long-term dependencies in time series data and is extensively used in time series prediction tasks (Xiang et al. 2020). The HBV model, a semi-distributed conceptual rainfall–runoff model, is extensively used in hydrological simulations due to its effectiveness in modeling rainfall–runoff processes and its consideration of spatial heterogeneity (Bergstrom 1973). While these models have been widely applied in runoff prediction, their performance in basins characterized by significant snowmelt and intense human activities, such as Shiribeshi-Toshibetsu, Japan, has not been systematically evaluated. This study addresses this gap by selecting these three distinct types of models – XGBoost, LSTM, and HBV – and validating their applicability in the Shiribeshi-Toshibetsu catchment. The main contents include: (1) comparing and analysing the performance of a decision tree model (XGBoost), a recurrent neural network model (LSTM), and a semi-distributed model (HBV) for predicting daily runoff in this basin, identifying their respective advantages and applicable conditions; and (2) using PDPs to provide insights into the nonlinear patterns of interaction between climatic variables and runoff. Together, these analyses emphasise the need to understand these complex dynamics in hydrological forecasting and water resource management practices.
MATERIALS AND METHODS
Study area and data
The runoff data for the Shiribeshi-Toshibetsu basin used in this study were sourced from the public database of the Ministry of Land, Infrastructure, Transport and Tourism (MLIT) of Japan (http://www1.river.go.jp/). This database offers long-term observational records of river runoff from 1998 to 2022, serving as a critical data source for evaluating watershed hydrological models and water resource analysis. Meteorological data, including precipitation, temperature, and wind speed (Table 1), were obtained from the Japan Meteorological Agency (JMA) (https://www.data.jma.go.jp/gmd/risk/obsdl/index.php). The official JMA database covers an extensive time range consistent with the MLIT runoff data, and, as the authoritative institution for meteorological monitoring, the JMA ensures the accuracy and credibility of the data used in this study. The combined data from MLIT and JMA form the foundation of the hydrological modeling and prediction analysis conducted for the Shiribeshi-Toshibetsu basin. The results of the variable independence tests are shown in Figure S1 of the Supplementary Material.
Variable name | Unit | Description | Range | Standard deviation |
---|---|---|---|---|
Runoff | m³/s | Daily runoff | 3.53–495.18 | 24.83 |
Precipitation | mm | Daily precipitation | 0–204 | 8.76 |
Ave-T | °C | Average daily temperature | −11.50 to 27.10 | 9.13 |
Wind | m/s | Wind speed | −9.60 to 34.80 | 1.01 |
Daylight | h | Daily light hours | −21.50 to 23.60 | 3.92 |
Radiation | MJ/m² | Daily total solar radiation | 0–41.98 | 9.62 |
Extreme gradient boosting (XGBoost)
Here, γ and λ are regularization parameters; w represents the score for each leaf; T denotes the number of leaves in the tree.
For a more detailed description and explanation of XGBoost, please refer to Chen et al. (2016).
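For orientation, the regularized objective in the standard XGBoost formulation can be written as below. This is a sketch of the textbook form and not necessarily the exact equation of the original manuscript; the symbols l, ŷ_i, f_k, K, γ, λ, and w_j are assumptions chosen to match the regularization parameters, leaf scores, and leaf count T described above.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```

The first term measures the fit of the K-tree ensemble to the observations, while the second penalizes trees with many leaves (through γ) or large leaf scores (through λ).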
Long Short-Term Memory (LSTM) network
Here, Ct and ht represent the cell state and hidden state, respectively; ⊗ denotes the element-wise product; Ct−1 is the cell state from the previous time step; Cp is the potential update vector of the cell state; W and b are the weights and biases, respectively, which are the parameters to be calibrated, and each subscript on them indicates the input/hidden state vector and gate to which they correspond; σ represents the logistic function; xt is the input at the current time step; and ht−1 represents the hidden state from the previous time step.
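The symbols defined above correspond to the standard LSTM cell update, which may be sketched as follows. The gate names f_t, i_t, and o_t (forget, input, and output gates) and the subscript scheme on W and b are assumptions made for illustration; the manuscript's own notation may differ.

```latex
\begin{aligned}
f_t &= \sigma\left(W_{xf}\,x_t + W_{hf}\,h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_{xi}\,x_t + W_{hi}\,h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + b_o\right) \\
C_p &= \tanh\left(W_{xp}\,x_t + W_{hp}\,h_{t-1} + b_p\right) \\
C_t &= f_t \otimes C_{t-1} + i_t \otimes C_p \\
h_t &= o_t \otimes \tanh\left(C_t\right)
\end{aligned}
```

The forget gate f_t scales how much of the previous cell state is retained, which is what allows the network to carry slowly varying information, such as accumulated snowpack, across many time steps.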
HBV model
The HBV model, developed by Bergstrom & Forsman (1973) at the Swedish Meteorological and Hydrological Institute, is a semi-distributed, conceptual rainfall–runoff model that requires minimal input variables to simulate catchment runoff. The HBV Light model, an upgraded version of the original, is employed in this study. This model accounts for the influence of groundwater on runoff and uses delay parameters to represent catchment response processes. An automated Monte Carlo method generates random parameter values within predefined ranges to determine the most sensitive parameters and the optimal value of the objective function. The model parameters are listed in Table S3 of the Supplementary Material.
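The Monte Carlo calibration described above can be illustrated with a minimal sketch. It assumes NSE as the objective function and uses a hypothetical single-reservoir `toy_model` in place of HBV Light; the function names, the parameter `k`, and its range are illustrative only and are not taken from the study.

```python
import random

def monte_carlo_calibrate(simulate, observed, param_ranges, n_runs=1000, seed=0):
    """Sample random parameter sets within ranges and keep the one with the highest NSE."""
    rng = random.Random(seed)
    mean_obs = sum(observed) / len(observed)
    denom = sum((o - mean_obs) ** 2 for o in observed)
    best_params, best_nse = None, float("-inf")
    for _ in range(n_runs):
        # Draw each parameter uniformly from its predefined range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
        sim = simulate(params)
        nse = 1.0 - sum((o - s) ** 2 for o, s in zip(observed, sim)) / denom
        if nse > best_nse:
            best_params, best_nse = params, nse
    return best_params, best_nse

def toy_model(params, inflow=(5.0, 0.0, 10.0, 0.0, 0.0, 2.0)):
    """Hypothetical stand-in for a hydrological model: one linear reservoir."""
    storage, flows = 0.0, []
    for p in inflow:
        storage += p
        q = params["k"] * storage   # outflow proportional to storage
        storage -= q
        flows.append(q)
    return flows

# Synthetic "observations" generated with a known parameter value.
observed = toy_model({"k": 0.3})
best, score = monte_carlo_calibrate(toy_model, observed, {"k": (0.05, 0.9)}, n_runs=500)
```

Comparing the spread of objective values across the sampled range of each parameter gives a rough indication of sensitivity: a parameter whose value barely changes the score is poorly identified by the data.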
The recession coefficients for Q2, Q1, and Q0 are denoted as K2, K1, and K0 respectively.
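A minimal sketch of a three-outflow response routine of the kind used in HBV-type models is given below. It assumes the common two-box structure, in which Q0 and Q1 drain an upper storage and Q2 drains a lower storage; the names `suz`, `slz`, `uzl`, and `perc` and all parameter values are assumptions for illustration, not the calibrated values of this study.

```python
def response_routine(recharge, k0, k1, k2, uzl, perc, suz=0.0, slz=0.0):
    """Route daily recharge (mm) through an upper and a lower storage box.

    Assumes k0 + k1 <= 1 and k2 <= 1 so that storages stay non-negative.
    Returns the daily total flows and the final storages.
    """
    flows = []
    for r in recharge:
        suz += r                         # recharge enters the upper box
        p = min(perc, suz)               # percolation to the lower box
        suz -= p
        slz += p
        q0 = k0 * max(suz - uzl, 0.0)    # fast flow, only above threshold uzl
        q1 = k1 * suz                    # interflow from the upper box
        q2 = k2 * slz                    # baseflow from the lower box
        suz -= q0 + q1
        slz -= q2
        flows.append(q0 + q1 + q2)
    return flows, suz, slz

recharge = [10.0, 0.0, 5.0, 0.0, 0.0]
flows, suz_end, slz_end = response_routine(
    recharge, k0=0.3, k1=0.1, k2=0.05, uzl=5.0, perc=1.0
)
```

Because each outflow is proportional to a storage, K0, K1, and K2 control how quickly the fast, intermediate, and slow components recede after a recharge event.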
Partial dependence plots
Here, xj represents the values of feature j; x−j denotes all features except j; f(xj, x−j) is the model prediction; E indicates the expectation over the joint distribution of all other features except j. In practical implementation, for each value xj,k of the feature xj, we fix xj = xj,k, and then sample the other features x−j. The model predictions are averaged over these samples. This process reveals the average impact of different values of xj on the predicted outcome, helping us understand how the model uses this feature for predictions.
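The averaging procedure described above can be sketched in a few lines. This example assumes that the other features are taken from the observed rows of the data set (the empirical average) and uses a hypothetical `toy_predict` function in place of the trained XGBoost model; the feature values are invented for illustration.

```python
def partial_dependence(predict, X, j, grid):
    """Average prediction when feature j is fixed at each grid value."""
    pd_values = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[j] = value   # fix feature j; keep the other features as observed
            total += predict(modified)
        pd_values.append(total / len(X))
    return pd_values

def toy_predict(row):
    """Hypothetical model: linear in precipitation, quadratic in temperature."""
    precip, temp = row
    return 2.0 * precip + 0.1 * temp ** 2

# Columns: precipitation (mm), average temperature (deg C); illustrative values only.
X = [[0.0, -5.0], [10.0, 0.0], [20.0, 5.0]]
pdp_temp = partial_dependence(toy_predict, X, j=1, grid=[-10.0, 0.0, 10.0])
```

Plotting `pdp_temp` against the grid values yields the partial dependence curve for temperature; here it recovers the quadratic shape built into the toy model, shifted by the mean precipitation effect.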
Model validation
Here, n represents the number of observed runoff values, Oi is the observed runoff value, Ō is the mean of the observed data, and Pi is the predicted runoff value; r is the correlation coefficient between simulated and observed streamflow; β is the bias ratio, defined as the ratio of the mean simulated streamflow to the mean observed streamflow; γ is the variability ratio, defined as the ratio of the coefficient of variation of the simulated streamflow to the coefficient of variation of the observed streamflow.
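The four metrics reported in this study (MAE, MSE, NSE, and KGE) can be computed as in the following sketch. It assumes the standard definitions, with KGE in the form that uses the variability ratio γ of coefficients of variation as defined above; the function names and sample values are illustrative.

```python
import math

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def mse(obs, sim):
    """Mean squared error."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def kge(obs, sim):
    """Kling-Gupta Efficiency with correlation r, bias ratio beta, variability ratio gamma."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    r = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / (n * sd_o * sd_s)
    beta = mean_s / mean_o
    gamma = (sd_s / mean_s) / (sd_o / mean_o)   # ratio of coefficients of variation
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

obs = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
flat = [2.5, 2.5, 2.5, 2.5]   # always predicts the observed mean
```

A simulation that always returns the observed mean gives NSE = 0, which is why NSE is often read as skill relative to a mean-flow benchmark; KGE is undefined for such a constant series because its standard deviation is zero.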
RESULTS
Model results
This study compares the performance of three representative models (XGBoost, LSTM, and HBV) to identify the most effective method for runoff prediction in the study catchment. The training period is crucial for the models' learning, as it directly impacts prediction performance. During validation, model performance is assessed on data not seen in training to evaluate generalization ability and practical applicability. Daily runoff data from 1998 to 2015 were used for training, while data from 2016 to 2019 were used for validation.
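The chronological split described above can be sketched as follows. The record layout (a date paired with a runoff value) and the function name `split_by_year` are assumptions for illustration; the placeholder values of 0.0 stand in for the actual observations.

```python
from datetime import date, timedelta

def split_by_year(records, train_years=(1998, 2015), valid_years=(2016, 2019)):
    """Split (date, value) records into training and validation sets by calendar year."""
    train = [r for r in records if train_years[0] <= r[0].year <= train_years[1]]
    valid = [r for r in records if valid_years[0] <= r[0].year <= valid_years[1]]
    return train, valid

# Build one placeholder record per day from 1998-01-01 to 2019-12-31.
start, end = date(1998, 1, 1), date(2019, 12, 31)
records = [(start + timedelta(days=i), 0.0) for i in range((end - start).days + 1)]
train, valid = split_by_year(records)
```

Keeping the validation years strictly after the training years means the models are evaluated on a period they have not seen, which mirrors how a forecast would be used in practice.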
Models | MAE | MSE | NSE | KGE |
---|---|---|---|---|
XGBoost | 2.40 | 20.38 | 0.94 | 0.96 |
LSTM | 3.49 | 52.09 | 0.85 | 0.92 |
HBV | 4.07 | 58.84 | 0.84 | 0.90 |
Overall, the XGBoost model achieves the best accuracy across all performance metrics, the LSTM model shows strong performance in capturing peak flows, and the HBV model provides stable trend-following capabilities. These findings offer valuable insights into the applicability of different types of models for runoff prediction in catchments with significant snowmelt contributions and human activities.
Partial dependence plots of driver factors on runoff influence
DISCUSSION
This study compares three different types of models – a decision tree model (XGBoost), a time series model (LSTM), and a semi-distributed conceptual rainfall–runoff model (HBV) – to evaluate their performance in predicting runoff in a snowmelt-dominated and human activity-influenced watershed. The results indicate that the XGBoost model performs best in handling the highly nonlinear patterns of hydrological data in such a watershed. This finding is consistent with Friedman's research on gradient boosting, the technique underlying XGBoost, which emphasizes its ability to manage nonlinear associations in complex datasets (Friedman 2001).
In snowmelt-dominated and human activity-intensive watersheds, conceptual models like HBV and time series models like LSTM face significant challenges in predicting runoff variations. LSTM models, although proficient in processing time series data, are less flexible and efficient than XGBoost in capturing the nonlinear interactions of meteorological factors and human activities (Chen et al. 2020b). These limitations result in LSTM models underperforming XGBoost in complex hydrological prediction tasks. The HBV model represents multiple processes, such as soil moisture (SM), evaporation, and groundwater flow. The associated parameters are not only numerous but also exhibit highly nonlinear and complex interactions, making it challenging to optimize model performance under varying conditions (Milly et al. 2008).
The PDP method plays a crucial role in enhancing model interpretability. PDPs reveal the nonlinear relationships between the driving factors (precipitation, average temperature, and wind speed) and runoff (Goldstein et al. 2015). In particular, PDPs show the nonlinear impact of temperature on runoff, highlighting the intricate effects of temperature variations on the water cycle (Woodhouse & Pederson 2018; Huang et al. 2023). Specifically, temperature affects runoff differently across various ranges, potentially linked to changes in snowmelt processes and evaporation rates, a conclusion supported by other research methods (Miller & White 1998). Wind speed significantly affects runoff by influencing snowmelt and surface water redistribution, providing new insights into the role of wind speed in hydrological models. Similarly, a study on the Geum River basin in Korea found that increased runoff is associated with higher wind speeds (Sim et al. 2014). The influence of wind speed on snowmelt and subsequent runoff variations is further corroborated by research focusing on similar climatic conditions (Kershaw 2018). Our research further confirms the importance of considering these factors in hydrological prediction models, particularly in managing flood and drought events (Wang et al. 2022).
CONCLUSION
This study compares the performance of three typical models – a decision tree model (XGBoost), a time series model (LSTM), and a semi-distributed model (HBV) – in simulating daily runoff in a watershed with significant snowmelt contributions and intensive human activities. The results indicate that XGBoost outperforms the LSTM and HBV models, with an NSE of 0.94, compared to 0.85 and 0.84 for the LSTM and HBV models, respectively. Additionally, the contributions of meteorological factors such as precipitation and temperature to runoff, and the mechanisms behind them, were further quantified. These findings have important implications for water resource management practices in similar watersheds, particularly in addressing the impacts of climate change on the water cycle. This study also emphasizes the necessity of better understanding the complex interactions among meteorological factors and the importance of validating the models' generalization capabilities across various geographical and climatic conditions. Future research should further explore model inputs, including a wider range of meteorological and topographical factors, and investigate new methods to enhance model interpretability.
ACKNOWLEDGEMENTS
We would like to express our sincere gratitude to the two anonymous reviewers for their valuable comments and constructive feedback, which have greatly helped us improve the quality of this manuscript. This study was funded by the Henan Polytechnic University Doctoral Fund Sponsorship Project for the Year 2023 (B2023-59) and the Special Focus Project of Basic Research Operating Expenses for the Year 2024, Henan Polytechnic University (SKJZD2024-08). We also acknowledge the funding from the Henan Province Higher Education Teaching Reform Research and Practice Key Project (2024SJGLX0073).
AUTHOR CONTRIBUTIONS
Y.C. contributed to writing (original draft), draft revision, and funding acquisition. J.L. contributed to software, data analysis, and writing (original draft). Y.J. contributed to data analysis and validation. J.G. contributed to investigation and data curation. Y.H. contributed to writing (original draft), data curation, writing (review and editing), and supervision.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository at DOI: 10.5281/zenodo.13369247.
CONFLICT OF INTEREST
The authors declare there is no conflict.