Abstract
Accurate streamflow prediction is essential for optimal water management and disaster preparedness. While data-driven methods often outperform process-based models, concerns regarding their ‘black-box’ nature persist. Hybrid models, which integrate domain knowledge and process modeling into a data-driven framework, offer enhanced streamflow prediction capabilities. This study investigated hybridization approaches based on watershed memory and process modeling across diverse hydrological regimes in Korean and Ethiopian watersheds. Following watershed memory analysis, the Soil and Water Assessment Tool (SWAT) was calibrated using the recession constant and other relevant parameters. Three hybrid models, incorporating watershed memory and residual error, were developed and evaluated against standalone long short-term memory (LSTM) models. The hybrids outperformed the standalone LSTM across all watersheds. The memory-based approach exhibited superior and consistent performance across training and evaluation periods and across regions, achieving a 17–66% improvement in the Nash–Sutcliffe efficiency coefficient. The residual error-based technique showed varying performance across regions. While hybrids improved extreme event predictions, particularly peak flows, all models struggled at low flows. The significant prediction improvements in the Korean watersheds highlight the hybrid models’ effectiveness in regions with pronounced temporal hydrological variability. This study underscores the importance of selecting a specific hybrid approach based on the desired objectives rather than relying solely on statistical metrics that often reflect average performance.
HIGHLIGHTS
Three hybrid machine learning models were developed, considering watershed memory and hydrological model residual error.
We evaluated hybrid models for enhanced streamflow prediction across diverse hydro-meteorological regions.
Watershed memory-based hybrid models offered superior and consistent performance across regions.
INTRODUCTION
Reliable streamflow prediction is of paramount importance for effective water resource allocation and disaster management (Erdal & Karakurt 2013; Chu et al. 2021; Reis et al. 2021). In this regard, hydrological models, including data-driven, conceptual, and physically based models, play a pivotal role in enhancing our comprehension of watershed physical processes and how these processes are linked (Bourdin et al. 2012). Despite criticism of their parametric complexity and challenging implementation, process-based watershed models are indispensable when a comprehensive grasp of hydrological processes and the regional water balance is crucial (Fatichi et al. 2016). The complexity of process-based models nevertheless introduces drawbacks, such as heightened uncertainty, calibration and validation challenges, and modeling costs (Moges et al. 2020; Herrera et al. 2022).
Data-driven modeling has emerged as a viable alternative to traditional hydrological techniques (Fahimi et al. 2017; Lee et al. 2023a). These approaches leverage the power of data to uncover patterns from time-series input–output data, thereby reducing the reliance on extensive physical data inputs (Karandish & Šimůnek 2016). In essence, data-driven models are ‘black-box’ methodologies that establish relationships between inputs and outputs without explicitly modeling the underlying physical hydrological processes (Solomatine & Ostfeld 2008; Bourdin et al. 2012; Chu et al. 2021). Despite this limitation, data-driven models have consistently shown superior performance compared to traditional approaches (Nearing et al. 2021; Reis et al. 2021; Yu et al. 2023). This performance advantage has fueled a surge in the adoption of data-driven modeling for hydrological analysis (Yaseen et al. 2015; Zounemat-Kermani et al. 2020; Mohammadi et al. 2024).
However, in addition to their inherent black-box nature, data-driven models have faced criticism for their vulnerability to overfitting or underfitting, which is contingent upon the model's complexity, as well as the volume and quality of the data (Ghaith et al. 2020; Mudiyanselage Viraj et al. 2021). Furthermore, both data-driven and process-based modeling approaches exhibit suboptimal performance when predicting extreme events (Zheng et al. 2018; Tan et al. 2020; Zounemat-Kermani et al. 2021). Bridging the gap, expert knowledge-based approaches have surged in popularity, offering both enhanced performance and interpretability for accurate prediction (Mudiyanselage Viraj et al. 2021; Willard et al. 2023). In hydrology, these methods involve integrating watershed memory indicators such as baseflow analysis with other input data (Zemzami & Benaabidate 2016; Tongal & Booij 2018; Li et al. 2022), simulating intermediate variables (Humphrey et al. 2016; Noori & Kalin 2016), and accounting for residual errors (Tian et al. 2018; Kassem et al. 2020; Kim et al. 2021; Cho & Kim 2022).
Data-driven modeling that draws on watershed memory, domain knowledge, and process-based modeling is known by various terms, such as physics-informed machine learning, hybrid machine learning, and theory-based machine learning. It offers several benefits, including improved physical consistency, interpretability, and prediction accuracy (Humphrey et al. 2016; Noori & Kalin 2016). However, the choice between simulating watershed processes and leveraging watershed memory for performance enhancement remains unclear, with limited research exploring their comparative effectiveness and the reasoning behind the selection. In particular, residual error modeling aims solely at improving model performance, with little possibility of identifying sources of uncertainty. Simulating watershed processes can also introduce complexity into streamflow prediction, whereas watershed memory-related techniques such as baseflow analysis provide a comparatively straightforward approach. This disparity in complexity highlights the need for thorough reasoning behind technique selection, further comparative research, and examination of alternatives within the framework.
This study addresses this gap by investigating the relative effectiveness of simulating hydrological processes and employing watershed memory techniques for improving data-driven streamflow prediction. We explored the efficacy of the baseflow index in enhancing prediction accuracy, a novel approach not previously investigated in this context. Additionally, we used watershed memory for the calibration and validation of process-based models, highlighting its potential for wider application. Most importantly, we tackled the critical question: can hybrid models conquer the ever-growing threat of extreme events in streamflow prediction where traditional methods falter? The findings will provide valuable insights for researchers and practitioners in the field of hydrological modeling, informing the selection of suitable approaches for data-driven streamflow prediction and contributing to the advancement of hybrid modeling approaches. This paper is structured as follows. The first section provides an overview of streamflow improvement techniques utilizing watershed memory and residual error modeling. Next, the study regions and dataset are described in detail. Subsequently, the methods section outlines the models and approaches employed in the study. The results and discussion section presents the key model outputs, followed by a concluding section that summarizes the main findings and their implications.
OVERVIEW OF ENHANCING STREAMFLOW PREDICTION USING WATERSHED MEMORY AND RESIDUAL ERROR
Beyond the present: watershed memory for improved model predictions
Watershed memory quantifies the retention and release of water within a watershed over time. It reflects the duration for which past climate, hydrogeological features, and watershed characteristics influence the current hydrological response. Conceptualizing streamflow into quick flow, interflow, and baseflow components helps understand this dynamic (Tallaksen 1995; Duncan 2019).
Baseflow, distinguished by its persistence and sustained nature within streamflow, primarily originates from groundwater and delayed sources (Hall 1968; Lim et al. 2005; Duncan 2019; McMahon & Nathan 2021). Understanding baseflow dynamics is crucial for maintaining the delicate balance between healthy ecosystems, reliable water supplies, and clean water, with applications ranging from assessing drought risks to calibrating hydrological models (Tallaksen 1995; Brutsaert 2008; Eckhardt 2008). Recognizing its significance, hydrologists and hydrogeologists have dedicated over a century to studying baseflow, gaining valuable insights into aquifer properties, and improving water resource management (Lim et al. 2005; Thiesen et al. 2019; McMahon & Nathan 2021).
The baseflow index (BFI), the long-term ratio of baseflow to total streamflow, reflects how watershed features, geology, and land use impact water storage (Bloomfield et al. 2009; Van Loon & Laaha 2015; Sutanto & Van Lanen 2022). Watershed memory is typically represented by the BFI and the baseflow recession constant (Sutanto & Van Lanen 2022; Gu et al. 2023). Studies also show a positive link between this memory and accurate streamflow prediction (Harrigan et al. 2018; Girons Lopez et al. 2021).
Several data-driven studies leverage baseflow separation to enhance streamflow predictions. Corzo & Solomatine (2007) demonstrated that even traditional methods, such as the constant slope approach, can enhance simulation accuracy, with further improvement achieved using optimized baseflow filtering equations. Similarly, Zemzami & Benaabidate (2016) found that recursive digital filters outperformed simpler alternatives when paired with artificial neural networks (ANN). Two main methods utilize baseflow separation: (1) baseflow as a predictor, in which baseflow is integrated with other inputs (Chen et al. 2021; Tongal & Booij 2018, 2022); and (2) separate models for baseflow and excess flow, in which individual models are built for each component and their results are combined for prediction (Corzo & Solomatine 2007; Isik et al. 2013; Taormina et al. 2015). While both approaches demonstrably improve forecasting, comparative studies and clear selection guidance are lacking, presenting an interesting research avenue.
Leveraging residual error modeling for enhanced model performance
Frameworks that use hydrological model residuals to enhance streamflow prediction have been applied in two ways: (1) data-driven models trained solely on process-based model errors, and (2) residual errors used as additional input variables (Tian et al. 2018; Kassem et al. 2020; Sikorska-Senoner & Quilty 2021; Cho & Kim 2022). A pivotal consideration in hybridizing hydrological models is the impact of calibrating the process-based model before integrating it with a machine learning framework. This issue was addressed by Shen et al. (2022), who evaluated the predictive performance of both calibrated and uncalibrated versions of the PCRaster Global Water Balance (PCR-GLOBWB) model when coupled with random forest (RF). Their findings revealed significant accuracy gains for both configurations, indicating the substantial potential of residual error correction within this hybridization paradigm.
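The two residual-based configurations can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the variable names are hypothetical, and an ordinary least-squares fit stands in for the data-driven learner; it is not the setup of any cited study.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept (stand-in for any data-driven learner)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model on new inputs."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef

# Synthetic series (hypothetical values, not study data).
rng = np.random.default_rng(0)
rain = rng.gamma(2.0, 3.0, size=200)
observed = 0.6 * rain + 2.0
simulated = 0.4 * rain + 1.0            # process-based output with systematic error
residual = observed - simulated

# Configuration 1: learner trained only on the residual;
# the final prediction is the simulation plus the predicted residual.
coef_res = fit_linear(rain.reshape(-1, 1), residual)
corrected = simulated + predict_linear(coef_res, rain.reshape(-1, 1))

# Configuration 2: the previous day's residual is an extra input
# to a learner that predicts streamflow directly.
X_direct = np.column_stack([rain[1:], residual[:-1]])
coef_direct = fit_linear(X_direct, observed[1:])
predicted_direct = predict_linear(coef_direct, X_direct)
```

In the first configuration the process-based simulation remains part of the final prediction; in the second it only informs the inputs, so the learner carries the full prediction.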
STUDY REGION AND DATA DESCRIPTION
Watersheds from two worlds: South Korea and Ethiopia
Overview of model input dataset and data sources
Data description | Details of the data | Data sources (online and office access) |
---|---|---|
Weather | Daily from 2000 to 2020 | Korea Meteorological Administration, KMA (KMA 2022) |
Streamflow | Daily from 2000 to 2020 | Han River Flood Control Office, HRFCO (HRFCO 2023) |
DEM | 30 m | National Geographic Information Institute, NGII (NGII 2022) |
Land use/Land cover | 30 m | Environmental Geographic Information Service, EGIS (EGIS 2022) |
Soil classes | – | Rural Development Administration, RDA (RDA 2022) |
Weather* | Daily from 1990 to 2013 | National Meteorological Service Agency of Ethiopia |
Streamflow* | Daily from 1990 to 2015 | Ministry of Water Resources, Irrigation, and Electricity, Ethiopia |
DEM* | 30 m | USGS ‘earthexplorer’ website |
Land use/land cover* | 30 m | GLOBELAND 30 (Jun et al. 2014) |
Soil classes* | – | Harmonized World Soil Database (FAO) |
The Gapcheon and Chogang watersheds benefit from readily available, up-to-date streamflow and weather observations. Conversely, the Melka Kunture watershed presents a data limitation, with complete observations accessible only until the year 2013. Similar data availability constraints are evident in other studies focusing on this region, with model simulations and observed data restricted to this period (e.g., Shawul & Chakma 2020; Birhanu et al. 2021; Mitiku et al. 2023).
METHODS
Soil and Water Assessment Tool
The SWAT model was used to develop hydrological models for the Gapcheon and Chogang watersheds, which comprise 26 and 27 subbasins and 1,295 and 1,199 HRUs, respectively. In the Gapcheon watershed, over 56% of the area exhibits a slope of less than 25%. For HRU generation in both watersheds, a threshold option of 7/7/7% for land use/soil/slope was employed. Approximately 73% of the total area in both watersheds is covered by forest. In the Chogang watershed specifically, 44% of the terrain features a slope of less than 25%.
The Melka Kunture watershed was divided into about 37 subbasins, and its HRUs were created using a 3/3/3% land use/soil/slope threshold. Around 86% of the Melka Kunture watershed is used for rainfed agriculture.
The SWAT model was calibrated and validated for the selected parameters using observed streamflow data from the outlets of each watershed (Table 2). Sequential uncertainty fitting version 2 (SUFI-2) was applied within the SWAT-calibration and uncertainty program (SWAT-CUP) user interface (Abbaspour et al. 2007; Abbaspour 2011) for calibration and validation. For each watershed, the calibration and validation periods were as follows:
Gapcheon watershed: calibration: 2002–2012; validation: 2013–2020
Chogang watershed: calibration: 2006–2014; validation: 2015–2020
Melka Kunture: calibration: 2000–2007; validation: 2008–2013
Parameter | Description of parameters | Initial range |
---|---|---|
r__CN2.mgt | Soil Conservation Service (SCS) runoff curve number | −0.2 to 0.5 |
v__GWQMN.gw | Threshold depth of water in the shallow aquifer required for return flow to occur (mm) | 0–5,000 |
v__RCHRG_DP.gw | Deep aquifer percolation fraction | 0–1 |
*v__CH_K2.sub | Effective hydraulic conductivity in main channel alluvium | 0.01–150 |
*r__OV_N.hru | Manning's ‘n’ value for overland flow | −0.25 to 0.25 |
v__CANMX.hru | Maximum canopy storage | 0–100 |
r__SOL_K().sol | Saturated hydraulic conductivity | −0.25 to 0.25 |
v__SURLAG.hru | Surface runoff lag time | 0.05–24 |
v__ALPHA_BF.gw | Baseflow recession factor | 0.02–0.07 |
*v__ESCO.hru | Soil evaporation demand coefficient | 0–1 |
**v__ALPHA_BNK.rte | Baseflow alpha factor for bank storage | 0–1 |
**r__SOL_Z.sol | Depth from the soil surface to the bottom of layer | −0.2 to 0.5 |
r__SOL_AWC.sol | Soil available moisture capacity (mm H2O/mm soil) | −0.5 to 0.5 |
Note: v__ means that the default parameter is replaced by a given value, and r__ means that the existing parameter value is multiplied by (1 + a given value).
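The two change types in the note can be illustrated with a short sketch. The function name `apply_change` and the example values are hypothetical; only the replace and relative-change rules come from the note above.

```python
def apply_change(current_value, change_type, x):
    """Apply a parameter change of the kind described in the table note.

    'v' replaces the existing value with x;
    'r' multiplies the existing value by (1 + x).
    """
    if change_type == "v":
        return x
    if change_type == "r":
        return current_value * (1.0 + x)
    raise ValueError(f"unknown change type: {change_type!r}")

# Hypothetical example: a curve number of 70 with a relative change of -0.2,
# and a baseflow recession factor replaced by 0.03.
new_cn2 = apply_change(70.0, "r", -0.2)       # about 56
new_alpha_bf = apply_change(0.048, "v", 0.03)
```

A relative change preserves the spatial pattern of a distributed parameter such as CN2, whereas a replacement assigns one value everywhere it is applied.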
The calibration range for the baseflow recession factor (ALPHA_BF.gw) was determined based on baseflow recession analysis results and set to around ±30% of those values. For other parameters, the initial ranges were determined systematically using published values (Shawul & Chakma 2020; Lee et al. 2023b), expert knowledge, and sensitivity simulations exploring parameter space and parameter types.
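As a rough illustration of how a recession-based calibration range could be derived, the sketch below converts the daily recession constants reported in the baseflow filtering section into a baseflow recession factor and brackets it by ±30%. It assumes the common exponential recession relation k = exp(−ALPHA_BF) for a daily time step; the paper does not state the exact conversion it used, and the function names are hypothetical.

```python
import math

def alpha_bf_from_recession_constant(k):
    """Convert a daily recession constant k to a recession factor,
    assuming Q(t) = Q0 * exp(-alpha * t), i.e. k = exp(-alpha)."""
    return -math.log(k)

def calibration_range(alpha, fraction=0.30):
    """Bracket alpha by +/- fraction (30% by default)."""
    return alpha * (1.0 - fraction), alpha * (1.0 + fraction)

# Recession constants reported for the three watersheds.
recession_constants = {"Gapcheon": 0.978, "Chogang": 0.977, "Melka Kunture": 0.959}

ranges = {}
for name, k in recession_constants.items():
    alpha = alpha_bf_from_recession_constant(k)
    ranges[name] = calibration_range(alpha)
    low, high = ranges[name]
    print(f"{name}: alpha = {alpha:.4f}, range = {low:.4f} to {high:.4f}")
```

Under this assumption the resulting values are of the same order as the initial ALPHA_BF range listed in the parameter table, although the bounds need not coincide exactly.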
Watershed memory: baseflow filtering using two-parameter digital filter
Recession analysis and BFImax estimation were carried out to separate baseflow from streamflow. The recession constant for the Gapcheon watershed was 0.978, with an average BFImax of 0.426. For the Chogang watershed, the recession constant was 0.977, and the BFImax was 0.344. In the Melka Kunture watershed, the recession constant was 0.959, and the BFImax was 0.475.
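A minimal sketch of a two-parameter recursive digital filter is given below. It assumes the filter takes the widely used Eckhardt form, with the recession constant a and BFImax as its two parameters; the initialization of the first baseflow value and the synthetic streamflow series are illustrative assumptions, not details from this study.

```python
import numpy as np

def eckhardt_filter(streamflow, a, bfi_max):
    """Two-parameter recursive digital filter (Eckhardt form, assumed).

    b[k] = ((1 - BFImax) * a * b[k-1] + (1 - a) * BFImax * Q[k]) / (1 - a * BFImax),
    constrained so that baseflow never exceeds streamflow.
    """
    q = np.asarray(streamflow, dtype=float)
    b = np.empty_like(q)
    b[0] = bfi_max * q[0]  # assumed initialization
    for k in range(1, len(q)):
        bk = ((1.0 - bfi_max) * a * b[k - 1]
              + (1.0 - a) * bfi_max * q[k]) / (1.0 - a * bfi_max)
        b[k] = min(bk, q[k])
    return b

def baseflow_index(streamflow, baseflow):
    """BFI: ratio of total baseflow to total streamflow over the record."""
    return float(np.sum(baseflow) / np.sum(streamflow))

# Synthetic daily flows (hypothetical), filtered with the Gapcheon parameters.
q = np.array([5.0, 20.0, 12.0, 8.0, 6.0, 5.0, 4.5, 4.0])
b = eckhardt_filter(q, a=0.978, bfi_max=0.426)
bfi = baseflow_index(q, b)
```

The separated baseflow series and the BFI derived from it are the kinds of watershed memory inputs that the hybrid models described later draw on.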
Long short-term memory
Long short-term memory (LSTM) (Hochreiter & Schmidhuber 1997) is a type of recurrent neural network (RNN) designed to excel at capturing long-term dependencies in sequential data. Unlike traditional RNNs, which suffer from the vanishing gradient problem, LSTMs employ a memory cell, gates, and a complex gating mechanism to manage the flow of information. These gates allow LSTMs to selectively retain or discard information as needed, enabling them to handle long-term dependencies effectively.
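For reference, the gating mechanism is commonly written as below. This is the standard textbook formulation with a forget gate, not notation taken from this study; $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ are learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive update of $c_t$ is what lets gradients persist over long sequences, which is the property that addresses the vanishing gradient problem.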
LSTM has gained significant traction in hydrology for its ability to predict short- and long-term hydrological variables (Shen 2018; Kim et al. 2021; Xie et al. 2022). Detailed explanations of LSTM's inner workings are provided by numerous researchers, including Kim et al. (2021) and Kratzert et al. (2018).
This study used an LSTM layer with 50 units and the Adam optimizer with a learning rate of 0.001. The model was trained for 60–700 epochs with a batch size of 32–128. A dropout rate of 0.1 was employed for regularization. The architecture included multiple dense layers with varying numbers of units (256, 512, 512, 64) and rectified linear unit (ReLU) activation functions. These hyperparameters, including the LSTM units, learning rate, batch size, number of epochs, dropout rate, and dense layer architecture, are pivotal in influencing the model's ability to capture temporal patterns in the input data and can be fine-tuned to optimize prediction performance.
A hybrid machine learning approach for improved streamflow prediction
Section 2 provides an overview of two widely adopted hybridization techniques for streamflow prediction: residual error-based and watershed memory-based methods. While the former demonstrably increases prediction accuracy, it demands substantial computational resources by executing all process-based modeling steps to extract residual errors for subsequent machine learning analysis. This translates to increased cost and time investment in streamflow prediction. Conversely, watershed memory-based approaches offer relative simplicity but lack clear justifications for their selection over residual error methods.
Furthermore, while both process-based models and data-driven methods struggle with extreme events (Zhang et al. 2018; Tan et al. 2020; Yifru et al. 2024), research lacks clarity on which specific hydrological extremes benefit from hybridized approaches. Therefore, this study endeavors to address three fundamental questions regarding the two extensively employed hybridization techniques.
The methodology involves three key steps. First, baseflow recession analysis was employed to calibrate and validate the SWAT model, ensuring accurate simulation of baseflow contributions to streamflow. Next, baseflow, baseflow index, and SWAT model residuals were used as additional inputs to train LSTM models, with the aim of improving streamflow prediction accuracy. Finally, four modeling approaches were compared within this framework. The first, the standalone LSTM, used weather and observed streamflow data for prediction. In the second, the hybrid Bflow-LSTM, baseflow data joined weather and streamflow data as inputs. The third, BFI-LSTM, used baseflow index, weather, and streamflow data. The fourth, SWAT-LSTM, used an LSTM to predict and update residual errors from weather data and SWAT model residual errors. For all watersheds and models, data were divided into 80% for training and 20% for testing. Furthermore, 20% of the training data were allocated for cross-validation to further optimize model performance.
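The data partitioning and input preparation can be sketched as follows. The sketch assumes a chronological split and a fixed look-back window; neither the ordering of the split nor the window length is stated in the text, and the helper names and the 7-day look-back are hypothetical.

```python
import numpy as np

def split_indices(n, test_frac=0.2, val_frac=0.2):
    """Chronological split (assumed): 80/20 train/test,
    then 20% of the training block held out for validation."""
    n_test = int(round(n * test_frac))
    n_trainval = n - n_test
    n_val = int(round(n_trainval * val_frac))
    n_train = n_trainval - n_val
    return slice(0, n_train), slice(n_train, n_trainval), slice(n_trainval, n)

def make_windows(features, target, lookback):
    """Build LSTM samples: each sample holds `lookback` consecutive days
    of features and is paired with the target on the following day."""
    n = len(features)
    X = np.stack([features[i:i + lookback] for i in range(n - lookback)])
    y = np.asarray(target)[lookback:]
    return X, y

# Synthetic example: 1,000 days, with precipitation, streamflow, and baseflow
# as input columns (a Bflow-LSTM style feature set; values are random).
rng = np.random.default_rng(1)
features = rng.random((1000, 3))
target = rng.random(1000)

X, y = make_windows(features, target, lookback=7)
train, val, test = split_indices(len(X))
X_train, y_train = X[train], y[train]
X_val, y_val = X[val], y[val]
X_test, y_test = X[test], y[test]
```

Switching between the four scenarios then amounts to changing which columns are stacked into `features`, for example replacing the baseflow column with the baseflow index or with the SWAT residual.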
Model performance indices and evaluations
Three key metrics were used to assess model performance: Nash–Sutcliffe efficiency (NSE) (Nash & Sutcliffe 1970), percent bias (PBIAS) (Sorooshian et al. 1993), and coefficient of determination (R2) (Equations (5) and (6)). NSE measures the model's ability to capture the observed variability, offering a comprehensive assessment of its overall performance. PBIAS reveals whether the model overestimates or underestimates streamflow, highlighting potential biases in the predictions. R2 evaluates how well the model explains the observed variations in streamflow, providing insight into its fit and explanatory power.
In Equations (5) and (6), Qi and Si are the observed and computed data for the ith day, n is the length of data considered, and Q̄ is the mean of the n observed data.
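The three metrics can be computed as in the sketch below, which assumes their standard definitions: NSE as one minus the ratio of squared errors to the variance of the observations about their mean, PBIAS as the summed difference between observed and simulated flow expressed as a percentage of summed observed flow (so that positive values indicate underestimation), and R2 as the squared Pearson correlation. The sign convention for PBIAS and the function names are assumptions.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((Q - S)^2) / sum((Q - mean(Q))^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pbias(obs, sim):
    """Percent bias: 100 * sum(Q - S) / sum(Q); positive means underestimation
    under the convention assumed here."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(100.0 * np.sum(obs - sim) / np.sum(obs))

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)

# Small hypothetical example.
observed = np.array([1.0, 2.0, 3.0, 4.0])
simulated = np.array([0.5, 1.0, 1.5, 2.0])
scores = {
    "NSE": nse(observed, simulated),
    "PBIAS": pbias(observed, simulated),
    "R2": r_squared(observed, simulated),
}
```

In this example the simulated series is perfectly correlated with the observations yet strongly biased, which shows why R2 alone can mask systematic under- or overestimation.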
While traditional model performance indices offer a general assessment of overall model accuracy, they often fail to reveal specific limitations in predicting critical flow regimes like peak and low flows. To address this, we employed flow duration curves (FDCs) to visually evaluate the performance of all modeled outputs and observed flow. This approach leverages the inherent sensitivity of FDCs to changes in flow frequency and magnitude, allowing us to identify potential biases in model predictions at both extreme and intermediate flow levels. To specifically assess model performance at extreme events, we zoomed in on the far ends of the FDC plots, enabling a close-up examination of peak and low flow predictions.
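A flow duration curve and its extreme segments can be derived as in the sketch below. It assumes the Weibull plotting position, rank / (n + 1), for exceedance probability; the paper does not state which plotting position was used, and the function names and example flows are hypothetical.

```python
import numpy as np

def flow_duration_curve(flows):
    """Return (exceedance probability in %, flows sorted from high to low)."""
    q = np.sort(np.asarray(flows, dtype=float))[::-1]
    rank = np.arange(1, len(q) + 1)
    exceedance = 100.0 * rank / (len(q) + 1)   # Weibull plotting position (assumed)
    return exceedance, q

def fdc_segment(flows, lower_pct, upper_pct):
    """Extract the part of the FDC between two exceedance percentages,
    e.g. 0-10% for peak flows or 90-100% for low flows."""
    p, q = flow_duration_curve(flows)
    mask = (p >= lower_pct) & (p <= upper_pct)
    return p[mask], q[mask]

# Hypothetical daily flows.
flows = np.array([3.0, 9.0, 1.0, 7.0, 5.0, 2.0, 8.0, 4.0, 6.0])
p_all, q_all = flow_duration_curve(flows)
p_peak, q_peak = fdc_segment(flows, 0.0, 10.0)
p_low, q_low = fdc_segment(flows, 90.0, 100.0)
```

Applying the same functions to observed and modeled series and overlaying the peak and low-flow segments gives the close-up comparison of extremes described above.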
RESULTS AND DISCUSSION
SWAT model calibration and validation
Based on the widely used model performance rating by Moriasi et al. (2007), the Gapcheon watershed model achieved a ‘good’ rating based on its NSE values (Table 3). However, its performance varied between calibration and validation periods, receiving ‘satisfactory’ and ‘unsatisfactory’ ratings, respectively. In contrast, the Chogang watershed model's performance was ‘unsatisfactory.’ As discussed in other studies (Tigabu et al. 2023), the observed high inter-annual flow variability in this Korean watershed might be a contributing factor to the SWAT model's underperformance. The Melka Kunture watershed model exhibited the best performance, achieving a ‘generally good’ rating.
Watershed | NSE | PBIAS | R2 |
---|---|---|---|
Gapcheon | 0.72 (0.70) | 32.7 (23.8) | 0.72 (0.70) |
Chogang | 0.53 (0.37) | 52.4 (37.6) | 0.69 (0.46) |
Melka Kunture | 0.70 (0.68) | −3.3 (−19.8) | 0.74 (0.71) |
Values in parentheses are for the validation period.
Overall performance of hybrid models
Across study regions, performance differences between the standalone SWAT and LSTM models were negligible. The hybrid models demonstrated superior predictive accuracy compared to the standalone LSTM model. Notably, the Bflow-LSTM and BFI-LSTM models significantly outperformed the weather and streamflow-based LSTM predictions. This finding underscores the critical importance of incorporating watershed memory, particularly through baseflow and baseflow index, to enhance model robustness and ensure consistent performance across training and testing phases.
The hybrid models' performance in the South Korean watersheds showed NSE values ranging from 0.79 to 0.99 during training and from 0.55 to 0.98 during testing (Table 4). In the Ethiopian watershed, all tested hybrid models achieved an NSE of 0.99 during training, with values ranging from 0.98 to 0.99 in the testing period. Across all watersheds, Bflow-LSTM achieved the highest or joint-highest NSE in both periods.
Study area | Modeling scenario | Training NSE | Training PBIAS | Training R2 | Testing NSE | Testing PBIAS | Testing R2 |
---|---|---|---|---|---|---|---|
Gapcheon | LSTM | 0.57 | 30.2 | 0.54 | 0.32 | 36.8 | 0.27 |
Gapcheon | Bflow-LSTM | 0.98 | 6.8 | 0.98 | 0.98 | 15.2 | 0.98 |
Gapcheon | BFI-LSTM | 0.79 | −3.7 | 0.80 | 0.73 | 1.21 | 0.73 |
Gapcheon | SWAT-LSTM | 0.88 | 10.6 | 0.88 | 0.71 | 12.5 | 0.71 |
Chogang | LSTM | 0.71 | 31.5 | 0.73 | 0.43 | 27.4 | 0.46 |
Chogang | Bflow-LSTM | 0.99 | −1.8 | 0.99 | 0.97 | −4.1 | 0.98 |
Chogang | BFI-LSTM | 0.89 | 22.9 | 0.90 | 0.77 | 8.6 | 0.83 |
Chogang | SWAT-LSTM | 0.85 | 24.4 | 0.86 | 0.55 | 4.2 | 0.57 |
Melka Kunture | LSTM | 0.89 | −5.0 | 0.89 | 0.82 | 8.6 | 0.82 |
Melka Kunture | Bflow-LSTM | 0.99 | 2.9 | 0.99 | 0.99 | 6.2 | 0.99 |
Melka Kunture | BFI-LSTM | 0.99 | −1.4 | 0.99 | 0.98 | 5.7 | 0.97 |
Melka Kunture | SWAT-LSTM | 0.99 | 3.9 | 0.99 | 0.99 | 4.4 | 0.99 |
While the standalone LSTM served as the primary benchmark for evaluating hybrid model performance, comparisons with the standalone SWAT model further revealed significant improvements achieved by the SWAT residual error-based hybrid models. This enhancement was particularly noteworthy in the Korean watersheds, where the standalone SWAT model exhibited limitations. The hybrid SWAT-LSTM model demonstrably improved core streamflow prediction metrics (NSE and PBIAS) in the Korean watersheds.
Although the hybrid SWAT-LSTM substantially improved on the standalone SWAT model in the Korean watersheds, it achieved near-perfect performance only in the Melka Kunture watershed. This exceptional outcome might be directly linked to the standalone SWAT model's initial performance in this specific case. While other studies (Shen et al. 2022) suggest equivalent performance improvements from hybridizing machine learning models regardless of process-based model calibration, our findings hint at a potential influence of calibration performance on hybrid model outcomes. Further investigation is warranted to explore this possible connection.
Hybrid models’ adaptability to streamflow extremes
The Bflow-LSTM model significantly outperformed other models in peak flow prediction across all study watersheds. Interestingly, the SWAT model achieved performance comparable to the hybrid modeling approaches at low-flow prediction. This finding aligns with previous studies, such as Kim et al. (2021), who reported that machine learning models often excel at peak flow prediction while process-based models are better suited for low-flow simulations. Notably, each model exhibited significantly different performance at extremely low-flow prediction, with the watershed memory-based models consistently overestimating extreme low flow in all watersheds.
Recent studies (Vinuesa et al. 2020) have underscored the potential of artificial intelligence in advancing various sustainable development goals (SDGs). The environmental-related SDGs are SDG 13 (climate action), SDG 14 (life below water), and SDG 15 (life on land) (United Nations Development Programme 2016). This research aligns directly with these broader initiatives, illuminating the performance of various modeling approaches in capturing extreme events. Improved prediction of low flows enables more effective management of water resources during droughts, ensuring equitable access to safe water for all. Furthermore, understanding how models handle extreme events facilitates the development of strategies for flood preparedness and mitigation, safeguarding communities and infrastructure from the impacts of climate change.
Limitations and outlook
The focus on the popular residual error approach in this study, while motivated by its widespread application, restricted the exploration of other promising hybrid techniques. This limited scope prevented an in-depth investigation of incorporating additional variables, feeding process-based model outputs directly into machine learning algorithms, and even replacing specific modules within the models themselves (Bhasme et al. 2022; Cho & Kim 2022; Feng et al. 2022; Liu et al. 2022; Yu et al. 2023). Additionally, the study solely explored baseflow-based memory techniques, overlooking the potential of other intermediate process outputs such as evapotranspiration that could further enhance model performance. Therefore, this study can serve as a valuable springboard for future research to delve into these untapped avenues.
Unlocking the full potential of hybrid hydrological models demands investigating promising alternatives beyond the residual error approach. Can incorporating residual error as a predictor refine process-based simulations, or should it be included as an input variable? Similarly, does a modular approach using watershed memory terms, encompassing both surface runoff and baseflow, outperform individual variable inputs? Evaluating the efficacy of other deep learning models within this framework also presents an intriguing avenue for future research. Answering these seemingly simple questions will unlock a new era of efficiency and accuracy in hybrid machine learning prediction studies, paving the way for more robust and adaptable hydrological models.
CONCLUSIONS
This study examined the efficacy of two prevalent streamflow prediction hybridization techniques, residual error-based and watershed memory-based methods, across two geographically and hydrometeorologically diverse regions. While the former demonstrably elevates accuracy by capitalizing on comprehensive process-based modeling, it incurs substantial computational expense. Conversely, watershed memory-based approaches present a computationally efficient alternative, though their competitive edge against the more data-intensive residual error methods and the rationale for selecting one over the other remain unclear. To bridge this knowledge gap and inform best practices, this study investigated three key questions:
Which hybridization methodology demonstrates superior overall performance?
Do these techniques significantly enhance forecasts of specific hydrological extremes (low flow, peak flow)?
Can alternative watershed memory terms surpass the efficacy of the traditional baseflow data-based hybridization approach?
Employing a methodological framework that incorporated baseflow analysis, watershed process modeling using SWAT, and the development of standalone and hybrid LSTM models, our study yielded key insights. First, the Bflow-LSTM model, leveraging baseflow data, achieved the highest or joint-highest NSE and R2 across all watersheds during training and testing. Notably, it significantly improved extreme event predictions, particularly peak flows, relative to the standalone LSTM. Second, while its overall performance remained modest, the standalone SWAT model performed comparably well for low-flow events. This highlights the continued relevance of process-based models in capturing low-flow dynamics. Third, our exploration of alternative memory terms yielded nuanced results. Although the BFI-LSTM model, which uses the baseflow index as memory, showed promising performance, it did not consistently surpass the Bflow-LSTM model across all watersheds and metrics.
The choice between watershed memory and residual error-based hybrid models may depend on the specific study focus. In the context of drought-related or low-flow studies, the standalone SWAT model may be a well-suited tool. However, for peak flow or flood events, the superior performance of the Bflow-LSTM model makes it a valuable tool for hydrological studies and disaster management applications.
ACKNOWLEDGEMENTS
This study was supported by the Surface Soil Conservation and Management (SS) projects, funded by the Ministry of Environment (MOE) of Korea, under grant number 2019002820003, and by a National Research Foundation of Korea (NRF) grant, which is funded by the Korea government (MSIT), with the grant number 2022R1F1A1073748.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.