ABSTRACT
Accurate hydrological modeling is essential for understanding and managing water resources. This study conducts a comparative analysis of hydrological modeling strategies in a data-scarce region. It examines lumped (IHACRES), semi-distributed (HEC-HMS), and hybrid-lumped/long short-term memory (LSTM) models to assess their performance and accuracy. It investigates whether lumped models can accurately simulate flow and evaluates whether combining lumped models with machine learning enhances accuracy relative to semi-distributed models. The IHACRES model underestimates discharge, but its commendable NSE during calibration (0.628) and validation (0.681) indicates a reasonably reliable simulation. The HEC-HMS model accurately depicts daily streamflow but struggles with extreme events, showing limitations in predicting maximum flows. The hybrid-lumped/LSTM model exhibits improved accuracy over IHACRES. Despite some underestimation, it mitigates IHACRES limitations during extreme events. However, challenges persist in simulating high flows, emphasizing the necessity for further refinement. The findings contribute to the discourse on merging machine learning with traditional hydrological models in data-scarce regions. The hybrid model offers promise but underscores the need for ongoing research to optimize performance, especially during extreme events. This study provides valuable insights for advancing hydrological modeling capabilities in complex watersheds.
HIGHLIGHTS
Introducing a novel approach that combines traditional lumped models with Long Short-Term Memory (LSTM) machine learning for improved accuracy.
Addressing challenges of modeling in data-scarce regions, demonstrating robust performance with minimal input data.
Application of models in a complex watershed scenario, reflecting practical implications for water resource management.
Comprehensive evaluation of lumped (IHACRES), semi-distributed (HEC-HMS), and hybrid lumped/LSTM models, highlighting their strengths and weaknesses.
Integration of machine learning techniques to enhance the accuracy and reliability of hydrological models in challenging environmental conditions.
INTRODUCTION
Rapid urbanization, industrialization, and associated factors such as deforestation and land cover changes necessitate accurate hydrological modeling to understand the evolving relationships between water dynamics and the environment across the various phases of the hydrological cycle (Devi et al. 2015). The main objective of hydrological modeling is to replicate flows with minimal error, and a proficient model is expected to remain robust under changing watershed conditions (Aqnouy et al. 2023). Hydrological models aid in decision-making, especially in situations with limited data and incomplete comprehension of a hydrological system (Duan et al. 2019; Ávila et al. 2022). Accurate hydrological modeling is crucial for effective water resource management and climate change adaptation, especially as populations increase and urbanization accelerates, making an understanding of water availability and flow dynamics increasingly important. Hydrological models not only aid in managing water resources but also play a vital role in predicting and mitigating the impacts of climate change, such as altered precipitation patterns and more frequent extreme weather events (IPCC 2021). These models provide insights into how changes in land use, vegetation, and climate affect the water cycle, thereby assisting policymakers and water resource managers in making informed decisions (Bates et al. 2008).
Rainfall–runoff models can be categorized as either lumped or distributed models, depending on how the model parameters vary with spatial considerations (Devi et al. 2015). Lumped models consider the entire river basin as a single unit, disregarding spatial variability, while distributed models divide the catchment into small units to account for spatial variations in parameters and outputs (Moradkhani & Sorooshian 2008). In addition, machine learning models provide an efficient, data-driven approach with reduced data requirements and time demands compared to hydrological models (Mosavi et al. 2018; Wen & Feng 2023). Long short-term memory (LSTM) has gained significant popularity in machine learning competitions due to its speed, efficiency, and scalability and is used for different hydrological purposes such as flood susceptibility mapping (Fang et al. 2021), groundwater level prediction (Solgi et al. 2021), and water level prediction (Cho et al. 2022). LSTM has also been widely used for continuous runoff simulation in watersheds with abundant data availability (Ni et al. 2020; Xiang et al. 2020; Ren et al. 2022). However, using machine learning models exclusively for streamflow simulation has drawbacks, including a tendency to neglect the physical aspects of the rainfall–runoff process (Mohammadi et al. 2022). Consequently, the fusion of hydrologic models and machine learning models has gained traction among hydrologists in recent years.
Recent hydrological studies have used lumped models such as the Hydrologiska Byråns Vattenbalansavdelning model (HBV) and the Identification of Hydrological Catchment Response and Simulation model (IHACRES), and semi-distributed models such as the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS). For instance, Shakarneh et al. (2022) used HEC-HMS to simulate 20 rainfall–runoff events in two Palestinian catchments. The calibrated HEC-HMS model showed good performance for short-term flow forecasting in similar environments, aiding water resource managers in predicting and mitigating flood risks under future climatic scenarios. Esmaeili-Gisavandani et al. (2021) employed five hydrological models – Soil and Water Assessment Tool (SWAT), IHACRES, HBV, Australian Water Balance Model (AWBM), and Soil Moisture Accounting (SMA) – to simulate the Hablehroud River flow in north-central Iran. During calibration, SWAT, IHACRES, and HBV yielded satisfactory results. However, in the validation phase, only the SWAT model demonstrated excellent performance, surpassing the other models. Sorman et al. (2020) evaluated HEC-HMS and HBV-light for the water years from 2008 to 2015 in the mountainous headwaters of the Aras Basin in eastern Turkey. The findings indicated that HEC-HMS exhibited superior performance compared to HBV-light.
In poorly gauged areas, hydrological modeling becomes difficult, prompting researchers to employ various techniques to enhance model accuracy. For instance, Rahman et al. (2020) utilized two merged precipitation datasets (MPDs) to predict daily streamflow using the SWAT in the Potohar Plateau, Pakistan. The study concluded that MPDs combine the strengths of individual satellite precipitation datasets and show greater potential for hydrological applications. Brocca et al. (2020) examined river flow prediction in regions with limited data, particularly focusing on West Africa. They discovered that incorporating satellite rainfall data with soil moisture measurements yielded more accurate predictions of river flow compared to relying solely on rain gauge observations. Akhtar et al. (2021) utilized remote sensing in conjunction with the SWAT to simulate the effects of climate change on water resources in the Kabul River Basin (KRB). They derived many of the biophysical parameters necessary for the SWAT model from remote sensing algorithms. Their findings suggest that integrating remote sensing data with the SWAT model can be a valuable approach to facilitate sustainable management and strategic planning of water resources in the KRB. Furthermore, many other studies used a combination of hydrological models and machine learning to increase the accuracy of their hydrological simulation. For example, Young & Liu (2015) and Narayana Reddy & Pramada (2022) integrated HEC-HMS with an artificial neural network (ANN) to forecast runoff. Their developed hybrid model, HEC-HMS–ANN, showed improved accuracy in predicting runoff outcomes. Achite et al. (2022) applied the Modello Idrologico SemiDistribuito in continuo (MISD) model, a conceptual hydrological model, along with the group method of data handling to simulate daily streamflow in the Kalixälven River basin in northern Sweden. Their findings revealed that integrating meteorological variables into the existing conceptual hydrological model with parallel settings enhances the accuracy of streamflow simulation using deep learning models.
This research explores the effectiveness of combining machine learning with traditional hydrological models in a region with limited data. Conducting the study in a data-scarce region, where hydrological modeling is particularly challenging due to the lack of extensive data, distinguishes it from studies carried out in data-rich environments and underscores the unique challenges of accurately modeling water dynamics there. The study provides a thorough assessment of model accuracy, incorporating both quantitative metrics and qualitative evaluations of each model's performance, especially during extreme events. While previous studies may have evaluated similar models, this research examines their performance specifically under limited data availability, which affects model accuracy differently. Therefore, the objective of this study is to conduct a comparative analysis of a lumped model, a combined lumped and LSTM model requiring minimal input data (limited to precipitation and temperature), and a semi-distributed model. The primary focus is on whether lumped models can accurately simulate flow in a data-scarce region and on how combining lumped models with machine learning affects accuracy compared to semi-distributed models.
The rest of the paper is organized as follows. In Section 2, detailed information about the study area, including its geographical features and the dataset used, is provided. Section 3 discusses the specific hydrological models employed in our analysis, including lumped, semi-distributed, and hybrid models, outlining their methodologies and parameters. The results of the calibration and validation processes for each model are presented in Section 4, which evaluates how accurately each model simulates streamflow and includes an in-depth discussion of the strengths and limitations of each model. Finally, the conclusions section discusses the implications of our findings for water resource management, including how these models can assist in decision-making and adaptation strategies. The need for further research to improve the accuracy and applicability of hydrological models in addressing the challenges posed by urbanization and climate change is also highlighted.
MATERIALS AND METHODS
Study area and dataset
Located in the western part of Iran, the Kashkan watershed is a notable region renowned for its rivers. Over time, the presence of these rivers has led to the establishment of numerous settlements along the watercourses in the province. The watershed covers 9,560 km², and the Kashkan River has an average annual discharge of 52 m³/s, corresponding to a total annual discharge volume of about 1.636 billion m³. On April 1, 2019, the Kashkan River experienced an unprecedented event, with a peak flow recorded at approximately 5,232 m³/s at the watershed's outlet. This extraordinary occurrence resulted in severe damage to a significant number of residential units, roads, agricultural lands, and critical infrastructure, causing substantial human and financial losses.
Geographical location of the study area and location of hydrometeorological stations.
METHODOLOGY
Lump model


Hybrid model (IHACRES/LSTM)




An essential phase in preparing input variables for a machine learning model involves screening potential factors, coupled with the runoff output generated by IHACRES. Commonly considered variables include rainfall, maximum temperature, minimum temperature, and evapotranspiration. In alignment with the study's goals and the data accessibility in the region, this research integrates precipitation and average temperature, along with the predicted runoff output from IHACRES.
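The input preparation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `make_lstm_samples` and the `lookback` window length are assumptions, since the text does not report how many past days are fed to the LSTM.

```python
from typing import List, Sequence, Tuple


def make_lstm_samples(
    precip: Sequence[float],
    temp: Sequence[float],
    ihacres_q: Sequence[float],
    observed_q: Sequence[float],
    lookback: int = 3,  # assumed window length; not reported in the text
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (samples, timesteps, features) windows for a sequence model.

    Features per day: precipitation, average temperature, IHACRES runoff.
    The target of each window is the observed flow on its last day.
    """
    n = len(observed_q)
    if not (len(precip) == len(temp) == len(ihacres_q) == n):
        raise ValueError("all input series must have the same length")
    if lookback < 1 or lookback > n:
        raise ValueError("lookback must be between 1 and the series length")

    # One feature row per day: [P, T, Q_IHACRES]
    features = [list(row) for row in zip(precip, temp, ihacres_q)]

    samples, targets = [], []
    for end in range(lookback, n + 1):
        samples.append(features[end - lookback:end])
        targets.append(observed_q[end - 1])
    return samples, targets
```

Each sample is a `lookback` × 3 block, which matches the three-feature layout that a two-layer LSTM would receive; in practice the features would usually be scaled before training.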
HEC-HMS model
Basin model
Canopy (simple canopy)
Each sub-basin includes a canopy component that represents the vegetation in the region. The HEC-HMS simple canopy method, intended primarily for continuous simulations, lets rainfall accumulate until the canopy storage capacity is reached (Ouédraogo et al. 2018). The method requires two parameters: initial storage (%) and maximum storage (mm). Initial storage was set to zero (dry leaves and tree trunks at the beginning of modeling), evaporation and transpiration were calculated in both dry and wet periods, and the simple absorption method was used. Initial values of maximum storage were calculated from the land use map and the coefficients suggested by USACE.
Surface (simple surface)
This method offers a simple representation of the top layer of soil, where rainfall accumulates until reaching the surface's maximum capacity. Infiltration is possible even before reaching full capacity, but if precipitation exceeds the infiltration rate, surface runoff occurs. This method requires two parameters: initial storage (%) and maximum storage (mm). We assumed that all land is dry at the start of the simulation (initial storage = 0) and estimated the maximum storage from the slope; more detail can be found in Fleming & Neary (2004).
Routing (Muskingum method)
The flood wave velocity (Vw), considered as 1.5 times the average velocity, and the reach length (L) are calculated using data obtained from stream gauging sites (Hamdan et al. 2021).
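A short sketch of how these quantities typically enter Muskingum routing is given below. It assumes the standard relation K = L / Vw for the travel time and the textbook Muskingum coefficient formulas; the weighting factor X and all function names are illustrative assumptions, not values or code from the study.

```python
from typing import List, Sequence, Tuple


def muskingum_k_hours(reach_length_m: float, mean_velocity_ms: float,
                      wave_factor: float = 1.5) -> float:
    """Travel time K (hours), with wave velocity Vw = wave_factor * mean velocity."""
    wave_velocity = wave_factor * mean_velocity_ms
    return reach_length_m / wave_velocity / 3600.0


def muskingum_coefficients(k_hours: float, x: float,
                           dt_hours: float) -> Tuple[float, float, float]:
    """Standard Muskingum coefficients; they always sum to 1."""
    denom = 2.0 * k_hours * (1.0 - x) + dt_hours
    c0 = (dt_hours - 2.0 * k_hours * x) / denom
    c1 = (dt_hours + 2.0 * k_hours * x) / denom
    c2 = (2.0 * k_hours * (1.0 - x) - dt_hours) / denom
    return c0, c1, c2


def route(inflow: Sequence[float], k_hours: float, x: float,
          dt_hours: float) -> List[float]:
    """Route an inflow hydrograph: O[t] = c0*I[t] + c1*I[t-1] + c2*O[t-1]."""
    c0, c1, c2 = muskingum_coefficients(k_hours, x, dt_hours)
    outflow = [float(inflow[0])]  # assume the reach starts in steady state
    for i in range(1, len(inflow)):
        outflow.append(c0 * inflow[i] + c1 * inflow[i - 1] + c2 * outflow[-1])
    return outflow
```

For example, a hypothetical 54 km reach with a mean velocity of 1 m/s gives Vw = 1.5 m/s and K = 10 h.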
Loss (SMA)
The SMA accurately represents soil moisture dynamics within the watershed, capturing spatial and temporal variations under different climatic conditions. It integrates multiple factors including soil properties, land cover, and climate data, providing a comprehensive understanding of soil moisture influences. Additionally, its flexibility allows adaptation to different geographic areas and land use scenarios, making it versatile for various applications. The method's utility extends to water resources management, aiding in water availability assessment, runoff prediction, and decision-making. The SMA method replicates the flow and retention of water within the soil profile and across various groundwater layers (Leavesley et al. 1983). The upper threshold for the rate at which water enters the soil from surface storage, known as the maximum infiltration rate, was established by analyzing the soil in the catchment, reflecting the saturated hydraulic conductivity (Saxton & Willey 2005). The impervious area was identified as the percentage of the urban area, and these values were calculated based on a land use/land cover map. The determination of storage coefficients and depths for GW1 and GW2 relied on an analysis of historical flow data related to streamflow recession (Ouédraogo et al. 2018). The average hydraulic conductivity of all sub-basins was selected as the percolation rate for both soil and the first groundwater layer (GW1) (Ouédraogo et al. 2018). The starting values for soil water storage, defined as porosity, tension storage representing the field capacity of the soil, and the percolation rate of GW2 were sourced from the Davtalab et al. (2017) study, where they incorporated the Kashkan watershed as one of their sub-basins.
Base flow (recession)
The recession base flow technique aims to mimic the common patterns observed in watersheds as channel flow gradually diminishes in an exponential manner. Due to varying water withdrawals in the main stream and sub-basin rivers, the base flow of the Kashkan watershed exhibits nonlinear behavior, making recession the most suitable method for simulating base flow in this study. The starting base flow (in cubic meters per second) needs to be defined at the start of a simulation. The recession constant indicates the pace at which base flow diminishes between storm events, and by using the parameter of the ratio to peak, we can determine the approach for resetting the base flow during storm events. Those parameters were estimated by analysis of historical flow data.
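The exponential recession and the ratio-to-peak reset can be illustrated with a minimal sketch. It assumes the usual form Q_t = Q_0 · k^t with t in days; the function names and the example values are hypothetical, not calibrated values from the study.

```python
from typing import List


def recession_baseflow(initial_flow: float, recession_constant: float,
                       days: int) -> List[float]:
    """Baseflow series Q_t = Q_0 * k**t for t = 0 .. days-1 (flows in m3/s)."""
    if not 0.0 < recession_constant <= 1.0:
        raise ValueError("recession constant must be in (0, 1]")
    return [initial_flow * recession_constant ** t for t in range(days)]


def reset_threshold(peak_flow: float, ratio_to_peak: float) -> float:
    """Flow on the falling limb at which recession baseflow is restarted."""
    return ratio_to_peak * peak_flow
```

With a hypothetical starting baseflow of 20 m³/s and k = 0.9, the baseflow drops to 18 m³/s after one day and 16.2 m³/s after two; with a ratio to peak of 0.1, a 500 m³/s storm peak would hand control back to the recession curve once the falling limb reaches 50 m³/s.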
Transform (Clark unit hydrograph)
Meteorological model
Here, c represents a coefficient, N denotes the number of daylight hours, and Pt stands for the saturated water vapor density at the daily mean temperature. According to the HEC-HMS User Manual (U.S. Army Corps of Engineers 2018), the only input for this method in HEC-HMS is the coefficient (mm/g/m³); its initial value of 0.1651 mm/g/m³ was then adjusted during calibration.
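As an illustration of how c, N, and Pt combine, the sketch below uses a commonly cited form of the Hamon equation, PET = c · (N/12) · Pt, with Pt derived from the saturation vapor pressure. The exact form used inside HEC-HMS is an assumption here, as are the function names.

```python
import math


def saturated_vapor_density(temp_c: float) -> float:
    """Saturated water vapor density Pt (g/m3) at the daily mean temperature."""
    # Saturation vapor pressure in mb (a common Tetens-type approximation)
    e_sat = 6.108 * math.exp(17.26939 * temp_c / (temp_c + 237.3))
    return 216.7 * e_sat / (temp_c + 273.3)


def hamon_pet(temp_c: float, daylight_hours: float,
              coefficient: float = 0.1651) -> float:
    """Potential evapotranspiration (mm/day), assuming PET = c * (N/12) * Pt."""
    return coefficient * (daylight_hours / 12.0) * saturated_vapor_density(temp_c)
```

At 20 °C with 12 daylight hours and the initial coefficient of 0.1651, this form gives roughly 2.9 mm/day; lowering the coefficient toward the calibrated range scales the estimate proportionally.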
Because no snow water equivalent time series was available, the temperature index method was used for snow modeling. This degree-day approach assumes a fixed amount of snowmelt for each degree above freezing. The lapse rate represents the temperature change per 1,000 m of elevation and is typically −6.5 °C (Muralikrishna & Manickam 2017). The PX temperature differentiates rain from snow: any precipitation that occurs when the air temperature is below this threshold is assumed to be snow. According to field surveys, this value is 1.5 °C. The base temperature is set to 0 °C, meaning that no melt occurs when the air temperature is below the base temperature. The remaining parameters are the wet melt rate (mm/°C-day) and the dry melt rate (mm/°C-day), whose initial values were taken from Davtalab et al. (2017) and then calibrated.
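The degree-day logic can be sketched in a few lines. This is a simplified illustration rather than the HEC-HMS algorithm: the rule that the wet melt rate applies on any day with rain, the default melt rates, and the function names are all assumptions.

```python
from typing import Tuple


def adjust_temperature(station_temp_c: float, station_elev_m: float,
                       target_elev_m: float,
                       lapse_c_per_km: float = -6.5) -> float:
    """Shift station temperature to another elevation using the lapse rate."""
    return station_temp_c + lapse_c_per_km * (target_elev_m - station_elev_m) / 1000.0


def snow_step(swe_mm: float, precip_mm: float, temp_c: float,
              px_temp_c: float = 1.5, base_temp_c: float = 0.0,
              wet_melt_rate: float = 2.0,  # mm/degC-day, placeholder value
              dry_melt_rate: float = 2.0,  # mm/degC-day, placeholder value
              ) -> Tuple[float, float, float]:
    """Advance the snowpack by one day.

    Returns (new snow water equivalent, rainfall, melt), all in mm.
    """
    # Precipitation below the PX temperature is treated as snow
    if temp_c < px_temp_c:
        snowfall, rainfall = precip_mm, 0.0
    else:
        snowfall, rainfall = 0.0, precip_mm
    swe_mm += snowfall

    # Assumption: use the wet melt rate whenever rain falls on the pack
    rate = wet_melt_rate if rainfall > 0.0 else dry_melt_rate
    potential_melt = rate * max(temp_c - base_temp_c, 0.0)
    melt = min(potential_melt, swe_mm)  # cannot melt more than is stored
    return swe_mm - melt, rainfall, melt
```

For instance, a 10 mm pack on a dry day at 3 °C with a dry melt rate of 2 mm/°C-day loses 6 mm, leaving 4 mm of snow water equivalent.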
Model evaluation
RESULTS AND DISCUSSION
Models calibration and validation
HEC-HMS model
For calibration purposes, we utilized the inherent calibration functions in the software, which include Percentage Bias and Peak Weighted RMSE. Additionally, considering the multitude of parameters in HEC-HMS and drawing insights from previous research (Fleming & Neary 2004; Bhuiyan et al. 2017; Dariane et al. 2019), we also incorporated a manual calibration approach. This involved calibrating the discharge for each hydrometric station from the upper sub-basins to the outlet for the calibration period. Table 1 illustrates the parameter value ranges employed for sub-basins throughout the calibration process.
The range of parameter values pertaining to the sub-basins during the calibration process
Sub-models | Parameters | Definition | Range for calibrated values |
---|---|---|---|
Canopy method | Max storage (mm) | Amount of water that can be held on leaves | 1–1.70 |
Surface storage | Max storage (mm) | Amount of water that can be held on the soil surface | 10–25 |
Loss method | Max infiltration | Upper bound on infiltration from the surface to the soil | 0.8–1.2 |
Loss method | Impervious | Land surface that does not allow water to infiltrate | 8–20 |
Loss method | Soil storage | Total storage available in the soil | 40–120 |
Loss method | Tension storage | Quantity of water that remains stationary | 10–30 |
Loss method | Soil percolation | Upper bound on percolation from the soil storage into the upper groundwater | 10–30 |
Loss method | GW1 storage | Total storage in the upper groundwater layer | 20–80 |
Loss method | GW1 percolation | Percolation from the upper groundwater into the lower groundwater | 0.05–10 |
Loss method | GW1 coefficient | Delay in transforming stored water to lateral outflow in a linear reservoir | 150–800 |
Loss method | GW2 storage | Total storage in the lower groundwater | 2–40 |
Loss method | GW2 percolation | Upper bound on deep percolation out of the system | 1–10 |
Loss method | GW2 coefficient | Delay in transforming stored water to lateral outflow in a linear reservoir | 250–900 |
Transform | Time of concentration | Maximum travel time in the sub-basin | 10–80 |
Transform | Storage coefficient | Accounts for storage effects | 25–100 |
Baseflow | Recession constant | Rate at which baseflow recedes between storm events | 0.85–0.995 |
Baseflow | Ratio | Proportion of groundwater discharge that is attributed to baseflow | 0.03–0.30 |
Evapotranspiration | Hamon coefficient | Used to estimate potential evapotranspiration | 0.10–0.16 |
Snowmelt | PX temperature | Discriminates between precipitation falling as rain or snow | 1–1.5 |
Snowmelt | Wet melt rate | The rate at which snowpack melts when it is raining | 1–4.5 |
Snowmelt | Dry melt rate | The rate at which snowpack melts when it is not raining | 1–4.5 |
Table 2 presents an evaluation of the model's performance using three distinct metrics. During calibration, the values of R², NSE, and AEPF1y were 0.859, 0.696, and 15.21%, respectively; in the validation phase, they were 0.916, 0.831, and 23.62%. The AEPF1y values indicate a notable discrepancy in peak flows and underscore the model's limitations in accurately simulating high flows.
Accuracy assessment of HEC-HMS
Model | Period | R² | NSE | AEPF1y (%) |
---|---|---|---|---|
HEC-HMS | Calibration | 0.859 | 0.696 | 15.21 |
HEC-HMS | Validation | 0.916 | 0.831 | 23.62 |
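For reference, two of the reported metrics can be computed as in the sketch below, which uses the standard definitions of the Nash–Sutcliffe efficiency and of R² as the squared Pearson correlation. It covers only R² and NSE, and the function names are illustrative.

```python
import math
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def r_squared(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in simulated))
    return (cov / (std_obs * std_sim)) ** 2
```

The two metrics can diverge: a simulation that is perfectly correlated with the observations but biased in magnitude keeps R² at 1 while its NSE drops, which is one reason both are reported together.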
IHACRES
The IHACRES model was used as the lumped hydrological model in this study. It was calibrated with the grid search function built into the IHACRES 2.1.2 software, in which a search range is specified for each parameter. The calibrated values of IHACRES are presented in Table 3.
The optimal values during the calibration process for the IHACRES model
Parameters | Description | Optimal value |
---|---|---|
τq | Time constant governing the rate of recession of quick flow (day) | 8.229 |
τs | Time constant governing the rate of recession of slow flow (day) | 278.933 |
vs | The proportion of slow flow to total flow (proportion) | 0.294 |
tw | Drying rate at reference temperature (day) | 7 |
f | Temperature dependence of drying rate (°C−1) | 4 |
C | Mass balance term (mm) | 0.0015 |
Figure 4 presents the hydrograph produced by the IHACRES model. Throughout the entire period from 1997 to 2019, the IHACRES model consistently underestimates the discharge in comparison to the observed data. The underestimation is more prominent during the validation period from January 1, 2013, to August 31, 2019. Specifically, during the flood event on April 1, 2019, where the observed data records a flow of 5,237 m³/s, IHACRES generates a lower flow of 3,978 m³/s. For the IHACRES model, although overestimation persists in flow simulation during dry seasons, the error tends to be lower compared to the HEC-HMS model. For instance, in the summer of 2017, the average observed flow was 7.1 m³/s, while the simulation showed a slightly higher value of 8.7 m³/s.
Table 4 offers a comprehensive assessment of the IHACRES model's performance across two distinct periods, employing three key metrics. The NSE indicates commendable performance during both calibration (0.628) and validation (0.681), with the latter period showing a slight improvement; the corresponding R² values are 0.795 and 0.826. The AEPF1y values of 43.63% and 56.59% during calibration and validation, respectively, highlight the model's inability to reproduce high flows, to which the higher error in observation data during extreme weather events may also contribute.
Accuracy assessment of the IHACRES model
Model | Period | R² | NSE | AEPF1y (%) |
---|---|---|---|---|
IHACRES | Calibration | 0.795 | 0.628 | 43.63 |
IHACRES | Validation | 0.826 | 0.681 | 56.59 |
Hybrid model
The inputs to the LSTM component of the hybrid model were the output runoff from IHACRES, daily precipitation, and average temperature. Grid search was used to find the best hyperparameters for the LSTM; the calibrated values are listed in Table 5.
The optimal values and states during the calibration process for the LSTM model
Parameters | Optimal value/state |
---|---|
Number of layers | 2 |
Number of neurons in the first layer | 150 |
Number of neurons in the second layer | 100 |
Activation function | ReLU |
Optimizer | Adam |
Loss | MSE |
Epochs | 200 |
Batch size | 10 |
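The grid search over LSTM hyperparameters can be pictured with the generic sketch below. It is not the authors' code: the candidate values and the `score_fn` callback are hypothetical, and in a real run `score_fn` would train the LSTM with the given settings and return its validation loss (for example, MSE).

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple


def grid_search(
    grid: Dict[str, Sequence[Any]],
    score_fn: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in `grid` and return the lowest-scoring one."""
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. validation MSE of a trained model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; the ranges actually searched are not reported.
example_grid = {
    "units_first_layer": [50, 100, 150],
    "units_second_layer": [50, 100],
    "batch_size": [10, 32],
}
```

Exhaustive search grows multiplicatively with each added hyperparameter (3 × 2 × 2 = 12 trainings for the example grid), which is why the metaheuristic optimizers mentioned under future research are sometimes preferred for larger search spaces.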
Table 6 displays the accuracy assessment metrics for the hybrid model. During the calibration period, all metrics showed slight improvements, but these were more substantial during the validation period. However, there was not a significant improvement in handling high flows, highlighting the hybrid model's inadequacy in simulating maximum flows.
Accuracy assessment of the hybrid model
Model | Period | R² | NSE | AEPF1y (%) |
---|---|---|---|---|
Hybrid model | Calibration | 0.803 | 0.644 | 38.70 |
Hybrid model | Validation | 0.955 | 0.909 | 45.23 |
CONCLUSION
This study undertook a comparative analysis between a hybrid-lumped and LSTM model, requiring minimal input data, and a semi-distributed model for hydrological modeling in the Kashkan watershed in western Iran. The research aimed to address the question of whether the hybrid-lumped/LSTM model could achieve comparable accuracy to a semi-distributed model, which requires a more extensive dataset.
The lumped model IHACRES, the semi-distributed model HEC-HMS, and the hybrid-lumped/LSTM model were calibrated and validated using various hydrological metrics. The HEC-HMS model exhibited accurate simulation of daily streamflow but showed limitations in accurately predicting high flows, particularly during extreme events such as the flood on April 1, 2019. The IHACRES model underestimated discharge and performed poorly in simulating high flows.
The hybrid-lumped/LSTM model, incorporating machine learning techniques, demonstrated improved accuracy compared to IHACRES, particularly in the validation phase. The model addressed some of the limitations of IHACRES, reducing the underestimation of discharge, especially during extreme events. However, challenges remain in accurately simulating high flows, emphasizing the need for further refinement and exploration of hybrid modeling approaches.
In comparing our study with similar hydrological modeling research conducted in neighboring countries or regions, several trends and differences emerge. For instance, Prakasam et al. (2023) investigated hydrological modeling in a Himalayan catchment using the HEC-HMS model, which showed satisfactory performance in simulating daily streamflow; our study likewise achieved accurate daily streamflow simulation with HEC-HMS. Esmaeili-Gisavandani et al. (2021) focused on hydrological modeling in the Hablehroud River, north-central Iran, employing various models including IHACRES and HBV-light, both of which demonstrated good performance in simulating streamflow. However, both our study and Esmaeili-Gisavandani et al. (2021) identified challenges in accurately predicting high-flow events. Garee et al. (2017) found limitations in the SWAT model's ability to accurately simulate high flows. They conducted a case study in the glacierized Hunza catchment and observed consistent underestimation of discharge by the SWAT model, particularly during high-flow events. Sorman et al. (2020) studied the hydrological modeling of the upper Aras Basin, Turkey, using HBV-light and also encountered challenges in modeling high flows. Narayana Reddy & Pramada (2022), who combined HEC-HMS and ANN for flow simulation in an Indian watershed, also observed that their hybrid model underestimated extreme high flows, in line with our findings. In summary, employing lumped models for hydrological modeling in data-scarce regions can result in inaccuracies in high-flow modeling. Although hybrid models improve the overall performance of hydrological models, inaccuracies in simulating high flows may persist.
These comparisons highlight commonalities in modeling approaches, such as the use of meteorological and topographic data, but also underscore differences in model performance, particularly in capturing extreme precipitation events. By understanding these trends, our study contributes to the broader understanding of hydrological dynamics in the region and emphasizes the need for improved modeling approaches to support water resource management.
Discussing the limitations of hydrological models such as HEC-HMS, HBV-light, and hybrid models requires considering various aspects of their structure, assumptions, and application. HEC-HMS poses several limitations, including its complexity in setup, necessitating detailed input data on topography, land use, soil properties, and meteorological data, which may not always be available or precise. Its accuracy heavily relies on parameterization, and poorly calibrated parameters can result in significant discrepancies between simulated and observed hydrographs. Additionally, its spatial and temporal resolution might not adequately capture small-scale variability or short-duration events, potentially leading to underestimation or overestimation of runoff. Errors in precipitation input, parameter uncertainties, and simplifications in the model structure further contribute to potential sources of error or uncertainty. Similarly, HBV-light simplifies hydrological processes such as snowmelt and soil moisture dynamics, which may not fully represent real-world conditions, especially in complex terrain or areas with non-uniform land cover. Its simple linear reservoir approach for channel routing may also not accurately capture the dynamics of fast-flowing events or flood routing. Errors can arise from inaccuracies in snowmelt estimation, soil moisture dynamics, and uncertainties in parameter values, particularly in regions with limited observational data. The hybrid model combining HBV-light and LSTM introduces complexity in model interpretation due to the interaction between the components and requires careful calibration and validation of both parts. Errors may emerge from the integration of HBV-light and LSTM components, inconsistent outputs, and limited transferability of LSTM models across different hydrological conditions and regions. 
Additionally, errors in input data quality, such as biases in meteorological data or inaccuracies in historical hydrological records, can affect the hybrid model's performance.
The findings of this study contribute to the ongoing discourse on the integration of machine learning techniques with traditional hydrological models. While the hybrid-lumped/LSTM model showed promise in enhancing the accuracy of hydrological simulations, further research is warranted to optimize the model's performance, especially in capturing extreme events. This study underscores the importance of considering both lumped and semi-distributed approaches, along with machine learning techniques, to advance hydrological modeling capabilities and improve decision-making in water resource management.
Overall, while the hybrid model offered some improvement over the individual models, all three models faced challenges in accurately simulating high-flow events. This highlights the limitations of these models in capturing the complex dynamics of watersheds susceptible to extreme weather events.
Future research directions:
Explore alternative modeling approaches, such as fully distributed models, that can incorporate additional spatial and physical characteristics of the watershed.
Investigate the integration of data assimilation techniques to improve model performance during high-flow events.
Employ more advanced machine learning models or hybrid approaches, potentially incorporating additional relevant data sources (e.g., remote sensing data) to enhance model accuracy.
Utilize more sophisticated optimization techniques, such as metaheuristic approaches, in hybrid and machine learning models to enhance result accuracy.
By addressing these limitations and exploring new avenues, future research can contribute to the development of more robust and reliable hydrological models for accurate streamflow simulation, particularly in watersheds prone to extreme weather events.
ACKNOWLEDGEMENT
We thank the Iran Meteorological Organization for providing the data used in this study.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.