ABSTRACT
Accurate streamflow simulation is crucial for effective hydrological management, especially in regions like the upper Baro watershed, Ethiopia, where data scarcity challenges conventional modeling approaches. This study evaluates the efficacy of three hydrological models in predicting runoff: the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS), an artificial neural network (ANN), and support vector regression (SVR). Using data from 2000 to 2016, the analysis focused on performance metrics including the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), and coefficient of determination (R2). The results indicated that the ANN model outperformed the others, achieving an NSE of 0.98, RMSE of 24 m3/s, and R2 of 0.99. In comparison, the HEC-HMS model yielded an NSE of 0.85, RMSE of 113.4 m3/s, and R2 of 0.89, while the SVR model yielded an NSE of 0.97, RMSE of 27 m3/s, and R2 of 0.99. These findings highlight the superior performance of ANN in regions with limited hydrological data, suggesting its potential as a reliable alternative to traditional physical models. By demonstrating the efficacy of machine learning models, this research paves the way for innovative approaches to water resource management, offering valuable insights for policymakers and practitioners.
HIGHLIGHTS
Accurate simulation of runoff is vital for effective water resource management.
HEC-HMS, a widely recognized physical model in hydrology, was compared with the ANN and SVR models.
The ANN model significantly outperformed the others in terms of prediction accuracy and consistency.
Data-driven models like ANN and SVR can support effective water resource management in regions facing data scarcity.
INTRODUCTION
Accurately modeling streamflow is of paramount importance in properly planning how to respond to flooding, drought, water allocation, and reservoir operation (Fan et al. 2020; Xiang et al. 2020). In reservoir irrigation planning with multi-objective optimization, accurate streamflow prediction plays a crucial role. It assists in evaluating water availability and making informed decisions regarding reservoir operations (Khazaipoul et al. 2019). The data on streamflow is essential for identifying optimal reservoir operation strategies, enabling efficient management of reservoirs across various scenarios to ensure water availability for purposes such as irrigation, hydropower generation, and environmental conservation (Hatamkhani et al. 2021). Additionally, accurate streamflow predictions are vital for assessing the impacts of climate change on crop patterns and resolving water allocation conflicts (Akbari et al. 2022). This information offers valuable insights into shifting water availability, facilitating stakeholder adaptation of agricultural practices, and effective allocation of water resources.
Most streamflow data come from riverside gauging stations, but access to these data is decreasing in several areas of the world (Sichangi et al. 2018). Tourian et al. (2013) used data from the Global Runoff Data Centre to investigate the pattern across stations with streamflow records. Their investigation showed a remarkable decline in the overall measured yearly runoff from 1970 to 2010. Additionally, malfunctioning gauging stations and inadequate streamflow observation exacerbate the situation in developing countries (Sichangi et al. 2016), including Ethiopia (Mekonnen et al. 2009). This constraint underlines how vital it is to identify an effective streamflow model.
In this regard, rainfall-runoff models, including conceptual, physical, and data-driven models, play a crucial role in streamflow simulation (Park & Markus 2014; Meng et al. 2016). While physically based models are essential for comprehensively understanding hydrological processes and the physical phenomena within a hydrological system, they do come with limitations (Sahoo et al. 2017; Yifru et al. 2024). Validating these models can be challenging in watersheds that lack adequate historical data (Yifru et al. 2024). A conceptual model simplifies the water movement in the hydrological cycle. It is easy to understand and needs less data, but it can be inaccurate due to oversimplification, may miss complexities, and relies on assumptions that might not be accurate.
The Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS) is a physically based model with several advantages over other hydrological models, particularly its applicability to varied hydrological problems and its adaptability to different environmental settings. The key strengths of HEC-HMS highlighted in this study are:
1. Physically based modeling: HEC-HMS uses detailed descriptions of physical hydrological processes, which makes its results more interpretable for hydrologists and water resource managers. This is particularly useful in studies where understanding the interaction of various hydrological processes is crucial (Chekole et al. 2024; Hassaan et al. 2024).
2. Flexibility and extensibility: HEC-HMS offers extensive options for customization and extension, including the ability to integrate user-defined components. This flexibility allows it to be adapted for a wide range of applications, from urban drainage to agricultural and large-scale river basin management (Rauf & Ghumman 2018).
3. Data integration: Unlike some machine learning (ML) models that require large datasets for training, HEC-HMS can be effectively used with limited data by incorporating physically based parameters that are often known or can be estimated from typical watershed characteristics (Guduru et al. 2022; Khaira 2024).
4. Wide acceptance and support: As a tool developed by the US Army Corps of Engineers, HEC-HMS benefits from widespread recognition and a robust support network (Peker et al. 2024). This includes comprehensive documentation and a large community of users, which facilitates knowledge sharing and technical support.
5. Educational utility: Due to its detailed representation of hydrological processes and open availability, HEC-HMS is widely used in academic settings for educational purposes, helping new hydrologists understand fundamental concepts through practical application (Hamdan et al. 2021).
6. Cost-effectiveness: HEC-HMS is freely available, which reduces the barriers to entry for its use, especially in developing countries or among researchers with limited funding.
To further explore these advantages, this study contrasts the performance of HEC-HMS with ML models such as artificial neural network (ANN) and support vector regression (SVR). While ML models may excel in predictive accuracy, especially in data-rich environments, HEC-HMS offers a robust framework for understanding hydrological variability and supports strategic decision-making through its detailed process simulation (Deulkar et al. 2024). This detailed comparative analysis highlights the conditions under which each model type may be preferable, offering insights into their optimal applications in water resource management. The HEC-HMS model has been effectively used in event-based or continuous hydrological modeling (Hamdan et al. 2021; Shakarneh et al. 2022).
Several studies have utilized the HEC-HMS model to predict runoff. Namara et al. (2020) used it in the upper Awash watershed, Tassew et al. (2019) in the Lake Tana Basin, and Guduru et al. (2022) in the Meki River watershed of Ethiopia. These studies confirmed the model's accuracy against historical streamflow data, with R2 values above 0.8. The Nash–Sutcliffe efficiency (NSE), used to compare predicted and measured hydrographs, also indicated the model's suitability for hydrological modeling. Because the physically based HEC-HMS method is constrained in handling larger watershed areas, data-driven methods are being used as complementary tools (Rajaee et al. 2020; Zounemat-Kermani et al. 2021).
In contrast to physical models, data-driven models do not use knowledge of the physical processes and rely instead on data describing input and output characteristics (Kan et al. 2017). Their main limitation is a lack of interpretability. Despite this limitation, data-driven models have consistently shown superior performance compared with traditional approaches. This performance advantage has fueled a surge in the adoption of data-driven modeling for hydrological analysis (Radfar & Rockaway 2016).
ANNs are data-driven models that are well suited to modeling dynamic non-linear systems. The technique is used in hydrology as an alternative to traditional models because it can capture non-linear and non-stationary hydrological events (Radfar & Rockaway 2016; Tamiru & Dinka 2021). Several studies have utilized ANN for rainfall-runoff simulation. Mohseni & Muskula (2023) developed ANN models for runoff prediction in the Yerli sub-catchment of the upper Tapi basin, showing strong model performance. Turhan (2021) studied rainfall-runoff relationships at the Nergizlik Dam in Turkey's Seyhan sub-basin, finding reliable results with ANN methods. Tamiru & Dinka (2021) demonstrated the strong predictive capabilities of an ANN model for flood forecasting in the lower Baro Akobo River basin, Ethiopia.
Like ANN, SVR is a data-driven model that can be used for the rainfall-runoff process (Parisouj et al. 2022). While SVR offers high accuracy and robustness in predicting runoff, users should consider the computational complexity and parameter tuning required for effective implementation. Several studies have compared SVR with other ML models for runoff prediction. Badrzadeh et al. (2015) assessed SVR, ANN, and auto-regressive moving average models for monthly runoff prediction and found that SVR performed best. Young et al. (2017) studied ANN and SVR in hourly runoff forecasting, showing that both performed well. He et al. (2014) explored ANN, adaptive neuro-fuzzy inference system, and SVR for semi-arid areas, with SVR outperforming the other two models.
Some studies have compared the physical HEC-HMS model with ANN and SVR in runoff simulation. Gholami & Khaleghi (2021) compared the ANN and HEC-HMS approaches for runoff prediction and found that ANN performed better than HEC-HMS. Young & Liu (2015) used ANN together with HEC-HMS for hourly runoff forecasting to improve the accuracy of HEC-HMS, and found that ANN surpassed HEC-HMS in accuracy. Chiang et al. (2022) compared rainfall-runoff simulation using SVR and HEC-HMS in a rural Taiwanese watershed. SVR outperformed HEC-HMS, which the authors attributed to SVR requiring less parameter optimization and to the complexity of optimizing HEC-HMS. Hussain et al. (2021) compared short-term flood forecasting models and found that SVR outperformed the HEC-HMS model.
This research employs the semi-distributed physical model HEC-HMS alongside two ML models, ANN and SVR, to predict runoff in Ethiopia's upper Baro watershed. These models were selected due to the limited data availability in the region, which poses challenges for conventional modeling techniques. The integration of these diverse methodologies aligns with the overarching objective of this study: to enhance the accuracy of runoff predictions and provide a comparative analysis of the efficiency of physical and data-driven approaches in a data-scarce environment.
Given the critical importance of effective water resource management in Ethiopia, this study not only aims to advance scientific understanding in hydrological modeling but also seeks to offer practical solutions for regions with similar challenges. By employing ANN and SVR, which are less reliant on extensive data for calibration compared with traditional physical models, this research contributes to the body of knowledge by demonstrating their potential to improve prediction accuracy under data constraints.
Furthermore, this study contributes to the hydrology field by:
1. Providing a comprehensive evaluation of the performance of traditional and ML models under varying hydrological conditions in the upper Baro watershed.
2. Exploring the sensitivity of input variables in ML models, which could guide the optimization of these models in other similar regions.
3. Offering insights into the application of ML techniques in areas where conventional hydrological data collection is challenging, thus broadening the potential for these technologies in global water resource management.
STUDY AREA AND DATA DESCRIPTION
Study domain
Data used and its sources
The input data used for runoff prediction were categorized as hydrological (streamflow), meteorological (rainfall, temperature, and evapotranspiration), and physiographic (land use/cover, soil data, and elevation map). Daily rainfall and temperature data for 2000 to 2016 from 14 rain gauge stations situated within and near the upper Baro watershed (Figure 1) were obtained from the Ethiopian Meteorological Institute. The daily flow rate at the Gambella gauging station was provided by the Ethiopian Ministry of Water Resources (MoWR). Three river gauging stations close to the region's outlet provided additional streamflow data (see Figure 1). The data from these three river stations served two purposes: filling in missing data and conducting consistency tests. Physiographic data were obtained from a range of sources, including the digital elevation model acquired from https://www.usgs.gov/. ArcGIS was used to process these data to extract the physical and hydrological parameters of the watershed. Data on soil and land cover/use were gathered from the MoWR.
Methods
Data quality control
To maintain the highest standards of data quality, our study utilized a multifaceted approach to data quality control and processing. The methods included:
1. Gap filling: We employed the normal ratio method to estimate missing rainfall data. The method weights the rainfall observed at neighboring stations by the ratio of the normal rainfall at the target station to that at each neighbor (De Silva et al. 2007; Burhanuddin et al. 2017).
2. Data consistency checks: We conducted a double mass curve analysis to assess the consistency of the cumulative data records, ensuring that the data series are homogeneous and reliable over time (Namara et al. 2020).
3. Error estimation: Linear regression was used to estimate discrepancies in discharge data by comparing observed values with predicted values based on historical trends (Noori et al. 2010).
4. Outlier detection: Statistical techniques, including Z-score and Grubbs' test, were applied to identify and handle outliers in the dataset, ensuring that the analysis was not skewed by anomalous values (Mohammed & Scholz 2023).
5. Homogeneity testing: We tested the homogeneity of the meteorological data using the Von Neumann ratio test at a 95% confidence level, which helps in identifying any abrupt changes within the data series (Kabbilawsh et al. 2023).
To maintain transparency and provide comprehensive details to the academic community, these methodologies are elaborated further in the Supplementary material. This addition will allow other researchers to understand and replicate these quality control and data processing steps, ensuring the reproducibility and reliability of the findings.
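As an illustration of the gap-filling step, the normal ratio method can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name `normal_ratio_fill` and all numbers are hypothetical, and the station "normal" is assumed to be the long-term annual rainfall.

```python
from typing import Sequence


def normal_ratio_fill(
    normal_target: float,
    normal_neighbors: Sequence[float],
    rain_neighbors: Sequence[float],
) -> float:
    """Estimate a missing rainfall value with the normal ratio method.

    P_x = (1/n) * sum_i (N_x / N_i) * P_i, where N is the long-term (normal)
    rainfall of a station and P_i is the rainfall observed at neighbor i
    on the day with the gap.
    """
    if len(normal_neighbors) != len(rain_neighbors) or not normal_neighbors:
        raise ValueError("need one normal and one observation per neighbor")
    n = len(normal_neighbors)
    weighted = (
        normal_target / n_i * p_i
        for n_i, p_i in zip(normal_neighbors, rain_neighbors)
    )
    return sum(weighted) / n


# Hypothetical numbers for illustration only (not taken from the study).
estimate = normal_ratio_fill(
    normal_target=1000.0,
    normal_neighbors=[800.0, 1200.0],
    rain_neighbors=[8.0, 12.0],
)
print(round(estimate, 2))
```

Each neighbor's observation is scaled up or down according to how wet the target station normally is relative to that neighbor, and the scaled values are then averaged.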
ML approach
Determination of data-based model inputs
This study focuses on runoff modeling strategies that predict outputs based on several input variables: rainfall, runoff, temperature, evapotranspiration, and base flow data. To account for the time delay between rainfall and runoff, which reflects daily groundwater storage, the ANN and SVR approaches incorporate lagged rainfall and runoff as input data, in line with previous work by Sayed et al. (2023). The auto-correlation, partial auto-correlation, and cross-correlation functions were used to find the proper delays of rainfall (P), temperature (T), evapotranspiration (ETP), and streamflow (Q) for predicting current discharge (Qt) in the data-based approach. The predictors in this study include the antecedent runoff, which indicates initial basin conditions, and climate variables at different lags. The lag that exhibited the highest correlation with the dependent variable (runoff discharge) was chosen as the optimal input. Consequently, 11 candidate input variables were prepared for predicting Qt: Tt, Tt−1, Pt, Pt−1, Pt−2, ETPt, ETPt−1, ETPt−2, Qt−1, Qt−2, and Bt (base flow), where t − 1 and t − 2 represent lag 1 and lag 2, respectively.
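The lag screening and input construction described above could be sketched as follows. This is an assumption-laden illustration: the helper names (`pearson`, `lagged_correlation`, `build_inputs`) and the toy series are hypothetical, and a plain Pearson correlation at a given lag stands in for the full correlation-function analysis.

```python
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(x: Sequence[float], q: Sequence[float], lag: int) -> float:
    """Correlation between x at time t - lag and discharge q at time t."""
    if lag == 0:
        return pearson(x, q)
    return pearson(x[:-lag], q[lag:])


def build_inputs(
    series: Dict[str, Sequence[float]],
    spec: List[Tuple[str, int]],
    target: str = "Q",
) -> Tuple[List[List[float]], List[float]]:
    """Build rows of lagged predictors and the matching target values.

    spec lists (variable, lag) pairs, e.g. ("P", 2) means P at t - 2.
    Rows start at the largest lag so every predictor is available.
    """
    max_lag = max(lag for _, lag in spec)
    n = len(series[target])
    rows = [[series[name][t - lag] for name, lag in spec] for t in range(max_lag, n)]
    targets = [series[target][t] for t in range(max_lag, n)]
    return rows, targets


# Toy series for illustration only (not data from the study).
toy = {"P": [0.0, 1.0, 2.0, 3.0, 4.0], "Q": [10.0, 11.0, 12.0, 13.0, 14.0]}
X, y = build_inputs(toy, [("P", 0), ("P", 2), ("Q", 1)])
r_lag1 = lagged_correlation(toy["P"], toy["Q"], lag=1)
```

For scenario 2 in Table 1, the specification would be `[("P", 0), ("P", 2), ("Q", 1), ("T", 0), ("ETP", 1), ("ETP", 2), ("B", 0)]`.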
Data splitting
To develop both the process-based and data-based models, the whole database was divided into two sets: 65% (2000–2010) of seen data and 35% (2011–2016) of unseen data. The 65% of seen data was used for building the model, and the leftover 35% of unseen data was applied for checking the performance following the model development.
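A chronological split of this kind can be sketched as below. The function name `split_by_year` is hypothetical, and the sketch assumes a complete daily record from 1 January 2000 to 31 December 2016, which the source does not state explicitly.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple

Record = Tuple[date, float]


def split_by_year(
    dates: Sequence[date],
    values: Sequence[float],
    last_training_year: int = 2010,
) -> Tuple[List[Record], List[Record]]:
    """Split a daily series chronologically, without shuffling."""
    pairs = list(zip(dates, values))
    train = [(d, v) for d, v in pairs if d.year <= last_training_year]
    test = [(d, v) for d, v in pairs if d.year > last_training_year]
    return train, test


# Placeholder daily series (zeros) covering the assumed study period.
start, end = date(2000, 1, 1), date(2016, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]
values = [0.0] * len(dates)

train, test = split_by_year(dates, values)
train_share = len(train) / len(dates)
```

Under that assumption the 2000–2010 block holds roughly 65% of the days, consistent with the split reported above. Splitting by date rather than at random keeps the evaluation period strictly after the training period.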
Input combination sensitivity
To build the models, the rainfall-runoff process was treated as a Markovian process, in which the discharge at a particular point in space and time is a function of a finite number of prior observations (Aytek et al. 2008; Narayana Reddy & Pramada 2022). Under this assumption, the model variables can be stated as shown in Table 1. Five scenarios were considered to assess the sensitivity of the output discharge to different input combinations. Based on the performance evaluation metrics considered (R2, NSE, and RMSE), the second scenario performed best in the data-based models (Table 1).
Scenario | Input combination | Output | Model structure |
---|---|---|---|
1 | Tt, ETPt, Pt, Qt−1, Qt−2, and Bt | Qt | (6,10,1) |
2 | Pt, Pt−2, Qt−1, Tt, ETPt−1, ETPt−2, and Bt | Qt | (7,10,1) |
3 | Pt, Pt−1, Pt−2, Qt−1, Qt−2, Tt, ETPt−1, ETPt, and Bt | Qt | (9,10,1) |
4 | Pt, Pt−1, Qt−1, Qt−2, Tt−1, ETPt, Pt−2, and Bt | Qt | (8,10,1) |
5 | Pt, Pt−1, Pt−2, Qt−1, Qt−2, Tt, Tt−1, ETPt, ETPt−1, ETPt−2, and Bt | Qt | (11,10,1) |
P, rainfall; T, temperature; ETP, evapotranspiration; Q, streamflow; t, period in day; t − 1, lag 1; t − 2, lag 2; B, base flow.
ANN approach
The primary components of the ANN algorithm used in this research are as follows:
1. Architecture and structure: The ANN model consists of three hidden layers, each containing 10 hidden nodes, along with an input layer including 6–11 input nodes and a single output node.
2. Input parameters: For the feed-forward ANN architecture, seven inputs were selected for streamflow prediction from the scenarios tested: Pt, Pt−2, Qt−1, Tt, ETPt−2, ETPt−1, and Bt. These variables serve as inputs to the ANN and are linked to the hidden nodes.
3. Optimization method: The Adam solver, a widely used optimization algorithm, was used to update the weights. In addition, a grid search was implemented to explore various hyperparameter configurations and find the optimal values for the number of hidden layers, the number of nodes in each hidden layer, the maximum number of iterations, the activation function, the solver, and the learning rate.
4. Streamflow prediction: After the ANN is trained on the given inputs and streamflow data, the ANN can generate predictions on fresh data by feeding the input variables into the trained network.
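The configuration listed above can be sketched with scikit-learn's `MLPRegressor`. This is an assumption: the source does not name the software library used. The hyperparameter values follow the optimized ANN settings reported later in the Results section, while the training data here are synthetic placeholders rather than watershed data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the seven scenario-2 inputs (not study data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 7))
y_train = X_train @ rng.normal(size=7) + 0.1 * rng.normal(size=200)

ann = MLPRegressor(
    hidden_layer_sizes=(10, 10, 10),  # three hidden layers, 10 nodes each
    activation="relu",
    solver="adam",
    learning_rate="constant",
    learning_rate_init=0.001,
    max_iter=10_000,
    random_state=0,  # assumed here only to make the sketch repeatable
)
ann.fit(X_train, y_train)

# Predictions on fresh inputs with the same seven columns.
X_new = rng.normal(size=(5, 7))
q_pred = ann.predict(X_new)
```

In practice the inputs would normally be scaled before training; that step is omitted here because the source does not describe it.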
SVR approach
The SVR method in this study was run with the following hyperparameters: kernel = radial basis function (RBF), ε = 0.01 (maximum deviation permitted without penalty), and C = 2 (controls the penalty for error). The RBF kernel implicitly maps the input vectors into a higher-dimensional feature space, where a linear regression function may fit the data more easily. RBF kernels are frequently employed to capture complicated relationships, as they calculate the similarity between two samples based on their Euclidean distance. Optimizing SVR involves finding the best hyperparameters. In this research, a grid search was applied: a predetermined set of hyperparameter values is searched methodically to find the combination that produces the highest performance. A 10-fold cross-validation technique was used during training to minimize the RMSE when optimizing the parameters of both the SVR and ANN models.
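A grid search with 10-fold cross-validation of the kind described above might look like the following sketch. It assumes scikit-learn, which the source does not name; the candidate grid, the input scaling step, the unshuffled `KFold`, and the synthetic data are all illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for seven predictors and discharge (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=300)

# Scaling is an added assumption; RBF kernels are sensitive to input scale.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical candidate values; the study's search ranges are not given.
param_grid = {
    "svr__C": [0.5, 1.0, 2.0, 4.0],
    "svr__epsilon": [0.01, 0.05, 0.1],
    "svr__gamma": ["scale", 0.1, 1.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_root_mean_squared_error",  # maximizing this minimizes RMSE
    cv=KFold(n_splits=10),
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_
```

The values reported in the study (C = 2, ε = 0.01) would correspond to `best_params` for the authors' own data and grid; on the synthetic data here the selected values may differ.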
Rainfall-runoff modeling using HEC-HMS
HEC-HMS is a runoff simulation program widely used around the world (Sayed et al. 2023). The major components of an HEC-HMS project are the basin model, the meteorological model, and the control specifications (Guduru et al. 2022). The model used the analyzed hydrometeorological data and the curve number (CN) derived from soil and land use data as input variables for runoff simulation. This study applied the Soil Conservation Service (SCS) CN method for loss calculation, the SCS unit hydrograph (SCS-UH) for runoff transformation, the Muskingum method for flood routing, and a constant monthly model for base flow, in line with the methodology of Namara et al. (2020) and Guduru et al. (2022). These methods were selected based on data availability and applicability. The spatial rainfall was estimated using the Thiessen polygon method.
Rainfall loss method
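For orientation, the standard SCS-CN loss relations can be sketched as follows. This is the textbook form in metric units with the conventional initial abstraction ratio of 0.2; whether the study used exactly these settings is an assumption, and the function name `scs_cn_excess` is hypothetical.

```python
def scs_cn_excess(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Rainfall excess (mm) from storm rainfall p_mm using the SCS-CN method.

    S  = 25400 / CN - 254          potential maximum retention (mm)
    Ia = ia_ratio * S              initial abstraction (mm)
    Pe = (P - Ia)^2 / (P - Ia + S) for P > Ia, otherwise 0
    """
    if not 0.0 < cn <= 100.0:
        raise ValueError("CN must be in the interval (0, 100]")
    s = 25400.0 / cn - 254.0
    ia = ia_ratio * s
    if p_mm <= ia:
        return 0.0  # all rainfall is lost to initial abstraction
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical storm depths and CN, for illustration only.
excess_50mm = scs_cn_excess(50.0, cn=80.0)
excess_10mm = scs_cn_excess(10.0, cn=80.0)
```

A higher CN gives a smaller retention S and therefore more rainfall excess for the same storm, which is why CN tends to be a sensitive parameter in this kind of model.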
Rainfall-runoff transform method: SCS-UH
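In the standard SCS formulation, the lag relation to which the following symbol definitions refer is usually written as below. This is a reconstruction from the standard method and from the definitions that follow, with γ used for slope to match them; it is not a copy of the authors' equation.

```latex
\[
T_{lag} = \frac{L^{0.8}\,(S + 1)^{0.7}}{1900\,\gamma^{0.5}}
\]
```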
where Tlag stands for lag time (h), L is the hydraulic length of the basin (ft), γ is the basin slope (%), and S is the potential maximum retention (in).
Routing method: Muskingum
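The Muskingum method routes an inflow hydrograph through a reach using the travel time K and the weighting coefficient X. A minimal sketch of the standard recursion is given below; the function name `muskingum_route`, the example hydrograph, and the parameter values are hypothetical and are not the calibrated values of this study.

```python
from typing import List, Optional, Sequence


def muskingum_route(
    inflow: Sequence[float],
    k_hours: float,
    x: float,
    dt_hours: float,
    initial_outflow: Optional[float] = None,
) -> List[float]:
    """Route an inflow hydrograph through one reach.

    O[i] = C0*I[i] + C1*I[i-1] + C2*O[i-1], with C0 + C1 + C2 = 1.
    """
    denom = 2.0 * k_hours * (1.0 - x) + dt_hours
    c0 = (dt_hours - 2.0 * k_hours * x) / denom
    c1 = (dt_hours + 2.0 * k_hours * x) / denom
    c2 = (2.0 * k_hours * (1.0 - x) - dt_hours) / denom
    # Assume steady state at the start unless an initial outflow is given.
    outflow = [inflow[0] if initial_outflow is None else initial_outflow]
    for i in range(1, len(inflow)):
        outflow.append(c0 * inflow[i] + c1 * inflow[i - 1] + c2 * outflow[-1])
    return outflow


# Hypothetical hydrograph (m3/s) at a 6 h time step, for illustration only.
hydrograph = [10.0, 50.0, 100.0, 50.0, 10.0, 10.0, 10.0]
routed = muskingum_route(hydrograph, k_hours=12.0, x=0.2, dt_hours=6.0)
```

The example keeps the time step within 2KX ≤ Δt ≤ 2K(1 − X), the usual condition for non-negative coefficients; outside that range the recursion can produce negative or oscillating outflows.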
Sensitivity analysis
After determining the optimized parameters, their sensitivity was assessed. CN, basin lag time, Muskingum K, and Muskingum X showed the highest sensitivity among the parameters. Each was increased and decreased from its optimized value: by 30% for CN, and by 20, 15, and 10% for basin lag time, Muskingum K, and Muskingum X, respectively. The impact of these changes on the total runoff volume was then evaluated.
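A one-at-a-time perturbation of this kind can be sketched as below. The function `one_at_a_time` and the toy model are hypothetical; in the study the model evaluation would be an HEC-HMS run returning total runoff volume, which is replaced here by a simple stand-in function.

```python
from typing import Callable, Dict


def one_at_a_time(
    model: Callable[[Dict[str, float]], float],
    params: Dict[str, float],
    name: str,
    fraction: float,
) -> Dict[float, float]:
    """Percent change in model output when one parameter is varied by ±fraction.

    All other parameters are held at their optimized values.
    """
    base = model(params)
    changes = {}
    for sign in (-1.0, 1.0):
        trial = dict(params)
        trial[name] = params[name] * (1.0 + sign * fraction)
        changes[sign * fraction] = 100.0 * (model(trial) - base) / base
    return changes


def toy_volume(p: Dict[str, float]) -> float:
    """Stand-in for a model run; not a hydrological model."""
    return 2.0 * p["a"] + p["b"]


result = one_at_a_time(toy_volume, {"a": 10.0, "b": 5.0}, name="a", fraction=0.30)
```

Comparing the percent change in output across parameters, for comparable perturbation sizes, gives a simple ranking of sensitivity.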
Model performance assessment
Four statistical indicators were employed to assess the performance of the data-based models and the physical model (HEC-HMS): the coefficient of determination (R2), NSE, root mean square error (RMSE), and mean absolute error (MAE).
In this case, Qm represents the measured flow rate, Qp is the predicted flow rate, and Qmav is the average measured discharge. Additionally, n denotes the total number of observations.
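With these symbols, the four indicators can be computed as in the sketch below. The NSE, RMSE, and MAE follow their standard definitions; R2 is taken here as the squared Pearson correlation between measured and predicted flows, which is an assumption about the form used in the study. The function names and sample values are illustrative.

```python
from math import sqrt
from typing import Sequence


def nse(qm: Sequence[float], qp: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((Qm-Qp)^2) / sum((Qm-Qmav)^2)."""
    qmav = sum(qm) / len(qm)
    num = sum((m - p) ** 2 for m, p in zip(qm, qp))
    den = sum((m - qmav) ** 2 for m in qm)
    return 1.0 - num / den


def rmse(qm: Sequence[float], qp: Sequence[float]) -> float:
    """Root mean square error, in the units of the flow."""
    return sqrt(sum((m - p) ** 2 for m, p in zip(qm, qp)) / len(qm))


def mae(qm: Sequence[float], qp: Sequence[float]) -> float:
    """Mean absolute error, in the units of the flow."""
    return sum(abs(m - p) for m, p in zip(qm, qp)) / len(qm)


def r2(qm: Sequence[float], qp: Sequence[float]) -> float:
    """Squared Pearson correlation between measured and predicted flow."""
    n = len(qm)
    mm, mp = sum(qm) / n, sum(qp) / n
    cov = sum((m - mm) * (p - mp) for m, p in zip(qm, qp))
    var_m = sum((m - mm) ** 2 for m in qm)
    var_p = sum((p - mp) ** 2 for p in qp)
    return cov ** 2 / (var_m * var_p)


# Illustrative values only: predictions with a constant bias of +1.
measured = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]
scores = {
    "NSE": nse(measured, biased),
    "RMSE": rmse(measured, biased),
    "MAE": mae(measured, biased),
    "R2": r2(measured, biased),
}
```

In this example the constant bias leaves R2 at 1 while lowering the NSE, which shows why R2 alone is not sufficient and is reported together with NSE and the error measures.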
RESULTS
HEC-HMS model
In this study, the HEC-HMS model was used for streamflow simulation due to its robust ability to model key hydrologic processes affecting streamflow. This model allows for complex simulation of rainfall-runoff processes like infiltration, evapotranspiration, and routing, which are essential for precise streamflow predictions. Water use does not significantly influence hydrological processes in the upper Baro watershed due to the underutilization of water resources and a lack of reliable water consumption data. As a result, the study has chosen not to include it in the HEC-HMS modeling. The analysis focuses solely on natural factors impacting hydrological patterns in the area.
HEC-HMS model parameters
Element | Parameter | Units | Minimum value | Maximum value | Optimized value |
---|---|---|---|---|---|
R100 | Muskingum K | HR | 0.1 | 150 | 33.073 |
R130 | Muskingum X | - | 0 | 0.5 | 0.5 |
W470 | CN | - | 30 | 99 | 75.792 |
W390 | CN | - | 30 | 99 | 80.344 |
W430 | CN | - | 30 | 99 | 75.887 |
W520 | Basin lag time | MIN | 0 | 30,000 | 1,440 |
W730 | CN | - | 30 | 99 | 61.848 |
W700 | Basin lag time | MIN | 0 | 30,000 | 1,354 |
R140 | Muskingum K | HR | 0.1 | 150 | 66.073 |
W690 | CN | - | 30 | 99 | 67.772 |
W670 | CN | - | 30 | 99 | 64.571 |
W610 | CN | - | 30 | 99 | 64.735 |
R90 | Muskingum X | - | 0 | 0.5 | 0.37255 |
W680 | CN | - | 30 | 99 | 72.274 |
R, reach; W, sub-basin; X, weighted coefficient of discharge; K, flood wave travel time; HR, hour; MIN, minute.
The HEC-HMS model's performance result
Inputs parameter sensitivity analysis for the ML method
In the sensitivity analysis of the SVR and ANN input parameters, the choice of scenario proved important. Adding more input variables did not strengthen the relationship between the input features and the target. A smaller set of clean, relevant inputs is preferable to a larger set of noisy ones, and the common belief that more data always leads to better regression models did not hold here. As shown in Table 1, scenarios 1, 2, 3, 4, and 5 used 6, 7, 9, 8, and 11 input variables, respectively. Notably, the second scenario, with seven input variables, was more accurate than the other scenarios.
ML approach performance analysis
Additionally, the statistical analysis produced very good results for both ANN and SVR, as shown by strong R2 and NSE values that are consistent with well-known studies (Kan et al. 2020; Vidyarthi et al. 2020; Tamiru & Dinka 2021). Moreover, both methods achieved NSE and R2 values greater than 0.75 (considered indicative of very good performance according to Nash & Sutcliffe (1970) and Moriasi et al. (2015)).
Hyperparameter tuning of ANN and SVR
Table 3 displays the optimized hyperparameters of the ANN model found through grid search over various tested values. In the SVR model, the key parameters of the radial basis kernel function, including the penalty coefficient (C), gamma, and the tolerance threshold (ε), were optimized within specified ranges. SVR uses ε as the margin within which deviations between observed and predicted values are not penalized; a value of 0.01 was chosen in this study. The cost of error C, which controls the trade-off between the flatness of the function and the penalty for deviations larger than ε, was set to 2. The refined parameters of both the SVR and ANN models were then applied to simulate runoff.
Parameter | Optimized value |
---|---|
Hidden layers | 3 |
Nodes per hidden layer | 10 |
Maximum iteration | 10,000 |
Initial learning rate | 0.001 |
Activation function | ReLU |
Solver | Adam |
Learning rate | Constant |
Statistical indicators | ANN (training) | ANN (testing) | SVR (training) | SVR (testing) | HEC-HMS (calibration) | HEC-HMS (validation) |
---|---|---|---|---|---|---|
R2 | 0.99 | 0.9925 | 0.98 | 0.991 | 0.867 | 0.89 |
NSE | 0.993 | 0.98 | 0.98 | 0.97 | 0.87 | 0.85 |
RMSE (m3/s) | 27.74 | 24 | 38 | 27 | 122.9 | 113.4 |
MAE (m3/s) | 13.23 | 12.94 | 21.5 | 16.1 | 80.4 | 77.4 |
ML and HEC-HMS model comparison
In this research, the ML models outperformed HEC-HMS in the watershed in terms of consistency and prediction accuracy. Remarkably, in terms of RMSE and MAE, both ML models performed better during testing than during training, although their NSE values were slightly lower in testing. HEC-HMS did not perform as well as expected, likely because of the parameters obtained from the field. The stationarity assumption and the heterogeneity of the catchment mean that significant effort is required to determine the optimum parameters in the HEC-HMS approach.
DISCUSSION
This study evaluates the efficacy of three hydrological models (HEC-HMS, ANN, and SVR) in predicting runoff. The findings demonstrate that the ANN model performed better than the HEC-HMS and SVR methods in terms of R2 and NSE values, indicating its greater accuracy in predicting daily runoff. The ANN model achieved an NSE of 0.98, which is comparable to or exceeds results reported in similar studies. For instance, Xiang et al. (2020), using long short-term memory-based models for rainfall-runoff modeling, reported an NSE of 0.90, demonstrating the competitive performance of ANN in this study. Additionally, the RMSE achieved by ANN in this research, 27.74 m3/s in training and 24 m3/s in testing, is lower than the 170 m3/s reported by Fan et al. (2020) in their study using traditional hydrological models, underscoring the effectiveness of ML models in capturing the complexities of hydrological processes in data-scarce environments.
Furthermore, the performance of the HEC-HMS model in this study, with an NSE of 0.85, is comparable to or better than that reported by Namara et al. (2020), who obtained an NSE of 0.739 using the HEC-HMS model for runoff prediction in a similar geographic setting, and by Guduru et al. (2022), who obtained an NSE of 0.804. This comparison not only supports the model's performance but also highlights the consistency of HEC-HMS in diverse applications.
In this research, HEC-HMS and SVR models successfully represented low-flow conditions but faced challenges with peak flows. On the other hand, the ANN algorithm excelled in accurately illustrating both high and low flows with superior precision. Furthermore, the ANN model outperformed HEC-HMS and SVR in precisely capturing the measured flow during both the rising and falling limbs of the hydrograph peak flow.
Despite SVR and HEC-HMS's good performance, the results showed that the ANN model outperformed them both significantly in terms of prediction accuracy and consistency. Due to the crucial significance of efficient water resource management in Ethiopia, this research study strives to enhance scientific knowledge in hydrological modeling while providing practical solutions for areas facing similar issues. Through the utilization of ANN and SVR, which require less data for calibration compared with conventional physical models, this investigation adds to the existing knowledge by showcasing their ability to enhance prediction accuracy when data availability is limited.
While the use of a substantial dataset including 17 years of hydrometeorological data for rainfall-runoff modeling is advantageous to this study, it is crucial to recognize that data scarcity has resulted in some restrictions. The findings' generalizability outside the particular period and geographic extent covered may be limited due to the constrained nature of the accessible dataset.
Policymakers can improve the accuracy of their forecasts and decision-making processes concerning infrastructure planning, flood control, and water resource management by utilizing ANN. It would be beneficial to prioritize the uptake and development of ANN-based models to develop more resilient and successful policies that address the effects of extreme weather events and guarantee sustainable water management practices.
Future research could focus on hybrid modeling approaches that leverage the strengths of both HEC-HMS and ML models to enhance runoff predictions. Additionally, efforts should be made to improve the transparency and interpretability of ML models for better integration into decision-making processes within hydrological studies.
CONCLUSIONS
In this study, a comprehensive evaluation of three distinct hydrological models (HEC-HMS, ANN, and SVR) was undertaken for predicting runoff in the upper Baro watershed. The findings reveal that the ANN model performed better than the HEC-HMS and SVR methods in terms of R2 and NSE values, indicating its greater accuracy in predicting daily runoff.
Given the critical importance of effective water resource management in Ethiopia, this study not only aims to advance scientific understanding in hydrological modeling but also seeks to offer practical solutions for regions with similar challenges. By employing ANN and SVR, which are less reliant on extensive data for calibration compared with traditional physical models, this research contributes to the body of knowledge by demonstrating their potential to improve prediction accuracy under data constraints. The results have important implications for water resource management, allocation, flood risk planning, and damage assessment.
Future research could focus on hybrid modeling approaches that leverage the strengths of both HEC-HMS and ML models to enhance runoff predictions. Additionally, efforts should be made to improve the transparency and interpretability of ML models for better integration into decision-making processes within hydrological studies.
ACKNOWLEDGEMENTS
The authors would like to thank the Ethiopian Ministry of Water Resources (MoWR) and the Ethiopian National Meteorological Institute (NMI) for providing the data for this study. Additionally, the first author would like to thank Haramaya University for the opportunity to pursue PhD studies and for sponsoring the tuition. The authors are also grateful to the anonymous reviewer for their insightful comments, which greatly improved the quality of the paper.
STATEMENT OF DECLARATION
I certify that the information presented here is true and complete to the best of my knowledge. I declare that this work has not been published elsewhere and has not been submitted to any other journal for publication.
FINANCIAL DISCLOSURE STATEMENT
The authors did not receive support from any organization for the submitted work.
AUTHOR CONTRIBUTIONS
All authors contributed to the study's conception and design. Material preparation, data collection, and analysis were performed by Y.B.E. The first draft of the manuscript was written and edited by Y.B. Both A.K. and M.M. read and commented on previous versions of the manuscript. The final version was proofread by Y.B. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.