Accurate streamflow simulation is crucial for effective hydrological management, especially in regions like the upper Baro watershed, Ethiopia, where data scarcity challenges conventional modeling approaches. This study evaluates the efficacy of three hydrological models in predicting runoff: the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS), an artificial neural network (ANN), and support vector regression (SVR). Using data from 2000 to 2016, the analysis compared the models on performance metrics including the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), and coefficient of determination (R²). The ANN model performed best, achieving an NSE of 0.98, RMSE of 24 m³/s, and R² of 0.99. The SVR model followed closely with an NSE of 0.97, RMSE of 27 m³/s, and R² of 0.99, while the HEC-HMS model yielded an NSE of 0.85, RMSE of 113.4 m³/s, and R² of 0.89. These findings highlight the strong performance of ANN in regions with limited hydrological data, suggesting its potential as a reliable alternative to traditional physical models. By demonstrating the efficacy of machine learning models, this research paves the way for innovative approaches to water resource management, offering valuable insights for policymakers and practitioners.

  • Accurate simulation of runoff is vital for effective water resource management.

  • The HEC-HMS, a widely recognized physical model in hydrology, was juxtaposed against the ANN and SVR models.

  • The ANN model significantly outperformed the others in terms of prediction accuracy and consistency.

  • Data-driven models such as ANN and SVR can support effective water resource management in regions facing data scarcity.

Accurately modeling streamflow is of paramount importance in properly planning how to respond to flooding, drought, water allocation, and reservoir operation (Fan et al. 2020; Xiang et al. 2020). In reservoir irrigation planning with multi-objective optimization, accurate streamflow prediction plays a crucial role. It assists in evaluating water availability and making informed decisions regarding reservoir operations (Khazaipoul et al. 2019). The data on streamflow is essential for identifying optimal reservoir operation strategies, enabling efficient management of reservoirs across various scenarios to ensure water availability for purposes such as irrigation, hydropower generation, and environmental conservation (Hatamkhani et al. 2021). Additionally, accurate streamflow predictions are vital for assessing the impacts of climate change on crop patterns and resolving water allocation conflicts (Akbari et al. 2022). This information offers valuable insights into shifting water availability, facilitating stakeholder adaptation of agricultural practices, and effective allocation of water resources.

Most streamflow data come from river gauging stations, but access to these data is decreasing in several areas of the world (Sichangi et al. 2018). Tourian et al. (2013) used data from the Global Runoff Data Centre to investigate the pattern across stations with streamflow records. Their investigation showed a remarkable decline in the overall measured yearly runoff from 1970 to 2010. Additionally, malfunctioning gauging stations and inadequate streamflow observation exacerbate the situation in developing countries (Sichangi et al. 2016), including Ethiopia (Mekonnen et al. 2009). This constraint underscores the need for an effective streamflow model.

In this regard, rainfall-runoff models, including conceptual, physical, and data-driven models, play a crucial role in streamflow simulation (Park & Markus 2014; Meng et al. 2016). While physically based models are essential for comprehensively understanding hydrological processes and the physical phenomena within a hydrological system, they do come with limitations (Sahoo et al. 2017; Yifru et al. 2024). Validating these models can be challenging in watersheds that lack adequate historical data (Yifru et al. 2024). A conceptual model simplifies the water movement in the hydrological cycle. It is easy to understand and needs less data, but it can be inaccurate due to oversimplification, may miss complexities, and relies on assumptions that might not be accurate.

The Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS) is a physical-based model that has several distinct advantages over other hydrological models, particularly in its applicability to varied hydrological problems and its adaptability to different environmental settings. Below are the key strengths of the HEC-HMS model that are highlighted in this study:

  • 1. Physical-based modeling: HEC-HMS is a physical-based model that uses detailed descriptions of physical hydrological processes, which can be more interpretable for hydrologists and water resource managers. This makes it particularly useful in studies where understanding the interaction of various hydrological processes is crucial (Chekole et al. 2024; Hassaan et al. 2024).

  • 2. Flexibility and extensibility: HEC-HMS offers extensive options for customization and extension, including the ability to integrate user-defined components. This flexibility allows it to be adapted for a wide range of applications, from urban drainage to agricultural and large-scale river basin management (Rauf & Ghumman 2018).

  • 3. Data integration: Unlike some machine learning (ML) models that require large datasets for training, HEC-HMS can be effectively used with limited data by incorporating physically based parameters that are often known or can be estimated from typical watershed characteristics (Guduru et al. 2022; Khaira 2024).

  • 4. Wide acceptance and support: As a tool developed by the US Army Corps of Engineers, HEC-HMS benefits from widespread recognition and a robust support network (Peker et al. 2024). This includes comprehensive documentation and a large community of users, which facilitates knowledge sharing and technical support.

  • 5. Educational utility: Due to its detailed representation of hydrological processes and open availability, HEC-HMS is widely used in academic settings for educational purposes, helping new hydrologists understand fundamental concepts through practical application (Hamdan et al. 2021).

  • 6. Cost-effectiveness: HEC-HMS is freely available, which reduces the barriers to entry for its use, especially in developing countries or among researchers with limited funding.

To further explore these advantages, this study contrasts the performance of HEC-HMS with ML models such as artificial neural network (ANN) and support vector regression (SVR). While ML models may excel in predictive accuracy, especially in data-rich environments, HEC-HMS offers a robust framework for understanding hydrological variability and supports strategic decision-making through its detailed process simulation (Deulkar et al. 2024). This detailed comparative analysis highlights the conditions under which each model type may be preferable, offering insights into their optimal applications in water resource management. The HEC-HMS model has been effectively used in event-based or continuous hydrological modeling (Hamdan et al. 2021; Shakarneh et al. 2022).

Several studies have utilized the HEC-HMS model to predict runoff. Namara et al. (2020) used it in the upper Awash watershed, Tassew et al. (2019) in the Lake Tana Basin, and Guduru et al. (2022) in the Meki River watershed of Ethiopia. These studies checked the model against historical streamflow data and reported high performance, with R² values above 0.8. The Nash–Sutcliffe efficiency (NSE) was used to compare predicted and measured hydrographs, indicating the model's suitability for hydrological modeling. Because the physically based HEC-HMS approach has constraints in handling larger watershed areas, data-driven methods are being used as complementary tools (Rajaee et al. 2020; Zounemat-Kermani et al. 2021).

Contrary to physical models, data-driven models neglect the knowledge of the physical processes and rely on the data describing input and output characteristics (Kan et al. 2017). The limitation is the lack of interpretability of the models. Despite this limitation, data-driven models have consistently shown superior performance compared with traditional approaches. This performance advantage has fueled a surge in the adoption of data-driven modeling for hydrological analysis (Radfar & Rockaway 2016).

ANNs are data-driven models that are well suited to modeling dynamic non-linear systems. The technique is used in hydrology as an alternative to traditional models because it can capture non-linear and non-stationary hydrological behavior (Radfar & Rockaway 2016; Tamiru & Dinka 2021). Several studies have utilized ANN for rainfall-runoff simulation. Mohseni & Muskula (2023) developed ANN models for runoff prediction in the Yerli sub-catchment of the upper Tapi basin, showing strong model performance. Turhan (2021) studied rainfall-runoff relationships at the Nergizlik Dam in Turkey's Seyhan sub-basin, finding reliable results with ANN methods. Tamiru & Dinka (2021) demonstrated the strong predictive capabilities of an ANN model for flood forecasting in the lower Baro Akobo River basin, Ethiopia.

Similar to ANN, SVR is a data-driven model that can be used for the rainfall-runoff process (Parisouj et al. 2022). While SVR offers high accuracy and robustness in predicting runoff, users should consider the computational complexity and parameter tuning required for effective implementation. Several studies have compared SVR with other ML models for runoff prediction. Badrzadeh et al. (2015) assessed SVR, ANN, and auto-regressive moving average models for monthly runoff prediction and found that SVR performed best. Young et al. (2017) studied ANN and SVR for hourly runoff forecasting and showed that both performed well. He et al. (2014) explored ANN, an adaptive neuro-fuzzy inference system, and SVR for semi-arid areas, with SVR outperforming the other two models.

Some studies have compared the physically based HEC-HMS with ANN and SVR in runoff simulation. Gholami & Khaleghi (2021) compared the ANN and HEC-HMS approaches for runoff prediction and found that the ANN outperformed HEC-HMS. Furthermore, Young & Liu (2015) used ANN along with HEC-HMS for hourly runoff forecasting to improve the HEC-HMS accuracy, and likewise found that the ANN surpassed HEC-HMS in accuracy. Chiang et al. (2022) compared rainfall-runoff simulation using SVR and HEC-HMS in a rural Taiwanese watershed; SVR outperformed HEC-HMS because SVR requires less parameter optimization, whereas optimizing HEC-HMS is complex. Hussain et al. (2021) compared short-term flood forecasting models and found that SVR outperformed the HEC-HMS model.

This research employs the semi-distributed physical model HEC-HMS alongside two ML models, ANN and SVR, to predict runoff in Ethiopia's upper Baro watershed. These models were selected due to the limited data availability in the region, which poses challenges for conventional modeling techniques. The integration of these diverse methodologies aligns with the overarching objective of this study: to enhance the accuracy of runoff predictions and provide a comparative analysis of the efficiency of physical and data-driven approaches in a data-scarce environment.

Given the critical importance of effective water resource management in Ethiopia, this study not only aims to advance scientific understanding in hydrological modeling but also seeks to offer practical solutions for regions with similar challenges. By employing ANN and SVR, which are less reliant on extensive data for calibration compared with traditional physical models, this research contributes to the body of knowledge by demonstrating their potential to improve prediction accuracy under data constraints.

Furthermore, this study contributes to the hydrology field by:

  • 1. Providing a comprehensive evaluation of the performance of traditional and ML models under varying hydrological conditions in the upper Baro watershed.

  • 2. Exploring the sensitivity of input variables in ML models, which could guide the optimization of these models in other similar regions.

  • 3. Offering insights into the application of ML techniques in areas where conventional hydrological data collection is challenging, thus broadening the potential for these technologies in global water resource management.

Study domain

The upper Baro watershed, which spans 23,400 km², is located in Ethiopia's Baro Akobo River basin between 7°51′ and 9°54′ N and 34°50′ and 36°17′ E (Mengistu et al. 2022). This region is depicted in Figure 1. The highest point, at 3,266 m above sea level, lies in the southwestern highlands, and the lowest point, at 390 m, lies close to the outlet. Starting from the highlands in Ethiopia's southwest, this sub-basin stretches to the Gambella region's level plains (Alemseged et al. 2014). Figure 1 also depicts important features and attributes of the study region, such as topography, rainfall stations, streamflow stations, river lines, and geographic information. The upper Baro sub-basin is characterized by a variety of land use and land cover (LULC) categories, with agricultural land constituting the majority. Forests primarily occupy the mountainous sections of the region, while lower-lying areas are characterized by agriculture and woodland, as shown in the Supplementary material.
Figure 1

Description of the study domain.


Data used and its sources

The input data utilized in runoff prediction were categorized into hydrological (streamflow), meteorological (rainfall, temperature, and evapotranspiration), and physiographic (land use/cover, soil data, and elevation map) data. Daily rainfall and temperature data for 2000–2016 from the 14 rain gauge stations situated within and near the upper Baro watershed (Figure 1) were obtained from the Ethiopian Meteorological Institute. The daily flow rate at the Gambella gauging station was provided by the Ethiopian Ministry of Water Resources (MoWR). Three river gauging stations close to the region's outlet provided additional streamflow data (see Figure 1). The data from these three river stations served two purposes: filling in missing data and conducting consistency tests. Physiographic data were obtained from a range of sources, including the digital elevation model acquired from https://www.usgs.gov/. ArcGIS was used to process these data to extract the physical and hydrological parameters of the watershed. Data on soil and land cover/use were gathered from the MoWR.

Methods

Data quality control

To maintain the highest standards of data quality, our study utilized a multifaceted approach to data quality control and processing. The methods included:

  • 1. Gap filling: We employed the normal ratio method to estimate missing rainfall data. The method weights the rainfall observed at neighboring stations by the ratio of the long-term average rainfall at the target station to that at each neighboring station (De Silva et al. 2007; Burhanuddin et al. 2017).

  • 2. Data consistency checks: We conducted a double mass curve analysis to assess the consistency of the cumulative data records, ensuring that the data series are homogenous and reliable over time (Namara et al. 2020).

  • 3. Error estimation: Linear regression was used to estimate discrepancies in discharge data by comparing observed values with predicted values based on historical trends (Noori et al. 2010).

  • 4. Outlier detection: Statistical techniques, including Z-score and Grubbs' test, were applied to identify and handle outliers in the dataset, ensuring that the analysis was not skewed by anomalous values (Mohammed & Scholz 2023).

  • 5. Homogeneity testing: We tested the homogeneity of the meteorological data using the Von Neumann ratio test at a 95% confidence level, which helps in identifying any abrupt changes within the data series (Kabbilawsh et al. 2023).

To maintain transparency and provide comprehensive details to the academic community, these methodologies are elaborated further in the Supplementary material. This addition will allow other researchers to understand and replicate these quality control and data processing steps, ensuring the reproducibility and reliability of the findings.
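The gap-filling step in item 1 can be illustrated with a minimal sketch of the normal ratio method. The function name, the station normals, and the rainfall values below are hypothetical and are not taken from the study's stations; the sketch only shows the arithmetic of the method in its usual textbook form.

```python
def normal_ratio_estimate(target_normal, neighbor_normals, neighbor_values):
    """Estimate a missing rainfall value at a target station.

    target_normal     long-term mean rainfall at the station with the gap
    neighbor_normals  long-term mean rainfall at each neighboring station
    neighbor_values   rainfall observed at each neighbor on the missing day
    """
    if not neighbor_values or len(neighbor_normals) != len(neighbor_values):
        raise ValueError("need one normal and one observation per neighbor")
    # Scale each neighbor's observation by the ratio of normals, then average.
    scaled = [target_normal / normal * value
              for normal, value in zip(neighbor_normals, neighbor_values)]
    return sum(scaled) / len(scaled)


# Hypothetical numbers (mm), not taken from the study's stations.
estimate = normal_ratio_estimate(1200.0, [1000.0, 1500.0, 1200.0], [10.0, 15.0, 12.0])
print(round(estimate, 2))
```

A wetter neighbor (normal of 1500 mm) contributes a scaled-down value and a drier neighbor (1000 mm) a scaled-up one, so the estimate reflects the climatology of the target station rather than a plain average.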

ML approach

Determination of data-based model inputs

This study focuses on runoff modeling strategies that predict outputs based on several input variables: rainfall, runoff, temperature, evapotranspiration, and base flow data. To account for the time delay between rainfall and runoff, which reflects daily groundwater storage, the ANN and SVR approaches incorporate lagged rainfall and runoff as input data, in line with previous work by Sayed et al. (2023). The auto-correlation function, partial auto-correlation function, and cross-correlation function were used to find the appropriate lags of rainfall (P), temperature (T), evapotranspiration (ETP), and streamflow (Q) for predicting the current discharge (Qt) in the data-based approach. The predictors include the antecedent runoff, which indicates initial basin conditions, and climate variables at different lags. The lags that exhibited the highest correlation with the dependent variable (runoff discharge) were chosen as inputs. Consequently, 11 candidate input variables were prepared for predicting runoff at time t: Tt, Tt−1, Pt, Pt−1, Pt−2, ETPt, ETPt−1, ETPt−2, Qt−1, Qt−2, and Bt (base flow), where t − 1 and t − 2 represent lag 1 and lag 2, respectively.
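The lag selection described above can be sketched as a cross-correlation scan: correlate the driver at time t − k with the discharge at time t for each candidate lag k and keep the lag with the highest value. The helper names and the synthetic series are assumptions for illustration only; the study's own series and software are not reproduced here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(driver, target, lag):
    """Correlation between the driver at time t - lag and the target at time t."""
    if lag == 0:
        return pearson(driver, target)
    return pearson(driver[:-lag], target[lag:])


def best_lag(driver, target, max_lag):
    """Lag in 0..max_lag with the highest cross-correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(driver, target, k))


# Synthetic series (not study data): flow responds to rainfall one day later.
rain = [0, 5, 0, 0, 8, 0, 2, 0, 0, 6, 0, 0]
flow = [1.0] + [1.0 + 2.0 * p for p in rain[:-1]]
chosen = best_lag(rain, flow, max_lag=2)
print(chosen)
```

In practice the same scan would be repeated for each candidate predictor (P, T, ETP, and Q itself), which is how a set of lagged inputs such as Pt−1 or Qt−2 would be shortlisted.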

Data splitting

To develop both the process-based and data-based models, the whole dataset was divided chronologically into two sets: 65% (2000–2010) as seen data and 35% (2011–2016) as unseen data. The seen data were used for building the models, and the remaining unseen data were used to check performance after model development.
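A chronological split of this kind can be sketched as follows. The record layout (a list of date and value pairs) and the function name are assumptions; the placeholder values stand in for the daily series.

```python
from datetime import date, timedelta


def split_by_year(records, last_training_year):
    """Split (date, value) records into seen and unseen sets by calendar year."""
    seen = [r for r in records if r[0].year <= last_training_year]
    unseen = [r for r in records if r[0].year > last_training_year]
    return seen, unseen


# One placeholder record per day of the 2000-2016 study period.
start, end = date(2000, 1, 1), date(2016, 12, 31)
records = [(start + timedelta(days=i), 0.0) for i in range((end - start).days + 1)]

seen, unseen = split_by_year(records, last_training_year=2010)
seen_share = len(seen) / len(records)
print(len(seen), len(unseen), round(seen_share, 2))
```

Splitting by year rather than by random sampling keeps the test period strictly later than the training period, so lagged inputs such as Qt−1 never leak information from the unseen years into model building.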

Input combination sensitivity

To build the input combinations, the rainfall-runoff process was treated as a Markovian process, in which the discharge at a particular point in space and time is a function of a finite number of prior observations (Aytek et al. 2008; Narayana Reddy & Pramada 2022). Under this assumption, the model variables can be stated as shown in Table 1. Five scenarios were considered to assess the sensitivity of the output discharge to the different input combinations. Based on the performance metrics considered (R², NSE, and RMSE), the second scenario performed best in the data-based models (Table 1).

Table 1

Input combination sensitivity analysis

| Scenario | Input combination | Output | Model structure |
|---|---|---|---|
| 1 | Tt, ETPt, Pt, Qt−1, Qt−2, and Bt | Qt | (6,10,1) |
| 2 | Pt, Pt−2, Qt−1, Tt, ETPt−1, ETPt−2, and Bt | Qt | (7,10,1) |
| 3 | Pt, Pt−1, Pt−2, Qt−1, Qt−2, Tt, ETPt−1, ETPt, and Bt | Qt | (9,10,1) |
| 4 | Pt, Pt−1, Qt−1, Qt−2, Tt−1, ETPt, Pt−2, and Bt | Qt | (8,10,1) |
| 5 | Pt, Pt−1, Pt−2, Qt−1, Qt−2, Tt, Tt−1, ETPt, ETPt−1, ETPt−2, and Bt | Qt | (11,10,1) |

P, rainfall; T, temperature; ETP, evapotranspiration; Q, streamflow; t, period in day; t − 1, lag 1; t − 2, lag 2; B, base flow.

ANN approach

The ANN model we employed is a backpropagation ANN, which contains three layers. Each neuron in the network takes a weighted sum of the inputs from the neurons in the preceding layer and turns it into an output signal using an activation function. For instance, the temporary signal of hidden neuron n, Hn, is calculated as a function of the weighted sum of inputs, WIm,nIm, transformed by the activation function f. Similarly, the final output of neuron l, Ol, is determined by the weighted sum of temporary signals, WHn,lHn, passed through the activation function f, as in the following equation:
$$H_n = f\left(\sum_{m} WI_{m,n}\, I_m\right), \qquad O_l = f\left(\sum_{n} WH_{n,l}\, H_n\right) \tag{1}$$
To ensure effective training, the inputs to the neurons, Im, are normalized. The synaptic weights, WIm,n and WHn,l, signify the strength of connections between neurons. The rectified linear unit (ReLU) transfer function, ReLU(x) = max(0, x), is used in the hidden and output layers. This function introduces non-linearity and allows the network to capture complex relationships in the data. The training process aims to minimize a cost function E, defined as follows:
$$E = \frac{1}{2P}\sum_{p=1}^{P}\sum_{l=1}^{L}\left[e_l(p)\right]^2 \tag{2}$$
where P is the number of training data points, L is the total number of neurons in the output layer, and el(p) is the difference between the target value and the output value at neuron l for the pth training pattern. By iteratively adjusting the weights and biases using optimization algorithms, the ANN learns to make precise predictions. Once the optimal weights and biases are determined, the model can be applied to forecast streamflow values for data that have not yet been seen.
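The forward pass and cost described above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: a single hidden layer, no bias terms, hand-picked weights, and a squared-error cost normalized by 2P (the normalization constant is an assumption and does not affect which weights minimize the cost).

```python
def relu(x):
    """Rectified linear unit, ReLU(x) = max(0, x)."""
    return max(0.0, x)


def forward(inputs, w_in, w_out):
    """Hidden signals H_n and outputs O_l of a one-hidden-layer network.

    w_in[m][n]   weight from input m to hidden neuron n (WI in the text)
    w_out[n][l]  weight from hidden neuron n to output l (WH in the text)
    """
    n_hidden, n_out = len(w_in[0]), len(w_out[0])
    hidden = [relu(sum(inputs[m] * w_in[m][n] for m in range(len(inputs))))
              for n in range(n_hidden)]
    outputs = [relu(sum(hidden[n] * w_out[n][l] for n in range(n_hidden)))
               for l in range(n_out)]
    return hidden, outputs


def cost(targets, predictions):
    """Sum of squared errors over patterns and output neurons, divided by 2P."""
    total = sum((t - o) ** 2
                for target_row, output_row in zip(targets, predictions)
                for t, o in zip(target_row, output_row))
    return total / (2 * len(targets))


# Illustrative weights and inputs (assumed values, already normalized).
hidden, outputs = forward([1.0, 2.0],
                          w_in=[[0.5, -1.0], [0.25, 0.5]],
                          w_out=[[2.0], [3.0]])
error = cost([[2.5]], [outputs])
print(hidden, outputs, error)
```

Training consists of repeating this forward pass over all patterns and adjusting the weights to reduce the cost; the backpropagation and optimizer steps are omitted here.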

The primary components of the ANN algorithm used in this research are as follows:

  • 1. Architecture and structure: The ANN model consists of three hidden layers, each containing 10 hidden nodes, along with an input layer including 6–11 input nodes and a single output node.

  • 2. Input parameters: For the feed-forward ANN architecture, seven input variables were used for streamflow prediction, selected from the scenarios tested: Pt, Pt−2, Qt−1, Tt, ETPt−2, ETPt−1, and Bt. These variables serve as inputs to the ANN and are linked to the hidden nodes.

  • 3. Optimization method: The Adam solver, a widely used optimization algorithm, was used to update the weights. Moreover, a grid search was implemented to explore various hyperparameter configurations and find the optimal values for the number of hidden layers, number of nodes in each hidden layer, maximum number of iterations, activation function, solver, and learning rate.

  • 4. Streamflow prediction: After the ANN is trained on the given inputs and streamflow data, the ANN can generate predictions on fresh data by feeding the input variables into the trained network.
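The grid search mentioned in item 3 amounts to scoring every combination of candidate hyperparameter values and keeping the best one. The sketch below shows only that enumeration. The candidate values are hypothetical, and `toy_score` is an arbitrary stand-in for the validation RMSE that a trained network would return; neither comes from the study.

```python
from itertools import product


def grid_search(param_grid, score):
    """Return (best_params, best_score, n_candidates), minimizing score(params)."""
    names = list(param_grid)
    best_params, best_score, n_candidates = None, float("inf"), 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        value = score(params)
        n_candidates += 1
        if value < best_score:
            best_params, best_score = params, value
    return best_params, best_score, n_candidates


# Hypothetical candidate values for three of the hyperparameters listed above.
param_grid = {
    "hidden_layers": [1, 2, 3],
    "nodes_per_layer": [5, 10, 20],
    "learning_rate": [0.001, 0.01],
}


def toy_score(params):
    """Arbitrary placeholder for a validation RMSE; not a real model evaluation."""
    return (abs(params["hidden_layers"] - 2)
            + abs(params["nodes_per_layer"] - 10)
            + 100 * abs(params["learning_rate"] - 0.01))


best_params, best_score, n_candidates = grid_search(param_grid, toy_score)
print(n_candidates, best_params)
```

The cost of the search grows with the product of the list lengths (18 candidates here), which is why the candidate lists are usually kept short.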

SVR approach

SVR is the form of the support vector machine algorithm used for regression. The goal is to fit a function that is as flat as possible while keeping the deviations from the target values within a certain error tolerance. The parameters of the linear function in Equation (3) are the weight vector (W) and the constant (b). ε is the maximum permitted deviation from the target values. Slack variables ξ and ξ*, both greater than or equal to zero, are introduced to account for deviations beyond ε. The constant C controls the tradeoff between the flatness of the function and the extent to which deviations larger than ε are tolerated. For every regression problem, the estimate function (F) takes the form:
$$F(x) = W \cdot T_f(x) + b \tag{3}$$
where x is the input vector and Tf stands for the non-linear transfer function.

The SVR method in this study was run with the following hyperparameters: kernel = radial basis function (RBF), ε = 0.01 (maximum deviation permitted), and C = 2 (controls the penalty for error). The RBF kernel implicitly maps the input vectors into a higher-dimensional feature space, where a linear fit may be easier. RBF kernels are frequently employed to capture complicated relationships, as they calculate the similarity between two samples based on their Euclidean distance. Optimizing SVR involves finding the optimal hyperparameters, and grid search was the method applied in this research: a predetermined set of hyperparameter values is searched methodically to find the combination that produces the highest performance. A 10-fold cross-validation technique was used to minimize the RMSE when optimizing the parameters of the SVR and ANN models during training.
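Three building blocks named in this section can be sketched directly: the RBF similarity between two samples, the ε-insensitive loss with ε = 0.01, and a 10-fold partition of the training indices. The kernel width `gamma` is not reported in the text and is an assumed parameter here, and the contiguous, unshuffled folds are likewise an assumption.

```python
from math import exp


def rbf_kernel(a, b, gamma):
    """RBF similarity exp(-gamma * ||a - b||^2); gamma is an assumed width."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return exp(-gamma * squared_distance)


def epsilon_insensitive_loss(target, prediction, epsilon=0.01):
    """Zero inside the epsilon tube, linear in the deviation outside it."""
    return max(0.0, abs(target - prediction) - epsilon)


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous (training, validation) folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        training = list(range(start)) + list(range(start + size, n_samples))
        folds.append((training, validation))
        start += size
    return folds


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)   # identical samples
apart = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=1.0)  # unit distance apart
inside = epsilon_insensitive_loss(1.0, 1.005)          # within the tube
outside = epsilon_insensitive_loss(1.0, 1.05)          # outside the tube
folds = k_fold_indices(25, k=10)
print(same, round(apart, 4), inside, round(outside, 4), len(folds))
```

In a cross-validated grid search, each hyperparameter combination would be fitted on the training indices of every fold and scored by RMSE on the matching validation indices, with the mean across folds used to rank the combinations.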

Rainfall-runoff modeling using HEC-HMS

HEC-HMS is a runoff simulation program widely used around the world (Sayed et al. 2023). Its main components are the basin model, the meteorological model, and the control specifications (Guduru et al. 2022). The model used the analyzed hydrometeorological data and the curve number (CN) derived from soil and land use data as input variables for runoff simulation. This study applied the soil conservation service (SCS)-CN method for loss calculation, the SCS unit hydrograph (SCS-UH) for runoff transformation, Muskingum for flood routing, and a constant monthly model for base flow separation, in line with the methodology of Namara et al. (2020) and Guduru et al. (2022). These methods were selected based on data availability and applicability. The spatial rainfall was estimated using the Thiessen polygon method.

Rainfall loss method

Various modeling techniques are available in HEC-HMS for loss estimation. The SCS-CN method was employed in the current study to compute excess rainfall because it is flexible, generally applicable for calculating runoff, requires minimal inputs, and produces accurate findings (Soulis 2021; Guduru et al. 2022). The total loss was computed with Equation (4), the initial abstraction with Equation (5), and the excess rainfall with Equation (6):
$$S = \frac{25{,}400}{CN} - 254 \tag{4}$$
$$I_a = 0.2\,S \tag{5}$$
$$Q = \frac{(P - I_a)^2}{P - I_a + S}, \qquad P > I_a \tag{6}$$
where CN is the curve number, Ia is the initial loss (mm), S is the total loss (mm), Q is the excess rainfall (mm), and P is the total precipitation depth (mm). The CN is computed from soil and land use/cover information. The CN grid was generated using HEC-GeoHMS in the ArcGIS environment with topographic data.
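As a worked illustration of the loss method, the sketch below evaluates the SCS-CN relations for a single storm depth. It assumes the standard metric forms (S = 25,400/CN − 254 and Ia = 0.2S, with zero runoff when P does not exceed Ia); the function name and the storm depth are hypothetical.

```python
def scs_cn_excess_rainfall(p_mm, cn):
    """Excess rainfall Q (mm) for precipitation depth P (mm) and curve number CN."""
    if not 0 < cn <= 100:
        raise ValueError("CN must lie in (0, 100]")
    s = 25400.0 / cn - 254.0   # total loss (potential retention), mm
    ia = 0.2 * s               # initial abstraction, assumed ratio of 0.2
    if p_mm <= ia:
        return 0.0             # all rainfall is lost before runoff starts
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical 50 mm storm on a sub-basin with CN = 80.
q_cn80 = scs_cn_excess_rainfall(50.0, 80.0)
q_small_storm = scs_cn_excess_rainfall(10.0, 80.0)
print(round(q_cn80, 2), q_small_storm)
```

With CN = 80 the retention is 63.5 mm and the initial abstraction 12.7 mm, so a 10 mm storm produces no excess rainfall while a 50 mm storm produces roughly 14 mm.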

Rainfall-runoff transform method: SCS-UH

The SCS-UH converts excess rainfall from a volume or depth into a discharge (Bhusal et al. 2022). The SCS-UH approach was used in this study to calculate direct runoff. The lag time, which was exported for each sub-basin to the HEC-HMS model using HEC-GeoHMS, is the sole parameter of the SCS-UH approach. The watershed lag time was computed from the physical features extracted from the topographic data of the study watershed using the following equation:
$$T_{lag} = \frac{L^{0.8}\,(S + 1)^{0.7}}{1900\,\gamma^{0.5}} \tag{7}$$

where Tlag stands for lag time (h), L is the hydraulic length of the basin (ft), γ is the basin slope (%), and S is the total loss (in).
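The lag-time calculation can be sketched as below, assuming the standard SCS lag equation in US customary units, with the retention S in inches derived from the curve number as S = 1000/CN − 10. The function name and the basin values are hypothetical.

```python
def scs_lag_time_hours(length_ft, cn, slope_percent):
    """SCS lag time (h) from hydraulic length (ft), CN, and basin slope (%)."""
    s_in = 1000.0 / cn - 10.0   # retention in inches, from the curve number
    return (length_ft ** 0.8 * (s_in + 1.0) ** 0.7
            / (1900.0 * slope_percent ** 0.5))


# Hypothetical sub-basin: 10,000 ft flow path, CN = 80, 4% average slope.
lag_hours = scs_lag_time_hours(10000.0, 80.0, 4.0)
print(round(lag_hours, 3), round(lag_hours * 60.0, 1))
```

Multiplying by 60 converts the result to minutes, the unit in which basin lag time is listed among the HEC-HMS parameters. Longer flow paths and lower curve numbers lengthen the lag, while steeper slopes shorten it.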

Routing method: Muskingum

The Muskingum routing method was implemented in this research because it is less complicated and requires fewer input data than other methods (Farzin et al. 2018; Namara et al. 2020; Lee 2021). The main objective of this method is to compute the runoff hydrograph at the sub-basin outlets. It requires two input parameters: the flood wave attenuation factor (X) and the travel time (K) of the flood wave through the routing reach. These values were determined during the calibration process using observed hydrometeorological data, based on Equation (8):
$$S = K\left[X I + (1 - X)\,Q\right] \tag{8}$$
where K is the flood wave traveling time, S is storage, I is inflow, Q is outflow, and X is a weighting factor. Figure 2 illustrates the overall approach taken.
Figure 2

General methodology used.
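For the Muskingum routing step, the storage relation is usually applied through three routing coefficients derived from K, X, and the time step. The sketch below uses that standard discretization; the values of K, X, the time step, and the inflow hydrograph are illustrative assumptions, not calibrated values from the study.

```python
def muskingum_coefficients(k, x, dt):
    """Routing coefficients C0, C1, C2 for travel time k, weight x, and step dt."""
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom
    return c0, c1, c2


def route(inflow, k, x, dt):
    """Route an inflow hydrograph through one reach (initial outflow = inflow)."""
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    outflow = [inflow[0]]
    for i in range(1, len(inflow)):
        outflow.append(c0 * inflow[i] + c1 * inflow[i - 1] + c2 * outflow[-1])
    return outflow


# Illustrative reach: K = 24 h, X = 0.2, daily time step, inflow in m3/s.
coefficients = muskingum_coefficients(24.0, 0.2, 24.0)
inflow = [100.0, 300.0, 600.0, 400.0, 200.0, 100.0, 100.0]
outflow = route(inflow, 24.0, 0.2, 24.0)
print([round(c, 3) for c in coefficients], [round(q, 1) for q in outflow])
```

The coefficients sum to one, so a steady inflow passes through unchanged, while a flood peak is delayed and attenuated. The coefficients stay non-negative only when the time step lies between 2KX and 2K(1 − X), which is worth checking for any calibrated pair.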


Sensitivity analysis

After the optimized parameters were determined, their sensitivity was assessed. CN, basin lag time, Muskingum K, and Muskingum X exhibited higher sensitivity than the other parameters. The CN was increased and decreased by 30% from its optimized value, and basin lag time, Muskingum K, and Muskingum X were varied by 20, 15, and 10%, respectively. The impact of these changes on the total runoff volume was then evaluated.
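The one-at-a-time perturbation described above can be sketched with a simple response function. As a stand-in for the full HEC-HMS run, the example uses single-storm SCS-CN excess rainfall (standard metric form, Ia = 0.2S) and perturbs the optimized CN of sub-basin W670 by ±30%; the storm depth is hypothetical, and the resulting percentages are illustrative rather than study results.

```python
def scs_runoff(p_mm, cn):
    """Single-storm SCS-CN excess rainfall (mm), used here as a toy response."""
    s = 25400.0 / cn - 254.0
    ia = 0.2 * s
    return (p_mm - ia) ** 2 / (p_mm - ia + s) if p_mm > ia else 0.0


def percent_change(model, base_value, fraction):
    """Change (%) in model output when the parameter is scaled by 1 + fraction."""
    base = model(base_value)
    return 100.0 * (model(base_value * (1.0 + fraction)) - base) / base


cn_optimized = 64.571   # optimized CN of sub-basin W670 (Table 2)
storm_mm = 100.0        # hypothetical storm depth


def response(cn):
    return scs_runoff(storm_mm, cn)


changes = {fraction: percent_change(response, cn_optimized, fraction)
           for fraction in (-0.30, 0.30)}
for fraction, change in changes.items():
    print(f"CN {fraction:+.0%}: runoff {change:+.1f}%")
```

Even in this toy setting the runoff response to CN is large and asymmetric, which is consistent with CN being treated as a key calibration parameter.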

Model performance assessment

Four statistical indicators were employed to assess the effectiveness of the data-based models and the physical model (HEC-HMS): the coefficient of determination (R²), NSE, root mean square error (RMSE), and mean absolute error (MAE).

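These statistical indices were calculated using Equations (9)–(12). During training and testing, the NSE was used to assess how well the trend of the target hydrograph and the simulation agree overall. According to Moriasi et al. (2015) and Nash & Sutcliffe (1970), an NSE score of 0.75–1 indicates very good performance, whereas a score of 0.65–0.75 indicates good performance:
$$R^2 = \left[\frac{\sum_{i=1}^{n} (Q_{m,i} - Q_{mav})(Q_{p,i} - Q_{pav})}{\sqrt{\sum_{i=1}^{n} (Q_{m,i} - Q_{mav})^2 \, \sum_{i=1}^{n} (Q_{p,i} - Q_{pav})^2}}\right]^2 \tag{9}$$
$$NSE = 1 - \frac{\sum_{i=1}^{n} (Q_{m,i} - Q_{p,i})^2}{\sum_{i=1}^{n} (Q_{m,i} - Q_{mav})^2} \tag{10}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (Q_{m,i} - Q_{p,i})^2} \tag{11}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|Q_{m,i} - Q_{p,i}\right| \tag{12}$$

In this case, Qm represents the measured flow rate, Qp is the predicted flow rate, Qmav is the average measured discharge, and Qpav is the average predicted discharge. Additionally, n denotes the total number of observations.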
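The four indicators can be computed as in the sketch below. It assumes the usual definitions: R² as the squared Pearson correlation between measured and predicted flows, NSE relative to the mean of the measured flows, and RMSE and MAE as averages over the n observations. The function names and flow values are made up for illustration.

```python
from math import sqrt


def nse(measured, predicted):
    """Nash-Sutcliffe efficiency."""
    mean_m = sum(measured) / len(measured)
    residual = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    spread = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - residual / spread


def rmse(measured, predicted):
    """Root mean square error."""
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted flows."""
    mean_m = sum(measured) / len(measured)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)


# Made-up flows (m3/s) for illustration only.
q_measured = [10.0, 20.0, 30.0, 40.0]
q_predicted = [12.0, 18.0, 33.0, 37.0]
scores = {
    "R2": r_squared(q_measured, q_predicted),
    "NSE": nse(q_measured, q_predicted),
    "RMSE": rmse(q_measured, q_predicted),
    "MAE": mae(q_measured, q_predicted),
}
print({name: round(value, 3) for name, value in scores.items()})
```

R² and NSE can differ for the same pair of series because R² is insensitive to systematic bias in the predictions while NSE penalizes it, which is why the two are reported side by side.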

HEC-HMS model

In this study, the HEC-HMS model was used for streamflow simulation due to its robust ability to model key hydrologic processes affecting streamflow. This model allows for complex simulation of rainfall-runoff processes like infiltration, evapotranspiration, and routing, which are essential for precise streamflow predictions. Water use does not significantly influence hydrological processes in the upper Baro watershed due to the underutilization of water resources and a lack of reliable water consumption data. As a result, the study has chosen not to include it in the HEC-HMS modeling. The analysis focuses solely on natural factors impacting hydrological patterns in the area.

HEC-HMS model parameters

The study area parameters listed in Table 2 (CN, basin lag time, Muskingum K, and Muskingum X) were adjusted by increments and decrements of 10, 15, 20, and 30% from their optimized values, and the resulting total runoff volume for the basin was observed, as depicted in Figure 3. The CN was the parameter most sensitive to changes in the upper Baro watershed. Conversely, Muskingum K and Muskingum X exhibited minimal sensitivity, resulting in negligible alterations to the runoff volume.
Table 2

HEC-HMS optimized parameters

| Element | Parameter | Units | Minimum value | Maximum value | Optimized value |
|---|---|---|---|---|---|
| R100 | Muskingum K | HR | 0.1 | 150 | 33.073 |
| R130 | Muskingum X | | | 0.5 | 0.5 |
| W470 | CN | | 30 | 99 | 75.792 |
| W390 | CN | | 30 | 99 | 80.344 |
| W430 | CN | | 30 | 99 | 75.887 |
| W520 | Basin lag time | MIN | | 30,000 | 1,440 |
| W730 | CN | | 30 | 99 | 61.848 |
| W700 | Basin lag time | MIN | | 30,000 | 1,354 |
| R140 | Muskingum K | HR | 0.1 | 150 | 66.073 |
| W690 | CN | | 30 | 99 | 67.772 |
| W670 | CN | | 30 | 99 | 64.571 |
| W610 | CN | | 30 | 99 | 64.735 |
| R90 | Muskingum X | | | 0.5 | 0.37255 |
| W680 | CN | | 30 | 99 | 72.274 |

R, reach; W, sub-basin; X, weighted coefficient of discharge; K, flood wave travel time; HR, hour; MIN, minute.

Figure 3

Sensitivity of HEC-HMS model parameters for the upper Baro watershed.


The HEC-HMS model's performance result

As shown in Figure 4, HEC-HMS successfully captures the high-flow occurrences for the majority of the simulated period and appropriately represents the low-flow situations. Although it usually followed the falling limb trend, there were several years when HEC-HMS underestimated the observed values during the rising limb. The statistical evaluation results of the HEC-HMS model are presented in Table 4. The model's statistical performance assessment shows dependable outcomes, with values for R² and NSE above 0.75. This level of performance falls in the range that previous researchers such as Nash & Sutcliffe (1970) and Moriasi et al. (2015) regard as very good (NSE and R² values greater than 0.75). These findings indicate that the method is appropriate for application within the study area.
Figure 4

The HEC-HMS simulated runoff hydrograph.

Input parameter sensitivity analysis for the ML methods

The choice of input scenario had a notable influence on SVR and ANN performance. Adding more input variables did not strengthen the relationship between the input features and the target. A smaller set of clean data is preferable to a larger set of noisy data, and the common assumption that more data always yields a better regression model did not hold here. As shown in Table 1, scenarios 1, 2, 3, 4, and 5 used 6, 7, 9, 8, and 11 input variables, respectively. The second scenario, with seven input variables, was more accurate than the other scenarios.

ML approach performance analysis

Figure 5(a) shows that the ANN results closely follow the observed daily streamflow and capture both peak and low flows. The scatter plots in Figures 5(b) and 5(c) indicate a strong relationship between the ANN predictions and the measured data.
Figure 5

ANN model: (a) hydrograph, (b) scatter plot in training, and (c) scatter plot in testing.

For the SVR model, Figure 6(a) shows that, although peak flows were underestimated for most of the simulation period, the simulated values still followed the pattern of the measured data. Figures 6(b) and 6(c) further show the relationship between the predicted and measured data for SVR.
Figure 6

SVR model: (a) hydrograph, (b) scatter plot in training, and (c) scatter plot in testing.

Additionally, the statistical analysis produced very good results for both ANN and SVR, with high R2 and NSE values that are consistent with previous studies (Kan et al. 2020; Vidyarthi et al. 2020; Tamiru & Dinka 2021). Both methods achieved NSE and R2 values greater than 0.75, which indicates very good performance according to Nash & Sutcliffe (1970) and Moriasi et al. (2015).

Hyperparameter tuning of ANN and SVR

Table 3 lists the optimized ANN hyperparameters, which were found by grid search over a range of candidate values. For the SVR model, the key parameters of the radial basis function kernel, namely the penalty coefficient (C), gamma, and the tolerance threshold (ε), were optimized within specified ranges. The ε parameter sets the deviation between observed and predicted values that is tolerated without penalty; a value of 0.01 was chosen in this study. The error cost C, which balances the flatness of the function against deviations larger than ε, was set to 2. The tuned parameters of both the SVR and ANN models were then used to simulate runoff.
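
The roles of ε and C can be illustrated with the standard ε-insensitive loss used in support vector regression. This is a minimal sketch of the textbook formulation, not the software used in the study; the function names and the sample values are hypothetical.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon=0.01):
    """Per-sample loss: errors smaller than epsilon cost nothing."""
    return [max(0.0, abs(o - p) - epsilon) for o, p in zip(observed, predicted)]


def svr_error_penalty(observed, predicted, c=2.0, epsilon=0.01):
    """Error term of the SVR objective: C times the summed epsilon-insensitive loss."""
    return c * sum(epsilon_insensitive_loss(observed, predicted, epsilon))


# Hypothetical values: the first error lies inside the epsilon tube, the second outside
losses = epsilon_insensitive_loss([1.0, 1.0], [1.005, 1.5], epsilon=0.01)
penalty = svr_error_penalty([1.0, 1.0], [1.005, 1.5], c=2.0, epsilon=0.01)
```

A larger C penalizes deviations outside the tube more heavily, which favors a closer fit to the training data over a flatter function.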

Table 3

Optimized hyperparameters using the grid search method in ANN

| Parameter | Optimized value |
| --- | --- |
| Hidden layers | |
| Nodes | 10 |
| Maximum iteration | 10,000 |
| Initial learning rate | 0.001 |
| Activation function | ReLU |
| Solver | Adam |
| Learning rate | Constant |
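
The grid search procedure named in the Table 3 caption can be sketched generically as follows. The candidate values in `param_grid` and the scoring function `toy_score` are hypothetical: in practice the score would come from training the ANN with each combination and evaluating it on held-out data, and the toy function below is an arbitrary stand-in, not a model result.

```python
import math
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)   # higher is better, e.g. a validation NSE
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values (the ranges actually tested are not given here)
param_grid = {
    "nodes": [5, 10, 20],
    "initial_learning_rate": [0.0001, 0.001, 0.01],
    "activation": ["relu", "tanh"],
}


def toy_score(params):
    """Arbitrary stand-in for 'train the model and return a validation score'."""
    penalty = abs(params["nodes"] - 10) / 10.0
    penalty += abs(math.log10(params["initial_learning_rate"]) + 3.0)
    penalty += 0.0 if params["activation"] == "relu" else 0.5
    return -penalty


best_params, best_score = grid_search(param_grid, toy_score)
```

Grid search is exhaustive, so the number of model fits grows as the product of the list lengths (18 combinations in this sketch).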
Table 4

Performance evaluation results for ANN, SVR, and HEC-HMS

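| Statistical indicator | ANN training | ANN testing | SVR training | SVR testing | HEC-HMS calibration | HEC-HMS validation |
| --- | --- | --- | --- | --- | --- | --- |
| R2 | 0.99 | 0.9925 | 0.98 | 0.991 | 0.867 | 0.89 |
| NSE | 0.993 | 0.98 | 0.98 | 0.97 | 0.87 | 0.85 |
| RMSE (m3/s) | 27.74 | 24 | 38 | 27 | 122.9 | 113.4 |
| MAE (m3/s) | 13.23 | 12.94 | 21.5 | 16.1 | 80.4 | 77.4 |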
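
The four indicators in Table 4 can be computed from paired observed and simulated discharge series as in the minimal sketch below. It uses the standard definitions and assumes that R2 is the squared Pearson correlation coefficient, which the text does not state explicitly. The function names and the sample series are hypothetical.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def r2(obs, sim):
    """Coefficient of determination, assumed here to be squared Pearson r."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical discharge series (m3/s): the simulation has a constant bias of +1
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = {
    "NSE": nse(observed, simulated),
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "R2": r2(observed, simulated),
}
```

The biased series illustrates why the indicators are reported together: R2 rewards correlation only and is unaffected by a constant offset, whereas NSE, RMSE, and MAE all penalize it.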

ML and HEC-HMS model comparison

Among the five scenarios considered, scenario 2 (Figures 5 and 6) performed best. The scenario 2 versions of the ANN and SVR models were therefore selected for comparison with the physical HEC-HMS model (Figure 7), based on the performance reported in Table 4.
Figure 7

(a) Hydrograph and (b) scatter plot comparing simulated and observed discharge.

In this research, the ML models outperformed HEC-HMS in the watershed in terms of consistency and prediction accuracy. Notably, for both ML models the RMSE and MAE were lower in testing than in training (Table 4), although the NSE was slightly lower. HEC-HMS did not perform as well as expected, likely because of the parameters obtained from the field. Given the assumption of stationarity and the heterogeneity of the selected catchment, considerable effort is needed to determine optimal parameters with the HEC-HMS approach.

This study evaluates the efficacy of three hydrological models, HEC-HMS, ANN, and SVR, in predicting runoff. The findings show that the ANN model performed better than the HEC-HMS and SVR models in terms of R2 and NSE, indicating greater accuracy in predicting daily runoff. The ANN model achieved a testing NSE of 0.98, which is comparable to or higher than results reported in similar studies. For instance, Xiang et al. (2020) reported an NSE of 0.90 using long short-term memory-based models for rainfall-runoff modeling. Additionally, the RMSE achieved by ANN in this research (27.74 m3/s in training and 24 m3/s in testing) is lower than the 170 m3/s reported by Fan et al. (2020) in their study using traditional hydrological models, underscoring the effectiveness of ML models in capturing the complexities of hydrological processes in data-scarce environments.

Furthermore, the NSE of 0.85 obtained with the HEC-HMS model in this study is in line with Namara et al. (2020), who reported an NSE of 0.739 using HEC-HMS for runoff prediction in a similar geographic setting, and with Guduru et al. (2022), who reported an NSE of 0.804 using HEC-HMS for runoff prediction. This comparison supports the model's performance here and highlights the consistency of HEC-HMS across diverse applications.

In this research, the HEC-HMS and SVR models represented low-flow conditions well but had difficulty with peak flows. The ANN model, by contrast, reproduced both high and low flows accurately and captured the measured flow more closely than HEC-HMS and SVR on both the rising and falling limbs of the hydrograph.

Despite SVR and HEC-HMS's good performance, the results showed that the ANN model outperformed them both significantly in terms of prediction accuracy and consistency. Due to the crucial significance of efficient water resource management in Ethiopia, this research study strives to enhance scientific knowledge in hydrological modeling while providing practical solutions for areas facing similar issues. Through the utilization of ANN and SVR, which require less data for calibration compared with conventional physical models, this investigation adds to the existing knowledge by showcasing their ability to enhance prediction accuracy when data availability is limited.

While the use of a substantial dataset including 17 years of hydrometeorological data for rainfall-runoff modeling is advantageous to this study, it is crucial to recognize that data scarcity has resulted in some restrictions. The findings' generalizability outside the particular period and geographic extent covered may be limited due to the constrained nature of the accessible dataset.

Policymakers can improve the accuracy of their forecasts and decision-making processes concerning infrastructure planning, flood control, and water resource management by utilizing ANN. It would be beneficial to prioritize the uptake and development of ANN-based models to develop more resilient and successful policies that address the effects of extreme weather events and guarantee sustainable water management practices.

Future research could focus on hybrid modeling approaches that leverage the strengths of both HEC-HMS and ML models to enhance runoff predictions. Additionally, efforts should be made to improve the transparency and interpretability of ML models for better integration into decision-making processes within hydrological studies.

In this study, three distinct hydrological models (HEC-HMS, ANN, and SVR) were comprehensively evaluated for predicting runoff in the upper Baro watershed. The findings show that the ANN model performed better than the HEC-HMS and SVR models in terms of R2 and NSE, indicating greater accuracy in predicting daily runoff.

Given the critical importance of effective water resource management in Ethiopia, this study not only aims to advance scientific understanding in hydrological modeling but also seeks to offer practical solutions for regions with similar challenges. By employing ANN and SVR, which are less reliant on extensive data for calibration compared with traditional physical models, this research contributes to the body of knowledge by demonstrating their potential to improve prediction accuracy under data constraints. The results have important implications for water resource management, allocation, flood risk planning, and damage assessment.

Future research could focus on hybrid modeling approaches that leverage the strengths of both HEC-HMS and ML models to enhance runoff predictions. Additionally, efforts should be made to improve the transparency and interpretability of ML models for better integration into decision-making processes within hydrological studies.

The authors would like to thank the Ethiopian Ministry of Water Resource (MoWR) and the Ethiopian National Meteorological Institute (NMI) for providing the data for this study. Additionally, the author would like to thank Haramaya University for providing the opportunity for my PhD studies and for sponsoring my tuition. I want to express my gratitude to the anonymous reviewer for their insightful comments, which greatly improved the quality of the paper.

I certify that the information presented here is true and complete to the best of my knowledge. I declare that this work has not been published elsewhere and has not been submitted to any other journal for publication.

The authors did not receive support from any organization for the submitted work.

All authors contributed to the study's conception and design. Material preparation, data collection, and analysis were performed by Y.B.E. The first draft of the manuscript was written and edited by Y.B. Both A.K. and M.M. read and commented on previous versions of the manuscript. The final version was proofread by Y.B. All authors read and approved the final manuscript.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Alemseged T., Negash W. & Ermias A. 2014 Impact of Flooding on Human Settlement in Rural Households of Gambella Region in Ethiopia. UNECA, Addis Ababa, Ethiopia.
Aytek A., Asce M. & Alp M. 2008 An application of artificial intelligence for rainfall-runoff modeling. Journal of Earth System Science 117, 145–155.
Badrzadeh H., Sarukkalige R. & Jayawardena A. 2015 Hourly runoff forecasting for flood risk management: Application of various computational intelligence models. Journal of Hydrology 529, 1633–1643.
Burhanuddin S. N. Z. A., Deni S. M. & Ramli N. M. 2017 Imputation of missing rainfall data using revised normal ratio method. Advanced Science Letters 23, 10981–10985.
Chekole A. G., Belete M. A., Fikadie F. T. & Wubneh M. A. 2024 Evaluate the performance of HEC-HMS and SWAT models in simulating the streamflow in the Gumara watershed, Ethiopia. Sustainable Water Resources Management 10, 26.
Chiang S., Chang C.-H. & Chen W.-B. 2022 Comparison of rainfall-runoff simulation between support vector regression and HEC-HMS for a rural watershed in Taiwan. Water 14, 1–18.
De Silva R. P., Dayawansa N. & Ratnasiri M. 2007 A comparison of methods used in estimating missing rainfall data. Journal of Agricultural Sciences 3, 101–108.
Deulkar A. M., Londhe S. N., Jain R. K. & Dixit P. R. 2024 Rainfall-runoff modelling – A comparison of artificial neural networks (ANNs) and Hydrologic Engineering Centre-Hydrologic Modelling System (HEC-HMS). ISH Journal of Hydraulic Engineering 30, 1–11.
Fan H., Jiang M., Xu L., Zhu H., Cheng J. & Jiang J. 2020 Comparison of long short term memory networks and the hydrological model in runoff simulation. Water 12, 1–15.
Farzin S., Singh V. P., Karami H., Farahani N., Ehteram M., Kisi O., Allawi M. F., Mohd N. S. & El-Shafie A. 2018 Flood routing in river reaches using a three-parameter Muskingum model coupled with an improved bat algorithm. Water 10, 1130.
Guduru J., Jilo N., Rabba Z. & Namara W. 2022 Rainfall-runoff modeling using HEC-HMS model for Meki River watershed, rift valley basin, Ethiopia. Journal of African Earth Sciences 197, 104743.
Hassaan H. A., Rauf A. U., Ghumman A. R., Khan S. & Aamir E. 2024 Assessment of climate change impact on inflows to Amandara headwork using HEC-HMS and ANNs. Journal of Umm Al-Qura University for Engineering and Architecture 15, 1–18.
Kabbilawsh P., Kumar D. S. & Chithra N. 2023 Assessment of temporal homogeneity of long-term rainfall time-series datasets by applying classical homogeneity tests. Environment, Development and Sustainability 26, 1–45.
Kan G., Li J., Zhang X., Ding L., He X., Liang K., Jiang X., Ren M., Li H. & Wang F. 2017 A new hybrid data-driven model for event-based rainfall-runoff simulation. Neural Computing and Applications 28, 2519–2534.
Kan G., Liang K., Yu H., Sun B., Ding L., Li J., He X. & Shen C. 2020 Hybrid machine learning hydrological model for flood forecast purpose. Open Geosciences 12, 813–820.
Khaira H. 2024 Hydrological modelling with HEC-HMS in Krueng Peudada sub-watershed Bireuen Regency. IOP Conference Series: Earth and Environmental Science 1391, 012032.
Mekonnen M. A., Wörman A., Dargahi B. & Gebeyehu A. 2009 Hydrological modelling of Ethiopian catchments using limited data. Hydrological Processes: An International Journal 23, 3401–3408.
Mengistu A. G., Woldesenbet T. A., Dile Y. T. & Bayabil H. K. 2022 Modeling the impacts of climate change on hydrological processes in the Baro–Akobo River basin, Ethiopia. Acta Geophysica 71, 1915–1935.
Mohammed R. & Scholz M. 2023 Quality control and homogeneity analysis of precipitation time series in the climatic region of Iraq. Atmosphere 14, 1–12.
Mohseni U. & Muskula S. B. 2023 Rainfall-runoff modeling using artificial neural network – A case study of Purna sub-catchment of Upper Tapi Basin, India. Environmental Sciences Proceedings 25, 1–8.
Moriasi D. N., Gitau M. W., Pai N. & Daggupati P. 2015 Hydrologic and water quality models: Performance measures and evaluation criteria. Transactions of the ASABE 58, 1763–1785.
Namara W. G., Damise T. A. & Tufa F. G. 2020 Rainfall runoff modeling using HEC-HMS: The case of Awash Bello sub-catchment, upper Awash basin, Ethiopia. International Journal of Environment 9, 68–86.
Parisouj P., Mokari E., Mohebzadeh H., Goharnejad H., Jun C., Oh J. & Bateni S. M. 2022 Physics-informed data-driven model for predicting streamflow: A case study of the Voshmgir Basin, Iran. Applied Sciences 12, 7464.
Peker İ. B., Gülbaz S., Demir V., Orhan O. & Beden N. 2024 Integration of HEC-RAS and HEC-HMS with GIS in flood modeling and flood hazard mapping. Sustainability 16, 1–18.
Radfar A. & Rockaway T. D. 2016 Captured runoff prediction model by permeable pavements using artificial neural networks. Journal of Infrastructure Systems 22, 04016007.
Rajaee T., Khani S. & Ravansalar M. 2020 Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: A review. Chemometrics and Intelligent Laboratory Systems 200, 103978.
Sahoo S., Russo T., Elliott J. & Foster I. 2017 Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resources Research 53, 3878–3895.
Sayed B. T., Al-Mohair H. K., Alkhayyat A., Ramírez-Coronel A. A. & Elsahabi M. 2023 Comparing machine-learning-based black box techniques and white box models to predict rainfall-runoff in a northern area of Iraq, the Little Khabur River. Water Science and Technology 87, 812–822.
Shakarneh M. O. A., Khan A. J., Mahmood Q., Khan R., Shahzad M. & Tahir A. A. 2022 Modeling of rainfall–runoff events using HEC-HMS model in southern catchments of Jerusalem Desert-Palestine. Arabian Journal of Geosciences 15, 127.
Sichangi A. W., Wang L., Yang K., Chen D., Wang Z., Li X., Zhou J., Liu W. & Kuria D. 2016 Estimating continental river basin discharges using multiple remote sensing data sets. Remote Sensing of Environment 179, 36–53.
Soulis K. X. 2021 Soil Conservation Service Curve Number (SCS-CN) Method: Current Applications, Remaining Challenges, and Future Perspectives. MDPI, Athens, Greece.
Tamiru H. & Dinka M. O. 2021 Application of ANN and HEC-RAS model for flood inundation mapping in lower Baro Akobo River Basin, Ethiopia. Journal of Hydrology: Regional Studies 36, 100855.
Tourian M., Sneeuw N. & Bárdossy A. 2013 A quantile function approach to discharge estimation from satellite altimetry (ENVISAT). Water Resources Research 49, 4174–4186.
Vidyarthi V. K., Jain A. & Chourasiya S. 2020 Modeling rainfall-runoff process using artificial neural network with emphasis on parameter sensitivity. Modeling Earth Systems and Environment 6, 2177–2188.
Xiang Z., Yan J. & Demir I. 2020 A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resources Research 56, e2019WR025326.
Yifru B. A., Lim K. J., Bae J. H., Park W. & Lee S. 2024 A hybrid deep learning approach for streamflow prediction utilizing watershed memory and process-based modeling. Hydrology Research 55, 498–518.
Zounemat-Kermani M., Batelaan O., Fadaee M. & Hinkelmann R. 2021 Ensemble machine learning paradigms in hydrology: A review. Journal of Hydrology 598, 126266.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).