Reducing uncertainty in streamflow simulation is vital for effective water resource management. The impact of uncertainty in model calibration data (discharge), commonly derived from the rating curve, is often overlooked. This study applies the Monte Carlo simulation technique (MCST) to assess uncertainty in the rating curve. Two advanced machine learning (ML) models, bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent units (BiGRU), were compared to evaluate how this uncertainty propagates into streamflow simulation at daily and monthly temporal scales. Separate streamflow datasets, derived from the fitted curve and from its lower and upper uncertainty bands, were used to train the ML models independently. The results show that rating curve uncertainty has a substantial impact on streamflow simulation, with the BiGRU model outperforming the BiLSTM model at both scales. The uncertainty in the rating curve translates into streamflow uncertainty of up to 30% for daily and 25% for monthly simulations. These findings underscore the importance of considering rating curve uncertainty in streamflow simulation to ensure accurate and reliable results. Streamflow should therefore be treated as an uncertain variable, and rating curve uncertainty should be incorporated into decision-making.

  • MCST was integrated with ML models to evaluate the propagation of the rating curve uncertainty onto streamflow simulation.

  • Two advanced ML models are applied comparatively for streamflow simulation under rating curve uncertainty conditions.

  • Rating curve uncertainty substantially propagates onto streamflow simulation, specifically in peak flow regions.

Rating curves are commonly employed to derive streamflow time series because continuous discharge measurement is difficult, whereas methods for continuously monitoring stage height are readily available (Haile et al. 2023). However, developing a rating curve involves inherent uncertainties that can have significant economic implications (McMillan et al. 2010). The main purpose of a hydrometric gauging station is to monitor discharge in real time. This is accomplished by continuously measuring the water stage and converting it to discharge using an established rating curve (McMillan et al. 2010). The accuracy and reliability of a rating curve depend on the site's hydraulic conditions, on understanding of the physical processes connecting stage and discharge, and on the availability and uncertainty of individual gaugings (Le Coz et al. 2014). According to Dymond & Christian (1982), if discharge measurements encompass the full range of stages observed during a period of stable stage–discharge relationship, defining the rating curve for that period is relatively straightforward. Conversely, if the discharge measurements do not cover the upper range of stages, the lower portion of the rating curve must be extrapolated to the highest stage encountered, which introduces additional uncertainty (Herschy 2002). Most authors recognize that the uncertainty associated with stage measurements is typically smaller than that associated with velocity measurements (Kumlachew et al. 2023). This is primarily because stage can be measured with relatively high accuracy using precise instruments, whereas velocity measurements are more challenging and prone to larger uncertainties due to flow turbulence, variations in velocity across the channel, and limitations of velocity measurement techniques (Domeneghetti et al. 2012). Guerrero et al. (2012) examined the impact of rating curve uncertainty on flood frequency estimation and found that ignoring it can lead to biased flood frequency estimates. Aronica et al. (2006) assessed the uncertainty in river flow predictions caused by errors in the rating curve and found that this uncertainty increased with flow magnitude. Existing studies have highlighted the significant level of uncertainty in discharge measurements, with reported uncertainties as high as 20% of the observed value (Pelletier 1989). Such uncertainties are influenced by operational factors such as the number of sampling points, the duration of individual velocity measurements, and the characteristics of the gauging site (Le Coz et al. 2012). This uncertainty can propagate into streamflow simulation, leading to potential inaccuracies and limitations in water resource management. Recognizing the need to account for it, previous studies have proposed various approaches to estimating rating curve uncertainty, varying in complexity and suitability for different situations (Le Coz et al. 2014). The Monte Carlo simulation technique (MCST), Bayesian inference (BI), and generalized likelihood uncertainty estimation (GLUE) are commonly applied methods (Khan et al. 2023). MCST is a versatile, model-agnostic approach based on random sampling from input parameter distributions. Its flexibility allows it to be applied to a broad spectrum of problems and models without requiring specific assumptions about the probability distribution of the model output (Khan et al. 2023). Furthermore, it allows uncertainty in multiple model parameters and inputs to be incorporated comprehensively, encompassing both aleatory and epistemic uncertainty (Le Coz et al. 2014). In contrast, BI provides a formal probabilistic framework for quantifying uncertainty by updating prior knowledge about model parameters using observed data.
GLUE is a method that involves evaluating the likelihood of model outputs given observed data and using this likelihood to quantify uncertainty (Domeneghetti et al. 2012). Each method has its strengths and applicability. In this study, the MCST is employed to assess uncertainty in the rating curve.

Streamflow simulation provides important information for managing water resource systems (Pagano et al. 2014; Hirpa et al. 2016). However, owing to the complexity and nonlinearity of the rainfall–runoff relationship, reliable streamflow simulation is difficult (Werner & Yeager 2013). Researchers have therefore sought models that simulate streamflow accurately and efficiently. There are currently two major approaches: model-driven and data-driven (Mena et al. 2024). Model-driven approaches consist of mathematical models that simulate the hydrodynamic processes of water flow and are widely used because of their foundation in hydraulic and hydrological principles (Kumar Singh & Marcy 2017). Nevertheless, these models often require extensive input data and long calibration periods, and they struggle to represent the intricate, nonlinear nature of hydrological processes.

Given these limitations of physically based models, data-driven models such as machine learning (ML) offer an appealing alternative (Mosavi et al. 2018). The artificial neural network (ANN), a subfield of ML, is extensively employed in hydrology because of its reliability and its ability to capture rainfall–runoff relationships. Tareke & Awoke (2023) used ANN for long-term forecasting of both streamflow and hydrological drought over Ethiopia and found that ANN is a good tool for streamflow forecasting. Further, Niu et al. (2019) applied four methods, multiple linear regression (MLR), ANN, extreme learning machine (ELM), and support vector machine (SVM), to derive the operation rules of hydropower reservoirs. The three artificial intelligence algorithms (ANN, SVM, and ELM) performed better than the conventional MLR and the scheduling graph method. The authors concluded that applying artificial intelligence algorithms to derive hydropower reservoir operation rules remains challenging but represents valuable future research. More recently, deep learning (DL) networks with multilayered architectures have been developed to address complex problems. Time series data such as streamflow can be modeled effectively with recurrent neural network (RNN) variants of DL algorithms (Mena 2024). RNNs are often chosen for such tasks because they handle sequential data, retain memory of previous inputs, and can capture long-term dependencies; moreover, several RNN variants have been designed specifically to address the vanishing gradient problem and improve the learning of long-term dependencies. Gao et al. (2020) employed two popular RNN variants, long short-term memory (LSTM) and gated recurrent unit (GRU) networks, along with an ANN model to simulate runoff in the Yutan station control catchment, Fujian Province, Southeast China. Their results show that the LSTM and GRU models outperform the ANN model and that the GRU model performs as well as the LSTM model. Moreover, Liu et al. (2023) evaluated eight DL networks for computing river discharge time series from water surface elevation time series observed at the Zhutuo gauging station on the Yangtze River. They found that the BiGRU model outperformed the other DL models and that, with this data-driven approach, river discharge can be computed accurately, objectively, and quickly directly from water surface elevation, which is of practical value for flood protection and water resources management (Mathewos et al. 2024). Several irrigation projects and settlements lie downstream of the Woybo watershed, which has experienced severe flooding for years (Yisehak et al. 2020). Despite the importance of rating curve uncertainty assessment, little research has been conducted in the watershed. This gap highlights the need to assess rating curve uncertainty and its propagation into streamflow simulation. Therefore, this study aims to assess the effect of rating curve uncertainty on the accuracy and reliability of streamflow simulation at the outlet of the Woybo watershed by integrating MCST with different ML models, a novel perspective intended to enhance water resource management practices and support informed decision-making in the study area.

Description of the study area

The Woybo River is a tributary in the Omo River Basin in southwestern Ethiopia. It flows into the Omo Gibe River and lies between latitudes 6° 40′ N and 7° 10′ N and longitudes 37° 30′ E and 38° 00′ E, as shown in Figure 1. The study area has a tropical climate, and the watershed covers 533.65 km². Precipitation in the watershed varies strongly with season and elevation. The wet season extends from April to October, with July and August the wettest months. Rainfall distribution is largely controlled by the south–north movement of the Inter-Tropical Convergence Zone. The average maximum and minimum temperatures range from 19.67 °C to 21.83 °C and from 16.19 °C to 18.71 °C, respectively. The subbasin receives an average annual rainfall of 1,377.74 mm, with high spatial and temporal variability. The drainage network of the Woybo River watershed was extracted from the digital elevation model (DEM). The watershed is a third-order system with a drainage density of 0.45 km/km², comprising 23 streams with a total length of 148.9 km, a longest flow path of 1,944 km, and a bifurcation ratio of 0.96 (Ukumo et al. 2022a). The Woybo station was selected because gauging data, a rating curve, and long-term streamflow and stage records are available for it.
Figure 1

Description of the study area map.


Data collection

The data used in this study were collected from different sources. The DEM of the Woybo watershed (30 × 30 m resolution) was downloaded from the United States Geological Survey (USGS) database. Daily meteorological data (precipitation and maximum and minimum temperature) for 1997–2018 from four stations were collected from the Ethiopian Meteorology Institute (EMI). Long-term daily streamflow and river stage records (1987–2017) for the Woybo gauging station, as well as measured stage–discharge pairs, were obtained from the Ministry of Water and Energy (MoWE).

Methodology

The quality and completeness of data heavily influence data analysis outcomes (Ukumo et al. 2022b). In this study, the multiple imputation method and the inverse distance weighting method were compared for imputing missing meteorological and hydrological data. The homogeneity of the selected gauging stations was assessed with the relative method using a nondimensionalizing equation. The consistency of rainfall records was checked by double mass curve analysis. The Hargreaves & Samani method was applied to calculate potential evapotranspiration (PET), as it offers a practical and reliable approach to estimating PET (Edamo et al. 2022). Because rain gauges sample rainfall only at points, they give a limited representation of its spatial distribution during a storm; for hydrological analyses covering large areas, it is therefore essential to estimate average rainfall depths over subwatershed areas (Chow et al. 1988). In this study, the spatial distribution of rainfall across the watershed was computed using the Thiessen polygon method in ArcGIS 10.3 (Ukumo et al. 2023).
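The two areal computations named above can be sketched in a few lines. The Hargreaves & Samani (1985) equation estimates PET as 0.0023 · Ra · (Tmean + 17.8) · √(Tmax − Tmin), and the Thiessen areal mean weights each gauge by the area of its polygon. This is a minimal sketch, not the study's code: the function names are hypothetical, and it assumes that extraterrestrial radiation Ra is already expressed as equivalent evaporation in mm/day and that the polygon areas have been exported from the GIS; the study does not state how either was obtained.

```python
import numpy as np

def hargreaves_samani_pet(tmax, tmin, ra_mm):
    """Daily PET (mm/day) from the Hargreaves & Samani (1985) equation.

    tmax, tmin : daily maximum and minimum air temperature (deg C)
    ra_mm      : extraterrestrial radiation as equivalent evaporation (mm/day),
                 assumed to be supplied by the caller
    """
    tmax = np.asarray(tmax, dtype=float)
    tmin = np.asarray(tmin, dtype=float)
    tmean = (tmax + tmin) / 2.0
    # Clip the diurnal range at zero so a bad record (tmin > tmax) gives 0, not NaN
    trange = np.clip(tmax - tmin, 0.0, None)
    return 0.0023 * np.asarray(ra_mm, dtype=float) * (tmean + 17.8) * np.sqrt(trange)

def thiessen_areal_mean(station_values, polygon_areas):
    """Area-weighted (Thiessen polygon) mean rainfall over the watershed.

    station_values : shape (n_stations,) or (n_days, n_stations)
    polygon_areas  : Thiessen polygon area of each station, any consistent unit
    """
    weights = np.asarray(polygon_areas, dtype=float)
    weights = weights / weights.sum()          # weights sum to 1
    return np.asarray(station_values, dtype=float) @ weights
```

With four gauges, `thiessen_areal_mean(daily_rain, areas)` on an array of shape (n_days, 4) returns one areal rainfall value per day.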

Input data set selection and model architectures

Many variables influence streamflow and hydrological modeling. Where rainfall is the driving force behind runoff generation (other candidate drivers include climate variables such as temperature), rainfall is the most logical input to an ML model for streamflow simulation (Wegayehu & Muluneh 2022). In this study, the ML model inputs were PET, rainfall and its lagged values, and lagged discharge. Because the number of candidate features is small, rigorous feature selection was not necessary. Linear correlation statistics such as Pearson's correlation coefficient are a simple yet effective way to characterize the dependence between variables (Zeroual et al. 2016). The autocorrelation function of discharge shows significant correlation up to a lag of two time steps. Correlation analysis between the candidate inputs (rainfall and lagged discharge) and discharge showed good correlation at a single time lag, whereas PET is only weakly correlated with discharge. The ML models were therefore designed with Q(t-1), RF(t-1), RF(t), and PET as inputs (Figure 3).
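A minimal pandas sketch of how the input set Q(t-1), RF(t-1), RF(t), and PET could be assembled from a daily table, together with the lag autocorrelation used to decide how many discharge lags to keep. The column names `Q`, `RF`, and `PET` and both helper names are hypothetical.

```python
import pandas as pd

def build_lagged_inputs(df):
    """Return inputs Q(t-1), RF(t-1), RF(t), PET(t) and target Q(t).

    df : DataFrame indexed by date with hypothetical columns
         'Q' (discharge), 'RF' (areal rainfall) and 'PET'.
    """
    out = pd.DataFrame(index=df.index)
    out["Q_t-1"] = df["Q"].shift(1)    # discharge one time step earlier
    out["RF_t-1"] = df["RF"].shift(1)  # rainfall one time step earlier
    out["RF_t"] = df["RF"]             # rainfall at the same time step
    out["PET_t"] = df["PET"]
    out["Q_t"] = df["Q"]               # target variable
    # The first row has no lagged values, so drop incomplete rows
    return out.dropna()

def discharge_autocorrelation(q, max_lag=5):
    """Pearson autocorrelation of discharge at lags 1..max_lag."""
    s = pd.Series(q, dtype=float)
    return {lag: s.autocorr(lag=lag) for lag in range(1, max_lag + 1)}
```

`build_lagged_inputs(df).corr()["Q_t"]` then gives the Pearson correlation of each candidate input with the target, the quantity screened in the correlation analysis.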

To remove the impact of varying scales and measurement units, it is crucial to normalize data (Zeroual et al. 2016). This process rescales the values of different variables to a common range, usually between 0 and 1 or between −1 and 1. In this study, the min–max scaler is employed for this purpose (Equation (1)):

$$x_{\mathrm{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

where x_scaled is the scaled value, x is the original value, and x_min and x_max are the minimum and maximum values of the variable.
The preprocessed data are divided into training and testing sets. The training set is used for model training and hyperparameter optimization, whereas the testing set is used to assess the performance of the final model. In this study, 70% of the data were used for training and 30% for testing (Figure 2).
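The split and Equation (1) can be implemented as sketched below, under two assumptions the text does not state explicitly: the split is chronological (no shuffling), matching a training period followed by a testing period, and x_min and x_max are taken from the training portion only, so that no information from the test period leaks into the scaling. Function names are hypothetical.

```python
import numpy as np

def chronological_split(X, y, train_frac=0.7):
    """Split time-ordered arrays without shuffling: first 70% train, rest test."""
    n_train = int(round(len(X) * train_frac))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

def fit_min_max(train):
    """Column-wise minimum and maximum, computed on the training set only."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def min_max_scale(x, x_min, x_max):
    """Equation (1): (x - x_min) / (x_max - x_min); constant columns map to 0."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (np.asarray(x, dtype=float) - x_min) / span

def inverse_min_max(x_scaled, x_min, x_max):
    """Map scaled values (e.g. model predictions) back to physical units."""
    return np.asarray(x_scaled, dtype=float) * (x_max - x_min) + x_min
```

Because the bounds come from the training years, scaled test values can fall slightly outside [0, 1] when the test period contains larger floods; predictions are converted back to m³/s with `inverse_min_max` before computing error metrics.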
Figure 2

Training and testing dataset split for daily time series.

Figure 3

Pearson correlation plot for dependent and independent variables.


Descriptive statistics of the dataset are summarized in Table 1, and the pairwise correlations are visualized in the correlation matrix (Figure 3). These statistics characterize the distribution, variability, and central tendency of the data and support informed decisions in data analysis and modeling. Skewness measures the asymmetry of a distribution; a positive skewness indicates a right-skewed distribution whose right tail is longer or fatter. The coefficient of variation (CV) measures relative variability, and the standard deviation (SD) measures the dispersion of data points around the mean of each variable (Wegayehu & Muluneh 2021): a lower SD indicates that values lie closer to the mean, whereas a higher SD indicates greater dispersion. Streamflow shows the highest relative variability of the three variables (CV = 1.54), indicating a wide range of values. The correlation matrix (Figure 3) helps in understanding the relationships between variables, identifying patterns, and making data-driven modeling decisions.

Table 1

Descriptive statistics of time series data for the Woybo watershed

Data type | Pearson correlation with streamflow | Skewness | Mean | Min | Max | SDᵃ | CVᵇ
Streamflow (m³/s) | 1.00 | 2.81 | 9.04 | 0.00 | 120.08 | 13.98 | 1.54
RF (mm/day) | 0.38 | 2.63 | 4.19 | 0.00 | 43.48 | 4.98 | 1.19
PET (mm/day) | −0.14 | −0.34 | 4.03 | 1.84 | 11.14 | 0.76 | 0.18

ᵃSD stands for standard deviation; ᵇCV stands for coefficient of variation.
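The columns of Table 1 can be computed directly with pandas. This is a sketch assuming a DataFrame with hypothetical columns `Q`, `RF`, and `PET`; note that pandas `skew` returns the bias-adjusted sample skewness and `std` uses ddof = 1 by default, so values may differ slightly from those produced by other software.

```python
import pandas as pd

def describe_for_table(df, target="Q"):
    """Per-variable statistics in the column layout of Table 1."""
    stats = pd.DataFrame({
        "pearson_with_target": df.corr(method="pearson")[target],
        "skewness": df.skew(),        # bias-adjusted sample skewness
        "mean": df.mean(),
        "min": df.min(),
        "max": df.max(),
        "sd": df.std(ddof=1),         # sample standard deviation
    })
    stats["cv"] = stats["sd"] / stats["mean"]  # coefficient of variation
    return stats.round(2)
```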

The following tools and methods were used in this study. ArcGIS 10.3 was used to delineate the watershed and subbasins. MCST was applied to establish the rating curve and evaluate its uncertainty. Two ML models, bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent units (BiGRU), were compared to evaluate the propagation of this uncertainty into streamflow simulation at daily and monthly scales. Python, with Jupyter Notebook, was used for data analysis and ML model development.

Rating curve establishment and uncertainty estimation using MCST

The stage–discharge relationships are typically determined by regularly measuring the water level and flow rate in a specific stream or river (Kumlachew et al. 2023). The power equation is commonly employed in hydrology for fitting rating curves because of its simplicity and its capacity to accurately represent the nonlinear relations between the water level and the flow rate (Negatu et al. 2022). The power type stage–discharge relationship is presented in Equation (2):
$$Q = a\,(h - h_0)^{b} \qquad (2)$$

where Q is the discharge (m³ s⁻¹); h (m) is the water level above a vertical reference; and h₀, a, and b are site-specific constants, with h₀ the gauge height at zero discharge. The parameters a and b represent the friction and geometric characteristics of the gauging site: b reflects the deviation of the river banks from the vertical, and a is a scale coefficient incorporating cross-section width, Manning coefficient, and local bed slope (Haile et al. 2023). The number of stage–discharge pairs required to construct a rating curve depends on the characteristics of the river and the desired precision (Kumlachew et al. 2023). In this study, 26 measured stage–discharge pairs were used to construct the rating curve. Determining the zero-discharge level (h₀) is crucial for establishing an accurate stage–discharge relationship. The lowest point of the channel bed opposite the gauge can indicate the gauge height at zero discharge (Haile et al. 2023). Several factors must be considered, and h₀ can be estimated by trial and error, by arithmetic methods, or by computer-based optimization; this study used computer-based optimization (Kumlachew et al. 2023).
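Computer-based optimization of h₀ can be illustrated with SciPy's `curve_fit`, which fits a, b, and h₀ of Equation (2) simultaneously by nonlinear least squares. This is a sketch, not the study's procedure: the bounds (h₀ below the lowest gauged stage, 0.5 ≤ b ≤ 5), the heuristic starting values, and the function names are assumptions. The returned covariance matrix describes the parameter uncertainty of the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def rating_curve(h, a, b, h0):
    """Power-law rating curve Q = a * (h - h0)**b (Equation (2))."""
    # Clip to a tiny positive depth so the power stays defined below h0
    return a * np.clip(h - h0, 1e-9, None) ** b

def fit_rating_curve(h, q):
    """Least-squares estimates of a, b and the zero-discharge level h0."""
    h = np.asarray(h, dtype=float)
    q = np.asarray(q, dtype=float)
    # Heuristic start: h0 slightly below the lowest stage, b = 1.5, a from the top point
    h0_init = h.min() - 0.1 * (h.max() - h.min())
    a_init = q.max() / (h.max() - h0_init) ** 1.5
    lower = [1e-6, 0.5, -np.inf]
    upper = [np.inf, 5.0, h.min() - 1e-6]   # h0 must stay below every gauged stage
    params, cov = curve_fit(rating_curve, h, q, p0=[a_init, 1.5, h0_init],
                            bounds=(lower, upper))
    q_fit = rating_curve(h, *params)
    rmse = np.sqrt(np.mean((q - q_fit) ** 2))
    r2 = 1.0 - np.sum((q - q_fit) ** 2) / np.sum((q - q.mean()) ** 2)
    return params, cov, rmse, r2
```

With the 26 gauged pairs as `h` and `q`, `fit_rating_curve(h, q)` returns the best-fit (a, b, h₀) together with the RMSE and R² used to judge goodness of fit.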

Machine learning models

Deep learning (DL) models, an advanced branch of ML within artificial intelligence (AI), use multilayered neural networks to improve overall performance (Mosavi et al. 2018). This approach has attracted significant attention as a powerful tool for accurate rainfall–runoff simulation; in particular, RNN models have proven well suited to time series prediction (Ayele et al. 2024).

Bidirectional long short-term memory (BiLSTM)

BiLSTM is a variation of the LSTM architecture (Figure 4) that considers information from both past (t − 1) and future (t + 1) time steps when determining the output at each time step. This enables the model to effectively capture bidirectional dependencies within the data (Wegayehu & Muluneh 2022).
Figure 4

The architecture of BiLSTM (Wegayehu & Muluneh 2022).

Figure 5

The architecture of BiGRU (source: Staudemeyer & Morris 2019).


Bidirectional Gated Recurrent Units (BiGRU)

BiGRU is an RNN architecture that incorporates bidirectional processing. The key difference between BiGRU and GRU lies in the direction of information flow: GRU processes the input sequence in one direction only, whereas BiGRU processes it in both directions, enabling it to capture more comprehensive context (Staudemeyer & Morris 2019).

BiGRU and BiLSTM are both bidirectional recurrent neural network architectures, but they use different recurrent units (GRU and LSTM, respectively). Both architectures capture complex dependencies in sequential data effectively, and the choice between them depends on the requirements of the task at hand (Hunt et al. 2022). In this study, the two recent architectures, BiLSTM and BiGRU, are compared (Figure 5).
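The bidirectional idea can be made concrete with a NumPy forward pass of a single-layer BiGRU: one GRU reads the input window forward, a second GRU reads it reversed, and their hidden states are concatenated at each time step. The weights are random and the helper names hypothetical; the sketch illustrates the data flow only and is not a trained model. A BiLSTM follows the same pattern with an LSTM cell in place of `gru_pass`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(n_in, n_hidden, rng):
    """Random weights for the update (0), reset (1) and candidate (2) blocks."""
    return {
        "W": rng.normal(0.0, 0.1, (3, n_hidden, n_in)),      # input weights
        "U": rng.normal(0.0, 0.1, (3, n_hidden, n_hidden)),  # recurrent weights
        "b": np.zeros((3, n_hidden)),
    }

def gru_pass(xs, p):
    """Run one GRU over xs of shape (T, n_in); return states of shape (T, n_hidden)."""
    W, U, b = p["W"], p["U"], p["b"]
    h = np.zeros(U.shape[1])
    states = []
    for x in xs:
        z = sigmoid(W[0] @ x + U[0] @ h + b[0])          # update gate
        r = sigmoid(W[1] @ x + U[1] @ h + b[1])          # reset gate
        n = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])    # candidate state
        h = z * h + (1.0 - z) * n                        # keep old vs. take new
        states.append(h)
    return np.stack(states)

def bigru_pass(xs, p_fwd, p_bwd):
    """Bidirectional GRU: forward pass plus a pass over the reversed sequence."""
    h_fwd = gru_pass(xs, p_fwd)
    h_bwd = gru_pass(xs[::-1], p_bwd)[::-1]   # re-align backward states with time
    return np.concatenate([h_fwd, h_bwd], axis=1)
```

At the first time step the forward half has seen only the first input, while the backward half has already read the whole window; this is the extra context that bidirectional processing provides.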

Machine learning model development in the Python environment

The process of developing ML models in a Python environment involves a series of steps, including importing libraries, preprocessing data, designing model architecture, compiling the model, training it, utilizing it for predictions, assessing its performance, refining it through iterations, and eventually deploying it for real-world applications (Ayele et al. 2024). To achieve optimal performance in ML models, decisions must be made regarding a combination of parameters and hyperparameters (Wegayehu & Muluneh 2021). The subsequent discussion focuses on the key hyperparameters that are optimized (Niu et al. 2019).

1. Number of hidden units: The number of neurons in the hidden layer is crucial in capturing complex patterns within the data. Increasing the number of hidden units enhances the model's ability to discern intricate details, albeit at the cost of heightened computational complexity.

2. Activation function: Activation functions are crucial in neural networks as they introduce non-linearity, control the output range, and add interpretability to the model (Mosavi et al. 2018). The selection of an activation function relies on the specific problem at hand and the desired behavior of the network. There are several activation functions commonly used in ML models. For this study, Sigmoid and Tanh activation functions are used.

2.1. Sigmoid function: It maps the input to a value between 0 and 1, which makes it suitable for modeling probabilities or binary classification problems, and its output can be interpreted as the activation level of a neuron: values near 0 indicate low activation, while values close to 1 indicate high activation (Equation (3)).

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$
2.2. Tanh function: The hyperbolic tangent (tanh) function transforms the input into a value between −1 and 1, enabling the effective capture of both positive and negative values in the data (Equation (4)).

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (4)$$

3. Learning rate: It governs the magnitude of the step taken by the model to update its parameters during training. A higher learning rate accelerates convergence but increases the likelihood of the model overshooting the optimal solution. Conversely, a lower learning rate slows down convergence but enhances the model's ability to accurately fine-tune its parameters.

4. Batch size: The batch size determines the number of training samples that are processed together in each iteration during training. Increasing the batch size can expedite training by processing more samples at once, yet it necessitates more memory. Smaller batch sizes may allow the model to generalize better as it updates its parameters more frequently.

5. Optimization algorithm: The optimization algorithm controls parameter updates during training. Common optimization algorithms used in ML models include stochastic gradient descent (SGD) and Adam. The choice of an optimization algorithm can affect the convergence speed and final performance of the model.
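Equations (3) and (4) and their derivatives are easy to check numerically. The sketch below (helper names hypothetical) shows that the sigmoid derivative σ(x)(1 − σ(x)) peaks at 0.25 at x = 0 and shrinks toward zero for large |x|, that the tanh derivative peaks at 1, and that the two functions are related by tanh(x) = 2σ(2x) − 1. This is why sigmoid is typically used for the gates of GRU and LSTM cells (outputs in 0 to 1 act as soft switches) and tanh for the candidate states.

```python
import numpy as np

def sigmoid(x):
    """Equation (3): maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: largest (0.25) at x = 0, tends to 0 for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    """Derivative of Equation (4): largest (1.0) at x = 0."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-4.0, 0.0, 4.0])
print(sigmoid(x), sigmoid_grad(x), np.tanh(x), tanh_grad(x))
```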

Model training and testing

RNN model training involves optimizing the model's parameters on labeled data, while testing evaluates its performance on unseen data. The purpose is to improve the model's accuracy and assess its suitability for real-world applications (Ayele et al. 2024). In this study, the ML models were trained on daily and monthly streamflow data from 1997 to 2011 and tested on data from 2012 to 2018.
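A sketch of preparing the monthly series and applying the year-based split described above. It assumes that monthly values are obtained by averaging daily discharge and PET and summing daily rainfall; the study does not state its aggregation rule, and the column and function names are hypothetical.

```python
import pandas as pd

def to_monthly(daily):
    """Aggregate a daily table to calendar months (month-start labels).

    Assumed rule: mean discharge and PET, total rainfall.
    """
    return daily.resample("MS").agg({"Q": "mean", "RF": "sum", "PET": "mean"})

def split_by_years(df, last_train_year=2011, first_test_year=2012, last_test_year=2018):
    """Training = up to last_train_year; testing = first_test_year..last_test_year."""
    years = df.index.year
    train = df[years <= last_train_year]
    test = df[(years >= first_test_year) & (years <= last_test_year)]
    return train, test
```

The same `split_by_years` call works for the daily table, so both temporal scales share identical training and testing periods.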

Evaluation of model performance

In this study, model performance was evaluated by comparing the streamflow simulated under the three scenarios (fitted curve, upper band, and lower band) with the observed streamflow using the following measures (Equations (5)–(7)).

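1. Root Mean Squared Error (RMSE).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{\mathrm{sim},i} - Q_{\mathrm{obs},i}\right)^{2}} \qquad (5)$$

2. Mean Absolute Error (MAE).

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{\mathrm{sim},i} - Q_{\mathrm{obs},i}\right| \qquad (6)$$

3. Coefficient of Determination (R²).

$$R^{2} = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i} - Q_{\mathrm{sim},i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i} - \overline{Q}_{\mathrm{obs}}\right)^{2}} \qquad (7)$$

where n is the number of data points, Q_sim,i is the i-th simulated streamflow value, Q_obs,i is the i-th observed streamflow value, Q̄_obs is the mean observed streamflow, SS_res is the sum of squared residuals, and SS_tot is the total sum of squares.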
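Equations (5)–(7) translate directly into NumPy, as in the minimal sketch below (function names hypothetical). Because a root mean square is never smaller than the corresponding mean absolute value, RMSE ≥ MAE holds for any pair of series, which gives a quick consistency check on reported metrics.

```python
import numpy as np

def _as_arrays(q_obs, q_sim):
    return np.asarray(q_obs, dtype=float), np.asarray(q_sim, dtype=float)

def rmse(q_obs, q_sim):
    """Equation (5): root mean squared error."""
    q_obs, q_sim = _as_arrays(q_obs, q_sim)
    return float(np.sqrt(np.mean((q_sim - q_obs) ** 2)))

def mae(q_obs, q_sim):
    """Equation (6): mean absolute error."""
    q_obs, q_sim = _as_arrays(q_obs, q_sim)
    return float(np.mean(np.abs(q_sim - q_obs)))

def r2(q_obs, q_sim):
    """Equation (7): 1 - SS_res / SS_tot."""
    q_obs, q_sim = _as_arrays(q_obs, q_sim)
    ss_res = np.sum((q_obs - q_sim) ** 2)
    ss_tot = np.sum((q_obs - q_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```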

Taylor Diagram

The Taylor Diagram is a graphical tool used to evaluate the performance of models by summarizing key statistics in a single plot. It simultaneously represents three statistical metrics: Correlation Coefficient, Standard Deviation, and Centered Root Mean Square Error. The diagram provides a visual way to compare multiple models or scenarios against observed data. In the context of streamflow simulation, the Taylor Diagram is highly applicable because:

  • Comprehensive Evaluation: It allows simultaneous assessment of multiple performance metrics (correlation, variability, and error), offering a holistic evaluation of model accuracy.

  • Model Comparison: The diagram facilitates the comparison of different models, configurations, or datasets.

  • Communication of Results: Its compact, visual format makes it easier to communicate complex model evaluation results to a broader audience, including researchers and practitioners.

In this research, the Taylor Diagram is used to assess the performance of the streamflow simulations for the Woybo River by comparing simulated and observed values at daily and monthly scales.
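The three quantities displayed on a Taylor diagram can be computed as below. This is a sketch with hypothetical names, assuming population standard deviations (ddof = 0) so that the diagram's geometric identity holds exactly: the centered RMSE E′ satisfies E′² = σ_sim² + σ_obs² − 2 σ_sim σ_obs R. Each simulation is therefore plotted at radius σ_sim and angle arccos R, and its distance from the observation point equals E′.

```python
import numpy as np

def taylor_statistics(q_obs, q_sim):
    """Standard deviations, correlation, centered RMSE and polar angle."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    sd_obs, sd_sim = q_obs.std(), q_sim.std()        # ddof = 0
    corr = np.corrcoef(q_obs, q_sim)[0, 1]
    # Centered RMSE: RMSE after removing each series' own mean (bias-free)
    anom = (q_sim - q_sim.mean()) - (q_obs - q_obs.mean())
    crmse = np.sqrt(np.mean(anom ** 2))
    theta = np.arccos(np.clip(corr, -1.0, 1.0))      # angular position on the diagram
    return {"sd_obs": sd_obs, "sd_sim": sd_sim, "corr": corr,
            "crmse": crmse, "theta": theta}
```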

MCST model result

In this study, MCST was applied to estimate uncertainty in rating curves. The method estimates the uncertainty in the rating curve by randomly sampling input parameters' distributions, evaluating the model for each set of sampled values, and aggregating the results to create a distribution of possible rating curves with upper and lower uncertainty bands (Figure 6). Visual aids like histograms and cumulative distribution functions are employed to convey the outcomes and associated uncertainties to stakeholders and decision-makers (Khan et al. 2023).
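The Monte Carlo procedure described above can be sketched as follows. This is a minimal sketch under assumptions the study does not spell out: the parameters (a, b, h₀) of Equation (2) are drawn from a multivariate normal distribution centred on the best fit with a given covariance matrix (for example, the covariance returned by a nonlinear least-squares fit), and the 2.5th and 97.5th percentiles of the simulated discharges define a 95% band. The function name is hypothetical; posterior samples from a Bayesian fit could replace the normal draws without changing the band computation.

```python
import numpy as np

def monte_carlo_rating_bands(h_grid, params_mean, params_cov,
                             n_samples=5000, level=0.95, seed=42):
    """Monte Carlo uncertainty bands for Q = a * (h - h0)**b.

    params_mean : best-fit (a, b, h0)
    params_cov  : 3 x 3 covariance of (a, b, h0), assumed Gaussian
    Returns (lower, median, upper) discharge at each stage in h_grid.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(params_mean, params_cov, size=n_samples)
    a, b, h0 = draws[:, [0]], draws[:, [1]], draws[:, [2]]   # each (n_samples, 1)
    h = np.asarray(h_grid, dtype=float)[None, :]               # (1, n_stages)
    depth = np.clip(h - h0, 0.0, None)                         # no flow below h0
    q = a * depth ** b                                         # (n_samples, n_stages)
    alpha = (1.0 - level) / 2.0
    lower, median, upper = np.quantile(q, [alpha, 0.5, 1.0 - alpha], axis=0)
    return lower, median, upper
```

Evaluated on the recorded stage series, the lower and upper outputs give two alternative discharge series that, together with the fitted curve, correspond to the three scenarios used to train the ML models.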
Figure 6

The fitted curve with its upper and lower uncertainty bands.

The plot above illustrates that, during low and medium flows, the majority of data points closely align with the fitted curve. Conversely, for high water depths, the data points exhibit significant deviation from the fitted curve. This implies that, with an increase in water depth, there is a corresponding increase in rating curve uncertainty (Le Coz et al. 2012). The uncertainty band is notably broad in comparison to the data range, indicating substantial uncertainty in the rating curve estimation and implying considerable variability in the estimated discharge values. A bivariate kernel density estimate (KDE) plot provides a visual representation of the joint distribution of variables a and b, and the width of the histogram in the marginal posterior distribution plot provides information about the uncertainty and variability in the estimated parameter values (Selle & Hannah 2010). It also provides insights into the shape, skewness, and outliers in the histograms. The box plots are useful for comparing the central tendency, spread, and variability of different parameters (Figure 7).
Figure 7

Histogram plot for marginal posterior distributions, bivariate KDE, and box plot of parameter estimates.


From this analysis, the optimized parameters of the power-law equation were extracted, giving the best-fit rating curve and its upper and lower bands, which were then used to derive the streamflow time series. Goodness of fit was evaluated by visually inspecting the fitted curve against the data points and by metrics such as R² and RMSE. The RMSE of 10.95 m³ s⁻¹ indicates a typical prediction error of about 10.95 m³ s⁻¹, and the R² of 0.84 implies that the fitted curve explains approximately 84% of the variation in the data, leaving some variation unexplained. This uncertainty, particularly in the peak flow region, limits the reliability of the rating curve for predicting discharge at the Woybo gauging station, so the curve should be used with caution for flow prediction. The propagation of this uncertainty into the ML streamflow simulation models is assessed in the next section.

ML models result

In this study, two ML models (BiLSTM and BiGRU) were built in Python for streamflow simulation. The mean squared error (MSE) was used as the cost function: it penalizes prediction errors during training, and the network seeks the output with the lowest cost. The training data are divided into batches, each containing a fixed number of samples; this batch size is a hyperparameter, typically determined by trial and error. In all models, the best-performing batch size was 128 for daily and 16 for monthly simulations. At each training iteration, the cost is calculated as the average MSE between observed and simulated streamflow over the 128 (daily) or 16 (monthly) samples in the batch.
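The batching described above can be sketched as follows. For clarity the batches are taken in order; this is a simplifying assumption, since Keras, for example, shuffles training samples between epochs by default. One epoch consists of ceil(n / batch_size) iterations, and the last batch may hold fewer than 128 samples. Names are hypothetical.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size):
    """Yield index slices covering the training set; the last batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))

def batch_mse_costs(q_obs, q_sim, batch_size):
    """MSE cost of each batch over one pass (epoch) of the training data."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    return [float(np.mean((q_sim[s] - q_obs[s]) ** 2))
            for s in iterate_minibatches(len(q_obs), batch_size)]
```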

In neural network training, an epoch is one complete pass of the training data through the network. As with other networks, the number of neurons and layers in recurrent networks can be tuned to optimize model performance (Wegayehu & Muluneh 2022). Both RNN models were given identical structures so that they could be compared directly. Each network has two hidden layers, with 12 units per layer for daily data and 8 units per layer for monthly data. The output of the last recurrent layer is connected to a dense layer with a single output neuron, and a dropout of 10% is applied between layers. In both networks, the sigmoid activation function is applied in the hidden layers, introducing the non-linearity that enables the model to learn complex patterns in the time series. The sigmoid's derivative, σ(x)(1 − σ(x)), is cheap to compute from the activation itself, which keeps back-propagation efficient, although it is not constant and approaches zero for large |x|. The weights and biases of the connections between neurons are learned during training, where the models are optimized to minimize the MSE cost function.
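To make the described architecture concrete, the sketch below counts trainable parameters for two stacked bidirectional recurrent layers followed by a single-output dense layer. It assumes four inputs (Q(t-1), RF(t-1), RF(t), PET) and the Keras parameterization of GRU (default `reset_after=True`, two bias vectors per gate) and LSTM layers; dropout adds no parameters, and the helper names are hypothetical. Under these assumptions the daily BiGRU has 4,057 trainable parameters and the daily BiLSTM 5,209, so at equal width the GRU-based network is about 22% smaller.

```python
def gru_params(n_in, units, reset_after=True):
    """Trainable weights of one GRU layer in one direction (three gate blocks)."""
    n_bias = 2 * units if reset_after else units
    return 3 * (units * n_in + units * units + n_bias)

def lstm_params(n_in, units):
    """Trainable weights of one LSTM layer in one direction (four gate blocks)."""
    return 4 * (units * n_in + units * units + units)

def stacked_bidirectional_params(cell_params, n_in, units, n_layers=2):
    """Two stacked bidirectional layers plus a Dense(1) output layer."""
    total, width = 0, n_in
    for _ in range(n_layers):
        total += 2 * cell_params(width, units)  # forward and backward copies
        width = 2 * units                       # next layer sees concatenated states
    return total + (width + 1)                  # dense weights + bias

for name, cell in [("BiGRU", gru_params), ("BiLSTM", lstm_params)]:
    print(name, "daily:", stacked_bidirectional_params(cell, 4, 12),
          "monthly:", stacked_bidirectional_params(cell, 4, 8))
```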

The hyperparameters of the BiGRU and BiLSTM networks were tuned by trial and error, with each model run for different numbers of epochs. The optimal hyperparameter settings obtained after numerous trials are presented in Table 2. The optimized models were evaluated with the Hydrostats package, which supports both statistical assessment, by comparing observed and simulated flows through error metric functions, and visual assessment, by plotting simulated against observed flows (Figures 8 and 9). Of the many descriptive statistics available for assessing predictive models, RMSE, MAE, and R² were selected for this analysis; the results for both models under all scenarios are presented in Tables 3 and 4.
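
Hydrostats provides these metrics ready-made; the plain-Python sketch below only shows what each one measures. R² is computed here as the squared Pearson correlation between observed and simulated flow, which is an assumption: some studies report the coefficient of determination 1 − SS_res/SS_tot instead, and the two differ when the simulation is biased. The sample values are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean squared error; weights large (e.g. peak-flow) errors heavily."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error; every error counts in proportion to its size."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated series."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


# Hypothetical observed and simulated flows (m3/s).
obs = [2.0, 4.0, 6.0, 8.0]
sim = [3.0, 4.0, 5.0, 9.0]
print(round(rmse(obs, sim), 3), mae(obs, sim), round(r_squared(obs, sim), 3))
# 0.866 0.75 0.87
```

Because RMSE is never smaller than MAE for the same series, the ordering of the two is a quick consistency check on tabulated results.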
Table 2

Optimal network hyperparameters at the daily and monthly scales

| Hyperparameter | Daily scale | Monthly scale |
|---|---|---|
| Neurons (units per hidden layer) | 12 | 8 |
| Optimizer | Adam | Adam |
| Learning rate | 0.001 | 0.001 |
| Activation function | Sigmoid and Tanh | Sigmoid and Tanh |
| Max epochs | 1,000 | 100 |
| Batch size | 128 | 16 |
Table 3

Evaluation of the model's performance for daily simulation

| Curve type (scenario) | ML model | Training RMSE | Training MAE | Training R² | Testing RMSE | Testing MAE | Testing R² |
|---|---|---|---|---|---|---|---|
| Fitted curve | BiGRU | 2.02 | 1.19 | 0.97 | 1.82 | 1.14 | 0.98 |
| Fitted curve | BiLSTM | 6.96 | 6.67 | 0.78 | 5.56 | 4.97 | 0.85 |
| Upper band | BiGRU | 2.12 | 1.29 | 0.95 | 2.08 | 1.26 | 0.96 |
| Upper band | BiLSTM | 5.34 | 4.32 | 0.86 | 2.94 | 2.96 | 0.91 |
| Lower band | BiGRU | 2.08 | 1.26 | 0.96 | 2.02 | 1.19 | 0.97 |
| Lower band | BiLSTM | 6.06 | 5.77 | 0.81 | 4.84 | 4.12 | 0.88 |
Table 4

Evaluation of the model's performance for monthly simulation.

| Curve type (scenario) | ML model | Training RMSE | Training MAE | Training R² | Testing RMSE | Testing MAE | Testing R² |
|---|---|---|---|---|---|---|---|
| Fitted curve | BiGRU | 2.57 | 2.16 | 0.93 | 2.94 | 2.96 | 0.91 |
| Fitted curve | BiLSTM | 8.89 | 7.13 | 0.72 | 10.05 | 8.23 | 0.65 |
| Upper band | BiGRU | 4.74 | 4.08 | 0.89 | 5.34 | 4.32 | 0.86 |
| Upper band | BiLSTM | 7.89 | 6.13 | 0.78 | 9.05 | 5.03 | 0.72 |
| Lower band | BiGRU | 1.48 | 1.17 | 0.96 | 2.14 | 2.05 | 0.94 |
| Lower band | BiLSTM | 6.56 | 5.97 | 0.80 | 9.81 | 6.35 | 0.71 |
Figure 8

Comparison of observed and simulated daily values for FC, UB, and LB scenarios, respectively.

Figure 9

Comparison of observed and simulated monthly values for FC, UB, and LB.

Figures 8 and 9 compare the daily and monthly simulated and observed flow hydrographs, respectively, for the three scenarios: fitted curve (FC), upper band (UB), and lower band (LB).

Figure 10 illustrates model performance with a Taylor diagram, whose main purpose is to show which simulation most closely matches the observations (Wegayehu & Muluneh 2021). In the diagram, the radial distance from the origin gives the standard deviation of each simulated series, the angular position gives its correlation coefficient with the observations, and the distance to the observation point gives the centred RMS difference. As Figure 10 shows, the BiGRU output has a lower RMSE than the BiLSTM output, lies closer to the observations, and correlates more strongly with the observed data.
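
The three quantities on a Taylor diagram are linked by the law of cosines, E′² = σ_s² + σ_o² − 2σ_sσ_oR, where σ_o and σ_s are the standard deviations of the observed and simulated series, R is their correlation coefficient, and E′ is the centred RMS difference. The plain-Python sketch below computes these statistics and the position of a model on the diagram; the data are hypothetical and population (1/n) standard deviations are assumed.

```python
import math


def taylor_stats(obs, sim):
    """Return (std of sim, correlation R, centred RMS difference, std of obs)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    do = [o - mo for o in obs]  # anomalies about the mean
    ds = [s - ms for s in sim]
    std_o = math.sqrt(sum(d * d for d in do) / n)
    std_s = math.sqrt(sum(d * d for d in ds) / n)
    corr = sum(a * b for a, b in zip(do, ds)) / (n * std_o * std_s)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(do, ds)) / n)
    return std_s, corr, crmsd, std_o


def taylor_point(std_s, corr):
    """Cartesian position: radius = std, angle = arccos(correlation)."""
    theta = math.acos(max(-1.0, min(1.0, corr)))  # clamp rounding error
    return std_s * math.cos(theta), std_s * math.sin(theta)


# Hypothetical observed and simulated flows (m3/s).
obs = [2.0, 4.0, 6.0, 8.0]
sim = [3.0, 4.0, 5.0, 9.0]
std_s, corr, crmsd, std_o = taylor_stats(obs, sim)
x, y = taylor_point(std_s, corr)
# The reference (observation) point sits at (std_o, 0); its distance to the
# model point equals the centred RMS difference.
print(round(math.hypot(x - std_o, y), 4), round(crmsd, 4))  # 0.8292 0.8292
```

A model closer to the reference point therefore has both a standard deviation nearer the observed one and a higher correlation.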
Figure 10

Taylor diagram of the standard deviations and correlation coefficients between observed and simulated streamflow for the proposed models, and the training and test loss functions of the optimized model.

The effect of rating curve uncertainty on streamflow simulation was evaluated by independently training and testing the ML models on streamflow time series derived from each rating curve uncertainty scenario. The results varied considerably, which can be attributed to the inherent uncertainty in rating curve estimation, the temporal scale used, and the ML model applied. The notable differences between daily and monthly simulations across the scenarios highlight the sensitivity of the ML models to rating curve uncertainty at different temporal scales (Figures 8 and 9).
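
As a rough illustration of how FC, LB and UB discharge series of this kind can be generated with Monte Carlo simulation, the sketch below assumes a single-segment power-law rating curve Q = a(h − h0)^b with a known cease-to-flow stage h0, a hypothetical 7% relative gauging error, and 2.5th/97.5th percentile bands. All gaugings, parameters and the error model are hypothetical, and the authors' actual MCST setup may differ.

```python
import math
import random
import statistics


def fit_power_law(stages, flows, h0):
    """Least-squares fit of log Q = log a + b * log(h - h0); returns (a, b)."""
    xs = [math.log(h - h0) for h in stages]
    ys = [math.log(q) for q in flows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def rating(h, a, b, h0):
    """Discharge from stage with a power-law rating curve."""
    return a * (h - h0) ** b


def monte_carlo_bands(stages, flows, h0, stage_series,
                      rel_err=0.07, n_sim=2000, seed=1):
    """Return fitted-curve (FC), lower (LB) and upper (UB) discharge series.

    Each realisation perturbs every gauged discharge by a Gaussian relative
    error (hypothetical 7 % standard deviation), refits the curve and
    converts the stage series to discharge. LB and UB are the 2.5th and
    97.5th percentiles of the realisations at each time step.
    """
    rng = random.Random(seed)
    a_fc, b_fc = fit_power_law(stages, flows, h0)
    fc = [rating(h, a_fc, b_fc, h0) for h in stage_series]
    realisations = []
    for _ in range(n_sim):
        noisy = [q * (1.0 + rng.gauss(0.0, rel_err)) for q in flows]
        a, b = fit_power_law(stages, noisy, h0)
        realisations.append([rating(h, a, b, h0) for h in stage_series])
    lb, ub = [], []
    for t in range(len(stage_series)):
        cuts = statistics.quantiles([r[t] for r in realisations], n=40)
        lb.append(cuts[0])   # 2.5th percentile
        ub.append(cuts[-1])  # 97.5th percentile
    return fc, lb, ub


# Hypothetical gaugings (stage in m, discharge in m3/s) on Q = 15 (h - 0.2)^1.6.
h0 = 0.2
stages = [0.4, 0.6, 0.8, 1.0, 1.3, 1.6, 2.0]
flows = [15.0 * (h - h0) ** 1.6 for h in stages]
# Stages to convert: 1.0 m lies inside the gauged range, 3.0 m beyond it.
fc, lb, ub = monte_carlo_bands(stages, flows, h0, [1.0, 3.0])
for h, f, lo, hi in zip([1.0, 3.0], fc, lb, ub):
    print(f"h = {h} m: FC {f:.1f}, LB {lo:.1f}, UB {hi:.1f} m3/s")
```

Relative to the fitted value, the band should come out wider at 3.0 m than at 1.0 m, mirroring the larger uncertainty of stages extrapolated above the highest gauging. Each of the three resulting series can then be used to train a separate model, as in the scenario design described above.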

The results indicate that rating curve uncertainty propagates substantially into the streamflow simulations, particularly at high flows. The simulated flow captured the baseflow, rising limb, and recession limb of the observed hydrograph, and reproduced peak flows reasonably well. The ML models performed better at the daily scale (Figure 8), probably because of the longer time series available at that scale: daily data provide higher temporal resolution and more frequent observations, allowing the model to capture finer-grained patterns and fluctuations in the data (Hunt et al. 2022).
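
One plausible reading of the record-length argument above is simply sample count: a recurrent model is trained on (input window, next value) pairs, and aggregating a daily record to monthly means shrinks the number of such pairs roughly thirtyfold. The sketch below assumes a hypothetical 10-year record, fixed 30-day blocks in place of calendar months, and a hypothetical 12-step lookback; the study's actual window length is not stated in this section.

```python
def make_windows(series, lookback):
    """Pair each run of `lookback` values with the value that follows it."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]


def block_means(daily, block=30):
    """Crude monthly aggregation: means of consecutive 30-day blocks."""
    return [sum(daily[i:i + block]) / block
            for i in range(0, len(daily) - block + 1, block)]


# Hypothetical 10-year daily record and a hypothetical 12-step lookback.
daily = [float(d % 365) for d in range(3650)]
monthly = block_means(daily)
lookback = 12
print(len(make_windows(daily, lookback)))    # 3638 daily training samples
print(len(make_windows(monthly, lookback)))  # 109 monthly training samples
```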

The bidirectional architecture allowed the models to capture the complex temporal dependencies and patterns in the streamflow data effectively (Wegayehu & Muluneh 2022). This highlights the potential of advanced ML techniques to improve the accuracy and reliability of hydrological models. The BiGRU model consistently outperformed the BiLSTM model across all scenarios and performance metrics, as shown by the statistics in Tables 3 and 4 and the scatter plot in Figure 11.
Figure 11

The scatter plot for observed and simulated streamflow.

The training and test loss curves track the model's loss during training; they are used to assess convergence, to identify potential overfitting or underfitting, and to judge how effectively the model captures the underlying patterns in the streamflow data (Mosavi et al. 2018). The scatter plot of observed and simulated streamflow, in turn, provides a visual comparison of the model's predictions with the actual observations. The superior performance of BiGRU is consistent with prior research showing its ability to capture temporal patterns in diverse datasets (Wang et al. 2022; Ayana et al. 2023). One possible reason is that the GRU has a simpler structure than the LSTM (two gates and no separate cell state, compared with three gates and a cell state), and therefore fewer parameters and a lower computational cost. This allows BiGRU to process data more efficiently, especially in tasks such as streamflow simulation that involve large amounts of data. The BiGRU model was therefore used to assess the propagation of rating curve uncertainty into the streamflow simulation of the Woybo River (Table 5).
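
The difference in size between the two architectures can be made concrete by counting trainable weights. The sketch below uses the standard per-layer formulas for LSTM and GRU layers and applies them to the daily configuration described in the text (two stacked bidirectional layers of 12 units and a single-neuron dense output). The single input feature is an assumption, since the number of model inputs is not given in this section.

```python
def lstm_params(n_units, n_inputs):
    """Trainable weights of one LSTM layer.

    Four blocks (input, forget and output gates plus the candidate cell
    state), each with input weights, recurrent weights and one bias vector.
    """
    return 4 * (n_units * (n_inputs + n_units) + n_units)


def gru_params(n_units, n_inputs, reset_after=False):
    """Trainable weights of one GRU layer.

    Three blocks (update gate, reset gate, candidate state). With
    reset_after=True, the variant Keras uses by default, each block has a
    second bias vector.
    """
    n_bias = 2 if reset_after else 1
    return 3 * (n_units * (n_inputs + n_units) + n_bias * n_units)


def stacked_bi_total(layer_params, n_units, n_inputs):
    """Two stacked bidirectional layers plus a single-neuron dense output."""
    first = 2 * layer_params(n_units, n_inputs)      # forward + backward copy
    second = 2 * layer_params(n_units, 2 * n_units)  # input = both directions
    dense = 2 * n_units + 1                          # weights + bias
    return first + second + dense


# Daily configuration from the text (12 units); one input feature is assumed.
print("BiLSTM:", stacked_bi_total(lstm_params, 12, 1))  # BiLSTM: 4921
print("BiGRU: ", stacked_bi_total(gru_params, 12, 1))   # BiGRU:  3697
```

Under these assumptions the BiGRU stack has about 25% fewer trainable weights than the BiLSTM stack, which is consistent with, though not proof of, the efficiency argument above.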

Table 5

Comparison of simulated peak flows for all scenarios using the BiGRU model

| Model | Temporal scale | Scenario 1 (FC) | Scenario 2 (UB) | Scenario 3 (LB) |
|---|---|---|---|---|
| BiGRU | Daily (m³ s⁻¹) | 116.5 | 151.5 | 76.4 |
| BiGRU | Monthly (m³ s⁻¹) | 52.1 | 65.3 | 39.8 |

The results show a substantial difference between the streamflow simulated with the reference data from the fitted rating curve and that simulated with data from its upper and lower uncertainty bands. The difference is more pronounced for high flows than for medium and low flows. As a result, the rating curve uncertainty in the Woybo River produced about 30% and 25% uncertainty in the simulated peak discharge for the daily and monthly mean simulations, respectively. Future research could extend these findings to other geographic regions. In our next investigation, we plan to use a range of ML models in an ensemble learning approach for streamflow simulation, incorporating remote-sensing data products such as vegetation and precipitation indices.
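
The percentages quoted above can be checked against the peak flows in Table 5. The sketch below assumes they are relative deviations of the band peaks from the fitted-curve peak, since the exact definition is not stated; under that assumption the upper-band deviations reproduce the quoted 30% (daily) and 25% (monthly), while the corresponding lower-band deviations are about 34% and 24%.

```python
# Simulated peak flows (m3/s) from Table 5: (fitted curve, upper band, lower band).
peaks = {
    "daily": (116.5, 151.5, 76.4),
    "monthly": (52.1, 65.3, 39.8),
}


def relative_deviation(band, reference):
    """Deviation of a band's peak from the fitted-curve peak, in percent."""
    return 100.0 * (band - reference) / reference


for scale, (fc, ub, lb) in peaks.items():
    print(f"{scale}: UB {relative_deviation(ub, fc):+.1f}%, "
          f"LB {relative_deviation(lb, fc):+.1f}%")
# daily: UB +30.0%, LB -34.4%
# monthly: UB +25.3%, LB -23.6%
```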

This study evaluated the propagation of rating curve uncertainty into streamflow simulation by coupling MCST with ML models. MCST was used to establish the rating curve and estimate its uncertainty, and two ML models, BiGRU and BiLSTM, were used to evaluate the impact of rating curve uncertainty on streamflow simulation at daily and monthly temporal scales. The results indicate that integrating MCST with ML models is an effective way to evaluate this propagation, and that rating curve uncertainty propagates substantially into streamflow simulations, particularly during extreme events. Although the ML models performed well for each scenario individually, the differences in simulated flow between scenarios show that rating curve uncertainty can introduce variability into the simulation results. The ML models performed better at the daily scale than at the monthly scale, and BiGRU captured the flow patterns best at both temporal scales according to all performance metrics used. Consequently, the rating curve uncertainty in the Woybo River led to an uncertainty of about 30% and 25% in streamflow at peak discharge for the daily and monthly mean simulations, respectively. Overall, the findings highlight the importance of considering rating curve uncertainty in streamflow simulation across temporal scales. The results can be useful for water resource managers and decision-makers in the Woybo Watershed.

No funding was received from any source.

All relevant data are included in the paper or its Supplementary Information.

The authors declare that there is no conflict of interest.

Aronica, G. T., Candela, A., Viola, F. & Cannarozzo, M. (2006) Influence of rating curve uncertainty on daily rainfall-runoff model predictions. IAHS-AISH Publication, 303, 116–124.
Ayana, Ö., Kanbak, D. F., Kaya Keleş, M. & Turhan, E. (2023) Monthly streamflow prediction and performance comparison of machine learning and deep learning methods. Acta Geophysica, 71 (6), 2905–2922. https://doi.org/10.1007/s11600-023-01023-6
Ayele, E. G., Ergete, E. T. & Geremew, G. B. (2024) Predicting the peak flow and assessing the hydrologic hazard of the Kessem Dam, Ethiopia using machine learning and risk management centre-reservoir frequency analysis software. Journal of Water and Climate Change, 15 (2), 370–391. https://doi.org/10.2166/wcc.2024.320
Chow, V. T., Maidment, D. R. & Mays, L. W. (1988) Applied Hydrology. New York: McGraw-Hill, pp. 1–294.
Domeneghetti, A., Castellarin, A. & Brath, A. (2012) Assessing rating-curve uncertainty and its effects on hydraulic model calibration. Hydrology and Earth System Sciences, 16 (4), 1191–1202. https://doi.org/10.5194/hess-16-1191-2012
Dymond, J. R. & Christian, R. (1982) Accuracy of discharge determined from a rating curve. Hydrological Sciences Journal, 27 (4), 493–504. https://doi.org/10.1080/02626668209491128
Edamo, M. L., Bushira, K. M., Ukumo, T. Y., Ayele, M. A., Alaro, M. A. & Borko, H. B. (2022) Effect of climate change on water availability in Bilate catchment, Southern Ethiopia. Water Cycle, 3, 86–99. https://doi.org/10.1016/j.watcyc.2022.06.001
Gao, S., Huang, Y., Zhang, S., Han, J., Wang, G., Zhang, M. & Lin, Q. (2020) Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. Journal of Hydrology, 589, 125188. https://doi.org/10.1016/j.jhydrol.2020.125188
Guerrero, J. L., Westerberg, I. K., Halldin, S., Xu, C. Y. & Lundin, L. C. (2012) Temporal variability in stage-discharge relationships. Journal of Hydrology, 446–447, 90–102. https://doi.org/10.1016/j.jhydrol.2012.04.031
Haile, A. T., Taye, M. T., Geremew, Y., Wassie, S. & Fekadu, A. G. (2023) Filling streamflow data gaps through the construction of rating curves in the Lake Tana sub-basin, Nile basin. Journal of Water and Climate Change, 14 (4), 1162–1175. https://doi.org/10.2166/wcc.2023.372
Herschy, R. W. (2002) The uncertainty in a current meter measurement. Flow Measurement and Instrumentation, 13 (5–6), 281–284. https://doi.org/10.1016/S0955-5986(02)00047
Hirpa, F. A., Salamon, P., Alfieri, L., Thielen-del Pozo, J., Zsoter, E. & Pappenberger, F. (2016) The effect of reference climatology on global flood forecasting. Journal of Hydrometeorology, 17 (4), 1131–1145. https://doi.org/10.1175/JHM-D-15-0044.1
Hunt, K. M. R., Matthews, G. R., Pappenberger, F. & Prudhomme, C. (2022) Using a long short-term memory (LSTM) neural network to boost river streamflow forecasts over the western United States. Hydrology and Earth System Sciences, 26 (21), 5449–5472. https://doi.org/10.5194/hess-26-5449-2022
Khan, Z., Rahman, A. & Karim, F. (2023) An assessment of uncertainties in flood frequency estimation using bootstrapping and Monte Carlo simulation. Hydrology, 10 (1), 1–16. https://doi.org/10.3390/hydrology10010018
Kumar Singh, S. & Marcy, N. (2017) Comparison of simple and complex hydrological models for predicting catchment discharge under climate change. AIMS Geosciences, 3 (3), 467–497. https://doi.org/10.3934/geosci.2017.3.467
Kumlachew, Y. Z., Tilahun, S. A., Cherie, F. F., Akale, A. T., Kibret, E. A., Alemie, N. A. & Animut, M. (2023) Quantifying flow rate using stage-discharge rating curve and SCS runoff equation on upland watershed of Lake Tana Sub Basin, Ethiopia. Sustainable Water Resources Management, 9 (2), 1–20. https://doi.org/10.1007/s40899-022-00793
Le Coz, J., Camenen, B., Peyrard, X. & Dramais, G. (2012) Uncertainty in open-channel discharges measured with the velocity-area method. Flow Measurement and Instrumentation, 26, 18–29. https://doi.org/10.1016/j.flowmeasinst.2012.05.001
Le Coz, J., Renard, B., Bonnifait, L., Branger, F. & Le Boursicaud, R. (2014) Combining hydraulic knowledge and uncertain gaugings in the estimation of hydrometric rating curves: A Bayesian approach. Journal of Hydrology, 509, 573–587. https://doi.org/10.1016/j.jhydrol.2013.11.016
Liu, W., Zou, P., Jiang, D., Quan, X. & Dai, H. (2023) Computing river discharge using water surface elevation based on deep learning networks. Water (Switzerland), 15 (21), 1–15. https://doi.org/10.3390/w15213759
Mathewos, S., Yisihak, T., Kumar, T., Bekele, N., Legesse, M. & Arja, M. (2024) Comparative analysis in selecting best irrigation method to maximize tomato yield from various irrigation approaches in water scarce regions. Heliyon, 10 (7), e28746. https://doi.org/10.1016/j.heliyon.2024.e28746
McMillan, H., Freer, J., Pappenberger, F., Krueger, T. & Clark, M. (2010) Impacts of uncertain river flow data on rainfall-runoff model calibration and discharge predictions. Hydrological Processes, 24 (10), 1270–1284. https://doi.org/10.1002/hyp.7587
Mena, N. B. (2024) Application of SA-conv1d-BiGRU model for streamflow prediction in southern Ethiopia. Hydrology Research, 55 (9), 936–957. https://doi.org/10.2166/nh.2024.074
Mena, N. B., Ayele, E. G., Chora, H. G. & Dada, T. (2024) Assessing the effect of rating curve uncertainty in streamflow simulation on Kulfo watershed, Southern Ethiopia. Journal of Water and Climate Change, 15 (9), 4199–4219. https://doi.org/10.2166/wcc.2024.645
Mosavi, A., Ozturk, P. & Chau, K. W. (2018) Flood prediction using machine learning models: Literature review. Water (Switzerland), 10 (11), 1536. https://doi.org/10.3390/w10111536
Negatu, T. A., Zimale, F. A. & Steenhuis, T. S. (2022) Establishing stage–discharge rating curves in developing countries: Lake Tana Basin, Ethiopia. Hydrology, 9 (1), 1–26. https://doi.org/10.3390/hydrology9010013
Niu, W. J., Feng, Z. K., Feng, B. F., Min, Y. W., Cheng, C. T. & Zhou, J. Z. (2019) Comparison of multiple linear regression, artificial neural network, extreme learning machine, and support vector machine in deriving operation rule of hydropower reservoir. Water (Switzerland), 11 (1), 1–17. https://doi.org/10.3390/w11010088
Pagano, T. C., Wood, A. W., Ramos, M.-H., Cloke, H. L., Pappenberger, F., Clark, M. P., Cranston, M., Kavetski, D., Mathevet, T., Sorooshian, S. & Verkade, J. S. (2014) Challenges of operational river forecasting. Journal of Hydrometeorology, 15 (4), 1692–1707. https://doi.org/10.1175/jhm-d-13-0188.1
Pelletier, P. M. (1989) Reply: Uncertainties in the single determination of river discharge: A literature review. Canadian Journal of Civil Engineering, 16 (5), 780–781. https://doi.org/10.1139/l89-116
Selle, B. & Hannah, M. (2010) A bootstrap approach to assess parameter uncertainty in simple catchment models. Environmental Modelling and Software, 25 (8), 919–926. https://doi.org/10.1016/j.envsoft.2010.03.005
Staudemeyer, R. C. & Morris, E. R. (2019) Understanding LSTM – A Tutorial Into Long Short-Term Memory Recurrent Neural Networks. Ithaca, NY: Cornell University. Available from: http://arxiv.org/abs/1909.09586
Tareke, K. A. & Awoke, A. G. (2023) Hydrological drought forecasting and monitoring system development using artificial neural network (ANN) in Ethiopia. Heliyon, 9 (2), e13287. https://doi.org/10.1016/j.heliyon.2023.e13287
Ukumo, T. Y., Edamo, M. L., Abdi, D. M. & Derebe, M. A. (2022a) Evaluating water availability under changing climate scenarios in the Woybo catchment, Ethiopia. Journal of Water and Climate Change, 13 (11), 4130–4149. https://doi.org/10.2166/wcc.2022.343
Ukumo, T. Y., Lohani, T. K., Edamo, M. L., Alaro, M. A., Ayele, M. A. & Borko, H. B. (2022b) Application of regional climatic models to assess the performance evaluation of changes on flood frequency in Woybo catchment, Ethiopia. Advances in Civil Engineering, 2022, 1–16. https://doi.org/10.1155/2022/3351375
Ukumo, T. Y., Abebe, A., Lohani, T. K. & Edamo, M. L. (2023) Flood hazard mapping and analysis under climate change using hydro-dynamic model and RCPs emission scenario in Woybo River catchment of Ethiopia. World Journal of Engineering, 20 (3), 559–576. https://doi.org/10.1108/WJE-07-2021-0410
Wang, S., Shao, C., Zhang, J., Zheng, Y. & Meng, M. (2022) Traffic flow prediction using bi-directional gated recurrent unit method. Urban Informatics, 1 (1), 1–12. https://doi.org/10.1007/s44212-022-00015
Wegayehu, E. B. & Muluneh, F. B. (2021) Multivariate streamflow simulation using hybrid deep learning models. Computational Intelligence and Neuroscience, 2021 (1), 1–16. https://doi.org/10.1155/2021/5172658
Wegayehu, E. B. & Muluneh, F. B. (2022) Short-term daily univariate streamflow forecasting using deep learning models. Advances in Meteorology, 2022, 1–21. https://doi.org/10.1155/2022/1860460
Werner, K. & Yeager, K. (2013) Challenges in forecasting the 2011 runoff season in the Colorado basin. Journal of Hydrometeorology, 14 (4), 1364–1371. https://doi.org/10.1175/JHM-D-12-055.1
Yisehak, B., Adhena, K., Shiferaw, H., Hagos, H., Abrha, H. & Bezabh, T. (2020) Characteristics of hydrological extremes in Kulfo River of Southern Ethiopian Rift Valley Basin. SN Applied Sciences, 2 (7), 1–12. https://doi.org/10.1007/s42452-020-3097-1
Zeroual, A., Meddi, M. & Assani, A. A. (2016) Artificial neural network rainfall-discharge model assessment under rating curve uncertainty and monthly discharge volume predictions. Water Resources Management, 30 (9), 3191–3205. https://doi.org/10.1007/s11269-016-1340-8
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).