ABSTRACT
Accurate streamflow simulation and an understanding of its associated uncertainty are crucial for effective water resource management. However, the uncertainty of the rating curves from which streamflow data are derived remains poorly understood. This study aims to simulate streamflow under rating curve uncertainty conditions. The bootstrap resampling technique (BSRT) was used to establish the rating curve and estimate the associated uncertainty. Furthermore, it was integrated with standalone and hybrid deep learning (DL) models (GRU, Bi-LSTM, and Conv1D-LSTM) to assess the effect of this uncertainty on streamflow simulation. Different lag times of rainfall and discharge were used as inputs for the DL streamflow simulation models. Despite its complexity, the Conv1D-LSTM model did not outperform the Bi-LSTM model; however, it slightly outperformed the GRU model. Moreover, the rating curve uncertainty propagates significantly to streamflow simulation, particularly in high-flow regions. Consequently, the uncertainties related to rating curves on the Kulfo River led to a streamflow uncertainty of about 17.8 m3 s−1, representing 22% at peak discharge. The performance of the DL models was evaluated using different metrics (RMSE, MAE, NSE, and R2). The findings underscore the importance of considering rating curve uncertainty in streamflow simulation to enhance water resource management practices and support informed decision-making in the study area.
HIGHLIGHTS
BSRT was integrated with DL models to assess the effect of rating curve uncertainty on streamflow simulation accuracy and reliability.
Different standalone and hybrid DL models were compared for streamflow simulation under rating curve uncertainty conditions.
Rating curve uncertainty significantly propagates to streamflow simulations, particularly in high-flow regions.
INTRODUCTION
The magnitude of streamflow data is particularly valuable as it informs the design of important hydraulic and irrigation structures (Apaydin et al. 2020). Water managers can make informed decisions regarding water allocation, release schedules, and other operational strategies by accurately estimating the flow magnitude, and these decisions directly impact economic returns (Mcmillan & Westerberg 2015). However, conducting continuous discharge measurements and establishing a reliable stage–discharge relationship, commonly called a rating curve, can be costly, time-consuming, and impracticable (Kumlachew et al. 2023). To address these difficulties, it is advisable to adopt a dual-stage process. The first step involves connecting the discharge in a specific river and the stage by conducting continuous measurements. Subsequently, the stream stage can be observed relatively inexpensively, and the discharge can be predicted by utilizing a preexisting rating curve. However, this indirect determination of streamflow data from water level measurements can introduce uncertainties that may not always be communicated to data users (Negatu et al. 2022).
Numerous investigations have scrutinized the various sources of uncertainty that impact discharge measurements and the development of rating curves. One specific challenge is the extrapolation beyond the range of paired velocity and stage measurements (Haile et al. 2023). According to Haile et al. (2023), around 30% of the streamflow volumes reported for their specific research region in Australia were derived through extrapolation of rating curves. The authors also noted that the errors in stage measurements were generally below 10 mm and seldom reached 100 mm. These errors are relatively insignificant compared with the errors in discharge measurements. Pelletier (1989) reviewed over 140 publications and found that the uncertainty in discharge measurements can escalate up to 20% of the observed value. The uncertainty ranges mentioned above are influenced by operational factors, such as the number of sampling points, the duration of single velocity measurement, and the characteristics of the gauging sites (Le Coz et al. 2012). To evaluate the propagation of this uncertainty to streamflow simulation, various techniques were reviewed. Streamflow simulation techniques employ various statistical, mathematical, and computational approaches (Gupta & Nearing 2014). The choice of technique is influenced by the available data, system complexity, and the specific requirements of the application (Wegayehu & Muluneh 2022).
Conventional hydrological models are widely utilized due to their foundation in hydraulics and hydrology principles (Kumar Singh & Marcy 2017). Nevertheless, these models often necessitate extensive input data and prolonged calibration periods, and they struggle to represent the intricate and nonlinear nature of hydrological processes. Given these limitations, data-driven models present an attractive alternative (Ji et al. 2012). Data-driven models rely on the statistical relationship between input and output data and are further classified as linear and nonlinear models. Autoregressive moving average (ARMA), multiple linear regression (MLR), and autoregressive integrated moving average (ARIMA) are the most common linear methods (Mosavi et al. 2018), and the most common nonlinear data-driven models are machine learning (ML) models. The major drawback of the linear models is that they are incapable of handling the system's nonlinearity (Apaydin et al. 2020). Advanced data-driven models, like ML, are increasingly used due to the shortcomings of the physically based and linear models. ML, a subfield of artificial intelligence, is now among the most widely used approaches in hydrology. It involves using computational power to extract insights from data by iteratively learning relationships from datasets (Salehinejad et al. 2017).
Artificial neural networks (ANNs), one of the popular data-driven models, have been widely applied in hydrological modeling for their strong nonlinear fitting ability (Ji et al. 2012). A significant limitation of conventional ANNs is that they are typically constructed with only one or two hidden layers, which can restrict their ability to model complex relationships effectively. To successfully address complicated issues, deep learning (DL) networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have recently been enhanced with multi-layered architectures, and time series data, such as streamflow, can be effectively modeled using DL algorithms (Wegayehu & Muluneh 2022). An RNN is a neural network that is specialized for processing sequential data, swiftly adjusting to temporal dynamics through prior time-step data (Apaydin et al. 2020). Nevertheless, RNNs struggle to capture long-term dependencies and are prone to vanishing and exploding gradients. To overcome this deficiency, Hochreiter & Schmidhuber (1997) proposed a long short-term memory (LSTM) network for learning long-term dependence. More recently, gated recurrent units (GRUs) were introduced as an alternative to LSTM. GRUs, similar to LSTM but with a forget gate and fewer parameters, do not include an output gate (Apaydin et al. 2020). GRUs have demonstrated proficiency in time series modeling and natural language processing akin to LSTM. Nonetheless, there is an ongoing discussion regarding the comparative effectiveness of these two architectures in simulating streamflow and reservoir inflow. Clark et al. (2024) conducted a large-scale comparison, evaluating monthly LSTM predictions against conceptual rainfall-runoff (WAPABA model) predictions across nearly 500 catchments in Australia. The findings revealed that LSTM models matched or outperformed WAPABA in prediction accuracy for more than two-thirds of the examined catchments.
Yifru et al. (2024) explored watershed memory and process modeling-based hybrid approaches across varied hydrological settings, encompassing Korean and Ethiopian watersheds. Three hybrid models, integrating watershed memory and residual error, were developed and assessed against independent LSTM models. The findings demonstrated that the hybrid models outperformed the standalone LSTM models across all watersheds. Yuan et al. (2018) employed LSTM networks to predict monthly runoff in the Astor River Basin, Pakistan, optimizing the parameters with the ant lion optimizer (ALO) model (LSTM-ALO). The study showed that the LSTM-ALO model improved performance and highlighted the sensitivity of LSTM to datasets and parameter values. Adnan et al. (2023) investigated the effectiveness of a relevance vector machine enhanced with advanced manta ray foraging optimization (RVM-IMRFO) for forecasting monthly pan evaporation based on a restricted set of climatic input data. The results revealed that the RVM-IMRFO outperformed alternative approaches in predicting monthly pan evaporation solely using temperature data. This finding is significant, particularly in regions with limited access to other climatic data, such as developing countries. CNN has a superior capability in capturing spatial data features and has played a key role in the latest advancements in DL (Wegayehu & Muluneh 2022). Li & Xu (2022) used single-variable and multi-variable time series data in LSTM and CNN-LSTM models. In predicting particulate matter (PM2.5) concentration for air quality analysis, the suggested multi-variable CNN-LSTM model demonstrated superior performance with minimal error. The fusion of CNN and LSTM models enhances time series prediction models by enabling the LSTM model to effectively capture extended sequences of pattern information.
Conversely, CNN models excel in noise filtration of input data and extracting crucial features, potentially boosting the prediction model's accuracy (Livieris et al. 2020). Even though combining CNN with LSTM showed remarkable results in different studies, its application in hydrological fields still demands more research (Wegayehu & Muluneh 2023). In the context of the Kulfo watershed in Southern Ethiopia, assessing the impact of rating curve uncertainty on streamflow simulation is of paramount importance for improved water management, flood forecasting, and risk assessment (Yisehak et al. 2020). Despite its importance, the uncertainty associated with the rating curve and its impact on streamflow simulation have not been explicitly investigated. Therefore, this study aims to assess the effect of rating curve uncertainty on the accuracy and reliability of streamflow simulation at the outlet of the Kulfo watershed by offering a novel viewpoint through integrating BSRT with single and integrated DL models to enhance water resource management practices and support informed decision-making in the study area.
MATERIALS AND METHODS
Study area description
Data collection and analysis
A digital elevation model (DEM) of the Kulfo watershed (30 × 30 m resolution) was downloaded from the United States Geological Survey (USGS) database. The meteorological data (daily precipitation and maximum and minimum temperature) from 1991 to 2013 for four stations were collected from the Ethiopian Meteorology Institute (EMI). The quality and completeness of data heavily influence data analysis outcomes (Mathewos et al. 2024). In this study, the multiple imputation method and ML techniques were compared for imputing the missing meteorological data because of their ability to capture complex relationships and patterns. The consistency of rainfall records was checked by double mass curve analysis. The nondimensional parametrization method was utilized to verify the homogeneity of the precipitation data. The long-term daily flow and river water level readings (from 1991 to 2013) for the Kulfo station were obtained from the Ministry of Water and Energy (MoWE). Measured pairs of discharge and stage needed for rating curve establishment, which have been taken since the commissioning of the Kulfo station, were also obtained from MoWE.
Rain gauges provide a limited representation of the spatial distribution of rainfall during a storm, as they only offer point sampling. However, when conducting hydrological analyses that encompass large areas, it becomes essential to determine average rainfall depths over sub-watershed areas (Chow et al. 1988). In this study, the Thiessen polygon method was employed to calculate the spatial distribution of rainfall across the catchments with the help of ArcGIS 10.3 tools. Different combinations of monthly rainfall and discharge with different lag times were assessed to determine the optimal input combination for the DL streamflow simulation models. Temperature data were omitted due to their very weak relation with discharge. Linear correlation statistics, such as Pearson's correlation coefficient, were used to understand the dependence between variables (Wegayehu & Muluneh 2022). The autocorrelation function for the monthly discharge reveals a good correlation up to two lags. Correlation analysis between monthly rainfall and monthly discharge showed a significant correlation at lag times of 1 month (t − 1) and 2 months (t − 2). Hence, the DL models were designed with the inputs Q(t − 2), Q(t − 1), R(t − 2), R(t − 1), and R(t), while the output variable was limited to the monthly discharge for the current month, Q(t).
Data type | Pearson correlation with streamflow | Skewness | Mean | Min | Max | SD | CV |
---|---|---|---|---|---|---|---|
Streamflow | 1.00 | 1.69 | 10.75 | 0.00 | 50.73 | 5.43 | 0.61 |
Rainfall | 0.68 | 1.90 | 11.25 | 0.00 | 56.87 | 6.34 | 0.67 |
CV, coefficient of variation; SD, standard deviation.
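The lagged input design described above can be sketched as follows. This is a minimal illustration, assuming pandas is available; the function name `make_lagged_inputs` and the synthetic rainfall and discharge values are hypothetical and are not taken from the study.

```python
import numpy as np
import pandas as pd

def make_lagged_inputs(rain, flow, max_lag=2):
    """Build inputs [Q(t-2), Q(t-1), R(t-2), R(t-1), R(t)] and target Q(t)."""
    df = pd.DataFrame({"R": rain, "Q": flow})
    cols = {}
    # Lagged discharge, oldest lag first
    for lag in range(max_lag, 0, -1):
        cols[f"Q(t-{lag})"] = df["Q"].shift(lag)
    # Lagged rainfall, oldest lag first, then current rainfall
    for lag in range(max_lag, 0, -1):
        cols[f"R(t-{lag})"] = df["R"].shift(lag)
    cols["R(t)"] = df["R"]
    X = pd.DataFrame(cols)
    y = df["Q"].rename("Q(t)")
    # The first max_lag months have no complete history and are dropped
    valid = X.notna().all(axis=1)
    return X[valid], y[valid]

# Synthetic monthly values, for illustration only
rain = np.array([5.0, 12.0, 30.0, 18.0, 7.0, 3.0])
flow = np.array([4.0, 6.0, 15.0, 20.0, 11.0, 6.0])
X, y = make_lagged_inputs(rain, flow)
```

With six months of data and a maximum lag of two, four complete samples remain; each row of `X` pairs the two previous discharges and three rainfall values with the current-month discharge in `y`.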
Materials used
In this case study, various materials were utilized. ArcGIS 10.3 software was used to delineate the watershed and sub-basins. The bootstrap resampling technique (BSRT) was used to establish the rating curve and estimate associated uncertainty. For streamflow simulation, the DL models are employed. Python programming language, along with the Jupyter Notebook, was used for data processing and analysis and also for DL model development.
Methods
Rating curve establishment
The number of stage–discharge pairs needed to develop a rating curve depends on the unique attributes of the river and the desired level of accuracy (Kumlachew et al. 2023). For this study, 36 measured stage–discharge pairs were utilized to construct the rating curve and assess the associated uncertainty. To generate a daily time series of discharge based on the developed rating curve, the instantaneous time series data of the water level from 1991 to 2013 was used. Subsequently, a time series of monthly discharges was computed. The selection of the Kulfo station was based on the availability of gauging data, rating curve, and long-term streamflow measurements and stage recordings.
Determination of gauging height of zero discharge
The cease-to-flow reference level, or the zero-discharge level, is an important parameter that represents the water level at zero discharge in a stream. It plays a vital role in establishing stage–discharge relationships, as it can influence the shape of the lower part of the rating curve (Kumlachew et al. 2023). Although determining the exact location of the datum correction that corresponds to zero flow is challenging for channel-controlled gauging stations, the lowest point opposite the gauge can serve as a reasonable indicator of the gauge height at zero discharge. Factors such as backwater effects, flow obstructions, or changes in water level need to be accounted for to ensure an accurate estimation of the zero-discharge point in the rating curve (Haile et al. 2023). Three well-known methods are used to estimate the gauge height of zero discharge, namely the trial-and-error method, the arithmetic method, and computer-based optimization (Kumlachew et al. 2023). In this study, the zero-discharge level was determined using a computer-based optimization method and verified by visually examining the stage–discharge graph, as this method yields the most suitable curve for scattered stage and discharge data.
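One possible form of a computer-based optimization for the zero-discharge level is sketched below. The study does not describe its optimization procedure, so everything here is an assumption: a power-law rating curve Q = a(h − h0)^b, a grid search over candidate h0 values below the lowest gauged stage, and selection of the h0 that makes log Q most nearly linear in log(h − h0). The function name `estimate_h0`, the search window, and the synthetic gaugings are hypothetical.

```python
import numpy as np

def estimate_h0(stage, discharge, search_width=2.0, n_grid=500):
    """Grid-search the zero-discharge level h0 (assumed method, not the study's)."""
    stage = np.asarray(stage, dtype=float)
    log_q = np.log(np.asarray(discharge, dtype=float))
    h_min = stage.min()
    # Candidates lie strictly below the lowest gauged stage so (h - h0) > 0
    candidates = np.linspace(h_min - search_width, h_min - 1e-3, n_grid)
    best_h0, best_sse = candidates[0], np.inf
    for h0 in candidates:
        x = np.log(stage - h0)
        slope, intercept = np.polyfit(x, log_q, 1)
        sse = np.sum((log_q - (slope * x + intercept)) ** 2)
        if sse < best_sse:
            best_h0, best_sse = h0, sse
    return best_h0

# Synthetic, noise-free gaugings generated with h0 = 0.30 (illustration only)
stage = np.linspace(0.5, 2.5, 36)
discharge = 12.0 * (stage - 0.30) ** 1.8
h0_est = estimate_h0(stage, discharge)
```

On real gaugings the result would still need the visual check against the stage–discharge plot that the text describes, since scatter at low stages can shift the optimum.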
Rating curve uncertainty estimation
Uncertainty estimation aims to quantify the range of potential outcomes, that is, the variability in the output that results from variability in the inputs. Various approaches have been proposed to estimate the uncertainty in stage–discharge rating curves. The methodology for evaluating the uncertainty associated with a particular stage–discharge relationship is still an ongoing scientific matter (Le Coz et al. 2014). The research community has developed multiple uncertainty estimation methods, varying in complexity and suitability for different situations. Some commonly employed methods are listed below:
(1) Monte Carlo simulation: This is a probabilistic technique that involves randomly sampling the stage measurements, propagating the uncertainty through the rating curve equation, and generating a distribution of discharge values. It can capture complex relationships between variables but requires a large number of simulations to obtain reliable results (Khan et al. 2023).
(2) Bayesian inference: This method combines prior knowledge about the rating curve parameters with observed data to update this knowledge using Bayes' theorem. It provides a probabilistic framework for estimating uncertainty and can incorporate expert judgment or additional information into the analysis. Careful consideration of prior knowledge, computational resources, model specification, and interpretation is necessary to ensure accurate and reliable results (Sikorska & Renard 2017).
(3) Generalized likelihood uncertainty estimation (GLUE): GLUE involves defining multiple rating curve equations with different parameter sets and evaluating their performance against observed stage–discharge data. Uncertainty is then estimated by considering the range of parameter sets that provide an acceptable fit to the data. It relies on subjective choices, such as defining an acceptable performance threshold and selecting the likelihood function (Khan et al. 2023).
(4) BSRT: It involves randomly selecting subsets of observed data with replacement to create multiple bootstrap samples. These samples are then used to generate multiple model realizations or parameter estimates.
Different methods can present the results in various ways, such as upper/lower bands (Westerberg et al. 2011), distributions of discharge for each stage value, or full distributions of rating curve samples (Le Coz et al. 2014). However, these differences in output can limit the propagation of uncertainty to other analyses. Choosing the appropriate method depends on several factors, including data availability, computational resources, desired level of detail, and specific characteristics of the study site. Hence, there is no single optimal method for this purpose (Le Coz et al. 2014). In this study, the BSRT is employed in a Python environment for rating curve uncertainty estimation.
Rating curve uncertainty estimation using the BSRT involves generating multiple sets of rating curve parameters through resampling from the original dataset. This creates a distribution of possible rating curves, and uncertainty bands are defined based on the power law equation. The BSRT method offers several advantages over other techniques. It is more robust and flexible due to its ability to be applied to diverse datasets without assuming specific distributional forms. It effectively handles nonlinear relationships and outliers, which can be challenging for other methods. Furthermore, it provides a direct and straightforward approach to estimating uncertainty by analyzing variability in parameter estimates and predictions (Selle & Hannah 2010).
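The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: the power law Q = a(h − h0)^b is fitted by least squares in log space with a fixed h0, and the 2.5th and 97.5th percentiles are used for the bands (the study does not state which percentiles were used). The function names and the synthetic gaugings are hypothetical.

```python
import numpy as np

def fit_power_law(stage, discharge, h0):
    """Fit Q = a * (h - h0)**b by linear least squares in log space."""
    b, log_a = np.polyfit(np.log(stage - h0), np.log(discharge), 1)
    return np.exp(log_a), b

def bootstrap_rating_curve(stage, discharge, h0, n_boot=1000, seed=0):
    """Resample stage-discharge pairs with replacement and refit each time."""
    rng = np.random.default_rng(seed)
    stage = np.asarray(stage, dtype=float)
    discharge = np.asarray(discharge, dtype=float)
    n = len(stage)
    grid = np.linspace(stage.min(), stage.max(), 50)
    curves = np.empty((n_boot, grid.size))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of pairs
        a, b = fit_power_law(stage[idx], discharge[idx], h0)
        curves[i] = a * (grid - h0) ** b
    # Percentile levels are an assumption; they define the lower/upper bands
    lower, median, upper = np.percentile(curves, [2.5, 50.0, 97.5], axis=0)
    return grid, lower, median, upper

# Synthetic gaugings with multiplicative noise (36 pairs, illustration only)
rng = np.random.default_rng(42)
stage = np.linspace(0.5, 2.5, 36)
discharge = 12.0 * (stage - 0.30) ** 1.8 * rng.lognormal(0.0, 0.15, size=36)
grid, lower, median, upper = bootstrap_rating_curve(stage, discharge, h0=0.30, n_boot=200)
```

Applying the lower band, the central curve, and the upper band to a stage record would then yield three discharge series, corresponding to the lower band, fitted curve, and upper band scenarios.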
DL algorithms
DL is an advanced form of ML that uses neural networks with multiple layers to enhance performance (Mosavi et al. 2018). DL has emerged as a highly promising ML technique for accurate rainfall-runoff simulations.
Gated recurrent unit (GRU)
Bidirectional long short-term memory (Bi-LSTM)
Convolutional neural network (CNN)
CNN is one of the most successful DL models, especially for feature extraction, and its network architectures encompass 1D CNN, 2D CNN, and 3D CNN (Wegayehu & Muluneh 2021). The CNN structure generally consists of a convolution layer, a pooling layer, and a fully connected layer. 1D CNN is mainly implemented for sequence data processing (Duan et al. 2020). 2D CNN is usually used for text and image identification (Lin et al. 2023), and 3D CNN is typically used for modeling medical images and identifying video data (Duan et al. 2020). Given that the current study focuses on time series analysis, we opted to implement 1D CNN.
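To illustrate how the convolution and pooling layers act on a sequence, the sketch below applies one 1D filter and one max-pooling step in plain NumPy. It is an assumption-laden illustration: the kernel values, the sequence, and the helper names `conv1d_valid` and `max_pool1d` are hypothetical, no activation function is applied, and a trained network would learn many kernels from data rather than use a fixed one.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Slide a kernel along a 1D sequence (stride 1, no padding)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Keep the maximum of each non-overlapping window of length `size`."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

seq = np.array([1.0, 3.0, 2.0, 6.0, 4.0, 5.0])  # e.g. six monthly values
kernel = np.array([-1.0, 1.0])                  # responds to month-to-month rises
features = conv1d_valid(seq, kernel)            # values: 2, -1, 4, -2, 1
pooled = max_pool1d(features)                   # values: 2, 4
```

In a Conv1D-LSTM arrangement, feature sequences of this kind would be passed on to the LSTM layer in place of the raw inputs.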
Conv1D-LSTM hybrid model
DL model development in Python
The development steps for a DL model in Python can be summarized as follows: import different libraries, preprocess data, design model architecture, compile the model, train the model, use the model for prediction, evaluate its performance, iterate and refine, and deploy the model for real-world applications (Ergete et al. 2022). Achieving optimal performance in DL models necessitates making decisions on a combination of parameters and hyperparameters (Wegayehu & Muluneh 2021). Parameters are the variables that the model learns from the data during the training process, including the weights and biases of the neurons in the hidden layer(s) and the output layer. Hyperparameters, on the other hand, are set by the user before the training process begins and are not learned from the data. The following paragraph discusses the main hyperparameters optimized.
1. Number of hidden units: This hyperparameter determines the number of neurons in each hidden layer of the DL models. Increasing the number of hidden units allows the model to capture more intricate patterns in the data, but it also increases the computational complexity.
2. Activation function: Activation functions are crucial in neural networks as they introduce nonlinearity, control the output range, and add interpretability to the model (Mosavi et al. 2018). The selection of an activation function relies on the specific problem at hand and the desired behavior of the network. There are several activation functions commonly used in DLs. In this case, Sigmoid and Tanh activation functions were used.
- 2.1. Sigmoid function: It converts the input to a value between 0 and 1, which makes it suitable for modeling probabilities or binary classification problems. The output can be interpreted as the activation level of a neuron: values near 0 indicate low activation, while values close to 1 indicate high activation.
3. Learning rate: It governs the magnitude of the step taken by the model to update its parameters during training. Setting the learning rate too high can lead to unstable training while setting it too low can result in slow convergence. Fine-tuning the learning rate is crucial for achieving optimal training performance and model accuracy. Experimentation and validation are essential to determining the best learning rate for a specific dataset and model architecture.
4. Number of epochs: This hyperparameter determines how many times the model sees the entire training dataset. It is essential to find the right balance to prevent underfitting or overfitting. Monitoring loss, using early stopping, adjusting learning rates, considering computational resources, and hyperparameter tuning are key aspects of optimizing the number of epochs for efficient training and accurate predictions in streamflow simulation tasks.
5. Dropout rate: Dropout is a regularization technique commonly used in DL models to prevent overfitting. It involves randomly setting a fraction of the neurons in the hidden layers to zero during each training iteration. Tuning the dropout rate can help improve the model's generalization performance and prevent it from memorizing noise in the training data.
6. Batch size: The batch size determines the number of training samples that are processed together in each iteration during training. Increasing the batch size can expedite training by processing more samples at once, yet it necessitates more memory. Smaller batch sizes may allow the model to generalize better as it updates its parameters more frequently.
7. Optimization algorithm: The optimization algorithm controls parameter updates during training. Common optimization algorithms used in DLs include stochastic gradient descent and Adam. The choice of an optimization algorithm can affect the convergence speed and final performance of the model.
The choice of hyperparameter values for fine-tuning a DL model for streamflow simulation was guided by model complexity, computational resources, and training performance (Wegayehu & Muluneh 2021). For example, monitoring the model's training performance, such as the loss, can guide the selection of values. Experimenting with different configurations and evaluating their impact on training metrics can help identify the optimal set of values. Understanding the sensitivity of the model's performance to these hyperparameters is crucial for optimizing a DL model for streamflow simulation tasks and achieving accurate and reliable predictions.
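How the learning rate, batch size, number of epochs, and optimizer interact in a training loop can be sketched in plain NumPy. For brevity, the sketch trains a linear model rather than a recurrent network, so it is a stand-in and not the study's model; the Adam constants (0.9, 0.999, 1e-8) are the commonly used defaults, and the function name `train_linear_adam` and the synthetic data are hypothetical.

```python
import numpy as np

def train_linear_adam(X, y, lr=0.001, epochs=100, batch_size=32, seed=0):
    """Mini-batch training of a linear model with an MSE loss and Adam updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])   # append a bias column
    params = np.zeros(Xb.shape[1])         # learned parameters (weights + bias)
    m = np.zeros_like(params)              # Adam first-moment estimate
    v = np.zeros_like(params)              # Adam second-moment estimate
    beta1, beta2, eps = 0.9, 0.999, 1e-8   # common defaults (assumption)
    step, history = 0, []
    for _ in range(epochs):                # one epoch = one pass over all batches
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = Xb[idx] @ params - y[idx]
            grad = 2.0 * Xb[idx].T @ err / len(idx)   # gradient of batch MSE
            step += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** step)
            v_hat = v / (1 - beta2 ** step)
            params -= lr * m_hat / (np.sqrt(v_hat) + eps)
        history.append(float(np.mean((Xb @ params - y) ** 2)))
    return params, history

# Synthetic data: 216 samples (18 years of months), two inputs
rng = np.random.default_rng(1)
X = rng.normal(size=(216, 2))
y = X @ np.array([1.5, -0.5]) + 0.3 + rng.normal(scale=0.1, size=216)
params, history = train_linear_adam(X, y)
```

Plotting `history` for several learning rates is one simple way to see the trade-off described in the list: larger values converge in fewer epochs but can become unstable, while smaller values converge slowly.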
Model training and testing
DL model training involves optimizing the model's parameters using labeled data, while testing evaluates its performance on unseen data. The purpose is to improve the model's accuracy and assess its suitability for real-world applications (Ergete et al. 2022). In this study, the DL models were trained for monthly streamflow data from 1991 to 2008 and tested from 2009 to 2013.
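The chronological split stated above can be expressed with a date-indexed series. This is a minimal sketch assuming pandas; the placeholder values stand in for the monthly discharge record, which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Monthly index for 1991-2013; the values are placeholders, not observed data
dates = pd.date_range("1991-01-01", "2013-12-01", freq="MS")
flow = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Chronological split: no shuffling, so the test period stays unseen
train = flow.loc["1991":"2008"]
test = flow.loc["2009":"2013"]
```

Keeping the split chronological, rather than random, avoids leaking information from later months into training when lagged values are used as inputs.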
Evaluation of model performance
The purpose of performance measures in streamflow simulation is to evaluate the accuracy and reliability of the simulated streamflow data compared with observed data. In this research, evaluation is done by comparing the simulated streamflow from the three scenarios to the observed one using the following performance measures.
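The four metrics named in this study (RMSE, MAE, NSE, and R2) can be computed as in the sketch below, which uses their standard definitions. One assumption is flagged: R2 is taken here as the squared Pearson correlation between observed and simulated flows, which is a common convention when it is reported alongside NSE; the study's exact formula is not shown in this section. The example values are synthetic.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(sim - obs)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

obs = np.array([2.0, 4.0, 6.0, 8.0])   # synthetic observed flows
sim = np.array([3.0, 4.0, 5.0, 10.0])  # synthetic simulated flows
scores = {"RMSE": rmse(obs, sim), "MAE": mae(obs, sim),
          "NSE": nse(obs, sim), "R2": r2(obs, sim)}
```

For these values the MAE is 1.0 and the NSE is 0.7; RMSE and MAE carry the units of discharge, while NSE and R2 are dimensionless.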
RESULTS AND DISCUSSION
BSRT model result
Figure 6 shows that, for medium and low flows, most of the data points lie close to the fitted curve. However, the data points deviate considerably from the fitted curve at high water depths. The uncertainty is represented by the width of the uncertainty bands, which were determined by the lower and upper percentiles of the distribution of rating curves. The uncertainty bands were relatively wide compared with the range of the data. This suggests that there was significant uncertainty in the estimation of the rating curve and that there could be substantial variation in the estimated discharge values.
The optimized parameters of the power law equation were extracted, representing the best-fit values for the relationship and its upper and lower bands. The goodness of fit was evaluated by visually inspecting the fitted curve against the data points and by metrics such as R2 and RMSE. The RMSE value of 7.6 m3 s−1 indicates that, on average, the fitted curve predicts the discharge at the gauging station with an error of 7.6 m3 s−1, and the R2 value of 0.87 implies that the fitted curve explains approximately 87% of the variation in the data, leaving about 13% of the variation unexplained. Considering these results, it can be concluded that there is uncertainty in the rating curve. Consequently, the rating curve alone is not fully reliable for predicting discharge at the Kulfo gauging station. Therefore, it is important to acknowledge the presence of uncertainty in the rating curve and exercise caution when using it to predict flows. The next section evaluates how this uncertainty is propagated to the DL streamflow simulation models.
DL models result
Different DL models (Bi-LSTM, Conv1D-LSTM, and GRU) were constructed in the Python programming language for streamflow prediction during the historical period on a monthly scale. We experimented with different sets of values to assess the sensitivity of the results to the hyperparameters. The learning rate was the most sensitive hyperparameter, as it directly impacts training convergence and performance, followed by the number of neurons in each layer. Fine-tuning the learning rate is therefore crucial for achieving optimal training results. The cost function used to determine the optimal output was the mean squared error (MSE), which penalizes errors during the training process so that the optimal output is the one with the lowest cost. During training, the data are divided into batches, where each batch contains a specific number of samples. This number of samples per batch is a hyperparameter that is typically determined through trial and error. In all models, it was set to 32 in the most effective configuration. During each iteration step, the cost function is calculated as the average MSE of 32 samples, comparing the observed and simulated streamflow data. One complete pass through all batches of the training data is referred to as an epoch, so the network simulates the streamflow time series once in each epoch. Similar to other networks, in deep neural networks, the number of neurons or network layers can be chosen arbitrarily to optimize model performance (Wegayehu & Muluneh 2022). Each network has two hidden layers with eight units each. The output of the last hidden layer is connected to a dense layer with a single output neuron. A dropout of 10% is applied between the layers.
The sigmoid activation function was applied in the hidden layers, which introduces nonlinearity to the model, enabling it to learn complex patterns in the time series data. One key benefit of the sigmoid activation function is that it is smooth and differentiable for all inputs and bounds the output between 0 and 1, which supports stable gradient-based learning (Ergete et al. 2022). The weights and biases associated with the connections between the neurons in the neural networks are learned during the training process. The trial-and-error method was adopted to tune the hyperparameters of the networks, and each model was run with different numbers of epochs. After numerous iterations, the optimal hyperparameter settings for the networks are presented in Table 2.
Hyperparameter | Values |
---|---|
Neuron | 8 |
Batch size | 32 |
Learning rate | 0.001 |
Max epoch | 100 |
Activation function | Sigmoid and Tanh |
Optimization | Adam |
Scenario | DL model | Training RMSE | Training MAE | Training NSE | Training R2 | Testing RMSE | Testing MAE | Testing NSE | Testing R2 |
---|---|---|---|---|---|---|---|---|---|
FC | Bi-LSTM | 1.28 | 1.11 | 0.95 | 0.97 | 1.88 | 1.13 | 0.95 | 0.96 |
FC | Conv1D-LSTM | 2.94 | 2.77 | 0.81 | 0.86 | 2.73 | 2.55 | 0.86 | 0.87 |
FC | GRU | 3.97 | 3.72 | 0.73 | 0.77 | 3.67 | 3.48 | 0.78 | 0.84 |
LB | Bi-LSTM | 1.26 | 1.07 | 0.95 | 0.97 | 1.65 | 1.08 | 0.94 | 0.95 |
LB | Conv1D-LSTM | 1.96 | 1.80 | 0.88 | 0.90 | 1.87 | 1.71 | 0.89 | 0.91 |
LB | GRU | 2.57 | 2.42 | 0.84 | 0.85 | 2.29 | 2.18 | 0.85 | 0.87 |
UB | Bi-LSTM | 1.06 | 0.87 | 0.96 | 0.98 | 1.47 | 0.82 | 0.95 | 0.97 |
UB | Conv1D-LSTM | 2.28 | 2.07 | 0.85 | 0.90 | 2.03 | 1.95 | 0.93 | 0.94 |
UB | GRU | 2.97 | 2.82 | 0.85 | 0.87 | 2.67 | 2.54 | 0.79 | 0.83 |
FC, fitted curve; LB, lower band; UB, upper band.
This section compares observed and simulated streamflow for the training and testing periods. As shown in Table 3, the Bi-LSTM model performs better than the other models. Models with a high R2 are generally regarded as more reliable. However, because the model does not account for all factors that could affect discharge, a high R2 does not always imply an accurate prediction (Wegayehu & Muluneh 2022).
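The four evaluation metrics can be sketched as follows. The paper's exact formulas are not restated in this section, so the definitions below are assumptions: R2 is taken as the squared Pearson correlation and NSE as the Nash–Sutcliffe efficiency, both in their common hydrological forms. The sample values are hypothetical and chosen so that the simulation is perfectly correlated with the observations but biased by a factor of two.

```python
import math


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, below 0 is worse than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def r2(obs, sim):
    """Squared Pearson correlation between observed and simulated values (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical flows: the simulation doubles every observed value.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [2.0, 4.0, 6.0, 8.0, 10.0]

scores = {
    "RMSE": rmse(obs, sim),
    "MAE": mae(obs, sim),
    "NSE": nse(obs, sim),
    "R2": r2(obs, sim),
}
```

In this constructed case R2 equals 1 while NSE is negative (−4.5), which is one simple way a high R2 can coexist with an inaccurate simulation. This is why R2 is read alongside the error-based metrics.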
In the monthly streamflow simulation for the Kulfo watershed, the Bi-LSTM, GRU, and Conv1D-LSTM models were compared across three scenarios (fitted curve, lower band, and upper band). All DL models performed well across the scenarios (Figure 8). The simulated flow closely captures the baseflow, rising limb, and recession limb of the observed hydrograph, and captures the peak flow reasonably well. Furthermore, the Bi-LSTM model consistently outperformed the GRU and Conv1D-LSTM models across all scenarios on all performance metrics, with the Conv1D-LSTM model showing the second-best performance (Table 3). The bidirectional architecture and LSTM capabilities of the Bi-LSTM model allowed it to capture the complex temporal dependencies and patterns present in the streamflow data (Wegayehu & Muluneh 2022). This finding aligns with prior research indicating the superior capability of Bi-LSTM in capturing temporal patterns within diverse datasets, including streamflow (Kumar Singh & Marcy 2017). Ergete et al. (2022) also demonstrated the superiority of Bi-LSTM over other DL model variants for peak flow prediction in the Kessem watershed, and Siami-Namini et al. (2019) reported a similar result for time series prediction. Therefore, the Bi-LSTM model was used to assess the propagation of rating curve uncertainty to the streamflow simulation of the Kulfo River (Table 4). For this assessment, the DL models were trained and tested independently on the streamflow time series generated for each rating curve uncertainty scenario.
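The way bootstrap resampling can turn a set of gaugings into a fitted curve with lower and upper bands can be sketched as below. This is an illustrative sketch under stated assumptions, not the study's procedure: it assumes a power-law rating curve Q = a·h^b with a cease-to-flow stage of zero, a least-squares fit in log space, a 95% percentile band, and entirely hypothetical gauging data. The function names `fit_power_law` and `bootstrap_band` are introduced here for illustration.

```python
import math
import random


def fit_power_law(stages, flows):
    """Least-squares fit of log(Q) = log(a) + b*log(h); returns (a, b)."""
    xs = [math.log(h) for h in stages]
    ys = [math.log(q) for q in flows]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def bootstrap_band(stages, flows, stage, n_boot=500, seed=0):
    """Return (lower, fitted, upper) discharge at `stage` from bootstrap refits."""
    rng = random.Random(seed)
    n = len(stages)
    a, b = fit_power_law(stages, flows)
    fitted = a * stage ** b
    estimates = []
    for _ in range(n_boot):
        # Resample the gaugings with replacement and refit the curve.
        idx = [rng.randrange(n) for _ in range(n)]
        hs = [stages[i] for i in idx]
        if len(set(hs)) < 2:
            continue  # skip degenerate resamples that cannot define a slope
        a_i, b_i = fit_power_law(hs, [flows[i] for i in idx])
        estimates.append(a_i * stage ** b_i)
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, fitted, upper


# Hypothetical gaugings: stage (m) and discharge scattered around Q = 5 * h**1.8.
stages = [0.4, 0.6, 0.8, 1.0, 1.3, 1.6, 2.0, 2.5]
factors = [1.10, 0.90, 1.05, 0.95, 1.08, 0.93, 1.04, 0.97]
flows = [5.0 * h ** 1.8 * f for h, f in zip(stages, factors)]

lower, fitted, upper = bootstrap_band(stages, flows, stage=2.5)
```

Applying the three curves (lower, fitted, upper) to the recorded stage series would yield the three reference streamflow series on which the models are trained. Evaluating `bootstrap_band` at several stages allows the band width at low and high flows to be compared.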
The simulated streamflow differs significantly between the reference series obtained from the fitted rating curve and the series obtained from its upper and lower bands. The discharge values span a wide range, which can be attributed to the inherent uncertainty in rating curve estimation and to the different DL architectures used in the study. The results indicate that the rating curve uncertainty propagates significantly to streamflow simulations, particularly in high-flow regions (Table 4). Consequently, the uncertainties related to rating curves in the Kulfo River led to a streamflow uncertainty of about 17.8 m3 s−1, representing 22% at peak discharge. Zeroual et al. (2016) used a quantitative approach to assess how accounting for rating curve uncertainty affects the quality of monthly discharge volume predictions by an ANN rainfall-discharge model. Their results revealed that extrapolation using the deterministic rating curve overestimates the peak discharge and that the rating curve uncertainties in that river lead to a streamflow uncertainty of about 30%.
Table 4. Bi-LSTM simulated streamflow for the three rating curve scenarios

Model | Temporal scale | Scenario-1 (FC) | Scenario-2 (LB) | Scenario-3 (UB) |
---|---|---|---|---|
Bi-LSTM | Monthly (m3 s−1) | 51.24 | 46.93 | 57.25 |
Bi-LSTM | Monthly (MCM) | 132.81 | 121.64 | 148.39 |
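The two rows of Table 4 are related by a unit conversion, which the short sketch below reproduces. The 30-day month is an assumption inferred here because it matches the tabulated volumes; the function name `monthly_volume_mcm` is hypothetical. The band width computed at the end refers to the Table 4 values only, not to the peak-discharge figure quoted in the text.

```python
SECONDS_PER_DAY = 86_400


def monthly_volume_mcm(mean_flow_m3s, days=30):
    """Convert a mean monthly flow (m3/s) into a monthly volume in million cubic metres."""
    return mean_flow_m3s * SECONDS_PER_DAY * days / 1e6


# Bi-LSTM simulated flows from Table 4 (m3/s) for each scenario.
flows = {"FC": 51.24, "LB": 46.93, "UB": 57.25}

volumes = {name: round(monthly_volume_mcm(q), 2) for name, q in flows.items()}

# Width of the uncertainty band, absolute and relative to the fitted-curve flow.
band_width = flows["UB"] - flows["LB"]
relative_width = band_width / flows["FC"]
```

Under this assumption the computed volumes match the MCM row of the table (132.81, 121.64, and 148.39), and the spread between the upper and lower bands is 10.32 m3 s−1, roughly 20% of the fitted-curve value.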
CONCLUSION
Effective water resource management depends on accurate streamflow modeling. The rating curve, commonly used to derive streamflow data, is prone to uncertainty. This study investigated the effect of rating curve uncertainty on streamflow simulation using standalone and hybrid DL models on the Kulfo River in Southern Ethiopia. The BSRT was applied to establish the rating curve and estimate the related uncertainty. Three sets of reference streamflow data, derived from the fitted rating curve and its lower and upper uncertainty bands, were used to train the DL models separately. The analysis revealed a wide range of discharge volumes, attributed to the inherent uncertainty in rating curve estimation and the different DL architectures employed. Integrating the BSRT with DL models made it possible to assess how rating curve uncertainty propagates to streamflow simulation. Notably, the rating curve uncertainty has a substantial impact on streamflow simulations, specifically during extreme events. Bi-LSTM excels at capturing flow data patterns by incorporating both past and future information. While the DL models performed well individually for each scenario, the differences in simulated flow between the scenarios indicate that rating curve uncertainty introduces variability in the simulation results. Thus, the simulated flow should not be used directly for decision-making in the watershed. Instead, streamflow should be treated as an uncertain variable, and rating curve uncertainty should be incorporated in decision-making. This study did not consider the economic implications of rating curve uncertainty in water resource planning and management. Future research should therefore examine the economic impact of rating curve uncertainty at different temporal scales.
Overall, the findings of this study highlight the importance of considering rating curve uncertainty in streamflow simulation, and the results can be useful for water resource managers and decision-makers in the watershed.
FUNDING STATEMENT
No funding was received from any source.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.