ABSTRACT
Reducing uncertainty in streamflow simulation is vital for effective water resource management. The impact of uncertainty in model calibration data (discharge), commonly derived from the rating curve, is often overlooked. This study applies the Monte Carlo simulation technique (MCST) to assess uncertainty in the rating curve. Two advanced machine learning (ML) models, bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent units (BiGRU), were used comparatively to evaluate the propagation of this uncertainty onto streamflow simulation on both daily and monthly temporal scales. Separate sets of streamflow data, derived from the fitted curve and its lower and upper uncertainty bands, were used to train the ML models independently. The results show a substantial impact of rating curve uncertainty on streamflow simulations, with the BiGRU model outperforming the BiLSTM model on both scales. Rating curve uncertainty results in streamflow uncertainty of up to 30% and 25% in daily and monthly simulations, respectively. These findings underscore the importance of considering rating curve uncertainty in streamflow simulation to ensure accurate and reliable results. Therefore, streamflow should be treated as an uncertain variable and managed by incorporating rating curve uncertainty in decision-making.
HIGHLIGHTS
MCST was integrated with ML models to evaluate the propagation of the rating curve uncertainty onto streamflow simulation.
Two advanced ML models are applied comparatively for streamflow simulation under rating curve uncertainty conditions.
Rating curve uncertainty substantially propagates onto streamflow simulation, specifically in peak flow regions.
INTRODUCTION
Rating curves are commonly used to derive streamflow time series because continuous discharge measurement is difficult, whereas methods for continuously monitoring stage are readily available (Haile et al. 2023). However, developing a rating curve involves inherent uncertainties that can have significant economic implications (McMillan et al. 2010). The main purpose of a hydrometric gauging station is to monitor discharge in real time. This is accomplished by continuously measuring the water stage and converting it to discharge using a developed rating curve (McMillan et al. 2010). The accuracy and reliability of a rating curve depend on the hydraulic conditions at the site, the understanding of the physical processes connecting stage and discharge, and the availability and uncertainty of individual ratings (Le Coz et al. 2014). According to Dymond & Christian (1982), if discharge measurements encompass the full range of stages observed during a period of stable stage–discharge relationship, defining the discharge rating curve for that period is relatively straightforward. Conversely, if the discharge measurements do not cover the upper range of stages, the rating curve must be extrapolated from its lower portion to the highest stage encountered, which introduces additional uncertainty (Herschy 2002). Most authors recognize that the uncertainty associated with stage measurements is typically smaller than that associated with velocity measurements (Kumlachew et al. 2023). This is primarily because stage can be measured with relatively high accuracy using precise instruments, whereas velocity measurements are more challenging and prone to larger uncertainties due to flow turbulence, variations in velocity across the channel, and limitations of velocity measurement techniques (Domeneghetti et al. 2012).
Guerrero et al. (2012) examined the impact of rating curve uncertainty on flood frequency estimation and found that ignoring it can lead to biased flood frequency estimates. Aronica et al. (2006) assessed the uncertainty in river flow predictions caused by errors in the rating curve and found that the uncertainty increased with the magnitude of the flow. Existing studies have highlighted the significant level of uncertainty in discharge measurements, with reported uncertainties as high as 20% of the observed value (Pelletier 1989). These uncertainty ranges are influenced by operational factors, such as the number of sampling points, the duration of each velocity measurement, and the characteristics of the gauging site (Le Coz et al. 2012). This uncertainty can propagate onto streamflow simulation, leading to potential inaccuracies and limitations in water resource management.
Recognizing the need to account for this uncertainty, previous studies have proposed various approaches to estimate rating curve uncertainty, varying in complexity and suitability for different situations (Le Coz et al. 2014). The Monte Carlo simulation technique (MCST), Bayesian inference (BI), and generalized likelihood uncertainty estimation (GLUE) are commonly applied methods (Khan et al. 2023). MCST is a versatile, model-agnostic approach based on random sampling from input parameter distributions. Its flexibility allows it to be applied to a broad spectrum of problems and models without requiring specific assumptions about the underlying probability distributions (Khan et al. 2023). Furthermore, it allows uncertainty across multiple model parameters and inputs, both aleatory and epistemic, to be incorporated comprehensively (Le Coz et al. 2014). In contrast, BI provides a formal probabilistic framework for quantifying uncertainty by updating prior knowledge about model parameters using observed data. GLUE evaluates the likelihood of model outputs given observed data and uses this likelihood to quantify uncertainty (Domeneghetti et al. 2012). Each method has its strengths and range of applicability. In this study, the MCST is employed to assess uncertainty in the rating curve.
Streamflow simulation provides important information for managing water resource systems (Pagano et al. 2014; Hirpa et al. 2016). However, owing to the complexity and nonlinearity of the rainfall–runoff relationship, dependable streamflow simulation is highly difficult (Werner & Yeager 2013). Hence, researchers seek models that simulate streamflow accurately and easily. Currently, there are two major approaches: model-driven and data-driven (Mena et al. 2024). Model-driven approaches consist of mathematical models that simulate the hydrodynamic processes of water flow and are widely used because they are founded on the principles of hydraulics and hydrology (Kumar Singh & Marcy 2017). Nevertheless, these models often require extensive input data and prolonged calibration periods, and they have difficulty representing the intricate and nonlinear nature of hydrological processes.
Given these limitations of physically based models, data-driven models such as machine learning (ML) offer an appealing alternative (Mosavi et al. 2018). The artificial neural network (ANN), a class of ML model, is extensively employed in hydrology due to its reliability and ability to capture rainfall–runoff relationships. Tareke & Awoke (2023) used ANN for long-term forecasting of both streamflow and hydrological drought over Ethiopia, and their results indicate that ANN is a good tool for forecasting streamflow. Further, Niu et al. (2019) applied four methods to derive the operation rule of hydropower reservoirs: multiple linear regression (MLR), ANN, extreme learning machine (ELM), and support vector machine (SVM). Their simulations show that the three artificial intelligence algorithms (ANN, SVM, and ELM) performed better than conventional MLR and the scheduling graph method. They concluded that applying artificial intelligence algorithms to derive the operation rule of hydropower reservoirs might be challenging but represents valuable research for the future.
Recently, deep learning (DL) networks with multilayered architectures have been developed to address complicated problems. Time-series data, such as streamflow, can be effectively modeled using recurrent neural network (RNN) variants of DL algorithms (Mena 2024). RNNs are often chosen for such tasks because they handle sequential data, retain a memory of previous inputs, and can capture long-term dependencies; several architectural variants have been designed specifically to address the vanishing gradient problem and improve the learning of long-term dependencies. Gao et al. (2020) employed two popular RNN variants, long short-term memory (LSTM) and gated recurrent unit (GRU) networks, along with an ANN model to simulate runoff in the Yutan station control catchment, Fujian Province, Southeast China. Their results show that the LSTM and GRU models perform better than the ANN model and that the GRU model performs as well as the LSTM model. Moreover, Liu et al. (2023) evaluated the performance of eight DL networks for computing river discharge time series from water surface elevation time series observed at the Zhutuo gauging station on the Yangtze River. Their study shows that the BiGRU model outperformed the other DL models and that, with this data-driven approach, river discharge can be computed accurately, objectively, and quickly directly from water surface elevation, which is of practical value for flood protection and water resources management (Mathewos et al. 2024).
There are several irrigation projects and settlements downstream of the Woybo watershed, and the watershed has experienced severe flooding for years (Yisehak et al. 2020). Despite the importance of rating curve uncertainty assessment, there is a lack of such research in the watershed. This research gap highlights the need to assess rating curve uncertainty and its propagation onto streamflow simulation. Therefore, this study aims to assess the effect of rating curve uncertainty on the accuracy and reliability of streamflow simulation at the outlet of the Woybo watershed. It offers a novel viewpoint by integrating MCST with different ML models, with the aim of enhancing water resource management practices and supporting informed decision-making in the study area.
MATERIALS AND METHODS
Description of the study area
Data collection
The data used in this study were collected from different sources. The spatial data, a digital elevation model (DEM) of the Woybo watershed at 30 × 30 m resolution, were downloaded from the United States Geological Survey (USGS) database. Meteorological data (daily precipitation and maximum and minimum temperature) from 1997 to 2018 for four stations were collected from the Ethiopian Meteorology Institute (EMI). The long-term daily streamflow time series and river stage measurements (from 1987 to 2017) for the Woybo gauging station were obtained from the Ministry of Water and Energy (MoWE). Measured pairs of discharge and stage were also obtained from MoWE.
Methodology
The quality and completeness of data heavily influence data analysis outcomes (Ukumo et al. 2022b). In this study, the multiple imputation method and the inverse distance method were used comparatively to impute missing meteorological and hydrological data. The homogeneity of the selected stations was assessed using the relative method through a nondimensionalizing equation. The consistency of rainfall records was checked by double mass curve analysis. The Hargreaves & Samani method was applied to calculate potential evapotranspiration (PET), as it offers a practical and reliable approach to estimating PET (Edamo et al. 2022). Rain gauges provide only point samples of the spatial distribution of rainfall during a storm, so hydrological analyses covering large areas require estimates of average rainfall depth over subwatershed areas (Chow et al. 1988). In this study, the spatial distribution of rainfall across the watershed was computed using the Thiessen polygon method with the assistance of ArcGIS 10.3 tools (Ukumo et al. 2023).
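The two preprocessing calculations named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, station values, and polygon areas are hypothetical, and extraterrestrial radiation is assumed to be supplied already converted to equivalent evaporation (mm/day).

```python
import math

def hargreaves_samani_pet(t_max, t_min, ra_mm_per_day):
    """Daily PET (mm/day) from the Hargreaves-Samani equation.

    ra_mm_per_day is extraterrestrial radiation expressed as equivalent
    evaporation; it is treated as a given input in this sketch.
    """
    t_mean = 0.5 * (t_max + t_min)
    temp_range = max(t_max - t_min, 0.0)  # guard against inconsistent records
    return 0.0023 * ra_mm_per_day * (t_mean + 17.8) * math.sqrt(temp_range)

def thiessen_areal_rainfall(station_rain_mm, polygon_areas_km2):
    """Area-weighted mean rainfall, one Thiessen polygon per station."""
    total_area = sum(polygon_areas_km2)
    weighted = sum(p * a for p, a in zip(station_rain_mm, polygon_areas_km2))
    return weighted / total_area

# Hypothetical values for four stations (not taken from the study)
pet = hargreaves_samani_pet(t_max=28.0, t_min=14.0, ra_mm_per_day=15.0)
areal_rain = thiessen_areal_rainfall([4.0, 6.5, 3.2, 5.1],
                                     [120.0, 95.0, 150.0, 85.0])
print(round(pet, 2), round(areal_rain, 2))
```

The Thiessen weights depend only on station geometry, so they are computed once and reused for every time step.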
Input data set selection and model architectures
Numerous variables influence streamflow and hydrological modeling. Where rainfall is the driving force behind runoff generation (climate data such as temperature being another possibility), rainfall is the most logical input to an ML model for streamflow simulation (Wegayehu & Muluneh 2022). In this study, the ML model inputs are PET, rainfall with its lagged value, and lagged discharge. Since the number of features is small, rigorous feature selection criteria are not necessary. Linear correlation statistics, such as Pearson's correlation coefficient, serve as a simple yet effective method to understand the dependence between variables (Zeroual et al. 2016). The autocorrelation function of discharge reveals a significant correlation up to two lags. Correlation analysis between rainfall and discharge showed a good correlation at a lag of one time step, whereas PET has a weak correlation with discharge. The ML models were therefore designed with the inputs Q(t-1), RF(t-1), RF(t), and PET (Figure 3).
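As an illustration of how the input set Q(t-1), RF(t-1), RF(t), and PET can be assembled, the sketch below builds the lagged feature matrix with NumPy. The function names and the short synthetic series are assumptions made for the example; the study's actual preprocessing code is not given in the text.

```python
import numpy as np

def build_lagged_inputs(q, rf, pet):
    """Return (X, y): X columns are [Q(t-1), RF(t-1), RF(t), PET(t)], y is Q(t)."""
    q, rf, pet = (np.asarray(a, dtype=float) for a in (q, rf, pet))
    X = np.column_stack([q[:-1], rf[:-1], rf[1:], pet[1:]])
    y = q[1:]  # target is aligned with the current time step
    return X, y

def lag_correlation(x, y, lag):
    """Pearson correlation between x(t-lag) and y(t), for lag >= 1."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

# Short synthetic series, for illustration only
q = np.array([2.0, 3.5, 9.0, 6.0, 4.0, 3.0])
rf = np.array([0.0, 5.0, 12.0, 1.0, 0.0, 0.0])
pet = np.array([4.1, 3.9, 3.5, 4.0, 4.2, 4.3])

X, y = build_lagged_inputs(q, rf, pet)
print(X.shape, lag_correlation(q, q, 1))
```

Calling `lag_correlation(q, q, k)` for several `k` gives the autocorrelation values used to decide how many discharge lags to keep.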
The statistical properties of the dataset are summarized in Table 1, and the relationships between variables are visualized in the correlation matrix (Figure 3). These properties describe the distribution, variability, and central tendency of the data. Skewness measures the asymmetry of a distribution; a positive value indicates a right-skewed distribution with a longer or fatter right tail. The coefficient of variation is a measure of relative variability, and the standard deviation describes the spread of data points around the mean for each variable (Wegayehu & Muluneh 2021): a lower standard deviation indicates that data points lie closer to the mean, while a higher one indicates greater dispersion. Streamflow shows the highest variability among the three variables, indicating a wider range of values.
Data type | Pearson correlation with streamflow | Skewness | Mean | Min | Max | SD^a | CV^b |
---|---|---|---|---|---|---|---|
Streamflow (m³/s) | 1.00 | 2.81 | 9.04 | 0.00 | 120.08 | 13.98 | 1.54 |
RF (mm/day) | 0.38 | 2.63 | 4.19 | 0.00 | 43.48 | 4.98 | 1.19 |
PET (°C) | −0.14 | −0.34 | 4.03 | 1.84 | 11.14 | 0.76 | 0.18 |
^a SD stands for standard deviation; ^b CV stands for coefficient of variation.
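The summary statistics reported in the table can be reproduced for any series with a short helper. This is a sketch under stated assumptions: the sample standard deviation (ddof = 1) and the moment-based skewness are choices made here, since the text does not say which estimators were used, and the input values are invented.

```python
import numpy as np

def describe(series):
    """Mean, range, SD, CV, and moment-based skewness of a 1-D series."""
    x = np.asarray(series, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)  # sample standard deviation (assumed estimator)
    skewness = np.mean((x - mean) ** 3) / x.std(ddof=0) ** 3
    return {
        "mean": mean,
        "min": x.min(),
        "max": x.max(),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation = SD / mean
        "skewness": skewness,
    }

# Invented right-skewed sample, for illustration only
stats = describe([1.0, 2.0, 3.0, 4.0, 10.0])
print({k: round(float(v), 3) for k, v in stats.items()})
```

A single large value in the sample is enough to give a positive skewness, which is the pattern the table shows for streamflow and rainfall.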
Several tools were used in this study. ArcGIS 10.3 software was used to delineate the watershed and subbasins. MCST was applied to establish the rating curve and evaluate the associated uncertainty. Two ML models, bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent units (BiGRU), were employed comparatively to evaluate the propagation of this uncertainty into streamflow simulation at both daily and monthly scales. Python, together with Jupyter Notebook, was used for data analysis and ML model development.
Rating curve establishment and uncertainty estimation using MCST
Machine learning models
Deep ML models build on artificial intelligence (AI) by using multi-layered neural networks to improve overall performance (Mosavi et al. 2018). This approach has attracted significant attention as a tool for rainfall–runoff simulation. In particular, RNN models are well suited to time series prediction tasks (Ayele et al. 2024).
Bidirectional long short-term memory (BiLSTM)
Bidirectional Gated Recurrent Units (BiGRU)
BiGRU is a type of RNN architecture that incorporates bidirectional processing. The key difference between BiGRU and GRU lies in the direction of information flow: GRU processes the input sequence in one direction only, while BiGRU processes it in both directions, enabling it to capture more comprehensive context information (Staudemeyer & Morris 2019).
BiGRU and BiLSTM are both bidirectional recurrent neural network architectures, but they use different types of recurrent units (GRU and LSTM, respectively). Both architectures are effective for capturing complex dependencies in sequential data, but the choice between them depends on the specific requirements of the task at hand (Hunt et al. 2022). In this study, the two latest algorithms, BiLSTM and BiGRU, are applied comparatively (Figure 5).
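To make the bidirectional idea concrete, the following NumPy sketch runs one GRU over the sequence forwards and a second GRU over it backwards, then concatenates the two sets of hidden states. It is a forward pass only, with randomly initialised weights and hypothetical helper names; it is not the trained network of this study, and deep learning libraries differ in which gate convention they use for the final state update.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(n_in, n_hidden, rng):
    """Random weights for one GRU direction (illustrative initialisation)."""
    params = {}
    for gate in "zrh":  # update gate, reset gate, candidate state
        params["W" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        params["U" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        params["b" + gate] = np.zeros(n_hidden)
    return params

def gru_forward(x_seq, p):
    """Run a GRU over x_seq of shape (T, n_in); return states (T, n_hidden)."""
    h = np.zeros(p["bz"].shape[0])
    states = []
    for x in x_seq:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])  # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])  # reset gate
        h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_cand  # blend old state and candidate
        states.append(h)
    return np.array(states)

def bigru_forward(x_seq, p_fwd, p_bwd):
    """Concatenate forward and time-reversed GRU states at each time step."""
    h_fwd = gru_forward(x_seq, p_fwd)
    h_bwd = gru_forward(x_seq[::-1], p_bwd)[::-1]  # re-align to original order
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 4))  # 5 time steps, 4 input features
out = bigru_forward(x_seq, init_gru(4, 12, rng), init_gru(4, 12, rng))
print(out.shape)
```

With 12 hidden units per direction the output has 24 features per time step; a BiLSTM follows the same pattern with LSTM cells in place of the GRU cells.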
Machine learning model development in the Python environment
The process of developing ML models in a Python environment involves a series of steps, including importing libraries, preprocessing data, designing model architecture, compiling the model, training it, utilizing it for predictions, assessing its performance, refining it through iterations, and eventually deploying it for real-world applications (Ayele et al. 2024). To achieve optimal performance in ML models, decisions must be made regarding a combination of parameters and hyperparameters (Wegayehu & Muluneh 2021). The subsequent discussion focuses on the key hyperparameters that are optimized (Niu et al. 2019).
1. Number of hidden units: The number of neurons in the hidden layer is crucial in capturing complex patterns within the data. Increasing the number of hidden units enhances the model's ability to discern intricate details, albeit at the cost of heightened computational complexity.
2. Activation function: Activation functions are crucial in neural networks because they introduce non-linearity and control the output range of each unit (Mosavi et al. 2018). The choice of activation function depends on the specific problem and the desired behavior of the network. Of the several activation functions commonly used in ML models, the sigmoid and tanh functions are used in this study.
3. Learning rate: It governs the magnitude of the step taken by the model to update its parameters during training. A higher learning rate accelerates convergence but increases the likelihood of the model overshooting the optimal solution. Conversely, a lower learning rate slows down convergence but enhances the model's ability to accurately fine-tune its parameters.
4. Batch size: The batch size determines the number of training samples that are processed together in each iteration during training. Increasing the batch size can expedite training by processing more samples at once, yet it necessitates more memory. Smaller batch sizes may allow the model to generalize better as it updates its parameters more frequently.
5. Optimization algorithm: The optimization algorithm controls parameter updates during training. Common optimization algorithms used in ML models include stochastic gradient descent (SGD) and Adam. The choice of an optimization algorithm can affect the convergence speed and final performance of the model.
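The roles of learning rate, batch size, and optimizer listed above can be seen in a small mini-batch training loop. The sketch below applies the Adam update rule to a linear model with an MSE loss; the linear model, the synthetic data, and the function name are stand-ins chosen to keep the example short, and the default learning rate of 0.001 and batch size of 128 simply mirror the values reported for the daily models.

```python
import numpy as np

def train_linear_adam(X, y, lr=0.001, batch_size=128, epochs=200, seed=0):
    """Fit y ~ X @ w + b by mini-batch Adam on the MSE loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    m, v = np.zeros(d + 1), np.zeros(d + 1)  # first and second moment estimates
    beta1, beta2, eps, t = 0.9, 0.999, 1e-8, 0
    history = []
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]
            # Gradient of the batch MSE with respect to (w, b)
            grad = np.append(2.0 * X[idx].T @ err / len(idx), 2.0 * err.mean())
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
            v_hat = v / (1 - beta2 ** t)
            step = lr * m_hat / (np.sqrt(v_hat) + eps)
            w, b = w - step[:-1], b - step[-1]
        history.append(float(np.mean((X @ w + b - y) ** 2)))
    return w, b, history

# Synthetic regression problem, for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(1024, 2))
y = X @ np.array([0.5, -0.3]) + 0.2 + rng.normal(scale=0.05, size=1024)
w, b, history = train_linear_adam(X, y)
print(history[0], history[-1])
```

Raising `lr` speeds up the early epochs but makes the loss oscillate near the minimum, while a smaller `batch_size` gives more parameter updates per epoch at the cost of noisier gradients.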
Model training and testing
RNN model training involves optimizing the model's parameters using labeled data, while testing evaluates its performance on unseen data. The purpose is to improve the model's accuracy and assess its suitability for real-world applications (Ayele et al. 2024). In this study, the ML models were trained on daily and monthly streamflow data from 1997 to 2011 and then tested on data from 2012 to 2018.
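A chronological split of this kind can be expressed directly on a daily date index. The sketch below is only an illustration of the split boundaries stated above; the variable names are hypothetical and no study data are involved.

```python
import numpy as np

# Daily date index covering 1997-01-01 to 2018-12-31
dates = np.arange("1997-01-01", "2019-01-01", dtype="datetime64[D]")

# Chronological split: no shuffling, so the test period stays unseen
train_mask = dates < np.datetime64("2012-01-01")
test_mask = ~train_mask

n_train, n_test = int(train_mask.sum()), int(test_mask.sum())
print(n_train, n_test, round(n_train / dates.size, 3))
```

Keeping the split chronological matters for lagged inputs, because a random split would place Q(t-1) from the test period inside the training set.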
Evaluation of model performance
Taylor Diagram
The Taylor Diagram is a graphical tool used to evaluate the performance of models by summarizing key statistics in a single plot. It simultaneously represents three statistical metrics: Correlation Coefficient, Standard Deviation, and Centered Root Mean Square Error. The diagram provides a visual way to compare multiple models or scenarios against observed data. In the context of streamflow simulation, the Taylor Diagram is highly applicable because:
1. Comprehensive Evaluation: It allows for a simultaneous assessment of multiple performance metrics (correlation, variability, and error), offering a holistic evaluation of model accuracy.
2. Model Comparison: The diagram facilitates the comparison of different models, configurations, or datasets.
3. Communication of Results: Its compact, visual format makes it easier to communicate complex model evaluation results to a broader audience, including researchers and practitioners.
In this research, the Taylor Diagram is used to assess the performance of streamflow simulations in the Woybo River by comparing modeled and observed values at daily and monthly scales.
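The three statistics summarized by a Taylor Diagram are linked by a law-of-cosines identity, which is why they fit on a single plot. The sketch below computes them with NumPy and checks the identity; the function name and the short observed and simulated series are invented for illustration.

```python
import numpy as np

def taylor_statistics(obs, sim):
    """Correlation, standard deviations, and centered RMSE of sim against obs."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    sd_obs, sd_sim = obs.std(), sim.std()  # population SDs keep the identity exact
    centered_diff = (sim - sim.mean()) - (obs - obs.mean())
    crmse = np.sqrt(np.mean(centered_diff ** 2))
    return {"r": r, "sd_obs": sd_obs, "sd_sim": sd_sim, "crmse": crmse}

# Invented series, for illustration only
obs = [2.0, 5.0, 9.0, 4.0, 3.0]
sim = [2.5, 4.0, 8.0, 5.0, 2.0]
ts = taylor_statistics(obs, sim)

# Identity behind the diagram: E'^2 = sd_obs^2 + sd_sim^2 - 2 sd_obs sd_sim r
rhs = ts["sd_obs"] ** 2 + ts["sd_sim"] ** 2 - 2 * ts["sd_obs"] * ts["sd_sim"] * ts["r"]
print(ts, abs(ts["crmse"] ** 2 - rhs) < 1e-9)
```

Because the centered RMSE removes the mean of each series, a model with a constant bias can still sit close to the reference point, so bias should be checked separately.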
RESULTS
MCST model result
From the MCST analysis, the optimized parameters of the power-law equation were extracted. They represent the best-fit relationship and its upper and lower bands, which are then used to derive the streamflow time series. The goodness of fit was evaluated by visually inspecting the fitted curve against the data points and by metrics such as R² and RMSE. The RMSE of 10.95 m³ s⁻¹ indicates an average prediction error of about 10.95 m³ s⁻¹, and the R² of 0.84 implies that the fitted curve explains approximately 84% of the variation in the data, leaving about 16% unexplained. This uncertainty, particularly in the peak flow region, reduces the reliability of the rating curve for predicting discharge at the Woybo gauging station and calls for caution when the curve is used for flow predictions. The propagation of this uncertainty into ML streamflow simulation models is assessed in the next section.
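One way to obtain a fitted power-law curve with lower and upper bands by Monte Carlo simulation is sketched below. The sampling scheme is an assumption, since the text does not give it: gauged discharges are perturbed with a 10% multiplicative error, the curve Q = a(h − h0)^b is refitted in log space for each draw, and the 2.5th and 97.5th percentiles form the bands. The function names, the synthetic gaugings, and the cease-to-flow stage h0 = 0 are all hypothetical.

```python
import numpy as np

def fit_power_law(stage, discharge, h0=0.0):
    """Least-squares fit of Q = a * (h - h0)**b in log space; returns (a, b)."""
    b, log_a = np.polyfit(np.log(stage - h0), np.log(discharge), 1)
    return np.exp(log_a), b

def monte_carlo_bands(stage, discharge, stage_grid, rel_error=0.10,
                      n_draws=2000, h0=0.0, seed=0):
    """Fitted curve plus 95% bands from refitting perturbed gaugings."""
    rng = np.random.default_rng(seed)
    curves = np.empty((n_draws, stage_grid.size))
    for i in range(n_draws):
        # Assumed error model: lognormal multiplicative noise on discharge
        noisy_q = discharge * np.exp(rng.normal(0.0, rel_error, discharge.size))
        a, b = fit_power_law(stage, noisy_q, h0)
        curves[i] = a * (stage_grid - h0) ** b
    lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
    a, b = fit_power_law(stage, discharge, h0)
    return a * (stage_grid - h0) ** b, lower, upper

# Synthetic gaugings that stop at 2.5 m; the grid extrapolates to 3.5 m
rng = np.random.default_rng(1)
stage = np.linspace(0.3, 2.5, 15)
discharge = 12.0 * stage ** 1.8 * np.exp(rng.normal(0.0, 0.10, stage.size))
stage_grid = np.linspace(0.3, 3.5, 50)

fitted, lower, upper = monte_carlo_bands(stage, discharge, stage_grid)
print(fitted[-1], lower[-1], upper[-1])
```

Applying the fitted curve and each band to the recorded stage series would give three discharge series, one per scenario, of the kind used to train the ML models independently.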
ML models result
In this study, two ML models (BiLSTM and BiGRU) were constructed in Python for streamflow simulation. The mean squared error (MSE) was used as the cost function: it penalizes errors during training, and the optimal output is the one with the lowest cost. During training, the data are divided into batches, each containing a fixed number of samples. This batch size is a hyperparameter that is typically determined by trial and error. For all models, the best-performing batch size was 128 for daily simulations and 16 for monthly simulations. At each iteration, the cost is therefore calculated as the MSE between observed and simulated streamflow averaged over 128 samples (daily) or 16 samples (monthly).
In neural networks, one complete pass through the training data is referred to as an epoch; the network simulates the streamflow time series once in each epoch. As in other networks, the number of neurons and layers in recurrent networks is chosen by the modeler to optimize performance (Wegayehu & Muluneh 2022). Both RNN models were given the same structure so that they could be compared directly. Each network has two hidden layers, each with 12 units for daily data and 8 units for monthly data. The output of the last hidden layer is connected to a dense layer with a single output neuron, and a dropout of 10% is applied between the layers. In both networks, the sigmoid activation function is applied in the hidden layers; it introduces non-linearity to the model, enabling it to learn complex patterns in the time series data. The sigmoid function is smooth and differentiable everywhere and bounds its output between 0 and 1, which suits gradient-based training (Ayele et al. 2024). The weights and biases associated with the connections between neurons are learned during training, in which the models are optimized to minimize the MSE cost function.
Hyperparameter | Daily scale | Monthly scale |
---|---|---|
Neurons | 12 | 8 |
Optimizer | Adam | Adam |
Learning rate | 0.001 | 0.001 |
Activation function | Sigmoid and Tanh | Sigmoid and Tanh |
Max epochs | 1,000 | 100 |
Batch size | 128 | 16 |
Curve type (scenario) | ML model | Training RMSE | Training MAE | Training R² | Testing RMSE | Testing MAE | Testing R² |
---|---|---|---|---|---|---|---|
Fitted curve | BiGRU | 2.02 | 1.19 | 0.97 | 1.82 | 1.14 | 0.98 |
Fitted curve | BiLSTM | 6.96 | 6.67 | 0.78 | 5.56 | 4.97 | 0.85 |
Upper band | BiGRU | 2.12 | 1.29 | 0.95 | 2.08 | 1.26 | 0.96 |
Upper band | BiLSTM | 5.34 | 4.32 | 0.86 | 2.94 | 2.96 | 0.91 |
Lower band | BiGRU | 2.08 | 1.26 | 0.96 | 2.02 | 1.19 | 0.97 |
Lower band | BiLSTM | 6.06 | 5.77 | 0.81 | 4.84 | 4.12 | 0.88 |
Curve type (scenario) | ML model | Training RMSE | Training MAE | Training R² | Testing RMSE | Testing MAE | Testing R² |
---|---|---|---|---|---|---|---|
Fitted curve | BiGRU | 2.57 | 2.16 | 0.93 | 2.94 | 2.96 | 0.91 |
Fitted curve | BiLSTM | 8.89 | 7.13 | 0.72 | 10.05 | 8.23 | 0.65 |
Upper band | BiGRU | 4.74 | 4.08 | 0.89 | 5.34 | 4.32 | 0.86 |
Upper band | BiLSTM | 7.89 | 6.13 | 0.78 | 9.05 | 5.03 | 0.72 |
Lower band | BiGRU | 1.48 | 1.17 | 0.96 | 2.14 | 2.05 | 0.94 |
Lower band | BiLSTM | 6.56 | 5.97 | 0.80 | 9.81 | 6.35 | 0.71 |
The daily and monthly simulated and observed flow hydrographs are presented below, together with a performance comparison for the different scenarios: fitted curve (FC), lower band (LB), and upper band (UB).
DISCUSSION
The evaluation of rating curve uncertainty on streamflow simulation was conducted by independently training and testing ML models using a time series of streamflow data derived from various rating curve uncertainty analysis scenarios. The analysis revealed a diverse range of results that can be attributed to the inherent uncertainty in rating curve estimation, the temporal scale used, and the different ML models utilized in the study. The notable distinctions observed in daily and monthly simulations across various scenarios highlight the sensitivity of ML models to rating curve uncertainty in different time windows (Figures 8 and 9).
The results indicate that rating curve uncertainty propagates significantly onto streamflow simulations, particularly at high flows. The simulated flow captured the baseflow, recession limb, and rising limb of the observed hydrograph well, and the peak flow fairly well. The ML models performed better on the daily scale, which is attributed to the longer time series available (Figure 8). Daily data provide a higher temporal resolution and more frequent observations, allowing the model to capture finer-grained patterns and fluctuations (Hunt et al. 2022).
The training and test loss plots are used to evaluate the model during training: they show how the loss evolves, which helps to assess convergence, identify potential overfitting or underfitting, and judge how effectively the model captures the underlying patterns in the streamflow data (Mosavi et al. 2018). The scatter plot of observed and simulated streamflow, in turn, allows a visual comparison of the model's predictions with the actual observations. Across the scenarios and both temporal scales, BiGRU outperformed BiLSTM. This finding aligns with prior research, which has indicated the superior capability of BiGRU in capturing temporal patterns within diverse datasets (Wang et al. 2022; Ayana et al. 2023). One possible reason is that the GRU has a simpler structure and a lower computational cost than the LSTM, which can allow BiGRU to process data more efficiently, especially in tasks such as streamflow simulation where large amounts of data need to be processed. Thus, the BiGRU model was applied to assess the propagation of rating curve uncertainty onto the streamflow simulation of the Woybo River (Table 5).
Model | Temporal scale | Scenario 1 (FC) | Scenario 2 (UB) | Scenario 3 (LB) |
---|---|---|---|---|
BiGRU | Daily (m³ s⁻¹) | 116.5 | 151.5 | 76.4 |
BiGRU | Monthly (m³ s⁻¹) | 52.1 | 65.3 | 39.8 |
The results show a significant difference between the streamflow simulated from the fitted rating curve, taken as the reference, and the streamflow simulated from its upper and lower bands. This difference is particularly pronounced for high flows compared to medium and low flows. Consequently, rating curve uncertainty in the Woybo River resulted in about 30% and 25% uncertainty in streamflow at peak discharge for daily and monthly mean simulations, respectively. Future research could extend the findings of this study to other geographic regions. In our next investigation, we plan to use a range of ML models in an ensemble learning approach for simulating streamflow, incorporating data products derived from remote sensing, such as vegetation and precipitation indices.
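The quoted percentages can be checked against the tabulated BiGRU values with a few lines of arithmetic. The sketch assumes that the tabulated values are simulated peak discharges and that the uncertainty is expressed as the deviation of each band from the fitted-curve scenario; that reading is an interpretation, not something the text states explicitly, and the helper name is hypothetical.

```python
# BiGRU values per scenario (m3/s), as tabulated above
peak_flow = {
    "daily": {"FC": 116.5, "UB": 151.5, "LB": 76.4},
    "monthly": {"FC": 52.1, "UB": 65.3, "LB": 39.8},
}

def relative_deviation(band, reference):
    """Deviation of a band value from the reference, in percent."""
    return 100.0 * (band - reference) / reference

deviations = {
    scale: {
        band: relative_deviation(values[band], values["FC"])
        for band in ("UB", "LB")
    }
    for scale, values in peak_flow.items()
}

for scale, dev in deviations.items():
    print(scale, {band: round(pct, 1) for band, pct in dev.items()})
```

Under this reading, the upper band gives about +30% (daily) and +25% (monthly), matching the figures quoted in the text, while the lower band deviates by a somewhat different amount, so the band is not symmetric about the fitted curve.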
CONCLUSION
This study evaluated the propagation of rating curve uncertainty onto streamflow simulation using MCST coupled with ML models. MCST was used to establish the rating curve and estimate the related uncertainty, and two ML models, BiGRU and BiLSTM, were used to evaluate the impact of rating curve uncertainty on streamflow simulation at daily and monthly temporal scales. The results indicate that integrating MCST with ML models is an effective way to evaluate the propagation of rating curve uncertainty into streamflow simulation, and that this uncertainty propagates significantly into streamflow simulations, particularly during extreme events. While the ML models performed well individually for each scenario, the differences in simulated flow between the scenarios indicate that uncertainty in the rating curve introduces variability into the simulation results. The ML models performed better on the daily scale than on the monthly scale, and BiGRU captured flow patterns better than BiLSTM on both temporal scales according to all performance metrics used. Consequently, rating curve uncertainty in the Woybo River led to a streamflow uncertainty of about 30% and 25% at peak discharge for daily and monthly mean simulations, respectively. Overall, the findings of this study highlight the importance of considering rating curve uncertainty in streamflow simulation, particularly on different temporal scales. The results can be useful for water resource managers and decision-makers in the Woybo watershed.
FUNDING STATEMENT
No funding was received from any source.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.