ABSTRACT
Streamflow prediction offers crucial information for managing water resources, flood control, and hydropower generation. Yet, reliable streamflow prediction is challenging due to the complexity and nonlinearity of the rainfall-runoff relationship. This study investigated the comparative performance of a newly integrated self-attention-based deep learning (DL) model, SA-Conv1D-BiGRU, against Conv1D-LSTM and bidirectional long short-term memory (Bi-LSTM) models for streamflow prediction under different time-series conditions and a range of input combinations based on flood events. All datasets passed quality control procedures, and the time lags for generating input series were established through Pearson correlation analysis. 80% of the data was used for training, whereas 20% was used to evaluate model performance. Model performance was evaluated using the mean absolute error (MAE), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), and coefficient of determination (R²). The findings reveal the strong potential of DL models for streamflow prediction, with the SA-Conv1D-BiGRU model outperforming the other models across different time-series characteristics. Despite their added complexity, the Conv1D-LSTM models did not outperform the Bi-LSTM model. The results are condensed into themes of model variability and time-series characteristics: differences in DL architecture had a greater influence on streamflow prediction accuracy than input time lags and time-series features.
HIGHLIGHTS
Deep learning (DL) algorithms are compared for streamflow prediction across various time-series traits.
Bi-LSTM outperforms Conv1D-LSTM in capturing time-series characteristics.
Hybrid self-attention improves spatial and temporal feature identification in time series.
INTRODUCTION
Streamflow prediction offers crucial information for managing water resource systems, flood control, and hydropower generation. However, due to the high nonlinearity and spatiotemporal variability of hydrological processes, reliable streamflow prediction is challenging (Apaydin et al. 2020). Streamflow prediction techniques employ various statistical, mathematical, and computational approaches (Gupta & Nearing 1969). The choice of technique is influenced by the available data, system complexity, and the specific requirements of the application (Wegayehu & Muluneh 2022). Although traditional statistical models and lumped hydrological models are effective in managing the temporal fluctuations observed in precipitation and flow time series, they struggle to accurately depict the spatial variations inherent in these phenomena (Niu et al. 2019). Hence, researchers continue to seek models that simulate streamflow both accurately and simply. Given these limitations of physical models, data-driven models present an attractive alternative (Ji et al. 2012).
Data-driven models rely on the statistical relationship between input and output data and are broadly classified as linear or nonlinear. Autoregressive moving average (ARMA), multiple linear regression (MLR), and autoregressive integrated moving average (ARIMA) are the most common linear methods (Mosavi et al. 2018), while the most common nonlinear data-driven models are machine learning (ML) models. The major drawback of the linear models is that they cannot handle the system's nonlinearity (Apaydin et al. 2020). Advanced data-driven models such as ML are therefore increasingly used in place of the physically based and linear models discussed above. ML, a subfield of artificial intelligence (AI), is now among the most widely used approaches in hydrology. It uses computational power to extract insights from data by iteratively learning relationships from datasets (Salehinejad et al. 2017).
Artificial neural networks (ANNs), one of the popular data-driven models, have been widely applied in hydrological modeling for their strong nonlinear fitting ability (Ji et al. 2012). A significant limitation of traditional ANNs is that they are typically restricted to one or two hidden layers, which constrains their ability to model complex relationships. To address more complicated problems, deep learning (DL) networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been extended with multi-layered architectures, and time-series data such as streamflow can be modeled effectively with DL algorithms (Wegayehu & Muluneh 2022).
An RNN is a neural network specialized for processing sequences of data, adapting swiftly to temporal dynamics through prior time-step data (Apaydin et al. 2020). Nevertheless, RNNs struggle to capture prolonged dependencies and are prone to vanishing and exploding gradients. To overcome this deficiency, Hochreiter & Schmidhuber (1997) proposed the long short-term memory (LSTM) network for learning long-term dependence. LSTM models have since been explored in watershed hydrological modeling, with demonstrated capabilities in applications including river flow forecasting and flood prediction (Wegayehu & Muluneh 2022). Kratzert et al. (2018) utilized the LSTM network for daily flow prediction and found that it significantly outperformed hydrological models calibrated at both the regional and individual basin levels. Hu et al. (2018) evaluated the LSTM model on 98 flood events and concluded that it surpassed conceptual and physical models in terms of performance. Studies have shown that LSTM exhibits impressive performance in streamflow prediction compared with other sophisticated multi-layered techniques. More recently, gated recurrent units (GRUs) were introduced as an alternative to LSTM. A GRU merges the LSTM's forget and input gates into a single update gate and omits the output gate, yielding fewer parameters (Apaydin et al. 2020). GRUs have demonstrated proficiency in time-series modeling and natural language processing akin to LSTM. Nonetheless, the comparative effectiveness of these two architectures in simulating streamflow and reservoir inflow remains under discussion and has not been extensively explored across different timescales and environments.
CNN has a superior capability in capturing spatial data features and has played a key role in the latest advances in DL (Wegayehu & Muluneh 2022). Van et al. (2020) developed a novel 1D CNN with a ReLU activation function for rainfall-runoff modeling. Recently, integrated DL approaches have received more attention in hydrological modeling. In such hybrids, the 1D CNN component captures local patterns and spatial dependencies in the time-series data and is effective at extracting local features, while the LSTM component models the temporal and long-range dependencies present in the streamflow data. By combining the two, an integrated model can leverage both the spatial features extracted by the 1D CNN and the temporal dependencies captured by the LSTM for improved streamflow prediction performance. Furthermore, combining CNN with GRU can enhance data preprocessing robustness, offering a promising avenue for improving model precision (Wegayehu & Muluneh 2022). Li & Xu (2022) employed both single-variable and multi-variable time-series data in LSTM and CNN-LSTM models; when forecasting particulate matter (PM2.5) concentration for air quality analysis, their multi-variable CNN-LSTM model exhibited superior performance with minimal error. The fusion of CNN and LSTM enhances time-series prediction by enabling the LSTM to capture extended sequences of pattern information, while CNNs excel at filtering noise from input data and extracting crucial features, potentially boosting prediction accuracy (Livieris et al. 2020). Moreover, Wegayehu & Muluneh (2021) used hybrid CNN-LSTM and CNN-GRU models for multivariate streamflow prediction in different climatic regions and found that these models effectively captured the complex temporal dependencies and patterns in the streamflow data, with the CNN-GRU model surpassing CNN-LSTM as well as traditional LSTM and GRU models.
Most recently, the self-attention mechanism has been applied to allow models to weigh the importance of different time steps in the input sequence. Forghanparast (2022) compared the effectiveness of three DL algorithms – CNN, LSTM, and self-attention LSTM (SA-LSTM) – against a baseline extreme learning machine (ELM) model for forecasting monthly streamflow in the upper reaches of the Texas Colorado River; the SA-LSTM model offered higher accuracy and better stability. Zhou et al. (2023) compared a new hybrid DL model for hourly streamflow prediction, SA-CNN-LSTM, with LSTM, CNN, ANN, random forest (RF), SA-LSTM, and SA-CNN models. Their findings revealed that the SA-CNN-LSTM model exhibited strong predictive capabilities across varying flood intensities and lead times, underscoring the value of capturing temporal and feature interdependencies in runoff forecasting. Despite these advances, choosing suitable time-series models from the range of established DL network architectures remains challenging, and further research is needed to achieve higher prediction accuracy, quicker processing times, and simpler model structures. To the best of our knowledge, limited literature explores the performance differences among hybrid DL models for streamflow prediction under different input variability conditions. Hence, a comparative assessment of diverse network architectures can help identify the most optimized solution for time-series analysis.
In this study, the SA-Conv1D-BiGRU hybrid streamflow prediction model is introduced to capture the interdependencies between time steps and features within the streamflow series. By considering these relationships, the model aims to enhance the accuracy of monthly streamflow predictions. The main aim is thus to compare the newly integrated self-attention-based DL model, SA-Conv1D-BiGRU, with integrated and standalone models, Conv1D-LSTM and bidirectional LSTM (Bi-LSTM), for streamflow prediction under different time-series characteristics.
STUDY AREA AND DATA
Study area description
(A) The study was carried out in the Kulfo River Watershed, which is located in the Abaya-Chamo sub-basin of the Southern Ethiopian Rift Valley. This watershed flows into Lake Chamo and is positioned between latitudes 5°55′ N and 6°15′ N, and longitudes 37°18′ E and 37°36′ E (Figure 1). The elevation in the area ranges from 1,208 to 3,547 meters above sea level (masl), covering a total area of about 384.56 km². The annual rainfall in the catchment area varies from 620 to 1,250 mm, and the mean annual temperature ranges from 14 to 23 °C.
(B) The Woybo River is one of the tributaries of the Omo River Basin flowing in the southwest of Ethiopia. It flows into the Omo Gibe River and is situated between latitudes 6°40′ N and 7°10′ N and longitudes 37°30′ E and 38°00′ E, as shown in Figure 1. The study area has a tropical climate regime with a watershed area of 533.65 km². Precipitation in the watershed shows strong seasonal and elevation variability. The rainy season spans from April to October, with July and August being the wettest months. Rainfall distribution is mainly influenced by the south-to-north movement of the Intertropical Convergence Zone. The average maximum and minimum temperatures range from 19.67 to 21.83 °C and 16.19 to 18.71 °C, respectively. The sub-basin receives an average annual rainfall of 1,377.74 mm, with significant spatial and temporal variability in precipitation. The drainage network of the Woybo River watershed was derived from the digital elevation model (DEM). The watershed, classified as third order with a drainage density of 0.45 km/km², consists of 23 streams totaling 148.9 km in length. The longest flow path in the watershed is 1,944 km, and the bifurcation ratio is 0.96 (Ukumo et al. 2022).
Data collection and preprocessing
The DEM data for the Kulfo watershed, at 30 × 30 m resolution, was obtained from the United States Geological Survey (USGS) database. Meteorological data, including daily precipitation and maximum and minimum temperatures, were gathered from the Ethiopian Meteorology Institute (EMI). The quality and completeness of the data play a crucial role in the outcomes of data analysis (Mathewos et al. 2024). In this study, the multiple imputation method and ML techniques were compared for imputing missing data, given their ability to capture complex relationships and patterns. Rainfall records were examined for consistency using double mass curve analysis, and the nondimensional parametrization method was used to verify the homogeneity of the rainfall data. Long-term daily flow records (1991–2013 for the Kulfo station and 1997–2018 for the Woybo station) were obtained from the Ministry of Water and Energy (MoWE).
| Data type | Pearson correlation with streamflow | Skewness | Mean | Min | Max | SD | CV |
|---|---|---|---|---|---|---|---|
| Streamflow | 1.00 | 1.69 | 10.75 | 0.00 | 50.73 | 5.43 | 0.61 |
| Rainfall | 0.68 | 1.90 | 11.25 | 0.00 | 56.87 | 6.34 | 0.67 |

CV, coefficient of variation; SD, standard deviation.
| Data type | Pearson correlation with streamflow | Skewness | Mean | Min | Max | SD | CV |
|---|---|---|---|---|---|---|---|
| Streamflow (m³/s) | 1.00 | 2.81 | 9.04 | 0.00 | 120.08 | 13.98 | 1.54 |
| RF (mm/day) | 0.38 | 2.63 | 4.19 | 0.00 | 43.48 | 4.98 | 1.19 |

CV, coefficient of variation; RF, rainfall; SD, standard deviation.
The datasets underwent thorough quality control procedures, including preprocessing steps such as data standardization, and post-processing tasks such as model evaluation metric computations and visualizations, all carried out in Python (Zeroual et al. 2016). The pre-processed data was split chronologically into 80% for training and 20% for testing, as depicted in Figure 2: the training set was used to fit the models and conduct hyperparameter tuning, whereas the testing set was reserved for assessing the final models' performance.
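A minimal sketch of this split and standardization, assuming the pre-processed series sit in a pandas DataFrame `df` (a placeholder name) with one column per variable:

```python
# Illustrative sketch (not the authors' exact code): chronological 80/20 split
# with standardization fitted on the training portion only, so that no test
# statistics leak into training.
import pandas as pd
from sklearn.preprocessing import StandardScaler

def split_and_scale(df: pd.DataFrame, train_frac: float = 0.8):
    split = int(len(df) * train_frac)           # no shuffling: keep time order
    train, test = df.iloc[:split], df.iloc[split:]
    scaler = StandardScaler().fit(train)        # fit on training data only
    return scaler.transform(train), scaler.transform(test), scaler
```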
| Input combination | Output | Scenario | Model names |
|---|---|---|---|
| Qt−1, Qt−3, Qt−6 | Qt | 1 | Bi-LSTM1, Conv1D-LSTM1, SA-Conv1D-BiGRU1 |
| Qt−1, Qt−3, Qt−6, Qt−9, Qt−11, Qt−15 | Qt | 2 | Bi-LSTM2, Conv1D-LSTM2, SA-Conv1D-BiGRU2 |
| Rt−1, Rt, Qt−1 | Qt | 3 | Bi-LSTM3, Conv1D-LSTM3, SA-Conv1D-BiGRU3 |
| Rt, Rt−1, Rt−2, Qt−1, Qt−2 | Qt | 4 | Bi-LSTM4, Conv1D-LSTM4, SA-Conv1D-BiGRU4 |
| Input combination | Output | Scenario | Model names |
|---|---|---|---|
| Qt−1, Qt−2, Qt−3 | Qt | 1 | Bi-LSTM1, Conv1D-LSTM1, SA-Conv1D-BiGRU1 |
| Qt−2, Qt−1, Rt, Rt−1 | Qt | 2 | Bi-LSTM2, Conv1D-LSTM2, SA-Conv1D-BiGRU2 |
| Rt, Rt−1, Rt−2, Qt−1, Qt−2 | Qt | 3 | Bi-LSTM3, Conv1D-LSTM3, SA-Conv1D-BiGRU3 |
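The lagged inputs in these scenarios can be assembled into a supervised-learning matrix by shifting the series. A hedged sketch for the scenario with inputs Rt, Rt−1, Rt−2, Qt−1, Qt−2 and output Qt follows; the column names `rain` and `flow` are placeholders, not the study's own:

```python
# Illustrative construction of lagged input features; rows that lose values to
# shifting are dropped so every (X, y) pair is complete.
import pandas as pd

def make_lagged(df: pd.DataFrame, rain_lags=(0, 1, 2), flow_lags=(1, 2)):
    X = pd.DataFrame(index=df.index)
    for k in rain_lags:
        X[f"R_t-{k}" if k else "R_t"] = df["rain"].shift(k)
    for k in flow_lags:
        X[f"Q_t-{k}"] = df["flow"].shift(k)
    y = df["flow"]                       # target Qt
    valid = X.dropna().index             # drop rows lost to shifting
    return X.loc[valid].to_numpy(), y.loc[valid].to_numpy()
```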
METHODS
In this case study, a variety of resources were employed. Watershed and sub-basin delineation was conducted using ArcGIS 10.3, while Python, together with Jupyter Notebook, was used for data processing, analysis, and the development of the DL models.
DL algorithms
DL is an advanced form of ML that uses neural networks with multiple layers to enhance performance (Mosavi et al. 2018). DL has emerged as a highly promising ML technique for accurate rainfall-runoff predictions.
Bidirectional long short-term memory (Bi-LSTM)
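A Bi-LSTM runs two LSTM layers over the input window, one forward and one backward, and concatenates their hidden states so that each prediction can draw on context from both directions within the window. A minimal sketch, assuming a TensorFlow/Keras implementation (the layer width is illustrative, not the tuned value):

```python
# Hedged Bi-LSTM regressor sketch; `n_steps` and `n_features` describe the
# lagged input window, and the single dense output predicts streamflow Qt.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bilstm(n_steps: int, n_features: int) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_steps, n_features)),
        layers.Bidirectional(layers.LSTM(64)),  # forward + backward passes
        layers.Dense(1),                        # predicted streamflow
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```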
Convolutional neural network (CNN)
The CNN is recognized as one of the most successful DL models, particularly for its effectiveness in feature extraction; its architectures encompass 1D, 2D, and 3D CNNs (Wegayehu & Muluneh 2021). The structure of a CNN typically includes a convolutional layer, a pooling layer, and a fully connected layer. The convolution and pooling layers serve as the fundamental building blocks: they extract features from the input layer and reduce its dimensions by conducting convolution operations and consolidating the outputs of neuron clusters into single neurons. The pooling mechanism plays a crucial role in reducing the number of parameters in the network, making the training phase of CNNs more efficient, simpler, and faster than that of traditional ANNs. The 1D CNN is primarily used for processing sequence data (Duan et al. 2020), the 2D CNN for text and image identification (Lin et al. 2023), and the 3D CNN for modeling medical images and video data (Duan et al. 2020). Streamflow data is one-dimensional, so a Conv1D model is used in this research. CNN models have become popular for streamflow prediction in recent years due to their speed, accuracy, and stability compared with other DL algorithms (Wegayehu & Muluneh 2021).
Conv1D-LSTM hybrid model
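As outlined in the Introduction, this hybrid places a Conv1D feature extractor in front of an LSTM: the convolution captures local patterns, and the LSTM models the remaining temporal dependencies. A minimal sketch under the same TensorFlow/Keras assumption (filter count and kernel size are illustrative):

```python
# Hedged Conv1D-LSTM sketch: local feature extraction followed by long-range
# temporal modeling, ending in a single streamflow output.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_conv1d_lstm(n_steps: int, n_features: int) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_steps, n_features)),
        layers.Conv1D(32, kernel_size=2, padding="causal",
                      activation="relu"),   # local feature extraction
        layers.LSTM(64),                    # temporal dependencies
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```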
Attention/self-attention mechanism
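The mechanism can be summarized by the standard scaled dot-product formulation (given here in its conventional form; the exact variant implemented may differ):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where the queries $Q$, keys $K$, and values $V$ are learned projections of the input sequence and $d_k$ is the key dimension. In self-attention, all three are derived from the same sequence, so each time step is re-weighted according to its relevance to every other time step.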
Self-attention-based hybrid deep learning model (SA-Conv1D-BiGRU)
The specific working principle of the proposed model is as follows (an illustrative sketch is given after the list):
(1) Prepare the input dataset with features such as rainfall and streamflow data at their lagged time steps.
(2) Extract features with Conv1D layers, which capture local patterns in the streamflow data; the extracted features are later passed to the fully connected layer.
(3) Apply a BiGRU layer to capture both past and future dependencies in the combined features, merging the outputs of the two directions. This helps the model understand the sequential nature of streamflow data and how past and future conditions affect the current flow rate.
(4) Perform the sequence self-attention calculation on the output of the BiGRU layer, assigning different weights according to the degree of influence of each feature on the prediction result.
(5) Conduct hyperparameter tuning using random search to optimize the model's performance.
(6) Pass the output of the sequence multiplicative self-attention mechanism through a fully connected dense layer to extract nonlinear features, then connect the forward and backward hidden states to obtain the final output.
(7) Finally, compile the model using the Adam optimizer and the mean squared error loss function.
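A hedged end-to-end sketch of steps (1)–(7), assuming a TensorFlow/Keras implementation; the layer sizes, the Keras `Attention` layer as the self-attention variant, and all names are illustrative choices rather than the tuned configuration (step (5) happens outside the model itself; see the random-search sketch in the Results section):

```python
# Illustrative SA-Conv1D-BiGRU sketch mapping loosely onto steps (1)-(7).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sa_conv1d_bigru(n_steps: int, n_features: int) -> tf.keras.Model:
    inputs = layers.Input(shape=(n_steps, n_features))            # step (1)
    x = layers.Conv1D(64, kernel_size=2, padding="causal",
                      activation="relu")(inputs)                  # step (2)
    x = layers.Bidirectional(layers.GRU(64,
                      return_sequences=True))(x)                  # step (3)
    x = layers.Attention()([x, x])    # dot-product self-attention, step (4)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(32, activation="tanh")(x)                    # step (6)
    outputs = layers.Dense(1)(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")                   # step (7)
    return model
```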
DL model development in Python
The development steps for a DL model in Python can be summarized as follows: import the required libraries, preprocess the data, design the model architecture, compile the model, train it, use it for prediction, evaluate its performance, iterate and refine, and deploy the model for real-world applications (Ergete & Geremew 2024). Achieving optimal performance in DL models requires decisions on a combination of parameters and hyperparameters (Wegayehu & Muluneh 2021). Parameters are the variables the model learns from the data during training, including the weights and biases of the neurons in the hidden layer(s) and the output layer. Hyperparameters, in contrast, are set by the user before training begins and are not learned from the data. The following paragraphs discuss the main hyperparameters optimized.
Number of hidden units: This hyperparameter determines the number of neurons in the hidden layers of the DL models. Increasing the number of hidden units allows the model to capture more intricate patterns in the data, but it also increases the computational complexity.
Activation function: Activation functions are crucial in neural networks as they introduce nonlinearity, control the output range, and add interpretability to the model (Mosavi et al. 2018). The choice of activation function depends on the specific problem and the desired behavior of the network. Several activation functions are commonly used in DL models; in this study, the sigmoid and tanh activation functions were used.
These functions are defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},$$

where $e \approx 2.718$. For vector outputs, the softmax function

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

normalizes a vector $z$ into probabilities, and the $i$-th entry of $\mathrm{softmax}(z)$ can be thought of as the predicted probability of the input belonging to class $i$.
Learning rate: The learning rate governs the magnitude of the step the model takes when updating its parameters during training. Setting it too high can make training unstable, potentially leading to divergence or oscillations in the optimization process; setting it too low results in slow convergence, where the model takes a long time to reach an optimal solution. Choosing an appropriate learning rate therefore balances fast convergence against stable training, and hyperparameter tuning and experimentation are usually necessary to find the best value for a specific model, dataset, and architecture.
Number of epochs: This hyperparameter determines how many times the model sees the entire training dataset. Finding the right balance is essential to prevent underfitting or overfitting. Monitoring the loss, using early stopping, adjusting learning rates, considering computational resources, and hyperparameter tuning are key to optimizing the number of epochs for efficient training and accurate streamflow predictions.
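For instance, an early-stopping callback (a standard Keras utility, shown here as an illustration rather than the study's exact setup) caps the effective number of epochs automatically:

```python
# Stop training once validation loss stops improving and keep the best weights.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True)
# model.fit(X_train, y_train, epochs=500, validation_split=0.1,
#           callbacks=[early_stop])  # model, data, and epoch cap are placeholders
```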
Dropout rate: Dropout is a regularization technique commonly used in DL models to prevent overfitting. It involves randomly setting a fraction of the neurons in the hidden layers to zero during each training iteration. Tuning the dropout rate can help improve the model's generalization performance and prevent it from memorizing noise in the training data.
Batch size: The batch size determines the number of training samples that are processed together in each iteration during training. Increasing the batch size can expedite training by processing more samples at once, yet it necessitates more memory. Smaller batch sizes may allow the model to generalize better as it updates its parameters more frequently.
Optimization algorithm: The optimization algorithm controls parameter updates during training. Common optimizers in DL include stochastic gradient descent (SGD) and Adam. The choice of optimizer can affect both the convergence speed and the final performance of the model.
The rationale behind choosing this set of parameters for fine-tuning a DL model for streamflow prediction rests on model complexity, computational resources, and training performance (Wegayehu & Muluneh 2021). For example, monitoring training metrics such as loss and accuracy can guide parameter selection, and experimenting with different configurations and evaluating their impact on those metrics helps identify the optimal set. Understanding the sensitivities of these parameters and their effects on performance is crucial for optimizing a DL model for streamflow prediction and achieving accurate, reliable predictions.
Model training and testing
In the context of DL, training and testing are two critical phases in the development and evaluation of a model. Training refers to the process of using a labeled dataset to teach a DL model to recognize patterns and make predictions. Testing, also known as evaluation or validation, is the phase where the trained model is assessed on new, unseen data; the objective is to improve the model's accuracy and determine its suitability for real-world applications (Ergete & Geremew 2024). In this study, the DL models were trained on monthly streamflow data from 1991 to 2008 and tested from 2009 to 2013 for the Kulfo catchment, and trained from 1997 to 2012 and tested from 2014 to 2018 for the Woybo catchment.
Performance measures
The purpose of performance measures in streamflow prediction is to assess the accuracy and reliability of the predicted streamflow data in comparison with observed data. In this study, the evaluation is conducted by comparing the predicted streamflow from the three models to the observed streamflow using the following performance measures (Equations (9)–(12)):
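The original equations (9)–(12) correspond to the standard definitions of these measures; the conventional forms, reproduced here on that assumption with $Q_i^{obs}$ and $Q_i^{pred}$ the observed and predicted flows and overbars denoting means, are

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_i^{obs} - Q_i^{pred}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_i^{obs} - Q_i^{pred}\right)^2},$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_i^{obs} - Q_i^{pred}\right)^2}{\sum_{i=1}^{n}\left(Q_i^{obs} - \bar{Q}^{obs}\right)^2}, \qquad R^2 = \left[\frac{\sum_{i=1}^{n}\left(Q_i^{obs} - \bar{Q}^{obs}\right)\left(Q_i^{pred} - \bar{Q}^{pred}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_i^{obs} - \bar{Q}^{obs}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Q_i^{pred} - \bar{Q}^{pred}\right)^2}}\right]^2.$$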
RESULTS AND DISCUSSION
DL model results
| Parameter | Search range |
|---|---|
| Conv1D layer | 16–128 |
| Self-attention layer | 16–128 |
| BiGRU layer | 16–128 |
| Learning rate | [0.1, 0.01, 0.001, 0.0001] |
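A hedged sketch of the random-search tuning over these ranges (the `build_model` constructor, data arrays, and trial count are placeholders, not the study's actual configuration):

```python
# Randomly sample hyperparameter configurations, train briefly, and keep the
# configuration with the lowest validation loss.
import random

def random_search(build_model, X_tr, y_tr, X_val, y_val, n_trials: int = 20):
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {
            "conv_units": random.choice([16, 32, 64, 128]),
            "attn_units": random.choice([16, 32, 64, 128]),
            "bigru_units": random.choice([16, 32, 64, 128]),
            "learning_rate": random.choice([0.1, 0.01, 0.001, 0.0001]),
        }
        model = build_model(**cfg)
        model.fit(X_tr, y_tr, epochs=50, verbose=0)
        loss = model.evaluate(X_val, y_val, verbose=0)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```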
Various performance metrics can be utilized to assess DL models. In our case, we used MAE, RMSE, NSE, and R² to evaluate the models' performance (Tables 6 and 7).
Testing-period results:

| Model name | MAE | RMSE | NSE | R² |
|---|---|---|---|---|
| Bi-LSTM1 | 5.13 | 3.01 | 0.87 | 0.89 |
| Bi-LSTM2 | 4.64 | 2.94 | 0.88 | 0.91 |
| Bi-LSTM3 | 4.12 | 2.87 | 0.90 | 0.93 |
| Bi-LSTM4 | 4.05 | 2.52 | 0.92 | 0.94 |
| Conv1D-LSTM1 | 5.91 | 4.11 | 0.84 | 0.86 |
| Conv1D-LSTM2 | 5.95 | 4.12 | 0.83 | 0.86 |
| Conv1D-LSTM3 | 5.53 | 3.81 | 0.85 | 0.87 |
| Conv1D-LSTM4 | 4.74 | 3.12 | 0.87 | 0.88 |
| SA-Conv1D-BiGRU1 | 3.34 | 2.32 | 0.93 | 0.95 |
| SA-Conv1D-BiGRU2 | 3.13 | 2.11 | 0.94 | 0.95 |
| SA-Conv1D-BiGRU3 | 3.04 | 1.62 | 0.94 | 0.96 |
| SA-Conv1D-BiGRU4 | 2.85 | 1.41 | 0.96 | 0.97 |
Testing-period results:

| Model name | MAE | RMSE | NSE | R² |
|---|---|---|---|---|
| Bi-LSTM1 | 5.23 | 3.45 | 0.85 | 0.86 |
| Bi-LSTM2 | 4.92 | 3.11 | 0.85 | 0.87 |
| Bi-LSTM3 | 4.72 | 3.01 | 0.88 | 0.90 |
| Conv1D-LSTM1 | 6.54 | 4.71 | 0.77 | 0.78 |
| Conv1D-LSTM2 | 6.25 | 4.62 | 0.80 | 0.82 |
| Conv1D-LSTM3 | 5.73 | 3.71 | 0.81 | 0.84 |
| SA-Conv1D-BiGRU1 | 3.54 | 2.53 | 0.94 | 0.96 |
| SA-Conv1D-BiGRU2 | 3.06 | 1.61 | 0.95 | 0.97 |
| SA-Conv1D-BiGRU3 | 2.74 | 1.32 | 0.96 | 0.98 |
This study collected daily discharge data from Kulfo and Woybo stations and daily rainfall data from 10 gauging stations. Data for 22 flood events from 1991 to 2013 and 34 flood events from 1997 to 2018 were obtained for Kulfo and Woybo catchments, respectively.
DISCUSSION
Kulfo catchment (22 flood events):

| Event No. | Time | Observed (m³/s) | Bi-LSTM (predicted) | Conv1D-LSTM (predicted) | SA-Conv1D-BiGRU (predicted) |
|---|---|---|---|---|---|
| 1 | 1997/11/26 | 70.32 | 67.76 | 65.88 | 68.89 |
| 2 | 2001/8/9 | 81.95 | 77.78 | 74.56 | 79.34 |
| 3 | 2001/8/11 | 78.32 | 76.21 | 74.54 | 77.06 |
| 4 | 2001/8/13 | 77.42 | 75.74 | 74.38 | 76.03 |
| … | … | … | … | … | … |
| 19 | 2002/9/24 | 81.04 | 77.09 | 86.90 | 79.97 |
| 20 | 2002/9/25 | 79.67 | 77.34 | 76.88 | 78.82 |
| 21 | 2002/9/27 | 77.42 | 73.76 | 75.12 | 76.32 |
| 22 | 2002/10/3 | 87.29 | 85.67 | 80.36 | 86.45 |
Woybo catchment (34 flood events):

| Event No. | Time | Observed (m³/s) | Bi-LSTM (predicted) | Conv1D-LSTM (predicted) | SA-Conv1D-BiGRU (predicted) |
|---|---|---|---|---|---|
| 1 | 2003/9/19 | 109.329 | 103.54 | 99.20 | 106.13 |
| 2 | 2004/1/23 | 98.329 | 92.34 | 84.32 | 96.87 |
| 3 | 2005/2/12 | 93.975 | 89.67 | 87.67 | 91.25 |
| 4 | 2005/8/10 | 104.32 | 100.02 | 94.76 | 101.21 |
| 5 | 2005/8/9 | 120.076 | 115.78 | 112.76 | 117.34 |
| … | … | … | … | … | … |
| 31 | 2009/8/27 | 99.671 | 93.21 | 87.89 | 96.01 |
| 32 | 2012/9/16 | 97.958 | 92.12 | 88.67 | 91.12 |
| 33 | 2016/8/14 | 102.822 | 94.23 | 91.56 | 97.34 |
| 34 | 2018/8/17 | 90.113 | 84.23 | 81.54 | 88.56 |
CONCLUSIONS
Aiming at the problem that a single model is not accurate enough for hydrological modeling, this study introduced the SA-Conv1D-BiGRU streamflow prediction model alongside Conv1D-LSTM and Bi-LSTM models at two different catchments. The Conv1D and BiGRU components were employed to extract the inherent characteristics of the time-series data, while the self-attention mechanism was leveraged to investigate the temporal and feature dependencies within the streamflow input data. The models were implemented in Python within Jupyter Notebook using various libraries and packages. The analysis revealed a diverse range of results, attributable to lag-time variation, time-series characteristics, and the type of DL algorithm deployed; performance depended more strongly on lag-time variation and the type of DL algorithm than on time-series characteristics. The SA-Conv1D-BiGRU model consistently excelled at capturing flow data patterns in both catchments across all performance metrics. The self-attention mechanism introduces additional computations to calculate attention weights for each element in the input sequence, allowing the model to capture long-range dependencies and focus on relevant features; this added complexity brings a higher number of parameters and computational overhead compared with a model without self-attention, but it further increases the model's capacity to learn intricate patterns in the data. Despite their complexity, the Conv1D-LSTM models did not outperform the standalone Bi-LSTM model. Overall, the findings of this study provide insight into the performance of different DL models in different catchments, and the results can support various water resources management applications.
FUNDING
No funding was received from any source.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.