ABSTRACT
River stage prediction is a challenging but essential task in flood-prone river basins, where accurate early warnings must be disseminated in advance. In this study, multivariate wavelet-based long short-term memory (WLSTM) models were developed to predict river stage at six gauging stations of the Teesta River basin in India for 1-, 3-, and 5-day lead times and were compared with long short-term memory (LSTM) models. Various combinations of wavelet-decomposed components were used to form different sub-series that were fed as input to the WLSTM models. In terms of statistical indicators, both models yielded very good results, but the root-mean-square error values of the WLSTM model for 1- and 3-day lead times were lower than those of the LSTM model. However, the accuracy of the LSTM model in longer lead time prediction is noteworthy. The WLSTM model predicted the peak stage values more precisely than the LSTM model, indicating the potential of wavelet analysis to capture the variations and periodicities of the data by removing noise. Although the WLSTM model marginally outperformed the LSTM model in prediction accuracy, the results highlight both models as feasible alternatives for longer lead time water level prediction.
HIGHLIGHTS
River stage data decomposed with discrete wavelet transform are utilized in the long short-term memory (LSTM) model to develop a hybrid wavelet-based long short-term memory (WLSTM) model.
The WLSTM model enhanced the accuracy in peak stage prediction compared to the LSTM model.
The utilization of multivariate inputs in LSTM and WLSTM models highlights the influence of upstream stage values in downstream stage predictions.
Adjustment of hyperparameters in models produces better results.
INTRODUCTION
Flood forecasting predicts water level or flow in advance and disseminates this information in the form of a warning. Early warnings are intended to help people and local bodies make prompt decisions and take relevant actions for evacuation and relocation (Chau et al. 2005). Flood forecasting is thus a part of flood management planning, devised to withstand a foreseen disaster. Flood forecasting models are used to predict water level or river stage and give prior notice of an extreme event (Latt & Wittenberg 2014). Stage is defined as the level of water in the river measured from a specific datum (Subramanya 2021). Stage predictions are easy for the public to interpret because the concept of a rising water level is intuitive. Over the years, the physical models used for prediction have relied on large amounts of data and on mathematical equations describing the underlying hydrological processes (Thirumalaiah & Deo 2000). These physical models sometimes struggle to produce accurate and reliable flood forecasts. Moreover, their requirement for abundant data makes them difficult to apply when data are inadequate (Tiwari & Chatterjee 2010). In this context, data-driven models have gained wide acceptance in flood forecasting for their capability to capture the input–output relationship from limited data (Kasiviswanathan et al. 2016). Therefore, many researchers have explored popular machine learning (ML) models (Hadi et al. 2024) such as the artificial neural network (ANN) (Liong et al. 2000; Campolo et al. 2003), the genetic algorithm-based ANN (ANN-GA), and the adaptive neuro-fuzzy inference system (ANFIS) (Chau et al. 2005) for stage forecasting.
Despite several advantages, standalone ML models sometimes fail to achieve adequate accuracy for longer lead time predictions (Tiwari & Chatterjee 2010; Kasiviswanathan et al. 2016). Linh et al. (2021) concluded that a hybrid wavelet neural network model outperformed the traditional ANN and multiple linear regression models in predicting maximum monthly discharge when two climatic signals were incorporated as input. Wavelet analysis provides a time–frequency representation of time series data and thus extensive information about the data pattern (Daubechies 1990). The ability of the wavelet transform to decompose a non-stationary signal into components at different resolution levels improves model predictability by capturing the periodicity and trend of the signal. The wavelet transform is extensively applied in various fields of hydrology, such as water level prediction, streamflow forecasting, rainfall–runoff modelling, and water quality prediction (Nourani et al. 2008; Guimarães Santos & da Silva 2014; Seo et al. 2015; Barzegar et al. 2016). These studies show that wavelet-based ANN/ANFIS models correlate more closely with the observed data and are more accurate than the standalone ANN/ANFIS models for longer lead time prediction.
In recent times, among deep learning techniques, the long short-term memory (LSTM) model has become immensely popular in prediction studies. Hochreiter & Schmidhuber (1997) introduced the LSTM model as an improvement on the recurrent neural network (RNN) to overcome the limitations of vanishing and exploding gradients. The advantage of LSTM lies in its capability to handle long-term dependencies by linking sequential time series data. Apart from various interdisciplinary fields, the LSTM model has been applied to hydrological prediction problems, especially rainfall–runoff (Yin et al. 2021), reservoir operation modelling (Zhang et al. 2018), rainfall and monthly streamflow (Ni et al. 2020; Dalkilic et al. 2023), and evapotranspiration (Wang et al. 2023), among others, for better accuracy in longer lead time prediction. With proper feature engineering and fine-tuning of the hyperparameters, a multivariate LSTM model yielded satisfactory performance for 1-week-ahead discharge prediction of the Central Delaware River in New Jersey using seven stream-water variables (Khosravi et al. 2023). Similarly, the LSTM model outperformed the nonlinear autoregressive model with exogenous inputs for multi-step-ahead prediction of river flow in the Kelantan River in Malaysia (Hayder et al. 2022). Furthermore, Faruq et al. (2020) employed the LSTM model to predict the water level of the Klang River basin in Malaysia for the next few hours and obtained results close to the observed values.
As inferred from the aforementioned studies, deep neural networks (DNNs) are used to obtain higher accuracy for longer lead time prediction across hydrological problems and geographical locations. Presently, hybrid models such as the variational mode decomposition-based genetic algorithm Elman neural network model (Xing et al. 2022), the adaptive step size cuckoo search algorithm-based LSTM and self-attention mechanism (ASCS-LSTM-ATT) model (Li et al. 2020), and the particle swarm optimization-LSTM model (Ruma et al. 2023) have gained considerable attention, mainly for enhancing model performance. In these models, either the input data are processed with data pre-processing methods or the parameters are optimized with an algorithm. The prime objective is to achieve better results for longer lead time prediction, which is especially important for anticipating a flood situation. Although DNNs have addressed many hydrological problems, there is an evident gap in the literature on river water level prediction that uses upstream water levels as inputs to DNNs. Many river basins, such as the Teesta River basin in India, have gauging stations distributed across mountainous and plain regions. Although the stations within this basin are monitored by the Central Water Commission, prediction studies of river water level or stage in this basin have not been adequately carried out. Predicting the water levels in advance is essential to prevent substantial loss, as the Teesta River basin is flood-prone and experiences sudden floods almost every year.
This research attempts to address this gap for the Teesta River basin by examining the potential of wavelet analysis coupled with deep learning techniques. Motivated by the robustness and efficiency of DNNs, multivariate multi-step LSTM models are developed to predict the daily river stage at six gauging stations for 1-, 3-, and 5-day lead times. A hybrid deep learning model, the wavelet-based long short-term memory (WLSTM) model with multivariate input for multi-step prediction, is then developed individually for each station. The novelty of this study lies in integrating the wavelet transform with a deep learning algorithm, LSTM, to predict the river water level of the Teesta River basin, which opens new possibilities for leveraging deep learning approaches in flood forecasting.
MATERIALS AND METHODS
Study area and data used
Six gauging stations within the study area (Figure 1) are maintained by the Central Water Commission (CWC), New Delhi. In order from upstream to downstream, these are Sankalan (North Sikkim), Khanitar (East Sikkim), Teesta Bazaar (West Bengal, WB), Coronation (WB), Domohani (WB), and Mekhliganj (WB). The West Bengal Irrigation Department characterizes the stage at the Teesta Bazaar, Coronation, Domohani, and Mekhliganj gauging stations by three significant levels, i.e., preliminary danger level (PDL), danger level (DL), and extreme danger level (EDL), which are listed in Table 1. Daily stage data available for the monsoon season (1 May to 31 October) from 2006 to 2017 (2208 sets) were divided into two datasets. The data for 2006–2014 (1656 patterns) were randomly divided for training (70%) and k-fold cross-validation (30%), and the data for 2015–2017 (552 patterns) were used for testing. Some statistical properties of the daily stage data are given in Table 2.
Gauging stations | PDL (m) | DL (m) | EDL (m) |
---|---|---|---|
Teesta Bazaar | 210.40 | 211.00 | 213.00 |
Coronation | 149.40 | 150.00 | 153.60 |
Domohani | 85.60 | 85.95 | 86.30 |
Mekhliganj | 65.35 | 65.95 | 66.30 |
Gauging stations | Sankalan | Khanitar | Teesta Bazaar | Coronation | Domohani | Mekhliganj |
---|---|---|---|---|---|---|
Datasets from 2006 to 2014 | | | | | | |
Maximum (m) | 760.85 | 295.32 | 210.00 | 150.50 | 89.13 | 65.84 |
Minimum (m) | 752.20 | 290.32 | 201.50 | 142.10 | 82.34 | 63.38 |
Mean (m) | 755.75 | 292.57 | 204.70 | 145.30 | 85.26 | 64.73 |
Standard deviation (m) | 1.57 | 0.74 | 1.73 | 1.32 | 0.49 | 0.44 |
Skewness | −0.11 | −0.04 | 0.16 | 0.32 | −0.26 | −0.38 |
Kurtosis | −0.46 | 0.35 | −0.69 | −0.10 | 2.68 | −0.27 |
Datasets from 2015 to 2017 | | | | | | |
Maximum (m) | 758.11 | 293.28 | 210.34 | 146.61 | 85.85 | 65.61 |
Minimum (m) | 753.22 | 290.39 | 205.01 | 141.22 | 83.46 | 62.81 |
Mean (m) | 755.48 | 291.65 | 207.59 | 144.09 | 84.91 | 64.33 |
Standard deviation (m) | 1.27 | 0.66 | 1.34 | 1.17 | 0.56 | 0.58 |
Skewness | 0.22 | −0.03 | −0.04 | −0.27 | −0.75 | −0.16 |
Kurtosis | −0.99 | −0.86 | −0.81 | −0.51 | −0.05 | −0.42 |
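The pattern counts quoted for the two datasets follow directly from the length of the monsoon window. The sketch below rebuilds the calendar and a random 70/30 split of the 2006–2014 period. The helper name `monsoon_days` and the random seed are illustrative assumptions, and the 30% subset is treated here as a single validation block rather than as explicit k folds.

```python
from datetime import date, timedelta
import random


def monsoon_days(year):
    """Return every date from 1 May to 31 October of the given year."""
    start, end = date(year, 5, 1), date(year, 10, 31)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Development period (2006-2014) and hold-out test period (2015-2017).
dev_days = [d for y in range(2006, 2015) for d in monsoon_days(y)]
test_days = [d for y in range(2015, 2018) for d in monsoon_days(y)]

# Random 70/30 split of the development period; the seed is an arbitrary choice.
rng = random.Random(42)
shuffled = dev_days[:]
rng.shuffle(shuffled)
n_train = round(0.7 * len(shuffled))
train_days, val_days = shuffled[:n_train], shuffled[n_train:]

print(len(dev_days), len(test_days), len(train_days), len(val_days))
# prints: 1656 552 1159 497
```

Each monsoon season contributes 184 days, so nine seasons give 1656 patterns and three seasons give 552, matching the totals stated in the text.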
Long short-term memory model
Wavelet analysis
Model performance evaluation
Statistical indicators are required to evaluate the goodness of fit between observed and predicted values. In this study, the coefficient of determination (R2), Nash–Sutcliffe efficiency (NSE), and root-mean-square error (RMSE) are used to assess model accuracy in both the training and the testing phases. During training, the efficiency of the model is determined by comparing the model-generated output with the target data supplied to it. When the best model is tested, the agreement between the actual and predicted values for a new dataset is assessed through the same indicators. Based on the values of these indicators, performance ratings are assigned to the models as listed in Table 3. The statistical indicators used in this study are expressed by Equations (3)–(5):
- (ii) NSE (Nash & Sutcliffe 1970) – It is expressed as follows:
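In standard notation, with $O_i$ the observed stage, $P_i$ the predicted stage, $\bar{O}$ and $\bar{P}$ their respective means, and $n$ the number of observations (symbols assumed here for illustration, not taken from the original equations), the three indicators are commonly written as follows. $R^2$ is shown as the squared Pearson correlation, a common convention in hydrology that may differ from the form used in the study.

$$
R^2 = \frac{\left[\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2 \, \sum_{i=1}^{n} (P_i - \bar{P})^2}
$$

$$
\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2}
$$

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than predicting the observed mean.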
Sl No. | Performance rating | R2 | NSE |
---|---|---|---|
1 | Very good | 0.75–1 | 0.75–1 |
2 | Good | 0.65–0.75 | 0.65–0.75 |
3 | Satisfactory | 0.50–0.65 | 0.50–0.65 |
4 | Unsatisfactory | <0.50 | <0.50 |
Table 3 lists the performance ratings for R2 and NSE values. For RMSE, the lower the value, the better the performance (Latt & Wittenberg 2014).
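As a minimal sketch of how these indicators and the Table 3 ratings could be computed, the functions below use the standard textbook definitions. R2 is taken as the squared Pearson correlation, which is an assumption about the form used in the study, and the function names are illustrative. Because the class boundaries in Table 3 overlap (for example, 0.75 appears in two rows), a boundary value is assigned to the higher rating here, which is also an assumption.

```python
import math


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mean_o) ** 2 for o in obs)


def rmse(obs, pred):
    """Root-mean-square error, in the units of the data (metres for stage)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rating(value):
    """Map an R2 or NSE value to the rating classes of Table 3."""
    if value >= 0.75:
        return "Very good"
    if value >= 0.65:
        return "Good"
    if value >= 0.50:
        return "Satisfactory"
    return "Unsatisfactory"


# Illustrative stage values in metres (made up for this example, not study data).
observed = [64.2, 64.5, 65.1, 64.8, 64.4]
predicted = [64.3, 64.4, 64.9, 64.9, 64.5]
print(rating(nse(observed, predicted)), round(rmse(observed, predicted), 3))
```

The functions do not guard against a constant observed or predicted series, for which the denominators become zero.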
Procedure for developing LSTM model
Procedure for developing wavelet-based LSTM model
In the present study, the original river stage series was decomposed up to three levels (L ≈ 3), producing the DWCs (D1, D2, D3, and A3). The detail components were separated from the original series so that the resulting approximation components could be combined into new sub-series. Two or more approximation components were combined in each case, giving the four sub-series shown in Table 4. This procedure was carried out for each gauging station, i.e., Sankalan, Khanitar, Teesta Bazaar, Coronation, Domohani, and Mekhliganj. Each sub-series was fed as input to the LSTM model to form an individual WLSTM model. Thus, four WLSTM models were prepared for each gauging station, among which the best was selected based on its performance.
Sub-series | Input combinations | In terms of Q |
---|---|---|
1 | I = A1 + A2 | Q – D1 – D2 |
2 | I = A1 + A3 | Q – D1 – D2 – D3 |
3 | I = A2 + A3 | Q – D1 – D2 – D3 |
4 | I = A1 + A2 + A3 | Q – D1 – D2 – D3 |
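The decomposition and the grouping of approximation components can be sketched as follows. Because the mother wavelet and filter settings are not stated in this section, the sketch substitutes a simple additive à trous (undecimated) scheme with a causal Haar smoothing filter for the DWT used in the study. The function names are illustrative, and treating each input combination as a stack of separate input features is one interpretation of the "+" in Table 4, which could also denote summation.

```python
def a_trous_haar(series, levels=3):
    """Additive multiresolution decomposition with a causal Haar smoothing filter.

    Returns (approximations, details), where approximations[j - 1] is A_j and
    details[j - 1] is D_j, so that series = A_levels + D_1 + ... + D_levels.
    """
    approximations, details = [], []
    previous = list(series)
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)
        # Average each value with the one `lag` steps earlier; the index is
        # clamped at the start of the record to handle the boundary.
        smooth = [
            0.5 * (previous[t] + previous[max(t - lag, 0)])
            for t in range(len(previous))
        ]
        # The detail is what the smoothing removed at this level.
        details.append([p - s for p, s in zip(previous, smooth)])
        approximations.append(smooth)
        previous = smooth
    return approximations, details


# Approximation levels used by each sub-series, following Table 4.
SUB_SERIES = {1: (1, 2), 2: (1, 3), 3: (2, 3), 4: (1, 2, 3)}


def build_sub_series(series, levels=3):
    """Return, for each sub-series number, the list of approximation series it uses."""
    approximations, _ = a_trous_haar(series, levels)
    return {
        key: [approximations[j - 1] for j in combo]
        for key, combo in SUB_SERIES.items()
    }


# Illustrative stage values in metres (made up for this example, not study data).
stage = [64.2, 64.5, 65.1, 64.8, 64.4, 64.9, 65.3, 65.0]
inputs = build_sub_series(stage)
print({key: len(components) for key, components in inputs.items()})
```

With this additive scheme, A1 equals Q – D1 and A3 equals Q – D1 – D2 – D3, which mirrors the way the components are related to the original series Q in the table.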
RESULTS
Prediction of daily river stage with LSTM models
Lead time | Gauging stations | Training R2 | Training NSE | Training RMSE (m) | Testing R2 | Testing NSE | Testing RMSE (m) |
---|---|---|---|---|---|---|---|
1 day | Sankalan | 0.9321 | 0.9319 | 0.3221 | 0.9516 | 0.951 | 0.2791 |
1 day | Khanitar | 0.8954 | 0.8951 | 0.3011 | 0.9068 | 0.8978 | 0.2099 |
1 day | Teesta Bazaar | 0.9113 | 0.9101 | 0.3976 | 0.9493 | 0.9357 | 0.3411 |
1 day | Coronation | 0.9228 | 0.9221 | 0.31 | 0.9421 | 0.9414 | 0.2807 |
1 day | Domohani | 0.8862 | 0.879 | 0.2876 | 0.9107 | 0.9095 | 0.1673 |
1 day | Mekhliganj | 0.8913 | 0.8902 | 0.2421 | 0.9213 | 0.9202 | 0.1645 |
3 days | Sankalan | 0.891 | 0.8873 | 0.4676 | 0.8946 | 0.8938 | 0.4109 |
3 days | Khanitar | 0.7956 | 0.7548 | 0.3854 | 0.8104 | 0.7895 | 0.3009 |
3 days | Teesta Bazaar | 0.8434 | 0.8219 | 0.5618 | 0.886 | 0.8423 | 0.5335 |
3 days | Coronation | 0.8207 | 0.8043 | 0.5234 | 0.8474 | 0.8385 | 0.4674 |
3 days | Domohani | 0.8289 | 0.8242 | 0.279 | 0.847 | 0.8389 | 0.2236 |
3 days | Mekhliganj | 0.8567 | 0.8521 | 0.2355 | 0.8815 | 0.8797 | 0.2018 |
5 days | Sankalan | 0.7577 | 0.7431 | 0.6221 | 0.794 | 0.7927 | 0.5752 |
5 days | Khanitar | 0.7225 | 0.7065 | 0.3839 | 0.7599 | 0.7179 | 0.3474 |
5 days | Teesta Bazaar | 0.8293 | 0.7536 | 0.6409 | 0.8726 | 0.7937 | 0.6103 |
5 days | Coronation | 0.8012 | 0.7852 | 0.5971 | 0.833 | 0.7705 | 0.5596 |
5 days | Domohani | 0.7798 | 0.7402 | 0.3054 | 0.8069 | 0.7927 | 0.2545 |
5 days | Mekhliganj | 0.8364 | 0.8275 | 0.2738 | 0.866 | 0.8615 | 0.2165 |
Prediction of daily river stage with WLSTM models
Lead time | Gauging stations | Training R2 | Training NSE | Training RMSE (m) | Testing R2 | Testing NSE | Testing RMSE (m) |
---|---|---|---|---|---|---|---|
1 day | Sankalan | 0.9823 | 0.9812 | 0.1142 | 0.9941 | 0.9941 | 0.0971 |
1 day | Khanitar | 0.9771 | 0.9742 | 0.121 | 0.988 | 0.9866 | 0.076 |
1 day | Teesta Bazaar | 0.9798 | 0.9778 | 0.1591 | 0.99 | 0.9893 | 0.1339 |
1 day | Coronation | 0.9882 | 0.9721 | 0.1327 | 0.995 | 0.9944 | 0.0867 |
1 day | Domohani | 0.9642 | 0.9611 | 0.121 | 0.9957 | 0.9956 | 0.0369 |
1 day | Mekhliganj | 0.9769 | 0.9684 | 0.1224 | 0.9943 | 0.9916 | 0.0535 |
3 days | Sankalan | 0.9391 | 0.9337 | 0.3321 | 0.9515 | 0.9515 | 0.2776 |
3 days | Khanitar | 0.8723 | 0.8628 | 0.2772 | 0.9021 | 0.8987 | 0.2087 |
3 days | Teesta Bazaar | 0.9348 | 0.9253 | 0.3907 | 0.9499 | 0.9318 | 0.3386 |
3 days | Coronation | 0.9056 | 0.8967 | 0.411 | 0.9525 | 0.9415 | 0.2812 |
3 days | Domohani | 0.9273 | 0.9045 | 0.2564 | 0.9511 | 0.9491 | 0.1256 |
3 days | Mekhliganj | 0.9433 | 0.9386 | 0.1726 | 0.9608 | 0.9558 | 0.1224 |
5 days | Sankalan | 0.8604 | 0.8575 | 0.5677 | 0.8616 | 0.8609 | 0.4711 |
5 days | Khanitar | 0.7671 | 0.7421 | 0.4753 | 0.7768 | 0.7631 | 0.3184 |
5 days | Teesta Bazaar | 0.8744 | 0.8439 | 0.6679 | 0.8924 | 0.8083 | 0.5702 |
5 days | Coronation | 0.8045 | 0.7787 | 0.6329 | 0.8202 | 0.7866 | 0.5396 |
5 days | Domohani | 0.8773 | 0.8691 | 0.2691 | 0.9053 | 0.9015 | 0.1754 |
5 days | Mekhliganj | 0.8548 | 0.8472 | 0.2194 | 0.902 | 0.8967 | 0.1871 |
DISCUSSION
The results demonstrate the application of LSTM and WLSTM models for predicting river stage at the six gauging stations for 1-, 3-, and 5-day lead times. In the testing phase, the R2 and NSE values from 1 day to 5 days range over 0.95–0.72 for the LSTM models and 0.99–0.76 for the WLSTM models, and the corresponding RMSE values range over 0.16–0.61 m and 0.04–0.57 m. The performance of both models deteriorated slightly as the lead time increased; accordingly, the agreement between observed and predicted values weakened as the forecast horizon lengthened. However, the LSTM models of Sankalan, Domohani, and Mekhliganj perform well compared with those of the other gauging stations across all the statistical indicators (R2, NSE, and RMSE). Although the LSTM models of Teesta Bazaar and Coronation perform better in terms of R2 and NSE, their RMSE values indicate larger errors than at the other stations. Overall, among the LSTM models, the agreement between the observed and predicted stage is better for Mekhliganj than for the other stations at all lead times, as its RMSE is lower. Using multivariate inputs (river stage values of the upstream gauging stations) has proven beneficial in enhancing model accuracy and reflects the natural phenomenon in which the downstream water level is influenced by upstream values. The LSTM network learns from previous timesteps and decides which information to retain and carry forward to the next timestep, which gives it the ability to capture long-term dependencies in sequential data. The model discards redundant information and retains the necessary information in memory to achieve high prediction accuracy. Moreover, the architecture of the LSTM model helps to interpret the spatiotemporal correlation of the data while processing the time series.
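The way multivariate inputs and lead times enter such a model can be illustrated with a sliding-window sketch that pairs a block of past stage values from several stations with the target stage a given number of days ahead. The function name, the `lookback` length, and the sample values are hypothetical, since the window length used in the study is not stated in this section.

```python
def make_samples(inputs, target, lookback, lead):
    """Pair `lookback` days of multivariate inputs with the stage `lead` days ahead.

    inputs: list of equally long daily series, one per input station
    target: daily stage series at the station being predicted
    Returns (windows, outputs); each window has shape lookback x number of stations.
    """
    n = len(target)
    windows, outputs = [], []
    for end in range(lookback, n - lead + 1):
        # Rows are days end - lookback .. end - 1; columns are stations.
        window = [[series[t] for series in inputs] for t in range(end - lookback, end)]
        windows.append(window)
        # The last observed day is end - 1, so the target day is end - 1 + lead.
        outputs.append(target[end + lead - 1])
    return windows, outputs


# Two hypothetical upstream series and one downstream target (made-up values).
upstream_a = [float(v) for v in range(10)]
upstream_b = [float(v) for v in range(100, 110)]
downstream = [float(v) for v in range(200, 210)]

for lead in (1, 3, 5):
    X, y = make_samples([upstream_a, upstream_b], downstream, lookback=3, lead=lead)
    print(lead, len(X), y[0])
```

For a record of n days, this yields n − lookback − lead + 1 samples, so longer lead times leave slightly fewer training pairs. The target station's own past values can be included by adding its series to `inputs`.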
When the wavelet components are used as input to the LSTM model, the statistical indicators of the WLSTM models improve relative to those of the LSTM models. The sequential data handling capacity of the LSTM model and the ability of the wavelet transform to discard noise from the data together improve the prediction accuracy of the WLSTM model. As a result, the R2 and NSE values of the WLSTM models increased and the RMSE values decreased.
Thus, LSTM models are a more reliable choice than traditional models for longer lead time predictions because they capture long-term dependencies well (Adli Zakaria et al. 2023). Similarly, Atashi et al. (2022) recommended the LSTM model over seasonal autoregressive integrated moving average (SARIMA) and random forest models for more accurate floodwater level prediction in the Red River basin. LSTM has even outperformed other deep learning models such as the gated recurrent unit and bidirectional LSTM (Bi-LSTM) in hourly water level prediction at the Kien Giang river in Vietnam, as reported by Hieu et al. (2023). The appropriate use of decomposed wavelet components (details and approximation) in ML models enhances their predictive capability, making them a better predictive tool than conventional models (Anh et al. 2018). The study by Xie et al. (2021) suggests that the WLSTM model has stronger generalization capability than the LSTM, wavelet-artificial neural network (WANN), and wavelet-autoregressive integrated moving average (WA-ARIMA) models in predicting the water level of the Yangtze River in China. Combining wavelets with a deep learning algorithm helps to represent the nonlinear relationships between the predictor and response variables by capturing the pattern of the data series effectively. From the findings of various researchers, it can be concluded that hybrid models are superior to conventional models and that deep learning models are an alternative to traditional ML models. Moreover, as in this study, many researchers (Le et al. 2019; Liu et al. 2020; Mehedi et al. 2022) experimented with various hyperparameters (epochs, hidden neurons) through trial and error to achieve the best prediction results.
However, the approach is not limited to this methodology. Implementing other hybridization and optimization techniques with the models, together with significant parametric modifications, is recommended to explore further outcomes in this prediction study.
CONCLUSIONS
This study explores the use of wavelet analysis with a deep learning algorithm and compares LSTM and WLSTM models for predicting the daily river stage at six gauging stations for 1-, 3-, and 5-day lead times. The LSTM and WLSTM models used multivariate inputs (river stage values of upstream stations) for all the gauging stations except Sankalan and, after adjustment of the hyperparameters, produced satisfactory results in terms of the statistical indicators (R2, NSE, and RMSE). Overall, the LSTM model for Mekhliganj performed very well, with the lowest RMSE values among the stations even up to a 5-day lead time. The LSTM model can therefore be a choice for Mekhliganj even for a longer lead time; because this station is the catchment outlet, warnings can be disseminated earlier. Regarding peak stage prediction, the WLSTM model outperforms the LSTM model at all the stations for 1-day and 3-day lead times. At the same time, the LSTM model yields a minimal error in peak stage prediction for the 5-day lead time at the Sankalan, Khanitar, and Teesta Bazaar gauging stations. For Domohani and Mekhliganj, where peak river stages exceeded the PDL in some instances, the models predicted those peaks well for the 1-day lead time but increasingly underestimated them as the forecast period lengthened. The recurrent structure of the LSTM model enables it to handle the long-term dependencies of time series problems. Thus, both LSTM and WLSTM models can be recommended for their overall performance, with lower errors and proficient peak stage prediction up to a certain lead time. At most of the gauging stations, the river stage predicted by the WLSTM model followed the observed variation of the time series, but in some cases, overestimation and underestimation were noted as the lead time increased.
This implies that the inherent ability of the discrete wavelet transform (DWT) to separate noise from the original data enables the model to capture the trend and to understand the orientation of the data for better prediction. However, the degradation of model performance with increasing lead time is one of the limitations of the models, and further improvement of the LSTM or WLSTM model to enhance accuracy over longer forecast periods needs to be explored. The variation in the performance of both the LSTM and WLSTM models across gauging stations also highlights the data sensitivity of the LSTM models. Although the LSTM model handles noisy data, it does not quantify the uncertainty in the prediction. In this regard, integrating various types of data and optimizing hyperparameters can enhance the robustness of the model for better accuracy. This study of river stage prediction at various gauging stations of the Teesta River basin is significant for this flood-prone basin, as it supports issuing early warnings of rising water levels to the inhabitants during emergencies and thereby protecting them from adversity.
ACKNOWLEDGEMENTS
The authors duly thank the Central Water Commission, New Delhi, for furnishing the data support to carry out this study.
FUNDING
No funding was received for conducting this study.
AUTHOR CONTRIBUTIONS
S.C. conceptualized the study, collected data, performed the analysis, and wrote the first draft of the manuscript. S.B. edited the previous version of the manuscript and supervised the entire study. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.