Abstract
River water level prediction (WLP) plays an important role in flood control, navigation, and water supply. In this study, a WaveNet-based convolutional neural network (WCNN) with a lightweight structure and good parallelism was developed to improve the prediction accuracy and time effectiveness of WLP. It was applied to predict water levels 1/2/3 days ahead at the Waizhou gauging station of the Ganjiang River (GR) in China and was compared with two recurrent neural networks (long short-term memory (LSTM) and gated recurrent unit (GRU)). The results showed that the WCNN model achieved the best prediction performance with the fewest training parameters and the shortest training time. Compared with the LSTM and GRU models in the 1-day-ahead prediction, the number of training parameters was reduced from 73,851 and 55,851 to 32,937, respectively; the root mean square error (RMSE) was reduced from 0.071 and 0.076 to 0.057, respectively; and the mean absolute error (MAE) was reduced from 0.052 and 0.059 to 0.038, respectively. The Nash–Sutcliffe efficiency (NSE) and coefficient of determination (R2) both increased to 0.998. These results indicate that the improved model is more efficient for WLP.
HIGHLIGHTS
A WaveNet-based convolutional neural network was proposed for water level prediction.
WCNN with a lightweight structure and good parallelism achieved better prediction performance.
WCNN obtained higher accuracy with fewer parameters and less training time than the RNN models.
Influence of different inputs and hyperparameters of models on prediction results was revealed.
INTRODUCTION
The water level is an important hydrological feature of rivers, and accurate water level prediction (WLP) is crucial to flood control, shipping and water supply planning and management (Deng et al. 2021). However, variations in the water level are highly nonlinear due to various factors, such as rainfall, runoff, topography, water conservancy projects, and human activities, which increase the difficulty of accurate prediction (Lai et al. 2013; Zhang et al. 2016).
The WLP models mainly include hydrologic models based on physical processes and machine learning (ML) models based on a data-driven approach (abbreviations: physical models and ML models). Physical models can obtain a meticulous simulation of water level processes by establishing basic equations to express the interaction mechanisms between variables (Lai et al. 2013). However, the establishment of physical models requires extensive professional knowledge and a large amount of basic data and physical parameters, and the modelling process is complex, difficult, and time-consuming. ML models do not need to understand the mechanism of physical systems; they predict the water level by directly detecting correlations between variables and learning the mapping from inputs to outputs. The data that ML models require are easy to obtain, and the modelling process is relatively simple (Yin et al. 2021).
Various ML models have been used in hydrological prediction tasks. For example, Ahmed et al. (2022) developed several ML algorithms (i.e., linear regression (LR), interaction regression (IR), robust regression (RR), stepwise regression (SR), support vector regression (SVR), boosted trees ensemble regression (BOOSTER), bagged trees ensemble regression (BAGER), XGBoost, tree regression (TR), and Gaussian process regression (GPR)) to predict daily river water levels based on data collected from 1990 to 2019, which were used to train and test the proposed models; they found that the GPR model predicted the water level of the river with high precision and less uncertainty. In Latif & Ahmed (2020, 2023) and Latif (2023), the LSTM model was used to predict the daily streamflow of the Kowmung River at Cedar Ford in Australia, the daily reservoir inflow of the Dokan Dam in Iraq, and the daily pan evaporation at Sydney airport in Australia, respectively. The results showed that the LSTM model outperformed other conventional ML models (e.g., random forest (RF), tree boost (TB), multilayer perceptron neural network (MLP-NN), and boosted regression tree (BRT)). Huang et al. (2021) built three ML models, namely, an artificial neural network (ANN), a nonlinear autoregressive model with exogenous input (NARX), and a GRU, to simulate the daily Poyang Lake level from 2003 to 2016; they found that ML models with historical memory (i.e., the GRU model) were more suitable for simulating the Poyang Lake level under the influence of the Three Gorges Dam. In Ho et al. (2022), an LSTM model was proposed to predict short-term water levels at tidal sluice gates from 6 to 48 h ahead in the Bac Hung Hai irrigation system in Vietnam; the findings highlighted the ability of LSTM models to provide high-accuracy short-period water level forecasts for areas near estuaries. Kima et al. (2022) used ML models such as gradient boosting (GB), support vector machine (SVM), and LSTM to predict the flood water level at the Heungcheon bridge station, located downstream of the Bokha bridge station. Rainfall, water level, and discharge data of the Bokha bridge station from 2005 to 2020 were collected, and the rainfall data were classified into 53 rainfall events using interevent time definition (IETD) analysis; the LSTM model showed the best predictive power and was selected as the optimal model for real-time flood water level forecasting in that study. Zhang et al. (2018) built four different neural networks to predict the water level of a combined sewer overflow structure in Norway; compared with the other neural networks (e.g., multilayer perceptron (MLP) and wavelet neural network (WNN)), the LSTM and GRU presented superior capabilities for multistep-ahead time series prediction. In Cai et al. (2021), a GRU model was built for groundwater level simulation in 78 catchments in the central-eastern continental United States; the results showed that the GRU model performed better in regions where hydrogeological properties promote more effective responses of groundwater to external changes. The above studies show that the LSTM and GRU models, both recurrent neural networks (RNNs), can usually obtain good prediction performance. However, they still have some shortcomings that need to be improved.
For example, their internal recurrent connections require inputs to be processed in temporal order, which prevents parallel training and increases training time (Fan et al. 2021). Furthermore, they consume additional memory to hold long-term information. For practical problems with very large datasets, the LSTM and GRU models are therefore relatively cost-inefficient.
A convolutional neural network (CNN) has a lightweight structure and unique advantages in capturing spatial dependencies of input data (Collado-Villaverde et al. 2021). Nevertheless, its performance on time series regression tasks is poor (Yan et al. 2020). A new convolutional structure named WaveNet, proposed by DeepMind in 2016, has performed well in sequential analysis problems (van den Oord et al. 2018). It combines the advantages of dilated causal convolutions, residual connections, and skip connections; it not only accepts inputs of nonfixed length but also learns complex long-term dependencies of sequences.
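As a brief illustration of the mechanism WaveNet relies on, the following minimal sketch stacks dilated causal 1-D convolutions so that the receptive field grows exponentially with depth while no information from future timesteps leaks into the prediction. It assumes TensorFlow/Keras and illustrative shapes; the framework used in this study is not stated here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def dilated_causal_stack(timesteps=10, features=8, filters=32, kernel_size=2, n_layers=4):
    """Stack of dilated causal Conv1D layers; the receptive field grows as
    1 + (kernel_size - 1) * (2**n_layers - 1)."""
    inputs = layers.Input(shape=(timesteps, features))
    x = inputs
    for i in range(n_layers):
        x = layers.Conv1D(filters, kernel_size,
                          padding="causal",        # no information from future timesteps
                          dilation_rate=2 ** i,    # dilation 1, 2, 4, 8, ...
                          activation="relu")(x)
    return models.Model(inputs, x)

model = dilated_causal_stack()
model.summary()
```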
Compared with RNN models, WaveNet has a smaller number of parameters, large receptive fields, and good parallelism. It has good application prospects in solving sequence analysis problems with large datasets (Borovykh et al. 2019; Rizvi et al. 2021). However, WaveNet was initially proposed for audio generation tasks, and it cannot directly address the time series regression problem. Many researchers have improved it and achieved better prediction performance than LSTM or GRU in the prediction tasks on the power load (Wang et al. 2021), traffic flow (Zhang et al. 2021a, 2021b), and air quality (Benhaddi & Ouarzazi 2021) datasets. However, to the best of our knowledge, WaveNet has not been improved for WLP tasks.
This study aims to develop a highly efficient WLP model with a convolutional structure. A more lightweight CNN, named the WaveNet-based convolutional neural network (WCNN), was built by improving WaveNet according to the characteristics of WLP. The GR in China was selected as the study area; water levels 1/2/3 days ahead were predicted for the Waizhou gauging station (abbreviation: Waizhou station), and long-term sequence data from gauging stations were available for verification. To make full use of the ability of the WCNN to extract spatial information, the water level and discharge sequences of three upstream gauging stations were used as auxiliary input features. Two RNN models (LSTM and GRU) were established, and the performance of the WCNN model was evaluated by comparing their results on the test set.
STUDY AREA AND DATA DESCRIPTION
METHODS
Procedure of WLP of models
Feature selection
| Predicted variable | Waizhou water level | Zhangshu water level | Xiajiang water level | Ji'an water level | Waizhou discharge | Zhangshu discharge | Xiajiang discharge | Ji'an discharge |
|---|---|---|---|---|---|---|---|---|
| 1-day ahead water level Zwz | 0.89 | 0.70 | 0.64 | 0.61 | 0.70 | 0.71 | 0.65 | 0.62 |
| 2-day ahead water level Zwz | 0.80 | 0.65 | 0.63 | 0.63 | 0.64 | 0.66 | 0.62 | 0.62 |
| 3-day ahead water level Zwz | 0.73 | 0.61 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.61 |
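The values in the table above quantify the dependence between each candidate input and the predicted Waizhou water level; as noted in the Discussion, they were obtained with the maximal information coefficient (MIC). A minimal sketch of computing MIC with the minepy library is given below; the library choice and parameter values are assumptions, as the study does not specify its implementation.

```python
import numpy as np
from minepy import MINE  # assumed MIC implementation; not specified in the study

def mic(x, y):
    """Maximal information coefficient between two 1-D series."""
    mine = MINE(alpha=0.6, c=15)  # commonly used default parameters
    mine.compute_score(np.asarray(x, float), np.asarray(y, float))
    return mine.mic()

# Hypothetical usage: dependence between the Zhangshu discharge series (q_zs)
# and the 1-day-ahead Waizhou water level series (z_wz_next), both placeholders.
# score = mic(q_zs, z_wz_next)
```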
The WCNN
WaveNet is a deep network model proposed by DeepMind for generating raw audio waveforms; it has large and flexible receptive fields and good parallelism, and it can capture long-term dependencies of sequences (Rethage et al. 2018; van den Oord et al. 2018). WaveNet can be extended to datasets outside of audio, and many achievements have been made in research on time series prediction (Luo et al. 2021; Rueda et al. 2021; Nie et al. 2022).
- (1)
The input layer receives the sequences of the predicted variable X and the auxiliary inputs A; the input shape is [timesteps, features]. As mentioned in Section 3.2, X is Zwz, and A includes Zzs, Zxj, Zja, Qwz, Qzs, Qxj, and Qja in this study. To implement the residual operation, the inputs are first transformed into the output shape of the residual blocks through a causal convolutional layer.
- (2)
The main components of the residual block are the convolutional layer and the ReLU function; together they preserve the order of the data and learn its spatiotemporal nonlinear mapping. Moreover, a temporal-excitation (TE) block based on a squeeze-and-excitation (SE) block (Hu et al. 2019) is proposed to learn the long-term dependencies of the data. The TE block obtains global temporal information by explicitly modelling the relationships between the timesteps of the convolution channel U; its structure is shown in Figure 5. First, the transpose function Ftr(·) is used to swap the temporal and channel axes of U. Then, the excitation operation Fex(·) is used to capture the temporal dependence of U, generating a set of modulation weights for each channel. Specifically, a fully connected (FC) layer with a dimensionality-reduction ratio r (r = 2) and a ReLU function parameterize the nonlinearity between the time steps, followed by an FC layer that restores the dimension and a sigmoid function that scales the weights, as shown in Figure 5. Finally, Ftr(·) is used to restore the coordinate system, and a multiplication operation Fmul(·,·) integrates the result into the backbone network.
The number and size of the convolution kernels are the same for all residual blocks in the WCNN, which ensures that all residual blocks output a uniform shape. As shown in Figure 5, in the first residual block, the predicted variable X and condition A pass through a convolution operation and the ReLU function to obtain the channel U containing temporal and spatial features (Equation (3)). Subsequently, the TE block learns the global temporal information of U to recalibrate the convolutional channel features (Equation (4)). The output of the TE block is fed back into the backbone network by multiplying it with U and is then added to the inputs to obtain the final output of the residual block, z1 (Equations (5) and (6)). Each subsequent residual block receives zk−1 and outputs zk (Equations (7)–(10)). A sketch of this block is given after this list.
- (3)
The output layer is a 1 × 1 convolutional layer with linear activation. The output zK of the last residual block first passes through the ReLU function and then enters the output layer, which outputs O = (ot−n+1, ot−n+2, …, ot), where n is the input length and ot denotes the prediction at the tth timestep, i.e., the required predicted value. The calculation is shown in Equation (11), where zK is the output of the last residual block, Wo and bo represent the weight and bias of the output layer, respectively, and O refers to the result of the output layer.
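The following is a minimal Keras-style sketch of the architecture described above: one causal convolution to match the residual shape, four residual blocks with TE blocks, and a 1 × 1 linear output layer, using the optimal hyperparameters reported later in Table 2 (40 filters, kernel size 5, r = 2). The framework, the exact layer ordering, and the assumption of eight input features follow the textual description only, so details may differ from the authors' implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

TIMESTEPS, FEATURES = 10, 8    # input length and number of input series (X plus seven auxiliary inputs)
FILTERS, KERNEL, R = 40, 5, 2  # filter number and kernel size (Table 2), TE reduction ratio

def te_block(u, timesteps=TIMESTEPS, r=R):
    """Temporal-excitation block: recalibrate channel U with per-timestep weights."""
    x = layers.Permute((2, 1))(u)                            # F_tr: (batch, T, C) -> (batch, C, T)
    x = layers.Dense(timesteps // r, activation="relu")(x)   # excitation with reduction ratio r
    x = layers.Dense(timesteps, activation="sigmoid")(x)     # restore dimension, weights in (0, 1)
    x = layers.Permute((2, 1))(x)                            # F_tr: back to (batch, T, C)
    return layers.Multiply()([u, x])                         # F_mul: feed weights into the backbone

def residual_block(z):
    u = layers.Conv1D(FILTERS, KERNEL, padding="causal", activation="relu")(z)  # channel U
    s = te_block(u)
    return layers.Add()([z, s])                              # residual connection

inputs = layers.Input(shape=(TIMESTEPS, FEATURES))           # [timesteps, features]
z = layers.Conv1D(FILTERS, KERNEL, padding="causal")(inputs) # match the residual-block output shape
for _ in range(4):                                           # four residual blocks (Table 2)
    z = residual_block(z)
z = layers.Activation("relu")(z)
outputs = layers.Conv1D(1, 1, activation="linear")(z)        # 1 x 1 convolution, O = (o_{t-n+1}, ..., o_t)
wcnn = models.Model(inputs, outputs, name="wcnn_sketch")
wcnn.summary()
```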
The WCNN inherits the advantages of WaveNet: it has a lightweight structure, good learning ability and parallelism, and it alleviates network degradation (vanishing/exploding gradient) problems. It differs from the original WaveNet in the following three points:
- (1)
The gated activation function in the residual structure is replaced with the ReLU function, which can learn nonlinear correlations of sequences effectively and makes the WCNN more applicable to 1-D sequential regression problems. This replacement has been proven effective in nonstationary and noisy time series forecasting tasks (Borovykh et al. 2019). In addition, the ReLU function also reduces the complexity of the model and the training time.
- (2)
The dilated convolutional structure is replaced with a TE block. The TE block, based on an SE block, is proposed to learn the long-term temporal dependencies of sequences. It obtains global temporal information by capturing the relationships between time steps and feeds this information back to the convolutional channels, so that the network can select key temporal features for mapping.
- (3)
For application to 1-D sequential regression studies, the softmax distribution of the output layer is replaced with a linear function.
Baseline model
LSTM and GRU are widely used ML models and have already been applied to various WLP tasks (Zhang et al. 2018; Ren et al. 2020; Noor et al. 2022); thus, they were chosen as the baseline models in this study. LSTM solves the short-term memory and vanishing gradient problems by setting three gate structures (i.e., the forget, input, and output gates) and two hidden states (i.e., a short-term state and a long-term state). The specific structure and calculation formulas of LSTM are given in Gers et al. (1999). GRU is a simplified version of LSTM that combines the two state variables in the LSTM unit into one and controls the forget gate and input gate with a single gate controller (Géron 2019).
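For reference, a minimal sketch of the two baseline models with the configuration reported later in Table 2 (two recurrent layers of 100 and 50 units, dropout 0.2, one output neuron) is given below; Keras and eight input features are assumptions. With these settings, Keras reports 73,851 (LSTM) and 55,851 (GRU) trainable parameters, which reproduces the counts given in the next subsection.

```python
from tensorflow.keras import layers, models

def build_baseline(cell="lstm", timesteps=10, features=8):
    """Two-layer recurrent baseline (100 and 50 units, dropout 0.2), as in Table 2."""
    rnn = layers.LSTM if cell == "lstm" else layers.GRU
    return models.Sequential([
        layers.Input(shape=(timesteps, features)),
        rnn(100, return_sequences=True),  # first recurrent layer
        layers.Dropout(0.2),
        rnn(50),                          # second recurrent layer
        layers.Dropout(0.2),
        layers.Dense(1),                  # predicted water level at the lead time
    ])

lstm_model = build_baseline("lstm")
gru_model = build_baseline("gru")
print(lstm_model.count_params(), gru_model.count_params())  # 73,851 and 55,851 with 8 input features
```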
Model parameter settings and evaluation metrics
The hyperparameters have a great influence on the prediction performance of a model. The basic network architecture of the WCNN differs from that of the LSTM and GRU models; thus, its hyperparameters also differ. The network hyperparameters of the WCNN model include the number of residual blocks and the kernel size and filter number of the convolutional layers. The network hyperparameters of the LSTM and GRU models are the number of recurrent layers and the number of neurons in each layer. The training-stage hyperparameters of the three models are the same: optimizer, epochs, batch size, and learning rate. For the above hyperparameters, the grid search method was used to determine the optimal values: one value was selected from the range of each hyperparameter, the resulting combinations were used to build candidate models, and each candidate was trained on the training data and evaluated on the validation set. The optimal hyperparameter combination was determined by comparing the loss between the predicted and measured values on the validation set under the different parameter combinations. The ranges of the hyperparameters and the optimal configurations of the three models are shown in Table 2. After optimization by grid search, the optimal configurations are as follows: the WCNN model consists of four residual blocks, and the filter number and kernel size of each residual block are 40 and 5, respectively. The LSTM and GRU models consist of two recurrent layers with 100 and 50 neurons in the first and second layers, respectively, and a dropout rate of 0.2. The optimal training-stage hyperparameters of the three models are the same: Adam is selected as the optimizer, the batch size is set to 100, and the timestep is 10. The initial learning rate is set to 0.001 and is reduced by 50% whenever the validation loss does not decrease for 20 epochs.
| Parameters | Optimization range | WCNN | LSTM | GRU |
|---|---|---|---|---|
| Layers | [1, 2, 3] | – | 2 | 2 |
| Neurons | [50, 100, 150] | – | [100, 50] | [100, 50] |
| Dropout | [0.1, 0.2, 0.3, 0.4, 0.5] | – | 0.2 | 0.2 |
| Residual blocks | [1, 2, 3, 4, 5, 6] | 4 | – | – |
| Kernel size | [1, 2, 3, 4, 5, 6, 7, 8] | 5 | – | – |
| Filter number | [20, 30, 40, 50, 60] | 40 | – | – |
| Timesteps (input length) | [6, 10, 17, 30, 45] | 10 | 10 | 10 |
| Optimizer | [SGD, RMSprop, Adam, Nadam] | Adam | Adam | Adam |
| Epochs | [50, 100, 200, 300, 400, 500] | 500 | 500 | 500 |
| Batch size | [32, 64, 100, 128, 256] | 100 | 100 | 100 |
| Initial learning rate | [0.01, 0.005, 0.001, 0.0005, 0.0001] | 0.001 | 0.001 | 0.001 |
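A minimal sketch of the training configuration in Table 2 (Adam, initial learning rate 0.001 halved when the validation loss stalls for 20 epochs, batch size 100, up to 500 epochs) is given below. It assumes Keras, a mean squared error loss, and the `wcnn` model from the earlier sketch, with random placeholder arrays standing in for the real series.

```python
import numpy as np
from tensorflow.keras import optimizers, callbacks

# Placeholder arrays with the shapes used in this study (10 timesteps, 8 features);
# the real training and validation series would be substituted here.
x_train, y_train = np.random.rand(1000, 10, 8), np.random.rand(1000, 10, 1)
x_val, y_val = np.random.rand(200, 10, 8), np.random.rand(200, 10, 1)

wcnn.compile(optimizer=optimizers.Adam(learning_rate=0.001), loss="mse")  # loss is an assumption

reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=20, verbose=1)  # halve LR after 20 stagnant epochs

history = wcnn.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=500, batch_size=100,
    callbacks=[reduce_lr])
```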
The number of training parameters is an important measure of model training efficiency: the fewer the training parameters, the shorter the training time. Under the same input conditions identified in Section 3.2, the numbers of training parameters of the WCNN, LSTM, and GRU models were 32,937, 73,851, and 55,851, respectively. The WCNN model had the fewest parameters, and the LSTM model had the most.
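The four metrics reported in the next section (RMSE, MAE, NSE, and R2) can be computed from observed and simulated water levels as in the following NumPy sketch, which uses their standard definitions (with R2 taken as the squared Pearson correlation); the paper's own formulas are assumed to be equivalent.

```python
import numpy as np

def evaluate(obs, sim):
    """RMSE, MAE, Nash-Sutcliffe efficiency (NSE) and coefficient of determination (R2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    err = sim - obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    nse = float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2))
    r2 = float(np.corrcoef(obs, sim)[0, 1] ** 2)  # squared Pearson correlation
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "R2": r2}

# Toy example (values are illustrative, not from the study):
print(evaluate([16.2, 16.5, 17.1, 18.0], [16.3, 16.4, 17.0, 18.2]))
```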
RESULTS
Performance comparison of different models
The WCNN, LSTM and GRU models were built to simulate Zwz 1/2/3 days ahead, and the results are shown in Table 3. For the 1-day-ahead prediction over all water levels, the MAE of the WCNN model is 26.30% and 34.98% lower than those of the LSTM and GRU models, respectively, and the RMSE is 19.97% and 24.93% lower, respectively. The NSE and R2 both improve to 0.998 on the complete test set.
| Range of water level | Metrics | WCNN t+1 | WCNN t+2 | WCNN t+3 | LSTM t+1 | LSTM t+2 | LSTM t+3 | GRU t+1 | GRU t+2 | GRU t+3 |
|---|---|---|---|---|---|---|---|---|---|---|
| All water levels | MAE | 0.038 | 0.115 | 0.213 | 0.052 | 0.127 | 0.222 | 0.059 | 0.131 | 0.238 |
| | RMSE | 0.057 | 0.171 | 0.333 | 0.071 | 0.179 | 0.337 | 0.076 | 0.180 | 0.348 |
| | NSE | 0.998 | 0.979 | 0.920 | 0.996 | 0.977 | 0.918 | 0.996 | 0.977 | 0.912 |
| | R2 | 0.998 | 0.980 | 0.921 | 0.997 | 0.979 | 0.920 | 0.997 | 0.979 | 0.918 |
| Water level < 17 m | MAE | 0.021 | 0.047 | 0.077 | 0.036 | 0.077 | 0.087 | 0.071 | 0.092 | 0.104 |
| | RMSE | 0.031 | 0.067 | 0.102 | 0.044 | 0.089 | 0.112 | 0.080 | 0.106 | 0.130 |
| | NSE | 0.982 | 0.916 | 0.801 | 0.964 | 0.848 | 0.758 | 0.878 | 0.786 | 0.676 |
| | R2 | 0.982 | 0.922 | 0.831 | 0.977 | 0.914 | 0.819 | 0.973 | 0.904 | 0.805 |
| 17 m ≤ Water level ≤ 19 m | MAE | 0.041 | 0.117 | 0.204 | 0.052 | 0.120 | 0.215 | 0.048 | 0.121 | 0.230 |
| | RMSE | 0.058 | 0.153 | 0.278 | 0.069 | 0.159 | 0.285 | 0.064 | 0.159 | 0.297 |
| | NSE | 0.990 | 0.930 | 0.768 | 0.986 | 0.924 | 0.757 | 0.988 | 0.924 | 0.736 |
| | R2 | 0.990 | 0.941 | 0.805 | 0.987 | 0.933 | 0.805 | 0.989 | 0.936 | 0.807 |
| Water level > 19 m | MAE | 0.059 | 0.219 | 0.466 | 0.078 | 0.230 | 0.469 | 0.071 | 0.229 | 0.484 |
| | RMSE | 0.082 | 0.297 | 0.619 | 0.106 | 0.306 | 0.621 | 0.101 | 0.301 | 0.633 |
| | NSE | 0.985 | 0.809 | 0.170 | 0.976 | 0.797 | 0.165 | 0.978 | 0.804 | 0.131 |
| | R2 | 0.986 | 0.832 | 0.451 | 0.979 | 0.842 | 0.442 | 0.978 | 0.836 | 0.427 |
Influence of different inputs on results
Influence of hyperparameters
DISCUSSION
The proposed WCNN model was compared with the LSTM and GRU models for WLP 1/2/3 days ahead at the Waizhou station of the GR, China. The results show that the WCNN model achieved the best prediction with the fewest training parameters, which is consistent with the findings of WaveNet variants proposed for air quality prediction and other sequence modelling tasks (Benhaddi & Ouarzazi 2021; Zhang et al. 2021a). The WCNN model can be generalized to other rivers with different climate conditions, because the model inputs (i.e., water level and discharge data of the gauging stations) reflect the characteristics of the weather conditions, and the data of the predicted station and its upstream stations are easy to obtain.
On the other hand, according to the physical characteristics of flow propagation in rivers, the water level and discharge sequences of gauging stations were selected as the model inputs. Reasonable input features were selected by calculating the MIC of the variables, referring to the studies of Lu et al. (2021) and Zhang et al. (2021b), and the influence of the input features on the prediction results was quantified. The results showed that adding variables highly correlated with the predicted variable to the model inputs could improve prediction performance. However, as more input features were added, the gain in prediction accuracy gradually slowed while the training time of the models increased. Therefore, when selecting input features, it is necessary to weigh the time cost of additional input features against the degree of performance improvement. When setting the model parameters, the grid search method was used to optimize the parameters and determine the optimal values. The influence of the main hyperparameters of the WCNN model, such as the input length, layer number, kernel size, and filter number, on the prediction results was analyzed and discussed in Section 4.3. Various other methods exist for hyperparameter optimization, such as random search, Bayesian optimization, and genetic algorithms; these methods will be explored in our future research.
| Range of water level | Metrics | WCNN t+1 | WCNN t+2 | WCNN t+3 | LSTM t+1 | LSTM t+2 | LSTM t+3 | GRU t+1 | GRU t+2 | GRU t+3 |
|---|---|---|---|---|---|---|---|---|---|---|
| All water levels | MAE | 0.063 | 0.180 | 0.370 | 0.088 | 0.181 | 0.316 | 0.097 | 0.212 | 0.408 |
| | RMSE | 0.098 | 0.267 | 0.503 | 0.123 | 0.281 | 0.470 | 0.130 | 0.284 | 0.517 |
| | NSE | 0.997 | 0.981 | 0.934 | 0.996 | 0.979 | 0.943 | 0.996 | 0.979 | 0.930 |
| | R2 | 0.998 | 0.982 | 0.936 | 0.997 | 0.980 | 0.945 | 0.996 | 0.981 | 0.939 |
| Water level < 14 m | MAE | 0.047 | 0.127 | 0.245 | 0.099 | 0.107 | 0.183 | 0.079 | 0.208 | 0.452 |
| | RMSE | 0.069 | 0.164 | 0.298 | 0.113 | 0.147 | 0.248 | 0.097 | 0.235 | 0.492 |
| | NSE | 0.970 | 0.837 | 0.454 | 0.921 | 0.869 | 0.623 | 0.941 | 0.665 | 0.486 |
| | R2 | 0.975 | 0.894 | 0.670 | 0.972 | 0.903 | 0.744 | 0.957 | 0.897 | 0.696 |
| 14 m ≤ Water level ≤ 16 m | MAE | 0.060 | 0.172 | 0.390 | 0.070 | 0.179 | 0.298 | 0.107 | 0.173 | 0.312 |
| | RMSE | 0.097 | 0.258 | 0.516 | 0.111 | 0.267 | 0.411 | 0.146 | 0.252 | 0.418 |
| | NSE | 0.971 | 0.792 | 0.171 | 0.962 | 0.778 | 0.473 | 0.933 | 0.802 | 0.454 |
| | R2 | 0.971 | 0.809 | 0.500 | 0.962 | 0.815 | 0.601 | 0.934 | 0.824 | 0.560 |
| Water level > 16 m | MAE | 0.082 | 0.245 | 0.479 | 0.096 | 0.261 | 0.479 | 0.103 | 0.261 | 0.474 |
| | RMSE | 0.122 | 0.353 | 0.641 | 0.145 | 0.386 | 0.670 | 0.141 | 0.357 | 0.633 |
| | NSE | 0.994 | 0.946 | 0.823 | 0.991 | 0.936 | 0.806 | 0.991 | 0.945 | 0.827 |
| | R2 | 0.994 | 0.949 | 0.853 | 0.992 | 0.942 | 0.836 | 0.993 | 0.948 | 0.854 |
CONCLUSIONS
To improve the effectiveness of WLP models on large datasets, a new CNN model named the WCNN was proposed by adapting the WaveNet architecture from the audio generation domain. It was applied to predict the water level at the Waizhou station of the GR in China. The main conclusions are as follows:
- (1)
For the 1/2/3-days-ahead WLP of the Waizhou station, the WCNN, LSTM, and GRU models all achieved good prediction accuracy. Moreover, the WCNN showed better prediction performance than the LSTM and GRU baselines. Compared with the other two models, the WCNN had a lighter structure, the fewest training parameters, and good parallelism, which significantly reduced the training time. It is therefore a more efficient method for WLP.
- (2)
It is recommended to select the water levels and discharges of upstream stations that are highly correlated with the prediction station as input features to improve prediction accuracy. However, beyond a certain number of input features, the gain in prediction accuracy gradually slows while the training time of the models increases. The time and data costs associated with adding input features should therefore be weighed against the incremental performance improvement.
ACKNOWLEDGEMENTS
This work was supported by the National Key Research and Development Programme of China (2021YFD1700802).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.