Abstract
River flow prediction is a challenging problem due to highly nonlinear hydrological processes and high spatio-temporal variability. Here we present a hybrid network of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for river flow prediction. The hybridization enables accurate identification of the spatial and temporal features in precipitation. A shortcut layer is used as an additional channel for passing input features through the deep network to increase feature diversity. The flows in the Hun River Basin, China are predicted using the trained hybrid network and are compared with the results from the Soil and Water Assessment Tool (SWAT) model. The results demonstrate that the learning efficiency of the hybrid network is greatly affected by its structure and parameters, including the number of convolutional layers and LSTM cell layers, the step size of pooling, and the training data size. Further, the shortcut layer can effectively solve the diversity reduction problem in a deep network. The hybrid network is shown to have a predictive performance similar to that of SWAT but is superior in wet seasons due to its nonlinear learning ability. This study shows that the hybrid network has great promise for learning the nonlinear and highly variable spatio-temporal relationships involved in river flow forecasting.
HIGHLIGHTS
Developed a hybrid convolutional neural network and long short-term memory network (CNN-LSTM) for hydrological process prediction.
The performances of the network structures and the effects of shortcut layers are evaluated separately.
CNN-LSTM has good predictive accuracy compared to Soil and Water Assessment Tool (SWAT) model.
INTRODUCTION
Hydrological processes are normally characterized by a high degree of nonlinearity and spatio-temporal variability. Traditional statistical models and lumped hydrological models can address the temporal variability in precipitation and flow time series; however, they struggle to represent spatial variability (Shi et al. 2011). Distributed hydrological models have been developed to represent the temporal and spatial variability of hydrometeorological data (Zamani et al. 2020). However, they are computationally expensive and difficult to parameterize, in particular due to the equifinality phenomenon (Beven & Binley 1992).
Data-driven models, an alternative to hydrological models, have been shown to be able to learn rainfall-runoff relationships directly from data (Nourani 2017). Artificial neural networks (ANNs), one of the most popular data-driven models, have been widely applied in hydrological modelling for their strong nonlinear fitting ability (ASCE Task Committee 2000). Traditional ANNs, however, cannot automatically extract input data features or effectively represent spatial variability, and thus their practical applications are limited. With recent advances in deep learning technologies, various algorithms such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied to solve water management problems including hydrological predictions (Kratzert et al. 2018; Shen 2018; Bi et al. 2020).
The CNN has a superior capability in capturing spatial data features (Krizhevsky et al. 2012) and has played a key role in the recent development of deep learning (LeCun et al. 2015). The prototype of CNNs originated from the convolutional neural layer (CNL) constructed by Fukushima (1980). LeCun et al. (1989) then constructed a CNN using back-propagation and applied it to word recognition problems. Subsequently, LeCun & Bottou (1998) proposed the LeNet-5 network to improve recognition accuracy. Krizhevsky et al. (2012) proposed an efficient CNN named AlexNet to classify the high-resolution images in the LSVRC-2010 ImageNet training set. Szegedy et al. (2015) established a network named GoogLeNet with more than 20 layers in the ImageNet Large-Scale Visual Recognition Challenge 2014; compared with AlexNet, GoogLeNet uses 12 times fewer parameters and has a higher classification capability. Since then, more networks such as ResNet and DenseNet have been proposed, which promoted the application of CNNs to real-world problems such as leakage detection and localization (Zhou et al. 2019). In hydrological modelling, Ge et al. (2018) constructed a deep convolutional neural network for soil moisture estimation from satellite observations, which achieved better performance than traditional neural networks. Pan et al. (2019) constructed a CNN for statistical downscaling in daily precipitation prediction. Wang et al. (2020) constructed a CNN for flood susceptibility assessment using 13 flood-triggering factors related to historical flood events. Song et al. (2020) developed a novel 1D CNN with a ReLU activation function for rainfall-runoff modelling.
The RNN has a strong learning ability for time series prediction. During training, however, it can suffer from the vanishing and exploding gradient problems, which make it difficult to learn long-range dependencies (Bengio et al. 1994). To overcome this deficiency, Hochreiter & Schmidhuber (1997) proposed the long short-term memory (LSTM) network for learning long-term dependence. Based on the LSTM network, many variants have been constructed to improve the learning ability for different tasks (Xu et al. 2020). At present, the LSTM has been successfully used in speech recognition and text translation (Wu et al. 2016; Rocha et al. 2019). In recent years, the LSTM has also been tested in watershed hydrological modelling, and its potential has been demonstrated in many applications such as river flow and flood predictions (Shen 2018). Kratzert et al. (2018) applied the LSTM network to simulate daily flows and found that it greatly outperforms hydrological models calibrated both at the regional level and at the individual basin level. Lee et al. (2018) developed an LSTM for daily runoff simulations based on the water level data of 10 stations in the upper Mekong River, and showed that the LSTM performs better than the Soil and Water Assessment Tool (SWAT) model. Sahoo et al. (2019) applied the LSTM to forecast daily flows during low-flow periods in the Mahanadi River Basin, India. Hu et al. (2018) tested the LSTM on 98 flood events and indicated that the LSTM model outperforms conceptual and physical models. Muhammad et al. (2019) proposed a hybrid model combining LSTM and gated recurrent unit (GRU) models for river flow simulations, which was used for early flood warning. Le et al. (2019) used the LSTM in modelling 1-, 2-, and 3-day flood events in Vietnam's Da River basin. Li et al. (2020) proposed a self-attention LSTM network, which can effectively capture the dependence between time series and extract key hydrologic features to address the problems of low accuracy and limited hydrological data. Xu et al. (2020) constructed an LSTM network for river flow prediction and evaluated the impacts of network structures and parameters on its performance; the results show that the batch size and the number of LSTM cells are sensitive parameters that should be carefully tuned to achieve a balance between learning efficiency and stability. Li et al. (2021) developed a sequence-to-sequence LSTM network for flood forecasting and compared it with the output of a gridded surface subsurface hydrologic analysis (GSSHA) model; the results showed that the LSTM model efficiently predicted discharge and achieved good performance. The LSTM was also able to predict surface water temperature in a deep reservoir when trained using outputs from a 3-D hydrodynamic model (Wang et al. 2022). In summary, previous research has shown that the LSTM performs well in river flow prediction.
To solve problems with spatio-temporal characteristics, convolutional operations on 3D tensors have been embedded within the LSTM network for input data processing. Shi et al. (2015) constructed a convolutional LSTM network (ConvLSTM) for precipitation nowcasting. In the ConvLSTM, the radar map is divided into tiled non-overlapping patches and the pixels inside a patch are regarded as its measurements. To encode the spatial information, all the inputs, cell outputs, hidden states and gates of the ConvLSTM are 3D tensors whose last two dimensions are spatial dimensions (rows and columns). Based on the ConvLSTM network, Cao et al. (2019) presented an effective network, the Star-Bridge ConvLSTM, for precipitation prediction; its star-shaped information bridge adds information from the last time step to make the feature flow in the multi-layer ConvLSTM more robust. Kumar et al. (2020) developed the Convcast architecture, in which three ConvLSTM layers are stacked for spatial and temporal learning and are followed by a 3D convolutional layer to predict precipitation in the next 30 min. Despite these seminal efforts, some deficiencies remain in the ConvLSTM network. Because it has no pooling or standard convolutional neural layers, its convergence performance can degrade rapidly when working with high-resolution data (Cao et al. 2019; Chen et al. 2020).
With the development of the CNN, combining the CNN and LSTM has been applied to problems with both spatial and temporal characteristics. Donahue et al. (2017) proposed a long-term recurrent convolutional network by combining CNN and LSTM networks to handle video recognition tasks, retrieval problems, and video narration challenges; the information in the video frames can be quickly and efficiently extracted by CNNs. Tsironi et al. (2017) proposed a convolutional LSTM for gesture recognition, which outperformed both the CNN and the LSTM used separately. The pooling and convolutional neural layers give the CNN the ability to extract features from high-resolution data. Baek et al. (2020) combined the CNN and LSTM networks in parallel for predicting water levels and water quality concentrations, with the CNN applied to extract features from high-resolution radar data. Shao et al. (2020) proposed a novel hybrid deep model for multiple forecasts, in which the CNN and LSTM are constructed in parallel and the features extracted by each are concatenated and fused by flattening statistics components. These results show that the hybrid CNN and LSTM network has a strong ability to learn from spatial and temporal data; however, research on its application in hydrological modelling is still lacking.
The main aim of this study is to develop a new hybrid CNN and LSTM network (CNN-LSTM) for river flow prediction. The CNN-LSTM network has the ability to identify the spatial and temporal precipitation patterns. Specifically, the CNN is used to identify and extract the spatial patterns in precipitation data. Then the LSTM is used to learn the time series relationships between precipitation and flows. In this study, the Hun River Basin in China is taken as a case study to evaluate the learning efficiency of the hybrid network and its performance for flow prediction. The Soil and Water Assessment Tool (SWAT) model is used as a baseline model for comparison purposes.
METHODOLOGY
In this section, the CNN-LSTM network for flow prediction is first presented, with its key components explained. An interpolation model is then introduced to process the precipitation data in the case study. Finally, the evaluation criteria are introduced.
CNN-LSTM network
The architecture of the CNN-LSTM network is shown in Figure 1, with the key components including the input matrices, CNN convolution, shortcut layer, flatten, concatenation, fully connected layer, and LSTM cells. The precipitation data, as input matrices, are convolved with filters to generate feature matrices. A filter is a matrix that can be regarded as the eye of the network, scanning the input matrices for feature extraction. The feature matrices are pooled using the maximum pooling method to extract useful information. The shortcut layer is used as another channel to extract input information; it is essentially a pooling process and increases the diversity of the extracted information. Then, the feature matrices are flattened into two vectors, which are concatenated and transferred through a fully connected layer to the LSTM cells as an input vector. The LSTM cells have memory abilities to learn the rainfall-runoff relationship. Finally, the outputs of the LSTM cells are transformed by the fully connected layers into the output runoff vector.
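A minimal PyTorch sketch of this architecture is given below for illustration only; the filter settings, layer sizes and pooling factors are assumptions chosen to mirror Figure 1 and the scenario tables that follow, not the exact configuration trained in this study.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Illustrative CNN-LSTM with a shortcut pooling channel (sizes are assumptions)."""

    def __init__(self, height=111, width=87, lstm_hidden=10):
        super().__init__()
        # Convolution channel: two (convolution, max-pooling) stages extract spatial features
        self.conv = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
        )
        # Shortcut channel: pooling only, passing a coarse copy of the input forward
        self.shortcut = nn.MaxPool2d(16)
        conv_dim = ((height // 4) // 4) * ((width // 4) // 4)   # 6 x 5 = 30
        short_dim = (height // 16) * (width // 16)              # 6 x 5 = 30
        # Fully connected layer compresses the concatenated features into the LSTM input
        self.fc_in = nn.Linear(conv_dim + short_dim, lstm_hidden)
        self.lstm = nn.LSTM(lstm_hidden, lstm_hidden, num_layers=1, batch_first=True)
        # Fully connected head maps the LSTM outputs to the runoff of each period
        self.head = nn.Sequential(nn.Linear(lstm_hidden, 50), nn.ReLU(),
                                  nn.Linear(50, 30), nn.ReLU(),
                                  nn.Linear(30, 1))

    def forward(self, x):                       # x: [batch, time, height, width] precipitation
        b, t, h, w = x.shape
        maps = x.reshape(b * t, 1, h, w)
        feats = torch.cat([self.conv(maps).flatten(1),
                           self.shortcut(maps).flatten(1)], dim=1)  # concatenate channels
        seq = self.fc_in(feats).reshape(b, t, -1)
        out, _ = self.lstm(seq)
        return self.head(out).squeeze(-1)       # [batch, time] simulated flows


flows = CNNLSTM()(torch.rand(2, 36, 111, 87))   # e.g. 36 ten-day periods per year
print(flows.shape)                              # torch.Size([2, 36])
```

In this sketch each precipitation field passes through both the convolution channel and the shortcut channel, and the concatenated features form the LSTM input at that time step.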
Convolution
Maximum pooling
Flatten and concatenation
The high-dimensional feature matrices are converted by the Flatten function into one-dimensional vectors. The resulting vectors from the convolution and shortcut channels are then concatenated into a single one-dimensional vector by the concatenation function.
Fully connected layer
LSTM cell structure
Figure 2 shows the structure of the LSTM cell. There are two key states in the LSTM cell, i.e., the cell state and the hidden state. In Figure 2, Ct−1 and Ht−1 represent the cell state and hidden state at time step t−1, respectively. The information in the hidden states can be added to or removed from the cell state (Le et al. 2019), which is controlled by the forget gate, input gate and output gate, represented by the dashed boxes in Figure 2. The LSTM cell uses these gates to control the memory process and avoid the long-term dependency problem (Hochreiter & Schmidhuber 1997). The gates are constituted by five simple neural networks, including three sigmoid networks and two tanh networks.
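The cell equations are not reproduced above; for reference, the standard LSTM formulation (Hochreiter & Schmidhuber 1997), which the description of the three gates follows, can be written as:

$$
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[H_{t-1}, x_t] + b_f\right) &&\text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[H_{t-1}, x_t] + b_i\right) &&\text{(input gate)}\\
\tilde{C}_t &= \tanh\!\left(W_C\,[H_{t-1}, x_t] + b_C\right) &&\text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t &&\text{(cell state update)}\\
o_t &= \sigma\!\left(W_o\,[H_{t-1}, x_t] + b_o\right) &&\text{(output gate)}\\
H_t &= o_t \odot \tanh\!\left(C_t\right) &&\text{(hidden state)}
\end{aligned}
$$

where $x_t$ is the input vector at time step $t$, $W$ and $b$ are the weights and biases of the gate networks, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.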
Loss function
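The loss function is not shown here; assuming the commonly used mean squared error (MSE) between simulated and observed flows, the training loss would take the form:

$$
L = \frac{1}{N}\sum_{t=1}^{N}\left(Q_t^{\mathrm{sim}} - Q_t^{\mathrm{obs}}\right)^{2}
$$

where $N$ is the number of training samples, and $Q_t^{\mathrm{sim}}$ and $Q_t^{\mathrm{obs}}$ are the simulated and observed flows of period $t$.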
Inverse distance weighting
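Assuming the conventional inverse distance weighting form, the precipitation at each grid cell is interpolated from the surrounding stations as:

$$
P_{g} = \frac{\sum_{i=1}^{n} d_{gi}^{-m}\, P_{i}}{\sum_{i=1}^{n} d_{gi}^{-m}}
$$

where $P_g$ is the interpolated precipitation at grid cell $g$, $P_i$ is the precipitation observed at station $i$, $d_{gi}$ is the distance between the cell and the station, $n$ is the number of stations, and $m$ is the power exponent (commonly $m=2$).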
Model evaluation criteria
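The three criteria referred to later, the Nash-Sutcliffe efficiency (NSE), the coefficient of determination (R2) and the relative error (RE), are assumed here to follow their conventional definitions:

$$
\mathrm{NSE} = 1 - \frac{\sum_{t}\left(Q_t^{\mathrm{obs}} - Q_t^{\mathrm{sim}}\right)^2}{\sum_{t}\left(Q_t^{\mathrm{obs}} - \bar{Q}^{\mathrm{obs}}\right)^2},\qquad
R^2 = \left[\frac{\sum_{t}\left(Q_t^{\mathrm{obs}} - \bar{Q}^{\mathrm{obs}}\right)\left(Q_t^{\mathrm{sim}} - \bar{Q}^{\mathrm{sim}}\right)}{\sqrt{\sum_{t}\left(Q_t^{\mathrm{obs}} - \bar{Q}^{\mathrm{obs}}\right)^2}\sqrt{\sum_{t}\left(Q_t^{\mathrm{sim}} - \bar{Q}^{\mathrm{sim}}\right)^2}}\right]^2,\qquad
\mathrm{RE} = \frac{\left|\sum_{t} Q_t^{\mathrm{sim}} - \sum_{t} Q_t^{\mathrm{obs}}\right|}{\sum_{t} Q_t^{\mathrm{obs}}}\times 100\%
$$

where $Q_t^{\mathrm{obs}}$ and $Q_t^{\mathrm{sim}}$ are the observed and simulated flows, and $\bar{Q}^{\mathrm{obs}}$ and $\bar{Q}^{\mathrm{sim}}$ are their means; the volume-based form of RE shown here is one common choice.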
CASE STUDY
Hun River Basin
In this study, the Hun River Basin is taken as the case study area. It is located in northeast China, as shown in Figure 3. The river basin covers 124°43′–126°50′ E and 40°40′–42°15′ N with an approximate area of 15,000 km2. Precipitation is governed by a temperate continental monsoon climate, and 70% of the annual precipitation occurs from June to September.
Data
In the Hun River Basin, 10-day precipitation data from 12 meteorological stations and flow data at the Huanren (HR) hydrological station were obtained for 1970 to 2010. The data from 1970 to 1999 are used to train the CNN-LSTM and calibrate the SWAT. The data from 2000 to 2010 are used to verify the performances of the two models.
Comparison model
In this study, the SWAT model is constructed as a comparison model to evaluate the performance of the CNN-LSTM network. The SWAT model is a distributed hydrological model developed by the U.S. Department of Agriculture, Agricultural Research Service (Lee & Bae 2018). In this study, the SWAT model is calibrated using the SWAT Calibration and Uncertainty Procedure (SWAT-CUP) program (Abbaspour et al. 2015). The calibrated values of the key parameters for the Hun River Basin are shown in Table 1.
RESULTS AND DISCUSSION
In this section, the impacts of the network structure on the learning efficiency are first evaluated. Then the flow simulation performance of the CNN-LSTM is evaluated by comparing with the SWAT model.
SWAT key parameters for Hun River Basin
| Parameter | Definition | Hun River |
|---|---|---|
| CN2 | SCS runoff curve number for moisture condition II | 72 |
| ALPHA_BF | Base flow recession constant (days) | 0.8 |
| GW_DELAY | Delay time for aquifer recharge (days) | 31 |
| CH_N2 | Manning's n value for the main and tributary channels | 0.1 |
| CH_K2 | Effective hydraulic conductivity of channel (mm/hr) | 5 |
| SOL_AWC | Available water capacity (mm/mm) | 0.005 |
| SMTMP | Snow melt minimum temperature (°C) | −1 |
| CANMX | Canopy maximum storage (mm) | 4 |
| ESCO | Soil evaporation compensation factor (mm) | 0.95 |
| REVAPMN | Threshold depth for evaporation to occur (mm) | 71 |
Learning efficiency evaluation
The learning efficiencies of the CNN-LSTM are evaluated and compared using different CNN structures, LSTM cell layers, fully connected layers and training data sizes. In this study, the CNN-LSTM network training ends after 500 epochs.
CNN structures
The convolutional and pooling layers in the CNN-LSTM act as the eyes of the network, reading and identifying the input data. Two CNN structures, namely the CNN and the CNN-shortcut, are compared with regard to their effects on learning efficiency.
(1) Learning efficiencies of the CNN structures
The input data are extracted by the CNN layers, and the performance is affected by the CNN structure. In this section, the performances of the CNN with different numbers of convolutional layers, pooling layers, and step sizes of each pooling layer are evaluated using four structure scenarios, as shown in Table 2. In this study, the study catchment is divided into a grid of 111×87 resolution. The input matrices are represented as [360,111,87], where 360 is the batch size, and 111 and 87 are the height and width of the matrix, respectively.
The four structural scenarios of the CNN
| Layers | Scenario A1 | Scenario A2 | Scenario A3 | Scenario A4 |
|---|---|---|---|---|
| Input matrices | [360,111,87] | [360,111,87] | [360,111,87] | [360,111,87] |
| Convolutional layer 1 | [360,111,87] | [360,111,87] | [360,111,87] | [360,111,87] |
| Maximum pooling 1 | [360,55,43] (np=2) | [360,27,21] (np=4) | [360,27,21] (np=4) | [360,55,43] (np=2) |
| Convolutional layer 2 | [360,55,43] | [360,27,21] | [360,27,21] | [360,55,43] |
| Maximum pooling 2 | [360,27,21] (np=2) | [360,13,10] (np=2) | [360,6,5] (np=4) | [360,27,21] (np=2) |
| Convolutional layer 3 | — | — | — | [360,27,21] |
| Maximum pooling 3 | — | — | — | [360,13,10] (np=2) |
| Convolutional layer 4 | — | — | — | [360,13,10] |
| Maximum pooling 4 | — | — | — | [360,6,5] (np=2) |
| Flatten | [360,567] | [360,130] | [360,30] | [360,30] |
| Concatenate | [360,567] | [360,130] | [360,30] | [360,30] |
| Fully connected layer | [360,10] | [360,10] | [360,10] | [360,10] |
| LSTM cell layer (1 layer and 10 cells) | [360,10] | [360,10] | [360,10] | [360,10] |
| Fully connected layer 1 | [360,50] | [360,50] | [360,50] | [360,50] |
| Fully connected layer 2 | [360,30] | [360,30] | [360,30] | [360,30] |
| Fully connected layer 3 (Output layer) | [360,1] | [360,1] | [360,1] | [360,1] |
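The Flatten dimensions in Table 2 can be reproduced with a short sketch; the example below compares Scenarios A1 and A3, with the filter size and number of filters being illustrative assumptions since the table only fixes the pooling step sizes.

```python
import torch
import torch.nn as nn

x = torch.rand(360, 1, 111, 87)            # precipitation input: [batch, channel, H, W]

def conv_pool(pool_sizes):
    """Stack one (convolution, max-pooling) pair per entry; filter settings are assumed."""
    layers = []
    for np_ in pool_sizes:
        layers += [nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.MaxPool2d(np_)]
    return nn.Sequential(*layers)

a1 = conv_pool([2, 2])(x).flatten(1)       # Scenario A1: [360, 27*21] = [360, 567]
a3 = conv_pool([4, 4])(x).flatten(1)       # Scenario A3: [360, 6*5]   = [360, 30]
print(a1.shape, a3.shape)
```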
Figure 4 shows the loss value variations of the four structure scenarios in Table 2 over five independent trainings, which represent the learning efficiency of the networks. Figure 4(a) shows the variations of the CNN-LSTM based on Scenario A1. The loss values stop decreasing after 55 epochs of training, which indicates that the rainfall-runoff relationship is not captured well in any of the five trainings. The reason is that this CNN structure does not extract the input information compactly enough: the Flatten dimension is high and contains redundant information.
The loss value variations of the CNN-LSTM with different CNN structures.
By increasing the number of convolutional and pooling layers and the step size of pooling (np), the CNN can gradually reduce the amount of information passed to Flatten. Compared with Scenario A1, Scenario A2 increases the step size of the first pooling layer from 2 to 4, and the Flatten dimension of Scenario A2 is 4.36 times smaller than that of Scenario A1. Two of the five trainings successfully map the relationship between input and output data, as shown in Figure 4(b).
Scenarios A3 and A4 have the same Flatten dimension, which is 18.9 times smaller than that of Scenario A1. All five trainings of Scenarios A3 and A4 succeeded, as shown in Figure 4(c) and 4(d). The numbers of convolutional and maximum pooling layers in Scenarios A3 and A4 are two and four, respectively, and their pooling step sizes are 4 and 2, respectively. Thus, the number of neurons in Scenario A3 is smaller than that in Scenario A4. Comparing the loss values from 0 to 150 epochs in Figure 4(c) and 4(d), the loss values of Scenario A4 decrease faster than those of Scenario A3 in three trainings, which shows that Scenario A4 has a stronger learning ability. However, Scenario A4 has more weights to be calibrated. As the number of weights increases, the network is more likely to fall into a local optimum and its convergence slows down. Thus, in two trainings the loss values of Scenario A4 decrease more slowly towards the optimal solution between 0 and 150 epochs.
(2) Learning efficiency with different CNN-shortcut structures
The CNN-shortcut structure includes the shortcut layer shown in Figure 1. Based on Scenarios A2 and A3, four structure scenarios are constructed by adding the shortcut layer, as shown in Table 3. The loss value variations of the four scenarios are shown in Figure 5. Comparing the variations of Scenarios A2, B1 and B2, the number of successful trainings gradually increases in Scenarios B1 and B2, as shown in Figure 5(a) and 5(b). The results indicate that the learning efficiency is improved by the shortcut. By adding the shortcut, the CNN has two channels with different mapping relationships to extract the input information, which makes the extracted information more diverse. Comparing Scenarios B1 and B2, as the pooling step size of the shortcut increases, the numbers of neurons and outputs in Scenario B2 are reduced, which makes the input information more condensed and the network more stable. Scenario A3 already has a high learning efficiency, and Scenarios B3 and B4, which are modified from Scenario A3, also achieve high efficiency, indicating that the shortcut has no negative impact.
The four structural scenarios of CNN with shortcutting
| Layers | Scenario B1 | Scenario B2 | Scenario B3 | Scenario B4 |
|---|---|---|---|---|
| Input matrices | [360,111,87] | [360,111,87] | [360,111,87] | [360,111,87] |
| Convolutional layer | [360,111,87] | [360,111,87] | [360,111,87] | [360,111,87] |
| Maximum pooling | [360,27,21] (np=4); Shortcut: [360,6,5] (np=16) | [360,27,21] (np=4); Shortcut: [360,3,2] (np=32) | [360,27,21] (np=4); Shortcut: [360,6,5] (np=16) | [360,27,21] (np=4); Shortcut: [360,3,2] (np=32) |
| Convolutional layer | [360,27,21] | [360,27,21] | [360,27,21] | [360,27,21] |
| Maximum pooling | [360,13,10] (np=2) | [360,13,10] (np=2) | [360,6,5] (np=4) | [360,6,5] (np=4) |
| Flatten | [360,130]; Shortcut: [360,30] | [360,130]; Shortcut: [360,6] | [360,30]; Shortcut: [360,30] | [360,30]; Shortcut: [360,6] |
| Concatenate | [360,160] | [360,136] | [360,60] | [360,36] |
| Fully connected layer | [360,10] | [360,10] | [360,10] | [360,10] |
| LSTM cell layer (1 layer and 10 cells) | [360,10] | [360,10] | [360,10] | [360,10] |
| Fully connected layer | [360,50] | [360,50] | [360,50] | [360,50] |
| Fully connected layer | [360,30] | [360,30] | [360,30] | [360,30] |
| Fully connected layer (Output layer) | [360,1] | [360,1] | [360,1] | [360,1] |
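A corresponding sketch of the shortcut channel of Scenario B3 is given below; the filter settings are again assumptions, while the pooling step sizes follow Table 3.

```python
import torch
import torch.nn as nn

x = torch.rand(360, 1, 111, 87)

main = nn.Sequential(                           # convolution channel of Scenario B3
    nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.MaxPool2d(4),   # [360,27,21]
    nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.MaxPool2d(4),   # [360,6,5]
)
shortcut = nn.MaxPool2d(16)                     # shortcut channel: [360,6,5] (np=16)

concat = torch.cat([main(x).flatten(1),         # Flatten: [360,30]
                    shortcut(x).flatten(1)],    # Flatten: [360,30]
                   dim=1)
print(concat.shape)                             # Concatenate: torch.Size([360, 60])
```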
LSTM cell layer
In this section, 15 scenarios are constructed based on Scenario B3 by varying the number of cell layers (from 1 to 5) and the cell size (i.e., 5, 10 and 15). The loss value variations of these scenarios are shown in Figure 6.
The loss value variations of LSTM with different cell layers and cell size.
Figure 6(a1)–6(a5) represent the loss value variations of the CNN-LSTM when the number of cell layers changes from 1 to 5 and the cell size is 5. The figures show that the number of successful trainings gradually decreases as the number of cell layers increases. That is, a network with more cells needs to optimize more neuron weight parameters, which makes it more likely to fall into a local optimum. In Figure 6(a1), 6(b1) and 6(c1), the LSTM networks have one layer with cell sizes of 5, 10 and 15, respectively. Taking the black horizontal line as the reference, the loss values between 400 and 500 epochs obviously decrease as the cell size increases, which indicates that the LSTM learning efficiency is improved and the outputs are closer to the target values. In this study, the structure in Figure 6(b1), i.e., one cell layer with 10 cells, is selected as the benchmark network.
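As a hedged illustration, the benchmark LSTM configuration (one cell layer with 10 cells) and a deeper variant can be declared as follows; the input size of 10 corresponds to the fully connected layer preceding the LSTM cells in Tables 2 and 3.

```python
import torch.nn as nn

# Benchmark configuration: one LSTM cell layer with 10 cells (Figure 6(b1))
benchmark = nn.LSTM(input_size=10, hidden_size=10, num_layers=1, batch_first=True)

# A deeper variant: more cell layers means many more weights to optimize, which the
# results above show makes training more likely to stall in a local optimum.
deeper = nn.LSTM(input_size=10, hidden_size=15, num_layers=5, batch_first=True)
```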
Fully connected layer
The fully connected layers establish a nonlinear transformation bridge between the LSTM cells and the output runoff vector. To compare the performance of the fully connected layers, four structure scenarios based on the benchmark network are presented in Table 4. In the benchmark, the LSTM cell size is 10; thus the output of the last cell layer is a one-dimensional vector with 10 nodes, which is transferred by the fully connected layers into the output vector. The output vector represents the simulated flows at the HR hydrological station.
The four structural scenarios of the fully connected layers
| Layers | Scenario C1 | Scenario C2 | Scenario C3 | Scenario C4 |
|---|---|---|---|---|
| Output of LSTM cell layer (1 layer and 10 cells) | [360,10] | [360,10] | [360,10] | [360,10] |
| Fully connected layer 1 | — | [360,50] | [360,50] | [360,100] |
| Fully connected layer 2 | — | — | [360,30] | [360,30] |
| Fully connected layer 3 (Output layer) | [360,1] | [360,1] | [360,1] | [360,1] |
Scenario C1 has only one fully connected layer, which directly transforms the outputs of the cell layer into the output runoff vector. In Scenario C1, the number of neurons is too small and the nonlinear mapping ability is weak. As the number of fully connected layers increases, the learning efficiency gradually improves, as shown in Figure 7.
The loss value variations of four fully connected layer structural scenarios.
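The contrast between Scenario C1 and the deeper heads can be sketched as follows; the ReLU activations are an assumption, since the activation functions of the fully connected layers are not stated in this section.

```python
import torch.nn as nn

# Scenario C1: the 10 LSTM outputs are mapped directly to the output runoff value
head_c1 = nn.Linear(10, 1)

# Scenario C3: two hidden fully connected layers (50 and 30 nodes) before the output layer
head_c3 = nn.Sequential(
    nn.Linear(10, 50), nn.ReLU(),
    nn.Linear(50, 30), nn.ReLU(),
    nn.Linear(30, 1),
)
```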
Training data size
In this section, the 10-day precipitation and flow data from 1970 to 2010 are used as network training data to evaluate the effect of training data size. Training data size scenarios with 360 samples (1970–1979), 720 samples (1970–1989) and 1,080 samples (1970–1999) are used to train the network, respectively. In each scenario, the network is trained five times independently. Figure 8 shows the NSE values of the three training data size scenarios; the NSE values of each scenario in Figure 8 are the averages of the five independent predictions. The results show that as the training data size increases, the performance during training gradually decreases, whereas the performance during verification gradually improves. A network trained with a small data size is prone to over-fitting and performs poorly in verification. Increasing the training data size can suppress over-fitting and produce a network with stable performance.
The NSE values of the LSTM network using different training data size.
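For illustration, the three training data size scenarios correspond to the following splits of the 10-day series, assuming 36 ten-day periods per year:

```python
# 36 ten-day periods per year (assumed), so each additional decade adds 360 samples
PERIODS_PER_YEAR = 36

scenarios = {
    "360 samples": (1970, 1979),
    "720 samples": (1970, 1989),
    "1080 samples": (1970, 1999),
}
for name, (start, end) in scenarios.items():
    n = (end - start + 1) * PERIODS_PER_YEAR
    print(f"{name}: {n} training samples; verification uses 2000-2010")
```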
Performance evaluation
The observed flow and precipitation data from 1970 to 1999 are used to calibrate the SWAT and train the CNN-LSTM. The two models are verified using the data from 2000 to 2010. Figure 9 shows the simulated and observed flow processes during calibration and verification. As shown in Figure 9(a), the training data contain large, medium and small flood samples, as well as flood processes with one flood peak and with multiple peaks. This diversity of training data is conducive to network learning and reduces network over-fitting. The results also show that both the SWAT and the CNN-LSTM have the ability to forecast the river flows in the study basin.
The observed and simulated flows from CNN-LSTM network and SWAT model.
To evaluate the performances, the flows during the wet season (May to September) and the dry season (October to April) are compared separately. The scatter plots and three evaluation criteria during the calibration and verification are shown in Figures 10 and 11, respectively.
The scatter plots and evaluation criteria of the two models during the calibration.
The scatter plots and evaluation criteria of the two models during the verification.
Figure 10 shows the scatter plots and evaluation criteria during calibration. For the entire year and the wet season, the NSE and R2 values of the CNN-LSTM are very close to those of the SWAT model, indicating that the CNN-LSTM has the ability to learn the rainfall-runoff relationship. The R2 value of the SWAT is higher than that of the CNN-LSTM, which demonstrates that the SWAT model has a stronger ability to explain the observed flows in the study basin. The simulated flows of the CNN-LSTM are more dispersed than those of the SWAT, and thus its RE value is higher. In the dry season, the NSE and RE values of the SWAT are lower than those of the CNN-LSTM, which indicates that the CNN-LSTM has a stronger ability to simulate dry-season flows.
Figure 11 shows the scatter plots and the criteria during verification. For the entire year and the wet season, the NSE and R2 values remain relatively high compared with the calibration, which shows that the CNN-LSTM network has a strong predictive performance that matches the SWAT model. The forecasted peak flows of both models are lower than the observed flows. In this study, the forecasted flows of four years, i.e., 2000, 2001, 2006 and 2010, are used to analyze the performance at peak flows, as shown in Figure 12.
The flow processes of the SWAT and CNN-LSTM models for years 2000, 2001, 2006 and 2010.
During the dry season, winter snow begins to melt from periods 6 to 16 and forms a peak flow. These flows are affected by temperature, precipitation and winter snow cover. As shown in Figure 12(a) and 12(d), both the SWAT and CNN-LSTM models perform poorly in forecasting snowmelt.
During the wet season, the flows are mainly affected by precipitation. Thus, the SWAT and CNN-LSTM perform well in both simulation and forecasting, especially for single-peak flow processes, as shown in periods 20 to 26 in Figure 12(a) and 12(b). Figure 12(c) and 12(d) represent multi-peak flow processes, for which the forecasted peak flows of the two models have large errors. In Figure 12(c), the time interval between the two precipitation peaks is relatively long, and both the SWAT and the CNN-LSTM can forecast the two peak flow processes. However, when the time interval is short, the CNN-LSTM forecasts only one peak flow instead of two, as shown in Figure 12(d).
CONCLUSIONS
In this study, the CNN and LSTM networks are combined to construct a hybrid deep learning network for river flow prediction, with the aim of learning the spatial and temporal patterns in meteorological and hydrological data. The performances of the CNN-LSTM network with different structures are evaluated and tested in the Hun River Basin, and the performance of the CNN-LSTM is compared against the SWAT model. The key conclusions are summarized below.
In the CNN structure evaluation, the results show that the number of convolutional layers and the step size of pooling have great impacts on the learning efficiency. When the number of convolutional layers increases, the CNN has more neurons to establish the nonlinear relationship between input and output, and its recognition ability improves. When the pooling step size increases, the amount of CNN output is reduced, which effectively removes redundant information and improves the learning efficiency.
The effectiveness of the shortcut is evaluated in this study, and the results show that the shortcut can improve the learning efficiency and stability of the network. The shortcut gives the CNN two channels with different mapping relationships for extracting the input information, making the extracted information more diverse.
The number of cell layers and the cell size in each layer have significant impacts on the learning efficiency. The results of this study show that the LSTM learning efficiency improves as the cell size increases, and the outputs become closer to the target values. However, a network with more cells and more layers needs to optimize more neuron weight parameters, which makes it more likely to fall into a local optimum.
As the training data size increases, the training data contain large, medium and small flood samples, as well as flood processes with single and multiple peaks. This greater diversity suppresses network over-fitting and improves model performance.
The CNN-LSTM network is shown to have a superior nonlinear learning ability for spatial and temporal data, and its predictive performance is comparable to that of the SWAT in terms of three criteria, i.e., R2, NSE and RE. Seasonal comparisons reveal that the CNN-LSTM performs better in the wet season than in the dry season. This is similar to the performance of the SWAT and can be explained by precipitation being the key driver of wet-season peak flows, while dry-season flows are also affected by other factors such as temperature and snowmelt. For multi-peak flows, the performance of the CNN-LSTM depends on the intervals between peaks. This demonstrates the challenges in using deep learning to capture complex hydrological processes such as snowmelt and multi-peak flows.
ACKNOWLEDGEMENTS
This research is supported by the National Natural Science Foundation of China (Grant Nos. 51609025 and 51709108), the National Key Research and Development Program of China (2018YFC1508003), the Chongqing Technology Innovation and Application Demonstration Project (cstc2018jscx-msybX0274, cstc2016shmszx30002), and the UK Royal Society through an industry fellowship to Guangtao Fu (Ref: IF160108) and an international collaboration project (Ref: IEC\NSFC\170249). Guangtao Fu is also supported by The Alan Turing Institute under the EPSRC (Grant EP/N510129/1). We would like to thank the Hun River Cascade Hydropower Reservoirs Development Ltd for collecting the data (http://www.hydroshare.org/resource/9e851535b3fd42a49d00c41bf277652c); the observed data can be obtained after approval by the organization.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories. LSTM test data is provided on HydroShare website, http://www.hydroshare.org/resource/9e851535b3fd42a49d00c41bf277652c.