Abstract

Data representation and prediction model design play an important role in mid- to long-term runoff prediction. However, it is challenging to extract key factors that accurately characterize the changes in the runoff of a river basin because of the complex nature of the runoff process. In addition, the low accuracy is another problem for mid- to long-term runoff prediction. With an aim to solve these problems, two improvements are proposed in this paper. First, the partial mutual information (PMI)-based approach was employed for estimating the importance of various factors. Second, a deep learning architecture was introduced by using the deep belief network (DBN) with partial least-squares regression (PLSR), together denoted as PDBN, for mid- to long-term runoff prediction, which solves the problem of parameter optimization for the DBN using PLSR. The novelty of the proposed method lies in the key factor selection and a novel forecasting method for mid- to long-term runoff. Experimental results demonstrated that the proposed method can significantly improve the effect of mid- to long-term runoff prediction. Also, compared with the results obtained by current state-of-the-art prediction methods, i.e., DBN, backpropagation neural networks, and support vector machine models, our prediction results demonstrate the performance of the proposed method.

HIGHLIGHTS

  • Constructing the comprehensive runoff index to perform the comprehensive characterization of runoff changes in the whole watershed.

  • Applying the partial mutual information method for factor selection.

  • The PDBN is used as a deep learning approach for the mid- to long-term runoff prediction.

  • The proposed method can significantly improve the effect of the mid- to long-term runoff prediction.

Graphical Abstract

Graphical Abstract
Graphical Abstract

INTRODUCTION

The rapid development of information technology, especially the theory of data science and soft computing techniques, has significantly influenced the development and application of hydrology (Chen & Han 2016; Chen & Wang 2018; Ma et al. 2018), such as drought prediction (Shamshirband et al. 2020), flood prediction and management (Wang et al. 2017; Fotovatikhah et al. 2018; Mosavi et al. 2018), river flow forecasting (Bai et al. 2016b; Yaseen et al. 2019a, 2019b), and runoff forecasting (Valipour 2015; Tan et al. 2018). Among them, the mid- to long-term runoff prediction is an important research direction in hydrological forecasting and hydroinformatics. In general, the amount of basic hydrological data necessary for hydrological forecasting is increasing, both in terms of the amount of data and the number of data modes, which provides informative factors for the mid- to long-term runoff prediction. However, the increasing amount of hydrological data can introduce redundant and noisy information to the prediction feature or factor, which may deteriorate the performance of the mid- to long-term runoff prediction. At this time, traditional methods face difficulty in adapting to hydrological data analysis and runoff prediction due to insufficient noise removal and representation learning capabilities.

Data representation plays an important role in mid- to long-term runoff prediction. However, it is challenging to extract key factors that accurately characterize the changes in the runoff of a river basin because of the complex nature of the runoff process, which seriously affects the speed and accuracy of runoff prediction. Over the past decades, input factors have also evolved from considering only previous rainfall and runoff to multiple factors including runoff, rainfall, climate, and others. How to select a suitable set of key factors from the above factors is still difficult in mid- to long-term runoff prediction. In order to select key factors, some methods have been applied in runoff prediction, mainly including prior knowledge method (Huang et al. 2004), correlation coefficient method (Li et al. 2012), principal component analysis (PCA; Li et al. 2019), and partial mutual information (PMI; Fernando et al. 2009). Among them, the prior knowledge method is seldom applied due to the large amount of calculation. The methods of correlation coefficient and PCA are commonly used to describe the linear relationship between the impact factors, but difficult to characterize the complex nonlinear relationship, which limits the applicable scope of these general methods. The PMI method is based on information entropy theory, which has the advantages of characterizing relationships of linear and nonlinear among factors, and has been widely used in many fields.

Prediction model design is another important research content in mid- to long-term runoff prediction. At present, some various advanced data-driven methods have been proposed to improve the effect of mid- to long-term runoff prediction, such as the wavelet transforms (Jiang et al. 2018), support vector machines (Zhao et al. 2017), artificial neural networks (Alizadeh et al. 2017; Shoaib et al. 2018; Yaseen et al. 2019a, 2019b), and Bayesian statistic prediction models (Ma et al. 2013; Chu et al. 2017). Data-driven models have provided a useful tool for solving the watersheds with too complex and unknown runoff production. However, the mid- to long-term runoff prediction is low accuracy because of the highly nonlinear complex characteristics of mid- to long-term runoff process in the watershed. Moreover, an extremely high dimensional feature will be extracted if multiple factors are modeled, which significantly increases the scale of the prediction model and calculation complexity.

The important point for solving the above problems is to extract key factors that are closely related to the runoff in a river basin and design prediction model. Aiming to improve the mid- to long-term prediction of the runoff process, this study comprehensively introduces multiple hydrological factors to construct the prediction feature. Moreover, aiming to solve the problems caused by the highly nonlinear complex characteristics of mid- to long-term runoff forecasting in the watershed, and the shallow network is degenerated for the mid- to long-term runoff forecasting task, a deep belief network (DBN) combined with partial least-squares regression (PDBN) is established to learn the most representative key factors for mid- to long-term runoff prediction, which solves the problem of weight optimization for the DBN using PLSR. The main contribution of this study is three-fold:

  1. Investigating the consistency of runoff at different hydrological stations for constructing the COM according to the granulation computing to perform the comprehensive characterization of runoff changes in the whole watershed.

  2. Generating key factors for runoff prediction on the basis of the PMI method combined with manual selection.

  3. Applying to the mid- to long-term runoff prediction with the proposed PDBN based on the above key factors. The novelty of the proposed PDBN lies in: (i) an adaptive learning rate is introduced to accelerate the convergence rate in the unsupervised learning stage of the DBN and (ii) the PLSR is used to determine the optimal weights in the supervised learning stage to improve prediction accuracy.

A flowchart of our proposed method is shown in Figure 1.

Figure 1

Flowchart of our proposed method.

Figure 1

Flowchart of our proposed method.

The remainder of this paper is structured as follows: The introduction and review of literature are presented in this section. The next two sections describe the methodology and the proposed model PDBN used in this study, respectively. Moreover, the performance indices used to evaluate the models developed in this study is also reported in the section ‘Methodology’. The data used in the study are described in the section ‘Case Study’, and results of the different developed models are elaborated in the section ‘Case Study’. The conclusions are provided in the final section.

METHODOLOGY

The mid- to long-term runoff prediction method presented in this study consists of three parts: construction of COM, factor reduction using PMI, and DBN. In the following section, these parts will be elaborated in detail.

Construction of COM

The main influencing factors of the runoff change within the basin include rainfall, climate, and other factors. Unlike the previous research works, we investigate the consistency of runoff at different hydrological stations for constructing the COM according to granulation computing (Yao et al. 2013), to characterize the abundance and drought of runoff in the basin. According to the characteristics of mid- to long-term runoff forecasting, we choose the monthly scale as the time granularity. Specifically, the study includes two parts: the average runoff consistency analysis based on information from different hydrological stations and the construction of the COM.

Average runoff consistency analysis based on information from different hydrological stations

Firstly, the runoff consistency analysis considering information from different hydrological stations is conducted, and then, the correlation is used to calculate the correlation of the monthly mean runoff at different hydrological stations, thereby providing the basis for the construction of the COM for the entire basin.

Construction of the COM

To avoid the cumulative impact between the hydrological stations, we estimate the weights of the selected hydrological stations depending on their contribution and influence on the downstream hydrological stations. The COM is usually established on the monthly mean runoff at different hydrological stations (COM1). Specifically, the proposed COM is established on the controlling area of different hydrological stations (COM2).

Calculation steps of COM1
Assuming that the number of hydrological stations is n, the average monthly runoff at the hydrological station i is , the mean monthly runoff at the hydrological station i in the month j is Cij, and the weight of hydrological station i can be expressed as: 
formula
(1)
Therefore, the COM1 is established on the monthly mean runoff at different hydrological stations in the month j can be expressed as: 
formula
(2)
It should be noted that label j is not successive, as hydrological stations collect the data with different time intervals.
Calculation steps of COM2
Assuming that the number of hydrological stations is n, the percentage of the control area of ith hydrological station is Si, the mean monthly runoff at the hydrological station i in the month j is Cij, and the weight of hydrological station i can be expressed as: 
formula
(3)
Therefore, the COM2 is established on the controlling area of different hydrological stations in the month j can be expressed as: 
formula
(4)
It should be noted that label j is not successive, as hydrological stations collect the data with different time intervals.

Factor reduction using PMI

There is nonlinear correlation observed among hydrological and meteorological variables. In this case, MI can be used to select the input factors in runoff forecasting more effectively (He et al. 2011). However, while selecting input variables using MI, it should be taken into consideration that the coupling relationship between input variables affects the calculation results, which may lead to the mis-selection or omission of the input variables.

To solve the above problems, May et al. (May et al. 2008a; 2008b; Qian & Shu 2015) proposed the PMI methods to eliminate the relationship between variables through calculating conditional expectations, thereby ensuring the reliability and accuracy of variable selection.

Partial mutual information (PMI)

In the 1940s, Shannon employed the information entropy theory to measure the amount of information, and the information entropy of the COM can be expressed as follows: 
formula
(5)
where H(Cj) represents the information entropy of variable Cj, pi denotes the probability of variable Cj at different values, and k represents the number of variables.
Assuming that the joint probability distribution of the COM (Ci, Cj) is , then the two-dimensional joint entropy can be expressed as: 
formula
(6)
Assuming that the marginal distribution values of Ci and Cj are pi and pj, respectively, the conditional entropy of Ci can be expressed as follows: 
formula
(7)
Consequently, the mutual information can be expressed as: 
formula
(8)
Considering the third variable Cl, and taking the part of that is not included in Cl, we define the PMI: 
formula
(9)
Here, we select the Gaussian kernel function to estimate the sample probability density function.

PMI-based factor selection

When inputting multiple variables, there may be correlation presented among these variables. For example, if X and Z are input variables, and Y is the output variable, then there may be correlation between input variables X and Z, and the value of mutual information I (Y,Z) may be greater than the actual value. Therefore, we can eliminate X contained in variables Y and Z with conditional expectation, and the correlation between variables can be measured using PMI. After eliminating x, Y is denoted as u, Z is denoted as v, which can be expressed as: 
formula
(10)
 
formula
(11)
 
formula
(12)
PMI between Y and Z is converted into: 
formula
(13)

Specifically assuming that C stands for the input variable set, Y stands for the output variable (runoff forecasting results), S stands for the optimal input variable set, and stands for the candidate variable corresponding to the maximum PMI, the PMI-based variable selection can be realized as per the following steps:

  • Step 1. Initialize S as an empty set

  • Step 2. If C is a null set, return

  • Step 3. Calculate the u=YmY(S) by Equation (11)

  • Step 4. For each element of , calculate by Equation (12)

  • Step 5. If reaches the maximum, select Cs as the candidate variable

  • Step 6. Calculate the Akaike Information Criterion (AIC). If the AIC is decreased, moving Cs to the optimal input variable set S, and returning Step 2; Stop selection if AIC increases.

AIC is defined as follows: 
formula
(14)
where ui stands for the Y regression residual calculated from the selected variables; n stands for the number of samples, and p stands for the number of selected variables. With the selection of independent variables, AIC keeps decreasing. When AIC reaches the minimum, the collection of optimal independent variables is finished.

Deep belief network

The DBN is constructed by sequential stacking of multiple restricted Boltzmann machines (RBMs) (Ackley et al. 1985). The learning includes two stages: unsupervised and supervised stages. The unsupervised stage is mainly used to pre-train the RBM, while in the supervised stage, the gradient-based error back propagation algorithm is used to fine tune the whole network. This method allows for particular advantages in learning the deep structure and has been successfully applied to pattern recognition, prediction, and other tasks. In addition to its advantages in weight optimization, the DBN can also be used as a probability generation model to overcome the problem of overfitting and underfitting of the model. A typical RBM is a random indirect graphical model, and a common DBN is a ubiquitous deep neural network (Figure 2).

Figure 2

Architectures of RBM and DBN.

Figure 2

Architectures of RBM and DBN.

Unsupervised contrastive divergence algorithm

Unsupervised learning based on the contrastive divergence (CD) algorithm (Hinton 2002) is based on obtaining the initial weight of the entire DBN by training each RBM layer by layer. In an RBM, assuming that v represents the states of visual neurons, h represents the states of hidden neurons, and the model parameter is , so the probability distribution function based on the energy model can be defined as: 
formula
(15)
where Z is the normalization factor: 
formula
(16)
The edge distribution of the energy model v is defined as 
formula
(17)
The energy function of the RBM is defined as follows: 
formula
(18)
where the connection weight of the network is , the bias value of the visual layer is bi, and the bias value of the hidden layer is cj. Therefore, the conditional probability distribution can be described as: 
formula
(19)
 
formula
(20)
where σ is activation function, and .
For the convenience of operation, the states of visible neurons and hidden neurons are binary, the values of Equations (19) and (20) are usually determined as follows (taking the hidden neuron as an example): 
formula
(21)
where δ is a constant ranging from 0.5 to 1.
Calculating the gradient of , and the renewal equation of RBM is as follows: 
formula
(22)
 
formula
(23)
where η is the learning rate, is the expectation of input samples, and is the expectation of the distribution of the model. The approximate expectation values can be obtained by the Gibbs sampling method.

Supervised learning

Supervised learning fine-tunes the initial weights obtained by unsupervised learning based on the CD algorithm in the DBN and finally obtains the optimal parameters of the model. Taking the last hidden layer and the output layer as examples, if is the desired output and is the actual output of the model, the cost function of the output layer can be defined as: 
formula
(24)
where α is the number of iterations and K is the number of samples. The changed weights are as follows: 
formula
(25)
 
formula
(26)
where η is the learning rate.

Thus, the updated weight between the output layer and the last hidden layer can be obtained, and the weight of the whole DBN model can be obtained by fine-tuning layer by layer.

Evaluation criteria

To estimate the performance of the PDBN model, some evaluation criteria are included in the evaluation, including mean absolute percentage error (MAPE), root-mean-square error (RMSE), deterministic coefficient (DC), the relative error (RE), and the running time (RT): 
formula
(27)
 
formula
(28)
 
formula
(29)
 
formula
(30)
where yt represents the observed value of the COM in the Yalong River basin at t, denotes the predicted value at t, and n is the sample data. represents the mean of the observed value of the COM in the Yalong River basin.

The MAPE is a most frequently used statistics index and employed for examining the error between the predicted value and the observed value. The RMSE is used to measure the predicted value accuracy. A RMSE value of 0 indicates perfect match between the predicted value and the observed value. Likewise, the DC provides a measure of the capacity of the model to predict observed values. In general, high values of DC (up to 1) and small values for RMSE indicate a good model. The RE is used to reflect the reliability of the prediction model. In order to better estimate the calculation speed of the PDBN model, we add the running time (RT) index.

THE PROPOSED PDBN MODEL FOR MID- TO LONG-TERM RUNOFF PREDICTION

The reasons for runoff formation are always complex and are closely related to the surrounding hydrology, meteorology, climatic factors, economy, human activities, and many other factors. Furthermore, the main influencing factors are different for different basins and have complex nonlinear characteristics. Therefore, the mid- to long-term runoff prediction of watershed requires the abilities of nonlinear processing and generalization. Deep learning can extract feature information of samples layer by layer from massive amounts of data to obtain key feature factors and has better prediction speed and accuracy than the traditional shallow network. Deep learning has been widely used in many fields such as image processing, artificial intelligence, and there is little application of deep learning to the field of runoff prediction (Bai et al. 2016a; Haidar & Verma 2018).

This study aims to apply deep learning to mid- to long-term runoff prediction. To explore and attempt the application of data science theory to hydrology, a method for mid- to long-term runoff prediction based on PDBN is proposed, and the architecture and learning algorithm of the model are analyzed in detail.

Determining the network depth of the DBN

The suitability of parameter selection in the DBN is related to the prediction performance; therefore, the determination of network depth is the first problem to be solved. When the network depth of the DBN is large, more data features can be excavated, but the computational complexity is increased, and the performance is worse. At present, the exact methods for determining the network depth of a DBN include trial-and-error methods and parameter optimization methods. These methods search the most suitable network depth via many experiments. This study adopts the parameter optimization method by fixing the number of hidden nodes and the learning rate, and then determining the optimal network depth of the DBN by combining the particle swarm optimization (PSO) algorithm.

The common calculation method is applied to estimate the number of nodes in the hidden layer, and the formula is as follows: 
formula
(31)

In Equation (31), l is the number of nodes in the hidden layer, m is the number of input variables, n is the number of output variables, and α is an integer ranging from 0 to 10.

In addition, because the existence of the normalization factor makes it difficult to realize the logarithmic similarity evaluation, this study adopts the reconstruction error method to evaluate the gap between the reconstructed data and the observed data. The smaller the reconstruction error is, the more stable the model is. The two theoretical bases of this evaluation method are as follows: first, the network depth of the DBN is proportional to the prediction accuracy of the model; second, after unsupervised training, the network parameters only need fine-tuning during the supervised training stage. The flow of the network error reconstruction is shown in Figure 3, and the calculation method of reconstruction error is as follows: 
formula
(32)
where X is the ith batch matrix, Si is the value after the reconstruction of the ith batch matrix, and RE is the total error of the reconstruction.
Figure 3

The flow of network error reconstruction.

Figure 3

The flow of network error reconstruction.

Figure 4

Gibbs sampling of CD algorithm.

Figure 4

Gibbs sampling of CD algorithm.

It can be seen from Figure 3 that the DBN may judge whether the hidden layer can better depict the data characteristics of the visual layer. The DBN does this by reconstructing the input sample data and updating the hidden nodes by increasing the network depth of DBN and weight optimization, so that the reconstructed error is a minimum. Then, the label data is output.

Optimization algorithm in unsupervised stage

The CD calculation is based on Gibbs sampling. Gibbs sampling is used in Markov chain Monte Carlo statistical analysis and approximately extracts the sample sequence from a multivariable probability distribution under the condition that it is difficult to sample directly (Geng et al. 2018). The process of Gibbs sampling is shown in Figure 4. The main steps are as follows:

  1. The Gibbs chain is initialized with the observation sample data v and obtaining the input vector v0 of the visible layer.

  2. Referring to Equations (19) and (20), h(t) is sampled from and v(t+1) from in turn. Among them, t is the number of sampling steps, and the effect gets better as (t+ 1) gets larger.

is acquired by sampling and a preliminary estimation of , and using to estimate for the CD-1 algorithm. In the CD-1 algorithm, because all RBMs require many iterations and the direction of parameter update is different, if the learning rate remains unchanged, ‘premature’ phenomenon may appear in the algorithm or it may be difficult to converge, so selecting the adaptive learning rate according to the direction of the parameter update is essential to unsupervised learning.

Based on previous research results (Geng et al. 2018; Qiao et al. 2018; Saufi et al. 2019), this study designs an adaptive learning rate method according to the similarities and differences of parameter updating directions after two consecutive iterations in the training process of RBM. The principle is that when the parameter updating direction (positive or negative changes) is the same after two consecutive iterations, the learning rate will increase, or conversely, it will decrease. The adaptive updating mechanism of learning rate is as follows: 
formula
(33)
 
formula
(34)
 
formula
(35)
Among them, D is the enhancement coefficient of learning rate, d is the decrease coefficient of learning rate, and .

Based on the CD algorithm, the weights of the DBN are updated once every Gibbs sampling period, while the binary sampling of the intermediate state is carried out twice, and the weights of each update are proportional to the state sampling, so , . is the probability threshold ranging from 0.5 to 1, and the general value is 0.7. As a result, the D values range from 1 to 2 when the general value is 1.4, and the d values range from 0.5 to 1 when the general value is 0.7.

Optimization algorithm in the supervised stage

PLSR is a many-to-many regression model that considers both independent variables and dependent variables. Therefore, it is a supervised learning model. Especially when facing multiple variables, multiple correlations, and a relatively insufficiently observed sample, the PLSR has the advantages that traditional classical regression analysis and other methods do not have (Qiao et al. 2018). Therefore, this study conducts supervised fine-tuning for the DBN via PLSR. Starting from the output layer, the output layer and the last hidden layer comprise the first group, and so on, such that a regression model is established for every two layers until the first hidden layer and the input layer is the last group, and supervised fine-tuning is achieved. Taking the output layer and the last hidden layer as an example to establish the regression model, the specific steps include the following:

  1. Assuming that the actual output, ys, of the model is a dependent variable, the dimension of the output matrix is p, the state, hl, of the last hidden layer is the independent variable, and the dimension of the state matrix is q, then: 
    formula
    (36)
     
    formula
    (37)
    where Y is the observation matrix of the N-sample. X is the state observation matrix of the last hidden layer.
  2. Assuming that the first pair of components u1 and v1 extracted from the two observation sample matrices, and u1 is the linear combination of the dependent variable set , and v1 is the linear combination of independent variable set , then we can define: 
    formula
    (38)
     
    formula
    (39)
    Calculating the score vectors of the first pair of components, denoted as and , then 
    formula
    (40)
     
    formula
    (41)
  3. This step calculates the values of α1 and β1 when the correlation between and is strongest, and converts the inner product for calculating score vectors into a conditional extreme solution. 
    formula
    (42)

    Then, the score vectors and are obtained corresponding to the optimal parameters α1 and β1.

  4. This step establishes the regression models of v1 corresponding to and , respectively, as 
    formula
    (43)
    where X1 and Y1 are the residual matrices of X and Y, respectively.
  5. The above steps are repeated using residual matrices X1 and Y1 instead of X and Y, respectively.

  6. Assuming that the rank of the observation matrix X of the last hidden layer is r, the number of components is r (v1, v2, …, vr), such that: 
    formula
    (44)
     
    formula
    (45)

Substituting into (38), and obtaining the PLSR equation between the desired output and the last hidden layer.

And then obtaining the PLSR equation between the desired output and the last hidden layer. 
formula
(46)

Finally, this step obtains the weight wout between the output layer and the last hidden layer.

In general, the number of extracted components is determined by cross-validation, that is, removing the ith state of the last hidden layer of the DBN network every time, and realizing modeling based on PLSR for the remaining (n − 1) states. Then, the regression equation including components is obtained, and finally, the eliminated ith state is substituted into the fitted regression equation , and the corresponding output is obtained. This process repeats until the nth state is completed. The error square sum of the jth predicted output is defined as 
formula
(47)
where yij is the expected output, then the error square sum of predicted output is defined as 
formula
(48)
After that, the regression equation including c components is fitted by all the states of the last hidden layer of the DBN. Noting that the predicted output corresponding to the ith state is , then the error square sum of predicted output can be defined as 
formula
(49)
The error square sum of predicted output is defined as: 
formula
(50)
According to the PLSR, there is: 
formula
(51)
The threshold L is defined as: 
formula
(52)

The value of L is inversely proportional to that of c, so c can be determined by choosing the appropriate L.

After the above steps, the fine-tuning process between the output layer and the last hidden layer is accomplished during the supervised learning stage. By analogy, the fine-tuning process of the whole DBN is finally completed, and the network weight is obtained.

In summary, the mid- to long-term runoff prediction by combining the DBN and PLSR can be divided into three steps: firstly, determining the number of hidden layer nodes and network learning rate, and adopting the PSO algorithm for searching the optimal network depth of the DBN in different depths; secondly, introducing the adaptive learning rate for improving the convergence speed of the DBN in the unsupervised training stage; and finally, adopting the PLSR to avoid easily falling into local optimum and to accelerate training speed in the supervised learning stage, so as to improve the fine-tuning accuracy of the DBN network.

CASE STUDY

Study area and data

The Yalong River (25°12–34°9N, 96°47–102°42E) is the largest tributary of the Jinsha River in the southern region of the Tibetan Plateau. The total basin area of the Yalong River is approximately 136,000 km2, and the mainstream length is 1,571 km. The maximum altitude is greater than 6,000 m, and the minimum altitude is less than 1,000 m. The mean annual temperature is approximately 2.5 °C. The mean annual precipitation is approximately 600–1,800 mm decreasing from the south to the north. The average annual estuarine discharge is 1,860 m3/s. The major vegetation types are forests, shrubs, and meadows (Ye et al. 2017). The geography of the Yalong River is shown in Figure 5.

Figure 5

The Yalong River Basin.

Figure 5

The Yalong River Basin.

According to the time span characteristics of hydrometeorological data, the relevant data of the mid-long runoff prediction in the Yalong River Basin include: (1) 130 atmospheric circulation factors (selecting 21 teleconnection climatic factors) for the period from January 1951 to December 2012; (2) meteorological data such as pressure, temperature, humidity, precipitation, wind speed, and sunshine from January 1960 to September 2012; (3) runoff data of Lianghekou, Jinping, Guandi, and Ertan hydrological stations from January 1960 to December 2016. In this study, the dataset is 624 sets of samples from the period of January 1960 to December 2011, and it is divided into two parts: the training set (from January 1960 to December 2001) and the test set (from January 2002 to December 2011) (Figure 6).

Figure 6

Dataset partition.

Figure 6

Dataset partition.

Model development

BP neural networks

In this study, the BP neural networks are used as a competitive model for point forecasts. In BP neural networks, the number of hidden layer nodes is 12, and the training function is ‘tansig’, the learning function is ‘logsig’, the maximum training times are 600, the learning rate is 0.1, the model training adopts LM algorithm, the momentum factor is 0.9, and the expected error is 0.001.

Support vector machine

The least-squares support vector machine (LS-SVM) is used in this study. In SVM, the kernel function is ‘sigmoid’, and regularization parameters and kernel function's parameters for SVM are optimized using the grid-search algorithm with cross-validation. The LS-SVM toolbox is available on http://www.esat.kuleuven.be/sista/lssvmlab/.

Results and discussion

In this section, the proposed approach for mid- to long-term runoff prediction is applied, including COM selection result, factor selection result, and the PDBN model. The forecasting performances of PDBN are also compared with the application of the DBN, BP, and SVM to demonstrate the superiority using five common criteria.

COM selection result

After analyzing the consistency of the four hydrological stations (Lianghekou, Jinping, Guandi, and Ertan hydrological stations) in the Yalong River basin, the results show that there is a high correlation among the average runoff of the four hydrological stations on a monthly scale (Figure 7). Therefore, we construct the COM (coarse-grained) based on the monthly mean runoff of the four hydrological stations (fine-grained) to describe the abundance and drought of the monthly average runoff in the watershed. In the section ‘Construction of Comprehensive Runoff Index’, parameters for the COM1 and COM2 are shown in Tables 1 and 2, respectively. Comparison of COM1 and COM2 with the monthly mean runoff at the four hydrological stations (Figure 8) and the results of PCC, Kendalls Tau-b, and Spearman correlation coefficient are shown in Tables 35, respectively. Results of the correlation analysis between the COM1 and COM2 on the monthly scale indicate that the correlation of COM2 is higher. Consequently, this study selects the COM2 as the comprehensive runoff index of the Yalong River Basin.

Table 1

Parameters for the COM1 calculation based on monthly mean runoff at the four hydrological stations (m3/s)

StationLianghekouJinpingGuandiErtan
Monthly mean runoff 661.54 1208.65 1416.25 1617.32 
Normalization 0.134906 0.246474 0.288808 0.329812 
Runoff impact index 7.412580 4.057225 3.462504 3.032031 
Weight normalization 0.412627 0.225849 0.192743 0.168781 
StationLianghekouJinpingGuandiErtan
Monthly mean runoff 661.54 1208.65 1416.25 1617.32 
Normalization 0.134906 0.246474 0.288808 0.329812 
Runoff impact index 7.412580 4.057225 3.462504 3.032031 
Weight normalization 0.412627 0.225849 0.192743 0.168781 
Table 2

Parameters for the COM2 calculation based on the controlling area of four hydrological stations

StationLianghekouJinpingGuandiErtan
Percentage of controlled area of hydrological stations 51% 81% 89% 97% 
Controlled area impact index 1.960784 1.234568 1.123596 1.030928 
Weight normalization 0.366510 0.230766 0.210023 0.192701 
StationLianghekouJinpingGuandiErtan
Percentage of controlled area of hydrological stations 51% 81% 89% 97% 
Controlled area impact index 1.960784 1.234568 1.123596 1.030928 
Weight normalization 0.366510 0.230766 0.210023 0.192701 
Table 3

PCC analysis of COM and runoff at four hydrological stations

 
LianghekouJinpingGuandiErtanCOM1COM2
Lianghekou PCC 1.000 0.966 0.963 0.947 0.977 0.975 
Jinping PCC 0.966 1.000 0.998 0.993 0.998 0.998 
Guandi PCC 0.963 0.998 1.000 0.994 0.997 0.998 
Ertan PCC 0.947 0.993 0.994 1.000 0.992 0.993 
COM1 PCC 0.977 0.998 0.997 0.992 1.000 1.000 
COM2 PCC 0.975 0.998 0.998 0.993 1.000 1.000 
 
LianghekouJinpingGuandiErtanCOM1COM2
Lianghekou PCC 1.000 0.966 0.963 0.947 0.977 0.975 
Jinping PCC 0.966 1.000 0.998 0.993 0.998 0.998 
Guandi PCC 0.963 0.998 1.000 0.994 0.997 0.998 
Ertan PCC 0.947 0.993 0.994 1.000 0.992 0.993 
COM1 PCC 0.977 0.998 0.997 0.992 1.000 1.000 
COM2 PCC 0.975 0.998 0.998 0.993 1.000 1.000 
Table 4

Tau-b analysis of COM and runoff at four hydrological stations

LianghekouJinpingGuandiErtanCOM1COM2
Lianghekou Tau-b 1.000 0.862 0.849 0.808 0.890 0.883 
Jinping Tau-b 0.862 1.000 0.946 0.901 0.956 0.958 
Guandi Tau-b 0.849 0.946 1.000 0.913 0.950 0.954 
Ertan Tau-b 0.808 0.901 0.913 1.000 0.909 0.915 
COM1 Tau-b 0.890 0.956 0.950 0.909 1.000 0.994 
COM2 Tau-b 0.883 0.958 0.954 0.915 0.994 1.000 
LianghekouJinpingGuandiErtanCOM1COM2
Lianghekou Tau-b 1.000 0.862 0.849 0.808 0.890 0.883 
Jinping Tau-b 0.862 1.000 0.946 0.901 0.956 0.958 
Guandi Tau-b 0.849 0.946 1.000 0.913 0.950 0.954 
Ertan Tau-b 0.808 0.901 0.913 1.000 0.909 0.915 
COM1 Tau-b 0.890 0.956 0.950 0.909 1.000 0.994 
COM2 Tau-b 0.883 0.958 0.954 0.915 0.994 1.000 
Table 5

Spearman analysis of COM and runoff at four hydrological stations

LianghekouJinpingGuandiErtanCOM1COM2
Lianghekou Spearman 1.000 0.976 0.971 0.953 0.985 0.983 
Jinping Spearman 0.976 1.000 0.994 0.984 0.997 0.997 
Guandi Spearman 0.971 0.994 1.000 0.987 0.996 0.996 
Ertan Spearman 0.953 0.984 0.987 1.000 0.987 0.988 
COM1 Spearman 0.985 0.997 0.996 0.987 1.000 1.000 
COM2 Spearman 0.983 0.997 0.996 0.988 1.000 1.000 
LianghekouJinpingGuandiErtanCOM1COM2
Lianghekou Spearman 1.000 0.976 0.971 0.953 0.985 0.983 
Jinping Spearman 0.976 1.000 0.994 0.984 0.997 0.997 
Guandi Spearman 0.971 0.994 1.000 0.987 0.996 0.996 
Ertan Spearman 0.953 0.984 0.987 1.000 0.987 0.988 
COM1 Spearman 0.985 0.997 0.996 0.987 1.000 1.000 
COM2 Spearman 0.983 0.997 0.996 0.988 1.000 1.000 
Figure 7

Comparison of monthly mean runoff consistency at the four hydrological stations in the watershed (m3/s).

Figure 7

Comparison of monthly mean runoff consistency at the four hydrological stations in the watershed (m3/s).

Figure 8

Comparison of COM1 and COM2 with the monthly mean runoff at the four hydrological stations (m3/s).

Figure 8

Comparison of COM1 and COM2 with the monthly mean runoff at the four hydrological stations (m3/s).

Factor selection result

Alternative candidate predictive factors

In this study, the main influencing factors of the runoff change include antecedent runoff, antecedent rainfall, and 21 teleconnection climate factors in the Yalong River Basin (Table 6). Lagged predictive factors from the current time step t up to the previous 12 time steps (1 month) are considered sufficient for capturing key factors affecting the runoff. Here, the predictive factors from the current time t to t − 12 are selected as potential input candidates. Consequently, the alternative candidate predictive factors include comprehensive runoff index (com(t − 1), com(t − 2), … , com(t − 12)), area rainfall (rain(t − 1), rain(t − 2), … , rain(t − 12)), and 21 teleconnection climatic factors (atm1 (atm1(t − 1), atm1(t − 2),…, atm1(t − 12)), atm2 (atm2(t − 1), atm2(t − 2),…, atm2(t − 12)),…, atm21(atm21(t − 1), atm21(t 2), …, atm21(t 12))). The total number of factors is 276 (23*12).

Factor selection procedure and result

In this study, the PMI method is adopted to select all the alternative factors affecting runoff change according to the AIC. However, due to the significant influence of the autoregressive term on the factor selection, the method based on the AIC is not fully applicable to the factors selection in this study. In order to study the effects of other hydrological factors (such as rainfall, climate, and others), this study completes the factor selection on the basis of the PMI method combined with manual selection. The selection procedure includes:

  1. The PMI method is adopted to sort all the candidate factors according to the correlation size. The partial results of the primary selection are shown in Table 7.

  2. On the basis of the above primary results, the top 20 candidate factors are selected, in terms of correlation size (Table 8).

  3. On the basis of the above results, the correlation size from the top of the list of variable factors (up to the top five) is manually selected from the antecedent COM, antecedent rainfall, and antecedent teleconnection climate factors of the Yalong River Basin. New comprehensive selection results are formed, as shown in Table 9. The finally selected results show that the factors affecting the change of runoff process in the Yalong River Basin are mainly related to the lag time of 1, 2, 3, 11, and 12 months of the COM, the lag time of 1, 7, and 12 months of the area rainfall, and the lag time of 1 month of the North African Subtropical High Area Index (20 W–60E), 6 months of the Indian Ocean Warm Pool Area Index, 7 months of the Northern Hemisphere Polar Vortex Central Intensity Index (50–360), 8 months of the NINO 4 SSTA Index, and 1 month of the Indian Ocean Warm Pool Strength Index.

Table 6

The name and definition of alternative candidate predictive factors

Predictive factorsDefinition and interpretationPredictive factorsDefinition and interpretation
comprehensive runoff index (com) Describing the abundance and drought of the monthly average runoff in the watershed based on the controlling area of four hydrological stations atm11 Number of landing typhoon on China 
area rainfall (rain) Adopting the Tyson Polygon Method to construct the rain factor based on 22 meteorological stations in the Yalong River Basin atm12 NINO 3 SSTA Index 
atm1 North African Subtropical High Area Index (20 W − 60E) atm13 NINO 4 SSTA Index 
atm2 Atlantic-European Polar Vortex Area Index (430 W − 60E) atm14 NINO B SSTA Index 
atm3 Northern Hemisphere Polar Vortex Central Intensity Index (50 − 360) atm15 Indian Ocean Warm Pool Area Index 
atm4 Asian Zonal Circulation Index (IZ, 60E − 150E) atm16 Indian Ocean Warm Pool Strength Index 
atm5 East Asian Trough Intensity Index atm17 Western Pacific Warm Pool Strength Index 
atm6 West Pacific Pattern atm18 Atlantic Multi-decadal Oscillation Index 
atm7 East Atlantic/West Russia Pattern atm19 West Wind Drift Current SST Index 
atm8 Polar/Eurasia Pattern atm20 ENSO Modoki Index 
atm9 Mid-Eastern Pacific 200 mb Zonal Wind Index atm21 Tropic Indian Ocean Dipole Index 
atm10 East Pacific 850 mb Trade Wind Index   
Predictive factorsDefinition and interpretationPredictive factorsDefinition and interpretation
comprehensive runoff index (com) Describing the abundance and drought of the monthly average runoff in the watershed based on the controlling area of four hydrological stations atm11 Number of landing typhoon on China 
area rainfall (rain) Adopting the Tyson Polygon Method to construct the rain factor based on 22 meteorological stations in the Yalong River Basin atm12 NINO 3 SSTA Index 
atm1 North African Subtropical High Area Index (20 W − 60E) atm13 NINO 4 SSTA Index 
atm2 Atlantic-European Polar Vortex Area Index (430 W − 60E) atm14 NINO B SSTA Index 
atm3 Northern Hemisphere Polar Vortex Central Intensity Index (50 − 360) atm15 Indian Ocean Warm Pool Area Index 
atm4 Asian Zonal Circulation Index (IZ, 60E − 150E) atm16 Indian Ocean Warm Pool Strength Index 
atm5 East Asian Trough Intensity Index atm17 Western Pacific Warm Pool Strength Index 
atm6 West Pacific Pattern atm18 Atlantic Multi-decadal Oscillation Index 
atm7 East Atlantic/West Russia Pattern atm19 West Wind Drift Current SST Index 
atm8 Polar/Eurasia Pattern atm20 ENSO Modoki Index 
atm9 Mid-Eastern Pacific 200 mb Zonal Wind Index atm21 Tropic Indian Ocean Dipole Index 
atm10 East Pacific 850 mb Trade Wind Index   
Table 7

Partial ranking results of the correlation size of input factors

Iteration timesSelected factors (dimension)PMIAICIteration timesSelected factors (dimension)PMIAIC
253th MI = 1.06327 7359.21 16 27th 0.242582 2297.18 
276th 0.342728 7313.61 17 43th 0.242350 1759.57 
259th 0.272634 7317.04 18 164th 0.245940 1140.58 
1st 0.250970 7116.99 19 76th 0.249823 512.49 
264th 0.281160 6850.16 20 118th 0.217033 309.10 
265th 0.263007 6393.48 21 186th 0.189791 286.41 
275th 0.201745 5937.04 22 104th 0.177744 −83.84 
266th 0.159973 5842.28 23 229th 0.157730 −317.03 
174th 0.198184 5408.06 24 74th 0.119725 −1063.35 
31th 0.165484 5202.29 25 241th 0.109808 −1086.21 
10 152th 0.189054 4894.24 26 205th 0.068223 −1124.14 
11 181th 0.227255 4579.80 27 183th 0.0462753 −1127.05 
12 185th 0.221514 4408.21 28 169th 0.0408787 −1132.51 
13 267th 0.237344 4275.77 29 97th 0.0316651 −1134.98 
14 57th 0.219919 3717.66 30 157th 0.0158137 −1135.35 
15 245th 0.243911 3045.74 31 182th 0.0105117 −1135.73 
Iteration timesSelected factors (dimension)PMIAICIteration timesSelected factors (dimension)PMIAIC
253th MI = 1.06327 7359.21 16 27th 0.242582 2297.18 
276th 0.342728 7313.61 17 43th 0.242350 1759.57 
259th 0.272634 7317.04 18 164th 0.245940 1140.58 
1st 0.250970 7116.99 19 76th 0.249823 512.49 
264th 0.281160 6850.16 20 118th 0.217033 309.10 
265th 0.263007 6393.48 21 186th 0.189791 286.41 
275th 0.201745 5937.04 22 104th 0.177744 −83.84 
266th 0.159973 5842.28 23 229th 0.157730 −317.03 
174th 0.198184 5408.06 24 74th 0.119725 −1063.35 
31th 0.165484 5202.29 25 241th 0.109808 −1086.21 
10 152th 0.189054 4894.24 26 205th 0.068223 −1124.14 
11 181th 0.227255 4579.80 27 183th 0.0462753 −1127.05 
12 185th 0.221514 4408.21 28 169th 0.0408787 −1132.51 
13 267th 0.237344 4275.77 29 97th 0.0316651 −1134.98 
14 57th 0.219919 3717.66 30 157th 0.0158137 −1135.35 
15 245th 0.243911 3045.74 31 182th 0.0105117 −1135.73 
Table 8

Candidate factors ranked in the top 20 based on PMI

Sorting by PMI valueThe candidate factorsSorting by PMI valueThe candidate factors
rain(t 1), 253th 11 atm13(t − 8), 152th 
com(t − 12), 276th 12 atm16(t − 1), 181th 
rain(t − 7), 259th 13 atm16(t − 5), 185th 
atm1(t − 1), 1st 14 com(t − 3), 267th 
rain(t − 12), 264th 15 atm5(t − 9), 57th 
com(t − 1), 265th 16 atm21(t − 5), 245th 
com(t − 11), 275th 17 atm3(t − 3), 27th 
com(t2), 266th 18 atm4(t − 7), 43th 
atm15(t − 6), 174th 19 atm14(t − 8), 164th 
10 atm3(t − 7), 31th 20 atm7(t − 4), 76th 
Sorting by PMI valueThe candidate factorsSorting by PMI valueThe candidate factors
rain(t 1), 253th 11 atm13(t − 8), 152th 
com(t − 12), 276th 12 atm16(t − 1), 181th 
rain(t − 7), 259th 13 atm16(t − 5), 185th 
atm1(t − 1), 1st 14 com(t − 3), 267th 
rain(t − 12), 264th 15 atm5(t − 9), 57th 
com(t − 1), 265th 16 atm21(t − 5), 245th 
com(t − 11), 275th 17 atm3(t − 3), 27th 
com(t2), 266th 18 atm4(t − 7), 43th 
atm15(t − 6), 174th 19 atm14(t − 8), 164th 
10 atm3(t − 7), 31th 20 atm7(t − 4), 76th 
Table 9

Reduction results of input factors based on PMI

Input factorsSelected factors
Comprehensive runoff index com(t − 12), com(t − 1), com(t − 11), com(t − 2), com(t − 3) 
Rainfall factors rain(t − 1), rain(t 7), rain(t − 12) 
Climate factors atm1(t − 1), atm15(t − 6), atm3(t − 7),  atm13(t − 8),  atm16(t − 1) 
Input factorsSelected factors
Comprehensive runoff index com(t − 12), com(t − 1), com(t − 11), com(t − 2), com(t − 3) 
Rainfall factors rain(t − 1), rain(t 7), rain(t − 12) 
Climate factors atm1(t − 1), atm15(t − 6), atm3(t − 7),  atm13(t − 8),  atm16(t − 1) 

Determining the structure and parameter optimization of PDBN

Network depth of PDBN

According to Equation (31), it is assumed that the number of nodes in the hidden layer is 10. Therefore, on the basis of standardized processing of the dataset, the structure and parameters of the DBN are set as follows: the number of input variables is 13, the number of output variables is 1, it is assumed that the number of hidden nodes is 10, the learning rate is 0.01, and the number of iterations is 800. The network depth of the DBN is between 2 and 6, and the experimental results are shown in Table 10 and Figure 9.

Table 10

Training results of PDBN with different network depths

Network depthsTop level reconstruction errorBottom reconstruction errorMAPE (%)RMSEDCTime (s)
4.38–4.95 8.33–9.11 17.61 0.1753 0.8376 11.72 
4.68–5.07 9.09–9.32 21.34 0.2121 0.8126 18.44 
4.24–4.89 8.13–8.98 15.25 0.1826 0.8413 25.86 
5.32–5.51 9.29–9.55 20.25 0.2434 0.7787 32.55 
5.98–6.37 9.77–9.95 32.88 0.2697 0.7423 39.29 
Network depthsTop level reconstruction errorBottom reconstruction errorMAPE (%)RMSEDCTime (s)
4.38–4.95 8.33–9.11 17.61 0.1753 0.8376 11.72 
4.68–5.07 9.09–9.32 21.34 0.2121 0.8126 18.44 
4.24–4.89 8.13–8.98 15.25 0.1826 0.8413 25.86 
5.32–5.51 9.29–9.55 20.25 0.2434 0.7787 32.55 
5.98–6.37 9.77–9.95 32.88 0.2697 0.7423 39.29 
Figure 9

The relationship between the network depth of PDBN and the performance.

Figure 9

The relationship between the network depth of PDBN and the performance.

As can be seen from Table 10 and Figure 9, the reconstruction errors at the bottom and top of the PDBN are relatively stable in the early stage but changed greatly in the later stage as the network depth increased. The time complexity of model training also increased. When the network depth is less than or equal to 4, the comprehensive performance of the model is relatively stable. Since then, increasing the network depth leads to the fluctuation of PDBN and the increase of error accumulation.

Comprehensive comparison, when the network depth is 4 (the number of hidden layers is 3), the comprehensive performance of PDBN is the best, so this study selected 4 as the network depth of PDBN.

Structure of PDBN

According to Equation (31), the number of neurons in the hidden layer selected in this study is an integer ranging from 4 to 14. However, the number is small, and the generalization ability is weak.

In order to avoid the shortage of data feature description caused by selecting too few hidden nodes, this study increases the number of neurons in the hidden layer by 20, 30, 40, and 50 levels on the basis of the previous hidden layer node range. Finally, the training of PDBN at different hidden nodes is completed, and the MAPE, RMSE, DC, and RT are recorded for each experiment. Based on these, the optimal number of neurons in the hidden layer is determined. The experimental results are shown in Table 11.

Table 11

Training results of PDBN with different numbers of neurons in the hidden layer

Input nodesHidden layersNumber of neurons in the hidden layerMAPE (%)RMSEDCTime (s)
13 25.24 0.2867 0.7349 17.62 
30.36 0.3635 0.6548 19.35 
28.08 0.3268 0.7012 20.41 
22.14 0.3043 0.6801 21.83 
18.67 0.1986 0.8012 23.11 
24.29 0.2898 0.7134 24.40 
10 15.25 0.1826 0.8413 25.86 
11 18.43 0.2326 0.7896 27.42 
12 13.28 0.1932 0.8846 29.89 
13 19.48 0.2245 0.8112 31.34 
14 21.22 0.2876 0.7335 33.87 
20 31.12 0.3425 0.6123 39.25 
30 35.43 0.3916 0.5712 49.66 
40 40.36 0.4256 0.5467 57.68 
50 42.12 0.4416 0.5368 68.31 
Input nodesHidden layersNumber of neurons in the hidden layerMAPE (%)RMSEDCTime (s)
13 25.24 0.2867 0.7349 17.62 
30.36 0.3635 0.6548 19.35 
28.08 0.3268 0.7012 20.41 
22.14 0.3043 0.6801 21.83 
18.67 0.1986 0.8012 23.11 
24.29 0.2898 0.7134 24.40 
10 15.25 0.1826 0.8413 25.86 
11 18.43 0.2326 0.7896 27.42 
12 13.28 0.1932 0.8846 29.89 
13 19.48 0.2245 0.8112 31.34 
14 21.22 0.2876 0.7335 33.87 
20 31.12 0.3425 0.6123 39.25 
30 35.43 0.3916 0.5712 49.66 
40 40.36 0.4256 0.5467 57.68 
50 42.12 0.4416 0.5368 68.31 

As can be seen from Table 11, with the increase of neurons in the hidden layer, the RT increases. The MAPE and RMSE decrease at early stages and then begin to increase. With more than 20 neurons in the hidden layer, the performance of the PDBN decreases sharply. The performance change of PDBN is shown in Figure 10.

Figure 10

Relation curve between the number of neurons in the hidden layer and network performance.

Figure 10

Relation curve between the number of neurons in the hidden layer and network performance.

As can be seen from Figure 10, when the number neuron in the hidden layer is 12, the MAPE and RMSE corresponding to the PDBN are the smallest and the DC is the largest. This result indicates that there is a better fitting effect between the predicted value and the true value; however, the RT increases with an increasing number of neurons in the hidden layer. For a comprehensive comparison, when the number neurons in the hidden layer is 12, the PDBN is optimal (MAPE = 13.28%, RMSE = 0.1932, DC = 0.8846, t = 29.8 s). Therefore, the optimal structure of the PDBN for the mid- to long-term runoff prediction in the Yalong River Basin is determined (13-12-12-12-1).

Parameter optimization of PDBN

According to the optimization algorithm, the number of iterations of every RBM in the unsupervised stage is 300, D is 1.4, and d is 0.7. It is necessary to determine the number of extracted components of each layer from top to bottom to improve the fine-tuning precision of PDBN. Through training, the relationship between the number of extracted components c and the limited value L is as shown in Figure 11. When L = 0.02, the fine-tuning precision of PDBN is the highest, and the corresponding number of extracted components .

Figure 11

Relationship curve between the limited value L and the extracted component number c.

Figure 11

Relationship curve between the limited value L and the extracted component number c.

Evaluation of forecasting performance

In order to efficiently demonstrate the superior modeling performance of PDBN, we compare the performance of BP, SVM, and DBN models in runoff prediction. The forecasting evaluation results are comprehensively shown in Figures 1214 and Table 12 from different aspects. The predicted and observed runoff during model calibration using different data-driven models is shown in Figure 15. The relative percent error using the different methods is shown in Figure 16.

Table 12

Performance comparison of different methods

ModelStructureMAPE (%)RMSE (m3/s)DCRT (s)
BP 13-12-1 24.75 326.29 0.8775 30.92 
SVM — 24.64 360.02 0.8508 29.15 
DBN 13-12-12-12-1 41.00 277.50 0.9114 16.58 
PDBN 13-12-12-12-1 19.98 229.70 0.9393 10.32 
ModelStructureMAPE (%)RMSE (m3/s)DCRT (s)
BP 13-12-1 24.75 326.29 0.8775 30.92 
SVM — 24.64 360.02 0.8508 29.15 
DBN 13-12-12-12-1 41.00 277.50 0.9114 16.58 
PDBN 13-12-12-12-1 19.98 229.70 0.9393 10.32 
Figure 12

The prediction results using the DBN model compared with PDBN.

Figure 12

The prediction results using the DBN model compared with PDBN.

Figure 13

The prediction results using the BP model compared with PDBN.

Figure 13

The prediction results using the BP model compared with PDBN.

Figure 14

The prediction results using the SVM model compared with PDBN.

Figure 14

The prediction results using the SVM model compared with PDBN.

Figure 15

Predicted and observed runoff using different methods (m3/s).

Figure 15

Predicted and observed runoff using different methods (m3/s).

Figure 16

The relative percent error using different methods.

Figure 16

The relative percent error using different methods.

Among the evaluation results based on the dataset, the data-driven models have good prediction results on the dataset constructed by COM, rainfall factors, and climate factors, but different evaluation indicators show diverse results. Among them, for the MAPE index, the PDBN model is the best (BP is 24.75%, SVM model is 24.64%, and DBN model is 41%), which indicates the minor error between the predicted value and the observed value adopting the PDBN. For the RMSE index, the PDBN model indicates the higher prediction accuracy (PDBN is 229.7 m3/s, BP model is 326.29 m3/s, SVM model is 360.02 m3/s, DBN is 277.5 m3/s), and the RMSE from the SVM model is larger. This index is proportional to the annual flow of the river basin. For the DC index, the four data-driven models are better and the PDBN model is the best (PDBN is 0.9393, BP model is 0.8775, SVM model is 0.8508, and DBN is 0.9114), which shows these models have better fitting effect. The main reason is that the PMI method has the advantages of characterizing relationships of linear and nonlinear among factors, and thus might be appropriate for the factor selection of hydrological process with complex spatio-temporal characteristics. For the RT index, the PDBN model is the best (PDBN is 10.32 s, BP model is 30.92 s, SVM model is 29.15 s, and DBN is 16.58 s), the BP and SVM models have lower scores for this index, which demonstrates the advantage of deep learning in computation speed.

In summary, the mid- to long-term runoff prediction method presented in this study includes the construction of COM, the factor reduction using PMI, and the PDBN. Among them, the COM and factor reduction provide early data processing for runoff prediction, and the PDBN method is proposed to improve the accuracy of runoff prediction. Also, the four data-driven models have an efficient prediction for the dataset tested, and the comprehensive performance of the PDBN model is better than that of BP, SVM, and the DBN models for MAPE, RMSE, DC, RE, and RT.

CONCLUSIONS

In general, the characterization and determination of watershed runoff trends and their key factors are a difficult task. Aiming to provide a solution, this study studied the correlation between multiple factors that affect changes in mid- to long-term runoff in the watershed. A method to construct comprehensive runoff factors was proposed, and the comprehensive characterization of runoff changes in the whole watershed was realized. Then, we investigated a method for generating runoff impact factors based on PMI to obtain the key feature factors related to the runoff in the watershed, which achieved the factor selection for the mid- to long-term runoff forecasting in the Yalong River Basin.

Aiming to solve the problems caused by the highly nonlinear complex characteristics of mid- to long-term runoff forecasting in the watershed, this study proposes a mid- to long-term runoff forecasting method based on deep learning to overcome the drawback of traditional artificial neural networks and other shallow learning networks. At the same time, the dynamic optimization of structure and parameters of the DBN is studied for improving the forecasting accuracy and generalization. The novelty of our proposed method lies in the PMI-based key factor selection and the PDBN-based forecasting method.

Experimental results on the data collected in the Yalong River Basin demonstrate that our method can obtain the most consistent and correct results in mid- to long-term runoff forecasting. We acknowledge that there are many factors which are not considered in this study, such as human activities. In addition, because of the limit of condition, more recent data are not included in the study. In our future work, we will investigate more factors and more recent data which can be formulated and used for runoff forecasting. Apart from this, future work also includes investigating more robust deep learning models.

ACKNOWLEDGEMENTS

This research is supported by ‘the Fundamental Research Funds for the Central Universities’ (Grant Nos. 2017B616X14 and 2018B610X14), ‘the National Natural Science Foundation of China’ (Grant Nos. 51420105014 and 61976118), and ‘the Postgraduate Research & Practice Innovation Program of Jiangsu Province’ (Grant No. KYCX18_0583).

REFERENCES

REFERENCES
Ackley
D. H.
,
Hinton
G. E.
&
Sejnowski
T. J.
1985
A learning algorithm for Boltzmann machines
.
Cogn. Sci.
9
(
1
),
147
169
.
Alizadeh
M. J.
,
Kavianpour
M. R.
,
Kisi
O.
&
Nourani
V.
2017
A new approach for simulating and forecasting the rainfall-runoff process within the next two months
.
J. Hydrol.
548
,
588
597
.
Bai
Y.
,
Xie
J.
,
Wang
X.
&
Li
C.
2016b
Model fusion approach for monthly reservoir inflow forecasting
.
J. Hydroinform.
18
(
4
),
634
650
.
Chen
Y.
&
Han
D.
2016
Big data and hydroinformatics
.
J. Hydroinform.
18
(
4
),
599
614
.
Fotovatikhah
F.
,
Herrera
M.
,
Shamshirband
S.
,
Chau
K. W.
,
Faizollahzadeh Ardabili
S.
&
Piran
M. J.
2018
Survey of computational intelligence as basis to big flood management: challenges, research directions and future work
.
Eng. Appl. Comput. Fluid
12
(
1
),
411
437
.
Geng
Z.
,
Li
Z.
&
Han
Y.
2018
A new deep belief network based on RBM with glial chains
.
Inf. Sci.
463
,
294
306
.
Huang
W.
,
Xu
B.
&
Chan-Hilton
A.
2004
Forecasting flows in Apalachicola River using neural networks
.
Hydrol. Process.
18
(
13
),
2545
2564
.
Li
C.
,
Zhu
L.
,
He
Z.
,
Gao
H.
,
Yang
Y.
,
Yao
D.
&
Qu
X.
2019
Runoff prediction method based on adaptive elman neural network
.
Water
11
(
6
),
1113
.
May
R. J.
,
Dandy
G. C.
,
Maier
H. R.
&
Nixon
J. B.
2008a
Application of partial mutual information variable selection to ANN forecasting of water quality in water distribution systems
.
Environ. Model. Softw.
23
(
10–11
),
1289
1299
.
May
R. J.
,
Maier
H. R.
,
Dandy
G. C.
&
Fernando
T. G.
2008b
Non-linear variable selection for artificial neural networks using partial mutual information
.
Environ. Model. Softw.
23
(
10–11
),
1312
1326
.
Qiao
J.
,
Wang
G.
,
Li
W.
&
Li
X.
2018
A deep belief network with PLSR for nonlinear system modeling
.
Neural Netw.
104
,
68
79
.
Saufi
S. R.
,
Ahmad
Z. A. B.
,
Leong
M. S.
&
Lim
M. H.
2019
Challenges and opportunities of deep learning models for machinery fault detection and diagnosis: a review
.
IEEE Access
7
,
122644
122662
.
Shamshirband
S.
,
Hashemi
S.
,
Salimi
H.
,
Samadianfard
S.
,
Asadi
E.
,
Shadkani
S.
,
Kargar
K.
,
Mosavi
A.
,
Nabipour
N.
&
Chau
K. W.
2020
Predicting standardized streamflow index for hydrological drought using machine learning models
.
Eng. Appl. Comput. Fluid
14
(
1
),
339
350
.
Shoaib
M.
,
Shamseldin
A. Y.
,
Khan
S.
,
Khan
M. M.
,
Khan
Z. M.
,
Sultan
T.
&
Melville
B. W.
2018
A comparative study of various hybrid wavelet feedforward neural network models for runoff forecasting
.
Water Resour. Manage.
32
(
1
),
83
103
.
Tan
Q. F.
,
Lei
X. H.
,
Wang
X.
,
Wang
H.
,
Wen
X.
,
Ji
Y.
&
Kang
A. Q.
2018
An adaptive middle and long-term runoff forecast model using EEMD-ANN hybrid approach
.
J. Hydrol.
567
,
767
780
.
Wang
W. C.
,
Chau
K. W.
,
Xu
D. M.
,
Qiu
L.
&
Liu
C. C.
2017
The annual maximum flood peak discharge forecasting using Hermite projection pursuit regression with SSO and LS method
.
Water Resour. Manage.
31
(
1
),
461
477
.
Yao
J. T.
,
Vasilakos
A. V.
&
Pedrycz
W.
2013
Granular computing: perspectives and challenges
.
IEEE Trans. Cybern.
43
(
6
),
1977
1989
.
Yaseen
Z. M.
,
Mohtar
W. H. M. W.
,
Ameen
A. M. S.
,
Ebtehaj
I.
,
Razali
S. F. M.
,
Bonakdari
H.
,
Salih
S. Q.
,
Al-Ansari
N.
&
Shahid
S.
2019a
Implementation of univariate paradigm for streamflow simulation using hybrid data-driven model: case study in tropical region
.
IEEE Access
7
,
74471
74481
.