## Abstract

Data representation and prediction model design play an important role in mid- to long-term runoff prediction. However, it is challenging to extract key factors that accurately characterize the changes in the runoff of a river basin because of the complex nature of the runoff process. In addition, the low accuracy is another problem for mid- to long-term runoff prediction. With an aim to solve these problems, two improvements are proposed in this paper. First, the partial mutual information (PMI)-based approach was employed for estimating the importance of various factors. Second, a deep learning architecture was introduced by using the deep belief network (DBN) with partial least-squares regression (PLSR), together denoted as PDBN, for mid- to long-term runoff prediction, which solves the problem of parameter optimization for the DBN using PLSR. The novelty of the proposed method lies in the key factor selection and a novel forecasting method for mid- to long-term runoff. Experimental results demonstrated that the proposed method can significantly improve the effect of mid- to long-term runoff prediction. Also, compared with the results obtained by current state-of-the-art prediction methods, i.e., DBN, backpropagation neural networks, and support vector machine models, our prediction results demonstrate the performance of the proposed method.

## HIGHLIGHTS

Constructing the comprehensive runoff index to perform the comprehensive characterization of runoff changes in the whole watershed.

Applying the partial mutual information method for factor selection.

The PDBN is used as a deep learning approach for the mid- to long-term runoff prediction.

The proposed method can significantly improve the effect of the mid- to long-term runoff prediction.

### Graphical Abstract

## INTRODUCTION

The rapid development of information technology, especially the theory of data science and soft computing techniques, has significantly influenced the development and application of hydrology (Chen & Han 2016; Chen & Wang 2018; Ma *et al.* 2018), such as drought prediction (Shamshirband *et al.* 2020), flood prediction and management (Wang *et al.* 2017; Fotovatikhah *et al.* 2018; Mosavi *et al.* 2018), river flow forecasting (Bai *et al.* 2016b; Yaseen *et al.* 2019a, 2019b), and runoff forecasting (Valipour 2015; Tan *et al.* 2018). Among them, the mid- to long-term runoff prediction is an important research direction in hydrological forecasting and hydroinformatics. In general, the amount of basic hydrological data necessary for hydrological forecasting is increasing, both in terms of the amount of data and the number of data modes, which provides informative factors for the mid- to long-term runoff prediction. However, the increasing amount of hydrological data can introduce redundant and noisy information to the prediction feature or factor, which may deteriorate the performance of the mid- to long-term runoff prediction. At this time, traditional methods face difficulty in adapting to hydrological data analysis and runoff prediction due to insufficient noise removal and representation learning capabilities.

Data representation plays an important role in mid- to long-term runoff prediction. However, it is challenging to extract key factors that accurately characterize the changes in the runoff of a river basin because of the complex nature of the runoff process, which seriously affects the speed and accuracy of runoff prediction. Over the past decades, input factors have also evolved from considering only previous rainfall and runoff to multiple factors including runoff, rainfall, climate, and others. How to select a suitable set of key factors from the above factors is still difficult in mid- to long-term runoff prediction. In order to select key factors, some methods have been applied in runoff prediction, mainly including prior knowledge method (Huang *et al.* 2004), correlation coefficient method (Li *et al.* 2012), principal component analysis (PCA; Li *et al.* 2019), and partial mutual information (PMI; Fernando *et al.* 2009). Among them, the prior knowledge method is seldom applied due to the large amount of calculation. The methods of correlation coefficient and PCA are commonly used to describe the linear relationship between the impact factors, but difficult to characterize the complex nonlinear relationship, which limits the applicable scope of these general methods. The PMI method is based on information entropy theory, which has the advantages of characterizing relationships of linear and nonlinear among factors, and has been widely used in many fields.

Prediction model design is another important research content in mid- to long-term runoff prediction. At present, some various advanced data-driven methods have been proposed to improve the effect of mid- to long-term runoff prediction, such as the wavelet transforms (Jiang *et al.* 2018), support vector machines (Zhao *et al.* 2017), artificial neural networks (Alizadeh *et al.* 2017; Shoaib *et al.* 2018; Yaseen *et al.* 2019a, 2019b), and Bayesian statistic prediction models (Ma *et al.* 2013; Chu *et al.* 2017). Data-driven models have provided a useful tool for solving the watersheds with too complex and unknown runoff production. However, the mid- to long-term runoff prediction is low accuracy because of the highly nonlinear complex characteristics of mid- to long-term runoff process in the watershed. Moreover, an extremely high dimensional feature will be extracted if multiple factors are modeled, which significantly increases the scale of the prediction model and calculation complexity.

The important point for solving the above problems is to extract key factors that are closely related to the runoff in a river basin and design prediction model. Aiming to improve the mid- to long-term prediction of the runoff process, this study comprehensively introduces multiple hydrological factors to construct the prediction feature. Moreover, aiming to solve the problems caused by the highly nonlinear complex characteristics of mid- to long-term runoff forecasting in the watershed, and the shallow network is degenerated for the mid- to long-term runoff forecasting task, a deep belief network (DBN) combined with partial least-squares regression (PDBN) is established to learn the most representative key factors for mid- to long-term runoff prediction, which solves the problem of weight optimization for the DBN using PLSR. The main contribution of this study is three-fold:

Investigating the consistency of runoff at different hydrological stations for constructing the COM according to the granulation computing to perform the comprehensive characterization of runoff changes in the whole watershed.

Generating key factors for runoff prediction on the basis of the PMI method combined with manual selection.

Applying to the mid- to long-term runoff prediction with the proposed PDBN based on the above key factors. The novelty of the proposed PDBN lies in: (i) an adaptive learning rate is introduced to accelerate the convergence rate in the unsupervised learning stage of the DBN and (ii) the PLSR is used to determine the optimal weights in the supervised learning stage to improve prediction accuracy.

A flowchart of our proposed method is shown in Figure 1.

The remainder of this paper is structured as follows: The introduction and review of literature are presented in this section. The next two sections describe the methodology and the proposed model PDBN used in this study, respectively. Moreover, the performance indices used to evaluate the models developed in this study is also reported in the section ‘Methodology’. The data used in the study are described in the section ‘Case Study’, and results of the different developed models are elaborated in the section ‘Case Study’. The conclusions are provided in the final section.

## METHODOLOGY

The mid- to long-term runoff prediction method presented in this study consists of three parts: construction of COM, factor reduction using PMI, and DBN. In the following section, these parts will be elaborated in detail.

### Construction of COM

The main influencing factors of the runoff change within the basin include rainfall, climate, and other factors. Unlike the previous research works, we investigate the consistency of runoff at different hydrological stations for constructing the COM according to granulation computing (Yao *et al.* 2013), to characterize the abundance and drought of runoff in the basin. According to the characteristics of mid- to long-term runoff forecasting, we choose the monthly scale as the time granularity. Specifically, the study includes two parts: the average runoff consistency analysis based on information from different hydrological stations and the construction of the COM.

#### Average runoff consistency analysis based on information from different hydrological stations

Firstly, the runoff consistency analysis considering information from different hydrological stations is conducted, and then, the correlation is used to calculate the correlation of the monthly mean runoff at different hydrological stations, thereby providing the basis for the construction of the COM for the entire basin.

#### Construction of the COM

To avoid the cumulative impact between the hydrological stations, we estimate the weights of the selected hydrological stations depending on their contribution and influence on the downstream hydrological stations. The COM is usually established on the monthly mean runoff at different hydrological stations (COM1). Specifically, the proposed COM is established on the controlling area of different hydrological stations (COM2).

##### Calculation steps of COM1

##### Calculation steps of COM2

### Factor reduction using PMI

There is nonlinear correlation observed among hydrological and meteorological variables. In this case, MI can be used to select the input factors in runoff forecasting more effectively (He *et al.* 2011). However, while selecting input variables using MI, it should be taken into consideration that the coupling relationship between input variables affects the calculation results, which may lead to the mis-selection or omission of the input variables.

To solve the above problems, May *et al.* (May *et al.* 2008a; 2008b; Qian & Shu 2015) proposed the PMI methods to eliminate the relationship between variables through calculating conditional expectations, thereby ensuring the reliability and accuracy of variable selection.

#### Partial mutual information (PMI)

*H*(

*C*) represents the information entropy of variable

_{j}*C*,

_{j}*p*denotes the probability of variable

_{i}*C*at different values, and

_{j}*k*represents the number of variables.

#### PMI-based factor selection

*X*and

*Z*are input variables, and

*Y*is the output variable, then there may be correlation between input variables

*X*and

*Z*, and the value of mutual information

*I*(

*Y,Z*) may be greater than the actual value. Therefore, we can eliminate

*X*contained in variables

*Y*and

*Z*with conditional expectation, and the correlation between variables can be measured using PMI. After eliminating

*x*,

*Y*is denoted as

*u*,

*Z*is denoted as

*v*, which can be expressed as:

Specifically assuming that *C* stands for the input variable set, *Y* stands for the output variable (runoff forecasting results), *S* stands for the optimal input variable set, and stands for the candidate variable corresponding to the maximum PMI, the PMI-based variable selection can be realized as per the following steps:

Step 1. Initialize

*S*as an empty setStep 2. If

*C*is a null set, returnStep 3. Calculate the

*u**=**Y**−**m*(_{Y}*S*) by Equation (11)Step 4. For each element of , calculate by Equation (12)

Step 5. If reaches the maximum, select

*C*as the candidate variable_{s}Step 6. Calculate the Akaike Information Criterion (AIC). If the AIC is decreased, moving

*C*to the optimal input variable set_{s}*S*, and returning Step 2; Stop selection if AIC increases.

*u*stands for the

_{i}*Y*regression residual calculated from the selected variables;

*n*stands for the number of samples, and

*p*stands for the number of selected variables. With the selection of independent variables, AIC keeps decreasing. When AIC reaches the minimum, the collection of optimal independent variables is finished.

### Deep belief network

The DBN is constructed by sequential stacking of multiple restricted Boltzmann machines (RBMs) (Ackley *et al.* 1985). The learning includes two stages: unsupervised and supervised stages. The unsupervised stage is mainly used to pre-train the RBM, while in the supervised stage, the gradient-based error back propagation algorithm is used to fine tune the whole network. This method allows for particular advantages in learning the deep structure and has been successfully applied to pattern recognition, prediction, and other tasks. In addition to its advantages in weight optimization, the DBN can also be used as a probability generation model to overcome the problem of overfitting and underfitting of the model. A typical RBM is a random indirect graphical model, and a common DBN is a ubiquitous deep neural network (Figure 2).

#### Unsupervised contrastive divergence algorithm

*v*represents the states of visual neurons,

*h*represents the states of hidden neurons, and the model parameter is , so the probability distribution function based on the energy model can be defined as: where

*Z*is the normalization factor:

*b*, and the bias value of the hidden layer is

_{i}*c*. Therefore, the conditional probability distribution can be described as: where

_{j}*σ*is activation function, and .

#### Supervised learning

*α*is the number of iterations and

*K*is the number of samples. The changed weights are as follows: where

*η*is the learning rate.

Thus, the updated weight between the output layer and the last hidden layer can be obtained, and the weight of the whole DBN model can be obtained by fine-tuning layer by layer.

### Evaluation criteria

*y*represents the observed value of the COM in the Yalong River basin at

^{t}*t*, denotes the predicted value at

*t*, and

*n*is the sample data. represents the mean of the observed value of the COM in the Yalong River basin.

The MAPE is a most frequently used statistics index and employed for examining the error between the predicted value and the observed value. The RMSE is used to measure the predicted value accuracy. A RMSE value of 0 indicates perfect match between the predicted value and the observed value. Likewise, the DC provides a measure of the capacity of the model to predict observed values. In general, high values of DC (up to 1) and small values for RMSE indicate a good model. The RE is used to reflect the reliability of the prediction model. In order to better estimate the calculation speed of the PDBN model, we add the running time (RT) index.

## THE PROPOSED PDBN MODEL FOR MID- TO LONG-TERM RUNOFF PREDICTION

The reasons for runoff formation are always complex and are closely related to the surrounding hydrology, meteorology, climatic factors, economy, human activities, and many other factors. Furthermore, the main influencing factors are different for different basins and have complex nonlinear characteristics. Therefore, the mid- to long-term runoff prediction of watershed requires the abilities of nonlinear processing and generalization. Deep learning can extract feature information of samples layer by layer from massive amounts of data to obtain key feature factors and has better prediction speed and accuracy than the traditional shallow network. Deep learning has been widely used in many fields such as image processing, artificial intelligence, and there is little application of deep learning to the field of runoff prediction (Bai *et al.* 2016a; Haidar & Verma 2018).

This study aims to apply deep learning to mid- to long-term runoff prediction. To explore and attempt the application of data science theory to hydrology, a method for mid- to long-term runoff prediction based on PDBN is proposed, and the architecture and learning algorithm of the model are analyzed in detail.

### Determining the network depth of the DBN

The suitability of parameter selection in the DBN is related to the prediction performance; therefore, the determination of network depth is the first problem to be solved. When the network depth of the DBN is large, more data features can be excavated, but the computational complexity is increased, and the performance is worse. At present, the exact methods for determining the network depth of a DBN include trial-and-error methods and parameter optimization methods. These methods search the most suitable network depth via many experiments. This study adopts the parameter optimization method by fixing the number of hidden nodes and the learning rate, and then determining the optimal network depth of the DBN by combining the particle swarm optimization (PSO) algorithm.

In Equation (31), *l* is the number of nodes in the hidden layer, *m* is the number of input variables, *n* is the number of output variables, and *α* is an integer ranging from 0 to 10.

*X*is the

*i*th batch matrix,

*S*is the value after the reconstruction of the

_{i}*i*th batch matrix, and RE is the total error of the reconstruction.

It can be seen from Figure 3 that the DBN may judge whether the hidden layer can better depict the data characteristics of the visual layer. The DBN does this by reconstructing the input sample data and updating the hidden nodes by increasing the network depth of DBN and weight optimization, so that the reconstructed error is a minimum. Then, the label data is output.

### Optimization algorithm in unsupervised stage

The CD calculation is based on Gibbs sampling. Gibbs sampling is used in Markov chain Monte Carlo statistical analysis and approximately extracts the sample sequence from a multivariable probability distribution under the condition that it is difficult to sample directly (Geng *et al.* 2018). The process of Gibbs sampling is shown in Figure 4. The main steps are as follows:

The Gibbs chain is initialized with the observation sample data

*v*and obtaining the input vector*v*^{0}of the visible layer.Referring to Equations (19) and (20),

*h*^{(t)}is sampled from and*v*^{(t+1)}from in turn. Among them,*t*is the number of sampling steps, and the effect gets better as (*t**+*1) gets larger.

is acquired by sampling and a preliminary estimation of , and using to estimate for the CD-1 algorithm. In the CD-1 algorithm, because all RBMs require many iterations and the direction of parameter update is different, if the learning rate remains unchanged, ‘premature’ phenomenon may appear in the algorithm or it may be difficult to converge, so selecting the adaptive learning rate according to the direction of the parameter update is essential to unsupervised learning.

*et al.*2018; Qiao

*et al.*2018; Saufi

*et al.*2019), this study designs an adaptive learning rate method according to the similarities and differences of parameter updating directions after two consecutive iterations in the training process of RBM. The principle is that when the parameter updating direction (positive or negative changes) is the same after two consecutive iterations, the learning rate will increase, or conversely, it will decrease. The adaptive updating mechanism of learning rate is as follows: Among them,

*D*is the enhancement coefficient of learning rate,

*d*is the decrease coefficient of learning rate, and .

Based on the CD algorithm, the weights of the DBN are updated once every Gibbs sampling period, while the binary sampling of the intermediate state is carried out twice, and the weights of each update are proportional to the state sampling, so , . is the probability threshold ranging from 0.5 to 1, and the general value is 0.7. As a result, the *D* values range from 1 to 2 when the general value is 1.4, and the *d* values range from 0.5 to 1 when the general value is 0.7.

### Optimization algorithm in the supervised stage

PLSR is a many-to-many regression model that considers both independent variables and dependent variables. Therefore, it is a supervised learning model. Especially when facing multiple variables, multiple correlations, and a relatively insufficiently observed sample, the PLSR has the advantages that traditional classical regression analysis and other methods do not have (Qiao *et al.* 2018). Therefore, this study conducts supervised fine-tuning for the DBN via PLSR. Starting from the output layer, the output layer and the last hidden layer comprise the first group, and so on, such that a regression model is established for every two layers until the first hidden layer and the input layer is the last group, and supervised fine-tuning is achieved. Taking the output layer and the last hidden layer as an example to establish the regression model, the specific steps include the following:

- Assuming that the actual output,
*y*, of the model is a dependent variable, the dimension of the output matrix is_{s}*p*, the state,*h*, of the last hidden layer is the independent variable, and the dimension of the state matrix is_{l}*q*, then: where*Y*is the observation matrix of the*N*-sample.*X*is the state observation matrix of the last hidden layer. - This step calculates the values of
*α*_{1}and*β*_{1}when the correlation between and is strongest, and converts the inner product for calculating score vectors into a conditional extreme solution.Then, the score vectors and are obtained corresponding to the optimal parameters

*α*_{1}and*β*_{1}. The above steps are repeated using residual matrices

*X*_{1}and*Y*_{1}instead of*X*and*Y*, respectively.

Substituting into (38), and obtaining the PLSR equation between the desired output and the last hidden layer.

Finally, this step obtains the weight *w*_{out} between the output layer and the last hidden layer.

*i*th state of the last hidden layer of the DBN network every time, and realizing modeling based on PLSR for the remaining (

*n*− 1) states. Then, the regression equation including components is obtained, and finally, the eliminated

*i*th state is substituted into the fitted regression equation , and the corresponding output is obtained. This process repeats until the

*n*th state is completed. The error square sum of the

*j*th predicted output is defined as where

*y*is the expected output, then the error square sum of predicted output is defined as After that, the regression equation including

_{ij}*c*components is fitted by all the states of the last hidden layer of the DBN. Noting that the predicted output corresponding to the

*i*th state is , then the error square sum of predicted output can be defined as

The value of *L* is inversely proportional to that of *c*, so *c* can be determined by choosing the appropriate *L*.

After the above steps, the fine-tuning process between the output layer and the last hidden layer is accomplished during the supervised learning stage. By analogy, the fine-tuning process of the whole DBN is finally completed, and the network weight is obtained.

In summary, the mid- to long-term runoff prediction by combining the DBN and PLSR can be divided into three steps: firstly, determining the number of hidden layer nodes and network learning rate, and adopting the PSO algorithm for searching the optimal network depth of the DBN in different depths; secondly, introducing the adaptive learning rate for improving the convergence speed of the DBN in the unsupervised training stage; and finally, adopting the PLSR to avoid easily falling into local optimum and to accelerate training speed in the supervised learning stage, so as to improve the fine-tuning accuracy of the DBN network.

## CASE STUDY

### Study area and data

The Yalong River (25°12–34°9N, 96°47–102°42E) is the largest tributary of the Jinsha River in the southern region of the Tibetan Plateau. The total basin area of the Yalong River is approximately 136,000 km^{2}, and the mainstream length is 1,571 km. The maximum altitude is greater than 6,000 m, and the minimum altitude is less than 1,000 m. The mean annual temperature is approximately 2.5 °C. The mean annual precipitation is approximately 600–1,800 mm decreasing from the south to the north. The average annual estuarine discharge is 1,860 m^{3}/s. The major vegetation types are forests, shrubs, and meadows (Ye *et al.* 2017). The geography of the Yalong River is shown in Figure 5.

According to the time span characteristics of hydrometeorological data, the relevant data of the mid-long runoff prediction in the Yalong River Basin include: (1) 130 atmospheric circulation factors (selecting 21 teleconnection climatic factors) for the period from January 1951 to December 2012; (2) meteorological data such as pressure, temperature, humidity, precipitation, wind speed, and sunshine from January 1960 to September 2012; (3) runoff data of Lianghekou, Jinping, Guandi, and Ertan hydrological stations from January 1960 to December 2016. In this study, the dataset is 624 sets of samples from the period of January 1960 to December 2011, and it is divided into two parts: the training set (from January 1960 to December 2001) and the test set (from January 2002 to December 2011) (Figure 6).

### Model development

#### BP neural networks

In this study, the BP neural networks are used as a competitive model for point forecasts. In BP neural networks, the number of hidden layer nodes is 12, and the training function is ‘tansig’, the learning function is ‘logsig’, the maximum training times are 600, the learning rate is 0.1, the model training adopts LM algorithm, the momentum factor is 0.9, and the expected error is 0.001.

#### Support vector machine

The least-squares support vector machine (LS-SVM) is used in this study. In SVM, the kernel function is ‘sigmoid’, and regularization parameters and kernel function's parameters for SVM are optimized using the grid-search algorithm with cross-validation. The LS-SVM toolbox is available on http://www.esat.kuleuven.be/sista/lssvmlab/.

### Results and discussion

In this section, the proposed approach for mid- to long-term runoff prediction is applied, including COM selection result, factor selection result, and the PDBN model. The forecasting performances of PDBN are also compared with the application of the DBN, BP, and SVM to demonstrate the superiority using five common criteria.

#### COM selection result

After analyzing the consistency of the four hydrological stations (Lianghekou, Jinping, Guandi, and Ertan hydrological stations) in the Yalong River basin, the results show that there is a high correlation among the average runoff of the four hydrological stations on a monthly scale (Figure 7). Therefore, we construct the COM (coarse-grained) based on the monthly mean runoff of the four hydrological stations (fine-grained) to describe the abundance and drought of the monthly average runoff in the watershed. In the section ‘Construction of Comprehensive Runoff Index’, parameters for the COM1 and COM2 are shown in Tables 1 and 2, respectively. Comparison of COM1 and COM2 with the monthly mean runoff at the four hydrological stations (Figure 8) and the results of PCC, Kendalls Tau-b, and Spearman correlation coefficient are shown in Tables 3–5, respectively. Results of the correlation analysis between the COM1 and COM2 on the monthly scale indicate that the correlation of COM2 is higher. Consequently, this study selects the COM2 as the comprehensive runoff index of the Yalong River Basin.

Station . | Lianghekou . | Jinping . | Guandi . | Ertan . |
---|---|---|---|---|

Monthly mean runoff | 661.54 | 1208.65 | 1416.25 | 1617.32 |

Normalization | 0.134906 | 0.246474 | 0.288808 | 0.329812 |

Runoff impact index | 7.412580 | 4.057225 | 3.462504 | 3.032031 |

Weight normalization | 0.412627 | 0.225849 | 0.192743 | 0.168781 |

Station . | Lianghekou . | Jinping . | Guandi . | Ertan . |
---|---|---|---|---|

Monthly mean runoff | 661.54 | 1208.65 | 1416.25 | 1617.32 |

Normalization | 0.134906 | 0.246474 | 0.288808 | 0.329812 |

Runoff impact index | 7.412580 | 4.057225 | 3.462504 | 3.032031 |

Weight normalization | 0.412627 | 0.225849 | 0.192743 | 0.168781 |

Station . | Lianghekou . | Jinping . | Guandi . | Ertan . |
---|---|---|---|---|

Percentage of controlled area of hydrological stations | 51% | 81% | 89% | 97% |

Controlled area impact index | 1.960784 | 1.234568 | 1.123596 | 1.030928 |

Weight normalization | 0.366510 | 0.230766 | 0.210023 | 0.192701 |

Station . | Lianghekou . | Jinping . | Guandi . | Ertan . |
---|---|---|---|---|

Percentage of controlled area of hydrological stations | 51% | 81% | 89% | 97% |

Controlled area impact index | 1.960784 | 1.234568 | 1.123596 | 1.030928 |

Weight normalization | 0.366510 | 0.230766 | 0.210023 | 0.192701 |

. | Lianghekou . | Jinping . | Guandi . | Ertan . | COM1 . | COM2 . | |
---|---|---|---|---|---|---|---|

Lianghekou | PCC | 1.000 | 0.966 | 0.963 | 0.947 | 0.977 | 0.975 |

Jinping | PCC | 0.966 | 1.000 | 0.998 | 0.993 | 0.998 | 0.998 |

Guandi | PCC | 0.963 | 0.998 | 1.000 | 0.994 | 0.997 | 0.998 |

Ertan | PCC | 0.947 | 0.993 | 0.994 | 1.000 | 0.992 | 0.993 |

COM1 | PCC | 0.977 | 0.998 | 0.997 | 0.992 | 1.000 | 1.000 |

COM2 | PCC | 0.975 | 0.998 | 0.998 | 0.993 | 1.000 | 1.000 |

. | Lianghekou . | Jinping . | Guandi . | Ertan . | COM1 . | COM2 . | |
---|---|---|---|---|---|---|---|

Lianghekou | PCC | 1.000 | 0.966 | 0.963 | 0.947 | 0.977 | 0.975 |

Jinping | PCC | 0.966 | 1.000 | 0.998 | 0.993 | 0.998 | 0.998 |

Guandi | PCC | 0.963 | 0.998 | 1.000 | 0.994 | 0.997 | 0.998 |

Ertan | PCC | 0.947 | 0.993 | 0.994 | 1.000 | 0.992 | 0.993 |

COM1 | PCC | 0.977 | 0.998 | 0.997 | 0.992 | 1.000 | 1.000 |

COM2 | PCC | 0.975 | 0.998 | 0.998 | 0.993 | 1.000 | 1.000 |

. | . | Lianghekou . | Jinping . | Guandi . | Ertan . | COM1 . | COM2 . |
---|---|---|---|---|---|---|---|

Lianghekou | Tau-b | 1.000 | 0.862 | 0.849 | 0.808 | 0.890 | 0.883 |

Jinping | Tau-b | 0.862 | 1.000 | 0.946 | 0.901 | 0.956 | 0.958 |

Guandi | Tau-b | 0.849 | 0.946 | 1.000 | 0.913 | 0.950 | 0.954 |

Ertan | Tau-b | 0.808 | 0.901 | 0.913 | 1.000 | 0.909 | 0.915 |

COM1 | Tau-b | 0.890 | 0.956 | 0.950 | 0.909 | 1.000 | 0.994 |

COM2 | Tau-b | 0.883 | 0.958 | 0.954 | 0.915 | 0.994 | 1.000 |

. | . | Lianghekou . | Jinping . | Guandi . | Ertan . | COM1 . | COM2 . |
---|---|---|---|---|---|---|---|

Lianghekou | Tau-b | 1.000 | 0.862 | 0.849 | 0.808 | 0.890 | 0.883 |

Jinping | Tau-b | 0.862 | 1.000 | 0.946 | 0.901 | 0.956 | 0.958 |

Guandi | Tau-b | 0.849 | 0.946 | 1.000 | 0.913 | 0.950 | 0.954 |

Ertan | Tau-b | 0.808 | 0.901 | 0.913 | 1.000 | 0.909 | 0.915 |

COM1 | Tau-b | 0.890 | 0.956 | 0.950 | 0.909 | 1.000 | 0.994 |

COM2 | Tau-b | 0.883 | 0.958 | 0.954 | 0.915 | 0.994 | 1.000 |

. | . | Lianghekou . | Jinping . | Guandi . | Ertan . | COM1 . | COM2 . |
---|---|---|---|---|---|---|---|

Lianghekou | Spearman | 1.000 | 0.976 | 0.971 | 0.953 | 0.985 | 0.983 |

Jinping | Spearman | 0.976 | 1.000 | 0.994 | 0.984 | 0.997 | 0.997 |

Guandi | Spearman | 0.971 | 0.994 | 1.000 | 0.987 | 0.996 | 0.996 |

Ertan | Spearman | 0.953 | 0.984 | 0.987 | 1.000 | 0.987 | 0.988 |

COM1 | Spearman | 0.985 | 0.997 | 0.996 | 0.987 | 1.000 | 1.000 |

COM2 | Spearman | 0.983 | 0.997 | 0.996 | 0.988 | 1.000 | 1.000 |

. | . | Lianghekou . | Jinping . | Guandi . | Ertan . | COM1 . | COM2 . |
---|---|---|---|---|---|---|---|

Lianghekou | Spearman | 1.000 | 0.976 | 0.971 | 0.953 | 0.985 | 0.983 |

Jinping | Spearman | 0.976 | 1.000 | 0.994 | 0.984 | 0.997 | 0.997 |

Guandi | Spearman | 0.971 | 0.994 | 1.000 | 0.987 | 0.996 | 0.996 |

Ertan | Spearman | 0.953 | 0.984 | 0.987 | 1.000 | 0.987 | 0.988 |

COM1 | Spearman | 0.985 | 0.997 | 0.996 | 0.987 | 1.000 | 1.000 |

COM2 | Spearman | 0.983 | 0.997 | 0.996 | 0.988 | 1.000 | 1.000 |

#### Factor selection result

##### Alternative candidate predictive factors

In this study, the main influencing factors of the runoff change include antecedent runoff, antecedent rainfall, and 21 teleconnection climate factors in the Yalong River Basin (Table 6). Lagged predictive factors from the current time step *t* up to the previous 12 time steps (1 month) are considered sufficient for capturing key factors affecting the runoff. Here, the predictive factors from the current time *t* to *t* − 12 are selected as potential input candidates. Consequently, the alternative candidate predictive factors include comprehensive runoff index (*com*(*t* − 1), *com*(*t* − 2), … , *com*(*t* − 12)), area rainfall (*rain*(*t* − 1), *rain*(*t* − 2), … , *rain*(*t* − 12)), and 21 teleconnection climatic factors (*atm*_{1} (*atm*_{1}(*t* − 1), *atm*_{1}(*t* − 2),…, *atm*_{1}(*t* − 12)), *atm*_{2} (*atm*_{2}(*t* − 1), *atm*_{2}(*t* − 2),…, *atm*_{2}(*t* − 12)),…, *atm*_{21}(*atm*_{21}(*t* − 1), *atm*_{21}(*t**−* 2), …, *atm*_{21}(*t**−* 12))). The total number of factors is 276 (23*12).

##### Factor selection procedure and result

In this study, the PMI method is adopted to select all the alternative factors affecting runoff change according to the AIC. However, due to the significant influence of the autoregressive term on the factor selection, the method based on the AIC is not fully applicable to the factors selection in this study. In order to study the effects of other hydrological factors (such as rainfall, climate, and others), this study completes the factor selection on the basis of the PMI method combined with manual selection. The selection procedure includes:

The PMI method is adopted to sort all the candidate factors according to the correlation size. The partial results of the primary selection are shown in Table 7.

On the basis of the above primary results, the top 20 candidate factors are selected, in terms of correlation size (Table 8).

On the basis of the above results, the correlation size from the top of the list of variable factors (up to the top five) is manually selected from the antecedent COM, antecedent rainfall, and antecedent teleconnection climate factors of the Yalong River Basin. New comprehensive selection results are formed, as shown in Table 9. The finally selected results show that the factors affecting the change of runoff process in the Yalong River Basin are mainly related to the lag time of 1, 2, 3, 11, and 12 months of the COM, the lag time of 1, 7, and 12 months of the area rainfall, and the lag time of 1 month of the North African Subtropical High Area Index (20 W–60E), 6 months of the Indian Ocean Warm Pool Area Index, 7 months of the Northern Hemisphere Polar Vortex Central Intensity Index (50–360), 8 months of the NINO 4 SSTA Index, and 1 month of the Indian Ocean Warm Pool Strength Index.

Predictive factors . | Definition and interpretation . | Predictive factors . | Definition and interpretation . |
---|---|---|---|

comprehensive runoff index (com) | Describing the abundance and drought of the monthly average runoff in the watershed based on the controlling area of four hydrological stations | atm_{11} | Number of landing typhoon on China |

area rainfall (rain) | Adopting the Tyson Polygon Method to construct the rain factor based on 22 meteorological stations in the Yalong River Basin | atm_{12} | NINO 3 SSTA Index |

atm_{1} | North African Subtropical High Area Index (20 W − 60E) | atm_{13} | NINO 4 SSTA Index |

atm_{2} | Atlantic-European Polar Vortex Area Index (430 W − 60E) | atm_{14} | NINO B SSTA Index |

atm_{3} | Northern Hemisphere Polar Vortex Central Intensity Index (50 − 360) | atm_{15} | Indian Ocean Warm Pool Area Index |

atm_{4} | Asian Zonal Circulation Index (IZ, 60E − 150E) | atm_{16} | Indian Ocean Warm Pool Strength Index |

atm_{5} | East Asian Trough Intensity Index | atm_{17} | Western Pacific Warm Pool Strength Index |

atm_{6} | West Pacific Pattern | atm_{18} | Atlantic Multi-decadal Oscillation Index |

atm_{7} | East Atlantic/West Russia Pattern | atm_{19} | West Wind Drift Current SST Index |

atm_{8} | Polar/Eurasia Pattern | atm_{20} | ENSO Modoki Index |

atm_{9} | Mid-Eastern Pacific 200 mb Zonal Wind Index | atm_{21} | Tropic Indian Ocean Dipole Index |

atm_{10} | East Pacific 850 mb Trade Wind Index |

Predictive factors . | Definition and interpretation . | Predictive factors . | Definition and interpretation . |
---|---|---|---|

comprehensive runoff index (com) | Describing the abundance and drought of the monthly average runoff in the watershed based on the controlling area of four hydrological stations | atm_{11} | Number of landing typhoon on China |

area rainfall (rain) | Adopting the Tyson Polygon Method to construct the rain factor based on 22 meteorological stations in the Yalong River Basin | atm_{12} | NINO 3 SSTA Index |

atm_{1} | North African Subtropical High Area Index (20 W − 60E) | atm_{13} | NINO 4 SSTA Index |

atm_{2} | Atlantic-European Polar Vortex Area Index (430 W − 60E) | atm_{14} | NINO B SSTA Index |

atm_{3} | Northern Hemisphere Polar Vortex Central Intensity Index (50 − 360) | atm_{15} | Indian Ocean Warm Pool Area Index |

atm_{4} | Asian Zonal Circulation Index (IZ, 60E − 150E) | atm_{16} | Indian Ocean Warm Pool Strength Index |

atm_{5} | East Asian Trough Intensity Index | atm_{17} | Western Pacific Warm Pool Strength Index |

atm_{6} | West Pacific Pattern | atm_{18} | Atlantic Multi-decadal Oscillation Index |

atm_{7} | East Atlantic/West Russia Pattern | atm_{19} | West Wind Drift Current SST Index |

atm_{8} | Polar/Eurasia Pattern | atm_{20} | ENSO Modoki Index |

atm_{9} | Mid-Eastern Pacific 200 mb Zonal Wind Index | atm_{21} | Tropic Indian Ocean Dipole Index |

atm_{10} | East Pacific 850 mb Trade Wind Index |

Iteration times . | Selected factors (dimension) . | PMI . | AIC . | Iteration times . | Selected factors (dimension) . | PMI . | AIC . |
---|---|---|---|---|---|---|---|

0 | 253th | MI = 1.06327 | 7359.21 | 16 | 27th | 0.242582 | 2297.18 |

1 | 276th | 0.342728 | 7313.61 | 17 | 43th | 0.242350 | 1759.57 |

2 | 259th | 0.272634 | 7317.04 | 18 | 164th | 0.245940 | 1140.58 |

3 | 1st | 0.250970 | 7116.99 | 19 | 76th | 0.249823 | 512.49 |

4 | 264th | 0.281160 | 6850.16 | 20 | 118th | 0.217033 | 309.10 |

5 | 265th | 0.263007 | 6393.48 | 21 | 186th | 0.189791 | 286.41 |

6 | 275th | 0.201745 | 5937.04 | 22 | 104th | 0.177744 | −83.84 |

7 | 266th | 0.159973 | 5842.28 | 23 | 229th | 0.157730 | −317.03 |

8 | 174th | 0.198184 | 5408.06 | 24 | 74th | 0.119725 | −1063.35 |

9 | 31th | 0.165484 | 5202.29 | 25 | 241th | 0.109808 | −1086.21 |

10 | 152th | 0.189054 | 4894.24 | 26 | 205th | 0.068223 | −1124.14 |

11 | 181th | 0.227255 | 4579.80 | 27 | 183th | 0.0462753 | −1127.05 |

12 | 185th | 0.221514 | 4408.21 | 28 | 169th | 0.0408787 | −1132.51 |

13 | 267th | 0.237344 | 4275.77 | 29 | 97th | 0.0316651 | −1134.98 |

14 | 57th | 0.219919 | 3717.66 | 30 | 157th | 0.0158137 | −1135.35 |

15 | 245th | 0.243911 | 3045.74 | 31 | 182th | 0.0105117 | −1135.73 |

Iteration times . | Selected factors (dimension) . | PMI . | AIC . | Iteration times . | Selected factors (dimension) . | PMI . | AIC . |
---|---|---|---|---|---|---|---|

0 | 253th | MI = 1.06327 | 7359.21 | 16 | 27th | 0.242582 | 2297.18 |

1 | 276th | 0.342728 | 7313.61 | 17 | 43th | 0.242350 | 1759.57 |

2 | 259th | 0.272634 | 7317.04 | 18 | 164th | 0.245940 | 1140.58 |

3 | 1st | 0.250970 | 7116.99 | 19 | 76th | 0.249823 | 512.49 |

4 | 264th | 0.281160 | 6850.16 | 20 | 118th | 0.217033 | 309.10 |

5 | 265th | 0.263007 | 6393.48 | 21 | 186th | 0.189791 | 286.41 |

6 | 275th | 0.201745 | 5937.04 | 22 | 104th | 0.177744 | −83.84 |

7 | 266th | 0.159973 | 5842.28 | 23 | 229th | 0.157730 | −317.03 |

8 | 174th | 0.198184 | 5408.06 | 24 | 74th | 0.119725 | −1063.35 |

9 | 31th | 0.165484 | 5202.29 | 25 | 241th | 0.109808 | −1086.21 |

10 | 152th | 0.189054 | 4894.24 | 26 | 205th | 0.068223 | −1124.14 |

11 | 181th | 0.227255 | 4579.80 | 27 | 183th | 0.0462753 | −1127.05 |

12 | 185th | 0.221514 | 4408.21 | 28 | 169th | 0.0408787 | −1132.51 |

13 | 267th | 0.237344 | 4275.77 | 29 | 97th | 0.0316651 | −1134.98 |

14 | 57th | 0.219919 | 3717.66 | 30 | 157th | 0.0158137 | −1135.35 |

15 | 245th | 0.243911 | 3045.74 | 31 | 182th | 0.0105117 | −1135.73 |

Sorting by PMI value . | The candidate factors . | Sorting by PMI value . | The candidate factors . |
---|---|---|---|

1 | rain(t− 1), 253th | 11 | atm_{13}(t − 8), 152th |

2 | com(t − 12), 276th | 12 | atm_{16}(t − 1), 181th |

3 | rain(t − 7), 259th | 13 | atm_{16}(t − 5), 185th |

4 | atm_{1}(t − 1), 1st | 14 | com(t − 3), 267th |

5 | rain(t − 12), 264th | 15 | atm_{5}(t − 9), 57th |

6 | com(t − 1), 265th | 16 | atm_{21}(t − 5), 245th |

7 | com(t − 11), 275th | 17 | atm_{3}(t − 3), 27th |

8 | com(t−2), 266th | 18 | atm_{4}(t − 7), 43th |

9 | atm_{15}(t − 6), 174th | 19 | atm_{14}(t − 8), 164th |

10 | atm_{3}(t − 7), 31th | 20 | atm_{7}(t − 4), 76th |

Sorting by PMI value . | The candidate factors . | Sorting by PMI value . | The candidate factors . |
---|---|---|---|

1 | rain(t− 1), 253th | 11 | atm_{13}(t − 8), 152th |

2 | com(t − 12), 276th | 12 | atm_{16}(t − 1), 181th |

3 | rain(t − 7), 259th | 13 | atm_{16}(t − 5), 185th |

4 | atm_{1}(t − 1), 1st | 14 | com(t − 3), 267th |

5 | rain(t − 12), 264th | 15 | atm_{5}(t − 9), 57th |

6 | com(t − 1), 265th | 16 | atm_{21}(t − 5), 245th |

7 | com(t − 11), 275th | 17 | atm_{3}(t − 3), 27th |

8 | com(t−2), 266th | 18 | atm_{4}(t − 7), 43th |

9 | atm_{15}(t − 6), 174th | 19 | atm_{14}(t − 8), 164th |

10 | atm_{3}(t − 7), 31th | 20 | atm_{7}(t − 4), 76th |

Input factors . | Selected factors . |
---|---|

Comprehensive runoff index | com(t − 12), com(t − 1), com(t − 11), com(t − 2), com(t − 3) |

Rainfall factors | rain(t − 1), rain(t− 7), rain(t − 12) |

Climate factors | atm_{1}(t − 1), atm_{15}(t − 6), atm_{3}(t − 7), atm_{13}(t − 8), atm_{16}(t − 1) |

Input factors . | Selected factors . |
---|---|

Comprehensive runoff index | com(t − 12), com(t − 1), com(t − 11), com(t − 2), com(t − 3) |

Rainfall factors | rain(t − 1), rain(t− 7), rain(t − 12) |

Climate factors | atm_{1}(t − 1), atm_{15}(t − 6), atm_{3}(t − 7), atm_{13}(t − 8), atm_{16}(t − 1) |

#### Determining the structure and parameter optimization of PDBN

##### Network depth of PDBN

According to Equation (31), it is assumed that the number of nodes in the hidden layer is 10. Therefore, on the basis of standardized processing of the dataset, the structure and parameters of the DBN are set as follows: the number of input variables is 13, the number of output variables is 1, it is assumed that the number of hidden nodes is 10, the learning rate is 0.01, and the number of iterations is 800. The network depth of the DBN is between 2 and 6, and the experimental results are shown in Table 10 and Figure 9.

Network depths . | Top level reconstruction error . | Bottom reconstruction error . | MAPE (%) . | RMSE . | DC . | Time (s) . |
---|---|---|---|---|---|---|

2 | 4.38–4.95 | 8.33–9.11 | 17.61 | 0.1753 | 0.8376 | 11.72 |

3 | 4.68–5.07 | 9.09–9.32 | 21.34 | 0.2121 | 0.8126 | 18.44 |

4 | 4.24–4.89 | 8.13–8.98 | 15.25 | 0.1826 | 0.8413 | 25.86 |

5 | 5.32–5.51 | 9.29–9.55 | 20.25 | 0.2434 | 0.7787 | 32.55 |

6 | 5.98–6.37 | 9.77–9.95 | 32.88 | 0.2697 | 0.7423 | 39.29 |

Network depths . | Top level reconstruction error . | Bottom reconstruction error . | MAPE (%) . | RMSE . | DC . | Time (s) . |
---|---|---|---|---|---|---|

2 | 4.38–4.95 | 8.33–9.11 | 17.61 | 0.1753 | 0.8376 | 11.72 |

3 | 4.68–5.07 | 9.09–9.32 | 21.34 | 0.2121 | 0.8126 | 18.44 |

4 | 4.24–4.89 | 8.13–8.98 | 15.25 | 0.1826 | 0.8413 | 25.86 |

5 | 5.32–5.51 | 9.29–9.55 | 20.25 | 0.2434 | 0.7787 | 32.55 |

6 | 5.98–6.37 | 9.77–9.95 | 32.88 | 0.2697 | 0.7423 | 39.29 |

As can be seen from Table 10 and Figure 9, the reconstruction errors at the bottom and top of the PDBN are relatively stable in the early stage but changed greatly in the later stage as the network depth increased. The time complexity of model training also increased. When the network depth is less than or equal to 4, the comprehensive performance of the model is relatively stable. Since then, increasing the network depth leads to the fluctuation of PDBN and the increase of error accumulation.

Comprehensive comparison, when the network depth is 4 (the number of hidden layers is 3), the comprehensive performance of PDBN is the best, so this study selected 4 as the network depth of PDBN.

##### Structure of PDBN

According to Equation (31), the number of neurons in the hidden layer selected in this study is an integer ranging from 4 to 14. However, the number is small, and the generalization ability is weak.

In order to avoid the shortage of data feature description caused by selecting too few hidden nodes, this study increases the number of neurons in the hidden layer by 20, 30, 40, and 50 levels on the basis of the previous hidden layer node range. Finally, the training of PDBN at different hidden nodes is completed, and the MAPE, RMSE, DC, and RT are recorded for each experiment. Based on these, the optimal number of neurons in the hidden layer is determined. The experimental results are shown in Table 11.

Input nodes . | Hidden layers . | Number of neurons in the hidden layer . | MAPE (%) . | RMSE . | DC . | Time (s) . |
---|---|---|---|---|---|---|

13 | 3 | 4 | 25.24 | 0.2867 | 0.7349 | 17.62 |

5 | 30.36 | 0.3635 | 0.6548 | 19.35 | ||

6 | 28.08 | 0.3268 | 0.7012 | 20.41 | ||

7 | 22.14 | 0.3043 | 0.6801 | 21.83 | ||

8 | 18.67 | 0.1986 | 0.8012 | 23.11 | ||

9 | 24.29 | 0.2898 | 0.7134 | 24.40 | ||

10 | 15.25 | 0.1826 | 0.8413 | 25.86 | ||

11 | 18.43 | 0.2326 | 0.7896 | 27.42 | ||

12 | 13.28 | 0.1932 | 0.8846 | 29.89 | ||

13 | 19.48 | 0.2245 | 0.8112 | 31.34 | ||

14 | 21.22 | 0.2876 | 0.7335 | 33.87 | ||

20 | 31.12 | 0.3425 | 0.6123 | 39.25 | ||

30 | 35.43 | 0.3916 | 0.5712 | 49.66 | ||

40 | 40.36 | 0.4256 | 0.5467 | 57.68 | ||

50 | 42.12 | 0.4416 | 0.5368 | 68.31 |

Input nodes . | Hidden layers . | Number of neurons in the hidden layer . | MAPE (%) . | RMSE . | DC . | Time (s) . |
---|---|---|---|---|---|---|

13 | 3 | 4 | 25.24 | 0.2867 | 0.7349 | 17.62 |

5 | 30.36 | 0.3635 | 0.6548 | 19.35 | ||

6 | 28.08 | 0.3268 | 0.7012 | 20.41 | ||

7 | 22.14 | 0.3043 | 0.6801 | 21.83 | ||

8 | 18.67 | 0.1986 | 0.8012 | 23.11 | ||

9 | 24.29 | 0.2898 | 0.7134 | 24.40 | ||

10 | 15.25 | 0.1826 | 0.8413 | 25.86 | ||

11 | 18.43 | 0.2326 | 0.7896 | 27.42 | ||

12 | 13.28 | 0.1932 | 0.8846 | 29.89 | ||

13 | 19.48 | 0.2245 | 0.8112 | 31.34 | ||

14 | 21.22 | 0.2876 | 0.7335 | 33.87 | ||

20 | 31.12 | 0.3425 | 0.6123 | 39.25 | ||

30 | 35.43 | 0.3916 | 0.5712 | 49.66 | ||

40 | 40.36 | 0.4256 | 0.5467 | 57.68 | ||

50 | 42.12 | 0.4416 | 0.5368 | 68.31 |

As can be seen from Table 11, with the increase of neurons in the hidden layer, the RT increases. The MAPE and RMSE decrease at early stages and then begin to increase. With more than 20 neurons in the hidden layer, the performance of the PDBN decreases sharply. The performance change of PDBN is shown in Figure 10.

As can be seen from Figure 10, when the number neuron in the hidden layer is 12, the MAPE and RMSE corresponding to the PDBN are the smallest and the DC is the largest. This result indicates that there is a better fitting effect between the predicted value and the true value; however, the RT increases with an increasing number of neurons in the hidden layer. For a comprehensive comparison, when the number neurons in the hidden layer is 12, the PDBN is optimal (MAPE = 13.28%, RMSE = 0.1932, DC = 0.8846, t = 29.8 s). Therefore, the optimal structure of the PDBN for the mid- to long-term runoff prediction in the Yalong River Basin is determined (13-12-12-12-1).

##### Parameter optimization of PDBN

According to the optimization algorithm, the number of iterations of every RBM in the unsupervised stage is 300, *D* is 1.4, and *d* is 0.7. It is necessary to determine the number of extracted components of each layer from top to bottom to improve the fine-tuning precision of PDBN. Through training, the relationship between the number of extracted components *c* and the limited value *L* is as shown in Figure 11. When *L* = 0.02, the fine-tuning precision of PDBN is the highest, and the corresponding number of extracted components .

#### Evaluation of forecasting performance

In order to efficiently demonstrate the superior modeling performance of PDBN, we compare the performance of BP, SVM, and DBN models in runoff prediction. The forecasting evaluation results are comprehensively shown in Figures 12–14 and Table 12 from different aspects. The predicted and observed runoff during model calibration using different data-driven models is shown in Figure 15. The relative percent error using the different methods is shown in Figure 16.

Model . | Structure . | MAPE (%) . | RMSE (m^{3}/s)
. | DC . | RT (s) . |
---|---|---|---|---|---|

BP | 13-12-1 | 24.75 | 326.29 | 0.8775 | 30.92 |

SVM | — | 24.64 | 360.02 | 0.8508 | 29.15 |

DBN | 13-12-12-12-1 | 41.00 | 277.50 | 0.9114 | 16.58 |

PDBN | 13-12-12-12-1 | 19.98 | 229.70 | 0.9393 | 10.32 |

Model . | Structure . | MAPE (%) . | RMSE (m^{3}/s)
. | DC . | RT (s) . |
---|---|---|---|---|---|

BP | 13-12-1 | 24.75 | 326.29 | 0.8775 | 30.92 |

SVM | — | 24.64 | 360.02 | 0.8508 | 29.15 |

DBN | 13-12-12-12-1 | 41.00 | 277.50 | 0.9114 | 16.58 |

PDBN | 13-12-12-12-1 | 19.98 | 229.70 | 0.9393 | 10.32 |

Among the evaluation results based on the dataset, the data-driven models have good prediction results on the dataset constructed by COM, rainfall factors, and climate factors, but different evaluation indicators show diverse results. Among them, for the MAPE index, the PDBN model is the best (BP is 24.75%, SVM model is 24.64%, and DBN model is 41%), which indicates the minor error between the predicted value and the observed value adopting the PDBN. For the RMSE index, the PDBN model indicates the higher prediction accuracy (PDBN is 229.7 m^{3}/s, BP model is 326.29 m^{3}/s, SVM model is 360.02 m^{3}/s, DBN is 277.5 m^{3}/s), and the RMSE from the SVM model is larger. This index is proportional to the annual flow of the river basin. For the DC index, the four data-driven models are better and the PDBN model is the best (PDBN is 0.9393, BP model is 0.8775, SVM model is 0.8508, and DBN is 0.9114), which shows these models have better fitting effect. The main reason is that the PMI method has the advantages of characterizing relationships of linear and nonlinear among factors, and thus might be appropriate for the factor selection of hydrological process with complex spatio-temporal characteristics. For the RT index, the PDBN model is the best (PDBN is 10.32 s, BP model is 30.92 s, SVM model is 29.15 s, and DBN is 16.58 s), the BP and SVM models have lower scores for this index, which demonstrates the advantage of deep learning in computation speed.

In summary, the mid- to long-term runoff prediction method presented in this study includes the construction of COM, the factor reduction using PMI, and the PDBN. Among them, the COM and factor reduction provide early data processing for runoff prediction, and the PDBN method is proposed to improve the accuracy of runoff prediction. Also, the four data-driven models have an efficient prediction for the dataset tested, and the comprehensive performance of the PDBN model is better than that of BP, SVM, and the DBN models for MAPE, RMSE, DC, RE, and RT.

## CONCLUSIONS

In general, the characterization and determination of watershed runoff trends and their key factors are a difficult task. Aiming to provide a solution, this study studied the correlation between multiple factors that affect changes in mid- to long-term runoff in the watershed. A method to construct comprehensive runoff factors was proposed, and the comprehensive characterization of runoff changes in the whole watershed was realized. Then, we investigated a method for generating runoff impact factors based on PMI to obtain the key feature factors related to the runoff in the watershed, which achieved the factor selection for the mid- to long-term runoff forecasting in the Yalong River Basin.

Aiming to solve the problems caused by the highly nonlinear complex characteristics of mid- to long-term runoff forecasting in the watershed, this study proposes a mid- to long-term runoff forecasting method based on deep learning to overcome the drawback of traditional artificial neural networks and other shallow learning networks. At the same time, the dynamic optimization of structure and parameters of the DBN is studied for improving the forecasting accuracy and generalization. The novelty of our proposed method lies in the PMI-based key factor selection and the PDBN-based forecasting method.

Experimental results on the data collected in the Yalong River Basin demonstrate that our method can obtain the most consistent and correct results in mid- to long-term runoff forecasting. We acknowledge that there are many factors which are not considered in this study, such as human activities. In addition, because of the limit of condition, more recent data are not included in the study. In our future work, we will investigate more factors and more recent data which can be formulated and used for runoff forecasting. Apart from this, future work also includes investigating more robust deep learning models.

## ACKNOWLEDGEMENTS

This research is supported by ‘the Fundamental Research Funds for the Central Universities’ (Grant Nos. 2017B616X14 and 2018B610X14), ‘the National Natural Science Foundation of China’ (Grant Nos. 51420105014 and 61976118), and ‘the Postgraduate Research & Practice Innovation Program of Jiangsu Province’ (Grant No. KYCX18_0583).