Prediction of runoff in the upper Yangtze River based on CEEMDAN-NAR model

Scientific and accurate prediction of river runoff is important for river flood control and the sustainable use of water resources. This study evaluates the ability of a nonlinear autoregressive (NAR) model to predict runoff volume. Using the Cuntan Hydrological Station in the upper reaches of the Yangtze River as the research object, the model was established based on the runoff characteristics from 1951 to 2020 and tested by NAR. To improve the prediction efficiency, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) preprocessing technique is used to decompose the data. The results show that the coupled CEEMDAN-NAR model has better predictive ability than the single model, with a coupled model deterministic coefficient (DC) of 0.93 and a prediction accuracy of Grade A. better predictions than the GRU, NAR and


INTRODUCTION
Changes in runoff affect the whole hydrological and water resources system, and the pattern of change is intricate, with nonlinear, abrupt, random and other complex characteristics Dong et al. (). Improving the accuracy of medium-term and long-term forecasts of runoff in the upper reaches of the Yangtze River is important for the rational planning and allocation of water resources in the Yangtze River basin. At present, the commonly used medium-term and long-term runoff forecasting models include the multiple regression model Cui & Jin ()

The traditional neural network model cannot predict high-frequency mutation data well, and has training convergence defects that leave the network far from the global optimum. Because hydrological data are complex, with characteristics such as nonlinearity and mutability, a single neural network makes errors when predicting long series. Reducing the nonstationarity of runoff series by building coupled prediction models is a new way to improve the accuracy of runoff prediction. Roushangar et al. (a, b, c)

The IMF obtained by CEEMDAN decomposition is denoted $\mathrm{IMF}_n$, and the operator $E_c(\cdot)$ is defined as the $c$th-order eigenmode component generated by the EMD algorithm. $w_i(t)$ is a white noise signal following the $N(0,1)$ distribution, $\varepsilon_n$ is the noise amplitude coefficient at stage $n$, and $x(t)$ is the original signal. The CEEMDAN method can then be described as follows:

(1) Decompose using the EMD algorithm to obtain the first-order eigenmode component:

$$ \mathrm{IMF}_1(t) = \frac{1}{M} \sum_{i=1}^{M} E_1\big(x(t) + \varepsilon_0 w_i(t)\big) $$

(2) Calculate the residual after the first-order eigenmode decomposition:

$$ r_1(t) = x(t) - \mathrm{IMF}_1(t) $$

(3) Perform $M$ experiments ($i = 1, \cdots, M$) and decompose the signal $r_1(t) + \varepsilon_1 E_1(w_i(t))$ until the first-order EMD eigenmode component is obtained. On this basis, calculate the second-order eigenmode component:

$$ \mathrm{IMF}_2(t) = \frac{1}{M} \sum_{i=1}^{M} E_1\big(r_1(t) + \varepsilon_1 E_1(w_i(t))\big) $$

(4) For each remaining stage, that is $n = 2, \cdots, N$, calculate the $n$th residual:

$$ r_n(t) = r_{n-1}(t) - \mathrm{IMF}_n(t) $$

(5) Repeat step (3), so that the intrinsic modal component of order $n + 1$ is:

$$ \mathrm{IMF}_{n+1}(t) = \frac{1}{M} \sum_{i=1}^{M} E_1\big(r_n(t) + \varepsilon_n E_n(w_i(t))\big) $$

(6) Repeat (4) and (5) until the residual signal satisfies the termination condition of the decomposition, at which point $K$ modal components have been obtained. The final residual signal of the decomposition is:

$$ R(t) = x(t) - \sum_{n=1}^{K} \mathrm{IMF}_n(t) $$

The original signal can therefore be decomposed into:

$$ x(t) = \sum_{n=1}^{K} \mathrm{IMF}_n(t) + R(t) $$

NAR neural networks
The NAR dynamic neural network is a regression neural network consisting of an input layer, a hidden layer, an input delay layer, and an output layer Yuan (). Unlike the BP neural network, the NAR neural network adds an input delay layer ahead of the hidden layer. The delay layer stores a set number of previous values, which gives the system a dynamic memory. The NAR neural network has good memory and stability, and has been used in a wide range of applications Huang & Lu (). A schematic diagram of the NAR network structure is shown in Figure 1.
In Figure 1, y(t) on the left represents the input data, y(t) on the right represents the output data, 1:5 represents the input and output delay order, W represents the connection weights, b represents the threshold (bias) values, and 20 represents the number of hidden-layer neurons. The general expression of the NAR network model is:

$$ y(t+1) = f\big(y(t), y(t-1), \ldots, y(t-n+1)\big) $$

where $n$ represents the order of the delay layer.
It can be seen from this formula that the output y(t + 1) at the next moment depends on the values of y at the previous n moments. In other words, the model has a delay, and past values are used to infer the value that follows.
The delay function structure is shown in Figure 2.
The structure diagram shows that the output is the value at time t + 1, x(t + 1), and that the result depends on what the delay function has recorded, that is, on the values previously input to the neural network. This gives the NAR neural network model the ability to record previous data.
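The delay structure described above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the network used in this study: the function names `make_lagged_pairs` and `nar_forward`, the tanh hidden activation, the three hidden units and the weight values are all hypothetical, and the weights are untrained.

```python
import math


def make_lagged_pairs(series, n_delays):
    """Pair each value with the n_delays values that precede it."""
    inputs, targets = [], []
    for t in range(n_delays, len(series)):
        inputs.append(series[t - n_delays:t])  # the n_delays values before position t
        targets.append(series[t])              # the value the network should output
    return inputs, targets


def nar_forward(window, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: delayed inputs -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(weights, window)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Toy series (not runoff data) with a delay order of 2
series = [1.0, 2.0, 3.0, 4.0, 5.0]
inputs, targets = make_lagged_pairs(series, n_delays=2)

# Hypothetical, untrained weights: 3 hidden units, each seeing 2 delayed inputs
hidden_weights = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [0.5, -0.3, 0.2]
output_bias = 0.0

prediction = nar_forward(inputs[0], hidden_weights, hidden_biases,
                         output_weights, output_bias)
```

Training would adjust the weights and biases so that `nar_forward` applied to each row of `inputs` approaches the matching entry of `targets`. With the delay order 1:5 shown in Figure 1, `n_delays` would be 5.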

Gated recurrent unit (GRU) models
Hochreiter & Schmidhuber () proposed Long Short-Term Memory (LSTM) neural networks, a deformation model of recurrent neural networks, to overcome the problem of long dependencies in recurrent neural networks. However, the  The specific expressions and structural diagram of the GRU model are shown below:

CEEMDAN-NAR models
To improve the accuracy of runoff prediction, the CEEMDAN-NAR model is proposed. It follows a 'decomposition-prediction-reconstruction' process. CEEMDAN decomposes the runoff data into a finite number of subseries ordered from high to low frequency. These contain the local features of the original data at different time scales, together with a residual term, and are relatively linear and stationary. NAR then predicts each subseries separately, so the predicted data contain more information and the prediction results are more consistent with the real situation. The specific steps of the CEEMDAN-NAR coupled model are as follows:
(1) CEEMDAN decomposition: MATLAB is used to perform CEEMDAN decomposition on the original data to obtain the IMF components and residual term of the time series.
(2) Division into training and validation data: The runoff data from 1951 to 2004 are used as the training data of the NAR network, and the data from 2005 to 2020 are used as the validation data.
(3) NAR neural network prediction: The NAR network is trained on the training data and tuned repeatedly until the prediction performs best.
(4) Analysis of prediction results: Finally, the predicted IMF components and residual term are summed to reconstruct the runoff series, which is then compared with the original data.
The technical route of the CEEMDAN-NAR coupling prediction model is shown in Figure 4.
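The four steps can be sketched in a few lines of Python. This is a structural illustration only, under loudly stated assumptions: `decompose` is a moving-average split standing in for CEEMDAN, `persistence_forecast` stands in for a trained NAR network, all function names are hypothetical, and the series is synthetic rather than the Cuntan record.

```python
def split_by_year(years, values, last_training_year):
    """Split a yearly series into training and validation parts."""
    train = [v for y, v in zip(years, values) if y <= last_training_year]
    valid = [v for y, v in zip(years, values) if y > last_training_year]
    return train, valid


def decompose(series, window=3):
    """Stand-in for CEEMDAN (assumption): a trailing moving average gives a
    smooth 'residual' and the remainder gives one 'IMF-like' detail component."""
    smooth = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        smooth.append(sum(chunk) / len(chunk))
    detail = [x - s for x, s in zip(series, smooth)]
    return [detail, smooth]  # the components sum back to the original series


def persistence_forecast(train, horizon):
    """Stand-in for a trained NAR network (assumption): repeat the last value."""
    return [train[-1]] * horizon


def reconstruct(component_forecasts):
    """Sum the per-component forecasts to obtain the runoff forecast."""
    return [sum(values) for values in zip(*component_forecasts)]


years = list(range(1951, 2021))
# Synthetic placeholder series, not measured runoff
runoff = [3500.0 + 100.0 * ((y * 7) % 5) for y in years]

components = decompose(runoff)                      # step (1)
forecasts = []
for component in components:
    train, valid = split_by_year(years, component, last_training_year=2004)  # step (2)
    forecasts.append(persistence_forecast(train, horizon=len(valid)))        # step (3)
predicted = reconstruct(forecasts)                  # step (4)
```

The point of the sketch is the order of operations: the whole series is decomposed first, each component is split and predicted on its own, and only the predictions are summed. Swapping in a real CEEMDAN routine and a trained network changes `decompose` and `persistence_forecast` but not the surrounding loop.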

Statistical evaluation indicators
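To reflect the error and accuracy of the model predictions more clearly, four indicators were used for the analysis: the deterministic coefficient (DC), the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the root mean square error (RMSE). They are calculated as follows:

$$ \mathrm{DC} = 1 - \frac{\sum_{i=1}^{n} [y_c(i) - y_0(i)]^2}{\sum_{i=1}^{n} [y_0(i) - \bar{y}_0]^2} $$

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_c(i) - y_0(i) \right| $$

$$ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_c(i) - y_0(i)}{y_0(i)} \right| \times 100\% $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} [y_c(i) - y_0(i)]^2} $$

where $i = 1, 2, \ldots, n$, $y_c(i)$ is the predicted value, $y_0(i)$ is the measured value, $\bar{y}_0$ is the mean of the measured values, and $n$ is the sequence length.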
According to the Specification of Hydrological Intelligence Forecasting (SL250-2000), the closer the DC value is to 1, the better the model prediction. When DC ≥ 0.9, the prediction accuracy is Grade A; when 0.7 ≤ DC < 0.9, the prediction accuracy is Grade B; when 0.5 ≤ DC < 0.7, the prediction accuracy is Grade C; when DC < 0.5, the prediction result is unreliable.
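The four indicators and the grading thresholds described above can be computed as in the following sketch. The function names and the toy values are assumptions made for illustration. DC is taken in its usual form, one minus the ratio of the squared prediction error to the squared deviation of the measurements from their mean, and MAPE is returned in percent.

```python
import math


def deterministic_coefficient(predicted, measured):
    """DC: 1 minus squared error over squared deviation from the measured mean."""
    mean_measured = sum(measured) / len(measured)
    error = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    spread = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - error / spread


def mean_absolute_error(predicted, measured):
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def mean_absolute_percentage_error(predicted, measured):
    """MAPE in percent; measured values must be non-zero."""
    ratios = [abs((p - m) / m) for p, m in zip(predicted, measured)]
    return 100.0 * sum(ratios) / len(measured)


def root_mean_square_error(predicted, measured):
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def accuracy_grade(dc):
    """Grade a DC value using the thresholds quoted from SL250-2000 in the text."""
    if dc >= 0.9:
        return "A"
    if dc >= 0.7:
        return "B"
    if dc >= 0.5:
        return "C"
    return "unreliable"


# Toy values, not data from Cuntan Station
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

dc = deterministic_coefficient(predicted, measured)
mae = mean_absolute_error(predicted, measured)
mape = mean_absolute_percentage_error(predicted, measured)
rmse = root_mean_square_error(predicted, measured)
grade = accuracy_grade(dc)
```

Applied to the DC of 0.93 reported for the coupled model, `accuracy_grade` returns Grade A.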

CASE STUDIES

Data sources
The   As can be seen from Table 1 at Cuntan Station were calculated, as shown in Table 2.
It can be seen from Figure 9 and Table 2      of water resources, water allocation and optimal reservoir operation in the Yangtze River basin.
(3) The CEEMDAN-NAR coupled model combines an effective decomposition algorithm with a powerful, fast and stable prediction tool. Decomposing the runoff sequence with the CEEMDAN method helps to understand the periodic changes in runoff, and transforms the nonlinear and nonstationary time series into stationary time series. This improves the accuracy of prediction and addresses the problems of using the NAR model alone, namely that its forecasts struggle to reveal the characteristics of runoff change and have low accuracy. The coupled model therefore has broad application prospects. The prediction algorithm based on the CEEMDAN-NAR model can be used not only for annual runoff prediction, but also for the prediction of other time series such as sediment, rainfall, inventory, and meteorological factors. However, because the approach is based on time series analysis alone, the physical mechanism of runoff and long-term prediction are not considered. How to conduct a more comprehensive analysis and improve the prediction accuracy will be the focus of future research.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.