Abstract
Accurate, scientifically grounded prediction of river runoff is important for flood control and the sustainable use of water resources. This study evaluates the ability of a Nonlinear Auto Regressive (NAR) model to predict runoff volume. Taking the Cuntan Hydrological Station in the upper reaches of the Yangtze River as the research object, the model was built on the runoff characteristics from 1951 to 2020. To improve the prediction efficiency, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) preprocessing technique is used to decompose the data before prediction. The results show that the coupled CEEMDAN-NAR model has better predictive ability than the single model, with a deterministic coefficient (DC) of 0.93 and a prediction accuracy of Grade A.
HIGHLIGHTS
Proposed a runoff prediction model using CEEMDAN-NAR hybrid approach.
Decomposition of runoff data using CEEMDAN preprocessing techniques to improve model prediction accuracy.
Using the models to predict runoff in the upper Yangtze River from 2005 to 2020 and verifying their accuracy.
The CEEMDAN-NAR model provides better predictions than the GRU, NAR and EEMD-NAR prediction models.
INTRODUCTION
Changes in runoff affect the whole hydrological and water resources system, and the pattern of change is intricate, with nonlinear, abrupt and random characteristics (Dong et al. 2012). Improving the accuracy of medium- and long-term runoff forecasts in the upper reaches of the Yangtze River is important for the rational planning and allocation of water resources in the Yangtze River basin. At present, the commonly used medium- and long-term runoff forecasting models include the multiple regression model (Cui & Jin 2016) and the fuzzy pattern recognition model (Zhu et al. 2014). However, the solution function of the multiple regression model is too complicated, fuzzy systems lack adaptive learning capabilities, and automatically generating and adjusting membership functions and fuzzy rules remains a difficult problem. For this reason, artificial neural networks (Chen et al. 2005; Bai et al. 2011; Cui 2013; Alizadeh et al. 2017; Nourani et al. 2018; Koradia et al. 2020), which have better nonlinear and self-learning capabilities, have become a new research topic in runoff prediction for many scholars. Coulibaly et al. (2000) used a recurrent neural network based on a low-frequency climate change index to predict the annual runoff in northern Quebec. Mahabir et al. (2010) used fuzzy logic to predict seasonal runoff in rocks and midstream watersheds. Alizadeh et al. (2017) used wavelet neural networks to predict rainfall and runoff in the Netter River basin for the subsequent two months. Komasi & Sharghi (2016) used a combined wavelet analysis and support vector machine model to predict and analyze the rainfall–runoff process in the Aghchai and Eel river basins. Guo et al. (2015) coupled the predicted future climate elements with the LS-SVM statistical downscaling method to predict future temperature, precipitation and runoff changes in the Yangtze River basin, based on the Budyko water-heat coupled equilibrium assumption. Wan & Tang (2009) combined chaos theory with neural networks to predict the monthly average runoff variation pattern at Cuntan Station in the Three Gorges.
The above research mainly focuses on traditional neural networks. A traditional neural network model cannot predict high-frequency, abruptly changing data well, and it suffers from training convergence defects that can leave the network far from the global optimum. Because hydrological data are complex, with characteristics such as nonlinearity and mutability, a single neural network incurs some error when predicting long series. Reducing the nonstationarity of runoff series by building coupled prediction models is a new way to improve the accuracy of runoff prediction. Roushangar et al. (2021a, 2021b, 2021c) used wavelet transform (WT) and ensemble empirical mode decomposition (EEMD) preprocessing techniques combined with nonlinear neural ensemble methods to predict precipitation index series for the period 1978–2017 at four stations in northwestern Iran, and the results showed that data preprocessing can effectively improve model accuracy. Roushangar et al. (2021a, 2021b, 2021c) evaluated the predictive capability of the kernel extreme learning machine (KELM) for daily suspended sediment concentration (SSC) and discharge (SSD) of rivers, and processed the data with WT and EEMD to improve the model prediction efficiency. Roushangar et al. (2021a, 2021b, 2021c) evaluated the predictive capability of Gaussian process regression (GPR) on daily river stagnation and processed the data with WT and EEMD. The results showed that WT-GPR and EEMD-GPR improved the model prediction capability by 25–40%.
Numerous studies have shown that, compared with using a traditional neural network model alone, preprocessing the data can greatly improve prediction accuracy. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), described by Torres et al. (2016), is a signal decomposition method based on the time-scale characteristics of the data themselves; it can accurately reconstruct the original signal and improves the decomposition efficiency. In this study, CEEMDAN is used to decompose the runoff data, and the decomposed series have good stability. Combined with the Nonlinear Auto Regressive (NAR) model, which has strong self-learning, adaptation and generalization ability, a CEEMDAN-NAR model for annual runoff prediction in the upper Yangtze River was constructed, providing a new approach to runoff prediction. The purpose of this study is to evaluate the predictive capability of the NAR model for runoff. To improve the prediction accuracy, the data were preprocessed with CEEMDAN, and the predictive capability of the coupled CEEMDAN-NAR model was compared with that of a single model. The study also provides a reference for the rational planning and allocation of water resources in the Yangtze River basin.
MODELS AND METHODS
CEEMDAN model
CEEMDAN is an improvement on EMD (Empirical Mode Decomposition) by Huang et al. (1998) and EEMD (Ensemble Empirical Mode Decomposition) by Wu & Huang (2009). EMD and EEMD are often subject to mode mixing and reconstruction errors that affect the accuracy of the decomposition. CEEMDAN addresses these problems by adding a finite number of realizations of adaptive white noise at each stage of the decomposition, which keeps the reconstruction error of the signal small even when the number of averaging operations is reduced. CEEMDAN decomposes a complex time series signal into a series of simple Intrinsic Mode Functions (IMFs) and a residual term (Res).
The IMF obtained by CEEMDAN decomposition is denoted as $\widetilde{IMF}_c$, and the operator $E_c(\cdot)$ is defined as the $c$th-order eigenmode component generated by the EMD algorithm. $\omega^{i}(t)$ is the white noise signal satisfying the $N(0,1)$ distribution and $x(t)$ is the original signal. Then the CEEMDAN method can be described as follows:
- (1)
- (2)
- (3)
- (4)
- (5)
- (6)
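Written out, the standard CEEMDAN procedure from the signal-processing literature takes roughly the following form. This is a sketch rather than a transcription of the paper's own equations: the symbols $x(t)$, $\omega^{i}(t)$, $E_c(\cdot)$ and $\widetilde{IMF}_c$, the noise amplitudes $\varepsilon_c$, the number of noise realizations $I$, the number of modes $C$ and the final residual $R(t)$ are all assumed notation.

```latex
% Assumed notation: x(t) is the signal, \omega^{i}(t) ~ N(0,1) is the i-th
% white-noise realization (i = 1..I), E_c(.) returns the c-th EMD mode,
% and \varepsilon_c is the noise amplitude used at stage c.
\begin{align}
\widetilde{IMF}_1(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\big(x(t) + \varepsilon_0\,\omega^{i}(t)\big) \tag{1}\\
r_1(t) &= x(t) - \widetilde{IMF}_1(t) \tag{2}\\
\widetilde{IMF}_2(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\big(r_1(t) + \varepsilon_1 E_1(\omega^{i}(t))\big) \tag{3}\\
r_c(t) &= r_{c-1}(t) - \widetilde{IMF}_c(t), \qquad c = 2,\dots,C \tag{4}\\
\widetilde{IMF}_{c+1}(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\big(r_c(t) + \varepsilon_c E_c(\omega^{i}(t))\big) \tag{5}\\
x(t) &= \sum_{c=1}^{C}\widetilde{IMF}_c(t) + R(t) \tag{6}
\end{align}
```

Steps (4) and (5) are repeated until the residual can no longer be decomposed. Step (6) expresses the completeness property: the modes and the final residual sum back to the original signal.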
NAR neural networks
The NAR neural network (Ma et al. 2020) uses the series itself as the regression variable: it represents the random variable at a subsequent time as a nonlinear function of its own values at earlier times. As a dynamic neural network based on a time series, its output is not only a static mapping but also makes comprehensive use of previous dynamic results, giving it feedback and memory functions (Farhan & Rajiv 2020).
The NAR dynamic neural network is a regression neural network with an input layer, a hidden layer, an input delay layer, and an output layer (Yuan 2016). Unlike the BP neural network, the NAR neural network adds an input delay layer to the hidden layer; the delay order sets how many previous values are recorded, which gives the system its dynamic memory. The NAR neural network has good memory and stability, and it has been used in a wide range of applications (Huang & Lu 2016). A schematic diagram of the NAR network structure is shown in Figure 1.


$$y(t) = f\big(y(t-1),\, y(t-2),\, \ldots,\, y(t-n)\big)$$
In the formula, $n$ represents the order of the delay layer. It can be seen from this formula that the output $y(t)$ at the next moment depends on the values $y(t-1), \ldots, y(t-n)$ at the previous $n$ moments, which means that the model has a delay, and the past values are used to infer the current value.
The delay function structure is shown in Figure 2.
This structure diagram shows that the output $y(t)$ at time $t$ depends on the values recorded by the delay function, that is, the input values $y(t-1), \ldots, y(t-n)$ of the neural network, so that the NAR neural network model has the ability to record previous data.
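The role of the delay layer can be illustrated by how the training pairs for such a network are assembled. The sketch below is a minimal illustration, not the paper's implementation: the function name `make_lagged` and the sample values are hypothetical, and the network that would be fitted to the pairs is left out.

```python
from typing import List, Sequence, Tuple


def make_lagged(series: Sequence[float], n: int) -> Tuple[List[List[float]], List[float]]:
    """Build NAR training pairs: inputs [y(t-1), ..., y(t-n)] and target y(t)."""
    if n < 1 or n >= len(series):
        raise ValueError("delay order n must satisfy 1 <= n < len(series)")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(n, len(series)):
        # Most recent value first, matching y(t-1), y(t-2), ..., y(t-n).
        inputs.append([series[t - k] for k in range(1, n + 1)])
        targets.append(series[t])
    return inputs, targets


# Hypothetical sample values, used only to show the shape of the pairs.
X, y = make_lagged([10.0, 11.0, 12.0, 13.0, 14.0], n=2)
```

Each row of `X` holds the previous `n` values and the matching entry of `y` is the value to be predicted, so a series of length N yields N - n training pairs.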
Gated recurrent unit (GRU) models
Hochreiter & Schmidhuber (1997) proposed Long Short-Term Memory (LSTM) neural networks, a variant of recurrent neural networks, to overcome the long-term dependency problem in recurrent neural networks. However, the LSTM neural network model has a more complex form and takes a long time to train. To address this problem, Cho et al. (2014) improved the LSTM model and proposed the GRU neural network model. The GRU neural network consists of a reset gate and an update gate. The reset gate determines how the new input information is combined with the previous memory, and the update gate defines the amount of previous memory saved to the current time step. The GRU model simplifies the structure and reduces the number of parameters while retaining the function of the LSTM unit, which significantly improves training speed and overcomes the long-term dependency problem of traditional neural networks. This makes it well suited to predictive analysis of long series data. The GRU principle is shown in Figure 3.
The GRU model can accurately extract the structural features of the data, has a fast learning speed and a low number of iterations, with good predictive performance, and is suitable for the analysis of nonlinear data.
The prediction results of the GRU model are compared with those of the CEEMDAN-NAR model to assess how well the coupled CEEMDAN-NAR model performs.
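The interplay of the two gates can be made concrete with a single scalar GRU step. This is a sketch under assumptions: the weight names in `p` are hypothetical, the state is a scalar rather than a vector, and the update rule uses the convention in which the update gate `z` is the share of previous memory that is kept (some references swap `z` and `1 - z`).

```python
import math
from typing import Dict


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x: float, h_prev: float, p: Dict[str, float]) -> float:
    """One scalar GRU step; p holds hypothetical weights w_*, u_* and biases b_*."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much old memory is mixed with the input.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # The update gate z is the share of previous memory carried to the current step.
    return z * h_prev + (1.0 - z) * h_cand


names = ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")
zero_params = {name: 0.0 for name in names}
h = gru_step(1.0, 2.0, zero_params)
```

With all weights at zero both gates sit at 0.5 and the candidate state is 0, so half of the previous memory is carried forward. Pushing the update-gate bias `b_z` to a large positive value makes the cell keep its previous state almost unchanged.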
CEEMDAN-NAR models
In order to improve the prediction accuracy of runoff, the CEEMDAN-NAR model is proposed. It follows a ‘decomposition–prediction–reconstruction’ process: CEEMDAN decomposes the runoff data into a finite number of subseries ordered from high to low frequency, which capture the local features of the original data at different time scales, together with a residual term, and which are relatively linear and stationary. NAR is then used to predict each subseries separately, so that the predicted data contain more information and the prediction results are more consistent with the real situation. The specific steps of the CEEMDAN-NAR coupled model are as follows:
- (1) CEEMDAN decomposition: MATLAB is used to perform CEEMDAN decomposition on the original data to obtain the IMF components and residual term of the time series.
- (2) Division into training and validation data: The runoff data from 1951 to 2004 are used as the training data of the NAR network, and the data from 2005 to 2020 are used as the validation data.
- (3) NAR neural network prediction: The NAR network is trained on the training data, and its settings are tuned repeatedly until the prediction performs best.
- (4) Analysis of prediction results: Finally, the predicted IMF components and residual term are summed to reconstruct the predicted runoff, which is compared with the original data.
The technical route of the CEEMDAN-NAR coupling prediction model is shown in Figure 4.
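The four steps above can be sketched as a small pipeline. Everything below is illustrative: `toy_decompose` is a moving-average stand-in for CEEMDAN, `persistence_forecast` is a stand-in for a trained NAR network, and the runoff values are synthetic placeholders rather than measurements. Only the order of operations and the 1951–2004 / 2005–2020 split follow the text.

```python
import math
from typing import Callable, List, Sequence

Series = List[float]


def toy_decompose(x: Sequence[float], window: int = 3) -> List[Series]:
    """Stand-in for CEEMDAN: split x into a detail part and a smooth part.

    The two parts sum back to x, mimicking IMF components plus a residual.
    """
    smooth: Series = []
    for i in range(len(x)):
        lo, hi = max(0, i - window + 1), i + 1
        smooth.append(sum(x[lo:hi]) / (hi - lo))
    detail = [xi - si for xi, si in zip(x, smooth)]
    return [detail, smooth]


def persistence_forecast(train: Sequence[float], horizon: int) -> Series:
    """Stand-in for a trained NAR network: repeat the last training value."""
    return [train[-1]] * horizon


def decompose_predict_reconstruct(
    x: Sequence[float],
    n_train: int,
    decompose: Callable[[Sequence[float]], List[Series]],
    forecast: Callable[[Sequence[float], int], Series],
) -> Series:
    components = decompose(x)              # step (1): decomposition
    horizon = len(x) - n_train             # step (2): training/validation split
    per_component = [forecast(c[:n_train], horizon) for c in components]  # step (3)
    return [sum(vals) for vals in zip(*per_component)]  # step (4): reconstruction


years = list(range(1951, 2021))
n_train = years.index(2005)                # 1951-2004 used for training
runoff = [3400.0 + 50.0 * math.sin(0.7 * i) for i in range(len(years))]  # synthetic
predicted = decompose_predict_reconstruct(
    runoff, n_train, toy_decompose, persistence_forecast
)
```

Passing the decomposer and forecaster as arguments keeps the pipeline structure separate from the particular methods, so a real CEEMDAN routine and a trained NAR network could be substituted for the stand-ins.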
Statistical evaluation indicators
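The model performance is evaluated with the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE) and deterministic coefficient (DC):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$
$$\mathrm{DC} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
In the equations, $\hat{y}_i$ is the predicted value, $y_i$ is the measured value, $\bar{y}$ is the mean value of the measured values, and $n$ is the sequence length.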
According to the Specification of Hydrological Intelligence Forecasting (SL250-2000), the closer the DC value is to 1, the better is the model prediction. When DC ≥ 0.9, the prediction accuracy is Grade A; when 0.7 ≤ DC<0.9, the prediction accuracy is Grade B; when 0.5 ≤ DC<0.7, the prediction accuracy is Grade C; when DC < 0.5, the prediction result is unreliable.
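A minimal sketch of how these indicators and the accuracy grades could be computed is given below. It assumes the usual textbook definitions of MAE, RMSE, MAPE and the deterministic coefficient (one minus the ratio of squared prediction error to the variance of the measured values about their mean); the function names are hypothetical, and the grade thresholds are the ones quoted above.

```python
import math
from typing import Sequence


def mae(measured: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(p - m) for m, p in zip(measured, predicted)) / len(measured)


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def mape(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; measured values must be non-zero."""
    return 100.0 * sum(abs((p - m) / m) for m, p in zip(measured, predicted)) / len(measured)


def dc(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Deterministic coefficient: 1 - squared error / spread about the measured mean."""
    mean = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean) ** 2 for m in measured)
    return 1.0 - sse / sst


def accuracy_grade(dc_value: float) -> str:
    """Map a DC value to the grades quoted in the text."""
    if dc_value >= 0.9:
        return "A"
    if dc_value >= 0.7:
        return "B"
    if dc_value >= 0.5:
        return "C"
    return "unreliable"
```

Under this definition, a forecast that always returns the mean of the measured values has a DC of exactly 0, which is why values below 0.5 are treated as unreliable.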
CASE STUDIES
Data sources
The Cuntan Hydrological Station on the upper reaches of the Yangtze River is located in Chongqing, 7.5 km downstream of the confluence of the Yangtze River and the Jialing River, with a catchment area of 866,600 km². It controls the basic water regime after the Minjiang, Tuojiang, Jialing and Chishui Rivers merge into the Yangtze River. It is an important reservoir control station for the Three Gorges Project and an important hydrologic control station for the upper reaches of the Yangtze River. Cuntan Hydrological Station was built in 1936; it is a national-level hydrological station with complete and continuous measurement records. The largest flood observed since records began at the station occurred in 1981, when the water level reached 191.45 meters. Its main measurements include water level, flow, sediment and water quality. Cuntan Hydrological Station controls more than 60% of the water upstream of the Yangtze River, providing important information for those responsible for flood control at all levels.
The location of the study area is shown in Figure 5.
The runoff data from 1951 to 2020 at Cuntan Hydrological Station were used as the study data: data from 1951 to 2004 were selected for model training, and data from 2005 to 2020 were used for validation. The runoff changes at the Cuntan Hydrological Station from 1951 to 2020 are shown in Figure 6.
CEEMDAN decomposition
CEEMDAN was used to decompose the runoff data from 1951 to 2020 at the Cuntan Hydrological Station on the upper reaches of the Yangtze River. After repeated experiments, the annual runoff was decomposed into four IMF components and one residual term. The results are shown in Figure 7.
CEEMDAN decomposition results of runoff at Cuntan Hydrological Station from 1951 to 2020.
As can be seen from Figure 7, the IMF components differ in frequency, amplitude and wavelength. Among them, the IMF1 component has the highest frequency, the largest amplitude, and the shortest wavelength. From IMF1 to IMF4, the amplitude gradually decreases, the frequency gradually decreases, and the wavelength gradually becomes longer. The residual term represents the overall trend of runoff at Cuntan Station from 1951 to 2020, and it shows that the runoff over this period has a downward trend.
Annual runoff projections
The CEEMDAN-NAR model was used to predict each decomposed component (IMF1 to Res) for 2005–2020 at the Cuntan Hydrological Station, using the decomposed data from 1951 to 2004 as the training sample. The delay order is 1:5, and the number of hidden layer nodes is 20. The predicted values of the four IMF components and the residual term are compared with their true values, and the absolute error of each component is displayed. The prediction results are shown in Figure 8.
It can be seen from Figure 8 that after CEEMDAN decomposition, the nonlinearity and nonstationarity of the annual runoff time series are greatly reduced. IMF1 and IMF2 have the greatest volatility and the worst fit, and their absolute errors are also larger, fluctuating within ±200. From IMF3 to Res, the predicted values fit the true values increasingly well, and the absolute errors gradually decrease.
As can be seen from Table 1, the deterministic coefficient of the CEEMDAN-NAR prediction improves progressively from component to component at Cuntan Station. The deterministic coefficient of IMF1 is 0.81, which according to the Specification of Hydrological Intelligence Forecasting (SL250-2000) corresponds to Grade B accuracy. From IMF2 to Res, the deterministic coefficients are all >0.90, and the prediction accuracy is Grade A.
Table 1. Deterministic coefficient (DC) of the annual runoff prediction for each component at Cuntan Station
Component | IMF1 | IMF2 | IMF3 | IMF4 | Res | Overall |
---|---|---|---|---|---|---|
DC | 0.81 | 0.92 | 0.96 | 0.99 | 0.99 | 0.92 |
To examine the annual runoff at Cuntan Hydrological Station in a more visual and detailed way, the predicted components were superimposed and the reconstructed data were organized into annual runoff series, so that the absolute error between the real and predicted runoff values could be compared. The specific results are shown in Figure 9. The DC, MAE, RMSE, and MAPE of the runoff from 2005 to 2020 at Cuntan Station were calculated, as shown in Table 2.
Table 2. Evaluation indicators for annual runoff prediction at Cuntan Station
MAE (10⁸ m³) | RMSE (10⁸ m³) | MAPE (%) | DC |
---|---|---|---|
83.53 | 101.52 | 2.56 | 0.94 |
It can be seen from Figure 9 and Table 2 that the CEEMDAN-NAR coupled prediction model performs well. The real and predicted runoff values follow the same trend, and the fit is good. The difference between the real and predicted values is largest in 2006, with a maximum absolute error of 169.15 × 10⁸ m³, and smallest in 2010. The model has a deterministic coefficient of 0.94 for runoff prediction at Cuntan Station from 2005 to 2020, which reaches the Grade A accuracy specified in the Specification of Hydrological Intelligence Forecasting (SL250-2000).
Comparative analysis
To verify the superiority of the CEEMDAN-NAR coupled model, the GRU model, NAR model, EEMD-NAR model, and CEEMDAN-NAR coupled model were compared. The prediction results of the CEEMDAN-NAR model and the other models are shown in Figure 10.
The results show that the predictions of the CEEMDAN-NAR model are in line with expectations, and their period and trend are highly consistent with the measured series. Table 3 shows that the errors of the CEEMDAN-NAR and EEMD-NAR models, which are ‘decomposition–prediction–reconstruction’ models, are significantly smaller than those of the single-prediction models. The reason is that decomposition greatly improves the smoothness of the time series, which improves the prediction accuracy. CEEMDAN decomposition solves the problems of mode mixing and residual noise in the reconstructed sequence that arise in EEMD decomposition; the decomposition is complete and has no reconstruction error. Accordingly, the errors of the CEEMDAN-NAR model are smaller than those of the EEMD-NAR model. Compared with the GRU model, NAR model, and EEMD-NAR model, the MAE of the CEEMDAN-NAR model is reduced by 52.31%, 29.47%, and 14.13%, respectively, the MAPE is reduced by 2.72, 1.00, and 0.39 percentage points, and the RMSE is reduced by 51.11%, 23.02%, and 11.13%.
Table 3. Comparison of different model errors
Model | MAE (10⁸ m³) | RMSE (10⁸ m³) | MAPE (%) |
---|---|---|---|
GRU | 175.18 | 207.67 | 5.28 |
NAR | 118.43 | 131.88 | 3.56 |
EEMD-NAR | 97.27 | 114.24 | 2.95 |
CEEMDAN-NAR | 83.53 | 101.52 | 2.56 |
The nonstationary and nonlinear characteristics of the runoff data affect the training process of a single prediction model, resulting in low prediction accuracy. The CEEMDAN method effectively reduces the number of iterations, improves the reconstruction accuracy, and is more suitable for the analysis of nonlinear signals. Applied to annual runoff prediction, CEEMDAN coupled with the NAR prediction model has obvious advantages over the traditional single-prediction models and over EEMD decomposition.
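The error reductions quoted for the CEEMDAN-NAR model can be recomputed from the values in Table 3. The sketch below assumes that the MAE and RMSE reductions are relative (in percent of the baseline model's error) and that the MAPE reductions are plain differences in percentage points, which is the reading that matches the quoted figures. The function names are hypothetical, and last-digit differences from the quoted figures may arise from rounding or truncation.

```python
from typing import Dict

# Error values copied from Table 3 (MAE and RMSE in 10^8 m^3, MAPE in %).
errors: Dict[str, Dict[str, float]] = {
    "GRU": {"MAE": 175.18, "RMSE": 207.67, "MAPE": 5.28},
    "NAR": {"MAE": 118.43, "RMSE": 131.88, "MAPE": 3.56},
    "EEMD-NAR": {"MAE": 97.27, "RMSE": 114.24, "MAPE": 2.95},
    "CEEMDAN-NAR": {"MAE": 83.53, "RMSE": 101.52, "MAPE": 2.56},
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Reduction as a percentage of the baseline error."""
    return (baseline - improved) / baseline * 100.0


def point_reduction(baseline: float, improved: float) -> float:
    """Reduction as a plain difference (percentage points for MAPE)."""
    return baseline - improved


best = errors["CEEMDAN-NAR"]
reductions = {
    model: {
        "MAE_%": relative_reduction(vals["MAE"], best["MAE"]),
        "RMSE_%": relative_reduction(vals["RMSE"], best["RMSE"]),
        "MAPE_points": point_reduction(vals["MAPE"], best["MAPE"]),
    }
    for model, vals in errors.items()
    if model != "CEEMDAN-NAR"
}
```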
CONCLUSION
- (1) A coupled CEEMDAN-NAR model was established and applied to runoff prediction at the Cuntan Hydrological Station in the upper reaches of the Yangtze River. The results show that the runoff at the station exhibited a downward trend over 1951–2020. The change cycle and trend of the model predictions are consistent with the real data, the prediction error is small, and the deterministic coefficient (DC) is 0.94, reaching the Grade A accuracy standard, so the prediction results are reliable.
- (2) The coupled CEEMDAN-NAR model outperforms the GRU model, NAR model, and EEMD-NAR model in prediction. Taking the annual runoff data decomposed by CEEMDAN as the input data of the model can effectively improve the prediction accuracy of the NAR prediction model. The coupled prediction model based on CEEMDAN can obtain more accurate and stable prediction results, which is helpful for the study of hydrological time series prediction. The use of this model can provide a technical reference for the development and utilization of water resources, water allocation and optimal reservoir operation in the Yangtze River basin.
- (3) The CEEMDAN-NAR coupled model combines an effective decomposition algorithm with a powerful, fast and stable prediction tool. Using CEEMDAN to decompose the runoff sequence helps to reveal the periodic characteristics of runoff change and transforms the nonlinear, nonstationary time series into stationary subseries, which improves the accuracy of prediction. This addresses the problems of using the NAR model alone, namely the difficulty of revealing the characteristics of runoff change and the low prediction accuracy, and gives the approach broad application prospects. The prediction algorithm based on the CEEMDAN-NAR model can be used not only for annual runoff prediction, but also for the prediction of other time series such as sediment, rainfall, inventory, and meteorological factors. In addition, because the model is based on time series analysis, the physical mechanism of runoff and long-term prediction are not considered; how to conduct a more comprehensive analysis and further improve the prediction accuracy will be the focus of future research.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.