Application of the decomposition-prediction-reconstruction framework to medium- and long-term runoff forecasting

Medium- and long-term runoff forecasting remains a difficult problem, especially in the wet season. Forecasting performance can be improved by using complementary ensemble empirical mode decomposition (CEEMD) to produce clearer signals as model inputs. In forecasting models based on CEEMD, the entire time series is decomposed into several sub-series; each sub-series is divided into training and validation datasets and forecast by a common model such as the least squares support vector machine (LSSVM); finally, an ensemble forecast is obtained by summing the forecasts of all sub-series. This model was applied to forecast the inflow runoff of the Shitouxia Reservoir (STX Reservoir). The Nash efficiency coefficient rises from 0.815 for the LSSVM model to 0.954 for the CEEMD-LSSVM model, an absolute increase of 0.139 (about 17% in relative terms), and the root mean square error falls from 20.654 to 10.235, a decrease of 50.4%. Applying the CEEMD-LSSVM model therefore effectively improves runoff forecasting performance. When the annual runoff forecasts were analyzed month by month, the forecasts for November to April were less accurate than those of the nearest neighbor bootstrapping regressive (NNBR) model, which is better suited to the dry season, whereas the forecasts for May to October improved significantly. This indicates that the CEEMD-LSSVM model has a clear advantage in forecasting inflow runoff during the wet season. In the optimized operation of reservoirs, the wet-season inflow forecast is more important than the dry-season forecast. Therefore, when forecasting annual runoff month by month, the CEEMD-LSSVM model is recommended for the wet season, combined with the NNBR model for the dry season.


INTRODUCTION
Hydrological forecasting is of significant importance for planning and managing water resources. If the forecasting lead time is longer than the maximum flow concentration time of the basin plus 3 days but shorter than 1 year, the forecast is classed as medium- and long-term hydrological forecasting (Tang et al. ). Medium- and long-term hydrological forecasting is a powerful means of making full use of water resources and achieving optimal reservoir scheduling, and it is an important basis for sound decision-making in reservoir operation management.
Runoff forecasting has attracted wide attention over the last few decades. Physical models are commonly used for runoff prediction; in this approach, the runoff generation process is simulated by equations with specific boundary conditions. However, such models need a large amount of accurate historical rainfall-runoff data to calibrate their parameters. In practice, it is difficult to ensure the accuracy of the data and to meet the sample-size requirement, Cheng-You ) have been proposed for medium- and long-term runoff forecasting. However, it is not yet clear which model performs best in medium- and long-term runoff forecasting. ANN has strong non-linear mapping capabilities, but it also suffers from slow learning, overfitting and the curse of dimensionality.
Therefore, its forecasting performance is unsatisfactory when processing complex hydrological data. SVM is a small-sample statistical learning model based on Vapnik-Chervonenkis (VC) dimension theory and the principle of structural risk minimization; it can effectively avoid the curse of dimensionality, achieves high simulation accuracy and can, in theory, reach a global optimum (Vapnik  proposed a wavelet-neuro-fuzzy model to simulate precipitation, using the ability of the wavelet transform to express periodicity to improve the forecasting accuracy of the model. Although the wavelet transform is a mathematical tool suited to processing non-stationary data, it requires the input data to be linear and the mother wavelet requires a pre-set basis function (Niu et al. ), which limits its application. Hydrological time series usually have highly complex non-stationary characteristics, and adjacent states are mostly related non-linearly (Huang et al. ).
Therefore, empirical mode decomposition (EMD) has been used in hydrological data analysis because it is suited to complex non-linear and non-stationary time series (Karthikeyan & Nagesh Kumar ). In addition, EMD is based on the principle of local scale separation and does not require a predetermined basis function, which makes it adaptive and intuitive (Sang et al. ). EMD decomposes the entire series into several sub-series called intrinsic mode functions (IMFs) and a residue. However, because of signal discontinuities and locally intermittent components, a single IMF obtained by EMD can contain oscillations of several different scales (mode mixing). In

METHODOLOGY

CEEMD
The EMD method is used for decomposing complex signals into single-frequency signals. EMD is an empirical, intuitive, direct and self-adaptive data processing method for nonlinear and non-stationary time series (Huang et al. ).
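To make the sifting idea concrete, the following is a minimal NumPy sketch of plain EMD under stated assumptions. Envelopes are natural cubic splines through the local extrema, with the two end points of the series added as extra knots (a simple boundary treatment that is an assumption here), and sifting stops when the energy of the envelope mean falls below a tolerance, a simplified stand-in for the formal IMF conditions. The names `natural_cubic_spline`, `local_extrema`, `sift` and `emd`, and the parameters `max_iter`, `tol` and `max_imfs`, are hypothetical and not taken from the paper.

```python
import numpy as np


def natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through knots (xk, yk) at points x."""
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    x = np.asarray(x, dtype=float)
    n, h = len(xk), np.diff(xk)
    # Tridiagonal system for second derivatives M; natural ends: M[0] = M[-1] = 0.
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)  # interval index of each x
    hj = xk[j + 1] - xk[j]
    a = (xk[j + 1] - x) / hj
    b = (x - xk[j]) / hj
    return a * yk[j] + b * yk[j + 1] + ((a**3 - a) * M[j] + (b**3 - b) * M[j + 1]) * hj**2 / 6.0


def local_extrema(c):
    """Indices of interior local maxima and minima."""
    d = np.diff(c)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def sift(x, max_iter=50, tol=0.05):
    """Subtract the envelope mean repeatedly until c(t) is approximately an IMF."""
    t = np.arange(len(x), dtype=float)
    c = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        maxima, minima = local_extrema(c)
        if len(maxima) < 2 or len(minima) < 2:
            break
        last = len(c) - 1
        kmax = np.r_[0, maxima, last]  # end points added as knots (assumed boundary rule)
        kmin = np.r_[0, minima, last]
        e_max = natural_cubic_spline(t[kmax], c[kmax], t)  # upper envelope
        e_min = natural_cubic_spline(t[kmin], c[kmin], t)  # lower envelope
        m = (e_max + e_min) / 2.0  # envelope mean
        energy = np.sum(m**2) / np.sum(c**2)
        c = c - m
        if energy < tol:  # simplified stopping rule
            break
    return c


def emd(x, max_imfs=10):
    """Decompose x into IMFs and a residue by repeated sifting."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) <= 1:  # monotonic or a single extremum: stop
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = residue - imf  # r(t) = x(t) - c(t)
    return np.array(imfs), residue


t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 4 * t) + t
imfs, residue = emd(x)
```

By construction the IMFs and the residue sum back to the input signal; this completeness property is what makes the averaging in CEEMD meaningful.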

CEEMD (Torres et al. ) is an enhancement of EMD
and EEMD (Wu & Huang ). The decomposed signals can be arranged according to the frequency from high to low. The white noise added into the EEMD method overcomes the pattern aliasing problem in the EMD method; it also brings the reconstruction error into the decomposed IMF component. CEEMD adds two Gaussian white noises at the same time in the process of reconstructing the signal. The amplitudes are the same but the phases are opposite. Therefore, the IMF is finally determined according to the CEEMD method. The reconstruction error can be offset when the components are averaged. The specific process of CEEMD decomposition is as follows.
Step 1. Set the maximum ensemble number I and the white noise amplitude, and initialize i = 1.
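Step 2. Generate white noise $n_i(t)$ and construct a pair of new signals according to Equation (1):

$$x_i^{+}(t) = x(t) + n_i(t), \qquad x_i^{-}(t) = x(t) - n_i(t) \qquad (1)$$

Step 3. Taking each of $x_i^{+}(t)$ and $x_i^{-}(t)$ in turn as the signal x(t) to be decomposed, connect all local maxima and all local minima by cubic spline interpolation to generate the upper envelope $e_{\max}(t)$ and the lower envelope $e_{\min}(t)$.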
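Step 4. Compute the envelope mean using Equation (2):

$$m(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2} \qquad (2)$$

Step 5. Calculate the difference between x(t) and m(t) as c(t):

$$c(t) = x(t) - m(t)$$

Step 6. Check whether c(t) is an IMF, that is, whether (a) the numbers of extrema and zero crossings are equal or differ by at most one, and (b) the mean of the upper and lower envelopes is zero at every point. If c(t) is an IMF, go to Step 7; otherwise, let x(t) = c(t) and repeat Steps 3-5 until c(t) is an IMF.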
Step 7. Calculate the residue $r(t) = x(t) - c(t)$. If $r(t)$ is a monotonic function or has at most one local extreme point, the whole decomposition is complete; otherwise, take $r(t)$ as the new signal and repeat Steps 3-6 to obtain q IMF components:

$$x_i^{\pm}(t) = \sum_{j=1}^{q} c_{ij}^{\pm}(t) + r_i^{\pm}(t)$$

Step 8. Determine whether the maximum ensemble number has been reached. If $i < I$, let $i = i + 1$ and repeat Steps 2-7, each time adding to the original signal a new white noise realization that differs from all previous ones.
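Step 9. Average the IMFs obtained from all EMD decompositions:

$$c_j(t) = \frac{1}{2I} \sum_{i=1}^{I} \left[ c_{ij}^{+}(t) + c_{ij}^{-}(t) \right]$$

where $c_j$ is the jth IMF.

LSSVM

LSSVM replaces the inequality constraints of the standard SVM with equality constraints, so training reduces to solving a system of linear equations and the solution speed is faster. The model works as follows. For a given data set $\{x_i, y_i\}_{i=1}^{N}$, the mathematical model of LSSVM can be described as:

$$\min_{\omega, b, e} J(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2} \quad \text{s.t.} \quad y_i = \omega^{T}\varphi(x_i) + b + e_i, \quad i = 1, \dots, N$$

where $x_i$ is the ith m-dimensional input, $y_i$ is the ith real-valued output, N is the number of samples, $\varphi$ is the mapping into the kernel space, $\omega$ is the weight vector, b is the bias, $\gamma$ is the regularization parameter, and $e_i \in \mathbb{R}$ is the error variable.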
Solving for the coefficients with the Lagrangian function, the LSSVM regression function is finally constructed as:

$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b$$

where y(x) is the forecast value, $x_i$ is a support vector obtained through training, x is the forecasting sample, $\alpha_i$ is the Lagrangian coefficient obtained through training, b is the bias, and $K(x, x_i)$ is the kernel function.
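The step of solving for the coefficients with the Lagrangian function can be spelled out as follows, assuming the standard LSSVM formulation: objective $J(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i} e_i^{2}$ with regularization parameter $\gamma$, subject to $y_i = \omega^{T}\varphi(x_i) + b + e_i$. Setting the derivatives of the Lagrangian to zero and eliminating $\omega$ and $e_i$ leaves a linear system in $b$ and $\alpha$.

```latex
\mathcal{L} = J(\omega, e) - \sum_{i=1}^{N} \alpha_i \left[ \omega^{T}\varphi(x_i) + b + e_i - y_i \right]

\frac{\partial \mathcal{L}}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \qquad
\frac{\partial \mathcal{L}}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0, \qquad
\frac{\partial \mathcal{L}}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i

\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad \Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)
```

Substituting $\omega$ back into $\omega^{T}\varphi(x) + b$ yields the kernel expansion $\sum_i \alpha_i K(x, x_i) + b$, so $\varphi$ never has to be evaluated explicitly.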
This study chooses the Gaussian radial basis function (RBF) as the kernel function (Maity et al. ). Its expression is as follows:

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right)$$

where σ is the kernel width. The structure of LSSVM is shown in Figure 1.
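The training step can be sketched in NumPy by solving the standard LSSVM linear system [[0, 1ᵀ], [1, Ω + I/γ]] [b; α] = [0; y] directly, where Ω is the kernel matrix and γ the regularization parameter, with the Gaussian kernel K(x, x_i) = exp(-‖x - x_i‖² / (2σ²)). This is a minimal illustration, not the study's implementation: the lagged-input construction (`make_lagged`, using the `p` previous values as inputs), the toy series, and the values of `gamma` and `sigma` are all assumptions; in practice the hyperparameters would be tuned.

```python
import numpy as np


def make_lagged(series, p):
    """Inputs are p consecutive values; the target is the value that follows them."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - p + i] for i in range(p)])
    return X, s[p:]


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the bordered LSSVM system for the bias b and coefficients alpha."""
    N = X.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0  # first row enforces sum(alpha) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]  # alpha, b


def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    """y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


series = np.sin(np.arange(60) * 2.0 * np.pi / 12.0)  # toy monthly signal
X, y = make_lagged(series, p=3)
alpha, b = lssvm_fit(X[:40], y[:40], gamma=100.0, sigma=1.0)
forecast = lssvm_predict(X[:40], alpha, b, X[40:], sigma=1.0)
```

A useful check of the algebra is that the training residuals equal α_i / γ exactly, which follows from the condition α_i = γ e_i.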

CEEMD-LSSVM
Because the inflow runoff has non-linear and non-stationary

Model verification
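The Nash-Sutcliffe efficiency coefficient (NS), root mean square error (RMSE), correlation coefficient (R), mean absolute relative error (MARE) and mean absolute error (MAE) were used to evaluate the performance of the forecasting models:

$$NS = 1 - \frac{\sum_{i=1}^{n} (y_i - y_i')^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - y_i')^{2}}$$

$$R = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(y_i' - \bar{y}')}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2} \sum_{i=1}^{n} (y_i' - \bar{y}')^{2}}}$$

$$MARE = \frac{1}{n}\sum_{i=1}^{n} \frac{\lvert y_i - y_i' \rvert}{y_i}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - y_i' \rvert$$

where n is the length of the runoff time series; $y_i$ and $y_i'$ are the observed and forecasted runoff at time i; and $\bar{y}$ and $\bar{y}'$ are the means of the observed and forecasted runoff, respectively.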
The closer the NS and R values are to 1, and the RMSE, MARE and MAE values are to 0, the better the performance of the forecasting model.
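The five criteria translate directly into NumPy. This sketch assumes their usual definitions (NS with the observed mean in the denominator, MARE as a fraction rather than a percentage) and strictly positive observed runoff, since MARE divides by each observation; the function name `evaluate` and the toy arrays are illustrative.

```python
import numpy as np


def evaluate(obs, sim):
    """Return NS, RMSE, R, MARE and MAE for observed and forecasted series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = obs - sim
    return {
        "NS": 1.0 - np.sum(err**2) / np.sum((obs - obs.mean()) ** 2),
        "RMSE": np.sqrt(np.mean(err**2)),
        "R": np.corrcoef(obs, sim)[0, 1],
        "MARE": np.mean(np.abs(err) / obs),  # requires obs > 0
        "MAE": np.mean(np.abs(err)),
    }


scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

For this toy pair the squared errors sum to 18 and the observed deviations to 500, so NS is 1 - 18/500 = 0.964.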

Results and discussion
Characteristic analysis of runoff decomposition signals
The original runoff sequence was decomposed into 10 sub-series, comprising nine IMFs and one residue. These sub-series have different frequencies and amplitudes.

Application analysis and discussion
In the forecasting model, the entire time series was decomposed into several IMFs and one residue, and each decomposed component was then divided into a training period and a validation period. The training-period data were used to build the model, and the validation-period data were used for forecast testing. The results showed that IMF8, IMF9 and the residue of the runoff sequence could be forecast with high accuracy by multivariate linear fitting, so these components were forecast by that method and the other components by LSSVM. Figure 5 shows the forecasting results of the decomposed runoff components, and Table 1 shows the forecast results of each component. It can be seen that in addition to the higher-frequency IMF1 and IMF2, the other IMFs and the residue are … (11), the runoff in the dry season is relatively low, so even a small deviation leads to a large relative error. The performance of the CEEMD-LSSVM and LSSVM models is shown in Figure 6. CEEMD-LSSVM is significantly better than the LSSVM model for runoff forecasting, especially for peak flows, which indicates that the CEEMD method can improve the overall accuracy of the LSSVM model. Decomposition therefore helps to transform a non-linear, non-stationary time series into relatively stationary sub-series and can improve forecasting capacity. However, the CEEMD-LSSVM model does not perform well for minimum values and even predicts negative values.
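The decomposition-prediction-reconstruction workflow described above can be outlined as a short NumPy sketch. To keep it self-contained, two stand-ins are assumed and should not be read as the study's method: a toy additive decomposition (a 3-point moving average and its remainder) replaces CEEMD, and a least-squares linear autoregression on the `p` previous values plays the role of the per-component models (multivariate linear fitting or LSSVM in the study). All function names and the toy series are hypothetical.

```python
import numpy as np


def make_lagged(s, p):
    """Rows are [s_t, ..., s_{t+p-1}]; the target is s_{t+p}."""
    X = np.column_stack([s[i:len(s) - p + i] for i in range(p)])
    return X, s[p:]


def linear_ar_forecast(component, n_train, p=3):
    """Fit a linear autoregression on the training period by least squares and
    return one-step-ahead forecasts for every step of the validation period."""
    X, y = make_lagged(np.asarray(component, dtype=float), p)
    X1 = np.column_stack([X, np.ones(len(X))])  # intercept column
    n_fit = n_train - p  # rows whose targets lie in the training period
    coef, *_ = np.linalg.lstsq(X1[:n_fit], y[:n_fit], rcond=None)
    return X1[n_fit:] @ coef


def decompose_predict_reconstruct(components, n_train, forecasters):
    """Forecast each component with its own model, then sum the forecasts."""
    preds = [f(c, n_train) for c, f in zip(components, forecasters)]
    return np.sum(preds, axis=0)


T, n_train = 240, 180
t = np.arange(T)
x = 50 + 30 * np.sin(2 * np.pi * t / 12) + 5 * np.sin(2 * np.pi * t / 3)  # toy monthly series

# Stand-in decomposition: a smooth part and a high-frequency remainder that sum to x.
smooth = np.convolve(x, np.ones(3) / 3, mode="same")
components = np.vstack([x - smooth, smooth])

forecast = decompose_predict_reconstruct(components, n_train, [linear_ar_forecast, linear_ar_forecast])
validation_obs = x[n_train:]
```

Because the components sum to the original series, summing the component forecasts gives a forecast of that series; any per-component model can be placed in `forecasters`, for example an LSSVM for the high-frequency IMFs and a linear fit for the low-frequency ones.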
MAE, RMSE and MARE were used to evaluate the forecasting accuracy in each month (Table 3). The results show that the CEEMD-LSSVM model is well suited to runoff forecasting in the wet season, but its accuracy in the dry season is lower than that of the NNBR model. As shown in Table 3, for the STX Reservoir, the CEEMD-LSSVM model gives better results than the other models from May to October, but in the other months it is less accurate than the NNBR model. The scatter plots of predicted versus observed runoff (Figure 7) show that, for this reservoir, NNBR predicts small runoff values more accurately than the CEEMD method. It can be seen that the scatter points in Figure 7(b) are more … respectively. Although the standard deviation of the combined model is slightly larger than that of the other models, its RMSE is smaller and its R is higher. In general, the combined model can effectively improve accuracy and can be applied to forecasting the inflow runoff of the STX Reservoir. Figure 9 shows that the forecasting accuracy in both the wet and dry seasons can be significantly improved. Table 4 shows that compared with the CEEMD-LSSVM model,

(1) The CEEMD-based runoff decomposition model can effectively identify the characteristic information of the original runoff series and decompose it into several IMF components and one residue, with frequencies ordered from high to low.
(2) The CEEMD-LSSVM model can significantly improve the accuracy of wet-season prediction, but its accuracy in dry-season prediction is relatively low.
(3) The NNBR model better characterizes the autocorrelation of runoff and its forecasts fluctuate little, so it performs better than the other models in dry-season prediction. It is therefore suggested that monthly runoff forecasting for the STX Reservoir combine the two models in stages: the CEEMD-LSSVM model in the wet season and the NNBR model, which is relatively accurate there, in the dry season, so as to improve the prediction accuracy of inflow runoff.
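The staged combination recommended in conclusion (3) can be sketched as follows. The NNBR routine is an assumed, commonly used formulation rather than the paper's specification: the feature vector is the `p` most recent values, the `k` nearest historical patterns are found by Euclidean distance, and their successors are averaged with rank weights proportional to 1/j; the default k of about the square root of the number of patterns is likewise an assumption. The month rule (May to October treated as wet) follows the text, and the function names are hypothetical.

```python
import numpy as np


def nnbr_forecast(history, p=3, k=None):
    """One-step forecast from the k nearest neighbours of the latest p-value pattern."""
    s = np.asarray(history, dtype=float)
    n = len(s)
    patterns = np.array([s[i:i + p] for i in range(n - p)])  # windows with a known successor
    successors = s[p:]  # successors[i] is the value that follows patterns[i]
    if k is None:
        k = max(1, int(round(np.sqrt(n - p))))  # assumed rule of thumb
    k = min(k, n - p)
    current = s[n - p:]  # latest pattern, whose successor is to be forecast
    dist = np.linalg.norm(patterns - current, axis=1)
    nearest = np.argsort(dist)[:k]  # closest pattern first
    weights = 1.0 / np.arange(1, k + 1)  # rank weights proportional to 1/j
    weights /= weights.sum()
    return float(weights @ successors[nearest])


def combine_by_season(months, wet_forecast, dry_forecast):
    """Use the wet-season model from May to October and the dry-season model otherwise."""
    months = np.asarray(months)
    wet = (months >= 5) & (months <= 10)
    return np.where(wet, wet_forecast, dry_forecast)


history = np.tile(np.arange(1.0, 13.0), 5)  # toy series repeating 1..12
print(nnbr_forecast(history, p=3, k=4))  # all four exact matches of [10, 11, 12] are followed by 1
print(combine_by_season([1, 5, 10, 11], [100.0] * 4, [1.0] * 4))
```

In an operational setting, `wet_forecast` would hold the CEEMD-LSSVM forecasts and `dry_forecast` the NNBR forecasts for the same months.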
The method in this study provided reliable results in simulating the runoff data and can support decision-making and risk analysis in reservoir operation. In the future, the techniques can be applied to other reservoirs. Because of the complexity of the hydrological system, more accurate prediction will require analyzing the physical meaning of each sub-series. With rising global temperature and atmospheric humidity, extreme hydrometeorological events such as extreme precipitation and drought will become increasingly frequent, and the real-time operation of reservoirs will be affected by extreme runoff more often. Consequently, when the trajectory of future hydrological elements no longer follows historical data, the prediction accuracy of the model will need to be improved.
The uncertainty of runoff prediction errors should be considered in future real-time reservoir operation so as to realize more of its potential. This will be the focus of future research.