## Abstract

Medium- and long-term runoff forecasting has always been a challenge, especially in the wet season. Forecasting performance can be improved using complementary ensemble empirical mode decomposition (CEEMD) to produce clearer signals as model inputs. In forecasting models based on CEEMD, the entire time series is decomposed into several sub-series; each sub-series is divided into training and validation datasets and forecasted by a common model, such as the least squares support vector machine (LSSVM); and finally an ensemble forecasting result is obtained by summing the forecasted results of the sub-series. This model was applied to forecast the inflow runoff of the Shitouxia Reservoir (STX Reservoir). The forecasting results show that the Nash efficiency coefficient of the LSSVM model is 0.815, and the Nash efficiency coefficient of the CEEMD-LSSVM model is 0.954, an increase of 13.9%. The root mean square error value is reduced from 20.654 to 10.235, a decrease of 50.4%. The runoff forecasting performance can be effectively improved by applying the CEEMD-LSSVM model. When the annual runoff forecasting results were analyzed month by month, the forecasting results for November to April were found to be unsatisfactory compared with results from the nearest neighbor bootstrapping regressive (NNBR) model, which was more suitable for the dry season, but the forecasting results for May to October improved significantly. This also shows that the CEEMD-LSSVM model has a great advantage in forecasting inflow runoff during the wet season. In the optimized operation of reservoirs, the forecasting result of inflow runoff in the wet season is more important than in the dry season. Therefore, when forecasting annual runoff month by month, the CEEMD-LSSVM model is recommended for the wet season combined with the NNBR model for the dry season.

## HIGHLIGHTS

CEEMD is suitable for non-linear and non-stationary time series.

A comprehensive evaluation system was used to evaluate the accuracy of the different models.

The performance of the models can be improved by using the CEEMD method.

Different models are used for forecasting in the dry season and the wet season respectively.

## INTRODUCTION

Hydrological forecasting is of significant importance for planning and managing water resources. If the forecasting lead time is longer than the maximal confluence time of the basin plus 3 days but shorter than 1 year, it is classed as medium- and long-term hydrological forecasting (Tang *et al.* 2008). Medium- and long-term hydrological forecasting is a powerful means of making full use of water resources and realizing optimal reservoir scheduling. It is an important basis for correct decision-making in reservoir operation management.

Runoff forecasting has attracted wide attention in the last few decades. Physical models are usually used for runoff prediction. In this method, the runoff generation process is simulated by equations with specific boundary conditions. But this kind of model needs a lot of accurate historical rainfall runoff data to calibrate the model parameters. In practice, it is difficult to ensure the accuracy of the data and meet the requirement of the sample size, which often leads to unsatisfactory calibration results of model parameters and poor forecasting performance of the model.

Many non-linear data-driven models, such as artificial neural networks (ANN) (Sivapragasam & Vasudevan 2010; Humphrey *et al.* 2016; Shiri & Kisi 2010; Sudheer *et al.* 2010), adaptive neuro-fuzzy inference system (ANFIS) (Nayak *et al.* 2004; Ashrafi *et al.* 2017), genetic programming (GP) (Kisi & Cimen 2011; Danandeh Mehr 2013; Ravansalar *et al.* 2017), support vector machines (SVM) (Asefa *et al.* 2006; Kisi & Cimen 2011; Huang *et al.* 2014), and nearest neighbor bootstrapping regressive models (NNBR) (Ye & Cheng-You 2011) have been proposed for medium- and long-term runoff forecasting. However, it is not yet clear which model performs best in medium- and long-term runoff forecasting. ANN has strong non-linear mapping capabilities, but it also suffers from problems such as slow learning, overfitting, and the curse of dimensionality. Therefore, when processing complex hydrological data, its forecasting performance is not satisfactory. SVM is a small-sample statistical learning model based on the Vapnik–Chervonenkis (VC) dimension theory and the principle of structural risk minimization; it can effectively avoid the curse of dimensionality, has high simulation accuracy, and can theoretically achieve global optimization (Vapnik 2000). The least squares support vector machine (LSSVM) is an improved SVM model. Its convergence speed is faster than that of traditional SVM and its forecasting accuracy is better. Some studies have shown that clearer sub-signals can be generated as model inputs through signal decomposition technology, thus improving the forecasting performance of the model (Tan *et al.* 2018). Partal (2007) proposed a wavelet-neuro-fuzzy model to simulate precipitation, and applied the periodic expression ability of wavelet transform technology to improve the forecasting accuracy of the model.
Although the wavelet transform is a mathematical tool suited to processing non-stationary data, it requires the input data to be linear and the mother wavelet to be set in advance as a basis function (Niu *et al.* 2016), which limits its application. Hydrological time series usually have highly complex non-stationary characteristics, and the relations between adjacent states are mostly non-linear (Huang *et al.* 2009).

Therefore, empirical mode decomposition (EMD) is used in the field of hydrological data analysis because it is suitable for processing complex non-linear and non-stationary time series (Karthikeyan & Nagesh Kumar 2013). In addition, EMD is based on the principle of local scale separation and does not require a predetermined basis function, which makes it adaptive and intuitive (Sang *et al.* 2012). EMD decomposes the entire series into several sub-series called intrinsic mode functions (IMFs) and a residue. However, when the signal is discontinuous or contains local intermittent components, a single IMF obtained by EMD can contain several oscillation modes (mode mixing). In response to this phenomenon, ensemble empirical mode decomposition (EEMD) adds white noise to the original signal, and the uniform distribution of Gaussian white noise over the frequency range makes the IMF components continuous on the time scale, thus overcoming the mode-mixing defect caused by intermittent components (Huang *et al.* 1998; Huang & Wu 2008). While the white noise added in the EEMD method overcomes the mode-mixing problem of the EMD method, it also introduces a reconstruction error into the decomposed IMF components. This reconstruction error can only be reduced in EEMD and cannot be eliminated. To overcome this shortcoming, an improved ensemble EMD method, complementary ensemble empirical mode decomposition (CEEMD), was proposed (Yeh *et al.* 2010). In CEEMD, the reconstruction error is offset when the components are averaged, and the final IMFs are determined from these averages. As an improved algorithm of EMD, CEEMD (Li *et al.* 2014) not only effectively solves the mode-mixing problem of EMD, but also preserves the advantages of EMD in processing non-stationary signals, such as adaptiveness and dyadic filtering characteristics.

This study summarizes the limitations of previous work and the advantages of CEEMD, which is better suited to non-linear and non-stationary data, and proposes the CEEMD-LSSVM model. Based on the decomposition-prediction-reconstruction pattern, the CEEMD-LSSVM model can be applied to predict non-linear and non-stationary runoff time series. Its input is not the raw runoff time series but the signals decomposed by CEEMD. The Shitouxia (STX) Reservoir is taken as an example to verify whether the model can be applied to runoff forecasting: CEEMD is used to decompose the original time series into several sub-series, the LSSVM model is used to forecast each sub-series, and the ensemble forecast is obtained by summing the forecasting results of the sub-series. Finally, the forecasting results are compared with those obtained by the LSSVM and NNBR models.

## METHODOLOGY

### CEEMD

The EMD method decomposes a complex signal into single-frequency signals. It is an empirical, intuitive, direct and self-adaptive data processing method for non-linear and non-stationary time series (Huang *et al.* 1998). CEEMD (Torres *et al.* 2011) is an enhancement of EMD and EEMD (Wu & Huang 2009), and the decomposed signals can be arranged by frequency from high to low. The white noise added in the EEMD method overcomes the mode-mixing problem of the EMD method, but it also introduces a reconstruction error into the decomposed IMF components. CEEMD adds two Gaussian white noise series to the signal at the same time, with the same amplitude but opposite phase. As a result, the reconstruction error is offset when the components are averaged, and the final IMFs are determined from these averages. The specific process of CEEMD decomposition is as follows.

Step 1. Set the maximum aggregation number *I* and white noise amplitude, and initialize it so that *i* = 1.

Step 3. Connect all local maxima and all local minima by cubic spline interpolation to generate the upper and lower envelopes *e*_{max}(*t*) and *e*_{min}(*t*).

Step 6. Check whether *c*(*t*) is an IMF according to the two conditions mentioned above. If *c*(*t*) is an IMF, go to Step 7; otherwise, let *x*(*t*) = *c*(*t*) and repeat Steps 3–5 until *c*(*t*) is an IMF.

Step 8. Determine whether the maximum number of iterations has been reached. If *i* < *I*, let *i* = *i* + 1 and repeat Steps 2–7, adding a different white noise realization to the initial signal each time.
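The effect of the complementary noise pairs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sift_once`, `emd`, `ceemd`), the fixed number of sifting passes, the noise level and the test signal are all assumptions made for illustration, and the envelopes are built with linear interpolation (`np.interp`) rather than the cubic splines described in Step 3, simply to keep the example dependent on NumPy alone. The sketch only demonstrates one property: when each noise series is paired with its negative, the averaged components sum back to the original signal, whereas unpaired noise leaves a residual reconstruction error.

```python
import numpy as np

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    t = np.arange(len(x))
    d = np.diff(x)
    # Interior local maxima/minima located from sign changes of the first difference.
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: nothing more to extract
    # ASSUMPTION: linear envelopes instead of the cubic splines used in the paper.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return x - (upper + lower) / 2.0

def emd(x, n_imfs=4, n_sift=10):
    """Crude EMD with a fixed sifting count; returns n_imfs IMF rows plus one residue row."""
    residue = np.asarray(x, dtype=float).copy()
    rows = []
    for _ in range(n_imfs):
        c, extracted = residue.copy(), False
        for _ in range(n_sift):
            nxt = sift_once(c)
            if nxt is None:
                break
            c, extracted = nxt, True
        imf = c if extracted else np.zeros_like(residue)
        rows.append(imf)
        residue = residue - imf
    rows.append(residue)  # components always sum back to the input
    return np.vstack(rows)

def ceemd(x, n_pairs=20, noise_std=0.2, n_imfs=4, paired=True, seed=0):
    """Average the EMD components over noisy copies of x.

    paired=True adds each noise series together with its negative (CEEMD-style);
    paired=False draws two independent noise series instead (EEMD-style).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = noise_std * np.std(x)
    total = np.zeros((n_imfs + 1, len(x)))
    for _ in range(n_pairs):
        first = rng.normal(0.0, scale, size=len(x))
        second = -first if paired else rng.normal(0.0, scale, size=len(x))
        for noise in (first, second):
            total += emd(x + noise, n_imfs=n_imfs)
    return total / (2 * n_pairs)

# Hypothetical test signal: two oscillations plus a trend.
t = np.linspace(0.0, 1.0, 400)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + 2.0 * t

paired = ceemd(signal, paired=True)
unpaired = ceemd(signal, paired=False)
err_paired = np.max(np.abs(paired.sum(axis=0) - signal))
err_unpaired = np.max(np.abs(unpaired.sum(axis=0) - signal))
print("reconstruction error, paired noise:  ", err_paired)
print("reconstruction error, unpaired noise:", err_unpaired)
```

With paired noise the reconstruction error is at the level of floating-point rounding, while with unpaired noise it is of the order of the noise amplitude divided by the square root of the ensemble size.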

### LSSVM

LSSVM originated from SVM and is a powerful methodology for solving problems in non-linear classification, function estimation and regression. This method has been applied in pattern recognition, signal processing and non-linear regression estimation. LSSVM was proposed by Suykens and Vandewalle in 1999 and has been employed in chaotic time series prediction (Suykens & Vandewalle 1999). It uses a set of linear equations for training while SVM uses a quadratic optimization problem, which is the major difference between the two. Compared with SVM, LSSVM uses the least squares linear system as the loss function, and the solution process becomes a set of equations. The solution speed is faster. The model works as follows:

Here, *x*_{i} is the *i*th m-dimensional input, *y*_{i} is the *i*th real-valued output, *n* is the number of samples, *φ* is the kernel-space mapping function, *ω*^{τ} is the weight vector, *b* is the deviation, and *e*_{i} ∈ *R* is the error variable.

For forecasting, *y*(*x*) is the forecast object, *x*_{i} is the support vector obtained through training, *x* is the forecasting sample, *α* is the Lagrangian coefficient obtained through training, *b* is the deviation amount, and *K*(*x*, *x*_{t}) is the kernel function. This study chooses the Gaussian radial basis function (RBF) as the kernel function (Maity *et al.* 2010); its expression is parameterized by the kernel width. The structure of LSSVM is shown in Figure 1.
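To make the "set of linear equations" concrete, the following Python sketch trains an LSSVM regressor in its standard textbook form and is not taken from the authors' code. It assumes the usual formulation in which the bias *b* and the coefficients *α* solve one linear system built from the kernel matrix and a regularization parameter, here called `gamma`, and it assumes the RBF convention exp(−‖x − x_i‖² / (2σ²)); the paper's exact convention for the kernel width may differ. The function names and the sine-curve data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix; ASSUMED convention exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=1000.0, sigma=1.0):
    """Solve the LSSVM linear system
        [ 0   1^T           ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma ] [ alpha ] = [ y ]
    and return (b, alpha)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Forecast y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Hypothetical data: learn a sine curve from 60 samples.
X_train = np.linspace(0.0, 2.0 * np.pi, 60)[:, None]
y_train = np.sin(X_train[:, 0])
b, alpha = lssvm_fit(X_train, y_train)

X_new = np.linspace(0.5, 5.5, 11)[:, None]
y_new = lssvm_predict(X_train, b, alpha, X_new)
max_err = np.max(np.abs(y_new - np.sin(X_new[:, 0])))
print("largest absolute error on new points:", max_err)
```

Because training is a single call to a linear solver rather than a quadratic program, the cost is dominated by one dense solve of size *n* + 1, which is what makes LSSVM faster to train than a standard SVM on samples of this size.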

### CEEMD-LSSVM

Inflow runoff has non-linear and non-stationary characteristics. Traditional signal analysis methods extract the characteristic information of the signal from either the time domain or the frequency domain, which limits the information obtained and easily causes information loss. CEEMD is particularly well suited for processing complex non-linear systems because of its good adaptability. To cope with the non-stationarity of runoff, a CEEMD-LSSVM hybrid model is built, in which CEEMD is used to decompose the original time series into several sub-series and LSSVM is used to build a forecasting model for each sub-series. The sub-series obtained by CEEMD are relatively stationary and can provide information about the structure and periodicity of the original data. Therefore, the performance of the forecasting models is expected to improve because useful information is supplied at various resolution levels.

Forecasting adopts the 'decomposition-prediction-reconstruction' approach based on CEEMD and includes the following steps. First, the runoff time series is decomposed into a collection of IMFs and a residue using CEEMD, and each sub-series is divided into training samples and validation samples. Then, the CEEMD-LSSVM forecasting model of each sub-series is established based on the training samples, and the decomposed signals are forecast. Finally, the forecasting results of the validation samples are reconstructed, and the accuracy of the model is analyzed. The framework of CEEMD-LSSVM is shown in Figure 2.
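The decomposition-prediction-reconstruction workflow can be sketched in Python as follows. This is an illustration under stated assumptions rather than the model used in the study: the components are synthetic stand-ins for CEEMD output, a linear autoregression fitted by least squares stands in for LSSVM, and the 12-month lag window is an arbitrary choice. Only the 540/120 split between training and validation periods is taken from the case study; all function names are hypothetical.

```python
import numpy as np

def lagged(series, n_lags):
    """Build (inputs, targets): each row holds n_lags past values, the target is the next value."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

def fit_linear_ar(X, y):
    """STAND-IN for LSSVM: least-squares linear autoregression with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear_ar(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def decompose_predict_reconstruct(components, n_train, n_lags=12):
    """Forecast every component separately, then sum the component forecasts."""
    n_valid = components.shape[1] - n_train
    total = np.zeros(n_valid)
    for comp in components:
        X, y = lagged(comp, n_lags)
        split = n_train - n_lags  # rows whose target falls in the training period
        coef = fit_linear_ar(X[:split], y[:split])
        total += predict_linear_ar(coef, X[split:])  # one-step-ahead forecasts
    return total

# Synthetic components standing in for IMFs and a residue (NOT real CEEMD output).
months = np.arange(660)
components = np.vstack([
    10.0 * np.sin(2 * np.pi * months / 12),        # annual cycle
    4.0 * np.sin(2 * np.pi * months / 60 + 1.0),   # slower oscillation
    30.0 + 0.01 * months,                          # slowly varying residue
])
runoff = components.sum(axis=0)

n_train = 540  # 1956-2000 in the case study; the remaining 120 months are for validation
forecast = decompose_predict_reconstruct(components, n_train)
rmse = np.sqrt(np.mean((forecast - runoff[n_train:]) ** 2))
print("validation RMSE on the synthetic series:", rmse)
```

The near-zero error here reflects only the idealized synthetic components; real IMFs, especially the high-frequency ones, are far harder to forecast.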

To verify the prediction performance of the CEEMD-LSSVM model in the dry season and the wet season, two forecasting models, LSSVM and NNBR (Lall & Sharma 1996), are applied for comparison. The input of the NNBR model is the monthly runoff in the dry season. The forecasting results of the three models are then analyzed, and the accuracy of the forecasting methods in different seasons is compared.

## CASE STUDY

### Study area and data

Qinghai Province is located in an arid area. To relieve the pressure on water supply in Xining, the provincial capital of Qinghai, an inter-basin water transfer system from the Datong River to the Huangshui River was built. It is an important project for solving the water supply problem in Xining city (Figure 3). The STX Reservoir is located on the main stream of the Datong River. The catchment area of the Datong River is 15,000 km^{2} and the length of the main stream is 560.7 km. The STX Reservoir is the source reservoir of the project, so medium- and long-term runoff forecasting is of practical significance for the joint regulation of these reservoirs and the rational allocation and utilization of water resources.

A long training period was selected to ensure that the model is trained adequately: data for the STX Reservoir from 1956 to 2000 (540 time periods) were used as the training period, and data from 2001–2010 (120 time periods) as the validation period. CEEMD decomposed the runoff time series of the STX Reservoir into nine IMFs and one residue, ordered by frequency. As shown in Figure 4, as the component order increases, IMF_{1}–IMF_{9} become progressively more regular and smooth, their periodicity becomes stronger, and the residue slowly decreases.

### Model verification

Here, *y*_{i} and *y*_{i}′ are the observed and forecasted runoff at time *i*, and the corresponding mean terms are the means of the observed and forecasted runoff, respectively. The best fit between observed and forecasted values would be NS = 1, R = 1, RMSE = 0, MARE = 0 and MAE = 0. The closer the NS and R values are to 1, and the closer the RMSE, MARE and MAE values are to 0, the better the performance of the forecasting model.
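For reference, the five criteria can be computed as in the Python sketch below. It uses the commonly accepted definitions of the Nash efficiency coefficient (NS), correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE) and mean absolute relative error (MARE, in percent); these are assumed to match the definitions used in this study, and the function name and sample values are hypothetical.

```python
import numpy as np

def evaluate(observed, forecast):
    """Return the five evaluation criteria for one pair of series."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(forecast, dtype=float)
    err = sim - obs
    return {
        # Nash efficiency: 1 minus error variance relative to observed variance.
        "NS": 1.0 - np.sum(err**2) / np.sum((obs - obs.mean()) ** 2),
        # Pearson correlation between observed and forecasted series.
        "R": np.corrcoef(obs, sim)[0, 1],
        "RMSE": np.sqrt(np.mean(err**2)),
        "MAE": np.mean(np.abs(err)),
        # Relative error in percent; undefined when an observed value is zero.
        "MARE": 100.0 * np.mean(np.abs(err) / np.abs(obs)),
    }

# Hypothetical values, for illustration only.
metrics = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 36.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A perfect forecast passed to `evaluate` gives NS = 1, R = 1 and zero for the three error measures, in line with the best-fit values stated above.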

### Results and discussion

#### Characteristic analysis of runoff decomposition signals

In previous forecasting experiments, the runoff series in the validation period were processed using the decomposition model saved in the training period. However, the decomposition results may differ significantly depending on the length of the series. The longer the runoff sequence, the more sub-series are obtained (Huang *et al.* 1998). The original runoff sequence was decomposed into 10 sub-series, including nine IMFs and one residue. These sub-series have different frequencies and amplitudes.

#### Application analysis and discussion

In the forecasting model, the entire time series was decomposed into several IMFs and one residue, and each decomposition component was then divided into a training period and a validation period. The training period data were used to build the model, and the validation period data were used for forecast testing. IMF_{8}, IMF_{9} and the residue of the runoff sequence were fitted with high accuracy by the multivariate linear fitting method, and the other components were forecast by LSSVM. Figure 5 shows the forecasting results of the runoff sequence decomposition, and Table 1 shows the forecast results of the components. Except for the higher-frequency IMF_{1} and IMF_{2}, the IMFs and the residue are forecast well. As the original sequence was decomposed layer by layer, the frequency of the decomposition components decreased and a periodic pattern gradually became apparent. Therefore, the forecasting performance for the IMFs gradually improved, with NS and R approaching 1 and RMSE, MARE and MAE approaching 0. The component forecasts were then summed to reconstruct the runoff forecast for the STX Reservoir. For the reconstructed series, R and NS were close to 1 and RMSE, MARE and MAE were low, indicating a good forecasting result.

| | Criteria | IMF_{1} | IMF_{2} | IMF_{3} | IMF_{4} | IMF_{5} | IMF_{6} | IMF_{7} | IMF_{8} | IMF_{9} | Residue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Training dataset | R | 0.857 | 0.9643 | 0.999 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | NS | 0.5022 | 0.8663 | 0.9964 | 0.999 | 1 | 1 | 1 | 1 | 1 | 1 |
| | MAE | 5.6519 | 1.2482 | 1.4208 | 0.2048 | 0.0014 | 0.0001 | 0 | 0 | 0 | 0 |
| | MARE (%) | 175.57 | 126.35 | 9.7520 | 5.4598 | 0.9614 | 0.2853 | 0.0537 | 0 | 0 | 0 |
| | RMSE | 6.3545 | 1.2346 | 1.4652 | 0.1631 | 0.0083 | 0.0014 | 0.0001 | 0 | 0 | 0 |
| Validating dataset | R | 0.7119 | 0.9312 | 0.9969 | 0.9983 | 1 | 1 | 1 | 1 | 1 | 1 |
| | NS | 0.5022 | 0.8663 | 0.9937 | 0.9966 | 1 | 1 | 1 | 1 | 1 | 1 |
| | MAE | 7.3310 | 1.6842 | 2.2209 | 0.4052 | 0.0225 | 0.0023 | 0.0014 | 0.0001 | 0 | 0 |
| | MARE (%) | 229.44 | 198.92 | 12.2750 | 8.9845 | 1.6914 | 0.3594 | 0.0892 | 0.0039 | 0.0001 | 0 |
| | RMSE | 9.8449 | 2.7634 | 2.8642 | 0.5371 | 0.0280 | 0.0027 | 0.0017 | 0.0001 | 0 | 0 |

Table 2 shows the forecasting performance of the LSSVM, NNBR and CEEMD-LSSVM models for the STX Reservoir. The performance of the CEEMD-LSSVM model was significantly improved, and the NS and R values were higher than for the other models. This indicates that the CEEMD-LSSVM model produced better agreement between the forecasted and observed runoff and the RMSE and MAE values were lower than for the other models, so the average forecasting error was low. However, the MARE value of the CEEMD-LSSVM model was higher than that of the NNBR model, which can be attributed to the fact that the CEEMD-LSSVM model is not suitable in the dry season. According to Equation (11), the runoff in the dry season is relatively low, and thus even a small deviation will lead to a large relative error. The performance of the CEEMD-LSSVM and LSSVM models is shown in Figure 6. It can be seen that CEEMD-LSSVM is significantly better than the LSSVM model for the forecasting of runoff, especially in the peak forecasting, which proves that the overall accuracy of the LSSVM model can be improved by the CEEMD method. Therefore, the decomposition can be helpful to transform non-linear and non-stationary time series to stationary time series and can be useful to improve the forecasting capacity. It can also be seen that in terms of minimum values, the CEEMD-LSSVM model does not perform well and even predicts negative values.
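The sensitivity of the relative error to low flows can be seen from a short worked example in Python. The runoff values below are hypothetical and in arbitrary flow units; they are chosen only to show that the same absolute deviation produces very different relative errors in the dry and wet seasons.

```python
def relative_error_percent(observed, forecast):
    """Absolute relative error of a single forecast, in percent."""
    return abs(forecast - observed) * 100.0 / abs(observed)

# Same absolute deviation of 5 units in both cases (hypothetical values).
dry_season = relative_error_percent(observed=10.0, forecast=15.0)
wet_season = relative_error_percent(observed=100.0, forecast=105.0)

print(f"dry season: {dry_season:.1f}%")  # 50.0%
print(f"wet season: {wet_season:.1f}%")  # 5.0%
```

A model with a lower MAE and RMSE can therefore still show a higher MARE if its errors are concentrated in low-flow months.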

MAE, RMSE and MARE were used to evaluate the forecasting accuracy in each month (Table 3). The CEEMD-LSSVM model is clearly well suited to runoff forecasting in the wet season, but its predictions in the dry season are not as good as those of the NNBR model. As shown in Table 3, for the STX Reservoir, the CEEMD-LSSVM model gives better results than the other models from May to October, but in the other months it is not as accurate as the NNBR model. The scatter plots of the runoff predictions and observations in Figure 7 show that, when the runoff is small, the prediction accuracy of NNBR is higher than that of the CEEMD method. Figure 7(a) and 7(b) compare the observed and predicted values of the CEEMD-LSSVM model in the wet season and the dry season, respectively. Figure 7(c) compares the observed and predicted values of the NNBR model in the dry season. The points in Figure 7(b) are more scattered and do not cluster near the diagonal, whereas the points in Figure 7(a) and 7(c) are distributed near the diagonal.

| Model | R | NS | MAE | MARE (%) | RMSE |
|---|---|---|---|---|---|
| CEEMD-LSSVM | 0.978 | 0.954 | 7.562 | 26.864 | 10.235 |
| LSSVM | 0.903 | 0.815 | 13.167 | 32.167 | 20.654 |

| Model | Criteria | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CEEMD-LSSVM | MAE | 5 | 3 | 6 | 7 | 7 | 10 | 13 | 8 | 16 | 5 | 10 | 3 |
| | MARE (%) | 58 | 43 | 54 | 27 | 18 | 16 | 9 | 7 | 14 | 8 | 40 | 28 |
| | RMSE | 7 | 4 | 6 | 8 | 8 | 12 | 17 | 9 | 18 | 5 | 13 | 4 |
| LSSVM | MAE | 3 | 5 | 4 | 4 | 11 | 15 | 34 | 31 | 25 | 16 | 8 | 2 |
| | MARE (%) | 38 | 68 | 46 | 17 | 33 | 24 | 25 | 27 | 22 | 29 | 34 | 24 |
| | RMSE | 3 | 6 | 6 | 5 | 13 | 20 | 41 | 37 | 29 | 22 | 9 | 3 |
| NNBR | MAE | 1.5 | 1.6 | 1.4 | 1.4 | 1.9 | 0.9 | | | | | | |
| | MARE (%) | 18.5 | 19.3 | 13.1 | 5.7 | 8.8 | 8.3 | | | | | | |
| | RMSE | 1.9 | 1.8 | 1.6 | 1.7 | 2.0 | 1.2 | | | | | | |

*Note:* the bold values are the best values.

Therefore, for runoff forecasting in the wet season, the accuracy of the CEEMD-LSSVM model is high; for runoff forecasting in the dry season, the NNBR model performs better than the CEEMD-LSSVM model.
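A seasonal combination of the two models can be sketched in Python as below. The split of months into wet (May to October) and dry (November to April) follows the seasons used in this study, but the function name, the array layout and the sample forecasts are hypothetical, and the study's own combination procedure may differ in detail.

```python
import numpy as np

WET_MONTHS = (5, 6, 7, 8, 9, 10)  # May to October; the remaining months are the dry season

def combine_by_season(months, wet_model_forecast, dry_model_forecast):
    """Take the wet-season model in wet months and the dry-season model otherwise."""
    months = np.asarray(months)
    is_wet = np.isin(months, WET_MONTHS)
    return np.where(is_wet, wet_model_forecast, dry_model_forecast)

# Hypothetical forecasts for one year (arbitrary flow units).
months = np.arange(1, 13)
ceemd_lssvm = np.array([-2.0, 4.0, 9.0, 20.0, 60.0, 95.0, 130.0, 120.0, 90.0, 50.0, 18.0, 6.0])
nnbr = np.array([8.0, 7.0, 10.0, 22.0, 55.0, 80.0, 100.0, 95.0, 70.0, 40.0, 15.0, 9.0])

combined = combine_by_season(months, ceemd_lssvm, nnbr)
print(combined)
```

In this example the negative January value produced by the wet-season model never reaches the combined series, because January is taken from the dry-season model.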

It can be seen from Figure 7(b) that the dry season runoff forecast by the CEEMD model is sometimes even negative. Because the runoff sequence decomposed by CEEMD is long, more components are obtained by decomposition; the high-frequency terms have larger amplitudes, wider fluctuation ranges and more noise. Each component is forecast separately, and errors may accumulate when the forecasts are linearly superimposed, resulting in low forecasting performance or even negative values at the minima.

A Taylor diagram can quantify the correspondence ratio between the simulation results and the measured data, including the correlation coefficient, RMSE and the standard deviation (Qiang *et al.* 2018). As shown in Figure 8, the standard deviations of the LSSVM model, the CEEMD-LSSVM model and the combined model are 45.03, 46.29, 46.51, respectively; RMSE values are 20.65, 10.23, 9.09, respectively; R values are 0.903, 0.978, 0.985 respectively. Although the standard deviation of the combined model is slightly larger than the other models, the RMSE and R values are smaller and higher than for the other models, respectively. In general, the combined model can effectively improve the accuracy and can be applied to the forecasting of the inflow runoff of the STX Reservoir.
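The geometry behind a Taylor diagram is a law-of-cosines relation between the two standard deviations, the correlation coefficient and the centered root mean square difference (the RMSE computed after removing each series' mean). The Python sketch below checks this identity on synthetic data; the series and function name are hypothetical, and the centered quantity coincides with the ordinary RMSE only when the forecast has no mean bias.

```python
import numpy as np

def taylor_statistics(observed, forecast):
    """Return (std of observations, std of forecasts, correlation, centered RMS difference)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(forecast, dtype=float)
    std_obs, std_sim = obs.std(), sim.std()  # population standard deviations
    r = np.corrcoef(obs, sim)[0, 1]
    centered = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return std_obs, std_sim, r, centered

# Synthetic monthly series, for illustration only.
rng = np.random.default_rng(1)
obs = 50.0 + 40.0 * np.sin(np.linspace(0.0, 20.0 * np.pi, 120))
sim = obs + rng.normal(0.0, 8.0, size=120)

std_obs, std_sim, r, centered = taylor_statistics(obs, sim)
# Law of cosines: E'^2 = std_sim^2 + std_obs^2 - 2 * std_sim * std_obs * R
from_identity = np.sqrt(std_sim**2 + std_obs**2 - 2.0 * std_sim * std_obs * r)
print(centered, from_identity)
```

This relation is why a single point on the diagram can encode all three statistics at once.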

Figure 9 shows that the forecasting accuracy in both the wet and dry seasons can be significantly improved. Table 4 shows that, compared with the CEEMD-LSSVM model, the R and NS values of the combined model improve from 0.978 and 0.954 to 0.985 and 0.966, while the RMSE, MARE and MAE values are reduced from 10.235, 26.864% and 7.562 to 8.835, 12.112% and 5.563 for the STX Reservoir.

| Model | R | NS | MAE | MARE (%) | RMSE |
|---|---|---|---|---|---|
| CEEMD-LSSVM | 0.978 | 0.954 | 7.562 | 26.864 | 10.235 |
| Combination | 0.985 | 0.966 | 5.563 | 12.112 | 8.835 |

In summary, the results of the analysis demonstrate that the proposed CEEMD-LSSVM model attains better results than the LSSVM model, with a marked improvement across the different evaluation measures for monthly runoff time series forecasting. This also indicates that the idea of decomposition-prediction-reconstruction is feasible and that the proposed CEEMD-LSSVM model can overcome the drawbacks of individual models by generating a synergetic effect in forecasting.

## CONCLUSIONS

In this study, runoff forecasting methods based on a decomposition-prediction-reconstruction model with the CEEMD, LSSVM and NNBR models were used to forecast runoff. The accuracy of the three forecasting methods was compared and analyzed. The forecasting methods suitable for different months were selected to forecast the runoff of the STX Reservoir by stages. The main conclusions are as follows:

(1) The results show that the runoff decomposition model based on CEEMD can effectively identify the characteristic information of the original runoff series and decompose it into several IMF components and one residue, ordered by frequency from high to low.

(2) The CEEMD-LSSVM model can significantly improve the accuracy of the wet season prediction, but its accuracy is relatively low for the dry season prediction.

(3) The NNBR model can better characterize the autocorrelation of runoff and shows little variation in numerical fluctuations, so it performs better than the other models in dry season prediction. It is therefore suggested that monthly runoff forecasting for the STX Reservoir should combine the two models in stages: the CEEMD-LSSVM model is used in the wet season and the NNBR model, which has relatively high accuracy in the dry season, is used in the dry season, so as to improve the prediction accuracy of inflow runoff.

The method in this study provided reliable runoff simulation results that can support decision-making and risk analysis in reservoir operation. In the future, the techniques can be applied to different reservoirs. In theory, because of the complexity of the hydrological system, it is necessary to analyze the physical meaning of each sub-series in order to achieve more accurate prediction. With the increase of global temperature and atmospheric humidity, there will be more and more extreme hydrometeorological events such as extreme precipitation and drought. Reservoirs' real-time operation will be affected by extreme runoff more frequently. Consequently, when the trajectory of future hydrological elements no longer follows historical data, the prediction accuracy of the model needs to be improved.

The uncertainty of runoff prediction error should be considered in future reservoir real-time operation, so as to improve the reservoir real-time operation potentiality. This will be the focus of future research.

## ACKNOWLEDGEMENTS

This research was supported by the National Key R&D Program of China (2017YFC0406004; 2018YFC0407303), the National Natural Science Foundation of China (51979038, 51569003, 51909033), the Heilongjiang Postdoctoral Fund (LBH-Z18020), and the Natural Science Foundation of Heilongjiang Province of China (E2015024; LH2019E010).

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.