Abstract
Monthly runoff forecasting is a key problem in water resources management. As a data-driven method, the least squares support vector machine (LSSVM) has been investigated in numerous runoff forecasting studies. However, its performance depends strongly on selecting appropriate parameters. In this study, we propose a hybrid model for monthly runoff forecasting, VMD-SSA-LSSVM for short, which combines variational mode decomposition (VMD) with an LSSVM whose parameters are optimized by the sparrow search algorithm (SSA). Firstly, VMD is utilized to decompose the original time series data into several subsequences. Secondly, LSSVM is employed to simulate each subsequence, for which the parameters are optimized by SSA. Finally, the simulated results for each subsequence are summed to give the final results. The validity of the proposed model was verified by forecasting monthly runoff for two reservoirs located in China. Four frequently used statistical indexes, namely the Nash efficiency coefficient, root mean squared error, correlation coefficient and mean absolute percentage error, were used to evaluate model performance. The results demonstrate that VMD-SSA-LSSVM outperforms the compared models on all four statistical indexes, indicating that it can enhance monthly runoff forecast accuracy.
HIGHLIGHTS
A hybrid model of VMD-SSA-LSSVM is proposed for monthly runoff forecasting.
VMD is utilized to decompose monthly runoff data into several subsequences.
LSSVM is employed to simulate each subsequence and the SSA is used to optimize the parameters of LSSVM.
The forecast results for each subsequence are aggregated as the final forecast results.
The results verify the superiority of the proposed model.
INTRODUCTION
Monthly runoff forecasting with acceptable accuracy plays a key role in water resources management (Xu & Yang 2019), including irrigation, water supply and hydroelectric power generation. However, under the influence of multiple factors, such as climate change and underlying surface variation, monthly runoff is characterized by nonlinearity, randomness and nonstationarity, which makes it difficult to simulate (Ji et al. 2020). Many models reported in the literature have attempted to simulate monthly runoff. Data-driven models, which are easy to implement and require neither large amounts of data nor an understanding of the underlying physical processes, form an important category (Ashrafi et al. 2017; Liao et al. 2020; Lin et al. 2021). With the development of AI techniques (Schmidhuber 2015), many machine learning methods have been successfully applied to monthly runoff forecasting (Zuo et al. 2020; Feng et al. 2021b; Roushangar et al. 2021). Hence, to further exploit the potential of existing machine learning methods in monthly runoff forecasting, it is necessary to study how to enhance model performance.
As a typical machine learning method, the support vector machine (SVM) is calibrated using the structural risk minimization principle to minimize an upper bound of the generalization error (Liu et al. 2014). Although SVM has been widely applied in hydrological forecasting, it can suffer from drawbacks such as low execution efficiency and high computation cost (Barman & Dev Choudhury 2020; Niu et al. 2021). To overcome these defects, the least squares support vector machine (LSSVM) method was proposed. Compared to SVM, LSSVM replaces the inequality constraints with equality constraints, so that training reduces to solving a set of linear equations rather than a quadratic programming problem, and can achieve better model performance. Consequently, many applications of LSSVM can be found in hydrological forecasting. However, the model parameters of LSSVM affect model performance to a great extent and are not easy to set manually (Ashrafi et al. 2017). So far, two main approaches, namely data decomposition (Tan et al. 2018) and optimization algorithms (Zheng et al. 2017), are usually employed to improve the performance of the LSSVM model. For instance, Zhao et al. (2017) applied an empirical mode decomposition-based chaotic LSSVM hybrid model to annual runoff forecasting and investigated it with runoff data from four sites. Guo et al. (2021) utilized an improved quantum-based grey wolf optimizer (GWO) algorithm to optimize the parameters of LSSVM for multi-step-ahead forecasting of reservoir water availability and verified the proposed model with two time series of reservoir water availability in the Zhoushan Islands, China. Niu et al. (2021) developed a hybrid hydrological forecasting method combining LSSVM, ensemble empirical mode decomposition and a gravitational search algorithm, and demonstrated its superior performance over several control models in monthly runoff forecasting for two hydrological stations in China.
Variational mode decomposition (VMD), proposed by Dragomiretskiy & Zosso (2014), is an entirely non-recursive tool that can concurrently extract several band-limited intrinsic modes from complex data signals. Via VMD, time series data with the characteristics of high complexity and nonlinearity can be decomposed into several subsequences with different frequency scales. In addition, VMD avoids the modal aliasing faced by traditional decomposition methods. Therefore, VMD has become an important decomposition method in many fields such as signal processing, wind speed prediction (Liu et al. 2018), fault diagnosis and time series prediction. An algorithm with strong optimization ability and good robustness should be selected before calibrating the parameters of a model of hydrological time series data. Inspired by the foraging and anti-predation behavior of sparrows, the sparrow search algorithm (SSA) was proposed to solve global optimization problems (Xue & Shen 2020). As a new swarm intelligence algorithm, SSA has the merits of strong optimization ability and fast convergence speed and has been utilized to optimize model parameters for time series prediction and in other fields. Many applications of SSA in the literature have verified that the SSA method outperforms several mature and commonly used evolutionary algorithms, such as GWO and particle swarm optimization (PSO), in terms of solution precision, search rate and local minimum avoidance (Feng et al. 2021a). Hence, the SSA method can be used to optimize the parameters of LSSVM to obtain higher accuracy in monthly runoff forecasting. To our knowledge, a VMD and SSA-based LSSVM has not previously been applied in hydrological forecasting.
In this study, a hybrid model of VMD and SSA-based LSSVM (VMD-SSA-LSSVM) was proposed for monthly runoff forecasting. The main aspects of this research are as follows: (1) VMD is employed to decompose monthly runoff data into several subsequences to decrease modeling difficulty. (2) For modeling each subsequence, SSA is used to optimize the LSSVM hyperparameters. (3) Compared to several models, the proposed model provides better forecast accuracy. The proposed model is verified with monthly runoff data from two reservoirs located in China.
The rest of the paper is organized as follows. The basic principles of the VMD, LSSVM and SSA methods are presented first, and the hybrid VMD-SSA-LSSVM model is briefly illustrated. The study area, the data and the development of the forecast models are then introduced, before the forecast results are presented and discussed. Finally, the conclusions are presented.
METHODOLOGY
Variational mode decomposition
VMD is an adaptive and completely non-recursive variational decomposition method. It adaptively determines the center frequency and bandwidth of each mode and separates the inherent modal components of a signal by solving a variational optimization problem. The main steps of VMD are as follows:
- (1) Obtain the analytic signal related to each modal function through the Hilbert transform and construct its frequency spectrum.
- (2) Mix each mode with an exponential tuned to its estimated center frequency, which shifts the spectrum of the mode function to the baseband.
- (3) Use Gaussian smoothing to estimate the bandwidth of the demodulated signal, that is, the squared L2-norm of the gradient. Hence, the constrained problem can be expressed as:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where $u_k$ is the k-th mode component; $\omega_k$ is the center frequency corresponding to the k-th mode; $\delta(t)$ is the Dirac distribution; * represents the convolution calculation and $f(t)$ denotes the t-th data of the input signal.
- (4) In order to solve the above constrained problem, an unconstrained problem is constructed and the alternate direction method of multipliers (ADMM) algorithm with iterative updates is used to obtain the optimal solution of the model. The augmented Lagrangian is as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where α is the quadratic penalty factor, λ is the Lagrangian multiplier and $\langle \cdot , \cdot \rangle$ denotes the inner product operation.
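For reference, the ADMM iteration for the augmented Lagrangian is usually carried out in the frequency domain. The updates below follow the standard VMD formulation of Dragomiretskiy & Zosso (2014) and are given here as a sketch; the hat notation (Fourier transforms of the k-th mode, the input signal $f$ and the multiplier λ), the iteration index $n$ and the dual step size $\tau$ are notational assumptions introduced for illustration rather than symbols taken from this paper:

```latex
\hat{u}_k^{\,n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{\,n}(\omega)/2}
       {1 + 2\alpha\,(\omega - \omega_k^{\,n})^2}

\omega_k^{\,n+1} =
  \frac{\int_0^{\infty} \omega\, \lvert \hat{u}_k^{\,n+1}(\omega) \rvert^2 \, d\omega}
       {\int_0^{\infty} \lvert \hat{u}_k^{\,n+1}(\omega) \rvert^2 \, d\omega}

\hat{\lambda}^{\,n+1}(\omega) =
  \hat{\lambda}^{\,n}(\omega) + \tau \Big( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{\,n+1}(\omega) \Big)
```

Each mode is thus a Wiener-type filtering of the current residual around its center frequency, and each center frequency is the center of gravity of the power spectrum of its mode.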














Least squares support vector machine method









Sparrow search algorithm
The sparrow search algorithm (SSA), which mimics the foraging and anti-predation behavior of sparrows, is characterized by strong robustness and global searching ability. There are two categories of captive house sparrows, namely producers and scroungers. The producers have high levels of energy reserves and are responsible for searching for food sources, while the scroungers follow the producers to search for food. Once the sparrows detect predators, they send alarm signals and move towards safe areas. Any sparrow can become a producer, but the ratio of the two categories remains constant. The main components of SSA are briefly described below.






If $R_2 < ST$, where $R_2$ is the alarm value and $ST$ is the safety threshold, it means there are no predators around and the foraging environment is safe. By contrast, if $R_2 \geq ST$, it means there are predators in the surroundings and all sparrows should fly to other safe areas.
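To make the roles of producers, scroungers and danger-aware sparrows concrete, the following is a simplified sketch of one possible SSA implementation, loosely following the update rules described by Xue & Shen (2020). The function name `sparrow_search`, the default settings (20% producers, 10% danger-aware sparrows, safety threshold 0.8) and the bound clipping are illustrative assumptions, not values or code reported in this study. The alarm value is `r2` and the safety threshold is `safety_threshold` in the code.

```python
import math
import random

def sparrow_search(objective, lower, upper, n_sparrows=30, n_iter=100,
                   producer_ratio=0.2, aware_ratio=0.1,
                   safety_threshold=0.8, seed=0):
    """Minimise `objective` within box bounds (simplified, hypothetical sketch)."""
    rng = random.Random(seed)
    dim = len(lower)
    n_prod = max(1, int(producer_ratio * n_sparrows))
    n_aware = max(1, int(aware_ratio * n_sparrows))

    def clip(p):
        return [min(max(v, lo), hi) for v, lo, hi in zip(p, lower, upper)]

    pos = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
           for _ in range(n_sparrows)]
    fit = [objective(p) for p in pos]
    best_fit = min(fit)
    best_pos = pos[fit.index(best_fit)][:]
    history = [best_fit]                      # best-so-far value per iteration

    for _ in range(n_iter):
        # Re-rank every iteration: any sparrow can become a producer,
        # but the producer/scrounger ratio stays fixed.
        order = sorted(range(n_sparrows), key=lambda i: fit[i])
        pos = [pos[i] for i in order]
        fit = [fit[i] for i in order]
        top_pos, top_fit = pos[0][:], fit[0]
        worst_pos, worst_fit = pos[-1][:], fit[-1]
        r2 = rng.random()                     # alarm value

        for i in range(n_prod):               # producers
            if r2 < safety_threshold:         # no predators: keep foraging
                shrink = math.exp(-(i + 1) / (rng.random() * n_iter + 1e-12))
                pos[i] = [v * shrink for v in pos[i]]
            else:                             # predators: random relocation
                q = rng.gauss(0.0, 1.0)
                pos[i] = [v + q for v in pos[i]]
        leader = pos[0][:]                    # simplification: updated top producer

        for i in range(n_prod, n_sparrows):   # scroungers
            if i + 1 > n_sparrows / 2:        # worst half flies elsewhere
                q = rng.gauss(0.0, 1.0)
                pos[i] = [q * math.exp((w - v) / (i + 1) ** 2)
                          for v, w in zip(pos[i], worst_pos)]
            else:                             # others forage around the leader
                step = sum(abs(v - l) * rng.choice((-1.0, 1.0))
                           for v, l in zip(pos[i], leader)) / dim
                pos[i] = [l + step for l in leader]

        for i in rng.sample(range(n_sparrows), n_aware):   # danger-aware sparrows
            if fit[i] > top_fit:              # at the edge: move towards the best
                beta = rng.gauss(0.0, 1.0)
                pos[i] = [b + beta * abs(v - b) for v, b in zip(pos[i], top_pos)]
            else:                             # in the centre: step away from the worst
                k = rng.uniform(-1.0, 1.0)    # symmetric, so the sign of denom is immaterial
                denom = (worst_fit - fit[i]) + 1e-12
                pos[i] = [v + k * abs(v - w) / denom
                          for v, w in zip(pos[i], worst_pos)]

        pos = [clip(p) for p in pos]
        fit = [objective(p) for p in pos]
        if min(fit) < best_fit:
            best_fit = min(fit)
            best_pos = pos[fit.index(best_fit)][:]
        history.append(best_fit)
    return best_pos, best_fit, history
```

In an SSA-LSSVM setting, `objective` would map a candidate pair of LSSVM parameters to a calibration error, and `lower` and `upper` would bound the search range of those parameters.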







Hybrid model for monthly runoff forecasting
In order to improve the accuracy of monthly runoff prediction, a hybrid model of VMD-SSA-LSSVM is proposed. The specific flow chart of the model is shown in Figure 1. The main steps of the model are as follows:
Step 1: Decomposition. Via VMD, the original monthly runoff data are decomposed into K relatively simple subsequences and each subsequence is normalized to [−1, 1]. The input variables of each subsequence are determined by using the partial autocorrelation function (PACF).
Step 2: Parameter calibration. Initialize the parameters of the SSA algorithm and utilize SSA to calibrate the parameters of the LSSVM for each subsequence.
Step 3: Aggregation. The forecast results for each subsequence are summed to give the final forecast results.
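The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the subsequences are assumed to be already available (two synthetic sinusoids stand in for VMD output), fixed values of `gamma` and `sigma` stand in for SSA-calibrated parameters, the normalization bounds are taken from the calibration part only, and all function names are hypothetical. The LSSVM is fitted through the standard linear system for LSSVM regression with an RBF kernel.

```python
import math

def lag_matrix(x, n_lags):
    """Rows are [x_{t-1}, ..., x_{t-n_lags}] with target x_t, for t >= n_lags."""
    rows = [[x[t - j] for j in range(1, n_lags + 1)] for t in range(n_lags, len(x))]
    return rows, list(x[n_lags:])

def rbf(a, b, sigma):
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, x_new, sigma):
    return b + sum(a * rbf(x_new, xt, sigma) for a, xt in zip(alpha, X_train))

def hybrid_forecast(components, lags, n_train, gamma=10.0, sigma=1.0):
    """One-step-ahead forecasts for t >= n_train, summed over all subsequences."""
    total = [0.0] * (len(components[0]) - n_train)
    for comp, p in zip(components, lags):
        lo, hi = min(comp[:n_train]), max(comp[:n_train])
        scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in comp]   # Step 1: map to [-1, 1]
        X, y = lag_matrix(scaled, p)
        split = n_train - p                                          # rows with t < n_train
        b, alpha = lssvm_fit(X[:split], y[:split], gamma, sigma)     # Step 2 (fixed parameters)
        for i, x_new in enumerate(X[split:]):
            pred = lssvm_predict(X[:split], b, alpha, x_new, sigma)
            total[i] += (pred + 1.0) * (hi - lo) / 2.0 + lo          # Step 3: aggregate
    return total

# Synthetic stand-ins for two decomposed subsequences (not VMD output).
n, n_train = 120, 96
comp1 = [100.0 + 50.0 * math.sin(2 * math.pi * t / 12) for t in range(n)]
comp2 = [10.0 * math.sin(2 * math.pi * t / 5) for t in range(n)]
observed = [a + b for a, b in zip(comp1, comp2)]
forecast = hybrid_forecast([comp1, comp2], lags=[3, 2], n_train=n_train)
rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed[n_train:]))
                 / len(forecast))
print(f"validation RMSE on the synthetic series: {rmse:.3f}")
```

Replacing the fixed `gamma` and `sigma` with values found by an optimizer for each subsequence gives the calibration step of the hybrid model.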
Evaluation index
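The four indexes named in the abstract can be computed as in the sketch below. It uses the textbook definitions of the root mean squared error (RMSE), mean absolute percentage error (MAPE), correlation coefficient (R) and Nash efficiency coefficient (CE); these are assumed to match the formulas used in this study, and the function name `evaluation_indexes` is hypothetical.

```python
import math

def evaluation_indexes(obs, sim):
    """Return RMSE, MAPE (%), R and CE for observed and simulated series."""
    n = len(obs)
    err = [s - o for o, s in zip(obs, sim)]
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    sse = sum(e * e for e in err)                        # sum of squared errors
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(err, obs)) / n,
        "R": cov / math.sqrt(var_o * var_s),
        "CE": 1.0 - sse / var_o,                         # 1 means a perfect forecast
    }

scores = evaluation_indexes([100.0, 200.0, 300.0, 400.0],
                            [110.0, 190.0, 310.0, 390.0])
print({name: round(value, 4) for name, value in scores.items()})
```

Smaller RMSE and MAPE and larger R and CE indicate better performance. Note that MAPE is undefined when an observed value is zero.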




CASE STUDY
Study area and data
The Xinfengjiang and Guangzhao reservoirs, located in southern and southwestern China, respectively, were selected as study areas, as shown in Figure 2. The Xinfengjiang Reservoir is located on the Xinfeng River, a tributary of the Dong River in the Pearl River system. For the Xinfengjiang Reservoir, the catchment area is 5,740 km² and the average annual runoff is 192 m³/s. With a total storage capacity of 13.896 billion (13.896 × 10⁹) m³ and an installed capacity of 315 MW, the Xinfengjiang Reservoir is a large-scale multipurpose water conservancy project that mainly focuses on power generation, as well as flood control, water supply and shipping. Benefitting from carryover storage, the average annual power generation of the Xinfengjiang Reservoir can reach 1.172 billion kWh. The Guangzhao Reservoir is located in the middle section of the Beipan River, a tributary of the Xi River in the Pearl River system. For the Guangzhao Reservoir, the drainage area is 13,548 km² and the average annual runoff is 257 m³/s. With a total storage capacity of 3.245 billion m³ and an installed capacity of 1,040 MW, the Guangzhao Reservoir is also a large-scale multipurpose water conservancy project that mainly focuses on power generation, as well as water supply, navigation and irrigation. The average annual power generation of the Guangzhao Reservoir is 2.754 billion kWh.
In this study, the data obtained from the Xinfengjiang Reservoir cover 1943 to 2015, of which the data from 1943 to 2002 were used for calibration and the rest were used for validation. Similarly, the data obtained from the Guangzhao Reservoir cover 1956 to 2017, of which the data from 1956 to 2004 were used for calibration and the rest were used for validation.
Data decomposition
Although VMD can decompose time series data effectively, the appropriate number of modes, K, needs to be predetermined. When K is too large, the center frequencies of adjacent modal components will be close to each other, resulting in frequency aliasing. When K is too small, some information in the original signal will be missed, which affects the prediction accuracy. To reduce the complexity of monthly runoff data and extract features more efficiently, the center frequency method reported in the literature (Xie et al. 2019) is utilized to determine K. The center frequencies with different K values for the Xinfengjiang and Guangzhao Reservoirs are presented in Tables 1 and 2, respectively. From Table 1, it can be found that when K = 8, the center frequency of the last mode (451.1441) is already relatively large. Although it is slightly larger when K = 9 or K = 10, the differences between the center frequencies of adjacent modes become smaller and smaller, and modal mixing can occur easily. Hence, K = 8 is selected as the number of modes for the decomposition of monthly runoff data for the Xinfengjiang Reservoir. Similarly, for the Guangzhao Reservoir, K = 7 is selected as the number of modes. The decomposition results for the Xinfengjiang and Guangzhao Reservoirs are shown in Figures 3 and 4, respectively. The clear differences among the subsequences indicate that the complex information contained in the original monthly runoff data can be effectively extracted.
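The statement that adjacent center frequencies move closer together as K grows can be checked directly from the Table 1 values. The sketch below computes the smallest gap between adjacent center frequencies for K = 7 to 10; this diagnostic is an illustrative assumption and not the selection rule of Xie et al. (2019), and the helper name `min_adjacent_gap` is hypothetical.

```python
# Center frequencies for the Xinfengjiang Reservoir, copied from Table 1.
center_freqs = {
    7: [0.1933, 82.8716, 164.7413, 234.6214, 287.1515, 364.0949, 421.5359],
    8: [0.1848, 82.3207, 120.9071, 168.7265, 250.2942, 328.5215, 374.0438,
        451.1441],
    9: [0.1827, 82.0981, 116.9117, 166.9419, 242.6613, 285.4839, 333.1360,
        377.3509, 453.3533],
    10: [0.1204, 49.7305, 83.6972, 164.3951, 202.8153, 249.8869, 290.8583,
         362.6170, 415.4588, 460.7523],
}

def min_adjacent_gap(freqs):
    """Smallest difference between neighbouring center frequencies."""
    ordered = sorted(freqs)
    return min(b - a for a, b in zip(ordered, ordered[1:]))

gaps = {k: min_adjacent_gap(f) for k, f in center_freqs.items()}
for k, gap in gaps.items():
    print(f"K = {k}: last center frequency = {max(center_freqs[k]):.4f}, "
          f"smallest adjacent gap = {gap:.4f}")
```

For these values the smallest gap shrinks steadily from K = 7 to K = 10, while the last center frequency changes little beyond K = 8, which is the trade-off described in the text.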
Table 1. Center frequencies with different K values for the Xinfengjiang Reservoir
Modal number | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | IMF9 | IMF10 |
---|---|---|---|---|---|---|---|---|---|---|
K = 1 | 0.2825 | | | | | | | | | |
K = 2 | 0.3241 | 247.8243 | | | | | | | | |
K = 3 | 0.3033 | 162.5706 | 332.1366 | | | | | | | |
K = 4 | 0.2048 | 85.1859 | 248.6603 | 367.8956 | | | | | | |
K = 5 | 0.1964 | 83.2899 | 168.3411 | 268.5195 | 415.3279 | | | | | |
K = 6 | 0.1935 | 83.0461 | 165.9784 | 250.3814 | 333.4556 | 418.3235 | | | | |
K = 7 | 0.1933 | 82.8716 | 164.7413 | 234.6214 | 287.1515 | 364.0949 | 421.5359 | | | |
K = 8 | 0.1848 | 82.3207 | 120.9071 | 168.7265 | 250.2942 | 328.5215 | 374.0438 | 451.1441 | | |
K = 9 | 0.1827 | 82.0981 | 116.9117 | 166.9419 | 242.6613 | 285.4839 | 333.1360 | 377.3509 | 453.3533 | |
K = 10 | 0.1204 | 49.7305 | 83.6972 | 164.3951 | 202.8153 | 249.8869 | 290.8583 | 362.6170 | 415.4588 | 460.7523 |
Table 2. Center frequencies with different K values for the Guangzhao Reservoir
Modal number | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | IMF9 | IMF10 |
---|---|---|---|---|---|---|---|---|---|---|
K = 1 | 0.1984 | | | | | | | | | |
K = 2 | 0.2212 | 195.4229 | | | | | | | | |
K = 3 | 0.2145 | 161.5596 | 324.8544 | | | | | | | |
K = 4 | 0.1202 | 83.7359 | 217.3362 | 339.0131 | | | | | | |
K = 5 | 0.1167 | 83.3431 | 165.7438 | 249.7721 | 365.5804 | | | | | |
K = 6 | 0.1164 | 83.2808 | 165.4430 | 245.2936 | 330.8657 | 445.2398 | | | | |
K = 7 | 0.1159 | 83.1669 | 164.9476 | 241.7157 | 313.6208 | 372.7494 | 456.1387 | | | |
K = 8 | 0.1121 | 82.8604 | 128.6639 | 167.0464 | 245.3449 | 322.3837 | 376.5449 | 456.8794 | | |
K = 9 | 0.0792 | 83.1053 | 55.9774 | 165.5323 | 240.1955 | 282.5769 | 329.6499 | 379.4012 | 458.2014 | |
K = 10 | 0.1035 | 83.0164 | 83.3154 | 165.3954 | 216.5814 | 248.1914 | 291.5010 | 332.8500 | 381.0694 | 458.6907 |
Figure 3. Decomposed results of monthly runoff data for the Xinfengjiang Reservoir.
Figure 4. Decomposed results of monthly runoff data for the Guangzhao Reservoir.
Input determination
Choosing appropriate input variables for the proposed model is vital for obtaining reliable forecast accuracy. The partial autocorrelation function (PACF) is usually utilized to determine the input variables. Generally, once all subsequent PACF values fall within the confidence interval, the preceding lagged values are selected as the inputs. The PACF values of the original data and decomposed subsequences of the Xinfengjiang and Guangzhao Reservoirs are depicted in Figures 5 and 6, respectively. Based on Figures 5 and 6, the input variables selected for the two reservoirs are shown in Tables 3 and 4, respectively. It can be seen that the numbers of inputs for the original data and the decomposed subsequences are not all the same, which illustrates the complex and changeable characteristics of monthly runoff data.
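As an illustration of PACF-based input selection, the sketch below estimates the PACF with the Durbin-Levinson recursion and takes as the number of inputs the largest lag whose PACF lies outside the approximate 95% confidence band ±1.96/√N. This is one common reading of the rule described above and is an assumption, as are the function names `pacf` and `select_n_inputs` and the maximum lag of 12. A synthetic AR(1) series stands in for a runoff subsequence.

```python
import math
import random

def pacf(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    r = [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]
    values = [1.0]
    phi_prev = []                         # phi_prev[j-1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1]
                    for j in range(1, k)] + [phi_kk]
        values.append(phi_kk)
    return values

def select_n_inputs(x, max_lag=12):
    """Largest lag whose PACF lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(x))
    values = pacf(x, max_lag)
    significant = [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]
    return (max(significant) if significant else 1), values, bound

# Synthetic AR(1) series x_t = 0.7 x_{t-1} + e_t as a stand-in for a subsequence.
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

n_inputs, pacf_values, bound = select_n_inputs(series)
print(f"confidence bound = {bound:.4f}, selected number of inputs = {n_inputs}")
```

With `n_inputs` equal to p, the inputs for forecasting x_t are x_{t-1}, ..., x_{t-p}, which is the form used in Tables 3 and 4.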
Table 3. The selected input variables of each series for the Xinfengjiang Reservoir
No. | Series | Input variables | Number of inputs |
---|---|---|---|
1 | Original | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8} | 8 |
2 | IMF1 | x_{t-1}, x_{t-2}, x_{t-3} | 3 |
3 | IMF2 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4} | 4 |
4 | IMF3 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4} | 4 |
5 | IMF4 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6} | 6 |
6 | IMF5 | x_{t-1}, x_{t-2} | 2 |
7 | IMF6 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5} | 5 |
8 | IMF7 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7} | 7 |
9 | IMF8 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5} | 5 |
Table 4. The selected input variables of each series for the Guangzhao Reservoir
No. | Series | Input variables | Number of inputs |
---|---|---|---|
1 | Original | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8} | 8 |
2 | IMF1 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5} | 5 |
3 | IMF2 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6} | 6 |
4 | IMF3 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7} | 7 |
5 | IMF4 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7} | 7 |
6 | IMF5 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8}, x_{t-9} | 9 |
7 | IMF6 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8}, x_{t-9} | 9 |
8 | IMF7 | x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4} | 4 |
Model development
To confirm the rationality and feasibility of the proposed model, three models were used for comparison, namely PSO-SVM, SSA-SVM and SSA-LSSVM. To make the comparison more robust, each model was run 20 times independently and the average of the results was taken as the final forecast. For the PSO-SVM, SSA-SVM and SSA-LSSVM models, the raw monthly runoff data obtained from the two reservoirs were used for model development, and the input variables were set based on the PACF values of the original series. The three parameters of PSO-SVM and SSA-SVM, i.e. the penalty factor, the RBF kernel function parameter and the insensitive loss coefficient, were optimized by PSO and SSA, respectively. For the SSA-LSSVM model, the two parameters, i.e. the penalty parameter and the RBF kernel function parameter, were optimized by SSA. The VMD-SSA-LSSVM model involves three steps. Firstly, the raw monthly runoff data were decomposed into subsequences via VMD. Secondly, the SSA-LSSVM model was employed to simulate each subsequence, with the input variables for each subsequence determined by PACF. Finally, the results for each subsequence were summed to give the final results.
RESULTS AND DISCUSSION
Forecast results
According to the methods mentioned above, the raw monthly runoff data and the subsequences generated by VMD were simulated. The evaluation indexes of the different models over the calibration and validation periods for the Xinfengjiang and Guangzhao Reservoirs are presented in Tables 5 and 6, respectively. Table 5 shows that the VMD-SSA-LSSVM model performs best among the four models in terms of all four evaluation indexes during both the calibration and validation periods. During the validation period for the Xinfengjiang Reservoir, the SSA-SVM model offers slightly better forecast accuracy than the PSO-SVM model, with reductions of 0.63% and 0.46% in RMSE and MAPE, respectively, and enhancements of 1.04% and 2.22% in R and CE, respectively. In comparison with the SSA-SVM model, the SSA-LSSVM model provides better forecast accuracy, with a reduction of 2.19% in RMSE and enhancements of 4.04% and 7.34% in R and CE, respectively. Compared to the SSA-LSSVM model, the VMD-SSA-LSSVM model yields much better forecast accuracy, with reductions of 60.97% and 52.67% in RMSE and MAPE, respectively, and increments of 49.53% and 128.14% in R and CE, respectively. Similarly, as revealed in Table 6, the VMD-SSA-LSSVM model performs much better than the three benchmark models in terms of all four evaluation indexes over the calibration and validation periods for the Guangzhao Reservoir. For instance, compared to the SSA-LSSVM model during the validation period, the VMD-SSA-LSSVM model yields reductions of 54.66% and 56.40% in RMSE and MAPE, respectively, and increments of 24.61% and 69.92% in R and CE, respectively.
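The percentage reductions and increments quoted above are relative changes with respect to the benchmark model. The short sketch below reproduces them from the validation-period values in Table 5 for the Xinfengjiang Reservoir; the helper name `percent_change` is hypothetical. Because the inputs are the rounded table values, results can differ slightly in the second decimal from figures computed on unrounded data.

```python
# Validation-period indexes for the Xinfengjiang Reservoir, copied from Table 5.
validation = {
    "PSO-SVM":       {"RMSE": 164.528, "MAPE": 63.769, "R": 0.610, "CE": 0.363},
    "SSA-SVM":       {"RMSE": 163.484, "MAPE": 63.478, "R": 0.617, "CE": 0.371},
    "SSA-LSSVM":     {"RMSE": 159.910, "MAPE": 84.879, "R": 0.642, "CE": 0.398},
    "VMD-SSA-LSSVM": {"RMSE": 62.417, "MAPE": 40.172, "R": 0.960, "CE": 0.908},
}

def percent_change(model, baseline, index):
    """Relative change (%) of `model` with respect to `baseline` for one index."""
    old = validation[baseline][index]
    new = validation[model][index]
    return 100.0 * (new - old) / old

for index in ("RMSE", "MAPE", "R", "CE"):
    change = percent_change("VMD-SSA-LSSVM", "SSA-LSSVM", index)
    print(f"{index}: {change:+.2f}% relative to SSA-LSSVM")
```

Negative values for RMSE and MAPE are reductions in error, and positive values for R and CE are improvements in fit.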
Table 5. Comparison of evaluation indexes of the different models for the Xinfengjiang Reservoir
Model | Calibration RMSE | Calibration MAPE (%) | Calibration R | Calibration CE | Validation RMSE | Validation MAPE (%) | Validation R | Validation CE |
---|---|---|---|---|---|---|---|---|
PSO-SVM | 148.527 | 57.330 | 0.685 | 0.448 | 164.528 | 63.769 | 0.610 | 0.363 |
SSA-SVM | 147.753 | 56.601 | 0.689 | 0.454 | 163.484 | 63.478 | 0.617 | 0.371 |
SSA-LSSVM | 139.983 | 65.320 | 0.717 | 0.510 | 159.910 | 84.879 | 0.642 | 0.398 |
VMD-SSA-LSSVM | 58.522 | 38.735 | 0.961 | 0.914 | 62.417 | 40.172 | 0.960 | 0.908 |
Table 6. Comparison of evaluation indexes of the different models for the Guangzhao Reservoir
Model | Calibration RMSE | Calibration MAPE (%) | Calibration R | Calibration CE | Validation RMSE | Validation MAPE (%) | Validation R | Validation CE |
---|---|---|---|---|---|---|---|---|
PSO-SVM | 138.593 | 52.319 | 0.823 | 0.675 | 125.056 | 65.031 | 0.748 | 0.517 |
SSA-SVM | 133.098 | 48.412 | 0.839 | 0.700 | 123.478 | 61.896 | 0.762 | 0.529 |
SSA-LSSVM | 126.018 | 38.214 | 0.855 | 0.731 | 123.161 | 59.317 | 0.768 | 0.532 |
VMD-SSA-LSSVM | 54.491 | 17.668 | 0.976 | 0.950 | 55.844 | 25.861 | 0.957 | 0.904 |
To compare the forecast results visually, the hydrographs of the observed data and the simulated results of the four models during the validation period are depicted in Figures 7 and 8. As shown in Figures 7 and 8, the hydrographs of the two reservoirs vary drastically with time, which indicates the complexity of the monthly runoff data and the difficulty of simulating them. Adopting a model without decomposition, such as PSO-SVM, SSA-SVM or SSA-LSSVM, may lead to unsatisfactory forecast accuracy, while adopting a decomposition-based hybrid model such as VMD-SSA-LSSVM markedly improves the fit.
Figure 7. Comparison of the forecast results for the Xinfengjiang Reservoir during the validation period.
Figure 8. Comparison of the forecast results for the Guangzhao Reservoir during the validation period.
To further examine the goodness of fit of the models, scatter diagrams of the four models for the Xinfengjiang and Guangzhao Reservoirs during the validation period are drawn in Figures 9 and 10, respectively. Generally, the closer the points are to the 45-degree line, the better the model performance. As shown in Figures 9 and 10, the points of the VMD-SSA-LSSVM model are more concentrated around the 45-degree line than those of the other three models, which indicates that the VMD-SSA-LSSVM model is more reliable and feasible.
Figure 9. Comparison of scatter diagrams of forecast results for the Xinfengjiang Reservoir during the validation period.
Figure 10. Comparison of scatter diagrams of forecast results for the Guangzhao Reservoir during the validation period.
To evaluate the performance of the VMD-SSA-LSSVM model in peak flow forecasting, the peak flow estimates of the four models over the validation period for the Xinfengjiang and Guangzhao Reservoirs were analyzed statistically. As shown in Table 7, the mean absolute relative errors of PSO-SVM, SSA-SVM, SSA-LSSVM and VMD-SSA-LSSVM in forecasting the 15 peak flows of the Xinfengjiang Reservoir are 48.91%, 48.10%, 43.81% and 17.74%, respectively. Similarly, as shown in Table 8, the mean absolute relative errors of PSO-SVM, SSA-SVM, SSA-LSSVM and VMD-SSA-LSSVM in forecasting the 13 peak flows of the Guangzhao Reservoir are 25.56%, 25.30%, 25.24% and 12.83%, respectively. The VMD-SSA-LSSVM model therefore yields much better peak flow forecasts than PSO-SVM, SSA-SVM and SSA-LSSVM.
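The relative errors and their absolute average can be reproduced from the peak values themselves. The sketch below does this for the VMD-SSA-LSSVM column of Table 7, assuming the relative error is defined as (forecast − observed) / observed × 100, which is consistent with the tabulated values; the function names are hypothetical.

```python
# Observed peaks and VMD-SSA-LSSVM forecasts for the Xinfengjiang Reservoir,
# copied from Table 7.
observed = [618.00, 353.00, 336.00, 203.00, 1496.00, 783.82, 687.50, 1066.00,
            228.20, 867.50, 369.60, 442.30, 860.90, 616.20, 493.63]
vmd_ssa_lssvm = [463.25, 319.31, 327.34, 300.13, 1138.34, 694.35, 681.91, 850.30,
                 340.36, 709.82, 445.00, 452.05, 709.38, 572.00, 444.38]

def relative_errors(obs, sim):
    """Signed relative error (%) of each forecast peak."""
    return [100.0 * (s - o) / o for o, s in zip(obs, sim)]

def mean_absolute_relative_error(obs, sim):
    errors = relative_errors(obs, sim)
    return sum(abs(e) for e in errors) / len(errors)

errors = relative_errors(observed, vmd_ssa_lssvm)
mare = mean_absolute_relative_error(observed, vmd_ssa_lssvm)
print([round(e, 2) for e in errors])
print(f"mean absolute relative error = {mare:.2f}%")
```

Negative errors indicate underestimated peaks. Most of the 15 peaks are underestimated by VMD-SSA-LSSVM, although by much less than by the three benchmark models.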
Table 7. Peak flow estimates of different models for the Xinfengjiang Reservoir during the validation period
Peak No. | Original | PSO-SVM | SSA-SVM | SSA-LSSVM | VMD-SSA-LSSVM | Relative error, PSO-SVM (%) | Relative error, SSA-SVM (%) | Relative error, SSA-LSSVM (%) | Relative error, VMD-SSA-LSSVM (%) |
---|---|---|---|---|---|---|---|---|---|
1 | 618.00 | 333.59 | 332.19 | 400.46 | 463.25 | −46.02 | −46.25 | −35.20 | −25.04 |
2 | 353.00 | 314.28 | 326.47 | 396.77 | 319.31 | −10.97 | −7.52 | 12.40 | −9.54 |
3 | 336.00 | 227.57 | 230.92 | 341.28 | 327.34 | −32.27 | −31.28 | 1.57 | −2.58 |
4 | 203.00 | 275.99 | 284.80 | 337.67 | 300.13 | 35.96 | 40.29 | 66.34 | 47.85 |
5 | 1,496.00 | 343.33 | 356.83 | 432.71 | 1,138.34 | −77.05 | −76.15 | −71.08 | −23.91 |
6 | 783.82 | 402.58 | 416.11 | 434.02 | 694.35 | −48.64 | −46.91 | −44.63 | −11.41 |
7 | 687.50 | 226.27 | 231.49 | 334.10 | 681.91 | −67.09 | −66.33 | −51.40 | −0.81 |
8 | 1,066.00 | 242.20 | 248.53 | 332.90 | 850.30 | −77.28 | −76.69 | −68.77 | −20.23 |
9 | 228.20 | 218.46 | 224.87 | 292.54 | 340.36 | −4.27 | −1.46 | 28.19 | 49.15 |
10 | 867.50 | 366.19 | 375.70 | 446.02 | 709.82 | −57.79 | −56.69 | −48.59 | −18.18 |
11 | 369.60 | 124.79 | 129.25 | 158.42 | 445.00 | −66.24 | −65.03 | −57.14 | 20.40 |
12 | 442.30 | 387.53 | 389.48 | 450.46 | 452.05 | −12.38 | −11.94 | 1.84 | 2.20 |
13 | 860.90 | 302.93 | 313.74 | 390.72 | 709.38 | −64.81 | −63.56 | −54.61 | −17.60 |
14 | 616.20 | 233.46 | 240.68 | 312.03 | 572.00 | −62.11 | −60.94 | −49.36 | −7.17 |
15 | 493.63 | 144.46 | 145.40 | 167.96 | 444.38 | −70.74 | −70.55 | −65.98 | −9.98 |
Average (absolute) | | | | | | 48.91 | 48.10 | 43.81 | 17.74 |
Peak flow estimates of different models for the Guangzhao Reservoir during the validation period
Peak No. | Original | PSO-SVM | SSA-SVM | SSA-LSSVM | VMD-SSA-LSSVM | Relative error (%): PSO-SVM | Relative error (%): SSA-SVM | Relative error (%): SSA-LSSVM | Relative error (%): VMD-SSA-LSSVM |
---|---|---|---|---|---|---|---|---|---|
1 | 580.18 | 361.77 | 374.67 | 389.52 | 530.95 | −37.65 | −35.42 | −32.86 | −8.48 |
2 | 580.51 | 280.45 | 282.91 | 284.68 | 519.43 | −51.69 | −51.27 | −50.96 | −10.52 |
3 | 656.24 | 587.87 | 633.46 | 638.55 | 632.34 | −10.42 | −3.47 | −2.70 | −3.64 |
4 | 526.04 | 530.39 | 581.37 | 610.27 | 492.89 | 0.83 | 10.52 | 16.01 | −6.30 |
5 | 500.19 | 411.55 | 451.98 | 512.38 | 504.09 | −17.72 | −9.64 | 2.44 | 0.78 |
6 | 552.69 | 540.49 | 570.02 | 587.45 | 488.44 | −2.21 | 3.14 | 6.29 | −11.62 |
7 | 320.45 | 356.60 | 382.99 | 406.07 | 357.82 | 11.28 | 19.52 | 26.72 | 11.66 |
8 | 534.98 | 546.28 | 564.83 | 562.16 | 440.09 | 2.11 | 5.58 | 5.08 | −17.74 |
9 | 219.93 | 329.72 | 340.01 | 347.34 | 308.74 | 49.92 | 54.60 | 57.93 | 40.38 |
10 | 614.08 | 484.71 | 516.41 | 538.46 | 517.45 | −21.07 | −15.90 | −12.31 | −15.74 |
11 | 547.96 | 400.82 | 418.45 | 413.50 | 495.05 | −26.85 | −23.64 | −24.54 | −9.66 |
12 | 423.92 | 257.91 | 261.23 | 271.37 | 472.72 | −39.16 | −38.38 | −35.99 | 11.51 |
13 | 1,199.60 | 463.21 | 506.16 | 548.65 | 975.34 | −61.39 | −57.81 | −54.26 | −18.69 |
Average (absolute) | | | | | | 25.56 | 25.30 | 25.24 | 12.83 |
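The relative errors in the peak flow table appear to follow the usual signed definition, (forecast − original) / original × 100, with the last row averaging the absolute values. The sketch below recomputes the VMD-SSA-LSSVM column for the Guangzhao Reservoir under that assumption; the function names are illustrative and not taken from the paper. The recomputed values agree with the tabulated ones to within rounding of the published forecasts.

```python
def relative_error(observed, forecast):
    """Signed relative error in percent; negative means the peak was underestimated."""
    return (forecast - observed) / observed * 100.0


def mean_absolute_relative_error(observed, forecasts):
    """Average of the absolute relative errors over all peaks, in percent."""
    errors = [abs(relative_error(o, f)) for o, f in zip(observed, forecasts)]
    return sum(errors) / len(errors)


# Original peaks and VMD-SSA-LSSVM forecasts, copied from the Guangzhao table
observed = [580.18, 580.51, 656.24, 526.04, 500.19, 552.69, 320.45,
            534.98, 219.93, 614.08, 547.96, 423.92, 1199.60]
vmd_ssa_lssvm = [530.95, 519.43, 632.34, 492.89, 504.09, 488.44, 357.82,
                 440.09, 308.74, 517.45, 495.05, 472.72, 975.34]

for peak, (o, f) in enumerate(zip(observed, vmd_ssa_lssvm), start=1):
    print(f"peak {peak:2d}: {relative_error(o, f):7.2f} %")

print(f"average (absolute): {mean_absolute_relative_error(observed, vmd_ssa_lssvm):.2f} %")
```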
DISCUSSION
The forecasts yielded by PSO-SVM and SSA-SVM differ only slightly in terms of the four statistical indexes, but the difference indicates that the choice of optimization algorithm for parameter calibration matters. Standard SVM, which is based on the structural risk minimization principle, can achieve good generalization performance. However, its performance usually depends on the optimization algorithm used to calibrate its parameters. Although PSO has been successfully used to solve optimization problems, it is prone to premature convergence (Shi et al. 2005). As a recently proposed optimization algorithm, SSA has a strong global search ability and can effectively avoid becoming trapped in local optima. Hence, compared with PSO, SSA affords slightly better optimization ability. The finding that SSA-LSSVM performs slightly better than SSA-SVM can be ascribed to the fact that LSSVM originated from SVM and can be seen as an improvement on it. Compared with SVM, LSSVM is calibrated by solving a set of linear equations instead of a quadratic optimization problem, leading to better generalization ability and execution efficiency. Affected by several factors, such as climate change, runoff usually contains multifrequency components (Niu et al. 2019). Hence, it is hard for a single model to simulate runoff accurately, because only one resolution component is utilized and the underlying multi-scale phenomena cannot be extracted. In the VMD-SSA-LSSVM model, VMD is used to separate the multifrequency components and thereby reduce the modelling difficulty. Consequently, the VMD-SSA-LSSVM model outperformed the standalone SSA-LSSVM model.
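To illustrate the point about linear equations, the following minimal sketch fits an LSSVM regressor in its standard textbook form, where the bias b and the dual weights α are obtained from a single linear system built from the kernel matrix K and the penalty parameter γ. It is not the authors' implementation: the function names, the synthetic sine data and the hand-picked values of `gamma` and `sigma` are assumptions made for illustration, whereas in the paper these two parameters are calibrated by SSA.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(x, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T for b and alpha."""
    n = len(y)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(x, x, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(system, rhs)  # one linear solve, no quadratic programme
    return solution[0], solution[1:]


def lssvm_predict(x_train, x_new, bias, alpha, sigma):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(x_new, x_train, sigma) @ alpha + bias


# Synthetic data for illustration only (not runoff data)
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y_train = np.sin(x_train[:, 0]) + 0.05 * rng.standard_normal(40)

gamma, sigma = 100.0, 0.5  # assumed values; the paper tunes these with SSA
bias, alpha = lssvm_fit(x_train, y_train, gamma, sigma)
y_fit = lssvm_predict(x_train, x_train, bias, alpha, sigma)
print("training RMSE:", np.sqrt(np.mean((y_fit - y_train) ** 2)))
```

Because every training point receives a non-zero weight α, the solution is not sparse, which is the usual price paid for replacing the quadratic programme with a linear solve.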
The superiority of VMD-SSA-LSSVM over the other three models can probably be ascribed to two factors: VMD decomposition and SSA-based parameter calibration for LSSVM. VMD decomposes the raw monthly runoff data into several subsequences and unravels the underlying multi-scale phenomena. Each subsequence is then simulated by SSA-LSSVM, which can capture its dynamic changes and reduces the modelling difficulty.
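The decompose, simulate and aggregate structure described above can be sketched as follows. This is only a structural illustration with loudly labeled stand-ins: a moving-average split replaces VMD, a linear autoregression fitted by least squares replaces SSA-LSSVM, the series is synthetic, and the models are fitted and evaluated in-sample for brevity. All names and settings are assumptions, not the authors' code.

```python
import numpy as np


def decompose(series, window=12):
    """Stand-in for VMD: a moving-average trend plus the remainder."""
    kernel = np.ones(window) / window
    trend = np.convolve(series, kernel, mode="same")
    return [trend, series - trend]  # the components sum back to the original series


def lagged_matrix(series, lags):
    """Rows [x[t-lags], ..., x[t-1]] paired with the target x[t]."""
    rows = [series[t - lags:t] for t in range(lags, len(series))]
    return np.array(rows), series[lags:]


def fit_ar(series, lags):
    """Stand-in for SSA-LSSVM: linear autoregression with intercept."""
    inputs, targets = lagged_matrix(series, lags)
    design = np.column_stack([inputs, np.ones(len(inputs))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict_ar(series, coef, lags):
    """One-step-ahead predictions for every month that has a full lag window."""
    inputs, _ = lagged_matrix(series, lags)
    return np.column_stack([inputs, np.ones(len(inputs))]) @ coef


# Synthetic monthly series for illustration only (not reservoir data)
rng = np.random.default_rng(1)
months = np.arange(240)
runoff = 300 + 150 * np.sin(2 * np.pi * months / 12) + 30 * rng.standard_normal(240)

lags = 12
components = decompose(runoff)                      # step 1: decompose
component_forecasts = [predict_ar(c, fit_ar(c, lags), lags)
                       for c in components]         # step 2: one model per subsequence
final_forecast = np.sum(component_forecasts, axis=0)  # step 3: sum the forecasts

rmse = np.sqrt(np.mean((final_forecast - runoff[lags:]) ** 2))
print("in-sample RMSE of the aggregated forecast:", rmse)
```

In a real study the decomposition and the per-subsequence models would be calibrated on a training period and assessed on a separate validation period, as is done in the paper.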
Although the reliability and feasibility of VMD-SSA-LSSVM have been confirmed in this study, further research is needed. First, newer decomposition algorithms should be explored to improve the quality of the subsequences. Second, the standard SSA utilized in this study should be modified to enhance the quality of parameter calibration for the models. Third, more machine learning techniques should be tested to enhance the forecast accuracy of the single model.
CONCLUSION
In this study, a hybrid model, VMD-SSA-LSSVM, has been proposed for monthly runoff forecasting. Firstly, the raw monthly runoff data were decomposed into several subsequences via VMD. Secondly, each subsequence was simulated by the SSA-LSSVM model, of which two parameters, the penalty parameter and the RBF kernel parameter, were optimized by SSA. Finally, the outputs of the SSA-LSSVM model for all subsequences were summed to give the final forecast results. Monthly runoff data obtained from two reservoirs located in China (the Xinfengjiang and Guangzhao Reservoirs) were utilized to verify the VMD-SSA-LSSVM model. To assess the performance of the proposed model, four frequently used evaluation indexes were utilized and three models, namely PSO-SVM, SSA-SVM and SSA-LSSVM, were employed for comparison. The results showed that the proposed model was superior to the three compared models in respect of all four evaluation indexes. Therefore, the proposed model has been shown to be reliable, feasible and promising for enhancing the accuracy of monthly runoff forecasts.
ACKNOWLEDGEMENTS
This paper was supported by the National Natural Science Foundation of China (51709109).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.