Monthly runoff forecasting has always been a key problem in water resources management. As a data-driven method, the least square support vector machine (LSSVM) method has been investigated by numerous studies in runoff forecasting. However, selecting appropriate parameters for LSSVM is the key to obtaining satisfactory model performance. In this study, we propose a hybrid model for monthly runoff forecasting, VMD-SSA-LSSVM for short, which combines variational mode decomposition (VMD) with LSSVM and the parameters of LSSVM are optimized by a sparrow search algorithm (SSA). Firstly, VMD is utilized to decompose the original time series data into several subsequences. Secondly, LSSVM is employed to simulate each subsequence, for which the parameters are optimized by SSA. Finally, the simulated results for each subsequence are accumulated as the final results. The validity of the proposed model was verified by forecasting monthly runoff for two reservoirs located in China. Four frequently-used statistical indexes, namely the Nash efficiency coefficient, root mean squared error, correlation coefficient and mean absolute percentage error were used to evaluate model performance. The results demonstrate the superiority of VMD-SSA-LSSVM over the compared models in terms of all statistical indexes, indicating that it is beneficial for enhancing monthly runoff forecast accuracy.

  • A hybrid model of VMD-SSA-LSSVM is proposed for monthly runoff forecasting.

  • VMD is utilized to decompose monthly runoff data into several subsequences.

  • LSSVM is employed to simulate each subsequence and the SSA is used to optimize the parameters of LSSVM.

  • The forecast results for each subsequence are aggregated as the final forecast results.

  • The results verify the superiority of the proposed model.

Monthly runoff forecasting with acceptable accuracy plays a key role in water resources management (Xu & Yang 2019), including irrigation, water supply and hydroelectric power generation. However, under the influence of multiple factors, such as climate change and underlying surface variation, monthly runoff is characterized by nonlinearity, randomness and nonstationarity, which make it difficult to simulate (Ji et al. 2020). Many models reported in the literature have attempted to simulate monthly runoff. Among them, data-driven models, which are easy to implement and require neither large amounts of data nor an understanding of the underlying physical processes, form an important category (Ashrafi et al. 2017; Liao et al. 2020; Lin et al. 2021). With the development of AI techniques (Schmidhuber 2015), many machine learning methods have been successfully applied to monthly runoff forecasting (Zuo et al. 2020; Feng et al. 2021b; Roushangar et al. 2021). Hence, to further explore the potential of existing machine learning methods in monthly runoff forecasting, it is necessary to study how to enhance model performance.

As a typical machine learning method, the support vector machine (SVM) is calibrated using the structural risk minimization principle to minimize an upper bound of the generalization error (Liu et al. 2014). Although SVM has been widely applied in hydrological forecasting, it can suffer from drawbacks such as low execution efficiency and high computation cost (Barman & Dev Choudhury 2020; Niu et al. 2021). To overcome these defects, the least squares support vector machine (LSSVM) method was proposed. Compared to SVM, LSSVM adopts equality constraints, so that training reduces to solving a system of linear equations rather than a quadratic programming problem, and it can achieve better model performance. Consequently, many applications of LSSVM can be found in hydrological forecasting. However, the parameters of LSSVM affect model performance to a great extent and are not easy to set manually (Ashrafi et al. 2017). So far, two main approaches, namely data decomposition (Tan et al. 2018) and optimization algorithms (Zheng et al. 2017), are usually employed to improve the performance of the LSSVM model. For instance, Zhao et al. (2017) applied an empirical mode decomposition-based chaotic LSSVM hybrid model for annual runoff forecasting and investigated it with runoff data from four sites. Guo et al. (2021) utilized an improved quantum-based grey wolf optimizer (GWO) algorithm to optimize the parameters of LSSVM to enhance the performance of a multi-step-ahead forecast for reservoir water availability and verified the proposed model's performance with two time series of reservoir water availability in Zhoushan Islands, China. Niu et al. (2021) developed a hybrid hydrological forecasting method combining LSSVM, ensemble empirical mode decomposition and a gravitational search algorithm, and demonstrated its superior performance over several control models in monthly runoff forecasting for two hydrological stations in China.

Variational mode decomposition (VMD), proposed by Dragomiretskiy & Zosso (2014), is an entirely non-recursive tool that can concurrently extract several band-limited intrinsic modes from complex data signals. Via VMD, time series data with the characteristics of high complexity and nonlinearity can be decomposed into several subsequences with different frequency scales. In addition, VMD avoids the defect of modal aliasing faced by the traditional decomposition method. Therefore, VMD has become an important decomposition method in many fields such as signal processing, wind speed prediction (Liu et al. 2018), fault diagnosis and time series prediction. Selection of an algorithm with strong optimization ability and good robustness should be made before parameter calibration for modeling hydrological time series data. Inspired by the foraging behavior and anti-predation behavior of sparrows, the sparrow search algorithm (SSA) was proposed to solve global optimization problems (Xue & Shen 2020). As a new swarm intelligent algorithm, SSA has the merits of strong optimization ability and fast convergence speed and has been utilized to optimize model parameters for time series prediction and other fields. Many applications of SSA in the literature have verified that the SSA method outperforms several mature and commonly used evolutionary algorithms, like GWO and particle swarm optimization in terms of solution precision, search rate and local minimum avoidance (Feng et al. 2021a). Hence, the SSA method can be used to optimize the parameters of LSSVM to obtain higher forecast accuracy for monthly runoff forecasting. To our knowledge, VMD and SSA-based LSSVM has not previously been applied in hydrological forecasting.

In this study, a hybrid model of VMD and SSA-based LSSVM (VMD-SSA-LSSVM) was proposed for monthly runoff forecasting. The main aspects of this research are as follows: (1) VMD is employed to decompose monthly runoff data into several subsequences to decrease modeling difficulty. (2) For modeling each subsequence, SSA is used to optimize the LSSVM hyperparameters. (3) Compared to several models, the proposed model provides better forecast accuracy. The proposed model is verified with monthly runoff data from two reservoirs located in China.

The rest of the paper is organized as follows: the basic principles of the VMD, LSSVM and SSA methods are presented and the hybrid model VMD-SSA-LSSVM is briefly illustrated. The study area, the data and the development of the forecast models are then introduced, before the statistics of the forecast results are presented and the results discussed. Finally, the conclusion is presented.

Variational mode decomposition

VMD uses an adaptive and completely non-recursive variational formulation that determines the center frequency and bandwidth of each mode, separates the intrinsic modal components of a signal, and obtains the decomposition as the optimal solution of a variational problem. The main steps of VMD are as follows:

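  • (1) Obtain the analytical signal related to each modal function through the Hilbert transform and construct the frequency spectrum.

  • (2) Estimate the exponential mixing of the center frequency for each mode, and shift the spectrum of the mode function to the baseband.

  • (3) Use Gaussian smoothing to estimate the demodulated signal, that is, the L2-norm of the gradient. Hence, the constraint problem can be gained and expressed as:

    $$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$

    where $u_k$ is the k-th mode component; $\omega_k$ is the center frequency corresponding to the k-th mode; $\delta(t)$ is the Dirac distribution; * represents the convolution calculation and $f(t)$ denotes the t-th data of the input signal.

  • (4) In order to solve the above constrained problem, a non-constrained problem is constructed and the alternate direction method of multipliers (ADMM) algorithm with iterative updates is used to finally obtain the optimal solution of the model. The augmented Lagrangian structure is as follows:

    $$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

    where α is the secondary penalty factor, λ is the Lagrangian multiplier and $\langle \cdot,\cdot \rangle$ denotes the inner product operation.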

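To solve the variational problem, ADMM is utilized to continuously update $u_k^{n+1}$, $\omega_k^{n+1}$ and $\lambda^{n+1}$ and optimize each modal component $u_k$ and center frequency $\omega_k$, which can be obtained by the following iterative equations:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha\left(\omega - \omega_k\right)^2} \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega} \tag{4}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{5}$$

where n is the iteration number; $\tau$ is the iterative factor, and $\hat{u}_k^{n+1}(\omega)$, $\hat{u}_i(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}(\omega)$ correspond to the Fourier transforms of $u_k^{n+1}(t)$, $u_i(t)$, $f(t)$ and $\lambda(t)$, respectively.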
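The updates in Equations (3) to (5) can be illustrated with a minimal NumPy sketch. It is a simplified illustration, not the implementation used in this study: it assumes an even-length real signal, works on the one-sided spectrum without the boundary mirroring used in reference VMD codes, initializes the center frequencies uniformly, and runs a fixed number of iterations instead of a convergence test. The function name `vmd_sketch` and the default values of `alpha`, `tau` and `n_iter` are hypothetical choices for illustration.

```python
import numpy as np

def vmd_sketch(f, K, alpha=2000.0, tau=0.0, n_iter=200):
    """Toy VMD: returns (modes, omega), omega in cycles per sample."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    freqs = np.fft.fftfreq(N)                 # frequency grid in [-0.5, 0.5)
    half = freqs >= 0                         # non-negative frequencies
    f_hat = np.where(half, np.fft.fft(f), 0)  # one-sided (analytic) spectrum
    u_hat = np.zeros((K, N), dtype=complex)   # mode spectra
    lam_hat = np.zeros(N, dtype=complex)      # Lagrangian multiplier spectrum
    omega = 0.5 * np.arange(K) / K            # assumed uniform initialization

    for _ in range(n_iter):
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Eq. (3): Wiener-filter-like update of mode k
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (4): center frequency as the power-weighted mean frequency
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = freqs[half] @ power / power.sum()
        # Eq. (5): dual ascent on the reconstruction constraint
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))

    # Back to the time domain: a real signal is twice the real part of its
    # one-sided spectrum, with the DC term counted only once.
    u_hat[:, 0] *= 0.5
    modes = 2 * np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

For a test signal made of two sinusoids, the two recovered center frequencies should settle near the two tone frequencies, and the modes should sum back to the input signal.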

Least squares support vector machine method

The least squares support vector machine (LSSVM) method (Suykens & Vandewalle 1999) is an improvement of the traditional SVM, and has unique advantages in dealing with small samples and nonlinear problems. Starting from the loss function, LSSVM replaces the inequality constraints in standard SVM with equality constraints, which transforms the solution of a quadratic optimization problem into the solution of a system of linear equations. To a certain extent, LSSVM reduces the computational complexity and improves calculation speed (Nabipour et al. 2020). In time series regression, LSSVM maps the input space to a high-dimensional feature space, and its nonlinear function can be described as
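$$y(x) = \omega^{T}\varphi(x) + b \tag{6}$$

where x is the input data; $y(x)$ is the corresponding output data; ω is the weight; $\varphi(\cdot)$ is the mapping function and b is the deviation value.

According to the principle of structural risk minimization, a regression problem can be transformed into an optimization problem so that ω and b in the function are optimal values:

$$\min_{\omega, b, e} J(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{C}{2}\sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \quad y_i = \omega^{T}\varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, N \tag{7}$$

where $e_i$ is the regression error between the real and predicted values of the output, N is the number of training samples, and C is the regularization parameter that determines the trade-off between model complexity and accuracy.

To find the optimal solutions, the corresponding Lagrangian function is constructed as:

$$L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{i=1}^{N} \alpha_i \left[ \omega^{T}\varphi(x_i) + b + e_i - y_i \right] \tag{8}$$

where $\alpha_i$ is the Lagrangian multiplier.

According to the Karush-Kuhn-Tucker (KKT) conditions, the optimal solution of the problem is obtained by partial differentiation of Equation (8) with respect to $\omega$, $b$, $e_i$ and $\alpha_i$, respectively as:

$$\begin{cases} \dfrac{\partial L}{\partial \omega} = 0 \;\Rightarrow\; \omega = \sum_{i=1}^{N} \alpha_i \varphi(x_i) \\[2mm] \dfrac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i = 0 \\[2mm] \dfrac{\partial L}{\partial e_i} = 0 \;\Rightarrow\; \alpha_i = C e_i \\[2mm] \dfrac{\partial L}{\partial \alpha_i} = 0 \;\Rightarrow\; \omega^{T}\varphi(x_i) + b + e_i - y_i = 0 \end{cases} \tag{9}$$

The linear problem can be simplified by eliminating $\omega$ and $e_i$ as:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \mathbf{K} + C^{-1}\mathbf{I} \end{bmatrix} \begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix} \tag{10}$$

where $\mathbf{y} = [y_1, y_2, \ldots, y_N]^{T}$, $\mathbf{1} = [1, 1, \ldots, 1]^{T}$ and $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_N]^{T}$; $\mathbf{I}$ is the identity matrix and $\mathbf{K}$ is the kernel matrix with entries $K_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$.

The commonly used radial basis kernel function (RBF) can be selected due to its good performance in time series regression, which is as follows:

$$K(x_i, x_j) = \exp\left( -\frac{\left\| x_i - x_j \right\|^2}{2\sigma^2} \right) \tag{11}$$

where $\sigma$ is the bandwidth of the kernel function.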
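As a minimal sketch of how the linear system of Equation (10) with the RBF kernel can be solved in practice, the NumPy code below builds the bordered kernel matrix and solves for the bias and the multipliers. The function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`) are hypothetical, and the prediction rule (kernel expansion over the training points plus the bias) follows from substituting the first KKT condition into the regression function; it is an illustration, not the code used in this study.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lssvm_fit(X, y, C, sigma):
    """Solve the bordered linear system for the bias b and multipliers alpha."""
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                                   # first row: sum(alpha) = 0
    A[1:, 0] = 1.0                                   # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / C
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]                 # b, alpha

def lssvm_predict(X_train, alpha, b, sigma, X_new):
    """Prediction as a kernel expansion over the training points plus bias."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

The two quantities that an optimizer such as SSA would tune in this sketch are `C` and `sigma`; everything else is determined by the training data.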

Sparrow search algorithm

The sparrow search algorithm (SSA), which mimics the foraging and anti-predation behavior of sparrows, is characterized by strong robustness and global searching ability. There are two categories of captive house sparrows, namely producers and scroungers. The producers have high levels of energy reserves and are responsible for searching for food sources, while the scroungers follow the producers to search for food. Once the sparrows detect predators, they send alarm signals and move towards safe areas. Each sparrow can become a producer, but the ratio of the two categories remains constant. The main steps of SSA are briefly described below.

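The position of the sparrows can be expressed by a matrix as:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix} \tag{12}$$

where d denotes the number of dimensions and n is the number of sparrows.

Then, the fitness value of the sparrow population can be expressed by a vector as below:

$$F_X = \begin{bmatrix} f\left([x_{1,1}\; x_{1,2}\; \cdots\; x_{1,d}]\right) \\ f\left([x_{2,1}\; x_{2,2}\; \cdots\; x_{2,d}]\right) \\ \vdots \\ f\left([x_{n,1}\; x_{n,2}\; \cdots\; x_{n,d}]\right) \end{bmatrix} \tag{13}$$

where the value in each row denotes the fitness value of each sparrow.

The producers with better fitness values are first to gain food and are in charge of searching for food, of which the locations are updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left( \dfrac{-i}{\alpha \cdot iter_{max}} \right), & R_2 < ST \\[2mm] X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases} \tag{14}$$

where t represents the t-th iteration; j = 1, 2, 3,…,d; $iter_{max}$ is the maximum number of iterations; $X_{i,j}^{t}$ is the position of the i-th sparrow in the j-th dimension at iteration t; $\alpha \in (0, 1]$ is a random number. $R_2 \in [0, 1]$ indicates the warning value; $ST \in [0.5, 1]$ is the safe threshold; Q is a random number obeying normal distribution; L represents a matrix of $1 \times d$ in which every element is 1.

If $R_2 < ST$, it means there are no predators around and the foraging environment is safe. By contrast, if $R_2 \geq ST$, it means there are predators in the surroundings and all sparrows should fly to other safe areas.

The scroungers follow the producers to compete for food. The positions of the scroungers can be updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left( \dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^2} \right), & i > n/2 \\[2mm] X_{P}^{t+1} + \left| X_{i,j}^{t} - X_{P}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases} \tag{15}$$

where, following Xue & Shen (2020), $X_{P}$ is the best position occupied by the producers, $X_{worst}$ is the current global worst position, and A is a $1 \times d$ matrix whose elements are randomly assigned 1 or −1, with $A^{+} = A^{T}(AA^{T})^{-1}$. When i > n/2, this indicates that the i-th scrounger is less adapted and needs to fly to other places to find food and get more energy.

Being aware of danger, sparrows will quickly move to the safe areas, which can be expressed as

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left| X_{i,j}^{t} - X_{best}^{t} \right|, & f_i > f_g \\[2mm] X_{i,j}^{t} + K \cdot \left( \dfrac{\left| X_{i,j}^{t} - X_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon} \right), & f_i = f_g \end{cases} \tag{16}$$

where $X_{best}$ denotes the current global optimal position; $\beta$ is the control parameter generated by a normal distribution of random numbers with mean 0 and variance 1; $K \in [-1, 1]$ denotes the random number; $f_i$ is the fitness value of the sparrow; $\varepsilon$ is a constant. $f_g$ and $f_w$ denote the current global best and worst fitness values, respectively.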
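The three position updates (producers, scroungers and danger-aware sparrows) can be put together in a compact sketch of an SSA minimizer. This is an illustrative reading of the update rules under stated assumptions, not the implementation used in this study: the producer share `pd`, the share of danger-aware sparrows `sd`, the safe threshold `st`, the clipping of positions to the search bounds, and the use of fitness values from the start of each iteration in the danger-aware step are all choices made here for simplicity.

```python
import numpy as np

def ssa_minimize(fitness, lb, ub, n=30, iter_max=100,
                 pd=0.2, sd=0.1, st=0.8, seed=0):
    """Minimal sparrow search sketch; returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = lb.size
    X = rng.uniform(lb, ub, size=(n, d))
    fit = np.array([fitness(x) for x in X])
    best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
    history = [best_f]
    n_prod, n_alert = max(1, int(pd * n)), max(1, int(sd * n))

    for _ in range(iter_max):
        order = np.argsort(fit)                 # best sparrows first
        X, fit = X[order], fit[order]
        worst_x, f_g, f_w = X[-1].copy(), fit[0], fit[-1]

        r2 = rng.random()                       # warning value R2
        for i in range(n_prod):                 # producers
            if r2 < st:
                a = 1.0 - rng.random()          # random number in (0, 1]
                X[i] = X[i] * np.exp(-(i + 1) / (a * iter_max))
            else:
                X[i] = X[i] + rng.normal() * np.ones(d)
        X[:n_prod] = np.clip(X[:n_prod], lb, ub)
        prod_fit = [fitness(x) for x in X[:n_prod]]
        x_p = X[int(np.argmin(prod_fit))].copy()  # best producer position

        for i in range(n_prod, n):              # scroungers
            if i + 1 > n / 2:
                X[i] = rng.normal() * np.exp((worst_x - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=d)
                a_plus = A / d                  # A^T (A A^T)^-1 for +/-1 entries
                X[i] = x_p + (np.abs(X[i] - x_p) @ a_plus) * np.ones(d)

        for i in rng.choice(n, size=n_alert, replace=False):  # danger-aware
            if fit[i] > f_g:
                X[i] = best_x + rng.normal() * np.abs(X[i] - best_x)
            else:
                k = rng.uniform(-1.0, 1.0)
                X[i] = X[i] + k * np.abs(X[i] - worst_x) / (fit[i] - f_w + 1e-50)

        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_f:                  # keep the best solution so far
            best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
        history.append(best_f)
    return best_x, best_f, history
```

When tuning LSSVM, `fitness` would evaluate a candidate pair of regularization and kernel parameters, for example by the forecast error on held-out data; that choice of objective is also an assumption of this sketch.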

Hybrid model for monthly runoff forecasting

In order to improve the accuracy of monthly runoff prediction, a hybrid model of VMD-SSA-LSSVM is proposed. The specific flow chart of the model is shown in Figure 1. The main steps of the model are as follows:

Figure 1

The flowchart of VMD-SSA-LSSVM for monthly runoff forecasting.


Step 1: Decomposition. Via VMD, the original monthly runoff data are decomposed into K relatively simple subsequences and each subsequence is normalized to [−1, 1]. The input variables of each subsequence are determined by using the partial autocorrelation function (PACF).

Step 2: Parameter calibration. Initialize the parameters of the SSA algorithm and utilize SSA to calibrate the parameters of the LSSVM for each subsequence.

Step 3: Aggregation. The forecast results for each subsequence are aggregated as the final forecast results.
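The three steps can be sketched as a small pipeline that takes already-decomposed subsequences, scales each to [−1, 1], builds lagged inputs, fits one model per subsequence and sums the back-transformed forecasts. The sketch below is illustrative only: the function names are hypothetical, an ordinary least squares regressor stands in for the SSA-tuned LSSVM, the forecasts are one-step-ahead using observed lagged values, and the scaling uses the range of the whole subsequence for brevity.

```python
import numpy as np

def scale_to_unit(x):
    """Map a series linearly onto [-1, 1]; also return the inverse map."""
    lo, hi = float(np.min(x)), float(np.max(x))
    scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    return scaled, lambda z: (z + 1.0) * (hi - lo) / 2.0 + lo

def lagged_matrix(x, n_lags):
    """Rows [x_{t-1}, ..., x_{t-n_lags}] paired with the target x_t."""
    X = np.column_stack([x[n_lags - k: len(x) - k]
                         for k in range(1, n_lags + 1)])
    return X, x[n_lags:]

def linear_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in for a tuned regressor: least squares with bias."""
    add_one = lambda M: np.column_stack([M, np.ones(len(M))])
    w, *_ = np.linalg.lstsq(add_one(X_train), y_train, rcond=None)
    return add_one(X_test) @ w

def hybrid_forecast(components, lags, n_test, fit_predict=linear_fit_predict):
    """Forecast the last n_test values of the summed components."""
    total = np.zeros(n_test)
    for comp, p in zip(components, lags):
        scaled, inverse = scale_to_unit(np.asarray(comp, dtype=float))
        X, y = lagged_matrix(scaled, p)               # Step 1: inputs per subsequence
        pred = fit_predict(X[:-n_test], y[:-n_test],  # Step 2: one model each
                           X[-n_test:])
        total += inverse(pred)                        # Step 3: aggregation
    return total
```

Passing a different `fit_predict` callable (for instance one that wraps an LSSVM tuned by an optimizer) changes the per-subsequence model without touching the decomposition or aggregation logic.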

Evaluation index

In order to evaluate the performance of the proposed model, four measures, namely root mean square error (RMSE), mean absolute percentage error (MAPE), correlation coefficient (R) and Nash efficiency coefficient (CE), are employed to estimate the prediction accuracy of the model. The smaller the RMSE and MAPE and the higher the R and CE, the better the performance of the model. The theoretical formulas of RMSE, MAPE, R and CE are as follows.
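$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left( y_i - \hat{y}_i \right)^2} \tag{17}$$

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \tag{18}$$

$$R = \frac{\sum_{i=1}^{N}\left( y_i - \bar{y} \right)\left( \hat{y}_i - \bar{\hat{y}} \right)}{\sqrt{\sum_{i=1}^{N}\left( y_i - \bar{y} \right)^2 \sum_{i=1}^{N}\left( \hat{y}_i - \bar{\hat{y}} \right)^2}} \tag{19}$$

$$CE = 1 - \frac{\sum_{i=1}^{N}\left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N}\left( y_i - \bar{y} \right)^2} \tag{20}$$

where $y_i$ and $\hat{y}_i$ are the i-th observed and forecasted values, respectively; $\bar{y}$ and $\bar{\hat{y}}$ are the average of the observed and forecasted values, respectively; and N is the number of values.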
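A short NumPy sketch of the four evaluation indexes is given below, using the standard definitions of RMSE, MAPE, the Pearson correlation coefficient and the Nash efficiency coefficient. The function name `evaluate` and the small example series are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def evaluate(obs, sim):
    """Return RMSE, MAPE (%), R and CE for observed and forecasted series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    err = obs - sim
    obs_dev, sim_dev = obs - obs.mean(), sim - sim.mean()
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err / obs)) * 100.0),
        "R": float(np.sum(obs_dev * sim_dev)
                   / np.sqrt(np.sum(obs_dev ** 2) * np.sum(sim_dev ** 2))),
        "CE": float(1.0 - np.sum(err ** 2) / np.sum(obs_dev ** 2)),
    }

# Hypothetical example: every forecast is off by 10 units.
scores = evaluate([100, 200, 300, 400], [110, 190, 310, 390])
```

Note that MAPE divides by the observed values, so it is undefined when an observation is zero and becomes large for low flows.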

Study area and data

The Xinfengjiang and Guangzhao Reservoirs, located in southern and southwestern China, respectively, were selected as study areas, as shown in Figure 2. The Xinfengjiang Reservoir is located on the Xinfeng River, a tributary of the Dong River in the Pearl River system. For the Xinfengjiang Reservoir, the catchment area is 5,740 km² and the average annual runoff is 192 m³/s. With a total storage capacity of 13.896 billion (13.896 × 10⁹) m³ and installed capacity of 315 MW, the Xinfengjiang Reservoir is a large-scale water conservancy project with multipurpose comprehensive utilization that mainly focuses on power generation, as well as flood control, water supply, and shipping. Benefitting from carryover storage, the average annual power generation of the Xinfengjiang Reservoir can reach 1.172 billion kWh. The Guangzhao Reservoir is located in the middle section of the Beipan River, a tributary of the Xi River in the Pearl River system. For the Guangzhao Reservoir, the drainage area is 13,548 km² and the average annual runoff is 257 m³/s. With a total storage capacity of 3.245 billion m³ and installed capacity of 1,040 MW, the Guangzhao Reservoir is also a large-scale water conservancy project with multipurpose comprehensive utilization that mainly focuses on power generation, as well as water supply, navigation and irrigation. The average annual power generation of the Guangzhao Reservoir is 2.754 billion kWh.

Figure 2

Location of the two reservoirs in China.


In this study, the data obtained from the Xinfengjiang Reservoir cover the period from 1943 to 2015, of which the data from 1943 to 2002 were used for calibration and the rest were used for validation. Similarly, the data obtained from the Guangzhao Reservoir cover the period from 1956 to 2017, of which the data from 1956 to 2004 were used for calibration and the rest were used for validation.

Data decomposition

Although VMD can be employed to decompose the time series data effectively, the appropriate number of modes, K, needs to be predetermined. When K is too large, the center frequencies of adjacent modal components will be close to each other, resulting in frequency aliasing. When K is too small, some information in the original signal will be missed, which affects the prediction accuracy. To reduce the complexity of monthly runoff data and extract features more efficiently, the center frequency method described in the literature (Xie et al. 2019) is utilized to determine the number of modes, K. The center frequencies with different K values for the Xinfengjiang and Guangzhao Reservoirs are presented in Tables 1 and 2, respectively. From Table 1, it can be found that when K = 8, the largest center frequency is already relatively large. Although the largest center frequency is slightly larger when K = 9 or K = 10, the difference between the center frequencies of adjacent sequences becomes smaller and smaller, and modal mixing can occur easily. Hence, K = 8 can be selected as the number of modes for the decomposition of monthly runoff data for the Xinfengjiang Reservoir. Similarly, for the Guangzhao Reservoir, K = 7 can be selected as the number of modes. The decomposition results of the Xinfengjiang and Guangzhao Reservoirs are shown in Figures 3 and 4, respectively. The clear differences among the subsequences indicate that the complex information contained in the original monthly runoff data can be effectively extracted.
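To make the remark about shrinking gaps between adjacent center frequencies concrete, the spacing can be tabulated directly from the center frequencies reported in Table 1. The sketch below copies the rows for K = 7, 8 and 9 for the Xinfengjiang Reservoir and computes the smallest gap between adjacent modes; the variable names are hypothetical, and the text does not give a numerical threshold for the selection rule, so the code only reports the spacing.

```python
import numpy as np

# Center frequencies for the Xinfengjiang Reservoir, copied from Table 1.
center_freqs = {
    7: [0.1933, 82.8716, 164.7413, 234.6214, 287.1515, 364.0949, 421.5359],
    8: [0.1848, 82.3207, 120.9071, 168.7265, 250.2942, 328.5215, 374.0438,
        451.1441],
    9: [0.1827, 82.0981, 116.9117, 166.9419, 242.6613, 285.4839, 333.1360,
        377.3509, 453.3533],
}

# Smallest difference between adjacent center frequencies for each K.
min_gap = {K: float(np.min(np.diff(w))) for K, w in center_freqs.items()}

for K, gap in min_gap.items():
    print(f"K = {K}: smallest adjacent gap = {gap:.4f}")
```

The smallest gap narrows as K grows, which is the behavior the text associates with an increasing risk of modal mixing.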

Table 1

Center frequencies with different K values for the Xinfengjiang Reservoir

| Number of modes | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | IMF9 | IMF10 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 1 | 0.2825 | | | | | | | | | |
| K = 2 | 0.3241 | 247.8243 | | | | | | | | |
| K = 3 | 0.3033 | 162.5706 | 332.1366 | | | | | | | |
| K = 4 | 0.2048 | 85.1859 | 248.6603 | 367.8956 | | | | | | |
| K = 5 | 0.1964 | 83.2899 | 168.3411 | 268.5195 | 415.3279 | | | | | |
| K = 6 | 0.1935 | 83.0461 | 165.9784 | 250.3814 | 333.4556 | 418.3235 | | | | |
| K = 7 | 0.1933 | 82.8716 | 164.7413 | 234.6214 | 287.1515 | 364.0949 | 421.5359 | | | |
| K = 8 | 0.1848 | 82.3207 | 120.9071 | 168.7265 | 250.2942 | 328.5215 | 374.0438 | 451.1441 | | |
| K = 9 | 0.1827 | 82.0981 | 116.9117 | 166.9419 | 242.6613 | 285.4839 | 333.1360 | 377.3509 | 453.3533 | |
| K = 10 | 0.1204 | 49.7305 | 83.6972 | 164.3951 | 202.8153 | 249.8869 | 290.8583 | 362.6170 | 415.4588 | 460.7523 |

Table 2

Center frequencies with different K values for the Guangzhao Reservoir

| Number of modes | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | IMF9 | IMF10 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 1 | 0.1984 | | | | | | | | | |
| K = 2 | 0.2212 | 195.4229 | | | | | | | | |
| K = 3 | 0.2145 | 161.5596 | 324.8544 | | | | | | | |
| K = 4 | 0.1202 | 83.7359 | 217.3362 | 339.0131 | | | | | | |
| K = 5 | 0.1167 | 83.3431 | 165.7438 | 249.7721 | 365.5804 | | | | | |
| K = 6 | 0.1164 | 83.2808 | 165.4430 | 245.2936 | 330.8657 | 445.2398 | | | | |
| K = 7 | 0.1159 | 83.1669 | 164.9476 | 241.7157 | 313.6208 | 372.7494 | 456.1387 | | | |
| K = 8 | 0.1121 | 82.8604 | 128.6639 | 167.0464 | 245.3449 | 322.3837 | 376.5449 | 456.8794 | | |
| K = 9 | 0.0792 | 83.1053 | 55.9774 | 165.5323 | 240.1955 | 282.5769 | 329.6499 | 379.4012 | 458.2014 | |
| K = 10 | 0.1035 | 83.0164 | 83.3154 | 165.3954 | 216.5814 | 248.1914 | 291.5010 | 332.8500 | 381.0694 | 458.6907 |

Figure 3

Decomposed results of monthly runoff data for the Xinfengjiang Reservoir.

Figure 4

Decomposed results of monthly runoff data for the Guangzhao Reservoir.


Input determination

Choosing appropriate input variables for the proposed model is vital for gaining reliable forecast accuracy. The partial autocorrelation function (PACF) is usually utilized to determine the input variables. Generally, the lag beyond which all PACF values fall within the confidence interval determines how many previous values are selected as the input. The PACF values of the original data and decomposed subsequences of the Xinfengjiang and Guangzhao Reservoirs are depicted in Figures 5 and 6, respectively. Based on Figures 5 and 6, the selected input variables for the two reservoirs are shown in Tables 3 and 4, respectively. It can be seen that the numbers of inputs for the original data and the decomposed subsequences are not exactly the same, which illustrates the complex and changeable characteristics of monthly runoff data.
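The PACF-based selection can be sketched with plain NumPy. The code below estimates the PACF from the Yule-Walker equations (the PACF at lag k is the last coefficient of the fitted order-k autoregression) and then takes the largest lag whose PACF lies outside an approximate 95% confidence band of ±1.96/√N. The function names, the band and this particular cutoff rule are assumptions made for illustration; the text does not state the exact rule used in this study.

```python
import numpy as np

def pacf_yule_walker(x, max_lag):
    """PACF for lags 0..max_lag from the Yule-Walker equations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    # Sample autocorrelations r[0..max_lag]
    r = np.array([np.dot(x[:N - k], x[k:]) for k in range(max_lag + 1)])
    r = r / r[0]
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    for k in range(1, max_lag + 1):
        # Toeplitz autocorrelation matrix of order k
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1:k + 1])
        pacf[k] = phi[-1]          # last AR(k) coefficient
    return pacf

def select_n_lags(x, max_lag=12):
    """Largest lag whose PACF is outside +/-1.96/sqrt(N); 0 if none is."""
    pacf = pacf_yule_walker(x, max_lag)
    bound = 1.96 / np.sqrt(len(x))
    significant = np.nonzero(np.abs(pacf[1:]) > bound)[0]
    return int(significant[-1] + 1) if significant.size else 0
```

With `p = select_n_lags(series)`, the inputs for forecasting the value at time t would be the previous p values of the same series.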

Table 3

The selected input values of each series for the Xinfengjiang Reservoir

| No. | Series | Input variables | Number of inputs |
|---|---|---|---|
| 1 | Original | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8}$ | 8 |
| 2 | IMF1 | $x_{t-1}, x_{t-2}, x_{t-3}$ | 3 |
| 3 | IMF2 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}$ | 4 |
| 4 | IMF3 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}$ | 4 |
| 5 | IMF4 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}$ | 6 |
| 6 | IMF5 | $x_{t-1}, x_{t-2}$ | 2 |
| 7 | IMF6 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}$ | 5 |
| 8 | IMF7 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}$ | 7 |
| 9 | IMF8 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}$ | 5 |

Table 4

The selected input values of each series for the Guangzhao Reservoir

| No. | Series | Input variables | Number of inputs |
|---|---|---|---|
| 1 | Original | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8}$ | 8 |
| 2 | IMF1 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}$ | 5 |
| 3 | IMF2 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}$ | 6 |
| 4 | IMF3 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}$ | 7 |
| 5 | IMF4 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}$ | 7 |
| 6 | IMF5 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8}, x_{t-9}$ | 9 |
| 7 | IMF6 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}, x_{t-5}, x_{t-6}, x_{t-7}, x_{t-8}, x_{t-9}$ | 9 |
| 8 | IMF7 | $x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}$ | 4 |

Figure 5

PACF values of each series from the Xinfengjiang Reservoir.

Figure 6

PACF values of each series from the Guangzhao Reservoir.


Model development

To confirm the rationality and feasibility of the proposed model, three models were utilized for comparison, namely PSO-SVM, SSA-SVM and SSA-LSSVM. To reduce the effect of randomness in the optimization algorithms, each model was run 20 times independently and the average of the results was taken as the final forecast result. For the PSO-SVM, SSA-SVM and SSA-LSSVM models, the raw monthly runoff data obtained from the two reservoirs were utilized for model development, and the input variables were set based on the PACF values of the original series. In this study, the three parameters of PSO-SVM and SSA-SVM, i.e. the penalty factor, RBF kernel function parameter and insensitive loss coefficient, were optimized by PSO and SSA, respectively. For the SSA-LSSVM model, the two parameters, i.e. the penalty parameter and RBF kernel function parameter, were optimized by SSA. For the VMD-SSA-LSSVM model, there are three steps. Firstly, the raw monthly runoff data were decomposed into subsequences via VMD. Secondly, the SSA-LSSVM model was employed to simulate each subsequence and the input variables for each subsequence were determined by PACF. Finally, the results for each subsequence were summed to give the final results.

Forecast results

According to the methods mentioned above, the raw monthly runoff data and the subsequences generated by VMD were simulated. The evaluation indexes of the different models over the calibration and validation periods for the Xinfengjiang and Guangzhao Reservoirs are presented in Tables 5 and 6, respectively. Table 5 shows that the VMD-SSA-LSSVM model performs best among the four models in terms of all four evaluation indexes during both the calibration and validation periods. During the validation period, compared to the PSO-SVM model, the SSA-SVM model offers slightly better forecast accuracy, with reductions of 0.63% and 0.46% in RMSE and MAPE and increases of 1.04% and 2.22% in R and CE, respectively. In comparison with the SSA-SVM model, the SSA-LSSVM model provides better forecast accuracy in terms of RMSE, R and CE, with a reduction of 2.19% in RMSE and increases of 4.04% and 7.34% in R and CE during the validation period for the Xinfengjiang Reservoir, although its MAPE is higher (84.879% versus 63.478%). Compared to the SSA-LSSVM model for the Xinfengjiang Reservoir, the VMD-SSA-LSSVM model yields much better forecast accuracy, with reductions of 60.97% and 52.67% in RMSE and MAPE and increases of 49.53% and 128.14% in R and CE, respectively, during the validation period. Similarly, as revealed in Table 6, the VMD-SSA-LSSVM model performs much better than the three benchmark models in terms of all four evaluation indexes over the calibration and validation periods for the Guangzhao Reservoir. For instance, compared to the SSA-LSSVM model during the validation period, the VMD-SSA-LSSVM model yields reductions of 54.66% and 56.40% in RMSE and MAPE, respectively, and increases of 24.61% and 69.92% in R and CE, respectively.
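The percentage changes quoted above are relative differences between two models' index values. As a worked check, the short sketch below recomputes the comparison between SSA-LSSVM and VMD-SSA-LSSVM for the Xinfengjiang Reservoir during the validation period from the values in Table 5; the helper name `relative_change` is hypothetical.

```python
def relative_change(baseline, improved):
    """Signed change of `improved` relative to `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0

# Validation-period values for the Xinfengjiang Reservoir (Table 5):
# (SSA-LSSVM, VMD-SSA-LSSVM)
table5 = {
    "RMSE": (159.910, 62.417),
    "MAPE": (84.879, 40.172),
    "R": (0.642, 0.960),
    "CE": (0.398, 0.908),
}

changes = {name: relative_change(base, new)
           for name, (base, new) in table5.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

Negative values for RMSE and MAPE are reductions in error, while positive values for R and CE are improvements in agreement.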

Table 5

Comparison of evaluation indexes of the different models for the Xinfengjiang Reservoir

| Model | Calibration RMSE | Calibration MAPE (%) | Calibration R | Calibration CE | Validation RMSE | Validation MAPE (%) | Validation R | Validation CE |
|---|---|---|---|---|---|---|---|---|
| PSO-SVM | 148.527 | 57.330 | 0.685 | 0.448 | 164.528 | 63.769 | 0.610 | 0.363 |
| SSA-SVM | 147.753 | 56.601 | 0.689 | 0.454 | 163.484 | 63.478 | 0.617 | 0.371 |
| SSA-LSSVM | 139.983 | 65.320 | 0.717 | 0.510 | 159.910 | 84.879 | 0.642 | 0.398 |
| VMD-SSA-LSSVM | 58.522 | 38.735 | 0.961 | 0.914 | 62.417 | 40.172 | 0.960 | 0.908 |

Table 6

Comparison of evaluation indexes of the different models for the Guangzhao Reservoir

| Model | Calibration RMSE | Calibration MAPE (%) | Calibration R | Calibration CE | Validation RMSE | Validation MAPE (%) | Validation R | Validation CE |
|---|---|---|---|---|---|---|---|---|
| PSO-SVM | 138.593 | 52.319 | 0.823 | 0.675 | 125.056 | 65.031 | 0.748 | 0.517 |
| SSA-SVM | 133.098 | 48.412 | 0.839 | 0.700 | 123.478 | 61.896 | 0.762 | 0.529 |
| SSA-LSSVM | 126.018 | 38.214 | 0.855 | 0.731 | 123.161 | 59.317 | 0.768 | 0.532 |
| VMD-SSA-LSSVM | 54.491 | 17.668 | 0.976 | 0.950 | 55.844 | 25.861 | 0.957 | 0.904 |

In order to visually compare the forecast results, the hydrographs of the observed data and the simulated results of the four models during the validation period are depicted in Figures 7 and 8. As shown in Figures 7 and 8, the hydrographs of the two reservoirs vary drastically with time, which indicates the complexity of the monthly runoff data and the difficulty of simulating them. It can be seen that adopting a single model such as PSO-SVM, SSA-SVM or SSA-LSSVM may lead to unsatisfactory forecast accuracy, while adopting a hybrid model such as VMD-SSA-LSSVM tracks the observed hydrographs much more closely.

Figure 7

Comparison of the forecast results for the Xinfengjiang Reservoir during the validation period.

Figure 8

Comparison of the forecast results for the Guangzhao Reservoir during the validation period.


In order to further examine the goodness of fit of the models, scatter diagrams of the four models for the Xinfengjiang and Guangzhao Reservoirs during the validation period are drawn in Figures 9 and 10, respectively. Generally, the closer the scatter points are to the 45-degree line, the better the model performance. As shown in Figures 9 and 10, the scatter points of the VMD-SSA-LSSVM model are more concentrated around the 45-degree line than those of the other three models, which indicates that the VMD-SSA-LSSVM model is more reliable and feasible.

Figure 9

Comparison of scatter diagrams of forecast results for the Xinfengjiang Reservoir during the validation period.

Figure 10

Comparison of scatter diagrams of forecast results for the Guangzhao Reservoir during the validation period.


In order to evaluate the performance of the VMD-SSA-LSSVM model in peak flow forecasting, the peak flow estimates of the four models over the validation period for the Xinfengjiang and Guangzhao Reservoirs were analyzed statistically. As shown in Table 7, the averages of the absolute relative errors of PSO-SVM, SSA-SVM, SSA-LSSVM and VMD-SSA-LSSVM in forecasting the 15 peak flows are 48.91%, 48.10%, 43.81% and 17.74%, respectively. Similarly, as shown in Table 8, the averages of the absolute relative errors of PSO-SVM, SSA-SVM, SSA-LSSVM and VMD-SSA-LSSVM in forecasting the 13 peak flows are 25.56%, 25.30%, 25.24% and 12.83%, respectively. It can be concluded that the VMD-SSA-LSSVM model yields much better forecast accuracy than PSO-SVM, SSA-SVM and SSA-LSSVM in terms of peak flow forecasting.
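The peak flow statistic is the mean of the absolute relative errors over the selected peaks. As a worked check, the sketch below recomputes it for the VMD-SSA-LSSVM model from the 15 observed and forecasted peak flows listed in Table 7 for the Xinfengjiang Reservoir; the variable names are hypothetical.

```python
import numpy as np

# Observed peak flows and VMD-SSA-LSSVM forecasts, copied from Table 7.
observed = np.array([618.00, 353.00, 336.00, 203.00, 1496.00, 783.82, 687.50,
                     1066.00, 228.20, 867.50, 369.60, 442.30, 860.90, 616.20,
                     493.63])
forecast = np.array([463.25, 319.31, 327.34, 300.13, 1138.34, 694.35, 681.91,
                     850.30, 340.36, 709.82, 445.00, 452.05, 709.38, 572.00,
                     444.38])

# Signed relative error of each peak, in percent.
relative_error = (forecast - observed) / observed * 100.0

# Average of the absolute relative errors over the 15 peaks.
mean_abs_error = float(np.mean(np.abs(relative_error)))
print(f"Mean absolute relative error: {mean_abs_error:.2f}%")
```

Negative relative errors indicate underestimated peaks, which is the case for most peaks in Table 7.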

Table 7

Peak flow estimates of different models for the Xinfengjiang Reservoir during the validation period

| Peak No. | Original | PSO-SVM | SSA-SVM | SSA-LSSVM | VMD-SSA-LSSVM | Relative error, PSO-SVM (%) | Relative error, SSA-SVM (%) | Relative error, SSA-LSSVM (%) | Relative error, VMD-SSA-LSSVM (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 618.00 | 333.59 | 332.19 | 400.46 | 463.25 | −46.02 | −46.25 | −35.20 | −25.04 |
| 2 | 353.00 | 314.28 | 326.47 | 396.77 | 319.31 | −10.97 | −7.52 | 12.40 | −9.54 |
| 3 | 336.00 | 227.57 | 230.92 | 341.28 | 327.34 | −32.27 | −31.28 | 1.57 | −2.58 |
| 4 | 203.00 | 275.99 | 284.80 | 337.67 | 300.13 | 35.96 | 40.29 | 66.34 | 47.85 |
| 5 | 1,496.00 | 343.33 | 356.83 | 432.71 | 1,138.34 | −77.05 | −76.15 | −71.08 | −23.91 |
| 6 | 783.82 | 402.58 | 416.11 | 434.02 | 694.35 | −48.64 | −46.91 | −44.63 | −11.41 |
| 7 | 687.50 | 226.27 | 231.49 | 334.10 | 681.91 | −67.09 | −66.33 | −51.40 | −0.81 |
| 8 | 1,066.00 | 242.20 | 248.53 | 332.90 | 850.30 | −77.28 | −76.69 | −68.77 | −20.23 |
| 9 | 228.20 | 218.46 | 224.87 | 292.54 | 340.36 | −4.27 | −1.46 | 28.19 | 49.15 |
| 10 | 867.50 | 366.19 | 375.70 | 446.02 | 709.82 | −57.79 | −56.69 | −48.59 | −18.18 |
| 11 | 369.60 | 124.79 | 129.25 | 158.42 | 445.00 | −66.24 | −65.03 | −57.14 | 20.40 |
| 12 | 442.30 | 387.53 | 389.48 | 450.46 | 452.05 | −12.38 | −11.94 | 1.84 | 2.20 |
| 13 | 860.90 | 302.93 | 313.74 | 390.72 | 709.38 | −64.81 | −63.56 | −54.61 | −17.60 |
| 14 | 616.20 | 233.46 | 240.68 | 312.03 | 572.00 | −62.11 | −60.94 | −49.36 | −7.17 |
| 15 | 493.63 | 144.46 | 145.40 | 167.96 | 444.38 | −70.74 | −70.55 | −65.98 | −9.98 |
| Average (absolute) | | | | | | 48.91 | 48.10 | 43.81 | 17.74 |

Table 8

Peak flow estimates of different models for the Guangzhao Reservoir during the validation period

| Peak No. | Original | PSO-SVM | SSA-SVM | SSA-LSSVM | VMD-SSA-LSSVM | Relative error, PSO-SVM (%) | Relative error, SSA-SVM (%) | Relative error, SSA-LSSVM (%) | Relative error, VMD-SSA-LSSVM (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 580.18 | 361.77 | 374.67 | 389.52 | 530.95 | −37.65 | −35.42 | −32.86 | −8.48 |
| 2 | 580.51 | 280.45 | 282.91 | 284.68 | 519.43 | −51.69 | −51.27 | −50.96 | −10.52 |
| 3 | 656.24 | 587.87 | 633.46 | 638.55 | 632.34 | −10.42 | −3.47 | −2.70 | −3.64 |
| 4 | 526.04 | 530.39 | 581.37 | 610.27 | 492.89 | 0.83 | 10.52 | 16.01 | −6.30 |
| 5 | 500.19 | 411.55 | 451.98 | 512.38 | 504.09 | −17.72 | −9.64 | 2.44 | 0.78 |
| 6 | 552.69 | 540.49 | 570.02 | 587.45 | 488.44 | −2.21 | 3.14 | 6.29 | −11.62 |
| 7 | 320.45 | 356.60 | 382.99 | 406.07 | 357.82 | 11.28 | 19.52 | 26.72 | 11.66 |
| 8 | 534.98 | 546.28 | 564.83 | 562.16 | 440.09 | 2.11 | 5.58 | 5.08 | −17.74 |
| 9 | 219.93 | 329.72 | 340.01 | 347.34 | 308.74 | 49.92 | 54.60 | 57.93 | 40.38 |
| 10 | 614.08 | 484.71 | 516.41 | 538.46 | 517.45 | −21.07 | −15.90 | −12.31 | −15.74 |
| 11 | 547.96 | 400.82 | 418.45 | 413.50 | 495.05 | −26.85 | −23.64 | −24.54 | −9.66 |
| 12 | 423.92 | 257.91 | 261.23 | 271.37 | 472.72 | −39.16 | −38.38 | −35.99 | 11.51 |
| 13 | 1,199.60 | 463.21 | 506.16 | 548.65 | 975.34 | −61.39 | −57.81 | −54.26 | −18.69 |
| Average (absolute) | | | | | | 25.56 | 25.30 | 25.24 | 12.83 |

The forecast results of PSO-SVM and SSA-SVM differ slightly in terms of the four statistical indexes, which indicates that the choice of optimization algorithm for parameter calibration matters. Standard SVM, which is based on the structural risk minimization principle, can achieve good generalization performance. However, its performance usually depends on the optimization algorithm used to calibrate its parameters. Although PSO has been successfully applied to many optimization problems, it is prone to premature convergence (Shi et al. 2005). As a recently proposed optimization algorithm, SSA has strong global search ability and can effectively avoid becoming trapped in local optima. Hence, compared with PSO, SSA affords slightly better optimization ability. The slightly better performance of SSA-LSSVM relative to SSA-SVM can be ascribed to the fact that LSSVM originated from SVM and can be seen as an improvement on it. Compared with SVM, LSSVM is calibrated by solving a set of linear equations instead of a quadratic optimization problem, leading to better generalization ability and execution efficiency. Affected by several factors, such as climate change, runoff usually contains multifrequency components (Niu et al. 2019). Hence, it is hard for a single model to simulate runoff accurately, because only one resolution component is utilized and the underlying multi-scale phenomena cannot be extracted. In the VMD-SSA-LSSVM model, VMD is used to separate the multifrequency components and thereby reduce the modelling difficulty. Consequently, the VMD-SSA-LSSVM model outperformed the standalone SSA-LSSVM.
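To illustrate the point that LSSVM is calibrated by solving a set of linear equations, the following minimal sketch fits an LSSVM regressor in the standard Suykens formulation with an RBF kernel. It is not the authors' implementation: the function names, the toy data, and the fixed values of the penalty parameter `gamma` and kernel width `sigma` are assumptions for illustration. In the paper these two parameters are tuned by SSA, which is not reproduced here.

```python
import math


def rbf(x, z, sigma):
    """RBF kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Build and solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the squared-error penalty
        A.append(row)
    solution = solve(A, [0.0] + list(y))
    return solution[0], solution[1:]  # bias b, dual weights alpha


def lssvm_predict(x, X, alpha, b, sigma):
    """Prediction f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))


# Toy data (not from the study): one input feature, four samples.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
gamma, sigma = 100.0, 1.0  # assumed values; the paper tunes both with SSA

b, alpha = lssvm_fit(X, y, gamma, sigma)
print(lssvm_predict([1.5], X, alpha, b, sigma))
```

The whole fit reduces to one dense linear solve of size n + 1, whereas a standard SVM requires an iterative quadratic programming solver. This is the efficiency argument made in the paragraph above.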

The superiority of VMD-SSA-LSSVM over the other three models can probably be ascribed to two factors: VMD decomposition and SSA-based parameter calibration of LSSVM. VMD decomposes the raw monthly runoff data into several subsequences and unravels the underlying multi-scale phenomena. Each subsequence is then simulated by SSA-LSSVM, which can capture its dynamic changes and reduces the modelling difficulty.
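The decompose, forecast and aggregate structure described above can be sketched as follows. This is only a structural illustration under loud assumptions: `decompose` is a simple moving-average split standing in for VMD, and `forecast_component` is a mean-of-recent-values rule standing in for SSA-LSSVM. Neither is the method used in the paper, and the series values are toy numbers.

```python
def decompose(series, window=3):
    """Stand-in for VMD: split a series into a smooth part and a remainder.

    The two components sum back to the original series.
    """
    smooth = []
    for t in range(len(series)):
        start = max(0, t - window + 1)
        chunk = series[start:t + 1]
        smooth.append(sum(chunk) / len(chunk))
    remainder = [s - m for s, m in zip(series, smooth)]
    return [smooth, remainder]


def forecast_component(component, lags=2):
    """Stand-in for SSA-LSSVM: one-step forecast as the mean of the last values."""
    recent = component[-lags:]
    return sum(recent) / len(recent)


def hybrid_forecast(series, decompose_fn=decompose, forecast_fn=forecast_component):
    """Forecast each subsequence separately, then sum the component forecasts."""
    components = decompose_fn(series)
    return sum(forecast_fn(c) for c in components)


# Toy monthly series (not data from the study).
series = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
parts = decompose(series)
print(hybrid_forecast(series))
```

Because the stand-in forecaster is linear, the aggregated forecast here coincides with applying the same rule to the raw series. Any benefit of decomposition arises only when a separate nonlinear model, such as SSA-LSSVM in the paper, is fitted to each subsequence. The `decompose_fn` and `forecast_fn` arguments mark where real components would be plugged in.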

Although the reliability and feasibility of VMD-SSA-LSSVM have been confirmed in this study, further research is needed. New and more effective decomposition algorithms should be employed to improve the quality of the subsequences. Meanwhile, the standard SSA used in this study should be modified to enhance the quality of parameter calibration. Furthermore, more machine learning techniques should be tested to enhance the forecast accuracy of the single model.

In this study, a hybrid model, VMD-SSA-LSSVM, has been proposed for monthly runoff forecasting. Firstly, the raw monthly runoff data were decomposed into several subsequences via VMD. Secondly, each subsequence was simulated by the SSA-LSSVM model, whose two parameters, the penalty parameter and the RBF kernel parameter, were optimized by SSA. Finally, the outputs of the SSA-LSSVM model for all subsequences were summed to give the final forecast results. Monthly runoff data obtained from two reservoirs located in China (the Xinfengjiang and Guangzhao Reservoirs) were utilized to verify the VMD-SSA-LSSVM model. To assess the performance of the proposed model, four frequently used evaluation indexes were utilized and three models, namely PSO-SVM, SSA-SVM and SSA-LSSVM, were employed for comparison. The results showed that the proposed model was superior to the three comparison models in respect of all four evaluation indexes. Therefore, the proposed model has been shown to be reliable, feasible and promising for enhancing the accuracy of monthly runoff forecasts.

This paper was supported by the National Natural Science Foundation of China (51709109).

Data cannot be made publicly available; readers should contact the corresponding author for details.

Ashrafi M., Chua L. H. C., Quek C. & Qin X. 2017 A fully-online Neuro-Fuzzy model for flow forecasting in basins with limited data. Journal of Hydrology 545, 424-435.

Dragomiretskiy K. & Zosso D. 2014 Variational mode decomposition. IEEE Transactions on Signal Processing 62 (3), 531-544.

Feng B. F., Xu Y. S., Zhang T. & Zhang X. 2021a Hydrological time series prediction by extreme learning machine and sparrow search algorithm. Water Supply 22 (3), 3143-3157. https://doi.org/10.2166/ws.2021.419 (online).

Ji Y., Dong H.-T., Xing Z. X., Sun M. X., Fu Q. & Liu D. 2020 Application of the decomposition-prediction-reconstruction framework to medium- and long-term runoff forecasting. Water Supply 21 (2), 696-709.

Liao S., Liu Z., Liu B., Cheng C., Jin X. & Zhao Z. 2020 Multistep-ahead daily inflow forecasting using the ERA-Interim reanalysis data set based on gradient-boosting regression trees. Hydrology and Earth System Sciences 24 (5), 2343-2363.

Lin Y., Wang D., Wang G., Qiu J., Long K., Du Y., Xie H., Wei Z., Shangguan W. & Dai Y. 2021 A hybrid deep learning algorithm and its application to streamflow prediction. Journal of Hydrology 601, 126636.

Shi X. H., Liang Y. C., Lee H. P., Lu C. & Wang L. M. 2005 An improved GA and a novel PSO-GA-based hybrid algorithm. Information Processing Letters 93 (5), 255-261.

Suykens J. A. K. & Vandewalle J. 1999 Least squares support vector machine classifiers. Neural Processing Letters 9 (3), 293-300.

Tan Q. F., Lei X. H., Wang X., Wang H., Wen X., Ji Y. & Kang A. Q. 2018 An adaptive middle and long-term runoff forecast model using EEMD-ANN hybrid approach. Journal of Hydrology 567, 767-780.

Xue J. & Shen B. 2020 A novel swarm intelligence optimization approach: sparrow search algorithm. Systems Science & Control Engineering 8 (1), 22-34.

Zhao X. H., Chen X., Xu Y. X., Xi D. J., Zhang Y. B. & Zheng X. Q. 2017 An EMD-based chaotic least squares support vector machine hybrid model for annual runoff forecasting. Water 9 (3), 153.

Zheng F., Zecchin A. C., Newman J. P., Maier H. R. & Dandy G. C. 2017 An adaptive convergence-trajectory controlled ant colony optimization algorithm with application to water distribution system design problems. IEEE Transactions on Evolutionary Computation 21 (5), 773-791.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).