The rainfall series exhibits uncertainty and non-stationarity. Improving the accuracy of rainfall prediction is of significant importance for flood prevention and mitigation. This study proposes a hybrid model and applies it to rainfall forecasting in the eastern region of Hubei Province. The proposed method first uses variational mode decomposition and improved complete ensemble empirical mode decomposition with adaptive noise to reduce high-frequency noise components. Then, particle swarm optimization and support vector machine are used for training and forecasting. Compared with other models, the prediction model after noise reduction shows better performance than the model without secondary decomposition, with results that are closer to the actual values. The proposed hybrid model outperforms other models, with the predicted trend more closely aligning with the actual data, and the value of R² of predictions for individual cities reaches 0.96. This study not only provides an efficient method for rainfall forecasting but also holds significant importance for understanding and addressing climate change.

  • A novel hybrid model based on secondary decomposition (VMD-ICEEMDAN) reduces non-stationarity and enhances rainfall forecasting accuracy.

  • The PSO-optimized SVM achieves superior parameter tuning, yielding a high R² of 0.96 and minimal prediction errors (MAE: 1.27).

  • Validated in eastern Hubei Province, the model provides robust tools for flood prevention and climate resilience in complex hydrological systems.

Heavy rainfall and flooding have always been among the main natural disasters that endanger people's lives and property, and a flood can cause significant property damage and loss of life. The relationship between rainfall, runoff, and flooding has been widely studied. Zaghloul et al. (2022) analyzed the long-term trends of river discharge and climate change in northern Canada and found that climate change has a significant impact on river flow. Additionally, Zhang et al. (2022) investigated the driving factors of flash floods triggered by heavy rainfall in the middle reaches of the Yellow River. They found that soil and water conservation projects on the Loess Plateau play a significant role in reducing peak flows and sediment during moderate- to low-intensity rainfall events but have a limited effect on floods and sediment during high-intensity flash floods. These studies provide valuable references for assessing the impacts of rainfall on water resources and flood risk. Improving the accuracy of rainfall forecasts and reducing the losses caused by heavy rainfall and flooding therefore remain major challenges. Traditional medium- and long-term forecasting methods fall into three categories: deterministic methods, stochastic methods, and hydrologic modeling. As local economic and social development and watershed development become increasingly prominent in the national economy, forecasting the total average precipitation at the local or watershed level is essential. Rainfall forecasts must meet strict criteria to support reservoir scheduling and to give the government a solid theoretical foundation for controlling and preventing floods. Furthermore, the rainfall process is a complex hydrological phenomenon influenced by various factors such as the atmosphere, the ocean, and surface features.
Meteorological conditions are inherently complex, variable, and highly dynamic. Due to the greater randomness of the rainfall process, it is difficult to accurately calculate the amount of rainfall in a given area over a certain period using one or several numerical methods. Therefore, improving the accuracy of rainfall prediction has become an important area of research. Currently, the methods commonly used for rainfall forecasting are primarily divided into two categories: traditional statistical methods and machine learning algorithms.

Widely used machine learning algorithms include the artificial neural network (ANN) and the support vector machine (SVM). Both can establish accurate nonlinear correlations in large amounts of heterogeneous data and generalize better than traditional analysis methods. Bagirov & Mahmood (2018) used SVM, multiple linear regression, K-nearest neighbor, and ANN to predict monthly rainfall from five meteorological variables at 24 geographically different weather stations in Australia from 1970 to 2014 and evaluated the performance of each model by comparing the observed and forecast rainfall. Farajzadeh & Alizadeh (2018) developed a model (wavelet-seasonal autoregressive integrated moving average with exogenous regressors (SARIMAX)-least-squares SVM (LSSVM), W-S-LSSVM) based on the discrete wavelet transform, the autoregressive integrated moving average with exogenous regressors (ARIMAX), and LSSVM for predicting the rainfall time series in the Lake Urmia Basin. Shin & Han (2001) combined general domain knowledge with specific case knowledge, focusing on optimal or near-optimal decision trees that represent the best level of combination between the two types of knowledge. Shaker Reddy & Sureshbabu (2020) used an enhanced multiple linear regression model on climatic data from the India Meteorological Department (IMD, Hyderabad) for the period 1901–2002 and obtained better accuracy in predicting precipitation than the existing models. Chaurasia et al. (2020) predicted daily rainfall using a multilayer-stacked long short-term memory (LSTM) model. Ghaderpour et al. (2023) analyzed rainfall time series for the Italian region, a task that is essential for water resource management and flood control, and provided insights into regional climate patterns and their implications for water resources management. In addition, Mulla et al. (2024) used the SARIMA model with exogenous variables to predict monthly rainfall, demonstrating the effectiveness of this approach in capturing seasonal variations and improving forecast accuracy. Nabipour et al. (2020) combined a novel natural optimization algorithm with ANN to predict short-term hydrological droughts and concluded that the hybrid model performs better and that the particle swarm optimization (PSO) algorithm outperforms other optimization algorithms.

SVM does not depend on the choice of model, generalizes well, has a certain degree of robustness, and offers advantages in practical problems involving small samples, nonlinearity, overfitting, and local minima, which makes it suitable for the study of weather conditions. Xiong & Zeng (2008) constructed a rainfall forecasting model based on the SVM method, compared it with the T213 forecasting model, and found that the SVM was superior.

It is difficult for a single forecasting model to fit the whole hydrological process well, so rainfall forecasting that integrates multiple forecasting methods has received attention from researchers, and coupled models that combine different methods are widely used in the field of forecasting. Xiong et al. (2016) used the Hodrick-Prescott (HP)-Elman Neural Network (ENN)-LSSVM model to predict rainfall on a farm in Jilin Province, and the results showed that the proposed model has higher forecasting accuracy than the LSSVM model. Yu et al. (2016) used the SVM method to construct classification and regression models that consider many forecasting factors, and the experimental results showed that both models forecast well.
Qin (2021) applied the SARIMA-CEEMD-LSTM model to 36 meteorological observation points in East and Central China, and the results showed that the model achieves a high goodness of fit when forecasting multiple points simultaneously. Zhao & Yu (2021) proposed a model based on complete empirical modal decomposition and least-squares SVM; its 'secondary decomposition' embodies the idea of 'refinement' in processing the original data and reduces the difficulty of predicting the complex components of the primary decomposition. Seo et al. (2018) and He et al. (2019) established combined runoff forecasting models based on the variational mode decomposition (VMD) algorithm. Sibtain et al. (2020) coupled SVM with a two-stage signal decomposition method to develop an improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN)-VMD-SVM runoff forecasting model and showed that the combined model outperformed single forecasting models. Lian (2022) established a combined runoff forecasting model based on complementary ensemble empirical mode decomposition (CEEMD), and the proposed combined model has higher accuracy. Rainfall evolution is a complex, multidimensional system driven by numerous uncertain factors. Nevertheless, coupled models that simulate and forecast rainfall by reducing the inherent non-stationarity of the rainfall series remain uncommon. A promising direction for regional rainfall forecasting is therefore to reduce the non-stationarity of the series through downscaling and to combine this with contemporary nonlinear forecasting methods, which can effectively improve forecasting accuracy.

In contrast to conventional primary decomposition, this study adopts a secondary decomposition approach. Compared with the traditional EMD method, ICEEMDAN applies an adaptive noise algorithm to the components obtained after VMD to reduce mode mixing and to ensure that the decomposed signals retain all the information of the original signal, thus improving the accuracy of the decomposition. ICEEMDAN also adapts better to the nonlinear and non-stationary characteristics of the signal, which helps reveal the hidden information in the rainfall sequence. The PSO algorithm is used to optimize the hyperparameters of the SVM model by automatically searching for the optimal parameter combination on the training set, thereby enhancing the model's performance. The resulting combined model, built on the PSO-optimized SVM, is applied to rainfall forecasting in the eastern part of Hubei Province.

Study region

The eastern part of Hubei Province, mainly comprising Huanggang, Huangshi, and Ezhou, is located at the southern foot of the Dabie Mountains along the central portion of the Yangtze River, between 114°25′ and 116°8′ East and 29°45′ and 31°35′ North. It borders Henan in the north, Anhui in the east, and Jiujiang in the south. It has a subtropical continental monsoon climate and lies in the Jianghuai microclimate zone, with four seasons that are distinct in light and heat. The average annual temperature in eastern Hubei is high, the number of hot days in summer is significantly high, the annual rainfall is low, and the annual sunshine hours are essentially normal. Local heavy rainfall and flooding occur during the rainy season, and periodic droughts as well as autumn droughts occur in summer and autumn. Therefore, it is necessary to forecast rainfall in eastern Hubei. In this paper, the rainfall data of three cities in eastern Hubei from 2000 to 2020 are selected to analyze the rainfall changes and their characteristics. The location of the study area is shown in Figure 1.
Figure 1

Location elevation map of the study area.


Datasets

The meteorological data for this paper were obtained from the National Meteorological Information Center of the China Meteorological Administration (http://data.cma.cn). We used the monthly rainfall data from the meteorological stations in Huanggang, Huangshi, and Ezhou for the period 2000–2020; this dataset has undergone strict quality checking. Because these stations are representative of their geographical locations and environmental conditions, they reflect the typical characteristics of the local climate regions. Owing to data turnover and gaps in the record, data were excluded if more than 0.1% of the values were missing, and the remaining missing values were filled by spline interpolation performed over 5-year periods. The study data are presented in Figure 2.
Figure 2

Monthly rainfall data of Huanggang, Huangshi and Ezhou from 2000 to 2020.


As shown in Figure 2, rainfall is uncertain and hard to predict, and the resulting rainfall sequence exhibits both linear and nonlinear features. This mixture makes it difficult for a single neural network model to produce good forecasts or to represent the rainfall sequence adequately. In addition, a single model cannot learn from some of the high-frequency abrupt changes in the data, and it is difficult to express the changing features of the rainfall series in different frequency domains. Thus, one of the most important ways to increase the precision of rainfall forecasting is to create a new hybrid model that combines nonlinear and linear functions.

Methods

Variational mode decomposition

VMD decomposes a signal into modes adaptively and non-recursively. In contrast to EMD, VMD is a fully non-recursive model that searches for the set of modal components and their corresponding center frequencies at the same time. Each mode is demodulated to baseband and then smoothed. To determine the center frequency and bandwidth of each component, VMD performs an iterative search for the optimal solution of the variational model (Xu et al. 2020). One benefit of the technique is that the number of modes can be chosen to suit the sequence at hand, so the set of modal components for a particular sequence can be determined from the real-world scenario. The optimal solution to the variational problem is reached by adaptively matching each mode's optimal center frequency and finite bandwidth during the search. This approach divides the frequency domain of the signal and effectively separates the intrinsic mode functions (IMFs) (Zhu et al. 2023).

The corresponding constrained variational problem is formulated by assuming that the original rainfall sequence $f$ is decomposed into $K$ modal components $u_k$. It guarantees that each decomposed sequence is a modal component with finite bandwidth and a defined center frequency, and that the sum of the estimated bandwidths of the modes is minimized (Pei et al. 2020; Li et al. 2023). The constraint is that the sum of all modes is equal to the original rainfall sequence:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$

To solve this constrained optimization problem, it is transformed into an unconstrained variational problem by introducing a quadratic penalty term and a Lagrange multiplier, which gives the augmented Lagrange function:

$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

where $f$ is the original rainfall data; $\delta(t)$ is the impulse function; $u_k$ are the $K$ modal components obtained after decomposition; $\omega_k$ are the center frequencies of the $K$ modal components; $*$ is the convolution operator; $\alpha$ is the penalty factor; and $\lambda$ is the Lagrange multiplier.

The alternating direction method of multipliers (ADMM) can then be used to iteratively search for the optimal solution of the unconstrained variational problem in Equation (2), which corresponds to the saddle point of the Lagrange function.
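As a point of reference, the iteration usually given for this solver in the standard VMD formulation updates each mode, its center frequency, and the multiplier in the frequency domain. In the sketch below, hats denote Fourier transforms of the signal f, the modes u_k, and the multiplier λ; ω_k are the center frequencies and α is the penalty factor. The iteration index n and the dual step size τ are notation introduced here as assumptions, not symbols taken from the text.

```latex
\hat{u}_k^{\,n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}
       {1 + 2\alpha\left(\omega - \omega_k^{\,n}\right)^2}

\omega_k^{\,n+1} =
  \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{\,n+1}(\omega)\right|^2 \, d\omega}
       {\int_0^{\infty} \left|\hat{u}_k^{\,n+1}(\omega)\right|^2 \, d\omega}

\hat{\lambda}^{\,n+1}(\omega) =
  \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{\,n+1}(\omega) \right)
```

The first update acts as a Wiener-type filter centered on the current center frequency, and the second re-estimates that frequency as the power-weighted mean of the mode's spectrum. The iteration typically stops once the relative change of the modes falls below a chosen tolerance.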

Improved complete ensemble empirical mode decomposition with adaptive noise

The EMD technique, originally developed by Huang et al. (1998), has been widely used for analyzing non-stationary signals. It gradually separates the fluctuations and trends of a signal at its various scales, producing a set of data sequences with different characteristic scales, each of which is referred to as an IMF. By adding pairs of positive and negative Gaussian white noise to the signal to be decomposed, the EEMD and CEEMD algorithms reduce the mode aliasing that affects EMD (Li et al. 2022). However, these two techniques always leave some white noise in the intrinsic modal components of the decomposed signals, which interferes with further processing and analysis. Colominas et al. (2014) addressed the transfer of white noise from high frequency to low frequency with an improved algorithm based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) (Nourani et al. 2011). Unlike CEEMDAN, which adds Gaussian white noise directly during the decomposition, the improved approach adds the kth IMF component of the white noise obtained by EMD. The ICEEMDAN decomposition process is as follows:

Three quantities are used below: $w^{(i)}$ denotes Gaussian white noise, $M(\cdot)$ denotes the local mean of a signal, and $E_k(\cdot)$ denotes the kth-order modal component produced by the EMD decomposition.

  • (1) Reconstruct the time series by adding i sets of Gaussian white noise to the original rainfall time series x (i = 1, 2, 3, …):
    $$x^{(i)} = x + \beta_0 E_1\left(w^{(i)}\right) \tag{3}$$
    where $x$ is the original rainfall time series; $\beta_0$ sets the signal-to-noise ratio of the added Gaussian noise and is proportional to the standard deviation of the original signal; $E_k(\cdot)$ is the kth-order mode of the noise; and $w^{(i)}$ is Gaussian white noise with zero mean and unit variance, i = 1, 2, 3, …
  • (2) Calculate the first-order residual from the local means of the noisy series, and then the first-order modal component:
    $$r_1 = \left\langle M\left(x^{(i)}\right) \right\rangle \tag{4}$$
    $$\mathrm{IMF}_1 = x - r_1 \tag{5}$$
  • (3) Add white noise to the first residual and average the local means to obtain the second-order residual and the second-order IMF component:
    $$r_2 = \left\langle M\left(r_1 + \beta_1 E_2\left(w^{(i)}\right)\right) \right\rangle \tag{6}$$
    $$\mathrm{IMF}_2 = r_1 - r_2 \tag{7}$$
    where $M(\cdot)$ is the local mean of the generated signal and $\langle \cdot \rangle$ denotes the average over the noise realizations.
  • (4) Calculate the kth residual and the kth modal component in the same way:
    $$r_k = \left\langle M\left(r_{k-1} + \beta_{k-1} E_k\left(w^{(i)}\right)\right) \right\rangle \tag{8}$$
    $$\mathrm{IMF}_k = r_{k-1} - r_k \tag{9}$$
  • (5) Repeat step (3) until the residual becomes a monotonic function; this yields the final residual S of the original rainfall time series x, and the derived IMF components are recorded as follows:
    (10)
    (11)

PSO-optimized SVM

  • (1) Particle swarm optimization

PSO is a swarm intelligence algorithm inspired by the social behavior of birds in flight and foraging, in which birds search for the global optimum through information exchange among individuals. Its benefits include a simple principle, easy implementation, and few parameters to set. PSO is initialized with a swarm of random particles (random solutions), and the best solution is then found by iteration. In each iteration, the particles update themselves by tracking two 'extreme values': the best position found by the particle itself (pbest) and the best position found by the whole swarm (gbest). After determining these two values, each particle updates its velocity and position using the following equations:

$$v_i^{t+1} = v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest - x_i^{t}\right) \tag{12}$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{13}$$

In the formula, $v_i^{t}$ and $x_i^{t}$ are the velocity and position of particle i at iteration t, $r_1$ and $r_2$ are random numbers between (0,1), and $c_1$ and $c_2$ represent the learning factors.
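The velocity and position updates described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function name `pso_minimize`, the inertia weight `w`, the swarm size, the iteration count, and the clamping of positions to the search bounds are all assumptions made for illustration (setting `w = 1` gives the basic update with no inertia term).

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic particle swarm.

    w is an assumed inertia weight; c1 and c2 are the learning factors.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward pbest + pull toward gbest.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position update, clamped to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def shifted_sphere(p):
    """Toy objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, best_val = pso_minimize(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a PSO-SVM setting, the objective would instead return a validation error for a candidate pair of SVM hyperparameters, and the bounds would be the search ranges chosen for them.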

  • (2) Support vector machine

SVM is a machine learning algorithm that is commonly utilized and can be used for linear and nonlinear classification problems, regression, and outlier detection. The fundamental idea is to maximize the distance between the closest sample point and the hyperplane by identifying a hyperplane in the feature space to divide the samples into distinct categories.

The SVM classification model relies on two crucial parameters: C and γ. The parameter C is the penalty factor; it affects the classification accuracy of the classifier and can be interpreted as the tolerance limit for errors (Yao et al. 2014). If C is too large, the model has a low tolerance for errors: the classification accuracy in the training phase is very high, while the classification accuracy in the testing phase is very low. If C is too small, the model tolerates too many errors and the classification accuracy is poor, rendering the trained classification model useless (Kharrat et al. 2014). An inappropriate value of C therefore results in a model with poor generalization ability. The parameter γ has a greater impact on the results than the penalty factor because it affects how the samples are divided in the feature space. Too large a value of γ leads to overfitting, while too small a value leads to underfitting; γ also affects the number of support vectors, which in turn affects the training speed of the model.
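To make the role of γ concrete, the sketch below evaluates a radial basis function (RBF) kernel for two points at a fixed distance. The kernel type is an assumption, since the text does not name the kernel used; γ is the usual width parameter of the RBF kernel, and `rbf_kernel` is a hypothetical helper written for this illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two points one unit apart, compared under a small and a large gamma.
similarity_small_gamma = rbf_kernel([0.0], [1.0], gamma=0.1)
similarity_large_gamma = rbf_kernel([0.0], [1.0], gamma=10.0)
```

With a small γ the two points still look similar to the model, so the fitted function is smooth and may underfit; with a large γ the similarity collapses toward zero, each sample influences only its immediate neighborhood, and the model can overfit.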

  • (3) PSO-optimized SVM

The values of C and γ have a great influence on SVM performance; different values result in different classification outcomes. To choose appropriate values of C and γ, the SVM model is optimized using PSO. The PSO-optimized SVM process is shown in Figure 3.

Combined modelling

After the modal components are obtained through VMD, a secondary decomposition with ICEEMDAN reveals the hidden information in the sequence more effectively. PSO relies on particle velocity to complete the search for suitable SVM parameters, so the forecasting accuracy of SVM can be effectively improved through the use of PSO. On this basis, a combined model for rainfall forecasting in Hubei's eastern region is established in this research. The model's procedure is as follows.
  • (1) The components of the first 228 months of each city are used as the training set of the PSO-optimized SVM, and the components of the last 24 months are used as the validation set.

  • (2) VMD splits the monthly average rainfall time series into high-frequency and low-frequency components, which are shown in Figure 4 as IMF1, IMF2, …, IMF7, and Res. The monthly average rainfall time series of each of the three cities in eastern Hubei Province contains 252 values.

  • (3) Decompose Res a second time using ICEEMDAN.

  • (4) The decomposed IMF data are input into the PSO-optimized SVM, and the model's parameters are tuned iteratively to optimize training on the training data, thereby increasing forecasting accuracy.

  • (5) Predict the IMF components from 2001 to 2018 with the tuned PSO-optimized SVM model.

  • (6) The predicted IMF components and trend terms are reconstructed to obtain the predicted values of each component for 2019–2020.

  • (7) The IMF components are first entered into the PSO-optimized SVM model separately to derive the corresponding predicted values.

  • (8) The components obtained from the secondary decomposition of the residual are then input into the PSO-optimized SVM model to produce the corresponding predicted values.

  • (9) The rainfall forecast is obtained by combining the forecasts of the two groups of components.
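The bookkeeping in this procedure (a 228/24 split of each 252-month component, followed by recombination of the per-component forecasts) can be sketched as follows. The helper names are hypothetical, and summing the component forecasts is an assumption based on the additive nature of the decomposition; the decomposition and the PSO-optimized SVM themselves are not shown.

```python
def split_train_validation(series, n_train=228):
    """Split a component into the first n_train points and the remainder."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must be between 1 and len(series) - 1")
    return series[:n_train], series[n_train:]


def reconstruct_forecast(component_forecasts):
    """Sum per-component forecasts point by point into one rainfall forecast."""
    lengths = {len(c) for c in component_forecasts}
    if len(lengths) != 1:
        raise ValueError("all component forecasts must have the same length")
    return [sum(values) for values in zip(*component_forecasts)]


# A 252-month placeholder component: 228 months for training, 24 for validation.
component = [float(i) for i in range(252)]
train, validation = split_train_validation(component)

# Three toy component forecasts over two months, recombined by summation.
combined = reconstruct_forecast([[1.0, 2.0], [0.5, 0.5], [-0.5, 1.0]])
```

Each IMF and each sub-component of the residual would be split and forecast in this way before the forecasts are recombined.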

Figure 3

PSO-optimized SVM flow chart.


Model evaluation criteria

The root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient (R²) are used as the evaluation metrics (Ghaderpour et al. 2023) (Table 1).

Table 1

Four evaluation rules

| Metric | Equation | Definition |
| --- | --- | --- |
| R² | | Correlation coefficient |
| MAE | $\frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$ | The average absolute forecast error of the n forecast results |
| MAPE | $\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$ | The average absolute percentage error |
| RMSE | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$ | The root mean square forecast error |

Here $y_i$ and $\hat{y}_i$ indicate the actual and predicted values at time i, respectively, and n is the sample size.
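A minimal sketch of how these four metrics could be computed is given below. The function name `evaluate_forecast` is hypothetical, MAE, MAPE, and RMSE follow their usual definitions, and R² is computed here as the coefficient of determination, which is an assumption because the text does not spell out which form of R² is used.

```python
import math


def evaluate_forecast(actual, predicted):
    """Return MAE, MAPE (%), RMSE, and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual))
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Assumed form: coefficient of determination, 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Toy values chosen for easy hand-checking, not data from the study.
scores = evaluate_forecast([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

Because MAPE divides by the actual value, months with near-zero rainfall can dominate it, which is one reason to report it alongside MAE and RMSE.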

VMD data decomposition

Following the VMD steps described above, the rainfall data of the three cities in the study area for the 21 years from 2000 to 2020 were decomposed. Figure 4 displays the outcomes of the decomposition, which produces seven modal components and one residual.
Figure 4

VMD of rainfall sequence.


Figure 4 shows that the rainfall data are separated into seven components and a residual. For each city, the amplitude and frequency of the IMF components gradually decrease and the wavelength lengthens as the number of decomposition layers increases. As a result, the sequences show some regularity, which indicates that VMD reduces the non-stationarity of the rainfall sequence.

ICEEMDAN secondary decomposition of the residual term

Owing to the nonlinearity and non-stationarity of the time series, the Res component of each city fluctuates more violently than the other components. Its sample entropy is calculated to be 1.5638, which indicates a complex component that carries rich information. Predicting it directly would weaken the forecasting accuracy of the overall model. Therefore, a secondary ICEEMDAN decomposition of Res is performed to raise the model's accuracy. The decomposition results are shown in Figure 5.
Figure 5

ICEEMDAN of the rainfall sequence.
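Sample entropy, used above to judge the complexity of the residual, can be computed along the following lines. This is a sketch under stated assumptions: the embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are conventional defaults rather than settings reported in the text, and `sample_entropy` is a hypothetical helper.

```python
import math
import random


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy of a series (m and r_factor are assumed defaults)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)       # matching pairs of length m
    a = count_matches(m + 1)   # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


# A perfectly periodic series versus a seeded random one.
regular = [0.0, 1.0] * 50
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]
```

A regular series scores at or near zero, while an irregular series scores higher, so a comparatively high value such as the one reported for Res suggests a component that is hard to predict directly.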


PSO-optimized SVM model forecasting

Forecasting is then carried out with the PSO-optimized SVM model. The 21 years of monthly rainfall data (252 months) are divided into a training set and a test set in a ratio of approximately 9:1: the first 228 data points are used as the training set, and the last 24 data points are used as the test set. This partitioning supports the construction of a well-trained model with good generalization ability that can provide reliable forecasts in practical applications. The forecasting results of the PSO-optimized SVM model after secondary decomposition are shown in Figure 6, and the real and predicted values are compared in Table 2.
Table 2

The relative error index of 2019–2020 data

| City | Year | Month | True value (mm) | Simulated value (mm) | Relative error (%) |
| --- | --- | --- | --- | --- | --- |
| Huanggang | 2019 | 1 | 62.57 | 64.66 | 3.23 |
| | | 2 | 110.41 | 119.51 | 7.61 |
| | | 3 | 77.95 | 82.91 | 5.98 |
| | | 4 | 134.21 | 137.92 | 2.69 |
| | | 5 | 154.42 | 160.44 | 3.75 |
| | | 6 | 236.48 | 243.51 | 2.89 |
| | | 7 | 89.05 | 93.84 | 5.10 |
| | | 8 | 33.20 | 34.96 | 5.34 |
| | | 9 | 3.05 | 3.13 | 2.56 |
| | | 10 | 29.00 | 29.17 | 0.58 |
| | | 11 | 47.31 | 49.63 | 4.67 |
| | | 12 | 31.15 | 31.57 | 1.33 |
| | 2020 | 1 | 118.69 | 115.04 | 3.17 |
| | | 2 | 77.46 | 79.12 | 2.10 |
| | | 3 | 125.09 | 131.43 | 4.82 |
| | | 4 | 61.30 | 62.11 | 1.30 |
| | | 5 | 138.03 | 141.98 | 2.78 |
| | | 6 | 437.57 | 444.58 | 1.58 |
| | | 7 | 623.90 | 632.66 | 1.38 |
| | | 8 | 150.15 | 185.57 | 2.92 |
| | | 9 | 218.32 | 221.85 | 1.59 |
| | | 10 | 122.17 | 123.43 | 1.03 |
| | | 11 | 72.61 | 77.77 | 6.64 |
| | | 12 | 17.32 | 18.12 | 4.42 |
| Huangshi | 2019 | 1 | 60.34 | 58.94 | 2.38 |
| | | 2 | 154.82 | 153.29 | 0.99 |
| | | 3 | 132.64 | 131.02 | 1.24 |
| | | 4 | 131.69 | 129.16 | 1.96 |
| | | 5 | 186.36 | 193.19 | 3.54 |
| | | 6 | 174.83 | 171.98 | 1.66 |
| | | 7 | 145.14 | 141.89 | 2.29 |
| | | 8 | 23.26 | 22.14 | 5.06 |
| | | 9 | 1.71 | 1.67 | 2.39 |
| | | 10 | 30.81 | 31.68 | 2.75 |
| | | 11 | 52.20 | 53.68 | 2.76 |
| | | 12 | 39.27 | 38.25 | 2.67 |
| | 2020 | 1 | 165.58 | 164.04 | 0.33 |
| | | 2 | 89.50 | 84.86 | 5.47 |
| | | 3 | 173.07 | 176.70 | 2.05 |
| | | 4 | 96.35 | 96.14 | 0.22 |
| | | 5 | 155.95 | 155.20 | 0.48 |
| | | 6 | 346.82 | 349.77 | 0.84 |
| | | 7 | 572.45 | 552.86 | 3.54 |
| | | 8 | 117.60 | 119.72 | 1.77 |
| | | 9 | 280.47 | 279.55 | 0.33 |
| | | 10 | 99.11 | 105.00 | 5.61 |
| | | 11 | 72.72 | 69.73 | 4.29 |
| | | 12 | 24.65 | 21.62 | 1.40 |
| Ezhou | 2019 | 1 | 60.65 | 51.66 | 1.74 |
| | | 2 | 112.44 | 111.81 | 0.56 |
| | | 3 | 95.27 | 101.96 | 6.56 |
| | | 4 | 128.51 | 131.38 | 2.18 |
| | | 5 | 169.44 | 176.33 | 1.10 |
| | | 6 | 232.68 | 269.60 | 2.89 |
| | | 7 | 102.57 | 103.09 | 0.50 |
| | | 8 | 23.12 | 23.46 | 1.45 |
| | | 9 | 2.10 | 2.19 | 4.11 |
| | | 10 | 31.28 | 32.90 | 4.92 |
| | | 11 | 52.26 | 53.81 | 2.88 |
| | | 12 | 30.80 | 31.49 | 2.19 |
| | 2020 | 1 | 134.89 | 135.71 | 0.60 |
| | | 2 | 88.28 | 92.55 | 4.61 |
| | | 3 | 137.88 | 140.25 | 1.68 |
| | | 4 | 73.37 | 79.98 | 8.26 |
| | | 5 | 140.76 | 142.27 | 1.06 |
| | | 6 | 378.16 | 380.36 | 0.58 |
| | | 7 | 566.39 | 580.05 | 2.36 |
| | | 8 | 123.73 | 139.89 | 1.16 |
| | | 9 | 228.77 | 231.82 | 1.32 |
| | | 10 | 124.08 | 126.46 | 1.88 |
| | | 11 | 73.77 | 79.51 | 7.22 |
| | | 12 | 21.10 | 21.77 | 2.08 |
Figure 6

Model forecasting curve.


Figure 6 and Table 2 show that, for the three cities in eastern Hubei, the predicted rainfall values for 2019–2020 are essentially in line with the actual values. Although the error in individual months exceeds 5%, this can be attributed to extreme weather in those months. Moreover, the results indicate that the combined model tracks both the trend and the volatility of rainfall closely. All relative errors are below 20%, with maximum and minimum relative errors of 8.26 and 0.22%, respectively. This suggests that dividing the series into high-frequency and low-frequency components and forecasting them separately is feasible, and that the combined forecasting model constructed in this paper has high accuracy and stability.
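As a worked check on the tabulated errors, the sketch below recomputes the relative error for the first two Huanggang rows of Table 2. Dividing the absolute error by the simulated value is an inference from those tabulated numbers, not a formula stated in the text, and `relative_error_percent` is a hypothetical helper.

```python
def relative_error_percent(true_value, simulated_value):
    """Absolute error as a percentage of the simulated value (assumed form)."""
    return abs(simulated_value - true_value) / simulated_value * 100.0


# Huanggang, January and February 2019 (true, simulated) from Table 2.
january = round(relative_error_percent(62.57, 64.66), 2)    # 3.23
february = round(relative_error_percent(110.41, 119.51), 2)  # 7.61
```

Normalizing by the true value instead would give slightly larger percentages for these two rows, so the choice of denominator matters when comparing against other studies.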

The combined model shows good results in rainfall forecasting in this paper. To illustrate the superiority of the model established in this paper, the single-decomposition VMD–PSO–SVM and ICEEMDAN–PSO–SVM models and the VMD–ICEEMDAN–BiLSTM (bi-directional LSTM, BiLSTM) model are compared with it, taking the city of Ezhou as an example. The predicted outcomes of the models are contrasted with the original sequence in Figure 7, and their errors are shown in Table 3 and Figure 8.
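Table 3

Comparison of the relative errors of the combined model with other models

| Year | Month | V-I-P-S | V-I-B | I-P-S | V-P-S |
|---|---|---|---|---|---|
| 2019 | 1 | 1.74 | 27.38 | 14.11 | 3.01 |
| | 2 | 0.56 | 13.03 | 8.69 | 3.43 |
| | 3 | 6.56 | 26.23 | 7.46 | 2.61 |
| | 4 | 2.18 | 29.62 | 22.06 | 1.18 |
| | 5 | 1.10 | 5.82 | 14.52 | 8.94 |
| | 6 | 2.89 | 30.22 | 11.04 | 2.10 |
| | 7 | 0.50 | 25.34 | 13.86 | 3.99 |
| | 8 | 1.45 | 25.72 | 25.32 | 7.53 |
| | 9 | 4.11 | 28.23 | 14.72 | 27.58 |
| | 10 | 4.92 | 30.14 | 10.94 | 21.36 |
| | 11 | 2.88 | 11.56 | 41.48 | 15.48 |
| | 12 | 2.19 | 22.38 | 13.05 | 33.64 |
| 2020 | 1 | 0.60 | 19.85 | 21.88 | 1.58 |
| | 2 | 4.61 | 38.00 | 20.95 | 8.83 |
| | 3 | 1.68 | 26.52 | 2.16 | 1.64 |
| | 4 | 8.26 | 17.96 | 28.27 | 6.15 |
| | 5 | 1.06 | 9.25 | 15.48 | 2.11 |
| | 6 | 0.58 | 11.09 | 19.18 | 1.71 |
| | 7 | 2.36 | 23.96 | 24.53 | 2.15 |
| | 8 | 1.16 | 10.87 | 37.25 | 0.72 |
| | 9 | 1.32 | 23.23 | 28.62 | 6.61 |
| | 10 | 1.88 | 5.95 | 27.76 | 3.14 |
| | 11 | 7.22 | 17.27 | 34.43 | 16.20 |
| | 12 | 2.08 | 72.30 | 1.44 | 16.31 |

V-I-P-S: VMD-ICEEMDAN-PSO-SVM; V-I-B: VMD-ICEEMDAN-BiLSTM; I-P-S: ICEEMDAN-PSO-SVM; V-P-S: VMD-PSO-SVM. Relative errors are given in %.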
Figure 7

Comparison between the forecasting results of various algorithms and the original data.

Figure 8

Model relative error comparison chart.

Figure 9

Taylor distribution for different sampling methods.


Figure 8 shows that the prediction models without secondary decomposition forecast poorly. This is particularly evident at the peaks, where the predictions deviate significantly from the original values. After noise reduction, the prediction model performs better than the models without secondary decomposition: its predictions align with the actual values, and the predicted trend closely matches the observed trend. The PSO-optimized SVM model is also more accurate than the other models. The best-performing combined model developed in this work follows the trend of the original rainfall sequence, and its predictions are basically in line with the original data, suggesting that forecasting after secondary decomposition is effective.

The evaluation metrics of the four models are shown in Table 4.

Table 4

Data of indicators for the evaluation of the four models

| Model | MAE | RMSE | MAPE (%) | R² |
|---|---|---|---|---|
| VMD–ICEEMDAN–PSO–SVM | 1.27 | 15.04 | 3.30 | 0.96 |
| VMD–ICEEMDAN–BiLSTM | 3.59 | 32.88 | 6.54 | 0.93 |
| ICEEMDAN–PSO–SVM | 6.24 | 53.62 | 10.67 | 0.91 |
| VMD–PSO–SVM | 12.70 | 75.69 | 18.13 | 0.86 |
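The four indicators in Table 4 can be computed from paired observed and predicted values. The sketch below uses the common textbook definitions of MAE, RMSE, MAPE, and R²; the exact formulas used in this study are not restated in this section, so these definitions are an assumption, as are the function name `evaluate` and the sample numbers, which are illustrative only.

```python
import math


def evaluate(observed, predicted):
    """Return MAE, RMSE, MAPE (%) and R² for two equal-length sequences.

    Assumes no observed value is zero (needed for MAPE) and that the
    observed values are not all identical (needed for R²).
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Illustrative values only, not data from this study.
scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print(scores)
```

Lower MAE, RMSE, and MAPE and a higher R² indicate a better fit, which is how the four models in Table 4 are ranked.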

SVM has been applied to a range of meteorological forecasting problems. Some studies have used a multi-timescale SVM technique, which accounts for variations in meteorological factors at multiple stations and times, to predict local short-term rainfall, and this strategy increases the accuracy of short-term rainfall forecasts. Nevertheless, the SVM model is sensitive to parameter choice and has a limited capacity to handle massive amounts of data. In this study, these drawbacks are compensated for by applying secondary decomposition and parameter optimization to the SVM model.

Figure 9 and Table 4 further assess the performance of the four forecasting models. They show that the errors of the combined model in this study are far smaller than those of the other models, which indicates that the combined model is highly accurate and can be used for rainfall forecasting. One limitation is that, owing to methodological constraints, the assumption of consistency in the rainfall series is violated, and extreme rainfall in the cities introduces considerable uncertainty.

This study takes three cities in eastern Hubei Province as its study area. An SVM model, applied after secondary decomposition and optimized using PSO, is used to predict rainfall, and measured data are used to verify the model parameters. The principal conclusions are as follows:

  • (1) Utilizing the concept of decomposition and integration, the original rainfall time series are first decomposed. Then the combined network model makes individual forecasts for each sub-sequence, and finally, the forecasting results are summed. Compared with individual network models, the forecasting performance is significantly improved, and the accuracy of predicting the rainfall time series is notably higher in the model after secondary decomposition than in the model without it.

  • (2) Sequence decomposition is a fundamental step in predicting complex rainfall sequences. In this work, VMD and ICEEMDAN secondary decomposition are used to break down the original data. ICEEMDAN adds a finite amount of adaptive white noise at each decomposition stage, which lowers the reconstruction error and reduces the number of iterations. The components closely follow the original sequence, indicating that the decomposition yields smoother, less volatile, and more stable sub-sequences, which provides better conditions for model coupling. Following secondary decomposition and forecast reconstruction, the coupled model significantly increases accuracy. The RMSE of the combined model is 15.04, the MAE is 1.27, the MAPE is 3.30%, and the coefficient of determination is 0.96; these are the smallest errors and the largest coefficient of determination among the models compared. In comparison to the other three models, the combined forecasting model gives the best results, which shows that it is effective and feasible to apply this model to rainfall forecasting.

  • (3) The combined model is more accurate than the other three models in predicting rainfall in eastern Hubei Province, according to a comparative analysis of the four models. However, parameter tuning remains a complex process and can be further optimized using computer algorithms. In this study, the model focuses only on rainfall forecasting; more research is needed to evaluate how well it performs for other variables, such as flow and climate.
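The decompose, forecast, and sum workflow described in conclusion (1) can be sketched as follows. This is only an illustration of the data flow: a trailing moving average stands in for VMD and ICEEMDAN, and a repeat-the-last-value rule stands in for the PSO-optimized SVM, so none of the functions below are the methods used in this study. All function names, window sizes, and the sample series are hypothetical.

```python
def trailing_mean(series, window):
    """Trailing moving average: a crude low-pass filter (stand-in, not VMD)."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def split(series, window):
    """Split a series into a smooth part and a remainder that sum back to it."""
    low = trailing_mean(series, window)
    high = [x - l for x, l in zip(series, low)]
    return low, high


def secondary_decomposition(series, window1=4, window2=2):
    """Primary split, then a second split of the high-frequency remainder."""
    low, high = split(series, window1)            # primary decomposition
    high_low, high_high = split(high, window2)    # secondary decomposition
    return [low, high_low, high_high]


def persistence_forecast(component, horizon):
    """Placeholder forecaster: repeat the last value of the component."""
    return [component[-1]] * horizon


def forecast(series, horizon):
    """Forecast each sub-sequence separately, then sum the forecasts."""
    components = secondary_decomposition(series)
    per_component = [persistence_forecast(c, horizon) for c in components]
    return [sum(values) for values in zip(*per_component)]


# Hypothetical monthly rainfall values, for illustration only.
rain = [60.0, 112.0, 95.0, 128.0, 169.0, 232.0, 102.0, 23.0]
print(forecast(rain, horizon=3))
```

Because the sub-sequences sum back to the original series, any forecaster can be swapped in per component without changing the final summation step.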

  • (1) This study adopts secondary decomposition in place of the traditional single (primary) decomposition. Compared with the traditional EMD method, ICEEMDAN uses an adaptive noise algorithm to remove noise from the components obtained by the initial VMD decomposition, which reduces the mode-mixing problem in the decomposition process. In addition, the complete ensemble strategy ensures that the decomposed signal retains all the information of the original signal, which improves the accuracy of the decomposition.

  • (2) A combined model based on secondary decomposition and an optimized SVM machine learning algorithm is built for regional rainfall forecasting. The hyperparameters of the SVM affect its performance, and to prevent the SVM from overfitting the training data, the best combination of parameters must be found for the specific data pattern. Manual settings depend heavily on the user's expertise and experience and can result in poor model performance. Therefore, the PSO algorithm is used to search automatically for the best combination of parameters on the training set, thereby improving the model results.
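The automatic parameter search described in point (2) can be illustrated with a basic particle swarm optimizer. The sketch below is a generic textbook PSO written with the standard library only; it is not the implementation used in this study. The objective `validation_error` is a hypothetical stand-in with a known minimum. In a real setting it would train an SVM with the candidate parameters and return its validation error. The parameter names (log10 of a penalty term and of a kernel width), the bounds, and the swarm settings are likewise assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def validation_error(params):
    """Hypothetical stand-in for an SVM validation error; minimum at (1, -2)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, best_val = pso_minimize(validation_error, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
print(best, best_val)
```

Searching in log space is a common choice for SVM parameters because useful values often span several orders of magnitude.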

This study was supported by the North China University of Water Resources and Hydropower (NCUWH) Graduate Student Innovation Ability Enhancement Project (NCWUYC-202416019); the National Natural Science Foundation of China (Grant No. 51779093); the Support Program for Scientific and Technological Innovation Teams in Universities of Henan Province (24IRTSTHN012), and the Key Scientific Research Project of Universities of Henan Province (CN) (Grant No. 17A570004).

All authors contributed to the study's conception and design. Writing and editing were done by X.Z. and W.C.; preliminary data collection was conducted by Y.Z., J.Z., and H.R. All authors read and approved the final manuscript.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Bagirov, A. M. & Mahmood, A. (2018) A comparative assessment of models to predict monthly rainfall in Australia, Water Resources Management, 32, 1777–1794. https://doi.org/10.1007/s11269-018-1903-y.
Chaurasia, K., Tarun, U., Sarala, G. V. & Soni, K. (2020) AI based prediction of daily rainfall from satellite observation for disaster management. In SPIE Future Sensing Technologies (Vol. 11525, pp. 176–188). SPIE.
Colominas, M. A., Schlotthauer, G. & Torres, M. E. (2014) Improved complete ensemble EMD: a suitable tool for biomedical signal processing, Biomedical Signal Processing and Control, 14, 19–29.
Ghaderpour, E., Dadkhah, H., Dabiri, H., Bozzano, F., Scarascia Mugnozza, G. & Mazzanti, P. (2023) Precipitation time series analysis and forecasting for Italian regions, Engineering Proceedings, 39 (1), 23. https://doi.org/10.3390/engproc2023039023.
Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., Yen, N.-C., Tung, C. C. & Liu, H. H. (1998) The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, 454 (1971), 903–995.
Kharrat, A., BenMessaoud, M. & Abid, M. (2014) Brain tumour diagnostic segmentation based on optimal texture features and support vector machine classifier, International Journal of Signal and Imaging Systems Engineering, 7 (2), 65–74.
Nourani, V., Kisi, Ö. & Komasi, M. (2011) Two hybrid artificial intelligence approaches for modeling rainfall–runoff process, Journal of Hydrology, 402 (1–2), 41–59.
Qin, W. Z. (2021) A SARIMA-CEEMD-LSTM model for predicting long-term precipitation, Qingdao University, 2021. doi:10.27262/d.cnki.gqdau.2021.001858.
Shaker Reddy, P. C. & Sureshbabu, A. (2020) An enhanced multiple linear regression model for seasonal rainfall prediction, International Journal of Sensors Wireless Communications and Control, 10 (4), 473–483.
Shin, K. S. & Han, I. (2001) A case-based approach using inductive indexing for corporate bond rating, Decision Support Systems, 32 (1), 41–52.
Sibtain, M., Li, X., Nabi, G., Azam, M. I. & Bashir, H. (2020) Development of a three-stage hybrid model by utilizing a two-stage signal decomposition methodology and machine learning approach to predict monthly runoff at Swat River Basin, Pakistan, Discrete Dynamics in Nature and Society, 2020 (1), 7345676.
Xiong, Q. F. & Zeng, X. Q. (2008) Application and improvement of SVM method in precipitation forecast, Meteor. Mon, 34 (12), 90–95.
Xiong, W., Chen, X. & Li, H. (2016) Rainfall prediction based on the HP-ENN-LSSVM model, Journal of Yuxi Normal University, 32 (04), 51–56.
Xu, B., Yang, F. & Li, Y. (2020) Application of two ensemble learning algorithms in medium- and long-term runoff forecasting, Hydropower, 46 (4), 21–34.
Yao, B., Hu, P., Zhang, M. & Jin, M. (2014) A support vector machine with the tabu search algorithm for freeway incident detection, International Journal of Applied Mathematics and Computer Science, 24 (2), 397–404.
Yu, Q., Xu, C., Li, S., Liu, H., Song, Y. & Liu, X. O. (2016) Application of fuzzy clustering algorithm and support vector machine to short-term forecasting of PV power. In: Proceedings of the CSU-EPSA, Vol. 28, No. 12, pp. 115–118.
Zaghloul, M. S., Ghaderpour, E., Dastour, H., Farjad, B., Gupta, A., Eum, H., Achari, G. & Hassan, Q. K. (2022) Long term trend analysis of river flow and climate in Northern Canada, Hydrology, 9 (11), 197. https://doi.org/10.3390/hydrology9110197.
Zhang, P., Sun, W., Xiao, P., Yao, W. & Liu, G. (2022) Driving factors of heavy rainfall causing flash floods in the Middle Reaches of the Yellow River: a case study in the Wuding River Basin, China, Sustainability, 14 (13), 8004. https://doi.org/10.3390/su14138004.
Zhao, Z. & Yu, Y.-b. (2021) Ultra-short-term wind speed prediction based on quadratic decomposition and IGWO-LSSVR model, Electric Power Science and Engineering, 37 (5), 18–25.
Zhu, G., Zeng, X., Gong, Z., Gao, Z., Ji, R., Zeng, Y. & Lu, C. (2023) Monitoring robot machine tool sate via neural ODE and BP-GA, Measurement Science and Technology, 35 (3), 036110.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).