A novel rainfall prediction model based on CEEMDAN-PSO-ELM coupled model

Rainfall prediction is an important guide for water resources management and ecological protection. Rainfall changes result from multiple interacting factors and show pronounced uncertainty and nonlinearity. Building on the ability of Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) to decompose non-stationary signals, and using Particle Swarm Optimization (PSO) to optimize the input weights and thresholds of the Extreme Learning Machine (ELM), which effectively improves the prediction accuracy of ELM, a rainfall prediction model based on CEEMDAN-PSO-ELM is constructed. The model is applied to monthly rainfall prediction for Zhongwei City. The results show that the CEEMDAN-PSO-ELM coupled model has high prediction accuracy: the mean absolute error (MAE) is 1.29, the relative percentage error (RPE) is 0.45, the root mean square error (RMSE) is 1.43 and the Nash efficiency coefficient (NSE) is 0.93. Compared with the deep Long Short-Term Memory (LSTM) model, the PSO-ELM coupled model and the ELM model, it has clear advantages in hydrological simulation and prediction.


INTRODUCTION
Short-term changes in rainfall often cause droughts or floods and affect local economic development to varying degrees. Scientific and accurate prediction of rainfall can provide technical support for regional ecological and environmental protection and for flood prevention and mitigation (Chen et al. 2017; Cheng et al. 2019). Because it is influenced by a variety of uncertain factors, rainfall is markedly nonlinear and uncertain, and prediction accuracy is often low (Zhang et al. 2021). Using new models and methods to improve the accuracy of rainfall prediction has become both a focus and a difficulty of research in water resources management and control (Peng et al. 2015; Kumar et al. 2019). Scholars at home and abroad have conducted extensive research on rainfall prediction methods, with fruitful results. Wmra et al. (2021) used machine learning methods to predict rainfall in Malaysia. Deepak Kumar et al. (Jinle et al. 2019) analyzed long hydrological time series and used a Recurrent Neural Network (RNN) model to predict rainfall. Wang et al. (2021) used CEEMDAN for data preprocessing and coupled it with the Adaptive Metropolis-Markov Chain Monte Carlo algorithm (AM-MCMC); the results showed that the method was effective and improved model prediction accuracy. Liu et al. (2019) used an improved BP model for short-term rainfall prediction. Yadav et al. (2017) used ELM for short-term rainfall prediction, compared it with other classical algorithms and concluded that ELM exhibited small correlation coefficients and errors. Anupam & Pani (2020) used ELM to simulate short-term floods in India to provide a scientific basis for local government decision making. Zhang et al. (2019) used PSO to optimize the ESN model and predicted urban precipitation with good results.
To improve the prediction accuracy of rainfall models, Hao & Zhu (2021) introduced the slope and quantile into the existing gray waveform prediction model and constructed a non-equal-interval slope gray waveform prediction model. As the above shows, researchers at home and abroad have studied rainfall prediction mainly with single traditional machine learning methods or neural networks, and for ELM most of them simply adjust the model parameters to improve accuracy. There are fewer studies that process the time series before prediction to reduce its non-stationarity and then optimize the model. CEEMDAN, from the field of signal processing, has powerful decomposition capability; it is an improved method based on the Hilbert-Huang transform (HHT) and is suitable for decomposing nonlinear and non-stationary signals. Its most important feature is that it adaptively extracts each component of the signal and its trend, adding Gaussian white noise to change the extreme-point characteristics so that the decomposed components are more continuous. To alleviate the influence of mode mixing on the decomposition results and to solve the problem that the sum of the EEMD decomposition results does not equal the original sequence, the CEEMDAN algorithm adds adaptive white noise at each stage of the decomposition. This effectively alleviates mode mixing, eliminates the influence of the artificially added white noise on the completeness of the original data, improves the completeness of the decomposition and reduces the reconstruction cost. PSO does not depend on problem-specific information, works directly with real-valued solutions and is broadly applicable. Its biggest advantage is that it is simple and easy to implement.
Traditional feedforward neural networks suffer from slow training, a tendency to fall into local minima and sensitivity to the choice of learning rate. By comparison, the ELM method offers fast learning and good generalization performance. This paper constructs a rainfall prediction model based on the CEEMDAN-PSO-ELM coupled model and applies it to monthly precipitation prediction in Zhongwei City.

CEEMDAN
The rainfall time series is decomposed by CEEMDAN to obtain multiple IMF components and a trend term (Trend), from which the periodic trend can be further determined. Based on the spectrum of each decomposition layer, the effective signal is extracted and the high-frequency random noise is eliminated to smooth the series. The main decomposition steps are as follows (Zhou et al. 2021).
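(1) Let $x(t)$ be the original series data. White noise is added to form $x(t) + \varepsilon_0 v_i(t)$ $(i = 1, 2, \ldots, I)$, each realization is decomposed by EMD, and the overall mean of the first modes is defined as $IMF_1(t)$:
$$IMF_1(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(x(t) + \varepsilon_0 v_i(t)\big) \tag{1}$$
(2) The first-order residual $r_1(t)$:
$$r_1(t) = x(t) - IMF_1(t) \tag{2}$$
(3) $r_1(t) + \varepsilon_1 E_1(v_i(t))$ is decomposed until its first mode is obtained, at which point the overall mean is defined as $IMF_2(t)$ and calculated as follows:
$$IMF_2(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(r_1(t) + \varepsilon_1 E_1(v_i(t))\big) \tag{3}$$
(4) Calculate the $k$-th residual $r_k(t)$, $k = 2, \ldots, K$:
$$r_k(t) = r_{k-1}(t) - IMF_k(t) \tag{4}$$
(5) The first mode of $r_k(t) + \varepsilon_k E_k(v_i(t))$ is extracted and the overall mean $IMF_{k+1}(t)$ is calculated as follows:
$$IMF_{k+1}(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(r_k(t) + \varepsilon_k E_k(v_i(t))\big) \tag{5}$$
(6) Repeat steps (4) and (5) until the residual can no longer be decomposed, which gives the final residual $R(t)$:
$$R(t) = x(t) - \sum_{k=1}^{K} IMF_k(t) \tag{6}$$
Then the final expression of the original sequence $x(t)$ is:
$$x(t) = \sum_{k=1}^{K} IMF_k(t) + R(t) \tag{7}$$
where $I$ is the total number of noise-added trials, $v_i(t)$ is zero-mean Gaussian white noise with unit variance, $\varepsilon_k$ is the noise factor controlling the signal-to-noise ratio of the added noise to the original signal, $E_k(\cdot)$ is the operator that extracts the $k$-th mode obtained by EMD, and $K$ is the total number of $IMF_k(t)$.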
The implementation of the CEEMDAN algorithm and Equation (7) show that the decomposition process is complete and allows an accurate reconstruction of the original signal. The algorithm can select an appropriate signal-to-noise ratio at each modal decomposition stage by means of the coefficient $\varepsilon_k$.
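A short derivation may help show why the reconstruction in Equation (7) is exact. It assumes the usual CEEMDAN residual recursion, $r_1(t) = x(t) - IMF_1(t)$ and $r_k(t) = r_{k-1}(t) - IMF_k(t)$, with the final residual $R(t) = r_K(t)$. This recursion is an assumption wherever the source equations are not legible.

```latex
\begin{aligned}
R(t) = r_K(t) &= r_{K-1}(t) - IMF_K(t) \\
              &= r_{K-2}(t) - IMF_{K-1}(t) - IMF_K(t) \\
              &= \cdots = x(t) - \sum_{k=1}^{K} IMF_k(t), \\
\Rightarrow\quad \sum_{k=1}^{K} IMF_k(t) + R(t) &= x(t).
\end{aligned}
```

Under that assumption the added noise never enters the sum, because each IMF is subtracted from the residual it was extracted from.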

PSO-ELM
The PSO algorithm was proposed by Kennedy and Eberhart as a population-based search algorithm designed to avoid becoming trapped in locally optimal solutions (Wang & Fan 2010). In PSO, each candidate solution is represented by a particle in the solution space. Each particle is given a random initial velocity and position, from which an initial fitness value is computed. Each particle then moves according to its own state and the state of the other particles, recording the best position it has found along the way; the best position found by the swarm is taken as the optimal solution.
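The particle update described above can be illustrated with a minimal global-best PSO in plain Python. This is a sketch under stated assumptions, not the authors' implementation: the inertia weight `w = 0.6`, the search box `[-bound, bound]`, the fixed seed and the `sphere` test function are all illustrative choices. The population size of 20, the 200 iterations and the acceleration factors 1.3 and 1.8 mirror settings quoted elsewhere in the paper.

```python
import random

def pso_minimize(fitness, dim, n_particles=20, n_iter=200,
                 w=0.6, c1=1.3, c2=1.8, bound=1.0, seed=0):
    """Minimise `fitness` over [-bound, bound]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    # Random initial position and velocity for every particle.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    # Each particle remembers the best position it has visited so far.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Illustrative test function with its minimum (0) at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso_minimize(sphere, dim=3)
print(best_val)
```

In the coupled model, `fitness` would be the training RMSE of an ELM rather than the toy `sphere` function.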
ELM (Luo et al. 2018) is a learning algorithm built on feed-forward neural networks. Compared with traditional neural networks, it is easy to use, its parameters are simple to select, and it generalizes well. For $N$ training samples $(x_j, y_j)$, if the number of neurons in the hidden layer is $n$, the standard form is as follows:
$$F_p(x_j) = \sum_{i=1}^{n} \beta_i \, G(\omega_i \cdot x_j + b_i), \quad j = 1, 2, \ldots, N \tag{8}$$
According to the zero-error approximation principle, there are parameters $\omega_i$, $b_i$ and $\beta_i$ in this feedforward neural network such that:
$$\sum_{i=1}^{n} \beta_i \, G(\omega_i \cdot x_j + b_i) = y_j, \quad j = 1, 2, \ldots, N \tag{9}$$
where $F_p(x)$ is the output vector, $\beta_i$ is the weight vector between the $i$th hidden layer node and the output layer node, $G(x)$ is the excitation function, $\omega_i$ is the weight vector between the $i$th hidden layer node and the input layer node, and $b_i$ is the neuron bias in the hidden layer. Equation (9) can be reduced to $H\beta = Y$, with $H$ being the output matrix of the hidden layer. $H$ is determined once the input weights and hidden-layer thresholds of the ELM have been randomly assigned. $\beta$ can then be obtained by solving the equation
$$\beta = H^{+} Y \tag{10}$$
where $H^{+}$ denotes the Moore-Penrose generalised inverse of the output matrix of the hidden layer.
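The training rule $H\beta = Y$ can be sketched in a few lines of NumPy. The sketch assumes a sigmoid excitation function, input weights and biases drawn uniformly from [-1, 1], and a synthetic sine curve as training data. None of these choices comes from the paper, and `elm_fit` and `elm_predict` are hypothetical helper names.

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    """Train a single-hidden-layer ELM; returns (omega, b, beta)."""
    omega = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                    # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                                 # beta = H^+ Y
    return omega, b, beta

def elm_predict(X, omega, b, beta):
    """Evaluate the trained network on new inputs."""
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))
    return H @ beta

# Synthetic illustration only: fit one period of a sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
omega, b, beta = elm_fit(X, y, n_hidden=20, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, omega, b, beta) - y) ** 2)))
print(rmse)
```

Only `beta` is solved for; `omega` and `b` stay at their random values, which is exactly the part that PSO is later asked to optimize.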

CEEMDAN-PSO-ELM
ELM randomly generates the connection weights between the input and hidden layers and the thresholds of the hidden-layer neurons. These need no manual adjustment during training and fitting, and only the number of hidden-layer neurons has to be set to obtain the unique optimal solution (Tiwari & Adamowski 2014). In the coupled model, PSO searches for these input weights and thresholds, and the RMSE of the training samples is used as the fitness value. The smaller the fitness value, the better the input weights and thresholds and the better the prediction. The process is shown in Figure 1.
The time series of rainfall is $U(t)$ $(t = 1, 2, \ldots, N)$ and the basic process of prediction is as follows (Gu et al. 2011; Zhu et al. 2017; Wang et al. 2019).
(1) Decompose the rainfall time series with the CEEMDAN algorithm to obtain $n$ IMF components $C_i(t)$ $(t = 1, 2, \ldots, N)$ and a trend term (Trend).
(2) Divide the decomposed IMF components and the residual into a training set and a prediction set, initialize the PSO-ELM coupled model and select appropriate parameters.
(3) Establish the PSO-ELM neural network topology.
(4) Generate the population, including the weight matrix $\omega$ and the hidden-layer thresholds.
(5) Obtain the optimal parameters. The sequences are trained with the PSO-ELM coupled model to obtain the optimal model parameters. The main PSO settings are the maximum number of iterations $T = 200$, the population size $M = 20$, the learning factors $c_1 = c_2 = 2$, two random numbers $r_1$ and $r_2$ generated in the range (0, 1), and the particle dimension $D$.
(6) Take the RMSE of the training set as the fitness function and calculate the fitness value of each particle.
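Steps (4) to (6) can be sketched as a fitness function that decodes one particle into ELM input weights and thresholds and returns the training RMSE. The particle layout (weights first, then thresholds), the sigmoid activation, the three-input synthetic data and the name `elm_fitness` are assumptions made for illustration. Only the population size of 20 and the use of training RMSE as the fitness value come from the text.

```python
import numpy as np

def elm_fitness(particle, X, y, n_hidden):
    """Training RMSE of an ELM whose input weights and thresholds come from `particle`."""
    n_in = X.shape[1]
    # Assumed layout: first n_in * n_hidden entries are weights, the rest thresholds.
    omega = particle[: n_in * n_hidden].reshape(n_in, n_hidden)
    b = particle[n_in * n_hidden:]
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # output weights by pseudo-inverse
    return float(np.sqrt(np.mean((H @ beta - y) ** 2)))

# Synthetic illustration only: 60 samples with 3 inputs each.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
y = X.sum(axis=1)

n_hidden = 5
D = X.shape[1] * n_hidden + n_hidden             # particle dimension D
swarm = rng.uniform(-1.0, 1.0, size=(20, D))     # population of M = 20 particles
scores = [elm_fitness(p, X, y, n_hidden) for p in swarm]
best = int(np.argmin(scores))
print(best, scores[best])
```

A PSO loop would then move the particles so as to lower these scores, keeping the particle with the smallest training RMSE.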

Model validation
Water Supply Vol 00 No 0, 5 Uncorrected Proof
To verify whether the CEEMDAN-PSO-ELM coupled model fits better than the single ELM model and the PSO-ELM model, four metrics computed from the actual and predicted values, MAE, RPE, RMSE and NSE, are introduced to assess model accuracy quantitatively. The metrics are defined as in Hochreiter & Schmidhuber (1997) and Zaher et al. (2018). In these definitions, the sample size is the number of forecast years, $C_o(i)$ and $C_f(i)$ represent the actual and predicted rainfall values, respectively, and $\bar{C}_o$ denotes the average value of actual rainfall.
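As a concrete reference, the four metrics can be written out in plain Python. MAE, RMSE and NSE follow their standard definitions. The RPE formula is an assumption, taken here as the mean absolute relative error in percent, because the paper's own equation is not legible in this copy. The sample values are invented for illustration.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash efficiency coefficient: 1 minus the error variance over the observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rpe(obs, pred):
    """ASSUMED definition: mean absolute relative error, in percent."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)

# Invented values, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(mae(observed, predicted), rpe(observed, predicted),
      rmse(observed, predicted), nse(observed, predicted))
```

An NSE of 1 means a perfect fit, while the three error metrics are better the closer they are to 0.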

Data sources
The data come from the water resources bulletin of Zhongwei City, and the rainfall variation is shown in Figure 2. In this study, the monthly rainfall data of Zhongwei City for 18 years (1999-2016) were used as training samples, and the rainfall for the two years 2017-2018 was predicted. Figure 2 shows that the data follow an overall decreasing trend. The decrease is accompanied by considerable volatility and an uneven distribution, which supports the choice of the CEEMDAN decomposition method and the PSO optimization algorithm to process the data.

PSO-ELM
To analyse and verify the applicability and feasibility of the ELM model, rainfall data from 1999 to 2016 were used as training samples to predict the rainfall values for 2017-2018, and the predictions were compared with the measured values. The ELM prediction curve is shown in Figure 3; as Figure 5 also shows, the overall error of the ELM prediction is large. The results of the ELM prediction after optimization by the PSO algorithm are shown in Figure 4. The quantitative evaluation of the PSO-optimised ELM model using the evaluation indices is shown in Table 1. For a more intuitive comparison of the errors, they are plotted as a stacked graph in Figure 5.

From Table 1 and Figure 5, it is clear that model accuracy improves significantly after PSO optimization. The RPE of PSO-ELM is 2.77, which is 2.28 times more accurate than ELM. Meanwhile, the MAE of PSO-ELM is 3.56, which is 1.57 times more accurate than ELM. The reason is that ELM generates its input weights and hidden-layer thresholds randomly during training; to address this shortcoming, the global search capability of the PSO algorithm is used to solve for the optimal input weights and hidden-layer thresholds. The RMSE between the training output and the desired output is used as the fitness function of the PSO algorithm to improve prediction accuracy.

CEEMDAN-PSO-ELM
To improve the prediction accuracy, CEEMDAN decomposition of the Zhongwei rainfall series from 1999 to 2018 was carried out, with the noise variance taken as 0.1 and the number of noise additions taken as 200. Six IMF components and one trend term were obtained. The decomposition diagram obtained by CEEMDAN decomposition is shown in Figure 6.

Different IMFs reflect the oscillation characteristics of the rainfall sequence at different scales. As can be seen from Figure 6, IMF 1 has the highest frequency and the shortest wavelength. Compared with IMF 1, the components IMF 2 and IMF 3 are lower in frequency and longer in wavelength. IMF 1, IMF 2 and IMF 3 are defined as high-frequency components, and the remaining components are low-frequency components. IMF 4 and IMF 5 have smaller amplitudes, lower frequencies and longer wavelengths than the first two components. The component IMF 6 has the smallest amplitude and the lowest frequency. The trend term represents the overall trend of the rainfall time series in Zhongwei: the series first increases and then gradually decreases over the whole observation period.
After pre-processing the original rainfall time series, the PSO-ELM coupled model was used to predict the low-frequency components, the high-frequency components and the trend term separately. The specific parameters of the PSO-ELM coupled model were selected as follows: the population size was 20, the acceleration factor $c_1$ was 1.3 and $c_2$ was 1.8. The number of neurons was determined by trial and error. Three hidden-layer neurons were selected for the high-frequency part of PSO-ELM, and 10 hidden-layer neurons were selected for both the low-frequency and trend-term parts. The final calculation results of the model are shown in Figure 7.
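The component-wise setup can be summarised in a small sketch. The grouping of IMF 1 to IMF 3 as high-frequency and the hidden-neuron counts are taken from the text. Recombining the component forecasts by simple summation is an assumption based on the completeness property of the decomposition, and `hidden_neurons`, `recombine` and the forecast numbers are hypothetical.

```python
# Hidden-neuron counts per component group, as reported in the text.
HIDDEN = {"high": 3, "low": 10, "trend": 10}

# Group membership of each decomposed component.
GROUP = {
    "IMF1": "high", "IMF2": "high", "IMF3": "high",
    "IMF4": "low", "IMF5": "low", "IMF6": "low",
    "Trend": "trend",
}

def hidden_neurons(component):
    """Number of hidden-layer neurons used for the given component."""
    return HIDDEN[GROUP[component]]

def recombine(component_forecasts):
    """ASSUMED recombination: sum the per-component forecasts month by month."""
    return [sum(values) for values in zip(*component_forecasts.values())]

# Invented two-month forecasts for three components, for illustration only.
forecasts = {"IMF1": [1, -2], "IMF4": [3, 4], "Trend": [10, 11]}
print(hidden_neurons("IMF2"), recombine(forecasts))
```

Each component would get its own PSO-ELM model, and the final rainfall forecast is read off after recombination.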
In Table 2, the maximum, minimum and average absolute errors of IMF 1 are relatively large, at 8.21 mm, 1.35 mm and 5.69 mm respectively, which indicates that this component is more strongly non-stationary. The prediction results are therefore more easily affected by the high-frequency signals. The maximum, minimum and average absolute errors of the trend term are small, at 0.10 mm, 0.00 mm and 0.05 mm respectively, which indicates that the lower-frequency signal is relatively smooth and has a smaller impact on the prediction error. After the time series is decomposed by CEEMDAN, the IMF components and the trend term tend to be smooth, and these smoother data allow more accurate prediction by the PSO-optimized ELM model. Figure 8 and Table 3 show that the predicted values are basically consistent with the measured values, the overall relative error is small, the prediction pass rate is high, and the CEEMDAN-PSO-ELM coupled model has high prediction accuracy. The RPE of the CEEMDAN-PSO-ELM coupled model is 0.45, which is 6.16 times more accurate than PSO-ELM. Meanwhile, the MAE of the CEEMDAN-PSO-ELM coupled model is 1.29, which is 2.76 times more accurate than PSO-ELM.
Here it can also be seen that the pre-processing of the data has a very significant impact on the improvement of the model prediction accuracy.
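The "times more accurate" figures can be checked with simple arithmetic if they are read as ratios of the error values. That reading is an interpretation, though the quoted numbers bear it out. The values below are the MAE and RPE reported in the text for the two models.

```python
# Error values quoted in the text.
pso_elm = {"MAE": 3.56, "RPE": 2.77}
ceemdan_pso_elm = {"MAE": 1.29, "RPE": 0.45}

# Ratio of PSO-ELM error to CEEMDAN-PSO-ELM error, rounded to two decimals.
ratios = {name: round(pso_elm[name] / ceemdan_pso_elm[name], 2) for name in pso_elm}
print(ratios)
```

The ratios come out at 2.76 for MAE and 6.16 for RPE, matching the figures stated above.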

DISCUSSION
To further verify the predictive effect of the CEEMDAN-PSO-ELM coupled model, the LSTM, ELM and PSO-ELM models were also used to predict the rainfall of Zhongwei. The results are shown in Figure 9 and Table 4.
The comparative analysis shows that the predicted values of the CEEMDAN-PSO-ELM coupled model agree most closely with the measured values, with the smallest error and an NSE closest to 1. The MAE, RPE, RMSE and NSE of ELM are 4.92 mm, 6.32%, 5.87 mm and 0.79, respectively, which are more accurate than those of LSTM. This indicates that ELM has a stronger learning ability than LSTM for rainfall, which indirectly supports the choice of ELM. Meanwhile, the MAE, RPE, RMSE and NSE of the CEEMDAN-PSO-ELM coupled model are 1.29 mm, 0.45%, 1.43 mm and 0.93, respectively, and the accuracy is at least 7 times better compared with ELM. Compared with the LSTM, ELM and PSO-ELM models, the results of the CEEMDAN-PSO-ELM coupled model are better, which indicates that it is more advantageous in rainfall prediction. The reason is as follows. CEEMDAN decomposes the original sequence into components of different frequencies, and the magnitude of the prediction error of these components determines the final error of the model. After processing by CEEMDAN, the data are smoother. At this point, to address the shortcoming that the randomly generated input weights and hidden-layer thresholds of ELM lead to poor prediction accuracy and stability, the global search capability of the PSO algorithm is used to solve for the optimal input weights and hidden-layer thresholds. The RMSE between the output for the training samples and the desired output is used as the fitness function of the PSO algorithm to improve the prediction accuracy of ELM, and the overall prediction effect is not greatly affected even if the prediction for some years is poor.

CONCLUSION
(1) CEEMDAN can extract and express the non-stationary oscillation features associated with the rainfall time series, which can reduce the interference of feature information and the non-stationarity of the series at different scales, and PSO can optimize the input weights and thresholds of ELM, which greatly improves the model prediction accuracy.
(2) For the CEEMDAN-PSO-ELM coupled model, the MAE is 1.29 mm, the RPE is 0.45%, the RMSE is 1.43 mm and the NSE is 0.93. The model accuracy is high, and the prediction effect is better than that of the LSTM model, the ELM model and the PSO-ELM coupled model.
(3) The established CEEMDAN-PSO-ELM coupled model has high generalization ability as well as prediction accuracy, and can provide a reference for the formulation of relevant environmental management policies.
Since many factors influence rainfall and the driving mechanism is complex, a change in any factor will affect the prediction performance of the model. How to base prediction on the physical mechanism of rainfall change and integrate multiple factors is the focus of future research.