Abstract
Scientific precipitation prediction is of great value in guiding regional water resources development and utilization, agricultural production, and drought and flood control. Precipitation is a nonlinear, nonstationary time series with significant stochasticity and uncertainty. In this paper, a coupled model combining complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and long short-term memory (LSTM) is developed for predicting annual precipitation in Zhengzhou city, China. It is compared with a single LSTM model, an ensemble empirical mode decomposition (EEMD)–LSTM model, a complementary ensemble empirical mode decomposition (CEEMD)–LSTM model, a CEEMDAN–autoregressive integrated moving average (ARIMA) model, and a CEEMDAN–recurrent neural network (RNN) model. The results show that the mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R2) of the coupled CEEMDAN–LSTM model are 2.69%, 17.37 mm, and 0.9863, respectively. Its prediction accuracy is higher than that of the other five models, indicating that the proposed model has high prediction accuracy and can be used for annual precipitation forecasting in Zhengzhou city.
HIGHLIGHTS
The CEEMDAN method adds adaptive Gaussian white noise in the decomposition, which effectively reduces the reconstruction error.
LSTM can effectively mitigate the gradient vanishing and explosion problems of recurrent neural networks and has significant advantages in handling long time series data.
The CEEMDAN–LSTM coupled model has good learning ability in dealing with nonlinear and nonstationary hydrological factor sequences.
INTRODUCTION
Precipitation is a common natural weather phenomenon, and its amount affects agricultural production and drought and flood prevention and control (Zhao et al. 2011). Too much precipitation in a short period of time is the main factor that induces natural disasters such as floods and mudslides, while too little precipitation can cause drought, resulting in reduced yields in agriculture, forestry, and fisheries and direct economic losses to society. Because precipitation is a nonlinear and nonstationary time series (Ma & Liu 2007), a high-precision prediction method is needed to identify its changing patterns in a timely manner. Accurate and scientific prediction of precipitation can then provide information support for decision-making in social production activities.
Statistical methods and machine learning are common data-driven time series forecasting methods. Among statistical methods, the most popular in recent years are those based on the autoregressive integrated moving average (ARIMA) model (Li et al. 2020; Coban et al. 2021). Studies have shown that when the time series is linear or nearly linear, statistical models can produce satisfactory prediction results (Djamal & Priatna 2020; Xu et al. 2020). However, when the time series is nonlinear, the results of statistical models are often unsatisfactory. In view of this, machine learning methods suitable for modeling complex nonlinear processes are widely used in time series prediction. However, traditional machine learning methods cannot capture the memory of the input sequence (Shen 2018), which affects the prediction accuracy. Recurrent neural networks (RNNs) in deep learning overcome this drawback and are widely used in several fields (Zhang & Cui 2020; Guo et al. 2021; Wang et al. 2021).
The traditional RNN shows great potential in data processing (Andrew & Jon 2019); however, it can only remember part of the sequence, and once the time series becomes too long, its prediction or classification accuracy is reduced. Therefore, a special kind of RNN was proposed, namely the long short-term memory (LSTM) model (Sepp & Jürgen 1997). Yuan et al. (2021) constructed an LSTM-based rolling forecast method for typhoon intensity. The comparison results show that the LSTM model using the optimal predictor performs best and has the smallest prediction error. Shen et al. (2020) studied LSTM-based summer precipitation prediction in China. Judged jointly on the mean value and root mean square error (RMSE), the experimental results show that LSTM networks have some advantages over other models for seasonal prediction. Liu et al. (2020) used LSTM to predict monthly precipitation on the Qinghai–Tibet Plateau. This study analyzed the effect of different prediction lengths on the prediction accuracy of the models and found that the accuracy of the other models decreased as the prediction length increased, whereas the prediction accuracy of LSTM was higher than that of the RNN, non-linear autoregressive, sparrow search algorithm, and ARIMA models at every prediction length. Kang et al. (2020) selected a multi-input variable LSTM model to predict daily precipitation. After determining the correlation between meteorological variables and precipitation, nine important input variables were selected to construct the LSTM model, and the experimental results showed that the LSTM was suitable for precipitation prediction. In terms of algorithm improvement, Wang et al. (2019) used a wavelet–LSTM model for ultra-short-term probabilistic prediction of wind power. The results show that combining wavelet decomposition with deep learning methods improves both the prediction accuracy and the interval reliability of probabilistic prediction.
Currently, LSTM-related prediction models are widely studied, and progress has been made in both model improvement and model application. However, none of the existing LSTM-based prediction models has solved problems such as the prediction lag of LSTM, which arises mainly because the weights must be adjusted cyclically during LSTM training, and this can lead to gradient disappearance or explosion. Precipitation is a nonlinear, nonstationary complex dynamic system with a large amount of noise and outliers in its time series (Buttafuoco & Conforti 2021; Li et al. 2021). For annual precipitation prediction, a single-input, single-output method is used, and the series needs to be denoised to improve the prediction accuracy. Empirical mode decomposition (EMD) decomposes the signal adaptively, without prior analysis of the signal, and thereby overcomes the inability of wavelet decomposition to accommodate multi-segment nonstationary sequences. The components obtained from EMD are called intrinsic mode functions (IMFs). The IMFs are ordered from high to low frequency and represent the characteristics of the sequence at different scales, which allows EMD to be used for time series noise reduction. Ensemble empirical mode decomposition (EEMD) addresses the mode mixing problem of EMD by adding Gaussian white noise to the signal and performing multiple EMD decompositions. Complementary ensemble empirical mode decomposition (CEEMD) is similar to EEMD, with the difference that EEMD only adds white noise in each trial. CEEMD takes the original signal plus white noise and the original signal minus white noise, passes both signals through EMD, and averages the results, so that the noise added to the signal cancels out.
Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) adds a finite amount of adaptive white noise at each decomposition stage, which reduces the number of iterations and improves the reconstruction accuracy compared with CEEMD. To address the prediction lag of machine learning-based time series prediction models, the CEEMDAN algorithm is used to decompose the time series into intrinsic mode series representing features at different scales. This makes it easier to identify the characteristics of the low-frequency and highly periodic series, for which LSTM has higher prediction accuracy. Chen et al. (2022) used the CEEMDAN–LSTM model to predict the road-bed temperature in seasonal freezing areas, and the results showed that the CEEMDAN–LSTM model has high predictive power for the temperature time series. Cao et al. (2019) used the CEEMDAN–LSTM model to predict stock market prices and compared the prediction results with those of a single LSTM model, a support vector machine, a multi-layer perceptron, and other hybrid models, showing that the CEEMDAN–LSTM model has the highest prediction accuracy and performs well in predicting financial time series. In this paper, we propose a CEEMDAN–LSTM annual precipitation prediction model in which the time series is decomposed with CEEMDAN and pre-processed before being fed into the prediction model for training. The results show that the proposed coupled model can effectively improve the prediction accuracy of annual precipitation, providing a new approach to precipitation prediction.
THEORY AND METHODS
CEEMDAN
The CEEMDAN method adds adaptive Gaussian white noise during the decomposition (Liang et al. 2019). In CEEMDAN, each group of components is averaged immediately after it is obtained, order by order. This avoids the difficulty of aligning the final ensemble average in CEEMD, which arises because the IMFs obtained from the individual noisy signals differ from one another. It also prevents a poor decomposition at one IMF order from propagating to the next order, so that CEEMDAN effectively reduces the reconstruction error and improves the computational efficiency. The specific decomposition steps of CEEMDAN are as follows:
- (1)
- (2)
- (3)
- (4)
- (5)
- (6)
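The six steps are commonly written as the recursion sketched below. This follows the widely used formulation of CEEMDAN and is not necessarily the exact notation of this paper. All symbols are assumptions introduced for illustration: x(t) is the precipitation series, w^i(t) is the i-th of I zero-mean, unit-variance Gaussian white noise realizations, E_k(·) returns the k-th mode obtained by EMD, ε_k sets the noise amplitude at stage k, and R(t) is the final residual.

```latex
\begin{align}
% (1) first mode: average the first EMD mode of I noisy copies of the signal
\widetilde{\mathrm{IMF}}_1(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\bigl(x(t) + \varepsilon_0\, w^{i}(t)\bigr) \\
% (2) first residual
r_1(t) &= x(t) - \widetilde{\mathrm{IMF}}_1(t) \\
% (3) second mode: noise enters through its own first EMD mode
\widetilde{\mathrm{IMF}}_2(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\bigl(r_1(t) + \varepsilon_1\, E_1(w^{i}(t))\bigr) \\
% (4) k-th residual
r_k(t) &= r_{k-1}(t) - \widetilde{\mathrm{IMF}}_k(t), \qquad k = 2,\dots,K \\
% (5) (k+1)-th mode
\widetilde{\mathrm{IMF}}_{k+1}(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\bigl(r_k(t) + \varepsilon_k\, E_k(w^{i}(t))\bigr) \\
% (6) stop when the residual can no longer be decomposed; exact reconstruction
x(t) &= \sum_{k=1}^{K} \widetilde{\mathrm{IMF}}_k(t) + R(t)
\end{align}
```

Because each mode is averaged before the next residual is formed, the modes and the final residual add back to x(t) exactly, which is the sense in which the reconstruction error is reduced.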
LSTM neural network
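As a reference point, the gate equations of a standard LSTM cell are sketched below. The notation is an assumption and may differ from the symbols used by the authors: x_t is the input at time t, h_t the hidden state, C_t the cell state, σ the logistic sigmoid, ⊙ the element-wise product, and W and b the learned weights and biases of each gate.

```latex
\begin{align}
f_t &= \sigma\bigl(W_f\,[h_{t-1}, x_t] + b_f\bigr) && \text{forget gate} \\
i_t &= \sigma\bigl(W_i\,[h_{t-1}, x_t] + b_i\bigr) && \text{input gate} \\
\tilde{C}_t &= \tanh\bigl(W_C\,[h_{t-1}, x_t] + b_C\bigr) && \text{candidate cell state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\bigl(W_o\,[h_{t-1}, x_t] + b_o\bigr) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state}
\end{align}
```

The additive update of C_t is what lets information persist over long sequences, since the previous cell state is scaled by the forget gate rather than repeatedly passed through a squashing nonlinearity.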




















CEEMDAN–LSTM coupled model
Because precipitation is nonlinear and nonstationary, it is difficult to find its variation pattern using a single model. Therefore, the CEEMDAN–LSTM coupled model is chosen in this paper to improve the accuracy of the predicted values. The flow chart of the CEEMDAN–LSTM coupled model is shown in Figure 2. First, multiple IMF components are obtained by CEEMDAN decomposition, and these IMF components are comparatively stable. Then, the LSTM model is used to predict each IMF component, and the forecasts are summed to obtain the predicted value of precipitation. The steps are as follows:
- (1) CEEMDAN decomposition: the precipitation time series is first decomposed in MATLAB to obtain several IMF components and a residual.
- (2) Dividing training and validation data: the LSTM training data are the 60 years of precipitation data from 1951 to 2010 in Zhengzhou city, and the validation data are the precipitation data from 2011 to 2020.
- (3) LSTM training: the LSTM parameters are tuned iteratively until the accuracy remains at a high level, so that the prediction is optimal.
- (4) Analysis of forecast results: the forecasts of the IMF components are summed and the relative error is calculated.
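The decompose, forecast, and sum structure of these steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the components are synthetic stand-ins rather than CEEMDAN output for Zhengzhou, and the hypothetical `persistence` function is only a placeholder for the per-component LSTM.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical forecaster signature: (training values, horizon) -> forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def persistence(train: Sequence[float], horizon: int) -> List[float]:
    """Placeholder for the per-component LSTM: repeat the last training value."""
    return [train[-1]] * horizon


def forecast_sum(components: Sequence[Sequence[float]], n_train: int,
                 forecaster: Forecaster) -> List[float]:
    """Forecast every component on its own, then sum the forecasts."""
    horizon = len(components[0]) - n_train
    total = [0.0] * horizon
    for comp in components:
        pred = forecaster(comp[:n_train], horizon)
        total = [t + p for t, p in zip(total, pred)]
    return total


years = list(range(1951, 2021))               # 70 annual values
n_train = sum(1 for y in years if y <= 2010)  # 1951-2010 -> training years

# Synthetic stand-ins for one IMF and the residual (NOT the Zhengzhou data).
imf = [50.0 * math.sin(2 * math.pi * k / 7) for k in range(len(years))]
residual = [640.0 - 0.8 * k for k in range(len(years))]
series = [a + b for a, b in zip(imf, residual)]  # components sum to the series

predicted = forecast_sum([imf, residual], n_train, persistence)
actual = series[n_train:]                        # 2011-2020 validation values
relative_error = [abs(p - a) / a * 100 for p, a in zip(predicted, actual)]
```

Passing the forecaster as an argument keeps the summation logic independent of the model, so the placeholder could be swapped for a trained network per component without changing `forecast_sum`.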
Model evaluation
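The evaluation indicators used in this paper are the relative error (RE), MAPE, RMSE, and R2. The sketch below computes them for the CEEMDAN–LSTM results, using the actual and simulated values for 2011 to 2020 taken from the comparison table in the Discussion. The MAPE and RMSE follow their usual definitions. The R2 function is an assumption: it uses the squared Pearson correlation coefficient, chosen here because it comes out close to the value reported for this model.

```python
import math
from typing import Sequence


def relative_error(actual: float, simulated: float) -> float:
    """Relative error of one forecast, in percent."""
    return abs(simulated - actual) / actual * 100


def mape(actual: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return sum(relative_error(a, s) for a, s in zip(actual, simulated)) / len(actual)


def rmse(actual: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error, in the unit of the data (mm here)."""
    return math.sqrt(sum((s - a) ** 2 for a, s in zip(actual, simulated)) / len(actual))


def r_squared(actual: Sequence[float], simulated: Sequence[float]) -> float:
    """Assumed definition: squared Pearson correlation of actual and simulated."""
    mean_a = sum(actual) / len(actual)
    mean_s = sum(simulated) / len(simulated)
    cov = sum((a - mean_a) * (s - mean_s) for a, s in zip(actual, simulated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_a * var_s)


# Annual precipitation 2011-2020 (mm): actual values and CEEMDAN-LSTM results.
actual = [706.5, 501.0, 353.2, 551.6, 689.1, 833.0, 598.8, 609.5, 480.2, 583.3]
ceemdan_lstm = [722.0, 533.9, 320.3, 556.4, 685.0, 835.8, 602.2, 623.8, 485.3, 601.3]

mape_value = mape(actual, ceemdan_lstm)
rmse_value = rmse(actual, ceemdan_lstm)
r2_value = r_squared(actual, ceemdan_lstm)
print(f"MAPE = {mape_value:.2f} %, RMSE = {rmse_value:.2f} mm, R2 = {r2_value:.4f}")
```

With these inputs, the MAPE and RMSE agree with the reported 2.69% and 17.37 mm to two decimal places.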



CASE STUDY
Overview of the study area
Data sources
Precipitation variation process in Zhengzhou city from 1951 to 2020.
Results of decomposition of precipitation series
CEEMDAN decomposition of precipitation during 1951–2020 in Zhengzhou city.
LSTM model construction


















Simulation results analysis
DISCUSSION
Comparison of simulation results between the CEEMDAN–LSTM coupled model and other models
Year | Actual result (mm) | LSTM simulated (mm) | LSTM RE (%) | EEMD-LSTM simulated (mm) | EEMD-LSTM RE (%) | CEEMD-LSTM simulated (mm) | CEEMD-LSTM RE (%) | CEEMDAN–LSTM simulated (mm) | CEEMDAN–LSTM RE (%) | CEEMDAN–ARIMA simulated (mm) | CEEMDAN–ARIMA RE (%) | CEEMDAN–RNN simulated (mm) | CEEMDAN–RNN RE (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2011 | 706.5 | 596.9 | 15.51 | 606.4 | 14.17 | 782.3 | 10.73 | 722.00 | 2.19 | 741.8 | 4.93 | 680 | 3.75 |
2012 | 501.0 | 579.6 | 15.69 | 588.2 | 17.41 | 574.5 | 14.67 | 533.90 | 6.57 | 545.7 | 8.82 | 520.3 | 3.85 |
2013 | 353.2 | 439.4 | 24.41 | 446.2 | 26.33 | 374.6 | 6.06 | 320.30 | 9.31 | 346.9 | 1.84 | 369 | 4.47 |
2014 | 551.6 | 441.1 | 20.03 | 430.8 | 21.90 | 485.6 | 11.97 | 556.40 | 0.87 | 583.2 | 5.80 | 530.4 | 3.84 |
2015 | 689.1 | 581.8 | 15.57 | 588.2 | 14.64 | 604.1 | 12.33 | 685.00 | 0.59 | 644.7 | 6.53 | 710.7 | 3.13 |
2016 | 833.0 | 624.0 | 25.09 | 633.5 | 23.95 | 695.3 | 16.53 | 835.80 | 0.34 | 893.5 | 7.20 | 809.5 | 2.82 |
2017 | 598.8 | 523.7 | 12.54 | 532.3 | 11.11 | 539.7 | 9.87 | 602.20 | 0.57 | 562.2 | 6.01 | 615 | 2.71 |
2018 | 609.5 | 494.1 | 18.93 | 500.9 | 17.82 | 544.5 | 10.66 | 623.80 | 2.35 | 587.8 | 3.61 | 592 | 2.87 |
2019 | 480.2 | 532.8 | 10.95 | 522.5 | 8.81 | 455.4 | 5.16 | 485.30 | 1.06 | 462.0 | 3.75 | 459.7 | 4.27 |
2020 | 583.3 | 526.7 | 9.70 | 533.1 | 8.61 | 550.0 | 5.71 | 601.30 | 3.09 | 607.4 | 4.11 | 603.1 | 3.39 |
MAPE (%) | | | 16.84 | | 16.47 | | 10.37 | | 2.69 | | 5.26 | | 3.51 |
Comparison of errors of different models
Models | MAPE (%) | RMSE (mm) | R2 |
---|---|---|---|
LSTM | 16.84 | 80.36 | 0.4837 |
EEMD-LSTM | 16.47 | 74.62 | 0.5274 |
CEEMD-LSTM | 10.37 | 50.79 | 0.7348 |
CEEMDAN–ARIMA | 5.26 | 35.55 | 0.9424 |
CEEMDAN–RNN | 3.51 | 20.43 | 0.9722 |
CEEMDAN–LSTM | 2.69 | 17.37 | 0.9863 |
The comparison of the errors of the different models is shown in Table 2. Every model that adds a decomposition step to the single LSTM model has a lower RMSE and MAPE and an R2 closer to 1. Decomposing the original precipitation series improves the stationarity of the data, indicating that decomposing the precipitation data before prediction and then reconstructing the forecast from the component predictions has obvious advantages over a single neural network prediction model. CEEMDAN separates the different fluctuation features in the precipitation series more completely, alleviates the mode mixing and the residual noise in the reconstructed series that arise during decomposition, and reduces the reconstruction error. The LSTM model, in turn, mitigates two problems of the traditional RNN: the gradient disappearance problem, in which weight or bias gradients that are too small sharply slow the adjustment of the network parameters, and the gradient explosion problem, in which weight or bias gradients become too large. This is consistent with the lower errors of the CEEMDAN–LSTM model relative to the CEEMDAN–ARIMA and CEEMDAN–RNN models.
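The gradient argument can be made concrete with a scalar toy calculation. This is a simplified sketch under stated assumptions, not an analysis of the trained networks: backpropagation through many identical steps is assumed to scale the gradient by the same factor at every step, the LSTM cell-state path is approximated by a constant forget gate while the gates' own dependence on the hidden state is ignored, and the factor values are hypothetical.

```python
def gradient_factor(per_step: float, steps: int) -> float:
    """Gradient scale after backpropagating through `steps` identical steps."""
    return per_step ** steps


steps = 60  # hypothetical sequence length

# Simple RNN: the per-step factor is set by the recurrent weight and activation slope.
rnn_vanishing = gradient_factor(0.9, steps)   # factor below 1: gradient shrinks towards 0
rnn_exploding = gradient_factor(1.1, steps)   # factor above 1: gradient grows rapidly

# LSTM cell state: the per-step factor is the forget gate, which can stay close to 1.
lstm_cell_path = gradient_factor(0.99, steps)

print(rnn_vanishing, rnn_exploding, lstm_cell_path)
```

In this toy setting the simple RNN gradient either collapses or blows up over 60 steps, while the path through the cell state keeps a usable magnitude as long as the forget gate stays near 1.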
CONCLUSION
- (1) The CEEMDAN–LSTM coupled model has higher prediction accuracy than the single LSTM neural network model. The CEEMDAN method decomposes the nonlinear and nonstationary time series into a set of more stable components, which makes it easier for the model to identify the change characteristics of each component and helps to improve the prediction accuracy. Compared with EEMD and CEEMD, CEEMDAN greatly reduces the large reconstruction errors caused by mode mixing and residual noise. The resulting components are smoother and show a certain regularity and trend, which provides a good basis for the LSTM neural network model to make predictions.
- (2) In this paper, the advantages of CEEMDAN and LSTM are combined to construct a coupled CEEMDAN–LSTM model to predict the annual precipitation series of Zhengzhou city. The results show that the precipitation of Zhengzhou city from 1951 to 2020 fluctuates strongly and has an overall decreasing trend. The MAPE of the model prediction results is 2.69%, the RMSE is 17.37 mm, and the R2 is 0.9863, which meets the requirement of prediction accuracy, and the prediction results are reliable.
- (3) The CEEMDAN–LSTM coupled model combines an effective decomposition algorithm with a stable and fast prediction model and has broad application prospects. Based on the results, the CEEMDAN–LSTM coupled model can be used not only for the prediction of precipitation, but also for the prediction of other time series such as climate elements, river level, population, and gross domestic product. In addition, meteorological factors such as temperature and barometric pressure can be added to precipitation prediction research to further improve the prediction accuracy, which is the direction and focus of future research.
AVAILABILITY OF DATA AND MATERIALS
Data and materials are available from the corresponding author upon request.
AUTHOR CONTRIBUTION
All authors contributed to the study conception and design. Writing and editing: Shaolei Guo and Yihao Wen; chart editing: Jiafeng Huang; preliminary data collection: Guoyu Zhu and Xianqi Zhang. All authors read and approved the final manuscript.
FUNDING
This work was supported by the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant number 17A570004].
ETHICAL APPROVAL
Not applicable.
CONSENT TO PARTICIPATE
Not applicable.
CONSENT TO PUBLISH
Not applicable.
COMPETING INTERESTS
None.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.