Abstract
The prediction of monthly precipitation is of great importance for regional water resources management and utilization. Monthly precipitation series are affected by various atmospheric, regional and environmental factors and exhibit marked fuzziness, randomness and uncertainty. Complementary ensemble empirical mode decomposition (CEEMD) can effectively reduce the reconstruction error of time-series decomposition, and the bidirectional long short-term memory (BILSTM) model can effectively learn long-term dependencies in time series. A coupled CEEMD-BILSTM model is therefore constructed to predict monthly precipitation in Zhengzhou, and its performance is compared with that of the LSTM, EEMD-LSTM and EEMD-BILSTM models. In predicting monthly precipitation in Zhengzhou, the CEEMD-BILSTM model has a maximum relative error of 7.28%, a minimum relative error of 0.00%, an average relative error of 2.68%, an RMS error of 2.6% and a coefficient of determination of 0.97, indicating good prediction accuracy. The model outperforms the LSTM, EEMD-LSTM and EEMD-BILSTM models and has better fitting ability, showing a strong capacity to learn nonlinear and complex processes in hydrological models for regional precipitation prediction.
HIGHLIGHTS
CEEMD is a novel data preprocessing method that can effectively reduce the non-stationarity of time series.
BILSTM models can efficiently learn long-term dependencies in time series.
The CEEMD-BILSTM model achieves higher prediction accuracy and is feasible for monthly precipitation prediction.
INTRODUCTION
Precipitation forecasts are important for agricultural production and water use. Accurate prediction of future precipitation and full utilization of precipitation resources play an important role in industrial and agricultural production, water resources development, flood prevention and mitigation, and engineering management (Liu et al. 2020a, 2020b). Because of the nature of precipitation, predicting it in the study area is difficult, but solving this problem will help us cope with sudden heavy rainfall. As the heavy rainstorm that struck Zhengzhou last year showed, precipitation prediction in the study area can help us respond to such emergencies and take measures to reduce losses. Monthly precipitation is influenced by various atmospheric, regional and environmental factors and is highly fuzzy, volatile and uncertain, so its study is a complex, multi-level problem. Researchers in China and abroad have contributed much to improving forecast accuracy and optimizing forecast models for monthly precipitation, with fruitful results (Cui et al. 2008). Based on precipitation observations from 1981 to 2005 in eastern Sichuan, Chen et al. (2021) used a long short-term memory (LSTM) neural network to build a monthly precipitation prediction model, predicted monthly precipitation from 2006 to 2019, and compared the results with those of a random forest (RF) model. Liu et al. (2020a, 2020b) proposed AutoLSTM, a genetic-algorithm-based model that automatically optimizes LSTM hyperparameters and performs well for time-series prediction. Lu (2016) improved the traditional GM(1,1) gray dynamic forecast model by introducing variable-weight coefficients to assign weights to its variables, and applied the improved GM(1,1) model to regional precipitation forecasting. 
By accounting for the weights between variables, Lu's model improves the convergence accuracy of the traditional model. In this paper, a bidirectional long short-term memory (BILSTM) model (Rahman et al. 2021) is used for predictive analysis. BILSTM models use cell states and gate structures to control information transfer, which enables them to learn long-term dependencies in time series effectively. For example, Chen et al. (2022) combined EMD, an attention mechanism and a BILSTM neural network, and used input-data interpolation to improve the accuracy of runoff prediction. Zhang et al. (2020) developed Tiny-RainNet, a deep convolutional neural network combined with a bidirectional long short-term memory model, which predicts short-term precipitation directly from continuous radar echo maps. Latifolu (2022) used instantaneous frequency (IF) features and bidirectional LSTM networks to predict daily precipitation and found that the IF-BILSTM model outperformed the BILSTM model, with the IF features improving estimation especially in long-range forecasting. These studies indicate that the BILSTM model adapts well to, and can be used effectively for, precipitation prediction. Combining data denoising methods with data-driven models can further improve prediction accuracy. At the end of the last century, Norden E. Huang, a Chinese-American scientist at NASA, proposed a new method for analyzing non-stationary signals, the Hilbert-Huang transform, of which empirical mode decomposition (EMD) is a key component (Huang et al. 1998). EMD is one of the most commonly used methods for data denoising. However, EMD often suffers from mode mixing, so the intrinsic mode function (IMF) components obtained from the decomposition may fail to reflect the variation characteristics of the original series. Building on a study of how EMD treats white noise, Wu and Huang proposed ensemble empirical mode decomposition (EEMD), which alleviates mode mixing by adding Gaussian white noise to the original series, decomposing it repeatedly and averaging the results, so that the decomposition reflects the characteristics of the original series more clearly (Wu & Huang 2009). EEMD-based hybrid models are now a common tool for hydrological and meteorological forecasting. Xue et al. (2013) used EEMD to decompose the autumn precipitation series of the Weihe River basin over the past 50 years into multiple IMFs and extracted the information contained in the series to obtain multi-scale features. Qiu et al. (2022) used EEMD for the spatio-temporal analysis and prediction of precipitation extremes in the Weihe River basin, China. Huang et al. (2017) used an EEMD-GRNN model to predict the annual precipitation in Zhengzhou City, and the results showed that EEMD improved the performance of the GRNN model. Yang et al. (2021) used EEMD-LSTM to predict the annual precipitation in the economic zone on the northern slope of the Tianshan Mountains, training an LSTM network on the components of the EEMD decomposition to forecast precipitation in the study area. However, there is no definite criterion for the amplitude of the white noise added by EEMD, and some noise remains after ensemble averaging, which leaves a reconstruction error. Complementary ensemble empirical mode decomposition (CEEMD) (Ye et al. 2010) reduces this reconstruction error by adding zero-mean white noise to the original signal in pairs of opposite sign. Zhang et al. (2022) constructed a coupled CEEMD-GRU precipitation prediction model and applied it to monthly precipitation prediction in Shanghai. 
Their model showed good adaptability and can be applied to regional precipitation forecasting. In this paper, a coupled model combining CEEMD, which improves on EMD and EEMD, with a BILSTM neural network is established and applied to forecasting monthly precipitation in Zhengzhou, and its performance is compared with that of the LSTM, EEMD-LSTM and EEMD-BILSTM models.
RESEARCH METHODS
CEEMD algorithm
The CEEMD algorithm is an improvement on the EMD and EEMD algorithms. EMD decomposes a signal, according to the signal's own characteristics, into a number of intrinsic mode functions (IMFs) and a residual component. It performs well as an adaptive time-frequency analysis method but suffers from mode mixing, in which components of different time scales appear in the same IMF. EEMD suppresses mode mixing to some extent by adding white noise to the signal, but this introduces a reconstruction problem, because the final result still contains residual noise. CEEMD modifies EEMD by adding white noise in positive and negative pairs to two copies of the original signal, decomposing each with EMD, and averaging the results as the decomposition result. This both suppresses mode mixing and eliminates residual noise in the reconstructed signal, while reducing the required number of noise additions from the order of hundreds to a few tens compared with EEMD, improving both computational accuracy and efficiency. The CEEMD decomposition steps are as follows.
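- (1) Add n groups of white noise to the original signal $x(t)$ in positive and negative pairs, giving 2n mixed signals:
$$M_{2k-1}(t) = x(t) + N_k(t), \qquad M_{2k}(t) = x(t) - N_k(t), \qquad k = 1, 2, \ldots, n$$
where $N_k(t)$ is the k-th white-noise series.
- (2) Decompose each of the 2n mixed signals with EMD to obtain 2n sets of IMF components, where the j-th IMF component of the i-th mixed signal $M_i(t)$ is denoted as $\mathrm{IMF}_{ij}(t)$, $i = 1, 2, \ldots, 2n$.
- (3) Group the IMF components by order and average them to output the IMF components of the original signal and the residual component $R(t)$:
$$\mathrm{IMF}_j(t) = \frac{1}{2n}\sum_{i=1}^{2n} \mathrm{IMF}_{ij}(t), \qquad x(t) = \sum_{j} \mathrm{IMF}_j(t) + R(t)$$
where $\mathrm{IMF}_j(t)$ is the j-th IMF component obtained after CEEMD decomposition of the original signal, and $R(t)$ is the residual component left when the signal no longer satisfies the decomposition condition and the decomposition stops.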
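To make the decomposition steps listed above concrete, here is a minimal Python sketch assuming NumPy and SciPy. The inner `emd` function is a deliberately simplified, hypothetical EMD written for illustration (a fixed number of sifting iterations, cubic-spline envelopes, envelopes pinned to the series end points); the noise amplitude `noise_ratio`, the number of noise pairs `n_pairs` and the IMF count `max_imfs` are illustrative defaults, not the settings used in this study.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def _envelope_mean(h, t):
    """Mean of the cubic-spline upper and lower envelopes of h, or None if h has too few extrema."""
    maxima = argrelextrema(h, np.greater)[0]
    minima = argrelextrema(h, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Crude boundary handling (an assumption): pin both envelopes to the end points.
    upper_idx = np.r_[0, maxima, len(h) - 1]
    lower_idx = np.r_[0, minima, len(h) - 1]
    upper = CubicSpline(t[upper_idx], h[upper_idx])(t)
    lower = CubicSpline(t[lower_idx], h[lower_idx])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=6, n_sifts=10):
    """Simplified EMD returning (imfs, residual); unused IMF rows stay zero.

    By construction imfs.sum(axis=0) + residual equals x.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    imfs = np.zeros((max_imfs, len(x)))
    residual = x.copy()
    for j in range(max_imfs):
        if _envelope_mean(residual, t) is None:
            break  # too few extrema left: the residual is treated as the trend
        h = residual.copy()
        for _ in range(n_sifts):
            m = _envelope_mean(h, t)
            if m is None:
                break
            h = h - m
        imfs[j] = h
        residual = residual - h
    return imfs, residual


def ceemd(x, n_pairs=25, noise_ratio=0.2, max_imfs=6, seed=0):
    """CEEMD sketch: EMD of x + N_k and x - N_k for each noise series, then averaging."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    imf_sum = np.zeros((max_imfs, len(x)))
    residual_sum = np.zeros(len(x))
    for _ in range(n_pairs):
        noise = rng.normal(0.0, noise_ratio * x.std(), len(x))
        for mixed in (x + noise, x - noise):       # step (1): positive/negative pair
            imfs, residual = emd(mixed, max_imfs)  # step (2): EMD of each mixed signal
            imf_sum += imfs
            residual_sum += residual
    n_signals = 2 * n_pairs
    return imf_sum / n_signals, residual_sum / n_signals  # step (3): group averages


t = np.arange(240)
rng = np.random.default_rng(1)
series = np.sin(2 * np.pi * t / 12) + 0.5 * np.sin(2 * np.pi * t / 60) + 0.1 * rng.normal(size=240)
imfs, residual = ceemd(series, n_pairs=10)
# Reconstruction error is at floating-point level because each noise pair cancels.
print(np.abs(imfs.sum(axis=0) + residual - series).max())
```

Because each noise series is added once with a plus sign and once with a minus sign, the averaged IMFs plus residual reconstruct the input to floating-point precision. An EEMD-style variant that only decomposes `x + noise` would instead leave the averaged noise $\frac{1}{n}\sum_k N_k(t)$ in the reconstruction, which is the residual-noise problem described above.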
BILSTM
LSTM
Traditional recurrent neural networks typically use the logistic (sigmoid) function as the nonlinear activation in recurrent learning. Because the derivative of the logistic function lies between 0 and 0.25, gradients propagated back over long time intervals are multiplied by many such factors and tend toward zero or, when the recurrent weights are large, grow without bound, which causes vanishing or exploding gradients. Traditional recurrent neural networks therefore struggle with sequences containing long-interval dependencies.
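The bound behind this argument is easy to check numerically. The derivative of the logistic function, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, peaks at 0.25, so a gradient passed back through T time steps picks up a factor of at most $0.25^T$ from the activations alone. This sketch ignores the recurrent weights (hypothetically set to one); in a real network their size decides whether the product vanishes or explodes.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_derivative(z):
    s = logistic(z)
    return s * (1.0 - s)


z = np.linspace(-10.0, 10.0, 2001)
peak = logistic_derivative(z).max()
print(f"max derivative: {peak:.4f}")  # max derivative: 0.2500 (reached at z = 0)

# Upper bound on the activation factor after T steps, assuming unit recurrent weights.
for T in (1, 10, 50):
    print(T, 0.25 ** T)  # shrinks geometrically with T
```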



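To address the vanishing and exploding gradient problems of traditional recurrent networks, the LSTM network introduces a cell state and three gates (forget, input and output) that control the flow of information. At time t it computes:
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
where $x_t$ is the current input; $h_{t-1}$ and $h_t$ are the outputs of the hidden layer at times $t-1$ and $t$; $C_{t-1}$ and $C_t$ are the cell states at times $t-1$ and $t$; $\tilde{C}_t$ is the candidate cell state; $f_t$, $i_t$ and $o_t$ are the outputs of the forget gate, input gate and output gate at time $t$; $W_f$, $W_i$, $W_c$ and $W_o$ are weight matrices; $b_f$, $b_i$, $b_c$ and $b_o$ are bias vectors; $\sigma$ is the logistic sigmoid function; and $\odot$ denotes element-wise multiplication.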
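A single LSTM time step, with forget, input and output gates acting on the concatenation of the previous hidden output $h_{t-1}$ and the current input $x_t$, can be sketched in NumPy as follows. The dictionary layout of the weights (`W["f"]`, `W["i"]`, `W["c"]`, `W["o"]`) and the helper `init_params` with its small random initialization are illustrative assumptions, not the training setup of this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W: dict of weight matrices "f", "i", "c", "o", each of shape (hidden, hidden + input),
       applied to the concatenation [h_prev, x_t].
    b: dict of bias vectors with the same keys, each of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden-layer output
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical small random weights and zero biases for the four gate blocks."""
    rng = np.random.default_rng(seed)
    W = {k: rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size)) for k in "fico"}
    b = {k: np.zeros(hidden_size) for k in "fico"}
    return W, b


# Usage: run the cell over a short univariate sequence.
W, b = init_params(input_size=1, hidden_size=8)
h, c = np.zeros(8), np.zeros(8)
for x in np.sin(np.arange(24.0)):
    h, c = lstm_step(np.array([x]), h, c, W, b)
```

The additive cell update `c_t = f_t * c_prev + i_t * c_tilde` is what lets information (and gradients) pass across many steps without being squashed by an activation at every step.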
BILSTM
The output of a BILSTM is determined jointly by two LSTM layers: a forward layer that processes the sequence from the first time step to the last, and a backward layer that processes it from the last time step to the first. Both layers perform the same computations. At each time step, the outputs of the forward and backward layers are combined to give the output at that step.
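The forward/backward scheme can be sketched in NumPy as below. This assumes that each direction has its own weights, that the gate weights are stacked in one matrix in the order [forget, input, candidate, output], and that the two directions are combined by concatenation, which is a common choice; the combination rule used in this study is not specified. `run_lstm`, `bilstm` and `init_direction` are hypothetical helper names.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def run_lstm(xs, W, b):
    """Run one LSTM direction over xs of shape (T, input_size).

    W: (4*hidden, hidden + input_size), b: (4*hidden,), rows stacked as
    [forget, input, candidate, output] (an assumed layout).
    Returns hidden-layer outputs of shape (T, hidden).
    """
    hidden = b.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:
        f, i, g, o = np.split(W @ np.concatenate([h, x_t]) + b, 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)


def bilstm(xs, fwd_params, bwd_params):
    """Forward pass, backward pass, and per-time-step concatenation of both outputs."""
    forward = run_lstm(xs, *fwd_params)               # first -> last time step
    backward = run_lstm(xs[::-1], *bwd_params)[::-1]  # last -> first, re-aligned in time
    return np.concatenate([forward, backward], axis=1)


def init_direction(input_size, hidden, rng):
    """Hypothetical small random initialization for one direction."""
    return rng.normal(0.0, 0.1, (4 * hidden, hidden + input_size)), np.zeros(4 * hidden)


rng = np.random.default_rng(0)
xs = np.sin(np.arange(12.0)).reshape(-1, 1)  # 12 time steps, 1 feature
out = bilstm(xs, init_direction(1, 8, rng), init_direction(1, 8, rng))
print(out.shape)  # (12, 16)
```

At the last time step, the backward half of the output has seen only the final input, while the forward half has seen the whole sequence; at the first time step the situation is reversed. This is how each output combines past and future context.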
CEEMD-BILSTM model
Error analysis
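Common definitions of the indicators used in this paper (the root mean square error RMSE, the mean absolute error MAE, the coefficient of determination R² and the relative error) can be sketched in Python as follows. The sketch assumes R² = 1 − SS_res/SS_tot and a relative error of |ŷ − y|/y × 100%; the exact definitions used in this study (for example, any normalization of the RMSE) may differ.

```python
import numpy as np


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def relative_error(y_true, y_pred):
    """Relative error in percent, |y_pred - y_true| / y_true * 100 (assumes y_true > 0)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.abs(y_pred - y_true) / y_true * 100.0


# Usage with the first two forecast months (217 and 218): observed vs. predicted, in mm.
print(relative_error([13.4, 11.8], [12.61, 11.75]))
```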

CASE APPLICATION
Overview of the study area
Empirical mode decomposition
Forecasting
After CEEMD decomposition, the components of the Zhengzhou precipitation time series are smoother and their volatility is markedly reduced. The prediction errors decrease steadily from IMF1 to IMF6, indicating that the training effect improves progressively.
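The per-component prediction described above can be turned into a precipitation forecast by training one model per component on sliding windows and summing the component forecasts; this summation scheme is a common decomposition-ensemble design and is assumed here rather than taken from the paper. In the sketch, the 12-month lookback is a hypothetical choice, and a least-squares linear autoregressive model stands in for the per-component BILSTM so that the example runs with NumPy alone. The 240-month series with a 24-month test period mirrors the forecast months 217-240.

```python
import numpy as np


def make_windows(series, lookback):
    """Sliding windows: each input is `lookback` consecutive values, the target is the next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y


def fit_predict_component(component, lookback, n_test):
    """Stand-in for one per-component BILSTM: a least-squares linear AR model (hypothetical).

    Trains on all windows except the last n_test and returns one-step-ahead
    forecasts for the final n_test values of the component.
    """
    X, y = make_windows(component, lookback)
    X = np.hstack([X, np.ones((len(X), 1))])  # intercept term
    coef, *_ = np.linalg.lstsq(X[:-n_test], y[:-n_test], rcond=None)
    return X[-n_test:] @ coef


def coupled_forecast(components, lookback=12, n_test=24):
    """Forecast each IMF and the residual separately, then sum the component forecasts."""
    return sum(fit_predict_component(c, lookback, n_test) for c in components)


t = np.arange(240)  # 240 months; the last 24 play the role of months 217-240
components = [np.sin(2 * np.pi * t / 12), 0.5 * np.sin(2 * np.pi * t / 60), 0.01 * t]
forecast = coupled_forecast(components)
print(forecast.shape)  # (24,)
```

Swapping `fit_predict_component` for a trained BILSTM leaves the surrounding structure unchanged, and errors such as those in Table 1 can be computed per component before summation.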
As Table 1 shows, IMF1 has the largest RMSE and MAE and the lowest R², which is directly related to its high frequency. From IMF1 to IMF6, the RMSE and MAE decrease steadily, while R² increases from 0.804 to 0.978.
Errors and coefficients of determination of the IMF1-IMF6 and residual (trend term) predictions
Component | RMSE | MAE | R² |
---|---|---|---|
IMF1 | 0.9108 | 0.8272 | 0.80365 |
IMF2 | 0.8284 | 0.7488 | 0.83339 |
IMF3 | 0.6693 | 0.5406 | 0.85274 |
IMF4 | 0.4583 | 0.3943 | 0.93585 |
IMF5 | 0.2333 | 0.21022 | 0.95999 |
IMF6 | 0.026316 | 0.025985 | 0.97759 |
Residual | 0.33433 | 0.25898 | 0.90367 |
Relative errors for the forecast months
Month | True value/mm | Predicted value/mm | Relative error/% |
---|---|---|---|
217 | 13.4 | 12.61 | 5.89 |
218 | 11.8 | 11.75 | 0.42 |
219 | 12.3 | 11.35 | 7.70 |
220 | 29.6 | 28.69 | 3.06 |
221 | 56.0 | 54.00 | 3.57 |
222 | 86.1 | 85.10 | 1.16 |
223 | 99.9 | 99.90 | 0.00 |
224 | 254.4 | 255.40 | 0.39 |
225 | 81.4 | 82.40 | 1.23 |
226 | 81.8 | 82.27 | 0.58 |
227 | 12 | 11.13 | 7.28 |
228 | 8.5 | 8.39 | 1.32 |
229 | 47.7 | 46.70 | 2.09 |
230 | 33.5 | 32.50 | 2.99 |
231 | 25.5 | 24.49 | 3.96 |
232 | 22.3 | 20.97 | 5.97 |
233 | 54.5 | 55.50 | 1.83 |
234 | 99.1 | 100.54 | 1.45 |
235 | 97 | 96.4 | 0.62 |
236 | 129 | 128.96 | 0.03 |
237 | 51.7 | 50.47 | 2.37 |
238 | 41.1 | 42.67 | 3.82 |
239 | 39.6 | 40.67 | 2.69 |
240 | 8.4 | 8.31 | 1.12 |
Mean relative error/% | | | 2.68 |
CEEMD-BILSTM model prediction results compared with the original data.
As shown in Table 2, the prediction error of the coupled CEEMD-BILSTM model is low, with a maximum relative error of 7.28%, a minimum of 0.00% and an average relative error of 2.68%. These results show that the model's relative prediction errors are small, its pass rate is high and its prediction quality is good.
DISCUSSION
To verify the superiority of the CEEMD-BILSTM model, the LSTM, EEMD-LSTM and EEMD-BILSTM models were also used to make predictions, and their results were compared with those of the CEEMD-BILSTM model, as shown in Table 3.
Relative error of each model
Month | True value/mm | CEEMD-BILSTM predicted value/mm | CEEMD-BILSTM relative error/% | EEMD-BILSTM predicted value/mm | EEMD-BILSTM relative error/% | EEMD-LSTM predicted value/mm | EEMD-LSTM relative error/% | LSTM predicted value/mm | LSTM relative error/% |
---|---|---|---|---|---|---|---|---|---|
217 | 13.4 | 12.61 | 5.89 | 14.43 | 7.69 | 14.87 | 10.97 | 15.98 | 19.25 |
218 | 11.8 | 11.75 | 0.42 | 12.57 | 6.53 | 12.56 | 6.44 | 9.6 | 18.64 |
219 | 12.3 | 11.35 | 7.70 | 13.98 | 13.66 | 12.35 | 0.41 | 10.6 | 13.82 |
220 | 29.6 | 28.69 | 3.06 | 30.21 | 2.06 | 28.63 | 3.28 | 27.36 | 7.57 |
221 | 56 | 54.00 | 3.57 | 58.64 | 4.71 | 51.2 | 8.57 | 50.3 | 10.18 |
222 | 86.1 | 85.10 | 1.16 | 85.97 | 0.15 | 101.27 | 17.62 | 102.65 | 19.22 |
223 | 99.9 | 99.90 | 0.00 | 110.28 | 10.39 | 108.59 | 8.70 | 116.98 | 17.10 |
224 | 254.4 | 255.40 | 0.39 | 289.51 | 13.80 | 280.57 | 10.29 | 294.35 | 15.70 |
225 | 81.4 | 82.40 | 1.23 | 93.24 | 14.55 | 80.54 | 1.06 | 98.62 | 21.15 |
226 | 81.8 | 82.27 | 0.58 | 87.53 | 7.00 | 75.91 | 7.20 | 69.49 | 15.05 |
227 | 12 | 11.13 | 7.28 | 12.54 | 4.50 | 13.48 | 12.33 | 15.36 | 28.00 |
228 | 8.5 | 8.39 | 1.32 | 8.31 | 2.24 | 9.28 | 9.18 | 9.64 | 13.41 |
229 | 47.7 | 46.70 | 2.09 | 47.9 | 0.42 | 52.81 | 10.71 | 54.19 | 13.61 |
230 | 33.5 | 32.50 | 2.99 | 34.21 | 2.12 | 30.24 | 9.73 | 39.87 | 19.01 |
231 | 25.5 | 24.49 | 3.96 | 24.63 | 3.41 | 26.58 | 4.24 | 26.35 | 3.33 |
232 | 22.3 | 20.97 | 5.97 | 23.57 | 5.70 | 25.19 | 12.96 | 22.43 | 0.58 |
233 | 54.5 | 55.50 | 1.83 | 55.23 | 1.34 | 58.85 | 7.98 | 64.82 | 18.94 |
234 | 99.1 | 100.54 | 1.45 | 100.35 | 1.26 | 108.25 | 9.23 | 113.94 | 14.97 |
235 | 97 | 96.40 | 0.62 | 99.62 | 2.70 | 100.34 | 3.44 | 116.53 | 20.13 |
236 | 129 | 128.96 | 0.03 | 132.59 | 2.78 | 126.38 | 2.03 | 149.35 | 15.78 |
237 | 51.7 | 50.47 | 2.37 | 53.47 | 3.42 | 55.24 | 6.85 | 61.59 | 19.13 |
238 | 41.1 | 42.67 | 3.82 | 44.35 | 7.91 | 41.52 | 1.02 | 46.43 | 12.97 |
239 | 39.6 | 40.67 | 2.69 | 42.59 | 7.55 | 40.21 | 1.54 | 49.48 | 24.95 |
240 | 8.4 | 8.31 | 1.12 | 9.67 | 15.12 | 8.8 | 4.76 | 10.54 | 25.48 |
Mean relative error/% | | | 2.68 | | 5.88 | | 7.11 | | 16.17 |
Comparison of the prediction results of CEEMD-BILSTM model and other models.
As Table 3 shows, the mean relative error of the coupled CEEMD-BILSTM model for precipitation prediction in Zhengzhou City (2.68%) is lower than those of the EEMD-BILSTM (5.88%), EEMD-LSTM (7.11%) and LSTM (16.17%) models, and its MAE and RMSE are also lower than those of the other three models. Figure 9 likewise shows that the coupled CEEMD-BILSTM model predicts precipitation better than the other three models. The BILSTM model uses cell states and gate structures to control information transfer and can therefore learn long-term dependencies in time series effectively, while CEEMD reduces the reconstruction error of the signal by adding zero-mean white noise to the original signal in pairs of opposite sign. Combining the two in the coupled CEEMD-BILSTM model yields more accurate predictions that reproduce the actual variations of the precipitation series in finer detail, and the trend and periodicity of its predictions largely agree with the observed data.
CONCLUSION
- 1.
The BILSTM model uses cell states and gate structures to control the transfer of information and can effectively learn long-term dependencies in time series. CEEMD reduces the reconstruction error of the signal by adding zero-mean white noise to the original signal in pairs of opposite sign. Effectively combining the two yields the coupled CEEMD-BILSTM model, which gives more accurate predictions and reproduces the actual variations of the precipitation series in finer detail. Its average relative error is 2.68%, which is low, indicating a high level of prediction accuracy.
- 2.
A coupled CEEMD-BILSTM model was constructed and applied to the prediction of monthly precipitation in Zhengzhou. Its average relative error is 2.68% and its coefficient of determination is close to 1. Its prediction accuracy is better than that of the LSTM, EEMD-LSTM and EEMD-BILSTM models, indicating that the coupled CEEMD-BILSTM model is feasible for monthly precipitation prediction.
- 3.
It should be noted that the model is purely data-driven and does not describe the physical mechanisms of precipitation. Future work should study these physical mechanisms and incorporate them into the predictions.
AVAILABILITY OF DATA AND MATERIALS
Data and materials are available from the corresponding author upon request.
AUTHOR CONTRIBUTION
All authors contributed to the study conception and design. Writing and editing: Xianqi Zhang and Jingwen Shi; chart editing: Guoyu Zhu; preliminary data collection: Yimeng Xiao and Haiyang Chen. All authors read and approved the final manuscript.
FUNDING
This work was supported by the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant number 17A570004].
ETHICS APPROVAL
Not applicable.
CONSENT TO PARTICIPATE
Not applicable.
CONSENT FOR PUBLICATION
Not applicable.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.