Coupled models that separately model the different-frequency subseries of a precipitation series are still lacking in precipitation prediction research. This paper therefore proposes a coupled model based on Ensemble Empirical Mode Decomposition (EEMD), the Long Short-Term Memory (LSTM) neural network and the Autoregressive Integrated Moving Average (ARIMA) model for month-by-month precipitation prediction. The model was built from the monthly historical precipitation data of Luoyang City from 1973 to 2021. The modal components of different frequencies obtained by EEMD were divided into a high-frequency part and a low-frequency part using the Permutation Entropy (PE) algorithm; the LSTM model predicts the high-frequency part, while the ARIMA model predicts the low-frequency part. Monthly precipitation forecasts are obtained by superimposing the results of the two models. Finally, predictive performance is evaluated using several assessment metrics. The metrics show that the proposed model outperforms the EMD-LSTM (EMD: Empirical Mode Decomposition), EEMD-LSTM and EEMD-ARIMA combined models as well as the single models, and that its predictions of future precipitation have high confidence.

  • This paper adopts the EEMD algorithm to decompose the precipitation series into modal components of different frequencies.

  • LSTM is a special kind of RNN that alleviates the exploding and vanishing gradient problems that occur when training RNNs.

  • The ARIMA model is very simple and requires only endogenous variables without resorting to other exogenous variables.

Graphical Abstract


Precipitation is the main driver of the biospheric water cycle and is highly complex and uncertain in time and space. Studies have shown that global warming has exacerbated the instability of the climate system (Zhang & Zhou 2019). Under current global climate change, extreme precipitation and extreme drought events are also becoming more frequent (Simon & Alberto 2019), which brings considerable uncertainty to agricultural production, the ecological environment and sustainable economic and social development. Precise advance forecasts of regional precipitation are therefore needed to support regional flood and drought prevention and water management. However, because the factors affecting precipitation are complex and variable, precise precipitation prediction remains a difficult problem in related research.

To date, precipitation prediction models can be roughly divided into two main categories: traditional and emerging prediction models. Traditional forecasting models comprise regression analysis and time series approaches; examples include grey system theory (Meng et al. 2022), Markov chains and set-pair analysis (Huang & Gao 2022). Such models are generally built on a large amount of monitored historical precipitation data: the data are sorted by time scale, statistical methods are combined with numerical autocorrelation features, and the patterns and structures of the underlying stochastic processes are mined to construct a prediction model. They do not take other potentially influential factors into account and rely solely on regularities found in large datasets to predict future conditions. Liu & Duan (2017) applied maximum covariance regression analysis to gridded precipitation data from observation stations on the Qinghai-Tibet Plateau to improve rainfall prediction in the region. Chang & Zhao (2011) combined time series and regression methods into a precipitation prediction method that uses both the time series dynamics of precipitation and the influence of natural environmental factors, and is suitable for predicting the variability characteristics of precipitation. Lai & Dzombak (2021) developed a statistical time series forecasting technique based on the Autoregressive Integrated Moving Average (ARIMA) model and compared it quantitatively with other common statistical techniques for annual temperature and precipitation forecasting, showing that ARIMA models generally provide more accurate forecasts, especially interval forecasts. Jabbari & Bae (2020) used Total Least Squares (TLS) and a lead-time-dependent bias correction method, Dynamic Weighting (DW), to post-process real-time forecast data and reduce the bias in real-time precipitation forecasts.

Although these traditional prediction models have achieved some success, they still face many challenges owing to the low stability of statistical relationships, the complexity of the modelled atmospheric physics equations and the strict stationarity requirements on the time series.

The rapid development of machine learning and deep learning has helped to compensate for the shortcomings of traditional prediction models. This emerging approach uses large amounts of historical meteorological data (including precipitation data) to build models that can be trained to identify patterns in the data and then predict the evolution of other precipitation processes (He et al. 2021). It is of interest to many researchers because it learns the relationship between input observations and output results directly from the data and can also produce forecasts at higher grid resolutions and more frequently. Examples include the first Artificial Neural Network (ANN) methods used to predict rainfall (Zhou et al. 2020), robust rainfall prediction based on Support Vector Regression (SVR) (Hasan et al. 2015) and ensemble precipitation prediction based on the random forest approach (Zarei et al. 2021). Although these models can handle non-linear precipitation data, they ignore the sequential relationships within the data and therefore leave room for improvement.

Recurrent Neural Networks (RNNs) have memory: the network uses previous information together with the current input to determine its output. However, RNNs have difficulty retaining dependencies across long intervals and usually lose the remembered information. Hochreiter & Schmidhuber (1997) therefore designed the Long Short-Term Memory (LSTM) network, which addresses both the long-term dependence problem and the vanishing gradient problem, and it is now widely used in hydrological prediction. Li et al. (2022) trained and tested an LSTM model on runoff data from the Yarkant River to improve the accuracy of inland river runoff prediction. Liu et al. (2020) applied an LSTM network to precipitation data for the Tibetan Plateau from 1990–2016 to predict its future precipitation trend, and the results showed that the LSTM model predicted precipitation more accurately than traditional methods.

Because precipitation data are nonlinear, time-dependent and influenced by many complex factors, the series is generally not stationary, and predictions made with LSTM alone may contain errors (Kala et al. 2022). To further improve prediction accuracy, forecasting research has turned to exploiting the frequency-domain features of time series. Precipitation data contain random noise that interferes with prediction and reduces its accuracy, so handling this noise is an indispensable data processing task, and Ensemble Empirical Mode Decomposition (EEMD) is one of the most commonly used denoising methods (Yu et al. 2018). Wang et al. (2015) used EEMD to process the original annual runoff time series, and the resulting EEMD-ARIMA model significantly improved on annual runoff prediction by ARIMA alone. Yang et al. (2021) used a coupled EEMD and LSTM model to predict annual precipitation in the northern Tianshan economic zone. Their results show that the coupled model outperforms a single model, but also expose a drawback: the LSTM model does not handle the low-frequency part of the EEMD decomposition well and is therefore prone to errors. These models neither divide the subseries obtained from EEMD into high-frequency and low-frequency parts to be modelled and predicted separately, nor consider whether a single algorithm suits subseries of different frequencies, and so they incur avoidable errors.

Based on the above, this paper uses a permutation entropy (PE) algorithm to divide the subseries obtained from EEMD into high-frequency and low-frequency parts and models each part separately, constructing an EEMD-LSTM-ARIMA coupled precipitation prediction model. LSTM has higher prediction accuracy for high-frequency series, and the ARIMA model predicts low-frequency series well.

At present, the coupled EEMD-LSTM-ARIMA approach of modelling different frequencies separately is still lacking in precipitation prediction research. This study has two aims: first, to find the optimal parameters of the coupled model for regional medium- and long-term precipitation forecasting; and second, to test the developed model against other models in order to demonstrate its accuracy and improve precipitation prediction in the region. Based on the month-by-month historical precipitation data of Luoyang City from 1973 to 2021, this paper therefore proposes a method that integrates ensemble empirical mode decomposition, the long short-term memory network and the autoregressive integrated moving average model. A coupled EEMD-LSTM-ARIMA model was constructed and trained on the monthly precipitation data of the Luoyang region from 1973 to 2011, and was then used to simulate and predict monthly precipitation from 2012 to 2021. The predictions were compared with those of several other models and with the observed data to evaluate the effectiveness of the model, which was finally used to forecast monthly precipitation for the region for 2022–2024. The model offers a new reference for improving precipitation prediction accuracy in the region and provides data support for regional disaster prevention and mitigation.

Coupled model (EEMD-LSTM-ARIMA)

Based on the monthly historical precipitation data from 1973–2021 in Luoyang City, this paper couples three methods, EEMD, LSTM and ARIMA, into a single prediction model so that the strengths of each offset the weaknesses of the others. The coupling not only reduces the influence of random noise in the original data but also addresses the lag problem of LSTM in time-series prediction and improves the accuracy of precipitation prediction. The framework of the coupled EEMD-LSTM-ARIMA model is shown in Figure 1.
Figure 1

Framework of the coupled EEMD-LSTM-ARIMA model.


The steps are as follows:

  • (1)

    The EEMD decomposition was performed on the collected monthly historical precipitation data from 1973–2021 in Luoyang City.

  • (2)

    The valid IMF (Intrinsic Mode Function) components obtained from the decomposition are divided and the 1973–2011 dataset is selected for training and the 2012–2021 dataset for validation. The permutation entropy of each IMF component is calculated separately to divide the high-frequency sequence part and the low-frequency sequence part.

  • (3)

    An LSTM prediction model, which performs well on high-frequency data, is built for the IMF subseries in the high-frequency part. The IMF subseries in the low-frequency part are predicted with the ARIMA model, which performs well on low-frequency data. Finally, the predicted values of the two methods are summed to give the final predicted values.

  • (4)

    The obtained prediction model was used to predict the monthly precipitation in Luoyang City from 2012 to 2021, and the performance of the resulting coupled EEMD-LSTM-ARIMA precipitation prediction model was evaluated by comparing it with the validation set of historical real data.

  • (5)

    The EEMD-LSTM-ARIMA coupled precipitation prediction model was used to predict monthly precipitation in Luoyang City from 2022 to 2024.
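
The routing and summation in steps (2) and (3) can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' implementation: the function `coupled_forecast`, the stand-in predictors `repeat_last` and `repeat_mean`, and the toy numbers are all hypothetical, and the 0.2 threshold is the value reported later in the model-building section. In practice the two predictors would be a trained LSTM and a fitted ARIMA model.

```python
from typing import Callable, Dict, List, Sequence

Series = List[float]
Predictor = Callable[[Sequence[float], int], Series]


def coupled_forecast(
    components: Dict[str, Series],
    entropy: Dict[str, float],
    predict_high: Predictor,
    predict_low: Predictor,
    horizon: int,
    threshold: float = 0.2,
) -> Series:
    """Route each component to a model by permutation entropy and sum the forecasts."""
    total = [0.0] * horizon
    for name, series in components.items():
        # Above the threshold: high-frequency part (LSTM in the paper).
        # Otherwise: low-frequency part (ARIMA in the paper).
        model = predict_high if entropy[name] > threshold else predict_low
        forecast = model(series, horizon)
        total = [acc + value for acc, value in zip(total, forecast)]
    return total


# Stand-in predictors (hypothetical) so the sketch runs without a model library.
def repeat_last(series: Sequence[float], horizon: int) -> Series:
    return [series[-1]] * horizon


def repeat_mean(series: Sequence[float], horizon: int) -> Series:
    return [sum(series) / len(series)] * horizon


demo = coupled_forecast(
    components={"IMF1": [1.0, -1.0, 2.0], "Res": [50.0, 51.0, 52.0]},
    entropy={"IMF1": 0.9, "Res": 0.0},
    predict_high=repeat_last,
    predict_low=repeat_mean,
    horizon=2,
)
```

Passing the predictors in as functions keeps the routing logic independent of whichever model library is used for each frequency band.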

Time series datasets are often partitioned by rule of thumb: the training part should normally make up more than 60% of the data and the validation part more than 20%. Researchers have used different partitioning schemes: Kumar et al. (2019) used 70% of the data to train RNN and LSTM models of the 'all-India' monthly average precipitation data; Liu et al. (2021) used 80% of the data as the training set for wind speed prediction models and the remaining 20% as the test set. In this study, 80% of the data is used to train the model and 20% to test its performance.
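
A chronological split keeps the test period strictly after the training period, which matters for time series. The sketch below is a minimal illustration with a hypothetical helper `chronological_split` and placeholder values rather than real precipitation data. Note that the split at the 2011/2012 boundary described earlier corresponds to 468 of 588 months (about 79.6%), so a plain 80% cut lands two months later than the year boundary.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


# 49 years of monthly values (1973-2021) give 588 points; placeholders stand in for data.
monthly = [float(i) for i in range(49 * 12)]
train, test = chronological_split(monthly)

# Splitting on the year boundary instead (1973-2011 for training) uses 39 * 12 months.
year_cut = 39 * 12
train_by_year, test_by_year = monthly[:year_cut], monthly[year_cut:]
```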

EEMD-Ensemble empirical mode decomposition

Most time series used for forecasting exhibit non-linear and non-stationary characteristics. If they are predicted directly without first being stabilised, the results are prone to inaccuracy. Empirical Mode Decomposition (EMD) is a method for analysing nonlinear and non-stationary signals (Huang et al. 1998). EMD decomposes the original complex non-linear signal into several Intrinsic Mode Functions (IMFs) and a residual (Res), and the IMF components carry the characteristic information of the original signal at each frequency. An IMF obtained by EMD must satisfy two conditions: 1) over the whole data range, the number of local extrema and the number of zero crossings must be equal or differ by at most one; 2) at any time point, the mean of the upper and lower envelopes is 0. Let the signal to be decomposed be $x(t)$. The EMD decomposition steps are as follows:

  • (1)

    Find the local extrema of the original signal $x(t)$. The upper envelope $e_{\max}(t)$ and the lower envelope $e_{\min}(t)$ are obtained by fitting all the maxima and all the minima, respectively, with cubic spline interpolation.

  • (2)
    Take the mean of the envelopes obtained in the first step:
    $$m(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2} \qquad (1)$$
  • (3)
    Calculate the difference between the signal $x(t)$ and the envelope mean $m(t)$:
    $$h(t) = x(t) - m(t) \qquad (2)$$
  • (4)

    Determine whether the new sequence $h(t)$ satisfies the IMF conditions:

    • (a)

      If $h(t)$ meets the IMF conditions, it is an IMF component: let $c_1(t) = h(t)$, calculate the first-order residual $r_1(t) = x(t) - c_1(t)$, then take $r_1(t)$ as the initial signal and repeat steps (1), (2), and (3).

    • (b)

      If $h(t)$ does not satisfy the IMF conditions, take it as the initial signal and repeat steps (1), (2), and (3).

  • (5)
    Determine whether the stopping condition (SD) is satisfied, as calculated by:
    $$SD = \sum_{t=0}^{T} \frac{\left| h_{k-1}(t) - h_k(t) \right|^2}{h_{k-1}^2(t)} \qquad (3)$$
    where k is the number of sifting cycles and $h_k(t)$ is the result of the k-th cycle; the cycle stops when SD is not greater than the given threshold.
  • (6)
    The EMD algorithm finally expresses the raw signal as:
    $$x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t) \qquad (4)$$
    where $c_j(t)$ is the j-th IMF component and $r_n(t)$ is the residual.
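
One sifting pass and the SD stopping measure can be sketched as follows. This is a simplified illustration, not a full EMD: to stay dependency-free it uses piecewise-linear envelopes in place of the cubic splines described in step (1), pins both envelopes to the signal end points, and adds a small `eps` to the SD denominator to avoid division by zero. The function names and the toy signal are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_extrema(x: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return the indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def envelope(x: Sequence[float], idx: Sequence[int]) -> List[float]:
    """Piecewise-linear envelope through the extrema (assumption: stands in for a cubic spline)."""
    knots = [0] + list(idx) + [len(x) - 1]
    env = [0.0] * len(x)
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * x[a] + w * x[b]
    return env


def sift_once(x: Sequence[float]) -> List[float]:
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    upper, lower = envelope(x, maxima), envelope(x, minima)
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    return [value - m for value, m in zip(x, mean)]


def sd(h_prev: Sequence[float], h_cur: Sequence[float], eps: float = 1e-12) -> float:
    """Sum of squared differences normalised by h_prev^2 (eps guards against zeros)."""
    return sum((a - b) ** 2 / (a * a + eps) for a, b in zip(h_prev, h_cur))


signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
h1 = sift_once(signal)
h2 = sift_once(h1)
change = sd(h1, h2)  # compared against a threshold to decide whether to stop sifting
```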

However, the EMD method also has certain limitations. A major problem is that the extreme points of the signal affect the IMFs, and if they are unevenly distributed the final result suffers from mode mixing (Tang et al. 2012). The EEMD method, which adds noise-assisted analysis to the decomposition process, effectively suppresses the mode mixing problem (Wu & Huang 2009). White noise is added to the original signal, and when the results of a large number of trials are averaged, the noise gradually cancels out and the final ensemble-mean signal becomes stable.

The EEMD decomposition steps are as follows:

  • (1)

    Initialise the total number of EMD runs M.

  • (2)

    On the i-th run (i = 1, …, M), add Gaussian white noise $n_i(t)$ to construct a new signal: $x_i(t) = x(t) + n_i(t)$.

  • (3)

    The new signal $x_i(t)$ is decomposed by EMD to obtain a finite number of IMF components $c_{i,j}(t)$ and a residual $r_i(t)$.

  • (4)

    Repeat steps (2) and (3) until all M runs are complete.

  • (5)
    To eliminate the effect of the added white noise on the IMF components, the IMFs obtained from each run are summed and averaged as follows:
    $$\bar{c}_j(t) = \frac{1}{M} \sum_{i=1}^{M} c_{i,j}(t) \qquad (5)$$
    where $\bar{c}_j(t)$ denotes the j-th IMF component of the original signal after ensemble averaging.
  • (6)
    After steps (1) to (5), the EEMD decomposition takes the following form:
    $$x(t) = \sum_{j=1}^{n} \bar{c}_j(t) + \bar{r}(t) \qquad (6)$$
    where $\bar{r}(t)$ is the ensemble mean of the residuals $r_i(t)$.
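
The ensemble-averaging loop of EEMD can be sketched independently of the decomposition itself. In the sketch below, `decompose` is a placeholder for an EMD routine, and `toy_decompose` (a moving-average split into detail and trend) is a hypothetical stand-in so the code runs on its own. Two simplifying assumptions are made: every run returns the same number of components, and `noise_std` is an absolute standard deviation rather than a fraction of the signal's standard deviation.

```python
import random
from typing import Callable, List, Sequence

Decomposer = Callable[[Sequence[float]], List[List[float]]]


def eemd_average(
    x: Sequence[float],
    decompose: Decomposer,
    runs: int = 100,
    noise_std: float = 0.2,
    seed: int = 0,
) -> List[List[float]]:
    """Average the components returned by `decompose` over `runs` noisy copies of x."""
    rng = random.Random(seed)
    sums: List[List[float]] = []
    for _ in range(runs):
        # Add Gaussian white noise to the signal for this run.
        noisy = [value + rng.gauss(0.0, noise_std) for value in x]
        parts = decompose(noisy)
        if not sums:
            sums = [[0.0] * len(x) for _ in parts]
        for acc, part in zip(sums, parts):
            for t, value in enumerate(part):
                acc[t] += value
    # Ensemble mean of each component.
    return [[value / runs for value in acc] for acc in sums]


def toy_decompose(x: Sequence[float]) -> List[List[float]]:
    """Hypothetical stand-in for EMD: split into detail and a 3-point moving-average trend."""
    trend = []
    for t in range(len(x)):
        window = x[max(0, t - 1): t + 2]
        trend.append(sum(window) / len(window))
    detail = [a - b for a, b in zip(x, trend)]
    return [detail, trend]


x = [1.0, 3.0, 2.0, 5.0, 4.0]
components = eemd_average(x, toy_decompose, runs=50, noise_std=0.1)
```

With `noise_std=0.0` every run sees the same signal, so the averaged components sum back to `x`, which is a convenient sanity check of the averaging logic.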

The precipitation data series are non-linear, generally with random noise, which easily affects the precipitation prediction accuracy. By decomposing the precipitation series with EEMD, we can obtain the intrinsic pattern and change trend of precipitation distribution, remove the influence of random noise, and lay the foundation for the following construction of precipitation prediction model.

LSTM-Long short-term memory neural network

The LSTM network is a special type of RNN that uses self-loops to let gradients flow continuously along the network path over time. Compared with a traditional neural network, the LSTM structure has three additional gates, called the input gate, the forget gate and the output gate (Yin et al. 2021). When information enters the LSTM cell, it is passed on to the next step if the gates judge it useful and discarded by the forget gate if it is deemed useless. The basic cell structure of LSTM is shown in Figure 2.
Figure 2

LSTM basic cell structure.


The specific calculation process is as follows:

  • (1)
    Forget gate: controls how much information from the previous cell state $C_{t-1}$ is discarded and how much is kept in the current cell state $C_t$. The forget gate at step t can be expressed as:
    $$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (7)$$
  • (2)
    Input gate: controls which input information is stored in the cell state $C_t$. The input gate at step t can be expressed as:
    $$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (8)$$
    where $W$ is the weight matrix, $b$ is the bias vector, $\sigma$ is the sigmoid function, $h_{t-1}$ is the previous output and $x_t$ is the current input.

    At this point the cell state is:
    $$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (9)$$
    $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (10)$$
  • (3)
    Output gate: controls which information is output. The output gate at step t can be expressed as:
    $$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (11)$$
    $$h_t = o_t \odot \tanh(C_t) \qquad (12)$$
    where $o_t$ takes values between 0 and 1, and $h_t$ is the final output of the unit.

LSTM has strong adaptive learning ability and fits complex sample data well. Because its gating structure preserves memory, it performs better in predicting long-range temporal correlations and largely avoids the vanishing of gradients as the time span increases. The LSTM neural network is therefore chosen here, as it suits the nonlinear and time-dependent nature of precipitation data.
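
The gate calculations can be made concrete with a single-unit LSTM step written in plain Python. This is a didactic sketch of the standard LSTM cell equations with scalar weights; the function `lstm_step`, the `params` layout and the numbers are hypothetical, and a real model would use a deep learning library with vector-valued states.

```python
import math
from typing import Dict, Tuple

Gate = Tuple[float, float, float]  # (weight on h_{t-1}, weight on x_t, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x_t: float, h_prev: float, c_prev: float, params: Dict[str, Gate]
) -> Tuple[float, float]:
    """One forward step of a single-unit LSTM cell; returns (h_t, c_t)."""

    def pre(gate: str) -> float:
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))             # forget gate
    i_t = sigmoid(pre("i"))             # input gate
    c_tilde = math.tanh(pre("c"))       # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # updated cell state
    o_t = sigmoid(pre("o"))             # output gate
    h_t = o_t * math.tanh(c_t)          # output of the unit
    return h_t, c_t


# Illustrative weights only; trained values would come from fitting the network.
params = {
    "f": (0.5, 0.1, 0.0),
    "i": (0.4, 0.3, 0.0),
    "c": (0.2, 0.6, 0.0),
    "o": (0.3, 0.2, 0.0),
}
h, c = 0.0, 0.0
for x_t in [0.2, -0.1, 0.4]:
    h, c = lstm_step(x_t, h, c, params)
```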

ARIMA-Autoregressive integrated moving average

ARIMA models are often used to forecast time series data. They handle series that are dynamic rather than stationary, and are especially suitable for the dynamics of stochastic processes, provided that the series is stationary after differencing (Liu et al. 2021). In essence, the model summarises the patterns and trends of the series by exploring the linear relationship between past and current data. Depending on the conditions, the family includes the q-order moving average model MA(q), the p-order autoregressive model AR(p), the autoregressive moving average model ARMA(p, q) and the autoregressive integrated moving average model ARIMA(p, d, q).

The modelling process uses three parameters, p, d and q, to control the model: p is the number of autoregressive terms, q is the number of moving average terms, and d is the order of differencing. The ARIMA(p, d, q) model is expressed as follows:
$$\phi(B)(1 - B)^d y_t = \theta(B)\varepsilon_t \qquad (13)$$
where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\varepsilon_t$ is a white-noise error term, and $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ are the autoregressive polynomial and the moving average polynomial of the ARIMA(p, d, q) model.
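
As a concrete special case, ARIMA(1, 0, 0) has one autoregressive term, no differencing and no moving-average term, so it reduces to the AR(1) recursion $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$. The sketch below fits it by conditional least squares and forecasts recursively. It is an assumption-laden illustration: the helper names are hypothetical, and a statistics library would typically estimate the model by maximum likelihood instead.

```python
from typing import List, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (c, phi) in y_t = c + phi * y_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast_ar1(last: float, c: float, phi: float, horizon: int) -> List[float]:
    """Iterate the fitted recursion forward, feeding each forecast back in."""
    out = []
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out


# Noise-free toy series generated by y_t = 1 + 0.5 * y_{t-1}, starting from 10.
series = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
c, phi = fit_ar1(series)
future = forecast_ar1(series[-1], c, phi, horizon=2)
```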

Model evaluation indicators

To verify the precipitation prediction performance of the coupled EEMD-LSTM-ARIMA model, four indicators are used to evaluate the prediction results in this paper: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE) and the coefficient of determination ($R^2$). They are calculated as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left(y_k - \hat{y}_k\right)^2} \qquad (14)$$
$$MAE = \frac{1}{n} \sum_{k=1}^{n} \left|y_k - \hat{y}_k\right| \qquad (15)$$
$$MSE = \frac{1}{n} \sum_{k=1}^{n} \left(y_k - \hat{y}_k\right)^2 \qquad (16)$$
$$R^2 = 1 - \frac{\sum_{k=1}^{n} \left(y_k - \hat{y}_k\right)^2}{\sum_{k=1}^{n} \left(y_k - \bar{y}\right)^2} \qquad (17)$$

In Equations (14)–(17) above, $y_k$ and $\hat{y}_k$ denote the observed and model-predicted precipitation at moment k, respectively, $\bar{y}$ is the mean of the observed data, and n is the total number of precipitation data points. The smaller the values of RMSE, MAE and MSE, the more accurate the prediction and the better the model performs. The closer $R^2$ is to 1, the better the model fits. Generally, an $R^2$ above 0.8 is considered a good model fit.
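
The four indicators are straightforward to compute directly from their standard definitions. The following sketch uses a hypothetical helper `evaluate` and made-up observed and predicted values purely to show the arithmetic.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE, MSE and R^2 for paired observed and predicted values."""
    n = len(observed)
    errors = [y - y_hat for y, y_hat in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)   # total sum of squares
    return {
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MSE": mse,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration only.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Note that RMSE is the square root of MSE, so the two always rank models identically; reporting both mainly helps readers who prefer errors in the original units (mm).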

Study area

Luoyang City is located in the western part of Henan Province, between 112°16′–112°37′E and 34°32′–34°45′N. It lies in the central part of the Yellow River basin, on the south bank of the river, with Song Mountain to the east, the Qinling Mountains to the west, Ique to the south and the Mang Mountains to the north. Four rivers, the Luo River, the Yi River, the Jian River and the Chanshui River, cross the area, which covers a total of 15,200 square kilometers. The region has a typical temperate continental monsoon climate: spring is relatively dry, summer is hot with more precipitation, autumn is mild and sunny, and winter is cold with sparse rain and snow. Average annual precipitation over the last 50 years is about 674 mm. The geographical location of the study area is shown in Figure 3.
Figure 3

Geographical overview map of the study area.


Data source

In this study, the monthly historical precipitation data of Luoyang City from 1973 to 2021 were obtained from the National Centers for Environmental Information (https://ngdc.noaa.gov/). The data were quality-checked to remove erroneous records in which single-day precipitation is displayed as 99.99. Linear regression and moving average methods were then used to analyse the month-by-month and annual precipitation data, as shown in Figure 4. The monthly and annual precipitation series of Luoyang City are strongly non-stationary and non-linear. Monthly precipitation is concentrated mainly in summer, but summer precipitation varies widely from year to year. Annual precipitation also fluctuates markedly, with an oscillation cycle of roughly four or five years, and the overall trend is a slow increase. More accurate prediction models are therefore needed to make reasonable predictions of precipitation and reduce the possible negative impacts of precipitation fluctuations.
Figure 4

Luoyang City, 1973–2021 (a) Monthly precipitation (orange dashed line is July moving average; blue line is raw data; black line is linear regression) (b) Annual precipitation (orange dashed line is seven-year sliding average; black line is linear regression; blue line is raw data; red line is average).
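
The quality check on the daily records can be illustrated with a short sketch that drops the 99.99 sentinel before aggregating. Several details are assumptions rather than facts from the data source: the records are taken to be `(date, value)` pairs with ISO `YYYY-MM-DD` dates, monthly values are taken to be sums of the remaining daily values, and the helper `monthly_totals` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MISSING = 99.99  # sentinel value that marks an erroneous daily record


def monthly_totals(daily: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum daily precipitation by 'YYYY-MM', skipping sentinel records."""
    totals: Dict[str, float] = defaultdict(float)
    for date, value in daily:
        if abs(value - MISSING) < 1e-9:
            continue  # drop the erroneous record instead of adding 99.99
        totals[date[:7]] += value
    return dict(totals)


# Made-up records for illustration only.
records = [
    ("1973-01-01", 1.5),
    ("1973-01-02", 99.99),
    ("1973-01-03", 0.5),
    ("1973-02-01", 2.0),
]
totals = monthly_totals(records)
```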


Decomposition results

EEMD was used to decompose the historical precipitation data of Luoyang City into seven IMF components and one Res component (Figure 5). The figure shows that IMF1 has the highest frequency and is the least stationary; IMF1, IMF2 and IMF3 all fluctuate relatively strongly, while the waveforms of IMF4 to IMF7 level off and begin to show a certain periodicity. The Res component reflects the historical trend of precipitation in Luoyang City. Compared with the original precipitation series, the fluctuations of each component are more regular, so after EEMD decomposition the variation characteristics of each component can be identified more easily, which helps to improve prediction performance. The statistics of the EEMD decomposition results are shown in Table 1. The standard deviation of every component is smaller than that of the original precipitation data, indicating that each component is more stable than the original series.
Table 1

Statistical analysis of EEMD decomposition results of precipitation data in Luoyang City

| Series | Maximum (mm) | Minimum (mm) | Average (mm) | Standard deviation (mm) |
| --- | --- | --- | --- | --- |
| Raw data | 360.426 | | 47.56782 | 63.56654 |
| IMF1 | 122.4906 | −125.845 | 31.5553 | 41.3389 |
| IMF2 | 115.4464 | −130.461 | 37.80779 | 47.15509 |
| IMF3 | 98.32462 | −72.9705 | 17.8036 | 23.88606 |
| IMF4 | 37.82077 | −54.033 | 10.44778 | 14.1637 |
| IMF5 | 8.553985 | −7.07767 | 3.594369 | 4.149955 |
| IMF6 | 6.073163 | −6.29281 | 3.748799 | 4.168697 |
| IMF7 | 17.02878 | −14.1688 | 7.894088 | 9.062226 |
| Res | 80.06577 | 62.37893 | 4.831101 | 5.490517 |

Figure 5

EEMD decomposition results of monthly precipitation series in Luoyang City from 1973 to 2021.


Model building

To quantify how high the frequency of each IMF component is after EEMD decomposition, the PE algorithm is used to calculate the permutation entropy of each component. The results are shown in Figure 6. The permutation entropy decreases monotonically from IMF1 onwards, down to a value of 0 for Res. For ARIMA models, the smaller the permutation entropy, that is, the lower the complexity of the time series, the higher the prediction accuracy. Figure 6 shows that the permutation entropy of Res is less than 0.2, so Res is taken as the low-frequency series and predicted with the ARIMA model; the permutation entropy values of IMF1–IMF7 are all greater than 0.2, so these components are treated as high-frequency series and predicted with the LSTM network.
Figure 6

Trend of IMF components and Res permutation entropy after EEMD decomposition.
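
Normalised permutation entropy can be computed by counting ordinal patterns in sliding windows. The sketch below is a minimal version; the embedding order (3) and delay (1) are assumptions, since the values used in the study are not stated, and the toy components are made up. A strictly monotonic series has a single ordinal pattern and therefore an entropy of 0, which matches the behaviour described for a smooth trend component.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def permutation_entropy(series: Sequence[float], order: int = 3, delay: int = 1) -> float:
    """Normalised permutation entropy in [0, 1]."""
    n_windows = len(series) - (order - 1) * delay
    if n_windows < 1:
        raise ValueError("series is too short for the chosen order and delay")
    patterns: Counter = Counter()
    for start in range(n_windows):
        window = [series[start + k * delay] for k in range(order)]
        # The ordinal pattern is the argsort of the window; ties keep index order.
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    probs = [count / n_windows for count in patterns.values()]
    entropy = sum(-p * math.log(p) for p in probs)
    return entropy / math.log(math.factorial(order))


def split_by_entropy(
    components: Dict[str, Sequence[float]], threshold: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Return (high-frequency names, low-frequency names) using the PE threshold."""
    high = [name for name, s in components.items() if permutation_entropy(s) > threshold]
    low = [name for name in components if name not in high]
    return high, low


# Toy components for illustration: an irregular oscillation and a smooth trend.
toy = {
    "oscillation": [0.0, 2.0, 1.0, 3.0, 0.5, 2.5, 1.5, 0.2, 2.2, 1.1],
    "trend": [float(i) for i in range(10)],
}
high, low = split_by_entropy(toy)
```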


An LSTM network was built for the high-frequency part. The best configuration was selected through a large number of experiments: 180 hidden units, the ReLU activation function, 600 iterations, and an initial learning rate of 0.002 that is multiplied by a factor of 0.2 after 300 rounds. The Adam optimizer and the mean absolute error (MAE) loss function were used for training, and the dropout rate was set to 10% to control overfitting. An ARIMA model was established for the low-frequency part; after PACF analysis of the Res series, the ARIMA(1, 0, 0) model was finally adopted.
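
The reported training settings can be collected in a small configuration with the step learning-rate schedule written out explicitly. This is a framework-neutral sketch: the dictionary keys and the function `learning_rate` are hypothetical names, and it assumes epochs are counted from 0 so that the reduced rate applies from epoch index 300 onwards.

```python
from typing import Dict, List, Union


def learning_rate(
    epoch: int, initial: float = 0.002, drop_epoch: int = 300, factor: float = 0.2
) -> float:
    """Piecewise-constant schedule: `initial` before `drop_epoch`, then `initial * factor`."""
    return initial if epoch < drop_epoch else initial * factor


# Hyperparameters reported in the text, gathered in one place.
LSTM_CONFIG: Dict[str, Union[int, float, str]] = {
    "hidden_units": 180,
    "activation": "relu",
    "epochs": 600,
    "optimizer": "adam",
    "loss": "mae",
    "dropout": 0.10,
}

schedule: List[float] = [learning_rate(epoch) for epoch in range(int(LSTM_CONFIG["epochs"]))]
```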

Validation of the model

Each IMF component and Res were predicted with the established models, and the predicted and actual values are compared in Figure 7. The LSTM predictions for the IMF1 and IMF2 components do not yet fit the actual values well and have large errors, while the IMF3, IMF4, IMF5, IMF6 and IMF7 components are fitted better, with smaller errors. ARIMA predicted Res well. Because EEMD decomposition is complete, meaning that the components sum to the original series, the final coupled-model prediction of precipitation is obtained by summing the predicted values of all components.
Figure 7

Comparison of predicted and actual values for each IMF component and Res.

To verify the effectiveness and accuracy of the coupled EEMD-LSTM-ARIMA model more intuitively, five single or combined prediction models were also built for comparison: the single LSTM and ARIMA models, and the combined EMD-LSTM, EEMD-ARIMA and EEMD-LSTM models. The predictions of each model are shown in Figure 8. Comparing the fit between each set of predictions and the true values shows that all six models reflect the fluctuation trend of the true precipitation series; the combined models all fit better than the single models; and the EEMD-LSTM-ARIMA coupled model predicts best while the ARIMA model predicts worst.
Figure 8

Comparison of prediction results by models.


Discussion

The predictive performance of each model can be evaluated more clearly using the evaluation indicators RMSE, MAE, MSE and $R^2$. The detailed results for each model are shown in Table 2.

Table 2

Performances of different models

| Model | MAE | MSE | RMSE | $R^2$ |
| --- | --- | --- | --- | --- |
| EEMD-LSTM-ARIMA | 7.15772 | 127.1159 | 11.27457 | 0.87221 |
| EEMD-LSTM | 13.40644 | 452.7453 | 21.27781 | 0.76066 |
| EEMD-ARIMA | 23.20466 | 1126.839 | 33.56842 | 0.58573 |
| EMD-LSTM | 18.002 | 668.5379 | 25.8561 | 0.678614 |
| LSTM | 28.52654 | 1799.315 | 42.41833 | 0.49072 |
| ARIMA | 32.89908 | 2389.303 | 48.8805 | 0.41266 |

As can be seen from Table 2, the single ARIMA model has the largest MAE, MSE and RMSE and the smallest $R^2$, indicating the worst predictive performance. Compared with the single LSTM and ARIMA models, the combined EEMD-LSTM and EEMD-ARIMA models have significantly higher prediction accuracy: MAE decreased by 53.01 and 29.46%, respectively; RMSE decreased by 49.84 and 31.33%, respectively; and $R^2$ improved by 0.26994 and 0.17307, respectively. The combined EEMD-LSTM model predicted better than the EMD-LSTM model: relative to EMD-LSTM, MAE decreased by 25.53%, RMSE decreased by 17.71% and $R^2$ improved by 0.08205. Compared with the combined EEMD-LSTM and EEMD-ARIMA models, the EEMD-LSTM-ARIMA model reduces RMSE by 47.01 and 66.41%, respectively, and increases $R^2$ by 0.11155 and 0.28648, respectively. Across all models, the EEMD-LSTM-ARIMA model has the best prediction performance, with an MAE of 7.15772, an RMSE of 11.27457, an MSE of 127.1159 and an $R^2$ of 0.87221. Its $R^2$ is greater than 0.8 and its MAE, MSE and RMSE are the smallest, indicating that the model predicts with high confidence. The comparison of each index is shown in Figure 9.
Figure 9

Comparison of different model evaluation indicators.
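The relative changes between models can be re-derived directly from the values in Table 2. The sketch below is one way to do that, taking the percentage decrease relative to the baseline model. The names `table2`, `percent_decrease`, and `compare` are hypothetical, and the `R2` key assumes the fourth column of Table 2 is R².

```python
# Values copied from Table 2.
table2 = {
    "EEMD-LSTM-ARIMA": {"MAE": 7.15772, "MSE": 127.1159, "RMSE": 11.27457, "R2": 0.87221},
    "EEMD-LSTM": {"MAE": 13.40644, "MSE": 452.7453, "RMSE": 21.27781, "R2": 0.76066},
    "EEMD-ARIMA": {"MAE": 23.20466, "MSE": 1126.839, "RMSE": 33.56842, "R2": 0.58573},
    "EMD-LSTM": {"MAE": 18.002, "MSE": 668.5379, "RMSE": 25.8561, "R2": 0.678614},
    "LSTM": {"MAE": 28.52654, "MSE": 1799.315, "RMSE": 42.41833, "R2": 0.49072},
    "ARIMA": {"MAE": 32.89908, "MSE": 2389.303, "RMSE": 48.8805, "R2": 0.41266},
}


def percent_decrease(baseline, improved):
    """Decrease of an error metric, as a percentage of the baseline value."""
    return 100.0 * (baseline - improved) / baseline


def compare(baseline_model, improved_model):
    """Summarise how improved_model changes RMSE and R2 versus baseline_model."""
    base, new = table2[baseline_model], table2[improved_model]
    return {
        "RMSE_drop_pct": round(percent_decrease(base["RMSE"], new["RMSE"]), 2),
        "R2_gain": round(new["R2"] - base["R2"], 5),
    }


vs_eemd_lstm = compare("EEMD-LSTM", "EEMD-LSTM-ARIMA")
vs_eemd_arima = compare("EEMD-ARIMA", "EEMD-LSTM-ARIMA")
print(vs_eemd_lstm)
print(vs_eemd_arima)
```

Dividing by the baseline value, rather than by the improved value, is what makes the result a decrease relative to the model being replaced.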

The comparison shows that, because the monthly precipitation series is non-stationary and fluctuates strongly, a single model cannot adequately capture its variation, so the single models predict poorly. Of the two single models, ARIMA performs worse than LSTM: ARIMA requires a stationary time series and cannot capture nonlinear relationships, whereas LSTM can fit nonlinear relationships. EEMD decomposes the original monthly precipitation series into more stationary components, which reduces the influence of random noise and significantly improves the prediction accuracy of each single model. EEMD also avoids the mode mixing problem to which EMD is prone and makes each IMF component more uniform, so the EEMD-LSTM combined model predicts better than the EMD-LSTM combined model. However, the IMF components obtained by EEMD still differ in frequency, so the combined models EEMD-LSTM and EEMD-ARIMA, each of which applies a single model to every component, remain limited. The coupled EEMD-LSTM-ARIMA model uses the PE values to divide the components into high-frequency and low-frequency sequences, predicts the high-frequency sequences with LSTM, and predicts the low-frequency sequences with ARIMA. This plays to the respective strengths of LSTM and ARIMA and avoids the large errors that a single model tends to produce on series for which it is ill suited, so the prediction accuracy of the coupled EEMD-LSTM-ARIMA model is higher than that of the combined EEMD-LSTM and EEMD-ARIMA models.
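The routing described above can be sketched as follows: compute a normalised permutation entropy for each component, send components at or above a threshold to the high-frequency group and the rest to the low-frequency group, then sum the per-component forecasts. This is only an illustration under stated assumptions. The embedding order of 3, the threshold of 0.6, the function names, and the synthetic components are all hypothetical, and a naive persistence predictor stands in for the LSTM and ARIMA models, which are not implemented here.

```python
import math
import random
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalised permutation entropy in [0, 1] (0 = regular, 1 = random)."""
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for i in range(n_windows):
        window = [series[i + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        counts[tuple(sorted(range(order), key=window.__getitem__))] += 1
    entropy = sum(-(c / n_windows) * math.log(c / n_windows) for c in counts.values())
    return entropy / math.log(math.factorial(order))


def split_by_entropy(components, threshold=0.6, order=3):
    """Split components into (high_frequency, low_frequency) lists.

    The threshold is an assumed value, not one taken from the paper.
    """
    high, low = [], []
    for comp in components:
        target = high if permutation_entropy(comp, order) >= threshold else low
        target.append(comp)
    return high, low


def coupled_forecast(high, low, predict_high, predict_low):
    """Sum the per-component forecasts produced by the two predictors."""
    parts = [predict_high(c) for c in high] + [predict_low(c) for c in low]
    return [sum(step) for step in zip(*parts)]


def persistence(component, horizon=3):
    """Stand-in predictor (NOT LSTM or ARIMA): repeat the last value."""
    return [component[-1]] * horizon


# Synthetic components for illustration only.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]   # irregular, high entropy
trend = [0.5 * t for t in range(200)]               # monotonic, zero entropy

high, low = split_by_entropy([noise, trend])
forecast = coupled_forecast(high, low, persistence, persistence)
print(len(high), len(low), forecast)
```

In a real pipeline, `predict_high` would wrap a trained LSTM and `predict_low` a fitted ARIMA model; the summation step is the superposition of the two sets of results.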

Monthly precipitation forecast

The EEMD-LSTM-ARIMA model was used to forecast the monthly precipitation in Luoyang City from 2022 to 2024, and the forecast results are shown in Table 3.

Table 3

Monthly precipitation (mm) in Luoyang City from 2022 to 2024 predicted by the coupled EEMD-LSTM-ARIMA model

Year      Jan     Feb     Mar     Apr     May     Jun      Jul      Aug      Sept     Oct     Nov     Dec
2022      22.92   5.44    29.42   34.18   65.17   74.83    151.52   183.06   102.74   67.2    13.6    2.367
2023      12.84   11.03   22.08   40.62   66.04   60.57    146.55   169.67   95.86    38.03   16.01   5.08
2024      16.27   22.25   34.24   82.67   95.86   107.92   173.36   236.23   59.43    33.53   18.03   7.87
HistAvg*  11.66   15.64   21.9    50.42   62.01   75.69    145.25   127.59   104.27   49.9    28.54   7.1

HistAvg*: historical average, i.e., the average precipitation of each month over the past 30 years.

Of the 36 months in 2022–2024, 16 are forecast to be below the monthly average precipitation of the past 30 years and 20 above it. By climate type, 19 months are dry or less rainy, eight are rainy, and the remaining nine are wet. The predicted precipitation in August 2024 is the maximum of the three forecast years, reaching 236.23 mm. The historical monthly precipitation also shows that heavy precipitation in Luoyang City often occurs in July and August, so the city needs to take preventive measures against urban flooding in these months and, at the same time, prepare for flash floods in mountainous and hilly areas caused by local or continuous rainfall. The driest three consecutive months are likely to be December 2022 to February 2023, with a total forecast precipitation of 26.24 mm. This is consistent with the low precipitation of the local winter and spring, which are controlled by dry and cold continental air masses, so attention should be paid to the threat that drought may pose to agriculture and the ecological environment, as well as to forest fires.
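The counts and extremes quoted from Table 3 can be checked with a few lines of arithmetic. The sketch below uses the Table 3 values as given. The variable names are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sept", "Oct", "Nov", "Dec"]

# Forecast monthly precipitation (mm), copied from Table 3.
forecast_mm = {
    2022: [22.92, 5.44, 29.42, 34.18, 65.17, 74.83, 151.52, 183.06, 102.74, 67.2, 13.6, 2.367],
    2023: [12.84, 11.03, 22.08, 40.62, 66.04, 60.57, 146.55, 169.67, 95.86, 38.03, 16.01, 5.08],
    2024: [16.27, 22.25, 34.24, 82.67, 95.86, 107.92, 173.36, 236.23, 59.43, 33.53, 18.03, 7.87],
}
# 30-year historical monthly averages (mm), copied from Table 3.
hist_avg_mm = [11.66, 15.64, 21.9, 50.42, 62.01, 75.69, 145.25, 127.59, 104.27, 49.9, 28.54, 7.1]

# Months above / below the historical average.
pairs = [(v, h) for year in forecast_mm.values() for v, h in zip(year, hist_avg_mm)]
above = sum(v > h for v, h in pairs)
below = sum(v < h for v, h in pairs)

# Chronological timeline of (year, month, value).
timeline = [(year, MONTHS[i], value)
            for year in sorted(forecast_mm)
            for i, value in enumerate(forecast_mm[year])]

# Wettest single month and driest run of three consecutive months.
wettest = max(timeline, key=lambda item: item[2])
windows = [timeline[i:i + 3] for i in range(len(timeline) - 2)]
driest = min(windows, key=lambda w: sum(item[2] for item in w))
driest_total = round(sum(item[2] for item in driest), 2)

print(above, below)
print(wettest)
print(driest[0][:2], driest[-1][:2], driest_total)
```

The three-month window is slid across year boundaries, which is why a run such as December to February can be detected.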

To address the nonlinear and non-stationary characteristics of precipitation series, this paper adopts the EEMD algorithm to decompose the precipitation series into modal components of different frequencies, which avoids the mode mixing problem. The PE algorithm is then used to divide the modal components into a high-frequency part and a low-frequency part, which are predicted by the LSTM and ARIMA models, respectively. This approach reduces the prediction error of each component, improves the overall prediction accuracy, and enhances prediction stability, so it can be applied to monthly precipitation prediction.

The results of month-by-month precipitation prediction for 2012–2021 in Luoyang City show that decomposing the raw data with the EEMD method significantly improves prediction performance. The single LSTM and ARIMA models handle the precipitation time series of Luoyang City poorly, and their predictions have larger errors than those of the combined models. The coupled EEMD-LSTM-ARIMA model proposed in this paper makes full use of the advantages of each method to obtain more accurate predictions: the indicators show that it outperforms the combined models EMD-LSTM, EEMD-LSTM, and EEMD-ARIMA, and that its prediction results have higher confidence.

The model proposed in this paper effectively improves the accuracy of month-by-month precipitation prediction. However, this study did not consider other variables that affect precipitation, so it remains open whether including them would improve the predictions further. Future work can add such variables for multivariate prediction and can also test other combined models.

This study was funded by the Henan Provincial Key R&D and Promotion Special Project (Science and Technology Tackling) (182102210066) and the National Natural Science Foundation of China (51709115).

Ethical Approval Not applicable.

Consent to Participate Not applicable.

Consent to Publish Not applicable.

Competing interests None.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Chang Q. & Zhao X. L. 2011 Research on the application of time series models in precipitation prediction. Computer Simulation 29 (7), 204–206.

Hasan N., Nath N. C. & Rasel R. I. 2015 A support vector regression model for forecasting rainfall. In: 2015 2nd International Conference on Electrical Information and Communication Technologies (EICT). IEEE, pp. 554–559.

He S. P., Wang H. J., Li H. & Zhao J. Z. 2021 Principles of machine learning and its potential applications in climate prediction. Journal of Atmospheric Sciences 44 (01), 26–38.

Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.

Huang H. & Gao X. F. 2022 Application of Markov optimization model based on Pearson hierarchical clustering in precipitation prediction. China High-Tech 14, 111–113.

Huang N. E., Shen Z., Long S. R., Wu M. C., Shih H. H., Zheng Q. & Liu H. H. 1998 The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454 (1971), 903–995.

Kala A., Vaidyanathan S. G. & Femi P. S. 2022 CEEMDAN hybridized with LSTM model for forecasting monthly rainfall. Journal of Intelligent & Fuzzy Systems 43, 2609–2617.

Kumar D., Singh A., Samui P. & Jha R. K. 2019 Forecasting monthly precipitation using sequential modelling. Hydrological Sciences Journal 64, 690–700.

Lai Y. C. & Dzombak D. A. 2021 Use of integrated global climate model simulations and statistical time series forecasting to project regional temperature and precipitation. Journal of Applied Meteorology and Climatology 60, 695–710.

Li J. X., Qian K. X., Liu Y., Yan W., Yang X., Luo G. & Ma X. 2022 LSTM-based model for predicting inland river runoff in arid region: a case study on Yarkant River, Northwest China. Water 14 (11), 1745.

Liu S. F. & Duan A. M. 2017 A statistical model for predicting summer precipitation in eastern China based on the spring thermal anomaly signal on the Qinghai-Tibet Plateau. Journal of Meteorology 75 (6), 14.

Liu X., Zhao N., Guo J. Y. & Guo B. 2020 Monthly precipitation prediction on the Tibetan Plateau based on LSTM neural network. Journal of Geoinformation Science 22 (08), 1617–1629.

Simon M. P. & Alberto M. 2019 Global and regional increase of precipitation extremes under global warming. Water Resources Research 55 (6), 4901–4914.

Tang B. P., Dong S. J. & Ma J. H. 2012 Research on EMD modal confounding elimination method based on independent component analysis. Journal of Instrumentation 33 (07), 1477–1482.

Wang W. C., Chau K. W., Xu D. M. & Chen X. Y. 2015 Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition. Water Resources Management 29, 2655–2675.

Wu Z. H. & Huang N. E. 2009 Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in Adaptive Data Analysis 1 (1), 1–41.

Yang Q., Qin L., Gao P. & Zhang R. B. 2021 Annual precipitation prediction in the economic zone of the northern slope of Tianshan Mountain based on EEMD-LSTM model. Arid Zone Research 38 (05), 1235–1243.

Yin H. L., Zhang X. W., Wang F., Zhang Y., Xia R. & Jin J. 2021 Rainfall-runoff modeling using LSTM-based multi-state-vector sequence-to-sequence model. Journal of Hydrology 598, 126378.

Yu Y. H., Zhang H. B. & Singh V. P. 2018 Forward prediction of runoff data in data-scarce basins with an improved ensemble empirical mode decomposition (EEMD) model. Water 10 (4), 388.

Zarei M., Najarchi M. & Mastouri R. 2021 Bias correction of global ensemble precipitation forecasts by Random Forest method. Earth Science Informatics 14, 677–689.

Zhang W. & Zhou T. 2019 Significant increases in extreme precipitation and the associations with global warming over the global land monsoon regions. Journal of Climate 32 (24), 1477–1493.

Zhou Y. T., Wang D., Wang Y. K., Wang W. P. & Meng D. Q. 2020 Multi-indicator optimization of a typical precipitation forecasting ANN for the western part of Taihu Lake. Hydrology 40 (01), 35–39.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).