Abstract
This study introduces ensemble empirical mode decomposition (EEMD) coupled with the autoregressive integrated moving average (ARIMA) model for drought prediction. In the realm of drought forecasting, we assess the EEMD-ARIMA model against the traditional ARIMA approach, using monthly precipitation data from January 1970 to December 2019 in Herat province, Afghanistan. Our evaluation spans various timescales of standardized precipitation index (SPI) 3, SPI 6, SPI 9, and SPI 12. Statistical indicators like root-mean-square error, mean absolute error (MAE), mean absolute percentage error (MAPE), and R2 are employed. To comprehend data features thoroughly, each SPI series initially computed from the original monthly precipitation time series. Subsequently, each SPI undergoes decomposition using EEMD, resulting in intrinsic mode functions (IMFs) and one residual series. The next step involves forecasting each IMF component and residual using the corresponding ARIMA model. To create an ensemble forecast for the initial SPI series, the predicted outcomes of the modeled IMFs and residual series are finally added. Results indicate that EEMD-ARIMA significantly enhances drought forecasting accuracy compared to conventional ARIMA model.
HIGHLIGHTS
Improved drought forecasting: Our study introduces the ensemble empirical mode decomposition–autoregressive integrated moving average (ARIMA) model, enhancing drought forecasting accuracy over traditional ARIMA methods.
New drought model for Western Afghanistan: We present a customized model for drought prediction in Western Afghanistan, based on the standardized precipitation index.
INTRODUCTION
Location of precipitation stations, Herat province, Afghanistan (the study area).
Location of precipitation stations, Herat province, Afghanistan (the study area).
The gradual onset and development of drought can be beneficial for researchers to predict these occurrences ahead of time (Rossi 2003; Cancelliere et al. 2007). Mitigating risks has become an essential aspect of planning and policy-making processes. In recent years, this has been facilitated by the emergence of various modeling techniques that provide valuable information about precipitation shortfalls and ultimately enhance drought monitoring capabilities. Because drought adversely affects crops seasons and exacerbate water shortage crisis and have direct influence on human quality life, an early authentic drought prediction plays an indispensable role to mitigate its side effects (Alam et al. 2014).
Through investigating of drought literature, it can be easily found that there are several drought indices (DIs) including standardized precipitation evapotranspiration index, Palmer drought severity index, standardized precipitation index (SPI), effective drought index, and reconnaissance drought index. SPI is one of the most common DIs widely employed for drought forecasting (McKee et al. 1993) and has been endorsed by the World Meteorological Organization (WMO) (Hayes et al. 2011). The calculation of SPI is based on distribution of precipitation, with monthly rainfall amounts being normalized using a gamma or Pearson type III distribution (McKee et al. 1993). Several benefits for employing the SPI index including less complex computation process, offering stable spatial interoperation in different climate area, and its probabilistic features (Guttman 1998; Zargar et al. 2011) were among compelling reasons to utilize this index in the current study. The importance of SPI is mostly in areas with data-sparse characteristics where immediate access to other parameters including evaporation, streamflow, and soil moisture information is impossible (Pasteris et al. 2005). In addition to the aforementioned advantages of using SPI, it can be computed in different timescales that offers a wide perspective into various kinds of drought. For instance, short to medium timescale proves effective in understanding meteorological and agricultural drought, while longer timescales are more applicable to hydrological and socioeconomic drought (Gumus & Algin 2017). The SPI possesses a multitude of versatile applications. In recent years, it has been employed to evaluate groundwater drought and to perform spatiotemporal analysis of floods and droughts (Kumar et al. 2016; Liu et al. 2018; Anshuka et al. 2019).
Although forecasting drought events present inherent challenges due to the complexity of time series data, several approaches have emerged in recent years, including autoregressive integrated moving average (ARIMA), artificial neural network (ANN), support-vector machine (SVM), fuzzy logic, and Gaussian process regression.
For instance, a singular spectrum analysis (SSA) and single least-square support-vector machine (LSSVM) model combined to anticipate drought. Data on the SPI are obtained for the southern Taiwanese Tseng-Wen reservoir catchment (1975–2015). According to the results, LSSVM1 (an LSSVM-based technique that uses antecedent SPI as an input) is not as effective as SSA-LSSVM2 (an LSSVM-based model combined with SSA utilizing the antecedent SPI as input). Furthermore, SSA-LSSVM3 is the most appropriate model for SPI drought forecasting when SSA-LSSVM2 and SSA-LSSVM3 (an SSA-LSSVM-based technique that uses antecedent cumulative monthly rainfall as an input) are compared for performance (Pham et al. 2021; Alawsi et al. 2022). In addition, the wavelet packet transform (WPT) method was applied to the data preprocessing step. The model's construction and evaluation were based on India's SPI from 1985 to 2013. The results show that hybrid strategies – WPT-ANN and WPT-support-vector regression (SVR) – perform better than ANN and SVR, which are standalone machine learning approaches. The hybrid WPT-ANN model fared better than the WPT-SVR model for the majority of grid sites (Das et al. 2020). In Ethiopia's Awash River Basin, three models were utilized to forecast SPI 3 and SPI 12 between 1970 and 2005: SVR, ANN, and hybrid wavelet and ANN (W-ANN). The performances of each model were evaluated using R2, root-mean-square error (RMSE), and mean absolute error (MAE). The findings show that the most effective method for predicting droughts is to use wavelet neural networks (W-ANN) (Belayneh & Adamowski 2012).
Among these methods, ARIMA, a data-driven approach, has been widely utilized in drought and streamflow forecasting due to its ability to consider serial correlations inherent in time series data (Mishra & Desai 2005). Compared to alternatives like ANN and exponential smoothing, ARIMA offers advantages such as a comprehensive search stage involving model identification, estimation, and diagnostic checking, making it a popular choice for various types of time series data (Zhang 2003; Mishra & Desai 2005). However, it is important to note that according to Nourani et al. (2009), ARIMA is most suitable for stationary and some simple nonstationary datasets, and it may have limitations in capturing nonstationary time-oriented data. To address this limitation, decomposing techniques have been introduced to enhance the accuracy of ARIMA models for nonstationary series. For example, in a study by Rezaiy & Shabri (2023), a hybrid wavelet and ARIMA (W-ARIMA) model was employed for drought prediction in Kabul, Afghanistan. This approach aimed to overcome the limitations of the ARIMA model in capturing the nonstationarity of SPI values. The study's results demonstrated the superior performance of the W-ARIMA model over the traditional ARIMA in predicting drought events, as evidenced by various error metrics. Similarly, another study (Shabri 2014) focused on utilizing the SPI time series as a drought indicator. The investigation assessed the effectiveness of a coupled wavelet-adaptive neuro-fuzzy inference system (WANFIS) model for drought forecasting. The analysis incorporated SPI 3, 6, 9, and 12 series to evaluate drought conditions. To illustrate the applicability of the WANFIS model in drought forecasting, rainfall data from the Klang River basin in Malaysia were obtained. Time series data typically include both linear and nonlinear fluctuations from many applications. Low-precision linear ARIMA and nonlinear ANN models can be used to model discrete time series of drought events. But when the advantages of ARIMA and ANN models were merged, a hybrid model was created that performed better than either model alone. In the study by Khan et al. (2020), a novel ARIMA-ANN hybrid forecasting model is created, utilizing the discrete wavelet transform. It is determined that for SPI and standard index of annual precipitation, which are regarded as meteorological DIs, the wavelet-ARIMA-ANN (W-2A) model assisted in enhancing forecasting accuracy. In the Algerois basin of Algeria, the SPI at three timescales (SPI 3, SPI 6, and SPI 12) was projected using the combined wavelet (W) and neural networks (W-ANN) technique (1936–2008). The solo, ANN, and two conventional stochastic models (ARIMA and SARIMA) were contrasted with the combined model. According to the findings, the wavelet approach enhances the quality of the data, and W-ANN performs best when considering various statistical parameters (Djerbouai & Souag-Gamane 2016).
These techniques involve generating stationary sub-series that are predicted using ARIMA, and the reconstructed combination of these sub-series serves as the prediction for the original time series. Several studies showed that hybrid models are more capable than applying individual models in time series analyzing except when the single model decomposed by preprocessing techniques such as wavelet transform. However, the application of different hybrid models increased in time series analysis because of their higher prediction accuracy opposed to single models, and some research studies announced that extreme utilization of such hybrid models lowered prediction accuracy and produced a model with no significant excellency (Khan et al. 2020).
Hydrological time series records including drought events encompass some subprocess and modeling, making it occasionally improper to use an individual model. To achieve a better comprehension of hidden physical features in time series, the original data's time–frequency change laws can be investigated by applying decomposition techniques (Wang et al. 2018). To simplify the prediction of a complicated series, a suitable decomposition approach can be exercised to the original time series to separate a single series into some sub-series, capturing a different aspect of original signal, and then to improve precision of the forecasting accuracy.
A majority of decomposition techniques used in earlier studies are based on wavelet analysis (WA), which can be used to recover potential information from nonstationary signals. The primary characteristic of WA is its capacity to localize a process on a temporal scale (Nourani et al. 2014). Multiple studies have shown that utilizing wavelet-decomposed time series as model inputs can produce more accurate predictions than using the original time series (Liu et al. 2011; Nourani et al. 2012; Nourani et al. 2014; Badrzadeh et al. 2016). However, several disadvantages for WA are not insignificant. For instance, poor mother wavelet functions or a low level of decomposition will reduce the accuracy of predictions.
As previously mentioned, ARIMA stands out as a popular choice for drought prediction owing to its enhanced flexibility and ability to capture comprehensive data changes over time. However, relying solely on the forecasting accuracy of a single model may fall short in meeting the demands of effective drought prediction. Moreover, precipitation data often exhibits nonlinear and nonstationary features. To enhance prediction accuracy, researchers turn to the use of hybrid models in the field of drought research.
In addition to wavelet transform, empirical mode decomposition (EMD) is one such method employed for processing nonstationary and nonlinear signals with frequency components. Unlike wavelet transformation, which may suffer from the limitations of poor mother wavelet functions or a low level of decomposition leading to lower accuracy in drought prediction, EMD emerges as a valuable technique. Combining EMD with ARIMA proves to be an effective strategy for improving time series drought prediction. This integration addresses the shortcomings associated with the purely linear and stationary nature of ARIMA, making the hybrid model better suited to handle the complexities inherent in drought forecasting (Libanda & Nkolola 2022; Xu et al. 2022).
Huang et al. (1998) proposed EMD, a data-adaptive technique that operates directly in temporal space as opposed to the equivalent frequency space as in WA (Huang & Wu 2008). Without requiring any prior basis functions, EMD sifts nonlinear and nonstationary series into several intrinsic mode functions (IMFs) and a residual component (Lee & Ouarda 2010; Zhu et al. 2016). As a data preparation technique, it can enhance the functionality of data-driven models (Wu & Huang 2009). According to Wang et al. (2018), EMD performs better than spectral time series analysis technology in analyzing the distinctive scales of annual streamflow time series. To forecast annual streamflow, Zhang et al. (2016) proposed a hybrid data-driven model that combines the external force variable, radial basis function neural networks, and EMD. They came to the conclusion that the hybrid model is workable for data-driven hydrologic forecasting in complex socio-hydrologic systems.
Wu & Huang (2009) proposed ensemble empirical mode decomposition (EEMD) with the aid of white noise to solve the mode mixing problem of the EMD method, which arises when a single IMF contains signals of widely different scales or when a signal of the same scale resides in different IMF components of the EMD method. With the addition of white noise and without the requirement for any basis functions, it can clearly combine various scale signals into suitable IMFs (Wu & Huang 2009). The mode mixing issue with the EMD approach is effectively resolved by EEMD. The genuine IMF component is defined by EEMD as the mean of an ensemble of trials, where each trial comprises the original signal and a finite-amplitude white noise. With its constituent parts belonging to various scales, the additional white noise is uniformly present over the entire time–frequency space. The mode mixing issue can be somewhat eliminated by EEMD, while the physical uniqueness of the decomposition can be maintained (Wu & Huang 2009). By employing EEMD, each component represents the real instinct change rule for a specific timescale of the original data.
For the prediction of hydrological streamflow, hybrid approaches combining decomposition and reconfiguration models have been quite popular recently. For monthly streamflow forecasting, Zhu et al. (2016) combined EMD and SVM models, and the hybrid model's best mean absolute percentage error (MAPE) is 17%. To create three hybrid models for the monthly streamflow forecasting of two stations, Zhang et al. (2015) merged three preprocessing techniques – WA, EMD, and single spectrum analysis (SSA) – with the autoregressive moving average (ARMA) model. These models are WA-ARMA, EMD-ARMA, and SSA-ARMA. EMD-ANN hybrid models with a predictive accuracy correlation coefficient (R) of 0.801 were developed by Kisi et al. (2014). For hydrological forecasting, attempts to couple EEMD with ANN were also made (Di et al. 2014; Wang et al. 2015; Zhao & Chen 2015). The original hydrological time series was broken down into several components by EEMD, and these components were used as inputs in ANN models for forecasting, and the outcomes supported its validity.
However, not many scientists have used EEMD to predict droughts. As a result, the primary goal of this study is to provide a novel hybrid method that effectively combines the benefits of two distinct models including the EEMD and ARIMA, for SPI-based drought prediction. Previous research on the drought in Afghanistan did not examine this hybrid method. The predictive hybrid model developed in this study using the SPI time series contributes to the existing knowledge and offers a fresh method for predicting future drought conditions. In summary, our research introduces a novel hybrid approach, combining EEMD with the ARIMA model for SPI-based drought prediction in Herat, Afghanistan – a region prone to severe drought but lacking sufficient scientific exploration. This study addresses a crucial literature gap by introducing EEMD, an underexplored technique in drought forecasting. By combining the strengths of ARIMA and EEMD, the hybrid model presented here aims to enhance predictive accuracy. This fills a significant gap in the literature and provides a valuable tool for water resource management in the drought-prone area of Herat, Afghanistan.
MATERIALS AND METHODS
Study area and data
Afghanistan has been suffering from a severe to below-normal drought, which has impacted the accessibility and availability of clean drinking water. Around 79% of homeowners reported having insufficient access to water for basic needs like drinking, cooking, bathing, and hygiene, according to the study by Mayar (2021). Therefore, by foreseeing drought, policymakers and managers of the water supply can at least lessen its disastrous impacts by taking sensible precautions in advance. The World Bank Climate Change Knowledge Portal (https://climateknowledgeportal.worldbank.org) was used to obtain the monthly precipitation data for Herat, the largest and most significant urban area in western Afghanistan, from January 1970 to December 2019 for this study.
Afghanistan is located in the southern boundary of the Eurasian plate, which was suggested during the Permian and Triassic periods, according to Ruleman et al. (2007). It consists of three thick sedimentary rock regions: the Southeastern Katawaz region, the Northern Afghanistan Basin, and the Southwestern Afghanistan Basin. The majority of Herat City's location is within the Basin of Southwest Afghanistan. The Hari Rud-Murghab River System, a 1,100 km long fluvial system, dominates the extensively irrigated areas surrounding Herat City and accounts for about 12% of Afghanistan's total water resources. At an elevation of 922 m, Herat in western Afghanistan is located on the Hari Rud River with the latitude and longitude coordinates of 34.343044 and 62.199074, respectively. Figure 1 depicts the geographical region under investigation.
Herat province is one of the provinces in Afghanistan grappling seriously with the issue of water scarcity. Mismanagement of surface waters has led farmers to increasingly rely on groundwater resources in recent years, significantly diminishing the water tables in this province. The substantial decrease in rainfall, irrational water consumption, and unregulated digging of deep wells are major factors contributing to the fundamental decline in groundwater levels, exacerbating drought conditions in the region. Concerns among the citizens of Herat are heightened as the current water consumption solely relies on underground sources, potentially leading to severe water shortages and increased aridity in the future. Moreover, in a country like Afghanistan, characterized by arid regions, environmental threats and water resource preservation are of paramount importance, especially considering the influence reaching the northern parts of the country. The current precipitation levels, water resource limitations, and climatic conditions underscore the necessity for drought preparedness and serious measures to address and mitigate its impacts when it occurs. Drought signifies various environmental effects on living organisms, animals, and humans, whereas aridity is a generalized concept not applicable to environmental elements. Proper management and reduction of drought-related consequences in regional development require strategic planning and proactive measures, demanding the application of sufficient knowledge in drought prediction (Timuri 2023).
Standardized precipitation index
The SPI, a widely used DI for detecting drought events, was first proposed by McKee et al. (1995). It uses historical precipitation data in space and includes both positive and negative values; the former show surpluses, while the latter show shortages. Despite the fact that there are numerous DI, the WMO tried to create a standard index that can be used as a starting point for all regions and nations, as opposed to other DI that can be applied in only certain areas and require large datasets and complicated execution procedures (Yihdego et al. 2019). In contrast to other DIs such as the Palmer index and crop moisture index, the SPI is solely based on precipitation, which makes its evaluation relatively simple (Cacciamani et al. 2007). SPI is regarded as the DI for illustrating the variability in Eastern African drought (Ntale & Gan 2003), and it can describe drought at various timescales, which is one of the main advantages of using SPI to forecast the occurrence of droughts.
According to Belayneh & Adamowski (2013), SPI can be calculated by fitting a probability distribution to monthly precipitation aggregate series (3, 6, 9, 12, and 24 months). This probability density function (PDF) was then converted into a normal standardized index, whose values classify the category of drought characteristics for each location and timescale.








The constants c0 = 2.515517, c1 = 0.802853, c2 = 0.010328, d1 = 1.432788, d2 = 0.189269, and d3 = 0.001308.
Table 1 displays the SPI categorization.
To sum up, the SPI was chosen for this study due to its unique advantages over other drought measures. Supported by the WMO, SPI provides robust spatial interoperability across a variety of climate zones with a simpler computing process. Because of its probabilistic properties, it is a solid option, particularly in regions with data-sparse characteristics, where immediate access to additional parameters may be challenging. Furthermore, a thorough understanding of the various types of droughts – from hydrological and socioeconomic droughts to meteorological and agricultural droughts – is made possible by SPI's versatility in computing different timescales. This study highlights SPI's critical role in tackling water resource management issues, particularly in Herat province, by utilizing its versatility to improve drought prediction in Afghanistan.
Classification of drought and wetness using SPI
SPI values . | Category . |
---|---|
+2 and higher than +2 | Extremely wet |
1.5 to 1.99 | Very wet |
1 to 1.49 | Moderately wet |
−0.99 to 0.99 | Nearly normal |
−1.49 to −1 | Moderate drought |
−1.99 to −1.5 | Severe drought |
−2 and less than −2 | Extreme drought |
SPI values . | Category . |
---|---|
+2 and higher than +2 | Extremely wet |
1.5 to 1.99 | Very wet |
1 to 1.49 | Moderately wet |
−0.99 to 0.99 | Nearly normal |
−1.49 to −1 | Moderate drought |
−1.99 to −1.5 | Severe drought |
−2 and less than −2 | Extreme drought |
ARIMA model




SARIMA model
Empirical mode decomposition
A nonstationary data processing method is EMD (Huang et al. 1998; Wang et al. 2018). Numerous applications of EMD have been made in the field of hydrological analysis to demonstrate its effectiveness without the necessity for basic functions (Lee & Ouarda 2010; Zhu et al. 2016). Different timescale features are present in hydrological time series, and these characteristics may be recovered using EMD as IMFs with various timescales. Each IMF must adhere to the following two requirements: (1) In the whole dataset, the number of zero crossings and local extreme values must be equal or separated by one. (2) At any time point, the mean values of both the upper envelope, which is defined by the local maximum, and the lower envelope, which is defined by the local lowest, must be zero.
The EMD procedure can be described as follows (Wu & Huang 2009):
- 1.
Let
stand for the SPI original time series. To create the upper envelope,
, and lower envelope,
, all local extremes of x(t) are detected, and all maxima and minima are connected by a cubic spline line (Hou & Andrews 1978). The cubic spline might have a large swing at the endpoint, which is a severe issue for the spline fitting approach. To solve this issue, Huang & Wu (2008) inserted characteristic waves at the endpoints of the original time series, which are determined by the two successive extrema for the frequency and amplitude of the additional waves. Large swings have been effectively contained using this technique.
- 2.
- 3.To obtain the component h(t), as stated in Equation (14), m(t) is subtracted from the initial time series x(t).
By using the aforementioned procedure, the initial SPI series can be broken down into a number of straightforward IMFs and a residual sequence, revealing the distinct scale and probable trend of time series. The EMD technique is an empirical, direct, and adaptable approach for data analysis that, in contrast to all the prior methods, analyzes data in temporal space rather than matching frequency space (Huang & Wu 2008).
Ensemble empirical mode decomposition
Due to signal intermittency, EMD may have mode mixing issues that cause timescale inaccuracies in IMF components, which prevents these components from accurately representing the real timescale features of the original time series (Wu & Huang 2009). The EEMD approach was put out by Wu & Huang (2009) to address such mode mixing issues. The average of a group of trials, each of which consists of a signal and a white noise with a limited amplitude, is what is referred to as the actual IMF components. The extra white noise would evenly spread the time–frequency space with a number of components at various scales. The signals with varied scales are automatically projected onto the appropriate scales of reference provided by the white noise in the background when they are added to the uniform white backdrop. Thus, this strategy effectively solves the mode mixing problem (Wu & Huang 2009). As a result, each IMF component exhibits characteristics of the original data's appropriate timescale. The following succinct summary of EEMD implementation is provided by Wu & Huang (2009):
- 1.
- 2.
Use EMD to deconstruct time series data after adding white noise.
- 3.Repeat steps 1 through 2 with various white noise, adding fresh white noise each time, integrating the results of the decomposition, and taking the average as the final result:where N is indicating the number of white noises added to the model, and
is the ith IMF following the addition of the jth white noise.
Because EEMD is effective at addressing the nonstationarity and nonlinearity present in drought time series data, it is frequently used as a preprocessing technique in conjunction with models such as ARIMA, ANN, SVM, and others in the field of drought forecasting. By breaking down data into IMFs, EEMD makes it easier to conduct a multiscale study and captures the different temporal scales that impact drought events. Its versatility in handling intricate patterns and capacity to boost the signal-to-noise ratio render it advantageous in distinguishing meaningful signals from background interference, hence augmenting the precision of ensuing forecasting models. Enhancing forecasting performance and supporting decision-making in water resource management and related applications, EEMD offers a more nuanced depiction of the underlying dynamics and mitigates assumptions violations in standard models (Başakın et al. 2021; Liu et al. 2022).
By decomposing the original data series, EEMD can effectively handle nonlinear and nonstationary data series and extract valuable information from the data. Wang et al. (2015) present the use of EEMD in conjunction with the ARIMA model to forecast yearly runoff time series. The findings suggest that EEMD can significantly improve predicting accuracy. Because of its strong generalization capabilities, the radial basis function neural network (RBFN) model is frequently employed in the time series analysis. It is capable of effectively forecasting data series. In recent years, ANN has been applied extensively and with good results in a wide range of fields. To solve the hydrologic problem, a hybrid model called EEMD–RBFN–SVM was presented. It was based on EEMD, RBFN, and SVM. The EEMD–RBFN–SVM model's performance was assessed using precipitation data from the Qinghai Yushu Tibetan Autonomous Region, which exhibits clear outperformance of hybrid EEMD with other RBFN and SVM in the context of hydrological time series prediction (Jiao et al. 2016).
The success of combining EEMD with various statistical and machine learning methods to tackle nonlinear and nonstationary time series has strongly encouraged us to incorporate this preprocessing approach into the traditional ARIMA framework for drought forecasting.
2.7. Hybrid EEMD-ARIMA model
Since the hybrid model combines multiple approaches, it is expected to perform better than other SPI-based drought prediction models. The hybrid method utilizes the complementing strengths of various models by combining preprocessing techniques such as EEMD with statistical methods like ARIMA. To address the nonlinear and nonstationary properties included in SPI time series data, this integration is especially important because it enables a more thorough description of intricate patterns. Furthermore, the hybrid model is expected to outperform in capturing and forecasting complex dynamics of drought conditions based on SPI metrics due to its adaptability to varied temporal scales, decreased sensitivity to model assumptions, and improved accuracy through the collaborative expertise of different methods (Wang et al. 2018; Alawsi et al. 2022).
- 1.
First, using the EEMD methodology, the original yearly SPI time series y(t) (t = 1, 2,…, n) is divided into m IMF components, ci(t), i = 1, 2,…, m, and one residual component rm(t).
- 2.
Second, the ARIMA model is applied as a forecasting tool to model the decomposed IMF and residual components and produce the corresponding prediction for each extracted IMF component and residual component, respectively.
- 3.
Finally, the forecasting outcomes of all extracted IMF and residual components acquired by ARIMA are aggregated to provide an output that may be utilized as the final forecasting outcome for the initial yearly SPI time series.
In conclusion, the hybrid EEMD-ARIMA forecasting model implements the decomposition and ensemble concepts. The ensemble creates a consensus forecast based on the original data, whereas the decomposition simplifies the forecasting work. Herat annual precipitation time series from January 1970 to December 2019 are used for testing purposes in this study to verify and make the pattern of the extracted IMFs and residual components reflect the forecasting model and to improve forecasting performance.
Evaluation indicators
Key metrics such RMSE, MAE, MAPE, and R2 have specific meanings in the context of drought prediction and management. Decision-makers can better comprehend and gauge the severity of the situation when the forecasting model more correctly captures the magnitude of drought occurrences, as indicated by the reduced RMSE and MAE. Simultaneously, a lower MAPE signifies a closer percentage alignment between the expected and observed values, providing reliable data for mitigation measures and resource allocation. Furthermore, a higher R2 indicates that a greater percentage of the variability in the observed drought conditions can be explained by the model, providing confidence in its capacity to replicate drought patterns and enabling sound resource allocation and management decision-making.
R2 is a significant parameter to measure the explanatory power of the model and its capacity to explain the variance in the observed SPI values in the context of SPI-based drought forecasting. A higher R2 indicates that the forecasting model can account for a greater share of the variability in SPI, which represents standardized precipitation anomalies. Practically speaking, a high R2 indicates that the model is good at identifying and recreating the dynamics and patterns present in SPI time series data. This reflects the degree to which the model offers a trustworthy depiction of observed drought conditions, which is important information for decision-makers in drought management. Conversely, a low R2 can suggest that the model is not doing a good enough job of explaining the variability in SPI, which could result in fewer accurate forecasts. Consequently, a careful analysis of R2 in SPI-based drought forecasting highlights the explanatory ability of the model, affecting the level of trust that managers of water resources might have in the forecasted results and facilitating better-informed decision-making in drought-prone areas.
APPLICATION ANALYSIS AND DISCUSSION
SPI 3, 6, 9, and 12 plots based on historical data from Herat (the case study area).
SPI 3, 6, 9, and 12 plots based on historical data from Herat (the case study area).
Decomposing SPI time series using EEMD
where n represents ensemble number, is the amplitude of added white noise, and
is indicator of the standard deviation of error. Wu & Huang (2009) came to the conclusion that the effect of the additional white noise should diminish using Equation (21) and noted that if the additional noise amplitude is too small, it might not result in the change of extrema that the EMD depends on; in contrast, if the additional noise is too large, it would produce redundant IMF components. According to the experimental findings in the majority of cases, the injected noise's magnitude should be roughly 0.2 times the standard deviation of the sample data (Wu & Huang 2009). Zhang et al. (2010) studied the EEMD parameter settings for various applications, and their results on the additional white noise amplitude are consistent with the results presented by Wu & Huang (2009). The EEMD approach is used in this work with an ensemble size of 100 and a white noise amplitude of 0.2 times standard deviation. Wu & Huang (2009) provide a detailed examination of the impacts of noise and ensemble averaging on decomposition with reference to these factors; thus, it is not necessary to repeat it here.
Hence, this study follows the statistical principles outlined by Wu & Huang (2009) to determine the amplitude of added white noise, ensemble number, and the indicator of the standard deviation of error (σ). These principles guide the selection of white noise magnitude and ensemble size for the EEMD process. The parameters are fixed based on Equation (22). Aiming for an ideal balance, the selected white noise amplitude was roughly 0.2 times the sample data standard deviation. This allowed for enough impact without adding unnecessary information, which is consistent with the results of the experiments and the literature (Zhang et al. 2010). The ensemble size, set at 100, followed established practices, considering the robustness of the decomposition process for precipitation data. So, to ensure the physical relevance and accuracy of the EEMD decomposition in the context of drought forecasting for SPI time series, established guidelines and literature findings were taken into consideration while choosing the values.
All IMFs were found to have unique amplitudes, wavelengths, and frequencies. The first IMF had the smallest wavelength, highest frequency, and greatest amplitude. In comparison to the first IMF, the second IMF had a lesser amplitude, lower frequency, and longer wavelengths. Similar to this, the pattern continues to the last IMF in which the wavelength grows while the matching frequency and amplitude decreases. As a result, the decomposition offered by EEMD is physically relevant with various scales for each independent IMF (Debert et al. 2011). The residue component of the time series for monthly precipitation shows the overall trend. Decomposition can be therefore used to convert nonstationary, nonlinear time series into stationary time series and to enhance drought forecasting accuracy.
ARIMA modeling approach
In general, the ARIMA model is frequently used for time series forecasting for the Box–Jenkins approach (Box & Jenkins 1976). However, for the ARIMA model to be applied, the time series must be stationary. In other words, the ARIMA algorithm assumes that the process maintains equilibrium at a fixed mean level. So, checking the stationarity of the data is the first step in time series approaches. Plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) can be used to check data stationarity, which a rapid fluctuation in time series data indicate nonstationarity. The nonstationary nature of the time series is shown by a slow decaying ACF plot, and it is eliminated using differencing treatment to produce stationary data (Box et al. 1994). In this study, to check the stationarity of monthly original SPI and decomposed SPI series augmented Dickey–Fuller (ADF) is utilized (Elliott et al. 1996).
Model identification, parameter estimation, and diagnosis checking are the three primary phases that make up the ARIMA model, as we are all aware. The identification step, which consists of two stages, is crucial among these three procedures: (1) if necessary, proper series differencing is carried out to attain stationarity and normality and (2) the order of the AR and MA parts of ARIMA model is identified. The main tools used by Box and Jenkins (Box & Jenkins 1976) to determine the order of the ARIMA model were the ACF and PACF of the sample data. If sample data is an AR(p) model, the PACF cutoff is at lag p. In contrast, if sample data are an MA(q) process, the ACF has a cutoff at lag q. However, when dealing with mixed ARIMA processes, the PACF and ACF technique is not very helpful. In most cases, a quick glance at the ACF and PACF graphs would not reveal the values of p and q for mixed models (Chan 1999). The maximum likelihood approach, which was put forth by Box & Jenkins (1976), was used in this study to estimate the ARIMA model's parameters, which is a technique for determining a statistical model's parameters.

Proposed EEMD-ARIMA approach
According to the aforementioned description, the ADF test is first applied to the original monthly SPI time series and all deconstructed SPI components. Table 2 provides a summary of the outcomes of the ADF unit root testing. As seen in Table 2, for SPI 3, both the original and decomposed IMF 1–IMF 5 time series are stationary. The ADF test indicated that IMF 8 and the residue became stationary after applying one differencing. In addition, IMF 6 and IMF 7 achieved stationarity after applying a differencing order of 2. In the case of SPI 6, the original and decomposed IMF 1–IMF 5 are already stationary. In contrast, IMF 6 became stationary by applying first-order differencing, while IMF 7 and IMF 8 achieved stationarity through the application of second-order differencing. Original IMF 1–IMF 4 and residue of SPI 9 are inherently stationary. To achieve stationary series for IMF 5–IMF 8, the second-order differencing is employed. The original data, along with IMF 1 to IMF 4 and the residue of SPI 9, is inherently stationary. To achieve stationary series for IMF 5–IMF 8, second-order differencing is applied. In a similar manner, for SPI 12, to attain stationary series for IMF 5–IMF 8 as well as the residue, second-order differencing is applied. However, the ADF test for the original data and IMF 1–IMF 8 has already confirmed the presence of stationary series, negating the need for applying differencing.
Augmented Dickey–Fuller test for stationarity in SPI series
Augmented Dickey–Fuller test for SPI series . | |||||||
---|---|---|---|---|---|---|---|
. | SPI . | i-Statistic . | p-Value . | . | SPI . | t-Statistic . | p-Value . |
SPI 3 | Original | −8.7072 | 0.01 | SPI 6 | Original | −6.7912 | 0.01 |
IMF 1 | −11.312 | 0.01 | IMF 1 | −11.044 | 0.01 | ||
IMF 2 | −11.109 | 0.01 | IMF 2 | −10.817 | 0.01 | ||
IMF 3 | −10.568 | 0.01 | IMF 3 | −11.866 | 0.01 | ||
IMF 4 | −6.7024 | 0.01 | IMF 4 | −4.7236 | 0.01 | ||
IMF 5 | −5.0946 | 0.01 | IMF 5 | −5.2543 | 0.01 | ||
IMF 6 | −4.2457 | 0.347 | IMF 6 | −3.7869 | 0.0197 | ||
IMF 7 | −3.1274 | 0.1011 | IMF 7 | −2.1965 | 0.4952 | ||
IMF 8 | 2.5335 | 0.99 | IMF 8 | 1.4107 | 0.99 | ||
Residue | 0.22081 | 0.99 | Residue | −193117 | 0.01 | ||
SPI 9 | Original | −8.2453 | 0.01 | SPI 12 | Original | −6.2147 | 0.01 |
IMF 1 | −10.449 | 0.01 | IMF 1 | −10.319 | 0.01 | ||
IMF 2 | −11.989 | 0.01 | IMF 2 | −13.536 | 0.01 | ||
IMF 3 | −9.9568 | 0.01 | IMF 3 | −12.449 | 0.01 | ||
IMF 4 | −7.0845 | 0.01 | IMF 4 | −7.4394 | 0.01 | ||
IMF 5 | −1.6962 | 0.7069 | IMF 5 | −1.4495 | 0.8113 | ||
IMF 6 | −3.8991 | 0.0141 | IMF 6 | −3.7497 | 0.0215 | ||
IMF 7 | −2.0814 | 0.5439 | IMF 7 | −1.7361 | 0.69 | ||
IMF 8 | 1.3193 | 0.99 | IMF 8 | 2.6928 | 0.99 | ||
Residue | 0.54257 | 0.99 | Residue | 0.067147 | 0.99 |
Augmented Dickey–Fuller test for SPI series . | |||||||
---|---|---|---|---|---|---|---|
. | SPI . | i-Statistic . | p-Value . | . | SPI . | t-Statistic . | p-Value . |
SPI 3 | Original | −8.7072 | 0.01 | SPI 6 | Original | −6.7912 | 0.01 |
IMF 1 | −11.312 | 0.01 | IMF 1 | −11.044 | 0.01 | ||
IMF 2 | −11.109 | 0.01 | IMF 2 | −10.817 | 0.01 | ||
IMF 3 | −10.568 | 0.01 | IMF 3 | −11.866 | 0.01 | ||
IMF 4 | −6.7024 | 0.01 | IMF 4 | −4.7236 | 0.01 | ||
IMF 5 | −5.0946 | 0.01 | IMF 5 | −5.2543 | 0.01 | ||
IMF 6 | −4.2457 | 0.347 | IMF 6 | −3.7869 | 0.0197 | ||
IMF 7 | −3.1274 | 0.1011 | IMF 7 | −2.1965 | 0.4952 | ||
IMF 8 | 2.5335 | 0.99 | IMF 8 | 1.4107 | 0.99 | ||
Residue | 0.22081 | 0.99 | Residue | −193117 | 0.01 | ||
SPI 9 | Original | −8.2453 | 0.01 | SPI 12 | Original | −6.2147 | 0.01 |
IMF 1 | −10.449 | 0.01 | IMF 1 | −10.319 | 0.01 | ||
IMF 2 | −11.989 | 0.01 | IMF 2 | −13.536 | 0.01 | ||
IMF 3 | −9.9568 | 0.01 | IMF 3 | −12.449 | 0.01 | ||
IMF 4 | −7.0845 | 0.01 | IMF 4 | −7.4394 | 0.01 | ||
IMF 5 | −1.6962 | 0.7069 | IMF 5 | −1.4495 | 0.8113 | ||
IMF 6 | −3.8991 | 0.0141 | IMF 6 | −3.7497 | 0.0215 | ||
IMF 7 | −2.0814 | 0.5439 | IMF 7 | −1.7361 | 0.69 | ||
IMF 8 | 1.3193 | 0.99 | IMF 8 | 2.6928 | 0.99 | ||
Residue | 0.54257 | 0.99 | Residue | 0.067147 | 0.99 |
As a supportive visualization method, we also used ACF and partial autocorrelation plots to check the stationarity of SPI series. To determine stationarity, a rapid decay is observed in ACF, a sudden cutoff takes place in PACF, and it is ensured that there are no persistent spikes in either plot beyond specific lags. On the other hand, if there is a slow decay in ACF or persistent spikes in PACF, it might indicate nonstationarity, meaning the data have a trend or repeating pattern. Overall, we are checking for patterns that show the data do not have a persistent trend for a stationary series. Combining both ACF and PACF helps us get a clear picture.
To determine the order of the ARIMA model, we utilized both ACF and PACF plots, in addition to the ‘auto.arima’ function in R. First, it is essential to confirm the stationarity of the data using the ADF test. If the series is found to be nonstationary, differencing is applied to achieve stationarity. Next, determine the order of auto regressive (p) and MA (q) terms by looking at the ACF and PACF plots of the differenced series. The PACF plot, which shows partial autocorrelations, assists in detecting p order, while the ACF plot, which shows autocorrelations at different lags, assists in recognizing the necessity for q terms. Examine both plots for notable spikes and cutoff patterns. A severe cutoff or notable spike at lag k in the PACF figure indicates that a p term of order k is required. Likewise, an order of q is shown if the ACF plot stops at lag k. To prevent overdifferencing and to look for unit roots, care should be given and model orders adjusted appropriately. The aim is to achieve a balance between model complexity and efficacy in identifying the underlying patterns in the time series.
The best-selected model for each original SPI and its decomposed components was chosen based on the AIC and BIC criteria. The models with the minimum AIC and BIC were considered the best choices among the other candidates. Table 3 indicates the best-selected ARIMA/SARIMA models for each SPI and their corresponding decomposed counterparts, while Table 4 provides the estimated parameters for these selected models.
Summary of the best-selected ARIMA/SARIMA model for each decomposed and original SPI se based on AIC and BIC criterion
SPI series . | Model . | AIC . | BIC . | |
---|---|---|---|---|
SPI 3 | Original | ARIMA(2,0,2) | 1168.39 | 1193.41 |
IMF 1 | ARIMA(2,0,1)(2,0,1)[3] | 583.16 | 612.37 | |
IMF 2 | ARIMA(3,0,3)(1,0,1)[3] | −683.83 | −642.09 | |
IMF 3 | ARIMA(4,0,3) | −2911.12 | −2873.56 | |
IMF 4 | ARIMA(3,0,4)(1,0,1)[3] | −5900.84 | −5854.93 | |
IMF 5 | ARIMA(5,0,4)(1,0,1)[3] | −9230.57 | −9176.31 | |
IMF 6 | ARIMA(1,2,3)(0,0,1)[3] | −10,817.71 | −10,792.69 | |
IMF 7 | ARIMA(2,2,1) | −13,175.09 | −13,158.41 | |
IMF 8 | ARIMA(1,2,3) | −16,675.75 | −16,654.9 | |
Residue | ARIMA(2,2,2) | −21,672.52 | −21,651.67 | |
SPI 6 | Original | ARIMA(2,0,1)(2,0,0)[6] | 1035.7 | 1064.85 |
IMF 1 | ARIMA(2,0,2)(2,0,0)[6] | 401.46 | 430.68 | |
IMF 2 | ARIMA(2,0,4) | −934.82 | −901.43 | |
IMF 3 | ARIMA(4,0,1)(2,0,2)[6] | −2929.65 | −2887.91 | |
IMF 4 | ARIMA(4,0,1)(2,0,1)[6] | −5443.85 | −5402.11 | |
IMF 5 | ARIMA(2,0,2)(2,0,1)[6] | 404.55 | 442.12 | |
IMF 6 | ARIMA(2,1,2) | −10,366.43 | −10,345.57 | |
IMF 7 | ARIMA(1,2,3) | −12,354.06 | −12,333.21 | |
IMF 8 | ARIMA(1,2,3) | −15,755.59 | −15,734.74 | |
Residue | ARIMA(0,2,4)(0,0,2)[6] | −14,086.88 | −14,057.69 | |
SPI 9 | Original | ARIMA(1,0,0)(2,0,1)[9] | 691.37 | 716.32 |
IMF 1 | ARIMA(6,0,6)(1,0,1)[9] | −120.37 | −53.59 | |
IMF 2 | ARIMA(5,0,5)(1,0,0)[9] | −1328.13 | −1273.87 | |
IMF 3 | ARIMA(5,0,3) | −3384.73 | −3342.99 | |
IMF 4 | ARIMA(4,0,2)(0,0,1)[9] | −5956.57 | −5919 | |
IMF 5 | ARIMA(3,2,1)(0,0,1)[9] | −9195.8 | −9170.78 | |
IMF 6 | ARIMA(1,2,1)(0,0,1)[9] | −9976.5 | −9959.82 | |
IMF 7 | ARIMA(1,2,2) | −12,658.72 | −12,642.04 | |
IMF 8 | ARIMA(1,2,2) | −15,963.44 | −15,946.76 | |
Residue | ARIMA(1,2,0) | −14,610.08 | −14,601.74 | |
SPI 12 | Original | ARIMA(1,1,0)(1,0,1)[12] | 238.8 | 255.39 |
IMF 1 | ARIMA(4,0,5)(1,0,2)[12] | −636.18 | −577.74 | |
IMF 2 | ARIMA(5,0,5)(0,0,1)[12] | −1844.78 | −1790.52 | |
IMF 3 | ARIMA(6,0,6)(1,0,1)[12] | −3868.3 | −3801.52 | |
IMF 4 | ARIMA(4,0,5)(1,0,3)[12] | −6301.14 | −6238.53 | |
IMF 5 | ARIMA(2,2,3) | −9408.67 | −9383.65 | |
IMF 6 | ARIMA(2,2,2)(0,0,1)[12] | −10,505.03 | −10,480.01 | |
IMF 7 | ARIMA(1,2,2) | −12,844.36 | −12,827.68 | |
IMF 8 | ARIMA(1,2,3) | −16,132.19 | −16,111.34 | |
Residue | ARIMA(1,2,0) | −32,363.44 | −32,355.1 |
SPI series . | Model . | AIC . | BIC . | |
---|---|---|---|---|
SPI 3 | Original | ARIMA(2,0,2) | 1168.39 | 1193.41 |
IMF 1 | ARIMA(2,0,1)(2,0,1)[3] | 583.16 | 612.37 | |
IMF 2 | ARIMA(3,0,3)(1,0,1)[3] | −683.83 | −642.09 | |
IMF 3 | ARIMA(4,0,3) | −2911.12 | −2873.56 | |
IMF 4 | ARIMA(3,0,4)(1,0,1)[3] | −5900.84 | −5854.93 | |
IMF 5 | ARIMA(5,0,4)(1,0,1)[3] | −9230.57 | −9176.31 | |
IMF 6 | ARIMA(1,2,3)(0,0,1)[3] | −10,817.71 | −10,792.69 | |
IMF 7 | ARIMA(2,2,1) | −13,175.09 | −13,158.41 | |
IMF 8 | ARIMA(1,2,3) | −16,675.75 | −16,654.9 | |
Residue | ARIMA(2,2,2) | −21,672.52 | −21,651.67 | |
SPI 6 | Original | ARIMA(2,0,1)(2,0,0)[6] | 1035.7 | 1064.85 |
IMF 1 | ARIMA(2,0,2)(2,0,0)[6] | 401.46 | 430.68 | |
IMF 2 | ARIMA(2,0,4) | −934.82 | −901.43 | |
IMF 3 | ARIMA(4,0,1)(2,0,2)[6] | −2929.65 | −2887.91 | |
IMF 4 | ARIMA(4,0,1)(2,0,1)[6] | −5443.85 | −5402.11 | |
IMF 5 | ARIMA(2,0,2)(2,0,1)[6] | 404.55 | 442.12 | |
IMF 6 | ARIMA(2,1,2) | −10,366.43 | −10,345.57 | |
IMF 7 | ARIMA(1,2,3) | −12,354.06 | −12,333.21 | |
IMF 8 | ARIMA(1,2,3) | −15,755.59 | −15,734.74 | |
Residue | ARIMA(0,2,4)(0,0,2)[6] | −14,086.88 | −14,057.69 | |
SPI 9 | Original | ARIMA(1,0,0)(2,0,1)[9] | 691.37 | 716.32 |
IMF 1 | ARIMA(6,0,6)(1,0,1)[9] | −120.37 | −53.59 | |
IMF 2 | ARIMA(5,0,5)(1,0,0)[9] | −1328.13 | −1273.87 | |
IMF 3 | ARIMA(5,0,3) | −3384.73 | −3342.99 | |
IMF 4 | ARIMA(4,0,2)(0,0,1)[9] | −5956.57 | −5919 | |
IMF 5 | ARIMA(3,2,1)(0,0,1)[9] | −9195.8 | −9170.78 | |
IMF 6 | ARIMA(1,2,1)(0,0,1)[9] | −9976.5 | −9959.82 | |
IMF 7 | ARIMA(1,2,2) | −12,658.72 | −12,642.04 | |
IMF 8 | ARIMA(1,2,2) | −15,963.44 | −15,946.76 | |
Residue | ARIMA(1,2,0) | −14,610.08 | −14,601.74 | |
SPI 12 | Original | ARIMA(1,1,0)(1,0,1)[12] | 238.8 | 255.39 |
IMF 1 | ARIMA(4,0,5)(1,0,2)[12] | −636.18 | −577.74 | |
IMF 2 | ARIMA(5,0,5)(0,0,1)[12] | −1844.78 | −1790.52 | |
IMF 3 | ARIMA(6,0,6)(1,0,1)[12] | −3868.3 | −3801.52 | |
IMF 4 | ARIMA(4,0,5)(1,0,3)[12] | −6301.14 | −6238.53 | |
IMF 5 | ARIMA(2,2,3) | −9408.67 | −9383.65 | |
IMF 6 | ARIMA(2,2,2)(0,0,1)[12] | −10,505.03 | −10,480.01 | |
IMF 7 | ARIMA(1,2,2) | −12,844.36 | −12,827.68 | |
IMF 8 | ARIMA(1,2,3) | −16,132.19 | −16,111.34 | |
Residue | ARIMA(1,2,0) | −32,363.44 | −32,355.1 |
Extracted structure and parameters of ARIMA/SARIMA based on AIC and BIC
Structure ![]() . | ||||||
---|---|---|---|---|---|---|
SPI . | ![]() | ![]() | ![]() | ![]() | Intercept . | |
SPI 3 | Original | −0.2409, 0.3044 | 0.8790, 0.2809 | – | – | −0.0218 |
IMF 1 | −0.4174, −0.3940 | 0.0660 | 0.01692, 0.2351 | −0.6360 | – | |
IMF 2 | 1.7332, −1.2695, 0.3661 | 0.5420, −0.7911, −0.5352 | 0.9352 | −0.9888 | −0.0057 | |
IMF 3 | 3.4687, −4.7641, 3.0574, −0.7772 | 0.0405, 0.5438, −0.4197 | – | – | 0.0063 | |
IMF 4 | 2.6841, −2.4324, 0.7386 | 1.8703, 1.9749, 1.1563, 0.3704 | – | – | −0.0022 | |
IMF 5 | 2.8356, −1.8005, −1.6866, 2.4277, −0.7766 | 2.2394, 2.1006, 1.0899, 0.2477 | 0.2829 | −0.0640 | −0.0080 | |
IMF 6 | 0.9511 | 2.1356, 2.0102, 0.7603, −0.7928 | – | 0.7963 | – | |
IMF7 | 1.9441, −0.9454 | −0.7927 | – | – | – | |
IMF 8 | 0.9935 | 1.8167, 1.5234, 0.5849 | – | – | – | |
Residue | 0.0034, 0.0997 | 0.0997, 0.0088 | – | – | – | |
SPI 6 | Original | −0.1528, 0.6894 | 0.9397 | −0.1600, −0.1505 | – | −0.0081 |
IMF 1 | 0.6380, −0.5642 | −0.9607, 0.5261 | −0.2298, −0.1888 | – | – | |
IMF 2 | 1.1709, −0.5340 | 1.0466, −0.2284, −0.7986, −0.3978 | – | – | 0.0031 | |
IMF 3 | 3.2550, −4.3106, 2.7282, −0.7098 | 0.5090 | −0.3478, −0.5471 | 0.4456, 0.6053 | – | |
IMF 4 | 3.6842, −5.2066, 3.3439, −0.8250 | 0.6847 | 0.8261, −0.1560 | −0.6867 | −0.0064 | |
IMF 5 | 0.6725, −0.5757 | −0.9951, 0.5513 | −0.0891, −0.1643 | −0.1556 | 0.0046 | |
IMF 6 | 1.9941, −0.9958 | 1.7473, 0.9697 | – | – | – | |
IMF7 | 0.9739 | 1.7212, 1.4532, 0.5938 | – | – | – | |
IMF 8 | 0.9937 | 2.2815, 2.0890, 0.7687 | – | – | – | |
Residue | – | 3.0428, 4.0657, 2.9476, 0.9383 | – | 1.1962, 0.8396 | – | |
SPI 9 | Original | 0.9140 | – | 0.3717, 0.1644 | −0.7952 | 0.0126 |
IMF 1 | 0.0733, −0.2958, 0.3445, 0.0454, 0.7514, −0.3630 | −0.5105, 0.1553, −0.3859, −0.0024, −0.6785, 0.7361 | 0.1612 | −0.4871 | 0.0017 | |
IMF 2 | 1.4464, −0.5645, −0.6913, 0.8231, −0.3078 | 0.4965, −0.9922, −0.7617, 0.14756, 0.3077 | −0.1392 | – | 0.0009 | |
IMF 3 | 4.2670, −7.5117, 6.8128, −3.1820, 0.6116 | −0.2681, −0.5230, −0.1493 | – | – | 0.0026 | |
IMF 4 | 3.7167, −5.2682, 3.3759, −0.8263 | 1.0392, 0.3307 | – | 0.0520 | −0.0146 | |
IMF 5 | 2.4858, −2.0056, 0.5149 | 0.6004 | −0.0069 | – | – | |
IMF 6 | 0.9895 | 0.9747 | – | 0.7081 | – | |
IMF 7 | 0.9854 | 1.2452, 0.6115 | – | – | – | |
IMF 8 | 0.9957 | 1.9015, 0.9485 | – | – | – | |
Residue | 0.9882 | – | – | – | – | |
SPI 12 | Original | 0.0885 | – | −0.2455 | −0.5738 | – |
IMF 1 | −0.3121, 0.2519, −0.3860, −0.7530 | −0.2130, −0.5854, 0.6124, 0.7488, −0.5729 | −0.0138 | −0.5002, 0.2241 | 0.0051 | |
IMF 2 | 2.5759, −3.0621, 1.9887, −0.7371, 0.1102 | −0.3409, −0.8279, −0.0327, 0.3896, 0.1478 | – | −0.5601 | 0.0029 | |
IMF 3 | 1.3862, 0.7032, −1.9933, 0.3425, 1.0202, −0.5654 | 2.6446, 3.1023, 2.6853, 2.2607, 1.4044, 0.4271 | −0.0372 | −0.3341 | 0.0061 | |
IMF 4 | 3.7954, −5.4732, 3.5543, −0.8776 | 1.1042, 0.3522, −0.1246, −0.2226, −0.1185 | −0.9165 | 0.6506, −0.0930, 0.1381 | 0.0004 | |
IMF 5 | 1.9686, −0.9774 | 1.2736, 0.6994, 0.2105 | – | – | – | |
IMF 6 | 1.9718, −0.9735 | 0.1499, −0.6899 | – | −0.5643 | – | |
IMF7 | 0.9880 | 1.7448, 0.8727 | – | – | – | |
IMF 8 | 0.9933 | 2.0196, 1.7685, 0.6679 | – | – | – | |
Residue | 1.0000 | – | – | – | – |
Structure ![]() . | ||||||
---|---|---|---|---|---|---|
SPI . | ![]() | ![]() | ![]() | ![]() | Intercept . | |
SPI 3 | Original | −0.2409, 0.3044 | 0.8790, 0.2809 | – | – | −0.0218 |
IMF 1 | −0.4174, −0.3940 | 0.0660 | 0.01692, 0.2351 | −0.6360 | – | |
IMF 2 | 1.7332, −1.2695, 0.3661 | 0.5420, −0.7911, −0.5352 | 0.9352 | −0.9888 | −0.0057 | |
IMF 3 | 3.4687, −4.7641, 3.0574, −0.7772 | 0.0405, 0.5438, −0.4197 | – | – | 0.0063 | |
IMF 4 | 2.6841, −2.4324, 0.7386 | 1.8703, 1.9749, 1.1563, 0.3704 | – | – | −0.0022 | |
IMF 5 | 2.8356, −1.8005, −1.6866, 2.4277, −0.7766 | 2.2394, 2.1006, 1.0899, 0.2477 | 0.2829 | −0.0640 | −0.0080 | |
IMF 6 | 0.9511 | 2.1356, 2.0102, 0.7603, −0.7928 | – | 0.7963 | – | |
IMF7 | 1.9441, −0.9454 | −0.7927 | – | – | – | |
IMF 8 | 0.9935 | 1.8167, 1.5234, 0.5849 | – | – | – | |
Residue | 0.0034, 0.0997 | 0.0997, 0.0088 | – | – | – | |
SPI 6 | Original | −0.1528, 0.6894 | 0.9397 | −0.1600, −0.1505 | – | −0.0081 |
IMF 1 | 0.6380, −0.5642 | −0.9607, 0.5261 | −0.2298, −0.1888 | – | – | |
IMF 2 | 1.1709, −0.5340 | 1.0466, −0.2284, −0.7986, −0.3978 | – | – | 0.0031 | |
IMF 3 | 3.2550, −4.3106, 2.7282, −0.7098 | 0.5090 | −0.3478, −0.5471 | 0.4456, 0.6053 | – | |
IMF 4 | 3.6842, −5.2066, 3.3439, −0.8250 | 0.6847 | 0.8261, −0.1560 | −0.6867 | −0.0064 | |
IMF 5 | 0.6725, −0.5757 | −0.9951, 0.5513 | −0.0891, −0.1643 | −0.1556 | 0.0046 | |
IMF 6 | 1.9941, −0.9958 | 1.7473, 0.9697 | – | – | – | |
IMF7 | 0.9739 | 1.7212, 1.4532, 0.5938 | – | – | – | |
IMF 8 | 0.9937 | 2.2815, 2.0890, 0.7687 | – | – | – | |
Residue | – | 3.0428, 4.0657, 2.9476, 0.9383 | – | 1.1962, 0.8396 | – | |
SPI 9 | Original | 0.9140 | – | 0.3717, 0.1644 | −0.7952 | 0.0126 |
IMF 1 | 0.0733, −0.2958, 0.3445, 0.0454, 0.7514, −0.3630 | −0.5105, 0.1553, −0.3859, −0.0024, −0.6785, 0.7361 | 0.1612 | −0.4871 | 0.0017 | |
IMF 2 | 1.4464, −0.5645, −0.6913, 0.8231, −0.3078 | 0.4965, −0.9922, −0.7617, 0.14756, 0.3077 | −0.1392 | – | 0.0009 | |
IMF 3 | 4.2670, −7.5117, 6.8128, −3.1820, 0.6116 | −0.2681, −0.5230, −0.1493 | – | – | 0.0026 | |
IMF 4 | 3.7167, −5.2682, 3.3759, −0.8263 | 1.0392, 0.3307 | – | 0.0520 | −0.0146 | |
IMF 5 | 2.4858, −2.0056, 0.5149 | 0.6004 | −0.0069 | – | – | |
IMF 6 | 0.9895 | 0.9747 | – | 0.7081 | – | |
IMF 7 | 0.9854 | 1.2452, 0.6115 | – | – | – | |
IMF 8 | 0.9957 | 1.9015, 0.9485 | – | – | – | |
Residue | 0.9882 | – | – | – | – | |
SPI 12 | Original | 0.0885 | – | −0.2455 | −0.5738 | – |
IMF 1 | −0.3121, 0.2519, −0.3860, −0.7530 | −0.2130, −0.5854, 0.6124, 0.7488, −0.5729 | −0.0138 | −0.5002, 0.2241 | 0.0051 | |
IMF 2 | 2.5759, −3.0621, 1.9887, −0.7371, 0.1102 | −0.3409, −0.8279, −0.0327, 0.3896, 0.1478 | – | −0.5601 | 0.0029 | |
IMF 3 | 1.3862, 0.7032, −1.9933, 0.3425, 1.0202, −0.5654 | 2.6446, 3.1023, 2.6853, 2.2607, 1.4044, 0.4271 | −0.0372 | −0.3341 | 0.0061 | |
IMF 4 | 3.7954, −5.4732, 3.5543, −0.8776 | 1.1042, 0.3522, −0.1246, −0.2226, −0.1185 | −0.9165 | 0.6506, −0.0930, 0.1381 | 0.0004 | |
IMF 5 | 1.9686, −0.9774 | 1.2736, 0.6994, 0.2105 | – | – | – | |
IMF 6 | 1.9718, −0.9735 | 0.1499, −0.6899 | – | −0.5643 | – | |
IMF7 | 0.9880 | 1.7448, 0.8727 | – | – | – | |
IMF 8 | 0.9933 | 2.0196, 1.7685, 0.6679 | – | – | – | |
Residue | 1.0000 | – | – | – | – |
The performances of ARIMA models based on the original SPI time series and the proposed EEMD-ARIMA models are assessed using the four quantitative standard statistical performance evaluation measures RMSE, MAE, MAPE, and R2. Table 5 summarizes the statistical outcomes of the various models in both training and validation periods. The most accurate models are those with the lowest RMSE, MAE, and MAPE and the highest R2.
Forecasting evaluation indices of all SPI series
Name . | Model . | Training . | Testing . | ||||||
---|---|---|---|---|---|---|---|---|---|
RMSE . | MAE . | MAPE . | R2 . | RMSE . | MAE . | MAPE . | R2 . | ||
SPI 3 | ARIMA | 0.811 | 0.632 | 271.170 | 0.380 | 0.658 | 0.510 | 387.054 | 0.353 |
EEMD-ARIMA | 0.431 | 0.333 | 446.648 | 0.826 | 0.308 | 0.244 | 105.330 | 0.860 | |
SPI 6 | ARIMA | 0.708 | 0.509 | 179.427 | 0.554 | 0.503 | 0.546 | 149.074 | 0.578 |
EEMD-ARIMA | 0.714 | 0.505 | 193.719 | 0.672 | 0.211 | 0.170 | 119.262 | 0.926 | |
SPI 9 | ARIMA | 0.495 | 0.339 | 395.772 | 0.775 | 0.330 | 0.239 | 179.918 | 0.775 |
EEMD-ARIMA | 0.205 | 0.144 | 128.230 | 0.962 | 0.108 | 0.080 | 57.977 | 0.976 | |
SPI 12 | ARIMA | 0.306 | 0.182 | 90.516 | 0.914 | 0.214 | 0.128 | 58.442 | 0.899 |
EEMD-ARIMA | 0.118 | 0.084 | 85.546 | 0.987 | 0.060 | 0.047 | 15.188 | 0.992 |
Name . | Model . | Training . | Testing . | ||||||
---|---|---|---|---|---|---|---|---|---|
RMSE . | MAE . | MAPE . | R2 . | RMSE . | MAE . | MAPE . | R2 . | ||
SPI 3 | ARIMA | 0.811 | 0.632 | 271.170 | 0.380 | 0.658 | 0.510 | 387.054 | 0.353 |
EEMD-ARIMA | 0.431 | 0.333 | 446.648 | 0.826 | 0.308 | 0.244 | 105.330 | 0.860 | |
SPI 6 | ARIMA | 0.708 | 0.509 | 179.427 | 0.554 | 0.503 | 0.546 | 149.074 | 0.578 |
EEMD-ARIMA | 0.714 | 0.505 | 193.719 | 0.672 | 0.211 | 0.170 | 119.262 | 0.926 | |
SPI 9 | ARIMA | 0.495 | 0.339 | 395.772 | 0.775 | 0.330 | 0.239 | 179.918 | 0.775 |
EEMD-ARIMA | 0.205 | 0.144 | 128.230 | 0.962 | 0.108 | 0.080 | 57.977 | 0.976 | |
SPI 12 | ARIMA | 0.306 | 0.182 | 90.516 | 0.914 | 0.214 | 0.128 | 58.442 | 0.899 |
EEMD-ARIMA | 0.118 | 0.084 | 85.546 | 0.987 | 0.060 | 0.047 | 15.188 | 0.992 |
Table 5 gives a thorough breakdown of how well the ARIMA and EEMD-ARIMA models performed throughout both the training and validation phases in predicting SPI values at various temporal scales.
In the context of the training phase for SPI 3, it is noted that the EEMD-ARIMA model outperforms the ARIMA model, with superior values for RMSE (0.431 versus 0.811), MAE (0.333 versus 0.632), and R2 (0.826 versus 0.380). However, it is acknowledged that the MAPE for EEMD-ARIMA is higher than that of ARIMA. Despite this anomaly in MAPE, the other error indices consistently demonstrate the significant improvement of the EEMD-ARIMA model over the conventional ARIMA approach. EEMD-ARIMA continues to perform better than ARIMA throughout the SPI 3 validation phase (RMSE: 0.658, MAE: 0.510, MAPE: 387.54, R2: 0.353 vs. RMSE: 0.308, MAE: 0.244, MAPE: 105.330, R2: 0.860). While also attaining a higher R2, EEMD-ARIMA demonstrated a substantially lower RMSE, MAE, and MAPE, indicating superior accuracy and a better fit to the validation data.
Upon a detailed examination for training data in SPI 6, the comparison between the ARIMA and EEMD-ARIMA models reveals distinct patterns in their performance metrics. In the ARIMA model, we observe a lower MAPE of 179.427, indicating a relatively better accuracy in predicting absolute percentage errors. On the other hand, the EEMD-ARIMA model shows a higher MAPE of 193.719, suggesting a marginally reduced accuracy in this specific measure. Interestingly, when considering R2 values, the EEMD-ARIMA model outshines the ARIMA model significantly, boasting a much higher R2 of 0.672 compared to 0.554 in ARIMA. This underscores the nuanced strengths and weaknesses of each model, with the ARIMA model excelling in MAPE, while the EEMD-ARIMA model demonstrates superior explanatory power as reflected by the higher R2 values. However, EEMD-ARIMA performed significantly better than ARIMA in the validation phase for SPI 6 (RMSE: 0.503, MAE: 0.546, MAPE: 149.074, R2: 0.578 versus RMSE: 0.211, MAE: 0.170, MAPE: 119.262, R2: 0.926). In addition to generating a significantly higher R2, EEMD-ARIMA showed noticeably reduced RMSE, MAE, and MAPE values, indicating enhanced accuracy and a better fit to the validation data.
In the training phase for SPI 9, both models showed relatively good performance. ARIMA (RMSE: 0.495, MAE: 0.339, MAPE: 395.722, R2: 0.775) and EEMD-ARIMA (RMSE: 0.205, MAE: 0.144, MAPE: 128.230, R2: 0.962) exhibited differing RMSE and MAE values, but both had reasonable R2 values, indicating good agreement with the training data. In the validation phase for SPI 9, EEMD-ARIMA (RMSE: 0.108, MAE: 0.080, MAPE: 57.977, R2: 0.976) continued to outperform ARIMA (RMSE: 0.330, MAE: 0.239, MAPE: 179.218, R2: 0.775) by a significant margin. EEMD-ARIMA showed substantially lower RMSE, MAE, and MAPE values, indicating superior accuracy and a better fit to the validation data, while also achieving a considerably higher R2.
For SPI 12, both models performed well during the training phase. In spite of having different RMSE and MAE values, both ARIMA (RMSE: 0.306, MAE: 0.182, MAPE: 90.516, R2: 0.914) and EEMD-ARIMA (RMSE: 0.118, MAE: 0.084, MAPE: 85.546, R2: 0.978) produced high R2 values, indicating an excellent fit to the training data. In the validation phase for SPI 12, EEMD-ARIMA (RMSE: 0.214, MAE: 0.128, MAPE: 58.442, R2: 0.899) considerably beats ARIMA (RMSE: 0.060, MAE: 0.047, MAPE: 15.188, R2: 0.992). In addition to generating a significantly better R2, EEMD-ARIMA demonstrated noticeably reduced RMSE, MAE, and MAPE values, indicating superior accuracy and a superb fit to the validation data.
The result of this analysis reveals that the EEMD-ARIMA model consistently outperformed the ARIMA model across all SPI timescales and in both training and validation phases. EEMD-ARIMA also consistently achieved higher R2 values and showed significantly lower RMSE, MAE, and MAPE values, indicating higher accuracy and better goodness of fit to the data.
ARIMA, EEMD-ARMA forecasted and observed SPI 3 during validation period.
ARIMA, EEMD-ARMA forecasted and observed SPI 6 during validation period.
ARIMA, EEMD-ARMA forecasted and observed SPI 9 during validation period.
ARIMA, EEMD-ARMA forecasted and observed SPI 12 during validation period.
The efficiency of the EEMD preprocessing method is demonstrated by the EEMD-ARIMA model's persistent outperformance over the ARIMA model in SPI-based drought prediction, as shown by much lower RMSE, MAE, and MAPE values together with higher R2 values. Even though there are sporadic higher MAPE occurrences, the general improvement highlights how well EEMD and ARIMA work together. By utilizing EEMD's advantages in managing nonstationary and nonlinear components, the hybrid model allows ARIMA to focus on linear trends and autocorrelations. This cooperative method produces a greater accuracy synergy, which is especially noticeable during the validation stages. The theoretical claim is supported by the empirical data, which highlights the EEMD-ARIMA model's resilience in improving drought forecast accuracy by overcoming the shortcomings of individual models. To sum up, this clearly illustrates the ‘decomposition and ensemble’ notion in SPI-based drought forecasting, highlighting the importance of EEMD incorporation as a preprocessing step to improve overall prediction accuracy.
Due to the requirement for parameter tuning, which includes optimizing EEMD ensemble trials, ARIMA orders, and decomposition levels, EEMD-ARIMA has some computational challenges that stem from its heavy computational load. Addressing missing data and nonstationarities adds to this computational load, too. Because of the randomness introduced by the ensemble component, consistent findings need several runs. There are difficulties in interpreting a potentially large number of decomposed components, and the long training times of ARIMA models for individual components could be a limitation.
In areas with comparable climates, the EEMD-ARIMA approach has the potential to increase the precision of drought forecasts. Its versatility in capturing nonstationary and nonlinear precipitation patterns – shown in Afghanistan's Herat province – indicates that it might find use elsewhere. However, because different climate dynamics exist in different places, its performance may vary. It is important to weigh robustness and constraints, highlighting the requirement for region-specific calibration to guarantee accurate forecasts. Variations in precipitation patterns and environmental factors may have an impact on the model's efficacy. Furthermore, different geographical contexts may demand different resource requirements and computing efficiency, thus deploying the model there will require careful attention.
CONCLUSIONS
Modern water resources management has the crucial but frequently challenging issue of increasing the accuracy of hydrological forecasts, particularly long-term drought forecasting. The ARIMA model has emerged as one of the most widely used approaches in forecasting theory and practice as a way of Box–Jenkins time series analysis. In this work, an effort was made to evaluate the effectiveness of the ARIMA method based on EEMD for forecasting drought time series to increase its forecasting accuracy. As a benchmark for comparison, an ARIMA model built on the original SPI time series is also used. The models described in this article were developed using monthly precipitation data from Herat, Afghanistan. To assess the performances of the models produced, four common statistical performance evaluation metrics (RMSE, MAE, MAPE, and R2) are used.
The findings of this study show that the suggested hybrid EEMD-ARIMA model can greatly improve ARIMA time series techniques for drought prediction. The suggested approach has a number of benefits. First, although EEMD's fundamental idea is straightforward, it can offer profound insight into the properties of time series representing SPI. Second, only data from the relevant SPI time series are needed for ARIMA forecasting. Third, conducting ARIMA modeling benefits from knowing the zero mean of IMF components. Last but not least, the suggested models do not necessitate laborious decision-making on the explicit form of models for each unique case. As a result, using a hybrid forecasting model that includes EEMD presents findings that are more accurate and reliable. It may also be useful in studies that focus on hydrological time series forecasting for a variety of issues connected to successful reservoir management. In comparison to earlier studies, our study marks a substantial leap in drought forecasting, extending the application of the EEMD-ARIMA model to Herat's particular environment and giving a blueprint for similar places confronting analogous issues. Our work illustrates the possibility of novel methodology that unite conventional hydrological modeling with modern data-driven techniques, enhancing the conversation on climate adaption options by using the EEMD and the SPI time series.
ACKNOWLEDGEMENTS
The first author would like to express gratitude to Afghanistan's Ministry of Higher Education (MoHE) for the scholarship and Kabul Education University (KEU) for the study leave.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.