Abstract
Climate change and water supply shortages are pressing global concerns. Drought, a complex and often underestimated phenomenon, affects many aspects of human life, so early drought forecasting is crucial for strategic planning and water resource management. This study introduces a hybrid model that combines the wavelet transform with the Autoregressive Integrated Moving Average (ARIMA) model, referred to as Wavelet ARIMA (W-ARIMA), to improve drought prediction accuracy. We analyze monthly precipitation data for Kabul, Afghanistan, from January 1970 to December 2019 and consider the standardized precipitation index (SPI) at multiple timescales (SPI 3, SPI 6, SPI 9, and SPI 12). Compared with the conventional ARIMA approach, the W-ARIMA model achieves lower Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), most notably for SPI 12. Additional metrics (R-square, NSE, and KGE) likewise favor the W-ARIMA model, whereas PBIAS shows no clear difference between the two models. These results indicate that the hybrid model is the better choice for drought forecasting in Kabul, Afghanistan, and underline its value in addressing drought within the broader context of climate change and water resource management.
HIGHLIGHTS
A hybrid wavelet autoregressive integrated moving average (W-ARIMA) model is proposed for drought forecasting based on the SPI.
According to the statistical metrics, the hybrid W-ARIMA model significantly outperforms the individual ARIMA model in SPI-based drought forecasting.
INTRODUCTION
Drought is an environmental hazard that occurs in almost every climate, and in recent years it has attracted the attention of researchers in diverse fields, including agriculture, meteorology, environment, and ecology (Mishra & Singh 2010). Drought, understood as a shortage of soil moisture over a particular period together with declining water supplies in both surface and groundwater reservoirs, has a negative impact on human life. Because drought is difficult to anticipate and has adverse effects on society, forecasting its occurrence is among the sensible measures for reducing its impact (Wilhite 2005).
Drought can have different impacts in different regions. Drought indices (DIs), which are functions of hydro-meteorological variables such as precipitation and streamflow, are frequently used to analyze these impacts. DIs can be used to evaluate all four types of drought: meteorological, hydrological, agricultural, and socioeconomic. However, the onset of a drought cannot be known with certainty, so drought events must be monitored with indices; because drought varies temporally and spatially, the availability of hydro-meteorological data and the choice of DI should be assessed (Khan et al. 2020). Meteorological droughts gradually emerge, offering time for preparation and alertness, unlike sudden disasters. These droughts, initially mild in impact, provide a chance for proactive water conservation and sustainable practices. As they progress, public awareness about water use increases, encouraging positive behavioral shifts. Moreover, meteorological droughts drive scientific advancements, inspiring innovations in prediction and resilience technologies. Importantly, their gradual onset reduces immediate risks to life and infrastructure, giving communities more adaptation time. While benefits can vary, embracing these opportunities enables societies to enhance resilience, adopt sustainable measures, and effectively manage a range of drought-related challenges. The significance of the meteorological viewpoint arises from its capacity to serve as an initial indicator of impending drought conditions (Wu et al. 2004; Eslamian et al. 2017). Several indices, including the standardized precipitation evapotranspiration index, the effective drought index, and the standardized precipitation index (SPI), have been used to evaluate meteorological drought.
SPI is one of the most commonly used DIs, offering an approach for assessing drought severity in accordance with the guidelines set forth by the World Meteorological Organization (WMO) (Hayes et al. 2011). In this study, SPI has been chosen because of its inherent benefits. First, SPI is calculated from precipitation alone, which is a great advantage in areas without access to soil moisture, temperature, and evaporation data. Second, SPI was introduced as a means of assessing precipitation deficits across various timescales. Shorter or longer timescales may reflect lags in the response of different water resources to precipitation anomalies (Mishra & Desai 2005). Finally, SPI is comparatively simple to calculate and is a standardized index that can be applied in many different regions (Guttman 1998; Zargar et al. 2011).
Several approaches exist for predicting time series events, including the autoregressive integrated moving average (ARIMA) model, which is well known for its accuracy in forecasting time-oriented occurrences. As a reliable method, ARIMA has been widely used in time series forecasting, such as streamflow and drought forecasting, because of its benefits over other methods such as exponential smoothing and neural networks (Mishra & Desai 2005). For example, ARIMA can effectively account for serial correlation, which is commonly observed in time series. It also provides a systematic search procedure, comprising identification, estimation, and diagnostic checking, for selecting a suitable model. ARIMA is a preferred approach for many types of time series data owing to its flexibility and prediction precision (Zhang 2003; Mishra & Desai 2005). However, according to Nourani et al. (2009), ARIMA is best applied to linear and stationary datasets and has a limited ability to capture nonlinear and nonstationary time-oriented data.
Several methods are available in the literature for time series forecasting, such as simple moving averages (MAs), linear regression, neural networks, autoregressive moving average (ARMA), and ARIMA. These methods analyze historical records to predict future events. Time series data are not deterministic, yet researchers have often treated them as stationary series. One way of modeling a time series is to regard it as the combination of a deterministic function and white noise. With a de-noising procedure such as the wavelet transform, the white noise in a time series can be reduced and a better model can be obtained (Al Wadia & Ismail 2011).
In many fields of mathematical forecasting, the wavelet transform has frequently been used together with stochastic and artificial intelligence methods to improve prediction precision (Nourani et al. 2014). Wang & Ding (2003) reported the wavelet transform as a useful tool for drought forecasting. In their study, they combined the wavelet transform with an artificial neural network (ANN); the resulting model inherited merits of both methods and increased the accuracy of drought prediction. Kriechbaumer et al. (2014) used the wavelet transform as a data preprocessing step to improve the accuracy of the ARIMA model in metal price forecasting, and their results confirmed its usefulness. Venkata Ramana et al. (2013) combined the wavelet transform with ANN to predict monthly rainfall and assessed the calibration and validation performance of the model with appropriate statistical criteria. Their outcomes showed that the hybrid wavelet ANN can significantly improve the accuracy of monthly rainfall prediction over a single ANN. The wavelet transform has gained wide attention among researchers for analyzing time series, particularly trends, periodicities, and variations (Seo et al. 2017; Zhou et al. 2017; Quilty et al. 2019). Applying the wavelet transform represents a signal in both the time and frequency domains and provides a reliable picture of the structure of the underlying process to be modeled. Each decomposed subseries yields detailed information about the data structure and its periodicity, and previous research has shown that the wavelet-based approach is a promising technique for dealing with time-oriented datasets (Kim & Valdés 2003; Partal & Kişi 2007; Rathinasamy et al. 2014).
Recently, the integration of wavelets with stochastic models such as ARIMA and with artificial intelligence models such as ANN has increased remarkably. In these hybrids, the wavelet transform serves as a data preprocessing technique that de-noises the inputs of hydrologic time series and improves on the single ANN model, which is unable to deal with nonstationary series on its own (Adamowski et al. 2012; Shabri 2015; Belayneh et al. 2016; Soh et al. 2018).
Belayneh & Adamowski (2012) compared three data-driven techniques, SVM, ANN, and WANN, for drought prediction in the Awash River Basin of Ethiopia based on SPI values. The performance of the models was assessed according to root-mean-square error (RMSE), mean absolute error (MAE), and R2, and the hybrid WANN model was more accurate for drought forecasting than the other two methods. Two years later, the authors compared the traditional stochastic model (ARIMA), SVR, and ANN for predicting drought occurrences in the same river basin and additionally applied the wavelet transform as a data preprocessing technique in those models. In that study, WSVR was implemented for the first time for predicting drought events. According to the statistical measures, WSVR performed best for long-term (6- and 12-month lead time) drought forecasting based on SPI values (Belayneh et al. 2014). A new combined model, wavelet linear genetic programming (WLGP), was applied to the prediction of drought events based on Palmer's modified drought index. The results confirmed the advantage of WLGP over traditional genetic programming models for long-term drought prediction, while the simple genetic model was unable to model lead times beyond 3 months (Danandeh Mehr et al. 2014). To analyze the accuracy of drought modeling, Djerbouai & Souag-Gamane (2016) compared stochastic models (ARIMA/seasonal ARIMA (SARIMA)) with ANN models using SPI values at different lead times in the Algerois basin, Algeria. They used the wavelet transform as a data preprocessing technique to improve the ANN model.
The results from this study show that the wavelet transform can significantly increase accuracy over the sole use of ANN, based on statistical performance measures such as the Nash–Sutcliffe efficiency (NSE) coefficient, RMSE, and MAE, for all SPI time series at lead times ranging from 1 to 6 months. A new hybrid model, the wavelet-based extreme learning machine (WELM), was suggested for predicting DIs at three distinct stations in Australia. This model was compared with the extreme learning machine (ELM), ANN, LSSVR, and their wavelet equivalents (WANN and WLSSVR). The performance measures and statistical criteria show that WELM predicts drought incidents better than the ELM, ANN, and LSSVR models and their wavelet counterparts. Moreover, WANN shows satisfactory outputs associated with simple computation and lower frequency error rates. The study illustrates the value of the wavelet transform as an effective means of screening input data to improve drought forecasting models (Deo et al. 2017).
The main objective of this research article is to investigate the predictive accuracy of the ARIMA and wavelet autoregressive integrated moving average (W-ARIMA) models using the SPI for drought forecasting. In addition, this study aims to compare the predictive performance of the ARIMA model with the hybrid W-ARIMA model. The SPI values of varying timescales, including SPI3, SPI6, SPI9, and SPI12, are utilized to assess both short- and long-term drought conditions. While the ARIMA model has gained popularity in recent hydrological time series prediction, it exhibits limitations in handling nonlinear and nonstationary data, characteristics inherent to SPI series. This article addresses the dearth of academic research on drought forecasting in Kabul, Afghanistan – a region susceptible to the socioeconomic repercussions of drought. Introducing both the ARIMA and W-ARIMA models for precise prediction, the study undertakes a comparative analysis of their effectiveness.
MATERIALS AND METHODS
Study area and data
Afghanistan has experienced below-normal to severe drought, which has affected the availability of and access to safe drinking water. According to the Afghanistan Assessment Report (Mayar 2021), around 79% of householders reported inadequate water access for daily activities such as drinking, cooking, bathing, and hygiene. Forecasting drought therefore gives policymakers and water supply managers a view of the coming years, so that they can at least mitigate its devastating effects by taking effective measures in advance. In this study, the monthly precipitation data for Kabul, the capital and largest city of Afghanistan, from January 1970 to December 2019 are extracted from the World Bank Climate Change Knowledge Portal (https://climateknowledgeportal.worldbank.org). Kabul is located in the east central part of Afghanistan at latitude 34.543896° N and longitude 69.160652° E. The city lies in a narrow strip close to the Hindu Kush mountains. The total area of Kabul is estimated at around 400 square miles, and the Kabul River (Darya-e Kabul) crosses the city from its eastern to its western side. Kabul Province is home to the capital city and features a diverse landscape of mountains, valleys, and plains. The province has a dry to semi-dry climate with distinct seasons, and because rainfall is scarce and irregular, it frequently experiences drought. The Kabul River and its tributaries are used for irrigation and for growing commodities such as wheat and fruits, which are a major part of the local economy. Droughts pose serious hazards because they affect crops, water supplies, and daily living. Owing to Kabul's high population density, droughts have a significant negative influence on both its society and its economy. Modern technologies and data analysis techniques may help with effective prediction and response plans. Figure 1 shows the study area.
In this study, we utilize monthly precipitation data from Kabul, Afghanistan, for the purpose of model validation. The country has witnessed a substantial increase in drought severity between 1901 and 2010. Despite this, there exists a scarcity of scientific works pertaining to drought prediction in Afghanistan. This study stands out as one of the pioneering pieces of research focusing on drought forecasting in Afghanistan through the implementation of data-driven approaches. The dataset for this case study encompasses 50 years of historical monthly time series precipitation records, spanning from January 1970 to December 2019. Subsequently, these records are divided into distinct training and testing sets. The training set comprises 80% of the total data, while the testing set accounts for the remaining 20%. Table 1 presents a comprehensive overview of descriptive statistics for Kabul Province, the geographical area under investigation, encompassing key metrics such as mean, median, standard deviation, skewness, and kurtosis. See Supplementary Data for details.
Location | Mean | Standard deviation | Min | Median | Max | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|
Kabul (KBL) | 45.412 | 36.890 | 0.610 | 33.120 | 200.270 | 1.274 | 1.512 |
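The 80/20 division described above can be sketched as follows. This is a minimal illustration that assumes the split is chronological (the earliest 80% of months for training, the most recent 20% for testing), which the text does not state explicitly; the helper name `chronological_split` is hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Calendar of the 50-year monthly record, January 1970 to December 2019 (600 months).
months = [(year, month) for year in range(1970, 2020) for month in range(1, 13)]

# Assumption: the split is made in time order, without shuffling.
train, test = chronological_split(months, train_fraction=0.8)
print(len(train), "training months, ending", train[-1])
print(len(test), "testing months, starting", test[0])
```

Keeping the test period strictly after the training period avoids leaking future information into model fitting, which is the usual reason for not shuffling time series.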
Standardized precipitation index
The SPI is a common drought index frequently used for identifying drought events. It was initially suggested by McKee et al. (1995) and uses historical precipitation data for a given location. It takes both positive and negative values, with positive values indicating a surplus and negative values a shortage of precipitation. Despite the existence of several other DIs, the WMO sought a standard index that can serve as a starting point for every region and country. Other indices can only be applied in specific areas and require large datasets and complex procedures (Yihdego et al. 2019). As mentioned earlier, SPI is based solely on precipitation, making its evaluation relatively easy compared to other indices such as the Palmer index and the crop moisture index (Cacciamani et al. 2007). SPI has been found to represent the variability of Eastern African drought well (Ntale & Gan 2003), and it can describe drought at multiple timescales, which is a major benefit when predicting drought occurrences.
SPI is computed by fitting a probability distribution to the aggregated monthly precipitation series (3, 6, 9, 12, and 24 months). The fitted distribution is then transformed into a standardized normal index whose values classify the drought category at each location and timescale (Belayneh & Adamowski 2013).
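The two steps just described, aggregation over a k-month window followed by a transformation to a standard normal variate, can be illustrated with a simplified sketch. Note the substitutions: instead of fitting a parametric distribution, this version uses a rank-based (empirical) probability, and it pools all calendar months together rather than treating each month separately. It is therefore a nonparametric stand-in for illustration only, not the procedure used in this study; the function names and the sample precipitation values are hypothetical.

```python
from statistics import NormalDist


def rolling_sum(values, window):
    """Total of each run of `window` consecutive values (output is window - 1 shorter)."""
    return [sum(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def empirical_spi(precip, window):
    """Rank-based SPI-like index: empirical probability mapped to a standard normal quantile."""
    totals = rolling_sum(precip, window)
    n = len(totals)
    order = sorted(range(n), key=lambda i: totals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied totals so they share one average rank.
        while j + 1 < n and totals[order[j + 1]] == totals[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    # Weibull plotting position rank / (n + 1) keeps probabilities strictly inside (0, 1).
    return [std_normal.inv_cdf(rank / (n + 1)) for rank in ranks]


# Hypothetical monthly precipitation totals, for illustration only.
precip = [10, 0, 5, 20, 30, 0, 0, 15, 25, 40, 5, 10]
spi3 = empirical_spi(precip, window=3)
print([round(v, 2) for v in spi3])
```

Wetter-than-usual 3-month totals map to positive values and drier ones to negative values, which is the sign convention described for SPI above.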
The SPI classification is presented in Table 2.
SPI values | Category |
---|---|
+2 and higher than +2 | Extremely wet |
1.5–1.99 | Very wet |
1–1.49 | Moderately wet |
−0.99 to 0.99 | Nearly normal |
−1.49 to −1 | Moderate drought |
−1.99 to −1.5 | Severe drought |
−2 and less than −2 | Extreme drought |
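The classification in Table 2 can be expressed as a small lookup function. This is a sketch: the table lists bounds to two decimals (for example 1–1.49 and 1.5–1.99), and the code assumes the classes are meant to be contiguous, so values such as 1.495 fall into the lower class. The function name `classify_spi` is hypothetical.

```python
def classify_spi(spi):
    """Map an SPI value to its Table 2 category (class boundaries assumed contiguous)."""
    if spi >= 2.0:
        return "Extremely wet"
    if spi >= 1.5:
        return "Very wet"
    if spi >= 1.0:
        return "Moderately wet"
    if spi > -1.0:
        return "Nearly normal"
    if spi > -1.5:
        return "Moderate drought"
    if spi > -2.0:
        return "Severe drought"
    return "Extreme drought"


for value in (2.3, 1.2, 0.0, -1.0, -1.7, -2.4):
    print(value, "->", classify_spi(value))
```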
ARIMA model
SARIMA model
Both ARIMA and seasonal ARIMA (SARIMA) are time series forecasting models that predict future values from past observations. The main difference between them lies in the treatment of seasonal components: SARIMA adds seasonal terms to handle time series with patterns that recur at regular intervals. If a time series exhibits a seasonal pattern, SARIMA may provide more accurate forecasts than ARIMA, which is better suited to nonseasonal data. The choice between the two therefore depends on the characteristics of the series and the patterns to be captured.
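For reference, the two model classes can be written in standard Box–Jenkins notation. The symbols below are the conventional ones and are an assumption here, since the notation is not defined in the text above: $B$ is the backshift operator ($B x_t = x_{t-1}$), $\varepsilon_t$ is white noise, and $s$ is the seasonal period.

```latex
% ARIMA(p, d, q)
\phi_p(B)\,(1 - B)^d\, x_t = \theta_q(B)\,\varepsilon_t

% SARIMA(p, d, q)(P, D, Q)[s]
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, x_t
    = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t

% with the polynomials
\phi_p(B)   = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad
\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q,
\Phi_P(B^s)   = 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps}, \qquad
\Theta_Q(B^s) = 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}
```

Setting $P = D = Q = 0$ in the seasonal form recovers the nonseasonal ARIMA model. A label such as ARIMA(1,0,2)(1,0,0)[3] reads as $(p,d,q) = (1,0,2)$, $(P,D,Q) = (1,0,0)$, and $s = 3$.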
Wavelet decomposition
Wavelet, a mathematical model, serves as a transformative tool that converts the original signal, primarily in the time domain, into various domains for analysis and processing (Soltani 2002; Moosavi et al. 2013). It is a widely utilized model for handling nonstationary datasets, encompassing hydrological and climatological records, wherein the mean and autocorrelation of the signal exhibit inherent inconsistencies over time.
For the discrete wavelet transform (DWT), the scale and translation parameters take discrete values, and a wavelet coefficient is obtained for each scale and position.
We used the DWT in this study because the continuous wavelet transform requires more data and generates more information than is suitable for this study (Che & Zhai 2022).
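A multilevel DWT of this kind can be sketched in plain Python. The example uses the Daubechies order-2 (db2) filter and three levels, and returns full-length components A3, D3, D2, and D1 that add back up to the input. Two simplifying assumptions are made here: the signal is extended periodically at its ends, and its length is a multiple of 2^levels. Wavelet toolboxes typically use other boundary extensions and accept arbitrary lengths, so their components will differ near the edges. All function names and the synthetic test signal are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# db2 low-pass (scaling) filter, 4 taps.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass (wavelet) filter: the quadrature mirror of LOW.
HIGH = [(-1) ** n * LOW[len(LOW) - 1 - n] for n in range(len(LOW))]


def dwt_step(x):
    """One analysis level with periodic extension: returns (approximation, detail)."""
    n = len(x)
    approx = [sum(LOW[j] * x[(2 * k + j) % n] for j in range(4)) for k in range(n // 2)]
    detail = [sum(HIGH[j] * x[(2 * k + j) % n] for j in range(4)) for k in range(n // 2)]
    return approx, detail


def idwt_step(approx, detail):
    """Inverse of dwt_step (transpose of the orthogonal analysis operator)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for j in range(4):
            x[(2 * k + j) % n] += LOW[j] * approx[k] + HIGH[j] * detail[k]
    return x


def rebuild(coeffs, level, is_approx):
    """Reconstruct one coefficient set to full length, with all other coefficients zero."""
    zeros = [0.0] * len(coeffs)
    out = idwt_step(coeffs, zeros) if is_approx else idwt_step(zeros, coeffs)
    for _ in range(level - 1):
        out = idwt_step(out, [0.0] * len(out))
    return out


def decompose(x, levels=3):
    """Return full-length additive components, e.g. A3, D3, D2, D1 for levels=3."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels in this sketch")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)  # details[0] holds the level-1 coefficients
    components = {f"A{levels}": rebuild(approx, levels, True)}
    for level in range(levels, 0, -1):
        components[f"D{level}"] = rebuild(details[level - 1], level, False)
    return components


# Synthetic 48-point signal: a 12-step cycle plus a deterministic irregular term.
signal = [math.sin(2 * math.pi * t / 12) + 0.1 * ((t * 37) % 11 - 5) for t in range(48)]
parts = decompose(signal, levels=3)
recombined = [sum(values) for values in zip(*parts.values())]
print(max(abs(a - b) for a, b in zip(signal, recombined)))
```

Because every step is linear and the filter pair is orthogonal, the components sum to the original series up to floating-point error, which is what allows component-wise forecasts to be added later.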
Hybrid W-ARIMA model
- 1. In the first stage, the original SPI values are decomposed using the DWT in MATLAB. This transformation separates the data into approximation and detail components. Several mother wavelets have been suggested, including Daubechies, Symlet, Meyer, and Morlet, and the choice depends on the characteristics of the data (Benaouda et al. 2006; Nury et al. 2017). In this study, the Daubechies wavelet of order 2 with decomposition level 3 is used, so that $X(t) = A_n(t) + \sum_{i=1}^{n} D_i(t)$, where $n$ is the number of decomposition layers, and $D_i$ and $A_n$ denote the detail and approximation components of the layers, respectively. To keep the temporal scale consistent with the original data, the approximation component of the last layer ($A_n$) and the detail components of each layer ($D_1, D_2, \ldots, D_n$) are reconstructed.
- 2. In the second phase, the best ARIMA model is fitted to each decomposed layer of every SPI series. To find the appropriate ARIMA model, all iterative steps explained in Section 3.1 are carried out for each decomposed component.
- 3. Finally, the forecast is reconstructed from the predictions of the decomposed components on the different scales. The forecasted value of W-ARIMA is obtained as the arithmetic sum of the subseries predictions of all decomposed layers for each SPI: $\hat{X}(t) = \hat{A}_n(t) + \sum_{i=1}^{n} \hat{D}_i(t)$, where $\hat{X}(t)$ is the forecasted value of each SPI for the next year ahead.
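The final summation step can be sketched as follows. To keep the example self-contained, each component is forecast with a simple least-squares AR(1) model, which is a stand-in for the ARIMA/SARIMA models selected in this study and not the method actually used. The component values and the names `fit_ar1`, `forecast_ar1`, and `hybrid_forecast` are hypothetical.

```python
def fit_ar1(series):
    """Least-squares AR(1) fit around the sample mean: returns (mean, phi)."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    numerator = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    denominator = sum(a * a for a in centered[:-1])
    phi = numerator / denominator if denominator > 0 else 0.0
    return mean, phi


def forecast_ar1(series, steps):
    """Iterate the fitted AR(1) recursion `steps` times beyond the last observation."""
    mean, phi = fit_ar1(series)
    deviation = series[-1] - mean
    forecasts = []
    for _ in range(steps):
        deviation = phi * deviation
        forecasts.append(mean + deviation)
    return forecasts


def hybrid_forecast(components, steps, forecaster=forecast_ar1):
    """Forecast every decomposed component separately, then sum the forecasts step by step."""
    per_component = [forecaster(series, steps) for series in components.values()]
    return [sum(values) for values in zip(*per_component)]


# Hypothetical reconstructed components of one SPI series (illustrative numbers only).
components = {
    "A3": [1.0, 1.1, 1.2, 1.3],
    "D1": [0.5, -0.5, 0.5, -0.5],
}
print(hybrid_forecast(components, steps=3))
```

Passing a different `forecaster` callable, for instance a wrapper around a fitted SARIMA model, leaves the summation logic unchanged.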
Evaluation metrics
R2 | PBias (%) | NSE | Performance rating |
---|---|---|---|
 | | | Very good (VG) |
 | | | Good (G) |
 | | | Satisfactory (S) |
 | | | Unsatisfactory (U) |
 | | | Inappropriate (I) |
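The metrics named in this study (RMSE, MAE, MAPE, R2, NSE, PBIAS, and KGE) can be computed as in the sketch below, using their commonly cited definitions. Conventions are assumptions here: R2 is taken as the squared Pearson correlation, PBIAS as 100 × Σ(observed − simulated) / Σ observed, and KGE in its original three-term form; the study may use variants. The sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, sim):
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    return _mean([abs(o - s) for o, s in zip(obs, sim)])


def mape(obs, sim):
    """Percentage error; undefined when an observation equals zero."""
    return 100.0 * _mean([abs((o - s) / o) for o, s in zip(obs, sim)])


def pearson_r(obs, sim):
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r_squared(obs, sim):
    return pearson_r(obs, sim) ** 2


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mo = _mean(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)


def pbias(obs, sim):
    """Percent bias; the sign convention (observed minus simulated) is an assumption."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and mean ratio."""
    mo, ms = _mean(obs), _mean(sim)
    std_o = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    std_s = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r, alpha, beta = pearson_r(obs, sim), std_s / std_o, ms / mo
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical observed and simulated values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
for name, fn in [("RMSE", rmse), ("MAE", mae), ("MAPE", mape), ("R2", r_squared),
                 ("NSE", nse), ("PBIAS", pbias), ("KGE", kge)]:
    print(name, round(fn(observed, simulated), 4))
```

MAPE, PBIAS, and the mean ratio in KGE divide by observed values or their sum, so they become unstable for series such as SPI whose values and mean lie close to zero.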
RESULTS AND DISCUSSION
ARIMA model
As mentioned earlier, to arrive at a suitable forecasting ARIMA model, we need to follow three iterative steps, which are described in the following sections.
Model identification
Augmented Dickey–Fuller test for SPI series | SPI 3 | SPI 6 | SPI 9 | SPI 12 |
---|---|---|---|---|
ADF statistic | −7.1557 | −5.8933 | −7.7892 | −5.8192 |
p-Value | 0.01 | 0.01 | 0.01 | 0.01 |
Multiple models have been fitted to various SPI timescales. Table 5 presents a summary of the best ARIMA models based on the AIC and SBC criteria. The best model for each original SPI can be determined by selecting the model with the lowest AIC and SBC values.
SPI series | Model | AIC | SBC |
---|---|---|---|
SPI 3 | ARIMA(1,0,2)(1,0,0)[3] | 1041.37 | 1062.22 |
SPI 6 | ARIMA(3,0,3)(0,0,1)[6] | 687.65 | 720.95 |
SPI 9 | ARIMA(1,0,1)(1,0,2)[9] | 442.6 | 467.54 |
SPI 12 | ARIMA(1,0,0)(1,0,2)[12] | 172.99 | 194.88 |
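The two selection criteria can be computed from a fitted model's maximized log-likelihood, as in the sketch below. It uses the standard definitions AIC = 2k − 2 ln L and SBC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. The candidate labels and log-likelihood values are hypothetical and are not taken from Table 5.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion (BIC): k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_best(candidates, n_obs):
    """Return the label of the candidate with the lowest AIC (ties broken by SBC)."""
    scored = [(aic(ll, k), sbc(ll, k, n_obs), label) for label, ll, k in candidates]
    return min(scored)[2]


# Hypothetical candidates: (label, maximized log-likelihood, number of parameters).
candidates = [
    ("model A", -520.0, 3),
    ("model B", -515.5, 5),
    ("model C", -515.0, 8),
]
print(select_best(candidates, n_obs=480))
```

Both criteria reward goodness of fit and penalize extra parameters; SBC penalizes them more heavily once n exceeds about eight observations, since ln n is then greater than 2.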
Parameter estimation
The next step in constructing the best ARIMA model is to estimate the parameters of the model selected in the previous stage. Following the approach outlined by Box & Jenkins (1976), the maximum likelihood method, a technique for estimating parameters in statistical models, has been employed. A robust estimator for the parameters can be computed by assuming data stationarity and maximizing the likelihood with respect to the parameters. Generally, the parameters must satisfy two conditions: the stationarity condition, which requires the roots of the autoregressive polynomial to lie outside the unit circle, and the invertibility condition, which requires the roots of the moving average polynomial to lie outside the unit circle. Model parameters, standard errors, t-statistics, and p-values are presented in Table 6. The standard errors are relatively small in comparison to the values of the model parameters, and the majority of p-values in the models are significant. This indicates that these parameters can be included in the models.
SPI series | Model parameter | Value of parameter | Standard error | t-Statistic | p-Value |
---|---|---|---|---|---|
SPI 3 | | −0.1664 | 0.0686 | −2.4253 | 0.0153 |
 | | 0.9304 | 0.0502 | 18.5219 | 2 × 10⁻¹⁶ |
 | | 0.7288 | 0.0458 | 15.8998 | 2 × 10⁻¹⁶ |
 | | 0.1170 | 0.0619 | 1.8897 | 0.0590 |
SPI 6 | | −0.6893 | 0.1894 | −3.6387 | 0.0003 |
 | | 0.8144 | 0.0661 | 12.3223 | 2.2 × 10⁻¹⁶ |
 | | 0.6398 | 0.1600 | 3.9986 | 6.4 × 10⁻⁵ |
 | | 1.6030 | 0.1935 | 8.2849 | 2.2 × 10⁻¹⁶ |
 | | 0.5597 | 0.2276 | 2.4589 | 0.0139 |
 | | −0.1123 | 0.0613 | −1.8310 | 0.0671 |
 | | −0.5218 | 0.0481 | −10.8487 | 2.2 × 10⁻¹⁶ |
SPI 9 | | 0.9672 | 0.0127 | 76.1960 | 2.2 × 10⁻¹⁶ |
 | | −0.0763 | 0.0485 | −1.5719 | 0.1160 |
 | | 0.7301 | 0.2428 | 3.0072 | 0.0026 |
 | | −1.3686 | 0.2535 | −5.3986 | 6.7 × 10⁻⁸ |
 | | 0.4442 | 0.1820 | 2.4407 | 0.0146 |
SPI 12 | | 0.9826 | 0.0079 | 124.2860 | 2.2 × 10⁻¹⁶ |
 | | −0.7153 | 0.2769 | −2.5829 | 0.0098 |
 | | −0.0713 | 0.2885 | −0.2471 | 0.8049 |
 | | −0.5015 | 0.2257 | −2.2223 | 0.0263 |
Diagnostic checking
Furthermore, the Ljung–Box Q (LBQ) test was used to ascertain the independence of the residuals. The outcomes of this test, shown in Table 7, indicate that the residuals of the model for each SPI timescale are uncorrelated and exhibit the properties of a white noise process.
SPI series | SPI 3 | SPI 6 | SPI 9 | SPI 12 |
---|---|---|---|---|
χ² | 0.0042 | 0.0034 | 0.0137 | 1.2907 |
p-Value | 0.9486 | 0.9538 | 0.9069 | 0.2559 |
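The Ljung–Box statistic can be computed directly from the residual autocorrelations, as sketched below. The p-value helper assumes a single lag and one degree of freedom; this is an assumption, although the χ² and p-value pairs in Table 7 appear consistent with it. The residual values used in the example are hypothetical.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denominator = sum((v - mean) ** 2 for v in x)
    numerator = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numerator / denominator


def ljung_box_q(residuals, max_lag=1):
    """Ljung-Box Q = n (n + 2) * sum over k = 1..h of r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def chi2_upper_tail_1df(q):
    """P(chi-square with 1 degree of freedom > q), via the complementary error function."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical residuals with strong alternation, so Q is large and the p-value small.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q = ljung_box_q(residuals, max_lag=1)
print(q, chi2_upper_tail_1df(q))

# Cross-check against the SPI 12 column of Table 7 (chi-square = 1.2907).
print(round(chi2_upper_tail_1df(1.2907), 4))
```

A large p-value means the hypothesis of uncorrelated residuals is not rejected, which is the outcome reported for all four SPI series.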
SPI series | Component | Model | AIC | SBC |
---|---|---|---|---|
SPI 3 | A3 | ARIMA(2,0,0)(1,0,1)[3] | −390.7 | −370.22 |
 | D3 | ARIMA(2,0,0)(2,0,2)[3] | −430.01 | −401.4 |
 | D2 | ARIMA(1,0,0)(1,0,0)[3] | −286.61 | −270.14 |
 | D1 | ARIMA(2,0,0)(2,0,0)[3] | −471.1 | −450.47 |
SPI 6 | A3 | ARIMA(2,0,2)(2,0,2)[6] | −397.54 | −360.84 |
 | D3 | ARIMA(5,0,0)(1,0,2)[6] | −512.96 | −476.26 |
 | D2 | ARIMA(5,0,0) | −717.34 | −692.5 |
 | D1 | ARIMA(0,0,3)(2,0,1)[6] | −799.14 | −765.92 |
SPI 9 | A3 | ARIMA(2,0,2)(1,0,0)[9] | −374.35 | −345.75 |
 | D3 | ARIMA(4,0,0)(1,0,0)[9] | −429.45 | −404.9 |
 | D2 | ARIMA(5,0,0) | −717.16 | −692.5 |
 | D1 | ARIMA(1,0,0)(1,0,0)[9] | −258.23 | −241.71 |
SPI 12 | A3 | ARIMA(2,0,2)(2,0,0)[12] | −408.84 | −376.19 |
 | D3 | ARIMA(4,0,0)(1,0,0)[12] | −485.18 | −460.63 |
 | D2 | ARIMA(4,0,0)(1,0,0)[12] | −728.84 | −704.18 |
 | D1 | ARIMA(1,0,0)(0,0,2)[12] | −262.85 | −242.22 |
Proposed W-ARIMA model
To enhance forecasting accuracy, a combined model named the W-ARIMA model, integrating wavelet and ARIMA, is introduced to address the limitations of standalone ARIMA models when dealing with nonstationary data. In this approach, the DWT is applied to decompose the SPI series, generating suitable approximate and detailed components. Inverse wavelet transform is subsequently used to reconstruct the decomposed series. Optimal ARIMA/SARIMA models are fitted to each of the decomposed elements, and the W-ARIMA predictions are obtained by summing the forecasted values from all the decomposed components.
The proposed W-ARIMA forecast is obtained by aggregating the predicted values from each decomposed layer, using the three iterative stages described above to select the ARIMA model for each constituent subseries. Table 8 summarizes the optimal ARIMA/SARIMA models, based on AIC and SBC, for each decomposed component (A3, D3, D2, and D1) of every SPI timescale. The most suitable model for each component is the one with the lowest AIC and SBC values.
Evaluation metric | ARIMA SPI 3 | ARIMA SPI 6 | ARIMA SPI 9 | ARIMA SPI 12 | W-ARIMA SPI 3 | W-ARIMA SPI 6 | W-ARIMA SPI 9 | W-ARIMA SPI 12 |
---|---|---|---|---|---|---|---|---|
RMSE | 0.6380 | 0.4275 | 0.3487 | 0.2474 | 0.5212 | 0.2148 | 0.2160 | 0.1466 |
MAE | 0.4734 | 0.3160 | 0.2529 | 0.1831 | 0.4134 | 0.1688 | 0.1555 | 0.1040 |
MAPE | 122.7013 | 184.5775 | 83.0959 | 55.6354 | 118.145 | 75.8269 | 71.7270 | 32.9000 |
R2 | 0.5732 | 0.8066 | 0.8721 | 0.9414 | 0.7155 | 0.9507 | 0.9508 | 0.9782 |
PBias | 0.2699 | −0.1087 | −0.4325 | −0.0833 | 0.8830 | −0.3487 | −0.8091 | 0.0759 |
NSE | 0.5723 | 0.8043 | 0.8706 | 0.9379 | 0.7146 | 0.9506 | 0.9504 | 0.9782 |
KGE | 0.6630 | 0.8190 | 0.8760 | 0.9060 | 0.7580 | 0.9710 | 0.9470 | 0.9810 |
The comparison table (Table 9) assessing the accuracy of drought forecasting between the ARIMA and W-ARIMA models reveals notable insights. Various evaluation metrics, such as R2, RMSE, MAPE, NSE, KGE, and PBIAS, were employed to comprehensively evaluate their performances. In terms of R2, RMSE, MAPE, NSE, and KGE, the W-ARIMA model consistently demonstrated superiority over the ARIMA model, showcasing its enhanced predictive capability and precision. These metrics collectively indicate that the W-ARIMA model outperforms ARIMA across different dimensions of forecasting accuracy. However, it is noteworthy that for the PBIAS criterion, no significant difference between the two models was observed, implying a comparable performance in addressing bias. This comprehensive assessment underscores the favorable attributes of the W-ARIMA approach in improving the precision and reliability of drought forecasting compared to the traditional ARIMA method.
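To put the comparison in concrete terms, the relative error reduction achieved by W-ARIMA can be computed from the RMSE and MAE rows of Table 9. The short sketch below does this arithmetic; the values are copied from the table, and the helper name `percent_reduction` is hypothetical.

```python
SCALES = ["SPI 3", "SPI 6", "SPI 9", "SPI 12"]

# RMSE and MAE values copied from Table 9.
TABLE9 = {
    "RMSE": {"ARIMA": [0.6380, 0.4275, 0.3487, 0.2474],
             "W-ARIMA": [0.5212, 0.2148, 0.2160, 0.1466]},
    "MAE": {"ARIMA": [0.4734, 0.3160, 0.2529, 0.1831],
            "W-ARIMA": [0.4134, 0.1688, 0.1555, 0.1040]},
}


def percent_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent of the baseline value."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    metric: {
        scale: percent_reduction(base, hybrid)
        for scale, base, hybrid in zip(SCALES, rows["ARIMA"], rows["W-ARIMA"])
    }
    for metric, rows in TABLE9.items()
}

for metric, by_scale in reductions.items():
    for scale, value in by_scale.items():
        print(f"{metric} {scale}: {value:.1f}% lower with W-ARIMA")
```

For SPI 12, for example, the RMSE falls from 0.2474 to 0.1466, a reduction of roughly 41%.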
In the context of drought forecasting, the hybrid W-ARIMA model tailored in this study to Kabul, Afghanistan, represents a notable advance. Using original data, we apply a statistical, data-driven approach, marking the first academic effort of its kind within Afghanistan. Evaluation across diverse metrics, including RMSE and R2, shows the hybrid model's pronounced edge over the traditional ARIMA model. The work is distinguished by its localized focus on Kabul and introduces sophisticated methodologies for drought prediction in this region. This contribution serves both academic discourse and practice, supporting enhanced resilience strategies for managing drought-related challenges in Kabul and beyond.
CONCLUSION
In this study, we introduced a hybrid W-ARIMA model and applied it to forecast drought in Kabul, Afghanistan. The use of data-driven methods for drought prediction in Afghanistan is further strengthened by an original dataset. By adapting the wavelet-ARIMA approach to Kabul's distinct hydro-climatic conditions, we extend its applicability and contribute to evolving drought prediction techniques in the region.
A central contribution of our research is the comparison between the proposed W-ARIMA model and the traditional ARIMA model based on SPI. Across the performance metrics considered (RMSE, MAE, MAPE, R2, NSE, PBIAS, and KGE), the W-ARIMA model consistently outperforms the traditional ARIMA model on all except PBIAS, underscoring its potential to enhance drought forecast accuracy.
We first established the optimal ARIMA/SARIMA model for each SPI series to serve as a benchmark. The DWT with the db2 wavelet was then applied to decompose each SPI series into an approximation (A3) and three detail components (D3, D2, and D1), capturing the multiscale information required for accurate forecasting. A suitable ARIMA model was fitted to each decomposed subseries using the three iterative stages explained in the methodology, and the final W-ARIMA forecast for each SPI series was obtained by summing the predicted values of all decomposed layers.
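The decompose, forecast, and sum procedure can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions not taken from the study: the db2 transform uses periodic boundary handling (so the series length must be a multiple of 8 and at least 16), every component is forecast with an ARIMA(1,0,0) model fitted by least squares as a placeholder for the orders actually selected, and all function names are hypothetical.

```python
import math

# db2 (4-tap Daubechies) analysis filters.
_S3, _D = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _D, (3 + _S3) / _D, (3 - _S3) / _D, (1 - _S3) / _D]  # low-pass
G = [H[3], -H[2], H[1], -H[0]]                                        # high-pass

def dwt_step(x):
    """One periodized db2 analysis step: returns (approximation, detail)."""
    n = len(x)
    approx = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail

def idwt_step(approx, detail):
    """Inverse of dwt_step (transpose of the orthogonal analysis matrix)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(4):
            x[(2 * i + k) % n] += H[k] * approx[i] + G[k] * detail[i]
    return x

def components(series, levels=3):
    """Split a series into [A3, D3, D2, D1], each as long as the input."""
    if len(series) % (2 ** levels) or len(series) < 2 ** (levels + 1):
        raise ValueError("length must be a multiple of 2**levels and >= 2**(levels+1)")
    approx, details = list(series), []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)  # details[0] is level 1, details[-1] is level 3

    def up(x, times):
        # Invert `times` steps with the detail band set to zero.
        for _ in range(times):
            x = idwt_step(x, [0.0] * len(x))
        return x

    parts = [up(approx, levels)]                      # A3
    for j in range(levels, 0, -1):                    # D3, D2, D1
        d = details[j - 1]
        parts.append(up(idwt_step([0.0] * len(d), d), j - 1))
    return parts

def ar1_forecast(x, steps):
    """ARIMA(1,0,0) forecast fitted by least squares (placeholder model)."""
    mu = sum(x) / len(x)
    z = [v - mu for v in x]
    den = sum(v * v for v in z[:-1])
    phi = sum(a * b for a, b in zip(z[1:], z[:-1])) / den if den else 0.0
    return [mu + (phi ** h) * z[-1] for h in range(1, steps + 1)]

def w_arima_forecast(series, steps):
    """Forecast each wavelet component separately, then sum the forecasts."""
    forecasts = [ar1_forecast(part, steps) for part in components(series)]
    return [sum(values) for values in zip(*forecasts)]

# Synthetic series for illustration only (not data from the study).
series = [math.sin(2 * math.pi * t / 12) + 0.3 * math.sin(2 * math.pi * t / 5)
          for t in range(64)]
print(w_arima_forecast(series, 6))
```

Because the transform is linear and orthogonal, the four reconstructed components add back to the original series exactly, which is what justifies summing their individual forecasts. In practice a wavelet library and a full ARIMA implementation with order selection would replace the hand-written pieces here.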
Ultimately, our study pioneers the application of an advanced forecasting model for drought prediction in Kabul, Afghanistan. The proven effectiveness of the W-ARIMA model, coupled with a comprehensive comparison to the SPI-based ARIMA model, highlights its potential to enhance the precision of drought forecasting. Emphasizing localized approaches to improve forecasting accuracy, our work contributes to the broader field of climate resilience strategies. In comparison to prior research, our study represents a significant advancement in drought forecasting, extending the application of the W-ARIMA model to Kabul's unique context and providing a blueprint for similar regions facing comparable challenges. By incorporating the DWT and the SPI framework, our work showcases the potential of innovative methodologies that bridge traditional hydrological modeling with contemporary data-driven techniques, thus enriching the discourse on climate adaptation strategies.
ACKNOWLEDGEMENT
The first author would like to express gratitude to Afghanistan's Ministry of Higher Education (MoHE) for the scholarship and Kabul Education University (KEU) for the study leave.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.