Drought forecasting using W-ARIMA model with standardized precipitation index

Climate change and water supply shortages are paramount global concerns. Drought, a complex and often underestimated phenomenon, profoundly affects various aspects of human life. Thus, early drought forecasting is crucial for strategic planning and water resource management. This study introduces a novel hybrid model, combining wavelet transform with the Autoregressive Integrated Moving Average (ARIMA) model, known as Wavelet ARIMA (W-ARIMA), to enhance drought prediction accuracy. We meticulously analyze monthly precipitation data from January 1970 to December 2019 in Kabul, Afghanistan, focusing on multiple time scales (SPI 3, SPI 6, SPI 9, SPI 12). Comparative assessment against the conventional ARIMA approach reveals the superior performance of our W-ARIMA model. Key statistical indicators, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), underscore the improvements achieved by the W-ARIMA model, notably in SPI 12 forecasting. Additionally, we evaluate performance using metrics like R-square, NSE, PBIAS, and KGE, consistently demonstrating the W-ARIMA model ’ s superiority. This substantial enhancement highlights the innovative model ’ s clear superiority in drought forecasting for Kabul, Afghanistan. Our research underscores the critical signi ﬁ cance of this hybrid model in addressing the challenges posed by drought within the broader context of climate change and water resource management.


INTRODUCTION
As an environmental catastrophe that almost occurs in every climate, drought has enticed the attention of researchers in the diverse field of study including agriculture, meteorology, environment, and ecology in recent years (Mishra & Singh 2010).Drought, which means the shortage of soil moisture in a particular period and decreasing water supplies in both the surface and groundwater reservoirs, has a negative impact on human life.Because drought is an incomprehensible event and leads to some negative effects on society, forecasting of occurrence of drought can be considered among the logical measurements to diminish its effect (Wilhite 2005).
Drought can lead to different impacts in different regions.To show a function of hydro-meteorological variables such as precipitation and streamflow, drought indices (DIs) are frequently used for analyzing its impact.For four types of droughts including meteorological, hydrological, agricultural, and socioeconomic, DI can be used for evaluation.However, there is no guarantee for the onset of drought occurrence, and it is necessary to monitor drought events by indices, so, the availability of hydro-meteorological data and the potential chosen of DI should be assessed since drought changes temporarily and spatially (Khan et al. 2020).Meteorological droughts gradually emerge, offering time for preparation and alertness, unlike sudden disasters.These droughts, initially mild in impact, provide a chance for proactive water conservation and sustainable practices.As they progress, public awareness about water use increases, encouraging positive behavioral shifts.Moreover, meteorological droughts drive scientific advancements, inspiring innovations in prediction and resilience technologies.Importantly, their gradual onset reduces immediate risks to life and infrastructure, giving communities more adaptation time.While benefits can vary, embracing these opportunities enables societies to enhance resilience, adopt sustainable measures, and effectively manage a range of drought-related challenges.The significance of the meteorological viewpoint arises from its capacity to serve as an initial indicator of impending drought conditions (Wu et al. 2004;Eslamian et al. 2017).Several indices including the standard precipitation evaporation index, effective drought index, and standardized precipitation index (SPI) are previously used to evaluate the meteorological drought.SPI stands as one of the most commonly used DIs, offering a recent approach for assessing drought severity in accordance with the guidelines set forth by the World Meteorological Organization (WMO) (Hayes et al. 2011) report.In this study, SPI has been chosen because of its inherent benefits.First, SPI is just calculated based on precipitation, so it will be a great advantage, particularly in the area without access to soil moisture, temperature, and evaporation.Second, SPI was introduced as a means to assess deficits in precipitation across various timescales.Shorter or longer timescales may reflect lags in the response of different water resources to precipitation anomalies (Mishra & Desai 2005).Finally, calculating SPI is less complex and is a standardized index that can be implemented in many different regions (Guttman 1998;Zargar et al. 2011).
There are several approaches for predicting time series events including the autoregressive integrated moving average (ARIMA) model, which is famous for its remarkable accuracy in forecasting time-oriented occurrences.As a reliable method, ARIMA has been widely used in time series forecasting such as streamflow and drought forecasting because of some of its benefits over other methods such as exponential smoothing and neural networks (Mishra & Desai 2005).For example, ARIMA can effectively consider serial correlation, which is mostly observed in time series modeling.This model can also provide a searching stage including identification, estimation, and diagnostic checking for selecting a suitable model.ARIMA is a well-preferred approach in many types of time series data owning of its flexibility and prediction precision (Zhang 2003;Mishra & Desai 2005).However, based on (Nourani et al. 2009), it is assumed that ARIMA could be successfully applied in linear and stationary datasets and has a limited ability to capture nonlinear and nonstationary timeoriented data.
Several methods are available in the literature for time series forecasting such as simple moving averages (MAs), linear regression, neural networks, auto regressive moving average (ARMA), and auto regressive integrated moving average (ARIMA).To predict future events, these methods analyze historical records, while time series data are not determinist series, and researchers considered these series as stationary series.Considering time series as a combination of deterministic function and white noise is a way of modeling time series.Using a de-nosing procedure such as wavelet transform, the white noise of any time series can be minimized and a better model can be obtained (Al Wadia & Ismail 2011).
In many fields of mathematical forecasting, wavelet transform along with stochastic and artificial intelligence methods has been frequently used to augment the precision of prediction (Nourani et al. 2014;).Wang & Ding (2003) reported wavelet as one of the useful tools for drought forecasting.In this study, they combined wavelet transform and artificial neural network (ANN), and the presented model picked up some merits of wavelet and ANN and increased the accuracy of drought prediction.In a previous study, Kriechbaumer et al. (2014) used wavelet transform as a data preprocessing way to improve the accuracy of the ARIMA model in metal price forecasting.The result of this study confirmed the usefulness of wavelet transform.Venkata Ramana et al. (2013) combined wavelet transforms with ANN and applied them in predicting monthly rainfall data, and then, the calibration and validation performances of the model were assessed with proper statistical criteria.The outcomes of this study showed that hybrid wavelet ANN can significantly improve the accuracy of monthly rainfall series over single ANN.Wavelet transform as an accepted method gained a wide attention of many researchers for time series analyzing trends particularly, periodicities and variations (Seo et al. 2017;Zhou et al. 2017;Quilty et al. 2019).By applying wavelet transform, a signal in both the time and frequency domains will generate and provide a reliable figure on the arrangement of the basic process to be modeled.Detailed information about data structure and its periodicity will be achieved by each decomposed subseries, and the results of research works illustrated that the wavelet-based approach is a promising technique for dealing with time-oriented datasets (Kim & Valdés 2003;Partal & Kisi 2007;Rathinasamy et al. 2014).Recently, integrated wavelets with stochastic models such as ARIMA and wavelet artificial intelligence such as ANN remarkably increased as a data preprocessing technique to de-noise inputs of hydrologic time series and improve the ability of the single ANN model because the sole ANN model is unable to deal with nonstationary series (Adamowski et al. 2012;Shabri 2015;Belayneh et al. 2016;Soh et al. 2018).
A study by Belayneh & Adamowski (2012) compared the ability of three data-driven techniques including SVM, ANN, and WANN for drought prediction in the Awash River Basin of Ethiopia based on SPI values.The performances of the models were assessed according to root-mean-square error (RMSE), mean absolute error (MAE), and R 2 criteria, and the hybrid model of WANN illustrated high accuracy for drought forecasting over two other methods.Two years later, the authors, in addition to the comparison of the performances of the traditional stochastic model (ARIMA), SVR, and ANN for prediction of drought occurrences at the same river basin, implemented the wavelet transform as a suitable data preprocessing technique in those models.In this study, the WSVR was implemented for the first time for predicting drought events.The outcomes from this study disclosed the excellence of WSVR for long-term (6-and 12-month lead time) drought prognosticating based on SPI amounts according to statistical measurements (Belayneh et al. 2014).A new combination model, namely, wavelet linear genetic Programming (WLGP) was applied for the prediction of drought events based on Palmer's modified drought index.The results confirmed the ability of WLGP methods over the traditional genetic programing models for long-term drought prediction, while the simple genetic model was unable to model over a 3-month lead time (Danandeh Mehr et al. 2014).To analyze the accuracy of drought modeling, Djerbouai & Souag-Gamane (2016) compared stochastic models (ARIMA/seasonal ARIMA (SARIMA)) with the ANN models using SPI values for different lead timescales in the Algerois basin, Algeria.They used wavelet transform as a data preprocessing technique to improve the ability of the ANN model.The results from this study show that wavelet transform can significantly increase the accuracy of the model over sole utilization of ANN based on statistical performance measures such as Nash-Sutcliffe efficiency (NSE) coefficient, RMSE, and MAE for all SPI time series lead time ranging from 1 to 6 month.A new hybrid model was suggested for predicting DIs known as a wavelet-based extreme learning machine (WELM) in three distinct stations in Australia.This combination of methods was compared with extreme learning machine (ELM), ANN, LSSVR, and their wavelet equivalent (WANN and WLSSVR).From the performance measures and statistical criteria, it can be easily found that WELM has a strong ability to predict drought incidents over the ELM, ANN, LSSVR, and their wavelet correspondent models.Moreover, WANN shows satisfactory outputs associated with simple computation and lower frequency error rates.The study illustrates the efficiency of wavelet transform as an effective screening data input in improving drought forecasting models (Deo et al. 2017).
The main objective of this research article is to investigate the predictive accuracy of the ARIMA and wavelet autoregressive integrated moving average (W-ARIMA) models using the SPI for drought forecasting.In addition, this study aims to compare the predictive performance of the ARIMA model with the hybrid W-ARIMA model.The SPI values of varying timescales, including SPI3, SPI6, SPI9, and SPI12, are utilized to assess both short-and long-term drought conditions.While the ARIMA model has gained popularity in recent hydrological time series prediction, it exhibits limitations in handling nonlinear and nonstationary data, characteristics inherent to SPI series.This article addresses the dearth of academic research on drought forecasting in Kabul, Afghanistana region susceptible to the socioeconomic repercussions of drought.Introducing both the ARIMA and W-ARIMA models for precise prediction, the study undertakes a comparative analysis of their effectiveness.

Study area and data
Afghanistan has experienced below-normal to severe drought, which affected the availability and access to safe drinking water.According to the Afghanistan Assessment Report (Mayar 2021), around 79% of householders reported inadequate water access for daily activities such as drinking, cooking, bathing, and hygiene.So, forecasting drought provides a broad vision for upcoming years for policymakers and water supply managers to at least mitigate its devastating effects by taking effective measures in advance.In this study, the monthly precipitation data for Kabul, the capital and largest city of Afghanistan, from January 1970 to December 2019 are extracted from the World Bank Climate Change Knowledge Portal (https://climateknowledgeportal.worldbank.org).Kabul is located in the east central part of Afghanistan with the latitude and longitude coordinates of 34.543896 and 69.160652, respectively.This city is situated in a stripe-like right near the Hindu Kush Mountain.The total area of Kabul is estimated at around 400 square miles, and the river of Darya-e Kabul (the river of Kabul) crosses the city from its eastern to the western side.Kabul Province is home to the capital city of Kabul and features a diverse scenery of mountains, valleys, and plains.The province has a dry to semi-dry climate with distinct seasons, and because of irregular and little rainfall, it frequently encounters drought problems.The Kabul River and its tributaries are used for irrigation as well as to grow commodities such as wheat and fruits, which are a major part of the local economy.However, droughts pose serious hazards because they have an impact on crops, water supplies, and daily living.Due to Kabul's high population density, droughts have a significant negative influence on both its social and economic elements.Modern technologies and data analysis techniques may be able to help with effective prediction and reaction plans.Figure 1 shows the study area.
In this study, we utilize monthly precipitation data from Kabul, Afghanistan, for the purpose of model validation.The country has witnessed a substantial increase in drought severity between 1901 and 2010.Despite this, there exists a scarcity of scientific works pertaining to drought prediction in Afghanistan.This study stands out as one of the pioneering pieces of research focusing on drought forecasting in Afghanistan through the implementation of data-driven approaches.The dataset for this case study encompasses 50 years of historical monthly time series precipitation records, spanning from January 1970 to December 2019.Subsequently, these records are divided into distinct training and testing sets.
The training set comprises 80% of the total data, while the testing set accounts for the remaining 20%.Table 1 presents a comprehensive overview of descriptive statistics for Kabul Province, the geographical area under investigation, encompassing key metrics such as mean, median, standard deviation, skewness, and kurtosis.See Supplementary Data for details.

Standardized precipitation index
The SPI is a common drought index frequently used for identifying drought events.It was initially suggested by McKee et al. (1995) and utilizes historical precipitation data in space.It encompasses both positive and negative values, with positive values illustrating surplus, while negative values indicating shortage events.Despite the existence of several other DIs, the WMO attempted to develop a standard index that can be utilized as a starting point for every region and country.Other indices can only be applied in specific areas and require large datasets and complex procedures for execution (Yihdego et al. 2019).As mentioned earlier, SPI is based solely on precipitation, making its evaluation relatively easy compared to other indices such as the Palmer index and crop moisture index (Cacciamani et al. 2007).SPI is considered the index for representing variability in Eastern African drought (Ntale & Gan 2003), and it can describe drought at multiple timescales, which is one major benefit of using SPI in predicting drought occurrences.
SPI can be computed by fitting a probability distribution to aggregate monthly precipitation series (3, 6, 9, 12, and 24 months), then this probability density function (PDF) transformed into a normal standardized index whose values classify the category of drought characteristics in each place and timescale (Belayneh & Adamowski 2013).
The gamma distribution is used to fit lengthy periods of historical rainfall records because it closely matches the sequence of rainfall episodes.The PDF derived from the gamma distribution is shown in Equation (1).
x aÀ1 e À x b for x, a, b .0 (1)  where the scale, shape variables, quantity of precipitation, and gamma function are represented by b, a, x, and G(a), respectively.The best estimates of aand b bounds are provided by Equations ( 2) and (3) (Guttman 1999).
where the rainfall average and the total number of observations are represented by the letters x and n, respectively.
x aÀ1 e À x b dx (4) The variables required to calculate the cumulative probability for nonzero rainfalls are shown in Equation ( 5).
where t ¼ x b For x ¼ 0, the gamma function is uncertain, the rainfall time series data may include zero precipitation, the cumulative probability of zero and nonzero precipitation are calculated, and Equation ( 6) is used to determine H(x).
where q denotes the probability of no precipitation, a is the number of zeros (which occur in a rainfall time series), and b is calculated using m/n.
Once the cumulative probability is changed to a standardized normal distribution (Sönmez et al. 2005), as demonstrated by Equations ( 7) and ( 8), the SPI mean and variance would equal zero and one, respectively.
The SPI classification is presented in Table 2.

ARIMA model
Linear stochastic models, also referred to as time series models, have been frequently applied in hydrological and meteorological drought forecasting due to their ability to provide a systematic empirical approach for predicting and analyzing time series events over the past decades.One of the most commonly used methods in this category is the ARIMA model, initially introduced by Box & Jenkins (1976), which is a suitable solution for addressing nonstationarity in historical time series records.This property of ARIMA arises from the effective combination of two simpler models, including autoregressive (AR) and MA.The general nonseasonal ARIMA model involves an AR order of p, an MA order of q, and d differences for the original time series z t .This model is denoted as ARIMA(p,d,q) and can be represented as follows: where f(B) and u(B) are polynomials of order p and q, respectively.The operators f(B) and u(B)will be described as follows: 2.4.SARIMA model Box et al. (1994) developed the ARIMA model and introduced the SARIMA model to deal with nonstationary time series with seasonal characteristics because many time series in hydrology have periodic properties.Generally, the SARIMA model is known as SARIMA( p, d, q)(P, D, Q) S , where the ( p, d, q) illustrates the nonseasonal part of the original series, while the seasonal part of the series will be described by (P, D, Q), which can be written as follows: where p is the order of the nonseasonal AR model, q is the order of the nonseasonal MA model, d is the number of regular differencing, P is the order of seasonal autoregression, D is the number of seasonal differencing, Q is the order of seasonal MA, and eventually s is the length of the season.Both ARIMA and SARIMA are time series forecasting models that are used to predict future values based on past observations.The main difference between these two models lies in their treatment of seasonal components.In summary, the main difference between ARIMA and SARIMA models is the inclusion of seasonal components in SARIMA to handle time series data with recurring patterns at regular intervals.If time series data exhibits a seasonal pattern, SARIMA may provide more accurate forecasts compared to ARIMA, which is better suited for nonseasonal data.The choice between ARIMA and SARIMA depends on the characteristics of time series data and the patterns you are trying to capture.

Wavelet decomposition
Wavelet, a mathematical model, serves as a transformative tool that converts the original signal, primarily in the time domain, into various domains for analysis and processing (Soltani 2002;Moosavi et al. 2013).It is a widely utilized model for handling nonstationary datasets, encompassing hydrological and climatological records, wherein the mean and autocorrelation of the signal exhibit inherent inconsistencies over time.
Wavelet is a powerful method in time series forecasting that researchers have implemented in many different fields of studies, including drought forecasting, in combination with other statistical and machine learning methods.When dealing with time series analysis, we encounter linear and nonlinear time-oriented records where ARIMA and ANN can be applied, respectively, under these conditions.Khan et al. (2020) showed that the sole application of ARIMA and ANN has a low ability in drought forecasting precision, while the combined approach of ARIMA-ANN improved the accuracy of the model.They developed a new discrete wavelet transform (DWT) based on ARIMA-ANN, known as W-ARIMA-ANN (W-2A), which helped in improving drought forecasting compared to both single and hybrid ARIMA-ANN models, considered meteorological DIs.Wavelet can be divided into two categories, including continuous wavelet transform and DWT, where the former is rarely used in time series forecasting due to its computational complexity and significant time requirement, while DWT is frequently used in prediction implementation to simplify the numerical solution (Shabri 2014).The following equation represents DWT.
where c(t) illustrates mother wavelet, n and m are integers that control the time and scale, respectively.Parameters s 0 ¼ 2 and t 0 ¼ 1 can be considered most popular choices.Based on Mallat's theory, the inverse DWT can be used to decompose the original discrete time series into a series of linearity-independent approximation and detail signals (Mallat 1989).The inverse DWT is defined by Mallat as the following equation: For the discrete wavelet at scale s ¼ 2 m and t ¼ 2 m n, wavelet coefficient can be considered As mentioned earlier, the aim of this research article is to enhance the accuracy of the ARIMA model for drought forecasting using wavelet transform as a data preprocessing step and to introduce the W-ARIMA model, a hybrid approach achieved by combining the wavelet and ARIMA models.In this approach, the original drought SPI is decomposed into sub-time series elements, which can then be utilized as inputs in the ARIMA model to enhance forecasting precision.Several levels of decomposition can be obtained using the following formula (Dawson et al. 2007): where L indicates the decomposition level and N represents the number of SPI data series.In this study, N ¼ 600, thus.
According to this formula, the original SPI data can be decomposed into several component levels (A, D 1, D 2, … , D LÀ1 ), each representing different frequency components of the original data.These components are unique and exert distinct effects on the original data.To enhance the forecast performance of the hybrid model, researchers incorporate an appropriate D component into the model instead of utilizing D components independently.A schematic diagram of the wavelet transform is displayed in Figure 2, which indicates decomposed time series, the approximate composition of the layer, and detailed components of all layers.The sum of detail components of all layers and the approximate element of the last layer constitutes the decomposed time series, which is mathematically represented in the following equation: Since the three-level wavelet transform of the signal is displayed in Figure 2, Equation ( 17) can be obtained according to Equation ( 16) like this.
We used DWT in this study because continuous wavelet transform requires more data and generates more information which is not suitable for this study (Che & Zhai 2022).

Hybrid W-ARIMA model
The hybrid W-ARIMA model combines the DWT and ARIMA techniques for time series analysis.It decomposes the original time series using DWT to capture both high-frequency and low-frequency patterns.ARIMA models are then applied separately to each decomposed component to forecast future values.The forecasts from the ARIMA models are inverse transformed using wavelets to reconstruct the final hybrid forecast for the original time series, thereby leveraging the strengths of both methods to enhance accuracy in modeling and forecasting complex time series data.The flow chart of this method is shown in Figure 3.The suggested hybrid model is composed of three following stages: 1.In the first stage, the original SPI values are decomposed using DWT by MATLAB.This transformation separated data into proper approximate and detailed components.Several wavelet decompositions are suggested including Daubechies, Symlet, Meyer, and Morlet, in which the type of mother wavelet is dependent on the characteristic of data (Benaouda et al. 2006;Nury et al. 2017).In this study, the Daubechies function of order 2 and decomposed level 3 are used.So, we have where the decomposed layer of data is represented by n, while D n and A n indicate the detail and approximation components of each layer, respectively.To make the temporal scale constant with the original data, the approximate element of the last layer (A n ), and detail components of each layer (D 1 , D 2 , … , D n ) should be reconstructed.2. In the second phase, the best ARIMA models are fitted into each decomposed layer for every SPI series.To meet the appropriate ARIMA model, all iterative steps explained in Section 3.1 are implemented to make a particular model for each decomposed component.3. Finally, the signal will be reconstructed using these decomposed and extended signals on different scales with the help of the following equation.The forecasted value of W-ARIMA can be achieved by arithmetic summing all subseries predictions of each decomposed layer for each SPI.
where f Ã (t) is the forecasted value of each SPI for the next year ahead.

Evaluation metrices
In this study, we employed several different statistical criteria to gauge the forecasting performance and the reliability of the proposed prototype, aiming to verify the accuracy of the model.The RMSE represents the standard deviation of the model's forecasted values; MAE provides the average difference between actual data and predicted outcomes; mean absolute percentage error (MAPE) serves as an indicator of how accurately forecasted values align with actual quantities; R 2 indicates the proportion of variability in observed data accounted for by the model; PBIAS (percent bias) is a hydrological performance metric quantifying the overall bias or systematic deviation of model forecasts from observed data; NSE is a standardized metric that assesses the proportion of residual variability (noise) relative to the variance of the observed data (information); and the Kling-Gupta efficiency (KGE) evaluates the level of agreement between the simulated values of a model and the observed data.The equations for these performance evaluation metrics are provided in Equations ( 20)-( 26), with further information available in the literature (Moriasi et al. 2007;Tongal 2013;Buyukyildiz et al. 2014;Koycegiz & Buyukyildiz 2019, 2022, 2023;Wu et al. 2021).

RMSE
where n represents the number of data, Y i indicates the observed data, Y i shows the mean of observed data, and Ŷi illustrates the predicted data.In the last equation, r represents the Pearson correlation coefficient between the simulated and observed values, α stands for the ratio of the standard deviation of the simulated data to that of the observed data, and β signifies the ratio of the mean of the simulated data to that of the observed data.Based on these criteria, a better model performance will be gained with smaller RMSE and MAE, and MAPE and large R 2 .KGE values vary within the range of -∞ to 1, where higher values signify improved model performance.A KGE value of 1 denotes a complete alignment between simulated and observed data, while lower values suggest less precise model performance.Table 3 (Koycegiz & Buyukyildiz 2019) presents the performance ratings of certain statistical indices that have been utilized.

RESULTS AND DISCUSSION
In this study, the W-ARIMA model is proposed to enhance the accuracy of the ARIMA model in predicting drought events in the Kabul province based on SPI.The dataset is divided into two periods: a training phase from 1970 to 2009 and validation data from 2010 to 2019.The model development comprises three major stages: model identification, estimation of unknown parameters, and diagnostic checking.In this study, the original SPI values are decomposed using MATLAB, while R software is employed for model development and predicting future drought occurrences.To classify drought events into short-term and long-term categories, different SPI values are computed considering total precipitation for running periods of 3, 6, 9, and 12 months.SPI 3 indicates short-term drought, SPI 6 and SPI 9 illustrate mid-term drought events, while long-term drought can be represented by SPI 12. Different SPI values are plotted in Figure 4 based on historical records.

ARIMA model
As mentioned earlier, to arrive at a suitable forecasting ARIMA model, we need to follow three iterative steps, which are described in the following sections.

Model identification
The stationarity and seasonality of the time series can be determined at this phase before parameter estimation.Autocorrelation function (ACF) and partial autocorrelation function (PACF) plots can be utilized to evaluate stationarity.In the case of a nonstationary time series, differential transformation can be applied to obtain stationary data.To determine the order of the ARIMA model, ACF and PACF plots are used.The best-fitted model can be chosen by goodness-of-fit criteria through the Akaike information criteria (AIC) and Schwarz-Bayesian criterion (SBC).According to these statistical measurements, the model with the lowest AIC and Bayesian information criterion is selected as the best (Akaike 1974;Schwarz 1978;Yeh & Hsu 2019).The mathematical formulation of these criteria is described as follows: where k is illustrating the number of parameters in the model ( is the likelihood function of the ARIMA model, and n is the number of data in the model.The goal of model identification is to uncover a fitting subclass of the ARIMA model that accurately fits the time series data.Initially, assessing the stationarity of the records is necessary.To achieve this, the augmented Dickey-Fuller test was employed.The outcome confirmed the data's stationarity and indicated no need for differencing.This result has been visualized in Table 4.
Figure 5 shows the ACF and PACF plots for the original time series of SPI 3 for illustration.
Multiple models have been fitted to various SPI timescales.Table 5 presents a summary of the best ARIMA models based on the AIC and SBC criteria.The best model for each original SPI can be determined by selecting the model with the lowest AIC and SBC values.

Parameter estimation
The subsequent step in constructing the best ARIMA model involves estimating the parameters of the selected model from the previous stage.Following the approach outlined by Box & Jenkins (1976), the maximum likelihood method, a technique for estimating parameters in statistical models, has been employed for parameter estimation.A robust estimator for the parameters can be computed by assuming data stationarity and maximizing the probability concerning the parameters.Generally, there are two conditions on parameters in estimating, one is stationary condition, which means jf p j , 1, and the other is invertibility condition, which means ju q j , 1. Model parameters, standard errors, t-statistics, and p-values are presented in Table 6.Clearly, the standard errors are relatively small in comparison to the values of the model parameters, and the majority of p-values in the models are significant.This indicates that these parameters can be included in the models.

Diagnostic checking
The third step in fitting an appropriate ARIMA model involves diagnostic checking, which entails examining whether the selected models from the previous stages genuinely satisfy the fundamental assumptions of time series.If the residuals of the obtained model are uncorrelated random variables with a mean of zero and constant variance, then the estimated model is suitable and confidently meets the basic assumptions, such as independence, normality, and homoscedasticity of the residuals.Alongside the residual ACF and residual PACF, the Ljung-Box Q (LBQ) statistic is employed to assess the overall adequacy of the model.The test statistic Q is expressed as follows: where n is the number of residuals, m represents the number of time lags, and r k indicates the residual autocorrelation at lag k.
The fitted models can be used to predict each SPI value if the tests and plots provide significant support for the hypothesis on residuals.In this study, both the correlogram and the LBQ test were employed to examine the independence of residuals.Figure 6 demonstrates the independence of residuals for SPI 3, as the majority of the RACF and RPACF values fall within the confidence limits.This clearly indicates that there is no substantial evidence of residual correlation among them.
Furthermore, the LBQ test was utilized to ascertain the independence of residuals.The outcomes of this test, as shown in Table 7, indicate that the residuals for each SPI value across various timescales were uncorrelated and exhibited the properties of a white noise process.
The histogram and normal probability plot are employed to assess the normality assumption of residuals.Figure 7 illustrates the histogram and normal probability test for SPI 3 as an example.It is evident that the residuals are distributed around zero and exhibit a relatively normal distribution (Mahmud et al. 2016;Rahman et al. 2017).In addition, since the residuals align closely with the diagonal line on the normal probability plot, the normality assumption is fulfilled (Mahmud et al. 2016;Widowati et al. 2016).
A scatterplot of the residuals against the predicted values was generated to verify the homoscedasticity of the residuals and assess whether the fitted models maintain consistent predictive ability for variable values.Figure 8 portrays the scatterplots of residuals for SPI 3 against fitted values as an illustrative example.The findings indicate that these scatterplots for each SPI  series lack any discernible pattern, implying that the residuals are randomly dispersed.Consequently, based on the diagnostic examination mentioned earlier, the residuals exhibit the characteristics of white noise: they are uncorrelated, normally distributed, randomly scattered, and possess constant variances.Thus, it can be concluded that the selected models are suitable for the respective SPI timescales.
To evaluate the validity of the fitted models, we employ monthly data covering the period from 2010 to 2019. Figure 9 provides a visual comparison between observed data and predicted values for each SPI series, utilizing the optimal SARIMA model.Table 8 in Section 3.2 presents the statistical performance metrics for both the ARMA/SARMA model and the W-ARIMA model for comparative analysis.

Proposed W-ARIMA model
To enhance forecasting accuracy, a combined model named the W-ARIMA model, integrating wavelet and ARIMA, is introduced to address the limitations of standalone ARIMA models when dealing with nonstationary data.In this approach, the DWT is applied to decompose the SPI series, generating suitable approximate and detailed components.Inverse wavelet transform is subsequently used to reconstruct the decomposed series.Optimal ARIMA/SARIMA models are fitted to each of the decomposed elements, and the W-ARIMA predictions are obtained by summing the forecasted values from all the decomposed components.
Each SPI series undergoes a three-level decomposition utilizing the DWT with the application of the Daubechies function of order 2 (db2).This decomposition results in the extraction of components including an approximate representation (A 3 ) and various detail components (D 3 , D 2 , D 1 ). Figure 10 illustrates the decomposition of SPI 3 into approximate and detailed components, provided as an example.
The proposed W-ARIMA model is easily derived by aggregating the predicted values from each decomposed layer, utilizing the three iterative stages previously elucidated, along with the corresponding ARIMA model for each constituent subseries.For various ARIMA/SARIMA models applied to each decomposed SPI (A 3 , D 3 , D 2 , and D 1 ) Timescales, the summary of optimal models based on AIC and SBC, is presented in Table 8.The most suitable model for each original SPI can be identified by selecting the models with the lowest AIC and SBC values.for both ARIMA and W-ARIMA, looking 1 year (12 months) ahead for each SPI series, are presented in Table 9 and Figure 15, respectively.
The comparison table (Table 9) assessing the accuracy of drought forecasting between the ARIMA and W-ARIMA models reveals notable insights.Various evaluation metrics, such as R 2 , RMSE, MAPE, NSE, KGE, and PBIAS, were employed to comprehensively evaluate their performances.In terms of R 2 , RMSE, MAPE, NSE, and KGE, the W-ARIMA model consistently demonstrated superiority over the ARIMA model, showcasing its enhanced predictive capability and precision.These metrics collectively indicate that the W-ARIMA model outperforms ARIMA across different dimensions of forecasting accuracy.However, it is noteworthy that for the PBIAS criterion, no significant difference between the two models was observed, implying a comparable performance in addressing bias.This comprehensive assessment underscores the favorable attributes of the W-ARIMA approach in improving the precision and reliability of drought forecasting compared to the traditional ARIMA method.
Figure 16 illustrates a comparative analysis between the ARIMA and W-ARIMA models for different SPI intervals (SPI 3, 6, 9, and 12) in terms of predictive results for a 1-year forecast.The evaluation encompasses several performance metrics.The RMSE, MAE, R 2 , and NSE metrics demonstrate the W-ARIMA model has the edge in drought forecasting.In all SPI intervals, W-ARIMA consistently produces more accurate forecasts, as evidenced by consistently lower RMSE and MAE values.The W-ARIMA model also exhibits higher R 2 values, which indicate a stronger relationship between forecasted and observed SPI values, as well as higher NSE values, which indicate a better fit to the data in general.W-ARIMA is therefore a better model for capturing SPI patterns and making drought forecasts more accurate and reliable than traditional ARIMA.
In the realm of drought forecasting, our study's hybrid W-ARIMA model tailored to Kabul, Afghanistan, emerges as a groundbreaking advance.Leveraging original data, we pioneer a statistical and data-driven approach, marking the first  academic effort of its kind within Afghanistan.Thorough evaluation, encompassing diverse metrics excluding RMSE and R 2 underscores the hybrid model's pronounced edge over traditional ARIMA.Distinctively, our work distinguishes itself by its localized focus on Kabul, serving as a trailblazer in introducing sophisticated methodologies for drought prediction in this region.This unique contribution augments both academic discourse and the practical domain, promising enhanced resilience strategies for managing drought-related challenges in Kabul and beyond.

CONCLUSION
In this study, we introduced a novel hybrid W-ARIMA model and applied it to forecast drought occurrences in Kabul, Afghanistan, addressing the critical issue of drought forecasting.Our innovative utilization of data-driven methods for drought  A central contribution of our research is the comprehensive comparison between the proposed W-ARIMA model and the traditional ARIMA model based on SPI.Through a comprehensive analysis using various performance metrics, including RMSE, MAE, MAPE, R 2 , NSE, PBIAS, and KGE, the clear superiority of the W-ARIMA approach becomes evident.Across all metrics, except PBIAS, the W-ARIMA model consistently outperforms the traditional ARIMA model, underscoring its potential to enhance drought forecast accuracy.
Initially, we establish the optimal ARIMA/SARIMA model for each SPI series, serving as a benchmark.Subsequently, the DWT (db2) is applied to decompose each SPI series, capturing the essential multiscale information required for accurate forecasting.The final W-ARIMA forecast for each SPI series is derived by fitting suitable ARIMA models to the decomposed elements (A 3 , D 3 , D 2 , and D 1 ).W-ARIMA model can be obtained by summing all predicted values of each decomposed layer using the three iterative stages explained in methodology with the corresponding ARIMA model for each constituent subseries.
Ultimately, our study pioneers the application of an advanced forecasting model for drought prediction in Kabul, Afghanistan.The proven effectiveness of the W-ARIMA model, coupled with a comprehensive comparison to the SPI-based ARIMA model, highlights its potential to enhance the precision of drought forecasting.Emphasizing localized approaches to improve forecasting accuracy, our work contributes to the broader field of climate resilience strategies.In comparison to prior research, our study represents a significant advancement in drought forecasting, extending the application of the W-ARIMA model to Kabul's unique context and providing a blueprint for similar regions facing comparable challenges.By incorporating the DWT and the SPI framework, our work showcases the potential of innovative methodologies that bridge traditional hydrological modeling with contemporary data-driven techniques, thus enriching the discourse on climate adaptation strategies.

Figure 1 |
Figure 1 | Location map of Kabul province and Kabul River Basin with the meteorological stations.

Figure 4 |
Figure 4 | SPI time series based on the average precipitation over the Kabul province, Afghanistan.

Figure 5 |
Figure 5 | ACF and PACF plot for model selection of SPI 3.

Figure 7 |
Figure 7 | The histogram (left column) and normal probability plot (right column) of residuals for SPI 3.

Figure 9 |
Figure 9 | Comparison of observed values with predicted values using the best SARIMA model for each SPI.

Figure 12 |
Figure 12 | Actual versus predicted values of A 3 , D 3 , D 2 , and D 1 for SPI 6 through W-ARIMA.

Figure 11 |
Figure 11 | Actual versus predicted values of A 3 , D 3 , D 2 , and D 1 for SPI 3 through W-ARIMA.

Figure 13 |
Figure 13 | Actual versus predicted values of A 3 , D 3 , D 2 , and D 1 for SPI 9 through W-ARIMA.

Figure 14 |
Figure 14 | Actual versus predicted values of A 3 , D 3 , D 2 , and D 1 for SPI 12 through W-ARIMA.

Figure 15 |
Figure 15 | Comparison of SPI forecasting of ARIMA versus W-ARIMA model for 1 year ahead.

Figure 16 |
Figure 16 | Evaluation performance indices for prediction results in ARIMA and W-ARIMA models.

Table 1
| Descriptive statistics of monthly precipitation data for Kabul, Afghanistan

Table 3 |
General performance ratings

Table 4 |
Stationarity assessment: augmented Dickey-Fuller test for SPI series

Table 5 |
Summary of best selected ARIMA model for each SPI series based on AIC and SBC criterion

Table 6 |
Statistical analysis of the model parameters for SPI timescales Figure 6 | The ACF and PACF of residuals for SPI 3.

Table 7 |
LBQ statistics for SPI series Figure 8 | The scatterplot of the residuals against predicted values for SPI 3.

Table 9 |
Comparative accuracy analysis of ARIMA and W-ARIMA models for each SPI