ABSTRACT
This study proposes a hybrid discrete wavelet transform (DWT) and support vector machine (SVM) model (W-SVM) to enhance drought prediction accuracy compared to traditional SVM and autoregressive integrated moving average (ARIMA) models using the Standardized Precipitation Index (SPI). Fifty years of historical average monthly precipitation data from the Kabul province, Afghanistan, are analysed. The data are split into training and testing sets at an 80/20 ratio. Daubechies order 2 (db2) is selected as the mother wavelet function with three decomposition levels for SPI-3, -6, -9, and -12. Each decomposed element, including detail components (D1, D2, D3) and approximation (A3), is forecasted with an SVM model, and the final forecast is obtained by summing all levels. Statistical metrics such as root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2) are used for evaluation. Results demonstrate that W-SVM outperforms benchmark models, achieving lower RMSE and MAE values and higher R2 across all SPI values. These findings highlight DWT's effectiveness in enhancing SVM's capability for SPI pattern recognition. The proposed model provides accurate and timely drought predictions, making it a valuable tool for real-time drought monitoring and management, enabling informed decision-making to reduce drought impacts.
HIGHLIGHTS
Hybrid W-SVM model improves drought prediction accuracy over traditional SVM and ARIMA models.
Uses 50 years of precipitation data from Kabul, Afghanistan, based on SPI indicators.
Wavelet decomposition (db2) enhances SVM's ability to capture complex drought patterns.
W-SVM shows lower RMSE, MAE, and higher R2 across all SPI values.
Contributes to better early drought warning systems for climate adaptation.
INTRODUCTION
Drought, a natural phenomenon resulting from lower-than-average precipitation, directly and indirectly affects human and animal life. Natural disasters, including drought and floods, are among the costliest, accounting for 22% of economic damage (Wilhite et al. 2007). As a slow-developing event, drought has garnered significant attention from researchers and policymakers in recent decades due to its profound impact. Early warning, monitoring, and forecasting of drought in advance can mitigate its potential damage. Drought, categorized into meteorological, hydrological, agricultural, and socioeconomic types (Mishra & Singh 2011; Kikon & Deka 2022), highly depends on several variables. Factors such as soil moisture, temperature, low humidity, and rare precipitation lead to drought in specific regions (Quan et al. 2021). Among the various types of droughts, meteorological drought is considered a significant reduction in precipitation levels in a particular region (Deepa et al. 2022), as it can significantly increase evaporation rates and temperature and reduce soil moisture, leading to other types of droughts, such as agricultural drought (Faghmous & Kumar 2014; Luo et al. 2017).
Because drought is complex to characterize, several drought indices have been developed to identify it, including the Standardized Precipitation Index (SPI), the Standardized Precipitation Evapotranspiration Index (SPEI), and the Palmer Drought Severity Index (PDSI) (McKee et al. 1993a, b; Vicente-Serrano et al. 2010; Dai 2011; Eslamian et al. 2017). Among them, the SPI is widely used for meteorological drought identification because its calculation is simple and depends only on precipitation, and because its normalized form makes it applicable across various geographical areas (McKee et al. 1993a, b; Karavitis et al. 2011).
In addition to autoregressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), and regression-based models, which are still popular in hydrological time series forecasting, machine learning methods have gained increasing attention in various fields such as engineering, finance, agriculture, medicine, and environmental sciences. The rapid and widespread adoption of machine learning methods can be attributed to their high computational power, speed, availability of sensor data, and advancements in big data analysis (Yaseen et al. 2021; Çoban et al. 2024). The machine learning methods used in this area include artificial neural network (ANN), support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), and long short-term memory (LSTM) (Chiang & Tsai 2013; Han et al. 2013; Mokhtarzad et al. 2017; Poornima & Pushpalatha 2019; Rezaiy & Shabri 2024b).
In a study by Moghimi et al. (2020), various ARIMA models were used to forecast the 3-month Reconnaissance Drought Index (RDI) for South Iran between 1980 and 2010. They determined that the moving average (MA) model was the most accurate, based on Akaike's Information Criterion Corrected (AICC). In another study, Zhang et al. (2020) evaluated ARIMA, wavelet neural network (WNN), and SVM models for predicting the 12-month SPEI in Sanjiang Plain, China. The ARIMA model proved to be the most effective, achieving high R2 and Nash–Sutcliffe Efficiency Coefficient (NSE) values above 0.9, with lower mean squared error (MSE) and Kolmogorov–Smirnov (K–S) distance. This underscores ARIMA's effectiveness in managing stationary data and capturing key features of hydrological time series.
Regression-based models have also been widely applied. Chen et al. (2020) used ordinary least squares regression to develop a meteorological composite drought index in Hubei Province, China. Li et al. (2020a, b) used ridge regression and lasso regression, two types of penalized linear regression, to forecast the SPEI over 3-, 6-, 12-, and 24-month timescales in Northeast China. They compared these models to ordinary least squares regression and three ensemble-based models: decision tree (DT), adaptive boosting, and random forest (RF). Among these, lasso regression demonstrated the highest accuracy for forecasting SPEI for both short- and long-term droughts. Ghasemi et al. (2021) used Gaussian process regression (GPR), multi-layer perceptron, and general regression neural network models to predict the SPEI for 12 months in advance, with lead times of 1–3 months, in Iran. They assessed the models' performance using mean absolute error (MAE), root mean square error (RMSE), sum of squared errors (SSE), and the coefficient of determination (R2). The GPR model stood out as the most accurate, achieving the lowest RMSE and highest R2 values, although its accuracy declined with longer lead times. Regression-based models for drought forecasting have become less popular in recent years because they tend to fall short in the presence of nonlinear relationships between hydrological variables (Fung et al. 2020).
In a study aimed at improving drought forecasting in eastern Australia, researchers Deo et al. (2017) utilized three models – multivariate adaptive regression splines (MARS), least square support vector machine (LSSVM), and M5Tree – to predict the SPI. The MARS model, which incorporated rainfall and climate indices, proved to be the most effective at three out of five stations, significantly reducing forecasting errors. At the other stations, the M5Tree model outperformed both MARS and LSSVM, while LSSVM performed best for Bathurst. Introducing periodicity into the models enhanced their accuracy by up to 8.1% and significantly reduced errors. These findings highlight the importance of considering periodicity and reveal that model performance can vary based on geographic location and seasonal factors, reflecting the complexity of drought dynamics. Zhang et al. (2017) used three different models – ARIMA, ANN, and wavelet neural networks (WANN) – to forecast drought in the North Haihe River Basin, looking at SPI over two periods (SPI-6 and SPI-12). They evaluated how well each model worked using various tests and metrics such as the Kolmogorov–Smirnov test, Kendall rank correlation, and R2 values. The findings showed that WANN was the best model for predicting SPI-6 and SPI-12 in this region. Essam et al. (2022) tested various machine learning models, including SVM, ANN, and LSTM, to forecast streamflow for 11 rivers using data from Malaysia's Department of Irrigation and Drainage. The ANN3 model, which uses streamflow data from the previous 3 days, turned out to be the most effective. It performed the best in 4 of the 11 datasets and was consistently reliable. This makes it the top choice for predicting streamflow in the region. Different pre-processing methods like variational mode decomposition (VMD), bagging, boosting, bagging-VMD, and boosting-VMD were tested to predict river water levels, which is key for flood management and planning. 
The study used daily rainfall data from the Dungun River Basin in Malaysia. Boosting-VMD made a big difference in accuracy when paired with ANN and support vector regression (SVR). Among these, the Boosting-ANN model performed the best, with the lowest errors and highest accuracy (Tiu et al. 2022). Mokhtar et al. (2021) used several machine learning models – RF, Extreme Gradient Boost (XGB), convolutional neural network (CNN), and LSTM – to forecast drought on the Tibetan Plateau between 1980 and 2019, using the SPEI as a measure. Among them, the XGB and RF models showed the best performance, accurately predicting drought conditions for both 3-month (SPEI-3) and 6-month (SPEI-6) periods using a variety of climate data. To predict hydrological drought in the Gidra River, researchers used ANNs and SVM on 58 years of daily river discharge data. These models achieved perfect accuracy, highlighting their strong potential for early drought detection and effective water management strategies (Almikaeel et al. 2022).
Achite et al. (2023) applied four machine learning models – ANN, ANFIS, SVM, and DT – to predict hydrological drought in Algeria's Wadi Ouahrane basin. The models were assessed using metrics like RMSE, MAE, NSE, and R2. All models performed well, but the SVM model stood out, with an average RMSE of 0.28, MAE of 0.19, NSE of 0.86, and R2 of 0.90. For forecasting the Standardized Runoff Index (SRI) over a 12-month period, SVM achieved an R2 of 0.95, although its accuracy decreased with shorter timescales. Researchers also evaluated different machine learning models like ANN, SVR, SVR-PSO (particle swarm optimization), and SVR-RSM (response surface method) to predict monthly drought indices like SPI, normal precipitation (PN), Effective Drought Index (EDI), and Modified China-Z Index (MCZI). The SVR-RSM model, with its new calibration method, proved to be the most accurate. SPI had the best match with MCZI, while EDI forecasts were less precise. For short-term drought predictions in dry areas, SVR-RSM is the recommended model due to its strong performance (Piri et al. 2023).
Machine learning methods often find it challenging to forecast non-stationary drought time series (Belayneh et al. 2014). To tackle this issue, data preprocessing techniques such as wavelet transform and empirical mode decomposition can be helpful. Wavelet transform is particularly effective for breaking down complex data into more manageable pieces, making it easier to handle nonlinear and non-stationary time series (Hinge et al. 2022; Karbasi et al. 2022; Piri et al. 2023). By analyzing data at different resolution levels, wavelet transform can improve the performance of machine learning models, especially when using decomposed data instead of raw data. Belayneh et al. (2016) investigated SVR and ANN models for drought prediction in the Awash River Basin of Ethiopia by combining wavelet transforms with bootstrap and boosting approaches. For SPI values (SPI-3, SPI-12, SPI-24) that represent varying drought durations, wavelet analysis increased the accuracy of drought predictions. RMSE, MAE, and R2 were used to assess the models' performance; the boosting ensemble technique improved the correlation between the observed and predicted SPIs. In terms of prediction accuracy, the wavelet boosting-ANN (WBS-ANN) and wavelet boosting SVR (WBS-SVR) models fared better than other model types overall. ANN models were used for drought forecasting in the Algerois basin, Algeria, and compared with ARIMA and SARIMA models. To improve accuracy, the study applied wavelet preprocessing (WANN) to the SPI across a range of temporal scales. Owing to their greater accuracy and simplicity, the WANN models were recommended for drought early warning, as they regularly outperformed the other models in predicting SPI levels (Djerbouai & Souag-Gamane 2016).
In addition, wavelet transform and Ensemble Empirical Mode Decomposition (EEMD), two powerful data preprocessing techniques, have been used with ARIMA and SARIMA modeling to predict drought in Kabul and Herat provinces in Afghanistan. The combined models W-ARIMA (Rezaiy & Shabri 2023a) and EEMD-ARIMA (Rezaiy & Shabri 2024a, b) achieved lower RMSE and MAE and higher R2 than traditional ARIMA/SARIMA.
Although preprocessing techniques like wavelet transform are commonly used in combination with both traditional and machine learning methods for hydrological time series prediction, their application in drought forecasting – especially with advanced machine learning models like SVM – has been relatively limited. Most existing hybrid models combining wavelet transforms with machine learning, such as WANN, WBS-ANN, or WBS-SVR, have focused on specific aspects of prediction accuracy, often neglecting the integration of computational efficiency and adaptability across varying timescales of SPI.
The novelty of the W-SVM model is that it combines wavelet transform with SVM to improve prediction accuracy at a lower computational cost. Unlike previous models, which often rely on ensemble methods or are limited to specific SPI timescales, the W-SVM model uses wavelet decomposition to preprocess and decompose the raw precipitation data into meaningful components. Each component is modeled using SVM, and the component forecasts are then combined to predict the SPI at multiple timescales (SPI-3, SPI-6, SPI-9, and SPI-12). This structured approach can capture the nonlinearity and nonstationarity of drought time series. Moreover, the study applies this hybrid model to Kabul province, Afghanistan, where few studies have applied such advanced methods for drought forecasting. The W-SVM model fills these gaps and provides a new framework for drought forecasting with better performance than individual models such as ARIMA and SVM. The performance of the proposed model is examined through statistical evaluation metrics (RMSE, MAE, and R2) on forecasts of the SPI series.
MATERIAL AND METHODS
Study area and data
Afghanistan, a country in South Asia, has an arid and semi-arid climate. Water resources and agricultural watering are highly dependent on winter precipitation, which typically occurs as snowfall. This country has experienced several drought periods since 1970 that have adversely affected the lives of millions of people, directly or indirectly. The most recent significant drought reported in 2018 affected 10.5 to 17 million people, leading to difficulties in drinking water, lack of enough water for land irrigation, unemployment, and consequently migration both within and outside the country (Mayar 2021). Therefore, an early and accurate drought prediction system is vital to mitigate the negative impact of drought. It can help policymakers, water resource managers, and people to take precautionary measures.
Standardized Precipitation Index
The SPI, initially proposed by McKee et al. (1995), is one of the most popular and frequently used drought indices. Its dependence only on precipitation data and simple calculation procedure have placed it among popular indices, and the World Meteorological Organization (WMO) recognized it as a standard drought index that can be applied in all climate regions. In contrast, other drought indices might require a large number of different types of data, such as rainfall, temperature, and soil moisture, which make the calculation process more sophisticated, and all types of data might not be easily available in all regions (Yihdego et al. 2019).
The applicability of the SPI for use in arid and semi-arid climates was carefully evaluated in this study. Since SPI relies on the gamma distribution for probability fitting, its performance can sometimes be affected by deviations from this distribution, particularly in regions with low and variable precipitation. A goodness-of-fit test was conducted to assess whether the precipitation data from Kabul province adhered to the gamma distribution. The results indicated slight deviations (p-value = 0.0062), which are commonly observed in real-world precipitation data from arid regions. Such deviations are consistent with findings in similar studies (Wu et al. 2007; Mahmoudi et al. 2022). Despite this, SPI has been widely recognized as a reliable tool for drought analysis, especially when used alongside suitable pre-processing methods.
To address these deviations, wavelet decomposition was applied to preprocess the precipitation data. This technique allowed multi-temporal variability to be captured effectively, reducing the impact of the observed deviations on SPI calculations. The combination of wavelet decomposition and the SPI's established reliability ensures that it remains a robust tool for drought assessment in the study area. Similar approaches have been successfully applied in other arid and semi-arid regions, as demonstrated by Belayneh et al. (2016) and Zhang et al. (2020), further supporting the methodology employed in this research.





The constant values in Equation (6) and Equation (7) are C0 = 2.515517, C1 = 0.802853, C2 = 0.010328, D1 = 1.432788, D2 = 0.189269, and D3 = 0.001308.
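As an illustration, the conversion from cumulative probability to SPI can be sketched in Python. This sketch assumes that Equations (6) and (7) take the standard rational-approximation form in which these constants usually appear, with one branch for probabilities up to 0.5 and a mirrored branch above 0.5. The function name and the argument h (the cumulative probability obtained from the fitted gamma distribution) are illustrative assumptions, not notation taken from this paper.

```python
import math

# Constants quoted in the text for the rational approximation.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def spi_from_probability(h: float) -> float:
    """Map a cumulative probability h (0 < h < 1) to a standard normal deviate.

    Assumed form: t = sqrt(ln(1 / p^2)), where p is the tail probability,
    and SPI = +/- (t - (C0 + C1 t + C2 t^2) / (1 + D1 t + D2 t^2 + D3 t^3)).
    """
    if not 0.0 < h < 1.0:
        raise ValueError("cumulative probability must lie strictly between 0 and 1")
    p = h if h <= 0.5 else 1.0 - h          # work with the smaller tail
    t = math.sqrt(math.log(1.0 / (p * p)))
    z = t - (C0 + C1 * t + C2 * t * t) / (1.0 + D1 * t + D2 * t * t + D3 * t ** 3)
    return -z if h <= 0.5 else z            # dry side is negative, wet side positive


if __name__ == "__main__":
    for h in (0.05, 0.5, 0.975):
        print(h, round(spi_from_probability(h), 3))
```

Under this assumed form, a cumulative probability of 0.5 maps to an SPI close to zero, and probabilities below 0.5 yield negative (dry) values.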
To have a better insight into SPI values and their interpretation, Table 1 illustrates the SPI categorization.
Table 1. Drought and moisture classification using the SPI (Guttman 1999)
SPI values | Category |
---|---|
≥ 2.0 | Extremely wet |
1.5 to 1.99 | Very wet |
1.0 to 1.49 | Moderately wet |
−0.99 to 0.99 | Nearly normal |
−1.0 to −1.49 | Moderate drought |
−1.5 to −1.99 | Severe drought |
≤ −2.0 | Extreme drought |
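The categorization can be expressed as a small lookup function. The sketch below assumes the commonly used SPI class boundaries (±1.0, ±1.5, ±2.0); the thresholds and the function name are assumptions for illustration and should be checked against the classification adopted in the study.

```python
def classify_spi(spi: float) -> str:
    """Return the drought or moisture category for an SPI value.

    Thresholds are the commonly used SPI class boundaries (an assumption).
    """
    if spi >= 2.0:
        return "Extremely wet"
    if spi >= 1.5:
        return "Very wet"
    if spi >= 1.0:
        return "Moderately wet"
    if spi > -1.0:
        return "Nearly normal"
    if spi > -1.5:
        return "Moderate drought"
    if spi > -2.0:
        return "Severe drought"
    return "Extreme drought"


if __name__ == "__main__":
    for value in (2.3, 0.4, -1.2, -2.1):
        print(value, classify_spi(value))
```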
ARIMA
In these equations, $y_t$ represents the observed time series, while $\phi(B)$ and $\theta(B)$ are polynomials of order p and q, respectively. The random errors, $\varepsilon_t$, are assumed to be independently and identically distributed with a mean of zero and constant variance. The differencing operation, $(1 - B)^d$, is used to make the data series stationary, where d indicates the number of regular differences applied.
In the SARIMA model, P denotes the order of the seasonal AR part, and Q indicates the order of the seasonal MA part. D is the number of seasonal differences needed to make the series stationary, and S specifies the length of the seasonal cycle. Similarly, p represents the order of the non-seasonal AR component, q is the order of the non-seasonal MA component, and d is the number of non-seasonal differences required to stabilize the series. Further details on implementing ARIMA and SARIMA models, including the steps for applying them, are given by Rezaiy & Shabri (2023b).
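The roles of d, D, and S can be illustrated with a short Python sketch of regular and seasonal differencing. The helper name `difference` and the synthetic series are hypothetical and serve only to show how differencing removes a linear trend and a repeating 12-month pattern; they are not the data or code used in this study.

```python
def difference(series, lag=1, order=1):
    """Subtract the value `lag` steps earlier, repeated `order` times.

    lag=1 gives regular differencing (applied d times);
    lag=S gives seasonal differencing (applied D times).
    """
    out = list(series)
    for _ in range(order):
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out


# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
pattern = [3, 1, 0, -1, -2, -3, -3, -2, -1, 0, 1, 2]
series = [0.5 * t + pattern[t % 12] for t in range(48)]

seasonal = difference(series, lag=12, order=1)     # D = 1, S = 12: removes the pattern
stationary = difference(seasonal, lag=1, order=1)  # d = 1: removes what is left of the trend

if __name__ == "__main__":
    print(seasonal[:3], stationary[:3])
```

Each pass shortens the series by `lag` observations, which is why heavily differenced models leave fewer points for parameter estimation.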
Support vector machine
SVM is a popular machine learning technique used for tasks like regression, classification, and pattern recognition. The core aim of SVM is to minimize errors and enhance the model's generalization capabilities. It's grounded in statistical learning theory, emphasizing structural risk minimization to improve performance. Due to its proven effectiveness and robustness, SVM has become a widely used tool across various fields (Vapnik 1995; Deka 2014).
In Equations (13) and (14), $\xi_i$ and $\xi_i^*$ are slack variables that set limits on the output values. The model is optimized using the Lagrange function and includes a kernel function to handle non-linearities. The parameter C controls the trade-off between the model's complexity and its ability to generalize. Meanwhile, $\varepsilon$ defines the acceptable error level, helping to keep the solution simple and sparse.
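The interplay between the slack variables, C, and the acceptable error level can be sketched as follows. The sketch assumes the standard epsilon-insensitive formulation of support vector regression, in which residuals inside the tolerance band cost nothing and the objective is half the squared weight norm plus C times the total slack. All function and argument names are illustrative assumptions.

```python
def slack_variables(y, f, epsilon):
    """Return (upper, lower) slack for one observation y and prediction f.

    upper > 0 when the prediction undershoots y by more than epsilon;
    lower > 0 when it overshoots by more than epsilon.
    """
    upper = max(0.0, y - f - epsilon)
    lower = max(0.0, f - y - epsilon)
    return upper, lower


def svr_primal_objective(weights, targets, predictions, C, epsilon):
    """0.5 * ||w||^2 + C * (sum of all slack values)."""
    regulariser = 0.5 * sum(w * w for w in weights)
    penalty = sum(
        sum(slack_variables(y, f, epsilon)) for y, f in zip(targets, predictions)
    )
    return regulariser + C * penalty


if __name__ == "__main__":
    # The second prediction lies inside the tolerance band, so it adds no penalty.
    print(svr_primal_objective([1.0, 2.0], [1.0, 2.0], [0.5, 2.05], C=10.0, epsilon=0.1))
```

A larger C penalizes points outside the band more heavily, which tends to fit the training data more closely, whereas a wider band leaves more points with zero slack and typically yields fewer support vectors.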
In this context, $K(x_i, x_j)$ is the kernel function, which defines the inner product between $x_i$ and $x_j$ in a higher-dimensional space. The variables $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers that emerge from solving the optimization problem.
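Once the multipliers are known, a prediction is a kernel-weighted sum over the training points. The sketch below assumes the standard SVR decision function, namely the sum over support vectors of the difference between the two multipliers times the kernel value, plus a bias term. The multiplier values, the bias, and the use of a plain dot product as the kernel are hypothetical choices made only to keep the example checkable by hand.

```python
def linear_kernel(u, v):
    """Plain dot product, used here only as the simplest possible kernel."""
    return sum(ui * vi for ui, vi in zip(u, v))


def svr_predict(x, support_vectors, alphas, alphas_star, bias, kernel):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) * K(x_i, x) + b."""
    weighted = sum(
        (a - a_star) * kernel(sv, x)
        for sv, a, a_star in zip(support_vectors, alphas, alphas_star)
    )
    return weighted + bias


if __name__ == "__main__":
    # Hypothetical multipliers; in practice they come from solving the dual problem.
    vectors = [[1.0, 0.0], [0.0, 1.0]]
    print(svr_predict([1.0, 2.0], vectors, [0.5, 0.0], [0.0, 0.25], 0.1, linear_kernel))
```

Only training points with a non-zero multiplier difference contribute to the sum, which is what makes the solution sparse.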
Radial Basis Function (RBF) and polynomial kernel functions are used in this study because they have been shown to work well with the nonlinear and complex relationships found in hydrological time series such as the SPI. The RBF kernel captures intricate nonlinear relationships by mapping the input data into a higher-dimensional feature space. This is important for drought prediction, as drought patterns often involve complex interactions with climate and temporal factors. Moreover, the RBF kernel is flexible and makes minimal assumptions about the data distribution, which makes it a robust choice for different datasets. Its hyperparameter gamma controls the influence of individual training samples and thereby balances bias and variance (Cristianini & Shawe-Taylor 2000).
On the other hand, the polynomial kernel excels in modeling structured relationships through its ability to capture polynomial trends of varying complexity via its degree parameter. This flexibility is particularly beneficial for datasets with periodic or quasi-periodic trends, which are common in climatic time series. By adjusting the kernel's degree and coefficients, it can represent both linear and nonlinear relationships effectively. Previous studies, including those by Vapnik (1995), Cristianini & Shawe-Taylor (2000), and Smola & Schölkopf (2004) have highlighted the strengths of these kernels in applications involving nonlinear and structured data.
By leveraging the strengths of both kernels, this study ensures a comprehensive approach to modeling SPI data, where the RBF kernel addresses nonlinearity, and the polynomial kernel captures underlying structured relationships. This combination enhances the robustness and accuracy of the proposed drought forecasting framework across multiple timescales.
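The two kernels can be written in a few lines of Python. The sketch uses their common textbook forms, exp(-gamma * squared distance) for the RBF kernel and (gamma * dot product + coef0) raised to the degree for the polynomial kernel. The parameter names gamma, degree, and coef0 are conventional choices assumed here; the specific values used in the study are not stated in this section.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2): similarity decays with squared distance."""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(u, v, degree=2, gamma=1.0, coef0=1.0):
    """(gamma * <u, v> + coef0) ** degree."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return (gamma * dot + coef0) ** degree


if __name__ == "__main__":
    a, b = [0.2, -0.5, 1.1], [0.1, -0.4, 0.9]   # hypothetical lagged SPI inputs
    print(rbf_kernel(a, b, gamma=0.5), polynomial_kernel(a, b, degree=3))
```

A larger gamma makes the RBF kernel more local, so each training sample influences a smaller neighbourhood, while a higher polynomial degree allows more complex global trends at the cost of a higher risk of overfitting.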
Discrete wavelet transform



Proposed hybrid W-SVM model
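As summarized in the abstract, the hybrid model decomposes the series with db2 at three levels, forecasts each component (D1, D2, D3, A3) separately, and sums the component forecasts. The following Python sketch illustrates only this structure. It assumes periodic boundary handling, requires the series length to be divisible by 2 to the power of the number of levels, and uses a persistence rule as a hypothetical stand-in for the fitted SVM; none of these choices is claimed to match the implementation used in the study.

```python
import math

_S3, _S2 = math.sqrt(3.0), math.sqrt(2.0)
# db2 low-pass (scaling) filter and the matching high-pass (wavelet) filter.
LOW = [(1 + _S3) / (4 * _S2), (3 + _S3) / (4 * _S2),
       (3 - _S3) / (4 * _S2), (1 - _S3) / (4 * _S2)]
HIGH = [(-1) ** k * LOW[3 - k] for k in range(4)]


def analyze(x):
    """One DWT level with periodic extension: returns (approximation, detail)."""
    n = len(x)
    approx = [sum(LOW[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(HIGH[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def synthesize(approx, detail):
    """Inverse of analyze(): rebuilds a signal twice as long as the coefficients."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(4):
            x[(2 * i + k) % n] += LOW[k] * approx[i] + HIGH[k] * detail[i]
    return x


def wavelet_components(x, levels=3):
    """Split x into additive series [D1, ..., D_levels, A_levels], each of len(x)."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = analyze(approx)
        details.append(detail)

    def rebuild(coeffs, is_detail, level):
        # Invert the transform with every other coefficient band set to zero.
        zeros = [0.0] * len(coeffs)
        out = synthesize(zeros, coeffs) if is_detail else synthesize(coeffs, zeros)
        for _ in range(level - 1):
            out = synthesize(out, [0.0] * len(out))
        return out

    components = [rebuild(d, True, j + 1) for j, d in enumerate(details)]
    components.append(rebuild(approx, False, levels))
    return components


def persistence(component):
    """Hypothetical placeholder for a fitted SVM: repeat the last value."""
    return component[-1]


def hybrid_forecast(history, forecaster=persistence, levels=3):
    """Forecast each wavelet component separately and sum the forecasts."""
    return sum(forecaster(c) for c in wavelet_components(history, levels))


if __name__ == "__main__":
    series = [math.sin(0.3 * t) + 0.01 * t for t in range(48)]  # synthetic stand-in
    print(hybrid_forecast(series))
```

Because the transform is linear and orthogonal, the four components add back to the original series, which is what justifies summing the component forecasts. In practice, `persistence` would be replaced by one regression model per component, trained on lagged values of that component.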
Model evaluation metrics
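The three evaluation metrics can be computed as in the following Python sketch. RMSE and MAE follow their usual definitions. Because this section does not state whether R2 is computed as one minus the ratio of residual to total sum of squares or as the squared Pearson correlation, both variants are shown, and the choice between them is left as an assumption. Function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r2_determination(observed, predicted):
    """1 - SSE / SST, with SST taken around the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r2_correlation(observed, predicted):
    """Squared Pearson correlation between observations and predictions."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]       # hypothetical SPI values
    pred = [1.1, 1.9, 3.2, 3.8]
    print(rmse(obs, pred), mae(obs, pred), r2_determination(obs, pred))
```

Lower RMSE and MAE and an R2 closer to 1 indicate better agreement between observed and predicted SPI values.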


RESULTS AND DISCUSSIONS
SPI values for monthly average precipitation data obtained from the Kabul province (1970–2019).
The ARIMA model consists of three main stages: model identification, parameter estimation, and diagnostic checking. The best model is determined by the minimum Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), while the maximum likelihood approach is the most common method for model parameter estimation. The diagnostic checking process ensures that the residuals follow white noise and that the model is sufficient. For more information on how the ARIMA model can be used for drought forecasting, refer to (Rezaiy & Shabri 2023b).
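The identification step can be illustrated with the usual definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, L the maximized likelihood, and n the number of observations. In the Python sketch below, the candidate orders, parameter counts, and log-likelihood values are hypothetical placeholders, not results from this study.

```python
import math


def aic(num_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def bic(num_params, log_likelihood, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return num_params * math.log(n_obs) - 2 * log_likelihood


def select_order(candidates, n_obs, criterion="aic"):
    """Return the (p, d, q) key with the smallest criterion value.

    `candidates` maps an order tuple to (num_params, log_likelihood).
    """
    def score(item):
        k, loglik = item[1]
        return aic(k, loglik) if criterion == "aic" else bic(k, loglik, n_obs)

    return min(candidates.items(), key=score)[0]


if __name__ == "__main__":
    # Hypothetical fitted candidates: order -> (number of parameters, log-likelihood).
    fitted = {(1, 0, 0): (2, -310.0), (1, 0, 1): (3, -305.5), (2, 0, 2): (5, -304.9)}
    print(select_order(fitted, n_obs=480, criterion="aic"))
    print(select_order(fitted, n_obs=480, criterion="bic"))
```

BIC penalizes additional parameters more strongly than AIC once n exceeds about eight observations, so the two criteria can disagree when a larger model fits only marginally better.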
Table 2. Accuracy comparison of W-SVM, SVM, and ARIMA models for drought prediction using statistical evaluation metrics
Name | Model | Training RMSE | Training MAE | Training R2 | Testing RMSE | Testing MAE | Testing R2 |
---|---|---|---|---|---|---|---|
SPI-3 | ARIMA | 0.7100 | 0.5534 | 0.4922 | 0.6380 | 0.4734 | 0.5732 |
SPI-3 | SVM | 0.6883 | 0.5024 | 0.5207 | 0.6550 | 0.5051 | 0.5476 |
SPI-3 | W-SVM | 0.3002 | 0.2102 | 0.9124 | 0.3857 | 0.2567 | 0.8383 |
SPI-6 | ARIMA | 0.4895 | 0.3779 | 0.7445 | 0.4399 | 0.3256 | 0.7942 |
SPI-6 | SVM | 0.5126 | 0.3863 | 0.7130 | 0.5477 | 0.4052 | 0.6991 |
SPI-6 | W-SVM | 0.2338 | 0.1584 | 0.9406 | 0.2153 | 0.1574 | 0.9529 |
SPI-9 | ARIMA | 0.3789 | 0.2866 | 0.8402 | 0.3550 | 0.2561 | 0.8674 |
SPI-9 | SVM | 0.4107 | 0.2974 | 0.8095 | 0.4277 | 0.3185 | 0.8385 |
SPI-9 | W-SVM | 0.1898 | 0.1344 | 0.9594 | 0.1924 | 0.1401 | 0.9629 |
SPI-12 | ARIMA | 0.2743 | 0.2010 | 0.9191 | 0.2468 | 0.1846 | 0.9411 |
SPI-12 | SVM | 0.3368 | 0.2403 | 0.8739 | 0.3603 | 0.2380 | 0.8759 |
SPI-12 | W-SVM | 0.1651 | 0.1127 | 0.9696 | 0.1949 | 0.1428 | 0.9640 |
From Table 2, it can be clearly observed that the proposed W-SVM model outperformed both the ARIMA and SVM models for all SPI values in terms of lower RMSE and MAE, and higher R2 values. A general overview of the results in the table shows that all three models – ARIMA, SVM, and W-SVM – improved prediction accuracy from short-term drought (SPI-3) to medium (SPI-6) and long-term (SPI-9, -12) drought scales. However, W-SVM outperformed the other benchmarks in short, mid, and long-term drought prediction.
Looking more closely at the results in the table, for example, in SPI-3 during the testing period, the ARIMA model had an RMSE of 0.6380, MAE of 0.4734, and R2 of 0.5732. The SVM model's RMSE, MAE, and R2 were 0.6550, 0.5051, and 0.5476, respectively. The evaluation performance criteria for the proposed W-SVM were RMSE: 0.3857, MAE: 0.2567, and R2: 0.8383. These numbers indicate that the proposed model improved drought prediction accuracy compared to ARIMA by 39.55% and compared to SVM by 41.11%, based on RMSE. The improvement for MAE compared to ARIMA and SVM is 45.76 and 49.18%, respectively. The W-SVM improved R2 by 38.78 and 34.68% compared to the traditional ARIMA and SVM models.
These promising results continued for the mid- and longer-period SPI. For example, in the testing period of SPI-12, the W-SVM reduced RMSE by 21.03% and MAE by 22.64% and increased R2 by 2.38% compared to ARIMA. For the same period, the SVM model has an RMSE of 0.3603, MAE of 0.2380, and R2 of 0.8759, while the W-SVM has an RMSE of 0.1949, MAE of 0.1428, and R2 of 0.9640. Relative to SVM, the W-SVM therefore improved RMSE by 45.91%, MAE by 40%, and R2 by 9.14%.
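The improvement percentages can be checked against the tabulated testing metrics with a short calculation. The formulas below are inferred from the reported numbers rather than stated in the text: error reductions are taken relative to the benchmark value, and R2 gains relative to the W-SVM value. With the rounded table entries they reproduce the SPI-12 percentages and the SPI-3 RMSE percentages; some other reported values (for example, the SPI-3 R2 gain over ARIMA) are not reproduced exactly by the rounded entries under these assumed formulas.

```python
def error_reduction(benchmark, proposed):
    """Percentage reduction in an error metric (RMSE or MAE) relative to the benchmark."""
    return 100.0 * (benchmark - proposed) / benchmark


def r2_gain(benchmark, proposed):
    """Percentage gain in R2, expressed relative to the proposed model's R2 (assumed)."""
    return 100.0 * (proposed - benchmark) / proposed


if __name__ == "__main__":
    # Testing-period values for SPI-12 taken from the accuracy comparison table.
    print(round(error_reduction(0.2468, 0.1949), 2))  # RMSE, W-SVM vs ARIMA
    print(round(error_reduction(0.3603, 0.1949), 2))  # RMSE, W-SVM vs SVM
    print(round(r2_gain(0.9411, 0.9640), 2))          # R2, W-SVM vs ARIMA
    print(round(r2_gain(0.8759, 0.9640), 2))          # R2, W-SVM vs SVM
```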
Comparison of W-SVM, SVM, and ARIMA models with observed values across different SPI levels.
Limitations and future work
While the proposed W-SVM model demonstrates significant improvements in drought forecasting accuracy, it is not without limitations. A key challenge lies in the reliance on the SPI, which, as highlighted by Wu et al. (2007), can introduce inaccuracies in arid and semi-arid regions due to its dependence on a gamma distribution. Mahmoudi et al. (2022) suggested adjustments to SPI for better representation of drought conditions in these regions, while Isia et al. (2022) emphasized its limitations in capturing drought severity, particularly under irregular rainfall patterns. These studies underscore the need for careful consideration of SPI's assumptions when applied to such climates.
The integration of wavelet transform with SVM in this study aimed to address some of SPI's shortcomings by decomposing multi-temporal precipitation patterns. This approach enhances SPI's utility in arid environments, but its performance in temperate regions, where SPI is inherently more robust, has yet to be explored. Extending this model to temperate zones may offer additional insights into its adaptability and reliability across diverse climatic conditions.
Another limitation of the W-SVM model lies in its validation within a single semi-arid case study region. While the results indicate the model's potential for generalization, further research is required to confirm its applicability in other semi-arid and arid regions with varying climatic and hydrological characteristics. Testing the model in regions such as the Middle East, North Africa, and Central Asia would help establish its robustness. Similarly, adapting the model for use with other drought indices, such as the PDSI or RDI, could further broaden its applicability.
The computational complexity of wavelet decomposition and SVM modeling may limit its applicability to real-world problems. Future research should focus on developing computationally efficient variants of the model or exploring hybrid approaches that integrate W-SVM with advanced deep learning techniques.
Therefore, the following areas are proposed for future research:
(i) Refining SPI or integrating alternative drought indices to address limitations in arid and semi-arid regions.
(ii) Testing the model's generalizability across diverse climatic zones, including temperate regions.
(iii) Developing computationally efficient methodologies to ensure broader accessibility and usability.
(iv) Exploring hybrid models that integrate DWT with deep learning methods such as LSTM and CNNs to further enhance prediction accuracy.
(v) Applying different wavelet mother functions and comparing the prediction performance of other preprocessing techniques like EMD and EEMD with DWT in creating combined models for drought forecasting.
CONCLUSION
Time series drought prediction is vital for better water supply management and precautionary actions against the potential negative effects of drought. Drought, which can be categorized through drought indices like SPI, may exhibit complex patterns such as nonstationarity and nonlinearity, making accurate predictions more complex. This is because statistical and machine learning methods have limitations in capturing these non-stationary and nonlinear characteristics. This study aimed to develop a new approach for drought prediction that combines DWT with SVM (W-SVM) using 50 years of historical data from Kabul province, Afghanistan, which has been considered a drought-prone area in recent decades.
The findings of the study revealed that the proposed model outperformed the benchmark models of SVM and ARIMA across short, mid, and long-term droughts. In SPI-3, the W-SVM improved drought prediction accuracy by about 39.55 and 41.11% compared to ARIMA and SVM according to RMSE, and by 45.76 and 49.18% according to MAE, respectively. The R2 for the W-SVM showed an improvement of 38.78 and 34.68% compared to the benchmark ARIMA and SVM models. This promising performance is also valid for SPI-6, SPI-9, and SPI-12. For instance, the W-SVM improved drought prediction accuracy by 21.03% (RMSE), 22.64% (MAE), and 2.38% (R2) compared to the ARIMA model during the testing period. Meanwhile, these statistics against the SVM models show increased accuracy of 45.91% (RMSE), 40% (MAE), and 9.14% (R2).
Another interesting result is that all three models (ARIMA, SVM, and W-SVM) demonstrated greater capability in capturing longer drought prediction patterns. As we move from SPI-3 (short-term) to SPI-6 (mid-term) and SPI-9 and 12 (long-term) droughts, the RMSE and MAE of the models decrease, while the R2 increases. This indicates that W-SVM, as well as traditional ARIMA and SVM, have a better ability to capture longer drought patterns, with the proposed model surpassing the other two benchmarks across all SPI values.
ACKNOWLEDGEMENTS
The first author would like to express gratitude to the Kabul Education University (KEU) for the study leave and Afghanistan's Ministry of Higher Education for the scholarship.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.