Abstract
Kazakhstan has recently been experiencing increasing drought trends. However, low-capacity probabilistic drought forecasts and poor dissemination led to a drought crisis in 2021 that resulted in the loss of thousands of livestock. To improve drought forecasting accuracy, this study applies machine learning and deep learning (ML and DL) algorithms to capture the sequences of drought events within a non-contiguous drought analysis (NCDA). Precipitation, 2-m temperature, runoff, solar radiation, relative humidity, and evaporation were collected from the ERA5 database as input variables. Combinations of these inputs were used to build seven classifiers (Logistic, K-NN, Kernel SVM, Decision Tree, Random Forest, XGBoost, and GRU). The output events were defined as binary classes using the standardized precipitation index (SPI) and the standardized precipitation evapotranspiration index (SPEI). Weekly time series from 1991 to 2021 for each cell were used to forecast drought at lead times from 1 week to 6 months. GRU provided 97–99% accuracy in more volatile regions, while Random Forest and XGBoost showed 94–99% accuracy at a lead time of 6 months. Accuracy was evaluated using the confusion matrix and the F1 score, including how well changes between drought and no-drought states are captured. This study demonstrates the effectiveness of using ML and DL algorithms for drought forecasting, with potential applications to other regions.
HIGHLIGHTS
Advanced Forecasting: ML and DL algorithms were implemented within a non-contiguous drought analysis framework.
Data Diversity: ERA5 data on precipitation, temperature, and other variables are used for model construction.
High Accuracy: GRU achieves 97-99% accuracy, and Random Forest/XGBoost show 94-99% accuracy at a 6-month lead time.
Global Relevance: The study highlights the effectiveness of ML/DL in drought forecasting, applicable to similar regions.
INTRODUCTION
Droughts are an emerging environmental disaster that draws increasing attention from experts in various fields every year. Droughts occur in all climatic zones, in both high- and low-rainfall areas. They are aggravated by high temperatures, low relative humidity, and the timing and characteristics of precipitation, including the distribution of wet days during agricultural growing seasons, the intensity and duration of rainfall, and its onset and termination (Mubenga-Tshitaka et al. 2021). Unlike aridity, which is a permanent feature of the climate limited to low-rainfall areas, drought is a temporary phenomenon. Drought can strike anywhere on the planet, with a devastating impact on water supplies and economic activities. According to FAO (2017), the percentage of land affected by drought has doubled over the last 40 years. In developing countries, 80% of losses are associated with droughts, since agriculture is the most affected sector. Although drought impacts are most severe in the developing world, developed countries face the same risk. For example, the United States experienced its harshest flash droughts in 2007, 2012, and 2017 (USDM 2021). The World Meteorological Organization (WMO) emphasizes the importance of developing and implementing national policies based on the best definition and characterization of drought to improve drought impact mitigation. Meteorological, ecological, agricultural, hydrological, and socioeconomic droughts each arise from a distinct combination of environmental and economic factors, which makes it difficult to construct a single holistic definition of drought. The recent emergence of flash droughts has added to this complexity. Flash droughts are defined by their sudden onset, which is usually caused by abnormally high temperatures, high evapotranspiration, little precipitation, and low soil moisture. Current monitoring mostly accounts for a lack of water (Otkin et al. 2018).
To prepare for drought, it is essential to know its spatial distribution, which requires a coherent forecasting procedure and dependable models. Recent technologies such as machine learning (ML) have been studied as a way to provide reliable long-term forecasts months in advance. ML models, also called data-driven models, are significantly less complex than physically based models, use far fewer computational resources, and can reach greater accuracy. A benefit of ML is its capacity to uncover subtle or hidden patterns in complex geographical data with limited prior understanding of how the underlying factors interact (Brust et al. 2021).
ML has significantly advanced drought research, enhancing our ability to predict, monitor, and mitigate drought conditions. Applications include early warning systems for drought prediction; remote sensing and satellite data for real-time monitoring; crop yield prediction to support farming decisions; optimization of water resource management during droughts; forecasting droughts from meteorological and environmental data; assessment of drought-related disaster risks; integration of diverse data sources for a comprehensive understanding of drought conditions; climate change adaptation strategies; decision support systems for policy and response guidance; and public awareness tools to educate and engage the public. These developments rely on high-quality data and ongoing model refinement, making ML crucial for addressing drought challenges (Prodhan et al. 2022; Ghobadi & Kang 2023). However, the accuracy of the results varies and depends on the quality and quantity of the data, the ability of modelers to incorporate all important factors of the problem in question, and the specifics (physics) of the modeled processes. An emerging trend is 'explainable AI' (see Molnar 2020), which aims to build ML models that are not black boxes but can be interpreted by domain experts; M5 model trees (equivalent to piecewise linear models), for example, belong to this class. A related trend, 'physics-aware AI', aims to combine process-based models (which encapsulate the process physics) with data-driven (ML) models (e.g., Jiang, Zheng & Solomatine 2020). In this paper, a simpler and more commonly used approach to incorporating physical knowledge is employed: choosing the most relevant set of variables (features) to use as ML model inputs, either by optimizing the resulting model performance, by incorporating expert knowledge, or both. This approach is typically referred to as 'feature engineering' or 'input variable selection' (see e.g., Moreido et al. 2021).
Various ML techniques have been used to model hydrometeorological processes. Recently, much attention has been given to so-called deep learning, in particular multi-layered recurrent neural networks such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs, a simplified variant of LSTM), which have also demonstrated high accuracy in drought forecasting (Brust et al. 2021; Dikshit & Pradhan 2021).
The goal of this study is therefore to develop and examine a methodology for forecasting meteorological drought events and their spatial extent using the standardized precipitation index (SPI) and the standardized precipitation evapotranspiration index (SPEI), i.e., classifying subregions as being in a state of drought or not. Kazakhstan is used as a case study. Although most of Kazakhstan is already arid or semi-arid, seasonal droughts create even more hostile conditions for agriculture and everyday human activities. The methodology is based on input variable selection and a comparative analysis of several ML models, in which forecasting uncertainty is also analyzed. Finally, conclusions are drawn and further recommendations are made.
STUDY AREA AND DATA
Case study
The use of water resources in Kazakhstan involves a complex set of interconnected social, political, and economic challenges. Inadequate management hinders the effective use of these resources. As the population in the region continues to grow, water scarcity in Central Asia becomes an increasingly pressing issue. Given the scarcity of water resources, water security must be viewed as an integral component of national security in the Republic of Kazakhstan. The WMO has defined four levels of stress linked to water scarcity. In Kazakhstan, the highest levels of stress are observed in five of the eight water economic basins (WEBs), with the Shu-Talas and Nura-Sarysu WEBs registering indices of 0.98 and 1, indicating full utilization of river runoff. Anticipated reductions in river runoff in Kazakhstan may lead to significant shifts in both the quantities and patterns of water consumption (Tursunova et al. 2022).
In a land-locked country like Kazakhstan, evidence of climate change such as warming oceans, shrinking ice sheets, sea-level rise, and ocean acidification is barely discernible, yet rising temperatures and the increased likelihood of extreme events (floods and droughts) are of great concern. The IPCC RCP8.5 scenario projects extreme changes in temperature and precipitation by the end of the century. While temperature is rising across the whole territory, the precipitation pattern differs by region: precipitation is increasing in the North and East and decreasing in the South and West. West Kazakhstan has experienced severe drought conditions for the last 3 years, with the most severe drought in its history in summer 2021 (Pannett 2021). Kazakhstan is forecast to be affected by all types of drought (meteorological, hydrological, and agricultural). Currently, the annual mean probability of a meteorological drought does not exceed 5%. Nevertheless, even under the most favorable scenario (RCP2.6), the probability of severe drought increases to up to 40%. The greatest impact is expected in the West and South (Mangystau and Kyzylorda), where the probability exceeds 80% by the end of the century (ADB 2021). The Mangystau region is already experiencing drought conditions and drastic livestock losses.
Early drought warning is the responsibility of the National Hydro-Meteorological Service (NMHS). The NMHS develops forecasts and predictions of extreme weather events and risks based on its network of hydrometeorological stations, for prompt communication to all key stakeholders, including state authorities, economic sectors, and the public. To provide improved operational monitoring and drought event forecasting, the present system needs to advance its technological capabilities. Given the rising risks associated with the impacts of climate variability and climate change in the region, responsible authorities should consider tools such as ML forecasting to provide accurate and timely information for knowledge-driven, drought-related decisions in the country's various social and economic sectors (Dubovyk et al. 2019).
Data
Data for input variables
One of the most common hydrometeorological datasets used in research is ERA5, from the European Centre for Medium-Range Weather Forecasts (ECMWF). Initially, we aimed to consider the hydrometeorological variables typically used in drought forecasting, e.g., the 13 variables used by Brust et al. (2021): precipitation, surface soil moisture, vapor pressure deficit, root-zone soil moisture, wind speed, minimum temperature, solar radiation, maximum relative humidity, maximum temperature, minimum relative humidity, evapotranspiration, gross primary product, and runoff. However, not all of these variables were available in the database for the chosen period (1991–2021). Soil moisture is an essential indicator of the water content available in the soil and was a desired input parameter. Unfortunately, none of the available ECMWF datasets could provide soil moisture values for the whole area over the chosen historical period (1991–2021), and we were not able to acquire additional datasets during this study. The six variables finally used as input parameters are listed in Table 1. The monthly averaged data were retrieved from the Muñoz Sabater (2019) databases. Determining whether these parameters are sufficient for accurate forecasting is part of the research. As an important reference, Liu et al. (2021) also used a limited set of 5 variables for similar research (precipitation, air temperature, relative humidity, sunshine duration, and wind) and obtained successful results.
Index | Parameter | Type | Units |
---|---|---|---|
1 | Precipitation | Meteorological | m |
2 | 2 m temperature | Meteorological | K |
3 | Runoff | Hydrological | m |
4 | Solar radiation | Meteorological | J m⁻² |
5 | Relative humidity | Meteorological | % |
6 | Evaporation | Meteorological | m of water equivalent |
The aim of this research is to demonstrate that precise forecasting can be achieved using readily accessible data, such as ERA5, even in regions where a diverse range of data is typically unavailable. Consequently, data from weather stations in Kazakhstan was neither utilized as a data source nor employed for verification purposes.
Data for output variables
Standardized precipitation index
The SPI was calculated from the precipitation data in Python by defining an spi function that fits a gamma distribution to the precipitation series and transforms its cumulative distribution function (CDF) into the standard normal distribution through the inverse normal CDF. Since the precipitation data were retrieved from the ECMWF database, the SPI could be obtained for all pixels (0.5° × 0.5°, 1,480 pixels in total following the shape of Kazakhstan) at a monthly interval from January 1991 to December 2021. The average monthly data were converted to weekly data by assuming that 1 month = 4 weeks, using the formulae below.
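The gamma-based SPI transform can be sketched as below. This is a minimal illustration, not the authors' code: it assumes SciPy is available, uses a hypothetical helper name `spi_from_precip`, applies the common convention of fitting each calendar month separately, and treats zero-precipitation values as a point mass. The input series is synthetic.

```python
import numpy as np
from scipy import stats


def spi_from_precip(precip):
    """SPI for one calendar month's series of precipitation totals.

    Fits a two-parameter gamma distribution (location fixed at 0) to the
    non-zero totals, treats zeros as a point mass with probability q, and maps
    the resulting cumulative probability to a standard-normal quantile.
    """
    precip = np.asarray(precip, dtype=float)
    nonzero = precip[precip > 0]
    q = 1.0 - nonzero.size / precip.size               # share of zero-precipitation values
    shape, loc, scale = stats.gamma.fit(nonzero, floc=0)
    cdf = q + (1.0 - q) * stats.gamma.cdf(precip, shape, loc=loc, scale=scale)
    cdf = np.clip(cdf, 1e-6, 1.0 - 1e-6)               # avoid infinite quantiles
    return stats.norm.ppf(cdf)                          # inverse standard-normal CDF


rng = np.random.default_rng(0)
# Synthetic July totals in metres (ERA5 units) for 31 years, 1991-2021.
july_precip = rng.gamma(shape=2.0, scale=0.01, size=31)
spi = spi_from_precip(july_precip)
drought = spi <= -1.0        # binary drought class with the -1 threshold
print(spi.round(2))
```

Because the transform is monotone, wetter months always receive higher SPI values; only the fitted gamma parameters change from cell to cell.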
Standardized Precipitation Evapotranspiration Index
The SPEI-1 indices were collected from the Global SPEI database (Beguería et al. 2014), keeping only the pixels within Kazakhstan (as for SPI, 0.5° × 0.5°, 1,480 pixels in total). Since the database contains time series only up to December 2018, and it was decided not to supplement it with data from other datasets, the historical period for the SPEI-based model analysis was shortened to January 1991–December 2018.
METHODOLOGY
This study mainly follows the spatiotemporal drought analysis methodology of non-contiguous drought analysis (NCDA), developed by Corzo Perez et al. (2011). NCDA is used to identify hydrological drought on a large scale. Although the reference study analyzes hydrological droughts, the same methodology can be applied to meteorological droughts, since all droughts originate from a lack of precipitation. Here, NCDA is applied to Kazakhstan as a whole. First, the drought index was calculated, with the threshold of the drought anomaly chosen based on a literature review. Most studies follow McKee, Doesken & Kleist (1993) in choosing −1 as the threshold. Accordingly, for time scale i, a drought event is defined as a period during which the SPI is continuously negative and reaches a value of −1 or below. The drought begins when the SPI first drops below zero and ends when the SPI returns to a positive value after having reached −1 or less. Following Liu et al. (2021), −1 was therefore chosen as the threshold for both SPI and SPEI.
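The event definition above (a run of negative index values that counts as a drought only if it reaches −1 or below) can be expressed as a short run-detection routine. The function name `drought_events` and the toy series are illustrative assumptions, not part of the original study.

```python
from typing import List, Sequence, Tuple


def drought_events(index: Sequence[float], threshold: float = -1.0) -> List[Tuple[int, int]]:
    """Return (start, end) positions (inclusive) of drought events.

    A run of consecutive negative values counts as an event only if it
    reaches `threshold` or below at least once.
    """
    events = []
    start = None          # position where the current negative run began
    reached = False       # whether the current run has hit the threshold
    for i, value in enumerate(index):
        if value < 0:
            if start is None:
                start, reached = i, False
            reached = reached or value <= threshold
        else:
            if start is not None and reached:
                events.append((start, i - 1))
            start = None
    if start is not None and reached:    # series ends while still in drought
        events.append((start, len(index) - 1))
    return events


spi = [0.3, -0.2, -0.8, -1.4, -0.5, 0.1, -0.4, -0.6, 0.2, -1.1, -1.3]
print(drought_events(spi))   # [(1, 4), (9, 10)]
```

The negative run at positions 6–7 is not reported, because it never reaches the −1 threshold.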
ML model setup
Preventing overtraining (overfitting) is a critical aspect of ML, and various measures can be taken to ensure that a model generalizes well to new, unseen data. One common approach is to introduce a cross-validation set. We did not do that, but took several other measures to prevent overfitting, mainly ensuring that the model is not more complex than needed (Occam's razor principle). First, at various stages we employed the 'leave-one-out' method to check the validity of the model on unseen data. Second, we conducted feature selection to eliminate irrelevant features and hence build simpler models. Third, we experimented with models of different structures (complexities) to determine the most suitable level. Fourth, we applied data augmentation to increase the dataset's size and introduce variations in the data. Fifth, we fine-tuned the models' hyperparameters to find the best settings for each ML model. These strategies are discussed in more detail in the following sections.
Creation of input combinations
Creation of output dataset
*tp = precipitation, t2m = 2 m temperature, ro = runoff, ssr = solar radiation, r = relative humidity, e = evaporation, and k is a lead time and varies from 1 week to 6 months.
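A small enumeration illustrates how input sets that always contain precipitation can be generated from the six variables abbreviated above. This sketch considers only the six variables without time lags (an assumption; lagged inputs would enlarge the search space), and the variable names in the code are illustrative.

```python
from itertools import combinations

# Abbreviations as in the footnote above.
PRECIP = "tp"
OTHERS = ["t2m", "ro", "ssr", "r", "e"]

# Every input set contains precipitation plus 1 to 5 of the other variables,
# giving between 2 and 6 inputs per model.
input_sets = [
    (PRECIP,) + extra
    for n_extra in range(1, len(OTHERS) + 1)
    for extra in combinations(OTHERS, n_extra)
]

print(len(input_sets))   # 31
print(input_sets[0])     # ('tp', 't2m')
print(input_sets[-1])    # ('tp', 't2m', 'ro', 'ssr', 'r', 'e')
```

Each input set would then be paired with a lead time k when building one training run, so the number of models grows as 31 times the number of lead times.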
The spatial analysis is based on the occurrence of an event. The analysis was performed for two learning problem types: classification and regression.
Classification
Regression
The regression analysis differs from the classification in that the mask was not applied: the outputs were the SPI or SPEI values themselves, ranging from −3 to +3. Regression was evaluated only with the six shallow ML models, excluding GRU.
Data preprocessing
Input variables for both the training and test sets were scaled with StandardScaler from sklearn.preprocessing, since the ranges of the variables differed significantly (from negative values for evaporation to 10⁶ for solar radiation).
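A minimal sketch of this scaling step with scikit-learn's `StandardScaler`. The text does not say whether the scaling statistics were estimated on the training set alone; the sketch assumes so, which is the usual way to avoid leaking test information. The feature values and the helper `synthetic_features` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)


def synthetic_features(n):
    """Hypothetical inputs: evaporation (small, negative, m) and solar radiation (~1e6 J m-2)."""
    evaporation = rng.normal(-0.002, 0.001, n)
    solar = rng.normal(2.0e6, 5.0e5, n)
    return np.column_stack([evaporation, solar])


X_train, X_test = synthetic_features(200), synthetic_features(50)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)   # mean and std estimated on training data only
X_test_s = scaler.transform(X_test)         # same mean and std applied to the test data
print(scaler.mean_, scaler.scale_)
```

After scaling, both columns have zero mean and unit variance on the training set, so distance-based models such as K-NN and SVM are no longer dominated by solar radiation.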
Upsampling to balance the dataset
Hyperparameter tuning
The hyperparameters of all classifiers except Logistic were tuned to achieve the highest possible accuracy. RandomizedSearchCV from sklearn.model_selection was used to search for the most suitable hyperparameters; rather than exhaustively evaluating every combination, it evaluates a fixed number of randomly sampled parameter settings.
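The randomized search can be sketched as follows for a random forest, using parameter names that appear in Table 4. The search ranges, `n_iter`, `cv`, and the synthetic imbalanced dataset are assumptions chosen for illustration, not the settings used in the study.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic, imbalanced binary data standing in for drought / no-drought weeks.
X, y = make_classification(n_samples=400, n_features=6, weights=[0.8, 0.2], random_state=0)

param_distributions = {
    "n_estimators": randint(20, 150),
    "min_samples_split": randint(2, 6),
    "min_samples_leaf": randint(1, 4),
    "max_features": ["sqrt", "log2"],
    "max_depth": [10, 30, 50, None],
    "bootstrap": [True, False],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,          # only 10 sampled settings are evaluated, not the full grid
    scoring="accuracy",
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Note that RandomizedSearchCV scores each candidate with k-fold cross-validation on the data passed to `fit`, so that data should be the training set only.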
ML algorithms
In this paper, we apply several ML techniques: one deep learning model (the gated recurrent unit, GRU, a type of recurrent neural network), and several others that may be called 'shallow' ML models.
Logistic classification or regression
Logistic regression extends the linear regression model to classification problems. Although a linear regression model can be effective for some regression problems, it is too simple to perform well for classification (Molnar 2020). Good results were therefore not expected from logistic classification, but it was of research interest to observe how such a classifier behaves given the complex relationship between six input variables and a binary outcome.
K-Nearest Neighbor
The K-NN approach classifies an unseen sample according to the majority class among its k nearest neighbors. The neighbors' votes can be weighted by their inverse distance to the unseen instance (or by a kernel function of that distance) (Mucherino, Papajorgji & Pardalos 2009).
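An inverse-distance-weighted vote can be written in a few lines of NumPy. The helper `knn_predict` and the toy points are hypothetical; scikit-learn offers the same behavior through `KNeighborsClassifier(weights="distance")`.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=3, eps=1e-12):
    """Classify one sample by an inverse-distance-weighted vote of its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dist = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] + eps)      # closer neighbours get larger votes
    scores = {c: weights[y_train[nearest] == c].sum() for c in np.unique(y_train[nearest])}
    return max(scores, key=scores.get)


X_train = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]
y_train = [1, 1, 0, 0, 0]   # 1 = drought, 0 = no drought (toy labels)
print(knn_predict(X_train, y_train, [0.2, 0.1], k=3))   # 1
print(knn_predict(X_train, y_train, [0.2, 0.1], k=5))   # 1
```

With k = 5, a plain majority vote would pick class 0 (three of the five neighbours), but the two nearby drought samples carry much larger weights, so the weighted vote still returns class 1.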
Kernel Support Vector Machine
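The SVM model, based on ideas developed by Vladimir Vapnik in the 1970s (see Vapnik 1999), constructs a separating surface in an N-dimensional space with the largest possible margin between samples belonging to different classes, which ensures effective generalization ability even without a cross-validation set (hence the name 'large-margin classifier'). In the kernel variant, a kernel function implicitly maps the inputs into a higher-dimensional feature space, so that a linear separation there corresponds to a nonlinear decision boundary in the original input space.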
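The benefit of the kernel can be seen on a toy problem where no straight line separates the classes. This sketch uses scikit-learn's `SVC` on synthetic concentric rings; the dataset and parameter values are illustrative assumptions, unrelated to the drought data.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)   # Gaussian (RBF) kernel

print("linear:", linear.score(X_te, y_te))
print("rbf:   ", rbf.score(X_te, y_te))
```

The linear SVM stays near chance level, while the RBF kernel separates the rings almost perfectly.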
Decision tree
A decision tree is one of the oldest classification models; it uses a series of tests stated at each branch (or node) of the tree to recursively segment a dataset into smaller subdivisions. The tree consists of a root node (containing all the data), a collection of internal nodes (splits), and a set of terminal nodes (leaves). In this framework, the dataset is classified by systematically subdividing it according to a certain criterion (typically, minimizing the entropy of each resulting subset), and each observation is assigned the class label of the leaf node it falls into (Friedl & Brodley 1997).
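The entropy criterion can be illustrated by scoring candidate thresholds on a single variable. The functions and the toy precipitation/drought values below are hypothetical; a real tree repeats this search over all variables at every node.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y, threshold):
    """Entropy reduction from splitting samples on x <= threshold."""
    x, y = np.asarray(x), np.asarray(y)
    left, right = y[x <= threshold], y[x > threshold]
    weighted = (left.size * entropy(left) + right.size * entropy(right)) / y.size
    return entropy(y) - weighted


# Toy example: weekly precipitation (mm) versus drought label (1 = drought).
precip = np.array([2, 3, 5, 8, 20, 25, 30, 40])
drought = np.array([1, 1, 1, 0, 0, 0, 0, 0])

# Candidate thresholds are midpoints between consecutive values; the tree
# keeps the one with the largest information gain.
candidates = (precip[:-1] + precip[1:]) / 2
best = max(candidates, key=lambda t: information_gain(precip, drought, t))
print(best, information_gain(precip, drought, best))
```

The threshold 6.5 produces two pure subsets, so its gain equals the full entropy of the labels (about 0.954 bits).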
Random forest
Random forest, proposed by Breiman & Cutler (2001), is a set (ensemble) of classification trees (typically, Breiman's regression trees), i.e., basic models that predict outcomes using binary splits on predictor variables. In a random forest, many classification trees are built, each from a randomly selected training sample and random subsets of the predictor variables. As a result, random forest frequently gives higher accuracy than a single decision tree while retaining some of the tree model's advantages. One of the key advantages of random forests across a wide variety of applications is their capacity to handle datasets with many predictor variables. For variable selection methods for random forests, see, e.g., Speiser et al. (2019).
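The two sources of randomness (bootstrap samples of rows and random subsets of predictors at each split) map directly onto scikit-learn parameters. The sketch below also prints impurity-based feature importances, one common way of ranking inputs; the synthetic data, in which only the first predictor drives the label, are an assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors; only the first one (a precipitation-like
# variable) actually drives the synthetic drought label.
X = rng.normal(size=(n, 4))
y = (X[:, 0] < -0.5).astype(int)

forest = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,        # each tree sees a bootstrap sample of the rows
    max_features="sqrt",   # each split considers a random subset of predictors
    random_state=0,
).fit(X, y)

print(forest.feature_importances_.round(3))
```

The first predictor receives by far the largest importance, while the noise predictors share the small remainder.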
XGBoost
Boosting also produces an ensemble of decision or regression trees, but it is sequential: each subsequent tree depends on the outcome of the previous ones. Boosting trains weak learners on a weighted subset of the original dataset. Weak learners have little predictive ability and perform only marginally better than random guessing. Samples that were previously misclassified are given more weight and hence a higher probability of being selected for the subsequent learner. As a result, the ensemble has good generalization ability. The two widely used versions of boosting are adaptive boosting, AdaBoost (see e.g., Shrestha & Solomatine 2006), and gradient boosting (Friedman 2001). A popular implementation of the latter is XGBoost (extreme gradient boosting), a C++ library with APIs for several languages (XGBoost 2023), which was used in this study.
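The reweighting of misclassified samples described above is the AdaBoost mechanism; XGBoost itself uses gradient boosting, where each new tree fits gradients of the loss rather than reweighted samples. The sketch below shows one AdaBoost round with labels coded as −1/+1. The function name is hypothetical, and it assumes a weighted error strictly between 0 and 0.5.

```python
import numpy as np


def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round: return the learner's vote alpha and updated sample weights.

    Labels are coded as -1/+1; weights are assumed to sum to 1 and the
    weighted error is assumed to lie strictly between 0 and 0.5.
    """
    weights = np.asarray(weights, dtype=float)
    miss = y_true != y_pred
    err = weights[miss].sum()                            # weighted error of the weak learner
    alpha = 0.5 * np.log((1 - err) / err)                # learner's say in the final vote
    new_w = weights * np.exp(-alpha * y_true * y_pred)   # up-weight mistakes, down-weight hits
    return alpha, new_w / new_w.sum()


y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])    # the weak learner misses sample 1
w0 = np.full(5, 0.2)
alpha, w1 = adaboost_reweight(w0, y_true, y_pred)
print(round(alpha, 3), w1.round(3))       # 0.693 [0.125 0.5 0.125 0.125 0.125]
```

After one round, the single misclassified sample carries half of the total weight, so the next weak learner concentrates on it.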
Gated Recurrent Unit
Cho et al. (2014) proposed the GRU, a deep learning model comparable to LSTM but easier to compute and apply. A typical GRU cell consists of a reset gate r and an update gate z. The update gate determines how much of the previously stored information is carried over into the new state, whereas the reset gate determines how the new input is combined with the previously stored information. The hidden state at time t is calculated from the hidden state at time t−1 and the input time series value at time t. Details can be found in Cho et al. (2014) and Lynn et al. (2019).
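A forward pass of a single GRU cell makes the role of the two gates concrete. This NumPy sketch follows the formulation of Cho et al. (2014), h_t = z ⊙ h_{t−1} + (1 − z) ⊙ h̃_t (some libraries swap the roles of z and 1 − z). The weights are random and untrained, and the sizes are illustrative assumptions; in practice a deep learning framework would train these weights.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU step in the form of Cho et al. (2014).

    p holds input weights W_*, recurrent weights U_* and biases b_* for the
    update gate (z), reset gate (r) and candidate state (h).
    """
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])        # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])        # reset gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    return z * h_prev + (1.0 - z) * h_cand                             # new hidden state


rng = np.random.default_rng(0)
n_in, n_hidden = 6, 4        # e.g. six input variables, four hidden units (illustrative)
params = {}
for g in ("z", "r", "h"):
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
    params[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{g}"] = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(10, n_in)):   # a 10-step synthetic input sequence
    h = gru_step(x_t, h, params)
print(h.shape)   # (4,)
```

If the update gate saturates at 1, the cell simply copies its previous state, which is how a GRU can carry information across many time steps.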
Stage 1: regionalization
N | Location of the reference point | Terrestrial ecosystem | Köppen climate class | N of wet days/year | N of cells (SPI based) | N of cells (SPEI based) |
---|---|---|---|---|---|---|
0 | 44.25° N, 51.25° E | Desert | BWk | 60 | 157 | 132 |
1 | 54.25° N, 69.25° E | Forest & cropland | Dfb | 147 | 280 | 204 |
2 | 50.25° N, 83.25° E | Grassland (mountain) | Dfb | 141 | 173 | 187 |
3 | 49.75° N, 52.75° E | Arable land | BSk | 94 | 290 | 310 |
4 | 49.75° N, 72.75° E | Grassland (steppe) | Dfa | 129 | 264 | 342 |
5 | 42.25° N, 69.75° E | Desert | Dsa | 86 | 316 | 305 |
Total |  |  |  |  | 1,480 | 1,480 |
Stage 2: The whole area
At this stage, drought forecasting uses the multivariate time series of all points (1,480 cells) within the regions, rather than only the six reference points used in Stage 1. The division of the research area into cells was discussed in Section 2.3.2, and the number of points per region is shown in Table 2. Forecasting every cell gives the highest possible spatial precision, since it identifies exactly which cells are expected to be in drought. Stage 1 was nevertheless needed to identify the best combinations of input variables and the best-performing ML classifier for every lead time and for both the SPI- and SPEI-based drought indices, which are then used in this stage. The exhaustive search is therefore not repeated here, which saves computational capacity and time.
Model performance metrics
Accuracy for binary classification
Accuracy for the state change
Accuracy for regression
RESULTS AND DISCUSSIONS
Stage 1. Regionalization
Preparing data
Choice of the reference points (regions) number
The locations of the chosen reference points and their characteristics are presented in Figure 8 and Table 2. As mentioned before, each of the 1,480 cells was assigned to the reference point with which its SPI (or SPEI) series has the highest linear correlation. The area is divided into six regions, since six is the smallest number of regions for which the minimum correlation reaches 50%. Table 3 shows the minimum correlation for each number of regions.
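The correlation-based assignment can be sketched as below. How the reference points themselves were selected is not described here, so the reference indices are an assumption, as are the function name and the synthetic two-climate data. The returned minimum correlation is the quantity reported in Table 3.

```python
import numpy as np


def assign_regions(series, ref_idx):
    """Assign every cell to the reference cell with which it is most correlated.

    series: array (n_cells, n_times) of SPI or SPEI values.
    ref_idx: indices of the reference cells.
    Returns the region label of each cell and the minimum, over all cells,
    of the correlation with the assigned reference cell.
    """
    corr = np.corrcoef(series)[:, ref_idx]      # (n_cells, n_refs) Pearson r
    labels = corr.argmax(axis=1)
    best = corr[np.arange(series.shape[0]), labels]
    return labels, best.min()


rng = np.random.default_rng(1)
# Synthetic example: two "climates" of 20 cells each, each cell a noisy copy
# of its climate's signal over 300 time steps.
signals = rng.normal(size=(2, 300))
cells = np.vstack([signals[i] + 0.5 * rng.normal(size=(20, 300)) for i in range(2)])
labels, min_corr = assign_regions(cells, ref_idx=[0, 20])
print(np.bincount(labels), round(min_corr, 2))
```

Repeating this for an increasing number of reference points and stopping once the minimum correlation reaches 0.5 reproduces the selection rule described above.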
Number of regions | Minimum correlation, SPI-based [%] | Minimum correlation, SPEI-based [%] |
---|---|---|
2 | 18.0 | 19.7 |
3 | 23.7 | 24.2 |
4 | 32.3 | 35.6 |
5 | 47.6 | 48.0 |
6 | 50.0 | 51.1 |
As discussed in Section 2.3.1.3.1, the training set was upsampled to balance it with respect to drought events, and the hyperparameters were tuned to improve forecasting accuracy. As discussed in Section 2.3.1.1, the number of input parameters varies from 2 to 6, covering all combinations that include precipitation. For easier representation, the hydrological and meteorological parameters in the combinations are labeled with numbers.
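One simple way to upsample the training set is random oversampling of the minority (drought) class until both classes are equally frequent. The exact procedure used in the study is not given in this section, so this sketch with `sklearn.utils.resample` and synthetic arrays is an assumption. Only the training set is resampled, so the test set keeps the real drought frequency.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Hypothetical training set: 6 inputs, roughly 15% drought weeks (label 1).
X_train = rng.normal(size=(200, 6))
y_train = (rng.random(200) < 0.15).astype(int)

X_min, X_maj = X_train[y_train == 1], X_train[y_train == 0]
# Randomly duplicate minority samples (with replacement) until classes match.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)])
print(np.bincount(y_bal))
```

Because the added rows are exact copies, oversampling changes the class balance seen during training without introducing synthetic feature values.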
Effect of upsampling
4.1.2. Classification by ‘shallow’ ML models: drought or no-drought
Tuning hyperparameters based on the model's performance
One of the major accuracy improvements came from tuning the hyperparameters of the ML models, which was applied to K-NN, Kernel SVM, Decision Tree, Random Forest, and XGBoost. Random Forest and XGBoost already provided high accuracy (>86%) before tuning, so the gain was less significant than for the other ML models. Table 4 shows, as an example, the settings that achieved the highest accuracy at a lead time of 4 weeks: the region number, the drought index (SPI or SPEI), the best combination of input parameters, the highest accuracy itself, the confusion matrix (CM: true positive, false positive, false negative, true negative), and the tuned parameters obtained through a randomized search over hyperparameter combinations.
Region ID | SPI/SPEI | Combination | Accuracy | ML model | Confusion matrix (TP, FP; FN, TN) | Untuned/Tuned parameters |
---|---|---|---|---|---|---|
0 | SPEI | 1 2 3 5 6 7 9 12 | 0.8627 | XGBoost | 236, 15; 31, 53 | Untuned |
0 | SPEI | 1 2 3 6 9 10 11 12 | 0.9413 | Random Forest | 228, 23; 34, 50 | {'n_estimators': 90, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_features': 'auto', 'max_depth': 30, 'bootstrap': False} |
1 | SPI | 1 2 3 4 5 6 7 10 11 12 | 0.9407 | XGBoost | 325, 3; 19, 24 | Untuned |
1 | SPI | 1 2 3 4 7 8 10 12 | 0.9923 | Random Forest | 324, 4; 28, 15 | {'n_estimators': 50, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'auto', 'max_depth': 50, 'bootstrap': False} |
As can be observed from Table 4, before tuning, XGBoost gave the highest accuracy with its best combination of input variables. After the parameters were tuned (for every ML model except Logistic), Random Forest achieved better accuracy with a different combination of inputs, by adjusting the number of estimators and the minimum number of samples required for a split. As a result, accuracy increased from 0.86 to 0.94 for the SPEI-based model of Region 0 and from 0.94 to 0.99 for the SPI-based model of Region 1.
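The accuracy values in Table 4 follow directly from the confusion-matrix counts. The sketch below computes accuracy and the F1 score from counts read in the (TP, FP, FN, TN) order given in the text, using the two untuned XGBoost rows. The helper name is hypothetical, and F1 depends on which class is treated as positive, which here follows the stated order.

```python
def scores_from_cm(tp, fp, fn, tn):
    """Accuracy and F1 score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Untuned XGBoost rows of Table 4, read in the order (TP, FP, FN, TN).
rows = {"Region 0, SPEI": (236, 15, 31, 53), "Region 1, SPI": (325, 3, 19, 24)}
for name, cm in rows.items():
    acc, f1 = scores_from_cm(*cm)
    print(f"{name}: accuracy={acc:.4f}, F1={f1:.4f}")
```

This reproduces the reported accuracies of 0.8627 and 0.9407; unlike accuracy, the F1 score ignores true negatives and is therefore more sensitive to how the less frequent class is handled.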
As can be seen from Figure 13, for SPI-based models at a lead time of 1 week, precipitation at (t−1) has by far the highest importance of all variables, except in Region 5. For SPEI-based models, the importance is more evenly distributed among the variables. At a lead time of 6 months, the combinations change significantly for SPI-based models but much less for SPEI-based models. For SPEI-based models, evaporation at (t−1) becomes an important parameter for identifying drought conditions.
Classification by a deep learning algorithm (GRU)
As can be observed from Figure 14, the GRU accuracy ranged from 0.97 to 0.99 across the six regions, with the lowest value for Region 0, as with the ML models. However, even this value is higher than what Logistic, K-NN, Kernel SVM, Decision Tree, Random Forest, and XGBoost could provide. For the other regions, GRU improved accuracy only occasionally, not in every region. Therefore, GRU can provide better results than traditional ML models for more volatile regions.
Is deep learning (GRU) better? We do observe an improvement in accuracy, most probably because, for each forecast, the DL model can automatically draw on many more past data instances, and because it is much more complex (has many more weights) than the other models considered. However, this 'deepness' may mean that a DL model automatically incorporates physically irrelevant (lagged) inputs (on this, see e.g., Moreido et al. 2021). The disadvantage of GRU compared to the other (non-deep) ML models is its significantly greater computational time and load. The other ML classification models have much simpler configurations and more transparent, physically explainable sets of inputs.
Regression: forecast of SPI or SPEI
As can be seen from Figure 15, the accuracy of SPI-based models is quite high (around 0.8) for lead times up to 2 weeks in Regions 0–4 (Region 5 is an outlier), but decreases dramatically as the lead time increases to months. This is unsurprising, since regression is a much more difficult problem than binary classification. Interestingly, the accuracy of SPI-based models decreases more significantly than that of SPEI-based models. Better tuning may improve accuracy at longer lead times. It would also be reasonable to test deep learning models, which are reported to be accurate for time series forecasting problems.
Forecasting the state change between drought and no-drought
Stage 2. Considering the whole area
Although the accuracy obtained with the best input combinations was high (at least 86% across the six regions and both drought indices for the ML models, and 97% or more for the GRU models), a question remains: how representative is a single point of the drought state of a relatively large area, and how homogeneous is each region? Stage 2 was therefore undertaken, in which all points of the area are considered. This section discusses the percentage of drought area (PDA) obtained by forecasting every cell and compares the forecasted and observed drought cells to assess the level of accuracy.
Percentage of drought area
The PDA time series indicate the most drought-susceptible seasons in every region. As can be seen from Figure 17, SPI-based PDA follows a pattern in every region except Region 0, whereas SPEI-based PDA is chaotic. As discussed before, forecasting SPEI is a more complicated problem because the correlation between input and output parameters is more entangled.

In Region 0, droughts covering up to 100% of the area occurred in different years and in different months, making the region susceptible to drought at any time of the year. This is threatening because regional agriculture is centered on farming; consequently, the water supply must be abundant throughout the year. By contrast, the SPI-based PDA shows that Region 1 is most susceptible to drought in late summer and fall and has low PDA for the rest of the year, with a few outliers. This is also a problem, since Region 1 is the main cereal-producing region and summer is the crop-growing season. The region has historically relied on natural precipitation, which is becoming less dependable as the climate changes.

Region 2 is most susceptible in winter. This affects crop production indirectly, for example through an insufficient snow layer to protect the soil, but it is a less serious issue than in Region 1. Region 3 shows a pattern similar to that of Region 0 (both are in western Kazakhstan), with occasional drought-area peaks at any time of the year. Most of the time, however, summer is the most drought-susceptible season, which affects agriculture since much of the area is arable land. Another problem is the drying of the available freshwater resources, the Ural River and the Caspian Sea, which has been a major concern in the region in recent years. Region 4 has a similar pattern. The difference between Regions 1 and 4 is that Region 4 has irrigation (the Karatal irrigation massif) and is therefore better protected from drought. Region 5 is mostly affected during the winter and fall seasons.
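The following is a minimal sketch of how a PDA time series and its seasonal profile could be computed from gridded binary drought maps. It assumes equal-area cells. On a regular latitude-longitude grid such as ERA5, cell area shrinks with latitude, so an area-weighted version (e.g., weights proportional to the cosine of latitude) may be preferable. The dates and maps are toy values.

```python
from collections import defaultdict
from datetime import date


def percentage_drought_area(drought_map):
    """Share of grid cells (in %) flagged as drought in one time step."""
    cells = [c for row in drought_map for c in row]
    return 100.0 * sum(cells) / len(cells)


def mean_pda_by_month(dates, maps):
    """Average PDA per calendar month, to reveal drought-susceptible seasons."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, m in zip(dates, maps):
        sums[d.month] += percentage_drought_area(m)
        counts[d.month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


dates = [date(2021, 7, 5), date(2021, 7, 12), date(2021, 8, 2), date(2021, 8, 9)]
maps = [
    [[0, 0], [0, 1]],  # 25 %
    [[0, 1], [0, 1]],  # 50 %
    [[1, 1], [0, 1]],  # 75 %
    [[1, 1], [1, 1]],  # 100 %
]
profile = mean_pda_by_month(dates, maps)
print(profile)                             # {7: 37.5, 8: 87.5}
print(max(profile, key=profile.get))       # 8, the most drought-susceptible month here
```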
Comparison to the reference and determination of the forecasting accuracy
DISCUSSION
In the 21st century, machine and deep learning have become increasingly prominent tools for drought forecasting. Despite rapid technological advances, our research focuses on building a finely tuned model from well-established existing models rather than on the newest techniques. A key innovation of our approach is meticulous feature selection. We explored 2,047 combinations of input variables, seven machine and deep learning classifiers, and two types of output variables. This exhaustive exploration amounted to evaluating over 1.5 million models to identify the most suitable configuration for six distinct regions across nine lead times.
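The counts quoted above follow from simple combinatorics. 2,047 = 2^11 − 1 is the number of non-empty subsets of 11 candidate inputs. Multiplying by the 7 classifiers, 2 target indices, 6 regions and 9 lead times gives the model total, assuming a full factorial design. The sketch below reproduces this arithmetic. This section does not state which 11 candidate inputs were used, so the names x1 to x11 are placeholders.

```python
from itertools import combinations

candidates = [f"x{i}" for i in range(1, 12)]  # 11 placeholder candidate inputs


def input_combinations(variables):
    """Yield every non-empty subset of the candidate input variables."""
    for k in range(1, len(variables) + 1):
        yield from combinations(variables, k)


n_combinations = sum(1 for _ in input_combinations(candidates))
n_classifiers, n_targets, n_regions, n_lead_times = 7, 2, 6, 9
n_models = n_combinations * n_classifiers * n_targets * n_regions * n_lead_times
print(n_combinations, n_models)  # 2047 1547532
```

The number of subsets grows as 2^n, so adding even a few more candidate variables quickly multiplies the training effort.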
Contrary to the common belief that maximizing the number of input variables enhances model performance, our study reveals the risk of overfitting and its adverse impact on forecasting capability. To refine model accuracy, we employed hyperparameter tuning, an often-overlooked aspect of model development. Notably, we achieved a minimum F1 score of 87% in forecasting "borderline events." These events are pivotal for identifying transitions from a non-drought to a drought state and for determining drought duration.
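One possible way to score "borderline events" is to restrict the evaluation to the time steps at which the observed state switches between drought and no drought. The F1 score of the drought class is then computed on those steps only. This definition of the borderline subset is an assumption made for illustration; the study reports the F1 score but does not spell out the subset here. The weekly series are toy values.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score of the positive (drought) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def borderline_steps(y_true):
    """Indices where the observed class differs from the previous week (assumed definition)."""
    return [t for t in range(1, len(y_true)) if y_true[t] != y_true[t - 1]]


observed = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]  # toy weekly drought classes
forecast = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0]

steps = borderline_steps(observed)
f1_borderline = f1_score([observed[t] for t in steps], [forecast[t] for t in steps])
print(steps, round(f1_borderline, 3))      # [2, 5, 7, 8] 0.8
```

F1 is used instead of plain accuracy because transition weeks are rare. Accuracy over all weeks would be dominated by the long, easy stretches of unchanged state.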
The ensemble forecasting approach, specifically the stacking method, significantly improved accuracy, but a detailed discussion of it is beyond the scope of this work. Our ensemble strategy showed higher accuracy, particularly at lead times of up to 6 months. The study recognizes the potential of ensembles to enhance accuracy and outlines avenues for future research. It emphasizes the importance of nuanced model development strategies for optimizing drought forecasting precision. These insights have implications for understanding and mitigating the impact of drought events.
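The stacking approach mentioned above can be sketched with scikit-learn's StackingClassifier, which trains a meta-learner on out-of-fold predictions of the base classifiers. The data are synthetic. The choice of base models, meta-learner and hyperparameters is hypothetical. XGBoost and GRU members are omitted so that the example depends only on scikit-learn. The internal cross-validation used to build the meta-features is not time-aware, so real weekly series would need a leakage-safe scheme.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for (inputs, binary drought class); not ERA5 data.
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=0)
# shuffle=False keeps the later samples as a hold-out, mimicking a chronological split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("logit", LogisticRegression(max_iter=1000)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,  # out-of-fold predictions used to train the meta-learner
)
stack.fit(X_train, y_train)
print(f"test accuracy: {stack.score(X_test, y_test):.3f}")
```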
CONCLUSION
The main objective of the research was to examine the performance of ML-based forecasting of spatiotemporal meteorological drought events in Kazakhstan using extensive input variable selection. The NCDA methodology was used to analyze spatiotemporal drought development and its characteristics from large-scale gridded time series of hydrometeorological data. This made it possible to identify both the spatial extent of drought events and their occurrence in time. The complexity of the modeling was increased gradually, in two stages: from analyzing one reference point per region to the whole area of Kazakhstan. The following key findings can be formulated from the results:
Although SPI is derived from precipitation alone, combinations of input variables led to better results than using precipitation as the only input. Therefore, it is recommended to use combinations of meteorological and hydrological variables, rather than precipitation alone, for drought forecasting.
Including lagged input variables for the previous time step, alongside those for the current step, did more than widen the choice of input combinations. It also improved forecasting accuracy, raising the accuracy of tuned models to up to 99% for SPI-based and 94% for SPEI-based models.
GRU (a deep learning technique) performs somewhat better than ‘shallow’ ML models in a more volatile region (Region 0, where the severe drought occurred in 2021). There it reaches an accuracy of up to 97% for SPEI-based models and 99% for SPI-based models. This can be explained by two facts: for each forecast, DL draws on data from the (deeper) past, and the model is much more complex than ‘shallow’ learning models. However, this also means that a DL model takes in physically irrelevant (far-lagged) inputs, which makes it less explainable than the ‘shallow’ ML models. DL models are also more complicated and more computationally demanding to train.
ML techniques not only classified ‘drought’ and ‘no-drought’ conditions well but also delivered adequate results for the ‘borderline events’ at the state change (from drought to no-drought and vice versa). This was measured with the F1 score, which had a minimum of 87%.
Regardless of the complexity of the analysis, SPI-based models performed better than SPEI-based models at shorter lead times but worse as the lead time increased. The main reason lies in the different derivations of the indices. SPEI is better suited for monitoring agricultural and hydrological drought, which develops over a longer period than meteorological drought, for which SPI is better suited.
Generally, the accuracy for both SPI and SPEI does not fall below 94%. All models were found to typically overestimate, exaggerating the drought area.
Regression (numerical forecasting of SPI or SPEI values) gave relatively good results for short lead times (1–2 weeks) but failed to achieve good results for lead times of months.
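The lagged-input setup described in the findings above can be sketched as follows. Each training row concatenates the input variables at week t with those at earlier weeks, and the label is the drought class at week t + lead. The function name and the toy values are hypothetical. With weekly data, a 6-month lead corresponds to roughly 26 weeks.

```python
def make_lagged_dataset(features, target, lead, n_lags=1):
    """Pair inputs at weeks t, t-1, ..., t-n_lags with the drought class at week t + lead.

    features: list of per-week input vectors (lists of floats)
    target:   list of per-week binary drought classes
    lead:     forecast lead time in weeks
    """
    X, y = [], []
    for t in range(n_lags, len(features) - lead):
        row = []
        for lag in range(n_lags + 1):  # lag 0 is the current week
            row.extend(features[t - lag])
        X.append(row)
        y.append(target[t + lead])
    return X, y


# Toy weekly inputs, e.g. (precipitation, temperature); values are illustrative only.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
target = [0, 0, 1, 1, 0]
X, y = make_lagged_dataset(features, target, lead=2, n_lags=1)
print(X, y)  # [[2.0, 20.0, 1.0, 10.0], [3.0, 30.0, 2.0, 20.0]] [1, 0]
```

Each extra lag adds a full copy of the input vector to every row and removes one usable week at the start of the series. Longer lead times likewise remove weeks at the end.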
For further research, we suggest exploring other deep learning techniques, committee models (weighted ensembles) and multiclass drought classification, as well as the dynamics of drought cells and clusters.
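A weighted ensemble is one starting point for the suggested committee models. It combines the drought probabilities of several member models by a weighted average and then applies a class threshold. All of the following are hypothetical: the member names, the weights (for instance, each model's validation F1 score), the probabilities and the 0.5 threshold.

```python
def weighted_committee(probabilities, weights, threshold=0.5):
    """Combine member models' drought probabilities by a weighted average.

    probabilities: dict model_name -> list of P(drought) per time step
    weights:       dict model_name -> non-negative weight (same keys as probabilities)
    """
    total = sum(weights[m] for m in probabilities)
    n_steps = len(next(iter(probabilities.values())))
    combined = []
    for t in range(n_steps):
        p = sum(weights[m] * probabilities[m][t] for m in probabilities) / total
        combined.append(p)
    classes = [1 if p >= threshold else 0 for p in combined]
    return combined, classes


probabilities = {
    "rf": [0.9, 0.2, 0.6],
    "xgb": [0.8, 0.4, 0.3],
    "gru": [0.7, 0.1, 0.8],
}
weights = {"rf": 1.0, "xgb": 1.0, "gru": 2.0}
combined, classes = weighted_committee(probabilities, weights)
print([round(p, 3) for p in combined], classes)  # [0.775, 0.2, 0.625] [1, 0, 1]
```

Unlike stacking, which learns how to combine members, a fixed-weight committee is transparent: each member's influence on the final drought class is visible directly in its weight.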
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.