## Abstract

Drought is a serious natural disaster that causes huge losses to various regions of the world. To effectively cope with this disaster, we need to use drought indices to classify and compare the drought conditions of different regions. We can take appropriate measures according to the category of drought to mitigate the impact of drought. Recently, deep learning models have shown promising results in this domain. However, few of these models consider the relationships between different areas, which limits their ability to capture the complex spatio-temporal dynamics of droughts. In this study, we propose a novel multivariate spatio-temporal sensitive network (MSTSN) for drought prediction, which incorporates both geographical and temporal knowledge in the network and improves its predictive power. We obtained the standardized precipitation evapotranspiration index and meteorological data from the climatic research unit dataset, covering the period from 1961 to 2018. This is the first deep learning method that embeds geographical knowledge in drought prediction. We also provide a solid foundation for comparing our method with other deep learning baselines and evaluating their performance. Experiments show that our method consistently outperforms the existing state-of-the-art methods on various metrics, validating the effectiveness of geospatial and temporal information.

## HIGHLIGHTS

A novel multivariate spatio-temporal sensitive network (MSTSN) is proposed for drought prediction.

The MSTSN model captures the relationships between different areas using a graph neural network.

The MSTSN model extracts the long-term dependencies of data through a gated recurrent unit and enhances its effect by multi-head self-attention.

The MSTSN model predicts the spatial drought distribution closest to the actual distribution.

## INTRODUCTION

Droughts develop slowly and affect vast regions over time, causing severe social and economic damage. Droughts in China have become more frequent, severe, and widespread in the last six decades due to global climate change. From 1950 to 2015, the cumulative disaster area in Henan Province was 113,796,627 acres, and the average disaster area over the years was 1,724,191 acres. Because Henan Province is a main grain-producing area in China, drought not only impacts its economic development but, when severe, also affects national food security. Therefore, we choose Henan Province as the focus of this study, aiming to improve drought prediction and help decision-makers make informed decisions, which is of great significance in reducing agricultural economic loss.

Previous studies have developed various indices to capture the different features of drought. Palmer (1965) developed the Palmer drought severity index (PDSI) using precipitation and temperature data, and it is still in use today. McKee *et al.* (1993) developed the standardized precipitation index (SPI), which the World Meteorological Organization recommended to countries around the world in 2009 and which is now widely used globally. The standardized precipitation evapotranspiration index (SPEI) was developed by Vicente-Serrano *et al.* (2010). Compared to SPI, SPEI includes temperature information by introducing potential evapotranspiration. Moreover, its different time scales can characterize different types of drought. SPEI is extensively used in drought assessment and forecasting. According to the National Centers for Environmental Information (NCEI), drought classification converts drought index values into categories representing drought severity. Drought categories are simpler to interpret than raw index values and are easier for stakeholders to understand. Therefore, decision-makers are more concerned with the category of drought than with the values of the drought indices (Bazrkar & Chu 2022).

Drought prediction research can be broadly categorized into physical models, statistical models, and deep learning models. Physical models simulate the physical processes of the land, ocean, and atmosphere. However, their limited accuracy in forecasting precipitation on a monthly or seasonal scale restricts their ability to predict drought (Deo & Şahin 2015). Statistical models use various influencing factors as predictors to analyze relationships between historical records. They include techniques such as regression (Li *et al.* 2020b), time series analysis (Han *et al.* 2010), and machine learning approaches (Komasi *et al.* 2018; Fung *et al.* 2020; Ma *et al.* 2022). These models are widely used because of their simple structure, small data requirements, and low computational costs. However, they fail to adapt to the non-stationary nature of drought estimation, and they are prone to overfitting because of the lagged terms in the time series data. Therefore, we should explore the potential of more advanced deep learning methods.

Deep learning models are increasingly used for drought forecasting. One of the most popular models is the recurrent neural network (RNN) (Le *et al.* 2017), which can handle sequential data. However, RNNs struggle to capture long-range dependencies in time series data, which are important for drought prediction. To overcome this limitation, long short-term memory (LSTM) (Poornima & Pushpalatha 2019) and gated recurrent unit (GRU) (Zhou *et al.* 2022; Yan *et al.* 2023) models have been developed, which can retain information over longer periods. However, these models may still face the problem of gradient vanishing or gradient explosion. The Transformer (Vaswani *et al.* 2017) addresses their limitations in handling long-term dependencies and capturing long sequences. Another type of model that can be useful for time series classification is the convolutional neural network (CNN) (Ham *et al.* 2019), which can extract features from different dimensions and reduce model complexity. Fully convolutional networks (FCNs) (Wang *et al.* 2017) are a variant of CNNs that achieve high performance in time series classification tasks. A recent model that combines LSTM and FCN with a Squeeze-and-Excitation block is MLSTM-FCN (Karim *et al.* 2019), which can learn the relationships between different features at each time step. However, these models all ignore the spatial structure of large regions, which can affect drought patterns. Graph neural networks (GNNs) (Scarselli *et al.* 2008) are a class of models that can exploit the spatial structure within a region and have shown success in time series prediction tasks such as crop yield prediction (Fan *et al.* 2022), COVID forecasting (Kapoor *et al.* 2020), and traffic flow prediction (Lan *et al.* 2022). Common GNN models include the graph convolutional network (GCN) (Bhatti *et al.* 2023), the graph attention network (GAT) (Chen *et al.* 2023), and GraphSAGE (Liu *et al.* 2023).
To enhance the performance of these models, the attention mechanism (Li *et al.* 2020a) was introduced. It is a technique that enables a neural network to selectively attend to the most significant portions of the input and output data, while disregarding less relevant parts. Vaswani *et al.* (2017) introduced a compelling enhancement to this mechanism known as multi-head self-attention, which allows the network to further refine its focus by attending to multiple subspaces of the input and output sequences simultaneously. It has several benefits over single-head attention, such as capturing various dependencies and facilitating interpretability.

Henan Province is a major agricultural region in China and is vulnerable to drought. Accurate prediction of drought severity in different areas of Henan Province is essential for developing effective and sustainable adaptation strategies, which can prevent water and food scarcity and reduce economic losses (Adnan *et al.* 2023). This study uses the Climatic Research Unit (CRU TS v4.03) (Harris *et al.* 2020) dataset from 1961 to 2018, with a spatial resolution of 0.5° × 0.5°. Henan Province has 69 grid points, numbered from 1 to 69 in a top-to-bottom, left-to-right order. Existing methods treat each grid point as an independent region, which may not fully use the spatial structure of a larger region. Figure 1 shows that adjacent grid points have a strong correlation in drought, while non-adjacent grid points have significant differences, violating the independence assumption. To improve the prediction ability of drought severity and address the limitations of previous methods that neglect the geographical knowledge of each region, this study proposes a novel multivariate spatio-temporal sensitive network (MSTSN) for drought prediction. This method has two modules: the spatial aware module (SAM) and the temporal enhanced module (TEM). The SAM uses a GNN to combine the features of neighboring grid points with each grid point's own features to boost the predictive power, while the TEM extracts temporal information from the aggregated features. By integrating geospatial and temporal information, our model captures the complex spatio-temporal dynamics of droughts more effectively than existing deep learning models. To our knowledge, our work is the first to incorporate geographical knowledge into drought prediction.
The main contributions and objectives of this paper are: (1) To propose a novel MSTSN for drought prediction, which can effectively capture the complex spatio-temporal dynamics of droughts and improve the prediction ability of drought severity; (2) To introduce geographical knowledge into drought prediction for the first time, by using GNN to fuse the features from neighboring grid points with its own features, fully utilizing the spatial structure information among regions; (3) To conduct a comprehensive experimental evaluation, comparing with four deep learning models, and visualizing the prediction results of each model, analyzing the evolution of drought in Henan Province from 2015 to 2018.

## STUDY AREA AND DATA

The dataset used in this study is the Climatic Research Unit (CRU) dataset, which was developed by the University of East Anglia. The dataset features a spatial resolution of 0.5° × 0.5° and comprises 11 variables: cloud cover, frost day frequency, potential evapotranspiration, rainfall, diurnal temperature range, relative humidity, daily mean temperature, monthly average daily maximum temperature, vapor pressure, monthly average daily minimum temperature, and wet day frequency. This dataset has been used for a broad range of applications, such as climate variability, agronomic research (Renard & Tilman 2019), and paleo-climatic studies (Nagavciuc *et al.* 2019). We use ArcMap (Wadwekar & Kapshe 2023) to filter the dataset for Henan Province's meteorological data, based on its boundary vector file. The resulting dataset contains meteorological data for 69 grid points, which are illustrated by the red dots in Figure 2.

In this study, we use the standardized precipitation evapotranspiration index (SPEI) as the drought index for prediction. This index takes into account not only the statistical distribution of precipitation but also potential surface evapotranspiration, providing a more comprehensive reflection of regional drought conditions. The specific calculation process is described in Vicente-Serrano *et al.* (2010). The SPEI values are available at multiple time scales, including 1, 3, 6, 9, 12, and 24 months, and different time scales can characterize different types of drought. Shorter time scales are typically used to assess meteorological drought, medium time scales are commonly used to evaluate agricultural drought, and longer time scales are more appropriate for describing hydrological drought. The CRU dataset provides SPEI values at different time scales globally. The SPEI values are divided into drought categories according to the grades of meteorological drought¹; the categories are presented in Table 1.

| Category | Description | SPEI classification |
|---|---|---|
| 0 | No drought | > −0.5 |
| 1 | Mild drought | [−1.0, −0.5] |
| 2 | Moderate drought | [−1.5, −1.0) |
| 3 | Severe drought | [−2.0, −1.5) |
| 4 | Extreme drought | < −2.0 |
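
The thresholds in Table 1 can be applied to an SPEI value with a few comparisons. The following is a minimal sketch; the function name `drought_category` is hypothetical and is not taken from the MSTSN code.

```python
def drought_category(spei: float) -> int:
    """Map an SPEI value to the drought category of Table 1 (0 = none, 4 = extreme)."""
    if spei > -0.5:
        return 0  # no drought: SPEI > -0.5
    if spei >= -1.0:
        return 1  # mild drought: [-1.0, -0.5]
    if spei >= -1.5:
        return 2  # moderate drought: [-1.5, -1.0)
    if spei >= -2.0:
        return 3  # severe drought: [-2.0, -1.5)
    return 4      # extreme drought: SPEI < -2.0


# Boundary values fall into the interval that includes them in Table 1.
categories = [drought_category(v) for v in (0.3, -0.5, -1.2, -1.5, -2.0, -2.4)]
print(categories)
```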

## METHODOLOGY

In drought prediction, we denote each grid point's features by $x_{c,t}$ and its ground-truth drought category by $y_{c,t}$, where *c* and *t* represent the grid point and the month, respectively. Each $x_{c,t}$ contains 11 weather features; detailed symbol explanations are shown in Table 2.

| Symbols | Meanings |
|---|---|
| $x_{c,t}$ | the features of the c-th grid point at time t |
| $A$ | symmetric adjacency matrix |
| $N$, $M$ | number of grid points, number of samples |
| $a_{c,t}^{(l)}$ | the aggregated embedding of the c-th grid point at time t |
| $z_{c,t}^{(l)}$ | GNN embedding of the c-th grid point at time t |
| $r_t$, $u_t$, $\tilde{h}_t$, $h_t$ | the values of the reset gate, the values of the update gate, the current candidate state, the current hidden state |
| $\sigma$ | the activation function |
| $W_i^Q$, $W_i^K$, $W_i^V$ | the learnable weight matrices for the i-th attention head |
| $Q$, $K$, $V$ | query matrix, key matrix, value matrix |
| $y_{c,t}$, $o_{c,t}$, $\hat{y}_{c,t}$ | the real drought category of the c-th grid point at time t, the final output of the TEM module, the predicted distribution of the c-th grid point at t-th month |
| GNN | Graph neural network |
| GRU | Gated recurrent unit |
| SPEI | Standardized precipitation evapotranspiration index |
| CRU | Climatic Research Unit |
| LSTM | Long short-term memory |
| FCN | Fully convolutional network |
| CNN | Convolutional neural network |
| ROC-AUC | Receiver operating characteristic – area under the curve |

### Spatial aware module

As shown in Figure 1, geographically adjacent grid points have a strong correlation in drought. Intuitively, if some grid points have experienced severe drought, nearby grid points tend to be in a similar situation. Incorporating relevant information from the neighboring grid points can therefore enhance the accuracy of the prediction if it is appropriately integrated. Previous studies have used convolutional neural networks (CNNs) to extract spatial features from structured images, but they are not suitable for our problem, where the grid locations form an unstructured graph with irregular node arrangements. Graph neural networks (GNNs) are a class of neural networks designed to handle the complex dependencies that exist within graph-structured data. GNNs offer greater flexibility and a broader representation space for encoding node and edge information from the graph, thereby facilitating more effective inference. Formally, a graph can be represented as $G = (V, E)$, where $V$ denotes the collection of nodes and $E$ represents the connections between them. In this drought prediction task, each node is a grid point. $E$ is represented as a symmetric adjacency matrix $A \in \{0, 1\}^{N \times N}$, where $A_{ij} = 1$ if grid points $i$ and $j$ are adjacent and $A_{ij} = 0$ otherwise. $N$ is the total number of grid points. For each month $t$, there is an associated feature vector $x_{c,t}$ for each node $c$.
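
As an illustration of how such an adjacency matrix could be built from grid coordinates, the sketch below marks two grid points as adjacent when they are at most one 0.5° step apart in both latitude and longitude. This neighborhood rule, the function name `build_adjacency`, and the example coordinates are assumptions made for illustration; the text only states that adjacency is based on distance.

```python
import numpy as np


def build_adjacency(coords: np.ndarray, step: float = 0.5) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix for grid points.

    coords: (N, 2) array of (latitude, longitude) in degrees.
    Assumed rule: two points are adjacent if they differ by at most one
    grid step in both latitude and longitude (8-neighborhood).
    """
    diff = np.abs(coords[:, None, :] - coords[None, :, :])   # (N, N, 2)
    adjacent = np.all(diff <= step + 1e-9, axis=-1)          # tolerance for float noise
    np.fill_diagonal(adjacent, False)                        # no self-loops
    return adjacent.astype(int)


# Four hypothetical grid points; the last one is far from the others.
coords = np.array([[34.25, 112.25], [34.25, 112.75], [34.75, 112.25], [35.75, 113.75]])
A = build_adjacency(coords)
print(A)
```

Because most grid points have only a handful of neighbors, a matrix built this way is sparse for a 69-point grid.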

GraphSAGE is a widely used GNN model that employs node feature information to learn node embeddings via neighborhood aggregation. Unlike other techniques that rely on matrix factorization and normalization, GraphSAGE simply aggregates features from a node's local neighborhood, resulting in lower computational requirements. The model is highly adaptable since it can use features from different numbers of hops or search depths, leading to better generalization. As the adjacency matrix is sparse due to the fact that a majority of grid points have only a few neighboring points, GraphSAGE is a suitable approach for drought prediction.

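For the *l*-th layer of GraphSAGE, the embedding of grid point *c* at time *t* is computed as:

$$a_{c,t}^{(l)} = g_l\left(\left\{ z_{c',t}^{(l-1)},\ \forall c' \in \mathcal{N}(c) \right\}\right)$$

$$z_{c,t}^{(l)} = \sigma\left(W^{(l)} \cdot \left[ z_{c,t}^{(l-1)},\ a_{c,t}^{(l)} \right]\right)$$

where $z_{c,t}^{(0)} = x_{c,t}$ (the input features of grid point *c* at time *t*), and $l \in \{1, \dots, L\}$. $\mathcal{N}(c) = \{c' : A_{c,c'} = 1\}$ is the collection of neighboring grid points for *c*. The function that aggregates the *l*-th layer is represented by $g_l$, which can be pooling, graph convolution, or mean function. Through experiments, we find that mean aggregation is an effective method in GraphSAGE. It averages the features of each node's neighbors to obtain the aggregated feature. This allows the node to capture the overall features of the surrounding nodes. Mean aggregation is simple but powerful in learning the spatial structure of the graph. $a_{c,t}^{(l)}$ is the aggregated embedding from the bordering grid points. We concatenate $a_{c,t}^{(l)}$ with the last layer's $z_{c,t}^{(l-1)}$ before the transformation using the weight matrix $W^{(l)}$. $\sigma$ is a non-linear function.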
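
The mean-aggregation step can be sketched in a few lines of NumPy. This is an illustrative single layer, not the authors' implementation: the function name `graphsage_mean_layer`, the use of ReLU as the non-linear function, the output width of 8, and the random weights are all assumptions.

```python
import numpy as np


def graphsage_mean_layer(Z: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GraphSAGE layer with mean aggregation for a single month.

    Z: (N, d_in) embeddings from the previous layer (input features for layer 1).
    A: (N, N) symmetric 0/1 adjacency matrix.
    W: (2 * d_in, d_out) weight matrix applied to [own embedding, neighbor mean].
    """
    degree = A.sum(axis=1, keepdims=True)
    agg = (A @ Z) / np.maximum(degree, 1)        # mean over neighbors; isolated nodes get zeros
    concat = np.concatenate([Z, agg], axis=1)    # (N, 2 * d_in)
    return np.maximum(concat @ W, 0.0)           # ReLU as the non-linearity (assumption)


rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])  # node 0 borders nodes 1 and 2
X = rng.normal(size=(3, 11))                     # 11 weather features per grid point
W = rng.normal(size=(22, 8))
Z1 = graphsage_mean_layer(X, A, W)
print(Z1.shape)
```

Stacking several such layers lets each grid point draw on information from grid points more than one hop away.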

### Temporal enhanced module

To analyze drought patterns and cycles, it is essential to extract temporal features from historical knowledge. We propose a TEM that captures the temporal dynamics of the output embedding from GNN. This module consists of two components: (1) GRU models long-term dependencies and trends over time and (2) multi-head self-attention mechanism enhances its effect by assigning a weight to each month.

#### Gated recurrent unit
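
For reference, the standard GRU update can be written as follows. This is the textbook formulation rather than a confirmed description of the exact configuration used in MSTSN. The gate symbols follow Table 2 (reset gate $r_t$, update gate $u_t$, candidate state $\tilde{h}_t$, hidden state $h_t$), while the input $z_t$ (the GNN embedding of a grid point at month $t$, with the grid index dropped) and the names of the weights and biases are assumptions.

$$
\begin{aligned}
r_t &= \sigma\left(W_r \left[h_{t-1},\ z_t\right] + b_r\right) \\
u_t &= \sigma\left(W_u \left[h_{t-1},\ z_t\right] + b_u\right) \\
\tilde{h}_t &= \tanh\left(W_h \left[r_t \odot h_{t-1},\ z_t\right] + b_h\right) \\
h_t &= (1 - u_t) \odot h_{t-1} + u_t \odot \tilde{h}_t
\end{aligned}
$$

Here $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication. The update gate controls how much of the previous hidden state is carried forward, which is what allows the unit to retain information across many months.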

#### Multi-head self-attention mechanism

In this task, the meteorological characteristics of the input months have varying degrees of influence on the prediction results. For instance, the impact of adjacent months' meteorological factors on the prediction results is likely more significant than that of non-adjacent months. Therefore, by incorporating a multi-head self-attention mechanism, the model can focus on the characteristics of important moments.

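Let $H \in \mathbb{R}^{T \times hidden\_size}$ denote the sequence of GRU hidden states over the $T$ input months, where *hidden_size* represents the hidden dimensions of GRU.

The output of each attention head is concatenated and transformed linearly again to obtain the final output of the multi-head self-attention mechanism. This allows the model to attend to different aspects of the input in parallel and capture different types of information.

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O$$

where $Q$, $K$, and $V$ represent the query matrix, key matrix, and value matrix, respectively, $h$ denotes the number of attention heads utilized in the multi-head self-attention mechanism, and $W^O$ is the output weight matrix. Each attention head is defined as:

$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^Q,\ K W_i^K,\ V W_i^V\right)$$

where $W_i^Q$, $W_i^K$, and $W_i^V$ are the learnable weight matrices for the *i*-th attention head. The function *Attention* is defined as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

where $d_k$ is the dimension of the key matrix $K$.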
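
To make the computation concrete, the following NumPy sketch applies scaled dot-product attention with several heads to a sequence of hidden states. It follows the standard formulation of Vaswani *et al.* (2017); the sequence length, hidden size, number of heads, and random weights are illustrative assumptions, not the settings used in MSTSN.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)       # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention; returns the output and the month-to-month weights."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))     # (T, T); each row sums to 1
    return weights @ V, weights


def multi_head_self_attention(H, W_q, W_k, W_v, W_o):
    """H: (T, hidden_size); W_q, W_k, W_v: lists of h projection matrices; W_o: output matrix."""
    heads = [attention(H @ wq, H @ wk, H @ wv)[0] for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o   # concatenate the heads, then project


rng = np.random.default_rng(0)
T, hidden_size, h = 5, 16, 4                      # illustrative sizes
d_k = hidden_size // h
H = rng.normal(size=(T, hidden_size))             # stand-in for the GRU hidden states
W_q = [rng.normal(size=(hidden_size, d_k)) for _ in range(h)]
W_k = [rng.normal(size=(hidden_size, d_k)) for _ in range(h)]
W_v = [rng.normal(size=(hidden_size, d_k)) for _ in range(h)]
W_o = rng.normal(size=(h * d_k, hidden_size))

out = multi_head_self_attention(H, W_q, W_k, W_v, W_o)
_, weights = attention(H @ W_q[0], H @ W_k[0], H @ W_v[0])
print(out.shape, weights.shape)
```

Each row of `weights` is the set of per-month weights that one head assigns when it summarizes the sequence from the viewpoint of a given month.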

### Drought prediction

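The final output of the TEM, $o_{c,t}$, is fed into a softmax layer to obtain $\hat{y}_{c,t}$, the predicted distribution over drought categories of the *c*-th grid point at *t*-th month:

$$\hat{y}_{c,t} = \mathrm{softmax}(o_{c,t})$$

The cross-entropy loss can be computed as follows:

$$\mathcal{L} = -\frac{1}{M} \sum_{j=1}^{M} y_{j,t}^{\top} \log \hat{y}_{j,t}$$

where *M* is the number of samples, and $y_{j,t}$ is the true probability distribution of the *j*-th grid point at *t*-th month.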
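
A softmax output layer combined with a cross-entropy loss is the usual choice for this kind of multi-class prediction, and the sketch below shows that combination in NumPy. It assumes five categories as in Table 1 and integer labels instead of one-hot distributions; the function names and example logits are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(y_true: np.ndarray, logits: np.ndarray) -> float:
    """Mean cross-entropy over M samples.

    y_true: (M,) integer drought categories in 0..4.
    logits: (M, 5) unnormalized scores, one column per category.
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(y_true)), y_true]         # probability of the true category
    return float(-np.mean(np.log(picked + 1e-12)))


logits = np.array([[4.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0, 0.0, 0.0]])
loss_good = cross_entropy(np.array([0, 2]), logits)        # labels agree with the logits
loss_bad = cross_entropy(np.array([4, 1]), logits)         # labels disagree with the logits
pred = softmax(logits).argmax(axis=1)                      # predicted categories
print(loss_good, loss_bad, pred)
```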

## RESULTS AND ANALYSIS

### Performance metrics

To evaluate the predictive performance of each model, we utilize four evaluation metrics: precision, recall, F1 score, and accuracy. Additionally, we employ a multi-class receiver operating characteristic–area under the curve (ROC-AUC) for each classifier, providing a better illustration of their performance.

For binary classification, we use a 2 × 2 confusion matrix to show the prediction results of a classifier, and Table 3 displays true positive (TP), true negative (TN), false negative (FN), and false positive (FP).

| | | Prediction outcome | |
|---|---|---|---|
| | | 1 | 0 |
| Actual value | 1 | True positive (TP) | False negative (FN) |
| | 0 | False positive (FP) | True negative (TN) |

Given that this study involves multiple categories, the performance metrics can be computed independently for each category. Specifically, samples from a specific category are treated as positive, while the remaining categories are considered negative. The final result is obtained by averaging the metrics across all categories. To account for the imbalanced distribution of the dataset, a macro approach is utilized to calculate the average, which does not take into account the proportion of each category in the dataset. This approach results in a greater penalty when the model performs poorly in minority categories. All results presented in this study are macro-averaged.
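
The macro-averaging procedure described above can be sketched as follows. This is an illustrative implementation, not the evaluation code used in the study; the function name `macro_metrics` and the convention of scoring a category as 0 when its denominator is empty are assumptions.

```python
import numpy as np


def macro_metrics(y_true, y_pred, n_classes: int = 5) -> dict:
    """Accuracy plus macro-averaged precision, recall, and F1 (one-vs-rest per category)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        p = tp / (tp + fp) if tp + fp else 0.0     # assumed convention: 0 when never predicted
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(precisions)),   # unweighted mean over categories
        "recall": float(np.mean(recalls)),
        "f1": float(np.mean(f1s)),
    }


# Imbalanced toy example: the single sample of category 1 is missed.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]
m = macro_metrics(y_true, y_pred, n_classes=3)
print(m)
```

In this toy example the accuracy is 5/6, but the macro recall is only 2/3 because the missed minority category counts as much as the majority category.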

The performance of a classifier is typically evaluated using the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) at various thresholds, together with the area under that curve (ROC-AUC). A higher ROC-AUC value corresponds to better classification performance, as it indicates a superior trade-off between TPR and FPR. Therefore, this curve is an important tool for evaluating a classifier and can aid in selecting the best classifier and optimizing its parameters.
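
One common way to extend ROC-AUC to several categories is to compute a one-vs-rest AUC per category and average the results. The sketch below takes that approach and uses the rank interpretation of AUC (the probability that a positive sample receives a higher score than a negative one). Whether the study uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import numpy as np


def binary_auc(scores_pos: np.ndarray, scores_neg: np.ndarray) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def macro_ovr_auc(y_true: np.ndarray, probs: np.ndarray) -> float:
    """Unweighted mean of one-vs-rest AUCs; categories absent from y_true are skipped."""
    aucs = []
    for k in range(probs.shape[1]):
        pos, neg = probs[y_true == k, k], probs[y_true != k, k]
        if len(pos) and len(neg):
            aucs.append(binary_auc(pos, neg))
    return float(np.mean(aucs))


y_true = np.array([0, 0, 1, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.2, 0.7]])
auc = macro_ovr_auc(y_true, probs)
print(auc)
```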

### Data preprocessing

To normalize the data and reduce errors from data magnitude differences, we use min–max scaling to map the values to the [0, 1] range. Then, we divide the data into training and testing sets: 1961–2014 for training and 2015–2018 for testing. We randomly select 20% of the data from the training set as a validation set.
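
These preprocessing steps can be sketched as follows. The random array stands in for the monthly CRU features of one grid point, and taking the minimum and maximum from the training period only is an assumption, since the text does not state which period the scaling statistics come from.

```python
import numpy as np


def min_max_scale(train: np.ndarray, test: np.ndarray):
    """Scale each feature to [0, 1] using the training-period minimum and maximum (assumption)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero for constant features
    return (train - lo) / span, (test - lo) / span


years = np.repeat(np.arange(1961, 2019), 12)     # one entry per month, 1961 to 2018
rng = np.random.default_rng(0)
features = rng.normal(size=(len(years), 11))     # placeholder for 11 weather variables

train_mask = years <= 2014                       # 1961-2014 for training, 2015-2018 for testing
train, test = min_max_scale(features[train_mask], features[~train_mask])

# Randomly hold out 20% of the training samples as a validation set.
val_idx = rng.choice(len(train), size=int(0.2 * len(train)), replace=False)
print(train.shape, test.shape, len(val_idx))
```

With this choice, test values can fall slightly outside [0, 1] when the test period contains values more extreme than any seen in training.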

### Parameter settings

To prevent overfitting, a dropout layer with a rate of 20% is incorporated into the model. The Adam optimization algorithm (Kingma & Ba 2015) is used to iteratively update the weight matrices, with a learning rate of 0.0001. We apply early stopping with a patience of 10, which stops training when the validation loss has not decreased for 10 consecutive epochs. After multiple tests, we found that a batch size of 32 works best for the model. We search the time step (the number of past months used as input) between 5 and 15 and choose the best value based on the prediction results of each model at each time scale. Through experiments, we set the time step for the 1-, 3-, 6-, 12-, and 24-month time scales to 5, 8, 10, 12, and 15, respectively; that is, we use the past 5, 8, 10, 12, and 15 months of data to predict the drought category for the next month at each scale.
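
The input windowing and the early-stopping rule can be sketched in plain Python and NumPy. Both `make_windows` and `EarlyStopping` are hypothetical helpers written for illustration and are not taken from the MSTSN repository; the loss sequence is made up.

```python
import numpy as np


def make_windows(features: np.ndarray, labels: np.ndarray, timestep: int):
    """Pair the previous `timestep` months of features with the next month's category."""
    X = np.stack([features[i:i + timestep] for i in range(len(features) - timestep)])
    y = labels[timestep:]
    return X, y


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


features = np.arange(20 * 11, dtype=float).reshape(20, 11)   # 20 months, 11 variables
labels = np.arange(20) % 5                                   # placeholder categories
X, y = make_windows(features, labels, timestep=5)            # time step for the 1-month scale

stopper = EarlyStopping(patience=10)
val_losses = [1.0] + [1.1] * 20                              # no improvement after epoch 0
stopped_at = next(epoch for epoch, loss in enumerate(val_losses) if stopper.step(loss))
print(X.shape, y.shape, stopped_at)
```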

### Results

To practically validate the prediction accuracy of MSTSN for drought category prediction, we compare MSTSN with classical deep learning models, including LSTM, fully convolutional network (FCN), MLSTM-FCN, and GNN-RNN. These four models are briefly introduced as follows:

LSTM (Dikshit *et al.* 2021): a variant of recurrent neural networks designed to capture long-term dependencies in sequential data.

FCN (Wang *et al.* 2017): a neural network architecture that relies on convolutional layers to extract features from the input time series data. Subsequently, global pooling and fully connected layers are utilized for classification.

MLSTM-FCN (Karim *et al.* 2019): a hybrid architecture that integrates both LSTM and FCN designs to effectively capture both short-term and long-term temporal dependencies within time series data.

GNN-RNN (Fan *et al.* 2022): the combination of GNN and RNN is a powerful technique that synergistically blends spatio-temporal information, enabling accurate predictions that incorporate both geospatial and temporal dependencies.

To ensure a fairer comparison and demonstrate the superiority of our model, we selected the best-performing hyperparameters for each model. Table 4 shows the chosen hyperparameters for MSTSN. The detailed results of the model are presented in Table 5, which displays the performance of five models: LSTM, FCN, MLSTM-FCN, GNN-RNN, and MSTSN in predicting drought categories at different time scales.

| Parameters | Value |
|---|---|
| Learning rate | 0.0001 |
| Batch size | 32 |
| Dropout rate | 0.2 |
| Optimizer | Adam |
| Regularization strategy | Early stopping |

| Time scale | Model | Accuracy | F1 | Recall | Precision |
|---|---|---|---|---|---|
| 1 month | LSTM | 0.573 | 0.179 | 0.222 | 0.159 |
| | FCN | 0.639 | 0.243 | 0.226 | 0.271 |
| | MLSTM-FCN | 0.576 | 0.260 | 0.254 | 0.284 |
| | GNN-RNN | 0.604 | 0.265 | 0.260 | 0.296 |
| | MSTSN | **0.655** | **0.289** | **0.279** | **0.311** |
| 3 months | LSTM | 0.658 | 0.468 | 0.205 | 0.242 |
| | FCN | 0.649 | **0.473** | **0.469** | 0.478 |
| | MLSTM-FCN | 0.610 | 0.196 | 0.210 | 0.199 |
| | GNN-RNN | 0.654 | 0.460 | 0.386 | 0.387 |
| | MSTSN | **0.661** | 0.449 | 0.425 | **0.494** |
| 6 months | LSTM | 0.801 | 0.296 | 0.333 | 0.267 |
| | FCN | 0.743 | 0.417 | 0.424 | 0.471 |
| | MLSTM-FCN | 0.721 | 0.606 | 0.625 | 0.614 |
| | GNN-RNN | 0.734 | 0.446 | 0.517 | 0.468 |
| | MSTSN | **0.871** | **0.687** | **0.710** | **0.668** |
| 12 months | LSTM | 0.716 | 0.299 | 0.311 | 0.296 |
| | FCN | 0.761 | 0.534 | 0.522 | 0.553 |
| | MLSTM-FCN | 0.736 | 0.509 | 0.495 | 0.527 |
| | GNN-RNN | 0.816 | 0.652 | 0.612 | 0.647 |
| | MSTSN | **0.847** | **0.709** | **0.708** | **0.721** |
| 24 months | LSTM | 0.703 | 0.478 | 0.478 | 0.499 |
| | FCN | 0.768 | 0.718 | 0.735 | 0.723 |
| | MLSTM-FCN | 0.738 | 0.554 | 0.431 | 0.831 |
| | GNN-RNN | 0.842 | 0.766 | **0.775** | 0.834 |
| | MSTSN | **0.870** | **0.803** | 0.768 | **0.843** |

In performance comparisons of multiple algorithms, the best performing results are shown in bold.

From the perspective of accuracy, MSTSN performs the best at all five time scales. At the 12- and 24-month time scales it is followed by GNN-RNN, while the ranking of the remaining models varies with the time scale. For example, at the 6-month time scale, MSTSN's accuracy is 15 percentage points higher than MLSTM-FCN's (0.871 versus 0.721). At the longer time scales, the models that use graph neural networks to extract spatial characteristics, MSTSN and GNN-RNN, are more accurate than the methods that only consider temporal characteristics. The accuracy of MSTSN is higher than that of GNN-RNN at every time scale, which can be attributed to the effectiveness of the TEM module. This module uses GRU to extract the long-term dependencies of data and enhances its effect by assigning a weight to each month through multi-head self-attention. From the perspective of F1 score, our model scores the highest at every time scale except the 3-month scale, where FCN attains the best F1 (0.473 versus 0.449 for MSTSN). A high macro-F1 score means the model can classify drought categories accurately and fairly without bias toward the majority class. Therefore, MSTSN is superior to the other methods overall, particularly for small-scale, high-intensity droughts.

From the overall perspective, the five models perform worst at the 1-month time scale. However, as the time scale increased, the predictive accuracy of the five models improved. This improvement may be attributed to the reduction in data and the tendency of data sequences to become more stable with increased time scales, leading to better predictive performance.

### Sensitivity analysis

### Error analysis

### Ablation experiment

To evaluate the contributions of the SAM module and of the two components of the TEM module (the GRU layer and the multi-head self-attention layer) to our model's performance on drought prediction, we perform ablation experiments on five time scales. Table 6 shows the average evaluation metrics for each ablation setting. ‘MSTSN-SAM’ means removing the SAM module, while ‘MSTSN-TEM-GRU’ and ‘MSTSN-TEM-ATT’ mean removing the corresponding layer from the TEM block.

| Model | Accuracy | F1 | Recall | Precision |
|---|---|---|---|---|
| MSTSN | **0.780** | **0.587** | **0.578** | **0.607** |
| MSTSN-SAM | 0.742 | 0.547 | 0.535 | 0.565 |
| MSTSN-TEM-GRU | 0.754 | 0.554 | 0.554 | 0.564 |
| MSTSN-TEM-ATT | 0.762 | 0.549 | 0.546 | 0.567 |

In performance comparisons of multiple algorithms, the best performing results are shown in bold.

Table 6 shows that MSTSN outperforms the model without the SAM module, which confirms the correlation of drought conditions among adjacent grid points and the effectiveness of graph neural networks for spatial feature extraction. By handling long-term dependencies and capturing the temporal relationship, the GRU layer helps the model predict the future drought development trend from the historical drought situation, which improves the accuracy by 2.6 percentage points (from 0.754 to 0.780). Since the data of each month have different impacts on drought, the multi-head self-attention layer assigns a weight to each month, allowing the model to focus on key months, which improves the accuracy by 1.8 percentage points (from 0.762 to 0.780). Overall, SAM has the greatest impact on the model: removing the SAM module reduces the model's accuracy by 3.8 percentage points (from 0.780 to 0.742), which shows that introducing graph neural networks has an important influence on drought prediction.
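
The accuracy differences quoted above follow directly from the accuracy column of Table 6, as the short check below shows. The values are copied from the table; the variable names are arbitrary.

```python
# Average accuracy per ablation setting, copied from Table 6.
accuracy = {
    "MSTSN": 0.780,
    "MSTSN-SAM": 0.742,
    "MSTSN-TEM-GRU": 0.754,
    "MSTSN-TEM-ATT": 0.762,
}

# Drop in accuracy (percentage points) when each component is removed.
drop = {
    name: round((accuracy["MSTSN"] - acc) * 100, 1)
    for name, acc in accuracy.items()
    if name != "MSTSN"
}
print(drop)
```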

### Visualization of results

From Figure 8, we can see that MSTSN has the closest prediction to the actual spatial distribution of drought and can predict the approximate time and range of drought, followed by GNN-RNN. LSTM fails to predict different categories of drought and predicts all drought categories as 0, which proves that it cannot predict categories with small sample sizes well. FCN and MLSTM-FCN have lower prediction accuracy, especially in predicting the first half of 2015 and 2018. When small-scale drought occurs, all five models have some deviations.

## DISCUSSIONS AND CONCLUSIONS

In this study, we have developed a novel MSTSN for predicting drought categories. The model consists of two modules: the SAM and the TEM. The SAM module aggregates information from neighboring grid points to incorporate spatial information and generate a new feature matrix. This matrix is then inputted into the TEM module, which extracts temporal features. Finally, the extracted features are fed into a softmax layer to obtain the final drought category predictions.

We compare MSTSN with common deep learning models and find that MSTSN has the highest prediction accuracy at all time scales and the highest F1 score at most time scales, indicating its ability to predict small-scale and high-intensity droughts. Additionally, we observe that prediction performance tends to improve as the SPEI time scale increases. Ablation experiments are conducted to analyze the contributions of each module. The results show that the SAM module improves the model's performance by leveraging graph neural networks to extract spatial features. The TEM module captures long-term dependencies through GRU and enhances its effectiveness by using multi-head self-attention to assign a weight to each month.

Although this model demonstrates improved accuracy in drought prediction by considering the correlation and temporal dependence between adjacent regions, it has some limitations. The study solely relies on distance to determine adjacency between grid points, disregarding important geographical features. Additionally, the GRU model used struggles to capture long-term temporal dependencies, leading to prediction errors in long-term tasks and an inability to accurately reflect real change trends.

In summary, this study proposes a novel drought category prediction model, MSTSN, and showcases its effectiveness and superiority through extensive experiments. By introducing a GNN and considering geographical knowledge, the model surpasses the limitations of treating each area as independent, resulting in improved prediction accuracy. This has significant implications for industries, such as agriculture, water resource management, and climate monitoring, providing more accurate drought monitoring and forecasting information for decision-makers.

However, future research should incorporate additional factors that influence drought, such as human factors, terrain, and other natural factors. Moreover, geographic features can be utilized to determine adjacency between grid points more accurately. To predict long-term drought conditions, advanced deep learning methods like transformers can be explored to capture temporal dependency relationships effectively.

## ACKNOWLEDGEMENTS

This work was supported in part by the National Key Research and Development Program of China (No. 2021YFE014400), the National Natural Science Foundation of China (No. 62102187), and the Science and Technology Development Fund of Egypt (No. 43088).

¹ GB/T 20481-2017, National Standard of the People's Republic of China.

## CODE AVAILABILITY SECTION

Name of the code/library: MSTSN

Contact: e-mail and phone number

Hardware requirements: CPU: Intel Xeon Gold 6330; GPU: NVIDIA GeForce RTX A5000; Memory: 30 G.

Program language: Python 3.7

Software required: Pycharm, Anaconda3, Pandas, Numpy

Program size: 37KB

The source codes are available for download at the link: https://github.com/nuist-yjx/MSTSN.

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories: https://crudata.uea.ac.uk/cru/data/hrg/.

## CONFLICT OF INTEREST

The authors declare there is no conflict.