## Abstract

Drought forecasting, which can enable contingency actions to be implemented in advance of a drought, plays a significant role in reducing the risks and impacts of drought. In this study, a simulation framework of the occurrence probability of drought events based on a nested copula function and Gibbs sampling is proposed to effectively compensate for the high-dimensional problems and lack of initial data in traditional methods. Precipitation data of 718 meteorological stations from 1961 to 2018 in China was analyzed. The results showed that the occurrence location of drought events was mainly concentrated from 35° to 42° north latitude and 105° to 120° east longitude, with the occurrence time mainly concentrated from September to November. The Archimedean-copula function, constructed based on latitude, longitude, and occurrence time, could precisely determine the spatiotemporal joint probability distribution of drought events (RMSE: 0.01). The optimal time-varying nested Archimedean-copula functions were obtained from February to May (spring), June to September (summer) and October to January (autumn and winter). Compared with the nested Archimedean-copula function, the accuracy of Gibbs sampling and simulation based on the time-varying nested Archimedean-copula function was increased by 84.05% latitude and 69.76% longitude. The results provide an effective means for scientific drought forecasting, and water resource management departments can take preventive measures at an early stage.

## HIGHLIGHTS

This is the first to attempt to propose the quantitative framework of drought forecast based on run analysis theory, Archimedean-copula function, and Gibbs sampling.

The optimal time-varying nested Archimedean-copula functions are constructed.

Drought prediction accuracy is improved.

## INTRODUCTION

Water resources, as one of the most basic conditions for human survival and development, are essential natural and strategic resources that maintain the functions of natural ecosystems and support the evolution of socioeconomic systems (Lai *et al.* 2018; Yang *et al.* 2021). Currently, many countries and regions in the world are facing water resources issues, such as water resource scarcity (Zhao *et al.* 2020a, 2020b), water environment pollution (Zinatloo-Ajabshir *et al.* 2021a, 2021c) and water ecological destruction (Zinatloo-Ajabshir *et al.* 2021b). The problems of China are mainly manifested by fewer per capita water resources, uneven spatiotemporal distribution, serious pollution, low utilization efficiency, and inconsistency in regional economic development (Qian *et al.* 2014). To ensure the sustainable use of water resources in the future, China has adopted a series of safeguard measures in recent years, such as the South-to-North Water Diversion Project and the ‘three red lines’ of water resources (Zhang 2009; Zang *et al.* 2016). However, due to the scarcity, circulation, and uncertainty of water resources, the water scarcity risk always exists (Wang *et al.* 2018). Moreover, water scarcity in China has tended to increase with the rapid development of the economy and society (Hoekstra 2014). Reducing the population affected by water scarcity is one of the 17 Sustainable Development Goals identified by the United Nations, and is also a significant agenda for China (Sun *et al.* 2019).

It is generally believed that water scarcity is guided by the imbalance of water supply and water demand. The water demand is mainly determined by the level of socioeconomic development and the size of the population, and it is relatively stable over a period of time and is less likely to undergo dramatic changes (Qian *et al.* 2014). Thus, the water supply amount has become the dominant factor in the occurrence of regional water scarcity. The total amount of regional water resources is represented by surface and underground water yields, that is, the sum of surface runoff and precipitation infiltration recharge (Wan *et al.* 2016). Furthermore, droughts can be divided into four categories: meteorological drought (Duan *et al.* 2014), hydrological drought (Li *et al.* 2020), agricultural drought, and socio-economic drought (Mishra & Singh 2010, 2011). Drought is a natural phenomenon where rainfall in a certain area has been reduced for a period of time, thus affecting society, economy, and environment. In addition, drought is a normal cyclical climatic event characterized by many complex factors, related not only to rainfall, but also to topography, soil properties, and water facilities (Chang 1991). Drought events, as represented by meteorological drought, are mainly focused on precipitation, and are interrelated hydrological events, such as drought duration, drought intensity and drought severity (Wong *et al.* 2013; Choi *et al.* 2018). Generally, drought has become a major problem in regional water scarcity research. At present, the research direction of drought events can be summarized into three categories: (a) the identification, classification, and real-time monitoring of drought events based on GIS and remote sensing (Liu & Kogan 1996; Gao *et al.* 2014); (b) the construction of drought event indicator systems and risk based on the theory of disaster systems (Mishra & Singh 2010, 2011); (c) the probability modeling and evaluation of drought events based on mathematical statistics and multiple interdisciplinary studies (Chung & Salas 2000; Shiau 2006). To reduce losses and avoid the adverse consequences caused by drought events, it is urgent to start research from the identification and the occurrence probability of drought events.

In recent years, with the development of statistical methods and machine learning technology, a variety of studies have employed a number of forecasting methods, such as Markov chain (Lohani & Loganathan 1997), time series analysis (Mishra & Desai 2005), support vector machine (Ganguli & Reddy 2014), and artificial neural network (Araghi *et al.* 2017) to establish drought forecast models. However, these methods are limited to univariate analysis of drought characteristics. For drought events that include multiple related variables, univariate analysis is limited in its ability to reflect the joint distribution of multiple variables. In general, dynamical forecasts based on the copula model are a promising tool and have been increasingly used for drought forecasting globally because they can flexibly perform joint probability analysis of multiple variables (Qian *et al.* 2018).

At present, the copula method has been widely used in drought studies due to its ability to address uncertainties in multifactor joints, conditional probability analysis, and recurrence period analysis (Dupuis 2007; Genest & Favre 2007; Qian *et al.* 2018). Copula theory, which originated from financial risk analysis, was first proposed as an important tool for constructing multivariate joint distributions (Genest *et al.* 2007). In fact, it is a function that connects the joint distribution with the edge distribution, which is also called a connection function (Ma *et al.* 2013). The copula method was first used to analyze hydrological problems caused by drought in 2006. In recent years, many scholars have used bivariate and trivariate copula functions to analyze precipitation in northern Iran (Mirabbasi *et al.* 2012), India (Ganguli & Reddy 2012), peninsular Malaysia (Yusof *et al.* 2013), and Texas (Reddy & Singh 2014); Chinese scholars have also established a bivariate copula function model of drought duration and drought intensity to study Dujiangyan (Li *et al.* 2016), southwest China (Zuo *et al.* 2014), and the Aksu River Basin (Zhang *et al.* 2014). Although the copula model has unique advantages and extensive applications in the field of drought forecasting, there are still two shortcomings (Qian *et al.* 2018). First, the traditional copula model is more complicated when dealing with high-dimensional problems. Second, drought forecasting based on the copula model cannot compensate for the lack of initial data, which affects forecast accuracy. This scenario requires a combination of the copula model and other approaches to improve the forecast effect. Nested function and Gibbs sampling provide an effective research tool for resolving the above problems.

In this paper, a simulation framework that can determine the occurrence probability of drought events was proposed. The core of the simulation framework is run analysis theory, the time-varying Archimedean-copula function model, and Gibbs sampling. As an example, China was selected to assess serious water resource situations based on precipitation data of 718 meteorological stations from 1961 to 2018. Generally, the main objectives of this study are as follows: (1) to identify drought events; (2) to obtain a spatially adaptive time-varying copula joint distribution; and (3) to study the nationwide regular patterns of drought events. This study will provide a reference for researchers and decision-makers to guide water resource management and agricultural development.

## MATERIALS AND METHODS

### Study area and data

China has significant regional differences in natural conditions, which determine the spatial characteristics of meteorological elements. Precipitation is a key factor affecting regional agricultural systems, and the features in China show an uneven distribution, where the north is drier and colder than the south (Wang *et al.* 2019). A total of 718 meteorological stations, with daily precipitation data for a period from 1961 to 2018, were provided by the National Meteorological Information Center of China (Figure 1) (http://data.cma.cn/). Precipitation data were homogenized, and missing values were interpolated from ten adjacent stations (Shahid *et al.* 2018; Shahid & Rahman 2021).

### Run analysis theory

Run analysis theory, as a common method of identifying high and low precipitation levels, is a type of time series method (Figure 2). It refers to different events before and after the same type of events that occur continuously, such as alternate occurrences of wetness and dryness, and flow changes of river runoff. If the annual precipitation data are regarded as a discrete sequence, the multiyear average precipitation is selected as the standard quantity. When , the variation is positive and belongs to a wet year; when , the variation is negative and belongs to a dry year. When () occurs continuously, this indicates a continuous wet year (continuous dry year), which is called a positive run (negative run), and the continuous time is the length of run. In practical research, the wet years and the dry years are analyzed according to the classification criteria, and the frequency of continuous wet years and continuous dry years are counted. Additionally, the probability of continuous high and low precipitation in each meteorological station is calculated based on run analysis theory. This is of great significance for taking preventive measures in advance to reduce the losses caused by risk events.

Generally, the application of run analysis theory to identify drought events is roughly divided into four steps:

- (1)
Deciding on the nature of water scarcity: Water scarcity research relies on analytical factors, such as runoff, precipitation, and soil moisture. Considering the definition of water scarcity and the difficulty of data acquisition, this study selected the precipitation data of 718 meteorological stations in China from 1961 to 2018.

- (2)
Determining the time step: The time step can be hours, days, months, seasons, years, etc. Considering that a drought event is a natural phenomenon with slow development and long duration, months were used as the time step in this study.

- (3)
Establishing the truncation level: The truncation level is mainly used to distinguish different events in historical data, which reflects the demand for incoming water. It can be a constant, a function that changes over time, or a certain percentage of runoff and precipitation; 50% of monthly average precipitation was selected as the truncation level in this study.

- (4)
Calculating the water scarcity degree: If

*S*is the water scarcity degree,*M*is the water scarcity severity (run-severity), and*D*is the water scarcity duration (run-length), then . This formula describes the relationship among the three basic characteristics of drought events: the greater the absolute value of*S*(negative value), the more severe the drought.

### Copula method

#### Nested Archimedean-copula function

The Archimedean-copula function is a class of copula functions with simple form and good combination. Its stable properties and simple construction have led to its extensive use (Salvadori & De Michele 2004; Serinaldi & Grimaldi 2007). The high-dimensional Archimedean-copula function has a variety of constructors, of which the nested method is widely used (Müller & Scarsini 2005; Serinaldi & Grimaldi 2007). For three random variables of the three dimensions *X*, *Y*, and *Z*, the joint distribution can be expressed as , where is the joint distribution copula function of *X* and *Y*, and is the joint distribution copula function of and *Z*, that is, is the three-dimensional joint distribution function of *X*, *Y*, and *Z*.

According to the associativity of the Archimedean-copula function (Salvadori & De Michele 2007), the three-dimensional copula functions and are equivalent. The commonly used typical Archimedean-copula functions are the Clayton copula, Frank copula, and Gumbel copula.

#### Selection of Archimedean-copula function

This study selected the best-fitting copula on the basis of the goodness of fit. To compare the goodness of fit of the five copula functions, the average error (AE), root mean square error (RMSE), Akaike information criterion (AIC), and Bayesian information criterion (BIC) were obtained.

### Gibbs sampling

Gibbs sampling is a series of samplings generated by the joint distribution of two or more random variables, whose greatest advantage is that only the conditional distribution of a single variable is considered. Then, the dimensionality reduction of high-dimensional problems is solved. Compared with other sampling methods, Gibbs sampling has less autocorrelation and better convergence, and is suitable for samples with data loss. The main idea of Gibbs sampling is to set the edge distribution of two-dimensional variables, examine the conditional distribution , and simplify the calculation with a joint density function . Generally, the application of Gibbs sampling is roughly divided into four steps: (1) initialize and calculate by the ; (2) update by ; (3) repeat *k* times to obtain a Gibbs sequence of length *k*; (4) if the Gibbs sequence converges to a stationary distribution, then each value in the sequence is independent of the initial value. Then, this stationary distribution is the target distribution that needs to be simulated.

### Technology roadmap

In summary, to determine the occurrence probability of drought events, a model-based simulation framework is proposed (Figure 3). This framework mainly includes drought identification based on run analysis, joint probability distribution of drought events based on the copula method, and sampling and simulation based on Gibbs sampling.

## RESULTS

### Drought event identification

Based on the precipitation data of 718 meteorological stations in China from 1961 to 2018, the distribution of drought events in latitude and longitude was identified by run analysis theory. According to the statistical results (Figure 4), the occurrence of drought events was mainly concentrated from 35° to 42° north latitude (frequency/class width >0.04) and 105° to 120° east longitude (frequency/class width >0.025). Additionally, the occurrence time of drought events was mainly concentrated from September to November (frequency/class width >0.15).

### Three-dimensional nested Archimedean-copula function construction

To simulate the spatiotemporal distribution of drought events, a nested Archimedean-copula function of drought events was obtained as follows: (1) estimation of edge probability distribution; (2) Archimedean-copula function construction of location; (3) nested Archimedean-copula function construction of location and time.

Through the nonparametric estimation of the probability distribution of latitude, longitude and time, the cumulative probability distribution was procured (Figure 5). Moreover, the K-S method was used to test the rationality. The *p*-values of the K-S test are 0.2334 (latitude *X*), 0.3502 (longitude *Y*) and 0.1254 (time *Z*), respectively, which are all greater than 0.05. This shows that the three empirical distribution functions of latitude, longitude and time can be used to estimate the edge distribution of the original variable under a significance level of 95%.

Furthermore, the joint probability distribution of location was constructed based on the Archimedean-copula function, which was fitted by the three Clayton, Frank and Gumbel functions. Finally, the Gumbel copula has the minimum AE, RMSE, AIC, and BIC, indicating that it had the best goodness of fit (Table 1), and the Gumbel copula with the minimum sum of squared deviation was selected as the joint function (Figure 6).

Functions . | Clayton . | Frank . | Gumbel . |
---|---|---|---|

Parameter estimation θ | − 0.1928 | 0.4904 | 1.1258 |

AE (10^{−2}) | 2.1509 | 2.1241 | 2.1035 |

RMSE (10^{−1}) | 1.4962 | 1.4824 | 1.4231 |

AIC | − 1.0154 | − 1.0175 | − 1.0187 |

BIC | − 1.0142 | − 1.0021 | − 1.0154 |

Functions . | Clayton . | Frank . | Gumbel . |
---|---|---|---|

Parameter estimation θ | − 0.1928 | 0.4904 | 1.1258 |

AE (10^{−2}) | 2.1509 | 2.1241 | 2.1035 |

RMSE (10^{−1}) | 1.4962 | 1.4824 | 1.4231 |

AIC | − 1.0154 | − 1.0175 | − 1.0187 |

BIC | − 1.0142 | − 1.0021 | − 1.0154 |

Additionally, the spatiotemporal joint probability distribution of was constructed. Finally, the Gumbel copula has the minimum AE, RMSE, AIC, and BIC, indicating that it had the best goodness of fit (Table 2), and the Gumbel copula with the minimum sum of squared deviation was selected as the joint function (Figure 7).

Functions . | Clayton . | Frank . | Gumbel . |
---|---|---|---|

Parameter estimation θ | −0.0774 | −0.4664 | 1.0002 |

AE (10^{−2}) | 2.3268 | 2.4502 | 1.9365 |

RMSE (10^{−1}) | 1.5674 | 1.3268 | 1.2982 |

AIC | −1.3254 | −1.3689 | −1.3761 |

BIC | −1.3268 | −1.2645 | −1.3892 |

Functions . | Clayton . | Frank . | Gumbel . |
---|---|---|---|

Parameter estimation θ | −0.0774 | −0.4664 | 1.0002 |

AE (10^{−2}) | 2.3268 | 2.4502 | 1.9365 |

RMSE (10^{−1}) | 1.5674 | 1.3268 | 1.2982 |

AIC | −1.3254 | −1.3689 | −1.3761 |

BIC | −1.3268 | −1.2645 | −1.3892 |

Through the Monte Carlo GoF test, the test statistics p (0.1883) were obtained. The actual joint empirical distribution and the established nested two-dimensional Archimedean-copula joint distribution had a very high degree of coincidence (RMSE: 0.01) (Figure 8). It can be concluded that the joint probability distribution of time and location by using the nested *Archimedean copula* is scientific and reasonable, and the Archimedean-copula function can accurately determine the spatiotemporal joint probability distribution of drought events.

### Gibbs sampling based on nested Archimedean-copula function

Based on the Gumbel copula model constructed in 3.2, Gibbs sampling of the same sample size was performed in the observed samples. The longitude and latitude of future drought events were simulated (Figure 9).

Furthermore, based on the Gibbs sampling simulation results, the average longitude and latitude of the drought events in each month were calculated according to the different occurrence times of each drought event in the simulation results. Comparing the average latitude and longitude of the simulated sample and actual sample with the monthly trend (Figure 10), it is found that there is a great difference between the actual curve and the predicted curve, indicating that there is still a large gap between the fitted models and the real model (RMSE: 3.26 (latitude), 2.48 (longitude)). Therefore, the nested Archimedean-copula function established above still has large room for improvement. Finally, to improve the accuracy of fitting and prediction, this study tried to establish a time-varying nested Archimedean-copula function model according to the different categories of drought events.

### Time-varying nested Archimedean-copula function construction

According to the trend of the average longitude and latitude of drought events with the change of months (solid line in Figure 10), drought events can be divided into three categories:

- (1)
Category A: In February, March, April and May (spring), the latitude of the drought events decreased, while the longitude decreased first and then increased.

- (2)
Category B: In June, July, August and September (summer), the latitude and longitude of the drought events showed a trend of turbulence.

- (3)
Category C: In October, November, December and January (autumn and winter), the latitude and longitude of drought events increased.

The Archimedean-copula models are constructed for the three categories A, B and C, and the optimal functions can be found in the following (Table 3):

- (1)
The optimal joint distribution function of latitude and longitude of category A is a Clayton copula, and the optimal joint distribution function of time and location is a Gumbel copula.

- (2)
The optimal joint distribution function of latitude and longitude of category B is a Frank copula, and the optimal joint distribution function of time and location is a Gumbel copula.

- (3)
The optimal joint distribution function of latitude and longitude of category C is a Gumbel copula, and the optimal joint distribution function of time and location is a Gumbel copula.

Joint distribution function . | Clayton . | Frank . | Gumbel . | ||
---|---|---|---|---|---|

Category A | Latitude and longitude (C_{1}) | Parameter estimation θ | 1.45 × 10^{−6} | 0.36727 | 1.0974351 |

AE (10^{−2}) | 2.3164 | 2.4562 | 2.8934 | ||

RMSE (10^{−1}) | 1.3267 | 1.6354 | 1.9835 | ||

AIC | −1.5627 | −1.3694 | −1.2563 | ||

BIC | −1.3649 | −1.2694 | −1.3182 | ||

Time and location (C_{2}) | Parameter estimation θ | 1.45 × 10^{−6} | −2.00139 | 1.0000014 | |

AE (10^{−2}) | 2.2364 | 2.2694 | 1.6497 | ||

RMSE (10^{−1}) | 1.2643 | 1.2973 | 1.2394 | ||

AIC | −1.4524 | −1.3294 | −1.5329 | ||

BIC | −1.2643 | −1.5329 | −1.5764 | ||

Category B | Latitude and longitude (C_{1}) | Parameter estimation θ | 1.45 × 10^{−6} | −0.56059 | 1.0754822 |

AE (10^{−2}) | 2.6435 | 2.3294 | 2.3497 | ||

RMSE (10^{−1}) | 1.6584 | 1.2031 | 1.6495 | ||

AIC | −1.3649 | −1.4697 | −1.3964 | ||

BIC | −1.2346 | −1.3092 | −1.2396 | ||

Time and location (C_{2}) | Parameter estimation θ | 1.45 × 10^{−6} | −1.55987 | 1.0000014 | |

AE (10^{−2}) | 2.5673 | 2.5634 | 1.2534 | ||

RMSE (10^{−1}) | 1.5693 | 1.1249 | 1.0139 | ||

AIC | −1.8637 | −1.2694 | −1.9346 | ||

BIC | −1.2039 | −1.1647 | −1.3237 | ||

Category C | Latitude and longitude (C_{1}) | Parameter estimation θ | 1.45 × 10^{−6} | 1.32744 | 1.1904942 |

AE (10^{−2}) | 1.9643 | 1.8643 | 1.5697 | ||

RMSE (10^{−1}) | 1.2356 | 1.7346 | 1.0248 | ||

AIC | −1.3465 | −1.8643 | −2.0349 | ||

BIC | −1.6349 | −1.2349 | −1.9403 | ||

Time and location (C_{2}) | Parameter estimation θ | 1.45 × 10^{−6} | −1.08826 | 1.0000014 | |

AE (10^{−2}) | 2.2356 | 2.4967 | 1.8649 | ||

RMSE (10^{−1}) | 2.0619 | 2.9631 | 1.9873 | ||

AIC | −1.2346 | −1.3790 | −1.5037 | ||

BIC | −1.5319 | −1.5916 | −1.9127 |

Joint distribution function . | Clayton . | Frank . | Gumbel . | ||
---|---|---|---|---|---|

Category A | Latitude and longitude (C_{1}) | Parameter estimation θ | 1.45 × 10^{−6} | 0.36727 | 1.0974351 |

AE (10^{−2}) | 2.3164 | 2.4562 | 2.8934 | ||

RMSE (10^{−1}) | 1.3267 | 1.6354 | 1.9835 | ||

AIC | −1.5627 | −1.3694 | −1.2563 | ||

BIC | −1.3649 | −1.2694 | −1.3182 | ||

Time and location (C_{2}) | Parameter estimation θ | 1.45 × 10^{−6} | −2.00139 | 1.0000014 | |

AE (10^{−2}) | 2.2364 | 2.2694 | 1.6497 | ||

RMSE (10^{−1}) | 1.2643 | 1.2973 | 1.2394 | ||

AIC | −1.4524 | −1.3294 | −1.5329 | ||

BIC | −1.2643 | −1.5329 | −1.5764 | ||

Category B | Latitude and longitude (C_{1}) | Parameter estimation θ | 1.45 × 10^{−6} | −0.56059 | 1.0754822 |

AE (10^{−2}) | 2.6435 | 2.3294 | 2.3497 | ||

RMSE (10^{−1}) | 1.6584 | 1.2031 | 1.6495 | ||

AIC | −1.3649 | −1.4697 | −1.3964 | ||

BIC | −1.2346 | −1.3092 | −1.2396 | ||

Time and location (C_{2}) | Parameter estimation θ | 1.45 × 10^{−6} | −1.55987 | 1.0000014 | |

AE (10^{−2}) | 2.5673 | 2.5634 | 1.2534 | ||

RMSE (10^{−1}) | 1.5693 | 1.1249 | 1.0139 | ||

AIC | −1.8637 | −1.2694 | −1.9346 | ||

BIC | −1.2039 | −1.1647 | −1.3237 | ||

Category C | Latitude and longitude (C_{1}) | Parameter estimation θ | 1.45 × 10^{−6} | 1.32744 | 1.1904942 |

AE (10^{−2}) | 1.9643 | 1.8643 | 1.5697 | ||

RMSE (10^{−1}) | 1.2356 | 1.7346 | 1.0248 | ||

AIC | −1.3465 | −1.8643 | −2.0349 | ||

BIC | −1.6349 | −1.2349 | −1.9403 | ||

Time and location (C_{2}) | Parameter estimation θ | 1.45 × 10^{−6} | −1.08826 | 1.0000014 | |

AE (10^{−2}) | 2.2356 | 2.4967 | 1.8649 | ||

RMSE (10^{−1}) | 2.0619 | 2.9631 | 1.9873 | ||

AIC | −1.2346 | −1.3790 | −1.5037 | ||

BIC | −1.5319 | −1.5916 | −1.9127 |

In addition, as shown in Table 4, the three categories of time-varying nested *Archimedean copula* all passed the Monte Carlo GoF test. Therefore, different time periods only affect the joint distribution function of latitude and longitude of drought events (*C*_{1}) but have no effect on the joint distribution function of time and location of drought events (*C*_{2}); that is, the joint distribution function of time and location of drought events (*C*_{2}) is fixed at different times of the year. Moreover, the parameters *θ* of the joint distribution function *C*_{2} have not changed, suggesting that drought will occur at the same time for a certain region each year. Compared with the non-time-varying nested Archimedean-copula model, the fitting effect of the joint distribution of drought events obtained by the three categories of A, B and C has been greatly improved.

Time-varying nested Archimedean copula . | p
. |
---|---|

From February to May (spring) | 0.2479 |

From June to September (summer) | 0.2175 |

From October to January (autumn and winter) | 0.3721 |

Time-varying nested Archimedean copula . | p
. |
---|---|

From February to May (spring) | 0.2479 |

From June to September (summer) | 0.2175 |

From October to January (autumn and winter) | 0.3721 |

### Gibbs sampling based on the time-varying nested Archimedean-copula function

Based on the principle of Gibbs sampling and the Archimedean-copula function of three time periods, this study simulated the spatiotemporal distribution of future drought events according to the occurrence times of the drought events from February to May (spring), from June to September (summer), and from October to January (autumn and winter). Finally, the simulated sample distribution scatters of the three time periods were obtained with the Gibbs sampling simulation results (Figure 11).

Comparing the average latitude and longitude of the simulated sample and actual sample with the monthly trend (Figure 12), it is found that the coincidence degree is high (RMSE: 0.52 (latitude), 0.75 (longitude)). This also proved the reliability of the established nested Archimedean-copula function as a predictive model. Additionally, it is verified that the time-varying nested Archimedean-copula function model constructed by observed samples at different time periods is much better than the whole nested Archimedean-copula function model. That is, it is reasonable and necessary to carry out classification modeling analysis according to the different time periods of drought events.

## DISCUSSION

Since the first publication on hydrological problems caused by drought based on the copula method, numerous studies have confirmed that the copula model has a unique advantage both in China and abroad on multivariate distributions of drought characteristics, such as the intensity, duration, frequency, and recurrence. China is a vast area with different climatic regions, and it is not surprising that the spatiotemporal patterns of drought events can have regional characteristics.

### The relationship between drought and precipitation

Generally, drought refers to a long-term meteorological condition of minimal rain or absence of rain, which makes soil water insufficient and induces crop water imbalance. The definition of drought usually depends on the selected influencing factors; thus, droughts can be divided into four categories: meteorological drought (Duan *et al.* 2014), hydrological drought (Dracup *et al.* 1980), agricultural drought, and socioeconomic drought (Figure 13) (Mishra & Singh 2010, 2011). Different types of drought have different research priorities. Meteorological drought refers to a water scarcity caused by the imbalance of evaporation and precipitation, and water expenditure is greater than water income. Agricultural drought refers to the water deficit caused by the continuous lack of soil moisture during the growing period of crops. Hydrological drought refers to the imbalance of surface water or groundwater in a certain period due to the long-term shortage of precipitation, which reduces river flows, lake water levels, reservoir water storage and so on. Socioeconomic drought is an abnormal water deficit caused by the imbalance between water supply and water demand in the natural system and the human socioeconomic system. For these four types of droughts, meteorological drought is a natural phenomenon characterized by the reduction of precipitation, while agricultural drought, hydrological drought and socioeconomic drought are more concerned with human and social aspects (Adarsh *et al.* 2019). In general, meteorological drought is the basis for the three other types of drought (Figure 13). Therefore, this study mainly uses precipitation to study meteorological drought. This is consistent with Fan *et al.* (2017), who noted the similarity coefficient between drought probability and low precipitation probability in China.

### Identification of drought events

Run analysis theory can directly start from the simple statistical run phenomenon and reveal the probability of the occurrence of the run phenomenon, which is an effective method to study drought. Therefore, the application of run analysis theory in the field of drought has become increasingly popular in recent years, and many scholars have used the theory to analyze drought events in the northern regions of south Africa (Herbst *et al.* 1966), Beijing (Fan *et al.* 2017) and the Zhangweinan Canal Basin (Bai *et al.* 2018). Although the direct analysis of precipitation data has been presented as research has progressed, the Standardized Precipitation Index (SPI), Palmer Drought Severity Index (PDSI), and Standardized Precipitation Evaporation Index (SPEI) are the most commonly used and effective drought indices (Figure 13) (Heim 2002; Bothe *et al.* 2010; Hernandez & Uddameri 2014). The SPI is an index used to measure the degree of drought and flooding, and it is mainly based on the *Γ* distribution of precipitation, which is obtained by calculating the cumulative frequency of precipitation after normalization (Nikravesh *et al.* 2020). The PDSI is based on a simple two-layer soil moisture budget model that considers the current soil moisture supply (precipitation) and demand (evaporation). The SPEI is mainly used to detect drought distribution based on the SPI and can reflect the duration and accumulation of drought. The SPI calculation is convenient because only precipitation is considered, and the PDSI is usually used because of its wide perspective, whereas the SPEI emphasizes the effect of warming on drought (Hayes *et al.* 1999; Dai *et al.* 2004; Vicente-Serrano *et al.* 2010). In general, the precipitation index of the SPI, PDSI and SPEI is used to identify drought events based on run analysis theory and is also a possible research direction at present. Moreover, there are still some shortcomings in the study of drought events. For the law of drought change, many scholars have extracted the variables of drought duration and drought severity. They have paid less attention to the study of drought areas and determined the probability of extreme drought events. This study proposed a simulation framework that can analyze the joint probability distribution of drought events. However, the study of drought areas has not been considered.

### Thoughts on the study of drought events in China

There are two main geographical features in China (Fan *et al.* 2016). First, the land area is vast, and the topography and physiognomy are complex. The east and south of China are close to the ocean with more precipitation, while the north and west of China are deep inside the continent and drier. Second, the latitude and longitude spans are large and include the tropics, subtropics, warm temperate zones, middle temperate zones, cold temperate zones and the Qinghai-Tibet Plateau. The difference in thermal properties between land and sea causes the precipitation to decrease from the southeast coast to the northwest inlands (Wang *et al.* 2019). In general, precipitation and temperature are the key factors affecting regional drought events and the features in China's regional system show an uneven distribution. To explore the influences of precipitation factors on drought events over China and offer suggestions for drought prediction, this study used precipitation data based on run analysis theory, the copula method and Gibbs sampling to analyze the three-dimensional spatiotemporal probability distribution of drought events in China. The significant regional differences in China's natural conditions determine the spatial characteristics of Chinese meteorological factors. The study area can be divided into seven regions in accordance with the properties of geography, land use type, climate and farming system: northeast China (NE), northwest China (NW), central China (CC), northern China (NC), southern China (SC), eastern China (EC) and southwest China (SW) (Figure 13) (Song *et al.* 2011; Wang *et al.* 2019). The optimal joint distribution Archimedean-copula function for each of the seven regions may not be consistent with China as a whole, but the overall accuracy is bound to increase. Separate research on China's seven regions will be our future research direction, which will fully consider geographical characteristics, while offering more effective guiding significance for the prevention of local drought events.

### Other perspectives of drought research

Solar radiation is an important source of energy for atmospheric motion and a direct driver of surface temperature, precipitation and heat flux. The radiation budget at the top of the atmosphere maintains an approximate balance, while the radiation transmission process at the earth's surface is very complicated (Bisht *et al.* 2005). Moreover, the annual average radiation income from satellite observations indicates that nearly 50% of solar radiation energy is absorbed by the earth's surface. This part of the energy exchanges with the surface and the atmosphere through sensible heat flux, latent heat flux and soil heat flux, which affects the surface energy balance and land surface process changes and provides power for near-surface atmospheric movement (Yang *et al.* 2008). At the same time, the balance of the surface radiation budget is also closely related to regional evapotranspiration and hydrological processes (Mecikalski *et al.* 2018). In general, solar radiation has great significance for the study of drought. This study summarizes the factors affecting solar radiation based on a literature review and data query. The influencing factors are divided into two categories. One is the periodic factors, such as season, geography and meteorology, and the other is nonperiodic factors, such as natural disasters (Figure 13).

In detail, the solar incident angle, the duration of illumination and the intensity of sunshine are different in each season, which is the main manifestation of the impact of seasonal change on drought. Therefore, the solar elevation angle and the duration of sunshine can be considered seasonal factors in the study. The influence of geographical factors on solar radiation should not be underestimated; thus, latitude, longitude and altitude can be considered in the study. The latitude, longitude and season are mainly considered in this paper. Meteorological conditions mainly affect the scattering and absorption of solar radiation, and weather conditions and cloud conditions can be considered in the study. Although natural disasters (such as rain and snow disasters, wind anomalies, sand dust and typhoons) can also be meteorological factors to some extent, they have a strong impact on drought events. Finally, other perspectives of drought research are shown in Figure 13.

### Uncertainties

This study presents a simulation framework that can construct a joint probability distribution of drought events. However, it should be noted that some methods related to the framework are not perfect; thus, uncertainties are potentially introduced into our results. Although drought event identification based on run analysis theory is the foundation of this study, the recognition process has deficiencies such as lag and rigidity. Generally, the accumulation and recession effects of drought are regarded as a physical process. The process description can be divided into four stages of accumulation stage, outbreak stage, disposal stage and regression stage, which can realize the dynamic identification of drought events and eliminate uncertainties. Moreover, the joint probability distribution of drought events based on the Archimedean copula is the focus of this study. Despite this, to improve the rationality and interpretability of drought events, a variety of methods (e.g., the meta-elliptic copula and Plackett copula) should be used to explore the joint probability distribution. In addition, this study only determined the trend of average latitude and longitude of drought events in China as a whole, which is not sufficient. Thus, different regional assessments of the joint probability distribution should be conducted with the geographical features of China for applicability.

## CONCLUSIONS

Although drought is the main problem in the field of water scarcity, studies of the spatiotemporal probability distribution of drought events are relatively rare. In this study, a simulation framework of the occurrence probability of drought events is proposed to effectively compensate for the high-dimensional problems and lack of initial data in traditional methods. The core of the simulation framework is run analysis theory, a time-varying nested Archimedean-copula function model, and Gibbs sampling. The run analysis theory mainly identifies the drought events, and the copula function determines the spatiotemporal probability distribution of drought events, while the Gibbs model performs sampling and simulation. And this study demonstrated the application of the proposed simulation framework in China based on precipitation data of 718 meteorological stations from 1961 to 2018.

The occurrence location of drought events is mainly concentrated from 35° to 42° north latitude and 105° to 120° east longitude, while the occurrence time is mainly concentrated from September to November. The three-dimensional nested copula function can accurately determine the spatiotemporal joint probability distribution of drought events (RMSE: 0.01). Gibbs sampling and simulation of the time-varying nested Archimedean-copula function (RMSE: 0.52 (latitude), 0.75 (longitude)) is more accurate than the nested Archimedean-copula function (RMSE: 3.26 (latitude), 2.48 (longitude)). Compared with the nested Archimedean-copula function, the accuracy of Gibbs sampling and simulation based on the time-varying nested Archimedean-copula function was increased by 84.05% latitude and 69.76% longitude. And the drought will occur at the same time every year. The proposed simulation framework established the spatiotemporal joint probability distribution of drought events, which provided an effective means for predicting drought events scientifically. In the future, it should pay more attention to supplementary data and multidisciplinary methods to achieve more accurate predictions of drought events.

## AUTHOR CONTRIBUTIONS

The first author contributed substantially to conceptualization, methodology, validation, data curation, data interpretation, and writing. All authors participate in drafting the article or revising it critically, and give final approval of the version to be submitted.

## FUNDING

This research was supported by the National Key Research and Development Program of China (2019YFC0408902), the National Natural Science Foundation of China (Grant No. 51879010 and 51479003), and the Graduate Innovation Fund in Beijing Key Laboratory of Urban Hydrological Cycle and Sponge City Technology (HYD2020IFDC03).

## CONFLICTS OF INTEREST

The authors declare no conflict of interest.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.