## Abstract

Duration and severity are the two main variables used in drought analysis. The copula functions are appropriate for multivariate drought analysis, as it lacks the limitations of the classical multivariate distribution function. This study investigated the bivariate frequency analysis of drought duration and severity of Yazd city in Iran synoptic station during 1953–2013. To this end, first, the drought duration and severity variables were derived from the 6-month Standardized Precipitation Index. Then, considering the distribution functions, the gamma distribution function was selected for analyzing the severity and the exponential distribution function was selected for analyzing the duration and then the Clayton copula function was selected out of the three copula functions as the most appropriate one. After conducting bivariate frequency analysis, the joint seasonal and conjunctive return period and conditional return period curves were plotted. The current study well signified that multivariate analyses could present better interpretations of the reality; for example, as it was identified in conditional return period curves of the drought, for every constant duration, the amount of the return period changed as the severity changed. On the contrary, in analyzing the univariate of duration, no effects of severity were observed.

## HIGHLIGHTS

Use of multivariate analysis in research due to the high degree of correlation between the severity and duration of drought.

Accurate study of the correlation between the two parameters of hardness and duration of drought.

Using copula functions in this study that removes the limitations of classical distribution functions.

Calculation of 6 months.

Estimation of the parameters of the coupling function.

## INTRODUCTION

As a natural phenomenon, drought affects wet and dry areas. There are other consequences on the rain-fed agriculture and food security. Different economic sectors could lead to insufficient water resources in cities (Balana *et al.* 2020). Since drought is a critical factor in water resources management and planning (Katipoğlu *et al.* 2020), understanding the characteristics of drought-related problems is an essential step in creating mitigation measures (Bravo *et al.* 2021). Unlike many natural disasters that occur abruptly, drought has a slow pace and is referred to as a creeping phenomenon (Drysdale *et al.* 2020). Moreover, since drought is remarkably associated with unexpected phenomena, such as precipitation and flow patterns (Helama *et al.* 2018), investigating the drought problem requires the use of probability theory and stochastic processes methods. Accordingly, bivariate probabilistic distribution characteristics of drought could be applied which are not derived by a separate variable (Shiau 2006; Zhu *et al.* 2019; Bravo *et al.* 2021). Drought could cost an average of 6–8 billion dollars annually worldwide. Overall, the drought phenomenon also could trigger more damages (Heim 2002; Stone 2010; Sheffield *et al.* 2012). In addition, drought is a complex phenomenon and usually is identified by several correlations. To quantitatively assess the characteristics of drought, one suitable method is using the probability theory and the stochastic process method. However, drought variables, such as duration and severity, are correlated. Univariate analysis of drought characteristics lacks significant detection capability (Wilhite 2000; Huard *et al.* 2006; Kao & Govindaraju 2010; Chen & Sun 2015; Vicente-Serrano *et al.* 2015; She & Xia 2018; Real-Rangel *et al.* 2020). Based on agricultural areas, drought is a costly natural hazard in the world (Wilhite 2000; Heim 2002; Mishra *et al.* 2021). Since droughts are dynamic and multi-index, simultaneous assessment of several drought characteristics is crucial in assessing drought risk (Shiau & Modarres 2009; Zhang & He 2016). Drought is generally referred to as a long-term lack of precipitation. Throughout the history, the drought of typical and frequent climate phenomena has triggered the formation of civilization. Due to negative impacts on the economic, environmental, and social sectors of the system, water delivery design capacity is often employed for urban areas during a critical drought period (Tsiaris *et al.* 2017; Wong 2019). Due to random nature of drought, the best way is to investigate droughts using the probability theory. A significant volume of the research introduced the concept of implementation for drought analysis (Chung & Salas 2000; Shiau & Shen 2001; Gonzalez & Valdes 2003; Tabari *et al.* 2013; Hao & Singh 2015; Tsakiris 2017; Kwon *et al.* 2019; Liu *et al.* 2019; Shamshirband *et al.* 2020; Wu *et al.* 2020).

Drought characteristics have been considered as essential factors in water resources. Complex hydrological phenomena such as droughts and floods are multivariate; thus, several correlated random variables describe them (Wani *et al.* 2019; Wu *et al.* 2020). Two variables of duration and severity are generally applied to describe drought, since the variables have a correlation with each other (Tsakiris *et al.* 2016; Kwon *et al.* 2019; Dai *et al.* 2020). Most of the studies refer to univariate analysis, since the statistical techniques are well developed in hydrology. On the other hand, problems such as limited multivariate statistical techniques, the complexity of mathematical calculations, and the requirement for more data than univariate methods led a few researchers to apply multivariate analyses (Shiau 2006; Azam *et al.* 2018; Ali *et al.* 2020). Many researchers warned that employing univariate analysis, in the presence of the correlation between variables, would lead to overestimated or underestimated results (Salvadori & De Michele 2004; De Michele *et al.* 2005; Nazemi & Elshorbagy 2011; She & Xia 2018; Ben Aissia *et al.* 2017; Wani *et al.* 2019; Aguayo *et al.* 2021). Most of the studies on multivariate analyses of correlated hydrological variables applied the classical multivariate distribution functions such as bivariate normal distribution (Montaseri *et al.* 2018; da Rocha Júnior *et al.* 2020; Li *et al.*, 2021), bivariate exponential distribution (Singh *et al.* 1999; Kao & Govindaraju 2010; Zuo *et al.* 2017; Won *et al.* 2020; Mohsin & Pilz 2021), bivariate gamma distribution (Yue 2001; Vazifehkhah *et al.* 2019; Vergni *et al.* 2020), and bivariate Gumbel distribution (Kim *et al.* 2003; Shiau 2003; Qian *et al.* 2018; Chen *et al.* 2019; Seyedabadi *et al.* 2020). Having fit the bivariate Gumble distribution function on peak discharge and flood runoff volume variables, the conditional return period curves of the peak discharge were drawn for different runoff volumes which were against the return period curve of the marginal distribution of peak discharge (Yang *et al.* 2018; Datta *et al.* 2021). Yue (2001) found that in the presence of positive correlation between two random variables, the conditional return period of a variable was significantly different than its marginal return period. Thus, applying multivariate analyses take the precedence over univariate analyses. For the first time, Shiau (2006) used copula functions for drought analysis. The study fitted copula functions on drought variables of severity and duration and obtained joint return periods of the variables. Applying copula functions in hydrology was first proposed by De Michele *et al.* (2005) through modeling the correlation of precipitation intensity and duration variables. Favre *et al.* (2004) continued using these functions in hydrology. Using multivariate frequency analysis in hydrology was advantageous in terms of copula functions. The classical multivariate probability distribution functions were used in the most attempts to investigate the multivariate analysis of correlated hydrological variables.

The classical multivariate distribution functions have problems, including the specificity of marginal distribution functions, and the determination of their parameters; marginal distribution functions must be of the same type; marginal distribution functions could only be parametric; and classic multivariate distribution functions only use the parametric correlation coefficient and therefore do not consider nonlinear correlations (Karmakar & Simonovic 2009; Chen & Sun 2015). The application of copula function has been presented in hydrology for the first time by modeling the correlation of precipitation duration and intensity variables. Favre *et al.* (2004) continued solving hydrology problems using copula functions. Moreover, correlation modeling of dependent variables by copula functions was introduced in other fields. For example, Bárdossy (2006) found the utility of copula function modeling in identifying groundwater quality specifications. Song & Singh (2010) employed copula functions to analyze precipitation frequency. Samaniego *et al.* (2010) applied copula functions to predict the flow in basins lacking stations. A significant number of studies have been conducted on drought analysis using copula functions (Shiau *et al.* 2007; Serinaldi *et al.* 2009; Song & Singh 2010; Mirabbasi *et al.* 2012; Wong *et al.* 2013; Assani & Guerfi 2017; Kim *et al.* 2018; Han *et al.* 2020; Asif khan *et al.* 2021). One of the statistical analysis problems for drought is determining the correlation between each of the drought parameters or variables. It is well clear that the duration and severity of drought have a mutual relationship, while this is not true for the rain and flood phenomena. Therefore, bivariate analyses are inevitable and copula function could be defined as a function describing the relationship between every related variable. Limitation and problem also could be the inefficiency of the required data for the precise and better calculation of Standardized Precipitation Index (SPI), since the precise daily, long-term and annual rain data might not be available (Serinaldi *et al.* 2009; Shiau & Modarres 2009; Chen *et al.* 2012; Mirabbasi *et al.* 2012; Wong *et al.* 2013; Kwak *et al.* 2014).

In most of the studies investigating drought and the application of copula functions in Iran, the time interval applied for calculating SPI parameters was monthly or seasonal (3 months); however, the current study considered 6 months as the time interval for calculating the SPI of Yazd city. In addition, in most of the studied articles regarding drought and the application of copula functions in Iran, two parameters including severity and duration of drought were utilized for calculating copula functions; however, this study not only used these two essential parameters, but also made the use of frequency parameter in copula function calculations whose similar and simultaneous application of frequency and severity had been less investigated in Iranian articles.

## MATERIALS AND METHODS

### Drought duration and severity variables

To analyze the frequency, initially, an appropriate probabilistic distribution function should be fitted on variables. Gamma distribution function is generally used in drought studies for checking the severity variable, exponential distribution function, and the duration of the drought (Zhao *et al.* 2017). The same principle was followed in the current article.

_{i}and are the observable i values and the i values could be calculated from the cumulative distribution function. n is the number of data (Ben Aissia

*et al.*2017; Aguayo

*et al.*2021).

### Copula functions

#### Copula function definition

_{X,Y}(x,y) joint distribution function of correlated random variables of X and Y, which, respectively, have a cumulative distribution of F

_{Y}(y) and F

_{X}(x), can be expressed as Equation (2) according to the theorem (Won

*et al.*2020). C() is a copula function here. Each copula function has a parameter called the correlation parameter.

The characteristics of the two-parameter gamma functions and the two-parameter exponential function used in the article have been presented in Table 1 (Zuo *et al.* 2017; Mohsin *et al.* 2021).

Probability distribution function . | Probability density function . | Descriptions . |
---|---|---|

Two-parameter gamma | α: Shape parameter | |

β: scale parameter | ||

Г(.): gamma function | ||

Two-parameter exponential | α: location parameter | |

ε: scale parameter |

Probability distribution function . | Probability density function . | Descriptions . |
---|---|---|

Two-parameter gamma | α: Shape parameter | |

β: scale parameter | ||

Г(.): gamma function | ||

Two-parameter exponential | α: location parameter | |

ε: scale parameter |

There are several families of copula functions including the t family: Elliptical, Plackett, Gaussian, and Archimedean. The Archimedean family is employed in numerous studies in water engineering (e.g. Singh *et al.* 2008; Caccamo *et al.* 2011; Hao & Singh 2015; Zuo *et al.* 2017; Chen *et al.* 2019; Vergni *et al.* 2020). Three Archimedean copula functions of Gumbel, Frank, and Clayton were applied in this article to model the correlation of variables. The relationships and characteristics of the three copula functions used in this article have been presented in Table 2, in which *u* and denote two dependent cumulative distribution functions which range between 0 and 1; and *θ* is a parameter used to measure the degree of association between *u* and (Tosunoglu & Kisi 2016; Zhang He 2016; Won *et al.* 2020).

Parameter space . | . | Copula function . |
---|---|---|

Clayton | ||

Frank | ||

Gumbel |

Parameter space . | . | Copula function . |
---|---|---|

Clayton | ||

Frank | ||

Gumbel |

#### Parameter estimation and the fitness test for copula function

*et al.*2013; Hao & Singh 2015; Tsakiris

*et al.*2016; Kwon

*et al.*2019). Like the MLE method, this method is also based on maximizing the logarithm of the likelihood function for different values of the correlation parameter, which is given in Equation (3). In the following relation, C is the copula density function. F

_{x}(x) and F

_{y}(y) are marginal cumulative distribution functions and in this citation meaning is for two parameters. To select the best copula function, the higher the MLE logarithm value of a copula function, the more acceptable it is (Tsakiris

*et al.*2016; Datta

*et al.*2021).

#### Drought joint return periods

*et al.*2018; Datta

*et al.*2021).

*et al.*2018; Han & Singh 2020).where T

_{D|S≥s}denotes the conditional return period for D given S ≥ s. Drought severity return period is obtained from Equation (7), provided that the duration of the drought exceeds a certain threshold. T

_{S|D≥d}denotes the conditional return period for S given D ≥ d (Assani & Guerfi 2017; Kim

*et al.*2018).

#### Standardized Precipitation Index

In the study, the 6-month SPI was employed to calculate drought variables. This index aims to define and monitor the drought. SPI calculation based on long-term precipitation series is applied for a specific period, such as 1, 3, 6, and 12 months (Heim 2002; Stone 2010; Assani & Guerfi 2017). To calculate a 6-month SPI, first, a new precipitation series must be generated from the monthly precipitation series. According to these series, the amount of precipitation per month is obtained from the sum of values of the previous 6 months' precipitation and the same month's precipitation. Then, the empirical Green's function could be fitted on these series of data. Gamma distribution was applied for the severity parameter, and exponential distribution was used for the duration parameter. Then, the relevant parameters of each were calculated. Finally, the inverse of the standard normal distribution function was applied for the calculation of probability values, resulting in a 6-month SPI (Zhao *et al.* 2017; Asif Khan *et al.* 2021).

It should be noted that drought is referred to as a period in which SPI is continuously negative. In other words, SPI lower than zero is considered as drought. Accordingly, the duration of the period is the drought duration, and the sum of the SPI values of the period is the drought severity (Hao & Singh 2015; Wu *et al.* 2021).

## STUDY DATA

In this study, precipitation data of the Yazd in Iran (Figure 1) synoptic station from 1953 to 2013 was applied. At the 2011 census, the population was 529,673, and it is currently the 15th largest city in Iran. This station has the geographical coordinates of 54.29° longitude, 31.90° latitude, and 1237 altitude meters above the sea level. The average temperature is 19.8 °C, and the average relative humidity is 31%. The average annual precipitation during the studied years is 46.2 mm. Due to long-term daily statistics, the data validity of the station is high. In recent years, drought triggered costly damages to the lives of the people of Yazd, both in terms of food threats and water shortages. It also had damages to historical and important areas of the Yazd city; thus, considering the problem of drought in Yazd is of utmost importance.

## RESULTS

### Six-month SPI and drought severity and duration variables

The 6-month SPI series has been shown in Figure 2. In the study, SPI lower than zero is considered as drought, and 45 droughts over a 6-month period during 1953–2013 have been identified by the SPI index.

The curve of severity versus duration for the 6-month SPI is shown as a slip curve in the diagram of Figure 3. This diagram showed that by increasing the duration of drought, the density of drought severity decreased; however, the severity of each parameter increased.

Correlation coefficients between drought severity and the duration of Pearson, Spearman, and Kendall were calculated. Their values are displayed in Table 3 to assess the correlation between drought severity and duration variables. All three correlation coefficients showed significant positive values, so the multivariate analysis was preferable to univariate analysis.

Correlation coefficient . | Spearman . | Pearson . | Kendall . |
---|---|---|---|

Value | 0.78 | 0.90 | 0.71 |

Correlation coefficient . | Spearman . | Pearson . | Kendall . |
---|---|---|---|

Value | 0.78 | 0.90 | 0.71 |

Probability distribution function . | Variable . | Parameters value . | RMSE . |
---|---|---|---|

Two-Parameter gamma | Drought severity | α = 0.85 and β = 18.07 | 1.322 |

Two-parameter Exponential | Drought duration | λ = 5.46 and α = 6.23 | 1.165 |

Probability distribution function . | Variable . | Parameters value . | RMSE . |
---|---|---|---|

Two-Parameter gamma | Drought severity | α = 0.85 and β = 18.07 | 1.322 |

Two-parameter Exponential | Drought duration | λ = 5.46 and α = 6.23 | 1.165 |

### Univariate distribution functions

The estimated parameter values of the univariate distribution functions through the two-parameter exponential functions have been presented in the following table regarding which, the duration and the two-parameter gamma functions of severity had the lowest error rates among the relevant functions; the mean squared error value has also been presented accordingly.

The joint cumulative distribution functions of the drought variables estimated by the copula function had been compared to the probability of empirical absence (Table 4). In this study, Green's relations and drawing curves of Figures 4 and 5 were used to obtain the probability of empirical absence. The diagrams showed that as drought severity increased, the probability density decreased; however, the probability of the occurrence of each parameter increased. Green's empirical probability diagram of drought duration and severity is shown in Figures 4 and 5.

### Copula functions

The values of estimated parameters of each of the copula functions using the MLE method and the MLE logarithm are given in Table 5.

Copula function . | Parameter value . | MLE . |
---|---|---|

Clayton | 3.37 | 30.02 |

Frank | 16.25 | 27.16 |

Gumbel | 3.17 | 29.81 |

Copula function . | Parameter value . | MLE . |
---|---|---|

Clayton | 3.37 | 30.02 |

Frank | 16.25 | 27.16 |

Gumbel | 3.17 | 29.81 |

Severity (SPI) . | Duration (months) . | Return period (years) . |
---|---|---|

2.48453 | 4.2421517 | 2 |

7.3854142 | 9.2494869 | 5 |

11.566743 | 13.03739 | 10 |

15.960275 | 16.825293 | 20 |

21.971487 | 21.832628 | 50 |

26.623237 | 25.620531 | 100 |

Severity (SPI) . | Duration (months) . | Return period (years) . |
---|---|---|

2.48453 | 4.2421517 | 2 |

7.3854142 | 9.2494869 | 5 |

11.566743 | 13.03739 | 10 |

15.960275 | 16.825293 | 20 |

21.971487 | 21.832628 | 50 |

26.623237 | 25.620531 | 100 |

According to Table 5, it can be seen that the Clayton copula function had the highest MLE value, which led to its selection as the most suitable copula function. Curves related to any of the calculated copula functions are as follows: Figure 6(a), 6(b) and 6(c) referred to the Clayton, Frank, and Gumbel, respectively. Given that the drought duration exceeds the 2, 5, 10, 20, 50 and 100-year return period, the variations of the conditional probability density of drought severity for Clayton copula, Frank copula, and Gumbel copula have been shown in Figure 6(a), 6(b) and 6(c). It can be seen that the Clayton copula was suitable for the drought variables.

### Conditional and joint return periods

Equations (4) and (5) are used to calculate seasonal and conjunctive joint return periods (probability return period and . The results were in the same return period curves as shown in Figure 7(a) and 7(b). The corresponding values were as follows (Table 6).

Figure 7(a) shows that as the monthly time duration increased, the severity value in the type of probability return period decreased. The probability return period was as the following . Figure 7(b) shows that as the monthly time duration increased, the drought severity value in this type of probability return period decreased, and finally, it remained constant. Diagrams of return period drought parameters and duration showed that with increasing the return period, both the amount of drought hardness parameter and the duration of drought duration increased .

Equations (6) and (7) are used to obtain the curves of conditional drought severity and duration return periods. The curves are shown in Figure 8(a) and 8(b). Return period diagrams of severity and duration of drought showed that as the amount of time parameter (D) increased for increasing the amount of severity and as the amount of severity parameter (S) increased for increasing the amount of time parameter, the amount of return periods also increased. The return period of or the duration of 1–10 (Figure 8(a)) and the return period of or the severity of 1–10 (Figure 8(b)) have been shown. Figure 8(a) shows that by increasing the value of severity, the value of the annual return period increased. The conditional return period of drought severity indicated that the duration was greater than a certain value, D. Figure 8(b) shows that by increasing the duration, the value of the annual return period also increased. The conditional return period of drought duration showed that the severity was greater than a certain value S.

Conditional diagrams of severity and duration of drought showed that the probability of occurrence increases with increasing time parameter for increasing severity and increasing severity parameter for increasing time parameter. Conditional severity and duration diagrams for the duration of 1–20 and as well as the severity of 1–20 are as follows. Figure 9(a) shows that by increasing the assumed desired duration, the conditional probability of the curve also increased. This graph is useful for determining the probability of drought severity considering the drought duration exceeding a certain value. Figure 9(b) shows that by increasing the duration of the desired interval, severity increased the conditional curve.

## DISCUSSION

The current article has investigated the bivariate frequency analysis of duration and severity of drought as the most significant variables in designing water supply systems. Accordingly, first, each of the drought variables of the 6-month was derived, and the appropriate distribution for each variable was selected among two-parameter gamma functions, type three Pearson, normal and Lognormal three-parameters, two-parameter exponential, Gamble and Wibble three parameters. This study had utilized a two-parameter exponential function for studying the drought duration parameter and had used a two-parameter gamma function for investigating the severity; they were the best parameters in terms of having the least errors. Then, the copula function was selected among Frank, Clayton and Gumbel functions. Clayton function was the best function in this study. Overall, after conducting bivariate frequency analysis, the joint seasonal and conjunctive return period and conditional return period curves were plotted. The results of multivariate analyses of Copula functions could be used in water resources management plans, risk analysis and probabilistic scheduling. According to the current study, the return period related to any duration and severity could be obtained; for example, the highest observed duration and severity in the studied years equaled 25 months and 2135.19 and the seasonal conjunctive return period of these amounts equaled 1,675 years and 41 years, respectively. Moreover, considering any amounts for the drought severity based on SPI and having the amount of condition for S and D parameters and according to Figure 7(a) and 7(b), the amount of return period could be obtained. For example, according to Figure 7(a), having the severity level equal to 15 and the condition for the duration parameter equal to D > 6, the return period amount equaled 75 years.

## CONCLUSION

According to the study, variables of drought severity and duration had a high correlation. Thus, the multivariate analysis produces presented more reliable results than univariate analysis. In this study, bivariate frequency analysis of drought severity and duration was done using copula functions. Accordingly, first, 6-month SPI series was obtained from the precipitation of Yazd synoptic station during 1953–2013, and the variables of drought severity and duration were derived from this index. The two-parameter gamma functions for severity and the two-parameter exponential functions for the duration were selected as the best functions. The least RMSE was used as the selection criterion. Among the functions of Clayton, Frank, and Gumbel, the Clayton function had the highest value of the MLE logarithm. After conducting bivariate frequency analysis, the joint seasonal and conjunctive return period and conditional return period curves were plotted. The following points could be presented regarding the results of the article:

- 1.
The current study showed that multivariate analysis provided a better analysis of the reality. For example, as can be seen in return period curves during drought duration, as the severity changed, the value of the return period changed for a definite time, whereas in univariate analysis, no severity effects could be observed.

- 2.
Distributions of variables of drought severity and duration were different. Thus, the classical multivariate distribution functions could not be used for multivariate analyses. A copula function is a suitable tool that lacks the problems of the classical multivariate distribution functions.

- 3.
Results of multivariate analyses can be applied in water resources management plans, risk analysis, and probabilistic scheduling. For example, if the design considers a 100-year return period, the various values that make up this return period can be obtained from Figure 7(a) and 7(b). It is also possible to obtain a return period related to any duration and severity.

- 4.
The above results indicated that copula was a useful tool in exploring the associations of the correlated drought variables. However, the current study focused on two random variables as well as one-parameter copulas. Multivariate and two-parameter copula applications could be appropriate topics for future studies.

## DATA AVAILABILITY STATEMENT

Data available on request from the authors.

## REFERENCES

*Culex pipiens*larvae in Yazd Province

*T. aestivum*L.)