## Abstract

Bivariate approaches in Regional Frequency Analysis (RFA) address two issues: first, to evaluate the homogeneity of regions, and second, to estimate the joint return periods. This study was conducted to investigate the joint return period of a severe historical drought in southwestern Iran. Fifty-nine rain gauges were first clustered into three, four, and five regions using the fuzzy c-means clustering (FCM) algorithm. Then bivariate discordancy and homogeneity tests were applied to adjust the initial clusters. The results showed that only in the case of three clusters were all the regions homogeneous. Therefore, it can be inferred that combining clustering analysis and discordancy test is insufficient to form homogeneous regions. Finally, the joint return period, by choosing Generalized Logistic and Wakeby as marginal distributions and Clayton as a copula, was estimated for all the sites in the three regions. Since no three-parameter distribution function fitted well to the variable severity, the bivariate homogeneity index does not necessarily attest to region homogeneity regarding the marginal distribution functions. It is also deduced that sites with higher mean annual precipiataion (MAP) and, correspondingly, higher elevation are more likely to experience shorter return periods of same drought events, in contrast to sites with lower MAP or lower elevation.

## HIGHLIGHTS

The estimation of return periods for extreme events relies on the availability of sufficient data.

Generalized Logistic and Wakeby are chosen as marginal distributions, and Clayton is selected as a copula.

Sites with higher mean annual precipitation (MAP) and higher elevation are found to be more likely to experience shorter return periods of the same drought events.

## INTRODUCTION

Frequency analysis is a valuable tool for estimating quantiles of extreme values, such as drought, flood, and storm events. Needless to say, reliable frequency analysis is highly dependent on data availability. Since data records are often short, the method called Regional Frequency Analysis (RFA) was developed to increase the reliability of the quantile estimates (Hosking & Wallis 1997). Additionally, RFA can be used to provide reliable analysis in ungauged sites (Sadri & Burn 2011).

To discuss exactly what RFA does, let’s assume a data set that includes N sites, each containing an n-year data record. Thus, using regional data (with data) instead of individual site data (with a record length equal to n) helps to retain longer data records and improve the reliability of estimates of at-site quantiles in frequency analysis. Therefore, RFA consists of two main steps: first, regionalization of the area, which involves identifying homogeneous regions in which the sites have the same frequency distribution; and second, estimating quantiles using an appropriate regional distribution function.

Various methods have been developed for regionalization, with cluster analysis being a standard statistical multivariate analysis technique used to divide data points into multiple groups. This method has been successfully applied to RFA (Raziei *et al.* 2008; Yoo *et al.* 2012; Shamshirband *et al.* 2015; Azam *et al.* 2018; Ahani *et al.* 2020; Rahman & Rahman 2020; Ahani *et al.* 2022).

It should be noted that sites in a cluster may not necessarily follow the same frequency distribution. To address this issue, Hosking & Wallis (1993) investigated the practical utility of RFA using the index-flood procedure by developing the application of L-moments in hydrology. They presented two tests based on L-moments to measure discordancy to identify discordant sites and homogeneity in a region. Ngongondo *et al.* (2011) used the L-moments approach for RFA in southern Malawi. Using the concepts of entropy and L-moments to perform the Hosking–Wallis homogeneity test. Rajsekhar *et al.* (2013) extracted eight and nine homogeneous regions in Texas in terms of severity and duration of drought, apart. Using the Standard Precipitation Index (SPI) and Hierarchical Cluster Analysis, Gocic & Trajkovic (2013) divided Serbia into three regions and used discordancy and homogeneity tests to examine the homogeneity of the regions. With the help of FCM, Goyal & Sharma (2016) investigated drought in three states of western India; after calculating the SPI on different timescales, they optimized the number of clusters using five cluster validation indices; finally, the univariate Hosking–Wallis homogeneity test was applied to the clusters. Ahani *et al.* (2022) proposed a ranking method applied to Karun-e-bozorg basin in the southwest of Iran to identify a suitable regionalization for regional flood frequency analysis; the method was developed to compare and rank various regionalizations and choose the most appropriate one. Their method has an acceptable performance in identifying a regionalization that provides suitable homogeneity, quantile estimation accuracy, and average region size. Parvizi *et al.* (2022) implemented RFA of drought severity and duration in the Karkheh River Basin, Iran. They used the L-moments to investigate the delay of hydrological drought after the meteorological drought.

It should be noted that drought is a multivariate phenomena. Drought severity and duration are the two key variables that characterize droughts (Shiau & Shen 2001). Their multivariate nature adds to the complexity of drought regional frequency analysis. When variables are correlated, univariate approaches become impractical (Abbasian *et al.* 2021).

The requirement for multivariate approaches has led to the creation of multivariate homogeneity and discordancy tests by Chebana & Ouarda (2007), which were further developed with the introduction of L-comoments by Serfling & Xiao (2007). Applying these bivariate tests to regional flood data for Quebec, Canada, Chabana *et al.* (2009) concluded that the bivariate tests are appropriate and more efficient than univariate tests. Based on the measures developed by Chebana & Ouarda (2007), Sadri & Burn (2011) compared the performance of univariate and bivariate RFA; the results signified that bivariate RFA of drought outperforms univariate homogeneity and discordancy tests for adjusting the hydrological regions. Zhang *et al.* (2015) used the fuzzy c-means clustering (FCM) method and L-comoments to divide China into five homogeneous regions, and they adopted RFA. Abdi *et al.* (2017) applied univariate and bivariate homogeneity tests based on univariate and bivariate L-moments to check the homogeneity of their study area, whose results indicated that the bivariate method provided higher accuracy than the univariate one. Azam *et al.* (2018) performed hierarchical clustering and PCA and used bivariate homogeneity tests to confirm discordancy and homogeneity in Korean regions; the results of univariate and bivariate homogeneity and discordancy tests showed significant differences due to the inability of univariate measures to consider the joint behavior of duration and severity. Therefore, bivariate approaches are necessary for a thorough analysis of drought characteristics. Mortuza *et al.* (2019), to assess future drought trends in Bangladesh, utilized copula-based bivariate regional frequency analysis due to the strong correlation between drought duration and severity. Their findings concluded that compared to the bivariate drought frequency analysis, the standard univariate frequency analysis leads to underestimation or overestimation. Karahacane *et al.* (2020), in order to enhance the design of more efficient hydraulic structures in northern Algeria, conducted complete multivariate flood frequency analysis using copula and multivariate homogeneity tests considering the flood characteristics – duration, volume, and peak flow.

To conduct RFA, data are pooled from homogeneous regions and analyzed using frequency analysis. Given the multivariate nature of drought, both its severity and duration must be considered in characterizing its impact. Therefore, calculating bivariate return periods is more effective than using only univariate return periods. However, fitting classical bivariate distribution functions is challenging which is rooted in the fact that the marginal distribution functions of drought severity and duration may not be identical (Shiau 2006). This limitation has led water resources experts to explore copula functions as an alternative approach (Sadri & Burn 2014).

Copulas, introduced by Sklar (1973) and developed by Nelsen (2007), connect multivariate probability distributions to their one-dimensional marginal distribution while considering the existence of dependence and correlation among the random variables (Shiau 2006). These functions are very flexible in studying the correlation structure of continuous random variables, and their efficiency in hydrology has been shown by many researchers (Sadri & Burn 2014; Yu *et al.* 2014; Montaseri *et al.* 2018; Nabaei *et al.* 2019; Abbasian *et al.* 2021). Copulas do not face the limitations of classical multivariate distribution functions and offer the possibility of separating the study of the correlation structure of variables from the study of margins (Shiau & Modarres 2009; Mirabbasi *et al.* 2012). A copula can be utilized to separately model the marginal distributions of severity and duration, enabling each variable to have its distribution. The copula function then links these individual distributions to create a joint distribution, which captures any relationship between the variables. This can help insurers analyse frequency more accurately.

Iran is home to vast arid and semi-arid areas. In 1999–2002, Iran experienced one of the worst droughts in its history, according to UN-SPIDER^{1} . The severe lack of precipitation resulted in a total loss of $3:5 billion, the death of 800,000 livestock, and the drying up of major reservoirs and inland lakes (Amoli *et al.* 2015). Given the devastating impact of this drought, it is important to understand its characteristics and estimate the return period of such extreme events. Moreover, a significant number of studies have shown that bivariate tests outperform univariate ones in terms of both regionalization and frequency analysis. Therefore, utilizing the tests is a viable option for investigating regional frequency analysis in studies. This study applied bivariate RFA in three sub-basins: Karkheh, Karun, and Jarahi-Zohra, to estimate the bivariate return period for the most severe drought in the last four decades. This study divided the meteorological stations into three, four, and five clusters using FCM. Then, the initial clusters were adjusted with the help of the bivariate discordancy test. After that, the homogeneity of the formed regions was examined by the bivariate homogeneity test based on which all three clusters were compared. Eventually, the copula extracted and analyzed the joint return periods of the most severe droughts in each region.

## STUDY AREA AND DATA

Generally, two types of data are of interest to identify regions in regional hydrology studies: (1) at-site statistics based on measurements related to drought characteristics and (2) other site descriptors known as site characteristics. Site characteristics are physical properties of the site that can be identified even before any data are measured. These usually include the site’s geographical location, elevation, and other physical attributes. Mean Annual Precipitation (MAP) can also be considered a site characteristic since it is not necessarily directly related to the probability distribution (Hosking & Wallis (1997)).

^{2}Figure 1 shows the location of the 59 stations within the study area.

## METHODS

### Standardized precipitation index

The SPI is the most appropriate index for meteorological drought investigation (Xu *et al.* 2015; Nabaei *et al.* 2019). According to McKee *et al.* (1993), to measure the SPI time series, the long-term precipitation records were first ditted to a gamma distribution and subsequently to a normal distribution. In this study, a 6-month SPI was conducted (Svoboda *et al.* 2012). However, different timescales do not affect regionalization results (Raziei *et al.* 2012; Portela *et al.* 2015). Therefore, assuming the truncation level is equal to 0 (Shiau 2006), duration (the number of months that a drought event lasts) and severity (a cumulative deficiency of SPI during a drought event) was extracted by the Run Theory (Yevjevich 1967).

### Formation of initial clusters

*et al.*2013). FCM developed by Dunn (1974) and Bezdek

*et al.*(1984) is a soft clustering algorithm in which a data point belongs to every cluster with different degrees of membership (Sadri & Burn 2014; Zhang

*et al.*2015; Goyal & Sharma 2016). The objective function which has to be minimized in the FCM algorithm is as follows:in which is the number of data, is the number of clusters, denotes the center of the ith cluster, is the fuzzifier, and is the membership degree of data point in the cluster .

In this study, longitude, latitude, and MAP were selected as inputs for initial clustering (Supplement, Table 6). It is noteworthy that the elevation of sites is usually correlated with MAP. The fuzziness of the FCM leads to different estimates for the same station, where by using the weighted average, a more reliable estimation of the bivariate return period can be provided for this category of stations.

In this study, initially, three clusters were created. Then, four and five clusters were added to investigate the possibility of having less discordant sites, which do not belong to any region. We did not pursue more clusters because increasing the number of clusters results in smaller region sizes and less data, which can lead to less reliable estimations.

### L-moments

Developed by Hosking & Wallis (1993), the L-moment approach is an alternative system for describing probability distribution function. According to Hosking & Wallis (1997), L-moments have three important applications in RFA:

Discordancy test: to identify discordant sites

Homogeneity test: to assess if the region is homogeneous

Fitting statistical distribution candidates and evaluating goodness-of-fit

^{3}for a random variable with distribution , are matrices whose elements are calculated (Chebana & Ouarda 2007).

### Bivariate discordancy test

*site characteristics*, while

*at-site statistics*are preferred for subsequent testing of the homogeneity of a proposed set of regions (Sadri & Burn 2011). Following the formation of the initial regions, identifying discordant sites whose L-moments significantly deviate from the region’s average L-moments is necessary to maintain the region’s homogeneity (Hosking & Wallis 1997). The bivariate discordancy test proposed by Chebana & Ouarda (2007) for site is written as . For a region with N sites, the matrix , defined as . The region’s unweighted average is thus:Assume thatThe matrix is defined as follows:To transform the matrix from multidimensional space to the real line, it is possible to use a norm.

The critical value depends on the number of sites in the region. It is suggested that exposes a discordant site.

In this study, the bivariate discordancy test was performed for each site in each cluster. It should be noted that removing a site changes the mean of the L-moments. Therefore, discordant sites were removed one by one until all sites in the group had a discordancy index value below the critical value. After all, the belongingness of the discordant sites to other clusters were checked.

### Bivariate homogeneity test

*et al.*(2017)):where is the norm of matrix ; and is the L-covariation coefficient matrix for site i with record length .

*et al.*2017).The critical values for H are recommended as below:

### Marginal distribution

Five widely used three-parameter probability distribution functions in hydrology, including (1) Generalized Logistic, (2) Generalized Extreme-value, (3) Generalized Normal, (4) Pearson type III, and (5) Generalized Pareto, were candidates for selecting the best marginal distribution functions for the two variables of drought severity and duration.

Regarding the calculation of the goodness of fit of the drought severity, the statistic at a significance level of 10% was chosen, which was previously used in studies of regional floods and droughts (Hosking & Wallis 1997; Sadri & Burn 2014). However, due to the limited variety of drought duration values, the Kolmogorov–Smirnov (KS) test was used in this study to calculate the goodness-of-fit of duration.

The parameters of the marginal distribution functions were estimated by L-moments.

### Copula

Copulas are functions that connect multivariate probability distributions to their one-dimensional marginal probability distribution, while considering the correlation between random variables. Let and be fitted by probability density functions and respectively; then CDF of the variables, and , are the variables with the univariate pdf according to Sklar (1973). If and are marginal distribution for continuously, there exists a uniqe Copula which is a CDF and its margins are univariate.

*et al.*2019; Mortuza

*et al.*2019; Karahacane

*et al.*2020), and in this study were candidates for joint probability distribution functions. The best copula was selected based on the maximum pseudo-likelihood. Also, the candidate parameters were estimated by maximum pseudo-likelihood.

All computations and statistical analyses and graphs in this study were performed using the ‘R’ software.

## RESULTS AND DISCUSSION

### Formation of initial clusters

As previously stated, sites in a cluster may not necessarily follow the same frequency distribution. Therefore, it is necessary to apply the discordancy and homogeneity test to each region to assess this issue.

### Discordancy and homogeneity tests

After the initial clustering, the discordant sites were identified and removed, performing the bivariate discordancy test. After removing the discordant sites, their membership in other clusters was checked. The adjusted clusters, , , and , shown in Figure 3, were obtained after applying the discordancy test. Adjusting the clusters resulted in a decrease in the density of sites in the regions, particularly in the Karun sub-basin (Figure 3). Moreover, the total number of discordant sites belonging to no region decreased as the number of clusters increased. The results for the adjusted clusters are presented in Table 1. In this study, the results of modified clustering analysis showed that increasing the number of clusters led to a decrease in the total number of removed stations, but did not significantly affect the size of each region. Given the fact that the discordancy test may not necessarily be sufficient to create homogeneous regions, a bivariate homogeneity test was applied to all the regions in all three cases.

. | Three clusters . | Four clusters . | Five Clusters . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | . | . | . | . | . | . | . |

6 | 7 | 6 | 14 | 7 | 10 | 6 | 32 | 3 | 22 | 4 | 12 | |

21 | 10 | 25 | 32 | 9 | 16 | 12 | 34 | 9 | 30 | 10 | 25 | |

25 | 22 | 31 | 34 | 22 | 28 | 25 | 35 | 10 | 38 | 12 | 31 | |

27 | 26 | 32 | 35 | 26 | 42 | 27 | 39 | 28 | 54 | 13 | 40 | |

44 | 28 | 35 | 39 | 38 | 52 | 44 | 47 | 51 | 55 | 24 | 44 | |

46 | 51 | 44 | 47 | 41 | 52 | 46 | 49 | 52 | 33 | 27 | 47 | |

56 | 52 | 47 | 48 | 54 | 53 | 56 | 50 | 53 | 7 | 36 | 48 | |

58 | 53 | 56 | 49 | 55 | 59 | 58 | 29 | 29 | 26 | 57 | 56 | |

39 | 54 | 13 | 50 | 50 | 21 | 58 | ||||||

H | 1.0 | 0.1 | 0.8 | 0.1 | 2.3 | -0.8 | 0.8 | 0.6 | 0.9 | |||

293 | 329 | 296 |

. | Three clusters . | Four clusters . | Five Clusters . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | . | . | . | . | . | . | . |

6 | 7 | 6 | 14 | 7 | 10 | 6 | 32 | 3 | 22 | 4 | 12 | |

21 | 10 | 25 | 32 | 9 | 16 | 12 | 34 | 9 | 30 | 10 | 25 | |

25 | 22 | 31 | 34 | 22 | 28 | 25 | 35 | 10 | 38 | 12 | 31 | |

27 | 26 | 32 | 35 | 26 | 42 | 27 | 39 | 28 | 54 | 13 | 40 | |

44 | 28 | 35 | 39 | 38 | 52 | 44 | 47 | 51 | 55 | 24 | 44 | |

46 | 51 | 44 | 47 | 41 | 52 | 46 | 49 | 52 | 33 | 27 | 47 | |

56 | 52 | 47 | 48 | 54 | 53 | 56 | 50 | 53 | 7 | 36 | 48 | |

58 | 53 | 56 | 49 | 55 | 59 | 58 | 29 | 29 | 26 | 57 | 56 | |

39 | 54 | 13 | 50 | 50 | 21 | 58 | ||||||

H | 1.0 | 0.1 | 0.8 | 0.1 | 2.3 | -0.8 | 0.8 | 0.6 | 0.9 | |||

293 | 329 | 296 |

possibility homogeneous defenitly heterogeneous

In the case of , three homogeneous regions were created, each containing nine sites. In the cases of and , two and one heterogeneous regions were detected, respectively. The homogeneity index of each region is presented in Table 1. Since the homogeneity condition is only met when nc = 3, this study applied regional drought frequency analysis to these three homogeneous regions. The size of each region is related to the concept of degree of homogeneity; smaller regions tend to have higher degrees of homogeneity. However, the small size of the region leads to insufficient data for RFA (Chebana & Ouarda 2007). As mentioned before, the aim of RFA is to improve the reliability of estimates by pooling data from multiple sites rather than just one. Larger regions generally have more data, which leads to more reliable estimates. To determine if a region is large enough, the 5T guideline is utilized where T represents the return period (Sadri & Burn 2011).

Notably, the formed regions do not coincide with the national classification of watersheds because each base is different. In other words, regionalization does not necessarily rely on the physical vicinity of weather stations; rather, it is based on the similarity in shape of their probability distributions.

At this stage, observed data from all stations within the region were aggregated and a frequency analysis of data from the entire region was performed.

### Marginal distributions

Five widely used three-parameter probability distribution functions in hydrology, including (1) Generalized Logistic, (2) Generalized Extreme-value, (3) Generalized Normal, (4) Pearson Type III, and (5) Generalized Pareto, were candidates to fit as marginal distribution functions for the two variables of drought severity and duration.

The statistic was used to calculate the goodness-of-fit of the drought severity. As noted in Table 2, the distribution function statistics for the three-parameter candidates are not within the acceptable range in all three regions. It is practical to use the Wakeby distribution function as the regional distribution for regions that do not completely meet the homogeneity (Hosking & Wallis 1993). Therefore, five-parameter Wakeby distribution was used as a regional distribution. Table 2 also shows the parameters of the distributions.

Region . | Variable . | . | Distribution . | Best-fitted . | . | . | Parameters . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

. | glo . | gev . | gno . | pe3 . | gpa . | distribution . | . | . | . | . | . | ||

S | 3.37 | 4.70 | 5.00 | 5.63 | 7.75 | ||||||||

1.785 | 9.525 | 3.804 | 0.919 | 0.270 | |||||||||

D | KS | 0.28 | 0.28 | 0.28 | 0.28 | 0.27 | glo | ||||||

RMSE | 0.005 | 0.009 | 0.011 | 0.014 | 0.024 | 5.757 | 1.093 | 0.217 | |||||

S | 2.52 | 3.71 | 4.15 | 4.97 | 6.61 | ||||||||

1.466 | 10.935 | 4.486 | 1.211 | 0.242 | |||||||||

D | KS | 0.23 | 0.23 | 0.23 | 0.26 | 0.31 | glo | ||||||

RMSE | 0.003 | 0.006 | 0.008 | 0.13 | 0.017 | 5.697 | 1.159 | 0.279 | |||||

S | 1.65 | 3.37 | 3.52 | 4.00 | 7.13 | ||||||||

1.652 | 9.411 | 3.61 | 0.993 | 0.229 | |||||||||

D | KS | 0.23 | 0.30 | 0.30 | 0.30 | 0.27 | glo | ||||||

RMSE | 0.004 | 0.008 | 0.009 | 0.011 | 0.020 | 5.784 | 1.056 | 0.207 |

Region . | Variable . | . | Distribution . | Best-fitted . | . | . | Parameters . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

. | glo . | gev . | gno . | pe3 . | gpa . | distribution . | . | . | . | . | . | ||

S | 3.37 | 4.70 | 5.00 | 5.63 | 7.75 | ||||||||

1.785 | 9.525 | 3.804 | 0.919 | 0.270 | |||||||||

D | KS | 0.28 | 0.28 | 0.28 | 0.28 | 0.27 | glo | ||||||

RMSE | 0.005 | 0.009 | 0.011 | 0.014 | 0.024 | 5.757 | 1.093 | 0.217 | |||||

S | 2.52 | 3.71 | 4.15 | 4.97 | 6.61 | ||||||||

1.466 | 10.935 | 4.486 | 1.211 | 0.242 | |||||||||

D | KS | 0.23 | 0.23 | 0.23 | 0.26 | 0.31 | glo | ||||||

RMSE | 0.003 | 0.006 | 0.008 | 0.13 | 0.017 | 5.697 | 1.159 | 0.279 | |||||

S | 1.65 | 3.37 | 3.52 | 4.00 | 7.13 | ||||||||

1.652 | 9.411 | 3.61 | 0.993 | 0.229 | |||||||||

D | KS | 0.23 | 0.30 | 0.30 | 0.30 | 0.27 | glo | ||||||

RMSE | 0.004 | 0.008 | 0.009 | 0.011 | 0.020 | 5.784 | 1.056 | 0.207 |

Since no three-parameter distribution was in the acceptable range concerning the goodness-of-fit criterion, the five-parameter Wakeby was fitted to the data as the most appropriate choice

In terms of duration, the KS statistical method is used in this study. The critical and extracted values are summarized in Table 2 for all three regions. According to that, all candidate drought duration distribution functions are within the acceptable range for all three regions. Therefore, the criterion for choosing the best distribution function is the Root Mean Square Error (RMSE) between theoretical and experimental probabilities. Finally, the Generalized Logistic function was chosen as the best distribution function for all three regions. The parameters of the distribution function chosen for the region are provided in Table 2. These parameters were estimated by L-moments.

### Copula

In this study, the Frank, Clayton, and Gumble families of copulas are candidates for joint probability distribution functions. Based on the maximum pseudo-likelihood results, Clayton was chosen in all three regions. The candidate parameters are also estimated by maximum pseudo-likelihood (Table 3).

Region . | . | Frank . | Clyton . | Gumbel . |
---|---|---|---|---|

8.938 | 2.581 | 2.794 | ||

mlp | 0.059 | 0.139 | 0.051 | |

8.316 | 2.540 | 2.658 | ||

mlp | 0.059 | 0.342 | 0.012 | |

8.416 | 2.434 | 2.651 | ||

mlp | 0.051 | 0.242 | 0.025 |

Region . | . | Frank . | Clyton . | Gumbel . |
---|---|---|---|---|

8.938 | 2.581 | 2.794 | ||

mlp | 0.059 | 0.139 | 0.051 | |

8.316 | 2.540 | 2.658 | ||

mlp | 0.059 | 0.342 | 0.012 | |

8.416 | 2.434 | 2.651 | ||

mlp | 0.051 | 0.242 | 0.025 |

The first quantile among the most extreme drought recorded at each site was identified, characterized by a duration of 16 months and a severity index of 12.7. The return period for each site was determined through regional frequency analysis, resulting in values of 63.45, 38.74, and 73.1 for Regions 1, 2, and 3, respectively. This indicates that a drought event with similar duration and severity occurs approximately every 63.5 years in the sites exclusively belonging to Region 1, every 38.7 years in Region 2, and every 73.1 years in Region 3. Notably, the return period of the same drought event in Region 2 is significantly shorter than that in Regions 1 and 3. Also, the results suggest an earlier occurrence in Region 1 compared to Region 3.

However, for the sites belonging to more than one region, such as Site Number 6, which exhibits a membership of 0.6 in Region 1 and 0.35 in Region 2, the return period was computed using a weighted average formula (), accounting for the fuzzy clustering.

These estimations are deemed reliable, given that they do not deviate significantly from for each respective region.

The second (duration 17, severity 13.9) and third (duration 19, severity 16) quantiles were not pursued, as return periods beyond those presented in Table 4 are deemed unreliable. Notably, the severity of the first quantile for extreme drought events surpasses that of the third quantile for all drought events, with a duration of 7 and a severity of 5.9.

Site.No. . | Membership . | Return Period (year) . | . |
---|---|---|---|

21-27-58-39-46 | Region 1 | 63.4 | 63 |

7-10-22-26-28-51-52-53-54 | Region 2 | 38.7 | 73 |

31-32-35-47-13 | Region 3 | 73.1 | 72 |

6 | 0.6 Region 1 | 67.0 | |

0.35 Region 3 | |||

25 | 0.45 Region 1 | 68.6 | |

0.51 Region 3 | |||

44 | 0.39 Region 1 | 69.2 | |

0.57 Region 3 | |||

56 | 0.33 Region 1 | 69.8 | |

0.64 Region 3 |

Site.No. . | Membership . | Return Period (year) . | . |
---|---|---|---|

21-27-58-39-46 | Region 1 | 63.4 | 63 |

7-10-22-26-28-51-52-53-54 | Region 2 | 38.7 | 73 |

31-32-35-47-13 | Region 3 | 73.1 | 72 |

6 | 0.6 Region 1 | 67.0 | |

0.35 Region 3 | |||

25 | 0.45 Region 1 | 68.6 | |

0.51 Region 3 | |||

44 | 0.39 Region 1 | 69.2 | |

0.57 Region 3 | |||

56 | 0.33 Region 1 | 69.8 | |

0.64 Region 3 |

Assuming that the conclusion is correct, it must be represented in the sites in close proximity while having different MAPs. For instance, Sites 27 and 28 are adjacent in terms of longitude and latitude, whereas the MAP of Site 28, at 733 meters, is higher than that of Site 27, which is 490 meters. Consequently, it is anticipated that Site 28 has a shorter return period. Table 4 illustrates that the return period of Site 28 (38.7) is indeed shorter than that of Site 27 (63.4). Similar findings apply to Sites 54 and 58.

Moreover, an examination of Sites 25 and 31, which are geographically close and have similar MAPs (386 and 362, respectively), reveals return periods of approximately 68.6 and 73.1, respectively, with the former being slightly shorter. These comparisons substantiate the assertion that sites with higher MAPs undergo a shorter return period for the same drought event.

This claim can also be deducted through an analysis of the standard deviation of MAP, Longitude, and Latitude. As depicted in Table 5, the standard deviation of MAP for each group is significantly lower than that of Longitude and Latitude. This observation affirms that sites within a region are more closely associated with similar MAPs than with horizontal distances. It is crucial to emphasize that all inputs underwent normalization and standardization, exhibiting a standard deviation of 1 and an average of 0 before the clustering analysis.

Region . | MAP . | Longitude . | Latitude . |
---|---|---|---|

0.3610587 | 0.9044799 | 0.8587007 | |

0.3717357 | 0.7313328 | 0.9435776 | |

0.469 | 0.784707 | 0.947184 |

Region . | MAP . | Longitude . | Latitude . |
---|---|---|---|

0.3610587 | 0.9044799 | 0.8587007 | |

0.3717357 | 0.7313328 | 0.9435776 | |

0.469 | 0.784707 | 0.947184 |

## LIMITATIONS

In the course of this investigation, a substantial number of sites were identified that did not belong to any region. Consequently, there arises an imperative need to explore innovative strategies aimed at minimizing the exclusion of such sites.

It is crucial to emphasize that the reliability of this approach cannot be corroborated by at-site frequency analysis results, primarily due to the limited length of the at-site records. The short length of these records precludes conclusive confirmation of the obtained results.

To fortify the robustness of our model, alternative validation methods, including data-driven models and machine learning approaches, warrant exploration. Furthermore, expanding the application of this methodology to larger geographical scales, such as continental or global domains, would provide valuable insights. These propositions present intriguing avenues for future investigations, and their pursuit would contribute significantly to the refinement and generalizability of the findings.

## CONCLUSION

This study utilizes bivariate RFA to estimate the return period of a severe drought in Iran across three catchments (Karkheh, Karun, and Jarahi-Zohreh) with increased reliability. The fuzzy c-means clustering method was used to cluster 59 meteorological stations into regions, followed by bivariate discordancy and homogeneity tests using multivariate L-moments to investigate the homogeneity of the regions.

It was observed that the total number of discordant sites was higher for than for . It was similarly higher for compared to . Thus, as expected, increasing the number of clusters reduces the total number of eliminated sites. In fact, when fewer clusters are considered, there would be more sites that do not belong to any of the clusters, but it does not necessarily lead to more or fewer sites in each cluster. Although it may be expected that as the number of clusters increases, the size of regions decreases, it did not vary significantly for any clusters. Therefore, changing the number of clusters does not necessarily affect region size.

Only for the nc 3 clustering, all regions were homogeneous. The results showed that combining the initial clustering method and discordancy test does not necessarily lead to the formation of homogeneous regions, which is a serious weakness. In this study, the three-parameter Generalized Logistic and the five-parameter Wakeby were the best-fitted distribution functions for the regional drought duration and severity variables, respectively. Regarding copula and drought duration, the regions were homogeneous. However, it was not the case regarding the severity since no three-parameter marginal distribution function was fitted to the drought severity variable in any of the regions. In other words, satisfying the bivariate homogeneity index does not necessarily result in region homogeneity regarding the marginal distribution functions. However, the homogeneity of the marginal distribution function is still necessary. Clayton was the most appropriate copula function for all three regions.

The approach of regional frequency analysis, focusing on probability distribution similarities rather than physical proximity, has provided reliable estimations for each region. The distinct temporal patterns in return periods across Regions 1, 2, and 3 underline the variability of drought patterns.

Given the lower return period in Region 2 compared to Regions 1 and 3, it is deduced that sites with higher MAP and, correspondingly, higher elevation are more prone to experiencing shorter return periods of same drought events, in contrast to sites with lower MAP or lower elevation.

In summary, this study advances our understanding of regionalized drought patterns. Still, further investigations with extended data and alternative methodologies are necessary for a more comprehensive grasp. The findings hold practical significance for policymakers and researchers alike, underscoring the importance of considering regional nuances in drought risk assessments and water resource management strategies.

However, it’s crucial to acknowledge limitations, particularly the reliance on regional frequency analysis due to the short duration of at-site records. This underscores the need for future research with extended datasets to enhance the robustness of our conclusions.

## CODE AVAILABILITY

The code was written in R programming by the original author for the Masters thesis at Shahid Beheshti University in Iran. The availability of the code requires the consent of the other authors and the university.

## DATA AVAILABILITY STATEMENT

All relevant data are available in the supplementary material.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

‘co’ indicates covariance and ‘L’ indicates linear.

## REFERENCES

*Proceedings of the 8th Conference on Applied Climatology*Vol 17. Boston, pp. 179–183

*World Meteorological Organization Geneva, Switzerland*