Multivariate groundwater drought analysis using copulas

Drought characteristics are among major inputs in the planning and management of water resources. Although numerous studies on probabilistic aspects of meteorological drought characteristics and their joint distribution functions have been reported, multivariate analysis of groundwater (GW) drought is rarely available. In this paper, while proposing a framework for statistical analysis of disturbed hydrological systems, copula-based multivariate GW drought analysis was performed in an over-drafted aquifer. For this purpose, a 1,000-year synthetic time series of naturalized GW level was produced. GW drought was monitored via the SGI index while the multivariate GW drought probability and return period were determined via copulas. Comparison between the copula and empirical GW drought probabilities using statistical goodness-of- ﬁ t tests proved suf ﬁ cient accuracy of copula models in multivariate drought analysis. The results showed strong dependence among GW drought characteristics. Generally speaking, multivariate GW drought analysis incorporates major drought characteristics and provides concrete scienti ﬁ c basis for planning drought management strategies.


INTRODUCTION
Drought is a natural hazard affecting society, economy, environment, agriculture, and industry. Groundwater (GW) drought is a type of drought that adversely affects the GW resources and leads to lack of public water supply (Bloomfield & Marchant ).
There are a number of indices for GW drought monitoring. In particular, one of the indices whose variants has been used to describe GW drought is the Standardized Precipitation Index (SPI). Although SPI is preliminarily proposed to monitor meteorological drought, adoption of its concept to GW drought monitoring has increased. In some studies, GW drought monitoring with this index was questioned because of its pitfalls and limitations (e.g. Kummar et al.
The SGI is calculated via a non-parametric normal score convert of GW level. They calculated the SGI in their case study and compared the SGI results with those of the SPI.
They found that there is a good conformation between SGI time series and formerly independently documented GW droughts. Also, relations between SGI and SPI were dependent to a range of SPI accumulations.
Li & Rodell () calculated the GW Drought Index (GWI) on the basis of GW storage simulated by the Catchment Land Surface Model (CLSM) and compared the results to those computed based on the observation boreholes data. They found that the GWI derived from CLSM had a strong relationship with the GWI calculated from the observation boreholes data. The authors also found high correlation between both GWIs with the meteorological SPI-12 and SPI-24 drought indices.
Since droughts are complicated natural events, using a single monitoring index cannot provide a comprehensive estimate (Shiau et al. ). In several studies, multivariate parameters were used in drought analysis in lieu of using one variable. Multivariate drought analysis provides important information for water resources management and planning (Chen et al. ; Ma et al. ). For this purpose, copula functions were adopted. In recent years, the use of copula functions in multivariate hydrologic analysis has increased significantly. Numerous studies on flood analysis (Genest et al. ; Shiau et al. ), rainfall frequency analysis (Grimaldi & Serinaldi ), storm analysis (Salvadori & De Michele ), and fitting drought characteristics (Shiau )  () used bivariate drought characteristics including drought severity and duration via copulas and calculated the bivariate probability and return period of hydrological droughts. They found that drought severity was well correlated with drought duration. In another study, Chen et al.
() calculated meteorological drought characteristics including drought severity, duration, interval, and the minimum SPI. Then, they used several copulas to construct multivariate joint distributions and calculated drought probability and return period. They found that the Student copula was more suitable in drought analysis. Saghafian & Mehdikhani () monitored meteorological drought via the SPI index. They calculated joint distribution functions of duration, severity and peak of drought characteristics.
Six copulas were tested and the best-fitted copula was selected. It was indicated that copulas were useful to describe the correspondence between drought variables.
They also calculated the copula-based three-variate drought return period. Hao et al. () defined the drought in Southwest China by the 3-month Standardized Precipitation Evapotranspiration Index (SPEI). They used multivariate drought characteristics including drought duration, severity and peak via copulas and calculated the multivariate probability and return period of hydrological droughts. They showed that the Student t copula was a powerful function in drought analysis. Dehghani et al. () used Archimedean copulas to model the dependence structure between the hydrological drought index (SPI) and the meteorological drought index (SHDI). They found that copulas were appropriate in drought class transition forecasting.
As pointed out, many studies have reported on multivariate meteorological drought statistical analysis.
However, similar studies on GW drought are lacking.
Most of the work on GW drought has been performed based on observed GW level in plains with no or minimal abstraction (e.g. Bloomfield et al. ) and no multivariate drought analysis has been reported to our knowledge. However, in regions with high abstractions, aquifers are intensively influenced by human activities whereas the definition of drought concerns natural variability. Thus, GW drought may be performed on the aquifer time series after removing the impact of human interventions. Moreover, the duration/frequency of GW droughts are much longer/ lower than those of meteorological droughts. As a result, naturalized aquifer time series generation for GW drought frequency analysis is necessary.
Dealing with GW drought, this study was apart from previous studies in that it conducted a number of interrelated tasks involving naturalization of groundwater level, generation of GW level long time series required for GW drought analysis in an over-drafted aquifer, GW drought monitoring via SGI index, selecting the best family copula based on GW drought characteristics, multivariate GW drought analysis, and comparison with univariate analysis.
For this purpose, a framework was proposed for multivariate drought analysis in a disturbed hydrological system using copulas. Based on this framework, a 1,000-year artificial time series of monthly naturalized GW level was generated via the MODFLOW-ANN modeling framework as proposed by Sanginabadi et al. (). They monitored GW drought using SGI as determined based on the time series of naturalized GW level while drought characteristics including duration, severity and peak were determined via frequency analysis. The joint drought duration, severity, and peak were analyzed via copula functions and multivariate return period analysis of GW drought was performed.

STUDY AREA AND DATA
Qazvin Plain is one of the largest aquifers in Iran, situated in the province of Qazvin along the southern Alborz Mountains (Figure 1). Qazvin Plain water withdrawal began in 1963. The aquifer is one of unconfined single layer.
There are 44 precipitation gauges spread over the plain.
The plain's average monthly and annual precipitation is presented in Table 1.
The mean annual precipitation over the plain is 256.6 mm while the mean annual temperature is 13.5 C.
The climate of the region is arid to cold semi-arid follow-

METHODS
The framework that we propose for multivariate drought analysis in a disturbed hydrological system is depicted in Figure 2.
The main requirements in this framework are the hydrometeorological observed data in order to produce the naturalized state of the system. Also, a long-time series of naturalized state of the system is needed for proper drought analysis. Therefore, the generation of a synthetic time series of the groundwater level is often inevitable. In the following, drought monitoring is performed based on the naturalized condition so that the drought characteristics may be determined. As multivariate drought may be conducted via copulas, selection of the best copula family and copula type was to be performed based on the type of the hydrological system, drought properties, and goodness-of-fit criteria. The procedure is described below.

Naturalization of GW level
Drought is caused by natural variability in hydro-climatological time series and must be analyzed via natural series.
Since GW level time series is severely influenced by water withdrawals, the GW level should first be naturalized for drought analysis purposes. Naturalized GW level is the level that is formed due to natural meteorological and hydrological drivers in the absence of anthropogenic effects; i.e. They discretized the plain into a total of 9,028 cells, 1,000 × 1,000 m 2 , while the boundaries were defined following the water table contour map. Parallel boundaries with the contours were determined as time-variant specified head boundaries. Others that were perpendicular to the contours were defined as no-flow boundaries. River network characteristics were also collected and input to the model. River surface water head was determined via stage-discharge  A 1,000-year period was generated by ANN by setting the abstraction to zero; thus producing the naturalized spatially-averaged GW level.
In this study, using the results provided by Sanginabadi et al. (), the naturalized GW level time series over a 1,000-year period was adopted for further drought analysis as discussed below.

Precipitation analysis
Monthly precipitation time series were obtained for 44 precipitation gauges over a 50-year period from 1966 to 2016.
Then, Spearman's correlation coefficient was estimated for observed precipitation records. The P value of the Spearman's correlation between two successive months is presented in Table 2 and shows that the precipitation of two successive months are independent, at 5% significant level. So, synthetic monthly precipitations were generated via assuming random independent variables.
Next, for each month, 48 statistical distribution functions were fitted to the monthly precipitation time series which showed that Gamma was the best distribution.
Gamma distribution has been strongly recommended in previous studies due to its flexibility in describing rainfall data year monthly precipitation was generated following the calculated distribution parameters.

Streamflow analysis
For generating a 1,000-year synthetic time series of monthly streamflow, the time series of historic monthly precipitation (spatial average) and monthly streamflow were considered over the 1966-1999 period. In this period, streamflow series did not indicate a trend. Then, an ANN model was trained with historic precipitation as input and monthly corresponding streamflow volume as output. Several ANN architectures and transfer functions were tested under feed-forward standard back propagation training.
Next, synthetic monthly streamflow time series for 1,000 years was generated via the trained ANN model using the 1,000-year generated precipitation as input.
Monthly evapotranspiration is an input to the MOD-

GW drought index
In this study, GW drought was monitored via the SGI index. Anderson-Darling, and chi-square tests and the best distribution was identified for each month.

Drought characteristics
Studied drought characteristics are duration, severity, and peak as defined based on the run theory proposed by Yevjevich (1967). Drought duration (denoted by D in Figure 3) is usually defined as the time between the initiation and termination time of a drought. Drought severity (denoted by S in Figure 3) is the cumulative deficiency of a single drought event below a critical (threshold) level.
Drought peak is the maximum absolute value of X t below the critical level during a drought event (minimum SGI values). Drought characteristics are shown in Figure 3 in which X 0 is a threshold below which the drought occurs

Copula functions
A copula is a multivariate distribution defined on the unit hypercube According to Sklar (), if there are n random variables, X 1 , X 2 ,…, X n with marginal distribution F 1 (x 1 ), F 2 (x 2 ),…, F n (x n ), then the n-dimensional joint cumulative distribution function is determined as follows: where C: [0,1] n →[0,1] is a copula.
There are many copula types varying on tail dependence and symmetry. Generally, copulas have been divided into three families: Archimedean, Elliptical, and other types such as EFGM copulas and Extreme Value copulas (Li et al. ). The properties of different copula types are listed in Table 3.
Among all copulas, those frequently used are Gumbel, Frank, and Clayton from the Archimedean family as well as t-copula and Normal copula from the Elliptical family (Li et al. ).
In drought hydrology, Archimedean and Elliptical copu- If the marginal distribution functions are continuous, the copula function C is unique. Thus, the multivariate probability density function is calculated via individual probability density and copula functions, as expressed by: where F k (x k ) ¼ u k for k ¼ 1,…,n.

Copula parameter determination
The parameters of copulas may be calculated via the maximum pseudo-likelihood method (Chen & Guo ). In this method, the parametric marginal distributions are substituted via empirical or rank-based marginal distributions.
The copula parameters are estimated via the maximization of pseudo log-likelihood function. Given a random sample {(X i1 , X i2 , . . . , X id ): i ¼ 1,…, n}, under {C θ ; θ ∈ Θ ⊆ R p }, the procedure involves calculating the pseudo log-likelihood as follows: in whichÛ id ¼ R id n þ 1 , and R id is the Ranked data of X id Then, copula parameter is calculated as: R software packages were applied to calculate the copula parameters in this study.

Goodness of fit tests
The log likelihood function was used to select the best copula via the Akaike information criteria (AIC) and Bayesian information criteria (BIC) through the following equations (Akaike ): where L(θ) is the Likelihood function, k is the number of copula parameters, and n is the number of data. The copula with minimum AIC or BIC is the best copula (Akaike ).
Moreover, two goodness-of-fit tests based on empirical copula, including SnB and SnC criteria (Genest et al.

)
, were applied in order to assess the copulas. In this research, R software packages were used to calculate these criteria.

Drought return period
The univariate drought return period may be determined as follows (Shiau ; Chen et al. ): where F D (d), F S (s) and F P (p) are the cumulative marginal distribution function of drought duration, severity and peak, respectively. E(I ) is the expected drought inter-arrival time.
Since drought is a natural event, univariate frequency analysis may lead to erroneous results in drought risk calculations. So, drought should be considered as a multivariate phenomenon specified by drought duration, drought severity, and drought peak (Saghafian & Mehdikhani ). Furthermore, a system failure may occur when one or more characteristics of drought are less than one or more certain threshold(s). Thus, determination of drought risk and drought return period based on multivariate analysis is for drought prediction and decision making. Also, drought return period has been studied in previous studies So, the multivariate return period of droughts are calculated in two shapes: return period for D ! d or S ! s or P ! p, T or , and return period for D ! d and S ! s and P ! p, T and , as below: According to Equations (1) and (11)

GW level time series naturalization and generation
In this study, a 1,000-year time series of naturalized GW level was generated. For this purpose, a 1,000-year time series of input data to the MODFLOW-ANN model is required. First, such data was generated for the precipitation via the procedure mentioned previously.
Results showed that the Gamma distribution was the best distribution to fit the monthly precipitation data that satisfies the Kolmogorov-Smirnov and chi-squared tests at p ¼ 0.05 significance level. Then, based on the fitted Gamma distributions for each month, 1,000 random time series for each calendar month were generated. As Table 4 shows, the statistical characteristics of observed precipitation data are preserved in the generation process. The results also showed that the trained precipitation-  (Table 6).
According to Table 6, GW drought duration varied from two months to 20 years in Qazvin Plain while drought severity varied widely.
The average time between two consecutive GW droughts was 26 months. Comparing this time with the average drought duration in Table 6 indicates that the aquifer was in dry condition for about 50% of the time.
In this study, 60 distributions were fitted on drought characteristics and the best distribution was selected via  Table 7.
Figures 5-7 indicate that GW droughts with smaller duration, severity and peak have more probability. Also, the P-P plots in these figures show that selected distributions fit the drought characteristics well.

Copula functions
To select the appropriate copula and determine the probability of GW drought, it is necessary to evaluate the dependence between drought duration, severity and peak.
According to Table 8, GW drought characteristics are strongly dependent.
Moreover, the form of dependence between drought characteristics may be studied using the pairwise scatter-   duration-peak and severity-peak. Duration-severity and duration-peak have upper tail dependence while severity-peak has lower tail dependence. Besides, groundwater level is a continuous variable and its structure is generally asymmetric. So, for groundwater drought analysis, it is necessary to use copula families that have the ability of positive dependence, low to high tail dependence, and asymmetry structure. Accordingly, this study took advantage   of Archimedean and Elliptical families of copula to cover these specifications (Table 3). Five copulas of these families, including Frank, Gumbel, Clayton, Normal, and Student t, were applied.
The values of parameter θ, AIC and BIC for durationseverity, severity-peak and duration-peak pairs corresponding to five tested copulas are presented in Table 9, showing that for duration-peak, duration-severity, severitypeak, and duration-severity-peak the values of the AIC and BIC criteria of Frank, Student t, Normal, and Normal copulas are less than those of other copulas, respectively. Thus,    The results are shown in Table 10. Tables 9 and 10 imply that Student t, Student t, Normal, and Normal copulas were the best for duration-severity, duration-peak, severitypeak, and duration-severity-peak analysis, respectively.
Hence, these copulas were selected and used to determine the dependence structure among drought characteristics.
In the next step, the joint probability distribution of GW drought characteristics was estimated. For example, Figure 9 shows the duration-severity cumulative and density probability distribution. To evaluate the proposed model, three-variate empirical probabilities of historic GW drought events- were compared with the derived copula probabilities. The empirical probability is the ratio of the number of the outcomes (droughts with to the total number of trials (total droughts). Derived copula probabilities were calculated using selected copulas and Equation (13). In order to measure the copula accuracy, R 2 , root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), and mean absolute error (MAE) criteria between the empirical and copula-derived probabilities were computed as presented in Table 11. Values of the criteria in Table 11 indicate sufficient accuracy of the copula model. Furthermore, the duration, peak, severity, and return period relationships were determined. GW drought duration, severity and peak corresponding to various univariate and multivariate return periods are listed in Table 12.
Comparison among univariate and multivariate drought return periods in Table 12 indicates that all univariate return periods fall between the corresponding multivariate return   (11) and (12).
Multivariate joint distribution of GW drought involves all major drought aspects. Since for a given drought duration, severity and peak, the difference between the univariate   Table 12 indicates that the multivariate joint GW drought return  period grows larger as the duration, severity or peak increase.
For the same value of duration, severity and peak, T and (Equation (15)) is greater than T or (Equation (14)); the difference significantly grows with an increase in drought duration, severity and peak.
In Figure 10 the relationship between duration, severity and return period of GW drought using three-variable copula function is shown. The uniform distance between contours in Figure 10 indicates regular changes in the drought return period with changes in drought severity and duration.
Also, in Figures 11-13, the relationship between severity, peak, and return period of GW drought corresponding to droughts with two, five and 10-year duration via threevariate copula are shown.
According to Figures 10-13, the joint return period grows larger as the duration or severity increases. Thus, GW droughts with higher severity or longer duration have greater return periods. Moreover, comparison among Figures 11-13 indicates that the contour curvature of the severity-peakreturn period are similar for different GW drought durations.
According to the naturalized GW SGI, the aquifer experienced four periods of GW drought over the 40-year period.
The characteristics of such droughts including drought duration, severity, and peak are presented in The results of GW drought analysis conducted in this study indicated that the multivariate approach is more comprehensive compared with the univariate analysis.
Accordingly, the multivariate estimation of probability and return period provide helpful tools in drought prediction and decision making.

CONCLUSIONS
GW drought is a natural event that has a severe impact on the society, economy, environment, and industry. The impact is much more severe in semi-arid and arid regions where most water demand is met by GW, a major supply source that is under-studied in terms of drought quantification. Furthermore, droughts have various characteristics, the main of which are duration, severity, and peak. As a result, GW drought analysis must incorporate multivariate drought aspects.
In this research, we proposed a framework for multivariate GW drought analysis in a disturbed hydrological system using copulas, case study Qazvin Plain, Iran. For this purpose, a 1,000-year long naturalized GW level time series was generated in order to monitor GW drought via an SGI index. Furthermore, duration, severity and peak drought characteristics were studied via frequency analysis.
Univariate and multivariate probability and return period of historical GW drought were determined and evaluated via copula functions. The main conclusions of this study are as follows: 1. A substantial drop has occurred in the GW level of Ghazvin plain during the past 50 years due to substantial overdraft.  months under all conditions of severity is fairly high in comparison with other duration-severity combinations.
The results of multivariate GW drought analysis revealed that prolonged GW drought is to be expected.
3. Based on the GW drought characteristics, Elliptical family copulas were found suitable copula types in multivariate GW drought analysis.
4. Strong dependence was found among GW drought characteristics. Based on AIC, BIC and SnB and SnC criteria, Student t, Student t and Normal were the best copula models to characterize the dependence structure for the pairs of duration-peak, duration-severity and severity-peak, respectively.

Comparison between the copula and empirical GW
drought probabilities via statistical goodness-of-fit criteria indicated sufficient accuracy of the fitted copula models.
Thus, copula was found as a powerful tool for multivariate GW drought analysis.
6. For a given drought duration, severity and peak, the difference between the univariate return period and the corresponding multivariate drought is significant. It is expected that univariate GW drought analysis introduces errors in drought quantification and thus in mitigation measures.
7. Multivariate analysis of GW drought based on the proposed framework evaluates GW drought more comprehensively as compared with the univariate analysis. The multivariate analysis provides essential information in drought prediction and decision making.