Response of flood events to extreme precipitation: two case studies in Taihu Basin, China

Flood events are typically triggered by extreme precipitation in rain-dominant basins. In this study, to better understand the generation mechanisms and characteristics of floods, copula functions are used to analyze the response of flood events to extreme precipitation. The coincidence probabilities of typical extreme flood and precipitation events are calculated; the return periods of their arbitrary combinations are calculated, and the dangerous domains for flood control under different return periods are identified; furthermore, flood risk analysis under different extreme precipitation scenarios is performed via their conditional exceedance probabilities. The Xitiaoxi catchment (XC) and Dongtiaoxi catchment (DC) in the Zhexi Region of the Taihu Basin are selected as the study area. The results show that in four scenarios with precipitation frequencies of 80%, 90%, 93.33%, and 95%, the probabilities of a dangerous flood are 9.72%, 10.57%, 10.86%, and 11.01% in the XC, respectively, and 5.91%, 6.31%, 6.44%, and 6.51% in the DC, respectively. This study provides a practical basis and guidance for the computation of rainstorm designs, management of flood control safety, and water resource scheduling in the Taihu Basin.


INTRODUCTION
The application of copulas in hydrology can be categorized into two primary types. The first is to analyze the frequency of the multifeature attributes of a certain hydrological phenomenon, such as drought duration and intensity (Kwon & Lall ); flood duration, peak, and volume (Daneshkhah et al. ); and precipitation duration, intensity, and depth (Gao et al. ). The other is to study the combination of hydrological events at different time and space scales, such as the flood coincidence probability analysis for the mainstream and its tributaries (Gao et   However, these studies focused on the different feature attributes of a single meteorological or hydrological event in the Taihu Basin, whereas the joint distribution analysis of extreme meteorological and hydrological events has not yet been investigated. Meanwhile, the aforementioned studies pertaining to the Taihu Basin consider the average situation of the entire Taihu Basin as the study object; targeted research on the Zhexi Region, i.e., the main water source of the Taihu Lake, has not been conducted.
Therefore, the main purpose of this study is to investigate the response of flood events to extreme precipitation in the Zhexi Region of the Taihu Basin through the joint risk analy- The technical roadmap of this study is shown in Figure 1.

Study area
The Xitiaoxi catchment (XC) and Dongtiaoxi catchment (DC) are located in one of the eight hydrological subregions of the Taihu Basin, i.e., the Zhexi Region. The upstream area of the Hengtangcun hydrometric station in the XC and the upstream area of the Pingyao hydrometric station in the DC were selected as the study area (Figure 2); their drainage areas are 1,359 km² and 1,420 km², respectively. The Zhexi Region is located upstream of the Taihu Lake and is the subregion of the Taihu Basin with the highest rainfall. The inflow of the Taihu Lake from the Zhexi Region accounts for approximately 50% of the total inflow of this lake (Wang et al. ). The steep slopes and fast flow-

Dataset
The data used in this study included: (1) Daily precipitation data during the flood season (from 1965 to 2016) of 14 representative stations in the XC, including the Hengtangcun station, and 12 representative stations in the DC, including the Pingyao station.

Determination of marginal distribution
The marginal distributions of the four series were determined: (1) the maximum flood discharge during the flood season (MF) at the Hengtangcun station, (2) the areal average precipitation during the flood season (Pr) in the XC, (3) the MF at the Pingyao Station, and (4) the Pr in the DC.

Autocorrelation test
Autocorrelation refers to the degree of correlation of the same variable between two successive time intervals. Autocorrelation must be tested when analyzing a set of historical data to evaluate its randomness. The autocorrelation coefficient measures the relation between the lagged and original versions of the value in a time series and is expressed as
$$\gamma_h = \frac{\sum_{t=1}^{n-h} (x_t - \bar{x})(x_{t+h} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2},$$
where $\{x_t\}$ is the time series, $n$ the sample size, $\bar{x}$ the mean of the time series, and $h$ the lag.
In the autocorrelation plot, the upper and lower bounds for autocorrelation with significance level α are obtained as follows:
$$B = \pm z_{1-\alpha/2}\,\mathrm{SE}(\gamma_h),$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution (i.e., the inverse of its cumulative distribution function, CDF), α the significance level, SE the standard error, and $\gamma_h$ the autocorrelation coefficient. If the autocorrelation is higher (lower) than this upper (lower) bound, the null hypothesis that no autocorrelation exists at and beyond a specified lag is rejected at a significance level of α.
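As an illustrative sketch of this randomness check, the following Python fragment computes the lag-h autocorrelation coefficient and compares it with two-sided bounds. The standard error is assumed here to be 1/sqrt(n) (the white-noise approximation), the function names are illustrative, and the series values are hypothetical rather than data from this study.

```python
import math
from statistics import NormalDist, fmean


def autocorrelation(x, h):
    """Sample autocorrelation coefficient of the series x at lag h."""
    n = len(x)
    mean = fmean(x)
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    return num / denom


def white_noise_bounds(n, alpha=0.05):
    """Bounds -/+ z_{1-alpha/2} * SE, ASSUMING SE = 1/sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bound = z / math.sqrt(n)
    return -bound, bound


# Hypothetical flood-season maxima (m^3/s); not data from the study.
series = [812.0, 455.0, 630.0, 1020.0, 390.0, 705.0,
          560.0, 880.0, 475.0, 940.0, 610.0, 520.0]

lower, upper = white_noise_bounds(len(series))
for lag in range(1, 4):
    r = autocorrelation(series, lag)
    verdict = "significant" if (r > upper or r < lower) else "not significant"
    print(f"lag {lag}: r = {r:+.3f}, bounds = ({lower:+.3f}, {upper:+.3f}), {verdict}")
```

A coefficient that stays inside the bounds at every lag is consistent with treating the series as random for the purpose of univariate distribution fitting.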

Parameter estimation
The generalized extreme value (GEV), Gumbel, Pearson type III (P-III), gamma, normal, and log-normal distributions, which are widely used to describe extreme hydrological events, were selected to fit the marginal distribution of the four series. We applied the maximum likelihood (ML) estimator, which is both intuitive and flexible, to estimate the parameters. The log-likelihood function is
$$L(\Theta) = \sum_{i=1}^{n} \ln f(x_i; \theta_1, \theta_2, \ldots, \theta_k),$$
where $x_1, x_2, \ldots, x_n$ are the observations; $\theta_1, \theta_2, \ldots, \theta_k$ are the parameters of the marginal distribution; and $f(\cdot)$ is the probability density function (PDF) of the marginal distribution. The function $L(\Theta)$ is maximized over the parameter space Θ, which corresponds to solving
$$\frac{\partial L(\Theta)}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, k.$$
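A minimal sketch of ML estimation for one of the candidate distributions (Gumbel) is given below. It solves the two likelihood equations by bisection on the scale parameter; this numerical scheme is an assumption made for illustration and not necessarily the solver used in this study, and the sample values and function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def gumbel_loglik(x, mu, beta):
    """Log-likelihood sum_i ln f(x_i; mu, beta) for the Gumbel PDF."""
    z = [(v - mu) / beta for v in x]
    return -len(x) * math.log(beta) - sum(z) - sum(math.exp(-zi) for zi in z)


def gumbel_mle(x, tol=1e-10):
    """Solve dL/dmu = 0 and dL/dbeta = 0 by bisection on beta."""
    xbar, xmin = fmean(x), min(x)

    def weights(beta):
        # Shift by min(x) so the exponentials never overflow.
        return [math.exp(-(v - xmin) / beta) for v in x]

    def g(beta):
        # Root of g is the ML scale: beta = xbar - sum(x*w) / sum(w).
        w = weights(beta)
        return beta - xbar + sum(v * wi for v, wi in zip(x, w)) / sum(w)

    lo, hi = 1e-3 * stdev(x), 10.0 * stdev(x)
    while g(hi) <= 0.0:          # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    mu = xmin - beta * math.log(fmean(weights(beta)))
    return mu, beta


# Hypothetical flood-season maxima (m^3/s); not data from the study.
sample = [812.0, 455.0, 630.0, 1020.0, 390.0, 705.0,
          560.0, 880.0, 475.0, 940.0, 610.0, 520.0]

mu_hat, beta_hat = gumbel_mle(sample)
print(f"location = {mu_hat:.2f}, scale = {beta_hat:.2f}, "
      f"log-likelihood = {gumbel_loglik(sample, mu_hat, beta_hat):.3f}")
```

The same pattern (write the log-likelihood, then maximize it numerically) applies to the other candidate distributions, although those with three parameters generally need a multivariate optimizer.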

Goodness-of-fit test
The Kolmogorov-Smirnov (K-S) goodness-of-fit test was used to evaluate the performance of the candidate marginal distributions. Its statistic is expressed as
$$D = \max_{1 \le i \le n} \left| F_e(x_i) - F(x_i) \right|,$$
where $F(x_i)$ is the estimated CDF, and $F_e(x_i)$ is the empirical CDF, which is calculated using the Gringorten plotting position formula expressed as
$$F_e(x_k) = \frac{k - 0.44}{n + 0.12},$$
where $n$ is the sample size, and $k$ is the rank of the observation when the data set is sorted in increasing order. The p-value for the K-S test was estimated using Miller's approximation (Miller ).
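Moreover, the mean absolute error (MAE), probability plot correlation coefficient (PPCC), and deterministic coefficient (DMC) were used to further select the most suitable distribution. These methods are expressed as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |a_i - b_i|,$$
$$\mathrm{PPCC} = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2 \sum_{i=1}^{n} (b_i - \bar{b})^2}},$$
$$\mathrm{DMC} = 1 - \frac{\sum_{i=1}^{n} (a_i - b_i)^2}{\sum_{i=1}^{n} (a_i - \bar{a})^2},$$
where $a_i$ and $b_i$ are the empirical CDF and estimated CDF, respectively; $\bar{a}$ and $\bar{b}$ are the means of the empirical CDF and estimated CDF, respectively; and $n$ is the number of samples. The smaller the value of the MAE, the better the fitting performance. The larger the values of the PPCC and DMC, the better the fitting performance.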
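The goodness-of-fit measures can be sketched as follows. The fragment compares the Gringorten empirical CDF with a fitted CDF and reports the K-S statistic, MAE, PPCC, and DMC. Several points are assumptions: the DMC is taken in its usual form (one minus the ratio of the squared error to the variance of the empirical CDF), a normal distribution fitted with `statistics.NormalDist.from_samples` stands in for the candidate distribution, and the sample is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def gringorten(n):
    """Gringorten plotting positions (k - 0.44) / (n + 0.12), k = 1..n."""
    return [(k - 0.44) / (n + 0.12) for k in range(1, n + 1)]


def fit_metrics(sample, cdf):
    """K-S statistic, MAE, PPCC, and DMC between empirical and fitted CDFs."""
    xs = sorted(sample)
    a = gringorten(len(xs))          # empirical CDF
    b = [cdf(v) for v in xs]         # estimated CDF
    abar, bbar = fmean(a), fmean(b)
    sq_err = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    var_a = sum((ai - abar) ** 2 for ai in a)
    var_b = sum((bi - bbar) ** 2 for bi in b)
    cov_ab = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    return {
        "KS": max(abs(ai - bi) for ai, bi in zip(a, b)),
        "MAE": fmean(abs(ai - bi) for ai, bi in zip(a, b)),
        "PPCC": cov_ab / math.sqrt(var_a * var_b),
        "DMC": 1.0 - sq_err / var_a,   # assumed form of the DMC
    }


# Hypothetical flood-season maxima (m^3/s); not data from the study.
sample = [812.0, 455.0, 630.0, 1020.0, 390.0, 705.0,
          560.0, 880.0, 475.0, 940.0, 610.0, 520.0]

# Stand-in candidate: normal distribution fitted by sample mean and stdev.
candidate = NormalDist.from_samples(sample)
metrics = fit_metrics(sample, candidate.cdf)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Running `fit_metrics` once per candidate distribution and ranking the results gives the kind of comparison that a table of fit statistics summarizes.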

Establishment of copula model
The copula model used to describe the dependence structure of the MF and Pr series in the study areas was constructed by joining their marginal distributions.

Dependence evaluation
Spearman's rank correlation coefficient and the Kendall rank correlation coefficient were used to measure the dependence between the MF and Pr series; they were computed as
$$r_s = \rho_{rg_X, rg_Y} = \frac{\mathrm{cov}(rg_X, rg_Y)}{\sigma_{rg_X} \sigma_{rg_Y}},$$
$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{n(n-1)/2},$$
where $r_s$ is Spearman's rank correlation coefficient; $\rho$ is the Pearson correlation coefficient of the rank variables; $rg_X$ and $rg_Y$ are ranks; $\mathrm{cov}(rg_X, rg_Y)$ is the covariance of the rank variables; $\sigma_{rg_X}$ and $\sigma_{rg_Y}$ are the standard deviations of the rank variables; $\tau$ is the Kendall rank correlation coefficient; and $n$ is the sample size.
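Both rank-based measures follow directly from their definitions, as the short sketch below illustrates. The function names and the five-point example are hypothetical; tied values receive average ranks, which is a common convention assumed here.

```python
import math
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank variables rg_X and rg_Y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def kendall_tau(x, y):
    """(concordant - discordant) / (n (n - 1) / 2)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2.0)


# Hypothetical paired values; not data from the study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(f"Spearman r_s = {spearman(x, y):.3f}, Kendall tau = {kendall_tau(x, y):.3f}")
```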

Copula function and parameter estimation
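A copula function is a multivariate CDF that allows the marginal distribution of each variable and the connection (dependence) function to be modeled separately. For $d$-dimensional random variables $X_1, X_2, \ldots, X_d$ with marginal CDFs $F_1, F_2, \ldots, F_d$,
$$H(x_1, x_2, \ldots, x_d) = C(F_1(x_1), F_2(x_2), \ldots, F_d(x_d)) = C(u_1, u_2, \ldots, u_d),$$
where $H$ is the multivariate CDF, $C$ is the copula, and $u_i = F_i(x_i)$. The density $h(\cdot)$ of the multivariate distribution is expressed as
$$h(x_1, x_2, \ldots, x_d) = c(u_1, u_2, \ldots, u_d) \prod_{i=1}^{d} f_i(x_i),$$
where $c(\cdot)$ is the copula density and $f_i(\cdot)$ are the marginal PDFs. The copula is unique if the marginals $F_i(x_i)$ are continuous.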
The typically applied Archimedean copulas in hydrology, including four single-parameter copulas and four double-parameter copulas whose structures are more flexible, were employed in this study. They are defined in Table 1.
We used the inference function for margins (IFM) estimator to estimate the parameters of the copula model, which addressed the computational inefficiency of the ML estimator by performing the estimation in two steps. The first was the estimation of the marginal parameters, as described in the section on Determination of marginal distribution. Subsequently, the estimated marginal parameters were substituted into the log-likelihood function to obtain the estimated copula parameter.
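The log-likelihood function of the joint distribution is
$$L_C(\Theta_c) = \sum_{i=1}^{n} \ln c\left(F_1(x_{i1}; \hat{\Theta}_1), F_2(x_{i2}; \hat{\Theta}_2), \ldots, F_d(x_{id}; \hat{\Theta}_d); \Theta_c\right),$$
where $n$ is the sample size; $c(\cdot)$ is the copula density; $\Theta_c$ is the parameter vector of the copula; and $\hat{\Theta}_1, \hat{\Theta}_2, \ldots, \hat{\Theta}_d$ are the estimated parameter vectors of the marginal distributions.
The function $L_C$ is maximized over the parameter space of $\Theta_c$ to obtain the estimated copula parameter $\hat{\Theta}_c$, which corresponds to solving
$$\frac{\partial L_C(\Theta_c)}{\partial \Theta_c} = 0.$$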
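The second IFM step can be sketched for a single-parameter copula as below. The fragment assumes that the first step has already produced the marginal probabilities u and v (the values shown are hypothetical), uses the Clayton copula density as the example family, and maximizes the copula log-likelihood with a golden-section search, which is an illustrative choice of optimizer rather than the one used in this study.

```python
import math


def clayton_log_density(u, v, theta):
    """ln c(u, v; theta) for the Clayton copula, theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta)
            - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def copula_loglik(theta, u_obs, v_obs):
    """Copula log-likelihood with the marginal parameters held fixed."""
    return sum(clayton_log_density(u, v, theta) for u, v in zip(u_obs, v_obs))


def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for a maximum of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                      # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                            # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


# Hypothetical marginal probabilities from step 1 (u = F_X(x), v = F_Y(y)).
u_obs = [0.05, 0.12, 0.21, 0.33, 0.41, 0.52, 0.64, 0.73, 0.85, 0.93]
v_obs = [0.09, 0.06, 0.30, 0.25, 0.47, 0.44, 0.71, 0.58, 0.90, 0.81]

theta_hat = golden_max(lambda th: copula_loglik(th, u_obs, v_obs), 0.01, 20.0)
print(f"theta_hat = {theta_hat:.4f}, "
      f"log-likelihood = {copula_loglik(theta_hat, u_obs, v_obs):.4f}")
```

Because the marginal parameters are fixed in this step, the search is one-dimensional for single-parameter copulas; double-parameter families would need a two-dimensional optimizer.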

Goodness-of-fit test
The Cramér-von Mises test, which has been shown to be more effective than the K-S test (Mesfioui et al. ), was used to perform the goodness-of-fit test. The Cramér-von Mises statistic for copulas is expressed as
$$S_n = \sum_{i=1}^{n} \left[ C_e(U_i) - C_\Theta(U_i) \right]^2,$$
where $n$ is the sample size; $C_\Theta$ is a specified parametric family of copulas; $\Theta$ is the parameter vector estimated from the pseudo-observations $U_i = (U_{i1}, U_{i2}, \ldots, U_{id})$, $i = 1, \ldots, n$; and $C_e$ is the empirical copula expressed as
$$C_e(u) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left(U_{i1} \le u_1, U_{i2} \le u_2, \ldots, U_{id} \le u_d\right).$$
The corresponding p-values were obtained via Monte Carlo methods, the detailed procedures of which have been provided by Genest et al. ().
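A bivariate sketch of the Cramér-von Mises statistic is shown below. It assumes that pseudo-observations are rescaled ranks R/(n + 1), uses the Clayton copula with a hypothetical parameter as the fitted family, and omits the Monte Carlo step needed for the p-value. The data and function names are illustrative only.

```python
def pseudo_observations(values):
    """Rescaled ranks R_i / (n + 1); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return [r / (len(values) + 1) for r in ranks]


def empirical_copula(U, V, u, v):
    """Fraction of pseudo-observations with U_i <= u and V_i <= v."""
    return sum(1 for a, b in zip(U, V) if a <= u and b <= v) / len(U)


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cvm_statistic(U, V, copula):
    """S_n = sum_i [C_e(U_i, V_i) - C_theta(U_i, V_i)]^2."""
    return sum((empirical_copula(U, V, a, b) - copula(a, b)) ** 2
               for a, b in zip(U, V))


# Hypothetical paired series (flood peak, seasonal rainfall); not study data.
mf = [812.0, 455.0, 630.0, 1020.0, 390.0, 705.0, 560.0, 880.0, 475.0, 940.0]
pr = [1150.0, 905.0, 870.0, 1310.0, 840.0, 1010.0, 990.0, 1080.0, 930.0, 1240.0]

U, V = pseudo_observations(mf), pseudo_observations(pr)
theta = 1.5  # hypothetical fitted Clayton parameter
s_n = cvm_statistic(U, V, lambda a, b: clayton_cdf(a, b, theta))
print(f"S_n = {s_n:.4f}")
```

To turn S_n into a p-value, one would simulate many samples from the fitted copula, recompute S_n for each, and report the share of simulated values exceeding the observed one.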
Moreover, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) were applied to select the best-fitting copula model. The formulas for them are as follows:
$$\mathrm{AIC} = 2k - 2\ln L_c,$$
$$\mathrm{BIC} = k \ln n - 2\ln L_c,$$
where $L_c$ is the likelihood function of the joint distribution; $k$ is the number of parameters; and $n$ is the sample size. Generally, the smaller the AIC and BIC values, the better the distribution fits.
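The two criteria are simple to evaluate once the maximized log-likelihood is known, as in the sketch below. The log-likelihood values and copula labels are hypothetical, and n = 52 is assumed from the 1965 to 2016 record length.

```python
import math


def aic(loglik, k):
    """AIC = 2k - 2 ln L."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """BIC = k ln n - 2 ln L."""
    return k * math.log(n) - 2.0 * loglik


# Hypothetical maximized log-likelihoods: (ln L, number of parameters).
candidates = {"single-parameter copula": (6.10, 1),
              "double-parameter copula": (6.85, 2)}
n = 52  # assumed sample size (one value per flood season, 1965-2016)

for name, (loglik, k) in candidates.items():
    print(f"{name}: AIC = {aic(loglik, k):.2f}, BIC = {bic(loglik, k, n):.2f}")
```

Both criteria penalize extra parameters, so a double-parameter copula is preferred only when its gain in log-likelihood outweighs the penalty.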

Return periods and dangerous domains
The return period indicates the average time between two successive occurrences of a hydrological event and is critical for flood prevention and mitigation work. Three types of bivariate return periods, namely the joint return period ($T_{or}$), co-occurrence return period ($T_{and}$), and secondary return period ($T_{sec}$), are discussed here.
$T_{or}$ is the average interval between events in which either the MF (X) or the Pr (Y) is greater than or equal to a certain value. $T_{and}$ is the average interval between events in which both the MF (X) and the Pr (Y) are greater than or equal to a certain value.
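$T_{or}$ and $T_{and}$ are also known as the primary return periods, whose formulas are expressed as
$$T_{or} = \frac{\mu_T}{P(X \ge x \,\cup\, Y \ge y)} = \frac{\mu_T}{1 - C(F_X(x), F_Y(y))},$$
$$T_{and} = \frac{\mu_T}{P(X \ge x \,\cap\, Y \ge y)} = \frac{\mu_T}{1 - F_X(x) - F_Y(y) + C(F_X(x), F_Y(y))},$$
where $x$ and $y$ are thresholds of X and Y, respectively; $F_X(x)$ and $F_Y(y)$ are the marginal CDFs of the MF and Pr, respectively; $C(\cdot)$ is the copula CDF; and $\mu_T$ is the average inter-arrival time between two successive events ($\mu_T = 1$ for the maximum annual events).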
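The primary return periods can be evaluated directly from the copula CDF, as the following sketch shows. The Clayton copula, the parameter value, and the 10-year marginal thresholds are hypothetical choices made only to illustrate the calculation.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def t_or(u, v, copula, mu_t=1.0):
    """Joint return period: X >= x or Y >= y."""
    return mu_t / (1.0 - copula(u, v))


def t_and(u, v, copula, mu_t=1.0):
    """Co-occurrence return period: X >= x and Y >= y."""
    return mu_t / (1.0 - u - v + copula(u, v))


theta = 2.0                      # hypothetical copula parameter
copula = lambda a, b: clayton_cdf(a, b, theta)

# Thresholds with a 10-year univariate return period: F_X(x) = F_Y(y) = 0.9.
u = v = 0.9
t_univariate = 1.0 / (1.0 - u)
print(f"T_un = {t_univariate:.1f}, T_or = {t_or(u, v, copula):.2f}, "
      f"T_and = {t_and(u, v, copula):.2f}")
```

In this example the joint return period is shorter than the univariate one and the co-occurrence return period is longer, because exceeding either threshold is more frequent than exceeding both.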
$T_{sec}$, also known as Kendall's return period, describes the probability of occurrence of an event in the area over the copula level curve of value $t$ (Salvadori & Michele ), and it can be expressed as
$$T_{sec} = \frac{\mu_T}{1 - K_C(t)},$$
where $t \in I$ is the probability level, and $K_C$ is Kendall's distribution, which is the distribution of the random variable $W = C(F_X(X), F_Y(Y))$, i.e., $K_C(t) = P(W \le t)$. For Archimedean copulas, $K_C$ is expressed as
$$K_C(t) = t - \frac{\varphi(t)}{\varphi'(t^+)},$$
where $\varphi'(t^+)$ is the right derivative of the additive generator function $\varphi(t)$, as presented in Table 1.
The dangerous domains under the three return periods are identified as follows:
(1) Under $T_{or}$: the region where at least one of the components exceeds a prescribed threshold, as shown in the gray-shaded area in Figure 3(a).
(2) Under $T_{and}$: the region where both the components exceed a prescribed threshold, as shown in the gray-shaded area in Figure 3(b).
(3) Under $T_{sec}$: For a bivariate distribution $H = C(F_X, F_Y)$ and $t \in (0, 1)$, the critical line $L_t^H$ of level $t$ is defined as
$$L_t^H = \{(x, y) \in \mathbb{R}^2 : H(x, y) = t\}.$$
Evidently, for any vector of thresholds $(x^*, y^*)$, a unique critical line $L_t^H$ exists. Hence, the dangerous domain identified under $T_{sec}$ is as shown in the gray-shaded area in Figure 3(c).
The definition of the multivariate return periods $T_{or}$ and $T_{and}$ is limited in the identification of dangerous domains, whereas $T_{sec}$ based on the Kendall measure is more rational.
An advantage of $T_{sec}$ is that thresholds lying over the same critical line always generate the same dangerous domain; however, this does not apply when considering $T_{or}$ and $T_{and}$, as illustrated in detail in Figure 3 and Table 2.
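The secondary return period and the three dangerous-domain definitions can be sketched for an Archimedean copula as follows. The fragment assumes the Clayton copula with generator (t^-theta - 1)/theta, a hypothetical parameter, and the convention that the dangerous domain under the secondary return period is the region where the joint CDF exceeds the critical level t. All points are expressed as marginal probabilities (u, v) rather than physical units.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def kendall_clayton(t, theta):
    """K_C(t) = t - phi(t) / phi'(t), phi(t) = (t^-theta - 1) / theta."""
    return t + t * (1.0 - t ** theta) / theta


def t_sec(t, theta, mu_t=1.0):
    """Secondary (Kendall) return period at probability level t."""
    return mu_t / (1.0 - kendall_clayton(t, theta))


def dangerous_or(u, v, u_star, v_star):
    """At least one component exceeds its threshold."""
    return u > u_star or v > v_star


def dangerous_and(u, v, u_star, v_star):
    """Both components exceed their thresholds."""
    return u > u_star and v > v_star


def dangerous_sec(u, v, t, theta):
    """ASSUMED convention: point lies above the critical line of level t."""
    return clayton_cdf(u, v, theta) > t


theta = 2.0                       # hypothetical copula parameter
u_star = v_star = 0.9             # thresholds as marginal probabilities
t_level = clayton_cdf(u_star, v_star, theta)

t_or_val = 1.0 / (1.0 - t_level)
t_and_val = 1.0 / (1.0 - u_star - v_star + t_level)
t_sec_val = t_sec(t_level, theta)
print(f"T_or = {t_or_val:.2f}, T_sec = {t_sec_val:.2f}, T_and = {t_and_val:.2f}")

# An event with moderate flood but very high rainfall probability level.
event = (0.50, 0.99)
print("dangerous under T_or: ", dangerous_or(*event, u_star, v_star))
print("dangerous under T_and:", dangerous_and(*event, u_star, v_star))
print("dangerous under T_sec:", dangerous_sec(*event, t_level, theta))
```

The example event is flagged under the joint definition but not under the co-occurrence or secondary definitions, which illustrates why the choice of return period changes the dangerous domain.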

Conditional exceedance probability
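If an appropriate copula function is estimated, then the conditional joint distribution can be obtained. The conditional CDF of $X \le x$ given $Y = y$ can be expressed as
$$F_{X|Y}(x \mid y) = P(X \le x \mid Y = y) = \left.\frac{\partial C(u, v)}{\partial v}\right|_{u = F_X(x),\; v = F_Y(y)}.$$
Hence, the conditional exceedance probability of $X > x$ given $Y = y$ can be expressed as
$$P(X > x \mid Y = y) = 1 - \left.\frac{\partial C(u, v)}{\partial v}\right|_{u = F_X(x),\; v = F_Y(y)}.$$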
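For a copula with a closed-form partial derivative, the conditional exceedance probability is straightforward to compute. The sketch below uses the Clayton copula with a hypothetical parameter and checks the analytical derivative against a finite difference; the probability levels are illustrative and do not correspond to values from this study.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_conditional_cdf(u, v, theta):
    """P(U <= u | V = v) = dC(u, v)/dv for the Clayton copula."""
    s = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)


def conditional_exceedance(u, v, theta):
    """P(X > x | Y = y) with u = F_X(x) and v = F_Y(y)."""
    return 1.0 - clayton_conditional_cdf(u, v, theta)


theta = 2.0    # hypothetical copula parameter
u = 0.80       # F_X(x): non-exceedance probability of the flood threshold

# Numerical check of the derivative by central finite differences.
h = 1e-6
numeric = (clayton_cdf(u, 0.9 + h, theta) - clayton_cdf(u, 0.9 - h, theta)) / (2 * h)
print(f"analytic = {clayton_conditional_cdf(u, 0.9, theta):.6f}, numeric = {numeric:.6f}")

# Exceedance probability under increasingly extreme precipitation scenarios.
for v in (0.80, 0.90, 0.95):
    print(f"F_Y(y) = {v:.2f}: P(X > x | Y = y) = {conditional_exceedance(u, v, theta):.4f}")
```

With positive dependence, the exceedance probability for a fixed flood threshold rises as the conditioning precipitation level becomes more extreme.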

Determination of marginal distributions
We first tested the randomness of the variables by computing their autocorrelations. The autocorrelation plots of the MF and Pr series in the XC and DC are shown in Figure 4.
As shown, the autocorrelations in the four series can be disregarded at the 5% significance level. Therefore, their univariate distribution fitting can be performed.
We used six distribution models (the GEV, Gumbel, P-III, gamma, normal, and log-normal distributions) to fit the MF and Pr series. The K-S test, MAE, PPCC, and DMC were used to evaluate the performances of the six candidate distribution models, and the results are shown in Table 3. The p-values for all candidate distributions were greater than 0.05, indicating that all the candidate distributions were suitable for fitting the distribution of the MF and Pr series at the 5% significance level. It can be concluded that the MF and Pr series in the XC fitted the gamma distribution the best, whereas the MF and Pr series in the DC fitted the GEV and gamma distributions the best, respectively.
As shown in Figure 3(d), the joint return period of $B_1$ is larger than that of $A_1$, but part of the dangerous domain identified by $B_1$ is outside that identified by $A_1$ (Rectangle $A_1GIJ$). Hence, hydrological events in Rectangle $A_1GIJ$ are considered dangerous by a large return period point $B_1$ but safe by a small return period point $A_1$, which is unreasonable.
As shown in Figure 3(e), the co-occurrence return period of $B_2$ is larger than that of $A_2$, but part of the dangerous domain identified by $B_2$ is outside that identified by $A_2$ (Rectangle $B_2POM$). Hence, hydrological events in Rectangle $B_2POM$ are considered dangerous by a large return period point $B_2$ but safe by a small return period point $A_2$, which is unreasonable.
As shown in Figure 3(f), any point on the same secondary return period contour line has the same dangerous domain. The larger the return period, the smaller the dangerous domain. The dangerous domain with a smaller return period necessarily covers that with a larger return period. Compared with $T_{or}$ and $T_{and}$, this division of the dangerous domain is more reasonable and conducive to flood risk assessment.

Establishment of copula model
We measured the dependence between the MF and Pr series to determine whether it is appropriate to establish their copula model. The Kendall rank correlation coefficients between the MF and Pr series in the XC and DC were 0.327 and 0.403, respectively, which were greater than the threshold at the 5% significance level, i.e., 0.273. The Spearman's rank correlation coefficients between the MF and Pr series in the XC and DC were 0.480 and 0.569, respectively, which were greater than the threshold at the 5% significance level, i.e., 0.297. Thus, the null hypothesis of no correlation was rejected at the 5% significance level, and a correlation existed between the two variables. Therefore, it is feasible to use the copula function to establish the joint probability distribution between them.
The parameters of eight candidate copulas estimated using the IFM method and their goodness-of-fit tests through the Cramér-von Mises test, AIC, and BIC are shown in Table 4. As shown, all the p-values were greater than 0.05, indicating that all candidate copulas can be applied to fit the dependence structure between the MF and Pr series.
The Joe-Frank copula performed the best for modeling the dependence structure of the MF and Pr series in the XC, whereas the Clayton copula performed the best for modeling the dependence structure of the MF and Pr series in the DC.

Coincidence probability
The 80th, 90th, 93.33rd, and 95th quantiles of the Pr series were selected as the typical extreme precipitation events in both the XC and the DC, and they were 1,136, 1,245, 1,304,
The situation when the MF does not exceed the safety discharge is defined as a safe flood, which is beneficial for flood control; meanwhile, the situation when the MF is greater than or equal to the warning discharge is defined as a dangerous flood, which is adverse to flood control.
Using the established copula models, the coincidence probabilities of the typical MF and Pr events in the XC and DC were calculated to study the joint risk of the two extreme events, the results of which are shown in Table 5.
The following were observed in Table 5: (1) For cases with the same MF, with the increase in the Pr, the marginal distribution probability and coincidence probability of the Pr decreased. Similarly, for cases with the same Pr, as the MF increased, the marginal distribution probability and coincidence probability of the MF decreased.
(2) As the MF increased, the sensitivity of its coincidence probability to changes in the Pr increased. Similarly, as the Pr increased, the sensitivity of its coincidence probability to changes in the MF increased.
Hence, we conclude that in the XC, the coincidence probabilities of encountering dangerous floods with the Pr frequency exceeding 80%, 90%, 93.33%, and 95% were 22.80%, 13.69%, 10.69%, and 9.19%, respectively. In the DC, the coincidence probabilities of encountering dangerous floods with the Pr frequency exceeding 80%, 90%, 93.33%, and 95% were 21.84%, 12.39%, 9.26%, and 7.70%, respectively.
Notes: P-value > 0.05 indicates that the estimated copula can be accepted at the 5% significance level. The smaller the value of the AIC and BIC, the better the distribution fits. The most appropriate copulas are indicated in boldface.

Return periods and dangerous domains
The bivariate return periods of various combinations of the flood and extreme precipitation are more effective for actual flood control and management than univariate return period analysis; therefore, the joint return, co-occurrence return, and secondary return periods of the joint distribution of the MF and Pr series in the XC and DC were calculated.
The contour maps of different return periods are shown in Figure 7, and some important values are shown in Table 6.
If the MF and Pr are known, one can determine their $T_{or}$, $T_{and}$, and $T_{sec}$ based on Figure 7. Moreover, using the method described in the section on Return periods and dangerous domains, the dangerous domains of the thresholds under different return periods can be identified.
The following were obtained based on Table 6 and Figure 7:
(1) The return periods satisfied the following two theoretical inequalities:
$$T_{or} \le \min(T_{un1}, T_{un2}) \le \max(T_{un1}, T_{un2}) \le T_{and},$$
$$T_{or} < T_{sec} < T_{and},$$
where $T_{un1}$ and $T_{un2}$ are the univariate return periods.
(2) The higher the return period, the greater were the differences among $T_{or}$, $T_{and}$, and $T_{sec}$.
(3) The return periods of the XC and DC were not significantly different, indicating that the joint frequencies of the MF and Pr in the two catchments were similar.
However, under the same univariate return period, $T_{or}$ in the XC was slightly larger than that in the DC, whereas $T_{and}$ and $T_{sec}$ were slightly smaller than those in the DC.
The contour map trend of the return periods was consistent with those of previous studies (Bezak et al. ; Fan et al. ), indicating that the results were reasonable.

Flood risk under specified precipitation scenarios
In practice, water resource managers are more interested in  Table 7, and their contour maps are shown in Figure 8. The following were obtained based on Table 7 and Figure 8:
(1) Under the same Pr scenario, as the MF increased, the conditional exceedance probability decreased.
(2) For the same MF, the conditional exceedance probability in the heavy precipitation scenario was greater than that in the light precipitation scenario, i.e., the conditional exceedance probabilities of the MF increased with the amount of Pr.
(3) These conditional exceedance probabilities make it possible to identify the likelihood of given flood discharges under different precipitation scenarios. Taking the XC as an example, for Pr = 1,136 mm in the future, the probability of MF > 800 m³/s is 61.89%.
The contour map trend of the conditional exceedance probabilities was consistent with that of a previous study (Guo et al. ), indicating the results were reasonable.
The contributions of this study are as follows: (1) We focused on the flood risk in the main water source of Taihu Lake, i.e. the Zhexi Region, which had a direct  This region had not been emphasized in previous studies, whereas it is vital for flood prevention and drainage work.
(2) Extreme hydrological and meteorological events (i.e.,
The main conclusions were as follows:
(1) According to the K-S test, MAE, PPCC, and DMC, both the MF and Pr in the XC fitted the gamma distribution the best, whereas the MF and Pr in the DC fitted the GEV and gamma distributions the best, respectively.
Based on the Cramér-von Mises test, AIC, and BIC, the Joe-Frank copula and Clayton copula were selected as the most appropriate copulas to fit the joint distribution of the MF and Pr in the XC and DC, respectively.
Water resource managers can identify the corresponding flood risk information for different flood and precipitation events from the contour map of return periods (Figure 7). Furthermore, the probability of a certain flood occurring under various precipitation scenarios can be assessed based on the contour map of conditional exceedance probabilities (Figure 8).
The results of this study can provide a practical basis and guidance for the computation of rainstorm designs, management of flood control safety, and water resource scheduling in the Taihu Basin. The framework designed in this study for analyzing the response of flood events to extreme precipitation can be applied to similar river basins. Furthermore, the flood generation mechanism has changed owing to climate change and human activities.
Therefore, for future studies, we plan to perform flood frequency analysis under changed flood generation mechanisms to identify the effects of climate change and human activities on floods, as well as to provide a more practical reference for flood control.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.