Flood event consists of peak discharge and flood volume that are mutually correlated and can be described by a copula function. For a given bivariate joint distribution, a choice of design return period will lead to infinite combinations of peak discharge and flood volume. A boundary identification method is developed to define the feasible ranges of flood peak and volume suitable for combination, and two combination methods, i.e., equivalent frequency combination (EFC) method and conditional expectation combination method for estimating unique bivariate flood quantiles are also proposed. Monte Carlo simulation method is used to evaluate the performance of these combination methods. The Geheyan reservoir in China was selected as case study. It is shown that the joint design values estimated by the two proposed combination methods are both within the feasible range, which means that the methods could be selected for designing unique flood quantiles. The proposed bivariate combination methods are also compared with univariate method, and the reservoir water level estimated by EFC method is higher than the other methods, which means the EFC method is safer for reservoir design. The developed approach provides an applicable way for the identification of feasible range and flood quantile estimation.
INTRODUCTION
Conventional hydrologic frequency analysis has mainly focused on a single characteristic value such as peak discharge or flood volume by using univariate distribution. However, many hydrological events are characterized by the joint behavior of several correlated random variables; for example, flood hydrograph can be described in terms of flood peak, volume, and duration. When these correlated random variables are relevant to risk analyses or design purposes, the joint study of their probabilistic characteristics leads to a better understanding of the phenomena under investigation and avoids the over- or underestimation of the risks that would be produced by univariate analyses (Salvadori & De Michele 2004; Xiao et al. 2009; Gräler et al. 2013).
Many multivariate analysis methods and case studies have been proposed and discussed in the past decades. These methods have been applied mainly to the analysis of rainfall storms (De Michele & Salvadori 2003; Zhang et al. 2012; Sun et al. 2013; Abdul & Zeephongsekul 2014), flood frequency (Chowdhary et al. 2011; Chen et al. 2012; Sraj et al. 2015), drought events (Chen et al. 2013a, 2013b; Ganguli 2014; Salvadori & De Michele 2010, 2015), regional frequency analysis (Chebana & Ouarda 2009; Ben et al. 2015; Zhang et al. 2015a), and synthetic flood hydrograph (Gräler et al. 2013; Salvadori et al. 2013).
These multivariate hydrological frequency analysis studies have mainly concentrated on the following aspects: (1) showing the importance and explaining the usefulness of the multivariate framework; (2) fitting the appropriate multivariate distribution (copula and marginal distributions) and estimating the corresponding parameters; (3) estimating multivariate return periods and quantile values. Of these three aspects, the third one, i.e., the question of estimating multivariate return periods and quantile values, still has not yet been properly addressed in the multivariate frequency analysis framework (Chebana & Ouarda 2011), although the estimation of return periods and quantile values has been extensively studied in univariate hydrological frequency analysis.
One of the main difficulties in the multivariate quantile estimation is how to choose the proper combinations of design values of the concerned random variables for a given multivariate return period of hydrologic structure design. Take the bivariate case (peak discharge Q and flood volume W) as an example. The combinations can differ greatly in terms of their values: moving along the multivariate quantile curve to an asymptote, one of the two variables will approach its marginal value, while the other tends to indefinitely increase (for unbounded random variables). Chebana & Ouarda (2011) proposed the decomposition of the level curve into a naive part (tail) and the proper part (central); they assumed that the naive part was composed of two segments starting at the end of each extremity of the proper part. Salvadori et al. (2011) introduced two basic design realizations, i.e., component-wise excess design realization and most-likely design realization. Volpi & Fiori (2012) identified a subset of the critical combinations set that includes a fixed and arbitrarily chosen percentage in probability of the events, on the basis of their probability of occurrence.
Following the ideas of Chebana & Ouarda (2011) and Volpi & Fiori (2012), a developed approach to identify the feasible ranges of flood peak and flood volume combination is suggested. Two combination methods, i.e., the equivalent frequency combination (EFC) method and the conditional expectation combination (CEC) method, are developed to derive the quantiles of flood peak and flood volume for given multivariate return periods. The Geheyan reservoir in China was selected as a case study. The design flood quantiles and design flood hydrographs (DFHs) were derived by bivariate combination methods and the univariate distribution method, respectively. The remainder of this paper is organized as follows. The ‘Methods’ section gives the methodology used to derive the feasible range and bivariate combination methods. In the ‘Statistical procedure’ section, we present the statistical procedures used for assessing the proposed combination methods. The ‘Application’ section addresses a case study of China's Geheyan Reservoir. Finally, the ‘Conclusions’ section summarizes the main results and provides necessary explanations.
METHODS
In this study, we propose a developed approach for estimating bivariate feasible ranges of flood peak and flood volume suitable for combination in the critical level curve. To derive the feasible range, a boundary identification method is suggested, which was inspired by the ideas of Chebana & Ouarda (2011) and Volpi & Fiori (2012). Two combination methods for estimating unique bivariate flood quantiles, i.e., the EFC method and the CEC method, are proposed based on the assumption of the relationship between u and v (or q and w).
Copula functions
Different families of copula functions have been proposed and described by Nelsen (2006) and Salvadori et al. (2007). Of all the copula families, the Archimedean family is more desirable for hydrological analyses because it can be easily constructed and can be applied to whether the correlation among the hydrological variables is positive or negative (Nelsen 2006; Chen et al. 2014a). A large variety of copulas belong to this family. Four one-parameter Archimedean copulas, including the Gumbel–Hougaard (G–H), Ali–Mikhai–Haq, Frank and Cook–Johnson copulas, have been applied in frequency analysis by many authors (Zhang & Singh 2006; Salvadori et al. 2007; Chen et al. 2014b; Liu et al. 2015).
Bivariate return period
In the conventional univariate analysis, flood events of interest are often defined by return periods. In the bivariate domain, however, it is still discussed by the community as to which method is most suitable to transform the joint exceedance probability to a bivariate joint return period (JRP). Different JPRs estimated by copula function have been developed for the case of a bivariate flood frequency analysis. Eight types of possible joint events were presented by Salvadori & De Michele (2004) using ‘OR’ and ‘AND’ operators, of which two cases are of the greatest interest in hydrological applications (Shiau 2003; Salvadori & De Michele 2004):
Different definitions of the multivariate return period are available in the literature, based on regression analysis, bivariate conditional distributions, survival Kendall distribution function, and structure performance function. For instance, some studies have focused on a structure-based return period for the design and/or risk assessment of hydrological structures in a bivariate environment (Requena et al. 2013; Volpi & Fiori 2014). A comprehensive review of the JRP estimation methods was given by De Michele et al. (2013), Serinaldi (2015), and Volpi et al. (2015), and references therein.
The OR return period given in Equation (4) has been extensively applied in multivariate hydrological frequency analysis (e.g., Shiau 2003; Salvadori & De Michele 2004; Chebana & Ouarda 2011; Volpi & Fiori 2012; Li et al. 2013; Liu et al. 2015; Sraj et al. 2015). In this study, we focus on the OR case for quantile estimation in a bivariate context.
Feasible range identification for bivariate quantile curve
Chebana & Ouarda (2011) proposed a method to decompose the quantile curve in Figure 1 into a naive part (i.e., the subset BC) and a proper part (outside subset BC). They assumed that the naive part is composed of two segments starting at the end of each extremity of the proper part. They also suggested selecting these boundary points according to the empirical version or as close as to the asymptotes (the naive part). Volpi & Fiori (2012) defined the distance of each point along the quantile curve in Figure 1 from its vertex as random variable (s) and derived its PDF. The boundary points of the quantile curve were identified with a chosen percentage in probability of the events. They also proposed a way of decomposition of the quantile curve into naive part and proper part. However, the procedure presented by Volpi & Fiori (2012) is difficult to apply in the curvilinear coordinate system [s(x,y), n(x,y)] or to derive the expression of random variable (s). To overcome these limitations, an approach to identify the boundary points (i.e., B and C) of the quantile curve is developed. A new density function φ(q) is used to measure the relative likelihood of flood events, which is a non-curvilinear variable in the procedure.
To derive the new density function with a chosen probability level to decompose the quantile curve, a joint distribution of annual maximum (AM), flood peak (Q) and flood volume (W) should be built by copula functions. The joint distribution function F(q,w) can be expressed in terms of its marginal functions and by using an associated dependence function C, F(q,w) = C[FQ(q), FW(w)].
It is found that flood peak and volumes are usually upper-tailed dependent variables and the G–H copula could reproduce best the observed tail dependence coefficient (e.g., Poulin et al. 2007; Ben et al. 2015). Therefore, the G–H copula is taken as an example to illustrate the developed boundary identification method because of its easy expression and wide applications (Li et al. 2013; Chen et al. 2015; Sraj et al. 2015).
It should be noted that other copulas with more complicated formulas sometimes may be needed. For the Frank copula, Clayton copula and several two-parameter copulas, the implicit expression for describing the relationship between Q and W in Equations (7)–(10) could be derived. For copulas with more complicated expressions, the numerical method should be applied. For example, the unique value of w could be obtained with given q by a trial and error method (Liu et al. 2015).
Bivariate flood quantile selection
For a given bivariate return period T, there are countless combinations of u and v that satisfy Equation (7). To derive the design values of flood peak q and flood volume w, the unique combination of u and v (or q and w) should be determined. Hence besides Equation (7), one more equation that can establish the relationship between u and v (or q and w) is necessary. Two combination methods were proposed to derive the quantiles of flood peak and flood volume for given multivariate return periods, and they are now outlined.
EFC method
With a given bivariate return period T, we assume that the flood peak and flood volume have the same probability of occurrence, i.e., u = v (or FQ(q) = FW(w)). This assumption is usually taken as a uniform procedure for the derivations of design flood values and DFH in China (MWR 2006; Xiao et al. 2008, 2009; Chen et al. 2010). Then, the design frequency of bivariate EFC can be obtained by jointly solve the equation u = v and Equation (7).
CEC method
STATISTICAL PROCEDURE
Monte Carlo simulation method
Since the flood volume W is conditioned on the flood peak Q, the conditional distribution is employed to generate the flood data series. The samples are generated from a bivariate distribution composed by Pearson type three (P3) marginal distributions (Li et al. 2013) and copula function. The Monte Carlo simulation method is based on the algorithm developed by Chebana & Ouarda (2009), Lee & Salas (2011), and Chen et al. (2015). The main generating procedures with the conditional copula are summarized as follows:
Fit the marginal distribution FQ(q) and FW(w) under recorded data using the P3 marginal distributions, u = FQ(q) and v = FW(w).
Establish the joint distribution F(q,w) using copulas, F(q,w) = C(u,v), and estimate their parameters based on the Kendall tau method.
One independent random value r1 is generated from [0, 1] uniform distribution. Let r1 represent the value of cumulative probability FQ(q), i.e., u = FQ(q) = r1, then the peak discharge q can be calculated using the fitted P3 distribution, i.e., .
- Another independent random value r2 is also generated from [0, 1] uniform distribution. Let r2 represent the value of the conditional distribution function , i.e., , then the flood volume w can be calculated by . Since the solution of Equation (25) is very complex, it could be simplified using copula function as follows:
After solving , the flood volume w can be obtained by .
Repeat steps (4)–(5) N times, then the data series of peak discharge Q and flood value W with size N are obtained. The parameters of P3 marginal distributions are estimated by L-moment method, respectively. The parameter of copula functions is estimated and the joint distribution is established.
The flood volume w is estimated under one given peak discharge q using the EFC method and the flood volume w is estimated by Equation (23) for the CEC method.
Repeat steps (4)–(7) ms times.
Performance evaluation
The following evaluation criteria were used to evaluate and compare different methods:
APPLICATION
Parameter estimation and hypothesis test
Variables . | Sample statistic values . | P3 parameters . | ||||
---|---|---|---|---|---|---|
Mean . | L-Cv . | L-Cs . | α . | Β . | δ . | |
Qp (m3/s) | 7,820 | 0.4 | 1.2 | 2.78 | 0.0005 | 2606.7 |
W7 (108 m3) | 17 | 0.5 | 1.5 | 1.78 | 0.1569 | 5.7 |
Variables . | Sample statistic values . | P3 parameters . | ||||
---|---|---|---|---|---|---|
Mean . | L-Cv . | L-Cs . | α . | Β . | δ . | |
Qp (m3/s) | 7,820 | 0.4 | 1.2 | 2.78 | 0.0005 | 2606.7 |
W7 (108 m3) | 17 | 0.5 | 1.5 | 1.78 | 0.1569 | 5.7 |
Variables . | Chi-square test . | K–S test . | ||
---|---|---|---|---|
χ20.05 . | Chi-square statistics, χ2 . | Dn,0.95 . | Dn . | |
Qp (m3/s) | 7.815 | 5.313 | 0.185 | 0.096 |
W7 (108 m3) | 9.488 | 3.396 | 0.185 | 0.078 |
Variables . | Chi-square test . | K–S test . | ||
---|---|---|---|---|
χ20.05 . | Chi-square statistics, χ2 . | Dn,0.95 . | Dn . | |
Qp (m3/s) | 7.815 | 5.313 | 0.185 | 0.096 |
W7 (108 m3) | 9.488 | 3.396 | 0.185 | 0.078 |
Joint distribution of flood peak and volume based on copulas
Copula . | Cθ(u, v) . | θ ∈ . | τ . |
---|---|---|---|
G–H | [1, + ∞) | ||
Clayton | (0, + ∞) | ||
Frank | R\{0} |
Copula . | Cθ(u, v) . | θ ∈ . | τ . |
---|---|---|---|
G–H | [1, + ∞) | ||
Clayton | (0, + ∞) | ||
Frank | R\{0} |
. | Parameter . | Statistical test . | . | |
---|---|---|---|---|
Copula . | θ . | RMSE . | S1n(p − valve) . | Dn (Dn,0.95) . |
G–H | 2.98 | 0.068 | 0.011 (0.44) | 0.035 (0.215) |
Clayton | 3.95 | 0.074 | 0.021 (0.17) | 0.048 (0.215) |
Frank | 9.93 | 0.071 | 0.015 (0.28) | 0.043 (0.215) |
. | Parameter . | Statistical test . | . | |
---|---|---|---|---|
Copula . | θ . | RMSE . | S1n(p − valve) . | Dn (Dn,0.95) . |
G–H | 2.98 | 0.068 | 0.011 (0.44) | 0.035 (0.215) |
Clayton | 3.95 | 0.074 | 0.021 (0.17) | 0.048 (0.215) |
Frank | 9.93 | 0.071 | 0.015 (0.28) | 0.043 (0.215) |
Bivariate quantile curve and feasible range identification
The upper and lower bounds on the level curve were estimated numerically by solving Equations (16) and (17), and assuming for simplicity (although other assumptions are possible) α1 = α2 = ɛ/2, with ɛ = 0.05. The upper and lower bounds are denoted as B1 and C1, respectively, in Figure 7. It is found that the bounds are close to the horizontal asymptote (i.e., w7 = 61.49 × 108 m3 for T= 1,000 and w7 = 50.23 × 108 m3 for T= 200) and vertical asymptote (i.e., qp = 22,800 m3/s for T= 1,000 and qp = 19,300 m3/s for T= 200) due to the small value assumed for the probability level ɛ. The upper and lower bounds were also calculated by the boundary identification method proposed by Volpi & Fiori (2012). The results are also presented in Table 5 and the derived bounds are denoted as B2 and C2, as shown in Figure 7. It is shown that the bounds estimated by the proposed method and that proposed by Volpi & Fiori (2012) are very similar.
Boundary identification method . | Return period . | Lower bound . | Upper bound . | ||
---|---|---|---|---|---|
Qp (m3/s) . | W7 (108 m3) . | Qp (m3/s) . | W7 (108 m3) . | ||
Volpi & Fiori (2012) | 1,000 | 22,930 | 65.84 | 26,080 | 61.54 |
200 | 19,350 | 50.27 | 22,460 | 55.86 | |
Developed method | 1,000 | 23,000 | 65.76 | 26,100 | 61.52 |
200 | 19,400 | 54.49 | 22,500 | 50.26 |
Boundary identification method . | Return period . | Lower bound . | Upper bound . | ||
---|---|---|---|---|---|
Qp (m3/s) . | W7 (108 m3) . | Qp (m3/s) . | W7 (108 m3) . | ||
Volpi & Fiori (2012) | 1,000 | 22,930 | 65.84 | 26,080 | 61.54 |
200 | 19,350 | 50.27 | 22,460 | 55.86 | |
Developed method | 1,000 | 23,000 | 65.76 | 26,100 | 61.52 |
200 | 19,400 | 54.49 | 22,500 | 50.26 |
Estimation of bivariate flood quantiles
The proposed bivariate EFC and CEC methods were used to estimate flood peak and 7-day flood volume quantiles with return periods of T = 1,000 and T = 200 years, respectively. For the purpose of comparison, the univariate flood quantiles (called marginal quantiles by Chebana & Ouarda (2011)) were estimated by marginal distributions, assuming that the univariate return periods (TQ and TW) were equal to the bivariate return period (i.e., TQ = TW = T). The univariate flood quantiles can be obtained from the equations and . The results of the component-wise excess realization and the most likely realization proposed by Salvadori et al. (2011) were also estimated. The estimation results of bivariate and univariate quantiles were listed in Table 6. It is shown that the design values of bivariate quantiles are larger than those of univariate quantiles. The quantiles estimated by the four bivariate event selection methods are also shown in Figure 7, and the estimation points of the EFC method are denoted as point E, while the quantiles estimated by the CEC method are denoted as point F. For the results of selection approaches proposed by Salvadori et al. (2011), the events of component-wise excess realization are denoted as point W, and the events of most likely realization are denoted as point L. From Figure 7, we find that the joint design values estimated by the four event selection methods are within the feasible regions. Consequently, the two proposed methods and selection approaches proposed by Salvadori et al. (2011) could be selected as an option of deriving unique flood quantiles, and they could satisfy the inherent law of hydrologic events and have statistical basis to some degree. It can be seen from Table 6 and Figure 7 that the estimated events of the EFC method and that of the most likely realization are similar. The bivariate EFC results have larger flood volume and smaller flood peak than bivariate CEC results. As well, the results estimated by the component-wise excess realization have larger flood peak and smaller flood volume than the other three methods.
T . | Method . | Qp (m3/s) . | W7 (×108 m3) . | Zmax (m) . |
---|---|---|---|---|
1,000 | EFC | 23,390 | 63.09 | 202.97 |
CEC | 23,420 | 62.98 | 202.92 | |
Component-wise excess realization | 23,510 | 62.78 | 202.90 | |
Most-likely realization | 23,400 | 63.05 | 202.95 | |
Univariate distribution | 22,800 | 61.49 | 202.58 | |
200 | EFC | 19,800 | 51.87 | 198.10 |
CEC | 20,130 | 51.11 | 197.79 | |
Component-wise excess realization | 20,200 | 51.03 | 197.59 | |
Most-likely realization | 19,940 | 51.50 | 197.82 | |
Univariate distribution | 19,300 | 50.23 | 197.30 |
T . | Method . | Qp (m3/s) . | W7 (×108 m3) . | Zmax (m) . |
---|---|---|---|---|
1,000 | EFC | 23,390 | 63.09 | 202.97 |
CEC | 23,420 | 62.98 | 202.92 | |
Component-wise excess realization | 23,510 | 62.78 | 202.90 | |
Most-likely realization | 23,400 | 63.05 | 202.95 | |
Univariate distribution | 22,800 | 61.49 | 202.58 | |
200 | EFC | 19,800 | 51.87 | 198.10 |
CEC | 20,130 | 51.11 | 197.79 | |
Component-wise excess realization | 20,200 | 51.03 | 197.59 | |
Most-likely realization | 19,940 | 51.50 | 197.82 | |
Univariate distribution | 19,300 | 50.23 | 197.30 |
Results of statistical test
A Monte Carlo experiment was carried out to assess the performance of the proposed combination (EFC and CEC) methods. In order to describe the correlation between Q and W, the parameter of G–H copula is taken to be θ = 2.98, while the Kendall correlation coefficient is 0.66. In order to test the validation of the experiment method, 10,000 samples of flood pair (Q-W) were generated and the statistic values calculated based on the simulated data and given in Table 7. The Kendall correlation coefficient of the simulated samples was also calculated as 0.63. It can be seen that there is no significant difference between the generated and observed values. Thus, the generated samples could be used for statistical testing hereafter.
Variables . | Sample statistic values . | ||
---|---|---|---|
Mean . | Cv . | Cs . | |
Qp (m3/s) | 7,803 | 0.39 | 1.18 |
W7 (108m3) | 16.75 | 0.48 | 1.47 |
Variables . | Sample statistic values . | ||
---|---|---|---|
Mean . | Cv . | Cs . | |
Qp (m3/s) | 7,803 | 0.39 | 1.18 |
W7 (108m3) | 16.75 | 0.48 | 1.47 |
DFH based on joint distribution
The DFH rescaled by univariate distribution design values and two realizations proposed by Salvadori et al. (2011) was also derived from TFH by Equation (30). These DFHs were routed through the Geheyan reservoir with initial water level (flood control limiting water level, 192.2 m). The corresponding highest reservoir water levels (Zmax) were calculated and are listed in Table 6.
It is shown in Table 6 that the design values of flood peak and 7-day flood volume obtained by univariate distribution method are both smaller than those obtained by four bivariate methods. The resulting Zmax of the univariate method is relatively lower than those of bivariate approaches. Since flood events are naturally multivariate phenomena and flood peak and flood volume are mutually correlated, the quantiles estimated by bivariate distribution are more rational than these by univariate distribution (Chebana & Ouarda 2011).
The comparison results listed in Table 6 also show that Zmax obtained by bivariate EFC method is larger than that obtained by the other three bivariate methods, while the component-wise excess method reaches the lowest Zmax. The results of Zmax calculated by most-likely realization are a little lower than the EFC method and the CEC method obtains a slightly higher Zmax than that of component-wise excess method. Comparing the results of 200-year and 1,000-year return period, it is found that the differences among the four bivariate methods decrease as the return period increases. The water level reaches 202.97 m by the EFC method and is slightly higher than other methods for the 1,000-year return period. Since the Geheyan reservoir has a large amount of flood control storage with annual regulation ability, the design flood volume is relatively more important than peak discharge for flood prevention safety. As a consequence, the bivariate EFC method with slightly larger 7-day flood volume is safer for reservoir design than other methods.
CONCLUSIONS
The quantile is an important and extensively studied notion in hydrological frequency analysis, but it is not properly addressed in the multivariate frequency analysis frameworks. Inspired by the idea of Chebana & Ouarda (2011) and Volpi & Fiori (2012), a developed approach to derive the feasible range and for estimating bivariate flood quantiles is proposed and applied in the Geheyan reservoir. The following conclusions were drawn from this study:
The developed quantile curve boundary identification method using the normalized joint PDF values to measure the relative occurrence likelihood of a seasonal flood event provides a developed applicable rule to narrow down the infinite number of possible choices for bivariate hydrological design values. The developed methods can be applied to bivariate OR return period and given the probability level ɛ.
The quantile combination methods provide a simple but effective way for bivariate quantile estimation with given bivariate return period. The results illustrate that the joint design values estimated by the two proposed combination methods are within the feasible regions, and the EFC method perform satisfactorily.
The application results of Geheyan reservoir show that the bivariate EFC results with slightly larger 7-day flood volume can obtain the highest reservoir water level (Zmax) and, therefore, EFC is safer for reservoir design than other methods.
ACKNOWLEDGEMENTS
The study was financially supported by the National Natural Science Fund of China (51539009) and the National Key Research and Development Plan of China (2016YFC0402206).