Flood event consists of peak discharge and flood volume that are mutually correlated and can be described by a copula function. For a given bivariate joint distribution, a choice of design return period will lead to infinite combinations of peak discharge and flood volume. A boundary identification method is developed to define the feasible ranges of flood peak and volume suitable for combination, and two combination methods, i.e., equivalent frequency combination (EFC) method and conditional expectation combination method for estimating unique bivariate flood quantiles are also proposed. Monte Carlo simulation method is used to evaluate the performance of these combination methods. The Geheyan reservoir in China was selected as case study. It is shown that the joint design values estimated by the two proposed combination methods are both within the feasible range, which means that the methods could be selected for designing unique flood quantiles. The proposed bivariate combination methods are also compared with univariate method, and the reservoir water level estimated by EFC method is higher than the other methods, which means the EFC method is safer for reservoir design. The developed approach provides an applicable way for the identification of feasible range and flood quantile estimation.

## INTRODUCTION

Conventional hydrologic frequency analysis has mainly focused on a single characteristic value such as peak discharge or flood volume by using univariate distribution. However, many hydrological events are characterized by the joint behavior of several correlated random variables; for example, flood hydrograph can be described in terms of flood peak, volume, and duration. When these correlated random variables are relevant to risk analyses or design purposes, the joint study of their probabilistic characteristics leads to a better understanding of the phenomena under investigation and avoids the over- or underestimation of the risks that would be produced by univariate analyses (Salvadori & De Michele 2004; Xiao *et al.* 2009; Gräler *et al.* 2013).

Many multivariate analysis methods and case studies have been proposed and discussed in the past decades. These methods have been applied mainly to the analysis of rainfall storms (De Michele & Salvadori 2003; Zhang *et al.* 2012; Sun *et al.* 2013; Abdul & Zeephongsekul 2014), flood frequency (Chowdhary *et al.* 2011; Chen *et al.* 2012; Sraj *et al.* 2015), drought events (Chen *et al.* 2013a, 2013b; Ganguli 2014; Salvadori & De Michele 2010, 2015), regional frequency analysis (Chebana & Ouarda 2009; Ben *et al.* 2015; Zhang *et al.* 2015a), and synthetic flood hydrograph (Gräler *et al.* 2013; Salvadori *et al*. 2013).

These multivariate hydrological frequency analysis studies have mainly concentrated on the following aspects: (1) showing the importance and explaining the usefulness of the multivariate framework; (2) fitting the appropriate multivariate distribution (copula and marginal distributions) and estimating the corresponding parameters; (3) estimating multivariate return periods and quantile values. Of these three aspects, the third one, i.e., the question of estimating multivariate return periods and quantile values, still has not yet been properly addressed in the multivariate frequency analysis framework (Chebana & Ouarda 2011), although the estimation of return periods and quantile values has been extensively studied in univariate hydrological frequency analysis.

One of the main difficulties in the multivariate quantile estimation is how to choose the proper combinations of design values of the concerned random variables for a given multivariate return period of hydrologic structure design. Take the bivariate case (peak discharge *Q* and flood volume *W*) as an example. The combinations can differ greatly in terms of their values: moving along the multivariate quantile curve to an asymptote, one of the two variables will approach its marginal value, while the other tends to indefinitely increase (for unbounded random variables). Chebana & Ouarda (2011) proposed the decomposition of the level curve into a naive part (tail) and the proper part (central); they assumed that the naive part was composed of two segments starting at the end of each extremity of the proper part. Salvadori *et al.* (2011) introduced two basic design realizations, i.e., component-wise excess design realization and most-likely design realization. Volpi & Fiori (2012) identified a subset of the critical combinations set that includes a fixed and arbitrarily chosen percentage in probability of the events, on the basis of their probability of occurrence.

Following the ideas of Chebana & Ouarda (2011) and Volpi & Fiori (2012), a developed approach to identify the feasible ranges of flood peak and flood volume combination is suggested. Two combination methods, i.e., the equivalent frequency combination (EFC) method and the conditional expectation combination (CEC) method, are developed to derive the quantiles of flood peak and flood volume for given multivariate return periods. The Geheyan reservoir in China was selected as a case study. The design flood quantiles and design flood hydrographs (DFHs) were derived by bivariate combination methods and the univariate distribution method, respectively. The remainder of this paper is organized as follows. The ‘Methods’ section gives the methodology used to derive the feasible range and bivariate combination methods. In the ‘Statistical procedure’ section, we present the statistical procedures used for assessing the proposed combination methods. The ‘Application’ section addresses a case study of China's Geheyan Reservoir. Finally, the ‘Conclusions’ section summarizes the main results and provides necessary explanations.

## METHODS

In this study, we propose a developed approach for estimating bivariate feasible ranges of flood peak and flood volume suitable for combination in the critical level curve. To derive the feasible range, a boundary identification method is suggested, which was inspired by the ideas of Chebana & Ouarda (2011) and Volpi & Fiori (2012). Two combination methods for estimating unique bivariate flood quantiles, i.e., the EFC method and the CEC method, are proposed based on the assumption of the relationship between *u* and *v* (or *q* and *w*).

### Copula functions

*et al*. (2007). Every joint distribution can be written in terms of a copula and its univariate marginal distributions. Copula is a function that connects multivariate probability distribution to its one-dimensional marginal distributions (Nelsen 2006). Let be the marginal cumulative distribution functions (CDFs) of

*X*

_{i}. The objective is to determine the multivariate distribution, denoted as or simply

*H*. Thus, the multivariate probability distribution

*H*is expressed in terms of its marginal distributions and the associated dependence function, known as Sklar's theorem (Sklar 1959): where

*C*, called copula, is uniquely determined whenever are continuous, and captures the essential features of the dependence among the random variables. For details on the copula approach and its hydrological applications, refer to the literature (e.g., Salvadori & De Michele 2015; Zhang

*et al.*2015b).

Different families of copula functions have been proposed and described by Nelsen (2006) and Salvadori *et al.* (2007). Of all the copula families, the Archimedean family is more desirable for hydrological analyses because it can be easily constructed and can be applied to whether the correlation among the hydrological variables is positive or negative (Nelsen 2006; Chen *et al.* 2014a). A large variety of copulas belong to this family. Four one-parameter Archimedean copulas, including the Gumbel–Hougaard (G–H), Ali–Mikhai–Haq, Frank and Cook–Johnson copulas, have been applied in frequency analysis by many authors (Zhang & Singh 2006; Salvadori *et al*. 2007; Chen *et al.* 2014b; Liu *et al.* 2015).

### Bivariate return period

In the conventional univariate analysis, flood events of interest are often defined by return periods. In the bivariate domain, however, it is still discussed by the community as to which method is most suitable to transform the joint exceedance probability to a bivariate joint return period (JRP). Different JPRs estimated by copula function have been developed for the case of a bivariate flood frequency analysis. Eight types of possible joint events were presented by Salvadori & De Michele (2004) using ‘OR’ and ‘AND’ operators, of which two cases are of the greatest interest in hydrological applications (Shiau 2003; Salvadori & De Michele 2004):

*E*

_{or}to happen it is sufficient that either peak discharge

*Q*or flood volume

*W*(or both) exceed given thresholds; instead, for

*E*

_{and}to happen it is necessary that both

*Q*and

*W*are larger than prescribed values. Thus, two different JRPs can be defined accordingly (De Michele

*et al.*2005): where

*μ*is the mean inter-arrival time between two consecutive events (in the case of annual maxima

*μ*= 1 year), and .

*K*is the Kendall's distribution function associated with the joint CDF of the copula's level curves:

_{C}*K*(t) =

_{C}*P*[

*C*(u,v) ≤

*t*]. It allows for the calculation of the probability that a random point (

*u*,

*v*) in the unit square has a smaller (or larger) copula value than a given critical probability level

*t*. In other words, it is related to the probability of occurrence of an event in the area over the copula level curve of value

*t*.

Different definitions of the multivariate return period are available in the literature, based on regression analysis, bivariate conditional distributions, survival Kendall distribution function, and structure performance function. For instance, some studies have focused on a structure-based return period for the design and/or risk assessment of hydrological structures in a bivariate environment (Requena *et al.* 2013; Volpi & Fiori 2014). A comprehensive review of the JRP estimation methods was given by De Michele *et al.* (2013), Serinaldi (2015), and Volpi *et al*. (2015), and references therein.

The OR return period given in Equation (4) has been extensively applied in multivariate hydrological frequency analysis (e.g., Shiau 2003; Salvadori & De Michele 2004; Chebana & Ouarda 2011; Volpi & Fiori 2012; Li *et al.* 2013; Liu *et al.* 2015; Sraj *et al.* 2015). In this study, we focus on the OR case for quantile estimation in a bivariate context.

### Feasible range identification for bivariate quantile curve

*T*or critical probability level

*p*has two asymptotes,

*q*=

*q*and

_{p}*w*=

*w*, where and are the quantiles of the marginal distribution for the given probability level

_{p}*p*. According to Equation (4) in the bivariate case, the choice of an appropriate return period

*T*or a critical probability level

*p*for hydraulic structure design will lead to the infinite combinations of flood peak and volume. However, all the bivariate flood events with the same value of

*T*or

*p*along the level curve differ greatly not only in terms of their quantile values, but also in terms of their probability of occurrence, which is measured by the joint probability density function (PDF), i.e.,

*f*(

*q*,

*w*), evaluated along the critical level curve (Volpi & Fiori 2012). Meanwhile, different combinations of

*Q*and

*W*are generally not equivalent from a practical point of view, although they all satisfy the flood prevention standards. The boundaries (see points

*B*and

*C*in Figure 1) for selection of design flood peak and volume are necessary in the case that the flood combinations are outside the boundaries with unrealistically low occurrence probabilities.

Chebana & Ouarda (2011) proposed a method to decompose the quantile curve in Figure 1 into a naive part (i.e., the subset *BC*) and a proper part (outside subset *BC*). They assumed that the naive part is composed of two segments starting at the end of each extremity of the proper part. They also suggested selecting these boundary points according to the empirical version or as close as to the asymptotes (the naive part). Volpi & Fiori (2012) defined the distance of each point along the quantile curve in Figure 1 from its vertex as random variable (*s*) and derived its PDF. The boundary points of the quantile curve were identified with a chosen percentage in probability of the events. They also proposed a way of decomposition of the quantile curve into naive part and proper part. However, the procedure presented by Volpi & Fiori (2012) is difficult to apply in the curvilinear coordinate system [*s*(*x*,*y*), *n*(*x*,*y*)] or to derive the expression of random variable (*s*). To overcome these limitations, an approach to identify the boundary points (i.e., *B* and *C*) of the quantile curve is developed. A new density function *φ*(*q*) is used to measure the relative likelihood of flood events, which is a non-curvilinear variable in the procedure.

To derive the new density function with a chosen probability level to decompose the quantile curve, a joint distribution of annual maximum (AM), flood peak (*Q*) and flood volume (*W*) should be built by copula functions. The joint distribution function *F*(*q*,*w*) can be expressed in terms of its marginal functions and by using an associated dependence function *C*, *F*(*q*,*w*) = *C*[*F _{Q}*(q),

*F*(w)].

_{W}It is found that flood peak and volumes are usually upper-tailed dependent variables and the G–H copula could reproduce best the observed tail dependence coefficient (e.g., Poulin *et al.* 2007; Ben *et al.* 2015). Therefore, the G–H copula is taken as an example to illustrate the developed boundary identification method because of its easy expression and wide applications (Li *et al.* 2013; Chen *et al.* 2015; Sraj *et al.* 2015).

*C*(

_{θ}*u*,

*v*) and bivariate return period

*T*can be expressed as (

*μ*= 1 for annual maxima flood series): where

*θ*is the dependence parameter of the G–H copula,

*u*=

*F*(

_{Q}*q*),

*v*=

*F*(

_{W}*w*).

*u*and

*v*with the given bivariate return period

*T*can be derived as: Replacing and into the above equation yields: in which,

*Q*and

*W*with the fixed bivariate return period

*T*can be derived as: where is the inverse CDF of flood volume

*W*. The above equation reveals that

*W*can be derived by

*Q*if the bivariate return period

*T*is fixed.

It should be noted that other copulas with more complicated formulas sometimes may be needed. For the Frank copula, Clayton copula and several two-parameter copulas, the implicit expression for describing the relationship between *Q* and *W* in Equations (7)–(10) could be derived. For copulas with more complicated expressions, the numerical method should be applied. For example, the unique value of *w* could be obtained with given *q* by a trial and error method (Liu *et al.* 2015).

*w*and

*q*for the flood events along the critical level curve, the bivariate joint PDF of

*w*and

*q*can be expressed according to Sklar's theory (Nelsen 2006) as: where

*f*(

_{Q}*q*) and

*f*(

_{W}*w*) are univariate PDFs of flood peak and volume, respectively, and is the density of and defined as: Referring to Equations (9) and (10), the bivariate joint PDF (Equation (11)) of flood peak and volume can be finally described as the function of the single random variable of flood peak

*Q*for the fixed bivariate return period

*T*, i.e., According to Equation (13), there is a curve that can describe the relationship between joint PDF

*f*(

*q*,

*w*) and flood peak

*Q*for a given bivariate return period

*T*or a critical probability level

*p*. Assume that the area between the curve of

*f*(

*q*,

*w*) and the horizontal axis of flood peak

*Q*is

*A*, i.e., where

*q*represents univariate design value of flood peak, i.e., , which is chosen as the lower bound of flood peak in estimation of the bivariate design flood values.

_{p}*f*(

*q*,

*w*) is a joint density function of

*q*and

*w*, area

*A*does not equal to 1 if only

*q*is taken as integral variable (i.e.,

*A*≠ 1). A new density function

*φ*(

*q*) over the area

*A*which has proper density characters is constructed and expressed as follows: Obviously, there is a one-to-one correspondence between the density function

*φ*(

*q*) and bivariate PDF

*f*(

*q*,

*w*). The density function

*φ*(

*q*) varies with the horizontal axis and .

*f*(

*q*,

*w*) of bivariate design flood combination near the upper and lower bounds of quantile curve is smaller than that near the middle of the quantile curve. The density function

*φ*(

*q*) has the same property as the bivariate PDF

*f*(

*q*,

*w*). As the design flood peak (or flood volume) varies from the lower bound, i.e., (

*q*) to infinitely great, the density function

_{p}*φ*(

*q*) increases to the maximum value and then decreases gradually, as shown in Figure 2. The vertex of the density function

*φ*(

*q*) describing the full dependence (Chebana & Ouarda 2011; Volpi & Fiori 2012) between peak and volume has the highest density. In other words, this is the most likely bivariate design flood event.

*φ*(

*q*) along

*Q*is defined by Equation (13), we can evaluate the lower and upper bounds that contain

*φ*(

*q*) with probability of 1−

*ɛ*, for a given probability level

*ɛ*. The quantiles of lower and upper bounds (

*q*and

_{B}*q*) are specified respectively by (Volpi & Fiori 2012): where

_{C}*α*

_{1}+

*α*

_{2}=

*ɛ*. The lower and upper bounds

*q*and

_{B}*q*identify a feasible range on the quantile curve, bounded by the points of coordinates (

_{C}*q*,

_{B}*ζ*(

*q*)) and (

_{B}*q*,

_{C}*ζ*(

*q*)), that excludes the

_{C}*ɛ*percentage in probability of the critical events. The probability levels

*α*

_{1}and

*α*

_{2}can be arbitrarily chosen, taking account of the specific problem under investigation (Volpi & Fiori 2012).

### Bivariate flood quantile selection

For a given bivariate return period *T*, there are countless combinations of *u* and *v* that satisfy Equation (7). To derive the design values of flood peak *q* and flood volume *w*, the unique combination of *u* and *v* (or *q* and *w*) should be determined. Hence besides Equation (7), one more equation that can establish the relationship between *u* and *v* (or *q* and *w*) is necessary. Two combination methods were proposed to derive the quantiles of flood peak and flood volume for given multivariate return periods, and they are now outlined.

### EFC method

With a given bivariate return period *T*, we assume that the flood peak and flood volume have the same probability of occurrence, i.e., *u* = *v* (or *F _{Q}*(

*q*) =

*F*(

_{W}*w*)). This assumption is usually taken as a uniform procedure for the derivations of design flood values and DFH in China (MWR 2006; Xiao

*et al.*2008, 2009; Chen

*et al.*2010). Then, the design frequency of bivariate EFC can be obtained by jointly solve the equation

*u*=

*v*and Equation (7).

*u*and

*v*with the given bivariate return period

*T*is described in Equation (7). Based on the assumption that

*u*

*=*

*v*, the probabilities of occurrence of flood peak and volume (i.e.,

*u*and

*v*) can be estimated by the solution of Equation (18), which is expressed as follows: where , and

*θ*is the dependence parameter of the G–H copula.

### CEC method

*Q*and flood volume

*W*are dependent variables, one may wish to predict the value of

*W*based on an observed value of

*Q*. Let

*g*(

*Q*) be a predictor, i.e.,

*g*∈

*N*= {all Borel functions

*g*with }. Each predictor is assessed by the ‘mean squared prediction error’ . The conditional expectation is the best predictor of

*W*in the sense that (Shao 2003): Herein, during a flood event, when the flood peak

*Q*=

*q*takes place, the conditional expectation is used to estimate the value of flood volume, which can be derived as:

*W*.

*q*and will be the CEC if the following equations are satisfied: The above equation can be solved by trial and error method with different values of

*q*.

## STATISTICAL PROCEDURE

### Monte Carlo simulation method

Since the flood volume *W* is conditioned on the flood peak *Q*, the conditional distribution is employed to generate the flood data series. The samples are generated from a bivariate distribution composed by Pearson type three (P3) marginal distributions (Li *et al.* 2013) and copula function. The Monte Carlo simulation method is based on the algorithm developed by Chebana & Ouarda (2009), Lee & Salas (2011), and Chen *et al.* (2015). The main generating procedures with the conditional copula are summarized as follows:

Fit the marginal distribution

*F*(_{Q}*q*) and*F*(_{W}*w*) under recorded data using the P3 marginal distributions,*u*=*F*(_{Q}*q*) and*v*=*F*(_{W}*w*).Establish the joint distribution

*F*(*q*,*w*) using copulas,*F*(*q*,*w*) =*C*(*u*,*v*), and estimate their parameters based on the Kendall tau method.One independent random value

*r*_{1}is generated from [0, 1] uniform distribution. Let*r*_{1}represent the value of cumulative probability*F*(_{Q}*q*), i.e.,*u*=*F*(_{Q}*q*) =*r*_{1}, then the peak discharge*q*can be calculated using the fitted P3 distribution, i.e., .- Another independent random value
*r*_{2}is also generated from [0, 1] uniform distribution. Let*r*_{2}represent the value of the conditional distribution function , i.e., , then the flood volume*w*can be calculated by . Since the solution of Equation (25) is very complex, it could be simplified using copula function as follows:After solving , the flood volume

*w*can be obtained by . Repeat steps (4)–(5)

*N*times, then the data series of peak discharge*Q*and flood value*W*with size*N*are obtained. The parameters of P3 marginal distributions are estimated by*L*-moment method, respectively. The parameter of copula functions is estimated and the joint distribution is established.The flood volume

*w*is estimated under one given peak discharge*q*using the EFC method and the flood volume*w*is estimated by Equation (23) for the CEC method.Repeat steps (4)–(7)

*m*_{s}times.

### Performance evaluation

The following evaluation criteria were used to evaluate and compare different methods:

*MRE*: Measure of efficiency, denoted by the relative root mean square error,

*RRMSE*: where

*N*is the sample size;

*m*

_{s}is the number of Monte Carlo experiment iterations; is the

*j*th simulated flood volume in the

*i*th experiment; is the

*j*th estimated flood volume derived by the

*j*th simulated peak discharge in the

*i*th experiment.

## APPLICATION

^{2}with an annual average rainfall of 1,500 mm. The annual average discharge and volume at the dam site are 393 m

^{3}/s and 124 × 10

^{8}m

^{3}, respectively. The flood season lasts for 5 months, from May to September. Since it is a multi-purpose reservoir, there is serious conflict between flood prevention and hydropower generation. The reservoir design flood with 1,000-year return period is used as the flood prevention standard. The flood control limiting water level during the flood season and normal water level are 192.2 m and 200 m (a.m.s.l.), respectively. The AM flood peak discharge (

*Q*) and 7-day flood volume (

_{p}*W*

_{7}) are available with a systematic record of 54 years (1951–2004).

### Parameter estimation and hypothesis test

*et al.*2014a, 2014b). The PDF of the P3 distribution is defined as: where

*α*,

*β*, and

*δ*are shape, scale, and location parameters of the P3 distribution, respectively.

*D*

_{n}<

*D*

_{n,0.95}. The marginal distribution frequency curves of flood peak and 7-day flood volumes are shown in Figure 4, in which the line represents the theoretical distribution and the crosses represent empirical probabilities. Figure 4 indicates that these theoretical distributions can fit the observed data reasonably well.

Variables | Sample statistic values | P3 parameters | ||||
---|---|---|---|---|---|---|

Mean | L-C_{v} | L-C_{s} | α | Β | δ | |

Q (m_{p}^{3}/s) | 7,820 | 0.4 | 1.2 | 2.78 | 0.0005 | 2606.7 |

W_{7} (10^{8} m^{3}) | 17 | 0.5 | 1.5 | 1.78 | 0.1569 | 5.7 |

Variables | Sample statistic values | P3 parameters | ||||
---|---|---|---|---|---|---|

Mean | L-C_{v} | L-C_{s} | α | Β | δ | |

Q (m_{p}^{3}/s) | 7,820 | 0.4 | 1.2 | 2.78 | 0.0005 | 2606.7 |

W_{7} (10^{8} m^{3}) | 17 | 0.5 | 1.5 | 1.78 | 0.1569 | 5.7 |

Variables | Chi-square test | K–S test | ||
---|---|---|---|---|

χ^{2}_{0.05} | Chi-square statistics, χ^{2} | D _{n,0.95} | D _{n} | |

Q (m_{p}^{3}/s) | 7.815 | 5.313 | 0.185 | 0.096 |

W_{7} (10^{8} m^{3}) | 9.488 | 3.396 | 0.185 | 0.078 |

Variables | Chi-square test | K–S test | ||
---|---|---|---|---|

χ^{2}_{0.05} | Chi-square statistics, χ^{2} | D _{n,0.95} | D _{n} | |

Q (m_{p}^{3}/s) | 7.815 | 5.313 | 0.185 | 0.096 |

W_{7} (10^{8} m^{3}) | 9.488 | 3.396 | 0.185 | 0.078 |

### Joint distribution of flood peak and volume based on copulas

*τ*), and the expressions which describe the relationship between the estimated parameters and Kendall correlation coefficient are also shown in Table 3. The result of the Kendall tau is 0.66 in Geheyan reservoir. The results of the Cramer–von Mises test and K–S test

*D*for the pair are presented in Table 4, and

_{n}*P*-values were calculated based on the parametric bootstrap or multipliers procedure with 10,000 runs (Kojadinovic

*et al.*2011; Sraj

*et al.*2015). The root mean square error (

*RMSE*) was also calculated and listed in Table 4. The graphical goodness-of-fit test between simulated (sample size 10,000) and observed values is plotted in Figure 5. On the basis of the statistical tests and graphical fitting (Figure 5), it is found that the G–H copula is much more suitable for constructing the joint distribution of peak discharge and flood volume. This finding is in accordance with other authors (e.g., Genest & Mackay 1986; Poulin

*et al.*2007; Sraj

*et al.*2015; Zhang

*et al.*2015b).

Copula | C(_{θ}u, v) | θ ∈ | τ |
---|---|---|---|

G–H | [1, + ∞) | ||

Clayton | (0, + ∞) | ||

Frank | R\{0} |

Copula | C(_{θ}u, v) | θ ∈ | τ |
---|---|---|---|

G–H | [1, + ∞) | ||

Clayton | (0, + ∞) | ||

Frank | R\{0} |

Parameter | Statistical test | |||
---|---|---|---|---|

Copula | θ | RMSE | S^{1}_{n}(p − valve) | D_{n} (D_{n,0.95}) |

G–H | 2.98 | 0.068 | 0.011 (0.44) | 0.035 (0.215) |

Clayton | 3.95 | 0.074 | 0.021 (0.17) | 0.048 (0.215) |

Frank | 9.93 | 0.071 | 0.015 (0.28) | 0.043 (0.215) |

Parameter | Statistical test | |||
---|---|---|---|---|

Copula | θ | RMSE | S^{1}_{n}(p − valve) | D_{n} (D_{n,0.95}) |

G–H | 2.98 | 0.068 | 0.011 (0.44) | 0.035 (0.215) |

Clayton | 3.95 | 0.074 | 0.021 (0.17) | 0.048 (0.215) |

Frank | 9.93 | 0.071 | 0.015 (0.28) | 0.043 (0.215) |

### Bivariate quantile curve and feasible range identification

*T*= 1,000-year, was selected as the bivariate return period and

*T*= 200-year was also chosen for comparison. The bivariate quantile curves of the two return periods are shown in Figure 7. The relationship between density function

*φ*(

*q*) and

*q*is drawn in Figure 8. Even if the G–H copula model is symmetric, the PDF

*φ*(

*q*) is not symmetrical due to the difference in the marginal distributions.

The upper and lower bounds on the level curve were estimated numerically by solving Equations (16) and (17), and assuming for simplicity (although other assumptions are possible) *α*_{1} = *α*_{2} = *ɛ/*2, with *ɛ* = 0.05. The upper and lower bounds are denoted as *B*_{1} and *C*_{1}, respectively, in Figure 7. It is found that the bounds are close to the horizontal asymptote (i.e., *w _{7}* = 61.49 × 10

^{8}m

^{3}for

*T*

*=*1,000 and

*w*= 50.23 × 10

_{7}^{8}m

^{3}for

*T*

*=*200) and vertical asymptote (i.e.,

*q*= 22,800 m

_{p}^{3}/

*s*for

*T*

*=*1,000 and

*q*= 19,300 m

_{p}^{3}/

*s*for

*T*

*=*200) due to the small value assumed for the probability level

*ɛ*. The upper and lower bounds were also calculated by the boundary identification method proposed by Volpi & Fiori (2012). The results are also presented in Table 5 and the derived bounds are denoted as

*B*

_{2}and

*C*

_{2}, as shown in Figure 7. It is shown that the bounds estimated by the proposed method and that proposed by Volpi & Fiori (2012) are very similar.

Boundary identification method | Return period | Lower bound | Upper bound | ||
---|---|---|---|---|---|

Q (m_{p}^{3}/s) | W_{7} (10^{8} m^{3}) | Q (m_{p}^{3}/s) | W_{7} (10^{8} m^{3}) | ||

Volpi & Fiori (2012) | 1,000 | 22,930 | 65.84 | 26,080 | 61.54 |

200 | 19,350 | 50.27 | 22,460 | 55.86 | |

Developed method | 1,000 | 23,000 | 65.76 | 26,100 | 61.52 |

200 | 19,400 | 54.49 | 22,500 | 50.26 |

Boundary identification method | Return period | Lower bound | Upper bound | ||
---|---|---|---|---|---|

Q (m_{p}^{3}/s) | W_{7} (10^{8} m^{3}) | Q (m_{p}^{3}/s) | W_{7} (10^{8} m^{3}) | ||

Volpi & Fiori (2012) | 1,000 | 22,930 | 65.84 | 26,080 | 61.54 |

200 | 19,350 | 50.27 | 22,460 | 55.86 | |

Developed method | 1,000 | 23,000 | 65.76 | 26,100 | 61.52 |

200 | 19,400 | 54.49 | 22,500 | 50.26 |

### Estimation of bivariate flood quantiles

The proposed bivariate EFC and CEC methods were used to estimate flood peak and 7-day flood volume quantiles with return periods of *T* = 1,000 and *T* = 200 years, respectively. For the purpose of comparison, the univariate flood quantiles (called marginal quantiles by Chebana & Ouarda (2011)) were estimated by marginal distributions, assuming that the univariate return periods (*T _{Q}* and

*T*) were equal to the bivariate return period (i.e.,

_{W}*T*). The univariate flood quantiles can be obtained from the equations and . The results of the component-wise excess realization and the most likely realization proposed by Salvadori

_{Q}= T_{W}= T*et al.*(2011) were also estimated. The estimation results of bivariate and univariate quantiles were listed in Table 6. It is shown that the design values of bivariate quantiles are larger than those of univariate quantiles. The quantiles estimated by the four bivariate event selection methods are also shown in Figure 7, and the estimation points of the EFC method are denoted as point E, while the quantiles estimated by the CEC method are denoted as point F. For the results of selection approaches proposed by Salvadori

*et al.*(2011), the events of component-wise excess realization are denoted as point W, and the events of most likely realization are denoted as point L. From Figure 7, we find that the joint design values estimated by the four event selection methods are within the feasible regions. Consequently, the two proposed methods and selection approaches proposed by Salvadori

*et al.*(2011) could be selected as an option of deriving unique flood quantiles, and they could satisfy the inherent law of hydrologic events and have statistical basis to some degree. It can be seen from Table 6 and Figure 7 that the estimated events of the EFC method and that of the most likely realization are similar. The bivariate EFC results have larger flood volume and smaller flood peak than bivariate CEC results. As well, the results estimated by the component-wise excess realization have larger flood peak and smaller flood volume than the other three methods.

T | Method | Q_{p} (m^{3}/s) | W_{7} (×10^{8} m^{3}) | Z_{max} (m) |
---|---|---|---|---|

1,000 | EFC | 23,390 | 63.09 | 202.97 |

CEC | 23,420 | 62.98 | 202.92 | |

Component-wise excess realization | 23,510 | 62.78 | 202.90 | |

Most-likely realization | 23,400 | 63.05 | 202.95 | |

Univariate distribution | 22,800 | 61.49 | 202.58 | |

200 | EFC | 19,800 | 51.87 | 198.10 |

CEC | 20,130 | 51.11 | 197.79 | |

Component-wise excess realization | 20,200 | 51.03 | 197.59 | |

Most-likely realization | 19,940 | 51.50 | 197.82 | |

Univariate distribution | 19,300 | 50.23 | 197.30 |

T | Method | Q_{p} (m^{3}/s) | W_{7} (×10^{8} m^{3}) | Z_{max} (m) |
---|---|---|---|---|

1,000 | EFC | 23,390 | 63.09 | 202.97 |

CEC | 23,420 | 62.98 | 202.92 | |

Component-wise excess realization | 23,510 | 62.78 | 202.90 | |

Most-likely realization | 23,400 | 63.05 | 202.95 | |

Univariate distribution | 22,800 | 61.49 | 202.58 | |

200 | EFC | 19,800 | 51.87 | 198.10 |

CEC | 20,130 | 51.11 | 197.79 | |

Component-wise excess realization | 20,200 | 51.03 | 197.59 | |

Most-likely realization | 19,940 | 51.50 | 197.82 | |

Univariate distribution | 19,300 | 50.23 | 197.30 |

### Results of statistical test

A Monte Carlo experiment was carried out to assess the performance of the proposed combination (EFC and CEC) methods. In order to describe the correlation between *Q* and *W*, the parameter of G–H copula is taken to be *θ* = 2.98, while the Kendall correlation coefficient is 0.66. In order to test the validation of the experiment method, 10,000 samples of flood pair (*Q*-*W*) were generated and the statistic values calculated based on the simulated data and given in Table 7. The Kendall correlation coefficient of the simulated samples was also calculated as 0.63. It can be seen that there is no significant difference between the generated and observed values. Thus, the generated samples could be used for statistical testing hereafter.

Variables | Sample statistic values | ||
---|---|---|---|

Mean | C_{v} | C_{s} | |

Q (m_{p}^{3}/s) | 7,803 | 0.39 | 1.18 |

W_{7} (10^{8}m^{3}) | 16.75 | 0.48 | 1.47 |

Variables | Sample statistic values | ||
---|---|---|---|

Mean | C_{v} | C_{s} | |

Q (m_{p}^{3}/s) | 7,803 | 0.39 | 1.18 |

W_{7} (10^{8}m^{3}) | 16.75 | 0.48 | 1.47 |

*Q*-

*W*) with different sample sizes varying from

*N*

*=*30 to

*N*

*=*100 were generated. All experiment iterations were conducted

*m*

_{s}= 500 times for these simulations. The evaluation criteria of

*RRMSE*and

*MRE*are calculated and plotted in Figure 9. It is shown that with increasing data size

*N*, there is a slight decrease in absolute value of

*RRMSE*and

*MRE*. When the sample size increases from 30 to 100, the

*RRMSE*value decreases from 0.25% to 0.20% by 20% and the absolute value of

*MRE*decreases from 3.3% to 2.5% by 24% for the EFC method. In the case of the CEC method, the

*RRMSE*value decreases by 16% and the absolute value of

*MRE*decreases by 22%. It is shown in Figure 9 that smaller RRMSE and MRE absolute values are estimated by the EFC method. When the sample size is greater than 60 years, the values of

*RRMSE*are almost less than 0.28 and the values of

*MRE*are less than 4.0%; while for the CEC method, the

*RRMSE*is less than 0.30 and the

*MRE*is less than 6.0%. Monte Carlo experimental results show that the EFC method performs a little better than the conditional expectation method.

### DFH based on joint distribution

*et al.*2008): where

*DFH*(

*t*) and

*TFH*(

*t*) are the flood discharges of the DFH and TFH for time

*t*respectively;

*Q*is flood peak discharge of TFH;

_{TFH}*W*is 7-day flood volume of TFH for flood duration

_{TFH}*DT*;

*q*and

*w*are flood peak and 7-day flood volume of bivariate design flood combination, respectively. Nevertheless, other DFH generation methods based on flood peak and volume are also available and can be applied with the bivariate design value combinations.

*p*-level curve strongly depends on the correlation coefficient of

*Q*and

*W.*In the limiting case of full dependence, the level curve reduces to its vertex and the width of the feasible range tends to 0 (Volpi & Fiori 2012). Since the Kendall correlation coefficient between flood peak and 7-day volume in Geheyan reservoir equals to 0.66, the differences of quantiles estimated by EFC and CEC methods are relatively small in this case study.

The DFH rescaled by univariate distribution design values and two realizations proposed by Salvadori *et al.* (2011) was also derived from TFH by Equation (30). These DFHs were routed through the Geheyan reservoir with initial water level (flood control limiting water level, 192.2 m). The corresponding highest reservoir water levels (*Z*_{max}) were calculated and are listed in Table 6.

It is shown in Table 6 that the design values of flood peak and 7-day flood volume obtained by univariate distribution method are both smaller than those obtained by four bivariate methods. The resulting *Z*_{max} of the univariate method is relatively lower than those of bivariate approaches. Since flood events are naturally multivariate phenomena and flood peak and flood volume are mutually correlated, the quantiles estimated by bivariate distribution are more rational than these by univariate distribution (Chebana & Ouarda 2011).

The comparison results listed in Table 6 also show that *Z*_{max} obtained by bivariate EFC method is larger than that obtained by the other three bivariate methods, while the component-wise excess method reaches the lowest *Z*_{max}. The results of *Z*_{max} calculated by most-likely realization are a little lower than the EFC method and the CEC method obtains a slightly higher *Z*_{max} than that of component-wise excess method. Comparing the results of 200-year and 1,000-year return period, it is found that the differences among the four bivariate methods decrease as the return period increases. The water level reaches 202.97 m by the EFC method and is slightly higher than other methods for the 1,000-year return period. Since the Geheyan reservoir has a large amount of flood control storage with annual regulation ability, the design flood volume is relatively more important than peak discharge for flood prevention safety. As a consequence, the bivariate EFC method with slightly larger 7-day flood volume is safer for reservoir design than other methods.

## CONCLUSIONS

The quantile is an important and extensively studied notion in hydrological frequency analysis, but it is not properly addressed in the multivariate frequency analysis frameworks. Inspired by the idea of Chebana & Ouarda (2011) and Volpi & Fiori (2012), a developed approach to derive the feasible range and for estimating bivariate flood quantiles is proposed and applied in the Geheyan reservoir. The following conclusions were drawn from this study:

The developed quantile curve boundary identification method using the normalized joint PDF values to measure the relative occurrence likelihood of a seasonal flood event provides a developed applicable rule to narrow down the infinite number of possible choices for bivariate hydrological design values. The developed methods can be applied to bivariate OR return period and given the probability level

*ɛ*.The quantile combination methods provide a simple but effective way for bivariate quantile estimation with given bivariate return period. The results illustrate that the joint design values estimated by the two proposed combination methods are within the feasible regions, and the EFC method perform satisfactorily.

The application results of Geheyan reservoir show that the bivariate EFC results with slightly larger 7-day flood volume can obtain the highest reservoir water level (

*Z*_{max}) and, therefore, EFC is safer for reservoir design than other methods.

## ACKNOWLEDGEMENTS

The study was financially supported by the National Natural Science Fund of China (51539009) and the National Key Research and Development Plan of China (2016YFC0402206).