A meta-heuristic approach for multivariate design flood quantile estimation incorporating historical information

Design flood quantiles are crucial for hydraulic structure design and water resources planning and management, whereas previous multivariate hydrological quantile estimation methods usually do not consider historical flood information. To overcome this limitation, a meta-heuristic inference function for margins (MHIFM) approach, coupling a meta-heuristic algorithm with a modified inference function for margins (IFM) method, is developed for modeling the joint distributions of flood peak and volumes with incorporation of historical flood information. Then, the most likely realization (MLR) and equivalent frequency combination (EFC) methods are employed for selecting multivariate design floods on a quantile iso-surface. The Danjiangkou reservoir, located in the Hanjiang River basin, the first pilot basin of the most stringent water resources management policy in China, is selected as a case study. Application results indicate that the MHIFM approach performs well for estimating the parameters of marginal and joint distributions; moreover, the MLR method yields safer design flood quantiles than the EFC method in terms of the highest routed reservoir water levels. The proposed MHIFM approach combined with the MLR method is safer and more rational for reservoir design, and provides rich reference information for flood risk assessment, reservoir operation and management.

doi: 10.2166/nh.2018.060

Jiabo Yin, Shenglian Guo (corresponding author), Xushu Wu, Guang Yang, Feng Xiong, Yanlai Zhou
State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan 430072, China
E-mail: slguo@whu.edu.cn


INTRODUCTION
Flood frequency analysis is performed to quantify the risk of flooding at different spatial locations and to provide guidelines for determining the design capacity of flood control structures. The design capacities of spillways, levees and several other flood control structures, as well as reservoir operation and management, depend on the estimated magnitude of design flood quantiles (Xiao et al. ). Conventional hydrological frequency analysis mainly focuses on a single hydrological characteristic, such as peak discharge or flood volume, using a univariate distribution, which is unable to consider the interdependence between correlated hydrological variables and fails to provide a complete description of hydrological events (Favre et al. ). Therefore, multivariate frequency analysis is needed to characterize the interdependent features of random hydrological variables, particularly in designing hydraulic structures such as dams (Li et al. ). Although the copula-based methodology for multivariate frequency analysis is well established and explored, a common problem exists in both univariate and multivariate flood frequency analysis: the gauged record is rarely long enough to yield a reliable estimation of design flood quantiles, especially for high return periods. Furthermore, the existence of cyclic variation over periods longer than the duration of the records introduces further bias (Stedinger & Cohn ; Guo & Cunnane ). To resolve these problems, hydrologists have considerable interest in augmenting flow records, increasing sample size and estimating uncertainty. For example, the peak-over-threshold sampling method has been used to extract flood series, significantly increasing the sample size compared with the annual maximum sampling method (Bhunya et al. ).
Several pioneering approaches, such as regionalization procedures (Reed ), Bayesian methods (Zhang et al. b) and bootstrapping approaches (Yin et al. a, b), have been suggested to reduce design flood estimation uncertainty. A widely acknowledged and reliable technique is to incorporate historical or paleological information into flood frequency analysis, which has been recommended as an official procedure (e.g. NERC ; MWR ). Several estimation methods have been developed for this purpose, such as historically weighted moments, maximum likelihood, probability weighted moments, and L-moments. Although several studies investigating the interdependence between correlated hydrological variables have been conducted, there is little previous work on deriving multivariate joint design floods with incorporation of historical information, which is extremely crucial for decisions about flood control and water resources management, particularly for reservoirs with large regulation capacity. This study develops a meta-heuristic inference function for margins (MHIFM) approach to model the joint distributions of flood peak and volumes with incorporation of historical information. Moreover, we derive general formulae for the most likely realization (MLR) and equivalent frequency combination (EFC) methods for selecting multivariate design flood quantiles under a given joint return period (JRP).
The remainder of this paper is organized as follows. The 'Study area and data' section introduces the systematic record and historical floods in Hanjiang River basin. In the 'Methodology' section, we present the methodology used to estimate the parameters of marginal and multivariate distributions and flood quantile selection methods.
The 'Results and discussion' section addresses a case study in Danjiangkou reservoir. Finally, the 'Conclusions' section summarizes the main results and provides necessary explanations.

STUDY AREA AND DATA

Danjiangkou reservoir
The Hanjiang River, whose basin is the first pilot basin of the most stringent water resources management policy in China, is the largest tributary of the Yangtze River. The river spans 1,570 km and drains a basin area of 159,000 km², which is dominated by a sub-tropical monsoon climate.
The annual precipitation in Hanjiang River basin varies from 700 to 1,100 mm, associated with large spatial and seasonal variability. The rainfall from June to October accounts for 75% of the annual amount, and the convective rainstorms in the early summer and long-lasting frontal rains in the autumn often lead to intense flash flood hazards.
The Danjiangkou reservoir is marked in Figure 1. Historical flood events (listed in Table 1) since 1583 have been quantitatively evaluated by the CWRC.

METHODOLOGY

Copula functions and JRP
This study focuses on flood peak discharge (Q) and flood volumes (W_i, i = 1, 2, ..., m) to investigate the joint probability of flood occurrence by copulas and to derive the design flood hydrographs (DFHs), in which m represents the number of flood volumes needed for estimating DFHs. By Sklar's theorem (Sklar ), the multivariate joint distribution H can be expressed in terms of the univariate marginal distributions and the associated dependence function:

H(q, w_1, ..., w_m) = C_θ(F_Q(q), F_{W_1}(w_1), ..., F_{W_m}(w_m))    (1)

where C_θ is the copula function with dependence parameter vector θ, and F_Q and F_{W_i} are the marginal CDFs. The joint return period (JRP) in the OR case, i.e. the return period of the event that at least one variable exceeds its design value, is:

T_OR = μ / [1 - C_θ(u, v_1, ..., v_m)]    (2)

where μ is the mean inter-arrival time between two consecutive events (in the case of annual maxima μ = 1 year), and u = F_Q(q), v_i = F_{W_i}(w_i).
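The OR-case JRP of Equation (2) can be sketched numerically for the bivariate Gumbel-Hougaard (G-H) copula used later in this study, C_θ(u, v) = exp{-[(-ln u)^θ + (-ln v)^θ]^{1/θ}}. The parameter value θ = 2 and the probability levels below are illustrative assumptions, not the fitted Danjiangkou values.

```python
import math

def gh_copula_cdf(u, v, theta):
    """Bivariate Gumbel-Hougaard copula CDF; theta >= 1 (theta = 1 gives independence)."""
    return math.exp(-((-math.log(u)) ** theta + (-math.log(v)) ** theta) ** (1.0 / theta))

def joint_return_period_or(u, v, theta, mu=1.0):
    """OR-case JRP of Eq. (2): T_OR = mu / (1 - C(u, v)); mu = 1 for annual maxima."""
    return mu / (1.0 - gh_copula_cdf(u, v, theta))

# Example with assumed theta = 2: both marginal non-exceedance probabilities set to 0.99
t_or = joint_return_period_or(0.99, 0.99, theta=2.0)
```

Note that t_or comes out shorter than the marginal 100-year return period, since the OR event occurs as soon as either variable exceeds its design value.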

Univariate maximum likelihood estimation with historical information
In univariate hydrological frequency analysis, several methods for incorporating historical information have been developed. In the annual maximum flood series shown in Figure 2, there are a total of g known floods. Of these, l floods are known to be the largest during the maximum historical textual period of n years, while c large floods occurred during the systematic record (recently gauged data) of s years (c ≤ l, c < s, s ≤ n and g = s + l - c). A fixed threshold X_0 is assumed to be exceeded by the l largest floods and not exceeded by any of the remaining n - l floods. Note that the r (r = l - c) floods in the pre-gauging period of h (h = n - s) years are known because they are included in the l floods that exceed X_0, and it is assumed that no other floods exceed the threshold during the pre-gauging period.
Let f_X and F_X denote the probability density function (PDF) and the CDF of variable X, respectively. The likelihood function for the mixed sample of s + r known and h - r unknown data is given as:

L(α) = [F_X(X_0)]^{h-r} ∏_{i=1}^{s+r} f_X(x_i)    (3)

where α is the parameter vector of f_X and F_X.
Since c flood events exceeding the perception threshold X 0 occur during the systematic period s (analogously to the sketch in Figure 2), the c events are removed from the period s and are treated as historical data (Bayliss & Reed ).
Then, Equation (3) can be expressed as:

L(α) = [F_X(X_0)]^{n-l} ∏_{i=1}^{s-c} f_X(x_i) ∏_{k=1}^{l} f_X(x_k)    (4)

where x_k (k = 1, ..., l) are the l floods exceeding the threshold X_0.
The log-likelihood function for the univariate distribution can be expressed as (Li et al. ):

ln L(α) = (n - l) ln F_X(X_0) + Σ_{i=1}^{s-c} ln f_X(x_i) + Σ_{k=1}^{l} ln f_X(x_k)    (5)

The maximum likelihood estimates are the values of α that maximize Equation (5).
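As a sketch, the univariate log-likelihood with historical floods can be coded and maximized numerically. For simplicity, a Gumbel margin stands in for the P-III distribution adopted in this study, and the sample, threshold and period lengths below are hypothetical.

```python
import numpy as np
from scipy import stats, optimize

def log_lik_hist(params, sys_below, exceedances, x0, n):
    """Log-likelihood of Eq. (5) for a Gumbel margin (stand-in for P-III).
    sys_below: the s - c systematic floods below the threshold x0
    exceedances: the l floods exceeding x0 (systematic + historical)
    n: length of the historical period in years"""
    loc, scale = params
    if scale <= 0:
        return -np.inf
    m = stats.gumbel_r(loc=loc, scale=scale)
    l = len(exceedances)
    return ((n - l) * m.logcdf(x0)          # n - l non-exceedances of x0
            + m.logpdf(sys_below).sum()     # gauged floods below the threshold
            + m.logpdf(exceedances).sum())  # the l largest floods

# Hypothetical sample: 50 gauged years plus two historical exceedances
# over an assumed n = 150-year historical period.
rng = np.random.default_rng(42)
sample = stats.gumbel_r(loc=10000, scale=3000).rvs(size=50, random_state=rng)
x0 = 18000.0
sys_below = sample[sample <= x0]
exceed = np.append(sample[sample > x0], [19500.0, 21000.0])
res = optimize.minimize(lambda p: -log_lik_hist(p, sys_below, exceed, x0, 150),
                        x0=[9000.0, 2500.0], method="Nelder-Mead")
loc_hat, scale_hat = res.x
```

The same construction applies to any margin with a closed-form CDF; only `stats.gumbel_r` would be swapped out.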

MHIFM approach
The inference function for margins (IFM) method is the prevailing two-stage technique for estimating the parameters of copula models. Let q^j and w_i^j (j = 1, ..., s - c) denote the systematic data of flood peak (Q) and volumes (W_i), respectively; g^k and z_i^k (k = 1, ..., l) denote the flood peak and volumes of the l largest floods with the same length, respectively; and Q_0 (or W_{0i}) is the fixed threshold exceeded by the l largest flood peaks (or volumes) and not exceeded by any of the remaining n - l flood peaks (or volumes). Furthermore, let f_Q(q) and f_{W_i}(w_i) denote the univariate marginal PDFs, and F_Q(q) and F_{W_i}(w_i) the corresponding CDFs. For the bivariate case (m = 1), referring to Equation (5), the likelihood function with historical floods for the bivariate joint distribution can be described as (Li et al. ):

L(α_1, α_2, θ) = [C_θ(F_Q(Q_0), F_{W_1}(W_{01}))]^{n-l} ∏_{j=1}^{s-c} c_θ(F_Q(q^j), F_{W_1}(w_1^j)) f_Q(q^j) f_{W_1}(w_1^j) ∏_{k=1}^{l} c_θ(F_Q(g^k), F_{W_1}(z_1^k)) f_Q(g^k) f_{W_1}(z_1^k)    (6)

where c_θ is the copula density function, and α_1 and α_2 denote the parameter vectors of F_Q(q) and F_{W_1}(w_1), respectively.
Then, the log-likelihood function for the bivariate joint distribution can be expressed as:

ln L(α_1, α_2, θ) = (n - l) ln C_θ(F_Q(Q_0), F_{W_1}(W_{01})) + Σ_{j=1}^{s-c} [ln c_θ(F_Q(q^j), F_{W_1}(w_1^j)) + ln f_Q(q^j) + ln f_{W_1}(w_1^j)] + Σ_{k=1}^{l} [ln c_θ(F_Q(g^k), F_{W_1}(z_1^k)) + ln f_Q(g^k) + ln f_{W_1}(z_1^k)]    (7)

Similarly, for the multivariate case, the likelihood function with historical floods for the joint distribution can be derived as:

L(α_1, ..., α_{m+1}, θ) = [C_θ(F_Q(Q_0), F_{W_1}(W_{01}), ..., F_{W_m}(W_{0m}))]^{n-l} ∏_{j=1}^{s-c} f(q^j, w_1^j, ..., w_m^j) ∏_{k=1}^{l} f(g^k, z_1^k, ..., z_m^k)    (8)

Under the assumption that the marginal distributions are continuous with probability density functions, the joint PDF can be expressed by copulas as:

f(q, w_1, ..., w_m) = c_θ(F_Q(q), F_{W_1}(w_1), ..., F_{W_m}(w_m)) f_Q(q) ∏_{i=1}^{m} f_{W_i}(w_i)    (9)

where c_θ is the copula density function of Q, W_1, ..., W_m.
Then, the log-likelihood function for the multivariate joint distribution can be expressed as:

L = (n - l) ln C_θ(F_Q(Q_0), F_{W_1}(W_{01}), ..., F_{W_m}(W_{0m})) + Σ_{j=1}^{s-c} ln f(q^j, w_1^j, ..., w_m^j) + Σ_{k=1}^{l} ln f(g^k, z_1^k, ..., z_m^k)    (10)

in which the m + 1 log-likelihood functions for the univariate marginal distributions are:

L_1 = ln L(α_1), L_2 = ln L(α_2), ..., L_{m+1} = ln L(α_{m+1})    (11)

each defined as in Equation (5). The parameter estimation procedure thus consists of m + 1 separate optimizations of the marginal likelihoods, after which the multivariate likelihood is optimized as a function of the dependence parameter vector θ.
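For the bivariate case, the copula part of this log-likelihood (the marginal log-density terms are fixed at the dependence-estimation stage of IFM) might be sketched for the G-H copula as follows; the probability values and period lengths used in the example are placeholders, not fitted quantities.

```python
import numpy as np

def gh_log_density(u, v, theta):
    """Log-density of the bivariate Gumbel-Hougaard copula (theta >= 1)."""
    x, y = -np.log(u), -np.log(v)
    s = x ** theta + y ** theta
    a = s ** (1.0 / theta)   # so that C(u, v) = exp(-a)
    return (-a - np.log(u * v) + (theta - 1) * np.log(x * y)
            + (1.0 / theta - 2) * np.log(s) + np.log(a + theta - 1))

def bivariate_hist_loglik(theta, u_sys, v_sys, u_hist, v_hist, u0, v0, n):
    """Dependence-stage log-likelihood with historical floods.
    u_*, v_* are marginal CDF values of peaks/volumes; (u0, v0) the thresholds;
    n is the historical period length in years."""
    l = len(u_hist)
    x0, y0 = -np.log(u0), -np.log(v0)
    log_c0 = -(x0 ** theta + y0 ** theta) ** (1.0 / theta)  # ln C(u0, v0)
    return ((n - l) * log_c0
            + gh_log_density(np.asarray(u_sys), np.asarray(v_sys), theta).sum()
            + gh_log_density(np.asarray(u_hist), np.asarray(v_hist), theta).sum())

ll = bivariate_hist_loglik(2.0, [0.3, 0.5, 0.7], [0.35, 0.45, 0.75],
                           [0.95, 0.98], [0.96, 0.97], 0.9, 0.9, 120)
```

Maximizing this function over θ (with the margins held fixed) completes the second IFM stage.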
To overcome these deficiencies, a meta-heuristic algorithm is incorporated into the modified IFM method, constituting the MHIFM approach; the simulated annealing algorithm (SAA) is adopted here. At each temperature, an initial configuration is generated and used to evaluate the corresponding objective function value E_a (the energy value). Then, a small displacement is imposed on the current configuration to obtain a new objective function value E_b. If E_b < E_a, the new configuration is accepted as the best solution. Otherwise, it is accepted with the probability:

p(δE) = exp(-δE/(KT))    (12)

where δE = E_b - E_a, T is the current temperature and K is the Boltzmann constant. If this probability is higher than a random value, the configuration is accepted even though it is not better than the previous solution. Once the system has reached equilibrium at the current temperature T, the algorithm reduces the temperature gradually until the system reaches its minimum temperature. In this study, the optimization problem in SAA is a constrained single-objective optimization minimizing the opposite of the log-likelihood functions L_1, L_2, ..., L_{m+1} or L separately in Equations (10) and (11), i.e.

min E = -L_i (i = 1, ..., m + 1) or min E = -L    (13)

The SAA procedure is as follows:

(1) Initialize the annealing temperature T.
(2) Set a variation interval for the parameters of distribution and generate initial solution configuration randomly.
(3) Calculate the value of the objective function E_a (e.g. -L_a).
(4) Generate a new solution configuration by imposing a small displacement on the current configuration and evaluate the value of the objective function E_b (e.g. -L_b).
(5) If δE = E_b - E_a < 0, the new solution is accepted. Otherwise, accept the new solution with a probability p(δE).
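The steps above can be sketched as a minimal SAA, assuming a geometric cooling schedule and Gaussian displacements (both illustrative choices, not settings specified in this study), applied to minimizing a generic negative log-likelihood over box-bounded parameters.

```python
import math
import random

def simulated_annealing(neg_ll, bounds, t0=1.0, t_min=1e-4, alpha=0.9, n_iter=100, seed=0):
    """Minimal SAA sketch of steps (1)-(5); neg_ll is the objective E = -L."""
    rng = random.Random(seed)
    # (1)-(2): initial temperature and a random configuration in the variation interval
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    e_a = neg_ll(x)  # (3): objective value of the current configuration
    best_x, best_e = list(x), e_a
    t = t0
    while t > t_min:
        for _ in range(n_iter):
            # (4): small random displacement, clipped to the variation interval
            y = [min(hi, max(lo, xi + rng.gauss(0.0, 0.1 * (hi - lo))))
                 for xi, (lo, hi) in zip(x, bounds)]
            e_b = neg_ll(y)
            d_e = e_b - e_a
            # (5): accept downhill moves; uphill with probability exp(-dE/T)
            if d_e < 0 or rng.random() < math.exp(-d_e / t):
                x, e_a = y, e_b
                if e_a < best_e:
                    best_x, best_e = list(x), e_a
        t *= alpha  # geometric cooling until t_min is reached
    return best_x, best_e

best_x, best_e = simulated_annealing(lambda p: (p[0] - 2.0) ** 2, [(-10.0, 10.0)])
```

In the MHIFM setting, `neg_ll` would be the negative of each marginal log-likelihood and then of the joint log-likelihood, optimized separately.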

EFC method
After modelling the joint distribution with incorporation of historical information by the developed MHIFM approach, we need to estimate the design flood quantiles, which are not uniquely determined: any combination on the quantile iso-surface satisfies the given JRP. To meet engineering needs, a unique combination of flood peak and volumes should be determined.
Hence, besides Equation (2), one or more constraints characterizing the interdependence between hydrological variables are necessary. The EFC method recommended by the Chinese design flood guidelines (MWR ) is widely employed owing to its ease of implementation; it assumes that the flood peak and volumes have the same frequency. As shown in Figure 3, the curve on which F_Q(q) = F_{W_1}(w_1) is depicted as a chosen constraint on the critical level curve, and intersection point A gives the unique estimator of the EFC method. Moreover, point D in Figure 3 indicates the design flood quantiles estimated by the conventional univariate method, in which only the marginal distribution is used under the assumption that the design frequency of each variable equals the joint design frequency.
Taking the asymmetric Gumbel-Hougaard (A-GH) copula in the trivariate framework as an example, the relationship between the joint distribution C_θ(u, v_1, v_2) and the trivariate JRP T_OR can be expressed as:

T_OR = μ / [1 - C_θ(u, v_1, v_2)] = μ / {1 - exp(-[((-ln u)^{θ_2} + (-ln v_1)^{θ_2})^{θ_1/θ_2} + (-ln v_2)^{θ_1}]^{1/θ_1})}    (14)

where θ_1 and θ_2 are the parameters of the A-GH copula function.
Based on the assumption that u = v_1 = v_2, Equation (14) reduces to 1 - u^{(2^{θ_1/θ_2} + 1)^{1/θ_1}} = μ/T_OR, so the probabilities of occurrence of the design flood quantiles (i.e. u, v_1 and v_2) can be estimated by the solution of Equation (14), which is expressed as follows:

u = v_1 = v_2 = (1 - μ/T_OR)^{(2^{θ_1/θ_2} + 1)^{-1/θ_1}}    (15)

It should be noted that other copulas with more complicated structures may not admit such a closed-form solution and require numerical methods.

MLR method

For a given JRP, the most likely flood event (q*, w_1*, ..., w_m*) on the quantile iso-surface is derived by selecting the event with the highest joint probability density:

max f(q, w_1, ..., w_m) subject to C_θ(F_Q(q), F_{W_1}(w_1), ..., F_{W_m}(w_m)) = 1 - μ/T_OR    (16)

The Lagrange multiplier method transforms this constrained problem into the unconstrained maximization of:

G = f(q, w_1, ..., w_m) + λ[C_θ(F_Q(q), F_{W_1}(w_1), ..., F_{W_m}(w_m)) - (1 - μ/T_OR)]    (17)

where λ is the Lagrange multiplier.
Setting the first-order derivatives of the Lagrangian function to zero yields the maximum. For the bivariate case, the following conditions should be satisfied:

∂G/∂q = c_u(u, v_1) f_Q(q)² f_{W_1}(w_1) + c(u, v_1) f'_Q(q) f_{W_1}(w_1) + λ C_u(u, v_1) f_Q(q) = 0
∂G/∂w_1 = c_{v_1}(u, v_1) f_{W_1}(w_1)² f_Q(q) + c(u, v_1) f'_{W_1}(w_1) f_Q(q) + λ C_{v_1}(u, v_1) f_{W_1}(w_1) = 0    (18)

where c_u = ∂c(u, v_1)/∂u, c_{v_1} = ∂c(u, v_1)/∂v_1, C_u = ∂C(u, v_1)/∂u, C_{v_1} = ∂C(u, v_1)/∂v_1, and f'_Q(q) and f'_{W_1}(w_1) are the derivative functions of f_Q(q) and f_{W_1}(w_1), respectively.
For the (m + 1)-dimensional case, the analogous system of first-order conditions should be satisfied:

∂G/∂q = 0, ∂G/∂w_i = 0 (i = 1, ..., m)    (19)

subject to C_θ(u, v_1, ..., v_m) = 1 - μ/T_OR    (20)

where G is the Lagrangian function. The nonlinear Equation (20) can be solved by numerical methods, such as the incomplete Jacobian Newton method proposed by Liu & Ni ().
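As a sketch of how the MLR selection can be carried out numerically for the bivariate case, the snippet below maximizes the joint density c_θ(u, v) f_Q(q) f_{W_1}(w_1) along the iso-line C_θ(u, v) = 1 - μ/T_OR for a G-H copula, by direct constrained search rather than the Newton scheme above. The marginal distributions (Gumbel) and all parameter values are hypothetical stand-ins, not the fitted P-III margins of this study.

```python
import numpy as np
from scipy import stats, optimize

THETA = 2.0                                   # hypothetical G-H dependence parameter
PEAK = stats.gumbel_r(loc=10000, scale=3000)  # hypothetical flood-peak margin
VOL = stats.gumbel_r(loc=50, scale=15)        # hypothetical 7-day-volume margin

def gh_cdf(u, v, th=THETA):
    s = (-np.log(u)) ** th + (-np.log(v)) ** th
    return np.exp(-s ** (1.0 / th))

def gh_density(u, v, th=THETA):
    """Bivariate G-H copula density c(u, v)."""
    x, y = -np.log(u), -np.log(v)
    s = x ** th + y ** th
    return (gh_cdf(u, v, th) / (u * v) * (x * y) ** (th - 1)
            * s ** (1.0 / th - 2) * (s ** (1.0 / th) + th - 1))

def mlr_event(t_or, mu=1.0):
    """Most likely (peak, volume) event on the iso-line C(u, v) = 1 - mu/T_OR."""
    p = 1.0 - mu / t_or
    def v_of(u):  # solve C(u, v) = p for v at fixed u
        return optimize.brentq(lambda v: gh_cdf(u, v) - p, p + 1e-12, 1 - 1e-12)
    def neg_joint_pdf(u):  # joint PDF as in the copula decomposition, negated
        v = v_of(u)
        return -(gh_density(u, v) * PEAK.pdf(PEAK.ppf(u)) * VOL.pdf(VOL.ppf(v)))
    res = optimize.minimize_scalar(neg_joint_pdf, bounds=(p + 1e-6, 1 - 1e-6),
                                   method="bounded")
    u = res.x
    return PEAK.ppf(u), VOL.ppf(v_of(u))

q_star, w_star = mlr_event(100.0)  # 100-year OR-case MLR event
```

The one-dimensional search exploits the fact that, on a bivariate iso-line, v is determined by u, which avoids solving the full stationarity system (18).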

RESULTS AND DISCUSSION

Parameter estimation for marginal distributions
For annual maximum flood series, the Pearson type three (P-III) distribution has been recommended by MWR () as a uniform procedure for flood frequency analysis in China. The PDF of the P-III distribution is defined as:

f(x) = [β^α / Γ(α)] (x - δ)^{α-1} exp[-β(x - δ)], x > δ    (21)

where α, β and δ are the shape, scale and location parameters of the P-III distribution, respectively, and Γ(·) is the gamma function.
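Because the P-III distribution is a gamma distribution shifted by the location parameter δ, its quantiles can be evaluated with a standard gamma implementation; the parameter values below are illustrative assumptions, not the fitted Danjiangkou values.

```python
from scipy import stats

# Hypothetical P-III parameters: shape alpha, rate beta, location delta
alpha_, beta_, delta_ = 4.0, 0.002, 5000.0

# P-III as a gamma distribution: shape a = alpha, scale = 1/beta, loc = delta
pearson3 = stats.gamma(a=alpha_, scale=1.0 / beta_, loc=delta_)

# 100-year univariate design quantile (annual non-exceedance probability 0.99)
q100 = pearson3.ppf(1.0 - 1.0 / 100.0)
```

The same frozen-distribution object supplies the PDF and CDF values needed in the likelihood terms of Equations (5)-(10).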
The series of flood peak discharge (Q), 7-day flood volume (W_1) and 15-day flood volume (W_2) were considered for the Danjiangkou reservoir following a previous study (Xu et al. ). The goodness-of-fit results in Table 2 show that the null hypothesis could not be rejected at the 5% significance level because χ² < χ²_{0.05}, and the P-III distributions also pass the K-S test because D_KS is below the critical value. Table 3 also indicates that the bivariate G-H copula has the smallest RMSE and AIC values for constructing the joint distribution of Q-W_1, and the trivariate asymmetric G-H copula performs best for Q-W_1-W_2. Therefore, the bivariate G-H and trivariate asymmetric G-H copulas are used to model the dependence between the annual maximum flood peak and volumes in this study. Figure 5 shows the bivariate joint CDFs of flood peak and volumes estimated by the MHIFM and IFM methods, respectively; the CDFs constructed by the G-H copula are continuous.

Bivariate design flood quantile estimation
After constructing the joint distribution by the MHIFM and IFM methods, the EFC and MLR methods are used to estimate the bivariate design flood quantiles of flood peak and 7-day volume under JRPs of T_OR = 200, 100, 50, 20 and 10 years, respectively. For comparison, the univariate method considering historical floods (Chinese design flood guidelines) was also used to estimate the design flood quantiles by assuming that the univariate return period is equal to the bivariate return period. Figure 6 shows the bivariate quantile curves and the estimated design flood quantiles under the G-H copula.
To assess the rationality of the design flood quantiles estimated by the EFC and MLR methods, the 95% confidence interval (CI) bounds were derived. The estimated quantiles of the univariate and bivariate methods are listed in Table 4. Comparing the results estimated by the MHIFM and IFM approaches in Table 4 and Figure 6 shows that the design flood quantiles selected by the EFC and MLR methods under the IFM method are both underestimated compared with the MHIFM method. This is because the conventional IFM method fails to consider the historical floods, which leads to underestimation in flood frequency analysis. Table 4 also demonstrates that the estimated flood quantiles under the IFM method are less than the univariate estimations when the return period exceeds 20 years. However, for the 10-year return period, both the EFC and MLR design events under the IFM method are larger than the univariate design floods. The design flood hydrographs were then routed through the reservoir, and the highest reservoir water levels (Z_max) are listed in Table 5.
It is shown in Table 5 that Z_max obtained by the trivariate MHIFM approach is higher than that obtained by the conventional IFM approach.
The comparison results listed in Table 5 also show that the MLR method yields higher Z_max than the EFC method under the same JRP. This may be due to the fact that the Danjiangkou reservoir has a large amount of flood control storage with annual regulation ability, and therefore the design flood volume is relatively more important than peak discharge for flood prevention safety. As a consequence, the trivariate MLR method, with slightly larger 7- and 15-day flood volumes, is safer for reservoir design than the EFC method.

CONCLUSIONS

(1) The investigated data series consist of the systematic record and six historical flood events, and the fitting results indicate that the developed MHIFM approach, which incorporates historical information into multivariate distributions, performs well for estimating the parameters of marginal and joint distributions.

(2) The conventional IFM method that fails to consider historical floods produces underestimated design flood quantiles compared with the MHIFM approach, and is also inferior to the univariate method with incorporation of historical information, implying that the conventional IFM method is unable to satisfy the flood prevention standard. The reservoir routing results show that Z_max obtained by the trivariate MHIFM approach is higher than that obtained by the conventional IFM approach, and the trivariate MLR method yields safer results for reservoir design than the EFC method.
(3) The developed MHIFM approach associated with the MLR method provides a more comprehensive and effective way to incorporate historical information into multivariate quantile estimation, particularly when the recorded data series is relatively short, and offers rich information in engineering practice as a reference for flood risk assessment, reservoir operation and management.