Abstract

In this paper, we study uncertainty in estimating extreme floods of the Dongting Lake basin, China. We used three methods, including the Delta, profile likelihood function (PLF), and the Bayesian Markov chain Monte Carlo (MCMC) methods, to calculate confidence intervals of parameters of the generalized extreme value (GEV) distribution and quantiles of extreme floods. The annual maximum flow (AMF) data from four hydrologic stations were selected. Our results show that AMF data from Taoyuan and Xiangtan stations followed the Weibull class distribution, while the data from Shimen and Taojiang stations followed the Fréchet class distribution. The three methods show similar confidence intervals of design floods for short return periods. However, there are large differences between results of the Delta and the other two methods for long return periods. Both PLF and Bayesian MCMC methods have similar confidence intervals to reflect the uncertainty of design floods. However, because the PLF method is quite burdensome in computation, the Bayesian MCMC method is more suitable for practical use.

INTRODUCTION

The main interest in flood frequency analysis (FFA) is to estimate flood design values, which play an important role in design and management of water resources (Katz et al. 2002; Strupczewski et al. 2007; Liang et al. 2011, 2018; Obeysekera & Salas 2014; Arnaud et al. 2017). Statistical modeling is important for FFA, in which a probability distribution is usually selected to fit the historical flood data series (Arnaud et al. 2017; Wang et al. 2017). Commonly used models for FFA include, but are not limited to, the lognormal, Pearson type III, log-Pearson type III, and the generalized extreme value (GEV) distributions (Obeysekera & Salas 2014). However, these methods are highly dependent on the hydrologic data. For example, long data series can improve the rigorousness of the hydrologic frequency calculation. However, in practice, the measured hydrologic data are generally not long enough to meet the requirements of design precision. This leads to uncertainty in hydrologic frequency calculation. Therefore, uncertainty assessment in FFA has become one of hydrology's active areas of research (O'Connell et al. 2002; Reis & Stedinger 2005; Liang et al. 2011; Halbert et al. 2016; Arnaud et al. 2017; Xue et al. 2018).

Uncertainty is usually illustrated by a confidence interval in hydrology (Dupuis & Field 1998; Kyselý 2008, 2009; Obeysekera & Salas 2014; Arnaud et al. 2017; Wang et al. 2017). Prior works have investigated different methods for computing confidence intervals for parameters and design floods from probabilistic models (Dupuis & Field 1998; Kuczera 1999; O'Connell 2005; Kyselý 2008; Bolívar-Cimé et al. 2015).

The Delta method is one of the most frequently used methods in uncertainty evaluation. The confidence intervals obtained by this method are known as asymptotic maximum likelihood confidence intervals (Dupuis & Field 1998; Coles 2001; Reis & Stedinger 2005; Bolívar-Cimé et al. 2015). However, they have important shortcomings for small and moderate samples. The asymptotic maximum likelihood confidence intervals are symmetric around the maximum likelihood estimate (MLE) and do not consider the significant asymmetry of the likelihood surface for small and moderate samples. Therefore, the Delta method may underestimate the true values of large quantiles of interest (Bolívar-Cimé et al. 2015).

The profile likelihood function (PLF) method is a more robust and accurate method for determining the uncertainty of quantiles (Coles 2001). However, it is less common in hydrology because it lacks a probabilistic formalism for return level values (Coles & Pericchi 2003; Yoon et al. 2009) and is computationally burdensome (Obeysekera & Salas 2014). The confidence intervals for quantiles obtained by the PLF method take into account the asymmetry of the likelihood surface and can therefore yield reasonable inferences for the observed samples (Bolívar-Cimé et al. 2015). This method has attracted increasing attention in recent years (Tajvidi 2003; Virtanen & Uusipaikka 2008; Lu et al. 2013; Bolívar-Cimé et al. 2015; Wang et al. 2016). Lu et al. (2013) used the PLF method to analyze the uncertainty of hydro-meteorological extremes. Bolívar-Cimé et al. (2015) compared the coverage frequencies of three types of intervals for large quantiles and found the profile likelihood intervals of quantiles to be optimal for samples of sizes of . Wang et al. (2016) adopted the PLF method to evaluate uncertainties in extreme flood estimations of the upper Yangtze River. They found that the PLF method reflects the uncertainty of design floods better than the Delta method.

The Bayesian Markov chain Monte Carlo (MCMC) method is another technique for estimating uncertainties. Many researchers have suggested using Bayesian method for estimating credible intervals of extreme values (Kuczera 1999; O'Connell et al. 2002; Reis & Stedinger 2005; Ribatet et al. 2007; Liang et al. 2011; Wang et al. 2017). The Bayesian method can provide a more accurate description of flood risk and parameter uncertainties, because it generates samples directly from the posterior distribution and does not need any approximation. The credible intervals for parameters or any function of the parameters obtained by the Bayesian method are more easily interpreted than the confidence intervals in classical statistics (Reis & Stedinger 2005; Viglione et al. 2013). Kuczera (1999) suggested a Monte Carlo Bayesian method for computing the expected probability distribution and quantile confidence limits for any flood frequency distribution. Reis & Stedinger (2005) found that Bayesian MCMC method provided an accurate description of large flood records and historical flood information and their uncertainty. Liang et al. (2011) proposed a new FFA method which is based on Bayesian MCMC and considered uncertainty of both the model and the parameters.

The Delta, PLF, and Bayesian MCMC methods are three likelihood-based methods for uncertainty assessment. Most prior studies have focused on only one, and in some cases, two of the methods. However, cross comparison between PLF and Bayesian MCMC methods has yet to be reported in the literature. The goal of this paper is to investigate and compare the performances of the Delta, PLF, and Bayesian MCMC methods in uncertainty assessment of extreme flood estimation.

The extreme value theory (EVT) is commonly used to evaluate the characteristics of extreme events, and has been widely used in hydrology and meteorology (Katz et al. 2002). The GEV distribution and generalized Pareto distribution (GPD) are two standard distributions used in EVT to describe the behavior of extremes (Coles 2001). For a given time series, if a sequence of maximum values is selected from blocks or periods of equal length (e.g., annual maximum), then the probability distribution of the selected values will converge asymptotically to the GEV distribution under the assumption of being independent and identically distributed (Jenkinson 1955). This approach is called the block maxima method (Coles 2001). In this study, the GEV distribution is used to model flood extremes, because it has the ability to capture a wide range of extreme tail behaviors and has been widely used in previous hydrological studies (Katz et al. 2002; Morrison & Smith 2002; Yoon et al. 2009; Wang et al. 2016).

The Dongting Lake basin is located in the middle and lower reaches of the Yangtze River region of China. This area belongs to the middle part of ‘three easily disaster-affected zones’ of China, and has become one of the most severe disaster regions especially for flood and drought (Xiong et al. 2009). Floods have become one of the greatest obstacles to the sustainable development of agriculture in Dongting Lake basin. The objectives of this study are mainly: (1) to quantify uncertainty in extreme flood estimation of the Dongting Lake basin and (2) to compare the performances of the Delta, PLF, and Bayesian MCMC methods in evaluating parameters and quantiles of the GEV distribution.

STUDY AND DATA

Study area

The Dongting Lake basin includes the whole of Hunan province and parts of Hubei, Guizhou, and Guangxi provinces in China, with a total area of $2.63 \times 10^5$ km$^2$ (Figure 1). The river system in this basin is well developed, and the Lishui, Yuanjiang, Zishui, and Xiangjiang are its four main rivers. The basin lies in the subtropical monsoon climate zone, with a wet season from July to September and a dry season between November and February (Yuan et al. 2016). Precipitation in the rainy season, from April to September, accounts for more than 65% of the annual precipitation. Heavy rains often cause local floods in the region.

Figure 1

Location of the Dongting Lake basin and hydrologic stations.


In this paper, the upper and middle reaches of the Dongting Lake basin were chosen as our study area. This includes the Dongting Lake area, as well as the Lishui, Yuanjiang, Zishui, and Xiangjiang rivers, with a total area of $1.738 \times 10^5$ km$^2$ (82.7% of total area of the Dongting Lake basin) (Xiong et al. 2009). The Shimen, Taoyuan, Taojiang, and Xiangtan stations are the control stations for the Lishui, Yuanjiang, Zishui, and Xiangjiang rivers, respectively. Figure 1 shows the study area, the main rivers, and the hydrologic stations.

Data

The observed annual maximum flow (AMF) data recorded at Shimen, Taoyuan, Taojiang, and Xiangtan stations were used to conduct an analysis of flood frequency in Dongting Lake basin. Table 1 shows information of the four selected hydrologic stations. Data were obtained from the Hydrology and Water Resources Survey Bureau of Hunan Province. All the data are quality controlled before their release.

Table 1

Information of hydrologic stations

| Hydrologic station | Corresponding river | Catchment area (km$^2$) | Length of data series |
| --- | --- | --- | --- |
| Shimen | Lishui | 18,496 | 1951–2014 |
| Taoyuan | Yuanjiang | 89,163 | 1953–2014 |
| Taojiang | Zishui | 28,142 | 1951–2014 |
| Xiangtan | Xiangjiang | 94,660 | 1951–2012 |

METHODOLOGY

Test for stationarity

The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test is performed on the AMF data to test for stationarity (Kwiatkowski et al. 1992). Let $y_t$, $t = 1, 2, \ldots, n$, be an observed time series (i.e., AMF data) for which we wish to test stationarity. It is assumed to be the sum of a deterministic trend, a random walk, and a stationary error:
$$ y_t = \beta t + r_t + \varepsilon_t, \qquad r_t = r_{t-1} + u_t \tag{1} $$
where $\beta$ is a deterministic trend coefficient, $\varepsilon_t$ is a stationary error, $r_t$ is a random walk, and $u_t$ is white noise with variance $\sigma_u^2$. If a time series is stationary around a deterministic trend, then the null hypothesis is $\sigma_u^2 = 0$. In the other case, if a time series is stationary around a fixed level, the trend term is dropped ($\beta = 0$) and the null hypothesis is again $\sigma_u^2 = 0$.
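As a rough illustration of the 'level' form of the test, the sketch below computes the KPSS statistic as the sum of squared partial sums of the demeaned series, scaled by $n^2$ times a long-run variance estimate. The function name `kpss_level_statistic` and the `lags` argument (Bartlett-window truncation) are assumptions made for this example; the paper does not state which lag rule was used. The value 0.463 is the 5% critical value quoted in the Results.

```python
def kpss_level_statistic(y, lags=0):
    """KPSS statistic for stationarity around a fixed level (illustrative)."""
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]  # residuals from the fitted level

    # Partial sums S_t of the residuals
    partial, running = [], 0.0
    for v in resid:
        running += v
        partial.append(running)

    # Long-run variance with Bartlett weights; lags=0 gives the plain variance
    lrv = sum(v * v for v in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        autocov = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * autocov

    return sum(s * s for s in partial) / (n * n * lrv)


CRITICAL_LEVEL_5PCT = 0.463  # 5% critical value for the 'level' test

trending = [1.0, 2.0, 3.0, 4.0, 5.0]
stat = kpss_level_statistic(trending)
print(stat, "reject" if stat > CRITICAL_LEVEL_5PCT else "do not reject")
```

A statistic below the critical value, as reported for all four stations, means the null hypothesis of stationarity is not rejected.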

Generalized extreme value distribution and its quantile

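The GEV distribution was introduced by Jenkinson (1955) as a model for extreme values. The form of the cumulative distribution function is as follows:
$$ G(x) = \exp\left\{ -\left[ 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\xi} \right\} \tag{2} $$
where $\mu$, $\sigma$, and $\xi$ are the location, scale, and shape parameters, respectively. Here, $1 + \xi (x - \mu)/\sigma > 0$, $-\infty < \mu < \infty$, $\sigma > 0$, $-\infty < \xi < \infty$. The GEV distribution is equivalent to a Weibull class distribution if $\xi < 0$, to a Fréchet class distribution if $\xi > 0$, and to a Gumbel class distribution if $\xi = 0$ (taken as the limit $\xi \to 0$).
The quantile ($x_p$) of the GEV distribution is given by:
$$ x_p = \begin{cases} \mu - \dfrac{\sigma}{\xi} \left[ 1 - \{ -\ln(1 - p) \}^{-\xi} \right], & \xi \neq 0 \\ \mu - \sigma \ln \{ -\ln(1 - p) \}, & \xi = 0 \end{cases} \tag{3} $$
where this quantile is an inversion of the GEV distribution, $G(x_p) = 1 - p$. In flood frequency analysis, $x_p$ is the return level of the design flood flow associated with the return period $T = 1/p$.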
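The distribution function and its inversion can be sketched in a few lines. The function names `gev_cdf` and `gev_return_level` are chosen for this example only, and the sign convention assumed is the one implied by the text (negative shape gives the bounded Weibull class). The usage lines plug in the rounded ML estimates for Shimen station from Table 3.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function; xi = 0 is treated as the Gumbel limit."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Return level x_p for return period T, with p = 1/T."""
    p = 1.0 / T
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Rounded ML estimates for Shimen station (Table 3)
for T in (10, 50, 100):
    print(T, round(gev_return_level(T, 5449.0, 2633.0, 0.042)))
```

Because the inputs are rounded, the results agree with the Delta-method quantiles in Table 4 (reported in thousands) only up to rounding.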

Maximum likelihood estimation and Delta-based confidence interval

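Suppose $X_1, \ldots, X_n$ are independent and identically distributed random variables from a GEV distribution, and $x_1, \ldots, x_n$ is a sample of n observations. The log-likelihood function of the GEV distribution (when $\xi \neq 0$) is (Coles 2001):
$$ \ell(\mu, \sigma, \xi) = -n \ln \sigma - \left( 1 + \frac{1}{\xi} \right) \sum_{i=1}^{n} \ln \left[ 1 + \xi \left( \frac{x_i - \mu}{\sigma} \right) \right] - \sum_{i=1}^{n} \left[ 1 + \xi \left( \frac{x_i - \mu}{\sigma} \right) \right]^{-1/\xi} \tag{4} $$
provided that $1 + \xi (x_i - \mu)/\sigma > 0$ for all $i$, and it becomes (when $\xi = 0$):
$$ \ell(\mu, \sigma) = -n \ln \sigma - \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right) - \sum_{i=1}^{n} \exp \left\{ -\left( \frac{x_i - \mu}{\sigma} \right) \right\} \tag{5} $$

Maximization of Equations (4) and (5) with respect to the parameter vector $\theta = (\mu, \sigma, \xi)$ leads to the ML estimate $\hat{\theta} = (\hat{\mu}, \hat{\sigma}, \hat{\xi})$.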
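A minimal sketch of the log-likelihood in Equations (4) and (5) follows. The name `gev_loglik` and the threshold used to switch to the Gumbel form are assumptions for illustration; any numerical optimizer could then be applied to the negative of this function to obtain the ML estimate. The sample values are made up and are not AMF data.

```python
import math


def gev_loglik(params, data):
    """GEV log-likelihood for params = (mu, sigma, xi)."""
    mu, sigma, xi = params
    if sigma <= 0.0:
        return -math.inf  # scale must be positive
    n = len(data)
    if abs(xi) < 1e-8:
        # Gumbel limit, Equation (5)
        z = [(x - mu) / sigma for x in data]
        return -n * math.log(sigma) - sum(z) - sum(math.exp(-v) for v in z)
    total = -n * math.log(sigma)
    for x in data:
        t = 1.0 + xi * (x - mu) / sigma
        if t <= 0.0:
            return -math.inf  # observation outside the support
        total -= (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total


sample = [3.1, 4.7, 5.2, 6.8, 9.4]  # made-up values for illustration
print(gev_loglik((5.0, 2.0, 0.1), sample))
```

Returning negative infinity outside the admissible region lets a generic optimizer or sampler reject such parameter values without special handling.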

The ML confidence intervals for the parameters of interest are usually calculated by using the Delta method. The main steps for the Delta method are as follows (Coles 2001; Lu et al. 2013; Wang et al. 2016):

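  1. Calculate the variance-covariance matrix V for the parameter estimators. The observed information matrix can be calculated as $I_O(\theta) = -\partial^2 \ell(\theta) / \partial \theta \, \partial \theta^{\mathrm{T}}$, where $\theta = (\mu, \sigma, \xi)$ is the parameter vector of the GEV distribution. V is equal to the inverse of $I_O(\hat{\theta})$ when $\xi > -0.5$, where $\hat{\theta}$ is the ML estimator of $\theta$. The GEV distribution has a very short bounded upper tail when $\xi \le -0.5$, but this case is rarely encountered in applications of extreme value modeling.

  2. Calculate the confidence intervals of $\theta_i$. The standard errors of $\hat{\theta}_i$ can be obtained as the square roots of the diagonal elements of matrix V. Denoting the terms of matrix V by $v_{ij}$, the approximate $(1 - \alpha)$ confidence interval of $\theta_i$ can be calculated as follows:
    $$ \hat{\theta}_i \pm z_{\alpha/2} \sqrt{v_{ii}} \tag{6} $$
    where $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution.
  3. Calculate the variance of the ML estimator $\hat{x}_p$, $\mathrm{Var}(\hat{x}_p) \approx \nabla x_p^{\mathrm{T}} V \nabla x_p$, where
    $$ \nabla x_p^{\mathrm{T}} = \left[ \frac{\partial x_p}{\partial \mu}, \frac{\partial x_p}{\partial \sigma}, \frac{\partial x_p}{\partial \xi} \right] = \left[ 1, \; -\xi^{-1} \left( 1 - y_p^{-\xi} \right), \; \sigma \xi^{-2} \left( 1 - y_p^{-\xi} \right) - \sigma \xi^{-1} y_p^{-\xi} \ln y_p \right] \tag{7} $$
    with $y_p = -\ln(1 - p)$, evaluated at $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$.
  4. Calculate the approximate confidence interval of the design flood flows based on the asymptotic normality of ML estimation,
    $$ \hat{x}_p \pm z_{\alpha/2} \sqrt{\mathrm{Var}(\hat{x}_p)} \tag{8} $$
    where $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution.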
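The steps above can be sketched as follows, assuming a nonzero shape parameter. The helper names (`return_level`, `return_level_gradient`, `delta_variance`, `wald_interval`) are hypothetical, and the variance-covariance matrix `V` must be supplied by the user, since the paper reports only standard errors. The usage lines reproduce the 95% interval of the Shimen location parameter from its estimate and SE in Table 3.

```python
import math


def return_level(p, mu, sigma, xi):
    """GEV quantile x_p for exceedance probability p (xi != 0)."""
    y = -math.log(1.0 - p)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def return_level_gradient(p, sigma, xi):
    """Gradient of x_p with respect to (mu, sigma, xi), valid for xi != 0."""
    y = -math.log(1.0 - p)
    d_mu = 1.0
    d_sigma = -(1.0 - y ** (-xi)) / xi
    d_xi = (sigma * (1.0 - y ** (-xi)) / xi ** 2
            - sigma * y ** (-xi) * math.log(y) / xi)
    return [d_mu, d_sigma, d_xi]


def delta_variance(grad, V):
    """Quadratic form grad^T V grad for a 3x3 covariance matrix V."""
    return sum(grad[i] * V[i][j] * grad[j]
               for i in range(3) for j in range(3))


def wald_interval(estimate, se, z=1.959964):
    """Symmetric normal-approximation interval (z defaults to 95%)."""
    return estimate - z * se, estimate + z * se


# Shimen location parameter: estimate 5449, SE 372 (Table 3)
print(wald_interval(5449.0, 372.0))
```

The interval is symmetric about the estimate by construction, which is the limitation of the Delta method noted in the Introduction.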

Confidence interval estimation based on the profile likelihood function

The profile log-likelihood for parameter $\theta_i$ is first defined as (Coles 2001):
$$ \ell_p(\theta_i) = \max_{\theta_{-i}} \ell(\theta_i, \theta_{-i}) \tag{9} $$
where $\theta_{-i}$ denotes all components of $\theta$ excluding $\theta_i$, and $\ell(\theta_i, \theta_{-i})$ is the log-likelihood for $\theta$. That is, for each value of $\theta_i$, the profile log-likelihood is the maximized log-likelihood with respect to all other components of $\theta$.

The profile likelihood function (PLF) method can be used to obtain confidence intervals of particular parameters, such as $\xi$. It can also be used to calculate the confidence interval of the quantile $x_p$ (Coles 2001; Lu et al. 2013; Wang et al. 2016).

To obtain the profile likelihood estimate for $\xi$, we first fix $\xi$ to a constant ($\xi = \xi_0$). Then we maximize Equation (4) with respect to the remaining parameters ($\mu$ and $\sigma$), and we repeat this procedure for a range of $\xi_0$. The corresponding maximized values constitute the profile log-likelihood function $\ell_p(\xi)$. By maximizing the profile log-likelihood function, one obtains the profile likelihood estimator $\hat{\xi}$. The approximate $(1 - \alpha)$ confidence interval of $\xi$ can be calculated as:
$$ \left\{ \xi : 2 \left[ \ell_p(\hat{\xi}) - \ell_p(\xi) \right] \le \chi^2_{1, 1-\alpha} \right\} \tag{10} $$
where $\chi^2_{1, 1-\alpha}$ is the $(1 - \alpha)$ quantile of the Chi-square distribution with 1 degree of freedom.
The PLF method can also be applied to calculate the confidence interval of the quantile $x_p$, which is a combination of the parameters $\mu$, $\sigma$, and $\xi$. For this purpose, we can re-parameterize the GEV distribution by introducing $x_p$ into the GEV distribution. For example,
$$ \mu = x_p + \frac{\sigma}{\xi} \left[ 1 - \{ -\ln(1 - p) \}^{-\xi} \right] \tag{11} $$

By replacing $\mu$ in Equation (4) with Equation (11), we obtain the log-likelihood function of the GEV distribution for the parameters $x_p$, $\sigma$, and $\xi$. The profile likelihood estimator and the PLF confidence interval of $x_p$ are then obtained in the same way.
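Once a profile log-likelihood curve has been evaluated on a grid, the interval is read off where the curve crosses a horizontal line half a Chi-square quantile below its maximum. The sketch below covers only that last step, using linear interpolation between grid points. The function name `profile_interval` and the quadratic test curve are assumptions for illustration, and the inner maximization over the remaining parameters is not shown.

```python
CHI2_1_95 = 3.841459  # 0.95 quantile of the chi-square distribution, 1 d.o.f.


def profile_interval(grid, profile_loglik, chi2_quantile=CHI2_1_95):
    """Return (lower, upper) where the profile curve crosses max - chi2/2.

    If the curve never drops below the cutoff on one side, the grid edge
    is returned for that side, which signals that the grid is too narrow.
    """
    best = max(profile_loglik)
    cutoff = best - 0.5 * chi2_quantile
    peak = profile_loglik.index(best)

    def crossing(i, j):
        # Linear interpolation between grid points i and j bracketing the cutoff
        li, lj = profile_loglik[i], profile_loglik[j]
        return grid[i] + (cutoff - li) * (grid[j] - grid[i]) / (lj - li)

    lower = grid[0]
    for i in range(peak, 0, -1):
        if profile_loglik[i - 1] < cutoff <= profile_loglik[i]:
            lower = crossing(i - 1, i)
            break

    upper = grid[-1]
    for i in range(peak, len(grid) - 1):
        if profile_loglik[i + 1] < cutoff <= profile_loglik[i]:
            upper = crossing(i, i + 1)
            break
    return lower, upper


# Quadratic (symmetric) curve centred at 0.1 as a stand-in for a real profile
grid = [-0.2 + 0.001 * k for k in range(601)]
curve = [-((g - 0.1) ** 2) / (2 * 0.05 ** 2) for g in grid]
print(profile_interval(grid, curve))
```

For a symmetric quadratic curve the result coincides with the normal-approximation interval. The two methods differ only when the curve is asymmetric.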

Parameter estimation and credible interval estimation based on Bayesian MCMC

Unlike classical statistical inference, in the Bayesian framework the parameter $\theta$ is treated as a random variable, and its probability density function is given based on expert knowledge (i.e., prior experience) or historical data (this is the prior density of $\theta$). Bayes' theorem is then used to combine this prior information with the sample information. For given observations $x = (x_1, \ldots, x_n)$, Bayes' theorem states that:
$$ \pi(\theta \mid x) = \frac{L(\theta \mid x) \, \pi(\theta)}{\int_{\Theta} L(\theta \mid x) \, \pi(\theta) \, d\theta} \tag{12} $$
where $\pi(\theta)$ and $\pi(\theta \mid x)$ are the prior and posterior probability densities of parameter $\theta$, respectively. $L(\theta \mid x)$ is the likelihood function of samples x, and $\Theta$ is the parameter space of parameter $\theta$. In Bayesian inference, the posterior distribution is applied to make inferences about parameter $\theta$. In this study, the mean of the posterior distribution is used as the point estimate of parameter $\theta$, and the credible interval of the posterior distribution is regarded as an interval estimate of $\theta$.

The integral in the denominator of Equation (12) serves as a normalization constant, and it is computed on the whole parameter space . However, in most cases, the normalization constant cannot be easily computed. In this study, the MCMC method is employed to deal with this intractable problem. The MCMC generates samples directly from the posterior distribution without computing the normalization constant. Several MCMC algorithms have been used in flood hydrology research (Kuczera 1999; O'Connell 2005; Reis & Stedinger 2005; Ribatet et al. 2007; Liang et al. 2011). In this work, we used the Metropolis–Hastings algorithm (Metropolis et al. 1953; Hastings 1970) as it is one of the widely used MCMC methods (O'Connell et al. 2002; Reis & Stedinger 2005; Lee & Kim 2008; Yoon et al. 2009).
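A generic random-walk Metropolis–Hastings sampler can be sketched as below. It needs the posterior only up to a constant, which is why the normalization constant never has to be computed. The function name `metropolis_hastings`, the Gaussian proposal, and the step sizes are assumptions for illustration. In the GEV setting, `log_post` would be the log-likelihood plus the log prior densities. Here a standard normal target is used so the output is easy to check.

```python
import math
import random


def metropolis_hastings(log_post, start, step_sizes, n_iter, seed=0):
    """Random-walk Metropolis-Hastings with independent Gaussian proposals.

    log_post: function returning the log posterior up to an additive constant
    start:    initial parameter vector (must have finite log posterior)
    """
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step_sizes)]
        proposal_lp = log_post(proposal)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(list(current))
    return chain, accepted / n_iter


# Toy target: standard normal density, known only up to a constant
chain, rate = metropolis_hastings(lambda th: -0.5 * th[0] ** 2,
                                  start=[0.0], step_sizes=[1.0], n_iter=20000)
draws = [state[0] for state in chain[500:]]  # discard a burn-in period
print(sum(draws) / len(draws), rate)
```

The step sizes control the acceptance rate and would need tuning for each parameter in a real application.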

Two types of prior distributions are used in practice: data and non-data based prior distributions (Lee & Kim 2008; Wang et al. 2017). The former is obtained from the existing data and available results, while the latter comes from subjective judgments or theoretical considerations. In most practical cases, it is difficult to obtain the prior knowledge about the parameters. Therefore, a non-informative prior distribution is often adopted which is a special case of non-data based prior distribution. In such cases, non-informative priors with large variances are usually used to reflect this prior ignorance (Coles 2001). This study adopts independent zero-mean normal prior distributions on , and , with variances . This is a common prior distribution elicitation in Bayesian approach to hydrological processes (Coles 2001; Yoon et al. 2009; Chikobvu & Chifurira 2015; Liu et al. 2016; Liang et al. 2018). All of these are summarized below: 
formula
(13)
 
formula
(14)
 
formula
(15)
 
formula
(16)
Prediction is straightforward under the Bayesian framework. If z denotes future flood design values, then the predictive density of z, given the observed values x, is given by:
$$ f(z \mid x) = \int_{\Theta} f(z \mid \theta) \, \pi(\theta \mid x) \, d\theta \tag{17} $$

Compared to other predictive approaches, the predictive density has the advantage of reflecting uncertainty in the model through $\pi(\theta \mid x)$ and also uncertainty due to the variability in future observations through $f(z \mid \theta)$ (Coles 2001; Yoon et al. 2009).

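For a given return period $T$, the design flood $z_T$ is defined by solving:
$$ \Pr\{ Z \le z_T \mid x \} = \int_{\Theta} F(z_T \mid \theta) \, \pi(\theta \mid x) \, d\theta = 1 - \frac{1}{T} \tag{18} $$
where $F(z \mid \theta)$ is the cumulative distribution function of Z given that $\theta$ is the true parameter vector.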

RESULTS

Test for stationarity of the AMF data

We used the KPSS test to examine the stationarity of AMF data at Shimen, Taoyuan, Taojiang, and Xiangtan stations. The KPSS ‘level’ test statistics of the four stations are 0.238, 0.079, 0.225, 0.055, while the KPSS ‘trend’ test statistics are 0.036, 0.068, 0.132, 0.055, which all are less than the critical values 0.463 and 0.146 at the 5% significance level. The results do not reject the null hypothesis of ‘level’ or ‘trend’ stationarity. Therefore, all the AMF data series can be considered as stationary.

Distribution selection of the AMF data

First, three widely used flood frequency distributions in China, including Pearson type III (P-III), GEV, and lognormal (LN) distribution, were employed as the candidate flood distributions. The fitting quality of the candidate distributions was assessed by two goodness-of-fit tests: Kolmogorov–Smirnov (K-S) (Yevjevich 1972) and Anderson–Darling (A-D) (Stephens 1974) tests.

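The K-S statistic is represented as:
$$ D = \max_{x} \left| F_n(x) - F(x) \right| \tag{19} $$
The A-D statistic is defined as:
$$ A^2 = n \int_{-\infty}^{\infty} \frac{\left[ F_n(x) - F(x) \right]^2}{F(x) \left[ 1 - F(x) \right]} \, dF(x) \tag{20} $$
which for the ordered sample $x_{(1)} \le \cdots \le x_{(n)}$ is computed as:
$$ A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln F(x_{(i)}) + \ln \left( 1 - F(x_{(n+1-i)}) \right) \right] \tag{21} $$
where n is the sample size, $F_n(x)$ is the empirical distribution function, and $F(x)$ is the cumulative distribution function of interest. The distribution with the lowest statistic is considered the best fitting model.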
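Both statistics can be computed directly from the ordered sample. The sketch below uses the standard computational forms. The function names `ks_statistic` and `ad_statistic` are chosen for this example, and `cdf` stands for any fitted distribution function (GEV, P-III, or LN). A uniform distribution is used in the usage lines only because the result is easy to verify by hand.

```python
import math


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(sample, cdf):
    """Anderson-Darling statistic A^2 in its computational form."""
    xs = sorted(sample)
    n = len(xs)
    s = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_(i); xs[n - i] is x_(n+1-i)
        s += (2 * i - 1) * (math.log(cdf(xs[i - 1]))
                            + math.log(1.0 - cdf(xs[n - i])))
    return -n - s / n


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


sample = [0.25, 0.5, 0.75]
print(ks_statistic(sample, uniform_cdf), ad_statistic(sample, uniform_cdf))
```

The A-D statistic weights the tails more heavily than the K-S statistic, which is relevant when the interest is in extreme quantiles.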

Table 2 shows that the GEV distribution has the lowest statistic values in both test methods, except for Shimen station, where the K-S statistic of the GEV distribution is slightly larger than that of the LN distribution. However, the A-D statistic value of the GEV distribution is the lowest among the three distributions. We selected the GEV distribution as the best distribution for all the four stations.

Table 2

Goodness-of-fit for AMF distribution based on K-S and A-D tests (the lowest statistic value is shown in bold)

| Stations | Test methods | GEV | P-III | LN |
| --- | --- | --- | --- | --- |
| Shimen | K-S | 0.0725 | 0.0711 | **0.0701** |
| | A-D | **0.2635** | 0.2655 | 0.2713 |
| Taoyuan | K-S | **0.0831** | 0.0979 | 0.1233 |
| | A-D | **0.3012** | 0.3746 | 0.6600 |
| Taojiang | K-S | **0.0685** | 0.0814 | 0.0838 |
| | A-D | **0.2822** | 0.3590 | 0.3778 |
| Xiangtan | K-S | **0.0673** | 0.0701 | 0.0819 |
| | A-D | **0.3445** | 0.4040 | 0.4479 |

Parameter estimation and goodness-of-fit test of the GEV distribution

The ML estimators of the GEV parameters at all four stations were obtained by maximizing Equations (4) and (5) with respect to the parameter vector $\theta = (\mu, \sigma, \xi)$. Substituting these estimates into Equation (3) gave the design floods for return periods of 10, 50, and 100 years. Table 3 shows the ML estimators of the GEV parameters for each station. The shape parameters of Taoyuan and Xiangtan stations are less than zero, which indicates that the two sample series can be modeled by a Weibull class distribution (a short tail and bounded above). The shape parameters of Shimen and Taojiang stations are slightly greater than zero, indicating that the sample series of these two stations follow the Fréchet class distribution. The design floods under different return periods are shown in Table 4. For Taoyuan and Xiangtan stations, the design floods are comparatively large, and both of their design floods for the 50-year return level are larger than . The design floods are largest at Taoyuan station and smallest at Taojiang station for all return periods.

Table 3

Estimators and 95% confidence intervals of GEV parameters (Delta method)

| Stations | Location estimator | Scale estimator | Shape estimator | Location SE | Location CI (×10³) | Scale SE | Scale CI (×10³) | Shape SE | Shape CI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Shimen | 5449 | 2633 | 0.042 | 372 | [4.72, 6.18] | 295 | [2.05, 3.21] | 0.113 | [−0.180, 0.264] |
| Taoyuan | 14069 | 4867 | −0.233 | 686 | [12.72, 15.42] | 540 | [3.81, 5.93] | 0.104 | [−0.438, −0.029] |
| Taojiang | 4477 | 1853 | 0.008 | 258 | [3.97, 4.98] | 198 | [1.47, 2.24] | 0.096 | [−0.180, 0.195] |
| Xiangtan | 11198 | 3704 | −0.239 | 543 | [10.13, 12.26] | 446 | [2.83, 4.58] | 0.133 | [−0.502, 0.023] |

Table 4

Quantiles and 95% confidence intervals of AMF (Delta method)

| Stations | Quantile, 10 years | Quantile, 50 years | Quantile, 100 years | CI, 10 years | CI, 50 years | CI, 100 years |
| --- | --- | --- | --- | --- | --- | --- |
| Shimen | 11.66 | 16.61 | 18.81 | [9.80, 13.53] | [11.92, 21.30] | [12.25, 25.37] |
| Taoyuan | 22.59 | 26.54 | 27.81 | [20.81, 24.38] | [23.41, 29.67] | [23.87, 31.74] |
| Taojiang | 8.68 | 11.82 | 13.16 | [7.50, 9.87] | [9.15, 14.48] | [9.54, 16.77] |
| Xiangtan | 17.64 | 20.59 | 21.53 | [16.30, 18.98] | [17.88, 23.30] | [18.01, 25.05] |

We also estimated the parameters of the GEV distribution by the Bayesian MCMC method, using the Metropolis–Hastings algorithm with a random walk proposal on the three GEV parameters. To make the Markov chains converge faster, the maximum likelihood estimates of the parameters were used as the initial values. For each parameter, 10,000 samples were generated from the posterior distribution, and the first 500 were discarded as the burn-in period, although all chains converge near the initial values. Therefore, 9,500 samples were used in the computation of the mean and credible interval of each parameter.
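The post-processing of a chain described above (discard the burn-in, then take the mean and an interval from the remaining draws) can be sketched as follows. The function name `summarize_chain` is hypothetical, and the equal-tailed interval with linear interpolation between order statistics is an assumption, since the paper does not state which type of credible interval was used.

```python
def summarize_chain(draws, burn_in=500, level=0.95):
    """Posterior mean and equal-tailed credible interval after burn-in."""
    kept = sorted(draws[burn_in:])
    n = len(kept)
    mean = sum(kept) / n

    def quantile(q):
        # Linear interpolation between neighbouring order statistics
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return kept[lo] + (pos - lo) * (kept[hi] - kept[lo])

    tail = (1.0 - level) / 2.0
    return mean, (quantile(tail), quantile(1.0 - tail)), n


# Placeholder chain: 500 burn-in values followed by 9,500 retained values
chain = [999.0] * 500 + [float(k) for k in range(9500)]
mean, interval, n_used = summarize_chain(chain)
print(mean, interval, n_used)
```

The same function can be applied to return levels by first converting each retained parameter draw to a quantile with Equation (3).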

Tables 5 and 6 show the parameter estimation results and the design floods for return periods of 10, 50, and 100 years using the Bayesian MCMC method. From Tables 3 and 5, it can be seen that the location and scale parameters estimated by the MLE and Bayesian MCMC methods are close, with the largest differences of 128 and 51, respectively, at Xiangtan station (discrepancies of 1.14% and 1.38%). All shape parameters estimated by the Bayesian MCMC method are slightly larger than the ML estimates. However, the classes of the selected distributions at the four stations are exactly the same as with the ML method. Table 6 shows that the largest design floods for all return periods are at Taoyuan station and the smallest values are at Taojiang station. These results are similar to those obtained with the ML method.

Table 5

Estimators and 95% credible intervals of GEV parameters (Bayesian MCMC method)

| Stations | Location estimator | Scale estimator | Shape estimator | Location CI (×10³) | Scale CI (×10³) | Shape CI |
| --- | --- | --- | --- | --- | --- | --- |
| Shimen | 5478 | 2671 | 0.053 | [4.74, 6.22] | [2.15, 3.29] | [−0.137, 0.280] |
| Taoyuan | 14043 | 4843 | −0.196 | [12.69, 15.35] | [3.94, 5.95] | [−0.369, 0.019] |
| Taojiang | 4455 | 1855 | 0.037 | [3.97, 4.98] | [1.51, 2.28] | [−0.117, 0.249] |
| Xiangtan | 11326 | 3755 | −0.223 | [10.28, 12.41] | [3.05, 4.65] | [−0.444, 0.030] |

Table 6

Quantiles and 95% credible intervals of AMF (Bayesian MCMC method) (unit:)

| Stations | Quantile, 10 years | Quantile, 50 years | Quantile, 100 years | CI, 10 years | CI, 50 years | CI, 100 years |
| --- | --- | --- | --- | --- | --- | --- |
| Shimen | 11.91 | 17.36 | 20.00 | [10.26, 14.39] | [13.67, 24.66] | [14.93, 30.69] |
| Taoyuan | 22.88 | 27.43 | 29.02 | [21.12, 25.26] | [24.71, 32.89] | [25.72, 36.38] |
| Taojiang | 8.83 | 12.40 | 14.04 | [7.76, 10.33] | [10.19, 16.74] | [11.10, 20.40] |
| Xiangtan | 18.00 | 21.30 | 22.44 | [16.69, 19.83] | [19.21, 26.00] | [19.82, 28.81] |

To compare the Bayesian estimation and MLE fitting effects, two methods were employed to test for goodness-of-fit: (1) root-mean-square error (RMSE) and (2) K-S statistic.

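The RMSE is expressed as:
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( x_i^{\mathrm{obs}} - x_i^{\mathrm{com}} \right)^2 } \tag{22} $$
where n is the number of values, $x_i^{\mathrm{obs}}$ denotes the observed value, and $x_i^{\mathrm{com}}$ denotes the computed value. The closer the RMSE is to zero, the more accurate the model.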
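For completeness, a short sketch of the RMSE calculation follows. The function name `rmse` and the example values are assumptions for illustration. How the computed values are paired with the observations (for example, through plotting positions) is not specified in the text and is left to the caller.

```python
import math


def rmse(observed, computed):
    """Root-mean-square error between paired observed and computed values."""
    if len(observed) != len(computed):
        raise ValueError("observed and computed must have the same length")
    n = len(observed)
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, computed)) / n)


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```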

Table 7 shows the RMSE and K-S goodness-of-fit statistics. All GEV models based on Bayesian MCMC estimation and MLE passed the K-S test (significance level ), indicating that the theoretical GEV distribution fits all observed series appropriately. At Shimen, Taoyuan, and Taojiang stations, the RMSE of the Bayesian MCMC estimation is slightly larger than that of MLE, with the largest difference of 0.0651 at Shimen station, so MLE fits slightly better at these stations. At Xiangtan station, the RMSE of the Bayesian MCMC estimation is slightly smaller than that of MLE, indicating that the Bayesian fit is slightly better there. In general, both methods give almost the same fitting results. This conclusion is in agreement with prior work by Lee & Kim (2008) and Wang et al. (2017).

Table 7

Goodness-of-fit test for Bayesian estimation and MLE

| Stations | Statistics | Bayesian MCMC | MLE |
| --- | --- | --- | --- |
| Shimen | K-S statistic | 0.0755 | 0.0725 |
| | RMSE | 0.4677 | 0.4026 |
| Taoyuan | K-S statistic | 0.0841 | 0.0831 |
| | RMSE | 0.5650 | 0.5516 |
| Taojiang | K-S statistic | 0.0724 | 0.0685 |
| | RMSE | 0.2654 | 0.2508 |
| Xiangtan | K-S statistic | 0.0797 | 0.0673 |
| | RMSE | 0.5284 | 0.5338 |

Uncertainty analysis based on the Delta method

We calculated the standard errors (SE) of the estimators $\hat{\mu}$, $\hat{\sigma}$, and $\hat{\xi}$ by the Delta method, and then the approximate 95% confidence intervals of the parameters $\mu$, $\sigma$, and $\xi$ by Equation (6). Table 3 shows the SE and 95% confidence intervals of the GEV parameters. Because the half-width of the interval in Equation (6) is proportional to the SE, a larger SE gives a wider confidence interval, indicating higher uncertainty in the parameter estimate. For the location and scale parameters, the largest SE and widest confidence interval are at Taoyuan station, while the smallest are at Taojiang station. For the shape parameter, the SE is largest at Xiangtan station and smallest at Taojiang station.

The quantiles and corresponding confidence intervals at different return periods were calculated by Equations (3) and (8). Table 4 shows the approximate 95% confidence intervals for return periods of 10, 50, and 100 years. Figure 2 compares the widths of the confidence intervals at different return periods. For the Shimen, Taoyuan, Taojiang, and Xiangtan stations, the widths of the confidence intervals of AMF for the 50-year return period are 2.5, 1.8, 2.3, and 2.0 times that of the 10-year return period, and those for the 100-year return period are 3.5, 2.2, 3.1, and 2.6 times that of the 10-year return period, respectively. The results show that the return period has a large effect on the width of the confidence intervals. This is because the information provided by the data decreases as the return period increases, resulting in an increase of uncertainty in FFA.

Figure 2

Comparisons for width of confidence intervals for different return periods obtained by the Delta method.


Figure 3 shows the return level plots of AMF for the four stations. The upper and lower curves are the upper and lower limits of the 95% confidence intervals of the design floods. As shown in Figure 3, almost all sample data are within the range of the confidence intervals and around the middle curves of the Delta method. However, all the lower curves of the Delta method show a downward trend beyond a return period of 100 years. This problem is reported in prior work by Coles et al. (2003) and Reis & Stedinger (2005). Lack of sampling data on extreme floods and the limitations of the Delta method are the two main reasons for this observation.

Figure 3

Return level plots of AMF with 95% confidence/credible interval estimated by the Delta, PLF, and Bayesian MCMC methods: (a) Shimen, (b) Taoyuan, (c) Taojiang, and (d) Xiangtan. In each case, the middle line represents the design flood estimate, while the upper and lower curves are the upper and lower limits of the 95% confidence/credible intervals of the design floods. The dots are empirical estimates of design floods.


Uncertainty analysis based on the PLF method

Profile log-likelihood function curves of the shape parameter $\xi$ of the GEV distribution are shown in Figure 4 for the four studied stations. The abscissa of the peak point of each curve in Figure 4 is the estimate of $\xi$, and the left and right intersections between the curve and the lower horizontal line are the lower and upper limits of the 95% confidence interval of $\xi$. The resulting 95% confidence intervals for $\xi$ at Shimen, Taoyuan, Taojiang, and Xiangtan stations are [−0.146,0.278], [−0.404,0.020], [−0.137,0.233], and [−0.478,0.028], respectively. These confidence intervals are slightly different from those obtained by the Delta method.

Figure 4

PLF curve for the shape parameter of the GEV distribution: (a) Shimen, (b) Taoyuan, (c) Taojiang, and (d) Xiangtan. The abscissas at which the curve intersects the lower horizontal line are the lower and upper limits of the 95% confidence interval.
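The way a profile likelihood curve yields a confidence interval for the shape parameter can be sketched as follows. This is an illustrative sketch only: the sample is synthetic, the grid search and the Nelder-Mead optimizer are choices made for this example, and `xi` denotes the shape parameter in the Coles (2001) sign convention (SciPy's `genextreme` uses `c = -xi`). The horizontal cut-off line lies half the 95% quantile of the chi-square distribution with one degree of freedom (about 1.92) below the peak.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2, genextreme

def gev_nll(params, xi, x):
    """Negative GEV log-likelihood in (mu, sigma) for a fixed shape xi."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll = genextreme.logpdf(x, c=-xi, loc=mu, scale=sigma).sum()
    return -ll if np.isfinite(ll) else np.inf

def profile_loglik(xi, x):
    """Maximise the log-likelihood over (mu, sigma) with xi held fixed."""
    # A scale equal to the sample range keeps the start inside the support for |xi| <= 1.
    start = [np.mean(x), x.max() - x.min()]
    res = minimize(gev_nll, start, args=(xi, x), method="Nelder-Mead")
    return -res.fun

# Synthetic annual maxima (hypothetical, not the station records).
rng = np.random.default_rng(0)
x = genextreme.rvs(c=-0.1, loc=10.0, scale=3.0, size=60, random_state=rng)

grid = np.linspace(-0.5, 0.7, 49)
prof = np.array([profile_loglik(xi, x) for xi in grid])

cutoff = prof.max() - 0.5 * chi2.ppf(0.95, df=1)  # horizontal line below the peak
inside = grid[prof >= cutoff]                     # grid points within the interval
xi_hat = grid[prof.argmax()]
print(f"xi_hat ~ {xi_hat:.3f}, 95% PLF interval ~ [{inside.min():.3f}, {inside.max():.3f}]")
```

The interval endpoints are resolved only to the grid spacing. A root finder applied to `profile_loglik(xi, x) - cutoff` on each side of the peak would refine them.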

Figure 5 shows the PLF curves for design floods of AMF for return periods of 10, 50, and 100 years at Taoyuan station. All PLF curves in Figure 5 are asymmetric, and the asymmetry increases with the return period because the data provide less information about longer return periods. Table 8 shows the 95% confidence intervals of AMF for return periods of 10, 50, and 100 years obtained by the PLF method. The confidence intervals for the 10-year return period in Table 8 are similar to those of the Delta method in Table 4. However, the confidence intervals for the 50- and 100-year return periods differ substantially from those obtained by the Delta method.
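For reference, the profile likelihood of a design flood is commonly obtained by re-parameterizing the GEV distribution so that the return level itself becomes a model parameter. The formulation below follows Coles (2001). The symbols (location \mu, scale \sigma, shape \xi, return level z_p) are notation assumed for this sketch and may differ from the notation used elsewhere in the paper.

```latex
\begin{aligned}
y_p &= -\log(1 - p), \qquad p = 1/T, \\
z_p &= \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right]
\;\;\Longrightarrow\;\;
\mu = z_p + \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right], \\
\ell_p(z_p) &= \max_{\sigma,\,\xi}\; \ell\!\left(z_p, \sigma, \xi\right), \\
\mathrm{CI}_{95\%} &= \left\{ z_p : 2\left[\ell_p(\hat{z}_p) - \ell_p(z_p)\right] \le \chi^2_{1,\,0.95} \approx 3.84 \right\}.
\end{aligned}
```

When \ell_p(z_p) declines more slowly to the right of its peak than to the left, the upper limit lies farther from the estimate than the lower limit, which is the asymmetry described above.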

Table 8

Confidence intervals of AMF (PLF method, 95% confidence) (unit: )

| Station | 10-year return period | 50-year return period | 100-year return period |
|---|---|---|---|
| Shimen | [10.09,14.08] | [13.46,24.12] | [14.69,30.01] |
| Taoyuan | [20.84,25.00] | [24.31,32.49] | [25.28,35.82] |
| Taojiang | [7.63,10.16] | [9.93,16.20] | [10.79,19.59] |
| Xiangtan | [16.49,19.57] | [18.93,25.44] | [19.54,28.03] |

Figure 5

PLF curves for design floods at Taoyuan station: (a) 10 years, (b) 50 years, and (c) 100 years. The abscissas at which the curve intersects the lower horizontal line are the lower and upper limits of the 95% confidence interval.

Uncertainty analysis based on the Bayesian MCMC method

Table 5 shows the calculated 95% credible interval (the Bayesian counterpart of the confidence interval in classical statistics) for each parameter. These credible intervals differ slightly from those obtained by both the Delta and PLF methods. The 95% credible intervals of AMF for return periods of 10, 50, and 100 years obtained using the Bayesian MCMC method are shown in Table 6. As Table 6 shows, the design floods estimated by the Bayesian MCMC method at each station and return period are less than the midpoints of the corresponding credible intervals, so the credible intervals are not symmetric about the design values: the distance from the design value to the upper credible limit is greater than the distance to the lower credible limit. This is because the Bayesian MCMC estimate is based on the full posterior distribution of the quantile and therefore captures its skewness. This asymmetry is more realistic than the symmetric intervals of traditional methods such as the Delta method. Additionally, Table 6 shows that longer return periods have larger design values and wider credible intervals, indicating that the uncertainty increases with return period.
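How credible intervals of design floods follow from posterior samples can be sketched with a random-walk Metropolis sampler. Everything specific in this sketch is an assumption made for illustration: the synthetic data, the proposal step sizes, the chain length, the burn-in, and the vague normal priors (variances of 10^4 for the location and log-scale and 10^2 for the shape, a choice similar to the one in Coles 2001). It is not a reproduction of the sampler settings used in this study.

```python
import numpy as np
from scipy.stats import genextreme

def log_posterior(theta, x):
    """Log posterior up to a constant; theta = (mu, log sigma, xi)."""
    mu, log_sigma, xi = theta
    # Assumed vague priors: independent zero-mean normals with large variances.
    log_prior = -(mu ** 2 / 2e4 + log_sigma ** 2 / 2e4 + xi ** 2 / 2e2)
    ll = genextreme.logpdf(x, c=-xi, loc=mu, scale=np.exp(log_sigma)).sum()
    return ll + log_prior if np.isfinite(ll) else -np.inf

def metropolis(x, n_iter=6000, step=(0.5, 0.15, 0.1), seed=0):
    """Random-walk Metropolis chain over (mu, log sigma, xi)."""
    rng = np.random.default_rng(seed)
    theta = np.array([np.mean(x), np.log(np.std(x)), 0.0])  # Gumbel start
    lp = log_posterior(theta, x)
    chain = np.empty((n_iter, 3))
    for i in range(n_iter):
        proposal = theta + rng.normal(0.0, step)
        lp_prop = log_posterior(proposal, x)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept or reject
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain

def return_level(mu, sigma, xi, T):
    """T-year return level for each posterior draw."""
    y = -np.log(1.0 - 1.0 / T)
    return mu - sigma / xi * (1.0 - y ** (-xi))

# Synthetic annual maxima (hypothetical, not the station records).
x = genextreme.rvs(c=-0.1, loc=10.0, scale=3.0, size=60,
                   random_state=np.random.default_rng(1))

chain = metropolis(x)[1000:]  # discard an assumed burn-in of 1000 draws
mu, sigma, xi = chain[:, 0], np.exp(chain[:, 1]), chain[:, 2]
z100 = return_level(mu, sigma, xi, T=100)
lo, med, hi = np.percentile(z100, [2.5, 50.0, 97.5])
print(f"100-year level: median={med:.2f}, 95% credible interval=[{lo:.2f}, {hi:.2f}]")
```

The credible limits are read directly from the percentiles of the posterior draws of the return level, so any skewness in that posterior carries over into the interval without a normal approximation.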

Comparison of the three methods results

To compare the results of the three methods presented above, the ranges of the 95% confidence or credible intervals for the 10-, 50-, and 100-year return levels were used. From Tables 4, 6, and 8 and Figure 3, we can see that the lower and upper limits of the 95% credible intervals of AMF obtained by the Bayesian MCMC method have the largest values, and those obtained by the Delta method have the smallest values. Values obtained by the PLF method lie between those of the Bayesian MCMC and Delta methods. All three methods give similar intervals for the 10-year return level, while there are relatively large differences among them for the 100-year return level. In addition, the ML estimates of the quantiles lie in the middle of the confidence intervals for the Delta method, whereas they lie in the left part of the confidence or credible intervals for the PLF and Bayesian MCMC methods. In other words, the confidence interval of a quantile is symmetric for the Delta method but asymmetric for the PLF and Bayesian MCMC methods.

To further analyze the performance of the three methods, we compared the confidence or credible intervals of design floods with measured and simulated data. For this purpose, simulated AMF time series for each station were generated by the Monte Carlo method from the GEV distribution with the parameters displayed in Table 3. The record lengths of the simulated time series are the same as those of the measured data. We counted the average number of times that the time series exceeded the lower limit of the confidence or credible interval for the 100-year return level. The results are shown in Table 9: both measured and simulated time series exceeded the lower limit of the confidence interval obtained by the Delta method about four or more times. The exceedance counts for the Delta method are about twice those for the PLF and Bayesian MCMC methods. In contrast, all counts for the PLF and Bayesian MCMC methods are greater than 1 and less than 3, and there is little difference between the two methods. These results suggest that the Delta method may overestimate the uncertainties of the design floods, whereas the PLF and Bayesian MCMC methods give similar confidence or credible intervals of design floods.
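The counting procedure described above can be sketched as a short Monte Carlo experiment. The GEV parameters, record length, lower limit, and number of simulated series below are hypothetical placeholders (the number of simulated series used in this study is not stated here), and `xi` follows the Coles (2001) sign convention, so SciPy's shape argument is `c = -xi`.

```python
import numpy as np
from scipy.stats import genextreme

def mean_exceedances(mu, sigma, xi, n_years, lower_limit, n_sim=1000, seed=0):
    """Average number of simulated annual maxima above lower_limit per series."""
    rng = np.random.default_rng(seed)
    sims = genextreme.rvs(c=-xi, loc=mu, scale=sigma,
                          size=(n_sim, n_years), random_state=rng)
    return (sims > lower_limit).sum(axis=1).mean()

# Hypothetical inputs: fitted parameters, record length, and the lower limit
# of a 100-year interval from one of the three methods.
avg = mean_exceedances(mu=10.0, sigma=3.0, xi=0.1, n_years=60, lower_limit=18.0)
print(f"average exceedances per simulated series: {avg:.2f}")
```

A lower limit that is exceeded many times within a record much shorter than 100 years sits well inside the body of the data, which is the sense in which a low Delta-method limit indicates an overly wide interval.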

Table 9

Number of times that the sample series exceeded the lower limit of the confidence or credible interval for a return period of 100 years (95% confidence level)

Stations Measured time series
 
Simulated time series
 
Delta PLF Bayesian Delta PLF Bayesian 
Shimen 5.27 2.35 2.16 
Taoyuan 3.99 2.27 1.87 
Taojiang 4.13 2.11 1.82 
Xiangtan 5.42 2.36 2.00 

Figure 6 shows the widths of the confidence or credible intervals for return periods of 10, 50, and 100 years obtained by the PLF and Bayesian MCMC methods at Taoyuan station. The widths obtained by the PLF method are almost the same as those obtained by the Bayesian MCMC method. The return level plots of the two methods, shown in Figure 3, are also very similar. Overall, both the PLF and the Bayesian MCMC methods appear to be effective for uncertainty analysis of design floods in the Dongting Lake basin.

Figure 6

Comparisons for width of confidence/credible intervals for different return periods obtained by PLF and Bayesian MCMC methods at Taoyuan station.

CONCLUSIONS

This study focuses on uncertainty assessment in extreme flood estimation using three common methods of flood frequency uncertainty analysis in hydrology (Delta, PLF, and Bayesian MCMC methods). We used these methods to estimate confidence intervals of extreme floods using AMF data recorded at four main stations in Dongting Lake basin. The main conclusions of this work are summarized as follows:

  1. All the observed AMF data of the four main stations in the Dongting Lake basin can be modeled using the GEV distribution. The data at Taoyuan and Xiangtan stations follow the Weibull class distribution, while the data at Shimen and Taojiang stations follow the Fréchet class distribution. The MLE and Bayesian MCMC methods were applied to the GEV parameter estimation and give nearly identical fits. The design floods at Taoyuan station are the largest for the 10-, 50-, and 100-year return periods, while those at Taojiang station are the smallest.

  2. The return period strongly influences the width of the confidence intervals of the design floods: the intervals widen as the return period increases.

  3. The lower and upper limits of the confidence/credible intervals obtained by the PLF and Bayesian MCMC methods are larger than those of the confidence intervals obtained by the Delta method. However, all three methods give similar confidence/credible intervals for short return periods, and the differences grow as the return period increases. The asymmetry of the confidence/credible intervals of quantiles for the PLF and Bayesian MCMC methods reflects the fact that the uncertainty of the upper limit is higher than that of the lower limit.

In summary, all three methods can be used to estimate confidence intervals of extreme floods. In addition, both the PLF and Bayesian MCMC methods give more accurate results than the Delta method. However, the Bayesian MCMC method is more suitable for practical use, since the PLF method is computationally burdensome.

In this work, the prior distributions for the parameters are 'vague' priors, and only the systematic data were used in the Bayesian MCMC method. In future work, informative prior distributions and flood data with other types of information (e.g., temporal information on historic floods, spatial information on floods in neighboring catchments, and causal information on the flood processes) should be used to further evaluate the differences among the three methods.

ACKNOWLEDGEMENTS

This study was supported by the National Scientific Foundation of China (NSFC) (No. 51779074, No. 41371052), the Ministry of Water Resources' special funds for scientific research on public cause (201501059), State's Key Project of Research and Development Plan (2017YFC0404304), Jiangsu water conservancy science and technology project (2017027), Qing Lan Project of Jiangsu Province and Jiangsu Province outstanding young teachers and principals overseas training program [2015]35. The authors are very grateful for support by the Program for Outstanding Young Talents in Colleges and Universities of Anhui Province (gxyq2018143) and the Nature Science Foundation of Wanjiang University of Technology (WG18030). We are grateful to Dr Mohamad Reza Soltanian, whose suggestions have improved the quality of the paper. We also thank the anonymous reviewers for their very helpful reviews.

REFERENCES

Chikobvu D. & Chifurira R. 2015 Modelling of extreme minimum rainfall using generalised extreme value distribution for Zimbabwe. S. Afr. J. Sci. 111 (9–10), 1–8.

Coles S. 2001 An Introduction to Statistical Modeling of Extreme Values. Springer, London, UK.

Coles S. & Pericchi L. 2003 Anticipating catastrophes through extreme value modelling. J. Roy. Stat. Soc. C-App. 52, 405–416.

Coles S., Pericchi L. R. & Sisson S. 2003 A fully probabilistic approach to extreme rainfall modeling. J. Hydrol. 273 (1–4), 35–50.

Dupuis D. J. & Field C. A. 1998 A comparison of confidence intervals for generalized extreme-value distributions. J. Stat. Comput. Sim. 61 (4), 341–360.

Katz R. W., Parlange M. B. & Naveau P. 2002 Statistics of extremes in hydrology. Adv. Water Resour. 25 (8–12), 1287–1304.

Kwiatkowski D., Phillips P. C. B., Schmidt P. & Shin Y. 1992 Testing the null hypothesis of stationarity against the alternative of a unit root. J. Econometrics 54, 159–178.

Liang Z., Chang W. & Li B. 2011 Bayesian flood frequency analysis in the light of model and parameter uncertainties. Stoch. Env. Res. Risk. A. 26 (5), 721–730.

Liu Y., Lu M., Huo X., Hao Y., Gao H., Liu Y., Fan Y., Cui Y. & Metivier F. 2016 A Bayesian analysis of generalized Pareto distribution of runoff minima. Hydrol. Process. 30 (3), 424–432.

Lu F., Wang H., Yan D., Zhang D. & Xiao W. 2013 Application of profile likelihood function to the uncertainty analysis of hydrometeorological extreme inference. Sci. China Technol. Sci. 56 (12), 3151–3160.

Metropolis N., Rosenbluth A. W., Rosenbluth M. N. & Teller A. H. 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (6), 1087–1092.

Obeysekera J. & Salas J. D. 2014 Quantifying the uncertainty of design floods under nonstationary conditions. J. Hydrol. Eng. 19 (7), 1438–1446.

O'Connell D. R. H. 2005 Nonparametric Bayesian flood frequency estimation. J. Hydrol. 313 (1–2), 79–96.

O'Connell D. R. H., Ostenaa D. A., Levish D. R. & Klinger R. E. 2002 Bayesian flood frequency analysis with paleohydrologic bound data. Water Resour. Res. 38 (5), 1058. doi:10.1029/2000WR000028.

Reis D. S. & Stedinger J. R. 2005 Bayesian MCMC flood frequency analysis with historical information. J. Hydrol. 313 (1–2), 97–116.

Ribatet M., Sauquet E., Grésillon J. M. & Ouarda T. B. M. J. 2007 Usefulness of the reversible jump Markov chain Monte Carlo model in regional flood frequency analysis. Water Resour. Res. 43 (8), W08403. doi:10.1029/2006WR005525.

Stephens M. A. 1974 EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69 (347), 730–737.

Strupczewski W. G., Kochanek K. & Singh V. P. 2007 On the informative value of the largest sample element of log-Gumbel distribution. Acta Geophys. 55 (4), 652–678.

Viglione A., Merz R., Salinas J. L. & Blöschl G. 2013 Flood frequency hydrology: 3. A Bayesian analysis. Water Resour. Res. 49 (2), 675–692.

Xiong Y., Wang K. & Liu C. 2009 Impact of LUCC on ecosystem service value in the up and middle reaches of Dongting Lake Basin, China. Water Science 7 (4), 327–334.

Yevjevich V. 1972 Probability and Statistics in Hydrology. Water Resources Publications, Fort Collins, CO, USA.

Yuan Y., Zhang C., Zeng G., Liang J., Guo S., Huang L., Wu H. & Hua S. 2016 Quantitative assessment of the contribution of climate variability and human activity to streamflow alteration in Dongting Lake, China. Hydrol. Process. 30 (12), 1929–1939.