## Abstract

Quantifying the uncertainty of non-stationary flood frequency analysis is very crucial and beneficial for planning and design of water engineering projects, which is fundamentally challenging especially in the presence of high climate variability and reservoir regulation. This study proposed an integrated approach that combined the Generalized Additive Model for Location, Scale and Shape parameters (GAMLSS) method, the Copula function and the Bayesian Uncertainty Processor (BUP) technique to make reliable probabilistic interval estimations of design floods. The reliability and applicability of the proposed approach were assessed by flood datasets collected from two hydrological monitoring stations located in the Hanjiang River of China. The precipitation and the reservoir index were selected as the explanatory variables for modeling the time-varying parameters of marginal and joint distributions using long-term (1954–2018) observed datasets. First, the GAMLSS method was employed to model and fit the time-varying characteristics of parameters in marginal and joint distributions. Second, the Copula function was employed to execute the point estimations of non-stationary design floods. Finally, the BUP technique was employed to perform the interval estimations of design floods based on the point estimations obtained from the Copula function. The results demonstrated that the proposed approach can provide reliable probabilistic interval estimations of design floods meanwhile reducing the uncertainty of non-stationary flood frequency analysis. Consequently, the integrated approach is a promising way to offer an indication on how design values can be estimated in a high-dimensional problem.

## HIGHLIGHTS

This study proposes an integrated approach to reduce uncertainties of flood frequency analysis.

The GAMLSS method models time-varying characteristics in marginal and joint distributions.

The Copula function makes point estimations of non-stationary design floods.

The BUP creates reliable interval estimations of non-stationary design floods.

## INTRODUCTION

Floods are among the world's costliest natural disasters, and changing environments have made catastrophic floods more likely. Numerous lessons have been learnt about the failure of hydraulic structures related to changing environments or changing characteristics of annual maximum streamflow series (Hui *et al.* 2018; Ragno *et al.* 2019). The flood frequency analysis results have been widely applied in the planning and design of a dam/reservoir and other civil engineering applications (Li *et al.* 2020). Conventional flood frequency analysis methods are based on the assumption of temporal stationarity that the return period (or occurrence probability) of extreme floods will not alter significantly with time. In the presence of climate and anthropogenic changes, the stationary assumption of hydrological extreme series has been challenged (Cheng & AghaKouchak 2015; Yan *et al.* 2017a, 2021). If flood non-stationarity was not adequately considered, flood risks based on the stationarity assumption would be miscalculated (under-/overestimated) in practice (Li *et al.* 2015; Serago & Vogel 2018). Therefore, non-stationary frequency analysis has been an important research topic for the ultimate purpose of supporting the planning and design of hydraulic engineering projects.

The statistical characteristics of hydrological series are commonly altered by climate change and/or human activities (Song *et al.* 2020). Such changing characteristics include the changes in statistical parameters and/or the type of probability distributions (Xiong *et al.* 2017; Liu *et al.* 2019; Sun *et al.* 2020; Zhou 2020). In recent years, several methods were used to describe the variation of parameters and weighting coefficients in distributions of hydrological series, such as single-type distributions (Gilroy & McCuen 2012), two-component mixture distributions (Bayazit 2015; Yan *et al.* 2017b) and time-varying moments (Xiong *et al.* 2015a; Yu *et al.* 2018). Additionally, the change point and trend change detection methods such as the Pettitt test (Pettitt 1979; Reeves *et al.* 2007; Ngongondo *et al.* 2020), the Mann–Kendall test (Chebana *et al.* 2013) and the trend-free pre-whitening method (Yue *et al.* 2002) were applied to detect the variability of hydrological series (Villarini & Smith 2010; Şen 2011; Rougé *et al.* 2013). Although the change point and trend change detection methods can be used to detect the hydrological non-stationarity, they cannot quantify the effects of possible physical factors on the non-stationarity of hydrological series (López & Francés 2013; Villarini & Strong 2014; Zhang *et al.* 2015; Wu *et al.* 2017; Liang *et al.* 2018; Su & Chen 2019; Bian *et al.* 2020). The Generalized Additive Model for Location, Scale and Shape parameters (GAMLSS) (Stasinopoulos & Rigby 2007) method provides a higher flexible choice in modeling the time-varying characteristics of distribution parameters, as compared with classical generalized additive models, generalized linear models, generalized linear mixed models and generalized additive mixed models (Villarini *et al.* 2009). Hence, in this study, the GAMLSS method is adopted to model the impacts of possible physical factors on the annual maximum streamflow series.

The joint probability distribution of multiple hydrological variables is very important in the hydraulic planning and design of a river basin (Parent *et al.* 2014; Li *et al.* 2018). The implementation of hydrological frequency analysis is fundamentally challenging for multivariate design variables. As compared with the multivariate functions, such as Gaussian, Student's *t* and Gamma (Chen & Guo 2018; Zhang & Singh 2019), Copula functions have the flexibility in choosing marginal distribution and modeling a non-linear dependence of variables when fitting hydrological data (Kwon & Lall 2016; Sarhadi *et al.* 2016; Chen & Guo 2018; Wei & Song 2018; Tan *et al.* 2021). For instance, Jiang *et al.* (2015) applied Copula functions to make bivariate frequency analysis of low-streamflow series. Fan *et al.* (2016) fused Gaussian mixtures into a Copula function to implement the flood frequency analysis for estimating hydrological risks. Liu *et al.* (2017) integrated the maximum entropy and Copula functions to model the non-linear dependence of multiple hydro-meteorological events. Vinnarasi & Dhanya (2019) adopted a dynamic Copula function to model the non-stationarity in extreme rainfall series. Consequently, in this study, the Copula function is employed to derive the joint distribution of multiple annual maximum streamflow series.

One of the effective techniques to quantify uncertainties in flood frequency analysis is to create probabilistic interval estimations of design floods (Read & Vogel 2015; Yan *et al.* 2019). The combination of point estimation method and probabilistic post-processing technique has been a promising integrated approach for quantifying the impacts of the uncertainties of distribution types and parameters on flood frequency analysis. Probabilistic interval estimation was commonly used to supplement the information provided by point estimations (Bracken *et al.* 2018). The Bayesian Uncertainty Processor (BUP) technique proposed by Krzysztofowicz (1999) is able to quantify the uncertainty associated with flood frequency analysis, owing to its ability to facilitate the development of the interval estimation (e.g. Thorarinsdottir *et al.* 2018; Xu *et al.* 2018; Li & Krafty 2019; Uranchimeg *et al.* 2020). Therefore, it is interesting and important to conduct an in-depth research on the exploration of the BUP post-processing technique for reducing the uncertainty encountered in non-stationary flood frequency analysis.

This study is aimed at acquiring reliable probabilistic interval estimations of non-stationary design floods. The novelty of this study lies in proposing an integrated frequency analysis approach by combining the GAMLSS method, the Copula function and the BUP post-processing technique to reduce the uncertainty associated with flood frequency analysis for the first time. First, the GAMLSS method is employed to model and fit the time-varying characteristics of the parameters both in marginal and joint distributions of annual maximum streamflow series. Then, the Copula function is employed to conduct the point estimations of non-stationary design floods. Finally, the BUP technique is employed to make the interval estimations of design floods based on the point estimations obtained from the Copula function. To verify the applicability of the proposed approach in probabilistic interval estimations of design floods, this study utilizes annual maximum streamflow series of two hydrological monitoring stations in the Hanjiang River of China as a case study.

## STUDY AREA AND MATERIALS

### Study area

The Hanjiang River catchment with an area of 159,000 km^{2} is located in central southern China (Figure 1). It has 1,567 km river length and 49.3 billion m^{3} average annual runoff. The Hanjiang River catchment has a subtropical monsoon climate and spatial heterogeneity of water resources. Most of precipitation (70–80%) occurs in the flood season (May–September), and the annual value varies from 700 m to 1,100 mm. In the past decades, several reservoirs have been constructed in the river basin. The characteristics of 10 cascade reservoirs are listed in Table 1. The operation purposes of the 10 reservoirs include flood control, water supply (or water transfer) and hydro-power generation. The total flood control storage and regulatory storage of 10 reservoirs reach 15.4 and 21.5 billion m^{3}, respectively. The Danjiangkou Reservoir is the largest reservoir in the river basin and has been used as the water source of the middle route of the South-to-North Water Transfer Project since 2014 (Zhou *et al.* 2017), which has an annual water transfer demand of 9.5 billion m^{3}. The Danjiangkou Reservoir that is located in the middle-stream controls >58% runoff of the catchment. Besides, the Ankang monitoring station controls a catchment area of 38,600 km^{2} and is located at 30 km downstream of the Ankang Reservoir, while the Huangzhuang monitoring station controls a catchment area of 142,000 km^{2} and is located at 240 km downstream of the Danjiangkou Reservoir. The Shiquan Reservoir and Ankang Reservoir, which are located in the upstream of the Hanjiang River, control <7% runoff of the catchment. The reservoirs that are located in the tributaries of the Hanjiang River control >12% runoff of the catchment.

Parameter . | Reservoir . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|

Shiquan . | Ankang . | Eping . | Songshuling . | Pankou . | Huanglongtan . | Danjiangkou . | Sanliping . | Siping . | Yahekou . | |

Main function beside flood control | Hydro-power | Hydro-power | Hydro-power | Hydro-power | Hydro-power | Hydro-power | Water supply, hydro-power | Hydro-power | Hydro-power | Hydro-power |

Top of conservation pool (m) | 410 | 330 | 550 | 394 | 355 | 247 | 170 | 416 | 315 | 180 |

Top of buffer pool in Summer (m) | 405 | 325 | 548 | 394 | 355 | 247 | 160 | 403 | 315 | 175 |

Top of buffer pool in Autumn (m) | 405 | 325 | 548 | 394 | 355 | 247 | 164 | 412 | 315 | 175 |

Total storage (million m^{3}) | 440 | 2,925 | 302 | 57 | 2,353 | 1,162 | 2,9050 | 499 | 269 | 1,316 |

Regulatory storage (million m^{3}) | 180 | 1,670 | 153 | 32 | 1,120 | 598 | 1,6360 | 211 | 162 | 1,040 |

Flood storage (million m^{3}) | 180 | 360 | 153 | 0 | 0 | 0 | 1,4100 | 121 | 0 | 521 |

Installed capacity (MW) | 225 | 850 | 114 | 50 | 500 | 510 | 900 | 70 | 60 | 13 |

Put into operation (year) | 2002 | 1992 | 2005 | 2006 | 2011 | 1976 | 1968 | 2013 | 2006 | 1960 |

Parameter . | Reservoir . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|

Shiquan . | Ankang . | Eping . | Songshuling . | Pankou . | Huanglongtan . | Danjiangkou . | Sanliping . | Siping . | Yahekou . | |

Main function beside flood control | Hydro-power | Hydro-power | Hydro-power | Hydro-power | Hydro-power | Hydro-power | Water supply, hydro-power | Hydro-power | Hydro-power | Hydro-power |

Top of conservation pool (m) | 410 | 330 | 550 | 394 | 355 | 247 | 170 | 416 | 315 | 180 |

Top of buffer pool in Summer (m) | 405 | 325 | 548 | 394 | 355 | 247 | 160 | 403 | 315 | 175 |

Top of buffer pool in Autumn (m) | 405 | 325 | 548 | 394 | 355 | 247 | 164 | 412 | 315 | 175 |

Total storage (million m^{3}) | 440 | 2,925 | 302 | 57 | 2,353 | 1,162 | 2,9050 | 499 | 269 | 1,316 |

Regulatory storage (million m^{3}) | 180 | 1,670 | 153 | 32 | 1,120 | 598 | 1,6360 | 211 | 162 | 1,040 |

Flood storage (million m^{3}) | 180 | 360 | 153 | 0 | 0 | 0 | 1,4100 | 121 | 0 | 521 |

Installed capacity (MW) | 225 | 850 | 114 | 50 | 500 | 510 | 900 | 70 | 60 | 13 |

Put into operation (year) | 2002 | 1992 | 2005 | 2006 | 2011 | 1976 | 1968 | 2013 | 2006 | 1960 |

### Material collection

The meteorological network monitoring daily rainfall over the Hanjiang River basin contains >100 stations in 2019, about 38 of which also gauge hourly rainfall. Significant variation of the number of monitoring stations appeared over the years, leading to relatively few long precipitation observations for a long-term (>60 years) variability detection. Consequently, 12 meteorological monitoring stations are used in this study by considering the data availability and representativeness: (1) the length of the rainfall dataset at each meteorological monitoring station is >60 years spanning from 1954 to 2018 and (2) the stations are evenly distributed in the Hanjiang River basin (Figure 1). To reveal the impacts of precipitation change and reservoir regulation on annual maximum streamflow series, this study collected the monitoring records of hydro-meteorological factors occurring from 1954 to 2018, including the accumulated annual precipitation series of 12 meteorological monitoring stations, and the annual maximum streamflow series and flood control capacity corresponding to two hydrological monitoring stations (Ankang and Huangzhuang) in the Hanjiang River basin. The annual maximum streamflow series of Ankang and Huangzhuang stations are selected from the observed daily river discharge datasets. The Ankang and Huangzhuang monitoring stations are selected as the pair of design sites in this study due to the following reasons: (1) the two stations are important for flood control in the Hanjiang River and (2) modeling the degree of dependence between two stations can adequately quantify the effects of precipitation change and reservoir regulation on basin-scale flood variability. The hydrological monitoring datasets and reservoir characteristic values related to the reservoir index (RI) can be downloaded from the Changjiang Hydrology website (http://www.cjh.com.cn/, Chinese), and the rainfall monitoring datasets can be downloaded from the China National Meteorological Information Center website (http://data.cma.cn/site/index.html, Chinese).

### Pre-analysis of materials

The Kendall's *τ* correlation analysis was performed to determine what precipitation metrics were the most important in explaining the variability of annual maximum streamflow series. The monthly precipitation is transformed into the accumulated annual precipitation, which displays a higher Kendall's *τ* correlation with streamflow series as compared with the precipitation in monthly and seasonal scales. Consequently, the accumulated annual precipitation is taken as the covariate of flood non-stationarity in this study.

Figure 2 presents the time series of annual maximum streamflow, areal mean annual precipitation and flood control capacity corresponding to the two hydrological stations (Ankang and Huangzhuang) from 1954 to 2018. The Pettitt test (Villarini *et al.* 2009) and the Lombard Mood test (James & Matteson 2015) were used to detect change points corresponding to mean and variance, while the Mann–Kendall test (Villarini & Smith 2010) was used to detect trend change of these time series. In this study, we conducted the change point test for both mean and variance. The information from the change point test was considered to separate the time series into two subseries (before and after change point) on which the trend change analysis would be then implemented separately.

The results of Figure 2 reveal that both stations exhibit one or two change points in the mean or variance of all time series at 5 or 10% significance level. Figure 3 shows the trend change in the time series of annual maximum streamflow, areal mean annual precipitation and flood control capacity corresponding to the Ankang and Huangzhuang hydrological stations. The flood and precipitation series in two stations manifest a statistically significant decreasing trend at 5 or 10% level. Therefore, it is essential to conduct non-stationary flood frequency analysis to further quantify the impacts of precipitation change and reservoir regulation on design flood values in the Hanjiang River basin.

## METHODS

The architecture of the integrated approach is illustrated in Figure 4, where Figure 4(a) presents the GAMLSS method, Figure 4(b) presents the Copula function and Figure 4(c) presents the BUP technique, respectively. The time-varying characteristics both in marginal and joint distribution parameters are modeled and fitted by the GAMLSS method. The point estimations of non-stationary design floods are conducted using the Copula function. The interval estimations of design floods are produced by the BUP post-processing technique. The methods used in this study are briefly introduced as follows.

### GAMLSS method

The GAMLSS method is adopted to model a parametric distribution of annual maximum streamflow series with time-varying characteristics by considering the distribution parameter as a function of explanatory variables (*i* = 1, 2, …, *m*) with time *t*.

*et al.*2009, 2012; Jiang

*et al.*2015; Xiong

*et al.*2015b). Therefore, this study assumes time-varying characteristics resided in the location and scale parameters, whereas the shape parameter is assumed to be constant with time in all distributions.where is the

*i-*th (

*i*= 0, 1, 2, …,

*m*) explanatory variable at

*t*time.

*m*is the number of covariates (i.e. explanatory variables). is the log link function recognizing that the annual maximum streamflow series may be skewed. (

*i*= 0, 1, 2, …,

*m*) is the vector of distribution parameters accounting for location and scale, where and .

*et al.*2015; Agilan & Umamahesh 2017), these drivers are taken as the explanatory variables of the marginal distribution parameters in this study. The RI is described as follows:where and are the drainage area regulated by the

*i*th reservoir and the total drainage area monitored by the river streamflow gauge station, respectively. and are the flood capacity of the

*i*th reservoir and the total reservoir flood capacity in the river streamflow gauge station, respectively.

*N*is the number of reservoirs.

For comparative purpose, we considered several three-parameter distributions and two-parameter distributions to model the distribution of annual maximum streamflow series in this study (Table 2).

Distribution . | Probability distribution function (pdf) . | Range . | Parameter . |
---|---|---|---|

Gaussian | |||

Gamma | |||

Gumbel | |||

GEV | |||

Person type III | |||

Log-Weibull |

Distribution . | Probability distribution function (pdf) . | Range . | Parameter . |
---|---|---|---|

Gaussian | |||

Gamma | |||

Gumbel | |||

GEV | |||

Person type III | |||

Log-Weibull |

All the aforementioned computations are conducted in R (https://www.r-project.org/) using the freely available GAMLSS package (Stasinopoulos *et al.* 2008).

### Copula functions for point estimations of design floods

After modeling the time-varying characteristics of two marginal distributions, the Copula functions are used to compute the joint probability (or joint design floods) under the four scenarios of the parameters in the marginal and joint distributions (Figure 4(b).

*et al.*2014; Yu

*et al.*2014; Li

*et al.*2019). The time-varying parameter of the Copula function can be modeled as a function of covariates (

*i*= 1, 2, …,

*m*) and described as follows:where and are the link function and the Copula function, respectively. (

*i*=0, 1, 2, …,

*m*) is the parameter vector of covariates. The link function relies on the value of parameters in Copula functions, i.e. for Frank Copula function (), , whereas for Gumbel–Hougaard and Clayton Copula functions , . and are the two marginal probabilities of the Copula function at

*t*time, respectively, which should be both uniformly distributed on . is the cumulative distribution function with non-exceedance probability. and are the annual maximum streamflow series of two marginal distributions at

*t*time, respectively. is the time-varying parameter of the Copula function at

*t*time. and are the time-varying parameters of two marginal distributions at

*t*time, respectively.

Copula function . | Joint distribution function . | Parameter . |
---|---|---|

Gumbel–Hougaard (GH) | ^{a} | |

Frank | ||

Clayton |

Copula function . | Joint distribution function . | Parameter . |
---|---|---|

Gumbel–Hougaard (GH) | ^{a} | |

Frank | ||

Clayton |

^{a}*τ* is the Kendall's coefficient.

In this study, the Inference Function for Margins method (Joe 1997) is used to estimate the parameters of the time-varying joint distribution. In addition, the evaluation of marginal distributions and joint distribution is conducted by comparing the values of Kolmogorov–Smirnov (K–S) test, Akaike Information Criterion (AIC, Akaike 1974) and diagnostic plots (e.g. worm plots). Owing to the higher complexity of a joint distribution (e.g. Copula functions), the selection of a particular joint distribution can be further examined by computing the first four moments of the residuals and their Filliben correlation coefficient (Villarini *et al.* 2012). The particular distribution with the minimum K–S indicator, the minimum AIC value and rather flat worm plots, as well as the statistical properties of the residuals close to the standard normal distribution, is selected. For more details about distribution selection and fitting, the interested reader is pointed to the references (Stasinopoulos *et al.* 2008; Villarini *et al.* 2012).

*T*are the joint probability and the return period corresponding to flood pairs. is the joint probability function. and are the two marginal distributions, respectively. and are the observations of two annual maximum streamflow series, respectively.

### BUP for the probabilistic interval estimation of design floods

In this study, the BUP proposed by Krzysztofowicz (1999) is applied to model the non-linear dependence between observed and estimated data (i.e. design flood values) at each joint probability *p* (e.g. *p* starts from 0.01 up to 99.99% with time step 0.01%) step by step. The implementation procedures of the BUP (Figure 4(c)) for estimating the probabilistic interval of design flood values consist of the following four steps (Krzysztofowicz 1999):

*Step 1*: Implement data transformation for converting observed (*H*) and estimated (_{p}*S*) flood pairs from the real space to the Gaussian space._{p}*Step 2*: Compute the prior density and likelihood functions of flood pairs.*Step 3*: Calculate the posterior density function of flood pairs.*Step 4*: Conduct data transformation for converting datasets from the Gaussian space to the real space and then execute the Monte Carlo simulation to produce the probabilistic interval estimation of design flood values.

## RESULTS AND DISCUSSION

The results and findings were presented in the order of the time-varying characteristics in distribution parameters (Section 4.1), flood frequency analysis using the Copula function (Section 4.2) and probabilistic interval estimations of design floods using the BUP technique and summarization (Section 4.3), and were detailed as follows.

### Time-varying characteristics in distribution parameters

Table 4 summarizes the results of the two time-varying marginal distributions with the precipitation and the RI as the covariates corresponding to the flood-generating mechanism and reservoir regulation in the distribution parameters based on the GAMLSS method. The results point out that the Gumbel (GU) distribution and generalized extreme value (GEV) distribution can be considered as the best-fitted distributions for *Q*_{a} and *Q*_{h}, respectively, owing to the minimum values of K–S and AIC indicators. We also tested each covariate in the case separately, in which these experiment schemes created larger values of K–S and AIC indicators as compared with the results of Table 4. Therefore, when time-varying marginal distribution considers precipitation and RI simultaneously as covariates, the GU and GEV distributions have a good quality in fitting the two annual maximum streamflow series (*Q*_{a} and *Q*_{h}), respectively.

Series . | Distribution . | Distribution parameters . | K–S statistic indicator^{a}
. | AIC value . | ||
---|---|---|---|---|---|---|

. | . | . | ||||

Q_{a} | Gaussian | 11,595 | exp(−721+1.08P_{a}−14.59RI_{a}) | – | 0.132 | 459 |

Gamma | 11,418 | exp(−522+0.94P_{a}−10.33RI_{a}) | – | 0.134 | 448 | |

Gumbel | 10,107 | exp(−339+1.11P_{a}−15.23RI_{a}) | – | 0.111 | 355 | |

GEV | 9,817 | exp(−283+0.76P_{a}−16.05RI_{a}) | 0.22 | 0.130 | 460 | |

Person type III | 11,507 | exp(−610+1.25P_{a}−16.58RI_{a}) | 0.35 | 0.146 | 502 | |

Log-Weibull | 183 | exp(−138+2.09P_{a}−10.24RI_{a}) | 0.30 | 0.195 | 531 | |

Q_{h} | Gaussian | exp(−303+0.75P_{h}−10.01RI_{h}) | 0.35 | – | 0.148 | 743 |

Gamma | exp(−219+1.02P_{h}−8.40RI_{h}) | 0.33 | – | 0.153 | 807 | |

Gumbel | exp(−412+0.91P_{h}−9.05RI_{h}) | exp(−650+1.19P_{a}−16.61RI_{a}) | – | 0.131 | 726 | |

GEV | exp(−406+1.09P_{h}−15.84RI_{h}) | 0.38 | 0.27 | 0.115 | 618 | |

Person type III | exp(−452+0.78P_{h}−11.10RI_{h}) | 0.32 | 0.48 | 0.144 | 765 | |

Log-Weibull | exp(−229+0.84P_{h}−14.23RI_{h}) | 3.11 | 0.35 | 0.173 | 794 |

Series . | Distribution . | Distribution parameters . | K–S statistic indicator^{a}
. | AIC value . | ||
---|---|---|---|---|---|---|

. | . | . | ||||

Q_{a} | Gaussian | 11,595 | exp(−721+1.08P_{a}−14.59RI_{a}) | – | 0.132 | 459 |

Gamma | 11,418 | exp(−522+0.94P_{a}−10.33RI_{a}) | – | 0.134 | 448 | |

Gumbel | 10,107 | exp(−339+1.11P_{a}−15.23RI_{a}) | – | 0.111 | 355 | |

GEV | 9,817 | exp(−283+0.76P_{a}−16.05RI_{a}) | 0.22 | 0.130 | 460 | |

Person type III | 11,507 | exp(−610+1.25P_{a}−16.58RI_{a}) | 0.35 | 0.146 | 502 | |

Log-Weibull | 183 | exp(−138+2.09P_{a}−10.24RI_{a}) | 0.30 | 0.195 | 531 | |

Q_{h} | Gaussian | exp(−303+0.75P_{h}−10.01RI_{h}) | 0.35 | – | 0.148 | 743 |

Gamma | exp(−219+1.02P_{h}−8.40RI_{h}) | 0.33 | – | 0.153 | 807 | |

Gumbel | exp(−412+0.91P_{h}−9.05RI_{h}) | exp(−650+1.19P_{a}−16.61RI_{a}) | – | 0.131 | 726 | |

GEV | exp(−406+1.09P_{h}−15.84RI_{h}) | 0.38 | 0.27 | 0.115 | 618 | |

Person type III | exp(−452+0.78P_{h}−11.10RI_{h}) | 0.32 | 0.48 | 0.144 | 765 | |

Log-Weibull | exp(−229+0.84P_{h}−14.23RI_{h}) | 3.11 | 0.35 | 0.173 | 794 |

^{a}The K–S test is performed at a significance level of 0.05. The null hypothesis states that the empirical distribution fits the theoretical distribution. If the value of the K–S test indicator is smaller than the value of *D*(65, 0.05) , the null hypothesis would not be rejected. The variables *P*_{a}, *P*_{h}, RI_{a} and RI_{h} are the precipitation and the RI corresponding to the Ankang and Huangzhuang stations, respectively.

The change point and trend change tests (Figures 2 and 3) also demonstrate that the physical reason for non-stationary streamflow series would be attributed to precipitation change and reservoir regulation. From the perspective of flood-generating mechanism, near-record flood peaks are closely related to the subtropical monsoon climate, where such weather feature dominates the variability of precipitation in the Hanjiang River. From the perspective of reservoir regulation, Shiquan and Ankang reservoirs control 6.8% runoff of the catchment, while the rest reservoirs control >70% runoff.

In Figure 5, if all points fall within the 95% confidence interval (between two dashed semicircles), the marginal distribution has a good fitting quality. The results of worm plots (Figure 5) and the K–S statistic test (Table 4) indicate that the selected marginal distributions have a good fitting quality. Therefore, the factors of precipitation and RI can be selected as the covariates to fit the distributions of annual maximum streamflow series of Ankang and Huangzhuang stations.

Table 5 presents the results of three Copula functions with constant parameter (A3) and time-varying parameter (A4) for modeling the dependence between two streamflow series (*Q*_{a} and *Q*_{h}) at Ankang and Huangzhuang stations. The results reveal that under the time-varying dependence (scenario A4), the GH Copula function has the smallest values of K–S and AIC indicators, while the values of K–S and AIC indicators under the scenario A4 are smaller than those of the scenario A3. The comparison between two scenarios (A3 and A4) demonstrates that the GH Copula function with the climate and anthropogenic covariates can effectively model the non-stationarity in parameters of the joint distribution of two streamflow series.

Scenario . | Copula . | Parameter . | K–S statistic indicator^{a}. | AIC value . | ||
---|---|---|---|---|---|---|

Variable X_{1}
. | Variable X_{2}
. | Variable X_{1}
. | Variable X_{2}
. | |||

A3 | GH | 2.70 | 0.127 | 0.131 | 43 | 55 |

Frank | 7.85 | 0.144 | 0.157 | 57 | 67 | |

Clayton | 3.46 | 0.172 | 0.175 | 71 | 80 | |

A4 | GH | exp(−414+0.56P_{h}−0.73RI_{h}) | 0.119 | 0.122 | 33 | 39 |

Frank | exp(−320+0.18P_{h}−1.45RI_{h}) | 0.149 | 0.154 | 35 | 49 | |

Clayton | exp(−257+0.47P_{h}−0.69RI_{h}) | 0.173 | 0.171 | 53 | 67 |

Scenario . | Copula . | Parameter . | K–S statistic indicator^{a}. | AIC value . | ||
---|---|---|---|---|---|---|

Variable X_{1}
. | Variable X_{2}
. | Variable X_{1}
. | Variable X_{2}
. | |||

A3 | GH | 2.70 | 0.127 | 0.131 | 43 | 55 |

Frank | 7.85 | 0.144 | 0.157 | 57 | 67 | |

Clayton | 3.46 | 0.172 | 0.175 | 71 | 80 | |

A4 | GH | exp(−414+0.56P_{h}−0.73RI_{h}) | 0.119 | 0.122 | 33 | 39 |

Frank | exp(−320+0.18P_{h}−1.45RI_{h}) | 0.149 | 0.154 | 35 | 49 | |

Clayton | exp(−257+0.47P_{h}−0.69RI_{h}) | 0.173 | 0.171 | 53 | 67 |

^{a}The K–S test is performed at a significance level of 0.05. The null hypothesis states that the empirical distribution fits the theoretical distribution. If the value of the K–S test indicator is smaller than the value of *D*(65, 0.05) , the null hypothesis would not be rejected. The variables *X*_{1} and *X*_{2} are the probability integral transformations for streamflow series *Q*_{a} and *Q*_{h} at Ankang and Huangzhuang stations, respectively.

The first four moments of the residuals (i.e. mean, variance, skewness and kurtosis) and their Filliben correlation coefficients (Table 6) and the worm plots (Figure 6) further point out that the selected GH Copula function with the time-varying parameter (scenario A4) has a satisfactory fitting quality to model the dependence structure between *Q*_{a} and *Q*_{h}.

Scenario . | Copula . | Variable X_{1}. | ||||
---|---|---|---|---|---|---|

Mean . | Variance . | Skewness . | Kurtosis . | Filliben . | ||

A3 | GH | 0.01 | 1.02 | 0.05 | 2.11 | 0.985 |

Frank | 0.02 | 1.03 | 0.08 | 2.34 | 0.977 | |

Clayton | −0.02 | 1.05 | 0.11 | 2.68 | 0.968 | |

A4 | GH | 0.01 | 1.03 | 0.01 | 2.08 | 0.995 |

Frank | 0.01 | 1.02 | 0.05 | 2.17 | 0.980 | |

Clayton | −0.02 | 1.07 | 0.05 | 2.52 | 0.971 | |

. | . | Variable X_{2}. | ||||

Scenario . | Copula . | Mean . | Variance . | Skewness . | Kurtosis . | Filliben . |

A3 | GH | −0.01 | 1.02 | 0.03 | 2.18 | 0.976 |

Frank | −0.02 | 1.03 | 0.05 | 2.40 | 0.971 | |

Clayton | 0.02 | 1.04 | 0.08 | 2.75 | 0.962 | |

A4 | GH | 0.00 | 1.00 | 0.01 | 2.12 | 0.989 |

Frank | −0.01 | 1.02 | 0.03 | 2.23 | 0.979 | |

Clayton | 0.03 | 1.05 | 0.07 | 2.58 | 0.971 |

Scenario . | Copula . | Variable X_{1}. | ||||
---|---|---|---|---|---|---|

Mean . | Variance . | Skewness . | Kurtosis . | Filliben . | ||

A3 | GH | 0.01 | 1.02 | 0.05 | 2.11 | 0.985 |

Frank | 0.02 | 1.03 | 0.08 | 2.34 | 0.977 | |

Clayton | −0.02 | 1.05 | 0.11 | 2.68 | 0.968 | |

A4 | GH | 0.01 | 1.03 | 0.01 | 2.08 | 0.995 |

Frank | 0.01 | 1.02 | 0.05 | 2.17 | 0.980 | |

Clayton | −0.02 | 1.07 | 0.05 | 2.52 | 0.971 | |

. | . | Variable X_{2}. | ||||

Scenario . | Copula . | Mean . | Variance . | Skewness . | Kurtosis . | Filliben . |

A3 | GH | −0.01 | 1.02 | 0.03 | 2.18 | 0.976 |

Frank | −0.02 | 1.03 | 0.05 | 2.40 | 0.971 | |

Clayton | 0.02 | 1.04 | 0.08 | 2.75 | 0.962 | |

A4 | GH | 0.00 | 1.00 | 0.01 | 2.12 | 0.989 |

Frank | −0.01 | 1.02 | 0.03 | 2.23 | 0.979 | |

Clayton | 0.03 | 1.05 | 0.07 | 2.58 | 0.971 |

### Point estimation of design floods using the Copula function

*Q*

_{a}and

*Q*

_{h}) can be formulated as follows:where and are the GU and GEV distributions of two streamflow series (

*Q*

_{a}and

*Q*

_{h}), respectively. is the GH Copula function. The two marginal distribution parameters are expressed as follows:

Figure 7 shows the results of flood frequency analysis under the two scenarios (A3 and A4). It is interesting to find that the difference in joint return periods of two scenarios is significant after 1967, especially for medium–high flood magnitudes. Being incorporated with climate and anthropogenic covariates, the time-varying property of the Copula function not only can model the dependence between variables, but also mitigate the under-estimation of joint return periods. Despite the fact that flood non-stationarity of the Hanjiang River can be attributed to both climate change and reservoir regulation, the significant variation after 1967 is mainly due to the regulation of the largest reservoir (i.e. Danjiangkou Reservoir). The Danjiangkou Reservoir is able to control >58% runoff of the Hanjiang River catchment, since it was put into operation in 1967. Considering climate change and reservoir regulation, the time-varying characteristics in marginal distributions and Copula function have significant impacts on joint return periods of flood pairs (*Q*_{a} and *Q*_{h}).

Consider that the parameters in Equations (8)–(10) have time-varying characteristics, 1,000 simulations of design flood pairs corresponding to a given joint return period under the scenario A4 were performed to further investigate the impacts of precipitation change and reservoir regulation on the joint return period of annual maximum streamflow series (*Q*_{a} and *Q*_{h}). The median values of design flood pairs corresponding to a given joint return period (100 years) were computed in accordance with the joint distribution of each time segment.

Figure 8 displays the isolines of the design flood pairs corresponding to the joint return period (100 years) at three-time segments under the scenarios A3 and A4. The results indicate that (1) from the first time segment (1954–1967) to the second time segment (1968–1991), the isolines of the joint return period under two scenarios (A3 and A4) move to the left, owing to the decreasing mean value of annual maximum streamflow series *Q*_{h} (Huangzhuang station); (2) and then, from the second time segment (1968–1991) to the third time segment (1992–2018), the isolines of the joint return period under two scenarios (A3 and A4) drastically move downward, owing to the decreasing coefficient of variation in both annual maximum streamflow series *Q*_{a} (Ankang station) and *Q*_{h} (Huangzhuang station). The coefficient of variation denotes the extent of variability in relation to the mean of the population. For the scenario A3, it is easy for the Copula function with a constant parameter to over-estimate the coefficient of variation without modeling the time-varying characteristics. For the scenario A4, both marginal and joint distributions have been modeled with the time-varying parameters to adequately quantify the coefficient of variation. The decreasing coefficient of variation in two streamflow series can be in a large measure attributed to the impact of climate change and reservoir regulation.

There are both lateral translations (reduction in flow magnitudes) and vertical decreases in flow magnitudes. The reasons for inducing both lateral and vertical translations consist of (1) the largest Danjiangkou Reservoir in the Hanjiang river has been put into operation since 1967 and largely reduces the mean value of annual maximum streamflow series *Q*_{h} (Huangzhuang station) and (2) due to the completion of the rest of nine reservoirs during 1991–2018 and the decreasing trend of precipitation after 1991, the variances of streamflow series both in two stations decrease significantly. The changing covariates of climate and reservoir regulation (Figures 3 and 4) would adequately explain the physical meaning of the combination of lateral and vertical translations.

### Probabilistic interval estimations of design floods using the BUP technique

We further assessed the impacts of time-varying Copula function on the uncertainty of the joint probability between two annual maximum streamflow series (*Q*_{a} and *Q*_{h}) based on Quantile–Quantile (QQ) plots. Figure 9 presents the QQ plots of probabilistic flood frequency analysis corresponding to three-time segments (1954–1967, 1968–1991 and 1992–2018) using the BUP approach under two scenarios (A3 and A4).

It can be seen from Figure 9 that (1) from the first time segment (1954–1967) to the second time segment (1968–1991) and the third time segment (1992–2018), the differences between the two QQ plotlines under two scenarios (A3 and A4) become more and more significant and (2) for all three-time segments, the QQ plots corresponding to scenario A4 are closer to the theoretical 1:1 line, in comparison to those of the scenario A3. That is to say, the non-stationary Copula function (scenario A4) produces the smaller biases of the joint probability than the stationary Copula function (scenario A3). The results demonstrate that the BUP approach can effectively quantify the uncertainty of flood frequency analysis owing to its better agreement to the theoretical 1:1 line.

Figure 10 presents the probabilistic interval estimations of design floods for the given joint return period (100 years) corresponding to the three-time segments (1954–1967, 1968–1991 and 1992–2018) using the BUP approach under the two scenarios (A3 and A4). The results reveal that (1) for the first time segment (1954–1967), the probabilistic interval of design floods created by the BUP approach under the scenario A4 (non-stationary Copula) approximates to that under the scenario A3 (stationary Copula), because the impacts of climate change and reservoir regulation on flood non-stationarity are not significant in the first time segment (Figures 3 and 4) and (2) from the first time segment (1954–1967) to the next two time segments (1968–1991 and 1992–2018), the probabilistic interval of design floods created by the BUP approach under the scenario A4 is smaller than that of the scenario A3, because the impacts of climate change and reservoir regulation on flood non-stationarity are becoming more and more significant in the next two time segments (Figures 3 and 4). The results demonstrate that the BUP post-processing technique can effectively quantify the uncertainty of probabilistic interval estimation of non-stationary design floods by decreasing the probabilistic distribution to a small range.

## DISCUSSION

Because the one-on-one relationship between the return period and the return level that exists in the univariate case does not exist in the non-stationary and the multivariate case, it is important to provide an indication on how design values (i.e. flood pairs) can be inferred in such a high-dimensional problem. Take the lines A–C under the scenario A4 in Figure 10, for example, the probabilistic interval estimations can provide the maximal and minimal design values to increase the reliability in the planning and design of hydraulic engineering projects. Furthermore, there is no doubt that the hydro-meteorological environment is experiencing variability due to a changing world. Accordingly, the proposed methodology and the pre-experience maps (e.g. point estimation, isolines of design flood pairs and interval estimation) of hydrological uncertainty presented in this study are useful tools to improve the operation process, in which the predictive hydrological design is essential because the hydrological frequency and the return period will be closely associated with the variability of hydro-meteorological conditions.

The results of this study may be affected by the combined methods. For instance, the GAMLSS method needs to consider the uncertainty of model parameters containing various possible options and combinations with respect to explanatory variables and link functions. In addition, the curse of dimensionality cannot be avoided: Copula function fitting and parameter estimation are more demanding and computationally intensive in higher dimensions. Finally, the BUP application needs transform point estimations from the original space to the Gaussian space before producing interval estimations. Such data space transformation is bound to induce information loss. Therefore, more modular design, parameter estimation and hybridization are imminent to quantify and reduce the uncertainty encountered in the interval estimation of design floods.

The proposed methodology is scalable and transferable, with capacity to address problems ranging from non-stationary hydrological frequency analysis to uncertainty assessment for local and global regions of concerns. There are quite many physical attributions of hydro-meteorological non-stationarity. This study is only a case that utilizes the integrated approach to quantify the impacts of precipitation and reservoir regulation on flood non-stationarity. Notwithstanding these promising achievements, future research can be executed to investigate the far-reaching effects of climate change on future flood design values using climate model outputs.

## CONCLUSIONS

This study proposes an integrated approach that combines the GAMLSS method, the Copula function and the BUP post-processing technique to bring about the probabilistic interval estimation of design floods. The capability of the proposed approach is verified at the Ankang and Huangzhuang hydrological stations in the Hanjiang River basin of China based on long-term (1954–2018) observed daily hydrological datasets. The results reveal that the proposed approach not only adequately model the time-varying characteristics of parameters in both marginal and joint distributions of annual maximum streamflow series, but also effectively increase the reliability of probabilistic interval estimation of design floods encountered in high climate variability and reservoir regulation conditions. The findings of this study are summarized as follows:

At a significance level of 0.05 or 0.1, the years 1968 and 1992 are identified as change points with significant and moderate decreasing trends (mean or variance), respectively. The annual maximum streamflow series of the Ankang station have significantly changed in 1992, due to the regulation of the Ankang Reservoir meanwhile decreasing precipitation. The annual maximum streamflow series of the Huangzhuang station have significantly changed in 1968, because the Danjiangkou Reservoir has been put into operation. Moreover, the annual maximum streamflow series of the Huangzhuang station have moderately changed in 1992 because of the decreasing precipitation.

The determination of time-varying characteristics concentrates on the location and scale parameters, whereas the shape parameter is assumed to be constant with time in all distributions. The GAMLSS method can adequately model the time-varying characteristics in the location and scale parameters of all marginal and joint distributions of two annual maximum streamflow series at Ankang and Huangzhuang stations, while the Copula function can effectively accomplish the point estimations of non-stationary design floods. From the first time segment (1954–1967) to the next two time segments (1968–1991 and 1992–2018), the design flood values of the Ankang and Huangzhuang stations dramatically decrease under the non-stationary condition as compared with the stationary condition.

The BUP post-processing technique not only can effectively quantify the uncertainty of probabilistic interval estimation of non-stationary design floods, but also can reduce the probabilistic distribution to a small range. The results of this study highlight the benefits of non-stationary hydrological frequency analysis to planning and design of hydraulic engineering projects meanwhile water resources management.

We believe that improving the reliability and generalizability of flood frequency analysis and decreasing the uncertainty of probabilistic interval estimation of design floods would enhance the trust in non-stationary frequency analysis approaches and promote more practices in hydrological sciences.

## ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (No. U20A20317 and 51538173), the Research Council of Norway (FRINATEK Project 274310) and the China Three Gorges Corporation (No. 0799254). The authors thank the Editors and anonymous Reviewers for their constructive comments that greatly contributed to enrich the manuscript.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.