## Abstract

Hydrological series lengths are decreasing due to decreasing investments and increasing human activities. For short sequences, a copula-based composite likelihood approach (CBCLA) has been employed to enhance the quality of hydrological design values. However, the Pearson type III (P-III) distribution for short annual precipitation records has not yet been thoroughly investigated using the CBCLA. This study used the CBCLA to incorporate the concurrent and non-concurrent periods contained in data of various lengths into an integrated framework to estimate the parameters of precipitation frequency distributions. The marginal distributions were fitted using the P-III distribution, and the joint probability was constructed using a copula which offers flexibility in choosing arbitrary marginals and dependence structure. Furthermore, the uncertainties in the estimated precipitation design values for the short series obtained from this approach were compared with those obtained from univariate analysis. Then, Monte-Carlo simulations were performed to examine the feasibility of this approach. The annual precipitation series at four stations in Weihe River basin, China, were used as a case study. Results showed that CBCLA with P-III marginals reduced the uncertainty in the precipitation design values for the short series and the reduction in the uncertainty became more significant with longer adjacent series.

## INTRODUCTION

Hydrologic frequency analysis is used to estimate the magnitude of a hydrologic event with a given frequency of occurrence or to estimate the frequency of occurrence of a hydrologic event with a given magnitude (Haan 1977; Rao & Hamed 1999). Sufficiently long data are essential for accurate parameter estimation. However, it is often difficult to obtain such long-term data, and hydrological design estimates based on data of insufficient length involve large uncertainties. Therefore, analyzing these uncertainties is of great significance for improving the calculation accuracy and reducing the uncertainties of hydrological design values.

In the past, most hydrologic frequency analyses were analyzed through the use of univariate distributions. Using an assumed probability distribution to estimate the frequency analysis of a hydrologic event, one can compute the statistics for various magnitudes, which can then be used for the planning, design, and management of water resource systems.

Research on the univariate flood frequency analysis has been reviewed by Bobée & Rasmussen (1995), Rao & Hamed (1999), Singh & Strupczewski (2002), and Vittal *et al.* (2015). Singh & Strupczewski (2002) classified frequency analysis methods into four groups: (1) empirical, (2) phenomenological, (3) dynamic, and (4) stochastic watershed in conjunction with Monte-Carlo simulation. In addition, the suggested distributions for fitting hydrological data and a number of parameter estimation methods were summarized by Rao & Hamed (1999). Two popular methods for regional frequency analysis are the index FFA (Flood Frequency Analysis) and regression analysis (Singh & Strupczewski 2002).

Some studies on annual maximum flow frequency analysis show that the empirical probability plot often displays two or more distinct segments. Using a single probability distribution smooth function to fit all the segments may produce a compromised curve and may lead to serious flood estimation errors for large return periods. In most cases, the observed data of the smaller censored flows have no influence on the fitting of the flood frequency curve. If the smaller flows produce some (very marginal) influence, the final fitting still depends very heavily on the larger flows. Wang (1990a, 1990b, 1996a, 1996b, 1997a, 1997b) proposed using partial probability weighted moments and higher probability weighted moments to estimate flood distribution and presents the potential merits of this method.

Recently, a vast amount of research has been dedicated to investigating frequency analysis under conditions of climate change, land use change and river regulation, acting individually or together (Strupczewski & Kaczmarek 2001; Strupczewski *et al.* 2001; Cunderlik & Burn 2003; Khaliq *et al.* 2006; Leclerc & Ouarda 2007; Villarini *et al.* 2009, 2010; Gilroy & McCuen 2012; Salas & Obeysekera 2014; Zeng *et al.* 2014; Li & Tan 2015; Vasiliades *et al.* 2015; Xiong *et al.* 2015).

The above-mentioned studies still require data over a sufficiently long period to obtain sound hydrological designs based on statistical analysis. However, measuring and monitoring hydrologic events involves high costs, and the global hydrological gauge network is still unevenly distributed, with most measuring stations located in a few developed countries and regions and particularly poor gauge coverage in developing countries. The station density is relatively low in developing countries, and the length of the hydrological data is limited, resulting in critical data gaps or representativeness problems. As a result of various natural conditions and human activities, such as destruction and migration of stations, construction of water resource projects, and river regulation, these deficiencies have been exacerbated over recent decades. Inadequate data lengths can produce large uncertainties in hydrological design estimates (Chowdhary & Singh 2010). Even when employing new parameter estimation techniques, such as partial probability weighted moments and higher probability weighted moments, it is difficult to improve the estimation accuracy for shorter data sequences.

Therefore, to overcome the problem of insufficient data for frequency analysis, additional information has been used to reduce the uncertainty in hydrologic estimates. For example, Vogel & Stedinger (1985) lengthened short hydrologic records by employing the cross-correlation among nearby longer records. They also developed improved unbiased estimators of the mean and variance of the streamflow at the sites with short records, and these estimators could obtain equal or lower variance than the simple at-site sample maximum likelihood estimates. In addition, to increase the precision of flood quantile estimators for a relatively short data series, historical or paleofloods data have been incorporated into flood frequency analyses (Stedinger & Cohn 1986; Reis & Stedinger 2005; Halbert *et al.* 2016), and joint cumulative distribution functions (bivariate or trivariate normal distributions, extreme value distributions, Gumbel mixed model and bivariate Gamma distribution) have been adopted to simultaneously consider data from adjacent stations (Escalante-Sanboval & Raynal-Villasenor 1998; Yue 2000, 2001, 2002; Yue *et al*. 2001; Escalante-Sandoval 2007).

The conventional multivariate approaches are limited due to the marginal distributions and the joint distribution belonging to the same family and the mathematical formulation becoming increasingly complicated with the increasing number of parameters (Chen *et al.* 2010; Li *et al.* 2013). To overcome these shortcomings, copula functions have been proposed and widely used in multivariate hydrological frequency analyses (Zhang & Singh 2006, 2007a, 2007b; Song & Singh 2010; Salvadori *et al.* 2011; Ma *et al.* 2013; Salvadori *et al.* 2013; Zhang *et al.* 2013, 2015; Fu & Butler 2014; Huang *et al.* 2015; Chen *et al.* 2016).

Chowdhary & Singh (2009, 2010) noted that only the concurrent periods of data sets have been used in either conventional or copula-based multivariate hydrological frequency analysis, whereas a majority of non-concurrent data in the extensive amount of staggered hydrological data remains unutilized for hydrological analysis. These non-concurrent data offer untapped potential for reducing parameter uncertainty by considering a multivariate framework which can simultaneously consider partially concurrent information in an integrated manner. However, such studies are currently still limited. Sandoval & Raynal-Villasenor (2008) constructed a trivariate logistic model with generalized extreme value distribution marginals to analyze samples with equal or unequal record lengths. Their aim was to reduce the bias or uncertainty in flood estimates by combining observations from several sites with an insufficiently long record to increase the available information and to provide a regional at-site design event. Chowdhary & Singh (2009, 2010) proposed a copula-based composite likelihood approach (CBCLA) to analyze various lengths of multivariate data and to reduce uncertainty in hydrologic design estimates. The results show that the CBCLA can provide necessary flexibility for admitting arbitrary marginals and can significantly reduce uncertainty in design flood quantiles for a relatively short flood series by utilizing associated downstream flood data. This approach can offset the impact of dwindling hydrological observation networks around the world by adding information that can be derived from existing networks.

Combining the CBCLA with the P-III distribution for short annual precipitation records has not yet been thoroughly investigated. The primary aim of this paper is to use the CBCLA to investigate the uncertainty of annual precipitation design values for a relatively short series fitted with the P-III distribution. In this study, the concurrent and exclusive parts of a multivariate data set with various lengths were combined in an integrated manner using copulas and the maximum likelihood function, and the uncertainty was quantified in terms of confidence intervals. Examples using simulated and measured data are presented in this paper. The precipitation sequences of four stations, namely Xi'an, Dali, Lintong and Huayin, in Weihe River basin (WRB) in China were selected as a case study. The data length at Huayin station was 28 years and is thus short series. The data lengths at the other three stations were 77, 53 and 50 respectively, and they are long series that can provide additional information. The confidence intervals and standard errors of the design values at Huayin station were derived and compared with the results from univariate estimations. The results show that the CBCLA can reduce the uncertainty in the precipitation design values for a relatively short data series.

## METHODS

### Bivariate copulas

*X*and

*Y*be two random variables with cumulative distribution functions (cdfs) of ; the bivariate joint distribution function of (

*X*,

*Y*) is defined as: where

*F*(

*x*,

*y*) is the bivariate distribution of

*X*and

*Y*, denotes the copula function with the dependence parameter , are the marginal distributions of

*X*and

*Y*, respectively, and with . From Equation (1), the copula-based joint probability density function (pdf) can be obtained for the bivariate variables (Shiau 2006): where and are marginal pdfs and is the copula density function of concurrent random samples.

*Y*, the pdf of the P-III distribution can be written as: where are the scale, shape and location parameters, respectively. The MLM (Maximum Likelihood Method) was used to estimate the P-III distribution parameters. Two general goodness of fit test statistics, Kolmogorov–Smirnov's (K-S)

*Dn*and Anderson–Darling's (A-D) , were used to determine whether the data were consistent with the P-III distribution (Song & Singh 2010).

### Parameter estimation and goodness-of-fit test for a bivariate distribution

The Archimedean copula family is a popular model for hydrologic analyses (Nelsen 2007) because it is easy to construct and it can be applied to both positively and negatively correlated hydrologic variables. Therefore, three one-parameter Archimedean copulas, Gumbel–Hougaard (GH), Clayton and Frank, were selected to determine the joint probability distribution of the precipitation variables in this study. The analytical expressions of these copulas have been well documented in previous pioneering studies (Shiau 2006; Zhang & Singh 2007a, 2007b).

The dependence parameters for these copulas were estimated using Kendall's tau *τ* (KTE) and maximum likelihood estimation (MLE) (Genest & Favre 2007). For the KTE, the dependence parameter can be estimated from the relationship between *τ* and . The MLE is obtained by maximizing the likelihood function, which involves the parameters of both the marginal distributions and the copula function.

The root mean square error (RMSE), Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to represent the bias of the probability distributions (Posada & Buckley 2004; Zhang & Singh 2007a, 2007b; Ma *et al.* 2013). Additionally, a bootstrap version of the K-S test and A-D test based on Rosenblatt's transformation was employed to test the goodness-of-fit for the copulas (Dobrić & Schmid 2007). The expressions and calculation steps of these statistics have been documented in previous studies (Song & Singh 2010).

### Copula-based composite likelihood parameter estimation

The MLM is applicable to the parameter estimation of complex density functions. The arrangement of a typical composite event, comprising some overlapping and some exclusive periods of *X* and *Y*, is shown in Figure 1. and are the full lengths of the two data series, respectively. is the length of the concurrent period, and and are the lengths of the individual nonconcurrent periods. Correspondingly, is the concurrent event, and are the nonconcurrent events. These three events constitute a composite event *e _{C}*, and the likelihood function of

*e*is called the composite likelihood function (Chowdhary & Singh 2010). Let represent the pdfs of marginal distributions, and be the bivariate pdf of (

_{C}*X*,

*Y*), which takes the same form as

*f*(

*x*,

*y*) in Equation (2). Here, , and are the parameter vectors of these distributions, respectively; , and

*r*represent the numbers of parameters; is the dependence parameter vector of the bivariate pdf; and is a parameter number. For the sake of brevity, the pdfs are denoted as

*f*(

*x*),

*f*(

*y*), and

*f*(

*x*,

*y*), respectively.

*Y*shown in Figure 1, the likelihood function

*L*and the log-likelihood function

_{Y}*LL*with respect to

_{Y}*N*independent and identically distributed (iid) observations are given as: The maximum likelihood estimators are obtained by maximizing

_{Y}*LL*and solving the system of equations given by (Bobée 1979; Rao & Hamed 1999): For the case in which the data lengths are not identical, Raynal-Villasenor (1985) proposed a general form of the likelihood function that is applicable to all possible composite arrangements. Following his work, the likelihood function

_{Y}*L*and log-likelihood function

_{C}*LL*for a composite event can be expressed as: In the above equations,

_{C}*f*(

*x*) and

*f*(

*y*) are the univariate densities for the exclusive periods that are given by their marginal density functions and

*LL*and

_{X}*LL*are the corresponding log-likelihood functions (Escalante-Sandoval 2007; Raynal-Villasenor & Salas 2008). Unlike the composite likelihood function in Raynal-Villasenor's work, the bivariate pdf

_{Y}*f*(

*x*,

*y*) of the concurrent period in Equation (7) is modeled using the copula function shown in Equation (2), and

*LL*is its log-likelihood function.

_{XY}*I*, (

_{i}*i*= 1, 2, 3) represents an indicator variable, such as

*I*= 1 if (

_{i}*j*=

*X*,

*XY*,

*Y*) or

*I*= 0 if .

_{i}The estimators contain the new parameters of the marginal distributions for both the short and long series and the dependence parameter of the copula function. These parameters are used in the calculations in the following sections.

### Variance-covariance matrix of parameter estimation

To calculate the confidence intervals of the design values, the standard errors of the distribution parameters should be estimated first. An asymptotic variance-covariance matrix usually provides a good description of the parameter error (Rueda 1981). In this section, following Chowdhary & Singh (2010), we summarize the procedures of the variance-covariance matrix for parameter estimation.

The asymptotic variance-covariance matrix of parameter estimation is the inverse of the Fisher information matrix, which has elements that are equal to the expectations of the second partial derivatives of the log-likelihood function distribution with respect to the parameters (Kamwi 2005; Nadarajah 2006). The main diagonal elements of the variance-covariance matrix correspond to the asymptotic variances of the parameters.

*p*= 1, 2, …, and

*q*= 1, 2, …, .

**I**

*is: where and is the information content derived from a single observation of*

_{Y}*Y*(Chowdhary & Singh 2010). Using Equation (9),

**A**

*is given by: The details of the derivation of*

_{Y}**I**

*are given in Appendix A (available with the online version of this paper). Inverting*

_{Y}**I**

*leads to the corresponding variance-covariance matrix: where*

_{Y}*V*ar(·) and

*C*ov(·) are the asymptotic variances and covariances of the parameters, respectively.

*p*= 1, 2, …,

*r*and

*q*= 1, 2, …,

*r*.

*n*> 0, the above result can be expressed in terms of

_{XY}*a*using Equations (10) and (15): where are the ratios of the total lengths of

^{p,q}*X*and

*Y*to the concurrent period, respectively, and the Fisher information matrix for composite events is given by: where .

### Information matrix for the P-III distribution and bivariate distribution

*Y*in Figure 1 is given as: where .

*r*= 7, including the parameters of marginal distributions and the dependence parameter of the copula . The Fisher information matrix of the parameters is given by: where

### Uncertainty estimation of the design values

*et al.*2012). The distribution parameters of the short series

*Y*were obtained from: (a) the univariate log-likelihood function

*LL*given in Equation (3) and (b) the composite log-likelihood function

_{Y}*LL*given in Equation (7), and the standard errors of the design values were compared for the two cases. The variances of the design values for case (a) were calculated using the delta method (Coles

_{C}*et al.*2001): where is the quantile function of series

*Y*for a given return period, is the derivative of with respect to parameter , is the corresponding transposed matrix, and

**VC**

*is the variance-covariance matrix of the parameter vector estimated using the univariate likelihood function given in Equation (5). The standard error is written as:*

_{U}*Y*for a given return period, which takes the same form as . The derivatives of and

**VC**

*were determined according to the estimated values of the parameters . The detailed derivations are given in Appendix B (available with the online version of this paper).*

_{C}After deriving the standard errors of the design values estimated using Equations (24) and (25), the confidence intervals under confidence level were calculated using , where is the quantile of the standard normal distribution for confidence levels equal to , is the design value for the return period *T* which is calculated using and , and is the standard error ( and ) of .

### CBCLA simulation

In this section, CBCLA was used in a simulated series to validate its applicability for improving the accuracy of parameter estimation for shorter series. Assuming that two data series, *X* and *Y*, were P-III distributed and that the joint probability of their concurrent part was constructed following a Frank copula, the Monte-Carlo method was used to generate a data set with length *N* = 80 and the following distribution parameters: for *X*; for *Y*; and the Frank copula dependence parameter is , which corresponds to a Kendall correlation coefficient of . Based on the arrangement of the composite event shown in Figure 1, we constructed eleven different composite event cases S1, S2, S3, S4, S5, S6, S7, S8, S9, S10 and S11, as shown in Table 1, using the simulated data. Here, variables *X* and *Y* are called the ‘longer series’ and the ‘shorter series,’ respectively. The objective here was to see whether using CBCLA decreases the uncertainty in the design values for *Y*.

Y | X | ||||
---|---|---|---|---|---|

Case | |||||

S1 | 30 | 50 | 20 | 30 | 0 |

S2 | 60 | 30 | 30 | 0 | |

S3 | 60 | 20 | 40 | 0 | |

S4 | 40 | 70 | 30 | 40 | 0 |

S5 | 80 | 40 | 40 | 0 | |

S6 | 30 | 15 | 15 | 5 | |

S7 | 20 | 40 | 25 | 15 | 5 |

S8 | 50 | 35 | 15 | 5 | |

S9 | 40 | 5 | 35 | 5 | |

S10 | 40 | 45 | 10 | 35 | 5 |

S11 | 50 | 15 | 35 | 5 |

Y | X | ||||
---|---|---|---|---|---|

Case | |||||

S1 | 30 | 50 | 20 | 30 | 0 |

S2 | 60 | 30 | 30 | 0 | |

S3 | 60 | 20 | 40 | 0 | |

S4 | 40 | 70 | 30 | 40 | 0 |

S5 | 80 | 40 | 40 | 0 | |

S6 | 30 | 15 | 15 | 5 | |

S7 | 20 | 40 | 25 | 15 | 5 |

S8 | 50 | 35 | 15 | 5 | |

S9 | 40 | 5 | 35 | 5 | |

S10 | 40 | 45 | 10 | 35 | 5 |

S11 | 50 | 15 | 35 | 5 |

The distribution parameters of *Y* in different composite cases were estimated using CBCLA, and the design values and 95% confidence widths corresponding to return periods of 50, 100 and 200 years were calculated based on these parameters. Table 2 compares the calculation results with the estimations from the univariate case. For both the univariate and composite case, the standard errors and confidence widths increased with increasing return periods. For the same size of *Y*, the standard errors and confidence widths obtained based on CBCLA were less than that of the estimations using the univariate likelihood method. For example, in the case of *N _{Y}* = 30, the standard errors and confidence widths in cases S1 and S2 are smaller than those in the univariate case. Therefore, integrating additional observations with a series of insufficient length can increase the available information, can obtain more accurate parameter estimations from CBCLA, and can improve the accuracy of the design values.

T-50^{b} | T-100^{b} | T-200^{b} | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

N_{Y} | Case | Design value | Standard error | Confidence width | Design value | Standard error | Confidence width | Design value | Standard error | Confidence width |

UL^{a} | 824.9 | 55.2 | 216.5 | 854.2 | 65.3 | 256.0 | 880.3 | 75.7 | 296.8 | |

30 | S1 | 816.6 | 52.1 | 204.2 | 845.7 | 61.8 | 242.2 | 871.7 | 71.9 | 281.7 |

S2 | 818.5 | 51.9 | 203.6 | 847.4 | 61.6 | 241.4 | 873.2 | 71.6 | 280.8 | |

UL^{a} | 815.2 | 47.4 | 185.9 | 840.6 | 55.6 | 218.1 | 862.9 | 64.0 | 251.1 | |

40 | S3 | 810.1 | 44.6 | 174.6 | 835.4 | 52.4 | 205.6 | 857.5 | 60.6 | 237.5 |

S4 | 807.4 | 44.5 | 174.5 | 832.7 | 52.4 | 205.5 | 854.8 | 60.5 | 237.3 | |

S5 | 809.9 | 44.5 | 174.3 | 835.1 | 52.4 | 205.2 | 857.2 | 60.5 | 237.1 | |

UL^{a} | 836.1 | 87.9 | 344.5 | 879.5 | 99.9 | 391.7 | 920.5 | 111.7 | 438.0 | |

20 | S6 | 818.9 | 74.4 | 291.5 | 857.2 | 87.3 | 342.3 | 893.4 | 100.5 | 394.1 |

S7 | 819.4 | 74.0 | 290.1 | 857.6 | 87.0 | 341.0 | 893.8 | 100.2 | 393.0 | |

S8 | 818.3 | 74.0 | 290.1 | 856.5 | 87.0 | 340.9 | 892.6 | 100.2 | 392.7 | |

UL^{a} | 842.5 | 48.4 | 189.7 | 879.9 | 58.0 | 227.2 | 914.9 | 67.9 | 266.3 | |

40 | S9 | 850.0 | 47.9 | 187.9 | 887.6 | 57.5 | 225.3 | 922.9 | 67.5 | 264.6 |

S10 | 849.6 | 47.9 | 187.8 | 887.2 | 57.5 | 225.3 | 922.4 | 67.5 | 264.6 | |

S11 | 849.1 | 47.9 | 187.7 | 886.7 | 57.4 | 225.2 | 921.9 | 67.5 | 264.4 |

T-50^{b} | T-100^{b} | T-200^{b} | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

N_{Y} | Case | Design value | Standard error | Confidence width | Design value | Standard error | Confidence width | Design value | Standard error | Confidence width |

UL^{a} | 824.9 | 55.2 | 216.5 | 854.2 | 65.3 | 256.0 | 880.3 | 75.7 | 296.8 | |

30 | S1 | 816.6 | 52.1 | 204.2 | 845.7 | 61.8 | 242.2 | 871.7 | 71.9 | 281.7 |

S2 | 818.5 | 51.9 | 203.6 | 847.4 | 61.6 | 241.4 | 873.2 | 71.6 | 280.8 | |

UL^{a} | 815.2 | 47.4 | 185.9 | 840.6 | 55.6 | 218.1 | 862.9 | 64.0 | 251.1 | |

40 | S3 | 810.1 | 44.6 | 174.6 | 835.4 | 52.4 | 205.6 | 857.5 | 60.6 | 237.5 |

S4 | 807.4 | 44.5 | 174.5 | 832.7 | 52.4 | 205.5 | 854.8 | 60.5 | 237.3 | |

S5 | 809.9 | 44.5 | 174.3 | 835.1 | 52.4 | 205.2 | 857.2 | 60.5 | 237.1 | |

UL^{a} | 836.1 | 87.9 | 344.5 | 879.5 | 99.9 | 391.7 | 920.5 | 111.7 | 438.0 | |

20 | S6 | 818.9 | 74.4 | 291.5 | 857.2 | 87.3 | 342.3 | 893.4 | 100.5 | 394.1 |

S7 | 819.4 | 74.0 | 290.1 | 857.6 | 87.0 | 341.0 | 893.8 | 100.2 | 393.0 | |

S8 | 818.3 | 74.0 | 290.1 | 856.5 | 87.0 | 340.9 | 892.6 | 100.2 | 392.7 | |

UL^{a} | 842.5 | 48.4 | 189.7 | 879.9 | 58.0 | 227.2 | 914.9 | 67.9 | 266.3 | |

40 | S9 | 850.0 | 47.9 | 187.9 | 887.6 | 57.5 | 225.3 | 922.9 | 67.5 | 264.6 |

S10 | 849.6 | 47.9 | 187.8 | 887.2 | 57.5 | 225.3 | 922.4 | 67.5 | 264.6 | |

S11 | 849.1 | 47.9 | 187.7 | 886.7 | 57.4 | 225.2 | 921.9 | 67.5 | 264.4 |

^{a}UL implies that the results were estimated from the shorter series using the univariate maximum likelihood method.

^{b}T-50, T-100 and T-200 indicate the given return period.

Table 2 shows that for a constant length of *Y*, the standard errors and confidence widths decrease as the length of the additional series *X* increases. For example, in the case of *N _{Y}* = 40, the standard errors and confidence widths in cases S3, S4 and S5 gradually decrease, which indicates that the reduction in uncertainty increases as the length of the additional series increases. Therefore, CBCLA can reduce the uncertainty in the design values for inadequate series and can provide a more precise design basis for engineering projects.

## STUDY AREA AND DATA

To illustrate the application of the CBCLA as outlined above, we analyzed the annual precipitation data collected in the WRB, which is the longest tributary of the Yellow River in China. The river originates from Gansu province and flows 818 km eastward into the Yellow River, and the catchment covers an area of approximately 134,800 km^{2}. The WRB is located in the transitional zone between arid and humid areas with an average annual precipitation of 572 mm. Due to the typical temperate continental monsoon climate, the total annual precipitation exhibits significant spatial and seasonal variations.

In this study, the observed annual precipitation series at four rainfall gauges (Figure 2), namely, Xi'an, Dali, Lintong and Huayin, were used as a case study. All the data were obtained from the National Climate Center of China Meteorological Administration. The observation periods at these four stations were 1932–2008, 1956–2008, 1959–2008 and 1981–2008, respectively. All these data have low first-order series correlation coefficients (−0.1140, −0.0785, 0.0480 and 0.0503 respectively) and no missing records. Anderson's test of independence showed that these data have an independent structure at a 90% confidence level. The characteristics of the annual precipitation data are listed in Table 3.

Xi'an | Dali | Lintong | Huayin | |
---|---|---|---|---|

Min precipitation (mm) | 285.2 | 240.8 | 302.3 | 302.5 |

Max precipitation (mm) | 903.2 | 843.5 | 954.9 | 898.8 |

Average precipitation (mm) | 571.9 | 500.8 | 579.5 | 567.8 |

Standard deviation | 126.957 | 124.926 | 129.201 | 159.269 |

Xi'an | Dali | Lintong | Huayin | |
---|---|---|---|---|

Min precipitation (mm) | 285.2 | 240.8 | 302.3 | 302.5 |

Max precipitation (mm) | 903.2 | 843.5 | 954.9 | 898.8 |

Average precipitation (mm) | 571.9 | 500.8 | 579.5 | 567.8 |

Standard deviation | 126.957 | 124.926 | 129.201 | 159.269 |

Among the four stations, Huayin (the shortest data series) served as the base station, and the other three stations were regarded as reference stations to provide additional information. Three composite events were constructed based on the four series and were labelled case I, case II, and case III. The relevant information on the annual precipitation data series and the arrangement of composite events are summarized in Table 4.

X | Y | |||||
---|---|---|---|---|---|---|

Case | Station | Station | ||||

I | Xi'an | 77 | 49 | 28 | 28 | |

II | Dali | 53 | 25 | Huayin | 28 | 28 |

III | Lintong | 50 | 22 | 28 | 28 |

X | Y | |||||
---|---|---|---|---|---|---|

Case | Station | Station | ||||

I | Xi'an | 77 | 49 | 28 | 28 | |

II | Dali | 53 | 25 | Huayin | 28 | 28 |

III | Lintong | 50 | 22 | 28 | 28 |

## RESULTS AND DISCUSSION

### Marginal distribution and goodness-of-fit

The P-III distribution was applied to fit each data series on a univariate basis. The MLM was used to estimate the distribution parameters. Using the estimated parameters, we compared the theoretical and empirical probabilities of the observed data (Figure 3). The empirical nonexceedance probabilities were calculated using the Weibull formula recommended by the Ministry of Water Resources of China (MWR 2006). The P-III distribution in Figure 3 shows a satisfactory fitting of the observed data series.

The K-S test and A-D test were used to evaluate the goodness-of-fit of the precipitation observations. The computed sample values, *Dn* and , are listed in Table 5. The critical values of *Dn* and are also shown in Table 5 for simulation times *N* = 5,000 and significance levels of *α* = 0.20, 0.15, 0.10, 0.05, 0.02, 0.01. Because all the sample statistics are smaller than the corresponding critical values at various significance levels, the P-III distribution is acceptable as a marginal distribution.

Station | Type | Sample statistic | Critical values of various significance levels a | |||||
---|---|---|---|---|---|---|---|---|

0.20 | 0.15 | 0.10 | 0.05 | 0.02 | 0.01 | |||

Xi'an | Dn | 0.0733 | 0.1196 | 0.1268 | 0.1366 | 0.1519 | 0.1692 | 0.1797 |

0.3346 | 1.3756 | 1.5790 | 1.8912 | 2.4793 | 3.2856 | 3.9696 | ||

Dali | Dn | 0.0607 | 0.1441 | 0.1525 | 0.1653 | 0.1834 | 0.2013 | 0.2188 |

0.2794 | 1.4356 | 1.6459 | 1.9710 | 2.5110 | 3.2464 | 3.8063 | ||

Lintong | Dn | 0.0701 | 0.1494 | 0.1585 | 0.1701 | 0.1888 | 0.2096 | 0.2249 |

0.2561 | 1.3832 | 1.5679 | 1.8546 | 2.4415 | 3.3945 | 4.0102 | ||

Huayin | Dn | 0.0803 | 0.1977 | 0.2097 | 0.2260 | 0.2522 | 0.2797 | 0.2975 |

0.1934 | 1.4283 | 1.6449 | 1.9447 | 2.4772 | 3.2023 | 3.7704 |

Station | Type | Sample statistic | Critical values of various significance levels a | |||||
---|---|---|---|---|---|---|---|---|

0.20 | 0.15 | 0.10 | 0.05 | 0.02 | 0.01 | |||

Xi'an | Dn | 0.0733 | 0.1196 | 0.1268 | 0.1366 | 0.1519 | 0.1692 | 0.1797 |

0.3346 | 1.3756 | 1.5790 | 1.8912 | 2.4793 | 3.2856 | 3.9696 | ||

Dali | Dn | 0.0607 | 0.1441 | 0.1525 | 0.1653 | 0.1834 | 0.2013 | 0.2188 |

0.2794 | 1.4356 | 1.6459 | 1.9710 | 2.5110 | 3.2464 | 3.8063 | ||

Lintong | Dn | 0.0701 | 0.1494 | 0.1585 | 0.1701 | 0.1888 | 0.2096 | 0.2249 |

0.2561 | 1.3832 | 1.5679 | 1.8546 | 2.4415 | 3.3945 | 4.0102 | ||

Huayin | Dn | 0.0803 | 0.1977 | 0.2097 | 0.2260 | 0.2522 | 0.2797 | 0.2975 |

0.1934 | 1.4283 | 1.6449 | 1.9447 | 2.4772 | 3.2023 | 3.7704 |

### Bivariate distribution of concurrent period based on the copula

As shown in Table 4, the length of the concurrent periods in all composite events is 28 years. Common measures, such as the Pearson correlation coefficient *γ*, Kendall's *τ*, and Spearman's *ρ*, were used to investigate the dependence of the concurrent variables. The results of the correlation coefficients and the corresponding *p*-values shown in Table 6 reveal the presence of significant dependent relationships between the concurrent periods in cases I, II, and III.

Case | Pearson | Spearman | Kendall | |||
---|---|---|---|---|---|---|

γ | p1 | ρ | p2 | τ | p3 | |

I | 0.7825 | 8.66E-07 | 0.6579 | 0.0002 | 0.5079 | 8.34E-05 |

II | 0.8590 | 4.93E-09 | 0.7942 | 2.10E-06 | 0.6032 | 1.60E-06 |

III | 0.8396 | 2.33E-08 | 0.7690 | 4.18E-06 | 0.6085 | 4.18E-06 |

Case | Pearson | Spearman | Kendall | |||
---|---|---|---|---|---|---|

γ | p1 | ρ | p2 | τ | p3 | |

I | 0.7825 | 8.66E-07 | 0.6579 | 0.0002 | 0.5079 | 8.34E-05 |

II | 0.8590 | 4.93E-09 | 0.7942 | 2.10E-06 | 0.6032 | 1.60E-06 |

III | 0.8396 | 2.33E-08 | 0.7690 | 4.18E-06 | 0.6085 | 4.18E-06 |

The joint distribution was constructed using the GH, Clayton and Frank copulas. The dependence parameters estimated based on the MLE and KTE method are listed in Table 7, and the RMSE, AIC and BIC for three copulas are also compared in this table. According to the smallest values of these three statistics, the best-fit copula and parameter estimation methods for the three composite cases are GH (MLE), Clayton (KTE) and GH (KTE). Table 8 lists the goodness-of-fit results of the best-fit copulas for the three cases. Given a significance level of *α* = 0.05, all the test statistics based on the observed data are smaller than the corresponding critical values, indicating that GH, Clayton and GH copulas are acceptable for modeling the bivariate joint distributions of the concurrent data in cases I, II, III, respectively.

Case | Method | Copula | AIC | BIC | RMSE | |
---|---|---|---|---|---|---|

I | KTE | Frank | 5.8824 | −180.4813 | −179.1491 | 0.0377 |

GH | 2.0323 | −182.5350 | −181.2028 | 0.0364 | ||

Clayton | 2.0645 | −182.2199 | −180.8877 | 0.0366 | ||

Frank | 5.9547 | −180.5580 | −179.2258 | 0.0377 | ||

MLE | GH | 2.0430 | −182.6685 | −181.3363 | 0.0363 | |

Clayton | 1.6118 | −178.7845 | −177.4523 | 0.0389 | ||

KTE | Frank | 8.0150 | −182.2286 | −180.8964 | 0.0366 | |

II | GH | 2.5200 | −177.2635 | −175.9313 | 0.0400 | |

Clayton | 3.0400 | −183.6340 | −182.3018 | 0.0357 | ||

Frank | 7.6888 | −181.5102 | −180.1780 | 0.0371 | ||

MLE | GH | 2.4162 | −176.0000 | −174.6678 | 0.0409 | |

Clayton | 1.5949 | −166.9416 | −165.6094 | 0.0481 | ||

KTE | Frank | 8.1601 | −200.2240 | −200.2240 | 0.0265 | |

III | GH | 2.5541 | −204.5003 | −203.1681 | 0.0246 | |

Clayton | 3.1081 | −193.9300 | −192.5978 | 0.0297 | ||

Frank | 7.8457 | −200.3987 | −199.0665 | 0.0265 | ||

MLE | GH | 2.5534 | −204.4974 | −203.1652 | 0.0246 | |

Clayton | 2.1037 | −189.1661 | −187.8339 | 0.0323 |

Case | Method | Copula | AIC | BIC | RMSE | |
---|---|---|---|---|---|---|

I | KTE | Frank | 5.8824 | −180.4813 | −179.1491 | 0.0377 |

GH | 2.0323 | −182.5350 | −181.2028 | 0.0364 | ||

Clayton | 2.0645 | −182.2199 | −180.8877 | 0.0366 | ||

Frank | 5.9547 | −180.5580 | −179.2258 | 0.0377 | ||

MLE | GH | 2.0430 | −182.6685 | −181.3363 | 0.0363 | |

Clayton | 1.6118 | −178.7845 | −177.4523 | 0.0389 | ||

KTE | Frank | 8.0150 | −182.2286 | −180.8964 | 0.0366 | |

II | GH | 2.5200 | −177.2635 | −175.9313 | 0.0400 | |

Clayton | 3.0400 | −183.6340 | −182.3018 | 0.0357 | ||

Frank | 7.6888 | −181.5102 | −180.1780 | 0.0371 | ||

MLE | GH | 2.4162 | −176.0000 | −174.6678 | 0.0409 | |

Clayton | 1.5949 | −166.9416 | −165.6094 | 0.0481 | ||

KTE | Frank | 8.1601 | −200.2240 | −200.2240 | 0.0265 | |

III | GH | 2.5541 | −204.5003 | −203.1681 | 0.0246 | |

Clayton | 3.1081 | −193.9300 | −192.5978 | 0.0297 | ||

Frank | 7.8457 | −200.3987 | −199.0665 | 0.0265 | ||

MLE | GH | 2.5534 | −204.4974 | −203.1652 | 0.0246 | |

Clayton | 2.1037 | −189.1661 | −187.8339 | 0.0323 |

Case | Type | Sample statistic | Critical values of various significance level a | |||||
---|---|---|---|---|---|---|---|---|

0.20 | 0.15 | 0.10 | 0.05 | 0.02 | 0.01 | |||

I | Dn | 0.1139 | 0.9351 | 0.9467 | 0.9550 | 0.9614 | 0.9745 | 0.9883 |

1.0468 | 1.2392 | 1.3789 | 1.5587 | 2.0436 | 2.2835 | 2.5604 | ||

II | Dn | 0.1423 | 0.9283 | 0.9408 | 0.9598 | 0.9896 | 0.9907 | 0.9961 |

0.4408 | 1.5004 | 2.1661 | 2.6674 | 3.3539 | 4.8191 | 5.0050 | ||

III | Dn | 0.0895 | 0.9489 | 0.9581 | 0.9706 | 0.9882 | 0.9969 | 0.9985 |

0.3370 | 1.7051 | 2.0485 | 2.3323 | 3.0700 | 3.9724 | 4.7634 |

Case | Type | Sample statistic | Critical values of various significance level a | |||||
---|---|---|---|---|---|---|---|---|

0.20 | 0.15 | 0.10 | 0.05 | 0.02 | 0.01 | |||

I | Dn | 0.1139 | 0.9351 | 0.9467 | 0.9550 | 0.9614 | 0.9745 | 0.9883 |

1.0468 | 1.2392 | 1.3789 | 1.5587 | 2.0436 | 2.2835 | 2.5604 | ||

II | Dn | 0.1423 | 0.9283 | 0.9408 | 0.9598 | 0.9896 | 0.9907 | 0.9961 |

0.4408 | 1.5004 | 2.1661 | 2.6674 | 3.3539 | 4.8191 | 5.0050 | ||

III | Dn | 0.0895 | 0.9489 | 0.9581 | 0.9706 | 0.9882 | 0.9969 | 0.9985 |

0.3370 | 1.7051 | 2.0485 | 2.3323 | 3.0700 | 3.9724 | 4.7634 |

### Composite likelihood-based design values and uncertainty estimation

Considering P-III as a marginal distribution and the best-fit copulas selected above, the estimates of the distribution parameters and the variance-covariance matrix based on the CBCLA were obtained from Equations (8) and (25), respectively. The precipitation design values along with the associated standard errors and widths of the 95% confidence intervals for 10, 20, 50, 100, 200 and 500 year return periods based on these parameters and the dispersion characteristics are shown in Table 9. The precipitation design values, standard errors and 95% confidence interval widths derived for only a univariate basis are also given in Table 9.

Case | Return period (years) | Design value | Standard error | Confidence width | Case | Return period (years) | Design value | Standard error | Confidence width |
---|---|---|---|---|---|---|---|---|---|

Univariate | 10 | 775.2 | 51.2 | 200.7 | I | 10 | 757.8 | 45.6 | 178.8 |

20 | 841.8 | 64.8 | 253.9 | 20 | 818.3 | 57.7 | 226.1 | ||

50 | 919.8 | 85.0 | 333.2 | 50 | 889.1 | 75.9 | 297.5 | ||

100 | 973.6 | 101.4 | 397.5 | 100 | 937.8 | 90.8 | 355.7 | ||

200 | 1024.0 | 118.5 | 464.3 | 200 | 983.6 | 106.2 | 416.3 | ||

500 | 1086.8 | 141.7 | 555.5 | 500 | 1040.6 | 127.4 | 499.2 | ||

II | 10 | 788.2 | 49.8 | 195.1 | III | 10 | 775.0 | 49.7 | 194.9 |

20 | 854.8 | 62.9 | 246.7 | 20 | 840.7 | 62.9 | 246.5 | ||

50 | 932.8 | 82.9 | 325.0 | 50 | 917.5 | 82.7 | 324.1 | ||

100 | 986.5 | 99.2 | 389.0 | 100 | 970.4 | 98.8 | 387.4 | ||

200 | 1037.0 | 116.3 | 455.7 | 200 | 1020.1 | 115.6 | 453.2 | ||

500 | 1099.8 | 139.6 | 547.1 | 500 | 1081.9 | 138.6 | 543.1 |

Case | Return period (years) | Design value | Standard error | Confidence width | Case | Return period (years) | Design value | Standard error | Confidence width |
---|---|---|---|---|---|---|---|---|---|

Univariate | 10 | 775.2 | 51.2 | 200.7 | I | 10 | 757.8 | 45.6 | 178.8 |

20 | 841.8 | 64.8 | 253.9 | 20 | 818.3 | 57.7 | 226.1 | ||

50 | 919.8 | 85.0 | 333.2 | 50 | 889.1 | 75.9 | 297.5 | ||

100 | 973.6 | 101.4 | 397.5 | 100 | 937.8 | 90.8 | 355.7 | ||

200 | 1024.0 | 118.5 | 464.3 | 200 | 983.6 | 106.2 | 416.3 | ||

500 | 1086.8 | 141.7 | 555.5 | 500 | 1040.6 | 127.4 | 499.2 | ||

II | 10 | 788.2 | 49.8 | 195.1 | III | 10 | 775.0 | 49.7 | 194.9 |

20 | 854.8 | 62.9 | 246.7 | 20 | 840.7 | 62.9 | 246.5 | ||

50 | 932.8 | 82.9 | 325.0 | 50 | 917.5 | 82.7 | 324.1 | ||

100 | 986.5 | 99.2 | 389.0 | 100 | 970.4 | 98.8 | 387.4 | ||

200 | 1037.0 | 116.3 | 455.7 | 200 | 1020.1 | 115.6 | 453.2 | ||

500 | 1099.8 | 139.6 | 547.1 | 500 | 1081.9 | 138.6 | 543.1 |

In addition, the standard errors and confidence widths obtained from the composite cases were smaller than those derived from the univariate case (see Table 9), indicating that the CBCLA yields a more precise parameter estimation. Table 10 compares the differences in the uncertainty reductions for the standard errors and confidence widths under different situations. For example, for composite cases I, II, and III, the standard error and confidence interval for a return period of *T* = 50 years decreased 10.71, 2.48, and 2.73%, respectively, compared with the univariate case when using CBCLA.

Return period (years) | Case I to Univariate | Case II to Univariate | Case III to Univariate | ||||||
---|---|---|---|---|---|---|---|---|---|

Design value | Standard error | Confidence width | Design value | Standard error | Confidence width | Design value | Standard error | Confidence width | |

10 | −2.24 | −10.93 | −10.93 | 1.68 | −2.81 | −2.81 | −0.02 | −2.92 | −2.92 |

20 | −2.79 | −10.93 | −10.93 | 1.55 | −2.83 | −2.83 | −0.14 | −2.92 | −2.92 |

50 | −3.34 | −10.71 | −10.71 | 1.41 | −2.48 | −2.48 | −0.26 | −2.73 | −2.73 |

100 | −3.67 | −10.52 | −10.52 | 1.33 | −2.15 | −2.15 | −0.33 | −2.56 | −2.56 |

200 | −3.95 | −10.33 | −10.33 | 1.27 | −1.85 | −1.85 | −0.38 | −2.41 | −2.41 |

500 | −4.25 | −10.12 | −10.12 | 1.19 | −1.50 | −1.50 | −0.45 | −2.22 | −2.22 |

Return period (years) | Case I to Univariate | Case II to Univariate | Case III to Univariate | ||||||
---|---|---|---|---|---|---|---|---|---|

Design value | Standard error | Confidence width | Design value | Standard error | Confidence width | Design value | Standard error | Confidence width | |

10 | −2.24 | −10.93 | −10.93 | 1.68 | −2.81 | −2.81 | −0.02 | −2.92 | −2.92 |

20 | −2.79 | −10.93 | −10.93 | 1.55 | −2.83 | −2.83 | −0.14 | −2.92 | −2.92 |

50 | −3.34 | −10.71 | −10.71 | 1.41 | −2.48 | −2.48 | −0.26 | −2.73 | −2.73 |

100 | −3.67 | −10.52 | −10.52 | 1.33 | −2.15 | −2.15 | −0.33 | −2.56 | −2.56 |

200 | −3.95 | −10.33 | −10.33 | 1.27 | −1.85 | −1.85 | −0.38 | −2.41 | −2.41 |

500 | −4.25 | −10.12 | −10.12 | 1.19 | −1.50 | −1.50 | −0.45 | −2.22 | −2.22 |

Furthermore, the reductions in the standard errors and confidence widths in case I (10–20%) are greater than those in cases II and III (1.5–3%) due to the length of the exclusive period in the additional data. For the same concurrent period length, the exclusive period in case I is longer than those in cases II and III (Table 4). The increased information for a shorter series is more significant when the length of the associated data is longer. As a result, integrating the multivariate data set in the CBCLA is more advantageous than the univariate analysis.

Overall, the uncertainty in the design values decreases when using the CBCLA, which validates the merits of using this approach to improve the accuracy and precision of design values with insufficient data.

## CONCLUSIONS

To summarize, this study presented a CBCLA for the parameter estimation of shorter series with a P-III distribution. Using this method improved the accuracy of the parameter estimation, as reflected in the confidence intervals of the design precipitation values. Four annual precipitation series in the WRB of China were selected as a case study. The following conclusions were drawn:

Three bivariate composite events were constructed based on four precipitation series of different lengths. The bivariate data set in each composite event consisted of two parts, i.e., the concurrent period and the exclusive part in the longer series. The marginal distributions of the concurrent and exclusive part were fitted using a P-III distribution, and the corresponding parameters were estimated using the MLM. The dependence structure of the overlapping records was modeled using Archimedean copulas, and a copula-based bivariate composite likelihood function for parameter estimation was established based on the likelihood function. Finally, the precision of the estimates calculated with this approach was compared with the precision of the univariate maximum likelihood estimates. The results show that the uncertainty in the precipitation design values due to inadequate data decreased and that this reduction in uncertainty became more significant as the length of the exclusive data series increased.

Monte-Carlo simulation results showed that using CBCLA with P-III marginals achieved improved design values for short sequences in terms of both standard error and confidence intervals.

The main advantage of this approach is that it considers the unused data in a longer data series to build a multivariate data set with different lengths, which cannot be implemented using conventional methods. This integration enhances the information for short series and yields improved parameter estimation and estimation precision. Furthermore, copulas were used to build joint distribution functions in this approach to avoid the assumption that the marginal distribution must belong to the same type, as in the traditional multivariate distribution. The marginal and bivariate distributions used the P-III distribution and Archimedean copulas, respectively. Further studies on this approach with other marginals and copula models related to other hydrologic designs should be conducted.

## ACKNOWLEDGEMENTS

The present study is ﬁnancially supported by the National Natural Science Foundation of China (Grant Nos 51479171, 51179160 and 51579059). The authors also wish to express their cordial gratitude to the editor and anonymous reviewers for their helpful comments which have greatly helped to improve the quality of this paper.

## REFERENCES

*.*

*.*

*.*

*.*

*.*

*.*