Quantification of the inherent uncertainty in hydrologic forecasting is essential for flood control and water resources management. The existing approaches, such as Bayesian model averaging (BMA), hydrologic uncertainty processor (HUP), copula-BMA (CBMA), aim at developing reliable probabilistic forecasts to characterize the uncertainty induced by model structures. In the probability forecast framework, these approaches either assume the probability density function (PDF) to follow a certain distribution, or are unable to reduce bias effectively for complex hydrological forecasts. To overcome these limitations, a copula Bayesian processor associated with BMA (CBP-BMA) method is proposed with ensemble lumped hydrological models. Comparing with the BMA and CBMA methods, the CBP-BMA method relaxes any assumption on the distribution of conditional PDFs. Several evaluation criteria, such as containing ratio, average bandwidth and average deviation amplitude of probabilistic application, are utilized to evaluate the model performance. The case study results demonstrate that the CBP-BMA method can improve hydrological forecasting precision with higher cover ratios more than 90%, which are increased by 4.4% and 3.2%, 2.2% and 1.7% over those of BMA and CBMA during the calibration and validation periods, respectively. The proposed CBP-BMA method provides an alternative approach for uncertainty estimation of hydrological multi-model forecasts.
Hydrological forecasting is an essential endeavor in water resources management and optimal control of rivers, such as, flood control, reservoir operation, navigation, and leisure activities (Montanari & Brath 2004; Fan et al. 2016; Liu et al. 2017a, 2017b; Wu et al. 2017). Despite the tremendous amount of resources invested in developing more hydrologic models, no one can convincingly claim that any particular model in existence today is superior to other models for all type of applications and under all conditions (Wu et al. 2015; Liu et al. 2016; Ba et al. 2017). Different models are capable of capturing different aspects of the hydrologic processes. The uncertainty of each model arises from parameters, calibration, design of model structure, and input measurements, which partially bring underlying imprecise influence (Götzinger & Bárdossy 2008; Li et al. 2011; Hemri et al. 2015). One of the primary techniques to reflect different uncertainties in hydrological forecasts is to create an ensemble of forecast trajectories (McEnery et al. 2005; Seo et al. 2006; Han et al. 2014).
In recent years, some techniques develop the probability distribution of forecast to account for different sources of uncertainty. The Bayesian forecasting system proposed by Krzysztofowicz (1999) provides a general framework for hydrologic probability forecasting. Krzysztofowicz & Kelly (2000) developed a meta-Gaussian hydrologic uncertainty processor (HUP), which converts the meta-Gaussian distribution to Gaussian distribution by normal quantile transform. Due to the advantages of non-normal distribution assumption, the meta-Gaussian HUP has been widely implemented in hydrologic and meteorological fields (Chen et al. 2013).
Aside from traditional HUP, the Bayesian model average (BMA) method introduced by Raftery et al. (2005) follows a statistical technique to combine the advantages of different models. Different from other multi-model methods, the BMA method presents a more realistic description of predictive uncertainty, since the BMA predictive variance can be broken down into two components: between-model variability and within-model variability (Ajami et al. 2007). The BMA method is a statistical procedure that infers consensus predictions by weighing individual predictions based on their probabilistic likelihood measures, with the better performing predictions receiving higher weights than the worse performing ones. The method has been explored to improve both the accuracy and reliability of stream flow predictions (Vrugt & Robinson 2007; Liang et al. 2011). Duan et al. (2007) concluded that the combination of multi-model ensemble strategies using the BMA framework could quantify statements on prediction uncertainty and significantly improve verification performances. Zhou et al. (2016) compared the mean prediction of BMA with its individual parameter transfer method (physical similarity approach) and demonstrated that the probabilistic predictions of BMA could reduce the uncertainty with a significant degree. Nevertheless, the standard BMA method imposes a large number of pseudo-variation requirements and this influences precise understanding of data variations, which gives rise to further development on this theoretical research (Madadgar & Moradkhani 2014).
Copula functions are versatile and promising tools for characterizing the dependence structures in hydrologic processes as it allows modeling the dependence structure among random variables independently of the marginal distributions, which have been successfully applied in probabilistic analysis in recent years (Lee & Salas 2011; Li et al. 2013; Chen et al. 2015; Liu et al. 2015; Liu et al. 2017a, 2017b). Klein et al. (2016) used a mixture of marginal density distribution to estimate predictive uncertainty of hydrologic multi-model ensembles by using pair-copula construction. Similar research shows that copula technique is an effective tool for reflecting the unclear and complex relationships, because it can flexibly choose an arbitrary type for the marginal distributions instead of Gaussian distribution (Carreau & Bouvier 2016; Khajehei & Moradkhani 2017). According to the promising results of using copula functions in post-processing of hydrologic forecasts, Madadgar & Moradkhani (2014) firstly integrated copula functions with BMA to estimate the posterior distribution and found that copula-BMA (CBMA) is an effective post-processor to relax any assumption on the distribution of conditional probability density function (PDF). The CBMA not only displayed better deterministic skill than BMA but also confirmed the impact of posterior distribution in calculating the weights of individual models by EM algorithm. Results indicated that the predictive distributions are more accurate and reliable. It is also shown that the post-processed forecasts have better correlation with observations after CBMA application. The CBMA method in the meteorological application does not need to assume the shape of posterior distribution and leaves out the data-transformation procedure and demonstrates that predictive distributions are less bias and more confident with small uncertainty (Möller et al. 2013).
Inspired by the ideas of Madadgar & Moradkhani (2014), a general framework of the combination of copula Bayesian processor with BMA (CBP-BMA) is proposed in this study, where the Bayesian theory is applied in the transformation of posterior distribution. The flowchart of different probability forecast methods based on deterministic models is described in Figure 1. The remainder of the paper describes the case study and three hydrological models in the next section. This is followed by illustrations of concepts of three different methods of probability forecast as well as the evaluation criteria applied for multi-model techniques. Results and discussions of the case study are then given, and finally a summary of the conclusions drawn.
MATERIALS AND HYDROLOGICAL MODELS
Study area and data
The proposed methods are applied to daily mean flow discharge at the Mumahe catchment (Figure 2), a sub-basin of Hanjiang River Basin in China. The catchment lies in Shanxi Province with an area of 1,224 km2 and locates in the subtropical monsoon region with humid climate and fairly heavy precipitation. The annual mean precipitation and runoff is 1,070 mm and 687 mm, respectively. The available dataset contains daily precipitation, runoff and evaporation with a length of 11 years (1980–1990). The first year (1980) is used as the spin-up period for each hydrologic model in order to achieve the best effective model formulation. The remaining years (1981–1990) are divided into two sub-periods, with 7 years (1981–1987) for calibration and 3 years (1988–1990) for validation.
Description of the hydrological models
Three well established conceptual hydrological models are implemented in the Mumahe catchment, including the Xinanjiang (XAJ), HBV and SIMHYD models. The XAJ model has been used in humid and semi-humid regions worldwide (Zhao 1992). It consists of a runoff generation component with seven parameters and a routing component with ten parameters. These physical parameters within the model represent abstract conceptual expressions of watershed features. The HBV model is a synthetic flow model with 13 parameters which need to be calibrated. Units of the HBV model form the routines for snowmelt accumulation, evapotranspiration and soil routine and response function. The core concept assumes runoff volume changes with soil humidity exponentially (Montero et al. 2016). The SIMHYD model is a lumped conceptual hydrological model which contains seven parameters requiring calibration. This model divides runoff into three components: surface flow, interflow and base flow. The surface flow is infiltration excess runoff, inter-flow is estimated as a linear function of the soil wetness, and base flow is simulated as a linear recession from the groundwater store (Chiew et al. 2009; Yu & Zhu 2015). The infiltration rate is a core of the model.
Bayesian model averaging
Raftery et al. (2005) successfully extended BMA to statistical post-processing for forecast ensembles. The BMA method addresses total model uncertainty by conditioning not only on a single outstanding model, but on the entire ensemble models. The method was originally proposed as a pathway for method combination of several competing models (Duan et al. 2007; Liang et al. 2011).
The BMA method
Estimation of probabilistic prediction
After the completion of BMA parameter estimation by the EM algorithm (Duan et al. 2007), another feature of the BMA method is to make use of the Monte Carlo method to derive BMA probabilistic ensemble prediction for any time t (Kuczera & Parent 1998). The procedures are described as follows (Zhou et al. 2016).
Select the probabilistic ensemble size, M (M = 100).
Randomly generate a value of k from the numbers [1, 2 … K] with probabilities using:
initial the cumulative weight = 0 and compute for ;
generate a random number u between 0 and 1;
if , then the ith member of the ensemble predictions are chosen.
Generate a value of q from the pdf of pi(q| fi, ).
Repeat steps (2) and (3) for M times.
The results are sorted in ascending order and the 90% confidence interval can be derived within the range of the 5% and 95% quantiles.
The hybrid CBMA
As illustrated before, the BMA predictive distribution provides a weighted average of simulation pdf which generally complies with a parametric distribution, e.g., Gaussian distribution after the Box-Cox transformation. Madadgar & Moradkhani (2014) employed copula to estimate the posterior distribution of forecast variables for each model, i.e., , and found that the hydrological forecasts are improved after the integration of CBMA. A series of research demonstrates that the procedures of CBMA not only eliminate the prophase bias-correction and the external calculation of variance, but also simplify the calculation of the weighted average and the probability model structure by copula (Möller et al. 2013).
The CBMA method
It can be seen from Equation (9) that it relaxes any assumption on the type of posterior distribution , whose term can be directly inferred with the help of copula functions. Once the term is defined, their weights are estimated by the EM algorithm with a few adjustments, which can refer to Madadgar & Moradkhani (2014) in detail.
The hybrid CBMA model applies the idea of ‘pair and ensemble’. The pair of observation q and the ith model simulation is established to acquire the probability task by the well developed copula theory, while the ensemble is to formulate a consensus probability interval.
Copula Bayesian processor associated with BMA (CBP-BMA) method
Copula Bayesian processor
The CBP-BMA method
Evaluation criteria for multi-model techniques
Deterministic model assessment indices
To evaluate the quality of the deterministic model, three metrics are introduced as follows.
- Nash–Sutcliffe efficiency coefficient (NSE) and are the observed and the simulated, respectively, and represents the average of the observed. NSE has a range of (, 1] with the best NSE equal to 1 (Nash & Sutcliffe 1970).
As the second tool employed is sensitive to the differences between observations and simulations, the values of DRMS approaching to 0 stand for better performance.
- Kling-Gupta efficiency (KGE) is the bias ratio indicator; is the variability ratio (Kling et al. 2012). All calculative variables are replaced by the expected values of the estimate predictive distributions.
Verification of probabilistic simulations
With regard to assessment of assessing the uncertainty analysis of simulation interval, Xiong et al. (2009) and Dong et al. (2013) presented multiple verification indices and applied in hydrologic practice. Three main metrics are selected to evaluate the simulation uncertainty intervals generated by the BMA, CBMA and CBP-BMA methods.
Containing ratio (CR)The containing ratio is utilized as a significant index for assessing the goodness of the uncertainty interval. It is defined as the percentage of observed data points that fall between the prediction bounds, directly reflecting the interval performance. is denoted as the lower bound corresponding to 5% quantile at time t, is denoted as the upper bound corresponding to 95% of the quantile. And is the number of the observed data points that satisfy the inequality conditions.
- Average bandwidth (B)
Average deviation amplitude (D)
RESULTS AND DISCUSSION
Different multi-model techniques, i.e., BMA, CBMA and CBP-BMA, are applied to combine the ensemble streamflow simulation. The structures of three hydrologic models ought to be determined as the deterministic results are crucial to the final uncertainty analysis. As mentioned above, the calibration parameters of the first BMA method are and . In the CBMA method, they are the parameters of marginal distributions, weights and the parameters of the pdf of copula. In the CBP-BMA method, the Monte Carlo sampling technique is also used to obtain the integral item.
Deterministic hydrologic model simulations
The genetic and simplex algorithms are used for model calibration on account of their flexibility and good convergence. The genetic algorithm can acquire the global optimal value independent of initial parameter values. The simplex algorithm is of high accuracy with low convergence rate. With the merit of two methods integrated, the approximately optimal values of model parameters are obtained. Three deterministic assessment indices, NSE, DRMS and KGE scores over the calibration period (1981–1987) and the validation period (1988–1990) are calculated for XAJ, HBV and SIMHYD models. Table 1 indicates that the XAJ model has the best results, the HBV model takes second place, and the SIMHYD model behaves worst among the three. The reason can be attributed to the dissimilar process for the calibration of each model (Nasonova et al. 2009). In practice, it might partially refer to inaccurate estimation of model parameters as one of error sources of the model structure, abstract formulation of physical processes, and different sources of forcing dataset for each model. In general, these simulation results can be used as the input data of multi-model ensemble in terms of the values of NSE and KGE, which are 85% and higher than 0.82, respectively, apart from the poor value of KGE of HBV.
|NSE (%)||DRMS||KGE (%)||NSE (%)||DRMS||KGE (%)|
|NSE (%)||DRMS||KGE (%)||NSE (%)||DRMS||KGE (%)|
Note: values in bold represent the optimal result.
Determination of the marginal distributions
The marginal distributions of the random variables of H and Si (i = 1, 2, 3) need to be determined (Li et al. 2017). Five common candidate distributions (in Table 2) have been fitted to the daily mean streamflow values as well as to the XAJ, HBV and SIMHYD model simulations.
|Gamma||x > 0|
|Pearson type III|
|Gamma||x > 0|
|Pearson type III|
With regard to random variable H, the parameters of five candidate distributions were estimated by the method of L-moment (Jonathan 1990; Chen et al. 2017; Li et al. 2017) and the parameter values are listed in Table 3. The K-S tests are used to verify the null hypothesis and the corresponding statistic DKS values are also listed in Table 3. It is shown that the null hypothesis could not be rejected at the 95% confidence level (threshold value , N is the number of sampling points) for Log Normal distributions with providing the minimum DKS value. Meanwhile, Figure 3 indicates the Log Normal is satisfactory on visual inspection and that the cumulative distribution function (cdf) plots of the theoretical Log Normal distributions fitted the empirical cdf values obtained from the Gringorten plotting-position formula relatively well. The estimation of marginal distributions for Si had similar procedures. The Kolmogorov-Smirnov statistics DKS indicate that the Log Normal distribution also gives the best fit in this study.
|Pearson type III||0.20||0.24||0.20||0.19|
|Pearson type III||0.20||0.24||0.20||0.19|
Note: values in bold denote that the distribution model passes the goodness-of-fit test at 0.05 significance level.
Archimedes copula selection and estimation
In the application of the CBMA and CBP-BMA methods, a copula function to link the cdf of observation and model simulations needs to be defined. Different families of copula have been proposed and described, such as Archimedes copula, T-copula, pair-copula. The Archimedes copula family is more desirable for hydrologic analyses, because most of the family can be easily constructed and some of the family can be applied to describe the positive correlation of variables exactly reflecting the relationship among observation and model simulations (Madadgar & Moradkhani 2014; Chen et al. 2015). In consequence, the Gumbel, Clayton and Frank copula belonging to Archimedes family are chosen to test for flexibility and universality.
For Archimedes copula, the Kendall correlation coefficient τi (i = 1, 2, 3) between observed and different simulated flows is firstly derived. The higher τi indicator reflects the stronger correlation of observation and model simulation. The corresponding copula parameter θi is calculated by the method based on the inversion of τi in Table 4. The parameter estimators and goodness-of-fit test (RMSE and AIC) are used to determine the best-fit copula for integrating the streamflow properties. The results illustrate that copulas have the good performance in exploring the associations of observed and simulated flows. All variables passed the null hypothesis for Gumbel and Frank copulas. Gumbel copula performs with the lowest RMSE and AIC values.
Deterministic assessment of three ensemble methods
Figure 4 displays the weight estimates of individual models. These weights range from 0.31 to 0.38, which reveals the three approximately occupy an equal proportion. We check the mean simulation of hydrologic multi-model ensembles using three criteria as shown in Equations (15)–(17). The effectiveness results of BMA, CBMA and CBP-BMA methods are listed in Table 1. The performances of different multi-models are better than that of the individual XAJ model in terms of NSE. The BMA method outperforms the reference model at the cost of DRMS and KGE indicators. The CBMA and CBP-BMA methods slightly improve in all aspects during the calibration period, which have excellent properties in the validation period. The reason of the CBMA and CBP-BMA methods enhancing the performance can be attributed to that copula functions are efficient tools to remove bias instead of a simple bias correction such as linear regression in the BMA method (Madadgar & Moradkhani 2014). Especially, copula has reliable parameter estimation prior model average procedure. Another reason might be owed to the weight of each individual model, which is directly influenced by the estimation of posterior distributions.
Figure 5 illustrates the bar plots of KGE score and its components. The KGE score might be a little descending through BMA or CBMA application, a little incremental through CBP-BMA application in comparison with the best XAJ model. The correlation coefficients between observation and simulation of individual models are up to 0.93 in the calibration period and 0.92 in the verification period, which represent stronger correlation for the values are more than 0.9. However, indicator of deterministic models varies from 0.64 for HBV model to 0.97 for XAJ model. The value less than 1 indicates total amount of streamflow simulation in any individual model is less than that of observation. It might cause the general underestimation of the mean streamflow (negative bias) in hydrological multi-model ensemble applications. The BMA method is such a promising method in locating simulation to observation for its term closer to 1. Regarding the variability ratio, all methods except for HBV could perfectly perform, but no particular method is superior to others with all .
Probabilistic verification of three ensemble methods
For probabilistic verification of simulation, Figures 6 and 7 describe the uncertainty bands of different methods for the representative year during calibration and verification periods with a visual inspection. These two plots indicate that the observed values approximately fall within the 5%–95% uncertainty range and fit the mean flow hydrograph for all multi-model ensembles. In this case, the 90% confidence interval could capture the flood peaks but miss more low flow values.
Three probabilistic verification measurements (containing ratio (CR), uncertainty (B), average deviation amplitude (D)) are presented in Table 5. It can be seen from these quantitative indices that they have a good performance in terms of containing ratio, which correspond to the confidence interval. The probability of observed value falling in the range should be in accord with the percentage of confidence interval containing points through many independent statistical experiments. The CBP-BMA method performs better than the CBMA method in terms of CR index, because it covers roughly 91% of the sample points, which is more than CBMA. A combination of CR and B possess the power to form a decision on model probabilistic performance. The comparison between the CBMA (and CBP-BMA) and BMA methods exactly illustrates that the CBMA method outperforms the BMA method, for either CR, B or D. In particular, the containing ratios of the CBP-BMA method in different periods are up to 91.17% and 91.33%, respectively. Referring to the smaller B result in the CBMA and CBP-BMA methods, the total predictive variance is reduced by relaxing the pdf generated by copula functions rather than the Gaussian posterior distribution via Box-Cox transformation. Since the between-model variance remains identical after using the same EM algorithm in all three methods, it is inferred that the reduction of within-model variance works.
|CR (%)||B||D||CR (%)||B||D|
|CR (%)||B||D||CR (%)||B||D|
Note: values in bold represent the optimal result.
The CBMA and CBP-BMA methods are two flexible and robust approaches to estimate uncertainty in terms of the optimal bandwidth and average deviation amplitude. They have an intuitive and simple structure conditional on several model simulations by the integration of BMA and copula tools, which makes this method promising to derive uncertainty. The difference between them reflected in the procedure of processing posterior distribution. Further improvement might be realized through the weight allocation for each model or the nonparametric posterior distribution.
Quantification of uncertainty in hydro-meteorological prediction is necessary for flood control and water resources management decision making. The proposed CBP-BMA method employed a Bayesian theory to modify the structure of the CBMA technique. The BMA, CBMA and CBP-BMA methods are implemented to evaluate uncertainties of hydrologic multi-model ensembles. The main findings are summarized as follows.
Deterministic results of different multi-model ensembles outperform those of the individual model. The CBMA and CBP-BMA methods slightly outperform BMA method in terms of NSE, DRMS and KGE. When the CBMA method is used as a reference, the CBP-BMA method can improve the NSE and KGE and enlarge DRMS values. Underestimation of all individual models may cause negative bias of ensemble multi-model.
The combination of containing ratio and bandwidth index demonstrates the probabilistic model performance with the auxiliary index-average deviation amplitude. It is found that containing ratio is approximately equal with the percentage of confidence interval. The CBMA or CBP-BMA methods outperform the BMA method in terms of evaluation criteria with a high containing ratio, small uncertainty, B, and average deviation amplitude, D.
The BMA method returns the predictive distribution of forecast variables as a weighted average of posterior distributions. The weights of the models involved are roughly defined proportional to the performance of each model. Integration of copula and BMA methods not only relaxes the assumption of posterior distribution, but simplifies the parameter estimation procedure. The results of this study are encouraging further integration of copula functions and Bayesian thought into hydro-meteorological applications where unknown conditional distributions require estimation.
This study is financially supported by the National Natural Science Foundation of China (51539009) and National Key Research and Development Project of China (2016YFC0402206).