Abstract
Although hydrological model forecasts aid water management decisions, they normally have predictive uncertainties. Generalized likelihood uncertainty estimation (GLUE) is popular for constructing predictive uncertainty bounds (PUBs). GLUE is based on simple Monte Carlo sampling (SMCS), a technique known to be ineffective in establishing behavioural simulations. This study introduced randomized block quasi-Monte Carlo sampling (RBMC). In RBMC, each parameter's range is divided into a stipulated number of sub-blocks (Snb). Parameters' values are separately generated in each sub-block. Finally, the sub-blocks are shuffled while maintaining the sequence of generated values in each sub-block. When Snb is equal to the number of simulations, RBMC reduces to SMCS. Otherwise, each Snb leads to a separate RBMC configuration or sampling scheme. The number of RBMC-based behavioural solutions was often found to be greater than that of SMCS, in some cases, by up to 33.6%. The width of the 90% confidence interval on 95th percentile flow based on SMCS was often larger than those of RBMC, sometimes by up to 23.4%. PUBs were found to vary in widths among RBMC configurations, thereby revealing the influence of the choice of a sampling scheme. Thus, GLUE based on RBMC is recommended to take into account the said influence.
HIGHLIGHTS
GLUE uses simple Monte Carlo sampling (SMCS) which is ineffective in establishing behavioral simulations.
This study introduced randomized block quasi–Monte Carlo Sampling (RBMC) for GLUE and SMCS becomes one of the various RBMC configurations.
RBMC improved the number of retained solutions by up to 33.6% in some cases.
RBMC improved the width of 90% confidence interval on a flow event by up to 23.4%.
RBMC takes into account the influence of the choice of a sampling scheme as a sub-source of calibration uncertainty.
INTRODUCTION
The indispensability of modelling as a tool for understanding different hydrological processes obligates scientific researchers to continually focus on ways of understanding and/or reducing model uncertainties. Uncertainty can be epistemic (Der Kiureghian & Ditlevsen 2009; Beven 2016; Nearing et al. 2016; Gupta & Govindaraju 2023) or aleatoric (Hora 1996; Gong et al. 2013; Nearing et al. 2016). Epistemic uncertainty can be linked to the model parameters and comprises a modeller's ignorance of his or her choice of the best-performing model. Aleatoric uncertainty is due to the noise inherent in the observations. To reduce epistemic (or reducible) uncertainty, a modeller has to make use of additional data (Nearing et al. 2016; Zhou et al. 2022) for training or calibration. Aleatoric uncertainty can be reduced by enhancing the precision and accuracy of measuring equipment. Aleatoric and epistemic uncertainties combine to yield predictive uncertainty in the model output (Beven & Binley 1992; Gupta & Govindaraju 2023). Although predictive uncertainty reduction does not mean simplification of decision-making, it is to increase the ‘trustworthiness’ of the inferences made on forecasts from scenario results. Apart from model structure, parameters, and input data, calibration comprises an important area on which a modeller can focus to reduce model uncertainty (Beven & Binley 1992; Kavetski et al. 2006; Zhang et al. 2009). Calibration entails changing the values of a model's parameters to guarantee the minimal mismatch between the observed and modelled variables. In other words, calibration is performed to minimize parameter estimation uncertainties (Eckhardt et al. 2005) and maximize a model's reliability.
Changing a model's parameters can be done manually or automatically. The idea behind manual calibration is that when certain parameters are adjusted (for instance, by increasing their magnitudes), some predictable changes in the modelled outputs can be obtained. When several interacting parameters are adjusted, the changes in the outputs can be unpredictable (Gupta et al. 1999). Furthermore, manual calibration tends to be time-consuming and requires expert knowledge of the modeller (Boyle et al. 2000). For complex models with many parameters, manual calibration can be inefficient and frustrating. In fact, a strict adherence to model calibration using manual procedure would inhibit the widespread use of complex and sophisticated models (Gupta et al. 1999). To overcome the challenges of changing parameters manually, automation of the model calibration started way back in the mid-1960s (see for instance, Dawdy & O'Donnell 1965). Automatic changing of parameters involves applications of mathematical and statistical approaches to minimize the model residuals based on mainly optimization functions. Even in the automatic calibration, it remains unlikely, given the various model uncertainties, that a modeller can obtain only one set of optimal parameters (Qi et al. 2019). This notion led to equifinality (Beven & Binley 1992; Beven & Freer 2001), a concept which recognizes the fact that there can be several sets of model parameters to yield a highly comparable model performance. The recognition and acceptance of equifinality steered the development of the generalized likelihood uncertainty estimation (GLUE) framework by Beven & Binley (1992).
GLUE is the Bayesian approach (which is, actually, an automatic calibration strategy) and consists of (i) stipulation of each parameter's upper and lower limits, (ii) randomization of many (e.g. 10,000) sets of model parameters from the prior distribution, and (iii) inferring posterior distribution using simulations. Apart from the GLUE framework, several approaches exist for establishing bounds of uncertainty on model predictions such as sequential data assimilation (Moradkhani et al. 2005; Vrugt et al. 2005), multi-model averaging methods (Georgekakos et al. 2004; Vrugt & Robinson 2007), classical Bayesian (Kuczera & Parent 1998; Thiemann et al. 2001), and pseudo-Bayesian (Freer et al. 1996). Other uncertainty analysis methods include the Bayesian total error analysis (Kavetski et al. 2003), parameter estimation code (PEST) (Doherty 2010), multi-objective analysis (Hadka & Reed 2013), and differential evolution adaptive metropolis (DREAM) (Vrugt 2016). While these methods have different underlying assumptions, each of them also has its own advantages and disadvantages. Nevertheless, among the various uncertainty analysis methods, GLUE is very popular due to its conceptual simplicity, ease of implementation, and capacity to handle various error structures and models (Blasone et al. 2008).
It can be argued that the randomization of parameter values in the GLUE framework with regard to Monte Carlo analysis is mainly ineffective in establishing behavioural simulations (Blasone et al. 2008). This problem can be compounded by the substantial computational time required to obtain the stipulated behavioural modelled series from complex models, and the hardship in dealing with high-dimensional parameter estimation problems (Blasone et al. 2008). Furthermore, the use of less formal likelihood by GLUE can lead to very flat posterior distributions (Mantovan & Todini 2006; Stedinger et al. 2008; Liu et al. 2022). To obtain better posterior distributions, one would think of other Markov chain Monte Carlo (MCMC) methods. Actually, through the use of simple Monte Carlo random sampling (hereinafter denoted as RND), GLUE does not take into account the influence of the choice of parameter sampling technique on calibration results and this is the gap on which this paper focused.
This paper introduced randomized block quasi-Monte Carlo sampling (RBMC) while investigating its suitability to support the GLUE framework. The RBMC starts by dividing each parameter range into a stipulated number of intervals or sub-blocks. The parameter's values are separately generated in each interval. The final step consists of shuffling the sub-blocks while the sequence of the generated values in each sub-block is not affected. In this way, RND is nested within the RBMC in terms of the chosen number of sub-blocks.
MATERIALS AND METHODS
RBMC sampling
Monte Carlo sampling relies on a pseudo-random generator with the underlying concept of using randomness to solve tasks which may, in principle, comprise deterministic problems. An important note is that the approach of using a pseudo-random generator leads to values which are chaotic or extremely in disarray. Thus, the generated random numbers are not equidistant and they have uneven differences in their magnitudes. In a quasi-Monte Carlo sampling, we can use the sequence of low discrepancy to bring about a faster rate of convergence. A number of low-discrepancy sequences exist such as the Faure sequence, Halton sequence, and Sobol sequence. A key challenge of low-discrepancy sequences is that they are deterministic and not random and this means that quasi-Monte Carlo sampling can be considered a de-randomized or deterministic approach. De-randomization in the quasi-Monte Carlo sampling leads to error bound which makes the estimation of the error hard. Since we want a method which can allow us to estimate the variance, randomization becomes a plausible technique to modify the quasi-Monte Carlo sampling into the randomized quasi-Monte Carlo sampling. One technique is the shuffling of sub-blocks of the generated values. Eventually, this study introduced RBMC.
For the initial parameter (or when , the first , second , third , and fourth sub-blocks are bounded by , , , and , respectively. It should be apparent that and . The next step entails determining the number of values to be generated from each sub-block of every parameter.
Let us assume that in our case, b = 4 and such that we want to generate two (i.e. 8/4 = 2) values from each sub-block (as illustrated in Figure 1). Based on the initial model parameter (θ1), we have , , and where and . In other words, we separately apply Equation (1) to each sub-block of every parameter's values. The last step entails separately shuffling the sub-blocks of the values of each parameter. Here, the shuffling process should be carefully done to ensure that only the sub-blocks are shuffled while the sequence of generated values in each sub-block is not altered. At this point, the total number of generated values for each parameter should be nsim and nwa for the cases where mod(nsim, b) = 0 and mod(nsim, b) > 0, respectively. For the first case (or when mod(nsim, b) = 0), we finally consider all the nsim generated values of each parameter for model calibration. In the second case (or when mod(nsim, b) > 0), only the first nsim (from the total nwa) generated values of each parameter are considered for model calibration or simulation. For 1 ≤ k ≤ nsim, a complete kth set of the model parameters is obtained by taking every kth value of each parameter. Therefore, a model can be run nsim times to yield nsim modelled series.
An important note is that there are two special cases when RBMC is the same as RND. The first case is when the value of the term b is set to one (1), meaning that the value of each parameter is generated once. This first case can rarely be applicable for simulation analysis given the need for large required to achieve realistic predictions in the context of Monte Carlo analysis. The second case is when the term b is set to (given that is far larger than 1) indicating that one value of a given parameter is generated in each of the nsim sub-blocks of every parameter's values. In other words, the number of every parameter's generated values becomes equal to nsim. This means that RBMC is a composite Monte Carlo sampling method which comprises RND as one of its configurations.
Existing schemes for comparison with RBMC
Latin hypercube sampling
Latin square is a square grid with sampling points such that each row and every column include one point. If this concept is extended to large numbers of dimensions such that each axis-aligned hyperplane consists of one point, it means we are dealing a with a Latin hypercube. To generate v sample values, we divide the total area under the pdf in to v equal areas. From each area, one random value is generated. Considering all the areas, we obtain a sample of v values. For instance, let us consider that we are dealing with a uniform distribution whose pdf with domain is to be divided into five equal areas. We divide the domain to obtain and . We make use of the analytical inversion method to generate one point from each interval. Thus, the interval is split into v equal portions and from each portion, a random variable is generated to obtain . The generated values become based on
Stratified Monte Carlo random sampling
Consider that a certain random variable has pdf and CDF denoted by and , respectively. For a stratified random sampling, we divide the range 0–1 into a finite number of intervals which can be of the same or equal width (such as 0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1.0 if there should be five intervals). Over each interval, we can generate the same number of deviates 's and the associated 's. The idea is that the procedure should lead to realizations of Z which are nearly evenly spread over the range with the advantage of achieving realistic estimates of using few realizations. One may realize that Latin hypercube sampling (LHS) is comparable with the stratified random sampling. However, an important note is that if we draw v samples with LHS, we will have v equal sub-space, while stratified random sampling can have w sub-space and in each sub-space samples will be drawn resulting into a total of v values or
Case study
Selected data and models
Quality-controlled daily precipitation and potential evapotranspiration (PET) over the River Mpanga catchment with an area of 1,484 km2 in Uganda were adopted from Onyutha et al. (2021). To allow comparison of the new and existing methods, two lumped conceptual hydrological models were applied. The first model was the VHM (Willems 2014) and it makes use of catchment-wide average of precipitation and PET as time-variable model inputs to generate runoff as a function of the soil moisture storage. The model splits runoff into overland flow, interflow, and slow flow and these components are separately routed and later combined into the modelled flow. The other model was the Nedbør–Afstrømnings model (NAM) (Nielsen & Hansen 1973). Like the VHM, NAM also uses lumped (or catchment-wide averaged) precipitation and PET as the meteorological model inputs. In NAM, surface storage is obtained as a function of the precipitation and PET. Quick flow and slow flow components are generated from the surface storage and separately routed and the resultant outputs are combined into the modelled flow from NAM.
The choice of the number of parameter sub-blocks
The choice of the term b in Equation (2) can be made on a case-by-case basis. For a high-dimensional parameter space, the term b could be set to range from 2 up to, say, . To consider the uncertainty of the choice of parameter sampling scheme on model output, can be greater than 10. However, since this study focused on testing the acceptability of RBMC for GLUE, was set to 10. Thus, qMC2, qMC3, … , qMC10, were considered.
Number of retained modelled series
Values of both NAM and VHM parameters were generated using RND, LHS, and nine configurations of the introduced method from qMC2 to qMC10. In other words, there were a total of 11 sampling schemes considered in this study. In the next step, and of each parameter were specified as shown in Appendix A. The number of simulations () was at first set to 1,000. In other words, a total of values of each parameter were generated using every sampling scheme. For a particular sampling scheme, each model was run times. This procedure was repeated with various values of varying from 10,000 to 90,000 at an interval of 10,000.
Comparison of uncertainty bounds of the various sampling schemes
To construct the confidence interval (CI), α = 0.10 was chosen. Normally, 95% CI (or the use of α = 0.05) is common in the application of GLUE. However, GLUE has a limitation such that it is difficult to ensure many observations fall within the 95% CI especially based on available formulations (Blasone et al. 2008). Therefore, the use of α = 0.10 was preferred to α = 0.05.
The initial value of was set to 1,000 and each model was calibrated using the sets of parameters' values generated by each of the selected sampling schemes. Two quantiles were considered for analysis including the 95th and 2nd percentile modelled flow events. The uncertainty bounds on each selected flow quantile were obtained as the difference between the upper and lower limits (or width) of the 90% CI. This procedure was repeated with varying from 10,000 to 90,000 at an interval of 10,000.
RESULTS
Likelihood
Figure 3 shows the variation of likelihood with the term N. The plots in Figure 3 were based on values in Table 1 for nsim equal to 90,000. The vertical axis of each of the plots in Figure 3(a) and 3(b) was made on a logarithmic scale for clarity. For all the values of nsim (though only shown for 90,000 in Figure 3 for illustration), the likelihood value decreased with an increase in N and this was due to the exponential effect of N. For NAM, some values were above and others below that of RND (Figure 3(a)). However, the likelihood of RND based on VHM was systematically below the values of other sampling schemes (Figure 3(b)). The likelihood of the best simulation was obtained with the term N equal to 1 (Figure 3(c) and 3(d)). Generally, the likelihood tended to increase with increasing nsim. For a particular nsim, there were a number of RBMC configurations which yielded higher likelihood values than that of RND and this was for both models (Figure 3(c) and 3(d)). Considering results from both models, there was no sampling scheme which consistently yielded the highest (or best) likelihood value for the various values of nsim and N. This demonstrates the uncertainty in calibration due to the choice of the sampling scheme. To quantify such an uncertainty, results from a large array of sampling schemes are required. Thus, the concept of having several configurations of RBMC, as demonstrated in this study, offers an important step for insights on quantifying influence from the said sub-source of calibration-related uncertainty.
Sampling scheme . | N . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
1 . | 5 . | 10 . | 15 . | 20 . | 1 . | 5 . | 10 . | 15 . | 20 . | |
VHM . | NAM . | |||||||||
RND | 0.7194 | 0.1927 | 0.0371 | 0.0072 | 0.0014 | 0.6800 | 0.1967 | 0.0387 | 0.0076 | 0.0011 |
LHS | 0.7166 | 0.1890 | 0.0357 | 0.0068 | 0.0013 | 0.6967 | 0.1641 | 0.0269 | 0.0044 | 0.0008 |
qMC2 | 0.7306 | 0.2082 | 0.0433 | 0.0090 | 0.0019 | 0.7248 | 0.2000 | 0.0400 | 0.0080 | 0.0016 |
qMC3 | 0.7462 | 0.2313 | 0.0535 | 0.0124 | 0.0029 | 0.7278 | 0.2042 | 0.0417 | 0.0085 | 0.0017 |
qMC4 | 0.7772 | 0.2837 | 0.0805 | 0.0228 | 0.0065 | 0.7210 | 0.1949 | 0.0380 | 0.0074 | 0.0014 |
qMC5 | 0.7351 | 0.2147 | 0.0461 | 0.0099 | 0.0021 | 0.7427 | 0.2260 | 0.0511 | 0.0115 | 0.0026 |
qMC6 | 0.8111 | 0.3511 | 0.1233 | 0.0433 | 0.0152 | 0.7074 | 0.1772 | 0.0314 | 0.0056 | 0.0010 |
qMC7 | 0.7676 | 0.2666 | 0.0711 | 0.0189 | 0.0050 | 0.7526 | 0.2415 | 0.0583 | 0.0141 | 0.0034 |
qMC8 | 0.7942 | 0.3160 | 0.0998 | 0.0315 | 0.0100 | 0.7194 | 0.1927 | 0.0371 | 0.0072 | 0.0014 |
qMC9 | 0.7236 | 0.1984 | 0.0394 | 0.0078 | 0.0015 | 0.7242 | 0.1991 | 0.0397 | 0.0079 | 0.0016 |
qMC10 | 0.7408 | 0.2231 | 0.0498 | 0.0111 | 0.0025 | 0.7147 | 0.1865 | 0.0348 | 0.0065 | 0.0012 |
Sampling scheme . | N . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
1 . | 5 . | 10 . | 15 . | 20 . | 1 . | 5 . | 10 . | 15 . | 20 . | |
VHM . | NAM . | |||||||||
RND | 0.7194 | 0.1927 | 0.0371 | 0.0072 | 0.0014 | 0.6800 | 0.1967 | 0.0387 | 0.0076 | 0.0011 |
LHS | 0.7166 | 0.1890 | 0.0357 | 0.0068 | 0.0013 | 0.6967 | 0.1641 | 0.0269 | 0.0044 | 0.0008 |
qMC2 | 0.7306 | 0.2082 | 0.0433 | 0.0090 | 0.0019 | 0.7248 | 0.2000 | 0.0400 | 0.0080 | 0.0016 |
qMC3 | 0.7462 | 0.2313 | 0.0535 | 0.0124 | 0.0029 | 0.7278 | 0.2042 | 0.0417 | 0.0085 | 0.0017 |
qMC4 | 0.7772 | 0.2837 | 0.0805 | 0.0228 | 0.0065 | 0.7210 | 0.1949 | 0.0380 | 0.0074 | 0.0014 |
qMC5 | 0.7351 | 0.2147 | 0.0461 | 0.0099 | 0.0021 | 0.7427 | 0.2260 | 0.0511 | 0.0115 | 0.0026 |
qMC6 | 0.8111 | 0.3511 | 0.1233 | 0.0433 | 0.0152 | 0.7074 | 0.1772 | 0.0314 | 0.0056 | 0.0010 |
qMC7 | 0.7676 | 0.2666 | 0.0711 | 0.0189 | 0.0050 | 0.7526 | 0.2415 | 0.0583 | 0.0141 | 0.0034 |
qMC8 | 0.7942 | 0.3160 | 0.0998 | 0.0315 | 0.0100 | 0.7194 | 0.1927 | 0.0371 | 0.0072 | 0.0014 |
qMC9 | 0.7236 | 0.1984 | 0.0394 | 0.0078 | 0.0015 | 0.7242 | 0.1991 | 0.0397 | 0.0079 | 0.0016 |
qMC10 | 0.7408 | 0.2231 | 0.0498 | 0.0111 | 0.0025 | 0.7147 | 0.1865 | 0.0348 | 0.0065 | 0.0012 |
Number of retained modelled series
For certain values of nsim, e.g. when nsim was 50,000 and 20,000 considering NAM (Figure 4(a)) and VHM (Figure 4(b)), respectively), the numbers of retained solutions from RND were less than those from other sampling schemes. Nevertheless, the numbers of behavioural solutions generally tended to fluctuate around those for RND. Values of RB are summarized in Table 2. The smaller the RB, the more comparable the numbers of series retained based on the two sampling schemes being considered. The larger the number of retained series, the better the sampling scheme. Thus, positive RB indicates better performance of a given sampling scheme than that of RND. The maximum improvement in NAM was 33.6% (qMC4) followed by 22.4 (qMC7). However, improvements in VHM went up to 57.6 (qMC9) followed by 32.9 (qMC7). This demonstrates the adequacy of the RBMC for GLUE. Importantly, the values of RB were both positive and negative even for particular nsim. This further reinforced the notion that the use of a single sampling scheme such as the well-known RND of LHS for GLUE comprises an uncertainty due to the choice of the parameter sampling method.
nsim × 104 . | LHS . | qMC2 . | qMC3 . | qMC4 . | qMC5 . | qMC6 . | qMC7 . | qMC8 . | qMC9 . | qMC10 . |
---|---|---|---|---|---|---|---|---|---|---|
NAM | ||||||||||
3 | − 13.5 | − 9.0 | − 3.3 | − 10.5 | − 12.3 | − 1.8 | 4.5 | − 16.8 | 1.2 | 12.6 |
4 | 8.7 | 13.5 | 16.0 | 11.2 | − 1.0 | 14.8 | 22.4 | 2.5 | − 2.3 | 20.6 |
5 | 10.4 | 13.2 | 16.2 | 33.6 | 21.7 | 6.0 | 20.0 | 1.7 | 11.7 | 8.9 |
6 | 4.9 | 0.2 | − 3.0 | 2.5 | 1.3 | 2.5 | − 0.9 | 2.7 | − 7.7 | − 9.0 |
7 | 3.0 | 10.4 | − 8.6 | − 1.1 | − 3.8 | 1.2 | 2.3 | 4.6 | 2.5 | 4.0 |
8 | 0.7 | 0.7 | − 6.3 | 1.9 | 0.9 | − 1.2 | 3.9 | 3.2 | 0.5 | − 9.3 |
9 | 1.4 | 14.3 | − 4.7 | 7.0 | 2.7 | − 1.7 | 5.4 | 10.3 | 1.8 | 1.2 |
VHM | ||||||||||
3 | 31.8 | 14.1 | 28.2 | − 4.7 | 9.4 | 14.1 | 32.9 | 27.1 | 57.6 | 14.1 |
4 | 4.5 | 21.1 | − 0.8 | 21.1 | 3.0 | 3.0 | − 13.5 | 17.3 | − 8.3 | − 7.5 |
5 | 3.5 | 0.6 | 4.7 | − 2.9 | 10.0 | 10.0 | − 11.2 | 4.7 | 28.8 | − 0.6 |
6 | 15.3 | 12.2 | 5.1 | 6.6 | 4.1 | − 15.8 | 9.7 | 3.1 | 12.8 | − 8.7 |
7 | − 6.6 | − 2.9 | 2.1 | − 3.3 | 5.8 | − 0.8 | 0.8 | 9.5 | 0.8 | 12.8 |
8 | − 8.5 | 19.6 | 15.0 | 3.8 | − 5.8 | 27.3 | 28.1 | 9.6 | 12.3 | 16.9 |
9 | 7.5 | 10.5 | 5.6 | 3.9 | 2.6 | 12.1 | 6.9 | 17.0 | − 16.4 | 2.3 |
nsim × 104 . | LHS . | qMC2 . | qMC3 . | qMC4 . | qMC5 . | qMC6 . | qMC7 . | qMC8 . | qMC9 . | qMC10 . |
---|---|---|---|---|---|---|---|---|---|---|
NAM | ||||||||||
3 | − 13.5 | − 9.0 | − 3.3 | − 10.5 | − 12.3 | − 1.8 | 4.5 | − 16.8 | 1.2 | 12.6 |
4 | 8.7 | 13.5 | 16.0 | 11.2 | − 1.0 | 14.8 | 22.4 | 2.5 | − 2.3 | 20.6 |
5 | 10.4 | 13.2 | 16.2 | 33.6 | 21.7 | 6.0 | 20.0 | 1.7 | 11.7 | 8.9 |
6 | 4.9 | 0.2 | − 3.0 | 2.5 | 1.3 | 2.5 | − 0.9 | 2.7 | − 7.7 | − 9.0 |
7 | 3.0 | 10.4 | − 8.6 | − 1.1 | − 3.8 | 1.2 | 2.3 | 4.6 | 2.5 | 4.0 |
8 | 0.7 | 0.7 | − 6.3 | 1.9 | 0.9 | − 1.2 | 3.9 | 3.2 | 0.5 | − 9.3 |
9 | 1.4 | 14.3 | − 4.7 | 7.0 | 2.7 | − 1.7 | 5.4 | 10.3 | 1.8 | 1.2 |
VHM | ||||||||||
3 | 31.8 | 14.1 | 28.2 | − 4.7 | 9.4 | 14.1 | 32.9 | 27.1 | 57.6 | 14.1 |
4 | 4.5 | 21.1 | − 0.8 | 21.1 | 3.0 | 3.0 | − 13.5 | 17.3 | − 8.3 | − 7.5 |
5 | 3.5 | 0.6 | 4.7 | − 2.9 | 10.0 | 10.0 | − 11.2 | 4.7 | 28.8 | − 0.6 |
6 | 15.3 | 12.2 | 5.1 | 6.6 | 4.1 | − 15.8 | 9.7 | 3.1 | 12.8 | − 8.7 |
7 | − 6.6 | − 2.9 | 2.1 | − 3.3 | 5.8 | − 0.8 | 0.8 | 9.5 | 0.8 | 12.8 |
8 | − 8.5 | 19.6 | 15.0 | 3.8 | − 5.8 | 27.3 | 28.1 | 9.6 | 12.3 | 16.9 |
9 | 7.5 | 10.5 | 5.6 | 3.9 | 2.6 | 12.1 | 6.9 | 17.0 | − 16.4 | 2.3 |
Uncertainty bounds
DISCUSSION
The likelihood value for the best simulation based on each RBMC configuration was shown to increase with the increasing nsim. Similar results were also obtained with RND and LHS. It is worth noting that GLUE uses the ‘less formal likelihoods’. Several studies (Mantovan & Todini 2006; Stedinger et al. 2008; Liu et al. 2022) argued that GLUE's use of informal likelihood functions (or soft rules) to determine the behavioural parameters normally results in very flat posterior distributions. Specifically, Mantovan & Todini (2006) demonstrated the incoherence of GLUE with Bayesian inference using a pseudo-Bayes experiment. They showed that ‘less formal likelihoods’ failed to add information to the conditioning process especially under the circumstance of gradually increasing the number of observations. Beven et al. (2007), in their response, clarified that GLUE was developed to be applied to actual calibration issues comprising errors from both model inputs and model structures. Furthermore, Beven et al. (2007) concluded that the demonstration by Mantovan & Todini (2006) of the incoherence of GLUE regarding the requirement that including every new observation should add information to the conditioning process could not hold.
The purpose of this paper was not to refute the argument that other MCMC methods could be more efficient and effective in obtaining a better posterior distribution of parameters than that based on GLUE. In fact, GLUE can be considered to be an extension of the Bayesian averaging approach to a less formal likelihood (Beven et al. 2000). The ‘less formal likelihood’ comprises the key aspect of the differentiation in the Bayesian inference thereby allowing for flexibility in the definition of the likelihood function to eliminate the need for strong assumptions on the error model (Jin et al. 2010). In the case when all the assumptions are satisfied, the use of the formal Bayesian technique becomes more acceptable given its linkage to its classical statistical theory and application of formal mathematical procedure and MCMC simulation for inferring parameter and model prediction distributions (Vrugt et al. 2009). It is worth noting that a direct comparison of GLUE and the formal Bayesian method is generally difficult for various reasons. Firstly, the formal Bayesian method's focus on unraveling the effects of errors due to model inputs, outputs, and model structures complicates statistical inference (Vrugt et al. 2009). On the other hand, GLUE does not consider separating these effects on the total uncertainty. Secondly, the formal Bayesian method makes use of an exact (or formal) likelihood function (assumed or transformed from an unknown form) to estimate the prediction precision of one-step ahead forecasting (Vrugt et al. 2009; Jin et al. 2010). However, GLUE applies an informal likelihood function to estimate prediction precision (or CI) on simulated variables (such as river flow in this case).
It is known that the sources of the total predictive uncertainty are numerous especially due to inputs (Renard et al. 2010; McMillan et al. 2018), calibration (Beven 2006), and model parameters or structure (Beven 1989; Butts et al. 2004; Renard et al. 2010). Attempts to reduce the total predictive uncertainty can be made by tackling the various sources of uncertainties. Beyond the calibration sphere, uncertainty due to the model structure becomes substantially dominant in the predictive uncertainty (Højberg & Refsgaard 2005; Rojas et al. 2008; Troin et al. 2018). Some sub-sources of calibration-related uncertainty include the choice of (i) a calibration method, (ii) an objective function, and (iii) an optimization approach. This study found that results (in terms of widths of CI, or number of behavioural solutions) of some RBMC configurations were below while others lied above those for RND. This demonstrated the influence from the choice of parameter sampling method as a sub-source of calibration uncertainty. Application of an array of the RBMC configurations presents a creditable capacity to quantify the total predictive uncertainty while offering the opportunity to understand the influence of the choice of a parameter sampling method. This property is not possessed by LHS or RND. Furthermore, the application of RND as a single parameter generation scheme conventionally used for GLUE makes it a difficult task to quantify the uncertainty due to the choice of sampling scheme. Other issues of the conventional sampling techniques especially RND and LHS are known. For instance, LHS assumes independence among the parameters (Petelet et al. 2010) and when some model parameters are dependent, the use of LHS in a GLUE framework can impact the number of behavioural solutions. Furthermore, the pseudo-random sampling in LHS requires several samples to be accurate (Huntington & Lyrintzis 1998). The advantage of RND normally used in the conventional GLUE framework is that it is simple and faster to generate parameter values (Huntington & Lyrintzis 1998). However, RND has the limitation of tackling problems which require high-dimensional parameter estimation (Blasone et al. 2008). Given the composite nature of the introduced RBMC such that it comprises RND as one of its configurations, this paper puts forth the proposition for researchers to adopt RBMC for running the GLUE framework.
An important step in adopting RBMC for the GLUE framework is to decide on the number of RBMC configurations for application. In this study, only 10 configurations were used to demonstrate the acceptability of the introduced method. However, the number of RBMC configurations in the application of GLUE can be far larger than 10. The larger the number of RBMC configurations used, the better the uncertainty analysis. For each RBMC configuration, a separate set of uncertainty limits and modelled series can be derived. A combination of modelled series from the various RBMC configurations yields a single model ensemble. Various methods for obtaining model ensembles exist and they date back to the 1970s (Twedt et al. 1977) among which we have simple arithmetic averaging, and weighted mean approach (Baker & Ellison 2008).
CONCLUSIONS
Many models in hydrology are complex and make use of non-linear equations in their structures. Using analytical techniques to quantify the uncertainty in such models is a difficult task. GLUE is a calibration strategy which relies on RND. The use of RND reduces the effectiveness of GLUE in establishing behavioural solutions. This study introduced RBMC and investigated its use for GLUE. In the first step of the RBMC, the upper and lower limits of each parameter are stipulated. The next step consists of deciding on the number of intervals into which the full range of each parameter is divided. Over each interval, values of every parameter are separately generated. Here, the RND approach of generating parameter values can be used. Lastly, the various sub-blocks of the generated values of a given parameter are shuffled. During the shuffling process, the sequence of the generated values in each interval is ensured not to be affected.
The number of behavioural solutions based on RBMC was larger than that of RND, in some cases, by up to 33.6%. The widths of 90% CI on 95th percentile flow based on some RBMC configurations were smaller than those of RND by up to 23.4%. For a selected nsim in the range 1,000–90,000, the numbers of behavioural solutions from RBMC fluctuated around those from RND. Similarly, the widths of 90% CI on a selected flow quantile were below and above that from RND. These findings revealed that the choice of a sampling scheme is a sub-source of calibration uncertainty. The use of RND or LHS as a single sampling scheme for GLUE is insufficient to support the quantification of the aforementioned sub-source of calibration uncertainty. In this line, the introduced method presents the key advantage that RND becomes nested within the RBMC approach. Furthermore, the introduced method takes into account the influence of the choice of a sampling scheme by offering an opportunity to select a particular number of RBMC configurations. Thus, RBMC given its demonstrated potential for uncertainty quantification is proposed to be adopted for GLUE instead of RND.
MATLAB codes for RBMC implemented in a simplified manner for calibrating HMSV lumped conceptual model (Onyutha, 2019) can be downloaded along with example modelling datasets via https://doi.org/10.5281/zenodo.10702810 for illustration purpose. The following procedure is recommended for constructing uncertainty bounds on predictions from a model using RBMC while taking into account the influence of the choice of a sampling scheme on calibration results:
- (a)
The upper and lower limits of each model parameter are stipulated, and the term b is also specified.
- (b)
is set to a large number (e.g. 100,000) and as guided under section 2.1, relevant number parameter values are generated in each interval or sub-block of a given parameter with respect to nsim, nwa and the term b stipulated in step (a). Here, a pseudo-random generator can be used to separately generate parameter values in each interval.
- (c)
The total number of behavioural solutions () is also stipulated, for instance, = 2,000.
- (d)
Threshold of the chosen objective function to generate behavioural solutions is specified (e.g. NSE = 0.6).
- (e)
The number of RBMC configurations () is chosen (e.g. = 20). This means that RBMC is varied using qMC(j) where j = 2, 3, … .., .
- (f)
By setting j in qMC(j) to 2 (or using qMC2), the model is set to run times using a while loop. This means that the while loop is terminated when the number of behavioural solutions is equal to stipulated in step (c).
- (g)
For each of the remaining RBMC configurations (or from j in qMC(j) set to 3, 4 … ., ), step (f) is repeated. This leads to sets of behavioural solutions each of size equal to .
- (h)
For each RBMC configuration, the series with the best value of the objective function (for instance, the highest NSE) is selected as the best-modelled series. The ensemble mean is obtained by averaging the best-modelled series based on all the considered RBMC configurations.
- (i)
It is worth noting that for each RBMC configuration, there are modelled values in an attempt to reproduce each observed flow event. From these modelled values, the (100 − α)% CI on the modelled flow event under consideration is obtained as th and th values, respectively. For all the selected RBMC configurations, it means that there are values of the upper limit of CI on a particular flow event. Similarly, there are values of the lower limit of CI on a flow value. To obtain the ensemble CI, the values of the upper limit of CI on a particular flow event are averaged. Similarly, the values of the lower limit of CI on a flow value are averaged.
ACKNOWLEDGEMENTS
The author acknowledges that this study made use of modelling data from Onyutha et al. (2021) which merged with reanalysis data published by Kobayashi et al. (2015), Funk et al. (2015), and Saha et al. (2014).
FUNDING
This research received no external funding.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information. Simplified MATLAB-based algorithm for RBMC sampling along with demonstration modelling datasets can be found via https://doi.org/10.5281/zenodo.10702810.
CONFLICT OF INTEREST
The authors declare there is no conflict.