Among the several hydrological model uncertainty estimation methods, the generalized likelihood uncertainty estimation (GLUE) method is popular due to its simplicity. The application of GLUE tends to be limited to cases in which hydrological models are applied individually; notably, little attention is given to model differences when applying GLUE. This study introduced a framework for multi-hydrological model ensemble prediction uncertainty estimation (e-PRUNE). For demonstration, the framework was applied to real hydrometeorological data while considering three sub-sources of calibration-related uncertainty: the choice of a sampling scheme (SC), hydrological model (HM), and objective function (OF). Ten SCs, six HMs, and eight OFs were considered. The influences of the SCs, OFs, and HMs were combined and assumed to substantially comprise the overall predictive uncertainty (OPU). The sub-uncertainty bound based on the choice of HM was larger than that based on the selection of either SC or OF. The contributions of the sub-uncertainties from HM, SC, and OF to the OPU were additive; thus, the effect of removing one source of uncertainty (for instance, OF) could easily be seen in the width of the OPU bounds. This study showed the importance of the e-PRUNE framework for gaining insight into the contributions of various sub-uncertainty sources to the OPU.

  • GLUE tends to be limited to cases when models are applied individually.

  • Model differences are apparent, yet they are not considered when applying GLUE.

  • A multi-model ensemble prediction uncertainty estimation (e-PRUNE) framework is introduced.

  • The potential of the framework was demonstrated using three uncertainty sources.

  • The framework gives insights into the contributions of various uncertainty sources.

To aid predictive planning and decision-making for local- or regional-scale management of hydrological systems under ongoing global environmental challenges, the application of hydrological models (HMs) is on the increase. There are various sources of uncertainty in hydrological modelling, including model structures, parameters, model inputs, observed data, calibration data, and calibration approaches (Renard et al. 2010; Blöschl et al. 2019; Moges et al. 2020). Uncertainties related to calibration stem from various sources, such as errors in observations. Because the initial parameter values are often not optimal, mismatches between observed and modelled series tend to be large. Calibration, or the adjustment of parameter values to reduce the said mismatches and thereby increase the credibility of a model's predictions, can be made manually or automatically. Here, it is worth noting that there are several sub-sources of calibration uncertainty, including observation errors, the choice of a calibration method, the selection of an objective function (OF) (Onyutha 2024a), and opting for a particular sampling scheme (SC) (Onyutha 2024b). Apart from calibration uncertainty, other model uncertainty sources are linked with parameters, model structures, and model inputs (Renard et al. 2010; Faramarzi et al. 2013; Gupta & Govindaraju 2023). Due to the various uncertainty sources, there are normally many sets of model parameters that can optimally guarantee a satisfactory model fit (Qi et al. 2019). This is the notion of equifinality (Beven & Binley 1992; Beven 2006), in which there can exist various sets of model parameters for which the performance of a model cannot be rejected (Beven 2006). The notion of equifinality, in turn, gave rise to the generalized likelihood uncertainty estimation (GLUE) methodology (Beven & Binley 1992) for constructing predictive uncertainty bounds.

GLUE (Beven & Binley 1992) is an extension of the generalized sensitivity analysis (GSA) (Spear & Hornberger 1980). In GSA, each parameter of a model is generated from the normal or uniform distribution. Eventually, many sets of parameters can be obtained. A selected model is run using the various sets of parameters to obtain several modelled series of two groups, behavioural and non-behavioural solutions, which are consistent and inconsistent with observations, respectively (Spear & Hornberger 1980). In GLUE, the several sets of parameters are mainly generated from the uniform distribution. The model is run using each set of parameters. The acceptability of each model output is assessed through the mismatch between the observed and modelled series, quantified with a ‘goodness-of-fit’ measure. Model outputs for which values of the ‘goodness-of-fit’ measure fall above and below the chosen threshold are termed behavioural and non-behavioural solutions, respectively. The cumulative distribution function (CDF) of the model's output is constructed using the behavioural solutions. In turn, the confidence intervals (CIs) (representing model uncertainty) are constructed using quantiles from the CDF. For instance, to construct a 95% CI from 1,000 ranked behavioural solutions, the 25th and 975th values are selected. Compared with formal techniques of uncertainty analysis, the merits of GLUE have attracted considerable debate. For instance, Blasone et al. (2008) concluded that GLUE is ineffective in establishing behavioural solutions. The ineffectiveness of GLUE could, to some extent, be linked to errors in the model's outputs (Xiong & O'Connor 2008) and the reliance of the framework on the simple Monte Carlo sampling technique (Onyutha 2024b). Furthermore, Blasone et al. (2008) demonstrated that the uncertainty bounds from GLUE tend to be subjective.
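In outline, GLUE reduces to uniform prior sampling, a goodness-of-fit screen, and quantiles of the behavioural ensemble. The following Python sketch illustrates this sequence with a deliberately simple stand-in model; the model, prior ranges, and threshold are illustrative assumptions, not those of any study cited here:

```python
import numpy as np

rng = np.random.default_rng(42)

def model(params, x):
    """Toy stand-in for a rainfall-runoff model run under GLUE."""
    a, b = params
    return a * x + b

# Synthetic observations (in practice: an observed flow series)
x = np.linspace(0.0, 10.0, 50)
observed = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, x.size)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 indicates a perfect fit."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# 1. Sample many parameter sets from uniform priors (the usual GLUE choice)
n_sim = 5000
params = np.column_stack([
    rng.uniform(0.0, 4.0, n_sim),   # assumed prior range for a
    rng.uniform(-2.0, 4.0, n_sim),  # assumed prior range for b
])

# 2. Run the model for every set and keep the behavioural solutions,
#    i.e. those whose goodness-of-fit exceeds the chosen threshold
threshold = 0.8
sims = np.array([model(p, x) for p in params])
scores = np.array([nse(observed, s) for s in sims])
behavioural = sims[scores > threshold]

# 3. Per-time-step quantiles of the behavioural ensemble give the 95% CI
lower = np.percentile(behavioural, 2.5, axis=0)
upper = np.percentile(behavioural, 97.5, axis=0)
```

The subjectivity discussed above enters at step 2: both the choice of the likelihood measure and the threshold value are left to the modeller.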
In appraisals of GLUE, the use of an informal likelihood was found to lead to flat posterior distributions (Christensen 2004; Mantovan & Todini 2006; Mantovan et al. 2007; Blasone et al. 2008; Stedinger et al. 2008; Liu et al. 2022). In the same vein, Stedinger et al. (2008) argued that the choice of a rigid threshold of a ‘goodness-of-fit’ measure for obtaining behavioural solutions indicates the incorrectness of the statistical analysis. In addition, Hossain & Anagnostou (2005) argued that GLUE has a prohibitive computational burden that substantially limits its application. Despite the above limitations, GLUE remains very popular because it is simple, easy to implement, and can handle several error structures and models (Blasone et al. 2008).

A few researchers have made improvements to GLUE. One improvement was the introduction of an approach to stochastically model the parameters' response surface (Hossain & Anagnostou 2005). The advantage of this sampling approach is that it recognizes the inherent non-linear interactions among parameters and preserves the key features of the uncertainty structure. Blasone et al. (2008) improved GLUE through the use of adaptive Markov chain Monte Carlo (MCMC) sampling of the prior parameter space. The revised GLUE performs better than the original GLUE based on a pseudo-random generator (Blasone et al. 2008). In another development, by examining the connection between the informal GLUE and formal probabilistic approaches in hydrological modelling, Nott et al. (2012) showed that, even though the ‘generalized likelihood’ is not a true likelihood, the connection between formal Bayes and GLUE is deeper than the mere operational aspect; in fact, GLUE corresponds to an approximate Bayesian procedure. According to Nott et al. (2012), ‘Two interpretations of GLUE emerge, one as a computational approximation to a Bayes procedure for a certain ‘error-free’ model and the second as an exact Bayes procedure for a perturbation of that model in which the truncation of the generalized likelihood in GLUE plays a role’. Chagwiza et al. (2015) introduced a framework, GLUEMIP, that combines GLUE with mixed integer programming (MIP) for model uncertainty quantification; it performs better than MIP when tested on a water distribution system. Onyutha (2024b) showed that the use of a pseudo-random generator in GLUE does not take into account the uncertainty from the selection of an SC on the uncertainty bounds. To fill this gap, randomized block quasi-Monte Carlo sampling (RBMC) (Onyutha 2024b) was introduced to improve GLUE.

The application of GLUE tends to be limited to the use of a single model and a selected likelihood function. In other words, studies that extend the concept of GLUE while jointly considering various sources of uncertainty, such as model differences and disparities in the influences of OFs, are lacking. Of the various uncertainty sources, the influence of differences among HMs requires further elaboration. A significant driving force for the marked diversity of HMs can be linked to their various applications (Weiler & Beven 2015; Horton et al. 2022). Thus, the degree of sophistication of each model should be commensurate with the purpose for which the model is needed (Rosbjerg & Madsen 2005). Nevertheless, the complexity of a model is no guarantee of its top performance (Fatichi et al. 2016). Another reason for the differences among models could be the modelling context, which can be referred to as the uniqueness of place (Beven 2001). Here, models are required to be parsimonious and fit for purpose (Beven & Young 2013). As a matter of fact, it is practically impossible for one model to be fit for every purpose (Hämäläinen 2015). Differences among models have often been recognized in terms of ambiguity (Beck & Halfon 1991; Zin 2002; Beven 2006), a term used in differentiating models identified on the same model inputs that have overlapping predictive uncertainty bounds (Beck & Halfon 1991), or models for which forecasts made with different model inputs cannot be statistically differentiated (Zin 2002). When a dominant system response exists, its identification can possibly be made without relative ambiguity (Beven 2006). Ambiguity arises not in the system or model itself, but only in decisions regarding its different representations (Kirchner et al. 2001; Beven 2006). Ambiguity is a less contentious word than equifinality (Beven 2006).
This can explain why little attention has been given to the differences among models despite their importance in the analysis of HM uncertainty.

To even out the uncertainties from the various models that can be used for a particular application, the use of a multi-model ensemble is encouraged, especially to boost the credibility of forecasts or predictions. Frameworks are lacking for quantifying generalized uncertainties of multi-model ensembles while considering equifinality and differences among models. The need to address this gap is the motivation for this study, in line with unsolved problem in hydrology number 20 (UPH 20) (Blöschl et al. 2019).

Selected data and models

Quality-controlled HM inputs used by Onyutha (2022), including daily river flow, lumped or catchment-wide rainfall, and potential evapotranspiration (PET) across the River Jardine catchment (Figure 1), which covers an area of 2,500 km2 in Australia, were adopted in this study for demonstration. The Telegraph Line gauge station (site number 927001A), located at −11.152° (latitude) and 142.355° (longitude), was considered the outlet of the River Jardine catchment. The catchment is in the state of Queensland, Australia, and has an equatorial climate. Each daily time series spanned 1/1/1974 to 31/12/1989.
Figure 1

Study area (River Jardine catchment).


Several HMs were considered for this study: the Nedbør-Afstrømnings-Model (NAM) (Nielsen & Hansen 1973), the Veralgemeend Conceptueel Hydrologisch Model (VHM) (Willems 2014), the HM focusing on sub-flows' variation (HMSV) (Onyutha 2019), ABCD (Martinez & Gupta 2010), the simplified HM (SIMHYD) (Porter & McMahon 1971), and the probability-distributed model (PDM) (Moore 2007). All of these are lumped conceptual rainfall-runoff models, and they were selected because of their compatibility with the lumped nature of the meteorological inputs adopted for this study. For brevity, the structure of each model is not presented in this study; it can be obtained from the respective original paper cited in the first sentence of this paragraph.

Sources of uncertainty

To demonstrate the application of the introduced multi-HM ensemble prediction uncertainty estimation (e-PRUNE) framework, consideration was given to uncertainties due to the choice of (i) an HM, (ii) an OF, and (iii) a parameter SC. For part (i), the models applied included ABCD, NAM, VHM, HMSV, SIMHYD, and PDM. The common OFs considered in line with part (ii) included the Nash–Sutcliffe efficiency (NSE) (Nash & Sutcliffe 1970), Kling–Gupta efficiency (KGE) (Kling et al. 2012), index of agreement (IOA) (Willmott 1981), Taylor skill score (TSS) (Taylor 2001), Liu mean efficiency (LME) (Liu 2020), and coefficient of determination (R2). Also considered were the revised R-squared (RRS) and model skill score (MSC) metrics (Onyutha 2022). However, in this paper, updated versions of RRS and MSC are introduced to replace the original versions of Onyutha (2022). In the notation that follows, rd is the distance correlation, while r denotes the Pearson correlation coefficient. Furthermore, σx and σy denote the standard deviations of the observed series x and the modelled series y, respectively, and the means of x and y are represented by x̄ and ȳ, respectively. The updated versions of MSC and RRS are given in the following equations:
(1)
(2)
(3)
(4)

When x or y contains non-positive values, a certain constant is first added to the values of x and y before applying Equation (3). This ensures that each of the x and y series comprises only positive values. The said constant is taken as the greater of the absolute values of the maximum and minimum values of x. In any case, stream flows can never be negative; thus, the case of non-positive values may materialize when modelling variables other than stream flows. Importantly, it should be apparent that a model that reproduces variability is not required when the observed series exhibits no variability (i.e., when the standard deviation of x is zero).

Let CVx and CVy be the coefficients of variation of x and y, respectively. The other ‘goodness-of-fit’ measures were computed using the following equations:
(5)
(6)
(7)
(8)
(9)
(10)
where σ̂ = σy/σx (the ratio of the standard deviations of the modelled and observed series), and r0 is the maximum attainable correlation between x and y.
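For orientation, the widely used standard forms of some of these measures are reproduced below, with x as the observed and y as the modelled series; the exact forms and numbering used in Equations (5)–(10) may differ:

```latex
\begin{align}
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n}(x_i - y_i)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \\
\mathrm{KGE} &= 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2},
  \quad \beta = \frac{\bar{y}}{\bar{x}}, \quad \gamma = \frac{CV_y}{CV_x} \\
\mathrm{IOA} &= 1 - \frac{\sum_{i=1}^{n}(x_i - y_i)^2}
  {\sum_{i=1}^{n}\left(\lvert y_i - \bar{x}\rvert + \lvert x_i - \bar{x}\rvert\right)^2} \\
R^2 &= r^2
\end{align}
```

Each of these measures attains its best value at 1, consistent with the thresholds discussed below.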

Generally, the limitations of R2 are better known than those of the other ‘goodness-of-fit’ measures. For instance, R2 does not indicate how biased a model is. Furthermore, a poor model can yield an R2 close to one, while a good model can produce an R2 close to zero. Notwithstanding these disadvantages, R2 is still arguably the most widely applied metric for assessing model quality (Onyutha 2022), which is why its use as an OF was considered in this study. Nevertheless, to ensure that the uncertainty range based on R2 was not exaggerated, two stringent criteria were applied. First, the threshold for populating the set of behavioural solutions was raised to 0.8, compared with, say, MSC, whose threshold was kept at 0.60. Second, modelled series for which R2 was high (for instance, R2 = 0.8) while NSE and KGE remained low (e.g. less than zero) were excluded from the set of behavioural solutions. Here, NSE and KGE were considered because they are variants of R2.
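The double screen described above can be expressed compactly. A minimal sketch follows, with the thresholds as described in this section; the metric implementations are the standard textbook forms, not the updated versions of Equations (1)–(4):

```python
import numpy as np

def r_squared(obs, sim):
    # Square of the Pearson correlation coefficient
    return np.corrcoef(obs, sim)[0, 1] ** 2

def nse(obs, sim):
    # Nash-Sutcliffe efficiency
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    # Kling-Gupta efficiency (2012 form with coefficients of variation)
    r = np.corrcoef(obs, sim)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def is_behavioural_r2(obs, sim, r2_threshold=0.8):
    """Admit a simulation on R2 only under the two stringent criteria:
    a raised R2 threshold, plus non-negative NSE and KGE to reject
    biased models that nonetheless correlate well with observations."""
    return (r_squared(obs, sim) >= r2_threshold
            and nse(obs, sim) > 0.0
            and kge(obs, sim) > 0.0)
```

A series scaled by a constant factor illustrates the point: it attains R2 = 1 yet fails the NSE/KGE check because of its bias.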

Some of the well-known parameter sampling methods include simple Monte Carlo random sampling (denoted in this study as RND) and Latin hypercube sampling (LHS). Recently, RBMC (Onyutha 2024b) was introduced to support GLUE. The robustness of RBMC in uncertainty analysis lies in its uniqueness in yielding ‘independent’ SCs based on the stipulated configurations, a property not possessed by the common existing methods such as RND and LHS. For demonstration of the uncertainty framework, this study applied eight RBMC configurations (denoted as qMCb with b = 2, 3, …, 9). In other words, the SCs used in this study included RND, LHS, qMC2, qMC3, …, and qMC9. RBMC coded in MATLAB, with a demonstration of its application for uncertainty quantification, is freely downloadable via https://doi.org/10.5281/zenodo.10702810. An overview of the three key sources of uncertainty considered in this study can be seen in Figure 2.
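RND and LHS differ in how they cover the parameter space: LHS stratifies each parameter's range into equal-probability bins and draws exactly once per bin. A minimal sketch of the two schemes follows (RBMC itself is available at the DOI above and is not reproduced here; the parameter ranges are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def rnd_sample(n, lower, upper):
    """Simple Monte Carlo (RND): independent uniform draws per parameter."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return lower + (upper - lower) * rng.random((n, lower.size))

def lhs_sample(n, lower, upper):
    """Latin hypercube sampling: one draw per equal-probability stratum,
    with the strata shuffled independently for each parameter."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    d = lower.size
    u = np.empty((n, d))
    for j in range(d):
        # jittered stratum positions in [0, 1), randomly permuted
        strata = (np.arange(n) + rng.random(n)) / n
        u[:, j] = rng.permutation(strata)
    return lower + (upper - lower) * u

# Two illustrative parameters with ranges [0, 4] and [-2, 4]
params = lhs_sample(1000, lower=[0.0, -2.0], upper=[4.0, 4.0])
```

By construction, every decile of each parameter's range receives exactly one tenth of the samples, which RND does not guarantee.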

Uncertainty estimation framework

The summary of the general steps for the e-PRUNE framework is as follows.
  • 1. List the HMs to be applied for the uncertainty analysis. For instance, this study applied six models: NAM, PDM, VHM, HMSV, SIMHYD, and ABCD. Build up each model using the same datasets of a given catchment. Set the upper and lower limits of the parameters of a selected HM.

  • 2. Determine the number of OFs considered for the hydrological modelling. Here, there were eight OFs: NSE, KGE, IOA, TSS, RRS, LME, MSC, and R2. Set the threshold of the chosen OF for determining the behavioural solutions.

  • 3. Decide on the number of SCs for generating the parameters of a selected model during modelling. This study employed RND, LHS, and qMCb with b = 2, 3, …, 9. Set the upper and lower limits of each model's parameters.

  • 4. Set the size of the set of behavioural solutions. This could be, for instance, 1,000 modelled series that yield values of OFs above the set threshold. For the dimensionless ‘goodness-of-fit’ measures considered here, the best model performance is indicated by values close to one.

Figure 2

Three sources of calibration uncertainty were considered in this study: the choice of an SC, selection of an objective function, and choice of an HM.


There are two approaches of the e-PRUNE framework. The summary of the specific steps of the first approach of e-PRUNE is as follows.

  • (a) Select the first model (e.g. PDM) from step (1).

  • (b) Pick the first SC (e.g. RND) from step (3).

  • (c) Use the selected SC (e.g. RND) from step (b) to generate the values of the parameters within the limits stipulated in step (1).

  • (d) Choose the first OF (e.g. NSE) and determine its set threshold from step (2).

  • (e) Run the model from step (a) using the information in steps (b)–(d) several times until the size of the set of behavioural solutions stipulated in step (4) is achieved.

  • (f) From step (e), select the modelled series with the best (e.g. maximum) value of the OF as the optimal solution.

  • (g) Go to the next OF (e.g. KGE) from step (2) and repeat steps (e)–(f). Do this until all the OFs in step (2) are used.

  • (h) Go to the next SC (e.g. LHS) from step (3) and repeat steps (d)–(g).

  • (i) Go to the next model (e.g. VHM) from step (1) and repeat steps (b)–(h).

  • (j) Consider the overall ensemble mean to be the average of the modelled series obtained in step (f).

  • (k) Using the collection of all the sets of behavioural solutions from step (e), determine the total number of all the series in the collection and denote it as nT.

  • (l) Determine the upper limit of the (100 − α)% confidence interval (CI) as the [(1 − α/200) × nT]th value of the ranked series, where nT is the total number of series in the collection from step (k). Similarly, the lower CI limit is taken as the [(α/200) × nT]th value.
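Steps (a)–(l) amount to three nested loops with the behavioural solutions pooled across every HM, SC, and OF combination. The control flow can be sketched as follows; the model, sampler, and objective-function callables are placeholders for the real HMs, SCs, and OFs, not implementations of them:

```python
import numpy as np

def e_prune_first_approach(models, schemes, objectives, observed,
                           n_behavioural=100, alpha=5.0, rng=None):
    """Sketch of e-PRUNE approach 1: loop over every HM x SC x OF
    combination, pool all behavioural series, and take the overall
    ensemble mean and (100 - alpha)% CI from the pooled collection."""
    rng = rng or np.random.default_rng(0)
    pooled, optimal = [], []
    for run_model in models:                      # steps (a)/(i)
        for sample_params in schemes:             # steps (b)/(h)
            for score, threshold in objectives:   # steps (d)/(g)
                behavioural, best = [], None
                while len(behavioural) < n_behavioural:   # step (e)
                    sim = run_model(sample_params(rng))
                    s = score(observed, sim)
                    if s > threshold:
                        behavioural.append(sim)
                        if best is None or s > best[0]:
                            best = (s, sim)
                pooled.extend(behavioural)        # step (k)
                optimal.append(best[1])           # step (f)
    ensemble_mean = np.mean(optimal, axis=0)      # step (j)
    pooled = np.array(pooled)                     # n_T pooled series
    lower = np.percentile(pooled, alpha / 2, axis=0)        # step (l)
    upper = np.percentile(pooled, 100 - alpha / 2, axis=0)
    return ensemble_mean, lower, upper
```

The pooling in step (k) is what makes the contributions of the three sources visible: omitting one loop (say, over OFs) shrinks the pooled collection and hence the bound.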

The summary of the second approach of e-PRUNE is as follows.

  • (i) Set the total number of simulations (ntsm) to a large value, e.g. 10,000.

  • (ii) Randomly select a model from step (1).

  • (iii) Randomly choose an SC from step (3) and use it to generate the parameters of the model selected in step (ii).

  • (iv) Randomly select an OF from step (2). Also, consider the threshold of the selected OF as set from step (2).

  • (v) Run the model from step (ii) using parameters in step (iii) and information in step (iv). Keep the modelled series if it is behavioural. Otherwise, discard the modelled series.

  • (vi) Repeat steps (ii)–(v) until the size of the set of behavioural solutions is equal to the number of simulations stipulated in (i).

  • (vii) Consider the average of all the behavioural solutions from step (v) as the overall ensemble mean.

  • (viii) The upper and lower limits of the (100 − α)% CI are determined as the [(1 − α/200) × ntsm]th and [(α/200) × ntsm]th values of the ranked behavioural solutions, respectively.
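The second approach replaces the nested loops with a random draw of the model, SC, and OF at every iteration. A sketch under the same placeholder assumptions as before:

```python
import numpy as np

def e_prune_second_approach(models, schemes, objectives, observed,
                            ntsm=1000, alpha=5.0, rng=None):
    """Sketch of e-PRUNE approach 2: at each iteration, randomly pick
    an HM, an SC, and an OF (with its threshold); keep the simulation
    only if it is behavioural, until ntsm behavioural series are held."""
    rng = rng or np.random.default_rng(0)
    behavioural = []
    while len(behavioural) < ntsm:                       # step (vi)
        run_model = models[rng.integers(len(models))]    # step (ii)
        sample = schemes[rng.integers(len(schemes))]     # step (iii)
        score, threshold = objectives[rng.integers(len(objectives))]  # (iv)
        sim = run_model(sample(rng))                     # step (v)
        if score(observed, sim) > threshold:
            behavioural.append(sim)
    behavioural = np.array(behavioural)
    ensemble_mean = behavioural.mean(axis=0)             # step (vii)
    lower = np.percentile(behavioural, alpha / 2, axis=0)        # (viii)
    upper = np.percentile(behavioural, 100 - alpha / 2, axis=0)
    return ensemble_mean, lower, upper
```

Because the sources are mixed at random, this variant yields only the overall bound; it cannot attribute width to any single source, which is the trade-off noted in the next paragraph.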

The first approach of e-PRUNE is important for obtaining insights into the contributions from the various uncertainty sources to the overall predictive uncertainty bounds. However, the focus of the second approach is on the overall predictive bound, regardless of the amount of contribution from each uncertainty source.

In this paper, focus was given to the first approach of the e-PRUNE framework. For its detailed description, Figure 3 shows the framework for the overall uncertainty stemming from the choice of a model, the selection of an SC, and the opting for an OF. For ease of presentation of the flow chart, it is important to consider the number of simulations (ntsm), the number of parameters of a model (npar), the number of HMs (nhm), the best value of the OF (OFbst), the adjustable threshold of the OF (OFthr = OFbst − δ, where the term δ is initially set to 0.01), the computed value of an OF (OFsim), the number of OFs (nof), the number of behavioural simulations (nbeh), the number of SCs (nsc), the lower limit of a parameter (pmin), and the upper limit of a parameter (pmax). The limits of the parameters of each selected model can be found in Tables A1–A6 of the Supplementary Material.
Figure 3

Procedure to consider the various sources of calibration uncertainty.


The simulation experiments used eight OFs, ten SCs, and six HMs. The first approach of the introduced framework starts with the stipulation of the upper and lower limits of every parameter of each model. This is followed by the stipulation of the total number of simulations (ntsm) and the number of behavioural solutions (nbeh). In a standard GLUE application, one would fix the likelihood threshold, and nbeh would subsequently be whatever number of acceptable solutions happens to emerge. Depending on factors such as the rigidity of the likelihood threshold, such an approach can lead to a small nbeh, which in turn affects the reasonableness of the uncertainty bound. The approach adopted for the introduced framework is to ensure that ntsm is far greater than nbeh, while nbeh itself is also made large enough. For instance, ntsm and nbeh were set to 10,000 and 5,000, respectively.

In the next step, we choose an HM from the set of six models. Subsequently, for every model, we select an OF from the set of eight OFs. Finally, considering the model and OF, we choose an SC from the set of ten SCs. For each combination of model, OF, and SC, ntsm simulations are targeted to obtain nbeh behavioural solutions. The detailed procedure, based on RBMC, to estimate model uncertainty while considering the influence of the choice of an SC can be found in Onyutha (2024b). While considering the OFs, the initial value of the threshold-adjustment term δ was set to 0.01, thereby making the adjustable threshold OFthr = OFbst − δ, where OFbst is the best attainable value of the OF; OFbst is 1 for each of the various OFs considered in this study. For the standard GLUE, a rigid threshold for a likelihood function tends to be considered under a fixed ntsm. This, as hinted shortly before, has the implication of making nbeh small. To differ from the standard GLUE, the introduced framework considers an adjustable threshold so as to easily obtain the stipulated nbeh. This, in turn, permits nbeh to be made large enough. To do so, when ntsm is reached while the total number of acceptable modelled series is still less than the stipulated nbeh, we increase δ by 0.01 to obtain a new OFthr. Consider that, say, with δ = 0.01 and nbeh = 2,000, a total of 1,800 (instead of the 2,000) acceptable modelled series were obtained using OFthr = 0.99 when ntsm was reached. It means that we make δ = 0.02 such that OFthr = 0.98. We set ntsm again and run the model until the remaining 200 behavioural solutions are obtained, upon which the procedure is stopped; otherwise, the procedure is repeated. The idea of ensuring that OFthr keeps reducing applies only when OFbst is the maximum attainable value of the OF. Otherwise, for instance, if OFbst = 0 and the worst value of the OF is 1, we cause OFthr to keep increasing using OFthr = OFbst + δ, with δ again increased by 0.01 at each adjustment. Furthermore, the increment of δ can be set on a case-by-case basis. For instance, it was set to 0.01, 0.02, …, 0.1 based on the desired computation load.
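The adjustable-threshold rule can be sketched as a loop that relaxes OFthr by δ whenever a batch of ntsm simulations fails to deliver the stipulated nbeh behavioural solutions. The model run and scoring function below are placeholders; the relaxation step of 0.01 follows the text:

```python
import numpy as np

def collect_behavioural(run_once, score, of_best=1.0, delta0=0.01,
                        nbeh=200, ntsm=2000):
    """Relax the threshold OFthr = OFbst - delta in steps of 0.01 until
    the stipulated number of behavioural solutions has accumulated."""
    delta = delta0
    behavioural = []
    while len(behavioural) < nbeh:
        threshold = of_best - delta          # adjustable OFthr
        for _ in range(ntsm):                # one batch of ntsm runs
            sim = run_once()
            if score(sim) > threshold:
                behavioural.append(sim)
                if len(behavioural) == nbeh:
                    break
        delta += 0.01                        # relax for the next batch
    return np.array(behavioural), threshold
```

This mirrors the worked example in the text: solutions accepted at the stricter threshold are retained, and only the shortfall is collected at the relaxed threshold.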

Using one model, we can choose a particular SC and vary the OFs. Here, for a given OF, the optimal modelled series is the one that yields the best value of that OF. For every observed flow value, there are nbeh modelled events, which can be ranked from the highest to the lowest. The upper and lower limits of the (100 − α)% CI on the optimal modelled event using a particular OF are given by the [(α/200) × nbeh]th and [(1 − α/200) × nbeh]th ranked modelled values, respectively. In this study, α was taken as 5%. Repeating the procedure with all the OFs and SCs using one model, the model's ensemble is obtained as the average of the resulting sets of series. Finally, when the procedure is repeated with all the models, the overall model ensemble is obtained as the average of all the models' sets of series. The final limits of the (100 − α)% CI on the overall ensemble can be obtained as described in step (l) of the procedure for the first approach of e-PRUNE.
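For a behavioural ensemble stored as an array of shape (number of series, number of time steps), the per-time-step CI limits follow directly from ranking the modelled events column-wise. A minimal sketch with synthetic flows standing in for real behavioural series:

```python
import numpy as np

def ensemble_ci(behavioural, alpha=5.0):
    """Column-wise (per time step) CI limits from the ranked ensemble:
    with alpha = 5, these are the 2.5th and 97.5th percentiles."""
    behavioural = np.asarray(behavioural)
    lower = np.percentile(behavioural, alpha / 2.0, axis=0)
    upper = np.percentile(behavioural, 100.0 - alpha / 2.0, axis=0)
    return lower, upper

# Example: 1,000 synthetic behavioural series of 30 daily flows
# (gamma-distributed, as a crude stand-in for skewed flow data)
rng = np.random.default_rng(1)
ens = rng.gamma(shape=2.0, scale=5.0, size=(1000, 30))
lo, hi = ensemble_ci(ens)
mean_series = ens.mean(axis=0)   # ensemble mean for the same period
```

The ensemble mean and the two limit series are what the uncertainty-band figures later in the paper plot against the observed flow.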

Model quality and distribution of parameters

Figure 4 shows the performance of each of the selected models when applied using the various OFs. The chart on the left shows the maximum (or optimal) value of each OF. These results show that the performance of each model was good. The optimal values of IOA and TSS were very high (close to one) for all models. The best values of NSE following the application of ABCD, HMSV, NAM, PDM, SIMHYD, and VHM, when the parameters of each model were generated using RND, were 0.72, 0.74, 0.64, 0.74, 0.70, and 0.68, respectively. The chart on the right in Figure 4 shows the ratio of the number of times a model yielded simulations with an OF greater than 0.5 to the total number of behavioural solutions. Differences in the results of a given model show the influence of the choice of an OF. The OFs differ with respect to their sensitivities to outliers, bias, and variability, as demonstrated in Onyutha (2024a). Differences in the results of a particular OF show the influence of the choice of a model. Structural differences among models can be linked to the number of hydrological processes (or parameters) they represent. For instance, the numbers of parameters of ABCD, HMSV, NAM, PDM, SIMHYD, and VHM considered for calibration were 6, 10, 17, 14, 11, and 14, respectively. The details and ranges of these parameters are provided in Tables A1–A6 of the Supplementary Material.
Figure 4

Model performance based on various OFs.

Figure 5 shows the variation of the relative frequency (the ratio of the absolute frequency to the total number of behavioural solutions) for six selected parameters of each model. An overview of these parameters, including their names and the ranges used for the analysis, can be found in Tables A1–A6 of the Supplementary Material. Not all the parameters of each model were relevant at the same level. There were several rounds of simulations for each model. Each parameter had a wide range in the first round of simulations. Based on the sensitivities, the ranges of a few parameters were reduced for the second round of simulations, and the process was repeated. The sensitivity of the parameters was assessed with respect to the overall water balance closure. Reduction of the parameter space led to reduced sensitivities over the fixed ranges of the parameters. This, in turn, was found to depend on the chosen OF.
Figure 5

Frequency distribution of selected parameters for various HMs, including (a)–(f) ABCD, (g)–(l) PDM, (m)–(r) SIMHYD, (s)–(x) HMSV, (y)–(ad) NAM, and (ae)–(aj) VHM based on various OFs.


The application of each of the eight selected OFs allowed the narrowing of the range over which every parameter could be generated. Narrowing each parameter's range (within which the optimal value of the respective parameter can be found) increases the precision with which observations can be predicted. High precision can be realized in terms of the narrow width of the uncertainty range. The precision of a model can depend on a number of factors, such as the choice of the OF, the number of model parameters, and the selected SC. For instance, how a chosen OF relates to the values of a particular parameter varies among the parameters.

The identifiability of each parameter of a given model varied among the OFs (Figure 5(a)–5(aj)), and it depended on the choice of the SC. The identifiability of a parameter is the degree to which the parameter can be endorsed based on available information that can be validated in reality (Zhao et al. 2008). It concerns the level of precision with which each parameter's values can be identified. The identifiability of a parameter is an important aspect of the conceptualization of a model's structure. It can be enhanced through a strong linkage between the hydrological processes and the parameters of a model (Guse et al. 2017). Adjustments of a model structure are normally guided by measurements, if available. In the absence of measurements, estimations are undertaken, and the guess of the starting point for each parameter depends on the experience and expert judgement of the modeller. The complexity of parameter identifiability increases with the number of a model's parameters. Parsimonious models contain few parameters, and the components that can be identified from observations are not many. Having too few parameters can limit the number of hydrological processes or components that can be considered within a model structure. Oversimplification of the model's structure carries the risk of failing to meet the expected purpose of the modelling exercise. On the other hand, including too many parameters or processes increases a model's complexity, and this can affect the identifiability of the parameters. This study considered models with numbers of parameters ranging from 6 (ABCD) to 17 (NAM). This was to consider the influence of the identifiability of models' parameters on the overall predictive uncertainty.

The main sources of uncertainty

Figure 6 shows the influence of each of the three main uncertainty sources on the number of behavioural solutions (nbeh). As shown in Figure 6, ntsm was set to 20,000, and this was without fixing nbeh. Nevertheless, nbeh varied among the SCs (Figure 6(a)), OFs (Figure 6(b)), and HMs (Figure 6(c)). Thus, apart from the choice of the calibration method, there are other sub-sources of calibration uncertainty. The relationships among OFs tend to be non-linear; thus, for instance, it is difficult during calibration for a modeller to determine the value of NSE that corresponds to that of MSC or TSS. Again, even for the same model and a particular SC, it is difficult to obtain a similar number of behavioural simulations under a fixed threshold. Differences among the results of the various models reflect the dissimilarities in the structures of the models. Furthermore, it is nearly impossible to stipulate the ranges of two separate sets of parameters of two given models to ensure they yield the same nbeh. Therefore, as considered in this study, ntsm and nbeh were fixed, while the threshold OFthr was set to vary until the stipulated nbeh was obtained. The values of OFthr for NSE, KGE, LME, IOA, MSC, RRS, R2, and TSS, for which the stipulated nbeh was obtained, were 0.62, 0.56, 0.58, 0.76, 0.57, 0.60, 0.80, and 0.72, respectively.
Figure 6

Influence of the choice of the (a) SC, (b) OF and (c) the HM on the number of behavioural solutions from each model.


Influence of the choice of HM

Figure 7 shows the results of simulations from various HMs using NSE when parameters were generated based on RND. The thin dashed line is the optimal modelled series. The solid blue line represents the observed flow. The grey band, around the optimal modelled series, shows the various behavioural solutions. All the selected models captured the variability in the observed flow (Figure 7(a)–7(f)). However, the extent of the mismatches between the observed and modelled data points, especially for the peak high flows, varied among the models.
Figure 7

Uncertainty bounds on stream flows modelled by (a) ABCD, (b) PDM, (c) SIMHYD, (d) HMSV, (e) NAM, and (f) VHM. Parameters were generated using RND and the model was calibrated using NSE as the OF.


Though, for brevity, only the NSE-based results are presented, increasing the number of OFs led to an increase in the width of the uncertainty bounds for each HM. Furthermore, the widths of the uncertainty bounds varied among the models. Generally, the width of the uncertainty bound is taken to reflect the precision with which the observed flow is reproduced. However, many factors affect the magnitudes of model uncertainties, including the compatibility of a model's structure with the available data, errors in meteorological inputs, and errors in river flow (or calibration data). Overall, these results reflect the need to apply many models to account for the influence of choosing a particular model, calibrated with a given OF, on the modelled series.
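The widening of the bounds as results from more OFs are combined can be illustrated numerically. In this sketch (synthetic runs, not the study's data), the bound at each time step is a simple 5th-95th percentile envelope of the behavioural simulations; note that GLUE implementations typically use likelihood-weighted quantiles instead.

```python
import numpy as np

# Sketch: pooling behavioural runs obtained under a second OF widens the mean
# envelope width, consistent with the combined-OF bounds described above.

def envelope(sims, lo=5.0, hi=95.0):
    """Per-time-step percentile bounds of a (runs x time) array."""
    return np.percentile(sims, lo, axis=0), np.percentile(sims, hi, axis=0)

rng = np.random.default_rng(0)
t = 30
sims_of1 = rng.normal(1.0, 0.2, (100, t))   # behavioural runs under one OF
sims_of2 = rng.normal(1.1, 0.3, (100, t))   # behavioural runs under another OF

lo1, hi1 = envelope(sims_of1)
lo12, hi12 = envelope(np.vstack([sims_of1, sims_of2]))

width_one_of = float(np.mean(hi1 - lo1))
width_two_ofs = float(np.mean(hi12 - lo12))
```

Pooling behavioural sets with different central tendencies and spreads produces a mixture distribution whose envelope is, on average, wider than either set alone.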

Influence of the choice of an objective function

Figure 8 shows the results from HMSV when parameters were generated using RND and the model calibrated based on various OFs. The width of the uncertainty bound varied over the sub-period with a large mismatch between the observed and modelled flow. Over the said sub-period, the width of the uncertainty bound varied among the objective functions (Figure 8(a)–8(h)). Just like for HMSV, variation in the uncertainty obtained using each of the other models considered in this study was found to depend on the selected OF. For a particular model, the uncertainty bounds became wider with an increase in the number of OFs (Figure 8(a)–8(h)). Thus, it is vital to consider the choice of the OF as a sub-source of the calibration-related model uncertainty.
Figure 8

Uncertainty bounds on stream flows modelled by HMSV when its parameters were generated by RND and the model calibrated based on (a) IOA, (b) KGE, (c) LME, (d) NSE, (e) MSC, (f) RRS, (g) R2, and (h) TSS.


Influence of the choice of sampling scheme

Figure 9 shows the results of HMSV when its parameters were generated by the various SCs. When the OF threshold was relaxed (for instance, by reducing it from 0.55 to 0.35), the width of the uncertainty bound on the considered flow event increased substantially. Generally, the uncertainty bound over the section with large mismatches between the observed and modelled flow events was wider under the influence of the choice of an OF (Figure 8(a)–8(h)) than under that of selecting an SC (Figure 9(a)–9(j)). The results of similar analyses conducted using the other models still showed variation in the uncertainties based on the different SCs. For each model, the uncertainty bounds widened as the results based on several SCs were combined.
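The distinction between a plain random scheme (RND) and a stratified one such as Latin hypercube sampling (LHS) can be made concrete with a minimal sketch. The implementation is generic (unit hypercube, hypothetical function names); only the scheme names follow the text.

```python
import numpy as np

# Minimal sketch of two sampling schemes for a parameter space: plain random
# (RND) and Latin hypercube (LHS). The defining LHS property checked below is
# that each of the n equal strata per dimension holds exactly one sample.

def random_sampling(n, dims, rng):
    """Plain random (RND-style) sampling of a unit hypercube."""
    return rng.uniform(0.0, 1.0, (n, dims))

def latin_hypercube(n, dims, rng):
    """LHS: each dimension gets exactly one sample in each of n equal strata."""
    samples = np.empty((n, dims))
    for d in range(dims):
        strata_order = rng.permutation(n)             # shuffle the n strata
        samples[:, d] = (strata_order + rng.uniform(0.0, 1.0, n)) / n
    return samples

rng = np.random.default_rng(1)
rnd = random_sampling(8, 2, rng)
lhs = latin_hypercube(8, 2, rng)
strata = np.floor(lhs * 8).astype(int)   # stratum index of each LHS sample
```

Because each scheme covers the parameter space differently, the behavioural sets they produce, and hence the resulting bounds, differ, as Figure 9 shows.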
Figure 9

Uncertainty bounds on stream flows modelled by HMSV when its parameters were generated by (a) RND, (b) LHS, and (c)–(j) qMC2–9. The results of each chart were obtained when the model was calibrated using NSE as the OF.


Overall model uncertainty

Figure 10 shows the overall uncertainty bounds obtained by combining the results based on the three sub-sources of uncertainty. The contributions of the three sub-sources to the overall uncertainty varied considerably. Three cases were considered. In the first case, one model was selected and run based on only NSE but with parameters generated by each of the chosen SCs. In the second case, the same model was run using each of the eight OFs but with parameters generated by RND only. For a given model, differences in the widths of the uncertainty bounds were greater for the second case than the first. This suggested that the uncertainty due to the choice of an OF was greater than that due to the selection of an SC. In the third case, a given combination of an OF (like NSE) and an SC (such as RND) was chosen, and each of the selected HMs was calibrated using that combination. Here, differences in the widths of the uncertainty bounds among the models were greater than in either the first or second case. Thus, for the data used in this study, uncertainty due to the choice of a model was greater than that due to the selection of an SC or OF. This may suggest that ignoring the uncertainty from SCs or OFs and focusing only on the uncertainty due to the choice of a model could still capture a substantial percentage of the overall uncertainty range. However, for this suggestion to be valid, a large number of structurally different HMs is required.
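One simple way to combine sub-source bounds, sketched below with illustrative values, is to take the outer envelope of the bounds from the HM, OF, and SC choices. Under this construction the overall bound can never be narrower than any single sub-source bound, consistent with the widest (model-choice) bound dominating.

```python
import numpy as np

# Sketch with illustrative, synthetic bounds: the overall predictive
# uncertainty is taken as the outer envelope of the sub-source bounds.

def outer_envelope(bounds):
    """Combine (lower, upper) pairs into one overall (lower, upper) bound."""
    lower = np.min([b[0] for b in bounds], axis=0)
    upper = np.max([b[1] for b in bounds], axis=0)
    return lower, upper

t = np.linspace(0.0, 1.0, 20)
hm_bound = (1.0 - 0.4 * t, 1.0 + 0.4 * t)   # widest: choice of model
of_bound = (1.0 - 0.2 * t, 1.0 + 0.2 * t)   # choice of objective function
sc_bound = (1.0 - 0.1 * t, 1.0 + 0.1 * t)   # choice of sampling scheme

overall_lo, overall_hi = outer_envelope([hm_bound, of_bound, sc_bound])
```

In this stylized case the overall bound coincides with the model-choice bound, so dropping the SC or OF sub-source changes the envelope little, which echoes the suggestion above.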
Figure 10

Overall uncertainty was obtained from the combination of sub-uncertainties based on the choices of HM, OF and SC. These results are for (a) daily and (b) monthly scales.


The interactive contributions to the overall uncertainty from the various sources may be additive or multiplicative in nature. In the additive case, the effect of removing one source of uncertainty (for instance, OF) can easily be realized from the overall uncertainty bound, since the combined contributions from the various sources amplify the overall uncertainty only slightly. If the interactive contributions were multiplicative, the width of the uncertainty bound would be far more pronounced than in the case shown in Figure 10. Even when considering the sources of uncertainty individually, the width of the uncertainty bounds can increase under various circumstances, for instance, (i) when the threshold of an OF (such as NSE, R2, and IOA) is reduced, (ii) when the parameter space gets larger, and (iii) when the number of structurally different HMs being applied is increased.

The results of hydrological modelling can be influenced by the temporal scale, as shown by the results at daily (Figure 10(a)) and monthly (Figure 10(b)) scales. Aggregating stream flows from the daily to the monthly scale reduces the temporal variability or fluctuations in the series. Nevertheless, the uncertainty results remain valid at both daily and monthly scales.
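The smoothing effect of aggregation can be verified on synthetic data. In this sketch (illustrative values; 30-day "months" are a simplification), monthly means preserve the overall mean flow while reducing the variance of the series.

```python
import numpy as np

# Sketch: aggregating a daily flow series to monthly means damps day-to-day
# fluctuations, so the aggregated series has lower variance.

rng = np.random.default_rng(7)
daily = 5.0 + rng.normal(0.0, 2.0, 360)          # one synthetic year of daily flows
monthly = daily.reshape(12, 30).mean(axis=1)     # 12 monthly mean flows

var_daily = daily.var()
var_monthly = monthly.var()
```

For independent daily fluctuations, the variance of a 30-day mean is roughly 1/30 of the daily variance, which is why monthly series appear much smoother.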

The choice of scale is generally linked to the purpose of the analysis. For instance, the daily scale is relevant for uncertainty analysis of risk-based hydrological quantiles necessary for the design of water resource management applications and hydraulic structures, such as dikes, dams, and irrigation systems. Uncertainty in quantiles at fine temporal scales enables the modeller to estimate the cost implications of constructing and operating water resource management applications or hydraulic structures designed under various modelling uncertainties.

The results of an HM vary across catchments because catchments tend to differ in climatic and physiographic characteristics. Thus, the results of this study are limited to the catchment to which the models were applied. Furthermore, the results of a model depend on the sub-period considered for hydrological analysis. Due to climate variability, hydroclimatic conditions tend to differ over sub-periods. This can cause the values of a given parameter to change over time, thereby affecting the temporal transferability of a given model.

Different sources of model uncertainty contribute in varying proportions to the overall uncertainty bounds. In this study, the magnitude of the uncertainty due to the OF was found to be substantial. There have been proposals for modellers to prefer multi-objective calibration (MOC) to single-objective calibration (SOC) of HMs, on the assumption that MOC leads to better results than SOC (Kim & Lee 2014; Mostafaie et al. 2018). However, in a relevant study of this proposal, Zhang et al. (2018) found that MOC does not necessarily yield higher model quality than SOC. In fact, large differences exist among results from various OFs (Krause et al. 2005; Onyutha 2024a). Along this line, it is difficult to determine comparable thresholds of two or more OFs for identifying behavioural solutions. Furthermore, even for SOC, each OF for calibrating HMs has its advantages and disadvantages (Onyutha 2024a). Increasing the number of uncertainty sources widened the overall uncertainty bound, thereby corroborating the importance of considering as many sources of uncertainty as possible in HM results. In this study, the SOC approach was adopted; the uncertainty bounds obtained using SOC and MOC are expected to differ.

The uncertainty bounds based on the influence of the choice of a model were the largest. This reflected the differences in the structures of the selected models with regard to their compatibility with the data of the chosen catchment. The origin of these differences is easy to appreciate. For a given model, various hydrological processes are considered while conceptualizing the model structure, and some of these processes differ across regions. Besides, catchments, even within the same climate region, can differ in geology, soil, topography, drainage area, topographical aspect, land use and land cover types, and extent of urbanization. Therefore, the notion that a model depending on physical laws and parameters established a priori can yield accurate results for different hydrological conditions, catchments, and climates amounts to an impossible task, the so-called hydrologic El Dorado (Grayson et al. 1992; Woolhiser 1996). The reasonableness of this notion is further challenged by the non-linearity of hydrological processes, dependence on spatio-temporal scales, equifinality, and system heterogeneity (Beven & Cloke 2012). For instance, differences in the characteristics of climate regions require a focus on distinct processes, such as snow melt in a temperate (but not tropical) region, and this can influence the structuring of an HM. Importantly, the compatibility of a model structure with observations will vary among catchments even within the same climate region. In summary, differences among models arise from dissimilarities in the concepts for representing hydrological processes (Weiler & Beven 2015). Thus, model differences comprise an important consideration in making predictions to yield information for planning water resource management.
Furthermore, regarding influences from the choice of a model, understanding inter-model and intra-model differences is important when comparing modelled results. Inter-model difference means that results from any two structurally dissimilar models tend to differ, owing to divergences in the concepts and assumptions used in structuring the models (Butts et al. 2004). Intra-model difference means that, even under the same initial conditions, a particular model (due to its structural complexity) can yield dissimilar simulations.

In this study, the multi-model ensemble of modelled flow series was obtained by averaging. The idea of combining model results to obtain ensembles is not new (Twedt et al. 1977). Many studies have applied averaging of model outputs when considering mean hydrological conditions (Rojas et al. 2008; Kumar et al. 2015; Onyutha et al. 2021). Applications of model averaging to extreme hydrological events are rarely considered by hydrologists, except on a few occasions such as in Onyutha et al. (2021). Apart from simple arithmetic averaging of all the realizations, other approaches, such as the weighted mean of outputs, also exist. Thus, ensembles can also vary with respect to the method of combining the various model results.
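The two combination approaches mentioned above can be sketched side by side. The model outputs and skill scores below are illustrative, not from the study.

```python
import numpy as np

# Sketch of two ways of combining model outputs into an ensemble series:
# plain arithmetic averaging (as used in this study) and performance-weighted
# averaging. Skill values are hypothetical stand-ins for, e.g., NSE scores.

def simple_average(sims):
    """Unweighted mean across models at each time step."""
    return np.mean(sims, axis=0)

def weighted_average(sims, skills):
    """Skill-weighted mean across models; weights are normalised to sum to 1."""
    w = np.asarray(skills, float)
    return np.average(sims, axis=0, weights=w / w.sum())

sims = np.array([[1.0, 2.0, 3.0],        # model A output
                 [3.0, 4.0, 5.0],        # model B output
                 [2.0, 3.0, 4.0]])       # model C output
skills = [0.8, 0.4, 0.8]                 # illustrative per-model skill scores

ens_simple = simple_average(sims)        # -> [2.0, 3.0, 4.0]
ens_weighted = weighted_average(sims, skills)
```

Weighted averaging pulls the ensemble toward the better-performing models, so the two methods generally yield different ensemble series from the same realizations.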

Though not considered in this study, other sub-sources of model uncertainty exist, such as (i) observation data errors, (ii) the method of calibration, and (iii) the choice of optimizers. Clarifications on these uncertainty sub-sources are as follows:

  • (i) Model inputs include rainfall and temperature or potential evapotranspiration. Model inputs are characterized by gaps in records, coarse spatio-temporal resolutions, short record lengths, and data-entry issues such as inaccurate placement of decimal points. Poor calibration of data-recording equipment leads to bias in measurements. Errors in observed river flow or calibration data can arise from inaccurate extrapolation or interpolation of the rating curve and from hysteresis (McMillan et al. 2012; Sikorska et al. 2013; Moges et al. 2020).

  • (ii) Calibration is the process of changing the values of parameters to reduce the mismatch between the observed and modelled series. The calibration data for the rainfall-runoff model are observed river flows. However, optimization deals with the adjustments of the parameters of a model to guarantee the behaviour of a model with respect to a certain policy that can maximize performance without reference to measured data. Calibration can be manual or automatic. It is known that manual calibration tends to be subjective and time-consuming (Boyle et al. 2000). Automatic calibration uses some strategies that can guarantee minimal model residuals, by allowing global optimum to be located within the response surface characterized by several local minima. An example of a global search technique is the shuffled complex evolution (Duan et al. 1992). The results of automatic or manual calibration can be different to some extent.

  • (iii) For automatic calibration, there are various strategies and search algorithms/optimizers. Examples of calibration strategies include GLUE (Beven & Binley 1992) and Bayesian MCMC simulation. Many optimizers exist, such as charged system search (Kaveh & Talatahari 2010), genetic algorithm (Goldberg 1989; Wang 1991), simulated annealing (Kirkpatrick et al. 1983), sequential simplex optimization (Walters et al. 1991), Powell's method (Powell 1964), differential evolution (Storn & Price 1997), particle swarm optimization (Kennedy & Eberhart 1995), and direct local search or Rosenbrock's method (Rosenbrock 1960). Others include the gravitational search algorithm (Rashedi et al. 2009), non-dominated sorting genetic algorithm II (Deb et al. 2002), ant colony optimization (Dorigo et al. 1996), dolphin echolocation (Kaveh & Farhoudi 2013), big bang-big crunch (Erol & Eksin 2006), the hybrid PSOGSA algorithm (Mirjalili & Hashim 2010), bacterial foraging (Passino 2002), and the bat-inspired algorithm (Yang 2010). Each of these optimization methods has its own advantages and disadvantages, and the optimizers differ from one another. Thus, the results of calibration based on various optimizers are expected to be dissimilar to some extent. This means that the choice of an optimizer is a sub-source of calibration uncertainty.
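The essence of automatic calibration described above, searching a parameter space to minimize model residuals, can be sketched with a naive random search. The toy one-parameter "model", the data, and all names are illustrative; real optimizers such as the shuffled complex evolution replace the naive loop.

```python
import numpy as np

# Sketch of automatic calibration: a toy one-parameter model is fitted to
# synthetic "observed" flows by random search minimising the sum of squared
# residuals. Illustrative only; not the study's models or optimizer.

def model(rain, k):
    """Toy runoff response: runoff is a fixed fraction k of rainfall."""
    return k * rain

def calibrate(rain, observed, n_trials, rng):
    """Random-search calibration of k over [0, 1]."""
    best_k, best_sse = None, np.inf
    for k in rng.uniform(0.0, 1.0, n_trials):    # sample candidate parameters
        sse = np.sum((observed - model(rain, k)) ** 2)
        if sse < best_sse:                        # keep the best fit so far
            best_k, best_sse = k, sse
    return best_k, best_sse

rng = np.random.default_rng(3)
rain = rng.uniform(0.0, 10.0, 100)
observed = 0.45 * rain + rng.normal(0.0, 0.1, 100)   # "true" k = 0.45 plus noise

k_hat, sse = calibrate(rain, observed, n_trials=2000, rng=rng)
```

Different search strategies explore the response surface differently, which is why the choice of optimizer is itself a sub-source of calibration uncertainty.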

In this study, model residuals were assumed to be of a nature that allows them to be treated implicitly. Furthermore, the residuals in calibration and prediction were assumed to be similar in nature; in other words, the hyperparameters of an error model resulting from calibration tend to be identical to those based on prediction (Beven & Binley 2014). Explicit treatment of errors is another option, either through a formal error model or a non-parametric technique (Beven & Smith 2015).

Apart from GLUE, several approaches for analysing uncertainty in HMs exist such as the differential evolution adaptive metropolis (Vrugt 2016), pseudo-Bayesian (Freer et al. 1996), parameter estimation algorithm (Doherty 2010), classical Bayesian (Thiemann et al. 2001), multi-model averaging methods (Vrugt & Robinson 2007), sequential data assimilation (Moradkhani et al. 2005), MOC analysis (Hadka & Reed 2013), and Bayesian total error analysis (Kavetski et al. 2003). It is deemed that these methods can lead to dissimilar uncertainty results due to the differences among them. Thus, the choice of uncertainty analysis method is a sub-source of the predictive uncertainty.

The non-linear nature of hydrological modelling equations makes analytical approaches to uncertainty quantification far from straightforward. As a result, a number of uncertainty estimation methods exist, such as Bayesian total error analysis, sequential data assimilation, and GLUE. The popularity of GLUE compared with other methods is linked to its simplicity. However, little attention is normally given to differences among models when applying the original GLUE framework. This study introduced a framework that considers differences among models for generating ensemble uncertainty bounds.

The potential of the introduced framework was demonstrated using three sub-sources of calibration-related uncertainty: the influence of the choice of an OF, the selection of an HM, and opting for a particular SC. Sub-uncertainty bounds on the multi-model ensemble based on the choice of an OF were wider than those due to the selection of a particular SC. However, the sub-uncertainty bounds due to the choice of an HM were wider than those due to the selection of an SC or OF. The uncertainty bounds based on combined contributions from HM, SC, and OF were wider than those for any individual uncertainty source. This means that the larger the number of uncertainty sources considered, the more representative the predictive uncertainty. The introduced framework is recommended for considering various sources of uncertainty across different models.

It is recommended that other sources of uncertainty be incorporated within the framework, including influences from the choice of optimizers, observation data errors, initial and boundary conditions, and, if possible, issues of model transferability. Given that several uncertainty sources were considered, this study could not give an in-depth analysis of the identifiability of parameters from the various models based on different SCs and OFs. A comprehensive assessment of parameter identifiability under various sources of uncertainty is recommended for a future study.

This research received no external funding.

The author acknowledges that this study used part of the modelling data from Onyutha (2022).

Data used in this study can be obtained as example dataset of Rainfall Runoff Library (RRL) downloadable via https://toolkit.ewater.org.au/Tools/RRL (June 16, 2022).

The author declares there is no conflict of interest.

Beck
M. B.
&
Halfon
E.
(
1991
)
Uncertainty, identifiability and the propagation of prediction errors: a case study of Lake Ontario
,
Journal of Forecasting
,
10
(
1–2
),
135
161
.
https://doi.org/10.1002/for.3980100109
.
Beven
J. K.
(
2001
)
Rainfall-Runoff Modelling – The Primer
.
Hoboken, NJ: John Wiley & Sons Ltd
.
Beven
K.
(
2006
)
A manifesto for the equifinality thesis
,
Journal of Hydrology
,
320
(
1–2
),
18
36
.
https://doi.org/10.1016/j.jhydrol.2005.07.007
.
Beven
K.
&
Binley
A.
(
2014
)
GLUE: 20 years on
,
Hydrological Processes
,
28
(
24
),
5897
5918
.
https://doi.org/10.1002/hyp.10082
.
Beven
K. J.
&
Cloke
H. L.
(
2012
)
Comment on ‘Hyperresolution global land surface modeling: meeting a grand challenge for monitoring Earth's terrestrial water’ by Eric F. Wood et al.
,
Water Resources Research
,
48
(
1
),
2011WR010982
.
https://doi.org/10.1029/2011WR010982
.
Beven
K.
&
Smith
P.
(
2015
)
Concepts of information content and likelihood in parameter calibration for hydrological simulation models
,
Journal of Hydrologic Engineering
,
20
(
1
),
A4014010
.
https://doi.org/10.1061/(ASCE)HE.1943-5584.0000991
.
Beven
K.
&
Young
P.
(
2013
)
A guide to good practice in modeling semantics for authors and referees
,
Water Resources Research
,
49
(
8
),
5092
5098
.
https://doi.org/10.1002/wrcr.20393
.
Blasone
R.-S.
,
Vrugt
J. A.
,
Madsen
H.
,
Rosbjerg
D.
,
Robinson
B. A.
&
Zyvoloski
G. A.
(
2008
)
Generalized likelihood uncertainty estimation (GLUE) using adaptive Markov Chain Monte Carlo sampling
,
Advances in Water Resources
,
31
(
4
),
630
648
.
https://doi.org/10.1016/j.advwatres.2007.12.003
.
Blöschl
G.
,
Bierkens
M. F. P.
,
Chambel
A.
,
Cudennec
C.
,
Destouni
G.
,
Fiori
A.
,
Kirchner
J. W.
,
McDonnell
J. J.
,
Savenije
H. H. G.
… &
Zhang
Y.
(
2019
)
Twenty-three unsolved problems in hydrology (UPH) – a community perspective
,
Hydrological Sciences Journal
,
64
(
10
),
1141
1158
.
https://doi.org/10.1080/02626667.2019.1620507
.
Boyle
D. P.
,
Gupta
H. V.
&
Sorooshian
S.
(
2000
)
Toward improved calibration of hydrologic models: combining the strengths of manual and automatic methods
,
Water Resources Research
,
36
(
12
),
3663
3674
.
https://doi.org/10.1029/2000WR900207
.
Butts
M. B.
,
Payne
J. T.
,
Kristensen
M.
&
Madsen
H.
(
2004
)
An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow simulation
,
Journal of Hydrology
,
298
(
1–4
),
242
266
.
https://doi.org/10.1016/j.jhydrol.2004.03.042
.
Chagwiza
G.
,
Jones
B. C.
,
Hove-Musekwa
S. D.
&
Mtisi
S.
(
2015
)
A generalised likelihood uncertainty estimation mixed-integer programming model: application to a water resource distribution network
,
Cogent Mathematics
,
2
(
1
),
1048076
.
https://doi.org/10.1080/23311835.2015.1048076
.
Christensen
S.
(
2004
)
A synthetic groundwater modelling study of the accuracy of GLUE uncertainty intervals
,
Hydrology Research
,
35
(
1
),
45
59
.
https://doi.org/10.2166/nh.2004.0004
.
Deb
K.
,
Pratap
A.
,
Agarwal
S.
&
Meyarivan
T.
(
2002
)
A fast and elitist multiobjective genetic algorithm: nSGA-II
,
IEEE Transactions on Evolutionary Computation
,
6
(
2
),
182
197
.
https://doi.org/10.1109/4235.996017
.
Doherty
J.
(
2010
)
PEST, Model-Independent Parameter Estimation – User Manual, 5th ed.; with Slight Additions
.
Brisbane, Australia
:
Watermark Numerical Computing
.
Dorigo
M.
,
Maniezzo
V.
&
Colorni
A.
(
1996
)
Ant system: optimization by a colony of cooperating agents
,
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
,
26
(
1
),
29
41
.
https://doi.org/10.1109/3477.484436
.
Duan
Q.
,
Sorooshian
S.
&
Gupta
V.
(
1992
)
Effective and efficient global optimization for conceptual rainfall-runoff models
,
Water Resources Research
,
28
(
4
),
1015
1031
.
https://doi.org/10.1029/91WR02985
.
Erol
O. K.
&
Eksin
I.
(
2006
)
A new optimization method: big Bang–Big Crunch
,
Advances in Engineering Software
,
37
(
2
),
106
111
.
https://doi.org/10.1016/j.advengsoft.2005.04.005
.
Faramarzi
M.
,
Abbaspour
K. C.
,
Vaghefi
A. S.
,
Farzaneh
M. R.
,
Zehnder
A. J. B.
,
Srinivasan
R.
&
Yang
H.
(
2013
)
Modeling impacts of climate change on freshwater availability in Africa
,
Journal of Hydrology
,
480
,
85
101
.
https://doi.org/10.1016/j.jhydrol.2012.12.016
.
Fatichi
S.
,
Vivoni
E. R.
,
Ogden
F. L.
,
Ivanov
V. Y.
,
Mirus
B.
,
Gochis
D.
,
Downer
C. W.
,
Camporese
M.
,
Davison
J. H.
,
Ebel
B.
,
Jones
N.
,
Kim
J.
,
Mascaro
G.
,
Niswonger
R.
,
Restrepo
P.
,
Rigon
R.
,
Shen
C.
,
Sulis
M.
&
Tarboton
D.
(
2016
)
An overview of current applications, challenges, and future trends in distributed process-based models in hydrology
,
Journal of Hydrology
,
537
,
45
60
.
https://doi.org/10.1016/j.jhydrol.2016.03.026
.
Freer
J.
,
Beven
K.
&
Ambroise
B.
(
1996
)
Bayesian estimation of uncertainty in runoff prediction and the value of data: an application of the GLUE approach
,
Water Resources Research
,
32
(
7
),
2161
2173
.
https://doi.org/10.1029/95WR03723
.
Goldberg
D. E.
(
1989
)
Genetic Algorithms in Search, Optimization and Machine Learning
.
London, UK
:
Addison-Wesley Longman Publishing Co., Inc
.
Grayson
R. B.
,
Moore
I. D.
&
McMahon
T. A.
(
1992
)
Physically based hydrologic modeling: 2. Is the concept realistic?
,
Water Resources Research
,
28
(
10
),
2659
2666
.
https://doi.org/10.1029/92WR01259
.
Gupta
A.
&
Govindaraju
R. S.
(
2023
)
Uncertainty quantification in watershed hydrology: which method to use?
,
Journal of Hydrology
,
616
,
128749
.
https://doi.org/10.1016/j.jhydrol.2022.128749
.
Guse
B.
,
Pfannerstill
M.
,
Gafurov
A.
,
Kiesel
J.
,
Lehr
C.
&
Fohrer
N.
(
2017
)
Identifying the connective strength between model parameters and performance criteria
,
Hydrology and Earth System Sciences
,
21
,
5663
5679
.
https://doi.org/10.5194/hess-21-5663-2017
.
Hadka
D.
&
Reed
P.
(
2013
)
Borg: an auto-adaptive many-objective evolutionary computing framework
,
Evolutionary Computation
,
21
(
2
),
231
259
.
https://doi.org/10.1162/EVCO_a_00075
.
Hämäläinen
R. P.
(
2015
)
Behavioural issues in environmental modelling – the missing perspective
,
Environmental Modelling & Software
,
73
,
244
253
.
https://doi.org/10.1016/j.envsoft.2015.08.019
.
Horton
P.
,
Schaefli
B.
&
Kauzlaric
M.
(
2022
)
Why do we have so many different hydrological models? A review based on the case of Switzerland
,
WIRES Water
,
9
(
1
),
e1574
.
https://doi.org/10.1002/wat2.1574
.
Hossain
F.
&
Anagnostou
E. N.
(
2005
)
Assessment of a stochastic interpolation based parameter sampling scheme for efficient uncertainty analyses of hydrologic models
,
Computers & Geosciences
,
31
(
4
),
497
512
.
https://doi.org/10.1016/j.cageo.2004.11.001
.
Kaveh
A.
&
Farhoudi
N.
(
2013
)
A new optimization method: dolphin echolocation
,
Advances in Engineering Software
,
59
,
53
70
.
https://doi.org/10.1016/j.advengsoft.2013.03.004
.
Kaveh
A.
&
Talatahari
S.
(
2010
)
A novel heuristic optimization method: charged system search
,
Acta Mechanica
,
213
(
3–4
),
267
289
.
https://doi.org/10.1007/s00707-009-0270-4
.
Kavetski
D.
,
Franks
S. W.
,
Kuczera
G.
, (
2003
)
Confronting input uncertainty in environmental modelling
. In:
Duan
Q.
,
Gupta
H. V.
,
Sorooshian
S.
,
Rousseau
A. N.
&
Turcotte
R.
(eds.)
Water Science and Application
, Vol.
6
.
Washington, DC
:
American Geophysical Union
, pp.
49
68
.
https://doi.org/10.1029/WS006p0049
.
Kennedy
J.
&
Eberhart
R.
(
1995
). '
Particle swarm optimization
',
Proceedings of ICNN'95 – International Conference on Neural Networks
, Vol.
4
, pp.
1942
1948
.
https://doi.org/10.1109/ICNN.1995.488968
.
Kim
H. S.
&
Lee
S.
(
2014
)
Assessment of a seasonal calibration technique using multiple objectives in rainfall–runoff analysis
,
Hydrological Processes
,
28
(
4
),
2159
2173
.
https://doi.org/10.1002/hyp.9785
.
Kirchner
J. W.
,
Feng
X.
&
Neal
C.
(
2001
)
Catchment-scale advection and dispersion as a mechanism for fractal scaling in stream tracer concentrations
,
Journal of Hydrology
,
254
(
1–4
),
82
101
.
https://doi.org/10.1016/S0022-1694(01)00487-5
.
Kirkpatrick
S.
,
Gelatt
C. D.
&
Vecchi
M. P.
(
1983
)
Optimization by simulated annealing
,
Science
,
220
(
4598
),
671
680
.
https://doi.org/10.1126/science.220.4598.671
.
Kling
H.
,
Fuchs
M.
&
Paulin
M.
(
2012
)
Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios
,
Journal of Hydrology
,
424–425
,
264
277
.
Krause
P.
,
Boyle
D. P.
&
Bäse
F.
(
2005
)
Comparison of different efficiency criteria for hydrological model assessment
,
Advances in Geosciences
,
5
,
89
97
.
https://doi.org/10.5194/adgeo-5-89-2005
.
Kumar
A.
,
Singh
R.
,
Jena
P. P.
,
Chatterjee
C.
&
Mishra
A.
(
2015
)
Identification of the best multi-model combination for simulating river discharge
,
Journal of Hydrology
,
525
,
313
325
.
https://doi.org/10.1016/j.jhydrol.2015.03.060
.
Liu
D.
(
2020
)
A rational performance criterion for hydrological model
,
Journal of Hydrology
,
590
,
125488
.
https://doi.org/10.1016/j.jhydrol.2020.125488
.
Liu
Y.
,
Fernández-Ortega
J.
,
Mudarra
M.
&
Hartmann
A.
(
2022
)
Pitfalls and a feasible solution for using KGE as an informal likelihood function in MCMC methods: dREAM(ZS) as an example
,
Hydrology and Earth System Sciences
,
26
(
20
),
5341
5355
.
https://doi.org/10.5194/hess-26-5341-2022
.
Mantovan
P.
&
Todini
E.
(
2006
)
Hydrological forecasting uncertainty assessment: incoherence of the GLUE methodology
,
Journal of Hydrology
,
330
(
1–2
),
368
381
.
https://doi.org/10.1016/j.jhydrol.2006.04.046
.
Mantovan
P.
,
Todini
E.
&
Martina
M. L. V.
(
2007
)
Reply to comment by Keith Beven, Paul Smith and Jim Freer on ‘Hydrological forecasting uncertainty assessment: incoherence of the GLUE methodology’
,
Journal of Hydrology
,
338
(
3–4
),
319
324
.
https://doi.org/10.1016/j.jhydrol.2007.02.029
.
McMillan
H.
,
Krueger
T.
&
Freer
J.
(
2012
)
Benchmarking observational uncertainties for hydrology: rainfall, river discharge and water quality
,
Hydrological Processes
,
26
(
26
),
4078
4111
.
https://doi.org/10.1002/hyp.9384
.
Mirjalili
S.
&
Hashim
S. Z. M.
(
2010
). '
A new hybrid PSOGSA algorithm for function optimization
',
2010 International Conference on Computer and Information Application
, pp.
374
377
.
https://doi.org/10.1109/ICCIA.2010.6141614
.
Moges
E.
,
Demissie
Y.
,
Larsen
L.
&
Yassin
F.
(
2020
)
Review: sources of hydrological model uncertainties and advances in their analysis
,
Water
,
13
(
1
),
28
.
https://doi.org/10.3390/w13010028
.
Moore
R. J.
(
2007
)
The PDM rainfall-runoff model
,
Hydrology and Earth System Sciences
,
11
(
1
),
483
499
.
https://doi.org/10.5194/hess-11-483-2007
.
Moradkhani
H.
,
Hsu
K.
,
Gupta
H.
&
Sorooshian
S.
(
2005
)
Uncertainty assessment of hydrologic model states and parameters: sequential data assimilation using the particle filter
,
Water Resources Research
,
41
(
5
),
2004WR003604
.
https://doi.org/10.1029/2004WR003604
.
Mostafaie
A.
,
Forootan
E.
,
Safari
A.
&
Schumacher
M.
(
2018
)
Comparing multi-objective optimization techniques to calibrate a conceptual hydrological model using in situ runoff and daily GRACE data
,
Computational Geosciences
,
22
(
3
),
789
814
.
https://doi.org/10.1007/s10596-018-9726-8
.
Nash
J. E.
&
Sutcliffe
J. V.
(
1970
)
River flow forecasting through conceptual models part I – a discussion of principles
,
Journal of Hydrology
,
10
(
3
),
282
290
.
https://doi.org/10.1016/0022-1694(70)90255-6
.
Nielsen
S. A.
&
Hansen
E.
(
1973
)
Numerical simulation of the rainfall-runoff process on a daily basis
,
Hydrology Research
,
4
(
3
),
171
190
.
Nott
D. J.
,
Marshall
L.
&
Brown
J.
(
2012
)
Generalized likelihood uncertainty estimation (GLUE) and approximate Bayesian computation: what's the connection?
,
Water Resources Research
,
48
(
12
),
2011WR011128
.
https://doi.org/10.1029/2011WR011128
.
Onyutha
C.
(
2022
)
A hydrological model skill score and revised R-squared
,
Hydrology Research
,
53
(
1
),
51
64
.
https://doi.org/10.2166/nh.2021.071
.
Onyutha
C.
(
2024a
)
Pros and cons of various efficiency criteria for hydrological model performance evaluation
,
Proceedings of IAHS
,
385
,
181
187
.
https://doi.org/10.5194/piahs-385-181-2024
.
Onyutha
C.
(
2024b
)
Randomized block quasi-Monte Carlo sampling for generalized likelihood uncertainty estimation
,
Hydrology Research
,
55
(
3
),
319
335
.
https://doi.org/10.2166/nh.2024.136
.
Onyutha
C.
,
Amollo
C. J.
,
Nyende
J.
&
Nakagiri
A.
(
2021
)
Suitability of averaged outputs from multiple rainfall-runoff models for hydrological extremes: a case of River Kafu catchment in East Africa
,
International Journal of Energy and Water Resources
,
5
(
1
),
43
56
.
https://doi.org/10.1007/s42108-020-00075-4
.
Passino
K. M.
(
2002
)
Biomimicry of bacterial foraging for distributed optimization and control
,
IEEE Control Systems
,
22
(
3
),
52
67
.
https://doi.org/10.1109/MCS.2002.1004010
.
Porter, J. W. & McMahon, T. A. (1971) A model for the simulation of streamflow data from climatic records, Journal of Hydrology, 13, 297–324. https://doi.org/10.1016/0022-1694(71)90250-2.
Powell, M. J. D. (1964) An efficient method for finding the minimum of a function of several variables without calculating derivatives, The Computer Journal, 7 (2), 155–162. https://doi.org/10.1093/comjnl/7.2.155.
Qi, W., Zhang, C., Fu, G., Sweetapple, C. & Liu, Y. (2019) Impact of robustness of hydrological model parameters on flood prediction uncertainty, Journal of Flood Risk Management, 12 (S1), e12488. https://doi.org/10.1111/jfr3.12488.
Rashedi, E., Nezamabadi-pour, H. & Saryazdi, S. (2009) GSA: a gravitational search algorithm, Information Sciences, 179 (13), 2232–2248. https://doi.org/10.1016/j.ins.2009.03.004.
Renard, B., Kavetski, D., Kuczera, G., Thyer, M. & Franks, S. W. (2010) Understanding predictive uncertainty in hydrologic modeling: the challenge of identifying input and structural errors, Water Resources Research, 46 (5), 2009WR008328. https://doi.org/10.1029/2009WR008328.
Rojas, R., Feyen, L. & Dassargues, A. (2008) Conceptual model uncertainty in groundwater modeling: combining generalized likelihood uncertainty estimation and Bayesian model averaging, Water Resources Research, 44 (12), 2008WR006908. https://doi.org/10.1029/2008WR006908.
Rosbjerg, D. & Madsen, H. (2005) Concepts of hydrologic modeling. In: Anderson, M. G. & McDonnell, J. J. (eds.) Encyclopedia of Hydrological Sciences, 1st ed. Hoboken, NJ: Wiley. https://doi.org/10.1002/0470848944.hsa009.
Rosenbrock, H. H. (1960) An automatic method for finding the greatest or least value of a function, The Computer Journal, 3 (3), 175–184. https://doi.org/10.1093/comjnl/3.3.175.
Sikorska, A. E., Scheidegger, A., Banasik, K. & Rieckermann, J. (2013) Considering rating curve uncertainty in water level predictions, Hydrology and Earth System Sciences, 17 (11), 4415–4427. https://doi.org/10.5194/hess-17-4415-2013.
Spear, R. & Hornberger, G. M. (1980) Eutrophication in Peel Inlet – II. Identification of critical uncertainties via generalized sensitivity analysis, Water Research, 14 (1), 43–49. https://doi.org/10.1016/0043-1354(80)90040-8.
Stedinger, J. R., Vogel, R. M., Lee, S. U. & Batchelder, R. (2008) Appraisal of the generalized likelihood uncertainty estimation (GLUE) method, Water Resources Research, 44 (12), 2008WR006822. https://doi.org/10.1029/2008WR006822.
Taylor, K. E. (2001) Summarizing multiple aspects of model performance in a single diagram, Journal of Geophysical Research: Atmospheres, 106 (D7), 7183–7192. https://doi.org/10.1029/2000JD900719.
Thiemann, M., Trosset, M., Gupta, H. & Sorooshian, S. (2001) Bayesian recursive parameter estimation for hydrologic models, Water Resources Research, 37 (10), 2521–2535. https://doi.org/10.1029/2000WR900405.
Twedt, T. M., Schaake, J. C. & Peck, E. L. (1977) National Weather Service extended streamflow prediction. In: Proceedings of the 45th Western Snow Conference, pp. 52–57.
Vrugt, J. A. (2016) Markov chain Monte Carlo simulation using the DREAM software package: theory, concepts, and MATLAB implementation, Environmental Modelling & Software, 75, 273–316. https://doi.org/10.1016/j.envsoft.2015.08.013.
Vrugt, J. A. & Robinson, B. A. (2007) Treatment of uncertainty using ensemble methods: comparison of sequential data assimilation and Bayesian model averaging, Water Resources Research, 43 (1), 2005WR004838. https://doi.org/10.1029/2005WR004838.
Walters, F. H., Parker, L. R., Jr., Morgan, S. L. & Deming, S. N. (1991) Sequential Simplex Optimization, Chemometrics Series. Boca Raton, FL: CRC Press.
Wang, Q. J. (1991) The genetic algorithm and its application to calibrating conceptual rainfall-runoff models, Water Resources Research, 27 (9), 2467–2471. https://doi.org/10.1029/91WR01305.
Weiler, M. & Beven, K. (2015) Do we need a community hydrological model?, Water Resources Research, 51 (9), 7777–7784. https://doi.org/10.1002/2014WR016731.
Willmott, C. J. (1981) On the validation of models, Physical Geography, 2 (2), 184–194. https://doi.org/10.1080/02723646.1981.10642213.
Woolhiser, D. A. (1996) Search for physically based runoff model – a hydrologic El Dorado?, Journal of Hydraulic Engineering, 122 (3), 122–129. https://doi.org/10.1061/(ASCE)0733-9429(1996)122:3(122).
Xiong, L. & O'Connor, K. M. (2008) An empirical method to improve the prediction limits of the GLUE methodology in rainfall–runoff modeling, Journal of Hydrology, 349 (1–2), 115–124. https://doi.org/10.1016/j.jhydrol.2007.10.029.
Yang, X.-S. (2010) A new metaheuristic bat-inspired algorithm. In: González, J. R., Pelta, D. A., Cruz, C., Terrazas, G. & Krasnogor, N. (eds.) Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), Vol. 284. Berlin, Heidelberg: Springer, pp. 65–74. https://doi.org/10.1007/978-3-642-12538-6_6.
Zhang, R., Liu, J., Gao, H. & Mao, G. (2018) Can multi-objective calibration of streamflow guarantee better hydrological model accuracy?, Journal of Hydroinformatics, 20 (3), 687–698. https://doi.org/10.2166/hydro.2018.131.
Zhao, H., Yang, X. & Jiang, Y. (2008) Assessment of Hydrological Model Structure Based on Parameter Identifiability, Vol. 322. Wallingford, UK: IAHS Publications, pp. 129–136.
Zin, I. (2002) Incertitudes et ambiguïté dans la modélisation hydrologique [Uncertainties and ambiguity in hydrological modelling]. PhD thesis, Institut National Polytechnique de Grenoble.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
