Parameter estimation for rainfall-runoff models in ungauged basins is a key aspect for a wide range of applications where streamflow predictions from a hydrological model can be used. The need for more reliable estimation of flow in data scarcity conditions is, in fact, thoroughly related to the necessity of reducing uncertainty associated with parameter estimation. This study extends the application of a Bayesian procedure that, given a generic rainfall-runoff model, allows for the assessment of posterior parameter distribution, using a regional estimate of ‘hydrological signatures’ available in ungauged basins. A set of eight catchments located in southern Italy was analyzed, and regionalized, and the first three L-moments of annual streamflow maxima were considered as signatures. Specifically, the effects of conditioning posterior model parameter distribution under different sets of signatures and the role played by uncertainty in their regional estimates were investigated with specific reference to the application of rainfall-runoff models in design flood estimation. For this purpose, the continuous simulation approach was employed and compared to purely statistical methods. The obtained results confirm the potential of the proposed methodology and that the use of the available regional information enables a reduction of the uncertainty of rainfall-runoff models in applications to ungauged basins.

## INTRODUCTION

The lack of streamflow observations and the consequent high uncertainty associated with model outputs, pose serious limitations to modelling the hydrological response of ungauged catchments. The calibration of rainfall-runoff models, particularly when applying conceptual-type models, in the common case that the catchment of interest is ungauged or poorly gauged (e.g. long periods of data are missing, or large gauging errors exist), is a key point to achieve reliable predictions and represents a challenging task for hydrological science (Efstratiadis *et al.* 2014).

Commonly used strategies generally can either transfer parameters calibrated on similar gauged catchments, or employ observable geo-morphoclimatic characteristics of watersheds to either directly infer values for model parameters or to derive regression equations relating parameters to selected catchment attributes (for example Viviroli *et al.* 2009; Grimaldi *et al.* 2012).

Recent alternative options propose the integration of all the available knowledge conveying hydrologically meaningful information in the calibration procedure, searching the parameter sets, or their distribution, that better reproduce this type of information (e.g. Winsemius *et al.* 2009). In this context, signature-based model calibration, involving the use of hydrological signatures that reflect the functional behaviour of the catchment, has shown significant developments in numerous studies (Montanari & Toth 2007; Yadav *et al.* 2007; Blazkova & Beven 2009; Bulygina *et al.* 2009; Castiglioni *et al.* 2010; Shafii & Tolson 2015), and is deemed to be suitable in ungauged basins application to constrain the model response and to reduce the uncertainty in parameter estimation.

This study extends the application of a procedure placed in the context of Bayesian inference that, given a particular model structure, allows for the assessment of posterior parameter distribution, using hydrological signatures of watershed response available for ungauged catchments (Bulygina *et al.* 2009, 2011). Specifically, we used model independent information, namely the regionalized first three L-moments of annual streamflow maxima related to observable catchment characteristics by means of regressive relationships (Biondi & De Luca 2015).

The case study is a set of eight catchments located in a poorly gauged region in southern Italy, on which alternative schemes for conditioning posterior parameter distributions and constraining hydrological predictions were tested, with specific reference to the use of a rainfall-runoff model for design flood estimation. Two main categories of methods using hydrological models for design flood estimation can be distinguished: event-based and continuous simulation approaches. In this work the continuous simulation approach (Boughton & Droop 2003) was used and compared to purely statistical methods.

The main objectives of this research are to assess the applicability of the proposed methodology in constraining posterior parameter distribution, also considering regional signatures uncertainty, and to compare continuous simulation and statistical methods for design flood estimation.

The ‘Methods’ section firstly provides an overview of the methodologies used in the subsequent applications: the Bayesian procedure for parameter conditioning, the description of the L-moments regionalization, the synthetic rainfall generator and the hydrological model for the rainfall-runoff transformation.

The investigated area, and the available observed data, together with the regional and sample L-moments estimates are described in the section ‘Case study’. The ‘Results and discussion’ section focuses on the resulting marginal distribution for each of the hydrological model parameters, and on the comparison of continuous simulation application and classic statistical approaches for flood frequency analysis. Final remarks are drawn in the ‘Conclusions’ section.

## METHODS

For a generic rainfall-runoff model, the applied methodology aims to restrict the space of hydrological model parameters , when sufficiently long streamflow time series for their estimation via calibration are typically not available, considering the available regional information about catchment response in a Bayesian framework (Bulygina *et al.* 2009, 2012).

The parameter conditioning procedure, the regionalization method, the rainfall generator and the hydrological model used for the design flood estimation, are briefly described in the following sub-sections. The reader can refer to Biondi & De Luca (2015) for further details.

### Parameter conditioning

According to the Bayes’ inference, a *prior* model parameter distribution , expressing the analyst's prior knowledge about , and due to the limited information available often deliberately taken as multi-uniform and with independent marginal functions, can be revised through a likelihood function *L* to obtain a posterior probability for parameters .

When streamflow observations are not available, regional estimates of hydrological signatures can be used to evaluate the likelihood function instead of recorded time series (Bulygina *et al.* 2009). Specifically, the adopted signature-based likelihood , defining the distribution of for a particular , provides a weight that is prescribed to each parameter set on the basis of the closeness of simulated signature to the corresponding, regional and model-independent, value (Wagener & Montanari 2011).

In this work, *L* is assumed to be proportional to a normal distribution with expected value and variance , the latter accounting for inaccuracies in regional estimation of signatures.

STEP 0 estimates the regional signature , and the corresponding variance , for the ungauged watershed from regional regression relationships.

STEP 1 randomly samples parameter sets of the selected hydrological model from the prior model parameter distribution and runs the model at the same ungauged site using a Monte Carlo approach (with available, sufficiently long, time series of input data).

STEP 2 evaluates the simulated signature for each run corresponding to a parameter set (in this case the L-moments for simulated streamflow annual maxima are calculated).

STEP 3 associates each parameter set, according to the corresponding , to a weight based on the likelihood measure computed using a normal distribution of parameters (,).

STEP 4 allows for derivation of posterior parameter distribution , according to Bayes’ theorem describing the updated knowledge about the model parameter vector given the available information about catchment response; the posterior parameter distribution is approximated through a discrete multivariate distribution with values defined by the sampled parameter sets and corresponding probabilities equal to the normalized prescribed weights.

As a concluding remark, parameter sets drawn from the posterior distribution can be employed for further model applications. Moreover, when multiple signatures are considered simultaneously, the dependencies among different sources of information are formally accounted for using a multivariate normal distribution and an intersignature error covariance structure to obtain a proper estimate of uncertainty (Almeida *et al.* 2012).

### Regional estimation of hydrological signatures (L-moments of annual streamflow maxima)

The hydrological signatures considered to restrict hydrological model parameters and predictions are the first three L-moments of annual streamflow maxima, namely the first order L-moment (), the L-moment coefficient of variation () and the L-skewness (). Regional relationships for these signatures are available for the study region from a previous investigation aimed at flood frequency analysis (Laio *et al.* 2011; Biondi *et al.* 2012).

Three regression models were calibrated considering 37 gauged basins in southern Italy, representative of a wide range of hydrological conditions, and more than 70 basin descriptors of geomorphologic and climatic characteristics as explanatory variables.

The applied methodology, i.e. the iterative Generalized Least Squares method (iGLS, Griffis & Stedinger 2007), implies a joint estimation of the regression coefficients and of the model error variance (); the latter, being constant for all the basins, in combination with uncertainties related to sampling errors, enables the evaluation of the regional estimator variance (Reis *et al.* 2005). Table 1 reports the regression equations considering the identified optimal subset of explanatory variables; Table 1 also contains the indication of the model error variance .

Equation | Model error variance σ_{δ}^{2} |
---|---|

0.223 | |

0.010 | |

0.031 |

Equation | Model error variance σ_{δ}^{2} |
---|---|

0.223 | |

0.010 | |

0.031 |

is the latitude of the centre of the basin (m), is the of annual maxima of rainfall heights with a duration of six hours, *A* is the extension of basin (km^{2}), *H _{m}* is the mean catchment elevation (m a.s.l.),

*LC_4*is the percentage of non-vegetated areas,

*LC_1*is the percentage of urbanized areas,

*IPSO_INTERQ*is the interquartile range of the hypsographic curve (m),

*SLDP*is the slope of the longest drainage path (%),

*n*is the exponent of Amount-Duration-Frequency (ADF) curve for mean values of annual rainfall maxima, PERM_BA&MOBA is an indicator of the catchment permeability.

### Rainfall generator and hydrological model

The sub-daily rainfall series, suitable as input for the rainfall-runoff model, were obtained by using the two-stage rainfall generation, described in Biondi & De Luca (2015), which is composed of the following two steps:

generation of daily rainfall series; and

downscaling of the series obtained from the previous step, in order to obtain sub-daily rainfall values.

The parameters of the daily rainfall generator are estimated on the longer daily sequence, while the downscaling scheme is calibrated on shorter fine-scale records. A single-site copula based approach (Sirangelo *et al.* 2007; Serinaldi 2009) is adopted to generate synthetic daily rainfall sequences. The disaggregation of generated daily series into sub-daily rainfall heights is then carried out using a specific downscaling scheme tested on southern Italy in De Luca (2014): it consists of a microcanonical model (Molnar & Burlando 2005), with parameters that depend on rainfall heights at coarser resolutions, while dependency on time scale is different from one month to another.

A simple, lumped, conceptual rainfall-runoff model that couples the Soil Conservation Service-Curve Number (SCS-CN) method (United States Department of Agriculture (USDA)-SCS 1964) and the Nash cascade unit hydrograph (Nash 1957) is used in this study. It has been pointed out in several studies (e.g. Michel *et al.* 2005; Grimaldi *et al.* 2013) that the SCS-CN method suffers from many weaknesses when applied as an infiltration model at sub-daily time resolution: the method, indeed, is a conceptual model, supported by empirical data, originally developed for the calculation of total depth loss for a given storm event. Despite more suitable approaches being available, the CN method has been applied here due to its widespread use in the investigated region for flood design estimation, owing to its relative simplicity and its reliance on a limited number of parameters.

In the adopted configuration, the hydrological model has four parameters to be calibrated: the curve number *CN*, the initial abstraction ratio *λ*, the number of linear reservoirs *n* and their storage coefficient *k* [T]. It is worth mentioning that the continuous implementation of the SCS-CN method, used in this study, requires an additional parameter, namely an event separation time, which is set equal to the concentration time of the basin. The effect of antecedent moisture conditions, estimated as a function of the five-day antecedent rainfall amount and season category (dormant and growing seasons), has been considered to adjust the *CN* for storm to storm variation according to the NEH-4 tables (National Engineering Handbook, Section 4 — Hydrology, USDA-SCS 1964).

## CASE STUDY

The region is characterized by a Mediterranean climate, with rainy periods mainly coinciding with autumn and winter months while summers are hot and dry, strongly affecting the seasonal runoff cycle of the streams. The selected catchments range in size from 27 to 537 km^{2}; Table 2 summarizes the main catchments characteristics. By closely inspecting Table 2, it can be noted that the study catchments are characterized by different runoff producing capability as inferred from the CN values ranging from 50 (Coscile at Camerata catchment) to 76 (Alli at Orso and Alaco at Pirrella catchments) considering standard antecedent wetness conditions. For the purposes of this study, daily and 20-min rainfall data as well as time series of streamflow annual maxima, were employed. The observation periods for peak discharge are comprised between the years 1927 and 2009.

Code | Catchment | Area (km^{2}) | Mean elevation (m a.s.l.) | Length of the longest drainage path (km) | Mean annual precipitation (mm) | CN (–) | n° annual streamflow maxima |
---|---|---|---|---|---|---|---|

#1 | Alli at Orso | 46.47 | 1,143.6 | 24.93 | 1,256.43 | 76 | 47 |

#2 | Alaco at Pirrella | 31.68 | 968.97 | 16.91 | 1,573.56 | 76 | 13 |

#3 | Corace at Grascio | 177.34 | 822 | 43.84 | 1,173.46 | 75 | 35 |

#4 | Coscile at Camerata | 274.5 | 748.91 | 32.34 | 1,037.44 | 50 | 44 |

#5 | Esaro at La Musica | 537.37 | 520.18 | 45.73 | 1,160.11 | 64 | 18 |

#6 | Esaro at San Francesco | 87.89 | 111.46 | 17.26 | 664.68 | 70 | 10 |

#7 | Tacina at Rivioto | 77.07 | 1,302.87 | 31.36 | 1,241.79 | 73 | 25 |

#8 | Turbolo at Mongrassano | 27.9 | 306.97 | 13.83 | 800.00 | 67 | 7 |

Code | Catchment | Area (km^{2}) | Mean elevation (m a.s.l.) | Length of the longest drainage path (km) | Mean annual precipitation (mm) | CN (–) | n° annual streamflow maxima |
---|---|---|---|---|---|---|---|

#1 | Alli at Orso | 46.47 | 1,143.6 | 24.93 | 1,256.43 | 76 | 47 |

#2 | Alaco at Pirrella | 31.68 | 968.97 | 16.91 | 1,573.56 | 76 | 13 |

#3 | Corace at Grascio | 177.34 | 822 | 43.84 | 1,173.46 | 75 | 35 |

#4 | Coscile at Camerata | 274.5 | 748.91 | 32.34 | 1,037.44 | 50 | 44 |

#5 | Esaro at La Musica | 537.37 | 520.18 | 45.73 | 1,160.11 | 64 | 18 |

#6 | Esaro at San Francesco | 87.89 | 111.46 | 17.26 | 664.68 | 70 | 10 |

#7 | Tacina at Rivioto | 77.07 | 1,302.87 | 31.36 | 1,241.79 | 73 | 25 |

#8 | Turbolo at Mongrassano | 27.9 | 306.97 | 13.83 | 800.00 | 67 | 7 |

Table 3 shows the standard deviations associated with both sample and regional values of the selected signatures for each study catchment: differences are particularly relevant for the first order moment (one order of magnitude for Coscile at Camerata, #4), mainly due to uncertainties related to sampling errors, given that the model error variance is equal to 0.223 (Table 1). However, variation coefficients of range from 0.51 to 0.69 for regional estimates and vary in a range of lower values, from 0.10 to 0.41, considering sample statistics.

Code | Catchment | (m^{3}/s) | (m^{3}/s) | (–) | (–) | (–) | (–) |
---|---|---|---|---|---|---|---|

#1 | Alli at Orso | 10.5 | 1.7 | 0.105 | 0.044 | 0.196 | 0.096 |

#2 | Alaco at Pirrella | 11.3 | 10.8 | 0.106 | 0.149 | 0.195 | 0.248 |

#3 | Corace at Grascio | 48.8 | 17.7 | 0.104 | 0.053 | 0.192 | 0.105 |

#4 | Coscile at Camerata | 83 | 7.7 | 0.106 | 0.047 | 0.205 | 0.093 |

#5 | Esaro at La Musica | 304.0 | 59.9 | 0.104 | 0.086 | 0.196 | 0.165 |

#6 | Esaro at San Francesco | 262.0 | 52.7 | 0.127 | 0.074 | 0.233 | 0.176 |

#7 | Tacina at Rivioto | 31.1 | 20.2 | 0.105 | 0.103 | 0.202 | 0.151 |

#8 | Turbolo at Mongrassano | 16.3 | 4.1 | 0.106 | 0.084 | 0.197 | 0.234 |

Code | Catchment | (m^{3}/s) | (m^{3}/s) | (–) | (–) | (–) | (–) |
---|---|---|---|---|---|---|---|

#1 | Alli at Orso | 10.5 | 1.7 | 0.105 | 0.044 | 0.196 | 0.096 |

#2 | Alaco at Pirrella | 11.3 | 10.8 | 0.106 | 0.149 | 0.195 | 0.248 |

#3 | Corace at Grascio | 48.8 | 17.7 | 0.104 | 0.053 | 0.192 | 0.105 |

#4 | Coscile at Camerata | 83 | 7.7 | 0.106 | 0.047 | 0.205 | 0.093 |

#5 | Esaro at La Musica | 304.0 | 59.9 | 0.104 | 0.086 | 0.196 | 0.165 |

#6 | Esaro at San Francesco | 262.0 | 52.7 | 0.127 | 0.074 | 0.233 | 0.176 |

#7 | Tacina at Rivioto | 31.1 | 20.2 | 0.105 | 0.103 | 0.202 | 0.151 |

#8 | Turbolo at Mongrassano | 16.3 | 4.1 | 0.106 | 0.084 | 0.197 | 0.234 |

In the following sections, the possible implications on model outputs of the uncertainty associated with the indices used for model parameter conditioning are also investigated.

## RESULTS AND DISCUSSION

### Marginal posterior distributions

In order to examine the effects of the adopted methodology, one catchment at a time was analysed and treated as ‘ungauged’ and results compared to those obtained when using available sample data in parameter distribution conditioning.

The four-dimensional prior parameter distribution was approximated by *N* = 1,000 parameter sets randomly sampled from a uniform parameter space using the Latin hypercube method. For each parameter set, the hydrological model was run using 10 years (2000–2009) of observed 20-min rainfall data as input. Each parameter set was, then, associated with a weight based on the closeness of simulated L-moments () to the corresponding regional value (), using a normal distribution as described in the section ‘Parameter conditioning’.

Several combinations of hydrological signatures were considered in the application (seven possible combinations: , , , -, -, -, --) obtaining different posterior parameter distributions.

For the sake of clarity, only the posterior distributions obtained for a single signature conditioning (, and ) and for the simultaneous combination -- are illustrated. As a benchmark against which results can be compared, prior uniform distributions are also depicted.

For every conditioning signatures combination, the parameters related to rainfall excess calculation, namely *CN* and *λ*, are those for which posterior marginal distributions more significantly differ from the prior ones. This evidence is generally valid for the overall set, although in a number of catchments (#1, #2, #3, #7), notable differences are also evident for routing parameters, *n* and *k*, particularly when considering the simultaneous combination of the three signatures, that in this case seems to be more effective for parameter identification. All these catchments are located on the western and inland parts of the study area, and consist of similar *CN* values.

The effect of adding more signatures to constrain model parameter distribution was assessed considering the performance of the mean of simulations in reproducing available time series of observed annual flow maximum and by using the average percentage error as a measure of accuracy. Unfortunately, simultaneous sub-daily series of rainfall data and discharge annual maxima are available only for basin #8. Looking at the results, not shown here, as expected the mean values of conditioned posterior distributions perform significantly better than that derived from the uniform distribution in reproducing peak flows, but it seems that passing from two to three regional signatures does not necessarily improve parameter identification. However, this analysis deserves further investigation.

*CN*one (not shown here for sake of brevity): indeed, according to Table 2, sample L-moments generally have lower uncertainties compared to the regional ones. Figure 5 shows the scatter plots of the posterior median values for each parameter obtained using the simultaneous combination of the L-moments for both regional and sample signatures. The best correlation between the pairs is observed for

*CN*median values: moreover, with only the exception of catchment #2, either considering regional signatures or sample signatures, the conditioning leads to high

*CN*median values, generally higher than those identified through the handbook tables (Table 2).

### Design flood estimation

For each catchment, the continuous simulation consists of: (1) stochastic generation of 500 years of 20-min synthetic rainfall data generated by the two-stage rainfall model; (2) application of the lumped rainfall-runoff model to generate synthetic discharge series; and (3) derivation of discharge values corresponding to assigned return periods from the empirical distribution of simulated annual maxima.

In order to assess the influence of different posterior parameter distributions on the *T*-year peak flow uncertainty, the continuous simulation method is applied within a Monte Carlo scheme (MCS).

As in Biondi & De Luca (2015), parameters were sampled from three notable distributions, each one being emblematic of a particular situation about data availability:

(1) a

**Prior**uniform distribution;(2) a posterior distribution conditioned on the combination of the three regional signatures, indicated in the following as

**Post-Reg**; and(3) a posterior distribution conditioned on the combination of the three sample signatures, indicated in the following as

**Post-Sample.**

*et al.*1984) was considered. Focusing on

*T*= 100 years, Figure 6 shows, for each catchment, box-plots derived from Monte Carlo simulations based on the three assumed parameter distributions. The blue and the black thick lines represent the average and the median values respectively, while the box-plot denotes the limits of the 50% and the whiskers indicate the 80% Monte Carlo confidence intervals. In each panel the quantile corresponding to the TCEV distribution fitted on the regional L-moments (TCEV-Reg, grey triangle) is plotted together with the quantile obtained when the TCEV distribution is estimated using sample L-moments (TCEV-Sample, black circle).

The results show that, by using Post-Reg and Post-Sample, there is a marked reduction of uncertainty band amplitudes with respect to the Prior distribution. An exception is represented by basin #6 (Esaro at Ponte San Francesco): for the Post-Sample case, the posterior distribution of *CN* presents high frequencies for values close to 100, and the marginal distributions for *n* and *k* are characterized by high frequencies for low values, which implies peak discharges are more sensitive to rainfall input values. Considering these effects, the box plot for the annual maximum reflected the variability of the rainfall input.

*N*is the number of basins,

_{c}*y*is the value which is considered as representative of simulations for the

_{i}*i*-th basin, and

*y*is the quantile obtained by using the TCEV distribution.

_{i,TCEV}In detail, both mean and median values are used for *y _{i}*, and both TCEV-Reg and TCEV-Sample are adopted for

*y*.

_{i,TCEV}Table 4 summarizes the values of MAE and AAPE indices. It is evident that, considering as reference either the mean or the median values from box-plots:

Post-Reg reproduces the quantile TCEV-Reg better than Prior; and

Post-Sample reproduces the quantile TCEV-Sample better than Prior, and in this case the values of AAPE are 50% (when mean value is considered) and 60% (when median value is adopted) of the corresponding values for Prior.

Prior vs TCEV-Reg | Prior vs TCEV-Sample | Post-Reg vs TCEV-Reg | Post-Sample vs TCEV-Sample | |||||
---|---|---|---|---|---|---|---|---|

MAE (m^{3}/s) | AAPE (–) | MAE (m^{3}/s) | AAPE (–) | MAE (m^{3}/s) | AAPE (–) | MAE (m^{3}/s) | AAPE (–) | |

Mean value | 272.6 | 0.7 | 229.5 | 0.6 | 204.3 | 0.6 | 194.0 | 0.3 |

Median value | 318.3 | 0.6 | 224.0 | 0.5 | 224.4 | 0.5 | 157.8 | 0.3 |

Prior vs TCEV-Reg | Prior vs TCEV-Sample | Post-Reg vs TCEV-Reg | Post-Sample vs TCEV-Sample | |||||
---|---|---|---|---|---|---|---|---|

MAE (m^{3}/s) | AAPE (–) | MAE (m^{3}/s) | AAPE (–) | MAE (m^{3}/s) | AAPE (–) | MAE (m^{3}/s) | AAPE (–) | |

Mean value | 272.6 | 0.7 | 229.5 | 0.6 | 204.3 | 0.6 | 194.0 | 0.3 |

Median value | 318.3 | 0.6 | 224.0 | 0.5 | 224.4 | 0.5 | 157.8 | 0.3 |

Overall, for Post-Sample, six basins out of eight have TCEV-Sample estimates that fall within the respective 80% confidence bands, while for Post-Reg, four basins present whisker-plots comprising TCEV-Reg estimates. The biased estimation and the simultaneous reduced width of the box-plots, which occur for some basins (for example see Alli basin, Figure 6(a)) can be mainly explained by the following:

It is possible that simultaneous conditioning on all three L-moments can provide a good reconstruction of the distribution shape of floods, but with a not negligible bias.

Moreover, very low values of standard deviations for all three L-moments (as in the case of sample estimates of the Alli basin) implies the parameter posterior distribution is characterized by a low variability, and therefore very similar values of simulated quantiles.

## CONCLUSIONS

In this study, the application of a Bayesian procedure to be adopted for deriving posterior distribution of rainfall-runoff model parameters in ungauged or poorly gauged catchments, was illustrated. Based on the available regional estimates for the L-moments of annual streamflow maxima we tested the procedure with respect to design flood estimation for the case study of a region in southern Italy.

The results provided by the conditioning performed on regional signature estimates have a notable effect mainly on the two parameters related to the rainfall excess calculation, i.e. *CN* and *λ*.

Regarding the possibility to estimate the design flood values, outcomes of a continuous simulation application, carried out following a MCS based on different conditioned posterior parameter distributions, were compared with results of direct and regional statistical frequency analysis on annual maximum flood series.

For assigned return periods, the results show that the signature based conditioning procedure is able to constrain simulated peak flows and to better reproduce the statistically derived design flood values, especially if compared to those obtained from the uniform prior distribution of model parameters, which represent the common case in practical applications.

Despite the computational limitations placed on the method, the applied methodology offers attractive perspectives to perform model calibration and uncertainty analysis based on the available regional information in ungauged locations.