Identification of distributed precipitation–runoff models for hourly runoff simulation based on transfer of full parameters (FP) and partial parameters (PP) is lacking for boreal mid-Norway. We evaluated a storage–discharge relationship-based model (Kirchmod), the Basic-Grid-Model (BGM) and a simplified Hydrologiska Byråns Vattenbalansavdelning (HBV) model on 26 catchments. A regional calibration objective function, which uses all streamflow records in the region, was used to optimize local calibration parameters for each catchment and regional parameters yielding maximum regional weighted average (MRWA) performance measures (PM). Based on regional median Nash–Sutcliffe efficiency (NSE) and NSEln (for log-transformed series) for the calibration and validation periods, the Kirchmod model performed better than the others. Parsimony of the Kirchmod model provided less parameter uncertainty for the FP case but did not guarantee parameter identifiability. Tradeoffs between parsimony and performance were observed for the PP case despite the advantage of parsimony in reducing parameter correlations; the PP case also requires preliminary sensitivity analysis to identify which parameters to transfer. There are potential advantages of using the MRWA method for parameter transfer in space. However, temporal validation indicated marked deterioration of the PM. The tradeoffs between parameter transfers in space and time substantiate the need for both spatial and temporal validation of the regional calibration methodology.

## INTRODUCTION

Continuous streamflow simulation by precipitation–runoff (P-R) models is widely employed for prediction purposes, for instance to predict streamflow to reservoirs, floods and droughts, and to assess effects of the alteration of the natural flow regime due to anthropogenic impacts. Moreover, utilization of hydropower reservoirs to satisfy peak energy demands (hydropeaking operation) requires streamflow forecasting at high temporal resolution. The European Water Framework Directive requirements for ecological protection further substantiate the need for better hydrological predictions for ecological impact management in regulated rivers. In addition, the prevalence of flood events associated with land use and climate change requires forecasting at high temporal resolution.

The current technology allows for measurements of environmental variables such as rainfall and streamflow with fine temporal resolution and a vast amount of sub-daily data from different sources may be available (see Jones 2005). However, the majority of previous studies on identification of the P-R models for continuous simulation and prediction purposes in the literature are based on a daily time scale, which leaves the potential high information content of available hourly data unexplored. Previous studies (e.g., Kavetski *et al.* 2011; Bastola & Murphy 2013) illustrated the dependence of optimal model parameters on the temporal resolution of data and substantial drawbacks of parameter transfer from daily calibration to prediction on an hourly time scale. Therefore, there is an interest in hourly calibration and prediction for operational use, which requires comprehensive study relevant to the research gaps on identification of suitable P-R models for the hourly prediction.

Wagener & McIntyre (2005) conducted a study on the identification of lumped conceptual rainfall–runoff models for operational applications based on daily streamflow on three catchments in the UK using the ‘split-sample’ and ‘proxy basin’ operational testing schemes of Klemeś (1986), and goodness-of-fit metrics for different flow ranges. Fenicia *et al.* (2011) used a flexible framework to identify model performance of several model structures for four different catchments in Europe and New Zealand. Smith & Marshall (2009) carried out model selection based on a suite of 30 conceptual, modular structures for a snow-dominated, mountainous experimental watershed in the USA using 12 hourly data. Orellana *et al.* (2008) applied seven semi-distributed rainfall–runoff model structures using hourly data from four gauging stations in the UK. However, these studies focused on coarse temporal resolutions and/or on a single catchment (with only one or more gauges) or a small number of catchments in a region rather than on fine temporal resolution (e.g., hourly) and multi-basin regional scale modelling based identification of the P-R algorithms.

There are also studies based on both multi-model and multi-basin simulations for both daily and hourly resolutions. Lee *et al.* (2005) conducted a study on the selection of 12 daily conceptual model structures for regionalization for prediction in ungauged basins (PUB) of the rainfall–runoff relationships for 28 UK catchments. Oudin *et al.* (2008, 2010) used two lumped models and daily streamflow records from a large number of catchments in France respectively for comparison of regionalization approaches for the PUB and for studying the relationships between physical similarity and hydrological similarity of catchments. Viviroli *et al.* (2009a, 2009b) conducted calibration for 140 mesoscale catchments for hourly flood prediction in ungauged Swiss catchments. However, the majority of the previous studies on multi-model calibration based on multi-basin data mainly focused on regionalization for the PUB rather than on the identification or performance evaluation of the models among alternative hydrological mechanisms as suggested by Jones (2005). An exception is the work by Perrin *et al.* (2001), who conducted a multi-model comparative performance assessment of 19 parsimonious to more complex daily lumped models on 429 catchments mostly located in France.

A thorough study of the identification of P-R models in simulation mode has the potential for improving forecast accuracy. Better performance of the precipitation–runoff models in simulation mode is crucial for forecast modes (see Refsgaard 1997; Bell & Moore 1998; Engeland & Steinsland 2014). In addition, the specific tools used in forecasting for data assimilation and correction affect the performance of a forecast (see Nicolle *et al.* 2014). Therefore, the review indicates that the previous work on hourly identification of P-R models based on multi-basin or regional calibration approach is lacking for boreal snow-dominated catchments. The use of regional scale data and hence data augmentation through the regional calibration is expected to allow more comprehensive performance evaluation than the at-site records-based local calibration and ‘proxy basin’-based model validation.

Identification of the P-R models depends on the objective functions used for model calibration and the performance measures used for model evaluation. For instance, fitting the P-R models to reproduce the whole hydrograph for scientific research or to reproduce a specific flow regime for operational purposes would result in different optimal parameter vectors. For operational applications, it is common practice to use the P-R models as ‘fit-for-purpose’ decision support tools. The adjustments commonly used to tune operational models towards ‘fit-for-purpose’ performance are error or bias correction parameters for precipitation measurements (e.g., Sevruk 1983; Yang *et al.* 1999; Herrnegger *et al.* 2014), but Moine *et al.* (2007) suggested that this practice should be avoided. In addition, an altitudinal gradient parameter for precipitation is considered in some applications, but Hingray *et al.* (2010) noted that omitting an altitudinal gradient is a good option to simulate flood events, especially in cases of large precipitation events. Such adjustments for operational settings have the potential to force the models to be ‘right for the wrong reasons’ (Kirchner 2006).

Therefore, comprehensive identification of the P-R models is required for reliable continuous simulation of streamflow (e.g., Wagener 2003). Hailegeorgis *et al.* (2015b) focused on multi-model-based identification of four different types of regionalization methods including the regional calibration method defined by parameter sets yielding maximum regional weighted average (MRWA) performance measures (PM) based on transfer of full set of local calibrated parameters (FP). The authors applied the three P-R models on 26 catchments in mid-Norway, which are also used in the present study. Due to similar performance of the regionalization methods based on the MRWA and transferring of regional median parameters (RMedP), the authors suggested that it is worth testing the performance of fixing some of the parameters to regional median values, for instance, the snow and runoff routing routines parameters that are common to the three models, and then perform calibration and transfer of partial parameters (PP). Fixing some of the parameters is advantageous since it allows a more parsimonious parameterization while it may have potential disadvantages of reducing the performance of the models. However, studies related to the issues of transferring the full parameter set or partial parameters are necessary to further improve the results of regionalization tasks.

The main objective of the present study is the identification of the three P-R models for hourly runoff simulation based on calibration and transfer of partial parameters (PP) for the 26 catchments in mid-Norway compared to a study for the same region using the full parameter calibration and transfer (FP) case of Hailegeorgis *et al.* (2015b).

## THE STUDY REGION AND DATA

The study region is boreal mid-Norway, which consists of 26 unregulated gauged catchments ranging from 39 to 3,090 km^{2} in size (Table 1 and Figure 1). Streamflow and climate records of hourly time resolution (01.09.2008–01.01.2012) were used for model calibration. The climate forcings are precipitation (P), temperature (T), wind speed (W_{s}), relative humidity (H_{R}) and global radiation (R_{G}). Figure 1 shows the locations of the precipitation and streamflow gauging stations. Table 1 contains some characteristics of the catchments and streamflow stations.

Catchment no. | NVE* station no. | Station altitude (masl) | Catchment area (km^{2}) | Drainage density (km/km^{2}) | Forest (% of area) | Mountains/Bare rock (% of area) | Marsh/Bog (% of area) | Lakes (% of area) |
---|---|---|---|---|---|---|---|---|
1 | 127.13 | 25 | 480 | 2.1 | 41.1 | 21.2 | 21.3 | 2.7 |
2 | 109.9 | 550 | 745 | 0.8 | 6.8 | 84.0 | 1.2 | 1.9 |
3 | 122.11 | 330 | 668 | 2.0 | 24.6 | 44.0 | 12.6 | 2.8 |
4 | 139.26 | 160 | 495 | 1.8 | 18.7 | 61.7 | 10.6 | 3.4 |
5 | 124.13 | 401 | 220 | 2.0 | 24.5 | 34.4 | 19.6 | 14.8 |
6 | 122.9 | 45 | 3,090 | 1.9 | 36.7 | 35.8 | 14.5 | 2.1 |
7 | 121.29 | 580 | 95 | 1.2 | 29.6 | 46.8 | 15.4 | 0.6 |
8 | 122.17 | 135 | 546 | 1.6 | 53.6 | 20.7 | 16.7 | 1.0 |
9 | 124.2 | 97 | 495 | 2.1 | 28.7 | 27.5 | 26.2 | 7.4 |
10 | 103.2 | 103 | 44 | 1.0 | 18.2 | 72.8 | 0.4 | 2.0 |
11 | 123.31 | 173 | 145 | 1.8 | 43.0 | 35.1 | 14.8 | 2.2 |
12 | 133.7 | 87 | 207 | 1.2 | 20.2 | 57.0 | 9.0 | 7.6 |
13 | 308.1 | 354 | 450 | 1.7 | 39.3 | 24.8 | 22.1 | 8.1 |
14 | 122.14 | 515 | 168 | 3.1 | 21.7 | 65.3 | 9.0 | 1.1 |
15 | 307.5 | 311 | 346 | 1.8 | 53.7 | 17.1 | 13.4 | 8.9 |
16 | 105.1 | 12 | 138 | 1.3 | 57.4 | 9.6 | 15.9 | 8.2 |
17 | 103.4 | 60 | 1,100 | 0.9 | 17.0 | 72.8 | 1.1 | 3.8 |
18 | 112.8 | 460 | 91 | 1.2 | 15.1 | 64.1 | 6.9 | 3.1 |
19 | 139.25 | 354 | 546 | 1.8 | 48.3 | 19.6 | 15.1 | 12.4 |
20 | 111.9 | 40 | 138 | 1.9 | 36.0 | 37.2 | 6.1 | 0.9 |
21 | 128.5 | 80 | 477 | 1.9 | 53.0 | 16.1 | 19.4 | 4.8 |
22 | 139.35 | 137 | 852 | 1.8 | 42.1 | 29.2 | 13.6 | 9.3 |
23 | 117.4 | 10 | 39 | 1.2 | 35.2 | 0.0 | 27.7 | 10.1 |
24 | 100.1 | 265 | 226 | 0.9 | 9.8 | 77.7 | 0.5 | 5.3 |
25 | 104.23 | 50 | 67 | 2.2 | 31.9 | 55.2 | 4.1 | 2.3 |
26 | 138.1 | 103 | 239 | 1.7 | 39.6 | 26.6 | 18.8 | 6.1 |

*Norges vassdrags-og energidirektorat (Norwegian Water Resources and Energy Directorate).

Precipitation occurs in the form of snowfall during winter and rainfall dominates during summer, spring and autumn. The catchments exhibit wide ranges of variations in elevation and terrain slope. There is no systematic relationship between elevation and mean annual precipitation for the region and hence we did not consider altitudinal gradient corrections for the hourly precipitation data. An environmental lapse rate of −0.65 °C/100 m was used to account for elevation–temperature relationship. The dominant land uses/land covers in the study area are mountainous terrain above timberline and forests. Predominant soil or loose material is glacial tills and the dominant bedrock types for the study catchments are metamorphic and igneous rocks (http://www.ngu.no).
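The elevation–temperature adjustment described above can be sketched as follows. This is a minimal illustration, assuming a station temperature is shifted linearly to the elevation of a grid cell; the function name and the example elevations are hypothetical, and only the lapse rate of −0.65 °C/100 m comes from the text.

```python
# Constant environmental lapse rate stated in the text (degrees C per 100 m).
LAPSE_RATE_C_PER_100M = -0.65


def adjust_temperature(t_station_c, z_station_m, z_cell_m,
                       lapse=LAPSE_RATE_C_PER_100M):
    """Shift a station temperature to a grid-cell elevation.

    Hypothetical helper: assumes a linear temperature change with
    elevation difference (cell elevation minus station elevation).
    """
    return t_station_c + lapse * (z_cell_m - z_station_m) / 100.0


if __name__ == "__main__":
    # A reading of 2.0 degrees C at 100 m, moved to a cell at 600 m.
    print(round(adjust_temperature(2.0, 100.0, 600.0), 2))
```

With these illustrative inputs, the 500 m rise lowers the temperature by about 3.25 °C, which is enough to move the cell below a rainfall–snowfall threshold near 0 °C.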

## MODELS AND METHODS

We evaluated three different distributed (1 × 1 km^{2} grid) precipitation–runoff models, namely the ‘top-down’ water balance model based on Kirchner (2009) or Kirchmod, the Basic-Grid-Model based on Bell & Moore (1998) or BGM and a simple configuration of the HBV model. Table 2 lists the calibrated parameters and their prior ranges, or the values of fixed parameters, for both the full parameter (FP) transfer and the partial parameter (PP) transfer of the present study. For the PP case, parameters that are common to the three models were fixed to the multi-model regional median or MMRMedP (Equation (9)) values of the respective parameters obtained from calibration of the FP case. Similarly, parameters in the soil moisture accounting routine of the HBV model and the exponent parameter of the subsurface drainage equation (Equation (6)) of the BGM model were fixed to their regional median or RMedP (Equation (7)) values. A total of six, seven and nine parameters were calibrated for the FP case for the Kirchmod, BGM and HBV models, respectively. A total of three parameters were calibrated for the PP case for all models. Therefore, for the PP case a total of three, four and six parameters of the Kirchmod, BGM and HBV models, respectively, were fixed. Brief descriptions of the models are given here. More detailed descriptions of the models are given in Hailegeorgis *et al.* (2015b).
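The two kinds of fixed values used for the PP case can be illustrated with a short sketch. RMedP is taken here as the median of the calibrated values of one parameter over the catchments. For MMRMedP, the sketch assumes the median is taken over the pooled values from all three models and all catchments; that pooling is an assumption, as are the function names and the sample numbers.

```python
from statistics import median


def rmedp(values_per_catchment):
    """Regional median parameter: median of the locally calibrated
    values of a single parameter across the catchments."""
    return median(values_per_catchment)


def mmrmedp(values_per_model):
    """Multi-model regional median parameter.

    values_per_model maps a model name to the list of calibrated
    values (one per catchment) of a parameter shared by the models.
    Assumption: values are pooled over models before taking the median.
    """
    pooled = [v for values in values_per_model.values() for v in values]
    return median(pooled)


if __name__ == "__main__":
    # Made-up calibrated values for three catchments.
    print(rmedp([2.0, 2.5, 2.35]))
    print(mmrmedp({"Kirchmod": [1.0, 2.0],
                   "BGM": [3.0, 4.0],
                   "HBV": [5.0, 6.0]}))
```

Fixing a parameter to such a median removes it from the calibration, which is how the PP case reduces each model to three free parameters.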

Parameters | Unit | Full parameters (FP): prior range | Partial parameters (PP): prior range or fixed value |
---|---|---|---|
Kirchmod | | | |
EvR | – | [0.1–6.15] | [0.1–6.15] |
b_{0} | – | [−8–0] | [−8–0] |
b_{1} | – | [−1–1] | [−1–1] |
BGM | | | |
S_{max} | mm | [150–1,000] | [150–1,000] |
I_{c} | mm h^{−1} | [0.1–40] | [0.1–40] |
k | mm^{1−n} h^{−1} | [10^{−7}–10^{−3}] | [10^{−7}–10^{−3}] |
n | – | [0.2–5] | 2.35 |
HBV | | | |
k_{1} | d^{−1} | [0.001–1.5] | [0.001–1.5] |
k_{0} | d^{−1} | [0.0005–0.5] | [0.0005–0.5] |
PERC | mm d^{−1} | [0–6] | [0–6] |
FC | mm | [50–600] | 307 |
LP | – | [0.5–0.99] | 0.90 |
β | – | [0.5–5] | 1.14 |
Parameters common to the three models | | | |
TX | °C | [−3–2] | −0.82 |
WS | – | [1–6] | 4.35 |
V | m s^{−1} | [0.25–3.5] | 1.78 |

### Kirchner's runoff response routine (Kirchmod)

The discharge *Q* depends solely on the amount of catchment water storage (*S*) through a nonlinear catchment storage–discharge relationship and a water balance equation, where *g*(*Q*) = *dQ*/*dS* is the discharge sensitivity function (Kirchner 2009). A linear regression relationship was inferred from streamflow recession analysis following Kirchner (2009). The actual evapotranspiration (AET) was computed from potential evapotranspiration (PET) and discharge, where the AET, infiltration (*I*) = rainfall + snowmelt (SM) and *Q* are in mm/hr, *S* is in mm and *t* is a time variable. *EvR* denotes the discharge at which AET equals 0.95 × PET. SCA is the snow-covered fraction of a grid cell, which is used to set the AET to zero for snow-covered areas. A fourth-order Runge–Kutta method was used to solve the integral (Equation (2)) over the time step. *Q* is an instantaneous simulated discharge obtained from the solver, while an average *Q* over the time step is used for calibration against the hourly averaged observed discharge. Observed discharge before the start of the model run was used as the initial discharge for the numerical solver. Only the three response routine parameters *b*_{0}, *b*_{1} and *EvR* were calibrated for the PP case.
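A single time step of this routine can be sketched as below. The sketch uses the Kirchner (2009) formulation dQ/dt = g(Q)(I − AET − Q). Two forms are assumptions chosen to be consistent with the definitions above rather than taken from the text: the log-linear sensitivity function ln g(Q) = b0 + b1 ln Q, and the AET expression PET (1 − exp(−3Q/EvR))(1 − SCA), which gives AET ≈ 0.95 PET at Q = EvR. All function names and the example parameter values are hypothetical.

```python
import math

Q_MIN = 1e-9  # floor that keeps log(Q) defined (illustrative choice)


def g(q, b0, b1):
    """Discharge sensitivity dQ/dS, assumed log-linear in ln(Q)."""
    return math.exp(b0 + b1 * math.log(q))


def aet(pet, q, evr, sca=0.0):
    """Assumed AET form: about 0.95 * PET when Q equals EvR,
    and zero on the snow-covered fraction of the cell."""
    return pet * (1.0 - math.exp(-3.0 * q / evr)) * (1.0 - sca)


def dq_dt(q, infil, pet, b0, b1, evr, sca=0.0):
    """Right-hand side dQ/dt = g(Q) * (I - AET - Q), fluxes in mm/hr."""
    q = max(q, Q_MIN)
    return g(q, b0, b1) * (infil - aet(pet, q, evr, sca) - q)


def rk4_step(q, infil, pet, b0, b1, evr, sca=0.0, dt=1.0):
    """Advance Q by one time step (dt in hours) with classical RK4,
    holding the forcing constant over the step."""
    def f(x):
        return dq_dt(x, infil, pet, b0, b1, evr, sca)

    k1 = f(q)
    k2 = f(q + 0.5 * dt * k1)
    k3 = f(q + 0.5 * dt * k2)
    k4 = f(q + dt * k3)
    return max(q + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0, Q_MIN)


if __name__ == "__main__":
    # Recession with no input: discharge should decline slowly.
    print(rk4_step(q=0.5, infil=0.0, pet=0.0, b0=-3.0, b1=0.5, evr=1.0))
```

The sketch returns the instantaneous discharge at the end of the step; averaging over the step for comparison with hourly observations, as described above, is not covered here.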

### Basic Grid Model (BGM) runoff response routine

Infiltration excess runoff, *R*_{iex} [L] (Horton 1933), saturation excess runoff, R [L] (Dunne & Black 1970a, 1970b) and subsurface drainage (D_{rv}) runoff generation mechanisms are considered, where SNOWOUT [L] is the rainfall and snowmelt outflow from the snow routine, TOSOIL [L] is the infiltration into the soil, TOSTORAGE [L] is the net input to the subsurface storage (S [L]), PET [L] and AET [L] are as defined earlier, D_{rv} [L] is the subsurface flow or drainage per unit area, and L and T denote length and time dimensions. The infiltration capacity *I*_{c} [L/T], the coefficient *k* [L^{1−n}/T] and the maximum subsurface storage capacity *S*_{max} [L] were calibrated parameters. Since marked correlation between the *k* and *n* [–] parameters was observed for the FP case in Hailegeorgis *et al.* (2015b), in the present study the parameter *n* was fixed to its calibrated RMedP (Equation (7)) value of the FP case to reduce the correlation and non-identifiability between the two parameters, where RMedP denotes the regional median parameter, *P*_{1} to *P*_{NC} denote the calibrated values of the parameter for each catchment and *N*_{C} is the total number of catchments calibrated.

### The HBV runoff response routines

In the HBV runoff response routine, *Q*_{UZ} and *Q*_{LZ} are the outflows from the upper and lower reservoirs, respectively. Percolation from the upper to the lower reservoir in the runoff response routine is controlled by the percolation parameter (*PERC*). The soil moisture accounting routine was based on a non-linear partitioning curve for infiltration into change in soil moisture storage (*ΔSM*) and recharge (R) to the upper zone (Bergström 1976). Only the three parameters of the runoff response routine, namely the recession coefficient of the upper reservoir (*k*_{1}), the base flow recession coefficient (*k*_{0}) and the percolation rate to the lower zone (*PERC*), were calibrated for the PP case. Two of the soil moisture accounting parameters, namely the shape parameter of the partitioning curve (*β*) and the field capacity (*FC*), were fixed to the RMedP (Equation (7)) values calibrated for the FP case (Table 2). The ‘limit for potential evaporation’ (*LP*) was set to a constant value of 0.90, which is a default value of HBV-96 (Booij 2005).

### Snow accounting routine

The snow accounting routine computes the outflow from the snow routine (*Qs*) and the remaining unmelted snow storage, or snow water equivalent (SWE), based on the Gamma-distributed snow depletion curve (SDC). The SDC uses radiation for surface layer energy and phase change calculations (Kolberg & Gottschalk 2006) as implemented in the *ENKI* hydrological modelling platform (Kolberg & Bruland 2012). The parameters in this routine are common to the three models and include the rainfall–snowfall threshold temperature (*TX*) and the snowmelt sensitivity to wind speed (*WS*). These parameters were fixed to the MMRMedP values (Equation (9)) of the respective parameters calibrated for the FP case, where *M*_{1}, *M*_{2} and *M*_{3} denote the Kirchmod, BGM and HBV models, respectively.

### Potential evapotranspiration routine

PET was computed with the Priestley–Taylor method, where *α* is the Priestley–Taylor constant, Δ is the slope of the saturation vapour pressure curve at air temperature at 2 m (kPa/°C), *γ* is the psychrometric constant (0.066 kPa/°C), *R*_{n} (W/m^{2}) is net radiation, *L*_{v} (kJ/m^{3}) is the volumetric latent heat of vaporization and *Δt* (s) is the simulation time step in seconds. The net radiation is the sum of net shortwave radiation and net longwave radiation. We computed the net shortwave radiation from the global radiation (R_{G}) and land albedo, and the net longwave radiation based on Sicart *et al.* (2006). Following Teuling *et al.* (2010), *α* = 1.26 was used to reduce the number of calibrated parameters.
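The quantities defined above combine into a PET depth per time step as sketched below, assuming the standard Priestley–Taylor form PET = α · Δ/(Δ + γ) · R_n Δt / L_v. With R_n in W/m^{2}, Δt in s and L_v in kJ/m^{3}, the ratio comes out directly in mm. The default L_v of 2.45 × 10^{6} kJ/m^{3}, the clipping of negative net radiation to zero and the function name are assumptions for illustration.

```python
def priestley_taylor_pet(delta, r_net, dt_s=3600.0, alpha=1.26,
                         gamma=0.066, l_v=2.45e6):
    """Potential evapotranspiration depth (mm) over one time step.

    delta : slope of saturation vapour pressure curve (kPa/degC)
    r_net : net radiation (W/m2)
    dt_s  : time step length (s)
    alpha : Priestley-Taylor constant (1.26 in the text)
    gamma : psychrometric constant (0.066 kPa/degC in the text)
    l_v   : volumetric latent heat of vaporization (kJ/m3); the
            default is an assumed typical value, not from the text
    """
    # Energy received during the step (J/m2); negative net radiation
    # is clipped to zero here as a simplifying assumption.
    energy = max(r_net, 0.0) * dt_s
    # (J/m2) / (kJ/m3) = 1e-3 m, i.e. the ratio is already in mm.
    return alpha * delta / (delta + gamma) * energy / l_v


if __name__ == "__main__":
    # Illustrative inputs: delta = 0.145 kPa/degC, R_n = 200 W/m2, 1 h.
    print(round(priestley_taylor_pet(0.145, 200.0), 3))
```

For these illustrative inputs the result is roughly a quarter of a millimetre per hour, a plausible order of magnitude for a clear summer hour.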

### Runoff routing

Hailegeorgis *et al.* (2015a) applied a source-to-sink routing with effective velocity of flow for mountainous catchments in mid-Norway. Li *et al.* (2014) applied cell-to-cell routing and source-to-sink routing with spatially distributed velocity of flow for mountainous catchments in central southern Norway. Following Hailegeorgis *et al.* (2015b), a simple translation based on 1-hr travel time isochrones was used to translate the runoff response from the hillslope (1 × 1 km^{2} grid cells) to the catchment outlet. Routed simulated streamflow at the outlet is the sum of contributions from each grid cell, where *t* and *i* represent time and grid cells, *N* is the number of grid cells in the catchment, *Qsim* [L^{3}T^{−1}] is streamflow at the outlet, *qsim* [L^{3}T^{−1}] is runoff generated at each grid cell, *T*_{i} [T] is the flow travel time lag to the outlet for each grid cell and *L*_{i} [L] is the flow travel path length computed from a 25 m digital elevation model (DEM). *V* [LT^{−1}] is the velocity of flow, which is a parameter common to the three models and was fixed to the MMRMedP (Equation (9)) of calibrated values for the FP case.
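The translation described above can be sketched as a lag-and-sum over grid cells. The sketch assumes that the travel time of each cell is T_i = L_i / V and that it is binned into whole time steps by truncation, which is one possible reading of the 1-hr isochrones; the function name and the example inputs are hypothetical.

```python
def route_to_outlet(cell_runoff, path_length_m, velocity_ms, dt_s=3600.0):
    """Sum lagged grid-cell runoff series at the catchment outlet.

    cell_runoff   : list of runoff series, one list per grid cell
    path_length_m : flow path length to the outlet for each cell (m)
    velocity_ms   : single flow velocity V shared by all cells (m/s)
    dt_s          : time step length (s), one hour by default
    """
    n_steps = len(cell_runoff[0])
    outlet = [0.0] * n_steps
    for series, length in zip(cell_runoff, path_length_m):
        # Travel time T_i = L_i / V, truncated to whole time steps
        # (assumed isochrone binning).
        lag = int((length / velocity_ms) // dt_s)
        for t in range(lag, n_steps):
            outlet[t] += series[t - lag]
    return outlet


if __name__ == "__main__":
    # Two cells: one at the outlet, one 7,200 m upstream, V = 1 m/s.
    runoff = [[1.0, 0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0, 0.0]]
    print(route_to_outlet(runoff, [0.0, 7200.0], 1.0))
```

Because a single velocity applies to all cells, a larger *V* shifts every contribution earlier, which is why this parameter interacts with the response routine parameters during calibration.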

### Model calibration and evaluation

The DREAM algorithm (Vrugt *et al.* 2009) was used with a residuals-based log-likelihood (*L-L*) objective function, which was implemented in the *ENKI* hydrological modelling platform (Kolberg & Bruland 2012), where *δ* denotes the model parameters, *σ*_{i}^{2} and *n*_{i}, respectively, are the error variance and the length of non-missing records of streamflow for catchment *i*, *N*_{C} is the total number of catchments in the region, *Qsim*^{(θ)} and *Qobs*^{(θ)}, respectively, are the Box–Cox (Box & Cox 1964) transformed simulated and observed streamflow time series, *θ* is the Box–Cox transformation parameter and *f* represents a fraction of effectively independent observations, which can be estimated from the autoregressive (AR1) model of error covariance (Zieba 2010). We used the Box–Cox transformation to approximate normality and homoscedasticity of the residuals. Values of *θ* between 0.25 and 0.30 are common in the literature (e.g., Willems 2009). We used *θ* = 0.3 and *f* = 0.001 for the sake of consistency among the catchments. The DREAM calibration was considered to have converged when the Gelman–Rubin convergence statistic (Gelman & Rubin 1992) fell below 1.2. Details of the DREAM algorithm can be found in Vrugt *et al.* (2009).
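The convergence check mentioned above can be illustrated with a basic form of the Gelman–Rubin statistic for one parameter. This is a generic textbook sketch, not the DREAM or ENKI implementation: it compares the between-chain and within-chain variances of equally long chains, and the function name and sample chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains : list of equally long lists of sampled values,
             one list per Markov chain.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B/n: sample variance of the chain means.
    between_over_n = variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    apart = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
    # Chains sampling the same region stay below the 1.2 threshold.
    print(gelman_rubin(mixed) < 1.2, gelman_rubin(apart) < 1.2)
```

Values close to 1 indicate that the chains are sampling the same distribution; the threshold of 1.2 is the one stated in the text.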

We evaluated the local and regional calibration based on the Nash–Sutcliffe efficiency or NSE (Nash & Sutcliffe 1970) and the Nash–Sutcliffe efficiency for log-transformed series (NSEln) performance measures (PM). The NSE gives greater weight to high flows and the NSEln gives greater weight to low flows.
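The two performance measures can be computed as in the following sketch, which uses the standard Nash–Sutcliffe definition and applies it to natural-log-transformed series for NSEln. It assumes paired, strictly positive, non-missing values; the handling of zero flows and missing records is not covered, and the function names are illustrative.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def nse_ln(obs, sim):
    """NSE computed on natural-log-transformed flows (flows must be > 0)."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    # A perfect simulation scores 1; simulating the observed mean scores 0.
    print(nse(observed, observed), nse(observed, [2.0, 2.0, 2.0]))
```

The log transform compresses peaks, so errors at low flows contribute relatively more to NSEln than to NSE.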

Following Hailegeorgis *et al.* (2015b), the objective function in Equation (12) uses streamflow data from all stations in the region rather than at-site streamflow records from only a particular site. Therefore, parameter sets among the DREAM samples that provide maximum performance measures (PM) for each catchment are taken as optimized parameters for local calibration (LC) for a specific catchment. Optimal parameter sets for the regional calibration are parameter sets among the DREAM samples that provided MRWA performance measures. In the present study, the terms regional calibration and MRWA are used interchangeably. Hailegeorgis *et al.* (2015b) reported nearly equivalent performance of the MRWA method to more advanced regionalization methods like the physical similarity and spatial proximity methods. In the present study, the MRWA is used to evaluate the regional performance and hence the performance of the models for prediction in ungauged basins. We allocated the weight for each catchment based on the length of its non-missing streamflow records during the calibration period, where *N*_{TS} is the total length of time series for the calibration period. The weights for each catchment are the term in the parentheses, which are assigned based on the length of their non-missing streamflow records.

The classical split-sample test (Klemeś 1986) was used for validation of the models (for FP and PP cases) outside the period used for calibration based on NSE for both local and regional calibration, and NSEln for regional calibration. Due to the lack of long-term records, a validation period of only one year (01.01.2006–01.01.2007) was used. The regional calibration used in the present study is similar to other regional calibration works, among others Fernandez *et al.* (2000), Beldring *et al.* (2003) and Engeland *et al.* (2006), except that weighted average performance measures are used rather than arithmetic averages for model evaluations. Model validation for this type of regional calibration is not common in the literature. However, Beldring *et al.* (2003) used a hierarchical scheme for model validation (Klemeś 1986), which distinguishes between simulations performed for the catchment used for calibration and for a different catchment, noting that the scheme is more adequate than the split-sample scheme using streamflow data from the same catchment during both calibration and validation.
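The selection of local and regional (MRWA) parameter sets from a common pool of samples can be sketched as follows. The sketch assumes a table of performance measures with one row per sampled parameter set and one column per catchment, and weights proportional to the length of non-missing records, normalized here by their sum. That normalization is an assumption; it does not change which sample ranks highest. All names and numbers are hypothetical.

```python
def weighted_average(pm_row, n_records):
    """Record-length-weighted average of per-catchment performance."""
    total = sum(n_records)
    return sum(pm * n / total for pm, n in zip(pm_row, n_records))


def select_mrwa(pm, n_records):
    """Index and score of the sample with the maximum regional
    weighted average performance.

    pm        : list of rows, one per sampled parameter set,
                each holding one performance value per catchment
    n_records : non-missing record length for each catchment
    """
    scores = [weighted_average(row, n_records) for row in pm]
    best = max(range(len(pm)), key=scores.__getitem__)
    return best, scores[best]


def select_local(pm):
    """For each catchment, index of the sample that performs best there."""
    n_catchments = len(pm[0])
    return [max(range(len(pm)), key=lambda s: pm[s][c])
            for c in range(n_catchments)]


if __name__ == "__main__":
    # Two sampled parameter sets evaluated on two catchments (made-up NSE).
    pm_table = [[0.9, 0.1],
                [0.5, 0.6]]
    records = [100, 300]
    print(select_mrwa(pm_table, records), select_local(pm_table))
```

In this toy case the first sample is locally best for the first catchment, while the second sample wins regionally because the second catchment has the longer record.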

We used histograms or distribution fits (e.g., Schoups & Vrugt 2010) and linear correlation coefficient matrix of the posterior parameters (e.g., Moreda *et al.* 2006; Blasone *et al.* 2007; Schoups & Vrugt 2010) to show parameter uncertainty and identifiability. The last 50% of the posterior parameters accepted by the DREAM algorithm after the burn-in iterations (Vrugt *et al.* 2009) were used to construct the histograms of posterior parameters and to calculate the correlation coefficients among the posterior parameters. Burn-in iteration refers to discarding an initial portion of the samples to minimize the effects of initial conditions (Hailegeorgis & Alfredsen 2015). Hailegeorgis & Alfredsen (2015) provided more details of the DREAM algorithm used in the present study.
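The correlation analysis described above can be sketched as a Pearson correlation matrix over the last half of the retained samples. The sketch assumes the burn-in samples have already been discarded and that samples are stored in sampling order as equal-length parameter vectors; the function names and sample values are illustrative.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def posterior_correlation(samples):
    """Correlation matrix of parameters from the last 50% of samples.

    samples : list of parameter vectors in sampling order,
              with burn-in already removed (assumption).
    """
    kept = samples[len(samples) // 2:]
    columns = list(zip(*kept))  # one tuple of values per parameter
    return [[pearson(a, b) for b in columns] for a in columns]


if __name__ == "__main__":
    draws = [(9.0, 9.0), (0.0, 5.0), (1.0, 1.0),
             (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    matrix = posterior_correlation(draws)
    print(round(matrix[0][1], 3))
```

Off-diagonal entries close to ±1 point to parameter pairs that compensate for each other, which is the kind of interaction reported for *k* and *n* in the BGM model.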

## RESULTS

Figure 2(a)–2(c) and Figure 3(a)–3(c) display the performance of the LC and MRWA of the models for the NSE and NSEln, respectively. For many catchments, the performance of the three models is similar, but for some catchments (e.g., catchment 15) the HBV model performed markedly better than the others. Fixing some of the parameters to their RMedP and MMRMedP values for the PP case reduces performance in exchange for parsimony, because the larger number of free parameters favours calibration performance for the FP case. The LC performance of the FP is better than that of the PP for all catchments for the three models. For the MRWA, the NSE values of the FP are higher than those of the PP for the majority of the catchments, except for catchments 2, 12 and 17 for the Kirchmod and BGM models, and catchments 2, 13 and 19 for the HBV model (Figure 2). This may be related to different levels of model performance sensitivity to the fixed parameters among the catchments. Generally, the MRWA for the FP case performed better than the PP case for individual catchments.

Similarly, the NSEln values of the FP are higher than that of the PP except for slightly higher NSEln values for some catchments, for instance catchment 2 for the Kirchmod and BGM models. Table 3 shows the regional median values of the PM or the regional performance of the models. In terms of the regional median of the NSE corresponding to the LC and MRWA, the Kirchmod model followed by the BGM model performed better than the HBV model (Table 3). However, the NSE for the Kirchmod and BGM are nearly similar for the FP case. In terms of the regional median of the NSEln corresponding to the LC and MRWA, the Kirchmod model followed by the HBV model performed better than the BGM model except for the FP case for MRWA (Table 3). However, performance of the HBV model and BGM model are nearly similar.

Models | LC, FP: NSE | LC, PP: NSE | LC, FP: NSEln | LC, PP: NSEln | MRWA, FP: NSE | MRWA, PP: NSE | MRWA, FP: NSEln | MRWA, PP: NSEln |
---|---|---|---|---|---|---|---|---|
Kirchmod | 0.69 | 0.62 | 0.74 | 0.71 | 0.54 | 0.53 | 0.67 | 0.67 |
BGM | 0.68 | 0.60 | 0.71 | 0.62 | 0.53 | 0.49 | 0.63 | 0.59 |
HBV | 0.62 | 0.49 | 0.71 | 0.67 | 0.47 | 0.42 | 0.64 | 0.58 |

Figure 4(a)–4(c) present the NSE values for the validation period for both LC and MRWA. For the validation period, only 12 catchments exhibited NSE ≥ 0.50 for both FP and PP cases for the LC of the Kirchmod model. Only eight and six catchments exhibited NSE ≥ 0.50 for FP and PP cases, respectively, for the MRWA of Kirchmod model. Only nine and eight catchments for FP and PP cases, respectively, exhibited NSE ≥ 0.50 for both local calibration and MRWA for the BGM model. For the HBV model, only eight catchments exhibited NSE ≥ 0.50 for both FP and PP cases for the local calibration while only six catchments exhibited NSE ≥ 0.50 for both FP and PP cases for the MRWA. However, for the calibration period up to 23 and 16 catchments, respectively, exhibited NSE ≥ 0.50 for the LC and MRWA. Therefore, the results of split-sample validation indicated a marked deterioration of the NSE for both the FP and PP cases for the three models.

Table 4 presents the regional median NSE for the validation period for both the LC and MRWA, and the regional median NSEln for the MRWA. In terms of the regional median of the NSE corresponding to the LC, the Kirchmod and HBV models performed equally well, followed by the BGM model, for the FP case, while the Kirchmod model performed best, followed by the BGM and HBV models, for the PP case (Table 4). For the MRWA, the Kirchmod model performed best, followed by the HBV model and then the BGM model, for both FP and PP cases. In terms of the regional median of the NSEln corresponding to the MRWA, the Kirchmod model performed best while the BGM and HBV models performed equally for the FP case. However, for the PP case, the Kirchmod model performed best, followed by the BGM model, while the HBV model exhibited the worst performance. The marked deterioration in performance of the HBV model for the PP case is most probably attributable to fixing the three parameters of the soil moisture accounting routine, namely, *FC*, *LP* and *β*, to their RMedP values (Table 2) in addition to the parameters that are common to the three models. Therefore, the validation results also show that the Kirchmod model performed relatively better than the BGM and HBV models.

Models | LC: NSE (FP) | LC: NSE (PP) | MRWA: NSE (FP) | MRWA: NSE (PP) | MRWA: NSEln (FP) | MRWA: NSEln (PP) |
---|---|---|---|---|---|---|
Kirchmod | 0.43 | 0.44 | 0.36 | 0.33 | 0.44 | 0.35 |
BGM | 0.38 | 0.36 | 0.33 | 0.23 | 0.38 | 0.25 |
HBV | 0.43 | 0.30 | 0.35 | 0.28 | 0.38 | −1.37 |

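The regional summaries used in the split-sample comparison (the regional median of a performance measure and the number of catchments with NSE ≥ 0.50) can be computed as in the following sketch. The per-catchment values below are hypothetical and serve only to show the calculation; the helper name `regional_summary` is an assumption introduced here.

```python
import numpy as np

def regional_summary(nse_by_catchment, threshold=0.50):
    """Return the regional median and the count of catchments at or above the threshold."""
    values = np.array(list(nse_by_catchment.values()), dtype=float)
    return {
        "median": float(np.median(values)),
        "n_acceptable": int(np.sum(values >= threshold)),
    }

# Hypothetical NSE values keyed by catchment number (not data from the study)
calibration = {1: 0.72, 2: 0.55, 3: 0.81, 4: 0.48, 5: 0.63}
validation = {1: 0.51, 2: 0.30, 3: 0.52, 4: 0.12, 5: 0.44}

print(regional_summary(calibration))
print(regional_summary(validation))
```

Comparing the two summaries for the same set of catchments is one simple way to express the deterioration between the calibration and validation periods.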

Figure 5(a)–5(c) present values of calibrated parameters for the FP and PP, and RMedP or MMRMedP values of the fixed parameters for the PP case. The values of the calibrated parameters for the FP and PP are different, which show the sensitivity of calibrated parameters to fixing some of the parameters, i.e., the calibrated parameters compensate for the fact that some parameters were fixed to their RMedP (Equation (7)) or MMRMedP (Equation (9)) values. Figure 6(a)–6(f) present the histograms and ‘best-fit’ distributions fitted using the Statistics Toolbox 9.0 in Matlab for the posterior parameters obtained from the DREAM algorithm. The calibration resulted in different types of ‘best-fit’ posterior distributions of the parameters while the uniform prior distribution (Table 2) was used for all.

For the FP case, the three parameters of the Kirchmod model exhibit narrow posterior distributions (Figure 6(a)), indicating less parameter uncertainty compared to the parameters of the BGM model (Figure 6(c)) and HBV model (Figure 6(e)). In addition, some parameters, like the coefficient for the storage–discharge relationship (*k*) of the BGM model and the slow flow recession coefficient ($k_0$) of the HBV model, exhibit narrow posterior distributions. Even though there are equal numbers of calibrated parameters in the Kirchmod and HBV response routines, wider posterior distributions (hence larger uncertainty) for the HBV response routine parameters for the FP case probably indicate less sensitivity of the response routine parameters and interactions between the soil moisture accounting routine and the response routine parameters for the HBV model. For the PP case, posterior distributions of calibrated parameters are wider than for the FP case (i.e., larger uncertainty) for the Kirchmod (Figure 6(b)), BGM (Figure 6(d)) and HBV (Figure 6(f)) models.
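One way to put a number on "narrow" versus "wide" posterior distributions is the width of a central credible interval relative to the uniform prior range. The sketch below is an illustrative assumption, not the measure used in the study: the function name, the 95% level and the synthetic samples (standing in for DREAM output) are all hypothetical.

```python
import numpy as np

def relative_posterior_width(samples, prior_low, prior_high, level=0.95):
    """Width of the central credible interval divided by the uniform prior range."""
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(np.asarray(samples, dtype=float), [tail, 1.0 - tail])
    return float((high - low) / (prior_high - prior_low))

# Synthetic posterior samples on a prior range of [0, 1] (illustration only)
rng = np.random.default_rng(0)
narrow_samples = np.clip(rng.normal(0.5, 0.02, 5000), 0.0, 1.0)
wide_samples = np.clip(rng.normal(0.5, 0.15, 5000), 0.0, 1.0)

narrow_width = relative_posterior_width(narrow_samples, 0.0, 1.0)
wide_width = relative_posterior_width(wide_samples, 0.0, 1.0)
print(narrow_width, wide_width)
```

A value close to 1 would mean that calibration barely constrained the parameter relative to its prior, while a value close to 0 would indicate a well-constrained parameter.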

Table 5 shows correlation matrices of posterior parameters as a measure of identifiability of the parameters. The correlation matrices showed considerable interactions among some parameters, manifested by large positive or negative correlations. Positive correlation coefficients greater than 0.60 were observed between the regression parameters $b_0$ and $b_1$ for the Kirchmod model for both FP and PP cases. The two parameters support each other to influence the discharge sensitivity for the change in storage (*g*(*Q*)) based on Equation (2), which shows challenges of parameter non-identifiability even for parsimonious parameterization.

Full parameter transfer (FP) | | | | |
---|---|---|---|---|
Kirchmod | EvR | $b_1$ | $b_0$ | |
EvR | 1.00 | 0.16 | 0.46 | |
$b_1$ | | 1.00 | **0.87** | |
$b_0$ | | | 1.00 | |
BGM | $S_{max}$ | k | n | $I_c$ |
$S_{max}$ | 1.00 | **0.73** | −0.29 | −0.36 |
k | | 1.00 | **−0.79** | −0.37 |
n | | | 1.00 | 0.22 |
$I_c$ | | | | 1.00 |
HBV | $k_1$ | $k_0$ | PERC | |
$k_1$ | 1.00 | **−0.75** | **0.88** | |
$k_0$ | | 1.00 | **−0.71** | |
PERC | | | 1.00 | |
Partial parameter transfer (PP) | | | | |
Kirchmod | EvR | $b_1$ | $b_0$ | |
EvR | 1.00 | −0.33 | 0.09 | |
$b_1$ | | 1.00 | **0.89** | |
$b_0$ | | | 1.00 | |
BGM | $S_{max}$ | k | $I_c$ | |
$S_{max}$ | 1.00 | 0.54 | 0.22 | |
k | | 1.00 | 0.30 | |
$I_c$ | | | 1.00 | |
HBV | $k_1$ | $k_0$ | PERC | |
$k_1$ | 1.00 | **−0.94** | −0.12 | |
$k_0$ | | 1.00 | 0.41 | |
PERC | | | 1.00 | |


Correlation coefficients in bold fonts are those >0.60 or < −0.6.
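The screening of Table 5 for strongly correlated parameter pairs can be sketched as below. The helper names are assumptions introduced for illustration; the matrix is the HBV matrix for the FP case from Table 5, completed symmetrically. Given raw posterior samples (one column per parameter), the matrix itself could be obtained with `numpy.corrcoef`.

```python
import numpy as np

def posterior_correlation(samples):
    """Correlation matrix of posterior samples (rows = draws, columns = parameters)."""
    return np.corrcoef(np.asarray(samples, dtype=float), rowvar=False)

def flag_correlated_pairs(names, corr, threshold=0.60):
    """List parameter pairs whose absolute correlation exceeds the threshold."""
    corr = np.asarray(corr, dtype=float)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

# HBV response routine, FP case (values from Table 5)
hbv_names = ["k1", "k0", "PERC"]
hbv_fp = [[1.00, -0.75, 0.88],
          [-0.75, 1.00, -0.71],
          [0.88, -0.71, 1.00]]

print(flag_correlated_pairs(hbv_names, hbv_fp))
```

For this matrix all three pairs are flagged, which matches the bold entries described in the table note.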

For the BGM model for the FP case, there is a positive correlation greater than 0.6 between $S_{max}$ and the coefficient *k*, and there is a large negative correlation (*r* ≤ −0.6) between the exponent parameter *n* and *k*. These show that $S_{max}$ and *k* support each other while *n* and *k* compensate each other according to Equation (6) for computation of the subsurface drainage. For the PP case, there is no pair with *r* > 0.6 or *r* ≤ −0.6 for the BGM model, which shows that fixing *n* in the subsurface drainage equation reduced the parameter correlations and thus fulfilled the intention of fixing the parameter *n*.

For the HBV model, there is a positive correlation greater than 0.60 between the quick flow recession coefficient ($k_1$) and percolation to the lower zone (*PERC*) for the FP case. This shows that an increase in $k_1$ for the discharge from the upper zone ($Q_{UZ}$) compensates the decrease in the upper zone storage due to an increase in *PERC*. However, there is less correlation between $k_1$ and *PERC* for the PP case, most probably due to fixing the soil moisture accounting parameters. There is a large negative correlation (*r* ≤ −0.6) between $k_0$ and *PERC* for the HBV model for the FP case, which shows that the two parameters compensate each other for the baseflow contribution from the lower reservoir ($Q_{LZ}$). For the HBV model, there is also a large negative correlation (*r* ≤ −0.6) between the response routine parameters $k_0$ and $k_1$ for both FP and PP cases. This compensation between the discharge from the upper and the lower reservoirs in the response routine, regardless of the parsimony obtained by fixing the parameters of the soil moisture accounting routine, indicates greater challenges of parameter non-identifiability in the multiple storage HBV model.

## DISCUSSION

### Performance of calibration and validation

Local and regional calibration-based model performance for the NSE (Figures 2 and 3) indicates that the Kirchmod and BGM models provided better performance for the majority of the catchments. However, the HBV model provided the best NSE and NSEln performance for local calibration for some catchments, e.g., catchments 15, 17 and 19. Generally, the best performing model varies among the catchments and performance measures, and hence it is not possible to identify a unique model structure for the region. This agrees with the uniqueness of place (Beven 2000) and previous findings that one cannot expect similar calibration performance for a model across different ranges of magnitudes of streamflow series (Gupta *et al.* 1998; Wagener *et al.* 2001; Madsen 2003). Lee *et al.* (2005) investigated whether it is justifiable to use one model structure to cover a range of catchment types and found no evidence of relationships between catchment type and preferred model structure. Their results were based on a classification of 28 catchments, covering a range of hydrological types and a wide geographical extent in the UK, using different combinations of three catchment characteristics, namely, catchment area, a baseflow index from the hydrology of soil types classification and annual average rainfall for the period 1941–1970.

Due to higher values of regional median NSE, the Kirchmod and the BGM models are more suitable than the HBV model for the MRWA, which has a potential for prediction of high flows in ungauged basins. For the NSEln, the Kirchmod model provided higher performance than the HBV and BGM models; however, the HBV model provided slightly higher NSEln than the BGM model, probably due to separate simulation of baseflow from the lower reservoir for the HBV model. Hailegeorgis *et al.* (2015b) found similar performance of the MRWA to other more advanced regionalization methods and hence selection of the models based on their MRWA performance for the PUB is valid for the region. The ‘top-down’ Kirchmod model, which is based on single catchment storage–discharge relationships and does not consider an infiltration excess overland flow, performed better in terms of regional NSE and NSEln than the BGM model that considers both the infiltration excess and saturation excess runoff generation mechanisms and the HBV routines with multiple storage reservoirs. However, the general trends in performance of the three models are very close to each other for the majority of the catchments except for some catchments, e.g., catchment 15.

Deterioration of the NSE and NSEln from their values obtained for the LC were observed for the MRWA for nearly all of the catchments (Figures 2 and 3). The NSE and NSEln values for both the LC and MRWA are lower for the PP case than the FP case (Figures 2 and 3) for the majority of the catchments. These show that despite the fact that parsimony could be achieved by fixing some of the parameters to their RMedP or MMRMedP values, there are tradeoffs of noticeable deterioration in performance. The catchments with poor NSE and NSEln are of different sizes and located in different parts of the study region. However, the majority of these catchments are located far from precipitation gauging stations and thus the lesser representativeness of the precipitation stations probably affected the performance for these catchments.

The model validation using the split-sample test showed that the NSE for both LC and MRWA deteriorates outside the calibration period (Figure 4 and Table 4). Similarly, the NSEln of the MRWA deteriorates for the validation period. For instance, the NSE values for the validation period for LC of the BGM model for catchment 6 are 0.40 and 0.39 for the FP and PP, respectively (Figure 4(b)), compared to NSE values of 0.83 and 0.81 for the FP and PP, respectively, for LC for the calibration period (Figure 3(b)). Hailegeorgis *et al.* (2015a) obtained NSE values of 0.84 for both calibration and validation periods by calibrating catchment 6 using only streamflow records for the catchment. The NSE values for the validation period for the LC of the HBV model for catchment 6 are 0.35 and 0.49 for the FP and PP, respectively (Figure 4(c)), compared to NSE values of 0.74 for both the FP and PP for the calibration period for the LC (Figure 3(c)). Hailegeorgis & Alfredsen (2015), based on calibration of the HBV model for catchment 6 using streamflow data only from the catchment, obtained NSE values of 0.75 and 0.71 for the calibration and validation periods, respectively. These results suggest that calibration using only streamflow data for a particular catchment would probably result in optimal parameters that have better transferability in time. A split-sample test for validation of the regional calibration methodology used in the present study, which uses all available streamflow records from all catchments, is not common in the literature. However, the results of the present study agree with the study by Beldring *et al.* (2003), who found, based on a hierarchical scheme for model validation, that regional calibration of a model failed to reproduce the dynamics of hydrological processes for several catchments.

However, the multi-basin regional calibration has merits: it derives regional parameters that yield the MRWA PM, and these parameters can be transferred in space for prediction in ungauged basins in the region. The multi-basin and regional calibration approach provides an opportunity for a more comprehensive evaluation of models than the proxy basin approach (Klemeš 1986; Wrede *et al.* 2013). Fenicia *et al.* (2011) proposed a flexible framework for conceptual hydrological modelling, SUPERFLEX, with one of the objectives being more robust and reliable performance in operational contexts. For operational purposes, it is advisable to combine flexible models with multi-basin-based identification of robust and reliable model structures, parameterizations and modelling paradigms (e.g., ‘bottom-up’ process models and ‘top-down’ inferences from observations) among a pool of plausible competing options. Currently, fixed-model and catchment-scale modelling is more common because of its simplicity and lower computational demand.

Model calibration based on continuous time series and model evaluation based on different performance measures (e.g., NSE and NSEln) do not necessarily yield optimal parameter sets that can simultaneously simulate floods associated with high rainfall and snowmelt events and low flows, especially when extrapolated to streamflow magnitudes outside the calibration conditions. Wagener & McIntyre (2005), on identification of rainfall–runoff models for operational applications, suggested that a more empirical approach to identification of models for specific forecasting problems is preferable to trying to achieve a good all-round representation of the rainfall–runoff processes. Calibration for a specific modelling objective or for reproducing a specific runoff signature may provide reliable prediction for the specific purpose.

### Parameter uncertainty and identifiability

Uhlenbrook *et al.* (1999) found considerable implications of parameter uncertainty and identifiability on the predictive uncertainty, and noted that parameter and model structure uncertainties should be considered for operational (practical) predictions. Wider posterior distributions (i.e., large uncertainty) of calibrated parameters for the PP case than the FP case for the Kirchmod (Figure 6(b)), BGM (Figure 6(d)) and HBV models (Figure 6(f)) show that parsimony in the number of parameters and longer data series for calibration do not necessarily provide less parameter uncertainty. However, while comparing the models for the FP case, narrow posterior parameter distributions of the Kirchmod (Figure 6(a)) compared to the other runoff response routines (Figures 6(c) and 6(e)) indicate that a small number of free parameters exhibits least parameter uncertainty. In addition to the parsimony, the model structure based on the ‘top-down’ modelling paradigm and relationship between catchment storage and discharge inferred from streamflow recession analysis might have contributed to the reduction in parameter uncertainty. For a given model structure, there is a likelihood of less predictive uncertainty from less parameter uncertainty, but uncertainties due to input data also contribute to the predictive uncertainty.

A few pairs of response routine parameters exhibit correlation coefficients (*r*) with either *r* > 0.60 or *r* ≤ −0.60 (Table 5). The parsimony for the PP case reduced correlations in the runoff response routine parameters of the BGM and HBV models but not of the Kirchmod model. In terms of parameter correlations, the BGM model benefited much more from the parameterization of the subsurface drainage equation based on fixing the exponent parameter. Correlation of parameters results in lack of identifiability because a change in one parameter is compensated by a change in another, such that multiple parameter sets give the same output according to some quantity of interest (Libelli *et al.* 2014). The existence of large positive or negative correlations is an indication of non-identifiability of parameters and hence the potential for non-identifiability of the performance of the models, which is one of the main challenges in precipitation–runoff modelling. Hailegeorgis & Alfredsen (2015) found that compensation between the discharge from the upper reservoir and baseflow from the lower reservoir in the different HBV configurations resulted in indistinguishable streamflow hydrographs but less reliable baseflow simulation by some of the configurations.

The differences in the values of the calibrated parameters for the FP and PP cases (Figures 5(a)–5(c)) show the sensitivity of runoff simulation to the fixed parameters, and compensations and correlations among the parameters. Parameterization issues have potential impacts on regionalization based on transferring of parameters for PUB. Therefore, regionalization of precipitation–runoff models should be augmented by preliminary parameter sensitivity analysis to determine which parameters to transfer. The quality of input (both climate and streamflow data) should also be able to constrain the model parameters during calibration.

### Data quality

The expected conditions for model calibration are that there is no considerable error in the observed streamflow data and that uncertainty in the estimation of precipitation fields is low. Errors in the observed streamflow and errors in estimation of precipitation fields have the potential to affect the reliability of calibrated (optimized) parameters. Such discrepancies in the data also potentially affect the reliability of modelling inferences and predictions, which is one of the challenges in hydrological modelling. The density and representativeness of precipitation gauging stations are crucial to capture the spatial variability of precipitation, for instance, localized intense precipitation events, in order to reproduce flood events. Sparse gauging networks for the hourly precipitation input, which may yield less accurate spatially interpolated precipitation fields on the 1 × 1 km^{2} grids, seem to be a major factor for the low NSE or poor estimation of peak flows. Engeland & Steinsland (2014) mentioned that they applied a hydrological forecasting model at a daily time step for small catchments (with time of concentration less than 1 day) in southwestern Norway due to the availability of most input data at daily resolution, which matches the current daily hydropower scheduling models. In addition to the density of precipitation data, the density of streamflow data is also important for regional modelling. Pokhrel & Gupta (2011) noted the importance of multiple (high-density) streamflow gauging stations at interior catchments and of exploiting the spatial information on soil moisture and evapotranspiration to infer the spatial catchment variability from streamflow hydrographs and for better identification of models.

## CONCLUSIONS

We identified three spatially distributed precipitation–runoff response models for hourly runoff simulation in mid-Norway, using multi-basin local and regional calibration with calibration and transfer of both full parameters (FP) and partial parameters (PP). The best performing model structure varies among the catchments, which may be related to the uniqueness of catchments. Different best performing models for a catchment were observed for different PM, which is attributed to different sensitivities of the PM to various parts of the hydrograph and different quality of streamflow records on various parts of the hydrograph. However, models were identifiable based on their overall regional performance, and the calibration and validation results indicated that the Kirchmod model performed best. Even though it is not possible to identify a single best performing model structure for all catchments in the region, a flexible model and multi-basin-based regional modelling framework were found to be necessary for comprehensive identification of reliable model structures, parameterizations and modelling paradigms for specific objectives of prediction and for prediction in ungauged basins (PUB).

The parsimonious ‘top-down’ model (Kirchmod) provided the least parameter uncertainty for the full parameter transfer (FP). However, parsimony could not guarantee parameter identifiability due to the considerable correlations among the calibrated parameters. The deterioration of performance due to fixing some of the parameters to their regional median or multi-model regional median values for the partial parameter transfer (PP) substantiates the need for preliminary assessment of parameter sensitivity to identify which parameters to transfer, in order to minimize the tradeoffs between performance and parsimony. In addition, the marked deterioration of performance measures for the validation period for the calibration objective function used in the present study, which uses streamflow records from all catchments in the region, indicates tradeoffs in regional calibration between parameter transfer in space for PUB and parameter transfer in time. Therefore, temporal validation tests using the split-sample scheme are indispensable for this type of regional calibration algorithm. The performance of local calibration using only at-site records for each catchment should be evaluated against the local calibration results obtained from the regional calibration methodology used in the present study, which uses streamflow records from all catchments in the region.

Dense hourly precipitation gauging networks, which can provide more accurate spatially interpolated precipitation on the 1 × 1 km^{2} grids, are required for improved hourly prediction, especially for high flows and for improved identification of hourly P-R models for the region. In addition, streamflow measurements from dense hydrological gauging networks or spatially distributed observations of rainfall have the potential to improve multi-basin local and regional calibration-based identification of models for the hourly prediction.

## ACKNOWLEDGEMENTS

The Center for Environmental Design of Renewable Energy (CEDREN) funded the project. We obtained the climate data from the Norwegian Meteorological Institute, Nord Trøndelag Elektrisitetsverk (NTE) and Bioforsk. We obtained the streamflow and land use data from the Norwegian Water Resources and Energy Directorate and the stream networks from a 1:50,000 map from the Norwegian Mapping Authority. We wish to thank the anonymous reviewers for their comments, which improved the manuscript.