## Abstract

Estimating peak river discharge, a critical issue in engineering hydrology, is essential for designing and managing hydraulic infrastructure such as dams and bridges. In the UK, practitioners typically apply the Flood Estimation Handbook (FEH) statistical method which estimates the design flood as the product of a relatively frequent flow estimate (the index flood, IF) and a regional growth factor. For gauged catchments the IF is estimated from observations. For ungauged catchments it is computed through a multiple regression model. While the FEH IF method provides peak flow estimates that are statistically robust, it does not readily take into account catchment heterogeneity or effect of environmental change on river flows. This study presents a new methodology to estimate the IF at national scale using continuous simulation from a physically based hydrological model (Grid-to-Grid). The methodology is tested across Great Britain and compares well with IF estimates at 550 gauging stations (R^{2} = 0.91). The promising results for Great Britain support the aspiration that continuous simulation from large-scale hydrological models coupled with increasing availability of global weather and climate products, could be used to estimate design floods in regions with limited gauge data or affected by environmental change.

## INTRODUCTION

*et al*. 2008; Blöschl

*et al*. 2013; Hrachowitz

*et al*. 2013). The index flood (IF) method (Dalrymple 1960) is a widely used regionalization procedure and forms the basis of the UK national standard methods with the Flood Estimation Handbook (FEH) (Institute of Hydrology 1999). The method is based on the assumption that, for all the sites inside a ‘hydrologically homogeneous’ region, the AMAX frequency distributions are identical apart from a local scaling factor (IF [m

^{3}s

^{−1}]). This assumption allows the computation of any p-th quantile at any location i-th as: where is the index flood at location i and [−] is the regional growth curve, a dimensionless quantile function assumed to be identical for all the sites in the region.

Various approaches have been developed to provide reliable estimates of IF, and Bocchiola *et al*. (2003) provide a summary of some of the most widely used. Broadly, if the site of interest is gauged, the IF can be estimated by direct methods, i.e., from the AMAX time series, using the sample mean (Dalrymple 1960; NERC 1975; Hosking & Wallis 1993), the sample median (Robson & Reed 1999), or using peak over threshold analysis (Chow *et al*. 1988; Robson & Reed 1999). If the site of interest is ungauged, a variety of ‘indirect’ methods have been proposed to estimate IF. The most commonly used are empirical methods (Hirsch *et al*. 1992; Meigh *et al*. 1997; Kjeldsen & Jones 2009) that relate the IF evaluated by AMAX measurements to a set of morpho-climatic catchment descriptors such as area, slope, average annual rainfall, land use, etc. These methods include coefficients that are usually estimated by least squares (e.g., Stedinger & Tasker 1985), maximum-likelihood (e.g., Kjeldsen *et al*. 2008) and Bayesian methods (e.g., Haddad *et al*. 2012). The uncertainty in the IF estimate attributable to the data used in the regression model fitting was quantified by Jaafar *et al*. (2012). Other indirect approaches for estimating IF and flow quantiles are based on the use of artificial neural networks (e.g., Hall *et al*. 2002; Shu & Burn 2004; Dawson *et al*. 2006) or on the connection between stochastic rainfall models and lumped event-based rainfall–runoff models (Córdova & Rodríguez-Iturbe 1983; Brath *et al*. 1992; Calver *et al*. 2005; Kjeldsen *et al*. 2005; Rigon *et al*. 2011). Limitations of the latter modelling approach are: (i) the simplified assumptions for the hydrological model component; (ii) the requirement of catchment initial moisture conditions; (iii) the assumption of high simplified and uniform rainfall storms in the catchment.

Indirect estimation of IF based on continuous physically based hydrological model simulations has also been explored in recent years. The advantages of such an approach include: (i) taking into account catchment heterogeneity; (ii) accommodation of temporal and spatial rainfall variability; and (iii) ability to provide a consistent IF estimate for multiple points on the river network. Demonstrations of the use of continuous, physically based model simulations for flood frequency analysis are provided for various catchments by Cameron *et al*. (2000), Calver *et al*. (2005), Moretti & Montanari (2008), Viviroli *et al*. (2009) and Camici *et al*. (2011), but to our knowledge, only Ravazzani *et al*. (2015) have used continuous hydrological model simulation for estimating the IF. They applied the model FEST-WB (Montaldo *et al*. 2007; Rabuffetti *et al*. 2008) to reconstruct river flows for an alpine basin in the northern part of Italy and to predict the IF.

For a gauged location, an estimate of the IF recommended by the FEH is the median of the observed AMAX. This corresponds to the two-year return period flow which is considered a good estimate of the bankfull river discharge. If less than 14 years of AMAX data are available, the FEH suggests use of peak over threshold data. For ungauged sites, the Environment Agency Flood Estimation Guidelines (2012) recommends use of the regression model of Kjeldsen *et al*. (2008) to estimate the IF. Practitioners are also advised that data transfer from donor catchments to the site of interest can improve the accuracy of IF estimates (Kjeldsen & Jones 2007). The ‘donors’ are gauged catchments hydrologically similar to the site of interest (i.e., located upstream or downstream on the same river, or possessing similar size and land use).

Next we present a general methodology section to estimate the IF at national scale using continuous hydrological simulation. This approach aims to: (i) integrate the indirect methods for IF estimation and address their limitations for larger and spatially heterogeneous catchments; and (ii) provide effective tools for IF estimation in ungauged or poorly gauged catchments. The following section tests the methodology in Great Britain and then it is assessed by comparison with estimates of the IF at 550 gauging stations.

## METHODOLOGY

The area-wide physically based hydrological model Grid-to-Grid (G2G, Bell *et al*. 2007a, 2007b, 2009) has been used to estimate the IF at national scale. The G2G typically operates at a 1 km^{2} resolution across Britain and has been configured to represent spatial variability in catchment response. The model uses landscape information provided by gridded spatial datasets of elevation, soil and geology in preference to the identification of model parameters through catchment calibration, and for the application discussed here, a single model configuration and set of parameters is applied across Britain (i.e., with no catchment specific calibration). G2G model configuration and inputs are discussed in the subsection ‘Grid-to-Grid model set-up and input data’. Model output consisting of river flow time series at each 1 km^{2} river grid-cell are used to construct maps of AMAX across Britain and to estimate the IF following the FEH methodology (Institute of Hydrology 1999). AMAX in the UK are taken as the highest flow value recorded in a water year, which runs from October to September. G2G modelled IFs were compared to measured IFs for 550 gauged sites using observations obtained from the National River Flow Archive (NRFA). Modelled and measured IFs were compared using a linear regression, together with an analysis of the sensitivity of model performance to morpho-climatic catchment descriptors. The agreement between the G2G-derived and the measured IF was evaluated by: (i) quantifying the coefficient of determination; and (ii) assessing the uncertainty in IF estimate using the factorial standard error (FSE) (Kjeldsen 2015). Maps of model residuals (differences between modelled and measured IF) provide additional information on regions and types of catchment where the model performs best (and worst). Finally, the temporal trends of modelled and measured AMAX were compared to assess the model capability in detecting observed long-term trends. We have used the Mann–Kendall test (MK, Kendall 1975) with permutations, that provides a measure of the significance of potential trends in time. This method is presented in detail by Kundzewick & Robson (2000) and has been used in several applications (e.g., Hannaford & Marsh 2006, 2008; Mediero *et al*. 2015). The procedure is as follows: (i) randomly re-order the AMAX time series to provide a large number of samples with no replacement; (ii) perform Mann–Kendall trend test to each sample; (iii) rank the trend test results; and (iv) compute the trend test for the original time series. If the derived trend for the original series falls outside the [0.05, 0.95] percentile range of the ranked values, it is deemed to be significant at the 95% confidence level, indicating a change in the magnitude of the AMAX over the analysed period. The statistical tests have been performed on both measured and modelled AMAX providing test values (including the direction of the trend) and significance assessments for a trend in both the measured and modelled series.

### Grid-to-Grid model set-up and input data

The Grid-to-Grid Model (Bell *et al*. 2007a) is a grid-based hydrological model that simulates surface and sub-surface runoff, lateral movement of soil-moisture, and river flow on a national scale. The model, which is configured on a 1 km^{2} resolution grid, includes both runoff production and flow routing components. In the runoff production scheme, each 1 km^{2} cell maintains a water balance with input consisting of precipitation (rainfall or snowmelt) and losses from evaporation. Lateral drainage, percolation and groundwater flow are simulated according to Equations (1)–(11) in Bell *et al*. (2009). A probability distribution model (Moore 1985, 2007) invoked for each grid cell ensures realistic quantities of saturation excess surface runoff even when the soil is not fully saturated. Subsurface and surface flow are simulated using two coupled one-dimensional kinematic wave equations. The G2G model, which primarily simulates naturalized flow, was designed to be applied across wide areas, not calibrated to individual catchment conditions. Model parameters are dependent on spatial datasets of elevation, soil type and land cover (short grass or urban/sub-urban areas), and soil hydraulic parameters are linked to the dominant soil-type in a grid-square. Nationally applied parameters (e.g., kinematic wave speed for land and river channels) were determined through manual adjustment to obtain good flow estimates for as many catchments as possible, and thus not calibrated to individual catchment conditions. Over Britain G2G is typically applied at a 1 km^{2} grid resolution and a 15-minute time-step, and is configured using spatial datasets of topography, soil and land cover. Applications include flood forecasting (e.g., Cole & Moore 2009) and assessment of climate change impacts on floods and snowmelt (e.g., Bell *et al*. 2007b, 2009, 2016). The most recent version of the model, as presented in Bell *et al*. (2016) was tested over Great Britain for the period 1960–2011. Driving data consist of: (i) daily precipitation observations on a 1 km^{2} grid (CEH GEAR: Keller *et al*. 2015), equally subdivided for each 15 minutes of the day; (ii) monthly PE estimates on a 40 km^{2} grid (MORECS: Hough & Jones 1997) spread equally through the month and applied equally to each 1 km^{2} box within each 40 km^{2} (Bell *et al*. 2009, 2016); and (iii) daily minimum and maximum temperature observations on a 5 km^{2} grid for 1960–2014 (Perry *et al*. 2009), which were applied through the day using a sine curve and downscaled to 1 km^{2} using a lapse rate and elevation data (Morris & Flavin 1990). Model output consisting of 15 minute river flows were used to provide AMAX values for 1 km^{2} river grid-cells across Britain.

## STUDY AREA AND DATA AVAILABILITY

The study region includes 550 catchments from England, Scotland and Wales. They are part of the United Kingdom peak flow dataset (version v4.1) obtained from ‘The National River Flow Archive’ (NRFA 2017; Dixon *et al*. 2013) and available at http://nrfa.ceh.ac.uk/. For the purposes of this analysis, we used the instantaneous peak flow AMAX values and a set of catchment descriptors consisting of: the catchment area (AREA [km^{2}]); the average annual rainfall (SAAR, [mm]) for the period 1961–1990; the base flow index based on the hydrology of soil types classification presented in Boorman *et al*. (1995) (BFIHOST [−]), which reflects the geology of the site and has typical values that range from below 0.2 (highly impermeable) to above 0.8 (highly permeable); the mean distance between each pixel of the basin and the outlet (mean drainage path length, DPLBAR, [km]), and the extent of urban and suburban land cover during the year 2000 (URBEXT2000, [−]). Table 1 summarizes these catchment properties in terms of the mean, minimum, maximum and standard deviation value over the chosen set of 550 catchments. Of the 810 catchments for which peak flow data are available in Great Britain, 260 have been excluded for various reasons, including catchment size and how well the gauged flows are thought to represent actual flows. Specifically, 225 catchments where DPLBAR <10 km and area <50 km^{2} have been excluded from the comparison of simulated and observed peak flows as modelled flows for these relatively small catchments were most likely to be adversely affected (underestimated) by the use of daily mean rainfall. These catchments have a faster hydrological response and probably the use of hourly rainfall data would be more appropriate to mimic the instantaneous peak flows. A modest number of catchments (35) were excluded due to strong anthropogenic influences including: (i) the presence of an artificial channel that modifies the natural flow-paths; (ii) unreliable rating curves due to the lack of high flow measures; and (iii) strong influence of reservoirs or groundwater abstraction on the flow regime. Figure 1 presents a map of the study area, the location of the gauges selected for the analysis (black points), and the excluded gauges (white points).

. | AREA [km^{2}]
. | SAAR [mm] . | BFIHOST [−] . | DPLBAR [km] . | URBEXT2000 [−] . |
---|---|---|---|---|---|

Minimum | 55 | 558 | 0.24 | 10 | 0 |

Median | 203 | 962 | 0.47 | 19 | 0.009 |

Maximum | 9,931 | 2,913 | 0.96 | 140 | 0.592 |

Stand. dev. | 935 | 401 | 0.14 | 18 | 0.085 |

. | AREA [km^{2}]
. | SAAR [mm] . | BFIHOST [−] . | DPLBAR [km] . | URBEXT2000 [−] . |
---|---|---|---|---|---|

Minimum | 55 | 558 | 0.24 | 10 | 0 |

Median | 203 | 962 | 0.47 | 19 | 0.009 |

Maximum | 9,931 | 2,913 | 0.96 | 140 | 0.592 |

Stand. dev. | 935 | 401 | 0.14 | 18 | 0.085 |

## RESULTS AND DISCUSSION

### Model verification and IF map estimation

A linear regression model was fitted to the measured and modelled log-transformed IF values for 550 catchments. The G2G model was executed for the whole simulation period (1960–2014) and the modelled IF in a given gauged station was computed using the modelled AMAX values corresponding to the period for which the measurements were available. Figure 2(a) shows a scatterplot of 550 G2G and observation-derived IFs in logarithm scale, together with the derived linear regression model plot, and Table 2 shows the summary statistics of the linear regression model. The high values of the t-ratio, computed as the coefficient estimated value divided by its estimated standard deviation, give an indication that the estimated coefficients are statistically different from 0. The coefficient of determination R^{2} = 0.91 summarizes the goodness of fit. Following Kjeldsen (2015), given the large number of catchments for which the model was evaluated (550) it is reasonable to assume that the prediction variance can be approximated by the variance of the regression model residuals, s = 0.15. Under this assumption it is possible to evaluate the FSE of the model FSE = 1.47. The latter defines the 68% and 95% confidence intervals for the regression model as [q· *FSE*^{−1}; q· *FSE*] and [q· *FSE*^{−2}; q· *FSE*^{2}] respectively (Kjeldsen 2015), where q indicates given discharge value. In our case, q corresponds to the median of the AMAX. The FSE presented in this study is comparable with the FSE values of the regression models currently used in FEH which are based on the AMAX measurements of 600 gauging stations. The original FEH IF regression model reported an FSE value of 1.56 and a R^{2} value of 0.92 (Robson & Reed 1999). The revised model improved them to FSE = 1.43 and R^{2} = 0.94 by assuming that the correlation between model errors is a function of the geographical distances between gauging stations (Kjeldsen *et al*. 2008). The scatterplot of G2G and observation-derived IFs and the summary statistics of the linear regression model for the 810 catchments are presented and commented upon in the Supplementary material (Figure S1 and Table S1, available with the online version of this paper). Figure 2(b) presents a map of the residuals between modelled and measured IF using a logarithmic scale. The residuals are close to zero across most of Britain, with a modest underestimation in central and southwest England, and a similarly modest overestimation in the southeast. A significant factor contributing to the underestimation is the contribution of short-duration intense rainfall events to peak river flows in central and southern Britain, which will be poorly represented by daily gridded rainfall observations, while the overestimation in southern and eastern Britain can, for many groundwater-dominated catchments, be attributed to the effects of artificial abstractions which are not currently included in the G2G model formulation.

Intercept . | T-Stat intercept . | Scaling exponent . | T-Stat scaling exponent . | Residual stand. dev . | R^{2}
. |
---|---|---|---|---|---|

0.41 | 8.995 | 0.99 | 76.681 | 0.386 | 0.910 |

Intercept . | T-Stat intercept . | Scaling exponent . | T-Stat scaling exponent . | Residual stand. dev . | R^{2}
. |
---|---|---|---|---|---|

0.41 | 8.995 | 0.99 | 76.681 | 0.386 | 0.910 |

Figure 2(c) presents a map of the modelled IF (m^{3} s^{−1}) on a logarithmic scale, for the period 1960 to 2014. The IF is typically higher in the north and west of Britain, and in major rivers. The use of continuous G2G model simulation provides a consistent spatial and temporal dataset to explore whether there has been a significant change in the IF over the last 50 years. Figure 2(d) presents a map of the change in the derived IFs between two periods: 1960 to 1986 and 1987 to 2014. The changes range from an increase in the IF of up to 45 m^{3} s^{−1} (predominantly in the north and west) to a decrease of −40 m^{3} s^{−1} in parts of southeast Britain. This regional split is broadly in line with the increased trends detected in measured mean daily flows since the early 1960s in Scotland and, to a lesser extent, Wales and western England (Hannaford & Marsh 2008). However, Hannaford & Marsh (2008) pointed out that the analysis of trends in some areas was limited by the available length of record.

The use of continuous model simulation provides a method of estimating the IF with a 91% agreement with observation-derived estimates for 550 catchments across Britain. In order to investigate whether this agreement is influenced by catchment properties a series of analyses relating model fit with properties such as area, drainage path length, urban extent and baseflow index were undertaken. For each catchment property, the catchment values were divided into deciles (i.e., the nine values that divide the sorted data into ten equally sized subsamples) and measured and modelled IF for each catchment property subgroup were compared. Figure 3 presents ten scatterplots and the coefficient of determination (R^{2}) of linear models fitted to the results for the catchment property AREA. The title of each scatterplot specifies the AREA range [km^{2}] of each class, for example, the first plot is for catchments which range in area from 53 to 80 km^{2}, the second from 80 to 110 km^{2}, etc. Similar results are presented for percentage of urban extent (URBEXT2000) in Figure 4, and for baseflow index (BFIHOST) and drainage path length (DPLBAR) in the Supplementary material (Figures S2 and S3, available with the online version of this paper). The model fit is robust in the sense that it is not strongly affected by the catchment properties. The decile range in R^{2} is 0.82–0.90 for AREA, 0.78–0.92 for URBEXT2000, 0.81–0.93 for BFIHOST and 0.84–0.91 for DPLBAR. These figures indicate relatively high levels of agreement between modelled and measured IF estimates, suggesting that the quality of the G2G estimated IF is relatively unaffected by different catchment properties and can provide estimates of consistent quality across various types of catchment (e.g., small, steep or urbanized catchments).

### AMAX trend analysis

In the previous section, we assessed whether AMAX output from a G2G continuous simulation could be used to estimate the measured IF by comparing the median of observed and simulated AMAXs over several decades. Climate, anthropogenic or natural changes at the catchment scale can lead to long-term trends in observed AMAX. For this reason, it is important to ensure that if AMAX from continuous hydrological simulation are used in place of observed AMAX, they can also reproduce observed trends in river flows. This trend analysis has now been undertaken on 285 catchments, selected from the original 550, for which at least 40 years of measured flow data are available. Results have been compared for the 69 catchments where the trend for the measured AMAX presented a significant test at the 95% significance level and are shown in Figure 5. No results are available for Scotland because the two criteria of at least 40 years of measured flow data available and trend with a 95% significance level were not matched.

Figure 5 shows that: (i) for 59 catchments positive trends were detected in both modelled and measured AMAX and (ii) for 10 catchments the trend in the modelled series is not in agreement with the direction of the trend in the measured AMAXs series. These catchments are predominantly located in the southeast part of England and for all of them the NRFA archive suggests that the runoff is affected by at least one of these reasons: (a) reservoir in the catchment; (b) presence of industrial or agricultural abstraction; and (c) presence of water supply and groundwater abstractions. This anthropogenic influence, which is not modelled in the current version of G2G, may potentially explain the differences between measured and modelled AMAX trend in time for those basins.

## CONCLUSIONS

In this paper, we demonstrate how use of continuous flow simulation by a national-scale distributed hydrological model (such as G2G) can be used to estimate key parameters such as the IF required for flood estimation methods. The comparison between IFs estimated from current (FEH) and continuous simulation methods for 550 catchments throughout Great Britain indicates a good correlation between the two methods (R^{2} = 0.91, FSE = 1.47). We have also demonstrated that AMAX from continuous hydrological simulation can reproduce observed trends in the measured AMAX (agreement in 90% of the analysed catchments), indicating the potential for applying the methodology under conditions of non-stationarity.

This initial assessment of continuous simulation from a national-scale hydrological model (G2G) for estimating the IF is encouraging and demonstrates that the new method can complement current approaches overcoming some limitations, such as the assumption of spatially homogeneous rainfall over the catchment and climate non-stationarity. Other benefits of the proposed new method include estimation of IFs in catchments subject to anthropogenic change, which at present can only be estimated using observed flows in naturalized catchments and require a correction to take into account the extent of urbanization. Here, the accuracy of IF estimates from G2G continuous simulation is shown to be relatively unaffected by catchment properties such as area and urban extent, indicating that the methodology is robust for a variety of catchment types, as long as the continuous hydrological simulation is able to take into account the many factors (natural and anthropogenic) affecting river flows.

Countries such as Britain, for which an extensive network of flow and rain gauges can support existing observation-based FEH methods, provide ideal test conditions for assessing the ability of alternative model-based flood estimation methods, such as continuous simulation from large-scale hydrological models, to underpin methods for flood estimation in data-sparse regions. It is to be hoped that the increasing availability and accuracy of global weather, climate and hydrological products can be used to develop a robust methodology to help engineers estimate design floods in regions with limited gauge data or affected by environmental change, potentially saving many lives.

Although we have shown that continuous hydrological models can be very useful in flood frequency estimation, the paper does not aim to completely replace the methods used in current practice. Finally, future work could address some of the limitations of the proposed methodology concerning: (i) the processes of flood formation in small or groundwater dominated catchments; (ii) the use of national sub-daily rainfall forcing data; and (iii) the use of methods for evaluating the uncertainty of hydrological models.

## ACKNOWLEDGEMENTS

This work was supported by CEH National Capability funding. The authors are grateful to the Editor and to the two anonymous reviewers for their helpful comments.

## REFERENCES

*Flood Estimation Handbook*(FEH)