Abstract
Simulating rare, widespread hydrological events can be difficult even with modelled data such as the UKCP18 12 km regional climate projections. To generate larger event sets for application in catastrophe modelling, two statistical approaches are highlighted and applied to widespread flooding events across Great Britain (GB), using a grid-based hydrological model and the UKCP18 regional projections. An Empirical Copula method was applied on a national scale, generating over 600,000 events across two time-slices (1981–2010 and 2051–2080). This was compared to model-generated events and showed good matching across time-slices and ensemble members, although it was less able to describe the least-rare events. The Empirical Copula was also compared to an implementation of a conditional exceedance model. This model was much more computationally intensive, so its application was restricted to Northwest England, but it could be tuned more finely through the choice of marginal distributions. Analysing over 11,000 events, it also matched the Empirical Copula and model-generated events well, but under-represented the smallest events. Both approaches require a broad dataset to draw from but showed reasonable efficacy. For simple statistics, the Empirical Copula shows the potential to be a powerful tool for exploring spatial structure over large regions or at a fine spatial resolution.
HIGHLIGHTS
The Empirical Copula and the Heffernan–Tawn model can both be used effectively to generate national-scale extreme flooding event sets for risk analysis.
The Empirical Copula proved more stable over larger areas, owing to its much lower computational cost.
The methods were used to investigate and highlight differences in the spatial structure of widespread flooding events between the baseline (1981–2010) and future (2051–2080) time-slices.
INTRODUCTION
Quantifying risk due to extreme high-flow events on a regional or national scale requires knowledge of the correlation between floods occurring in many places at once. One common approach, often applied in catastrophe modelling, is to refer to a suite of observed or derived scenarios and measure the frequency of key features, including the loss associated with an event, to understand the return period or likelihood of an event generating a similar amount of loss. Sayers et al. (2002) discuss the benefits of a risk-centred approach to flood hazard management. However, naively using sets of observed events, or events modelled on current or previous climatic, meteorological, or hydrological conditions, may neglect the possibility of change over time. Climate change, and its impact on global hydrology, is a clear driver of this, and much work, such as the Third Climate Change Risk Assessment (Slingo 2021), makes use of long time series and ensembles of model runs to analyse future scenarios.
Flood risk assessments sometimes suffer from a lack of modelling of spatial dependence, because they comprise a collection of local-scale risk assessments rather than a harmonious national-scale one. Keef et al. (2009, 2011) and Villarini et al. (2011) showed that, in the UK and the United States, there is positive dependence between locations for the most extreme events, and so there has been a drive to perform risk analysis (via catastrophe modelling with event sets, or otherwise) on a national scale, using hydrological modelling to describe the impacts of flooding.
As discussed in Zanardo & Salinas (2022), a large event set is essential in catastrophe modelling to determine the relative frequency of events with different levels of risk or loss, as there is no established theoretical approach to modelling the frequency of any specific class of event. There are several approaches to generating a large event set for risk analysis. A perturbed parameter ensemble (PPE) can be used (Kelder et al. 2020; Kay et al. 2021) to cover a wide set of possible outcomes, especially if only one hydrological model is used. Alternatively, stochastic weather generators can use random processes and Monte Carlo approaches to generate a large number of possible rainfall events to feed into a hydrological model (Wright et al. 2020; Matte et al. 2021). Instead of directly using high-dimensional or high-resolution observed or modelled events and time series, various methods apply interpolation (Towe et al. 2016) or kriging (Laaha et al. 2014) to estimate likely values of flow between observed locations, preserving hydrological coherence.
Climate models, however, are often focussed on short time windows (<100 years), and therefore each model run is likely to contain only a very small number of the most extreme events, which can vary greatly in their characteristics (duration, area, peakedness), and so fail to capture the full range of possible extremes. To this end, this paper proposes two statistical approaches (as opposed to hydrological process-based models) specifically to generate a much wider range of extreme events for use in risk analysis. The focus here is on method comparison rather than on examining changes in spatial coherence over time. Two methods, the Heffernan–Tawn (HT) model and the Empirical Copula approach, are defined and applied on a national and regional scale across Great Britain, and a comparative analysis is undertaken using a grid-based hydrological model to produce a reference dataset that both seeds the two statistical methods and provides a benchmark against which to compare them.
DATA
UKCP18 climate data
UK Climate Projections 2018 (UKCP18) provides information on potential changes in a range of climate variables over the 21st century, via a number of different products (Murphy et al. 2018; Lowe et al. 2018). These data have previously been used to analyse multiple hazards in the UK and how they are expected to differ in the future due to climate change (Kay 2021; Kay et al. 2021; Arnell et al. 2021).
The UKCP18 Regional projections (Met Office Hadley Centre 2018) comprise a 12-member PPE of the Hadley Centre Regional Climate Model (RCM), nested in an equivalent PPE of the respective Global Climate Model (GCM). This work uses ensemble members 01–15, excluding 02, 03, and 14: these three GCM members have no RCM nested within them, so no regional projections exist for them. Ensemble member 01 uses the standard parameterisation. The ensemble covers the period December 1980 to November 2080 under an RCP8.5 (Representative Concentration Pathway) emissions scenario (Riahi et al. 2011). The rainfall, temperature, and potential evapotranspiration (PE) used in the present work are at a 12 km spatial resolution and a daily timestep, covering England, Scotland, and Wales, for a synthetic 360-day year (30 days per month). The data were available re-projected from the native climate model grid to a 12 km grid aligned with the GB national grid, with the exception of the PE data, which were computed from other UKCP18 variables using a modified Penman–Monteith approach (Hough et al. 1997). To allow easy comparability between time-slices, observed flow time series were not used in this paper, since these are not available for the 2051–2080 time-slice.
METHODS
Grid-to-Grid model
The Grid-to-Grid hydrological model (Bell et al. 2009) is a grid-based runoff-production and flow-routing model, in which flow is routed from grid-cell to grid-cell. It models river flow across Great Britain using meteorological, topographical, soil-type, and land-cover data, and requires input time series of gridded precipitation, daily maximum and minimum temperature, and PE. In total, 19,914 grid-cells were identified on the river network with at least 50 upstream grid-cells (equating to a theoretical catchment area of 50 km²). Daily flow time series were produced for these 19,914 grid-cells for the baseline (1981–2010) and future (2051–2080) time-slices.
The next step of the pipeline took these daily outputs of flow and extracted ‘widespread, extreme events’.
Reference event extraction
Full details are provided in the companion paper (Griffin et al. 2022b); a brief summary is provided here. At each grid-cell, a threshold (POT2) was chosen such that daily flow exceeded it on 2 days per year on average during the baseline time-slice. This corresponds to a daily probability of exceedance (PoE) of approximately 0.56%. The same threshold was used for both time-slices for comparability. Events were defined as periods of consecutive days during which more than 0.1% of the 19,914 river grid-cells across GB exceeded the threshold at the same time. PoEs were initially calculated using daily data. To convert from daily to annual return periods, a Poisson approximation was used based on a 360-day year: p_annual = 1 − exp(−360 · p_daily), giving an annual return period of 1/p_annual.
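As a minimal worked illustration of this conversion (a sketch; the function and variable names below are illustrative, not from the paper):

```python
import numpy as np

def annual_return_period(p_daily: float, days_per_year: int = 360) -> float:
    """Annual return period implied by a daily PoE, assuming exceedance
    counts follow a Poisson distribution over a synthetic 360-day year."""
    p_annual = 1.0 - np.exp(-days_per_year * p_daily)
    return 1.0 / p_annual

# POT2 threshold: exceeded on ~2 days per year, i.e. p_daily = 2/360 (~0.56%)
print(annual_return_period(2 / 360))  # ~1.16 years
```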
The events were then summarised by taking the maximum flow at each grid-cell over the duration of the event. Events were restricted to 2 weeks (14 days) to mitigate the possibility of two consecutive but independent events being counted as a single event. From this, annual PoEs and return periods were computed. To simulate national events, the Empirical Copula and the HT model need marginal distributions for each grid-cell that cover the entire flow regime. Since no single probability distribution can accurately cover the full regime of flow while giving appropriately heavy tails at the upper end, a mixed distribution was selected, as in Towe et al. (2019): an empirical distribution describes each time series for flows below the threshold, and a generalised Pareto (GPa) distribution describes flows above it. In the rest of this paper, event extent is defined only by exceedances, so the choice of an empirical sub-threshold distribution does not affect the analysis of area or peak return period.
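A minimal sketch of such a mixed marginal at one grid-cell is given below, assuming a GPa tail with shape xi and scale sigma above the threshold u; all names are illustrative, and this is not the authors' implementation:

```python
import numpy as np
from scipy.stats import genpareto

def fit_tail(flows: np.ndarray, u: float):
    """Fit a GPa to threshold excesses; the body below u stays empirical."""
    xi, _, sigma = genpareto.fit(flows[flows > u] - u, floc=0.0)
    return xi, sigma

def mixed_cdf(x: np.ndarray, flows: np.ndarray, u: float,
              xi: float, sigma: float) -> np.ndarray:
    """Empirical cdf below the threshold u; rescaled GPa cdf above it."""
    p_below = np.mean(flows <= u)                 # empirical mass below u
    emp = np.searchsorted(np.sort(flows), x, side="right") / flows.size
    gpa = p_below + (1 - p_below) * genpareto.cdf(x - u, xi, scale=sigma)
    return np.where(x <= u, emp, gpa)
```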
To ensure a good fit of the GPa distribution for the most extreme events, the 60 largest independent peaks at each grid-point were found using the peak-extraction algorithm used by the NRFA (https://nrfa.ceh.ac.uk/), which stipulates two requirements for peaks at a specific location to be independent:
Two peaks must be separated by at least three times the time-to-peak (the geometric mean of the times from the start of rainfall to maximum flow) at that location. As an approximation, this analysis assumed a time-to-peak of 7/3 days, so peaks had to be separated by at least 7 days.
The river flow between the two peaks must drop below two-thirds of the height of the lower peak.
Peaks were selected starting from the largest, excluding any dependent peaks before selecting the next largest remaining peak; a sketch of this greedy procedure is given below. For a specific event, the flood-extent area is the maximum number of 1 km² grid-cells simultaneously above their thresholds during that event.
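The following is an illustrative greedy implementation of the independence rules above (the 7-day separation and two-thirds trough rule follow the text; function and parameter names are ours, not the NRFA's):

```python
import numpy as np

def independent_peaks(flow: np.ndarray, n_peaks: int = 60,
                      min_sep: int = 7, trough_ratio: float = 2 / 3) -> list:
    """Accept peaks from largest downwards, rejecting any candidate within
    min_sep days of an accepted peak, or whose intervening flow never drops
    below trough_ratio times the lower of the two peaks."""
    candidates = [t for t in range(1, len(flow) - 1)
                  if flow[t - 1] <= flow[t] > flow[t + 1]]  # local maxima
    candidates.sort(key=lambda t: flow[t], reverse=True)
    accepted = []
    for t in candidates:
        dependent = False
        for s in accepted:
            lo, hi = min(s, t), max(s, t)
            trough = flow[lo:hi + 1].min()
            if hi - lo < min_sep or trough >= trough_ratio * min(flow[s], flow[t]):
                dependent = True
                break
        if not dependent:
            accepted.append(t)
        if len(accepted) == n_peaks:
            break
    return sorted(accepted)
```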
For regional events, the reference event set was divided by hydrometric area (Figure 2). In each region, only events which saw threshold exceedances in the specified region were kept, leading to a slightly smaller event set. Since events without exceedances do not contribute to either of the statistical models below, this reduced reference event set did not reduce performance.
Statistical event generation
In both of the following methods, events were derived by choosing a 'seed' event from the reference set, which was used to simulate a new event. Since these simulated events were not based on a full time series, and so are not tied to any specific time (other than baseline or future), the season and duration of each simulated event were assigned from its 'seed' event.
Empirical Beta Copula
The first approach implemented to generate new events on a national scale was the Empirical Beta Copula (EC) method (Rüschendorf 2009; Segers et al. 2017). This approach, which has not previously been used in hydrological modelling, is a fast method for generating widespread flooding events and scales well to very large datasets. Events in the 'reference' (G2G) dataset (of N events) are ranked at each grid-point i, so each grid-point has one rank r_{i,n} (1 to N) for each event n. Additionally, a mixed distribution with cdf G_i, as above, was fitted to the event set at each grid-point (not the whole time series). A set of N uniform random numbers u_{i,1}, …, u_{i,N} in [0, 1] is generated for each grid-point and ordered according to the ranks of the reference set. To simulate a new event, an event n is drawn uniformly at random, and the values x_i = G_i^{-1}(u_{i,n}) are the simulated flows for that event. Estimated PoEs are based on the mixed distributions F_i fitted to the whole time series. This procedure generated 600,000 events: two sets of 25,000 events (baseline and future) for each of the 12 ensemble members.
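A minimal sketch of this simulation step is given below (names are illustrative, not from the paper). It exploits the fact that the r-th smallest of N independent uniforms follows a Beta(r, N + 1 − r) distribution, which reproduces the Empirical Beta Copula of Segers et al. (2017) without generating all N uniforms explicitly:

```python
import numpy as np

def simulate_ec_events(X: np.ndarray, n_sim: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Empirical Beta Copula simulation. X has shape (N, d): N reference
    events by d grid-points, holding event-maximum flows. Returns uniform
    margins U of shape (n_sim, d); simulated flows then follow as
    G_i^{-1}(U[:, i]) using each grid-point's mixed marginal."""
    N, d = X.shape
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..N per grid-point
    seeds = rng.integers(N, size=n_sim)            # one seed event per simulation
    r = ranks[seeds]                               # (n_sim, d) seed-event ranks
    # r-th order statistic of N uniforms ~ Beta(r, N + 1 - r)
    return rng.beta(r, N + 1 - r)

rng = np.random.default_rng(1)
U = simulate_ec_events(rng.gamma(2.0, size=(500, 100)), n_sim=25_000, rng=rng)
```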
HT model
The HT model (Heffernan & Tawn 2004) is a conditional exceedance model, designed to describe the probability of one site experiencing a threshold exceedance given that an exceedance is occurring at another specific site. It was implemented using a version of the texmex package (Southworth et al. 2020), modified to accommodate the larger number of points being investigated; the number of parameters to be fitted grows quadratically with the number of grid-points, and the computational cost grows accordingly. In the present study, limitations on computing power restricted the work to the Northwest (NW) region shown in Figure 2.
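As a rough illustration of the simulation step only (a sketch, not the texmex implementation), the following assumes a single conditioning site on standard Laplace margins, with already-fitted dependence parameters a_j, b_j and empirical residual vectors Z; all names are illustrative:

```python
import numpy as np

def ht_simulate(a: np.ndarray, b: np.ndarray, Z: np.ndarray,
                v: float, n_sim: int, rng: np.random.Generator) -> np.ndarray:
    """Simulate the other sites given that site i exceeds v (Laplace scale):
    Y_j = a_j * y + y**b_j * z_j, with residual vectors z resampled jointly
    from the fitted model's empirical residuals Z (shape: n_fit x (d - 1))."""
    y = v + rng.exponential(size=n_sim)          # Laplace tail is Exp(1) above v > 0
    z = Z[rng.integers(Z.shape[0], size=n_sim)]  # bootstrap residual vectors
    return a * y[:, None] + y[:, None] ** b * z
```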
Using the HT method, 400 additional widespread events were simulated for the NW region for each ensemble member and each time-slice (12 × 2 × 400 = 9,600 events). Neal et al. (2013) undertook a similar exercise using a bootstrapping approach rather than an RCM ensemble.
RESULTS
National results
Regional case study
The EC method also appears to match well in terms of area and return period, although both methods struggle to simulate the 'smallest' extreme events, owing to the conditioning on at least one location exceeding its at-site threshold. The HT method matches the reference event set in terms of the spread of area-return-period combinations (Figure 8), but on each individual margin there are noticeable differences.
DISCUSSION AND CONCLUSIONS
Generating widespread extreme events for risk analysis is an exercise in optimisation: a trade-off between obtaining enough events to cover the full breadth of possibilities and having the time and processing power to achieve it. At one end of the scale is the UNSEEN programme (Kelder et al. 2020), which has made excellent use of the available processing power; at the other, risk analysts may simply use observed events, with immediate results but a narrower range of possibilities. This paper presents two options offering a half-way house, generating a wide set of events using a fraction of the processing power and time. For context, each 30-year hydrological model simulation takes on the order of a day, whereas the EC method can generate many more extreme events in a matter of minutes given the same computational resources.
Using the UKCP18 Regional projection data under an RCP8.5 scenario, the Grid-to-Grid hydrological model was run for the 12-member RCM ensemble for two time-slices: 1981–2010 and 2051–2080. To generate a greater number of possible widespread events, the Empirical Copula and the HT model were applied on a national and regional scale, respectively.
The analysis shows that there are some differences between the two statistical approaches in how well they match the G2G-derived reference event set (as close as one can get to a 'truth'). The HT approach (with a more parametric model characterising the inter-site extremal dependence) has the potential to work extremely well, if the best model can be identified (type of marginal, etc.). The standard R package for undertaking this modelling (texmex; Southworth et al. 2020) has limitations, but the model itself (as shown here and in Neal et al. 2013) has the potential to be effective. The EC approach (which directly uses an empirical view of the rank dependence structure) is better able to explore the state space of possible widespread extreme events. However, since choices must still be made (marginals and thresholds), using the EC approach to extrapolate far beyond the extremes of the reference event set is not recommended.
A fuller analysis of the differences between baseline and future behaviour of widespread extreme events is given in Griffin et al. (2022b), but a few key points can be observed here. First, there is a slight trend towards widespread events having a greater area but a similar maximum pointwise return period. On a national scale, there is also an increased occurrence of the most extreme events (with a return period exceeding 100 years, based on thresholds derived from the baseline period). However, on closer inspection of the NW region, the most extreme events in terms of return period do not change in distribution between baseline and future, suggesting that most of the change is in spatial spread or in the structure of the more frequent events. It may also suggest that the largest changes in widespread flooding dynamics occur outside the NW region.
The Empirical Copula approach can simulate a wide range of statistically possible widespread events (under various choices and assumptions), but it would be interesting, and prudent, to test the plausibility of such simulated events by using them as initial/boundary conditions for hydraulic models on a smaller scale, or by checking whether such events could be replicated directly by a hydrological or hydraulic model. There is already a natural extension of the HT model to multi-day events (Keef et al. 2009), so it would be of interest to see how well the Empirical Copula approach could be extended to such events. Applications to other types of hazards, such as widespread drought or direct analysis of extreme precipitation, could be fruitful. It would also be enlightening to investigate methods of incorporating non-stationarity into both the Empirical Copula and HT methods through the use of non-stationary marginals or copulas (Xu et al. 2019). This could, in turn, allow the use of design life levels for a better description of return periods under non-stationarity.
Future work could investigate whether the Empirical Copula and HT methods work as well on observed data; this was not investigated here, to preserve comparability between time-slices. Since the principles of the methods are independent of the hydrology, one would expect similar performance, but it would be enlightening to check whether the reduced variability of an observed event set would be detrimental to capturing the spatial dependence. Adjusting such an analysis for future scenarios is also a highly non-trivial extension.
ACKNOWLEDGEMENT
Funding for the project was provided through the UK Climate Resilience Programme, supported by UK Research and Innovation, the UK Met Office, and the Natural Environment Research Council.
AUTHOR CONTRIBUTIONS
E.S. and P.S. managed the project; A.K. ran the hydrological modelling; A.G. performed the event extraction and summary and the statistical analysis. All authors assisted in writing and editing the manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are available from the Environmental Data Centre (Griffin et al. 2022a).
CONFLICT OF INTEREST
The authors declare there is no conflict.