## Abstract

Non-stationary methods of flood frequency analysis are widespread in research but rarely implemented by practitioners. One reason may be that research papers on non-stationary statistical models tend to focus on model fitting rather than extracting the sort of results needed by designers and decision makers. It can be difficult to extract useful results from non-stationary models that include stochastic covariates for which the value in any future year is unknown. We explore the motivation for including such covariates, whether on their own or in addition to a covariate based on time. We set out a method for expressing the results of non-stationary models as an integrated flow estimate, which removes the dependence on the covariates. This can be defined either for a particular year or over a longer period of time. The methods are illustrated by application to a set of 375 river gauges across England and Wales. We find annual rainfall to be a useful covariate at many gauges, sometimes in conjunction with a time-based covariate. For estimating flood frequency in future conditions, we advocate exploring hybrid approaches that combine the best attributes of non-stationary statistical models and simulation models that can represent changes in climate and river catchments.

## HIGHLIGHTS

We explore why and how to include physical variables as covariates in statistical models of flood frequency.

We develop and illustrate methods for extracting flow estimates from such models so that practitioners can obtain useful results.

Practitioners now have tools and guidance to apply non-stationary methods for flood management in England.

## INTRODUCTION

As the impacts of environmental change become increasingly evident in time series, non-stationary methods of frequency analysis have become widespread in research. Examples include analysis of rainfall frequency, extreme sea levels, and fluvial flood frequency, which is the primary focus of this article.

The application of non-stationary methods by practitioners involved in engineering and environmental management is much more limited. We surveyed practitioners and flood management authorities in seven countries, asking whether their currently recommended methods of flood and/or rainfall frequency estimation accounted for non-stationarity over the period of measurements (Luxford & Faulkner 2020). Apart from methods that account for the sea level rise, non-stationary methods of flood frequency estimation were not mandated by flood management authorities or generally used by practitioners in the United States, Canada, Australia, the Netherlands, Germany, or (until 2020) the United Kingdom. The only example that we found of them being used by practitioners before 2020 was in Switzerland, where the Federal Office for the Environment has fitted a range of non-stationary models to peak flow data from many catchments (BAFU 2017).

There are many possible reasons why the large body of research on non-stationary methods is not much implemented in practice. François *et al.* (2019) suggest that these include the proliferation of approaches and the challenges of diagnosing and modelling drivers and effects of change. Here, we focus on another barrier to implementation: the difficulty in extracting useful results from some non-stationary statistical models of flood frequency. Research papers tend to focus on fitting non-stationary statistical models rather than extracting the sort of results needed by designers and decision makers (Schlef *et al.* 2023). There is a particular difficulty in extracting results from models that use stochastic physical variables as covariates. This difficulty is well illustrated in a paper by Hesarkazzazi *et al.* (2021) who present results from non-stationary models of flood frequency in north-west England that show jumps from year to year, conditional on the accumulated rainfall during the year. Results such as these cannot be directly used in flood risk management because every year the rainfall will be different. The authors identified a need for further research into defining a frequency distribution for the covariates when introducing an extra stochastic component into a model. Similar comments apply to the papers by Šraj *et al.* (2016) and Chen *et al.* (2021b).

The motivation for including physical covariates is worth exploring further. The most common approach to non-stationary flood frequency estimation is to model the non-stationarity only as a function of time (Hesarkazzazi *et al.* 2021). However, many authors include physical quantities as covariates such as annual rainfall (Yan *et al.* 2017; Chen *et al.* 2021b; Hesarkazzazi *et al.* 2021), extreme rainfall (Prosdocimi *et al.* 2014), temperature (Hesarkazzazi *et al.* 2021; Wasko 2021), urban extent (Prosdocimi *et al.* 2015), or climatic indices such as the North Atlantic Oscillation (NAO) or East Atlantic (EA) pattern (Steirou *et al.* 2019).

One of the two reasons is typically given for using physically based covariates:

- A.
They help remove some of the year-to-year variability in maximum flows, enabling better identification of time-based trends and better fit of the distribution (Prosdocimi

*et al.*2014; Hesarkazzazi*et al.*2021). - B.
They provide a more physically meaningful model of non-stationarity since time itself has no physical influence on flooding (Chen

*et al.*2021b). As a covariate, time is merely a substitute for some physical quantity that is influencing floods. Some physical covariates may open up the prospect of predicting the future evolution of the flood frequency curve (Šraj*et al.*2016).

Reason A leads to models that include both time and physical quantities as covariates. One important consideration is the effect of collinearity and confounding variables on results. The greater the dependence between covariates, the harder it is to interpret the regression coefficients. When time is one of the covariates, collinearity can be avoided by detrending the physical covariates before inclusion in the statistical model. The time covariate will then represent the presence of any temporal trend in the flood peak series.

Reason B tends to lead to an approach in which the physical variables replace time as a covariate. To model temporal trends in flood series, it is then necessary to include at least one physical covariate in the model which exhibits a time trend. If this approach is to be used to understand future changes in flood risk, there is a need to model the trend in that covariate. The hope is that a covariate can be found for which the trend is easier to model than that in the flood series, perhaps because it has less variability and more predictability into the future. An example might be the extent of urbanisation in a catchment, which can be typically expected to show a monotonic increase over time and can be reasonably predicted into the future under a range of scenarios.

A risk associated with the application of this second approach is the confusion of correlation for causation. As discussed in Section 2, even apparently plausible physical covariates do not necessarily give a full explanation of trends. This can lead to a false sense of confidence about our ability to estimate the future evolution of the flood frequency curve: we might end up with a covariate for which we can confidently predict future values, but which is no more useful than the date as a way of explaining observed trends in flood magnitudes.

This potential for misinterpretation has led some authors to criticise the move to non-stationary frequency modelling or to express caution. The arguments are summarised in the study by Faulkner *et al.* (2019).

There is also a need to quantify the uncertainty associated with the results of non-stationary models, which can be expected to be larger than the uncertainty of stationary estimates of flood frequency (Serinaldi & Kilsby 2015).

Another challenge in implementing non-stationary models in practice is the difficulty of representing future conditions. Schlef *et al.* (2020) provide a classification of approaches for projecting future flood hazards, including trend informed, climate informed, and hydrological simulation. Trend informed refers to non-stationary statistical models with time as the only covariate. It is difficult to justify using such models to predict future trends. Both the climate-informed and hydrological simulation approaches make use of projections from physics-based climate models. The climate-informed approach uses these projections in non-stationary statistical models with climatic covariates. Hydrological simulation is the conventional approach using rainfall–runoff modelling with projected climate inputs, typically carrying out flood frequency analysis for quasi-stationary subdivisions of the climate model output (Schlef *et al.* 2023).

One argument in favour of the climate-informed approach is that it can use climatic variables related to large-scale oceanic-atmospheric patterns, such as mean annual temperature or rainfall, that global and regional climate models can predict more accurately than local short-duration rainfall intensity (e.g., Šraj *et al.* 2016; Schlef *et al.* 2020). Wasko (2021) suggests estimating future flood frequency using non-stationary models with climatic covariates projected using global climate models and the covariates being attributed to observed changes. One such suggested covariate is the product of monthly temperature and monthly rainfall (Towler *et al.* 2010).

On the other hand, even with physically plausible covariates, climate-informed statistical models cannot distinguish correlation from causation (Slater *et al.* 2021a). Such covariates may not necessarily be the sole or main driving mechanism for floods in a particular catchment, but if the covariate is increasing along with the floods, there may be a correlation. For example, although climate change is expected to affect annual rainfall, a common covariate, it can also be expected to influence other factors that control flood magnitudes such as storm tracks, rainfall intensity, and evapotranspiration (which influences soil moisture). These effects cannot be easily represented by the relatively simple covariates used in the climate-informed approach. Hence, it is necessary to demonstrate a mechanistic link between the covariate(s) and flood magnitudes (Wasko *et al.* 2021).

Some similar arguments can be made against the current generation of the hydrological simulation approach. For example, the effects of climate change on storm intensity are not represented in the hydrological simulations that were used to derive current guidance on the impacts of climate change on peak river flows in England. Kay *et al.* (2021) used changes in monthly rainfall and potential evaporation, running simulation models at a daily time step and assuming that daily rainfall scales with the change in monthly rainfall. Since most catchments in England have a critical storm duration of hours up to a few days, changes in monthly rainfall may not be well representative of the intensification of flood-producing rainfall on most rivers.

The hydrological simulation approach can lead to derivation of change factors which practitioners can apply to estimate flood frequency made for a baseline period. For example, in England, change factors vary with river basin, with epoch (2020s, 2050s, or 2080s) and are provided for a range of percentiles which express some of the uncertainty in the climate projections (Environment Agency 2021). One difficulty with this approach is that it tends to assume a stationary baseline period, i.e., climate change is treated as a purely future phenomenon. This assumption is difficult to reconcile with the provision of change factors that show (mainly) increases in peak flows for the present epoch, the 2020s, in comparison with an earlier baseline period.

In the discussion presented in Section 6, we make some suggestions for reconciling the two approaches to estimate the future flood frequency.

In Section 2, we first outline the types of non-stationary models of flood frequency and how they were fitted. We then set out ways of applying such models that can provide the type of information needed by practitioners. We illustrate these new approaches with an application in England and Wales. The remaining sections present the results from this application, discuss our findings, draw conclusions, and point to future steps.

The research originated within a project funded by the Environment Agency in England that produced methods, tools, and guidance to equip practitioners to carry out non-stationary analysis of fluvial flood frequency (Faulkner *et al.* 2020), although the analysis and results presented here include additional model types and findings.

## METHODS

### Non-stationary flood frequency models including physical covariates

The method described here is based on fitting frequency distributions to annual maximum river flows. All results shown here are for the generalised extreme value (GEV) distribution. It is fitted using maximum likelihood estimation.

Thus, for a non-stationary fit, there are two or more elements of the location parameter to estimate a constant component and , , which represent the influence of the covariates on the parameter and the same for the scale parameter.

Physical covariates were included in candidate non-stationary models in accordance with the following:

Up to two covariates per model, with a maximum of one being a physical covariate. Collinearity between the physical covariates was a reason to avoid combining them, as was a need to keep the number of candidate models manageable. For other applications in which covariates describing catchment land cover are combined with climatic covariates, it may be reasonable to allow combinations of physical covariates.

Covariates were considered for modelling either or both of the location and scale parameters.

Models with different physical covariates for the location and scale parameters were not allowed, given the theoretical way by these two parameters link to the properties of the underlying distribution of the hourly river flow data.

- (1)
a stationary version,

- (2)
versions with just time as a covariate,

- (3a)
versions with just one detrended physical covariate,

- (3b)
versions with just one physical covariate as measured (without detrending), and

- (4)
versions with both time and a linearly detrended physical variable.

At gauges where version 3a fits best, it can be inferred that flood magnitude is statistically associated with the value of the (detrended) physical covariate, without any temporal non-stationarity. This can be regarded as a stepping stone to aid understanding, rather than an end in itself, because there is no need to detrend the physical covariate if it is not being combined with time as a covariate. It would seem odd if year-to-year variations were captured by a detrended physical covariate without the long-term changes in that covariate (before detrending) also being important. The value of version (3a) is that it can help to disentangle the effect of any long-term trend from shorter-term cycles or fluctuations in the physical covariates. Comparing the various models helps us to assess if it is the specifics of the physical covariate or simply that it maybe approximately linear over time which is causing the physical covariate to appear statistically significant. For example, if version (3b) performs better than (4), then this may indicate that the same physical covariate is capturing both inter-annual variations and a long-term trend.

The best-fitting model was initially judged by the lowest Bayesian Information criterion (BIC). Before results were extracted from the models, any gauges where a version of model 3a had the lowest BIC were reallocated with the version 3b model that had the lowest BIC.

### Extracting results from non-stationary flood frequency models

Concepts such as return period or annual exceedance probability (AEP) become unwieldy in a non-stationary setting, and various alternative definitions and measures of probability have been proposed. We use the concept of the encounter probability, which is the probability of an event occurring over a defined number of years, such as the expected design life of a flood defence scheme. It is equivalent to the design life level proposed by Rootzén & Katz (2013).

Flood frequency estimates from a non-stationary distribution may change over time and may also depend on any physical covariates. For instance, if the only covariate is annual rainfall, then the flow with a specified AEP given 1,000 mm of rainfall is the flow expected to occur with that AEP under the (clearly hypothetical) conditions that the annual rainfall is always 1,000 mm. We refer to this quantity as the conditional flow estimate.

The conditional flow estimate may be useful when examining the probability of past floods, but it is less informative for design and planning for the present-day or future floods. We introduce the term ‘integrated flow estimate,’ which removes the dependence on a particular value of the covariates. It is defined as the return level corresponding to the encounter probability averaged over covariates in a period of interest. The calculation method is set out below. The concept was introduced by Eastoe & Tawn (2009), who refer to it as the marginal return level. This article describes the first application of the concept of the marginal return level to flood frequency analysis, following a suggestion by Faulkner *et al.* (2019).

This section sets out methods for calculating both conditional and integrated flow estimates, in each case for a non-exceedance probability value of *p*.

‘Let be the conditional distribution function of annual maximum flow, where

xis a vector of covariate values for the year in which the annual maximum is considered, and is the vector of parameters of the distribution, including the coefficients that relate the distribution parameters to the covariates.’

‘Let be the conditional flow estimate for probability

p(the conditionalpth quantile), i.e. conditional on a particular set of valuesxfor the covariates.’

‘Let be the distribution function for annual maximum flow in the period of interest (e.g. current or future time window typically exceeding a year in duration – such as over a design lifetime), where is the vector of parameters of the distribution. This distribution function does not depend on the covariate values of the particular year in which the annual maximum occurs but incorporates the distribution of the covariates over the period of interest.’

‘Let be the integrated flow estimate for probability

p(the marginalpth quantile).’

‘Let be the joint density function of the covariates for the time period of interest, so this distribution could change depending on whether the period of interest is the current or a future time window.’

‘Let be the domain of the covariates.’

The integrated flow estimate for the period of interest is then , such that .

The integrated flow estimate for a year in the period of interest is obtained by inverting the distribution function: . This needs to be solved numerically.

In practice, the parameters of this distribution are not known nor are the true density of the covariates, so to obtain an integrated flow estimate, they need to be replaced with estimates in Equation (4). Here, is estimated by , the maximum likelihood estimate of the parameters, and the joint density of the covariates needs to be replaced by some estimate , for example, the empirical density of the data or a kernel density estimate, which smooths the empirical estimate (Silverman 1998). The data used to construct this kernel density estimate depend on the period of interest.

The integrated flow estimate for probability *p*, , is found from .

Similarly, the conditional flow estimate for probability *p* is obtained from .

The integrated flow estimate should be understood as applying over a period rather than instantaneously. This is a useful concept for planning investment decisions in flood risk management.

If the covariates include both time and physical variables, it is possible to calculate an integrated flow estimate by averaging the probabilities corresponding to the observed physical covariate values, but setting the time covariate to a single value, such as the final year of record. This gives what we term a single-year integrated flow estimate. If the river flow record runs up to the present day, this estimate is representative of the flow expected to be exceeded with a given probability under current conditions, without being conditional on any particular value of a covariate. The single-year integrated flow estimate can be more easily compared with alternative estimates such as those from a model that uses only time as a covariate.

A method for calculating confidence limits of the estimates is presented in the study by Faulkner *et al.* (2020).

### Application in England and Wales

#### Dataset and screening

The method described earlier was applied to the annual maximum flow series at a set of 375 river gauging stations in England and Wales. The dataset was screened to eliminate gauges where apparent non-stationarity might arise from changes in the flow measurement method or the local hydraulics. Details of the screening and of the final dataset are given in Faulkner *et al.* (2020), which also presents the findings of non-parametric testing for trends and change points. These are not included here since our focus is on parametric models of flood frequency and in particular the role of physical covariates.

The range of record lengths was from 27 to 134 years, with a median of 48 years. Data used for model fitting were restricted to the period 1950 onwards, and the start year of the atmospheric circulation index datasets was used.

#### Choice of physical covariates

The following seven variables were selected as trial physical covariates:

Catchment-average rainfall, calculated over the water year (October to September), the autumn, and the winter seasons.

NAO index, averaged over the winter, summer, and autumn.

EA pattern index, averaged over the winter.

Although some studies have used indices of extreme rainfall as covariates (e.g. Chen *et al.* 2021b), we excluded them to avoid shifting the problem into estimating the frequency distribution of an extreme value elsewhere in the hydrological cycle. We also considered only covariates that are expected to be significant across many catchments in preference to those that represent locally specific effects such as urbanisation or changes in forest cover.

Rainfall accumulations were calculated from the Centre for Ecology and Hydrology - Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset which provides daily rainfall on a 1 km grid across the United Kingdom from 1890 (Tanguy *et al.* 2016). NAO and EA indices were obtained from NOAA (2019). Covariates were centred and scaled, subtracting the mean from each observation and dividing the result by the standard deviation.

#### Trends in physical covariates

All covariates were significance tested for trend using the Mann–Kendall trend test, which is recommended for trend detection by the World Meteorological Organization (2009). It is a non-parametric test which is based on the ranks of the data. A full description is given by the World Meteorological Organization (2009). The test was applied over the specific period of record of each individual gauging station. Table 1 summarises the significance of the trends. A large majority of stations show no evidence of significant trends in annual, autumn, or winter rainfall over their catchment at the 5% significance level. At 13% of stations, mostly in the north of England, there is a positive trend in annual rainfall, at the 5% significance level. Trends in the climatic indices NAO and EA depend entirely on the period of record covered by the gauging station since the magnitude of these indices does not vary spatially. A positive trend in winter NAO is common for gauges with records starting in the 1950s–60s. Most stations apart from those starting before 1955 correspond with a period of (weaker) negative trend in the summer NAO.

Covariate . | % of stations with null hypothesis (not trend) not rejected . | % with null hypothesis rejected and positive trend . | % with null hypothesis rejected and negative trend . |
---|---|---|---|

Water year rainfall | 87 | 13 | 0 |

Autumn rainfall | 100 | 0 | 0 |

Winter rainfall | 93 | 7 | 0 |

Winter NAO | 43 | 57 | 0 |

Summer NAO | 10 | 0 | 90 |

Autumn NAO | 100 | 0 | 0 |

Winter EA | 25 | 75 | 0 |

Covariate . | % of stations with null hypothesis (not trend) not rejected . | % with null hypothesis rejected and positive trend . | % with null hypothesis rejected and negative trend . |
---|---|---|---|

Water year rainfall | 87 | 13 | 0 |

Autumn rainfall | 100 | 0 | 0 |

Winter rainfall | 93 | 7 | 0 |

Winter NAO | 43 | 57 | 0 |

Summer NAO | 10 | 0 | 90 |

Autumn NAO | 100 | 0 | 0 |

Winter EA | 25 | 75 | 0 |

#### Model fitting

At each gauging station, all five versions of the GEV models described in Section 2 were fitted, i.e., one stationary model, one with time as a covariate and three versions with various implementations of physical covariates, some in conjunction with time. For these latter three, each of the seven physical covariates was tested. Covariates were applied to either or both of the location and scale parameters.

This resulted in 109 candidate models for each gauging station, which were then compared. The calculations used the R package nonstat (Warren & Longfield 2020).

## RESULTS

In presenting the results, we start by asking how much improvement in fit results from adding physical covariates to models versus modelling non-stationarity only as a function of time. We examine which model versions are preferred most often across England and Wales and investigate any geographical patterns or relationships with catchment properties. We then move on to ask which physical covariates appear in the preferred models, and again whether there are geographical patterns in this aspect of the results. Finally, we present flow estimates extracted from the preferred models, first at two example sites and then a comparison at the national scale between the preferred models and stationary models, including an assessment of confidence intervals.

### Preferred model versions

Model version . | (1) Stationary: No covariates . | (2) Non-stationary: time the only covariate . | (3a) Temporally stationary with detrended physical covariates (subsequently replaced with 3b) . | (3b) Non-stationary: measured physical covariates . | (4) Non-stationary: time and detrended physical covariates . |
---|---|---|---|---|---|

% of gauges | 2% | 1% | 26% | 48% | 24% |

Model version . | (1) Stationary: No covariates . | (2) Non-stationary: time the only covariate . | (3a) Temporally stationary with detrended physical covariates (subsequently replaced with 3b) . | (3b) Non-stationary: measured physical covariates . | (4) Non-stationary: time and detrended physical covariates . |
---|---|---|---|---|---|

% of gauges | 2% | 1% | 26% | 48% | 24% |

The results show that physical covariates (versions 3 and 4) give an improvement in fit over the stationary model at 97% of gauges. A large majority of these (74% of all gauges) were version (3): models without time as a covariate. In contrast, at most gauges, version (2) models with time as the only covariate give a slightly worse fit than the stationary model. There are some exceptions where version (2) models give a substantial improvement in fit. It is clear that physical covariates are adding useful information in nearly all cases. The increase in model complexity is outweighed by the increase in goodness of fit.

These findings indicate that regressing against covariates which aim to pick up only linear trends ignores more statistically important additional sources of inter-annual variation. Even when linear trends are removed from covariates (model 3a), there is a near-universal improvement in model fit compared with the stationary model. This is an important finding which hints at a causal relationship between the physical covariates and flood flows.

Figure 1 shows little difference in the fit of model versions (3a), (3b), and (4), although (3b) more commonly has the lowest BIC (Table 2). If the selection of models is limited to only (3b) and (4), at 64% of gauges, version (3b) fits better than (4). This is as expected, indicating that the linear trends captured by the time covariate included in version (4) can be often sufficiently represented by progressive changes in a physical covariate. Version (3b) has an advantage of simplicity over (4), which has more covariates.

Of versions (3a) and (3b), version (3b) shows the better fit at 71% of gauges. Again, this is expected to be the case, given the tendency for most gauges to show some progressive trend in annual maximum flow, and version (3b) is more likely to be able to capture than the detrended physical covariate used in version (3a).

*t*-test found no significant differences, at a 5% significance level, in the mean of any of the characteristics tested: catchment area, mean gradient, mean annual rainfall, soil type, urban extent, influence of lakes, and extent of floodplains. It appears that there is no consistent physical difference between catchments for which non-stationarity can be modelled using the chosen physical covariates and those for which an additional unknown covariate (represented by water year) adds useful information to the model. Neither is there any clear difference in the locations of these two groups of catchments (Figure 2).

All the following discussion considers results only from the preferred model at each gauge. As mentioned earlier, gauges where a version of model 3a had the lowest BIC were reallocated with the version 3b model that had the lowest BIC.

### Preferred physical covariates

When physical covariates are selected, the annual rainfall is by some way the most common choice (Table 3). It is included as a covariate for the location parameter at 66% of all gauges, and for the scale parameter at 22% of all gauges (the scale is modelled as a constant, with no covariate, at most gauges). In some cases, annual rainfall is included alongside time as a covariate, and in others, it is the sole covariate.

Covariates for the location parameter . | Covariates for the scale parameter . | ||||
---|---|---|---|---|---|

Rank . | Covariate . | % of gauges where chosen . | Rank . | Covariate . | % of gauges where chosen . |

1 | Annual rain | 66 | 1 | None (i.e., parameter is constant) | 62 |

2 | Winter rain | 22 | 2 | Annual rain | 22 |

3 | Time | 15 | 3 | Time | 13 |

4 | Autumn rain | 5 | 4 | Winter rain | 4 |

Covariates for the location parameter . | Covariates for the scale parameter . | ||||
---|---|---|---|---|---|

Rank . | Covariate . | % of gauges where chosen . | Rank . | Covariate . | % of gauges where chosen . |

1 | Annual rain | 66 | 1 | None (i.e., parameter is constant) | 62 |

2 | Winter rain | 22 | 2 | Annual rain | 22 |

3 | Time | 15 | 3 | Time | 13 |

4 | Autumn rain | 5 | 4 | Winter rain | 4 |

*Note*: Percentages in the columns can add up to more than 100, because models can include both time and a physical covariate together.

The second most useful physical covariate was the winter rainfall, chosen at 22% of gauges for the location parameter and 4% for the scale parameter.

There is also an association between the choice of physical covariate and the average annual rainfall of the catchment (Figure 3(b)). For example, where covariates other than annual, winter, or autumn rainfall are preferred, this tends to be on catchments with high rainfall.

### Role of time as a covariate

When the water year is included as a covariate, the sign of its regression coefficient indicates the effect that it is having on the estimated flows. In most cases (83%), the coefficient on the location parameter is positive, so the magnitudes of flood flows are modelled as increasing over time. For the scale parameter, the picture is more mixed: the coefficient is positive in 58% of cases. At these stations, the variability of flood flows is modelled as increasing over time.

### Example results from non-stationary models

*y*-axis.

The covariates for the preferred model at Leven Bridge (Figure 4(a)) are water year and annual rainfall for the location parameter, with the scale parameter modelled as a constant. There is an upward trend in the location parameter. The non-stationary flow estimates are distinctly higher than the stationary, especially for low probabilities. They also have much wider confidence limits. The highest flood on record, 125 m^{3}/s, is associated with an encounter probability over the 48-year record length of 38% according to the stationary results and just under 80% according to the non-stationary results.

The results at Bedburn (Figure 4(b)) are quite different. The covariates here are water year and autumn NAO for the location and water year for the scale, with a strong upward trend for both parameters. The physical covariate appears to add little to the quality of model fit since the model with only water year achieves a similar BIC. For this catchment, the non-stationarity is thought to be due mainly to changes in land cover, specifically afforestation and felling. For all except low probabilities, the non-stationary estimates are lower, and with narrower confidence limits, than the stationary. Findings of this type are to be anticipated as the inclusion of covariates reduces variation which the stationary model takes as implying that the data have a heavier tail (Carter & Challenor 1981).

At both stations, the non-stationary estimates lie within the 90% confidence interval of the stationary results.

### Results on a national scale

For all three AEPs shown in Figure 5, the median ratio of flow estimated from the preferred model (where it is non-stationary) to the flow from the stationary model is close to 1. The ratios can, however, be much larger, particularly for the 1% AEP (typically used as a standard for spatial planning in England). At one gauge, the non-stationary flow estimate is nearly seven times larger than the stationary estimate. This particular example is due to poor estimation of the shape parameter at a short record (33 years). It is also possible that flow estimates from the preferred non-stationary models can be smaller than stationary estimates, with the minimum ratio for the 1% AEP being 0.54.

At most stations, the flow estimates from the preferred model fall within the 90% confidence interval of the stationary estimates. This is the case at 77% of stations for an AEP of 50% and at 87% of stations for the 1% AEP. At most of the other stations, the non-stationary estimates lie above the 90% confidence interval of the stationary. Likewise, at most stations, the estimates from the stationary model fall within the (typically wider) 90% confidence interval of the non-stationary estimates. This is the case at 82% of stations for AEP 50% and at 90% of stations for AEP 1%.

## DISCUSSION

### Model fitting and selection

One reason why the non-stationary models occasionally give results much larger than the stationary may be poor estimation of the shape parameter, particularly at sites with a shorter record. At all four stations where the non-stationary estimate at the 1% AEP is more than twice the stationary estimate, the scale parameter is modelled as non-stationary. Estimation of the shape parameter could be improved by adding a penalty to the likelihood function to constrain the range of shape parameters such as the ‘geophysical prior’ of Martins and Stedinger, 2000. Although this was attempted and not found to be helpful, further research could look at improved implementations.

A possibility for future investigation would be to fit models using regularisation methods such as LASSO (least absolute shrinkage and selection operator) regression. These could be a promising way of fitting more complex models without having to try large numbers of combinations of covariates, and of avoiding difficulties with collinear covariates. However, LASSO could still have problems picking between causal and non-causal covariates when they are all strongly collinear.

For site-specific studies that lead to decisions about investment in flood management measures, it is important to take a more nuanced view of model selection, accounting for other types of information in addition to the BIC measure of model fit (François *et al.* 2019). These include other statistical measures such as Akaike Information Criterion (AIC) and likelihood ratios, inspection of model fit, consistency of model form between locations, hydrological reasoning, confidence limits, and sensibility comparisons between the flow estimates and the peak flow data (Faulkner *et al.* 2020).

### Looking to the future

Decision makers need information on future flood frequency. In theory, the integrated flow estimate could be calculated by averaging over a distribution of covariate values intended to represent future conditions. As discussed earlier, this climate-informed approach is only valid if the physical covariates provide a complete causal description of the non-stationarity in peak flows, which is unlikely to be the case.

It would be desirable to have a hybrid approach that could draw on the strengths of both the climate-informed and hydrological simulation approaches (Schlef *et al.* 2020, 2023). One possibility would be to use the outputs of hydrological simulation as a covariate in climate-informed statistical models, for example, using soil moisture as a covariate (Tramblay *et al.* 2014). A hybrid approach might seamlessly model both past and projected future non-stationarity and account for the impact of climate change on localised, short-term rainfall intensity as well as on long-term rainfall and soil moisture. It should also allow for the possibility of non-stationarity due to land use change.

There are some advantages to fitting statistical models to the output of physics-based climate models rather than only to observations. These include more confidence in identifying causal relationships and the ability to separate forced and stochastic components of the signal (Vecchi *et al.* 2011). Slater *et al.* (2021b) promote the so-called hybrid statistical-dynamical modelling as a way of taking advantage of the ability of physical models to predict and explain large-scale phenomena and the strengths of non-stationary statistical models to estimate probabilities of extreme events.

Such a hybrid approach would depend on having a coupled climate-catchment model that was capable of accurately reproducing the evolution of the statistical characteristics of floods over a period of time equivalent to typical gauged river flow record lengths, and into the future as well. To achieve this on many UK catchments, the climate model would need to run at a high resolution to resolve local-scale atmospheric convection. The UK Climate Projections (UKCP) convection-permitting model (Chen *et al.* 2021a) meets this requirement and is capable of running for time slices covering several decades. The statistical component of a hybrid model might benefit from including a covariate that functions as an indicator variable to distinguish between observed (past) and modelled (future) data. Brown *et al.* (2014) applied this technique when developing a non-stationary statistical model fitted to both observed rainfall and modelled projections of future rainfall.

## CONCLUSIONS

This research has made it possible to apply non-stationary flood frequency analysis incorporating physical covariates to some practical problems in flood risk management, although other barriers remain to widespread application (c.f. François *et al.* 2019), including the need for a method that can be applied at ungauged locations.

We have shown that incorporating stochastic physical covariates improves the fit of flood frequency models for nearly all river gauging stations in England and Wales. Annual rainfall and winter rainfall are the most commonly selected covariates out of those tested. The findings are consistent with those of Chen et al*.* (2021b) that annual rainfall tends to be the best covariate in lowland areas of England, with the annual maximum daily rainfall being preferred in the rest of the United Kingdom where catchments tend to be dominated by low permeability bedrock and so sensitive to high-intensity rainfall.

Importantly for a model to be used in prediction (Wasko *et al.* 2021), we have found evidence of a causal relationship between the physical covariates and flood flows, shown by the fact that even detrended covariates give a near-universal improvement in model fit compared with the stationary model.

At about a quarter of stations, the best-fitting model is a non-stationary version that had time as a covariate in addition to detrended physical covariates. By integrating over the distribution of the covariates, it is possible to extract estimates from such models that define the flow for a given encounter probability over the period of record (the integrated flow estimate). It is also possible to calculate the integrated flow estimate by integrating over a sample of covariates that spans a period different from that covered by the river flow data. For instance, the record of the covariates might be longer, enabling a more confident estimate of the distribution of the covariates.

If the integration is only over the physical covariates, the estimate can define the flow for a given AEP in a given year, e.g., the most recent year of record (the single-year integrated flow estimate). This novel concept can be a useful quantity for flood risk managers or insurers who need an updated estimate of flood hazards that avoids having to assume stationarity. More generally, integration could be restricted to stochastic physical covariates, as opposed to any others whose behaviour is reasonably predictable such as urban cover.

Although the best-fitting non-stationary models often give results similar to those of stationary models, in some cases, their results are substantially higher or (less often) lower, occasionally falling outside the 90% confidence interval of the stationary estimates. Despite the typically wide confidence intervals associated with non-stationary models, occasionally the stationary estimate falls outside the 90% non-stationary confidence interval. It appears that, while uncertainty is certainly a major feature of extreme value distributions, it does not always obscure the signal of non-stationarity (c.f. Serinaldi & Kilsby 2015).

The methods described in this article are now available to practitioners in the form of a user-friendly R package and guidance document, and have been used since 2020, alongside conventional stationary methods, in the planning and design of flood alleviation schemes in England.

## CODE AVAILABILITY

The procedures for non-stationary flood frequency estimation are implemented in the R package *nonstat*, available from https://www.gov.uk/flood-and-coastal-erosion-risk-management-research-reports/development-of-interim-national-guidance-on-non-stationary-fluvial-flood-frequency-estimation.

## ACKNOWLEDGEMENTS

The development of the methods described here was funded by the Environment Agency. Peak flow data were obtained from the UK National River Flow Archive and the Environment Agency.

## AUTHOR CONTRIBUTIONS

S. L. commissioned and reviewed the research, D. S. F. led the research team and prepared the manuscript with substantial contributions from all co-authors, S. W. developed the code, and J. A. T. advised on statistical aspects.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.