Use of historical data in flood frequency analysis: a case study for four catchments in Norway

There is a need to estimate design floods for areal planning and the design of important infrastructure. A major challenge is the mismatch between the length of the flood records and the required return periods. A majority of flood time series are shorter than 50 years, whereas the required return periods might be 200, 500, or 1,000 years. Consequently, the estimation uncertainty is large. In this paper, we investigated how the use of historical information might improve design flood estimation. We used annual maximum data from four selected Norwegian catchments, together with historical flood information that indicates the water levels of the largest floods in the last two to three hundred years. We assessed the added value of using historical information and demonstrated that both reliability and stability improve, especially for short record lengths and long return periods. In this study, we used information on water levels, which showed the stability of river profiles to be a major challenge.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/). doi: 10.2166/nh.2017.069
Kolbjørn Engeland (corresponding author), Donna Wilson, Péter Borsányi, Lars Roald, Erik Holmqvist. The Norwegian Water Resources and Energy Directorate, P.O. Box 5091 Majorstua, NOR-0301 Oslo, Norway. E-mail: koe@nve.no


INTRODUCTION
The motivation of this study is the need to estimate design Schendel & Thongwichian ) and/or paleohydrological data (e.g., Benito & O'Connor ); or (iii) use causal information, i.e., by combining precipitation statistics with precipitation-runoff models (e.g., Paquet et al. ).
In this study, we focused on the use of historical information. Such information might be included in at least three ways (Reis & Stedinger ; Gaál et al. ). For a certain period, we need to know either (i) the exact magnitude, (ii) the upper and lower limits, or (iii) the total since it is flexible with respect to the type of historical information that can be included. The Bayesian approach is also more flexible when, in a later stage, several data sources might be combined in a regional-historical flood frequency analysis.
Sources for historical data include: (i) annals, chronicles, memory books and memoirs; (ii) weather diaries;
Time-dependent thresholds can be used to meet this challenge (e.g., Brázdil et al. ). Second, flood generating processes might evolve in time due to changes in climate and land use, which can make the flood frequencies themselves non-stationary (Benito et al. ). Finally, changes in the river channel might limit the possibility of estimating the magnitude of historical floods (Brázdil et al. ). Several studies demonstrate that, provided the perception threshold is sufficiently high, it is sufficient to know the number of floods
When using historical data, it is important to set a reasonable length of the period for which we assume that all events above a threshold are known. Strupczewski et al. () assess the sensitivity of flood quantile estimates to the length of the historical period and show that, if one large historical flood event is known, the optimal choice is to set the start of the historical period so that this event is in the middle of the historical period. In Prosdocimi (), it is shown that this estimate is equal to the 'maximum spacing estimator', which sets the start of the historical period to be the time point that precedes the first historical flood event by the average time spacing between the historical events. The same approach is used in Schendel & Thongwichian (), the only difference being that both the historical and the systematic data are used to calculate the average time spacing between flood events.
The aims of this paper are (i) to assess the added value of using historical data for flood quantile estimation, and how this added value depends on data availability and on the estimated length of the historical period, and (ii) to demonstrate the use of historical information in selected Norwegian catchments.
To address the first aim, a 123-year-long annual maximum flood series was used, and it was evaluated how the added value of historical information depends on (a) the systematic record length relative to the length of the historical period, (b) the threshold for historical events, and (c) the estimated length of the historical period.
To address the second aim, four catchments were selected where epigraphic sources indicating the water levels of large historical floods close to gauging stations are available.
Guidelines and/or recognized approaches for the inclusion of historical information in flood frequency analysis have not yet been developed for Norway (Kjeldsen et al. ), and as far as the authors know, no scientific studies have addressed this topic for Norwegian catchments.
This paper continues with a presentation of the study area and data, including a brief description of the largest historical floods. The methodology used for parameter estimation and the evaluation strategy are detailed, before the results are presented and discussed. Finally, some conclusions are drawn.

Systematic flood observations
We extracted daily streamflow observations for our four study sites from the national hydrological database at NVE. In Figure 2, time-series of annual maximum floods (Q) are shown (in black) together with the historical floods (as circles). Table 1  the 1860 flood (Otnes ), but there are no flood marks for this event. Similarly, for these two locations, we used the oldest rating curve since the river profiles are relatively stable.
For Labru, descriptions of flood levels are found in Kristensen () and were also transferred to flood volumes using the oldest rating curve (Roald ). Table 2 below summarizes the available information for each of the historical flood events used in this study.

Historical river profile information
Bulken gauging station is located at the outlet of the lake Vangsvatnet. The water was originally conveyed into a channel with limited capacity, which caused large fluctuations of the water level in the lake upstream. To avoid this, the channel was deepened and widened in 1865, and assessments from the Norwegian channel directorate (Kanalvaesenet) indicate that the flood levels were reduced by up to 6 feet (1.9 meters) (Kanalvaesenet ). The outlet of the lake was modified again in 1990 in order to reduce flood water levels. The gauging station at Voss therefore has two rating curves, one for the period 1892 to 1990 and one for the period 1990 to today. In order to assess flood discharge for historical floods before 1865, the rating curve from 1892 to 1990 was simply shifted by 1.9 meters before being applied to the historical water levels.
For the Lalm, Strandefjord, and Labru gauging stations, we used the oldest available rating curves when translating water levels to river flows. Their profiles are assumed to be stable since no information regarding changes has been found.
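The Bulken adjustment described above can be illustrated with a minimal Python sketch. The power-law form of the rating curve and its parameters (`a`, `h0`, `b`) are hypothetical placeholders, not the NVE curve, and the direction of the shift (a pre-1865 level corresponds to a level 1.9 m lower in the deepened channel) is an assumption. The source states a reduction of "up to" 1.9 m, so a constant shift is a simplification.

```python
def rating_curve(stage_m, a=45.0, h0=0.2, b=1.8):
    """Hypothetical power-law rating curve Q = a * (stage - h0)**b, in m3/s.

    The parameters are illustrative only; they are not taken from the study.
    """
    depth = stage_m - h0
    return a * depth ** b if depth > 0 else 0.0


def historical_discharge(historical_stage_m, level_shift_m=1.9):
    """Discharge for a pre-1865 water level.

    Assumption: after the channel was deepened, the same discharge passes
    at a level that is `level_shift_m` lower, so the historical level is
    lowered before the later rating curve is applied.
    """
    return rating_curve(historical_stage_m - level_shift_m)


# Example with a made-up flood mark 5.0 m above the gauge datum
print(round(historical_discharge(5.0), 1))
```

Shifting the stage before evaluating the curve is equivalent to shifting the curve itself, which is how the adjustment is worded in the text.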

MODELING

Flood frequency modeling
For flood frequency modeling we applied the GEV distribution, which has been shown to be a limiting distribution for block maxima (Fisher & Tippett ; Gnedenko ; (1) where π is the prior and l(x | m, α, k) is the likelihood of the observation vector x given the parameters m, α, k.
The denominator makes the integral under the pdf equal to one.
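For reference, the two expressions this paragraph relies on can be written as below. This is a sketch under one assumption: the sign convention in which k > 0 gives a distribution bounded above. The opposite convention, with shape ξ = −k, is also widely used, so the form should be checked against Equation (1).

```latex
% GEV cumulative distribution function: location m, scale \alpha > 0, shape k
\begin{equation*}
F(x \mid m, \alpha, k) =
\begin{cases}
\exp\left\{-\left[1 - k\,\dfrac{x - m}{\alpha}\right]^{1/k}\right\}, & k \neq 0,\\[6pt]
\exp\left\{-\exp\left[-\dfrac{x - m}{\alpha}\right]\right\}, & k = 0.
\end{cases}
\end{equation*}

% Posterior density of the parameters given the observations (Bayes' theorem)
\begin{equation*}
f(m, \alpha, k \mid x) =
\frac{\pi(m, \alpha, k)\, l(x \mid m, \alpha, k)}
     {\int \pi(m, \alpha, k)\, l(x \mid m, \alpha, k)\,
      \mathrm{d}m\, \mathrm{d}\alpha\, \mathrm{d}k}
\end{equation*}
```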
For the prior π(m, α, k), we used non-informative priors for the location and scale parameters, where the priors for the location parameter and the log-transformed scale parameter were uniform. In order to avoid absurd estimates of the shape parameter, we added a prior likelihood as rec-
The likelihood for the systematic data was calculated as:
where F is the GEV distribution given in Equation (1).
Then, we need to include our knowledge of the floods exceeding x_0. In the simplest case we know only that t floods exceeded x_0, and then this likelihood is given as:
Alternatively, we might know that the floods that exceeded x_0 were within an interval defined by an upper limit x_U and a lower limit x_L:
And in the best case we know the exact magnitude of all floods exceeding x_0:
Depending on which data we have, the total likelihood is given as a product of the three major likelihood terms:
For Bulken, Lalm, and Strandefjord we did not have any specific information about when the historical period starts. We therefore followed Prosdocimi () and set the length of the historical period to be the time span from the first historical event to the end of the historical period plus the average time spacing between the historical events.
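The rule for the length of the historical period can be sketched in a few lines of Python. The function name is hypothetical, and the definition of the "average time spacing" is an assumption: here it is taken as the span from the first event to the end of the historical period divided by the number of events. With a single event this places the event in the middle of the period, in line with the result attributed to Strupczewski et al. in the introduction. Other readings, such as the mean gap between consecutive events, give different values.

```python
def estimate_historical_period(event_years, end_year):
    """Return (start_year, length) of the historical period.

    event_years: years of the known historical floods.
    end_year: year in which the historical period ends
              (i.e., when the systematic record starts).

    Assumption: average spacing = (end_year - first event) / number of events.
    """
    events = sorted(event_years)
    if not events:
        raise ValueError("at least one historical event is needed")
    span = end_year - events[0]
    if span <= 0:
        raise ValueError("events must precede the end of the historical period")
    spacing = span / len(events)
    length = span + spacing
    return end_year - length, length


# Made-up years, for illustration only
start, length = estimate_historical_period([1700, 1750, 1800], end_year=1900)
print(round(start, 1), round(length, 1))
```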
Details for each catchment are given in Table 3. Labru was treated differently: we subjectively set the historical period to start in 1790, since the 1860 flood is the largest one since 'Storofsen' in 1789. Using the approach by Prosdocimi () would make the historical period too short and therefore lead to overestimation of flood quantiles.
For all catchments the historical period ends when the systematic record starts.
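To make the structure of the likelihood in this section concrete, the sketch below combines systematic data with historical information in a single log-likelihood. It is not the authors' implementation. It uses a standard censored-data formulation, the GEV sign convention in which k > 0 gives an upper bound is an assumption, and all function and argument names are hypothetical. Years of the historical period without a known flood contribute F(x_0); floods known only by count contribute 1 − F(x_0); floods known within limits contribute F(x_U) − F(x_L); and floods of known magnitude contribute the density.

```python
import math


def gev_cdf(x, m, alpha, k):
    """GEV CDF, F = exp(-(1 - k*(x - m)/alpha)**(1/k)); k = 0 is the Gumbel case."""
    z = (x - m) / alpha
    if abs(k) < 1e-9:
        return math.exp(-math.exp(-z))
    y = 1.0 - k * z
    if y <= 0.0:
        # Outside the support: below the lower bound (k < 0) or above the upper bound (k > 0)
        return 0.0 if k < 0 else 1.0
    return math.exp(-y ** (1.0 / k))


def gev_logpdf(x, m, alpha, k):
    """Log of the GEV density under the same convention."""
    z = (x - m) / alpha
    if abs(k) < 1e-9:
        return -math.log(alpha) - z - math.exp(-z)
    y = 1.0 - k * z
    if y <= 0.0:
        return -math.inf
    return -math.log(alpha) + (1.0 / k - 1.0) * math.log(y) - y ** (1.0 / k)


def _log(p):
    return math.log(p) if p > 0.0 else -math.inf


def log_likelihood(m, alpha, k, systematic, h=0, x0=None,
                   n_count_only=0, intervals=(), exact=()):
    """Log-likelihood of systematic data plus historical information.

    h: length of the historical period in years; x0: perception threshold.
    n_count_only: floods known only to have exceeded x0.
    intervals: (lower, upper) limits for floods known within bounds.
    exact: magnitudes of historical floods that are known exactly.
    """
    if alpha <= 0.0:
        return -math.inf
    ll = sum(gev_logpdf(x, m, alpha, k) for x in systematic)
    if h == 0:
        return ll
    f0 = gev_cdf(x0, m, alpha, k)
    t = n_count_only + len(intervals) + len(exact)
    if t < h:
        ll += (h - t) * _log(f0)              # years that stayed below the threshold
    if n_count_only:
        ll += n_count_only * _log(1.0 - f0)   # exceedances with unknown magnitude
    ll += sum(_log(gev_cdf(up, m, alpha, k) - gev_cdf(lo, m, alpha, k))
              for lo, up in intervals)
    ll += sum(gev_logpdf(y, m, alpha, k) for y in exact)
    # The binomial coefficient is constant in the parameters; kept for completeness
    return ll + math.log(math.comb(h, t))
```

Adding the log-prior to this function gives the unnormalized log-posterior, which is what a sampler or optimizer would work with.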

Model evaluation
In order to assess the added value of the historical infor-

Experiment 1 (using systematic data only)
The systematic record of annual maximum floods Q was resampled k times with replacement to obtain an independent data sample Q* of length n. Then, a subset Q*_s of length s = 5-75 years of the resampled record was used as systematic data in order to estimate the likelihood term l_s in Equation (3). To create historical information we used

Experiment 2 (using systematic and historical data)
In the second experiment, the real historical information (see Tables 2 and 3) was combined with the systematic observations. For the first part of this experiment, we wanted to explore how the length of the systematic flood data influences the flood quantile estimates. Therefore, we resampled, with replacement, the systematic flood data to obtain an independent data sample Q* of length n. Then, a subset Q*_s of length s = 5-120 years was combined with the historical information listed in Table 2 and used for estimating flood quantiles.
For the second part of this experiment, we wanted to explore how the length of the historical period influences the flood quantile estimates. Therefore, we resampled, with replacement, the historical flood information that consists of the six flood values given in Table 1 and 319 zeros. This provided an independent data sample Q′ of length 325. Then a subset Q′_h of length h = 100-300 years was combined with the systematic data of length 123 years. For the estimation, we set the length of the historical period in two ways: (i) we used the known length of the historical period, i.e., the length of the subset, and (ii) we followed Prosdocimi () as described above. Note that h was estimated for each subsample Q′_h.
To assess the stability of the design flood estimates, the coefficient of variation of design flood estimates based on the k samples was used.
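The resampling steps described above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up flood values. It assumes that the resampled record has the same length as the original one and that the subset is simply its first values.

```python
import random


def make_historical_record(flood_values, n_years):
    """One value per year: the known floods plus zeros for all other years."""
    return list(flood_values) + [0.0] * (n_years - len(flood_values))


def resample_subset(record, subset_len, rng):
    """Resample the record with replacement, then keep a subset of the result."""
    resampled = rng.choices(record, k=len(record))
    return resampled[:subset_len]


rng = random.Random(1)  # fixed seed so the example is reproducible

# Six made-up flood magnitudes (m3/s) and 319 zeros give a record of 325 years
historical = make_historical_record([610.0, 655.0, 700.0, 720.0, 790.0, 850.0], 325)

# One subsample with a historical period of h = 200 years
subset = resample_subset(historical, 200, rng)
floods_in_subset = [q for q in subset if q > 0.0]
print(len(subset), len(floods_in_subset))
```

Repeating the last step k times, and fitting the distribution to each subsample, gives the set of design flood estimates on which the evaluation criteria are computed.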
In order to assess reliability, the distribution fitted to the complete data set (i.e., the systematic flood data Q of length 123 years and the historical information with the six known floods in 325 years) was used as the 'truth'. As in experiment 1, the MAE and MRE for the difference between the reference and the distribution fitted to the different sub-samples were calculated.
For both experiments, the evaluation of the reliability is challenging since we used the distribution fitted to the original data as truth, i.e., the truth is unknown. We assumed that the evaluation of reliability can be trusted for return periods shorter than the 123 years of data used to estimate the truth. The stability, on the other hand, can be analyzed for any return period. Using MAE and MRE to measure the reliability also gives information about stability, since a small MAE indicates a small spread in the estimates.
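The three evaluation criteria can be computed from the k design flood estimates as in the sketch below. The definitions are assumptions based on the usual meaning of the abbreviations: MAE as the mean absolute difference from the reference, MRE as the mean signed difference relative to the reference (so that it reflects bias), and CV as the sample standard deviation divided by the mean. The function names are hypothetical.

```python
import statistics


def coefficient_of_variation(estimates):
    """Stability: spread of the estimates relative to their mean."""
    return statistics.stdev(estimates) / statistics.mean(estimates)


def mean_absolute_error(estimates, reference):
    """Reliability: average absolute deviation from the reference value."""
    return statistics.mean([abs(q - reference) for q in estimates])


def mean_relative_error(estimates, reference):
    """Bias: average signed deviation, relative to the reference value."""
    return statistics.mean([(q - reference) / reference for q in estimates])


# Made-up design flood estimates (m3/s) from three resamples, with reference 100
estimates = [90.0, 100.0, 110.0]
print(coefficient_of_variation(estimates),
      mean_absolute_error(estimates, 100.0),
      mean_relative_error(estimates, 100.0))
```

In this toy case the estimates scatter symmetrically around the reference, so the MRE is zero while the MAE and CV are not, which illustrates why a small MAE also implies a small spread but a small MRE does not.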

Return level plot
To demonstrate the use of historical information in selected Norwegian catchments, the fitted distributions were plotted together with their 95% credibility intervals and the empirical distribution. Each plot includes models fitted to systematic data only, and to both systematic data and historical information. We applied the Cunnane plotting position (Cunnane ) where the exceedance probability of x_i with rank i from a data set with m members sorted in decreasing order is given as:
For introducing the historical floods we used the plotting positions given by Hirsch & Stedinger ():
where i is the rank, l is the number of extraordinary floods, n is now the length of the period for which we have information about floods (note that n = h + s), s is the length of the systematic record, h is the length of the historic information, and e is the number of extraordinary floods in the systematic record (note that the number of historical floods t = l − e).
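The plotting positions can be sketched in Python using the symbols defined above. The formulas are the commonly cited forms, written with a generic plotting-position parameter a (a = 0.4 gives the Cunnane position); they are stated here as an assumption and should be checked against the original references. The function names are hypothetical.

```python
def cunnane_positions(m, a=0.4):
    """Exceedance probabilities for ranks 1..m (rank 1 = largest flood)."""
    return [(i - a) / (m + 1 - 2 * a) for i in range(1, m + 1)]


def hirsch_stedinger_positions(h, s, l, e, a=0.4):
    """Exceedance probabilities when historical floods are included.

    h: length of the historical period, s: length of the systematic record,
    l: number of extraordinary floods in the whole period n = h + s,
    e: number of extraordinary floods within the systematic record.
    Returns (positions of the l extraordinary floods,
             positions of the remaining s - e systematic floods).
    """
    n = h + s
    p_e = l / n  # probability of exceeding the threshold in any one year
    extraordinary = [p_e * (i - a) / (l + 1 - 2 * a) for i in range(1, l + 1)]
    remaining = [p_e + (1 - p_e) * (j - a) / (s - e + 1 - 2 * a)
                 for j in range(1, s - e + 1)]
    return extraordinary, remaining


# Made-up example: 100 historical years, 100 systematic years,
# 4 extraordinary floods of which 1 lies in the systematic record
extraordinary, remaining = hirsch_stedinger_positions(h=100, s=100, l=4, e=1)
print(len(extraordinary), len(remaining))
```

The extraordinary floods share the exceedance probability mass l/n, so they plot at longer return periods than they would if ranked within the systematic record alone.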

Experiment 1
The results for the first part of experiment 1 are shown in
Results for the second part of experiment 1 in Figure 9 show the sensitivity of the evaluation criteria to the estimated length of the historical period. The historical data were created only for the highest threshold (500 m³/s), and the distribution estimated using all 123 years of AMS data was used as the 'truth'.

Experiment 2
In experiment 2 we used the historical flood information at Bulken (see Tables 2 and 3)

Return level plots
The estimated return levels, together with their 95% credibility intervals, are plotted for the four study catchments. Figure 13 compares: (i) the use of systematic data only and (ii) the combination of systematic and historical data. The empirical distribution of all data (systematic and historical) is included.

DISCUSSION
The first objective of this paper was to assess the added value of using historical data for flood quantile estimation, and how the added value depends on data availability. The results from both experiments 1 and 2 show that using historical flood information improves both the reliability and the stability of the design flood estimates.
The MRE reflects the bias in the estimation. From experiment 1, we see that using only systematic data has the
(Figures 7 and 10). The reduction in CV is largest when the systematic sample size is the shortest. For experiment 1, the improvement in stability when using the number of floods in the historical period is low when the length of the systematic record is longer than around 25 years and the length of the historical period is shorter than 98 years (Figure 7). In experiment 2, we do not see this behavior (Figure 10). This behavior in experiment 1 is related to the relative strengths of the different likelihood terms in the estimation (l_s, l_b, and l_a), as discussed above for the results of MAE.
All three evaluation criteria show a small sensitivity to the length of the historical period in the range from 100 to 300 years (Figure 11) when all systematic data are used. As for Figure 10, there is a small difference between using historical flood sizes and using the number of historical floods.
The sensitivity to the length of the systematic record is also relatively small when all historical data are used. This demonstrates the strength of combining these two data sources.
We therefore suggest that a general behavior for most sites would be that a smaller estimation bias (both MAE and MRE) and improved stability, indicated by a smaller CV, are achieved when historical information is used, and that the improvement is most pronounced when the systematic record is short. We also suggest that for a relatively
The results show that the evaluation criteria are sensitive to how the length of the historical period is estimated (Figures 9 and 11). In experiment 1, we see that using the maximum distance estimator provides the best results, whereas using the year of the earliest historical flood event as the starting point for the historical period contributes to an overestimation of return levels (Figure 9). This confirms results from Prosdocimi (). For experiment 2, we see that using the estimated length is even better than using the known length of the historical period (Figure 11). The maximum distance estimator suggested by Prosdocimi () should therefore be a standard approach for estimating the historical period length.
The results in Figure 13 demonstrate that adding historical information, in particular helps to reduce estimation
The results in this paper indicate that, provided sufficient historical information is available, even five to ten years of systematic data are sufficient for estimating design floods with a reliability and stability comparable to an estimate based on 50 years of systematic data.
In this study, we have not addressed the sensitivity of the results to biases in the historical information. In our case studies, the bias in the historical information is, to a large degree, related to our assumption of stable river profiles for Labru, Lalm, and Strandefjord, and to a rather simplistic quantification of historical flood sizes at Bulken. In the latter case, we do not have the necessary information to establish rating curves for the period before the modification of the river profile in the 19th century.
In this study, we have not used information about naturalized floods, i.e., estimates of flood sizes as if there were no reservoirs. The three catchments Lalm, Strandefjord, and Labru are all heavily regulated today, each with several reservoirs (see Table 1). The effect of river regulations is characterized by a non-uniform decrease in flood magnitudes that influences the mean, the spread, and the skewness of the annual maximum floods. Naturalized flood estimates are available in some of these catchments. These data were not used in this study due to the large estimation uncertainty that is unique for each flood event. Naturalized floods are based on water balance calculations for each reservoir, and need time series of overflow, gate openings, power production, and reservoir levels.
The flood estimates are uncertain on a daily time resolution since: (i) in some of the reservoirs, the water levels were measured at a weekly time resolution before 1990; (ii) reduction of flood peaks in natural lakes before the reservoirs were established is not known or not accounted for; (iii) the traveling time from the reservoirs to downstream reservoirs and to the gauging station is partly ignored.

CONCLUSIONS
In this paper, the use of historical information in flood frequency analysis was investigated. The historical information could be either the number of floods or the magnitude of all floods above a threshold for a specified period. We evaluated the added value of historical information using criteria measuring reliability and stability of the design flood estimates. We also demonstrated how historical information could be used in four selected catchments in Norway. Based on this study, the following conclusions were drawn:
• There is added value in using historical information. The added value is greatest when the magnitude of the historical floods is known, and when the length of the systematic record is short. If only the number of floods above a perception threshold is known, the added value is greatest when the perception threshold is relatively high. In our four study catchments it is shown that even the longest time series of 123 years benefits from the use of historical information. It is also recommended to follow Prosdocimi () and set the length of the historical period to be the time span from the first historical event to the end of the historical period plus the average time spacing between the historical events.
• It is feasible to include historical information in the flood frequency estimation in many Norwegian catchments. A major challenge is to translate information about historical floods to flood levels and discharges, and in this study we purposely selected catchments where information about flood levels is available. For other catchments, the use of historical information might therefore be even more challenging and detailed analyses might be required to assess flood magnitudes. In this study, we have identified 1789 and 1860 as years with exceptional flood magnitudes in eastern Norway, and these years should be considered for flood studies in other catchments in this region.
A major limitation in using this approach is the nature of the flood information available. Frequently, as in this study, water levels on flood stones or marks are all that is available.
In such cases, the stationarity of the river profile is often assumed. In other cases, information about flood damage, rather than flood levels, is all that is available. In Norway, flood damage information for large floods is typically sourced from either written documents or tax reduction records, which indicate the farms that have suffered damage. Using such information is more time-consuming and requires detailed mapping combined with routing models in order to assess the flood magnitudes that caused the damage.
In order to make historical flood information easily accessible, NVE has now begun storing historical flood magnitudes in the national hydrological database and is developing a set of analysis tools available for use with such data. The registration of historical flood magnitudes will be encouraged in future assessments of design floods.