In this study, pre- and postprocessing of hydrological ensemble forecasts are evaluated with a special focus on floods for 119 Norwegian catchments. Two years of ECMWF ensemble forecasts of temperature and precipitation with a lead time of up to 9 days were used to force the operational hydrological HBV model to establish streamflow forecasts. A Bayesian model averaging processing approach was applied to preprocess temperature and precipitation forecasts and for postprocessing streamflow forecasts. Ensemble streamflow forecasts were generated for eight schemes based on combinations of raw, preprocessed, and postprocessed forecasts. Two datasets were used to evaluate the forecasts: (i) all streamflow forecasts and (ii) forecasts for flood events with streamflow above mean annual flood. Evaluations based on all streamflow data showed that postprocessing improved the forecasts only up to a lead time of 2–3 days, whereas preprocessing temperature and precipitation improved the forecasts for 50–90% of the catchments beyond 3 days' lead time. We found large differences in the ability to issue warnings between spring and autumn floods. Spring floods had predictability for up to 9 days for many events and catchments, whereas the ability to predict autumn floods beyond 3 days was marginal.
The study evaluates the univariate and the combined effects of preprocessing both precipitation and temperature forecasts together with the postprocessing of streamflow.
Forecasts are evaluated both for floods and for all streamflow values.
Large catchment sample for more robust assessment of preferred processing approaches.
Seasonal and regional differences in processing approaches are assessed.
Early warnings based on flood forecasts enable both the management authorities and the public to take necessary measures to reduce the economic, personal, and social impact of floods (e.g., UNISDRI 2004; Pappenberger et al. 2017). However, in common with any sort of forecast, an inherent feature of flood forecasting is uncertainty. In the hydro-meteorological forecasting chain, the forecast uncertainty comes from multiple sources: observations, initial conditions, forcing data, model description, and model parameters (e.g., Buizza et al. 1999; Zappa et al. 2011).
To capture the uncertainty in weather prediction caused by initial conditions (e.g., Lorenz 1969) and model parametrization, ensemble prediction systems (EPS) were developed (e.g., Leith 1974; Buizza 2015). The use of hydrological ensemble forecasts has been studied in the literature, see, e.g., Cloke & Pappenberger (2009), Wetterhall et al. (2013). To get unbiased and reliable hydrological forecasts, preprocessing (applied to the meteorological forcing) and/or postprocessing (applied to the hydrological output) techniques are needed. For flood forecasting, important sources of uncertainty and errors are the precipitation and temperature forecasts (e.g., Zappa et al. 2011). These variables are considered for preprocessing in this paper.
For a national or regional flood forecasting service, a large number of catchments with different hydrological processes and regimes are considered. In most papers, ensemble forecasts of all streamflow values for one or a small number of catchments are evaluated. Therefore, to assess the added value of pre- and postprocessing on flood forecasts, a case study from a large number of catchments that well represent the variability of hydrological processes is needed to provide robust conclusions.
The quality of ensemble forecasts is often measured by the key characteristics reliability and accuracy. A forecast is reliable (statistically calibrated) when, e.g., for 90% of the forecasts, the observations are within the 90% prediction interval. Raw forecast ensembles are often biased and underdispersive (Gneiting et al. 2005). A lack of dispersion in global meteorological ensembles is most evident for the shortest lead times and can be explained by slower growth rates of the perturbations in the ensemble prediction system compared to those of an unstable ‘true’ atmosphere (Hamill 2001). To correct for bias and underdispersion in ensemble systems, different statistical postprocessing approaches have been proposed; see Li et al. (2017) and Vannitsem et al. (2018) for comprehensive reviews. These include parametric approaches that rely on parametric probability distributions, for example, Bayesian model averaging (BMA) and nonhomogeneous Gaussian regression (NGR), and nonparametric approaches like quantile regression and ensemble error dressing methods. In this study, we used BMA since it is well established and adapts easily to any kind of seasonality. Raftery et al. (2005) introduced BMA to the atmospheric community as a statistical method to achieve calibrated and sharp forecasts, and the method has since been widely used within the community (e.g., Fraley et al. 2010; Madadgar et al. 2014; Xu et al. 2019).
The effects of both pre- and postprocessing on short- to medium-range streamflow forecasts have been analyzed in previous studies (e.g., Zalachori et al. 2012; Roulin & Vannitsem 2015; Benninga et al. 2017; Sharma et al. 2018). Some key findings are that (i) calibrated precipitation forecasts do not necessarily lead to calibrated streamflow forecasts (Zalachori et al. 2012; Verkade et al. 2013; Benninga et al. 2017); (ii) postprocessing alone is the simplest way to improve forecasting performance (Zalachori et al. 2012; Sharma et al. 2018), but not always with a significant improvement (Benninga et al. 2017); (iii) preprocessing the meteorological forcing is important for forecasting high streamflows since errors from the meteorological model are dominant in this case (Benninga et al. 2017); (iv) preprocessing has the highest skill improvement in the warm season, whereas postprocessing is the most effective in the cold season with snow cover (Sharma et al. 2018). These findings indicate that the relative importance of pre- and postprocessing depends on factors including lead time, streamflow magnitude, and season. None of these studies have compared the univariate and the combined effects of including both precipitation and temperature forecasts in the preprocessing together with the postprocessing of streamflow on flood forecasts. Furthermore, these studies indicate that the effects depend on both climatological and physiographic catchment characteristics and that it can be useful to systematically evaluate the combination of pre- and postprocessing methods for a large set of catchments with variations of climatic and physiographic properties. In this study, we (i) evaluate the univariate and the combined effects of preprocessing both precipitation and temperature forecasts together with the postprocessing of streamflow for forecasting floods as well as all streamflow values, and (ii) perform the evaluation for a large catchment sample.
The main objective of this study is to assess the potential improvements in flood forecasts by combining pre- and postprocessing for a variety of catchments. Different schemes of pre- and postprocessing using BMA are evaluated within the operational flood forecasting setup used by the Norwegian flood forecasting service. The different schemes were tested for 119 catchments that vary in climatology, catchment characteristics, and hydrological regimes. During the study period, there were flood events in 80 of the catchments. The large number of flood events and catchments allowed us to provide robust assessments of the performance of the different schemes under different flood conditions.
The working hypothesis of this paper is that pre- and/or postprocessing improves streamflow forecasts and that the improvements differ between catchments and between events. We addressed the following questions:
How should pre- and postprocessing be combined to improve streamflow forecasts with an emphasis on floods?
Are there regional or seasonal patterns in the preferred combination of pre- and postprocessing?
STUDY AREA, HYDROLOGICAL MODEL, AND DATA
Norway consists of several different climatic zones. The west coast of Norway forms a topographical barrier for the westerlies and orographic enhancement of precipitation makes this area one of the wettest parts of Europe, with an annual precipitation of around 4,000 mm. The driest regions have annual precipitation of around 400 mm (Hanssen-Bauer et al. 2017). The temperature depends on latitude, altitude, and distance from the coast.
The study area consists of 119 catchments distributed all over Norway (Figure 1). All selected catchments are part of the operational flood forecasting system and are mostly unregulated, with a large variation in size (3–15,447 km2) and elevation (103–2,284 meters above sea level [m.a.s.l.]). Three catchments (Table 1) are presented in more detail to illustrate streamflow ensemble forecasts estimated by different processing approaches for three different flood events. The catchments were selected to represent the main flood-generating processes for the different regions. The catchments are all well described by the model.
| Name | Area (km2) | Annual Q (mm) | Mean elev. (m.a.s.l.) | Eff. lake (%) | Glacier (%) | Selected flood |
|---|---|---|---|---|---|---|
| Moeska | 121 | 1,585 | 325 | 1.71 | 0.00 | Rain: Dec 2015 |
| Nybergsund | 4,425 | 487 | 781 | 2.48 | 0.00 | Snowmelt: May 2014 |
| Bulken | 1,092 | 2,038 | 867 | 0.88 | 0.39 | AR: Oct 2014 |
We used the Hydrologiska Byråns Vattenbalansavdelning (HBV) model (Bergström 1976; Sælthun 1996; Beldring 2008), which is used by the operational flood forecasting service at the Norwegian Water Resources and Energy Directorate (NVE). The HBV model is a conceptual model whose vertical structure includes a snow routine, a soil moisture routine, and a response function that consists of two tanks. Quick runoff is represented by a nonlinear tank, whereas slow runoff is represented by a linear tank. The model divides each catchment into 10 elevation zones, each representing 10% of the catchment area. Catchment average temperature and precipitation are elevation adjusted using a catchment-specific lapse rate to attain one representative precipitation and temperature value for each elevation zone. The Nash–Sutcliffe efficiency (Nash & Sutcliffe 1970) and volume bias are used as calibration metrics. The calibration period, 1996–2012, gives a mean Nash–Sutcliffe efficiency of 0.77 for all 119 catchments, with zero volume bias. The validation period, 1980–1995, shows a mean Nash–Sutcliffe efficiency of 0.73, with a mean volume bias of 5% (Gusong 2016).
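The two calibration metrics named above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the volume bias is assumed here to be the relative difference in total simulated and observed volume; the exact definition used in the operational calibration is not given in the text.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_term


def volume_bias(observed, simulated):
    """Relative volume bias (assumed definition): (sum(sim) - sum(obs)) / sum(obs)."""
    total_obs = sum(observed)
    return (sum(simulated) - total_obs) / total_obs


if __name__ == "__main__":
    q_obs = [1.0, 2.0, 3.0, 4.0]       # illustrative streamflow values
    q_sim = [1.1, 1.9, 3.2, 3.8]
    print(nash_sutcliffe(q_obs, q_sim), volume_bias(q_obs, q_sim))
```

Under this definition, a volume bias of 0.05 corresponds to the 5% reported for the validation period.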
Meteorological observation SeNorge v1.1
We used the gridded daily temperature and precipitation data from the SeNorge v 1.1 dataset, which covers all of Norway with a 1 × 1 km grid size. The interpolation of observations to the grid is based on measured values at approximately 400 meteorological stations for precipitation, and 240 stations for temperature. Residual kriging is applied for spatial interpolation of detrended temperature values (Tveito 2007; Mohr 2008). Temperature is detrended by adjusting station data to sea level using a standard temperature lapse rate of 0.65 °C/100 m. Triangulation is used for the spatial interpolation of precipitation (Tveito 2007; Mohr 2008). The precipitation is further elevation corrected, using a constant increase of 10% per 100 m beneath 1,000 m.a.s.l, and 5% per 100 m above 1,000 m.a.s.l. (Tveito et al. 2005).
Meteorological forecasts ECMWF ENS
The temperature and precipitation forecasts used in the hydrological simulations of this study were taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) forecast ensembles (ENS). ENS provides an ensemble of 51 members and a forecasting period of 246 h. The ensemble members are generated by adding small perturbations to the forecast initial conditions. The perturbations represent the uncertainty in the observations. Further, the uncertainty associated with the model physics is represented by perturbing the physics tendencies that come from the parametrizations, and each member is perturbed individually. This method is known as the Stochastically Perturbed Parametrization Tendencies (SPPT) scheme and improves the forecasts by giving a much better spread-error relationship than initial condition perturbations alone. A detailed description of the ECMWF ENS system is provided in, e.g., Buizza et al. (1999) and Persson (2015). The grid resolution of the model forecasts used in this study is 0.25° (model cycles 40r1 and 41r1; ECMWF 2018). The variables used for the hydrological modeling are the accumulated precipitation and the 2-m temperature aggregated to catchment daily (06:00–06:00) mean values.
Streamflow reference simulations
To calibrate the hydrological model the streamflow measurements from the NVE database (https://www.nve.no/hydrology/) were used as a reference. To evaluate the streamflow forecasts, we used simulated streamflow (reference streamflow) created by running the hydrological model with SeNorge temperature and precipitation as forcing. Using this approach, we isolated the effect of the uncertainty in the weather forecasts, and we could ignore uncertainties in observed meteorological inputs, initial conditions, hydrological model parametrizations and parameters as suggested in Verkade et al. (2013).
The study period 2014 and 2015 was chosen since several large floods affected rivers in most parts of Norway during this period (Figure 1). In May 2014, there were large snowmelt floods in the central and eastern parts of Norway. In October 2014, western Norway was hit by an atmospheric river (a narrow plume of high moisture content air transported from the tropical and extratropical latitudes towards the poles, see, e.g., Zhu & Newell 1998), which led to the flooding of multiple rivers. Atmospheric rivers are responsible for extreme precipitation events when the moist air masses are orographically lifted at topographical barriers like the west coast of Norway (e.g., Stohl et al. 2008). In July 2015, there were snowmelt floods in central eastern Norway, and in September 2015, an extratropical cyclone, Petra, caused floods in Southern Norway. In early October 2015, a cyclone, Roar, caused floods in Trøndelag and Nordland and in early December a cyclone, Synne, caused floods in several catchments in south-west Norway, some exceeding the 200-year return level. Floods did not occur in all catchments; hence, the number of catchments used in the flood evaluation analysis was reduced to 80, from the original 119 catchments available for evaluation.
PRE- AND POSTPROCESSING
The temperature and precipitation forecast data from ECMWF were prepared by aggregating the variables from hourly to a daily time step. Thereafter, the horizontal resolution was changed using nearest neighbor interpolation to a 1 × 1 km grid, equal to the SeNorge grid. For the temperature forecasts, a standard elevation adjustment of 0.65 °C/100 m was applied to account for the elevation differences between the original and the SeNorge grid. Finally, the temperature and precipitation forecasts were aggregated to average values for each catchment. The ECMWF forecasts from 2014 and 2015 were used as forcing for the hydrological model, enabling a retrospective evaluation of the daily streamflow forecasts for almost 2 years. The unprocessed daily ensemble forecasts for each catchment are referred to as $T_{\mathrm{raw},t,l,s,m}$ and $P_{\mathrm{raw},t,l,s,m}$, where $t$ is the issue time, $l$ is the lead time, $s$ is the catchment, and $m$ is the ensemble member.
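The elevation adjustment and catchment averaging described above can be sketched as follows. This is a minimal illustration, not the operational code: the function names and the example elevations are hypothetical, and only the 0.65 °C/100 m lapse rate is taken from the text.

```python
LAPSE_RATE_C_PER_M = 0.65 / 100.0  # standard lapse rate stated in the text


def adjust_temperature(temp_c, source_elev_m, target_elev_m):
    """Shift a temperature from the forecast-grid elevation to the target-grid elevation.

    Temperature decreases with height, so a higher target cell gets a colder value.
    """
    return temp_c - LAPSE_RATE_C_PER_M * (target_elev_m - source_elev_m)


def catchment_mean(cell_values):
    """Plain arithmetic mean over the grid cells inside a catchment (assumed equal weights)."""
    return sum(cell_values) / len(cell_values)


if __name__ == "__main__":
    # One forecast cell at 500 m mapped onto three hypothetical 1 km cells.
    target_elevations = [400.0, 900.0, 1500.0]
    adjusted = [adjust_temperature(10.0, 500.0, z) for z in target_elevations]
    print(adjusted, catchment_mean(adjusted))
```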
Bayesian model averaging
BMA for temperature (Tbma)
We followed Raftery et al. (2005) and used a Normal distribution as the kernel for the temperature BMA models. Since the temperature ensemble forecasts were not already bias corrected, the mean is specified as $a_0 + a_1 T_{\mathrm{raw},t,l,s,m}$, where $T_{\mathrm{raw},t,l,s,m}$ is the temperature forecast for ensemble member $m$ and $a_0$ and $a_1$ are regression parameters that account for any bias. The parameters are specific for each catchment, issue date, and lead time and are the same for all ensemble members.
To estimate the parameters of the BMA model in Equation (2), the catchment average temperatures from SeNorge were used as a reference.
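For orientation, the general BMA predictive density of Raftery et al. (2005), specialized to a Normal kernel with a linear bias correction shared by all members, can be written as below. The equal weights $1/M$ and the single kernel standard deviation $\sigma$ are assumptions made here for exchangeable ensemble members, and the indices for issue time, lead time, and catchment are dropped for readability; the expression is a sketch and is not necessarily identical to Equation (2).

```latex
p\left(T \mid T_{\mathrm{raw},1}, \ldots, T_{\mathrm{raw},M}\right)
  = \sum_{m=1}^{M} w_m \, \mathcal{N}\!\left(T;\; a_0 + a_1 T_{\mathrm{raw},m},\; \sigma^2\right),
  \qquad w_m = \frac{1}{M}, \quad M = 51 .
```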
BMA for precipitation (Pbma)
BMA for streamflow (Qbma)
BMA training length
Following Raftery et al. (2005), the BMA models for temperature, precipitation, and streamflow were trained on data from a time window prior to the issue date for each forecast. We tested different training lengths for all variables and lead times, using CRPS (description in the following section) as the evaluation metric. Experiments with different training lengths showed that the optimal window size depends on variable, lead time, and whether CRPS was calculated for all data or only for days with flooding. Precipitation was most sensitive to the training length and a 45-day training period was found to be optimal for most catchments and lead times. To maintain consistency during the evaluation we used a 45-day training period for all variables (i.e., temperature, precipitation, and streamflow).
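A sliding training window of this kind can be sketched as follows. The sketch assumes that a forecast-reference pair is usable only if its verification day (issue date plus lead time) falls strictly before the current issue date; the text does not state how the window was aligned, and the function name is hypothetical.

```python
from datetime import date, timedelta


def training_issue_dates(issue_date, lead_days, window_days=45):
    """Return the issue dates of the past forecasts used to train one BMA model.

    Assumption: the most recent usable forecast is the one whose verification
    day is the day before `issue_date`, so that its reference value is known.
    """
    last_usable = issue_date - timedelta(days=lead_days + 1)
    # Oldest date first, most recent usable date last.
    return [last_usable - timedelta(days=k) for k in range(window_days - 1, -1, -1)]


if __name__ == "__main__":
    window = training_issue_dates(date(2014, 10, 28), lead_days=3)
    print(window[0], window[-1], len(window))
```

With this alignment, longer lead times push the window further back in time, which is one plausible reason why the optimal window length could depend on lead time.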
Temperature and precipitation dependence structure (ensemble copula coupling)
The BMA models described above were applied independently to each weather variable, each location (here catchment), and each lead time. The preprocessed ensembles were established by drawing 51 new realizations from the mixture distribution of each BMA model independently. To recreate forecast trajectories of temperature and precipitation, it is necessary to account for the temporal and inter-variable dependence structures. In this study, this was achieved by using an approach similar to Ensemble Copula Coupling (ECC, Schefzik et al. 2013). The original 51 ensemble members for temperature and precipitation were, for each location, issue date, and lead time, assigned a rank $r_{o,m}$, where $o$ refers to the original ensemble and $m$ to the member. Similarly, the 51 BMA-processed precipitation and temperature ensemble members were assigned a rank $r_{n,m}$, where $n$ refers to the BMA-processed ensemble. The 51 preprocessed ensemble members were reordered by using $r_{o,m}$ and $r_{n,m}$ as keys to keep the preprocessed ensemble members in the same rank sequence as the original ensemble members. By applying this method to all variables, lead times, and issue dates, we maintain the dependency between the variables, as well as the temporal dependency for each of the variables.
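The rank-based reordering described above can be sketched in a few lines of Python. This is a minimal illustration of the ECC idea for one variable, location, issue date, and lead time; the function name is hypothetical, and breaking ties by member index is an assumption.

```python
def ecc_reorder(raw_members, bma_samples):
    """Reorder BMA samples so they follow the rank sequence of the raw ensemble.

    The member that was the k-th smallest in the raw ensemble receives the
    k-th smallest BMA sample. Ties in the raw ensemble are broken by member
    index (an assumption; other tie rules are possible).
    """
    if len(raw_members) != len(bma_samples):
        raise ValueError("raw and processed ensembles must have the same size")
    # Member indices ordered from smallest to largest raw value.
    order = sorted(range(len(raw_members)), key=lambda i: raw_members[i])
    sorted_samples = sorted(bma_samples)
    reordered = [0.0] * len(raw_members)
    for rank, member_index in enumerate(order):
        reordered[member_index] = sorted_samples[rank]
    return reordered


if __name__ == "__main__":
    raw = [3.0, 1.0, 2.0]        # raw member 0 is the largest, member 1 the smallest
    drawn = [10.0, 30.0, 20.0]   # independent draws from the BMA mixture
    print(ecc_reorder(raw, drawn))
```

Because each member keeps its raw rank at every lead time and for both variables, a member that is warm and wet in the raw ensemble remains warm and wet after preprocessing.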
We evaluated the pre- and postprocessing methods for the study period, using both the full dataset and the flood dataset, with the continuous ranked probability score (CRPS), the corresponding skill score (CRPSS), and the critical success index (CSI) as evaluation metrics.
CRPS and continuous ranked probability skill score (CRPSS)
Note that CRPSS has 1 as the optimal value and is positively oriented. Since CRPSS has no units, we could calculate average skill scores across all catchments. CRPS and CRPSS were calculated for the complete dataset as well as for the flood dataset.
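As an illustration of the two scores, the sketch below uses a common empirical estimator of CRPS for a finite ensemble and the usual skill-score definition relative to a reference forecast (here the raw forecast would play that role). The function names are hypothetical, and the estimator may differ in detail from the one implemented in the R packages used in the study.

```python
def ensemble_crps(members, observation):
    """Empirical CRPS for one forecast: E|X - y| - 0.5 * E|X - X'|. Lower is better."""
    n = len(members)
    distance_to_obs = sum(abs(x - observation) for x in members) / n
    ensemble_spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * n * n)
    return distance_to_obs - ensemble_spread


def crpss(crps_forecast, crps_reference):
    """Skill score: 1 is perfect, 0 equals the reference, negative is worse than it."""
    return 1.0 - crps_forecast / crps_reference


if __name__ == "__main__":
    processed = ensemble_crps([1.8, 2.0, 2.3], observation=2.0)
    raw = ensemble_crps([2.6, 2.8, 3.1], observation=2.0)
    print(processed, raw, crpss(processed, raw))
```

In practice, CRPS values would be averaged over all forecasts for a catchment and lead time before the skill score is computed.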
Critical success index
The p-values show the outcome of a t-test comparing each processing approach to the best ones. For the full dataset, Tbma_Praw_Qbma is the best whereas for the flood dataset, Tbma_Pbma is the best.
Since floods are rare events, the number of flood events is small compared to the number of nonevents. A good forecast has a high hit ratio and a low false alarm ratio. The CSI (Jolliffe & Stephenson 2012) balances these two aims by penalizing the number of hits (H) for both the missed events (M) and the false alarms (F), i.e., CSI = H/(H + M + F). The CSI has a value between zero and one, with one being the optimal value. In an operational setting, a warning will be issued when a predefined number of ensemble members (or a defined probability) exceeds the flood warning threshold. For simplicity, we have chosen a limit of 10 of the 51 members (a probability of about 20%) exceeding the mean annual flood level. The mean annual flood has a return period of 2.33 years (i.e., an annual exceedance probability of about 43%).
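The warning rule and the CSI can be sketched as follows. The function names are hypothetical, and interpreting the limit as "at least 10 members above the threshold" is an assumption; the CSI itself follows the standard definition H/(H + M + F).

```python
def issue_warning(members, threshold, min_members=10):
    """Issue a warning when at least `min_members` ensemble members exceed the threshold."""
    return sum(1 for q in members if q > threshold) >= min_members


def critical_success_index(warnings, events):
    """CSI = hits / (hits + misses + false alarms); correct negatives are ignored."""
    hits = sum(1 for w, e in zip(warnings, events) if w and e)
    misses = sum(1 for w, e in zip(warnings, events) if not w and e)
    false_alarms = sum(1 for w, e in zip(warnings, events) if w and not e)
    total = hits + misses + false_alarms
    # With no events and no warnings the score is undefined.
    return hits / total if total > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical 51-member streamflow ensemble (m3/s): 12 members above 100.
    ensemble = [120.0] * 12 + [80.0] * 39
    print(issue_warning(ensemble, threshold=100.0))
    print(critical_success_index([True, True, False, False], [True, False, True, False]))
```

Because correct negatives do not enter the score, the large number of nonevent days does not inflate the CSI.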
Floods by seasons
The performance of flood forecasts can differ between seasons for several reasons. One reason is that flood-dominating processes are often aligned with the season, e.g., snowmelt contribution to floods dominates in spring, and rain-induced floods dominate in autumn. Another reason is seasonally dependent biases, for example, a negative bias in the temperature ensemble forecast in autumn and winter for the Norwegian west coast (Seierstad et al. 2016; Hegdahl et al. 2019). For these reasons, we divided the flood events into spring and autumn floods and used CSI to evaluate how the performance of processing methods depends on the season. We defined spring as April 4 to June 13, and autumn as September 1 to December 10.
Skill – relations to lead time for all data and floods
The variability in CRPSS is larger for the flood dataset (Figure 3 right and Table 3) than for the full dataset, meaning that the benefit from the processing schemes (PS) under flood conditions is not equally high for all catchments, and for several catchments, the forecasts worsen (those where CRPSS is below zero). For the flood dataset, we find that if only preprocessing is applied, preprocessing both precipitation and temperature gives the highest skill. For the approaches including postprocessing, we see that postprocessing alone is the worst processing scheme, and that combining preprocessing of temperature with postprocessing is the best approach for more catchments. For the longer lead times, there are increasingly more catchments where postprocessing leads to a poorer performance than the raw forecast (our reference forecast). The t-test in Table 3 shows that it is difficult to find one method that is significantly better than all the others for all of Norway. The best processing approaches for the flood data (Tbma_Pbma) and for all data (Tbma_Praw_Qbma) are both significantly better than postprocessing streamflow without any preprocessing (Traw_Praw_Qbma).
CRPS – relations to location for the flood dataset
Although Qbma alone is the best approach for lead time of 1 day in a large proportion of the catchments (22 of 88), in particular in eastern Norway (8 of 26) (Table 4), it has the worst average performance since it results in low, and even negative CRPSS values in several catchments (Figure 3, right column). This indicates that Qbma alone lacks robustness.
CSI for the whole year, spring, and autumn floods
Evaluating CSI for floods from the whole year did not give any clear indication as to which of the processing methods was better at predicting floods. This might be caused by floods being generated from rain, snowmelt, or a combination of those. However, by separating the flood dataset into floods occurring in spring and those occurring in autumn (Figure 5), we gain some interesting insight. For spring (Figure 5, left), we see that for a lead time of 1 day, the number of catchments for which each processing method performed best is similar across methods, indicating that several methods are successful. Qbma alone or in combination with Tbma and Qbma were the least successful methods. For lead times of 5 and 9 days, we see some improvement by applying pre- and/or postprocessing to spring floods.
For autumn (Figure 5, right) the results differ from the spring results. For a lead time of 1 day, the predictions are improved in several catchments by including postprocessing. Postprocessing has zero predictability (CSI is zero) for most of the catchments for lead times of 5 and 9 days. Only a few catchments have better predictive skill when applying Pbma alone or in combination with Tbma.
The effect of pre- and postprocessing for a selection of events and catchments
Well-forecasted streamflow is essential to determine a correct flood warning level. In this subsection, we present three flood events and catchments to demonstrate how the different processing approaches influence the ensemble flood forecasts, and how they correspond to warning levels and the reference streamflow.
Table 5 shows, for each processing approach, the number of members that exceed the warning threshold for the events presented in Figures 8–10. Included are the three lead times with the highest warning level for each event.
Three lead times (Lt) are presented for each catchment.
DISCUSSION AND CONCLUSION
The results demonstrate that all catchments benefitted from one or more of the applied PS, thereby confirming our working hypothesis. However, it was not possible to identify a distinct processing chain that is optimal for all forecasts. The optimal method varies with several factors including lead time and season. The flood-generating process is often seasonal, i.e., snowmelt floods are more common in spring and in inland and high-elevation catchments, and rain-induced floods are more typical for autumn and in coastal catchments.
Part of the answer to our first research question, ‘How should pre- and postprocessing be combined to improve streamflow forecasts with an emphasis on floods?’, is that postprocessing alone seems to be the least optimal choice for the full dataset, and even less optimal when only the subset of floods is considered. This approach is significantly worse than the best processing approach, both for floods and for all streamflow. This clearly demonstrates the importance of correcting biases and spread in the forcing variables. The catchments' responses to the temperature and precipitation inputs are nonlinear, in particular for snow accumulation and snowmelt processes where temperature thresholds are important. Using postprocessing alone is therefore less effective in correcting for biases in inputs to the hydrological model. We find that for the full dataset, the best performance is seen when applying postprocessing combined with preprocessing of temperature for lead times of up to three days, whereas for the longer lead times, preprocessing of temperature alone or both precipitation and temperature provides the best performance. Global meteorological ensembles often lack spread for shorter lead times since they are designed for medium-range forecasts and therefore use perturbations that optimize the ensemble spread for longer lead times. BMA models used both for pre- and postprocessing will therefore improve the forecast skill. It would be instructive to assess whether using meteorological ensembles from a regional weather model, which are better able to model the uncertainties in the short range than the ensembles from global weather models (Frogner et al. 2019a, 2019b), as inputs to the hydrological model alters this finding. However, such forecasts were not available for our study period.
The improvement in skill resulting from the PS is smaller for the flood dataset than for the complete dataset, and for some catchments, the processing deteriorates the forecasts (Figure 3). We find that postprocessing is less useful for the first three lead times for the flood dataset than for the full dataset. Preprocessing both precipitation and temperature for the shortest lead times and only temperature for the longest lead times was the best choice for the largest portion of the catchments in the flood dataset. This result is in line with Benninga et al. (2017), who underline the importance of improving the meteorological inputs, in particular for high flow events. In addition to the differences in preferred PS between catchments, we find that for a single catchment, the best processing scheme varies with lead time (e.g., Figures 6 and 7). This underlines that forecast errors arise from different sources, and that drawing firm conclusions from a relatively small sample of floods is difficult. The results further showed that autumn floods were particularly difficult to predict beyond a lead time of 3 days, where processing did not improve the flood prediction capability for 28 of 33 catchments with a CSI of zero (Figure 7 right).
Answering our second research question, ‘Are there regional or seasonal patterns in the preferred combination of pre- and postprocessing?’, the results show that the preferred scheme has both regional and seasonal patterns when evaluated for the flood dataset. The regional pattern shows that catchments benefitting from preprocessing alone are, to a large degree, located in coastal areas, whereas postprocessing is more important for the inland and high-elevation catchments where temperature and slower snowmelt processes dominate (Figure 4). Furthermore, Pbma is the most successful processing scheme in areas with high precipitation (i.e., the west and south-west coast of Norway).
The performance of the PS has clear seasonal patterns. The seasonal effect was evaluated by separating spring floods from autumn floods. The CSI shows that there are large differences in predictability between seasons. For autumn floods there is almost no predictability beyond 3 days, whereas spring floods show predictability for up to 9 days. These results indicate that the predictability of floods depends on the flood-generating processes, i.e., snowmelt-induced spring floods are easier to forecast than rain-induced autumn floods. These results further imply that the autumn precipitation and floods are the most difficult to predict and have the highest potential for improvements. Typical catchments improved by BMA applied to precipitation (Pbma) are located in coastal and western Norway and are hence prone to high precipitation amounts. One concern when using BMA for preprocessing precipitation is that some of the ensemble members in Pbma attained physically implausible values, resulting in very high flood forecasts. This is apparent for the Bulken catchment for the October 2014 event (Figure 8). The explanation is that the Bulken catchment experienced large amounts of precipitation during a preceding event. Several of the raw ensemble members for this preceding event had much lower precipitation than what was later observed, whereas the high precipitation for the October 2014 event was better forecasted. Consequently, the BMA procedure increased the forecasted precipitation values too much. In addition, the use of a positively skewed gamma distribution for the kernel amplifies high precipitation values. We believe that this effect can be particularly important in western Norway, where small shifts in wind directions might significantly change spatial precipitation patterns and thereby introduce a potential for large errors in forecasts. A possible solution could be to use a categorical approach (e.g., Ji et al. 2019), where the precipitation is separated into precipitation categories (based on, for example, the daily ensemble mean) and unique BMA models are trained for each category.
Cold climate challenges in flood forecasting are demonstrated by the importance of correct temperature and precipitation forecasts for snow storage estimations. For both Bulken and Moeska (Figures 8 and 9) preprocessing temperature affects streamflow through the snowmelt. This indicates that the models have snow available in higher elevated parts of the catchment. On the other hand, neither Pbma nor Tbma affected the streamflow for the snowmelt flood in Nybergsund. In this example, there is no snow in the model's internal state and therefore, in a situation of snowmelt, any increase in temperature by Tbma will not increase streamflow.
For the calculation of CSI, we used a limit of 10 ensemble members (a probability of about 20%) exceeding the flood threshold to issue a flood warning. The ensemble can provide a whole range of probabilities and here we only evaluated for one probability level. The optimal probability of exceedance to issue a flood warning might be different between catchments, lead times, and seasons. Another aspect is to investigate the acceptance level for false alarms to missed events. The number of tolerable false alarms might depend on the impacts of the event (e.g., risk evaluation), and it is therefore difficult to make one absolute decision on behalf of all possible exceedance levels (flood sizes) and affected parties. We acknowledge that the choice of evaluation criteria can be different depending on the users and the cost of mitigation action compared to the loss due to an event, and that false alarms and missed events might be weighted differently depending on a total cost-loss evaluation.
An evaluation of CRPS for the complete dataset of 2 years showed that the combination of pre- and postprocessing is most effective for short lead times, up to 2–3 days. For longer lead times, PS that only include preprocessing provide the best results, either BMA applied to temperature (Tbma) alone or in combination with precipitation (Pbma).
For the flood dataset, the added value of processing is less clear. Overall, the best approach for all lead times is to preprocess both precipitation and temperature.
The processing is sensitive to regional patterns. Postprocessing was most effective for inland and higher elevated catchments whereas the coastal catchments gained more from preprocessing. BMA applied to precipitation and temperature improved CRPS for the western and southwestern coastal catchments for the early lead times, whereas Tbma was most important for the longer lead times.
We see a substantial difference in performance between spring and autumn floods using critical success index (CSI) for evaluation. In autumn, there is almost no predictive skill for lead times of more than 3 days. Spring floods have a higher predictability for up to 9 days in advance.
The focus for further improvements should be on the preprocessing of high precipitation rates. For most incidents, the highest precipitation incidents and hence floods were underestimated, whereas for a few incidents, preprocessing high precipitation rates resulted in unrealistic amounts for individual ensemble members.
The authors would like to thank Bård Grønbech at NVE for help setting up the hydrological model used in this study.
DATA AND SCRIPTS
We made use of the following R-packages: ncdf4, ensembleMOS, ensembleBMA, SpecsVerification.
The code available at https://github.com/metno/fimex was used for the resampling and reprojection of the gridded datasets.
The SeNorge data are downloadable, https://thredds.met.no/thredds/projects/senorge.html, Met Norway.
The ensemble forecast data are available from ECMWF, and streamflow observations are available from NVE upon request.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.