Abstract
In this study, we utilise Artificial Neural Network (ANN) models to estimate the 100- and 1,500-year return levels for around 900,000 ungauged catchments in the contiguous USA. The models were trained and validated using 4,079 gauges and several catchment descriptors selected from a total of 25 available. The study area was split into 15 regions representing major watersheds. ANN models were developed for each region and evaluated using several performance metrics, such as root-mean-squared error (RMSE), coefficient of determination (R2) and absolute percent error. The availability of a large dataset of gauges made it possible to test different model architectures and assess the regional performance of the models. The results indicate that ANN models with only one hidden layer are sufficient to describe the relationship between flood quantiles and catchment descriptors. Regional performance depends on climate type, as models perform worse in arid and humid continental climates. Overall, the study suggests that ANN models are well suited to predicting flood quantiles in ungauged catchments across a large geographic area. The paper presents recommendations for the future application of ANNs in regional flood frequency analysis.
HIGHLIGHTS
Parsimonious ANN models (a single hidden layer and fewer inputs) perform similarly to more complex models across the study area.
ANN models in arid, semi-arid and humid continental regions have the lowest performance.
Area and standard annual average rainfall have the highest importance for predicting flows in ungauged basins in most regions.
INTRODUCTION
Several large flood events were recorded in the USA in the last few years including the Midwest Floods in 2019 and Hurricane Florence, which caused significant loss of life as well as damage to property and infrastructure. With extreme flooding expected to increase in parts of the USA due to climate change (e.g., Paltan et al. 2018), it is important to properly model the magnitude and frequency of these events. Model estimates are needed, among other things, for flood risk mapping, which in turn is used, for example, by insurance providers. In the case of flood maps, large-scale studies are often needed as a first step to identify regions where more detailed local analysis is called for.
A ubiquitous method for flood frequency analysis at a gauged location involves fitting a statistical distribution to peak discharge events (usually peaks over threshold or annual maxima) and then stating return levels as quantiles of the selected statistical distribution (Kottegoda & Rosso 2008). Flood frequency analysis applied directly to peak discharge is less complex in comparison to rainfall-runoff modelling as it only relies on the analysis of gauged river discharge data. Thus, this approach is especially useful in large-scale modelling, which involves the processing of very large datasets. However, Prediction in Ungauged Basins (PUB) remains a significant challenge and requires that regional frequency analysis (RFA) is applied (Hrachowitz et al. 2013). An extension to the simple single-site flood frequency estimation is the index flood method, which is a type of RFA, where flow data from homogeneous regions is suitably scaled and pooled such that a much larger sample size of peak discharge events is available for distribution fitting. Finally, this method can be further extended to allow PUB by relating the scaling factor to specific catchment descriptors such as area, often via a regression equation.
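To make the index flood workflow concrete, the sketch below (in Python) pools dimensionless growth factors from a few invented gauges and rescales a regressed index flood; the gauges, flows and the power law on area are hypothetical, not values from this study:

```python
import numpy as np

# Hypothetical AMAX records (m^3/s) for three gauges in one homogeneous region.
amax = {
    "g1": np.array([120.0, 150.0, 98.0, 170.0, 135.0]),
    "g2": np.array([40.0, 55.0, 33.0, 60.0, 47.0]),
    "g3": np.array([300.0, 260.0, 410.0, 350.0, 290.0]),
}

# Index flood: here the at-site median AMAX (the mean is also commonly used).
index = {g: np.median(x) for g, x in amax.items()}

# Pool the dimensionless growth factors from all sites into one larger sample.
pooled = np.concatenate([x / index[g] for g, x in amax.items()])

# Regional growth factor, e.g. the empirical 80th percentile of the pooled sample
# (a real study would fit a distribution to the pooled data instead).
gf80 = np.quantile(pooled, 0.8)

# Ungauged estimate: predict the index flood from catchment descriptors (shown
# here as a made-up power law on area, 7.5 * Area^0.8), then rescale.
predicted_index = 7.5 * 250.0 ** 0.8   # Area = 250 km^2, coefficients invented
q_ungauged = predicted_index * gf80
```

In practice, the growth curve comes from a distribution fitted to the pooled sample, and the index-flood regression is calibrated on the gauged network; the percentile and coefficients above are placeholders.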
The index flood method has been applied to large geographic areas; recently, Smith et al. (2015) performed regional flood frequency analysis at a global scale using 703 gauges from the Global Runoff Data Centre (GRDC) and 242 United States Geological Survey (USGS) gauges for the USA. Due to the limited available data, the median absolute error in estimating the 100-year return level (Q100) was relatively high at 56%. This study is currently being updated using a method based on support vector machine regression (Zhao et al. 2020). Similarly, Kjeldsen (2015) reports a factorial standard error (fse) of 1.54 when applying the index flood method in the UK.
Identifying homogeneous hydrological regions is difficult, which has motivated studies to regionalise the parameters of flood frequency distributions or to estimate peak discharge quantiles directly. These include spatial proximity methods such as kriging (e.g., Merz et al. 2008; Faulkner et al. 2016) and direct quantile regression. However, some types of kriging and linear regression still need to be applied to hydrologically similar regions because these methods can only model linear responses. Two studies have concluded that quantile regression performs better than regionalising the parameters of the distribution (Haddad et al. 2012; Ahn & Palmer 2016) and, as a consequence, studies in the USA typically use methods based on quantile regression. For example, the USGS peak-flow equations (currently being updated) use this method (Ries 2007). However, the equations are difficult to apply nationally because they are developed on a regional basis and published in separate USGS flood frequency reports. In particular, Walton et al. (2019) discuss inconsistencies in the region definitions and input variables for Iowa, and similar conclusions can be drawn for most of the USGS flood frequency reports. At the time of writing, the StreamStats website (https://streamstats.usgs.gov/ss/) is under development and can be used for individual sites or in batch mode (limited to 100 gauges), but it is not implemented everywhere and is thus not applicable to large-scale studies. This poses a problem when consistent flood hazard data are required at the continental scale.
Of the approaches suited to large-area modelling, ANNs offer a pragmatic balance, introducing non-linear capability while avoiding the computational demand of physics-based simulation. ANN models have been shown to be flexible and appropriate for modelling larger datasets (Durocher et al. 2015). ANNs have frequently been used to predict daily streamflow; for example, the American Society of Civil Engineers (ASCE) task force focused on the application of ANNs in hydrology (Govindaraju 2000) and, recently, Kratzert et al. (2018) used Long Short-Term Memory (LSTM)-based recurrent neural network (RNN) models to perform regional rainfall-runoff modelling for 531 basins across the USA. However, only a few studies apply ANN models to predicting flood quantiles in ungauged basins, and most are limited to small regions. Aziz et al. (2014) applied ANNs using two predictor variables and 452 gauges in Australia and compared the model to quantile regression. The ANN model performed better than the regression method in terms of median relative error and coefficient of efficiency. The authors also demonstrated that combining all available data in a single region performs better than applying ANNs to subregions based on cluster analysis. Kordrostami et al. (2020) applied ANNs using a set of 88 gauges in New South Wales, Australia and reported an improvement over the model developed by Aziz et al. (2014). Dawson et al. (2006) fitted ANNs to the median flood and a range of other return periods using 850 catchments in the UK. The model performed better than the Flood Estimation Handbook (FEH) method (IoH 1999) and multiple linear regression. In a study based in the USA, Muttiah et al. (1997) used ANNs to estimate the 2-year flood for 17 major watersheds and found that, in some areas, the performance was better than the USGS equations.
Shu & Ouarda (2007) developed single and ensemble ANN models using five catchment descriptors transformed with Canonical Correlation Analysis (CCA) and 151 gauges in Quebec, Canada. The authors compared four approaches: CCA alone, kriging performed in the CCA space, and the single and ensemble ANN models. Their findings suggest that the ensemble ANN, using the CCA-transformed variables, outperforms the other models. Recently, Beck et al. (2015) used ANNs at the global scale to calculate 17 flow characteristics, such as mean flow, baseflow and different flow percentiles, and their models show good performance on several different measures.
A main challenge in applying ANNs is feature selection. As with any data-driven method, the choice of input variables can have a large influence on model results; at the same time, proper feature selection can reduce training time and improve results. For example, Schmidt et al. (2020) used Permutation Feature Importance (PFI) to identify the most important input variables for predicting flood events in Germany. Snieder et al. (2020) compare four different methods for variable selection in neural networks applied to flood forecasting. Apart from these papers, feature selection is rarely described in the literature, and ANNs are simply applied to preselected variables. The optimal ANN configuration has also received limited consideration, with most studies simply stating the model structure (Maier & Dandy 2000).
In this study, we apply ANNs to predict flood quantiles in ungauged basins for the entire contiguous USA and focus on the following objectives: (1) determine the minimum complexity of the ANN models necessary to provide an acceptable fit to the data by comparing an ANN architecture using one hidden layer to multiple hidden layers; (2) quantify the performance of the models; and (3) assess the catchment attribute selection and the variable importance in different regions. In this way, we provide guidelines on the application of ANN models for regional flood frequency analysis, which would benefit future studies. We show that relatively parsimonious models can be applied and that fewer variables are needed. In addition, we demonstrate that this approach is capable of assimilating very large, gridded datasets, which makes it possible to apply at a large scale. The method explored here was further adapted by JBA Risk Management to develop commercial flood data for application in the insurance industry.
METHOD
Study regions and peak flow data
To keep the computational demand manageable, the study area was divided into 15 regions corresponding to major watersheds. For each gauge, USGS annual maximum (AMAX) peak flow data downloaded from https://nwis.waterdata.usgs.gov/usa/nwis/peak (U.S. Geological Survey 2017) was used and included both systematic (gauge measurements) and historical data. Information on data quality checks applied by USGS is given in Ryberg et al. (2017). Provisional data was automatically excluded. Additional filtering of the gauged sites was performed by assessing the confidence interval and the fit of the statistical distribution, described later in this section, to the data series. A thorough data screening step is important for methods such as ANN modelling. Where regions contained too few gauged sites, they were merged; the smallest resulting region contained 49 sites. The final number of gauges and mean AMAX sample sizes are given in Table 1. The minimum record length was set to 20 years, and the mean AMAX sample size ranges from 56.4 to 71.9 years depending on the region. The selected catchments were randomly split into validation and training sets, with 75% of the dataset used for training of the models (3,027 catchments) and 25% for validation (Figure 1).
| Region | Selected sites | Mean AMAX sample size (years) |
|---|---|---|
| 1 | 750 | 61.2 |
| 2 | 140 | 59.0 |
| 3 | 1,573 | 63.5 |
| 4 | 726 | 71.9 |
| 5 (combined with 13) | 376 | 61.5 |
| 6 | 73 | 68.7 |
| 7 (combined with 14 and 15) | 149 | 56.4 |
| 8 | 49 | 69.0 |
| 9 (combined with 11) | 244 | 57.3 |
| 10 | 114 | 66.2 |
| 12 (using gauges from both 9 and 12) | 303 | 68.3 |
Flood quantiles at gauged catchments
The study is based on 4,079 gauged catchments located across the contiguous USA. These catchments were used to develop models to predict peak flows at around 900,000 ungauged catchments, covering all points located every 2 km along the river network and draining areas of 30 km2 or more. Although six return levels were estimated, only results for the 100- and 1,500-year return periods are presented here. The return levels at gauged sites were estimated using the guidelines in Bulletin 17c (England et al. 2019). The method involves fitting a Log-Pearson type III (LP3) distribution using the Expected Moments Algorithm (EMA) to the AMAX series available from gauge and pre-gauge (historic) records, and uses flow intervals instead of point data. In addition, the Multiple Grubbs-Beck Test (MGBT) for detecting outliers and potentially influential low flows (PILFs), as recommended in Bulletin 17c, was applied. Where low outliers were detected, these values were censored and the relevant perception thresholds updated. Due to the limited record at some gauges, a weighted average between the regional and at-site skew, using results published in Bulletin 17b, was used to estimate the skew parameter of the LP3 distribution. Applying a regional analysis for the skew parameter and using pre-gauge data reduced the uncertainty in estimating high return period flows. An example of the single-site Bulletin 17c method fitting an LP3 distribution to AMAX values is presented in Figure 2. These values were later used to calibrate the ANN models.
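For illustration, the single-site fitting step can be approximated in Python with a plain method-of-moments fit of the Pearson III distribution to log10 flows; this stands in for the full Bulletin 17c EMA procedure (no interval data, MGBT censoring or regional skew weighting), and the record below is synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic 60-year AMAX record (m^3/s); a real analysis would use USGS peak-flow data.
amax = rng.lognormal(mean=5.0, sigma=0.5, size=60)

# Bulletin 17c fits LP3 in log10 space; a simple moment fit stands in for EMA here.
logq = np.log10(amax)
mu, sigma = logq.mean(), logq.std(ddof=1)
skew = stats.skew(logq, bias=False)

# 100-year return level = 0.99 quantile of the fitted Pearson III, back-transformed.
q100 = 10 ** stats.pearson3.ppf(0.99, skew, loc=mu, scale=sigma)
```

The back-transformation from log space is what makes this a Log-Pearson III fit; EMA additionally handles censored and interval observations, which a moment fit cannot.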
Catchment descriptors
For each catchment, a total of 25 descriptors including catchment area and properties related to elevation, land cover, soil types and climate were extracted (Table 2). The catchments were delineated and stored as polygons using the commercial software SCALGO Hydrology application (https://scalgo.com/), and a zonal statistics tool, developed by JBA Risk Management, was used to summarise the catchment descriptors for each polygon. While this step requires considerable computational resources and infrastructure, we wanted to ensure that a sufficiently large number of descriptors were extracted.
| Attribute | Description | Source |
|---|---|---|
| Area | Catchment area | Output from SCALGO Hydrology; based on USGS 3DEP elevation |
| Median Elevation | Median elevation in the DTM | USGS 3DEP elevation |
| Land use (barren, cultivated, forest, herbaceous, scrub, snow and ice, wetlands) | Percentage of different land use types | National Land Cover Database (NLCD) 2011 dataset, available from the Multi-Resolution Land Characteristics (MRLC) consortium |
| Land use developed (high, medium, low, open) | Percentage of different urban land use types | NLCD 2011 (as above) |
| Land use open water | Percentage of open water bodies | NLCD 2011 (as above) |
| Water bodies (all, estuary, ice, lakes, swamp, reservoir) | Percentage of different types of water bodies | USGS National Hydrography Dataset ‘NHDWaterbody’ layer |
| Soil type (A, B, C, D, NA) | Percentage of soil types | SSURGO and STATSGO2 soil datasets from the US Department of Agriculture Natural Resources Conservation Service (USDA NRCS) |
| RI (2-h precipitation intensity) | 2-year return period 2-h precipitation intensity | National Oceanic and Atmospheric Administration (NOAA) Atlas 14 Precipitation-Frequency Atlas of the United States |
| SAAR (standard annual average rainfall) | SAAR using 1961–1990 data | USDA NRCS ‘1961–1990 Annual Average Precipitation by State’ layer |
The ANN modelling process requires input features (regressors) to be suitably scaled prior to model calibration. The descriptors that relate to the percentage coverage of different land uses and soil types were scaled such that the range 0–100% was mapped to the 0–1 interval. The flow and the other descriptors (log Area, median Elevation, RI and SAAR) were mapped to the 0.1–0.9 range. This provides some scope for model extrapolation above and below the range of the calibration data (Aziz et al. 2014). A boxplot of selected catchment descriptors is shown in Figure 3.
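A minimal sketch of this scaling, with hypothetical areas and soil percentages:

```python
import numpy as np

def scale_range(x, lo=0.1, hi=0.9):
    """Map min(x)..max(x) linearly onto [lo, hi], leaving headroom for
    extrapolation beyond the calibration range."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

log_area = np.log10([35.0, 120.0, 540.0, 2300.0])   # hypothetical areas (km^2)
scaled_area = scale_range(log_area)                  # now spans 0.1-0.9

# Percentage descriptors (land use, soil types) map 0-100% to 0-1 directly:
soil_type_a = np.array([12.0, 48.0, 3.0, 77.0]) / 100.0
```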
The catchments exhibit high attribute variability within each region, as well as between regions. Studies have shown that the inclusion of spurious or collinear features can negatively affect the training or result in instability of the ANNs (Kuhn 2008). For this reason, we applied a feature importance estimation algorithm implemented in the R Boruta package (Kursa & Rudnicki 2010). This algorithm proceeds by forming Random Forest (RF) models for multiple combinations of descriptors and then ranking the descriptors according to how often they contribute to a good model fit. The feature importance algorithm consistently identified a small number of insignificant features; however, in keeping with our intent to streamline the modelling process, we also discarded features whose contribution was proportionally small, leaving a maximum of six features for modelling. An exception was Region 1, where a larger set of descriptors was used. Removing the less significant descriptors had minimal impact on model fit. For example, fitting the ANN model using 21 significant descriptors for region 4 resulted in an R2 fit score of 0.86, compared to a score of 0.82 using the six most significant features. This finding was reflected across all regions and is perhaps not too surprising considering the correlation between some features, such as elevation, land use and soil types.
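The study performed this screening with the R Boruta package; as a rough stand-in, the sketch below ranks descriptors with a Random Forest and permutation importance in scikit-learn, on synthetic data where only the first two columns carry signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
n = 300
# Hypothetical descriptor matrix: columns 0-1 informative (think log area, SAAR),
# columns 2-3 pure noise.
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + 0.1 * rng.normal(size=n)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0).importances_mean

# Descriptors ranked by importance; the informative columns should come first.
ranked = np.argsort(imp)[::-1]
```

Boruta additionally compares each feature against randomised "shadow" copies of the data, which this simplified sketch omits.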
ANN models
ANN is a machine learning (ML) approach that permits modelling where the relationship between the chosen features (regressors) and the target (dependent variable) cannot be easily identified. Neural networks are constructed from a set of ‘layers’ (input, one or more hidden layers, and output), which in turn consist of interconnected nodes (or ‘neurons’). The values in the input layer are weighted and a bias is added, and the resulting weighted sum is passed to the hidden layer, where an activation function determines the output of each node. The number of hidden layers gives the depth of the ANN and the number of neurons in a hidden layer gives its width. ANNs can be classified by architecture into feed-forward networks, where the values in the output node are calculated directly from the input nodes, and recurrent networks, where previous network output can be used as input. Deep learning architectures, including convolutional neural networks, LSTM systems and variational autoencoders, are reviewed in Shen et al. (2018); these are mostly used in image recognition or rainfall-runoff modelling. Training (or learning) is needed to find the optimal weight matrices and bias vectors that minimise a predetermined error. In all regions, we fitted feed-forward ANN models using a gradient descent back-propagation training algorithm, with root-mean-squared error (RMSE) as the loss function. The statistical background of ANNs is given in detail in Hastie et al. (2009) and Goodfellow et al. (2016) and summarised here. A typical Multilayer Perceptron (MLP) ANN, as used in this study, is shown in Figure 4.
During the optimisation procedure, the weights and biases of the hidden and output layers are adjusted over a series of iterations to minimise the loss function. In this case, the loss function is the RMSE between the model output and the flood quantiles estimated empirically at gauged sites using the method outlined above.
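The forward pass and loss of such a single-hidden-layer MLP can be sketched in a few lines of Python (random, untrained weights for illustration; in the study the weights are fitted by back-propagation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer feed-forward pass:
    input -> weighted sum + bias -> sigmoid hidden layer -> linear output."""
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    return W2 @ h + b2         # flood-quantile estimate (in scaled space)

def rmse(pred, obs):
    """RMSE loss between model output and empirical gauged quantiles."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

rng = np.random.default_rng(1)
n_in, n_hidden = 5, 4                                  # e.g. 5 descriptors, 4 nodes
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(1, n_hidden)), np.zeros(1)

x = rng.uniform(0.1, 0.9, size=n_in)                   # scaled catchment descriptors
y_hat = mlp_forward(x, W1, b1, W2, b2)
```

Training repeatedly evaluates this forward pass over the gauged sites, computes the RMSE loss, and adjusts `W1, b1, W2, b2` by gradient descent.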
RESULTS
Model complexity
To balance parsimony with fit, we tested the performance of MLP ANN models with one to three hidden layers using the R caret and neuralnet packages (Kuhn 2008; Günther & Fritsch 2010). Bootstrap resampling was used to optimise the hyperparameters: learning rate, decay parameter (for the single hidden layer), number of nodes and number of hidden layers. The learning rate and decay hyperparameters control the speed at which the model is optimised, while the numbers of nodes and hidden layers specify the complexity of the model. As an example, the results from training different ANN architectures for region 4 are shown in Figure 5.
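The study performed this tuning with bootstrap resampling in R caret; a rough Python analogue using cross-validated grid search over the hidden-layer configuration (synthetic data, and only a subset of the hyperparameters named above) might look like:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
# Hypothetical training data: 200 catchments, 5 scaled descriptors.
X = rng.uniform(0.1, 0.9, size=(200, 5))
y = X @ np.array([2.0, 1.0, 0.5, 0.0, 0.0]) + 0.05 * rng.normal(size=200)

# Candidate architectures: three single-hidden-layer widths and one two-layer model.
grid = {"hidden_layer_sizes": [(4,), (8,), (12,), (4, 4)]}
search = GridSearchCV(
    MLPRegressor(max_iter=500, random_state=0),
    grid, cv=3, scoring="neg_root_mean_squared_error",
).fit(X, y)
best = search.best_params_["hidden_layer_sizes"]
```

caret's bootstrap resampling replaces the k-fold cross-validation shown here, but the idea is the same: refit each candidate configuration on resampled data and pick the one with the lowest out-of-sample RMSE.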
The resulting R2 for the training set for RP100 (100-year return level) and RP1500 (1,500-year return level), using different configurations, are shown in Tables 3 and 4. The single hidden layer ANNs have a maximum of 12 nodes, while the ANNs that use two or three hidden layers have a maximum of 8 nodes per layer. For RP100, the single hidden layer architecture had similar or higher R2 values in 7 of 11 regions (Table 3); for RP1500, in 6 of 11 regions (Table 4). Typically, the better performing single hidden layer ANNs had fewer nodes than their multiple hidden layer counterparts, which is beneficial for optimisation speed. The same trend is seen in the RMSE loss, where the larger errors were generated by models with multiple hidden layers (Tables 3 and 4). The utility of simple single hidden layer ANN models for flood quantile estimation is a useful result and is encouraging for continental and global-scale modelling applications where processing time and computing resources may be limiting factors. The following results use the single hidden layer ANN model structure.
| Region | R2 (one layer) | RMSE (one layer) | Configuration (one layer) | Parameters (one layer) | R2 (multi-layer) | RMSE (multi-layer) | Configuration (multi-layer) | Parameters (multi-layer) | Input features |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.81 | 0.040 | layer1 = 8 | 177 | 0.79 | 0.041 | layer1 = 1, layer2 = 0, layer3 = 8 | 46 | Log_Area, H_range, SAAR, SOIL_TypeB, SOIL_TypeA, SOIL_TypeC, Elevation, LandUse_Forest, LandUse_Herb, LandUse_Developed, SOIL_TypeD, LandUse_Scrub, Wetlands, LandUse_Open, Waterbodies_All, LandUse_DevelopedLow, Waterbodies_Lake, Waterbodies_Swamp, LandUse_DevelopedMedium |
| 2 | 0.68 | 0.051 | layer1 = 8 | 57 | 0.67 | 0.054 | layer1 = 3, layer2 = 0, layer3 = 8 | 62 | Log_Area, H_range, SOIL_TypeC, median elevation, SAAR |
| 3 | 0.69 | 0.046 | layer1 = 9 | 55 | 0.75 | 0.041 | layer1 = 8, layer2 = 4, layer3 = 1 | 83 | Log_Area, H_range, SAAR, RI2 |
| 4 | 0.82 | 0.037 | layer1 = 4 | 29 | 0.82 | 0.039 | layer1 = 7, layer2 = 2, layer3 = 2 | 67 | Log_Area, Wetlands, H_range, RI2, SAAR |
| 5 and 13 | 0.79 | 0.049 | layer1 = 4 | 25 | 0.80 | 0.049 | layer1 = 7, layer2 = 5, layer3 = 5 | 111 | Log_Area, SAAR, H_range, LandUse_Scrub |
| 6 | 0.80 | 0.052 | layer1 = 12 | 73 | 0.78 | 0.058 | layer1 = 6, layer2 = 6, layer3 = 0 | 73 | Log_Area, SOIL_TypeB, SAAR, LandUse_Forest |
| 7, 14 and 15 | 0.75 | 0.059 | layer1 = 4 | 29 | 0.76 | 0.059 | layer1 = 2, layer2 = 0, layer3 = 4 | 29 | Log_Area, median elevation, H_range, SOIL_TypeD, SAAR |
| 8 | 0.72 | 0.071 | layer1 = 4 | 41 | 0.61 | 0.074 | layer1 = 2, layer2 = 5, layer3 = 0 | 34 | Log_Area, LandUse_Cultivated, LandUse_Herb, LandUse_Forest, LandUse_Scrub, LandUse_Barren, SOIL_TypeB, SAAR |
| 9 and 11 | 0.76 | 0.051 | layer1 = 6 | 37 | 0.76 | 0.051 | layer1 = 5, layer2 = 7, layer3 = 7 | 131 | Log_Area, H_range, SOIL_TypeC, SAAR |
| 10 | 0.70 | 0.046 | layer1 = 10 | 71 | 0.67 | 0.048 | layer1 = 6, layer2 = 3, layer3 = 8 | 98 | Log_Area, H_range, SOIL_TypeD, LandUse_Scrub, SAAR |
| 12 | 0.74 | 0.052 | layer1 = 8 | 57 | 0.72 | 0.054 | layer1 = 6, layer2 = 0, layer3 = 6 | 43 | Log_Area, H_range, SOIL_TypeC, SOIL_TypeB, SAAR |
| Region | R2 (one layer) | RMSE (one layer) | Configuration (one layer) | Parameters (one layer) | R2 (multi-layer) | RMSE (multi-layer) | Configuration (multi-layer) | Parameters (multi-layer) | Input features |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.70 | 0.046 | layer1 = 4 | 89 | 0.69 | 0.050 | layer1 = 1, layer2 = 0, layer3 = 6 | 40 | Log_Area, H_range, SAAR, SOIL_TypeA, SOIL_TypeB, Elevation, SOIL_TypeC, LandUse_DevelopedOpen, SOIL_TypeD, LandUse_Wetland, LandUse_Herb, LandUse_Forest, Waterbodies_All, LandUse_Scrub, LandUse_OpenWater, LandUse_DevelopedLow, Waterbodies_LakePond, Waterbodies_Swamp, LandUse_DevelopedMedium |
| 2 | 0.60 | 0.055 | layer1 = 6 | 43 | 0.68 | 0.054 | layer1 = 3, layer2 = 0, layer3 = 2 | 29 | Log_Area, H_range, Elevation, LandUse_Barren, SAAR |
| 3 | 0.63 | 0.050 | layer1 = 12 | 85 | 0.67 | 0.048 | layer1 = 7, layer2 = 6, layer3 = 0 | 91 | Log_Area, H_range, RI2, Elevation, SAAR |
| 4 | 0.74 | 0.043 | layer1 = 4 | 29 | 0.75 | 0.044 | layer1 = 5, layer2 = 1, layer3 = 1 | 40 | Log_Area, LandUse_Wetland, H_range, RI2, SAAR |
| 5 and 13 | 0.75 | 0.054 | layer1 = 9 | 55 | 0.72 | 0.056 | layer1 = 7, layer2 = 0, layer3 = 3 | 63 | Log_Area, SAAR, H_range, LandUse_Scrub |
| 6 | 0.77 | 0.054 | layer1 = 10 | 61 | 0.75 | 0.059 | layer1 = 6, layer2 = 0, layer3 = 7 | 87 | Log_Area, SOIL_TypeB, SAAR, LandUse_Forest |
| 7, 14 and 15 | 0.75 | 0.062 | layer1 = 10 | 61 | 0.75 | 0.061 | layer1 = 8, layer2 = 5, layer3 = 5 | 121 | Elevation, Log_Area, H_range, SAAR |
| 8 | 0.65 | 0.072 | layer1 = 9 | 91 | 0.53 | 0.077 | layer1 = 5, layer2 = 6, layer3 = 4 | 114 | Log_Area, LandUse_Herb, LandUse_Forest, LandUse_Scrub, LandUse_Barren, LandUse_Cultivated, SOIL_TypeD, SAAR |
| 9 and 11 | 0.73 | 0.055 | layer1 = 6 | 43 | 0.75 | 0.054 | layer1 = 6, layer2 = 0, layer3 = 2 | 53 | Log_Area, H_range, SOIL_TypeC, Waterbodies, SAAR |
| 10 | 0.66 | 0.049 | layer1 = 10 | 71 | 0.62 | 0.053 | layer1 = 8, layer2 = 3, layer3 = 6 | 106 | Log_Area, H_range, SOIL_TypeD, LandUse_Scrub, SAAR |
| 12 | 0.69 | 0.058 | layer1 = 8 | 57 | 0.68 | 0.059 | layer1 = 4, layer2 = 6, layer3 = 8 | 119 | Log_Area, H_range, SOIL_TypeC, SOIL_TypeB, SAAR |
Regional performance of the single hidden layer ANN
| Dataset | MAPE RP100 (%) | MAPE RP1500 (%) |
|---|---|---|
| All gauges | 37.1 | 45.1 |
| Validation set | 38.5 | 45.2 |
| Training set | 36.8 | 45.0 |
MAPE is similar for the validation and training data (Table 5), indicating that the model has not been overfit. Model performance varies spatially, as shown in Figure 6, where percent errors are overlaid on the Köppen climate classification (Kottek et al. 2006). The spatial variation might be partially explained by the use of different ANN models for different regions. In particular, the absolute percent errors for both RP100 and RP1500 are larger in catchments with arid and semi-arid (Köppen classes BWh and BSk) and humid continental (Dfa) climates, and in some catchments with a high lake percentage.
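For reference, the mean absolute percent error (MAPE) reported in Table 5 is computed as:

```python
import numpy as np

def mape(pred, obs):
    """Mean absolute percent error (%) between predicted and observed quantiles."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(100.0 * np.mean(np.abs((pred - obs) / obs)))
```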
Even though there is spatial clustering in the errors, the scatterplots (Figures 7 and 8) show minimal bias. The number of catchments and the MAPE for the validation and training sets, classified by climate index, are given in Figure 9. Most catchments have a humid subtropical climate (Cfa), while the fewest gauges occur in areas with a hot desert climate (BWh). As also indicated by Figure 6, the largest errors are observed in catchments with arid (BWh and BSk) or humid continental (Dfa) climates. These areas also have the fewest catchments available for model calibration.
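The per-climate-class error summary amounts to grouping the signed percent errors by Köppen label and averaging their absolute values. A minimal sketch, with hypothetical errors and labels (the function name and values are illustrative, not from the study):

```python
from collections import defaultdict

def mape_by_class(errors_pct, classes):
    """Mean absolute percent error grouped by Köppen class.
    errors_pct: signed percent errors; classes: Köppen label per catchment."""
    groups = defaultdict(list)
    for e, c in zip(errors_pct, classes):
        groups[c].append(abs(e))
    return {c: sum(v) / len(v) for c, v in groups.items()}

# Hypothetical percent errors for five catchments
print(mape_by_class([-20.0, 35.0, -50.0, 10.0, 60.0],
                    ["Cfa", "Cfa", "BSk", "Cfa", "BWh"]))
```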
In addition, the spatial autocorrelation of the percent error for RP100 and RP1500 was tested using Moran's I statistic. The result was positive and statistically significant (p < 0.05), indicating spatial structure within the error field, which can be attributed to the variation of model performance with climate characteristics. For example, as shown in Figures 6 and 9 and discussed earlier, larger errors occur in cold semi-arid (BSk) and hot desert (BWh) climate regions, while catchments with a humid subtropical climate (Cfa) are associated with lower errors.
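A global Moran's I test of this kind can be sketched as follows for a set of catchment centroids; the row-standardised k-nearest-neighbour weighting scheme and the choice of k are assumptions, since the study does not state which spatial weights were used:

```python
import numpy as np

def morans_i(values, coords, k=5):
    """Global Moran's I with row-standardised k-nearest-neighbour weights."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    pts = np.asarray(coords, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # exclude self-neighbours
    W = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]       # k nearest neighbours per site
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0 / k                 # row-standardised binary weights
    n, s0 = len(x), W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# Spatially clustered values on a small grid give a strongly positive I
rng = np.random.default_rng(0)
coords = [(i, j) for i in range(6) for j in range(6)]
vals = [i + j + rng.normal(0, 0.1) for i, j in coords]
print(morans_i(vals, coords, k=4))
```

A value near zero indicates no spatial structure, while positive values, as found here for the percent errors, indicate that similar errors cluster in space.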
Variable importance
The variable importance of the selected input catchment descriptors, discussed in the Methods section, was computed using the R caret package (Kuhn 2008). The algorithm takes into account the absolute connection weights between the input and hidden nodes and is referred to as the 'weights' method (Gevrey et al. 2003). The results are standardised so that the most important variable is set to 100% and the others are expressed as a percentage of this value. The variable importance for every region for RP100 and RP1500 is given in Figure 10.
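For a single-hidden-layer network, a connection-weight importance calculation of this kind can be sketched as below. This is a Garson-style computation from the absolute weights, scaled so the top variable scores 100%; it is an illustration of the general technique, not a reproduction of the caret implementation, and the weights shown are hypothetical:

```python
import numpy as np

def garson_importance(W_ih, w_ho):
    """Garson-style variable importance for a single-hidden-layer network.
    W_ih: (n_inputs, n_hidden) input-to-hidden weights;
    w_ho: (n_hidden,) hidden-to-output weights."""
    c = np.abs(W_ih) * np.abs(w_ho)          # contribution of input i via hidden node j
    r = c / c.sum(axis=0, keepdims=True)     # share of each input within hidden node j
    imp = r.sum(axis=1)                      # total share per input
    return 100.0 * imp / imp.max()           # scale so the top variable is 100%

# Hypothetical weights: 3 descriptors, 4 hidden nodes
rng = np.random.default_rng(1)
W_ih = rng.normal(size=(3, 4))
w_ho = rng.normal(size=4)
print(garson_importance(W_ih, w_ho))
```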
The logarithm of catchment area (Area) is the most important variable for RP100 in all regions except regions 1 and 3, where SAAR is identified as most important. Similarly, for RP1500, Area is most important in all regions except region 3. A possible reason is that region 3 is more diverse in terms of catchment properties and that most catchments in region 1 have a high percentage of wetlands. The descriptors SAAR, RI, land use, soil type, median elevation, elevation range and percentage of waterbodies also appear to be important. As shown in Figure 10, there are some differences in the variable importance between RP100 and RP1500. For example, SAAR is the second most important variable in five of the regions for RP100, but in only four regions for RP1500. Soil types are important for RP100 in two of the region groups (region 6 and regions 9 and 11), but in none of the regions for RP1500.
DISCUSSION
Model complexity
ANN models with only one hidden layer have similar R2 and RMSE values to models with multiple hidden layers, which indicates that there is no justification for the additional model complexity. Moreover, increasing the number of layers can cause the ANNs to overfit, reducing performance on the validation set. A similar conclusion was reached by de Vos & Rientjes (2005), who used ANNs for rainfall-runoff modelling. Their results indicated that ANNs with only one hidden layer outperform ANNs with two hidden layers, and that performance does not increase beyond five nodes in the hidden layer. The authors suggest the reason is that the optimisation algorithms are not able to find the global optima of complex models. Bayesian Neural Networks (BNNs), for example those fitted using Monte Carlo dropout, are less prone to overfitting and additionally provide confidence intervals (Duerr et al. 2020). The estimation of uncertainty is a particularly important future extension of the application of ANNs to peak flows, given the large inherent uncertainty in these values. In recent years, deeper neural networks with two or more hidden layers have become increasingly common in hydrology. For example, Kratzert et al. (2018) apply an LSTM, a recurrent neural network (RNN) with feedback connections, to a large dataset of catchments in the USA and show the applicability of this method for rainfall-runoff modelling, where the temporal structure of the data is important. In regional flood frequency analysis, however, the volume of necessary data is lower than in rainfall-runoff modelling, and there is no need to model events over time (i.e., to account for temporal dependence in the streamflow time series); therefore, complex models such as RNNs are not beneficial.
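To illustrate how little machinery a single-hidden-layer network needs, the sketch below trains one with plain full-batch gradient descent on a smooth toy surface standing in for the descriptor-to-quantile mapping. The data, hidden-layer size and hyperparameters are invented for illustration; the study's models were fitted to real catchment descriptors with different tooling:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the descriptor-to-quantile mapping: a smooth non-linear surface
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

n_hidden, lr = 10, 0.1
W1 = rng.normal(0, 0.5, (2, n_hidden)); b1 = np.zeros(n_hidden)
w2 = rng.normal(0, 0.5, n_hidden);      b2 = 0.0

for _ in range(5000):                          # full-batch gradient descent
    h = np.tanh(X @ W1 + b1)                   # single hidden layer
    err = h @ w2 + b2 - y
    dpre = np.outer(err, w2) * (1 - h ** 2)    # gradient at hidden pre-activations
    w2 -= lr * h.T @ err / len(y)
    b2 -= lr * err.mean()
    W1 -= lr * X.T @ dpre / len(y)
    b1 -= lr * dpre.mean(axis=0)

rmse = np.sqrt(np.mean((np.tanh(X @ W1 + b1) @ w2 + b2 - y) ** 2))
print(f"training RMSE: {rmse:.3f}")
```

With only ten hidden nodes the network fits this smooth mapping closely, consistent with the finding that one hidden layer suffices when the input-output relationship is a static function of a handful of descriptors.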
Regional performance of the single hidden layer ANN and comparison to other studies
As discussed earlier, the largest errors are observed in catchments with arid, semi-arid and humid continental climates, and in catchments with a high lake percentage. Several other articles also show that regionalisation methods perform worse for semi-arid catchments. Smith et al. (2015) find higher relative RMSE (R-RMSE) for drier continental and arid catchments than for wetter temperate, tropical and polar regions. One reason is that there is more uncertainty when fitting the statistical distribution, because fewer large flood events are observed. For example, when we applied the log-Pearson type III distribution in this study, many of the AMAX observations in arid catchments were excluded as zero flows or potentially influential low flows, as recommended in Bulletin 17C, which reduced the number of available observations. This finding suggests that ANN models have lower performance in arid regions, similarly to more traditional models, and that prediction in ungauged basins (PUB) in these regions is extremely difficult.
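The at-gauge frequency-analysis step can be illustrated with a moment-based log-Pearson type III quantile estimate using the Wilson-Hilferty frequency-factor approximation. Note that this is only a sketch: Bulletin 17C's full procedure (Expected Moments Algorithm, regional skew weighting, low-outlier tests) is considerably more involved, and the series below is hypothetical:

```python
import math
from statistics import NormalDist

def lp3_quantile(log10_q, T):
    """Log-Pearson III T-year quantile from the sample moments of log10 flows,
    using the Wilson-Hilferty frequency-factor approximation."""
    n = len(log10_q)
    mean = sum(log10_q) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in log10_q) / (n - 1))
    # Sample skew of the log10 flows
    g = (n * sum((v - mean) ** 3 for v in log10_q)) / ((n - 1) * (n - 2) * std ** 3)
    z = NormalDist().inv_cdf(1 - 1 / T)        # standard normal quantile
    if abs(g) < 1e-6:
        k = z                                  # zero-skew limit: lognormal
    else:
        kk = g / 6.0
        k = (2.0 / g) * (((z - kk) * kk + 1.0) ** 3 - 1.0)
    return 10 ** (mean + k * std)

# Hypothetical AMAX series expressed as log10(flow)
sample = [1.0, 1.5, 2.0, 1.2, 1.8]
print(lp3_quantile(sample, 100), lp3_quantile(sample, 1500))
```

With short, low-variance arid-region records, both the skew estimate and the resulting quantiles become highly uncertain, which is the mechanism behind the degraded performance discussed above.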
In addition to the differences in model performance between climatic zones, we also found statistically significant spatial correlation in the errors; a likely reason is the variation in climate discussed earlier. Likewise, Dawson et al. (2006) analyse the regional performance of ANN models and conclude that there are differences in performance across the UK. Performance was worse in North and South Wales, Northern England and the Scottish Highlands, which receive more precipitation or lie at higher altitude. The authors suggest that more descriptors are needed to accurately predict peak flows in these regions. In this study, we included catchment descriptors such as SAAR, RI and median elevation, which exhibit some spatial autocorrelation, so information on spatial patterns is implicitly taken into account in the ANN models.
The overall performance of the ANN is comparable to that of other large-scale studies. For example, Walton (2018) compares the performance of 44 different USGS equations for estimating the 2-year peak flow in 10 states using 1,509 gauges and finds that the USGS equations have an average absolute error of 41%, improving to 33% if only one equation is used. USGS reports on peak flow estimation at ungauged basins commonly report standard model errors of more than 40% for the 100-year return level (reports can be accessed at: https://water.usgs.gov/osw/programs/nss/pubs.html). Furthermore, a comparative analysis of research articles by Salinas et al. (2013) shows that the median normalised RMSE (NRMSE) across all analysed studies is around 0.4 (or 40%) for Q100 (specific discharge), with much worse performance in arid catchments.
An alternative is to apply a conceptual model to estimate the peak flows. An advantage of a conceptual model is that physical processes are explicitly taken into account, and several studies show the potential of these methods (e.g. Yu et al. 2019). The improving quality of global gridded datasets for rainfall and temperature now permits the application of these methods at large scales. However, non-linearity in both runoff generation and routing needs to be taken into account (McIntyre et al. 2011). Additionally, these models are prone to the same problems as ANNs in terms of both calibrating the models to extreme peak flow events and regionalising the parameters to ungauged basins. Most importantly, it is currently computationally prohibitive to apply even a simple conceptual model at a high enough density (producing discharge series at many points), which is required as input for flood hazard mapping at high resolution.
Variable importance
The most important catchment descriptors are Area and SAAR for both RP100 and RP1500. This is expected, as these variables are directly related to discharge, and similar results are reported in other large-scale studies. For example, Smith et al. (2015) show that Area and SAAR explain most of the variability in a global-scale regional flood frequency analysis based on the index flood method. Even though it is possible to use more inputs when implementing ANNs for regional flood frequency analysis, several studies suggest that only two or three catchment descriptors are necessary to achieve optimum performance. Muttiah et al. (1997) show that using only drainage area, precipitation and elevation is sufficient to estimate the 2-year flood, and find that performance is similar if only area and elevation are used, except in California, where performance degrades when the precipitation input is dropped. Similarly, Walton (2018) compares the equations presented in the USGS reports for 17 states and shows that the most often used catchment descriptors are Area, SAAR, elevation, percentage of forest, storage (lakes, wetlands, etc.) and slope of the main channel. However, in a study by Shu & Ouarda (2007) predicting flood quantiles for Quebec, Canada, Area ranks only fourth out of the five descriptors used; the most important variable was the fraction of the basin area covered by lakes, followed by SAAR. This ambiguity suggests that the importance of descriptor variables depends more on the characteristics of the study area and the specific combination of available descriptors than on the modelling method used.
In this study, we find differences in variable importance between RP100 and RP1500, whereas Kordrostami et al. (2020) show consistency across return periods; a possible reason is that those authors used only 88 gauges and estimated return levels only up to the 100-year flood. In general, most USGS reports (Gotvald et al. 2012; Southard & Veilleux 2014) also show that similar catchment descriptors are selected in the regression equations for different return periods, although in some cases the relationship is constrained to avoid a reduction in flow at higher return periods (Ries & Dillow 2006). Despite the slight differences between return periods, we show that Area and SAAR have the highest importance in the majority of regions across the contiguous USA.
CONCLUSION
This paper outlines the application of ANNs with differing architectures to a large set of gauged catchments in the contiguous USA, for the purpose of estimating flood quantiles in ungauged catchments. The process involved fitting numerous ANN models, with both single and multiple hidden layers, to flood quantiles using selected catchment descriptors in order to determine the best model architecture for each region. The RMSE of the resulting models indicated that an ANN with one hidden layer performs similarly to the more complex multiple hidden layer models, so the additional complexity provides no advantage; this can be explained by the lower data requirements of regional flood frequency analysis. As expected, model performance is worse in arid regions, because fewer gauges are available and the uncertainty in the flood quantiles at the gauged points is higher.
The errors associated with the results of this study are comparable to those of other studies, suggesting that ANNs are a viable alternative for large-scale RFA. One limitation of ANN models is that explicit equations are not derived, which reduces interpretability; nevertheless, the automatic choice of significant catchment descriptors is sensible, with area and mean annual rainfall dominating in most regions. Another limitation is the challenge of collecting, storing and pre-processing the large volumes of data required for model training. Currently, our approach does not quantify the uncertainty of the flow quantile estimates; further investigation into uncertainty estimation using BNNs is ongoing (this is an active area of ML research in general). The ML method described in this paper is particularly applicable where a computational framework is in place to process gridded datasets into a robust table of catchment descriptors. We anticipate that future applications of similar approaches will extend to the global scale and will thus provide a consistent method to produce global flood frequency estimates that can evolve in tandem with new data, particularly from Earth Observation products, and with advances in algorithm design, providing actionable information in regions that may be poorly served by other methods.
ACKNOWLEDGEMENTS
The discharge data, 3DEP elevation and waterbody datasets were downloaded from USGS. Datasets for Soil type and average precipitation are available from the US Department of Agriculture Natural Resources Conservation Service (USDA NRCS).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.