Abstract
This study presents the first attempt to identify extreme rainfall events based on surrounding sea-level pressure anomalies, using neural network-based classification. Three classification models were generated: the first classifies the patterns between extreme and regular rainfall events in the North West of England, the second classifies the patterns between extreme and regular rainfall events in the South East of England, and the third classifies between the patterns of extreme events in the North West and South East of England. All classifiers obtain accuracies between 60 and 65%, with precision and recall metrics showing that extreme events are easier to identify than regular events. Finally, a sensitivity analysis is performed to identify the spatial importance of the sea-level pressure anomaly patterns across the North Atlantic, highlighting that for all three classifiers the local patterns around the British Isles are key to determining the difference between extreme and regular rainfall events. In contrast, the pattern across the mid and western North Atlantic shows no contribution to the overall classifications.
HIGHLIGHTS
Neural networks can distinguish between extreme and regular rainfall events.
The sea-level pressure surrounding the UK is key to distinguishing extreme events.
The western North Atlantic does not contribute to classifying extreme events.
INTRODUCTION
Flooding caused by extreme rainfall events can have severe social, environmental and economic consequences. Although variations in yearly flood trends have been studied extensively (e.g. Robson et al. 1998; Cox et al. 2002; Prosdocimi et al. 2019), such trends do not help in identifying the processes which lead to the extreme cases. For example, in February 2020 storms Ciara and Dennis passed over the UK, resulting in up to 177 mm of rainfall in a single 24-h period (Met Office 2020), and are estimated to have resulted in insured losses of up to £200 million (Finch 2020). To provide improved risk analysis and support flood management, the processes which cause these extreme rainfall events need to be better understood and differentiated from those of regular rainfall events.
The occurrence of extreme rainfall events in the UK has a strong dependence on the concurrent and prior meteorological conditions across North Western Europe and the North Atlantic. For example, Brown (2018) shows the dependence of extreme daily rainfall in the UK on large-scale meteorological indices: North Atlantic Oscillation (NAO), Pacific Decadal Oscillation (PDO), El Niño–Southern Oscillation (ENSO) and Atlantic Multidecadal Oscillation (AMO). These indices represent the difference in either sea-level pressure (NAO, PDO) or sea-surface temperature (ENSO, AMO) across their specified regions. Brown found the biggest impact was made by the NAO (the difference in sea-level pressure between Iceland and the Azores); a positive NAO increases the likelihood of extra-tropical cyclones developing over the North Atlantic. This relationship is further demonstrated by Richardson et al. (2017) and Schillereff et al. (2019), both of which show that a negative NAO correlates strongly with increased high-river flows.
Extra-tropical cyclones are known to be the main contributors to extreme precipitation across the globe (Pfahl & Wernli 2012) and have also been linked to the development of atmospheric rivers (ARs) (Gimeno et al. 2020). ARs are long plumes of highly concentrated water vapour in the atmosphere which originate from the mid to lower latitudes, with those affecting the UK moving upwards towards North Western Europe from the Caribbean. Lavers et al. (2011) found ARs occurred during the 10 largest floods in the UK. Following this, Lavers & Villarini (2013) analysed the frequency and intensity of ARs with several climate change scenarios, concluding that both the intensity and frequency of the strongest ARs are expected to increase in the future. In contrast, Champion et al. (2015) found that less than 35% of winter and 15% of summer ARs are associated with an extreme rainfall event. This highlights the need to be able to determine the difference between atmospheric phenomena which cause extreme rainfall events and those which produce more moderate rainfall events of little or no interest in flood management.
Attempts to address this need have focussed on the identification of key meteorological patterns across the North Atlantic relating to extreme rainfall events derived from large-scale climate data. Neal et al. (2016) present 30 sea-level pressure anomaly (SLPA) patterns (MO-30) identified through the application of the k-means clustering algorithm (Lloyd 1957). These patterns represent the types of SLPA patterns which can be present over the North Atlantic on any day. The patterns were then combined subjectively into 8 SLPA patterns (MO-8), some of which are shown to strongly correlate with the NAO. Richardson et al. (2017) investigated the applicability of these cluster sets for identifying regional precipitation and drought climatology throughout the UK, finding that the smaller set of eight clusters does not aid in explaining precipitation variability. Magnitude variation between patterns was observed, with some patterns producing consistently higher levels of median daily rainfall across all regions of the UK; however, this does not allow an easy distinction between extreme and regular rainfall SLPA patterns in the various regions.
Ummenhofer et al. (2017) attempted to cluster SLPA patterns using the anomaly precipitation spatial variation across Europe and found a dipole in SLPA across the UK, which can determine whether precipitation anomalies in the UK will be positive in the North West or South East. This relates to the findings by Champion et al. (2019), who found that summer extreme rainfall events in the North West are typically associated with a positive SLPA region over the UK. However, no such relationship was found when analysing extremes in the South East.
The large-scale meteorological patterns being considered by the studies above are represented by images: 2D matrices with pixel colour representing the numerical value of the meteorological variable in question (e.g. SLPA). Neural networks have proved to be effective at image classification across various domains, from the classification of YouTube videos (Karpathy et al. 2014) to tumour feature extraction (Yang et al. 2019) but, up until now, they have not been used effectively to classify meteorological patterns, such as distinguishing between the SLPA patterns of extreme events. This study applies neural networks to identify the differences in SLPA patterns across the North Atlantic for extreme and regular rainfall events, applied to both the North West and South East of England. In particular, the study demonstrates the differences in SLPA between:
1. Daily extreme and regular rainfall events in the North West of England.
2. Daily extreme and regular rainfall events in the South East of England.
3. Daily extreme rainfall events in the North West and South East of England.
Following this, a sensitivity analysis was conducted to identify which regions of the North Atlantic are most important in distinguishing between the above classes.
DATA
Rainfall events
This extraction resulted in 3,008 individual events (1,504 extreme and 1,504 regular rainfall events) in the North West and 2,290 events (1,145 extreme and 1,145 regular events) in the South East. The disparity is due to there being fewer non-trace rainfall days in the South East: the North West has 15,046 non-trace rainfall days, whereas the South East has only 11,450.
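The event counts above are consistent with the top-10% threshold described under Limitations. The following is a minimal arithmetic sketch of that check, not the authors' extraction code; the function name `extreme_count` and the choice to round down are assumptions made for illustration.

```python
def extreme_count(non_trace_days: int, top_percent: int = 10) -> int:
    """Number of days in the top `top_percent` of non-trace rainfall days.

    Rounding down is an assumption; integer arithmetic avoids float error.
    """
    return non_trace_days * top_percent // 100


# Non-trace rainfall day counts quoted in the text.
north_west_days = 15046
south_east_days = 11450

north_west_extremes = extreme_count(north_west_days)
south_east_extremes = extreme_count(south_east_days)

# Each region pairs its extremes with an equal number of regular events.
north_west_events = 2 * north_west_extremes
south_east_events = 2 * south_east_extremes

# Relative surplus of North Western extremes over South Eastern extremes.
surplus_percent = round(100 * (north_west_extremes / south_east_extremes - 1))
```

Under these assumptions the sketch reproduces the 1,504 and 1,145 extreme events and the 3,008 and 2,290 event totals quoted in the text.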
Meteorological patterns
To classify each event, its SLPA pattern across the North Atlantic is required. An SLPA pattern is therefore extracted for each regular and extreme rainfall day in both the North West and South East regions. The patterns are extracted from the 2.5° gridded NCEP/NCAR Reanalysis 1 data set (Kalnay et al. 1996), with each pattern bounded between latitudes 15° and 70° and longitudes −80° and 15°.
CLASSIFICATION
To classify the rainfall events, a neural network-based classification method is used; the classifiers are trained using the SLPA patterns for each event over the North Atlantic. Three classifiers are required: the first classifies the North Western extreme and regular event patterns, the second classifies the South Eastern extreme and regular patterns, and the third classifies the extreme event patterns from both the North West and the South East. The exclusion of a fourth model distinguishing between North West and South East regular events is intentional, as the focus of this paper is on the identification and comparison of extreme events. Table 1 describes each of these classifiers (MNW, MSE and Mcomp) and indicates which data sets are used in each model and which class they represent. This section introduces the neural network classification method and the optimisation procedure, including how the data are split for training.
Model | Description | North West: Extreme | North West: Regular | South East: Extreme | South East: Regular
---|---|---|---|---|---
MNW | Distinguishes between North Western extreme and regular events | 1 | 0 | - | -
MSE | Distinguishes between South Eastern extreme and regular events | - | - | 1 | 0
Mcomp | Distinguishes between North Western and South Eastern extremes | 1 | - | 0 | -
The numbers indicate the class of the given data set in the given model.
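The class assignments in Table 1 can be expressed as a small lookup used to assemble a labelled data set for each classifier. This is an illustrative sketch only: the names `CLASS_LABELS` and `build_dataset` and the event tuple layout are hypothetical, and the assignment of class 1 to North Western extremes in Mcomp is a reading of the table's column order rather than something stated in the text.

```python
# Class label for each (region, event type) pair used by each model.
# Pairs that are absent are not used by that model.
CLASS_LABELS = {
    "MNW": {("NW", "extreme"): 1, ("NW", "regular"): 0},
    "MSE": {("SE", "extreme"): 1, ("SE", "regular"): 0},
    # Assumed ordering: North West extremes as class 1, South East as class 0.
    "Mcomp": {("NW", "extreme"): 1, ("SE", "extreme"): 0},
}


def build_dataset(model, events):
    """Select and label the events relevant to `model`.

    `events` is an iterable of (region, kind, pattern) tuples, where
    `pattern` is the SLPA pattern for that event in any representation.
    Returns two parallel lists: the patterns and their class labels.
    """
    mapping = CLASS_LABELS[model]
    patterns, labels = [], []
    for region, kind, pattern in events:
        label = mapping.get((region, kind))
        if label is not None:  # skip events this model does not use
            patterns.append(pattern)
            labels.append(label)
    return patterns, labels
```

For example, `build_dataset("Mcomp", events)` keeps only the extreme events from the two regions and drops all regular events.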
Neural network classification
A neural network consists of at least two layers of nodes connected through edges. Figure 2 shows an example architecture with three layers: an input layer, a hidden layer and an output layer. Assigned to each edge in the graph is a weight, which is optimised through training and is indicated by w_x, where x is the edge label.
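To make the three-layer structure concrete, the sketch below computes a forward pass through a network with one hidden layer. It is a hedged illustration rather than the study's implementation: the tanh activation, the bias terms and the function name `forward` are assumptions, since the text does not specify them.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through an input -> hidden -> output network.

    x        : list of input values (one per input node)
    w_hidden : one list of edge weights per hidden node (length len(x))
    b_hidden : one bias per hidden node (assumed; not stated in the text)
    w_out    : one list of edge weights per output node
    b_out    : one bias per output node (assumed)
    Returns the hidden activations and the raw output scores.
    """
    # Each hidden node sums its weighted inputs, then applies tanh
    # (the activation function is an assumption).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Each output node sums its weighted hidden activations.
    scores = [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(w_out, b_out)
    ]
    return hidden, scores
```

With two inputs, two hidden nodes and two output nodes, `w_hidden` and `w_out` are both 2 × 2 lists, one weight per edge.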
Classifier training
Each of the three models introduced at the beginning of this section (MNW, MSE and Mcomp) requires a different data set, and hence three data sets are produced, each consisting of a labelled set of input vectors. Each data set is represented by a matrix of n rows and m columns, where n is the number of events and m is the number of cells (or pixels) in a given pattern. The row which represents a given pattern is generated by flattening its matrix, that is, by concatenating the rows of the matrix end to end to form a single vector. Through this operation, each 22 × 38 cell pattern is converted to a single vector of size 836. Each set of events is then split into two subsets: a training set and a test set. The training set consists of 80% of the input vectors and is used to train the neural network; the remaining 20% forms the testing data set and is used to validate the accuracy of the network on unseen data. The selection of events for the training or testing data sets is randomised.
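The flattening and the randomised 80/20 split described above can be sketched as follows. This is a minimal illustration using only the standard library; the function names, the fixed seed and the rounding-down of the training-set size are assumptions.

```python
import random

ROWS, COLS = 22, 38  # pattern dimensions given in the text


def flatten(pattern):
    """Concatenate the rows of a 2D pattern into a single vector."""
    return [value for row in pattern for value in row]


def train_test_split(vectors, labels, train_fraction=0.8, seed=0):
    """Randomly assign events to a training and a testing subset.

    The seed is only there to make the sketch reproducible.
    """
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)  # rounding down is assumed
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([vectors[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([vectors[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


# A dummy 22 x 38 pattern whose cells are numbered row by row.
example_pattern = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
example_vector = flatten(example_pattern)  # length 22 * 38 = 836
```

Because the rows are concatenated in order, cell (r, c) of the pattern ends up at position r × 38 + c of the vector, which is what later allows contributions per input node to be mapped back onto the grid.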
Training the network involves the optimisation of the weights between the nodes in the network; this optimisation is done using stochastic gradient descent (Bottou 1998) and the backpropagation algorithm (Rumelhart et al. 1986). Backpropagation propagates the error calculated at the output nodes back through the weights of a neural network (Laung & Haykin 1991). To do this, an objective function is required, and in this study, the cross-entropy loss function (Goodfellow et al. 2016) is used to determine the error between the output nodes and the intended classification label (extreme or regular). The cross-entropy error is the negative logarithm of the probability that the network assigns to the true label, with the two output nodes containing the probability of each class being correct.
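The loss and weight update described above can be illustrated with a short sketch. It assumes a softmax over the two output nodes to obtain class probabilities, which the text implies but does not state; the function names and the learning rate are likewise assumptions.

```python
import math


def softmax(scores):
    """Convert raw output scores to class probabilities (assumed output layer)."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])


def output_error(probs, true_class):
    """Gradient of the cross-entropy loss with respect to the raw scores.

    For softmax outputs this is the predicted probability minus the one-hot
    label; backpropagation starts from this error at the output nodes.
    """
    return [p - (1.0 if k == true_class else 0.0) for k, p in enumerate(probs)]


def sgd_step(weights, gradients, learning_rate=0.01):
    """One stochastic gradient descent update: move against the gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]
```

A network that is undecided between the two classes (probabilities of 0.5 each) incurs a loss of ln 2, roughly 0.693, whichever label is correct; the loss falls towards zero as the probability of the true class approaches 1.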
To refine the final model further, models with a varying number of hidden nodes are trialled. In the example given in Figure 2, only two hidden nodes are used; however, in the SLPA classifiers, the number of hidden nodes is trialled from 10 to 100 in increments of 10. This enables the selection of a suitable number of hidden nodes for the final representative model.
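The trial over hidden-node counts amounts to a simple sweep. The sketch below assumes the selection criterion is the highest final testing accuracy and takes the training routine as a caller-supplied function; both `select_hidden_nodes` and `train_and_score` are hypothetical names, not part of any library.

```python
def select_hidden_nodes(train_and_score, candidates=range(10, 101, 10)):
    """Train one classifier per hidden-node count and keep the best.

    `train_and_score` is a caller-supplied function (hypothetical here) that
    trains a network with the given number of hidden nodes and returns its
    testing accuracy. Ties go to the smaller network, because `max` returns
    the first maximum it encounters.
    """
    results = {nodes: train_and_score(nodes) for nodes in candidates}
    best = max(results, key=results.get)
    return best, results
```

The default `candidates` covers 10, 20, ..., 100, matching the range trialled for the SLPA classifiers.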
RESULTS
The accuracy and interpretation of the three SLPA classifiers are given in this section. The first part discusses the accuracy of each classifier; the second examines the sensitivity of the classifiers to regions of the North Atlantic's SLPA.
Model accuracy
North west classifier (model MNW)
The results presented in Figure 3(c) show that the testing accuracies during training plateau at approximately 60%; in contrast, the training accuracies continue to increase close to 100% for classifiers with 90 and 100 hidden nodes. The classifier with 30 hidden nodes ends the 100 epochs of training with the highest testing accuracy of 62% and hence is selected for further investigation.
Investigating the MNW,30 classifier, the precision for the extreme and regular event classifications differs, at 68 and 55%, respectively. This indicates that the classifier produces fewer false positives when labelling events as extreme than when labelling them as regular. However, the recall scores, which are 60% for extremes and 63% for regular events, show that the classifier is about equally good at recovering the true extreme events and the true regular events.
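For reference, per-class precision and recall can be computed from true and predicted labels as in the sketch below. The function name and the toy labels are illustrative only and are not data from this study.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for the class `positive`.

    precision = TP / (TP + FP): of the events labelled `positive`,
                the share that really are.
    recall    = TP / (TP + FN): of the events that really are `positive`,
                the share the classifier found.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels: 1 = extreme, 0 = regular.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
extreme_scores = precision_recall(toy_true, toy_pred, positive=1)
regular_scores = precision_recall(toy_true, toy_pred, positive=0)
```

Calling the function once per class, as above, gives the separate extreme and regular values of the kind reported for each classifier.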
South east classifier (model MSE)
As with the North West classifier, a plateau occurs when training a neural network to classify between South Eastern extreme and regular rainfall event SLPA patterns across the North Atlantic. As presented in Figure 4, the training accuracy increases as the training error of these classifiers decreases; however, as found in the training for MNW, the testing accuracy remains at around 60%.
For MSE, the optimal classifier contains 10 hidden nodes (MSE,10); this classifier finishes the 100 epochs with a 60% testing accuracy and 84% training accuracy. Breaking this into precision and recall values, the precision values for regular and extreme events are similar, at 55 and 59%, respectively. The recall for the extreme event patterns is higher than that for the regular event patterns (61 and 53%, respectively).
Extreme event classifier
The final classifier (Mcomp) distinguishes between extreme event patterns for the North West and South East. As with MNW and MSE, a plateau of testing accuracy occurs, as shown in Figure 5, but with slight variability depending on the number of hidden nodes used. The spread of training accuracies and training errors is also marginally lower than in the previous classifiers.
Despite this, the most accurate classifier has 10 hidden nodes (Mcomp,10) and gives a testing accuracy of 65% and a training accuracy of 81%. In contrast to the MNW,30 and MSE,10 models, the precision and recall values of identifying South Eastern extremes (66 and 67%) are both higher than those of identifying North Western extremes (55 and 54%). This is counter-intuitive, as the model was trained using 31% more examples of North Western extremes and so might be expected to classify these extremes more reliably.
Spatial sensitivity
To identify the regions of interest to each classifier, a saliency map is created, representing the relative contribution of each cell (input feature) to the overall classification. To calculate these contributions, the backpropagation algorithm is applied to a baseline image (a pattern consisting only of zeros) together with a given classification, for example, an extreme event in MNW. The error generated by the network is then propagated back through the network; the weights of edges will show a stronger difference if they are important to the given classification, whereas those with little relevance will change comparatively little (Simonyan et al. 2014). When the errors reach the input nodes, they can be rank-ordered to identify which pixels contribute most to the given classification; in this study, this is achieved by normalising the contribution values between 0 and 1.
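One common reading of this procedure is to backpropagate the classification error for a chosen class all the way to the input nodes and normalise the resulting magnitudes. The sketch below follows that reading for a single-hidden-layer network; it is not the authors' code. The tanh activation, softmax output, use of absolute values and min-max normalisation are all assumptions.

```python
import math


def saliency(x, w_hidden, b_hidden, w_out, b_out, target_class):
    """Relative contribution of each input cell to `target_class`, in [0, 1]."""
    # Forward pass (tanh hidden layer and softmax output are assumed).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    scores = [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(w_out, b_out)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Cross-entropy error at the output nodes for the chosen class.
    delta_out = [p - (1.0 if k == target_class else 0.0)
                 for k, p in enumerate(probs)]
    # Propagate the error back to the hidden nodes (tanh derivative: 1 - h^2).
    delta_hidden = [
        (1.0 - h * h) * sum(d * w_out[k][j] for k, d in enumerate(delta_out))
        for j, h in enumerate(hidden)
    ]
    # Propagate the error back to the input cells.
    grad_input = [
        sum(d * w_hidden[j][i] for j, d in enumerate(delta_hidden))
        for i in range(len(x))
    ]

    # Min-max normalise the magnitudes to the range 0 to 1.
    magnitude = [abs(g) for g in grad_input]
    low, high = min(magnitude), max(magnitude)
    if high == low:
        return [0.0] * len(magnitude)
    return [(m - low) / (high - low) for m in magnitude]


# Baseline image of zeros for a 22 x 38 pattern (836 input cells).
baseline = [0.0] * 836
```

Reshaping the returned 836 values back into 22 rows of 38 cells would give a map of relative contribution per grid cell for the chosen class.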
Figure 6 shows the spatial contribution patterns for both MNW,30 (left) and MSE,10 (right). Both maps show little contribution from cells in the mid and western regions of the North Atlantic; however, a strong contribution is present closer to the British Isles. MNW,30 presents higher levels of contribution from both the Irish and North Seas, whereas MSE,10 presents relatively weak contributions from these regions but a higher level of contribution from the coast of Brittany in North Western France. Ummenhofer et al. (2017) show the difference in SLPA across the North Atlantic for various precipitation anomaly patterns; the key difference in the patterns presented is the SLPA just west and South West of the British Isles (a positive SLPA leads to negative precipitation anomalies in the North West and a negative SLPA indicates positive precipitation anomalies). Similarly, MSE,10 shows interest in the South West of the UK which continues to match the findings of Ummenhofer et al. (2017).
Next, the saliency map for Mcomp,10, which distinguishes between extreme events in the North West and extreme events in the South East, is shown in Figure 7 and presents a high level of contribution across the North of England, the Irish Sea and the coast of Brittany. This reinforces the case that local meteorological conditions create the difference not only between extreme and regular rainfall events but also between North West and South East extreme events. Further to this, both Figures 6 and 7 indicate that, on the day of occurrence, the conditions across the rest of the North Atlantic do not contribute to the resulting classifications. This raises the question of how these contributions could change if the classifiers were trained using patterns from the days prior to an event, as it is known that certain prior meteorological conditions are common to some extreme rainfall events (Allan et al. 2019).
Limitations
The neural network-based approach used in this study relies heavily on the data used to train and test the models. Because the neural networks were trained to differentiate between extreme and regular rainfall events, the training process is sensitive to how extreme events were selected. In this study, a threshold method is used, which selects the top 10% of standardised rainfall days in each region to represent extreme events. The reasons for this selection are outlined in Section 2.1; however, the use of an alternative threshold (e.g. 5%) or a maxima-based method would result in a set of models which vary greatly from the models presented here.
Furthermore, the method used for calculating the representative daily rainfall total for each region and day is the result of a trade-off between computational time and accuracy. As presented in Section 2.1, the representative sample for each day is collated from a regular grid of points at 30 km intervals. The CEH-GEAR data set is provided at 1 km intervals; however, increasing the resolution of the regular grid would substantially increase the computational time required to calculate the average daily regional rainfall total. This trade-off was considered reasonable, as the present study only uses the magnitudes to determine the extreme and regular events within each region. If, however, we were to compare the magnitudes between regions, then a different approach using higher resolution representations would be necessary.
Finally, the sensitivity analyses shown in Section 4.2 highlight some interesting disparities between regular and extreme events in both regions. However, the interpretability and reliability of the sensitivity analysis are tied to how effective the optimisation of the neural network's parameters was during training. Hence, with further fine-tuning of the network's parameters, the resulting sensitivity analyses should reveal clearer disparities and offer further insight into the differences.
CONCLUSIONS
Neural network-based image classification was used to identify the concurrent SLPA differences across the North Atlantic between the following types of daily rainfall events for two homogeneous rainfall regions in the UK:
1. Extreme and regular rainfall events in the North West of England.
2. Extreme and regular rainfall events in the South East of England.
3. Extreme rainfall events in the North West and South East of England.
Through the generation and optimisation of several neural network classifiers to represent each of the above scenarios, the following conclusions can be drawn:
1. The differences in SLPA for extreme and regular rainfall events in both the North West and South East of England are small enough to make numerical classification difficult.
   a. Classifying between North Western extreme and regular rainfall events is 2 percentage points more accurate than classifying between South Eastern extreme and regular events (62 and 60%, respectively).
   b. In both regional classifiers, the precision and recall of extreme events are higher than those of the regular rainfall patterns, indicating that extreme event SLPA patterns are more defined than regular events.
   c. In determining the differences between extreme events in the North West and those in the South East, both recall and precision are higher for South Eastern extremes, indicating that the patterns relating to these events are more defined than those in the North West.
2. Saliency maps have been used to identify the spatial regions of SLPA which contribute to the classifications.
   a. The local SLPA patterns across the British Isles are key to determining the difference between extreme and regular rainfall events in both the North West and the South East.
   b. The mid and western North Atlantic, however, has been shown not to provide any substantial contribution to any of the classifications developed in this study.
Finally, the patterns presented in the sensitivity analysis indicate the potential for this method to be used to identify the spatial importance of meteorological variables in the days prior to extreme or regular events. This will help meteorologists to target the regions of importance without the need to incorporate global data, reducing computational requirements. Furthermore, applying this method to other meteorological variables such as precipitable water, sea-surface temperature and geopotential height may provide further insight into why extreme events are different.
ACKNOWLEDGEMENTS
Andrew Barnes acknowledges funding as part of the Water Informatics Science and Engineering Centre for Doctoral Training (WISE CDT) under the National Productivity Investment Fund (grant no. EP/R512254/1).
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories (https://catalogue.ceh.ac.uk/documents/ee9ab43d-a4fe-4e73-afd5-cd4fc4c82556 and https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.surface.html).