ABSTRACT
Anomaly detection is used to explore the link between data-driven anomalous storms and their socio-economic impact on countries in the North-West Pacific. Three anomaly detection models, each based on a distinct algorithm, are applied to the tracks and temperature profiles of storms. A feature-based comparison of the top 5% of anomalous storms from each model is used to reveal variations in anomalous storm activity. The socio-economic impact of these anomalous storms is then assessed, revealing a link between the anomalous behaviour of storms and the impact experienced by countries in their path. A final cross-comparison shows that the k-Nearest Neighbour and Isolation Forest algorithms succeeded in identifying high-impact storms, whereas the agglomerative clustering model found many unique storms that had low impact. This highlights the importance of considering both trajectory and temperature when determining the severity and impact of anomalous storms.
HIGHLIGHTS
Anomaly detection models found a link between the anomalous nature of storms and high socio-economic impact.
K-Nearest Neighbour and Isolation Forest algorithms identified high-impact storms.
Contradictory trajectory and temperature patterns highlighted by the algorithms were linked to high impact.
INTRODUCTION
Storms are one of the biggest natural disaster threats to the North-West Pacific (NWP), causing widespread destruction in countries such as South Korea, Japan, and the Philippines (Lee et al. 2019; Santos 2021; Basconcillo & Moon 2023). Storms in the NWP can also be especially unpredictable, with changes in oceanic conditions strongly influencing a storm's intensity and spatial impact (Yoshida et al. 2017). Storm-related damage is estimated to account for 39% of hydrometeorological hazard costs in East Asia.
Basconcillo & Moon (2023) statistically analysed the characteristics of storms affecting the Korean Peninsula, with an emphasis on recent shifts and trends. They linked storm intensity to the Pacific Decadal Oscillation, a sea surface temperature climate cycle whose negative phase creates favourable conditions for increasingly intense storms over the Pacific Ocean.
Similarly, Lee et al. (2019) investigated long-term trends in storms and their tracks over the NWP between 1982 and 2016, with a focus on Japan and South Korea. Results showed that the northward trajectory of storms was associated with changing environmental conditions, such as sea surface temperature, which in turn led to increased intensification. Complementing these results, Moon & Ha (2021) analysed NWP storms, between 1958 and 2019, to determine the likely factors for the increasing number of storms each year. Consistent with the findings of both Basconcillo & Moon (2023) and Lee et al. (2019), their conclusions suggested that sea surface temperature was a likely contributor to this increase.
Building on this, sea surface temperature has also been found to be a major factor in the unpredictability of storms. Yoshida et al. (2017) conducted ensemble simulations to evaluate the expected variations in global storm activity under both current and projected surface warming conditions. Their results revealed significant anomalies in storm frequency in particular years, which they concluded were predominantly driven by fluctuations in sea surface temperature and storm track trends.
Highlighting the socio-economic impact of storms is equally vital to this research. Storms are a major cause of damage to infrastructure and agriculture, and they trigger a variety of associated hazards such as tsunamis, landslides, flash flooding, and drought (Santos 2021). Santos (2021) assessed the most destructive storms to hit the Philippines in 2020 using data from several online sources, including social media, with one objective being to identify their impact and recommend precautionary measures for affected areas. Storm Ulysses was found to be the most destructive, with the most affected areas situated near water bodies, near trees that absorb large amounts of water, and in low-lying areas. Understanding this damage is not only central to the motivation of this research but will also help when linking the anomalous nature of storms to their socio-economic impact.
Machine learning offers a promising approach to classifying storms across the globe, and one such method, anomaly detection (AD), is well motivated yet sparsely researched. Li et al. (2022) used AD to explore storm growth, specifically analysing the gale-force wind radius (R34). Using an Isolation Forest, they split the data into rapid growth (RG) and rapid shrinking (RS) events based on the change in R34. The identified anomalies showed that storms with RG tended to have high destructive potential and a discernible size life cycle compared with those with RS. The research thereby highlighted the pivotal role of RG in altering the destructive potential of storms and demonstrated the value of AD for studying storms.
Other machine learning methods, not implementing AD, have also been applied to storms. Tamamadin et al. (2022), for instance, used a supervised k-Nearest Neighbour (k-NN) algorithm to predict typhoon tracks. After the model was trained on a set of current and similar-class typhoons, a series of tests showed that it predicted tracks with high accuracy. They also found that the number of ensemble members did not necessarily affect accuracy, whereas the initial determination of similarity between storms did. Alternatively, Martzikos et al. (2018) used a variety of algorithms, including Agglomerative Clustering, to analyse and classify storm events in Greece. That study extended the analysis of storm events at the port of Rethymno to wave projections for 1960 to 2100, classifying storm events by severity through an analysis of storm energy and period. Although both of these studies were highly successful, neither considered AD as an alternative in current or future work, particularly in relation to the unpredictable nature of storms (Yoshida et al. 2017).
With these advancements in mind, this paper uses machine learning to identify anomalous storm tracks and examine their relationship with socio-economic impact. The paper first describes the case study region, the dataset of storm tracks, and the three chosen AD methods. The AD models are then applied to the dataset to identify anomalous storm tracks, which are subsequently analysed qualitatively to identify their relationship to socio-economic impact. This enables an evaluation of machine learning-based AD for storm tracks.
METHODOLOGY
This section begins by detailing the chosen case study and the dataset of storm tracks and temperature profiles. Data preprocessing steps are then outlined, followed by detailed descriptions of the three chosen AD methods.
Data
Storm tracks from the ‘International Best Track Archive for Climate Stewardship’ (IBTrACS) dataset (Knapp et al. 2010, 2018), specifically its Western Pacific subset, were combined with 2-m air temperature data from the ERA5 meteorological dataset (Hersbach et al. 2023). Although sea surface temperature is the usual choice, this study used 2-m air temperature because it allows analysis of storms that travel over both sea and land.
Each feature was then standardised: for storm $s$ and feature $f$, the standardised value is $z_{s,f} = (x_{s,f} - \mu_f)/\sigma_f$, where $x_{s,f}$ is the raw value, and $\mu_f$ and $\sigma_f$ are the mean and standard deviation of feature $f$ across all storms.
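A minimal Python sketch of this standardisation follows. It assumes each storm has already been reduced to a fixed-length row of numerical features (for example, resampled track positions and 2-m temperatures); how the features were constructed is not specified here, so the array below is hypothetical. The population standard deviation (ddof=0) is also an assumption; it matches scikit-learn's `StandardScaler`.

```python
import numpy as np

def standardise(X: np.ndarray) -> np.ndarray:
    """Column-wise z-score: z[s, f] = (x[s, f] - mean_f) / std_f.

    Rows are storms s, columns are features f. Uses the population
    standard deviation (ddof=0), an assumption not stated in the text.
    """
    mu = X.mean(axis=0)        # mean of each feature across all storms
    sigma = X.std(axis=0)      # standard deviation of each feature across all storms
    sigma[sigma == 0] = 1.0    # constant features become 0 instead of dividing by zero
    return (X - mu) / sigma

# Hypothetical example: 3 storms x 2 features (e.g. latitude in degrees, temperature in K)
X = np.array([[15.0, 300.0],
              [20.0, 298.0],
              [25.0, 296.0]])
Z = standardise(X)
print(Z.round(3))
```

After standardisation every feature has mean 0 and standard deviation 1, so no single feature (such as temperature in kelvin) dominates the Euclidean distances used by the models below.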
Anomaly detection
Three AD methods were selected to identify anomalous storm tracks and temperature profiles: (i) k-NN, (ii) Agglomerative Clustering and (iii) Isolation Forest. Each is justified and described in its own section below.
k-NNs
The first model chosen was k-NN, following the work of Tamamadin et al. (2022). In that work, k-NN was used to identify ensembles of K neighbouring storms, namely those with the smallest Euclidean distance, which were then used to predict storm tracks. In a similar fashion, k-NN is employed here to identify ensembles of neighbouring storms: for each storm, the model calculates the average distance to its K nearest neighbours, and a threshold on this distance determines whether the storm is anomalous.
Given the variability of features in the storm dataset and its relatively small size, a medium value of 8 was chosen for K. This value allowed the model to identify complex patterns without underfitting the dataset, while also accommodating the variability of storms. Additional tests with other values found that a smaller K overfitted the data, ignoring many of the local neighbourhoods that were otherwise found, whereas a larger value led the model to learn overly simplistic patterns.
The decision function was based on the average distance from each storm to its K neighbours. This method was not only computationally efficient but also gave each identified neighbour equal importance, which seemed preferable given the exploratory scope of this project.
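The k-NN decision function can be sketched with scikit-learn's `NearestNeighbors`. This is an illustration under assumptions rather than the code used in this study: the features are taken to be already standardised, and each storm's own zero-distance match is excluded so that the score averages over K other storms (the text does not say how the self-match was handled). The synthetic data and the planted outlier are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_scores(Z: np.ndarray, k: int = 8) -> np.ndarray:
    """Average Euclidean distance from each storm to its k nearest other storms."""
    # Request k + 1 neighbours: when querying the fitted data, each storm is
    # returned as its own nearest neighbour at distance 0.
    nn = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(Z)
    dist, _ = nn.kneighbors(Z)
    # Drop the self-match and weight the k remaining neighbours equally.
    return dist[:, 1:].mean(axis=1)

# Hypothetical data: 300 standardised storms with 10 features, one shifted far away
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 10))
Z[0] += 6.0
scores = knn_scores(Z, k=8)
print(int(scores.argmax()))  # index of the most isolated storm
```

Calling `nn.kneighbors()` with no arguments is an equivalent alternative, since scikit-learn then excludes each fitted point from its own neighbour list.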
Distance scores were then compared against a threshold to determine classification. The threshold was set at the 95th percentile of distance scores, a choice applied consistently across all three models. The 95th percentile was chosen because it identified enough anomalies for analysis while keeping their number small enough to examine individually. The same choice was made by Li et al. (2022). As with the decision function, this choice could be explored further in future work.
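The 95th-percentile rule shared by all three models reduces to a few lines. The sketch below assumes NumPy's default linear interpolation for the percentile and a strict "greater than" comparison, mirroring the wording used for the models; the scores are hypothetical.

```python
import numpy as np

def flag_above_percentile(scores: np.ndarray, q: float = 95.0):
    """Return (mask, threshold): mask is True where a score exceeds the q-th percentile."""
    threshold = float(np.percentile(scores, q))  # NumPy default: linear interpolation
    return scores > threshold, threshold

scores = np.arange(1.0, 101.0)        # hypothetical scores 1, 2, ..., 100
mask, threshold = flag_above_percentile(scores)
print(threshold, int(mask.sum()))
```

Because of the strict comparison, scores tied at the threshold are not flagged, so a model whose scores contain many equal values can flag slightly fewer than 5% of storms.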
Agglomerative clustering
The second model implemented was Agglomerative Clustering, a hierarchical algorithm used by Martzikos et al. (2018). In that study, clustering aimed to divide the data into groups with similar properties based on the Euclidean distance between data points, so that the shared features of each group of storms could be generalised. In this project, these distances are instead used to detect outliers: storms with a larger distance before joining their first cluster are categorised as anomalous.
After calculating the distances between each pair of data points, a linkage method had to be chosen for the clustering model, unlike in the work of Martzikos et al. (2018), who used all methods. Ward's method was chosen for this project because it minimises the variance within the clusters being merged (scikit-learn 2023a), reducing the model's susceptibility to noise. As with the other decisions made, this hyperparameter could be explored extensively in future work.
After clustering, the ‘distance to closest cluster’ was examined for every data point, that is, the linkage distance at which an isolated data point first merges into the hierarchical structure. Distances greater than the threshold, again set at the 95th percentile of all distance values, were labelled anomalous.
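The "distance to closest cluster" can be read off a full agglomerative tree. The sketch below is one way to do this with scikit-learn's `AgglomerativeClustering`, assuming that the quantity of interest is the linkage distance at which each storm, starting as a singleton, is first merged; setting `distance_threshold=0` with `n_clusters=None` makes scikit-learn build the whole tree and record merge distances in `distances_`. The data and the planted outlier are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def distance_to_closest_cluster(Z: np.ndarray, linkage: str = "ward") -> np.ndarray:
    """Linkage distance at which each storm (a singleton cluster) first merges."""
    n = Z.shape[0]
    # Build the full tree; distances_ holds the linkage distance of every merge.
    model = AgglomerativeClustering(
        n_clusters=None, distance_threshold=0.0, linkage=linkage
    ).fit(Z)
    first_merge = np.full(n, np.nan)
    for (a, b), d in zip(model.children_, model.distances_):
        for node in (a, b):
            if node < n:  # node ids below n are original storms (leaves)
                first_merge[node] = d
    return first_merge

# Hypothetical data: 300 standardised storms with 10 features, one shifted far away
rng = np.random.default_rng(1)
Z = rng.normal(size=(300, 10))
Z[5] += 6.0
d = distance_to_closest_cluster(Z)
anomalous = np.flatnonzero(d > np.percentile(d, 95))
```

Every original storm appears exactly once as a child in the tree, so each receives exactly one first-merge distance.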
Isolation Forest
The final model implemented was the Isolation Forest, inspired by the work of Li et al. (2022), who used it to detect anomalous classes of storms in Australia. The algorithm detects anomalies using binary trees built from randomly selected features and split values (scikit-learn 2023b). Each data point is then scored according to how many splits are needed to isolate it; anomalies, being easier to isolate, require fewer splits.
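For reference, the score behind this isolation principle is defined in the original Isolation Forest paper (Liu, Ting and Zhou 2008), which scikit-learn follows; scikit-learn's `score_samples` returns the negative of this value. Here $h(x)$ is the number of splits (path length) needed to isolate $x$ in one tree, $\mathbb{E}[h(x)]$ is its average over the forest, $n$ is the number of samples used to build each tree, and $c(n)$ normalises the path length:

```latex
s(x, n) = 2^{-\frac{\mathbb{E}[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln i + 0.5772156649
```

Scores close to 1 indicate points isolated in few splits (likely anomalies), while scores well below 0.5 indicate points that need many splits (likely normal).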
The Isolation Forest model had several additional hyperparameters (scikit-learn 2023b), which were set according to the nature of the data. The bootstrap parameter determines whether the samples used to build each tree are drawn with replacement; setting it to true was expected to make the model more robust by increasing the diversity of the trees. The n_estimators parameter sets the number of trees in the forest. A moderate value of 150 was chosen to balance the stability of the scores, which improves as trees are added, against computational cost, while still allowing the model to pick out varying patterns.
The max_samples parameter, the number of samples drawn to train each base estimator, was left at its default ('auto'), which in scikit-learn draws min(256, n) samples per tree for a dataset of n storms. Similarly, max_features, the number of features drawn for each tree, was left at its default (1.0) so that all features would be included. These decisions reflect the exploratory nature of this project; future work could tune them to gain further insight into the model's capability.
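These settings map directly onto scikit-learn's `IsolationForest` constructor. The sketch below is illustrative rather than the study's code; the fixed `random_state` is a hypothetical addition for reproducibility, and the fitted `max_samples_` attribute shows how the 'auto' default resolves for two hypothetical dataset sizes.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def make_forest(seed: int = 0) -> IsolationForest:
    return IsolationForest(
        n_estimators=150,    # number of trees in the forest
        bootstrap=True,      # each tree's subsample is drawn with replacement
        max_samples="auto",  # default: min(256, n_samples) samples per tree
        max_features=1.0,    # default: every tree draws all features
        random_state=seed,   # hypothetical: fixed seed for reproducibility
    )

rng = np.random.default_rng(0)
small = make_forest().fit(rng.normal(size=(100, 6)))  # hypothetical 100-storm dataset
large = make_forest().fit(rng.normal(size=(320, 6)))  # hypothetical 320-storm dataset
print(small.max_samples_, large.max_samples_)
```

With 'auto', each tree sees the whole dataset only when it contains 256 storms or fewer; larger datasets are subsampled per tree.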
Once the data had been fitted, the scores were rescaled to lie between 0 and 1 and then expressed as percentages. In line with the previous models, storms with scores above the 95th percentile were classified as anomalies.
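The rescaling step is not fully specified, so the sketch below assumes min-max scaling of the negated `score_samples` output (scikit-learn returns the opposite of the anomaly score, so negation makes higher values more anomalous). Under this assumption the most anomalous storm always receives 100%. The data and the planted outlier are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def anomaly_percent(Z: np.ndarray, seed: int = 0) -> np.ndarray:
    forest = IsolationForest(n_estimators=150, bootstrap=True, random_state=seed).fit(Z)
    raw = -forest.score_samples(Z)                          # higher = easier to isolate
    scaled = (raw - raw.min()) / (raw.max() - raw.min())    # assumed min-max rescaling to [0, 1]
    return 100.0 * scaled                                   # express as a percentage

rng = np.random.default_rng(2)
Z = rng.normal(size=(300, 8))   # hypothetical standardised storm features
Z[42] += 5.0                    # planted outlier
pct = anomaly_percent(Z)
threshold = np.percentile(pct, 95)
anomalies = np.flatnonzero(pct > threshold)
```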
RESULTS AND DISCUSSION
The results section will analyse and discuss the results obtained by the three AD models. The first three subsections will provide a qualitative analysis of anomalous storm tracks and temperature profiles identified by the models. Next, the anomalous storms are compared with historic reports and news articles to determine their socio-economic impact, providing a link between the anomalous nature of the storm and the direct impact on society. Finally, a detailed cross-comparison of the methods and anomalies is provided.
k-NN
The anomalies identified by the k-NN model are shown in Table 1. Each storm is given a numerical label for easier identification in subsequent figures.
SID | Date | Label | Average neighbour distance |
---|---|---|---|
1952224N14135 | Aug 1952 | 9 | 13.46 |
1960164N29134 | Jun 1960 | 48 | 13.31 |
1964224N25161 | Aug 1964 | 74 | 15.33 |
1965205N07163 | Jul 1965 | 79 | 12.82 |
1975221N18116 | Aug 1975 | 124 | 15.76 |
1978288N10185 | Oct 1978 | 137 | 13.50 |
1983239N10183 | Aug 1983 | 150 | 13.65 |
1984214N12143 | Aug 1984 | 155 | 15.22 |
1986221N17163 | Aug 1986 | 162 | 12.64 |
1989010N16212 | Jan 1989 | 170 | 17.26 |
1989235N26122 | Aug 1989 | 172 | 13.54 |
1990319N07197 | Nov 1990 | 179 | 13.46 |
1993221N12216 | Aug 1993 | 199 | 15.12 |
1997201N24152 | Jul 1997 | 227 | 13.00 |
1997221N03179 | Aug 1997 | 228 | 15.81 |
2020213N15131 | Jul 2020 | 310 | 15.54 |
The threshold is 12.54 and the mean of all anomaly distances is 14.34.
Track
The patterns in Figure 2 vary, mostly in ways that differ from the norm. For example, Storm 124 initially travels at a constant speed towards the North-West before accelerating in the opposite direction. Its track also diverts around the edge of the high-density areas of the map. This unusual pattern is consistent with its high distance score of 15.76 in Table 1, compared with the anomaly mean of 14.34. It is likely that the storm's unusual changes of speed and direction differed from those of its neighbours, causing the algorithm to classify it as anomalous. There are other similar examples, such as Storm 155, which follows a wide curve down the coast of eastern China before unexpectedly diverting on a short southward trajectory near its end. Storm 228, in contrast, travels up the eastern side of the map. The distance scores of these two storms, 15.22 and 15.81, indicate that these patterns are again less common, which is why they are also flagged by the k-NN algorithm.
Notably, Figure 2 shows that Storms 137, 150, 170, and 179 all follow horizontal East-to-West paths. The storms not only share similar trajectories but are also neighbours of each other. That this group was flagged is likely attributable to the chosen mid-range value of K. If, for instance, K were set to a lower value of 4, the distance scores of these storms would likely decrease (assuming the four storms remained neighbours), potentially leading to them no longer being classified as anomalies. This hyperparameter choice was less uncertain in the work of Tamamadin et al. (2022), who, using a labelled dataset, could select the values of K that produced the lowest error. Nonetheless, this example emphasises the crucial role that hyperparameters such as K play in the k-NN model, where a misjudgement could lead to patterns such as this being overlooked.
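The effect of K on a small group of mutually neighbouring storms can be illustrated on synthetic data. This is a hypothetical sketch, not a reanalysis of Figure 2: four tightly grouped points stand in for Storms 137, 150, 170, and 179, and the score is the average distance to the K nearest other points. Because each additional neighbour is at least as far away as those already counted, this average can never fall as K grows, so shrinking K from 8 to 4 lowers the group's scores.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_scores(Z: np.ndarray, k: int) -> np.ndarray:
    # k + 1 neighbours, then drop column 0 (each point's zero-distance self-match)
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(Z).kneighbors(Z)
    return dist[:, 1:].mean(axis=1)

rng = np.random.default_rng(42)
background = rng.normal(0.0, 1.0, size=(300, 2))                  # hypothetical typical storms
group = rng.normal(0.0, 0.1, size=(4, 2)) + np.array([6.0, 0.0])  # four close, unusual storms
Z = np.vstack([background, group])
group_idx = np.arange(300, 304)

s4, s8 = knn_scores(Z, 4), knn_scores(Z, 8)
print("K=4:", s4[group_idx].round(2))
print("K=8:", s8[group_idx].round(2))
```

Whether the group still crosses the 95th-percentile threshold at K = 4 depends on how the rest of the score distribution shifts as well, which is why the choice of K needs care.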
According to Table 1, Storm 170 records the highest distance score of 17.26. Although it shares the East-to-West trajectory of the other storms described above, Figure 2 also shows Storm 170 travelling slightly southward, a pattern that is uncommon in the rest of the data and likely contributed to its higher distance score. Storm 170 is also the only identified anomaly that occurs at the start of the year. Because date was not included as a training feature of the k-NN model, this suggests that the model is likely capturing other patterns associated with external variables. In light of these observations, it is clear why Storm 170 has been highlighted as a significant anomaly by the k-NN algorithm.
Temperature
While there is variability in the data, Figure 3 generally shows temperature decreasing over time. This is consistent with the northward travel of most storms, as temperature typically drops the further a storm moves from the equator. Many of the identified anomalies follow this trend; for instance, the temperature of Storm 150 drops after approximately 150 h. Because this pattern is common, temperature was probably not the primary factor singling out Storm 150 as an anomaly, particularly given its distinctive track in Figure 2.
Other anomalies follow this trend but also show additional temperature fluctuations. Storm 48, for example, experiences an initial temperature increase before the typical decrease. This pattern, together with its brief duration, helps explain its anomalous classification, and it is supported by its track in Figure 2, which initially heads towards the warmer equator before changing direction. Similarly, Storm 155's temperature drops sharply, rises swiftly, and then drops again. While this behaviour aligns with the storm's trajectory in Figure 2, where it momentarily heads South towards the equator, the pattern remains distinctive, further supporting its anomaly status.
Storm 170 is again worth examining in detail because of its extremely high temperatures. Together with its unusual track in Figure 2, this makes it unlike any other identified storm; the uniqueness of both its temperature and its track supports its high distance score. It is also an exceptionally fast storm, with a total travelling time of less than 200 h. This was most likely another feature picked up by the k-NN algorithm, and it indicates that several features, rather than one, probably contribute to this storm's anomalous classification.
Agglomerative clustering
Track
The results in Table 2 outline the anomalies found by the Agglomerative Clustering model along with their numerical labels and dates.
SID | Date | Label | Distance to closest cluster |
---|---|---|---|
1960164N29134 | Jun 1960 | 48 | 16 |
1964224N25161 | Aug 1964 | 74 | 15 |
1975221N18116 | Aug 1975 | 124 | 15 |
1986221N17163 | Aug 1986 | 162 | 14 |
1989010N16212 | Jan 1989 | 170 | 19 |
1989235N26122 | Aug 1989 | 172 | 18 |
1993221N12216 | Aug 1993 | 199 | 13 |
1997221N03179 | Aug 1997 | 228 | 14 |
1998235N17131 | Aug 1998 | 235 | 13 |
2010240N15142 | Aug 2010 | 274 | 12 |
2015229N08212 | Sep 2015 | 295 | 16 |
2020213N15131 | Jul 2020 | 310 | 18 |
The threshold is 12 and the mean of all anomalous distances is 15.25.
Unlike Ward's method, average linkage uses the average distance between the members of two clusters and merges the pair of clusters for which this distance is smallest (scikit-learn 2023a). To test whether the linkage method influenced the results, the model was rerun in the same manner using average linkage; as a result, Storm 274 was no longer classified as an anomaly. This may reflect stricter criteria for what constitutes an anomalous trajectory. Given that Storm 274 originally had a score of 12, bordering the threshold, it is likely that changing the linkage method, and consequently the model's sensitivity, excluded it from being labelled as an outlier.
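The linkage comparison can be sketched by computing each storm's first-merge distance under both methods and comparing the flagged sets. This is a hypothetical illustration on synthetic data, not the study's rerun. Note that two storms that first merge with each other share the same distance, so ties can make the number of flagged storms differ slightly from 5%.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def first_merge_distance(Z: np.ndarray, linkage: str) -> np.ndarray:
    n = len(Z)
    # distance_threshold=0 with n_clusters=None builds the full tree with merge distances
    model = AgglomerativeClustering(n_clusters=None, distance_threshold=0.0, linkage=linkage).fit(Z)
    out = np.empty(n)
    for (a, b), d in zip(model.children_, model.distances_):
        for node in (a, b):
            if node < n:  # leaves are the original storms
                out[node] = d
    return out

def flagged(Z: np.ndarray, linkage: str, q: float = 95.0) -> set:
    d = first_merge_distance(Z, linkage)
    return set(np.flatnonzero(d > np.percentile(d, q)).tolist())

rng = np.random.default_rng(7)
Z = rng.normal(size=(200, 6))  # hypothetical standardised storm features
ward, average = flagged(Z, "ward"), flagged(Z, "average")
print("Ward only:", sorted(ward - average))
print("average only:", sorted(average - ward))
```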
The trajectories of Storms 172 and 310 show both full and partial curves. Although the general direction of both storms is as might be expected, both have high distance scores of 18, exceeding the threshold of 12 and the mean of 15.25. Both storms exhibit some unusual behaviour (Storm 310 makes a broader curve, and Storm 172 starts travelling slightly southward near its end), but their shared high score suggests that other features may contribute to their classification.
Nevertheless, most of the identified storms appear unique: Storms 74 and 199 loop back on themselves, Storm 235 follows a meandering path, and Storm 295 takes an unusual curved path heading inland rather than parallel to the coast. All of these have mid-range distance scores between 13 and 16, centred around the mean. Within the scope of Agglomerative Clustering, such mid-range scores indicate that these storms deviate from normal patterns but are not as extreme as those with higher scores. As previously discussed, adjusting the linkage method and sensitivity may have led to different results, and raising the threshold might have removed the anomaly status of storms such as 199 and 235, whose scores of 13 lie near the lower bound. Regardless, these storms represent moderate anomalies under Agglomerative Clustering.
Temperature
Storm 310 shows a similar temperature decline, albeit with some fluctuations, followed by an increase. Yet a closer look at Figure 4 shows no southward shift in Storm 310's trajectory to explain this increase. This contradiction between trajectory and temperature is notable and suggests that the Agglomerative Clustering algorithm identified this inconsistency and weighted it differently from the other algorithms. In the cases of both Storms 172 and 310, these insights begin to reveal underlying patterns in the data that may explain why the storms were given higher distance scores.
Other storms mentioned above can be examined further using Figure 5. For example, Storms 74 and 199, whose trajectories loop, show two different temperature patterns. Storm 74, with a distance score of 15, fluctuates in temperature. In addition, its initial temperature is lower than that of most other storms, as indicated by the density heatmap. Several other anomalies, such as Storms 48, 172 and 295, share this trend, which can be attributed to their initial track positions lying further North. The combination of these patterns likely produced Storm 74's distance score of 15, again indicating a moderate rather than extreme anomaly.
Storm 199 offers slightly divergent observations; its temperature pattern follows a rather standard trend, staying within the mid to high-density areas as shown in Figure 5. Although its trajectory aligns with this pattern, as previously explained, its extreme looping nature stands out. As a result, it might be reasonable to propose that for this storm, the track had a significantly stronger influence on classification. This presumption finds support in the lower distance score of 13, which hints that Storm 199 may only border what the Agglomerative Clustering model deems anomalous.
Isolation Forest
The final model to analyse is the Isolation Forest, whose results are shown in Table 3 in the same format as for the previous models.
SID | Date | Label | Anomaly score (%) |
---|---|---|---|
1952224N14135 | Aug 1952 | 9 | 100 |
1954240N13151 | Aug 1954 | 17 | 85 |
1964224N25161 | Aug 1964 | 74 | 86 |
1965205N07163 | Jul 1965 | 79 | 90 |
1975221N18116 | Aug 1975 | 124 | 89 |
1984214N12143 | Aug 1984 | 155 | 95 |
1989010N16212 | Jan 1989 | 170 | 96 |
1989235N26122 | Aug 1989 | 172 | 91 |
1990319N07197 | Nov 1990 | 179 | 87 |
1992318N06182 | Nov 1992 | 197 | 81 |
1993221N12216 | Aug 1993 | 199 | 97 |
1997201N24152 | Jul 1996 | 217 | 95 |
1996227N15176 | Aug 1996 | 218 | 84 |
1997221N03179 | Aug 1997 | 228 | 87 |
2020213N15131 | Jul 2020 | 310 | 81 |
2021219N08193 | Aug 2021 | 313 | 89 |
The threshold score is 80.25% and the mean score of all anomalies is 89.56%.
Track
Storm 17 also stands out among the newly identified anomalies, with an unusual zig-zag trajectory, after which it resumes a northward course with some dynamic speed variations. Its score of 85% places it in the lower to mid-range of scores, below the mean. The model's hyperparameters may have played a role in this classification, specifically n_estimators, which determines the number of trees in the forest (scikit-learn 2023b). With fewer trees, random variability in the scores would increase, which could remove the anomaly classification from this and other storms with scores near the lower bound. Increasing the number of trees, on the other hand, would mainly stabilise the scores at greater computational cost, since adding trees to the ensemble reduces random variation rather than causing overfitting.
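The effect of n_estimators on score stability can be checked empirically. In this hypothetical sketch, the same synthetic data are scored by forests built with different random seeds, and the across-seed spread of each storm's score is compared for 10 and 150 trees; averaging over more trees should shrink this spread.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 8))  # hypothetical standardised storm features

def score_spread(n_estimators: int, seeds=range(10)) -> float:
    """Mean (over storms) of the standard deviation of scores across random seeds."""
    runs = np.array([
        -IsolationForest(n_estimators=n_estimators, bootstrap=True, random_state=s)
        .fit(Z)
        .score_samples(Z)
        for s in seeds
    ])
    return float(runs.std(axis=0).mean())

spread = {n: score_spread(n) for n in (10, 150)}
print(spread)
```

A storm whose score sits close to the threshold is therefore more likely to switch classification between runs when the forest is small.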
Storm 197, another newly identified anomaly, has one of the lowest anomaly scores at just 81%. Its trajectory, shown in Figure 6, makes this seem fitting: its path closely follows the high-density regions of the heatmap, making it less anomalous. Nevertheless, Storm 197 was still classified as an outlier, hinting at subtle patterns that may differentiate it. Compared with other curving storms, Storm 197 accelerates more sharply northward, as evident from the larger gaps between data points. This distinctive pattern, although minor in appearance, may have been pivotal in its classification, as may other features such as temperature and time.
Temperature
The Isolation Forest algorithm works by randomly partitioning the feature space and flagging as anomalies the data points that are isolated with the fewest splits (scikit-learn 2023b). This means that even subtle differences in the data, which are not obvious in graphical representations, may lead to much quicker isolation. Storm 9's trajectory and temperature pattern, although only moderately unique at a visual level, could have hidden traits that make it easy for the model to isolate in the feature space. The model's hyperparameters must also be considered, as minor changes to the level of randomness and variability can substantially affect the results. Consequently, characteristics of Storm 9 that appear minor may have been magnified in the high-dimensional feature space in which the algorithm operates, earning it a high score.
Storm 79's temperature stands out among the anomalies identified across the three models, mainly because it registers some of the coldest temperatures, at around the 375-h mark. On closer inspection, this is supported by the storm's trajectory, which indicates that it makes landfall. Similar patterns appear in other anomalies such as Storms 155 and 310, which seemingly make landfall at 250 and 175 h, respectively. On the one hand, such findings highlight the model's ability to identify specific patterns without external prior information. On the other hand, this observation raises questions about why other storms that make landfall are not classified as anomalies. Much like the ambiguities of Storm 9, it again suggests hidden patterns in the model's high-dimensional feature space that are less evident through qualitative analysis.
Footprint and impact
In this section, the footprint of anomalous storms will be explored through a comparison with historical reports and news articles. The insights that are drawn here will be used to complement the patterns identified in the section prior, evaluating the genuine anomalous nature of the storms identified in Sections 3.1–3.3.
Storms 137, 150, 170 and 179, otherwise known as Typhoon Rita, Typhoon Ellen, Tropical Storm Winona, and Typhoon Owen, show evidence of extensive damage and loss of life across various reports. Rita was one of the Philippines' deadliest storms, as reported by Bacani (2013), causing extensive damage and more than 400 deaths. Ellen was the worst typhoon to hit Hong Kong in 10 years, resulting in 10 deaths and a further 12 people missing (Hong Kong Observatory 1983). The out-of-season Winona was part of a well above-average typhoon season that saw storms throughout the year (Hong Kong Observatory 1989). Finally, Owen, a typhoon that travelled across the Marshall and Caroline Islands, caused extensive property destruction, upwards of 99% on particular islands (Joint Typhoon Warning Center 1990).
Storm 9 had a notable anomaly score of 100% but appeared unremarkable on the surface. Historical records from the Korean War, however, detail its heavy impact. Accounts of this storm, called Karen, recall it halting the siege of Wonsan, dislodging mines, pausing combat due to torrential conditions, and disrupting communications (The Courier-Mail 1952). Hospitals also had to prepare emergency equipment to tend to the wounded during the storm. Storm 197, also known as Typhoon Gay, was the most intense storm to hit the NWP since 1979 (Joint Typhoon Warning Center 1992). Gay was said to have bulldozed through the Marshall Islands, leaving over 5,000 people homeless. Other areas such as Guam also experienced damage, but figures were hard to estimate because another intense storm had struck only months earlier. Remarkably, there were very few deaths and injuries.
Storm 172, otherwise known as Tropical Storm Rodger, made landfall in Japan, killing 3 people (Hong Kong Observatory 1989). Other impacts included disruption to air traffic and rail services and the flooding of around 550 houses. More recently, in 2020, Typhoon Hagupit (Storm 310) swept through eastern China and South Korea, causing at least 15 deaths and major property damage (Ji-hye 2020). The footprints of Storms 74, 199 and 217, which are notable for their looping tracks, can also be traced. Storm 74, Typhoon Kathy, caused multiple fatalities in Japan through thunderstorms and landslides and made the front page of The Brandon Sun (1964). Storm 217, dubbed ‘Killer Typhoon’ Kirk in one news report, similarly caused several fatalities and even more injuries (United Press International 1996). Storm 199, Typhoon Keoni, on the other hand, shows no significant socio-economic impact; a Hong Kong Observatory (1993) report on this storm concludes that it dissipated over water.
Other identified anomalies caused very little damage despite their unusual patterns. Storm 295, Typhoon Kilo, set a record as the longest-lived storm on Earth in 2015 (Fritz 2015), although this is not reflected in the data because the storm crossed from the central Pacific into the NWP. Despite this longevity, few reports of damage were received (Birchard 2018). Therefore, while there is some evidence that anomalous storm tracks and temperature patterns are associated with high-impact events, there is insufficient evidence to conclude that they are the only contributing factors.
Cross-comparison
This section closely examines and compares the three models. Because no labelled dataset is available, model performance cannot be measured against ground truth, so a qualitative comparison is used instead. The primary focus is to identify differences between the models' results, highlighting notable feature patterns and relating them to each model's methodology. Model performance can also be gauged by the impact of the anomalous storms discussed in the previous section.
Despite these differences, each model identified Storm 170 as a pronounced anomaly, assigning it markedly higher scores than other storms. Notably, Storm 170 occurred during an above-average season (Hong Kong Observatory 1989) and formed in January rather than in the usual summer period. Because the models had no prior knowledge of the storms' dates, this association further suggests a correlation between the models' training features and heightened storm impact.
In addition, the Isolation Forest was the only algorithm to identify one of the highest-impact storms, Storm 197 (Typhoon Gay). Like Storm 217, its track in Figure 11 slingshots northward, with a dramatic increase in acceleration as it passes Japan. This storm was the most intense seen in the NWP in over a decade, devastating areas such as the Marshall Islands and Guam (Joint Warning Typhoon Centre 1992).
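The cross-comparison can be framed as set operations on each model's top 5% of anomaly scores: a storm flagged by every model (as with Storm 170) lies in the intersection, while a storm flagged by only one model (as with Storm 197 for the Isolation Forest) lies in that model's unique set. The following is a minimal sketch, not the authors' implementation. It assumes each model outputs one score per storm, oriented so that higher means more anomalous (scores such as scikit-learn's `IsolationForest.score_samples`, where lower values indicate more abnormal samples, would need negating first). The function names, model labels, storm IDs and scores are hypothetical toy values.

```python
def top_fraction(scores, fraction=0.05):
    """Return the IDs of the highest-scoring storms (top `fraction` of all storms)."""
    # Rounded to a whole number of storms; keep at least one so small inputs still yield a set.
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def compare_models(model_scores, fraction=0.05):
    """Split each model's top anomalies into storms shared by all models and model-specific storms."""
    tops = {name: top_fraction(s, fraction) for name, s in model_scores.items()}
    # Storms that every model places in its top fraction.
    shared = set.intersection(*tops.values())
    # Storms that only one model places in its top fraction.
    unique = {
        name: top - set().union(*(t for other, t in tops.items() if other != name))
        for name, top in tops.items()
    }
    return shared, unique


# Toy scores for 40 hypothetical storms (IDs 1-40); every model scores most storms low.
storm_ids = range(1, 41)
model_scores = {
    "knn": {i: 0.1 for i in storm_ids},
    "isolation_forest": {i: 0.1 for i in storm_ids},
    "agglomerative": {i: 0.1 for i in storm_ids},
}
# Storm 7 stands out for every model; each model also flags one storm of its own.
for name in model_scores:
    model_scores[name][7] = 0.9
model_scores["knn"][12] = 0.8
model_scores["isolation_forest"][25] = 0.8
model_scores["agglomerative"][33] = 0.8

shared, unique = compare_models(model_scores)
print(shared)  # {7}
print(unique)  # {'knn': {12}, 'isolation_forest': {25}, 'agglomerative': {33}}
```

With 40 storms, the top 5% is two storms per model, so each model's set holds the commonly flagged storm plus its own outlier. Set differences make the model-specific anomalies explicit, which is the kind of result the qualitative comparison then checks against historical impact reports.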
CONCLUSIONS
This paper details the application of three AD models to identify anomalous storm tracks and temperature profiles. These anomalous storms are then assessed for their links with socio-economic impacts in the NWP region.
All three models consistently identified anomalous storms, a large proportion of which were associated with high levels of socio-economic impact. Examples include Storm 74, known for its devastating impact on Japan (The Brandon Sun 1964), and Storm 310, which swept through eastern China and South Korea, causing many deaths and injuries (Ji-hye 2020). These storms exhibited temperature and track patterns that contrasted with the rest of the data, enabling all three AD methods to identify them. Such unanimous findings by the models, combined with the literature, emphasise the critical role of storm track (Yoshida et al. 2017; Moon & Ha 2021) and temperature (Moon & Ha 2021; Basconcillo & Moon 2023) in anomalous storm classification.
While the models produced differing sets of anomalies, each showed distinct strengths. The k-NN model highlighted a unique cluster of high-impact storms, demonstrating its sensitivity to non-curving trajectories that the other models overlooked. This resonates with the work of Tamamadin et al. (2022), who also found high success rates in their k-NN storm track study. Similarly, the Isolation Forest pinpointed other impactful storms missed by the other models, showing its ability to identify unusual fluctuations in acceleration. This is consistent with Li et al. (2022), who reported similar success in a novel application of the Isolation Forest model.
Overall, this paper has highlighted the applicability of unsupervised AD methods for identifying anomalous storm tracks and temperature profiles. It also goes a step further, showing that in some cases the identified anomalies coincide with anomalous socio-economic impacts; however, whether there is a direct relationship between anomalous storms and socio-economic impact remains an open question.
ACKNOWLEDGEMENTS
The authors thank the friends and family who supported them throughout this work.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository at https://journals.ametsoc.org/view/journals/bams/91/3/2009bams2755_1.xml.
CONFLICT OF INTEREST
The authors declare there is no conflict.