ABSTRACT
Oceanographic phenomena are inherently complex due to factors such as global warming, ice melting, and human activities. Analyzing oceanographic data is therefore challenging, and quantitative approaches are typically used to address these complexities. Traditional machine learning techniques, such as regression and cluster analysis, are commonly employed for predicting oceanographic conditions and identifying similarities across different locations. This study explores the integration of qualitative approaches into oceanographic data analysis by developing hybrid models that combine topological data analysis (TDA) with traditional machine learning methods. The focus is on extracting qualitative features using persistent homology, the core tool of TDA, and incorporating these features into machine learning models such as support vector regression (SVR) and hierarchical clustering analysis (HCA). The study applies these methods to two oceanographic datasets: wave height data from buoy stations in Canada and sea level data from gauge stations in Japan. The results show that the hybrid models significantly outperform traditional approaches across several performance metrics, including the R2 score, mean absolute error, mean squared error, and Rand score. This demonstrates the value of adding topological information to machine learning models, which enhances their predictive performance.
HIGHLIGHTS
Oceanographic data’s non-linearity complicates traditional analysis.
This research introduces a hybrid model integrating topological features.
Wave height and sea level data from Canada and Japan are used to test the model’s efficacy.
The study develops a topology-based hierarchical clustering and support vector regression.
Results indicate these models outperform traditional methods.
INTRODUCTION
Oceanographic data play a pivotal role in elucidating the ocean’s contribution to global climate dynamics, monitoring marine ecosystem vitality, and enhancing navigational and disaster response strategies. Among the variables collected, sea level and wave heights are of paramount importance. Changes in the sea level, closely monitored for climate change indicators, impact coastal ecosystems and human settlements, guiding adaptive strategies in coastal management and infrastructure development. Accurate measurements of wave heights are crucial for maritime safety, optimizing shipping routes, and formulating responses to storm surges and tsunamis.
The field of oceanographic data analysis is rapidly advancing with the integration of sophisticated machine learning and deep learning techniques, significantly enhancing our understanding and management of marine environments (Thomson & Emery 2014). These technologies are pivotal in deciphering complex patterns from vast datasets. Notable examples include Kumar et al. (2018), who applied multiple linear regression to reconstruct sea levels in the western South Pacific using atmospheric and oceanic variables; they reported that the model captures sea level at 82 islands and can be used to generate local sea level projections by downscaling climate models. Similarly, Imani et al. (2014) forecasted sea levels of the Caspian Sea using support vector machines (SVMs) and gene expression programming, with the SVM demonstrating the best performance in predicting Caspian Sea level anomalies with a maximum coefficient of determination of 0.96. Song et al. (2021) and den Bieman et al. (2023) explored sea level simulations and operational wave forecasts, respectively, using advanced neural networks and hybrid models, both showing similar patterns of alternating sea level rise and fall. Jiang et al. (2024) investigated the dynamics of a semi-submersible floating offshore wind turbine model. Duan et al. (2016) developed a hybrid empirical mode decomposition support vector regression (SVR) model for nonlinear and non-stationary wave prediction. Cornejo-Bueno et al. (2016) proposed a method for wave height estimation based on an SVR algorithm. Jenkins et al. (2023) analyzed storm clustering around the UK coastline by examining the temporal and spatial characteristics of storm surges, wave heights, and high still sea level exceedances.
Previous studies have primarily analyzed or predicted oceanographic time series data using machine learning techniques, often overlooking qualitative features that could enhance performance. This study introduces a novel hybrid methodology that incorporates qualitative features extracted using tools from topological data analysis (TDA), notably persistent homology, which is instrumental in analyzing the geometry and topology of data. Persistent homology excels at capturing qualitative features, known as topological features, that persist across various scales, allowing for a simpler representation of complex data structures through topological frameworks that may not be evident in single-scale analyses (Otter et al. 2017; Favelier et al. 2018). By examining how these features evolve over different scales, persistent homology furnishes a more comprehensive understanding of a dataset's underlying topology. The ability of persistent homology to provide qualitative representations of data structures has attracted researchers' attention to the potential benefits of integrating these features with machine learning methods (Yihui & Chan 2005). TDA has been applied to various tasks, including curvature detection (Bubenik et al. 2020), entropy measurement (Atienza et al. 2019), rainfall pattern clustering (Bakar et al. 2020), and spatial data prediction (Pereira & de Mello 2015).
In this study, utilizing tools from TDA, we developed two specific applications: a topology-based Hierarchical Clustering Analysis (TDA-HCA) model for analyzing sea levels with data from Japanese gauge stations, and a topology-based Support Vector Regression (TDA-SVR) model for predicting wave heights using Canadian buoy data. The decision to use Canada’s buoy data for the TDA-SVR model, rather than Japan’s sea level data, stems from the availability of related atmospheric and environmental variables in Canada’s dataset, which are essential for developing accurate predictive models. Japan’s tide stations, on the other hand, only provide daily sea level measurements, making their data more suitable for clustering analysis (TDA-HCA). To the best of our knowledge, this represents the inaugural attempt to blend machine learning techniques with topological perspectives in the study of oceanographic data, signifying a substantial innovation within this field. The following are features and advantages of this study:
Our results demonstrate that TDA-SVR and TDA-HCA models substantially outperform traditional SVR and HCA models, setting new benchmarks in oceanographic data analysis.
The proposed model is versatile and applicable to various time series oceanographic datasets, including those related to salinity, currents, temperatures, sea levels, and wave heights, due to the ability of topological features to represent different structures according to the behavior of each variable.
The hierarchical clustering model with topological features is proficient in recognizing coastal areas with similar patterns in mean sea levels or wave heights and in analyzing fluctuations in other oceanographic parameters. This expanded capability allows us to explore regional responses to global changes such as rising temperatures or shifts in salinity, which are often associated with global warming and the melting of polar ice (Thompson & Merrifield 2014).
DATA
Wave height data
VCAR: Characteristic significant wave height (calculated by MEDS) (m)
VTPK: Wave spectrum peak period (calculated by MEDS) (s)
WDIR: Direction from which the wind is blowing (° True)
GSPD: Gust wind speed (m/s)
ATMS: Atmospheric pressure at sea level (mbar)
DRYT: Dry bulb temperature (°C)
SSTP: Sea surface temperature (°C)
The location, coastal region, water depth, and anemometer height for Canadian buoy stations
| Station code | Latitude | Longitude | Coastal region | Water depth | Anem. height |
|---|---|---|---|---|---|
| c44139 | 44.240N | 57.100W | Banquereau Bank | 1,500 m | 5 m above |
| c46207 | 50.252N | 129.550W | East Dellwood Knolls | 2,215 m | 5 m above |
A map showing the locations of Canada’s buoy stations in the domain of interest. The locations are indicated by purple dots with green codes.
The distributions of features (from top to bottom: VCAR, VTPK, WDIR, GSPD, ATMS, DRYT, SSTP) from January 1, 2015, to December 31, 2015, collected at station c44139. The x-axis represents the feature values, while the y-axis denotes the frequency of each feature.
The distributions of features (from top to bottom: VCAR, VTPK, WDIR, GSPD, ATMS, DRYT, SSTP) from January 1, 2015, to December 31, 2015, collected at station c46207. The x-axis represents the feature values, while the y-axis denotes the frequency of each feature.
Sea level data
The names, coastal locations, latitude, and longitude of twelve different tide gauge stations in Japan. The table also displays some statistical summaries. Except for Std., summaries are presented in millimeters
| Location | Lat. and Long. | Coastal region | Mean | Median | Std. | Max/Min |
|---|---|---|---|---|---|---|
| Akune | (32.02, 130.19) | East China Sea | 7,059.46 | 7,047.50 | 134.61 | 7,367/6,795 |
| Sakai | (35.55, 133.24) | Japan Sea | 7,111.58 | 7,111.00 | 142.37 | 7,479/6,810 |
| Toyama | (36.76, 137.22) | Japan Sea | 7,071.14 | 7,079.00 | 125.45 | 7,373/6,775 |
| Oga | (39.94, 139.71) | Japan Sea | 6,864.34 | 6,862.50 | 103.29 | 7,373/6,592 |
| Wakkanai | (45.41, 141.69) | Japan Sea | 7,099.73 | 6,862.50 | 82.26 | 7,312/6,846 |
| Abashiri | (44.01, 144.29) | Sea of Okhotsk | 7,065.50 | 7,067.00 | 68.45 | 7,266/6,839 |
| Kushiro | (42.98, 144.38) | Pacific Ocean | 7,243.21 | 7,254.00 | 75.47 | 7,395/6,945 |
| Hakodate | (41.78, 140.82) | Pacific Ocean | 6,928.59 | 6,930.00 | 84.93 | 7,118/6,692 |
| Miyako | (39.64, 141.98) | Pacific Ocean | 7,120.31 | 7,133.00 | 84.94 | 7,307/6,875 |
| Gyoko | (35.74, 140.86) | Pacific Ocean | 7,057.76 | 7,055.50 | 91.37 | 7,296/6,843 |
| Owase | (30.08, 136.21) | Pacific Ocean | 6,864.88 | 6,869.00 | 99.18 | 7,163/6,646 |
| Shimizu | (32.78, 132.96) | Pacific Ocean | 7,012.16 | 7,000.00 | 116.15 | 7,294/6,727 |
Time series MMSLs for Japan’s studied gauge stations. The x-axis represents the focused time period, while the y-axis represents the sea level height in millimeters (mm).
A map showing the locations of Japan’s tide stations in the domain of interest. The locations are indicated by purple dots with green names.
Some oceanic data context
In the implementation of clustering tasks within oceanographic data, the incorporation of prior information regarding the contextual environment is indispensable. In this subsection, we introduce data contexts that are important for initial clustering hypotheses and discussions.
Zone 1a: Covers the northernmost part of Japan. This region likely experiences a cold climate, possibly subarctic.
Zone 1b: This region is likely to have a cooler temperate climate.
Zone 1c: This inland region might have a continental climate with significant temperature variations between seasons.
Zone 2: Located in the southern part of Japan. It’s colored red, indicating a potentially warmer climate zone, possibly subtropical.
Zone 3: This orange area suggests a transitional climate zone, likely temperate but warmer than the northern zones.
Zones 4, 5d, and 5c: These areas are likely to have a temperate climate, with 5c and 5d lying closer to the central region of Zone 1b.
Zones 5a and 5b: These regions probably experience oceanic influences, with mild temperatures and relatively high precipitation.
Zones 6a and 6b: These represent a subtropical climate with warm temperatures year-round.
Japan’s climate zones, as described in Yuan et al. (2017), are represented with different colors and numbers to indicate the distinct climate zones across various regions.
METHODS
Takens’ embedding theorem
Takens’ embedding theorem states that the dynamics of a system observed through a single time series $\{x_t\}$ can be reconstructed, up to a diffeomorphism, by forming delay vectors
$$\mathbf{v}_t = \left(x_t,\; x_{t+\tau},\; \ldots,\; x_{t+(m-1)\tau}\right),$$
where $m$ is the embedding dimension and $\tau$ is the time delay. Applying this map to a univariate oceanographic series converts it into a point cloud in $\mathbb{R}^m$ (its phase space representation), on which topological features can then be computed.
Transformation of a time series into its phase space representation. (Top) A time series containing observations of values over 24 h on July 21, 2015. (Bottom) The phase space representation of the time series, constructed using the embedding theorem with the selected embedding dimension and time delay.
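The embedding step can be sketched directly in Python. The function below is a minimal illustration of the delay-vector construction, assuming a synthetic 24-hour series and placeholder values for the embedding dimension and time delay (the study's exact settings are not reproduced here).

```python
import numpy as np

def takens_embedding(x, dimension=3, time_delay=1):
    """Map a 1-D series to delay vectors (x[t], x[t+tau], ..., x[t+(m-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (dimension - 1) * time_delay
    return np.stack(
        [x[i * time_delay : i * time_delay + n_vectors] for i in range(dimension)],
        axis=1,
    )

# Synthetic stand-in for one day (24 h) of hourly observations.
hourly = np.sin(np.linspace(0, 4 * np.pi, 24))
point_cloud = takens_embedding(hourly, dimension=3, time_delay=1)  # shape (22, 3)
```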
Topological data analysis
Persistent homology







For any set in a complex, its subsets are also contained in that complex;
Any intersection of two sets in a complex is either empty or another set contained in that complex.









A nested sequence of simplicial complexes forming a filtration. The figure is sourced from Wasserman (2018).
A filtration process can effectively capture the structure of datasets by tracking the emergence and disappearance of holes or voids. Here, by ‘holes or voids,’ we refer to cycles that do not serve as boundaries. To illustrate, consider K2 in Figure 8, where three 1-simplices form a cycle. This cycle acts as the boundary of a 2-simplex in K3. In K2, however, these 1-simplices do not constitute the boundary of any simplex (since the 2-simplex, indicated by the yellow triangle in K3, does not exist in K2). Thus, they form a non-boundary cycle that emerges in K2 and becomes a boundary in K3. Consequently, the hole delineated by these 1-simplices is born at stage 2 and dies at stage 3.
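In code, the filtration and the resulting birth–death pairs can be computed with giotto-tda, the TDA library named later in the experiments. The sketch below is illustrative: the point cloud is a synthetic placeholder, and the choice of homology dimensions 0–2 mirrors the discussion in the subsection on oceanographic relevance.

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence

# Synthetic stand-in for an embedded daily series (e.g. the output of a Takens embedding).
point_cloud = np.random.default_rng(0).normal(size=(22, 3))

# Build the Vietoris-Rips filtration and track connected components, loops and voids.
vr = VietorisRipsPersistence(homology_dimensions=(0, 1, 2))
diagrams = vr.fit_transform(point_cloud[None, :, :])  # input shape: (n_clouds, n_points, dim)

# Each row of diagrams[0] is a (birth, death, homology_dimension) triple.
print(diagrams.shape)
```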
Persistent diagram

















A persistent diagram for one-dimensional voids (denoted by dots) and two-dimensional holes (denoted by an open square) of the complex filtration shown in Figure 8. The figure is sourced from Wasserman (2018).
Persistent landscape




(a) A persistence diagram and (b) its corresponding persistent landscape.
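A possible implementation of the landscape construction, again with giotto-tda, is sketched below; reducing each first-layer landscape to a single scalar anticipates the per-day topological features used later for the TDA-SVR models. The diagrams, the number of layers, and the number of bins are illustrative assumptions.

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PersistenceLandscape

# Five synthetic point clouds standing in for five embedded daily series.
clouds = np.random.default_rng(1).normal(size=(5, 22, 3))
diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1, 2)).fit_transform(clouds)

# First-layer persistence landscapes sampled on a fixed grid.
landscapes = PersistenceLandscape(n_layers=1, n_bins=100).fit_transform(diagrams)

# Collapse each day's landscape values into a single scalar summary.
daily_scalar = landscapes.reshape(len(clouds), -1).mean(axis=1)
print(daily_scalar)
```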






Bottleneck distance
Given two persistence diagrams $D_1$ and $D_2$, their dissimilarity can be measured by the bottleneck distance,
$$W_{\infty}(D_1, D_2) = \inf_{\gamma} \sup_{x \in D_1} \lVert x - \gamma(x) \rVert_{\infty}, \quad (4)$$
where the infimum ranges over all bijections $\gamma$ between the two diagrams (points may also be matched to the diagonal). Intuitively, the bottleneck distance is the smallest maximum displacement required to match the birth–death points of one diagram with those of the other; it is the metric used later to compare the persistence diagrams of different stations.
The illustration demonstrates the filtration process in persistent homology applied to the Vietoris–Rips complex. Each point is associated with a sphere of the same size, and the sphere’s radius serves as the filtration parameter. As the radius increases, a sequence of nested simplicial complexes is formed, generating the persistent homology.
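The bottleneck distances that later feed the clustering step can be assembled into a matrix with giotto-tda's PairwiseDistance; in this sketch the twelve random point clouds merely stand in for the embedded station series, so the numbers carry no physical meaning.

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PairwiseDistance

rng = np.random.default_rng(2)
clouds = rng.normal(size=(12, 60, 3))   # placeholder: one embedded series per station

diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1)).fit_transform(clouds)

# Pairwise bottleneck distances between the persistence diagrams
# (aggregated over the homology dimensions by the transformer).
D = PairwiseDistance(metric='bottleneck').fit_transform(diagrams)
print(D.shape)
```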
Relevance to oceanographic data

The process of generating topological features from oceanographic data. (Top left) Time series of VTPK from 1:00 AM on October 21, 2016, to 12:00 AM on October 22, 2016. (Top right) The point cloud generated using Takens’ embedding theorem. (Bottom right) PD generated from the point cloud. (Bottom left) PL generated from the PD.
PH can capture the underlying dynamic scaling processes inherent in time series oceanographic data. PH, along with PD and PL, can be conceptualized as layers of an encoder. Collectively, they extract features that are often imperceptible to conventional statistical methods, revealing complex relationships between observed time series variables, such as VTPK or VCAR, and latent variables that are never observed. Furthermore, these features encapsulate interactions between the same observed variables across different time steps, uncovering time dependencies and hidden structures within the temporal evolution of the system. The machine learning models trained with topological features can be viewed as the decoder, learning a function that more effectively interprets the relationship between the targets, the data, and the hidden structures within the data. Moreover, based on the observation that higher-dimensional holes (above the third dimension) are rarely present, we selected a maximal homology dimension for the topology, starting from 0 and ending at 3, where the 0-dimensional, 1-dimensional, and 2-dimensional representations correspond to the topological features of connected components, holes, and voids, respectively.
(a) 24-h records from January 1, 2015: The time series data displayed on the left shows relatively stable wave height measurements throughout the day, except for one extreme spike occurring in the final hour. This stability is reflected in the corresponding embedding plot in the center, where the data points are densely clustered with minimal spread. The persistence diagram on the right indicates that the data exhibits no significant topological features beyond dimension 1 (no higher-dimensional ‘holes’ or ‘voids’). The diagram shows some short-lived 0-dimensional features (connected components), but no long-lasting 1-dimensional features (loops) or 2-dimensional voids, reflecting the general stability of the data except for the final spike.
(b) 24-h records from May 7, 2015: In contrast, the time series data for May 7, 2015, shown on the bottom, exhibits significant fluctuations and oscillations throughout the 24-h period. These variations are captured in the embedding plot, where the data points are more scattered, showing more complex structures. The corresponding persistence diagram on the right reveals the presence of both 1-dimensional and 2-dimensional topological features (holes and voids), which are represented as long-lasting points in the diagram. These features reflect the more complex and fluctuating nature of the time series for this day.
Processes illustrating how variations in wave height lead to different topological representations in persistence diagrams. (a) 24-h records from January 1, 2015: The time series is relatively stable, with one extreme value occurring in the final hour, resulting in no higher-dimensional holes (dimension ≥ 1) in the persistence diagram. (b) 24-h records from May 7, 2015: The time series shows notable fluctuations, resulting in the presence of 2-dimensional holes in the persistence diagram.
Machine learning methods
Support vector regression







Hierarchical clustering

















(a) Dissimilarity distance calculated using example data. (b) A dendrogram showing clusters. The figure is sourced from Zulkepli et al. (2020).
In the case of time series data, the distance or dissimilarity matrix can be constructed using Dynamic Time Warping (DTW). Specifically, DTW was used to directly measure the dissimilarity among the time series from the various tide gauge stations, yielding a dissimilarity matrix. For two sequences $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_m)$, DTW constructs a cost matrix whose element $(i, j)$ is the distance between the $i$-th element of $X$ and the $j$-th element of $Y$; the goal is to identify a warping path through this matrix that defines an optimal alignment between the two sequences, and the accumulated cost along this path quantifies the dissimilarity between the two time series (a small sketch follows below). The merging of clusters is then determined by the linkage criterion. Common linkage criteria include single linkage,
$$d(A, B) = \min_{a \in A,\, b \in B} d(a, b), \quad (11)$$
complete linkage,
$$d(A, B) = \max_{a \in A,\, b \in B} d(a, b), \quad (12)$$
and average linkage,
$$d(A, B) = \frac{1}{\lvert A \rvert\,\lvert B \rvert} \sum_{a \in A} \sum_{b \in B} d(a, b). \quad (13)$$
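The sketch below shows a plain NumPy version of the DTW dissimilarity matrix described above; the station series are synthetic placeholders, and an optimized DTW implementation would normally replace this quadratic double loop.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

rng = np.random.default_rng(0)
series = [rng.normal(size=120) for _ in range(12)]   # placeholder monthly series, one per station
D = np.array([[dtw_distance(s, t) for t in series] for s in series])  # 12 x 12 dissimilarity matrix
```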
Workflow
Data Preprocessing: Allocate a certain percentage of the collected data for training and the remainder for testing.
Reconstruction of Time Series Data: Transform time series data into formats suitable for shape or topological analysis. Various methods can be employed for this purpose, including Dynamic Time Warping and Takens’ Embedding Theorem, which is detailed in Section 3.1.
Persistent Diagram: Construct the complex filtration and persistent diagram to analyze the topological structure of the reconstructed data, as described in Section 3.2. Put simply, a persistent diagram is a two-dimensional Cartesian coordinate system populated with birth–death points $(b_i, d_i)$ for $i = 1, \ldots, n$.
Topological Feature Generation: Extract key topological features from persistent diagrams using methods such as Persistent Landscapes, Bottleneck Distance, or other techniques.
Model Training and Evaluation: Train the selected machine learning models using the extracted topological features. Next, choose suitable information criteria and compare the performance of the topology-based models against traditional models as benchmarks.



EXPERIMENT AND RESULTS
Traditional SVR and TDA-SVR
Our objective in building TDA-SVR models is to predict the daily mean VCAR using six parameters: VTPK, WDIR, GSPD, ATMS, SSTP, and DRYT, as detailed in Section 2. Each hourly observation thus includes (VTPK, WDIR, GSPD, ATMS, DRYT, SSTP, VCAR), where the first six entries are the input variables and the last, VCAR, is the output variable.
We divided the hourly observation data, collected from January 1, 2015, to December 31, 2018, into training and testing datasets, with 85% for training and 15% for testing. The models were trained on the training dataset, and predictions were made on the testing dataset. We developed two categories of models, as detailed below:
- Traditional SVR models: The traditional SVR models include those listed from Equations (6) to (10). Before training the models, we processed the data in several steps to ensure it was ready for analysis. First, we took the hourly observations of each variable over the given period and computed the daily mean for each day. To ensure that the variables were on a similar scale and to prevent any one variable from dominating the model, we normalized the data using $z = (x - \bar{x})/\sigma_x$, where $x$ represents a variable, $\bar{x}$ denotes its mean, and $\sigma_x$ its standard deviation. This transformation standardizes the data so that each variable has a mean of 0 and a standard deviation of 1. For example, each observation of the feature VTPK is normalized against the mean and standard deviation of all VTPK observations during the training period. We then removed observations with extreme absolute Z-scores; this Z-score method is commonly used for outlier removal (Aggarwal et al. 2019). We filled short continuous periods (less than 5 days) of missing data using the average of the previous seven days, while excluding long continuous periods of missing data; the intuition is that more recent values have a greater influence on the current value than those further back in time. Then, we trained the models and made predictions. The same normalization and outlier removal were also applied before training the TDA-SVR models (a sketch of these steps appears after this list).
TDA-SVR models: The TDA-SVR models also include those listed from Equations (6) to (10). Before training, we performed a correlation analysis to identify the input variables most strongly correlated with VCAR, which in our study were WDIR and VTPK. For both WDIR and VTPK, we first embedded the time series of 24 hourly observations for each day of the focused period into a data point cloud, using the selected embedding dimension and time delay; an example is shown in Figure 7. Then, we constructed the Vietoris–Rips complex and PH for that point cloud, extracting the PD containing birth–death pairs of one-, two-, and three-dimensional holes. Subsequently, we built the first-layer landscape for holes of each dimension and averaged them to obtain a scalar for each day of the focused period, for both WDIR and VTPK. In other words, through this process we obtained two new time series of topological representations: one constructed from WDIR and the other from VTPK. The major difference between the TDA-SVR models and the traditional SVR models is that the TDA-SVR models were trained on the six variables used by the traditional models plus the two topological feature variables derived from WDIR and VTPK. The other steps for the TDA-SVR models – normalization, outlier handling, and missing data handling – are the same as those for the traditional models. This process is illustrated in Figure 15 and sketched in code below.
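The following sketch strings the two lists above together: standardizing the daily means, dropping extreme observations, building the two topological feature series from the hourly WDIR and VTPK records, and fitting an SVR. All arrays are synthetic, and the 3-sigma cutoff, embedding parameters, and gamma value are assumptions used purely for illustration.

```python
import numpy as np
import pandas as pd
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PersistenceLandscape
from sklearn.svm import SVR

def daily_topological_feature(hourly_matrix, dimension=3, time_delay=1):
    """hourly_matrix: (n_days, 24) hourly values of one variable (e.g. VTPK or WDIR)."""
    n_vec = 24 - (dimension - 1) * time_delay
    clouds = np.stack([
        np.stack([day[i * time_delay : i * time_delay + n_vec] for i in range(dimension)], axis=1)
        for day in hourly_matrix
    ])
    diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1, 2)).fit_transform(clouds)
    landscapes = PersistenceLandscape(n_layers=1).fit_transform(diagrams)
    return landscapes.reshape(len(hourly_matrix), -1).mean(axis=1)  # one scalar per day

rng = np.random.default_rng(0)
n_days = 200
daily = pd.DataFrame(rng.normal(size=(n_days, 7)),
                     columns=["VTPK", "WDIR", "GSPD", "ATMS", "DRYT", "SSTP", "VCAR"])

# Standardize each variable, then drop days with extreme Z-scores (|z| > 3 is assumed here).
z = (daily - daily.mean()) / daily.std()
clean = z[(z.abs() <= 3).all(axis=1)]

# Topological feature series built from the raw hourly records of VTPK and WDIR.
hourly_vtpk = rng.normal(size=(n_days, 24))
hourly_wdir = rng.normal(size=(n_days, 24))
tda_vtpk = daily_topological_feature(hourly_vtpk)[clean.index]
tda_wdir = daily_topological_feature(hourly_wdir)[clean.index]

# Six standardized daily predictors plus the two topological features, as in the TDA-SVR setup.
X = np.column_stack([clean.drop(columns="VCAR").to_numpy(), tda_vtpk, tda_wdir])
y = clean["VCAR"].to_numpy()
model = SVR(kernel="rbf", gamma=0.1).fit(X, y)   # illustrative gamma (0.1 appears in the tables below)
```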
Performance metrics
Model performance was evaluated using the coefficient of determination ($R^2$), the mean absolute error (MAE), and the mean squared error (MSE):
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_i \lvert y_i - \hat{y}_i \rvert, \qquad \mathrm{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2,$$
where $y_i$ are the observed values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the observations. As a rough guide to interpreting $R^2$:
Low values: The model has limited predictive power.
Intermediate values: The model has moderate predictive power.
Values close to 1: The model has high predictive power.
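For reference, the three regression metrics reported in the comparison tables that follow can be computed directly with scikit-learn; the two short arrays below are placeholders for observed and predicted wave heights.

```python
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_true = [1.8, 2.1, 2.6, 3.0]   # placeholder observed daily mean VCAR values
y_pred = [1.7, 2.2, 2.4, 3.1]   # placeholder model predictions

print("R2 :", r2_score(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
```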
Comparative results of traditional SVR and TDA-SVR
Model framework
In this section, we present the results of the models constructed using APIs from sklearn and giotto-tda. For RBF models, the kernel coefficient (γ) was searched over a fine-grained, step-wise range of values to ensure a thorough exploration of both lower and higher values of γ, which are crucial in controlling the influence of individual data points: smaller values lead to a smoother decision boundary, while larger values make the boundary more complex. For EI and SEI models, the hyperparameter governing whether the intercept is fitted was included in the search to account for potential shifts in the data that may require an intercept for better model performance; this ensures that models are evaluated both with and without the intercept to find the optimal configuration. For Poly models, the kernel coefficient (γ) was searched over a similar range, while the degree of the polynomial function was searched between 2 and 5. This range captures both simpler models (degree 2) and more complex ones (degree 5), providing flexibility to model non-linear patterns in the data without overfitting. The hyperparameter selection was performed using 2-fold cross-validation on the training data.
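A sketch of this hyperparameter search with scikit-learn's GridSearchCV and 2-fold cross-validation is shown below; the gamma grid, the polynomial degrees, and the training arrays are assumed placeholders, since the exact search ranges are not reproduced in the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 8)), rng.normal(size=80)   # placeholder training data

param_grids = {
    "rbf":  {"kernel": ["rbf"], "gamma": list(np.arange(0.01, 1.01, 0.01))},
    "poly": {"kernel": ["poly"], "gamma": list(np.arange(0.1, 1.1, 0.1)), "degree": [2, 3, 4, 5]},
}

for name, grid in param_grids.items():
    search = GridSearchCV(SVR(), grid, cv=2, scoring="r2")
    search.fit(X_train, y_train)
    print(name, search.best_params_)
```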
Comparison of performance
SVR-RBF (Station c44139): The TDA-SVR-RBF model achieved an R2 score of 0.8851, significantly outperforming the traditional SVR-RBF model, which had an R2 score of 0.7221. This represents an improvement of approximately 22.6%. Additionally, the TDA model reduced the MAE from 2.1179 to 1.4152 and the MSE from 0.3386 to 0.1569, demonstrating improved prediction accuracy.
Linear Model (Station c44139): The TDA-linear model achieved an R2 score of 0.8238, compared to the traditional linear model’s R2 score of 0.6738, marking an improvement of 22.3%. The TDA model also reduced the MAE from 1.7376 to 1.3510 and the MSE from 0.3675 to 0.2193, indicating better performance.
SVR-RBF (Station c46207): The TDA-SVR-RBF model achieved an R2 score of 0.8593, compared to the traditional SVR-RBF model’s R2 score of 0.6750, representing a 27.3% improvement. The TDA model lowered the MAE from 2.8222 to 2.2372 and the MSE from 0.4506 to 0.2391.
Linear Model (Station c46207): The TDA-linear model achieved an R2 score of 0.6939, compared to the traditional model’s R2 score of 0.5811, representing an improvement of 19.4%. The TDA model also slightly improved the MAE, lowering it from 2.6660 to 2.5951.
Figures illustrating the performance of TDA-SVR models in predicting data from station c46207. Blue lines represent the actual wave heights, while red lines depict the model’s predictions. The type of model is specified in the title of each subfigure at the top right: (a) TDA-RBF and RBF models, (b) TDA-Linear and Linear models, (c) TDA-EI and EI models, (d) TDA-SEI and SEI models, (e) TDA-Poly and Poly models. The x-axis represents the prediction time period, and the y-axis shows the wave height in meters.
Figures illustrating the performance of TDA-SVR models in predicting data from station c44139. Blue lines represent the actual wave heights, while red lines depict the model’s predictions. The type of model is specified in the title of each subfigure at the top right: (a) TDA-RBF and RBF models, (b) TDA-Linear and Linear models, (c) TDA-EI and EI models, (d) TDA-SEI and SEI models, (e) TDA-Poly and Poly models. The x-axis represents the prediction time period, and the y-axis shows the wave height in meters.
Comparison of model performance for data from station c44139. This table presents the performance metrics of traditional models (denoted as ’Trad’) and TDA-based models (denoted as ’TDA’) across different model types: RBF, Linear, EI, SEI, and Poly. The metrics include R2 scores, Mean Absolute Error (MAE), and Mean Squared Error (MSE). The ’Params’ column specifies the hyperparameter settings used when these performance metrics were obtained. For RBF and Poly models, the kernel coefficient (γ) was tuned, while for EI and SEI models, the decision to fit the intercept was explored
| Model type | Type | R2 scores | MAE | MSE | Params |
|---|---|---|---|---|---|
| RBF | Trad | 0.7221 | 2.1179 | 0.3386 | γ: 0.1 |
|  | TDA | 0.8851 | 1.4152 | 0.1569 | γ: 0.1 |
| Linear | Trad | 0.6738 | 1.7376 | 0.3675 |  |
|  | TDA | 0.8238 | 1.3510 | 0.2193 |  |
| EI | Trad | 0.6633 | 1.7913 | 0.3678 | fit_intercept: True |
|  | TDA | 0.8230 | 1.3557 | 0.2209 | fit_intercept: True |
| SEI | Trad | 0.6683 | 1.7962 | 0.3752 | fit_intercept: True |
|  | TDA | 0.8269 | 1.3178 | 0.2159 | fit_intercept: True |
| Poly | Trad | 0.5517 | 2.3457 | 0.5047 | d: 3, γ: 1.0 |
|  | TDA | 0.6491 | 2.3386 | 0.4023 | d: 3, γ: 1.0 |
The bold values indicate the performances of TDA models.
Comparison of model performance for data from station c46207. This table presents the performance metrics of traditional models (denoted as ‘Trad’) and TDA-based models (denoted as ‘TDA’) across different model types: RBF, Linear, EI, SEI, and Poly. The metrics include R2 scores, Mean Absolute Error (MAE), and Mean Squared Error (MSE). The ’Params’ column specifies the hyperparameter settings used when these performance metrics were obtained. For RBF and Poly models, the kernel coefficient (γ) was tuned, while for EI and SEI models, the decision to fit the intercept was explored
| Model type | Type | R2 scores | MAE | MSE | Params |
|---|---|---|---|---|---|
| RBF | Trad | 0.6750 | 2.8222 | 0.4506 | γ: 0.07 |
|  | TDA | 0.8593 | 2.2372 | 0.2391 | γ: 0.07 |
| Linear | Trad | 0.5811 | 2.6660 | 0.6322 |  |
|  | TDA | 0.6939 | 2.5951 | 0.5710 |  |
| EI | Trad | 0.5785 | 2.7022 | 0.5285 | fit_intercept: True |
|  | TDA | 0.6820 | 2.4067 | 0.5015 | fit_intercept: True |
| SEI | Trad | 0.5671 | 2.6653 | 0.5171 | fit_intercept: True |
|  | TDA | 0.6677 | 2.4306 | 0.5240 | fit_intercept: True |
| Poly | Trad | 0.4484 | 3.2976 | 0.7437 | d: 3, γ: 1.0 |
|  | TDA | 0.5883 | 3.4299 | 0.5883 | d: 3, γ: 1.0 |
The bold values indicate the performances of TDA models.
Sensitivity analysis
Performance of the TDA-SVR model with an RBF kernel as a function of the RBF kernel coefficient (γ).
Traditional HCA and TDA-HCA
In the initial phases of our analysis, we engaged with data from twelve distinct gauge stations: Akune, Sakai, Toyama, Oga, Wakkanai, Abashiri, Kushiro, Hakodate, Miyako, Gyoko, Owase, and Shimizu. Our initial hypothesis groups Akune, Sakai, and Toyama into one category, while Miyako, Gyoko, Owase, and Shimizu form a second group. All other stations are categorized into a third group. The geographical positions of these stations and their respective temperature zones are illustrated in Section 2. We analyzed MMSL time series from these locations, presented in Figure 4, and subsequently transformed these time series into three-dimensional point clouds within Euclidean space for further examination. The selection of an appropriate time delay, critical for our analysis, involved identifying the optimal range between 3 and 12 months to encompass seasonal to annual variations. For each dataset, we generated PDs, a step pivotal for constructing the dissimilarity matrix essential for training HCAs. The dissimilarity matrix quantifies the pairwise distances between the PDs, with ‘distance’ specifically measured by the bottleneck distance as defined in Equation (4). This process underpins our methodological approach, allowing us to discern complex patterns in the data indicative of underlying environmental processes.
Subsequently, we utilized the AgglomerativeClustering function from sklearn, taking advantage of our pre-existing dissimilarity matrix derived from the bottleneck distance. For this purpose, we configured the ’metric’ parameter to ’precomputed’ and set the ’threshold’ parameter to zero, indicating our reliance on the direct distances provided by our matrix. Following this, we applied GridSearchCV to systematically evaluate and identify the optimal model performance across various linkage methods. These methods included Single Linkage (as detailed in Equation (11)), Complete Linkage (Equation (12)), and Average Linkage (Equation (13)), each representing distinct strategies for measuring distances between clusters of data points.
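The clustering step might be expressed as follows, with D standing for the precomputed bottleneck (or DTW) dissimilarity matrix and a three-cluster cut mirroring the initial grouping hypothesis; note that recent scikit-learn releases use the metric='precomputed' argument (older ones used affinity), and the matrix here is a random placeholder.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Placeholder symmetric dissimilarity matrix for the twelve stations.
rng = np.random.default_rng(3)
D = rng.uniform(0.0, 1.0, size=(12, 12))
D = (D + D.T) / 2.0
np.fill_diagonal(D, 0.0)

for linkage in ("single", "complete", "average"):
    labels = AgglomerativeClustering(
        n_clusters=3, metric="precomputed", linkage=linkage
    ).fit_predict(D)
    print(linkage, labels)
```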
Performance metrics
To compare a clustering result against a reference grouping, we used the Rand index. For two clusterings of the same $n$ elements, the Rand index is defined as
$$R = \frac{a + b}{\binom{n}{2}},$$
where $a$ represents the count of element pairs that are grouped together in both clusterings, and $b$ signifies the count of element pairs that are placed in separate groups in both clusterings. As a rough guide:
Values close to 0: The clusterings have a low degree of similarity.
Intermediate values: The clusterings are about as similar as would be expected by random chance.
Values close to 1: The clusterings are similar, with a high degree of agreement between them.
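The Rand index itself is available in scikit-learn. In the sketch below, the first label vector encodes the initial three-group hypothesis over the twelve stations (in the order listed in the data section), and the second is a placeholder clustering output.

```python
from sklearn.metrics import rand_score

# Hypothesis: {Akune, Sakai, Toyama}, {Miyako, Gyoko, Owase, Shimizu}, and the remaining stations.
hypothesis = [0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 1]
predicted  = [0, 0, 0, 2, 2, 1, 2, 2, 1, 1, 1, 1]   # placeholder model labels
print(rand_score(hypothesis, predicted))
```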
Comparative results of traditional HCA and TDA-HCA



Dendrograms generated for different linkage methods in the traditional HCA model for sea level data from January 1989 to December 2018. (a) Single linkage: Clusters are formed by merging the closest points between two clusters. (b) Complete linkage: Clusters are merged based on the farthest distance between points. (c) Average linkage: Clusters are formed using the average distance between all points in two clusters.
Figure 20(a): The dendrogram was constructed by single linkage. The first clusters that form include Kushiro with Akune, and then Hakodate joins them. Notable is the gradual, step-wise fashion in which locations join to form clusters. For instance, Sakai and Abashiri form a pair, which then joins with the Kushiro-Akune-Hakodate cluster. This illustrates the chaining effect typical of single linkage clustering, where locations can join a cluster due to a single close neighbor. Further up the scale, we see Oga joining with Wakkanai, followed by Miyako, which suggests they are progressively less similar to the main cluster they join. The distance at which they join is indicative of the level of dissimilarity. Towards the end, Toyama, Owase, Shimizu, and Gyoko join at higher dissimilarity levels, indicating that these locations are the least similar to the initial cluster according to the metric used.
Figure 20(b): The dendrogram is constructed by complete linkage. Wakkanai and Hakodate are the first to merge into a cluster, suggesting they are relatively similar or close to each other based on the chosen dissimilarity metric. The next clusters that form include Oga with Akune, and Sakai with Abashiri, which then join together before merging with the Wakkanai-Hakodate cluster. Separately, Gyoko, Miyako, and Toyama form a cluster, which suggests these three locations are similar to each other according to the data. Kushiro, Shimizu, and Owase form another distinct cluster which joins the larger cluster at a higher level of dissimilarity.
Figure 20(c): In the case of average linkage, the dendrogram shows that Wakkanai, Akune, and Kushiro form a tightly-knit cluster early on, suggesting they share stronger similarities with each other than with other locations. Oga joins the aforementioned cluster at a slightly higher level of dissimilarity, followed by Toyama and Abashiri, which indicates these locations are somewhat less similar to the initial group. Hakodate and Miyako join at a still higher dissimilarity level. Sakai is the last location to be merged into the larger cluster.
Dendrograms generated for different linkage methods in the TDA-HCA model for sea level data from January 1989 to December 2018. (a) Single linkage: The dendrogram shows clusters formed by connecting the closest pair of points between two clusters, resulting in elongated and less compact clusters. (b) Complete linkage: In this dendrogram, clusters are formed based on the maximum distance between points in two clusters, leading to more compact and balanced clusters. (c) Average linkage: The dendrogram illustrates clusters formed by averaging the distances between all pairs of points in two clusters.
Rand index evaluation across scaled distance thresholds for (a) TDA-HCA and (b) traditional HCA models.
To assess the performance of the HCA models, we benchmarked against traditional techniques that combine DTW and hierarchical clustering, which are widely used for clustering time series data. The DTW-based pre-computed matrix was used as the input for the traditional HCA models. We investigated various linkage methods and evaluated the models' performance by adjusting the distance thresholds. Further details and comparative results are illustrated in Figure 19. For a more transparent comparison, we normalized the distance thresholds by dividing each value by 80. The results demonstrate that topology-based HCA models surpass their traditional counterparts in terms of Rand score performance.
DISCUSSIONS
The study of wave height by TDA-SVR
The first important observation from the results in Section 4.1.2 is that models developed using data from station c44139 generally perform better than those using data from station c46207. This discrepancy can be attributed to the higher number of outliers in the data from station c46207, as shown in Figures 2 and 3 (WDIR and GSPD). These outliers contaminate the results and are removed before training and prediction, which is why the test periods for the two stations differ slightly. Despite this, the TDA-SVR models outperform the traditional models. This improvement can be intuitively explained by the construction of new time series features (topological features), which augment the data and thereby provide more extracted information to the models. We also observed that the Linear, EI, and SEI SVR models exhibit similar prediction approximations to a certain degree, as shown in Figures 16 and 17. This similarity can be attributed to the linear nature of these models.
Predicting sea level changes with high accuracy remains challenging due to the myriad of interconnected variables involved, including atmospheric and oceanic temperatures, ice sheet dynamics, and glacial melt rates (Nicholls 2011). These variables interact in complex and often unpredictable ways, making them difficult to model accurately. Traditional machine learning techniques, which predict based on oceanographic data, frequently struggle to assign appropriate weights to parameters that interact in unexpected ways: models with too few parameters tend to be inaccurate, while those with too many are prone to overfitting, and achieving the right balance is challenging. The topology-based SVR enhances traditional models by utilizing a machine learning framework designed to manage large datasets and discern complex patterns more effectively. It contributes to the field of oceanography in several ways:
Buoy c46207 is located in the Pacific Ocean off Canada's west coast, and Buoy c44139 is located in the North Atlantic Ocean, off Canada's east coast. Both regions hold immense ecological significance. The North Atlantic Ocean supports a diverse array of marine life and plays a crucial role in regulating the global climate. Similarly, the Pacific Ocean is one of the most expansive and biologically productive bodies of water on Earth, supporting diverse ecosystems. Gaining a deeper understanding of wave patterns in these areas is essential for the effective protection and management of these critical ecosystems.
These two regions also offer critical insights into global climate conditions. By predicting wave height and analyzing its changing patterns, we can gain a deeper understanding of how climate change may impact various parts of the world. Observations from these locations provide valuable information on the connection between global warming and oceanic conditions.
Predicting wave height and patterns can also support marine ecosystem conservation efforts. Wave dynamics play a crucial role in shaping marine habitats, especially in coastal regions and coral reefs. Accurate wave predictions can help conservationists monitor and manage sensitive ecosystems that are affected by wave energy, sediment transport, and water quality.
Accurate predictions of wave heights can greatly enhance safety for ships at sea. Knowing the wave conditions in advance helps in routing and scheduling, allowing vessels to avoid high seas and potential storms, which reduces the risk of maritime accidents. This is crucial not only for cargo ships but also for passenger vessels, fishing boats, and smaller recreational crafts.
In the field of renewable energy, particularly for offshore wind farms and wave energy converters, knowing wave heights is critical. Accurate predictions ensure that these facilities are built and operated within the limits of their design specifications, maximizing energy production while ensuring structural safety.
Accurate wave height information is essential for coastal management practices. It aids in the design of coastal infrastructure like sea walls, breakwaters, and levees. Understanding and anticipating wave patterns can help in minimizing erosion and managing flood risk, which is increasingly important with rising sea levels due to climate change.
Salcedo-Sanz et al. (2015) applied an SVR model to simulation-based data for wave height forecasting, achieving an MSE ranging between 0.38 and 0.81.
Li et al. (2022) implemented a GRU model with data collected from six buoy stations in China, demonstrating a high predictive accuracy with R2 values ranging from 0.82 to 0.92 for predicting wave height.
Abbas et al. (2024) carried out multiple experiments:
- They employed an LSTM network with buoy station data from both Canada and China, yielding R2 values between 0.71 and 0.81.
- A separate experiment using the GRU model on the same data source achieved R2 values between 0.73 and 0.80.
- Another experiment using an ANN on the same Canadian and Chinese buoy stations achieved R2 values ranging from 0.55 to 0.73, showing slightly lower predictive accuracy compared with GRU and LSTM.
Our proposed models extend this line of work with topological-feature-enhanced SVR applied to two Canadian buoy stations, showing promising predictive performance with R2 values ranging from 0.85 to 0.88.
Summary of performance improvements for TDA-SVR models compared to traditional SVR models
| Model type | Station | R2 improvement (%) | MAE improvement (%) | MSE improvement (%) |
|---|---|---|---|---|
| SVR-RBF | c44139 | 22.6 | 33.2 | 53.6 |
| Linear model | c44139 | 22.3 | 22.2 | 40.3 |
| SVR-RBF | c46207 | 27.3 | 20.7 | 46.9 |
| Linear model | c46207 | 19.4 | 2.7 | 9.7 |
The table summarizes previous studies and proposed models related to wave height prediction, focusing on the methods, data sources, accuracy, and applications. Each study employs different machine learning techniques and datasets
| Previous studies | Methods | Data source | Accuracy | Application |
|---|---|---|---|---|
| Salcedo-Sanz et al. (2015) | SVR | Simulation-based data | MSE: 0.38-0.81 | Wave height forecasting |
| Li et al. (2022) | GRU | Six China buoy stations | R2: 0.82-0.92 | Prediction of wave height |
| Abbas et al. (2024) | LSTM | Canadian and China buoy stations | R2: 0.71-0.81 | Prediction of wave height |
| Abbas et al. (2024) | GRU | Canadian and China buoy stations | R2: 0.73-0.91 | Prediction of wave height |
| Abbas et al. (2024) | ANN | Canadian and China buoy stations | R2: 0.55-0.73 | Prediction of wave height |
| Proposed models | SVR | Two Canadian buoy stations | R2: 0.85-0.88 | Prediction of wave height |
In addition to strong predictive power, our proposed models offer several advantages over other machine learning techniques, such as GRU and ANN. One key advantage is their stability. Deep learning models, which rely on back-propagation algorithms, often involve significant randomness, leading to high variance in testing performance. Moreover, they require substantial computational resources and considerable time for both training and testing.
Interchangeability of TDA techniques for sea levels and significant wave heights
This work primarily focuses on TDA-clustering techniques for sea levels and TDA-regression techniques for significant wave heights. However, these techniques and datasets are interchangeable: it is feasible to apply the clustering pipeline to wave height data and to establish regression models for sea levels.
For example, for the three buoy stations from which significant wave height datasets are obtained, PHs and PDs can be established. The dissimilarity matrix between their PDs can be measured using metrics such as bottleneck distance. Similarly, PLs can be established for sea levels at Japan’s tide stations. The topological features (i.e., vectors containing discrete endpoints of PL intervals) can then be used to train regression models.
By leveraging these interchangeable techniques, we can develop robust models and gain deeper insights into both sea level variations and wave height patterns, enhancing our understanding of oceanographic data.
CONCLUSIONS AND FUTURE RESEARCH
Conclusions
This paper presents a novel hybrid modeling framework aimed at advancing oceanographic research by integrating advanced machine learning techniques with topological features. Two specific models were developed and evaluated using data from 15 observation stations across two countries and maritime regions, spanning extensive historical periods and significant oceanographic events.
Our study demonstrates that incorporating topological data significantly enhances the performance of traditional clustering and regression models. This improvement is particularly evident in the analysis of complex oceanographic phenomena, such as mean monthly sea levels and significant wave heights. For instance, in predicting wave heights using SVR models, the inclusion of topological features improved the R2 score by 19.4% to 27.3%, MAE by 2.7% to 33.2%, and MSE by 10.0% to 53.6%. These results highlight the robustness of our models in enhancing traditional approaches for oceanographic time series analysis. By addressing data gaps caused by inactive observation stations and integrating topological insights, the proposed approach expands the predictive capabilities of oceanographic models and supports more comprehensive analyses of regional responses to global environmental changes.
The methodologies developed in this work, validated through rigorous model evaluation and detailed data analysis, provide substantial advancements in marine environmental management and forecasting. These contributions lay a strong foundation for future research and practical applications in oceanographic modeling, particularly in addressing challenges related to climate change and environmental sustainability.
Future research directions
Despite the significant advancements achieved in this study, certain limitations highlight opportunities for future research. Oceanographic data often exhibit highly complex, multi-dimensional structures influenced by numerous interacting factors. These structures are not always fully captured by the three-dimensional models utilized in this work. Future research should explore advanced methods, such as the Mapper Algorithm or Path Homology, in conjunction with state-of-the-art data learning approaches, to better model these complexities and provide deeper insights into marine dynamics.
Moreover, while this study primarily focused on wave height prediction, future research will aim to expand the scope to include other critical oceanographic variables, such as salinity, temperature, and wind patterns. Advanced machine learning techniques, such as Long Short-Term Memory (LSTM) networks and Feed-Forward Neural Networks (FNN), will be explored to further improve the predictive accuracy of oceanographic models.
Finally, expanding the geographic coverage of data collection will be a priority to account for broader environmental conditions and enhance the generalizability of the proposed models. These future efforts aim to build upon the foundation established in this work, driving more accurate predictions and effective management of oceanographic and environmental challenges.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories: Wave data: https://meds-sdmm.dfo-mpo.gc.ca/isdm-gdsi/index-eng.html Sea level data: https://psmsl.org/data/obtaining/.
CONFLICT OF INTEREST
The authors declare there is no conflict.