Abstract
This study analyses the trend in sea level data along the Chennai coast and checks for structural change in the dataset using the Chow method. It also proposes a methodology for predicting the mean sea level with feed-forward neural network (FFNN) and wavelet transform neural network (WTNN) models. The data analysis shows a breakpoint in the year 1994 and an overall increasing trend during the selected time period at the Chennai coast. For model development, a better understanding of the parameters that influence sea level is essential. Hence, correlation analyses were performed, and wind speed, sea surface salinity, and surface pressure were found to be influencing variables for modelling sea level data. Accordingly, these variables were considered as potential inputs for model development. To compare the performance of the developed models, the Root Mean Square Error (RMSE), Correlation Coefficient, and Nash–Sutcliffe Efficiency (NSE) were used. The performance indices and the graphical indicators show that WTNN Model 4 outperformed all the other developed models. The NSE of WTNN Model 4 was 29.52% higher than that of the conventional FFNN model.
HIGHLIGHTS
Detection of the breakpoint using the Chow method.
Identification of major climatic variables affecting the sea level rise.
Proposed a methodology for prediction of the sea level using climatic variables employing feed-forward neural network (FFNN) and wavelet transform neural network (WTNN).
INTRODUCTION
Under higher emissions scenarios, oceanic sea level rise (SLR) by 2100 is expected to range between 0.61 and 1.10 m (https://www.ipcc.ch). According to Oppenheimer et al. (2019), due to instabilities in the Antarctic and Greenland ice sheets, the rise might be as much as 2 m (DeConto & Pollard 2016; Bamber et al. 2019). Many studies have linked climate change to SLR. Climate change has accompanied global economic growth, particularly in the 70 years since World War II, when expanding energy demands fed a burgeoning global economy. Sea level data for the whole oceanic domain have been available since the early 1990s thanks to precise satellite altimetry. This data collection has verified that, on average, the pace of SLR over the last 20 years has been twice as fast as it was over the preceding, longer multidecadal time span (Nerem et al. 2010; Church & White 2011). According to the Intergovernmental Panel on Climate Change (IPCC), by the 2050s, mean summer temperatures are projected to rise by 1.5–2.0 °C and mean winter temperatures by 2.5–3.0 °C, which could severely affect this environment. Flooding threats, coastal erosion, saltwater intrusion into groundwater, ecosystem and land use changes, and the potential conversion of land into permanent open water are all effects of relative SLR (Nicholls et al. 2007). To address this problem, it is necessary to model mean SLR accurately, which can be done efficiently using soft computing techniques.
Artificial neural networks (ANNs) are one of the most extensively used neurocomputing technologies in the field of time series analysis. These methods are extremely useful for conducting analysis on datasets when minimal information about the influencing variables and their involvement in generating the time series is available (Shamshirband et al. 2020; Band et al. 2022). ANNs can approximate a continuous function to any required precision without making any implicit assumptions, and they have desirable properties including non-linearity, parallelism, and robustness (Basheer & Hajmeer 2000). A wavelet transform neural network (WTNN) is a hybrid model that combines ANN with wavelet transform and has been used in a number of studies. The convolution of a signal with each of the wavelets in the family is defined as a wavelet transform operation. Translation invariant representation can be obtained by introducing some non-linearity to the system. In hydrology, machine learning (ML) has proven to be effective. Traditional data-driven and physical hydrology models perform less accurately in flood prediction than ML approaches especially in short-term flood forecasting (Mosavi et al. 2018). ML is a field of artificial intelligence (AI) used to induce regularities and patterns, providing easier implementation with low computation cost, as well as fast training, validation, testing and evaluation, with high performance compared to physical models, and relatively less complexity. Furthermore, the ML approach aids in the estimation of precipitation using satellite records. These are widely used in a variety of applications, including navigational safety, agricultural optimization, and mechanical structure design among others (Guillou & Chapalain 2021). ML algorithms have also been used to aggregate ‘best-estimate’ forecasts from an ensemble for the predictions of ocean waves (Bruneau et al. 2020). 
Neural networks (NNs) have also been used successfully to bias-correct measurements, leading to more homogeneous climate data records (Leahy et al. 2018). Characterized by remarkable learning ability, noise tolerance, and generalisability, these advanced approaches offer new horizons compared to traditional engineering methods, and are therefore recognized as one of the pillars of future economic and industrial developments. These methods are also playing an increasing role in the study of coastal processes, and their importance is reinforced by a growing number of available observational datasets (Beuzen & Splinter 2020). Because tides are highly predictable, particular attention has also been given to tide forecasting, with applications in harbours distributed along the coastline, by comparing traditional harmonic analysis techniques with ML methods (French et al. 2017; Liu et al. 2019).
The study region, Chennai, has been identified as one of the important coastlines of India. It faces the problem of SLR, and its past impacts have been devastating (Deepa & Gnanaseelan 2021). The increase in the rate of SLR motivates the study of this area. An attempt has been made to study the trend of SLR and thereby detect the breakpoint of mean SLR along the Chennai coast during the period 1916–2015. In this paper, WTNN models and feed-forward neural network (FFNN) models have been developed for sea level modelling along the Chennai coast. A correlation analysis between the local climatic variables and sea level was performed to identify the potential factors responsible for the changes. The input variables were varied and the performance indices were calculated for each model. The models were then compared, and the best one is recommended for future forecasting. The majority of previous research considered solely the thermosteric influence on the sea level, which could lead to an overestimation of expected sea levels. Moreover, the number of inputs used in previous work is very limited, and this has been taken into consideration in the present work.
Study area and data
METHODOLOGY
Breakpoint analysis
Breakpoint analysis is a way of looking at data to determine when there are shifts or breaks in normal levels. It is used to detect structural change in a time series whose slope changes abruptly at some unknown point. The test determines whether a broken line fits the data significantly better than a single straight line. The position of the breakpoint can be approximated using a confidence interval. The breakpoint is a crucial, safe, or threshold value above or below which undesirable effects occur, and it is very important in making decisions. The method used in this work for breakpoint detection is the Chow test, devised by the econometrician Gregory Chow in 1960. It tests whether the true coefficients in two linear regressions on separate data sets are equal. It is frequently used to see if the independent variables have distinct effects on different demographic subgroups (Hurtado et al. 2020).
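As an illustration of the test described above, the following Python sketch computes the Chow F statistic, F = [(RSS_pooled − (RSS_1 + RSS_2)) / k] / [(RSS_1 + RSS_2) / (n − 2k)], for every candidate break year in a synthetic annual series. The series, the noise pattern, and the function names are assumptions made for the example and are not the data or code used in this study; a p-value would additionally require the F distribution with (k, n − 2k) degrees of freedom.

```python
def ols_rss(x, y):
    """Fit y = a + b*x by ordinary least squares; return the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, break_index, k=2):
    """Chow F statistic for a candidate break; k = parameters per regression."""
    rss_pooled = ols_rss(x, y)
    rss_1 = ols_rss(x[:break_index], y[:break_index])
    rss_2 = ols_rss(x[break_index:], y[break_index:])
    n = len(x)
    numerator = (rss_pooled - (rss_1 + rss_2)) / k
    denominator = (rss_1 + rss_2) / (n - 2 * k)
    return numerator / denominator


# Synthetic series (hypothetical, in mm): slow rise before 1994, faster rise after.
years = list(range(1916, 2016))
level = []
for i, year in enumerate(years):
    if year <= 1994:
        trend = 0.2 * (year - 1916)
    else:
        trend = 0.2 * 78 + 4.1 * (year - 1994)
    level.append(trend + ((i * 7) % 5 - 2) * 0.5)  # small deterministic noise

# Scan candidate breaks; years[b] is the first year of the second segment.
f_by_year = {years[b]: chow_f(years, level, b) for b in range(5, len(years) - 5)}
best_year = max(f_by_year, key=f_by_year.get)
print(best_year, round(f_by_year[best_year], 1))
```

Scanning all candidate years and keeping the largest F is one common way to locate an unknown break; the study reports F statistics for a list of assumed break years instead.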
Trend analysis
Trend analysis is the practice of collecting data and identifying its pattern (Wang et al. 2020), and it is very important in predicting future events. It has gained much attention over the last two to three decades. The periodicities of fluctuations have been detected for Chennai with data ranging from 1916 to 2015.
Modelling of sea level variations
For the modelling of sea level data, potential input variables have to be selected for the development of more efficient models. The relation between the variables and sea level was found using RStudio 2021.09.1-372. The variables having a strong correlation with the sea level were chosen as potential inputs for modelling sea level variations. These potential inputs were used for the development of FFNN and WTNN.
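The selection step described above can be sketched as follows. The study performed this analysis in RStudio; the Python code below is only an illustrative equivalent, and the 0.5 threshold, the variable names, and the sample values are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_inputs(candidates, target, threshold=0.5):
    """Keep candidate variables whose |r| with the target reaches the threshold."""
    scores = {name: pearson_r(series, target) for name, series in candidates.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical sample values, for illustration only.
sea_level = [1, 2, 3, 4, 5, 6]
candidates = {
    "wind_speed": [2, 4, 5, 8, 10, 13],
    "surface_pressure": [9, 8, 8, 6, 5, 3],
    "unrelated": [1, -1, 1, -1, 1, -1],
}
selected = select_inputs(candidates, sea_level)
for name, r in selected.items():
    print(name, round(r, 3))
```

Using the absolute value of r keeps strongly negative relationships as well as strongly positive ones.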
Feed-forward neural network
ANNs, sometimes known as NNs, are computing systems loosely modelled on the human brain. They learn from experience rather than from explicit programming. The system is made up of artificial neurons, which are a collection of connected nodes. Each neuron sends a signal to the next neuron, and the output of each neuron is determined using a non-linear function of its inputs. The weights of the connections change as the network learns. Signals pass through hidden layers from the first (input) layer to the final (output) layer. The network is trained on the training data and then simulated on the testing data to estimate the output variable. In this work, the feed-forward backpropagation method has been used to adjust the connection weights to compensate for each error found during learning. The multi-layered network trained by the backpropagation algorithm has been applied extensively to solve various engineering problems. The Levenberg–Marquardt algorithm (LMA) is used for training the network. The LM method is widely used in NN training because it converges faster than plain gradient descent (Nourani et al. 2009). The LMA is a popular trust region algorithm that is used to find a minimum of a function (either linear or non-linear) over a space of parameters. Essentially, a trusted region of the objective function is internally modelled with some function such as a quadratic. When an adequate fit is found, the trust region is expanded. As with many numerical techniques, the Levenberg–Marquardt method can be sensitive to the initial starting parameters. In traditional Levenberg–Marquardt implementations, finite differences are used to approximate the Jacobian, which is the matrix of all first-order partial derivatives of the function being optimized. This approximation is convenient, as the user needs to supply only a single function to the library.
When the inputs are presented, weights are assigned to the connections between neurons and are then updated according to the error values. The development of FFNN models has been carried out using MATLAB 9.8 R2020a.
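The Levenberg–Marquardt update described above can be illustrated with a toy example. The sketch below fits a network with a single hidden tanh neuron using a finite-difference Jacobian and a damped normal-equation step. It is a minimal pure-Python illustration under stated assumptions (toy model, synthetic data, simple halving and doubling of the damping term) and is not the MATLAB training routine used in this study.

```python
import math


def model(params, x):
    """One hidden tanh neuron followed by a linear output (a toy FFNN)."""
    w1, b1, w2, b2 = params
    return w2 * math.tanh(w1 * x + b1) + b2


def residuals(params, xs, ys):
    return [model(params, x) - y for x, y in zip(xs, ys)]


def jacobian_fd(params, xs, ys, eps=1e-6):
    """Forward finite-difference Jacobian: rows = samples, columns = parameters."""
    base = residuals(params, xs, ys)
    cols = []
    for j in range(len(params)):
        shifted = list(params)
        shifted[j] += eps
        moved = residuals(shifted, xs, ys)
        cols.append([(m - b) / eps for m, b in zip(moved, base)])
    return [[cols[j][i] for j in range(len(params))] for i in range(len(xs))]


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def levenberg_marquardt(params, xs, ys, damping=1e-2, iterations=200):
    params = list(params)
    k = len(params)
    cost = sum(r * r for r in residuals(params, xs, ys))
    for _ in range(iterations):
        r = residuals(params, xs, ys)
        jac = jacobian_fd(params, xs, ys)
        jtj = [[sum(row[a] * row[b] for row in jac) for b in range(k)] for a in range(k)]
        jtr = [sum(row[a] * ri for row, ri in zip(jac, r)) for a in range(k)]
        # Damped normal equations: (J^T J + damping * I) step = -J^T r
        lhs = [[jtj[a][b] + (damping if a == b else 0.0) for b in range(k)] for a in range(k)]
        step = solve(lhs, [-g for g in jtr])
        trial = [p + s for p, s in zip(params, step)]
        trial_cost = sum(v * v for v in residuals(trial, xs, ys))
        if trial_cost < cost:   # accept the step and trust the local model more
            params, cost = trial, trial_cost
            damping = max(damping * 0.5, 1e-12)
        else:                   # reject the step and move closer to gradient descent
            damping *= 2.0
    return params, cost


# Synthetic target generated from known (hypothetical) weights.
xs = [-2 + 0.2 * i for i in range(21)]
ys = [1.5 * math.tanh(0.8 * x - 0.3) + 0.2 for x in xs]
start = [0.5, 0.0, 1.0, 0.0]
initial_cost = sum(r * r for r in residuals(start, xs, ys))
fitted, final_cost = levenberg_marquardt(start, xs, ys)
print(initial_cost, final_cost)
```

A small damping value makes the step close to Gauss–Newton, while a large value shortens it towards a gradient descent step, which is why the method is sensitive to the starting parameters but usually converges quickly near a minimum.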
Wavelet Transform Neural Network
A hybrid model was created by combining wavelet-transformed inputs with ANNs to improve the model's efficacy. This type of network is referred to as a wavelet transform neural network (WTNN). The wavelet transform has a wide range of applications in signal and image analysis and denoising. It is concerned with the expansion of functions in terms of a set of basis functions.
Discrete wavelet transform (DWT) takes less time and is easier to implement than the traditional continuous wavelet transform (CWT), which involves a large amount of processing work and data (Adamowski & Chan 2011). To obtain a time-scale signal in the DWT, digital filtering techniques are used. The wavelet algorithm is used to derive detailed coefficients and approximation series from the original time series after passing it through high-pass and low-pass filters (Gurley & Kareem 1999). There are varieties of mother wavelets and the authors compared four popular mother wavelets in their study: Daubechies (Db), Symlet (Sym), Discrete Meyer (dMey), and Haar (Nourani et al. 2009; Adamowski & Chan 2011). Daubechies wavelets were chosen as the study's mother wavelets because they yield the best results (Maheswaran & Khosa 2012).
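The filtering scheme described above can be sketched with the Haar wavelet, which is the same as Daubechies Db1. The code below is a minimal pure-Python illustration that splits a series into one approximation and several detail series; it is not the toolbox routine used in this study. The sample signal is hypothetical, and sign conventions for the detail coefficients differ between libraries.

```python
import math


def haar_step(signal):
    """One DWT level: low-pass (approximation) and high-pass (detail) outputs,
    each downsampled by 2. An unpaired trailing sample is dropped."""
    scale = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def haar_decompose(signal, level):
    """Repeat the filtering on the approximation; return (a_level, [d1, ..., d_level])."""
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical eight-sample series, decomposed to level 3 (a3 and d1, d2, d3).
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a3, (d1, d2, d3) = haar_decompose(signal, 3)
print(len(a3), len(d1), len(d2), len(d3))
```

Because the Haar filters are orthonormal, the sum of squares of the original series equals the sum of squares of all coefficients, which gives a quick check that no information was lost in the decomposition.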
Decomposition process
Decomposition means breaking down the input parameters into different bands. It helps in finding the optimum form of the input to be used for model formation, considering the scaling and shifting factors. Wavelet decomposition provides a complete image representation and performs decomposition according to both scale and orientation. When building a wavelet-based ANN model, it is necessary to determine the most suitable decomposition level from 1 to M. Theoretically, the maximum decomposition level (M) can be calculated as M = log2(N), where N is the length of the series. Using this formula, the level of the decomposition was found to be 2, hence it has been used.
First, all the inputs that showed a good correlation with the mean sea level were chosen. The potential input parameters were sea surface salinity (SSS), surface pressure (SP), and wind speed. Then, the datasets were divided into 80% training and 20% testing data. Model formulation was carried out using two different NNs: ANN and WTNN. The decomposition level was calculated using the formula given above, and the Daubechies mother wavelet was used in the further analysis. In the model formation, the networks are first trained and then tested. The network is then simulated and the output variable is estimated.
Performance indices
Parameter | Criteria | Performance category |
---|---|---|
R² | R² > 0.5 | Acceptable |
R² | R² > 0.75 | Very good |
NSE | NSE ≤ 0.50 | Unsatisfactory |
NSE | 0.5 < NSE ≤ 0.65 | Satisfactory |
NSE | 0.65 < NSE ≤ 0.75 | Good |
NSE | NSE ≥ 0.75 | Very good |
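The indices in the table above follow standard definitions: RMSE = sqrt(mean((O − P)²)), R is the Pearson correlation between observed (O) and predicted (P) values, and NSE = 1 − Σ(O − P)² / Σ(O − mean(O))². The Python sketch below is a minimal implementation of these definitions together with the NSE categories; the function names and sample values are assumptions made for the example.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (spread_o * spread_p)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_o = sum(obs) / len(obs)
    error = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_o) ** 2 for o in obs)
    return 1 - error / variance


def nse_category(value):
    """Map an NSE value to a category. The table's boundaries overlap at 0.75;
    this sketch treats exactly 0.75 as 'Good'."""
    if value <= 0.50:
        return "Unsatisfactory"
    if value <= 0.65:
        return "Satisfactory"
    if value <= 0.75:
        return "Good"
    return "Very good"


# Hypothetical observed and predicted values, for illustration only.
observed = [0.10, 0.15, 0.12, 0.20, 0.18]
predicted = [0.11, 0.14, 0.13, 0.19, 0.17]
score = nse(observed, predicted)
print(round(rmse(observed, predicted), 4), round(pearson_r(observed, predicted), 4))
print(round(score, 4), nse_category(score))
```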
Along with the performance indices, graphical comparisons of the model performances have also been carried out. For that, a random walk test, heat map, violin plot, and scatter diagram have been included. A heat map shows the best model using different colours based on the ranks given by the performance indices. The colour variation can be in hue or intensity, indicating whether the occurrence is clustered or varies over time. It is used to rank models according to their performance. For data visualization and quality control, heat maps are commonly employed in expression analysis investigations (Zhao et al. 2014). A violin plot is a type of quantitative data visualization. It is similar to a box plot, but with a rotated kernel density plot on each side. According to the random walk theory, a variable or phenomenon does not follow a pre-existing trend. It is presumptively impossible to outperform the market without taking on greater risk. A random walk is a set of discrete, fixed-length steps that move in random directions (Roshni et al. 2020). The scatter diagram displays one set of data on each axis and helps in determining the relation between the calculated and the ideal values. If the variables are well related, the points will lie close to the 1:1 line. The scatter plot is one of the seven fundamental quality tools.
RESULTS AND DISCUSSIONS
Breakpoint analysis
S. No. | Assumed breakpoints (years) | F-statistic | P-value (probability) |
---|---|---|---|
1 | 1927 | 12.99 | 0.000494342 |
2 | 1928 | 9.66 | 0.002464821 |
3 | 1993 | 38.26636 | 1.43067E-08 |
4 | 1994 | 39.6696 | 8.55609E-09 |
5 | 1995 | 39.04055 | 1.0766E-08 |
6 | 1997 | 33.71456 | 7.89438E-08 |
7 | 1998 | 32.0547 | 1.49566E-07 |
8 | 2001 | 27.84859 | 7.87232E-07 |
9 | 2002 | 21.53741 | 1.07605E-05 |
10 | 2005 | 25.79382 | 1.81321E-06 |
11 | 2006 | 26.88946 | 1.15982E-06 |
12 | 2007 | 35.42913 | 4.11783E-08 |
It can also be seen from Figure 3 that before the breakpoint, the rate of SLR was 0.2 mm/year, and it increased to 4.1 mm/year after the year 1994. This change in the slope motivated further study of this region. The rate of rise has increased drastically in the last two to three decades. The dotted line shows the breakpoint in 1994. This does not mean that the extreme hydrological events in that particular year alone led to the abrupt and lasting change in the time series. A number of factors affect SLR, such as melting of glaciers, sea surface salinity (SSS), wind speed, surface pressure (SP), etc. During the last two to three decades, many disasters have been witnessed along the coastal areas, but these are not the only reason for the sudden rise in the sea level; such extreme events would produce only a momentary surge over a shorter period of time. Keeping in mind the average increase in the mean sea level at the global scale, a threshold increase of 2–3 mm was adopted. When this limit is crossed, the point is flagged as a breakpoint. The Chow test gave a number of breakpoints, but the most significant one was in the year 1994.
Trend analysis
Modelling of sea level variations
Given the impacts of climate change on the sea level, predicting the rise in the sea level with better precision is critical. Correlation analysis should be done to find the major variables affecting SLR. Modelling of SLR is the first and foremost step in tackling these problems. It makes it possible to assess the rate of change in sea level and to identify proper management steps to slow the rapid rate of rise.
Correlation among climatic variables
Sea level prediction
This study compares the observed sea level values with the values predicted by different NN models. It also takes into account the forcing factors that are responsible for the discrepancies. Modelling using FFNN and WTNN gives the most accurate predicted values. SSS, SP, and wind speed were used as potential input variables for the formulation of models, and the sea level was taken as the target or output variable. The modelling was done for the period 1993–2020. The data were divided into two parts: a training dataset and a testing dataset. The training data span 270 months and the testing data 66 months (336 months in total). Since the observed data were not available for the whole time period, there were gaps in the time series that could lead to errors. Hence, instead of using observed data from 1916, satellite data from 1993–2020 have been used for the modelling.
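The split sizes can be checked with a short sketch: 28 years of monthly data give 336 months, of which 270 (about 80%) are used for training and 66 for testing. The code below assumes the split is chronological, which the text does not state explicitly, and the function name is hypothetical.

```python
def chronological_split(series, n_train):
    """Keep the first n_train samples for training and the rest for testing."""
    return series[:n_train], series[n_train:]


# Monthly time stamps for 1993-2020 as (year, month) pairs.
months = [(year, month) for year in range(1993, 2021) for month in range(1, 13)]
train, test = chronological_split(months, 270)
train_share = len(train) / len(months)
print(len(months), len(train), len(test), round(train_share, 3))
```

A chronological split keeps the test period strictly after the training period, so the model is never evaluated on months that precede data it was trained on.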
Feed forward neural network
In this approach, the LM algorithm was used with the feed-forward backpropagation method to find the optimum solution. The best model was selected based on the performance indices given by the respective models. The trained model was tested by forecasting sea level values using the remaining 20% of the data. Performance indices such as NSE, RMSE, and R were used to assess the competence of the model. The results of the performance indices are shown in Table 3. The indices showed that further modification using different methods was needed.
Wavelet transform neural network
The decomposed approximations and details have been shown on the y-axis for wind speed, pressures, and salinity.
A total of seven models were formed with FFNN and WTNN models with different input combinations and the performance indices have been calculated and shown in Table 3.
Model (type/level/mother wavelet/neurons) | Input combination | Training RMSE | Training R | Training NSE | Testing RMSE | Testing R | Testing NSE |
---|---|---|---|---|---|---|---|
FFNN/50 (Model 1) | – | 0.038 | 0.819 | 0.657 | 0.039 | 0.807 | 0.629 |
WTNN/3/Db1/37 (Model 2) | (a3 and d1,d2,d3) | 0.042 | 0.875 | 0.753 | 0.045 | 0.905 | 0.805 |
WTNN/3/Db2/37 (Model 3) | (a3 and d1,d2,d3) | 0.044 | 0.885 | 0.762 | 0.038 | 0.896 | 0.789 |
WTNN/3/Db3/39 (Model 4) | (a3 and d1,d2,d3) | 0.038 | 0.933 | 0.851 | 0.041 | 0.949 | 0.879 |
WTNN/3/Db4/37 (Model 5) | (a3 and d1,d2,d3) | 0.041 | 0.909 | 0.811 | 0.039 | 0.919 | 0.831 |
WTNN/3/Db3/45 (Model 6) | (a3 and d1 + d2 + d3) | 0.038 | 0.921 | 0.827 | 0.042 | 0.904 | 0.802 |
WTNN/3/Db4/38 (Model 7) | (a3 and d1 + d2 + d3) | 0.039 | 0.913 | 0.821 | 0.044 | 0.895 | 0.798 |
Performance analysis of ANN and WTNN models
To find the best model for prediction, the statistical parameters of both model types (ANN and WTNN) were computed and compared, as shown in Table 3. Model 4 (highlighted in Table 3), which is a hybrid model, performed better than the other developed models. The ANN model gave NSE > 0.5 (acceptable) and R > 0.75 (acceptable). The WTNN model yielded very good results, with NSE = 0.851 and R = 0.933 in training (Sithara et al. 2020). The very small RMSE value made the conclusion more reliable. To check the efficacy of the developed model, various plots such as the violin plot, heatmap, scatter plot, and random walk test were used.
The percentage change in the performance indices with respect to the conventional model (FFNN) has been calculated and is shown in Table 4. The percentage gain in NSE ranges up to 29.52% for the training dataset and up to 39.74% for the testing dataset, with the maximum in both cases obtained by WTNN Model 4. Model 4 shows no change in training RMSE relative to FFNN, indicating that it is the best model for prediction. The percentage change in RMSE is in the range 0–15.78% for the training dataset and 0–15.38% for the testing dataset compared to FFNN. Similarly, considerable percentage improvement has been observed for the other indices during the training and testing periods.
Model no. | Training % RMSE | Training % R | Training % NSE | Testing % RMSE | Testing % R | Testing % NSE |
---|---|---|---|---|---|---|
1 | – | – | – | – | – | – |
2 | 10.52 | 6.83 | 14.61 | 15.38 | 12.14 | 27.98 |
3 | 15.78 | 7.45 | 15.98 | 2.56 | 11.03 | 25.43 |
4 | 0 | 13.91 | 29.52 | 5.12 | 17.59 | 39.74 |
5 | 7.89 | 10.98 | 23.43 | 0 | 13.87 | 32.11 |
6 | 0 | 12.45 | 25.87 | 7.68 | 12.01 | 27.50 |
7 | 2.63 | 11.47 | 24.96 | 12.80 | 10.90 | 21.18 |
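The entries in the percentage table can be reproduced from the performance indices of the FFNN baseline (Model 1) and each WTNN model. The sketch below assumes the change is reported as an absolute percentage of the baseline value, which matches the tabulated magnitudes; the tabulated figures appear to be truncated rather than rounded to two decimals, so the last digit can differ. The function name is hypothetical.

```python
def percent_change(baseline, value):
    """Absolute percentage change of a performance index relative to the baseline."""
    return abs(value - baseline) / baseline * 100


# Training and testing indices for Model 1 (FFNN) and Model 4 (WTNN, Db3).
ffnn = {"train_rmse": 0.038, "train_r": 0.819, "train_nse": 0.657,
        "test_rmse": 0.039, "test_r": 0.807, "test_nse": 0.629}
wtnn_model_4 = {"train_rmse": 0.038, "train_r": 0.933, "train_nse": 0.851,
                "test_rmse": 0.041, "test_r": 0.949, "test_nse": 0.879}

changes = {key: percent_change(ffnn[key], wtnn_model_4[key]) for key in ffnn}
for key, change in changes.items():
    print(key, round(change, 2))
```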
CONCLUSIONS
The sea level variations are complex in nature, resulting from different processes and settings. This research examined the sea level variations by studying the trend analysis and breakpoint analysis and consequently the selection of potential drivers of sea level change at the Chennai coast. This work further assessed and compared the performances of the predicted sea level variations at the Chennai coast using ANN and WTNN models.
Trend lines of the mean sea level show an overall increasing trend for the selected time period. The results of the breakpoint analysis by the Chow method lead us to conclude that there is a major breakpoint in the year 1994 and that the rate of SLR changed from 0.2 to 4.1 mm/year after the breakpoint. Further research has to be carried out to understand and verify the structural changes in SLR around the year 1994. The year 1994 acts only as a breakpoint from which there has been a continuous rise in the sea level. From the correlation analysis, it was observed that salinity, wind speed, and SP are some of the major driving forces of sea level variations along the Chennai coast. It was also observed that salinity and wind speed are positively correlated with sea level, while SP is negatively correlated with it. Seven models were developed to predict the sea level variations using ANN and WTNN models. For the development of the WTNN models, different input combinations of approximations and details were used. Statistical indices along with graphical indicators were used to compare the developed models, and the input combination a3 and d1, d2, d3 was found to be superior to the other combinations in terms of prediction. It is evident from the graphical interpretations that Model 4 (a3 and d1, d2, d3) outperformed the other developed models, with an NSE 29.52% higher than that of the conventional FFNN model. This indicates which combination of approximations and details should be used in prediction analysis when carrying out the modelling. It also makes future research easier, as the input combinations do not have to be searched again to establish the best-predicted results.
The applicability of different ML techniques in sea level prediction and extending the regional scale prediction to a global scale can be set as the future scope of this study. The outcomes of the present study have important implications for research on forecasting the sea level, especially from the viewpoint of wavelets and ANNs in particular and different time series/data-based methods more broadly. Though the hybrid WTNN models performed well, there is still scope for further improvements through additional studies. This study is region-specific and it can be extended to the entire Coromandel coast. There are many other factors that influence SLR (e.g. melting of glaciers, the gravitational pull of moon and earth) that have not been considered in the present study. Nevertheless, the model showed good predictability in the calibration and validation period using long-term observed data even when the above factors were not taken into consideration.
ACKNOWLEDGEMENT
The authors would like to appreciate the time and effort that the editor and the reviewers dedicated to providing feedback on our manuscript and are grateful for their insightful comments.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.