Abstract
Mapping mangrove forests is crucial for their conservation, but it is challenging due to their complex characteristics. Many studies have explored machine learning techniques that use Synthetic Aperture Radar (SAR) and optical data to improve wetland classification. This research compares the random forest (RF) and support vector machine (SVM) algorithms, employing Sentinel-1 dual-polarimetric C-band data and Sentinel-2 optical data, together with various derived parameters, for mapping mangrove forests. The Jeffries–Matusita distance and Spearman's rank correlation are used to evaluate the significance of commonly used spectral indices and SAR parameters in wetland classification. Only significant parameters are retained, reducing data dimensionality from 63 initial features to 23–33 essential features and yielding an 18% improvement in classification accuracy. The fusion of SAR and optical data consistently produces higher classification accuracy for both RF and SVM, with a substantial 33% increase in overall accuracy. This research provides an effective approach for monitoring changes in the Pichavaram wetlands and offers a valuable framework for future wetland monitoring, supporting the planning and sustainable management of this critical area.
HIGHLIGHTS
The combination of Sentinel-1 dual-polarimetric C-band data and Sentinel-2 optical data is shown to improve the classification of mangrove forests.
Machine learning algorithms such as random forest and support vector machine are used for classification, with a comparison of their performance.
This study emphasizes the importance of feature selection for the accurate classification of mangrove forests.
INTRODUCTION
Wetlands hold significant ecological and economic value, serving as habitats for wildlife and fisheries, stabilizing shorelines, managing floods, replenishing groundwater, and providing recreational opportunities. They also act as carbon sinks, offering potential for mitigating climate change impacts (Villa & Bernal 2018). Mangroves, a critical wetland type, span coastlines worldwide. Alarming reports indicate that around 20% of global mangrove area was lost between 1980 and 2005 due to mismanagement, pollution, and climate change (Spalding 2010). Recent studies also warn that climate change-related impacts continue to erode wetland quantity and quality (Association of State Wetland Managers 2020). The latest research on Indian mangroves likewise reveals alarming trends, indicating that about 40% of West Bengal's Sundarbans mangrove forests are highly vulnerable to climate change and that mangroves along India's entire east coast face similar vulnerability (MONGABAY 2023).
Remote sensing data play a crucial role in gathering information, facilitating planning, and shaping policies related to wetlands (Tiner et al. 2015). The accessibility and temporal coverage of satellite data have made them a popular choice for wetland mapping. Optical sensor data and derived vegetation indices have traditionally been used and have gained prominence due to their effectiveness (Ozesmi & Bauer 2002). Vegetation indices, derived from satellite data, are commonly employed to assess plant biomass, growth, and health, correlating with ground vegetation conditions. However, a limitation of optical data is their frequent unavailability during the monsoon season. Distinguishing wetland vegetation species, identifying flooded vegetation, and estimating canopy structure remain challenging even with high-resolution optical imagery (Heumann 2011; Mahdavi et al. 2018). In contrast, synthetic aperture radar (SAR) data, acquired at microwave wavelengths of the electromagnetic spectrum, penetrate vegetation canopies and offer advantages such as day–night availability and cloud-free observations (Salehi et al. 2018). The sensitivity of SAR data to dielectric properties and surface geometry makes them effective for discriminating wetland classes (Henderson & Lewis 2008). SAR-based wetland classification accuracy depends on polarization and frequency bands (Touzi et al. 2007).
Flooded vegetation is a primary distinguishing feature of wetlands. Radar signals interact differently with flooded and non-flooded areas. Multifrequency and multipolarized SAR data are effective solutions, with dual-pol and quad-pol data being particularly useful. Multipolarized data enable the application of polarimetric decomposition techniques, aiding in identifying backscattering mechanisms and improving wetland classification (Salehi et al. 2018).
Artificial intelligence and machine learning (ML) play a pivotal role in precise classification and real-time monitoring of forest changes, thereby furnishing vital data for climate change research and management (Di Nitto et al. 2014; Srivastava et al. 2015; Whitt et al. 2020). For distinguishing wetlands from other land cover categories, remote sensing data classification stands as the primary technique. The commonly employed supervised classification methods like support vector machine (SVM), classification and regression tree, k-nearest neighbor, decision tree, and random forest (RF) have garnered attention for their efficacy (Judah & Hu 2019; Ghosh & Das 2020; Mallick et al. 2021; Munizaga et al. 2022; Song et al. 2022). Among these, the SVM model offers adaptability to both linearly separable and non-linear data, demonstrating its effectiveness in wetland mapping. It excels in handling multisource data, which leads to improved accuracy in wetland classification, as supported by previous studies (Mercier et al. 2019). SVM’s strength lies in its ability to capture non-linear relationships in data, resulting in high classification accuracies, as noted in the literature (Ghosh & Das 2020). Its capability to effectively manage non-linear data is achieved through the use of kernel functions that map the data into higher-dimensional spaces where it can be linearly separated (Cortes & Vapnik 1995; Cristianini & Shawe-Taylor 2000). Many studies have sought to enhance the classification accuracy by incorporating a greater number of derived parameters from satellite data (Amani et al. 2017, 2019; Mercier et al. 2019). The RF classifier is an ensemble of decision trees known for its capacity to handle high-dimensional data and mitigate overfitting, making it an appealing choice for wetland classification (Munizaga et al. 2022). It has proven its value in monitoring wetlands across diverse landscapes (Srivastava et al. 2015). 
SVM, like RF, is also robust to overfitting, rendering it suitable for managing complex and high-dimensional datasets. Both classifiers exhibit less susceptibility to noise and outliers, which can have detrimental effects on the performance of some other classifiers (Breiman 2001; Liaw & Wiener 2002). RFs, by nature of their ensemble approach, can also handle non-linear relationships in data effectively.
The present study addresses the challenge of reducing the dimensionality of input data as a mitigation strategy. Reducing the number of input variables offers several advantages, including a reduction in the computational burden of modeling and often an enhancement of the model’s effectiveness. Feature selection primarily aims to retain only the most relevant input features while eliminating redundant information, ultimately minimizing generalization errors (Hai Ly et al. 2022).
Wetlands have been mapped extensively across the world, including North Indian mangrove forests such as the Sundarbans, but there is a lack of research on South Indian tropical mangrove forests. The primary objective of this study is to enhance the accuracy of delineating mangrove forests and various wetland classes within the Pichavaram region in South India. This enhancement is achieved through the integration of parameters derived from both optical and SAR data; by incorporating complementary optical data, the study aims to address the limitations of SAR data. The research further involves a comparative analysis of the performance of SVM and RF classifiers in the classification of wetlands. To achieve this objective, the study employs the Jeffries–Matusita (JM) distance and multicollinearity analysis to identify and retain the most significant features, which reduces data dimensionality and subsequently enhances classification accuracy. The research considers a total of five distinct scenarios: (1) only optical data, including derived indices; (2) optical data after feature subsetting; (3) only SAR data, including parameters derived from SAR; (4) SAR data after feature subsetting; and (5) both SAR and optical data integrated after feature subsetting. In addition, the study assesses these five scenarios across different seasons of the year to investigate variations in parameter behavior over time.
This study aims to develop a method to monitor spatial and temporal wetland changes in the Pichavaram area based on ML, using SAR and optical data. The initial hypothesis tested is that feature subsetting improves the performance of the classification model by reducing data redundancy. This study can be considered a framework to minimize field costs and improve processing efficiency to obtain the best wetland classification results. The results can support decision-makers in developing strategies and policies for the planning and sustainable management of the territory.
STUDY AREA
Study area
Pichavaram is renowned for its extensive mangrove forest, which is the second largest of its kind globally. However, recent studies have indicated that the mangrove forest in Pichavaram is facing degradation due to improper management practices, including unscientific approaches and cattle grazing during the rainy season. This grazing occurs precisely when the young mangrove species are at their peak growth phase (Selvam et al. 2010). The climate in this area is characterized by monthly temperature variations ranging from a minimum of 25 °C to a maximum of 31 °C during the summer season. The mean monthly rainfall fluctuates between 10 and 400 mm. The region experiences two distinct monsoon seasons: the Southwest monsoon from June to September and the Northeast monsoon from October to December. November is the wettest month, receiving the highest amount of rainfall. Relative humidity levels are around 69% during the Southwest monsoon and increase to 81% during the Northeast monsoon (Saleem Khan et al. 2014).
The dominant mangrove species in this region are Avicennia (Black mangrove) and Rhizophora (Red mangrove). Avicennia is characterized by sparse vegetation, while Rhizophora forms a dense outer ring around Avicennia. These mangroves provide habitat for diverse flora and fauna, attracting migratory birds as well.
The study area was deliberately selected to encompass a diverse range of major land cover types, including water bodies, mangroves, non-flooded vegetation, paddy fields, mudflats, fallow land, and built-up areas. It is noteworthy that four of these categories – water bodies, mangroves, paddy fields, and mudflats – are classified as wetlands in accordance with the Ramsar Convention (Ramsar Convention Secretariat 2014). According to the Ramsar Convention, wetlands are defined as ‘areas of marsh, fen, peatland or water, whether natural or artificial, permanent or temporary, with water that is static or flowing, fresh, brackish or salt, including areas of marine water the depth of which at low tide does not exceed six meters’.
Mangrove wetlands are particularly vital as they serve as sources of biological diversity, supplying essential water and primary productivity on which numerous plant and animal species rely for their survival. Paddy (rice), a common wetland crop, is a staple food for over half of the global population. Mudflats, as coastal wetlands, serve as rich foraging grounds for shorebirds during low tides and provide valuable resources for other birds and fish during high tides.
The non-flooded vegetation class encompasses various types of woody and herbaceous vegetation, as well as orchards. These areas are valuable for maintaining air quality, soil quality, and water quality, making them ecologically important.
Satellite data used
The data utilized in this study were sourced from the European Space Agency and consisted of both SAR and optical data. Specifically, Sentinel-1A SLC (single look complex) data were obtained from the Copernicus website (https://scihub.copernicus.eu/dhus/#/home), while Sentinel-1A GRD (ground range detected) and Sentinel-2A optical data were acquired from Google Earth Engine (GEE). The freely available Sentinel-1 C-band SAR data were employed to examine the backscattering response of various land cover categories and to assess the C-band's capability to differentiate wetlands from other land cover classes.
Complementing the SAR data, Sentinel-2A optical data provided valuable insights into vegetation and land cover types. Sentinel-2 offers multiple spectral bands in the near infrared (NIR), which is particularly useful for vegetation classification. Moreover, Sentinel-2 data have comparatively higher spatial and temporal resolutions, enhancing their suitability for the study. To ensure data quality, only optical data with a cloud cover of less than 20% were selected for the research. This threshold balanced data availability against data quality, ensuring that the selected imagery was suitable for analysis. In addition, a cloud mask was applied to address any cloud-affected areas within the images, further enhancing data quality and accuracy.
The study specifically leveraged the IWS mode of Sentinel-1A, which provided dual-polarized data (vertical transmit–vertical receive (VV) and vertical transmit–horizontal receive (VH)) over the study area. The data acquisition dates were planned to cover all seasons of 2018 while minimizing cloud coverage in the optical data. Both SAR and optical images were acquired within a close temporal window of each other, allowing for effective comparison and analysis. The details of the data can be found in Table 1.
| Satellite data | Dates of acquisition | Description |
|---|---|---|
| Sentinel-2A optical data | 5 February 2018; 18 April 2018; 11 July 2018; 9 September 2018; 8 November 2018 | Spatial resolution of bands: Blue, Green, and Red – 10 m; vegetation red edge (VRE) and short-wave infrared (SWIR) – 20 m. Temporal resolution: 10 days |
| Sentinel-1A SAR data | 31 January 2018; 21 April 2018; 5 July 2018; 8 September 2018; 12 November 2018 | Temporal resolution: 12 days. Frequency: 5.405 GHz (C-band). Polarization: VV and VH. Acquisition mode: interferometric wide swath (IWS). Swath width: 250 km. Incidence angle: 20–45° |
METHODOLOGY
To ensure the quality and validity of the analysis, several statistical procedures were conducted using R software; specifically, separability and multicollinearity analyses were performed to assess the characteristics of the data. For ground truth, field data were collected from 200–300 sites for each class using GPS devices, cameras, and Google Earth. The data collection phase spanned December 2018 to February 2020.
Preprocessing
The GEE archive offers preprocessed Sentinel-2A optical images. Additional preprocessing steps, such as cloud masking with the Sentinel-2 quality bands and subsetting to isolate the study area, were performed on these images. Subsequently, within the GEE platform, spectral indices capable of discriminating water, land, and vegetation were computed from the Sentinel-2A optical images. These spectral indices were derived using their respective formulas, as outlined in Table 2.
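As an illustration of the index formulas listed in Table 2, the computation can be sketched with numpy (the study used GEE; the reflectance values below are hypothetical, and the `eps` guard against division by zero is our addition):

```python
import numpy as np

def spectral_indices(blue, green, red, nir, swir):
    """Water/vegetation indices from Sentinel-2 reflectance bands (0-1)."""
    eps = 1e-10  # guard against division by zero over dark pixels
    ndvi = (nir - red) / (nir + red + eps)         # vegetation vigour
    ndwi = (green - nir) / (green + nir + eps)     # open water
    mndwi = (green - swir) / (green + swir + eps)  # water vs. built-up
    lswi = (nir - swir) / (nir + swir + eps)       # canopy/soil moisture
    return {"NDVI": ndvi, "NDWI": ndwi, "MNDWI": mndwi, "LSWI": lswi}

# Toy 2x2 reflectance patches (hypothetical, not real imagery)
blue = np.array([[0.05, 0.06], [0.05, 0.07]])
green = np.array([[0.08, 0.09], [0.07, 0.10]])
red = np.array([[0.06, 0.30], [0.05, 0.28]])
nir = np.array([[0.45, 0.35], [0.50, 0.33]])
swir = np.array([[0.20, 0.25], [0.18, 0.26]])

idx = spectral_indices(blue, green, red, nir, swir)
```

Each normalized-difference index is bounded in [−1, 1], which keeps the features comparable when stacked for classification.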
| Satellite data | Features extracted |
|---|---|
| Sentinel-2A optical features (Mahdianpari et al. 2019; Mercier et al. 2019) | Optical bands: Blue, Green, Red, VRE, NIR, SWIR. Spectral indices: normalized difference vegetation index (NDVI) = (NIR − Red)/(NIR + Red); normalized difference water index (NDWI) = (Green − NIR)/(Green + NIR); modified NDWI (MNDWI) = (Green − SWIR)/(Green + SWIR); red edge NDVI (RENDVI) = (NIR − VRE)/(NIR + VRE); ratio vegetation index (RVI) = Red/NIR; difference vegetation index (DVI) = NIR − Red; forest discriminate index (FDI) = NIR − (VRE + Blue); soil adjusted vegetation index (SAVI) = (1 + L)(NIR − Red)/(NIR + Red + L); land surface water index (LSWI) = (NIR − SWIR)/(NIR + SWIR); Blue/SWIR ratio |
| Sentinel-1A SAR features (Amani et al. 2017, 2019; Jahncke et al. 2018; Mahdianpari et al. 2019) | SAR polarizations: VV and VH. SAR parameters: VV/VH ratio; VV + VH; VV − VH; depolarization ratio; covariance matrix (C2), its trace and determinant; diagonal elements divided by the trace; degree of polarization (DoP); Shannon entropy (SE); GLCM features of the covariance matrix |
| Satellite data | Features extracted |
|---|---|
| Sentinel-2A optical features | Optical bands: Green, Red, VRE 1, VRE 2, NIR, VRE 4, SWIR 2. Spectral indices: NDVI, MNDWI, LSWI, Blue/SWIR ratio |
| Sentinel-1A SAR features | SAR polarizations: VV and VH. SAR parameters: covariance matrix elements; entropy (H); Shannon entropy (SE); GLCM dissimilarity, homogeneity, angular second moment (ASM), maximum probability, and correlation |
| Scenarios | Feb OA (%) | Feb Kappa | Apr OA (%) | Apr Kappa | Jul OA (%) | Jul Kappa | Sep OA (%) | Sep Kappa | Nov OA (%) | Nov Kappa |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM classification | | | | | | | | | | |
| Sentinel-1 images with all features | 41.45 | 0.31 | 26.00 | 0.13 | 42.21 | 0.33 | 52.43 | 0.45 | 26.67 | 0.14 |
| Sentinel-1 images after feature subset | 49.34 | 0.41 | 44.00 | 0.35 | 46.75 | 0.38 | 66.50 | 0.61 | 34.00 | 0.23 |
| Sentinel-2 images with all features | 67.10 | 0.62 | 69.33 | 0.64 | 59.09 | 0.52 | 83.98 | 0.81 | 63.33 | 0.57 |
| Sentinel-2 images after feature subset | 70.39 | 0.65 | 69.33 | 0.64 | 59.09 | 0.52 | 84.47 | 0.81 | 63.33 | 0.57 |
| SAR and optical fused image | 69.08 | 0.64 | 68.67 | 0.63 | 60.39 | 0.54 | 83.49 | 0.80 | 60.00 | 0.53 |
| RF classification | | | | | | | | | | |
| Sentinel-1 images with all features | 51.97 | 0.44 | 56.00 | 0.49 | 50.00 | 0.42 | 79.61 | 0.76 | 51.33 | 0.43 |
| Sentinel-1 images after feature subset | 55.26 | 0.48 | 60.67 | 0.54 | 53.25 | 0.46 | 81.55 | 0.77 | 52.00 | 0.44 |
| Sentinel-2 images with all features | 71.71 | 0.67 | 76.00 | 0.72 | 80.52 | 0.77 | 90.29 | 0.88 | 63.33 | 0.57 |
| Sentinel-2 images after feature subset | 70.39 | 0.65 | 82.67 | 0.80 | 82.47 | 0.80 | 92.23 | 0.91 | 64.00 | 0.58 |
| SAR and optical fused image | 72.37 | 0.68 | 81.33 | 0.78 | 83.77 | 0.81 | 93.20 | 0.92 | 68.00 | 0.63 |
The preprocessing of Sentinel-1 SLC data involved several key steps in SNAP software: radiometric calibration, orbit file application, multilooking, speckle filtering with a Lee filter, and terrain correction. Subsequently, the data were transformed into a covariance matrix using the Polarimetric Matrix Generation tool within SNAP, and a dual-polarimetric H/A/alpha decomposition was performed on the SLC image using the H-Alpha Dual Pol Decomposition tool. Furthermore, a gray-level co-occurrence matrix (GLCM) was generated through texture analysis of the raster images. To prepare for image classification, a variety of features were extracted from both optical and SAR data, as detailed in Table 2. The SAR parameters extracted included the polarizations; covariance matrix elements; the H, A, and alpha parameters; Shannon entropy (SE); and GLCM features (Amani et al. 2017). SE, with values ranging from 0 to 1, proved effective in identifying vegetation cover accurately (Betbeder et al. 2014). The degree of polarization, with values ranging from 0 to 1, is closely related to the entropy of the target (Raney 2006).
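The eigenvalue step behind the dual-pol decomposition can be illustrated numerically. The sketch below (not the SNAP implementation, which additionally estimates the alpha angle) derives entropy H and anisotropy A from the eigenvalues of a 2×2 covariance matrix built from hypothetical complex VV/VH samples:

```python
import numpy as np

def h_a_from_c2(c2):
    """Entropy (H) and anisotropy (A) from a 2x2 dual-pol covariance
    matrix, mirroring the eigen-decomposition step of SNAP's
    H-Alpha Dual Pol tool. Minimal sketch under a Gaussian clutter view."""
    lam = np.linalg.eigvalsh(c2)[::-1]   # real eigenvalues, descending
    lam = np.clip(lam, 1e-12, None)      # numerical guard
    p = lam / lam.sum()                  # pseudo-probabilities
    h = float(-(p * np.log2(p)).sum())   # 0 (deterministic) .. 1 (random)
    a = float((lam[0] - lam[1]) / (lam[0] + lam[1]))
    return h, a

# Build C2 from a small patch of hypothetical complex VV/VH samples
rng = np.random.default_rng(0)
vv = rng.normal(size=50) + 1j * rng.normal(size=50)
vh = 0.3 * (rng.normal(size=50) + 1j * rng.normal(size=50))
c2 = np.array([[np.mean(np.abs(vv) ** 2), np.mean(vv * np.conj(vh))],
               [np.mean(np.conj(vv) * vh), np.mean(np.abs(vh) ** 2)]])
h, a = h_a_from_c2(c2)
```

A fully depolarized target (equal eigenvalues) gives H = 1, while a single dominant scattering mechanism drives H toward 0.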
Recent studies have increasingly combined GLCM with Sentinel-1 and Sentinel-2 data for wetland classification. The GLCM features were computed using the texture analysis tool available in SNAP (Lu & Weng 2007). Leveraging GLCM-based image texture improved overall accuracy (OA) and urban classification accuracy, as highlighted in various studies (Chatziantoniou et al. 2017; Adeli et al. 2022; Tavus et al. 2022).
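A simplified, numpy-only illustration of how GLCM texture measures are derived (SNAP's tool supports more offsets and features; the quantization level and single horizontal offset here are illustrative assumptions):

```python
import numpy as np

def glcm_features(img, levels=8):
    """Dissimilarity, homogeneity, and ASM from a gray-level
    co-occurrence matrix for the 1-pixel horizontal offset."""
    q = np.floor(img / img.max() * (levels - 1)).astype(int)  # quantize
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1                     # count co-occurring pairs
    glcm /= glcm.sum()                      # normalize to probabilities
    r, c = np.indices(glcm.shape)
    diss = (glcm * np.abs(r - c)).sum()     # local contrast
    homo = (glcm / (1.0 + (r - c) ** 2)).sum()  # local smoothness
    asm = (glcm ** 2).sum()                 # angular second moment
    return diss, homo, asm

# Hypothetical 4x4 backscatter patch with two texture blocks
img = np.array([[0, 0, 1, 1], [0, 0, 1, 1],
                [2, 2, 3, 3], [2, 2, 3, 3]], float)
diss, homo, asm = glcm_features(img)
```

A perfectly uniform patch yields zero dissimilarity and maximal homogeneity/ASM, which is why these features help separate smooth water from textured canopy.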
Feature selection process
Feature selection involved a two-step process, encompassing separability analysis and multicollinearity analysis on the extracted features. Separability analysis was crucial for assessing the statistical distinction between different classes, thus confirming the importance of the extracted features in effectively distinguishing these classes. To perform this analysis, the Fisher (F) statistics and JM distance were employed. JM distance is widely recognized as a separability criterion for optimal band selection and assessing classification results, as documented in prior research (Sen et al. 2019).
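Under a Gaussian class-distribution assumption, the JM distance follows from the Bhattacharyya distance; a sketch (the function name and toy data are ours, not the study's):

```python
import numpy as np

def jm_distance(x1, x2):
    """Jeffries-Matusita distance between two classes' feature samples
    (rows = samples), assuming Gaussian class distributions."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)
    s = (s1 + s2) / 2.0
    d = m1 - m2
    # Bhattacharyya distance: mean-separation term + covariance term
    b = (d @ np.linalg.solve(s, d)) / 8.0 \
        + 0.5 * np.log(np.linalg.det(s)
                       / np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    return 2.0 * (1.0 - np.exp(-b))   # 0 (identical) .. 2 (fully separable)

# Two hypothetical classes in a 2-D feature space
rng = np.random.default_rng(1)
class_a = rng.normal(0, 1, (200, 2))
class_b = rng.normal(6, 1, (200, 2))
jm = jm_distance(class_a, class_b)
```

Values approaching 2 indicate that a feature pair separates the two classes almost completely, which is the criterion used here to rank candidate features.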
In the GEE platform, the RF and SVM classifiers were selected for the classification task, based on their superior performance compared with alternative methods such as artificial neural networks, maximum likelihood, and naïve Bayes. The RF classifier is easy to optimize and remains robust even with a small number of training samples (Maxwell et al. 2018). Parameter tuning is required for the optimum performance of the classifiers. For the RF classifier, the number of decision trees (ntree) was set to 100, and the number of variables per split was left at its default value, the square root of the number of input bands, which therefore varies with the scenario. For the SVM classifier, the radial basis function (RBF) kernel was chosen, the gamma value was set to 0.01, and the cost parameter to 10.
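The classifier settings above can be reproduced with scikit-learn as a stand-in for GEE's built-in classifiers (the toy feature stack is hypothetical; only the hyperparameters mirror the study):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical feature stack: rows = pixels, columns = selected bands/indices
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (150, 6)), rng.normal(3, 1, (150, 6))])
y = np.array([0] * 150 + [1] * 150)   # two toy land-cover classes

# RF: 100 trees, sqrt(n_features) variables per split, as in the study
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X, y)

# SVM: RBF kernel, gamma = 0.01, cost C = 10, as in the study
svm = SVC(kernel="rbf", gamma=0.01, C=10).fit(X, y)
```

Because `max_features="sqrt"` is evaluated per fit, the number of variables per split automatically tracks the size of each scenario's feature subset.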
RESULTS AND DISCUSSION
In the correlogram, the colors and values signify the correlation between each pair of features. When dealing with multiple correlated features (features correlated with more than one other feature), the one with the highest JM distance was chosen for further analysis to prevent redundancy. The feature subset obtained after both separability and multicollinearity analyses was utilized for the subsequent classification, as outlined in the Supplementary material.
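The redundancy-pruning rule described above (among correlated features, keep the one with the highest JM distance) can be sketched as follows; the 0.9 correlation threshold, function name, and toy feature values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import spearmanr

def prune_correlated(features, jm_scores, threshold=0.9):
    """Greedy pruning: visit features in descending JM order and drop any
    whose |Spearman rho| with an already-kept feature exceeds threshold."""
    names = sorted(features, key=lambda n: jm_scores[n], reverse=True)
    kept = []
    for name in names:                       # best-separating features first
        rhos = [abs(spearmanr(features[name], features[k])[0])
                for k in kept]
        if all(r <= threshold for r in rhos):
            kept.append(name)                # not redundant with kept set
    return kept

# Toy example: DVI is a monotone function of NDVI (rho = 1), MNDWI is not
x = np.linspace(0.0, 1.0, 60)
rng = np.random.default_rng(7)
features = {"NDVI": x, "DVI": x ** 3, "MNDWI": rng.normal(size=60)}
jm_scores = {"NDVI": 1.92, "DVI": 1.40, "MNDWI": 1.25}
kept = prune_correlated(features, jm_scores)
```

Here DVI is discarded because it is rank-correlated with the better-separating NDVI, matching the paper's observation that EVI, FDI, DVI, RVI, and SAVI were dropped for their strong correlation with NDVI.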
Table 3 outlines the optical and SAR features selected for the classification of data in the month of September. It is worth noting that the elements within the feature subset obtained after JM distance and multicollinearity analysis exhibited variations across different seasons of the year, indicating shifts in land cover corresponding to seasonal changes.
Certain spectral bands, specifically Green, Red, NIR, VRE, and SWIR, exhibited exceptional ability to differentiate between different land cover classes, surpassing the performance of other bands, as indicated in Table S1. Notably, the NDVI emerged as the most effective spectral index for distinguishing between these land cover classes. NDVI captures the absorption of visible light by chlorophyll and the reflection of NIR light during photosynthesis, making it a powerful tool for vegetation assessment.
Several other spectral indices, including the enhanced vegetation index (EVI), FDI, DVI, RVI, and SAVI, exhibited strong correlations with NDVI. Consequently, to streamline the classification process and reduce data complexity, these indices were excluded. The spectral bands Green, Red, VRE, and NIR, which are crucial for vegetation identification, consistently appeared in feature subsets across all seasons. The significance of the VRE band, known for its notable separability between vegetation and other land cover categories, aligns with previous research findings. In addition, the SWIR band consistently demonstrated remarkable ability to distinguish various vegetation classes and remained present in all datasets.
Furthermore, the study identified key spectral indices – NDVI, MNDWI, LSWI, and the Blue/SWIR ratio – that consistently appeared in feature subsets regardless of the time of year.
The study showed that SAR features exhibited a lower JM distance compared with optical features. This can be explained by C-band wave interactions, which emphasize fine spatial elements due to limited penetration through dense vegetation. Consequently, SAR features had reduced separability between land cover classes such as non-flooded vegetation, mangrove, and paddy. Although several studies have argued that the DoP is suitable for classification (Shirvany et al. 2012; Betbeder et al. 2015), this study found it to be very noisy and unsuitable for classification. Similarly, features such as GLCM contrast and mean were identified as noise contributors and excluded. Low JM distances were noted for the depolarization ratio, the VV/VH ratio, and the VV + VH sum. The VV and VH polarizations, SE, ASM, and the GLCM correlation features consistently appeared in the SAR subsets for all seasons, although the VV and VH polarizations had lower separability than the other SAR features. Derived SAR features demonstrated enhanced separability, while optical features exhibited less noise, potentially owing to their dependence on the physiological characteristics of surface features.
In Figure 4, the representation of mangrove areas across all maps was notably accurate. However, some noticeable misclassifications were observed, particularly within the mudflat and built-up classes. A significant portion of the mudflats near the shoreline was incorrectly labeled as fallow land. Furthermore, there were instances where mudflat regions were inaccurately classified as built-up areas, errors that were particularly prominent in the November map. Interestingly, in November, certain sections of the ocean were also misclassified as built-up areas. This confusion between the sea surface, particularly during the monsoon, and built-up areas in SAR data can be ascribed to similar backscatter behavior arising from surface roughness and from the intricate interactions of radar signals with three-dimensional structures. Accurate differentiation between these features in SAR imagery often necessitates advanced data processing and classification techniques.
Figure 5 provides a clear view of the misclassification of land cover classes. The SVM-classified images predominantly showed the prevalence of two classes in each image, with the specific classes varying depending on the time of the year.
Table 4 displays the classification results for both RF and SVM classifiers in each scenario. Regardless of the classifier used, the fusion of Sentinel-1 and Sentinel-2 datasets consistently yielded the highest accuracy. The use of feature subsetting has proven to be advantageous for classification tasks, especially when dealing with SAR data. In addition, the integration of fused datasets has demonstrated its potential to achieve higher accuracy compared to using only subsetted SAR data.
To assess the significance of the accuracy difference, a t-test was performed. In the case of the RF classifier, the difference between the classification results of the fused image and those from Sentinel-1 data with all features and Sentinel-1 with subsetted features is statistically significant, with p-values of 0.018 and 0.024 at the 5% significance level.
For the SVM classifier, optical data yielded markedly better performance than SAR data. The difference between the SVM-classified fused image and the classifications from Sentinel-2 data with all features and Sentinel-2 with subsetted features is also statistically significant, with p-values of 0.002 and 0.018 at the 5% significance level.
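The paper does not spell out the exact t-test variant; one plausible reading, a paired two-tailed t-test on the monthly overall accuracies from Table 4 (RF: fused image versus Sentinel-1 with all features), can be sketched as:

```python
from scipy.stats import ttest_rel

# Seasonal OA (%) for the RF classifier, from Table 4 (Feb, Apr, Jul, Sep, Nov)
oa_fused = [72.37, 81.33, 83.77, 93.20, 68.00]
oa_s1_all = [51.97, 56.00, 50.00, 79.61, 51.33]

# Paired (per-month) two-tailed t-test on the accuracy differences
t_stat, p_value = ttest_rel(oa_fused, oa_s1_all)
```

Pairing by month controls for seasonal variation, so the test assesses only the accuracy gain attributable to data fusion.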
It is crucial to observe that the degree of accuracy improvement tends to diminish as the monsoon season approaches, a trend particularly noticeable for the RF classifier. The month of November exhibited the least increase in accuracy, likely due to its association with the peak rainfall during the Northeast monsoon in the Pichavaram region, as documented in Saleem Khan et al. (2014). During this period, radar backscattering behavior tends to become similar between flooded mangrove forests and inundated paddy fields. Likewise, radar responses from wet fallow land, mudflats, and urban areas share similar characteristics.
Across all scenarios, the combined use of optical and SAR data consistently yielded the highest classification accuracy, surpassing the use of standalone SAR data for both the RF and SVM classification. Notably, the RF algorithm exhibited superior accuracy when classifying fused data, outperforming single-source SAR or optical data. It is worth mentioning that the RF classifier consistently outperformed the SVM classifier in all scenarios, a distinction that was particularly evident in both land use classification and the discrimination of mangroves in the Pichavaram region.
The analysis, centered on September 2018, demonstrated the effectiveness of feature subsetting for both optical and SAR data and highlighted the strength of the RF classifier. It also revealed accuracy variations across the different land cover classes, while the fusion of SAR and optical data delivered the best classification performance, primarily owing to their complementary geometric and spectral attributes.
Similarly, for SVM classification, using SAR data alone yielded very low accuracy, but after subsetting, the accuracy increased significantly. Fusing SAR data with optical data further improved the classification accuracy. Wetland classes such as mangrove, water, and paddy exhibited improved accuracy with fused data.
The water and mangrove classes consistently demonstrated high PA and CA values throughout the year, irrespective of whether RF or SVM classification was employed. In contrast, all other land cover classes were susceptible to seasonal variations. The study findings suggest that the dry period is the least suitable time for classifying wetland classes. The F1 score for most classes reached its peak in September, coinciding with the Southwest monsoon season. During this time, the Pichavaram mangrove forest experiences inundation, making it easier to distinguish mangroves from other land cover types.
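The PA (producer's accuracy), CA (consumer's accuracy), and F1 scores discussed above can be derived per class from a confusion matrix. A minimal sketch, using invented counts rather than the study's error matrices:

```python
# Sketch: per-class producer's accuracy (PA), consumer's/user's accuracy
# (CA), and F1 score from a confusion matrix. Rows = reference classes,
# columns = classified labels; counts are illustrative only.
confusion = {
    "mangrove": {"mangrove": 95, "water": 2, "paddy": 3},
    "water":    {"mangrove": 1, "water": 97, "paddy": 2},
    "paddy":    {"mangrove": 6, "water": 4, "paddy": 90},
}
classes = list(confusion)

for c in classes:
    true_pos = confusion[c][c]
    ref_total = sum(confusion[c].values())             # row sum (reference)
    cls_total = sum(confusion[r][c] for r in classes)  # column sum (classified)
    pa = true_pos / ref_total     # complement of omission error
    ca = true_pos / cls_total     # complement of commission error
    f1 = 2 * pa * ca / (pa + ca)  # harmonic mean of PA and CA
    print(f"{c}: PA={pa:.2f} CA={ca:.2f} F1={f1:.2f}")
```

Because F1 combines omission and commission errors, it summarizes the seasonal behavior of each class in a single score.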
Conversely, the lowest F1 scores for most classes were observed in February and November. November is the month with the highest rainfall in Pichavaram, leading to flooding in the area. The flooded conditions during this period make it challenging to differentiate wetland classes from non-wetland classes, rendering it unsuitable for classification purposes. However, wetland classes such as water bodies and mangroves consistently exhibited better and more consistent results compared to non-wetland classes.
The implementation of feature subsetting in this study has notably improved classification accuracy. Carefully selecting relevant features not only enhances computational efficiency but also boosts the accuracy of classification algorithms. Importantly, this study achieved a statistically significant increase in classification accuracy across different scenarios, setting it apart from previous efforts that did not reach this level of success, as noted by Li et al. (2016).
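One common way to implement such subsetting is to drop one feature of each highly rank-correlated pair, in the spirit of the Spearman-based multicollinearity analysis used here. The sketch below assumes invented feature names and sample values, an illustrative |rho| threshold of 0.9, and no tie handling in the ranking:

```python
# Sketch: greedy multicollinearity screening with Spearman's rho.
# A feature is kept only if its rank correlation with every
# already-kept feature stays below the (assumed) 0.9 threshold.
from statistics import mean

def spearman_rho(x, y):
    """Spearman's rank correlation (no tie correction)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

features = {  # invented per-sample feature values
    "NDVI":  [0.61, 0.70, 0.55, 0.66, 0.72],
    "EVI":   [0.48, 0.59, 0.41, 0.52, 0.60],      # tracks NDVI closely
    "VH_dB": [-18.2, -19.5, -15.9, -17.1, -16.4],  # SAR backscatter
}

kept = []
for name, values in features.items():
    if all(abs(spearman_rho(values, features[k])) < 0.9 for k in kept):
        kept.append(name)
print(kept)  # the redundant EVI is dropped
```

Pruning redundant features in this way reduces dimensionality before classification while retaining the independent information in each remaining band or index.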
However, one of the main challenges encountered in this study was that feature subsetting had an adverse impact on the classification of the less distinguishable classes, particularly those with smaller JM distances. The distinction between the mangrove–non-flooded vegetation and non-flooded vegetation–paddy classes proved to be very challenging in both SAR and optical data. This might be attributed to the limited canopy penetration ability of the Sentinel-1 C-band, resulting in volume scattering from thick vegetation canopy and paddy fields. In addition, the C-band is highly sensitive to surface roughness (Mattia et al. 1997). Consequently, volume scattering from mature paddy fields, non-flooded vegetation, and ploughed fallow lands made it difficult to differentiate these classes. It is worth noting that RF classification outperformed SVM classification with each dataset. In the case of SAR data, SVM classification exhibited low accuracy. Moreover, when using fused data, RF classification demonstrated superior classification results.
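The JM distance referenced above can be computed from the Bhattacharyya distance; for a single Gaussian-distributed feature, a minimal sketch (with invented class statistics) is shown below. JM ranges from 0 to 2, and small values flag hard-to-separate class pairs such as mangrove versus non-flooded vegetation:

```python
# Sketch: Jeffries-Matusita (JM) distance between two classes for one
# Gaussian-distributed feature, via the Bhattacharyya distance B.
# JM = 2 * (1 - exp(-B)) ranges from 0 (inseparable) to 2 (fully separable).
import math

def jm_distance(mean1, var1, mean2, var2):
    # Bhattacharyya distance for 1-D Gaussian class distributions
    b = (0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
         + 0.5 * math.log((var1 + var2) / (2 * math.sqrt(var1 * var2))))
    return 2 * (1 - math.exp(-b))

# Well-separated pair, e.g. water vs. mangrove in a vegetation index
# (class means and variances are invented for illustration)
print(jm_distance(0.05, 0.01, 0.70, 0.02))
# Poorly separated pair, e.g. mangrove vs. non-flooded vegetation
print(jm_distance(0.68, 0.02, 0.72, 0.02))
```

In the multiband case, the Bhattacharyya distance generalizes to mean vectors and covariance matrices, but the JM transform of B is the same.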
Indeed, the accuracy of SAR data improved significantly after feature subsetting. In general, optical data are considered superior to SAR data because optical sensors are highly sensitive to various physical characteristics of vegetation. However, the integration of SAR data with optical features has proven highly effective in discriminating different spatial features. Mangrove ecosystems are characterized by woody biomass and intricate aboveground structures, including expansive canopies and complex root systems. Radar imagery is well suited to capturing these structural nuances, making it a valuable complement to optical imagery. This synergy is particularly advantageous for distinguishing vegetation types that share spectral similarities with mangrove forests. Consequently, the integration of optical data from Sentinel-2 and SAR data from Sentinel-1 provides a complementary combination of structural and spectral vegetation information, ultimately enhancing mapping accuracy (Liu et al. 2021).
In addition, the study found that the RF algorithm consistently outperformed the SVM algorithm in all scenarios and is recommended for discriminating mangrove species (Maurya et al. 2021). Recent studies have indicated that the mangrove forest in Pichavaram is in a degraded state due to unscientific management practices and cattle grazing during the rainy season, which is a critical growth period for young mangrove species (Selvam et al. 2010). The results of this study can serve as an initial step for relevant authorities to formulate appropriate policies and management strategies for Pichavaram mangroves.
This study has presented a robust approach for delineating wetlands from other land cover classes, providing valuable initial insights for the improved management of these ecosystems. Leveraging the multitemporal capabilities of satellite sensors allows for repeat coverage, facilitating change detection and the assessment of wetland health over time. The feature subsetting technique adopted in this study not only improved classification accuracy but also enhanced efficiency, making it a transferable approach applicable to various regions around the world.
However, a limitation of this study lies in its inability to differentiate between various mangrove species that may share similar structural characteristics. To address this limitation, there is future potential in exploring longer wavelength SAR data, such as L- and P-band data, which could provide enhanced capabilities for discriminating between different mangrove species.
CONCLUSIONS AND FUTURE SCOPE
The primary objective of this study was to assess the impact of separability and multicollinearity analyses on RF and SVM classification, with a specific focus on the relevance of the extracted features. The results indicate that increasing the volume of data does not necessarily lead to improved classification accuracy. Instead, using pertinent data can enhance accuracy while reducing data dimensionality. The study also highlighted the significant role of seasonal variations in classification accuracy, particularly during the monsoon season. The onset of the monsoon season led to diminished improvements in accuracy, especially noticeable for the RF classifier. In November, characterized by heavy rainfall during the Northeast monsoon, accuracy improvements were reduced due to shared radar backscattering behaviors of flooded mangrove forests and inundated paddy fields. The classification challenge was compounded by the similarities in radar responses from wet fallow land, mudflats, and urban areas.
The integration of optical data from Sentinel-2 and SAR data from Sentinel-1 has demonstrated its potential to combine complementary vegetation information and significantly improve mapping accuracy. The statistical t-test analysis of the RF classification provided strong support for this, particularly with respect to the Sentinel-1 SAR data. This fusion method effectively leveraged the complementary attributes of the two data sources, proving especially advantageous for distinguishing vegetation types that share spectral similarities with mangrove forests. Notably, the RF classifier consistently outperformed the SVM classifier in all scenarios, making it a suitable choice for discriminating mangrove species and classifying land use. The RF classification showed an improvement of up to 23% over the SVM classification. The analysis also highlighted the importance of feature subsetting in enhancing classification accuracy, particularly for SAR data: with the RF classifier, the subsetted fused image yielded a 67% improvement over SAR data alone. In addition, feature subsetting reduced the data dimensionality from the initial 63 features to 23–33 features.
Overall, this study provides valuable insights into the synergies between optical and SAR data for accurate land cover classification in mangrove ecosystems. It contributes crucial information for monitoring changes in wetland ecosystems, which is essential for conservation efforts aimed at preserving these productive habitats. The results have direct implications for the sustainable monitoring, management, and conservation of the Pichavaram wetlands and similar wetlands in South India, with potential applications in conservation efforts worldwide. The fusion of data sources shows promise for advancing mapping accuracy, with the RF classifier proving to be a robust tool for effective discrimination. As our understanding of these synergies evolves, this research sets the stage for further advancements in remote sensing applications for mangrove ecosystems and beyond.
However, challenges remain in accurately classifying certain classes. For wetland classification, the use of lower frequency bands such as L- or P-band, as supported by previous research, is advisable. This study underscores the significance of feature selection in classification efforts and suggests that combining polarimetric SAR data with spectral insights from optical sensors can yield enhanced outcomes. Despite the progress made, there is room for further refinement. Future investigations should continue to explore the relevance of various features for classification and investigate the potential of multifrequency and multipolarized data. The integration of a fully polarimetric SAR dataset could provide additional insights into wetland dynamics in Pichavaram, shedding light on distinct scattering mechanisms.
ACKNOWLEDGEMENTS
This project was supported by the Space Application Centre (SAC), Ahmedabad (Grant Number: EPSA/3.1.1/2017), and received constructive suggestions from the organization. This study was a part of NASA-ISRO SAR (NISAR) projects.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.