Abstract
The aim of this study was to apply multivariate statistical techniques, namely hierarchical cluster analysis (HCA), principal component analysis (PCA) and discriminant analysis (DA), to identify the significant sources affecting water quality in Deepor Beel. Laboratory analyses for 20 water quality parameters were carried out on samples collected from 23 monitoring stations. HCA was applied to the raw data and grouped the 23 sampling locations into three clusters, i.e., sites of relatively high (HP), moderate (MP) and low pollution (LP), based on water quality similarities between the locations. PCA was then carried out separately on each cluster, yielding principal components (PCs) that provide information about the pollution factors/sources at the respective sites. The PCA for HP sites identified six PCs accounting for more than 84% of the total cumulative variance. Similarly, the PCA for LP and MP sites yielded two and five PCs, respectively, each set accounting for 100% of total cumulative variance. Finally, the raw dataset was subjected to DA. Four parameters, i.e., BOD5, COD, TSS and SO42−, were shown to account for most of the spatial variation in the wetland's water quality.
INTRODUCTION
Adverse effects on surface water quality arising from various land use and land cover (LULC) patterns in a watershed can have direct implications on the aquatic ecosystems (Lee et al. 2009; Rothwell et al. 2010; Tran et al. 2010). Many additional elements such as surface runoff and erosion, as well as various anthropogenic activities – e.g., increasing industrial and urban sprawl, along with agrarian activities and improper water resource management – contribute cumulatively to the characteristic features of the surface water in a watershed (Singh et al. 2004; Li et al. 2011; Islam et al. 2015). Wetlands are highly dynamic biological networks, enabling an extensive range of ecosystem services including water decontamination, flood management, erosion regulation, and coastline fortification, in addition to sediment and nutrient transport. Wetlands also create aesthetically attractive areas and wildlife habitats contributing to recreational, educational, and research prospects (Chapman et al. 1996; Fleming-Singer & Horne 2006; Ghermandi et al. 2010; Sabia et al. 2018). Keeping in mind the constant degradation of water quality, which leads to the decline of the natural wetland ecosystems, continuous water quality monitoring of such wetlands is necessary. This, however, results in the generation of massive databases for multiple sampling stations (spatial) and for extensive time periods (temporal), as well as large numbers of water quality and hydrologic parameters. Such databases are extremely complex. Thus, monitoring network and water quality parameter optimization, condensing them to characteristic issues without the loss of valuable data, is increasingly important.
Cluster analysis (CA) aids in categorising units (parameters) into groups (clusters) based on resemblances within classes and discrepancies between them. Data interpretation and pattern designation are CA's outcomes (Vega et al. 1998).
PCA is a very effective tool for reducing the dimensionality of a dataset comprising a considerable number of inter-dependent variables while retaining as much of the variability in the dataset as possible. This is achieved by converting the set into a new array of variables, i.e., the principal components (PCs), which are orthogonal and arranged in descending order of significance. Mathematically, the PCs are calculated from the covariance matrix, which describes the dispersion of the measured parameters, by extracting its eigenvalues and eigenvectors. The PCs are linear combinations of the original variables, weighted by the eigenvectors (Alberto et al. 2001).
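The eigenvalue step described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: it works on the correlation matrix (the covariance matrix of standardized variables), extracts only the first PC by power iteration, and uses made-up readings. The function names and the sample values are assumptions introduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix of pairwise correlations between the columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

def first_component(corr, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical readings (mg/L) at five sites; NOT data from the study.
bod = [2.0, 4.0, 6.0, 8.0, 10.0]
cod = [10.0, 18.0, 31.0, 39.0, 52.0]
do = [8.0, 6.5, 5.0, 3.0, 2.5]

corr = correlation_matrix([bod, cod, do])
eigenvalue, direction = first_component(corr)
# The trace of a correlation matrix equals the number of variables.
share = 100.0 * eigenvalue / len(corr)
print(f"PC1 eigenvalue {eigenvalue:.3f}, {share:.1f}% of total variance")
print("PC1 direction (BOD5, COD, DO):", [round(c, 3) for c in direction])
```

In this toy case BOD5 and COD enter the first component with the same sign and DO with the opposite sign, which is the pattern PCA compresses into a single axis.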
Discriminant analysis (DA) offers statistical categorisation of the samples and is accomplished with a prior understanding of the association of entities to a specific cluster (for instance, spatial or temporal sorting of a sample is identified from its monitoring site or time respectively). DA also assists in clustering samples with common properties (Alberto et al. 2001; Singh et al. 2004).
The principal objective of this study was to determine the efficacy of multivariate statistical techniques (CA, PCA and DA) by analysing and interpreting a large set of water quality data from Deepor Beel, without losing any important information. The results are expected to help in assessing the wetland's spatial water quality, which would help in establishing the principal pollution sources at different locations along the basin.
Multivariate statistical tools such as CA, PCA and DA are among the most appropriate for meaningful data reduction and for interpreting multi-component datasets, and have been used extensively to extract significant information (Vega et al. 1998; Helena et al. 2000; Alberto et al. 2001; Reghunath et al. 2002; Bengraïne & Marhaba 2003; Simeonova et al. 2003; Singh et al. 2004).
MATERIALS AND METHODOLOGY
Study area
The Deepor Beel (Figure 1) was once considered part of the Brahmaputra River in its southern embankment area, covering 40 km2. It helps regulate the flood waters emerging from the city and Brahmaputra River during the south-west monsoon, and is thus a key stormwater storage facility for the city of Guwahati. It has been recognised as being of international importance since 2002 when it was categorised under the Ramsar Convention (No. 1207), to safeguard global biodiversity, and sustain human life through ecological and hydrological functions (Bhattacharyya & Kapil 2010). The Beel is bounded by dense human populations in the north-east, and Rani and Garbhanga forest reserves in the south-east. Several suburbs and industrial complexes have been developed recently in parts of the wetland.
Apart from this, the local population frequently use the Beel as a waterway between its southern and northern edges. National Highway 37 runs through the north-western part of the wetland, separating it from the Brahmaputra River (Mozumder et al. 2014).
Recently, the wetland's water quality has become vulnerable due to anthropogenic activities including disproportionately extensive fishing and agriculture, stalking of water birds, etc. (Bhattacharyya & Kapil 2010).
Data acquisition
Analytical monitoring was carried out to assess water quality spatial variation across the wetland. Some 23 sampling sites, based on accessibility, were chosen. Sampling site coordinates were determined by Global Positioning System (GPS) – see Table 1 – and transferred to Google Earth. A map of the study area displaying all sampling locations was delineated using ArcMap 10.3.1 and an LULC map of the Beel was generated using ENVI 4.7 software (Canty 2014) (Figure 1).
GPS coordinates for sampling sites
Site . | Longitude . | Latitude . |
---|---|---|
1 | 91°39′0.70″E | 26° 7′16.15″N |
2 | 91°37′58.72″E | 26° 7′37.48″N |
3 | 91°39′12.60″E | 26° 6′51.42″N |
4 | 91°38′6.00″E | 26° 6′43.20″N |
5 | 91°39′46.29″E | 26° 7′3.47″N |
6 | 91°39′12.56″E | 26° 7′52.01″N |
7 | 91°40′11.10″E | 26° 6′53.58″N |
8 | 91°39′22.81″E | 26° 7′10.92″N |
9 | 91°40′19.68″E | 26° 6′49.56″N |
10 | 91°38′27.62″E | 26° 7′34.42″N |
11 | 91°40′22.57″E | 26° 7′11.61″N |
12 | 91°40′7.08″E | 26° 6′43.08″N |
13 | 91°39′56.16″E | 26° 6′43.44″N |
14 | 91°39′37.38″E | 26° 6′46.86″N |
15 | 91°40′31.61″E | 26° 6′48.21″N |
16 | 91°40′22.69″E | 26° 6′36.81″N |
17 | 91°38′38.40″E | 26° 7′8.40″N |
18 | 91°38′23.84″E | 26° 6′48.42″N |
19 | 91°38′9.60″E | 26° 7′4.80″N |
20 | 91°37′51.60″E | 26° 6′43.20″N |
21 | 91°37′35.89″E | 26° 6′27.40″N |
22 | 91°37′18.68″E | 26° 6′32.54″N |
23 | 91°37′0.25″E | 26° 6′16.87″N |
Weekly sampling was carried out from October 2017 to January 2018. A boat was used to collect samples from depths of up to 30 cm below the water surface. Sample collection, storage and analysis were done in accordance with APHA guidelines (APHA/AWWA/WEF 2012). Samples were analysed for a wide range of water quality parameters appropriate to water in the Beel. The 20 parameters were: pH, electrical conductivity (EC), turbidity, total alkalinity (TA), total hardness (TH), total dissolved solids (TDS), total suspended solids (TSS), total solids (TS), total Kjeldahl nitrogen (TKN), dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD5), chemical oxygen demand (COD), fluoride (F−), chloride (Cl−), nitrate (NO3−), phosphate (PO43−), sulphate (SO42−), sodium (Na+), potassium (K+) and calcium (Ca2+). All analyses were performed in triplicate. The results, and the analytical procedures used, are set out in Table 2.
Methodology adopted for estimating water quality parameters
Parameter . | Abbreviation . | Analytical procedure . | Units of measurement . |
---|---|---|---|
Dissolved oxygen | DO | Winkler's method | mg/L |
pH | pH | Digital pH meter | pH units |
Electrical conductivity | EC | Digital conductivity meter | μS/cm |
Turbidity | – | Nephelometric turbidimeter | NTU |
Total alkalinity | TA | APHA titrimetric method | mg/L as CaCO3 |
Total hardness | TH | APHA titrimetric method | mg/L as CaCO3 |
Biochemical oxygen demand | BOD5 | 5-day BOD test | mg/L |
Chemical oxygen demand | COD | Closed reflux, titrimetric method | mg/L |
Total Kjeldahl nitrogen | TKN | APHA distillation method | mg/L |
Total dissolved solids | TDS | Oven drying at 103–105 °C | mg/L |
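Total suspended solids | TSS | Oven drying at 103–105 °C | mg/L |
Total solids | TS | Oven drying at 103–105 °C | mg/L |
Fluoride | F− | Ion chromatograph (IC) | mg/L |
Chloride | Cl− | Ion chromatograph (IC) | mg/L |
Nitrates | NO3− | Ion chromatograph (IC) | mg/L |
Phosphates | PO43− | Ion chromatograph (IC) | mg/L |
Sulphates | SO42− | Ion chromatograph (IC) | mg/L |
Sodium | Na+ | Flame photometer | mg/L |
Potassium | K+ | Flame photometer | mg/L |
Calcium | Ca2+ | Flame photometer | mg/L |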
Statistical analysis of the acquired data
The data acquired for Deepor Beel were subjected to analysis using Microsoft Excel, IBM SPSS Statistics (Version 20) and STATISTICA software (De Sá 2007) to carry out the multivariate statistical analyses. HCA, PCA and DA were all carried out for determining the source(s) and governing factor(s) contributing to the contamination of the Deepor Beel.
HCA
HCA is an unsupervised pattern recognition technique used to determine the fundamental structure or principal behaviour of a dataset in the absence of prior assumptions regarding the data, so as to categorize the system objects into groups or clusters based on their closeness or similarity (Vega et al. 1998). Hierarchical agglomerative clustering is one of the most familiar methods of classifying variables, with high similarities within and high variance between the classes, into clusters. In HCA, the clusters are linked in sequence, beginning with the most similar variables and developing higher clusters until a single cluster is attained comprising all variables.
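The agglomerative procedure can be illustrated with a small, deliberately naive sketch of Ward's criterion, in which the pair of clusters whose merger adds the least within-cluster sum of squares is joined at each step. The site labels and values are hypothetical, and the brute-force search is an assumption made for readability, not the algorithm used by the statistical packages named above.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(data, n_clusters):
    """Merge the cheapest pair repeatedly until n_clusters remain.

    data maps a site label to its list of (ideally standardized) values.
    """
    clusters = [[label] for label in data]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([data[s] for s in clusters[i]],
                                 [data[s] for s in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical two-parameter profiles for six sites; NOT data from the study.
sites = {
    "A": [0.0, 0.0], "B": [0.1, 0.2],
    "C": [5.0, 5.0], "D": [5.2, 4.9],
    "E": [10.0, 0.0], "F": [9.8, 0.3],
}
print(ward_clusters(sites, 3))
```

Recording the merge cost at each step, instead of stopping at a fixed number of clusters, yields the heights from which a dendrogram is drawn.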


PCA
The PC extraction method was used to carry out the PCA by rotating the factor axis to an orthogonal simple structure for ease of analysis. The major orthogonal rotations are varimax, quartimax and equimax, and varimax was adopted in this study. The Kaiser-Meyer-Olkin (KMO) criterion was followed to determine the adequacy of sampling (Sahu et al. 2013). Bartlett's sphericity test (χ2 having degrees of freedom) was used to validate the applicability of PCA to the raw data (Gupta et al. 2005; Dalal et al. 2010). Varimax rotation was then performed on the PCs, extracted from PCA, of the normalized variables (the water quality dataset), to generate varifactors (VFs) of environmental significance.
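For reference, the two adequacy checks named above are conventionally defined as follows. These are the standard textbook forms, given as an aid and not taken from the source; the symbols N (number of observations), p (number of variables), R (correlation matrix), r_ij (correlations) and a_ij (partial correlations) are notation introduced for this sketch.

```latex
\chi^{2} = -\left[(N-1) - \frac{2p+5}{6}\right]\ln\lvert\mathbf{R}\rvert ,
\qquad \mathrm{df} = \frac{p(p-1)}{2}

\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

If all 20 parameters enter the test, p = 20 gives 20 × 19 / 2 = 190 degrees of freedom. KMO values close to 1 indicate that partial correlations are small relative to the ordinary correlations, so the variables share enough common variance for PCA to be worthwhile.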
DA






In the study, three groups (monitoring location zones) were chosen for the spatial assessment on the basis of the HCA results. n was taken as the number of analytical parameters used to designate a measure from a sampling station into a cluster (monitoring region). DA was employed on the raw water quality dataset for the standard as well as stepwise modes for the construction of the discriminant functions to evaluate the spatial variations in the water quality data. The locations (spatial) comprised the dependent variables while the assessed parameters represented the independent variables.
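For reference, DA classification functions in comparable water quality studies (e.g., Singh et al. 2004) are usually written in the form below. It is assumed here, not confirmed by the source, that this is the function whose coefficients are reported for each cluster in Table 4.

```latex
f(G_i) = k_i + \sum_{j=1}^{n} w_{ij}\, p_j
```

Here i indexes the groups G (the three clusters), k_i is the constant inherent to group i, n is the number of parameters used, w_ij is the weight coefficient assigned by DA to parameter j in group i, and p_j is the analytical value of parameter j. A sample is assigned to the group whose function returns the highest score.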
RESULTS AND DISCUSSION
Clustering of Deepor Beel monitoring stations (HCA)
CA enabled the 23 monitoring stations to be classified into three statistically significant clusters, at , based on the 20 different water quality parameters. z-transformation for the input data, squared Euclidean distance as a resemblance measure and Ward's method of linkage were adopted to cluster the sampling sites. The cluster analytical results yielded a dendrogram (Figure 2) to be used to analyse the similarities in water quality variation between the sampling locations.
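The two preprocessing choices named above, z-transformation and squared Euclidean distance, can be sketched as follows. The readings are hypothetical and the function names are assumptions introduced for illustration.

```python
import math

def z_transform(column):
    """Rescale a column to mean 0 and sample standard deviation 1."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]

def squared_euclidean(u, v):
    """Sum of squared differences between two site profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical readings at three sites; NOT data from the study.
ec = [210.0, 450.0, 330.0]   # electrical conductivity, uS/cm
ph = [6.8, 7.4, 7.1]         # pH units

raw_profiles = list(zip(ec, ph))
z_profiles = list(zip(z_transform(ec), z_transform(ph)))

# On raw values the distance is dominated by EC because of its larger scale;
# after z-transformation both parameters contribute comparably.
print("raw:", squared_euclidean(raw_profiles[0], raw_profiles[1]))
print("standardized:", squared_euclidean(z_profiles[0], z_profiles[1]))
```

Standardizing first keeps parameters measured on large numeric scales, such as EC or TDS, from dominating the clustering.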
Ward's minimum variance dendrogram of 23 sampling locations based on the water quality in Deepor Beel wetland.
Figure 3 shows the cluster's spatial distribution along the study area, based on the level of similarity between them.
Spatial distribution of the three clusters based on their resemblance in the study area.
A number of observations were made.
Cluster 1 (HP stations)
Cluster 1 included sites 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 18, 21, 22 and 23, which all had high pollution levels. They are close to the Boragaon dumpsite, as well as the industrial complexes in the south-west part of the watershed, which may be the primary pollution source.
Cluster 2 (LP stations)
Sites primarily in the wetland's central zone (1, 2 and 8), comprise a single cluster, as they are those with the lowest pollution levels. They are commonly enclosed by naturally preserved areas as well as un-urbanized zones, which could explain the limited pollution.
Cluster 3 (MP stations)
This cluster, comprising sites 5, 6, 10, 17, 19 and 20, includes mainly sites that are either adjacent to roads or contiguous to settlements in and around the wetland, which make no significant contaminant contribution to the water.
Most of the MP and HP monitoring stations are influenced by built-up areas, with industrial and domestic wastewater from nearby settlements.
The CA results indicate the possibility of having a reliable surface water classification in the entire region for the optimal design of future water sampling. They also indicate that one site from each cluster could serve to represent the water quality of the whole network for assessment, without significant data loss. This would help to speed regional water quality assessment, which in turn would reduce sampling costs significantly.
Similar studies have been performed and successfully applied since about 2000 in water quality monitoring programs (Simeonova et al. 2003; Singh et al. 2004; Zhang et al. 2009; Jung et al. 2016; Hajigholizadeh & Melesse 2017).
PCA
PCA was employed independently in relation to each of the three clusters – i.e., HP, LP and MP – derived from CA, to identify the factors influencing water quality in the Beel. The PCA for HP, LP and MP sites resulted in six, two and five PCs corresponding to 84, 100 and 100% of the total variance respectively (Table 3). Scree plots (Figure 4) for the three clustering sites were used to identify the fundamental dataset structure (Vega et al. 1998). Eigenvalues >1 were considered for the extraction of the potential factors. Six components for HP sites, two for LP sites and five for MP sites had eigenvalues exceeding 1, with all other components for the respective sites having values less than 1, and thus being excluded (Figure 4).
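The eigenvalue > 1 rule and the cumulative variance figures can be reproduced with a short helper. The eigenvalues below are hypothetical and only illustrate the arithmetic; they are not the values obtained in this study.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Keep components with eigenvalue above the threshold.

    Returns (rank, eigenvalue, % variance, cumulative % variance) tuples.
    """
    total = sum(eigenvalues)
    kept, cumulative = [], 0.0
    for rank, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        if value <= threshold:
            break
        pct = 100.0 * value / total
        cumulative += pct
        kept.append((rank, value, pct, cumulative))
    return kept

# Hypothetical eigenvalues for a five-variable correlation matrix (sum = 5).
example = [2.5, 1.5, 0.6, 0.3, 0.1]
for rank, value, pct, cumulative in retained_components(example):
    print(f"PC{rank}: eigenvalue {value}, {pct:.1f}% (cumulative {cumulative:.1f}%)")
```

For a correlation matrix the eigenvalues sum to the number of variables, so an eigenvalue of 1 corresponds to the variance of a single standardized parameter; components below that explain less than one original variable would.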
Factor loadings of experimental variables (20) on the principal components independently for the HP, LP and MP site data-sets
Variables . | PC 1 . | PC 2 . | PC 3 . | PC 4 . | PC 5 . | PC 6 . |
---|---|---|---|---|---|---|
Cluster 1 (HP sites) | | | | | | |
DO | −0.916 | | | | | |
pH | | | | | | |
EC | | | | | | |
Turbidity | | 0.732 | | | | |
TA | | | | | | |
TH | | | | | | |
BOD5 | 0.956 | | | | | |
COD | 0.963 | | | | | |
TDS | | | | | −0.862 | |
TSS | | 0.900 | | | | |
TS | | 0.846 | | | | |
F− | | | | | | |
Cl− | | 0.787 | | | | |
NO3− | | | | 0.824 | | |
PO43− | −0.700 | | | | | |
SO42− | | | | | | |
Na+ | | | 0.768 | | | |
K+ | | | | | 0.805 | |
Ca2+ | | | | | | 0.910 |
TKN | | | | | | |
% variance | 22.550 | 20.342 | 13.467 | 10.071 | 9.860 | 7.748 |
Cumulative % variance | 22.550 | 42.892 | 56.359 | 66.430 | 76.290 | 84.037 |
Cluster 2 (LP sites) | | | | | | |
DO | −0.767 | | | | | |
pH | | 0.793 | | | | |
EC | 0.988 | | | | | |
Turbidity | −0.0997 | | | | | |
TA | 0.916 | | | | | |
TH | −0.0969 | | | | | |
BOD5 | | 1.000 | | | | |
COD | | 0.822 | | | | |
TDS | | −0.746 | | | | |
TSS | 0.999 | | | | | |
TS | 0.971 | | | | | |
F− | 0.701 | 0.713 | | | | |
Cl− | 0.955 | | | | | |
NO3− | −0.948 | | | | | |
PO43− | 0.841 | | | | | |
SO42− | 0.977 | | | | | |
Na+ | | 0.979 | | | | |
K+ | | −0.930 | | | | |
Ca2+ | | 0.967 | | | | |
TKN | | −0.904 | | | | |
% variance | 59.029 | 40.971 | | | | |
Cumulative % variance | 59.029 | 100.000 | | | | |
Cluster 3 (MP sites) | | | | | | |
SO42− | 0.965 | | | | | |
TS | 0.925 | | | | | |
COD | 0.917 | | | | | |
BOD5 | 0.822 | | | | | |
TSS | 0.784 | | | | | |
TDS | 0.729 | | | | | |
pH | −0.721 | | | | | |
Ca2+ | 0.704 | | | | | |
Turbidity | | 0.974 | | | | |
DO | | 0.946 | | | | |
PO43− | | −0.893 | | | | |
Na+ | | −0.829 | | | | |
Cl− | | 0.702 | | | | |
TKN | | | 0.905 | | | |
TA | | | −0.889 | | | |
EC | | | 0.791 | | | |
F− | | | | | | |
NO3− | | | | −0.891 | | |
TH | | | | 0.887 | | |
K+ | | | | | 0.992 | |
% variance | 29.838 | 28.628 | 17.594 | 13.926 | 10.014 | |
Cumulative % variance | 29.838 | 58.466 | 76.060 | 89.986 | 100.000 | |
Scree plots from PCA for: (a) Cluster 1 {HP sites}; (b) Cluster 2 {LP sites}; (c) Cluster 3 {MP sites}.
Amongst the PCs for the HP sites, PC1 accounts for 22.5% of the total variance, with strong positive loadings on BOD5 and COD and strong negative loadings on PO43− and DO, indicating organic pollution. The inverse relationship between BOD5/COD and DO is expected, as increasing organic pollution leads to septic conditions, which reduce the DO in the water.
PC2, covering 20.3% of the total variance, exhibited strong positive loadings on turbidity, TSS, TS and Cl−. Turbidity, TSS and TS are partially inter-related and correspond in part to soil erosion, whereas the presence of Cl− may signify sewage discharges (Kelly et al. 2010).
PC3, explaining about 13.5% of the variance, showed strong positive loadings for Na+. PC4, corresponding to 10.1% of variance, indicated strong positive NO3− loadings, possibly accounted for by organic pollution from septic/domestic waste. PC5 – 9.86% of variance – displayed strong positive K+ loadings and negative TDS loadings. Finally, PC6, exhibiting the least variance at 7.7% had a strongly positive Ca2+ loading, possibly indicating soil component dissolution.
Using PCA on the LP site dataset produced only two PCs. PC1, describing about 59% of the total variance, exhibited strongly positive EC, alkalinity, TSS, TS, F−, Cl−, PO43− and SO42− loadings, with negative loadings on DO, turbidity, NO3− and hardness. This indicates the influence of non-point contamination from agriculture. Local farmers usually use fertilizers for rice cultivation, as a result of which, the wetland receives significant quantities of phosphate. PC2, explaining the remaining 41% of variance, showed strongly positive pH, BOD5, COD, F−, Na+ and Ca2+ loadings, and strongly negative TDS, K+ and TKN loadings. This is attributed to the decomposition of dead vegetation in the vicinity.
Finally, use of PCA on the dataset from the MP sites yielded five PCs. PC1, supporting about 30% of the total variance, showed strongly positive SO42−, TS, COD, BOD5, TSS, TDS and Ca2+ loadings. This component indicates the contribution of an organic pollution source along with strong surface runoff from neighbouring areas. PC2 – some 28.6% of variance – displayed strongly positive turbidity, DO and Cl− loadings, with strongly negative PO43− and Na+ loadings, which appear to be attributable to large seasonal variations in water quality.
PC3 – about 17.6% of variance – exhibited strongly positive TKN and EC loadings which are typical of agricultural runoff. PC4, pertaining to 13.9% of variance, revealed strongly positive total hardness loadings, probably arising from erosion during farming, as hardness is primarily a function of the concentrations of Ca2+ and Mg2+. PC5 (10%) showed strongly positive K+ loadings.
DA
The execution of DA required the raw water quality dataset to be clustered before processing. The dataset was first categorised into three clusters – i.e., HP, LP and MP sites – which consisted of the grouping (dependent) variables, while the measured parameters represented the independent variables.
Tables 4 and 5 represent the discriminant functions (DFs) and classification matrices (CMs), respectively, obtained from the standard and stepwise modes of DA. As can be seen in the tables, the standard mode DFs generated equivalent CMs, using the 20 discriminant parameters allocating 100% correct cases. On the other hand, the stepwise mode DA generated CMs with 87% correct assignations for four major discriminating parameters – BOD5, COD, TSS and SO42− (see Figure 6 for box plots showing spatial variations along the Deepor Beel). It gave excellent results for the spatial variation of wetland water quality and, out of a total of 20 parameters, needed only 4 (BOD5, COD, TSS and SO42−) to discriminate between the three clusters of monitoring stations. Scores of the two functions were plotted (Figure 5) and the different water quality at various locations along the Deepor Beel is very evident.
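To show how the stepwise classification functions are applied, the sketch below scores one sample against the three sets of stepwise coefficients reported in Table 4 and assigns it to the cluster with the highest score. The coefficients are copied from that table; the sample concentrations are hypothetical, and the assumption that assignment follows the maximum score reflects the usual use of such functions and is not stated in the source.

```python
# Stepwise-mode coefficients from Table 4: (BOD5, COD, TSS, SO4 2-, constant).
STEPWISE = {
    "Cluster 1": (0.486, -0.068, 0.102, -0.0121, -11.169),
    "Cluster 2": (2.138, -0.520, 0.053, 0.221, -11.816),
    "Cluster 3": (2.060, -0.495, 0.152, -0.093, -20.094),
}

def scores(bod5, cod, tss, so4):
    """Classification score of one sample for each cluster."""
    values = (bod5, cod, tss, so4)
    return {
        name: coeffs[4] + sum(w * p for w, p in zip(coeffs[:4], values))
        for name, coeffs in STEPWISE.items()
    }

def classify(bod5, cod, tss, so4):
    """Assign the sample to the cluster with the highest score."""
    result = scores(bod5, cod, tss, so4)
    return max(result, key=result.get)

# Hypothetical sample (mg/L); NOT a measurement from the study.
sample = dict(bod5=10.0, cod=40.0, tss=100.0, so4=10.0)
print(scores(**sample))
print(classify(**sample))
```

Only four measurements are needed per sample, which is the practical benefit of the stepwise mode over the 20-parameter standard mode.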
Classification functions (Equation (3)) for discriminant analysis of the spatial variations along Deepor Beel
Parameters . | Standard DA mode: Cluster 1 coeff. . | Standard DA mode: Cluster 2 coeff. . | Standard DA mode: Cluster 3 coeff. . | Stepwise DA mode: Cluster 1 coeff. . | Stepwise DA mode: Cluster 2 coeff. . | Stepwise DA mode: Cluster 3 coeff. . |
---|---|---|---|---|---|---|
DO | 139.154 | 139.314 | 149.863 | |||
pH | 1,859.670 | 1,897.098 | 1,893.091 | |||
EC | 0.033 | 0.033 | −0.001 | |||
Turbidity | −2.821 | −2.591 | −4.718 | |||
TA | −15.267 | −14.128 | −18.187 | |||
TH | −14.186 | −15.748 | −13.348 | |||
BOD5 | −44.711 | −54.251 | −34.162 | 0.486 | 2.138 | 2.060 |
COD | 18.721 | 20.477 | 16.735 | −0.068 | −0.520 | −0.495 |
TDS | 4.040 | 4.169 | 4.106 | |||
TSS | 1.738 | 1.528 | 2.157 | 0.102 | 0.053 | 0.152 |
F− | −359.818 | −382.492 | −355.646 | |||
Cl− | 17.248 | 19.089 | 16.463 | |||
NO3− | −132.937 | −150.605 | −135.993 | |||
PO43− | 148.384 | −162.804 | 192.853 | |||
SO42− | −8.384 | −7.378 | −7.981 | −0.0121 | 0.221 | −0.093 |
Na+ | 7.545 | 8.409 | 7.788 | |||
K+ | −52.360 | −57.301 | −53.282 | |||
Ca2+ | 2.474 | 2.338 | 1.991 | |||
TKN | 29,196.331 | 31,221.624 | 29,120.216 | |||
(Constant) | −6,846.747 | −6,986.171 | −7,082.359 | −11.169 | −11.816 | −20.094 |
Classification matrix for discriminant analysis of the spatial variation along Deepor Beel
Monitoring regions . | % Correct . | Assigned by DA: Cluster 1 . | Assigned by DA: Cluster 2 . | Assigned by DA: Cluster 3 . |
---|---|---|---|---|
Standard DA mode | ||||
Cluster 1 | 100.0 | 14 | 0 | 0 |
Cluster 2 | 100.0 | 0 | 3 | 0 |
Cluster 3 | 100.0 | 0 | 0 | 6 |
Total | 100.0 | 14 | 3 | 6 |
Stepwise DA mode | ||||
Cluster 1 | 85.7 | 12 | 0 | 0 |
Cluster 2 | 100.0 | 1 | 3 | 1 |
Cluster 3 | 83.3 | 1 | 0 | 5 |
Total | 87.0 | 14 | 3 | 6 |
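The percentages in the stepwise part of the classification matrix follow directly from the counts of correctly assigned sites and the cluster sizes (14, 3 and 6). A minimal check, using only figures reported in the table:

```python
# (correctly assigned sites, total sites) per cluster, stepwise DA mode.
stepwise = {"Cluster 1": (12, 14), "Cluster 2": (3, 3), "Cluster 3": (5, 6)}

def percent_correct(hits, total):
    """Share of correctly assigned sites, rounded to one decimal place."""
    return round(100.0 * hits / total, 1)

per_cluster = {name: percent_correct(h, t) for name, (h, t) in stepwise.items()}
overall = percent_correct(sum(h for h, _ in stepwise.values()),
                          sum(t for _, t in stepwise.values()))
print(per_cluster)
print(overall)
```

Overall, 20 of the 23 sites are assigned correctly, which gives the 87% quoted for the stepwise mode.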
Scatter plot for the spatial discriminant analysis of water quality variations across three clusters (DA stepwise mode).
Spatial variations of water quality for the three clusters of monitoring locations: BOD5, COD, TSS and SO42−.
CONCLUSION
This case study details the application of various multivariate statistical techniques on the water quality dataset from Deepor Beel. HCA classified the 23 sampling stations into three clusters with similar characteristics, representing sites having high, low and moderate pollution. These clusters were then subjected to independent factor analysis to reduce their dataset dimensionality.
PCs acquired using PCA indicate that the parameters accounting for the water quality variations are essentially associated with point source organic pollution – a dumpsite and an industrial complex – in the relatively heavily polluted sections, non-point source – e.g., nutrients in agricultural runoff – in low pollution sections, and organic pollution and nutrients, from domestic wastewater and industry, in relatively moderate pollution sections of the wetland.
DA helped reduce the data significantly, showing that only four parameters (BOD5, COD, TSS and SO42−) were responsible for large-scale spatial water quality variations in the wetland.
The efficacy of multivariate statistical techniques in monitoring and assessment as well as better water quality management in Deepor Beel is very evident.