The aim of this study was to apply multivariate statistical techniques – hierarchical cluster analysis (HCA), principal component analysis (PCA) and discriminant analysis (DA) – to analyse significant sources affecting water quality in Deepor Beel. Laboratory analyses for 20 water quality parameters were carried out on samples collected from 23 monitoring stations. HCA was used on the raw data, categorising the 23 sampling locations into three clusters, i.e., sites of relatively high (HP), moderate (MP) and low pollution (LP), based on water quality similarities at the sampling locations. The HCA results were then used to carry out PCA, yielding different principal components (PCs) and providing information about the respective sites' pollution factors/sources. The PCA for HP sites resulted in the identification of six PCs accounting for more than 84% of the total cumulative variance. Similarly, the PCA for LP and MP sites resulted in two and five PCs, respectively, each accounting for 100% of total cumulative variance. Finally, the raw dataset was subjected to DA. Four parameters, i.e., BOD₅, COD, TSS and SO₄²⁻, were shown to account for large spatial variations in the wetland's water quality and to exert the most influence.

Adverse effects on surface water quality arising from various land use and land cover (LULC) patterns in a watershed can have direct implications for aquatic ecosystems (Lee et al. 2009; Rothwell et al. 2010; Tran et al. 2010). Many additional elements such as surface runoff and erosion, as well as various anthropogenic activities – e.g., increasing industrial and urban sprawl, along with agrarian activities and improper water resource management – contribute cumulatively to the characteristic features of the surface water in a watershed (Singh et al. 2004; Li et al. 2011; Islam et al. 2015). Wetlands are highly dynamic biological networks that provide an extensive range of ecosystem services, including water decontamination, flood management, erosion regulation and coastline fortification, in addition to sediment and nutrient transport. Wetlands also create aesthetically attractive areas and wildlife habitats contributing to recreational, educational and research prospects (Chapman et al. 1996; Fleming-Singer & Horne 2006; Ghermandi et al. 2010; Sabia et al. 2018). Given the constant degradation of water quality, which leads to the decline of natural wetland ecosystems, continuous water quality monitoring of such wetlands is necessary. This, however, generates massive databases covering multiple sampling stations (spatial), extensive time periods (temporal) and large numbers of water quality and hydrologic parameters. Such databases are extremely complex. Thus, optimizing the monitoring network and the set of water quality parameters, condensing them to the characteristic issues without the loss of valuable data, is increasingly important.

Cluster analysis (CA) aids in categorising units (parameters) into groups (clusters) based on resemblances within classes and discrepancies between them. Data interpretation and pattern designation are CA's outcomes (Vega et al. 1998).

PCA is a very effective tool employed to reduce the dimensionality of a dataset comprising considerable numbers of inter-dependent variables while retaining the variability present in the dataset. This is achieved by converting the set into a fresh array of variables, i.e., the principal components (PCs), which are orthogonal and organized in descending order of significance. Mathematically, the PCs are calculated from the covariance matrix, which describes the dispersion of the measured parameters, to obtain eigenvalues and eigenvectors. PCs are linear combinations of the original variables and the eigenvectors (Alberto et al. 2001).

Discriminant analysis (DA) offers statistical categorisation of the samples and is accomplished with a prior understanding of the association of entities to a specific cluster (for instance, spatial or temporal sorting of a sample is identified from its monitoring site or time respectively). DA also assists in clustering samples with common properties (Alberto et al. 2001; Singh et al. 2004).

The principal objective of this study was to determine the efficacy of multivariate statistical techniques (CA, PCA and DA) by analysing and interpreting a large set of water quality data from Deepor Beel, without losing any important information. The results are expected to help in assessing the wetland's spatial water quality, which would help in establishing the principal pollution sources at different locations along the basin.

The most appropriate tools for meaningful data reduction and for understanding multi-component parameters are multivariate statistical tools such as CA, PCA and DA, which have been used extensively to extract significant information (Vega et al. 1998; Helena et al. 2000; Alberto et al. 2001; Reghunath et al. 2002; Bengraïne & Marhaba 2003; Simeonova et al. 2003; Singh et al. 2004).

Study area

The Deepor Beel (Figure 1) was once considered part of the Brahmaputra River in its southern embankment area, covering 40 km². It helps regulate the flood waters emerging from the city and Brahmaputra River during the south-west monsoon, and is thus a key stormwater storage facility for the city of Guwahati. It has been recognised as being of international importance since 2002, when it was categorised under the Ramsar Convention (No. 1207), to safeguard global biodiversity and sustain human life through ecological and hydrological functions (Bhattacharyya & Kapil 2010). The Beel is bounded by dense human populations in the north-east, and Rani and Garbhanga forest reserves in the south-east. Several suburbs and industrial complexes have been developed recently in parts of the wetland.

Figure 1

Study area (Deepor Beel), with sampling points and LULC Classes.


Apart from this, the local population frequently use the Beel as a waterway between its southern and northern edges. National Highway 37 runs through the north-western part of the wetland, separating it from the Brahmaputra River (Mozumder et al. 2014).

Recently, the wetland's water quality has become vulnerable due to anthropogenic activities including disproportionately extensive fishing and agriculture, stalking of water birds, etc. (Bhattacharyya & Kapil 2010).

Data acquisition

Analytical monitoring was carried out to assess the spatial variation in water quality across the wetland. Twenty-three sampling sites were chosen on the basis of accessibility. Sampling site coordinates were determined by Global Positioning System (GPS) – see Table 1 – and transferred to Google Earth. A map of the study area displaying all sampling locations was delineated using ArcMap 10.3.1 and an LULC map of the Beel was generated using ENVI 4.7 software (Canty 2014) (Figure 1).

Table 1

GPS coordinates for sampling sites

| Site | Longitude | Latitude |
|---|---|---|
| 1 | 91°39′0.70″E | 26°7′16.15″N |
| 2 | 91°37′58.72″E | 26°7′37.48″N |
| 3 | 91°39′12.60″E | 26°6′51.42″N |
| 4 | 91°38′6.00″E | 26°6′43.20″N |
| 5 | 91°39′46.29″E | 26°7′3.47″N |
| 6 | 91°39′12.56″E | 26°7′52.01″N |
| 7 | 91°40′11.10″E | 26°6′53.58″N |
| 8 | 91°39′22.81″E | 26°7′10.92″N |
| 9 | 91°40′19.68″E | 26°6′49.56″N |
| 10 | 91°38′27.62″E | 26°7′34.42″N |
| 11 | 91°40′22.57″E | 26°7′11.61″N |
| 12 | 91°40′7.08″E | 26°6′43.08″N |
| 13 | 91°39′56.16″E | 26°6′43.44″N |
| 14 | 91°39′37.38″E | 26°6′46.86″N |
| 15 | 91°40′31.61″E | 26°6′48.21″N |
| 16 | 91°40′22.69″E | 26°6′36.81″N |
| 17 | 91°38′38.40″E | 26°7′8.40″N |
| 18 | 91°38′23.84″E | 26°6′48.42″N |
| 19 | 91°38′9.60″E | 26°7′4.80″N |
| 20 | 91°37′51.60″E | 26°6′43.20″N |
| 21 | 91°37′35.89″E | 26°6′27.40″N |
| 22 | 91°37′18.68″E | 26°6′32.54″N |
| 23 | 91°37′0.25″E | 26°6′16.87″N |
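
Coordinates in degrees, minutes and seconds usually need converting to decimal degrees before they can be plotted in GIS software. The following is a minimal sketch of that arithmetic, not the authors' workflow; the function name `dms_to_decimal` is hypothetical, and the example uses the first coordinate pair listed in Table 1.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# First coordinate pair in Table 1: 91°39′0.70″E, 26°7′16.15″N
lon = dms_to_decimal(91, 39, 0.70, "E")
lat = dms_to_decimal(26, 7, 16.15, "N")
print(f"lat={lat:.6f}, lon={lon:.6f}")
```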

Weekly monitoring was carried out by sampling from October 2017 to January 2018. A boat was used to collect samples from depths of up to 30 cm below the water surface. Sample collection, storage and analysis were done in accordance with APHA guidelines (APHA/AWWA/WEF 2012). Samples were analysed for a wide range of water quality parameters appropriate to water in the Beel. The 20 parameters were: pH, electrical conductivity (EC), turbidity, total alkalinity (TA), total hardness (TH), total dissolved solids (TDS), total suspended solids (TSS), total solids (TS), total Kjeldahl nitrogen (TKN), dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD₅), chemical oxygen demand (COD), fluoride (F), chloride (Cl), nitrate (NO₃), phosphate (PO₄³⁻), sulphate (SO₄²⁻), sodium (Na⁺), potassium (K⁺) and calcium (Ca²⁺). All analyses were performed in triplicate. The analytical procedures used are set out in Table 2.

Table 2

Methodology adopted for estimating water quality parameters

| Parameter | Abbreviation | Analytical procedure | Units of measurement |
|---|---|---|---|
| Dissolved oxygen | DO | Winkler's method | mg/L |
| pH | pH | Digital pH meter | pH units |
| Electrical conductivity | EC | Digital conductivity meter | μS/cm |
| Turbidity | – | Nephelometric turbidimeter | NTU |
| Total alkalinity | TA | APHA titrimetric method | mg/L as CaCO₃ |
| Total hardness | TH | APHA titrimetric method | mg/L as CaCO₃ |
| Biochemical oxygen demand | BOD₅ | 5-day BOD test | mg/L |
| Chemical oxygen demand | COD | Closed reflux, titrimetric method | mg/L |
| Total Kjeldahl nitrogen | TKN | APHA distillation method | mg/L |
| Total dissolved solids | TDS | Oven drying at 103–105 °C | mg/L |
| Total suspended solids | TSS | Oven drying at 103–105 °C | mg/L |
| Total solids | TS | Oven drying at 103–105 °C | mg/L |
| Fluoride | F | Ion chromatograph (IC) | mg/L |
| Chloride | Cl | Ion chromatograph (IC) | mg/L |
| Nitrates | NO₃ | Ion chromatograph (IC) | mg/L |
| Phosphates | PO₄³⁻ | Ion chromatograph (IC) | mg/L |
| Sulphates | SO₄²⁻ | Ion chromatograph (IC) | mg/L |
| Sodium | Na⁺ | Flame photometer | mg/L |
| Potassium | K⁺ | Flame photometer | mg/L |
| Calcium | Ca²⁺ | Flame photometer | mg/L |

Statistical analysis of the acquired data

The data acquired for Deepor Beel were subjected to analysis using Microsoft Excel, IBM SPSS Statistics (Version 20) and STATISTICA software (De Sá 2007) to carry out the multivariate statistical analyses. HCA, PCA and DA were all carried out for determining the source(s) and governing factor(s) contributing to the contamination of the Deepor Beel.

HCA

HCA is an unsupervised pattern recognition technique used to determine the fundamental structure or principal behaviour of a dataset in the absence of prior assumptions regarding the data, so as to categorize the system objects into groups or clusters based on their closeness or similarity (Vega et al. 1998). Hierarchical agglomerative clustering is one of the most familiar methods of classifying variables, with high similarities within and high variance between the classes, into clusters. In HCA, the clusters are linked in sequence, beginning with the most similar variables and developing higher clusters until a single cluster is attained comprising all variables.

The clustering process yields dendrogram (tree diagram) outputs, providing visual descriptions which determine the number of clusters and describe the essential processes leading to spatial variation (Simeonova et al. 2003; Shrestha & Kazama 2007). The Euclidean distance typically provides the resemblance between two samples and can be depicted by the discrepancy between the analytical values from the samples (Singh et al. 2004). In this study, a hierarchically agglomerative CA was executed on the raw water quality dataset. Ward's method (Wilks 2011) was employed to combine clusters using the squared Euclidean distance method (Equation (1)) (Singh et al. 2004; Shrestha & Kazama 2007) to determine the distance between the target clusters.
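$$d^2(x_i, x_k) = \sum_{j=1}^{n} \left(x_{ij} - x_{kj}\right)^2 \tag{1}$$
where $x_i$ is the ith object, $x_{ij}$ represents the value of the jth variable of the ith object, and $n$ is the number of variables.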
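
The squared Euclidean distance and Ward linkage described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' SPSS/STATISTICA workflow: the data are synthetic (nine hypothetical sites, four parameters), the variables are z-transformed first, and SciPy's `linkage(..., method="ward")` is used, which merges the pair of clusters giving the smallest increase in within-cluster sum of squared Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Hypothetical data: 9 sites x 4 parameters in three well-separated groups.
centers = np.array([
    [2.0, 5.0, 10.0, 1.0],
    [8.0, 20.0, 40.0, 5.0],
    [15.0, 45.0, 90.0, 12.0],
])
X = np.vstack([c + rng.normal(scale=0.2, size=(3, 4)) for c in centers])

# z-transformation so that each parameter contributes on the same scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def squared_euclidean(a, b):
    """Squared Euclidean distance between two objects (rows)."""
    return float(np.sum((a - b) ** 2))


# Agglomerative clustering with Ward's minimum-variance criterion.
tree = linkage(Z, method="ward")

# Cut the dendrogram into three clusters.
labels = fcluster(tree, t=3, criterion="maxclust")

print("d^2(site 0, site 1):", round(squared_euclidean(Z[0], Z[1]), 4))
print("cluster labels:", labels)
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram` would draw the corresponding tree diagram.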

PCA

PCA, one of the most effective and familiar techniques available, is usually employed to reduce the dimensionality of large groups of inter-correlated variables and transform them into a reduced set of uncorrelated (independent) variables (principal components) with minimal loss of original information (Vega et al. 1998; Helena et al. 2000; Alberto et al. 2001). PC is mathematically formulated in Equation (2):
$$Q_{ij} = p_{i1}x_{1j} + p_{i2}x_{2j} + \dots + p_{in}x_{nj} \tag{2}$$
where $Q$ represents the component score, and $p$, $x$, $i$, $j$ and $n$ represent the component loading, the measured value of the variable, the component and sample numbers and the total number of variables, respectively.
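
The component scores of Equation (2) can be illustrated with a small eigen-decomposition. This is a sketch on synthetic data (14 hypothetical samples, 5 variables), assuming the loadings are taken as the eigenvector coefficients of the correlation matrix; statistical packages may scale loadings differently. Note that the score matrix below is stored as samples × components, i.e. the transpose of the $Q_{ij}$ indexing in Equation (2).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 14 samples x 5 inter-correlated variables.
latent = rng.normal(size=(14, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(14, 5))

# Standardize, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# Sort components by descending eigenvalue (descending significance).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = 100.0 * eigvals / eigvals.sum()   # % variance per PC
cumulative = np.cumsum(explained)             # cumulative % variance
retained = eigvals > 1.0                      # eigenvalue > 1 rule

# Component scores: each score is a weighted sum of the sample's variables.
Q = Z @ eigvecs

print("eigenvalues:", np.round(eigvals, 3))
print("cumulative % variance:", np.round(cumulative, 2))
print("PCs retained:", int(retained.sum()))
```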

PCs were extracted and the factor axes were then rotated to an orthogonal simple structure for ease of interpretation. The major orthogonal rotations are varimax, quartimax and equimax; varimax was adopted in this study. The Kaiser-Meyer-Olkin (KMO) criterion was followed to determine the adequacy of sampling (Sahu et al. 2013). Bartlett's sphericity test (a χ² statistic with its associated degrees of freedom) was used to validate the applicability of PCA to the raw data (Gupta et al. 2005; Dalal et al. 2010). Varimax rotation was then performed on the PCs extracted from the normalized variables (the water quality dataset) to generate varifactors (VFs) of environmental significance.
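
Bartlett's test and varimax rotation can be sketched in a few lines of NumPy. This is an illustrative implementation on synthetic data, not the SPSS/STATISTICA routines used in the study. It assumes the textbook form of Bartlett's statistic, $\chi^2 = -\left[(N-1) - (2p+5)/6\right]\ln|R|$ with $p(p-1)/2$ degrees of freedom for $N$ samples and $p$ variables, and the common SVD-based varimax iteration; the function names are hypothetical.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's chi-square statistic and degrees of freedom."""
    n_samples, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -((n_samples - 1) - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


rng = np.random.default_rng(2)
# Hypothetical dataset: 30 samples x 5 correlated variables.
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(30, 5))

chi2, dof = bartlett_sphericity(X)

# Loadings of the two leading PCs (eigenvector x sqrt(eigenvalue)).
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

rotated, rotation = varimax(loadings)
print(f"Bartlett chi2 = {chi2:.2f} with {dof} degrees of freedom")
print("varifactor loadings:\n", np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes.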

DA

DA is generally adopted to determine the variables that discriminate between two or more naturally occurring groups. Unlike CA, DA requires prior knowledge of the membership of objects in a particular group/cluster for the statistical classification of the samples. It helps in predicting the group to which each sample belongs by maximizing the distance between the clusters, i.e., the selected variables should possess strong discriminatory power between the groups/clusters. It operates on raw data and creates a discriminant function for each group, as represented by Equation (3) (Alberto et al. 2001; Singh et al. 2004).
$$f(G_i) = k_i + \sum_{j=1}^{n} w_{ij}\,p_{ij} \tag{3}$$
where $i$ indexes the groups ($G$), and $k_i$, $n$ and $w_{ij}$ represent, respectively, the constant that is characteristic of each group, the number of parameters used to categorize an array of data into each group, and the weighting coefficient allocated by DA to a given parameter ($p_{ij}$). The $w_{ij}$ maximize the distance between the dependent variables, and hence the variables with the largest weighting coefficients contribute most to the prediction of group membership.

In the study, three groups (monitoring location zones) were chosen for the spatial assessment on the basis of the HCA results. n was taken as the number of analytical parameters used to designate a measure from a sampling station into a cluster (monitoring region). DA was employed on the raw water quality dataset for the standard as well as stepwise modes for the construction of the discriminant functions to evaluate the spatial variations in the water quality data. The locations (spatial) comprised the dependent variables while the assessed parameters represented the independent variables.
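
To make Equation (3) concrete, the sketch below evaluates the three stepwise-mode classification functions using the coefficients reported in Table 4. The sample concentrations are hypothetical, and the assignment rule (choose the group with the highest function value) is the usual convention for classification functions, assumed here rather than stated in the text.

```python
# Stepwise-mode coefficients (k_i and w_ij) as reported in Table 4.
COEFFS = {
    "Cluster 1": {"const": -11.169, "BOD5": 0.486, "COD": -0.068, "TSS": 0.102, "SO4": -0.0121},
    "Cluster 2": {"const": -11.816, "BOD5": 2.138, "COD": -0.520, "TSS": 0.053, "SO4": 0.221},
    "Cluster 3": {"const": -20.094, "BOD5": 2.060, "COD": -0.495, "TSS": 0.152, "SO4": -0.093},
}


def classification_scores(sample):
    """Evaluate f(G_i) = k_i + sum_j w_ij * p_ij for every group."""
    return {
        group: c["const"] + sum(c[name] * value for name, value in sample.items())
        for group, c in COEFFS.items()
    }


def assign_group(sample):
    """Assign the sample to the group with the largest function value."""
    scores = classification_scores(sample)
    return max(scores, key=scores.get)


# Hypothetical sample (mg/L); not a measurement from the study.
sample = {"BOD5": 10.0, "COD": 40.0, "TSS": 100.0, "SO4": 20.0}
scores = classification_scores(sample)
group = assign_group(sample)

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print("assigned to:", group)
```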

Clustering of Deepor Beel monitoring stations (HCA)

CA enabled the 23 monitoring stations to be classified into three statistically significant clusters, at , based on the 20 different water quality parameters. z-transformation for the input data, squared Euclidean distance as a resemblance measure and Ward's method of linkage were adopted to cluster the sampling sites. The cluster analytical results yielded a dendrogram (Figure 2) to be used to analyse the similarities in water quality variation between the sampling locations.

Figure 2

Ward's minimum variance dendrogram of 23 sampling locations based on the water quality in Deepor Beel wetland.


Figure 3 shows the spatial distribution of the clusters across the study area, based on the level of similarity between them.

Figure 3

Spatial distribution of the three clusters based on their resemblance in the study area.


A number of observations were made.

Cluster 1 (HP stations)

Cluster 1 included sites 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 18, 21, 22 and 23, which all had high pollution levels. They are close to the Boragaon dumpsite, as well as the industrial complexes in the south-west part of the watershed, which may be the primary pollution source.

Cluster 2 (LP stations)

Sites 1, 2 and 8, located primarily in the wetland's central zone, form a single cluster with the lowest pollution levels. They are commonly enclosed by naturally preserved areas as well as un-urbanized zones, which could explain the limited pollution.

Cluster 3 (MP stations)

This cluster, comprising sites 5, 6, 10, 17, 19 and 20, includes mainly those either adjacent to roads, or contiguous to settlements in and around the wetland, which make no significant contaminant contribution to the water.

Most of the MP and HP monitoring stations are influenced by built-up areas, with industrial and domestic wastewater from nearby settlements.

The CA results indicate the possibility of having a reliable surface water classification in the entire region for the optimal design of future water sampling. They also indicate that one site from each cluster could serve to represent the water quality of the whole network for assessment, without significant data loss. This would help to speed regional water quality assessment, which in turn would reduce sampling costs significantly.

Similar studies have been performed and successfully applied since about 2000 in water quality monitoring programs (Simeonova et al. 2003; Singh et al. 2004; Zhang et al. 2009; Jung et al. 2016; Hajigholizadeh & Melesse 2017).

PCA

PCA was employed independently for each of the three clusters – i.e., HP, LP and MP – derived from CA, to identify the factors influencing water quality in the Beel. The PCA for HP, LP and MP sites resulted in six, two and five PCs corresponding to 84, 100 and 100% of the total variance, respectively (Table 3). Scree plots (Figure 4) for the three clusters of sites were used to identify the fundamental dataset structure (Vega et al. 1998). Only components with eigenvalues >1 were extracted as potential factors. Six components for HP sites, two for LP sites and five for MP sites had eigenvalues exceeding 1; all other components for the respective sites had eigenvalues below 1 and were thus excluded (Figure 4).

Table 3

Factor loadings of experimental variables (20) on the principal components independently for the HP, LP and MP site data-sets

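| Variables | PC 1 | PC 2 | PC 3 | PC 4 | PC 5 | PC 6 |
|---|---|---|---|---|---|---|
| Cluster 1 (HP sites) | | | | | | |
| DO | −0.916 | | | | | |
| pH | | | | | | |
| EC | | | | | | |
| Turbidity | | 0.732 | | | | |
| TA | | | | | | |
| TH | | | | | | |
| BOD₅ | 0.956 | | | | | |
| COD | 0.963 | | | | | |
| TDS | | | | | −0.862 | |
| TSS | | 0.900 | | | | |
| TS | | 0.846 | | | | |
| F | | | | | | |
| Cl | | 0.787 | | | | |
| NO₃ | | | | 0.824 | | |
| PO₄³⁻ | −0.700 | | | | | |
| SO₄²⁻ | | | | | | |
| Na⁺ | | | 0.768 | | | |
| K⁺ | | | | | 0.805 | |
| Ca²⁺ | | | | | | 0.910 |
| TKN | | | | | | |
| % variance | 22.550 | 20.342 | 13.467 | 10.071 | 9.860 | 7.748 |
| Cumulative % variance | 22.550 | 42.892 | 56.359 | 66.430 | 76.290 | 84.037 |
| Cluster 2 (LP sites) | | | | | | |
| DO | −0.767 | | | | | |
| pH | | 0.793 | | | | |
| EC | 0.988 | | | | | |
| Turbidity | −0.0997 | | | | | |
| TA | 0.916 | | | | | |
| TH | −0.0969 | | | | | |
| BOD₅ | | 1.000 | | | | |
| COD | | 0.822 | | | | |
| TDS | | −0.746 | | | | |
| TSS | 0.999 | | | | | |
| TS | 0.971 | | | | | |
| F | 0.701 | 0.713 | | | | |
| Cl | 0.955 | | | | | |
| NO₃ | −0.948 | | | | | |
| PO₄³⁻ | 0.841 | | | | | |
| SO₄²⁻ | 0.977 | | | | | |
| Na⁺ | | 0.979 | | | | |
| K⁺ | | −0.930 | | | | |
| Ca²⁺ | | 0.967 | | | | |
| TKN | | −0.904 | | | | |
| % variance | 59.029 | 40.971 | | | | |
| Cumulative % variance | 59.029 | 100.000 | | | | |
| Cluster 3 (MP sites) | | | | | | |
| SO₄²⁻ | 0.965 | | | | | |
| TS | 0.925 | | | | | |
| COD | 0.917 | | | | | |
| BOD₅ | 0.822 | | | | | |
| TSS | 0.784 | | | | | |
| TDS | 0.729 | | | | | |
| pH | −0.721 | | | | | |
| Ca²⁺ | 0.704 | | | | | |
| Turbidity | | 0.974 | | | | |
| DO | | 0.946 | | | | |
| PO₄³⁻ | | −0.893 | | | | |
| Na⁺ | | −0.829 | | | | |
| Cl | | 0.702 | | | | |
| TKN | | | 0.905 | | | |
| TA | | | −0.889 | | | |
| EC | | | 0.791 | | | |
| F | | | | | | |
| NO₃ | | | | −0.891 | | |
| TH | | | | 0.887 | | |
| K⁺ | | | | | 0.992 | |
| % variance | 29.838 | 28.628 | 17.594 | 13.926 | 10.014 | |
| Cumulative % variance | 29.838 | 58.466 | 76.060 | 89.986 | 100.000 | |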
Figure 4

Scree plots from PCA for: (a) Cluster 1 {HP sites}; (b) Cluster 2 {LP sites}; (c) Cluster 3 {MP sites}.


Among the PCs for the HP sites, PC1 accounts for 22.5% of the total variance, with strong positive loadings on BOD₅ and COD and strongly negative loadings on PO₄³⁻ and DO, indicating organic pollution. The inverse relationship between BOD₅/COD and DO is expected, as increasing organic pollution leads to septic conditions, which reduce the DO in the water.

PC2, covering 20.3% of the total variance, exhibited strong positive loadings on turbidity, TSS, TS and Cl. Turbidity, TSS and TS are partially inter-related and correspond in part to soil erosion, whereas the presence of Cl may signify sewage discharges (Kelly et al. 2010).

PC3, explaining about 13.5% of the variance, showed strong positive loadings for Na⁺. PC4, corresponding to 10.1% of variance, indicated strong positive NO₃ loadings, possibly accounted for by organic pollution from septic/domestic waste. PC5 – 9.86% of variance – displayed strong positive K⁺ loadings and negative TDS loadings. Finally, PC6, exhibiting the least variance at 7.7%, had a strongly positive Ca²⁺ loading, possibly indicating soil component dissolution.

Using PCA on the LP site dataset produced only two PCs. PC1, describing about 59% of the total variance, exhibited strongly positive EC, alkalinity, TSS, TS, F, Cl, PO₄³⁻ and SO₄²⁻ loadings, with negative loadings on DO, turbidity, NO₃ and hardness. This indicates the influence of non-point contamination from agriculture. Local farmers usually use fertilizers for rice cultivation, as a result of which the wetland receives significant quantities of phosphate. PC2, explaining the remaining 41% of variance, showed strongly positive pH, BOD₅, COD, F, Na⁺ and Ca²⁺ loadings, and strongly negative TDS, K⁺ and TKN loadings. This is attributed to the decomposition of dead vegetation in the vicinity.

Finally, use of PCA on the dataset from the MP sites yielded five PCs. PC1, accounting for about 30% of the total variance, showed strongly positive SO₄²⁻, TS, COD, BOD₅, TSS, TDS and Ca²⁺ loadings. This component indicates the contribution of an organic pollution source along with strong surface runoff from neighbouring areas. PC2 – some 28.6% of variance – displayed strongly positive turbidity, DO and Cl loadings, with strongly negative PO₄³⁻ and Na⁺ loadings, which appear to be attributable to large seasonal variations in water quality.

PC3 – about 17.6% of variance – exhibited strongly positive TKN and EC loadings, which are typical of agricultural runoff. PC4, pertaining to 13.9% of variance, revealed strongly positive total hardness loadings, probably arising from erosion during farming, as hardness is primarily a function of the concentrations of Ca²⁺ and Mg²⁺. PC5 (10%) showed strongly positive K⁺ loadings.

DA

The execution of DA required the raw water quality dataset to be grouped before processing. The dataset was first categorised into three clusters – i.e., HP, LP and MP sites – which constituted the grouping (dependent) variable, while the measured parameters represented the independent variables.

Tables 4 and 5 present the discriminant functions (DFs) and classification matrices (CMs), respectively, obtained from the standard and stepwise modes of DA. In standard mode, the DFs built from the 20 discriminant parameters produced a CM with 100% of cases correctly assigned. In stepwise mode, DA produced a CM with 87% correct assignations using only four discriminating parameters – BOD₅, COD, TSS and SO₄²⁻ (see Figure 6 for box plots showing their spatial variations along the Deepor Beel). Stepwise DA thus captured the spatial variation of wetland water quality well while needing only 4 of the 20 parameters to discriminate between the three clusters of monitoring stations. Scores of the two discriminant functions were plotted (Figure 5), and the differences in water quality between locations along the Deepor Beel are evident.
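
The percentages in the classification matrix follow directly from the number of sites assigned to their own cluster. The short check below reproduces the stepwise-mode figures. The cluster sizes (14, 3 and 6) come from the HCA results; the 12 correctly assigned Cluster 1 sites are read from Table 5, while the counts of 3 and 5 for Clusters 2 and 3 are inferred here from the reported percentages and are therefore assumptions.

```python
# Sites per cluster (from the HCA results).
cluster_sizes = {"Cluster 1": 14, "Cluster 2": 3, "Cluster 3": 6}

# Correctly assigned sites in stepwise mode.
# 12 is from Table 5; 3 and 5 are inferred from 100.0% and 83.3%.
correct_stepwise = {"Cluster 1": 12, "Cluster 2": 3, "Cluster 3": 5}


def percent_correct(correct, total):
    """Share of correctly classified sites, as a percentage to 1 d.p."""
    return round(100.0 * correct / total, 1)


per_cluster = {
    name: percent_correct(correct_stepwise[name], size)
    for name, size in cluster_sizes.items()
}
overall = percent_correct(sum(correct_stepwise.values()), sum(cluster_sizes.values()))

print(per_cluster)
print("overall % correct:", overall)
```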

Table 4

Classification functions (Equation (3)) for discriminant analysis of the spatial variations along Deepor Beel

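| Parameters | Standard DA: Cluster 1 coeff. | Standard DA: Cluster 2 coeff. | Standard DA: Cluster 3 coeff. | Stepwise DA: Cluster 1 coeff. | Stepwise DA: Cluster 2 coeff. | Stepwise DA: Cluster 3 coeff. |
|---|---|---|---|---|---|---|
| DO | 139.154 | 139.314 | 149.863 | | | |
| pH | 1,859.670 | 1,897.098 | 1,893.091 | | | |
| EC | 0.033 | 0.033 | −0.001 | | | |
| Turbidity | −2.821 | −2.591 | −4.718 | | | |
| TA | −15.267 | −14.128 | −18.187 | | | |
| TH | −14.186 | −15.748 | −13.348 | | | |
| BOD₅ | −44.711 | −54.251 | −34.162 | 0.486 | 2.138 | 2.060 |
| COD | 18.721 | 20.477 | 16.735 | −0.068 | −0.520 | −0.495 |
| TDS | 4.040 | 4.169 | 4.106 | | | |
| TSS | 1.738 | 1.528 | 2.157 | 0.102 | 0.053 | 0.152 |
| F | −359.818 | −382.492 | −355.646 | | | |
| Cl | 17.248 | 19.089 | 16.463 | | | |
| NO₃ | −132.937 | −150.605 | −135.993 | | | |
| PO₄³⁻ | 148.384 | −162.804 | 192.853 | | | |
| SO₄²⁻ | −8.384 | −7.378 | −7.981 | −0.0121 | 0.221 | −0.093 |
| Na⁺ | 7.545 | 8.409 | 7.788 | | | |
| K⁺ | −52.360 | −57.301 | −53.282 | | | |
| Ca²⁺ | 2.474 | 2.338 | 1.991 | | | |
| TKN | 29,196.331 | 31,221.624 | 29,120.216 | | | |
| (Constant) | −6,846.747 | −6,986.171 | −7,082.359 | −11.169 | −11.816 | −20.094 |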
Table 5

Classification matrix for discriminant analysis of the spatial variation along Deepor Beel

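| Monitoring regions | % correct | Assigned by DA: Cluster 1 | Assigned by DA: Cluster 2 | Assigned by DA: Cluster 3 |
|---|---|---|---|---|
| Standard DA mode | | | | |
| Cluster 1 | 100.0 | 14 | | |
| Cluster 2 | 100.0 | | | |
| Cluster 3 | 100.0 | | | |
| Total | 100.0 | 14 | | |
| Stepwise DA mode | | | | |
| Cluster 1 | 85.7 | 12 | | |
| Cluster 2 | 100.0 | | | |
| Cluster 3 | 83.3 | | | |
| Total | 87.0 | 14 | | |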
Figure 5

Scatter plot for the spatial discriminant analysis of water quality variations across three clusters (DA stepwise mode).

Figure 6

Spatial variations of water quality for the three clusters of monitoring locations: BOD₅, COD, TSS and SO₄²⁻.


This case study details the application of various multivariate statistical techniques on the water quality dataset from Deepor Beel. HCA classified the 23 sampling stations into three clusters with similar characteristics, representing sites having high, low and moderate pollution. These clusters were then subjected to independent factor analysis to reduce their dataset dimensionality.

PCs acquired using PCA indicate that the parameters accounting for the water quality variations are essentially associated with point source organic pollution – a dumpsite and an industrial complex – in the relatively heavily polluted sections, non-point source – e.g., nutrients in agricultural runoff – in low pollution sections, and organic pollution and nutrients, from domestic wastewater and industry, in relatively moderate pollution sections of the wetland.

DA helped reduce the data significantly, showing that only four parameters (BOD₅, COD, TSS and SO₄²⁻) were responsible for large-scale spatial water quality variations in the wetland.

The efficacy of multivariate statistical techniques in monitoring and assessment as well as better water quality management in Deepor Beel is very evident.

Alberto W. D., María del Pilar D., María Valeria A., Fabiana P. S., Cecilia H. A. & María de los Ángeles B. 2001 Pattern recognition techniques for the evaluation of spatial and temporal variations in water quality. A case study: Suquía River Basin (Córdoba–Argentina). Water Research 35, 2881–2894. https://doi.org/10.1016/S0043-1354(00)00592-3.

APHA/AWWA/WEF. 2012 Standard Methods for the Examination of Water and Wastewater, 20th edn. American Public Health Association/American Water Works Association/Water Environment Federation, Washington DC, USA.

Bengraïne K. & Marhaba T. F. 2003 Using principal component analysis to monitor spatial and temporal changes in water quality. Journal of Hazardous Materials 100, 179–195. https://doi.org/10.1016/S0304-3894(03)00104-3.

Bhattacharyya K. G. & Kapil N. 2010 Impact of urbanization on the quality of water in a natural reservoir: a case study with the Deepor Beel in Guwahati city, India. Water and Environment Journal 24, 83–96. https://doi.org/10.1111/j.1747-6593.2008.00157.x.

Canty M. J. 2014 Image Analysis, Classification and Change Detection in Remote Sensing: with Algorithms for ENVI/IDL and Python. CRC Press, Boca Raton.

Chapman L. J., Chapman C. A. & Chandler M. 1996 Wetland ecotones as refugia for endangered fishes. Biological Conservation 78, 263–270. https://doi.org/10.1016/S0006-3207(96)00030-4.

Dalal S. G., Shirodkar P. V., Jagtap T. G., Naik B. G. & Rao G. S. 2010 Evaluation of significant sources influencing the variation of water quality of Kandla creek, Gulf of Katchchh, using PCA. Environmental Monitoring and Assessment 163, 49–56. https://doi.org/10.1007/s10661-009-0815-y.

De Sá J. P. M. 2007 Applied Statistics Using SPSS, Statistica, Matlab and R. Springer Science & Business Media, Heidelberg.

Fleming-Singer M. S. & Horne A. J. 2006 Balancing wildlife needs and nitrate removal in constructed wetlands: the case of the Irvine Ranch Water District's San Joaquin Wildlife Sanctuary. Ecological Engineering 26, 147–166. https://doi.org/10.1016/j.ecoleng.2005.09.010.

Ghermandi A., Van Den Bergh J. C. J. M., Brander L. M., De Groot H. L. F. & Nunes P. A. L. D. 2010 Values of natural and human-made wetlands: a meta-analysis. Water Resources Research 46, 1–12. https://doi.org/10.1029/2010WR009071.

Gupta A. K., Gupta S. K. & Patil R. S. 2005 Statistical analyses of coastal water quality for a port and harbour region in India. Environmental Monitoring and Assessment 102, 179–200. https://doi.org/10.1007/s10661-005-6021-7.

Hajigholizadeh M. & Melesse A. M. 2017 Assortment and spatiotemporal analysis of surface water quality using cluster and discriminant analyses. Catena 151, 247–258. https://doi.org/10.1016/j.catena.2016.12.018.

Helena B., Pardo R., Vega M., Barrado E., Fernandez J. M. & Fernandez L. 2000 Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis
.
Water Research
34
,
807
816
.
https://doi.org/10.1016/S0043-1354(99)00225-0
.
Islam
M. S.
,
Ahmed
M. K.
,
Raknuzzaman
M.
,
Habibullah -Al- Mamun
M.
&
Islam
M. K.
2015
Heavy metal pollution in surface water and sediment: a preliminary assessment of an urban river in a developing country
.
Ecological Indicators
48
,
282
291
.
https://doi.org/10.1016/j.ecolind.2014.08.016
.
Jung
K. Y.
,
Lee
K.-L.
,
Im
T. H.
,
Lee
I. J.
,
Kim
S.
,
Han
K.-Y.
&
Ahn
J. M.
2016
Evaluation of water quality for the Nakdong River watershed using multivariate analysis
.
Environmental Technology & Innovation
5
,
67
82
.
https://doi.org/10.1016/j.eti.2015.12.001
.
Kelly
W. R.
,
Panno
S. V.
,
Hackley
K. C.
,
Hwang
H. H.
,
Martinsek
A. T.
&
Markus
M.
2010
Using chloride and other ions to trace sewage and road salt in the Illinois Waterway
.
Applied Geochemistry
25
(
5
),
661
673
.
https://doi.org/10.1016/j.apgeochem.2010.01.020
.
Lee
S. W.
,
Hwang
S. J.
,
Lee
S. B.
,
Hwang
H. S.
&
Sung
H. C.
2009
Landscape ecological approach to the relationships of land use patterns in watersheds to water quality characteristics
.
Landscape and Urban Planning
92
,
80
89
.
https://doi.org/10.1016/j.landurbplan.2009.02.008
.
Mozumder
C.
,
Tripathi
N. K.
&
Tipdecho
T.
2014
Ecosystem evaluation (1989–2012) of Ramsar wetland Deepor Beel using satellite-derived indices
.
Environmental Monitoring and Assessment
186
,
7909
7927
.
https://doi.org/10.1007/s10661-014-3976-2
.
Reghunath
R.
,
Murthy
T. R. S.
&
Raghavan
B. R.
2002
The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India
.
Water Research
36
,
2437
2442
.
https://doi.org/10.1016/S0043-1354(01)00490-0
.
Rothwell
J. J.
,
Dise
N. B.
,
Taylor
K. G.
,
Allott
T. E. H.
,
Scholefield
P.
,
Davies
H.
&
Neal
C.
2010
A spatial and seasonal assessment of river water chemistry across North West England
.
Science of the Total Environment
408
,
841
855
.
https://doi.org/10.1016/j.scitotenv.2009.10.041
.
Sahu
B. K.
,
Begum
M.
,
Khadanga
M. K.
,
Jha
D. K.
,
Vinithkumar
N. V.
&
Kirubagaran
R.
2013
Evaluation of significant sources influencing the variation of physico-chemical parameters in Port Blair Bay, South Andaman, India by using multivariate statistics
.
Marine Pollution Bulletin
66
,
246
251
.
https://doi.org/10.1016/j.marpolbul.2012.09.021
.
Shrestha
S.
&
Kazama
F.
2007
Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin, Japan
.
Environmental Modelling & Software
22
,
464
475
.
https://doi.org/10.1016/j.envsoft.2006.02.001
.
Simeonova
P.
,
Simeonov
V.
&
Andreev
G.
2003
Water quality study of the Struma river basin, Bulgaria (1989–1998)
.
Open Chemistry
1
,
121
136
.
https://doi.org/10.2478/BF02479264
.
Singh
K. P.
,
Malik
A.
,
Mohan
D.
&
Sinha
S.
2004
Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India) - A case study
.
Water Research
38
,
3980
3992
.
https://doi.org/10.1016/j.watres.2004.06.011
.
Tran
C. P.
,
Bode
R. W.
,
Smith
A. J.
&
Kleppel
G. S.
2010
Land-use proximity as a basis for assessing stream water quality in New York State (USA)
.
Ecological Indicators
10
,
727
733
.
https://doi.org/10.1016/j.ecolind.2009.12.002
.
Vega
M.
,
Pardo
R.
,
Barrado
E.
&
Debán
L.
1998
Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis
.
Water Research
32
,
3581
3592
.
https://doi.org/10.1016/S0043-1354(98)00138-9
.
Wilks
D. S.
2011
Cluster analysis
.
International Geophysics
100
,
603
616
.
Zhang
Q.
,
Li
Z.
,
Zeng
G.
,
Li
J.
,
Fang
Y.
,
Yuan
Q.
,
Wang
Y.
&
Ye
F.
2009
Assessment of surface water quality using multivariate statistical techniques in red soil hilly region: a case study of Xiangjiang watershed, China
.
Environmetal Monitoring and Assessment
152
,
123
131
.
https://doi.org/10.1007/s10661-008-0301-y
.