## Abstract

The aim of the present study is to identify sources of groundwater contamination in Rupnagar district, Punjab, using an integrated approach of exploratory factor analysis (EFA) and ordinary kriging (OK). For this, a 13 physico-chemical parameter data set at 14 sampling locations for a period of over 25 years was assessed. The correlation was statistically examined amongst parameters. A five-factor model is proposed which explains over 89.11% of total groundwater quality variation. Three semi-variogram models, namely exponential, Gaussian, and spherical, fitted well for the data set and are cross-validated using predictive statistics. Spatial variability maps of all the parameters and factor scores are generated and are in good agreement with each other. The variation seen in groundwater quality is mainly due to various hydrogeochemical, anthropogenic, and geogenic processes occurring in the region. Thus, this study indicated that there is need to treat the industrial and municipal wastewater before discharging it (directly/indirectly) into nearby streams and pits and to encourage sustainable agricultural practices to prevent adverse health effects and minimize further environmental degradation in the study region.

## INTRODUCTION

In a semi-arid country like India, groundwater is limited by quality rather than quantity. It has become an essential commodity and is the most threatened resource nowadays due to its overexploitation by rapidly growing urbanization and industrialization. Groundwater once polluted stays in an unusable condition for quite a long time or even hundreds of years. So the issues related to groundwater contamination are a huge problem that has caught the attention of social activists and researchers all around the world. The study of physico-chemical parameters indicates the diversity of the groundwater and orientation of the likely hydrochemical processes that take place throughout the aquifer (Sánchez-Martos *et al.* 2001). Thus, timely assessment of physico-chemical parameters involves handling a huge amount of spatial data, which helps in assessing the water quality, its potability and planning sustainable management of groundwater resources. The primary and most important tool for handling such a type of data is the geographical information system (GIS) and multivariate statistical analysis.

Multivariate statistical analysis is an unbiased data reduction technique which involves the handling and interpretation of hydrochemical parameters by pointing out significant interrelationship amongst the parameters used (Wenning & Erickson 1994). It also acts as a valuable tool for the evaluation of spatio-temporal variations and interpretation of complex water-quality data sets, apportionment of contaminant sources (natural or anthropogenic), and the design of a monitoring network for the effective management of water resources as well as for finding practical solutions to contamination problems (Machiwal & Jha 2015). Several studies have been conducted over recent years using different multivariate statistical techniques, including factor analysis (FA) and principal component analysis (PCA). All the studies showed that the FA and PCA methods are important tools to determine underlying relationships between water quality parameters and identify sources of groundwater contamination. They appear to be different varieties of the same analysis rather than two different methods. However, there is a fundamental difference between them that has an enormous effect on how to use them. PCA is a data reduction technique that explains variance in the data while reducing the number of parameters to a few uncorrelated components. In contrast, the aim of FA is to help identify underlying factors that are accountable for the correlation amongst the parameters used. Thus, both methods enable the identification of groups of parameters or individuals (Wu & Kuo 2012). Detailed explanations of these techniques are enumerated in the literature (Saager & Esselaar 1969; Ashley & Lloyd 1978; Zhang *et al.* 2009; Thuong *et al.* 2013; Kumarasamy *et al.* 2014), henceforth to avoid the unnecessary length of this paper they are not discussed here.

GIS-based geostatistical techniques help in creating surfaces incorporating the statistical properties of the measured data. Many methods are associated with geostatistics, but they are all in the kriging family. Ordinary, simple, universal, probability, indicator, and disjunctive kriging are some of the geostatistical techniques available (ESRI 2016). These kriging methods produce not only prediction surfaces but also error surfaces, thus indicating how good the predictions are. It is also considered as an important tool for autocorrelation between sampling locations (Clark 1979; Trangmar *et al.* 1986). It also helps in analysing the spatio-temporal variation of the physico-chemical parameters. For this, the spatial variability of these parameters is determined by variographic analysis (i.e. calculating experimental and theoretical semi-variograms) and by mapping these parameters using a geostatistical method (i.e. ordinary kriging (OK) in this study) (Sánchez-Martos *et al.* 2001). Detailed explanations of different geostatistical methods are enumerated in standard textbooks on geostatistics (i.e. Clark 1979; Isaaks & Srivastava 1989; Goovaerts 1997; Kitanidis 1997; Webster & Oliver 2001; Chang 2012).

Many past studies have analysed groundwater chemistry to identify the cause of groundwater contamination by applying the multivariate statistical technique and geostatistical modeling technique in segregation. To date, few studies are reported where the multivariate statistical technique is integrated with the GIS-based geostatistical modeling technique (Sánchez-Martos *et al.* 2001; Kolsi *et al.* 2013; Machiwal & Jha 2015). Singh *et al.* (2011) illustrated the usefulness of the multivariate statistical technique integrated with GIS-based deterministic modeling technique for the interpretation and assessment of water-quality variations. Most of these previous studies mainly focused on the interpretation and assessment of water-quality variations at a specific sampling location using short-term data sets. But in the present study, the groundwater quality of the study area was evaluated using long-term data sets (1990–2015) of physico-chemical parameters, which will effectively raise the efficiency and reliability of the results obtained. This will be a valuable reference for managing groundwater contamination by revealing the primary factors that affect water quality and understanding the geochemistry of the aquifer.

## MATERIALS AND METHODOLOGY

### Study area and data procurement

Rupnagar district, Punjab (76°16′26″E–76°43′21″E, 30°44′21″N–31°25′53″N) is a part of the Satluj River Basin and is located in the eastern part of the Punjab State. It covers an area of 1,414 km^{2} (Figure 1). Agriculture is an important source for the economy in the state covering almost 55% of the area (Central Groundwater Board (CGWB) 2017). The river Satluj is the chief source of water in the area. It is the longest river in the Punjab region. The climate here is semi-arid, with warm summers and cold winters. The district gets its rainfall through the south-west monsoon which contributes about 78% of the total rainfall. The general direction of groundwater in the northern part of the district is towards the south and south-easterly direction whereas in the south-eastern part of the district the flow is in the south and south-westerly direction (CGWB 2017).

CGWB is a national agency working under the Ministry of Water Resources, Government of India. It monitors and analyses data related to physico-chemical parameters of groundwater resources in the country in their chemical laboratory using standard methods for the examination of water and wastewater as given in American Public Health Association (APHA) (1998) and Bureau of Indian Standards (BIS) IS: 3025 (2004). Thirteen parameters were selected that have continuity in their data set for a period of over 25 years (1990–2015) for the 14 sampling locations that the study area covers. These parameters are pH, electrical conductivity (EC), chloride (Cl^{−}), nitrate (NO_{3}^{−}), sulphate (SO_{4}^{2−}), total hardness (TH), potassium (K^{+}), sodium (Na^{+}), calcium (Ca^{2+}), silica (SiO_{2}), magnesium (Mg^{2+}), bicarbonate (HCO_{3}^{−}), and fluoride (F^{−}). All concentrations (except pH) are in mg/l, and EC is in µS/cm at 25 °C. The sampling locations from which the data have been taken include various dug wells and borewells in the study region. The descriptive statistics and acceptable limits as per BIS IS: 10500 (1991) for the various parameters that are analysed are listed in Table 1. From the descriptive statistics result it was apparent that the concentrations of NO_{3}, TH, Mg^{2+} and F^{−} well exceeded the acceptable limits of 45 mg/l, 200 mg/l, 30 mg/l and 1 mg/l respectively. Geographic coordinates of each sampling location were linked to the quality data of various parameters using ArcGIS 10.4 Software.

Physico-chemical parameters . | Min. . | Max. . | Mean . | Std. dev. . | BIS acceptable limits (as per IS: 10500) . |
---|---|---|---|---|---|

pH | 7.61 | 7.88 | 7.74 | 0.08 | 6.5–8.5 |

EC | 439.38 | 1,159.46 | 710.29 | 197.99 | – |

Cl^{−} | 20.25 | 123.15 | 48.10 | 28.46 | 250 |

NO_{3}^{−} | 2.98 | 78.76 | 27.32 | 23.29 | 45 |

SO_{4}^{2−} | 9.77 | 153.38 | 51.72 | 36.15 | 200 |

TH | 183.92 | 343.92 | 251.65 | 41.46 | 200 |

K^{+} | 1.91 | 39.98 | 11.53 | 10.51 | – |

Na^{+} | 11.78 | 205.92 | 63.47 | 44.73 | – |

Ca^{2+} | 33.81 | 67.15 | 53.50 | 8.21 | 75 |

SiO_{2} | 22.00 | 28.75 | 25.71 | 2.07 | – |

Mg^{2+} | 13.34 | 46.85 | 28.34 | 10.62 | 30 |

HCO_{3}^{−} | 193.50 | 376.23 | 298.25 | 60.34 | – |

F^{−} | 0.12 | 4.90 | 0.77 | 1.24 | 1 |

Physico-chemical parameters . | Min. . | Max. . | Mean . | Std. dev. . | BIS acceptable limits (as per IS: 10500) . |
---|---|---|---|---|---|

pH | 7.61 | 7.88 | 7.74 | 0.08 | 6.5–8.5 |

EC | 439.38 | 1,159.46 | 710.29 | 197.99 | – |

Cl^{−} | 20.25 | 123.15 | 48.10 | 28.46 | 250 |

NO_{3}^{−} | 2.98 | 78.76 | 27.32 | 23.29 | 45 |

SO_{4}^{2−} | 9.77 | 153.38 | 51.72 | 36.15 | 200 |

TH | 183.92 | 343.92 | 251.65 | 41.46 | 200 |

K^{+} | 1.91 | 39.98 | 11.53 | 10.51 | – |

Na^{+} | 11.78 | 205.92 | 63.47 | 44.73 | – |

Ca^{2+} | 33.81 | 67.15 | 53.50 | 8.21 | 75 |

SiO_{2} | 22.00 | 28.75 | 25.71 | 2.07 | – |

Mg^{2+} | 13.34 | 46.85 | 28.34 | 10.62 | 30 |

HCO_{3}^{−} | 193.50 | 376.23 | 298.25 | 60.34 | – |

F^{−} | 0.12 | 4.90 | 0.77 | 1.24 | 1 |

Units in mg/l except for pH and EC (µS).

### Exploratory factor analysis (EFA)

*A*

*=*(

*A*

_{1},

*A*

_{2}, …,

*A*

_{n}) is the observation with

*n*variables, and

*y*

*=*(

*Y*

_{1},

*Y*

_{2}, …,

*Y*

_{m}) is the factor matrix for the

*m*number of factors,

*x*

*=*(

*X*

_{11},

*X*

_{12}, …,

*X*) is the factor weight matrix, and

_{nm}*r*

*=*(

*R*

_{1},

*R*

_{2}, …,

*R*) is the residual errors (Kim & Seo 2015). The equation is derived based on two assumptions. First, the error terms do not depend on each other, such that M(

_{m}*r*) = 0 and Var(

_{i}*r*) =

_{i}*μ*

_{i}, where M denotes the variable mean value for all the observed data set, Var and

*μ*

_{i}are the variance and specificity respectively. Second, the common factors

*Y*are independent of one another and of the error terms, such that M(

_{i}*r*) = 0 and Var(

_{i}*r*) = 1.

_{i}EFA follows the following procedure. (a) Normalize the raw data to make the data dimensionless and remove the influence of different units of measurement. Thus, the *z*-scale transformation was performed to standardize the raw data. (b) Parameters are checked for sampling adequacy by performing these tests, namely, the Kaiser–Meyer–Olkin (KMO) test and Bartlett's test of sphericity (BTS). So in the data set used, the KMO test value (should be ≥0.5) came out to be 0.627, indicating the sample is adequate and the data are suited for EFA. As seen from BTS, the chi-square value (*χ*^{2}) of the correlation matrix came out to be 170.814. This value is greater than critical *χ*^{2} = 99.617 (*P* = 0.05 and 78 DOF), indicating correlation amongst the parameters and allowing us to explain the variability in the data set with a lesser number of parameters (or simply factors). (c) Obtain Pearson's correlation matrix between *n* variables. It tells about the inter-parameter relationship between each parameter by giving the initial factor loadings which fit the observed ones as closely as possible (Table 2). (d) Find out the number of factors to be extracted by calculating eigenvalues and eigenvectors of the correlation matrix. For this the Kaiser criterion is followed, which retains only those factors whose eigenvalues exceed 1 (Kaiser 1958) and discards those factors having eigenvalues lower than 1 (Kim & Mueller 1978; Basilevsky 1994), thus decreasing the dimensionality of the original data space by means of EFA. (e) If the factor loadings which are extracted in the third step do not provide any reasonable explanation of the results, go for rotation. For this, varimax rotation as proposed by Kaiser (1958) is used in the present study. In varimax rotation, only those factors are detected which are related to some variables, as opposed to quartimax rotation, which detects factors influencing all the variables. Henceforth, post rotation only those parameters are considered and retained for analysis whose factor loadings are strong so that sufficient variance from the variable is extracted by factor (Table 3). Liu *et al.* (2003) categorized factor loadings as ‘strong’, ‘moderate’, and ‘weak’ analogous to loading values of ‘ > 0.75’, ‘0.75–0.50’, and ‘0.50–0.40’ respectively. The STATISTICA 10.0 Software package was used to carry out all the aforementioned processes using the principal component method of factor analysis (or simply PCM).

. | pH . | EC . | Cl^{−}
. | NO_{3}^{−}
. | SO_{4}^{2−}
. | TH . | K^{+}
. | Na^{+}
. | Ca^{2+}
. | SiO_{2}
. | Mg^{2+}
. | HCO_{3}^{−}
. | F^{−}
. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

pH | 0.96 | ||||||||||||

EC | −0.31 | 0.99 | |||||||||||

Cl^{−} | −0.25 | 0.91** | 0.99 | ||||||||||

NO_{3}^{−} | −0.10 | 0.01 | −0.02 | 0.97 | |||||||||

SO_{4}^{2−} | −0.35 | 0.88** | 0.83** | −0.04 | 0.99 | ||||||||

TH | −0.29 | 0.57* | 0.49 | 0.24 | 0.36 | 0.99 | |||||||

K^{+} | −0.10 | 0.42 | 0.05 | 0.12 | 0.32 | 0.17 | 0.98 | ||||||

Na^{+} | −0.30 | 0.88** | 0.87** | −0.21 | 0.85** | 0.17 | 0.22 | 0.99 | |||||

Ca^{2+} | −0.09 | −0.32 | −0.38 | 0.57* | −0.44 | 0.45 | 0.05 | −0.67** | 0.99 | ||||

SiO_{2} | −0.47 | 0.10 | 0.12 | 0.06 | −0.09 | 0.41 | −0.18 | 0.01 | 0.18 | 0.91 | |||

Mg^{2+} | −0.35 | 0.83** | 0.76** | 0.02 | 0.78** | 0.77** | 0.19 | 0.63** | −0.15 | 0.26 | 0.99 | ||

HCO_{3}^{−} | −0.11 | 0.55* | 0.29 | −0.29 | 0.24 | 0.36 | 0.53* | 0.45 | −0.14 | 0.18 | 0.40 | 0.97 | |

F^{−} | −0.06 | 0.20 | 0.33 | −0.16 | 0.07 | 0.45 | −0.20 | 0.05 | 0.10 | 0.25 | 0.28 | 0.10 | 0.87 |

. | pH . | EC . | Cl^{−}
. | NO_{3}^{−}
. | SO_{4}^{2−}
. | TH . | K^{+}
. | Na^{+}
. | Ca^{2+}
. | SiO_{2}
. | Mg^{2+}
. | HCO_{3}^{−}
. | F^{−}
. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

pH | 0.96 | ||||||||||||

EC | −0.31 | 0.99 | |||||||||||

Cl^{−} | −0.25 | 0.91** | 0.99 | ||||||||||

NO_{3}^{−} | −0.10 | 0.01 | −0.02 | 0.97 | |||||||||

SO_{4}^{2−} | −0.35 | 0.88** | 0.83** | −0.04 | 0.99 | ||||||||

TH | −0.29 | 0.57* | 0.49 | 0.24 | 0.36 | 0.99 | |||||||

K^{+} | −0.10 | 0.42 | 0.05 | 0.12 | 0.32 | 0.17 | 0.98 | ||||||

Na^{+} | −0.30 | 0.88** | 0.87** | −0.21 | 0.85** | 0.17 | 0.22 | 0.99 | |||||

Ca^{2+} | −0.09 | −0.32 | −0.38 | 0.57* | −0.44 | 0.45 | 0.05 | −0.67** | 0.99 | ||||

SiO_{2} | −0.47 | 0.10 | 0.12 | 0.06 | −0.09 | 0.41 | −0.18 | 0.01 | 0.18 | 0.91 | |||

Mg^{2+} | −0.35 | 0.83** | 0.76** | 0.02 | 0.78** | 0.77** | 0.19 | 0.63** | −0.15 | 0.26 | 0.99 | ||

HCO_{3}^{−} | −0.11 | 0.55* | 0.29 | −0.29 | 0.24 | 0.36 | 0.53* | 0.45 | −0.14 | 0.18 | 0.40 | 0.97 | |

F^{−} | −0.06 | 0.20 | 0.33 | −0.16 | 0.07 | 0.45 | −0.20 | 0.05 | 0.10 | 0.25 | 0.28 | 0.10 | 0.87 |

**Correlation is significant at 0.01 level (two-tailed).

*Correlation is significant at 0.05 level (two-tailed).

Parameters . | Factor 1 . | Factor 2 . | Factor 3 . | Factor 4 . | Factor 5 . |
---|---|---|---|---|---|

pH | −0.295 | −0.131 | 0.146 | −0.036 | − 0.823 |

EC | 0.906 | 0.005 | 0.154 | 0.361 | 0.093 |

Cl^{−} | 0.930 | −0.056 | 0.260 | −0.020 | 0.067 |

NO_{3}^{−} | 0.038 | 0.899 | −0.156 | −0.106 | 0.036 |

SO_{4}^{2−} | 0.960 | −0.009 | −0.063 | 0.106 | 0.014 |

TH | 0.381 | 0.456 | 0.656 | 0.286 | 0.273 |

K^{+} | 0.184 | 0.198 | −0.255 | 0.849 | −0.110 |

Na^{+} | 0.898 | −0.336 | −0.097 | 0.148 | 0.094 |

Ca^{2+} | −0.468 | 0.773 | 0.294 | 0.115 | 0.137 |

SiO_{2} | −0.053 | 0.002 | 0.323 | −0.009 | 0.850 |

Mg^{2+} | 0.789 | 0.123 | 0.358 | 0.200 | 0.221 |

HCO_{3}^{−} | 0.218 | −0.289 | 0.208 | 0.839 | 0.151 |

F^{−} | 0.119 | −0.096 | 0.861 | −0.120 | 0.041 |

Eigenvalues | 4.586 | 1.895 | 1.749 | 1.751 | 1.602 |

% Explained variance | 35.280 | 14.579 | 13.456 | 13.470 | 12.325 |

% Cumulative variance^{a} | 35.280 | 49.860 | 63.315 | 76.785 | 89.110 |

Parameters . | Factor 1 . | Factor 2 . | Factor 3 . | Factor 4 . | Factor 5 . |
---|---|---|---|---|---|

pH | −0.295 | −0.131 | 0.146 | −0.036 | − 0.823 |

EC | 0.906 | 0.005 | 0.154 | 0.361 | 0.093 |

Cl^{−} | 0.930 | −0.056 | 0.260 | −0.020 | 0.067 |

NO_{3}^{−} | 0.038 | 0.899 | −0.156 | −0.106 | 0.036 |

SO_{4}^{2−} | 0.960 | −0.009 | −0.063 | 0.106 | 0.014 |

TH | 0.381 | 0.456 | 0.656 | 0.286 | 0.273 |

K^{+} | 0.184 | 0.198 | −0.255 | 0.849 | −0.110 |

Na^{+} | 0.898 | −0.336 | −0.097 | 0.148 | 0.094 |

Ca^{2+} | −0.468 | 0.773 | 0.294 | 0.115 | 0.137 |

SiO_{2} | −0.053 | 0.002 | 0.323 | −0.009 | 0.850 |

Mg^{2+} | 0.789 | 0.123 | 0.358 | 0.200 | 0.221 |

HCO_{3}^{−} | 0.218 | −0.289 | 0.208 | 0.839 | 0.151 |

F^{−} | 0.119 | −0.096 | 0.861 | −0.120 | 0.041 |

Eigenvalues | 4.586 | 1.895 | 1.749 | 1.751 | 1.602 |

% Explained variance | 35.280 | 14.579 | 13.456 | 13.470 | 12.325 |

% Cumulative variance^{a} | 35.280 | 49.860 | 63.315 | 76.785 | 89.110 |

^{a}The percentage of variance accounted for by the first *n* components.

### Geostatistical technique used, cross-validation, and goodness of fit

OK is one of the geostatistical methods used for spatial interpolation and includes autocorrelation. It can also be utilized for mapping spatial variations using spatial correlations between the sampled points (Lee *et al.* 2007). As compared with other geostatistical methods, the OK interpolation method is moderately fast and accurate and is better than deterministic methods (i.e. inverse distance weighted, spline interpolation method, etc.) in measuring prediction error with variance (Chang 2012; Environmental Science Research Institute (ESRI) 2016). Oliver & Webster (1990) and Stein (1999) stated kriging to be a multistep process, which includes exploratory statistical analysis of data, semi-variogram modeling, and spatial variability map creation. Predictions at each sampling location in the study area are done based on the semi-variogram and on the arrangement of spatially measured values that are close to each other. The only limitation lies in the case of outliers and nonstationary data (Weise 2001).

Cross-validation based on predictive statistics was carried out to determine the best-fit semi-variogram models (Stein 1999; Gorai & Kumar 2013; ESRI 2016; Chaudhry *et al.* 2019). For this, a comparison between the average standard error (ASE), mean error (ME), mean square error (MSE), root mean square error (RMSE), and root mean square standardized error (RMSSE) values was done (Table 4). As per ESRI (2016), best-fit models are those which attain RMSSE close to unity and result in the minimum values of ASE, ME, MSE, and RMSE when compared with other models. If the RMSE values of two methods are equal, then the MSE value is considered for determining the best-fit method (Hooshmand *et al.* 2011). Johnston *et al.* (2001) stated that, on examining the cross-validation results if the ASE value is close to the RMSE value, then the prediction standard errors are appropriate and one can adopt that model. Thus, three semi-variogram models are used in this study for physico-chemical parameters and factor scores (Table 5):

Parameters/Factor scores . | Model used . | ME . | MSE . | RMSSE . | RMSE . | ASE . |
---|---|---|---|---|---|---|

pH | Spherical | −0.003 | −0.024 | 0.998 | 0.085 | 0.086 |

EC | Exponential | 1.592 | −0.008 | 0.991 | 209.634 | 209.929 |

Cl^{−} | Exponential | 0.480 | −0.090 | 0.998 | 29.591 | 29.891 |

NO_{3}^{−} | Spherical | −1.398 | −0.030 | 0.995 | 21.318 | 22.793 |

SO_{4}^{2−} | Gaussian | 0.856 | −0.055 | 0.990 | 35.860 | 38.089 |

TH | Gaussian | −1.690 | −0.037 | 0.998 | 45.244 | 45.293 |

K^{+} | Exponential | −0.378 | −0.201 | 0.984 | 12.297 | 14.900 |

Na^{+} | Exponential | 0.561 | −0.081 | 0.998 | 57.628 | 58.531 |

Ca^{2+} | Spherical | −0.228 | −0.023 | 0.996 | 9.206 | 9.409 |

SiO_{2} | Exponential | −0.077 | −0.034 | 0.992 | 2.158 | 2.196 |

Mg^{2+} | Spherical | −0.158 | −0.015 | 0.990 | 11.422 | 11.612 |

HCO_{3}^{−} | Spherical | 1.102 | −0.027 | 0.999 | 66.524 | 67.263 |

F^{−} | Spherical | 0.091 | −0.137 | 0.995 | 1.348 | 1.351 |

Factor score 1 | Gaussian | 0.026 | 0.020 | 0.982 | 1.002 | 1.030 |

Factor score 2 | Exponential | 0.011 | 0.005 | 0.983 | 0.972 | 0.984 |

Factor score 3 | Exponential | −0.010 | −0.011 | 0.995 | 1.077 | 1.098 |

Factor score 4 | Gaussian | 0.022 | 0.018 | 0.999 | 1.073 | 1.088 |

Factor score 5 | Exponential | −0.026 | −0.024 | 0.991 | 1.039 | 1.049 |

Parameters/Factor scores . | Model used . | ME . | MSE . | RMSSE . | RMSE . | ASE . |
---|---|---|---|---|---|---|

pH | Spherical | −0.003 | −0.024 | 0.998 | 0.085 | 0.086 |

EC | Exponential | 1.592 | −0.008 | 0.991 | 209.634 | 209.929 |

Cl^{−} | Exponential | 0.480 | −0.090 | 0.998 | 29.591 | 29.891 |

NO_{3}^{−} | Spherical | −1.398 | −0.030 | 0.995 | 21.318 | 22.793 |

SO_{4}^{2−} | Gaussian | 0.856 | −0.055 | 0.990 | 35.860 | 38.089 |

TH | Gaussian | −1.690 | −0.037 | 0.998 | 45.244 | 45.293 |

K^{+} | Exponential | −0.378 | −0.201 | 0.984 | 12.297 | 14.900 |

Na^{+} | Exponential | 0.561 | −0.081 | 0.998 | 57.628 | 58.531 |

Ca^{2+} | Spherical | −0.228 | −0.023 | 0.996 | 9.206 | 9.409 |

SiO_{2} | Exponential | −0.077 | −0.034 | 0.992 | 2.158 | 2.196 |

Mg^{2+} | Spherical | −0.158 | −0.015 | 0.990 | 11.422 | 11.612 |

HCO_{3}^{−} | Spherical | 1.102 | −0.027 | 0.999 | 66.524 | 67.263 |

F^{−} | Spherical | 0.091 | −0.137 | 0.995 | 1.348 | 1.351 |

Factor score 1 | Gaussian | 0.026 | 0.020 | 0.982 | 1.002 | 1.030 |

Factor score 2 | Exponential | 0.011 | 0.005 | 0.983 | 0.972 | 0.984 |

Factor score 3 | Exponential | −0.010 | −0.011 | 0.995 | 1.077 | 1.098 |

Factor score 4 | Gaussian | 0.022 | 0.018 | 0.999 | 1.073 | 1.088 |

Factor score 5 | Exponential | −0.026 | −0.024 | 0.991 | 1.039 | 1.049 |

Parameters/Factor scores . | Model used . | Nugget, N_{o}
. | Sill, S_{o}
. | Range, r
. |
---|---|---|---|---|

pH | Spherical | 0.005 | 0.008 | 33,498 |

EC | Exponential | 29,201.000 | 42,201.000 | 15,558 |

Cl^{−} | Exponential | 0.135 | 0.343 | 16,317 |

NO_{3}^{−} | Spherical | 0.157 | 0.697 | 17,173 |

SO_{4}^{2−} | Gaussian | 0.292 | 0.511 | 42,031 |

TH | Gaussian | 1,777.600 | 2,196.600 | 62,185 |

K^{+} | Exponential | 0.259 | 0.850 | 16,558 |

Na^{+} | Exponential | 0.316 | 0.486 | 15,558 |

Ca^{2+} | Spherical | 68.990 | 98.990 | 50,985 |

SiO_{2} | Exponential | 2.016 | 4.625 | 14,958 |

Mg^{2+} | Spherical | 106.110 | 154.937 | 58,985 |

HCO_{3}^{−} | Spherical | 3,554.800 | 4,004.800 | 0.366 |

F^{−} | Spherical | 1.551 | 1.651 | 0.376 |

Factor score 1 | Gaussian | 0.912 | 1.151 | 5.282 |

Factor score 2 | Exponential | 0.709 | 0.904 | 2.296 |

Factor score 3 | Exponential | 0.987 | 1.137 | 3.533 |

Factor score 4 | Gaussian | 0.999 | 1.397 | 5.282 |

Factor score 5 | Exponential | 0.910 | 1.040 | 3.788 |

Parameters/Factor scores . | Model used . | Nugget, N_{o}
. | Sill, S_{o}
. | Range, r
. |
---|---|---|---|---|

pH | Spherical | 0.005 | 0.008 | 33,498 |

EC | Exponential | 29,201.000 | 42,201.000 | 15,558 |

Cl^{−} | Exponential | 0.135 | 0.343 | 16,317 |

NO_{3}^{−} | Spherical | 0.157 | 0.697 | 17,173 |

SO_{4}^{2−} | Gaussian | 0.292 | 0.511 | 42,031 |

TH | Gaussian | 1,777.600 | 2,196.600 | 62,185 |

K^{+} | Exponential | 0.259 | 0.850 | 16,558 |

Na^{+} | Exponential | 0.316 | 0.486 | 15,558 |

Ca^{2+} | Spherical | 68.990 | 98.990 | 50,985 |

SiO_{2} | Exponential | 2.016 | 4.625 | 14,958 |

Mg^{2+} | Spherical | 106.110 | 154.937 | 58,985 |

HCO_{3}^{−} | Spherical | 3,554.800 | 4,004.800 | 0.366 |

F^{−} | Spherical | 1.551 | 1.651 | 0.376 |

Factor score 1 | Gaussian | 0.912 | 1.151 | 5.282 |

Factor score 2 | Exponential | 0.709 | 0.904 | 2.296 |

Factor score 3 | Exponential | 0.987 | 1.137 | 3.533 |

Factor score 4 | Gaussian | 0.999 | 1.397 | 5.282 |

Factor score 5 | Exponential | 0.910 | 1.040 | 3.788 |

## RESULTS AND DISCUSSION

Five factors which explain 89.11% of total groundwater quality variations are recognized. Parameters having strong factor loadings are considered and retained for analysis. Factor 1 explains 35.28% of the total variance and shows strong positive loadings for SO_{4}^{2−}, EC, Mg^{2+}, Cl^{−}, and Na^{+}. Thus, factor 1 is the result of various hydrogeochemical processes occurring in the study area such as mineralization of the geological component of soils, lack of geological control, dust deposition, and solubilization in the aquifer medium (Guo & Wang 2004). High loadings of cations such as Na^{+} and Mg^{2+} in the groundwater can also be attributed to the ion-exchange process and dissolution of minerals (Davis & DeWiest 1966). Thus, factor 1 can be termed the *hydrogeochemical factor*. Factor 2 explains 14.57% of the total variance and shows strong positive loadings for NO_{3}^{−} and Ca^{+}. High loading of NO_{3}^{−} is attributed to long-term use of fertilizers, agricultural runoff, animal waste, crop residues, and industrial waste discharge containing nitrogen (Fernandes *et al.* 2008), whereas high loading of Ca^{+} is attributed to andesine conversion to kaolinite (Singh *et al.* 2011). Thus, factor 2 is mainly the result of various anthropogenic processes occurring in the region and can be termed the *anthropogenic contamination factor*. Factor 3 has a total variance of 13.45% with a strong loading of F^{−}. It can be inferred that fluoride in water is due to the weathering of rocks. Since there is no natural source of fluoride in the study area it may also be due to wastewater pollution from industrial and domestic sources, due to agricultural sources like long-term usage of manure and extreme irrigation practices in the study area and improper implementation of water fluoridation in the sewage treatment plant. The vast distance between the groundwater residence time and recharge area alongside Ca^{+} inadequate type groundwater is also a major cause of fluoride contamination in groundwater (Singh *et al.* 2011). Thus, factor 3 can be termed the *fluoride contamination factor*. Factor 4 explains 13.47% of the total variance and shows strong positive loadings for K^{+} and HCO_{3}^{−}. From this factor, it can be interpreted that there may be a strong impact of limestone with gypsum in the rocks of the recharge area. This suggests that K^{+} originates as a result of natural rock–water interaction. It may also be attributed to fertilizer use and the breakdown of animal and human waste (Guo & Wang 2004). Langmuir (1971) stated that bicarbonate enters the groundwater system as a result of uptake of carbon dioxide (CO_{2}) both from soil zone gases and direct atmospheric input. It may also be attributed to carbonate dissolution in the region. Factor 5 has a total variance of 12.33% with the strong negative loading of pH and positive loading of SiO_{2}. Freeze & Cherry (1979) and Hounslow (1995) stated that the rise in pH concentration in water is mostly due to the interaction between atmospheric and biogenic CO_{2} which enter the surface water through infiltration and contribute to the alkalinity of the water and also react with alumina-silicates including kaolin and other clay minerals, resulting in release of Ca^{+} and Mg^{2+} cations, whereas the strong loading of SiO_{2} is due to dissolution of minerals (Freeze & Cherry 1979). Thus, factors 4 and 5 are mainly the result of various natural processes occurring in the region and can be termed the *geogenic contamination factor*.

### Geostatistical modeling assessment of factor scores

On the basis of cross-validation results (Table 4), Gaussian and exponential models are used to assess the spatial distribution of factor scores due to the slightly better performance of these models in most cases.

The results interpreted from spatial variability maps of factor scores are in good agreement with the results of spatial variability maps of the physico-chemical parameters (Figure 2(a)–2(r)). On associating both the maps, it can be clearly seen that areas with high factor scores match fairly to areas with poor quality groundwater (i.e. high values) and low factor score areas match well with good quality groundwater areas (i.e. low values). These results indicate and confirm that multivariate statistical analyses can be used with GIS to successfully identify groundwater contamination sources in the study region.

## CONCLUSION

This study highlights an integrated approach involving EFA and OK. The results interpreted from spatial variability maps of factor scores were in good agreement with the results of physico-chemical parameters. The EFA technique helped in identifying the factors causing the degradation of groundwater quality. Results indicated the occurrence of groundwater contamination by hydrogeochemical, anthropogenic, and geogenic processes. Based on the results, factors 1, 2, 3, and 4 & 5 are termed the *hydrogeochemical factor*, *anthropogenic contamination factor*, *fluoride contamination factor* and *geogenic contamination factor* respectively. Therefore, the water quality of the study area is mainly of mixed physical and chemical characteristics and is controlled mainly by non-agricultural sources like domestic and industrial effluent discharge, septic tanks, and agricultural sources like fertilizers, crop residues, animal waste, etc. This calls for an urgent need to develop appropriate strategies to reduce the effects of groundwater contamination. Steps should be taken to stop practicing anthropogenic activities in the water fields as well as in the city to check further degradation of groundwater in the region. Continuous groundwater quality-monitoring programs along with regular hydrochemical analysis should be adopted in the region.

## ACKNOWLEDGEMENTS

We would like to express our thanks to CGWB, Chandigarh, Ministry of Water Resources, Government of India, for their cooperation and for providing the physico-chemical data needed for this research. Comments/suggestions provided by an anonymous reviewers helped in improving the manuscript. The authors are thankful to them also.

## REFERENCES

*Kriging: a statistical interpolation method and its applications*