ABSTRACT
Water resource management depends substantially on water quality (WQ). Anthropogenic and geogenic pollutants in water systems are challenging to identify, transport, and properly dispose of, and thus demand frequent monitoring. This study applies a statistical approach to analyse patterns in, and to monitor, the WQ parameters of a region. The paper presents the computation of a water quality index (WQI) based on various WQ parameters of the Daman Ganga River at Vapi, Gujarat, India. The 17 WQ parameters considered were pH, electrical conductivity, temperature (Temp), total dissolved solids, NO2 + NO3, total phosphorus (P-Tot), Ca, Mg, Na, K, Cl, SO4, CO3, HCO3, total hardness, sodium absorption ratio (SAR), and calcium hardness (HAR_Ca). Quartile deviation was used as a preprocessing technique so that the trend followed by the parameters could be analysed fairly. Principal component analysis (PCA) followed by varimax rotation factor analysis was applied to identify the contribution of significant parameters. Methods developed by the Canadian Council of Ministers of the Environment (CCME) and British Columbia (BC) were applied to compute the WQI. The WQI values were 42.35 and 63.29 for the CCME and BC methods, respectively, based on five significantly influencing parameters, namely HAR_Ca, SAR, CO3, Temp, and P-Tot. The study highlights the hardness and salinity factors impacting WQ and reduces subjectivity and bias in determining the WQI model.
HIGHLIGHTS
The trend followed by 17 water quality parameters from 2000 to 2019 was analysed.
Data preprocessing and principal component analysis followed by varimax rotation factor analysis were applied.
Factor analysis of the data indicates that five parameters influence the water quality index (WQI) values.
The WQI was computed with the methods developed by the Canadian Council of Ministers of the Environment and British Columbia.
INTRODUCTION
Background information
Water quality (WQ) management is of paramount importance for environmental sustainability, public health, and ecosystem preservation. Water is a precious natural resource that is necessary for both human survival and the wellbeing of ecosystems. Over the last few decades, there has been an increase in human activity, particularly in industrial areas, which has had an impact on water bodies. These concerns are present worldwide. There have been expansions in population, industry, agricultural practices, and urban areas along the corridor or riverbanks in modern times due to scientific and technological advancements (Kumar et al. 2019). Large amounts of nutrients, heavy metals, and organic pollutants are transported into surface waters by anthropogenic factors including urbanization and agricultural development. This leads to the contamination of water bodies and sediments, biome outbreaks, and organic pollution (Khabouchi et al. 2020).
In the semi-arid region, the water surface serves as a main WQ indicator for sustainable development. Aspects of sustainable development and water resource management dependent on land cover and natural resource availability, such as surface energy balance, biogeochemical cycling, and climatic regulation, are crucial to the growth of agriculture and water resources (Haghiabi et al. 2018).
Climate change and other associated stresses, particularly rainfall and temperature, can make water reserves more vulnerable. Results from Khambhat City in Gujarat showed that increasing changes in land use and land cover were one of the main causes of change in groundwater quality between 2001 and 2011. Sharp reductions in freshwater resources have been noted in terms of both quantity (lower water table) and quality (high nutrient content, saltwater incursion) because of the aforementioned factors and conditions (Kumar et al. 2019).
At different places, the main drivers of declining WQ are agriculture, industries, human activities, and climate changes. In recent years, several agricultural locations in Baghdad have been transformed into urban districts. There is little doubt that the decrease in agricultural activity has had an adverse effect on the quality of the Tigris River's water, which flows through Baghdad. In addition, human and industrial activities including home sewage and pollution from household and industrial wastes influence the river's WQ. Furthermore, the lack of rain brought on by climate change and neighboring countries' control over rivers that Iraq shares with them have a significant negative influence on the rivers’ salinity levels (Ahmed et al. 2023).
Pollutant transport is indirectly influenced by land use patterns in the surrounding region, but the configurations of water bodies (types, sizes, and location) are the direct elements that determine the effect of lowering water pollution. Season is a more sensitive factor in the impact of landscape on WQ than other criteria. According to Zhao et al. (2020), changing the layouts of water bodies and land use patterns can greatly lower the loads of pollutants.
Variations in surface water bodies are also common throughout the world for several reasons (Woolway et al. 2020). For instance, the extinction of Arctic ponds and distinct variations in the length of water bodies in China and the contiguous United States have been caused by climate change (Smol & Douglas 2007). The effects of groundwater pumping, the building and management of dams, and the enhancement of riverbeds further complicate the hydraulic connectivity between surface water and groundwater (Kandekar et al. 2021). An increasing evaporation/precipitation ratio has resulted in a reduction of the pond surface area in the Italian Alps (Salerno et al. 2014). However, it is possible to support the sustainable development of water resources and yield several benefits for humans by paying attention to the dynamics of surface water bodies and understanding the elements that influence them (Wang et al. 2020).
The preservation of ecosystem services, management of water resources, and promotion of economic growth depend on sustainable development (Moharir & Pande 2020). Yet, comprehensive information about long-term trends in the WQ of surface water is lacking. Field measurements give detailed data on the rivers being monitored for WQ. The problem with field WQ measurements, though, is that they require considerable operational effort. The cost of WQ sampling using these methods may increase due to sensor calibration, cleaning, and technical issues (Wang et al. 2020).
Many authors have demonstrated that the overall quality of inland water can be assessed and monitored efficiently by computing a water quality index (WQI) from various WQ parameters (Asadi et al. 2007). The WQI is a mathematical tool that aggregates various water characteristics into a single value (Yogendra & Puttaiah 2008). Brown et al. (1972) first introduced the concept of the WQI. The WQIs used worldwide include the Weighted Arithmetic Water Quality Index (WA-WQI), British Columbia Water Quality Index (BC-WQI), Canadian Council of Ministers of the Environment Water Quality Index (CCME-WQI), Oregon Water Quality Index (O-WQI), and the US National Sanitation Foundation Water Quality Index (NSF-WQI) (Kumar et al. 2020). Gupta et al. (2017) studied the effects of eight physical, biological, and chemical parameters at six sites to compute the WQI along the Narmada River in Madhya Pradesh, India, using three methods (NSF-WQI, CCME-WQI, and WA-WQI) and found that the WQ is good in the summer and winter seasons but poor in the monsoon season. Roy et al. (2021), using Comprehensive Pollution Index and Eutrophication Index values for 11 parameters at 18 sites, found that the WQ of the Shilabati River, West Bengal, India, is poor and that the river is severely polluted. An et al. (2015) presented the successful application of principal component analysis (PCA), a soft-computing technique, to reduce dimensionality when assessing WQ. Ibrahim et al. (2023) investigated which of 22 WQ parameters most significantly affected the WQ and suggested a decision-making model that uses PCA to develop the best inputs. Benkov et al. (2023) studied the Struma River Catchment, Bulgaria, through trend analysis of WQI values computed with the CCME-WQI method and used PCA to identify the latent factors controlling WQ. Ghoderao et al. (2022) computed the groundwater quality index using two methods by applying varimax rotation factor analysis at five different sites with 11 WQ parameters and found that three sites were not suitable for drinking, one site had moderate drinking condition, and one site had excellent drinking condition.
This study is structured in three parts to compute the WQI of an inland water body in the Vapi region, Gujarat, India. (1) The raw dataset was pre-processed using the quartile deviation technique. (2) PCA followed by varimax rotation factor analysis was applied after standardizing the raw dataset. (3) The WQI was computed from the significantly influencing parameters identified, using two methods, i.e., CCME-WQI and BC-WQI.
Contextualizing the research problem
Vapi is a city and municipal corporation on the banks of the Daman Ganga River in Pardi, Valsad district, situated at the southern end of Gujarat. As it is home to numerous chemical industries and the factories of some well-known brands, it is also referred to as the ‘Chemical City of Gujarat’. This study pursues the following objectives:
(1) To examine the quality of the inland water body through the analysis of 17 essential physicochemical parameters.
(2) To enhance the CCME-WQI and BC-WQI models through the implementation of PCA followed by factor analysis to more accurately reflect the unique WQ conditions of the rivers.
(3) To determine the elements regulated by the most critical WQ parameters that affect the overall WQ and apply PCA to allocate suitable weights for a more precise evaluation.
(4) To analyze the trends in WQ over time to gain insights into long-term changes and guide future water management strategies.
(5) To present practical insights to environmental authorities and policymakers for the betterment of the region's WQ management and monitoring programs.
MATERIALS AND METHODS
Study area and data source
As shown in Figure 1, Vapi is a city and municipal corporation on the banks of the Daman Ganga River in Pardi, Valsad district, situated at the southern end of Gujarat. It is known as the ‘Chemical City of Gujarat’. The city is home to several chemical companies as well as the factories of some well-known brands. The basin is situated between 19° 51′ and 20° 28′ North latitude and between 72° 50′ and 73° 38′ East longitude. The basin's total drainage area is 2,318 km², and it receives an annual rainfall of about 2,200 mm during the months of June to September.
Water sampling
WQ parameter data were provided by the Gujarat State Water Data Centre on request. The samples were collected on various dates from July 2000 to January 2019, generally once every 2 months. The time of day at which the water samples were collected for testing varied from 8:00 AM to 12:30 PM. The data provided covered 25 WQ parameters, including physical, chemical, and biological parameters; of these, 17 were visually identifiable and were considered during computation.
Determination of properties of water
Of the 25 parameters available in the dataset, 17 physicochemical WQ parameters were taken into consideration for the study: pH, electrical conductivity (EC), temperature (Temp), total dissolved solids (TDS), nitrogen oxides such as NO2 + NO3, total phosphate, calcium (Ca), magnesium (Mg), sodium (Na), potassium (K), chloride (Cl), SO4, CO3, HCO3, total hardness (HAR_Total), sodium absorption ratio (SAR), and calcium hardness (HAR_Ca). Table 1 presents the fundamental statistical analysis of the 17 parameters.
Statistical analysis of the 17 physicochemical WQ parameters with their details
Abbreviation | Parameter | Variable ID | Unit | Permissible limits (BIS) | Mean | Variance | Standard deviation |
---|---|---|---|---|---|---|---|
pH | pH | VAR01 | pH units | 6.5–8.5 | 6.16 | 1.30 | 1.14 |
EC | Electrical conductivity | VAR02 | μmho/cm | 1,000a | 2,609.90 | 6,119,447.85 | 2,473.75 |
Temp | Temperature | VAR03 | °C | 27 | 27.86 | 9.47 | 3.08 |
TDS | Total dissolved solids | VAR04 | mg/L | 500 | 3,828.71 | 11,638,887.84 | 3,411.58 |
NO2 + NO3 | Nitrogen, total oxidized | VAR05 | mg N/L | 45.00 | 18.93 | 363.65 | 19.07 |
P-Tot | Total phosphorus | VAR06 | mg P/L | 1.00 | 0.75 | 0.52 | 0.72 |
Ca | Calcium | VAR07 | mg/L | 75.00 | 188.04 | 32,795.96 | 181.10 |
Mg | Magnesium | VAR08 | mg/L | 0.10 | 83.35 | 5,266.24 | 72.57 |
Na | Sodium | VAR09 | mg/L | 200b | 975.35 | 901,149.09 | 942.29 |
K | Potassium | VAR10 | mg/L | 12b | 1.07 | 3.73 | 1.93 |
Cl | Chloride | VAR11 | mg/L | 250.00 | 1,456.69 | 2,023,581.63 | 1,422.53 |
SO4 | Sulfate | VAR12 | mg/L | 200.00 | 153.34 | 15,770.99 | 125.58 |
CO3 | Carbonate | VAR13 | mg/L | 30.00 | 5.71 | 58.01 | 7.62 |
HCO3 | Bicarbonate | VAR14 | mg/L | 25.00 | 158.01 | 12,208.27 | 110.49 |
HAR_Total | Total hardness | VAR15 | mg CaCO3/L | 600.00 | 817.23 | 368,763.11 | 607.26 |
SAR | Sodium absorption ratio | VAR16 | – | 3.00 | 14.12 | 199.91 | 14.14 |
HAR_Ca | Calcium hardness | VAR17 | mg CaCO3/L | 200.00 | 470.10 | 204,974.76 | 452.74 |
aValues as per Environment Protection Agency.
bValues as per WHO.
Primary data and data treatment of WQ parameters
IBM® SPSS v.29 (International Business Machines Corporation Statistical Product and Service Solutions, Armonk, NY, USA), Online MATLAB® (Matrix Laboratory, MathWorks, Natick, MA, USA), and Microsoft® Excel (Redmond, WA, USA) for Windows were utilized for data processing. Simplified digital maps were developed using the ArcMap 10.8.2 GIS software (ESRI®; Environmental Systems Research Institute, Redlands, CA, USA).
Analyzing large datasets requires care during both processing and interpretation. Processing large datasets arbitrarily may produce biased results that deviate from the underlying trend. In addition, visual presentation of large datasets can add complexity and introduce errors (Sivarajah et al. 2017). This study deals with a data sample of 17 parameters, each with a sample size of 143, i.e., a dataset of 2,431 values. The provided sample of WQ parameters contained outliers, suggesting that the dataset was noisy. Thus, the quartile deviation method was used to de-noise the data. Quartile deviation measures the spread of the middle portion of the data. Only the values of WQ parameters lying between the lower and upper bounds derived from the quartile deviation were retained for further processing (Kumar et al. 2021). This process led to a final sample of 56 values for each of the 17 parameters, i.e., a dataset of 952 values, as shown in Figure 2.
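The de-noising step can be sketched in Python as below. This is only an illustration under stated assumptions: the bounds are taken to be the common quartile fences Q1 − 1.5·IQR and Q3 + 1.5·IQR (the text does not give the multiplier), a sampling date is assumed to be dropped when any of its parameters falls outside its bounds, and the function names and toy data are hypothetical.

```python
import numpy as np

def quartile_bounds(values, k=1.5):
    """Return (lower, upper) bounds built from the quartiles of `values`.

    Assumption: fences at Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    The quartile deviation itself is IQR / 2.
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def filter_complete_rows(data, k=1.5):
    """Keep only the rows (sampling dates) whose every column lies within its bounds."""
    data = np.asarray(data, dtype=float)
    keep = np.ones(data.shape[0], dtype=bool)
    for j in range(data.shape[1]):
        lo, hi = quartile_bounds(data[:, j], k)
        keep &= (data[:, j] >= lo) & (data[:, j] <= hi)
    return data[keep]

# Hypothetical toy data: 6 samples x 2 parameters, one obvious outlier per column.
toy = np.array([[7.1, 120.0], [6.9, 130.0], [7.0, 125.0],
                [7.2, 900.0], [2.0, 128.0], [7.0, 122.0]])
clean = filter_complete_rows(toy)
print(clean.shape)
```

Row-wise removal is one way in which a sample of 143 dates could shrink to 56 complete records, but the study does not state exactly how rows were discarded.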
Comparison of raw dataset with standardized (z) dataset and centered (c) dataset of parameters: (a) HAR_Ca, (b) Temp, (c) SAR, (d) P-Tot, and (e) CO3.
Principal component analysis
PCA finds the linear combination of parameters that accounts for as much variability as possible. Its fundamental step is to find the values of the weights that maximize the variance of the combined parameters. Variance is maximized because it can be regarded as information; thus, combining parameters in this way retains as much information as possible in the combined variables. PCA can be used to reduce the number of dimensions or parameters in a dataset before further analysis (An et al. 2015). PCA can be computed by various methods, such as singular value decomposition and eigendecomposition of the covariance matrix (Takane 2003). This article uses the eigendecomposition of the covariance matrix to compute the PCA. In this research, the 17 parameters were transformed into simply two components.
Correlation among the WQ parameters. Red represents a high correlation whereas blue represents low correlation.
Since the values of the parameters vary widely because of their different units and permissible concentrations in water, their variances differ considerably, as shown in Table 1 and Figure 4. The eigenvalues and eigenvectors of the covariance matrix were then computed using MATLAB. The eigenvectors were arranged as the columns of a matrix in descending order of eigenvalue, so that the eigenvector corresponding to the largest eigenvalue forms the first column. The eigenvectors hold the weights associated with the parameters; the first five eigenvectors are presented in Table 2. The WQ parameters are combined using the eigenvector entries as weights to transform the data into principal components.
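The eigendecomposition route described above can be sketched with NumPy as follows. This is a minimal illustration, not the MATLAB workflow used in the study: the function name `pca_eig` and the random stand-in data are hypothetical, and the parameters are assumed to be z-scored before the covariance matrix is formed.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the covariance matrix of z-scored data."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each parameter
    C = np.cov(Z, rowvar=False)                       # covariance of the z-scores
    eigvals, eigvecs = np.linalg.eigh(C)              # C is symmetric; ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                              # eigenvector entries act as weights
    return eigvals, eigvecs, scores

# Hypothetical stand-in for the 56 x 17 cleaned dataset (4 columns for brevity).
rng = np.random.default_rng(0)
X = rng.normal(size=(56, 4))
X[:, 1] += 2.0 * X[:, 0]          # make two columns correlated
eigvals, eigvecs, scores = pca_eig(X)
print(eigvals / eigvals.sum())    # share of variance captured by each component
```

Because the data are standardized, the eigenvalues sum to the number of parameters, and the covariance matrix of the scores is diagonal with the eigenvalues on its diagonal.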
Eigenvector matrix
Parameter | EV1 | EV2 | EV3 | EV4 | EV5 |
---|---|---|---|---|---|
pH | −0.16 | −0.38 | −0.11 | −0.25 | 0.16 |
EC | 0.20 | −0.01 | 0.27 | 0.48 | −0.21 |
Temp | 0.04 | 0.17 | −0.32 | 0.42 | 0.44 |
TDS | 0.29 | 0.01 | −0.04 | 0.09 | 0.22 |
NO2 + NO3 | 0.13 | −0.36 | −0.13 | 0.14 | −0.05 |
P-Tot | 0.17 | 0.20 | 0.17 | 0.18 | −0.54 |
Ca | 0.30 | −0.12 | 0.39 | −0.22 | 0.15 |
Mg | 0.27 | −0.16 | −0.17 | 0.29 | −0.02 |
Na | −0.35 | −0.10 | 0.29 | 0.24 | 0.11 |
K | 0.04 | −0.42 | −0.18 | 0.25 | 0.00 |
Cl | 0.35 | 0.08 | −0.28 | −0.24 | −0.13 |
SO4 | 0.24 | 0.10 | −0.01 | 0.07 | 0.46 |
CO3 | 0.12 | −0.50 | −0.15 | 0.05 | −0.27 |
HCO3 | −0.20 | −0.31 | −0.02 | −0.17 | 0.02 |
HAR_Total | 0.36 | −0.17 | 0.21 | −0.02 | 0.11 |
SAR | 0.26 | 0.17 | −0.41 | −0.27 | −0.14 |
HAR_Ca | 0.30 | −0.12 | 0.39 | −0.22 | 0.15 |
One of the aims of this study is to examine which parameters are related to which principal components. Principal components can be understood better after they have been rotated (Kilmer 2010). Rotation is a procedure in which the factors are rotated to achieve a simple structure, meaning that each factor has a few high loadings while the remaining loadings are zero or close to zero. Rotation methods can be categorized as either orthogonal or oblique. Orthogonal rotation methods assume that the factors in the analysis are uncorrelated, and varimax is one such method. In this study, varimax rotation factor analysis is applied to the components. Applied after PCA, varimax rotation improves the interpretability of the weights (Acal et al. 2020).
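For illustration, an orthogonal varimax rotation can be written in a few lines of NumPy. The sketch below uses the widely circulated SVD-based iteration and is not the SPSS routine used in the study; the function name, the tolerance settings, and the small loading matrix are hypothetical.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a (parameters x factors) loading matrix with an orthogonal matrix R."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)          # start from the unrotated solution
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient-like matrix of the varimax criterion (gamma = 1 gives varimax).
        G = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt         # nearest orthogonal matrix to G
        d_new = s.sum()
        if d != 0.0 and d_new / d < 1.0 + tol:
            break          # criterion has stopped improving
        d = d_new
    return L @ R, R

# Hypothetical unrotated loadings for five parameters on two components.
L0 = np.array([[0.7, 0.5], [0.8, 0.4], [0.6, 0.6], [0.3, -0.7], [0.2, -0.8]])
rotated, R = varimax(L0)
print(np.round(rotated, 3))
```

Since R is orthogonal, the rotation leaves each parameter's communality (the row sum of squared loadings) unchanged and only redistributes it across the factors.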
WQI assessment methods
The WQI is a comprehensive index for evaluating WQ that can be applied to measure the extent of pollution in water (Brown et al. 1972; Khan et al. 2023). Experts have developed diverse techniques to determine the WQI (Debels et al. 2005). It mathematically combines a large volume of WQ data into a single number that summarizes the overall WQ level (Şener et al. 2017). This study aims to compute the WQI for the Vapi region, Gujarat, India, in combination with the factor analysis method. Once the set of significant parameters was identified, the WQI was computed using the CCME-WQI and BC-WQI methods.
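The section does not reproduce the index formulas. As a hedged illustration, the sketch below follows the commonly published CCME formulation, which combines scope (F1), frequency (F2), and amplitude (F3). It handles only objectives of the 'must not exceed' type (a range objective such as pH would need an extra branch), and the function name and toy values are hypothetical rather than taken from the study.

```python
import math

def ccme_wqi(samples, objectives):
    """CCME-style WQI for upper-limit objectives.

    samples:    dict of parameter name -> list of measured values
    objectives: dict of parameter name -> upper limit (must not be exceeded)
    """
    n_vars = len(objectives)
    n_tests = failed_vars = failed_tests = 0
    excursion_sum = 0.0
    for name, limit in objectives.items():
        values = samples[name]
        n_tests += len(values)
        fails = [v for v in values if v > limit]
        if fails:
            failed_vars += 1
        failed_tests += len(fails)
        excursion_sum += sum(v / limit - 1.0 for v in fails)
    f1 = 100.0 * failed_vars / n_vars      # scope: share of parameters that fail
    f2 = 100.0 * failed_tests / n_tests    # frequency: share of individual tests that fail
    nse = excursion_sum / n_tests          # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)         # amplitude, scaled to 0-100
    return 100.0 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732

# Hypothetical toy input: parameter "A" fails twice, parameter "B" never fails.
toy_samples = {"A": [8.0, 9.0, 12.0, 20.0], "B": [1.0, 2.0, 3.0, 4.0]}
toy_limits = {"A": 10.0, "B": 5.0}
print(round(ccme_wqi(toy_samples, toy_limits), 2))
```

The divisor 1.732 normalizes the length of the (F1, F2, F3) vector so that the index stays between 0 and 100.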
RESULTS AND DISCUSSIONS
WQ parameter
According to the report of the Central Pollution Control Board (CPCB), the Daman Ganga River falls under the Priority 1 category, which denotes a biochemical oxygen demand (BOD) value of 30 mg/L or more.
As seen in Table 1, the content of 11 of the 17 parameters (EC, Temp, TDS, Ca, Mg, Cl, HCO3, Na, HAR_Ca, SAR, and HAR_Total) in river water samples collected from the monitoring stations during the 2000–2019 sampling period exceeds the parametric values given by the Bureau of Indian Standards (BIS) and the World Health Organization (WHO). The temporal distribution of the parameter contents in the river samples is depicted in Figure 4. The elevated contents of major cations (Ca, Mg, and Na), major anions (CO3, HCO3, and Cl), and dissolved solids in the river water samples contribute to the major impurities in water bodies, resulting in a noxious impact on aquatic habitat (Kumar et al. 2022).
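The count of 11 exceeding parameters can be reproduced from Table 1 with a short check. The sketch below assumes that the comparison is made on the mean values against the upper permissible limits; the means and limits are transcribed from Table 1, while the variable names are illustrative.

```python
# (mean, upper permissible limit) pairs transcribed from Table 1.
TABLE1 = {
    "EC": (2609.90, 1000.0), "Temp": (27.86, 27.0), "TDS": (3828.71, 500.0),
    "NO2 + NO3": (18.93, 45.0), "P-Tot": (0.75, 1.0), "Ca": (188.04, 75.0),
    "Mg": (83.35, 0.10), "Na": (975.35, 200.0), "K": (1.07, 12.0),
    "Cl": (1456.69, 250.0), "SO4": (153.34, 200.0), "CO3": (5.71, 30.0),
    "HCO3": (158.01, 25.0), "HAR_Total": (817.23, 600.0), "SAR": (14.12, 3.0),
    "HAR_Ca": (470.10, 200.0),
}
# pH has a permissible range rather than a single upper limit.
PH_MEAN, PH_RANGE = 6.16, (6.5, 8.5)

exceeding = [name for name, (mean, limit) in TABLE1.items() if mean > limit]
ph_in_range = PH_RANGE[0] <= PH_MEAN <= PH_RANGE[1]

print(len(exceeding), exceeding)
print("pH within range:", ph_in_range)
```

On this reading, the mean pH of 6.16 lies below the 6.5–8.5 range rather than above an upper limit, so it is not counted among the exceeding parameters.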
Furthermore, the significance of the WQ parameters can be understood with the help of a box plot and a loading plot developed through PCA. Figure 4(a)–(d) shows the box plots of the WQ parameters for the raw dataset, whereas Figure 5(a)–(d) shows those for the standardized dataset. It was difficult to extract information from the plot of the raw dataset, as each parameter has a different permissible limit and unit of measurement. The standardized dataset gave a clearer picture of the consistency of the data, revealing that parameters such as EC, TDS, Na, Cl, HCO3, HAR_Total, and HAR_Ca were more widely spread.
Principal component analysis
Jolliffe & Cadima (2016) noted that the variances of the principal components correspond to the computed eigenvalues. Thus, using the eigenvectors corresponding to the largest eigenvalues as weights arranges the dataset into new components with maximal variance. The weights can be interpreted as how much each WQ parameter contributes to a principal component. The weights for PC1 come from the first eigenvector, which has the highest eigenvalue, whereas the weights for PC2 come from the second eigenvector, which has the second highest eigenvalue. HAR_Total has the highest absolute weight (0.36) in the first eigenvector, so it contributed most to PC1. Similarly, CO3 has the largest absolute weight (0.50) in the second eigenvector, so it contributed most to PC2. The weights assigned to SAR, SO4, Na, Cl, TDS, EC, HAR_Ca, Ca, HAR_Total, and Mg were larger for PC1, whereas the weights assigned to HCO3, P-Tot, K, pH, Temp, CO3, and NO2 + NO3 were larger for PC2.
Plot of PC1 vs WQ parameters: (a) PC1 vs zpH, (b) PC1 vs zEC, (c) PC2 vs zHCO3, and (d) PC2 vs zTemp.
PC1–PC5 together capture about 75% of the variance, as reflected in the scree plot, which implies that they retain most of the information about the WQ parameters. In addition, the covariances between the PCA scores were equal to zero, which implies that the principal components are uncorrelated with each other. Based on this, the orthogonal approach was chosen for the rotation factor analysis. However, PCA combines the variables in a way that optimizes the variance of the PCs, which is advantageous for reducing the number of dimensions but disadvantageous when interpreting the components.
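The variance shares behind this figure can be checked from the eigenvalues reported in Table 3. The sketch below assumes that the PCA was run on standardized data, so that the total variance equals the number of parameters (17); small differences from the tabulated percentages are expected because the tabulated eigenvalues are rounded.

```python
import numpy as np

eigenvalues = np.array([6.01, 2.22, 1.94, 1.341, 1.11])  # from Table 3
n_parameters = 17   # total variance of 17 standardized parameters

explained = 100.0 * eigenvalues / n_parameters   # variance % by component
cumulative = np.cumsum(explained)                # cumulative variance %

for k, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{k}: {e:.2f}% (cumulative {c:.2f}%)")
```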
Furthermore, the analysis of the parameters using varimax rotation revealed the following findings, as shown in Table 3:
(a) Factor 1 (VF1), explaining 35.33% of the total variance, is a dipolar factor with high positive loadings (greater than +0.6) for HAR_Ca, Ca, and HAR_Total and a moderate negative loading (−0.211) for Temp.
(b) Factor 2 (VF2), explaining 13.04% of the total variance, has high positive loadings (greater than +0.6) for SAR, Na, and Cl and a moderate negative loading (−0.193) for pH.
(c) Factor 3 (VF3), explaining 11.41% of the total variance, has high positive loadings (greater than +0.6) for CO3, K, and NO2 + NO3 and a moderate negative loading (−0.194) for HCO3.
(d) Factor 4 (VF4), explaining 7.889% of the total variance, has high positive loadings (greater than +0.6) for P-Tot and EC and a high negative loading (−0.677) for pH.
(e) Factor 5 (VF5), explaining 6.51% of the total variance, has a high positive loading (greater than +0.6) for Temp only and a moderate negative loading (−0.215) for pH.
Varimax rotation factor analysis matrix
S. No. | Variable ID | Parameter | VF1 | VF2 | VF3 | VF4 | VF5 |
---|---|---|---|---|---|---|---|
1 | VAR17 | HAR_Ca | 0.955 | 0.143 | 0.048 | 0.141 | −0.093 |
2 | VAR07 | Ca | 0.955 | 0.143 | 0.048 | 0.141 | −0.093 |
3 | VAR15 | HAR_Total | 0.833 | 0.273 | 0.317 | 0.248 | 0.09 |
4 | VAR04 | TDS | 0.467 | 0.366 | 0.174 | 0.201 | 0.386 |
5 | VAR16 | SAR | 0.037 | 0.948 | 0.037 | 0.071 | 0.103 |
6 | VAR09 | Na | 0.294 | 0.918 | 0.125 | 0.165 | 0.116 |
7 | VAR11 | Cl | 0.295 | 0.913 | 0.153 | 0.168 | 0.092 |
8 | VAR13 | CO3 | 0.099 | 0.139 | 0.813 | −0.061 | −0.229 |
9 | VAR10 | K | −0.033 | −0.078 | 0.717 | −0.115 | 0.096 |
10 | VAR05 | NO2 + NO3 | 0.139 | 0.116 | 0.651 | −0.012 | 0.045 |
11 | VAR08 | Mg | 0.242 | 0.335 | 0.565 | 0.287 | 0.32 |
12 | VAR06 | P-Tot | 0.082 | 0.201 | −0.027 | 0.769 | −0.214 |
13 | VAR02 | EC | 0.324 | −0.146 | 0.271 | 0.727 | 0.114 |
14 | VAR01 | pH | −0.088 | −0.193 | 0.278 | −0.677 | −0.215 |
15 | VAR14 | HCO3 | 0.173 | 0.295 | −0.194 | 0.484 | 0.304 |
16 | VAR03 | Temp | −0.211 | 0.076 | 0.04 | 0.029 | 0.817 |
17 | VAR12 | SO4 | 0.487 | 0.252 | −0.027 | 0.064 | 0.553 |
Eigenvalue | | | 6.01 | 2.22 | 1.94 | 1.341 | 1.11 |
Variance % by component | | | 35.33 | 13.04 | 11.41 | 7.889 | 6.51 |
Cumulative variance % by component | | | 35.33 | 48.37 | 59.79 | 67.67 | 74.18 |
VF1 is the most significant factor, as it accounts for the highest proportion of the total variance among the parameters, as shown in Table 3. The variability of HAR_Ca, Ca, and HAR_Total can be attributed to the mixing of synthesized water with river water (Mahamat Nour et al. 2020). The close relationship among HAR_Ca, Ca, and HAR_Total can also be attributed to the presence of hardness in the water. Thus, VF1 expresses the hardness content of the river water samples determined by these parameters and can be referred to as the ‘hardness factor’ for the examined datasets. VF1 indicates an inverse relation between HAR_Ca, Ca, HAR_Total, and Temp in the sampling campaigns, indicating that Temp significantly influences these WQ parameters. VF2 is the second most significant factor in terms of total variance, as shown in Table 3. The greater fluctuation of SAR, Na, and Cl in the river samples might indicate the presence of ions and salts due to chemicals (Zaman et al. 2018). Thus, VF2 expresses the salinity content of the river water samples determined by these parameters and can be referred to as the ‘salinity factor’ for the examined datasets. VF2 indicates an inverse relation between SAR, Na, Cl, and pH in the sampling campaigns, indicating that pH significantly influences these WQ parameters. HAR_Ca, SAR, CO3, P-Tot, and Temp were selected as the representative parameters of VF1 through VF5, respectively.
VF1 explained 35.33% of the variation and was best reflected by the hardness caused by calcium (HAR_Ca). VF2 was dominated by SAR and accounted for 13.04% of the overall variance. VF3, with its greatest positive loading on CO3, explained 11.41% of the variance. VF4 accounted for 7.889% of the total variation and had a substantial loading on P-Tot. VF5 accounted for 6.51% of the total variance and had a significant loading on Temp.
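The choice of one representative parameter per factor can be illustrated with the loadings of Table 3. The sketch below assumes that the selection rule is simply the largest absolute loading on each factor; HAR_Ca and Ca tie at 0.955 on VF1, and the sketch keeps the first one listed, which is a tie-breaking assumption and not a rule stated in the study.

```python
# Varimax loadings (VF1..VF5) transcribed from Table 3, in table order.
LOADINGS = [
    ("HAR_Ca",    [0.955, 0.143, 0.048, 0.141, -0.093]),
    ("Ca",        [0.955, 0.143, 0.048, 0.141, -0.093]),
    ("HAR_Total", [0.833, 0.273, 0.317, 0.248, 0.09]),
    ("TDS",       [0.467, 0.366, 0.174, 0.201, 0.386]),
    ("SAR",       [0.037, 0.948, 0.037, 0.071, 0.103]),
    ("Na",        [0.294, 0.918, 0.125, 0.165, 0.116]),
    ("Cl",        [0.295, 0.913, 0.153, 0.168, 0.092]),
    ("CO3",       [0.099, 0.139, 0.813, -0.061, -0.229]),
    ("K",         [-0.033, -0.078, 0.717, -0.115, 0.096]),
    ("NO2 + NO3", [0.139, 0.116, 0.651, -0.012, 0.045]),
    ("Mg",        [0.242, 0.335, 0.565, 0.287, 0.32]),
    ("P-Tot",     [0.082, 0.201, -0.027, 0.769, -0.214]),
    ("EC",        [0.324, -0.146, 0.271, 0.727, 0.114]),
    ("pH",        [-0.088, -0.193, 0.278, -0.677, -0.215]),
    ("HCO3",      [0.173, 0.295, -0.194, 0.484, 0.304]),
    ("Temp",      [-0.211, 0.076, 0.04, 0.029, 0.817]),
    ("SO4",       [0.487, 0.252, -0.027, 0.064, 0.553]),
]

representatives = []
for j in range(5):
    # max() returns the first row among ties, so HAR_Ca is kept ahead of Ca.
    name, _ = max(LOADINGS, key=lambda row: abs(row[1][j]))
    representatives.append(name)

print(representatives)
```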
Water quality index
Based on various key WQ parameters, the WQI delivers a single figure that represents the total WQ at a certain place and time.
The WQI computed according to the CCME method was 42.35, as shown in Table 4. As per Table 5, this value falls in the poor rank, which means that conditions often depart from natural or desirable levels. Although the value falls in the poor rank, it is near the marginal rank according to the CCME, so it can be interpreted that the quality has not deteriorated to the danger level. The WQI computed with the BC method was 63.29, as shown in Table 4; as per Table 6, it falls in the poor class and is near the borderline class.
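The rank lookups of Tables 5 and 6 can be expressed as two small functions. The class boundaries come from those tables; how fractional values that fall between the tabulated integer bounds are assigned (for example, 44.5 on the CCME scale) is an assumption of this sketch, and the function names are illustrative.

```python
def classify_ccme(wqi):
    """CCME-WQI rank (Table 5): higher values mean better water quality."""
    for floor, label in [(95, "Excellent"), (80, "Good"), (65, "Fair"), (45, "Marginal")]:
        if wqi >= floor:
            return label
    return "Poor"

def classify_bc(wqi):
    """BC-WQI rank (Table 6): lower values mean better water quality."""
    for ceiling, label in [(3, "Excellent"), (17, "Good"), (43, "Fair"), (59, "Borderline")]:
        if wqi <= ceiling:
            return label
    return "Poor"

print(classify_ccme(42.35), classify_bc(63.29))
```

The two scales run in opposite directions, which is why 42.35 (CCME) and 63.29 (BC) both map to the poor class.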
Table 4. Computed WQI values for the river water sample
S. No. | Computed WQI | Computation method | Remarks |
---|---|---|---|
1 | 42.35 | CCME-WQI | Poor |
2 | 63.29 | BC-WQI | Poor |
Table 5. WQI classification as per CCME
Rank | WQI | Value | Description |
---|---|---|---|
1 | Excellent | 95–100 | Water quality is protected with a virtual absence of threat or impairment |
2 | Good | 80–94 | Water quality is protected with only a minor degree of threat or impairment |
3 | Fair | 65–79 | Water quality is usually protected but occasionally threatened or impaired |
4 | Marginal | 45–64 | Water quality is frequently threatened or impaired |
5 | Poor | 0–44 | Water quality is almost always threatened or impaired |
Table 6. WQI classification as per BC-WQI
Rank | Description of WQI | Index value |
---|---|---|
1 | Excellent | 0–3 |
2 | Good | 4–17 |
3 | Fair | 18–43 |
4 | Borderline | 44–59 |
5 | Poor | 60–100 |
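The two classification schemes in Tables 5 and 6 can be expressed as simple lookups. The sketch below is illustrative only: the function names are hypothetical, and because the tables list integer ranges, the handling of fractional values between classes (for example, a CCME value of 44.5) is an assumption that treats each class's lower or upper bound as the cut-off.

```python
def classify_ccme(wqi):
    """Map a CCME-WQI value (0-100, higher is better) to its Table 5 category."""
    if not 0 <= wqi <= 100:
        raise ValueError("CCME-WQI must lie between 0 and 100")
    # Assumption: each category starts at its tabulated lower bound.
    for lower, label in [(95, "Excellent"), (80, "Good"), (65, "Fair"), (45, "Marginal")]:
        if wqi >= lower:
            return label
    return "Poor"

def classify_bc(index):
    """Map a BC-WQI value (0-100, lower is better) to its Table 6 category."""
    if not 0 <= index <= 100:
        raise ValueError("BC-WQI must lie between 0 and 100")
    # Assumption: each category ends just below the next tabulated lower bound.
    for upper, label in [(4, "Excellent"), (18, "Good"), (44, "Fair"), (60, "Borderline")]:
        if index < upper:
            return label
    return "Poor"

# Values reported in Table 4.
print(classify_ccme(42.35), classify_bc(63.29))
```

Note that the two scales run in opposite directions: a low CCME value and a high BC value both indicate poor water quality.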
Table 7. Comparison of various WQI methods
WQI methods | Advantages | Disadvantages |
---|---|---|
NSF-WQI | The aggregation method is straightforward and easy to use. It uses a reduced number of WQ parameters. | Assesses WQ by using individual parameter weights assigned by experts, which can be subjective and prone to an ‘eclipsing’ effect – where a single parameter disproportionately influences the overall score – creating sensitivity issues and limiting the method's ability to reflect the effects of individual WQ parameters. |
CCME-WQI | The method is more objective and allows flexibility in the type and number of WQ parameters used, which are chosen based on the water's intended use and data availability. | The method is more complex because it requires calculating the F1, F2, and F3 values. The approach requires more sampling and testing of data. |
Oregon Water Quality Index | Equal weighting is more suitable for determining the quality of surface water for general use. | The sub-index equations are too idealized for rivers, and the method is prone to an ‘ambiguity’ effect. |
A closer examination of the factors behind these index values indicates that objective exceedances occurred for four of the five significant parameters, namely Temp, P-Tot, HAR_Ca, and SAR. The objectives for CO3 were never exceeded. In addition, the exceedances observed for P-Tot were fairly minor, whereas those for SAR, Temp, and HAR_Ca were high. These exceedances account for the poor ratings computed by CCME-WQI and BC-WQI.
Benkov et al. (2023) identified the estimation of WQI as one of the significant tasks for environmental agencies. Their approach, which integrated the CCME-WQI method with PCA, revealed latent factors impacting WQ. Nath Roy et al. (2024) studied four Dhaka-based rivers and showed that the PCA approach reduces the subjectivity of WQI models. They also found that dilution caused by local rainfall was the reason for the higher values in the WQI trend throughout the rivers during the wet season. Guenouche et al. (2024) assessed the WQI from 16 physicochemical parameters, using PCA to establish the inter-relationships between parameters that distinguished the characteristics of the various study sites. Dutta et al. (2018) demonstrated the application of PCA and cluster analysis with WQI to categorize the analyzed trend into four major polluting factors: (1) mineral and nutrient pollution, (2) heavy metal pollution, (3) organic pollution, and (4) fecal contamination.
Generalizability of findings and external validity
According to several media reports, the industries established near the Daman Ganga River at Vapi include those that employ fly ash and silos to manufacture industrial chemicals, dyes, pulp, paper, and board, as well as those working with fluoride and metals. Effluents from these industries may affect river health. In this study, the categorized factors are VF1, indicating the influence of the hardness factor, and VF2, expressing the dominance of salinity.
Wang et al. (2022) described four processes that generate the majority of textile wastewater: pretreatment, dyeing, printing, and functional finishing. Chlorine, hardness, and pH are among the parameters used to characterize effluent from textile, pulp, paper, and board processes. Chockalingam et al. (2019) also observed high levels of hardness in effluents collected from the textile industries of Tiruppur, which increased alkalinity and pH in the nearby environment. In line with these observations, this study determined that VF1 captured the relation among HAR_Ca, Ca, and HAR_Total.
In addition, numerous industrial operations, such as those in the chemical industry, pharmaceutical processing, and papermaking, produce significant volumes of high-salinity wastewater containing complex constituents and contaminants that are difficult to decompose. Direct discharge introduces waste and a significant amount of potential salt resources into the river (Guo et al. 2023). Zhang et al. (2024) gave an overview of the WQ characteristics of saline wastewater discharged from various industrial sectors and examined the consequences of increasing salinity for various treatment methods. Large-scale turbidity discharge and high pH are typical characteristics of wastewater from textile printing and dyeing processes (Xu et al. 2018). In line with these observations, this study determined that VF2 identified the antipathetic relation between SAR, Na, Cl, and pH across sampling campaigns, indicating that pH significantly influences these WQ parameters.
By modifying the CCME-WQI using PCA and varimax factor analysis, HAR_Ca, SAR, CO3, Temp, and P-Tot were identified as the important components for determining the WQI under the adopted methodology.
Although the present method works well for reducing parameters, there are several drawbacks to the existing methodology. The use of PCA might involve subjectivity, especially when deciding how many principal components to use. Simpler methods such as NSF-WQI and WA-WQI offer more accessible and interpretable results with less precision, making the application of PCA with factor analysis approach potentially more practical for policy and decision-making processes.
Nevertheless, the development of any WQI model has several drawbacks. The applicability, precision, and reliability of WQI models are constrained, and machine learning techniques may be utilized to estimate and forecast these. Uncertainty in the model may result from the number of samples and from ecological variables that fluctuate over a wide range within a short period of time. Furthermore, there are instances where time and space constrain the WQI models.
Study significance: exploring aspects and relationships
Traditional WQI techniques (comparison as shown in Table 7), including the NSF-WQI and the WA-WQI, frequently depend on arbitrary weighting or presume that all parameters are equally important (Marselina et al. 2022). These presumptions are challenged by this study, which shows that PCA can use statistical analysis to objectively determine each parameter's significance. With this method, WQI calculations become more objective and regionally specific, offering more specialized insights for efficient water management techniques. Moreover, in this study, the employed CCME-WQI method introduces a dynamic approach by incorporating factors such as scope, frequency, and amplitude, which allows for a more nuanced understanding of fluctuations in WQ data with the established guidelines. This is in contrast with techniques such as NSF-WQI and WA-WQI, which rely on predefined weights and might not adequately account for the temporal fluctuations present in WQ.
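The way scope, frequency, and amplitude combine into a single index can be illustrated with the widely published CCME (2001) formulation, in which F1 is the percentage of parameters that fail their objective at least once, F2 is the percentage of individual tests that fail, and F3 is an asymptotic function of the normalized sum of excursions. The sketch below is an assumption-laden illustration rather than the authors' computation: the function name, the parameter labels, the sample values, and the objectives are hypothetical, and only "must not exceed" objectives are handled.

```python
import math

def ccme_wqi(samples, objectives):
    """CCME-WQI for upper-limit objectives only (a simplifying assumption).

    samples:    dict of parameter name -> list of measured values
    objectives: dict of parameter name -> positive upper limit
    """
    failed_vars = failed_tests = total_tests = 0
    excursion_sum = 0.0
    for name, values in samples.items():
        limit = objectives[name]
        exceedances = [v for v in values if v > limit]
        total_tests += len(values)
        failed_tests += len(exceedances)
        failed_vars += 1 if exceedances else 0
        # Excursion: how far a failed test overshoots its objective.
        excursion_sum += sum(v / limit - 1 for v in exceedances)

    f1 = 100 * failed_vars / len(samples)   # scope
    f2 = 100 * failed_tests / total_tests   # frequency
    nse = excursion_sum / total_tests       # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)          # amplitude, scaled to 0-100
    return 100 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732

# Hypothetical data: parameter "A" exceeds its objective twice, "B" never does.
samples = {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 10.0, 10.0, 10.0]}
objectives = {"A": 2.0, "B": 20.0}
print(round(ccme_wqi(samples, objectives), 2))
```

The divisor 1.732 normalizes the result to the 0 to 100 range, since each of the three factors can reach 100. Because the index is driven by exceedances of guideline values rather than by predefined weights, it responds directly to how often and by how much the monitored data depart from their objectives.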
Owing to point source or non-point source pollution loading, each location has distinct geographical features that affect the river body, and each river has distinct geomorphological features that influence the surrounding hydrosphere's ecosystem. This study offers several novel aspects that enhance knowledge and methods for evaluating WQ, especially in areas where industrial activity is present. One significant advancement is the use of varimax rotation analysis in conjunction with PCA, which makes it possible to identify and appropriately weigh important WQ factors. Particularly in complex environments like the Vapi region, where data were available for 17 WQ parameters, this data-driven approach improves the accuracy of WQI measurements. The incorporation of two different WQI approaches, CCME-WQI and BC-WQI, within the same research framework is another addition. This dual technique offers improved reliability and depth by enabling a thorough comparison of WQ outcomes. The study also addresses a topic that is often overlooked in research, with its focus on the highly industrialized Vapi region and its analysis of nearly two decades of data (2000–2019). This extensive study provides novel insights into the short-term impacts of industrial pollution. Furthermore, the evaluation procedure is simplified by using PCA to select five significant WQ parameters: HAR_Ca, SAR, CO3, Temp, and P-Tot. Applying quartile deviation for data preprocessing strengthens the study substantially by eliminating noise and outliers and ensuring a reliable and accurate dataset. Finally, by providing insights into the effects of industrial pollution and directing focused determinations for sustainable water resource management, the study advances environmental and public health policies.
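The text does not spell out the exact rule used in the quartile deviation preprocessing step, so the following is only a sketch under stated assumptions. It computes the quartile deviation, QD = (Q3 − Q1) / 2, and drops values that lie more than `k` quartile deviations outside the quartiles. The default `k = 3` is an assumed choice that reproduces the common 1.5 × IQR fences; the function names and the sample series are hypothetical.

```python
import numpy as np

def quartile_deviation(values):
    """Quartile deviation (semi-interquartile range): (Q3 - Q1) / 2."""
    q1, q3 = np.percentile(values, [25, 75])
    return (q3 - q1) / 2

def drop_outliers(values, k=3.0):
    """Keep values within k quartile deviations of the quartiles.

    k = 3 is an assumption; it is equivalent to the usual 1.5 * IQR fences.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    qd = (q3 - q1) / 2
    mask = (values >= q1 - k * qd) & (values <= q3 + k * qd)
    return values[mask]

# Hypothetical series with one extreme reading.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(quartile_deviation(series), drop_outliers(series))
```

Because quartiles are insensitive to extreme readings, the fences themselves are not distorted by the outliers they are meant to detect.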
CONCLUSIONS
Summary of key findings
The units of the selected WQ parameters affect how the variables are handled in the PCA when the parameters are measured in different units. PCA extraction followed by varimax rotation classified the patterns in the data without requiring a prior theory. The most influential parameters, which account for the greatest proportion of total variance in the WQ datasets, are HAR_Total, SAR, CO3, Temp, and P-Tot. High SAR values affect the salinity of the water, which in turn makes direct use of the water for irrigation more difficult. Ding et al. (2023) also demonstrated experimentally that SAR radically changes the composition of soil bacterial communities when such water is applied for irrigation. Despite the high values in both the HAR_Total and CO3 categories, the hardness factor has the greater impact on the river water. Tiwari & Bajpai (2012) suggested that constant use of hard or very hard water can lead to several health issues, so it should be consumed only after due treatment. The large-scale use of chemicals in the Vapi district of Gujarat suggests that the disposal of treated water from industries or nearby water treatment plants into the river may increase such factors. Combined human and natural influences can result in the accumulation of P-Tot in inland water. The varimax rotation factor analysis showed that the river water is influenced mainly by two categorized factors, the ‘hardness factor’ and the ‘salinity factor’. The statistical analysis also revealed that the hardness factor is affected mainly by changes in temperature, whereas the salinity factor is affected mainly by changes in pH. Moreover, when it comes to contamination of the aquatic environment, it can be challenging to determine the relative importance of each factor intuitively (Zhang et al. 2023).
CCME-WQI classifies the inland river water in the poor category, which implies that the WQ is almost always threatened or impaired, and BC-WQI leads to the same classification. However, exact information about the water parameters is needed to find the source of pollution. A limitation of any WQI is that, although it can be used for managing water bodies, it should not replace a thorough review of environmental modeling. It can, however, offer a comprehensive summary of environmental performance. Based on this comparison of index values from different methods for the river water samples, regular audits of water withdrawal and its quality by the competent regional authorities are suggested. In addition, WQ monitoring and management should be prioritized to safeguard water resources from contamination and to provide technologies that make water suitable for residential and drinking uses.
Key takeaway points from our recent study on the WQ of the Daman Ganga River in Vapi, Gujarat, are as follows:
(1) The varimax rotation enhances the study by reducing the ambiguity, making the loadings more distinct. This aided in clearly identifying the variables that significantly contribute to each factor.
(2) Our analysis identified that the most significant factors affecting WQ are related to hardness (HAR_Ca, Ca, HAR_Total) and salinity (SAR, Na, Cl), with Temp and pH also playing crucial roles.
(3) These findings underscore the importance of monitoring these specific parameters to effectively manage and improve the WQ.
Implications for health, efficiency, and economics
Encroachment of high salinity and hardness in the water directly impacts the nearby soil. This may affect future land use patterns and the nearby agricultural and forest areas. One major potential implication is that, because the Daman Ganga River ends at the Arabian Sea, pollution in the river will also affect the marine aquatic ecosystem.
Both ecological systems and human health are significantly impacted by the WQI. Increased levels of SAR, HAR_Ca, and P-Tot indicate poor WQ, which contributes to waterborne diseases that affect both human and agricultural health. In terms of ecology, these pollutants change the quantities of dissolved oxygen, which impacts fish and plants, and too much phosphorus causes eutrophication, which further disturbs biodiversity. Therefore, WQI aids in the early detection of these threats, directing strategies to lower pollution and safeguard ecosystems.
Although unique to Vapi, the study's conclusions can be applied to other industrial areas with comparable pollution levels. The elevated HAR_Ca, SAR, and P-Tot values in Vapi are indicative of pollutants found in many industrial areas. A strong framework is provided by the combination of PCA and factor analysis with the WQI; nevertheless, parameter selection and weighting may need to be adjusted to accommodate local circumstances. This flexible approach makes the methodology useful as a template while taking into account distinct geographic and industrial situations across various regions, allowing for region-specific WQ assessments and efficient environmental management measures.
There are significant financial and environmental advantages to using the WQI as a benchmark, particularly in industrial locations like Vapi. By encouraging industries to enhance wastewater management, minimizing the contamination load on public water systems, and lowering treatment costs, the WQI can result in cost savings in water treatment. In addition, WQI requirements drive industrial compliance by encouraging businesses to implement pollution control systems. Improved WQ increases production efficiency and quality, which benefits sectors such as agriculture and pharmaceuticals that rely on clean water. Better WQ also benefits communities and lessens financial burdens by minimizing waterborne illnesses and thereby lowering healthcare expenses.
Recommendations for future research
Geographic Information Systems and WQI models should be integrated in future studies to evaluate and visualize the effects of urban stressors on WQ. Machine learning techniques might improve the accuracy of prediction and categorization. It is crucial to examine how environmental factors affect WQ patterns and how WQ standards have changed over time, and to incorporate biological indicators for a comprehensive evaluation. In addition, concentrating on pollution sources and seasonal fluctuations would facilitate efficient control. Advanced technologies such as remote sensing and real-time monitoring can improve evaluations and inform sustainable water resource management methods for ecosystem preservation.
Stakeholder implications and societal impact
Studies from other industrial regions have also identified hardness, salinity, and chemical pollutants as dominant factors in determining WQ, which aligns with this study's identification of HAR_Ca, SAR, and P-Tot as key contributors requiring immediate attention. By providing a clear and quantifiable assessment of WQ in the Vapi region, the study equips policymakers with essential information for informed decision-making on water management strategies, allowing them
(1) to educate local communities about the management of water resources,
(2) to promote environmental supervision for sustainable management practices in industrialized regions,
(3) to implement effective strategies that safeguard WQ, giving priority to the significantly impacting parameters,
(4) to ensure integrity and diligence through strict oversight mechanisms, and
(5) to assess the impact on groundwater of seepage of impurities present in surface flow.
ACKNOWLEDGEMENT
The authors acknowledge the assistance provided by the State Water Data Centre, Gandhinagar, Government of Gujarat.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.