The Mid-Gangetic Plain, a vital farmland in India, faces increasing groundwater quality deterioration due to anthropogenic activities. This study aimed to assess groundwater quality and contamination sources in the region utilizing statistical methods. A total of 78 groundwater samples were collected and analyzed using standard methods. The hydrochemistry analysis of samples revealed that several parameters such as Ca2+, Mg2+, HCO3, NO3, F and PO43− surpassed the limits prescribed by the Bureau of Indian Standards (BIS). The principal component analysis yielded three significant factors, explaining 68.96% variation, highlighting geogenic and anthropogenic influences on groundwater chemistry. Hierarchical cluster analysis categorized groundwater into three clusters based on the parameters with similar trends of variation. Furthermore, discriminant analysis identified four significant variables (Mg2+, F, Cl and NO3) responsible for creating the distinction among the identified clusters. Hydrogeochemical categorization and multivariate statistical analyses indicated that rock–water interaction, weathering, leaching and anthropogenic activities collectively influenced groundwater quality throughout the studied region. The Water Quality Index reveals that 59% of samples have good water quality, while 41% exhibit poor quality predominantly concentrated in the south-western, south-eastern and central regions. This study demonstrates the efficacy of statistical techniques to interpret complex datasets and grasp water quality dynamics, enhancing groundwater management.

  • Statistical analysis revealed that geogenic and anthropogenic factors significantly influenced the groundwater chemistry.

  • Discriminant analysis identified Mg2+, F, Cl and NO3 as key variables in shaping distinct water quality clusters.

  • The WQI revealed that 59% of samples have good water quality, while 41% exhibit poor quality predominantly concentrated in the south-western, south-eastern, and central regions.

GW: groundwater
PCA: principal component analysis
HCA: hierarchical cluster analysis
DA: discriminant analysis
WQI: Water Quality Index
GIS: Geographical Information System
VF: varifactor
WHO: World Health Organization
BIS: Bureau of Indian Standards
APHA: American Public Health Association
CGWB: Central Ground Water Board

Groundwater (GW) stands as a widely recognized invaluable resource globally, serving as a crucial source for a multitude of applications. It is an essential part of the hydrological cycle and provides a steady supply of fresh water for use in agriculture, industry and home usage. Yet, a range of human and natural elements can lead to the deterioration and pollution of GW, rendering it susceptible to adverse impacts. According to a report published in the year 2019 by the WHO/UNICEF Joint Monitoring Program for Water Supply, global access to safe drinking water increased from 61 to 71% between 2000 and 2017. However, 2.2 billion people still lacked safe drinking water, with 785 million lacking basic services. In India, with a population of 1.4 billion, 35 million individuals do not have access to clean drinking water, and 678 million lack access to proper sanitation facilities (Water.org 2023). India, the largest consumer of GW globally, using approximately 251 billion cubic meters annually which accounts for more than one-quarter of the global total, faces challenges due to unregulated usage, leading to dried-up wells and declining water levels (Niti Aayog 2021). GW is predominantly used in the agricultural sector (87%), followed by domestic (10%) and industrial (3%) sectors (Alam et al. 2024). GW resources are the predominant source of potable water in most of the rural regions in India since there is no access or limited access to public water supply or a reliable supply of clean drinking water. Furthermore, inhabitants in metropolitan areas are increasingly dependent on GW because of the inconsistent and insufficient availability of water resources (Sreedhar et al. 2019). The Gangetic plains in India, are currently facing significant threats from GW contamination due to both natural and anthropogenic inputs (Mandal et al. 2019; Mukherjee et al. 2020). 
The decline in GW quality is a result of increasing GW withdrawals, water constraints, rapid development, extensive industrialization and urbanization (Alam & Kumar 2023; Kumar et al. 2019a, 2019b, 2023). Numerous geogenic processes, such as bedrock degradation and mineral dissolution, have been associated with the degradation of GW quality. Along with geogenic processes, agricultural activities, and industrial and municipal discharge also influenced the GW (Maity et al. 2020; Alam & Singh 2022). These activities lead to the introduction of a variety of trace elements and pollutants such as nitrate, fluoride and arsenic and have a direct effect on the physicochemical characteristics of GW (Chen et al. 2017; Kumar et al. 2022). Due to their potential to harm human health, these pollutants represent a serious threat. Consequently, quantifying the extent of GW pollution necessitates a comprehensive assessment.

The Water Quality Index (WQI) is a widely used numerical indicator that provides a broad evaluation of water quality from the physicochemical properties of water. Owing to its versatility, adaptability and statistical compactness, it facilitates the interpretation of complex water quality data and has gained significant interest among researchers (Alam & Singh 2022; Hinge et al. 2022; Gad et al. 2023). GIS is a valuable tool for mapping water quality, and several researchers have used it to assess GW quality under various conditions (Nas & Berktay 2010; Marko et al. 2014; Assiuti & Governorate 2020; Gad et al. 2023). The integration of the WQI with GIS enables the assessment of water quality in inaccessible or geographically challenging areas: the water quality at ungauged locations is estimated from the water quality conditions at neighboring gauged sites (Alam et al. 2024). Water quality has also been analyzed in several studies using multivariate statistical approaches such as principal component analysis (PCA) and hierarchical cluster analysis (HCA). These methods have been effectively utilized in numerous recent studies to investigate the factors contributing to water pollution and gain a comprehensive understanding of water quality dynamics (Kumar et al. 2018; Das et al. 2019; Sreedhar et al. 2019; Govind et al. 2021; Panghal & Bhateria 2021; Hinge et al. 2022; Mishra & Lal 2023). In addition to HCA, discriminant analysis (DA) is applied in the current study to check the reliability of the HCA results. DA is a statistical technique that identifies patterns in complex datasets and the variables most responsible for the differences between groups.
Few studies have applied DA to determine the parameters most responsible for the formation of cluster patterns (Ajorlo et al. 2013; Yang et al. 2016; Chen et al. 2018; Masood et al. 2022). Alam & Singh (2023) used PCA and HCA for source apportionment of contamination in a regional aquifer in the Gaya district of south Bihar. The authors found that over-exploitation of GW triggered geogenic processes that raised fluoride levels across the region. Govind et al. (2021) assessed the suitability of GW for agriculture and human consumption in the Arwal and Jehanabad districts of south Bihar, using the WQI to determine its suitability for consumption and PCA to identify the probable sources of contamination. They concluded that a significant part of the Mid-Gangetic Plain in south Bihar has fluoride levels above prescribed limits and that GW in the region is adversely affected by both lithological and anthropogenic sources.

The selected study area lies in the Mid-Gangetic Plain of south Bihar which is considered as a significant farmland in India. As per the census (2011), a population of about 2.5 million lives within the study area and GW is the sole supply of drinking water for this population. The water quality in the research region may have been adversely affected because of inadequate sewage drainage infrastructure, significant agricultural practices, and over-exploitation of GW resources. People have been greatly impacted by GW contamination. Based on the information provided in an earlier section we identified that one significant research gap is the absence of a baseline assessment of GW quality in the study area. The hydrochemistry of GW in the Nalanda district has not been reported and analyzed using statistical methods, and there has been no effort made to identify the potential contamination sources. Along with baseline assessment, understanding the sources and causes of GW contamination is crucial for effective management and mitigation strategies. A comprehensive study to establish the status of GW quality, identification of pollution patterns and pollution source characterization is essential.

This paper aims to fill the identified research gap. The objective of the current research is to evaluate the status of GW quality and check its suitability for human consumption by utilizing the WQI. It also seeks to identify potential contamination sources and factors affecting GW quality variation using multivariate statistical techniques, while determining GW quality distribution patterns through GIS analysis. This assessment offers vital insights for planners, authorities, and policymakers regarding current GW quality for drinking purposes, aiding in the development of effective management strategies applicable not only to our specific region but also to similar geographic areas facing similar challenges worldwide. It also demonstrates the innovative use of multivariate statistical techniques to interpret complex data and understand water quality dynamics.

Description and hydrogeological settings of the study area

The study area, Nalanda district (Figure 1), is situated around 80 km from Patna, the capital of Bihar State, India. The district is home to the UNESCO World Heritage Site known as the Nalanda Mahavihara. The Nalanda district is in the Mid-Ganga basin, on the Gangetic plains' southern boundary. The latitude and longitude of the district boundaries lie between 24°57′57.78″N and 25°27′39.636″N and 85°9′54.9″E and 85°55′27.084″E, respectively. Except for the Rajgir hills located in the southern part of the district, it is mostly flat alluvial land and has an area of around 2,367 km2. The drainage of the research region is mostly governed by the Ganga. The entire study area consists of two geological units namely quaternary-sediments (Q) in the Southern most regions and undivided Precambrian rocks (PC) in the north (Figure 1). The major soil varieties in Nalanda district are clay loam, loam, fine loam, and coarse loam. The district receives 1,002.2 mm of rain on average per year, with the bulk (92.55%) falling from June to October (CGWB 2013). Nalanda's weather could be categorized as sub-tropical to sub-humid, with very cold winters and scorching summers (CGWB 2013).
Figure 1

Hydrogeological map with sampling locations of Nalanda district.


Water sample collection and analysis

To choose the sampling locations, a systematic grab sampling strategy was employed by dividing the entire research area into 5 × 5 km grids (Kumar et al. 2018). After that, the selection of sampling locations within each grid was done in a manner that ensured representation of the entire area and the livelihoods associated with it. A total of 78 samples (n = 78) of GW were collected across different marked sampling locations during the month of June 2022 in the pre-monsoon season in Nalanda district, India (Figure 1). The entire set of samples was acquired from unconfined subsurface sources, comprising bore wells and hand pumps with depths varying between 5 and 25 m throughout the study area. High-density polyethylene screw-capped bottles of capacity 1 L were pre-sterilized and cleaned with deionized water and were utilized to collect GW samples. Prior to sampling, the source of sampling underwent flushing for 3–5 min to remove any stagnated water, and the sampling bottles were rinsed twice using source water to avoid any possible outer contamination. Immediately following sampling, parameters including pH and electrical conductivity (EC) were promptly assessed in situ utilizing a Thermo-Scientific Orion Star A2140 (S.no-X34633) multiparameter kit. The obtained samples were then passed through filtration using a 0.45-micron membrane syringe filter. A portion of filtered samples underwent acidification to avoid any possible wall deposition to preserve the chemical integrity of the water sample and prevent reactions that occur during the storage (APHA 2017; Kumar et al. 2019a, 2019b). To avoid matrix decomposition, the non-acidified samples were stored away from light at 4 °C, until the assessment was finished (APHA 2017). 
The samples of GW were examined for their physiochemical characteristics, such as pH, EC, total dissolved solids (TDS), total hardness (TH), total alkalinity (TA), dissolved oxygen (DO), calcium (Ca2+), magnesium (Mg2+), sulfate (SO42−), phosphate (PO43−), nitrate (NO3−), fluoride (F), chloride (Cl), and bicarbonate (HCO3). The quantification of Ca2+ and Mg2+ in the samples was assessed through the conventional EDTA titrimetric methods, as outlined in APHA (2017) guidelines. Chloride levels were determined using the argentometric method and bicarbonate was determined through titration method. Sulfate and nitrate levels were determined utilizing a UV-spectrophotometer (Thermo-Scientific Evolution-201, S.no-V05044). The concentration level of fluoride was quantified via potentiometric analysis using an ion selective electrode (Thermo-scientific, electrode Sno. UQ1-10725) (Kumar & Kumar 2015). The American Public Health Association's recommendations for the study of GW samples were followed in conducting the quantification and analysis of these parameters. These techniques guarantee the correctness, dependability, and comparability of the results.

Data analysis and methodology

The analytical results were summarized using descriptive statistics and compared with the standard limits set by the BIS (2012). Pearson's correlation analysis was used to examine the associations among the physicochemical characteristics of the samples. All statistical and data analyses were performed using XLSTAT and IBM SPSS Statistics v26. All maps were drawn using ArcGIS 10.8. Figure 2 presents a flowchart of the procedures involved in assessing the quality of GW for consumption.
Figure 2

Methodological flowchart.


Principal component analysis

Multivariate statistical methods are beneficial for establishing relationships and finding links among extensive and diverse datasets. PCA is a statistical approach employed to reduce dimensions (Jolliffe & Cadima 2016; Hinge et al. 2022). PCA reduces numerous potentially linked variables into a reduced set of linearly uncorrelated variables referred to as principal components (PC). This transformation preserves the majority of information included in the raw dataset. Following PCA, the subsequent procedure is factor analysis (FA), which aims to further optimize the effectiveness of the data acquired via PCA. In PCA, weighted observable variables are combined linearly; the weights are established using eigenvalue calculations, and only components having eigenvalues higher than 1 are retained (Alam & Singh 2023).
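
As a rough illustration of the retention rule described above, the following Python sketch standardizes a sample-by-variable matrix, extracts the eigenvalues of its correlation matrix and keeps the components with eigenvalues greater than 1. It assumes NumPy is available; the function name `pca_kaiser` and the synthetic data are illustrative and not taken from the study, and the rotation step of factor analysis is omitted.

```python
import numpy as np


def pca_kaiser(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    data: array-like of shape (n_samples, n_variables).
    Returns all eigenvalues (descending), the explained-variance fractions,
    and the loadings and scores of the retained components.
    """
    x = np.asarray(data, dtype=float)
    # Standardize so that variables measured in different units are comparable.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # eigenvalue-greater-than-one retention rule
    explained = eigvals / eigvals.sum()
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    scores = z @ eigvecs[:, keep]
    return eigvals, explained, loadings, scores


if __name__ == "__main__":
    # Synthetic stand-in data: two correlated and two independent variables.
    rng = np.random.default_rng(0)
    common = rng.normal(size=(78, 1))
    demo = np.hstack([common + 0.3 * rng.normal(size=(78, 2)),
                      rng.normal(size=(78, 2))])
    eigvals, explained, loadings, scores = pca_kaiser(demo)
    print("eigenvalues:", np.round(eigvals, 3))
    print("retained components:", loadings.shape[1])
```

Working on the correlation matrix rather than the covariance matrix is a deliberate choice here, since hydrochemical variables span very different units and ranges.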

Hierarchical cluster analysis

HCA is a data mining method which groups the cases or observations on the basis of similarities in measurable variables. Its goal is to find natural clusters or groups among a set of information. HCA is a popular way of clustering water quality characteristics in GW. This approach divides variables into clusters based on their commonalities, with each cluster reflecting a distinct process inside the system. The technique operates on the principle of building a binary data tree, gradually merging similar point groups to achieve significant homogeneity within clusters and noticeable heterogeneity between clusters (Zhou et al. 2007). By finding these clusters, we gain understanding of the underlying causes of trends and associations in the data being examined (Sreedhar et al. 2019). In the current study, HCA was conducted on the experimental data with the aid of Ward's linkage technique. The results are depicted in a 2-D plot known as a dendrogram which displays the relationships between groups and their closeness. This dendrogram offers a simplified representation of the dimensionality of the original data.
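
The merging principle behind Ward's linkage can be sketched in a few lines of plain Python. The example below is a minimal, unoptimized illustration and not the software routine used in the study: at each step it merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. The function name `ward_clusters` and the toy points are hypothetical, and the inputs are assumed to be standardized beforehand.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    points: list of equal-length numeric tuples (assumed standardized).
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [{"members": [i], "centroid": list(p)}
                for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                dist2 = sum((x - y) ** 2
                            for x, y in zip(ca["centroid"], cb["centroid"]))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
    print(ward_clusters(toy, 3))
```

Recording the merge cost at every step would give the heights needed to draw a dendrogram; that bookkeeping is left out to keep the sketch short.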

Discriminant analysis

DA is a commonly used multivariate statistical technique. It encompasses a set of ordination methods that identify linear combinations of observed variables that optimize the segregation of samples into distinct classes. Its fundamental concept involves summarizing the patterns in the dataset and then creating discriminant functions (DFs), also called canonical functions, which are linear combinations of the discriminating parameters and are used to classify new data into distinct groups (Ajorlo et al. 2013; Chen et al. 2018). It produces a DF for each group in the following manner (Masood et al. 2022):
$$f(X_i) = Y_i + \sum_{j=1}^{n} z_{ij}\, y_{ij} \qquad (1)$$
where $i$ denotes the group ($X$), $Y_i$ is a constant specific to each group, $n$ is the number of variables utilized to distinguish a dataset into a particular group, and $z_{ij}$ represents the weight coefficient allocated by DA to the given parameters ($y_{ij}$). The first canonical function is the linear combination of parameters that maximizes the ratio of intergroup variance to intragroup variance along a specific dimension (Masood et al. 2022). Stepwise DA was used to derive the functions for validating the clusters obtained from cluster analysis. Stepwise DA was effective in classifying the clusters, and it offered the benefit of assessing accuracy through cross-validation.
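
To make Equation (1) concrete, the sketch below evaluates one linear discriminant function per group and assigns a sample to the group with the highest score. All constants and weights are hypothetical placeholders, not coefficients from this study, and the names `discriminant_scores` and `classify` are introduced only for illustration.

```python
def discriminant_scores(sample, constants, weights):
    """Evaluate f(X_i) = Y_i + sum_j z_ij * y_ij for every group i.

    sample:    dict mapping parameter name -> measured value (y_ij)
    constants: dict mapping group name -> constant Y_i
    weights:   dict mapping group name -> {parameter name: weight z_ij}
    """
    scores = {}
    for group, constant in constants.items():
        scores[group] = constant + sum(
            w * sample[name] for name, w in weights[group].items()
        )
    return scores


def classify(sample, constants, weights):
    """Assign the sample to the group with the largest discriminant score."""
    scores = discriminant_scores(sample, constants, weights)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical coefficients for two clusters and two parameters (mg/L).
    constants = {"cluster_1": -1.0, "cluster_2": -5.0}
    weights = {
        "cluster_1": {"Mg": 0.01, "NO3": 0.0},
        "cluster_2": {"Mg": 0.05, "NO3": 0.1},
    }
    print(classify({"Mg": 200.0, "NO3": 50.0}, constants, weights))
```

In a stepwise analysis, only the variables that survive the selection procedure would appear in `weights`; the remaining parameters simply carry no coefficient.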

WQI for drinking

The WQI is of paramount importance in the evaluation of GW quality since it aids in establishing its appropriateness for consumption. It functions as a quantitative measure that provides a comprehensive evaluation of water quality in a specific location by leveraging its physiochemical characteristics (Ramakrishnaiah et al. 2009; Adimalla & Qian 2019). The weighted arithmetic WQI approach, initially introduced by Horton (1965) and subsequently enhanced by Brown et al. (1970), was used in this study to evaluate the WQI of every sample location in Nalanda. The WQI was computed based on the drinking water quality standards recommended by the BIS (2012). The assessment included 13 physiochemical characteristics, namely pH, TDS, TA, TH, Ca2+, Mg2+, HCO3, NO3, Cl, SO42−, F, PO43− and DO to evaluate water quality for human consumption at each sampling site. Mathematically, the WQI can be represented as follows:
$$\mathrm{WQI} = \sum_{i=1}^{n} W_i Q_i \qquad (2)$$
where $\sum_{i=1}^{n} W_i = 1$, $n$ = total number of water quality variables, $W_i$ = unit weight for the ith variable, and $Q_i$ = sub-index for the ith variable.
The unit weight for each parameter is calculated as:
$$W_i = \frac{K}{S_i}$$
where $K$ = proportionality constant and $S_i$ = standard value of the ith parameter. The value of the proportionality constant can be calculated using the formula:
$$K = \frac{1}{\sum_{i=1}^{n} \left(1/S_i\right)}$$
The value of the sub-index or quality rating for each parameter can be calculated using the following equation (Palliyakkal & Rajan 2018):
$$Q_i = 100 \times \frac{V_n - V_i}{S_i - V_i} \qquad (3)$$
where $V_n$ = observed value of the ith variable at an individual sampling point, $V_i$ = ideal value of the ith parameter and $S_i$ = standard value of the ith parameter. Except for pH ($V_i = 7$) and dissolved oxygen ($V_i = 14.6$), all other parameters have ideal values of 0.
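
The weighted arithmetic index can be sketched as a short Python function. This is a minimal illustration under stated assumptions: unit weights are taken as inversely proportional to the standard values and normalized to sum to 1, the sub-index follows the quality-rating formula given above, and the sample values and the reduced parameter set in the usage example are hypothetical rather than data from the study.

```python
def wqi(observed, standards, ideals=None):
    """Weighted arithmetic Water Quality Index.

    observed:  dict parameter -> observed value V_n
    standards: dict parameter -> standard value S_i
    ideals:    dict parameter -> ideal value V_i (defaults to 0 if absent)
    """
    ideals = ideals or {}
    # Proportionality constant K = 1 / sum(1 / S_i), so the weights sum to 1.
    k = 1.0 / sum(1.0 / s for s in standards.values())
    total = 0.0
    for name, s in standards.items():
        w = k / s                                   # unit weight W_i
        v_ideal = ideals.get(name, 0.0)
        q = 100.0 * (observed[name] - v_ideal) / (s - v_ideal)  # sub-index Q_i
        total += w * q
    return total


if __name__ == "__main__":
    # Illustrative limits (mg/L except pH) and a hypothetical sample.
    standards = {"pH": 8.5, "TDS": 500.0, "TH": 200.0, "NO3": 45.0, "F": 1.0}
    ideals = {"pH": 7.0}
    sample = {"pH": 7.3, "TDS": 410.0, "TH": 340.0, "NO3": 12.0, "F": 0.5}
    print(round(wqi(sample, standards, ideals), 2))
```

Because the weights scale with 1/S_i, parameters with small standard values (fluoride in this example) dominate the index, which is worth keeping in mind when interpreting the result.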

Geostatistical analysis

The present research used a geostatistical model to examine and depict the spatial distribution of water quality within the study region (Thomas 2023). Ordinary kriging with a log transformation was used to fit several semivariogram models, including Circular, Spherical, Exponential and Gaussian (Nas & Berktay 2010). These models were applied to the GW quality dataset to identify the most appropriate model for assessing the spatial variability of the WQI. Kriging, being a stochastic technique, relies on a statistical model to analyze the data. Spatial interpolation using kriging also yields standard errors, which quantify the uncertainty of the predicted values. Kriging methods perform best when the data approximate a normal distribution, so transformations were applied to achieve normality and to satisfy the condition of equal variability of the data. Prediction performance was assessed through cross-validation, enabling the selection of the model that yields the most precise predictions. For an accurate model, the mean standardized error (MS) should be close to 0. Moreover, both the root-mean-square error (RMSE) and the average standard error (ASE) should be as small as possible, particularly when comparing models (Omran 2012). Furthermore, the root-mean-square standardized error (RMSS) should ideally approach 1 (Johnston et al. 2001; Hossain et al. 2020).
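
The cross-validation criteria listed above can be computed from three lists: the observed values, the kriging predictions at the same points, and the kriging standard errors. The sketch below assumes the commonly used definitions of these statistics; the function name `cross_validation_stats` and the numbers in the usage example are illustrative only.

```python
from math import sqrt


def cross_validation_stats(observed, predicted, std_errors):
    """Summary statistics for comparing semivariogram models.

    observed, predicted: measured and cross-validated values at each site
    std_errors:          kriging standard error of each prediction
    """
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "ME": sum(errors) / n,                               # mean error
        "RMSE": sqrt(sum(e * e for e in errors) / n),        # should be small
        "ASE": sqrt(sum(s * s for s in std_errors) / n),     # should be small
        "MS": sum(standardized) / n,                         # should be near 0
        "RMSS": sqrt(sum(z * z for z in standardized) / n),  # should be near 1
    }


if __name__ == "__main__":
    stats = cross_validation_stats(
        observed=[62.0, 48.5, 75.2, 55.1],
        predicted=[60.3, 50.0, 71.9, 57.4],
        std_errors=[2.5, 2.0, 3.1, 2.2],
    )
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

Running this for each candidate model and comparing the dictionaries side by side mirrors the selection logic described in the text: an RMSS well above 1 points to underestimated prediction variability, and one well below 1 to overestimated variability.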

General hydrochemistry

Table 1 presents descriptive statistics for the physicochemical characteristics of the 78 GW samples collected in Nalanda district. The analysis confirms a notable variation in physicochemical characteristics across the study area. The recommended pH range of water fit for human consumption is 6.5–8.5 (BIS 2012). In all samples, pH was within the recommended BIS limits. The GW samples collected in the Nalanda district were predominantly alkaline in nature. EC is a quantitative measure of water's capacity to carry electric current (Alam et al. 2024). It is influenced by the quantity of ions in water, which conduct electricity, and it is a general indicator of overall salinity. A significant disparity in EC was detected across the research area. TDS offers a comprehensive evaluation of water quality, encompassing a broader range of dissolved substances and providing more detailed insights than EC (Azhdarpoor et al. 2019). The TDS levels in the samples ranged from 188.20 to 1,856 mg/L, with an average concentration of 411.89 mg/L. Most of the samples were found to exceed the desirable TDS concentration. According to the classification of Subba Rao (2017), 10% of the samples are very fresh water types, 87% are fresh water types and only 4% are brackish in nature. Alkalinity describes the ability of water to counteract the effects of acid. The primary causes of alkalinity in GW are carbonates, bicarbonates, hydroxides, and other natural components (Panghal & Bhateria 2021). The water samples exhibited a total alkalinity range of 22.935 to 69.570 mg/L, with a mean value of 37.470 mg/L. According to BIS, the acceptable limit for TH is 200 mg/L, whereas the permissible limit is 600 mg/L. It was observed that 96% of the samples were below the permissible limit. 
Elevated hardness value above 250 mg/L may cause stones in the kidney, and consumption of water with excessive hardness leads to stomach-related diseases and may end up with permanent damage to the stomach (Patil & Patil 2010).

Table 1

Descriptive statistics of physicochemical characteristics of GW samples

| Parameter | Min. | Max. | Mean | Std. deviation | WHO (2017) | BIS IS 10500:2012 |
|---|---|---|---|---|---|---|
| pH | 6.74 | 7.740 | 7.294 | 0.199 | 6.5–8.5 | 6.5–8.5 |
| EC | 44.50 | 3,790 | 776.459 | 490.420 | – | 750 |
| TDS | 188.20 | 1,856 | 411.895 | 233.297 | 600–1,000 | 500 |
| TA | 22.935 | 69.570 | 37.470 | 8.290 | 200 | 200 |
| TH | 151.965 | 782.85 | 338.586 | 107.973 | 200 | 200 |
| DO | 1.170 | 7.74 | 2.428 | 0.945 | – | – |
| Ca2+ | 59.865 | 419.05 | 167.610 | 65.371 | 75 | 75 |
| Mg2+ | 4.605 | 409.84 | 170.975 | 77.822 | 50 | 30 |
| F | 0.180 | 1.5 | 0.519 | 0.287 | 1.5 | 1–1.5 |
| Cl | 12.009 | 754.146 | 56.256 | 87.416 | 200 | 250 |
| NO3 | 0.018 | 69.77 | 12.641 | 17.929 | 50 | 45 |
| SO42− | 10.615 | 172.054 | 47.579 | 28.305 | 250 | 200 |
| PO43− | 0.089 | 0.67 | 0.146 | 0.094 | – | 0.1 |
| HCO3 | 229.350 | 695.695 | 374.703 | 82.904 | 300 | 300 |

Note: All parameters are represented in mg/L except for pH (unitless) and EC is in μS/cm.

Major ions

In the present study, the cationic dominance follows the sequence Ca2+> Mg2+, whereas the anionic dominance is in the sequence HCO3> Cl> SO42−> NO3> F> PO43−. The abundance of calcium and magnesium represents the property of freshwater systems. Both cations are major contributors to the hardness of water. The recommended BIS value for calcium is 75–200 mg/L, and for magnesium, it is 30–100 mg/L. The calcium concentration in 85% of the samples was found in the range of BIS prescribed limits, while the magnesium concentration in 87% of the samples was found above the BIS limiting value. Long-term intake of water excessively enriched with calcium and magnesium may lead to cardiovascular illness, diarrhea, and retarded growth in children (Mandal et al. 2019; Mukherjee et al. 2020).

Elevated chloride levels act as a pollution tracer in GW (Sadat-Noori et al. 2014). High chloride intake may cause hypertension, be laxative, and affect the metabolism of the human body (Ramakrishnaiah et al. 2009; Adimalla & Qian 2019). The samples exhibited a range of chloride concentrations, spanning from 12 to 754.15 mg/L, with an average value of 56.256 mg/L. Except for one sample with a concentration of 754.15 mg/L, all the samples were confirmed to be under the permissible BIS level of 250 mg/L. The SO42− concentration in all the samples was found to be within the BIS acceptable limit (200 mg/L). The phosphate content in 60% of samples was found near the BIS limit (0.1 mg/L). A total of 40% of the samples exceeded the acceptable limits set by the BIS. The elevated levels of phosphate in GW may be attributed to the phosphate-rich sewage and detergent-containing wastewater coming from community drains (Alam et al. 2024). The bicarbonates in most of the samples surpassed the allowable level set by the BIS, which is 300 mg/L. The bicarbonates contribute to the alkalinity of the water. The bicarbonates in GW are generally imparted due to weathering of rocks, gaseous soil carbon-dioxide, and dissolution of carbonates. The range of nitrate concentration in the GW samples was 0.018–69.77 mg/L, with a mean concentration of 12.641 mg/L. The BIS maximum permissible value of nitrate in drinking water is 45 mg/L with zero relaxation. A total of nine samples (12% of the total samples) were found above the BIS permissible limit. Several factors, such as fertilizers, manure applied to the soil, and sewage released from septic tanks, can introduce nitrates into GW. Drinking water with high nitrate levels can raise the risk of blue baby syndrome in infants, stomach cancer and gastroenterological illness, especially for young children (Kumar et al. 2019a, 2019b; Jandu et al. 2021). 
Leaching of natural fluoride sources like fluoride-bearing rocks and sediments may be the factors contributing to the presence of fluoride in subsurface water. The amount of fluoride intake determines its impact on human health. Fluoride at the right dose (up to 1 mg/L) is essential for us to avoid dental cavities by strengthening tooth enamel, but exposure to a high fluoride dose (above 1.5 mg/L) can lead to dental fluorosis and exposure to such doses for a longer time may promote bone fluorosis (Fordyce 2011; Ahmad et al. 2022). The research observed a range of fluoride concentrations, ranging from 0.18 to 1.50 mg/L, with an average value of 0.519 mg/L. It was determined that all the samples fell within the permitted limit set by the BIS (1.5 mg/L).

Correlation analysis

Correlation analysis is a widely used technique for determining the associations among variables; it indicates their degree of association and helps to identify whether they originate from common sources (Suleiman et al. 2022). Figure 3 presents the Pearson's correlation matrix for the variables in the study region. pH is negatively correlated with Ca2+ (r = −0.47), Mg2+ (r = −0.23) and TH (r = −0.45). TDS is positively correlated with all the parameters and strongly correlated with EC (r = 0.85), Mg2+ (r = 0.65) and HCO3 (r = 0.67). The strong correlation between EC and TDS suggests that both measure a common property. Ca2+ and Mg2+ are only weakly positively correlated (r = 0.13), which suggests that, in addition to dolomite, they might enter GW from other sources. TH is strongly correlated with the hardness-causing cations Ca2+ (r = 0.70) and Mg2+ (r = 0.80), indicating that hardness is primarily due to these cations. Nitrate exhibits a positive correlation with Ca2+ (r = 0.37), Mg2+ (r = 0.48) and SO42− (r = 0.39), which suggests that GW chemistry is influenced by agricultural activities. The correlations among NO3, SO42−, Cl and PO43− are also positive, suggesting that these ions might have originated from anthropogenic inputs.
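
For readers who want to reproduce a correlation matrix of this kind, the following plain-Python sketch computes Pearson's r for every pair of variables. The helper names and the three short data columns are hypothetical; the study itself used statistical software for this step.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict variable name -> list of values (one per sample)."""
    names = list(columns)
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }


if __name__ == "__main__":
    # Hypothetical values for five samples.
    data = {
        "EC": [520.0, 780.0, 640.0, 1210.0, 450.0],
        "TDS": [300.0, 420.0, 350.0, 690.0, 260.0],
        "pH": [7.4, 7.2, 7.3, 7.0, 7.5],
    }
    matrix = correlation_matrix(data)
    for row, values in matrix.items():
        print(row, {k: round(v, 2) for k, v in values.items()})
```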
Figure 3

Correlation matrix of the physicochemical parameters of the study area.

Figure 4

Relationships: (a) molar ratio of Ca2+ and Mg2+, (b) ionic cross-plot of Ca2+ + Mg2+ vs. SO42− +HCO3, (c) ionic cross-plot of SO42− vs. Ca2+, (d) ionic cross-plot of Ca2+ + Mg2+ vs. HCO3, (e) ionic cross-plot of Ca2+ + Mg2+ vs. Cl and (f) ionic cross-plot of TDS vs. NO3 + Cl/HCO3−.


Hydrogeochemical processes

Ionic cross-plots (ICPs) serve as commonly employed graphical tools in GW chemistry, facilitating the analysis and visual representation of GW sample compositions. As GW chemistry is influenced by the interactivity among hydrogeological components of the aquifer, so the utilization of ICPs holds great importance in understanding the hydrogeochemical aspects of GW, offering valuable insights into the mechanisms governing chemical composition (Das & Kaur 2007; Subba Rao et al. 2019; Malik et al. 2021). The current research uses several ICPs, also called scatterplots, to obtain insights into the hydrogeochemical evolution of GW across the study region.

The Ca2+/Mg2+ ratio (Figure 4(a)) serves as an indicator of the sources of Ca2+ and Mg2+ in subsurface water. A Ca2+/Mg2+ ratio ≤ 1 suggests dolomite dissolution, a ratio > 1 indicates calcite dissolution, and a ratio > 2 indicates that silicate rock weathering may be a probable source of these ions in the GW (Subramani et al. 2010; Tiwari et al. 2020). In the current investigation, 50% of the samples have a Ca2+/Mg2+ ratio ≤ 1, suggesting that dolomite dissolution is the predominant source of these ions in the GW. A further 42% of the samples have a ratio between 1 and 2, indicating calcite dissolution, and the remaining 8% have a ratio > 2, suggesting that silicate rock weathering also contributes these ions to the GW (Boateng et al. 2016). To understand the influence of weathering and dissolution processes on the chemical composition of GW, the ionic cross-plot of Ca2+ + Mg2+ vs. HCO3 + SO42− is used (Kumar et al. 2006; Marghade et al. 2020). As depicted in Figure 4(b), approximately 90% of the GW samples in the current study region lie above the equiline (1:1) of Ca2+ + Mg2+ vs. HCO3 + SO42−. This indicates that the dominant process controlling the aquifer chemistry is reverse ion exchange (Zaidi et al. 2015; Subba Rao et al. 2019). To examine the impact of gypsum dissolution in the aquifer, the relationship between Ca2+ and SO42− was investigated (Barzegar et al. 2016). As depicted in Figure 4(c), the bulk of the samples lie beneath the theoretical line (1:1 ratio) of SO42− vs. Ca2+. This finding suggests that the sulfate in GW is not attributable to gypsum dissolution alone. SO42− might also be derived from Glauber's salt (Na2SO4·10H2O), which occurs under hotter climatic conditions, as shown in Equation (4), and from the weathering of sulfate-rich rock (He et al. 2019). 
Also, a lack of substantial correlation was found among calcium and sulfate (r = 0.187), indicating that the calcium and sulfate ions may not have a common origin.
$$\mathrm{Na_2SO_4 \cdot 10H_2O \rightarrow 2Na^{+} + SO_4^{2-} + 10H_2O} \tag{4}$$
The scatter plot of Ca2+ + Mg2+ versus HCO3 (Figure 4(d)) shows that most GW samples plot above the equiline (1:1). The dominance of alkaline earth elements (calcium + magnesium) over bicarbonate shows that silicate weathering influences the GW chemistry (Amadi et al. 2012). The excess of Ca2+ + Mg2+ over HCO3 in the samples lying above the equiline suggests additional non-carbonate sources of Ca2+ and Mg2+ (Malik et al. 2021). Magnesium exhibits a positive correlation with sulfate (r = 0.346), indicating that these ions share some common origins, such as Mg-rich silicates, epsomite and clay minerals (Datta & Tyagi 1996). The cross-plot of Ca2+ + Mg2+ versus Cl (Figure 4(e)) shows a very weak negative trend (r = −0.0157), hinting at a slight reduction in salinity with increasing calcium and magnesium concentrations. This pattern might be attributed to reverse ion exchange within the weathered stratum of the aquifer (Boateng et al. 2016).
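The Ca2+/Mg2+ screening rule described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' workflow: the function name `classify_ca_mg_source`, the assumption that both concentrations are expressed in the same equivalent units (meq/L), and the three sample pairs are all hypothetical.

```python
def classify_ca_mg_source(ca: float, mg: float) -> str:
    """Map a Ca2+/Mg2+ ratio onto the likely mineral source.

    Thresholds follow the text: <= 1 dolomite, 1-2 calcite, > 2 silicate.
    Assumption: both concentrations are in the same equivalent units (meq/L).
    """
    if mg <= 0:
        raise ValueError("Mg2+ concentration must be positive")
    ratio = ca / mg
    if ratio <= 1.0:
        return "dolomite dissolution"
    if ratio <= 2.0:
        return "calcite dissolution"
    return "silicate weathering"


# Hypothetical (Ca2+, Mg2+) pairs in meq/L, for illustration only.
samples = [(2.0, 4.0), (3.0, 2.0), (5.0, 2.0)]
labels = [classify_ca_mg_source(ca, mg) for ca, mg in samples]
print(labels)
```

Counting the labels over all samples would give the kind of percentage split reported above for the three source classes.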

Assessing the overall chemistry of GW that is influenced by human interventions in the aquifer is a complex procedure. For example, HCO3 and Cl ions in GW are mostly attributed to irrigation return flow. Sewer effluents from the community are additional sources of HCO3, Cl and NO3 ions, and agrochemicals applied to boost crop yields are a common source of SO42− and NO3 ions in GW (Jalali 2009; Li et al. 2016). TDS quantifies the overall content of dissolved salts in GW. Correlating TDS with dissolved ions such as HCO3, SO42−, Cl and NO3, which are significant pollutants that also enter GW from non-lithological sources such as household wastes and agricultural activities, gives insight into their respective origins. In the current study, TDS exhibited a positive correlation (Figure 3) with Cl, SO42−, NO3 and HCO3, evidence that anthropogenic inputs have a significant impact on the GW chemistry in the studied region (Subba Rao et al. 2019). A positive, though weak, correlation between TDS and (NO3 + Cl)/HCO3 (r = 0.17504), shown in Figure 4(f), supports the influence of anthropogenic inputs on GW pollution (Li et al. 2016).

KMO and Bartlett's test

The KMO and Bartlett's tests are statistical tests used to assess the suitability of a dataset for PCA/FA. For PCA, KMO values are classified as sufficient between 0.8 and 1, reasonably acceptable between 0.5 and 0.8, and unsatisfactory below 0.5 (Li et al. 2020; Masood et al. 2022). As shown in Table 2, the KMO measure of sampling adequacy for the current dataset was 0.613, which falls within the reasonably acceptable range. Bartlett's test of sphericity on the correlation matrix of the parameters was significant (chi-square = 1,023.15, df = 45, p < 0.00001). The significant Bartlett's test result shows that the variables in the dataset are correlated with one another, supporting the use of PCA to extract the main trends in the data. Only parameters with communality > 0.5 were considered for PCA.
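For reference, the two statistics are commonly defined as follows. These are the standard textbook forms and are not taken from the paper; the symbols are introduced here for illustration: n is the number of samples, p the number of variables, R the correlation matrix, r_ij the simple correlations and a_ij the partial correlations between variables i and j.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}},
\qquad
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R \rvert,
\qquad
df = \frac{p(p-1)}{2}
```

With the p = 10 parameters used here, df = 10 × 9 / 2 = 45, which agrees with the degrees of freedom reported for Bartlett's test.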

Table 2

Varimax rotated component matrix of the parameters, total variance and KMO validation

| Parameter | VF1 | VF2 | VF3 | Communalities |
| --- | --- | --- | --- | --- |
| pH | −0.418 | 0.603 | 0.053 | 0.541 |
| TDS | 0.810 | 0.400 | 0.065 | 0.819 |
| Ca2+ | 0.496 | −0.652 | −0.095 | 0.68 |
| Mg2+ | 0.799 | 0.061 | 0.183 | 0.676 |
| TH | 0.876 | −0.351 | 0.074 | 0.896 |
| F | 0.161 | 0.731 | −0.468 | 0.779 |
| Cl | 0.087 | 0.313 | 0.865 | 0.855 |
| NO3 | 0.699 | −0.160 | −0.096 | 0.523 |
| SO42− | 0.617 | 0.266 | −0.221 | 0.501 |
| HCO3 | 0.562 | 0.557 | 0.016 | 0.627 |
| Eigenvalue | 2.995 | 2.764 | 1.138 | − |
| % Variance | 29.946 | 27.645 | 11.376 | − |
| Cum. % variance | 29.946 | 57.59 | 68.966 | − |

Validation: KMO = 0.613; Bartlett's test: approx. chi-square = 1,023.15, df = 45, Sig. < 0.00001.

Bold values highlight the set of strongly loaded physicochemical parameters on the different varifactors.

Principal component analysis

PCA was used to identify the primary factors influencing GW quality in Nalanda district. A total of 10 GW quality parameters (pH, TDS, Ca2+, Mg2+, TH, F, Cl, NO3, SO42− and HCO3) were considered. The PCA yielded three significant factors (referred to as varifactors or VFs) with eigenvalues exceeding one, as shown in the scree plot (Figure 5(a)) (Helena et al. 2000; Das et al. 2019). Together, these three VFs account for 68.966% of the overall variability in the dataset. The eigenvalue of each varifactor and the corresponding percentage of explained variance are given in Table 2. Loadings in PCA represent the correlation between variables and factors. According to Liu et al. (2003), loadings are classified by absolute value as strong or robust (> 0.75), moderate (0.75 to 0.50) and weak or mild (0.50 to 0.30). The loadings of the variables on the derived varifactors are listed in Table 2, the corresponding component plot is shown in Figure 5(b), and both are discussed below.
Figure 5

(a) Scree plot of the eigenvalues and (b) component loadings in rotated space.
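The loading classes and the variance percentages can be cross-checked with a short script. This is a sketch under stated assumptions: the function name `classify_loading` and the "negligible" label for loadings below 0.30 are illustrative and not from the paper, while the VF1 loadings and eigenvalues are copied from Table 2.

```python
def classify_loading(loading: float) -> str:
    """Classify a PCA loading by absolute value (Liu et al. 2003 thresholds)."""
    size = abs(loading)
    if size > 0.75:
        return "strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "weak"
    return "negligible"  # label below 0.30 is an assumption, not from the paper


# VF1 loadings as listed in Table 2.
vf1 = {"pH": -0.418, "TDS": 0.810, "Ca2+": 0.496, "Mg2+": 0.799, "TH": 0.876,
       "F": 0.161, "Cl": 0.087, "NO3": 0.699, "SO42-": 0.617, "HCO3": 0.562}
vf1_classes = {name: classify_loading(value) for name, value in vf1.items()}

# With 10 standardized variables the total variance is 10, so each
# eigenvalue divided by 10 gives the share of variance it explains.
eigenvalues = [2.995, 2.764, 1.138]
n_variables = 10
explained = [100 * ev / n_variables for ev in eigenvalues]
cumulative = sum(explained)

print(vf1_classes)
print([round(x, 2) for x in explained], round(cumulative, 2))
```

The percentages computed this way agree with Table 2 up to rounding of the eigenvalues.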

VF1, with an eigenvalue of 2.995, explains 29.946% of the overall variation and holds strong loadings of TDS (0.810), TH (0.876) and Mg2+ (0.799), and moderate positive loadings of NO3 (0.699) and SO42− (0.617). The high loadings of TDS, TH, sulfate and magnesium, along with the moderate loading of bicarbonate (0.562) and the weak loading of calcium (0.496), suggest that geological processes such as the dissolution of minerals (carbonates, silicates and evaporites) and the weathering of host rocks have the greatest impact on the hydrochemistry of the aquifer in the studied area (Alam & Singh 2023). The loadings of nitrate and sulfate on VF1 can be associated with anthropogenic sources, namely domestic sewage, animal wastes and nitrogenous agricultural fertilizers, which reach the GW by leaching (Malik et al. 2021). VF1 therefore indicates that both anthropogenic and geogenic processes have affected the GW quality. Because nitrate, sulfate and magnesium load heavily on VF1, the spatial distribution of VF1 loadings (Figure 6(a)) shows high values in the central and south-western parts of the study region. Nitrate concentration is thus a crucial determinant of GW quality in the central and south-western regions of Nalanda district. Similar component loading patterns were observed in Aurangabad, Bihar (India) by Prasun & Singh (2024) and in Ejisu-Juaben Municipality, Ghana by Boateng et al. (2016). Those studies also concluded that agricultural activities had introduced nitrate contamination into GW; likewise, nitrate loads on the first factor here (Table 2).
Figure 6

Spatial distribution map of loadings on varifactors across the study area of (a) VF1, (b) VF2 and (c) VF3.


VF2, with an eigenvalue of 2.764, explains 27.645% of the overall variation and holds a high loading of F (0.731) and moderate loadings of pH (0.603) and HCO3 (0.557). The positive loadings of F and pH together with the negative loading of Ca2+ (−0.652) point to hydrogeochemistry associated with calcite-fluorite equilibria and rock–water interaction. The loadings of HCO3 and pH also reflect the alkaline nature of the GW. VF2 therefore represents a geogenic factor affecting GW quality (Hinge et al. 2022). The spatial distribution of VF2 loadings (Figure 6(b)) is scattered over the entire study region, with some concentration in the north-eastern and south-western parts. A high amount of HCO3 in water promotes alkalinity, which affects human health. Because fluoride strongly influences VF2, these zones correspond to elevated fluoride levels in the south-western and north-eastern regions. The South Bihar plains are severely affected by fluoride, and according to CGWB reports the districts neighboring Nalanda, namely Gaya, Nawada and Aurangabad, have high fluoride levels in GW. The current study also found elevated fluoride concentrations in the GW of the district, so fluoride is a major control on its geochemistry and has been extracted here as a factor with a strong positive loading (Table 2).

VF3, with an eigenvalue of 1.138, explains 11.376% of the overall variation and holds a strong loading of Cl (0.865). VF3 represents an anthropogenic factor influencing GW quality; the high loading of chloride may be associated with the application of bleaching powder in agricultural activities and with other possible sources such as wastewater and sewage from households. The distribution of VF3 loadings across the study region is shown in Figure 6(c). The map shows high loadings in the south-western region and in some parts of the eastern region, indicating that Cl is also a determinant of the regional GW chemistry.

Hierarchical cluster analysis

The same parameters used for PCA were used for cluster analysis. The dendrogram clustering the sampling sites is displayed in Figure 7(a). Cluster 1 comprises 55 sampling sites (70.51% of the total), Cluster 2 comprises three sampling sites (only 3.85%) and Cluster 3 comprises 20 sampling sites (25.64%). The dendrogram of the GW quality variables is depicted in Figure 7(b) and also indicates three clusters. Cluster 1 contains pH and Cl. The presence of Cl in this cluster suggests the influence of surface pollutants, agricultural practices and the leaching of minerals from sedimentary rocks (Sreedhar et al. 2019). Cluster 2 contains TDS, HCO3, SO42− and F and can be attributed to geogenic processes, including the dissolution of fluorite minerals and silicate weathering. Cluster 3 contains Ca2+, Mg2+, TH and NO3 and occupies 38.46% of the total grouping. It indicates that the hardness of the GW is mainly due to Ca2+ and Mg2+. Geogenic processes such as carbonate weathering, and anthropogenic inputs such as agricultural fertilizers and organic decomposition in soil, may be responsible for this combination (Alam & Singh 2023). The HCA results agree closely with those of PCA/FA and correlation analysis. Similar HCA results were obtained by Panghal & Bhateria (2021) in Beri block, Haryana (India), Masood et al. (2022) in Mewat district, Haryana, and Alam & Singh (2023) in Gaya district of Bihar (India). All these regions are densely populated and agriculturally active, like the current study area, and all these studies reported that both anthropogenic and natural processes are deteriorating the GW quality and are responsible for such cluster patterns.
Figure 7

Dendrogram representing (a) sampling site-wise cluster and (b) variable-wise cluster.
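The share of sampling sites in each site-wise cluster follows directly from the site counts. The short check below is only an arithmetic sketch; the variable names are illustrative, and the counts are the ones stated for Figure 7(a).

```python
# Number of sampling sites per cluster, as reported for Figure 7(a).
site_counts = {"Cluster 1": 55, "Cluster 2": 3, "Cluster 3": 20}

total_sites = sum(site_counts.values())  # should equal the 78 samples collected
shares = {name: round(100 * n / total_sites, 2) for name, n in site_counts.items()}

print(total_sites, shares)
```

The three shares add up to 100%, which is a quick consistency check on the reported grouping.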


Discriminant analysis

DA was conducted, with the water quality parameters as predictor variables, to determine the membership of GW samples in the water quality groups previously identified through HCA. The stepwise mode of DA was used, in which one variable is added or removed at each step according to its effect in minimizing the overall Wilks' lambda statistic. The aim of this stepwise approach was to pinpoint the variables that contribute most to the differences among the identified clusters. Stepwise DA generated two DFs (DF1 and DF2) and retained four significant variables (Mg2+, F, Cl and NO3) as the most important discriminating parameters. A statistical summary of these functions is provided in Table 3. The low Wilks' lambda and high chi-square values for both DFs, with p-value < 0.0001, indicate that the DA was reliable and efficient. DF1, with an eigenvalue of 3.073, accounts for 89.5% of the total variability among groups and has a high canonical correlation of 0.869. DF2, with an eigenvalue of 0.361, accounts for 10.5% of the group differences and has a moderate canonical correlation of 0.515. Discriminant scores were computed using the unstandardized coefficients (Table 4), while the standardized coefficients (Table 4) were used to evaluate the contribution of each independent parameter to the DFs. A larger absolute standardized coefficient indicates a greater influence of that variable in the DA (Masood et al. 2022). The DFs built from the unstandardized coefficients are given in the following equations:
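$$\mathrm{DF1} = -4.469 + 0.014\,\mathrm{Mg^{2+}} + 1.741\,\mathrm{F} + 0.066\,\mathrm{NO_3} + 0.007\,\mathrm{Cl} \tag{5}$$
$$\mathrm{DF2} = -1.147 - 0.004\,\mathrm{Mg^{2+}} + 3.173\,\mathrm{F} - 0.021\,\mathrm{NO_3} + 0.008\,\mathrm{Cl} \tag{6}$$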
where DF1 and DF2 are discriminant scores and all other parameters are independent variables.
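As an illustration of how a discriminant score is obtained, the sketch below applies the unstandardized coefficients listed in Table 4 to a single sample. The function name `discriminant_scores`, the dictionary layout and the sample concentrations (assumed to be in mg/L) are hypothetical and are not measured values from the study.

```python
# Unstandardized coefficients from Table 4 (Function a).
COEFFS = {
    "DF1": {"const": -4.469, "Mg": 0.014, "F": 1.741, "NO3": 0.066, "Cl": 0.007},
    "DF2": {"const": -1.147, "Mg": -0.004, "F": 3.173, "NO3": -0.021, "Cl": 0.008},
}
VARIABLES = ("Mg", "F", "NO3", "Cl")


def discriminant_scores(sample: dict) -> dict:
    """Return the DF1 and DF2 scores of one sample as a linear combination."""
    scores = {}
    for name, coeff in COEFFS.items():
        scores[name] = coeff["const"] + sum(coeff[v] * sample[v] for v in VARIABLES)
    return scores


# Hypothetical sample (concentrations assumed in mg/L); not a measured value.
example = {"Mg": 50.0, "F": 1.0, "NO3": 20.0, "Cl": 100.0}
scores = discriminant_scores(example)
print({name: round(value, 3) for name, value in scores.items()})
```

Plotting such score pairs for every sample gives a scatter of the clusters in the space of the two discriminant functions.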
Table 3

Wilks' lambda test and eigen value table of canonical discriminant functions

| Test of function(s) | Wilks' lambda | Chi-square | df | Sig. | Eigenvalue | % variance | Canonical correlation |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | 0.180 | 125.867 |  | 0.0001 | 3.073 | 89.5 | 0.869 |
|  | 0.735 | 22.654 |  | 0.0001 | 0.361 | 10.5 | 0.515 |
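The quantities in Table 3 are linked by standard relationships, so they can be cross-checked from the two eigenvalues alone. The sketch below uses the textbook identities (canonical correlation = sqrt(λ / (1 + λ)); Wilks' lambda = product of 1 / (1 + λ) over the functions being tested); the variable names are illustrative.

```python
import math

# Eigenvalues of the two discriminant functions, from Table 3.
eigenvalues = {"DF1": 3.073, "DF2": 0.361}

# Share of between-group variability carried by each function.
total = sum(eigenvalues.values())
pct_variance = {k: round(100 * v / total, 1) for k, v in eigenvalues.items()}

# Canonical correlation of each function.
canonical_corr = {k: round(math.sqrt(v / (1 + v)), 3) for k, v in eigenvalues.items()}

# Wilks' lambda: product of 1 / (1 + eigenvalue) over the functions tested.
wilks_1_through_2 = round(
    1 / ((1 + eigenvalues["DF1"]) * (1 + eigenvalues["DF2"])), 3
)
wilks_2 = round(1 / (1 + eigenvalues["DF2"]), 3)

print(pct_variance, canonical_corr, wilks_1_through_2, wilks_2)
```

The computed values reproduce the percentages of variance, canonical correlations and Wilks' lambda values listed in Table 3.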
Table 4

Canonical discriminant function coefficients

| Parameters | Functionᵃ DF1 | Functionᵃ DF2 | Functionᵇ DF1 | Functionᵇ DF2 |
| --- | --- | --- | --- | --- |
| Mg2+ | 0.014 | −0.004 | 0.759 | −0.208 |
| F | 1.741 | 3.173 | 0.450 | 0.820 |
| NO3 | 0.066 | −0.021 | 0.706 | −0.221 |
| Cl | 0.007 | 0.008 | 0.540 | 0.593 |
| (Constant) | −4.469 | −1.147 | – | – |

aUnstandardized coefficients.

bStandardized coefficients.

A plot of all discriminant scores in the space of the two DFs is depicted in Figure 8. DA establishes a centroid for each cluster group, and the plot shows effective discrimination between groups, with clear distances between group centroids. The first DF (DF1) distinguished Group 1 from Groups 2 and 3 (see Figure 8). It had sizeable standardized coefficients (absolute value > 0.3) for NO3, Cl, Mg2+ and F (refer to Table 4). The second DF (DF2) separated Group 2 from Group 3 and was most strongly associated with Cl and F. This emphasizes the importance of these parameters in maintaining the distinctions between the groups. DA correctly classified 94.9% of the originally grouped cases (Figure 8). In summary, the reliable clustering results from HCA suggest significant spatial variation in GW quality across the studied region.
Figure 8

Scatterplot of three water quality clusters in two discriminant functions space.


GW quality distribution across the study area

In the current investigation, the WQI obtained using Equation (2) for the GW samples ranged from 35.67 to 96.79. Depending on its WQI value, the water at each site may be categorized as excellent, good, poor, very poor or unsuitable (Ram et al. 2021; Mishra & Lal 2023). The WQI classes and the number of samples in each are summarized in Table 5. Based on the WQI values, 59% of the samples have good water quality and are suitable for human consumption, 35% have poor water quality, and 6% have very poor water quality and are not safe for consumption.

Table 5

Classification of GW based on the WQI

| WQI value | Water quality status | No. of samples | Samples (%) |
| --- | --- | --- | --- |
| 0–25 | Excellent | – | – |
| 25–50 | Good | 46 | 59 |
| 50–75 | Poor | 27 | 35 |
| 75–100 | Very poor | 5 | 6 |
| >100 | Unsuitable | – | – |
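The WQI bands of Table 5 translate into a simple lookup. The sketch below is illustrative only: the function name `classify_wqi` is hypothetical, and assigning a value that falls exactly on a boundary (for example 50) to the lower band is an assumption, since the table does not say which band such a value belongs to.

```python
def classify_wqi(wqi: float) -> str:
    """Return the water quality class for a WQI value (bands from Table 5).

    Assumption: a value exactly on a boundary goes to the lower band.
    """
    if wqi < 0:
        raise ValueError("WQI cannot be negative")
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very poor"
    return "Unsuitable"


# The reported minimum and maximum WQI of the study.
print(classify_wqi(35.67), classify_wqi(96.79))
```

Applied to the reported range, the minimum falls in the good class and the maximum in the very poor class, consistent with no sample being rated excellent or unsuitable.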

Table 6 lists four semivariogram models along with their prediction errors for WQI prediction at ungauged sites. The metrics MS, RMSP (listed as RMSE in Table 6), RMSS and ASE are used to assess the accuracy, fitness and appropriateness of each model. The cross-validation output shows that, among the models examined, the Gaussian model has the smallest root-mean-square error (12.323), the RMSS value closest to 1 (1.030) and a mean standardized error close to zero (0.0040), and it was therefore selected as the best-fit model for this study. The best-fit semivariogram model for WQI estimation is illustrated in Figure 9(a). Figure 9(b) shows that the predicted and measured values are close, indicating good agreement. The Gaussian model was therefore used for spatial interpolation of the WQI.
Table 6

Model output of the best-fitted semivariogram model

| Parameter | Model | Mean | ASE | MS | RMSS | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| GWQI | Circular | 0.220 | 10.91 | 0.0043 | 1.051 | 12.390 |
|  | Spherical | 0.198 | 10.93 | 0.0024 | 1.046 | 13.389 |
|  | Exponential | 0.168 | 10.91 | 0.0043 | 1.042 | 12.386 |
|  | Gaussianᵃ | 0.216 | 10.90 | 0.0040 | 1.030 | 12.323 |

aBest-fitted model among all.
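The cross-validation criteria can be applied to the values in Table 6 programmatically. The sketch below is one possible way to rank the models, not the procedure used in ArcGIS; the dictionary layout and variable names are illustrative, and the three criteria are evaluated separately because they need not agree.

```python
# Cross-validation errors from Table 6: (Mean, ASE, MS, RMSS, RMSE).
models = {
    "Circular":    (0.220, 10.91, 0.0043, 1.051, 12.390),
    "Spherical":   (0.198, 10.93, 0.0024, 1.046, 13.389),
    "Exponential": (0.168, 10.91, 0.0043, 1.042, 12.386),
    "Gaussian":    (0.216, 10.90, 0.0040, 1.030, 12.323),
}

# Criterion 1: smallest root-mean-square error.
lowest_rmse = min(models, key=lambda m: models[m][4])
# Criterion 2: root-mean-square standardized error closest to 1.
rmss_nearest_one = min(models, key=lambda m: abs(models[m][3] - 1.0))
# Criterion 3: mean standardized error closest to 0.
lowest_abs_ms = min(models, key=lambda m: abs(models[m][2]))

print(lowest_rmse, rmss_nearest_one, lowest_abs_ms)
```

On these values the Gaussian model is preferred by the RMSE and RMSS criteria, while the mean standardized error taken alone is smallest for the spherical model, so the choice rests on the combination of criteria.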

Figure 9

(a) Best-fit semivariogram model, (b) scatterplot predicted vs. measured value and (c) spatial distribution map of the WQI across the study area.


The spatial distribution map of the WQI across the region (Figure 9(c)) was created by Ordinary Kriging with the Gaussian model, using the Spatial Analyst tool in ArcGIS 10.8. The map is classified according to Table 5. Water quality in the north-eastern region is classified as good, indicating potable GW. Major parts of the south-western, south-eastern and central regions of the study area have poor water quality, which could be a consequence of inadequate handling of household waste, intensive agricultural practices and industrial waste in the region. Some scattered parts of the study region fall under very poor water quality, indicating heavily polluted GW that is unfit for consumption.

The study, conducted in Nalanda, Bihar, assessed GW quality using physicochemical parameters from 78 samples across the study area, combining the WQI, geostatistics, hydrogeochemical characterization and multivariate statistical methods. The GW is alkaline in nature, and several physicochemical parameters, including hardness, alkalinity, calcium, magnesium, bicarbonate, fluoride and nitrate, exceed the limits prescribed by BIS Manual IS 10500 (BIS 2012). Cationic dominance follows the sequence Ca2+ > Mg2+, whereas anionic dominance follows the sequence HCO3 > Cl > SO42− > NO3 > F > PO43−, indicative of freshwater systems. PCA identified three factors that together account for 68.96% of the total variance, highlighting the influence of bicarbonate, magnesium and fluoride from geogenic sources and of NO3, PO43−, Cl and SO42− from anthropogenic sources on the GW. HCA classified the GW into three clusters across the region, which was confirmed by DA. Stepwise DA generated two DFs and identified four significant variables (Mg2+, F, Cl and NO3) as the most important discriminating parameters among the identified clusters. Hydrogeochemical categorization and multivariate statistical analyses together indicate that rock–water interaction, weathering, leaching and anthropogenic activities collectively influence GW quality throughout the studied region. The WQI varied between 35.67 and 96.79, categorizing 59% of samples as good, 35% as poor and 6% as very poor, highlighting the need for treatment before consumption. The geostatistical analysis identified the Gaussian model as the best-fit model for WQI interpolation. The spatial distribution map indicated poor water quality in significant portions of the southern and central parts of the study area. Remedial measures are required in these regions to safeguard the health of inhabitants. This research provides a foundational framework for GW quality assessment and a baseline that lets policymakers target specific pollutant sources and implement mitigation measures, which will help to improve public health. It also demonstrates the efficacy of multivariate statistical techniques for interpreting intricate datasets, assessing water quality and understanding interactions among variables.

A.K. contributed to conceptualization, formal analysis, investigation, methodology, software, validation, writing – original draft. A.S. contributed to conceptualization, data curation, investigation, methodology, editing and reviewing draft, supervision, validation.

No funding was provided for this work.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Adimalla N. & Qian H. (2019) Groundwater quality evaluation using water quality index (WQI) for drinking purposes and human health risk (HHR) assessment in an agricultural region of Nanganur, south India, Ecotoxicology and Environmental Safety, 176 (126), 153–161. https://doi.org/10.1016/j.ecoenv.2019.03.066.
Ahmad S., Singh R., Arfin T. & Neeti K. (2022) Fluoride contamination, consequences and removal techniques in water: A review, Environmental Science: Advances, 1 (5), 620–661. https://doi.org/10.1039/d1va00039j.
Ajorlo M., Abdullah R. B., Yusoff M. K., Halim R. A., Hanif A. H. M., Willms W. D. & Ebrahimian M. (2013) Multivariate statistical techniques for the assessment of seasonal variations in surface water quality of pasture ecosystems, Environmental Monitoring and Assessment, 185 (10), 8649–8658. https://doi.org/10.1007/s10661-013-3201-8.
Alam A. & Kumar S. (2023) Groundwater quality assessment and evaluation of scaling and corrosiveness potential of drinking water samples, Environmental Sciences Proceedings, 25, 64. https://doi.org/10.3390/ecws-7-14316.
Alam A. & Singh A. (2022) Groundwater quality evaluation using statistical approach and water quality index in Aurangabad, Bihar, Rasayan Journal of Chemistry, 2022 (Special Issue), 180–188. https://doi.org/10.31788/RJC.2022.1558191.
Alam A. & Singh A. (2023) Groundwater quality assessment using SPSS based on multivariate statistics and water quality index of Gaya, Bihar (India), Environmental Monitoring and Assessment, 195 (6), 1–23. https://doi.org/10.1007/s10661-023-11294-7.
Alam A., Kumar A. & Singh A. (2024) A GIS approach for groundwater quality evaluation with entropy method and fluoride exposure with health risk assessment, Environmental Geochemistry and Health, 46 (2). https://doi.org/10.1007/s10653-023-01822-2.
Amadi A. N., Nwankwoala H. O., Olasehinde P. I., Okoye N. O., Okunlola I. A. & Alkali Y. B. (2012) Investigation of aquifer quality in Bonny Island, Eastern Niger Delta, Nigeria using geophysical and geochemical techniques, Journal of Emerging Trends in Engineering and Applied Sciences, 3, 180–184.
APHA (2017) Standard Methods for the Examination of Water and Wastewater, 23rd ed. Washington DC: American Public Health Association.
Assiuti E. & Governorate A. (2020) GIS-based assessment of groundwater quality and suitability for drinking and irrigation purposes in the outlet and central parts of Wadi, Bulletin of the National Research Centre, https://doi.org/10.1186/s42269-020-00428-3.
Azhdarpoor A., Radfard M., Pakdel M., Abbasnia A., Badeenezhad A., Mohammadi A. A. & Yousefi M. (2019) Assessing fluoride and nitrate contaminants in drinking water resources and their health risk assessment in a Semiarid region of Southwest Iran, Desalination and Water Treatment, 149, 43–51. https://doi.org/10.5004/dwt.2019.23865.
Barzegar R., Asghari Moghaddam A., Najib M., Kazemian N. & Adamowski J. (2016) Characterization of hydrogeologic properties of the Tabriz plain multilayer aquifer system, NW Iran, Arabian Journal of Geosciences, 9 (2), 1–17. https://doi.org/10.1007/s12517-015-2229-1.
BIS (2012) Indian Standard Drinking Water Specification (Second Revision). Bureau of Indian Standards, pp. 1–11, IS 10500(May). Available at: http://cgwb.gov.in/Documents/WQ-standards.pdf.
Boateng T. K., Opoku F., Acquaah S. O. & Akoto O. (2016) Groundwater quality assessment using statistical approach and water quality index in Ejisu-Juaben Municipality, Ghana, Environmental Earth Sciences, 75 (6). https://doi.org/10.1007/s12665-015-5105-0.
Brown R., McClelland N., Deininger R. & Tozer R. (1970) A water quality index – do we dare?, Proceedings, National Symposium on Data and Instrumentation for Water Quality Management, 117, 339–343.
CGWB (2013) भूजल सूचना पुस्तिका [Groundwater Information Booklet] (Issue September). Available at: https://cgwb.gov.in/District_Profile/Bihar/Nalanda.pdf.
Chen J., Wu H., Qian H. & Gao Y. (2017) Assessing nitrate and fluoride contaminants in drinking water and their health risk of rural residents living in a semiarid region of Northwest China, Exposure and Health, 9 (3), 183–195. https://doi.org/10.1007/s12403-016-0231-9.
Chen T., Zhang H., Sun C., Li H. & Gao Y. (2018) Multivariate statistical approaches to identify the major factors governing groundwater quality, Applied Water Science, 8 (7), 1–6. https://doi.org/10.1007/s13201-018-0837-0.
Das B. K. & Kaur P. (2007) Geochemistry of surface and sub-surface waters of Rewalsar Lake, Mandi District, Himachal Pradesh: Constraints on weathering and erosion, Journal of the Geological Society of India, 69, 1020–1030.
Das N., Mondal P., Ghosh R. & Sutradhar S. (2019) Groundwater quality assessment using multivariate statistical technique and hydro-chemical facies in Birbhum District, West Bengal, India, SN Applied Sciences, 1 (8), 1–21. https://doi.org/10.1007/s42452-019-0841-5.
Datta P. S. & Tyagi S. K. (1996) Major ion chemistry of groundwater in Delhi area: Chemical weathering processes and groundwater flow regime, Journal of the Geological Society of India, 47 (2), 179–188.
Fordyce F. M. (2011) Encyclopedia of environmental health, Encyclopedia of Environmental Health, 2, 776–785.
Gad M., Gaagai A., Eid M. H., Szűcs P., Hussein H., Elsherbiny O., Elsayed S., Khalifa M. M., Moghanm F. S., Moustapha M. E., Tolan D. A. & Ibrahim H. (2023) Groundwater quality and health risk assessment using indexing approaches, multivariate statistical analysis, artificial neural networks, and GIS techniques in El Kharga Oasis, Egypt, Water, 15 (6), https://doi.org/10.3390/w15061216.
Govind A., Anand B., Priya K. & Trivedi R. (2021) Integration of multivariate statistics and water quality indices to evaluate groundwater quality and its suitability in middle Gangetic floodplain, Bihar, SN Applied Sciences, 3 (4), 1–18. https://doi.org/10.1007/s42452-021-04394-x.
He X., Wu J. & He S. (2019) Hydrochemical characteristics and quality evaluation of groundwater in terms of health risks in Luohe aquifer in Wuqi County of the Chinese Loess Plateau, northwest China, Human and Ecological Risk Assessment: An International Journal, 25 (1–2), 32–51. https://doi.org/10.1080/10807039.2018.1531693.
Helena B., Pardo R., Vega M., Barrado E., Fernandez J. & Fernández L. (2000) Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis, Water Research, 34, 807–816. https://doi.org/10.1016/S0043-1354(99)00225-0.
Hinge G., Bharali B., Baruah A. & Sharma A. (2022) Integrated groundwater quality analysis using water quality index, GIS and multivariate technique: A case study of Guwahati City, Environmental Earth Sciences, 81 (16), 1–15. https://doi.org/10.1007/s12665-022-10544-0.
Horton R. K. (1965) An index number system for rating water quality, Journal of the Water Pollution Control Federation, 37, 300–306.
Hossain M., Patra P. K., Begum S. N. & Rahaman C. H. (2020) Spatial and sensitivity analysis of integrated groundwater quality index towards irrigational suitability investigation, Applied Geochemistry, 123, 104782. https://doi.org/10.1016/j.apgeochem.2020.104782.
Jalali M. (2009) Geochemistry characterization of groundwater in an agricultural area of Razan, Hamadan, Iran, Environmental Geology, 56 (7), 1479–1488. https://doi.org/10.1007/s00254-008-1245-9.
Johnston K., Ver Hoef J. M., Krivoruchko K. & Lucas N. (2001) Using ArcGIS geostatistical analyst. In: Analysis, Vol. 300, p. 300.
Jolliffe I. T. & Cadima J. (2016) Principal component analysis: A review and recent developments, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374 (2065), 20150202. https://doi.org/10.1098/rsta.2015.0202.
Kumar A. & Kumar C. (2015) Characterization of hydrogeochemical processes and fluoride enrichment in groundwater of South-Western Punjab, Water Quality, Exposure and Health, 2008, 373–387. https://doi.org/10.1007/s12403-015-0157-7.
Kumar M., Ramanathan A., Rao M. S. & Kumar B. (2006) Identification and evaluation of hydrogeochemical processes in the groundwater environment of Delhi, India, Environmental Geology, 50 (7), 1025–1039. https://doi.org/10.1007/s00254-006-0275-4.
Kumar D., Singh A., Kumar R., Sunil J., Sahoo K. & Jha V. (2018) Using spatial statistics to identify the uranium hotspot in groundwater in the mid-eastern Gangetic plain, India, Environmental Earth Sciences, 77 (19), 1–12. https://doi.org/10.1007/s12665-018-7889-1.
Kumar D., Singh A., Jha R. K., Sahoo B. B., Sahoo S. K. & Jha V. (2019a) Source characterization and human health risk assessment of nitrate in groundwater of middle Gangetic Plain, India, Arabian Journal of Geosciences, 12 (11). https://doi.org/10.1007/s12517-019-4519-5.
Kumar D., Singh A., Jha R. K., Sahoo S. K. & Jha V. (2019b) A variance decomposition approach for risk assessment of groundwater quality, Exposure and Health, 11 (2), 139–151. https://doi.org/10.1007/s12403-018-00293-6.
Kumar R., Singh S., Kumar R. & Sharma P. (2022) Groundwater quality characterization for safe drinking water supply in Sheikhpura District of Bihar, India: A geospatial approach, Frontiers in Water, 4, 848018. https://doi.org/10.3389/frwa.2022.848018.
Li P., Lee T. & Youn H. (2020) Dimensionality reduction with sparse locality for principal component analysis, Mathematical Problems in Engineering, 2020, 1–12. https://doi.org/10.1155/2020/9723279.
Liu C.-W., Lin K.-H. & Kuo Y.-M. (2003) Application of factor analysis in the assessment of groundwater quality in a Blackfoot Disease Area in Taiwan, The Science of the Total Environment, 313, 77–89. https://doi.org/10.1016/S0048-9697(02)00683-6.
Maity S., Biswas R. & Sarkar A. (2020) Comparative valuation of groundwater quality parameters in Bhojpur, Bihar for arsenic risk assessment, Chemosphere, 259, 127398. https://doi.org/10.1016/j.chemosphere.2020.127398.
Malik N., Malik A. & Bishnoi S. (2021) Assessment of groundwater hydro-geochemistry, quality, and human health risk in arid area of India using chemometric approach, Arabian Journal of Geosciences, 14 (15). https://doi.org/10.1007/s12517-021-07852-3.
Mandal J., Golui D., Raj A. & Ganguly P. (2019) Risk assessment of arsenic in wheat and maize grown in organic matter amended soils of Indo-Gangetic Plain of Bihar, India, Soil and Sediment Contamination: An International Journal, 28 (8), 757–772. https://doi.org/10.1080/15320383.2019.1661353.
Marghade D., Malpe D. B., Subba Rao N. & Sunitha B. (2020) Geochemical assessment of fluoride enriched groundwater and health implications from a part of Yavtmal District, India, Human and Ecological Risk Assessment: An International Journal, 26 (3), 673–694. https://doi.org/10.1080/10807039.2018.1528862.
Marko K., Al-Amri N. S. & Elfeki A. M. M. (2014) Geostatistical analysis using GIS for mapping groundwater quality: case study in the recharge area of Wadi Usfan, western Saudi Arabia, Arabian Journal of Geosciences, 7 (12), 5239–5252. https://doi.org/10.1007/s12517-013-1156-2.
Masood A., Aslam M., Pham Q. B., Khan W. & Masood S. (2022) Integrating water quality index, GIS and multivariate statistical techniques towards a better understanding of drinking water quality, Environmental Science and Pollution Research, 29 (18), 26860–26876. https://doi.org/10.1007/s11356-021-17594-0.
Mishra A. & Lal B. (2023) Assessment of groundwater quality in Ranchi district, Jharkhand, India, using water evaluation indices and multivariate statistics, Environmental Monitoring and Assessment. https://doi.org/10.1007/s10661-023-11101-3.
Mukherjee I., Singh U. K., Singh R. P., Anshumali, Kumari D., Jha P. K. & Mehta P. (2020) Characterization of heavy metal pollution in an anthropogenically and geologically influenced semi-arid region of east India and assessment of ecological and human health risks, Science of the Total Environment, 705, 135801. https://doi.org/10.1016/j.scitotenv.2019.135801.
Nas B. & Berktay A. (2010) Groundwater quality mapping in urban groundwater using GIS, Environmental Monitoring and Assessment, 160 (1–4), 215–227. https://doi.org/10.1007/s10661-008-0689-4.
Niti Aayog (2021)
Omran E. S. E. (2012) A proposed model to assess and map irrigation water well suitability using geospatial analysis, Water (Switzerland), 4 (3), 545–567. https://doi.org/10.3390/w4030545.
Palliyakkal R. & Rajan B. (2018) Determination of ground water quality index as a tool for assessing the quality of ground water for drinking and other purpose in a coral atoll in Indian Ocean Kavaratti Island, Lakshadweep, India, Lakshadweep, India, 5, 9–15.
Panghal V. & Bhateria R. (2021) A multivariate statistical approach for monitoring of groundwater quality: A case study of Beri block, Haryana, India, Environmental Geochemistry and Health, 43 (7), 2615–2629. https://doi.org/10.1007/s10653-020-00654-8.
Patil V. T. & Patil P. R. (2010) Physicochemical analysis of selected groundwater samples of Amalner town in Jalgaon District, Maharashtra, India, E-Journal of Chemistry, 7, 820796. https://doi.org/10.1155/2010/820796.
Prasun A. & Singh A. (2024) Evaluation of potential human health risks arising from nitrate and fluoride in the groundwater of Aurangabad, Bihar using GIS and chemometric analysis, Environmental Geochemistry and Health, 46 (8). https://doi.org/10.1007/s10653-024-02047-7.
Ram A., Tiwari S. K., Pandey H. K., Chaurasia A. K., Singh S. & Singh Y. V. (2021) Groundwater quality assessment using water quality index (WQI) under GIS framework, Applied Water Science, 11 (2), 46. https://doi.org/10.1007/s13201-021-01376-7.
Ramakrishnaiah C. R., Sadashivaiah C. & Ranganna G. (2009) Assessment of water quality index for the groundwater in Tumkur taluk, Karnataka state, India, E-Journal of Chemistry, 6 (2), 523–530. https://doi.org/10.1155/2009/757424.
Sadat-Noori M., Ebrahimi K. & Liaghat A. (2014) Groundwater quality assessment using the Water Quality Index and GIS in Saveh-Nobaran aquifer, Iran, Environmental Earth Sciences, 71. https://doi.org/10.1007/s12665-013-2770-8.
Sreedhar G. S., Machender K. G. & Dhakate R. (2019) Multivariate statistical approach for the assessment of fluoride and nitrate concentration in groundwater from Zaheerabad area, Telangana State, India, Sustainable Water Resources Management, 5 (2), 785–796. https://doi.org/10.1007/s40899-018-0258-0.
Subba Rao N. (2017). Hydrogeology: Problems with Solutions.
Subba Rao N., Srihari C., Deepthi Spandana B., Sravanthi M., Kamalesh T. & Abraham Jayadeep V. (2019) Comprehensive understanding of groundwater quality and hydrogeochemistry for the sustainable development of suburban area of Visakhapatnam, Andhra Pradesh, India, Human and Ecological Risk Assessment, 25 (1–2), 52–80. https://doi.org/10.1080/10807039.2019.1571403.
Subramani T., Rajmohan N. & Elango L. (2010) Groundwater geochemistry and identification of hydrogeochemical processes in a hard rock region, Southern India, Environmental Monitoring and Assessment, 162 (1–4), 123–137. https://doi.org/10.1007/s10661-009-0781-4.
Suleiman A. A., Abdullahi U. A., Suleiman A., Suleiman S. A. & Abubakar H. U. (2022) Correlation and regression model for physicochemical quality of groundwater in the Jaen District of Kano State, Nigeria, Journal of Statistical Modelling and Analytics, 4 (1), 14–24. https://doi.org/10.22452/josma.vol4no1.2.
Tiwari A., Suozzi E., Fiorucci A. & Russo S. (2020) Assessment of groundwater geochemistry and human health risk of an intensively cropped alluvial plain, NW Italy, Human and Ecological Risk Assessment: An International Journal, 27, 1–21. https://doi.org/10.1080/10807039.2020.1775484.
Water.org. (2023). India's Water and Sanitation Crisis.
WHO (2017). 2017 WHO Guidelines for Drinking Water Quality: First Addendum to the Fourth Edition. Journal - American Water Works Association 109, 44–51. https://doi.org/10.5942/jawwa.2017.109.0087.
Yang Q., Zhang J., Wang Y., Fang Y. & Martín J. D. (2016) Multivariate statistical analysis of hydrochemical data for shallow ground water quality factor identification in a coastal aquifer, Polish Journal of Environmental Studies, 24 (2), 769–776. https://doi.org/10.15244/pjoes/30263.
Zaidi F. K., Nazzal Y., Jafri M. K., Naeem M. & Ahmed I. (2015) Reverse ion exchange as a major process controlling the groundwater chemistry in an arid environment: A case study from northwestern Saudi Arabia, Environmental Monitoring and Assessment, 187 (10). https://doi.org/10.1007/s10661-015-4828-4.
Zhou F., Liu Y. & Guo H. (2007) Application of multivariate statistical methods to water quality assessment of the watercourses in Northwestern New Territories, Hong Kong, Environmental Monitoring and Assessment, 132 (1–3), 1–13. https://doi.org/10.1007/s10661-006-9497-x.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).