This paper presents a machine learning approach for classifying arsenic (As) levels in groundwater samples collected from the Indo-Gangetic region as safe or unsafe. Water is essential for sustaining life, and heavy metals such as arsenic in drinking water are a public health concern. In this study, several tree-based machine learning models, namely Random Forest, Optimized Forest, CS Forest, SPAARC, and REP Tree, were applied to classify water samples. According to the guidelines of the World Health Organization (WHO), the arsenic concentration in drinking water should not exceed 10 μg/L. The groundwater quality parameters were ranked using a classifier attribute evaluator before training and testing the models. Metrics derived from the confusion matrix, namely accuracy, precision, recall, and false positive rate (FPR), were used to analyze model performance. Among all models, Optimized Forest outperformed the other classifiers, with the highest accuracy (80.64%) and precision (80.70%), a recall of 97.87%, and the lowest FPR (73.33%). The Optimized Forest model can therefore be used to classify new groundwater samples with respect to arsenic contamination.

  • Decision tree-based machine learning algorithms were used to predict arsenic (As) levels in groundwater samples.

  • Confusion matrices were obtained, and accuracy, precision, recall, and FPR were calculated.

  • The model can be used to approximate the number of people affected by arsenic.

  • Spatial analysis of water parameters is discussed.

  • The Optimized Forest algorithm is the best-suited model for classifying arsenic levels.

Graphical Abstract

Water is a priceless and essential natural resource for sustaining life on earth. Around 97.5% of the water on earth is saltwater, and the rest is freshwater. Of the total freshwater available, approximately 79% is in the form of ice caps and glaciers, around 20% is groundwater, and the remaining 1% is in lakes, soil, rivers, and the atmosphere. With population growth and climate change, freshwater demand has increased significantly.

To meet the growing demand for freshwater, Panagopoulos (2022a) proposed saltwater desalination with zero liquid discharge (ZLD). The ZLD desalination system described by Panagopoulos (2021) consists of a reverse osmosis (RO) membrane, a brine concentrator (BC), and a brine crystallizer (BCr), and recovers over 99% of the water as freshwater, together with mixed salts. The freshwater produced is potable and has various industrial applications. More recently, Panagopoulos (2022b) compared a ZLD desalination system using BCr with one using Wind-Aided Intensified eVaporation (WAIV): the BCr-based system produces much more freshwater, but WAIV is more cost-effective and energy-efficient.

Groundwater, which makes up about 98% of the freshwater available on the planet outside the cryosphere, is a primary freshwater resource and plays a substantial role in the terrestrial and aquatic ecosphere. Arsenic contamination of groundwater is a growing concern, especially in the middle Gangetic belt of Uttar Pradesh, India (Chattopadhyay et al. 2020) and the lower Indo-Gangetic plain of West Bengal, India (Das et al. 2020). Groundwater quality depends on the geochemical and mineralogical composition of aquifers, shaped by the weathering of rocks and minerals followed by leaching and runoff, and on anthropogenic sources (Jeelani et al. 2014). Arsenic concentrations in aquifers above the WHO permissible limit of 10 μg/L (WHO 2011) pose a severe health risk to approximately one hundred million people in the Indo-Gangetic Plains (Bhowmick et al. 2018). Accordingly, arsenic has in recent decades been ranked first among the substances that pose the most significant threat to human health (ATSDR 2019).

Most of the work in the literature identifies the spatial distribution of arsenic in the groundwater of the Indo-Gangetic plain and the factors responsible for As contamination, such as geologic, topographic, sediment, biogeochemical, hydrogeologic, and anthropogenic factors (Biswas et al. 2012; Chakraborty et al. 2020). Studies have revealed that As contamination of groundwater is due to the dissolution of As-bearing minerals in Quaternary deposits of Holocene age (Lee et al. 2009; Mukherjee et al. 2009; Postma et al. 2016). As glaciers melt due to global warming, As-rich sediments are carried by rivers originating in the Himalayas and deposited in the Indo-Gangetic plain of Uttar Pradesh (Ahamed et al. 2006; Chauhan et al. 2009), Bihar (Chakraborti et al. 2003; Saha 2009), Jharkhand (Alam et al. 2016; Tirkey et al. 2017), West Bengal (Nickson et al. 2008; Mukherjee et al. 2011), Assam (Verma et al. 2015, 2019), Punjab, and Haryana (Kumar & Singh 2020). The main mechanism of aquifer contamination is the oxidation of pyrite into iron oxides under aerobic conditions, which releases arsenic, sulfate, and various trace elements.

In recent years, several machine learning models have been applied to assess arsenic contamination of groundwater, of which logistic regression is the most common. Dummer et al. (2015) used a logistic regression model to predict the spatial distribution of arsenic worldwide. Zhang et al. (2012) used logistic regression to predict arsenic concentration from parameters such as topography, soil properties, and geology. Lado et al. (2008) used attributes such as Holocene sediments, soil salinity, soil texture, and the topographic wetness index to predict groundwater contamination. Luo et al. (2012) used principal component regression (PCR) to detect arsenic contamination in groundwater, and Zhang et al. (2013) used a linear regression model. Other studies used artificial neural networks (ANN) (Cho et al. 2011; Bonelli et al. 2017) or Bayesian modeling (Cha et al. 2016) to detect arsenic in groundwater.

Several researchers have used adaptive neuro-fuzzy inference systems, regression kriging, Random Forest, boosted regression trees, and other models to relate the spatial distribution of arsenic in groundwater to different environmental attributes in other parts of the world (Erickson et al. 2018; Bindal & Singh 2019; Podgorski & Berg 2020). Machine learning models can capture complex relationships among attributes, including their magnitude, and can represent patterns that are difficult to establish with parametric models. However, no machine learning model based on water quality parameters has yet been developed to assess arsenic contamination risk in the Varanasi region. Varanasi was chosen by the Ministry of Urban Development (MoUD) and the Ministry of Housing and Urban Poverty Alleviation (MoHUPA) of the Government of India for development that conserves its heritage, culture, and traditions. Arsenic investigation strategies therefore need to be developed for the densely populated Varanasi region to provide rapid information for monitoring and improving public health.

The present study uses tree-based machine learning algorithms to classify water samples collected across the Varanasi region (Figure 1) as safe or unsafe based on their arsenic level. Most studies use geology, soil parameters, land cover, topography, minerals, temperature, precipitation, hydrology, aquifer connectivity, and similar factors (Chakraborty et al. 2020; de Menezes et al. 2020) as input variables to predict the occurrence of arsenic in their study areas. This study instead uses the water parameters pH, EC (μS cm⁻¹), TDS (ppm), salinity (ppm), Na⁺ (mEq L⁻¹), K⁺ (ppm), Ca²⁺ + Mg²⁺ (mEq L⁻¹), SAR, SSP (%), CO₃²⁻ (mEq L⁻¹), HCO₃⁻ (mEq L⁻¹), RSC (mEq L⁻¹), PO₄³⁻ (ppm), Cl⁻ (mEq L⁻¹), Mn²⁺ (ppb), Cu²⁺ (ppb), Fe²⁺ (ppm), Zn²⁺ (ppb), and SO₄²⁻ (ppm) as input variables to classify the water samples against the WHO permissible limit. For attribute selection, a Random Forest classifier was used to identify the attributes most relevant to arsenic occurrence (Chen et al. 2020).

Figure 1

Location map of the study area in India.

This study is therefore significant: machine learning approaches that use easily measured, relevant parameters as inputs could be highly useful for predicting arsenic contamination levels. The approach also digitizes the dataset so that it can be readily retrieved for further classification and for groundwater suitability analysis. The work applies five tree-based classification algorithms, namely Optimized Forest, SPAARC, CS Forest, Reduced Error Pruning (REP) Tree, and Random Forest, to classify the samples as safe or unsafe. The models are compared using multiple evaluation criteria: accuracy, precision, recall, and FPR.

Study area

In this study, 62 groundwater samples were collected on a grid basis along the bank of the river Ganga in the Varanasi region (Figure 1). Water quality parameters were determined using the standard methods of the American Public Health Association (APHA; Maiti 2001; Federation & Aph Association 2005).

The concentrations of zinc (Zn²⁺), copper (Cu²⁺), iron (Fe²⁺), manganese (Mn²⁺), and arsenic (As) ions were determined using an atomic absorption spectrophotometer (AAS). For arsenic, vapour-generation atomic absorption spectrometry (Agilent Technologies VGA 77 AA spectrophotometer, Australia, Serial no. MY16020008) was used (Van Herreweghe et al. 2003).

Dataset description

Data were collected from 62 locations in Varanasi in the Indo-Gangetic region, and each water sample was classified as safe or unsafe using the attributes shown in Table 1. All attributes are numeric, and one additional categorical attribute labels each sample as safe or unsafe. Table 1 gives the descriptive statistics of the dataset: the minimum, maximum, mean, mode, and median of each parameter.

Table 1

Descriptive statistics and data description

| Attribute description | Min | Max | Mean | Mode | Median |
|---|---|---|---|---|---|
| Hydrogen ion concentration (pH) | 7.01 | 8.71 | 7.88 | 7.35 | 7.84 |
| Electrical conductivity (EC, μS cm⁻¹) | 209 | 859 | 426.77 | 329 | 411 |
| Total dissolved solids (TDS, ppm) | 134 | 559 | 275.56 | 214 | 266 |
| Salt content (salinity, ppm) | 96 | 647 | 295.58 | 248 | 299 |
| Sodium (Na⁺, mEq L⁻¹) | 0.07 | 13.40 | 1.83 | 0.14 | 0.62 |
| Potassium (K⁺, ppm) | 0.70 | 44.60 | 9.22 | 2.60 | 6.05 |
| Calcium and magnesium (Ca²⁺ + Mg²⁺, mEq L⁻¹) | 1.70 | 10.20 | 3.15 | 5.80 | 5.15 |
| Sodium adsorption ratio (SAR) | 0.04 | 9.48 | 1.23 | 0.04 | 0.41 |
| Soluble sodium (SSP, %) | 0.53 | 77.01 | 18.64 | 2.44 | 7.53 |
| Carbonate (CO₃²⁻, mEq L⁻¹) | 1.20 | 4.40 | 2.24 | 2.00 | 2.00 |
| Bicarbonate (HCO₃⁻, mEq L⁻¹) | 1.80 | 9.20 | 4.78 | 3.60 | 4.65 |
| Residual sodium carbonate (RSC, mEq L⁻¹) | 0.10 | 6.90 | 1.77 | 0.30 | 1.15 |
| Phosphate (PO₄³⁻, ppm) | 0.45 | 14.09 | 4.16 | 3.94 | 3.94 |
| Chloride (Cl⁻, mEq L⁻¹) | 1.60 | 5.00 | 2.71 | 2.40 | 2.60 |
| Manganese (Mn²⁺, ppb) | 0.90 | 97.00 | 31.48 | 36.00 | 26.00 |
| Copper (Cu²⁺, ppb) | 12.00 | 74.20 | 34.65 | 22.00 | 30.90 |
| Iron (Fe²⁺, ppm) | 0.10 | 22.10 | 3.63 | 1.20 | 2.10 |
| Zinc (Zn²⁺, ppb) | 10.00 | 77.00 | 31.32 | 39.00 | 27.50 |
| Arsenic (As(III), ppb) | 3.00 | 25.00 | 8.38 | 6.20 | 7.52 |

Statistical and spatial analysis

Pearson correlation analysis was performed to find the correlation between each parameter and arsenic. To characterize the variation of the parameters in the dataset, principal component analysis (PCA) was performed using IBM SPSS Statistics 25. Spatial variability in the study area was visualized with maps interpolated by inverse distance weighting (IDW), in which each measured point exerts a local influence that decreases with increasing distance. Spatial variability maps of all parameters were created using ArcGIS 10.8.
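
The IDW principle described above can be illustrated with a minimal sketch that estimates a parameter at one unsampled location. It assumes planar coordinates and a distance-decay power of 2, a common choice; the power used for the ArcGIS maps is not stated here. The function name and the sample wells are hypothetical.

```python
import math

def idw_interpolate(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples for measured wells (hypothetical format).
    power:   distance-decay exponent; 2.0 is an assumed, commonly used value.
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            # The target coincides with a measured well: return its value exactly.
            return vi
        w = 1.0 / d ** power  # closer wells get larger weights
        num += w * vi
        den += w
    return num / den

# Hypothetical wells: arsenic (ppb) measured at two points 10 units apart.
wells = [(0.0, 0.0, 5.0), (10.0, 0.0, 15.0)]
estimate = idw_interpolate(wells, 2.0, 0.0)  # lies closer to 5.0 than to 15.0
```

Raising `power` makes the surface more local (nearby wells dominate), while lowering it smooths the map.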

Feature selection

The collected dataset of 62 instances was used to train and test the Optimized Forest, SPAARC, CS Forest, Reduced Error Pruning (REP) Tree, and Random Forest classifiers. Before training and testing, it is essential to perform feature selection. Feature selection (Hossain et al. 2013) simplifies the model by reducing the number of attributes and decreasing the training time. It can also increase the model's accuracy by selecting the right attributes for classification, reduce overfitting by improving generalization, and avoid the curse of dimensionality, a phenomenon in which a classifier's predictive power first increases with the number of attributes but declines beyond a certain number.

The classifier attribute evaluator was used to find the relevant attributes in the dataset. It evaluates individual attributes by measuring the amount of information gained about the class given each attribute. A Random Forest classifier was used to assess the importance of each attribute for classification (Chen et al. 2020).
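
To make the information-gain idea above concrete, the following sketch ranks numeric attributes by the information gain of their best single binary threshold. This is a simplified stand-in that follows the description above, not Weka's classifier attribute evaluator or the Random Forest-based ranking used in the study; the threshold discretization and all names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold_info_gain(values, labels):
    """Information gain of the best split 'value <= t' on one numeric attribute."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        best = max(best, base - conditional)
    return best

def rank_attributes(data, labels):
    """data: dict mapping attribute name -> list of values (hypothetical format)."""
    scores = {name: best_threshold_info_gain(col, labels) for name, col in data.items()}
    return sorted(scores, key=scores.get, reverse=True)
```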

Implementation of machine learning algorithms

The machine learning classifiers were trained and tested (Fushiki 2011) based on these attributes to get a confusion matrix. A flowchart of the methodology used to implement machine learning models is shown in Figure 2. Classifiers have been analyzed based on multiple evaluation criteria like accuracy, precision, recall, FPR (false positive rate), and mean absolute error (MAE).

Figure 2

Methodology for machine learning algorithms.

Optimized Forest

The Optimized Forest algorithm (Adnan & Islam 2016) builds a decision forest and uses a genetic algorithm (GA) to select an optimized subforest with high accuracy and diversity, increasing the overall accuracy of the ensemble. Its main idea is to inject high-quality trees into the initial GA population. The GA encodes candidate subforests as data structures known as chromosomes, and 20 chromosomes constitute the population: the 10 odd-numbered chromosomes are selected using a stratified sampling technique that favors good-quality trees, and the remaining 10 even-numbered chromosomes are chosen randomly. Chromosomes are selected with the roulette wheel technique, and crossover and mutation are applied to them. An elitist operation is then applied to retain the most competent chromosomes. To prevent degradation, a pool of 40 chromosomes is created, from which the 20 best are selected using the roulette wheel technique. Finally, a sequential search operation is applied to obtain the best ensemble accuracy.
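
The GA loop described above can be sketched as follows. Each chromosome is a bit vector that switches trees on or off, and its fitness is the majority-vote accuracy of the selected trees. This is a simplified sketch, not the published Optimized Forest: the initial population is fully random (rather than half stratified), the sequential search step is omitted, and the precomputed per-tree predictions, the function names, and the parameters (`generations`, `p_mut`) are hypothetical.

```python
import random

def vote_accuracy(chromosome, tree_preds, labels):
    """Majority-vote accuracy of the trees switched on in `chromosome` (ties -> class 0)."""
    chosen = [p for bit, p in zip(chromosome, tree_preds) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for j, y in enumerate(labels):
        ones = sum(p[j] for p in chosen)
        pred = 1 if ones * 2 > len(chosen) else 0
        correct += pred == y
    return correct / len(labels)

def roulette(pop, fits, rng):
    """Roulette wheel selection: pick a chromosome with probability proportional to fitness."""
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)
    r = rng.uniform(0, total)
    acc = 0.0
    for chrom, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return chrom
    return pop[-1]

def ga_subforest(tree_preds, labels, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return (chromosome, accuracy) of the best subforest found."""
    rng = random.Random(seed)
    n = len(tree_preds)

    def fit(c):
        return vote_accuracy(c, tree_preds, labels)

    # Simplification: fully random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        fits = [fit(c) for c in pop]
        offspring = []
        while len(offspring) < pop_size:
            a = roulette(pop, fits, rng)
            b = roulette(pop, fits, rng)
            cut = rng.randrange(1, n) if n > 1 else 0  # single-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            offspring.append(child)
        pool = pop + offspring  # pool of 2 * pop_size chromosomes
        pool_fits = [fit(c) for c in pool]
        elite = pool[pool_fits.index(max(pool_fits))]  # elitism: keep the best
        pop = [elite] + [roulette(pool, pool_fits, rng) for _ in range(pop_size - 1)]
        if fit(elite) > fit(best):
            best = elite
    return best, fit(best)
```

With `pop_size=20` the merged pool holds 40 chromosomes, mirroring the pool size described above.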

SPAARC

The Split-Point and Attribute Reduced Classifier (SPAARC) is based on a fast decision tree algorithm (Yates et al. 2018). It speeds up decision tree induction through Split Point Sampling and Node Attribute Sampling (NAS). NAS avoids testing every non-class attribute at every node without preselecting a subset of attributes before induction; instead, it selects attributes dynamically based on the depth of the node under test. Split Point Sampling restricts the search for the optimum split point to a sample of candidate split points, which are evaluated using the Gini index.
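
The split-point sampling idea can be sketched by scoring only a fixed number of candidate thresholds with the weighted Gini index instead of every distinct attribute value. Evenly spaced candidates are an assumption made for illustration; SPAARC's actual sampling scheme may differ, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(values, labels, t):
    """Size-weighted Gini impurity of the split 'value <= t'."""
    left = [lab for v, lab in zip(values, labels) if v <= t]
    right = [lab for v, lab in zip(values, labels) if v > t]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def sampled_best_split(values, labels, n_candidates=10):
    """Evaluate only n_candidates evenly spaced thresholds (assumed sampling scheme)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return None, gini(labels)
    step = (hi - lo) / (n_candidates + 1)
    candidates = [lo + step * (k + 1) for k in range(n_candidates)]
    best_t = min(candidates, key=lambda t: weighted_gini(values, labels, t))
    return best_t, weighted_gini(values, labels, best_t)
```

The cost of choosing a split then grows with `n_candidates` rather than with the number of distinct values, which is where the speed-up comes from.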

CS Forest

CS Forest is a cost-sensitive classification technique that uses an ensemble of decision trees (Siers & Islam 2015). It uses CSVoting to minimize the classification cost. For a record, CSVoting calculates the cost of labeling it positive and the cost of labeling it negative by summing, over the leaves the record reaches, the total positive-classification cost and the total negative-classification cost. The record is classified as positive if the cost of the positive label is less than that of the negative label; otherwise, it is classified as negative. The algorithm also applies cost-sensitive pruning, pruning a subtree if doing so does not significantly increase the classification cost. In CS Forest, each tree is first grown fully and then pruned; once the ensemble has been built, CSVoting is used for classification.
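
The CSVoting rule described above can be sketched as follows. For each tree, the record's leaf contributes the cost of mislabeling that leaf's training records: labeling positive misclassifies its negatives (false positives) and labeling negative misclassifies its positives (false negatives). The sketch assumes zero cost for correct classifications; the cost values and the input format are hypothetical.

```python
def cs_vote(leaf_counts, cost_fp=1.0, cost_fn=5.0):
    """Cost-sensitive vote over the leaves one record reaches.

    leaf_counts: list of (n_pos, n_neg) training-record counts, one pair per tree,
                 for the leaf the record falls into (hypothetical format).
    cost_fp, cost_fn: assumed misclassification costs; correct labels cost 0.
    """
    cost_pos = sum(n_neg * cost_fp for _, n_neg in leaf_counts)  # cost of saying "positive"
    cost_neg = sum(n_pos * cost_fn for n_pos, _ in leaf_counts)  # cost of saying "negative"
    return "positive" if cost_pos < cost_neg else "negative"

# Two trees whose leaves are mostly negative: a high false-negative cost still
# tips the decision to "positive".
decision = cs_vote([(2, 8), (3, 7)], cost_fp=1.0, cost_fn=5.0)
```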

REP Tree

Reduced Error Pruning (REP) Tree is a fast decision tree learner (Elomaa & Kaariainen 2001). It is based on the C4.5 algorithm and can be used for both classification and regression trees. It builds a decision tree using information gain and prunes it with reduced-error pruning, using backfitting. REP traverses the tree bottom-up and replaces a node with a leaf labeled with its most frequent class whenever doing so does not reduce the accuracy of the tree. Pruning continues until it would decrease accuracy, which is estimated on a separate pruning set.
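
The bottom-up pruning step described above can be sketched on a small tree stored as nested dicts, a hypothetical format in which each internal node records its training-set majority class. A subtree is replaced by a leaf when that does not increase the error on the held-out pruning set. Tree growth and backfitting are not shown, and nodes that no pruning sample reaches are collapsed to a leaf.

```python
def predict(node, x):
    """Follow thresholds down to a leaf and return its label."""
    while "label" not in node:
        node = node["left"] if x[node["attr"]] <= node["thr"] else node["right"]
    return node["label"]

def errors(node, rows):
    """Number of misclassified (x, y) rows."""
    return sum(predict(node, x) != y for x, y in rows)

def rep_prune(node, rows):
    """Bottom-up reduced-error pruning against pruning-set rows [(x, y), ...]."""
    if "label" in node:
        return node
    left_rows = [(x, y) for x, y in rows if x[node["attr"]] <= node["thr"]]
    right_rows = [(x, y) for x, y in rows if x[node["attr"]] > node["thr"]]
    # Prune the children first, building a new dict so the input tree is untouched.
    node = dict(node,
                left=rep_prune(node["left"], left_rows),
                right=rep_prune(node["right"], right_rows))
    leaf = {"label": node["majority"]}
    # Replace the subtree if the leaf is at least as accurate on the pruning set.
    if errors(leaf, rows) <= errors(node, rows):
        return leaf
    return node
```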

Random Forest

The Random Forest algorithm (Breiman 2001) is a collection of tree-structured classifiers. It uses bootstrap resampling (Stine 1989) to draw subsamples from the original samples and builds a decision tree on each subsample. The resulting trees are combined into a forest, and each tree casts a vote; the class receiving the most votes is the prediction.
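
Bootstrap resampling and majority voting, as described above, can be sketched in a few lines. For brevity each "tree" here is a one-split decision stump on a randomly chosen attribute; Breiman's Random Forest instead grows full trees and samples attributes at every node. All names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, attr):
    """Best single threshold on one attribute; each side predicts its majority class."""
    best = None
    for t in sorted({x[attr] for x, _ in rows}):
        left = [y for x, y in rows if x[attr] <= t]
        right = [y for x, y in rows if x[attr] > t]
        lab_l = Counter(left).most_common(1)[0][0]
        lab_r = Counter(right).most_common(1)[0][0] if right else lab_l
        err = sum(y != lab_l for y in left) + sum(y != lab_r for y in right)
        if best is None or err < best[0]:
            best = (err, attr, t, lab_l, lab_r)
    return best[1:]  # (attr, threshold, left label, right label)

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_attrs = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]  # bootstrap: n draws with replacement
        forest.append(fit_stump(boot, rng.randrange(n_attrs)))
    return forest

def predict_forest(forest, x):
    votes = Counter(lab_l if x[attr] <= t else lab_r for attr, t, lab_l, lab_r in forest)
    return votes.most_common(1)[0][0]  # majority vote across trees
```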

Pearson correlation between the parameters

The correlations between the parameters are shown in Table 2, with values ranging from −1 (red) to +1 (green). Among the parameters considered, arsenic shows a negligible positive correlation with potassium (r = +0.009) and a weak positive correlation with iron (r = +0.248). Arsenic is also weakly positively correlated with sodium (r = +0.144) and weakly negatively correlated with copper (r = −0.025). Multiple studies have shown that the reduction of iron oxides and hydroxides by abiotic or biotic processes, and the oxidation of iron sulfides, are responsible for arsenic accumulation in groundwater. Iron shows a positive correlation with Ca²⁺ + Mg²⁺ (r = +0.304).

Table 2

Pearson correlation of arsenic with all parameters

Please refer to the online version of this paper to see this table in colour: http://dx.doi.org/10.2166/wh.2022.015.

Potassium is positively correlated with zinc (r = +0.166). Sodium is negatively correlated with Ca²⁺ + Mg²⁺ (r = −0.266), iron (r = −0.013), and potassium (r = −0.331) but positively correlated with copper (r = +0.034). Thus, the study observed an inverse relationship between sodium and most other metal ions. Phosphate shows a negative correlation with arsenic (r = −0.281) and bicarbonate (r = −0.271). Arsenic is positively correlated with bicarbonate (r = +0.259), which promotes the dissociation of arsenic from iron oxyhydroxides in aquifers and its accumulation in groundwater.
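
The r values quoted above are Pearson correlation coefficients. A minimal sketch of the computation for two parameter columns follows; the function name is hypothetical, and the study itself used IBM SPSS Statistics.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Values near +1 or −1 indicate a strong linear relationship; a coefficient such as r = +0.009 indicates essentially none.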

Principal component analysis for significant variables

The variation among the parameters is summarized in Table 3. The first principal component (PC1), associated mainly with SSP and Na⁺, explains 24.60% of the total variance. PC2, associated with TDS and SAR, explains 16.05%. PC3, with high loadings for Ca²⁺ + Mg²⁺, Fe²⁺, and arsenic, explains 11.99%. PC4, associated with RSC and Cl⁻, explains 8.60%, and PC5, associated with arsenic and Mn²⁺, explains 7.30%. PC6, associated with PO₄³⁻ and Zn²⁺, explains 6.41%. In this study, arsenic therefore contributes to the third and fifth principal components of groundwater variation.
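
The "% of variance" figures come from the eigenvalues of the PCA. The sketch below estimates only the share explained by the first component, assuming standardized parameters (a correlation-matrix PCA, whose trace equals the number of variables p), using power iteration. SPSS's exact settings are not stated here, later components would require deflation, and the function names are hypothetical.

```python
import math

def corr(x, y):
    """Pearson correlation of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_pc_share(columns, iters=200):
    """Fraction of total variance explained by PC1 of the standardized columns."""
    p = len(columns)
    R = [[corr(a, b) for b in columns] for a in columns]
    # Arbitrary non-uniform unit start vector for power iteration.
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    # Rayleigh quotient of the unit vector v estimates the largest eigenvalue.
    lam = sum(v[i] * R[i][j] * v[j] for i in range(p) for j in range(p))
    return lam / p
```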

Table 3

Principal component analysis of groundwater parameters

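| Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 |
|---|---|---|---|---|---|---|
| % of variance | 24.602 | 16.058 | 11.993 | 8.609 | 7.300 | 6.416 |
| SSP | 0.865 | −0.403 | | | | |
| Na⁺ | 0.767 | −0.459 | | | | |
| pH | 0.758 | | | | | |
| SAR | 0.750 | −0.501 | | | | |
| CO₃²⁻ | 0.623 | | | | | |
| RSC | 0.595 | | | 0.536 | | |
| TDS | 0.508 | 0.783 | | | | |
| EC | 0.533 | 0.753 | | | | |
| Salinity | 0.598 | 0.675 | | | | |
| HCO₃⁻ | | 0.632 | 0.405 | 0.480 | | |
| Ca²⁺ + Mg²⁺ | | 0.489 | 0.655 | | | |
| Fe²⁺ | | | 0.564 | | | |
| PO₄³⁻ | | | −0.552 | | | 0.506 |
| Cl⁻ | | | | −0.517 | | |
| K⁺ | | | | 0.472 | | |
| As | | | 0.513 | | −0.587 | |
| Mn²⁺ | | | | | 0.574 | |
| Cu²⁺ | | | | | 0.477 | 0.574 |
| Zn²⁺ | | | −0.435 | | | −0.490 |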

Spatial analysis of parameters

Spatial analysis was used to determine how the data vary across the study area. The spatial maps of arsenic (Figure 3(a)) and zinc (Figure 3(b)) indicate that the two are negatively related. In some parts of the area, the high zinc content is attributed to overuse of cow dung as agricultural manure. TDS (Figure 3(c)) shows little relation to arsenic.

Figure 3

Spatial distribution map of (a) arsenic, (b) zinc, (c) TDS, and (d) SSP.

The spatial maps of SSP (Figure 3(d)) and of sodium, SAR, salinity, and RSC (Figure 4(a)–4(d)) show high sodium content in the water samples, attributed to the deposition of sodium by the river Ganga. High phosphate levels (Figure 4(e)) are negatively related to arsenic occurrence but positively related to potassium (Figure 4(f)).

Figure 4

Spatial distribution map of (a) sodium, (b) SAR, (c) salinity, (d) RSC, (e) phosphate, and (f) potassium.

High pH in groundwater (Figure 5(a)) increases the negative surface charge of minerals in aquifers and promotes the desorption of arsenic into groundwater. Iron content (Figure 5(b)) is higher along the meandering portion of the river Ganga, showing a positive relation to arsenic. Chloride content (Figure 5(c)) is higher away from the meandering part of the river Ganga. The presence of manganese in groundwater (Figure 5(d)) suggests dissolution or reduction of manganese oxyhydroxides, which can release arsenic. Calcium (Figure 5(e)) and magnesium (Figure 5(f)) in groundwater can decrease the leaching of arsenic.

Figure 5

Spatial distribution map of (a) pH, (b) iron, (c) chloride, (d) manganese, (e) calcium, and (f) magnesium.

The groundwater has high EC (Figure 6(a)), which is negatively related to arsenic occurrence. Arsenic and copper (Figure 6(b)) are negatively correlated in the study area, although both can cause nephrotoxicity (kidney damage). The presence of carbonate (Figure 6(c)) and bicarbonate (Figure 6(d)) favors arsenic dissolution in aquifers; bicarbonate creates favorable conditions for mobilizing arsenic from iron and manganese oxyhydroxides.

Figure 6

Spatial distribution map of (a) EC, (b) copper, (c) carbonates, and (d) bicarbonate.

The Pearson correlations (Table 2) and the attributes selected by the classifier attribute evaluator show that Fe²⁺ (ppm) is positively correlated with, and responsible for, arsenic occurrence in groundwater (Bhattacharya et al. 2003). Arsenic-bearing iron oxides/hydroxides are reduced by abiotic and biotic factors (Islam et al. 2004), and the oxidation of iron sulfides is also responsible for arsenic occurrence in groundwater (Carraro et al. 2013). Ca²⁺ + Mg²⁺ in groundwater also has a positive relationship with arsenic occurrence and is positively correlated with Fe²⁺. Bhowmick et al. (2013) reported that a high concentration of HCO₃⁻ (mEq L⁻¹) favors the mobilization of arsenic in groundwater by dissociating arsenic from iron oxyhydroxides, illite, and kaolinite (Gao et al. 2011).

High pH in the Varanasi region favors arsenic occurrence, possibly because of calcite in the bedrock (Ayotte et al. 2003). At high pH (>8.5), arsenic desorbs from mineral oxides, whereas at near-neutral pH it is released by the reduction of Fe and Mn oxides (Smedley & Kinniburgh 2002). The Pearson correlations and the classifier attribute evaluator also show that Zn²⁺ (ppb) and Mn²⁺ (ppb) are negatively correlated with arsenic occurrence (Murphy et al. 2019). Cl⁻ (mEq L⁻¹) is positively associated with arsenic occurrence, suggesting a link with arsenic derived from bedrock (Warner 2001). High sodium concentrations show a negative correlation with iron, potassium, calcium, and magnesium. Consistent with Meng et al. (2017), the Pearson correlation and classifier attribute evaluator results show that CO₃²⁻ (mEq L⁻¹) associated with Ca²⁺ is negatively correlated with arsenic occurrence.

Classification result of arsenic contamination in groundwater

Based on the classifier attribute evaluator, the following attributes were selected for training and testing the machine learning algorithms: HCO₃⁻ (mEq L⁻¹), Fe²⁺ (ppm), Zn²⁺ (ppb), pH, Ca²⁺ + Mg²⁺ (mEq L⁻¹), Cl⁻ (mEq L⁻¹), SSP (%), Mn²⁺ (ppb), RSC (mEq L⁻¹), and CO₃²⁻ (mEq L⁻¹). For each classifier, accuracy, precision, recall, and FPR were calculated from the confusion matrix, as shown in Table 4.

Table 4

Confusion Matrix and results obtained for machine learning algorithms

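| Algorithm | Actual class | Predicted safe | Predicted unsafe | Accuracy (%) | Precision (%) | Recall (%) | FPR (%) |
|---|---|---|---|---|---|---|---|
| Optimized Forest | Safe | 46 | 1 | 80.64 | 80.70 | 97.87 | 73.33 |
| | Unsafe | 11 | 4 | | | | |
| SPAARC | Safe | 47 | 0 | 77.41 | 77.04 | 100 | 93.33 |
| | Unsafe | 14 | 1 | | | | |
| CS Forest | Safe | 46 | 1 | 75.80 | 76.66 | 97.87 | 93.33 |
| | Unsafe | 14 | 1 | | | | |
| REP Tree | Safe | 46 | 1 | 75.80 | 76.66 | 97.87 | 93.33 |
| | Unsafe | 14 | 1 | | | | |
| Random Forest | Safe | 46 | 1 | 79.03 | 79.31 | 97.87 | 80.00 |
| | Unsafe | 12 | 3 | | | | |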

The experimental results obtained after training and testing are compared for all classifiers in Figure 7. In the context of this study, the algorithm with the highest overall accuracy, precision, and recall and the lowest FPR is considered best. Accuracy is the ratio of correct predictions to total predictions; Optimized Forest has the highest accuracy (Figure 7(a)), 80.64%. Precision is the proportion of true positives among all samples predicted positive; Optimized Forest also has the highest precision (Figure 7(b)), 80.70%.

Figure 7

(a) Accuracy, (b) precision, (c) recall, and (d) FPR obtained for machine learning algorithms.

In terms of recall, SPAARC has the highest value (Figure 7(c)), 100%. Recall is the sensitivity of the model: the proportion of actual positives that are correctly identified. In terms of FPR, Optimized Forest has the lowest value (Figure 7(d)), 73.33%. FPR is the proportion of actual negatives incorrectly classified as positive; with safe as the positive class, as the Table 4 values imply, it is the fraction of unsafe samples misclassified as safe, so a lower FPR is better.
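
The four metrics can be computed directly from a 2 × 2 confusion matrix. For Optimized Forest, the counts TP = 46, FN = 1, FP = 11, and TN = 4 (safe as the positive class) follow from the reported metrics and the 62-sample total (47 safe, 15 unsafe), and reproduce the Table 4 values up to truncation. The function name is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and FPR with 'safe' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),  # predicted safe that are truly safe
        "recall": tp / (tp + fn),     # truly safe that are predicted safe
        "fpr": fp / (fp + tn),        # truly unsafe that are predicted safe
    }

# Optimized Forest counts inferred from the reported metrics.
optimized_forest = confusion_metrics(tp=46, fn=1, fp=11, tn=4)
```

Because FPR counts unsafe wells labeled safe, it is arguably the most safety-critical of the four metrics in this application.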

Overall, Optimized Forest performs best, with the highest accuracy and precision, the lowest FPR, and a recall (97.87%) close to the highest value of 100%. When the mean absolute error (MAE) of all models is compared, Optimized Forest also has the lowest MAE, 0.240, compared with 0.245, 0.358, 0.245, and 0.243 for SPAARC, CS Forest, REP Tree, and Random Forest, respectively. The Optimized Forest classifier can therefore be used for the applications described next.

Application of machine learning model outcomes

Approximation of people prone to As poisoning

The accuracy result obtained from the model's overall performance can be used to approximate the number of people prone to arsenic poisoning. Bindal & Singh (2019) used a linear functional relationship between the predicted accuracy of the model and the population density of an area to approximate the number of people affected by arsenic.
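$$N = \mathrm{Accuracy} \times D \times A \tag{1}$$
where N is the approximate number of people affected, Accuracy is the classification accuracy of the model, D is the population density (people/km²), and A is the area (km²).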

The total area of the study is 1,258.39 km², of which 197.36 km² has a high arsenic concentration (>10 μg/L). According to the Census of India, the population density of Varanasi district is 2,395 people/km². Using Equation (1) with the Optimized Forest accuracy of 80.64%, approximately 381,166 people live in the high-arsenic (>10 μg/L) region and 2,049,197 people live in the low-arsenic (<10 μg/L) region. In total, approximately 2,430,363 people in the study area are prone to developing carcinogenic and non-carcinogenic diseases.
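
The population figures above can be checked with a short calculation. It assumes Equation (1) has the form N = accuracy × population density × area, which reproduces the reported numbers to within one person when the Optimized Forest accuracy of 0.8064 is used. The function name is hypothetical.

```python
def people_exposed(area_km2, density_per_km2, accuracy):
    """Assumed form of Equation (1): accuracy x population density x area."""
    return area_km2 * density_per_km2 * accuracy

ACCURACY = 0.8064   # Optimized Forest accuracy
DENSITY = 2395      # people per km^2, Varanasi district
TOTAL_AREA = 1258.39
HIGH_AS_AREA = 197.36

high = people_exposed(HIGH_AS_AREA, DENSITY, ACCURACY)
low = people_exposed(TOTAL_AREA - HIGH_AS_AREA, DENSITY, ACCURACY)
total = high + low
```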

Carcinogenic and non-carcinogenic risk assessment of the study area

The carcinogenic and non-carcinogenic risks to people in the study area can also be estimated. The effect of arsenic contamination in the study area is calculated for the ingestion and dermal contact pathways (U.S. EPA 2005). The ingestion pathway gives the average daily intake of arsenic through drinking water, using Equation (2).
$$\mathrm{ADI}_{\mathrm{ingestion}} = \frac{AC \times RWI \times EF \times ED}{BW \times AT} \tag{2}$$
where ADIingestion is the average daily intake of arsenic through ingestion of water (mg kg⁻¹ day⁻¹), AC is the average arsenic concentration in drinking water (μg L⁻¹, converted to mg L⁻¹ in the calculation), RWI is the rate of water intake (2.00 L day⁻¹), EF is the exposure frequency (365 days/year), ED is the exposure duration over a person's lifetime (60 years), BW is the body weight of a person (60 kg), and AT is the averaging time of exposure to arsenic (21,900 days) (Chakraborti et al. 2017).
The dermal contact pathway represents exposure of the body to arsenic-contaminated water and is calculated using Equation (3).
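$$\mathrm{ADD}_{\mathrm{dermal}} = \frac{AC \times K_i \times SA \times BF \times EF \times ED \times CF}{BW \times AT} \tag{3}$$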
where ADDdermal is the average daily dose of arsenic through dermal contact with contaminated water (mg kg⁻¹ day⁻¹), Ki is the dermal absorption parameter (0.001 cm/h), SA is the skin surface area exposed to arsenic-contaminated water (16,600 cm²), BF is the bathing frequency of a person in the study area (1 time/day), and CF is the unit conversion factor (0.002 L/cm³).
The non-carcinogenic risk can be calculated using the hazard quotient (HQ). The hazard quotient for the ingestion pathway (HQ (ingestion)) is the ratio of the average daily intake (ADIingestion) to the oral reference dose (0.0003 mg kg⁻¹ day⁻¹). The hazard quotient for the dermal contact pathway (HQ (dermal)) is the ratio of the average daily dose (ADDdermal) to the dermal reference dose (0.000123 mg kg⁻¹ day⁻¹). If the total HQ (Equation (4)), the sum of the ingestion and dermal hazard quotients, is greater than 1, there is a high risk of non-carcinogenic health effects; otherwise, no adverse health effect is expected.
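$$HQ = HQ_{\mathrm{ingestion}} + HQ_{\mathrm{dermal}} = \frac{\mathrm{ADI}_{\mathrm{ingestion}}}{RfD_{\mathrm{oral}}} + \frac{\mathrm{ADD}_{\mathrm{dermal}}}{RfD_{\mathrm{dermal}}} \qquad (4)$$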
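The hazard quotient described above can be sketched as follows. The reference doses are the values quoted in the text; the function names `hazard_quotient` and `non_carcinogenic_risk` are hypothetical, and the dose inputs are taken directly from the Min and Max rows of Table 5 rather than recomputed.

```python
RFD_ORAL = 0.0003      # oral reference dose, mg/kg/day (value quoted in the text)
RFD_DERMAL = 0.000123  # dermal reference dose, mg/kg/day (value quoted in the text)


def hazard_quotient(adi_ingestion: float, add_dermal: float) -> float:
    """Total HQ = ADI_ingestion / RfD_oral + ADD_dermal / RfD_dermal (Equation 4)."""
    return adi_ingestion / RFD_ORAL + add_dermal / RFD_DERMAL


def non_carcinogenic_risk(hq: float) -> str:
    """Apply the HQ > 1 criterion stated in the text."""
    return "high risk" if hq > 1 else "no adverse effect expected"


if __name__ == "__main__":
    # ADDi and ADDd (mg/kg/day) for the Min and Max rows of Table 5
    for label, adi, add in [("Min", 1.00e-4, 7.57e-11), ("Max", 8.33e-4, 6.31e-10)]:
        hq = hazard_quotient(adi, add)
        print(f"{label}: HQ = {hq:.2f} -> {non_carcinogenic_risk(hq)}")
```

With these dose values the dermal term is several orders of magnitude smaller than the ingestion term, so the total HQ is driven almost entirely by drinking water.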
The carcinogenic risk, i.e., the probability of a person developing cancer over a lifetime, is calculated as the Target Cancer Risk (TCR) using Equation (5). The Target Cancer Risk for the ingestion pathway (TCR (ingestion)) is the product of ADIingestion and the Cancer Slope Factor (CSF); according to the Integrated Risk Information System (IRIS) database, the CSF for the ingestion pathway is 1.5 (mg kg−1 day−1)−1. Similarly, the Target Cancer Risk for the dermal pathway (TCR (dermal)) is the product of ADDdermal and the dermal Cancer Slope Factor, which is 3.66 (mg kg−1 day−1)−1.
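$$TCR = TCR_{\mathrm{ingestion}} + TCR_{\mathrm{dermal}} = \mathrm{ADI}_{\mathrm{ingestion}} \times CSF_{\mathrm{ingestion}} + \mathrm{ADD}_{\mathrm{dermal}} \times CSF_{\mathrm{dermal}} \qquad (5)$$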
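The target cancer risk can be sketched in the same way, using the slope factors quoted above and the 10−6 acceptable-risk limit cited in the discussion of Table 5. The function name `target_cancer_risk` is hypothetical, and the dose inputs are the Min and Max rows of Table 5.

```python
CSF_INGESTION = 1.5  # cancer slope factor, (mg/kg/day)^-1, IRIS value quoted in the text
CSF_DERMAL = 3.66    # cancer slope factor, (mg/kg/day)^-1, value quoted in the text
TCR_LIMIT = 1e-6     # acceptable lifetime cancer risk (U.S. EPA)


def target_cancer_risk(adi_ingestion: float, add_dermal: float) -> float:
    """Total TCR = ADI_ingestion * CSF_ingestion + ADD_dermal * CSF_dermal (Equation 5)."""
    return adi_ingestion * CSF_INGESTION + add_dermal * CSF_DERMAL


if __name__ == "__main__":
    # ADDi and ADDd (mg/kg/day) for the Min and Max rows of Table 5
    for label, adi, add in [("Min", 1.00e-4, 7.57e-11), ("Max", 8.33e-4, 6.31e-10)]:
        tcr = target_cancer_risk(adi, add)
        status = "exceeds" if tcr > TCR_LIMIT else "is within"
        print(f"{label}: TCR = {tcr:.2e}, which {status} the 1e-6 limit")
```

The dermal contribution (of the order of 10−10 to 10−9 in Table 5) barely changes the total, so the exceedance of the 10−6 limit is due to the ingestion pathway.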

Based on the model accuracy and the population of the study area (Equation (1)), approximately 2,430,363 people are prone to arsenic poisoning. The carcinogenic and non-carcinogenic risks to people living in the study area were calculated (Table 5) using the minimum, maximum, mean, and median arsenic concentrations in the study area (Li et al. 2018). The safe limit is HQ < 1 for non-carcinogenic risk and TCR < 10−6 for carcinogenic risk (U.S. EPA 1992a, 1992b). HQ ranges from 0.33 (low risk) to 2.77 (very high risk); areas with HQ > 1 are at high risk of developing non-carcinogenic diseases. TCR ranges from 1.5 × 10−4 (high risk) to 1.25 × 10−3 (very high risk); since even the minimum TCR exceeds 10−6, long-term exposure to arsenic carries a high risk of skin, lung, and urinary bladder cancer across the study area.

Table 5

Carcinogenic and non-carcinogenic risk assessment of study area

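Non-carcinogenic risk: columns ADDi to HQ; carcinogenic risk: columns TCRi to TCR.

| Statistic | As | ADDi | ADDd | HQi | HQd | HQ | TCRi | TCRd | TCR |
|---|---|---|---|---|---|---|---|---|---|
| Min | 3.00 | 1 × 10−4 | 7.57 × 10−11 | 0.33 | 6.16 × 10−15 | 0.33 | 1.5 × 10−4 | 2.77 × 10−10 | 1.5 × 10−4 |
| Mean | 8.38 | 2.79 × 10−4 | 2.11 × 10−10 | 0.93 | 1.72 × 10−14 | 0.93 | 4.18 × 10−4 | 7.74 × 10−10 | 4.19 × 10−4 |
| Median | 7.52 | 2.50 × 10−4 | 1.90 × 10−10 | 0.83 | 1.54 × 10−14 | 0.83 | 3.75 × 10−4 | 6.95 × 10−10 | 3.75 × 10−4 |
| Max | 25.00 | 8.33 × 10−4 | 6.31 × 10−10 | 2.77 | 5.13 × 10−14 | 2.77 | 1.24 × 10−3 | 2.31 × 10−9 | 1.25 × 10−3 |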

i, ingestion; d, dermal. As in μg/L; ADD in mg kg−1 day−1.

Association between attributes and spatial distribution of arsenic

Based on the findings of the Pearson correlation analysis and the classifier attribute evaluator, the significant attributes associated with the occurrence of arsenic were identified. The model was then trained and tested on these attributes to classify arsenic levels as safe or unsafe, and it can be used to classify new water samples from the same attributes. In addition, the spatial distribution of arsenic shown in Figure 3(a) can be used to group regions into high-, moderate-, and low-risk zones, and mitigation strategies can then be targeted to each zone to protect public health. Continuous surveillance and monitoring of water quality are needed in the affected regions.

Mitigation techniques based on the spatial map of arsenic

Based on the spatial maps and the grouping of zones, policymakers can plan mitigation strategies. Arsenic mitigation falls into two categories: finding alternative arsenic-free water sources, and using current treatment technologies to remove arsenic from existing sources.

Rainwater harvesting pits, new ponds, lakes, and dug wells created in the affected zones are generally free from arsenic because of their continuously oxidizing environment and recharge by rainfall. Another option is to encourage people in the affected zones to switch from arsenic-affected shallow tube wells to new deep tube wells that provide access to arsenic-free water.

Treatment infrastructure also needs to be developed to supply arsenic-free water in the affected zones. One option is oxidation (Lee et al. 2003), in which soluble As(III) is converted to As(V), followed by adsorption; As(V) adsorbs onto solid surfaces more readily than As(III). Commonly used oxidants include O3, H2O2, Cl2, and NH2Cl.

Another technique for removing arsenic is coagulation-flocculation (Pallier et al. 2010), which uses Fe- and Al-based coagulants to form flocs that aggregate into larger particles. Soluble arsenic is precipitated onto the flocs and thus removed from the water.

Arsenic removal using adsorbents (Katsoyiannis et al. 2008) has also been widely explored. Zero-valent iron (ZVI, Fe(0)), ferrihydrite (Giles et al. 2011), granular ferric hydroxide (Driehaus et al. 1998), and hydrous ferric oxide (Wilkie & Hering 1996) are the most widely studied iron-based adsorbents and have yielded promising results for the removal of both As(III) and As(V).

In another study, the bacteria Gallionella ferruginea and Leptothrix ochracea removed arsenic from contaminated water (Katsoyiannis & Zouboulis 2004). These bacteria oxidize As(III) to As(V), which is then removed by coagulation to obtain arsenic-free water.

This research article assessed groundwater quality in Varanasi district, Uttar Pradesh, India, to classify water samples as safe or unsafe according to the WHO guidelines. Arsenic is a known carcinogen that affects the health of humans and animals. The study identified the major geochemical parameters associated with the occurrence of arsenic, and, based on its overall performance, an Optimized Forest model was built to classify water samples as safe or unsafe. Based on the model accuracy and the population of the study area, approximately 2,430,363 people are estimated to be prone to arsenic poisoning, and their carcinogenic and non-carcinogenic risks were assessed. Countering arsenic poisoning in the study area requires continuous surveillance and monitoring of water quality, as well as the development of treatment infrastructure in the worst-affected regions.

In future work, we plan to include other important parameters, such as geology, soil properties, land use/land cover, topography, minerals, temperature, precipitation, hydrology, aquifer connectivity, and hydrostratigraphy, alongside water-quality parameters, to build a more robust and accurate machine learning model for a broader study area.

This is to certify that the authors are not affiliated with or involved with any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this paper.

The author acknowledges the fellowship provided by the University Grants Commission (UGC), New Delhi, in the form of a Junior Research Fellowship (JRF) and a Senior Research Fellowship (SRF), to conduct this research.

All relevant data are included in the paper or its Supplementary Information.

Ahamed S., Sengupta M. K., Mukherjee A., Hossain M. A., Das B., Nayak B. & Chakraborti D. 2006 Arsenic groundwater contamination and its health effects in the state of Uttar Pradesh (UP) in upper and middle Ganga plain, India: a severe danger. Science of the Total Environment 370 (2–3), 310–322.
Alam M., Shaikh W. A., Chakraborty S., Avishek K. & Bhattacharya T. 2016 Groundwater arsenic contamination and potential health risk assessment of Gangetic Plains of Jharkhand, India. Exposure and Health 8 (1), 125–142.
ATSDR 2019 ATSDR's Substance Priority List. Available from: https://www.atsdr.cdc.gov/spl/index.html
Ayotte J. D., Montgomery D. L., Flanagan S. M. & Robinson K. W. 2003 Arsenic in groundwater in eastern New England: occurrence, controls, and human health implications. Environmental Science & Technology 37 (10), 2075–2083.
Bhowmick S., Nath B., Halder D., Biswas A., Majumder S., Mondal P. & Chatterjee D. 2013 Arsenic mobilization in the aquifers of three physiographic settings of West Bengal, India: understanding geogenic and anthropogenic influences. Journal of Hazardous Materials 262, 915–923.
Bhowmick S., Pramanik S., Singh P., Mondal P., Chatterjee D. & Nriagu J. 2018 Arsenic in groundwater of West Bengal, India: a review of human health risks and assessment of possible intervention options. Science of the Total Environment 612, 148–169.
Bhattacharya P., Tandukar N., Nekul A., Valero A., Mukherjee A. B. & Jacks G. 2003 Geogenic arsenic in groundwaters from Terai alluvial plain of Nepal. In: Journal de Physique IV (Proceedings) 107 (May 2003), pp. 173–176. EDP Sciences.
Biswas A., Nath B., Bhattacharya P., Halder D., Kundu A. K., Mandal U. & Jacks G. 2012 Hydrogeochemical contrast between brown and grey sand aquifers in shallow depth of Bengal Basin: consequences for sustainable drinking water supply. Science of the Total Environment 431, 402–412.
Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.
Carraro A., Fabbri P., Giaretta A., Peruzzo L., Tateo F. & Tellini F. 2013 Arsenic anomalies in shallow Venetian Plain (Northeast Italy) groundwater. Environmental Earth Sciences 70 (7), 3067–3084.
Chakraborti D., Das B., Rahman M. M., Nayak B., Pal A., Sengupta M. K. & Dutta R. N. 2017 Arsenic in groundwater of the Kolkata Municipal Corporation (KMC), India: critical review and modes of mitigation. Chemosphere 180, 437–447.
Chakraborti D., Mukherjee S. C., Pati S., Sengupta M. K., Rahman M. M., Chowdhury U. K. & Basu G. K. 2003 Arsenic groundwater contamination in Middle Ganga Plain, Bihar, India: a future danger? Environmental Health Perspectives 111 (9), 1194–1201.
Chakraborty M., Sarkar S., Mukherjee A., Shamsudduha M., Ahmed K. M., Bhattacharya A. & Mitra A. 2020 Modeling regional-scale groundwater arsenic hazard in the transboundary Ganges River Delta, India and Bangladesh: infusing physically-based model with machine learning. Science of the Total Environment 748, 141107 (pp. 1–14).
Chattopadhyay A., Singh A. P., Singh S. K., Barman A., Patra A., Mondal B. P. & Banerjee K. 2020 Spatial variability of arsenic in Indo-Gangetic basin of Varanasi and its cancer risk assessment. Chemosphere 238, 124623 (pp. 1–9).
Chauhan V. S., Nickson R. T., Chauhan D., Iyengar L. & Sankararama Krishnan N. 2009 Ground water geochemistry of Ballia district, Uttar Pradesh, India and mechanism of arsenic release. Chemosphere 75 (1), 83–91.
Chen R. C., Dewi C., Huang S. W. & Caraka R. E. 2020 Selecting critical features for data classification based on machine learning methods. Journal of Big Data 7 (1), 1–26.
Cho K. H., Sthiannopkao S., Pachepsky Y. A., Kim K. W. & Kim J. H. 2011 Prediction of contamination potential of groundwater arsenic in Cambodia, Laos, and Thailand using artificial neural network. Water Research 45 (17), 5535–5544.
Das A., Das S. S., Chowdhury N. R., Joardar M., Ghosh B. & Roychowdhury T. 2020 Quality and health risk evaluation for groundwater in Nadia district, West Bengal: an approach on its suitability for drinking and domestic purpose. Groundwater for Sustainable Development 10, 100351 (pp. 1–10).
de Menezes M. D., Bispo F. H. A., Faria W. M., Gonçalves M. G. M., Curi N. & Guilherme L. R. G. 2020 Modeling arsenic content in Brazilian soils: what is relevant? Science of the Total Environment 712, 136511 (pp. 1–12).
Driehaus W., Jekel M. & Hildebrandt U. 1998 Granular ferric hydroxide: a new adsorbent for the removal of arsenic from natural water. Journal of Water Supply: Research and Technology Aqua 47 (1), 30–35.
Dummer T. J. B., Yu Z. M., Nauta L., Murimboh J. D. & Parker L. 2015 Geostatistical modelling of arsenic in drinking water wells and related toenail arsenic concentrations across Nova Scotia, Canada. Science of the Total Environment 505, 1248–1258.
Elomaa T. & Kaariainen M. 2001 An analysis of reduced error pruning. Journal of Artificial Intelligence Research 15, 163–187.
Erickson M. L., Elliott S. M., Christenson C. A. & Krall A. L. 2018 Predicting geogenic arsenic in drinking water wells in glacial aquifers, north-central USA: accounting for depth-dependent features. Water Resources Research 54 (12), 10172.
American Public Health Association (APHA) & Water Environment Federation (WEF) 2005 Standard Methods for the Examination of Water and Wastewater, 21st edn. American Public Health Association (APHA), Washington, DC, USA.
Fushiki T. 2011 Estimation of prediction error by using K-fold cross-validation. Statistics and Computing 21 (2), 137–146.
Gao X., Wang Y., Hu Q. & Su C. 2011 Effects of anion competitive adsorption on arsenic enrichment in groundwater. Journal of Environmental Science and Health, Part A 46 (5), 471–479.
Giles D. E., Mohapatra M., Issa T. B., Anand S. & Singh P. 2011 Iron and aluminium based adsorption strategies for removing arsenic from water. Journal of Environmental Management 92 (12), 3011–3022.
Gnanambal S., Thangaraj M., Meenatchi V. T. & Gayathri V. 2018 Classification algorithms with attribute selection: an evaluation study using WEKA. International Journal of Advanced Networking and Applications, 3640–3644.
Hossain M. R., Oo A. M. & Ali A. B. 2013 The effectiveness of feature selection method in solar power prediction. Journal of Renewable Energy 2013, 1–10.
Islam F. S., Gault A. G., Boothman C., Polya D. A., Charnock J. M., Chatterjee D. & Lloyd J. R. 2004 Role of metal-reducing bacteria in arsenic release from Bengal delta sediments. Nature 430 (6995), 68–71.
Jeelani G. H., Shah R. A. & Hussain A. 2014 Hydrogeochemical assessment of groundwater in Kashmir Valley, India. Journal of Earth System Science 123 (5), 1031–1043.
Katsoyiannis I. A., Ruettimann T. & Hug S. J. 2008 pH dependence of Fenton reagent generation and As(III) oxidation and removal by corrosion of zero valent iron in aerated water. Environmental Science & Technology 42 (19), 7424–7430.
Lado L. R., Polya D., Winkel L., Berg M. & Hegan A. 2008 Modelling arsenic hazard in Cambodia: a geostatistical approach using ancillary data. Applied Geochemistry 23 (11), 3010–3018.
Lee J. J., Jang C. S., Liu C. W., Liang C. P. & Wang S. W. 2009 Determining the probability of arsenic in groundwater using a parsimonious model. Environmental Science & Technology 43 (17), 6662–6668.
Li R., Kuo Y. M., Liu W. W., Jang C. S., Zhao E. & Yao L. 2018 Potential health risk assessment through ingestion and dermal contact arsenic contaminated groundwater in Jianghan Plain, China. Environmental Geochemistry and Health 40 (4), 1585–1599.
Maiti S. K. 2001 Methods in Environmental Studies: Water and Wastewater Analysis. ABD, Jaipur.
Mukherjee A., Fryar A. E. & Thomas W. A. 2009 Geologic, geomorphic and hydrologic framework and evolution of the Bengal basin, India and Bangladesh. Journal of Asian Earth Sciences 34 (3), 227–244.
Mukherjee A., Fryar A. E., Scanlon B. R., Bhattacharya P. & Bhattacharya A. 2011 Elevated arsenic in deeper groundwater of the western Bengal basin, India: extent and controls from regional to local scale. Applied Geochemistry 26 (4), 600–613.
Murphy T., Irvine K., Phan K., Lean D. & Wilson K. 2019 Environmental and health implications of the correlation between arsenic and zinc levels in rice from an arsenic-rich zone in Cambodia. Journal of Health and Pollution 9 (22).
Nickson R. T., McArthur J. M., Ravenscroft P., Burgess W. G. & Ahmed K. M. 2000 Mechanism of arsenic release to groundwater, Bangladesh and West Bengal. Applied Geochemistry 15 (4), 403–413.
Pallier V., Feuillade-Cathalifaud G., Serpaud B. & Bollinger J. C. 2010 Effect of organic matter on arsenic removal during coagulation/flocculation treatment. Journal of Colloid and Interface Science 342 (1), 26–32.
Panagopoulos A. 2022b Techno-economic assessment of zero liquid discharge (ZLD) systems for sustainable treatment, minimization and valorization of seawater brine. Journal of Environmental Management 306, 114488 (pp. 1–11).
Podgorski J. & Berg M. 2020 Global threat of arsenic in groundwater. Science 368 (6493), 845–850.
Postma D., Pham T. K. T., H. U., Vi M. L., Nguyen T. T., Larsen F. & Jakobsen R. 2016 A model for the evolution in water chemistry of an arsenic contaminated aquifer over the last 6000 years, Red River floodplain, Vietnam. Geochimica et Cosmochimica Acta 195, 277–292.
Saha D. 2009 Arsenic groundwater contamination in parts of middle Ganga plain, Bihar. Current Science 97 (6), 753–755.
Smedley P. L. & Kinniburgh D. G. 2002 A review of the source, behaviour and distribution of arsenic in natural waters. Applied Geochemistry 17 (5), 517–568.
Stine R. 1989 An introduction to bootstrap methods: examples and ideas. Sociological Methods & Research 18 (2–3), 243–291.
Tirkey P., Bhattacharya T., Chakraborty S. & Baraik S. 2017 Assessment of groundwater quality and associated health risks: a case study of Ranchi city, Jharkhand, India. Groundwater for Sustainable Development 5, 85–100.
U.S. EPA (U.S. Environmental Protection Agency) 2005 Guidelines for Carcinogen Risk Assessment. Office of Environmental Information, Environmental Protection Agency, Washington, DC.
U.S. EPA (U.S. Environmental Protection Agency) 1992a Guidelines for Exposure Assessment. Federal Register.
U.S. EPA (U.S. Environmental Protection Agency) 1992b Draft Report: A Cross-Species Scaling Factor for Carcinogen Risk Assessment Based on Equivalence of mg/kg¾/day. Federal Register.
Van Herreweghe S., Swennen R., Vandecasteele C. & Cappuyns V. 2003 Solid phase speciation of arsenic by sequential extraction in standard reference materials and industrially contaminated soil samples. Environmental Pollution 122 (3), 323–342.
Verma S., Mukherjee A., Choudhury R. & Mahanta C. 2015 Brahmaputra river basin groundwater: solute distribution, chemical evolution and arsenic occurrences in different geomorphic settings. Journal of Hydrology: Regional Studies 4, 131–153.
Verma S., Mukherjee A., Mahanta C., Choudhury R., Badoni R. P. & Joshi G. 2019 Arsenic fate in the Brahmaputra river basin aquifers: controls of geogenic processes, provenance and water-rock interactions. Applied Geochemistry 107, 171–186.
WHO 2011 Guidelines for Drinking-Water Quality. World Health Organization, Distribution and Sales, Geneva.
Wilkie J. A. & Hering J. G. 1996 Adsorption of arsenic onto hydrous ferric oxide: effects of adsorbate/adsorbent ratios and co-occurring solutes. Colloids and Surfaces A: Physicochemical and Engineering Aspects 107, 97–110.
Yates D., Islam M. Z. & Gao J. 2018 SPAARC: a fast decision tree algorithm. In: Australasian Conference on Data Mining. Springer, Singapore, pp. 43–55.
Zhang Q., Rodríguez-Lado L., Johnson C. A., Xue H., Shi J., Zheng Q. & Sun G. 2012 Predicting the risk of arsenic contaminated groundwater in Shanxi Province, Northern China. Environmental Pollution 165, 118–123.
Zhang Q., Rodriguez-Lado L., Liu J., Johnson C. A., Zheng Q. & Sun G. 2013 Coupling predicted model of arsenic in groundwater with endemic arsenism occurrence in Shanxi Province, Northern China. Journal of Hazardous Materials 262, 1147–1153.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).