India has been dealing with fluoride contamination of groundwater for the past few decades. Long-term exposure to fluoride can cause skeletal and dental fluorosis. Therefore, an in-depth exploration of fluoride concentrations in different parts of India is desirable. This work employs machine learning algorithms to analyze the fluoride concentrations in five major affected Indian states (Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal). A correlation matrix was used to identify appropriate predictor variables for fluoride prediction. The algorithms used for prediction were K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector classifier (SVC), Gaussian naïve Bayes (NB), multi-layer perceptron (MLP) classifier, decision tree classifier, gradient boosting classifier, and soft and hard voting classifiers. The performance of these models was assessed using accuracy, precision, recall, error rate and the receiver operating characteristic (ROC) curve. As the dataset was skewed, the performance of the models was evaluated before and after resampling. Analysis of the results indicates that the RF model is the best model for predicting fluoride contamination in groundwater in these Indian states.

  • Prediction of fluoride in groundwater using supervised machine learning algorithms is explored in Indian states.

  • This study focuses on only naturally occurring fluoride, i.e., geogenic sources of fluoride contamination in groundwater.

  • pH, EC, bicarbonate, chloride, total alkalinity, sodium and sulfate are major factors influencing the occurrence of fluoride in groundwater.

  • The random forest model is the best model in predicting fluoride contamination in groundwater.

Most nations, including Germany, the United States, China and India, rely primarily on groundwater for household, agricultural and industrial use (Houéménou et al. 2020; Khosravi et al. 2020; Sutradhar & Mondal 2021). Over time, reduced rainfall and reduced surface water have increased the reliance on groundwater in India (Li et al. 2017). According to the Central Ground Water Board (CGWB) 2010 report, 19 states across India have severe fluoride contamination levels (Rodell et al. 2009; World Bank 2010). Because of an abrupt surge in population along with industrial development and urbanization, the chemical composition of water has changed considerably in the recent past (Bhagure & Mirgane 2011; Singh & Kumar 2017). As a result, groundwater now poses a threat to human health as well as to the nation's economic growth. Groundwater contains various chemicals such as chromium, lead, arsenic and silica, and inorganic ions such as fluoride, nitrate and chloride, which are harmful to health (Brindha & Elango 2011; Vithanage & Bhattacharya 2015). Of all these contaminants, fluoride is of major concern in India.

Fluorine is the most electronegative element and the 17th most abundant element found naturally (Tebutt 1983). Fluoride is present in rocks (granitic rocks rich in fluoride minerals, igneous rocks and metamorphic rocks) and in minerals deep in the earth. Apart from these natural occurrences, volcanic activity, industrial waste, ceramic industries, brick kilns, etc., are also responsible for the presence of fluoride in the environment. The major sources of fluoride contamination are shown in Figure 1.
Figure 1

Origin of fluoride contamination.


Granitic, metamorphic and igneous rocks occur naturally in the earth's crust. Weathering of these rocks releases fluoride, which makes them a geogenic source of fluoride contamination. Other natural sources that contribute to fluoride pollution in groundwater include volcanic eruptions, which release large amounts of volcanic ash and volcanic rock loaded with fluorine (Araya et al. 1993). Geothermal water, which is alkaline in nature, promotes the release of fluoride from fluoride-bearing minerals in rocks (Saxena & Ahmed 2001). Through atmospheric deposition, fluoride present in the air in particulate or gaseous form is carried by rainfall to the earth's surface and then into the groundwater (Gupta et al. 2005). Anthropogenic sources, including industrial debris, phosphatic fertilizers, dumping grounds, brick manufacturing, ceramic industries, aluminum smelting and coal-fired power stations, are also responsible for fluoride contamination in groundwater (Singh et al. 2008; Rawat et al. 2010). Of all these sources, geogenic contamination, which occurs due to the weathering of rocks in the earth's crust, is the primary cause of fluoride contamination around the globe (Kumar et al. 2016).

Fluoride occurs in combination with other elements in minerals such as fluorite and fluorapatite, and in soil and water (Ghosh et al. 2013; Banerjee 2015). The earth's crust contains fluoride-rich minerals such as mica, topaz, fluorite and apatite (Handa 1975). On weathering, these minerals release inorganic fluoride into groundwater (Tavener & Clark 2006; Vithanage & Bhattacharya 2015). The hydroxide ion (OH⁻) is similar to the fluoride ion (F⁻) in size and charge, so it can replace F⁻ in chemical reactions (Saxena & Ahmed 2001; Chae et al. 2007). The processes accountable for fluoride contamination of groundwater are precipitation, hydrolysis, adsorption, dissolution, biochemical reaction and ion exchange (Saxena & Ahmed 2003). Fluoride is present in granitic rocks (hornblende, muscovite), igneous rocks, metamorphic rocks and sedimentary deposits, which release fluoride on weathering (Edmunds & Smedley 2005; Ozsvath 2009; Vithanage & Bhattacharya 2015). The quantity of fluoride depends upon the composition of the rocks. For instance, the fluoride content of ultramafic rocks is 100 mg/kg, whereas alkaline rocks contain 1,000 mg/kg and marine shales contain 1,300 mg/kg (Hem 1985; Faure 1991; Ozsvath 2009).

Nearly 200 million people consume water with a fluoride concentration above the limit prescribed by the World Health Organization (WHO) (World Health Organization (WHO) 2006). Of these, 70 million are in India (UNICEF 1999). The permissible fluoride concentration prescribed by the WHO lies between 0.5 and 1.0 mg/L, and up to 1.5 mg/L in adverse conditions. The guidelines and standards for fluoride in drinking water are presented in Table 1. Freshwater contains 0.01–3 mg/L of fluoride, whereas groundwater contains 1–35 mg/L. Although fluoride helps strengthen tooth enamel and preserve bone health in the human body (Ghosh et al. 2013), excessive consumption of fluoride leads to chronic fluorosis, which not only affects the teeth (World Health Organization (WHO) 1994) and bones but also has notable effects on the cardiovascular, respiratory, gastrointestinal and immune systems (Salve et al. 2008; Chouhan & Flora 2010).

Table 1

Guidelines and standards for fluoride in drinking water

| Country/bodies | Value (mg/L) | References |
|---|---|---|
| World Health Organization (WHO) | 1.5 (Guideline value) | WHO (2011) |
| Australia | 1.5 (Permissible limit) | NHMRC & NRMMC (2011) |
| Bureau of Indian Standards (BIS) | 1 (Acceptable limit); 1.5 (Permissible limit) | BIS (2012) |
| Canada | 1.5 (Permissible limit) | Health Canada (2010) |
| European Union | 1.5 (Permissible limit) | DECLG (2014) |
| Ireland | 1.5 (Permissible limit) | NEIA (2018) |
| Japan | 0.8 (Standard value) | MHLW (2010) |
| New Zealand | 1.5 (Permissible limit) | MH (2008) |
| Malaysia | 1.5 (Permissible limit) | ESD (2004) |
| Singapore | 0.7 (Max. prescribed quantity) | NEA Singapore (2008) |
| South Korea | 1.5 (Permissible limit) | ECOREA (2013) |
| United States Environment Protection Agency (USEPA) | 4 (Max. contaminant level); 2 (Secondary max. contaminant level) | USEPA (2011) |

Several studies in India have assessed fluoride contamination using traditional chemical analysis. These include studies of high fluoride in Tamil Nadu (Chicas et al. 2022), geochemical analysis and health risk associated with fluoride contamination in Guntur, Andhra Pradesh (Rao Subba et al. 2020), fluoride contamination in groundwater resources of Alleppey, South India (Raj & Shaji 2016), fluoride pollution in Nalgonda (Adimalla et al. 2019), fluoride assessment in different parts of Telangana (Narsimha & Sudarshan 2017; Narsimha & Rajitha 2018), fluoride distribution in different parts of Haryana (Yadav et al. 2019), fluoride contamination in West Bengal (Batabyal & Gupta 2017; De et al. 2022) and fluoride contamination in Sonbhadra, Uttar Pradesh (Raju et al. 2009). Another study of the Kolar and Tumkur districts of Karnataka revealed that evaporation and rock weathering are responsible for fluoride contamination in groundwater (Mamatha & Rao 2010).

These studies assess groundwater through chemical analysis, which is time-consuming and costly because it requires digging wells in different areas to collect samples, evaluating the collected water samples, managing the data, etc. The cost of equipment, labor and the chemicals used for evaluation makes the process even more expensive (Tiyasha Tung & Yaseen 2020). According to the literature, analysis of water via traditional chemical approaches is not cost-effective, which limits water quality assessment to some extent (Ongley 2000). In this digital era, artificial intelligence techniques are seen as potential solutions to real-world problems.

Machine learning is an artificial intelligence technique that learns useful patterns from data in order to make predictions. These techniques can capture non-linear and intricate associations between input and output data. Over the years, scientists have collected large amounts of data to anticipate water contamination. The present study focuses on using such data to make predictions with machine learning techniques in less time. Machine learning models have been used to estimate groundwater contamination in different parts of the world. For example, the artificial neural network (ANN) can tolerate numerous inaccuracies within a dataset and can uncover non-linear associations between predictor and dependent variables, whereas the random forest (RF) model is proficient in handling binary, continuous, missing-value and high-dimensional data. Furthermore, logistic regression (LR) is an efficient method for binary classification that trains quickly. Table 2 presents a summary of studies that employed machine learning techniques for fluoride prediction. The studies in Table 2 indicate that machine learning algorithms can predict the occurrence of fluoride accurately and in a short time.

Table 2

Summary of research work done on fluoride contamination using machine learning

| State/country | Algorithm used | Evaluation parameters | Limitation | References |
|---|---|---|---|---|
| Ghana, West Africa | Random forest algorithm | Sensitivity, specificity, precision, balanced accuracy | Only random forest algorithm used | Araya et al. (2022) |
| Datong Basin, China | Random forest, logistic regression, artificial neural network | Accuracy, sensitivity, specificity, error rate | Other models can be used | Nafouanti et al. (2021) |
| Qiantao and Houtao plain, Northwestern China | Random forest algorithm | Sensitivity, specificity, positive predictive value, negative predictive value | Only random forest algorithm is used | Xiangcao et al. (2024) |
| Whole China | Artificial neural network | Sensitivity, specificity, AUC curve | Used only artificial neural network | Hailong et al. (2022) |
| Chhattisgarh, India | Random forest, extreme gradient boosting (XGBoost), artificial neural network | Mean square error, mean absolute error, root mean squared error, mean absolute percentage error, coefficient of determination | Single monsoon dataset was collected | Singha et al. (2021) |
| Mamundiyar Basin, India | Artificial neural network | Mean square error | Only single algorithm is used | Dar et al. (2012) |
| Bankura, Purulia, Paschim Medinipur, West Bengal, India | Random forest model | Accuracy, sensitivity, specificity, ROC (AUC) curve | Random forest is used | Aind et al. (2022) |
| India | Random forest model, multivariate logistic regression | Sensitivity, specificity, ROC (AUC) curve | Only two algorithms are used | Podgorski et al. (2018) |
| Khaf, Iran | Artificial neural network | Root mean squared error | Only artificial neural network is used | Mohammadi et al. (2016a) |
| Pakistan | Random forest model | Mean decrease accuracy, mean decrease Gini impurity | Small dataset and only random forest is used | Yuya et al. (2022) |
| Maku, Turkey | Extreme learning machine, multi-layer perceptron, support vector machine | Coefficient of determination, root mean squared error, mean absolute bias error, Nash–Sutcliffe efficiency coefficient | Only 143 water samples were used | Barzegar et al. (2017) |
| Southeast Anatolia Region, Turkey | LR-KNN-ANN (hybrid) and SVM | Correlation coefficient values of the machine learning models | Only 252 samples were used | Ataş et al. (2021) |
| Western United States | Random forest algorithm | R-squared and root mean squared error | Only random forest algorithm used | Celia et al. (2022) |
| All over World | Random forest algorithm | Sensitivity, specificity, balanced accuracy, AUC curve | Only random forest algorithm used | Podgorski & Berg (2022) |

The following research gaps were identified during the literature survey:

  • (1) Firstly, the traditional chemical analysis of groundwater is unavoidable, but it is time-consuming. Repeated chemical analysis incurs high costs, as digging wells involves labor, equipment, etc. Given the abundance of existing chemical analysis data and the desire to reduce the expense of repetitive chemical testing, machine learning offers a promising avenue for exploration. Machine learning techniques can predict fluoride in groundwater efficiently and in a short span of time (Table 2). The models are trained on the data generated by the chemical analysis of water, which helps in predicting fluoride in groundwater.

  • (2) Secondly, the major source of fluoride contamination is a geogenic source which involves the weathering of rocks under the earth's crust. However, limited studies have analyzed geogenic sources of fluoride in groundwater individually (Kumar et al. 2016).

  • (3) Furthermore, it is worth noting that some of the most heavily fluoride-affected regions in India including Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal have yet to be examined using machine learning methodologies. This presents an intriguing opportunity to apply advanced analytical techniques to better understand and address the complexities of fluoride contamination in these areas, in order to uncover new insights and develop targeted mitigation strategies for improved public health outcomes.

The study particularly focuses on predicting fluoride contamination due to geogenic sources in groundwater using supervised machine learning algorithms in India. In addition to this, this study also evaluates and compares different supervised machine learning algorithms and focuses on analyzing the fluoride concentrations in five major affected states of India: Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal.

This section presents the detailed methodology adopted to analyze the fluoride concentrations in the five most affected states of India. It involves several steps: understanding the geological and hydrological settings, data collection, feature engineering, modeling and analysis of the output. The process begins with the collection of raw data from the CGWB website in MS Excel format. The raw data were preprocessed by removing irrelevant data and handling missing values (Figure 2). The next step is data exploration via visualization to understand the data, and the relationships between features are examined using a correlation matrix. The important features were extracted using mean decrease in impurity (MDI), as shown in Figure 2.
Figure 2

Steps involved in methodology for machine learning-based prediction models.

The World Health Organization states that fluoride is safe for human consumption up to a threshold of 1 mg/L and is harmful above this limit. For this reason, fluoride is divided into two classes, class 0 and class 1, for binary classification. If the fluoride concentration is less than or equal to 1 mg/L, the sample falls under class 0, and if it exceeds 1 mg/L, it falls under class 1 (Table 1). The fluoride (F) attribute is therefore discretized using Equation (1):

$$
F_{\text{class}} =
\begin{cases}
0, & \text{if } F \le 1\ \text{mg/L} \\
1, & \text{if } F > 1\ \text{mg/L}
\end{cases}
\tag{1}
$$
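
The discretization rule can be illustrated with a minimal Python sketch. The function name `fluoride_class` and the sample readings are hypothetical and are not taken from the CGWB data; only the 1 mg/L threshold comes from the text.

```python
def fluoride_class(concentration_mg_per_l, threshold=1.0):
    """Return 0 when fluoride <= threshold (mg/L), otherwise 1."""
    return 0 if concentration_mg_per_l <= threshold else 1


# Hypothetical fluoride readings in mg/L (illustrative values only).
readings = [0.2, 1.0, 1.01, 3.4]
labels = [fluoride_class(value) for value in readings]
print(labels)
```

A reading of exactly 1.0 mg/L is assigned to class 0, in line with the "less than or equal to" condition.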

After preprocessing, the next step is to develop the model using supervised machine learning algorithms. The data were divided in an 80:20 ratio for training and testing the model. The training set is used to fit the model using the classification algorithms shown in Figure 2, whereas the test set is used to assess the performance of the model using evaluation parameters such as accuracy, precision, recall and error rate (Figure 2). The detailed steps are as follows.
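
An 80:20 split of this kind might be sketched as follows using only the Python standard library. The function name `split_80_20`, the fixed seed and the toy sample list are assumptions for illustration; the study does not state how its split was implemented.

```python
import random


def split_80_20(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-in for the cleaned samples: 100 row identifiers.
samples = list(range(100))
train_rows, test_rows = split_80_20(samples)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting avoids placing all samples from one state or year in the same partition when the input is ordered.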

Geological and hydrological settings of Indian states

In India, groundwater is used for drinking by 85% of the population (World Bank 2010). The geological and climatic conditions of India make it more vulnerable to fluoride contamination (Kundu et al. 2001; Rao et al. 2006; Singaraja et al. 2014). The map in Figure 3 depicts the areas of India where fluoride exceeds 1.5 mg/L. According to prior studies, Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal are the most vulnerable of all the Indian states (Thivya et al. 2015; Ali et al. 2016; Mondal et al. 2016b). The geological conditions of these regions are mostly responsible for the elevated fluoride levels in groundwater. Andhra Pradesh is largely underlain by Peninsular Gneiss, which is divided into granitoid, newer granites and gneissic rocks. The granite gneiss consists of quartz, feldspar, biotite and iron ore, along with garnetiferous granite gneiss and gray granite (Sriraamadas 1967). The minerals that are the root cause of geogenic fluoride pollution are present in the parent rock of this state (Mukherjee & Singh 2018). Rajasthan is India's largest state by area, making up 10.4% of the country's total land area. Rajasthan is divided into 33 districts, and 18 of them have fluoride-contaminated groundwater, making them prone to fluorosis (Muralidharan et al. 2002). The geology of Rajasthan is composed of metamorphosed rocks, which contain quartzite, mica schist and gneiss (Wasson et al. 1984; Sundaram & Pareek 1995; Yuya et al. 2022); basalt and rhyolite are surrounded by sandstone, limestone, slate, phyllite, schist, marble and dolomite, several of which are carbonate rocks (Deotare et al. 1998). Minerals such as clay, apatite, fluorite, mica and feldspar are also present in Rajasthan. Tamil Nadu, on the other hand, is composed of Archean and Proterozoic rocks. These rocks are made up of a variety of minerals, including biotite, pyrite, pyrrhotite, arsenopyrite, chalcopyrite, quartzite and calc-granulite (Subramani et al. 2005; Manikandan et al. 2012).
Telangana largely comprises the Peninsular Gneiss Complex along with the Gondwana supergroup, Deccan trap and Cuddapah supergroup. This region is made up of minerals including clay and feldspar, as well as limestone, sandstone and coal (Mukherjee & Singh 2018). West Bengal is a densely populated state of India. Although the fluoride concentration in West Bengal as a whole was within the admissible limit (Datta et al. 2014), some districts such as Nadia, 24 Parganas, Purulia and Midnapur were badly affected. Purulia district contains granite gneiss, limestone, hornblende schist, biotite gneiss, pegmatite and quartz vein (Mukherjee & Singh 2018). Apart from these, different rock formations exist in some areas of West Bengal, containing dolerite, hornblende schist, quartzites, phyllites, mica, gabbroic anorthosites, anorthosites, pyroxene granulites, amphibolite, hornblende, fluorite, fluorapatite, biotite, amphibole and feldspathic materials (Chakrabarti & Bhattacharya 2013).
Figure 3

Distribution of fluoride in aquifer systems of India (CGWB, https://cgwb.gov.in/cgwbpnm/public/uploads/documents/1686055710748531399file.pdf).

The underground rock formation of Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal majorly consists of granites and gneisses, limestones, clays, sandstones and coal. Table 3 exhibits the range of fluoride in different types of rocks.

Table 3

Types of rocks and range of fluoride in the rocks (Mukherjee & Singh 2018)

| Type of rocks | Range of fluoride (ppm) |
|---|---|
| Basalt | 20–1,060 |
| Granites and gneisses | 20–2,700 |
| Shales and clays | 10–7,600 |
| Limestones | 0–1,200 |
| Sandstones | 10–880 |
| Coal (ash) | 40–480 |

Data collection and extraction

Data collection and extraction is the first step in the methodology of this study, as shown in Figure 2. The data were collected from the groundwater quality reports (groundwater quality data, 2010–2018) on the CGWB website (https://cgwb.gov.in/index.html). The data contain groundwater analyses from 2010 to 2018 for all the states of India. In total, the dataset contains 85,197 rows. The water samples were taken from dug wells and tube wells. Out of the 85,197 tuples, the five states account for the following: Andhra Pradesh (4,618), Rajasthan (5,426), Tamil Nadu (4,026), Telangana (2,917) and West Bengal (4,032). The data for the five states therefore total 21,019 rows.

Data preprocessing

The data contain 15 variables: pH, electrical conductivity (EC), total hardness (TH), total alkalinity, calcium (Ca), magnesium (Mg), sodium (Na), potassium (K), iron (Fe), carbonate (CO₃²⁻), bicarbonate (HCO₃⁻), chloride (Cl⁻), sulfate (SO₄²⁻), nitrate (NO₃⁻) and fluoride (F⁻). The data were noisy, with missing values, irrelevant observations, outliers and inconsistent formats. In this study, the data were cleaned by removing the irrelevant observations, removing records in inconsistent formats and filling the missing values with the mean, which reduced the data to 8,047 tuples.
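
Mean imputation of a single column can be sketched as below. This is a standard-library illustration under the assumption that missing entries are represented as `None`; the helper name `impute_with_mean` and the pH values are hypothetical.

```python
from statistics import mean


def impute_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if value is None else value for value in column]


# Hypothetical pH column with two missing readings.
ph = [7.0, None, 8.0, 7.5, None]
print(impute_with_mean(ph))
```

The mean is computed from the observed values only, so the imputed entries do not shift the column average.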

According to the guidelines of the Bureau of Indian Standards (BIS; Table 1), the acceptable limit of fluoride in drinking water is 1 mg/L. For this reason, samples in the cleaned data were labeled 0 (zero) when the fluoride concentration was less than or equal to 1 mg/L and 1 (one) when it was greater than 1 mg/L (Equation (1)). The resulting labels are unbalanced: 73% of the samples belong to class 0 and 27% to class 1, as shown in Figure 4. Handling unbalanced classes in binary classification is a common challenge in machine learning. There are two main ways to address it: use resampling techniques, or use algorithms that inherently handle class imbalance well, such as tree-based algorithms. In this study, both approaches are used. Resampling is done via oversampling: as shown in Figure 4, the number of instances in the minority class is increased to balance it with the majority class. K-fold cross-validation, which splits the input data into K folds of equal size, is also used; it is implemented using the scikit-learn machine learning library for Python.
Figure 4

(a) Unbalanced fluoride samples before resampling and (b) fluoride samples after resampling using K-fold cross-validation technique.
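
Random oversampling of the minority class can be sketched as follows. The paper does not state which oversampling variant was used, so simple duplication of randomly chosen minority rows is an assumption here, as are the function name `random_oversample` and the toy data that mimic the 73:27 class ratio.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))  # sample with replacement
            out_y.append(cls)
    return out_x, out_y


# Toy data: 73 samples of class 0 and 27 of class 1.
toy_labels = [0] * 73 + [1] * 27
toy_features = [[float(i)] for i in range(100)]
balanced_x, balanced_y = random_oversample(toy_features, toy_labels)
print(Counter(balanced_y))
```

When oversampling is combined with cross-validation, it is generally safer to resample only the training portion of each fold, so that duplicated rows do not leak into the data used for evaluation.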


Choice of appropriate input

Machine learning models are adversely affected when relevant features are discarded or when unrelated inputs are kept (Gheyas & Smith 2010). To determine the association between the input variables, a correlation matrix is generated.

The correlation matrix between the parameters shown in Figure 5 is calculated using the following formula:

$$
r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}}
$$

where $x_i$ is the values of the x variable in a sample, $\bar{x}$ is the mean of the values of the x variable, $y_i$ is the values of the y variable in a sample and $\bar{y}$ is the mean of the values of the y variable. Correlation is an analytical method that shows how closely two or more variables change with respect to one another. In this study, a correlation matrix is created first, and then the important features are extracted using a feature selection method. MDI is a technique used in tree-based ensemble algorithms such as RF and gradient boosting machines (GBMs) to evaluate the relevance of each feature in predicting the target variable. Using this approach, only five of the 15 input variables are selected. These five features play an important role in predicting fluoride contamination. They are Na, SO₄²⁻, EC, HCO₃⁻ and Cl⁻.
Figure 5

Correlation matrix between different parameters.
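
One entry of such a correlation matrix can be computed with a short standard-library sketch of the Pearson coefficient. The function name `pearson_r` and the sodium and EC values are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sodium (mg/L) and EC (µS/cm) readings.
sodium = [20.0, 45.0, 80.0, 120.0]
ec = [300.0, 520.0, 900.0, 1300.0]
print(round(pearson_r(sodium, ec), 3))
```

Applying the function to every pair of variables yields the full symmetric matrix, with ones on the diagonal.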


Evaluation parameters

The confusion matrix serves as the primary foundation for assessing prediction performance in binary classification. It compares the classes predicted by the model with the actual classes (Bowes et al. 2012). To determine the proportion of samples that were correctly categorized, the predictions were compared with the observed concentrations. The confusion matrix metrics (accuracy, precision, recall and error rate) are used to evaluate the models and are computed as:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
\text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN}
$$

where a True Positive (TP) means the model correctly predicts fluoride; a True Negative (TN) means the model correctly predicts non-fluoride; a False Positive (FP) means the model incorrectly predicts fluoride (the outcome is predicted as fluoride but is actually not) and a False Negative (FN) means the model incorrectly predicts non-fluoride when the sample is actually fluoride.
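
The four metrics can be computed directly from the confusion-matrix counts, as in the following sketch. The helper names and the eight actual and predicted labels are hypothetical; class 1 denotes fluoride above the threshold.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and error rate from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "error_rate": (fp + fn) / total,
    }


# Hypothetical labels for eight test samples.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

Accuracy and error rate always sum to one, so on an unbalanced dataset precision and recall give the more informative picture of how the minority class is handled.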

Supervised machine learning algorithms used for analysis

Logistic regression

Logistic regression (LR) is an effective supervised learning approach that trains quickly on datasets and is well suited to binary classification (Qian et al. 2020). In this study, it is used to predict fluoride content in groundwater. LR is given by:

$$
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
$$

where $\beta_0$ and $\beta_1$ are estimated parameters.
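
The logistic function can be evaluated with a few lines of Python. This sketch assumes a single predictor and uses hypothetical parameter values (`beta0 = -4.0`, `beta1 = 0.01`) that are not estimates from this study.

```python
import math


def logistic_probability(x, beta0, beta1):
    """Probability of class 1 for predictor value x under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


# Hypothetical parameters: the probability of class 1 rises with sodium (mg/L).
beta0, beta1 = -4.0, 0.01
for sodium in (100.0, 400.0, 800.0):
    probability = logistic_probability(sodium, beta0, beta1)
    predicted_class = 1 if probability >= 0.5 else 0
    print(sodium, round(probability, 3), predicted_class)
```

With these values the linear term is zero at 400 mg/L, so the predicted probability there is exactly 0.5, which marks the decision boundary.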

K-nearest neighbor

The K-nearest neighbor (KNN) algorithm is one of the simplest supervised machine learning algorithms. It can be used for both classification and regression but is best suited to classification problems. The algorithm rests on the notion that similar patterns are located close to one another (Cover & Hart 1967). For an unlabeled sample, the classifier finds the K training samples at the minimum distance from it and assigns the class that is most common among them. Closeness is measured by the distance between two samples. Euclidean distance can be used as the distance metric and is well suited to multidimensional data. The Euclidean distance between two points a and b in n-dimensional space is given by:

$$
d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
$$
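
A compact KNN classifier built on the Euclidean distance can be sketched as below. The training points (two hypothetical features per sample) and the choice of K = 3 are illustrative assumptions, not settings reported by the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, (0.5, 0.5)))
print(knn_predict(train_x, train_y, (5.5, 5.5)))
```

Because the distance is scale-dependent, features measured in very different units (for example, pH and EC) are usually standardized before KNN is applied.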

Support vector machine

Support vector machines (SVMs) are another supervised machine learning approach used for classification and regression. Although SVMs were initially meant for classification, they have been generalized to regression problems. SVMs efficiently classify linearly separable data, and they classify non-linearly separable data using kernel functions such as sigmoid, radial or polynomial kernels (Singh et al. 2022). They are advantageous when the data are separable and when the number of dimensions is high relative to the number of samples. SVMs are memory efficient, but training is time-consuming, which makes them impractical for large datasets.

Naïve Bayes

Naïve Bayes (NB) follows Bayes' theorem and can predict accurately under non-linear interdependence between predictors and response, even when the sample size is small. NB is considered appropriate for unconditional variables. It needs only a limited amount of training data to make prompt predictions on test data. In addition, NB strongly assumes that the predictors are uncorrelated and independent of each other. NB is described as:

Let $P(a \mid b)$ = posterior probability of a class given the predictor

$P(a)$ = prior probability of a class

$P(b \mid a)$ = probability of the predictor given class

$P(b)$ = prior probability of the predictor of a class

Then,

$$
P(a \mid b) = \frac{P(b \mid a)\, P(a)}{P(b)}
$$
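
Bayes' theorem can be worked through numerically as follows. The prior of 0.27 reflects the share of class 1 samples reported earlier; the two likelihood values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(a | b) = P(b | a) * P(a) / P(b)."""
    return likelihood * prior / evidence


# Priors from the class shares (27% class 1, 73% class 0).
prior_high, prior_low = 0.27, 0.73
# Hypothetical likelihoods of observing a predictor value b in each class.
likelihood_high, likelihood_low = 0.6, 0.2

# P(b) via the law of total probability over the two classes.
evidence = likelihood_high * prior_high + likelihood_low * prior_low

posterior_high = posterior(prior_high, likelihood_high, evidence)
posterior_low = posterior(prior_low, likelihood_low, evidence)
print(round(posterior_high, 3), round(posterior_low, 3))
```

Observing the predictor value raises the probability of class 1 from the prior of 0.27 to roughly 0.53, and the classifier would assign whichever class has the larger posterior.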

Multi-layer perceptron classifier

The multi-layer perceptron (MLP) algorithm recognizes patterns and objects by imitating the functioning of the human nervous system. MLP is strongly influenced by the arrangement of neurons inside the human brain (Bengio 2009). The MLP classifier is an ANN model that maps sets of input data onto a set of suitable outputs. Data travel through the layers of an MLP in one (forward) direction, from the input to the output. There are several layers of perceptrons, or neurons. Each node in a layer is linked to every node in the next layer (Figure 6).
Figure 6

ANN architecture.
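The one-directional flow of data through fully connected layers can be sketched as a forward pass. The weights, layer sizes and activation choices below are arbitrary illustrative values, not the network used in the study, and training (weight fitting) is left out.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: every input feeds every neuron of the layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Pass the input forward through the layers, from input to output."""
    out = x
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out

# Arbitrary example network: 3 inputs -> 2 hidden neurons -> 1 output probability
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
prob = mlp_forward([1.0, 2.0, 3.0], layers)[0]
print(prob)
```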


Decision tree classifier

The decision tree is a multivariate machine learning technique suitable for classification as well as regression. Decision trees are intuitive (Kucheryavskiy 2018) and straightforward (Géron 2019). The decision tree classifier is a tree-based classification technique that is frequently used to model binary responses. It learns how the independent variables determine a binary response through a sequence of decisions. Decision trees handle non-linear datasets effectively and work well with both numerical and categorical data.
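How a tree chooses a single decision can be illustrated with the Gini impurity, one common splitting criterion. The study does not state which criterion its decision tree used, so this is an assumption; the pH values, labels and helper names are made up.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """Return (threshold, impurity_decrease) of the best 'value <= threshold' split."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = (t, parent - child)
    return best

# Made-up pH readings and fluoride classes (1 = high fluoride)
ph = [7.1, 7.3, 7.6, 8.2, 8.4, 8.7]
high_f = [0, 0, 0, 1, 1, 1]

threshold, gain = best_split(ph, high_f)
print(threshold, gain)
```

A full tree repeats this search recursively on each resulting subset until a stopping rule is met.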

Ensemble techniques

Random forest

RF is an ensemble technique based on classification and regression trees, both of which apply recursive binary splitting to partition the dataset and find the optimal variables (Breiman 2001). A large number of different trees are grown by randomly resampling the original data and randomly selecting variables, which makes the forecasts more dependable. The final forecast is the aggregate of the predictions of the entire tree population (Araya et al. 2022). RF thus builds on the concept of bagging with a further degree of randomization. The RF model also copes well with multi-scaled data, missing data and dichotomous data, is robust to noise and is quick to train, which makes it convenient and uncomplicated to use (Wu et al. 2019).
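The three ingredients described above (bootstrap resampling, random variable selection and aggregation of many trees) can be sketched in a deliberately simplified form. To stay short, each "tree" here is a one-split stump whose threshold is the midpoint between the two class means, which is a simplification and not the CART procedure; all data and names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """One-split classifier on a single feature (simplified stand-in for a full tree)."""
    zeros = [row[feature] for row, label in zip(X, y) if label == 0]
    ones = [row[feature] for row, label in zip(X, y) if label == 1]
    if not zeros or not ones:  # the bootstrap sample happened to contain one class only
        constant = 1 if ones else 0
        return lambda row: constant
    m0, m1 = sum(zeros) / len(zeros), sum(ones) / len(ones)
    threshold = (m0 + m1) / 2.0
    if m1 >= m0:
        return lambda row: 1 if row[feature] > threshold else 0
    return lambda row: 1 if row[feature] < threshold else 0

def fit_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random variable selection
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Aggregate the individual predictions by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Made-up samples: (pH, bicarbonate in mg/L) -> class (1 = high fluoride)
X = [(7.1, 150.0), (7.3, 170.0), (7.2, 160.0), (7.4, 180.0),
     (8.4, 520.0), (8.6, 600.0), (8.5, 560.0), (8.3, 500.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, (8.5, 550.0)), predict(forest, (7.2, 155.0)))
```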

Gradient boosting

Gradient boosting is another ensemble technique used for both classification and regression. It relies on the assumption that weak learners, when combined, can form an effective learning model, with each learner gaining knowledge from the misclassifications of its predecessors.
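The idea of learning from earlier errors can be sketched with a minimal boosting loop in which each weak learner (a one-split stump) is fitted to the residuals of the current ensemble. For brevity this sketch uses squared-error loss on a 0/1 target, whereas a gradient boosting classifier typically uses a log-loss; the data, learning rate and number of rounds are illustrative assumptions.

```python
def fit_stump(x, residuals):
    """Best single-threshold regressor on 1-D x, minimising squared error."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the errors left by the previous ones."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # current ensemble errors
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return base, stumps, pred

def mse(y, pred):
    """Mean squared error between targets and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)

# Made-up one-dimensional data with a 0/1 target
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

base, stumps, pred = boost(x, y)
print(mse(y, [base] * len(y)), mse(y, pred))
```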

Voting classifier hard and voting classifier soft

A voting classifier is a machine learning model that combines a collection of models and forecasts an output class based on their combined predictions. The voting classifier supports two voting methods:

Voting classifier hard: The class that receives the maximum number of votes, i.e., the class forecast by the majority of the individual classifiers, is selected.

Voting classifier soft: The forecast is the class with the highest average of the probabilities assigned to it by the individual classifiers.
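The difference between the two voting methods can be seen in a small sketch. The three probability vectors stand for the outputs of three hypothetical classifiers and are invented so that the two methods disagree.

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Return the class predicted by the largest number of classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(predicted_probas):
    """Average the class probabilities over classifiers and pick the largest.

    predicted_probas: one [P(class 0), P(class 1)] list per classifier.
    Returns (winning_class, averaged_probabilities).
    """
    n_classes = len(predicted_probas[0])
    avg = [sum(p[c] for p in predicted_probas) / len(predicted_probas)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Invented outputs of three classifiers for one sample
probas = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
labels = [max(range(2), key=lambda c: p[c]) for p in probas]  # each model's own label

print(hard_vote(labels))
print(soft_vote(probas))
```

Two of the three models lean weakly towards class 0, so hard voting returns 0, while the third model's confident prediction pulls the averaged probability towards class 1 under soft voting.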

Thorough studies were performed to analyze the data from various perspectives. Data were analyzed using the open-source programming language Python 3.x in a Jupyter notebook. The detailed discussion of the experiments and results is as follows.

Statistical analysis

Interpreting the hydrochemistry of groundwater plays a vital role in understanding its chemical composition in different areas. Fluoride concentrations in groundwater range between 0 and 98 mg/L, with an average of 0.99 mg/L (Table 4).

Table 4

Statistical analysis of physicochemical parameters for groundwater samples of the study area

| Variables | pH | EC | TH | Ca2+ | Mg2+ | TA | Na+ | K+ | HCO3 | Cl | SO42− | NO3 | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Min −48.6 | | | | | | | | | | | | | |
| Max | 9.66 | 48,500 | 9,445 | 2,351 | 2,722 | 7,909 | 9,750 | 779 | 9,650 | 20,000 | 8,862 | 4,405 | 98 |
| Mean | 7.81 | 2,041 | 439.55 | 80.42 | 66.85 | 280.76 | 275.89 | 18.54 | 383.73 | 363.47 | 164.54 | 65.19 | 0.99 |
| SD | 0.43 | 2,241.7 | 416.5 | 82.32 | 124.53 | 226.01 | 421.97 | 47.17 | 248.84 | 651.19 | 290.37 | 132.52 | 1.60 |

The major factor influencing fluoride in our study area is HCO3, whose concentration ranges between 0 and 9,650 mg/L with an average of 383.73 mg/L. Other factors that contribute to fluoride contamination are SO42−, which ranges between 0 and 8,862 mg/L with an average of 164.54 mg/L, and Cl, which ranges between 0 and 20,000 mg/L with an average of 363.47 mg/L. The average values of Mg2+, Na+ and EC are 66.85 mg/L, 275.89 mg/L and 2,041.00 μS/cm, respectively (Table 4).

The Bureau of Indian Standards (BIS 2012) and WHO have issued drinking water guidelines for different parameters in the form of acceptable and permissible limits. BIS advises adhering to the acceptable limit whenever possible and to the permissible limit only when no alternative water source is available. The pH of drinking water should lie between 6.5 and 8.5, and the EC should not exceed 400 μS/cm (WHO). For both TH and TA, expressed as calcium carbonate, the acceptable limit is 200 mg/L and the permissible limit is 600 mg/L. The acceptable limits of calcium, chloride and magnesium are 75, 250 and 30 mg/L, whereas the permissible limits are 200, 1,000 and 100 mg/L, respectively. Sulfate has an acceptable limit of 200 mg/L and a permissible limit of 400 mg/L. Nitrate has an acceptable limit of 45 mg/L, with no relaxation for the permissible limit. Although BIS has not defined acceptable or permissible limits for sodium and potassium, WHO (1996) states that sodium concentrations above 200 mg/L can affect the taste of drinking water, whereas potassium intake should be 4.7 g/day in adults between 19 and 70 years of age (WHO 2009).

Pearson correlation between the parameters

Meticulous experiments were performed to evaluate the performance of 10 machine learning algorithms to predict the fluoride concentrations in five states of India. As the performance of machine learning models is highly affected by the input features in terms of accuracy as well as computational complexity, the first study was done to identify the important features to be considered for analysis.

Figure 5 displays the correlation between the parameters; correlation values lie between −1 and +1. Among the parameters used in this study, fluoride has a positive correlation with pH (0.12), EC (0.18), TA (0.20), sodium (0.23), bicarbonate (0.25), sulfate (0.16) and chloride (0.12). Several studies indicate that high pH and high concentrations of bicarbonate ions (HCO3) together with sodium ions (Na+) could be the dominant cause of fluoride in groundwater (Saxena & Ahmed 2001; Guo et al. 2007; Dey et al. 2012). Also, higher levels of bicarbonate and hydroxide ions increase the alkalinity of groundwater. An increase in alkalinity results in the displacement of fluoride ions from fluoride-rich minerals like muscovite, biotite and amphibole (Guo et al. 2007). Fluoride has a weak negative correlation with potassium (−0.06) and carbonate (−0.03) and a weak positive correlation with magnesium (0.05) and nitrate (0.06).
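For reference, the Pearson coefficient behind such a correlation matrix can be computed as in the following sketch. The input lists are invented and the helper name `pearson_r` is hypothetical; the coefficients quoted above come from the study's data, not from this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented values for two parameters measured on the same five samples
bicarbonate = [150.0, 220.0, 310.0, 400.0, 520.0]
fluoride = [0.4, 0.9, 0.7, 1.6, 1.3]

print(round(pearson_r(bicarbonate, fluoride), 3))
```

A value near +1 or −1 indicates a strong linear relationship, whereas values near 0, like most of those reported here, indicate a weak one.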

Determining dominant factors accountable for fluoride motility

The association between fluoride and the different parameters was identified using the mean decrease in impurity (MDI). Determining the dominant features is important because it improves the overall forecasting of the models by removing factors that have a negative impact. MDI is implemented using the sklearn library in Python. Factors such as EC, sulfate, bicarbonate, sodium, magnesium and chloride have the highest MDI values, as shown in Figure 7.
Figure 7

Mean decrease in impurity.

pH does not emerge as an important feature but is still considered in this study because pH is directly related to the weathering of rocks, which is one of the causes of fluoride pollution in groundwater. Therefore, this study uses pH, EC, total alkalinity, sodium, bicarbonate, chloride and sulfate as input parameters in forecasting fluoride. The scatter plots in Figure 8 visualize the relationship between fluoride and the individual variables.
Figure 8

Scatter plot of fluoride with pH, electrical conductivity (EC), sulfate (SO42−), bicarbonate (HCO3), sodium (Na), magnesium (Mg) and chloride (Cl). (a) Fluoride vs. pH, (b) fluoride vs. EC, (c) fluoride vs. bicarbonate, (d) fluoride vs. sulfate, (e) fluoride vs. sodium, (f) fluoride vs. magnesium and (g) fluoride vs. chloride.


The features that are positively correlated with fluoride and the features extracted using MDI are the same. Both measures are used to validate the resultant features.
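One common way to obtain MDI scores with sklearn is the `feature_importances_` attribute of a fitted tree ensemble, which holds impurity-based importances normalized to sum to 1. The sketch below assumes that route and uses synthetic data with one informative and one noise feature; the study's actual estimator, settings and data are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the class depends only on the first feature
rng = np.random.default_rng(0)
n = 500
informative = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based (MDI) importance of each feature
importances = model.feature_importances_
for name, score in zip(["informative", "noise"], importances):
    print(f"{name}: {score:.3f}")
```

Features with larger scores reduce node impurity more, on average, across the trees of the forest.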

Analysis of results

The present study uses binary classification by dividing fluoride levels into two categories, 0 and 1. A confusion matrix was generated for each model to analyze its predictions in terms of true positives, false positives, true negatives and false negatives. To perform the experiments, the data were divided into an 80% training set and a 20% test set. Performance was assessed for the five Indian states using the supervised machine learning algorithms KNN, LR, RF, support vector classifier (SVC), Gaussian NB, MLP classifier, decision tree classifier, gradient boosting classifier, voting classifier soft and voting classifier hard. The results obtained after training and testing the models are presented in Table 5 and Figure 9. Models with high accuracy, precision and recall and a low error rate are considered the best.

The results show that RF gives an accuracy of 75.29% and a precision of 65.96%, the highest among all the models. Accuracy is the proportion of correct predictions out of all predictions, whereas precision measures the quality of the positive predictions: it increases as the share of true positives among the positive predictions increases. Both accuracy and precision are used to evaluate the classification models. RF also gives a recall of 61.63%, again the highest among all the models. Recall is the proportion of actual high-fluoride samples that are predicted as positive; it is not concerned with the negative predictions. The error rate of RF is the lowest at 24.71%, meaning that 24.71% of all its predictions are incorrect. After RF, voting classifier hard gives an accuracy of 71.79% and a precision of 60.26%, the second best among all the classification models, with an error rate of 28.21%.
Table 5

Accuracy, precision, recall and error rate generated by different supervised learning models

| Model | Accuracy (%) | Precision (%) | Recall (%) | Error rate (%) |
| --- | --- | --- | --- | --- |
| K-nearest neighbor | 66.25 | 50.63 | 57.67 | 33.75 |
| Logistic regression | 70.54 | 56.88 | 57.68 | 29.46 |
| Random forest | 75.29 | 65.96 | 61.63 | 24.71 |
| Support vector classifier | 70.59 | 56.95 | 57.79 | 29.41 |
| Gaussian NB | 67.67 | 54.82 | 41.32 | 32.33 |
| MLP classifier | 60.75 | 46.05 | 59.40 | 39.25 |
| Decision tree classifier | 67.27 | 52.21 | 53.57 | 32.73 |
| Gradient boosting classifier | 69.90 | 55.12 | 60.21 | 30.10 |
| Voting classifier soft | 70.59 | 59.51 | 51.14 | 29.41 |
| Voting classifier hard | 71.79 | 60.26 | 56.02 | 28.21 |

Figure 9

Accuracy, precision, recall and error rate obtained for different supervised machine learning algorithms.


SVC and voting classifier soft have the same accuracy, 70.59%, but SVC has a precision of 56.95% with a recall of 57.79%, whereas voting classifier soft has a precision of 59.51% with a recall of 51.14%. Thus, both models achieved the same accuracy, but in the present study voting classifier soft is more precise than SVC, while SVC has the higher recall.
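The four measures reported in Table 5 follow directly from the confusion-matrix counts, as the sketch below shows. The counts are hypothetical and chosen only to give round numbers; they are not the study's confusion matrices.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and error rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,         # correct predictions / all predictions
        "precision": tp / (tp + fp),  # true positives / predicted positives
        "recall": tp / (tp + fn),     # true positives / actual positives
        "error_rate": 1.0 - accuracy, # incorrect predictions / all predictions
    }

# Hypothetical counts for a test set of 280 samples
m = classification_metrics(tp=60, fp=30, tn=150, fn=40)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because the error rate is the complement of accuracy, two models with equal accuracy, such as SVC and voting classifier soft, can still differ in precision and recall.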

The different models implemented in this study to predict groundwater fluoride give satisfactory accuracy and precision. The models reach a reasonable level of accuracy because the pertinent parameters were chosen with the help of the correlation matrix and feature selection. The effectiveness of the models is further assessed using the receiver operating characteristic (ROC) curve and the area under it (AUC), a crucial method for assessing a classification model. The ROC curve is a probability curve that plots the True Positive Rate against the False Positive Rate at different threshold levels and indicates how well a model distinguishes between classes. A higher AUC indicates that the model is better at predicting class 0 as 0 and class 1 as 1. The curves shown in Figure 10 indicate that the RF model has the highest AUC value, i.e., 68%. This means that, among the models compared, the RF model is the best at distinguishing low fluoride areas from high fluoride areas.
Figure 10

ROC (AUC) curve.
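As a sketch of how such a curve and its area can be obtained, the following code sweeps the decision threshold over the predicted scores, records the False Positive Rate and True Positive Rate at each step, and integrates with the trapezoidal rule. The labels and scores are a small invented example, not the study's predictions.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Invented labels (1 = high fluoride) and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.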


The present study evaluated and compared different supervised machine learning models to forecast fluoride accumulation in groundwater in five major states of India. The models are evaluated and compared on the basis of accuracy, precision, recall and error rate. The results from the study indicated that:

  • The features that have a positive correlation with fluoride and the dominant features extracted using MDI (EC, sulfate, bicarbonate, sodium, magnesium and chloride) are the same. The model performance obtained using six input variables differs negligibly from that obtained using all 15 input variables. So, it is advisable to use only the relevant features to train and test the model.

  • The different supervised machine learning algorithms are compared in this study by considering only the geogenic parameters of fluoride contamination.

  • Out of all algorithms, the RF model gives an accuracy of 75.8% in predicting fluoride concentration in groundwater.

  • The error rate, i.e., the proportion of incorrect predictions out of total predictions, is lowest for RF among all the models.

The results generated in this study suggest that machine learning algorithms have proven efficient in forecasting fluoride contamination in groundwater. However, this study considers only natural sources of fluoride in groundwater. This study can be further extended by including different parameters like evapotranspiration, precipitation, soil parameters, etc.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Adimalla N., Marsetty S. K. & Xu P. 2019 Assessing groundwater quality and health risks of fluoride pollution in the Shasler Vagu (SV) watershed of Nalgonda, India. Human and Ecological Risk Assessment: An International Journal 26 (6). doi:10.1080/10807039.2019.1594154.
Ali S., Thakur S. K., Sarkar A. & Shekhar S. 2016 Worldwide contamination of water by fluoride. Environmental Chemistry Letters 14, 291–315. https://doi.org/10.1007/s10311-016-0563-5.
Araya O., Wittwer F. & Villa A. 1993 Evolution of fluoride concentration in cattle and grass following a volcanic eruption. Veterinary and Human Toxicology 35, 437–440. https://doi.org/10.1136/vr.126.26.641.
Araya D., Podgorski J., Kumi M., Mainoo P. A. & Berg M. 2022 Fluoride contamination of groundwater resources in Ghana: Country-wide hazard modeling and estimated population at risk. Water Research 212, 118083. ISSN 0043-1354. https://doi.org/10.1016/j.watres.2022.118083.
Ataş M., Yesilnacar M. & Yetis A. 2021 Novel machine learning techniques based hybrid models (LR-KNN-ANN and SVM) in prediction of dental fluorosis in groundwater. Environmental Geochemistry and Health 44, 3891–3905. https://doi.org/10.1007/s10653-021-01148-x.
Banerjee A. 2015 Groundwater fluoride contamination: A reappraisal. Geoscience Frontiers 6, 277–284. https://doi.org/10.1016/j.gsf.2014.03.003.
Barzegar R., Asghari Moghaddam A., Adamowski J. & Elham F. 2017 Comparison of machine learning models for predicting fluoride contamination in groundwater. Stochastic Environmental Research and Risk Assessment 31, 2705–2718. https://doi.org/10.1007/s00477-016-1338-z.
Bengio Y. 2009 Learning deep architectures for AI. Foundations and Trends in Machine Learning 2 (1), 1–127. http://dx.doi.org/10.1561/2200000006.
Bhagure G. R. & Mirgane S. R. 2011 Heavy metal concentrations in groundwaters and soils of Thane region of Maharashtra, India. Environmental Monitoring and Assessment 173, 643–652. https://doi.org/10.1007/s10661-010-1412-9.
Bowes D., Hall T. & Gray D. 2012 Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix. In: ACM digital library PROMISE '12: 8th International Conference on Predictive Models in Software Engineering, September 21–22, 2012. Lund, Sweden, pp. 109–118. https://doi.org/10.1145/2365324.2365338.
Breiman L. 2001 Random forests. Machine Learning 45, 5–32. https://doi.org/10.1201/9780429469275-8.
Brindha K. & Elango L. 2011 Fluoride in groundwater: Causes, implications and mitigation measures. In: Fluoride: Properties, Applications and Environmental Management, 1st edn. (Monroy, S. D., ed.). Nova Publishers, New York, pp. 111–136.
Bureau of Indian Standards (BIS) 2012 Indian Standards Institution – Indian Standard Specification for Drinking Water. Bureau of Indian Standards, New Delhi. IS 10500:2012.
Celia R. Z., Kenneth B., Katherine R. M., Paul S. E. & Peter M. B. 2022 Predicting regional fluoride concentrations at public and domestic supply depths in basin-fill aquifers of the western United States using a random forest model. Science of the Total Environment 806 (Part 4), 150960. ISSN 0048-9697. https://doi.org/10.1016/j.scitotenv.2021.150960.
Central Ground Water Board (CGWB). Ministry of Jal Shakti, Department of Water Resources, River Development and Ganga Rejuvenation, Government of India. Available from: http://cgwb.gov.in/wqreports.html.
Chae G. T., Yun S. T., Mayer B., Kim K. H., Kim S. Y., Kwon J. S., Kim K. & Koh Y. K. 2007 Fluorine geochemistry in bedrock groundwater of South Korea. Science of the Total Environment 385, 272–283. https://doi.org/10.1016/j.scitotenv.2007.06.038.
Chakrabarti S. & Bhattacharya H. N. 2013 Inferring the hydro-geochemistry of fluoride contamination in Bankura district, West Bengal: A case study. Journal Geological Society of India 82, 379–391. https://doi.org/10.1007/s12594-013-0165-9.
Chicas D. S., Omine K., Prabhakaran M., Sunitha G. T. & Sivasankar V. 2022 High fluoride in groundwater and associated non-carcinogenic risks at Tiruvannamalai region in Tamil Nadu, India. Ecotoxicology and Environmental Safety 233, 113335. ISSN 0147-6513. https://doi.org/10.1016/j.ecoenv.2022.113335.
Chouhan S. & Flora S. J. S. 2010 Arsenic and fluoride: Two major ground water pollutants. Indian Journal of Experimental Biology 48, 666–678.
Cover T. M. & Hart P. E. 1967 Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13 (1), 21–27.
Dar I. A., Sankar K., Dar M. A. & Majumder M. 2012 Fluoride contamination – Artificial neural network modeling and inverse distance weighting approach. Journal of Water Science 25 (2), 165–182.
Datta A. S., Chakrabortty A., De Dalal S. S. & Lahiri S. C. 2014 Fluoride contamination of underground water in West Bengal, India. Fluoride 47, 241–248.
De A., Mridha D., Joardar M., Das A., Chowdhury N. R. & Roychowdhury T. 2022 Distribution, prevalence and health risk assessment of fluoride and arsenic in groundwater from lower Gangetic plain in West Bengal, India. Groundwater for Sustainable Development 16, 100722. ISSN 2352-801X. https://doi.org/10.1016/j.gsd.2021.100722.
DECLG (Department of the Environment, Community and Local Government) 2014 European Union (Drinking Water) Regulations. S.I. 122. Stationery Office, Dublin.
Deotare B. C., Kajale M. D., Kshirsagar A. A. & Rajaguru S. N. 1998 Geoarcheological and palaeoenvironmental studies around Bap-Malar playa, district Jodhpur, Rajasthan. Current Science 3, 316–320.
Dey R. K., Swain S. K., Mishra S., Sharma P., Patnaik T., Singh V. K., Dehury B. N., Jha U. & Patel R. K. 2012 Hydrogeochemical processes controlling the high fluoride concentration in groundwater: A case study at the Boden block area, Orissa, India. Environmental Monitoring and Assessment 184, 3279–3291. https://doi.org/10.1007/s10661-011-2188-2.
ECOREA 2013 Environmental Review. Ministry of Environment Republic of Korea, Sejong, South Korea.
Edmunds W. M. & Smedley P. L. 2005 Fluoride in natural waters. In: Essentials of Medical Geology (Selinus O., ed.). Springer, Dordrecht, pp. 311–336.
ESD 2004 National Standard for Drinking Water Quality. ESD, Ministry of Health, Putra Jaya, Malaysia.
Faure G. 1991 Principles and Applications of Inorganic Geochemistry. Macmillan Publ. Co, New York, NY, p. 626.
Géron A. 2019 Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Sebastopol, CA, USA.
Gheyas I. A. & Smith L. S. 2010 Feature subset selection in large dimensionality domains. Pattern Recognition 43, 5–13.
Ghosh A., Mukherjee K., Ghosh S. K. & Saha B. 2013 Sources and toxicity of fluoride in the environment. Research on Chemical Intermediates 39, 2881–2915. https://doi.org/10.1007/s11164-012-0841-1.
Guo F., Jiang G. & Yuan D. 2007 Major ions in typical subterranean rivers and their anthropogenic impacts in southwest karst areas, China. Environmental Geology 53, 533–541. https://doi.org/10.1007/s00254-007-0665-2.
Gupta S. K., Deshpande R. D., Agarwal M. & Raval B. R. 2005 Origin of high fluoride in groundwater in the North Gujarat-Cambay region, India. Hydrogeology Journal 13, 596–605. https://doi.org/10.1007/s10040-004-0389-2.
Hailong C., Xianjun X., Yanxin W. & Hongxing L. 2022 Predicting geogenic groundwater fluoride contamination throughout China. Journal of Environmental Sciences 115, 140–148. ISSN 1001-0742. https://doi.org/10.1016/j.jes.2021.07.005.
Handa B. K. 1975 Geochemistry and genesis of fluoride containing ground waters in India. Groundwater 13, 275–281. https://doi.org/10.1111/j.1745-6584.1975.tb03086.x.
Health Canada 2010 Guidelines for Canadian Drinking Water Quality: Guideline Technical Document – Fluoride. Water, Air and Climate Change Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario.
Hem J. D. 1985 Study and Interpretation of the Chemical Characteristics of Natural Water, 3rd edn. U.S Geological Survey Water-Supply Paper 2254. U.S. Geological Survey, Alexandria, p. 263.
Houéménou H., Tweed S., Dobigny G., Mama D., Alassane A., Silmer R., Babic M., Ruy S., Chaigneau A., Gauthier P., Socohou A., Dossou H.-J., Badou S. & Leblanc M. 2020 Degradation of groundwater quality in expanding cities in West Africa. A case study of the unregulated shallow aquifer in Cotonou. Journal of Hydrology 582, 124438. https://doi.org/10.1016/j.jhydrol.2019.124438.
Khosravi K., Barzegar R., Miraki S., Adamowski J., Daggupati P., Alizadeh M. R., Pham B. T. & Alami M. T. 2020 Stochastic modeling of groundwater fluoride contamination: Introducing lazy learners. Groundwater 58, 723–734. https://doi.org/10.1111/gwat.12963.
Kucheryavskiy S. 2018 Analysis of NIR spectroscopic data using decision trees and their ensembles. Journal of Analysis and Testing 2 (3), 274–289.
Kumar M., Das A., Das N., Goswami R. & Singh U. K. 2016 Co-occurrence perspective of arsenic and fluoride in the groundwater of Diphu, Assam, Northeastern India. Chemosphere 150, 227–238. https://doi.org/10.1016/j.chemosphere.2016.02.019.
Kundu N., Panigrahi M. K., Tripathy S., Munshi S., Powell M. A. & Hart B. R. 2001 Geochemical appraisal of fluoride contamination of groundwater in the Nayagarh district of Orissa, India. Environmental Geology 41, 451–460. https://doi.org/10.1007/s002540100414.
Mamatha P. & Rao M. S. 2010 Geochemistry of fluoride rich groundwater in Kolar and Tumkur districts of Karnataka. Environmental Earth Sciences 61 (1), 131–142. doi:10.1007/s12665-009-0331-y.
Manikandan S., Chidambaram S., Ramanathan A. L., Prasanna M. V., Karmegam U., Singaraja C., Parama P. & Jainab I. 2012 A study on the high fluoride concentration in the magnesium rich waters of hard rock aquifer in Krishnagiri district, Tamilnadu, India. Arabian Journal of Geosciences 7, 273–285. https://doi.org/10.1007/s12517-012-752-x.
MH 2008 Drinking Water Standard for New Zealand 2005 (Rev. Ed. 2008). Ministry of Health, Government of New Zealand Wellington, New Zealand.
MHLW 2010 Drinking Water Quality Standards in Japan. (Annual Health, Labour and Welfare Report 2010–2011). Annual Health, Labour and Welfare of Japan. Available at: https://www.mhlw.go.jp/english/wp/wp-hw5/index.html.
Mohammadi A. A., Ghaderpoori M., Yousefi M., Rahmatipoor M. & Javan S. 2016a Prediction and modeling of fluoride concentrations in groundwater resources using an artificial neural network: A case study in Khaf. Environmental Health Engineering and Management Journal 3, 217–224. doi:10.15171/EHEM.2016.23.
Mondal D., Gupta S., Reddy D. V. & Dutta G. 2016b Fluoride enrichment in an alluvial aquifer with its subsequent effect on human health in Birbhum district, West Bengal, India. Chemosphere. https://doi.org/10.1016/j.chemosphere.2016.10.130.
Mukherjee I. & Singh U. K. 2018 Groundwater fluoride contamination, probable release, and containment mechanisms: A review on Indian context. Environmental Geochemistry and Health 40, 2259–2301. https://doi.org/10.1007/s10653-018-0096-x.
Muralidharan D., Nair A. P. & Sathyanarayana U. 2002 Fluoride in shallow aquifers in Rajgarh Tehsil of Churu district, Rajasthan – An arid environment. Current Science 83, 699–702.
Nafouanti M. B., Li J., Mustapha N. A., Uwamungu P. & AL-Alimi D. 2021 Prediction on the fluoride contamination in groundwater at the Datong Basin, Northern China: Comparison of random forest, logistic regression and artificial neural network. Applied Geochemistry 132, 105054. ISSN 0883-2927. https://doi.org/10.1016/j.apgeochem.2021.105054.
Narsimha A. & Sudarshan V. 2017 Assessment of fluoride contamination in groundwater from Basara, Adilabad district, Telangana State, India. Applied Water Science 7 (6), 2717–2725. doi:10.1007/s13201-016-0489-x.
NEA Singapore 2008 Environmental Public Health (Quantity of Piped Drinking Water) Regulation 2008. National Environment Agency of Singapore.
NEIA 2018 Compliance with drinking water quality standards in Northern Ireland, 2017. Northern Ireland Environment Agency, Antrim, Northern Ireland.
NHMRC (National Health and Medical Research Council) & NRMMC (National Resource Management Ministerial Council) 2011 Australian Drinking Water Guidelines Paper 6. National Water Quality Management Strategy Commonwealth of Australia, Canberra.
Ongley E. D. 2000 Water quality management: Design, financing and sustainability considerations-II. In: Proceedings of the African Water Resources Policy Conference, Nairobi, May 26–28, 1999. The World Bank, Washington, DC, pp. 1–16.
Ozsvath D. L. 2009 Fluoride and environmental health: A review. Reviews in Environmental Science & Biotechnology 8, 59–79. https://doi.org/10.1007/s11157-008-9136-9.
Podgorski J. & Berg M. 2022 Global analysis and prediction of fluoride in groundwater. Nature Communications 13, 4232. https://doi.org/10.1038/s41467-022-31940-x.
Podgorski J. E., Labhasetwar P., Saha D. & Berg M. 2018 Prediction modeling and mapping of groundwater fluoride contamination throughout India. Environmental Science and Technology 52, 9889–9898. https://doi.org/10.1021/acs.est.8b01679.
Qian L., Zhang R., Bai C., Wang Y. & Wang H. 2020. https://doi.org/10.5194/nhess-2018-56.
Raj D. & Shaji E. 2016 Fluoride contamination in groundwater resources of Alleppey, southern India. Geoscience Frontiers. https://doi.org/10.1016/j.gsf.2016.01.002.
Raju N. J., Dey S. & Das K. 2009 Fluoride contamination in groundwater of Sonbhadra district, Uttar Pradesh, India. Current Science 96 (7), 979–985.
Rao Subba N., Ravindra B. & Wu J. 2020 Geochemical and health risk evaluation of fluoride rich groundwater in Sattenapalle Region, Guntur district, Andhra Pradesh, India. Human and Ecological Risk Assessment: An International Journal 26 (9), 2316–2348. doi:10.1080/10807039.2020.1741338.
Rawat M., Singh U. K. & Subramanian V. 2010 Movement of toxic metals from small-scale industrial areas: A case study from Delhi, India. International Journal of Environment and Waste Management 4, 224–236.
Rodell M., Velicogna I. & Famiglietti J. S. 2009 Satellite-based estimates of groundwater depletion in India. Nature 460, 999.
Salve P. R., Maurya A., Kumbhare P. S., Ramteke D. S. & Wate S. R. 2008 Assessment of groundwater quality with respect to fluoride. Bulletin of Environment Contamination and Toxicology 81, 289–293. https://doi.org/10.1007/s00128-008-9466-x.
Saxena V. K. & Ahmed S. 2001 Dissolution of fluoride in groundwater: A water–rock interaction study. Environmental Geology 40, 1084–1087. https://doi.org/10.1007/s002540100290.
Saxena V. K. & Ahmed S. 2003 Inferring the chemical parameters for the dissolution of fluoride in groundwater. Environmental Geology 43, 731–736. https://doi.org/10.1007/s00254-002-0672-2.
Singaraja C., Chidambaram S., Anandhan P., Prasanna M. V., Thivya C., Thilagavathi R. & Sarathidasan J. 2014 Geochemical evaluation of fluoride contamination of groundwater in the Thoothukudi district of Tamilnadu, India. Applied Water Science 4, 241–250. https://doi.org/10.1007/s13201-014-0157-y.
Singh U. K. & Kumar B. 2017 Pathways of heavy metals contamination and associated human health risk in Ajay River basin, India. Chemosphere 174, 183–199. https://doi.org/10.1016/j.chemosphere.2017.01.103.
Singh U. K., Kumar M., Chauhan R., Jha P. K., Ramanathan A. L. & Subramanian V. 2008 Assessment of the impact of landfill on groundwater quality: A case study of the Pirana site in western India. Environmental Monitoring and Assessment 141, 309–321. https://doi.org/10.1007/s10661-007-9897-6.
Singha
S.
,
Pasupuleti
S.
,
Singha
S. S.
,
Singh
R.
&
Kumar
S.
2021
Prediction of groundwater quality using efficient machine learning technique
.
Chemosphere
276
,
130265
.
ISSN 0045-6535. https://doi.org/10.1016/j.chemosphere.2021.130265
.
Singh
S. K.
,
Taylor
R. W.
,
Pradhan
B.
,
Shirzadi
A.
&
Pham
B. T.
2022
Predicting sustainable arsenic mitigation using machine learning techniques
.
Ecotoxicology and Environmental Safety
232
,
113271
.
ISSN 0147-6513. https://doi.org/10.1016/j.ecoenv.2022.113271
.
Sriraamadas
A.
1967
Geology of Eastern Ghats in Andhra Pradesh
.
Proceedings of the Indian Academy of Sciences – Section B
66
,
200
205
.
Subramani T., Elango L. & Damodarasamy S. R. 2005 Groundwater quality and its suitability for drinking and agricultural use in Chithar river basin, Tamil Nadu, India. Environmental Geology 47, 1099–1110. https://doi.org/10.1007/s00254-005-243-0.
Sundaram R. M. & Pareek S. 1995 Quaternary facies and paleoenvironment in north and east of Sambhar Lake, Rajasthan. Journal of the Geological Society of India 46, 385–392.
Sutradhar S. & Mondal P. 2021 Groundwater suitability assessment based on water quality index and hydrochemical characterization of Suri Sadar Sub-division, West Bengal. Ecological Informatics 64, 101335. https://doi.org/10.1016/j.ecoinf.2021.101335.
Tavener S. J. & Clark J. H. 2006 Fluorine: Friend or foe? A green chemist's perspective. In: Fluorine and the Environment: Agrochemicals, Archaeology, Green Chemistry and Water (Chapter 5) (Tressaud A., ed.). Elsevier, Amsterdam, pp. 177–202.
Tebutt T. H. Y. 1983 Principles of Water Quality Control. Pergamon, England, p. 235.
Thivya C., Chidambaram S., Rao M. S., Thilagavathi R., Prasanna M. V. & Manikandan S. 2015 Assessment of fluoride contaminations in groundwater of hard rock aquifers in Madurai district, Tamil Nadu (India). Applied Water Science 7, 1011–1023. https://doi.org/10.1007/s13201-015-0312-0.
Tiyasha, Tung T. M. & Yaseen Z. M. 2020 A survey on river water quality modelling using artificial intelligence models: 2000–2020. Journal of Hydrology 585, 124670. https://doi.org/10.1016/j.jhydrol.2020.124670.
UNICEF 1999 State of the Art Report on the Extent of Fluoride in Drinking Water and the Resulting Endemicity in India. Report by Fluorosis and Rural Development Foundation for UNICEF, New Delhi.
USEPA (U.S. Environmental Protection Agency) 2011 Edition of the Drinking Water Standards and Health Advisories. EPA 820-R-11-002. United States Environmental Protection Agency, Washington, DC.
Vithanage M. & Bhattacharya P. 2015 Fluoride in the environment: Sources, distribution and defluoridation. Environmental Chemistry Letters 13 (2), 131–147. https://doi.org/10.1007/s10311-015-0496-4.
Wasson R. J., Smith G. I. & Agrawal D. P. 1984 Late Quaternary sediments, minerals, and inferred geochemical history of Didwana Lake, Thar Desert, India. Palaeogeography, Palaeoclimatology, Palaeoecology 46, 345–372. https://doi.org/10.1016/0031-0182(84)90006-3.
World Bank 2010 Deep Wells and Prudence: Towards Pragmatic Action for Addressing Groundwater Overexploitation in India. World Bank Group, Washington, DC. Available from: http://documents.worldbank.org/curated/en/272661468267911138/Deep-wells-and-prudence-towards-pragmatic-action-for-addressing-groundwater-overexploitation-in-India.
World Health Organization (WHO) 1994 Fluorides and Oral Health (Technical Report Series No. 846). World Health Organization, Geneva.
World Health Organization (WHO) 1996 Guidelines for Drinking-Water Quality, 2nd edn. Volume 2, Health Criteria and Other Supporting Information. World Health Organization, Geneva.
World Health Organization (WHO) 2006 Fluoride in Drinking Water. IWA Publishing, London, UK, p. 144.
World Health Organization (WHO) 2009 Potassium in Drinking-Water: Background Document for Development of WHO Guidelines for Drinking-Water Quality. World Health Organization. Available from: https://iris.who.int/handle/10665/70171.
World Health Organization (WHO) 2011 Guidelines for Drinking-Water Quality, 4th edn. World Health Organization, Geneva.
Wu L., Huang G., Fan J., Zhang F., Wang X. & Zeng W. 2019 Potential of kernel based nonlinear extension of Arps decline model and gradient boosting with categorical features support for predicting daily global solar radiation in humid regions. Energy Conversion and Management 183, 280–295. https://doi.org/10.1016/j.enconman.2018.12.103.
Xiangcao Z., Su C., Xianjun X., Ge W., Xiao Z., Yang L. & Pan H. 2024 Employing machine learning to predict the occurrence and spatial variability of high fluoride groundwater in intensively irrigated areas. Applied Geochemistry 167, 106000. https://doi.org/10.1016/j.apgeochem.2024.106000.
Yadav S., Bansal S. K., Yadav S. & Kumar S. 2019 Fluoride distribution in underground water of district Mahendergarh, Haryana, India. Applied Water Science 9, 62. https://doi.org/10.1007/s13201-019-0935-7.
Yuya L., Joel P., Muhammad S., Hifza R., Syed E. S. A. M. A. & Michael B. 2022 Monitoring and prediction of high fluoride concentrations in groundwater in Pakistan. Science of the Total Environment 839, 156058. https://doi.org/10.1016/j.scitotenv.2022.156058.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).