Groundwater vulnerability assessment for nitrate serves as a measure of potential groundwater nitrate pollution in a target area. This study applies the DRASTIC-LU framework, nitrate distribution data, and three machine learning models (random forest, RF; extreme gradient boosting, XGB; and support vector machine, SVM) to classify whether nitrate levels exceed 10 mg/L as nitrogen in Chongqing, China. The models are evaluated using accuracy, F1 score, kappa, and AUC, with RF achieving the highest accuracy (92.9%), kappa (0.857), and AUC (0.948) on the test dataset. Furthermore, the SHAP interpreter revealed that aquifer conductivity, lithology, agricultural activities, areas with high-intensity development, and groundwater recharge are the most influential indicators of groundwater vulnerability. The final groundwater vulnerability level distribution map, with a resolution of 1 km × 1 km, reveals that high and extremely high vulnerability levels are concentrated in areas with high-intensity urban development and karst trough valleys in the southeastern, northeastern, and central urban areas. This work represents the first attempt to use machine learning models for groundwater vulnerability assessment in the Chongqing region. It provides theoretical support for the layout of groundwater monitoring stations and for the future prevention and control of groundwater pollution.

  • Predicting groundwater nitrate vulnerability in Chongqing region using DRASTIC-LU framework and machine learning models.

  • Random Forest outperforms other machine learning approaches.

  • Aquifer conductivity, lithology, agricultural activities, and areas with high-intensity development were the most influential explanatory factors.

  • Groundwater nitrate vulnerability is mapped at 1 km resolution.

  • The groundwater nitrate vulnerability map reveals that high vulnerability levels are distributed in karst trough valleys in the southeastern, northeastern, and central urban areas.

Groundwater serves as a vital resource for numerous rural communities globally, playing a crucial role in sustaining agricultural activities and supporting food production on a global scale (Li et al. 2021). Approximately half of the world's potable water and a significant portion of irrigation water are sourced from groundwater reservoirs (Gleeson et al. 2016). In regions like China, the overexploitation of groundwater for agricultural, industrial, and domestic purposes has intensified, elevating concerns about the reliance on groundwater and its diminishing quality (Gu et al. 2013). It is anticipated that the demand for groundwater as a dependable and safe water supply source will significantly rise in the future. This surge in demand may render aquifers more vulnerable to anthropogenic influences, including intensified agricultural practices, alterations in land use (LU) and land cover, population growth (especially in developing regions), heightened water consumption with economic prosperity, rapid urbanization and industrialization, accessibility to inexpensive drilling and pumping technology, discharge of pollutants, power generation activities, and shifts in institutional frameworks. These factors collectively contribute to the increasing vulnerability of aquifers, highlighting the urgent need for sustainable groundwater management practices.

Nitrate is a naturally occurring form of nitrogen essential for plant growth (Mencio et al. 2016). However, in recent decades, intensive agricultural activities in rural areas have led to excessive leaching of nitrate from sources such as animal manure and synthetic fertilizers into groundwater (Canter 2019). Numerous factors contribute to the quantity of nitrate that infiltrates groundwater. These factors encompass both present and past land-use practices, historical and current nitrogen applications or deposition, soil type, the depth of the groundwater table, and the rate at which groundwater is replenished (Goodarzi et al. 2022). The intricate interplay of these variables can have a significant impact on the extent of nitrogen leaching into groundwater. According to the United States Environmental Protection Agency (USEPA), the maximum allowable concentration of nitrate nitrogen (NO3–N) in drinking water is 10 mg/L, roughly equivalent to 45 mg/L of nitrate (NO3−) (Ransom et al. 2017). In China, a maximum permissible concentration of 50 mg/L for nitrate in drinking water has been established (Wang et al. 2020). The assessment of aquifer vulnerability to nitrate contamination serves as a crucial tool for decision makers, aiding in the adoption of efficient management strategies to mitigate groundwater pollution. It facilitates groundwater resource allocation and assists in determining appropriate land-use practices. Moreover, it plays a vital role in raising awareness among communities about the risks associated with groundwater nitrogen contamination.

To develop effective methods for modeling the pollution in groundwater and safeguarding groundwater against surface pollutants, a diverse array of models employing various techniques and methodologies has been applied globally in recent decades (Taghavi et al. 2022). Numerical simulation is effective for investigating the flow and transport of pollutants in groundwater at a small scale (Eslamian et al. 2023). However, this method demands high-quality feature indicators and poses challenges for application in large-scale groundwater pollution assessments. Therefore, a standardized index system has emerged for large-scale groundwater vulnerability (GWV) assessment models. Commonly utilized methods for GWV assessment encompass well-established techniques such as DRASTIC (Aller et al. 1987), GOD (Foster 1987), SINTACS, COP (Vías et al. 2006), and EPIK (Doerfliger et al. 1999). These aforementioned methods all assess GWV by considering hydrogeological characteristics and pollutant sources. The modified DRASTIC-LU model (Secunda et al. 1998), utilizing a Geographic Information System (GIS) and incorporating surface cover factors, allows for a more comprehensive assessment of groundwater contamination risk. DRASTIC-LU combines DRASTIC framework and land-use index including seven hydrological elements and land-use data: depth to water, net recharge, aquifer media, soil media, topography, impact of the vadose zone, aquifer hydraulic conductivity, and land-use layer (Goodarzi et al. 2022; Mkumbo et al. 2022; Secunda et al. 1998; Shanmugamoorthy et al. 2023). The traditional DRASTIC-LU evaluation model relies on a linear relationship between subjective weights and indicator values, making it a subjective empirical model. However, the impact of complex hydrogeological environments on groundwater pollutants is nonlinear. Subjective weight models face challenges in adapting to diverse hydrogeological conditions.

Compared to complex traditional physical models and empirical index models, data-driven classification models, coupled with an evaluation index system, have been widely employed in regional and national-scale studies of groundwater quality assessment and other fields of hydrology, such as groundwater nitrate concentration prediction (Huang et al. 2011; Knoll et al. 2019; Ransom et al. 2022; Spijker et al. 2021), groundwater pH prediction (DeSimone et al. 2020), and the prediction of other pollutant concentrations in groundwater (Wheeler et al. 2015; Barzegar et al. 2018; Podgorski & Berg 2020). Machine learning methods such as multiple linear regression, random forest (RF) (Breiman 2001), and others are utilized for this purpose (Tianqi Chen 2016). Machine learning can effectively address the nonlinear relationships between GWV and indicator systems, while its interpretability tools support attribution analysis. GWV can be defined as follows: Under natural conditions or the influence of human activities, a specific location in the groundwater system is subject to trends and potential contamination from a position above the aquifer (Foster 1987; Council 1993; Aslam et al. 2018; Lahjouj et al. 2020). Hence, the probability of pollutant exceedance can be used to characterize the vulnerability of groundwater to this specific contaminant. The supervised learning paradigm in machine learning can utilize the data-label correspondence to classify the probability of pollutant concentration exceeding a threshold, thereby predicting the likelihood of pollutants surpassing permissible levels in water. This method effectively addresses the limitations of linear equations in subjective weighted assessments.

The utilization of DRASTIC-LU and machine learning models in aquifer quality management appears to offer a novel approach that could mitigate numerous associated problems and reduce costs. The purpose of this study is to classify whether groundwater nitrate concentrations exceed standard threshold values using supervised learning models in the Chongqing area. This approach aims to avoid the subjectivity associated with determining the weights of factors in traditional methods such as the DRASTIC-LU model. In addition, by integrating the SHapley Additive exPlanations (SHAP) (Lundberg & Lee 2017) explainer, the correlation between the model's predictive indicators and the model output results is analyzed and interpreted to identify the drivers of GWV to nitrate. Ultimately, accurate classification probabilities of the DRASTIC-LU model for groundwater nitrate–nitrogen concentration thresholds are obtained. These probabilities serve as estimates of groundwater nitrate–nitrogen vulnerability, providing a theoretical basis for large-scale groundwater resource management.

Study area

Chongqing, a city in southwest China, is situated in the transitional area between the Tibetan Plateau and the plain of the middle and lower reaches of the Yangtze River, in a subtropical climate zone often swept by moist monsoons (Figure 1). The terrain is higher in the northeast and southeast than in the west, with the highest elevations in the northeast and the lowest in the west. The topography of the study area can be classified into four major units: the western part features hilly terrain on the edge of the Sichuan Basin, the central region exhibits parallel ridges and valleys with low mountains and hills, the northeastern part showcases karst low-mountain terrain in the Daba Mountains, and the southeastern area presents karst mid-mountain terrain in the Wuling Mountains.

Due to the complex geological structures and topographic conditions, the hydrogeological environment within the area is notably intricate. Based on a combination of factors including groundwater occurrence conditions, hydraulic characteristics, and aquifer properties, groundwater in the region can be categorized into four major types: carbonate rock fractured-cave water (referred to as karst water), clastic rock porous-fractured water, bedrock fractured water, and unconsolidated rock porous water. These types account for 35.90, 38.09, 18.15, and 0.64% of the total area, respectively. The remaining area is characterized by Silurian mudstone, shale, and siltstone, forming a relatively impermeable layer. The distribution and burial characteristics of groundwater in the region are closely tied to geological strata, tectonics, and topography. Groundwater resources are abundant in the surrounding mountainous areas, while resources in the central and western parts are relatively scarce. Based on differences in tectonic units and regional topographic forms, the groundwater resources of the entire municipality are classified into three zones: the Red Formation hilly hydrogeological zone in the Sichuan Basin, the parallel ridge-valley low-mountain hydrogeological zone in eastern Sichuan, and the karst low-to-mid-mountain hydrogeological zone around the basin.

DRASTIC-LU model

The DRASTIC-LU method is an extension of the traditional DRASTIC method that incorporates LU as an additional factor in assessing GWV. This modification recognizes that land-use practices can significantly impact groundwater quality. The DRASTIC-LU method assigns ratings and weights to each of these factors, and the cumulative score is used to create a vulnerability map. This enhanced method provides a more comprehensive understanding of potential groundwater contamination by considering the influence of land-use practices in addition to the traditional hydrogeological factors. However, in both the DRASTIC and DRASTIC-LU methods, the assignment of weights to various factors is typically done subjectively by experts or researchers based on their experience and judgment. This subjectivity can lead to different results when different experts or researchers assess the same study area.
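
For reference, the DRASTIC-LU index is conventionally computed as a weighted sum of the eight factor ratings. The form below is a sketch of that standard formulation; the symbol $V$ for the index is an assumption introduced here, and the subscripts follow the rating and weight notation used in this section.

$$
V = D_r D_w + R_r R_w + A_r A_w + S_r S_w + T_r T_w + I_r I_w + C_r C_w + L_r L_w
$$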
Here, the subscript r denotes the rating and the subscript w the weight assigned to each factor; D is the depth to water, R is the recharge, A is the aquifer media, S is the soil media, T is the topography, I is the impact of the vadose zone, C is the hydraulic conductivity, and L is the land use (Table 1).
Table 1

Summary of the DRASTIC-LU predictor variables

| Sub-head | Variable name | Data type | Resolution | Data source |
|---|---|---|---|---|
| | Groundwater table depth | Grid | 1 km | Fan et al. (2013) |
| | Recharge | Grid | 1 km | Fick & Hijmans (2017); Zomer et al. (2022) |
| | Aquifer media | Polygons | – | Hartmann & Moosdorf (2012) |
| | Soil media | Grid | 1 km | Wieder et al. (2014) |
| | Topography | Grid | 30 m | Agency (2021) |
| | Impact of vadose zone | Grid | 1 km | Wieder et al. (2014) |
| | Aquifer conductivity | Polygons | – | Hartmann & Moosdorf (2012) |
| LU | Land use | Grid | 10 m | Gong et al. (2019) |

The groundwater table depth data were sourced from the Global Groundwater Depth Database (Fan et al. 2013), with a resolution of 1 km. Precipitation (PRE) data (Fick & Hijmans 2017) and actual evapotranspiration (ET0) data (Zomer et al. 2022) were collected from the WorldClim V2.0 global climate dataset and the Global Aridity Index and Potential Evapotranspiration Database, respectively, with a spatial resolution of 1 km. Soil data and vadose zone data were obtained from the Harmonized World Soil Database v1.2 (Wieder et al. 2014), with a resolution of 1 km. Terrain data were derived from the global digital surface model dataset (AW3D) (Agency 2021), and slope data for the study area were generated using ArcGIS, with a resolution of 30 m. Conductivity data of the aquifers were obtained from vectorized hydrogeological maps (1:250,000) provided by the Chongqing Geological Survey Bureau, which were then converted into point data using grid panels. The land-use data used in this study have a 10 m resolution (Gong et al. 2019) and consist of 10 LU types: Bareland, Cropland, Forest, Grassland, Impervious surface, Shrubland, Snow, Tundra, Water, and Wetland. The detailed data sources and data types of the DRASTIC-LU model are listed in Table 1.

Depth to groundwater table (D)

Changes in groundwater depth directly influence the vulnerability of aquifers to pollution. In this research, we leverage the global groundwater depth dataset (Fan et al. 2013) at a 1 km resolution, integrating it with Chongqing's local groundwater table depth monitoring data for groundwater level bias correction. In the study area, the groundwater depth ranges from 0 to 1,000 m. The distribution pattern shows that the groundwater depth is generally shallow in the central and western parts, while it tends to be deeper in the southeastern and northeastern parts, consistent with the topography of the study area. Depth to groundwater table of the study area is shown in Figure 2.
Figure 1

Location of study area.

Figure 2

Spatial distribution of the groundwater vulnerability conditioning factors: (D) Groundwater table depth, (R) Recharge, (A) Aquifer media, (S) Soil media, (T) Topography, (I) Impact of vadose zone, (C) Hydraulic conductivity, and (LU) Land use.

Recharge (R)

Areas with high recharge rates may be more vulnerable to surface pollutants. Groundwater recharge, the process by which water replenishes aquifers, plays a crucial role in influencing the transport and fate of contaminants. Groundwater recharge is estimated here as the difference between precipitation (PRE) (Fick & Hijmans 2017) and actual evapotranspiration (ET0) (Zomer et al. 2022). Recharge rates in the study area are higher in the southeast and northeast regions and lower in the central and western areas (Figure 2).
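
The difference described above can be sketched cell by cell as follows. This is a minimal illustration, not the authors' processing chain: the function name `net_recharge` and the use of nested lists for the 1 km grids are assumptions, and negative differences are left as they come because the text does not say how they are handled.

```python
def net_recharge(pre_mm, et_mm):
    """Approximate recharge per grid cell as precipitation minus evapotranspiration.

    pre_mm and et_mm are equally shaped 2-D lists (hypothetical layout) holding
    annual values in mm for each 1 km cell.
    """
    return [
        [p - e for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(pre_mm, et_mm)
    ]


# Illustrative values only, not data from the study area.
recharge = net_recharge([[1200.0, 1000.0]], [[700.0, 650.0]])
print(recharge)
```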

Aquifer media (A)

The parameter for aquifer media delineates the characteristics of the materials within the aquifer, exerting a significant influence on the processes of pollutant attenuation. This layer was collected through well profiles and hydrogeological maps. In the study area, based on a combination of factors such as groundwater occurrence conditions, hydrodynamic characteristics, and aquifer properties, the types of groundwater in the region can be classified into four major categories: carbonate rock fracture-karst water, clastic rock pore-fracture water, bedrock fracture water, and loose rock pore water (Hartmann & Moosdorf 2012). The lithology of carbonate rock and fissure karst water mainly consists of Triassic limestone, distributed in the karst trough valleys located in the southeastern, northeastern, and central-western regions of the study area. Meanwhile, the lithology of clastic rock pore-fissure water mainly comprises Triassic and Jurassic sandstones and mudstones, primarily distributed in the central-western area with sporadic occurrences in the northeastern and southeastern parts. Fracture water in bedrock and porous water in loose rocks have relatively smaller distribution areas, appearing only in the southeastern and northeastern parts (Figure 2).

Soil media (S)

Groundwater recharge, water infiltration, contaminant transport, and the interaction between groundwater and surface water are all contingent on soil media properties. The soil type map, derived from the HWSD soil dataset, shows that the soil texture types in the study area include clay, loam, loamy sand, sandy clay loam, sandy loam, and silt loam (Wieder et al. 2014). Clay is mainly distributed in the southwest, northwest, and the west karst trough valley region. Loam, which has the largest distribution area, can be observed across the entire region (Figure 2).

Topography (T)

In the DRASTIC model, the topography parameter is represented by the slope, which plays a crucial role in the examination of water resources and surface infiltration. This parameter directly impacts water's ability to permeate the soil. Lower slopes encourage more substantial infiltration, heightening the potential for pollutants to migrate into the aquifer. In this study, the slope of the region was derived from the DEM using GIS software and reclassified into different types (Agency 2021). The study area's slope ranges from 0 to 72° and is divided into six categories (Figure 2).
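
As a rough illustration of how a slope layer can be derived from a DEM and then reclassified, the sketch below uses plain central differences. It is not necessarily the algorithm applied by the GIS software in this study, and the class breaks in `slope_class` are hypothetical placeholders, since the text gives only the number of categories (six) and not their boundaries.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior DEM cells, using central differences.

    dem is a 2-D list of elevations (m); cell_size is the grid spacing (m).
    Edge cells are left as None because they lack a neighbour on one side.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell_size)
            dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell_size)
            out[i][j] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


def slope_class(slope, breaks=(5, 10, 15, 25, 35)):
    """Map a slope to one of six classes (0..5); the breaks are hypothetical."""
    return sum(slope > b for b in breaks)


# A plane rising 30 m per 30 m cell in the x direction.
dem = [[0, 30, 60], [0, 30, 60], [0, 30, 60]]
centre_slope = slope_degrees(dem, cell_size=30)[1][1]
print(centre_slope, slope_class(centre_slope))
```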

Impact of vadose zone (I)

The vadose zone, also known as the unsaturated zone, is the area above the water table where the pores in the soil and rock contain both air and water. This zone plays a significant role in regulating the movement and transformation of contaminants before they reach the groundwater table. In the context of groundwater pollution, the vadose zone can either mitigate or exacerbate the contamination process. Factors such as soil composition, moisture content, and the presence of organic matter influence how contaminants migrate through this zone (Wieder et al. 2014). The vadose zone in the study area can be categorized into five classes: clay, clay loam, loam, loamy sand, and sandy loam. The distribution of each type is illustrated in Figure 2.

Aquifer conductivity (C)

Aquifer conductivity delineates the capacity of an aquifer to transmit water. It assumes a fundamental role in dictating the dynamics of water movement within the aquifer, thereby influencing groundwater flow patterns and the intricate transport mechanisms of contaminants. Elevated conductivity levels expedite groundwater flow, potentially extending the reach of contaminant migration. Conversely, diminished conductivity can act as a natural impediment, restraining the mobilization of contaminants and serving as an inherent protective barrier. In this study, the hydraulic conductivity of the aquifer was estimated by integrating the 1:250,000 hydrogeological survey data from Chongqing with empirical values of rock hydraulic conductivity. Areas with high aquifer conductivity are mainly concentrated in the northeast and southeast, as well as in the karst trough valleys parallel to the ridges. Conversely, areas with low conductivity are predominantly found in the sandstone and mudstone sedimentary rock areas, particularly concentrated in the central-western region of the study area (Figure 2).

Land use

LU (Figure 2), as a surface cover, plays a crucial role in influencing groundwater quality. Agricultural activities and urban expansion contribute to increased fertilizer usage and pollutant discharge. These pollutants threaten groundwater quality when they infiltrate with precipitation and surface runoff. Extensive research has already demonstrated the adverse impacts of human activities on the groundwater environment. We therefore used a 1 km × 1 km grid to compute the proportion of each land-use category within each cell. These proportions were then used as representative metrics for the impact of each land-use type.
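
The proportional statistics can be sketched as a block aggregation of the fine land-use raster. This is a minimal illustration under stated assumptions: the function name `landuse_proportions` and the list-of-lists raster layout are hypothetical, and with 10 m pixels and 1 km cells the block size would be 100.

```python
from collections import Counter


def landuse_proportions(lu_grid, block):
    """Aggregate a categorical raster into coarse cells of block x block pixels.

    Returns a 2-D list in which each coarse cell is a dict mapping a land-use
    class to its share of the pixels in that cell.
    """
    rows, cols = len(lu_grid), len(lu_grid[0])
    out = []
    for r0 in range(0, rows, block):
        out_row = []
        for c0 in range(0, cols, block):
            pixels = [
                lu_grid[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            counts = Counter(pixels)
            out_row.append({k: v / len(pixels) for k, v in counts.items()})
        out.append(out_row)
    return out


# Toy 2 x 4 raster aggregated with 2 x 2 blocks (illustrative values only).
grid = [
    ["Cropland", "Cropland", "Forest", "Forest"],
    ["Cropland", "Forest", "Forest", "Forest"],
]
print(landuse_proportions(grid, block=2))
```

Each class proportion then becomes one numeric predictor column, so a cell that is three-quarters cropland carries that share directly into the model.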

Groundwater nitrate concentration (NO3–N)

The groundwater nitrate data consist of 595 samples collected from the Chongqing Municipal Groundwater Monitoring Network. Among these, 520 samples were obtained from groundwater monitoring wells spanning the years 2018 to 2022, and an additional 75 samples were derived from springs as part of hydrogeological investigations. The determination of groundwater nitrate concentrations followed the industry standard DZ/T 0064.59-2021 (China 2021) published by the Ministry of Natural Resources of China, employing the ultraviolet spectrophotometric method. In the context of the 2022 Chongqing Municipal Groundwater Monitoring Plan, machine learning models were employed in conjunction with spatial data to predict the nitrate concentrations in the groundwater of the study area. This process resulted in a distribution map of groundwater nitrate concentrations across the study area (Liang et al. 2024). The high nitrate concentration areas were found primarily in the southeastern and northeastern regions of Chongqing, while the central urban and western areas exhibited lower nitrate concentrations. In addition, elevated nitrate concentrations were identified in the mountainous regions of the central urban area, displaying a north-to-south distribution pattern along the karst valleys, which receive industrial, agricultural, and urban domestic wastewater discharge (Figure 3).
Figure 3

Spatial distribution of groundwater nitrate concentration.

Machine learning models

Random forest

The vulnerability of groundwater can be interpreted as the probability that groundwater at a given location becomes polluted. To address the impact of subjective factors on weight assignment, we employ the RF algorithm (Breiman 2001) combined with a threshold for groundwater nitrate concentration to assess the vulnerability of groundwater. RF, a powerful ensemble learning technique, has been widely applied in the field of hydrology. It has found extensive use in tasks such as predicting large-scale groundwater pollutant concentrations (Knoll et al. 2019; DeSimone et al. 2020; Ransom et al. 2022) and GWV assessment (Sajedi-Hosseini et al. 2018; Lahjouj et al. 2020). Comprising multiple decision trees, each trained on random subsets of the data, RF mitigates overfitting and enhances prediction accuracy.

Extreme gradient boosting

XGBoost (eXtreme Gradient Boosting) (Tianqi Chen 2016) is an efficient and scalable implementation of gradient boosting that, like RF, handles both classification and regression tasks. It has gained widespread popularity due to its speed, accuracy, and flexibility. XGBoost incorporates regularization to prevent overfitting and improve generalization: its objective function includes L1 (Lasso) and L2 (Ridge) terms that penalize model complexity through the magnitude of the leaf weights. XGBoost has been widely applied in domains such as groundwater pollution (Belitz & Stackelberg 2021; Ransom et al. 2022) and groundwater resource assessment.

Support vector machine

Support vector machine (SVM) (Suykens & Vandewalle 1999) is a supervised machine learning algorithm that is commonly used for binary classification tasks. SVM works by finding the optimal hyperplane that best separates the data into different classes. It is widely used in various domains such as text classification, image recognition, and bioinformatics. SVM, as a classic binary classification model, effectively addresses the threshold division problem of groundwater pollutants (El Bilali et al. 2021).

In this study, we employ the RF, XGBoost, and SVM algorithms with the DRASTIC-LU indicators as input features and a binary label, indicating whether the groundwater nitrate concentration exceeds the threshold, as the prediction target. The predicted probabilities from the classification models are then used as the GWV index.
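
To make the idea of using a classification probability as a vulnerability index concrete, the sketch below builds a toy bagging ensemble of one-split decision trees (stumps) and reports the share of trees voting for exceedance. It is a deliberately simplified stand-in and not the RF, XGBoost, or SVM implementation used in this study; all function names, the single predictor, and the data are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a one-split tree: choose the (feature, threshold) with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # Majority label on each side; an empty right side defaults to 1.
            l_lab = int(2 * sum(left) >= len(left))
            r_lab = int(2 * sum(right) >= len(right)) if right else 1
            errors = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def fit_bagged_stumps(X, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def exceedance_probability(stumps, x):
    """Share of stumps voting for class 1, read here as the vulnerability index."""
    votes = sum(l if x[j] <= t else r for j, t, l, r in stumps)
    return votes / len(stumps)


# Hypothetical single predictor (e.g. a normalized indicator) and exceedance labels.
X = [[i / 20] for i in range(20)]
y = [0] * 10 + [1] * 10
stumps = fit_bagged_stumps(X, y)
print(exceedance_probability(stumps, [0.0]), exceedance_probability(stumps, [0.95]))
```

Because each tree sees a different bootstrap sample, cells near the class boundary receive intermediate vote shares, which is what lets the probability act as a graded index rather than a hard label.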

Model evaluation indices

In this study, six widely used statistical evaluation measures, namely accuracy, recall, precision, F1 score, the kappa index, and the receiver operating characteristic (ROC) curve with its area under the curve (AUC), are used as evaluation metrics for model training and testing.
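
The four count-based measures follow their standard definitions, restated here as a sketch for reference; TN, the number of true negatives, enters only the accuracy formula.

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP},\\
\text{Recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
$$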
In the formulas for accuracy, recall, precision, and F1 score, TP represents the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Higher precision indicates fewer false positives, and higher recall indicates that more positive samples are correctly identified.

The kappa coefficient is a statistical measure used to assess the consistency between classifiers or raters in classification problems. It accounts for agreement that arises by chance, thus providing a more reliable evaluation even for imbalanced classification data. It is computed as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o = \frac{1}{n}\sum_{i=1}^{k} n_{ii}$ represents the observed probability of agreement between classifiers or raters, while $p_e = \frac{1}{n^2}\sum_{i=1}^{k} r_i c_i$ represents the expected probability of agreement between classifiers or raters. $n_{ii}$ represents the number of consistent classifications for the $i$th category by the classifier or rater, $n$ the total number of samples, $r_i$ the row total for the $i$th category, $c_i$ the column total for the $i$th category, and $k$ the total number of categories.

An ROC curve is a plot of true positive rate (sensitivity) against false positive rate (1 − specificity) for different threshold values. AUC measures the overall performance of the model across all possible classification thresholds. A higher AUC value indicates better model performance.
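
The evaluation measures above can be computed directly from labels, predicted classes, and predicted probabilities. The following is a minimal sketch for the binary case using the standard definitions; the function names are hypothetical, and it assumes both classes occur in the labels and in the predictions so that no denominator is zero.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and Cohen's kappa for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Kappa: observed agreement versus agreement expected from row/column totals.
    p_o = accuracy
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels and scores only, not results from this study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
print(classification_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise-ranking form of AUC is equivalent to the area under the ROC curve and avoids sweeping thresholds explicitly, at the cost of a quadratic loop that is fine for small samples.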

Data process

After the required data are collected, the predictor indicators are preprocessed according to their type. Numerical data are normalized to a range of 0–1, while categorical data are processed using the one-hot encoding method. For the target variable, a threshold (e.g., 10 mg/L nitrogen content) is used for binary classification, where values greater than the threshold are labeled as 1 and values less than the threshold are labeled as 0. Subsequently, a grid of 1 km × 1 km cells is formed over the study area, and the predictor values are extracted to the grid, forming the model dataset. The dataset is then divided into training and testing sets at a ratio of 4:1. Model training and parameter tuning are conducted on the training set using fivefold cross-validation and grid search, followed by model validation on the testing set. The trained models are ultimately applied to the entire dataset to generate a map depicting the vulnerability of groundwater to nitrate contamination. The data processing workflow is illustrated in Figure 4.
Figure 4

Workflow for predicting groundwater vulnerability based on DRASTIC-LU and machine learning models, including data collection, data processing, model evaluation, and groundwater vulnerability prediction.
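
The preprocessing steps in this workflow (normalization, one-hot encoding, threshold labeling, and the 4:1 split) can be sketched as below. This is a minimal illustration with hypothetical function names, not the authors' code. Two details are assumptions: a concentration exactly equal to the threshold is labeled 0, since the text does not cover that case, and each numeric column is assumed to contain at least two distinct values.

```python
import random


def min_max(values):
    """Scale a numeric column to the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """One-hot encode a categorical column; returns (encoded rows, level order)."""
    levels = sorted(set(categories))
    return [[int(c == lvl) for lvl in levels] for c in categories], levels


def exceedance_labels(no3_n, threshold=10.0):
    """Label 1 where NO3-N exceeds the threshold (mg/L), otherwise 0."""
    return [int(v > threshold) for v in no3_n]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle and split rows into training and testing sets (4:1 by default)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


# Illustrative values only.
print(min_max([0.0, 5.0, 10.0]))
print(one_hot(["loam", "clay", "loam"]))
print(exceedance_labels([3.2, 10.0, 12.5]))
train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```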

Performance of models

GWV models have little technical significance without validation. We therefore trained the RF, XGBoost (XGB), and SVM models on the training dataset (80%) and assessed them on the testing dataset (20%). The models were evaluated using accuracy, precision, recall, F1 score, kappa, and AUC after fivefold cross-validation (Table 2). The ROC curve is a widely used technique for assessing classifier accuracy. It is generated by plotting the model's sensitivity (true positive rate) on the y-axis against 1 − specificity (false positive rate) on the x-axis. An AUC value greater than 0.5 indicates that the model performs better than random guessing. Our findings indicate that RF performed best among the three algorithms (Figure 5).
Table 2

The results of accuracy, precision, recall, F1 score, kappa, and AUC for the RF, XGB, and SVM models on the training and testing datasets

| Models | Dataset | Class | Precision | F1 score | Recall | Accuracy | Kappa | AUC |
|---|---|---|---|---|---|---|---|---|
| RF | Train | class_0 | 0.966 | 0.960 | 0.953 | 0.960 | 0.919 | 0.970 |
| | | class_1 | 0.954 | 0.960 | 0.966 | | | |
| | Test | class_0 | 0.938 | 0.928 | 0.918 | 0.929 | 0.857 | 0.948 |
| | | class_1 | 0.919 | 0.929 | 0.940 | | | |
| XGB | Train | class_0 | 0.941 | 0.937 | 0.933 | 0.937 | 0.874 | 0.968 |
| | | class_1 | 0.934 | 0.938 | 0.942 | | | |
| | Test | class_0 | 0.898 | 0.886 | 0.874 | 0.887 | 0.775 | 0.932 |
| | | class_1 | 0.877 | 0.889 | 0.901 | | | |
| SVM | Train | class_0 | 0.845 | 0.841 | 0.837 | 0.842 | 0.684 | 0.923 |
| | | class_1 | 0.838 | 0.843 | 0.847 | | | |
| | Test | class_0 | 0.829 | 0.826 | 0.824 | 0.827 | 0.654 | 0.886 |
| | | class_1 | 0.825 | 0.827 | 0.830 | | | |

Note: Class_0 and Class_1, respectively, represent groundwater nitrate concentrations less than 10 mg/L (as nitrogen) and greater than 10 mg/L (as nitrogen).

Figure 5

ROC curve and AUC scores of three models for test dataset.

All three classifiers performed well. RF achieved the highest accuracy (training: 96.0%; testing: 92.9%), followed by XGB (training: 93.7%; testing: 88.7%) and SVM (training: 84.2%; testing: 82.7%). The kappa statistics indicate that the models are reliable and that their agreement with the observations is not due to chance. Kappa ranges from 0.775 to 0.919 for the RF and XGB models, indicating good performance, and from 0.654 to 0.684 for the SVM model, which is poorer than RF and XGB. The AUC values of the three models range from 0.886 to 0.970 across the training and testing datasets (Table 2; Figure 5 shows the test dataset). Models with AUC values greater than 0.8 are considered good, so all three models perform well on this metric. Table 2 also lists the precision, F1 score, and recall for each class (class_0 and class_1) in both the training and testing datasets.
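
As a rough illustration of how the metrics in Table 2 relate to one another, the sketch below derives accuracy, kappa, and per-class precision, recall, and F1 score from a binary confusion matrix. The counts are hypothetical (the paper does not report its confusion matrices), and the function name is an assumption made for this example.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, Cohen's kappa, and per-class precision/recall/F1."""
    n = tn + fp + fn + tp
    accuracy = (tn + tp) / n
    # Chance agreement: product of actual and predicted marginals per class.
    p_chance = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    def prf(true_hits, predicted_total, actual_total):
        precision = true_hits / predicted_total
        recall = true_hits / actual_total
        f1 = 2 * precision * recall / (precision + recall)
        return {"precision": precision, "recall": recall, "f1": f1}

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "class_0": prf(tn, tn + fn, tn + fp),
        "class_1": prf(tp, tp + fp, tp + fn),
    }


# Hypothetical counts for a balanced test set of 200 samples.
result = classification_metrics(tn=90, fp=10, fn=6, tp=94)
print(round(result["accuracy"], 3), round(result["kappa"], 3))
```

When the observed classes are balanced, the chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1. The values in Table 2 are close to this relation (for example, 2 × 0.929 − 1 = 0.858 against a reported test kappa of 0.857 for RF).
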

Uncertainty analysis

The distribution of the groundwater nitrate vulnerability index is summarized by its mean and standard deviation (Figure 6). The SVM model has the highest mean prediction probability (0.45), followed by the RF model (0.36) and the XGB model (0.314). The standard deviation of the RF model is smaller than that of the XGB model, suggesting that the RF model outperforms the other two methods and better reflects the differences in vulnerability index among grid units. Taken together, the accuracy, kappa, and AUC metrics indicate that this approach performs well and has low uncertainty in assessing groundwater nitrate vulnerability in Chongqing.
Figure 6

The distribution pattern of groundwater nitrate vulnerability index. From left to right, it shows the distribution patterns of the RF model, SVM model, and XGB model, respectively.

Dominant feature of groundwater vulnerability to nitrate

To effectively carry out groundwater pollution prevention and control, as well as optimize groundwater monitoring networks, it is necessary to analyze and model the influencing factors of GWV. Therefore, we utilized the DRASTIC-LU model and employed the SHAP interpreter to analyze the impact of features on the model. SHAP generates a value for each input feature (referred to as a SHAP value), indicating how much that feature contributes to the prediction of a specific data point. Some factors positively influence the prediction probability, while others have a negative impact on it.
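
To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values for a small hypothetical scoring function by enumerating all feature coalitions. It is not the RF model of this study, and it replaces "absent" features with a fixed baseline value, which is a simplifying assumption; the SHAP library estimates these quantities far more efficiently on real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical score from conductivity, recharge, and water table depth."""
    conductivity, recharge, depth = z
    return (0.2 + 0.5 * conductivity + 0.3 * recharge
            - 0.4 * depth + 0.2 * conductivity * recharge)


x = [1.0, 1.0, 0.5]         # hypothetical grid cell
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])
```

In this toy case the contributions sum to the difference between the prediction for the grid cell and the prediction at the baseline, and the depth feature receives a negative value, mirroring the positive and negative influences described above.
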

In this study, we used the SHAP interpreter to explain the top 10 most influential features of the best-performing RF model (Figure 7). The SHAP summary plot shows that the top 10 most influential indicators are hydraulic conductivity (C), recharge (R), cropland, water, aquifer lithology (A), impervious land, impact of vadose zone (I), soil parameter (S), depth of groundwater (D), and topography (T).
Figure 7

SHAP values for the top 10 most important variables, ranked by mean absolute SHAP value. The SHAP values are normalized to the mean training data prediction; positive SHAP values indicate a positive impact on predictions, and negative values indicate a negative impact.

Hydraulic conductivity (C), identified as the most important variable, and aquifer lithology (A) were positively related to GWV. These two indicators reflect how easily surface runoff infiltrates into groundwater. Groundwater table depth, the impact of the vadose zone, and soil parameters also rank among the top 10 most influential indicators. Groundwater depth is negatively correlated with vulnerability level, and a vadose zone with lower porosity is likewise associated with lower groundwater nitrate vulnerability; these conditions promote denitrification. The SHAP values and feature importance show that GWV to nitrate increases with aquifer permeability. High permeability facilitates the conversion of surface runoff into groundwater and, through infiltration, makes it easier for nitrate to enter the groundwater body.

The top 10 most influential indicators also include meteorological, LU, and topographical parameters. Groundwater recharge and the proportion of water bodies are positively correlated with GWV, while slope gradient is negatively correlated. In areas with high groundwater recharge, more nitrate is leached from the soil and transferred into the groundwater, which raises the risk of groundwater nitrate pollution. In terms of LU, the proportions of both cultivated land and urban land are positively correlated with GWV level, indicating that nitrogen inputs from urban water consumption, sewage discharge, and agricultural activities increase groundwater nitrate content.

Model result verification

To validate the machine learning algorithms against the traditional subjective weighting model, we used the normalized mean nitrate concentration at 598 groundwater sampling points, measured during the dry and wet seasons from 2018 to 2022 (Figure 8). The validation results were quantified using R². The RF model had the highest R² (0.803), followed by the XGB model (R² = 0.735). The traditional indicator weighting model had an R² of 0.526, and the SVM model had the lowest (R² = 0.354). The RF and XGB models therefore show the strongest correlation between the groundwater nitrate vulnerability index and groundwater nitrate concentration, which indicates that ensemble models are better suited than the traditional linear indicator weighting model for predicting and assessing groundwater nitrate vulnerability in the study area.
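
The paper does not state exactly how the concentrations were normalized or how R² was computed. As one plausible reading, the sketch below applies min-max normalization to hypothetical nitrate concentrations and reports the squared Pearson correlation with a hypothetical vulnerability index; all numbers and function names are illustrative assumptions.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy**2 / (sxx * syy)


# Hypothetical values at five sampling points.
vulnerability_index = [0.10, 0.25, 0.40, 0.70, 0.90]
nitrate_mg_per_l = [1.2, 3.5, 4.1, 9.8, 14.0]

normalized = min_max_normalize(nitrate_mg_per_l)
r2 = r_squared(vulnerability_index, normalized)
print(round(r2, 3))
```

Because min-max normalization is a linear rescaling, it does not change the squared correlation; it only places the concentrations on the same 0 to 1 scale as the vulnerability index for plotting.
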
Figure 8

The scatter plot results of the correlation analysis between the groundwater nitrate vulnerability index predicted by the three machine learning models and the traditional linear model, and the normalized index of groundwater nitrate concentration.

Figure 9

Results of the four groundwater nitrate vulnerability zoning methods, presented with box plots of observed nitrate concentrations within each zone.

Groundwater vulnerability to nitrate

The tuned models are used to predict the probability that the groundwater nitrate concentration exceeds the threshold, which quantifies the vulnerability of groundwater to nitrate pollution. A comparison of classification methods using the Spearman rank correlation (ρ), Eta coefficient (η), and ANOVA F-statistic indicated that the equal interval method (Sajedi-Hosseini et al. 2018; Lahjouj et al. 2020) is the most appropriate overall for the three machine learning models (Table 3). Combining the groundwater nitrate observations with the vulnerability zoning yields box plots of nitrate concentration for each vulnerability zone. Figure 9 shows that different vulnerability levels correspond to different ranges of nitrate concentration: observed concentrations are lower in the lower vulnerability zones and higher in the higher vulnerability zones.

Table 3

Comparison between classification methods applied to three models

| Classification method | ρ (RF) | ρ (XGB) | ρ (SVM) | η (RF) | η (XGB) | η (SVM) | F (RF) | F (XGB) | F (SVM) |
|---|---|---|---|---|---|---|---|---|---|
| Equal interval | 0.6726 | 0.6673 | 0.6000 | 0.2861 | 0.2774 | 0.2641 | 1.8722 | 1.8503 | 1.2658 |
| Quantile | 0.6571 | 0.6628 | 0.5733 | 0.2701 | 0.2681 | 0.2617 | 1.4473 | 1.4483 | 1.2117 |
| Natural break | 0.6480 | 0.6616 | 0.5962 | 0.2734 | 0.2649 | 0.2614 | 1.4272 | 1.4375 | 1.2000 |
| Geometric interval | 0.6745 | 0.6655 | 0.5840 | 0.2778 | 0.2738 | 0.2642 | 1.8414 | 1.7405 | 1.2136 |

Note: ρ, Spearman rank correlation; η, Eta coefficient; F, ANOVA F-statistic.

Note: ANOVA, analysis of variance.
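
The paper does not give the formulas used for η and the F-statistic in Table 3. A minimal sketch under the standard one-way ANOVA definitions is shown below, where observed nitrate concentrations are grouped by vulnerability class; the groups and values are hypothetical.

```python
from math import sqrt


def eta_and_f(groups):
    """Eta coefficient and one-way ANOVA F-statistic for grouped values."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation explained by differences between class means.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    eta = sqrt(ss_between / ss_total)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return eta, f_stat


# Hypothetical nitrate concentrations (mg/L) in two vulnerability classes.
groups = {"low": [1.0, 2.0, 3.0], "high": [7.0, 8.0, 9.0]}
eta, f_stat = eta_and_f(groups)
print(round(eta, 4), round(f_stat, 2))
```

Under these definitions, a classification method that separates the observed concentrations more cleanly yields larger values of both statistics.
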

The GWV is classified into five categories using the equal interval classification method in GIS: ‘very low,’ ‘low,’ ‘middle,’ ‘high,’ and ‘very high.’ A statistical analysis of the predictions from the four methods showed that the results of the RF and XGB models have a similar distribution pattern (Figure 10). The SVM model, however, predicted a higher proportion of very high and very low vulnerability levels than the RF and XGB models, with fewer predictions at the other levels, indicating a significant deviation from the other two models.
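
The equal interval step can be sketched as follows. This assumes the class breaks span the probability range 0 to 1; GIS software may instead take the breaks from the minimum and maximum of the data, so the bounds are exposed as parameters. The grid-cell probabilities are hypothetical.

```python
from collections import Counter

LEVELS = ["very low", "low", "middle", "high", "very high"]


def equal_interval_class(p, lo=0.0, hi=1.0, levels=LEVELS):
    """Assign a value to one of len(levels) equal-width classes on [lo, hi]."""
    idx = int((p - lo) / (hi - lo) * len(levels))
    # Clamp so that p == hi falls in the top class and p < lo in the bottom.
    idx = min(max(idx, 0), len(levels) - 1)
    return levels[idx]


def class_shares(probabilities):
    """Percentage of grid cells in each vulnerability class."""
    counts = Counter(equal_interval_class(p) for p in probabilities)
    total = len(probabilities)
    return {level: 100.0 * counts[level] / total for level in LEVELS}


# Hypothetical predicted exceedance probabilities for eight grid cells.
cells = [0.05, 0.12, 0.35, 0.50, 0.55, 0.72, 0.95, 1.00]
shares = class_shares(cells)
print(shares)
```

Counting the cells per class in this way gives the kind of frequency ratios compared across models in Figure 10.
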
Figure 10

GWV frequency ratio results of the three models.

Compared with the three machine learning models, the traditional index weighting model tends to predict larger areas of high and very high vulnerability and smaller areas of low vulnerability. In the predictions of the best model (RF), the five GWV levels, from very low to very high, account for 30.63, 24.05, 25.96, 16.35, and 3.01% of the area, respectively.

The GWV distribution maps (Figure 11) generated from the three machine learning models are strongly consistent in their GWV levels. In the western region, however, the traditional index weighting model predicts large areas of high vulnerability, primarily because of the high weighting of population centers, which may skew the vulnerability assessment. Nitrate concentrations in the western region are nevertheless relatively low overall, which reveals a mismatch between the predictions of the traditional index weighting model and the actual observations. In contrast, the machine learning models, trained on observed values within the DRASTIC-LU framework in a supervised learning approach, show a higher correlation with the observations. They therefore better reflect the real conditions of the study area and provide more accurate and reliable GWV assessments.
Figure 11

The groundwater vulnerability (GWV) to nitrate distribution map constructed using RF, XGB, SVM models, and DRASTIC-LU weight model. The groundwater vulnerability is classified into five categories using the equal interval method in GIS, namely, ‘very low,’ ‘low,’ ‘middle,’ ‘high,’ and ‘very high’.

All three machine learning models indicate higher vulnerability in the central-western urban areas and karst trough valleys, as well as in the southeastern and northeastern regions. Comparison with LU and hydrogeological maps shows that areas of high vulnerability are concentrated in urban and agricultural land-use zones and in carbonate rock regions, which are highly correlated with groundwater recharge parameters. Because karst areas lack natural impermeable or filtering layers, surface water and the pollutants it carries can easily enter aquifers or underground rivers directly through karst features such as caves. This indicates that agricultural activities (excessive use of nitrogen fertilizers), urban sewage, and highly permeable aquifers are the main factors controlling groundwater nitrogen pollution, consistent with previous studies (Knoll et al. 2019a; Hartmann et al. 2021; Goodarzi et al. 2022; Mkumbo et al. 2022; Shanmugamoorthy et al. 2023).

Groundwater vulnerability to nitrate is high in the northeastern and southeastern areas of Chongqing, where rural communities often rely on household wells for water supply. The elevated vulnerability, coupled with nitrogen emissions from human activities, poses a high risk of nitrate contamination in groundwater. Long-term consumption of nitrate-contaminated groundwater in these areas may lead to conditions such as methemoglobinemia, leukemia, gastrointestinal cancers, and other diseases (Yüksel et al. 2021; Topaldemir et al. 2023). To ensure the safety of drinking water in areas with high GWV, the relevant authorities should take measures to manage groundwater in these areas. Such measures may include avoiding excessive use of nitrogen fertilizers, raising awareness among local residents about safe water practices, prioritizing the use of high-quality regional groundwater, and implementing targeted water treatment.

Subsequent research should sample domestic drinking water wells in high vulnerability areas to validate the model results. In addition, monitoring points should be added to protect drinking water sources in high vulnerability areas and ensure the safety of drinking water in these regions.

Assessing GWV has emerged as a crucial tool for sustainable management of groundwater resources. Consequently, there is a growing demand for the development of novel methods to enhance the accuracy of these assessments. This study focuses on Chongqing Municipality as the research area, integrating the traditional DRASTIC-LU GWV framework with machine learning techniques and the distribution of groundwater nitrate concentrations. By eliminating subjective weighting, this approach quantifies GWV. Evaluation of RF, XGB, and SVM models using metrics such as accuracy, precision, recall, F1 score, kappa value, and AUC leads to the following conclusions:

  • (1) Among the three selected machine learning models, the RF model performs best, achieving the highest accuracy (92.9% for testing), kappa value (0.857 for testing), and AUC (0.948 for testing). Further validation was conducted by analyzing the correlation between measured groundwater nitrate concentrations and the groundwater nitrate vulnerability index. The results confirmed that both the RF model and the XGB model outperformed the traditional index weighting model, with the RF model achieving the highest R² (0.803). These analyses affirm that the RF model is the most suitable for predicting the groundwater nitrate vulnerability index in the study area.

  • (2) The SHAP interpreter was utilized to explain the input features of the DRASTIC-LU model. The results indicate that aquifer permeability, lithology, groundwater recharge, as well as cultivated land and urban LU are the most influential indicators affecting GWV to nitrate.

  • (3) A GWV assessment was conducted across the entire Chongqing region using a 1 km × 1 km grid. The results indicate that the distribution proportions of vulnerability levels, from very low to very high, are 30.63, 24.05, 25.96, 16.35, and 3.01%, respectively. Areas with high and very high vulnerability levels are concentrated in the southeastern, northeastern, and central urban areas, particularly in regions with high urban development intensity and karst trough valleys.

While machine learning models have demonstrated excellent performance in quantifying GWV, it is important to note that the nitrate distribution data used in our study were generated through simulations by machine learning models, introducing a certain level of uncertainty. In addition, due to financial constraints, we were unable to conduct data validation in unsampled areas. Validation efforts will be pursued in future research endeavors focused on groundwater pollution prevention and control.

We gratefully acknowledge the financial support provided by the Chongqing Science and Technology Development Foundation (Project Number: cstc2020jcyj-msxmX1074) and the self-funded resources of the Chongqing Institute of Geology and Mineral Resources.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Agency J. A. E. 2021 ALOS World 3D 30 meter DEM. V3.2. Available from: https://doi.org/10.5069/G94M92H
Aller L., Bennett T., Lehr J., Petty R. & Hackett G. 1987 DRASTIC: A Standardized System for Evaluating Ground Water Pollution Potential Using Hydrogeologic Settings. US Environmental Protection Agency, Washington, D.C. (EPA/600/2-85/018).
Aslam R. A., Shrestha S. & Pandey V. P. 2018 Groundwater vulnerability to climate change: A review of the assessment methodology. Science of the Total Environment 612, 853–875. https://doi.org/10.1016/j.scitotenv.2017.08.237
Barzegar R., Moghaddam A. A., Deo R., Fijani E. & Tziritis E. 2018 Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms. Science of the Total Environment 621, 697–712. https://doi.org/10.1016/j.scitotenv.2017.11.185
Belitz K. & Stackelberg P. E. 2021 Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models. Environmental Modelling & Software 139. https://doi.org/10.1016/j.envsoft.2021.105006
Breiman L. 2001 Random forest. Machine Learning 45, 5–32. https://doi.org/10.1023/A:1010933404324
Canter L. W. 2019 Nitrates in Groundwater. Routledge, New York, USA.
China M. o. N. R. o. t. P. s. R. o. 2021 Methods for analysis of groundwater quality. Part 59: Determination of nitrate. Ultraviolet spectrophotometry. In (Vol. DZ/T 0064.59-2021).
Council N. R. 1993 Ground Water Vulnerability Assessment: Predicting Relative Contamination Potential Under Conditions of Uncertainty. The National Academies Press, Washington, DC, USA.
DeSimone L. A., Pope J. P. & Ransom K. M. 2020 Machine-learning models to map pH and redox conditions in groundwater in a layered aquifer system, Northern Atlantic Coastal Plain, eastern USA. Journal of Hydrology: Regional Studies 30. https://doi.org/10.1016/j.ejrh.2020.100697
Doerfliger N., Jeannin P. Y. & Zwahlen F. 1999 Water vulnerability assessment in karst environments: A new method of defining protection areas using a multi-attribute approach and GIS tools (EPIK method). Environmental Geology 39 (2), 165–176. https://doi.org/10.1007/s002540050446
El Bilali A., Taleb A. & Brouziyne Y. 2021 Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agricultural Water Management 245, 106625. https://doi.org/10.1016/j.agwat.2020.106625
Eslamian S., Harooni Y. & Sabzevari Y. 2023 Simulation of nitrate pollution and vulnerability of groundwater resources using MODFLOW and DRASTIC models. Scientific Reports 13 (1), 8211. https://doi.org/10.1038/s41598-023-35496-8
Fan Y., Li H. & Miguez-Macho G. 2013 Global patterns of groundwater table depth. Science 339 (6122), 940–943. https://doi.org/10.1126/science.1229881
Fick S. E. & Hijmans R. J. J. 2017 Worldclim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37 (12), 4302–4315. https://doi.org/10.1002/joc.5086
Foster S. S. D. 1987 Fundamental concepts in aquifer vulnerability, pollution risk and protection strategy. In: Vulnerability of Soil and Groundwater to Pollution (van Duijvanbooden, W. & van Waegeningh, H. G., eds). TNO Committee on Hydrological Research, Delft, The Netherlands.
Gleeson T., Befus K. M., Jasechko S., Luijendijk E. & Cardenas M. B. 2016 The global volume and distribution of modern groundwater. Nature Geoscience 9 (2), 161–167. https://doi.org/10.1038/ngeo2590
Gong P., Liu H., Zhang M., Li C., Wang J., Huang H., Clinton N., Ji L., Li W., Bai Y., Chen B., Xu B., Zhu Z., Yuan C., Ping Suen H., Guo J., Xu N., Li W., Zhao Y., Yang J., Yu C., Wang X., Fu H., Yu L., Dronova I., Hui F., Cheng X., Shi X., Xiao F., Liu Q. & Song L. 2019 Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Science Bulletin 64 (6), 370–373. https://doi.org/10.1016/j.scib.2019.03.002
Goodarzi M. R., Niknam A. R. R., Jamali V. & Pourghasemi H. R. 2022 Aquifer vulnerability identification using DRASTIC-LU model modification by fuzzy analytic hierarchy process. Modeling Earth Systems and Environment 8 (4), 5365–5380. https://doi.org/10.1007/s40808-022-01408-4
Gu B., Ge Y., Chang S. X., Luo W. & Chang J. 2013 Nitrate in groundwater of China: Sources and driving forces. Global Environmental Change 23 (5), 1112–1121. https://doi.org/10.1016/j.gloenvcha.2013.05.004
Hartmann J. & Moosdorf N. 2012 The new global lithological map database GLiM: A representation of rock properties at the Earth surface. Geochemistry, Geophysics, Geosystems 13, 12. https://doi.org/10.1029/2012gc004370
Hartmann A., Jasechko S., Gleeson T., Wada Y., Andreo B., Barberá J. A., Brielmann H., Bouchaou L., Charlier J. B., Darling W. G., Filippini M., Garvelmann J., Goldscheider N., Kralik M., Kunstmann H., Ladouche B., Lange J., Lucianetti G., Martín J. F., Mudarra M., Sánchez D., Stumpp C., Zagana E. & Wagener T. 2021 Risk of groundwater contamination widely underestimated because of fast flow into aquifers. Proceedings of the National Academy of Sciences 118 (20), e2024492118. https://doi.org/10.1073/pnas.2024492118
Huang J., Xu J., Liu X., Liu J. & Wang L. 2011 Spatial distribution pattern analysis of groundwater nitrate nitrogen pollution in Shandong intensive farming regions of China using neural network method. Mathematical and Computer Modelling 54 (3–4), 995–1004. https://doi.org/10.1016/j.mcm.2010.11.027
Knoll L., Breuer L. & Bach M. 2019 Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Science of the Total Environment 668, 1317–1327. https://doi.org/10.1016/j.scitotenv.2019.03.045
Lahjouj A., El Hmaidi A., Bouhafa K. & Boufala M. h. 2020 Mapping specific groundwater vulnerability to nitrate using random forest: Case of Sais basin, Morocco. Modeling Earth Systems and Environment 6 (3), 1451–1466. https://doi.org/10.1007/s40808-020-00761-6
Li P., Karunanidhi D., Subramani T. & Srinivasamoorthy K. 2021 Sources and consequences of groundwater contamination. Archives of Environmental Contamination and Toxicology 80 (1), 1–10. https://doi.org/10.1007/s00244-020-00805-z
Liang Y., Zhang X., Gan L., Chen S., Zhao S., Ding J., Kang W. & Yang H. 2024 Mapping specific groundwater nitrate concentrations from spatial data using machine learning: A case study of Chongqing, China. Heliyon 10 (6), e27867. https://doi.org/10.1016/j.heliyon.2024.e27867
Lundberg S. M. & Lee S.-I. 2017 A unified approach to interpreting model predictions. In: Paper presented at the Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA.
Menció A., Mas-Pla J., Otero N., Regàs O., Boy-Roura M., Puig R., Bach J., Domènech C., Zamorano M., Brusi D. & Folch A. 2016 Nitrate pollution of groundwater; all right…, but nothing else? Science of the Total Environment 539, 241–251. https://doi.org/10.1016/j.scitotenv.2015.08.151
Mkumbo N. J., Mussa K. R., Mariki E. E. & Mjemah I. C. 2022 The use of the DRASTIC-LU/LC model for assessing groundwater vulnerability to nitrate contamination in Morogoro municipality, Tanzania. Earth 3 (4), 1161–1184. https://doi.org/10.3390/earth3040067
Podgorski J. & Berg M. 2020 Global threat of arsenic in groundwater. Science 368 (6493), 845–850. https://doi.org/10.1126/science.aba1510
Ransom K. M., Nolan B. T., Traum J. A., Faunt C. C., Bell A. M., Gronberg J. A. M., Wheeler D. C., Rosecrans C. Z., Jurgens B., Schwarz G. E., Belitz K., Eberts S. M., Kourakos G. & Harter T. 2017 A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. Science of the Total Environment 601–602, 1160–1172. https://doi.org/10.1016/j.scitotenv.2017.05.192
Ransom K. M., Nolan B. T., Stackelberg P. E., Belitz K. & Fram M. S. 2022 Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States. Science of the Total Environment 807, 151065. https://doi.org/10.1016/j.scitotenv.2021.151065
Sajedi-Hosseini F., Malekian A., Choubin B., Rahmati O., Cipullo S., Coulon F. & Pradhan B. 2018 A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Science of the Total Environment 644, 954–962. https://doi.org/10.1016/j.scitotenv.2018.07.054
Secunda S., Collin M. L. & Melloul A. J. 1998 Groundwater vulnerability assessment using a composite model combining DRASTIC with extensive agricultural land use in Israel's Sharon region. Journal of Environmental Management 54 (1), 39–57. https://doi.org/10.1006/jema.1998.0221
Shanmugamoorthy M., Subbaiyan A., Elango L. & Velusamy S. 2023 Groundwater susceptibility assessment using the GIS based DRASTIC-LU model in the Noyyal river area of South India. Urban Climate 49, 101464. https://doi.org/10.1016/j.uclim.2023.101464
Spijker J., Fraters D. & Vrijhoef A. 2021 A machine learning based modelling framework to predict nitrate leaching from agricultural soils across the Netherlands. Environmental Research Communications 3 (4). https://doi.org/10.1088/2515-7620/abf15f
Suykens J. A. K. & Vandewalle J. 1999 Least squares support vector machine classifiers. Neural Processing Letters 9 (3), 293–300. https://doi.org/10.1023/A:1018628609742
Taghavi N., Niven R. K., Paull D. J. & Kramer M. 2022 Groundwater vulnerability assessment: A review including new statistical and hybrid methods. Science of the Total Environment 822, 153486. https://doi.org/10.1016/j.scitotenv.2022.153486
Tianqi Chen C. G. 2016 XGBoost: A Scalable Tree Boosting System. In: Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA.
Topaldemir H., Taş B., Yüksel B. & Ustaoğlu F. 2023 Potentially hazardous elements in sediments and Ceratophyllum demersum: An ecotoxicological risk assessment in Miliç Wetland, Samsun, Türkiye. Environmental Science and Pollution Research 30 (10), 26397–26416. https://doi.org/10.1007/s11356-022-23937-2
Vías J. M., Andreo B., Perles M. J., Carrasco F., Vadillo I. & Jiménez P. 2006 Proposed method for groundwater vulnerability mapping in carbonate (karstic) aquifers: The COP method. Hydrogeology Journal 14 (6), 912–925. https://doi.org/10.1007/s10040-006-0023-6
Wang S., Zhang X., Wang C., Zhang X., Reis S., Xu J. & Gu B. 2020 A high-resolution map of reactive nitrogen inputs to China. Scientific Data 7, 1. https://doi.org/10.1038/s41597-020-00718-5
Wheeler D. C., Nolan B. T., Flory A. R., DellaValle C. T. & Ward M. H. 2015 Modeling groundwater nitrate concentrations in private wells in Iowa. Science of the Total Environment 536, 481–488. https://doi.org/10.1016/j.scitotenv.2015.07.080
Wieder W. R., Boehnert J., Bonan G. B. & Langseth M. 2014 Regridded Harmonized World Soil Database v1.2. In: ORNL Distributed Active Archive Center. https://doi.org/10.3334/ORNLDAAC/1247
Yüksel B., Ustaoğlu F. & Arica E. 2021 Impacts of a garbage disposal facility on the water quality of Çavuşlu stream in Giresun, Turkey: A health risk assessment study by a validated ICP-MS assay. Aquatic Sciences Engineering 36 (4), 181–192. https://doi.org/10.26650/ASE2020845246
Zomer R. J., Xu J. & Trabucco A. 2022 Version 3 of the global aridity index and potential evapotranspiration database. Scientific Data 9 (1), 409. https://doi.org/10.1038/s41597-022-01493-1
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).