ABSTRACT
The main objective of the present study is to evaluate groundwater quality for irrigation purposes in the central-western part of Haryana state (India). For this, 272 groundwater samples were collected during the pre- and post-monsoon periods in 2022. Several indices, including the sodium adsorption ratio (SAR), permeability index (PI), sodium percentage (Na%), Kelly's ratio (KR), magnesium adsorption ratio (MAR), and irrigation water quality index (IWQI), were derived. The SAR, Na%, and KR values indicate that the groundwater is generally suitable for irrigation. On the other hand, PI and MAR exceeded the established limits, primarily showing issues related to salinity and magnesium content in the groundwater. Furthermore, according to the IWQI classification, 47.06% and 25% of the collected samples fell under the ‘severe restriction for irrigation’ category during the pre-monsoon and post-monsoon periods, respectively. Spatial variation maps indicate that water quality in the western portion of the study area is unsuitable for irrigation during both periods. Three machine learning (ML) algorithms, namely random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), were applied and validated to predict the IWQI. The results revealed that XGBoost with random search achieved the best prediction performance. The approaches established in this study, which use hydrochemical parameters as input variables, proved cost-effective and feasible for groundwater quality assessment and are highly beneficial for water resource planning and management.
HIGHLIGHTS
The study focuses on assessing groundwater quality for irrigation using various indices.
SAR, Na%, and KR values generally indicate groundwater suitability for irrigation, whereas PI and MAR values exceed the established limits.
RF, SVM, and XGBoost ML algorithms were employed and validated to predict IWQI.
XGBoost with random search offers a potentially effective tool for groundwater quality prediction and management.
INTRODUCTION
Groundwater is a fundamental natural resource that plays an essential role in the socio-economic development of countries, and agriculture is its principal consumer worldwide (Kouadri et al. 2022). These resources face a range of challenges that threaten their sustainability, including climate change, human activities, and natural processes (Makki et al. 2021; Abu El-Magd et al. 2023; Raheja et al. 2023a). In most cases, these pressures degrade the chemical composition of the water, making it unfit for drinking or irrigation. The United Nations has recognized access to fresh drinking water as a fundamental human right, and its findings show that roughly 10% of urban and rural populations lack access to clean water for drinking (United Nations 2015). Regular monitoring and assessment strategies can nevertheless help manage water quality, decrease contamination, and mitigate health hazards (Al-Barakah et al. 2017; Adimalla 2019; Tleuova et al. 2023; Raheja et al. 2024a). Several techniques and methodologies have been employed for this purpose, with positive outcomes in estimating groundwater quality, delineating pollution risks, and assessing health hazards. Index-based methods, statistical methods, and GIS techniques are commonly used for groundwater quality assessment and mapping (Gebrehiwot et al. 2011; Gao et al. 2020; Khan et al. 2022; Gaagai et al. 2023; Omeka et al. 2023; Raheja et al. 2023b).
To assist in irrigation use, the assessment of irrigation water quality (IWQ) frequently relies on various indices and parameters specified by the Food and Agriculture Organization (FAO) guidelines (Ayers & Westcot 1994). Several studies have explored the application of index-based and statistical methods to minimize subjectivity while evaluating water suitability for irrigation. Ewaid et al. (2019) introduced a software tool designed to assess the suitability of irrigation water based on the FAO guidelines. Although these studies show that conventional methods are quick and cost-effective for water quality estimation and control, such methods require a significant volume of data. Hence, using traditional approaches to evaluate IWQ may be challenging in terms of labor and cost, particularly for farmers in developing countries. Sustainable groundwater management plans currently call for innovative and cost-effective methods to evaluate and predict groundwater quality. Prediction-based approaches can help meet this requirement and overcome the challenges of groundwater planning and management.
In recent years, GIS has developed into an influential technique for collecting, analyzing, and visually representing spatial data. These data are then used to make informed decisions across several water resource fields (Raheja et al. 2022a; Awasthi et al. 2023). Omeka et al. (2023) used GIS tools to evaluate the water quality for irrigation purposes in southeastern parts of Obubra LGA, Nigeria, and the results indicated that geographical information system–analytical hierarchical process (GIS-AHP) techniques are reliable for understanding overall groundwater quality. Islam et al. (2018) utilized GIS methods to estimate the groundwater quality in Bangladesh and concluded that spatial maps of the studied area offer dependable information for policymakers. Amrani et al. (2022) evaluated the suitability of groundwater in the Timahdite–Almis Guigou region of Morocco. Their findings suggested that groundwater quality ranged from good to poor for drinking purposes, while the groundwater was appropriate for irrigation uses.
Over the past few years, several scientists have employed machine learning (ML) modeling to forecast groundwater quality for irrigation purposes, aiming to address these challenges. ML approaches were reported to perform better than traditional methods (Abba et al. 2020; Guo et al. 2021; Mosavi et al. 2021; Kouadri et al. 2022; Abu El-Magd et al. 2023; Raheja et al. 2024b). Recently, El Bilali et al. (2021) evaluated groundwater quality for irrigation purposes using ML models in the Berrechid aquifer of Morocco. Egbueri & Agbasi (2022) predicted the water quality index (WQI) and overall index of pollution (OIP) using a multiple linear regression (MLR) model and multilayer perceptron neural networks (MLP-NN) in Nigeria, and found that ML approaches are helpful in the assessment, classification, and forecasting of water quality indices. Ewaid et al. (2018) developed a predictive MLR model to quickly forecast water quality indices for the agricultural, industrial, and drinking uses of the Tigris River in Iraq. Kouadri et al. (2022) used three ML models, namely long short-term memory (LSTM), MLR, and artificial neural network (ANN), to forecast the groundwater quality for irrigation in the Maharashtra region of India. Another study by Aldrees et al. (2023) used multi-expression programming (MEP) and random forest (RF) regression models to predict water quality in the Indus River Basin of Pakistan. Singha et al. (2021) used XGBoost, RF, and ANN models for groundwater quality evaluation in the Raipur region of India. In addition to established ML tools and techniques, newer methods are being developed to enhance the accuracy and reliability of data mining, forecasting, and data extraction in water quality. Consequently, water quality research has been progressing steadily in recent times.
With a view to assessing groundwater quality, the current study was carried out in the central-western portion of Haryana state (India), where groundwater serves as the predominant water source for drinking and irrigation purposes. So far, no study has estimated IWQ using a combined approach involving several indices and ML algorithms specifically for this central-western part of Haryana. Hence, there is significant value in broadly characterizing the hydrogeochemistry of groundwater, assessing water quality, and identifying the governing factors within the region to support the sustainable utilization of groundwater resources. This study was therefore carried out with the following objectives: (1) to evaluate groundwater quality for irrigation purposes using several indices, namely SAR, PI, Na%, KR, magnesium adsorption ratio (MAR), and IWQI; (2) to determine the controlling factors in groundwater and their spatial distribution using ArcGIS; (3) to predict the IWQI using advanced ML algorithms; and (4) to compare the performance of three ML algorithms, namely RF, SVM, and XGBoost, and select the most suitable model. The research outcome aims to support farmers and groundwater development in semi-arid countries by allowing rapid, low-cost prediction of water quality for irrigation.
Description of the study area
The research area, comprising the Hisar, Rohtak, and Jind districts, is situated in the central-western region of Haryana, India. Hisar district, in the west-central part of the state, covers a geographical area of 3,983 km² and lies between 28° 56′ 00″ and 29° 38′ 30″ North latitude and 75° 21′ 12″ and 76° 18′ 12″ East longitude. Rohtak district lies between 28° 40′ and 29° 05′ North latitude and 76° 13′ and 76° 51′ East longitude, encompassing an area of 1,745 km², with elevations ranging from 216 to 275 m (CGWB 2013a). Jind district spans 29° 03′ to 29° 51′ North latitude and 75° 53′ 00″ to 76° 45′ 30″ East longitude, covering a geographical area of 2,702 km².
Climate and rainfall
The climate of the Hisar area falls within the arid category, with hot summers and cold winters. Temperatures fluctuate between 5.0°C (minimum in January) and 43°C (maximum in May and June). The average annual rainfall in Hisar is 330 mm, with 75–80% of the precipitation occurring during the southwest monsoon period from June to September. The Rohtak area is situated in a subtropical semi-arid region, characterized by hot summers and cold winters. The average temperature ranges from 7°C in January to 40.5°C in May and June. The approximate annual rainfall in Rohtak is 592 mm. The climate of the Jind area is tropical, with extremely hot summers and cold winters. The average annual rainfall in Jind is around 515 mm (CGWB 2013b).
Geology and drainage system
The study area falls within the Indo-Gangetic Plains, shaped by the sedimentary deposition of the river systems traversing the region. Soil characteristics play an essential role in determining both the quality and quantity of groundwater. Soils are derived from rocks through weathering and erosion, influenced by geological agents such as rivers, wind, and rainfall, which vary with the prevailing climate conditions. The primary geological formations found in the study area include Aeolian Deposits, Alluvium, Proterozoic Rocks, and the Aravalli Super Group. Natural drainage is absent in the study area, which is instead drained by a canal network and artificial drains.
Digital elevation model
MATERIALS AND METHODOLOGY
Data collection
Field and laboratory analysis
The pH, electrical conductivity (EC), and total dissolved solids (TDS) were measured on site using a portable pH/TDS/EC meter. Calcium (Ca²⁺), magnesium (Mg²⁺), and total hardness (TH) were determined using the ethylenediaminetetraacetic acid (EDTA) titration method. Sodium (Na⁺) and potassium (K⁺) were measured using the flame photometer method. Among the anions, chloride (Cl⁻), carbonate (CO₃²⁻), and bicarbonate (HCO₃⁻) were determined by titration, while nitrate (NO₃⁻), sulfate (SO₄²⁻), and fluoride (F⁻) were measured using a spectrophotometer.
Data reliability check
All the above concentrations of ions are in mg/L. The IBE value falls within the acceptable limit of ±10% for all groundwater samples.
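The equation for the IBE did not survive in this copy of the text. As a minimal sketch, assuming the conventional charge-balance form (sum of cations minus sum of anions, divided by their total, in percent) and assuming that the mg/L concentrations are first converted to meq/L using standard equivalent weights, the check could look like this. The function and dictionary names are illustrative only.

```python
# Equivalent weights (g/eq) = molar mass / |charge|; standard chemistry values.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,        # cations
    "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02, "CO3": 30.00,   # anions
    "NO3": 62.00, "F": 19.00,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "SO4", "HCO3", "CO3", "NO3", "F")


def to_meq(sample_mg_l):
    """Convert a {ion: mg/L} mapping to {ion: meq/L}."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def ion_balance_error(sample_mg_l):
    """Charge-balance error in percent (assumed conventional form)."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def is_reliable(sample_mg_l, limit=10.0):
    """True when the error lies within the +/-10% limit quoted in the text."""
    return abs(ion_balance_error(sample_mg_l)) <= limit
```

A sample would be retained when `is_reliable(sample)` is true, which mirrors the ±10% acceptance criterion stated above.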
Indexing approach
Numerous indices, such as SAR, PI, Na%, KR, MAR, and IWQI, were computed to classify the groundwater for irrigation uses as shown in Table 1.
| Sr. No. | Parameter | Reference |
|---|---|---|
| 1. | SAR | Richards (1954) |
| 2. | PI | Doneen (1964) |
| 3. | Na% | Wilcox (1955) |
| 4. | KR | Kelly (1956) |
| 5. | MAR | Paliwal KV (1972) |
| 6. | IWQI | Meireles et al. (2010) |
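The formulas for these indices are not reproduced in this copy of the text. The sketch below assumes the forms most commonly given in the literature for SAR, PI, Na%, KR, and MAR, with all ion concentrations expressed in meq/L. It is an illustration under those assumptions rather than the authors' exact implementation, and the function name is hypothetical.

```python
import math


def irrigation_indices(ca, mg, na, k, hco3):
    """Return common irrigation indices; all inputs are assumed to be in meq/L."""
    return {
        # Sodium adsorption ratio
        "SAR": na / math.sqrt((ca + mg) / 2.0),
        # Permeability index (%)
        "PI": 100.0 * (na + math.sqrt(hco3)) / (ca + mg + na),
        # Sodium percentage (%)
        "Na%": 100.0 * (na + k) / (ca + mg + na + k),
        # Kelly's ratio
        "KR": na / (ca + mg),
        # Magnesium adsorption ratio (%)
        "MAR": 100.0 * mg / (ca + mg),
    }


# Example with made-up concentrations (not data from the study).
example = irrigation_indices(ca=4.0, mg=4.0, na=4.0, k=0.0, hco3=4.0)
```

The IWQI is handled separately because it aggregates several weighted parameters rather than following a single ratio.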
Irrigation water quality index (IWQI)
The IWQI is a dimensionless index used to evaluate water quality for irrigation purposes based on its chemical composition. It provides a comprehensive evaluation by considering multiple water quality parameters and assigning a single index value to indicate the overall quality of the water for irrigation (Meireles et al. 2010; Masoud et al. 2022). It was calculated using five parameters: EC, SAR, Na⁺, Cl⁻, and HCO₃⁻. The calculation involves assigning a weight to each parameter based on its importance for irrigation and then aggregating the weighted parameters to obtain an overall index value. This index value typically varies from 0 to 100, where higher values indicate better water quality for irrigation. The detailed description to calculate IWQI is as follows:
| Value | EC | SAR | Na⁺ | Cl⁻ | HCO₃⁻ |
|---|---|---|---|---|---|
| wi | 0.211 | 0.189 | 0.202 | 0.194 | 0.204 |
| qi | EC | SAR | Na⁺ | Cl⁻ | HCO₃⁻ |
|---|---|---|---|---|---|
| 85–100 | 200–750 | <3 | 2–3 | <4 | 1–1.5 |
| 60–85 | 750–1,500 | 3–6 | 3–6 | 4–7 | 1.5–4.5 |
| 35–60 | 1,500–3,000 | 6–12 | 6–9 | 7–10 | 4.5–8.5 |
| 0–35 | <200 or >3,000 | >12 | <2 or >9 | >10 | <1 or >8.5 |
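The equations that accompany these tables are missing from this copy. The sketch below assumes the formulation usually attributed to Meireles et al. (2010): within each class, the quality rating is interpolated linearly as q_i = q_max − (x − x_inf) · q_amp / x_amp, and the index is the weighted sum IWQI = Σ q_i · w_i. It also assumes EC in µS/cm and ion concentrations in meq/L. The treatment of the open-ended class (an `x_max` supplied by the user, and a fallback rating of 0 otherwise) is a simplification introduced here for illustration.

```python
# Weights w_i as listed in the weight table above.
WEIGHTS = {"EC": 0.211, "SAR": 0.189, "Na": 0.202, "Cl": 0.194, "HCO3": 0.204}

# (x_lower, x_upper, q_lower, q_upper) for the three bounded classes.
CLASSES = {
    "EC":   [(200, 750, 85, 100), (750, 1500, 60, 85), (1500, 3000, 35, 60)],
    "SAR":  [(0, 3, 85, 100), (3, 6, 60, 85), (6, 12, 35, 60)],
    "Na":   [(2, 3, 85, 100), (3, 6, 60, 85), (6, 9, 35, 60)],
    "Cl":   [(0, 4, 85, 100), (4, 7, 60, 85), (7, 10, 35, 60)],
    "HCO3": [(1, 1.5, 85, 100), (1.5, 4.5, 60, 85), (4.5, 8.5, 35, 60)],
}


def quality_rating(param, x, x_max=None):
    """Linear interpolation of q_i inside the class that contains x."""
    classes = CLASSES[param]
    for x_lo, x_hi, q_lo, q_hi in classes:
        if x_lo <= x < x_hi:
            return q_hi - (x - x_lo) * (q_hi - q_lo) / (x_hi - x_lo)
    top = classes[-1][1]
    if x >= top and x_max is not None and x_max > top:
        # Open-ended class: q falls from 35 to 0 between the last limit and x_max.
        return max(0.0, 35.0 - (x - top) * 35.0 / (x_max - top))
    # Simplified fallback (assumption): lowest rating when no bound is available.
    return 0.0


def iwqi(sample, x_max=None):
    """Weighted sum of quality ratings for one sample ({parameter: value})."""
    x_max = x_max or {}
    return sum(w * quality_rating(p, sample[p], x_max.get(p))
               for p, w in WEIGHTS.items())
```

For the open-ended class, `x_max` would typically be the largest value of that parameter observed in the dataset, so the rating decreases smoothly toward zero for the most saline samples.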
ML algorithms
In this study, three ML algorithms, namely SVM, RF, and XGBoost, are used to predict the IWQI. Extensive literature on these algorithms can be found in Sánchez A (2003), Goel & Pal (2009), Singh et al. (2011), Chen & Guestrin (2016), Chen et al. (2020), El Bilali et al. (2021), Raheja et al. (2022b) and Aldrees et al. (2023).
Random forest
RF is a widely used method for classification and regression tasks. It combines many tree predictors, with each tree generated independently from a bagged sample (67% of the training samples). In regression, each tree predictor produces numerical values, whereas the RF classifier produces class labels (Breiman 2001). Because the aim here is regression, this section focuses exclusively on the regression tree (RT). At every branching point of the RT, the mean of the samples in each candidate leaf node is calculated, and the mean square error (MSE) between each sample and that mean is determined. The split giving the lowest MSE among the leaf nodes is used as the branching condition, and the RT continues to grow until no more features are available or until the overall MSE reaches its optimal point (El Bilali & Taleb 2020; Zhang et al. 2021).
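The branching rule described above can be illustrated for a single feature. This is a minimal sketch of MSE-based split selection in a regression tree, not the RF implementation used in the study. The helper names are hypothetical, and the candidate thresholds are taken as midpoints between consecutive sorted feature values, which is a common convention assumed here.

```python
def mse(values):
    """Mean square error of values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, weighted_mse) of the best binary split on one feature."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        # Size-weighted MSE of the two child nodes.
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree applies this search recursively over all features, and a forest averages the predictions of many such trees grown on bagged samples.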
Support vector machine
Here, k is the kernel function (linear, radial basis, or polynomial), ω and b are the weight vector and bias term, and C is a pre-specified value that penalizes the training error, whereas ξi− and ξi+ are slack variables for the lower and upper constraints on the output. As with other ML algorithms, methods such as Grid Search, Random Search, and Bayesian Optimization are commonly used to fine-tune the SVM hyperparameters, which can significantly influence the performance of the algorithm. For further details about the SVM model, readers are referred to Cortes & Vapnik (1995) and Brereton & Lloyd (2010).
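The optimization problem that these symbols belong to is not reproduced in this copy. For orientation, the standard ε-insensitive support vector regression formulation is sketched below. It is assumed, not confirmed, that the original equation follows this form, and the feature map φ and tube width ε are symbols introduced here.

```latex
\begin{aligned}
\min_{\omega,\, b,\, \xi^{+},\, \xi^{-}} \quad
  & \frac{1}{2}\lVert \omega \rVert^{2}
    + C \sum_{i=1}^{n} \left( \xi_i^{+} + \xi_i^{-} \right) \\
\text{subject to} \quad
  & y_i - \left( \omega \cdot \phi(x_i) + b \right) \le \varepsilon + \xi_i^{+}, \\
  & \left( \omega \cdot \phi(x_i) + b \right) - y_i \le \varepsilon + \xi_i^{-}, \\
  & \xi_i^{+},\ \xi_i^{-} \ge 0, \qquad i = 1, \dots, n, \\
\text{with} \quad
  & k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j).
\end{aligned}
```

In this form, a larger C penalizes deviations outside the ε-tube more strongly, which is why C and ε appear among the tuned hyperparameters.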
Extreme gradient boosting (XGBoost)
The study's limitations include its reliance on data from specific periods, which may miss long-term trends or the effects of extreme weather. While the ML algorithms used performed well in predicting IWQI, exploring other advanced models and expanding the study's temporal and spatial scope would improve accuracy and account for complex interactions among variables.
Metric evaluation algorithms
The performance of the different ML algorithms for IWQI prediction was evaluated using four statistical criteria: Pearson's correlation coefficient (r), root mean square error (RMSE), mean absolute error (MAE), and relative bias (RBIAS). These criteria are described in Table 4.
| Statistical indicator | Description |
|---|---|
| Pearson's correlation coefficient (r) | r = 1 signifies the strongest correlation between the actual and forecast values, although it does not necessarily imply the best-fitting model. r < 1 suggests a model that fits the data to a lesser extent. |
| Root mean square error (RMSE) | A lower RMSE value suggests a more accurate fit for the model. |
| Mean absolute error (MAE) | A lower MAE value indicates a higher accuracy of the model, as it signifies that the predicted values are closer to the observed values on average. |
| Relative bias (RBIAS) | RBIAS > 0 implies that the model tends to underestimate the target magnitude, and RBIAS < 0 suggests that the model tends to overestimate it. RBIAS = 0 indicates a perfect model. A larger absolute value of RBIAS indicates a larger bias in the model. |
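The formulation column of this table did not survive extraction. The sketch below uses the textbook definitions of r, RMSE, and MAE. For RBIAS it assumes the form 100 · Σ(observed − predicted) / Σ observed, chosen because it matches the sign convention described above (positive values mean underestimation). Function names are illustrative.

```python
import math


def pearson_r(obs, pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (std_o * std_p)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rbias(obs, pred):
    """Relative bias in percent (assumed form; positive means underestimation)."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)
```

Because the squared errors in RMSE weight large deviations more heavily, RMSE is never smaller than MAE on the same data, which is a useful sanity check when reading a results table.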
EXPERIMENTAL RESULTS AND DISCUSSION
Sodium adsorption ratio (SAR)
SAR is an important parameter for evaluating the suitability of water for irrigation and for managing the risks associated with high sodium concentrations in soil and water, because Na⁺ at high concentrations can harm soil structure and plant growth. SAR is classified into four classes: SAR < 10, water is considered excellent for irrigation; 10 ≤ SAR < 18, water may be suitable for irrigation; 18 ≤ SAR < 26, water is normally not suitable; and SAR > 26, water is unsuitable for irrigation owing to its high sodium content. The SAR classification indicates that, of the pre-monsoon samples, 30.88% have low sodium, 22.06% have medium sodium, 9.56% fall in the doubtful range (high sodium content), and 37.5% have very high sodium, which is unsuitable for irrigation. In the post-monsoon period, 34.56% of samples fall in the low sodium category, 16.92% in the medium sodium range, 13.23% in the high sodium category, and 35.29% in the category not suitable for irrigation (Table 5).
| Parameter | Water type | Range | Samples in pre-monsoon (%) | Samples in post-monsoon (%) |
|---|---|---|---|---|
| SAR | Low sodium | <10 | 30.88 | 34.56 |
| SAR | Medium sodium | 10–18 | 22.06 | 16.92 |
| SAR | High sodium | 18–26 | 9.56 | 13.23 |
| SAR | Very high sodium | >26 | 37.5 | 35.29 |
| PI | Excellent | >75 | 2.94 | 2.94 |
| PI | Good | 25–75 | 46.32 | 45.58 |
| PI | Unsuitable | <25 | 50.74 | 51.48 |
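The percentages in this table follow from assigning each sample to a class and counting. A minimal sketch for the SAR classes is given below. The helper names are hypothetical, and a value of exactly 26 is placed in the top class, which is an assumption because the text leaves that boundary unspecified.

```python
from collections import Counter

# Upper limits of the first three SAR classes, as given in the table above.
SAR_CLASSES = [(10, "Low sodium"), (18, "Medium sodium"), (26, "High sodium")]


def sar_class(sar):
    """Return the SAR water type for one sample."""
    for upper, label in SAR_CLASSES:
        if sar < upper:
            return label
    return "Very high sodium"


def class_percentages(values, classify):
    """Percentage of samples falling in each class, rounded to two decimals."""
    counts = Counter(classify(v) for v in values)
    return {label: round(100.0 * n / len(values), 2)
            for label, n in counts.items()}


# Example with made-up SAR values (not data from the study).
shares = class_percentages([5, 12, 20, 30], sar_class)
```

The same pattern applies to PI, Na%, KR, and MAR by swapping in the corresponding class limits.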
Permeability index
Sodium percentage (Na %)
| Parameter | Water type | Range | Samples in pre-monsoon (%) | Samples in post-monsoon (%) |
|---|---|---|---|---|
| Na % | Excellent | <20 | 34.55 | 14.71 |
| Na % | Good | 20–40 | 30.14 | 36.76 |
| Na % | Permissible | 40–60 | 25 | 28.68 |
| Na % | Doubtful | 60–80 | 8.82 | 17.64 |
| Na % | Unsuitable | >80 | 1.47 | 2.21 |
| KR | Suitable | <1 | 84.56 | 84.56 |
| KR | Unsuitable | >1 | 15.44 | 15.44 |
| MAR | Suitable | <50 | 30.89 | 27.95 |
| MAR | Unsuitable | >50 | 69.11 | 72.05 |
Kelly's ratio
Magnesium adsorption ratio
Groundwater quality assessment using IWQI
Table 7 provides the IWQI classification along with recommendations. As per the IWQI classification, 47.06% of groundwater samples were classified as ‘Severe Restriction,’ 49.26% as ‘High Restriction,’ and 3.68% as ‘Moderate Restriction’ during the pre-monsoon period. In the post-monsoon period, 25% of groundwater samples fall in ‘Severe Restriction,’ 64.71% in ‘High Restriction,’ and 10.29% in the ‘Moderate Restriction’ class. No groundwater sample lies in the low or no restriction class of IWQI during either period.
| IWQI range | Water use restrictions | Samples in pre-monsoon (%) | Samples in post-monsoon (%) | Recommendations |
|---|---|---|---|---|
| 85 < IWQI ≤ 100 | No restriction (NR) | 0 | 0 | It is suggested for most soil types without a significant risk of salinity and alkalinity issues. Including leaching into irrigation practices is advisable, and it may not be suitable for soils with exceptionally poor permeability. |
| 70 < IWQI ≤ 85 | Low restriction (LR) | 0 | 0 | It is suggested for application in irrigated soils with a low texture or medium permeability rate, where salt leaching is recommended. In heavy-textured soils, soil alkalinity issues may arise, and it is recommended to avoid using it in soils with high clay content, precisely with a 2:1 clay ratio. |
| 55 < IWQI ≤ 70 | Moderate restriction (MR) | 3.68 | 10.29 | Appropriate for application in soils with medium to high permeability values, with a recommendation for moderate salt leaching. |
| 40 < IWQI ≤ 55 | High restriction (HR) | 49.26 | 64.71 | Appropriate for utilization in soils with excellent permeability and the absence of compacted layers. |
| 0 < IWQI ≤ 40 | Severe restriction (SR) | 47.06 | 25 | Under typical situations, it is advisable to avoid using it for irrigation. However, in exceptional situations, irregular usage may be considered. When dealing with water containing low salt concentrations and a high SAR, the application of gypsum is required. |
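The class boundaries in this table translate directly into a lookup. The sketch below maps an IWQI value to its restriction class using the half-open intervals shown above (lower bound exclusive, upper bound inclusive). The function name is illustrative.

```python
# Lower bounds (exclusive) of each class, from best to worst.
IWQI_CLASSES = [
    (85, "No restriction (NR)"),
    (70, "Low restriction (LR)"),
    (55, "Moderate restriction (MR)"),
    (40, "High restriction (HR)"),
]


def iwqi_restriction(iwqi_value):
    """Return the water use restriction class for an IWQI value in (0, 100]."""
    for lower, label in IWQI_CLASSES:
        if iwqi_value > lower:
            return label
    return "Severe restriction (SR)"
```

Applied to predicted IWQI values, the same lookup lets model output be reported in the categories that irrigation planners use.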
Hydrogeochemistry characteristics
Gibbs diagrams, established by Gibbs in 1970, serve as valuable tools to analyze the sources of dissolved chemical constituents in groundwater and help interpret the hydrochemical processes controlling water chemistry. They consist of two diagrams: one plots the ratio Na⁺/(Na⁺ + Ca²⁺) on the horizontal axis against TDS on the vertical axis, while the other plots the ratio Cl⁻/(Cl⁻ + HCO₃⁻) against TDS. These diagrams delineate three zones representing the dominant mechanisms of aquifer chemistry: evaporation dominance, rock dominance, and precipitation dominance (Gibbs 1970).
Chadha diagram
Performance of ML algorithms
For this part of the research, the dataset was partitioned by trial and error, selecting the split that gave the maximum r-value and minimum RMSE value. On this basis, 70% (190 samples) and 30% (82 samples) of the entire dataset were used for the training (calibration) and testing (validation) phases, respectively. All three ML methods require several hyperparameters, which differ between algorithms and significantly affect their performance. In this study, the RF algorithm uses Max Depth, Min Samples Leaf, Min Samples Split, and N Estimators. For SVM, the parameters are C, Degree, Epsilon, and Gamma. For XGBoost, the hyperparameters are Learning Rate, Iterations, Depth, and Subsample. Three search methods, Grid Optimization, Random Optimization, and Bayesian Optimization, were employed to determine the optimal hyperparameters and thereby improve the performance of each ML algorithm. This approach proved to be faster and more efficient than conventional optimization methods. The model yielding the best predictions, with its optimal hyperparameters, was chosen to estimate the IWQI within the study area. Table 8 provides the hyperparameters and functions selected for IWQI prediction.
| Algorithm | Approach | Description of parameters and functions |
|---|---|---|
| RF | Grid | Minimum samples split = 2, Maximum depth = 15, Minimum samples leaf = 2, n estimators = 100 |
| RF | Random | Minimum samples split = 7, Maximum depth = 7, Minimum samples leaf = 1, n estimators = 200 |
| RF | Bayes | Minimum samples split = 2.32, Maximum depth = 10.0, Minimum samples leaf = 1.0, n estimators = 69.10 |
| SVM | Grid | C = 10, Degree = 2, Epsilon = 1, Gamma = 1 |
| SVM | Random | C = 6.95, Degree = 4, Epsilon = 0.3197, Gamma = scale |
| SVM | Bayes | C = 9.47, Degree = 3.15, Epsilon = 0.915, Gamma = 1.0 |
| XGBoost | Grid | Learning rate = 0.1, Iterations = 100, Depth = 3, Subsample = 0.5 |
| XGBoost | Random | Learning rate = 0.1, Iterations = 410, Depth = 6, Subsample = 0.5 |
| XGBoost | Bayes | Learning rate = 0.021, Iterations = 196.6, Depth = 3.59, Subsample = 0.51, Gamma = 0.128 |
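The 70/30 partition and the random search strategy described in this section can be sketched with the standard library alone. This is an illustration of the procedure, not the study's code: the search space values and the `objective` callable are hypothetical, and in practice the objective would train a model with the sampled hyperparameters and return its validation RMSE.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle and split; 272 samples give 190 for training and 82 for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def random_search(objective, space, n_iter=50, seed=0):
    """Sample n_iter random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, loosely named after the XGBoost settings in the table.
SPACE = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "depth": [3, 4, 5, 6],
    "subsample": [0.5, 0.75, 1.0],
}
```

Unlike grid search, which evaluates every combination, random search evaluates a fixed budget of sampled configurations, so its cost does not grow with the size of the grid.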
| Algorithm | Approach | r | RMSE | MAE | RBIAS |
|---|---|---|---|---|---|
| RF | Grid | 0.8416 | 60.6857 | 3.065 | −0.634% |
| RF | Random | 0.8511 | 60.6761 | 2.993 | −0.564% |
| RF | Bayesian | 0.7074 | 61.1891 | 4.084 | −1.747% |
| SVM | Grid | 0.7907 | 61.3005 | 8.393 | −2.393% |
| SVM | Random | 0.7927 | 61.3234 | 8.340 | −2.490% |
| SVM | Bayesian | 0.7932 | 61.3127 | 8.365 | −2.445% |
| XGBoost | Grid | 0.8776 | 60.6139 | 2.757 | −0.212% |
| XGBoost | Random | 0.8829 | 60.5369 | 2.754 | 0.203% |
| XGBoost | Bayesian | 0.8569 | 60.6958 | 2.976 | 0.606% |
The XGBoost model with the random search method performs best in terms of r, RMSE, MAE, and RBIAS.
CONCLUSIONS
The present study investigates hydrogeochemistry and groundwater suitability for irrigation uses. The following conclusions are drawn:
Various indices were analyzed for IWQ assessment, and it was found that PI and MAR exceed the limits, mainly reflecting the salinity and magnesium hazards in groundwater. The SAR, Na %, and KR values suggest that the groundwater is suitable for irrigation.
The IWQI classification shows that 47.06% of samples in the pre-monsoon period and 25% of samples in the post-monsoon period fall in the Severe Restriction class, for which use in irrigation should be avoided under normal conditions.
The GIS-based IWQI maps show that groundwater quality improved in the post-monsoon period compared to the pre-monsoon period, and that the western part of the study area is unsuitable for irrigation.
The Gibbs and Chadha plots illustrate that many ions present in groundwater originate from processes dominated by evaporation, water–rock interactions, and reverse ion exchange.
The XGBoost model, particularly when optimized using the random search method, showed superior performance in forecasting the IWQI, with the highest correlation coefficient (r = 0.8829) and the lowest root mean square error (RMSE = 60.5369) and mean absolute error (MAE = 2.754) among the tested models. Furthermore, the relative bias (RBIAS) of 0.203% underscores the model's capability to maintain consistency in its predictions.
The findings of this study offer valuable insights into the planning and management of existing water resources within this region. However, the study also has certain limitations. The data used for the analysis were confined to specific periods, which may not capture long-term trends or the impact of extreme weather events. Additionally, while the XGBoost model performed well in predicting IWQI, it is essential to explore other advanced ML models that could further enhance predictive accuracy and consider more complex interactions among variables. Future research in this region should aim to expand the temporal and spatial scale of the study to include more extensive data over multiple years and different seasons.
ACKNOWLEDGEMENTS
The authors express appreciation to the National Institute of Technology, Kurukshetra-136119, Haryana, India for providing essential research facilities.
FUNDING
We extend our gratitude to the Ministry of Education (MOE), Government of India (GOI), for their support in funding this study through a scholarship awarded to the first author, Hemant Raheja (Grant No. 2K19/NITK/PHD/61900011).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors have no conflict of interest.