This study evaluates and predicts six irrigation water quality indices, namely sodium adsorption ratio (SAR), Kelly's ratio (KR), percentage sodium (%Na), permeability index (PI), exchangeable sodium percentage (ESP), and irrigation water quality index (IWQI), using multivariate regression models (MLR, PLSR, PCR, and WLSR) and machine learning (ML) algorithms (ANN, SVM, CART, CRRF, and KNN). The study analyzes data from 360 dug wells in Sundargarh district, India, for 2014–2021, with 70% of the data used for training and 30% for testing. Spatial mapping of SAR, KR, ESP, and PI indicates that the groundwater is highly suitable for irrigation. The Mann–Kendall trend test shows a monotonic increasing and decreasing trend for SAR, KR, %Na, ESP, PI, and IWQI, respectively, at p > 0.05 during 2014–2021. Principal component analysis and discriminant analysis identify Na+, SAR, KR, %Na, and PI as the most influential WQ variables affecting groundwater quality in the study area. MLR and WLSR models are superior in predicting SAR and ESP, while ANN is the best-suited ML model for SAR, KR, %Na, PI, and ESP. CRRF predicts IWQI with relatively higher accuracy. These findings demonstrate the effectiveness of ML models in improving irrigation water quality assessment and provide valuable insights for groundwater-based crop management.

  • Developed predictive models for water quality indices.

  • Utilized data from 360 dug wells encompassing various water quality parameters.

  • Compared multivariate regression models with machine learning algorithms to determine the most accurate predictive models.

  • Performed trend analysis to identify significant trends in water quality over time.

  • Provided actionable insights for agricultural water management.

Groundwater (GW) contributes significantly to water resources management, particularly in countries across West Asia and North Africa that face considerable surface water scarcity (Li et al. 2022). The agricultural sector is the largest global consumer of GW, especially in West Asian countries such as Qatar, Oman, and Iran and South Asian countries such as India, and GW serves as a vital resource for the socioeconomic development of any country (Ibrahim et al. 2023). Various natural processes, improper sewage disposal, and geochemical activities can render surface water unsuitable for irrigation and other uses (Benkov et al. 2023). The use of GW therefore offers distinct advantages in terms of constancy and dependability, particularly for cultivation (Mohammed et al. 2022). However, the ongoing expansion of irrigated land and uncontrolled exploitation of GW may lower the GW level and increase the potential for contamination (Eyankware et al. 2022). Proper investigation and evaluation of GW quality is of utmost importance for crop productivity and soil fertility (El Bilali et al. 2021). Salt content in GW may adversely affect soil fertility and reduce the biomass production of crops, regardless of its quantity (Mohammed et al. 2023). In addition, variations in climatic conditions, soil characteristics, irrigation practices, and anthropogenic activities have a noticeable impact on GW quality (Barbieri et al. 2023; Dao et al. 2023). GW quality may be poorer in the monsoon season than in the pre-monsoon season: a comparatively higher percolation rate may allow more inorganic fertilizer to accumulate through irrigation return flow, and the rising water table dissolves solutes accumulated in the vadose zone, which increases ion concentrations and can contaminate GW significantly (Rajmohan 2020; Mohammed et al. 2023).
Therefore, GW quality can be effectively monitored and evaluated during each irrigation cycle. In this context, irrigation water quality indices (IWQIs) such as SAR, PI, ESP, KR, and %Na are considered the most significant WQ parameters for evaluating excess Na+, which affects soil structure and plant growth (Sreedevi et al. 2019; Kouadra & Demdoum 2020; Chidambaram et al. 2022). In addition, the irrigation water quality index (IWQI) is a regular predictor that reflects the collective influence of all the WQ variables in a large dataset and helps to maintain the desired quality of surface water and GW through spatial monitoring (El-Rawy et al. 2023). Several multivariate statistical approaches, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), and discriminant analysis (DA), have been used in many studies to reduce the complexity of large datasets by selecting the variables that most influence the WQI (Benkov et al. 2023).

In recent years, several machine learning (ML), artificial intelligence (AI), and deep learning (DL) techniques have provided statistical, analytical, and predictive algorithms for modeling water quality parameters (Abuzir & Abuzir 2022; Mukherjee et al. 2022; Nong et al. 2023). El Bilali & Taleb (2020) used seven ML approaches, i.e. ANN, MLR, decision tree (DT), RF, SVR, KNN, and stochastic gradient descent (SGD), to predict SAR, adjusted SAR (SARa), ESP, %Na, RSC, PI, KR, Cl, MAR, and TDS using EC and pH as input parameters in the surface water of the Bouregreg watershed in Morocco. Kulisz et al. (2021) developed an ANN model to predict the WQI of GW in an area of eastern Poland from five input variables (pH, EC, Ca2+, Na+, and Mg2+), achieving a high R2 of 0.99. Dimple et al. (2022) used five ML techniques, i.e. linear regression (LR), random subspace (RS), additive regression (AR), reduced error pruning tree (REPTree), and support vector machine (SVM), to predict six irrigation WQ indices (SAR, %Na, KR, PI, SSP, and MH) in GW of the Nand Samand catchment of Rajasthan, India.

Sundargarh district possesses relatively high GW potential, attributed to its distinct hydrogeological and lithological features. Major lithological units of unconsolidated formations, such as highly porous laterite, alluvium, and ferruginous sandstone, exhibit good GW potential. The annual GW extraction rate increased by 12.5% between 2017 and 2022 to meet the substantial irrigation demand, which accounts for 80% of total usage (CGWB 2017, 2022). Deolipali and Krinjikela were affected by nitrate pollution in 2013, and a fluoride concentration of more than 1.5 ppm was recorded in the exploratory well at Sargipalli (CGWB 2013). Hence, accurate monitoring and maintenance of safe GW quality, with proper assessment of the necessary water quality parameters, is a major concern for drinking, industrial, and irrigation purposes in this district. Some photographs related to GW extraction and laboratory investigation of GW quality are shown in Figure 1.
Figure 1

(a) Field collection of the GW sample and (b) laboratory investigation of the GW sample in the regional laboratory, Bhubaneswar for Sundargarh district.


Motivation

Agriculture is the backbone of the local economy of this district, and crop cultivation may suffer significant surface water scarcity owing to the very limited availability of water bodies (1.21%). Hence, GW can serve as an effective alternative to meet the water demands of various crops. Paddy, the primary crop of this district, occupies a substantial share of the net cultivable area (76%) and needs a considerable amount of water throughout its growing period. Therefore, the motivation behind this research is to explore the quality of GW for agricultural purposes by evaluating several important IWQIs and identifying the most and least sensitive chemical water quality parameters responsible for the spatial variation of GW quality throughout the district. Furthermore, GW extraction has shown an increasing trend over the past few years (2009–2022). Hence, evaluation and future prediction of GW quality parameters is of utmost importance for maintaining crop health in Sundargarh district. To date, no similar study has been conducted for this district, which provides both the opportunity and the motivation to undertake the present research.

Building on previous studies, a comprehensive analysis is required to obtain detailed insights into the GW quality of Sundargarh district. Thus, the present research aims to achieve the following objectives: (i) to show the spatio-temporal variation of SAR, KR, %Na, PI, ESP, and IWQI in RS and GIS platforms; (ii) to perform trend analysis of these IWQIs by the Mann–Kendall (MK) test, Sen's slope estimator, and graphical innovative trend analysis (ITA); (iii) to investigate factor importance through PCA and DA; and (iv) to predict these WQ variables using regression models (MLR, PCR, PLSR, and WLSR) and ML tools (ANN, SVM, CART, CRRF, and KNN) and to evaluate model performance through statistical metrics.

The study area is Sundargarh, the second largest district of Orissa (Figure 2). It lies between latitudes 21°36′N and 22°32′N and longitudes 83°32′E and 85°25′E, with an average elevation ranging from 190 to 328 m above mean sea level (m.s.l.). The district experiences a hot and dry climate, except during the monsoon season. Summer temperatures can reach 48 °C, while winter temperatures occasionally drop to 5 °C. The district receives an average annual rainfall of 1,564.2 mm, approximately 80% of which is contributed by the southwest monsoon. Sundargarh spans a geographical area of 9,748 km2, with land use comprising 47.3% tree cover, 25.5% cropland, and 21.4% rangeland. Its geomorphological features comprise 57.7% pediment pediplain complex and 26% moderately dissected hills and valleys, which are responsible for moderate to good GW prospects. The two primary soil textures found in this district are red sandy loam and clay loam. The major crops are paddy, maize, jowar, greengram, redgram, arhar, and sugarcane.
Figure 2

Location map of the study area (Sundargarh district) showing dug wells.


Data collection

GW quality parameters for the pre-monsoon season of 2014–2021 are obtained from the Central Groundwater Board (CGWB), India, for 360 dug wells located across various taluks within the district, as illustrated in Figure 2. These data consist of Ca2+, Na+, K+, Mg2+, Cl−, HCO3−, and SO42− along with pH, EC, and TDS. From these data, some useful IWQIs, namely sodium adsorption ratio (SAR), Kelly's ratio (KR), percentage sodium (%Na), residual sodium carbonate (RSC), potential salinity (PS), permeability index (PI), magnesium absorption ratio (MAR), soluble sodium percentage (SSP), exchangeable sodium percentage (ESP), and IWQI, are also determined.

Collection and preservation of GW samples

The CGWB of India conducts regional water level monitoring along with the collection of chemical water quality parameters during pre-monsoon, monsoon, and post-monsoon seasons through 1,608 National Hydrograph Network Stations (NHNS) in Orissa state. Open dug and bore wells are installed through shallow and deep phreatic aquifers at a depth of 30–300 m to facilitate the exploration of GW samples.

These GW samples are collected and preserved in accordance with the standards of the American Public Health Association (APHA 2005) and BIS. The physicochemical examination of the water samples is conducted in the regional laboratory at Bhubaneswar, Orissa. Plastic or polyethylene bottles of 1 L capacity are used to collect the GW samples. The samples are refrigerated at 4 °C for preservation, brought to the laboratory, and conditioned to 25 °C for the determination of EC; decomposition of solid samples is allowed to take place for 7 days before TDS is measured. For the determination of Ca2+ and Mg2+, the bottles are thoroughly cleaned with 6(N) HCl and rinsed with deionized water, and the samples are preserved by adjusting the pH to below 2 using 50% HNO3 at 2 mL/L of water. For Na+ and K+, the samples are preserved at pH ≤ 2 with concentrated HNO3. Cl− and SO42− can be determined after storing the samples for 28 days in the laboratory without refrigeration, and the water samples are treated with CH2O for sulfate determination. GW samples containing sulfites are required to be oxidized to sulfate by DO at pH > 8. Water samples are refrigerated at 4 °C for up to 14 days for measuring alkalinity as HCO3−. Samples containing bicarbonate alkalinity are handled carefully to avoid agitation and prolonged exposure to external conditions.

Analysis and interpretation of data

The spatial variation of SAR, KR, %Na, PI, ESP, and IWQI is mapped in ArcGIS 10.4 by the inverse distance weighting (IDW) technique at 30 m spatial resolution, projected to UTM zone 45N. Time-series trend analysis of SAR, KR, %Na, PI, ESP, and IWQI is carried out in RStudio using the MK test, Sen's slope (Q) estimator, and graphical ITA. PCA and DA are conducted in SPSS 23 and XLSTAT 2023, respectively. Four multivariate regression models, namely multiple linear regression (MLR), partial least square regression (PLSR), principal component regression (PCR), and weighted least square regression (WLSR), are implemented along with five ML tools, namely artificial neural network (ANN), SVM, classification and regression tree (CART), classification and regression random forest (CRRF), and K-nearest neighbour (KNN), to predict these WQ indices in SPSS 23, XLSTAT 2023, and MATLAB 2018R. The accuracy and predictive strength of these models are assessed using R2, RMSE, MAPE, NSEC, Willmott index of agreement (WI), variance accounted for (VAF), and RBIAS. A Friedman test is also conducted in XLSTAT 2023 to show the difference in prediction between the models. Figure 3 shows the overall methodology of the present research.
Figure 3

Flowchart illustrating detailed methodology.
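
The IDW interpolation used for spatial mapping can be illustrated with a minimal sketch. It is not the ArcGIS implementation: the function name `idw`, the power of 2, and the sample coordinates and values are assumptions made only for illustration.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from known `points`.

    `power` is an assumed distance exponent; larger values give nearer
    wells more influence.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(points, values):
        distance = math.dist(point, target)
        if distance == 0.0:
            return value  # target coincides with a sampled well
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical well coordinates (m) and SAR values
points = [(0.0, 0.0), (2.0, 0.0)]
values = [1.0, 3.0]
estimate = idw(points, values, (1.0, 0.0))  # midway between the two wells
```

Because the target is equidistant from both wells, the two weights are equal and the estimate is the plain average of the two values.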


Irrigation water quality indices

Sodium adsorption ratio

SAR is regarded as a key IWQI for assessing the alkali hazard in agricultural water and serves as an indicator of Na+ accumulation in soil (Gad et al. 2023). It can be calculated based on Equation (1a) according to Richards (1954):
$$\mathrm{SAR} = \frac{\mathrm{Na^{+}}}{\sqrt{(\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}})/2}} \tag{1a}$$

Kelly's ratio

KR is also an essential IWQI used for exhibiting the suitability of irrigation water by determining excess sodium ions. It can be calculated based on Kelly (1940):
$$\mathrm{KR} = \frac{\mathrm{Na^{+}}}{\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}}} \tag{1b}$$

Percentage sodium

Percentage sodium is determined as in Equation (1c) suggested by Todd & Mays (2004). It also shows the suitability of irrigation water affecting permeability of soil and crop development (Elsayed et al. 2020):
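$$\%\mathrm{Na} = \frac{\mathrm{Na^{+}} + \mathrm{K^{+}}}{\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}} + \mathrm{Na^{+}} + \mathrm{K^{+}}} \times 100 \tag{1c}$$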

Permeability index

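The concentrations of Na+, Ca2+, K+, Mg2+, and HCO3− over time can be effective indicators of PI (Dimple et al. 2022). Doneen (1964) suggested the following formula for the determination of PI:
$$\mathrm{PI} = \frac{\mathrm{Na^{+}} + \sqrt{\mathrm{HCO_3^{-}}}}{\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}} + \mathrm{Na^{+}}} \times 100 \tag{1d}$$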

Soluble sodium percentage

SSP is used to determine the salinity by comparing Na+ with Ca2+ and Mg2+ concentrations. It can be calculated based on Todd & Mays (2004):
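$$\mathrm{SSP} = \frac{\mathrm{Na^{+}}}{\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}} + \mathrm{Na^{+}}} \times 100 \tag{1e}$$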

Exchangeable sodium percentage

ESP is also an important indicator of alkali hazard in the irrigation water. A considerably adverse effect of Na+ in irrigation water is seen on soil crusting and infiltration capacity with ESP exceeding 15% (Laker & Nortjé 2019). Richards (1954) has suggested the evaluation of ESP as in Equation (1f):
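$$\mathrm{ESP} = \frac{100\,(-0.0126 + 0.01475\,\mathrm{SAR})}{1 + (-0.0126 + 0.01475\,\mathrm{SAR})} \tag{1f}$$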

Potential salinity

PS is an important water quality estimator mainly for irrigation purpose and can be determined based on Doneen (1964):
$$\mathrm{PS} = \mathrm{Cl^{-}} + \tfrac{1}{2}\,\mathrm{SO_4^{2-}} \tag{1g}$$

Magnesium absorption ratio

MAR is considered as one of the crucial indicators of irrigation water quality as proposed by Paliwal & Gandhi (1976). A higher magnesium content may affect the soil permeability and alkaline characteristics of soil:
$$\mathrm{MAR} = \frac{\mathrm{Mg^{2+}}}{\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}}} \times 100 \tag{1h}$$

Residual sodium carbonate

RSC is determined to predict the possibility of Ca2+ and Mg2+ precipitating on the soil surface. Water with an RSC above 2.5 is not suitable for irrigation and may drastically reduce crop yield:
$$\mathrm{RSC} = (\mathrm{CO_3^{2-}} + \mathrm{HCO_3^{-}}) - (\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}}) \tag{1i}$$
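
As a minimal sketch, the indices of this section can be computed from the major-ion concentrations as shown below. It assumes the conventional forms of Equations (1a)–(1i) with all concentrations in meq/L; the function name and the sample concentrations are hypothetical and are not taken from the CGWB dataset.

```python
import math

def irrigation_indices(ca, mg, na, k, hco3, co3, cl, so4):
    """Return the irrigation indices for one sample (all inputs in meq/L)."""
    sar = na / math.sqrt((ca + mg) / 2.0)
    # Exchangeable-sodium ratio from SAR, as used in the assumed form of Eq. (1f)
    esr = -0.0126 + 0.01475 * sar
    return {
        "SAR": sar,
        "KR": na / (ca + mg),
        "%Na": 100.0 * (na + k) / (ca + mg + na + k),
        "PI": 100.0 * (na + math.sqrt(hco3)) / (ca + mg + na),
        "SSP": 100.0 * na / (ca + mg + na),
        "ESP": 100.0 * esr / (1.0 + esr),
        "PS": cl + 0.5 * so4,
        "MAR": 100.0 * mg / (ca + mg),
        "RSC": (co3 + hco3) - (ca + mg),
    }

# Hypothetical sample (meq/L)
sample = irrigation_indices(ca=2.0, mg=2.0, na=2.0, k=0.1,
                            hco3=4.0, co3=0.0, cl=1.5, so4=1.0)
```

Under these assumed forms, ESP depends on the ion concentrations only through SAR, so the two indices always rank samples in the same order.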

Irrigation water quality index

IWQI is used to identify suitability zones of GW or surface water resources, i.e. whether the water can be safely utilized without any adverse effect on plant growth (Xiao et al. 2014). IWQI also indicates the presence of sodicity, salinity, and toxicity hazards in irrigation water. In the present study, IWQI is determined following Meireles et al. (2010) from five input variables (Na+, Cl−, HCO3−, EC, and SAR) and their ratings and weights, as in Equation (2a):
$$\mathrm{IWQI} = \sum_{i=1}^{n} q_i W_i \tag{2a}$$
where qi is the quality rating of each parameter considered and Wi is the weight of that parameter. qi is calculated using Equation (2b) depending on the tolerance limits according to Meireles et al. (2010):
$$q_i = q_{\max} - \frac{(x_{ij} - x_{\inf}) \times q_{i\mathrm{amp}}}{x_{\mathrm{amp}}} \tag{2b}$$
where qmax is the maximum value of qi for each class; xij is the observed value of each parameter; xinf is the lower limit of the class to which the parameter belongs; qiamp is the class amplitude of qi; and xamp is the amplitude of the class to which the parameter belongs.

Tables 1 and 2 provide the quality of each parameter (qi) and weight of IWQI parameters (Wi), respectively. IWQI obtained by Equation (2a) varies between 0 and 100. Accordingly, the water use restriction zone (Table 3) is categorized based on the IWQI value.

Table 1

Parameter limiting values for quality measurement (qi) calculation

| qi | EC (dS/m) | SAR (meq/L)^1/2 | Na+ (meq/L) | Cl− (meq/L) | HCO3− (meq/L) |
|---|---|---|---|---|---|
| 85–100 | 0.20 ≤ EC < 0.75 | 2 ≤ SAR < 3 | 2 ≤ Na < 3 | 1 ≤ Cl < 4 | 1 ≤ HCO3 < 1.5 |
| 60–85 | 0.75 ≤ EC < 1.50 | 3 ≤ SAR < 6 | 3 ≤ Na < 6 | 4 ≤ Cl < 7 | 1.5 ≤ HCO3 < 4.5 |
| 35–60 | 1.50 ≤ EC < 3.00 | 6 ≤ SAR < 12 | 6 ≤ Na < 9 | 7 ≤ Cl < 10 | 4.5 ≤ HCO3 < 8.5 |
| 0–35 | EC < 0.20 or EC ≥ 3.00 | SAR < 2 or SAR ≥ 12 | Na < 2 or Na ≥ 9 | Cl < 1 or Cl ≥ 10 | HCO3 < 1 or HCO3 ≥ 8.5 |
Table 2

Weights for the IWQI parameters

| Parameters | Wi |
|---|---|
| EC | 0.211 |
| Na+ | 0.204 |
| HCO3− | 0.202 |
| Cl− | 0.194 |
| SAR | 0.189 |
Table 3

Different water use restriction zones

| IWQI | Water use restriction zone |
|---|---|
| 0–40 | Severe restriction (SR) |
| 40–55 | High restriction (HR) |
| 55–70 | Moderate restriction (MR) |
| 70–85 | Low restriction (LR) |
| 85–100 | No restriction (NR) |
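
The rating and weighting scheme of Tables 1–3 can be sketched as follows. This is an illustrative simplification rather than the procedure used in the study: it rates only the three bounded classes of Table 1 (the open-ended 0–35 class needs dataset-specific limits and raises an error here), it assigns class boundaries to the upper zone, and the sample values are hypothetical.

```python
# (x_inf, x_sup, q_min, q_max) for the three bounded classes of Table 1
CLASSES = {
    "EC":   [(0.20, 0.75, 85, 100), (0.75, 1.50, 60, 85), (1.50, 3.00, 35, 60)],
    "SAR":  [(2.0, 3.0, 85, 100), (3.0, 6.0, 60, 85), (6.0, 12.0, 35, 60)],
    "Na":   [(2.0, 3.0, 85, 100), (3.0, 6.0, 60, 85), (6.0, 9.0, 35, 60)],
    "Cl":   [(1.0, 4.0, 85, 100), (4.0, 7.0, 60, 85), (7.0, 10.0, 35, 60)],
    "HCO3": [(1.0, 1.5, 85, 100), (1.5, 4.5, 60, 85), (4.5, 8.5, 35, 60)],
}
WEIGHTS = {"EC": 0.211, "Na": 0.204, "HCO3": 0.202, "Cl": 0.194, "SAR": 0.189}

def quality_rating(name, x):
    """q_i = q_max - (x - x_inf) * q_amp / x_amp for the class containing x."""
    for x_inf, x_sup, q_min, q_max in CLASSES[name]:
        if x_inf <= x < x_sup:
            return q_max - (x - x_inf) * (q_max - q_min) / (x_sup - x_inf)
    raise ValueError(f"{name}={x} is in the open-ended 0-35 class (not handled)")

def iwqi(sample):
    """Weighted sum of the quality ratings, as in Equation (2a)."""
    return sum(quality_rating(name, sample[name]) * w for name, w in WEIGHTS.items())

def restriction_zone(value):
    """Map an IWQI value to a Table 3 zone (boundary handling is an assumption)."""
    for lower, zone in [(85, "NR"), (70, "LR"), (55, "MR"), (40, "HR")]:
        if value >= lower:
            return zone
    return "SR"

# Hypothetical sample: EC in dS/m, SAR in (meq/L)^1/2, ions in meq/L
example = {"EC": 0.5, "SAR": 2.5, "Na": 2.5, "Cl": 2.5, "HCO3": 1.25}
score = iwqi(example)
zone = restriction_zone(score)
```

Since the weights in Table 2 sum to 1, the IWQI stays on the same 0–100 scale as the individual ratings.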

Trend analysis

MK test

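In this study, the non-parametric MK test is used to study the trend in the spatio-temporal variation of SAR, KR, %Na, PI, ESP, and IWQI, and the magnitude of the increasing or decreasing trend of these irrigation WQ parameters is evaluated by Sen's slope estimator (Q). The test statistic (S) of a time series x1, x2, …, xn can be determined as follows (Mann 1945; Kendall 1975):
$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k) \tag{3a}$$
where n is the number of data points; xj and xk represent the data points at times j and k, respectively; and sgn(·) equals +1, 0, or −1 when its argument is positive, zero, or negative, respectively.
The variance of the test statistic, VAR(S), can be calculated by Equation (3b):
$$\mathrm{VAR}(S) = \frac{1}{18}\left[n(n-1)(2n+5) - \sum_{i=1}^{g} t_i (t_i - 1)(2t_i + 5)\right] \tag{3b}$$
where g denotes the number of tied groups and ti is the number of data points in the ith tied group.
The test statistic Z can be estimated as in Equation (3c) when n > 10:
$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\mathrm{VAR}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S+1}{\sqrt{\mathrm{VAR}(S)}}, & S < 0 \end{cases} \tag{3c}$$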

A positive or negative value of Z will show the direction of the trend.
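
The MK statistics described above can be sketched in a few lines. This is an illustrative implementation of the standard test with the tie correction, not the R routine used in the study; the function name and the 12-value series are hypothetical.

```python
import math
from collections import Counter

def mann_kendall(series):
    """Return (S, VAR(S), Z, two-sided p-value) for a time series."""
    n = len(series)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = series[j] - series[k]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    # Tie correction: groups of size 1 contribute zero to the sum
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return s, var_s, z, p_value

# Hypothetical strictly increasing series of 12 values
series = [0.8 + 0.05 * i for i in range(12)]
s, var_s, z, p_value = mann_kendall(series)
```

For a strictly increasing series every pair contributes +1, so S equals n(n − 1)/2, the largest value it can take.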

Sen's slope estimator

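Sen's slope estimator (Sen 1968) detects the magnitude of change in terms of the slope (Q). The individual slopes are determined as in Equation (4) for N pairs of data:
$$Q_i = \frac{x_j - x_k}{j - k}, \quad i = 1, 2, \ldots, N \tag{4}$$
where xj and xk are the data values at times j and k (j > k), respectively, and Q is the median of the N values of Qi.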
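
A minimal sketch of the estimator, assuming equally spaced observations indexed by position, is given below. The function name and the example series are hypothetical.

```python
from statistics import median

def sens_slope(series):
    """Median of all pairwise slopes (x_j - x_k) / (j - k) with j > k."""
    n = len(series)
    slopes = [
        (series[j] - series[k]) / (j - k)
        for k in range(n - 1)
        for j in range(k + 1, n)
    ]
    return median(slopes)

# Hypothetical series with one outlier at position 2
q_clean = sens_slope([1.0, 1.5, 2.0, 2.5, 3.0])
q_outlier = sens_slope([1.0, 1.5, 9.0, 2.5, 3.0])
```

Both calls give the same slope, which illustrates why the median of pairwise slopes is preferred over a least-squares slope when a series contains outliers.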

Innovative trend analysis

The graphical non-parametric ITA test was initially proposed by Şen (2012). It can detect monotonic trends as well as sub-trends in a given time-series dataset. The slope of ITA is determined as follows:
$$B = \frac{1}{n} \sum_{i=1}^{n} \frac{10\,(x_j - x_k)}{\bar{x}} \tag{5}$$
where B is the ITA slope; n is the number of data points in each sub-series; xj and xk denote the values of the second and first sub-series, respectively; and x̄ is the mean of the first sub-series (xk).
On the other hand, PBIAS (Mandal et al. 2021) can be estimated in order to calculate the percentage change of the studied parameters between the first half and second half of the time series by the following formula:
(6)
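
The ITA slope in Equation (5) can be sketched as follows, assuming the commonly used form in which the series is split into two equal halves, each half is sorted in ascending order, and the paired differences are scaled by the mean of the first half. The function name and the example values are hypothetical.

```python
def ita_slope(series):
    """ITA slope B under the assumed form B = (1/n) * sum(10 * (x_j - x_k) / mean_first)."""
    half = len(series) // 2
    first = sorted(series[:half])            # first sub-series (x_k)
    second = sorted(series[half:2 * half])   # second sub-series (x_j)
    mean_first = sum(first) / half
    pairs = zip(first, second)               # points of the graphical ITA plot
    return sum(10.0 * (xj - xk) / mean_first for xk, xj in pairs) / half

# Hypothetical series whose second half is uniformly 1 unit higher
b = ita_slope([1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0])
```

A positive B indicates that the second half lies above the 1:1 line of the ITA plot, i.e. an increasing trend; a negative B indicates a decreasing trend.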

Discriminant analysis

DA is a valuable statistical tool that identifies relationships and linear combinations between two or more metric variables (Mammeri et al. 2023). Discriminant function (DF) can be determined by Equation (7):
$$f(G_i) = K_i + \sum_{j=1}^{n} W_j P_j \tag{7}$$
where i is the number of groups (G); Ki represents the constant specific to each group; n is the number of variables; Wj denotes the weight coefficient assigned by DA to a specified parameter Pj; and Pj is the analytical value of the selected variable.

Multivariate statistical regression models

Multiple linear regression

MLR is a widely used model to simulate different WQ parameters for its problem-solving capacity in complex relationships. The governing equation of an MLR model can be expressed as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{8a}$$
where Y is the dependent variable; β0 is the constant; β1, β2, …, βn are the unstandardized coefficients of the explanatory variables; and X1, X2, …, Xn are the independent or explanatory variables. In this study, the MLR model is developed in SPSS 23 using the Enter method at a significance level of p < 0.05.
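
The least-squares fit behind an MLR model can be sketched without any statistical package. The example below solves the normal equations by Gauss-Jordan elimination; it is an illustration only and does not reproduce the SPSS Enter procedure or its significance tests. The function name and the data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for Y = b0 + b1*X1 + ... + bn*Xn.

    Returns [b0, b1, ..., bn] by solving the normal equations (X'X) b = X'y.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
coefficients = fit_mlr(X, y)
```

With noise-free data the fitted coefficients recover the generating constant and slopes up to floating-point error.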

Partial least square regression

PLSR is a multivariate statistical approach (MSA) that models the relationship between multiple predictors and the response variables of a given dataset. In the present study, model quality is judged in XLSTAT 2023 by statistical measures such as Q2 cum, R2Y cum, and R2X cum.

Principal component regression

PCR combines PCA and ordinary least square (OLS) regression. In the present study, the goodness of fit of the PCR model is tested using the Durbin–Watson (DW) statistic, Mallows' Cp, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).

Weighted least square regression

WLSR reflects the behavior of the random errors in the model and can be used with functions that are either linear or non-linear in the parameters. In the case of a simple LR, the relationship between the predictor and the response variable can be given by Equation (8b):
$$Y_i = \beta_0 + \beta_1 x_i + e_i \tag{8b}$$
where Yi is the dependent variable, β0 is the constant, β1 is the coefficient of the explanatory variable, xi is the ith value of the explanatory variable, and ei is the error term of the linear equation. In WLS regression, the estimated equation minimizes the weighted sum of squared residuals, $\sum_i w_i (Y_i - \hat{Y}_i)^2$, where wi is the weight assigned to the ith observation.
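
For the simple LR case, the weighted least squares estimates have a closed form based on weighted means, which the sketch below uses. The function name, the data, and the weights are hypothetical; the study does not state how its weights were chosen.

```python
def fit_wls(x, y, w):
    """Return (intercept, slope) minimizing sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: the last observation is unreliable
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 20.0]
ols_intercept, ols_slope = fit_wls(x, y, [1.0, 1.0, 1.0, 1.0])  # equal weights
wls_intercept, wls_slope = fit_wls(x, y, [1.0, 1.0, 1.0, 0.0])  # outlier ignored
```

Equal weights reduce the estimator to OLS, whereas down-weighting the unreliable observation lets the fit follow the remaining points.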

ML techniques

Artificial neural network

Generally, neural network (NN) models are considered one of the most robust ML algorithms for time-series prediction (Aldhyani et al. 2020). This present study performs ANN in MATLAB 2018R for all the predictors. The performance evaluation measure for the model is based on mean square error (MSE). Figure 4 shows the architecture of the ANN model developed.
Figure 4

Architecture of the ANN model.


Support vector machine

In SVM, input vectors are non-linearly mapped to a high-dimensional feature space. The best hyperplane is the one with the largest margin (Aldhyani et al. 2020). Equation (9a) shows the optimization of the margin with respect to its support vectors:
$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i} \xi_i \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \tag{9a}$$
where w is the normal vector to the hyperplane; xi is the input data and yi is its label; and parameter b is the distance of the hyperplane from the origin along the normal vector w. The slack variable ξi represents the misclassified sample of the corresponding margin hyperplane, and C is the penalty cost.

Classification and regression tree

CART was initially developed by Breiman et al. (1984). The parent node is the most significant factor in the CART model. The best model output is obtained by tuning the maximum parent size, maximum son size, and maximum tree depth in the CART algorithm.

Classification and regression random forest

The CRRF model was also primarily proposed by Breiman et al. (1984). The output of the CRRF model includes OOB (out-of-bag) error, OOB prediction, and OOB prediction details.

K-nearest neighbor

The KNN algorithm identifies the k nearest instances in the training dataset and assigns the label that occurs most frequently within that k-subset (Mohammed et al. 2023). The KNN fit for Y is defined as follows:
$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i \tag{9b}$$
where Nk(x) is the neighborhood of x, defined by the k closest points xi in the training sample. Table 4 shows the hyperparameters of each model developed in this study.
Table 4

Tuning hyperparameters for regression and ML models

| Model | Tuning hyperparameters |
|---|---|
| PLSR | Cross-validation method: Jackknife (LOO); Number of groups: 10; Stop condition: Automatic; Confidence interval: 95% |
| ANN | Training function: TRAINLM (Levenberg–Marquardt algorithm); Transfer function: TANSIG; Network type: Feed-forward backpropagation; Maximum number of iterations: 6; Maximum number of epochs: 1,000 |
| SVM | Penalty cost (C): 1; Slack variable: 0.1; Kernel function: Linear kernel |
| CART | Complexity parameter (CP): 0.0001; Maximum tree depth: 11 (for SAR); 19 (for KR); 20 (for ESP and IWQI); 18 (for PI); 7 (for %Na) |
| CRRF | Method: Bagging; Sampling method: Random with replacement; Number of trees: 100 |
| KNN | Concept of closeness between observed and predicted data: Metric (Euclidean distance); Number of neighbors (k): 3 |
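
The KNN fit with the settings listed in Table 4 (Euclidean distance, k = 3) can be sketched as below. The sketch assumes the averaging form of the KNN fit for a continuous response and is not the XLSTAT implementation; the function name and the training data are hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Average the responses of the k training points closest to `query`."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]  # the neighborhood N_k(x)
    return sum(y for _, y in nearest) / k

# Hypothetical one-dimensional training data
train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_predict(train_x, train_y, (2.1,), k=3)
```

The distant point at 10.0 falls outside the three-point neighborhood and therefore has no influence on the prediction.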

 

Statistical measures for predictive strength and accuracy of the proposed models

In the present study, the accuracy of the proposed models is checked using widely used statistical measures, namely RMSE, NSEC, MAPE, R2, RBIAS, WI, and VAF. They are calculated as follows:
(10a)
(10b)
(10c)
(10d)
(10e)
(10f)
(10g)
where Oi is the observed data; Pi is the predicted data; Oiavg is the mean of the observed data; and N is the number of data points.
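
A minimal sketch of these measures is given below. It assumes commonly used definitions of each metric; in particular, the sign convention of RBIAS, the use of the squared Pearson correlation for R2, and the population variance in VAF are assumptions. The function name and the data are hypothetical.

```python
import math
from statistics import mean, pvariance

def evaluate(obs, pred):
    """Return the seven accuracy measures for observed and predicted values."""
    n = len(obs)
    o_avg, p_avg = mean(obs), mean(pred)
    err = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)                   # sum of squared errors
    sst = sum((o - o_avg) ** 2 for o in obs)        # spread of the observations
    spp = sum((p - p_avg) ** 2 for p in pred)
    cov = sum((o - o_avg) * (p - p_avg) for o, p in zip(obs, pred))
    wi_den = sum((abs(p - o_avg) + abs(o - o_avg)) ** 2 for o, p in zip(obs, pred))
    return {
        "RMSE": math.sqrt(sse / n),
        "NSEC": 1.0 - sse / sst,
        "MAPE": 100.0 / n * sum(abs(e / o) for e, o in zip(err, obs)),  # obs must be non-zero
        "R2": cov ** 2 / (sst * spp),
        "RBIAS": sum(p - o for o, p in zip(obs, pred)) / sum(obs),
        "WI": 1.0 - sse / wi_den,
        "VAF": 100.0 * (1.0 - pvariance(err) / pvariance(obs)),
    }

# Hypothetical observed and predicted values
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
metrics = evaluate(obs, pred)
perfect = evaluate(obs, obs)
```

A perfect prediction gives RMSE = 0, NSEC = R2 = WI = 1, and VAF = 100, which provides a quick sanity check for any implementation of these measures.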

Statistical analysis of WQ variables

Table 5 presents a comprehensive statistical summary for SAR, KR, %Na, PI, ESP, and IWQI; similar IWQIs were studied by Batarseh et al. (2021), Elsayed et al. (2020), Gad et al. (2023), and Moneam (2023). The distribution of these water quality variables is also shown by box plots (Figure 5) and histograms (Figure 6); similar frequency distribution plots were developed by Khan et al. (2022), Nair & Vijaya (2022), Shaw & Sharma (2024), and Yan et al. (2024). SAR ranges from 0.09 to 10.06 (meq/L)^0.5 with a mean of 1.13 (meq/L)^0.5, indicating that GW can be used safely for agricultural purposes without any sodium hazard. KR ranges from 0.03 to 3.34 with a mean of 0.5; %Na varies from 4.83 to 77 with a mean of 32.44; PI ranges from 31 to 168.01 with a mean of 71.96; and ESP ranges from −1.14 to 11.95 with a mean of 0.38, indicating no excess Na+ in the irrigation water. Consequently, the significantly lower values of SAR and ESP confirm the higher probability of occurrence of C1-S1 type irrigation water in this district. These findings are closely aligned with those of Shaw & Sharma (2024) and in partial agreement with the observations of Mousazadeh et al. (2019), Taşan (2023), and Zafar et al. (2024) for SAR and %Na, and of Gad et al. (2023) and Tejashvini et al. (2024) for KR and PI. IWQI ranges from 34.58 to 90.59 with a mean of 63.26, which is partially similar to the findings of Batarseh et al. (2021) and M'nassri et al. (2022). Comparatively lower SAR (<3), KR (<1), %Na (<40), highly suitable IWQI (55–85), and PI (>75) are observed 329, 301, 212, 125, and 285 times, respectively, exhibiting higher suitability of GW for irrigation. Mousazadeh et al. (2019) found that SAR (<3) and %Na (<40) have a relative frequency of 0.93, which partially aligns with the present study. The box plots (Figure 5) demonstrate relatively higher positive skewness for SAR, KR, and ESP, with skewness coefficients of 0.77, 0.73, and 0.76, respectively.
Relatively higher variability is observed for %Na and PI, which have interquartile ranges (IQR) of 28.39 and 27.19, respectively. The findings from the box plots are fully consistent with those of Shaw & Sharma (2024).
Table 5

Descriptive statistics for the predicted IWQIs

| Variables | Min | Max | Mean | Median | Std. Dev. |
|---|---|---|---|---|---|
| SAR | 0.09 | 10.06 | 1.13 | 0.89 | 0.94 |
| KR | 0.03 | 3.34 | 0.50 | 0.41 | 0.37 |
| %Na | 4.83 | 77.00 | 32.44 | 31.42 | 13.45 |
| PI | 31.00 | 168.01 | 71.96 | 70.03 | 17.10 |
| ESP | −113.56 | 1,195.09 | 38.25 | 5.22 | 130.42 |
| IWQI | 34.58 | 90.59 | 63.26 | 62.67 | 9.36 |

Std. Dev., standard deviation.

Figure 5

Box plot for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI during 2014–2021.

Figure 6

Histogram for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI during 2014–2021.


Spatial variation of some important WQ variables

Table 6 presents the distribution of wells across the classes of the respective WQ parameters. All wells (100%) fall under the excellent water class based on SAR (0–10) and ESP (<20), indicating that GW in the study area carries low-sodium (S1) water with no alkali hazard during the pre-monsoon season. GW from 93 and 80% of wells is considered fit for irrigation based on KR (<1) and %Na (<40), respectively. GW is also suitable based on PI, with 38 and 62% of samples classified as Class I and Class II water, respectively. These findings closely align with those of Shaw & Sharma (2024), who reported that 100, 98, 83, and 100% of water samples belong to the highly suitable GW class based on SAR, KR, %Na, and ESP, respectively. Sanad et al. (2024) and Yıldız & Karakuş (2020) observed that a significant number of GW samples fall under SAR < 10 and KR < 1, in full agreement with the current study. Elsayed et al. (2020) found that around 96 and 29% of water samples belong to the good and Class I categories based on %Na (<40) and PI (>75), respectively. Mousazadeh et al. (2019) found that 100 and 97% of GW samples belong to SAR < 10 and KR < 1, respectively. These findings are in partial agreement with those of the current study. Approximately 92% of wells (Table 6) fall under the low restriction (LR) and moderate restriction (MR) zones, indicating a low level of GW contamination during the pre-monsoon season. Tejashvini et al. (2024) found that 90% of GW samples are categorized as LR and MR zones, which fully aligns with the above findings. Figure 7 shows that 100, 98.7, 99.3, 27.5, 100, and 95% of the area is categorized as excellent, suitable, good, Class I, excellent, and LR and MR zones based on the spatial variation of SAR, KR, %Na, PI, ESP, and IWQI, respectively.
Only 5.06% area (Figure 7(f)) of this district containing 7% wells is coming under severe restriction (SR) and high restriction (HR) zones are mostly found in some part of the western region of this district and exhibit poor GW quality. A similar observation is partially aligned with Badr et al. (2023) and Khalaf & Hassan (2013). Moderate to high salt-tolerant crops such as paddy, wheat, and groundnut can be cultivated with this GW in this district, which is reported by Meireles et al. (2010). Cultivation of these crops with this water may lead to reduction in photosynthetic activity, which ultimately affects crop water productivity (Devi & Arumugam 2019). Gypsum can be added with irrigation water to mitigate the increased Na+ concentration along with crop diversification to reduce the quantity of water aligned with Zaman et al. (2018) and Hashemi et al. (2024). Controlled management of industrial, domestic wastes and irrigation return flow, proper maintenance of drainage systems, and efficient use of chemical fertilizers improve GW quality in this district (Shaw & Sharma 2024).
Table 6

Distribution of wells in different classes of water based on WQ parameters

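| WQ parameters | Permissible range | Water class/category | Number of water samples | % Water samples |
|---|---|---|---|---|
| SAR (meq/L)^1/2 | 0–10 | Excellent | 360 | 100 |
| KR | <1 | Suitable | 336 | 93 |
| | 1–2 | Marginal suitable | 24 | |
| %Na | <20 | Excellent | 45 | 13 |
| | 20–40 | Good | 244 | 67 |
| | 40–60 | Permissible | 65 | 18 |
| | 60–80 | Doubtful | | |
| PI | >75 | Class I | 138 | 38 |
| | 25–75 | Class II | 222 | 62 |
| ESP | <20 | Excellent | 360 | 100 |
| IWQI | 0–40 | SR | | |
| | 40–55 | HR | 22 | |
| | 55–70 | MR | 265 | 74 |
| | 70–85 | LR | 64 | 18 |
| | 85–100 | NR | | |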
Figure 7

Spatial variation of (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI for the pre-monsoon season during 2014–2021.
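The class limits in Table 6 can be applied mechanically to each well. The following is a minimal sketch, not the study's own procedure: the `sar` and `kelly_ratio` helpers use the conventional definitions with ion concentrations in meq/L (an assumption, since the formulas are not restated in this section), the sample concentrations are hypothetical, and the handling of values that fall exactly on a class boundary is also an assumption.

```python
import math

# Class limits from Table 6 as (upper bound, label), checked in ascending order.
# Treating each upper bound as exclusive is an assumption of this sketch.
CLASSES = {
    "SAR": [(10, "Excellent")],
    "KR": [(1, "Suitable"), (2, "Marginal suitable")],
    "%Na": [(20, "Excellent"), (40, "Good"), (60, "Permissible"), (80, "Doubtful")],
    "ESP": [(20, "Excellent")],
    "IWQI": [(40, "SR"), (55, "HR"), (70, "MR"), (85, "LR"), (100.0001, "NR")],
}


def sar(na, ca, mg):
    """Conventional sodium adsorption ratio; inputs in meq/L."""
    return na / math.sqrt((ca + mg) / 2)


def kelly_ratio(na, ca, mg):
    """Conventional Kelly's ratio; inputs in meq/L."""
    return na / (ca + mg)


def classify(index, value):
    """Map an index value to its Table 6 class label."""
    if index == "PI":  # PI classes run from high to low values
        if value > 75:
            return "Class I"
        return "Class II" if value >= 25 else "unclassified"
    for upper, label in CLASSES[index]:
        if value < upper:
            return label
    return "unclassified"


# Hypothetical well: Na+ = 2.0, Ca2+ = 3.0, Mg2+ = 1.0 meq/L
na, ca, mg = 2.0, 3.0, 1.0
print("SAR", round(sar(na, ca, mg), 2), classify("SAR", sar(na, ca, mg)))
print("KR", round(kelly_ratio(na, ca, mg), 2), classify("KR", kelly_ratio(na, ca, mg)))
print("IWQI", 62, classify("IWQI", 62))
```

Counting the labels returned over all wells gives the kind of per-class distribution reported in Table 6.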

Results of the trend analysis

The results of the trend analysis for SAR, KR, %Na, PI, ESP, and IWQI for the pre-monsoon season during 2014–2021, obtained using the MK test, Sen's slope estimator, ITA, and the PBIAS test, are shown in Table 7. Similar techniques were implemented by Kurunç et al. (2005), Sun et al. (2016), and Yenilmez et al. (2011) for trend analysis of different cations and anions. In the present study, the graphical ITA method is used alongside the MK test to detect sub-trends in the variation of SAR, KR, %Na, PI, ESP, and IWQI that the MK test overlooks. The full data series (2014–2021) is divided into two parts, i.e. a first half (2014–2017) and a second half (2018–2021), and the two halves are plotted against each other. This plot, together with Sen's slope, shows whether these water quality parameters changed positively or negatively between the first and second halves of the ITA pre-processed data.

Table 7

Details of pre-monsoon season WQ parameters trend of Sundargarh district

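| WQ parameters | Z statistics | MK tau | Sen's slope | ITA slope | ITD | PBIAS |
|---|---|---|---|---|---|---|
| SAR | 0.915 | 0.032 | 0.0003 | 0.680 | | 13.59 |
| KR | 1.191 | 0.042 | 0.0002 | 0.721 | | 14.42 |
| %Na | 1.361 | 0.048 | 0.0098 | 0.210 | | 4.20 |
| PI | 0.091 | 0.003 | 0.0007 | −0.031 | | −0.61 |
| ESP | 0.913 | 0.032 | 0.0381 | 3.396 | | 67.91 |
| IWQI | 0.026 | 0.001 | 0.0001 | −0.067 | | −1.34 |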

ITD, innovative trend detection.
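The MK statistics and Sen's slope reported in Table 7 can be illustrated with a short sketch. It assumes the standard MK formulation without a tie correction and a normal approximation for the p-value; the function names and the annual series are hypothetical, and the study's exact implementation may differ.

```python
import math
from statistics import median


def mann_kendall(series):
    """Return (S, tau, Z) for a time-ordered series, ignoring tie corrections."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    tau = s / (n * (n - 1) / 2)
    return s, tau, z


def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def sens_slope(series):
    """Median of all pairwise slopes (Sen's slope) for equally spaced observations."""
    n = len(series)
    slopes = [(series[j] - series[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)


values = [1.10, 1.05, 1.20, 1.15, 1.18, 1.25, 1.22, 1.30]  # hypothetical 2014-2021 values
s, tau, z = mann_kendall(values)
print(s, round(tau, 3), round(z, 3), round(two_sided_p(z), 3), round(sens_slope(values), 4))
print(round(two_sided_p(0.915), 2))  # p-value implied by the Z statistic reported for SAR
```

Under the normal approximation, a Z statistic of 0.915 corresponds to a two-sided p-value of about 0.36, well above 0.05.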

The Z statistics are positive for all six parameters but remain below the critical value of 1.96, so the monotonic increasing trends during 2014–2021 are not statistically significant at the 5% level (p > 0.05). The increasing tendency is comparatively stronger for SAR, KR, %Na, and ESP (Z = 0.913–1.361) and very weak for PI and IWQI (Z = 0.091 and 0.026). Sen's slope shows that the rate of increase of SAR, KR, PI, and IWQI is mild, whereas %Na and ESP increase at a moderate rate. Figure 8 illustrates the spatio-temporal variation (trend) of these six water quality parameters during the first half (2014–2017) and second half (2018–2021) of the data series to show the presence of the sub-trend by the ITA method.
Figure 8

Spatio-temporal variation (trend) of (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI for the time period of 2014–2017 (first half of the time series) and 2018–2021 (second half of the time series) for Sundargarh district.

The graphical ITA results (Table 7) likewise reveal a positive trend, with a positive ITA slope, in the distribution of SAR, KR, %Na, and ESP. PI and IWQI, however, exhibit a marginal decreasing trend with a negative ITA slope, which differs from the result of the MK test. This signifies that the sub-trend in PI and IWQI is non-monotonic, increasing in one half and decreasing in the other half (2014–2017 and 2018–2021) of the data period. PBIAS for SAR, KR, %Na, and ESP indicates that the second half (2018–2021) of the data period is dominated by positive change for 57.2, 2.8, 51.7, and 57.8% of wells, respectively. For PI and IWQI, PBIAS indicates that 46.1 and 53.3% of wells, respectively, are influenced by a negative change, with the first half of the data dominated by increasing trends and the second half by decreasing trends.
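The half-series comparison behind the ITA results can be sketched as follows. This is an illustration under stated assumptions rather than the study's exact computation: the slope uses one common formulation attributed to Şen, s = 2(mean of second half − mean of first half)/n, the percentage change between halves is a simple stand-in for the PBIAS measure, and the function name and input series are hypothetical.

```python
from statistics import mean


def ita_summary(series):
    """Compare the sorted first and second halves of a time-ordered series."""
    half = len(series) // 2
    first = sorted(series[:half])
    second = sorted(series[half:2 * half])
    n_used = 2 * half
    # Assumed ITA slope: twice the difference of half-means over the series length.
    slope = 2 * (mean(second) - mean(first)) / n_used
    # Assumed percentage change of the second half relative to the first half.
    pct_change = 100 * (mean(second) - mean(first)) / mean(first)
    # Points above the 1:1 line in the ITA scatter plot indicate an increase.
    above = sum(b > a for a, b in zip(first, second))
    below = sum(b < a for a, b in zip(first, second))
    return {"slope": slope, "pct_change": pct_change, "above": above, "below": below}


values = [1.10, 1.05, 1.20, 1.15, 1.18, 1.25, 1.22, 1.30]  # hypothetical 2014-2021 values
print(ita_summary(values))
```

Applying the same summary well by well and counting positive versus negative changes gives the kind of percentage-of-wells figures quoted above.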

Results of PCA

In the present study, PCA is performed on 360 GW samples with 19 WQ parameters (pH, EC, TDS, Na+, Ca2+, Mg2+, K+, Cl−, HCO3−, SO42−, RSC, SAR, KR, %Na, PI, ESP, SSP, MAR, and PS) for the pre-monsoon season during 2014–2021. Similar studies have been conducted by Nosrati & Van Den Eeckhaut (2012) and Chen et al. (2018). PCA identifies, among these 19 variables, the most important water quality parameters, i.e. those with a significantly high positive factor loading on the retained components (Eigen value > 1), for the prediction of the response variable (IWQI) (Mitra et al. 2018). The relative importance of each principal component (PC) for the determination of IWQI is also illustrated through the explained variance. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy for this dataset is 0.735 (>0.5) and Bartlett's test of sphericity is significant (p < 0.05), indicating that the dataset is suitable for PCA, following the criteria reported by Hutcheson & Sofroniou (1999). Varimax-rotated factor loadings, together with the Eigen values and the percentage of variance explained, are given in Table 8 (Ganiyu et al. 2022; Taşan et al. 2022).

Table 8

Varimax rotated factor loadings of GW quality parameters

| Variables | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| pH | 0.072 | −0.189 | 0.602 | 0.101 |
| EC | 0.918 | 0.324 | 0.099 | −0.072 |
| TDS | 0.945 | 0.285 | 0.089 | −0.074 |
| Ca2+ | 0.414 | −0.691 | 0.016 | −0.532 |
| Mg2+ | 0.699 | −0.514 | 0.086 | 0.394 |
| Na+ | 0.932 | 0.214 | 0.207 | −0.047 |
| K+ | 0.359 | −0.056 | −0.231 | 0.052 |
| HCO3− | 0.634 | −0.338 | 0.594 | −0.043 |
| SO42− | 0.671 | −0.170 | −0.231 | 0.187 |
| Cl− | 0.860 | 0.215 | −0.314 | −0.131 |
| RSC | −0.143 | 0.618 | 0.715 | 0.003 |
| SAR | 0.852 | 0.466 | 0.144 | −0.058 |
| KR | 0.587 | 0.774 | 0.015 | −0.060 |
| %Na | 0.515 | 0.792 | 0.220 | −0.028 |
| PI | −0.245 | 0.865 | 0.040 | −0.049 |
| ESP | 0.861 | 0.470 | 0.120 | −0.055 |
| MAR | 0.418 | −0.061 | 0.076 | 0.868 |
| SSP | 0.533 | 0.787 | 0.190 | −0.040 |
| PS | 0.890 | 0.222 | −0.323 | −0.089 |
| Eigen value | 8.398 | 4.67 | 1.73 | 1.29 |
| Variance (%) | 44.20 | 24.58 | 9.11 | 6.81 |
| Cumulative variance (%) | 44.20 | 68.78 | 77.89 | 84.70 |

Bold values represent strong (>0.75) and moderate (0.5–0.75) loadings.
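The variance percentages in Table 8 follow from the Eigen values once the total variance is fixed. The sketch below assumes that PCA was run on standardized variables, so that the total variance equals the number of variables (19); under that assumption it reproduces the reported percentages to within the rounding of the tabulated Eigen values. Variable names are illustrative.

```python
# Eigen values of the four components as reported in Table 8
eigenvalues = [8.398, 4.67, 1.73, 1.29]
n_variables = 19  # WQ parameters entering the PCA

# Kaiser criterion: keep components whose Eigen value exceeds 1
retained = [ev for ev in eigenvalues if ev > 1]

# With standardized inputs (assumed), each component explains eigenvalue / n_variables
explained = [100 * ev / n_variables for ev in retained]

cumulative = []
running = 0.0
for share in explained:
    running += share
    cumulative.append(running)

for k, (share, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{k}: {share:.2f}% of variance, cumulative {cum:.2f}%")
```

Small differences from the table in the last digits (for example in PC4) are expected because the Eigen values are themselves rounded.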

Based on the Eigen value (>1), four clusters (PCs) have been selected (Table 8 and Figure 9(a)), explaining a total variance of 84.70%. Figure 9(b) illustrates the importance of variables in PC1 and PC2 through a loading plot. Based on the Eigen vectors, salinity factors (EC, TDS, Na+, Cl−, SAR, PS, and ESP) and sodium hazard factors (KR, %Na, and PI) are the most sensitive factors influencing PC1 and PC2, respectively. The first component (PC1) has an Eigen value of 8.398 and explains 44.2% of the total variance. In PC1, strong positive loadings are obtained for EC (0.918), TDS (0.945), Na+ (0.932), Cl− (0.860), SAR (0.852), ESP (0.861), and PS (0.890), which account for the variation in GW quality. Hence, PC1 can be identified as the ‘salinity factor’, and Cl− is commonly recognized as an indicator of anthropogenic activities (Akshitha et al. 2021). Saeed et al. (2024) found that PC1 contains EC, Na+, and Cl− as the most significant factors, with a variance of 49.76%. Taşan et al. (2022) indicated that EC, TDS, Na+, Cl−, and SAR have a higher factor loading in PC1, explaining a variance of 47.67%. These findings are in high agreement with the current study. Uncontrolled use of chemical fertilizers and improper sewage disposal may contribute to the dominance of Na+, which also indicates that silicate weathering can be a major geochemical mechanism for the occurrence of GW in this study area, in line with the findings of Aravinthasamy et al. (2020). However, RSC (−0.143) and PI (−0.245) are negatively correlated with PC1, indicating that GW quality throughout the study area is moderately affected by sodium hazard.
Figure 9

(a) Scree plot and (b) loading plot for PCA of the WQ data during the pre-monsoon period.

The second component (PC2) explains 24.58% of the total variance with an Eigen value of 4.67, similar to the outcome of Patnaik et al. (2024). It includes KR (0.774), %Na (0.792), PI (0.865), and SSP (0.787) with comparatively high factor loadings (>0.75), and RSC with a moderate positive loading (0.618), partially in line with the findings of Raju (2007). The strong correlation of %Na, PI, and SSP with PC2 reflects the dominance of Na+, indicating that evaporation can be a major geochemical mechanism for GW occurrence (Saha & Paul 2019). Therefore, PC2 can be identified as a ‘sodium hazard’ factor.

PC3 explains 9.11% of the total variance with an Eigen value of 1.73. RSC (0.715) and pH (0.602) have moderate positive factor loadings in PC3, a finding partially in line with Chen et al. (2018). GW may be alkaline owing to the dominance of HCO3− (Sharma et al. 2016), and a higher concentration of (HCO3− + SO42−) than of total alkalis may indicate that carbonate weathering has more influence on GW occurrence for a few wells. PC3 may therefore be called an ‘alkalinity’ factor.

PC4 explains 6.81% of the total variance with an Eigen value of 1.29. MAR is highly positively correlated (0.868) with PC4, which illustrates mineral unification for GW occurrence. Hence, PC4 can be identified as a ‘magnesium hazard’ factor.

Spatial variation in GW quality through DA

The spatial variation of GW quality across the 360 dug wells is studied by DA using 10 explanatory variables (EC, TDS, Na+, Cl−, SAR, KR, ESP, PS, %Na, and PI) and one response variable (IWQI), in line with Nosrati & Van Den Eeckhaut (2012) and Chen et al. (2018). The objective of DA is to identify the variables that contribute most to the variability of GW quality among the wells. Table 9 gives the statistical summary of DA, including Wilks' Lambda and the F-test value used to assess the significance of the discriminant functions (DFs), at a significance level of p < 0.01 for most of the variables. The Eigen values of the unstandardized DFs are summarized in Table 10. The first DF explains 93.14% of the total variance with an Eigen value of 6.12 and has the strongest discriminant power, whereas the second and third DFs explain 5.56 and 0.96% of the total variance, respectively. The squared canonical correlation (CC) coefficient of the first DF, (0.927)² ≈ 0.86, shows that about 86% of the variation in this function is explained by the differences among groups, which clearly reflects the significant spatial variability of the WQ variables among the stations. Mammeri et al. (2023) found that the first DF has the highest discriminating power, with a CC of 0.971, for showing the spatial variability of surface water, which largely agrees with the findings of the present study.

Table 9

Test of group mean equality for discriminant analysis

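| Variables | Wilk's Lambda | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|
| EC | 0.946 | 3.529 | | 247 | 0.0080 |
| TDS | 0.938 | 4.049 | | 247 | 0.0034 |
| Na+ | 0.894 | 7.326 | | 247 | <0.0001 |
| Cl− | 0.880 | 8.382 | | 247 | <0.0001 |
| SAR | 0.859 | 10.136 | | 247 | <0.0001 |
| KR | 0.832 | 12.506 | | 247 | <0.0001 |
| %Na | 0.818 | 13.698 | | 247 | <0.0001 |
| PI | 0.737 | 22.004 | | 247 | <0.0001 |
| ESP | 0.842 | 16.691 | | 247 | <0.0001 |
| PS | 0.874 | 8.885 | | 247 | <0.0001 |
| IWQI | 0.151 | 347.904 | | 247 | <0.0001 |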
Table 10

Eigen values for the DFs

| Discriminant function | Eigen value | % Variance | Cumulative % | CC coefficient |
|---|---|---|---|---|
| 1 | 6.12 | 93.14 | 93.14 | 0.927 |
| 2 | 0.37 | 5.56 | 98.71 | 0.517 |
| 3 | 0.06 | 0.96 | 99.70 | 0.244 |

CC coefficient, canonical correlation coefficient.
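The Eigen value and CC coefficient in Table 10 are linked by a standard identity of canonical discriminant analysis, shown here as a worked check for the first DF. The symbols λ (Eigen value of a DF) and r_c (its canonical correlation) are notation introduced for this illustration.

```latex
\[
r_c = \sqrt{\frac{\lambda}{1+\lambda}}, \qquad
\lambda = \frac{r_c^{2}}{1-r_c^{2}}
\]
\[
\text{DF 1: } r_c = \sqrt{\frac{6.12}{1+6.12}} = \sqrt{0.8596} \approx 0.927,
\qquad r_c^{2} \approx 0.86
\]
```

The second and third DFs agree with the tabulated CC values only approximately, which is consistent with their Eigen values being rounded to two decimals.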

Table 11 presents the standardized canonical coefficients for the DFs. Na+, Cl−, SAR, PS, and IWQI are the major WQ variables in the first DF for discriminating the spatial variation of GW quality across the 360 sampling stations. Na+, SAR, %Na, PI, and PS contribute most to the second DF, and Na+, SAR, KR, %Na, and PI have comparatively larger coefficients in the third DF, showing the spatial difference in GW quality throughout the district for the dug wells considered.

Table 11

Standardized canonical DF coefficient

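| Variables | DF 1 | DF 2 | DF 3 |
|---|---|---|---|
| EC | −0.584 | −0.127 | 0.613 |
| TDS | 0.929 | 0.554 | −0.552 |
| Na+ | −0.962 | −1.403 | −9.027 |
| Cl− | 1.076 | −0.641 | 0.558 |
| SAR | 1.074 | 1.777 | 12.992 |
| KR | −0.411 | 0.182 | −4.090 |
| %Na | 0.130 | −1.227 | −2.128 |
| PI | −0.198 | 1.536 | 0.980 |
| ESP | 0.000 | 0.000 | 0.000 |
| PS | −1.487 | 1.340 | 0.720 |
| IWQI | 1.032 | 0.183 | −0.012 |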

Table 12 gives the classification function coefficients of the input WQ variables for the water restriction zones defined by IWQI, namely, the NR, LR, MR, HR, and SR zones. Na+, SAR, Cl−, KR, PI, and PS are the most sensitive WQ variables for the classification of water restriction zones. ESP, whose coefficients are zero for every zone, has no influence on the classification.

Table 12

Classification function coefficients

| Variables (by IWQI class) | HR | LR | MR | NR | SR |
|---|---|---|---|---|---|
| EC | 1.335 | −10.947 | −3.725 | −20.917 | 11.535 |
| TDS | 0.379 | 0.415 | 0.393 | 0.478 | 0.349 |
| Na+ | −57.282 | −64.414 | −63.308 | −78.394 | −63.896 |
| Cl− | 17.395 | 22.425 | 20.607 | 27.940 | 13.424 |
| SAR | 71.343 | 86.592 | 84.675 | 109.955 | 88.766 |
| KR | −8.981 | −24.818 | −23.555 | −35.172 | −20.223 |
| %Na | −2.808 | −2.823 | −2.821 | −3.152 | −3.270 |
| PI | 2.617 | 2.566 | 2.540 | 2.756 | 2.961 |
| ESP | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| PS | −20.381 | −26.211 | −23.943 | −32.205 | −13.380 |
| Intercept | −247.854 | −367.771 | −295.118 | −500.842 | −225.395 |
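Classification function coefficients such as those in Table 12 are normally used by computing one linear score per class and assigning the sample to the class with the highest score. The sketch below illustrates that mechanism with the tabulated coefficients; the sample values and their units are hypothetical, and the function names are not from the study.

```python
FEATURES = ["EC", "TDS", "Na", "Cl", "SAR", "KR", "%Na", "PI", "ESP", "PS"]

# (coefficients in FEATURES order, intercept) per IWQI class, copied from Table 12
COEFFS = {
    "HR": ([1.335, 0.379, -57.282, 17.395, 71.343, -8.981, -2.808, 2.617, 0.0, -20.381], -247.854),
    "LR": ([-10.947, 0.415, -64.414, 22.425, 86.592, -24.818, -2.823, 2.566, 0.0, -26.211], -367.771),
    "MR": ([-3.725, 0.393, -63.308, 20.607, 84.675, -23.555, -2.821, 2.540, 0.0, -23.943], -295.118),
    "NR": ([-20.917, 0.478, -78.394, 27.940, 109.955, -35.172, -3.152, 2.756, 0.0, -32.205], -500.842),
    "SR": ([11.535, 0.349, -63.896, 13.424, 88.766, -20.223, -3.270, 2.961, 0.0, -13.380], -225.395),
}


def class_scores(sample):
    """Linear classification score of one sample for every IWQI class."""
    scores = {}
    for label, (coefs, intercept) in COEFFS.items():
        scores[label] = intercept + sum(c * sample[f] for c, f in zip(coefs, FEATURES))
    return scores


def predict_zone(sample):
    """Assign the class with the largest classification score."""
    scores = class_scores(sample)
    return max(scores, key=scores.get)


# Hypothetical sample; values and units are placeholders, not data from the study
sample = {"EC": 0.5, "TDS": 320.0, "Na": 1.2, "Cl": 1.0, "SAR": 1.0,
          "KR": 0.4, "%Na": 30.0, "PI": 65.0, "ESP": 0.3, "PS": 1.5}
print(class_scores(sample))
print(predict_zone(sample))
```

Because every ESP coefficient is zero, changing the ESP value of a sample leaves all five scores unchanged.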

Table 13 shows that DA correctly classifies 94.4 and 90.74% of the water restriction zones in the training and testing datasets, respectively, in line with the findings of Mammeri et al. (2023). DA also correctly classifies 91.27% of the zones in cross-validation, signifying its appropriateness and effectiveness in describing the spatial variation of GW quality for this district.

Table 13

Confusion matrix for discriminant analysis

HR LR MR NR SR Total
Original Training Count HR 17 21 
LR 60 61 
MR 151 160 
NR 
SR 
Validation Count HR 14 16 
LR 21 22 
MR 58 63 
NR 
SR 
Cross-validated Count HR 16 21  
LR 59 61  
MR 10 148 160  
NR  
SR  

Evaluation of results by regression models

In the present research, four multivariate regression models, namely, MLR, PLSR, PCR, and WLSR, are used for the prediction of SAR, KR, %Na, PI, ESP, and IWQI for the pre-monsoon season, taking 70 and 30% of wells as the training and testing datasets, respectively. Similar regression models were implemented by many researchers for the prediction of WQ variables. The regression coefficients of these models are given in Table 14, and their predictive strength is evaluated through several statistical measures, namely, RMSE, MAPE, NSEC, R2, WI, VAF, and RBIAS, in both the training and testing phases (Table 15). Gupta et al. (2024) used similar statistical parameters in their study. A model underestimates when RBIAS > 0 and overestimates when RBIAS < 0 (Gorgij et al. 2023). MLR can be considered a superior regression model for the prediction of SAR (R2 = 0.97, VAF = 96.93%, RMSE = 0.21 (meq/L)1/2) and ESP (R2 = 0.96, VAF = 96.33%, RMSE = 0.26) in the training dataset. It also has comparatively high predictive strength in the testing dataset for SAR (R2 = 0.93, VAF = 92.96%, RMSE = 0.24 (meq/L)1/2) and ESP (R2 = 0.94, VAF = 94.11%, RMSE = 0.30). In contrast, MLR has moderate accuracy for %Na, PI, and IWQI, with RMSE of 8.24, 10.06, and 5.75 in the training dataset and 7.72, 10.95, and 6.75 in the testing dataset, respectively. MLR strongly overestimates KR, SAR, and ESP in the training phase, with relatively large negative RBIAS of −39.12, −29.57, and −15.19, respectively. The PCR model shows higher prediction accuracy than the PLSR model for all the WQ variables, with relatively higher R2, NSEC, and VAF and lower RMSE in both training and testing datasets. PLSR is the weakest approach for the determination of IWQI in both phases, with markedly low R2 (R2training = 0.14 and R2testing = 0.29) and VAF (VAFtraining = 14.93% and VAFtesting = 23.65%).
These observations are similar to the findings of Abba et al. (2020) and Mokhtar et al. (2022). Furthermore, El Bilali & Taleb (2020) used MLR in the Bouregreg watershed in Morocco for the prediction of 10 WQ parameters and found it well suited to the prediction of SAR and KR, with R2 of 0.94 and 0.71, respectively. Elsayed et al. (2020) reported high prediction accuracy of MLR and PCR models for SAR, KR, %Na, PI, and IWQI. The predictions of all the WQ variables by the PLSR and PCR models are unbiased on average in the training dataset, with RBIAS of zero. The WLSR model shows higher predictive strength in the training phase than in the testing phase for SAR (R2training = 0.96, RMSEtraining = 0.23; R2testing = 0.91, RMSEtesting = 0.28), KR (R2training = 0.80, RMSEtraining = 0.17; R2testing = 0.56, RMSEtesting = 0.25), and ESP (R2training = 0.96, RMSEtraining = 0.28; R2testing = 0.91, RMSEtesting = 0.40), with relatively higher overestimation in the training phase. In the testing phase, WLSR gives comparatively high underestimation for ESP (RBIAS = 15.13).

Table 14

Regression coefficients developed between water quality indices and chemical composition of water in regression models

Dependent variables  Independent variables  Regression coefficients (MLR PLSR PCR WLSR)  Dependent variables  Independent variables  Regression coefficients (MLR PLSR PCR WLSR)
SAR Intercept 1.847 1.77 1.543 1.045 ESP Intercept −0.35 −0.637 −0.426 −0.635 
pH −0.162 −0.179 −0.13 −0.083 EC 0.276 0.256 0.0124 −0.436 
EC −0.221 0.304 −0.417 −0.594 TDS −0.0021 −0.0011 −0.0012 0.0011 
TDS −0.001 0.001 −0.001 0.001 Mg2+ −0.0782 −0.475 −0.0443 −0.682 
Mg2+ −0.078 −0.327 −0.032 −0.098 Na+ 1.102 0.78 1.08 1.061 
Na+ 0.801 0.584 0.795 0.714 Cl −0.0177 0.101 −0.0607 −0.115 
     HCO3 0.129 0.009 0.12 −0.111 
KR Intercept 1.68 2.115 1.509 1.427 PI Intercept 196.423 250.63 203.55 194.689 
pH −0.145 −0.218 −0.132 −0.129 pH −13.342 −20.892 −14.519 −13.439 
EC −0.045 −0.023 −0.189 −0.104 EC −19.483 −9.436 −21.605 −14.844 
TDS −0.001 −0.001 TDS −0.114 −0.015 −0.067 −0.093 
Mg2+ −0.039 −0.165 −0.026 −0.03 Mg2+ −2.393 −7.933 −2.888 −2.259 
Na+ 0.348 0.226 0.331 0.306 Na+ 12.455 6.806 11.332 10.595 
Cl −0.008 0.021 −0.03 −0.058 Cl 0.233 −1.746 −1.539 −0.277 
HCO3 −0.059 −0.021 −0.052 −0.052 HCO3 1.19 −0.622 −0.19 0.582 
%Na Intercept 35.816 −0.021 33.392 32.023 IWQI Intercept 121.406 61.196 158.519 128.457 
EC 13.213 −0.109 11.279 −4.054 EC −1.2 1.528 −0.846 −4.285 
TDS −0.041 0.005 −0.01 0.011 TDS −0.012 0.003 −0.045 −0.025 
Mg2+ −1.534 −5.477 −1.762 −1.636 Na+ −22.597 0.234 −13.132 −22.377 
Na+ 10.863 6.837 10.035 12.865 Cl −1.955 0.448 −2.153 −1.063 
Cl −0.747 1.908 −2.146 −2.78 SAR −5.03 0.353 −44.467 −3.022 
HCO3 −4.85 −2.43 −5.291 −4.784 ESP 42.4 0.3 59.8 44.6 
     PS 4.591 0.427 3.872 4.5 
     PI −0.198 −0.035 −0.362 −0.215 
     %Na 0.078 0.028 0.139 0.031 
     KR −53.374 0.493 −40.297 −61.99 
Table 15

Performance evaluation of regression models in training and testing datasets

WQ parameter  Statistical measures  Training (MLR PLSR PCR WLSR)  Testing (MLR PLSR PCR WLSR)
SAR RMSE 0.21 0.26 0.16 0.23 0.24 0.26 0.24 0.28 
MAPE 0.27 0.23 0.17 0.25 0.22 0.23 0.16 0.23 
NSEC 0.95 0.93 0.97 0.94 0.92 0.91 0.93 0.89 
R2 0.97 0.93 0.97 0.96 0.93 0.92 0.93 0.91 
VAF 96.93% 92.98% 97.35% 95.31% 92.96% 92.02% 93.01% 89.50% 
WI 0.988 0.982 0.993 0.986 0.981 0.975 0.981 0.975 
RBIAS −29.57 0.00 0.00 −24.10 −5.34 8.37 5.50 −2.46 
KR RMSE 0.17 0.18 0.15 0.17 0.21 0.24 0.23 0.25 
MAPE 0.48 0.41 0.35 0.42 0.35 0.34 0.28 0.37 
NSEC 0.77 0.74 0.82 0.78 0.68 0.60 0.63 0.56 
R2 0.82 0.74 0.82 0.80 0.68 0.66 0.66 0.56 
VAF 81.51% 74.23% 82.43% 79.74% 68.45% 64.78% 65.66% 56.23% 
WI 0.940 0.921 0.949 0.937 0.895 0.853 0.870 0.846 
RBIAS −39.12 0.00 0.00 −28.74 −1.41 16.11 11.71 4.06 
%Na RMSE 8.24 8.76 8.13 8.88 7.72 8.92 8.37 9.32 
MAPE 0.28 0.30 0.27 0.24 0.19 0.22 0.19 0.19 
NSEC 0.61 0.56 0.62 0.55 0.67 0.56 0.61 0.52 
R2 0.61 0.56 0.62 0.60 0.69 0.63 0.64 0.57 
VAF 61.18% 55.77% 61.93% 54.92% 68.72% 61.78% 64.45% 52.79% 
WI 0.869 0.837 0.868 0.873 0.888 0.837 0.867 0.860 
RBIAS −6.15 0.00 0.00 −6.70 5.12 9.75 7.24 3.48 
PI RMSE 10.06 10.83 9.95 10.05 10.95 12.96 11.54 11.46 
MAPE 0.12 0.12 0.11 0.11 0.08 0.12 0.09 0.09 
NSEC 0.64 0.59 0.65 0.64 0.60 0.43 0.55 0.56 
R2 0.64 0.58 0.65 0.65 0.62 0.44 0.57 0.61 
VAF 64.34% 58.55% 65.05% 64.47% 59.91% 44.55% 55.61% 56.24% 
WI 0.887 0.854 0.883 0.874 0.833 0.770 0.809 0.801 
RBIAS −1.83 0.00 0.00 −2.12 1.24 2.63 1.60 1.68 
ESP RMSE 0.26 0.39 0.25 0.28 0.30 0.39 0.33 0.40 
MAPE 3.20 4.60 2.20 1.27 0.50 0.99 0.49 0.88 
NSEC 0.96 0.92 0.96 0.96 0.94 0.90 0.93 0.89 
R2 0.96 0.91 0.96 0.96 0.94 0.91 0.93 0.91 
VAF 96.33% 91.53% 96.48% 95.59% 94.11% 91.03% 93.00% 89.77% 
WI 0.99 0.98 0.99 0.99 0.98 0.97 0.98 0.97 
RBIAS −15.19 0.00 0.00 −7.07 11.18 29.91 16.00 15.13 
IWQI RMSE 5.75 8.31 5.45 5.96 6.75 8.82 6.93 7.25 
MAPE 0.07 0.10 0.07 0.07 0.09 0.12 0.09 0.09 
NSEC 0.59 0.15 0.63 0.56 0.55 0.23 0.52 0.48 
R2 0.59 0.14 0.63 0.58 0.59 0.29 0.56 0.57 
VAF 59.31% 14.93% 63.41% 56.47% 55.69% 23.65% 53.75% 48.42% 
WI 0.861 0.496 0.877 0.861 0.867 0.496 0.855 0.862 
RBIAS −0.63 0.00 0.00 −1.57 −1.77 −1.64 −2.10 −1.41 
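The statistical measures listed in Table 15 can be computed from paired observed and predicted values. The sketch below uses commonly cited definitions, which are assumptions here because the formulas are not restated in this section: NSEC as the Nash–Sutcliffe efficiency, WI as Willmott's index of agreement, VAF as the variance accounted for, and RBIAS with the sign convention stated in the text (negative when the model overestimates). The data are hypothetical.

```python
import math
from statistics import mean, pvariance


def evaluate(obs, pred):
    """Goodness-of-fit measures for paired observed and predicted values."""
    n = len(obs)
    err = [o - p for o, p in zip(obs, pred)]  # observed minus predicted
    o_bar, p_bar = mean(obs), mean(pred)
    sse = sum(e * e for e in err)
    sst = sum((o - o_bar) ** 2 for o in obs)
    spp = sum((p - p_bar) ** 2 for p in pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAPE": mean(abs(e / o) for e, o in zip(err, obs)),  # fraction; obs must be non-zero
        "NSEC": 1 - sse / sst,
        "R2": cov ** 2 / (sst * spp),  # squared Pearson correlation
        "WI": 1 - sse / sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                            for o, p in zip(obs, pred)),
        "VAF": 100 * (1 - pvariance(err) / pvariance(obs)),
        "RBIAS": 100 * sum(err) / sum(obs),  # < 0 means overestimation
    }


observed = [1.0, 2.0, 3.0, 4.0]    # hypothetical index values
predicted = [1.1, 1.9, 3.2, 3.8]   # hypothetical model output
for name, value in evaluate(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

Reporting several measures together is useful because a model can have a high R2 and still be biased, which only RBIAS reveals.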

Factor importance in the PLSR model

In the PLSR model, the variable importance in projection (VIP) is determined, with latent vectors (LVs) selected at the 95% confidence interval, to evaluate the relative importance of the independent variables for the predicted variables (SAR, KR, %Na, PI, ESP, and IWQI) (Tran et al. 2014). Figure 10 shows the significance of the explanatory variables for the prediction of these six indices. The optimum number of LVs is 2 for SAR, KR, %Na, PI, and ESP, whereas for IWQI only one LV is retained to avoid overfitting of the PLSR model; these findings are partially in line with Gad et al. (2023). Na+ is the most influential predictor in both LVs, whereas Mg2+ has a relatively higher impact on the prediction of KR, %Na, PI, and ESP in LV2, with VIP > 1. TDS and EC show a higher influence on the prediction of SAR, PI, and ESP in LV1, whereas the prediction of IWQI is mostly dominated by PI, Cl−, and PS in LV1. Table 16 shows the correlation of the independent and dependent WQ variables with the latent vectors through R2X and R2Y, respectively.
Table 16

Correlation of independent and explanatory variables with the latent vectors

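| Index | Latent vector | SAR | KR | %Na | PI | ESP | IWQI |
|---|---|---|---|---|---|---|---|
| R2X | LV1 | 0.666 | 0.648 | 0.740 | 0.656 | 0.773 | 0.624 |
| | LV2 | 0.776 | 0.746 | 0.868 | 0.752 | 0.863 | |
| R2Y | LV1 | 0.631 | 0.296 | 0.207 | 0.277 | 0.588 | 0.149 |
| | LV2 | 0.930 | 0.742 | 0.558 | 0.586 | 0.915 | |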
Figure 10

Relative importance of each input variable for the prediction of (a) SAR; (b) KR; (c) %Na; (d) PI; (e) ESP; and (f) IWQI in the projection of LV1 and LV2.
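For reference, the VIP score is usually defined as below. This is the commonly used expression and is assumed here, since the section does not restate it; p is the number of predictors, A the number of retained LVs, w_{ja} the PLS weight of predictor j on LV a, and SSY_a the sum of squares of the response explained by LV a.

```latex
\[
\mathrm{VIP}_j = \sqrt{\,p \,
\frac{\sum_{a=1}^{A} SSY_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
     {\sum_{a=1}^{A} SSY_a}}
\qquad\text{with}\qquad
\frac{1}{p}\sum_{j=1}^{p} \mathrm{VIP}_j^{2} = 1
\]
```

Because the squared VIP scores average to one across predictors, VIP > 1 marks a predictor that contributes more than the average variable.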

Importance of input variables in the PCR model

PCs are retained based on the Eigen value (>1) for the prediction of SAR, KR, %Na, PI, ESP, and IWQI in the PCR model. The importance of factors in the first two PCs is shown by the PCA loading plots in Figure 11. EC, TDS, Mg2+, Na+, Cl−, and HCO3− have high positive loadings on PC1, whereas pH is strongly correlated with PC2, explaining total variances of 87.2, 82, and 88.4% for the prediction of SAR; KR and PI; and %Na and ESP, respectively. For the prediction of IWQI, a strong positive correlation with PC1 is observed for EC (0.848), TDS (0.875), Na+ (0.964), Cl− (0.817), SAR (0.931), ESP (0.938), and PS (0.828), and PI (0.929) has a significant positive correlation with PC2, with a total explained variance of 92.1%.
Figure 11

Factor loadings of different chemical constituents of GW samples for prediction of (a) SAR; (b) KR and PI; (c) %Na and ESP; and (d) IWQI.

Model adequacy based on input parameters in the PCR model

In this study, statistical measures are determined to check the adequacy of the PCR model for the prediction of SAR, KR, %Na, PI, ESP, and IWQI based on their possible input parameters. Four statistical parameters, namely DW, Mallows' Cp, AIC, and BIC, are evaluated to examine the goodness of fit of the PCR model (Table 17). Similar statistical measures were studied by Kouadri et al. (2021). The DW statistic indicates no serial correlation over the time series (2014–2021) for KR, %Na, and PI with the available input parameters. However, it indicates positive or negative serial correlation for SAR, ESP, and IWQI (a DW value below 2 suggests positive and a value above 2 negative serial correlation). The PCR model fits IWQI poorly, as indicated by the highest Mallows' Cp (11). Based on the lower AIC and BIC, the PCR model shows the highest goodness of fit for the prediction of KR, followed by SAR, IWQI, %Na, PI, and ESP.

Table 17

Subset regression analysis for input irrigation water quality parameters in the PCR model

| WQ index | Input variables | Durbin–Watson (DW) | Mallows' Cp | Akaike information criterion (AIC) | Bayesian information criterion (BIC) |
| --- | --- | --- | --- | --- | --- |
| SAR | EC/pH/Na+/Mg2+/TDS | 2.3 | | −808.1 | −786.9 |
| KR | EC/pH/Na+/Mg2+/TDS/HCO3/Cl | | | −943.6 | −915.4 |
| %Na | EC/Na+/Mg2+/TDS/HCO3/Cl | | | 1,070 | 1,094.7 |
| PI | EC/pH/Na+/Mg2+/TDS/HCO3/Cl | | | 1,173.9 | 1,202.2 |
| ESP | EC/Na+/Mg2+/TDS/HCO3/Cl | 2.1 | | 1,635.8 | 1,660.5 |
| IWQI | EC/TDS/Na+/Cl/SAR/KR/%Na/PI/ESP/PS | 1.9 | 11 | 900.9 | 939.7 |
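
The adequacy measures in Table 17 can be computed from model residuals, as in the minimal sketch below. The DW formula is the standard one; for AIC and BIC the Gaussian least-squares forms n ln(RSS/n) + 2k and n ln(RSS/n) + k ln(n) are assumed, since the source does not state which form was used. The residuals are synthetic.

```python
import math

def durbin_watson(residuals):
    """DW statistic: near 2 means no serial correlation in the residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def aic_bic(residuals, n_params):
    """AIC and BIC in the least-squares form (an assumed convention)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    base = n * math.log(rss / n)
    return base + 2 * n_params, base + n_params * math.log(n)

# Synthetic residuals that flip sign at every step (strong negative serial correlation).
residuals = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
dw = durbin_watson(residuals)            # well above 2
aic, bic = aic_bic(residuals, n_params=2)
```

Under this convention, AIC and BIC become negative whenever the mean squared residual is below 1, which would be consistent with the negative values listed for SAR and KR.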

Evaluation of results by ML models

K-fold cross-validation of the SVM model

The K-fold cross-validation (CV) technique is adopted in this study to assess the performance of the SVM model for predicting all six IWQIs. A trial-and-error procedure with 3-, 5-, 7-, and 10-fold CV is used to find the optimum value of K based on RMSE, MAE, and R2 (Figure 12), similar to the study by Asrol et al. (2021). Three-fold CV gives the highest goodness of fit for SAR (RMSE = 0.16 and R2 = 0.97) and IWQI (RMSE = 6.64 and R2 = 0.45). Similarly, 5-fold CV gives the highest goodness of fit for KR (RMSE = 0.17 and R2 = 0.75) and ESP (RMSE = 3.07 and R2 = 0.94). However, 7-fold and 10-fold CV give satisfactory performance of the SVM model for PI (RMSE = 10.82 and R2 = 0.58) and %Na (RMSE = 9.34 and R2 = 0.50), respectively. An important observation is that higher values of K yield relatively lower R2 and higher RMSE and MAE; thus, the SVM model with a lower K is best suited for the prediction of the WQ variables in this study. Mamat et al. (2021) found that, among 3-, 5-, 7-, 10-, and 15-fold CV, the 5-fold CV technique is the most effective for the prediction of WQI in Langat River, Kajang, through the SVR model.
Figure 12

Evaluation of (a) RMSE, (b) R2, and (c) MAE by using K-fold cross-validation for the SVM model.
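
The trial-and-error search over K can be sketched as below. This is an illustration of K-fold CV only: an ordinary least-squares line stands in for the SVM regressor, the data are synthetic, and all function names are introduced here for illustration.

```python
import math

def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the SVM regressor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx

def cv_rmse(x, y, k):
    """Mean RMSE over the k held-out folds."""
    fold_rmse = []
    for test_idx in kfold_indices(len(x), k):
        held = set(test_idx)
        x_tr = [x[i] for i in range(len(x)) if i not in held]
        y_tr = [y[i] for i in range(len(x)) if i not in held]
        slope, intercept = fit_line(x_tr, y_tr)
        sq = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in test_idx]
        fold_rmse.append(math.sqrt(sum(sq) / len(sq)))
    return sum(fold_rmse) / k

# Synthetic data (not from the study): y = 2x + 1 with a small alternating disturbance.
x = [float(i) for i in range(30)]
y = [2.0 * xi + 1.0 + (0.3 if i % 2 else -0.3) for i, xi in enumerate(x)]

# Evaluate the same candidate values of K as in the study and keep the lowest RMSE.
scores = {k: cv_rmse(x, y, k) for k in (3, 5, 7, 10)}
best_k = min(scores, key=scores.get)
```

In practice the samples would normally be shuffled before splitting, and MAE and R2 would be tracked alongside RMSE, as in Figure 12.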


Performance evaluation of ML models

Five ML models, i.e. ANN, SVM, CART, CRRF, and KNN, are used in this study to predict SAR, KR, %Na, PI, ESP, and IWQI. Table 18 shows the statistical measures used to check the reliability of the proposed models. The ANN model gives the highest prediction accuracy for SAR, followed by ESP, KR, PI, and %Na, with R2 of 0.98, 0.97, 0.94, 0.93, and 0.90 and RMSE of 0.11, 0.24, 0.09, 4.41, and 4.19, respectively, in the training period, with broadly similar values in the testing dataset, in line with El Bilali & Taleb (2020) and Gad et al. (2023). In the training phase, ANN shows its largest overestimation for ESP (RBIAS = −5.37) and its largest underestimation for %Na (RBIAS = 3.91).

Table 18

Performance evaluation of ML models in training and testing datasets

| WQ parameter | Statistical measure | ANN (training) | SVM (training) | CART (training) | CRRF (training) | KNN (training) | ANN (testing) | SVM (testing) | CART (testing) | CRRF (testing) | KNN (testing) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAR | RMSE | 0.11 | 0.16 | 0.18 | 0.38 | 0.46 | 0.11 | 0.25 | 0.37 | 0.23 | 0.41 |
| | MAPE | 0.11 | 0.14 | 0.08 | 0.11 | 0.41 | 0.10 | 0.15 | 0.14 | 0.11 | 0.36 |
| | NSEC | 0.99 | 0.97 | 0.97 | 0.84 | 0.78 | 0.98 | 0.92 | 0.82 | 0.93 | 0.77 |
| | R2 | 0.98 | 0.97 | 0.96 | 0.86 | 0.78 | 0.98 | 0.92 | 0.84 | 0.94 | 0.77 |
| | VAF | 98.60% | 97.17% | 96.58% | 84.36% | 77.62% | 98.54% | 92.27% | 83.35% | 93.74% | 78.28% |
| | WI | 0.996 | 0.993 | 0.991 | 0.949 | 0.929 | 0.996 | 0.979 | 0.944 | 0.980 | 0.937 |
| | RBIAS | −0.22 | 1.88 | 0.00 | 4.58 | −7.50 | 2.06 | 6.55 | 8.76 | 6.29 | 7.69 |
| KR | RMSE | 0.09 | 0.15 | 0.15 | 0.17 | 0.22 | 0.10 | 0.25 | 0.25 | 0.20 | 0.23 |
| | MAPE | 0.14 | 0.28 | 0.13 | 0.20 | 0.50 | 0.11 | 0.26 | 0.26 | 0.18 | 0.33 |
| | NSEC | 0.94 | 0.81 | 0.82 | 0.76 | 0.63 | 0.93 | 0.56 | 0.56 | 0.72 | 0.65 |
| | R2 | 0.94 | 0.82 | 0.82 | 0.79 | 0.64 | 0.93 | 0.62 | 0.60 | 0.74 | 0.68 |
| | VAF | 93.72% | 81.52% | 82.39% | 76.42% | 63.64% | 93.34% | 61.76% | 57.11% | 73.56% | 67.81% |
| | WI | 0.983 | 0.948 | 0.949 | 0.916 | 0.874 | 0.982 | 0.854 | 0.874 | 0.907 | 0.881 |
| | RBIAS | 1.40 | 12.83 | 0.00 | 4.00 | −9.92 | 1.32 | 17.06 | 6.10 | 7.79 | 11.96 |
| %Na | RMSE | 4.19 | 9.61 | 5.52 | 5.70 | 8.34 | 4.60 | 9.89 | 8.74 | 6.20 | 7.82 |
| | MAPE | 0.10 | 0.22 | 0.15 | 0.14 | 0.28 | 0.09 | 0.18 | 0.19 | 0.12 | 0.20 |
| | NSEC | 0.90 | 0.47 | 0.82 | 0.81 | 0.60 | 0.88 | 0.46 | 0.58 | 0.79 | 0.66 |
| | R2 | 0.90 | 0.59 | 0.82 | 0.82 | 0.60 | 0.89 | 0.55 | 0.60 | 0.81 | 0.70 |
| | VAF | 90.01% | 46.82% | 82.46% | 81.30% | 60.21% | 88.98% | 48.82% | 58.31% | 80.56% | 69.16% |
| | WI | 0.973 | 0.864 | 0.950 | 0.942 | 0.862 | 0.968 | 0.849 | 0.879 | 0.934 | 0.884 |
| | RBIAS | 3.91 | −0.59 | 0.00 | −0.45 | −6.31 | 3.33 | 6.80 | 2.87 | 5.47 | 6.95 |
| PI | RMSE | 4.41 | 10.55 | 6.36 | 7.84 | 8.60 | 6.66 | 12.15 | 12.53 | 9.27 | 7.63 |
| | MAPE | 0.05 | 0.10 | 0.07 | 0.09 | 0.10 | 0.05 | 0.09 | 0.12 | 0.08 | 0.08 |
| | NSEC | 0.93 | 0.61 | 0.86 | 0.78 | 0.74 | 0.85 | 0.50 | 0.47 | 0.71 | 0.80 |
| | R2 | 0.93 | 0.61 | 0.86 | 0.79 | 0.74 | 0.86 | 0.56 | 0.52 | 0.71 | 0.81 |
| | VAF | 93.18% | 61.44% | 85.72% | 78.32% | 73.97% | 85.09% | 52.74% | 47.19% | 71.07% | 80.78% |
| | WI | 0.982 | 0.866 | 0.960 | 0.931 | 0.919 | 0.955 | 0.775 | 0.845 | 0.911 | 0.943 |
| | RBIAS | −1.24 | 5.06 | 0.00 | −0.59 | −1.91 | 0.35 | 3.84 | 0.07 | −0.14 | 1.54 |
| ESP | RMSE | 0.24 | 0.28 | 0.51 | 0.51 | 0.62 | 0.16 | 0.37 | 0.34 | 0.29 | 0.54 |
| | MAPE | 1.63 | 2.12 | 1.50 | 2.65 | 11.10 | 0.62 | 0.62 | 0.88 | 0.61 | 2.03 |
| | NSEC | 0.97 | 0.96 | 0.85 | 0.86 | 0.78 | 0.98 | 0.91 | 0.92 | 0.94 | 0.80 |
| | R2 | 0.97 | 0.96 | 0.85 | 0.87 | 0.78 | 0.98 | 0.92 | 0.93 | 0.94 | 0.82 |
| | VAF | 96.76% | 95.70% | 85.22% | 85.64% | 78.24% | 98.25% | 91.30% | 92.57% | 94.47% | 81.82% |
| | WI | 0.99 | 0.99 | 0.96 | 0.95 | 0.93 | 1.00 | 0.98 | 0.98 | 0.98 | 0.95 |
| | RBIAS | −5.37 | 2.64 | 0.00 | 19.02 | −30.95 | 0.15 | 16.70 | 11.24 | 14.08 | 32.26 |
| IWQI | RMSE | 4.22 | 5.84 | 3.58 | 3.57 | 3.96 | 4.83 | 7.83 | 5.46 | 5.27 | 4.29 |
| | MAPE | 0.05 | 0.07 | 0.04 | 0.03 | 0.05 | 0.06 | 0.11 | 0.07 | 0.06 | 0.05 |
| | NSEC | 0.78 | 0.58 | 0.84 | 0.84 | 0.81 | 0.77 | 0.39 | 0.70 | 0.72 | 0.82 |
| | R2 | 0.78 | 0.58 | 0.84 | 0.84 | 0.81 | 0.77 | 0.43 | 0.71 | 0.74 | 0.83 |
| | VAF | 78.13% | 58.02% | 84.22% | 84.35% | 80.68% | 76.84% | 41.54% | 70.86% | 74.07% | 82.64% |
| | WI | 0.936 | 0.842 | 0.956 | 0.956 | 0.944 | 0.931 | 0.779 | 0.901 | 0.916 | 0.944 |
| | RBIAS | 0.46 | 0.30 | 0.00 | −0.60 | 0.57 | −0.48 | −2.73 | −1.18 | −2.23 | −1.65 |

The SVM model is more adequate for the prediction of SAR, ESP, and KR in the training phase than in the testing phase, with R2 of 0.97, 0.96, and 0.82 and RMSE of 0.16, 0.28, and 0.15, respectively, in the training phase; similar findings are reported by Elsayed et al. (2020) and Mokhtar et al. (2022). However, it fails to give satisfactory predictions for %Na and IWQI, with relatively low VAF and NSEC in both training and testing phases, in line with Wang et al. (2020). SVM shows its largest underestimation for KR, which is greater in the testing phase (RBIAS = 17.06) than in the training phase (RBIAS = 12.83).

The CART model performs better in the training dataset for all the WQ variables, with significantly higher R2 and VAF. However, in the testing phase it exhibits higher predictive strength only for SAR and ESP, with R2 of 0.84 and 0.93 and VAF of 83.35 and 92.57%, respectively. The CART predictions show no systematic bias for any of the six parameters in the training dataset, with RBIAS equal to zero.

The CRRF model predicts SAR and ESP better in the testing phase than in the training phase, with R2 of 0.94 for both variables and VAF of 93.74 and 94.47%, respectively, in the testing phase. On the contrary, it predicts %Na and IWQI moderately better in the training phase (R2%Na = 0.82 and R2IWQI = 0.84) than in the testing phase (R2%Na = 0.81 and R2IWQI = 0.74). KNN shows higher predictive strength for IWQI in the testing phase (R2 = 0.83; VAF = 82.64%) than in the training phase (R2 = 0.81; VAF = 80.68%), with relatively low overestimation in the testing phase (RBIAS = −1.65). It also provides moderately higher prediction accuracy in the testing phase than in the training phase for KR, %Na, PI, and ESP, with R2 of 0.64, 0.60, 0.74, and 0.78, respectively, in the training phase and 0.68, 0.70, 0.81, and 0.82, respectively, in the testing phase, similar to the findings of El Bilali & Taleb (2020). It significantly overestimates ESP in the training dataset (RBIAS = −30.95) and significantly underestimates it in the testing dataset (RBIAS = 32.26).
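
The statistical measures reported in Table 18 can be sketched as follows. The definitions below are commonly used forms and are assumptions here: R2 is taken as the squared Pearson correlation, MAPE is expressed as a fraction, and RBIAS is signed so that a negative value indicates overestimation, which matches the interpretation used in the text. The observed and predicted values are synthetic.

```python
import math

def evaluate(obs, pred):
    """Goodness-of-fit measures for observed vs predicted values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    err = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    # Squared Pearson correlation between observations and predictions
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    spp = sum((p - mean_pred) ** 2 for p in pred)
    # Variance of the errors, used for the variance accounted for (VAF)
    mean_err = sum(err) / n
    var_err = sum((e - mean_err) ** 2 for e in err) / n
    # Denominator of Willmott's index of agreement (WI)
    wi_den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                 for o, p in zip(obs, pred))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAPE": sum(abs(e / o) for e, o in zip(err, obs)) / n,
        "NSEC": 1.0 - sse / sst,
        "R2": cov * cov / (sst * spp),
        "VAF": 100.0 * (1.0 - var_err / (sst / n)),
        "WI": 1.0 - sse / wi_den,
        "RBIAS": 100.0 * sum(err) / sum(obs),  # negative: overestimation
    }

# Synthetic example (not from the study).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.0]
measures = evaluate(obs, pred)

# A constant positive offset is a pure overestimation: RBIAS < 0 while VAF stays at 100%.
offset = evaluate(obs, [o + 0.5 for o in obs])
```

The offset case shows why RBIAS is reported next to R2 and VAF: a model can track the variation of an index closely and still be biased on average.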

Variable importance in the CRRF model

The CRRF model uses 5, 7, 6, 7, 6, and 10 input WQ parameters for the prediction of SAR, KR, %Na, PI, ESP, and IWQI, respectively. Mean increase error is the governing indicator in the CRRF model, and a higher value reflects a greater contribution of the predictor to the evaluation of the dependent variable (Zhang et al. 2018). Figure 13 shows the importance of the input variables by mean increase error. Na+ has the highest influence on the prediction of SAR, KR, %Na, ESP, and PI, followed by Mg2+, EC, TDS, HCO3, and Cl. Cl has the highest impact on the prediction of IWQI, followed by EC, TDS, Na+, PS, SAR, PI, ESP, %Na, and KR. Shams et al. (2024) proposed factor importance in the RF model for determination of WQI.
Figure 13

Variable importance in the CRRF model by mean increase error for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI.
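
The idea behind a mean-increase-error ranking can be illustrated with a permutation-based sketch: the values of one input are shuffled, and the resulting rise in mean squared error is averaged over several repeats. In a random forest this is typically computed on out-of-bag samples; the version below is model-agnostic, uses a toy function in place of the CRRF model, and runs on synthetic data, so it is an assumption-laden illustration and not the procedure of this study.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of a model over a set of input rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def mean_increase_error(model, rows, targets, n_repeats=20, seed=0):
    """Average rise in MSE after shuffling each input column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between input j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(model, permuted, targets) - baseline
        increases.append(total / n_repeats)
    return increases

def toy_model(row):
    """Hypothetical fitted model: strong on input 0, weak on input 1, ignores input 2."""
    return 3.0 * row[0] + 0.5 * row[1]

# Synthetic inputs (not from the study).
rows = [[float(i), float((7 * i) % 10), float((3 * i) % 5)] for i in range(40)]
targets = [toy_model(r) for r in rows]
importance = mean_increase_error(toy_model, rows, targets)
```

An input that the model does not use produces no increase in error, whereas shuffling a dominant input degrades the predictions the most.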

Figure 14 illustrates the comparison of mean and standard deviation (SD) of the six IWQIs between the observed value and the predicted value from different models in training and testing datasets. The findings are fully in agreement with the RBIAS obtained in training and testing datasets, as shown in Tables 15 and 18. El Bilali & Taleb (2020) showed considerable overestimation and underestimation in the prediction of SAR, KR, %Na, and ESP, respectively, in the Bouregreg watershed in Morocco. Dimple et al. (2022) illustrated higher prediction accuracy of the SVM model for the prediction of SAR and KR, having higher R2 in GW of the Nand Samand catchment of Rajasthan.
Figure 14

Statistical evaluation (mean and standard deviation) of estimated and predicted values for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI.


Classification accuracy of ML models

Eight statistical parameters, i.e. sensitivity (SE), specificity (SP), accuracy (A), positive predictive value (PPV), false positive rate (FPR), Matthews correlation coefficient (MCC), F1-score, and error rate (ERR), are evaluated from the confusion matrix to assess the classification accuracy of water restriction zones by the proposed ML models, namely, SVM, CART, CRRF, and KNN, in training and testing periods. Table 19 shows that the CART and CRRF models have significantly higher classification accuracy than the other two models, with comparatively higher SE, SP, A, MCC, and F1-score in both training and testing datasets, similar to the findings of Bhoi et al. (2022) and Khan et al. (2022). However, the CART model (ERRtraining = 0.10 and ERRtesting = 0.10) has a higher misclassification rate than the CRRF model (ERRtraining = 0.06 and ERRtesting = 0.05). In the testing dataset, KNN gives the highest misclassification rate (ERR = 0.14), followed by SVM (ERR = 0.12).

Table 19

Accuracy assessment of ML models for training and testing datasets from the confusion matrix

| Statistical measure | SVM (training) | SVM (testing) | CART (training) | CART (testing) | CRRF (training) | CRRF (testing) | KNN (training) | KNN (testing) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SE | 0.74 | 0.69 | 0.75 | 0.74 | 0.85 | 0.87 | 0.78 | 0.72 |
| SP | 0.94 | 0.92 | 0.94 | 0.94 | 0.96 | 0.97 | 0.93 | 0.91 |
| A | 0.90 | 0.88 | 0.90 | 0.90 | 0.94 | 0.95 | 0.89 | 0.86 |
| PPV | 0.74 | 0.69 | 0.75 | 0.74 | 0.85 | 0.87 | 0.78 | 0.72 |
| FPR | 0.06 | 0.08 | 0.06 | 0.06 | 0.04 | 0.03 | 0.07 | 0.09 |
| MCC | 0.68 | 0.62 | 0.68 | 0.68 | 0.81 | 0.84 | 0.70 | 0.62 |
| F1-score | 0.74 | 0.69 | 0.75 | 0.74 | 0.85 | 0.87 | 0.78 | 0.72 |
| ERR | 0.10 | 0.12 | 0.10 | 0.10 | 0.06 | 0.05 | 0.11 | 0.14 |
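
The measures in Table 19 can be derived from a multi-class confusion matrix as sketched below. The source does not state how the per-class counts were averaged; this sketch assumes one-vs-rest counts pooled over all classes (micro-averaging), under which SE, PPV, and F1-score coincide, as they do in every column of Table 19. The 3-class matrix is hypothetical.

```python
import math

def pooled_metrics(cm):
    """One-vs-rest counts pooled over classes; cm[i][j] = true class i predicted as j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = fp = fn = tn = 0
    for c in range(k):
        tp_c = cm[c][c]
        fn_c = sum(cm[c]) - tp_c                        # class c predicted as another class
        fp_c = sum(cm[r][c] for r in range(k)) - tp_c   # other classes predicted as class c
        tp, fn, fp = tp + tp_c, fn + fn_c, fp + fp_c
        tn += total - tp_c - fn_c - fp_c
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SE": se,
        "SP": tn / (tn + fp),
        "A": acc,
        "PPV": ppv,
        "FPR": fp / (fp + tn),
        "MCC": mcc,
        "F1": 2 * ppv * se / (ppv + se),
        "ERR": 1.0 - acc,
    }

# Hypothetical confusion matrix for three classes (rows: true class, columns: predicted class).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]
metrics = pooled_metrics(cm)
```

With pooled counts every misclassified sample is counted once as a false negative and once as a false positive, so the pooled accuracy A exceeds the plain fraction of correctly classified samples (which here equals SE).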

Results of the Friedman test

The Friedman test is carried out by determining the p-value (Table 20) from the correlation matrix to show whether a significant difference exists between any two models at a significance level of p < 0.05. For the prediction of SAR, KR, %Na, PI, and ESP, a significant difference in outputs is observed between MLR and the remaining models (p < 0.05). For IWQI, a significant difference in output is observed only between PLSR and SVM (p = 0.041).

Table 20

p-value in the Friedman test
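
The omnibus form of the Friedman test can be sketched as below. The study reports pairwise p-values between models; this illustration only shows how the rank-based test statistic is built, uses three hypothetical models so that the chi-square approximation has 2 degrees of freedom (for which the p-value is exp(−χ²/2)), and applies no tie correction. The data are synthetic.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square; blocks[i][j] = output of model j for sample i."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))

# Synthetic predictions of three hypothetical models for six samples.
blocks = [[2.1, 2.4, 2.9], [1.8, 2.0, 2.6], [3.0, 3.3, 3.1],
          [2.5, 2.7, 3.2], [1.9, 2.2, 2.8], [2.2, 2.3, 3.0]]
chi2 = friedman_statistic(blocks)

# Chi-square survival function for k - 1 = 2 degrees of freedom.
p_value = math.exp(-chi2 / 2.0)
significant = p_value < 0.05
```

The chi-square approximation is asymptotic, so for very small samples exact tables or a dedicated statistics package would be preferable.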

 
 

This study highlights the importance of RS and GIS techniques, together with the efficient implementation of different regression and ML models, for the assessment of some important IWQIs of GW in Sundargarh district. A significant number of wells contain GW of good quality based on the spatio-temporal variation of SAR (100%), KR (93%), %Na (80%), ESP (100%), and IWQI (92%), indicating higher suitability of GW for irrigation during the pre-monsoon season. The dug wells of Durubaga, Ekma, Balichuuan, Banki, Lokedega, Balijori, Krinjikela, Talsara, Moshani Kani, Lathikata, R-06 Basanti Colony, R-07 Uditnagar 1, Himgiri, Laxmipose, and Bhasma contain GW of poor quality, drawn from BGC, schist, and alluvium aquifers. The variation of SAR, KR, %Na, and ESP follows a long-term monotonic increasing trend, and the variation of PI and IWQI shows a long-term monotonic decreasing trend at p > 0.1 during 2014–2021. PCA reveals that salinity and sodicity factors are the governing parameters dominating the GW quality through the prediction of IWQI. Subsequently, DA shows that Na+, Cl, SAR, KR, and PS are the governing water quality variables for the spatial variability of GW quality throughout the district. All the regression models show higher predictive strength for SAR and ESP. However, the PCR model is more acceptable for the prediction of KR, %Na, PI, and IWQI. ANN shows the highest predictive strength for SAR, KR, %Na, PI, and ESP based on the input parameters. On the other hand, the CART model shows comparatively higher prediction accuracy for all the WQ variables, followed by the CRRF, SVM, and KNN models. These observations illustrate that the regression and ML models can be effectively implemented, owing to their robustness and cost-effectiveness, in place of expensive and time-consuming conventional methods in arid and semi-arid regions for long-term data variation.
Also, the Friedman test results show a significant difference between the predicted values of the regression and ML models for all the WQ indices except IWQI. Overall, the outcomes of this study show that these regression and ML models can be highly suitable for efficient GW quality management, supporting the sustainable utilization and management of water resources in Sundargarh district.

The main limitation of this study is the availability of data for the WQ parameters, as the data used in the analysis cover only the pre-monsoon season during 2014–2021. Consequently, the spatio-temporal variation and trend of these WQ indices during the monsoon and post-monsoon seasons could not be shown; such information on seasonal GW quality would support the proper monitoring and management of GW for sustainable use, mainly for industrial and agricultural activities in this district.

In future, the effect of seasonal GW quality parameters on paddy cultivation and its crop characteristics can be studied comprehensively to develop a cost-effective irrigation management system for Sundargarh district.

All authors consent to participate.

All authors consent to publish.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abba
S. I.
,
Hadi
S. J.
,
Sammen
S. S.
,
Salih
S. Q.
,
Abdulkadir
R. A.
,
Pham
Q. B.
&
Yaseen
Z. M.
(
2020
)
Evolutionary computational intelligence algorithm coupled with self-tuning predictive model for water quality index determination
,
Journal of Hydrology
,
587
,
124974
.
Abuzir
S. Y.
&
Abuzir
Y. S.
(
2022
)
Machine learning for water quality classification
,
Water Quality Research Journal
,
57
(
3
),
152
164
.
Aldhyani
T. H.
,
Al-Yaari
M.
,
Alkahtani
H.
&
Maashi
M.
(
2020
)
Water quality prediction using artificial intelligence algorithms
,
Applied Bionics and Biomechanics
,
2020
(
1
),
6659314
.
American Public Health Association (APHA)
(
2005
)
Standard Methods for the Examination of Water and Wastewater
, 21st edn.
Washington DC
:
American Public Health Association
,
1220 p
.
Barbieri
M.
,
Barberio
M. D.
,
Banzato
F.
,
Billi
A.
,
Boschetti
T.
,
Franchini
S.
,
Gori
F.
&
Petitta
M.
(
2023
)
Climate change and its effect on groundwater quality
,
Environmental Geochemistry and Health
,
45
(
4
),
1133
1144
.
Batarseh
M.
,
Imreizeeq
E.
,
Tilev
S.
,
Al Alaween
M.
,
Suleiman
W.
,
Al Remeithi
A. M.
,
Al Tamimi
M. K.
&
Al Alawneh
M.
(
2021
)
Assessment of groundwater quality for irrigation in the arid regions using irrigation water quality index (IWQI) and GIS-zoning maps: Case study from Abu Dhabi Emirate, UAE
,
Groundwater for Sustainable Development
,
14
,
100611
.
Bhoi
S. K.
,
Mallick
C.
&
Mohanty
C. R.
(
2022
)
Estimating the water quality class of a major irrigation canal in Odisha, India: A supervised machine learning approach
,
Nature Environment and Pollution Technology
,
21
(
2
),
433
446
.
Breiman
L.
,
Friedman
J.
,
Stone
C. J.
&
Olshen
R. A.
(
1984
)
Classification and Regression Trees
.
New York
:
CRC Press
.
Central Ground Water Board (CGWB)
(
2013
)
Ground Water Information Booklet
.
Sundargarh district, Orissa
:
Ministry of Water Resources, Govt. of India
.
Central Ground Water Board
(
2017
)
National Compilation on Dynamic Ground Water Resources of India
.
Faridabad: Central Ground Water Board Ministry of Water Resources, River Development and Ganga Rejuvenation, Government of India
Central Ground Water Board
(
2022
)
National Compilation on Dynamic Ground Water Resources of India
.
Faridabad: Central Ground Water Board Ministry of Water Resources, River Development and Ganga Rejuvenation, Government of India
.
Chen
T.
,
Zhang
H.
,
Sun
C.
,
Li
H.
&
Gao
Y.
(
2018
)
Multivariate statistical approaches to identify the major factors governing groundwater quality
,
Applied Water Science
,
8
(
7
),
215
.
Chidambaram
S.
,
Prasanna
M. V.
,
Venkatramanan
S.
,
Nepolian
M.
,
Pradeep
K.
,
Panda
B.
,
Thivya
C.
&
Thilagavathi
R.
(
2022
)
Groundwater quality assessment for irrigation by adopting new suitability plot and spatial analysis based on fuzzy logic technique
,
Environmental Research
,
204
,
111729
.
Dao
P. U.
,
Heuzard
A. G.
,
Le
T. X. H.
,
Zhao
J.
,
Yin
R.
,
Shang
C.
&
Fan
C.
(
2023
)
The impacts of climate change on groundwater quality: A review
,
Science of the Total Environment
,
912
,
169241
.
Devi
N. D.
&
Arumugam
T.
(
2019
)
Salinity tolerance in vegetable crops: a review
,
Journal of Pharmacognosy and Phytochemistry
,
8
(
3
),
2717
2721
.
Dimple
D.
,
Rajput
J.
,
Al-Ansari
N.
&
Elbeltagi
A.
(
2022
)
Predicting irrigation water quality indices based on data-driven algorithms: Case study in semiarid environment
,
Journal of Chemistry
,
2022
(
1
),
4488446
.
Doneen
L. D.
(
1964
)
Notes on Water Quality in Agriculture
.
Davis
:
Water Science and Engineering, Paper 4001, Department of Water Sciences and Engineering, University of California
.
El Bilali
A.
&
Taleb
A.
(
2020
)
Prediction of irrigation water quality parameters using machine learning models in a semi-arid environment
,
Journal of the Saudi Society of Agricultural Sciences
,
19
(
7
),
439
451
.
El Bilali
A.
,
Taleb
A.
&
Brouziyne
Y.
(
2021
)
Groundwater quality forecasting using machine learning algorithms for irrigation purposes
,
Agricultural Water Management
,
245
,
106625
.
El-Rawy
M.
,
Batelaan
O.
,
Alshehri
F.
,
Almadani
S.
,
Ahmed
M. S.
&
Elbeltagi
A.
(
2023
)
An integrated GIS and machine-learning technique for groundwater quality assessment and prediction in Southern Saudi Arabia
,
Water
,
15
(
13
),
2448
.
Eyankware
M. O.
,
Akakuru
O. C.
&
Eyankware
O. E.
(
2022
)
Hydrogeophysical delineation of aquifer vulnerability in parts of Nkalagu area of Abakaliki, se. Nigeria
,
Sustainable Water Resources Management
,
8
(
1
),
39
.
Ganiyu
S. A.
,
Olurin
O. T.
,
Azeez
M. A.
,
Jegede
O. A.
,
Okeh
A.
&
Kuforiji
H. I.
(
2022
)
Evaluation of major anions, halide ions, nitrogen, and phosphorus contents in groundwater from shallow hand-dug wells near Ona River, Ibadan, Nigeria
,
International Journal of Environmental Science and Technology
,
19
(
6
),
4997
5014
.
Gorgij
A. D.
,
Askari
G.
,
Taghipour
A. A.
,
Jami
M.
&
Mirfardi
M.
(
2023
)
Spatiotemporal forecasting of the groundwater quality for irrigation purposes, using deep learning method: Long short-term memory (LSTM)
,
Agricultural Water Management
,
277
,
108088
.
Gupta
P.
,
Samui
P.
&
Quaff
A. R.
(
2024
)
Estimation of water quality index using modern-day machine learning algorithms
,
Sādhanā
,
49
(
3
),
208
.
Hashemi
S. Z.
,
Darzi-Naftchali
A.
,
Karandish
F.
,
Ritzema
H.
&
Solaimani
K.
(
2024
)
Enhancing agricultural sustainability with water and crop management strategies in modern irrigation and drainage networks
,
Agricultural Water Management
,
305
,
109110
.
Hutcheson
G. D.
&
Sofroniou
N.
(
1999
)
The Multivariate Social Scientist: Introductory Statistics Using Generalized Linear Models
.
Thousand Oaks, CA
:
Sage
.
Ibrahim
H.
,
Yaseen
Z. M.
,
Scholz
M.
,
Ali
M.
,
Gad
M.
,
Elsayed
S.
,
Khadr
M.
,
Hussein
H.
,
Ibrahim
H. H.
,
Eid
M. H.
,
Kovács
A.
,
Péter
S.
&
Khalifa
M. M.
(
2023
)
Evaluation and prediction of groundwater quality for irrigation using an integrated water quality index, machine learning models and GIS approaches: A representative case study
,
Water
,
15
(
4
),
694
.
Kelly
W. P.
(
1940
)
Permissible composition and concentration of irrigated waters
,
Proceedings of the American Society of Civil Engineers
,
66
,
607
613
.
https://doi.org/10.1061/TACEAT.0005384
.
Kendall
M. G.
(
1975
)
Rank Correlation Measures
, Vol.
202
.
London
:
Charles Griffin
, p.
15
.
Khalaf
R. M.
&
Hassan
W. H.
(
2013
)
Evaluation of irrigation water quality index IWQI for Al-Dammam confined aquifer in the west and southwest of Karbala city, Iraq
,
International Journal of Civil Engineering IJCE
,
23
,
21
34
.
Khan
M. S. I.
,
Islam
N.
,
Uddin
J.
,
Islam
S.
&
Nasir
M. K.
(
2022
)
Water quality prediction and classification based on principal component regression and gradient boosting classifier approach
,
Journal of King Saud University-Computer and Information Sciences
,
34
(
8
),
4773
4781
.
Kulisz
M.
,
Kujawska
J.
,
Przysucha
B.
&
Cel
W.
(
2021
)
Forecasting water quality index in groundwater using artificial neural network
,
Energies
,
14
(
18
),
5875
.
Kurunç
A.
,
Yürekli
K.
&
Cevik
O.
(
2005
)
Performance of two stochastic approaches for forecasting water quality and streamflow data from Yeşilιrmak River, Turkey
,
Environmental Modelling & Software
,
20
(
9
),
1195
1200
.
Laker
M. C.
&
Nortjé
G. P.
(
2019
)
Review of existing knowledge on soil crusting in South Africa
,
Advances in Agronomy
,
155
,
189
242
.
https://doi.org/10.1016/bs.agron.2019.01.002
.
Li
J.
,
Chen
J.
,
He
P.
,
Chen
D.
,
Dai
X.
,
Jin
Q.
&
Su
X.
(
2022
)
The optimal irrigation water salinity and salt component for high-yield and good-quality of tomato in Ningxia
,
Agricultural Water Management
,
274
,
107940
.
Mamat
N.
,
Hamzah
F. M.
&
Jaafar
O.
(
2021
)
Hybrid support vector regression model and K-fold cross validation for water quality index prediction in Langat River, Malaysia. bioRxiv
.
Mammeri
A.
,
Tiri
A.
,
Belkhiri
L.
,
Salhi
H.
,
Brella
D.
,
Lakouas
E.
,
Tahraoui
H.
,
Amrane
A.
&
Mouni
L.
(
2023
)
Assessment of surface water quality using water quality index and discriminant analysis method
,
Water
,
15
(
4
),
680
.
Mandal
T.
,
Das
J.
,
Rahman
A. S.
&
Saha
P.
(
2021
)
Rainfall insight in Bangladesh and India: Climate change and environmental perspective
. In:
Habitat, Ecology and Ekistics: Case Studies of Human-Environment Interactions in India
,
(Rukhsana, Haldar, A., Alam, A. & Satpati, L., eds.). Singapore: Springer
, pp.
53
74
.
Mann
H. B.
(
1945
)
Nonparametric tests against trend
,
Econometrica: Journal of the Econometric Society
,
13
,
245
259
.
Meireles
A. C. M.
,
Andrade
E. M. D.
,
Chaves
L. C. G.
,
Frischkorn
H.
&
Crisostomo
L. A.
(
2010
)
A new proposal of the classification of irrigation water
,
Revista Ciência Agronômica
,
41
,
349
357
.
Mitra
S.
,
Ghosh
S.
,
Satpathy
K. K.
,
Bhattacharya
B. D.
,
Sarkar
S. K.
,
Mishra
P.
&
Raja
P.
(
2018
)
Water quality assessment of the ecologically stressed Hooghly River Estuary, India: A multivariate approach
,
Marine Pollution Bulletin
,
126
,
592
599
.
M'nassri
S.
,
El Amri
A.
,
Nasri
N.
&
Majdoub
R.
(
2022
)
Estimation of irrigation water quality index in a semi-arid environment using data-driven approach
,
Water Supply
,
22
(
5
),
5161
5175
.
Mohammed
M. A.
,
Szabó
N. P.
&
Szűcs
P.
(
2022
)
Multivariate statistical and hydrochemical approaches for evaluation of groundwater quality in north Bahri City-Sudan
,
Heliyon
,
8
,
11
.
Mohammed
M. A.
,
Kaya
F.
,
Mohamed
A.
,
Alarifi
S.
,
Abdelrady
A.
,
Keshavarzi
A.
,
Szabó
N. P.
&
Szűcs
P.
(
2023
)
Application of GIS-based machine learning algorithms for prediction of irrigational groundwater quality indices
,
Frontiers in Earth Science
,
11
,
1274142
.
Mokhtar
A.
,
Elbeltagi
A.
,
Gyasi-Agyei
Y.
,
Al-Ansari
N.
&
Abdel-Fattah
M. K.
(
2022
)
Prediction of irrigation water quality indices based on machine learning and regression models
,
Applied Water Science
,
12
(
4
),
76
.
Mousazadeh
H.
,
Mahmudy-Gharaie
M. H.
,
Mosaedi
A.
&
Moussavi Harami
R.
(
2019
)
Hydrochemical assessment of surface and ground waters used for drinking and irrigation in Kardeh Dam Basin (NE Iran)
,
Environmental Geochemistry and Health
,
41
,
1235
1250
.
Nair
J. P.
&
Vijaya
M. S.
(
2022
) ‘
Exploratory
data analysis of Bhavani River water quality index data
’,
Proceedings of International Conference on Communication and Computational Technologies: iCCCT 2022
.
Singapore
:
Springer Nature Singapore
, pp.
971
987
.
Nosrati
K.
&
Van Den Eeckhaut
M.
(
2012
)
Assessment of groundwater quality using multivariate statistical techniques in Hashtgerd Plain, Iran
,
Environmental Earth Sciences
,
65
,
331
344
.
Patnaik
M.
,
Tudu
C.
&
Bagal
D. K.
(
2024
)
Monitoring groundwater quality using principal component analysis
,
Applied Geomatics
,
16
(
1
),
281
291
.
Rajmohan
N.
(
2020
)
Groundwater contamination issues in the shallow aquifer, Ramganga Sub-basin, India
. In:
Emerging Issues in the Water Environment During Anthropocene: A South East Asian Perspective
,
(Kumar, M., Snow, D. D. & Honda, Ryo, eds.). Singapore: Springer
, pp.
337
354
.
Richards
L. A.
(
1954
)
Diagnosis and Improvement of Saline and Alkali Soils (No. 60)
.
Washington, DC
:
US Government Printing Office
.
Saeed
O.
,
Székács
A.
,
Jordán
G.
,
Mörtl
M.
,
Abukhadra
M. R.
,
El-Sherbeeny
A. M.
&
Eid
M. H.
(
2024
)
Assessing surface water quality in Hungary's Danube basin using geochemical modeling, multivariate analysis, irrigation indices, and Monte Carlo simulation
,
Scientific Reports
,
14
(
1
),
18639
.
Saha
P.
&
Paul
B.
(
2019
)
Groundwater quality assessment in an industrial hotspot through interdisciplinary techniques
,
Environmental Monitoring and Assessment
,
191
,
1
20
.
Sen
P. K.
(
1968
)
Estimates of the regression coefficient based on Kendall's tau
,
Journal of the American Statistical Association
,
63
(
324
),
1379
1389
.
Şen
Z.
(
2012
)
Innovative trend analysis methodology
,
Journal of Hydrologic Engineering
,
17
(
9
),
1042
1046
.
Shams
M. Y.
,
Elshewey
A. M.
,
El-Kenawy
E. S. M.
,
Ibrahim
A.
,
Talaat
F. M.
&
Tarek
Z.
(
2024
)
Water quality prediction using machine learning models based on grid search method
,
Multimedia Tools and Applications
,
83
(
12
),
35307
35334
.
Sharma P. K., Vijay R. & Punia M. P. (2016) Characterization of groundwater quality of Tonk District, Rajasthan, India using factor analysis, International Journal of Environmental Sciences, 6 (4), 454–466.
Sreedevi P. D., Sreekanth P. D., Ahmed S. & Reddy D. V. (2019) Evaluation of groundwater quality for irrigation in a semi-arid region of South India, Sustainable Water Resources Management, 5, 1043–1056. https://doi.org/10.1007/s40899-018-0279-8.
Tejashvini A., Subbarayappa C. T., Mudalagiriyappa, Chowdappa H. D. & Ramamurthy V. (2024) Assessment of irrigation water quality for groundwater in Semi-Arid Region, Bangalore, Karnataka, Water Science, 38 (1), 548–568.
Todd D. K. & Mays L. W. (2004) Groundwater Hydrology. Hoboken, NJ: John Wiley & Sons.
Tran T. N., Afanador N. L., Buydens L. M. & Blanchet L. (2014) Interpretation of variable importance in partial least squares with significance multivariate correlation (sMC), Chemometrics and Intelligent Laboratory Systems, 138, 153–160.
Yan Y., Zhang Y., Yao R., Wei C., Luo M., Yang C., Chen S. & Huang X. (2024) Groundwater suitability assessment for irrigation and drinking purposes by integrating spatial analysis, machine learning, water quality index, and health risk model, Environmental Science and Pollution Research, 31, 1–22.
Yenilmez F., Keskin F. & Aksoy A. (2011) Water quality trend analysis in Eymir Lake, Ankara, Physics and Chemistry of the Earth, Parts A/B/C, 36 (5–6), 135–140.
Yıldız S. & Karakuş C. B. (2020) Estimation of irrigation water quality index with development of an optimum model: A case study, Environment, Development and Sustainability, 22, 4771–4786.
Zaman M., Shahid S. A. & Heng L. (2018) Irrigation water quality. In: Guideline for Salinity Assessment, Mitigation and Adaptation Using Nuclear and Related Techniques, (Zaman, M., Shahid, S. A. & Heng, L., eds.). London: Springer Open, pp. 113–131.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).