ABSTRACT
This study evaluates and predicts six irrigation water quality indices, namely sodium adsorption ratio (SAR), Kelly's ratio (KR), percentage sodium (%Na), permeability index (PI), exchangeable sodium percentage (ESP), and irrigation water quality index (IWQI), using multivariate regression models (MLR, PLSR, PCR, and WLSR) and machine learning (ML) algorithms (ANN, SVM, CART, CRRF, and KNN). The study analyzes data from 360 dug wells in Sundargarh district, India, for 2014–2021, with 70% of the data used for training and 30% for testing. Spatial mapping of SAR, KR, ESP, and PI indicates that the groundwater is largely suitable for irrigation. The Mann–Kendall test shows increasing trends for SAR, KR, %Na, and ESP and decreasing trends for PI and IWQI during 2014–2021, none of which is statistically significant (p > 0.05). Principal component analysis and discriminant analysis identify Na+, SAR, KR, %Na, and PI as the most influential WQ variables affecting groundwater quality in the study area. MLR and WLSR are the best regression models for predicting SAR and ESP, while ANN is the best-suited ML model for SAR, KR, %Na, PI, and ESP. CRRF predicts IWQI with relatively higher accuracy. These findings demonstrate the effectiveness of ML models in improving irrigation water quality assessment and provide valuable insights for groundwater-based crop management.
HIGHLIGHTS
Development of predictive models for irrigation water quality indices.
Use of data from 360 dug wells covering various water quality parameters.
Comparison of multivariate regression models with machine learning algorithms to identify the most accurate predictive models.
Trend analysis to identify significant trends in water quality over time.
Actionable insights for agricultural water management.
INTRODUCTION
Groundwater (GW) plays a significant role in water resources management, particularly in countries across West Asia and North Africa that face considerable surface water scarcity (Li et al. 2022). Moreover, the agricultural sector is the largest global consumer of GW, especially in West Asian countries such as Qatar, Oman, and Iran and South Asian countries such as India, and GW is a vital resource for the socioeconomic development of any country (Ibrahim et al. 2023). Various natural processes, improper sewage disposal, and geochemical activities can render surface water unsuitable for irrigation and other uses (Benkov et al. 2023). GW therefore offers distinct advantages in terms of constancy and dependability, particularly for cultivation (Mohammed et al. 2022). However, the expansion of irrigated land and uncontrolled exploitation of GW may lower the GW level and increase the potential for contamination (Eyankware et al. 2022). Proper investigation and evaluation of GW quality is therefore of utmost importance for crop productivity and soil fertility (El Bilali et al. 2021). Salt content in GW may adversely affect soil fertility and reduce crop biomass production, regardless of its quantity (Mohammed et al. 2023). In addition, variations in climatic conditions, soil characteristics, irrigation practices, and anthropogenic activities have a noticeable impact on GW quality (Barbieri et al. 2023; Dao et al. 2023). GW quality may be poorer in the monsoon season than in the pre-monsoon season. A comparatively higher percolation rate may allow more inorganic fertilizers to accumulate through irrigation return flow, and the rising water table dissolves solutes accumulated in the vadose zone; the resulting increase in ion concentration can cause significant GW contamination (Rajmohan 2020; Mohammed et al. 2023).
GW quality therefore needs to be monitored and evaluated effectively during each irrigation cycle. In this context, irrigation water quality indices (IWQIs) such as SAR, PI, ESP, KR, and %Na are considered the most significant WQ parameters for evaluating excess Na+, which affects soil structure and plant growth (Sreedevi et al. 2019; Kouadra & Demdoum 2020; Chidambaram et al. 2022). In addition, the irrigation water quality index (IWQI) is commonly used to reflect the collective influence of all WQ variables in large datasets and helps maintain the desired quality of surface water and GW through spatial monitoring (El-Rawy et al. 2023). Many studies have used multivariate statistical approaches such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), and discriminant analysis (DA) to reduce the complexity of large datasets by selecting the variables that most influence the WQI (Benkov et al. 2023).
In recent years, machine learning (ML), artificial intelligence (AI), and deep learning (DL) techniques, which combine statistical approaches with analytical and predictive algorithms, have been used to model water quality parameters (Abuzir & Abuzir 2022; Mukherjee et al. 2022; Nong et al. 2023). El Bilali & Taleb (2020) used seven ML approaches, i.e. ANN, MLR, decision tree (DT), RF, SVR, KNN, and stochastic gradient descent (SGD), to predict SAR, adjusted SAR (SARa), ESP, %Na, RSC, PI, KR, Cl−, MAR, and TDS from EC and pH in the surface water of the Bouregreg watershed in Morocco. Kulisz et al. (2021) developed an ANN model to predict the WQI of GW in eastern Poland from five input variables (pH, EC, Ca2+, Na+, and Mg2+), achieving a high R2 of 0.99. Dimple et al. (2022) used five ML techniques, i.e. linear regression (LR), random subspace (RS), additive regression (AR), reduced error pruning tree (REPTree), and support vector machine (SVM), to predict six irrigation WQ indices (SAR, %Na, KR, PI, SSP, and MH) in GW of the Nand Samand catchment of Rajasthan, India.
(a) Field collection of the GW sample and (b) laboratory investigation of the GW sample in the regional laboratory, Bhubaneswar for Sundargarh district.
Motivation
Agriculture is the backbone of the local economy of this district, and crop cultivation may suffer from significant surface water scarcity because water bodies are very limited (1.21%). GW can therefore serve as an effective alternative to meet the water demands of various crops. Paddy, the primary crop of this district, occupies a substantial share of the net cultivable area (76%) and needs a considerable amount of water throughout its growing period. The motivation for this research is therefore to assess the quality of GW for agricultural purposes by evaluating several important IWQIs and to identify the most and least sensitive chemical water quality parameters responsible for the spatial variation of GW quality across the district. Furthermore, GW extraction has shown an increasing trend in recent years (2009–2022). Evaluation and future prediction of GW quality parameters are thus of utmost importance for maintaining crop health in Sundargarh district. Moreover, no similar study has yet been conducted for this district, which provides both the opportunity and the motivation for the present research.
Building on previous studies, a comprehensive analysis is required to obtain detailed insights into the GW quality of Sundargarh district. This research therefore aims (i) to map the spatio-temporal variation of SAR, KR, %Na, PI, ESP, and IWQI on RS and GIS platforms; (ii) to analyze trends in these IWQIs using the Mann–Kendall (MK) test, Sen's slope estimator, and graphical innovative trend analysis (ITA); (iii) to investigate factor importance through PCA and DA; and (iv) to predict these WQ variables using regression models (MLR, PCR, PLSR, and WLSR) and ML tools (ANN, SVM, CART, CRRF, and KNN), and to evaluate model performance through statistical metrics.
STUDY AREA
Location map of the study area (Sundargarh district) showing dug wells.
MATERIALS AND METHODOLOGY
Data collection
GW quality parameters for the pre-monsoon season of 2014–2021 are obtained from the Central Groundwater Board (CGWB), India, for 360 dug wells located across various taluks of the district, as illustrated in Figure 2. These data consist of Ca2+, Na+, K+, Mg2+, Cl−, HCO3−, and SO42−, along with pH, EC, and TDS. From these, several useful IWQIs, namely sodium adsorption ratio (SAR), Kelly's ratio (KR), percentage sodium (%Na), residual sodium carbonate (RSC), potential salinity (PS), permeability index (PI), magnesium absorption ratio (MAR), soluble sodium percentage (SSP), exchangeable sodium percentage (ESP), and IWQI, are determined.
Collection and preservation of GW samples
The CGWB of India monitors regional water levels and collects samples for chemical water quality analysis during the pre-monsoon, monsoon, and post-monsoon seasons through 1,608 National Hydrograph Network Stations (NHNS) in Orissa state. GW samples are collected from open dug wells and bore wells tapping shallow and deep phreatic aquifers at depths of 30–300 m.
The GW samples are collected and preserved in accordance with the standards of the American Public Health Association (APHA 2005) and BIS, and their physicochemical examination is conducted in the regional laboratory at Bhubaneswar, Orissa. GW samples are collected in 1 L plastic (polyethylene) bottles. The samples are preserved by refrigeration at 4 °C, brought to the laboratory, and conditioned to 25 °C for the determination of EC; TDS is measured within a holding period of 7 days. For the determination of Ca2+ and Mg2+, the bottles are thoroughly cleaned with 6 N HCl and rinsed with deionized water, and the samples are acidified to pH < 2 with 2 mL of 50% HNO3 per liter of water. Samples for Na+ and K+ are preserved at pH ≤ 2 with concentrated HNO3. Samples for Cl− and SO42− can be stored in the laboratory for up to 28 days without refrigeration, and samples for sulfate determination are treated with CH2O (formaldehyde). Samples containing sulfite must first be oxidized to sulfate by dissolved oxygen (DO) at pH > 8. Samples for alkalinity (as HCO3−) are refrigerated at 4 °C for up to 14 days. Samples containing bicarbonate alkalinity are handled carefully to avoid agitation and prolonged exposure to air.
Analysis and interpretation of data
Irrigation water quality indices
Sodium adsorption ratio
Kelly's ratio
Percentage sodium
Permeability index
Soluble sodium percentage
Exchangeable sodium percentage
Potential salinity
Magnesium absorption ratio
Residual sodium carbonate
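The defining equations for the indices listed above are not reproduced in this text. As a minimal sketch, the Python function below computes them from ion concentrations in meq/L using the standard textbook forms; that the study used exactly these forms is an assumption. In particular, ESP is computed with the U.S. Salinity Laboratory empirical ESP–SAR relation, and carbonate (not among the measured ions) is assumed to be zero in RSC. The helper `to_meq` and its equivalent weights are illustrative additions.

```python
import math

# Equivalent weights (g per equivalent) for converting mg/L to meq/L.
EQ_WEIGHT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
             "Cl": 35.45, "HCO3": 61.02, "SO4": 48.03}


def to_meq(mg_per_l):
    """Convert a dict of ion concentrations from mg/L to meq/L."""
    return {ion: value / EQ_WEIGHT[ion] for ion, value in mg_per_l.items()}


def irrigation_indices(c, co3=0.0):
    """Common irrigation indices from ion concentrations c (meq/L).

    co3 is carbonate (meq/L); it is not among the measured parameters,
    so it defaults to 0 here (assumption).
    """
    ca, mg, na, k = c["Ca"], c["Mg"], c["Na"], c["K"]
    sar = na / math.sqrt((ca + mg) / 2)
    # Empirical USSL relation between exchangeable sodium ratio and SAR (assumed form)
    esr = -0.0126 + 0.01475 * sar
    return {
        "SAR": sar,
        "KR": na / (ca + mg),
        "%Na": 100 * (na + k) / (ca + mg + na + k),
        "PI": 100 * (na + math.sqrt(c["HCO3"])) / (ca + mg + na),
        "SSP": 100 * na / (ca + mg + na),
        "ESP": 100 * esr / (1 + esr),
        "PS": c["Cl"] + 0.5 * c["SO4"],
        "MAR": 100 * mg / (ca + mg),
        "RSC": (c["HCO3"] + co3) - (ca + mg),
    }
```

For example, `irrigation_indices({"Ca": 2, "Mg": 2, "Na": 2, "K": 0, "HCO3": 4, "Cl": 1, "SO4": 2})` gives SAR = √2 ≈ 1.41, KR = 0.5, and MAR = 50.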
Irrigation water quality index
Tables 1 and 2 give the quality measurement (qi) of each parameter and the weight (Wi) of each IWQI parameter, respectively. The IWQI obtained by Equation (2a) ranges from 0 to 100, and the water use restriction zones (Table 3) are defined on the basis of the IWQI value.
Parameter limiting values for quality measurement (qi) calculation
qi | EC (dS/m) | SAR (meq/L)^1/2 | Na+ (meq/L) | Cl− (meq/L) | HCO3− (meq/L) |
---|---|---|---|---|---|
85–100 | 0.20 ≤ EC < 0.75 | 2 ≤ SAR < 3 | 2 ≤ Na < 3 | 1 ≤ Cl < 4 | 1 ≤ HCO3 < 1.5 |
60–85 | 0.75 ≤ EC < 1.50 | 3 ≤ SAR < 6 | 3 ≤ Na < 6 | 4 ≤ Cl < 7 | 1.5 ≤ HCO3 < 4.5 |
35–60 | 1.50 ≤ EC < 3.00 | 6 ≤ SAR < 12 | 6 ≤ Na < 9 | 7 ≤ Cl < 10 | 4.5 ≤ HCO3 < 8.5 |
0–35 | EC < 0.20 or EC ≥ 3.00 | SAR < 2 or SAR ≥ 12 | Na < 2 or Na ≥ 9 | Cl < 1 or Cl ≥ 10 | HCO3 < 1 or HCO3 ≥ 8.5 |
Weights for the IWQI parameters
Parameter | Wi |
---|---|
EC | 0.211 |
Na+ | 0.204 |
HCO3− | 0.202 |
Cl− | 0.194 |
SAR | 0.189 |
Different water use restriction zones
IWQI | Water use restriction zone |
---|---|
0–40 | Severe restriction (SR) |
40–55 | High restriction (HR) |
55–70 | Moderate restriction (MR) |
70–85 | Low restriction (LR) |
85–100 | No restriction (NR) |
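Equation (2a) is not reproduced in this text. The sketch below assumes the commonly used piecewise-linear formulation, IWQI = Σ qi·Wi, where qi is interpolated linearly within the Table 1 class that contains the measured value and Wi comes from Table 2 (the weights sum to 1, so IWQI lies between 0 and 100). Table 1 leaves the 0–35 class open-ended, so the handling of values outside the bounded classes (including the default upper limit `x_max`) is a hypothetical choice, as is assigning boundary values to the higher zone of Table 3.

```python
# Class boundaries from Table 1 (EC in dS/m, others in meq/L or (meq/L)^1/2)
LIMITS = {
    "EC": (0.20, 0.75, 1.50, 3.00),
    "SAR": (2.0, 3.0, 6.0, 12.0),
    "Na": (2.0, 3.0, 6.0, 9.0),
    "Cl": (1.0, 4.0, 7.0, 10.0),
    "HCO3": (1.0, 1.5, 4.5, 8.5),
}
# Weights from Table 2
WEIGHTS = {"EC": 0.211, "Na": 0.204, "HCO3": 0.202, "Cl": 0.194, "SAR": 0.189}
# qi at the lower and upper edge of each bounded class
Q_RANGES = ((100.0, 85.0), (85.0, 60.0), (60.0, 35.0))


def quality_score(param, x, x_max=None):
    """qi for one parameter by linear interpolation inside its Table 1 class."""
    b = LIMITS[param]
    for (q_hi, q_lo), lower, upper in zip(Q_RANGES, b[:-1], b[1:]):
        if lower <= x < upper:
            return q_hi - (x - lower) * (q_hi - q_lo) / (upper - lower)
    # Open-ended 0-35 class: the handling below is an assumption.
    if x < b[0]:
        return 35.0 * max(x, 0.0) / b[0]
    top = x_max if x_max is not None else 2.0 * b[-1]
    if x >= top:
        return 0.0
    return 35.0 * (top - x) / (top - b[-1])


def iwqi(sample, x_max=None):
    """Weighted sum of qi values; sample maps parameter name to value."""
    x_max = x_max or {}
    return sum(w * quality_score(p, sample[p], x_max.get(p))
               for p, w in WEIGHTS.items())


def restriction_zone(value):
    """Table 3 zones; boundary values go to the higher zone (assumption)."""
    for upper, label in ((40, "SR"), (55, "HR"), (70, "MR"), (85, "LR")):
        if value < upper:
            return label
    return "NR"
```

For example, a sample at the lower bound of the best class for every parameter (EC = 0.20 dS/m, SAR = 2, Na+ = 2, Cl− = 1, HCO3− = 1 meq/L) scores qi = 100 throughout and therefore falls in the NR zone.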
Trend analysis
MK test
The sign of Z indicates the direction of the trend: a positive Z denotes an increasing trend and a negative Z a decreasing trend.
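A minimal Python sketch of the MK test as commonly defined: S sums the signs of all pairwise differences, its variance is corrected for tied values, and Z uses the usual continuity correction. The two-sided p-value assumes the normal approximation, which is rough for very short series. With this definition, |Z| > 1.96 corresponds to significance at the 5% level; all Z values reported in Table 7 are below that threshold.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Mann-Kendall trend test (normal approximation, tie-corrected variance).

    Returns (S, Kendall's tau, Z, two-sided p-value).
    """
    n = len(x)
    # S: number of increasing pairs minus number of decreasing pairs
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    # Tie correction: groups of equal values reduce Var(S)
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    tau = s / (n * (n - 1) / 2)
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, tau, z, p
```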
Sen's slope estimator
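Sen's slope is the median of the slopes between all pairs of observations, which makes it robust to outliers. A minimal sketch, assuming evenly spaced observations unless explicit time values `t` are passed:

```python
from statistics import median


def sens_slope(y, t=None):
    """Sen's slope: median of pairwise slopes (y_j - y_i) / (t_j - t_i), i < j."""
    t = list(range(len(y))) if t is None else t
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(len(y) - 1) for j in range(i + 1, len(y))
              if t[j] != t[i]]  # skip pairs sharing a time stamp
    return median(slopes)
```

For instance, `sens_slope([10, 8, 6], t=[2014, 2015, 2016])` returns −2.0 units per year.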
Innovative trend analysis
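In Şen's innovative trend analysis, the two halves of a series are sorted independently and plotted against each other; points above the 1:1 line indicate an increase from the first half to the second. The sketch below returns the sorted pairs, the share of points above the line, and the trend slope 2(ȳ2 − ȳ1)/n commonly used with ITA. Whether this slope matches the "ITA slope" reported in Table 7 is an assumption, and dropping the first value of an odd-length series is only one common convention.

```python
from statistics import mean


def innovative_trend(x):
    """Innovative trend analysis on a single series.

    Returns the sorted (first-half, second-half) pairs, the share of pairs
    above the 1:1 line, and the trend slope 2 * (mean2 - mean1) / n.
    """
    if len(x) % 2:
        x = x[1:]  # drop the first value so both halves have equal length
    n = len(x)
    first, second = sorted(x[: n // 2]), sorted(x[n // 2:])
    pairs = list(zip(first, second))
    above = sum(b > a for a, b in pairs) / len(pairs)
    slope = 2 * (mean(second) - mean(first)) / n
    return pairs, above, slope
```

In the study's setting, the first and second halves would correspond to 2014–2017 and 2018–2021.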

Discriminant analysis
Multivariate statistical regression models
Multiple linear regression
Partial least square regression
PLSR is a multivariate statistical approach (MSA) that models the relationship between multiple predictor (independent) variables and one or more response (dependent) variables for a given dataset by extracting latent components that capture the covariance between them. In this study, model quality is judged by the statistical measures Q2cum, R2Ycum, and R2Xcum in XLSTAT 2023.
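For illustration only, the NumPy sketch below fits a single-response PLS model (PLS1) with the NIPALS algorithm: each component takes the predictor direction with maximal covariance with the response, and X and y are deflated before the next component is extracted. It does not reproduce XLSTAT's Q2cum, R2Ycum, or R2Xcum outputs. With as many components as linearly independent predictors, PLS1 reduces to ordinary least squares, which gives a simple sanity check.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS on mean-centred X and y.

    Returns (coef, intercept) so that y_hat = X @ coef + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight: direction of maximal covariance
        t = E @ w                # component scores
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        q = f @ t / tt           # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - q * t            # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return coef, y_mean - x_mean @ coef
```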
Principal component regression
PCR combines PCA with ordinary least squares (OLS) regression. In this study, the goodness of fit of the PCR model is tested using the Durbin–Watson (DW) statistic, Mallows' Cp, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).
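A minimal NumPy sketch of PCR, assuming standardized predictors: the predictors are projected onto their leading principal components (via SVD), y is regressed on those scores by OLS, and the coefficients are then mapped back to the original predictor scale. The Durbin–Watson statistic, one of the goodness-of-fit measures named above, is included as a helper; Mallows' Cp, AIC, and BIC are omitted for brevity.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression on standardised X.

    Returns (coef, intercept) expressed on the original X scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Right singular vectors of Z are the PC loading directions
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    V = vt[:n_components].T
    T = Z @ V                                   # PC scores
    gamma, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    coef = (V @ gamma) / sd                     # back to the original scale
    return coef, y.mean() - mu @ coef


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest no lag-1 autocorrelation."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```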
Weighted least square regression
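No description of the WLSR model survives in this text. As a generic sketch, WLS minimizes Σ wi(yi − ŷi)²; the weights `w` are supplied by the caller (inverse error variances are a common choice, but how the study chose its weights is not stated here). Scaling each row by √wi turns the problem into an ordinary least-squares fit, and unit weights recover MLR.

```python
import numpy as np


def wls_fit(X, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - b0 - x_i . b)^2.

    Returns the coefficient vector with the intercept first.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    sw = np.sqrt(np.asarray(w, dtype=float))
    # Row scaling by sqrt(w) reduces WLS to an ordinary least-squares problem
    beta, *_ = np.linalg.lstsq(X * sw[:, None],
                               np.asarray(y, dtype=float) * sw, rcond=None)
    return beta
```

A near-zero weight effectively removes an observation from the fit, which is why WLS can down-weight unreliable or highly variable samples.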

ML techniques
Artificial neural network
Support vector machine

Classification and regression tree
The CART was initially developed by Breiman et al. (1984). In the CART model, the parent (root) node is split on the most significant predictor. The best model output is obtained by tuning the minimum parent size, maximum tree depth, and minimum son size in the CART algorithm.
Classification and regression random forest
The CRRF model builds on the CART framework of Breiman et al. (1984). Its output includes the out-of-bag (OOB) error, OOB predictions, and OOB prediction details.
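The study fits CART and CRRF in XLSTAT. As a rough analogue only, the scikit-learn sketch below fits a regression tree and a random forest with out-of-bag evaluation on synthetic data. Mapping the parent-size and son-size controls mentioned above to `min_samples_split` and `min_samples_leaf` is an assumption, and the data and hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))               # hypothetical predictors
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.3, 200)

# CART: depth and node-size limits control tree growth
cart = DecisionTreeRegressor(max_depth=5, min_samples_split=10,
                             min_samples_leaf=5, random_state=0).fit(X, y)

# Random forest with out-of-bag (OOB) evaluation
rf = RandomForestRegressor(n_estimators=300, oob_score=True,
                           random_state=0).fit(X, y)
oob_r2 = rf.oob_score_                               # OOB R^2
oob_mse = np.mean((y - rf.oob_prediction_) ** 2)     # OOB error as MSE
```

Because each tree is trained on a bootstrap sample, the OOB predictions give a built-in validation estimate without a separate test set.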
K-nearest neighbor
Tuning hyperparameters for regression and ML models
Model | Tuning hyperparameter |
---|---|
PLSR | |
ANN | |
SVM | |
CART | |
CRRF | |
KNN | |
Statistical measures for predictive strength and accuracy of the proposed models
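The measures named in the results (RMSE, MAPE, NSEC, R2, WI, VAF, and RBIAS) are not defined in this text. The sketch below uses common definitions, assumed rather than taken from the study: NSEC as the Nash–Sutcliffe efficiency coefficient, R2 as the squared Pearson correlation, WI as Willmott's index of agreement, and RBIAS = 100·Σ(O − P)/ΣO, so that a positive value indicates underestimation and a negative value overestimation. MAPE assumes no observed value is zero.

```python
import math


def regression_metrics(obs, pred):
    """Goodness-of-fit measures for paired observed (O) and predicted (P) values."""
    n = len(obs)
    o_mean = sum(obs) / n
    p_mean = sum(pred) / n
    err = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)
    sst = sum((o - o_mean) ** 2 for o in obs)
    # Squared Pearson correlation between observed and predicted
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    r2 = cov ** 2 / (sst * sum((p - p_mean) ** 2 for p in pred))
    # Denominator of Willmott's index of agreement
    wi_den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
                 for o, p in zip(obs, pred))
    e_mean = sum(err) / n
    var_err = sum((e - e_mean) ** 2 for e in err) / n
    return {
        "RMSE": math.sqrt(sse / n),
        "MAPE": 100 / n * sum(abs(e / o) for e, o in zip(err, obs)),
        "NSEC": 1 - sse / sst,
        "R2": r2,
        "WI": 1 - sse / wi_den,
        "VAF": 100 * (1 - var_err / (sst / n)),
        # Positive RBIAS -> underestimation, negative -> overestimation
        "RBIAS": 100 * sum(err) / sum(obs),
    }
```

A constant offset illustrates why several measures are reported together: predicting [2, 3, 4, 5] for observations [1, 2, 3, 4] gives R2 = 1 and VAF = 100%, yet RBIAS = −40% and NSEC = 0.2 expose the systematic overestimation.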
RESULTS AND DISCUSSION
Statistical analysis of WQ variables
Descriptive statistics for the predicted IWQIs
Variable | Min | Max | Mean | Median | Std. Dev. |
---|---|---|---|---|---|
SAR | 0.09 | 10.06 | 1.13 | 0.89 | 0.94 |
KR | 0.03 | 3.34 | 0.50 | 0.41 | 0.37 |
%Na | 4.83 | 77.00 | 32.44 | 31.42 | 13.45 |
PI | 31.00 | 168.01 | 71.96 | 70.03 | 17.10 |
ESP | −113.56 | 1,195.09 | 38.25 | 5.22 | 130.42 |
IWQI | 34.58 | 90.59 | 63.26 | 62.67 | 9.36 |
Std. Dev., standard deviation.
Box plot for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI during 2014–2021.
Histogram for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI during 2014–2021.
Spatial variation of some important WQ variables
Distribution of wells in different classes of water based on WQ parameters
WQ parameter | Permissible range | Water class/category | Number of water samples | % Water samples |
---|---|---|---|---|
SAR (meq/L)^1/2 | 0–10 | Excellent | 360 | 100 |
KR | <1 | Suitable | 336 | 93 |
KR | 1–2 | Marginally suitable | 24 | 7 |
%Na | <20 | Excellent | 45 | 13 |
%Na | 20–40 | Good | 244 | 67 |
%Na | 40–60 | Permissible | 65 | 18 |
%Na | 60–80 | Doubtful | 6 | 2 |
PI | >75 | Class I | 138 | 38 |
PI | 25–75 | Class II | 222 | 62 |
ESP | <20 | Excellent | 360 | 100 |
IWQI | 0–40 | SR | 2 | 1 |
IWQI | 40–55 | HR | 22 | 6 |
IWQI | 55–70 | MR | 265 | 74 |
IWQI | 70–85 | LR | 64 | 18 |
IWQI | 85–100 | NR | 7 | 2 |
Spatial variation of (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI for the pre-monsoon season during 2014–2021.
Results of the trend analysis
The results of the trend analysis of SAR, KR, %Na, PI, ESP, and IWQI for the pre-monsoon season during 2014–2021, obtained with the MK test, Sen's slope estimator, ITA, and PBIAS, are shown in Table 7. Similar techniques were used by Kurunç et al. (2005), Sun et al. (2016), and Yenilmez et al. (2011) for the trend analysis of different cations and anions. In the present study, the graphical ITA method is applied alongside the MK test to detect sub-trends in SAR, KR, %Na, PI, ESP, and IWQI that the MK test may overlook. The full data series (2014–2021) is divided into two halves, i.e. the first half (2014–2017) and the second half (2018–2021), which are plotted against each other. This graphical plot shows the positive or negative change in these water quality parameters between the first (2014–2017) and second (2018–2021) halves of the ITA pre-processed data, along with Sen's slope.
Details of pre-monsoon season WQ parameters trend of Sundargarh district
WQ parameter | Z statistic | MK tau | Sen's slope | ITA slope | ITD | PBIAS |
---|---|---|---|---|---|---|
SAR | 0.915 | 0.032 | 0.0003 | 0.680 | (graphic) | 13.59 |
KR | 1.191 | 0.042 | 0.0002 | 0.721 | (graphic) | 14.42 |
%Na | 1.361 | 0.048 | 0.0098 | 0.210 | (graphic) | 4.20 |
PI | 0.091 | 0.003 | 0.0007 | −0.031 | (graphic) | −0.61 |
ESP | 0.913 | 0.032 | 0.0381 | 3.396 | (graphic) | 67.91 |
IWQI | 0.026 | 0.001 | 0.0001 | −0.067 | (graphic) | −1.34 |
ITD, innovative trend detection.
Spatio-temporal variation (trend) of (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI for the time period of 2014–2017 (first half of the time series) and 2018–2021 (second half of the time series) for Sundargarh district.
In contrast, the graphical ITA results (Table 7) show that SAR, KR, %Na, and ESP have positive trends with positive ITA slopes. PI and IWQI, however, exhibit marginally decreasing trends with negative ITA slopes, which differs from the MK test results. This suggests that PI and IWQI follow non-monotonic sub-trends over the two halves (2014–2017 and 2018–2021) of the data period. The PBIAS values for SAR, KR, %Na, and ESP indicate that the second half (2018–2021) of the data period is dominated by positive change for 57.2, 2.8, 51.7, and 57.8% of the wells, respectively. By contrast, the PBIAS values for PI and IWQI indicate that 46.1 and 53.3% of the wells, respectively, are influenced by negative change, and the first and second halves of the data are dominated by increasing and decreasing trends, respectively, for PI and IWQI.
Results of PCA
In the present study, PCA is performed on 360 GW samples with 19 WQ parameters (pH, EC, TDS, Na+, Ca2+, Mg2+, K+, Cl−, HCO3−, SO42−, RSC, SAR, KR, %Na, PI, ESP, SSP, MAR, and PS) for the pre-monsoon season during 2014–2021. Similar studies have been conducted by Nosrati & Van Den Eeckhaut (2012) and Chen et al. (2018). PCA identifies the most important of these 19 water quality parameters as those with high positive loadings on the principal components (PCs) with eigenvalues > 1, which are relevant to the prediction of the response variable (IWQI) in the present study (Mitra et al. 2018). The relative importance of each PC for the determination of IWQI is expressed through its explained variance. The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy for this dataset is 0.735 (> 0.5), and Bartlett's test of sphericity is significant (p < 0.05), indicating that the dataset is suitable for PCA (Hutcheson & Sofroniou 1999). Varimax rotation is applied to the factor loadings, and the eigenvalues and percentage of variance explained are given in Table 8 (Ganiyu et al. 2022; Taşan et al. 2022).
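The eigenvalue and variance figures in Table 8 follow from the correlation matrix of the 19 standardized variables, whose eigenvalues sum to 19; for example, PC1's eigenvalue of 8.398 corresponds to 8.398/19 ≈ 44.2% of the variance. A minimal NumPy sketch of PCA on the correlation matrix with the eigenvalue > 1 (Kaiser) retention rule follows. It returns unrotated loadings; the varimax rotation used in the study is not implemented here.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on the correlation matrix with Kaiser retention (eigenvalue > 1).

    Returns eigenvalues (descending), % variance, cumulative % variance,
    and unrotated loadings of the retained components.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pct = 100 * eigvals / eigvals.sum()           # eigenvalues sum to the number of variables
    keep = int(np.sum(eigvals > 1))
    # Loadings: eigenvectors scaled by sqrt(eigenvalue)
    loadings = eigvecs[:, :keep] * np.sqrt(eigvals[:keep])
    return eigvals, pct, np.cumsum(pct), loadings
```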
Varimax rotated factor loadings of GW quality parameters
Variable | PC1 | PC2 | PC3 | PC4 |
---|---|---|---|---|
pH | 0.072 | −0.189 | 0.602 | −0.101 |
EC | 0.918 | −0.324 | 0.099 | −0.072 |
TDS | 0.945 | −0.285 | 0.089 | −0.074 |
Ca2+ | 0.414 | −0.691 | 0.016 | −0.532 |
Mg2+ | 0.699 | −0.514 | 0.086 | 0.394 |
Na+ | 0.932 | 0.214 | 0.207 | −0.047 |
K+ | 0.359 | −0.056 | −0.231 | 0.052 |
HCO3− | 0.634 | −0.338 | 0.594 | −0.043 |
SO42− | 0.671 | −0.170 | −0.231 | 0.187 |
Cl− | 0.860 | −0.215 | −0.314 | −0.131 |
RSC | −0.143 | 0.618 | 0.715 | 0.003 |
SAR | 0.852 | 0.466 | 0.144 | −0.058 |
KR | 0.587 | 0.774 | −0.015 | −0.060 |
%Na | 0.515 | 0.792 | −0.220 | −0.028 |
PI | −0.245 | 0.865 | −0.040 | −0.049 |
ESP | 0.861 | 0.470 | 0.120 | −0.055 |
MAR | 0.418 | −0.061 | 0.076 | 0.868 |
SSP | 0.533 | 0.787 | −0.190 | −0.040 |
PS | 0.890 | −0.222 | −0.323 | −0.089 |
Eigenvalue | 8.398 | 4.67 | 1.73 | 1.29 |
Variance (%) | 44.20 | 24.58 | 9.11 | 6.81 |
Cumulative variance (%) | 44.20 | 68.78 | 77.89 | 84.70 |
Bold values represent strong (>0.75) and moderate (0.5–0.75) loadings.
(a) Scree plot and (b) loading plot for PCA of the WQ data during the pre-monsoon period.
PC2 explains 24.58% of the total variance with an eigenvalue of 4.67, similar to the outcome reported by Patnaik et al. (2024). It has high factor loadings (> 0.75) for KR (0.774), %Na (0.792), PI (0.865), and SSP (0.787) and a moderate positive loading for RSC (0.618), which partially aligns with the findings of Raju (2007). Moreover, the strong correlation of %Na, PI, and SSP with PC2 reflects the dominance of Na+, indicating that evaporation may be a major geochemical mechanism controlling GW chemistry (Saha & Paul 2019). PC2 can therefore be identified as a ‘sodium hazard’ factor.
PC3 explains 9.11% of the total variance with an eigenvalue of 1.73. RSC (0.715) and pH (0.602) have moderate positive loadings on PC3, which partially aligns with Chen et al. (2018). GW may be alkaline owing to the dominance of HCO3− (Sharma et al. 2016), and a higher concentration of (HCO3− + SO42−) relative to total alkalis may indicate that carbonate weathering has a greater influence on GW chemistry in a few wells. PC3 may therefore be termed the ‘alkalinity’ factor.
PC4 explains 6.81% of the total variance with an eigenvalue of 1.29. MAR is highly positively correlated (0.868) with PC4, which illustrates mineral unification in GW occurrence. Hence, PC4 can be identified as a ‘magnesium hazard’ factor.
Spatial variation in GW quality through DA
The spatial variation of GW quality across the 360 dug wells is studied by DA using 10 explanatory variables (EC, TDS, Na+, Cl−, SAR, KR, ESP, PS, %Na, and PI) and one response variable (IWQI), following Nosrati & Van Den Eeckhaut (2012) and Chen et al. (2018). The objective of DA is to identify the variables that best discriminate GW quality among the wells. Table 9 summarizes the tests of group mean equality, with Wilks' Lambda and F values for each variable; all variables are significant at p < 0.01. The unstandardized discriminant functions (DFs) are evaluated using the statistical summary in Table 10. The first DF explains 93.14% of the total variance with an eigenvalue of 6.12 and has the strongest discriminating power, whereas the second and third DFs explain 5.56 and 0.96% of the total variance, respectively. The canonical correlation (CC) coefficient of the first DF is 0.927, so about 86% (0.927²) of the variance in its discriminant scores is explained by differences among the groups, which clearly reflects the significant spatial variability of the WQ variables among the stations. Mammeri et al. (2023) likewise found that the first DF had the highest discriminating power, with a CC of 0.971, in describing the spatial variability of surface water, which mostly aligns with the present findings.
Test of group mean equality for discriminant analysis
Variable | Wilks' Lambda | F | df1 | df2 | p-value |
---|---|---|---|---|---|
EC | 0.946 | 3.529 | 4 | 247 | 0.0080 |
TDS | 0.938 | 4.049 | 4 | 247 | 0.0034 |
Na+ | 0.894 | 7.326 | 4 | 247 | <0.0001 |
Cl− | 0.880 | 8.382 | 4 | 247 | <0.0001 |
SAR | 0.859 | 10.136 | 4 | 247 | <0.0001 |
KR | 0.832 | 12.506 | 4 | 247 | <0.0001 |
%Na | 0.818 | 13.698 | 4 | 247 | <0.0001 |
PI | 0.737 | 22.004 | 4 | 247 | <0.0001 |
ESP | 0.842 | 16.691 | 4 | 247 | <0.0001 |
PS | 0.874 | 8.885 | 4 | 247 | <0.0001 |
IWQI | 0.151 | 347.904 | 4 | 247 | <0.0001 |
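For a single variable, Wilks' Lambda and the F statistic in Table 9 are linked by F = ((1 − Λ)/Λ)·(N − g)/(g − 1), with g − 1 and N − g degrees of freedom. Taking g = 5 restriction zones and N = 252 training samples (consistent with the tabulated degrees of freedom of 4 and 247), Λ = 0.737 for PI gives F ≈ 22.0, matching Table 9 up to rounding of Λ. A minimal sketch:

```python
def wilks_to_f(wilks_lambda, n_obs, n_groups):
    """Univariate F statistic (one-way ANOVA) implied by Wilks' lambda.

    Returns (F, df1, df2).
    """
    df1 = n_groups - 1
    df2 = n_obs - n_groups
    f = (1 - wilks_lambda) / wilks_lambda * df2 / df1
    return f, df1, df2
```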
Eigen values for the DFs
Discriminant function | Eigenvalue | % Variance | Cumulative % | CC coefficient |
---|---|---|---|---|
1 | 6.12 | 93.14 | 93.14 | 0.927 |
2 | 0.37 | 5.56 | 98.71 | 0.517 |
3 | 0.06 | 0.96 | 99.70 | 0.244 |
CC coefficient, canonical correlation coefficient.
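In canonical discriminant analysis, each function's canonical correlation follows from its eigenvalue λ as √(λ/(1 + λ)). Applied to Table 10, λ = 6.12 gives 0.927, and its square (≈ 0.86) is the 86% share discussed in the text. A minimal sketch:

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation of a discriminant function from its eigenvalue."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))
```

The small differences for the second and third functions (e.g. 0.520 versus 0.517) are consistent with the eigenvalues being rounded to two decimals in the table.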
Table 11 presents the standardized canonical DF coefficients. Na+, Cl−, SAR, PS, and IWQI appear to be the major WQ variables in the first DF for discriminating the spatial variation of GW quality across the 360 sampling stations. Na+, SAR, %Na, PI, and PS have the largest coefficients in the second DF, and Na+, SAR, KR, %Na, and PI have comparatively large coefficients in the third DF, reflecting spatial differences in GW quality among the dug wells across the district.
Standardized canonical DF coefficients
Variable | DF1 | DF2 | DF3 |
---|---|---|---|
EC | −0.584 | −0.127 | 0.613 |
TDS | 0.929 | 0.554 | −0.552 |
Na+ | −0.962 | −1.403 | −9.027 |
Cl− | 1.076 | −0.641 | 0.558 |
SAR | 1.074 | 1.777 | 12.992 |
KR | −0.411 | 0.182 | −4.090 |
%Na | 0.130 | −1.227 | −2.128 |
PI | −0.198 | 1.536 | 0.980 |
ESP | 0.000 | 0.000 | 0.000 |
PS | −1.487 | 1.340 | 0.720 |
IWQI | 1.032 | 0.183 | −0.012 |
Table 12 lists the classification function coefficients of the input WQ variables for the IWQI water restriction zones (NR, LR, MR, HR, and SR). Na+, SAR, Cl−, KR, PI, and PS are the most sensitive WQ variables for classifying the water restriction zones, whereas ESP has no influence on the classification (all its coefficients are zero).
Classification function coefficients
Variable | HR | LR | MR | NR | SR |
---|---|---|---|---|---|
EC | 1.335 | −10.947 | −3.725 | −20.917 | 11.535 |
TDS | 0.379 | 0.415 | 0.393 | 0.478 | 0.349 |
Na+ | −57.282 | −64.414 | −63.308 | −78.394 | −63.896 |
Cl− | 17.395 | 22.425 | 20.607 | 27.940 | 13.424 |
SAR | 71.343 | 86.592 | 84.675 | 109.955 | 88.766 |
KR | −8.981 | −24.818 | −23.555 | −35.172 | −20.223 |
%Na | −2.808 | −2.823 | −2.821 | −3.152 | −3.270 |
PI | 2.617 | 2.566 | 2.540 | 2.756 | 2.961 |
ESP | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
PS | −20.381 | −26.211 | −23.943 | −32.205 | −13.380 |
Intercept | −247.854 | −367.771 | −295.118 | −500.842 | −225.395 |
Table 13 shows that DA correctly classifies 94.4% and 90.74% of the wells into their water restriction zones in the training and testing datasets, respectively, in agreement with the findings of Mammeri et al. (2023). DA also correctly classifies 91.27% of the wells in cross-validation. These results indicate that DA is appropriate and effective for capturing the spatial variation of GW quality in this district.
Confusion matrix for discriminant analysis (rows: observed IWQI class; columns: predicted class)
Dataset | Observed class | HR | LR | MR | NR | SR | Total |
---|---|---|---|---|---|---|---|
Original (training) | HR | 17 | 0 | 3 | 0 | 1 | 21 |
Original (training) | LR | 0 | 60 | 1 | 0 | 0 | 61 |
Original (training) | MR | 0 | 9 | 151 | 0 | 0 | 160 |
Original (training) | NR | 0 | 0 | 0 | 3 | 0 | 3 |
Original (training) | SR | 0 | 0 | 0 | 0 | 7 | 7 |
Original (validation) | HR | 14 | 0 | 2 | 0 | 0 | 16 |
Original (validation) | LR | 0 | 21 | 0 | 1 | 0 | 22 |
Original (validation) | MR | 0 | 5 | 58 | 0 | 0 | 63 |
Original (validation) | NR | 0 | 0 | 0 | 1 | 0 | 1 |
Original (validation) | SR | 2 | 0 | 0 | 0 | 4 | 6 |
Cross-validated | HR | 16 | 0 | 4 | 0 | 1 | 21 |
Cross-validated | LR | 0 | 59 | 2 | 0 | 0 | 61 |
Cross-validated | MR | 2 | 10 | 148 | 0 | 0 | 160 |
Cross-validated | NR | 0 | 0 | 0 | 3 | 0 | 3 |
Cross-validated | SR | 3 | 0 | 0 | 0 | 4 | 7 |
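The overall classification rates quoted for Table 13 equal the diagonal (correctly classified) counts divided by the number of wells in each dataset. The minimal Python check below reads the flattened table with observed zones as rows. Here "testing" corresponds to the validation block of the table.

```python
# Confusion matrices transcribed from Table 13 (rows: observed zone,
# columns: predicted zone, both in the order HR, LR, MR, NR, SR).
CONFUSION = {
    "training": [
        [17, 0, 3, 0, 1],
        [0, 60, 1, 0, 0],
        [0, 9, 151, 0, 0],
        [0, 0, 0, 3, 0],
        [0, 0, 0, 0, 7],
    ],
    "testing": [
        [14, 0, 2, 0, 0],
        [0, 21, 0, 1, 0],
        [0, 5, 58, 0, 0],
        [0, 0, 0, 1, 0],
        [2, 0, 0, 0, 4],
    ],
    "cross-validated": [
        [16, 0, 4, 0, 1],
        [0, 59, 2, 0, 0],
        [2, 10, 148, 0, 0],
        [0, 0, 0, 3, 0],
        [3, 0, 0, 0, 4],
    ],
}


def overall_accuracy(matrix):
    """Percentage of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for name, matrix in CONFUSION.items():
    print(f"{name}: {overall_accuracy(matrix):.2f}% of wells correctly classified")
```

The script prints 94.44%, 90.74%, and 91.27%. The training and testing totals (252 and 108 wells) add up to the 360 wells of the study.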
Evaluation of results by regression models
In the present research, four multivariate regression models (MLR, PLSR, PCR, and WLSR) are used to predict SAR, KR, %Na, PI, ESP, and IWQI for the pre-monsoon season. Of the wells, 70% form the training dataset and 30% the testing dataset. Many researchers have applied similar regression models to predict WQ variables. Table 14 gives the regression coefficients of these models. Their predictive strength in the training and testing phases is evaluated with several statistical measures, namely RMSE, MAPE, NSEC, R2, WI, VAF, and RBIAS (Table 15). Gupta et al. (2024) used similar statistical parameters in their study. The sign of RBIAS indicates whether a model underestimates (RBIAS > 0) or overestimates (RBIAS < 0) the observed values (Gorgij et al. 2023). In the training dataset, MLR is among the strongest regression models for predicting SAR (R2 = 0.97, VAF = 96.93%, RMSE = 0.21 (meq/L)^1/2) and ESP (R2 = 0.96, VAF = 96.33%, RMSE = 0.26). It also retains comparatively high predictive strength in the testing dataset for SAR (R2 = 0.93, VAF = 92.96%, RMSE = 0.24 (meq/L)^1/2) and ESP (R2 = 0.94, VAF = 94.11%, RMSE = 0.30). In contrast, MLR predicts %Na, PI, and IWQI with only moderate accuracy. Their RMSE values are 8.24, 10.06, and 5.75 in the training dataset and 7.72, 10.95, and 6.75 in the testing dataset, respectively. In the training phase, MLR strongly overestimates KR, SAR, and ESP, with large negative RBIAS values of −39.12, −29.57, and −15.19, respectively. The PCR model predicts all the WQ variables more accurately than the PLSR model, with higher R2, NSEC, and VAF and lower RMSE in both the training and testing datasets. PLSR is the weakest approach for IWQI in both phases, with markedly lower R2 (R2training = 0.14 and R2testing = 0.29) and VAF (VAFtraining = 14.93% and VAFtesting = 23.65%). These observations are similar to the findings of Abba et al. (2020) and Mokhtar et al. (2022). Furthermore, El Bilali & Taleb (2020) used MLR in the Bouregreg watershed in Morocco to predict 10 WQ parameters. They found it well suited to SAR and KR, with R2 values of 0.94 and 0.71, respectively. Elsayed et al. (2020) reported high prediction accuracy of the MLR and PCR models for SAR, KR, %Na, PI, and IWQI. In the training dataset, the PLSR and PCR predictions of all the WQ variables show no systematic bias (RBIAS = 0). A zero mean bias does not, however, imply a perfect fit.
The WLSR model predicts SAR (R2training = 0.96, RMSEtraining = 0.23; R2testing = 0.91, RMSEtesting = 0.28), KR (R2training = 0.80, RMSEtraining = 0.17; R2testing = 0.56, RMSEtesting = 0.25), and ESP (R2training = 0.96, RMSEtraining = 0.28; R2testing = 0.91, RMSEtesting = 0.40) better in the training phase than in the testing phase, although its overestimation is also larger in the training phase. WLSR also shows comparatively strong underestimation of ESP in the testing phase (RBIAS = 15.13).
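This section does not restate the formulas behind the statistical measures in Table 15, so the sketch below assumes common textbook definitions:

- R2 is the squared Pearson correlation.
- NSEC is the Nash–Sutcliffe efficiency coefficient.
- WI is Willmott's index of agreement.
- VAF is the variance accounted for.
- RBIAS is signed so that positive values indicate underestimation, matching the convention stated above.

MAPE is returned as a fraction because the tabulated values appear to be fractions. The study's exact definitions may differ.

```python
import math
from statistics import mean, pvariance


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def mape(obs, pred):
    """Mean absolute percentage error as a fraction (undefined if any obs is 0)."""
    return mean(abs((o - p) / o) for o, p in zip(obs, pred))


def nsec(obs, pred):
    """Nash-Sutcliffe efficiency coefficient: 1 - SSE / SST."""
    o_bar = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    o_bar, p_bar = mean(obs), mean(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def vaf(obs, pred):
    """Variance accounted for, in percent."""
    residuals = [o - p for o, p in zip(obs, pred)]
    return 100.0 * (1.0 - pvariance(residuals) / pvariance(obs))


def willmott_index(obs, pred):
    """Willmott's index of agreement, between 0 and 1."""
    o_bar = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / potential


def rbias(obs, pred):
    """Relative bias (%): positive = underestimation, negative = overestimation."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)


obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]  # hypothetical: every value overestimated by 1
for label, metric in [("RMSE", rmse), ("MAPE", mape), ("NSEC", nsec), ("R2", r_squared),
                      ("VAF", vaf), ("WI", willmott_index), ("RBIAS", rbias)]:
    print(label, round(metric(obs, pred), 3))
```

The toy example shows why several measures are reported together. A prediction that is uniformly one unit too high still scores R2 = 1 and VAF = 100%. NSEC (0.2) and RBIAS (−40%) expose the systematic overestimation.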
Regression coefficients developed between water quality indices and chemical composition of water in regression models
Dependent variable | Independent variable | MLR | PLSR | PCR | WLSR |
---|---|---|---|---|---|
SAR | Intercept | 1.847 | 1.77 | 1.543 | 1.045 |
SAR | pH | −0.162 | −0.179 | −0.13 | −0.083 |
SAR | EC | −0.221 | 0.304 | −0.417 | −0.594 |
SAR | TDS | −0.001 | 0.001 | −0.001 | 0.001 |
SAR | Mg2+ | −0.078 | −0.327 | −0.032 | −0.098 |
SAR | Na+ | 0.801 | 0.584 | 0.795 | 0.714 |
KR | Intercept | 1.68 | 2.115 | 1.509 | 1.427 |
KR | pH | −0.145 | −0.218 | −0.132 | −0.129 |
KR | EC | −0.045 | −0.023 | −0.189 | −0.104 |
KR | TDS | −0.001 | 0 | −0.001 | 0 |
KR | Mg2+ | −0.039 | −0.165 | −0.026 | −0.03 |
KR | Na+ | 0.348 | 0.226 | 0.331 | 0.306 |
KR | Cl− | −0.008 | 0.021 | −0.03 | −0.058 |
KR | HCO3− | −0.059 | −0.021 | −0.052 | −0.052 |
%Na | Intercept | 35.816 | −0.021 | 33.392 | 32.023 |
%Na | EC | 13.213 | −0.109 | 11.279 | −4.054 |
%Na | TDS | −0.041 | 0.005 | −0.01 | 0.011 |
%Na | Mg2+ | −1.534 | −5.477 | −1.762 | −1.636 |
%Na | Na+ | 10.863 | 6.837 | 10.035 | 12.865 |
%Na | Cl− | −0.747 | 1.908 | −2.146 | −2.78 |
%Na | HCO3− | −4.85 | −2.43 | −5.291 | −4.784 |
PI | Intercept | 196.423 | 250.63 | 203.55 | 194.689 |
PI | pH | −13.342 | −20.892 | −14.519 | −13.439 |
PI | EC | −19.483 | −9.436 | −21.605 | −14.844 |
PI | TDS | −0.114 | −0.015 | −0.067 | −0.093 |
PI | Mg2+ | −2.393 | −7.933 | −2.888 | −2.259 |
PI | Na+ | 12.455 | 6.806 | 11.332 | 10.595 |
PI | Cl− | 0.233 | −1.746 | −1.539 | −0.277 |
PI | HCO3− | 1.19 | −0.622 | −0.19 | 0.582 |
ESP | Intercept | −0.35 | −0.637 | −0.426 | −0.635 |
ESP | EC | 0.276 | 0.256 | 0.0124 | −0.436 |
ESP | TDS | −0.0021 | −0.0011 | −0.0012 | 0.0011 |
ESP | Mg2+ | −0.0782 | −0.475 | −0.0443 | −0.682 |
ESP | Na+ | 1.102 | 0.78 | 1.08 | 1.061 |
ESP | Cl− | −0.0177 | 0.101 | −0.0607 | −0.115 |
ESP | HCO3− | 0.129 | 0.009 | 0.12 | −0.111 |
IWQI | Intercept | 121.406 | 61.196 | 158.519 | 128.457 |
IWQI | EC | −1.2 | 1.528 | −0.846 | −4.285 |
IWQI | TDS | −0.012 | 0.003 | −0.045 | −0.025 |
IWQI | Na+ | −22.597 | 0.234 | −13.132 | −22.377 |
IWQI | Cl− | −1.955 | 0.448 | −2.153 | −1.063 |
IWQI | SAR | −5.03 | 0.353 | −44.467 | −3.022 |
IWQI | ESP | 42.4 | 0.3 | 59.8 | 44.6 |
IWQI | PS | 4.591 | 0.427 | 3.872 | 4.5 |
IWQI | PI | −0.198 | −0.035 | −0.362 | −0.215 |
IWQI | %Na | 0.078 | 0.028 | 0.139 | 0.031 |
IWQI | KR | −53.374 | 0.493 | −40.297 | −61.99 |
Performance evaluation of regression models in training and testing datasets
WQ parameter | Statistical measure | MLR (training) | PLSR (training) | PCR (training) | WLSR (training) | MLR (testing) | PLSR (testing) | PCR (testing) | WLSR (testing) |
---|---|---|---|---|---|---|---|---|---|
SAR | RMSE | 0.21 | 0.26 | 0.16 | 0.23 | 0.24 | 0.26 | 0.24 | 0.28 |
SAR | MAPE | 0.27 | 0.23 | 0.17 | 0.25 | 0.22 | 0.23 | 0.16 | 0.23 |
SAR | NSEC | 0.95 | 0.93 | 0.97 | 0.94 | 0.92 | 0.91 | 0.93 | 0.89 |
SAR | R2 | 0.97 | 0.93 | 0.97 | 0.96 | 0.93 | 0.92 | 0.93 | 0.91 |
SAR | VAF | 96.93% | 92.98% | 97.35% | 95.31% | 92.96% | 92.02% | 93.01% | 89.50% |
SAR | WI | 0.988 | 0.982 | 0.993 | 0.986 | 0.981 | 0.975 | 0.981 | 0.975 |
SAR | RBIAS | −29.57 | 0.00 | 0.00 | −24.10 | −5.34 | 8.37 | 5.50 | −2.46 |
KR | RMSE | 0.17 | 0.18 | 0.15 | 0.17 | 0.21 | 0.24 | 0.23 | 0.25 |
KR | MAPE | 0.48 | 0.41 | 0.35 | 0.42 | 0.35 | 0.34 | 0.28 | 0.37 |
KR | NSEC | 0.77 | 0.74 | 0.82 | 0.78 | 0.68 | 0.60 | 0.63 | 0.56 |
KR | R2 | 0.82 | 0.74 | 0.82 | 0.80 | 0.68 | 0.66 | 0.66 | 0.56 |
KR | VAF | 81.51% | 74.23% | 82.43% | 79.74% | 68.45% | 64.78% | 65.66% | 56.23% |
KR | WI | 0.940 | 0.921 | 0.949 | 0.937 | 0.895 | 0.853 | 0.870 | 0.846 |
KR | RBIAS | −39.12 | 0.00 | 0.00 | −28.74 | −1.41 | 16.11 | 11.71 | 4.06 |
%Na | RMSE | 8.24 | 8.76 | 8.13 | 8.88 | 7.72 | 8.92 | 8.37 | 9.32 |
%Na | MAPE | 0.28 | 0.30 | 0.27 | 0.24 | 0.19 | 0.22 | 0.19 | 0.19 |
%Na | NSEC | 0.61 | 0.56 | 0.62 | 0.55 | 0.67 | 0.56 | 0.61 | 0.52 |
%Na | R2 | 0.61 | 0.56 | 0.62 | 0.60 | 0.69 | 0.63 | 0.64 | 0.57 |
%Na | VAF | 61.18% | 55.77% | 61.93% | 54.92% | 68.72% | 61.78% | 64.45% | 52.79% |
%Na | WI | 0.869 | 0.837 | 0.868 | 0.873 | 0.888 | 0.837 | 0.867 | 0.860 |
%Na | RBIAS | −6.15 | 0.00 | 0.00 | −6.70 | 5.12 | 9.75 | 7.24 | 3.48 |
PI | RMSE | 10.06 | 10.83 | 9.95 | 10.05 | 10.95 | 12.96 | 11.54 | 11.46 |
PI | MAPE | 0.12 | 0.12 | 0.11 | 0.11 | 0.08 | 0.12 | 0.09 | 0.09 |
PI | NSEC | 0.64 | 0.59 | 0.65 | 0.64 | 0.60 | 0.43 | 0.55 | 0.56 |
PI | R2 | 0.64 | 0.58 | 0.65 | 0.65 | 0.62 | 0.44 | 0.57 | 0.61 |
PI | VAF | 64.34% | 58.55% | 65.05% | 64.47% | 59.91% | 44.55% | 55.61% | 56.24% |
PI | WI | 0.887 | 0.854 | 0.883 | 0.874 | 0.833 | 0.770 | 0.809 | 0.801 |
PI | RBIAS | −1.83 | 0.00 | 0.00 | −2.12 | 1.24 | 2.63 | 1.60 | 1.68 |
ESP | RMSE | 0.26 | 0.39 | 0.25 | 0.28 | 0.30 | 0.39 | 0.33 | 0.40 |
ESP | MAPE | 3.20 | 4.60 | 2.20 | 1.27 | 0.50 | 0.99 | 0.49 | 0.88 |
ESP | NSEC | 0.96 | 0.92 | 0.96 | 0.96 | 0.94 | 0.90 | 0.93 | 0.89 |
ESP | R2 | 0.96 | 0.91 | 0.96 | 0.96 | 0.94 | 0.91 | 0.93 | 0.91 |
ESP | VAF | 96.33% | 91.53% | 96.48% | 95.59% | 94.11% | 91.03% | 93.00% | 89.77% |
ESP | WI | 0.99 | 0.98 | 0.99 | 0.99 | 0.98 | 0.97 | 0.98 | 0.97 |
ESP | RBIAS | −15.19 | 0.00 | 0.00 | −7.07 | 11.18 | 29.91 | 16.00 | 15.13 |
IWQI | RMSE | 5.75 | 8.31 | 5.45 | 5.96 | 6.75 | 8.82 | 6.93 | 7.25 |
IWQI | MAPE | 0.07 | 0.10 | 0.07 | 0.07 | 0.09 | 0.12 | 0.09 | 0.09 |
IWQI | NSEC | 0.59 | 0.15 | 0.63 | 0.56 | 0.55 | 0.23 | 0.52 | 0.48 |
IWQI | R2 | 0.59 | 0.14 | 0.63 | 0.58 | 0.59 | 0.29 | 0.56 | 0.57 |
IWQI | VAF | 59.31% | 14.93% | 63.41% | 56.47% | 55.69% | 23.65% | 53.75% | 48.42% |
IWQI | WI | 0.861 | 0.496 | 0.877 | 0.861 | 0.867 | 0.496 | 0.855 | 0.862 |
IWQI | RBIAS | −0.63 | 0.00 | 0.00 | −1.57 | −1.77 | −1.64 | −2.10 | −1.41 |
Factor importance in the PLSR model
Cumulative fraction of the variance of the independent (R2X) and dependent (R2Y) variables explained by the latent vectors
Index | Latent vector | SAR | KR | %Na | PI | ESP | IWQI |
---|---|---|---|---|---|---|---|
R2X | LV1 | 0.666 | 0.648 | 0.740 | 0.656 | 0.773 | 0.624 |
R2X | LV2 | 0.776 | 0.746 | 0.868 | 0.752 | 0.863 | |
R2Y | LV1 | 0.631 | 0.296 | 0.207 | 0.277 | 0.588 | 0.149 |
R2Y | LV2 | 0.930 | 0.742 | 0.558 | 0.586 | 0.915 | |
Relative importance of each input variable for the prediction of (a) SAR; (b) KR; (c) %Na; (d) PI; (e) ESP; and (f) IWQI in the projection of LV1 and LV2.
Importance of input variables in the PCR model
Factor loadings of different chemical constituents of GW samples for prediction of (a) SAR; (b) KR and PI; (c) %Na and ESP; and (d) IWQI.
Model adequacy based on input parameters in the PCR model
In the present research, several statistical measures are used to check the adequacy of the PCR model for predicting SAR, KR, %Na, PI, ESP, and IWQI from their candidate input parameters. Four statistics (DW, Mallows' Cp, AIC, and BIC) are evaluated to examine the goodness of fit of the PCR model (Table 17). Kouadri et al. (2021) used similar statistical measures. For KR, %Na, and PI, a DW statistic of 2 indicates no first-order autocorrelation in the model residuals for the 2014–2021 dataset with the available input parameters. For SAR (DW = 2.3) and ESP (DW = 2.1), values slightly above 2 suggest weak negative autocorrelation, whereas IWQI (DW = 1.9) shows weak positive autocorrelation. IWQI has the highest Mallows' Cp (11). However, Cp equals the number of fitted parameters (inputs plus intercept) for every index in Table 17, so this value mainly reflects the larger input set of IWQI. With the lowest AIC and BIC, the PCR model shows the best goodness of fit for KR, followed by SAR, IWQI, %Na, PI, and ESP. Note, though, that AIC and BIC are strictly comparable only between models of the same response variable.
Subset regression analysis for input irrigation water quality parameters in the PCR model
WQ index | Input variables | Durbin–Watson (DW) | Mallows' Cp | Akaike information criterion (AIC) | Bayesian information criterion (BIC) |
---|---|---|---|---|---|
SAR | EC/pH/Na+/Mg2+/TDS | 2.3 | 6 | −808.1 | −786.9 |
KR | EC/pH/Na+/Mg2+/TDS/HCO3−/Cl− | 2 | 8 | −943.6 | −915.4 |
%Na | EC/Na+/Mg2+/TDS/HCO3−/Cl− | 2 | 7 | 1,070 | 1,094.7 |
PI | EC/pH/Na+/Mg2+/TDS/HCO3−/Cl− | 2 | 8 | 1,173.9 | 1,202.2 |
ESP | EC/Na+/Mg2+/TDS/HCO3−/Cl− | 2.1 | 7 | 1,635.8 | 1,660.5 |
IWQI | EC/TDS/Na+/Cl−/SAR/KR/%Na/PI/ESP/PS | 1.9 | 11 | 900.9 | 939.7 |
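The four diagnostics in Table 17 can be computed from a least-squares fit as sketched below, using common textbook forms. This is an assumption: statistical packages differ in the additive constants they include in AIC and BIC, so absolute values are comparable only within one software convention. When Cp is evaluated for the full model itself, with σ² estimated from that same model, it reduces algebraically to p, the number of fitted parameters including the intercept. The example values are hypothetical.

```python
import math


def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). About 2 means no first-order
    autocorrelation; below 2 positive, above 2 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def aic(sse, n, p):
    """Least-squares AIC with additive constants omitted: n*ln(SSE/n) + 2p."""
    return n * math.log(sse / n) + 2 * p


def bic(sse, n, p):
    """Least-squares BIC with additive constants omitted: n*ln(SSE/n) + p*ln(n)."""
    return n * math.log(sse / n) + p * math.log(n)


def mallows_cp(sse_subset, p, sigma2_full, n):
    """Mallows' Cp = SSE_p / sigma^2 - n + 2p, where p counts fitted parameters
    (including the intercept) and sigma^2 is estimated from the full model."""
    return sse_subset / sigma2_full - n + 2 * p


# Hypothetical full model: p = 6 parameters fitted to n = 100 samples.
n, p, sse_full = 100, 6, 50.0
sigma2 = sse_full / (n - p)
print(mallows_cp(sse_full, p, sigma2, n))  # approximately 6, i.e. equal to p
```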
Evaluation of results by ML models
K-fold cross-validation of the SVM model
Evaluation of (a) RMSE, (b) R2, and (c) MAE by using K-fold cross-validation for the SVM model.
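K-fold cross-validation splits the training data into K folds. The model is fitted on K − 1 folds and scored on the held-out fold, and this repeats until every fold has been held out once. The plain-Python sketch below shows that loop. To keep it self-contained, a placeholder straight-line learner stands in for the SVM. The number of folds used in the study is not stated in this section, so K = 5 is an assumption, and the function names and synthetic data are illustrative.

```python
import math
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, fit, predict, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold; return (RMSE, R2, MAE) per fold."""
    results = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in fold]
        errors = [o - predict(model, x[i]) for o, i in zip(obs, fold)]
        sse = sum(e * e for e in errors)
        obs_bar = mean(obs)
        sst = sum((o - obs_bar) ** 2 for o in obs)
        # R2 here is the coefficient of determination on the fold: 1 - SSE/SST.
        results.append((math.sqrt(sse / len(fold)), 1.0 - sse / sst, mean(abs(e) for e in errors)))
    return results


# Placeholder learner standing in for the SVM: a least-squares straight line.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / sum((a - x_bar) ** 2 for a in xs)
    return slope, y_bar - slope * x_bar


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


xs = [float(v) for v in range(20)]
ys = [2.0 * v + 1.0 + (0.5 if v % 2 else -0.5) for v in xs]  # synthetic noisy line
for rmse_k, r2_k, mae_k in cross_validate(xs, ys, fit_line, predict_line, k=5):
    print(f"RMSE={rmse_k:.3f}  R2={r2_k:.3f}  MAE={mae_k:.3f}")
```

Averaging the per-fold RMSE, R2, and MAE gives the kind of summary shown in the figure. In practice, scikit-learn's `KFold` and `cross_validate` (in `sklearn.model_selection`) wrapped around an SVR model would usually replace this hand-written loop.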
Performance evaluation of ML models
Five ML models (ANN, SVM, CART, CRRF, and KNN) are used in the present study to predict SAR, KR, %Na, PI, ESP, and IWQI. Table 18 lists the statistical measures used to check the reliability of these models. In the training period, the ANN model gives the highest prediction accuracy for SAR, followed by ESP, KR, PI, and %Na. Its R2 values are 0.98, 0.97, 0.94, 0.93, and 0.90 and its RMSE values are 0.11, 0.24, 0.09, 4.41, and 4.19, respectively, in line with El Bilali & Taleb (2020) and Gad et al. (2023). In the training phase, ANN overestimates ESP (RBIAS = −5.37) and underestimates %Na (RBIAS = 3.91).
Performance evaluation of ML models in training and testing datasets
WQ parameter | Statistical measure | ANN (training) | SVM (training) | CART (training) | CRRF (training) | KNN (training) | ANN (testing) | SVM (testing) | CART (testing) | CRRF (testing) | KNN (testing) |
---|---|---|---|---|---|---|---|---|---|---|---|
SAR | RMSE | 0.11 | 0.16 | 0.18 | 0.38 | 0.46 | 0.11 | 0.25 | 0.37 | 0.23 | 0.41 |
SAR | MAPE | 0.11 | 0.14 | 0.08 | 0.11 | 0.41 | 0.10 | 0.15 | 0.14 | 0.11 | 0.36 |
SAR | NSEC | 0.99 | 0.97 | 0.97 | 0.84 | 0.78 | 0.98 | 0.92 | 0.82 | 0.93 | 0.77 |
SAR | R2 | 0.98 | 0.97 | 0.96 | 0.86 | 0.78 | 0.98 | 0.92 | 0.84 | 0.94 | 0.77 |
SAR | VAF | 98.60% | 97.17% | 96.58% | 84.36% | 77.62% | 98.54% | 92.27% | 83.35% | 93.74% | 78.28% |
SAR | WI | 0.996 | 0.993 | 0.991 | 0.949 | 0.929 | 0.996 | 0.979 | 0.944 | 0.980 | 0.937 |
SAR | RBIAS | −0.22 | 1.88 | 0.00 | 4.58 | −7.50 | 2.06 | 6.55 | 8.76 | 6.29 | 7.69 |
KR | RMSE | 0.09 | 0.15 | 0.15 | 0.17 | 0.22 | 0.10 | 0.25 | 0.25 | 0.20 | 0.23 |
KR | MAPE | 0.14 | 0.28 | 0.13 | 0.20 | 0.50 | 0.11 | 0.26 | 0.26 | 0.18 | 0.33 |
KR | NSEC | 0.94 | 0.81 | 0.82 | 0.76 | 0.63 | 0.93 | 0.56 | 0.56 | 0.72 | 0.65 |
KR | R2 | 0.94 | 0.82 | 0.82 | 0.79 | 0.64 | 0.93 | 0.62 | 0.60 | 0.74 | 0.68 |
KR | VAF | 93.72% | 81.52% | 82.39% | 76.42% | 63.64% | 93.34% | 61.76% | 57.11% | 73.56% | 67.81% |
KR | WI | 0.983 | 0.948 | 0.949 | 0.916 | 0.874 | 0.982 | 0.854 | 0.874 | 0.907 | 0.881 |
KR | RBIAS | 1.40 | 12.83 | 0.00 | 4.00 | −9.92 | 1.32 | 17.06 | 6.10 | 7.79 | 11.96 |
%Na | RMSE | 4.19 | 9.61 | 5.52 | 5.70 | 8.34 | 4.60 | 9.89 | 8.74 | 6.20 | 7.82 |
%Na | MAPE | 0.10 | 0.22 | 0.15 | 0.14 | 0.28 | 0.09 | 0.18 | 0.19 | 0.12 | 0.20 |
%Na | NSEC | 0.90 | 0.47 | 0.82 | 0.81 | 0.60 | 0.88 | 0.46 | 0.58 | 0.79 | 0.66 |
%Na | R2 | 0.90 | 0.59 | 0.82 | 0.82 | 0.60 | 0.89 | 0.55 | 0.60 | 0.81 | 0.70 |
%Na | VAF | 90.01% | 46.82% | 82.46% | 81.30% | 60.21% | 88.98% | 48.82% | 58.31% | 80.56% | 69.16% |
%Na | WI | 0.973 | 0.864 | 0.950 | 0.942 | 0.862 | 0.968 | 0.849 | 0.879 | 0.934 | 0.884 |
%Na | RBIAS | 3.91 | −0.59 | 0.00 | −0.45 | −6.31 | 3.33 | 6.80 | 2.87 | 5.47 | 6.95 |
PI | RMSE | 4.41 | 10.55 | 6.36 | 7.84 | 8.60 | 6.66 | 12.15 | 12.53 | 9.27 | 7.63 |
PI | MAPE | 0.05 | 0.10 | 0.07 | 0.09 | 0.10 | 0.05 | 0.09 | 0.12 | 0.08 | 0.08 |
PI | NSEC | 0.93 | 0.61 | 0.86 | 0.78 | 0.74 | 0.85 | 0.50 | 0.47 | 0.71 | 0.80 |
PI | R2 | 0.93 | 0.61 | 0.86 | 0.79 | 0.74 | 0.86 | 0.56 | 0.52 | 0.71 | 0.81 |
PI | VAF | 93.18% | 61.44% | 85.72% | 78.32% | 73.97% | 85.09% | 52.74% | 47.19% | 71.07% | 80.78% |
PI | WI | 0.982 | 0.866 | 0.960 | 0.931 | 0.919 | 0.955 | 0.775 | 0.845 | 0.911 | 0.943 |
PI | RBIAS | −1.24 | 5.06 | 0.00 | −0.59 | −1.91 | 0.35 | 3.84 | 0.07 | −0.14 | 1.54 |
ESP | RMSE | 0.24 | 0.28 | 0.51 | 0.51 | 0.62 | 0.16 | 0.37 | 0.34 | 0.29 | 0.54 |
ESP | MAPE | 1.63 | 2.12 | 1.50 | 2.65 | 11.10 | 0.62 | 0.62 | 0.88 | 0.61 | 2.03 |
ESP | NSEC | 0.97 | 0.96 | 0.85 | 0.86 | 0.78 | 0.98 | 0.91 | 0.92 | 0.94 | 0.80 |
ESP | R2 | 0.97 | 0.96 | 0.85 | 0.87 | 0.78 | 0.98 | 0.92 | 0.93 | 0.94 | 0.82 |
ESP | VAF | 96.76% | 95.70% | 85.22% | 85.64% | 78.24% | 98.25% | 91.30% | 92.57% | 94.47% | 81.82% |
ESP | WI | 0.99 | 0.99 | 0.96 | 0.95 | 0.93 | 1.00 | 0.98 | 0.98 | 0.98 | 0.95 |
ESP | RBIAS | −5.37 | 2.64 | 0.00 | 19.02 | −30.95 | 0.15 | 16.70 | 11.24 | 14.08 | 32.26 |
IWQI | RMSE | 4.22 | 5.84 | 3.58 | 3.57 | 3.96 | 4.83 | 7.83 | 5.46 | 5.27 | 4.29 |
IWQI | MAPE | 0.05 | 0.07 | 0.04 | 0.03 | 0.05 | 0.06 | 0.11 | 0.07 | 0.06 | 0.05 |
IWQI | NSEC | 0.78 | 0.58 | 0.84 | 0.84 | 0.81 | 0.77 | 0.39 | 0.70 | 0.72 | 0.82 |
IWQI | R2 | 0.78 | 0.58 | 0.84 | 0.84 | 0.81 | 0.77 | 0.43 | 0.71 | 0.74 | 0.83 |
IWQI | VAF | 78.13% | 58.02% | 84.22% | 84.35% | 80.68% | 76.84% | 41.54% | 70.86% | 74.07% | 82.64% |
IWQI | WI | 0.936 | 0.842 | 0.956 | 0.956 | 0.944 | 0.931 | 0.779 | 0.901 | 0.916 | 0.944 |
IWQI | RBIAS | 0.46 | 0.30 | 0.00 | −0.60 | 0.57 | −0.48 | −2.73 | −1.18 | −2.23 | −1.65 |
The SVM model predicts SAR, ESP, and KR better in the training phase than in the testing phase. In the training phase it reaches relatively high R2 values of 0.97, 0.96, and 0.82 and low RMSE values of 0.16, 0.28, and 0.15, respectively. Elsayed et al. (2020) and Mokhtar et al. (2022) reported similar findings. However, SVM fails to give satisfactory predictions for %Na and IWQI, with relatively low VAF and NSEC in both the training and testing phases, in line with Wang et al. (2020). Its largest underestimation is for KR in the testing phase (RBIAS = 17.06), compared with RBIAS = 12.83 in the training phase.
The CART model performs better in the training dataset than in the testing dataset for all the WQ variables except ESP, with markedly higher R2 and VAF. In the testing phase, it retains high predictive strength only for SAR and ESP, with R2 of 0.84 and 0.93 and VAF of 83.35% and 92.57%, respectively. In the training dataset, the CART predictions for all six indices show no systematic bias (RBIAS = 0).
The CRRF model predicts SAR and ESP better in the testing phase than in the training phase. In the testing phase, both variables reach an R2 of 0.94, with VAF of 93.74% and 94.47%, respectively. In contrast, it predicts %Na and IWQI slightly better in the training phase (R2%Na = 0.82 and R2IWQI = 0.84) than in the testing phase (R2%Na = 0.81 and R2IWQI = 0.74). KNN predicts IWQI better in the testing phase (R2 = 0.83; VAF = 82.64%) than in the training phase (R2 = 0.81; VAF = 80.68%), with relatively low overestimation in the testing phase (RBIAS = −1.65). It also gives moderately higher prediction accuracy in the testing phase than in the training phase for KR, %Na, PI, and ESP. Their R2 values are 0.64, 0.60, 0.74, and 0.78 in the training phase and 0.68, 0.70, 0.81, and 0.82 in the testing phase, respectively, similar to the findings of El Bilali & Taleb (2020). However, KNN strongly overestimates ESP in the training dataset (RBIAS = −30.95) and strongly underestimates it in the testing dataset (RBIAS = 32.26).
Variable importance in the CRRF model
Variable importance in the CRRF model by mean increase in error for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI.
Statistical evaluation (mean and standard deviation) of estimated and predicted values for (a) SAR, (b) KR, (c) %Na, (d) PI, (e) ESP, and (f) IWQI.
Classification accuracy of ML models
Eight statistical parameters are derived from the confusion matrix: sensitivity (SE), specificity (SP), accuracy (A), positive predictive value (PPV), false positive rate (FPR), Matthews correlation coefficient (MCC), F1-score, and error rate (ERR). They are used to assess how accurately the proposed ML models (SVM, CART, CRRF, and KNN) classify the water restriction zones in the training and testing periods. Table 19 shows that CRRF has the highest classification accuracy, with the highest SE, SP, A, MCC, and F1-score in both the training and testing datasets, similar to the findings of Bhoi et al. (2022) and Khan et al. (2022). CART ranks second in the testing dataset (SE = 0.74, MCC = 0.68). In the training dataset, however, its SE, MCC, and F1-score (0.75, 0.68, and 0.75) are slightly below those of KNN (0.78, 0.70, and 0.78). CART also has a higher misclassification rate (ERRtraining = 0.10 and ERRtesting = 0.10) than CRRF (ERRtraining = 0.06 and ERRtesting = 0.05). In the testing dataset, KNN has the highest misclassification rate (ERR = 0.14), followed by SVM (ERR = 0.12).
Accuracy assessment of ML models for training and testing datasets from the confusion matrix
Statistical measure | SVM (training) | SVM (testing) | CART (training) | CART (testing) | CRRF (training) | CRRF (testing) | KNN (training) | KNN (testing)
---|---|---|---|---|---|---|---|---
SE | 0.74 | 0.69 | 0.75 | 0.74 | 0.85 | 0.87 | 0.78 | 0.72
SP | 0.94 | 0.92 | 0.94 | 0.94 | 0.96 | 0.97 | 0.93 | 0.91
A | 0.90 | 0.88 | 0.90 | 0.90 | 0.94 | 0.95 | 0.89 | 0.86
PPV | 0.74 | 0.69 | 0.75 | 0.74 | 0.85 | 0.87 | 0.78 | 0.72
FPR | 0.06 | 0.08 | 0.06 | 0.06 | 0.04 | 0.03 | 0.07 | 0.09
MCC | 0.68 | 0.62 | 0.68 | 0.68 | 0.81 | 0.84 | 0.70 | 0.62
F1-score | 0.74 | 0.69 | 0.75 | 0.74 | 0.85 | 0.87 | 0.78 | 0.72
ERR | 0.10 | 0.12 | 0.10 | 0.10 | 0.06 | 0.05 | 0.11 | 0.14
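Table 19 lists eight confusion-matrix measures for a multi-class zoning problem. This section does not state how per-class results are aggregated; one plausible reading is micro-averaging, in which one-vs-rest counts are summed over all classes before the measures are computed. When every sample receives exactly one predicted class, pooled FP equals pooled FN, so SE, PPV, and F1-score coincide, which is the pattern visible in every column of the table. The sketch below implements that assumed scheme; the function name and the example matrix are hypothetical.

```python
import math


def pooled_metrics(cm):
    """Eight classification measures from a k-class confusion matrix (micro-averaged).

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    One-vs-rest TP/FP/FN/TN counts are summed over all k classes (assumed scheme).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    tp = sum(cm[i][i] for i in range(k))
    fp = sum(sum(cm[r][c] for r in range(k)) - cm[c][c] for c in range(k))  # off-diagonal column totals
    fn = sum(sum(cm[r]) - cm[r][r] for r in range(k))                        # off-diagonal row totals
    tn = k * n - tp - fp - fn  # each of the k one-vs-rest tables covers all n samples
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    acc = (tp + tn) / (k * n)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    f1 = 2 * ppv * se / (ppv + se)
    return {"SE": se, "SP": sp, "A": acc, "PPV": ppv, "FPR": 1 - sp,
            "MCC": mcc, "F1": f1, "ERR": 1 - acc}


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class)
    cm = [[5, 1, 0],
          [1, 3, 1],
          [0, 1, 8]]
    for name, value in pooled_metrics(cm).items():
        print(f"{name}: {value:.2f}")
```

For this matrix the sketch gives SE = PPV = F1 = 0.80, SP = 0.90, and MCC = 0.70. Pooled A averages the k binary accuracies, so it (0.87 here) is higher than the plain fraction of correctly classified samples (0.80), consistent with A exceeding SE for every model in the table.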
Results of the Friedman test
The Friedman test is applied to determine whether the outputs of any two models differ significantly at p < 0.05; the resulting p-values are reported in matrix form in Table 20. For SAR, KR, %Na, PI, and ESP, the outputs of MLR differ significantly from those of all the remaining models (p < 0.05). For IWQI, a significant difference is observed only between PLSR and SVM (p = 0.041).
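The Friedman test ranks the compared treatments within each block and checks whether their rank sums differ more than chance would allow. A minimal standard-library sketch follows, assuming each block is one sample and each treatment is one model's predicted value for that sample; the function names and toy data are hypothetical. It uses the chi-square approximation without a tie correction. `scipy.stats.friedmanchisquare` offers a library version of the omnibus test (with a tie correction) for three or more groups. This section does not state how the pairwise p-values in Table 20 were obtained (the test applied to each pair of models, or a post-hoc procedure).

```python
import math


def average_ranks(values):
    """Ranks 1..k within one block; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for integer df >= 1."""
    x = max(x, 0.0)
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: tail for df = 1, plus one series term per extra pair of degrees of freedom
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def friedman_test(blocks):
    """Friedman chi-square statistic and p-value.

    blocks: n rows, each holding the k treatment values (here, k models' outputs) for one sample.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)


if __name__ == "__main__":
    # Hypothetical predictions of three models (columns) for five samples (rows)
    preds = [[2.1, 2.4, 2.0],
             [3.0, 3.3, 2.9],
             [1.8, 2.2, 1.9],
             [4.1, 4.5, 4.0],
             [2.6, 2.9, 2.5]]
    stat, p = friedman_test(preds)
    print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```

Because the test works on within-sample ranks, it flags a model whose outputs are consistently higher or lower than the others even when the absolute differences are small.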
CONCLUSION
This study highlights the importance of RS and GIS techniques, combined with different regression and ML models, for assessing important IWQIs of GW in Sundargarh district. Based on the spatio-temporal variation of SAR (100%), KR (93%), %Na (80%), ESP (100%), and IWQI (92%), most wells contain good-quality GW, indicating high suitability of GW for irrigation during the pre-monsoon season. Dug wells at Durubaga, Ekma, Balichuuan, Banki, Lokedega, Balijori, Krinjikela, Talsara, Moshani Kani, Lathikata, R-06 Basanti Colony, R-07 Uditnagar 1, Himgiri, Laxmipose, and Bhasma contain poor-quality GW from BGC, schist, and alluvium aquifers. SAR, KR, %Na, and ESP follow a long-term monotonic increasing trend, whereas PI and IWQI show a long-term monotonic decreasing trend at p > 0.1 during 2014–2021. PCA reveals that salinity and sodicity are the governing factors of GW quality as expressed through IWQI. DA further shows that Na+, Cl−, SAR, KR, and PS are the governing WQ variables that explain the spatial variability of GW quality across the district. All the regression models show higher predictive strength for SAR and ESP, whereas the PCR model performs better for KR, %Na, PI, and IWQI. Among the ML models, ANN shows the highest predictive strength for SAR, KR, %Na, PI, and ESP based on the input parameters, whereas the CART model shows comparatively high prediction accuracy for all the WQ variables, followed by the CRRF, SVM, and KNN models. These observations illustrate that regression and ML models can be implemented effectively, given their robustness and cost-effectiveness compared with expensive and time-consuming conventional methods, for analyzing long-term data in arid and semi-arid regions.
The Friedman test results also show significant differences between the predictions of the regression and ML models for all the WQ indices except IWQI. Overall, the outcomes of the present study show that these regression and ML models can be well suited to efficient GW quality management, supporting the sustainable utilization and management of water resources in Sundargarh district.
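The long-term monotonic trends summarized above come from the Mann–Kendall test. Its statistic S counts rising minus falling pairs across the series: the sign of S gives the trend direction, and the p-value indicates whether that direction can be distinguished from no trend. The sketch below assumes one value per year (for example, a well's pre-monsoon SAR for 2014–2021) and uses the usual normal approximation with a tie correction but no correction for serial correlation; with only eight annual values that approximation is rough. The function name and the series are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test with a tie correction.

    Returns (S, Z, p_value, direction). Assumes independent observations.
    """
    n = len(series)
    # S: rising later-minus-earlier pairs minus falling ones
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the no-trend hypothesis, reduced for groups of tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected standard normal score
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value, 2 * (1 - Phi(|z|))
    direction = "increasing" if s > 0 else "decreasing" if s < 0 else "no trend"
    return s, z, p, direction


if __name__ == "__main__":
    # Hypothetical pre-monsoon values of one index, 2014-2021 (eight years)
    values = [1.20, 1.35, 1.28, 1.40, 1.38, 1.52, 1.47, 1.60]
    s, z, p, direction = mann_kendall(values)
    print(f"S = {s}, Z = {z:.2f}, p = {p:.3f}, trend = {direction}")
```

A trend can have a clear direction (the sign of S) while still having p > 0.1, in which case the direction is reported but is not statistically significant at that level.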
LIMITATIONS AND FUTURE SCOPE
The main limitation of the present study is that WQ data were available only for the pre-monsoon season during 2014–2021. Consequently, the analysis cannot capture the spatio-temporal variation and trends of these WQ indices during the monsoon and post-monsoon seasons, which would provide the seasonal GW quality information needed to monitor and manage GW sustainably, mainly for industrial and agricultural use in this district.
In future work, the effect of seasonal GW quality parameters on paddy cultivation and its crop characteristics can be studied comprehensively to develop a cost-effective irrigation management system for Sundargarh district.
CONSENT TO PARTICIPATE
All authors consent to participate.
CONSENT TO PUBLISH
All authors consent to publish.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.