Assessing groundwater quality is vital for irrigation, but financial constraints in developing countries often result in infrequent sampling. This study analyzes the groundwater quality of the El Moghra aquifer in Egypt's arid Western Desert to assess its suitability for irrigation. Detailed hydrochemical analysis, geographic information systems, and advanced machine learning (ML) techniques were employed to enhance spatial analysis and predictive accuracy. Three ML models, namely random forest, adaptive boosting, and extreme gradient boosting (XGBoost), were optimized using Bayesian optimization to predict the irrigation water quality index (IWQI). The evaluation incorporated visual and quantitative methods, alongside ranking analysis, to validate model effectiveness. A Shapley Additive exPlanations (SHAP) feature importance analysis was conducted and a graphical user interface (GUI) was developed, both based on the best predictive model. The results indicated that the groundwater quality is generally suitable for irrigation, with XGBoost showing the best performance, achieving a root mean square error (RMSE) of 5.602 and a determination coefficient (R²) of 0.872. Sodium concentration was identified as the most significant factor affecting the IWQI. The GUI facilitates easy prediction of IWQI, aiding agricultural water management and resource allocation within the region.

  • Optimized ML models predict IWQI, with XGBoost achieving top performance (RMSE 5.602 & R² 0.872).

  • Sodium concentration identified as the most influential factor in groundwater quality.

  • Bayesian optimization significantly boosts model accuracy and effectiveness.

  • The user-friendly GUI helps farmers predict IWQI for better irrigation management.

Groundwater is a crucial resource in arid regions, especially in Egypt, where deserts constitute 96% of the land area (Eltarabily et al. 2018). The El Moghra aquifer is essential for agricultural activities, necessitating thorough quality assessment (Eltarabily & Moghazy 2021). The national plan aimed to reclaim one and a half million feddans of desert land through a phased approach (Global Agricultural Information Network 2016). These projects rely on groundwater for 88% of their irrigation water because of the restricted supply from the River Nile, which makes groundwater quality a key criterion when selecting project sites across the deserts.

The quality of groundwater is affected by various hydrochemical processes such as dissolution, ion exchange, rock weathering, and biological activities (Jeevanandam et al. 2007). Several studies have focused on the hydrochemical characterization of groundwater (Jianhua et al. 2009). These studies also reveal interactions between groundwater and aquifer minerals, indicating aquifer uniformity or variability, and its transmissivity, storage, and conductivity (Wells & Price 2015). The geological structure and formations influence the composition of groundwater and the direction of its movement (Selvakumar et al. 2017). In regions where aquifer recharge occurs, dissolution processes are generally predominant, while ion exchange mostly happens during the flow of groundwater. Furthermore, the chemical composition and type of groundwater are primarily influenced by factors such as the rate of evaporation, intensity of precipitation, infiltration rate, and ion exchange (Khan et al. 2019).

Geographic information systems (GIS) have been widely used to assess groundwater quality and visualize its spatial distribution (Schiewe 2003). GIS is essential for quantifying and interpreting groundwater issues (Yadav et al. 2018). Additionally, the integration of GIS and remote sensing has facilitated the measurement of aquifer thickness, the delineation of fault locations, and the study of their development through the aquifer resistivity technique (Belkhiri & Mouni 2012). Rufino et al. (2019) used the GIS-based groundwater quality index (GQI) and SINTACS to assess groundwater quality for drinking and irrigation purposes in the Agro-Aversano region of southern Italy. Moreover, principal component and multivariate analyses are commonly used for their precision and dependability in qualitative and statistical analysis to interpret the hydrochemical parameters of groundwater and to identify its facies and types (Busico et al. 2020).

Statistical techniques such as multivariate analysis have been employed to identify factors influencing water quality (Machiwal et al. 2018). The effective application of these methods improves the accuracy of groundwater quality analysis, which is essential for formulating dependable management strategies for groundwater resources (Machiwal & Jha 2012). Recently, the incorporation of advanced statistical tools within GIS has been used to identify hydrochemical regimes and groundwater management approaches for complex aquifers affected by human activities (Machiwal & Jha 2015). This integrated approach facilitates the mapping of groundwater chemical composition across extensive areas and regional scales.

Various indicators have been developed globally to characterize water quality for irrigation (Doneen 1964). The water quality index (WQI), initially established by Horton (1965) and further refined by Brown et al. (1972), was modified by Meireles et al. (2010) to introduce the irrigation water quality index (IWQI). This index includes the variables EC, SAR, HCO3−, Na+, and Cl−. This approach enables decision-makers to effectively evaluate the quality and potential risks associated with different water types using a comprehensive set of parameters (Şener et al. 2021). The IWQI aids in the comparison and assessment of various water samples, helping to reduce negative impacts on soil and vegetation (Batarseh et al. 2021). The use of IWQI also supports the feasibility of drilling wells for agricultural purposes in regions affected by excessive groundwater salinization (El Bilali & Taleb 2020). This method is increasingly employed in numerous research initiatives for its ability to tackle complex problems and clarify the connections between input and output data.

Despite extensive studies on groundwater dynamics, there is a notable gap in applying advanced machine learning (ML) techniques to predict IWQI. This study aims to bridge this gap by employing models such as RF, AdaBoost, and XGBoost to provide accurate predictions and practical solutions for groundwater management. While foundational studies have elucidated basic hydrological and geological characteristics (Abdel Mogith et al. 2012) and addressed aspects of groundwater management (Sayed et al. 2019) and geological surveying (Mohamaden et al. 2016), none have leveraged the predictive power of ML to enhance irrigation decisions. This oversight represents a critical research gap, particularly given the potential of ML to provide detailed, accurate forecasts from complex datasets. Thus, the following are the authors' specific contributions:

  • (1) To predict IWQI based on the hydrochemical parameters to assess the suitability of the El Moghra aquifer's groundwater for irrigation purposes.

  • (2) To implement the Bayesian optimization (BO) method for hyperparameter tuning to enhance the performance of the adopted ML models.

  • (3) To establish a comprehensive evaluation framework that incorporates visual, quantitative methods, and ranking analysis to ensure rigorous validation of model effectiveness.

  • (4) To conduct SHAP feature importance analysis to identify key parameters influencing IWQI, offering insights for targeted improvements in ML models.

  • (5) To develop a user-friendly graphical user interface (GUI) to facilitate the straightforward prediction of IWQI based on the best predictive model.

El Moghra is located in the northeastern segment of Egypt's Western Desert (Figure 1), covering an area of approximately 966 km². It lies at the northeastern edge of the Qattara Depression, 40 km south of El-Hamam city and 40 km west of El-Alamein town. The Egyptian Countryside Development Company (ECDC) oversees the project (https://elreefelmasry.com). The region's climate is arid with hot summers and warm winters, and infrequent rainfall. Temperature variations are moderate, with monthly averages ranging from 7.8 to 17.8 °C in January and from 21 to 30 °C in August (https://www.weather-atlas.com/en/egypt/el-dabaa-climate#temperature). The annual mean maximum and minimum temperatures are 24.4 and 14.2 °C, respectively. Precipitation is minimal, ranging from 0 mm in summer to 28–32 mm in winter, with an annual average of 9.9 mm. Wind speeds range from 1.25 to 5.5 km/h, with an annual average daylight and sunshine hours of 12.2 and 9.2 h, respectively. The daily evaporation rate averages from 5.1 mm in December to 14 mm in June. The average daily reference evapotranspiration (ETo) is 3.08 mm, calculated using the Penman-Monteith method via CROPWAT 8.0 software (https://www.fao.org/land-water/databases-and-software/cropwat/en/).
Figure 1

Western Desert in Egypt. (a) Location map of the El Moghra region and (b) aerial map of the study area showing terrain and project (Eltarabily & Moghazy 2021).


Geological characteristics

The El Moghra aquifer consists mainly of lower Miocene formations, characterized by a shallow marine clastic sequence of sand, sandstone, and clay lenses. This formation extends from the Western Nile Delta to the Qattara Depression and from the Mediterranean Sea to the El-Fayoum Depression, covering around 50,000 km² (Eltarabily & Negm 2018). The upper Quaternary deposits in the region, 2.0–3.0 m thick, are mainly sand dunes. The Miocene layer, 20.0–50.0 m thick, consists of sand, sandstone, shale, and fossils, overlaying Oligocene deposits (150–200 m thick) of sand and gravel (Khan et al. 2014). The subsequent layer is the Upper Eocene deposit, made up of sandstone and limestone, with an average thickness of 600 m (Mohamaden et al. 2016). Faults are found on the eastern and western sides of the project area. Belal et al. (2018) evaluated the suitability of the El Moghra area for irrigation and agriculture using the Cervatana model, integrating GIS, remote sensing, and MicroLEIS for mapping cropping patterns. Their findings showed that 0.69% of the land is highly suitable, 72.16% is moderately suitable, 6.87% is marginal, and 20.28% is non-productive. Limitations to land capability include soil texture and drainage, aridity, and erosion risks. Highly suitable lands mainly consist of Torripsamments soil in lowlands and gravel plains, while marginal and non-productive lands are composed of Sabkha soil near the Qattara Depression (Kady et al. 2018).

Hydrological characteristics

The El Moghra aquifer, with a thickness ranging from 50 to 250 m, directs its groundwater flow toward the Qattara Depression, characterized by a hydraulic gradient of less than 0.2 m/km. The aquifer's transmissivity varies from 700 to 6,500 m²/day, indicating significant variability in water movement potential. Groundwater within the aquifer includes both renewable sources and fossil water (El Tahlawi et al. 2008). Salinity levels also show substantial variation, with TDS ranging from 290 mg/L near Wadi El-Farigh to 31,000 mg/L near the Qattara Depression (Abdel Mogith et al. 2012). Approximately 200 million m³ of water is lost annually due to evaporation and lateral seepage, primarily in the Qattara and Wadi El Natroun Depressions. Lateral seepage into the Marmarica limestone is distinguished on the western side of Qattara. Recharge sources for the aquifer include lateral groundwater movement at its eastern boundary, upward flow from the Nubian Sandstone Aquifer System (NSAS), and minor contributions from infiltrated precipitation (Dawoud et al. 2005). The study area exhibits elevations from −21 to +60 m relative to mean sea level (MSL), with an average elevation of approximately +26 m. The topographical map shown in Figure 2 includes well locations and IDs and provides detailed insights into the region's ground elevations.
Figure 2

Topographical map of the El Moghra aquifer (Eltarabily & Moghazy 2021).


Database collection

The database for ML modeling was taken from the study of Eltarabily & Moghazy (2021), which comprised 46 groundwater samples collected from November 2018 to December 2019. These samples were selected based on the availability of driven wells in plots that were either mostly unreclaimed or where the wells had not been excavated. The average depth of the water table ranged from 70 to 80 m. The groundwater was continuously discharged for at least 30 min to stabilize the readings before measuring temperature, pH, and EC. The samples were stored in polypropylene bottles and transported to laboratories for hydrochemical analysis. In situ pH measurements were performed using a pH meter with a precision of ±0.1 pH. EC and TDS were determined with meters accurate to 2% over a temperature range of 0.1–80 °C. The analysis covered three major anions (HCO3−, Cl−, and SO42−) and four major cations (Ca2+, Na+, Mg2+, and K+), following the standard methods of Rice et al. (2012).

Calcium and magnesium levels were quantified using the complex cation method and ethylenediaminetetraacetic acid disodium salt dihydrate (EDTA). Sodium and potassium were quantified using a digital flame photometer. Chloride concentration was determined using a standard silver nitrate (AgNO3, 0.01 N) titration with a 5% potassium chromate (K2CrO4) indicator solution, ceasing upon observing a red-brown color. Consistent titers were obtained after three readings. HCO3− concentration was estimated using a volumetric method with 0.02 N sulfuric acid (H2SO4), using methyl orange and phenolphthalein as indicators. The endpoint for phenolphthalein alkalinity was marked by the disappearance of the pink color, and total alkalinity was confirmed when the bromocresol green indicator changed from blue to yellow at a pH of 4.5. SO42− concentration was measured using a UV/visible spectrophotometer, Unicam model UV4-200 (UK).

Statistical analysis

Table 1 presents the descriptive analysis of the hydrochemical characteristics of the groundwater samples. The table shows that the TDS levels are crucial as they indicate the overall concentration of dissolved substances in water, ranging from 3,328 to 14,008 mg/L across the samples, with an average of 6,090.13 mg/L. Ntanganedzeni et al. (2018) found comparable results in their analysis of brackish groundwater. In El Moghra, the increased TDS levels are associated with the presence of sandstone in the Oligocene deposits and limestone from the upper Eocene. This indicates an upward migration of groundwater from the lower NSAS to the upper El Moghra aquifer. The pH values ranged from 5.7 to 7.8, with an average of 7.1 ± 0.3. This indicates that most samples are slightly alkaline, except for one outlier with a lower pH (5.70), suggesting localized acidity. EC varied widely from 5.00 to 19.00 dS/m, averaging 8.33 dS/m. Higher EC values suggest elevated salinity, impacting water quality for irrigation purposes. Ion concentrations in the groundwater were measured as follows: Ca2+ ranged from 113 to 637 mg/L (average 330.5 ± 109.4 mg/L), Na+ from 490 to 3,249 mg/L (average 1235.8 ± 568.2 mg/L), Mg2+ from 48 to 393 mg/L (average 174.4 ± 81.9 mg/L), and K+ from 15 to 82 mg/L (average 26.3 ± 15.9 mg/L). Among these, Na+ was the most abundant cation, followed by Ca2+, Mg2+, and K+. Abbas et al. (2018) also reported similar findings, noting a predominance of Mg2+ over Ca2+, which contributes to magnesium hazards and alkaline groundwater conditions, particularly in the eastern parts of their study area. Cl was the predominant ion, ranging from 1,056 to 5,911 mg/L, averaging 2143.4 ± 994 mg/L. The high Cl levels are attributed to inadequate salinity drainage in the area and natural rock–water interactions that enhance groundwater salinity, as also observed by Gu et al. (2017). 
HCO3− concentrations varied between 58 and 369 mg/L, while SO42− ranged from 231 to 2,086 mg/L, averaging 1051.5 ± 396.9 mg/L.

Table 1

Descriptive statistics of the hydrochemical characteristics of the groundwater samples (Eltarabily & Moghazy 2021)

Parameter   Unit   Minimum    Maximum    Mean       Std. error
TDS         mg/L   3,328      14,008     6,090.13   336.50
pH          –      5.70       7.80       7.11       0.05
EC          dS/m   5.00       19.00      8.33       0.46
HCO3−       mg/L   58.00      369.00     126.87     10.62
Cl−         mg/L   1,056.00   5,911.00   2,143.37   146.56
SO42−       mg/L   231.00     2,086.00   1,051.52   58.53
Na+         mg/L   490.00     3,249.00   1,235.76   83.78
Ca2+        mg/L   113.00     637.00     330.48     16.14
Mg2+        mg/L   48.00      393.00     174.43     12.07
K+          mg/L   15.00      82.00      26.26      2.34

Figure 3 shows histograms illustrating the frequency distribution of hydrochemical parameters across different ranges. The majority of samples have TDS concentrations in the range starting at 3,328 mg/L, with 35 occurrences. There are fewer samples with higher TDS values: eight samples in the range of 6,888 mg/L, three samples in the range of 10,448 mg/L, and no significant occurrence at the highest value of 14,008 mg/L. The pH distribution indicates that most water samples are within a neutral to slightly basic range. The highest frequency, 28 samples, is observed at a pH of 7.8. There are 17 samples with a pH of 7.1, indicating neutral water. Only one sample shows a pH of 5.7, indicating slightly acidic water, which is rare compared with the other ranges. Most samples fall in the lowest EC range, with 33 occurrences at 5.00 dS/m. The next most frequent range, 9.67 dS/m, has 10 samples. The frequency further decreases with increasing conductivity, with only three samples at 14.33 dS/m and no significant occurrence at the highest value of 19.00 dS/m.
Figure 3

Histograms showing the frequency of the groundwater samples for each hydrochemical parameter.


For bicarbonate, the most common concentration is 0.951 meq/L, with 40 occurrences. There are few samples with higher bicarbonate concentrations: one sample at 2.650 meq/L and five samples at 6.048 meq/L. This indicates that low bicarbonate concentrations are predominant in the dataset. Chloride levels are mostly concentrated around 29.8 meq/L, with 36 occurrences. There are fewer samples with higher chloride concentrations: eight samples at 75.4 meq/L and two samples at 121.1 meq/L. The highest range of 166.7 meq/L has very few occurrences, indicating that high chloride levels are less common. The histogram for sulfate shows that the highest frequency, 26 samples, is at 30.56 meq/L. There are also significant occurrences at 17.68 meq/L with 14 samples. The lowest and highest ranges, 4.81 and 43.43 meq/L, have fewer samples, 14 and 6 respectively, indicating a broader spread of sulfate concentrations.

Sodium concentrations are most frequent at 21.3 meq/L with 30 samples. There are 14 samples with sodium levels at 61.3 meq/L, indicating a significant number of occurrences in this range. Higher sodium concentrations, such as 101.3 and 141.3 meq/L, are less frequent, with only two samples each, showing that higher sodium levels are rare. The calcium distribution shows the highest frequency at 23.07 meq/L with 24 samples. There are also significant occurrences at 14.35 meq/L with 17 samples. Lower and higher concentrations, such as 5.64 and 31.79 meq/L, have fewer samples, 17 and 5, respectively, indicating moderate variability in calcium levels. Magnesium levels are most common at 3.95 meq/L with 24 samples. The next significant range is 13.41 meq/L with 16 samples. Higher magnesium concentrations, such as 22.88 and 32.34 meq/L, have fewer samples, 6 and 4, respectively, suggesting a decline in frequency with increasing magnesium levels. The histogram for potassium shows that the highest frequency is at 0.384 meq/L with 40 occurrences. There are few samples at higher concentrations: three samples each at 0.955, 1.526, and 2.097 meq/L. This indicates that low potassium levels are predominant in the dataset.

Correlation analysis

Examining the correlation between variables is crucial for understanding the connections between the input features and the target variable (IWQI), as this analysis helps to determine the most effective prediction model. The most widely used measure for this purpose is the Pearson correlation coefficient (r) (Williams et al. 2020; Selim et al. 2024). It is calculated as the ratio of the covariance (cov) of two variables (x, y) to the product of their standard deviations, as represented in Equation (1):
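$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{1} $$
where $\bar{x}$ and $\bar{y}$ are the means of the two variables x and y, and n is the number of samples in the dataset.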
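As a rough illustration of the covariance-over-standard-deviations definition, the coefficient can be computed in a few lines of plain Python. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math

def pearson_r(x, y):
    """Pearson r: co-deviation sum divided by the product of the deviation norms."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations for each variable
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values (NOT the study's measurements)
tds = [3400.0, 5200.0, 6100.0, 8800.0, 14000.0]   # mg/L
ec = [5.1, 7.0, 8.4, 12.5, 19.0]                   # dS/m
r = pearson_r(tds, ec)
print(f"r(TDS, EC) = {r:.3f}")
```

The common 1/n (or 1/(n−1)) factors in the covariance and standard deviations cancel, which is why they do not appear in the code.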
Figure 4 illustrates the correlation matrix for various water quality parameters in the form of a heatmap. Starting with TDS, there is a very weak positive correlation with pH (0.09), but a strong positive correlation with EC at 0.92. TDS also shows a weak positive correlation with HCO₃⁻ at 0.17, and a strong positive correlation with Cl⁻ at 0.83. The correlation with SO42− is moderately positive at 0.59. Strong positive correlations are observed with Na+ at 0.78, Ca2+ at 0.76, and Mg2+ at 0.80, while the correlation with K+ is moderately positive at 0.62.
Figure 4

Heatmap correlation matrix for the hydrochemical characteristics of the groundwater samples.


The pH parameter shows weak positive correlations with EC (0.19), HCO3 (0.32), Cl (0.27), Na+ (0.24), and K+ (0.29), and weak negative correlations with SO42− (−0.14), Ca2+ (−0.03), and Mg2+ (−0.03). For EC, there is a weak positive correlation with HCO3 (0.26), very strong positive correlations with Cl (0.96) and Na+ (0.93), and moderate positive correlations with Ca2+ (0.67), Mg2+ (0.60), and K+ (0.42). HCO3 shows weak positive correlations with Cl (0.25), SO42− (0.07), Ca2+ (0.26), Mg2+ (0.23), and K+ (0.19), but a moderate positive correlation with Na+ (0.49). Cl has a very strong positive correlation with Na+ (0.92) and moderate positive correlations with Ca2+ (0.65), Mg2+ (0.56), and K+ (0.47). SO42− shows a weak positive correlation with Na+ (0.33) and moderate positive correlations with Ca2+ (0.58), Mg2+ (0.49), and K+ (0.42). Na+ is moderately positively correlated with Ca2+ (0.65) and Mg2+ (0.67), and strongly positively correlated with K+ (0.72). For Ca2+, there are moderate positive correlations with Mg2+ (0.49) and K+ (0.47). Lastly, Mg2+ shows a moderate positive correlation with K+ (0.47). In summary, the heatmap indicates several strong correlations among the water quality parameters, particularly between TDS, EC, Cl, Na+, Ca2+, and Mg2+.

Figure 5 shows the scatter pair plots with trend lines, illustrating the relationships between the hydrochemical parameters used for groundwater assessment. Each plot in the matrix provides insights into the correlations and potential dependencies among these parameters. The scatter plot between TDS and pH shows a very weak positive correlation, suggesting that changes in TDS have a minimal impact on pH levels. The trend line is nearly flat, indicating that pH is relatively stable across different TDS values. In contrast, the relationship between TDS and EC is notably strong, evidenced by the tight clustering of points along an upward trend line. This strong positive correlation indicates that as TDS increases, EC also increases significantly, reflecting the fact that TDS represents the total concentration of dissolved ions contributing to electrical conductivity. Similarly, the plots reveal strong positive correlations between TDS and Na+, and between TDS and Cl, suggesting that increases in TDS are closely related to increases in these ions.
Figure 5

Scatter pair plot matrix for the hydrochemical characteristics of the groundwater samples.


Examining the scatter plot between TDS and HCO3, a weak positive correlation is observed. While there is a slight upward trend, the spread of points indicates that bicarbonate levels do not strongly depend on TDS. The relationship between TDS and SO42− is moderately positive, suggesting that sulfate levels tend to increase with TDS, although the correlation is not as strong as with EC or Cl. The strong positive correlation between TDS and Na+, as well as between TDS and Mg2+, further indicates that these ions significantly contribute to the total dissolved solids in the water. The correlation between TDS and Ca2+ is also moderate to strong, with an upward trend indicating that calcium levels tend to rise with increasing TDS.

The scatter plot between pH and EC shows a weak positive correlation, suggesting that EC slightly increases with pH, but the relationship is not strong. There is a moderate positive correlation between pH and HCO3, with higher bicarbonate levels associated with higher pH values, consistent with the role of bicarbonates in buffering pH. The scatter plot between pH and Cl shows a weak positive correlation, indicating that chloride levels have a minimal effect on pH. The relationship between pH and SO42− is weakly negative, suggesting that higher sulfate concentrations might be associated with lower pH values. There is a weak positive correlation between pH and Na+, with sodium levels slightly increasing with pH, while the relationships between pH and Ca2+, and between pH and Mg2+, are very weakly negative, indicating a minimal impact of calcium and magnesium on pH.

The scatter plot between EC and HCO3 shows a weak positive correlation, indicating that bicarbonate levels slightly increase with electrical conductivity. A very strong positive correlation is observed between EC and Cl, with the tight clustering of points along an upward trend line indicating that chloride levels are a major contributor to electrical conductivity. The relationship between EC and SO42− is moderately positive, suggesting that higher sulfate levels are associated with higher EC. There is a very strong positive correlation between EC and Na+, indicating that sodium significantly contributes to electrical conductivity. The scatter plot between EC and Ca2+ shows a strong positive correlation, with calcium levels increasing with electrical conductivity. Similarly, the relationship between EC and Mg2+ is strong, indicating that higher magnesium concentrations are associated with higher EC.

Irrigation suitability indices

Various groundwater indices are commonly employed to assess the suitability of groundwater for irrigation purposes. These indices include SAR, total hardness (TH), sodium percent (Na%), magnesium hazard percent (MH%), permeability index (PI), potential salinity (PS), and Kelly's ratio (KR). SAR is a critical indicator of the potential for sodium to accumulate in soil, which can adversely affect soil structure and crop yield. Lower SAR values are preferable for irrigation purposes. The TH index is important as it indicates the concentration of calcium and magnesium ions, which can affect soil structure and crop health. Na% index is also used to evaluate the sodium hazard in irrigation water. Higher Na% values indicate a greater risk of sodium accumulation in the soil. For MH%, high magnesium levels can lead to soil structure problems, affecting water infiltration and plant growth. The higher values of the PI index suggest better permeability, which is essential for maintaining soil health and ensuring efficient water infiltration and root growth. The KR index values above 1 indicate an excess of sodium, which can be detrimental to soil and plant health.

Equations (2)–(8) are utilized to calculate these indices, with all ions measured in meq/L. The choice of utilizing these indices is due to their wide use in global water quality assessment literature, facilitating comparison with a large body of existing research and enabling consistent and comparable results. These indices collectively provide a comprehensive evaluation of groundwater quality for agricultural purposes. For instance, the SAR assesses the sodium hazard to crops, while TH evaluates the potential for scaling in irrigation systems, ensuring a holistic assessment.
(2)
(3)
(4)
(5)
(6)
(7)
(8)

The formulas of these indices are well-established and have been validated through numerous studies, proving their reliability and accuracy (Gad et al. 2023; Hussein et al. 2024). This makes them trustworthy tools for water quality assessment. The chosen indices specifically address the suitability of groundwater for agricultural use, with parameters such as SAR and PI directly impacting soil permeability and crop health, critical factors in agricultural water management. Furthermore, the parameters required for these indices, like concentrations of Na+, Ca2+, Mg2+, Cl, and SO42−, are commonly measured in groundwater quality assessments, making it easier to apply these indices without needing additional or specialized testing.
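For orientation, the indices can be sketched in Python using the forms most commonly quoted in the irrigation water quality literature. These forms are an assumption and may differ in detail from the paper's Equations (2)–(8), for example in whether K+ enters Na% or how TH is expressed. The equivalent weights are standard chemistry values, and the helper names `to_meq` and `irrigation_indices` are hypothetical.

```python
import math

# Equivalent weights in mg per meq (molar mass / ionic charge)
EQ_WEIGHT = {"Na": 22.99, "K": 39.10, "Ca": 20.04, "Mg": 12.15,
             "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02}

def to_meq(mg_per_l):
    """Convert ion concentrations from mg/L to meq/L."""
    return {ion: mg_per_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}

def irrigation_indices(c):
    """Compute indices from concentrations in meq/L.

    ASSUMPTION: commonly cited literature formulas, not confirmed
    against the paper's own equations.
    """
    na, k, ca, mg = c["Na"], c["K"], c["Ca"], c["Mg"]
    return {
        "SAR": na / math.sqrt((ca + mg) / 2),
        "TH": (ca + mg) * 50,                      # mg/L as CaCO3
        "Na%": 100 * (na + k) / (ca + mg + na + k),
        "MH%": 100 * mg / (ca + mg),
        "PI": 100 * (na + math.sqrt(c["HCO3"])) / (ca + mg + na),
        "PS": c["Cl"] + 0.5 * c["SO4"],
        "KR": na / (ca + mg),
    }

# Mean concentrations from Table 1 (mg/L), used only to show the call pattern;
# indices of the mean sample are not the mean of the per-sample indices.
mean_sample = {"Na": 1235.76, "K": 26.26, "Ca": 330.48, "Mg": 174.43,
               "Cl": 2143.37, "SO4": 1051.52, "HCO3": 126.87}
for name, value in irrigation_indices(to_meq(mean_sample)).items():
    print(f"{name}: {value:.2f}")
```

Converting to meq/L first matters because the indices compare ions by charge equivalents, not by mass.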

Irrigation water quality index

It is well understood that the quality of irrigation water, along with factors such as soil type, crop varieties, climatic conditions, and irrigation techniques, significantly influences agricultural profitability and yield. An increase in water salinity adversely impacts both soil and crops. The mineral salts in the irrigation water can alter the soil structure, affecting its permeability and aeration, which in turn disrupts plant growth (Xiao et al. 2014). The IWQI is used to provide a comprehensive assessment of irrigation water quality, reflecting the combined effects of various water quality parameters on its overall condition (Meireles et al. 2010; Abbasnia et al. 2019). To calculate the irrigation water quality parameter (Qi), Equation (9) is utilized in this model.
$$ Q_i = Q_{i,\max} - \frac{\left(X_{ij} - X_{\inf}\right) Q_{i,\mathrm{amp}}}{X_{\mathrm{amp}}} \tag{9} $$
where $Q_{i,\max}$ denotes the maximum quality score for the ith water quality parameter, while $X_{ij}$ is the observed value of the jth sample for the ith water quality parameter. The lower threshold or infimum value for the ith water quality parameter is represented by $X_{\inf}$. $Q_{i,\mathrm{amp}}$ is the amplitude of the quality score for the ith water quality parameter, which is the difference between the maximum and minimum quality scores. Finally, $X_{\mathrm{amp}}$ is the amplitude of the observed values for the ith water quality parameter, calculated as the difference between the upper and lower thresholds. This equation locates the observed value of a water quality parameter within a defined range and scales it according to the maximum quality score and the amplitude of the quality scores.
Equation (10) can be used to compute the IWQI (Meireles et al. 2010). The IWQI is classified as ‘Unsuitable’ for ratings between 0 and 40. Scores from 40 to 55 are categorized as ‘Satisfying’ indicating a moderate purity level, which may be suitable for specific applications. A rating between 55 and 70 is considered ‘Good’, indicating a superior water quality that is appropriate for a broader range of uses. Scores from 70 to 85 are categorized as ‘Very Good’, suggesting that the water is suitable for nearly all general purposes. Finally, a score ranging from 85 to 100 is labeled ‘Excellent’, denoting optimal water quality for all intended uses, and represents the highest standard of water purity.
$$\mathrm{IWQI} = \sum_{i=1}^{n} Q_i W_i \qquad (10)$$
where Wi represents the relative weights assigned to EC, SAR, Na, Cl, and HCO3, which are 0.211, 0.189, 0.204, 0.194, and 0.202, respectively (Meireles et al. 2010). These weights total 1, ensuring that each parameter proportionally influences the overall IWQI score.
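
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the weights and class limits are the ones quoted in the text, while the function names, the treatment of class boundaries (lower bound inclusive), and the threshold values in the usage note are assumptions made only for the example.

```python
# Weights quoted in the text (Meireles et al. 2010).
WEIGHTS = {"EC": 0.211, "SAR": 0.189, "Na": 0.204, "Cl": 0.194, "HCO3": 0.202}

# Class limits quoted in the text; treating the lower limit as inclusive is an assumption.
CLASSES = [(85, "Excellent"), (70, "Very Good"), (55, "Good"),
           (40, "Satisfying"), (0, "Unsuitable")]


def quality_score(x_obs, x_inf, x_amp, q_max, q_amp):
    """Quality score of one parameter: q_max - ((x_obs - x_inf) * q_amp) / x_amp."""
    return q_max - ((x_obs - x_inf) * q_amp) / x_amp


def iwqi(scores, weights=WEIGHTS):
    """Weighted sum of the per-parameter quality scores."""
    return sum(scores[name] * weights[name] for name in weights)


def classify(value):
    """Map an IWQI value to its class label."""
    for lower, label in CLASSES:
        if value >= lower:
            return label
    raise ValueError("IWQI must be non-negative")
```

For instance, with the purely hypothetical inputs `quality_score(2.0, 1.0, 2.0, 85, 25)` the score is 72.5, and `classify(71.14)` (the mean IWQI reported for the samples) returns 'Very Good'.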

Description of ML models

This study utilized three ensemble models, RF, AdaBoost, and XGBoost, all implemented in Python within the Anaconda distribution. Ensemble techniques are known to improve the performance of predictive models, notably by reducing error rates and strengthening the correlation between predicted and actual values (Elshaarawy et al. 2024a). This improvement can be credited to the ensemble's ability to mitigate issues such as underfitting, overfitting, or a poor match between the model and the dataset. Each model is described in the following subsections.

Selecting the right hyperparameters is key to enhancing model performance (Luat et al. 2021). Techniques like grid search (GS), random search (RS), and BO are typically employed to adjust these parameters in ML models (Zhang et al. 2021). GS and RS may not always find the best solution, as they often involve high variance and can be time-intensive due to numerous trials. On the other hand, BO utilizes previous evaluations to search for optimal solutions more effectively (Nair et al. 2024), often requiring less time to identify the most suitable parameters compared with GS and RS. Consequently, this research adopts the BO method for determining the predictive models' ideal parameters. To ensure model robustness on new data and to prevent overfitting, cross-validation (CV) is also applied. Specifically, a 5-fold CV integrated with BO (BO + 5CV) is utilized for hyperparameter optimization of the predictive models.

Random forest

The RF model is an advanced method derived from bagging. It enhances decision tree training by randomly selecting attributes for each tree (Eltarabily et al. 2023). The decision to split on a feature is based on measuring impurity and computing information gain. RF builds multiple decision trees by drawing subsets from the dataset with replacement and randomly choosing features for node splitting, which reduces model variance and helps prevent overfitting. As Equation (11) shows, the RF prediction is the mean of the predictions from the individual decision trees.
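$$\hat{y} = \frac{1}{B} \sum_{b=1}^{B} T_b(x) \qquad (11)$$
where $\hat{y}$ represents the model's prediction, $B$ is the number of trees, and $T_b$ is the $b$th decision tree, each trained on its own bootstrapped sample.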
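
The bagging idea behind RF (bootstrap samples, one learner per sample, mean of the predictions) can be sketched as follows. This is a toy illustration under stated assumptions: `fit_stump` is a hypothetical one-split stand-in for a full decision tree, the data have a single feature, and the random feature selection that RF performs at each split is therefore omitted.

```python
import random
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrapped training set)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Hypothetical stand-in for a decision tree: one split at the median x.

    rows are (x, y) pairs; each side of the split predicts its mean y.
    """
    xs = sorted(x for x, _ in rows)
    threshold = xs[len(xs) // 2]
    left = [y for x, y in rows if x < threshold]
    right = [y for x, y in rows if x >= threshold]
    overall = mean(y for _, y in rows)
    left_value = mean(left) if left else overall
    right_value = mean(right) if right else overall
    return lambda x: left_value if x < threshold else right_value


def fit_forest(rows, n_trees=25, seed=0):
    """Train one stump per bootstrapped sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def forest_predict(trees, x):
    """Ensemble prediction: the mean of the individual tree predictions."""
    return mean(tree(x) for tree in trees)
```

Calling `fit_forest(rows)` on a list of `(x, y)` pairs and then `forest_predict(trees, x_new)` returns the averaged prediction; averaging over many bootstrapped learners is what reduces the variance of any single tree.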

Adaptive boosting

The AdaBoost model is an ensemble learning method applicable to both classification and regression tasks. For regression problems, AdaBoost constructs a robust model by combining multiple weak learners, each focusing more on data points that earlier learners predicted poorly. Through iterative training, AdaBoost fits these weak models on a weighted version of the dataset, gradually increasing the emphasis on poorly predicted data points. This iterative process continues until a preset number of models or a desired level of accuracy is reached. The final prediction is derived from the weighted mean of all weak models' predictions. Equation (12) gives the prediction function of AdaBoost:
$$\hat{y} = \sum_{i=1}^{N} \alpha_i h_i(x) \qquad (12)$$
where $\hat{y}$ is the final prediction from the AdaBoost, N is the total number of weak learners, $h_i(x)$ is the prediction of the weak learner $i$, and $\alpha_i$ is the weight associated with the weak learner $i$. The weight $\alpha_i$ is calculated based on the error of the weak learner $i$ using Equation (13):
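$$\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right) \qquad (13)$$
where $\varepsilon_i$ is the error of the weak learner $i$ and $\alpha_i$ is its weight. This process ensures that the weights of the weak learners are adjusted such that the ones that perform better have more influence on the final prediction.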
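
A short sketch of how learner weights and the weighted mean interact is given below. The weight formula used here, 0.5 ln((1 - e) / e), is the common AdaBoost choice and is an assumption for illustration, as are the function names; library implementations of AdaBoost regression may use a different weighting or a weighted median.

```python
import math


def learner_weight(error):
    """Weight of a weak learner from its error rate (assumes 0 < error < 1).

    Uses the common form 0.5 * ln((1 - error) / error), so a smaller error
    gives a larger weight and an error of 0.5 gives a weight of zero.
    """
    return 0.5 * math.log((1.0 - error) / error)


def ensemble_predict(predictions, weights):
    """Weighted mean of the weak learners' predictions."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total
```

With hypothetical predictions of 60 and 80 and weights of 1 and 3, `ensemble_predict([60, 80], [1, 3])` gives 75.0, pulled toward the more accurate learner.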

Extreme gradient boosting

The XGBoost model is an implementation of gradient-boosted decision trees optimized for fast execution and high efficiency (Chen & Guestrin 2016). It is a flexible algorithm known for its efficiency in handling sparse data and its ability to perform well on a wide range of classification and regression tasks. The XGBoost model has been used successfully in numerous ML competitions due to its scalability and its ability to produce highly competitive predictive models. For a dataset comprising N instances and m features, denoted as $D = \{(x_i, y_i)\}$ ($|D| = N$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), a tree ensemble model employs K additive functions to predict the output. Thus, the XGBoost model's basic principle can be described as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \qquad (14)$$
where $\hat{y}_i$ is the XGBoost model's prediction value, $f_k$ is the output from the $k$th regression tree, and $F$ is the space of the regression trees. $F$ is defined as follows:
$$F = \{ f(x) = w_{Q(x)} \}, \quad Q: \mathbb{R}^m \to T, \; w \in \mathbb{R}^T \qquad (15)$$
where $w$ represents a vector containing scores assigned to leaves, $Q$ denotes a function that maps each data point to its corresponding leaf, and $T$ represents the total number of leaves. The set $F$ encompasses all feasible functions generated by various combinations of $Q$ and $w$. The model operates by optimizing an objective function ($O$), derived from a loss function ($L$) and a regularization term ($\Omega$). This optimization aims to minimize the disparity between actual and predicted values, as described by Equations (15) and (16). To curtail model complexity and mitigate overfitting, a regularization term ($\Omega$) is introduced in Equation (17):
$$O = \sum_{i=1}^{N} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (16)$$
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \qquad (17)$$
where $\gamma$ is the complexity of the leaf, $\lambda$ is the penalty parameter, and $w$ is the vector score on the leaves.
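
To make the role of the regularization term concrete, the sketch below evaluates a regularized objective for a small ensemble. It follows the penalty form given by Chen & Guestrin (2016), in which each tree is charged for its number of leaves and for the squared size of its leaf scores; the squared-error loss, the function names, and the numbers in the usage note are assumptions chosen for illustration.

```python
def regularization(leaf_weights, gamma, lam):
    """Penalty for one tree: gamma * (number of leaves) + 0.5 * lam * sum(w^2)."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees_leaf_weights, gamma=1.0, lam=1.0):
    """Squared-error loss over the samples plus the penalty summed over all trees."""
    loss = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    penalty = sum(regularization(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty
```

For one hypothetical tree with leaf scores 1.0 and -2.0, `regularization([1.0, -2.0], gamma=0.5, lam=2.0)` is 6.0: a tree with more leaves or larger leaf scores must reduce the loss by more than its penalty to be worth adding.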

Evaluation criteria

Assessing the efficacy of each predictive model is essential, as the verification of a model's predictive accuracy is fundamental to ensuring its practicality and scientific credibility. The TR dataset used during the construction phase of a model merely indicates how well the model conforms to the dataset in question. To validate the predictive models, testing datasets were therefore utilized (Eltarabily et al. 2024). Model evaluations and their subsequent comparisons predominantly engage two approaches: visual and quantitative methods. Methods based on visualization include the use of scatter plots, violin boxplots, and Taylor diagrams (Selim et al. 2024).

Scatter plots are used to visualize the relationship between two variables (Elshaarawy et al. 2023). Violin boxplots combine elements of boxplots and kernel density plots to provide a more detailed representation of the data distribution. When comparing actual and predicted values, they can show the spread and density of errors, giving insights into the variance and bias of the model's predictions (Elshaarawy et al. 2024b). Taylor diagrams are a specialized graphical representation used to quantify the similarity between predicted and actual values. These diagrams plot the correlation, the standard deviation, and the root mean square error of predictions on a single chart. This provides a comprehensive view of a model's accuracy, variability, and overall performance compared with the actual observations (Elshaarawy & Hamed 2024). These visual tools allow an immediate comparison of models and show distributional features such as the maximum, minimum, median, and quartiles, which single-number metrics can overlook. However, visual tools alone do not give a complete numerical summary of model performance, so quantitative metrics are also needed. The study utilized five quantitative metrics, with their ideal values outlined in Table 2.

Table 2

Equations of performance metrics and their ideal values

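| Performance metric | Equation | Ideal value |
|---|---|---|
| Determination coefficient (R²) | $R^2 = 1 - \dfrac{\sum_{i=1}^{n} (o_i - p_i)^2}{\sum_{i=1}^{n} (o_i - \bar{o})^2}$, with $\bar{o} = \frac{1}{n}\sum_{i=1}^{n} o_i$ (18) | 1 |
| Root mean square error (RMSE) | $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} (o_i - p_i)^2}$ (19) | 0 |
| Mean absolute error (MAE) | $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n} \lvert o_i - p_i \rvert$ (20) | 0 |
| Mean absolute percentage error (MAPE) | $\mathrm{MAPE} = \dfrac{100}{n}\sum_{i=1}^{n} \left\lvert \dfrac{o_i - p_i}{o_i} \right\rvert$ (21) | 0 |
| Mean bias error (MBE) | $\mathrm{MBE} = \dfrac{1}{n}\sum_{i=1}^{n} (p_i - o_i)$ (22) | 0 |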

n is the number of data points; oi and pi are the actual and predicted ith values, respectively.
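
The five metrics can be computed with a few lines of plain Python, as sketched below. Two conventions are assumptions rather than facts taken from the study: R² is computed as one minus the ratio of residual to total sum of squares, and MBE is taken as predicted minus actual, so a positive value means over-prediction. MAPE is expressed in percent.

```python
import math


def r2(obs, pred):
    """Determination coefficient: 1 - SS_res / SS_tot (assumed convention)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def mbe(obs, pred):
    """Mean bias error as predicted minus actual (assumed sign convention)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)
```

All five functions take the actual values first and the predicted values second, for example `rmse(actual, predicted)`.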

SHAP feature importance analysis

To analyze the sensitivity and interpret ML models on both a wide-scale and a more detailed level (Eltarabily et al. 2023), researchers use the SHAP approach (Lundberg & Lee 2017), which draws on principles from cooperative game theory (Molnar 2018). The SHAP method was employed to gauge the comparative impact of input variables on the prediction process. As an advanced method within the realm of explainable artificial intelligence, SHAP helps to clarify the complex interactions between the input variables and the predictions of the model. SHAP offers critical insights by identifying which features are most influential on predictions and how they modify the predicted results (Vakharia et al. 2016).
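
SHAP attributions are Shapley values from cooperative game theory, and the idea can be illustrated without any library. The sketch below computes exact Shapley values for one sample by averaging each feature's marginal contribution over all feature orderings. It is a simplification under stated assumptions: "absent" features are set to a single baseline vector rather than averaged over a background dataset, the enumeration is only feasible for a handful of features, and `toy_model` with its coefficients is hypothetical and is not the study's model or the SHAP library's algorithm.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features are switched from their baseline value to their actual value
    one at a time, in every possible order; each feature's value is its
    average marginal contribution to the model output.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]
            value = model(current)
            totals[j] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(features):
    """Hypothetical IWQI-like function of (Na, Cl, pH); pH has no effect here."""
    na, cl, ph = features
    return 90.0 - 0.03 * na - 0.01 * cl - 0.00001 * na * cl
```

For example, `shapley_values(toy_model, [400.0, 300.0, 7.5], [200.0, 100.0, 7.0])` attributes the drop in the toy output to the first two features and nothing to the third, and the attributions add up to the difference between the prediction for the sample and the prediction for the baseline.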

Graphical user interface

To make the predictive model for estimating the IWQI accessible and user-friendly, a GUI is developed using the Tkinter library in Python (Elshaarawy et al. 2024a). Tkinter, a standard GUI library, offers a straightforward approach to creating interactive applications (Lundh 1999). The development process starts with setting up the Python environment with Tkinter, which is included in the standard Python distribution. The interface design focuses on simplicity and user guidance, incorporating input fields for users to enter the required features, a button to trigger predictions, and labels or instructions for ease of use. The core functionality of the GUI involves creating input fields with Entry widgets, a Button widget to initiate the prediction process, and a Label or Text widget to display the predicted IWQI. The trained predictive model is integrated into the GUI using a library such as pickle to load the model and make predictions based on user inputs. To ensure accessibility and collaboration, the GUI application is hosted on GitHub (https://github.com), which provides version control, collaboration features, and easy distribution.
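
The structure described above (entry fields, a button, a result label, and a pickled model) might be laid out as in the following sketch. It is not the authors' application: the function names, the window text, and the assumption that the loaded object exposes a scikit-learn style `predict` method are all illustrative, and depending on the model the inputs may need to be converted to a NumPy array first.

```python
import pickle


def parse_inputs(raw_values):
    """Convert the text typed into the entry fields to floats.

    Raises ValueError if any field is empty or not a number.
    """
    return [float(value.strip()) for value in raw_values]


def load_model(path):
    """Load a pickled regressor; the file path is supplied by the caller."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def build_gui(model, feature_names):
    """Build the window: one labelled Entry per feature, a button, a result label."""
    import tkinter as tk  # imported here so the helpers above work without a display

    root = tk.Tk()
    root.title("IWQI prediction")
    entries = []
    for row, name in enumerate(feature_names):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
        entry = tk.Entry(root)
        entry.grid(row=row, column=1)
        entries.append(entry)

    result = tk.Label(root, text="IWQI: -")
    result.grid(row=len(feature_names) + 1, column=0, columnspan=2)

    def on_calculate():
        try:
            features = parse_inputs(entry.get() for entry in entries)
            prediction = model.predict([features])[0]
            result.config(text=f"IWQI: {float(prediction):.2f}")
        except ValueError:
            result.config(text="Please enter a number in every field")

    tk.Button(root, text="Calculate", command=on_calculate).grid(
        row=len(feature_names), column=0, columnspan=2
    )
    return root
```

A script would then call something like `build_gui(load_model("model.pkl"), ["TDS", "pH", "EC"]).mainloop()`, where the file name and the shortened feature list are placeholders.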

Irrigation suitability indices

Descriptive statistics

Table 3 presents the descriptive statistics of the irrigation suitability indices, i.e., TH, PS, SAR, KR, Na%, MH%, PI, and IWQI. The indices vary considerably across the samples, indicating significant variability in the quality of groundwater used for irrigation. The TH values suggest that the water is generally within acceptable limits for irrigation, although some samples show higher hardness levels. The PI values are favorable, indicating good soil permeability overall. However, the SAR and KR highlight potential risks of sodium accumulation, which could adversely affect soil structure and crop health. The Na% values further corroborate the sodium hazard, emphasizing the need for careful management to prevent soil degradation. The MH% shows a wide range, indicating that some water samples may pose risks to soil structure due to high magnesium levels. The IWQI provides a holistic view, with most values indicating suitable water quality for irrigation, though a few samples fall short.

Table 3

Descriptive statistics of irrigation suitability indices

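| Parameter | Unit | Minimum | Maximum | Mean | Std. error |
|---|---|---|---|---|---|
| TH | mg/L | 15.71 | 63.71 | 30.84 | 1.65 |
| PS | – | 41.00 | 176.62 | 71.40 | 4.32 |
| SAR | (meq/L)^0.5 | 5.51 | 30.65 | 13.76 | 0.80 |
| KR | – | 0.71 | 4.50 | 1.82 | 0.12 |
| Na% | – | 42.04 | 81.99 | 62.48 | 1.28 |
| MH% | – | 23.27 | 76.17 | 45.47 | 1.41 |
| PI | – | 44.16 | 84.30 | 63.99 | 1.27 |
| IWQI | – | 34.75 | 84.73 | 71.14 | 1.71 |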

Correlation analysis

Figure 6 shows a heatmap depicting the correlation matrix of hydrochemical parameters and irrigation suitability indices. The color scale ranges from blue to red, where blue indicates a strong positive correlation, and red signifies a strong negative correlation. IWQI has strong negative correlations with KR (−0.591), PS (−0.926), PI (−0.595), Na% (−0.625), and SAR (−0.878), suggesting that higher values of these parameters generally reduce the overall IWQI. KR positively correlates with PS (0.392), PI (0.96), and Na% (0.955), indicating that increases in KR are associated with higher values of these indices, but it negatively correlates with MH% (−0.265). PS is positively correlated with PI (0.388) and Na% (0.435), but negatively correlated with IWQI (−0.926). PI shows a strong positive correlation with Na% (0.998) and a negative correlation with IWQI (−0.595). MH% has positive correlations with TH (0.317) and negative correlations with PI (−0.418) and Na% (−0.406). Na% positively influences PI (0.998) and negatively affects IWQI (−0.625) and MH% (−0.406). TH is positively correlated with Mg2+ (0.932), Ca2+ (0.894), and SAR (0.164). SAR positively correlates with KR (0.888), PS (0.741), and Na% (0.901), and negatively with IWQI (−0.878).
Figure 6

Heatmap of the relationship between the hydrochemical parameters and irrigation suitability indices.


Classification of irrigation water quality

The classification of irrigation water quality through various indices such as SAR, TH, Na%, MH%, PI, PS, KR, and IWQI is essential for assessing its suitability for agricultural use. Each index provides insights based on a different water characteristic, such as sodium hazard, hardness, sodium percentage, magnesium hazard, permeability, potential salinity, and overall water quality. Table 4 shows the classification of each irrigation index. SAR is a critical indicator for assessing the sodium hazard of irrigation water. SAR values greater than 26 (meq/L)^0.5 classify the water as ‘Unsuitable’ for irrigation, which applies to only 2% of the samples. SAR values ranging from 18 to 26 are considered ‘Doubtful’ and comprise 24% of the samples, indicating potential risks to soil health. A SAR between 10 and 18, covering 43% of the samples, is deemed ‘Good’, suitable for irrigation under certain conditions. The best category, ‘Excellent’, includes samples with SAR below 10, accounting for 30% of the samples, indicating ideal conditions for irrigation without any risk of sodium accumulation. TH is measured to assess the concentration of calcium and magnesium (Rawat et al. 2018). All samples in this dataset (100%) have a TH of less than 75 mg/L, classifying the water as ‘Soft’. This level of hardness is typically considered optimal for irrigation as it prevents scaling and buildup in irrigation systems.

Table 4

Irrigation indices classification

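| Irrigation index | Classification | Type | No. of samples | Percentage |
|---|---|---|---|---|
| SAR | SAR > 26 | Unsuitable | 1 | 2 |
| | 18 < SAR < 26 | Doubtful | 11 | 24 |
| | 10 < SAR < 18 | Good | 20 | 43 |
| | SAR < 10 | Excellent | 14 | 30 |
| TH | <75 | Soft | 46 | 100 |
| | 75–150 | Moderately Hard | 0 | 0 |
| | 150–300 | Hard | 0 | 0 |
| | >300 | Very Hard | 0 | 0 |
| Na% | 80–100 | Unsuitable | 4 | 9 |
| | 60–80 | Doubtful | 19 | 41 |
| | 40–60 | Permissible | 23 | 50 |
| | 20–40 | Good | 0 | 0 |
| | <20 | Excellent | 0 | 0 |
| MH% | >50% | Unsuitable | 15 | 33 |
| | <50% | Suitable | 31 | 67 |
| PI | <25% | Unsuitable | 0 | 0 |
| | 25–75% | Suitable | 5 | 11 |
| | >75% | Good | 41 | 89 |
| PS | >10 | Injurious to Unsatisfactory | 7 | 15 |
| | 5–10 | Good to Injurious | 27 | 59 |
| | <5 | Excellent to Good | 12 | 26 |
| KR | <1 | Unsuitable | 1 | 2 |
| | >1 | Suitable | 45 | 98 |
| IWQI | 0–40 | Unsuitable | 1 | 2 |
| | 40–55 | Satisfying | 4 | 9 |
| | 55–70 | Good | 11 | 24 |
| | 70–85 | Very Good | 30 | 65 |
| | 85–100 | Excellent | 0 | 0 |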

Na% is another sodium hazard indicator. Samples with a Na% of 80–100% are classified as ‘Unsuitable’, making up 9% of the samples. Those with a Na% of 60–80% are ‘Doubtful’ (41%), while a Na% between 40 and 60% is ‘Permissible’ for irrigation, representing 50% of the samples. A greater sodium concentration (Na > 60%) may cause deterioration of the physical properties of soil (Aravinthasamy et al. 2021). MH% values above 50% render the water ‘Unsuitable’ for irrigation due to potential detrimental effects on the soil structure and plant absorption (Richards 1954), which applies to 33% of the samples. Conversely, samples with an MH% of less than 50% are ‘Suitable’ for irrigation, comprising 67% of the samples. The PI assists in evaluating the effect of water on soil permeability (Wong et al. 2020). A total of 89% of the samples are in the ‘Good’ category with a PI greater than 75%, suggesting excellent soil permeability. The remaining 11% fall into the ‘Suitable’ category with PI values between 25 and 75%, indicating moderate permeability. PS assesses the risk of soil salinity from irrigation water. Samples with PS values below 5, which account for 26% of the samples, are classified as ‘Excellent to Good’ for irrigation. Those between 5 and 10 (59%) are ‘Good to Injurious’, and samples with a PS greater than 10 (15%) are deemed ‘Injurious to Unsatisfactory’.

KR below 1, affecting only 2% of the samples, is considered ‘Unsuitable’ due to high sodium content that could be harmful. In contrast, a KR greater than 1 is ‘Suitable’, covering 98% of the samples, indicating a balanced sodium level (Kelly 1940). IWQI values provide a comprehensive measure of water quality for irrigation. IWQI ranging from 70 to 85 (65% of samples) indicates ‘Very Good’ quality, 55 to 70 (24%) as ‘Good’, and 40 to 55 (9%) as ‘Satisfying’. Only 2% of the samples are categorized as ‘Unsuitable’ with an IWQI below 40. Overall, the groundwater quality in the study area is generally suitable for irrigation, with most of the indices showing favorable results for agricultural use. However, attention should be given to managing the sodium and magnesium levels in certain groundwater samples to prevent potential adverse effects on soil and crop health. By implementing appropriate water management strategies, such as the use of soil amendments and choosing salt-tolerant crop varieties where needed, the negative impacts can be minimized, thereby optimizing the irrigation potential of the groundwater.
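
The threshold-based classification used for the indices can be expressed as a small lookup, sketched here for SAR and Na% with the class limits from Table 4. The function names are illustrative, and assigning a value that falls exactly on a limit to the higher class is an assumption, since the table does not state how boundary values are treated.

```python
from collections import Counter

# (upper limit, label) pairs in ascending order; values above the last limit
# fall into the final class.
SAR_CLASSES = [(10, "Excellent"), (18, "Good"), (26, "Doubtful")]
NA_PERCENT_CLASSES = [(20, "Excellent"), (40, "Good"),
                      (60, "Permissible"), (80, "Doubtful")]


def classify(value, upper_limits, last_label="Unsuitable"):
    """Return the label of the first class whose upper limit exceeds the value."""
    for upper, label in upper_limits:
        if value < upper:
            return label
    return last_label


def class_shares(values, upper_limits):
    """Percentage of samples in each class, rounded to whole percent."""
    counts = Counter(classify(v, upper_limits) for v in values)
    return {label: round(100 * n / len(values)) for label, n in counts.items()}
```

For example, `classify(13.76, SAR_CLASSES)` (the mean SAR in Table 3) returns 'Good', and `class_shares` applied to a list of SAR values gives the kind of percentage breakdown reported in Table 4.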

Results of ML modeling

The results of the employed ML models are outlined and detailed in the following subsections. Notably, input features X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 correspond to TDS, pH, EC, HCO3−, Cl, SO42−, Na+, Ca2+, Mg2+, and K+, respectively, while Y corresponds to IWQI. Table 5 shows the optimal main hyperparameters of the adopted models based on the BO method.

Table 5

Optimal main hyperparameters of the adopted models based on BO

| Model | Hyperparameter | Range | Optimal value |
|---|---|---|---|
| RF | n_estimators | 10–1,000 | 619 |
| | max_depth | 3–30 | 30 |
| | min_samples_split | 2–20 | |
| | min_samples_leaf | 1–20 | |
| AdaBoost | n_estimators | 10–1,000 | 316 |
| | learning_rate | 0.001–1,000 | 2.73 |
| | loss | linear, square, exponential | linear |
| XGBoost | n_estimators | 10–1,000 | 742 |
| | max_depth | 3–5 | |
| | learning_rate | 0.01–0.3 | 0.25 |
| | colsample_bytree | 0.4–1 | 0.74 |
| | subsample | 0.4–1 | 0.75 |

Cross-validation analysis

Figure 7 compares the adopted models using average BO + 5CV scores across five folds. The RF model has an average R² of 0.934, RMSE of 2.614, and MAE of 1.581. The AdaBoost model records a slightly higher average RMSE of 2.666 and a slightly lower average R² of 0.933 compared with RF, but it has a better average MAE of 1.461. This indicates that, on average, the AdaBoost model's predictions are closer to the actual values, despite a slightly poorer fit in terms of RMSE and R². The XGBoost model outperforms both RF and AdaBoost in all three metrics. It has the lowest average RMSE of 1.856, the highest average R² of 0.959, and the lowest average MAE of 0.733, making it the most accurate of the three models.
Figure 7

Performance of models based on the BO + 5CV process.


Quantitative evaluation

Table 6 illustrates the performance of the models before and after hyperparameter tuning, using the performance metrics for both the TR and TS stages. Initially, all models show modest to good performance. After tuning, performance improves on most metrics for all models, most markedly for XGBoost, indicating the effectiveness of hyperparameter optimization. Notably, the XGBoost model achieves the highest R² in both stages, 1.000 in TR and 0.872 in TS, showcasing the substantial impact of tuning on model accuracy and prediction error.

Table 6

Performance of models in the TR and TS stages before and after hyperparameter tuning

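| Model | Metric | Initial TR | Tuned TR | Initial TS | Tuned TS |
|---|---|---|---|---|---|
| RF | R² | 0.966 | 0.984 | 0.801 | 0.804 |
| | RMSE | 1.791 | 1.242 | 6.973 | 6.925 |
| | MAE | 1.319 | 0.804 | 4.351 | 4.355 |
| | MAPE | 2.081 | 1.291 | 9.331 | 9.333 |
| | MBE | 0.326 | 0.240 | 2.557 | 2.469 |
| AdaBoost | R² | 0.983 | 0.991 | 0.780 | 0.810 |
| | RMSE | 1.275 | 0.947 | 7.337 | 6.814 |
| | MAE | 0.900 | 0.620 | 4.448 | 4.396 |
| | MAPE | 1.301 | 0.801 | 9.766 | 9.036 |
| | MBE | −0.001 | −0.464 | 2.108 | 1.513 |
| XGBoost | R² | 0.806 | 1.000 | 0.570 | 0.872 |
| | RMSE | 4.313 | 0.000439 | 10.256 | 5.602 |
| | MAE | 3.279 | 0.000337 | 7.409 | 3.239 |
| | MAPE | 5.064 | 0.000483 | 14.959 | 7.014 |
| | MBE | 0.111 | −0.000001 | 3.335 | 2.156 |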

For the RF model, R² improves after tuning in both TR and TS, indicating a better model fit to the data, although the TS gain is small (0.801 to 0.804). The RMSE also decreases in both sets post-tuning, suggesting that the model's predictions are closer to the actual values. The MAE and MAPE both decrease in the TR post-tuning, indicating more precise predictions, although changes in the TS are minimal. The MBE also shows a slight reduction post-tuning, indicating a reduction in systematic prediction bias. For the AdaBoost model, the R² shows a notable improvement in the TS post-tuning (0.780 to 0.810), indicating an increase in the percentage of explained variance. Similarly, RMSE decreases in both TR and TS post-tuning, pointing to enhanced model accuracy. MAE and MAPE both improve post-tuning in TR and TS. The MBE decreases in the TS (2.108 to 1.513), indicating reduced bias on unseen data, although in the TR it moves away from zero (−0.001 to −0.464).

The XGBoost model displays a dramatic improvement in R² post-tuning, especially in the TS stage (0.570 to 0.872). RMSE decreases significantly post-tuning, particularly in the TS, which indicates much better prediction accuracy. MAE and MAPE are drastically reduced in both TR and TS, indicating far more accurate predictions. Additionally, the bias is reduced in the TS post-tuning. Overall, hyperparameter tuning improves most metrics for all models, with the effects particularly outstanding in the XGBoost model, where tuning results in nearly perfect outcomes on the TR and substantially improved accuracy on the TS. This underscores the importance of model-specific hyperparameter optimization in enhancing predictive performance and reliability across different datasets.

A rank analysis is performed to assess the overall performance of the adopted models. Each model is ranked on every metric, and its overall score is the sum of these individual ranks. The model with the lowest total is considered the most effective, whereas the model with the highest total is considered the least effective. Table 7 presents the results of the rank analysis. The XGBoost model ranks first among the three models with an overall score of 12, followed by AdaBoost with 23 and RF with 27.

Table 7

Rank analysis of the adopted ML models

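| Model | Stage | R² | RMSE | MAE | MAPE | MBE | Total score | Overall rank |
|---|---|---|---|---|---|---|---|---|
| RF | TR | | | | | | 13 | 27 |
| | TS | | | | | | 14 | |
| AdaBoost | TR | | | | | | 13 | 23 |
| | TS | | | | | | 10 | |
| XGBoost | TR | | | | | | | 12 |
| | TS | | | | | | | |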
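
The rank-sum procedure can be sketched as follows, using the tuned TS-stage values from Table 6. The ranking rule is an assumption: rank 1 is the best model on a metric, R² is ranked from highest to lowest, the error metrics from lowest to highest, MBE by its absolute value, and ties are not handled. Under that rule, the TS totals for RF and AdaBoost come out as 14 and 10, which agrees with the TS totals shown for those models in Table 7.

```python
# Tuned TS-stage metrics copied from Table 6.
TS_METRICS = {
    "RF": {"R2": 0.804, "RMSE": 6.925, "MAE": 4.355, "MAPE": 9.333, "MBE": 2.469},
    "AdaBoost": {"R2": 0.810, "RMSE": 6.814, "MAE": 4.396, "MAPE": 9.036, "MBE": 1.513},
    "XGBoost": {"R2": 0.872, "RMSE": 5.602, "MAE": 3.239, "MAPE": 7.014, "MBE": 2.156},
}


def rank_totals(metrics):
    """Sum of per-metric ranks for each model (lower total = better model)."""
    models = list(metrics)
    totals = {model: 0 for model in models}
    metric_names = list(next(iter(metrics.values())))
    for name in metric_names:
        if name == "R2":
            # Higher R2 is better, so sort in descending order.
            ordered = sorted(models, key=lambda m: metrics[m][name], reverse=True)
        else:
            # Errors and bias are better when closer to zero.
            ordered = sorted(models, key=lambda m: abs(metrics[m][name]))
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    return totals
```

Calling `rank_totals(TS_METRICS)` returns one total per model, with the smallest total identifying the best-performing model on the TS stage.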

Visual evaluation

Figure 8 shows the scatter plots comparing actual values to the predicted values generated by the adopted models. In all three plots, data points are clustered around the equality line (where the predicted value equals the actual value), with dashed lines indicating an error margin of ±10%. Most points fall within this error margin, signifying that the predictions are within 10% of the actual values, which is typically considered very good in many applications. Figure 8(a) shows that the RF model fits the TR data closely and the TS data well, with R² values of 0.984 and 0.804 for the TR and TS stages, respectively. The RMSE and MAE are low in the TR stage and higher in the TS stage. Like the RF model, the AdaBoost model demonstrates good performance with high R² values and low error metrics (Figure 8(b)). Its TR stage shows a better fit than its TS stage, which is common as models tend to perform better on the data they were trained on. Figure 8(c) shows that the XGBoost model has the highest R² and the lowest RMSE and MAE values in both the TR and TS stages. Thus, the XGBoost model outperformed the other models and generalizes well to unseen data.
Figure 8

Scatter plots between actual and predicted values in the TR and TS stages based on (a) RF, (b) AdaBoost, and (c) XGBoost models.

Figure 9(a) shows violin boxplots comparing the distribution of the values predicted by each adopted model with the distribution of the actual values. The violin plots reveal that XGBoost captures the variability and central tendency of the TS dataset most effectively among the three models. It balances accuracy in the median prediction with a faithful representation of data variability, making it potentially the most reliable model for predicting unseen data. This supports the selection of XGBoost as the best-performing model, corroborating the performance metrics discussed previously. Figure 9(b) shows the Taylor diagram, which provides a statistical summary of how well the predicted values match the observed values. XGBoost lies closest to the point representing the TS dataset, indicating that its predictions are consistent with the actual variability of the data and maintain a strong linear relationship with the actual values.
Figure 9

Comparisons between actual and predicted in the TS stage: (a) Violin boxplot and (b) Taylor diagram.


SHAP feature importance

The SHAP feature importance analysis was applied based on the best predictive model (XGBoost). Figure 10(a) shows the SHAP summary plot, highlighting the impact of each feature on the model's output. The features are labeled as X1 to X10, corresponding to hydrochemical parameters: X1 (TDS), X2 (pH), X3 (EC), X4 (HCO3−), X5 (Cl), X6 (SO42−), X7 (Na+), X8 (Ca2+), X9 (Mg2+), and X10 (K+). In the SHAP summary plot, the horizontal axis represents the SHAP value, which indicates the impact of a feature on the model's prediction. A SHAP value above zero suggests a positive impact on the model output (IWQI), whereas a value below zero indicates a negative impact. The color gradient from blue to red reflects the feature value, with blue representing low values and red representing high values.
Figure 10

SHAP feature importance (a) summary and (b) bar plots based on the XGBoost model.


For X7 (Na+), a wide range of SHAP values is observed, predominantly on the negative side, indicating that high sodium levels generally decrease the IWQI. The red points on the left side of the plot suggest that higher sodium values lead to more negative impacts. X5 (Cl) also shows significant influence, with points scattered on both sides of the zero line. This indicates that chloride levels can have both positive and negative effects on the IWQI, depending on the specific value of Cl. X1 (TDS) demonstrates a clear trend where higher values (red points) tend to decrease the IWQI, as evidenced by the clustering of red points on the left side of the plot. This suggests that higher TDS generally have a negative impact on the prediction. X3 (EC) and X4 (HCO3−) display a mix of positive and negative impacts. For X3 (EC), there are instances where high values contribute positively to the IWQI, while in other cases, they have a negative effect. X4 (HCO3−) shows a similar pattern, indicating that its impact on the model is context-dependent.

X2 (pH) has a more concentrated impact around the zero line, suggesting a relatively neutral effect on the IWQI across different pH levels. However, there are still some instances of positive and negative impacts, indicating variability in its influence. X10 (K+) and X6 (SO42−) show a minimal impact on the IWQI, as evidenced by the clustering of points around the zero line. This indicates that variations in potassium and sulfate levels do not significantly affect the predictions. X8 (Ca2+) and X9 (Mg2+) also exhibit low SHAP values clustered around zero, suggesting a minor influence on the model. However, there are a few instances where high values of these features slightly impact the IWQI.

Figure 10(b) presents a bar chart of mean absolute SHAP values for the 10 features, quantifying their average impact on the IWQI. The bar plot shows that X7 (Na+) has the highest mean SHAP value, indicating that it is the most influential feature in the model. High sodium levels generally decrease the IWQI significantly, as previously observed in the detailed SHAP summary plot. X5 (Cl) is the second most influential feature, suggesting that chloride levels have a considerable impact on IWQI. Cl levels can have both positive and negative effects, and the high mean SHAP value emphasizes its overall importance. X1 (TDS) has a moderate mean SHAP value, indicating that TDS have a noticeable impact. The trend observed earlier showed that higher values of TDS generally decrease the IWQI, which is consistent with its moderate influence.

X3 (EC) and X4 (HCO3−) have similar mean SHAP values, showing that EC and HCO3− both have a moderate but meaningful influence on the model. Their impacts are context-dependent, displaying both positive and negative contributions depending on specific conditions. X2 (pH) has a lower mean SHAP value compared with the previously mentioned features, suggesting that pH levels have a relatively neutral effect on the IWQI, with some variability. This aligns with the observation of its concentrated impact around the zero line. X10 (K+), X6 (SO42−), X9 (Mg2+), and X8 (Ca2+) have the lowest mean SHAP values, indicating that K+, SO42−, Mg2+, and Ca2+ have a minimal impact on the model's predictions. This corresponds with the earlier observation that variations in these features do not significantly affect the IWQI.

In summary, the SHAP summary plot and the bar plot together provide a comprehensive view of each feature's contribution to the IWQI. Features such as sodium (X7), chloride (X5), and TDS (X1) show substantial positive or negative impacts on the IWQI. In contrast, features like potassium (X10), sulfate (X6), magnesium (X9), and calcium (X8) exhibit minimal influence on the predictions. Consistent with the findings of Fallatah & Khattab (2023), this study identified high sodium concentrations as a critical factor in groundwater quality. Sodium concentrations in the collected groundwater samples ranged from 36 to 1,527 mg/L, with an average of 413 mg/L, which exceeds the permissible limit of 200 mg/L set by WHO (2017). This high sodium content can negatively impact irrigation and crop growth due to osmotic stress and other issues (Thirumoorthy et al. 2024). Therefore, the assessment of sodium concentrations is crucial for evaluating groundwater quality for irrigation purposes.

Interactive GUI

Finally, this section addresses the practical needs of engineers who want to apply ML models efficiently. Database assembly, model training, and validation are complex steps that hinder the adoption of ML in everyday design tasks. To lower this barrier, a Python application was developed that wraps the model, with its optimized hyperparameters, in an intuitive GUI. The GUI is designed specifically for predicting the IWQI, as illustrated in Figure 11, and displays the IWQI value when the user clicks the Calculate button. It is based on XGBoost (the best predictive model) and was built with the Tkinter package (Pant & Ramana 2022). The GUI code is freely available at https://github.com/mkamel24/IWQI.
Figure 11

GUI model for predicting IWQI.

On the right side of the GUI, input fields are provided where users can enter the value of each parameter. In the example shown, the entered values are TDS = 3,373 mg/L, pH = 7.00, EC = 5.00 dS/m, HCO3 = 1.36 meq/L, Cl = 36.95 meq/L, SO4 = 13.49 meq/L, Na = 27.01 meq/L, Ca = 8.73 meq/L, Mg = 15.72 meq/L, and K = 0.38 meq/L. After entering the values, the user clicks the ‘Calculate’ button, and the predicted IWQI is displayed at the bottom of the GUI. In this example, the IWQI is 83.44, so the water quality is categorized as ‘Very Good’ according to Meireles et al. (2010), which suggests that the water is suitable for irrigation. The developed GUI can therefore be used by local farmers to monitor groundwater quality and make informed decisions about irrigation practices, thereby improving crop yield and sustainability.
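The input, Calculate, and output flow described above can be sketched with Tkinter as follows. This is a simplified illustration under stated assumptions, not the code from the authors' repository: `parse_inputs` and `build_gui` are hypothetical names, the window layout is a plain two-column grid rather than the layout in Figure 11, and the trained model is represented by a `predict` callable that the reader would have to supply.

```python
FEATURES = ["TDS", "pH", "EC", "HCO3", "Cl", "SO4", "Na", "Ca", "Mg", "K"]


def parse_inputs(raw):
    """Convert entry-box strings to floats in model feature order.

    Accepts thousands separators such as "3,373".
    """
    values = []
    for name in FEATURES:
        text = raw[name].strip().replace(",", "")
        values.append(float(text))  # raises ValueError on bad input
    return values


def build_gui(predict):
    """Build a Tkinter window; `predict` maps a 10-value list to an IWQI."""
    import tkinter as tk  # imported lazily so parse_inputs works headless

    root = tk.Tk()
    root.title("IWQI predictor (sketch)")
    entries = {}
    for row, name in enumerate(FEATURES):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
        entries[name] = tk.Entry(root)
        entries[name].grid(row=row, column=1)
    result = tk.StringVar(value="IWQI: -")

    def on_calculate():
        try:
            x = parse_inputs({n: e.get() for n, e in entries.items()})
            result.set(f"IWQI: {predict(x):.2f}")
        except ValueError:
            result.set("Please enter a number in every field.")

    tk.Button(root, text="Calculate", command=on_calculate).grid(
        row=len(FEATURES), column=0, columnspan=2)
    tk.Label(root, textvariable=result).grid(
        row=len(FEATURES) + 1, column=0, columnspan=2)
    return root


# The example inputs quoted in the text, as they would be typed in.
example = {"TDS": "3,373", "pH": "7.00", "EC": "5.00", "HCO3": "1.36",
           "Cl": "36.95", "SO4": "13.49", "Na": "27.01", "Ca": "8.73",
           "Mg": "15.72", "K": "0.38"}
features = parse_inputs(example)

# To launch the window with a trained model (not included here):
# build_gui(lambda x: model.predict([x])[0]).mainloop()
```

Keeping the parsing step separate from the window code means the input handling can be checked without a display, and the same function can feed any model that accepts the ten values in this order.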

This research has provided vital insights into groundwater salinity and hydrochemical properties, which are critical for assessing the suitability of the El Moghra aquifer for irrigation. The ML models (RF, AdaBoost, and XGBoost), optimized with BO, proved effective in predicting the IWQI. This approach not only enhances our understanding of the aquifer's condition but also showcases the capabilities of predictive models in environmental science. The key findings are as follows:

  • The good agreement between the predicted and actual values, as reflected in the R² values, provides substantial evidence in favor of using ML models to predict the IWQI.

  • The optimization of hyperparameters notably enhanced the models' performance. For instance, the fine-tuned XGBoost model achieved nearly perfect R² values of 0.570 and 0.872 in the TS stage, respectively. This highlights the critical role of hyperparameter tuning in achieving optimal model performance.

  • The XGBoost model consistently outperformed RF and AdaBoost models across both the TR and TS stages. This superior performance was further confirmed through the BO + 5CV process, making XGBoost the most accurate and reliable model.

  • The rank analysis further quantified the overall performance of the models, with XGBoost achieving the best rank due to its high-performance metrics.

  • Based on the SHAP feature importance analysis, sodium (Na⁺) concentration was identified as the most significant factor affecting groundwater quality.

  • The developed user-friendly GUI has practical applications that would enable local farmers and water managers to utilize the research findings effectively.
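The two agreement metrics quoted in the findings, RMSE and R², can be computed from paired actual and predicted values as in the sketch below. It assumes the common definition of R² as one minus the ratio of the residual sum of squares to the total sum of squares; the paper may use a different variant, such as the squared correlation coefficient. The IWQI values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical IWQI values (illustrative only, not data from the study).
actual = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 83.0, 89.0]

print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"R2   = {r2_score(actual, predicted):.3f}")
```

RMSE is expressed in the same units as the IWQI, so it reads directly as a typical prediction error, while R² is unitless and summarizes how much of the variation in the actual values the model reproduces.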

Despite these advancements, the study recognizes several limitations. The reliance on a selected set of ML models may not comprehensively capture all hydrochemical dynamics due to model-specific biases or assumptions. Additionally, the focus on a specific geographic region may limit the generalizability of the findings to other arid environments with different hydrogeological characteristics. For future research directions, it is recommended to incorporate a wider range of predictive modeling techniques and to extend the geographic scope of the studies to validate and possibly enhance the applicability of the findings. To mitigate groundwater salinity, it is recommended that irrigation practices in the northern region be adjusted, including the use of alternative water sources or the implementation of salt-tolerant crops.

Moreover, it is crucial to implement continuous monitoring and adaptive management strategies to respond effectively to any changes in groundwater quality, thereby ensuring the sustainability of agricultural productivity in the El Moghra region and similar arid areas. These strategies will be vital in adapting to potential environmental changes and in making informed decisions for water resource management. Furthermore, tools such as the developed GUI would help end-users make informed decisions about water use without needing in-depth knowledge of ML. Policymakers should consider integrating the insights from this study into regulations and guidelines for sustainable water resource management, particularly in arid regions prone to water scarcity. Optimizing water use efficiency and prioritizing water quality monitoring should be central strategies in sustaining agricultural productivity. Additionally, offering training programs for local water managers and technical staff on interpreting and using ML predictions could enhance the operational utility of these advanced tools in routine decision-making processes.

The authors thank the editor and the anonymous reviewers for their valuable comments.

This research received no external funding.

All authors have equal contributions toward the completion of the research work.

All relevant data are included in the paper or its Supplementary Information.

The authors declare no conflict of interest.

Abbas Z., Mapoma H. W. T., Su C., Aziz S. Z., Ma Y. & Abbas N. 2018 Spatial analysis of groundwater suitability for drinking and irrigation in Lahore, Pakistan. Environmental Monitoring and Assessment 190, 391. doi:10.1007/s10661-018-6775-3.
Abbasnia A., Yousefi N., Mahvi A. H., Nabizadeh R., Radfard M., Yousefi M. & Alimohammadi M. 2019 Evaluation of groundwater quality using water quality index and its suitability for assessing water for drinking and irrigation purposes: Case study of Sistan and Baluchistan province (Iran). Human and Ecological Risk Assessment: An International Journal 25 (4), 988–1005.
Abdel Mogith S. M., Ibrahim S. M. M. & Hafiez R. A. 2012 Groundwater potentials and characteristics of El-Moghra aquifer in the vicinity of Qattara Depression. Egyptian Journal of Desert Research 63. doi:10.21608/EJDR.2013.5821.
Aravinthasamy P., Karunanidhi D., Subramani T. & Roy P. D. 2021 Demarcation of groundwater quality domains using GIS for best agricultural practices in the drought-prone Shanmuganadhi River basin of South India. Environmental Science and Pollution Research 28 (15), 18423–18435.
Batarseh M., Imreizeeq E., Tilev S., Al Alaween M., Suleiman W., Al Remeithi A. M., Al Tamimi M. K. & Al Alawneh M. 2021 Assessment of groundwater quality for irrigation in the arid regions using irrigation water quality index (IWQI) and GIS-Zoning maps: Case study from Abu Dhabi Emirate, UAE. Groundwater for Sustainable Development 14, 17.
Belal A. B., Mohamed E. S., Saleh A., Jalhoum M. E., Hendawy A. E., Abdou M. & Sayed M. E. 2018 Optimum cropping pattern assessment of El-Moghra area, Egypt using remote sensing and GIS techniques. In: 13th International Conference of Egyptian Soil Science Society (ESSS) ‘Management of Water and Soil Resources Under Global Climate Changes’, Cairo, Egypt.
Belkhiri L. & Mouni L. 2012 Hydrochemical analysis and evaluation of groundwater quality in El Eulma area, Algeria. Applied Water Science 2, 127–133. doi:10.1007/s13201-012-0033-6.
Brown R. M., McClelland N. I., Deininger R. A. & O'Connor M. F. 1972 A water quality index – Crashing the psychological barrier. In: Indicators of Environmental Quality: Proceedings of a Symposium Held During the AAAS Meeting in Philadelphia, Pennsylvania, December 26–31, 1971. Springer, New York, NY, USA, pp. 173–182.
Busico G., Kazakis N., Cuoco E., Colombani N., Tedesco D., Voudouris K. & Mastrocicco M. 2020 A novel hybrid method of specific vulnerability to anthropogenic pollution using multivariate statistical and regression analyses. Water Research 171. doi:10.1016/j.watres.2019.115386.
Chen T. & Guestrin C. 2016 XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
Dawoud M. A., Darwish M. M. & El-Kady M. M. 2005 GIS-based groundwater management model for Western Nile Delta. Water Resources Management 19 (5), 585–604.
Doneen L. D. 1964 Notes on Water Quality in Agriculture. Department of Water Science and Engineering, University of California, Davis.
El Bilali A. & Taleb A. 2020 Prediction of irrigation water quality parameters using machine learning models in a semi-arid environment. Journal of the Saudi Society of Agricultural Sciences 19 (7), 439–451.
Elshaarawy M., Hamed A. K. & Hamed S. 2023 Regression-based models for predicting discharge coefficient of triangular side orifice. Journal of Engineering Research 7 (5), 224–231.
Elshaarawy M. K., Alsaadawi M. M. & Hamed A. K. 2024a Machine learning and interactive GUI for concrete compressive strength prediction. Scientific Reports 14 (1), 16694.
Elshaarawy M. K., Elkiki M., Selim T. & Eltarabily M. G. 2024b Hydraulic Comparison of Different Types of Lining for Irrigation Canals Using Computational Fluid Dynamic Models. M.Sc. Thesis, Civil Engineering Department, Faculty of Engineering, Port Said University, Port Said, Egypt. Available from: http://dx.doi.org/10.13140/RG.2.2.21927.97441.
El Tahlawi M. R., Farrag A. A. & Ahmed S. S. 2008 Groundwater of Egypt: ‘An environmental overview’. Environmental Geology 55 (3), 639–652.
Eltarabily M. G. A. & Negm A. M. 2018 Groundwater Management for Sustainable Development Plans for the Western Nile Delta. Springer, Cham, Switzerland, pp. 709–727.
Eltarabily M. G., Negm A. M., Yoshimura C. & Takemura J. 2018 Groundwater modeling in agricultural watershed under different recharge and discharge scenarios for quaternary aquifer eastern Nile Delta, Egypt. Environmental Modeling and Assessment 23, 289–308. doi:10.1007/s10666-017-9577-z.
Eltarabily M. G., Abd-Elhamid H. F., Zeleňáková M., Elshaarawy M. K., Elkiki M. & Selim T. 2023 Predicting seepage losses from lined irrigation canals using machine learning models. Frontiers in Water 5, 37–76.
Global Agricultural Information Network 2016 Egyptian Land Reclamation Efforts. (FAS) Foreign Agricultural Service Office of Agricultural Affairs, Cairo, Egypt, (USDA) United States Department of Agriculture, Washington, DC, USA.
Gu X., Xiao Y., Yin S., Pan X., Niu Y., Shao J., Cui Y., Zhang Q. & Hao Q. 2017 Natural and anthropogenic factors affecting the shallow groundwater quality in a typical irrigation area with reclaimed water, North China Plain. Environmental Monitoring and Assessment 189, 514. doi:10.1007/s10661-017-6229-3.
Horton R. K. 1965 An index number system for rating water quality. Journal of the Water Pollution Control Federation 37 (3), 292–315, 300–305.
Hussein E. E., Derdour A., Zerouali B., Almaliki A., Wong Y. J., Ballesta-de los Santos M., Minh Ngoc P., Hashim M. A. & Elbeltagi A. 2024 Groundwater quality assessment and irrigation water quality index prediction using machine learning algorithms. Water 16 (2), 264.
Jeevanandam M., Kannan R., Srinivasalu S. & Rammohan V. 2007 Hydrogeochemistry and groundwater quality assessment of lower part of the Ponnaiyar River Basin, Cuddalore district, South India. Environmental Monitoring and Assessment 132, 263–274. doi:10.1007/s10661-006-9532-y.
Jianhua S., Qi F., Xiaohu W., Yonghong S., Haiyang X. & Zongqiang C. 2009 Major ion chemistry of groundwater in the extreme arid region northwest China. Environmental Geology 57, 1079–1087. doi:10.1007/s00254-008-1394-x.
Kady M. M., Essa E. F., ElKady M. & Abdelsalam A.-S. 2018 Classification and land capability of some soils at El-Moghra Depression, Egypt. Egyptian Journal of Desert Research 68 (2), 243–258.
Kelly W. P. 1940 Permissible composition and concentration of irrigated waters. Proceedings of the ASCF 66, 607.
Khan D. S., Fathy M. S. & Abdelazeem M. 2014 Remote sensing and geophysical investigations of Moghra Lake in the Qattara Depression, Western Desert, Egypt. Geomorphology 207. doi:10.1016/j.geomorph.2013.10.023.
Khan Q., Kalbus E., Alshamsi D. M., Mohamed M. M. & Liaqat M. U. 2019 Hydrochemical analysis of groundwater in Remah and Al Khatim regions, United Arab Emirates. Hydrology 6, 10–22. doi:10.3390/hydrology6030060.
Lundberg S. M. & Lee S.-I. 2017 A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, pp. 4768–4777.
Lundh F. 1999 An introduction to tkinter, pp. 539, 540. Available from: www.pythonware.com/library/tkinter/introduction/index.htm.
Machiwal D. & Jha M. K. 2012 Hydrologic Time Series Analysis: Theory and Practice. doi:10.1007/978-94-007-1861-6.
Machiwal D. & Jha M. K. 2015 Identifying sources of groundwater contamination in a hard-rock aquifer system using multivariate statistical analyses and GIS-based geostatistical modeling techniques. Journal of Hydrology: Regional Studies 4, 80–110.
Machiwal D., Cloutier V., Güler C. & Kazakis N. 2018 A review of GIS-integrated statistical techniques for groundwater quality evaluation and protection. Environmental Earth Sciences 77 (19), 681.
Meireles A. C. M., Andrade E. M. d., Chaves L. C. G., Frischkorn H. & Crisostomo L. A. 2010 A new proposal of the classification of irrigation water. Revista Ciência Agronômica 41 (3), 349–357.
Mohamaden M. I. I., Hamouda A. Z. & Mansour S. 2016 Application of electrical resistivity method for groundwater exploration at the Moghra area, Western Desert, Egypt. The Egyptian Journal of Aquatic Research 42, 261–268. doi:10.1016/j.ejar.2016.06.002.
Molnar C. 2018 A guide for making black box models explainable.
Nair P., Vakharia V., Shah M., Kumar Y., Woźniak M., Shafi J. & Fazal Ijaz M. 2024 AI-driven digital twin model for reliable lithium-ion battery discharge capacity predictions. International Journal of Intelligent Systems 2024, 8185044.
Rawat K. S., Singh S. K. & Gautam S. K. 2018 Assessment of groundwater quality for irrigation use: A peninsular case study. Applied Water Science 8 (8), 233.
Rice E. W., Bridgewater L. & American Public Health Association 2012 Standard Methods for the Examination of Water and Wastewater, Vol. 10. American Public Health Association, Washington, DC, USA.
Richards L. A. 1954 Diagnosis and Improvement of Saline and Alkali Soils. US Department of Agriculture, Washington, DC, USA.
Rufino F., Busico G., Cuoco E., Darrah T. H. & Tedesco D. 2019 Evaluating the suitability of urban groundwater resources for drinking water and irrigation purposes: An integrated approach in the agro-Aversano area of southern Italy. Environmental Monitoring and Assessment 191, 768. doi:10.1007/s10661-019-7978-y.
Sayed E., Riad P., Elbeih S., Hagras M. & Hassan A. A. 2019 Multi criteria analysis for groundwater management using solar energy in Moghra Oasis, Egypt. Egyptian Journal of Remote Sensing and Space Science 22 (3), 227–235.
Schiewe J. 2003 Concepts and Techniques of Geographic Information Systems. International Journal of Geographical Information Science 17 (8), 819–820.
Selim T., Elshaarawy M. K., Elkiki M. & Eltarabily M. G. 2024 Estimating seepage losses from lined irrigation canals using nonlinear regression and artificial neural network models. Applied Water Science 14 (5), 90.
Selvakumar S., Chandrasekar N. & Kumar G. 2017 Hydrogeochemical characteristics and groundwater contamination in the rapid urban development areas of Coimbatore, India. Water Resources and Industry 17, 26–33. doi:10.1016/j.wri.2017.02.002.
Thirumoorthy P., Velusamy S., Nallasamy J. L., Shanmugamoorthy M., Sudalaimuthu G., Veerasamy S., Periyasamy M. & Murugasamy M. V. 2024 Evaluation of groundwater quality for irrigation purposes in hard rock terrain of Southern India using water quality indices modelling. Desalination and Water Treatment 318, 100397.
Vakharia V., Gupta V. K. & Kankar P. K. 2016 A comparison of feature ranking techniques for fault diagnosis of ball bearing. Soft Computing 20 (4), 1601–1619.
WHO 2017 Guidelines for drinking-water quality fourth edition incorporating the first addendum. WHO Chronicle 38 (4), 104–108.
Williams B., Halloin C., Löbel W., Finklea F., Lipke E., Zweigerdt R. & Cremaschi S. 2020 Data-driven model development for cardiomyocyte production experimental failure prediction. Computer-Aided Chemical Engineering 48, 1639–1644.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).