ABSTRACT
Assessing groundwater quality is vital for irrigation, but financial constraints in developing countries often result in infrequent sampling. This study comprehensively analyzes the groundwater quality of the El Moghra aquifer in Egypt's arid Western Desert to determine its suitability for irrigation. Detailed hydrochemical analysis, geographic information systems, and advanced machine learning (ML) techniques were employed to enhance spatial analysis and predictive accuracy. Various ML models, such as random forest, adaptive boosting, and extreme gradient boosting (XGBoost), were optimized using Bayesian optimization to predict the irrigation water quality index (IWQI) accurately. The evaluation incorporated visual and quantitative methods, alongside ranking analysis, to validate model effectiveness. Shapley Additive exPlanations feature importance analysis was conducted and a graphical user interface (GUI) was developed, both based on the best predictive model. The results indicated that the groundwater quality is generally suitable for irrigation, with XGBoost showing the best performance, achieving a root mean square error of 5.602 and a determination coefficient (R²) of 0.872. Sodium concentration was identified as the most significant factor affecting the IWQI. The GUI facilitates easy prediction of IWQI, aiding agricultural water management and resource allocation within the region.
HIGHLIGHTS
Optimized ML models predict IWQI, with XGBoost achieving top performance (RMSE 5.602 & R² 0.872).
Sodium concentration identified as the most influential factor in groundwater quality.
Bayesian optimization significantly boosts model accuracy and effectiveness.
The user-friendly GUI helps farmers predict IWQI for better irrigation management.
INTRODUCTION
Groundwater is a crucial resource in arid regions, especially in Egypt, where deserts constitute 96% of the land area (Eltarabily et al. 2018). The El Moghra aquifer is essential for agricultural activities, necessitating thorough quality assessment (Eltarabily & Moghazy 2021). The national plan aimed to reclaim one and a half million feddans of desert land through a phased approach (Global Agricultural Information Network 2016). These projects depend mainly on groundwater (88%) for irrigation because of the restricted water supply from the River Nile, which makes groundwater quality for agricultural use an important criterion when selecting project sites across the deserts.
The quality of groundwater is affected by various hydrochemical processes such as dissolution, ion exchange, rock weathering, and biological activities (Jeevanandam et al. 2007). Several studies have focused on the hydrochemical characterization of groundwater (Jianhua et al. 2009). These studies also reveal interactions between groundwater and aquifer minerals, indicating aquifer uniformity or variability, and its transmissivity, storage, and conductivity (Wells & Price 2015). The geological structure and formations influence the composition of groundwater and the direction of its movement (Selvakumar et al. 2017). In regions where aquifer recharge occurs, dissolution processes are generally predominant, while ion exchange mostly happens during the flow of groundwater. Furthermore, the chemical composition and type of groundwater are primarily influenced by factors such as the rate of evaporation, intensity of precipitation, infiltration rate, and ion exchange (Khan et al. 2019).
Geographic information systems (GIS) have been widely used to assess groundwater quality and visualize its spatial distribution (Schiewe 2003). GIS is essential for quantifying and interpreting groundwater issues (Yadav et al. 2018). Additionally, the integration of GIS and remote sensing has facilitated the measurement of aquifer thickness, the delineation of fault locations, and the study of their development through the aquifer resistivity technique (Belkhiri & Mouni 2012). Rufino et al. (2019) used the GIS-based groundwater quality index (GQI) and SINTACS to assess groundwater quality for drinking and irrigation purposes in the Agro-Aversano region of southern Italy. Moreover, principal component and multivariate analyses are commonly used for their precision and dependability in qualitative and statistical analysis to interpret the hydrochemical parameters of groundwater and to identify its facies and types (Busico et al. 2020).
Statistical techniques such as multivariate analysis have been employed to identify factors influencing water quality (Machiwal et al. 2018). The effective application of these methods significantly improves the accuracy of groundwater quality analysis, which is essential for formulating dependable management strategies for groundwater resources (Machiwal & Jha 2012). Recently, the incorporation of advanced statistical tools within GIS has effectively identified hydrochemical regimes and groundwater management approaches for complex aquifers affected by human activities (Machiwal & Jha 2015). This integrated approach facilitates the mapping of groundwater chemical composition across extensive areas and regional scales.
Various indicators have been developed globally to characterize water quality for irrigation (Doneen 1964). The water quality index (WQI), initially established by Horton (1965) and further refined by Brown et al. (1972), was modified by Meireles et al. (2010) to introduce the irrigation water quality index (IWQI). This index includes variables such as EC, SAR, HCO3−, Na+, and Cl−. This approach enables decision-makers to effectively evaluate the quality and potential risks associated with different water types using a comprehensive set of parameters (Şener et al. 2021). The IWQI aids in the comparison and assessment of various water samples, helping to reduce negative impacts on soil and vegetation (Batarseh et al. 2021). The use of IWQI also supports the feasibility of drilling wells for agricultural purposes in regions affected by excessive groundwater salinization (El Bilali & Taleb 2020). This method is increasingly employed in numerous research initiatives for its ability to tackle complex problems and clarify the connections between input and output data.
Despite extensive studies on groundwater dynamics, there is a notable gap in applying advanced machine learning (ML) techniques to predict IWQI. This study aims to bridge this gap by employing models such as random forest (RF), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost) to provide accurate predictions and practical solutions for groundwater management. While foundational studies have elucidated basic hydrological and geological characteristics (Abdel Mogith et al. 2012) and addressed aspects of groundwater management (Sayed et al. 2019) and geological surveying (Mohamaden et al. 2016), none have leveraged the predictive power of ML to enhance irrigation decisions. This oversight represents a critical research gap, particularly given the potential of ML to provide detailed, accurate forecasts from complex datasets. Thus, the following are the authors' specific contributions:
(1) To predict IWQI based on the hydrochemical parameters to assess the suitability of the El Moghra aquifer's groundwater for irrigation purposes.
(2) To implement the Bayesian optimization (BO) method for hyperparameter tuning to enhance the performance of the adopted ML models.
(3) To establish a comprehensive evaluation framework that incorporates visual methods, quantitative methods, and ranking analysis to ensure rigorous validation of model effectiveness.
(4) To conduct Shapley Additive exPlanations (SHAP) feature importance analysis to identify key parameters influencing IWQI, offering insights for targeted improvements in ML models.
(5) To develop a user-friendly graphical user interface (GUI) to facilitate the straightforward prediction of IWQI based on the best predictive model.
STUDY AREA
Geological characteristics
The El Moghra aquifer consists mainly of lower Miocene formations, characterized by a shallow marine clastic sequence of sand, sandstone, and clay lenses. This formation extends from the Western Nile Delta to the Qattara Depression and from the Mediterranean Sea to the El-Fayoum Depression, covering around 50,000 km² (Eltarabily & Negm 2018). The upper Quaternary deposits in the region, 2.0–3.0 m thick, are mainly sand dunes. The Miocene layer, 20.0–50.0 m thick, consists of sand, sandstone, shale, and fossils, overlaying Oligocene deposits (150–200 m thick) of sand and gravel (Khan et al. 2014). The subsequent layer is the Upper Eocene deposit, made up of sandstone and limestone, with an average thickness of 600 m (Mohamaden et al. 2016). Faults are found on the eastern and western sides of the project area. Belal et al. (2018) evaluated the suitability of the El Moghra area for irrigation and agriculture using the Cervatana model, integrating GIS, remote sensing, and MicroLEIS for mapping cropping patterns. Their findings showed that 0.69% of the land is highly suitable, 72.16% is moderately suitable, 6.87% is marginal, and 20.28% is non-productive. Limitations to land capability include soil texture and drainage, aridity, and erosion risks. Highly suitable lands mainly consist of Torripsamments soil in lowlands and gravel plains, while marginal and non-productive lands are composed of Sabkha soil near the Qattara Depression (Kady et al. 2018).
Hydrological characteristics
METHODOLOGY
Database collection
The database for ML modeling was taken from the study of Eltarabily & Moghazy (2021), which comprised 46 groundwater samples measured from November 2018 to December 2019. These samples were selected based on the availability of driven wells in plots that were either mostly unreclaimed or where the wells had not been excavated. The average depth of the water table ranged from 70 to 80 m. The groundwater was discharged continuously for at least 30 min so that temperature, pH, and EC stabilized before they were measured. The samples were stored in polypropylene bottles and transported to laboratories for hydrochemical analysis. In situ pH measurements were performed using a pH meter with a precision of ±0.1 pH. EC and TDS were determined with meters accurate to 2% over a temperature range of 0.1–80 °C. The analysis covered three major anions (HCO3−, Cl−, and SO42−) and four major cations (Ca2+, Na+, Mg2+, and K+), following the standard methods of Rice et al. (2012).
Calcium and magnesium levels were quantified by complexometric titration with ethylenediaminetetraacetic acid (EDTA) disodium salt dihydrate. Sodium and potassium were quantified using a digital flame photometer. Chloride concentration was determined by titration with standard silver nitrate (AgNO3, 0.01 N), using a 5% potassium chromate (K2CrO4) solution as the indicator and stopping when a red-brown color appeared. Consistent titers were obtained after three readings. HCO3− concentration was estimated using a volumetric method with 0.02 N sulfuric acid (H2SO4), using methyl orange and phenolphthalein as indicators. The endpoint for phenolphthalein alkalinity was marked by the disappearance of the pink color, and total alkalinity was confirmed when the bromocresol green indicator changed from blue to yellow at a pH of 4.5. SO42− concentration was measured using a UV/visible spectrophotometer, Unicam model UV4-200 (UK).
Statistical analysis
Table 1 presents the descriptive analysis of the hydrochemical characteristics of the groundwater samples. TDS, which indicates the overall concentration of dissolved substances in water, ranged from 3,328 to 14,008 mg/L across the samples, with an average of 6,090.13 mg/L. Ntanganedzeni et al. (2018) found comparable results in their analysis of brackish groundwater. In El Moghra, the increased TDS levels are associated with the presence of sandstone in the Oligocene deposits and limestone from the upper Eocene. This indicates an upward migration of groundwater from the lower NSAS to the upper El Moghra aquifer. The pH values ranged from 5.7 to 7.8, with an average of 7.1 ± 0.3. This indicates that most samples are slightly alkaline, except for one outlier with a lower pH (5.70), suggesting localized acidity. EC varied widely from 5.00 to 19.00 dS/m, averaging 8.33 dS/m. Higher EC values suggest elevated salinity, impacting water quality for irrigation purposes. Ion concentrations in the groundwater were measured as follows: Ca2+ ranged from 113 to 637 mg/L (average 330.5 ± 109.4 mg/L), Na+ from 490 to 3,249 mg/L (average 1,235.8 ± 568.2 mg/L), Mg2+ from 48 to 393 mg/L (average 174.4 ± 81.9 mg/L), and K+ from 15 to 82 mg/L (average 26.3 ± 15.9 mg/L). Among these, Na+ was the most abundant cation, followed by Ca2+, Mg2+, and K+. Abbas et al. (2018) also reported similar findings, noting a predominance of Mg2+ over Ca2+, which contributes to magnesium hazards and alkaline groundwater conditions, particularly in the eastern parts of their study area. Cl− was the predominant ion, ranging from 1,056 to 5,911 mg/L, averaging 2,143.4 ± 994 mg/L. The high Cl− levels are attributed to inadequate salinity drainage in the area and natural rock–water interactions that enhance groundwater salinity, as also observed by Gu et al. (2017). HCO3− concentrations varied between 58 and 369 mg/L, while SO42− ranged from 231 to 2,086 mg/L, averaging 1,051.5 ± 396.9 mg/L.
Parameter | Unit | Minimum | Maximum | Mean | Std. error
---|---|---|---|---|---
TDS | mg/L | 3,328 | 14,008 | 6,090.13 | 336.50
pH | – | 5.70 | 7.80 | 7.11 | 0.05
EC | dS/m | 5.00 | 19.00 | 8.33 | 0.46
HCO3− | mg/L | 58.00 | 369.00 | 126.87 | 10.62
Cl− | mg/L | 1,056.00 | 5,911.00 | 2,143.37 | 146.56
SO42− | mg/L | 231.00 | 2,086.00 | 1,051.52 | 58.53
Na+ | mg/L | 490.00 | 3,249.00 | 1,235.76 | 83.78
Ca2+ | mg/L | 113.00 | 637.00 | 330.48 | 16.14
Mg2+ | mg/L | 48.00 | 393.00 | 174.43 | 12.07
K+ | mg/L | 15.00 | 82.00 | 26.26 | 2.34
For bicarbonate, the most common concentration is 0.951 meq/L, with 40 occurrences. There are minimal samples with higher bicarbonate concentrations: one sample at 2.650 meq/L and five samples at 6.048 meq/L. This indicates that low bicarbonate concentrations are predominant in the dataset. Chloride levels are mostly concentrated around 29.8 meq/L, with 36 occurrences. There are fewer samples with higher chloride concentrations: eight samples at 75.4 meq/L and two samples at 121.1 meq/L. The highest range of 166.7 meq/L has very few occurrences, indicating that high chloride levels are less common. The histogram for sulfate shows that the highest frequency, 26 samples, is at 30.56 meq/L. There are also significant occurrences at 17.68 meq/L with 14 samples. The lowest and highest ranges, 4.81 and 43.43 meq/L, have fewer samples, 14 and 6 respectively, indicating a broader spread of sulfate concentrations.
Sodium concentrations are most frequent at 21.3 meq/L with 30 samples. There are 14 samples with sodium levels at 61.3 meq/L, indicating a significant number of occurrences in this range. Higher sodium concentrations, such as 101.3 and 141.3 meq/L, are less frequent, with only two samples each, showing that higher sodium levels are rare. The calcium distribution shows the highest frequency at 23.07 meq/L with 24 samples. There are also significant occurrences at 14.35 meq/L with 17 samples. Lower and higher concentrations, such as 5.64 and 31.79 meq/L, have fewer samples, 17 and 5, respectively, indicating moderate variability in calcium levels. Magnesium levels are most common at 3.95 meq/L with 24 samples. The next significant range is 13.41 meq/L with 16 samples. Higher magnesium concentrations, such as 22.88 and 32.34 meq/L, have fewer samples, 6 and 4, respectively, suggesting a decline in frequency with increasing magnesium levels. The histogram for potassium shows that the highest frequency is at 0.384 meq/L with 40 occurrences. There are minimal samples at higher concentrations: three samples each at 0.955, 1.526, and 2.097 meq/L. This indicates that low potassium levels are predominant in the dataset.
Correlation analysis
pH shows weak positive correlations with EC (0.19), HCO3− (0.32), Cl− (0.27), Na+ (0.24), and K+ (0.29), and weak negative correlations with SO42− (−0.14), Ca2+ (−0.03), and Mg2+ (−0.03). EC has a weak positive correlation with HCO3− (0.26), very strong positive correlations with Cl− (0.96) and Na+ (0.93), strong positive correlations with Ca2+ (0.67) and Mg2+ (0.60), and a moderate positive correlation with K+ (0.42). HCO3− shows weak positive correlations with Cl− (0.25), SO42− (0.07), Ca2+ (0.26), Mg2+ (0.23), and K+ (0.19), but a moderate positive correlation with Na+ (0.49). Cl− has a very strong positive correlation with Na+ (0.92) and moderate positive correlations with Ca2+ (0.65), Mg2+ (0.56), and K+ (0.47). SO42− shows a weak positive correlation with Na+ (0.33) and moderate positive correlations with Ca2+ (0.58), Mg2+ (0.49), and K+ (0.42). Na+ is moderately positively correlated with Ca2+ (0.65) and Mg2+ (0.67), and strongly positively correlated with K+ (0.72). Ca2+ has moderate positive correlations with Mg2+ (0.49) and K+ (0.47). Lastly, Mg2+ shows a moderate positive correlation with K+ (0.47). In summary, the heatmap indicates several strong correlations among the water quality parameters, particularly between TDS, EC, Cl−, Na+, Ca2+, and Mg2+.
Examining the scatter plot between TDS and HCO3−, a weak positive correlation is observed. While there is a slight upward trend, the spread of points indicates that bicarbonate levels do not strongly depend on TDS. The relationship between TDS and SO42− is moderately positive, suggesting that sulfate levels tend to increase with TDS, although the correlation is not as strong as with EC or Cl−. The strong positive correlation between TDS and Na+, as well as between TDS and Mg2+, further indicates that these ions significantly contribute to the total dissolved solids in the water. The correlation between TDS and Ca2+ is also moderate to strong, with an upward trend indicating that calcium levels tend to rise with increasing TDS.
The scatter plot between pH and EC shows a weak positive correlation, suggesting that EC slightly increases with pH, but the relationship is not strong. There is a moderate positive correlation between pH and HCO3−, with higher bicarbonate levels associated with higher pH values, consistent with the role of bicarbonates in buffering pH. The scatter plot between pH and Cl− shows a weak positive correlation, indicating that chloride levels have a minimal effect on pH. The relationship between pH and SO42− is weakly negative, suggesting that higher sulfate concentrations might be associated with lower pH values. There is a weak positive correlation between pH and Na+, with sodium levels slightly increasing with pH, while the relationships between pH and Ca2+, and between pH and Mg2+, are very weakly negative, indicating a minimal impact of calcium and magnesium on pH.
The scatter plot between EC and HCO3− shows a weak positive correlation, indicating that bicarbonate levels slightly increase with electrical conductivity. A very strong positive correlation is observed between EC and Cl−, with the tight clustering of points along an upward trend line indicating that chloride levels are a major contributor to electrical conductivity. The relationship between EC and SO42− is moderately positive, suggesting that higher sulfate levels are associated with higher EC. There is a very strong positive correlation between EC and Na+, indicating that sodium significantly contributes to electrical conductivity. The scatter plot between EC and Ca2+ shows a strong positive correlation, with calcium levels increasing with electrical conductivity. Similarly, the relationship between EC and Mg2+ is strong, indicating that higher magnesium concentrations are associated with higher EC.
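The coefficients discussed in this subsection can be reproduced from the raw measurements with a few lines of code. The sketch below assumes that they are Pearson correlation coefficients (the text does not name the coefficient type), and the helper names `pearson_r` and `correlation_matrix` are illustrative rather than taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of {parameter name: values}."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}
```

Passing a dictionary such as `{"EC": [...], "Cl": [...], "Na": [...]}` with one list of 46 sample values per parameter would return the nested dictionary that a heatmap visualizes.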
Irrigation suitability indices
Various groundwater indices are commonly employed to assess the suitability of groundwater for irrigation purposes. These indices include SAR, total hardness (TH), sodium percent (Na%), magnesium hazard percent (MH%), permeability index (PI), potential salinity (PS), and Kelly's ratio (KR). SAR is a critical indicator of the potential for sodium to accumulate in soil, which can adversely affect soil structure and crop yield. Lower SAR values are preferable for irrigation purposes. The TH index is important as it indicates the concentration of calcium and magnesium ions, which can affect soil structure and crop health. Na% index is also used to evaluate the sodium hazard in irrigation water. Higher Na% values indicate a greater risk of sodium accumulation in the soil. For MH%, high magnesium levels can lead to soil structure problems, affecting water infiltration and plant growth. The higher values of the PI index suggest better permeability, which is essential for maintaining soil health and ensuring efficient water infiltration and root growth. The KR index values above 1 indicate an excess of sodium, which can be detrimental to soil and plant health.
The formulas of these indices are well-established and have been validated through numerous studies, proving their reliability and accuracy (Gad et al. 2023; Hussein et al. 2024). This makes them trustworthy tools for water quality assessment. The chosen indices specifically address the suitability of groundwater for agricultural use, with parameters such as SAR and PI directly impacting soil permeability and crop health, critical factors in agricultural water management. Furthermore, the parameters required for these indices, like concentrations of Na+, Ca2+, Mg2+, Cl−, and SO42−, are commonly measured in groundwater quality assessments, making it easier to apply these indices without needing additional or specialized testing.
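As an illustration of how these indices follow from the measured ions, the sketch below uses the forms that are commonly given in the irrigation literature, with all concentrations expressed in meq/L. These forms are an assumption here, since the equations themselves are not reproduced in this section; the equivalent weights are standard chemistry values, and the function names are hypothetical. TH is left out because its definition varies between sources.

```python
import math

# Equivalent weights (g/eq) = molar mass / ionic charge (standard values).
EQUIVALENT_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 20.04, "Mg": 12.15,
    "Cl": 35.45, "HCO3": 61.02, "SO4": 48.03,
}


def mg_to_meq(mg_per_l, ion):
    """Convert a concentration from mg/L to meq/L."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]


def irrigation_indices(c):
    """Commonly used index formulas; `c` maps ion names to meq/L values."""
    na, k, ca, mg = c["Na"], c["K"], c["Ca"], c["Mg"]
    return {
        "SAR": na / math.sqrt((ca + mg) / 2),            # sodium adsorption ratio
        "Na%": 100 * (na + k) / (ca + mg + na + k),      # sodium percent
        "MH%": 100 * mg / (ca + mg),                     # magnesium hazard
        "KR": na / (ca + mg),                            # Kelly's ratio
        "PI": 100 * (na + math.sqrt(c["HCO3"])) / (ca + mg + na),  # permeability index
        "PS": c["Cl"] + 0.5 * c["SO4"],                  # potential salinity
    }


# Hypothetical sample (meq/L), for illustration only.
sample = {"Na": 20.0, "K": 1.0, "Ca": 8.0, "Mg": 8.0,
          "HCO3": 4.0, "Cl": 30.0, "SO4": 10.0}
indices = irrigation_indices(sample)
```

For a sample reported in mg/L, each ion would first be passed through `mg_to_meq` before calling `irrigation_indices`.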
Irrigation water quality index
Description of ML models
This study utilized three ensemble models: RF, AdaBoost, and XGBoost, all implemented in the Python programming environment within the Anaconda distribution. These ensemble techniques are known to improve the performance metrics of predictive models, notably by reducing error rates and increasing the correlation between predicted and actual values (Elshaarawy et al. 2024a). The improvement in performance can be credited to the ensemble's ability to mitigate issues such as underfitting, overfitting, or a mismatch between the model and the dataset. Each model is described in the following subsections.
Selecting the right hyperparameters is key to enhancing model performance (Luat et al. 2021). Techniques like grid search (GS), random search (RS), and BO are typically employed to adjust these parameters in ML models (Zhang et al. 2021). GS and RS may not always find the best solution, as they often involve high variance and can be time-intensive due to numerous trials. On the other hand, BO utilizes previous evaluations to search for optimal solutions more effectively (Nair et al. 2024), often requiring less time to identify the most suitable parameters compared with GS and RS. Consequently, this research adopts the BO method for determining the predictive models' ideal parameters. To ensure model robustness on new data and to prevent overfitting, cross-validation (CV) is also applied. Specifically, a 5-fold CV integrated with BO (BO + 5CV) is utilized for hyperparameter optimization of the predictive models.
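The idea of BO combined with 5-fold CV can be sketched with a small hand-written loop: a Gaussian process is fitted to the CV errors of the settings tried so far, and the next setting is the one with the largest expected improvement. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, a random forest stands in for all three models, the two-parameter search space is hypothetical, and the BO library used in the study is not specified here.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (46 samples, 5 features); NOT the study's dataset.
X = rng.uniform(0.0, 1.0, size=(46, 5))
y = 100.0 - 40.0 * X[:, 0] - 20.0 * X[:, 1] + rng.normal(0.0, 2.0, size=46)

# Hypothetical integer search space: rows are (max_depth, min_samples_leaf).
BOUNDS = np.array([[2, 10], [1, 8]])
CV = KFold(n_splits=5, shuffle=True, random_state=0)


def cv_rmse(params):
    """Mean 5-fold CV RMSE of a random forest for one hyperparameter setting."""
    model = RandomForestRegressor(
        n_estimators=50,
        max_depth=int(params[0]),
        min_samples_leaf=int(params[1]),
        random_state=0,
    )
    scores = cross_val_score(model, X, y, cv=CV,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()


def sample_candidates(n):
    """Draw n random integer settings inside BOUNDS (inclusive)."""
    return rng.integers(BOUNDS[:, 0], BOUNDS[:, 1] + 1, size=(n, 2)).astype(float)


def expected_improvement(gp, candidates, best):
    """Expected improvement for minimization relative to the best loss so far."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(n_init=4, n_iter=8):
    """Random initial design followed by GP-guided evaluations."""
    tried = sample_candidates(n_init)
    losses = np.array([cv_rmse(p) for p in tried])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                      normalize_y=True, alpha=1e-3)
        gp.fit(tried, losses)
        candidates = sample_candidates(200)
        ei = expected_improvement(gp, candidates, losses.min())
        nxt = candidates[np.argmax(ei)]
        tried = np.vstack([tried, nxt])
        losses = np.append(losses, cv_rmse(nxt))
    best = int(np.argmin(losses))
    return tried[best].astype(int), float(losses[best])


best_params, best_rmse = bayes_opt()
```

Unlike GS or RS, each new evaluation here is chosen using all previous results, which is why BO typically needs fewer trials. In practice a dedicated BO library would replace the hand-written loop.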
Random forest
Adaptive boosting
Extreme gradient boosting
Evaluation criteria
Assessing the efficacy of each predictive model is essential, as verifying a model's predictive accuracy is fundamental to ensuring its practicality and scientific credibility. Performance on the training (TR) dataset used during model construction merely indicates how well the model conforms to that dataset. To validate the predictive models, testing datasets were therefore utilized (Eltarabily et al. 2024). Model evaluation and comparison mainly rely on two approaches: visual and quantitative methods. Methods based on visualization include scatter plots, violin boxplots, and Taylor diagrams (Selim et al. 2024).
Scatter plots are used to visualize the relationship between two variables (Elshaarawy et al. 2023). Violin boxplots combine elements of boxplots and kernel density plots to provide a more detailed representation of the data distribution. When comparing actual and predicted values, they can show the spread and density of errors, giving insights into the variance and bias of the model's predictions (Elshaarawy et al. 2024b). Taylor diagrams are a specialized graphical representation used to quantify the similarity between predicted and actual values. These diagrams plot the correlation, the standard deviation, and the root mean square error of predictions on a single chart. This provides a comprehensive view of a model's accuracy, variability, and overall performance compared with the actual observations (Elshaarawy & Hamed 2024). These visual tools provide an immediate comparative assessment of a model's predictive accuracy in terms of statistical values such as the maximum, minimum, median, and quartiles. They capture aspects that quantitative metrics might overlook, but they cannot on their own give a comprehensive numerical account of model performance. The study therefore also utilized five quantitative metrics, with their ideal values outlined in Table 2.
Performance metric | Equation no. | Ideal value
---|---|---
Determination coefficient (R²) | (18) | 1
Root mean square error (RMSE) | (19) | 0
Mean absolute error (MAE) | (20) | 0
Mean absolute percentage error (MAPE) | (21) | 0
Mean bias error (MBE) | (22) | 0
n is the number of data points; oi and pi are the actual and predicted values of the ith sample, respectively.
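A minimal sketch of the five metrics is given below, using their commonly used definitions. These definitions are an assumption, because conventions differ between sources: R² is taken here as one minus the ratio of residual to total sum of squares, and MBE as the mean of predicted minus actual values. The function name is illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE, MAPE (%) and MBE for paired observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted minus actual
    mean_o = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined if any observed value is zero.
        "MAPE": 100 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "MBE": sum(errors) / n,
    }


# Hypothetical IWQI values, for illustration only.
metrics = regression_metrics([70, 80, 60, 50], [72, 78, 63, 50])
```

With this sign convention, a positive MBE means the model overestimates on average.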
SHAP feature importance analysis
To analyze the sensitivity and interpret ML models on both a wide-scale and a more detailed level (Eltarabily et al. 2023), researchers use the SHAP approach (Lundberg & Lee 2017), which draws on principles from cooperative game theory (Molnar 2018). The SHAP method was employed to gauge the comparative impact of input variables on the prediction process. As an advanced method within the realm of explainable artificial intelligence, SHAP helps to clarify the complex interactions between the input variables and the predictions of the model. SHAP offers critical insights by identifying which features are most influential on predictions and how they modify the predicted results (Vakharia et al. 2016).
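The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values for a single prediction by brute force over all feature coalitions. The sketch below is not the SHAP library and not the study's model: the toy prediction function, the feature order, and the baseline values are hypothetical, and features outside a coalition are simply replaced by their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for the prediction at x."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical IWQI-like response with one interaction term."""
    return 100.0 - 2.0 * z[0] - 1.0 * z[1] - 0.5 * z[0] * z[2]


x = [10.0, 8.0, 4.0]         # hypothetical sample
baseline = [5.0, 6.0, 2.0]   # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that lets SHAP attribute a prediction to individual inputs. The brute-force cost grows exponentially with the number of features, so practical tools rely on faster model-specific algorithms.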
Graphical user interface
To make the predictive model for estimating the IWQI accessible and user-friendly, a GUI is developed using the Tkinter library in Python (Elshaarawy et al. 2024a). Tkinter, a standard GUI library, offers a straightforward approach to creating interactive applications (Lundh 1999). The development process starts with setting up the Python environment with Tkinter, which is included in the standard Python distribution. The interface design focuses on simplicity and user guidance, incorporating input fields for users to enter required features, a button to trigger predictions, and labels or instructions for ease of use. The core functionality of the GUI involves creating input fields with Entry widgets, a Button widget to initiate the prediction process, and a Label or Text widget to display the predicted IWQI. The trained predictive model is integrated into the GUI using a library such as pickle to load the model and make predictions based on user inputs. To ensure accessibility and collaboration, the GUI application is hosted on GitHub (https://github.com), which provides version control, collaboration features, and easy distribution.
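The widget layout described above might look like the following sketch. It is an assumption-laden illustration rather than the published application: the feature list `FEATURES`, the function names, and the window text are hypothetical, and any object with a scikit-learn style `predict` method can serve as the model.

```python
FEATURES = ["EC", "Na", "Cl", "HCO3", "SAR"]  # hypothetical model inputs


def predict_iwqi(model, raw_values):
    """Convert the text typed by the user to floats and query the model."""
    row = [float(raw_values[name]) for name in FEATURES]
    return float(model.predict([row])[0])


def build_gui(model):
    """Create the window: one Entry per feature, a Button, and a result Label."""
    import tkinter as tk  # imported lazily so predict_iwqi works without a display

    root = tk.Tk()
    root.title("IWQI predictor")
    entries = {}
    for r, name in enumerate(FEATURES):
        tk.Label(root, text=name).grid(row=r, column=0, padx=4, pady=2, sticky="e")
        entries[name] = tk.Entry(root)
        entries[name].grid(row=r, column=1, padx=4, pady=2)

    result = tk.Label(root, text="IWQI: -")
    result.grid(row=len(FEATURES) + 1, column=0, columnspan=2, pady=4)

    def on_click():
        try:
            value = predict_iwqi(model, {n: e.get() for n, e in entries.items()})
            result.config(text=f"IWQI: {value:.2f}")
        except ValueError:
            result.config(text="Please enter a number in every field")

    tk.Button(root, text="Predict", command=on_click).grid(
        row=len(FEATURES), column=0, columnspan=2, pady=4)
    return root
```

A saved model could be loaded with `pickle.load` from a file (for example a hypothetical `model.pkl`) and the window started with `build_gui(model).mainloop()`. Keeping the prediction logic in `predict_iwqi` separates it from the widgets, so it can be checked without opening a window.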
RESULTS AND DISCUSSION
Irrigation suitability indices
Descriptive statistics
Table 3 presents the descriptive statistics of irrigation suitability indices, i.e., SAR, TH, PI, KR, Na%, MH%, and IWQI. This shows that the irrigation suitability indices indicate significant variability in the quality of groundwater used for irrigation purposes. The TH values suggest that the water is generally within acceptable limits for irrigation, although some samples show higher hardness levels. The PI values are favorable, indicating good soil permeability overall. However, the SAR and KR highlight potential risks of sodium accumulation, which could adversely affect soil structure and crop health. The Na% values further corroborate the sodium hazard, emphasizing the need for careful management to prevent soil degradation. The MH% shows a wide range, indicating that some water samples may pose risks to soil structure due to high magnesium levels. The IWQI provides a holistic view, with most values indicating suitable water quality for irrigation, though a few samples fall short.
Parameter | Unit | Minimum | Maximum | Mean | Std. error
---|---|---|---|---|---
TH | mg/L | 15.71 | 63.71 | 30.84 | 1.65
PS | – | 41.00 | 176.62 | 71.40 | 4.32
SAR | (meq/L)^0.5 | 5.51 | 30.65 | 13.76 | 0.80
KR | – | 0.71 | 4.50 | 1.82 | 0.12
Na% | – | 42.04 | 81.99 | 62.48 | 1.28
MH% | – | 23.27 | 76.17 | 45.47 | 1.41
PI | – | 44.16 | 84.30 | 63.99 | 1.27
IWQI | – | 34.75 | 84.73 | 71.14 | 1.71
Correlation analysis
Classification of irrigation water quality
The classification of irrigation water quality through various indices such as SAR, TH, Na%, MH%, PI, PS, KR, and IWQI is essential for assessing its suitability for agricultural use. Each index provides insights based on different water characteristics, such as salinity, hardness, sodium content, magnesium hazard, permeability, potential salinity, and overall water quality. Table 4 shows the classification of each irrigation index. SAR is a critical indicator for assessing the sodium hazard of irrigation water. SAR values greater than 26 (meq/L)^0.5 classify the water as ‘Unsuitable’ for irrigation, which applies to only 2% of the samples. SAR values ranging from 18 to 26 (meq/L)^0.5 are considered ‘Doubtful’ and comprise 24% of the samples, indicating potential risks to soil health. A SAR between 10 and 18 (meq/L)^0.5, covering 43% of the samples, is deemed ‘Good’, suitable for irrigation under certain conditions. The best category, ‘Excellent’, includes samples with SAR below 10 (meq/L)^0.5, accounting for 30% of the samples, indicating ideal conditions for irrigation without any risk of sodium accumulation. TH is measured to assess the concentration of calcium and magnesium (Rawat et al. 2018). All samples in this dataset (100%) have a TH of less than 75 mg/L, classifying the water as ‘Soft’. This level of hardness is typically considered optimal for irrigation as it prevents scaling and buildup in irrigation systems.
Irrigation index | Classification | Type | No. of samples | Percentage (%) |
---|---|---|---|---|
SAR | SAR > 26 | Unsuitable | 1 | 2 |
SAR | 18 < SAR < 26 | Doubtful | 11 | 24 |
SAR | 10 < SAR < 18 | Good | 20 | 43 |
SAR | SAR < 10 | Excellent | 14 | 30 |
TH | <75 | Soft | 46 | 100 |
TH | 75–150 | Moderately Hard | 0 | 0 |
TH | 150–300 | Hard | 0 | 0 |
TH | >300 | Very Hard | 0 | 0 |
Na% | 80–100 | Unsuitable | 4 | 9 |
Na% | 60–80 | Doubtful | 19 | 41 |
Na% | 40–60 | Permissible | 23 | 50 |
Na% | 20–40 | Good | 0 | 0 |
Na% | <20 | Excellent | 0 | 0 |
MH% | >50% | Unsuitable | 15 | 33 |
MH% | <50% | Suitable | 31 | 67 |
PI | <25% | Unsuitable | 0 | 0 |
PI | 25–75% | Suitable | 5 | 11 |
PI | >75% | Good | 41 | 89 |
PS | >10 | Injurious to Unsatisfactory | 7 | 15 |
PS | 5–10 | Good to Injurious | 27 | 59 |
PS | <5 | Excellent to Good | 12 | 26 |
KR | <1 | Unsuitable | 1 | 2 |
KR | >1 | Suitable | 45 | 98 |
IWQI | 0–40 | Unsuitable | 1 | 2 |
IWQI | 40–55 | Satisfying | 4 | 9 |
IWQI | 55–70 | Good | 11 | 24 |
IWQI | 70–85 | Very Good | 30 | 65 |
IWQI | 85–100 | Excellent | 0 | 0 |
Na% is another sodium hazard indicator. Samples with an Na% of 80–100% are classified as ‘Unsuitable’ and make up 9% of the samples. Those with an Na% of 60–80% are ‘Doubtful’ (41%), while an Na% between 40 and 60% is ‘Permissible’ for irrigation, representing 50% of the samples. A sodium percentage above 60% may cause deterioration of the physical properties of soil (Aravinthasamy et al. 2021). MH% values above 50% render the water ‘Unsuitable’ for irrigation because of potential detrimental effects on soil structure and plant absorption (Richards 1954); this applies to 33% of the samples. Conversely, samples with an MH% below 50% are ‘Suitable’ for irrigation and comprise 67% of the samples. The PI helps evaluate the effect of water on soil permeability (Wong et al. 2020). Of the samples, 89% fall in the ‘Good’ category with a PI greater than 75%, suggesting favorable soil permeability. The remaining 11% fall in the ‘Suitable’ category with PI values between 25 and 75%, indicating moderate permeability. PS assesses the risk of soil salinity from irrigation water. Samples with PS values below 5, which account for 26% of the samples, are classified as ‘Excellent to Good’ for irrigation. Those between 5 and 10 (59%) are ‘Good to Injurious’, and samples with a PS greater than 10 (15%) are deemed ‘Injurious to Unsatisfactory’.
KR below 1, affecting only 2% of the samples, is considered ‘Unsuitable’ due to high sodium content that could be harmful. In contrast, a KR greater than 1 is ‘Suitable’, covering 98% of the samples, indicating a balanced sodium level (Kelly 1940). IWQI values provide a comprehensive measure of water quality for irrigation. An IWQI from 70 to 85 (65% of the samples) indicates ‘Very Good’ quality, 55 to 70 (24%) indicates ‘Good’ quality, and 40 to 55 (9%) indicates ‘Satisfying’ quality. Only 2% of the samples are categorized as ‘Unsuitable’, with an IWQI below 40. Overall, the groundwater in the study area is generally suitable for irrigation, with most indices showing favorable results for agricultural use. However, sodium and magnesium levels in certain groundwater samples need to be managed to prevent adverse effects on soil and crop health. Appropriate water management strategies, such as applying soil amendments and choosing salt-tolerant crop varieties where needed, can minimize these negative impacts and make better use of the groundwater's irrigation potential.
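The threshold-based classification described above can be sketched in a few lines of Python. The class boundaries for IWQI and SAR are transcribed from Table 4; the function names are hypothetical, and assigning a value that falls exactly on a boundary to the higher range is an assumption, since the table does not state how boundary values are handled.

```python
from bisect import bisect_right

# Class boundaries transcribed from Table 4. A value equal to a boundary
# is assigned to the higher range (an assumption, not stated in the table).
IWQI_BOUNDS = [40, 55, 70, 85]
IWQI_LABELS = ["Unsuitable", "Satisfying", "Good", "Very Good", "Excellent"]

SAR_BOUNDS = [10, 18, 26]
SAR_LABELS = ["Excellent", "Good", "Doubtful", "Unsuitable"]


def classify(value, bounds, labels):
    """Return the label of the interval that contains `value`."""
    return labels[bisect_right(bounds, value)]


def classify_iwqi(iwqi):
    """Map an IWQI value (0 to 100) to its Table 4 category."""
    if not 0 <= iwqi <= 100:
        raise ValueError("IWQI is expected to lie between 0 and 100")
    return classify(iwqi, IWQI_BOUNDS, IWQI_LABELS)


def classify_sar(sar):
    """Map a SAR value to its Table 4 category."""
    return classify(sar, SAR_BOUNDS, SAR_LABELS)


if __name__ == "__main__":
    # Minimum, mean, and maximum IWQI reported for the samples.
    for iwqi in (34.75, 71.14, 84.73):
        print(iwqi, classify_iwqi(iwqi))
    # Minimum, mean, and maximum SAR reported for the samples.
    for sar in (5.51, 13.76, 30.65):
        print(sar, classify_sar(sar))
```

The same `classify` helper could be reused for the other indices by supplying their boundaries and labels from Table 4.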
Results of ML modeling
The results of the employed ML models are detailed in the following subsections. Input features X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 correspond to TDS, pH, EC, HCO3−, Cl−, SO42−, Na+, Ca2+, Mg2+, and K+, respectively, while Y corresponds to IWQI. Table 5 shows the optimal main hyperparameters of the adopted models obtained with the BO method.
Model | Hyperparameter | Range | Optimal value |
---|---|---|---|
RF | n_estimators | 10–1,000 | 619 |
RF | max_depth | 3–30 | 30 |
RF | min_samples_split | 2–20 | 2 |
RF | min_samples_leaf | 1–20 | 1 |
AdaBoost | n_estimators | 10–1,000 | 316 |
AdaBoost | learning_rate | 0.001–1,000 | 2.73 |
AdaBoost | loss | linear, square, exponential | linear |
XGBoost | n_estimators | 10–1,000 | 742 |
XGBoost | max_depth | 3–5 | 4 |
XGBoost | learning_rate | 0.01–0.3 | 0.25 |
XGBoost | colsample_bytree | 0.4–1 | 0.74 |
XGBoost | subsample | 0.4–1 | 0.75 |
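As a rough illustration of how a search space like the one in Table 5 can be encoded, the sketch below stores the XGBoost ranges in a dictionary and draws candidates from it. The ranges and optimal values are transcribed from Table 5; the names `XGB_SPACE`, `sample_candidate`, and `in_space` are hypothetical. Note that the sampler draws uniformly at random, which is plain random search used here as a stand-in: a BO routine would instead propose each candidate from a surrogate model fitted to earlier evaluations.

```python
import random

# Search ranges for XGBoost transcribed from Table 5: (low, high, type).
XGB_SPACE = {
    "n_estimators": (10, 1000, int),
    "max_depth": (3, 5, int),
    "learning_rate": (0.01, 0.3, float),
    "colsample_bytree": (0.4, 1.0, float),
    "subsample": (0.4, 1.0, float),
}

# Optimal values reported in Table 5.
XGB_OPTIMUM = {
    "n_estimators": 742,
    "max_depth": 4,
    "learning_rate": 0.25,
    "colsample_bytree": 0.74,
    "subsample": 0.75,
}


def sample_candidate(space, rng):
    """Draw one candidate uniformly at random (random search, not BO)."""
    candidate = {}
    for name, (low, high, kind) in space.items():
        if kind is int:
            candidate[name] = rng.randint(low, high)  # inclusive bounds
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate


def in_space(candidate, space):
    """Check that every hyperparameter lies inside its search range."""
    return all(space[name][0] <= value <= space[name][1]
               for name, value in candidate.items())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is repeatable
    print(sample_candidate(XGB_SPACE, rng))
    print(in_space(XGB_OPTIMUM, XGB_SPACE))
```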
Cross-validation analysis
Quantitative evaluation
Table 6 compares the performance of the models before and after hyperparameter tuning for both the TR and TS stages. Before tuning, all models show modest to good performance. After tuning, performance improves for all three models on most metrics, although the gains for RF in the TS stage are marginal, which indicates the effectiveness of hyperparameter optimization. Notably, the tuned XGBoost model achieves the highest R² in both stages, 1.000 in TR and 0.872 in TS, showing the substantial impact of tuning on model accuracy and prediction error.
Model | Metric | Initial TR | Tuned TR | Initial TS | Tuned TS |
---|---|---|---|---|---|
RF | R² | 0.966 | 0.984 | 0.801 | 0.804 |
RF | RMSE | 1.791 | 1.242 | 6.973 | 6.925 |
RF | MAE | 1.319 | 0.804 | 4.351 | 4.355 |
RF | MAPE | 2.081 | 1.291 | 9.331 | 9.333 |
RF | MBE | 0.326 | 0.240 | 2.557 | 2.469 |
AdaBoost | R² | 0.983 | 0.991 | 0.780 | 0.810 |
AdaBoost | RMSE | 1.275 | 0.947 | 7.337 | 6.814 |
AdaBoost | MAE | 0.900 | 0.620 | 4.448 | 4.396 |
AdaBoost | MAPE | 1.301 | 0.801 | 9.766 | 9.036 |
AdaBoost | MBE | −0.001 | −0.464 | 2.108 | 1.513 |
XGBoost | R² | 0.806 | 1.000 | 0.570 | 0.872 |
XGBoost | RMSE | 4.313 | 0.000439 | 10.256 | 5.602 |
XGBoost | MAE | 3.279 | 0.000337 | 7.409 | 3.239 |
XGBoost | MAPE | 5.064 | 0.000483 | 14.959 | 7.014 |
XGBoost | MBE | 0.111 | −0.000001 | 3.335 | 2.156 |
For the RF model, R² improves after tuning in both TR and TS, indicating a better fit to the data, and RMSE decreases in both stages, so the predictions are closer to the actual values. MAE and MAPE decrease in TR, indicating more precise predictions, but are essentially unchanged in TS. MBE decreases slightly in both stages, indicating a small reduction in systematic prediction bias. For the AdaBoost model, R² in TS rises from 0.780 to 0.810, a notable increase in the share of explained variance. RMSE decreases in both TR and TS, pointing to improved accuracy, and MAE and MAPE also improve in both stages. MBE decreases in TS (from 2.108 to 1.513), indicating reduced bias, but moves away from zero in TR (from −0.001 to −0.464).
The XGBoost model shows the largest improvement in R² after tuning, especially in the TS stage (from 0.570 to 0.872). RMSE in TS falls from 10.256 to 5.602, indicating much better prediction accuracy, and MAE and MAPE drop sharply in both TR and TS. The bias (MBE) in TS is also reduced, from 3.335 to 2.156. Overall, hyperparameter tuning improves nearly all metrics for all models, with the strongest effect for XGBoost, where tuning yields a nearly perfect fit on the TR set and substantially better accuracy on the TS set. This underscores the importance of model-specific hyperparameter optimization for predictive performance and reliability.
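For readers who want to reproduce this kind of comparison, the five metrics in Table 6 can be computed as in the minimal sketch below. The definitions are the common textbook ones and are assumptions about the exact forms used in this study: R² is taken as the coefficient of determination (1 − SS_res/SS_tot), MAPE is expressed in percent, and MBE uses the sign convention predicted minus observed. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, MAPE (%), and MBE for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Error convention (assumed): predicted minus observed.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE requires non-zero observations, which holds for IWQI > 0.
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "MBE": sum(errors) / n,
    }


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    observed = [60.0, 70.0, 80.0]
    predicted = [62.0, 69.0, 83.0]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Under this sign convention, a positive MBE means the model overpredicts on average.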
A rank analysis is performed to assess the overall performance of the adopted models. Each model is ranked on every metric in each stage, and its overall rank is obtained by summing these individual ranks. The model with the lowest total is considered the most effective, whereas the model with the highest total is considered the least effective. Table 7 shows the results of the rank analysis. XGBoost ranks first among the three models with an overall rank of 12, followed by AdaBoost with 23 and RF with 27.
Model | Stage | R² | RMSE | MAE | MAPE | MBE | Total score | Overall rank |
---|---|---|---|---|---|---|---|---|
RF | TR | 3 | 3 | 2 | 3 | 2 | 13 | 27 |
RF | TS | 3 | 3 | 2 | 3 | 3 | 14 | 27 |
AdaBoost | TR | 2 | 3 | 3 | 2 | 3 | 13 | 23 |
AdaBoost | TS | 2 | 2 | 3 | 2 | 1 | 10 | 23 |
XGBoost | TR | 1 | 1 | 1 | 1 | 2 | 6 | 12 |
XGBoost | TS | 1 | 1 | 1 | 1 | 2 | 6 | 12 |
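The summation behind the rank analysis can be checked with a short sketch. The per-metric ranks are transcribed from Table 7; the function names are hypothetical.

```python
# Per-metric ranks transcribed from Table 7 (1 = best), in the order
# R2, RMSE, MAE, MAPE, MBE.
RANKS = {
    "RF": {"TR": [3, 3, 2, 3, 2], "TS": [3, 3, 2, 3, 3]},
    "AdaBoost": {"TR": [2, 3, 3, 2, 3], "TS": [2, 2, 3, 2, 1]},
    "XGBoost": {"TR": [1, 1, 1, 1, 2], "TS": [1, 1, 1, 1, 2]},
}


def total_scores(ranks):
    """Sum the ranks over all metrics and both stages for each model."""
    return {
        model: sum(sum(stage_ranks) for stage_ranks in stages.values())
        for model, stages in ranks.items()
    }


def order_models(ranks):
    """Return model names from most to least effective (lowest total first)."""
    totals = total_scores(ranks)
    return sorted(totals, key=totals.get)


if __name__ == "__main__":
    print(total_scores(RANKS))
    print(order_models(RANKS))
```

Because every metric and stage contributes equally to the total, this scheme weights the TR and TS stages the same.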
Visual evaluation
SHAP feature importance
For X7 (Na+), a wide range of SHAP values is observed, predominantly on the negative side, indicating that high sodium levels generally decrease the IWQI. The red points on the left side of the plot suggest that higher sodium values lead to more negative impacts. X5 (Cl−) also shows significant influence, with points scattered on both sides of the zero line. This indicates that chloride can have both positive and negative effects on the IWQI, depending on its specific value. X1 (TDS) shows a clear trend in which higher values (red points) tend to decrease the IWQI, as evidenced by the clustering of red points on the left side of the plot. This suggests that higher TDS generally has a negative impact on the prediction. X3 (EC) and X4 (HCO3−) display a mix of positive and negative impacts. For X3 (EC), there are instances where high values contribute positively to the IWQI, while in other cases they have a negative effect. X4 (HCO3−) shows a similar pattern, indicating that its impact on the model is context-dependent.
X2 (pH) has an impact concentrated around the zero line, suggesting a relatively neutral effect on the IWQI across different pH levels. However, there are still some instances of positive and negative impacts, indicating variability in its influence. X10 (K+) and X6 (SO42−) show a minimal impact on the IWQI, as evidenced by the clustering of points around the zero line. This indicates that variations in potassium and sulfate levels do not significantly affect the predictions. X8 (Ca2+) and X9 (Mg2+) also exhibit low SHAP values clustered around zero, suggesting a minor influence on the model. However, there are a few instances where high values of these features slightly affect the IWQI.
Figure 10(b) presents a bar chart of mean absolute SHAP values for the 10 features, quantifying their average impact on the IWQI. The bar plot shows that X7 (Na+) has the highest mean SHAP value, indicating that it is the most influential feature in the model. High sodium levels generally decrease the IWQI significantly, as previously observed in the detailed SHAP summary plot. X5 (Cl−) is the second most influential feature, suggesting that chloride levels have a considerable impact on IWQI. Cl− levels can have both positive and negative effects, and the high mean SHAP value emphasizes its overall importance. X1 (TDS) has a moderate mean SHAP value, indicating that TDS have a noticeable impact. The trend observed earlier showed that higher values of TDS generally decrease the IWQI, which is consistent with its moderate influence.
X3 (EC) and X4 (HCO3−) have similar mean SHAP values, showing that both have a moderate but meaningful influence on the model. Their impacts are context-dependent, with both positive and negative contributions depending on specific conditions. X2 (pH) has a lower mean SHAP value than the previously mentioned features, suggesting that pH has a relatively neutral effect on the IWQI, with some variability. This aligns with the observation that its impact is concentrated around the zero line. X10 (K+), X6 (SO42−), X9 (Mg2+), and X8 (Ca2+) have the lowest mean SHAP values, indicating that K+, SO42−, Mg2+, and Ca2+ have a minimal impact on the model's predictions. This corresponds with the earlier observation that variations in these features do not significantly affect the IWQI.
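To make the link between per-sample SHAP values and the mean absolute SHAP ranking concrete, the sketch below computes exact Shapley values for a toy model by enumerating feature orderings, then averages their absolute values over a few samples. Everything in it is an assumption for illustration: the three-feature `toy_model`, its coefficients, the sample values, and the use of a single baseline point for absent features. It is not the explainer used in this study, and tree-based SHAP explainers handle absent features differently.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    Features not yet "switched on" keep their baseline value. Marginal
    contributions are averaged over all feature orderings.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            value = model(current)
            phi[i] += value - previous
            previous = value
    return [p / len(orderings) for p in phi]


def mean_abs_shap(model, samples, baseline):
    """Average absolute Shapley value per feature over `samples`."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def toy_model(features):
    """Hypothetical index that falls with Na and Cl and ignores pH."""
    na, cl, ph = features
    return 100.0 - 2.0 * na - 0.5 * cl - 0.05 * na * cl


if __name__ == "__main__":
    baseline = [5.0, 10.0, 7.0]  # hypothetical reference point (Na, Cl, pH)
    samples = [[10.0, 20.0, 7.0], [3.0, 12.0, 8.0], [8.0, 6.0, 7.5]]
    print(shapley_values(toy_model, samples[0], baseline))
    print(mean_abs_shap(toy_model, samples, baseline))
```

By construction, the Shapley values of a sample sum to the difference between the model output at that sample and at the baseline, and a feature the model ignores (pH here) receives a value of zero. Full enumeration grows factorially with the number of features, so it is practical only for small examples like this one.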
In summary, both the SHAP summary plot and the bar plot provide a comprehensive view of each feature's contribution to IWQI. Features such as sodium (X7), chloride (X5), and TDS (X1) show significant impacts, either positively or negatively, indicating their substantial influence on the IWQI. In contrast, features like potassium (X10), sulfate (X6), magnesium (X9), and calcium (X8) exhibit minimal influence on the predictions. Similar to the findings of Fallatah & Khattab (2023), this study also identified high sodium concentrations as a critical factor in groundwater quality. The study found that sodium concentrations in the collected groundwater samples ranged from 36 to 1,527 mg/L, with an average of 413 mg/L, which exceeded the permissible limit of 200 mg/L set by WHO (2017). This high sodium content can negatively impact irrigation and crop growth due to osmotic stress and other issues (Thirumoorthy et al. 2024). Therefore, the assessment of sodium concentrations is crucial for evaluating groundwater quality for irrigation purposes.
Interactive GUI
On the right side of the GUI, input fields are provided where users can enter the value of each parameter. In the example shown, the values entered are TDS = 3,373 mg/L, pH = 7.00, EC = 5.00 dS/m, HCO3 = 1.36 meq/L, Cl = 36.95 meq/L, SO4 = 13.49 meq/L, Na = 27.01 meq/L, Ca = 8.73 meq/L, Mg = 15.72 meq/L, and K = 0.38 meq/L. After entering the values, the user clicks the ‘Calculate’ button, and the resulting IWQI is displayed at the bottom of the GUI. In this example, the IWQI is 83.44, which falls in the ‘Very Good’ category according to Meireles et al. (2010) and suggests that the water is suitable for irrigation. The developed GUI can therefore be used by local farmers to monitor groundwater quality and make informed decisions about irrigation practices, thereby improving crop yield and sustainability.
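The example inputs can also be sanity-checked independently of the GUI. The sketch below computes several conventional irrigation indices and a charge balance from the ion concentrations listed above. The formulas are the commonly used textbook forms, assumed here to match the definitions in this study: SAR = Na/√((Ca + Mg)/2), Na% = 100(Na + K)/(Ca + Mg + Na + K), KR = Na/(Ca + Mg), and MH% = 100 Mg/(Ca + Mg), all with concentrations in meq/L. The function names are hypothetical, and the sketch does not reproduce the GUI's IWQI prediction, which comes from the trained model.

```python
import math


def irrigation_indices(na, ca, mg, k):
    """Compute SAR, Na%, KR, and MH% from cation concentrations in meq/L."""
    hardness = ca + mg
    if hardness <= 0:
        raise ValueError("Ca + Mg must be positive")
    return {
        "SAR": na / math.sqrt(hardness / 2),
        "Na%": 100 * (na + k) / (na + k + hardness),
        "KR": na / hardness,
        "MH%": 100 * mg / hardness,
    }


def charge_balance_error(cations, anions):
    """Percent difference between total cations and total anions (meq/L)."""
    total_cations = sum(cations)
    total_anions = sum(anions)
    return 100 * (total_cations - total_anions) / (total_cations + total_anions)


# Example values entered in the GUI (all in meq/L).
example = irrigation_indices(na=27.01, ca=8.73, mg=15.72, k=0.38)
balance = charge_balance_error(
    cations=[27.01, 8.73, 15.72, 0.38],  # Na, Ca, Mg, K
    anions=[1.36, 36.95, 13.49],         # HCO3, Cl, SO4
)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.2f}")
    print(f"charge balance error: {balance:.2f}%")
```

A small charge balance error indicates that the entered cation and anion concentrations are mutually consistent, which is a useful check before trusting any index computed from them.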
CONCLUSIONS
This research has provided vital insights into groundwater salinity and its hydrochemical properties, which are critical for assessing the suitability of the El Moghra aquifer for irrigation. Three ML models, RF, AdaBoost, and XGBoost, optimized with BO, proved effective in predicting the IWQI. This approach not only improves our understanding of the aquifer's condition but also demonstrates the capabilities of predictive models in environmental science. The key findings are as follows:
The good agreement between predicted and actual IWQI values, reflected in high R² values, provides substantial evidence in favor of using ML models to predict IWQI.
The optimization of hyperparameters notably enhanced the models' performance. For instance, tuning raised the R² of the XGBoost model in the TS stage from 0.570 to 0.872. This highlights the critical role of hyperparameter tuning in achieving optimal model performance.
The XGBoost model consistently outperformed RF and AdaBoost models across both the TR and TS stages. This superior performance was further confirmed through the BO + 5CV process, making XGBoost the most accurate and reliable model.
The rank analysis further quantified the overall performance of the models, with XGBoost achieving the best (lowest) overall rank of 12, ahead of AdaBoost (23) and RF (27).
Based on the SHAP feature importance analysis, sodium (Na+) concentration was identified as the most significant factor affecting groundwater quality.
The developed user-friendly GUI has practical applications that would enable local farmers and water managers to utilize the research findings effectively.
Despite these advancements, the study recognizes several limitations. The reliance on a selected set of ML models may not comprehensively capture all hydrochemical dynamics due to model-specific biases or assumptions. Additionally, the focus on a specific geographic region may limit the generalizability of the findings to other arid environments with different hydrogeological characteristics. For future research directions, it is recommended to incorporate a wider range of predictive modeling techniques and to extend the geographic scope of the studies to validate and possibly enhance the applicability of the findings. To mitigate groundwater salinity, it is recommended that irrigation practices in the northern region be adjusted, including the use of alternative water sources or the implementation of salt-tolerant crops.
Moreover, it is crucial to implement continuous monitoring and adaptive management strategies to respond effectively to any changes in groundwater quality, thereby ensuring the sustainability of agricultural productivity in the El Moghra region and similar arid areas. These strategies will be vital in adapting to potential environmental changes and in making informed decisions for water resource management. Furthermore, tools such as the developed GUI would help end-users make informed decisions about water use without needing in-depth knowledge of ML. Policymakers should consider integrating the insights from this study into regulations and guidelines for sustainable water resource management, particularly in arid regions prone to water scarcity. Optimizing water use efficiency and prioritizing water quality monitoring should be central strategies in sustaining agricultural productivity. Additionally, offering training programs for local water managers and technical staff on interpreting and using ML predictions could enhance the operational utility of these advanced tools in routine decision-making processes.
ACKNOWLEDGEMENTS
The authors thank the editor and the anonymous reviewers for their valuable comments.
FUNDING
This research received no external funding.
AUTHOR CONTRIBUTIONS
All authors have equal contributions toward the completion of the research work.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare no conflict of interest.