This study assessed groundwater quality in Ha'il against World Health Organization (WHO) standards using the entropy-weighted water quality index (EWQI). Several groundwater quality parameters were investigated, and more than 75% of the variation in water quality in Ha'il was attributed to four main factors (MF1, MF2, MF3, and MF4). MF1 played the largest role, controlling more than 33% of the variation. According to the entropy calculated for each parameter, zinc had the greatest influence on groundwater quality. The EWQI results showed that most samples (76%) fell in Rank 2, indicating good quality. The EWQI was also coupled with machine-learning techniques to improve model performance. The efficiency criteria improved noticeably: the root-mean-square error decreased by 25%, and the determination coefficient (R2) increased by 27.94%.

  • The study evaluates the groundwater quality in Ha'il, Saudi Arabia, utilizing entropy-weighted water quality index to provide a more accurate evaluation of water quality according to the WHO standards.

  • The study examines 12 parameters in groundwater quality and uses factor analysis to identify the four most important factors affecting water quality in Ha'il, Saudi Arabia.

In recent decades, with the increase in population and industrial expansion, the demand for using surface and groundwater resources has greatly increased (Javidan et al. 2022; Khosravi et al. 2022; Yavari et al. 2022). Human health and hygiene are closely related to the quality of groundwater, which is the most significant source of consumable water in many parts of the world (Aliasghar et al. 2022; Xiong et al. 2022). Groundwater quality is affected by various factors such as the entry of urban, rural, and industrial sewage, the entry of chemical fertilizers, leakage from reservoirs and oil transmission lines, and the location of waste disposal sites, which can lead to its deterioration and destruction (Bonte et al. 2011). With the increase in urbanization and the development of cities, there has been an increase in the extraction of groundwater resources and improper disposal of waste produced from these areas, as well as an increase in the production of industrial units to supply goods and materials, which has resulted in the widespread contamination and leakage of pollutants from these areas (Kumar et al. 2022; Nihalani et al. 2022; Tran et al. 2023).

As mentioned earlier, the groundwater quality is influenced by various factors, including urban, rural, and industrial sewage, chemical fertilizers, and waste disposal sites. However, a more comprehensive analysis of the magnitude and significance of these factors in the specific context of Ha'il, Saudi Arabia, would provide valuable insights. With a detailed analysis of the specific sources and their respective contributions to groundwater deterioration in Ha'il, Saudi Arabia, the understanding of the environmental challenges faced in the region can be enhanced. Including relevant statistics, case studies, and local data will strengthen the impact of this discussion and provide a more comprehensive picture of the groundwater pollution sources in Ha'il (El-Rawy et al. 2023).

Various methods are currently used to evaluate water quality, including fuzzy methods, membership degree, factor analysis, grey modeling, and hierarchical analytical processing (Abad et al. 2023). However, these methods cannot accurately describe water pollutants, and they cannot show whether the parameters selected for assessing groundwater quality in the region of interest are appropriate (Jianhua et al. 2011; Nihalani et al. 2022).

The water quality index (WQI) is a mechanistic method that uses a numerical expression to determine the overall quality of groundwater in a specific area. It is now widely used in many parts of the world because it expresses and describes water quality information fully and relies on important, influential parameters for assessing and managing groundwater quality (dos Santos Simoes et al. 2008; Luo et al. 2021). Unlike other methods such as fuzzy methods, membership degree, factor analysis, grey modeling, and hierarchical analytical processing, the WQI offers distinct advantages in groundwater quality assessment:

  • (1)
    Comprehensive representation: The WQI provides a comprehensive representation of water quality by considering multiple parameters simultaneously. It takes into account a range of water quality parameters, such as pH, dissolved oxygen (DO), conductivity, turbidity, and the presence of various contaminants, to provide a holistic assessment. This comprehensive approach enables a more accurate evaluation of overall water quality compared to methods that rely on a single parameter or a limited set of parameters.

  • (2)
    Weighted parameter evaluation: The WQI incorporates weighting factors for different parameters based on their relative importance in determining water quality. This weighting allows for a more realistic representation of the significance of each parameter in the overall assessment. By assigning appropriate weights, the WQI can effectively capture the varying impacts of different parameters on water quality, providing a more nuanced evaluation.

  • (3)
    Standardized evaluation: The WQI employs standardized criteria or thresholds for different water quality parameters, often based on regulatory guidelines or international standards such as those set by the World Health Organization (WHO). This standardized approach ensures consistency and comparability in assessing water quality across different regions and facilitates effective decision-making regarding water resource management and protection.

  • (4)
    Numerical expression and interpretability: The WQI provides a numerical expression of water quality, typically on a scale from 0 to 100 or in different quality categories, making it easily interpretable for stakeholders and nonexperts. This allows for straightforward communication of water quality information and facilitates understanding of the severity or overall condition of the water resource.

  • (5)
    Management and policy implications: The WQI's numerical expression and comprehensive assessment can be valuable for water resource management and policy formulation. It helps in identifying areas or specific parameters that require immediate attention and remedial measures. By using the WQI as a decision-support tool, policymakers and water resource managers can prioritize actions, allocate resources effectively, and implement targeted measures for water quality improvement (Patel et al. 2023).

Numerous studies have been conducted in this regard, and some of them are mentioned here. Prabha et al. analyzed the groundwater quality of the Hosur region by examining multiple points and using 14 hydrochemical parameters to determine that the samples under investigation were not suitable for direct use as drinking water (Rajankar et al. 2009). In a study conducted by Rajankar et al. (2009), the quality of groundwater in 22 locations in Maharashtra, India, was analyzed during both wet and dry seasons, with the results showing significant changes in groundwater quality based on the WQI. Reza & Singh (2010) evaluated the groundwater quality in Orissa, India, and reported that the groundwater in this area is significantly affected by the concentration of soluble salts (fluoride, nitrate, calcium, and magnesium) according to WQI. Ishaku (2011) conducted a study to evaluate the quality of groundwater in the Jimeta-Yola region in northeastern Nigeria, with WQI showing a lower value during the dry season compared to the wet season, indicating better water quality in this area.

Despite the widespread use of the WQI, the weight of each parameter in WQI calculations is determined empirically by the specialist, so much valuable information about groundwater quality is ignored. The entropy-weighted water quality index (EWQI) optimizes the WQI equation and improves the accuracy and reliability of groundwater quality assessment through (1) consideration of parameter importance, (2) handling of parameter interactions, (3) flexibility and adaptability, and (4) addressing the limitations of the traditional WQI. Overall, entropy weighting gives the EWQI a more comprehensive and nuanced approach to groundwater quality assessment: it enhances the accuracy of pollutant descriptions, considers parameter interactions, and allows for customization based on local conditions, thereby improving the overall reliability of the assessment (Unigwe & Egbueri 2023).

As a novel approach to assessing groundwater quality, the EWQI offers several advantages over existing methods: (1) incorporating parameter importance, (2) accounting for parameter interactions, (3) improved objectivity and transparency, (4) addressing data uncertainties, and (5) enhancing the accuracy of assessments. These features address the limitations mentioned earlier. By employing entropy-weighting techniques, researchers can provide more accurate and robust evaluations of groundwater quality, leading to more informed decision-making and more effective water resource management strategies.

The EWQI was developed through a rigorous process that involved several adjustments and modifications to incorporate entropy weighting. The specific adjustments included calculating entropy values and weights for each parameter, with higher weights assigned to parameters with a greater influence on groundwater quality. These adjustments were made to address the limitations of the traditional WQI, which often relied on empirical determination of parameter weights.

By incorporating entropy weighting, the EWQI method aims to provide a more accurate and comprehensive assessment of groundwater quality. The use of entropy weights helps to capture the information content of each parameter, considering both their individual importance and their interactions within the groundwater system. This approach enables a more nuanced evaluation of water pollutants and their impacts on overall water quality.

The introduction of entropy weighting in the EWQI method also addresses the limitation of subjective weight assignment in traditional methods. By using a quantitative and data-driven approach, the EWQI method enhances the objectivity and reliability of groundwater quality assessments. It allows for a more robust comparison of different parameters and their relative contributions to water quality.

However, it is important to note that the development and application of the EWQI method are not without challenges. The calculation of entropy weights requires a comprehensive dataset and appropriate statistical techniques. Ensuring the accuracy and representativeness of the data used for entropy calculations is crucial for obtaining reliable results.

In addition, while the EWQI method provides a valuable approach for assessing groundwater quality, it should not be considered in isolation. Combining the EWQI method with other complementary methods, such as machine-learning techniques like artificial neural networks (ANNs) and the M5 model tree, can further enhance the accuracy and predictive capabilities of groundwater quality assessment. The integration of multiple approaches allows for a more comprehensive understanding of water quality dynamics and improves the overall reliability of the assessment.

In summary, the development of the EWQI method using entropy weight represents a significant advancement in groundwater quality assessment. Its incorporation of entropy-weighting addresses the limitations of traditional methods and provides a more objective and comprehensive evaluation of water pollutants. However, further research and exploration are needed to refine the methodology and explore its integration with other approaches to maximize its effectiveness in assessing groundwater quality (Mohinuddin et al. 2023).

In this study, the optimized WQI equation based on entropy weighting (EWQI) was employed to evaluate the quality of groundwater in Ha'il, a significant case study in Saudi Arabia discussed at the beginning of this section. As a novel strategy, machine-learning techniques were employed to improve the efficiency of the EWQI method, and the results were compared. This article is organized as follows: the methods are described in Section 2, and the results are discussed in Section 3. Section 4 provides concluding explanations to give a comprehensive perspective, along with suggestions for future studies.

Study area

The Saq aquifer (Figure 1) is located in the northwestern part of the Kingdom of Saudi Arabia and covers 375,000 km2 in the north of the country. The Saq Sandstones are separated by the vast aeolian Nafud Desert and can be divided into the eastern Qassim-Ha'il region, where natural groundwater flows toward the north-east, and the western Tabuk-Tayma region, where the flow direction is generally northward. The Saq aquifer is an important source of groundwater for human consumption and agriculture in the region. However, like many aquifers around the world, it faces challenges related to overextraction and contamination from sources such as urban, rural, and industrial sewage, chemical fertilizers, oil transmission lines, and waste disposal sites. Therefore, the Saq aquifer needs to be studied to better understand its characteristics and the factors affecting its groundwater quality, and to develop effective management strategies for its sustainable use.
Figure 1

Location of Saq aquifer in Saudi Arabia.

Research method

In this study, 50 groundwater samples collected from Qassim-Ha'il, Saudi Arabia, were used for water quality analysis. The samples were taken from various points in the plain, including deep wells for agricultural and drinking water, shallow rural wells, wells in industrial and recreational areas, and wells drilled near pollution sources such as urban and rural sewage discharges. Sixteen water quality parameters were used to assess groundwater quality for drinking purposes: magnesium (Mg), potassium (K), sodium (Na), calcium (Ca), nitrate (NO3), zinc (Zn), chromium (Cr), chloride (Cl), sulfate (SO4), bicarbonate (HCO3), alkalinity (Alkal), electrical conductivity (EC), total dissolved solids (TDS), acidity (pH), DO, and biochemical oxygen demand (BOD). Alkalinity is crucial in groundwater quality assessment due to its role as a buffer against pH changes, maintaining stable conditions for aquatic life and ecosystems. It provides insights into water sources, geochemical processes, and correlations with other parameters. Additionally, alkalinity affects human water use by influencing the taste and quality of drinking water. Monitoring alkalinity helps in water management, pollution identification, and environmental protection.

In the first stage of the study, factor analysis, analysis of the statistical properties of the hydrochemical parameters, and investigation of the geological setting of the sampling area were used to determine the contribution of each parameter to the qualitative variations of the samples. After these preliminary investigations, the WQI and EWQI methods were employed to analyze the quality of groundwater in Qassim-Ha'il, Saudi Arabia. The ANN and the M5 model tree, two well-known machine-learning techniques, were then employed to determine the entropy in the EWQI method, so that the results could be compared, the combination of artificial intelligence and entropy-based models could be explored, and the efficiency criteria could be calculated (Figure 2).
Figure 2

The schematic of the proposed research method.

WQI

The WQI is a numerical indicator that provides a comprehensive assessment of the overall water quality based on multiple water quality parameters. It is widely used as a tool for evaluating and comparing the suitability of water for different purposes, such as drinking, irrigation, and aquatic ecosystem health. The calculation of WQI involves several steps. First, a set of water quality parameters are selected based on their significance in determining water quality. These parameters may include physical, chemical, and biological variables such as temperature, pH, DO, turbidity, nutrients, and pollutants. Once the parameters are selected, each parameter is assigned a weight or importance factor based on its relative significance in influencing water quality. The weighting process is subjective and can be based on expert opinions, regulatory standards, or stakeholder preferences. Next, the measured values of each parameter are compared to standard or guideline values specified for the intended use of water. Deviations from the standards are quantified using a rating scale or a numerical scoring system. Typically, a higher score is assigned to values closer to the desired standards, indicating better water quality. The individual scores for each parameter are then aggregated using an appropriate mathematical formula to obtain a single value, which represents the overall WQI. Different aggregation methods may be employed, such as arithmetic mean, geometric mean, or weighted sum, depending on the specific approach adopted. Finally, the WQI value is often presented on a scale or rating system to facilitate interpretation. This scale can range from 0 to 100 or may be divided into categories, such as excellent, good, fair, poor, and very poor, representing different levels of water quality (Table 1) (Pei-Yue et al. 2010).

Table 1

Quality classification of drinking water based on WQI

WQI        Rank   Water quality
<49        1      Very good
49–99      2      Good
99–149     3      Average
149–199    4      Bad
>199       5      Very bad

The formula for calculating the WQI varies depending on the specific methodology used. One widely adopted approach is the weighted arithmetic mean method. The formula for calculating WQI using this method is as follows:
$$\mathrm{WQI} = \frac{\sum_{i=1}^{n} W_i R_i}{\sum_{i=1}^{n} W_i} \tag{1}$$
where W1, W2, … , Wn represent the weights assigned to each water quality parameter. R1, R2, … , Rn represent the rating scores or index values associated with each parameter. The weights (Wi) are determined based on the relative importance or significance of each parameter in influencing water quality. These weights can be assigned subjectively by experts or based on regulatory guidelines. The rating scores (Ri) are calculated for each parameter by comparing the measured value of that parameter to the respective standard or guideline value. The rating scores can be defined using a numerical scale, such as 0 to 100 or 0 to 1, where higher scores indicate better water quality. Alternatively, some methodologies use a nonlinear transformation to obtain the rating scores. Once the WQI is calculated using Equation (1), it provides a single value that represents the overall water quality, taking into account the contributions of multiple parameters. This value allows for easier interpretation and comparison of water quality across different locations or time periods.
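The weighted arithmetic mean described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual form in which the weighted rating scores are divided by the sum of the weights; the function name `weighted_wqi` and the example weights and ratings are hypothetical and are not taken from the study.

```python
def weighted_wqi(weights, ratings):
    """Weighted arithmetic mean of rating scores: sum(W_i * R_i) / sum(W_i)."""
    if len(weights) != len(ratings) or not weights:
        raise ValueError("weights and ratings must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, ratings)) / total_weight


# Hypothetical weights and rating scores for three parameters (illustration only).
example_weights = [4, 2, 1]
example_ratings = [80.0, 50.0, 20.0]
example_wqi = weighted_wqi(example_weights, example_ratings)
print(round(example_wqi, 2))
```

Dividing by the sum of the weights keeps the index on the same scale as the rating scores, so the weights do not need to be normalized beforehand.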

EWQI

The concept of entropy was first presented by Shannon in 1948 to express the uncertainty of a random event or the amount of information content of a parameter (Shyu et al. 2011). Shannon entropy indicates the uncertainty of predicted data from a probable event. In mathematical terms, there is an inverse relation between the values of data and the probability of an event. If an event is predicted accurately, its probability will be high, and conversely, the Shannon entropy will be small. Therefore, information and uncertainty explain the two components of information gained and are indirectly calculated by reducing the amount of uncertainty (Crutchik et al. 2020).

Nowadays, various fields such as ecology, hydrology, and water quality use the theory of entropy (Ozkul et al. 2000; Kawachi et al. 2001; Ulanowicz 2001). Shannon entropy can be defined as follows.

Suppose that n pieces of data are available in the form x1, x2, … , xn with probabilities p(x1), p(x2), … , and p(xn). The fundamental assumption of entropy is based on the amount of data. H(x) is a real, nonnegative, additive, and continuous function of the probability p. Therefore, the entropy H(x) can be defined as follows (Shyu et al. 2011):
$$H(x) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i) \tag{2}$$
where p(xi) is the probability of xi.
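The behavior of Shannon entropy described above (small for a nearly certain event, large for an uncertain one) can be checked numerically. The following is a minimal sketch, assuming the standard definition H = -Σ p log p with a base-2 logarithm; the base and the function name `shannon_entropy` are assumptions made for illustration.

```python
import math


def shannon_entropy(probabilities, base=2.0):
    """Shannon entropy of a discrete distribution; terms with p = 0 contribute 0."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


# A certain outcome carries no uncertainty; a uniform one carries the most.
certain = shannon_entropy([1.0, 0.0])
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
print(certain, uniform)
```

With four equally likely outcomes the entropy reaches its maximum of log2(4) = 2 bits, while a fully predictable outcome gives zero.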

As previously mentioned, in this study, the optimized WQI index was used to evaluate water resources in this plain. To calculate the EWQI, three steps should be followed. In the first stage, the entropy weight of each parameter should be obtained. The stages of calculating entropy, entropy weight, and EWQI are as follows.

Assuming that there are m water samples (i=1, 2,…, m) and n parameters (j=1, 2,…, n) for evaluating the water quality, based on the observed data, the eigenvalue matrix X will be as follows:
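$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \tag{3}$$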
Next, data preparation should be done to reduce the influence of the unit differences of different quality parameters and also the differences in the quality of the samples. Based on the properties of each index, four types and modes can be distinguished, including interval type, fixed type, cost type, and efficiency type. For the efficiency type, the normalization function can be expressed as follows:
$$y_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{4}$$
For the cost type, the normalization function of the data is expressed as follows:
$$y_{ij} = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{5}$$
After normalizing the raw data, the data matrix is expressed as follows:
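$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix} \tag{6}$$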
Next, the ratio of the value of the index for parameter j in sample i should be calculated using the following equation:
$$P_{ij} = \frac{y_{ij}}{\sum_{i=1}^{m} y_{ij}} \tag{7}$$
The information entropy is also expressed as follows:
$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} P_{ij} \ln P_{ij} \tag{8}$$
The lower the entropy value, the greater the impact of parameter j. After calculating the entropy value, the entropy weight (Wj) of each parameter (j) should be obtained using the following equation:
$$W_j = \frac{1 - e_j}{\sum_{j=1}^{n} \left(1 - e_j\right)} \tag{9}$$
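The first stage (normalization, ratio, information entropy, and entropy weight) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes min-max normalization for the efficiency and cost types, the ratio P_ij = y_ij / Σ_i y_ij, and the convention that 0·ln 0 = 0. The function name `entropy_weights`, the `cost_type` flag, and the sample matrix are hypothetical.

```python
import numpy as np


def entropy_weights(X, cost_type=None):
    """Return (entropy e_j, weight W_j) for each column of an m-by-n data matrix."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    if cost_type is None:
        cost_type = np.zeros(n, dtype=bool)  # assumption: efficiency type by default
    cost_type = np.asarray(cost_type, dtype=bool)

    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    if np.any(span == 0):
        raise ValueError("a constant column cannot be min-max normalized")

    # Min-max normalization: efficiency type first, then overwrite cost-type columns.
    Y = (X - col_min) / span
    Y[:, cost_type] = ((col_max - X) / span)[:, cost_type]

    # Ratio of each sample within its parameter column.
    P = Y / Y.sum(axis=0)

    # Information entropy per parameter, treating 0 * ln(0) as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    e = -plogp.sum(axis=0) / np.log(m)

    # Entropy weight: lower entropy gives a larger weight.
    w = (1.0 - e) / (1.0 - e).sum()
    return e, w


# Hypothetical data: 4 samples (rows) by 3 parameters (columns).
X_demo = [[10.0, 1.0, 7.1],
          [20.0, 1.1, 7.4],
          [30.0, 5.0, 7.2],
          [40.0, 1.2, 7.9]]
entropy_values, weights_demo = entropy_weights(X_demo)
print(entropy_values, weights_demo)
```

In this made-up matrix the second column is dominated by one extreme sample, so its entropy is lower and its weight higher than the evenly spread first column, which matches the statement that a lower entropy value means a greater impact.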
The second stage in obtaining the EWQI determines the quality ranking criteria (qj) for each parameter, which can be calculated using the following equation:
$$q_j = \frac{C_j}{S_j} \times 100 \tag{10}$$
In this equation, Cj is the concentration of each chemical parameter in each sample (mg/L), and Sj is the permissible concentration of the same parameter in drinking water according to a specified standard (in this study, the WHO standard) in mg/L. If parameter j is not present in the water, qj equals zero; if its value equals the allowable value, qj equals 100. It should be noted that pH varies over a narrow range, which makes the quality ranking criterion for this parameter very small as well. According to the WHO standards, the permissible pH range is 6.5 to 9.2 (Table 2); therefore, the following equation is used to calculate the quality ranking criterion for pH:
$$q_{pH} = \frac{C_{pH} - S_I}{S_{pH} - S_I} \times 100 \tag{11}$$

In this equation, qpH is the quality ranking criterion for pH, CpH is the observed pH value, SpH is the maximum permissible pH value (according to WHO, equal to 9.2), and SI is the ideal pH value.

The final step in calculating the EWQI is to multiply the weight of entropy and the quality ranking criterion for each parameter and sum all these values together (Jianhua et al. 2011):
$$\mathrm{EWQI} = \sum_{j=1}^{n} W_j\, q_j \tag{12}$$
Based on EWQI, groundwater can be classified into five categories of very good, good, moderate, bad, and very bad in terms of drinking water quality (Table 1 is also applied to the subject of EWQI).
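The second and third stages can be illustrated with a short Python sketch that computes the quality ranking criterion, the EWQI, and the class from Table 1. It assumes qj = Cj / Sj × 100, a pH rating measured from an ideal value toward the maximum permissible value of 9.2, and class boundaries read from Table 1. The ideal pH of 7.0, the function names, and all sample values are assumptions for illustration only.

```python
def quality_rating(concentration, standard):
    """q_j = C_j / S_j * 100: 0 when absent, 100 at the permissible limit."""
    if standard <= 0:
        raise ValueError("standard must be positive")
    return concentration / standard * 100.0


def ph_quality_rating(ph, ph_max=9.2, ph_ideal=7.0):
    """pH rating relative to an ideal value; ph_ideal = 7.0 is an assumption."""
    return (ph - ph_ideal) / (ph_max - ph_ideal) * 100.0


def ewqi(weights, ratings):
    """EWQI as the sum of entropy weight times quality rating."""
    if len(weights) != len(ratings):
        raise ValueError("weights and ratings must have the same length")
    return sum(w * q for w, q in zip(weights, ratings))


def classify(index_value):
    """Map an index value to the classes of Table 1."""
    if index_value < 49:
        return "Very good"
    if index_value < 99:
        return "Good"
    if index_value < 149:
        return "Average"
    if index_value <= 199:
        return "Bad"
    return "Very bad"


# Hypothetical sample: two concentrations with their standards (mg/L) and one pH value.
weights = [0.5, 0.3, 0.2]  # hypothetical entropy weights that sum to 1
ratings = [
    quality_rating(50.0, 100.0),
    quality_rating(100.0, 250.0),
    ph_quality_rating(7.5),
]
sample_ewqi = ewqi(weights, ratings)
sample_class = classify(sample_ewqi)
print(round(sample_ewqi, 2), sample_class)
```

Because the entropy weights sum to one, the EWQI stays on the same scale as the individual ratings, where 100 corresponds to a parameter sitting exactly at its permissible limit.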

ANN

ANNs are computational models inspired by the structure and functioning of biological neural networks, particularly the human brain. ANNs are powerful machine-learning algorithms capable of learning complex patterns and relationships from input data. They are widely used for various tasks such as pattern recognition, prediction, classification, and data modeling. The basic building block of an ANN is an artificial neuron, also known as a perceptron. Neurons receive inputs, apply a mathematical transformation to them, and produce an output. In an ANN, neurons are organized into layers, including an input layer, one or more hidden layers, and an output layer. The connections between neurons, represented by weights, determine the strength and importance of the input signals. The learning process in an ANN involves training the network on a labeled dataset, where the network adjusts its internal weights based on the input–output pairs. This is typically done using optimization algorithms such as gradient descent, which minimize the difference between the network's predicted outputs and the actual outputs. During training, the network learns to generalize from the training examples and can make predictions on unseen data. ANNs are known for their ability to capture complex nonlinear relationships in data. They can handle large amounts of input variables, detect intricate patterns, and adapt to changing conditions. ANNs can be trained in a supervised manner, where the desired outputs are known, or in an unsupervised manner, where the network learns patterns and structures in the data without explicit labels. The success of ANNs lies in their ability to automatically learn and extract meaningful features from raw data, eliminating the need for manual feature engineering. This makes ANNs highly versatile and applicable to various domains, including image and speech recognition, natural language processing, time series analysis, and more. 
However, ANNs often require substantial computational resources, and their training and interpretation can be complex. It is worth noting that there are different architectures and variations of ANNs, such as feedforward neural networks, recurrent neural networks, convolutional neural networks, and deep neural networks, each designed to address specific types of problems. The choice of the ANN architecture depends on the nature of the data and the task at hand (Yan & Jia 2023).
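The training loop described above (forward pass, error, gradient descent on the weights) can be sketched with a small feedforward network in NumPy. This is a generic illustration on synthetic data, not the network configured in this study: the architecture (one hidden layer of eight tanh neurons), the learning rate, the number of epochs, and the function name `train_mlp` are all assumptions.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: hidden activations, then a linear output neuron.
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        error = pred - y
        losses.append(float(np.mean(error ** 2)))

        # Backward pass: gradients of the mean squared error.
        grad_pred = 2.0 * error / len(X)
        grad_W2 = h.T @ grad_pred
        grad_b2 = grad_pred.sum(axis=0)
        grad_z = (grad_pred @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        grad_W1 = X.T @ grad_z
        grad_b1 = grad_z.sum(axis=0)

        # Gradient descent update.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return (W1, b1, W2, b2), losses


# Synthetic nonlinear target (illustration only, not groundwater data).
rng = np.random.default_rng(42)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
y_demo = (np.sin(3.0 * X_demo[:, 0]) + 0.5 * X_demo[:, 1]).reshape(-1, 1)
params, loss_history = train_mlp(X_demo, y_demo)
print(loss_history[0], loss_history[-1])
```

The recorded loss history shows how the network error on the training data shrinks as the weights are adjusted, which is the behavior the paragraph above attributes to the learning process.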

M5 model tree

The M5 model tree is a machine-learning algorithm that combines decision trees with linear regression. It is an extension of the traditional decision tree algorithm that aims to improve the accuracy and interpretability of the model. Like decision trees, the M5 model tree is a predictive model that uses a tree-like structure to make predictions based on input features. The tree is constructed by recursively splitting the data based on different feature thresholds, creating branches that represent different decision paths. Each leaf node of the tree corresponds to a prediction or a linear regression model. The M5 model tree improves upon traditional decision trees by introducing linear regression models at the leaf nodes. This allows for more accurate predictions, especially when the relationship between the input features and the target variable is not strictly categorical. The linear regression models provide continuous outputs, enabling more precise estimation and handling of numerical data. One of the key advantages of the M5 model tree is its ability to capture both global and local relationships in the data. The tree structure captures the global patterns and relationships between the features, while the linear regression models at the leaf nodes capture the local relationships specific to each leaf. This combination allows the model to adapt to different regions of the feature space and provide more accurate predictions. The M5 model tree also offers interpretability by providing transparent rules for making predictions. Each split in the tree represents a decision based on a specific feature, allowing for easy interpretation of the model's reasoning. In addition, the linear regression models at the leaf nodes provide insight into the relationship between the input features and the target variable. The M5 model tree algorithm involves two main steps: tree construction and model pruning. 
In the tree construction phase, the algorithm recursively splits the data based on different feature thresholds to create an initial tree structure. In the pruning phase, the algorithm assesses the quality and complexity of the tree and prunes unnecessary branches to prevent overfitting and improve generalization. Overall, the M5 model tree algorithm combines the simplicity and interpretability of decision trees with the accuracy of linear regression models. It is particularly useful when dealing with datasets that exhibit both categorical and continuous relationships, providing a flexible and powerful modeling technique for a wide range of applications (Nejatian et al. 2023).
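The idea of linear regression models at the leaves can be illustrated with a deliberately simplified model tree that makes a single split. The sketch below chooses the split by standard deviation reduction and fits an ordinary least-squares line in each leaf. It omits the recursive growth, pruning, and smoothing of the full M5 algorithm, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def best_split(X, y, min_leaf=5):
    """Pick the (feature, threshold) with the largest standard deviation reduction."""
    best_feature, best_threshold, best_sdr = None, None, 0.0
    sd_all = np.std(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = int(left.sum())
            n_right = len(y) - n_left
            if n_left < min_leaf or n_right < min_leaf:
                continue
            sdr = sd_all - (n_left * np.std(y[left]) + n_right * np.std(y[~left])) / len(y)
            if sdr > best_sdr:
                best_feature, best_threshold, best_sdr = j, t, sdr
    return best_feature, best_threshold


def fit_one_split_model_tree(X, y, min_leaf=5):
    """Single-split model tree: one decision node with a linear model in each leaf."""
    feature, threshold = best_split(X, y, min_leaf)
    if feature is None:
        return {"leaf": fit_linear(X, y)}
    left = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": fit_linear(X[left], y[left]),
        "right": fit_linear(X[~left], y[~left]),
    }


def predict_model_tree(tree, X):
    if "leaf" in tree:
        return predict_linear(tree["leaf"], X)
    left = X[:, tree["feature"]] <= tree["threshold"]
    out = np.empty(len(X))
    out[left] = predict_linear(tree["left"], X[left])
    out[~left] = predict_linear(tree["right"], X[~left])
    return out


# Synthetic piecewise-linear data with a jump at x = 5 (illustration only).
x_demo = np.linspace(0.0, 10.0, 100).reshape(-1, 1)
y_demo = np.where(x_demo[:, 0] <= 5.0, 2.0 * x_demo[:, 0], 30.0 - x_demo[:, 0])

tree = fit_one_split_model_tree(x_demo, y_demo)
tree_rmse = float(np.sqrt(np.mean((predict_model_tree(tree, x_demo) - y_demo) ** 2)))
linear_rmse = float(np.sqrt(np.mean((predict_linear(fit_linear(x_demo, y_demo), x_demo) - y_demo) ** 2)))
print(tree_rmse, linear_rmse)
```

On this synthetic data a single global line cannot follow both regimes, whereas the two leaf models each capture their own local linear relationship, which is the global/local distinction drawn in the text.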

Factor analysis

Factor analysis is a statistical technique used to examine the interrelationships among a set of observed variables and identify underlying factors that explain the patterns of covariance among these variables. It aims to reduce the complexity of a dataset by extracting a smaller number of latent factors that capture the common variance shared by the observed variables. In the context of groundwater quality assessment, factor analysis can help identify the key underlying factors that contribute to variations in water quality parameters. By analyzing the correlations among the measured parameters, factor analysis can group them into a smaller set of factors that represent different aspects of water quality. The results of factor analysis provide insights into the dominant factors influencing water quality and can help in simplifying the interpretation of complex datasets. Each factor represents a linear combination of the original variables, and the factor loadings indicate the strength and direction of the relationship between the factor and each variable. By analyzing the loadings, researchers can understand which variables contribute the most to each factor and interpret the underlying processes or sources that influence water quality. Factor analysis can be a valuable tool for understanding the underlying structure of water quality data, identifying the key factors affecting water quality, and informing decision-making processes related to water resource management and pollution control.
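$$\mathrm{cov}_{x,y} = \frac{\sum_{i=1}^{N} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{N - 1} \tag{13}$$
where covx,y is the covariance between variables x and y, xi is a data value of x, yi is a data value of y, x̄ and ȳ are the means of x and y, and N represents the number of data values.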
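The covariance that underlies factor analysis can be computed directly from its definition. The sketch below assumes the sample form with N − 1 in the denominator, which is an assumption because the source does not state the denominator, and cross-checks it against NumPy's `np.cov`. The function name and the paired values are hypothetical and are not taken from Table 2.

```python
import numpy as np


def sample_covariance(x, y):
    """Sample covariance of two equally long series (N - 1 denominator assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must have the same length of at least 2")
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1))


# Hypothetical paired measurements of two water quality parameters.
ec_values = [500.0, 1200.0, 1700.0, 2500.0]
tds_values = [320.0, 780.0, 1100.0, 1600.0]

cov_manual = sample_covariance(ec_values, tds_values)
cov_numpy = float(np.cov(ec_values, tds_values)[0, 1])  # np.cov uses N - 1 by default
print(cov_manual, cov_numpy)
```

A large positive covariance between two parameters is what leads factor analysis to group them under a common factor.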

Efficiency criteria

When comparing the results of hybrid models like ANN/M5-EWQI with the simple EWQI, several efficiency criteria can be used to assess their performance. The choice of criteria depends on the specific objectives and requirements of the study. Here are some commonly used efficiency criteria:

  • (1)
    Root-mean-square error (RMSE): RMSE measures the average difference between the predicted and observed values. A lower RMSE indicates better accuracy and closer agreement between the predicted and actual values (Nourani et al. 2019a).
    $$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(O_i - P_i\right)^2} \tag{14}$$
  • (2)
    Coefficient of determination (R2): R2 represents the proportion of the variance in the observed data that is explained by the model. It ranges from 0 to 1, with a higher value indicating a better fit of the model to the data (Nourani et al. 2019b).
    $$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(O_i - P_i\right)^2}{\sum_{i=1}^{N} \left(O_i - \bar{O}\right)^2} \tag{15}$$
    where RMSE, R2, N, Oi, Pi, and Ō are the root-mean-square error, determination coefficient, number of observations, observed sample value, predicted value, and mean of the observed sample values, respectively.
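Both efficiency criteria are straightforward to compute. The following sketch assumes the common definitions, with RMSE as the square root of the mean squared difference and R2 as one minus the ratio of the residual to the total sum of squares. The function names and the observed and predicted values are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def r_squared(observed, predicted):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return float(1.0 - ss_res / ss_tot)


# Hypothetical observed and predicted index values (illustration only).
observed_demo = [1.0, 2.0, 3.0, 4.0]
predicted_demo = [1.1, 1.9, 3.2, 3.8]
rmse_demo = rmse(observed_demo, predicted_demo)
r2_demo = r_squared(observed_demo, predicted_demo)
print(round(rmse_demo, 4), round(r2_demo, 4))
```

A hybrid model would be judged better than the simple EWQI when it yields a lower RMSE and a higher R2 on the same observations.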

Discussion about the EWQI obtained results

Before calculating the EWQI value for each water sample, statistical analysis was performed on the parameters of interest. Table 2 presents the statistical properties of the 50 samples used, along with the WHO standards for drinking water.

Table 2

Statistical summary along with WHO standard for each parameter

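| Parameter | Minimum | Maximum | Mean | SD | WHO |
|---|---|---|---|---|---|
| Ca | 18 | 819 | 117.34 | 115.29 | 74 |
| Na | | 2,381.8 | 189.75 | 249.98 | 198 |
| K | | 90 | 16.01 | 15 | 198 |
| Mg | 1.5 | 144.91 | 28.68 | 30.82 | 50 |
| NO3 | | 334.88 | 38 | 46.50 | 40–70 |
| HCO3 | 60 | 544.1 | 147.98 | 60.59 | 500 |
| SO4 | 0.3 | 2,121 | 271.95 | 370.04 | 250 |
| Cr | 0.01 | 120 | 1.98 | 16.1 | 0.05 |
| Cl | 0.4 | 4,421 | 312.82 | 441.32 | 250 |
| Zn | 0.85 | 4.08 | 2.48 | 0. | |
| EC | 3.51 | 14,439.56 | 1,686.41 | 1,700.85 | 500 |
| TDS | 175.21 | 9,860.97 | 1,083.32 | 1,161.49 | 500 |
| pH | 6.5 | 9.7 | 7.50 | 0.40 | 6.5–9.2 |
| Alkal | 29 | 221 | 108.01 | 30.87 | 20 |
| DO | 0.20 | 10.20 | 5.64 | 1.68 | |
| BOD | 0.00 | 70.00 | 7.10 | 13.93 | |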

As is evident from Table 2, the mean values of Ca, Cr, EC, DO, and BOD are higher than the WHO standard for drinking water. This is attributed to the geological structures and the solubility of carbonate sediments, which increase the concentration of these ions in groundwater. The elevated EC values, with a mean above the WHO standard, could be due to the abundant evaporative deposits observed on the surface of the plain and at various depths of the aquifers in this area. The high level of the heavy metal chromium could be explained by the activities of industrial units in this region. Because the samples in this study were obtained from urban and rural areas and from sites close to many industrial and agricultural units, fluctuations in water quality parameters such as DO and BOD could be expected. The average pH and NO3 values are within the WHO-defined permissible range, and the mean values of the other parameters are below the permissible limit, which suggests that more than half of the samples could be classified as having suitable water quality.

The distribution of certain elements shows positive skewness in their frequency distributions, with some exceptionally high values present. The data exhibit distinct outliers for several elements (Cl, Mg, Na, EC, TDS, Ca, HCO3, K, SO4, and NO3), indicating potential natural or human-induced processes. According to Table 2, these extreme values appear to be separate from the main distribution, deviating from a continuous pattern. In addition, straight-line segments are observed for certain elements like pH and alkalinity. On the left side of the diagram, there is a vertical line representing samples below the detection limit (e.g., Na, Cl, NO3, and SO4). Departing from normality, most elements tend to have higher values that are skewed toward the right side of the distribution line. This positive skewness or convex distribution feature is evident for the majority of elements. Furthermore, the elements exhibit a diverse range of geochemical families and display distributions that are closer to log-normal shapes. This information is valuable in identifying outliers, as numerous individual samples are visibly separated from the distribution curve.

A more precise analysis of the impact of each parameter or category of parameters on the changes in groundwater quality in this plain can be carried out through the method of factor analysis, which will be briefly discussed below. Factor analysis is a multivariate statistical method that reduces the main variables to fewer factors through a type of rearrangement and uses these factors to prepare the best interpretable loading pattern (Shyu et al. 2011). The results of factor analysis of 16 quality parameters related to 50 groundwater samples in Qassim-Ha'il, Saudi Arabia, are presented in Table 3.

Table 3

The rotated factor loadings based on Varimax rotation

| Element | MF1 | MF2 | MF3 | MF4 |
|---|---|---|---|---|
| Ca | 0.77 | 0.08 | 0.13 | −0.35 |
| Na | 0.88 | 0.21 | 0.16 | 0.19 |
| K | 0.23 | 0.86 | −0.07 | 0.01 |
| Mg | 0.89 | −0.03 | −0.05 | 0.20 |
| NO3 | 0.25 | −0.20 | 0.55 | 0.04 |
| SO4 | 0.19 | −0.18 | 0.54 | 0.03 |
| HCO3 | 0.18 | −0.16 | 0.53 | 0.05 |
| Cr | 0.04 | 0.04 | 0.29 | 0.75 |
| Cl | 0.05 | 0.06 | 0.30 | 0.77 |
| Zn | −0.08 | 0.10 | 0.36 | −0.68 |
| EC | 0.91 | 0.25 | 0.24 | 0.07 |
| TDS | 0.33 | 0.77 | −0.08 | −0.02 |
| pH | −0.04 | −0.08 | −0.85 | 0.01 |
| Alkal | −0.05 | −0.07 | −0.86 | 0.03 |
| DO | −0.06 | −0.70 | −0.49 | −0.01 |
| BOD | −0.14 | 0.91 | −0.09 | −0.07 |
| Eigenvalue | 3.98 | 2.33 | 1.50 | 1.21 |
| Variance percentage | 34.05 | 19.19 | 12.84 | 10.01 |
| Cumulative percentage variance | 33.67 | 52.87 | 65.61 | 75.65 |

The results of the factor analysis indicate that, for the 16 parameters measured in 50 groundwater samples, more than 75% of the quality changes can be explained by four main factors (MF1, MF2, MF3, and MF4). Among these, MF1 plays the most important role, accounting for more than 33% of the changes, followed by MF2 with more than 18%, MF3 with more than 11%, and MF4 with nearly 10%, as the most influential factors on the water quality in Qassim-Ha'il, Saudi Arabia. MF1 consists of calcium, sodium, magnesium, and EC, whose concentrations and variations can be attributed to the geological and natural factors of the region. MF2 includes K, TDS, DO, and BOD, which are affected by the natural and geological factors of the region as well as the prevailing hydrochemical conditions. In MF3, which is associated with a lower pH environment and less NO3, the human factor has a greater influence on groundwater quality changes through the seepage and infiltration of urban and rural sewage and the use of nitrate fertilizers on agricultural land. The presence of chromium and zinc in MF4 clearly indicates the impact of industrial units in the area on the deterioration of water quality.
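
The grouping of parameters into factors can be illustrated by assigning each parameter to the factor on which it has the largest absolute loading. The sketch below uses the rotated loadings of Table 3; the assignment rule is a common reading aid and an assumption here, not necessarily the procedure used in this study, and the row labelled K is inferred from the parameter order because its label is missing from the table.

```python
FACTORS = ("MF1", "MF2", "MF3", "MF4")

# Rotated loadings transcribed from Table 3 (row "K" inferred from context).
LOADINGS = {
    "Ca": (0.77, 0.08, 0.13, -0.35),
    "Na": (0.88, 0.21, 0.16, 0.19),
    "K": (0.23, 0.86, -0.07, 0.01),
    "Mg": (0.89, -0.03, -0.05, 0.20),
    "NO3": (0.25, -0.20, 0.55, 0.04),
    "SO4": (0.19, -0.18, 0.54, 0.03),
    "HCO3": (0.18, -0.16, 0.53, 0.05),
    "Cr": (0.04, 0.04, 0.29, 0.75),
    "Cl": (0.05, 0.06, 0.30, 0.77),
    "Zn": (-0.08, 0.10, 0.36, -0.68),
    "EC": (0.91, 0.25, 0.24, 0.07),
    "TDS": (0.33, 0.77, -0.08, -0.02),
    "pH": (-0.04, -0.08, -0.85, 0.01),
    "Alkal": (-0.05, -0.07, -0.86, 0.03),
    "DO": (-0.06, -0.70, -0.49, -0.01),
    "BOD": (-0.14, 0.91, -0.09, -0.07),
}


def dominant_factor(row):
    """Return the factor with the largest absolute loading for one parameter."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    return FACTORS[index]


groups = {name: [] for name in FACTORS}
for parameter, row in LOADINGS.items():
    groups[dominant_factor(row)].append(parameter)

for name in FACTORS:
    print(name, groups[name])
# MF1 ['Ca', 'Na', 'Mg', 'EC']
# MF2 ['K', 'TDS', 'DO', 'BOD']
# MF3 ['NO3', 'SO4', 'HCO3', 'pH', 'Alkal']
# MF4 ['Cr', 'Cl', 'Zn']
```

Under this rule the MF1 and MF2 groups coincide with those named in the text, and Cr and Zn fall in MF4.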

After examining the influential factors on the groundwater quality changes in Qassim-Ha'il, Saudi Arabia, to investigate the origin and relationship between the changes in different parameters, the analysis of correlation coefficients of the relevant hydrochemical parameters was conducted. The correlation coefficients are presented in Table 4.

Table 4

Correlation coefficients of hydrochemical parameters

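| | EC | TDS | pH | Alkal | Ca | K | Mg | Na | Cl | SO4 | HCO3 | NO3 | Cr | Zn | DO | BOD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EC | | | | | | | | | | | | | | | | |
| TDS | 0.91 | | | | | | | | | | | | | | | |
| pH | −0.10 | −0.18 | | | | | | | | | | | | | | |
| Alkal | 0.14 | 0.13 | −0.15 | | | | | | | | | | | | | |
| Ca | 0.79 | 0.84 | −0.14 | 0.09 | | | | | | | | | | | | |
| K | 0.39 | 0.39 | −0.18 | 0.20 | 0.18 | | | | | | | | | | | |
| Mg | 0.80 | 0.88 | −0.16 | 0.07 | 0.74 | 0.51 | | | | | | | | | | |
| Na | 0.89 | 0.93 | −0.14 | 0.11 | 0.73 | 0.47 | 0.79 | | | | | | | | | |
| Cl | 0.85 | 0.88 | −0.12 | 0.06 | 0.75 | 0.40 | 0.74 | 0.89 | | | | | | | | |
| SO4 | 0.78 | 0.86 | −0.13 | 0.10 | 0.79 | 0.36 | 0.79 | 0.80 | 0.80 | | | | | | | |
| HCO3 | 0.21 | 0.29 | −0.15 | 0.13 | 0.04 | 0.39 | 0.35 | 0.26 | 0.20 | 0.15 | | | | | | |
| NO3 | 0.25 | 0.30 | −0.08 | −0.04 | 0.45 | −0.31 | 0.10 | 0.27 | 0.23 | 0.27 | −0.11 | | | | | |
| Cr | −0.08 | 0.17 | −0.09 | 0.21 | 0.03 | 0.20 | 0.08 | 0.11 | 0.24 | 0.14 | −0.09 | 0.23 | | | | |
| Zn | 0.16 | −0.08 | 0.18 | −0.10 | 0.01 | −0.19 | 0.02 | −0.12 | 0.32 | −0.23 | 0.01 | 0.18 | 0.19 | | | |
| DO | −0.18 | −0.24 | −0.16 | −0.31 | −0.51 | −0.02 | −0.04 | −0.09 | −0.14 | −0.33 | −0.45 | 0.50 | 0.21 | −0.15 | | |
| BOD | 0.01 | −0.02 | 0.01 | −0.02 | 0.76 | −0.17 | −0.15 | −0.04 | 0.08 | 0.02 | 0.53 | −0.12 | −0.50 | −0.12 | 0.14 | |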

A correlation coefficient greater than 0.70 indicates a very strong correlation between two parameters. A coefficient between 0.50 and 0.70 indicates a moderate correlation at a significance level of p<0.05, and a coefficient of less than 0.30 is considered a lack of correlation between the parameters (Shyu et al. 2011).
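
These thresholds can be written as a small classification rule. The sketch below is illustrative: applying the thresholds to the absolute value of r (so that negative correlations are classified by their strength) and the label "weak" for the 0.30 to 0.50 band, which the text does not name, are both assumptions.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the thresholds quoted above."""
    magnitude = abs(r)  # assumption: sign is ignored when judging strength
    if magnitude > 0.70:
        return "very strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude < 0.30:
        return "no correlation"
    return "weak"  # assumed label for the band not named in the text


# Three coefficients taken from Table 4.
for pair, r in (("EC-TDS", 0.91), ("Ca-DO", -0.51), ("EC-HCO3", 0.21)):
    print(pair, correlation_strength(r))
# EC-TDS very strong
# Ca-DO moderate
# EC-HCO3 no correlation
```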

The investigation of the correlation coefficients of the studied parameters showed that the highest correlation could be observed between EC and Na (r=0.89), EC and Mg (r=0.80), EC and Ca (r=0.79), as well as Mg and Na (r=0.79). Based on these results, the origin of most parameter changes can be attributed to environmental factors and the geological structure of the region.

A high correlation between EC and TDS (r=0.91) indicates a strong relationship between these two parameters in water samples. EC is a measure of the ability of water to conduct electrical current, which is influenced by the presence of dissolved ions and minerals. TDS, on the other hand, represents the total concentration of dissolved solids in water, including minerals, salts, and other dissolved substances. The high correlation between EC and TDS suggests that the concentration of dissolved solids in water directly contributes to its EC. When the TDS level increases, it leads to a corresponding increase in the EC of the water. Therefore, changes in TDS levels can be reliably estimated or inferred by measuring EC. This correlation is particularly useful in water quality assessment as it provides a simple and convenient way to estimate TDS levels without directly measuring them. By measuring EC, which is relatively easier and faster, the TDS level can be estimated using empirical relationships established through the observed correlation between these two parameters. It is important to note that while EC and TDS are strongly correlated, they are not the same. TDS represents the actual concentration of dissolved solids, while EC measures the conductivity associated with the presence of those solids. Therefore, it is still recommended to directly measure TDS when precise quantification of dissolved solids concentration is required.
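
One simple form of such an empirical relationship is a proportional model, TDS = k × EC, with k fitted by least squares on paired measurements. The sketch below assumes this proportional form; the paired values, the units, and the function names are hypothetical and serve only to show the calculation.

```python
def fit_ratio(ec_values, tds_values):
    """Least-squares slope k of TDS = k * EC (line through the origin)."""
    numerator = sum(e * t for e, t in zip(ec_values, tds_values))
    denominator = sum(e * e for e in ec_values)
    return numerator / denominator


def estimate_tds(ec, k):
    """Estimate TDS from a measured EC value using the fitted ratio."""
    return k * ec


# Hypothetical paired measurements (EC and TDS in consistent, assumed units).
ec = [400.0, 900.0, 1500.0, 2600.0]
tds = [260.0, 580.0, 970.0, 1660.0]

k = fit_ratio(ec, tds)
print(round(k, 3), round(estimate_tds(1200.0, k), 1))
```

A ratio fitted in this way is specific to the water type and the dataset, which is one reason direct TDS measurement remains preferable when precise values are needed.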

Also, according to Table 4, the strong correlation between SO4 and Ca (r=0.79) as well as Mg (r=0.79) suggests a potential connection to source rocks containing dolomites and gypsum. Several variables also demonstrated relatively strong correlations, aligning with the findings from the factor analysis. However, it is worth mentioning that HCO3 and other variables exhibited relatively higher levels of noise. These results support the overall conclusions drawn from the factor analysis.

After the initial evaluation of the relationships between the parameters and their probable sources using factor analysis and correlation coefficients, and after comparing the mean of each parameter with the WHO standards, the most important part of the study was carried out: determining the EWQI and ranking each water sample for drinking purposes using the relationships given above. The entropy value and weight of each parameter were calculated first, and then the WQI value was determined based on the water quality ranking criteria. Finally, this value was multiplied by the weight of each parameter, and the sum of these products for each sample was taken as the EWQI value for that sample (Equation (11)).
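
The procedure described above can be sketched as follows. The code uses one commonly cited entropy-weight formulation (min-max normalization, shifted proportions, information entropy, weights proportional to 1 − e, and quality ratings q = 100 × C/S); this formulation is an assumption here and may differ in detail from the equations used in this study, for example in the treatment of pH. All data values and function names are hypothetical.

```python
from math import log


def entropy_weights(samples):
    """Return one entropy weight per parameter.

    samples: list of rows, one row per water sample, one column per parameter.
    """
    m, n = len(samples), len(samples[0])
    if m < 2:
        raise ValueError("at least two samples are needed")
    information = []  # 1 - e_j for each parameter j
    for j in range(n):
        column = [row[j] for row in samples]
        low = min(column)
        span = max(column) - low
        # Min-max normalization; a constant column is mapped to zeros.
        y = [(v - low) / span if span else 0.0 for v in column]
        total = sum(1.0 + v for v in y)
        p = [(1.0 + v) / total for v in y]  # shifted so that log(p) is defined
        e_j = -sum(pi * log(pi) for pi in p) / log(m)
        information.append(1.0 - e_j)
    total_information = sum(information)
    if abs(total_information) < 1e-12:
        return [1.0 / n] * n  # no parameter varies: fall back to equal weights
    return [d / total_information for d in information]


def ewqi(concentrations, standards, weights):
    """Weighted sum of the quality ratings q_j = 100 * C_j / S_j."""
    return sum(
        w * 100.0 * c / s for c, s, w in zip(concentrations, standards, weights)
    )


# Hypothetical data: three samples, two parameters with standards 250 and 500.
samples = [[120.0, 300.0], [260.0, 450.0], [400.0, 900.0]]
standards = [250.0, 500.0]

weights = entropy_weights(samples)
scores = [ewqi(row, standards, weights) for row in samples]
print([round(w, 3) for w in weights], [round(s, 1) for s in scores])
```

In this formulation a parameter that varies more across the samples has lower entropy and therefore receives a larger weight.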

The entropy value and entropy weight calculated for each parameter are shown in Table 5.

Table 5

Results of the calculation of entropy value and entropy weight

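| | Ca | Na | K | Mg | NO3 | Cr | Cl | HCO3 |
|---|---|---|---|---|---|---|---|---|
| Entropy value | 5.73 | 5.62 | 5.80 | 5.79 | 5.73 | 5.84 | 5.65 | 5.81 |
| Entropy weight | 3.70 | 3.71 | 3.70 | 3.74 | 3.66 | 3.65 | 3.71 | 3.69 |

| | Zn | EC | TDS | pH | DO | BOD | SO4 | Alkal |
|---|---|---|---|---|---|---|---|---|
| Entropy value | 5.61 | 5.79 | 5.81 | 5.80 | 5.79 | 5.81 | 5.78 | 5.76 |
| Entropy weight | 3.71 | 3.69 | 3.70 | 3.69 | 3.69 | 3.67 | 3.66 | 3.65 |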

Parameters with the lowest entropy value and the highest entropy weight have the greatest impact on the groundwater quality (Jianhua et al. 2011). Therefore, zinc (Zn) has the highest rate of impact on the groundwater quality of Qassim-Ha'il, Saudi Arabia, followed by calcium, sodium, potassium, magnesium, nitrate, acidity, and DO in subsequent ranks.

Furthermore, the summation of weights is a criterion for determining the stability of groundwater quality in terms of the parameter of interest. Lower values indicate high instability and continuous changes in groundwater quality (Shyu et al. 2011). Therefore, chromium exhibits the highest instability and continuous changes among the parameters, likely due to local and possibly seasonal input of this parameter into the subsurface environment as a result of industrial activities. After chromium, BOD shows the most continuous, small changes. Determining the EWQI using the entropy weights obtained above and ranking the groundwater samples for drinking water consumption is the main objective of this study; Table 6 presents the calculated EWQI and the rank of each water sample.

Table 6

EWQI and the quality rating of each sample for drinking according to the WHO standard

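| Sample | Quality rank | EWQI | Sample | Quality rank | EWQI |
|---|---|---|---|---|---|
| 1 | | 55.98 | 26 | | 81.89 |
| 2 | | 72.72 | 27 | | 110.98 |
| 3 | | 87.03 | 28 | | 84.32 |
| 4 | | 65.98 | 29 | | 48.05 |
| 5 | | 60.89 | 30 | | 172.80 |
| 6 | | 57.96 | 31 | | 88.10 |
| 7 | | 75.85 | 32 | | 76.59 |
| 8 | | 59.39 | 33 | | 94.23 |
| 9 | | 61.01 | 34 | | 70.66 |
| 10 | | 92.03 | 35 | | 65.02 |
| 11 | | 83.01 | 36 | | 87.04 |
| 12 | | 76.04 | 37 | | 69.58 |
| 13 | | 49.89 | 38 | | 113.12 |
| 14 | | 51.03 | 39 | | 61.59 |
| 15 | | 65.23 | 40 | | 49.25 |
| 16 | | 157.55 | 41 | | 170.23 |
| 17 | | 63.07 | 42 | | 47.27 |
| 18 | | 65.14 | 43 | | 71.65 |
| 19 | | 58.75 | 44 | | 52.36 |
| 20 | | 239.08 | 45 | | 240.39 |
| 21 | | 108.19 | 46 | | 70.96 |
| 22 | | 197.95 | 47 | | 135.36 |
| 23 | | 77.65 | 48 | | 55.36 |
| 24 | | 84.12 | 49 | | 94.25 |
| 25 | | 82.50 | 50 | | 237.23 |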

According to the results, two samples, equivalent to 4% of the total 50 samples, are ranked 1 and have very good quality for drinking purposes. The highest number of samples, equivalent to 76% (38 samples), are ranked 2 and have good quality. Four samples (8%) are ranked 3 and have average quality. Four samples (8%) are ranked 4 and have poor quality, and finally, two samples, equivalent to 4% of all samples, are ranked 5 and have very poor quality. The quality ranking of the tested samples, according to the WHO standard for drinking, indicates that the samples with poor and very poor quality are all located near the outlets of urban and rural sewage and, in one case, near an industrial poultry farm.
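
The ranking step can be sketched as a lookup of each EWQI value against class boundaries, followed by a tally of the share of samples in each rank. The boundaries used below (50, 100, 150, and 200) are those commonly used in the EWQI literature and are an assumption here; the ranking criteria actually applied in this study are defined earlier in the paper and may differ. The rank labels follow the text, and the EWQI values are hypothetical.

```python
from collections import Counter

# Assumed class boundaries (upper bound exclusive) with the labels used in the text.
RANKS = [
    (50.0, 1, "very good"),
    (100.0, 2, "good"),
    (150.0, 3, "average"),
    (200.0, 4, "poor"),
]


def quality_rank(ewqi):
    """Return (rank, label) for one EWQI value under the assumed boundaries."""
    for upper, rank, label in RANKS:
        if ewqi < upper:
            return rank, label
    return 5, "very poor"


def rank_shares(ewqi_values):
    """Percentage of samples falling in each rank from 1 to 5."""
    counts = Counter(quality_rank(v)[0] for v in ewqi_values)
    n = len(ewqi_values)
    return {rank: 100.0 * counts[rank] / n for rank in range(1, 6)}


values = [42.0, 75.0, 88.0, 120.0, 230.0]  # hypothetical EWQI values
print(rank_shares(values))
# {1: 20.0, 2: 40.0, 3: 20.0, 4: 0.0, 5: 20.0}
```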

Therefore, based on the quality criteria in the current study, it can be concluded that the discharge of wastewater into the groundwater is the most important factor in the severe decline of groundwater quality in this plain. The evaluation of the determined quality ranks indicates that the groundwater quality for drinking purposes is in the poor category near industrial units.

The advantages of EWQI compared to WQI

As briefly mentioned in Section 1, previous studies have used the WQI to assess groundwater quality. However, a more explicit identification of the limitations or gaps in these studies clarifies why an optimized WQI equation with entropy weight (EWQI) is needed.

  • (1)

    Lack of consideration for parameter weighting: Previous studies using the WQI may not have adequately addressed the issue of parameter weighting. Parameters used in the calculation of WQI often have different degrees of importance in determining water quality. The EWQI approach, by incorporating entropy weight, aims to overcome this limitation by assigning appropriate weights to different parameters based on their relative significance. This allows for a more accurate and comprehensive assessment of groundwater quality.

  • (2)

    Inadequate handling of parameter interactions: Previous studies utilizing WQI may not have fully accounted for the interactions between different water quality parameters. Water quality is a complex system, and the relationships between parameters can be nonlinear and interdependent. The EWQI approach, through the use of entropy weight, takes into account the interactions between parameters and captures the synergistic or antagonistic effects they may have on water quality. This provides a more holistic evaluation of groundwater quality compared to traditional WQI methods.

  • (3)

    Simplistic classification of water quality: Previous studies may have relied on simplistic classification schemes based on fixed threshold values to categorize water quality. This approach may not capture the nuanced variations in water quality and can lead to misinterpretation of the actual conditions. The EWQI approach, by considering the entropy weight, allows for a more nuanced and flexible classification of water quality. It takes into account the relative importance of different parameters and allows for a more accurate representation of the actual variations in groundwater quality.

  • (4)

    Limited consideration of uncertainty: Previous studies using WQI may not have adequately addressed the uncertainty associated with parameter measurements and their impact on the overall assessment of water quality. The EWQI approach incorporates entropy weight, which inherently accounts for uncertainty and provides a more robust and reliable assessment of groundwater quality. By considering the uncertainty associated with each parameter, the EWQI approach improves the accuracy and reliability of the results.

Understanding factors impacting groundwater quality in Ha'il, Saudi Arabia

As mentioned, groundwater quality is influenced by various factors, including urban, rural, and industrial sewage, chemical fertilizers, and waste disposal sites. However, a more comprehensive analysis of the magnitude and significance of these factors in the specific context of Ha'il, Saudi Arabia, would provide valuable insights. Delving deeper into the sources and contributions of groundwater deterioration in the region would enhance the understanding of the environmental challenges it faces:

  • (1)

    Urban sewage: Researchers could provide a detailed exploration of the extent of urban sewage contamination in Ha'il and its impact on groundwater quality. Population growth, urbanization rates, and sewage treatment infrastructure in the area could be discussed. In addition, case studies or statistical data on the levels of pollutants typically found in urban sewage and their potential infiltration into groundwater would strengthen the analysis.

  • (2)

    Rural sewage: Examining the role of rural sewage in groundwater contamination is crucial. Researchers could investigate the agricultural practices, irrigation systems, and disposal methods of agricultural wastewater in Ha'il. Highlighting specific contaminants associated with agricultural activities, such as pesticides, fertilizers, and livestock waste, would provide a comprehensive understanding of the groundwater pollution sources in rural areas.

  • (3)

    Industrial wastewater: The impact of industrial activities on groundwater quality should be thoroughly discussed. The major industries in Ha'il and their waste management practices could be explored. Providing information on the types of pollutants typically found in industrial wastewater, the proximity of industries to water sources, and any historical incidents of contamination would help assess the significance of industrial activities as a source of groundwater pollution.

  • (4)

    Chemical fertilizers: Investigating the use of chemical fertilizers in Ha'il's agricultural practices would be valuable. The types and quantities of fertilizers commonly used, their potential leaching into groundwater, and the associated impacts on water quality can be investigated. Including studies or data on nitrate levels, a common pollutant from fertilizers, in groundwater samples would further support the discussion.

  • (5)

    Waste disposal sites: Managers should analyze the presence and impact of waste disposal sites in Ha'il. They could explore the types of waste typically disposed of, the proximity of these sites to water sources, and the potential for leachate contamination. Case studies or statistics on groundwater contamination near waste disposal sites in the region would provide concrete evidence of the issue.

By providing a detailed analysis of the specific sources and their respective contributions to groundwater deterioration in Ha'il, Saudi Arabia, the understanding of the environmental challenges faced in the region can be enhanced. Including relevant statistics, case studies, and local data will strengthen the impact of this discussion and provide a more comprehensive picture of the groundwater pollution sources in Ha'il.

Combining EWQI with machine-learning techniques

The integration of ANN and the M5 model tree can improve the results of EWQI in several ways (Table 7):

  • (1)

    Nonlinear relationships: EWQI is based on a mathematical equation that may assume linear relationships between the water quality parameters. However, water quality data often exhibit complex nonlinear patterns. By incorporating ANN, which is capable of capturing nonlinear relationships, the EWQI model can better represent the intricate interactions among the parameters, leading to improved accuracy in assessing water quality.

  • (2)

    Pattern recognition: ANN has the ability to recognize patterns and detect subtle relationships within a dataset. By training the ANN on a large dataset of water quality samples, it can learn complex patterns and generalize them to make predictions on unseen data. This allows the EWQI model to uncover hidden patterns and correlations among water quality parameters, leading to more accurate assessments.

  • (3)

    Handling missing data: Water quality datasets may contain missing values or incomplete information for certain parameters. ANN has the capability to handle missing data by estimating or imputing the missing values based on the available information. This ensures that the EWQI model can still provide reliable assessments even in the presence of missing data.

  • (4)

    Improved interpretability: M5 model tree is a decision tree-based model that provides a transparent and interpretable representation of the relationships between input variables and the output (EWQI). By employing the M5 model tree alongside ANN, the EWQI model can benefit from the interpretability of the decision tree, allowing for a better understanding of the factors and variables that contribute to the water quality assessment. This can aid in identifying critical parameters and guiding decision-making processes.

Table 7

The comparison of simple EWQI with hybrid ANN-EWQI and M5-EWQI

| Model | RMSE | R2 |
|---|---|---|
| EWQI | 0.20 | 68% |
| ANN-EWQI | 0.15 | 87% |
| M5-EWQI | 0.18 | 81% |

The improved performance of the hybrid models is supported by the efficiency criteria used for evaluation. Both RMSE and R2 indicate that the hybrid models outperform the simple EWQI: the hybrid models exhibit lower RMSE values, and their higher R2 values show that they explain a larger proportion of the variance in the observed data and capture the variability more accurately.
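
The size of the improvement can be checked directly from the values in Table 7 by computing the relative change of each criterion with respect to the simple EWQI. The short sketch below does this; the function name `percent_change` is illustrative.

```python
def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (RMSE, R2 in %) for each model, taken from Table 7.
TABLE_7 = {
    "EWQI": (0.20, 68.0),
    "ANN-EWQI": (0.15, 87.0),
    "M5-EWQI": (0.18, 81.0),
}

base_rmse, base_r2 = TABLE_7["EWQI"]
for name in ("ANN-EWQI", "M5-EWQI"):
    model_rmse, model_r2 = TABLE_7[name]
    print(
        name,
        round(percent_change(base_rmse, model_rmse), 2),
        round(percent_change(base_r2, model_r2), 2),
    )
# ANN-EWQI -25.0 27.94
# M5-EWQI -10.0 19.12
```

For the ANN-EWQI model this gives a 25% decrease in RMSE and a 27.94% increase in R2 relative to the simple EWQI.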

Also, the scatter plots of the hybrid ANN-EWQI and M5-EWQI models (Figure 3) demonstrate their superior performance compared to the simple EWQI. The scatter plots reveal a closer alignment of predicted values with observed values, indicating better accuracy and reliability of the hybrid models. The points in the scatter plots for the hybrid models tend to cluster tightly around the diagonal line, signifying a strong correspondence between predicted and observed values. On the other hand, the scatter plots for the simple EWQI exhibit a larger spread and deviations from the diagonal line, indicating a relatively weaker correlation between predicted and observed values.
Figure 3

The scatter plot of hybrid EWQI and simple EWQI for comparison of the model performance.


Overall, the results from the scatter plots and efficiency criteria collectively demonstrate that the hybrid ANN-EWQI and M5-EWQI models provide better predictions and improved performance compared to the simple EWQI. These findings emphasize the effectiveness of incorporating artificial intelligence and decision tree techniques into the traditional EWQI framework for groundwater quality assessment. The hybrid models offer enhanced accuracy, reliability, and predictive power, making them valuable tools for water quality management and decision-making processes.

Potential remediation strategies for improving water quality in the samples

There are several potential remediation strategies and interventions that can be implemented to improve the water quality of samples with poor rankings. The choice of the appropriate strategy depends on the specific contaminants or parameters of concern, the characteristics of the study area, and the feasibility of implementation. Here are some common remediation strategies:

  • (1)

    Source control: Identifying and addressing the sources of contamination is crucial for improving water quality. This can involve implementing stricter regulations and control measures on industrial discharges, agricultural practices, and wastewater treatment facilities. Proper management of landfills and waste disposal sites can also help prevent leachate contamination.

  • (2)

    Wastewater treatment: Implementing effective wastewater treatment processes is vital for reducing the discharge of pollutants into groundwater. Advanced treatment technologies such as activated carbon filtration, membrane filtration, and biological treatment can remove contaminants and improve the quality of treated wastewater before it is discharged into the environment.

  • (3)

    Nutrient management: Implementing proper nutrient management practices in agriculture can help reduce nutrient runoff and subsequent groundwater contamination. This includes optimizing fertilizer application, employing precision farming techniques, and promoting the use of organic fertilizers. Buffer zones and vegetative barriers can also be established to intercept and filter nutrient-laden runoff.

  • (4)

    Remediation technologies: Depending on the specific contaminants present, remediation technologies such as in situ chemical oxidation, permeable reactive barriers, or phytoremediation can be employed. These technologies aim to degrade, immobilize, or extract contaminants from groundwater, effectively reducing their concentrations and improving water quality.

  • (5)

    Monitoring and regular testing: Implementing a robust monitoring program is essential to continuously assess water quality, identify potential sources of contamination, and track the effectiveness of remediation efforts. Regular testing of groundwater samples can help detect changes in quality and ensure that interventions are successful in improving water conditions.

It is important to note that the selection and implementation of remediation strategies should be done in consultation with experts, taking into consideration the specific characteristics of the study area, available resources, and regulatory requirements. In addition, public awareness and community engagement are crucial for the success of any remediation efforts, as they can promote responsible water use and contribute to the long-term sustainability of water resources.

The mean values of Ca, Cr, EC, DO, and BOD are higher than the permissible levels set by the WHO drinking water standard. Because the average values of pH and NO3 fall within the acceptable range defined by the WHO standard and the mean values of the other parameters are lower than the permissible levels, more than half of the samples can be classified in the appropriate quality category.

The results of factor analysis indicate that, by examining 16 parameters in 50 samples, more than 75% of groundwater quality variations can be explained by four factors (MF1, MF2, MF3, and MF4). MF1 has the most significant role with more than 33% of the variations, followed by MF2 with more than 18%, MF3 with more than 11%, and MF4 with nearly 10%, as the most influential factors on water quality in Qassim-Ha'il, Saudi Arabia.

According to the calculated entropy values and their weights for each parameter, Zn has the highest effect rate on groundwater quality in Qassim-Ha'il, Saudi Arabia, followed by sodium, calcium, nitrate, acidity, potassium, magnesium, and DO in the next categories. EC and TDS have completely equal rates, and Cr has the least impact on the drinking water quality of this region. The assessment of entropy weights shows that Cr has the highest instability and continuous changes, likely due to localized and possibly seasonal entry of this parameter into the subsurface environment as a result of industrial activities. After Cr, the highest continuous and small changes belong to BOD. The ranking results based on EWQI indicate that 4% of the 50 samples have a rank of 1 and very good quality for drinking. The highest number of samples, equivalent to 76% of the samples, have a rank of 2 and good quality. Samples with a rank of 3 and average quality make up 8% of the total samples. Eight percent of the samples have a rank of 4 and poor quality, and finally, 4% of the samples have a rank of 5 and very poor quality. Based on the quality criteria in this study, it can be concluded that sewage discharge into the subsurface environment is the most important factor in the severe reduction of water quality in this plain, and that water quality for drinking improves with distance from urban and rural residential areas and industrial units.

The superior performance of the hybrid models is also reinforced by the evaluation using the efficiency criteria. Metrics such as RMSE and R2 consistently demonstrate that the hybrid models surpass the simple EWQI. The hybrid models exhibit lower RMSE values, indicating their improved accuracy in predicting the observed data. Moreover, the higher R2 values of the hybrid models signify their enhanced ability to explain a larger portion of the data variance and capture the underlying variability more effectively (Table 7).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abad S. S. A. M. K., Javidan P., Baghdadi M. & Mehrdadi N. 2023 Green synthesis of Pd@ biochar using the extract and biochar of corn-husk wastes for electrochemical Cr (VI) reduction in plating wastewater. Journal of Environmental Chemical Engineering 11 (3), 109911.
Aliasghar A., Javidan P., Rahmaninezhad S. A. & Mehrdadi N. 2022 Optimizing the desalination rate in a photoelectrocatalytic desalination cell (PEDC) by altering operational conditions. Water Supply 22 (12), 8659–8668.
Crutchik D., Rodríguez-Valdecantos G., Bustos G., Bravo J., González B. & Pabón-Pereira C. 2020 Vermiproductivity, maturation and microbiological changes derived from the use of liquid anaerobic digestate during the vermicomposting of market waste. Water Science and Technology 82, 1781–1794.
dos Santos Simoes F., Moreira A. B., Bisinoti M. C., Gimenez S. M. N. & Yabe M. J. S. 2008 Water quality index as a simple indicator of aquaculture effects on aquatic bodies. Ecological Indicators 8, 476–484.
Ishaku J. M. 2011 Assessment of groundwater quality index for Jimeta-Yola area, Northeastern Nigeria. Journal of Geology and Mining Research 3, 219–231.
Javidan P., Baghdadi M., Torabian A. & Goharrizi B. A. 2022 A tailored metal–organic framework applicable at natural pH for the removal of 17α-ethinylestradiol from surface water. Desalination and Water Treatment 264, 259–269.
Jianhua W., Peiyue L. & Hui Q. 2011 Groundwater quality in Jingyuan County, a semi-humid area in Northwest China. E-Journal of Chemistry 8, 787–793.
Kawachi T., Maruyama T. & Singh V. P. 2001 Rainfall entropy for delineation of water resources zones in Japan. Journal of Hydrology 246, 36–44.
Khosravi M., Afshar A., Molajou A. & Sandoval-Solis S. 2022 Joint operation of surface and groundwater to improve sustainability index as irrigation system performance: cyclic storage and standard conjunctive use strategies. Journal of Water Resources Planning and Management 148, 04022046.
Luo P., Xu C., Kang S., Huo A., Lyu J., Zhou M. & Nover D. 2021 Heavy metals in water and surface sediments of the Fenghe River Basin, China: assessment and source analysis. Water Science and Technology 84, 3072–3090.
Mohinuddin S., Sengupta S., Sarkar B., Saha U. D., Islam A., Islam A. R. M. T., Hossain Z. M., Mahammad S., Ahamed T., Mondal R., Zhang W. & Basra A. 2023 Assessing lake water quality during COVID-19 era using geospatial techniques and artificial neural network model. Environmental Science and Pollution Research 30, 1–17.
Nihalani S. A., Behede S. N. & Meeruty A. R. 2022 Groundwater quality assessment in proximity to solid waste dumpsite at Uruli Devachi in Pune, Maharashtra. Water Science and Technology 85, 3331–3342.
Nourani V., Davanlou Tajbakhsh A., Molajou A. & Gokcekus H. 2019a Hybrid wavelet-M5 model tree for rainfall-runoff modeling. Journal of Hydrologic Engineering 24, 04019012.
Nourani V., Molajou A., Tajbakhsh A. D. & Najafi H. 2019b A wavelet based data mining technique for suspended sediment load modeling. Water Resources Management 33, 1769–1784.
Ozkul S., Harmancioglu N. B. & Singh V. P. 2000 Entropy-based assessment of water quality monitoring networks. Journal of Hydrologic Engineering 5, 90–100.
Patel P. S., Pandya D. M. & Shah M. 2023 A systematic and comparative study of water quality index (WQI) for groundwater quality analysis and assessment. Environmental Science and Pollution Research 30, 54303–54323.
Rajankar P. N., Gulhane S. R., Tambekar D. H., Ramteke D. S. & Wate S. R. 2009 Water quality assessment of groundwater resources in Nagpur Region (India) based on WQI. E-Journal of Chemistry 6, 905–908.
Reza R. & Singh G. 2010 Assessment of ground water quality status by using water quality index method in Orissa, India. World Appl Sci J 9, 1392–1397.
Shyu G. S., Cheng B. Y., Chiang C. T., Yao P. H. & Chang T. K. 2011 Applying factor analysis combined with kriging and information entropy theory for mapping and evaluating the stability of groundwater quality variation in Taiwan. International Journal of Environmental Research and Public Health 8, 1084–1109.
Tran T. T. H., Tran Q. A., Nguyen H. T. H. & Tong N. X. 2023 Assessing water quality in the Dong Nai River (Vietnam): implications for sustainable management and pollution control. Water Science and Technology 87, 2917–2929.
Ulanowicz R. E. 2001 Information theory in ecology. Computers & Chemistry 25, 393–399.
Yavari F., Salehi Neyshabouri S. A., Yazdi J., Molajou A. & Brysiewicz A. 2022 A novel framework for urban flood damage assessment. Water Resources Management 36 (6), 1991–2011.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).