In the current study, several soft-computing methods, including artificial neural networks (ANNs), the adaptive neuro-fuzzy inference system (ANFIS), gene expression programming (GEP), and hybrid wavelet theory-GEP (WGEP), are used to model the electrical conductivity (EC) of groundwater. To this end, groundwater samples from three types of sources (deep wells, semi-deep wells, and aqueducts), located in six basins of Iran (Urmia Lake (UL), Sefid-rud (SR), Karkheh (K), Kavir-Markazi (KM), Gavkhouni (G), and Hamun-e Jaz Murian (HJM)) with various climate conditions, were collected during 2004–2018. The WGEP model with data de-noising showed the best performance in estimating the EC variable for all types of groundwater resources and climatic conditions. The Root Mean Squared Error (RMSE) values of the WGEP model varied from 162.068 to 348.911, 73.802 to 171.376, 29.465 to 351.489, 118.149 to 311.798, 217.667 to 430.730, and 76.253 to 162.992 μS cm−1 in the areas of the UL, SR, K, KM, G, and HJM basins, respectively. The WGEP model's performance (R-values) for deep wells, semi-deep wells, and aqueducts was best in the areas of the KM basin, where the arid steppe cold (Bsk) climate class is dominant. Also, the mathematical equations extracted by the WGEP model could be used for EC estimation in other basins.
Iran's groundwater resources face a critical situation.
The Electrical Conductivity (EC) variable of various groundwater resources was estimated using single and hybrid wavelet theory methods.
The impact of various climatic categories on the EC estimation was evaluated.
Data de-noising with wavelet tools can improve the performance of EC estimation models.
In recent decades, the growth of industry and agriculture, together with climate change and human development, has increased the demand for water resources and created a serious challenge of water quality deterioration (Yang et al. 2017; Kaur et al. 2020). Groundwater, as the world's second-largest source of fresh water, could be a safe and reliable source of drinking water for many communities and regions around the world (Hekmatnia et al. 2020). Groundwater pollution is a deleterious consequence of industrial, agricultural, and service development; it is caused by the unauthorized disposal, without regard to environmental considerations, of industrial and municipal wastewater containing various pollutants, and it can endanger human health (Nigam & Yadav 2019). Considering the importance of groundwater resources and Iran's location in arid and semi-arid climates, it is important to study the quality of water resources and their contaminants and to model them via soft-computing methods (Sharghi et al. 2019). In recent decades, soft-computing methods such as multi-linear regression (MLR), artificial neural network (ANN), support vector machines (SVM), adaptive neuro-fuzzy inference system (ANFIS), multivariate adaptive regression splines (MARS), and gene expression programming (GEP), along with their hybrids with wavelet theory (W), have been successfully employed in surface and groundwater quality modeling (Wang et al. 2020).
Khudair et al. (2018) applied the ANN method without hybrid methods to predict the Water Quality Index (WQI). Their results showed that the pH and chloride variables have a significant influence on WQI prediction, with an R2 value of 0.973 for the optimal model. Wagh et al. (2018) used the ANN method without hybrid methods to model the nitrate concentration in the groundwater resources of the Kadava river basin. Their results showed that the Levenberg–Marquardt (LM) back-propagation algorithm was an effective algorithm for ANN models predicting water quality variables. Zaqoot et al. (2018) successfully applied the multi-layer perceptron-ANN (MLP-ANN) model to predict the nitrate concentration of 15 groundwater wells in the Khan Younis and Rafah areas (semi-arid climate), with R and RMSE values of 0.838 and 63.236, respectively. Azad et al. (2018) applied different hybrids of evolutionary algorithms (EA) with ANFIS, without data de-noising, to estimate the water quality of the Gorganrood river. The results showed that the ANFIS-Genetic Algorithm (ANFIS-GA) performed suitably in estimating the Sodium Adsorption Ratio (SAR). Kisi et al. (2019) used various soft-computing methods to estimate the electrical conductivity (EC), total hardness (TH), and SAR variables of groundwater resources in Isfahan-Borkhar, Iran. The results indicated that the hybrid of the continuous genetic algorithm (CGA) with the ANFIS method had the best performance for estimating EC, SAR, and TH. Aryafar et al. (2019) applied GP, ANFIS, and ANN models without hybrids to estimate groundwater quality variables such as TH, total dissolved solids (TDS), and EC of 12 wells in the Khezri plain. According to the results, GP can be considered a promising method for estimating the quality variables of groundwater resources with various uses.
Kadam et al. (2019) used ANN and MLR methods without hybrid models to estimate the WQI of the Shivganga River basin, and 34 representative groundwater characteristics including the pH, EC, TDS, TH, Ca, Mg, Na, K, Cl, HCO3, SO4, NO3, and PO4 variables. Their results indicated that the ANN models' estimates had acceptable RMSE values. Jafari et al. (2019) used MLP, ANFIS, SVM, and GEP methods to estimate the TDS of a groundwater aquifer in the Tabriz plain. They found that the GEP model outperformed the other methods, with R2 and RMSE values of 0.998 and 58.930, respectively. Maroufpoor et al. (2019) applied ANN and ANFIS models without hybrids to estimate the spatial distribution of the groundwater EC variable in the Keshit, Bam Normashir, and Rhmtabad plains. They illustrated that the ANN method performed better than the ANFIS approach, with an R2 value of 0.992 and an RMSE value of 142.462.
W-hybrid models have mostly been developed for estimating surface water quality, without reducing data noise. Montaseri et al. (2018) used single and W-hybrid soft-computing methods, including ANN, ANFIS, GEP, wavelet-ANN (WANN), wavelet-ANFIS (WANFIS), and wavelet-GEP (WGEP), without data de-noising, to estimate the amount of TDS in rivers of four basins located in Iran with different Köppen-Geiger climate categories (e.g. snow, dry-hot summer (Dsa); arid steppe, cold (Bsk); arid desert, cold (Bwk); and arid steppe, hot (Bsh)). The results indicated the superior performance of W-hybrid methods compared to single methods. Chen et al. (2020) and Rajaee et al. (2020) investigated single and W-hybrid soft-computing methods for estimating qualitative variables of river water. Their evaluations emphasized the applicability of soft-computing methods to the estimation of water quality indicators for rivers and the superiority of W-hybrid methods over single soft-computing methods under various climatic conditions.
By de-noising the groundwater quality variables for various climate categories and types of groundwater sources, the current study could fill the gaps left by previous studies of EC estimation. The selected basins are located in Iran and were chosen based on the Köppen-Geiger classification: Urmia Lake (UL) with a snow, dry and hot summer (Dsa) climate; Sefid-rud (SR) with a warm temperate, dry and hot summer (Csa) climate; Karkheh (K) with an arid steppe, hot (Bsh) climate; Kavir-Markazi (KM) with an arid steppe, cold (Bsk) climate; Gavkhouni (G) with an arid desert, cold (Bwk) climate; and Hamun-e Jaz Murian (HJM) with an arid desert, hot (Bwh) climate. Therefore, the objectives of the current study are to use single and W-hybrid soft-computing methods for the areas of the six selected basins with various climate conditions to: (1) apply soft-computing methods (ANN, ANFIS, and GEP) together with a novel structure of the W-hybrid model (WGEP) for data de-noising and estimation of groundwater EC; (2) investigate the impact of climate variability and different types of groundwater sources (deep wells, semi-deep wells, and aqueducts) on the EC estimation models; and (3) explore mathematical relationships among groundwater quality variables at spatial and temporal scales for the various groundwater resource types, and validate them.
Research concepts and steps
The results of the current research lead to the extraction of the mathematical relation governing the qualitative variables of different types of groundwater resources that could be applied at different time and spatial scales. Figure 1 displays the steps of the current research. The Köppen and Geiger climate classification scheme separates the climates into five main groups (A, B, C, D, and E).
Various types of soft-computing models, such as ANNs, can be used for a variety of applications. The learning process of ANN models resembles the way the human brain works (Haykin 1998; Ham & Kostanic 2001). Three basic characteristics determine the optimal ANN model: (1) the learning algorithm, (2) the activation function, and (3) the number of neurons. In the current study, the LM algorithm with a three-layer structure was used to train the ANN estimation models. The logsig, tansig, and purelin functions were applied as activation functions in the hidden and output layers. MATLAB software (Montaseri et al. 2018) was used to develop the ANN models, and a trial-and-error method was applied to determine the optimal number of neurons in the hidden layer of the ANN models (Barzegar et al. 2016).
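As a minimal sketch of the three activation functions named above, the following Python code gives their standard definitions and a forward pass through a network with one hidden layer and a single output node. The function `forward`, its argument names, and the weights used in the usage note are hypothetical and for illustration only; the study itself trained its models in MATLAB with the LM algorithm, which is not reproduced here.

```python
import math


def logsig(x: float) -> float:
    """Log-sigmoid: maps a real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig(x: float) -> float:
    """Tangent sigmoid: maps a real input into (-1, 1); equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def purelin(x: float) -> float:
    """Linear activation: returns the input unchanged."""
    return x


def forward(inputs, hidden_w, hidden_b, out_w, out_b,
            hidden_act=tansig, out_act=purelin):
    """Forward pass: input layer -> one hidden layer -> single output node.

    hidden_w holds one weight row per hidden neuron (hypothetical layout).
    """
    hidden = [
        hidden_act(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return out_act(sum(w * h for w, h in zip(out_w, hidden)) + out_b)
```

For example, `forward([0.4, 0.1], [[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1], [1.0, -1.0], 0.2)` evaluates a network with two hidden neurons; changing the number of rows in `hidden_w` corresponds to the trial-and-error search over the number of neurons.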
The membership functions (MFs) of the input variables Cl and K (x, y) are defined by A1, A2 and B1, B2 (LOW, LOW and HIGH, MEDIUM, respectively). The number of clusters is important in determining the effective radius value. The best radius values ranged from 0.200 to 0.600.
GEP is one of the newer evolutionary algorithms and is widely applicable because of its high accuracy (Ferreira 2006). The design and implementation steps of the GEP method include: (1) defining the fitness function; (2) defining the terminals and functions; (3) determining the structure of chromosomes (number of generations, length, and number of genes); (4) determining the linking function of genes; (5) specifying the operators; and finally (6) executing the method (Ferreira 2006). For the terminal set (T), the SO4, Cl, SAR, K, Mg, and Ca variables were selected based on their significant correlation coefficients with the EC variable.
Case study and dataset
In this study, the groundwater quality data and the volumes of agricultural, drinking, and industrial water use were collected by the Iranian Water Resources Management Company (http://wrbs.wrm.ir/) for the study period 2004–2018. The groundwater quality data covered three types of groundwater resources, namely deep wells, semi-deep wells, and aqueducts (z = 1, 2, and 3), for six basins with various climate conditions. Figure 3 displays the geographical location of the studied basins. The characteristics of the EC variable for the selected basins are listed in Table 1. The maximum variation coefficients of the EC variable for deep wells, semi-deep wells, and aqueducts, with values of 1.339, 1.421, and 1.383, were found in the areas of the UL, KM, and SR basins, respectively. On the other hand, the minimum variation coefficient for deep wells, 0.663, was found in the HJM basin, and those for semi-deep wells and aqueducts, 0.740 and 0.259, were found in the K basin.
To create soft-computing models for EC estimation, 75% and 25% of the collected data were used for training and testing, respectively. Also, the SO4, Cl, SAR, K, Mg, and Ca variables (meq L−1) were selected as input variables for EC estimation because of their significant Pearson correlation coefficients (α = 0.050) with the EC variable.
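The input screening and data partitioning described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, the split is assumed to keep the records in their original order (the text does not say whether the partition was ordered or random), and the significance test at α = 0.050 is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_train_test(records, train_fraction=0.75):
    """Split records into training and testing parts, preserving order.

    Assumption: an ordered split; a random split would be equally
    consistent with the 75%/25% partition stated in the text.
    """
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]
```

A candidate input such as Cl would be retained when `pearson_r(cl_values, ec_values)` is significantly different from zero, where `cl_values` and `ec_values` are hypothetical lists holding the measured series.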
Calculating the model's performance
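The R, RMSE, and MAE criteria reported in the results can be computed as in the following minimal sketch, which assumes the standard definitions of these statistics applied to observed and estimated EC series. The function names and the sample values in the usage note are hypothetical, and the RE criterion is omitted because its exact definition is not given in this text.

```python
import math


def rmse(observed, estimated):
    """Root mean squared error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error between observed and estimated values."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


def correlation_r(observed, estimated):
    """Pearson correlation coefficient (R) between the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)
```

With the made-up series `observed = [100.0, 200.0, 300.0, 400.0]` and `estimated = [110.0, 190.0, 310.0, 390.0]`, every error has a magnitude of 10, so both `rmse` and `mae` return 10 in the units of the data (μS cm−1 for EC).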
RESULTS AND DISCUSSION
Status of the groundwater resources
During the study period, the total number of deep wells increased in the areas of all studied basins. Deep wells located in the K and KM basins (Bsh and Bsk climate classes), semi-deep wells located in the UL, G, and HJM basins (Dsa, Bwk, and Bwh climate classes), and aqueducts located in the SR basin (Csa climate type) had the highest average EC values. Also, in the areas of the K and HJM basins, which have dominant Bsh and Bwh climate types, minimum average EC values in semi-deep wells and aqueducts, and greater exposure to climate change, groundwater resources have begun to meet water needs as surface water resources have declined. The results showed that the highest water abstraction for agricultural uses, together with the highest average EC for deep wells and aqueducts, occurred in the KM basin. Also, the lowest water abstraction for agricultural use, together with the lowest average EC for deep wells, occurred in the SR basin. Therefore, in most cases, groundwater resources of low quality (higher salinity) are prioritized for agricultural uses. Areas with warmer and drier climates had groundwater resources of lower quality and higher agricultural use.
EC estimation using ANN, ANFIS, and GEP
Three soft-computing approaches, namely ANN, ANFIS, and GEP, were applied to estimate groundwater quality variables of deep wells, semi-deep wells, and aqueducts in different areas of Iran with various climatic conditions. The results of the optimal ANN, ANFIS, and GEP models by groundwater resource type in the areas of the studied basins are listed in Table 2. The numbers of neurons in the hidden layer of the three-layer ANN models for deep wells, semi-deep wells, and aqueducts were (2, 5, 2), (3, 2, 4), (2, 2, 4), (4, 5, 2), (2, 4, 2), and (3, 2, 2) for the areas of the UL, SR, K, KM, G, and HJM basins, respectively. The activation function of the output nodes was either linear (purelin) or tangent sigmoid (tansig) for all areas and groundwater resource types. The activation functions of the hidden nodes of the ANN models were (tansig, tansig, tansig), (logsig, tansig, tansig), (tansig, tansig, tansig), (tansig, tansig, tansig), (logsig, tansig, tansig), and (tansig, tansig, logsig) for the groundwater resource types (z = 1, 2, and 3) located in the areas of the UL, SR, K, KM, G, and HJM basins, respectively. The radius values of the ANFIS models for deep wells, semi-deep wells, and aqueducts were UL: 0.260, 0.320, 0.350; SR: 0.420, 0.370, 0.510; K: 0.350, 0.410, 0.280; KM: 0.550, 0.230, 0.480; G: 0.550, 0.220, 0.430; and HJM: 0.330, 0.270, 0.460, respectively. The R values of the soft-computing models are close to 1, in the order RGEP > RANFIS > RANN for all groundwater resource types and studied areas. The ANFIS models outperformed the ANN models, and the GEP models performed better in EC estimation than both the ANFIS and ANN models for all areas (see Supplementary Data, Figure S1 for more details).
The values of the coefficient a of the equation (y = ax) fitted to the observed and residual values of the optimal GEP models for deep wells, semi-deep wells, and aqueducts were UL: 0.051, 0.012, 0.004; SR: 0.090, 0.003, 0.027; K: 0.002, 0.055, 0.005; KM: 0.002, 0.005, 0.011; G: 0.002, 0.011, 0.027; and HJM: 0.016, 0.040, 0.049, respectively. A value of a close to 0 indicates the randomness and independence of the estimated values of the optimal GEP models. The minimum value of a, 0.002, was obtained for deep wells of the Bsh, Bsk, and Bwk climate classes, all of which belong to the arid climate group. These results indicated that the performance of the models is influenced by the climatic characteristics and groundwater resource types. The variation coefficients of a, considering all types of groundwater resources, for the Bwh, Bsk, Bwk, Csa, Dsa, and Bsh climate categories were 0.478, 0.718, 0.942, 1.133, 1.129, and 1.457, respectively. These results reflected that the Bwh and Bsh climate types, both in the arid climate group, had the lowest and the highest effect, respectively, on the performance of the WGEP model in estimating EC values for the various groundwater resource types. Also, the variation coefficients of a for aqueducts, semi-deep wells, and deep wells were 0.835, 1.025, and 1.325, respectively. Therefore, aqueducts and deep wells located in the six study climates had the lowest and the highest effect, respectively, on the performance of the WGEP model in EC estimation. The models' performance (R-values) was best for deep wells and aqueducts located in the areas of the KM basin, with the Bsk dominant Köppen climate class, and for semi-deep wells located in the areas of the G basin, with the Bwk dominant class. The models' performance for deep wells, semi-deep wells, and aqueducts was poorest in the areas of the SR, UL, and K basins, associated with the Csa, Dsa, and Bsh dominant Köppen climate classes, respectively.
The Csa climate type is a climate where the coldest month is warmer than –3 °C but colder than +18 °C and summers are dry and hot. The Dsa type is a climate where at least one month is colder than –3 °C and summers are dry and hot. The Bsh type is a climate whose mean annual temperature is greater than or equal to 18 °C and which is too dry to support a forest but not dry enough to be a desert, usually consisting of grassland plains. The climatic conditions and the type of groundwater resource affect only the level of the models' performance, not their ranking. However, the results of all three methods could be acceptable for estimating the EC variable in groundwater resources with various climatic conditions. The RE values of the EC data estimated via GEP models at different ranges (25%max, 50%mid, 25%min) are listed in Table 3. The maximum RE values, 1.231 and 1.215, occur in the areas of the SR and K basins, respectively, in the 25%min range. Also, the RE values of the EC data estimated by the GEP models were acceptable in all cases. For semi-deep wells, the performance improvement of the GEP models was approximately 31, 76, 53, 70, 52, and 49% compared to the ANN models and 26, 60, 27, 53, 35, and 46% compared to the ANFIS models in the areas of the UL, SR, K, KM, G, and HJM basins, respectively.
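The coefficient a of the fitted line y = ax and the variation coefficients of a discussed in this section can be illustrated with the short sketch below. It assumes an ordinary least-squares fit through the origin and a variation coefficient defined as the sample standard deviation divided by the mean; the text does not state which fitting method or which standard deviation (sample or population) was used, so the code is not expected to reproduce the reported values exactly. The function names are hypothetical.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope a of the line y = a*x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def variation_coefficient(values):
    """Coefficient of variation: sample standard deviation / mean.

    Assumption: the sample (n - 1) form of the standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean
```

Here `x` would hold the observed EC values and `y` the corresponding residuals; a slope near zero means the residuals show no linear dependence on the observed values. Calling `variation_coefficient` on the three a values of one basin (deep wells, semi-deep wells, and aqueducts) gives a per-climate figure of the kind quoted above.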
EC estimation using WGEP
To improve the performance of the GEP models and construct the hybrid GEP estimation models with wavelet tools (WGEP), the first step is to decompose the groundwater quality variables into main (approximation) and detail subseries (A and D subseries) via a wavelet tool. To develop the WGEP models, the D subseries are treated as noise and removed from the models. After data de-noising (see Tables S1 and S2 for more details), the decomposed subseries values are estimated separately with GEP models. The results of the optimal WGEP models during the test period are listed in Table 4. The RMSE values of the WGEP models with the db4 mother wavelet varied from 162.068 to 348.911, 73.802 to 171.376, 29.465 to 351.489, 118.149 to 311.798, 217.667 to 430.730, and 76.253 to 162.992 (μS cm−1) in the areas of the UL, SR, K, KM, G, and HJM basins, respectively, for the various groundwater source types. Figure 4 displays the observed and estimated EC variable during the test period using the WGEP method. The R values of the optimal WGEP models were greater than 0.910 for all areas and groundwater source types. The R, RMSE, and MAE values and the graphical methods all reflected the best performance of the WGEP models.
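The de-noising step described above (decompose a series into A and D subseries, discard D, and reconstruct) can be sketched as follows. Two simplifications are assumed and should be kept in mind: the sketch uses the Haar wavelet instead of the db4 mother wavelet used in the study, so that it stays self-contained, and it applies a single decomposition level, since the level used in the study is not stated here. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal):
    """One-level Haar transform: returns (approximation A, detail D)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose: rebuilds the series from A and D."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def denoise(signal):
    """Treat the D subseries as noise: zero it and reconstruct from A only."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct(approx, [0.0] * len(detail))
```

With the Haar wavelet, dropping D replaces each pair of neighbouring values by their mean, which makes the smoothing effect easy to inspect. A db4 decomposition follows the same decompose, discard, and reconstruct pattern with longer filters.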
Wavelet analysis and the development of WGEP models greatly enhance the performance of GEP models in all climate types and groundwater sources, by removing data noise and by establishing very complex nonlinear relationships in the model structure. The results indicated that the performance improvement of the WGEP models compared to GEP varied from 17 to 35%, 13 to 32%, and 17 to 46% for deep wells, semi-deep wells, and aqueducts of the study areas, respectively.
The WGEP model's performance (R-values) for deep wells, semi-deep wells, and aqueducts was best in the areas of the KM basin, associated with the Bsk dominant Köppen climate class. The Bsk class denotes a climate whose mean annual temperature is less than 18 °C and which is too dry to support a forest but not dry enough to be a desert, usually consisting of grassland plains. The main advantage of GEP over the other soft-computing methods (ANFIS and ANN) is that it produces predictive equations. The equations obtained with the optimal WGEP models are listed in Table 5. The fitted equations can be applied at variable spatial and temporal scales. Because of the high EC estimation performance in the KM basin for all three types of groundwater resources, it was selected as a ‘basic basin’ for validating the extracted mathematical equations in the other study basins. The validation results are listed in Table 6. Our results reflected the high ability of the mathematical equations extracted by the WGEP model to estimate the EC of the corresponding groundwater resource types in the areas of various basins. For instance, the R values of the equations extracted for deep wells were 0.984, 0.921, 0.968, 0.993, and 0.989 in validation for the UL, SR, K, G, and HJM basins, respectively. The highest R-value in the validation is related to the basin with an arid climate, which is in the same climate category as the basic basin. The results of the present study are in agreement with the findings of Khudair et al. (2018), Zaqoot et al. (2018), Aryafar et al. (2019), Maroufpoor et al. (2019), and Chen et al. (2020).
According to the main results of our research, the climate categories and types of groundwater resources had a major impact on the level of the models' performance in estimating groundwater quality, but not on the ranking of the applied models. The ranking of the models' performance was RWGEP > RGEP > RANFIS > RANN, regardless of the climate classes and groundwater resource types. Our results strongly confirm the high EC estimation ability of the WGEP model, which improves on GEP through a new structure of data de-noising. These results carry an important message: data noise affects the performance of soft-computing models in estimating EC values. The Bwh and Bsh climate types had the lowest and the highest effect, respectively, on the performance of the WGEP model in estimating the EC values for the various groundwater resource types. On the other hand, aqueducts and deep wells located in the six study climates had the lowest and the highest impact, respectively, on the EC values of the WGEP model. The percentage improvement of the WGEP models compared to GEP ranged from 13 to 46% for deep wells, semi-deep wells, and aqueducts of the study areas. Also, the R-values of the optimal WGEP models (>0.910) for all areas and groundwater source types are in line with the suitable performance of the extended models. The RE values of the WGEP models varied from 0.033 to 1.231 for the three ranges (25%min, 25%max, and 50%mid), which could confirm the optimal estimation of the extended new-structure model. Because uncertainty in meteorological-hydrological variables is undeniable, and because the introduced structure of soft-computing methods could improve model performance by eliminating data noise, selecting meteorological and quantitative hydrological variables as model inputs for EC estimation can be suggested for future research.
The main practical finding of our research is the high ability of the mathematical equations extracted by the WGEP model to estimate the EC of the corresponding groundwater resource types in the areas of other basins.
The authors would like to thank Sari Agricultural Science and Natural Resources University for financing this research [Code Number: 02-1399-08].
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.