The present study integrates co-kriging as spatial estimator and self-organizing map (SOM) as clustering technique to identify spatially homogeneous clusters of groundwater quality data and to choose the most effective input data for feed-forward neural network (FFNN) model to simulate electrical conductivity (EC) and total dissolved solids (TDS) of groundwater. The methodology is presented in three stages. In the first stage, a geostatistics approach of co-kriging is used to estimate groundwater quality parameters at locations where the groundwater levels are measured. In stage two, a SOM clustering technique is used to identify spatially homogeneous clusters of groundwater quality data. The dominant input data, selected by spatial clustering and mutual information are then imposed into the FFNN model for one-step-ahead predictions of groundwater quality parameters at stage three. The performance of the newly proposed model is compared to a conventional linear forecasting method of multiple linear regression (MLR). The results suggest that the proposed model decreases dimensionality of the input layer and consequently the complexity of the FFNN model with acceptable efficiency in spatiotemporal simulation of groundwater quality parameters. The application of FFNN for modeling EC and TDS parameters increases the accuracy of predictions respectively up to 84.5% and 17% on average with regard to the MLR model.

INTRODUCTION

Groundwater quality is influenced by the geological formation and anthropogenic activities, e.g., changes in land use, urbanization, intensive irrigated agriculture, mining activities, disposal of untreated sewage in rivers, lack of rational management, etc. and clearly the groundwater contamination can cause various diseases and problems. In developing countries, as well as Iran, aquifers are experiencing an increasing threat of pollution from urbanization, industrial development, agricultural activities, and mining enterprises, because of their over-exploitation.

Groundwater quality management, as part of water resource management, requires adequate information about the chemical composition of groundwater which is controlled by many factors, including the composition of precipitation, mineralogy of watersheds, and aquifers’ climate and topography. Such factors may combine to create diverse water types that change in composition spatially and temporally and cause the modeling of groundwater quality to become a complex issue.

Quantity and quality modeling of groundwater systems possesses features such as complexity, nonlinearity, being multi-scale and random, all governed by natural and/or anthropogenic factors, which complicate the dynamic predictions. Therefore, several hydro-chemical models have been developed for simulation of groundwater complex systems. Models based on their involvement of physical characteristics generally fall into three main categories: black box models, conceptual models, and physical-based models (Nourani et al. 2014a). The conceptual and physical-based models are the main tools for predicting hydrological variables and understanding the physical processes that are taking place in the system. However, they have a number of practical limitations, including the need for large amounts of hydro-geological data, sophisticated programs for calibration using rigorous optimization techniques, and a detailed understanding of the underlying physical processes. Owing to the recognized limitations of these models, data-based methods such as artificial neural network (ANN) models (i.e., black box models) have been widely used for forecasting hydro-chemical time series.

Models based on their involvement of physical characteristics generally fall into three main categories: black box, conceptual, and physical-based models (Nourani & Mano 2007). The conceptual and physical based models are the main tools for predicting variables and understanding the physical processes involved in a system. However, they have a number of practical limitations, including the need for large amounts of field data and a detailed understanding of the underlying physical process. If sufficient field data are not available, and accurate predictions are more important than understanding the actual physics of the situation, black box models such as ANN remain a good alternative method and can provide useful predictions without the costly calibration time.

On the other hand, a number of mathematical techniques such as finite difference (FDM), finite volume (FVM), finite element (FEM), and boundary element (BEM) methods have been widely applied to solve the governing physical-based partial differential equation (PDE) of the process. Such numerical solutions of highly nonlinear and complex governing PDEs (such as Richards’ PDE for modeling flow through porous media and dispersion-diffusion PDE for simulating contamination concentration) in the lack of sufficient field data may be prone to some problems, instability, and weak convergence. Therefore, as an alternative, in this study the ANN model was employed for handling nonlinear time variability of the phenomenon to cope with the limitations of FEM (or FDM, etc.) method in temporal modeling of the process. Even so, to use such an ANN modeling approach, the user should have a good understanding of the physical processes involved in the aquifer system and such methods should be used with good engineering knowledge and judgments.

In recent years, the ANN has been used as an alternative approach for the estimation of aquifer water quality parameters (Maier & Dandy 1996). Lin & Chen (2005, 2006) proposed ANNs to determine aquifer parameters and examined the efficiency of back-propagation network (BPN) and radial basis function network (RBFN) models. Maier & Dandy (1996) presented a method using ANN to forecast salinity in water resources. Banerjee et al. (2011) evaluated the prospect of ANN simulation in estimating the safe pumping rate to maintain groundwater salinity in island aquifers. The proposed ANN model with the optimal number of spatial and temporal variables has surfaced as a simpler and more accurate alternative to the numerical method techniques. Cho et al. (2011) evaluated the predictive performance of four different models: multiple linear regressions (MLR), principal component regression (PCR), ANN, and the combination of principal components and an ANN (PC-ANN) for prediction of the arsenic contamination of groundwater for Southeast Asian countries. The modeling results showed that the prediction accuracy of PC-ANN is superior among the four different models. Nadiri et al. (2013) introduced a supervised committee machine based on artificial intelligence (AI) method to predict fluoride in groundwater of Maku, Iran. Four AI models, Sugeno fuzzy logic, Mamdani fuzzy logic, ANN, and neuro-fuzzy were employed to predict fluoride concentration. The results showed significant fitting improvement with regard to an individual AI model, especially for fluoride prediction in the mixing zone. Orzepowski et al. (2014) evaluated the quality of ground- and surface water in rural areas of the Silesian Lowlands under variable climatic conditions of Southwest Poland. The research revealed that statistical analysis employing ANN modeling can be a useful tool for estimation of water content in soils exposed to various climatic scenarios.

In spite of the reliable ability of the ANNs in temporal predictions, they found rare application for the spatial modeling of the environmental processes. Instead, powerful interpolating geostatistics tools have been widely used for unbiased estimation of the spatial variables at a given point. Geostatistics has made rapid advances in recent years since it was first developed by Matheron (1963). Recently, the term geostatistics has been used more generally to describe all applications of statistics in hydrogeology in which the attribute is a random field in space. The heterogeneity of the subsurface is often difficult to characterize adequately for use in deterministic models; therefore, geostatistical techniques are usually used to generate estimates of parameters in mathematical models where parameters are random variables in space.

For groundwater problems, attributes such as groundwater quality parameters are sampled at a limited number of sites whereas values at unsampled sites are sometimes needed for hydro-environmental management. Geostatistical techniques such as kriging and co-kriging can be applied to estimate the values of attributes in such unsampled sites. A comprehensive review of the applications of geostatistical tools to hydrogeology can be found in the ASCE Task Committee report (ASCE 1990). Also, a few papers have reported some applications of the geostatistics tools for groundwater quality predictions (e.g., Gaus et al. 2003; Barca & Passarella 2007).

To establish hydro-chemical classes of groundwater in a study, it is often necessary to use different approaches and methods for the graphical, spatial, and temporal representation of the relevant data. Multivariate data analysis has previously been used by researchers as part of the interpretation of groundwater quality (e.g., Voudouris et al. 2000; Lambrakis et al. 2004; Baalousha 2010). Cluster analysis is a statistical method of partitioning samples into homogeneous classes to produce an operational overview on data sets. The importance of cluster analysis techniques is evident in classifying multivariate data into sub-groups. The need for clustering techniques in hydro-environmental fields, especially groundwater-related researches, has been raised in recent decades (see Kim et al. 2003; Subyani & Al Ahmadi 2010; Chang et al. 2011; Aris et al. 2012).

A clustering technique may be employed as a spatial clustering method to improve the performance of the hydrological modeling. In the context of ANN-based hydrological modeling, clustering is usually performed for classification of the data, stations or zones into homogeneous classes (Nourani & Kalantari 2010) and/or for optimization of the model structure by selecting dominant and relevant inputs (Nourani & Parhizkar 2013; Karunasingha & Liong 2015). Clustering techniques identify structure in an unlabeled data set by objectively arranging data into homogeneous groups, where the within-group-object dissimilarity is minimized, and the between group-object dissimilarity is maximized. Therefore, the training of ANNs using such homogeneous data can lead to much better outcomes.

Among the various clustering methods, the self-organizing map (SOM) presented by Kohonen (1998) operates as an effective tool to convert the complex, nonlinear, statistical relationship between high-dimensional data items into a simple, geometric relationship on a low-dimensional display so as to allow the number of clusters to be determined by inspection. The SOM-based classification is attractive, due to its topology preserving properties for solving various problems that traditionally have been the domain of conventional statistical and operational research techniques. SOMs have been successfully accepted in science and engineering issues because they can yield unbiased and visualized results. Several studies have been carried out on SOM-based classification of groundwater quality parameters to develop regional monitoring network and chemical analyses (e.g., Hong & Rosen 2001; Sanchez-Martos et al. 2002; Peeters et al. 2007).

Modeling and understanding the quality of groundwater is as important as its quantity, because it is the main factor for determining the suitability of water for drinking, domestic, agricultural, and industrial usages. Two of the most important indicators of aquifer contamination are electrical conductivity (EC) and total dissolved solids (TDS). Most of the salts in water are present in the ionic form and are capable of conducting electric current, also EC is an important criterion in determining the suitability of water and wastewater for irrigation and is essential to assess water pollution water treatment processes. TDS is a measure of the amount of material dissolved in water and is a measure of the ‘freshness’ of water. Increasing levels of TDS and EC in an aquifer indicate that the aquifer is contaminated.

This study proposes, for the first time, a new method for time-space modeling of groundwater quality parameters (EC and TDS) based on the spatial estimator of co-kriging, spatial clustering technique of SOM, and temporal forecasting model of feed-forward neural network (FFNN). The proposed method is tested for single-step-ahead forecasting of groundwater EC and TDS of the Ardabil plain in Northwest Iran.

In the next sections, the concepts of the multivariate estimation method of co-kriging, SOM, FFNN, and mutual information (MI) are briefly reviewed. The following sections describe the study area and data sources, the proposed methodology, and the evaluation criteria. The results obtained using the proposed methodology are then presented and discussed, followed by some concluding remarks.

MATERIALS AND METHODS

Geostatistical modeling by co-kriging

Co-kriging is a spatial interpolation method derived from kriging, which is an optimal, in the sense of being a best linear unbiased estimator, interpolation method (Cressie 1993). In co-kriging, additional information from other variables, so-called co-variables (secondary variables), are incorporated into the analysis of the main variable. There are two situations when co-kriging usually provides better estimates than most other interpolation methods – the under sampled and the isotropic case (Myers 1982). In the first case, the primary variable is measured sparsely and additional, highly correlated, data are sampled also at additional locations. By including these stations the area is spatially better covered and more information is gained. Isotropic means that more than one variable is measured at every spatial location.

The aim of co-kriging is to estimate the primary variable of Z1 (here, EC or TDS concentration) at some locations where no observations exist according to primary variable and secondary variable of Z2 (here, groundwater level (GL)) in some sample points. For this, a weighted linear estimator is used as (Cressie 1993): 
formula
1
and present the locations of primary and secondary variables. n and m denote the numbers of piezometers that measure primary and secondary variables, respectively. denotes a vector of geographical positions for the estimates.
For an unbiased estimator, the mean of the estimates must be equal to the true mean. Also, the best linear unbiased estimator must have minimum variance of estimation error (Isaaks & Srivastava 1989). The minimization of the estimation error variance under the constraint of unbiasedness leads to a set of simultaneous linear algebraic equations for co-kriging system with (n+ m+2) unknowns and an equal number of equations, according to Equations (2) and (3), as follows: 
formula
2
 
formula
3

and present cross-covariances that measure covariance between primary and secondary variables where λ1i and λ2j are weighting factors and can be determined by solving a nonlinear optimization problem using the Lagrange multipliers μ1 and μ2.

It is more common to write Equation (3) in terms of variogram. Cross-variogram functions describe the spatial correlation structure between different variables. The cross-variograms (and variograms when i=j, i.e., the correlation structure within one variable) can be expressed as (Isaaks & Srivastava 1989): 
formula
4
, , and present covariance, variance, and semi-variogram functions, respectively.

Self-organizing map

The SOM is an effective software tool for the visualization of high-dimensional data. It implements an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. Thereby it is able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display while preserving the topology structure of the data (Kohonen 1998). The way SOMs go about reducing dimensions is by producing a map of usually one or two dimensions which plot the similarities of the data by grouping similar data items together. Thus SOMs accomplish two things: they reduce dimensions and display similarities. The SOM network generally consists of two layers, an input layer and a Kohonen layer. The input layer is fully connected to the Kohonen layer, which in most common applications is two-dimensional. A two-level SOM neural network is a promising approach to catch a preliminary overview on intricate data sets. It augments the conventional SOM network with an additional one-dimensional Kohonen layer in which each neuron is connected to neurons in the previous Kohonen layer. The schematic view of the two-level SOM network is shown in Figure 1.
Figure 1

Architecture of the two-level SOM neural network (Hsu & Li 2010).

Figure 1

Architecture of the two-level SOM neural network (Hsu & Li 2010).

The SOM is trained iteratively; initially the weights are randomly assigned. When the n-dimensional input vector x is sent through the network, the distance between the weight w neurons of SOM and the inputs is computed. The most common criterion to compute the distance is Euclidean distance (Kohonen 1998): 
formula
5
The weight with the closest match to the presented input pattern is called winner neuron or best matching unit (BMU). The BMU and its neighboring neurons are allowed to learn by changing the weights at each training iteration t, in a manner to further reduce the distance between the weights and the input vector (Kohonen 1998): 
formula
6
where is the learning rate, ranging in [0 1], l and m are the positions of the winning neuron and its neighboring output nodes, and hlm is the neighborhood function. The most commonly used neighborhood function is the Gaussian (Kohonen 1998): 
formula
7
where hlm is the neighborhood function of the best matching neuron l at iteration t and l-m is the distance between neurons and on the map grid; and is the width of the topological neighborhood. The training steps are repeated until convergence. After the SOM network is constructed, the homogeneous regions, i.e., clusters, are defined on the map.

Feed-forward neural network

The FFNN is widely applied in hydrology and water resource studies as a forecasting tool. It has already been demonstrated that an FFNN model trained by the back-propagation (BP) algorithm with three layers is satisfactory for forecasting and simulating hydrological problems (Hornik et al. 1989; ASCE 2000). Three-layered FFNNs, which have usually been used in forecasting hydrologic time series, provide a general framework for representing nonlinear functional mapping between a set of input and output variables (Figure 2).
Figure 2

A three layered feed-forward neural network with BP training algorithm (Nourani et al. 2009).

Figure 2

A three layered feed-forward neural network with BP training algorithm (Nourani et al. 2009).

They are based on a linear combination of the input variables, which are transformed by a nonlinear activation function. The term ‘feed-forward’ means that a neuron connection only exists from a neuron in the input layer to other neurons in the hidden layer or from a neuron in the hidden layer to neurons in the output layer. The neurons within a layer are not interconnected.

In Figure 2, i, j, and k denote the input layer, hidden layer, and output layer neurons, respectively, and w is the applied weight by the neuron. The explicit expression for an output value of a three-layered FFNN is given by (Nourani et al. 2009): 
formula
8
where wji is a weight in the hidden layer connecting the i th neuron in the input layer and the j th neuron in the hidden layer, wjo is the bias for the j th hidden neuron, fh is the activation function of the hidden neuron, wkj is a weight in the output layer connecting the j th neuron in the hidden layer and the k th neuron in the output layer, wko is the bias for the k th output neuron, fo is the activation function for the output neuron, xi is i th input variable for input layer, and k and y are computed and observed output variables, respectively. NN and MN are the number of the neurons in the input and hidden layers, respectively. The weights are different in the hidden and output layers, and their values can be changed during the network training process.

In any FFNN-based modeling, there are two important points to which attention must be paid: first, the architecture, i.e., the number of neurons in the input and hidden layers, and second, the training iteration (epoch) number. Appropriate selection of these two parameters improves model efficiency in both training and testing steps. Furthermore, a high epoch number and poor quality or quantity of data could cause the network to over fit during the training step. If this occurs, the model cannot adequately generalize new data outside of the training set.

Mutual information

Shannon mathematically formulated entropy (H), now called the Shannon entropy or information content, for a discrete random variable of X with sample size of N (bin number), which takes values x1; x2; …; xN with probabilities of p1, p2, …, pN, respectively, as (Shannon 1948): 
formula
9
Usually as a conventional method, linear correlation coefficients between candidate input and output are employed to determine the dominant inputs of ANN (Partal & Cigizoglu 2008), but as criticized by Nourani et al. (2014b), it is more appropriate to use a nonlinear measure such as MI in a nonlinear modeling framework such as ANN, since sometimes there may be a weak linear but strong nonlinear relationship between input(s) and output.
MI between two random variables X and Y is defined as (Yang et al. 2000): 
formula
10
with H(A) and H(B) being the entropy of A and B, respectively, and H(A,B) their joint entropy as: 
formula
11

Study area and data set

The Ardabil plain aquifer (38° 03′–38°27′ N and 47°55′–48°20′ E) is located in the northwest of Iran in the province of Ardabil. The region experiences relatively long winters and this plain is well known as the coldest region of Iran with an average annual precipitation of about 300 mm. May and August are the wettest and driest months of the region, respectively. The mean temperature in Ardabil plain is about 9 °C. Figure 3 shows the location of the Ardabil plain.
Figure 3

Study area map and positions of piezometers.

Figure 3

Study area map and positions of piezometers.

Agriculture is the main occupation of the people in this area. Since 1980, the rapid expansion of Ardabil city, intense agricultural and some industrial activities have imposed stress on the Ardabil plain aquifers. According to Ardabil Regional Watercorp reports, the average use of drinking, industrial, and agricultural water in the Ardabil plain is about 26.4 and 177 (million m3/year), respectively, which accounts for 89% of total water demand supplied by groundwater with the remaining 11% being obtained from surface water. In the Ardabil plain 2,622 active pumping wells, 36 qantas, and 77 springs operate. The rapid increase in water demand has led to large-scale groundwater developments in this plain, and intense extraction of groundwater has caused GL to decline as much as 12 m during the past 25 years in the plain.

There are two high mountains located on the Ardabil plain, Alborz and Sabalan. Hydro-chemical studies indicated that geological units have the greatest effect on the groundwater quality in this plain. Ardabil plain has two aquifers; one is located on top of the other. The upper aquifer is multi-layer with varying thickness (20 to 40 m) across the plain and is extracted for different kinds of water supply. The number of penetrating wells to the lower aquifer is very low and may not be extracted because of its poor water quality. The general direction of groundwater flow is from other directions to the north-west of the plain.

The rock units consisting of conglomerate with some sandstone, marl, fresh water limestone, pumice, tuff and lahar of Neogene age are located in the southwestern Ardabil plain. These formations have a very low effect on aquifer recharge. Other rock units including cherty limestone, cherty dolomite, sandstone and conglomerate with thin beds of gypsum in the northeast and megaporphyric trachy andesite, trachy basalt, volcanic breccias, olivine basalt, tuff and sandy tuff located in the south, east, and north of the plain have moderate potential from the aquifer recharge point of view (Kord et al. 2013). The geological map of the Ardabil area is shown in Figure 4.
Figure 4

Geological map of the Ardabil plain.

Figure 4

Geological map of the Ardabil plain.

In the southern and northern parts of the Ardabil plain, although they are areas of recharge, Na-Cl water type is observed. These parts of the aquifer, which have a high concentration of sodium and chloride (poor chemical quality), are in direct contact with evaporated deposits (Kord et al. 2013).

The geo-chemical parameters (EC and TDS) of 38 piezometers, from 1996 to 2013, measured twice a year, were gathered and used in this study. One measurement per year belongs to the wet season in which the GL reaches its highest level and the other in the dry season when the level is at its lowest level. Also for the study period, GL time series of 26 piezometers, rainfall time series observed in five stations, and observed runoff time series at the outlet of the basin, all in monthly time scale, were also employed in this study. Figure 3 presents the locations of piezometers and stations throughout the plain. A statistical summary of the groundwater quality, quantity and hydrological data is presented in Table 1.

Table 1

Statistics of used data

Parameter Unit Max Min Average Standard deviation 
Rainfall mm 156 17.52 22.89 
Runoff m3/s 3,076.6 176.94 297.724 
GL m 1,458.45 1,299.31 1,339.416 39.05 
EC  5,480 148 1,115.763 800.66 
TDS Mg/lit 3,836 22.6 691.91 −541.153 
Parameter Unit Max Min Average Standard deviation 
Rainfall mm 156 17.52 22.89 
Runoff m3/s 3,076.6 176.94 297.724 
GL m 1,458.45 1,299.31 1,339.416 39.05 
EC  5,480 148 1,115.763 800.66 
TDS Mg/lit 3,836 22.6 691.91 −541.153 

Input and output variables for ANN modeling are usually normalized to eliminate their dimensions. The following simple linear mapping of the variables is the most common method for this purpose (Nourani & Kalantari 2010): 
formula
12
where is the actual value and is the respective normalized value. and are the minimum and maximum of the values, respectively. The normalized data were divided into training and verification sets. About 12 years of data were used for the training and the remaining 5 years for the validation purpose.

Model precision evaluation

Efficiency criteria in spatial clustering

To evaluate the performance of clustering results produced by the SOM, the silhouette coefficient is used as the measure of cluster validity (Hsu & Li 2010). The silhouette coefficient of a cluster can indicate the degree of similarity of piezometers within a cluster defined as (Hsu & Li 2010): 
formula
13
where S(i) is the silhouette of piezometer i, a(i) measured as a Euclidean distance, is the average dissimilarity of cluster i to all other piezometers in cluster A; and b(i) is the least average dissimilarity of piezometer i to the piezometers within a cluster different from cluster A. Thus, a smaller S(i) value indicates a better similarity among piezometers within the same cluster. The overall quality of a clustering distribution can then be measured using the average silhouette width for the entire data set.

Efficiency criteria in prediction stage

In this study, two different criteria are used to measure the efficiency of the forecasting method: the root mean square error (RMSE) and the determination coefficient (DC) (Nourani & Kalantari 2010). The RMSE and DC demonstrate discrepancies between predictions and observations as: 
formula
14
 
formula
15
where DC, RMSE, N, , , and are determination coefficient, root mean square error, number of observations, observed data, computed values, and mean of observed data. In a best model, DC and RMSE go to one and zero, respectively. Legates & McCabe (1999) indicated that a hydrological model can be sufficiently evaluated by these two statistics.

Proposed methodology

To optimize the input layer and improve the efficiency of the FFNN-based groundwater quality model of the plain, the groundwater quality data (EC and TDS) were spatially estimated and classified, using a newly proposed hybrid method involving the co-kriging and SOM techniques. For the modeling, the dominant input data selected by SOM and MI were imposed into the FFNN model for single-step-ahead groundwater quality parameters modeling.

The proposed methodology considers the coupling of three stages for prediction of groundwater quality parameters (see Figure 5). At the first stage, since most of the piezometers which monitor the GL and groundwater quality piezometers are not co-locations, the co-kriging method and GL data observed in piezometers (as secondary parameter) are used to estimate quality parameters (as primary parameter) in the locations of GL monitoring. In stage 2, the SOM spatial clustering method is employed to classify piezometers based on GL and quality parameters. Thereafter, the central piezometer of each cluster is identified. At stage 3, first, dominant inputs are selected among groundwater quality (EC and TDS), GL and hydrological parameters (rainfall (P) and runoff (R)) data of central piezometer for each cluster using MI. Finally, the dominant inputs are imposed into the FFNN as a nonlinear forecasting method for predicting groundwater quality parameters in each cluster. The details of the obtained results are presented in the following sections.
Figure 5

Schematic diagram of the proposed methodology (T presents 6-monthly period).

Figure 5

Schematic diagram of the proposed methodology (T presents 6-monthly period).

To verify the proposed methodology, the method applied to a synthetic aquifer. In this example, two piezometers were selected from 26 groundwater level piezometers from different parts of the plain which represent different groundwater level patterns of the plain (i.e., GL1 and GL3, see Figure 3). Thereafter, best fitted equations for both piezometers data were consequently determined as: 
formula
16
 
formula
17
Finally, distinct noises (with normal distribution) and shifts were added to P1(t) and P3(t) equations to create five more synthetic GL time series (P′1 and P″1 according to P1; P′3, P″3 and P′′′3 according to P3; see Figure 6(a) and 6(b)). In this way, seven piezometers with synthetic GL time series data were considered in a hypothetical plain to verify the proposed clustering and forecasting methodology. According to obtained neighbor weighted distances (Figure 7), darker hexagonals divided the Kohonen layer approximately into two separate parts in the two-dimensional SOM and seven piezometers were correctly classified by spatial clustering method of SOM into two groups identifying piezometers of P1 and P3 as central piezometers of two clusters.
Figure 6

Synthetic GL timeseries: (a) P1, (b) P3.

Figure 6

Synthetic GL timeseries: (a) P1, (b) P3.

Figure 7

Two-dimensional SOM clustering of GL for synthetic data with SOM neighbor weighted distance plan.

Figure 7

Two-dimensional SOM clustering of GL for synthetic data with SOM neighbor weighted distance plan.

After clustering, ANN and MLR models were used to predict the GL one time step ahead. For the proposed methodology, the dominant inputs selected by SOM were used in FFNN and MLR models. The results are presented in Tables 2 and 3, respectively, which show the merit of the proposed methodology. In this way, owing to the use of synthetic data both FFNN and MLR could lead to good results, but for the real world data it is expected that the ANN will have a bit better result than the MLR model.

Table 2

Results of GL modeling via FFNN

     DC
 
RMSE (Normalized)
 
Cluster no. Input variables Output variable Network structure* Epoch Calibration Verification Calibration Verification 
Cluster 1 P2(t), P3 (t) P1(t + 1) 2-4-1 70 0.989 0.877 0.030 0.024 
Cluster 2 P5(t), P6(t), P7(t) P4(t + 1) 3-7-1 120 0.998 0.983 0.011 0.017 
     DC
 
RMSE (Normalized)
 
Cluster no. Input variables Output variable Network structure* Epoch Calibration Verification Calibration Verification 
Cluster 1 P2(t), P3 (t) P1(t + 1) 2-4-1 70 0.989 0.877 0.030 0.024 
Cluster 2 P5(t), P6(t), P7(t) P4(t + 1) 3-7-1 120 0.998 0.983 0.011 0.017 

*The result has been presented for the best structure. First, second, and third numbers denote input, hidden, and output neuron number, respectively.

Table 3

Results of GL modeling via MLR

  DC
 
RMSE (Normalized)
 
Cluster no. MLR model Calibration Verification Calibration Verification 
Cluster 1 P1(t + 1) = 0.74P2(t) + 0.291P3(t)-0.024 0.811 0.722 0.0331 0.0433 
Cluster 2 P4(t + 1) = 0.691P5(t) + 0.307P6(t) + 0.193P7(t)-0.068 0.823 0.710 0.028 0.033 
  DC
 
RMSE (Normalized)
 
Cluster no. MLR model Calibration Verification Calibration Verification 
Cluster 1 P1(t + 1) = 0.74P2(t) + 0.291P3(t)-0.024 0.811 0.722 0.0331 0.0433 
Cluster 2 P4(t + 1) = 0.691P5(t) + 0.307P6(t) + 0.193P7(t)-0.068 0.823 0.710 0.028 0.033 

RESULTS AND DISCUSSION

The results of the proposed FFNN modeling of EC and TDS using spatial estimator (co-kriging) and spatial clustering (SOM) techniques are presented in the following sub-sections. The FFNN results are also compared with those of the conventional MLR model.

Results of spatial estimation by co-kriging (stage 1)

Estimation of groundwater quality parameters in the locations where the GL are measured (GL piezometers) is the goal of this stage because the GL piezometers (26 piezometers) are mostly not co-locations with groundwater quality piezometers (38 piezometers). For this purpose, the geostatistical method of Co-Kriging is applied to estimate EC and TDS values in the location of GL piezometers.

Unlike ordinary kriging which only deals with the primary variable alone, co-kriging utilizes not only the primary variable (e.g., EC and TDS) but also cross-correlated secondary variables (here GL) in the spatial estimations and thus, co-kriging operates as a linear interpolator of both primary and secondary parameters. If only a limited number of observations are available for the primary variable of concern, knowledge of secondary variables that are correlated with the primary variable can be used to reduce the estimation error. The estimation error is thereby reduced since more information is being used for the estimation of the primary parameter.

The variogram measures dissimilarity, or increasing variance between points (decreasing correlation) as a function of distance. In addition to helping us assess how values at different locations vary over distance, the variogram provides a way to study the influence of other factors which may affect whether the spatial correlation varies only with distance (the isotropic case) or with direction and distance (the anisotropic case). The variogram map provides a visual picture of semi-variance in every compass direction. The cross-variogram calculates experimental semi-variogram values for two input variables and cross-variogram values for the combination of both variables. As two variables are handled simultaneously, the cross-variogram operation can be seen as the multivariate form of the spatial correlation operation.

The geostatistical model, having the least error, was selected by comparing the observed GL, EC, and TDS salinity values with the values estimated by variogram models.

Based on the fitted variograms and cross-variograms (between EC or TDS and GL values) models to the experimental variograms, spatial estimations of EC and TDS values for the area were carried out using co-kriging method. For this purpose, different variogram models (i.e., spherical, exponential, Gaussian; Myers 1982) were examined for spatial estimations of EC and TDS values for the piezometers that only monitor the GL values.

The performance of applied co-kriging method was also verified via cross-validation technique. Cross-validation is a process for checking the compatibility between a set of data, the spatial model, and neighborhood design. In cross-validation, each point in the spatial model is individually removed from the model, and then its value is estimated by a covariance model.

Since groundwater quality parameters have been measured twice a year, co-kriging estimations of EC and TDS were performed 34 times (for 17 years of study period) to create time series of quality parameters co-located with the GL monitoring piezometers.

The smallest RMSE and highest DC values in cross-validation step indicate the most accurate predictions via evaluating different variogram models. In this way, the Gaussian model was obtained as the best fitted variogram and cross-variogram models on EC, TDS, and GL data.

The theoretical fitted Gaussian semi-variogram is described by (Myers 1982): 
formula
18
By fitting a variogram model (e.g., Equation (16), Figure 8) with lag distance of h to the experimental variogram, the model parameters (sill, A, range, r, and nugget, C0) can be determined.
Figure 8

Gaussian semi-variogram model.

Figure 8

Gaussian semi-variogram model.

Table 4 illustrates the parameters of variogram models fitted to the experimental variograms and Table 5 presents the cross-validation results. It should be noted that since the co-kriging approach was applied to the data 34 times, the range (maximum–minimum) of obtained parameters has been presented for each parameter of the variograms (see Table 4). Consequently, Table 5 shows the range of efficiency criteria obtained through the cross-validation process. Also, Figure 9 shows an example of variogram and cross-validation results of TDS values.
Table 4

The parameters of semi-variogram and cross-variogram in co-kriging

Semi-variogram
 
Cross-variogram with GL
 
Variable Model Nugget Sill Range (Km) Model Nugget Sill Range (Km) 
EC Gaussian 0.154–0.21 0.17–0.4 19,150–24,600 Gaussian 0.0538–0.218 0.0005–0.003 18,333.5–23,151.5 
TDS Gaussian 0.055–0.193 0.357–0.477 17,242.9–21,879.4 Gaussian 0.0247–0.194 0.0007–0.003 18,821.1–19,873.9 
GL Gaussian 0.0001–0.0004 0.0006–0.0009 18,333.5–23,151.5     
Semi-variogram
 
Cross-variogram with GL
 
Variable Model Nugget Sill Range (Km) Model Nugget Sill Range (Km) 
EC Gaussian 0.154–0.21 0.17–0.4 19,150–24,600 Gaussian 0.0538–0.218 0.0005–0.003 18,333.5–23,151.5 
TDS Gaussian 0.055–0.193 0.357–0.477 17,242.9–21,879.4 Gaussian 0.0247–0.194 0.0007–0.003 18,821.1–19,873.9 
GL Gaussian 0.0001–0.0004 0.0006–0.0009 18,333.5–23,151.5     
Table 5

Cross-validation results

Variable RMSE DC 
EC 555.9–692.4 (0.58–0.75 
TDS 209.8–688 (mg/lit) 0.59–0.73 
Variable RMSE DC 
EC 555.9–692.4 (0.58–0.75 
TDS 209.8–688 (mg/lit) 0.59–0.73 
Figure 9

(a) Experimental and best-fitted Gaussian semi-variogram model for TDS, (b) scatter plot of cross-validation for TDS for dry season in 2004.

Figure 9

(a) Experimental and best-fitted Gaussian semi-variogram model for TDS, (b) scatter plot of cross-validation for TDS for dry season in 2004.

Results of SOM-based clustering (stage 2)

In groundwater quality modeling, researchers are often confronted with large, multidimensional data sets. An exploratory data analysis is often carried out, aiming at summarizing the available data, extracting useful information and formulating hypotheses for further research.

Implanted piezometers all over the plain can prepare informative data about the quality of groundwater. Application of clustering techniques (e.g., SOM) on available piezometers interspersed spatially over the plain can be an appropriate method to capture the adequate information of homogeneous piezometers.

Owing to the existence of various piezometers over the plain and the importance of managing groundwater resources, it is a necessity to unite the adequate information about groundwater quality in various regions of the plain and identify the dominant piezometers to predict groundwater quality parameters of the plain for the future. In order to accomplish the spatial clustering, SOM was utilized to identify similar and predominant piezometers.

For the clustering at first, a two-dimensional SOM was applied to the groundwater quality (EC and TDS) data as input to classify the piezometers into the clusters with similar groundwater quality patterns. The purpose of such a two-dimensional SOM clustering was to have an overview on homogeneous regions and the approximate number of clusters with regard to the plain topology. In order to apply the proposed two-step SOM, the size of the Kohonen layer was considered to be 6 × 6 for the EC parameter in the first step. Figure 10(a) shows the hits plan of the output layer size of 6 × 6 for the EC parameter. The hits plan is an illustration of a SOM output layer, with each neuron showing the number of classified input vectors. The relative number of vectors for each neuron is shown via the size of a colored patch. Figure 10(b) shows the neighbor weighted distances obtained by the two-dimensional SOM, where the small hexagons represent the neurons. The colors in the linear (large) hexagons indicate the distances between neurons, with the darker colors representing larger distances, and the lighter colors representing smaller distances. According to the obtained neighbor weighted distances for EC (Figure 10(b)), darker linear hexagons divide the Kohonen layer approximately into five parts, in the two-dimensional SOM. Second, in order to be certain of the highlighted clusters, a one-dimensional SOM was used to classify the piezometers into a specific number of clusters, determined in the first step (i.e., 1 × 4). At this one-dimensional SOM, the number of neurons in the Kohonen layer was set equal to the number of clusters determined by two-dimensional SOM (i.e., four clusters).
Figure 10

Two-dimensional SOM clustering of EC data: (a) SOM hits, (b) SOM neighbor weighted distance plan.

Figure 10

Two-dimensional SOM clustering of EC data: (a) SOM hits, (b) SOM neighbor weighted distance plan.

To evaluate the performance of the clustering results produced by SOM, the silhouette coefficient (Equation (9)) was utilized and the results of clustering are presented in Tables 6 and 7 for EC and TDS parameters, respectively.

Table 6

Results of SOM clustering for EC parameter

Cluster no. Piezometers Silhouette coefficient Central piezometer 
Cluster 1 EC4, EC10, EC18, EC26 0.3696, 0.2191, 0.2181, 0.167 EC4 
Cluster 2 EC8, EC15, EC16, EC17 0.2892, 0.2054, 0.5942, 0.098 EC16 
Cluster 3 EC1, EC2, EC5, EC6, EC9, EC11, EC12, EC13, EC14, EC19, EC20, EC22, EC25 0.2139, 0.0985, 0.4658, 0.2986, 0.3569, 0.4035, 0.3943, 0.4113, 0.285, 0.0456, 0.4302, 0.1371, 0.3193 EC5 
Cluster 4 EC3, EC7, EC21, EC23, EC24 0.6783, 0.7909, 0.5265, 0.7779, 0.5033 EC7 
Cluster no. Piezometers Silhouette coefficient Central piezometer 
Cluster 1 EC4, EC10, EC18, EC26 0.3696, 0.2191, 0.2181, 0.167 EC4 
Cluster 2 EC8, EC15, EC16, EC17 0.2892, 0.2054, 0.5942, 0.098 EC16 
Cluster 3 EC1, EC2, EC5, EC6, EC9, EC11, EC12, EC13, EC14, EC19, EC20, EC22, EC25 0.2139, 0.0985, 0.4658, 0.2986, 0.3569, 0.4035, 0.3943, 0.4113, 0.285, 0.0456, 0.4302, 0.1371, 0.3193 EC5 
Cluster 4 EC3, EC7, EC21, EC23, EC24 0.6783, 0.7909, 0.5265, 0.7779, 0.5033 EC7 
Table 7

Results of SOM clustering for TDS parameter

Cluster no. Piezometers Silhouette coefficient Central piezometer 
Cluster 1 TDS3, TDS7, TDS10, TDS18, TDS21, TDS23 0.398, 0.6226, 0.0613, 0.2969, 0.6393, 0.5827 TDS21 
Cluster 2 TDS1, TDS5, TDS11, TDS13, TDS19, TDS24 0.5019, 0.3547, 0.2964, 0.1804, 0.0992, 0.5107 TDS24 
Cluster 3 TDS2, TDS4, TDS15, TDS22, TDS26 −0.0893, −0.0136, 0.1181, 0.2372, 0.3282 TDS26 
Cluster 4 TDS6, TDS8, TDS9, TDS12, TDS14, TDS16, TDS17, TDS20, TDS25 0.4877, 0.3560, 0.5485, 0.3892, 0.1148, 0.2763, 0.0331, 0.04, 0.4465 TDS9 
Cluster no. Piezometers Silhouette coefficient Central piezometer 
Cluster 1 TDS3, TDS7, TDS10, TDS18, TDS21, TDS23 0.398, 0.6226, 0.0613, 0.2969, 0.6393, 0.5827 TDS21 
Cluster 2 TDS1, TDS5, TDS11, TDS13, TDS19, TDS24 0.5019, 0.3547, 0.2964, 0.1804, 0.0992, 0.5107 TDS24 
Cluster 3 TDS2, TDS4, TDS15, TDS22, TDS26 −0.0893, −0.0136, 0.1181, 0.2372, 0.3282 TDS26 
Cluster 4 TDS6, TDS8, TDS9, TDS12, TDS14, TDS16, TDS17, TDS20, TDS25 0.4877, 0.3560, 0.5485, 0.3892, 0.1148, 0.2763, 0.0331, 0.04, 0.4465 TDS9 

The Euclidean distance criterion was then utilized to determine the centroid piezometer (prominent piezometer) of each cluster, which is the best representation of the groundwater quality pattern of the cluster. Thus, the specific measures should be devoted to protecting and maintaining such important piezometers in order to have precise and reliable data for prediction and management of the plain. The determined central piezometers of clusters are presented in the fourth column of Tables 6 and 7.

Tables 6 and 7 (and see Figure 11) show that SOM clusters the EC and TDS piezometers into almost similar clusters. Owing to the fact that most of the salts in water are presented in the ionic form and capable of conducting electric current, conductivity is a good and rapid measure of concentration of TDS and denotes the relationship between EC and TDS. According to the geology of the plain, the north and south-west areas with high concentrations of sodium and chloride are placed in two separated clusters of EC and TDS parameters. The other parts of the plain (the south-east, east and central areas) with good chemical quality are placed in the other clusters.
Figure 11

SOM clustering: (a) EC, (b) TDS.

Figure 11

SOM clustering: (a) EC, (b) TDS.

Results of FFNN-based prediction (stage 3)

The spatial clustering of piezometers by SOM divided the plain into four homogenous clusters (zones) according to EC and TDS parameters. Then, two FFNNs were trained for each cluster to predict EC and TDS values of the central piezometer (as representative of zone) one time step ahead.

Since the piezometers that lie in the same cluster may have a similar quality pattern, the quality behavior of piezometers of the cluster in future may be guessed from the quality predictions of the central piezometer and the behavior of central piezometers can be considered as the representative of groundwater quality pattern in various regions of the plain. To take into account the contribution of surface water in the hydrologic budget of the plain and its impact on the groundwater quality and quantity parameters, both precipitation and runoff data were also considered as potential inputs of the FFNNs as well as GL and quality data at previous time steps to predict quality parameters of EC and TDS one time step ahead.

In this plain, the groundwater quality data (EC and TDS) sampling have been reported at 6-monthly (T) intervals for all of the piezometers but the monthly (t) rainfall, runoff, and GL data are available and used in the FFNN models.

In ANN modeling, it is important that particular attention is paid to the appropriate selection of inputs which can improve the model's efficiency in both the steps of calibration (training) and verification. It also prevents the model from being over-trained. For this purpose, different input combinations (from EC, TDS, rainfall (P), runoff (R), and GL time series for each cluster) were examined as input neurons of FFNNs for one-step-ahead EC and TDS predictions.

In this study, the MI measure (Equation (10)) was employed to determine the dominant inputs and lag times between hydrological processes (rainfall, runoff, and GL) and groundwater quality parameters among different input combinations for each cluster. The results of the sensitivity analysis via MI in selecting dominant inputs and lag times for groundwater quality modeling are presented in Tables 811 in the column Input parameters.

Table 8

Results of FFNN modeling for EC

     DC
 
RMSE (Normalized)
 
Cluster no. Input variables Output variable Network structure* Epoch Calibration Verification Calibration Verification 
Cluster 1 EC4(T-1), R(t-2), P6(t) EC4(T + 1) 3-3-1 40 0.932 0.767 0.07 0.143 
Cluster 2 EC16(T-1), GL16(t), R(t-2), P1(t) EC16(T + 1) 4-4-1 20 0.955 0.708 0.045 0.165 
Cluster 3 EC5(T-1), GL5(t), R(t-2), P4(t-3) EC5(T + 1) 4-3-1 20 0.867 0.803 0.081 0.126 
Cluster 4 EC7(T-1), GL7(t), R(t-3) EC7(T + 1) 3-5-1 20 0.968 0.848 0.036 0.077 
     DC
 
RMSE (Normalized)
 
Cluster no. Input variables Output variable Network structure* Epoch Calibration Verification Calibration Verification 
Cluster 1 EC4(T-1), R(t-2), P6(t) EC4(T + 1) 3-3-1 40 0.932 0.767 0.07 0.143 
Cluster 2 EC16(T-1), GL16(t), R(t-2), P1(t) EC16(T + 1) 4-4-1 20 0.955 0.708 0.045 0.165 
Cluster 3 EC5(T-1), GL5(t), R(t-2), P4(t-3) EC5(T + 1) 4-3-1 20 0.867 0.803 0.081 0.126 
Cluster 4 EC7(T-1), GL7(t), R(t-3) EC7(T + 1) 3-5-1 20 0.968 0.848 0.036 0.077 

*The result has been presented for the best structure. First, second, and third numbers denote input, hidden, and output neuron number, respectively (T presents 6-monthly period and t presents monthly period).

Table 9

Results of FFNN modeling for TDS

   Network  DC
 
RMSE (Normalized)
 
Cluster no. Input variables Output variable structure* Epoch Calibration Verification Calibration Verification 
Cluster 1 TDS21(T-1), GL21(t), R(t-3) TDS21(T + 1) 3-5-1 20 0.898 0.771 0.096 0.149 
Cluster 2 TDS24(T-1), GL24(t), R(t-2), P5(t) TDS24(T + 1) 4-3-1 40 0.956 0.9305 0.0655 0.083 
Cluster 3 TDS26(T-1), GL26(t), R(t-2), P6(t-1) TDS26(T + 1) 4-4-1 30 0.969 0.732 0.054 0.151 
Cluster 4 TDS9(T-1), GL9(t), R(t), P4(t-1) TDS9(T + 1) 4-4-1 50 0.97 0.884 0.054 0.107 
   Network  DC
 
RMSE (Normalized)
 
Cluster no. Input variables Output variable structure* Epoch Calibration Verification Calibration Verification 
Cluster 1 TDS21(T-1), GL21(t), R(t-3) TDS21(T + 1) 3-5-1 20 0.898 0.771 0.096 0.149 
Cluster 2 TDS24(T-1), GL24(t), R(t-2), P5(t) TDS24(T + 1) 4-3-1 40 0.956 0.9305 0.0655 0.083 
Cluster 3 TDS26(T-1), GL26(t), R(t-2), P6(t-1) TDS26(T + 1) 4-4-1 30 0.969 0.732 0.054 0.151 
Cluster 4 TDS9(T-1), GL9(t), R(t), P4(t-1) TDS9(T + 1) 4-4-1 50 0.97 0.884 0.054 0.107 

*The result has been presented for the best structure. First, second, and third numbers denote input, hidden, and output neuron number, respectively (T presents 6-monthly period and t presents monthly period).

Table 10

Results of FFNN modeling for EC without clustering

    DC
 
RMSE (Normalized)
 
Input variables Output variable Network structure* Epoch Calibration Verification Calibration Verification 
EC10(T-1), EC18(T-1), EC26(T-1) EC4(T + 1) 3-3-1 30 0.947 0.604 0.062 0.212 
EC8(T-1), EC15(T-1), EC17(T-1) EC16(T + 1) 3-4-1 20 0.793 0.451 0.090 0.171 
EC10(T-1), EC11(T-1), EC20(T-1), EC22(T-1) EC5(T + 1) 4-2-1 30 0.875 0.590 0.094 0.185 
EC3(T-1), EC18(T-1), EC21(T-1), EC23(T-1) EC7(T + 1) 4-3-1 30 0.885 0.664 0.060 0.099 
    DC
 
RMSE (Normalized)
 
Input variables Output variable Network structure* Epoch Calibration Verification Calibration Verification 
EC10(T-1), EC18(T-1), EC26(T-1) EC4(T + 1) 3-3-1 30 0.947 0.604 0.062 0.212 
EC8(T-1), EC15(T-1), EC17(T-1) EC16(T + 1) 3-4-1 20 0.793 0.451 0.090 0.171 
EC10(T-1), EC11(T-1), EC20(T-1), EC22(T-1) EC5(T + 1) 4-2-1 30 0.875 0.590 0.094 0.185 
EC3(T-1), EC18(T-1), EC21(T-1), EC23(T-1) EC7(T + 1) 4-3-1 30 0.885 0.664 0.060 0.099 

*The result has been presented for the best structure. First, second, and third numbers denote input, hidden, and output neuron number, respectively (T presents 6-monthly period).

Table 11

Results of FFNN modeling for TDS without clustering

  Network  DC
 
RMSE (Normalized)
 
Input variables Output variable structure* Epoch Calibration Verification Calibration Verification 
TDS7(T-1), TDS18(T-1), TDS23(T-1) TDS21(T + 1) 3-3-1 50 0.834 0.533 0.114 0.126 
TDS11(T-1), TDS15(T-1), TDS19(T-1), TDS20(T-1) TDS24(T + 1) 4-4-1 40 0.976 0.599 0.045 0.218 
TDS4(T-1), TDS10(T-1), TDS22(T-1) TDS26(T + 1) 3-2-1 30 0.791 0.430 0.137 0.173 
TDS6(T-1), TDS8(T-1), TDS12(T-1), TDS17(T-1) TDS9(T + 1) 4-3-1 40 0.750 0.541 0.089 0.111 
  Network  DC
 
RMSE (Normalized)
 
Input variables Output variable structure* Epoch Calibration Verification Calibration Verification 
TDS7(T-1), TDS18(T-1), TDS23(T-1) TDS21(T + 1) 3-3-1 50 0.834 0.533 0.114 0.126 
TDS11(T-1), TDS15(T-1), TDS19(T-1), TDS20(T-1) TDS24(T + 1) 4-4-1 40 0.976 0.599 0.045 0.218 
TDS4(T-1), TDS10(T-1), TDS22(T-1) TDS26(T + 1) 3-2-1 30 0.791 0.430 0.137 0.173 
TDS6(T-1), TDS8(T-1), TDS12(T-1), TDS17(T-1) TDS9(T + 1) 4-3-1 40 0.750 0.541 0.089 0.111 

*The result has been presented for the best structure. First, second, and third numbers denote input, hidden, and output neuron number, respectively (T presents 6-monthly period).

For each cluster, two FFNNs were trained (for modeling EC and TDS) by back-propagation algorithm and the efficiency criteria for each model were used to determine the best model. The results of temporal modeling of EC and TDS are shown for four clusters in Tables 8 and 9, respectively. The performance efficiencies of FFNNs in both training and validation steps show reliable results for predicting EC and TDS concentrations. Overall, it can be seen that the FFNN model well reproduced the variations of the observed groundwater quality parameters in the Ardabil plain. The result indicates that the proposed methodology can be used for groundwater quality modeling in such areas using groundwater data (quality and quantity) and some easily available hydro-climatologic data as inputs of the model.

The results for EC and TDS parameters indicate that all of the models produced acceptable outcomes, and confirm the appropriate identification of the representative groundwater quality patterns over the plain. In most cases, the east and central parts of the plain provided better results (for both EC and TDS parameters) than other parts which is due to them being rich in chemical quality. As mentioned in geology descriptions, the EC and TDS clusters that are placed in the north and south-west of the plain produced less accurate outcomes in comparison to the other clusters.

It is clear from Tables 8 and 9 that the quality of piezometers placed in areas with a higher distance from the plain outlet show higher lag with the values of runoff time series. The higher distances of central piezometers from the outlet demonstrate a late response to the overland water condition (rainfall and runoff). Therefore, the central piezometers close to the outlet show minimum lag with runoff parameter (see Tables 8 and 9). Also, lag time between the rainfall and central piezometer's EC and TDS values is proportional to the spatial distance between the related rainfall station and that piezometer.

For instance, scatter plots shown in Figures 12 and 13 compare the observed and predicted concentrations of EC and TDS in both training and validation steps for piezometers 5 and 24, respectively.
Figure 12

Comparison of observed and predicted EC concentrations in piezometer 5: (a) train step, (b) verification step.

Figure 12

Comparison of observed and predicted EC concentrations in piezometer 5: (a) train step, (b) verification step.

Figure 13

Comparison of observed and predicted TDS concentrations in piezometer 24: (a) train step, (b) verification step.

Figure 13

Comparison of observed and predicted TDS concentrations in piezometer 24: (a) train step, (b) verification step.

In dealing with many alternative inputs in FFNN modeling, an efficient algorithm is needed to select dominant inputs. Therefore, to optimize the input layer of FFNN and improve the modeling efficiency, SOM clustering approach was used to select dominant inputs instead of using whole potential inputs in the FFNN. Tables 10 and 11 show the results of FFNNs without clustering for modeling EC and TDS, respectively. In this way, the input variables were selected from three or four neighbor piezometers of the coordinate piezometer. The comparison of FFNNs results without and with using clustering approach (Tables 8 and 9) shows that modeling by clustering approach has more accuracy than without clustering (using neighbor piezometers data) for groundwater quality predictions. This is because in modeling without clustering irrelevant data as inputs in the prediction stage enter redundant information and noise into the modeling process which, in turn, leads to inappropriate and over-fitted results for most cases.

For further evaluation of the trained FFNNs and to have a comparison with conventional prediction methods, the classic MLR model was also employed for modeling groundwater quality parameters of the plain. Although a linear and classic time series forecasting model like MLR is likely not to be able to model a complex nonlinear hydrological process, it is still a commonly used method in practice and, accordingly, is useful as a comparison model. In this research, rainfall, runoff, antecedents of GL, and groundwater quality data were used as exogenous inputs to predict future groundwater quality parameter as the output. In the MLR model, input data are selected using correlation coefficient between potential inputs and target. The MLR model, owing to its being linearly inherent, could not detect nonlinear relationship between the data. Thus, correlation coefficient that detects linear correlation between input parameters and target is usually used as criterion for dominant input selection. However, in a nonlinear complex hydrological process such as groundwater quality modeling, in spite of a weak linear relationship, a strong nonlinear relationship may exist between input and output variables. MI measures the statistical nonlinear dependency between potential inputs and the output of the model. Therefore, the implication of these two input selection criteria in the study led to different best input combinations used in the MLR and FFNN models.

MLR models with the best forecasting performance for clusters are shown in Tables 12 and 13 for EC and TDS, respectively. The results show that the MLR model can model the TDS parameter more reliably than the EC parameter which means that the TDS time series relatively follows a linear pattern in temporal scale. This could be the reason why the inputs of the MLR model in modeling the TDS parameter are almost the same as the inputs of the ANN model, in contrast to the inputs of the EC parameter which are quite different from the inputs of ANNs.

Table 12

Results of MLR modeling for EC parameter

  DC
 
RMSE (Normalized)
 
Cluster no. MLR model Calibration Verification Calibration Verification 
Cluster 1 EC4(T + 1) = 0.337-0.254P6(t)-0.189R(t-2)-0.02GL4(t) + 0.6EC4(T-1) 0.526 0.478 0.196 0.173 
Cluster 2 EC16(T + 1) = 0.436-0.484R(t-2) + 0.16GL16(t) + 0.364EC16(T-1) 0.468 0.323 0.155 0.189 
Cluster 3 EC5(T + 1) = 0.257-0.111P4(t-3)-0.254GL5(t) + 0.719EC5(T-1) 0.745 0.491 0.14 0.174 
Cluster 4 EC7(T + 1) = 0.356-0.369R(t-3) + 0.296EC7(T-1) 0.509 0.328 0.152 0.121 
  DC
 
RMSE (Normalized)
 
Cluster no. MLR model Calibration Verification Calibration Verification 
Cluster 1 EC4(T + 1) = 0.337-0.254P6(t)-0.189R(t-2)-0.02GL4(t) + 0.6EC4(T-1) 0.526 0.478 0.196 0.173 
Cluster 2 EC16(T + 1) = 0.436-0.484R(t-2) + 0.16GL16(t) + 0.364EC16(T-1) 0.468 0.323 0.155 0.189 
Cluster 3 EC5(T + 1) = 0.257-0.111P4(t-3)-0.254GL5(t) + 0.719EC5(T-1) 0.745 0.491 0.14 0.174 
Cluster 4 EC7(T + 1) = 0.356-0.369R(t-3) + 0.296EC7(T-1) 0.509 0.328 0.152 0.121 
Table 13

Results of MLR modeling for TDS parameter

  DC
 
RMSE (Normalized)
 
Cluster no. MLR model Calibration Verification Calibration Verification 
Cluster 1 TDS21(T + 1) = 0.639 + 0.103R(t-3)-0.67GL21(t) + 0.3TDS21(T-1) 0.711 0.706 0.161 0.156 
Cluster 2 TDS24(T + 1) = 0.275-0.091P5(t)-0.04R(t-2)-0.19GL24(t) + 0.813EC24(T-1) 0.835 0.83 0.127 0.13 
Cluster 3 TDS26(T + 1) = 0.459-0.223P6(t-1)-0.131R(t-2)-0.177GL26(t) + 0.533TDS26(T-1) 0.913 0.857 0.09 0.099 
Cluster 4 TDS9(T + 1) = 0.211 + 0.286P4(t-1)-0.058R(t)-0.168GL9(t) + 0.765 TDS9(T-1) 0.781 0.764 0.149 0.135 
  DC
 
RMSE (Normalized)
 
Cluster no. MLR model Calibration Verification Calibration Verification 
Cluster 1 TDS21(T + 1) = 0.639 + 0.103R(t-3)-0.67GL21(t) + 0.3TDS21(T-1) 0.711 0.706 0.161 0.156 
Cluster 2 TDS24(T + 1) = 0.275-0.091P5(t)-0.04R(t-2)-0.19GL24(t) + 0.813EC24(T-1) 0.835 0.83 0.127 0.13 
Cluster 3 TDS26(T + 1) = 0.459-0.223P6(t-1)-0.131R(t-2)-0.177GL26(t) + 0.533TDS26(T-1) 0.913 0.857 0.09 0.099 
Cluster 4 TDS9(T + 1) = 0.211 + 0.286P4(t-1)-0.058R(t)-0.168GL9(t) + 0.765 TDS9(T-1) 0.781 0.764 0.149 0.135 

Also, tabulated results indicate that the MLR model, owing to its linear inherence is unable to completely handle the complex nonlinear groundwater quality predictions. Thus, the nonlinear FFNN model is more efficient than the MLR model.

CONCLUDING REMARKS

In this paper, FFNN models have been developed for groundwater quality forecasting over the plain of Ardabil, northwestern Iran. The inputs of the models were monthly rainfall, runoff, and GL and groundwater quality parameters (EC and TDS) at 6-monthly intervals over the study area. Co-kriging estimator was used to estimate groundwater quality parameters in the locations of GL. As a result, the entire study area was divided into four clusters via SOM clustering approach. ANN modeling was then applied separately for central piezometers of each cluster.

Spatial clustering via SOM was shown to be useful in improving FFNN-based modeling of groundwater quality parameters. The computed MI between different input candidates and output for selecting dominant inputs and lags of model showed that the concentrations of groundwater quality parameters (EC and TDS) are significantly affected by GL fluctuations. Furthermore, although the effect of hydrological parameters (rainfall and runoff) in the plain under study was not significant on plain recharge, it affected the changes in qualitative parameters of the plain.

To evaluate model performance, the proposed methodology was also compared with a conventional model (i.e., the MLR method as a linear model). The results show that the MLR model, owing to its linear inherence, was not able to detect a nonlinear relationship between the studied parameters. The FFNN yielded better performance for clusters, on average 84.5% and 17% for modeling EC and TDS parameters, respectively.

The examination of other artificial intelligence approaches to forecast groundwater quality parameters in hydrological processes is also suggested. For instance, owing to the uncertainty of the hydro-geological processes and the ability of the fuzzy concept to handle uncertainties, the conjunction of the ANN and fuzzy inference system (FIS) models as an adaptive neural-fuzzy inference system (ANFIS) model, could provide useful results. Owing to the lack of monthly data for quality parameters of EC and TDS for the study area, the modeling was performed using monthly data of rainfall, runoff, and GL but 6-monthly data of EC and TDS, and it would be useful to apply the proposed methodology on other heterogeneous groundwater systems with longer groundwater quality data sets in monthly scale.

REFERENCES

REFERENCES
Aris
A. Z.
Praveena
S. M.
Abdullah
M. H.
Radojevic
M.
2012
Statistical approaches and hydrochemical modelling of groundwater system in a small tropical island
.
J. Hydroinform.
14
(
1
),
206
220
.
ASCE Task Committee
.
1990
Review of geostatistics in geohydrology, Part II: applications
.
J. Hydrol. Eng.
116
,
633
658
.
ASCE Task Committee on Application of Artificial Neural Networks in Hydrology
2000
Artificial neural networks in hydrology 2: hydrologic applications
.
J. Hydrol. Eng.
5
(
2
),
124
137
.
Banerjee
P.
Singh
V. S.
Chatttopadhyay
K.
Chandra
P. C.
Singh
B.
2011
Artificial neural network model as a potential alternative for groundwater salinity forecasting
.
J. Hydrol.
398
,
212
220
.
Cressie
N.
1993
Statistics for Spatial Data
.
John Wiley & Sons
,
New York
,
USA
, p.
900
.
Hornik
K.
Stinchcombe
M.
White
H.
1989
Multilayer feed-forward networks are universal approximators
.
Neur. Netw.
2
(
5
),
359
366
.
Isaaks
E. H.
Srivastava
R. M.
1989
Applied Geostatistics
.
Oxford University Press
,
New York
,
USA
, p.
561
.
Karunasingha
D. S. K.
Liong
S. Y.
2015
A simple clustering technique to extract subsets of data for function approximation
.
J. Hydroinform.
17
(
5
),
719
732
.
Kim
J. H.
Yum
B. W.
Kim
R. H.
Koh
D. C.
Cheong
T. J.
Lee
J.
Chang
H. W.
2003
Application of cluster analysis for the hydrogeochemical factors of saline groundwater in Kimje, Korea
.
Geosci. J.
7
(
4
),
313
322
.
Kohonen
T.
1998
The self-organizing map
.
Neurocomput.
21
,
1
6
.
Kord
M.
Asghari Moghaddam
A.
Nakhaeei
M.
2013
Investigation of hydrogeological characteristics of Ardabil Plain aquifer, northwest of Iran
.
J. Sci. Technol.
9
(
15
),
63
69
.
Matheron
G.
1963
Principles of geostatistics
.
Econ. Geol.
58
(
8
),
1246
1266
.
Myers
D. E.
1982
Matrix formulation of Cokriging.
Math. Geol.
14
,
249
257
.
Nadiri
A. A.
Fijani
E.
Tsai
F. T.-C.
Asghari Moghaddam
A.
2013
Supervised committee machine with artificial intelligence for prediction of fluoride concentration
.
J. Hydroinform.
15
(
4
),
1474
1490
.
Nourani
V.
Alami
M. T.
Aminfar
M. H.
2009
A combined neural–wavelet model for prediction of Lighvanchai watershed precipitation
.
Eng. Appl. Artif. Intell.
16
,
1
12
.
Nourani
V.
Hosseini Baghanam
A.
Adamowski
J.
Kisi
O.
2014a
Applications of hybrid wavelet–artificial Intelligence models in hydrology: a review
.
J. Hydrol.
514
,
358
377
.
Nourani
V.
Rezapour Khanghah
T.
Hosseini Baghanam
A.
2014b
Implication of feature extraction methods to improve performance of hybrid Wavelet-ANN rainfall–runoff model
. In:
Case Studies in Intelligent Computing
(
Issac
B.
Israr
N.
, eds).
Taylor and Francis Group
,
New York
,
USA
, pp.
457
498
.
Sanchez-Martos
F.
Aguilera
P. A.
Garrido-Frenich
A.
Torres
J.
Pulido-Bosch
A.
2002
Assessment of groundwater quality by means of self-organizing maps: application in a semiarid area
.
Environ. Manage.
30
(
5
),
716
726
.
Shannon
C. E.
1948
The mathematical theory of communications, I and II
.
Bell Syst. Tech. J.
27
,
379
423
.
Subyani
A. M.
Al Ahmadi
M. E.
2010
Multivariate statistical analysis of groundwater quality in WadiRanyah, Saudi Arabia
.
J. Earth Sci.
21
(
2
),
29
46
.
Yang
H. H.
Vuuren
S. V.
Sharma
S.
Hermansky
H.
2000
Relevance of time-frequency features for phonetic and speaker-channel classification
.
Speech Commun.
31
,
35
50
.