Abstract

Artificial neural networks (ANNs) have recently proven capable of simulating dynamic systems. However, the common training algorithms of ANNs (e.g., back-propagation and other gradient-based algorithms) suffer from slow convergence and possible entrapment in local minima. Alternatively, training techniques such as particle swarm optimization (PSO) and differential evolution (DE) might be employed to overcome these shortcomings. In this paper, ANN-PSO and ANN-DE models were applied for modeling groundwater qualitative parameters, i.e., SO4 and sodium adsorption ratio (SAR). Three statistical parameters, namely root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2), were used for assessing the models' capabilities. The results showed that the ANN-DE presents more accurate results than ANN-PSO in modeling SO4 and SAR.

INTRODUCTION

Groundwater is one of the major sources of water supply for domestic as well as agricultural activities. Modeling groundwater quality is needed to develop better strategies for water resources planning and management (Liu et al. 2009; Najah et al. 2014). Traditional water resources management approaches considered surface water and groundwater systems as two separate entities. However, recent developments in land and water resources analysis have demonstrated that these systems can affect each other, from both qualitative and quantitative points of view. Moreover, groundwater contamination, caused either by anthropogenic activities or by the inherent composition of the aquifer material, reduces groundwater supply capacity or restricts its exploitation.

Agricultural activities, which may include the uncontrolled use of fertilizers and pesticides, contribute to the deterioration of groundwater quality (Yesilnacar et al. 2008). More generally, variations in groundwater quality reflect physical and chemical parameters that are strongly affected by geological formations and anthropogenic activities (Almasri & Kaluarachchi 2005; Subramani et al. 2005; Yesilnacar & Sahinkaya 2012).

Electrical conductivity (EC) indicates the capacity of a substance or solution to conduct electric current. EC is a good predictor of total dissolved solids (TDS), which is a measure of all inorganic and organic substances (WHO 1984). Given the values of Na+, Ca2+, and Mg2+, the sodium adsorption ratio (SAR) can be calculated as (Devadas et al. 2007):
$$\mathrm{SAR} = \frac{\mathrm{Na^{+}}}{\sqrt{\left(\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}}\right)/2}} \tag{1}$$
where the ion concentrations are in meq/L.
Similarly, the hardness of groundwater may be calculated as:
formula
(2)
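As a minimal sketch of the SAR calculation, assuming the standard definition SAR = Na+ / sqrt((Ca2+ + Mg2+) / 2) with all concentrations in meq/L, the function below computes SAR for one sample. The function name and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def sodium_adsorption_ratio(na_meq, ca_meq, mg_meq):
    """Return SAR from Na+, Ca2+ and Mg2+ concentrations, all in meq/L."""
    divalent_mean = (ca_meq + mg_meq) / 2.0
    if divalent_mean <= 0.0:
        raise ValueError("Ca2+ + Mg2+ must be positive to compute SAR")
    return na_meq / math.sqrt(divalent_mean)


# Hypothetical sample (not from the study): Na+ = 18, Ca2+ = 5, Mg2+ = 3 meq/L
sar_example = sodium_adsorption_ratio(18.0, 5.0, 3.0)
print(f"SAR = {sar_example:.2f}")
```

High Na+ combined with low Ca2+ and Mg2+ drives the ratio up, which is the condition the text associates with clay dispersion.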

SAR, which is determined by the concentrations of solids dissolved in the water, is a significant parameter for analyzing the suitability of irrigation water. Higher SAR values (high Na+ and low Ca2+ and Mg2+ magnitudes) may cause the dispersion of clay particles and destroy the soil structure (Yesilnacar & Sahinkaya 2012). Groundwater sulfate, in turn, may originate from point and non-point sources. The maximum permissible and allowable concentrations of sulfate in drinking water are 200 and 400 mg/L, respectively (WHO 1984). High sulfate concentrations affect the taste of water. Therefore, simulating groundwater SAR and sulfate is an important task in groundwater resources management and planning.

The traditional groundwater quality analysis approach is mainly based on mathematical modeling, e.g., time series analysis and probability statistics, which usually assumes a linear relationship between the dependent and independent variables, so the overall accuracy of such models is limited (Luo et al. 2003). Owing to the existing difficulties in simulating groundwater quality (Omran 2012), novel computational approaches are required.

As an alternative to the traditional statistical approaches, artificial intelligence techniques might be used to solve this problem. Among others, artificial neural networks (ANNs) have been widely applied in numerous disciplines, e.g., qualitative/quantitative groundwater modeling (Cheng et al. 2005; Liu & Chung 2014). Yesilnacar et al. (2008) predicted groundwater nitrate concentration in Harran Plain in Turkey using ANNs. Yesilnacar & Sahinkaya (2012) developed an ANN model for predicting groundwater sulfate (SO4) and SAR concentration. Kuo et al. (2004) utilized back-propagation ANN to predict the variations of groundwater quality (in terms of seawater salinization and arsenic pollutant factors) in Taiwan. Khaki et al. (2015) evaluated the potential of adaptive neuro-fuzzy inference system (ANFIS) and ANN to simulate TDS and electrical conductivity (EC) levels.

Despite the capability of ANNs in modeling nonlinear systems, establishing these models with conventional training algorithms may produce non-optimal outcomes because of limitations in finding the best synaptic weights. Alternatively, ANN models might be integrated with evolutionary algorithms (EAs), e.g., differential evolution (DE) and particle swarm optimization (PSO), to optimize the models.

DE is a meta-heuristic population-based algorithm, which can be used for optimizing multidimensional real-valued functions. The PSO algorithm evolves a population of particles through an iterative process to find the optimized solution. Unlike most EAs, PSO has low computational costs and its implementation is straightforward. Each potential solution in PSO is represented by a particle that flies through a multidimensional search space with a velocity dynamically adjusted by the particle's own former information and the experience of the other particles. Numerous applications of PSO have been reported in solving real-world optimization problems (e.g., Liu et al. 2007; Melin et al. 2013; Selakov et al. 2014).

Karterakis et al. (2007) applied DE for the solution of coastal subsurface water management problems. Gaur et al. (2011) applied an analytic element method coupled with PSO for groundwater management and reported that the developed model is efficient in identifying the optimal location and discharge of the pumping wells. Sudheer & Shashi (2012) developed a PSO trained ANN for aquifer parameter estimation. Gaur et al. (2013) applied ANN and PSO for management of groundwater and reported that the ANN-PSO model is capable of identifying the optimal location of wells efficiently. Chiu (2014) applied DE for parameter structure identification in groundwater modeling. Elci & Ayvaz (2014) applied a DE algorithm-based optimization for the site selection of groundwater production wells. Based on a review study, Ketabchi & Ataie-Ashtiani (2015) investigated the literature associated with the application of evolutionary algorithms (e.g., PSO and DE algorithms) in coastal groundwater management problems. Overall, they concluded that the PSO algorithm is among the superior EAs.

In the present study, the capabilities of the PSO and DE algorithms were evaluated in modeling groundwater quality parameters (i.e., SO4 and SAR).

MATERIALS AND METHODS

Site description

This study was conducted in Neyshabur plain, Iran, located between 35°41′ (°N) and 58°20′ (°E) (Figure 1). The average altitude of the region is 1,500 m above mean sea level. Mean annual precipitation and temperature values are 233.7 mm and 14.5 °C, respectively (Mansouri Daneshvar et al. 2013).

Figure 1

Location of the study area and sampling wells.


Groundwater sampling and measurement

Monthly groundwater records were collected from 60 observational wells during a 16-year period (1997–2013). The geographical coordinates and elevation of each sampling location were recorded using a handheld global positioning system (GPS). A few locations were also cross-checked with a differential GPS. Collected samples were analyzed in the laboratory to measure the concentrations of the qualitative parameters using the existing standard procedures (Table 1).

Table 1

Utilized methods for hydro-chemical parameters identification in the present study

| Parameter | Method |
| --- | --- |
| Electrical conductivity (μS/cm) | Conductivity bridge (Richards 1954) |
| pH | pH meter (Thomas 1996) |
| Sodium (mg/L) | Flame photometric (Osborn & Johns 1951) |
| Calcium (mg/L) | EDTA titration (Richards 1954) |
| Magnesium (mg/L) | EDTA titration (Richards 1954) |
| Bicarbonate (mg/L) | Acid titration (Hesse 1971) |
| Chloride (mg/L) | Mohr's titration (Hesse 1971) |
| Hardness (mg CaCO3/L) | EDTA titration (Richards 1954) |
| Total dissolved solids (ppm or mg/L) | Water quality analyzer (APHA 1995) |

EDTA, ethylenediaminetetraacetic acid.

In this study, groundwater qualitative parameters, i.e., SO4 and SAR, were modeled using two different evolutionary neural networks, namely, ANN-PSO and ANN-DE. Calcium, magnesium, sodium, hardness, electrical conductivity, TDS, pH, bicarbonate, and chloride were used as input variables to estimate SO4 and SAR. Table 2 sums up the statistical parameters of the applied data. The variability class of the coefficient of variation (CV) was obtained based on the criterion presented by Wilding (1983). Based on this criterion, CV values less than 15% denote the low variability class, CV values between 15% and 35% correspond to the medium variability class, and CV values higher than 35% stand for the high variability class. Considering the results presented in Table 2, high variations were observed in the groundwater qualitative parameters (from 41.92% to 123.98%), except pH, which shows low variability with a CV value of 4.72%. For developing the applied models, 50% of the data (1,200 patterns) were used for training, while the remaining 25% and 25% (600 and 600 patterns) were used for validating and testing the models, respectively.
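The CV classification and the 50/25/25 partition described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and how the authors assigned individual patterns to each subset (e.g., randomly or chronologically) is not stated in the text, so only the subset sizes are computed here.

```python
def coefficient_of_variation(mean, std):
    """Coefficient of variation in percent."""
    return 100.0 * std / mean


def variability_class(cv_percent):
    """Variability class following the thresholds quoted from Wilding (1983)."""
    if cv_percent < 15.0:
        return "low"
    if cv_percent <= 35.0:
        return "medium"
    return "high"


def split_counts(n_patterns, train_frac=0.50, valid_frac=0.25):
    """Sizes of the training, validation and test subsets."""
    n_train = round(n_patterns * train_frac)
    n_valid = round(n_patterns * valid_frac)
    n_test = n_patterns - n_train - n_valid  # remainder goes to testing
    return n_train, n_valid, n_test


# CV values as reported in Table 2
for name, cv in [("pH", 4.72), ("Bicarbonate", 41.92), ("Chloride", 123.98)]:
    print(name, variability_class(cv))

print(split_counts(2400))
```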

Table 2

The range of measured values of the groundwater quality properties

| Parameter | Unit | Min | Max | Mean | Std | CV |
| --- | --- | --- | --- | --- | --- | --- |
| EC | μS/cm | 4.40 | 35,200.00 | 2,946.69 | 3,102.26 | 105.28 |
| pH | – | 6.30 | 9.50 | 8.00 | 0.37 | 4.72 |
| Sodium | mg/L | 0.00 | 127.30 | 18.52 | 18.89 | 101.99 |
| Calcium | mg/L | 0.00 | 40.60 | 4.75 | 5.26 | 110.58 |
| Magnesium | mg/L | 0.00 | 44.80 | 5.051 | 4.43 | 87.79 |
| Bicarbonate | mg/L | 0.00 | 11.00 | 2.93 | 1.23 | 41.92 |
| Chloride | mg/L | 0.00 | 142.50 | 17.60 | 21.83 | 123.98 |
| TH | mg CaCO3/L | 0.00 | 3,625.00 | 490.47 | 448.69 | 91.48 |
| TDS | ppm or mg/L | 2.77 | 22,176.00 | 1,856.42 | 1,954.42 | 105.26 |
| SO4 | mg/L | 0.00 | 50.00 | 8.49 | 7.64 | 90.01 |
| SAR | – | 0.00 | 37.89 | 8.06 | 6.77 | 84.03 |

Std: standard deviation, CV: coefficient of variation (%).

Applied algorithms

Artificial neural networks

ANNs are interconnected groups of artificial neurons (processors) designed for information processing through a computational model. They are generally utilized to simulate output vectors from given input vectors, especially in dynamic systems where the interrelationships between the input and target parameters are non-linear (Omkar & Senthilnath 2011; Balouchi et al. 2015). In an ANN structure, the input and output vectors are placed as the first and last layers. Between these layers, one or more hidden layers with several neurons are placed. In this study, a neural network with one hidden layer was established and the number of neurons in the hidden layer was determined iteratively. The schematic diagram of the applied feed-forward ANN is shown in Figure 2.
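A one-hidden-layer feed-forward network of this kind can be sketched with all weights and biases stored in a single flat vector, which is the quantity an optimizer such as PSO or DE would search over. The tanh hidden activation, the linear output neuron, the weight layout, and the function names are assumptions made for illustration; the text does not specify them.

```python
import math


def n_weights(n_in, n_hidden):
    """Number of weights and biases for one hidden layer and one output."""
    return n_hidden * (n_in + 1) + (n_hidden + 1)


def forward(weights, x, n_hidden):
    """Network output for one input pattern x (assumed tanh hidden units)."""
    n_in = len(x)
    if len(weights) != n_weights(n_in, n_hidden):
        raise ValueError("weight vector has the wrong length")
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        w = weights[pos:pos + n_in]   # input-to-hidden weights of this neuron
        b = weights[pos + n_in]       # bias of this neuron
        pos += n_in + 1
        hidden.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b))
    out_w = weights[pos:pos + n_hidden]
    out_b = weights[pos + n_hidden]
    return sum(wi * hi for wi, hi in zip(out_w, hidden)) + out_b


def training_rmse(weights, patterns, n_hidden):
    """Fitness of a weight vector: RMSE over (inputs, target) pairs."""
    sq = [(forward(weights, x, n_hidden) - t) ** 2 for x, t in patterns]
    return math.sqrt(sum(sq) / len(sq))


# Nine inputs and 18 hidden neurons, as in the largest network of the study
print(n_weights(9, 18))
```

With nine inputs, every additional hidden neuron adds eleven parameters to the search space, which is one reason the hidden-node count matters for the optimizer.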

Figure 2

A three-layer ANN architecture.


Particle swarm optimization

PSO is an evolutionary computation algorithm based on iterative optimization (Kennedy & Eberhart 1995). PSO consists of a group of particles (individuals) which refine their knowledge of the search space. Each particle has two main characteristics: position and velocity. In PSO, an iterative method is used to reach the optimal solution according to the fitness value of each particle, which is determined by the optimization function. Each particle adjusts its trajectory by tracking two pieces of information: (1) its own best visited position (Pbest) and (2) the global extremum attained by the swarm (Gbest) (Assareh et al. 2010). At each generation (iteration) step, each particle is accelerated toward its own previous Pbest and the Gbest position of the swarm. A new velocity is calculated for each particle based on its current velocity and its distance from its previous Pbest and from Gbest. The updated velocity is then used to calculate the next position of the particle in the search space. The iterative process is continued for a set number of iterations, or until a minimum error is achieved.

In PSO, a population of particles or proposed solutions evolves in each iteration, moving towards the optimal solution of the problem. A new population is obtained by shifting the positions of the previous one in each iteration. In its movement, each individual is influenced by its neighbors' trajectories and its own. The parameters, or possible set of solutions, are contained in a vector xi, which is called a ‘particle’ of the swarm and represents its position in the search space of possible solutions. The particle dimension is the number of parameters. The particle positions and velocities are randomly initialized. The value of the fitness function is then calculated for each particle, and the velocities and positions are updated taking these values into account. The algorithm updates the positions and the velocities of the particles following the equation:
$$v_i^{k+1} = \omega\, v_i^{k} + \phi_1 \left(l_i^{k} - x_i^{k}\right) + \phi_2 \left(g^{k} - x_i^{k}\right), \qquad x_i^{k+1} = x_i^{k} + v_i^{k+1} \tag{3}$$
The velocity of each particle, i, at iteration k, depends on three components:
  • the previous step velocity term, $v_i^{k}$, affected by the constant inertia weight, ω;

  • the cognitive learning term, which is the difference between the particle's best position found so far (called $l_i^{k}$, local best) and the particle's current position $x_i^{k}$;

  • the social learning term, which is the difference between the global best position found thus far in the entire swarm (called $g^{k}$, global best) and the particle's current position $x_i^{k}$.

These last two components are affected by φ1 = c1r1 and φ2 = c2r2, where r1 and r2 are random numbers distributed uniformly in the interval [0,1] and c1 and c2 are constants. The particles of the swarm make up a cloud that covers the whole search space in the initial iteration and gradually contracts as the iterations advance. Thus, in the initial stages the algorithm performs an exploration, searching for plausible zones, and in the last iterations the best solution is refined. The PSO algorithm has been refined over the years, and many variants have been created. In this paper, the Standard 2011 PSO has been used. It incorporates some improvements in the implementation, and the PSO parameters are set to the values:
$$\omega = \frac{1}{2\ln 2} \approx 0.721, \qquad c_1 = c_2 = \frac{1}{2} + \ln 2 \approx 1.193 \tag{4}$$
The swarm topology defines how particles are connected to one another to exchange information about the global best. In the current Standard PSO, each particle informs only K particles, usually three, chosen at random.
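The velocity and position updates described in this section can be sketched as below. Note that this is a plain global-best PSO and not the Standard PSO 2011 variant used in the study: it omits the random informant topology and other refinements of that variant. The inertia weight and acceleration constants are the values commonly associated with Standard PSO and are treated here as assumptions; the function name, bounds handling, and the sphere test function are hypothetical.

```python
import math
import random


def pso_minimize(fitness, dim, bounds, swarm_size=30, iterations=200, seed=0):
    """Minimize `fitness` with a simple global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    omega = 1.0 / (2.0 * math.log(2.0))   # assumed inertia weight
    c1 = c2 = 0.5 + math.log(2.0)         # assumed acceleration constants

    # Random initial positions and velocities
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    v = [[rng.uniform(lo - hi, hi - lo) / 2.0 for _ in range(dim)]
         for _ in range(swarm_size)]
    pbest = [p[:] for p in x]
    pbest_f = [fitness(p) for p in x]
    g = min(range(swarm_size), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                phi1 = c1 * rng.random()   # cognitive factor c1*r1
                phi2 = c2 * rng.random()   # social factor c2*r2
                v[i][d] = (omega * v[i][d]
                           + phi1 * (pbest[i][d] - x[i][d])
                           + phi2 * (gbest[d] - x[i][d]))
                # Move the particle and keep it inside the search bounds
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            f = fitness(x[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = x[i][:], f
    return gbest, gbest_f


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(t * t for t in p)


best, best_f = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_f)
```

To train an ANN with this routine, the fitness function would return the training error of the network whose weights are given by the particle position.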

Differential evolution

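DE is a population-based stochastic search technique for solving continuous optimization problems. The DE algorithm comprises three major operators, namely, mutation, crossover, and selection. In binary-coded genetic algorithms, mutation randomly flips bits in a string genome from 0 to 1 or from 1 to 0; in DE, which works on real-valued vectors, mutation instead perturbs a population member with the scaled difference of two other members. This operator improves the algorithm by introducing new solutions that do not exist in the population. After initialization, the mutation operation is applied with respect to each target individual $x_i^{G}$ of the current generation G. A mutant vector $v_i^{G+1}$ is then determined from the current population by the following equation (Storn 1996):
$$v_i^{G+1} = x_{i_1}^{G} + F \left(x_{i_2}^{G} - x_{i_3}^{G}\right) \tag{5}$$
where i1, i2 and i3 are randomly chosen, mutually different indices selected within the range {1, 2,…,NP}, NP is the population size, and F is a positive scaling factor. After the mutation, the crossover operation, which is the process of varying the DNA of chromosomes by exchanging some of their sections, is applied. It is applied to each pair of a target vector $x_i^{G}$ and its related mutant vector $v_i^{G+1}$ to generate an offspring (trial) vector $u_i^{G+1}$, whose j-th component is:
$$u_{j,i}^{G+1} = \begin{cases} v_{j,i}^{G+1} & \text{if } \mathrm{rand}_j \le P_r \text{ or } j = r \\ x_{j,i}^{G} & \text{otherwise} \end{cases} \tag{6}$$
where rand_j is a uniform random number in [0,1], Pr is the crossover probability, and r is a randomly chosen parameter index. Even when Pr = 0, at least one of the parameters of the offspring will differ from the parent (forced by the condition j = r). In evolutionary algorithms in general, a number of individuals from the existing generation are selected to breed a new generation, and the selection is typically based on fitness: the fitter an individual is, the more likely it is to be selected, while a weak individual still has a chance of being selected, which helps to keep the diversity of the population high. In DE, the trial vector $u_i^{G+1}$ is selected as a member of the population comprising the next generation after being compared with the corresponding target vector $x_i^{G}$. The selection operator for the next target vector is as below:
$$x_i^{G+1} = \begin{cases} u_i^{G+1} & \text{if } f\left(u_i^{G+1}\right) \le f\left(x_i^{G}\right) \\ x_i^{G} & \text{otherwise} \end{cases} \tag{7}$$
where f is the objective function to be minimized.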
These steps are repeated until a pre-specified stopping criterion is satisfied.
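The mutation, crossover, and selection steps can be sketched as a minimal DE loop of the common "rand/1/bin" form. The scaling factor F = 0.5, the crossover probability Pr = 0.9, the bounds handling, the function name, and the sphere test function are assumptions chosen for illustration; the text does not report the settings used in the study.

```python
import random


def de_minimize(fitness, dim, bounds, pop_size=30, generations=200,
                F=0.5, Pr=0.9, seed=0):
    """Minimize `fitness` with a basic DE/rand/1/bin loop (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(p) for p in pop]

    for _ in range(generations):
        next_pop, next_fit = [], []
        for i in range(pop_size):
            # Mutation: three distinct members, all different from the target i
            i1, i2, i3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[i1][j] + F * (pop[i2][j] - pop[i3][j])
                      for j in range(dim)]
            # Crossover: component r always comes from the mutant vector
            r = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() <= Pr or j == r) else pop[i][j]
                     for j in range(dim)]
            trial = [min(max(t, lo), hi) for t in trial]  # keep inside bounds
            # Selection: the trial replaces the target only if it is not worse
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                next_pop.append(trial)
                next_fit.append(f_trial)
            else:
                next_pop.append(pop[i])
                next_fit.append(fit[i])
        pop, fit = next_pop, next_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(t * t for t in p)


best, best_f = de_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_f)
```

Because selection is a one-to-one comparison between each target and its trial, the best fitness in the population can never get worse from one generation to the next.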

Conventional ANN models utilize gradient-based algorithms (GBAs) (e.g., back-propagation) for identifying the weights. During the calibration (training) period, GBAs can easily become trapped in a local minimum (Kumar et al. 2002; Sudheer et al. 2003). Evolutionary algorithms (EAs) (e.g., PSO, DE) are more robust than the existing direct search methods (e.g., GBAs) because they combine stochastic and direct search. EAs search for the global optimum with a lower risk of being trapped in local optima than the GBAs (Mantoglou et al. 2004; Karterakis et al. 2007).

Goodness-of-fit of the model

The ANN-PSO and ANN-DE models were evaluated according to three goodness-of-fit measures, namely, the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R2), expressions of which are as follows:  
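$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(WQ_{i,o} - WQ_{i,e}\right)^{2}} \tag{8}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|WQ_{i,o} - WQ_{i,e}\right| \tag{9}$$
$$R^{2} = \frac{\left[\sum_{i=1}^{N}\left(WQ_{i,o} - \overline{WQ}_{o}\right)\left(WQ_{i,e} - \overline{WQ}_{e}\right)\right]^{2}}{\sum_{i=1}^{N}\left(WQ_{i,o} - \overline{WQ}_{o}\right)^{2}\,\sum_{i=1}^{N}\left(WQ_{i,e} - \overline{WQ}_{e}\right)^{2}} \tag{10}$$
where N is the number of data, $WQ_{i,o}$ denotes the observed value of the water qualitative parameter (SO4 or SAR), and $WQ_{i,e}$ denotes the corresponding simulated value. $\overline{WQ}_{o}$ and $\overline{WQ}_{e}$ stand for the average observed and estimated groundwater quality parameters, respectively.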
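The three goodness-of-fit measures can be sketched as plain functions, assuming that R2 is computed as the squared correlation between observed and estimated values (the form implied by the use of both the observed and the estimated means). The function names and the sample values are hypothetical.

```python
import math


def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Mean absolute error between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def r_squared(obs, est):
    """Coefficient of determination as the squared correlation coefficient."""
    mean_o = sum(obs) / len(obs)
    mean_e = sum(est) / len(est)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov ** 2 / (var_o * var_e)


# Hypothetical observed and estimated values (not from the study)
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 2.5, 4.5]
print(rmse(observed, estimated), mae(observed, estimated),
      r_squared(observed, estimated))
```

Because this form of R2 measures only linear association, a model with a systematic offset can still score close to 1, which is why RMSE and MAE are reported alongside it.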

RESULTS AND DISCUSSION

For ANN implementation, first the number of hidden neurons was considered as twice the input numbers, according to Bhattacharyya & Pendharkar (1998). Then, various particle swarm/population sizes were tried. According to Geethanjali et al. (2008), the typical ranges for the number of particles are 20–40, and 10 particles are large enough to get good results for most of the problems.

In the present research, four different swarm sizes, i.e., 10, 20, 30, and 40 particles, were tried with 10,000 iterations for the ANN-PSO models, with a hidden node number of 18 (2 × 9 inputs). Then, the hidden node number was decreased stepwise to the number of inputs (nine nodes). The sensitivity analysis of the different ANN-PSO models with respect to hidden node number is presented in Table 3. From the table it is seen that the test RMSE values vary between 3.19 mg/L and 9.43 mg/L for SO4 and between 3.87 and 8.03 for SAR. It is clear that the ANN-PSO is very sensitive to the hidden node number. The model with 16 hidden nodes presents the best results in estimating SO4 (the lowest validation and test RMSE values, with a test R2 of 0.888). In the case of SAR, however, the ANN-PSO model with 18 hidden nodes outperforms the other models.

Table 3

Sensitivity analysis of different ANN-PSO models with respect to hidden node number

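| Parameter | Hidden node number | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SO4 | 18 | 2.90 | 1.88 | 0.846 | 3.95 | 2.54 | 0.766 | 3.76 | 2.46 | 0.854 |
| SO4 | 17 | 2.64 | 1.77 | 0.873 | 5.66 | 4.84 | 0.852 | 7.38 | 5.86 | 0.847 |
| SO4 | 16 | 2.59 | 1.76 | 0.877 | 3.20 | 2.32 | 0.847 | 3.19 | 2.46 | 0.888 |
| SO4 | 15 | 3.01 | 2.06 | 0.834 | 10.2 | 9.61 | 0.751 | 9.14 | 8.51 | 0.832 |
| SO4 | 14 | 3.00 | 2.02 | 0.835 | 8.52 | 7.92 | 0.798 | 9.43 | 8.59 | 0.797 |
| SO4 | 13 | 2.53 | 1.76 | 0.884 | 4.21 | 3.27 | 0.776 | 3.79 | 3.14 | 0.909 |
| SO4 | 12 | 2.49 | 1.71 | 0.887 | 4.03 | 2.78 | 0.766 | 3.43 | 2.64 | 0.897 |
| SO4 | 11 | 2.73 | 1.85 | 0.863 | 3.62 | 2.42 | 0.812 | 3.31 | 2.11 | 0.874 |
| SO4 | 10 | 2.71 | 1.74 | 0.865 | 6.05 | 5.44 | 0.833 | 6.06 | 5.26 | 0.877 |
| SO4 | 9 | 2.72 | 1.80 | 0.864 | 5.59 | 5.20 | 0.843 | 4.51 | 3.43 | 0.847 |
| SAR | 18 | 3.05 | 1.98 | 0.802 | 4.71 | 3.03 | 0.739 | 3.87 | 2.61 | 0.810 |
| SAR | 17 | 3.11 | 2.03 | 0.793 | 10.1 | 9.32 | 0.549 | 6.89 | 6.11 | 0.759 |
| SAR | 16 | 3.67 | 2.48 | 0.712 | 5.20 | 2.95 | 0.512 | 4.55 | 2.95 | 0.709 |
| SAR | 15 | 3.33 | 2.21 | 0.763 | 6.64 | 5.64 | 0.733 | 5.11 | 4.46 | 0.801 |
| SAR | 14 | 3.29 | 2.11 | 0.768 | 4.90 | 3.73 | 0.711 | 4.01 | 3.06 | 0.784 |
| SAR | 13 | 3.06 | 2.02 | 0.801 | 4.81 | 3.25 | 0.719 | 4.12 | 3.31 | 0.830 |
| SAR | 12 | 3.17 | 2.09 | 0.787 | 4.91 | 3.08 | 0.701 | 5.06 | 3.87 | 0.807 |
| SAR | 11 | 3.34 | 2.23 | 0.760 | 9.73 | 8.89 | 0.636 | 8.03 | 7.20 | 0.721 |
| SAR | 10 | 3.12 | 2.01 | 0.796 | 6.13 | 5.18 | 0.695 | 5.24 | 4.45 | 0.781 |
| SAR | 9 | 2.76 | 1.72 | 0.839 | 6.58 | 5.64 | 0.725 | 5.18 | 4.42 | 0.804 |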

Similarly to the ANN-PSO models, four different population sizes of 10, 20, 30, and 40 with 10,000 iterations were tried for the ANN-DE models, and the hidden node number was set to 18. Then, the hidden node number was decreased stepwise to the number of inputs (nine nodes). The sensitivity analysis of the different ANN-DE models with respect to hidden node number is presented in Table 4. From the table it is clear that the ANN-DE is not very sensitive to the hidden node number. Similarly to the ANN-PSO, the ANN-DE model comprising 16 hidden nodes presents the best performance in modeling SO4. In estimating SAR, however, the ANN-DE model with 12 hidden nodes performs better than the other models.

Table 4

Sensitivity analysis of ANN-DE models with respect to hidden node number

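| Parameter | Hidden node number | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SO4 | 18 | 2.76 | 1.86 | 0.868 | 2.92 | 1.95 | 0.854 | 2.91 | 1.88 | 0.880 |
| SO4 | 17 | 2.28 | 1.54 | 0.910 | 3.65 | 1.90 | 0.799 | 2.85 | 1.78 | 0.885 |
| SO4 | 16 | 2.11 | 1.43 | 0.923 | 2.88 | 1.72 | 0.872 | 2.22 | 1.64 | 0.932 |
| SO4 | 15 | 2.16 | 1.39 | 0.933 | 3.33 | 1.52 | 0.836 | 2.89 | 1.60 | 0.902 |
| SO4 | 14 | 2.18 | 1.53 | 0.930 | 4.01 | 2.23 | 0.809 | 2.93 | 1.95 | 0.898 |
| SO4 | 13 | 2.28 | 1.45 | 0.905 | 3.53 | 1.93 | 0.790 | 2.90 | 1.78 | 0.878 |
| SO4 | 12 | 1.91 | 1.12 | 0.934 | 3.25 | 1.42 | 0.826 | 2.62 | 1.41 | 0.901 |
| SO4 | 11 | 1.99 | 1.28 | 0.929 | 2.60 | 1.54 | 0.884 | 2.34 | 1.57 | 0.923 |
| SO4 | 10 | 2.51 | 1.89 | 0.892 | 3.22 | 1.95 | 0.818 | 2.85 | 1.99 | 0.885 |
| SO4 | 9 | 2.09 | 1.42 | 0.922 | 3.10 | 1.64 | 0.833 | 2.59 | 1.54 | 0.903 |
| SAR | 18 | 1.98 | 1.48 | 0.920 | 2.55 | 1.62 | 0.865 | 2.22 | 1.60 | 0.896 |
| SAR | 17 | 1.90 | 1.41 | 0.930 | 2.83 | 1.71 | 0.834 | 2.42 | 1.59 | 0.874 |
| SAR | 16 | 2.11 | 1.43 | 0.905 | 3.80 | 1.83 | 0.703 | 2.46 | 1.63 | 0.862 |
| SAR | 15 | 2.15 | 1.55 | 0.918 | 2.82 | 1.47 | 0.831 | 2.27 | 1.42 | 0.883 |
| SAR | 14 | 2.02 | 1.36 | 0.921 | 3.48 | 1.68 | 0.763 | 2.91 | 1.70 | 0.829 |
| SAR | 13 | 2.01 | 1.29 | 0.914 | 3.78 | 1.72 | 0.714 | 2.67 | 1.68 | 0.849 |
| SAR | 12 | 1.84 | 1.29 | 0.935 | 3.18 | 1.62 | 0.786 | 2.11 | 1.35 | 0.902 |
| SAR | 11 | 2.00 | 1.38 | 0.925 | 3.67 | 1.85 | 0.725 | 2.82 | 1.75 | 0.839 |
| SAR | 10 | 1.88 | 1.40 | 0.927 | 2.05 | 1.46 | 0.911 | 2.24 | 1.57 | 0.886 |
| SAR | 9 | 1.75 | 1.23 | 0.937 | 2.20 | 1.48 | 0.898 | 2.32 | 1.70 | 0.886 |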

Training, validation, and test results of the ANN-PSO models are given in Table 5. It is clear from the table that the models' accuracy in the training stage increases with increasing swarm size, while for the validation and test stages the accuracies fluctuate. Analyzing the error statistics presented in Table 5 shows that the ANN-PSO with a swarm size of 30 has the lowest RMSE (3.76 mg/L) and MAE (2.46 mg/L) values in estimating SO4 in the test stage, while the ANN-PSO with a swarm size of 40 produced the most accurate results for estimating SAR.

Table 5

Comparison of different ANN-PSO models with different swarm sizes

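| Parameter | Swarm size | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SO4 | 10 | 3.87 | 2.72 | 0.750 | 6.84 | 6.16 | 0.746 | 5.66 | 4.86 | 0.690 |
| SO4 | 20 | 3.21 | 2.19 | 0.813 | 6.93 | 6.23 | 0.794 | 7.75 | 6.75 | 0.801 |
| SO4 | 30 | 2.90 | 1.88 | 0.846 | 3.95 | 2.54 | 0.766 | 3.76 | 2.46 | 0.854 |
| SO4 | 40 | 2.61 | 1.73 | 0.875 | 6.11 | 5.66 | 0.858 | 7.83 | 7.22 | 0.864 |
| SAR | 10 | 5.26 | 3.74 | 0.442 | 5.83 | 4.07 | 0.428 | 5.44 | 4.18 | 0.498 |
| SAR | 20 | 4.50 | 3.14 | 0.565 | 10.5 | 9.46 | 0.522 | 9.36 | 8.33 | 0.586 |
| SAR | 30 | 4.38 | 3.02 | 0.588 | 5.42 | 3.29 | 0.403 | 5.50 | 3.92 | 0.590 |
| SAR | 40 | 3.05 | 1.98 | 0.802 | 4.71 | 3.03 | 0.739 | 3.87 | 2.61 | 0.810 |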

Table 6 sums up the training, validation, and test results of the ANN-DE models. It is apparent from the table that the ANN-DE presents the lowest RMSE (2.91 mg/L) and MAE (1.88 mg/L) in estimating SO4 in the test period for a population size (PS) of 30, while the ANN-DE with a PS of 40 provided the best accuracy in estimating SAR, similar to the ANN-PSO.

Table 6

Comparison of different ANN-DE models with different population sizes

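| Parameter | Population size | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SO4 | 10 | 2.53 | 1.79 | 0.884 | 5.80 | 2.29 | 0.564 | 3.17 | 2.15 | 0.854 |
| SO4 | 20 | 2.31 | 1.63 | 0.902 | 3.68 | 1.99 | 0.782 | 3.01 | 1.99 | 0.868 |
| SO4 | 30 | 2.76 | 1.86 | 0.868 | 2.92 | 1.95 | 0.854 | 2.91 | 1.88 | 0.880 |
| SO4 | 40 | 2.50 | 1.73 | 0.886 | 2.88 | 2.01 | 0.852 | 3.03 | 2.20 | 0.869 |
| SAR | 10 | 2.05 | 1.39 | 0.912 | 3.27 | 1.73 | 0.780 | 2.44 | 1.54 | 0.867 |
| SAR | 20 | 2.05 | 1.42 | 0.919 | 3.25 | 1.78 | 0.793 | 3.00 | 1.67 | 0.814 |
| SAR | 30 | 1.83 | 1.37 | 0.931 | 2.87 | 1.52 | 0.829 | 2.69 | 1.62 | 0.837 |
| SAR | 40 | 1.98 | 1.48 | 0.920 | 2.55 | 1.62 | 0.865 | 2.22 | 1.60 | 0.896 |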

Comparison of Tables 5 and 6 shows that the ANN-DE yields lower test-stage RMSE and MAE values than the ANN-PSO in estimating SO4 and SAR for every swarm/population size. The results in Tables 3 and 4 also reveal that selecting the number of hidden neurons as twice the number of inputs may not give the optimal results; the number should instead be obtained through a trial and error process.

The optimal ANN-PSO and ANN-DE models are compared in Table 7. From the table it is seen that the ANN-DE models give more accurate results than the ANN-PSO models for all training, validation, and test stages.

Table 7

Comparison of the optimal ANN-PSO and ANN-DE models

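| Parameter | Model | Hidden node number | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 | Computational cost (iterations) | Run time (s) | Convergence speed (iteration/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SO4 | ANN-PSO | 16 | 2.59 | 1.76 | 0.877 | 3.20 | 2.32 | 0.847 | 3.19 | 2.46 | 0.888 | 1,000 | 23 | 43 |
| SO4 | ANN-DE | 16 | 2.11 | 1.43 | 0.923 | 2.88 | 1.72 | 0.872 | 2.22 | 1.64 | 0.932 | 950 | 21 | 45 |
| SAR | ANN-PSO | 18 | 3.05 | 1.98 | 0.802 | 4.71 | 3.03 | 0.739 | 3.87 | 2.61 | 0.810 | 1,200 | 28 | 43 |
| SAR | ANN-DE | 12 | 1.84 | 1.29 | 0.935 | 3.18 | 1.62 | 0.786 | 2.11 | 1.35 | 0.902 | 1,100 | 26 | 42 |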

Figure 3 compares the time series of the observed and estimated SO4 values obtained by the ANN-DE and ANN-PSO models during the test period. It is clear from the figure that the values produced by the ANN-DE model are closer to the observed values than those of the ANN-PSO model. Scatter plots of the observed vs. simulated SO4 values during the test period are compared in Figure 4. Assuming the fit line equation as y = ax + b, the a and b coefficients of the ANN-DE model are closer to 1 and 0, respectively, with a higher R2 value (0.931), which demonstrates the superiority of the ANN-DE model. Time series and scatter plots comparing the ANN-DE and ANN-PSO models in SAR modeling are shown in Figures 5 and 6. As in the SO4 modeling, ANN-DE is superior to ANN-PSO in simulating SAR.

Figure 3

Observed vs. estimated SO4 values of ANN-DE and ANN-PSO models in the test stage.


Figure 4

Scatter plots of the observed vs. simulated SO4 values by ANN-DE and ANN-PSO models in the test stage.


Figure 5

The observed and estimated SAR values by ANN-DE and ANN-PSO models in the test phase.


Figure 6

The scatter plots of the observed and estimated SAR values by ANN-DE and ANN-PSO models in the test phase.


Further, the results were tested using one-way analysis of variance (ANOVA) to verify the robustness of the optimum ANN-DE and ANN-PSO models. Both tests were set at a 95% confidence level (5% significance level). Thus, differences between the observed and simulated SO4 and SAR values were considered significant when the resultant significance level (p) was lower than 0.05, using two-tailed significance levels. The test statistics are given in Table 8. The ANN-DE model yields a small F-statistic with a high resultant significance level (p > 0.05) for both SO4 and SAR, i.e., its estimates do not differ significantly from the observations, whereas the ANN-PSO estimates do (p < 0.05). According to the test results, the ANN-DE seems to be more powerful than the ANN-PSO in this case.
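The F-statistic of a one-way ANOVA comparing observed and simulated values can be sketched as follows. This is an illustration only: the function name and sample values are hypothetical, and the p-value, which requires the F distribution, is not computed here.

```python
def one_way_anova_f(*groups):
    """F-statistic of a one-way ANOVA for two or more groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical observed and simulated values (not from the study)
observed = [1.0, 2.0, 3.0]
simulated = [2.0, 3.0, 4.0]
f_stat = one_way_anova_f(observed, simulated)
print(f_stat)
```

A small F-statistic means the difference between the group means is small relative to the scatter within each group, which is the pattern reported for the ANN-DE model.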

Table 8

ANOVA results for the optimum ANN-DE and ANN-PSO models

Model      SO4                                          SAR
           F-statistic   Resultant significance level   F-statistic   Resultant significance level
ANN-DE     0.0159        0.8991                         0.8980        0.3431
ANN-PSO    4.579         0.0325                         9.1861        0.0024

CONCLUSIONS

This paper presents particle swarm optimization (PSO)- and differential evolution (DE)-based ANN approaches for the estimation of groundwater quality parameters (SO4 and SAR) from other qualitative parameters. The two bio-inspired algorithms were compared in order to determine which is more suitable for training an ANN. This comparison matters because training is one of the key issues in obtaining good generalization. The application of PSO- and DE-based ANNs to groundwater quality estimation is a novel research area. The findings of this study indicate that both ANN-PSO and ANN-DE are suitable approaches for simulating groundwater quality. However, the DE-based model performs better than the PSO-based model in the training, validation, and test stages. Further studies should be carried out using limited inputs to verify the generalization of the developed models. In addition, studies relating SO4 pollution to certain industrial discharges or to rainfall intensity would be of interest.

ACKNOWLEDGEMENTS

This study was supported by the Department of Soil Science, University of Tehran, Iran. The authors thank the editor and the anonymous reviewers for their help in improving the quality of the manuscript.
