Shallow groundwater is of great interest to communities because it is easily accessible. However, it is very sensitive to external stimuli. In this paper, shallow groundwater quality in a coastal aquifer in Fujian Province, South China, is assessed and classified with three methods: the improved Nemerow pollution index (INPI), a multi-layer perceptron artificial neural network (MLP-ANN) optimized with a back-propagation algorithm, and a wavelet neural network (WNN). The data used in the three models were collected during the pre-monsoon season over the period 2004–2011. Eight parameters (total dissolved solids, total hardness, chemical oxygen demand, chloride, sulphate, nitrate, nitrite and fluoride) were selected to classify groundwater quality according to the National Quality Standard for Groundwater (GB/T 14848-93). The results of MLP-ANN and WNN are evaluated with the mean absolute error, the root mean square error and the coefficient of determination (R²). The results obtained from the three methods demonstrate that WNN has a higher accuracy than the other two methods. The study shows that these methods are efficient tools for assessing groundwater quality.
INTRODUCTION
Water quality assessment is a complex problem because multiple factors are involved and because the pollution components, which are affected by many factors and processes, are difficult to identify accurately. Groundwater quality depends not only on natural factors such as the lithology of the aquifer, the quality of recharge water and the type of interaction between water and aquifer, but also on human activities, which can alter groundwater systems either by polluting them or by changing the hydrological cycle (Helena et al. 2000). Pollution of water sources by human activity has been of great concern in recent years (Nash & McCall 1995; UNEP 1999; Milovanovic 2007; Mohapatra et al. 2011; Ogunkunle & Fatoba 2013). Various methods of groundwater quality assessment have been explored. Several types of water quality indices have been applied in environmental assessment (Wayne 1978; Li et al. 2007; Lermontova et al. 2009; Ni et al. 2010; Liang et al. 2011; Tang et al. 2011; Sarala & Sabitha 2012; Ma et al. 2013). Many traditional approaches have also been applied to water quality assessment, including multivariate statistical methods such as cluster analysis, factor analysis, principal component analysis and discriminant analysis, which are generally used to identify the major factors affecting groundwater. The use of graphical methods such as the Stiff diagram (Stiff 1951) to interpret the hydrochemistry is limited to two dimensions (Hem 1970). Because the relationships between groundwater quality parameters are complex, nonlinear and uncertain, artificial neural networks (ANNs) have become a popular method in environmental simulation and prediction: they can overcome some of the difficulties associated with traditional statistical approaches (Wu et al. 2009; Chen et al. 2015; Gholami et al. 2015; Taormina & Chau 2015; Wang et al. 2015).
In a review of ANN applications to water resource variables published up to the end of 1998, Maier & Dandy (2000) examined 43 papers and reported that surface water flow and quality was the topic in 28 and rainfall forecasting in 13 (Yang et al. 2015). The application of ANNs to groundwater quality assessment has been reported in several recent studies (Aris et al. 2012; Khashei-Siuki & Sarbazi 2013; Nadiri et al. 2013; Pedro et al. 2013). One of the most important features of ANN models is their ability to adapt to recurrent changes and detect patterns in a complex natural system (Cannas et al. 2006; Adamowski 2007; Partal 2009; Tiwari & Chatterjee 2010; Adamowski & Chan 2011). The application of ANNs in hydrological forecasting and prediction can be traced back to the 1990s. ANN models are called 'black box' models because they can model dynamic nonlinear systems by detecting patterns in a complex system, without the need to understand the physical mechanisms taking place in the system. ANNs have proven effective in modeling virtually any nonlinear function to a desired degree of accuracy. The advantages of ANN models over conventional simulation methods have been discussed in detail by French et al. (1992). The most popular type of ANN is the multi-layer perceptron (MLP) model optimized with a back-propagation (BP) algorithm. However, ANNs and other nonlinear methods have limitations with non-stationary data if the input data are not pre-processed. In the last decade, wavelet analysis has been applied in water resources engineering and hydrology, and it has been found to be very effective for handling non-stationary data. Wavelet transforms can decompose the original time series, and the wavelet-transformed data improve the ability of a forecasting model by capturing useful information at various resolution levels (Adamowski & Sun 2010).
The main advantages of ANNs can be summarized as follows: (1) high computational efficiency in dealing with large quantities of data, with nonlinear relationships between parameters (especially for water quality) and with data transfer during the calculation process, which supports accurate water quality assessment and simulation; (2) a large memory capacity, which can store large volumes of water quality data and the corresponding relationships between inputs and outputs and which, combined with high computation speed, raises the level of automation of water quality assessment and simulation; (3) a learning ability that avoids steps such as mechanism analysis, boundary and initial hypotheses, and parameter estimation and calibration when a groundwater quality simulation is established. Only model training is necessary to determine the input–output relationship, which greatly simplifies the model setup procedure.
The main purpose of this paper is to construct the improved Nemerow pollution index (INPI) method, MLP-ANN and wavelet neural network (WNN) methods and demonstrate their applicability to assess and classify the shallow groundwater quality. Comparison among these three methods can provide useful insights for identifying the effectiveness of each method.
DATA AND METHODOLOGY
Groundwater quality dataset
The study area can be considered an independent hydrogeological unit because it is surrounded by the sea on all four sides. The water yield varies greatly with the lithology and thickness of the aquifer. The groundwater type is coarse porous water, recharged predominantly by rainfall infiltration.
The INPI method
Groundwater quality was assessed using a multivariate index that gives priority to the extreme values and weights, as well as to the intended uses of the water. The weights are determined according to the various uses of the same water. Three essential factors have to be considered in the Nemerow pollution index method: the water quality indexes, the assessment method, and the extreme values and weights.
In this paper, the measured concentrations of the water samples were first compared with the standard values prescribed in the groundwater quality standard (GB/T 14848-93), and the INPI of each grade was then obtained. Finally, the water quality grades were evaluated on the basis of the pollution index values.
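For orientation only, the sketch below computes the classic (unmodified) Nemerow pollution index, which combines the maximum and the mean of the concentration-to-standard ratios. It is not the improved index used in this paper, whose weighting terms are not reproduced in this section. The function name, the sample concentrations and the choice of three parameters are hypothetical.

```python
import math

def classic_nemerow_index(concentrations, standards):
    """Classic Nemerow index: sqrt((max_ratio^2 + mean_ratio^2) / 2).

    This is the standard textbook form, NOT the improved index (INPI)
    of this paper, which additionally uses parameter weights.
    """
    ratios = [c / s for c, s in zip(concentrations, standards)]
    max_ratio = max(ratios)
    mean_ratio = sum(ratios) / len(ratios)
    return math.sqrt((max_ratio ** 2 + mean_ratio ** 2) / 2.0)

# Hypothetical sample: TH, COD and nitrate concentrations (mg/L)
# compared with the Grade III limits 450, 3.0 and 20 mg/L.
sample = [225.0, 3.0, 40.0]
grade_iii_limits = [450.0, 3.0, 20.0]
index = classic_nemerow_index(sample, grade_iii_limits)
print(f"classic Nemerow index: {index:.3f}")
```

Because the maximum ratio enters the formula directly, a single strongly exceeded parameter dominates the index, which is the behaviour the extreme-value emphasis in the text refers to.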
WNN method
WNN, a relatively new mathematical model that combines the wavelet transform with the ANN, has been widely applied in water quality assessment (Dogan et al. 2009; Singh et al. 2009; Moustris et al. 2010; Chu et al. 2013). Wavelets are mathematical functions that give a time-scale representation of a time series and its relationships, and they are suited to analyzing time series that contain non-stationarities. Wavelet analysis allows the use of long time intervals for low-frequency information and shorter intervals for high-frequency information, and it is capable of revealing aspects of data such as trends, breakdown points and discontinuities that other signal analysis techniques might miss. Another advantage of wavelet analysis is the flexible choice of the mother wavelet according to the characteristics of the investigated time series (Adamowski & Sun 2010).
The weight correction method of the WNN is similar to that of the BP neural network. By adjusting the weights and the wavelet basis factors along the gradient, the predicted output of the WNN approaches the expected output. The WNN training involves the following steps:
(1) Initialize the network parameters: the weights, the wavelet basis factors and the learning rate.
(2) Classify the samples into two groups: the training samples, used to train the network, and the testing samples, used to test its precision.
(3) Run the WNN prediction: input the training data, then calculate the predicted output and the error.
(4) Adjust the weights and the wavelet basis factors according to the error.
(5) Check whether the termination condition is met. If it is, stop; otherwise, return to step (3).
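The training steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the network used in this study: it assumes a single hidden layer with a Morlet mother wavelet, full-batch gradient descent on the mean squared error, and synthetic data. All function names, hyperparameters and the toy dataset are hypothetical.

```python
import numpy as np

def morlet(t):
    """Morlet mother wavelet (an assumed choice of wavelet basis)."""
    return np.cos(1.75 * t) * np.exp(-0.5 * t**2)

def morlet_deriv(t):
    """Derivative of the Morlet wavelet with respect to t."""
    return -(1.75 * np.sin(1.75 * t) + t * np.cos(1.75 * t)) * np.exp(-0.5 * t**2)

def forward(params, X):
    """Return scaled hidden inputs, wavelet activations and network output."""
    W1, w2, a, b = params
    t = (X @ W1.T - b) / a          # shift by b, scale by a
    h = morlet(t)
    return t, h, h @ w2

def train_wnn(X, d, n_hidden=6, lr=0.02, max_epochs=500, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialise weights, wavelet factors (a: scale, b: shift).
    W1 = 0.5 * rng.standard_normal((n_hidden, X.shape[1]))
    w2 = 0.5 * rng.standard_normal(n_hidden)
    a, b = np.ones(n_hidden), np.zeros(n_hidden)
    history = []
    for _ in range(max_epochs):
        # Step 3: predicted output and error on the training samples.
        t, h, y = forward((W1, w2, a, b), X)
        err = y - d
        history.append(float(np.mean(err**2)))
        # Step 5: stop once the error is small enough, else keep iterating.
        if history[-1] < tol:
            break
        # Step 4: gradients of 0.5 * mean(err**2) for every parameter.
        g = err[:, None] * w2 * morlet_deriv(t)
        grad_w2 = h.T @ err / len(d)
        grad_W1 = (g / a).T @ X / len(d)
        grad_b = -(g / a).mean(axis=0)
        grad_a = -(g * t / a).mean(axis=0)
        w2 = w2 - lr * grad_w2
        W1 = W1 - lr * grad_W1
        b = b - lr * grad_b
        a = a - lr * grad_a
    return (W1, w2, a, b), history

# Synthetic data (hypothetical): three inputs in [0, 1], smooth target.
data_rng = np.random.default_rng(1)
X = data_rng.random((100, 3))
d = X.mean(axis=1)
# Step 2: split into training and testing samples.
X_train, d_train, X_test, d_test = X[:80], d[:80], X[80:], d[80:]
params, history = train_wnn(X_train, d_train)
test_mse = float(np.mean((forward(params, X_test)[2] - d_test) ** 2))
print(f"training MSE: {history[0]:.4f} -> {history[-1]:.4f}, test MSE: {test_mse:.4f}")
```

The only difference from a BP-trained MLP in this sketch is that the scale and shift of each wavelet are updated together with the connection weights.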
Model performance indices
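The three criteria named in the abstract (mean absolute error, root mean square error and R²) can be computed as in the sketch below. The standard definitions are assumed here, with R² taken as one minus the ratio of residual to total sum of squares; the paper's own equations are not reproduced in this section, and the sample values are hypothetical.

```python
import numpy as np

def mean_absolute_error(observed, predicted):
    """MAE: average magnitude of the prediction errors."""
    return float(np.mean(np.abs(observed - predicted)))

def root_mean_square_error(observed, predicted):
    """RMSE: square root of the average squared error."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """R^2 as 1 - SS_res / SS_tot (assumed definition)."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical target grades and network outputs.
observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}, R2={r2:.3f}")
```

Lower MAE and RMSE and an R² closer to 1 indicate a closer match between the network output and the target grade.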
RESULTS AND DISCUSSION
Groundwater quality assessment in INPI
In this study, the Grade III standard in the National Quality Standard for Groundwater (GB/T 14848-93) is taken as the assessment criterion. Equation (5) is applied to compute the weights of each selected pollution parameter and the result is summarized in Table 1.
Groundwater parameters | Grade III | Weights (Wi) |
---|---|---|
TH (mg/L) | ≤450 | 4.32 × 10⁻⁵ |
COD (mg/L) | ≤3.0 | 6.49 × 10⁻³ |
TDS (mg/L) | ≤1,000 | 1.95 × 10⁻⁵ |
Cl⁻ (mg/L) | ≤250 | 7.78 × 10⁻⁵ |
SO₄²⁻ (mg/L) | ≤250 | 7.78 × 10⁻⁵ |
NO₃⁻ (mg/L) | ≤20 | 9.73 × 10⁻⁴ |
NO₂⁻ (mg/L) | ≤0.02 | 0.973 |
F⁻ (mg/L) | ≤1.0 | 0.0195 |
TH, Total hardness; COD, Chemical oxygen demand; TDS, Total dissolved solids.
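The weights in Table 1 are numerically consistent with weights that are inversely proportional to the Grade III limits and normalized to sum to one. The sketch below reproduces them under that assumption. Equation (5) itself is not shown in this section, so this is an inferred form, and the dictionary keys are hypothetical labels.

```python
# Grade III limits (mg/L) from Table 1.
grade_iii_limits = {
    "TH": 450.0, "COD": 3.0, "TDS": 1000.0, "Cl": 250.0,
    "SO4": 250.0, "NO3": 20.0, "NO2": 0.02, "F": 1.0,
}

# Assumed weighting: W_i = (1 / S_i) / sum_j (1 / S_j).
inverse_limits = {name: 1.0 / limit for name, limit in grade_iii_limits.items()}
total = sum(inverse_limits.values())
weights = {name: value / total for name, value in inverse_limits.items()}

for name, weight in weights.items():
    print(f"{name}: {weight:.3e}")
```

Under this form, the parameter with the strictest limit (nitrite) receives almost all of the weight, which matches the dominance of the 0.973 entry in Table 1.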
Table 2 shows the conversion of the groundwater quality standard from concentration limits to the corresponding INPI thresholds. It can be seen from Table 2 that a water sample with an INPI value no greater than 0.47643 is classed as Grade I, which is considered clean and excellent for drinking. If the concentrations of very few parameters exceed the limits, the water sample is classified as Grade II. Likewise, water samples classified as Grade III are slightly polluted, with a few values exceeding the standards. Water samples in Grade IV are moderately contaminated, with at least two parameters exceeding the criteria. Grade V means that the water is seriously polluted, as almost all the parameters are far beyond the standard values.
Groundwater parameters | Grade I | Grade II | Grade III | Grade IV | Grade V |
---|---|---|---|---|---|
TH (mg/L) | ≤150 | ≤300 | ≤450 | ≤550 | >550 |
COD (mg/L) | ≤1.0 | ≤2.0 | ≤3.0 | ≤10 | >10 |
TDS (mg/L) | ≤300 | ≤500 | ≤1,000 | ≤2,000 | >2,000 |
Cl⁻ (mg/L) | ≤50 | ≤150 | ≤250 | ≤350 | >350 |
SO₄²⁻ (mg/L) | ≤50 | ≤150 | ≤250 | ≤350 | >350 |
NO₃⁻ (mg/L) | ≤2.0 | ≤5.0 | ≤20 | ≤30 | >30 |
NO₂⁻ (mg/L) | ≤0.001 | ≤0.01 | ≤0.02 | ≤0.1 | >0.1 |
F⁻ (mg/L) | ≤1.0 | ≤1.0 | ≤1.0 | ≤2.0 | >2.0 |
INPI | ≤0.476 | ≤0.702 | ≤1 | ≤3.198 | >3.198 |
Water class | Excellent | Good | Fair | Poor | Very poor |
TH, Total hardness; COD, Chemical oxygen demand; TDS, Total dissolved solids; INPI, Improved Nemerow pollution index.
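The index thresholds in Table 2 translate directly into a grade lookup. The sketch below is one way to write it; the function name is hypothetical, and the upper bounds are the rounded values from the table.

```python
import bisect

# Upper INPI bounds of Grades I to IV from Table 2; anything above is Grade V.
INPI_UPPER_BOUNDS = [0.476, 0.702, 1.0, 3.198]
GRADES = ["I", "II", "III", "IV", "V"]

def inpi_to_grade(inpi):
    """Map an INPI value to a water quality grade (bounds are inclusive)."""
    # bisect_left keeps a value equal to a bound in the lower grade,
    # matching the '<=' thresholds in Table 2.
    return GRADES[bisect.bisect_left(INPI_UPPER_BOUNDS, inpi)]

# INPI values reported for wells #1 to #5 in 2004.
for well, value in zip(["#1", "#2", "#3", "#4", "#5"],
                       [1.019, 0.425, 1.682, 6.107, 0.734]):
    print(well, value, inpi_to_grade(value))
```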
Well no. | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|---|---|---|
#1 | 1.019 | 1.625 | 1.184 | 1.611 | 1.184 | 1.046 | 1.04812 | 2.338 |
#2 | 0.425 | 0.467 | 0.173 | 0.125 | 0.302 | 0.118 | 0.33455 | 0.383 |
#3 | 1.682 | 1.834 | 1.557 | 1.593 | 1.770 | 2.404 | 1.53089 | 1.581 |
#4 | 6.107 | 0.722 | 1.631 | 0.958 | 2.577 | 2.881 | 0.68956 | 0.559 |
#5 | 0.734 | 0.096 | 0.105 | 0.176 | 0.567 | 0.236 | 0.59233 | 1.333 |
Values are the INPI of each well in each year.
Groundwater quality assessment in MLP neural network
Groundwater quality assessment in WNN
Data processing
The WNN model needs to be trained before it is used to assess groundwater quality. The selection of the training samples is of great concern because it strongly affects the resulting WNN model. In this work, the training samples are generated from the standard values in the groundwater quality standard (GB/T 14848-93). A sample is labelled Grade I when all eight indicators fall within the Grade I range, Grade II when all eight indicators fall within the Grade II range, and so on. The Latin hypercube sampling (LHS) method was applied for the data sampling. LHS is based on Monte Carlo simulation but uses a stratified sampling approach that allows efficient estimation of the output statistics. It subdivides the distribution of each parameter into N strata, each with a probability of occurrence equal to 1/N. For uniform distributions, the parameter range is subdivided into N equal intervals. Random values of the parameters are generated such that, for each of the P parameters, each interval is sampled only once. This approach results in N non-overlapping realizations, and the model is run N times. LHS is commonly applied in water quality modelling because of its efficiency and robustness.
The detailed procedure is as follows: 80 samples are extracted for each grade, so that a total of 400 samples are used to train the model, and a total of 50 samples are used to validate it. As eight indicators and one corresponding water quality grade are involved, each sample is a nine-dimensional vector. In the dataset, Grade I is encoded as the real number 1, Grade II as 2, and so on.
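The stratified sampling described above can be sketched as follows. The example assumes uniform distributions between adjacent grade limits and, for brevity, draws 80 Grade II samples for only three of the eight indicators (TH, COD and TDS). The function name and the random seed are hypothetical, and the bounds chosen for a grade are an assumption about how the ranges were defined.

```python
import numpy as np

def latin_hypercube(n_samples, lower, upper, rng):
    """Draw n_samples points so that every parameter hits each of its
    n_samples equal-width strata exactly once (uniform distributions)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_params = lower.size
    # One random position inside each stratum, for every parameter.
    strata = np.arange(n_samples)[:, None]
    unit = (strata + rng.random((n_samples, n_params))) / n_samples
    # Shuffle the strata independently per parameter to decouple them.
    for j in range(n_params):
        unit[:, j] = rng.permutation(unit[:, j])
    return lower + unit * (upper - lower)

# Assumed Grade II ranges: between the Grade I and Grade II limits of Table 2.
lower = np.array([150.0, 1.0, 300.0])   # TH, COD, TDS (mg/L)
upper = np.array([300.0, 2.0, 500.0])
rng = np.random.default_rng(42)
grade_ii_samples = latin_hypercube(80, lower, upper, rng)

# Append the grade code (2) to form the labelled training vectors.
labelled = np.column_stack([grade_ii_samples, np.full(80, 2.0)])
print(labelled[:3])
```

Repeating the call for each of the five grades with 80 samples per grade gives the 400 training vectors mentioned in the text, each with the indicator values followed by the grade code.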
Training and testing
Output of WNN | y < 1.5 | 1.5 ≤ y < 2.5 | 2.5 ≤ y < 3.5 | 3.5 ≤ y < 4.5 | 4.5 ≤ y |
Assignment of grade | I | II | III | IV | V |
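The mapping from the continuous WNN output to a grade, as tabulated above, can be written as a threshold lookup. The function name is hypothetical; the cut points are those in the table.

```python
import bisect

# Cut points between grades for the continuous network output y.
OUTPUT_CUTS = [1.5, 2.5, 3.5, 4.5]
GRADE_LABELS = ["I", "II", "III", "IV", "V"]

def wnn_output_to_grade(y):
    """Assign a grade to a WNN output: y < 1.5 is I, 1.5 <= y < 2.5 is II, etc."""
    # bisect_right places a value equal to a cut point in the higher grade,
    # matching the '<=' on the lower side of each interval.
    return GRADE_LABELS[bisect.bisect_right(OUTPUT_CUTS, y)]

for y in [0.8, 1.5, 2.49, 3.7, 4.5, 6.2]:
    print(y, wnn_output_to_grade(y))
```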
Comparative analysis of groundwater quality assessment
To facilitate comparison, Table 5 summarizes the assessment results of the three methods. The calculated MAE, RMSE and R² are 0.292, 0.371 and 0.989 for the BP-ANN model and 0.073, 0.091 and 0.996 for the WNN model, indicating that the WNN model has a higher accuracy. Although the BP model shows good stability, its assessment results show relatively large deviations. The BP method also requires more iterations for the same task, with no guarantee of the accuracy of the results. The WNN and NMR methods are both effective for water quality assessment because their results are consistent with the actual water quality status.
Year | #1 WNN | #1 NMR | #1 BP | #2 WNN | #2 NMR | #2 BP | #3 WNN | #3 NMR | #3 BP | #4 WNN | #4 NMR | #4 BP | #5 WNN | #5 NMR | #5 BP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2004 | IV | IV | III | II | I | III | V | IV | III | I | V | V | II | III | III |
2005 | IV | IV | IV | II | I | II | IV | IV | III | II | III | IV | I | III | II |
2006 | IV | IV | V | I | I | IV | V | IV | V | III | IV | IV | I | I | II |
2007 | IV | IV | III | I | I | II | V | IV | IV | II | III | III | I | I | II |
2008 | IV | IV | III | I | I | III | V | IV | IV | IV | IV | V | I | II | IV |
2009 | IV | IV | III | I | I | II | V | IV | IV | IV | IV | V | I | I | III |
2010 | IV | IV | III | I | I | II | V | IV | III | II | II | II | II | II | IV |
2011 | IV | IV | V | II | I | III | V | IV | IV | II | II | II | III | IV | IV |
NMR, The INPI method.
CONCLUSIONS
To assess the groundwater quality in the coastal aquifer, water samples collected during the summer season were evaluated with the INPI method, a multi-layer perceptron neural network optimized with a back-propagation algorithm, and a wavelet neural network. The INPI approach not only considers the effect of the pollutant with the maximum pollution degree, but also takes into account the influence of the most dangerous pollution factor, including the impact of the maximum weight, the second-maximum weight and the second-maximum ratio, which is of great importance in the evaluation. BP-ANN and WNN both perform well in assessing groundwater quality; however, WNN has a higher accuracy. To protect shallow water sources from contamination, further study will focus on identifying which pollutants dominantly control the groundwater quality.
Some limitations of the methods used in this study have to be addressed. The INPI method overestimates the maximum pollution factors. In the BP-ANN method, the initial weights and the learning rate of the hidden layer were set manually on the basis of experience, so the learning process may fall into a local minimum in some cases. In view of the seasonal changes in groundwater chemistry, water samples from different seasons, together with repeat analyses, are required in future work.
ACKNOWLEDGEMENTS
This research was financially supported by the National Natural Science Foundation of China (Grant No. 41402202).