The identification of groundwater contaminant sources is a primary step in designing and implementing a remediation strategy. The work presented here was undertaken to develop an efficient strategy that addresses the unknown multiple contaminant sources problem, and that could identify the number, location and magnitude of the groundwater contaminant sources and select optimal sampling locations. A Monte Carlo approach was used first to obtain the statistical characteristics of groundwater flow and transport model. Then the linear Kalman filter and a modified comparison method were utilized to update the estimation of concentration values and source weights, which represent the similarity between the estimated composite plume and each individual plume. Moreover, an optimization method was employed to identify the magnitude of contamination and the optimal sampling location. All of these steps were repeated until the weights stabilized and converged. A synthetic example was used to test the strategy and a further uncertainty analysis was conducted. The evaluation demonstrated that the strategy effectively addresses unknown multiple-source problems, under the condition that the error of concentration measurement value was controlled to less than 10%, and the time error was controlled to less than 6%.

## INTRODUCTION

Identifying the characteristics of groundwater contaminant sources is not only a major task before adopting a feasible remediation strategy, but also a general basis for how to apportion the remediation duties and costs from a legal and regulatory point of view (Butera *et al.* 2013). Due to the complexities of any groundwater system, it is always difficult to obtain all the useful information and utilize it to solve the inverse problem perfectly.

Characteristics of contaminant sources mainly include the number, location, magnitude and contaminant release history. For single-source inverse problems, researchers usually focus on identifying the location and release history of the contaminant source, and have achieved fruitful results. Skaggs & Kabala (1994) used the Tikhonov regularization method to recover the release history of a known groundwater contaminant source. Neupauer *et al.* (2000) used both Tikhonov regularization and the minimum relative entropy method to reconstruct the release history of a known source. Michalak & Kitanidis (2004) employed the adjoint state method to identify the source location. Wang & Jin (2013) adopted the Bayesian method to infer the location and magnitude of contaminant sources in three dimensions. Jha & Datta (2015) identified the location, starting release time and active duration of a source based on the dynamic time warping method. Nevertheless, the most common problems encountered in practice are those in which the number of contaminant sources is unknown. So a method tailored to complex and challenging multiple-source problems would be of considerable practical significance. Singh *et al.* (2004) used artificial neural networks to identify the location and release history of contaminant sources based on two synthetic examples. Jha & Datta (2013) identified the locations and release history of contaminant sources in a three-dimensional case based on an adaptive simulated annealing algorithm. Gurarslan & Karahan (2015) proposed a differential evolution algorithm to characterize the locations and release history of groundwater contaminant sources.

However, when solving multiple-source problems, these optimal methods coupled with a heuristic algorithm and the methods based on artificial neural networks may take too much time, or may not be stable. Dokou & Pinder (2009, 2011) proposed a new approach for identifying groundwater contaminant sources based on the Kalman filter and fuzzy set theory. The algorithm could search for a reasonable solution under a single contaminant source case, but was unsuitable for the multiple-source case because the final weights of the true sources could not converge to 1 simultaneously (one weight was 1.0 and the other was 0.8, although both sources were ‘true’ sources). The work presented here was conducted to extend and improve the earlier algorithm developed by Dokou & Pinder (2009, 2011). A modified comparison method for fuzzy sets and a means of describing an ‘equivalent source’ was developed that would be more efficient and robust for solving the unknown source identification problems, whether for a single-source case or a multiple-source case.

## METHODOLOGY

*et al.*2013). In this research, several tools and methods were employed. The main process and solution steps are shown in Figure 1.

### Monte Carlo approach

Considering the uncertainty of a groundwater system, which is due to the heterogeneity of hydraulic conductivity, a Monte Carlo approach was used to statistically describe the distribution of contaminant concentration. This approach assumes that the hydraulic conductivity follows a log-normal distribution. First, 100 hydraulic conductivity fields were generated (Zhang & Pinder 2003), and each of them was used to calculate the concentration field for each potential source. Then the mean value of 100 concentration fields of each potential source was identified (called the ‘individual plume’), and used to calculate the weighted average concentration field (called the ‘composite plume’) and the initial variance–covariance matrix.

### Linear Kalman filter

The linear Kalman filter was first proposed for addressing prediction problems in communication and control (Kalman 1960). When employed in contaminant source identification (Dokou & Pinder 2009, 2011), the main steps that were used to update the state variable and error covariance matrix are as follows.

*K*is the Kalman gain matrix;

*P*is the error covariance matrix;

*H*is the sampling matrix with dimension , whose element is 1 when a specific position is taken as a sampling point, otherwise the elements are zero;

*v*is the covariance of measurement error; is the estimate of contaminant concentration; and

*z*is a vector of

*l*noise-corrupted measurements. In this analysis, because just one sampling data point was used at one time, and ‘’ denotes a prior estimate and ‘’ denotes a posterior estimate.

*i*th row and

*j*th column can be calculated using Equation (4):where

*q*is the total number of hydraulic conductivity realizations (equal to 100), and denote the

*i*th and

*j*th individual plume concentration generated by the

*k*th hydraulic conductivity realization, and and denote the

*i*th and

*j*th composite plume concentrations, respectively.

In the analysis, a means of identifying an ‘equivalent source’ was proposed. For example, assume there were three potential sources named S_{1}, S_{2} and S_{3}. Besides the three single individual contaminant plumes, all combinations of the three potential sources should also be calculated. In other words, for a scenario consisting of three potential sources, three ‘single sources’ and four ‘equivalent sources’ should be characterized.

### Fuzzy mathematics and implementation

*c*becomes as shown below.Then a -cut method was used to compare the fuzzy sets and get the new weights. The original -cut was a crisp set that contains all the elements of a fuzzy set, whose membership degrees were greater than or equal to the . However, in unknown contaminant source problems where more than one true source exists, the ‘equivalent individual plume’ that emanates from the sources must be larger than the ‘single individual plumes’. When comparing the common area with the ‘composite plume’ by the original -cut method, the resulting ‘equivalent individual plume’ would be dominant and distorted. To solve this problem, a modified -cut method was proposed, in which is no longer a specific value, but rather a specific interval. If set (for which ‘[ ]’ denotes a closed interval), the matrix after being operated becomes , as illustrated above in which ‘E’ denotes an empty set.

*g*could be obtained using Equation (5), which provides a measure of how similar the two plumes were, as well as ensuring that higher concentration values were weighted more than lower concentration values.

### Optimal method for magnitude

*i*, is the measured concentration value at the

*i*th position, is the magnitude of the th source,

*n*is the total number of all sampling positions, and

*k*is the total number of potential sources.

*S*under a unity injection for source , and is a coefficient.

### Selection of sampling positions

*j*th hydraulic conductivity field and the composite plume could be expressed as Equations (9) and (10), respectively.In Equations (9) and (10), , and denote the three individual plumes that emanate from each potential source with the

*j*th hydraulic conductivity random field, where

*n*is the number of the hydraulic conductivity random field.

The covariance between each and under the current weighted value was the basic criterion for determining the optimal sampling position because it could reflect the uncertainty of the groundwater system. To simplify the calculations, the variance of was used instead of the covariance between each and . The position with the highest value of variance was chosen as the optimal sampling position.

### Case study

To demonstrate the flexibility of the proposed strategy, a synthetic example was developed. The example and data used were as realistic as possible and based on practical experience. The task in the hypothetical scenario was to identify the number, location and magnitude of the true contaminant source (or sources).

In this scenario, three potential source locations were determined as the result of a primary investigation. Each hydraulic conductivity random field was run for all sources individually. Because the synthetic example represented a typical real-world situation in which there was no exact information about initial weights, an assumption was made that all initial weights were equal to 0.5. The left boundary of used to calculate the degree of similarity of two plumes is equal to the number ranges from 0.1 to 0.8, with an interval of 0.05. Each right boundary is equal to the left boundary plus a value of 0.1. For example, the first three ranges were [0.1, 0.2], [0.15, 0.25] and [0.2, 0.3].

## RESULTS AND DISCUSSION

### Results

Then an optimal method as described under ‘Selection of sampling positions’ was used to determine the best estimated value of the magnitude. The mean relative error (MRE) of the estimated magnitude was 9.33%, indicating that the solution was not ideal. To analyse the reason for the poor solution, we calculated the objective function value using information for the true sources based on the forward model, which is greater than the objective function value obtained based on the estimated magnitude. This shows that the primary error is not caused by the algorithm, but rather because the linear assumption is not accurate enough for this example.

### Uncertainty analysis

For inverse problems such as identifying groundwater contaminant sources, the uncertainty of the solution mainly derives from two aspects: one is randomness and the other is fuzziness. Randomness is typically quantified by the measurement error among various terms, such as hydrogeological parameters and concentration values. In contrast, fuzziness always occurs in the conceptualization or decision process, which involves some anthropic factors. In most instances, randomness and fuzziness cannot be clearly divided.

The random hydraulic conductivity field and a Monte Carlo approach were employed to reduce the randomness of hydrological parameters. Thus, in our analysis, the noise of concentration measurement and the estimation of activity periods of the sources were focused. The example we next describe is the multiple contaminant sources scenario presented previously.

#### Concentration measurement noise

Based on Figure 5, when the absolute value of noise was less than 5%, the updated weights stabilized and converged, and the estimated contaminant source location could be determined as S2&S3. When the noise criterion became 10%, the true sources could also be identified to be S2&S3. However, when the noise index rose to 15%, the location of the sources could not be identified confidently, and when the absolute value of noise extended to 20%, the identification of sources was irregular and could not provide any useful identification information. The accuracy of the estimated magnitude values changed little. All of the MRE for these estimates were approximately 9% to 10%, indicating that the observed data were not so sensitive to the injection rate because the active period of the sources was 4 years.

#### Noise of active period

In fact, when a parameter such as the active period is known, a source's location would also be known (in theory). However, it is difficult to determine this term accurately at first from a primary investigation. Thus, testing the strategy using different durations of noise during the active period is necessary to illustrate the strategy's practical applicability.

Based on the analysis above, for the examples tested, the proposed strategy can identify the true sources of contamination within an acceptable margin of error. For the concentration measurement, the absolute value of the error should be controlled at 10% or less, and the time error, which affected the results to a greater extent, should be controlled to less than 6% (i.e. 3 months within 4 years). However, the MRE of the estimated magnitude was affected little, no matter what and how large the error was.

## CONCLUSIONS

(1) The proposed modified strategy for identifying unknown sources of groundwater contamination can successfully identify the sources no matter how many sources exist or where in a study area they are located.

(2) To guarantee the reliability of a solution, the error of concentration measurement should be controlled at 10% or less, and the time error, which affected the results more than did the concentration measurement error, should be controlled at less than 6%.

(3) With a MRE of 5.83%, the solution of a single contaminant source case was more accurate than the solution for multiple sources (MRE = 9.33%), and the MRE values changed little within an acceptable error range.

The strategy presented here has two major advantages over existing analytical techniques. One is that it includes a novel means of describing an ‘equivalent source’. This feature makes it possible to identify the location of ‘true’ contaminant sources using the final source weights directly, rather than having to base identification on a series of slightly lower weights. The second advantage is that the strategy includes a modified plume comparison method to calculate the degree of similarity between two contaminant plumes. This modified method is a more consistent approach to solving the practical problem of identifying unknown contaminant sources than are existing techniques.

However, the linear assumption used in the modified strategy perhaps is not accurate enough for estimating the magnitude of contamination. This potential shortcoming could be the focus of future research in a follow-up study to improve the strategy.

## ACKNOWLEDGEMENTS

This research was supported by the National Natural Science Foundation of China (41372237) and the Program of the Environmental Protection Department of Jilin Province (2015-11). We thank the editor and reviewers for their insightful comments and suggestions.