The problem of identifying an unknown pollution source in polluted aquifers, based on known contaminant concentration measurements, is part of the broader group of issues called inverse problems. This paper investigates the feasibility of solving the groundwater pollution inverse problem by using artificial neural networks (ANNs). The approach consists first in training an ANN to solve the direct problem, in which the pollutant concentration in a set of monitoring wells is calculated for a known pollutant source. Successively, the trained ANN is frozen and is used to solve the inverse problem, where the pollutant source is calculated which corresponds to a set of concentrations in the monitoring wells. The approach has been applied for a real case which deals with the contamination of the Rhine aquifer by carbon tetrachloride (CCl_{4}) due to a tanker accident. The obtained results are compared with the solution obtained with a different approach retrieved from literature. The results show the suitability of ANN-based methods for solving inverse non-linear problems.

## INTRODUCTION

Groundwater is an important source for the production of drinking water. Consequently, the protection of groundwater resource quality appears of extreme importance for life support systems. Nevertheless, groundwater is exposed to man-made pollution that might prevent its use for drinking as well as for other domestic, industrial and agricultural purposes. When groundwater is polluted, the restoration of quality and removal of pollutants is a very slow, hence long, and sometimes practically impossible task.

In the field of groundwater resource contamination, it should be highlighted that in some cases pollution might result from contaminations whose origins differ in time and place from the point where the contaminations were witnessed. To tackle such situations, it is necessary to develop specific techniques for identifying the behaviour of unknown contaminant sources from both spatial and temporal points of view. Getting to know the initial conditions of pollution is consistent with the implementation of the European Union Directive 2004/35/EC. This Directive, based on the ‘polluter-pays’ principle, concerns the environmental liability in relation to the prevention and compensation of environmental damages. The application of Directive 2004/35/EC requires the development of novel methodologies, such as that proposed in this work, for the identification of unknown pollution sources in contaminated aquifers. The problem of identifying an unknown pollution source in contaminated aquifers, based on known contaminant concentration measurement, is part of the wide group of issues called inverse problems. During the last decade, several studies have been dedicated to the development of different methods for solving inverse problems, however, works using artificial neural networks (ANNs) are less popular. Among these latest, in Zio (1997), an ANN is trained to identify the value of the dispersion coefficient in a simple analytic contaminant transport model; in Fanni *et al.* (2002), an ANN is used to locate a pollutant source; in Rajanayaka *et al.* (2002), a hybrid approach based on a combination of two types of ANN model is applied to estimate hydrogeological parameters; in Scintu (2004), ANNs are trained to predict the coordinates of the pollutant source and the time the pollution occurred; in Singh & Datta (2007), an ANN is used to identify unknown pollution sources with partially missing concentration observation data; in Foddis *et al.* (2013), ANNs are used for locating the source of a contamination event in time and space.

In this work, an ANN-based inverse problem solving method is presented for the identification of unknown contaminant sources and their characteristics. Compared to the state of the art, the originality of the developed method lies in the approach used for solving the inverse problem, which consists of inverting the ANN model instead of solving the equations that describe the physical system. In particular, different ANNs are trained to associate the pollution source features with the contaminant concentration measurements acquired in the monitoring wells. After this phase, the trained ANN is inverted by fixing the measurements at the wells and determining the corresponding pollutant source. This approach might be used for defining punctual contaminant sources to face groundwater quality problems. In particular, location and fluxes over time can be defined. The procedure described in this paper has been validated in a real case which tackles an actual contamination that occurred in the Rhine aquifer (France) (Vigouroux *et al.* 1983).

## METHOD DESCRIPTION

### Multi-layer perceptron network model

Such a neural model is trained by using a set of input–output pairs of patterns. These patterns are created by means of flux and transport contaminant modelling software, which calculates both the source of contaminant and the set of measurements in the monitoring wells. These data are converted into patterns which represent the input and the output of the ANN, respectively. Thanks to the weighted links, the ANN associate an output pattern to any input pattern. The training procedure consists of adapting the weights so that the ANN performs a desired association between the input and output patterns of a given training set of examples. In this work, the desired output values (target) represent the measured values so that the trained ANN is able to associate a given source of pollutant with the corresponding measurements in the monitoring wells. The Levenberg–Marquardt algorithm has been used to train the ANN.

Solving system equations (1) for the input , given an assigned value of the output , means solving an inverse problem of determining the characteristics of the pollutant source. In the following two sections, the adopted methods to extract features from data and to solve system (1) are described.

### Feature extraction

Data which describe groundwater flow and contaminant transport are characterized by huge volume and strong redundancy of information, so that in general it is possible and also mandatory to pre-process the data by reducing the volume without a meaningful loss of information. This operation is referred to as feature extraction and it allows the condensing of the information contained in the original data into a strongly reduced dimension of data. Feature extraction techniques have been chosen in compliance with two important requirements. Firstly, a high rate of compression of the information has to be ensured in order to guarantee a significant correspondence between the back transformed data and the original values. Secondly, the procedure can be easily backward applied to allow the reconstruction of the initial data set with the least possible error.

The adopted feature extraction procedure consists of: (1) calculating the two-dimensional discrete fast Fourier transform (2D-FFT) for constructing both the input and output matrices; (2) reducing the matrices by a fixed threshold of energy content; and (3) reducing the matrix size by means of the principal component analysis (PCA) method.

The training set is constituted by input-output pairs of matrices, calculated by a simulator, where the former reports the time evolution of the pollutant source at four different depths, while the latter one is the time evolution of measured pollutant concentration corresponding to the monitoring wells. Therefore, the former matrix has four rows, while the latter one has as many rows as the number of monitoring wells. The number of columns is equal to the number of time samples. Both input and output matrices have to be converted into a pattern, and this pair of patterns will represent a specific case or example. By performing a suitable number of simulations, we finally obtain an input and an output matrix having the same number of columns, each one corresponding to an example.

The feature extraction procedure is first performed on the two matrices which, respectively, describe source and monitoring wells for each example. The two-dimensional Fourier Transform (Mersereau & Dudgeon 1975) of the two matrices is calculated, switching to the frequency domain. Here, each element represents a spectral component of the signal, whose relevance in the signal is measured by its amplitude. Therefore we can define a preliminary threshold of amplitude in order to eliminate the components which are less relevant. The selected components have to be the same for all the examples, therefore all the components which overcome the threshold in at least one example will be kept. The same set of components will have to be kept for any future example. For each example, the selected components of both input and output matrices are rearranged in two column vectors, therefore at the end of this stage we have two matrices, input and output, respectively, for the entire training set, and for both, the number of columns is equal to the number of examples. The dimension of these two matrices can be further reduced, taking into account the redundancy of information due to the correlation existing among rows of the same matrix. The most efficient way to linearly eliminate such redundancy is the PCA method (Henebry & Rieck 1996). This analysis consists of rotating the reference axis in such a way that the variance of points distribution along some axis (principal) is maximum, so that the residual variance along the remaining axis can be neglected. The quantity of information which is lost with the neglected axis can be easily estimated, and in the real cases the PCA allows strong reduction of the dimension of patterns with very little loss of information. The dropped matrices are the input and output matrices that will be used to train the ANN. The training of the ANN is performed by using the Levenberg–Marquardt algorithm. The quality of the training affects the reliability of the successive inversion of the ANN, therefore all the standard expedients have to be adopted to guarantee the suitable approximation level and generalization capability of the trained ANN.

### Inversion procedure

Once the training of the MLP ANN is completed it is used to solve the inverse problem by exploiting the MLP inversion method described in Fanni & Montisci (2003) and Carcangiu *et al.* (2007). It consists of determining the input pattern which corresponds to a given output pattern, which in our case means identifying the unknown pollution source on the basis of a set of measurements acquired in the monitoring wells. As in any identification problem, it is fundamental to guarantee the uniqueness of the solution. This implies that the two systems of linear equations in (1) have to be determined or over-determined, and so the number of input neurons has to be less than or equal to the number of hidden neurons, which has to be less than or equal to the number of output neurons. Secondly, for the same reason, the first and the second connection weights matrices and have to be full rank. The parameter that can affect such features of the ANN is, generally speaking, the approximation degree of the representation of both input and output data. This means that by increasing the precision used to describe the input and output data, we could find out that the weights matrices are not full rank. For this reason a trial-and-error procedure should be adopted in order to seek the best approximation level which guarantees the invertibility of the MLP ANN.

Inverting the MLP ANN means solving system (1) for the unknowns , and _{,} while the output is fixed. If the constraints on the rank of and are fulfilled, solving system (1) is quite simple as the three unknown patterns , and can be determined one after the other univocally.

*represents the transposition operator. If the number of hidden neurons is equal to the number of output neurons, the weights matrix is squared, and the solution corresponding to the minimum sum squared error can be directly calculated as*

^{T}Also in this case, if the number of input neurons is equal to the number of hidden neurons, the system can be solved by directly inverting the matrix .

The pattern describes the unknown source of pollutant. To generate the time evolution of the source, the transforms that have been applied for the feature extraction have to be back applied to the obtained pattern .

## APPLICATION OF THE METHOD TO A CASE STUDY

The performance of the proposed methodology has been evaluated by defining the behaviour in time and space of the unknown pollution source of the Alsatian aquifer.

### General description of the Upper Rhine Graben and Alsatian aquifer

*et al.*2006).

The Alsatian aquifer surface is over 3,000 km^{2} and contains a volume of alluvium of about 250 billion m^{3}. It represents one of the largest freshwater reserves in Europe. The groundwater reservoir contains about 50 billion m^{3} of water, with an annual renewal of 1.3 billion m^{3}. This large aquifer is of vital importance since it supplies 75% of the drinking water requirements, 50% of the industrial water needs and 90% of the irrigation water needs in Alsace. The Alsatian part of the Rhine aquifer has a surface length of 160 km and a maximum width of 20 km (Hamond 1995; Bertrand *et al.* 2006).

The Alsatian aquifer is an extensive alluvial aquifer with a layered structure composed of a random superposition of different alluviums (clay, sand fine to rough, gravels, coarse etc.). This permeable alluvial has a thickness of a few metres at the Vosgean edge, and 150–200 m in the centre of the Rhine plain (Hamond 1995).

The groundwater reservoir is part of a complex hydrosystem which includes frequent exchanges between the rivers and the aquifer which vary with the seasons and it is highly exposed to contamination from rivers (Stengera & Willingerb 1998).

### History of the pollution of the aquifer by CCl_{4}

In 1970, a tanker truck containing carbon tetrachloride (CCl_{4}), property of a Dutch company, overturned in the north of Benfeld (a small town located about 35 km south of Strasbourg). According to a note from Service Géologique d'Alsace-Lorraine of 21 December 1971, about 4,000 litres of CCl_{4} were spread in the area of the accident, infiltrating into the ground or disappearing by evaporation (Hamond 1995). In 1991, the analyses carried out by Bureau de Recherche Géologique et Minières showed abnormal quantities of CCl_{4} in the supplies of drinking water (about 60 μg/L). These quantities exceeded the safe limits recommended by the World Health Organization (2 μg/L). This high level of CCl_{4} concentration has caused a serious problem in the region by contaminating the main sources of drinking water in the Alsace Region (France) (Aswed 2008).

The exact amount of chemical infiltrated in the subsoil is unknown, as is the behaviour of the pollution source through time. The assessment of such characteristics constitutes the main issue for the individuation and remediation of the accident effects and it is the objective of this study.

### ANN to study the Alsatian aquifer

The behaviour of the Alsatian aquifer unknown contaminant source is determined based on the contaminant concentration measured in the monitoring wells dislocated in the study area.

In a first phase, the ANN was trained to solve the direct problem by using a set of patterns created by means of flux and transport contaminant modelling software, where the patterns describe both the source of contaminant and the set of measurements in the monitoring wells,. The data set of the patterns was been constructed through a coherent number of hydrogeological scenarios, based on a 3D model of the domain developed by Aswed (2008). The input patterns were made of the pollution source features in terms of the injection rates in the four hydrogeological layers. The output patterns were contaminant concentration observation data at 45 monitoring wells. Sources and monitoring wells are related by a bi-univocal relationship, meaning that any specific profile of monitoring wells corresponds to one specific contaminant source behaviour.

In a second phase, the trained network was inverted in order to identify the source of contaminant corresponding to a set of data measured in correspondence with monitoring wells.

### Groundwater flow and contaminant transport numerical model

The CCl_{4} pollution of the Alsatian aquifer has been the subject of several studies (Vigouroux *et al.* 1983; Aswed 2008). The 3D flux and contaminant transport numerical model used for constructing the patterns was calibrated using measured data of CCl_{4} concentration that were collected over 12 years (1992–2004) and simulations were performed for 1970 to 2024.

The patterns have been simulated using the non-commercial numerical software Transport of RadioActiver Elements in the Subsurface (TRACES) that combines mixed-hybrid finite elements and discontinuous finite elements to solve for hydrodynamic state and mass transfer in porous media (Hoteit & Ackerer 2003). The contaminated zone is enclosed within a 3D domain of 6 km width, 20 km length, and about 110 m depth. This aquifer domain is discretized using a 3D triangular prismatic grid with 25,388 nodes and 45,460 elements. According to the estimated geometry of the cross-sections (the landfill site was divided into eight zones by soil type) the domain is discretized into ten layers. The contaminant source is located in the first four layers (the thicknesses of the layers are, respectively, 16 m, 4 m, 5 m, and 5 m from the top to the bottom). The volume of contaminated aquifer is about 230–1,300 m^{3}. The surface of contaminant infiltration is about 7–37 m^{2} (Aswed 2008).

Using TRACES, various states of pollution sources have been constructed adjusting the source characteristics in terms of injection rates over the vertical section. In total, 104 examples were constructed with the same duration of activity of about 31 years (11,520 days) and were located in the same positions in the domain (accident site in Benfeld). So for each of the 104 sources, a different evolution of the contaminant concentration through time for the four layers in the numerical domain has been considered. The total time of simulation is about 54 years (20,000 days).

### ANN pattern construction: elaboration and feature extraction

The example patterns obtained with TRACES consist of 208 matrices of contaminant concentrations:

104 matrices corresponding to the features of the pollution sources. These had dimensions [11,520 × 4]; 11,520 represents the time (days) and 4 represents the layers in the source location.

104 matrices corresponding to contaminant concentration in the monitoring wells. These had dimensions of [4,000 × 45]; 4,000 represents the time (one value for each 5 days) and 45 represents the monitoring wells in the domain.

The 104 input matrices and 104 output matrices have been reorganized to form two matrices which describe the whole training set: one for the input and one for the output.

For the first step, the 2D-FFT has been calculated for each matrix, using Matlab. For each case we obtained two vectors: one for the input that has the dimension of 46,084 elements and one for the output that has the dimension of 15,774 elements. Vectors, corresponding to the different cases, have been joined to form a matrix where each column corresponds to one example, so at the end of this step the dimensions of the two matrices are [46,084 × 104] for the input matrix and [15,774 × 104] for the output matrix, respectively.

In the second step, from each of these matrices the negligible components have been removed. A component is considered negligible if its amplitude does not pass a predetermined threshold in any of the examples. The value of the threshold is determined by seeking a crossover between the level of approximation of the acquired data and the dimension of the input and output matrices. For applying the PCA, the number of remaining values is pushed to have a number of rows less than or equal to the number of examples. As a result of this second step, the input matrix had the dimension [99 × 104] and the output matrix had the dimension [100 × 104].

In the third step, the PCA transform has been applied to further reduce the volume of data. The degree of compression that can be obtained depends on the fraction of information, evaluated as the variance of the distribution, that could be lost. Such a fraction has been determined by means of a trial-and-error procedure which provided a value of 2 × 10^{–3} for the input and 10^{–7} for the output. Consequently, the input matrix had dimension [11 × 104] and the output matrix dimension [36 × 104].

### ANN training and inversion

One 11-11-36 MLP has been trained with the Levenberg–Marquardt algorithm by presenting the pairs of input (pollution source scenarios) and target patterns (contaminant concentration for the 45 monitoring wells). Special attention has been paid to train the ANN in such a way that it is able to generalize the information contained in the training set. To this end, during the training phase, the error was evaluated even in the validation set, and the training was interrupted as soon as the error in the validation set started to rise (cross-validation stop criterion). A third set (test set) of examples has been put aside in order to check on a totally independent set the generalization capability of the trained ANN. The 104 examples have been so subdivided: 74 in the training set, 19 in the validation set and 11 in the test set.

Once the training phase was completed, the trained network was inverted to identify the contaminant source. To this end, an output pattern was generated on the basis of the contaminant concentrations measured in the 45 monitoring wells and the procedure described above, in the section ‘Inversion procedure’, was applied. In the present case, the number of hidden neurons is lower than the number of output neurons, so the system is over-determined. By applying Equation (2), the solution corresponding to the minimum sum squared error for the vector was found. The second equation in (1) states a bi-univocal relation between and , and then the constant term of the third equation in (1) was determined. Finally, by solving this equation for , the solution of the inverse problem was determined.

## RESULTS AND DISCUSSION

As can be noted, the calculated behaviour of the source satisfactorily fits the trend reported in Aswed (2008). The approximation is better in the deeper layers (layer 4) where the concentration is higher due to the deposition of the contaminant with time. The ripple and the boundary peaks do not correspond to a real behaviour of the source, but they are a consequence of the cut of frequencies performed in the feature extraction procedure. Standard signal filtering procedures, like Kalman filters, could be applied in order to improve the appearance of the signal, both filtering the ripple and automatically cutting the peaks by fixing a bound of the profile derivative. Nevertheless, this kind of post-processing of the results appears to be useless for an expert assessment.

## CONCLUSION

This work mainly focuses on an ANN-based methodology for solving the inverse problem in aquifer pollution. A crucial aspect of the procedure is represented by the training of the ANN, which has to guarantee a satisfactory approximation of the physical system, but at the same time it has to fulfil several requirements in terms of number of neurons and rank of the connection weight matrices. The trained ANN is analytically inverted to solve the inverse problem, which consists of determining the profile of the pollutant source on the basis of a set of measurements in monitoring wells. This procedure has been tested on a real case: an accident, which took place in the north-east of France in 1970, gave rise to contamination by CCl_{4} of the Alsatian aquifer. Various numerical model scenarios of the CCl_{4} contamination in the Alsatian aquifer have been generated to train the ANN to solve the direct problem of associating the pollution source characteristics with the measurements acquired in the monitoring wells. The inverse problem is solved by fixing the output pattern and analytically calculating the corresponding input. Finally, the obtained results are compared with the pollution source behaviour reconstructed by another method retrieved from the literature. The obtained results show the suitability of the method, which can provide simple and reasonable solutions to various problem in hydrogeology, especially for this kind of non-linear problem.