A least squares method for identification of unknown groundwater pollution source

The identification of unknown groundwater pollution sources is one of the most important premises in groundwater pollution prevention and remediation. In this paper, an exploratory application of a least squares method to identify the unknown groundwater pollution source is conducted. Supported by a small amount of observation data and the analytical solutions of the pollutant transport model, the initial concentration, the leakage location and the pollutant mass are identified by using the least squares method in a sand tank experiment and at a gas station area. In the sand tank experiment, it is found that the fitting errors of three cross-sections are within 6%. In the gas station area, it is found that the results are nearly consistent with the site investigation information. The results indicate that the least squares method has considerable application value in the identification of groundwater pollution sources.


INTRODUCTION
Groundwater pollution can harm human health, ecosystems and the wider environment, which makes the protection of groundwater environments urgent. The investigation of groundwater pollution sources is the premise of groundwater pollution remediation. However, acquiring information on pollutants through observation is relatively costly, and the observed data are usually not adequate to determine the location of pollution sources and the range of pollution plumes (Ayvaz ). In general, groundwater pollution source identification includes the identification of the location, type, concentration and diffusion range of the pollutant (Gu et al. ), which can effectively make up for the above shortcomings. Therefore, the identification of groundwater pollution sources has become one of the most important preliminary tasks in groundwater pollution prevention, remediation and health risk assessment (Chang & Kashani ).
There are many methods to identify pollution sources in groundwater, which can be grouped into three categories: the geochemical footprint method, the stochastic method and the optimization method (Mahar & Datta ). The geochemical footprint method obtains the parameters of pollution sources by means of isotope or chemical fingerprinting. Huang et al. applied a simulation-optimization model that integrates MODFLOW and MT3DMS into a shuffled complex evolution optimization algorithm, providing an approach to source identification problems under complex conditions (Huang et al. ). Borah & Bhattacharjya showed that numerical- and ANN-based simulation-optimization models have the potential for real-world field applications (Borah & Bhattacharjya ). In addition, the Bayesian global optimization algorithm (Pirot et al. ) and the alternative model method (Zhao et al. ; Xia et al. ) have also been applied to contaminant source localization.
At present, few studies have addressed the identification of groundwater pollution sources by the least squares method.
The least squares method is a mathematical optimization technique. It is widely used to estimate the numerical values of parameters by fitting a function to a set of measured data (See et al. ). In this paper, the least squares method is applied to a laboratory experiment and a field site to evaluate the identification results, and its limitations in the identification of groundwater pollution sources are discussed.

Least squares method in groundwater pollution sources identification
During the migration of pollutants in porous media, physical processes such as convection, dispersion and adsorption, chemical processes such as oxidation-reduction, ion exchange, dissolution and sedimentation, and biodegradation often occur. These processes are also influenced by hydrogeological conditions such as formation lithology, aquifer thickness and saturation. Therefore, the concentration distribution of pollutants in aquifers is generally considered to follow a multiple nonlinear mathematical model, and identifying groundwater pollution sources with the least squares method can be regarded as a multiple nonlinear regression process.
Given a small amount of measured data and the hydrogeological conditions of the site, the identification of groundwater pollution sources is a process that uses mathematical methods to invert the relevant information of the pollution sources. This information usually includes the number of sources, their location and intensity, the pollution events and the spatiotemporal distribution of pollutants.
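As a hedged illustration of this regression view, the least squares estimate can be written as follows. The notation is an assumption of this sketch rather than the paper's own: θ collects the unknown source parameters, C_i^obs is the i-th measured concentration at location x_i and time t_i, and C(x_i, t_i; θ) is the concentration predicted by the transport model.

```latex
\hat{\theta} = \arg\min_{\theta} S(\theta), \qquad
S(\theta) = \sum_{i=1}^{N} \left[ C_i^{\mathrm{obs}} - C(x_i, t_i; \theta) \right]^2
```

Because C depends nonlinearly on parameters such as the release location and time, S(θ) generally has to be minimized numerically.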
In a certain pollution situation, there is a batch of pollutant concentration monitoring data with time change on the section with the distance of X from the pollution source

Groundwater flow and transport simulation
Darcy's Law is the basic law describing groundwater flow in saturated porous media. It can be written as Equation (2), where V_i is a vector of the average linear velocity of groundwater flow (LT⁻¹), n_e is the effective porosity (dimensionless), K_ij is the hydraulic conductivity tensor of the porous media (LT⁻¹), h is the hydraulic head (L) and x_i are the Cartesian coordinates (Yeh et al. ).
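With the symbols defined above, Darcy's Law for the average linear velocity is commonly written in the following standard form. It is given here as a sketch consistent with those definitions, assuming summation over the repeated index j:

```latex
V_i = -\frac{K_{ij}}{n_e} \, \frac{\partial h}{\partial x_j}
```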
Under three assumptions (the aquifer is saturated and homogeneous, the groundwater flow conforms to Darcy's Law, and the solute is chemically conservative), solute transport in groundwater can be described by the advection-dispersion equation (ADE), which can be written as Equation (3), where C is the concentration of solute (ML⁻³), D_ii is the dispersion coefficient along each axis (L²T⁻¹) and u_i is the Darcy velocity (LT⁻¹).
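For a conservative solute without source or sink terms, the ADE is usually written in the following textbook form. This is a sketch using the symbols C, D_ii and u_i as defined above, with summation over the repeated index i assumed, and it may differ in detail from the form used in the paper:

```latex
\frac{\partial C}{\partial t}
  = \frac{\partial}{\partial x_i}\left( D_{ii} \frac{\partial C}{\partial x_i} \right)
  - \frac{\partial \left( u_i C \right)}{\partial x_i}
```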

STUDY AREA AND DATA
Case 1: sand tank experiment
In this paper, the least squares method is used to identify groundwater pollution sources based on the total petroleum hydrocarbon (TPH) migration test data of indoor sand tanks. Figure  sample is detected by an infrared spectrophotometer at a certain time interval.
Case 2: gas station area
There was a gas station with an area of about 8,700 m² in the study area, which was abandoned in the 1970s. Owing to the vulnerable structure and years of redox processes, the storage tanks (six diesel tanks and four gasoline tanks) corroded, and the leakage has polluted the aquifer. Figure  yield. In addition, bedrock was found at a depth of 40 m.
In the gas station area, toluene in the phreatic aquifer is taken as the characteristic pollutant. Toluene pollution comes mainly from gasoline, while the contribution from diesel is very small. From July 2017 to September 2018, three wells in the study area were monitored every 2 months.
The release history of toluene (methylbenzene) at well 3# is shown in Figure 3.

Variation of components in the sand tank experiment
To investigate the behavior of TPH in the aquifer, the TPH concentration, pH and the concentrations of conventional ions including NO₃⁻ and SO₄²⁻ were monitored. Because the main groundwater flow direction runs along the line from B2 to B5, the analysis focuses on the data from this cross-section. The profiles of pH, NO₃⁻ and SO₄²⁻ can be seen in Figure 4, while the TPH concentration profile in the sand tank experiment can be seen in Figure 6.
The pH at B5 first decreased and then increased. The decrease occurs because microbial respiration produces a certain amount of CO₂, which is slightly soluble in water and forms H₂CO₃. Moreover, the degradation of petroleum pollutants is itself an acid-producing process, as can be seen in Equation (4).
When the pollutants had decreased to a certain degree, microbial degradation abated and water-rock interactions led to an increase in pH (Qian et al. ). In addition,  due to the low concentration of petroleum pollutants, the microbial degradation is weak, resulting in a minor variation of pH. NO₃⁻ tends to decrease first and then increase. Since NO₂⁻ was detected in some samples, it can be concluded that the decrease of NO₃⁻ is due to microbial denitrification (Equation (5)), in which NO₃⁻ is consumed as an electron acceptor. Generally, electron acceptors are consumed in the order DO > NO₃⁻ > Fe³⁺ > SO₄²⁻, so the small variation of SO₄²⁻ indicates that NO₃⁻ and Fe³⁺ have not been exhausted. Therefore, the above results show that the petroleum pollutants in the sand tank and at the gas station are affected by only weak microbial degradation. These results provide a reliable basis for applying the least squares identification method to the sand tank experiment and the gas station area.

Mathematical model
Based on the hydrogeological conditions of the contaminated site and the pollution history, the gas station has been abandoned since the 1970s, so it is concluded that the oil pollution source has stopped leaking. The reason why pollutants can still be detected in groundwater at present is that the terrain of the site is flat, the migration

The analytical solution of the TPH concentration at time t and distance x can be written as follows:

Hydrogeological parameter values

for the section at 1.05 m, the calculated leakage position X_0 is 1.0565 m, the relative error is 4.40%, the pollutant mass M is 0.6264 kg, the relative error is 5.05%, the initial leakage time T_0 is 0.1064 d and the relative error is 6.40%.
It seems that better identification results are obtained when the section is farther from the pollution source, which may be attributed to fluctuations in the data while the groundwater flow is still transient at the very beginning of pollutant injection. Overall, the fitting errors of each section are within 6%, which is reasonably minor for the distance, initial leak time and mass. Therefore, it can be concluded that the least squares method performs well in identifying the initial leakage location, the mass of pollutants and the time of initial leakage of groundwater pollution sources.
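The fitting procedure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the transport model is the common textbook solution for an instantaneous release in one-dimensional uniform flow, the function names `unit_response` and `fit_source` are hypothetical, the parameter values and observations are synthetic, and a simple grid search with a closed-form mass estimate stands in for whichever solver was actually used.

```python
import math


def unit_response(x, t, x0, t0, u, D, n_e, area):
    """Concentration at (x, t) per unit mass released instantaneously at (x0, t0).

    Assumed model: 1-D advection-dispersion with uniform velocity u and
    dispersion coefficient D (a textbook solution, not necessarily the
    exact form used in the paper).
    """
    tau = t - t0                      # elapsed time since the release
    if tau <= 0:
        return 0.0                    # nothing has been released yet
    shift = (x - x0) - u * tau        # distance from the plume centre
    spread = 4.0 * D * tau
    return math.exp(-shift ** 2 / spread) / (n_e * area * math.sqrt(math.pi * spread))


def fit_source(obs, x0_grid, t0_grid, u, D, n_e, area):
    """Grid-search least squares fit of (x0, t0, mass).

    obs is a list of (x, t, concentration) tuples. For each candidate
    (x0, t0) the mass enters the model linearly, so its least squares
    value has a closed form. Returns (sse, x0, t0, mass) for the best fit.
    """
    best = None
    for x0 in x0_grid:
        for t0 in t0_grid:
            g = [unit_response(x, t, x0, t0, u, D, n_e, area) for x, t, _ in obs]
            gg = sum(v * v for v in g)
            if gg == 0.0:
                continue              # candidate predicts zero everywhere
            mass = sum(c * v for (_, _, c), v in zip(obs, g)) / gg
            sse = sum((c - mass * v) ** 2 for (_, _, c), v in zip(obs, g))
            if best is None or sse < best[0]:
                best = (sse, x0, t0, mass)
    return best


# Hypothetical demonstration with synthetic, noise-free observations.
params = dict(u=0.5, D=0.01, n_e=0.3, area=0.1)      # m/d, m^2/d, -, m^2
true_x0, true_t0, true_mass = 0.0, 0.1, 0.6          # m, d, kg
obs = [(x, t, true_mass * unit_response(x, t, true_x0, true_t0, **params))
       for x in (0.55, 1.05) for t in (1.0, 1.5, 2.0, 2.5, 3.0)]
x0_grid = [i * 0.05 - 0.2 for i in range(9)]         # -0.2 ... 0.2 m
t0_grid = [i * 0.05 for i in range(7)]               # 0 ... 0.3 d
sse, x0, t0, mass = fit_source(obs, x0_grid, t0_grid, **params)
print(f"x0={x0:.2f} m, t0={t0:.2f} d, mass={mass:.3f} kg, SSE={sse:.2e}")
```

Solving for the mass in closed form leaves only the release location and time to be searched, which keeps the sketch dependency-free. With real, noisy data a gradient-based nonlinear least squares solver and a finer search would normally be preferable.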

Case 2: gas station area
The pollution source parameters M, T_0 and X_0 calculated by the least squares method are 11.0 kg, 4,282 d and 35.5 m, respectively. Figure 7 shows the least squares fitting curve in the gas station area. The initial concentration of all sections falls evenly on both sides of the fitted concentration

in China used to be made of steel, and the standard specification thickness of the steel is usually 6 mm. Owing to long-term contact with air and water, the steel thins through corrosion, which eventually leads to the leakage of oil pollutants to varying degrees. In this paper, the corrosion rate is assumed to be 0.4 mm/year, i.e. it takes 15 years for the oil tank to be damaged.
This assumption is also in line with the actual condition.
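The 15-year figure follows directly from the stated wall thickness and the assumed corrosion rate. The symbols d (wall thickness), r (corrosion rate) and t_damage are introduced here only for this worked step, which also assumes a constant corrosion rate through the full wall:

```latex
t_{\mathrm{damage}} = \frac{d}{r}
  = \frac{6\ \mathrm{mm}}{0.4\ \mathrm{mm/year}}
  = 15\ \mathrm{years}
```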

Application conditions
From the solution steps, it can be seen that three conditions must be met to apply the least squares method to the identification of groundwater pollution sources: (1) the migration model of pollutants in the groundwater flow field is obtained accurately. When determining the functional form of pollutant migration, the mathematical model must be chosen in combination with the hydrogeological conditions of the site, so that the

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.