## Abstract

A surrogate model based groundwater optimization model was developed to solve the non-aqueous phase liquids (NAPLs) contaminated groundwater remediation optimization problem. To illustrate the impact of sampling method improvement to the surrogate model performance improvement, aiming at a nitrobenzene contaminated groundwater remediation problem, optimal Latin hypercube sampling (OLHS) method was introduced to sample data in the input variables feasible region, and a radial basis function artificial neural network was used to construct a surrogate model. Considering the surrogate model's uncertainty, a chance-constrained programming (CCP) model was constructed, and it was solved by genetic algorithm. The results showed the following, for the problem considered in this study. (1) Compared with the Latin hypercube sampling (LHS) method, the OLHS method improves the space-filling degree of sample points considerably. (2) The effects of the two sampling methods on surrogate model performance were analyzed through comparison of goodness of fit, residual and uncertainty. The results indicated that the OLHS-based surrogate model performed better than the LHS-based surrogate model. (3) The optimal remediation strategies at 99%, 95%, 90%, 85%, 80% and 50% confidence levels were obtained, which showed that the remediation cost increased with the confidence level. This work would be helpful for increasing surrogate model performance and lowering the risk of a groundwater remediation strategy.

## INTRODUCTION

The contamination of groundwater with organic contaminants such as chlorinated solvents and petroleum hydrocarbons has become a major environmental problem across the globe (Wilson 1989; Kuiper & Illangasekare 1998). Many of these organic contaminants are water-immiscible liquids that maintain separate phases in aqueous media, and in subsurface environments are generally known as non-aqueous phase liquids (NAPLs) (Mason & Kueper 1996; Fetter 1999). The conventional pump and treat technique is generally recognized to be inefficient for long term remediation of aquifers contaminated with NAPLs due to their low solubility and high interfacial tension. Surfactant enhanced aquifer remediation (SEAR), which is a chemical enhancement to the pump and treat technique, is a promising technique for removing NAPLs (Rathfelder *et al.* 2001; Qin *et al.* 2007). However, due to the high cost of the SEAR process, it is an important scientific problem to optimize and analyze the remediation strategy, which can improve the remediation efficiency and reduce the remediation cost (Luo & Lu 2014).

The simulation optimization solving process is computationally expensive due to the thousands of times calls of simulation model (Qin *et al.* 2007; Sreekanth & Datta 2011, 2015). Using a surrogate model to replace the simulation model is an effective way to reduce the computational burden (Zerpa *et al.* 2005; Kalwij & Peralta 2006; Qin *et al.* 2007; Sreekanth & Datta 2011). A surrogate model is an approximation model that can obtain a similar input-output relationship with the simulation model, with a lower computational cost (Jin *et al.* 2001; Sreekanth & Datta 2010; Yao *et al.* 2014). There are two main steps in the surrogate model constructing process: (1) collection of input-output data through running a sampling and simulation model; (2) construction of a surrogate model using the approximate fitting method, based on the input-output data (Jin *et al.* 2001; Huang *et al.* 2003).

Due to the fact that the surrogate model is calibrated based on a set of input-output samples, selecting a proper sampling method to collect input samples is a fundamental premise of constructing the surrogate model (Huang *et al.* 2003; Qin *et al.* 2007; He *et al.* 2008; Yan & Minsker 2011; Asher *et al.* 2015). In previous literatures, most researches used the Monte Carlo method to sample the input data (Huang *et al.* 2003; Qin *et al.* 2007; He *et al.* 2008; Yan & Minsker 2011). Xin *et al.* (2011), Luo *et al.* (2013) and Wu *et al.* (2015) attempted to use the Latin hypercube sampling (LHS) method to sample in the input variables feasible region. LHS has good space-filling properties on any single dimension, but it may have bad space-filling properties over the whole sampling space (Park 1994; Jourdan & Franco 2010). The optimal Latin hypercube sampling (OLHS) method, through the introduction of the optimization method, can find the LHS samples with the highest space filling degree (Park 1994; Kenny *et al.* 2000). There have been no researchers using the OLHS method in the surrogate model of a multi-phase flow simulation model.

To increase the surrogate model performance and lower the risk of the groundwater remediation strategy, OLHS method was introduced to sample the data of a multi-phase flow simulation model, and an OLHS-based surrogate model was introduced to the chance-constrained programming (CCP) model to identify the most cost effective remediation strategy at different confidence levels in this study.

Aiming at a nitrobenzene contaminated site remediation optimization problem, this paper entails the following tasks. (1) Adoption of OLHS and LHS to derive the input dataset of a multi-phase flow simulation model, and conduct the simulation model to collect the output dataset. (2) Construction of surrogate models of the multi-phase flow simulation model using a radial basis function artificial neural network (RBFANN) based on the obtained input-output datasets, and then analysis of the impact of the sampling method improving on the surrogate model performance. (3) Adoption of the OLHS-based RBFANN surrogate model in the CCP model for identifying the most cost effective remediation strategy at different confidence levels.

## MATERIALS AND METHODOLOGY

### Surrogate-based groundwater system optimization management

Surrogate-based groundwater system optimization management is a method that uses the surrogate model as the simulation model with an optimization model coupling method. In the optimization process, the surrogate model was used to replace the input-output relationship of the simulation model, and invoked as an objective function or constraint condition. Figure 1 shows the main steps of the proposed methodology.

### OLHS method

LHS is a stratified sampling method that was first proposed by McKay *et al.* (1979). Compared with the simple random sampling method, LHS guarantees equal coverage of the cumulative probability range using non-overlapping sampling intervals and sample data in each interval. Thereby an LHS has good space-filling properties on any single dimension. However, it may have bad space-filling properties over the whole sampling space when it is randomly selected (Jourdan & Franco 2010). In Figure 2, all the three sampling results are obtained with LHS; however, they have very different performance in term of space filling. Compared with Figure 2(a) and 2(b), Figure 2(c) is a better choice, where the points are more uniformly distributed in the domain.

*et al.*2000). Many criteria have been used to measure the space filling property, such as the max-min criterion (Liefvendahl & Stocki 2006; Jourdan & Franco 2010), the

*φ*criterion (Morris & Mitchell 1995; Kenny

_{p}*et al.*2000; Grosso

*et al.*2009), the Audze-Eglais criterion (Huang

*et al.*2003; Liefvendahl & Stocki 2006), the entropy criterion (Kenny

*et al.*2000), and the centered

*L*

_{2}-discrepancy (Fang

*et al.*2005). It is difficult to relate any of these criteria with the accuracy of the resulting surrogate models. Probably because of that, the choice of objective function for LHS optimization is not unanimous (Viana 2015). Jin

*et al.*(2005) proposed strategies of the

*φ*, entropy, and

_{p}*L*

_{2}discrepancy criteria based on the significant savings in computation time. Therefore, the centered

*L*-discrepancy is used, whose analytical expression is as follows (Fang

_{2}*et al.*2002):

where *n* is the number of samples, *s* is the number of variables, *x _{ki}* and

*x*are the

_{li}*i*

^{th}variable's normalized value of the

*k*

^{th}sample

*s*and

*i*

^{th}variable's normalized value of

*l*

^{th}samples, respectively. The matrix is defined by . is an matrix, of which each column is a randomly selected points permutation of .

### Radial basis function artificial neural network

RBFANN is a three-layered feed forward network consisting of input layer, hidden layer and output layer (Shen *et al.* 2011).

**X**is a

*T*dimensional input vector. Thus the output of the neurons in the RBFANN hidden layer is assumed as: where is the center associated with the neuron in the radial basis function hidden layer, ,

*H*is the number of hidden units, is the norm of , is a radial basis function (Chen

*et al.*1991; Baddari

*et al.*2009; Luo

*et al.*2016). Outputs of the neuron in the RBFANN output layer are linear combinations of the hidden layer neuron outputs as: where is the connecting weights from the hidden layer neuron to the output layer, is the threshold value of the output layer neuron.

### Chance-constrained programming

*et al.*2015; Ouyang

*et al.*2017). A typical CCP model can be written as: where

**is the decision variable vector, is the objective function, and imposing a condition that the constraint is satisfied with at least a probability of , and is the confidence level.**

## CASE STUDY

### Site overview

To illustrate the approximate performance of the OLHS-based surrogate model to the multi-phase flow simulation model, aiming at a nitrobenzene-contaminated aquifer in northeastern China, the surrogate models based on LHS and OLHS were constructed, and the performance of these two models were compared and analyzed.

The contaminated site is located in the second terrace of a valley alluvial plain in the lower reach of the Songhua River. The contaminated site is flat, with an average altitude of 185 m. The upper part of the soil consists of an upper Pleistocene silt and silty clay with a thickness of 1–2 m, while the lower part is made up of medium sand and gravel, with a thickness of about 16 m. The main recharge sources are precipitation and groundwater seepage, while the main discharge source is groundwater seepage. Groundwater flows from northeast to southwest. The objective simulation layer is pore phreatic water in loose rock mass, the buried depth is about 4 m. The study area and initial contaminant plume are shown in Figure 3.

Based on the contaminant distribution, a SEAR process with sodium lauryl sulfate as surfactant was designed to remediate the contaminated aquifer with 10 injection and one extraction wells. The 10 injection wells were located around the contamination plume and the extraction well was located in the central zone of the plume (Figure 4). During the SEAR operation, a 5% surfactant solution (volume fraction) was injected into the injection wells. To maintain the groundwater balance, the extraction rate was set equal to the total injection rate. The optimization objective was identifying the most cost effective strategies that can ensure that more than 90% of the contaminant is removed.

### Multi-phase flow numerical simulation model

The research domain was generalized as a heterogeneous and isotropic 3-D multi-phase flow and transport model with a horizontal area of 130 × 58 m^{2} and a depth of 18 m. The research domain was discretized into nine layers in the *z* direction, each layer was discretized into 65 × 29 grid blocks, and each grid block was 2 × 2 × 2 m³. The parameters partition map is in Figure 5. First-type boundary condition was assigned at the northeast and southwest boundaries of the site, while no-flux boundaries were assigned at the northwest and southeast boundaries. The upper boundary was the phreatic surface, and the lower boundary was granite, which was assigned as a no-flow boundary. The physical and chemical parameters are in Table 1, and the initial hydrogeological parameters of the study field are in Table 2.

Parameters . | Value . |
---|---|

Water density (kg/m^{3}) | 1,000 |

Nitrobenzene density (kg/m^{3}) | 1,205 |

Surfactant density (kg/m^{3}) | 1,090 |

Water viscosity (pa•s) | 0.001 |

Nitrobenzene viscosity (pa•s) | 0.00168 |

Nitrobenzene/water interfacial tension (N/m) | 0.02566 |

Nitrobenzene solubility in water (kg/m^{3}) | 1.9 |

Residual water saturation | 0.24 |

Residual nitrobenzene saturation | 0.17 |

Parameters . | Value . |
---|---|

Water density (kg/m^{3}) | 1,000 |

Nitrobenzene density (kg/m^{3}) | 1,205 |

Surfactant density (kg/m^{3}) | 1,090 |

Water viscosity (pa•s) | 0.001 |

Nitrobenzene viscosity (pa•s) | 0.00168 |

Nitrobenzene/water interfacial tension (N/m) | 0.02566 |

Nitrobenzene solubility in water (kg/m^{3}) | 1.9 |

Residual water saturation | 0.24 |

Residual nitrobenzene saturation | 0.17 |

Parameters . | Value . |
---|---|

Porosity | 0.25 |

Permeability in region I | |

Permeability in region II | |

Longitudinal dispersivity | 1.00 m |

Transverse dispersivity | 0.30 m |

Parameters . | Value . |
---|---|

Porosity | 0.25 |

Permeability in region I | |

Permeability in region II | |

Longitudinal dispersivity | 1.00 m |

Transverse dispersivity | 0.30 m |

*et al.*1996): where

*k*is the component index, including water ( = 1), oil ( = 2) and surfactant ( = 3);

*l*is the phase index including water ( = 1), oil ( = 2) and microemulsion ( = 3) phases, is porosity; is the overall concentration of component

*k*(volume fraction); is the density of component

*k*(kg/m

^{3}); is the concentration of component

*k*in phase

*l*(volume fraction); is the Darcy velocity of phase

*l*(m/ s); is the saturation of phase ; is the dispersion tensor (m

^{2}/s); is the total source/sink term for component

*k*(kg/ (m

^{3}s)). The Darcy velocity of phase

*l*can be calculated from the multiphase form of Darcy's law: where is the relative permeability of phase

*l*; is the soil permeability (m

^{2}); is the viscosity of phase

*l*(pa·s); is the density of phase

*l*(kg·m

^{−3});

*z*is the vertical distance (m).

*l*(Pa), is the initial saturation distribution of phase

*l*, is the initial concentration of component

*k*in phase

*l*(volume fraction).

*n*is the outside normal direction at .

The University of Texas Chemical Compositional Simulator (UTCHEM) was used to solve the mathematical model. UTCHEM is a three-dimensional, multi-phase, multi-component finite difference numerical simulator (Delshad *et al.* 1996; Bhattarai 2006).

The simulation stage was divided into a calibration stage (365 days), a validation stage (365 days), and a remediation stage. Measured and simulated groundwater table and contaminated concentrations at 15 observation points are compared in Figures 6 and 7. Visual inspection suggests that good agreement was generally obtained during the calibration and validation stage. The aquifer parameters after calibration are in Table 3.

Parameters . | Value . |
---|---|

Porosity | 0.30 |

Permeability in region I | |

Permeability in region II | |

Longitudinal dispersivity | 1.00 m |

Transverse dispersivity | 0.30 m |

Parameters . | Value . |
---|---|

Porosity | 0.30 |

Permeability in region I | |

Permeability in region II | |

Longitudinal dispersivity | 1.00 m |

Transverse dispersivity | 0.30 m |

After the calibration and validation stages, the multi-phase flow simulation model can be used for nitrobenzene concentration prediction.

### Surrogate model of multi-phase flow simulation model

It was assumed that the total rates of the extraction wells and injection wells were equal, and there was only one extraction well, so the extraction rate can be obtained with the summary of the 10 injection wells' rates. Therefore, the input variables consist of 11 elements: the rates of the 10 injection wells and the remediation duration. The output variable was the average nitrobenzene removal rate.

Considering the high cost of surfactant, the feasible region for the injection rate and remediation duration were taken as [1, 50 m³/d] and [1, 100 d], respectively. The sample number *n* was set at 100. For both LHS and OLHS methods, 100 input samples were collected through sampling in the 11 input variables' feasible region of the multiphase flow numerical simulation model, and the output responses (average nitrobenzene removal rate) were obtained through running the developed simulation model.

Based on the obtained input-output dataset, RBFANN was used to construct the surrogate model. In order to validate the accuracy of the surrogate model, another 40 samples were randomly selected as the validation data.

### Chance- constrained optimization model

#### Objective function and decision variables

*f*is the total cost of the remediation system ($), the first two terms account for the installation cost and the last two terms account for the operation cost; and are injection and extraction wells' installation cost coefficients, respectively ($); and are the injection and extraction operation cost coefficients, respectively ($/m

^{3}),

*m*and

*n*are the injection and extraction wells' number respectively, is the rate of the injection well (m

^{3}/d), is the rate of the extraction well (m

^{3}/d), and

*t*is the remediation duration (d).

#### Constraint conditions

Three groups of constraints are enforced during the optimization.

- (1)
Contaminant removal rate constraint:

- (a)
- (b)
Removal rate constraints with uncertainty

*et al.*2010; Ouyang

*et al.*2017): where

*e*denotes the residual caused by the surrogate model and can be considered a stochastic variable, and

*a*is the confidence level.

- (2)
- (3)

The constants of these equations are in Table 4.

Constant . | Value . |
---|---|

($) | 5,000 |

($) | 5,000 |

($/m^{3}) | 4 |

($/m^{3}) | 0.04 |

(m^{3}/d) | 50 |

(m^{3}/d) | 500 |

m | 10 |

n | 1 |

100 | |

0.90 |

Constant . | Value . |
---|---|

($) | 5,000 |

($) | 5,000 |

($/m^{3}) | 4 |

($/m^{3}) | 0.04 |

(m^{3}/d) | 50 |

(m^{3}/d) | 500 |

m | 10 |

n | 1 |

100 | |

0.90 |

## RESULTS AND DISCUSSION

### Sampling results analysis

For both the LHS and OLHS method, 100 training samples were collected in the feasible region of 11 decision variables (remediation duration and rate of 10 injection wells). The sampling results are in Table 5. In Table 5, each remediation strategy is a sample collected by LHS or OLHS, and each strategy including the rate of 10 injection wells and remediation duration. The centered *L*_{2} discrepancy is calculated as the optimal criteria. The smaller the centered *L*_{2} is, the better the samples space filling degree is. The centered *L*_{2} of LHS and OLHS are 0.2194 and 0.0017, respectively according to function (1). Therefore, the introduction of the OLHS method improves the space filling degree of samples significantly compared with the LHS method.

Strategies . | t (d)
. | Q (m³/d). | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | . | . | . | . | ||

LHS sampling results | |||||||||||

1 | 34.86 | 28.18 | 4.98 | 15.63 | 33.30 | 49.90 | 8.03 | 29.00 | 11.79 | 18.61 | 29.90 |

2 | 81.62 | 11.61 | 2.92 | 21.48 | 9.08 | 3.60 | 13.47 | 19.65 | 32.56 | 20.37 | 1.41 |

3 | 16.53 | 25.58 | 45.64 | 47.72 | 46.53 | 9.05 | 9.72 | 6.49 | 5.10 | 2.38 | 45.48 |

…… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… |

98 | 67.65 | 3.53 | 46.35 | 39.38 | 9.46 | 23.38 | 17.24 | 44.87 | 41.31 | 35.21 | 26.24 |

99 | 94.02 | 5.55 | 26.99 | 37.94 | 44.83 | 6.82 | 35.05 | 24.16 | 2.72 | 14.89 | 32.05 |

100 | 94.90 | 17.98 | 10.68 | 7.91 | 14.51 | 19.89 | 47.97 | 19.09 | 36.83 | 21.29 | 46.22 |

OLHS sampling results | |||||||||||

1 | 59.77 | 19.72 | 37.81 | 38.79 | 12.01 | 8.35 | 37.12 | 20.39 | 12.07 | 9.41 | 2.80 |

2 | 25.65 | 46.24 | 4.34 | 3.01 | 16.29 | 9.05 | 24.83 | 41.54 | 36.52 | 26.72 | 25.24 |

3 | 90.06 | 12.30 | 27.57 | 41.07 | 5.03 | 20.23 | 2.87 | 17.54 | 20.77 | 34.01 | 23.14 |

…… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… |

98 | 91.10 | 37.80 | 37.71 | 18.75 | 23.23 | 41.39 | 10.48 | 39.49 | 8.19 | 16.46 | 3.21 |

99 | 74.02 | 32.16 | 24.98 | 35.39 | 39.58 | 39.22 | 31.20 | 2.45 | 29.75 | 2.87 | 47.52 |

100 | 16.30 | 8.81 | 18.44 | 42.02 | 26.66 | 1.76 | 35.63 | 3.47 | 35.75 | 20.84 | 15.91 |

Strategies . | t (d)
. | Q (m³/d). | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | . | . | . | . | ||

LHS sampling results | |||||||||||

1 | 34.86 | 28.18 | 4.98 | 15.63 | 33.30 | 49.90 | 8.03 | 29.00 | 11.79 | 18.61 | 29.90 |

2 | 81.62 | 11.61 | 2.92 | 21.48 | 9.08 | 3.60 | 13.47 | 19.65 | 32.56 | 20.37 | 1.41 |

3 | 16.53 | 25.58 | 45.64 | 47.72 | 46.53 | 9.05 | 9.72 | 6.49 | 5.10 | 2.38 | 45.48 |

…… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… |

98 | 67.65 | 3.53 | 46.35 | 39.38 | 9.46 | 23.38 | 17.24 | 44.87 | 41.31 | 35.21 | 26.24 |

99 | 94.02 | 5.55 | 26.99 | 37.94 | 44.83 | 6.82 | 35.05 | 24.16 | 2.72 | 14.89 | 32.05 |

100 | 94.90 | 17.98 | 10.68 | 7.91 | 14.51 | 19.89 | 47.97 | 19.09 | 36.83 | 21.29 | 46.22 |

OLHS sampling results | |||||||||||

1 | 59.77 | 19.72 | 37.81 | 38.79 | 12.01 | 8.35 | 37.12 | 20.39 | 12.07 | 9.41 | 2.80 |

2 | 25.65 | 46.24 | 4.34 | 3.01 | 16.29 | 9.05 | 24.83 | 41.54 | 36.52 | 26.72 | 25.24 |

3 | 90.06 | 12.30 | 27.57 | 41.07 | 5.03 | 20.23 | 2.87 | 17.54 | 20.77 | 34.01 | 23.14 |

…… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… |

98 | 91.10 | 37.80 | 37.71 | 18.75 | 23.23 | 41.39 | 10.48 | 39.49 | 8.19 | 16.46 | 3.21 |

99 | 74.02 | 32.16 | 24.98 | 35.39 | 39.58 | 39.22 | 31.20 | 2.45 | 29.75 | 2.87 | 47.52 |

100 | 16.30 | 8.81 | 18.44 | 42.02 | 26.66 | 1.76 | 35.63 | 3.47 | 35.75 | 20.84 | 15.91 |

### Surrogate model performance analysis

Gauss function was used as the radial basis function, which is the widely used radial basis function. Orthogonal least square was used for RBFANN training on the MATLAB platform; the parameters of the trained RBFANN model are in Table 6, which were obtained by the serial trial and error method.

Parameters . | Value . |
---|---|

No. of input layer neurons | 11 |

No. of hidden layer neurons | 91 |

No. of output layer neurons | 1 |

Training objective error | 0.0025 |

Parameters . | Value . |
---|---|

No. of input layer neurons | 11 |

No. of hidden layer neurons | 91 |

No. of output layer neurons | 1 |

Training objective error | 0.0025 |

#### (1) Goodness of fit of surrogate models

The average removal rates of the 40 validation samples obtained using the two surrogate models were compared with those obtained using the developed simulation model, and the results are shown in Figure 8. *R*^{2} was selected to assess the goodness of fit of different surrogate models to the simulation model, which is also shown in Figure 8.

Figure 8 demonstrates that both of these two surrogate model have favorable fitting results with the simulation model. Furthermore, the points obtained with the OLHS-based surrogate model were closer to the 1:1 perfect line, and the *R*^{2} of the OLHS-based surrogate model was higher than that of the LHS-based surrogate model, which demonstrated that the OLHS-based surrogate model has a better fitting result.

#### (2) Residual of surrogate models

Root mean square error (RMSE) and root mean square percentage error (RMSPE) were selected to assess the error of the different surrogate models to the simulation model. The smaller the RMSE and RMSPE are, the more accurate the surrogate model is. The RMSE of the LHS based surrogate model and OLHS based surrogate model were 0.05 and 0.04, while the RMSPE of the LHS based surrogate model and OLHS based surrogate model were 15% and 33%, which demonstrated that the OLHS based surrogate model has a better accuracy.

#### (3) Uncertainty analysis of surrogate models

Figure 9 presents the histograms of residuals generated by the surrogate model for the validation points. The figures had approximately symmetrical shapes, and the normality of the residuals could be straightforwardly assumed. The Shapiro-Wilk test and T-test were performed to verify the normality of residuals and zero-mean hypothesis of the residuals, respectively, and the test results showed that the two hypotheses can be accepted. Through the comparison of Figure 9(a) and 9(b), we can conclude that the OLHS based surrogate model has a smaller uncertainty.

### Optimization results analysis

*et al.*2017). According to the uncertainty analysis results of the surrogate models the normality hypothesis and the zero-mean hypothesis can be accepted; therefore, Equation (6c) can be transformed to: where E[·] is the expected value, is the value of the standard normal cumulative distribution corresponding to a confidence level of , and is the standard deviation of

*e*.

The genetic algorithm was adopted to solve the developed nonlinear optimization model on the MATLAB platform. In the optimization process, the developed OLHS-based RBFANN surrogate model was invoked instead of the computational simulation model.

In the genetic algorithm searching process, selection probability, crossover probability and mutation probability are usually set between 0.7 − 1.0, 0.7 − 1.0, 0.01 − 0.05 (Simpson *et al.* 1994), and in this study they were set as 0.8, 0.8 and 0.05, and the initial population size and the maximum number of generations were set as 200 and 300, respectively. The optimal value of decision variables (optimal remediation strategy) at different confidence levels are in Table 7. The optimal remediation cost at different confidence levels is in Figure 10.

Confidence level . | Optimal remediation duration (d) . | Optimal wells rates (m^{3}/d). | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Injection wells . | Extraction well . | |||||||||||

Q
. ^{In}_{1} | Q
. ^{In}_{2} | Q
. ^{In}_{3} | Q
. ^{In}_{4} | Q
. ^{In}_{5} | Q
. ^{In}_{6} | Q
. ^{In}_{7} | Q
. ^{In}_{8} | Q
. ^{In}_{9} | Q
. ^{In}_{10} | Q
. ^{Ex} | ||

99% | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 500 |

95% | 92 | 45 | 43 | 42 | 42 | 45 | 40 | 40 | 45 | 40 | 48 | 430 |

90% | 91 | 40 | 30 | 39 | 34 | 34 | 39 | 36 | 28 | 35 | 35 | 350 |

85% | 91 | 35 | 30 | 32 | 32 | 34 | 34 | 36 | 28 | 35 | 35 | 331 |

80% | 91 | 35 | 30 | 28 | 30 | 29 | 30 | 32 | 35 | 35 | 34 | 318 |

50% | 89 | 32 | 14 | 28 | 36 | 36 | 30 | 21 | 19 | 24 | 8 | 248 |

Confidence level . | Optimal remediation duration (d) . | Optimal wells rates (m^{3}/d). | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Injection wells . | Extraction well . | |||||||||||

Q
. ^{In}_{1} | Q
. ^{In}_{2} | Q
. ^{In}_{3} | Q
. ^{In}_{4} | Q
. ^{In}_{5} | Q
. ^{In}_{6} | Q
. ^{In}_{7} | Q
. ^{In}_{8} | Q
. ^{In}_{9} | Q
. ^{In}_{10} | Q
. ^{Ex} | ||

99% | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 500 |

95% | 92 | 45 | 43 | 42 | 42 | 45 | 40 | 40 | 45 | 40 | 48 | 430 |

90% | 91 | 40 | 30 | 39 | 34 | 34 | 39 | 36 | 28 | 35 | 35 | 350 |

85% | 91 | 35 | 30 | 32 | 32 | 34 | 34 | 36 | 28 | 35 | 35 | 331 |

80% | 91 | 35 | 30 | 28 | 30 | 29 | 30 | 32 | 35 | 35 | 34 | 318 |

50% | 89 | 32 | 14 | 28 | 36 | 36 | 30 | 21 | 19 | 24 | 8 | 248 |

## CONCLUSIONS

To illustrate the impact of sampling method improvement on improving the surrogate model performance, aiming at a nitrobenzene contaminated groundwater remediation problem, the OLHS method was introduced to sample data in the input variables feasible region of a multi-phase flow numerical simulation model, and RBFANN was used to construct surrogate models. Considering the surrogate model uncertainty, CCP model was constructed, and the remediation strategies at 99%, 95%, 90%, 85%, 80% and 50% confidence levels were obtained. The main conclusions of this research are as follows:

- (1)
The centered

*L*_{2}-discrepancy demonstrated that compared with the LHS method, the OLHS method introduces an optimal criterion into the LHS process, which improves the space-filling degree of the sample points considerably. - (2)
The OLHS-based surrogate model performed better than the LHS-based surrogate model in terms of the goodness of fit (R

^{2}), residual (RMSE and RMSPE) and uncertainty. That is to say, the OLHS-based surrogate model was more appropriate for use in the NAPLs contaminated aquifer remediation optimization process. Therefore, in the following optimization process, the developed OLHS-based surrogate model was invoked for assessing the remediation efficiency of different remediation strategies. - (3)
The surrogate-modeling uncertainty could be successfully handled by CCP. The optimal remediation strategies at 99%, 95%, 90%, 85%, 80% and 50% confidence levels were obtained. For achieving a higher confidence level, the remediation cost must be increased.

## ACKNOWLEDGEMENTS

This research was supported by the National Natural Science Foundation of China (No. 41502221), the China Postdoctoral Science Foundation (No. 2015M570274) and the National Natural Science Foundation of China (No. 41372237). The authors would like to thank editors and anonymous reviewers for their insightful comments and suggestions, which improved this manuscript.

## REFERENCES

*.*

*.*