## Abstract

A surrogate model based groundwater optimization model was developed to solve the non-aqueous phase liquids (NAPLs) contaminated groundwater remediation optimization problem. To illustrate the impact of sampling method improvement to the surrogate model performance improvement, aiming at a nitrobenzene contaminated groundwater remediation problem, optimal Latin hypercube sampling (OLHS) method was introduced to sample data in the input variables feasible region, and a radial basis function artificial neural network was used to construct a surrogate model. Considering the surrogate model's uncertainty, a chance-constrained programming (CCP) model was constructed, and it was solved by genetic algorithm. The results showed the following, for the problem considered in this study. (1) Compared with the Latin hypercube sampling (LHS) method, the OLHS method improves the space-filling degree of sample points considerably. (2) The effects of the two sampling methods on surrogate model performance were analyzed through comparison of goodness of fit, residual and uncertainty. The results indicated that the OLHS-based surrogate model performed better than the LHS-based surrogate model. (3) The optimal remediation strategies at 99%, 95%, 90%, 85%, 80% and 50% confidence levels were obtained, which showed that the remediation cost increased with the confidence level. This work would be helpful for increasing surrogate model performance and lowering the risk of a groundwater remediation strategy.

## INTRODUCTION

The contamination of groundwater with organic contaminants such as chlorinated solvents and petroleum hydrocarbons has become a major environmental problem across the globe (Wilson 1989; Kuiper & Illangasekare 1998). Many of these organic contaminants are water-immiscible liquids that maintain separate phases in aqueous media, and in subsurface environments are generally known as non-aqueous phase liquids (NAPLs) (Mason & Kueper 1996; Fetter 1999). The conventional pump and treat technique is generally recognized to be inefficient for long term remediation of aquifers contaminated with NAPLs due to their low solubility and high interfacial tension. Surfactant enhanced aquifer remediation (SEAR), which is a chemical enhancement to the pump and treat technique, is a promising technique for removing NAPLs (Rathfelder et al. 2001; Qin et al. 2007). However, due to the high cost of the SEAR process, it is an important scientific problem to optimize and analyze the remediation strategy, which can improve the remediation efficiency and reduce the remediation cost (Luo & Lu 2014).

The simulation optimization solving process is computationally expensive due to the thousands of times calls of simulation model (Qin et al. 2007; Sreekanth & Datta 2011, 2015). Using a surrogate model to replace the simulation model is an effective way to reduce the computational burden (Zerpa et al. 2005; Kalwij & Peralta 2006; Qin et al. 2007; Sreekanth & Datta 2011). A surrogate model is an approximation model that can obtain a similar input-output relationship with the simulation model, with a lower computational cost (Jin et al. 2001; Sreekanth & Datta 2010; Yao et al. 2014). There are two main steps in the surrogate model constructing process: (1) collection of input-output data through running a sampling and simulation model; (2) construction of a surrogate model using the approximate fitting method, based on the input-output data (Jin et al. 2001; Huang et al. 2003).

Due to the fact that the surrogate model is calibrated based on a set of input-output samples, selecting a proper sampling method to collect input samples is a fundamental premise of constructing the surrogate model (Huang et al. 2003; Qin et al. 2007; He et al. 2008; Yan & Minsker 2011; Asher et al. 2015). In previous literatures, most researches used the Monte Carlo method to sample the input data (Huang et al. 2003; Qin et al. 2007; He et al. 2008; Yan & Minsker 2011). Xin et al. (2011), Luo et al. (2013) and Wu et al. (2015) attempted to use the Latin hypercube sampling (LHS) method to sample in the input variables feasible region. LHS has good space-filling properties on any single dimension, but it may have bad space-filling properties over the whole sampling space (Park 1994; Jourdan & Franco 2010). The optimal Latin hypercube sampling (OLHS) method, through the introduction of the optimization method, can find the LHS samples with the highest space filling degree (Park 1994; Kenny et al. 2000). There have been no researchers using the OLHS method in the surrogate model of a multi-phase flow simulation model.

To increase the surrogate model performance and lower the risk of the groundwater remediation strategy, OLHS method was introduced to sample the data of a multi-phase flow simulation model, and an OLHS-based surrogate model was introduced to the chance-constrained programming (CCP) model to identify the most cost effective remediation strategy at different confidence levels in this study.

Aiming at a nitrobenzene contaminated site remediation optimization problem, this paper entails the following tasks. (1) Adoption of OLHS and LHS to derive the input dataset of a multi-phase flow simulation model, and conduct the simulation model to collect the output dataset. (2) Construction of surrogate models of the multi-phase flow simulation model using a radial basis function artificial neural network (RBFANN) based on the obtained input-output datasets, and then analysis of the impact of the sampling method improving on the surrogate model performance. (3) Adoption of the OLHS-based RBFANN surrogate model in the CCP model for identifying the most cost effective remediation strategy at different confidence levels.

## MATERIALS AND METHODOLOGY

### Surrogate-based groundwater system optimization management

Surrogate-based groundwater system optimization management is a method that uses the surrogate model as the simulation model with an optimization model coupling method. In the optimization process, the surrogate model was used to replace the input-output relationship of the simulation model, and invoked as an objective function or constraint condition. Figure 1 shows the main steps of the proposed methodology.

Figure 1

Flow chart of surrogate-based groundwater system optimization management.

Figure 1

Flow chart of surrogate-based groundwater system optimization management.

### OLHS method

LHS is a stratified sampling method that was first proposed by McKay et al. (1979). Compared with the simple random sampling method, LHS guarantees equal coverage of the cumulative probability range using non-overlapping sampling intervals and sample data in each interval. Thereby an LHS has good space-filling properties on any single dimension. However, it may have bad space-filling properties over the whole sampling space when it is randomly selected (Jourdan & Franco 2010). In Figure 2, all the three sampling results are obtained with LHS; however, they have very different performance in term of space filling. Compared with Figure 2(a) and 2(b), Figure 2(c) is a better choice, where the points are more uniformly distributed in the domain.

Figure 2

Different samples generated by LHS (size n = 9 from two variables that have uniform distributions).

Figure 2

Different samples generated by LHS (size n = 9 from two variables that have uniform distributions).

OLHS is a method that introduce an optimal criterion into the LHS process, and selects the optimal Latin hypercube design on this criteria, and then sample randomly based on this Latin hypercube design (Kenny et al. 2000). Many criteria have been used to measure the space filling property, such as the max-min criterion (Liefvendahl & Stocki 2006; Jourdan & Franco 2010), the φp criterion (Morris & Mitchell 1995; Kenny et al. 2000; Grosso et al. 2009), the Audze-Eglais criterion (Huang et al. 2003; Liefvendahl & Stocki 2006), the entropy criterion (Kenny et al. 2000), and the centered L2-discrepancy (Fang et al. 2005). It is difficult to relate any of these criteria with the accuracy of the resulting surrogate models. Probably because of that, the choice of objective function for LHS optimization is not unanimous (Viana 2015). Jin et al. (2005) proposed strategies of the φp, entropy, and L2 discrepancy criteria based on the significant savings in computation time. Therefore, the centered L2-discrepancy is used, whose analytical expression is as follows (Fang et al. 2002):
(1)

where n is the number of samples, s is the number of variables, xki and xli are the ith variable's normalized value of the kth samples and ith variable's normalized value of lth samples, respectively. The matrix is defined by . is an matrix, of which each column is a randomly selected points permutation of .

### Radial basis function artificial neural network

RBFANN is a three-layered feed forward network consisting of input layer, hidden layer and output layer (Shen et al. 2011).

X is a T dimensional input vector. Thus the output of the neurons in the RBFANN hidden layer is assumed as:
(2)
where is the center associated with the neuron in the radial basis function hidden layer, , H is the number of hidden units, is the norm of , is a radial basis function (Chen et al. 1991; Baddari et al. 2009; Luo et al. 2016). Outputs of the neuron in the RBFANN output layer are linear combinations of the hidden layer neuron outputs as:
(3)
where is the connecting weights from the hidden layer neuron to the output layer, is the threshold value of the output layer neuron.

### Chance-constrained programming

Chance-constrained programming (CCP) was pioneered by Charnes & Cooper (1959) as a means of handling uncertainty by specifying a confidence level at which the stochastic constraint is intended to hold. CCP can be used to analyze the risk of violating the uncertain constraints and to recognize the trade-offs between the objective function and system risk, tolerance levels of constraints, and prescribed levels of probability, which are valuable to decision makers (Liu et al. 2015; Ouyang et al. 2017). A typical CCP model can be written as:
(4)
where is the decision variable vector, is the objective function, and imposing a condition that the constraint is satisfied with at least a probability of , and is the confidence level.

## CASE STUDY

### Site overview

To illustrate the approximate performance of the OLHS-based surrogate model to the multi-phase flow simulation model, aiming at a nitrobenzene-contaminated aquifer in northeastern China, the surrogate models based on LHS and OLHS were constructed, and the performance of these two models were compared and analyzed.

The contaminated site is located in the second terrace of a valley alluvial plain in the lower reach of the Songhua River. The contaminated site is flat, with an average altitude of 185 m. The upper part of the soil consists of an upper Pleistocene silt and silty clay with a thickness of 1–2 m, while the lower part is made up of medium sand and gravel, with a thickness of about 16 m. The main recharge sources are precipitation and groundwater seepage, while the main discharge source is groundwater seepage. Groundwater flows from northeast to southwest. The objective simulation layer is pore phreatic water in loose rock mass, the buried depth is about 4 m. The study area and initial contaminant plume are shown in Figure 3.

Figure 3

Contaminant concentration distribution at initial time of remediation stage: (a) X–Y plane (ninth layer), (b) ISO surface.

Figure 3

Contaminant concentration distribution at initial time of remediation stage: (a) X–Y plane (ninth layer), (b) ISO surface.

Based on the contaminant distribution, a SEAR process with sodium lauryl sulfate as surfactant was designed to remediate the contaminated aquifer with 10 injection and one extraction wells. The 10 injection wells were located around the contamination plume and the extraction well was located in the central zone of the plume (Figure 4). During the SEAR operation, a 5% surfactant solution (volume fraction) was injected into the injection wells. To maintain the groundwater balance, the extraction rate was set equal to the total injection rate. The optimization objective was identifying the most cost effective strategies that can ensure that more than 90% of the contaminant is removed.

Figure 4

Locations of the injection and extraction wells at the contaminated site.

Figure 4

Locations of the injection and extraction wells at the contaminated site.

### Multi-phase flow numerical simulation model

The research domain was generalized as a heterogeneous and isotropic 3-D multi-phase flow and transport model with a horizontal area of 130 × 58 m2 and a depth of 18 m. The research domain was discretized into nine layers in the z direction, each layer was discretized into 65 × 29 grid blocks, and each grid block was 2 × 2 × 2 m³. The parameters partition map is in Figure 5. First-type boundary condition was assigned at the northeast and southwest boundaries of the site, while no-flux boundaries were assigned at the northwest and southeast boundaries. The upper boundary was the phreatic surface, and the lower boundary was granite, which was assigned as a no-flow boundary. The physical and chemical parameters are in Table 1, and the initial hydrogeological parameters of the study field are in Table 2.

Table 1

Physical and chemical parameters in simulation model

ParametersValue
Water density (kg/m31,000
Nitrobenzene density (kg/m31,205
Surfactant density (kg/m31,090
Water viscosity (pa•s) 0.001
Nitrobenzene viscosity (pa•s) 0.00168
Nitrobenzene/water interfacial tension (N/m) 0.02566
Nitrobenzene solubility in water (kg/m31.9
Residual water saturation 0.24
Residual nitrobenzene saturation 0.17
ParametersValue
Water density (kg/m31,000
Nitrobenzene density (kg/m31,205
Surfactant density (kg/m31,090
Water viscosity (pa•s) 0.001
Nitrobenzene viscosity (pa•s) 0.00168
Nitrobenzene/water interfacial tension (N/m) 0.02566
Nitrobenzene solubility in water (kg/m31.9
Residual water saturation 0.24
Residual nitrobenzene saturation 0.17
Table 2

Initial hydrogeological parameters of the study field

ParametersValue
Porosity 0.25
Permeability in region I
Permeability in region II
Longitudinal dispersivity 1.00 m
Transverse dispersivity 0.30 m
ParametersValue
Porosity 0.25
Permeability in region I
Permeability in region II
Longitudinal dispersivity 1.00 m
Transverse dispersivity 0.30 m
Figure 5

Parameters partition map.

Figure 5

Parameters partition map.

A three-dimensional mathematical model is constructed to evaluate the efficiency of SEAR strategies. The basis mass conservation equation for each component can be written as (Delshad et al. 1996):
(5a)
where k is the component index, including water ( = 1), oil ( = 2) and surfactant ( = 3); l is the phase index including water ( = 1), oil ( = 2) and microemulsion ( = 3) phases, is porosity; is the overall concentration of component k (volume fraction); is the density of component k (kg/m3); is the concentration of component k in phase l (volume fraction); is the Darcy velocity of phase l (m/ s); is the saturation of phase ; is the dispersion tensor (m2/s); is the total source/sink term for component k (kg/ (m3 s)). The Darcy velocity of phase l can be calculated from the multiphase form of Darcy's law:
(5b)
where is the relative permeability of phase l; is the soil permeability (m2); is the viscosity of phase l (pa·s); is the density of phase l (kg·m−3); z is the vertical distance (m).
The initial conditions can be expressed as:
(5c)

(5d)

(5e)
where Equation (5c) is the initial pressure condition, Equation (5d) is the initial saturation condition and Equation (5e) is the initial concentration condition, is the initial pressure distribution of phase l (Pa), is the initial saturation distribution of phase l, is the initial concentration of component k in phase l (volume fraction).
The boundary conditions can be expressed as:
(5f)

(5g)

(5h)

(5i)
where Equation (5f) is the specified pressure boundary, Equation (5g) is the no flow boundary, Equation (5h) is the specified concentration boundary, Equation (5i) is the no-flux boundary, and are known functions at n is the outside normal direction at .

The University of Texas Chemical Compositional Simulator (UTCHEM) was used to solve the mathematical model. UTCHEM is a three-dimensional, multi-phase, multi-component finite difference numerical simulator (Delshad et al. 1996; Bhattarai 2006).

The simulation stage was divided into a calibration stage (365 days), a validation stage (365 days), and a remediation stage. Measured and simulated groundwater table and contaminated concentrations at 15 observation points are compared in Figures 6 and 7. Visual inspection suggests that good agreement was generally obtained during the calibration and validation stage. The aquifer parameters after calibration are in Table 3.

Table 3

Aquifer parameters obtained with calibration stage

ParametersValue
Porosity 0.30
Permeability in region I
Permeability in region II
Longitudinal dispersivity 1.00 m
Transverse dispersivity 0.30 m
ParametersValue
Porosity 0.30
Permeability in region I
Permeability in region II
Longitudinal dispersivity 1.00 m
Transverse dispersivity 0.30 m
Figure 6

Measured groundwater table vs. simulated groundwater table during (a) calibration stage and (b) validation stage.

Figure 6

Measured groundwater table vs. simulated groundwater table during (a) calibration stage and (b) validation stage.

Figure 7

Measured contaminated concentration vs. simulated contaminated concentration during (a) calibration stage and (b) validation stage.

Figure 7

Measured contaminated concentration vs. simulated contaminated concentration during (a) calibration stage and (b) validation stage.

After the calibration and validation stages, the multi-phase flow simulation model can be used for nitrobenzene concentration prediction.

### Surrogate model of multi-phase flow simulation model

It was assumed that the total rates of the extraction wells and injection wells were equal, and there was only one extraction well, so the extraction rate can be obtained with the summary of the 10 injection wells' rates. Therefore, the input variables consist of 11 elements: the rates of the 10 injection wells and the remediation duration. The output variable was the average nitrobenzene removal rate.

Considering the high cost of surfactant, the feasible region for the injection rate and remediation duration were taken as [1, 50 m³/d] and [1, 100 d], respectively. The sample number n was set at 100. For both LHS and OLHS methods, 100 input samples were collected through sampling in the 11 input variables' feasible region of the multiphase flow numerical simulation model, and the output responses (average nitrobenzene removal rate) were obtained through running the developed simulation model.

Based on the obtained input-output dataset, RBFANN was used to construct the surrogate model. In order to validate the accuracy of the surrogate model, another 40 samples were randomly selected as the validation data.

### Chance-constrained optimization model

#### Objective function and decision variables

To identify the optimal remediation strategy, a nonlinear optimization model was developed using the minimal remediation cost as the objective function, with the extraction, injection rates and remediation duration as the decision variables. The objective function can be represented as:
(6a)
where f is the total cost of the remediation system ($), the first two terms account for the installation cost and the last two terms account for the operation cost; and are injection and extraction wells' installation cost coefficients, respectively ($); and are the injection and extraction operation cost coefficients, respectively ($/m3), m and n are the injection and extraction wells' number respectively, is the rate of the injection well (m3/d), is the rate of the extraction well (m3/d), and t is the remediation duration (d). #### Constraint conditions Three groups of constraints are enforced during the optimization. • (1) Contaminant removal rate constraint: • (a) Deterministic contaminant removal rate constraint (6b) where is the average contaminant removal rate, which is an output response of the surrogate model, is the minimum allowable value of the contaminant average removal rate. • (b) Removal rate constraints with uncertainty If the statistical tests show that the normality hypothesis and the zero-mean hypothesis can be accepted, then the surrogate residuals are linked to the deterministic optimization formulation. Correspondingly, constraints (6b) are expressed as a probability of the contaminant removal rate no less than the environmental standard (He et al. 2010; Ouyang et al. 2017): (6c) where e denotes the residual caused by the surrogate model and can be considered a stochastic variable, and a is the confidence level. • (2) Remediation duration constraint: (6d) where is the maximum allowable remediation duration. • (3) Wells rates constraint: (6e) (6f) (6g) where and are maximum allowable injection rate and extraction rates of wells (m3/d). The constants of these equations are in Table 4. Table 4 Constant included in the optimization model ConstantValue ($) 5,000
($) 5,000 ($/m3
($/m30.04 (m3/d) 50 (m3/d) 500 m 10 n 100 0.90 ConstantValue ($) 5,000
($) 5,000 ($/m3
(\$/m30.04
(m3/d) 50
(m3/d) 500
m 10
n
100
0.90

## RESULTS AND DISCUSSION

### Sampling results analysis

For both the LHS and OLHS method, 100 training samples were collected in the feasible region of 11 decision variables (remediation duration and rate of 10 injection wells). The sampling results are in Table 5. In Table 5, each remediation strategy is a sample collected by LHS or OLHS, and each strategy including the rate of 10 injection wells and remediation duration. The centered L2 discrepancy is calculated as the optimal criteria. The smaller the centered L2 is, the better the samples space filling degree is. The centered L2 of LHS and OLHS are 0.2194 and 0.0017, respectively according to function (1). Therefore, the introduction of the OLHS method improves the space filling degree of samples significantly compared with the LHS method.

Table 5

Sampling results

Strategiest (d)Q (m³/d)
LHS sampling results
34.86 28.18 4.98 15.63 33.30 49.90 8.03 29.00 11.79 18.61 29.90
81.62 11.61 2.92 21.48 9.08 3.60 13.47 19.65 32.56 20.37 1.41
16.53 25.58 45.64 47.72 46.53 9.05 9.72 6.49 5.10 2.38 45.48
…… …… …… …… …… …… …… …… …… …… …… ……
98 67.65 3.53 46.35 39.38 9.46 23.38 17.24 44.87 41.31 35.21 26.24
99 94.02 5.55 26.99 37.94 44.83 6.82 35.05 24.16 2.72 14.89 32.05
100 94.90 17.98 10.68 7.91 14.51 19.89 47.97 19.09 36.83 21.29 46.22
OLHS sampling results
59.77 19.72 37.81 38.79 12.01 8.35 37.12 20.39 12.07 9.41 2.80
25.65 46.24 4.34 3.01 16.29 9.05 24.83 41.54 36.52 26.72 25.24
90.06 12.30 27.57 41.07 5.03 20.23 2.87 17.54 20.77 34.01 23.14
…… …… …… …… …… …… …… …… …… …… …… ……
98 91.10 37.80 37.71 18.75 23.23 41.39 10.48 39.49 8.19 16.46 3.21
99 74.02 32.16 24.98 35.39 39.58 39.22 31.20 2.45 29.75 2.87 47.52
100 16.30 8.81 18.44 42.02 26.66 1.76 35.63 3.47 35.75 20.84 15.91
Strategiest (d)Q (m³/d)
LHS sampling results
34.86 28.18 4.98 15.63 33.30 49.90 8.03 29.00 11.79 18.61 29.90
81.62 11.61 2.92 21.48 9.08 3.60 13.47 19.65 32.56 20.37 1.41
16.53 25.58 45.64 47.72 46.53 9.05 9.72 6.49 5.10 2.38 45.48
…… …… …… …… …… …… …… …… …… …… …… ……
98 67.65 3.53 46.35 39.38 9.46 23.38 17.24 44.87 41.31 35.21 26.24
99 94.02 5.55 26.99 37.94 44.83 6.82 35.05 24.16 2.72 14.89 32.05
100 94.90 17.98 10.68 7.91 14.51 19.89 47.97 19.09 36.83 21.29 46.22
OLHS sampling results
59.77 19.72 37.81 38.79 12.01 8.35 37.12 20.39 12.07 9.41 2.80
25.65 46.24 4.34 3.01 16.29 9.05 24.83 41.54 36.52 26.72 25.24
90.06 12.30 27.57 41.07 5.03 20.23 2.87 17.54 20.77 34.01 23.14
…… …… …… …… …… …… …… …… …… …… …… ……
98 91.10 37.80 37.71 18.75 23.23 41.39 10.48 39.49 8.19 16.46 3.21
99 74.02 32.16 24.98 35.39 39.58 39.22 31.20 2.45 29.75 2.87 47.52
100 16.30 8.81 18.44 42.02 26.66 1.76 35.63 3.47 35.75 20.84 15.91

### Surrogate model performance analysis

Gauss function was used as the radial basis function, which is the widely used radial basis function. Orthogonal least square was used for RBFANN training on the MATLAB platform; the parameters of the trained RBFANN model are in Table 6, which were obtained by the serial trial and error method.

Table 6

Parameters of RBFANN

ParametersValue
No. of input layer neurons 11
No. of hidden layer neurons 91
No. of output layer neurons
Training objective error 0.0025
ParametersValue
No. of input layer neurons 11
No. of hidden layer neurons 91
No. of output layer neurons
Training objective error 0.0025

#### (1) Goodness of fit of surrogate models

The average removal rates of the 40 validation samples obtained using the two surrogate models were compared with those obtained using the developed simulation model, and the results are shown in Figure 8. R2 was selected to assess the goodness of fit of different surrogate models to the simulation model, which is also shown in Figure 8.

Figure 8

Surrogate model results vs. simulation results in validation phase: (a) LHS-based surrogate model, (b) OLHS-based surrogate model.

Figure 8

Surrogate model results vs. simulation results in validation phase: (a) LHS-based surrogate model, (b) OLHS-based surrogate model.

Figure 8 demonstrates that both of these two surrogate model have favorable fitting results with the simulation model. Furthermore, the points obtained with the OLHS-based surrogate model were closer to the 1:1 perfect line, and the R2 of the OLHS-based surrogate model was higher than that of the LHS-based surrogate model, which demonstrated that the OLHS-based surrogate model has a better fitting result.

#### (2) Residual of surrogate models

Root mean square error (RMSE) and root mean square percentage error (RMSPE) were selected to assess the error of the different surrogate models to the simulation model. The smaller the RMSE and RMSPE are, the more accurate the surrogate model is. The RMSE of the LHS based surrogate model and OLHS based surrogate model were 0.05 and 0.04, while the RMSPE of the LHS based surrogate model and OLHS based surrogate model were 15% and 33%, which demonstrated that the OLHS based surrogate model has a better accuracy.

#### (3) Uncertainty analysis of surrogate models

Figure 9 presents the histograms of residuals generated by the surrogate model for the validation points. The figures had approximately symmetrical shapes, and the normality of the residuals could be straightforwardly assumed. The Shapiro-Wilk test and T-test were performed to verify the normality of residuals and zero-mean hypothesis of the residuals, respectively, and the test results showed that the two hypotheses can be accepted. Through the comparison of Figure 9(a) and 9(b), we can conclude that the OLHS based surrogate model has a smaller uncertainty.

Figure 9

Surrogate model residuals histograms: (a) LHS-based surrogate model, (b) OLHS-based surrogate model.

Figure 9

Surrogate model residuals histograms: (a) LHS-based surrogate model, (b) OLHS-based surrogate model.

### Optimization results analysis

To demonstrate the optimization results at different confidence levels, the confidence levels were set to 50%, 80%, 85%, 90%, 95% and 99%, respectively. Note that a stochastic formulation with a confidence level of 50% is equivalent to a deterministic formulation because (Ouyang et al. 2017). According to the uncertainty analysis results of the surrogate models the normality hypothesis and the zero-mean hypothesis can be accepted; therefore, Equation (6c) can be transformed to:
(7)
where E[·] is the expected value, is the value of the standard normal cumulative distribution corresponding to a confidence level of , and is the standard deviation of e.

The genetic algorithm was adopted to solve the developed nonlinear optimization model on the MATLAB platform. In the optimization process, the developed OLHS-based RBFANN surrogate model was invoked instead of the computational simulation model.

In the genetic algorithm searching process, selection probability, crossover probability and mutation probability are usually set between 0.7 − 1.0, 0.7 − 1.0, 0.01 − 0.05 (Simpson et al. 1994), and in this study they were set as 0.8, 0.8 and 0.05, and the initial population size and the maximum number of generations were set as 200 and 300, respectively. The optimal value of decision variables (optimal remediation strategy) at different confidence levels are in Table 7. The optimal remediation cost at different confidence levels is in Figure 10.

Table 7

Optimal value of decision variables (optimal remediation strategy) at different confidence levels

Confidence levelOptimal remediation duration (d)Optimal wells rates (m3/d)
Injection wells
Extraction well
QIn1QIn2QIn3QIn4QIn5QIn6QIn7QIn8QIn9QIn10QEx
99% 100 50 50 50 50 50 50 50 50 50 50 500
95% 92 45 43 42 42 45 40 40 45 40 48 430
90% 91 40 30 39 34 34 39 36 28 35 35 350
85% 91 35 30 32 32 34 34 36 28 35 35 331
80% 91 35 30 28 30 29 30 32 35 35 34 318
50% 89 32 14 28 36 36 30 21 19 24 248
Confidence levelOptimal remediation duration (d)Optimal wells rates (m3/d)
Injection wells
Extraction well
QIn1QIn2QIn3QIn4QIn5QIn6QIn7QIn8QIn9QIn10QEx
99% 100 50 50 50 50 50 50 50 50 50 50 500
95% 92 45 43 42 42 45 40 40 45 40 48 430
90% 91 40 30 39 34 34 39 36 28 35 35 350
85% 91 35 30 32 32 34 34 36 28 35 35 331
80% 91 35 30 28 30 29 30 32 35 35 34 318
50% 89 32 14 28 36 36 30 21 19 24 248
Figure 10

Optimal remediation cost at different confidence levels.

Figure 10

Optimal remediation cost at different confidence levels.

## CONCLUSIONS

To illustrate the impact of sampling method improvement on improving the surrogate model performance, aiming at a nitrobenzene contaminated groundwater remediation problem, the OLHS method was introduced to sample data in the input variables feasible region of a multi-phase flow numerical simulation model, and RBFANN was used to construct surrogate models. Considering the surrogate model uncertainty, CCP model was constructed, and the remediation strategies at 99%, 95%, 90%, 85%, 80% and 50% confidence levels were obtained. The main conclusions of this research are as follows:

• (1)

The centered L2-discrepancy demonstrated that compared with the LHS method, the OLHS method introduces an optimal criterion into the LHS process, which improves the space-filling degree of the sample points considerably.

• (2)

The OLHS-based surrogate model performed better than the LHS-based surrogate model in terms of the goodness of fit (R2), residual (RMSE and RMSPE) and uncertainty. That is to say, the OLHS-based surrogate model was more appropriate for use in the NAPLs contaminated aquifer remediation optimization process. Therefore, in the following optimization process, the developed OLHS-based surrogate model was invoked for assessing the remediation efficiency of different remediation strategies.

• (3)

The surrogate-modeling uncertainty could be successfully handled by CCP. The optimal remediation strategies at 99%, 95%, 90%, 85%, 80% and 50% confidence levels were obtained. For achieving a higher confidence level, the remediation cost must be increased.

## ACKNOWLEDGEMENTS

This research was supported by the National Natural Science Foundation of China (No. 41502221), the China Postdoctoral Science Foundation (No. 2015M570274) and the National Natural Science Foundation of China (No. 41372237). The authors would like to thank editors and anonymous reviewers for their insightful comments and suggestions, which improved this manuscript.

## REFERENCES

REFERENCES
Asher
,
M.
,
Croke
,
B.
,
Jakeman
,
A.
&
Peeters
,
L.
2015
A review of surrogate models and their application to groundwater modeling
.
Water Resources Research
51
(
8
),
5957
5973
.
,
K.
,
Aïfa
,
T.
,
Djarfour
,
N.
&
Ferahtia
,
J.
2009
Application of a radial basis function artificial neural network to seismic data inversion
.
Computational Geoscience
35
(
12
),
2338
2344
.
Bhattarai
,
M.
2006
A Numerical Modeling Study of Surfactant Enhanced Mobilization of Residual LNAPL Using UTCHEM
.
ProQuest
,
Ann Arbor, MI, USA
.
Charnes
,
A.
&
Cooper
,
W.
1959
Chance-constrained programming
.
Management Science
6
(
1
),
73
79
.
Chen
,
S.
,
Cowan
,
C. F. N.
&
Grant
,
P. M.
1991
Orthogonal least squares learning algorithm for radial basis function networks
.
IEEE Transactions on Neural Networks
2
(
2
),
302
309
.
,
M.
,
Pope
,
G.
&
Sepehrnoori
,
K.
1996
A compositional simulator for modeling surfactant enhanced aquifer remediation, 1 formulation
.
Journal of Contaminant Hydrology
23
(
4
),
303
327
.
Fang
,
K.-T.
,
Ma
,
C.-X.
&
Winker
,
P.
2002
Centered L2-discrepancy of random sampling and Latin hypercube design, and construction of uniform designs
.
Mathematics of Computation
71
(
237
),
275
296
.
Fang
,
K.-T.
,
Li
,
R.
&
Sudjianto
,
A.
2005
Design and Modeling for Computer Experiments
.
CRC Press
,
Boca Raton, FL, USA
.
Fetter
,
C. W.
1999
Contaminant Hydrogeology
.
Macmillan Publishing Company
,
New York, NY, USA
.
Grosso
,
A.
,
Jamali
,
A.
&
Locatelli
,
M.
2009
Finding maximin Latin hypercube designs by iterated local search heuristics
.
European Journal of Operational Research
197
(
2
),
541
547
.
Huang
,
Y.
,
Li
,
J.
,
Huang
,
G.
,
Chakma
,
A.
&
Qin
,
X.
2003
Integrated simulation-optimization approach for real-time dynamic modeling and process control of surfactant-enhanced remediation at petroleum-contaminated sites
.
Practice Periodical of Hazardous Toxic and Radioactive Waste Management
7
(
2
),
95
105
.
Jin
,
R.
,
Chen
,
W.
&
Simpson
,
T. W.
2001
Comparative studies of metamodelling techniques under multiple modelling criteria
.
Structural and Multidisciplinary Optimization
23
(
1
),
1
13
.
Jin
,
R.
,
Chen
,
W.
&
Sudjianto
,
A.
2005
An efficient algorithm for constructing optimal design of computer experiments
.
Journal of Statistical Planning & Inference
134
(
1
),
268
287
.
Jourdan
,
A.
&
Franco
,
J.
2010
Optimal Latin hypercube designs for the Kullback–Leibler criterion
.
94
(
4
),
341
351
.
Kalwij
,
I. M.
&
Peralta
,
R. C.
2006
Simulation/optimization modeling for robust pumping strategy design
.
Ground Water
44
(
4
),
574
582
.
Kenny
,
Q. Y.
,
Li
,
W.
&
Sudjianto
,
A.
2000
Algorithmic construction of optimal symmetric Latin hypercube designs
.
Journal of Statistical Planning and Inference
90
(
1
),
145
159
.
Kuiper
,
L. K.
&
Illangasekare
,
T. K.
1998
Numerical simulation of NAPL flow in the subsurface
.
Computational Geosciences
2
(
3
),
171
189
.
Liefvendahl
,
M.
&
Stocki
,
R.
2006
A study on algorithms for optimization of Latin hypercubes
.
Journal of Statistical Planning and Inference
136
(
9
),
3231
3247
.
Liu
,
X. M.
,
Huang
,
G. H.
,
Wang
,
S.
&
Fan
,
Y. R.
2015
Water resources management under uncertainty: factorial multi-stage stochastic program with chance constraints
.
Stochastic Environmental Research and Risk Assessment
30
(
3
),
945
957
.
Luo
,
J.
,
Lu
,
W.
,
Ji
,
Y.
&
Ye
,
D.
2016
A comparison of three prediction models for predicting monthly precipitation in Liaoyuan city, China
.
Water Science & Technology Water Supply
16
(
3
),
845
854
.
Mason
,
A.
&
Kueper
,
B.
1996
Numerical simulation of surfactant flooding to remove pooled DNAPL from porous media
.
Environmental Science & Technology
30
(
11
),
3205
3215
.
McKay
,
M. D.
,
Beckman
,
R. J.
&
Conover
,
W.
1979
A comparison of three methods for selecting values of input variables in the analysis of output from a computer code
.
Technometrics
21
(
2
),
239
245
.
Morris
,
M. D.
&
Mitchell
,
T. J.
1995
Exploratory designs for computational experiments
.
Journal of Statistical Planning and Inference
43
(
3
),
381
402
.
Park
,
J. S.
1994
Optimal Latin-hypercube designs for computer experiments
.
Journal of Statistical Planning and Inference
39
(
1
),
95
111
.
Qin
,
X. S.
,
Huang
,
G. H.
,
Chakma
,
A.
,
Chen
,
B.
&
Zeng
,
G. M.
2007
Simulation-based process optimization for surfactant-enhanced aquifer remediation at heterogeneous DNAPL-contaminated sites
.
Science of the Total Environment
381
(
1–3
),
17
37
.
Rathfelder
,
K. M.
,
Abriola
,
L. M.
,
Taylor
,
T. P.
&
Pennell
,
K. D.
2001
Surfactant enhanced recovery of tetrachloroethylene from a porous medium containing low permeability lenses: 2. Numerical simulation
.
Journal of Contaminant Hydrology
48
(
3
),
351
374
.
Shen
,
W.
,
Guo
,
X.
,
Wu
,
C.
&
Wu
,
D.
2011
Forecasting stock indices using radial basis function neural networks optimized by artificial fish swarm algorithm
.
Knowledge-Based Systems
24
(
3
),
378
385
.
Simpson
,
A. R.
,
Dandy
,
G. C.
&
Murphy
,
L. J.
1994
Genetic algorithms compared to other techniques for pipe optimization
.
Journal of Water Resources Planning and Management
120
(
4
),
423
443
.
Sreekanth
,
J.
&
Datta
,
B.
2015
Review: simulation-optimization models for the management and monitoring of coastal aquifers
.
Hydrogeology Journal
23
(
6
),
1155
1166
.
Viana
,
F. A. C.
2015
A tutorial on Latin hypercube design of experiments
.
Quality & Reliability Engineering International
32
(
5
),
1975
1985
.
Wilson
,
D. J.
1989
Soil clean up by in-situ surfactant flushing. I. Mathematical modeling
.
Separation Science And Technology
24
(
11
),
863
892
.
Wu
,
B.
,
Zheng
,
Y.
,
Wu
,
X.
,
Tian
,
Y.
,
Han
,
F.
,
Liu
,
J.
&
Zheng
,
C.
2015
Optimizing water resources management in large river basins with integrated surface water-groundwater modeling: a surrogate-based approach
.
Water Resources Research
51
(
4
),
2153
2173
.
Xin
,
X.
,
Lu
,
W.
,
Luo
,
J.
&
Chen
,
S.
2011
An alternative model for multiphase flow and transport simulation in DNAPLs-contaminated aquifer
.
Journal of Jilin University (Earth Science Edition)
41
(
3
),
855
860
.
Yan
,
S.
&
Minsker
,
B.
2011
Applying dynamic surrogate models in noisy genetic algorithms to optimize groundwater remediation designs
.
Journal of Water Resources Planning and Management
137
,
284
292
.
Yao
,
W.
,
Chen
,
X.
,
Huang
,
Y.
&
van Tooren
,
M.
2014
A surrogate-based optimization method with RBF neural network enhanced by linear interpolation and hybrid infill strategy
.
Optimization Methods and Software
29
(
2
),
406
429
.
Zerpa
,
L. E.
,
Queipo
,
N. V.
,
Pintos
,
S.
&
Salager
,
J.-L.
2005
An optimization methodology of alkaline–surfactant–polymer flooding processes using field scale numerical simulation and multiple surrogates
.
Journal of Petroleum Science and Engineering
47
(
3
),
197
208
.