Abstract

The computationally expensive variable density and salt transport numerical models hinder the implementation of simulation-optimization routines for coastal aquifer management. To reduce the computational cost, surrogate models have been utilized in pumping optimization of coastal aquifers. However, it has not been previously addressed whether surrogate modelling is effective given a limited number of numerical simulations with the seawater intrusion model. To that end, two surrogate-based optimization (SBO) frameworks are employed and compared against the direct optimization approach, under restricted computational budgets. The first, a surrogate-assisted algorithm, employs a strategy which aims at a fast local improvement of the surrogate model around optimal values. The other, balances global and local improvement of the surrogate model and is applied for the first time in coastal aquifer management. The performance of the algorithms is investigated for optimization problems of moderate and large dimensionalities. The statistical analysis indicates that for the specified computational budgets, the sample means of the SBO methods are statistically significantly better than those of the direct optimization. Additionally, the selection of cubic radial basis functions as surrogate models, enables the construction of very fast approximations for problems with up to 40 decision variables and 40 constraint functions.

INTRODUCTION

Variable density and salt transport (VDST) numerical models are indispensable tools for simulating seawater intrusion (SWI) in coastal aquifers (Werner et al. 2013). They have been effectively employed to improve understanding in real-world SWI problems (e.g., Gingerich & Voss 2005; Giambastiani et al. 2007; Kopsiaftis et al. 2009; Kerrou et al. 2013). Additionally, the simulation of dispersive flow between seawater and freshwater by using VDST models, enables a more accurate management of groundwater abstraction in coastal aquifers (Pool & Carrera 2011).

However, VDST models are computationally expensive, as is the case with most of the high-fidelity computer simulations. Hence, their use in iterative numerical tasks, such as sensitivity analysis or optimization, is hindered by the increased computational cost. To address this issue, several studies have employed data-driven surrogate modelling techniques either to partly or fully replace the computationally expensive VDST simulations (Sreekanth & Datta 2015). Examples of surrogate models in coastal aquifer management comprise artificial neural networks (e.g., Rao et al. 2004; Bhattacharjya & Datta 2005; Kourakos & Mantoglou 2009; Ataie-Ashtiani et al. 2014; Roy et al. 2016), genetic programming (Sreekanth & Datta 2011), evolutionary polynomial regression (Hussain et al. 2015), polynomial chaos expansions (Rajabi et al. 2015), radial basis functions (RBFs) (Christelis & Mantoglou 2016a) and fuzzy inference systems (Roy & Datta 2017).

Typically, an initial set of input–output data from the physics-based models is used to train the surrogate models in order to attain a certain level of accuracy for predicting responses to unseen data (Solomatine & Ostfeld 2008). It is unlikely though that a global accurate surrogate model can be constructed, given that the number of available simulations with the original model is usually limited due to computational restrictions (Forrester et al. 2008). In certain coastal aquifer management studies, hundreds to thousands input–output patterns were used to construct an accurate surrogate model (Sreekanth & Datta 2015). The use of large training patterns may lead to impractical computational cost, even for a VDST model with simulation runtimes of a few minutes.

Most coastal aquifer management studies have applied surrogate-based optimization (SBO) methods without pre-specified restrictions on the overall computational budget. The use of adaptive surrogate training frameworks has significantly reduced the associated computational burden (e.g., Kourakos & Mantoglou 2009; Papadopoulou et al. 2010; Christelis & Mantoglou 2016a). Alternatively, Ataie-Ashtiani et al. (2014) proposed a zonation methodology as a practical approach to reduce the dimensionality of the optimization problem and, therefore, the required training data for building the surrogate models. It is also worth noting that pumping optimization problems of coastal aquifers usually involve non-linear constraints (Mantoglou et al. 2004). The presence of non-linear constraints further complicates the development of SBO methods (Forrester et al. 2008).

Nevertheless, many engineering optimization studies have focused on approximating the global optimum based on a specified number of simulations with the original expensive computer model. There is a wide body of SBO literature which develops adaptive sampling strategies that effectively utilize the expensive original model simulations, to update the surrogate and increase its accuracy within regions of interest (e.g., Jones et al. 1998; Mugunthan et al. 2005; Regis & Shoemaker 2005, 2007, 2013a, 2013b; Forrester & Keane 2009; Parr et al. 2012; Regis 2014; Tsoukalas et al. 2016). However, the application of comprehensive SBO strategies which exploit information from the surrogate models in order to sample the expensive original model is rather limited in groundwater modelling and optimization (Asher et al. 2015). Furthermore, it is debatable whether there is a benefit from the use of surrogate models in optimization problems of increased dimensionality and under limited computational budgets (Razavi et al. 2012a).

In the present paper, we address the effectiveness of surrogate modelling in pumping optimization of coastal aquifers, given a limited number of available simulations with the expensive SWI model. Two SBO frameworks are employed in order to solve single-objective pumping optimization problems. The first SBO algorithm utilizes a metamodel-embedded evolution framework which constructs RBF surrogate models for the constraints functions only. RBF surrogate models have been successfully applied in several SBO problems (Razavi et al. 2012b). The other is an advanced SBO algorithm, namely, ConstrLMSRBF (Regis 2011), which simultaneously deals with the objective function and the constraints of the optimization problem, by constructing RBF surrogate models for each one of them.

Our main contribution is the application and comparison of SBO methods on coastal aquifer management optimization problems under a very limited computational budget. Moreover, while the ConstrLMSRBF algorithm has been successfully applied on many synthetic test problems and on a large-scale optimization problem in the auto industry, this algorithm is applied for the first time in water resources optimization and for problems of pumping optimization of coastal aquifers. The goal is to investigate the performance of SBO algorithms for different dimensionalities of the decision variable space while imposing strong restrictions on the number of available simulations with the VDST model. The latter assumption is closer to real-world cases where coastal aquifer management problems involve computationally heavy numerical models of SWI. The SBO algorithms are compared against direct optimization with the VDST model in order to evaluate the usefulness of constructing surrogate models in the case of limited computational budgets. We emphasize that our goal is not to run the algorithms until they find the global optimum since this is not realistic in the computationally expensive setting, and more so when the problems are high-dimensional. Instead, we wish to determine and compare how much progress the algorithms can achieve when the number of VDST simulations are very limited.

The rest of the paper includes four sections. The next section presents the SWI numerical model, the coastal aquifer model and the formulation of the pumping optimization problem. The description of the surrogate models and their implementation in SBO strategies follows in the next section. Then the optimization results are presented and the final section concludes with the findings of the present study.

METHODS

SWI modelling

VDST models utilize numerical codes which solve a coupled system of partial differential equations of flow and transport in order to simulate SWI (Voss & Souza 1987). It is considered a complicated and computationally expensive numerical task mostly due to the spatial and time discretization requirements of the solute transport step (Werner et al. 2013). In the present paper, the HydroGeoSphere code (HGS) (Therrien & Sudicky 1996; Graf & Therrien 2005; Therrien et al. 2006) was used to simulate SWI. The HGS code applies the control volume finite element method with adaptive time-stepping while a Picard iteration scheme is utilized to iteratively solve the system of flow and transport equations for VDST simulations (Thompson et al. 2007). The mathematical formulation of VDST modelling is briefly described below whereas comprehensive presentations can be found elsewhere (e.g., Kolditz et al. 1998).  
formula
(1)
 
formula
(2)
In the flow Equation (1) the equivalent freshwater head is the flow variable given by , where is the fluid pressure, is the reference fluid density, is the gravity acceleration constant and is the elevation above horizontal datum. The indices represent the unit vectors in x and y directions, respectively, while represents the direction of flow and it equals 1 in the vertical direction and 0 for the horizontal directions. In transport Equation (2), the dimensionless relative concentration, is the transport variable which varies between 0 and 1. It is linearly related to fluid density through , under the assumption that the solute concentration of a fluid is when . The term represents the dimensionless relative density . are the coefficients of freshwater hydraulic conductivity tensor, are the coefficients of the dispersion tensor, is porosity, is time, is a volumetric fluid source/sink term per unit aquifer volume, is a solute mass source/sink term and is the specific storage. The Darcy flux term is expressed for freshwater properties as:  
formula
(3)

Coastal aquifer application model

The numerical SWI simulations are based on a coastal aquifer model of rectangular shape (Figure 1) which is an approximation of a real aquifer at the Greek island of Kalymnos (Mantoglou et al. 2004).

Figure 1

Coastal aquifer model with applied boundary conditions and the locations of fully penetrating pumping wells ().

Figure 1

Coastal aquifer model with applied boundary conditions and the locations of fully penetrating pumping wells ().

The horizontal dimensions of the coastal aquifer model are x = 7,000 m, y = 3,000 m and the aquifer base is at z = −25 m below sea-level. On the west side of the aquifer model, a hydrostatic boundary condition is applied along with a specified salinity concentration of 35 kg/m3 for a saltwater density of 1,025 kg/m3. The aquifer is replenished by both recharge and inland fluxes. The two lateral boundaries are no-flow boundaries while fully penetrating pumping wells extract groundwater from the coastal aquifer. A homogeneous and anisotropic coastal aquifer is assumed where the values of hydraulic conductivity are Kx=Ky =100 m/day and Kz = 10 m/day. In the absence of field data and due to the exploratory features of this study, the longitudinal dispersivity value was set to 100 m and the transverse dispersivity value to 10 m. Based on the above dispersivity values the spatial discretization was set to Δx = Δy = 100 m and Δz = 5 m. Note that for all the optimization problems described in the following sections, multiple independent optimization runs are performed in order to produce a statistical output considering the stochastic nature of the algorithms. In that sense, a relatively fast VDST model is required to perform such a demanding computational task for generic comparison purposes. A single VDST simulation required an approximate CPU time of 30 seconds, running on a 2.53 GHz Intel i5 processor with 6 GB of RAM in a 64-bit Windows 7 system.

Formulation of the pumping optimization problem

The pumping optimization problem of the present work lies in the category of non-linear constrained optimization problems described as follows:  
formula
(4)
where represent the objective function and inequality constraint functions, respectively. The vector takes values in the dimensional continuous space . A real vector is sought so that , subject to the constraints defined in Equation (4). It is assumed that the derivatives of are not available while the bound constraints define the search space of the optimization problem. The corresponding single-objective pumping optimization problem can be mathematically described as follows:  
formula
(5)
where is the individual pumping rate of each pumping well and is the horizontal distance of the iso-salinity from the coast, as a function of pumping rates from each pumping well. The variable refers to the pumping well location, while and define the lower and upper limits which pumping rates can take. Note that the dimension of the search space for this optimization problem is the same as the number of inequality constraints, which is given by M. The goal is to maximize (the reason for the negative sign in the objective function) the total groundwater extraction, subject to constraints which maintain the salinity levels in pumped groundwater at the specified limit of . Figure 2 illustrates a plan view of the simulated iso-salinity contours at the aquifer base, for a feasible vector Q of pumping rates.
Figure 2

Plan view of the coastal aquifer model showing simulated iso-salinity contours for a feasible vector of pumping rates ().

Figure 2

Plan view of the coastal aquifer model showing simulated iso-salinity contours for a feasible vector of pumping rates ().

The optimization problem in Equation (5) can be translated to a bound-constrained optimization problem using penalty terms in the objective function. Thus, the objective function value is penalized every time that a constraint of the problem is violated. In this study, we have applied the following objective function penalty formulation:  
formula
(6)
where represents the number of pumping wells that the constraint is violated. The above formulation aims to attribute a separate score for each violated constraint while it involves the magnitude of violation through the squared difference between and . The penalized objective function is also multiplied by to incorporate the number of constraint violations for a non-feasible vector Q. In the above formulation, the objective function is equal to the penalty function when the point is infeasible. In the literature, it is more common to add the penalty function to the original objective function. However, in the computationally expensive setting, a practitioner would prefer to obtain a feasible point first, assuming none was initially available, given the limited number of simulations, and the above formulation facilitates this process. In contrast, when using the combined objective function plus penalty approach, one could get a good combined value without necessarily ever getting a feasible point. By using our approach, we focus on minimizing the penalty (i.e., getting a feasible point) first when a feasible point is not provided at the start. One potential drawback of this approach though is that the resulting objective function is discontinuous along the boundary of the feasible region and this could be a problem for traditional optimization methods. Fortunately, this is not an issue for the evolutionary algorithms used in this study, and besides, the ConstrLMSRBF algorithm uses the original optimization formulation in Equation (5).

The pumping optimization problem defined above can be directly solved using the VDST model combined with a proper optimization algorithm. In pumping optimization of coastal aquifers, evolutionary algorithms tend to perform better than conventional gradient-based algorithms which might get trapped in local minima (Ketabchi & Ataie-Ashtiani 2015). However, evolutionary algorithms require a large number of function evaluations to converge and their performance may vary depending on the application (Mantoglou & Papantoniou 2008; Karpouzos & Katsifarakis 2013; Ketabchi & Ataie-Ashtiani 2015). Therefore, the direct solution of pumping optimization problems using VDST models and evolutionary algorithms may result in excessive computational burden.

In this study, a heuristic optimization method, namely, the evolutionary annealing-simplex (EAS) algorithm (Efstratiadis & Koutsoyiannis 2002), is utilized to solve the penalized formulation of the optimization problem defined in Equation (6). EAS algorithm employs the concepts of evolutionary search, the downhill simplex scheme and simulated annealing (Rozos et al. 2004). It has demonstrated a robust performance for various pumping optimization problems of coastal aquifers (Kourakos & Mantoglou 2009; Christelis & Mantoglou 2016a, 2016b). Thereinafter, the direct optimization approach with the SWI model will be referred to as VDST-EAS.

The surrogate model

The VDST-EAS approach may considerably increase the required computational effort to get an optimal solution. In some cases, the VDST simulations can be very expensive so that only a small number of them can be utilized to estimate a feasible solution in reasonable computational times (e.g., Christelis & Mantoglou 2016b). In this section, surrogate models are proposed as an alternative method for attaining an improved optimal solution based on a specified number of simulations with the VDST model.

In the pumping optimization problem described in Equation (5), the objective function is just a linear function of the decision variables which are the pumping rates, while the constraint functions are computationally expensive to evaluate. There are a variety of surrogate modelling techniques that can be used to approximate the constraints, including kriging, RBF and support vector machines (SVM). This paper employs a cubic RBF model augmented with a linear polynomial tail, in order to build a surrogate model for each of the M inequality constraint functions . This type of surrogate was chosen because of its prior success when used with some SBO algorithms for constrained black-box optimization (e.g., Regis 2011, 2014).

For convenience, denote the decision vector of pumping rates by and the objective function by , and rewrite each inequality constraint function in the form , where . Now, given the vectors (which are simply referred to as points) where the constraint functions have been evaluated (i.e., so that the values are known for all ), this paper uses an RBF model of the form (Powell 1992):  
formula
(7)
for each of the M inequality constraints. Here, (the cubic form), are coefficients to be determined, and is a linear polynomial whose coefficients also need to be determined. Training the above RBF surrogate model for a constraint function means obtaining suitable values for the coefficients of the RBF part and the polynomial part so that the error between the constraint function and the RBF model at the training points , is minimized. For the particular RBF model and training method used in this paper, the training error will always be zero, which means that the resulting RBF model passes through all the data points, that is, the surrogate model is an exact emulator. To obtain the coefficients in the above cubic RBF model for the constraint function , define the matrix where ) and the matrix whose row is. Moreover, define the vector . Now, the vector of coefficients for the RBF part and the coefficients for the polynomial part are obtained by solving the following system of linear equations:  
formula
(8)
Under some simple conditions on the training points, namely, that the matrix P has full column rank, the interpolation matrix in the above system is guaranteed to be invertible (Powell 1992). Since the above system can be solved quickly and efficiently, even when M is large, the training time for the cubic RBF model is negligible in comparison to the simulation time needed to generate the constraint function values. In all, the computational benefits from the negligible training time of cubic RBF models and their exact interpolation characteristics appear attractive for deterministic pumping optimization problems of large dimensionalities.

SBO optimization using a prediction-based infill strategy

Adaptive SBO methods have been successfully applied in problems of pumping optimization of coastal aquifers. Those approaches managed to reduce the number of input–output patterns required from the surrogate model to provide reasonable approximations of the VDST model during optimization (Sreekanth & Datta 2015). Recently, Christelis & Mantoglou (2016a) applied cubic RBF surrogate models for a pumping optimization problem of coastal aquifers, which involved 10 pumping wells and 10 corresponding constraint functions for each pumping well. In their work, an online training scheme of the RBF models was embedded within the EAS algorithm. Their approach was to add infill points to the initial sampling plan by using the current best solutions found by the RBF model during the optimization operations. This infill strategy favours a fast improvement of the RBF model at the region of the current optimum (local exploitation). However, it neglects the global improvement of the surrogate model and might fail to identify the region of the global optimum (Forrester et al. 2008). In that study, the above approach reduced by 96% the corresponding computational time with the VDST-EAS approach while it successfully located the region of the global optimum. We apply the same method here, in order to test its performance as a basic SBO strategy and evaluate its performance for problems of larger dimensions and under limited computational budgets. The steps of the method, denoted hereinafter as RBF-EAS, are briefly presented below since the details have been presented in Christelis & Mantoglou (2016a):

  • 1.

    Use a Latin hypercube sampling method to produce the initial population for the EAS algorithm and evaluate the VDST model at these points.

  • 2.

    Store the initial sampling plan of the evaluation points , along with the responses of the VDST model for the constraint functions and train the RBF surrogate models.

  • 3.

    Run EAS algorithm based on the RBF models and if a new optimum is found, use the VDST model to evaluate the current best solution Q. Add the new input–output data to the initial sampling plan, and re-train the RBF models.

  • 4.

    Is the computational budget exhausted? If yes, return final solution, else go to step 3.

The ConstrLMSRBF algorithm

The ConstrLMSRBF algorithm (Regis 2011) is an SBO algorithm for constrained black-box optimization that uses the RBF interpolation model described previously, to approximate the black-box objective and inequality constraint functions. In the case of the pumping optimization problem in Equation (5), only the constraint functions are computationally expensive to evaluate. However, in the standard implementation of ConstrLMSRBF, the algorithm also maintains an RBF surrogate model for the objective function. In this case though, since the objective function is linear in the decision variables (the pumping rates), one can mathematically prove that the resulting surrogate will also be linear and will be identical to the objective function, provided there are at least training points.

Before applying ConstrLMSRBF, the variables are rescaled so that the region defined by the bound constraints (i.e., the search space) is transformed to the unit hypercube . ConstrLMSRBF begins by evaluating the objective and constraint functions at a feasible starting point and at the points of a space-filling design, specifically a Latin hypercube design (LHD) with points, over the region defined by the bound constraints of the problem, which is now . Together, the feasible starting point and the LHD points constitute the initial training points. The space-filling design points possibly include infeasible points, and for the version of ConstrLMSRBF used in this paper, the first initial point must be feasible. The requirement of having a feasible point is not unreasonable since in many applications a feasible solution is often available or easy to obtain, as is the case for the above pumping optimization problem, and the practitioner is simply looking to improve this feasible solution. However, an extension of ConstrLMSRBF in Regis (2014) allows all initial points to be infeasible.

After evaluating the objective and inequality constraint functions at the initial points, RBF models are fit for the objective and constraint functions using all available data points. Then, the algorithm goes through a loop that involves generating a large number of random candidate points obtained by perturbing some (or all) of the coordinates of the current best feasible point using Gaussian distributions with zero mean and with standard deviations that are allowed to vary adaptively depending on performance, to facilitate either local search or global search. When generating a candidate point, the choice of which coordinates of the current best point are perturbed is random, and is controlled by a parameter which is the probability that a given coordinate is perturbed. In the numerical experiments, equals 0.5 or 1. The initial standard deviation of the Gaussian perturbations is set at 0.05 (based on the unit hypercube search space) and it is doubled when the algorithm achieves a certain number of consecutive improvements . Moreover, this standard deviation is reduced by a half if the algorithm results in a certain number of consecutive failures (i.e., iterations with no improvement in the best feasible objective function value). The success threshold and failure tolerance were set to and , respectively. Note that the settings of the standard deviation of the Gaussian perturbations (particularly the initial value) biases ConstrLMSRBF towards local search, but provides the opportunity to become more global depending on performance. This strategy was found to be effective in Regis (2011) on high-dimensional and highly constrained problems, including a 124-dimensional automotive application with 68 black-box inequality constraints, when a feasible starting point is provided. This is because when there are many constraints and the current best point is feasible, it is better to be more conservative when perturbing this current best point since it takes only one constraint violation to render a candidate point infeasible.

Next, the algorithm gathers the candidate points that are predicted to be feasible or that have the minimum number of predicted constraint violations. These points will be referred to as the valid candidate points. A candidate point is predicted to be feasible if it satisfies the RBF models of the constraints (i.e., for where is the RBF model of the ith constraint function ). The next point where the simulation will be run (or where objective and constraint functions will be evaluated) is chosen to be the best point among all the valid candidate points according to two criteria: predicted objective function value of the candidate point according to the RBF model of the objective, and its minimum distance from previously evaluated points. More precisely, for each valid candidate point Q, the algorithm calculates a score for the RBF criterion, , and a score for the distance criterion, . These scores vary from 0 to 1, with the preferred candidate points having scores closer to zero. Then, the next point where the simulation will take place is the valid candidate point Q that minimizes the value of:  
formula
(9)
where and are the weights for the two criteria and they satisfy . In the numerical experiments, these weights were fixed to and to put more emphasis on the RBF criterion. This means that we put more emphasis on selecting a candidate point with good values of the RBF model predictions. However, because the weight for the distance criterion is non-negligible, we also prefer a candidate point that has far from previously evaluated points among those that have good RBF model values. Once the VDST simulation has taken place at the selected valid candidate point, the algorithm re-trains the RBF surrogate model with the new data point. Then it goes back to generating a new set of random candidate points and continues in the same manner as before until the computational budget is exhausted (e.g., the maximum number of VDST simulations has been reached). More details on ConstrLMSRBF can be found in Regis (2011).

Problem settings

Four pumping optimization problems of different dimensionality were solved to test the performance of the algorithms described above. That is, , , and . For an increase in the number of pumping wells, the total recharge of the coastal aquifer model was modified accordingly. This change facilitated the comparison of the algorithms by moving the region of the global optimum in a different location. Therefore, for the total recharge was set to 5,409.86 m3/day, for the total recharge was set to 6,159.8 m3/day, for the total recharge was set to 6,909.8 m3/day and for the total recharge was set to 7,659.8 m3/day. For each optimization problem (due to the different total recharge rates) an initial VDST model simulation was performed with no pumping present, until the head and salinity concentration fields reached steady-state. These were used as the initial conditions for the subsequent VDST simulations during the optimization task.

The location of the pumping wells is rather arbitrary since this study worked on a hypothetical aquifer. The base case (10 pumping wells) indicates the location of the wells near the coast boundary. In the other cases (M = 20, 30, 40), more pumping wells were added at inland locations. The minimum and maximum pumping flow rate for each well are and . The lateral recharge was set as 3,696 m3/day, 4,446 m3/day, 5,196 m3/day, 5,946 m3/day for the M = 10, 20, 30, 40 cases, respectively.

Each optimization problem was solved based on a specified budget of VDST simulations. The maximum allowed number of VDST model simulations was set to . Since the optimization methods of this study are based on stochastic operators, a set of 30 independent optimization runs is used for each approach in order to perform an adequate statistical comparison. In addition, for each independent optimization run a new initial population is generated which is applied to all the optimization methods to ensure the same starting conditions.

RESULTS AND DISCUSSION

The problems with 10 and 20 pumping wells are considered of moderate dimensionality and are grouped together. The problems with 30 and 40 pumping wells are considered of larger dimensionality and are also grouped together. Figures 3 and 4 present the performance of the optimization methods based on their average best feasible objective function value among the 30 independent runs as the number of VDST simulations increases. We caution the reader that the plots in Figures 3 and 4 do not show the means of the actual objective function values obtained by the different algorithms as the number of VDST simulations increases. Instead, these are the means of the best feasible values found so far, and this is why the curves tend to be smooth. The error bars represent 95% t-confidence intervals for the mean. That is, the length of each side of the error bars is 2.045 times the standard deviation of the best feasible objective function values divided by the square root of the number of runs.

Figure 3

Best average performance of the optimization frameworks against the available runs with the VDST model (plotting starts after the initial points): left view (), right view ().

Figure 3

Best average performance of the optimization frameworks against the available runs with the VDST model (plotting starts after the initial points): left view (), right view ().

Figure 4

Best average performance of the optimization frameworks against the available simulations with the VDST model (plotting starts after the initial points): left view (), right view ().

Figure 4

Best average performance of the optimization frameworks against the available simulations with the VDST model (plotting starts after the initial points): left view (), right view ().

Results demonstrate that the SBO methods outperform the direct VDST-EAS optimization for all test problems. The SBO methods were able to improve the objective function value given the available number of simulations with the VDST model. In both SBO frameworks, the objective function value exhibits a rapid improvement after the initial population evaluation in contrast with VDST-EAS. However, in problems where and , RBF-EAS appears to stall as the computational budget is exhausted. On the other hand, ConstrLMSRBF achieves a continuous improvement of the average objective function value as the number of VDST simulations is increased for all problems. Thus, the combined global and local search capabilities of ConstrLMSRBF algorithm are demonstrated particularly for the higher dimensional problems.

The increasing trend in the curves in Figures 3 and 4, particularly for ConstrLMSRBF, indicate that the optimal solution has not been reached. However, as mentioned in the Introduction, the main purpose of this study is to determine the effectiveness of surrogate modelling in pumping optimization of coastal aquifers given a limited number of simulations of the computationally expensive SWI model. The goal is not to run algorithms until they find the optimal solution because this will take a long time (recall that each simulation is expensive). Rather, we wish to determine how much progress the algorithms can achieve given a very limited computational budget of VDST simulations (i.e., 10 times the problem dimension), which is reasonable in the computationally expensive setting. Besides finding the global minimum of a general 40-D constrained optimization problem is not a realistic goal unless the problem has some special mathematical structure (e.g., convexity of the objective and constraint functions). This is because it will require an astronomically large number of function evaluations to verify if the point obtained (say after the curves plateau) is really the global minimizer. The search space of our 40-D problem has corner points. Hence, even if an optimization solver can perform one billion simulations (and this is typically not done in the literature), it will only be able to perform simulations on 0.09% of all the corner points, and this does not even include any interior points.

Part of the reason for the success of the SBO methods (RBF-EAS and ConstrLMSRBF) compared to VDST-EAS can be attributed to the ability of the RBF surrogates to approximate the objective and constraint functions generally well, at least in the region of the search space where the VDST simulations are performed or at least well enough to direct the simulations into the promising regions. RBF surrogate models have been known to provide accurate and robust function approximations (e.g., see Jin et al. (2001)). Moreover, the RBF surrogates used here are interpolating models, which means that the errors at the data points (i.e., the training errors) are all zero. In addition, it is noteworthy that our SBO methods use dynamically updated surrogates so that the surrogate model error in any given iteration is not as important in succeeding iterations. This is very different from the more traditional approach of fitting a surrogate model once and then performing the search for the optimum simply on the surrogate. In our SBO algorithms, the RBF surrogates are refitted every time a new data point becomes available so that these surrogates constantly improve as the number of data points increases. That is, if one evaluates a point and it turns out that the error between the RBF surrogate and the original VDST model is very large, then in the next iteration of ConstrLMSRBF, this error will become zero because of the interpolating nature of our RBF model. That is, our SBO algorithms essentially learn from their mistakes.

Table 1 exhibits the consistent outperformance of the SBO methods against the direct optimization while ConstrLMSRBF presents the most robust performance with the lowest standard error in most optimization problems. A one-way analysis of variance (ANOVA) test followed by a multiple comparison test was also performed on the best feasible objective function values obtained by the three algorithms at the computational budget of VDST simulations using the built-in MATLAB functions anova1 and multcompare (Statistics and Machine Learning Toolbox 2016). The multiple comparison test used the Tukey–Kramer procedure, which is the default in MATLAB.

Table 1

Statistics of the best feasible objective function values (best results are highlighted in bold font)

  Worst Best Median Mean Standard error (mean) 
M = 10 
 VDST-EAS     17.354 
 RBF-EAS     17.8936 
 ConstrLMSRBF     13.9262 
M = 20 
 VDST-EAS     18.6767 
 RBF-EAS     12.9824 
 ConstrLMSRBF     14.7258 
M = 30 
 VDST-EAS     15.7283 
 RBF-EAS     16.0658 
 ConstrLMSRBF     13.9859 
M = 40 
 VDST-EAS     21.8063 
 RBF-EAS     19.5244 
 ConstrLMSRBF     8.5851 
  Worst Best Median Mean Standard error (mean) 
M = 10 
 VDST-EAS     17.354 
 RBF-EAS     17.8936 
 ConstrLMSRBF     13.9262 
M = 20 
 VDST-EAS     18.6767 
 RBF-EAS     12.9824 
 ConstrLMSRBF     14.7258 
M = 30 
 VDST-EAS     15.7283 
 RBF-EAS     16.0658 
 ConstrLMSRBF     13.9859 
M = 40 
 VDST-EAS     21.8063 
 RBF-EAS     19.5244 
 ConstrLMSRBF     8.5851 

Table 2 demonstrates that the p-values for the simultaneous comparisons between VDST-EAS and the two SBO strategies are close to zero for all optimization problems. This confirms that the sample means for RBF-EAS and ConstrLMSRBF are statistically significantly better than the sample mean for VDST-EAS on all problems at the computational budget of VDST simulations. Furthermore, the comparison between RBF-EAS and ConstrLMSRBF shows that the sample mean for the latter is statistically significantly better than the sample mean for the former, at VDST simulations for the 30- and the 40-decision variable problems.

Table 2

p-values from the multiple comparison test among the three optimization frameworks

Optimization frameworks p-value
 
M = 10 M = 20 M = 30 M = 40 
VDST-EAS RBF-EAS     
VDST-EAS ConstrLMSRBF     
RBF-EAS ConstrLMSRBF 0.994 0.799 0.0021  
Optimization frameworks p-value
 
M = 10 M = 20 M = 30 M = 40 
VDST-EAS RBF-EAS     
VDST-EAS ConstrLMSRBF     
RBF-EAS ConstrLMSRBF 0.994 0.799 0.0021  

CONCLUSIONS

A single-objective pumping optimization problem of coastal aquifers was solved using both direct and SBO methods. The direct optimization (VDST-EAS) involved the combination of a VDST numerical model with an evolutionary algorithm. The two SBO methods were applied by utilizing the same surrogate models, namely, cubic RBF models. However, they were based on different update strategies for the surrogate model. The first (RBF-EAS) employed a classic prediction-based infill strategy (local exploitation) embedded in the same evolutionary algorithm with the direct optimization framework. The second (ConstrLMSRBF) was based on a comprehensive infill strategy which aims at both local exploitation and global exploration of the decision variable space.

To the best of our knowledge, this is the first time in coastal aquifer management that optimization problems of moderate and large dimensionalities are employed and compared for both direct and SBO methods, based on limited computational budgets. It is also the first time that a generic constrained SBO method (ConstrLMSRBF algorithm) is tested for single-objective pumping optimization problems of coastal aquifers. The results demonstrated an outperformance of the SBO methods against the direct optimization for the case of four different optimization problems with increased dimensionality (from 10 to 40 pumping wells). In particular, the ConstrLMSRBF algorithm is considered a promising SBO method for coastal aquifer management since it located the best solutions and demonstrated a robust performance for all optimization problems. The ANOVA and multiple comparison tests confirmed the statistical significance of the differences in the sample means between the direct optimization and the SBO methods. Furthermore, the use of cubic RBF surrogate models facilitated the simple and fast treatment of a large number of constraint functions (up to 40). In fact, the RBF models produced a negligible computational cost to the optimization runs which is desirable in cases of limited computational budgets.

CONFLICT OF INTEREST

The authors would like to declare that they have no conflict of interest.

REFERENCES

REFERENCES
Asher
,
M. J.
,
Croke
,
B. F. W.
,
Jakeman
,
A. J.
&
Peeters
,
L. J. M.
2015
A review of surrogate models and their application to groundwater modelling
.
Water Resour. Res.
51
(
8
),
5957
5973
.
Bhattacharjya
,
R. K.
&
Datta
,
B.
2005
Optimal management of coastal aquifers using linked simulation optimization approach
.
Water Resour. Manage.
19
(
3
),
295
320
.
Christelis
,
V.
&
Mantoglou
,
A.
2016a
Pumping optimization of coastal aquifers assisted by adaptive metamodelling methods and radial basis functions
.
Water Resour. Manage.
30
(
15
),
5845
5859
.
Doi:10.1007/s11269-016-1337-3
.
Efstratiadis
,
A.
&
Koutsoyiannis
,
D.
2002
An evolutionary annealing-simplex algorithm for global optimization of water resource systems
. In:
Proceedings of the Fifth International Conference on Hydroinformatics
,
Cardiff, UK
,
International Water Association Publishing
2
, pp.
1423
1428
.
Forrester
,
A. I. J.
&
Keane
,
A. J.
2009
Recent advances in surrogate-based optimization
.
Prog. Aerosp. Sci.
49
,
50
79
.
Forrester
,
A. I. J.
,
Sóbester
,
A.
&
Keane
,
A. J.
2008
Engineering Design Via Surrogate Modelling – A Practical Guide
.
Wiley
,
New York
.
Giambastiani
,
B. M.
,
Antonellini
,
M.
,
Essink
,
G. H. O.
&
Stuurman
,
R. J.
2007
Saltwater intrusion in the unconfined coastal aquifer of Ravenna (Italy): a numerical model
.
J. Hydrol.
340
(
1
),
91
104
.
Hussain
,
M. S.
,
Javadi
,
A. A.
,
Ahangar-Asr
,
A.
&
Farmani
,
R.
2015
A surrogate model for simulation–optimization of aquifer systems subjected to seawater intrusion
.
J. Hydrol.
523
,
542
554
.
Jin
,
R.
,
Chen
,
W.
&
Simpson
,
T. W.
2001
Comparative studies of metamodelling techniques under multiple modelling criteria
.
Structural and Multidisciplinary Optimization
23
(
1
),
1
13
.
Jones
,
D.
,
Schonlau
,
M.
&
Welch
,
W.
1998
Efficient global optimization of expensive black-box functions
.
J. Global Optim.
13
,
455
492
.
Karpouzos
,
D. K.
&
Katsifarakis
,
K. L.
2013
A set of new benchmark optimization problems for water resources management
.
Water Resour. Manage.
27
(
9
),
3333
3348
.
Kolditz
,
O.
,
Ratke
,
R.
,
Diersch
,
H. J. G.
&
Zielke
,
W.
1998
Coupled groundwater flow and transport: 1. Verification of variable density flow and transport models
.
Adv. Water Resour.
21
(
1
),
27
46
.
Kopsiaftis
,
G.
,
Mantoglou
,
A.
&
Giannoulopoulos
,
P.
2009
Variable density coastal aquifer models with application to an aquifer on Thira Island
.
Desalination
237
(
1
),
65
80
.
Mantoglou
,
A.
,
Papantoniou
,
M.
&
Giannoulopoulos
,
P.
2004
Management of coastal aquifers based on nonlinear optimization and evolutionary algorithms
.
J. Hydrol.
297
(
1–4
),
209
228
.
MATLAB
2016
Statistics and Machine Learning Toolbox, Release
.
The MathWorks, Inc.
,
Natick, MA
,
USA
.
Parr
,
J. M.
,
Keane
,
A. J.
,
Forrester
,
A. I.
&
Holden
,
C. M. E.
2012
Infill sampling criteria for surrogate-based optimization with constraint handling
.
Eng. Optimiz.
44
(
10
),
1147
1166
.
Powell
,
M. J. D.
1992
The theory of Radial Basis Function Approximation in 1990
. In:
Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision Algorithms and Radial Basis Functions
(
Light
,
W.
, ed.).
Oxford University Press
,
Oxford
,
UK
. pp.
105
210
.
Rao
,
S. V. N.
,
Sreenivasulu
,
V.
,
Bhallamudi
,
S. M.
,
Thandaveswara
,
B. S.
&
Sudheer
,
K. P.
2004
Planning groundwater development in coastal aquifers
.
Hydrol. Sci. J.
49
(
1
),
155
170
.
Razavi
,
S.
,
Tolson
,
B. A.
&
Burn
,
D. H.
2012a
Review of surrogate modelling in water resources
.
Water Resour. Res.
48
(
7
),
W07401
,
doi:10.1029/2011WR011527
.
Razavi
,
S.
,
Tolson
,
B. A.
&
Burn
,
D. H.
2012b
Numerical assessment of metamodelling strategies in computationally intensive optimization
.
Environ. Modell. Softw.
34
,
67
86
.
Regis
,
R. G.
&
Shoemaker
,
C. A.
2007
Improved strategies for radial basis function methods for global optimization
.
J. Global Optim.
37
,
113
135
.
Roy
,
T.
,
Schütze
,
N.
,
Grundmann
,
J.
,
Brettschneider
,
M.
&
Jain
,
A.
2016
Optimal groundwater management using state-space models: a case study for an arid coastal region
.
J. Hydroinform.
18
(
4
),
666
686
.
Rozos
,
E.
,
Efstratiadis
,
A.
,
Nalbantis
,
I.
&
Koutsoyiannis
,
D.
2004
Calibration of a semi-distributed model for conjunctive simulation of surface and groundwater flows
.
Hydrol. Sci. J.
49
(
5
),
819
842
.
Solomatine
,
D. P.
&
Ostfeld
,
A.
2008
Data-driven modelling: some past experiences and new approaches
.
J. Hydroinf.
10
(
1
),
3
22
.
Therrien
,
R.
,
McLaren
,
R. G.
,
Sudicky
,
E. A.
&
Panday
,
S. M.
2006
HydroGeoSphere – A Three-Dimensional Numerical Model Describing Fully-Integrated Subsurface and Surface Flow and Solute Transport. Groundwater Simulations Group.
University of Waterloo
,
Canada
.
Thompson
,
C.
,
Smith
,
L.
&
Maji
,
R.
2007
Hydrogeological modelling of submarine groundwater discharge on the continental shelf of Louisiana
.
J. Geophys. Res.
Doi: 10.1029/2006JC003557
.
Werner
,
A. D.
,
Bakker
,
M.
,
Post
,
V. E. A.
,
Vandenbohede
,
A.
,
Lu
,
C.
,
Ataie-Ashtiani
,
B.
,
Simmons
,
C. T.
&
Barry
,
D. A.
2013
Seawater intrusion processes, investigation and management: recent advances and future challenges
.
Adv. Water Resour.
51
,
3
26
.