Abstract
Contaminants in groundwater may enter through various sources which are required to be identified for informed decision-making regarding remediation. In early stages of contamination of an aquifer, the sources of contamination are generally unknown. To estimate the unknown release histories at potential sources, the governing equations of contaminant transport are solved backwards in time which renders the inverse problem as ill-posed. Furthermore, due to measurement or operational errors, observed breakthrough curves can be noisy which can induce uncertainty. Simulation-optimization (SO) techniques are generally used for solving the inverse model. Nevertheless, swarm population-based optimization algorithms may get trapped in local optima in the nonlinear search space related to the problem. In this study, SO model based on meshless radial point collocation method (RPCM) and multiverse optimizer (MVO) is proposed. MVO implements gradual transformation from global exploration to local exploitation phase which helps in better estimation of unknowns. The performance of RPCM-MVO-SO model is compared with two other SO models developed by combining RPCM with grey wolf optimizer and particle swarm optimization in two case studies. The proposed model performs better compared to the other two SO models considered which suggests the efficacy of the model for release history estimation in groundwater.
HIGHLIGHTS
A novel RPCM-MVO-SO model is proposed for contaminant source identification.
The RPCM-MVO model performs better compared to RPCM-PSO and RPCM-GWO for the two case studies considered.
RPCM-MVO model estimated release intensities have low variations for noisy observations.
INTRODUCTION
Groundwater is one of the important water resources which gets frequently contaminated due to deliberate or, unattended discharges from various sources located underground such as chemical releases, septic tanks, storage tanks and tailing ponds (Batu 2005). These sources may inject several reactive or, conservative pollutants which severely affect water quality. An aquifer may be perceived as contaminated when water quality obtained in any one of the observation wells becomes objectionable or, nonusable for various activities such as drinking, agricultural or industrial purposes. When the contamination is first detected in an aquifer, many a time the details about the source are not generally known due to the slow movement of groundwater which results in a large time difference between the event of contamination and its detection. Obtaining the details of the contaminant source, i.e., its location and release history are essential as it assists in designing the optimal remedial actions, prediction of future contaminant plume movement and demarcation of the contaminated aquifer.
The transport of contaminants in groundwater occurs through the presence of a hydraulic gradient which results in advection and dispersion processes (Bear & Cheng 2010). These processes are represented by partial differential equations (PDEs) which are forward in time. Nevertheless, the estimation of release histories in an already contaminated aquifer requires solving the PDEs backwards in time using concentration measurements at the observation wells. However, solving the PDEs backwards in time becomes prohibitively complex due to the ill-posedness of the inverse problem (Jha & Datta 2013) and the physical infeasibility of the process (Gorelick et al. 1983). Generally, simulation-optimization (SO) models are used for release history estimation. In these models, the release history of contaminants from possible source locations is estimated by developing a simulation model which represents the aquifer hydrogeology and the information from observation wells. Further, the optimization model utilizes the simulation model to simulate the response of the aquifer at the observation wells as a result of contaminant release at the source. To predict the history of contaminant release at the source locations, the difference between observed and simulated breakthrough curves at the observation wells is minimized in the SO framework.
Due to the inherent complexity of the groundwater flow and transport process, the relationship between release rates and breakthrough curves is nonlinear. Therefore, stochastic population-based optimization models are usually used in inverse problems due to the nonrequirement of gradients. Several optimization algorithms such as particle swarm optimization (PSO) (Singh et al. 2018), genetic algorithm (Han et al. 2020), cat swarm optimization (CSO) (Thomas et al. 2018), differential evolution (DE) (Gurarslan & Karahan 2015), grey wolf optimization (GWO) (Majumder & Eldho 2020), etc., have been successfully used in SO models for groundwater problems. Apart from that, algorithms such as Improved Runge–Kutta Optimization (IRKO) (Devi et al. 2022), genetic firefly (Gupta et al. 2021), Yin-Yang firefly algorithm (Wang et al. 2020) and Harris hawk optimization (Sammen et al. 2020) are used in solving optimization problems in various fields. These algorithms traverse the nonlinear search space and evolve the solution iteratively to arrive at the global optimum solution. The search procedure in an optimization problem can be divided into global exploration and local exploitation phases (Mirjalili et al. 2016). In the global exploration phase, the potential locations of global optima are identified whereas in the location exploitation phase, the solutions at these potential locations are further refined iteratively. Further many researchers used surrogate models with SO models to reduce the computational cost to a great extent (Borah & Bhattacharjya 2016; Majumder & Eldho 2020). Anshuman & Eldho (2022b) proposed a surrogate SO model based on Entity Aware Sequence-to-sequence learning using Long Short Term Memory (EAS-LSTM) for simultaneous aquifer parameter estimation and contaminant source identification.
In this study, the goal is to evaluate the efficiency of a recently proposed optimization model Multiverse optimization (MVO) (Mirjalili et al. 2016) with grey wolf optimizer (GWO) (Mirjalili et al. 2014) and PSO (Eberhart & Kennedy 1995) for release history estimation in groundwater using the SO technique. The SO model is developed by linking the meshfree RPCM simulation model with the optimization models. The MVO iteratively improves the estimates of the unknowns by reducing the error between observed and simulated breakthrough curves. The MVO algorithm uses a parameter named wormhole existence probability (WEP) which allows local changes in limited population size which increases with the increase in iteration number. This technique helps in a gradual transition from global exploration to the local exploitation phase. This study compares the performance of RPCM-MVO-SO with that of RPCM-PSO and RPCM-GWO models for two case studies. Additionally, using multiple erroneous breakthrough curves in case study 2, this study attempts to compare the uncertainty of estimated release histories by different SO models as a result of observational errors.
MATERIAL AND METHODS
Groundwater flow and transport
Radial point collocation method
As shown in Equation (8), a semi-implicit scheme is used to represent the temporal discretization of concentration in Equation (2) (Anshuman et al. 2019). The temporal discretization parameter takes a value of 0 for fully implicit and 1 for fully explicit as per Equation (8). Here a value of 0.5 is assigned to , which is also referred to as the Crank–Nicolson scheme (Rastogi 2007), as it is known to provide accurate transient simulations (Singh et al. 2016).
Multiverse optimization
The term WEP denotes WEP which is also dependent on t and . As observed in Equations (11) and (12), the WEP increases linearly with t while TDR decreases nonlinearly with iteration number t. The maximum and minimum values of WEP are set as 1 and 0.2 (Mirjalili et al. 2016). This essentially means that at the first iteration, approximately 20% population undergoes local changes as per Equation (10) and this percentage increases to approximately 100% as the iteration number reaches to . It may be noted that, at all iterations, the changes are also implemented through Equation (9). In this manner, the WEP parameter helps in the gradual transition of the search process from global exploration to local exploitation.
Particle swarm optimization
Here, and represent the velocity and position of the ith particle for iteration number t. b and G represent the best solution of individual particle and swarm, respectively. The terms ω, c1 and c2 are constants. The terms r1 and r2 are random numbers in [0,1]. In this study, the parameters of PSO, i.e., ω, c1 and c2 are set as 0.5, 1.49 and 1.49 based on the study reported by Thomas et al. (2019).
Grey wolf optimization
The coefficient C depends on a randomly generated number in the range [0,1] whereas A depends on a random number in the range [0,1] as well as iteration number t. These are explained in detail in Mirjalili et al. (2014).
DEVELOPMENT OF INVERSE MODEL
Objective function
Here and are erroneous concentrations and normalized erroneous concentrations. The term represents the normalization function. Here maxmin normalization is used. The release history estimation using the SO model is presented in case studies 4.1 and 4.2.1. In these problems, the simulated concentrations at the observation wells are used as observed concentrations. The SO models are also tested for estimation of release histories using erroneous concentrations in case 4.2.2 considering the possibility of measurement error. Here, the simulated concentrations at the observation wells with random normal error, i.e., perturbed concentrations are used in the objective function as in Equation (17).
SO model
In the SO model for contaminant source identification, the optimization utilizes the simulation model to minimize the error (see Equation (17)) between the simulated/predicted and observed breakthrough curves at the observation wells. With increasing iteration numbers, the optimization models improve these predictions. The step-wise procedure for the RPCM-MVO-SO model is described given below:
- 1.
The aquifer parameters such as aquifer properties, boundary conditions and source/sink information are provided in this step. Using this information and MQ-RBF parameters, coupled flow and transport RPCM model is developed.
- 2.
The populations are initialized for the first iteration based on the bounds provided for each variable to be obtained.
- 3.
The parameters WEP and TDR are evaluated based on the iteration number. The objective function is defined as a minimization of error between the concentrations and the corresponding observations evaluated for each universe/population. It is also named the inflation rate in MVO. The universes are sorted based on their inflation rates and normalized inflation rates are evaluated. The universe with the highest value of inflation rate is considered the best universe. In this study, the minimization of error between observed and simulated concentration is considered the objective function. Hence, the inflation rate is considered as −1 times the objective function value (Mirjalili et al. 2016).
- 4.
As mentioned earlier, in MVO, there is only one universe with a white hole and all other universes possess black holes. For each variable in a universe, the white hole is selected based on a roulette wheel selection technique if a random number (r1) is less than the corresponding normalized inflation rate. The variable from that universe is replaced by the variable from the selected white hole containing the universe.
- 5.
Furthermore, based on the random number value r2 and WEP, the wormhole tunnel is established between the concerned universe and the white hole containing the universe. Subsequently, the variables in the universes are changed based on the TDR and randomly generated number r3.
- 6.
Steps 4–6 are repeated until the termination criteria are met. The termination criteria can be the maximum number of iterations or, the defined value of the objective function. The universe containing the best value of the inflation rate is considered the solution obtained from the RPCM-MVO-SO model.
Performance indicators
Here denotes the release intensities of the contaminant at the source locations. The actual and predicted values of are represented using superscripts act and pred, respectively. The terms and denote the number of source locations and stress periods, respectively. The value of R2 varies between 0 and 1. For higher correlation, the value of R2 tends to be 1. Meanwhile, Root Mean Squared Error (RMSE) can take any value, with lower values indicating lower errors and a better match between actual and estimated contaminant release histories.
MODEL APPLICATIONS
Case study 1
Hydrogeological parameter . | Case study 1 . | Case study 2 . | |
---|---|---|---|
Hydraulic conductivity | Kx | 8.4 m/day | 5 m/day |
Ky | 8.4 m/day | 2.5 m/day | |
Porosity | 0.2 | 0.3 | |
Dispersivity | αL | 7.6 m | 50 m |
αT | 2.3 m | 5 m |
Hydrogeological parameter . | Case study 1 . | Case study 2 . | |
---|---|---|---|
Hydraulic conductivity | Kx | 8.4 m/day | 5 m/day |
Ky | 8.4 m/day | 2.5 m/day | |
Porosity | 0.2 | 0.3 | |
Dispersivity | αL | 7.6 m | 50 m |
αT | 2.3 m | 5 m |
Stress period . | TDS released at source . | ||||||||
---|---|---|---|---|---|---|---|---|---|
Case study 1 (g/s) . | Case study 2 . | ||||||||
S1 . | S2 . | S3 . | S4 . | S5 . | S1 . | S2 . | S3 . | S4 . | |
SP1 | 51 | 15 | 47 | 27 | 0 | 1 | 12 | 23 | 20 |
SP2 | 0 | 9 | 44 | 35 | 0 | 0 | 7 | 3 | 20 |
SP3 | 0 | 11 | 8 | 0 | 12 | 12 | 17 | 5 | 25 |
SP4 | 24 | 0 | 0 | 0 | 16 | 6 | 8 | 21 | 2 |
Stress period . | TDS released at source . | ||||||||
---|---|---|---|---|---|---|---|---|---|
Case study 1 (g/s) . | Case study 2 . | ||||||||
S1 . | S2 . | S3 . | S4 . | S5 . | S1 . | S2 . | S3 . | S4 . | |
SP1 | 51 | 15 | 47 | 27 | 0 | 1 | 12 | 23 | 20 |
SP2 | 0 | 9 | 44 | 35 | 0 | 0 | 7 | 3 | 20 |
SP3 | 0 | 11 | 8 | 0 | 12 | 12 | 17 | 5 | 25 |
SP4 | 24 | 0 | 0 | 0 | 16 | 6 | 8 | 21 | 2 |
Inverse model . | Case study 1 . | Case study 2 . | ||
---|---|---|---|---|
R2 . | RMSE (ppm) . | R2 . | RMSE (ppm) . | |
RPCM-MVO | 0.982 | 3.139 | 0.992 | 1.054 |
RPCM-GWO | 0.858 | 9.214 | 0.991 | 1.079 |
RPCM-PSO | 0.912 | 8.109 | 0.967 | 2.299 |
Inverse model . | Case study 1 . | Case study 2 . | ||
---|---|---|---|---|
R2 . | RMSE (ppm) . | R2 . | RMSE (ppm) . | |
RPCM-MVO | 0.982 | 3.139 | 0.992 | 1.054 |
RPCM-GWO | 0.858 | 9.214 | 0.991 | 1.079 |
RPCM-PSO | 0.912 | 8.109 | 0.967 | 2.299 |
Case study 2
Observed breakthrough curves without error
Erroneous breakthrough curves
DISCUSSION
There are a variety of sources that can introduce contaminants into groundwater, and these sources must be identified so that informed decisions can be made concerning groundwater remediation and prediction of plume movement. This study proposes an RPCM-MVO-SO model for the estimation of release histories utilizing contaminant observations and compares its efficacy to that of the RPCM-GWO and RPCM-PSO models in two case studies. In the first case study, the number of estimable variables is 20. Based on the performance indicators, the RPCM-MVO model produces significantly better results compared to that of RPCM-GWO and RPCM-PSO. Interestingly, the convergence curve of RPCM-MVO is observed to converge slower compared to the other SO models which are due to initial restrictions on local exploitation imposed using the WEP parameter in MVO. Due to this, the algorithm focuses on global exploration and gradually focuses on local exploitation with an increase in iteration number. On the other hand, the PSO and GWO algorithms tend to converge faster as they apply changes in the entire population using Equations (14) and (16), respectively.
In case study 2, the SO models are utilized for the estimation of 16 variables (4 sources × 4 stress periods). Similar to case study 1, the RPCM-MVO model is observed to perform better in terms of R2 and RMSE. Interestingly, the RPCM-PSO model performs well in terms of R2. However, due to imprecise estimations for source location 3, the RMSE computed is significantly higher than that of RPCM-MVO. Furthermore, considering measurement and operational errors in the measurement at observation points, the SO models are run using five noisy breakthrough curves. The performances indicate that the RPCM-MVO estimated release histories are close to the actual values for all source locations and stress periods. Additionally, the convergence curves and boxplots of the SO models in Figure 9 indicate low variability in RPCM-MVO estimates. In contrast, the RPCM-GWO and RPCM-PSO estimates show higher variability due to noisy breakthrough curves, especially in source locations 2 and 3. It may be noted that the breakthrough curves used for the evaluation of the objective function use fewer data points compared to the timesteps used for simulation. In case study 1, the measurement interval for case study 1 is 365 days whereas it is 30 days for case study 2 for simulation periods of 15 and 2 years, respectively. This indicates that the SO model can perform well in limited data availability. Based on the case studies, MVO exploits the multidimensional search space for the inverse problems better compared to GWO and PSO. The time required for all SO models is nearly equal as observed in the case studies. However, in terms of convergence, the MVO model is slower compared to GWO and PSO due to the restricted change to the population in the initial phase of optimization in MVO. Hence, for low-dimensional problems with relatively smoother objective functions, the GWO and PSO may provide accurate solutions in fewer iterations. Due to the use of the simulation model in conjunction with the optimization model, the PDEs related to governing equations are solved at every timestep increasing the computational cost. The computational efficiency of the inverse modelling can be improved significantly by using surrogate models (Borah & Bhattacharjya 2016). However, machine learning-based surrogate models also require sample generation for training which can be computationally expensive for a large study area.
CONCLUSIONS
In this study, an SO inverse model RPCM-MVO is used for contaminant source identification in groundwater. The performance of the proposed SO model is compared with RPCM-GWO and RPCM-PSO models for two case studies. The RPCM-MVO is observed to provide closer estimates of the actual contaminant release rates compared to other SO models considered. The RMSE values obtained are 3.14 ppm and 1.05 ppm for case studies 1 and 2, respectively. Furthermore, for inverse modelling using noisy observations, the RPCM-MVO model provides lower variation in the estimates of the release intensities at the source locations. The better performance of MVO can be attributed to the gradual transition between global exploration and local exploitation phases which also results in slower convergences as observed in the convergence curves in the case studies. However, this technique prevents MVO solutions from getting trapped in the local minima. On the other hand, GWO and PSO solutions converge faster initially. Nonetheless, they show little improvement with an increase in iterations after this initial phase. It may be noted that the use of simulation models in SO technique results in an increased computational cost, making it computationally expensive for large aquifers and long simulation periods. Using surrogate models derived from machine learning techniques in this case can be computationally effective. Moreover, the use of surrogate models can also assist in more comprehensive estimation of uncertainty in release history estimation due to observational errors by using a greater number of breakthrough curve realizations. Due to the presence of complexity in the flow and transport process in the groundwater, the search space for the optimization problem is highly nonlinear and multidimensional. From the case studies considered in this study, it can be concluded that the MVO algorithm traverses the nonlinear search space more efficiently compared to GWO and PSO indicating higher efficacy of the RPCM-MVO model for contaminant source identification.
ACKNOWLEDGEMENTS
The first author of the paper is grateful to Indian Institute of Technology Bombay for providing funding for the research. The authors are thankful to the anonymous reviewers for their constructive suggestions which helped in improving the quality of the paper.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.