## Abstract

Contaminants in groundwater may enter through various sources which are required to be identified for informed decision-making regarding remediation. In early stages of contamination of an aquifer, the sources of contamination are generally unknown. To estimate the unknown release histories at potential sources, the governing equations of contaminant transport are solved backwards in time which renders the inverse problem as ill-posed. Furthermore, due to measurement or operational errors, observed breakthrough curves can be noisy which can induce uncertainty. Simulation-optimization (SO) techniques are generally used for solving the inverse model. Nevertheless, swarm population-based optimization algorithms may get trapped in local optima in the nonlinear search space related to the problem. In this study, SO model based on meshless radial point collocation method (RPCM) and multiverse optimizer (MVO) is proposed. MVO implements gradual transformation from global exploration to local exploitation phase which helps in better estimation of unknowns. The performance of RPCM-MVO-SO model is compared with two other SO models developed by combining RPCM with grey wolf optimizer and particle swarm optimization in two case studies. The proposed model performs better compared to the other two SO models considered which suggests the efficacy of the model for release history estimation in groundwater.

## HIGHLIGHTS

A novel RPCM-MVO-SO model is proposed for contaminant source identification.

The RPCM-MVO model performs better compared to RPCM-PSO and RPCM-GWO for the two case studies considered.

RPCM-MVO model estimated release intensities have low variations for noisy observations.

## INTRODUCTION

Groundwater is one of the important water resources which gets frequently contaminated due to deliberate or, unattended discharges from various sources located underground such as chemical releases, septic tanks, storage tanks and tailing ponds (Batu 2005). These sources may inject several reactive or, conservative pollutants which severely affect water quality. An aquifer may be perceived as contaminated when water quality obtained in any one of the observation wells becomes objectionable or, nonusable for various activities such as drinking, agricultural or industrial purposes. When the contamination is first detected in an aquifer, many a time the details about the source are not generally known due to the slow movement of groundwater which results in a large time difference between the event of contamination and its detection. Obtaining the details of the contaminant source, i.e., its location and release history are essential as it assists in designing the optimal remedial actions, prediction of future contaminant plume movement and demarcation of the contaminated aquifer.

The transport of contaminants in groundwater occurs through the presence of a hydraulic gradient which results in advection and dispersion processes (Bear & Cheng 2010). These processes are represented by partial differential equations (PDEs) which are forward in time. Nevertheless, the estimation of release histories in an already contaminated aquifer requires solving the PDEs backwards in time using concentration measurements at the observation wells. However, solving the PDEs backwards in time becomes prohibitively complex due to the ill-posedness of the inverse problem (Jha & Datta 2013) and the physical infeasibility of the process (Gorelick *et al.* 1983). Generally, simulation-optimization (SO) models are used for release history estimation. In these models, the release history of contaminants from possible source locations is estimated by developing a simulation model which represents the aquifer hydrogeology and the information from observation wells. Further, the optimization model utilizes the simulation model to simulate the response of the aquifer at the observation wells as a result of contaminant release at the source. To predict the history of contaminant release at the source locations, the difference between observed and simulated breakthrough curves at the observation wells is minimized in the SO framework.

Due to the inherent complexity of the groundwater flow and transport process, the relationship between release rates and breakthrough curves is nonlinear. Therefore, stochastic population-based optimization models are usually used in inverse problems due to the nonrequirement of gradients. Several optimization algorithms such as particle swarm optimization (PSO) (Singh *et al.* 2018), genetic algorithm (Han *et al.* 2020), cat swarm optimization (CSO) (Thomas *et al.* 2018), differential evolution (DE) (Gurarslan & Karahan 2015), grey wolf optimization (GWO) (Majumder & Eldho 2020), etc., have been successfully used in SO models for groundwater problems. Apart from that, algorithms such as Improved Runge–Kutta Optimization (IRKO) (Devi *et al.* 2022), genetic firefly (Gupta *et al.* 2021), Yin-Yang firefly algorithm (Wang *et al.* 2020) and Harris hawk optimization (Sammen *et al.* 2020) are used in solving optimization problems in various fields. These algorithms traverse the nonlinear search space and evolve the solution iteratively to arrive at the global optimum solution. The search procedure in an optimization problem can be divided into global exploration and local exploitation phases (Mirjalili *et al.* 2016). In the global exploration phase, the potential locations of global optima are identified whereas in the location exploitation phase, the solutions at these potential locations are further refined iteratively. Further many researchers used surrogate models with SO models to reduce the computational cost to a great extent (Borah & Bhattacharjya 2016; Majumder & Eldho 2020). Anshuman & Eldho (2022b) proposed a surrogate SO model based on Entity Aware Sequence-to-sequence learning using Long Short Term Memory (EAS-LSTM) for simultaneous aquifer parameter estimation and contaminant source identification.

In this study, the goal is to evaluate the efficiency of a recently proposed optimization model Multiverse optimization (MVO) (Mirjalili *et al.* 2016) with grey wolf optimizer (GWO) (Mirjalili *et al.* 2014) and PSO (Eberhart & Kennedy 1995) for release history estimation in groundwater using the SO technique. The SO model is developed by linking the meshfree RPCM simulation model with the optimization models. The MVO iteratively improves the estimates of the unknowns by reducing the error between observed and simulated breakthrough curves. The MVO algorithm uses a parameter named wormhole existence probability (WEP) which allows local changes in limited population size which increases with the increase in iteration number. This technique helps in a gradual transition from global exploration to the local exploitation phase. This study compares the performance of RPCM-MVO-SO with that of RPCM-PSO and RPCM-GWO models for two case studies. Additionally, using multiple erroneous breakthrough curves in case study 2, this study attempts to compare the uncertainty of estimated release histories by different SO models as a result of observational errors.

## MATERIAL AND METHODS

### Groundwater flow and transport

*h*is the hydraulic head; is transmissivity in

*x-*direction and is transmissivity in

*y*-direction;

*Q*

_{w}represents injection/pumping well and is Kronecker delta function. The value of is 1 subject to the presence of injection/pumping well at , otherwise, it is taken as zero. The governing equation for the transport of a contaminant species in a two-dimensional aquifer domain is given by (Bear & Verruijt 1987).where

*C*represents the concentration of the contaminant, and are seepage velocities,

*λ*is the rate of first-order decay and

*Q*

_{c}is the release rate of the contaminant at the source location. The terms and denote dispersion coefficients which are obtained using seepage velocities and dispersivities (longitudinal and transverse, i.e., and , respectively) (Rastogi 2007). The seepage velocities in a coupled system are obtained using Darcy's law which utilizes the effective porosity (

*η*

_{e}), hydraulic conductivity (

*κ*) and head variation in the problem domain obtained using the flow model (Bear & Cheng 2010). Along with the governing Equations (1) and (2), corresponding boundary conditions which are applicable at the boundaries are solved to obtain head and contaminant concentrations in the problem domain (Singh

*et al.*2016).

### Radial point collocation method

*et al.*2016; Mategaonkar

*et al.*2018; Anshuman

*et al.*2019; Thomas

*et al.*2019) is utilized in this study. RPCM disintegrates the problem domain into many intersecting support domains which surround each node present in the problem domain. Equation (3) computes the state variable ‘

*h*’ based on the

*n*nodes in the support domain of a point of interest (Liu & Gu 2005).

*i*in the support domain of

*k*th node as follows (Kansa 1990).

*i*th node in the support domain of the point of interest

*x*.

_{k}*d*represents the average distance between all internal nodes in a problem domain. MQ-RBF shape parameters are denoted by the terms

_{c}*q*and

*α*. The vector consisting of the coefficients in Equation (3) can be calculated by implementing the MQ-RBF moment matrix at all nodes in the local support domain as follows (Liu & Gu 2005).

*h*. Substitution of the coefficient vector presented in Equation (5) into Equation (3) yields the solution for the state variable as follows (Liu & Gu 2005).

*ϕ*represents the shape function which is used for spatial discretization of governing equations. For estimating the derivatives of the shape function, the corresponding derivatives of the MQ-RBF vector are used in Equation (6). The governing Equation (1) after the implementation of shape functions and derivatives can be represented as shown in Equation (7) (Anshuman & Eldho 2022a).

*et al.*2016).

As shown in Equation (8), a semi-implicit scheme is used to represent the temporal discretization of concentration in Equation (2) (Anshuman *et al.* 2019). The temporal discretization parameter takes a value of 0 for fully implicit and 1 for fully explicit as per Equation (8). Here a value of 0.5 is assigned to , which is also referred to as the Crank–Nicolson scheme (Rastogi 2007), as it is known to provide accurate transient simulations (Singh *et al.* 2016).

### Multiverse optimization

*et al.*2016) is a lately proposed optimization algorithm which is inspired by a popular physics theory which argues for the existence of multiple universes (Barrow

*et al.*2004). Here, the candidate solutions of the decision variables are referred to as universes. As per the optimization algorithm, the universes interact by swapping elements through white holes, black holes and wormholes. A black hole can devour anything within its gravitational field and is observed throughout the universe. Contrastingly, white holes are theorized to eject enormous matter into space (Eardley 1974). It is theorized that wormholes connect different areas of space-time, as well as the multiverse (Morris & Thorne 1988). A white hole is assumed to exist in the best universe in MVO, while a black hole is assumed to exist in the remaining universes (Mirjalili

*et al.*2016). The tunnels, which are determined by a roulette wheel mechanism, transport objects from the best universe white to other universes. It is represented as follows.

*j*th variable is denoted by and is selected through the roulette wheel mechanism on the

*k*th universe. Accordingly, refers to the

*j*th variable, while refers to the normalized inflation rate for the

*i*th universe. The term represents a random number in [0, 1]. A proportional inflation rate is assigned to the universes in the multiverse according to the objective function value. After each iteration, the universe with the best inflation rate is considered the best solution. The interaction between universes in MVO is governed by five principles based on inflation rates which are detailed in Mirjalili

*et al.*(2016). On account of these principles, the exploration and exploitation of the search space in the universe are done using white holes/black and wormholes, respectively. The wormholes which can be present in any universe facilitate local changes in the universes using the following equation.where represents the

*j*th variable of the best universe. The upper and lower bounds of the

*j*th variable are represented by and . The terms , and denote random numbers in [0,1]. TDR represents the travelling distance rate which is dependent on iteration number

*t*and maximum iteration number . It is given by Equation (11).

The term WEP denotes WEP which is also dependent on *t* and . As observed in Equations (11) and (12), the WEP increases linearly with *t* while TDR decreases nonlinearly with iteration number *t*. The maximum and minimum values of WEP are set as 1 and 0.2 (Mirjalili *et al.* 2016). This essentially means that at the first iteration, approximately 20% population undergoes local changes as per Equation (10) and this percentage increases to approximately 100% as the iteration number reaches to . It may be noted that, at all iterations, the changes are also implemented through Equation (9). In this manner, the WEP parameter helps in the gradual transition of the search process from global exploration to local exploitation.

### Particle swarm optimization

Here, and represent the velocity and position of the *i*th particle for iteration number *t*. *b* and *G* represent the best solution of individual particle and swarm, respectively. The terms *ω*, *c*_{1} and *c*_{2} are constants. The terms *r*_{1} and *r*_{2} are random numbers in [0,1]. In this study, the parameters of PSO, i.e., *ω, c*_{1} and *c*_{2} are set as 0.5, 1.49 and 1.49 based on the study reported by Thomas *et al.* (2019).

### Grey wolf optimization

*et al.*2014). The alpha, beta, delta and omega wolves in the pack are ranked according to descending hierarchical order. The major steps in the hunting of grey wolves are tracking, encircling and attacking the prey (Muro

*et al.*2011). In GWO, the candidate solutions are represented as wolves and the alpha wolf corresponds to the solution with the best value of fitness function. Mathematically, the encircling behaviour of grey wolves is represented by Equation (15). The positions of the wolves are modified after each iteration as per Equation (16).

The coefficient *C* depends on a randomly generated number in the range [0,1] whereas *A* depends on a random number in the range [0,1] as well as iteration number *t*. These are explained in detail in Mirjalili *et al.* (2014).

## DEVELOPMENT OF INVERSE MODEL

### Objective function

*et al.*2018).

Here and are erroneous concentrations and normalized erroneous concentrations. The term represents the normalization function. Here *maxmin* normalization is used. The release history estimation using the SO model is presented in case studies 4.1 and 4.2.1. In these problems, the simulated concentrations at the observation wells are used as observed concentrations. The SO models are also tested for estimation of release histories using erroneous concentrations in case 4.2.2 considering the possibility of measurement error. Here, the simulated concentrations at the observation wells with random normal error, i.e., perturbed concentrations are used in the objective function as in Equation (17).

### SO model

In the SO model for contaminant source identification, the optimization utilizes the simulation model to minimize the error (see Equation (17)) between the simulated/predicted and observed breakthrough curves at the observation wells. With increasing iteration numbers, the optimization models improve these predictions. The step-wise procedure for the RPCM-MVO-SO model is described given below:

- 1.
The aquifer parameters such as aquifer properties, boundary conditions and source/sink information are provided in this step. Using this information and MQ-RBF parameters, coupled flow and transport RPCM model is developed.

- 2.
The populations are initialized for the first iteration based on the bounds provided for each variable to be obtained.

- 3.
The parameters WEP and TDR are evaluated based on the iteration number. The objective function is defined as a minimization of error between the concentrations and the corresponding observations evaluated for each universe/population. It is also named the

*inflation rate*in MVO. The universes are sorted based on their inflation rates and normalized inflation rates are evaluated. The universe with the highest value of inflation rate is considered the best universe. In this study, the minimization of error between observed and simulated concentration is considered the objective function. Hence, the inflation rate is considered as −1 times the objective function value (Mirjalili*et al.*2016). - 4.
As mentioned earlier, in MVO, there is only one universe with a white hole and all other universes possess black holes. For each variable in a universe, the white hole is selected based on a roulette wheel selection technique if a random number (

*r*_{1}) is less than the corresponding normalized inflation rate. The variable from that universe is replaced by the variable from the selected white hole containing the universe. - 5.
Furthermore, based on the random number value

*r*_{2}and WEP, the wormhole tunnel is established between the concerned universe and the white hole containing the universe. Subsequently, the variables in the universes are changed based on the TDR and randomly generated number*r*_{3}. - 6.
Steps 4–6 are repeated until the termination criteria are met. The termination criteria can be the maximum number of iterations or, the defined value of the objective function. The universe containing the best value of the inflation rate is considered the solution obtained from the RPCM-MVO-SO model.

*et al.*(2016) and Anshuman & Eldho (2020). Further, the RPCM model is linked with MVO for release history estimation. The flowchart of the developed RPCM-MVO-SO model is explained in Figure 1. It may also be noted that the performance of RPCM-MVO is compared with that of RPCM-GWO and RPCM-PSO in two case studies. The development of these SO models is very similar to that of RPCM-MVO with corresponding changes in steps 3–5.

### Performance indicators

*et al.*2018).

Here denotes the release intensities of the contaminant at the source locations. The actual and predicted values of are represented using superscripts *act* and *pred*, respectively. The terms and denote the number of source locations and stress periods, respectively. The value of *R*^{2} varies between 0 and 1. For higher correlation, the value of *R*^{2} tends to be 1. Meanwhile, Root Mean Squared Error (RMSE) can take any value, with lower values indicating lower errors and a better match between actual and estimated contaminant release histories.

## MODEL APPLICATIONS

### Case study 1

*et al.*(1983) is considered (see Figure 2(a)). The north and south boundaries of the aquifer are subjected to Dirichlet boundary conditions with head values being 100 and 87.8 m resulting in the southward direction of groundwater flow. The no-flow boundary condition is applied to the east and west boundaries. The hydrogeological parameters influencing flow and transport phenomena are provided in Table 1. The aquifer is recharged at the rate of 0.0067 m/day at the pond location as shown in Figure 2(a). The potential contamination sources (S

_{1}, S

_{2}, S

_{3,}S

_{4}and S

_{5}) as shown in Figure 2(b) inject total dissolved solutes (TDS) into the aquifer for an overall period of 4 years consisting of four stress periods of 1 year each. The disposal rate or, release history of TDS into the aquifer at these sources is presented in Table 2. These constitute 20 unknowns (5 possible source locations 4 stress periods) which are required to be estimated for the concerned case study using the time series at the observation wells (OB

_{1}, OB

_{2}and OB

_{3}) shown in Figure 2(b). The flow and transport in the aquifer are simulated by the meshless RPCM model which consists of 632 nodes with a nodal spacing of 45.7 m (see Figure 2(b)). The MQ-RBF parameters such as

*q*,

*α*and radius of the local support domain are set as 0.98, 5 and 3, respectively. The values of the parameters are adapted from previous studies which demonstrate different contaminant transport problems in groundwater (Singh

*et al.*2016; Anshuman & Eldho 2020). The sensitivity analysis of these parameters is also presented in detail in Anshuman & Eldho (2020) and Anshuman

*et al.*(2019). The RPCM flow model simulated head distribution is presented in Figure 3(a). As seen in the figure, the hydraulic head contours in the vicinity of the pond location are altered due to incoming groundwater flux from the pond. Figure 3(b) illustrates the RPCM-simulated concentrations at the observation points. Figure 4 presents the concentration contours at different timesteps. It may be noted that, although contaminant is introduced in the sources during the first 4 years, the coupled RPCM flow and transport simulation model is run for 15 years considering the response time in the observation wells. The time step size in the simulation model is 36.5 days. The developed simulation model is coupled with optimization algorithms, i.e., PSO, GWO and MVO as discussed in section 2.

Hydrogeological parameter . | Case study 1 . | Case study 2 . | |
---|---|---|---|

Hydraulic conductivity | Kx | 8.4 m/day | 5 m/day |

Ky | 8.4 m/day | 2.5 m/day | |

Porosity | 0.2 | 0.3 | |

Dispersivity | αL | 7.6 m | 50 m |

αT | 2.3 m | 5 m |

Hydrogeological parameter . | Case study 1 . | Case study 2 . | |
---|---|---|---|

Hydraulic conductivity | Kx | 8.4 m/day | 5 m/day |

Ky | 8.4 m/day | 2.5 m/day | |

Porosity | 0.2 | 0.3 | |

Dispersivity | αL | 7.6 m | 50 m |

αT | 2.3 m | 5 m |

Stress period . | TDS released at source . | ||||||||
---|---|---|---|---|---|---|---|---|---|

Case study 1 (g/s) . | Case study 2 . | ||||||||

S_{1}
. | S_{2}
. | S_{3}
. | S_{4}
. | S_{5}
. | S_{1}
. | S_{2}
. | S_{3}
. | S_{4}
. | |

SP_{1} | 51 | 15 | 47 | 27 | 0 | 1 | 12 | 23 | 20 |

SP_{2} | 0 | 9 | 44 | 35 | 0 | 0 | 7 | 3 | 20 |

SP_{3} | 0 | 11 | 8 | 0 | 12 | 12 | 17 | 5 | 25 |

SP_{4} | 24 | 0 | 0 | 0 | 16 | 6 | 8 | 21 | 2 |

Stress period . | TDS released at source . | ||||||||
---|---|---|---|---|---|---|---|---|---|

Case study 1 (g/s) . | Case study 2 . | ||||||||

S_{1}
. | S_{2}
. | S_{3}
. | S_{4}
. | S_{5}
. | S_{1}
. | S_{2}
. | S_{3}
. | S_{4}
. | |

SP_{1} | 51 | 15 | 47 | 27 | 0 | 1 | 12 | 23 | 20 |

SP_{2} | 0 | 9 | 44 | 35 | 0 | 0 | 7 | 3 | 20 |

SP_{3} | 0 | 11 | 8 | 0 | 12 | 12 | 17 | 5 | 25 |

SP_{4} | 24 | 0 | 0 | 0 | 16 | 6 | 8 | 21 | 2 |

_{min}and WEP

_{max}as shown in Equation (12). On the other hand, PSO and GWO used in this study do not enforce such restrictions which result in faster convergence as shown in Figure 5(a). However, from the results obtained in Figure 5, it is observed that RPCM-MVO exhibits a better match with actual injection rates compared to RPCM-PSO and RPCM-GWO models for estimation of release histories at different sources. Similarly, the error indicators presented in Table 3 show that the RPCM-MVO-SO model shows higher accuracy compared to RPCM-PSO and RPCM-GWO with

*R*

^{2}and RMSE values of 0.982 and 3.139, respectively. The better performance of RPCM-MVO can be attributed to a smooth transition of global exploration to local exploitation of the search space using the WEP parameter.

Inverse model . | Case study 1 . | Case study 2 . | ||
---|---|---|---|---|

R^{2}
. | RMSE (ppm) . | R^{2}
. | RMSE (ppm) . | |

RPCM-MVO | 0.982 | 3.139 | 0.992 | 1.054 |

RPCM-GWO | 0.858 | 9.214 | 0.991 | 1.079 |

RPCM-PSO | 0.912 | 8.109 | 0.967 | 2.299 |

Inverse model . | Case study 1 . | Case study 2 . | ||
---|---|---|---|---|

R^{2}
. | RMSE (ppm) . | R^{2}
. | RMSE (ppm) . | |

RPCM-MVO | 0.982 | 3.139 | 0.992 | 1.054 |

RPCM-GWO | 0.858 | 9.214 | 0.991 | 1.079 |

RPCM-PSO | 0.912 | 8.109 | 0.967 | 2.299 |

### Case study 2

*S*

_{1},

*S*

_{2},

*S*

_{3}and

*S*

_{4}) as shown in Figure 6(b) inject TDS into the aquifer for an overall period of 2 years consisting of four stress periods of 6 months each. The TDS release history of the sources is presented in Table 2. These constitute 16 unknowns that are required to be estimated for the concerned case study. The flow and transport in the aquifer are simulated by the meshless RPCM model which consists of 4131 nodes of a nodal distance of 10 m (see Figure 6(b)). The MQ-RBF shape parameters are fixed the same as in case study 1. There are five observation wells, namely OB

_{1}, OB

_{2}, OB

_{3}, OB

_{4}and OB

_{5}. The RPCM flow model-simulated head distribution is presented in Figure 6(c). Using the head values, the velocities and dispersion coefficients are computed. The RPCM transport model is run for a simulation period of 2 years with a timestep size of 5 days. The concentrations at the observation points are shown in Figure 6(d). The concentration contours at different timesteps are presented in Figure 7.

#### Observed breakthrough curves without error

_{1}. All SO models are observed to provide reasonably accurate results for source location S

_{2}. The RPCM-PSO model estimated release rates show the highest difference at source location S

_{3}. On the other hand, RPCM-MVO is observed to be the most accurate both in terms of

*R*

^{2}and RMSE.

#### Erroneous breakthrough curves

*et al.*2007). Here, breakthrough curves at the observation wells are introduced with random normal samples as explained in section 3.1. The perturbed breakthrough curves are presented in Figure 9(a). Here five breakthrough curves are used. For each breakthrough curve, the SO models are run to estimate the release intensities at the source locations. The population size and maximum iteration number of optimization models are kept the same as in section 4.2.1. The convergence curves obtained for the SO model simulations are presented in Figure 9(d). In the figure, the range of objective function values is estimated using where and signify the mean and standard deviation of the corresponding convergence curves of the SO models, respectively. It is observed in Figure 9(d) that the RPCM-MVO model converges to a lower value of objective function compared to RPCM-GWO and RPCM-PSO. Also, the range of estimates of the best values of the objective function for different breakthrough curves for RPCM-MVO, RPCM-GWO and RPCM-PSO are 0.021, 0.238 and 0.068. Hence, this denotes that RPCM-MVO exhibits lower uncertainty in the estimation of release history compared to other SO models. The estimated release histories shown in Figure 9 at different stress periods also denote that RPCM-MVO model estimates have lower deviations compared to other SO models. Also, RPCM-MVO model provides reasonably close estimates of the release intensities to the actual values for different sources and stress periods. In contrast, the range of estimated release rates is higher for RPCM-GWO and RPCM-PSO for source location 3 and 2, respectively, as shown in Figure 9. This suggests that the RPCM-MVO-SO model simulations are more reliable compared to the other SO models in consideration.

## DISCUSSION

There are a variety of sources that can introduce contaminants into groundwater, and these sources must be identified so that informed decisions can be made concerning groundwater remediation and prediction of plume movement. This study proposes an RPCM-MVO-SO model for the estimation of release histories utilizing contaminant observations and compares its efficacy to that of the RPCM-GWO and RPCM-PSO models in two case studies. In the first case study, the number of estimable variables is 20. Based on the performance indicators, the RPCM-MVO model produces significantly better results compared to that of RPCM-GWO and RPCM-PSO. Interestingly, the convergence curve of RPCM-MVO is observed to converge slower compared to the other SO models which are due to initial restrictions on local exploitation imposed using the WEP parameter in MVO. Due to this, the algorithm focuses on global exploration and gradually focuses on local exploitation with an increase in iteration number. On the other hand, the PSO and GWO algorithms tend to converge faster as they apply changes in the entire population using Equations (14) and (16), respectively.

In case study 2, the SO models are utilized for the estimation of 16 variables (4 sources × 4 stress periods). Similar to case study 1, the RPCM-MVO model is observed to perform better in terms of *R*^{2} and RMSE. Interestingly, the RPCM-PSO model performs well in terms of *R*^{2}. However, due to imprecise estimations for source location 3, the RMSE computed is significantly higher than that of RPCM-MVO. Furthermore, considering measurement and operational errors in the measurement at observation points, the SO models are run using five noisy breakthrough curves. The performances indicate that the RPCM-MVO estimated release histories are close to the actual values for all source locations and stress periods. Additionally, the convergence curves and boxplots of the SO models in Figure 9 indicate low variability in RPCM-MVO estimates. In contrast, the RPCM-GWO and RPCM-PSO estimates show higher variability due to noisy breakthrough curves, especially in source locations 2 and 3. It may be noted that the breakthrough curves used for the evaluation of the objective function use fewer data points compared to the timesteps used for simulation. In case study 1, the measurement interval for case study 1 is 365 days whereas it is 30 days for case study 2 for simulation periods of 15 and 2 years, respectively. This indicates that the SO model can perform well in limited data availability. Based on the case studies, MVO exploits the multidimensional search space for the inverse problems better compared to GWO and PSO. The time required for all SO models is nearly equal as observed in the case studies. However, in terms of convergence, the MVO model is slower compared to GWO and PSO due to the restricted change to the population in the initial phase of optimization in MVO. Hence, for low-dimensional problems with relatively smoother objective functions, the GWO and PSO may provide accurate solutions in fewer iterations. Due to the use of the simulation model in conjunction with the optimization model, the PDEs related to governing equations are solved at every timestep increasing the computational cost. The computational efficiency of the inverse modelling can be improved significantly by using surrogate models (Borah & Bhattacharjya 2016). However, machine learning-based surrogate models also require sample generation for training which can be computationally expensive for a large study area.

## CONCLUSIONS

In this study, an SO inverse model RPCM-MVO is used for contaminant source identification in groundwater. The performance of the proposed SO model is compared with RPCM-GWO and RPCM-PSO models for two case studies. The RPCM-MVO is observed to provide closer estimates of the actual contaminant release rates compared to other SO models considered. The RMSE values obtained are 3.14 ppm and 1.05 ppm for case studies 1 and 2, respectively. Furthermore, for inverse modelling using noisy observations, the RPCM-MVO model provides lower variation in the estimates of the release intensities at the source locations. The better performance of MVO can be attributed to the gradual transition between global exploration and local exploitation phases which also results in slower convergences as observed in the convergence curves in the case studies. However, this technique prevents MVO solutions from getting trapped in the local minima. On the other hand, GWO and PSO solutions converge faster initially. Nonetheless, they show little improvement with an increase in iterations after this initial phase. It may be noted that the use of simulation models in SO technique results in an increased computational cost, making it computationally expensive for large aquifers and long simulation periods. Using surrogate models derived from machine learning techniques in this case can be computationally effective. Moreover, the use of surrogate models can also assist in more comprehensive estimation of uncertainty in release history estimation due to observational errors by using a greater number of breakthrough curve realizations. Due to the presence of complexity in the flow and transport process in the groundwater, the search space for the optimization problem is highly nonlinear and multidimensional. From the case studies considered in this study, it can be concluded that the MVO algorithm traverses the nonlinear search space more efficiently compared to GWO and PSO indicating higher efficacy of the RPCM-MVO model for contaminant source identification.

## ACKNOWLEDGEMENTS

The first author of the paper is grateful to Indian Institute of Technology Bombay for providing funding for the research. The authors are thankful to the anonymous reviewers for their constructive suggestions which helped in improving the quality of the paper.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Science and Ultimate Reality: Quantum Theory, Cosmology, and Complexity*. Cambridge University Press, Cambridge. ISBN 0-521-83113-X