The main goal in this research is study of impacts of various likelihood functions on DREAM(zs) (Differential Evolution Adaptive Metropolis) method results in simulation-optimization model of aquifer. In this study, DREAM(zs) algorithm has been developed to study aquifer simulation-optimization model uncertainties. DREAM(zs) is used to investigate uncertainty of parameters of the simulation-optimization model in Isfahan-Barkhar aquifer, Isfehan province, Iran. This study is carried out on an aquifer simulation model of MODFLOW that is coupled with MOPSO (multi-objective particle swarm optimization) optimization. Three likelihood functions, L1, L2, and L3, are considered as informal and the remaining (L4 and L5) are represented as formal categories. Likelihood function L1 is Nash–Sutcliffe efficiency and L2 is based on minimum mean square error. L3 uses estimation of model error variance and L4 focuses on the relationship between the traditional least squares fitting and the Bayesian inference. In likelihood function L5 the serial dependence of residual errors is calculated using a first-order autoregressive model of the residuals. Results suggested that the parameters sensitivity depend on the likelihood function selection, and sensitivity of all parameters is not similar in different likelihood functions. MOPSO algorithm outputs indicated that likelihood function No. 5 has a higher speed in reaching convergence and this function also showed that objective functions had a better performance compared to the other likelihood functions.
Uncertainty could be defined as the occurrence of phenomena that are out of the researcher's control. Simulation of groundwater flow is always surrounded by the uncertainty due to the lack of the complete knowledge of the studied physical system, model structure, and spatial and temporal variability (Mirzaei et al. 2015) because the data based on which mathematical models are provided are usually not enough and also the algorithm that is selected for the modeling is not exactly similar to what is happening in nature (Alaviani et al. (2018).
There have been various categorizations provided by the researchers in order to study the sources leading to uncertainty in groundwater or hydrologic simulation models. Generally, considering the logical processes in modeling the groundwater and different categorizations of researchers, the uncertainty sources are categorized into three branches (Wu & Zeng 2013). The three branches include the uncertainty of the modeling parameters, for instance studies of Feyen et al. (2001), Harrar et al. (2003), Feyen & Caers (2006), Wu et al. (2011), and He et al. (2013), uncertainty of the conceptual model, for instance Refsgaard et al. (2006) and Rojas et al. (2008), and the uncertainty of the observational data, for instance Troldborg et al. (2010).
The stochastic process, which is dependent on its previous outcomes, is called stochastic process with a Markov feature. Based on this, the stochastic process that holds true with a Markov feature is called the Markov process or chains. In addition, in cases in which the model results were studied following simulation with many repetitions, it is said that Monte Carlo simulation is used. Markov Chain Monte Carlo (MCMC) method is important for hydrologic modelers in analyzing the uncertainty of hydrologic and environmental models (Vrugt et al. 2003, 2006).
The MCMC algorithm used in this method is called DREAM(zs) (Differential Evolution Adaptive Metropolis). DREAM(zs) is based on the primary algorithm of DREAM. However, in order to produce proposed points in each individual chain, the archive of the past states is used. This sampling is designed for accelerating the convergence and for problems with high parametric dimensions and uses only three to five parallel chains in order to properly search the posterior density function (Schoups & Vrugt 2010).
Laloy & Bielders (2009) studied DREAM(zs) function for hydrologic models with many parameters. When using the DREAM(zs) method, in order to estimate the uncertainty of the parameter in the hydrologic model, confidence in selection of the convenient likelihood function that could produce reliable parameters is of importance. Estimating good fitting between the observations and simulated corresponding values is mainly dependent on the likelihood function selection (Beven & Binley 1992).
The generalized likelihood uncertainty estimation (GLUE) method was used in studying the influence of likelihood function selection on the uncertainty of XAJ-RR hydrologic model in the Nangao region in China. Alazzy et al. (2015) showed that likelihood function selection has a great influence on the sensitivity of the parameters and studying uncertainty. Accurate likelihood function selection requires a descriptive explanation of the model error distribution for statistic inference, uncertainty conclusion and accuracy of the prediction intervals (He et al. 2010).
Additionally, those likelihood functions must be used that are capable of showing the uncertainty in the model structure, input data, and parameters. In order to estimate the uncertainty of the parameters, there are two categories of informal and formal likelihood functions (McMillan & Clark 2009; Vrugt et al. 2009a). Informal likelihood functions are the subjective likelihood probabilities and do not result in a certain model for the random error series (Smith et al. 2008).
In contrast, formal likelihood functions are derived from a hypothetical statistical model on residual errors (Box & Tiao 1992). However, this method has been criticized due to it highly relying on residual errors hypotheses (Thyer et al. 2009) where, in many cases, the residual errors are correlated, nonstationary (heteroscedasticity), and non-Gaussian (Kuczera 1983). Revoking SLS (standard least squares) assumptions may result in biased estimations of the parameters and lead to parameter or prediction uncertainties. In order to decrease the SLS error assumptions, some formal methods are provided (Sorooshian & Dracup 1980; Schoups & Vrugt 2010). Efforts have also been made to separate the various sources of error in hydrologic modeling (Vrugt et al. 2008, 2009b; Thyer et al. 2009; Renard et al. 2011).
In order to study the influence of model structural error Vrugt et al. (2009b) used the first-order autoregressive (AR) of residual errors. The first-order AR of residual errors scheme removes the temporal autocorrelation of the residuals (Sorooshian & Dracup 1980; Bates & Campbell 2001).
Many of the hydrological researches have studied the influence of likelihood function selection on uncertainty analysis in the GLUE method and have shown that the likelihood function selection could directly influence the uncertainty analysis and parameters (Freer et al. 1996; Stedinger et al. 2008; Freni et al. 2009; Alazzy et al. 2015).
For hydrological research, sufficient likelihood functions should be used to provide more accurate parameters to estimate the uncertainty of input parameters of hydrologic models. For this reason, the Bayesian approach used formal and informal probability functions to estimate parameter uncertainty (Mantovan & Todini 2006; Beven et al. 2008; Stedinger et al. 2008; McMillan & Clark 2009; Vrugt et al. 2009b; Cheng et al. 2014).
Nourali et al. (2016) investigated the impact of likelihood function selection on estimating the uncertainty of the rainfall-runoff model (HEC-HMS) using the MCMC algorithm. The results showed that the likelihood function related to the serial dependence of residual errors can obtain a valid posterior distribution and thus can be employed for further applications.
Cheng et al. (2014) compared the likelihood functions for parameter uncertainty analysis and calibration using the MCMC method. The results showed that the type of likelihood function essentially impacts the calibrated parameters and the re-enacted consequences of high- and low-flow components.
He et al. (2010) investigated the effect of likelihood function selection on estimating crop model parameters using the GLUE method. The results of their research showed that the combinatorial probability method has a great impact on parameter estimation. A hybrid approach based on the probability of products in each set of observations can significantly reduce the uncertainty in the posterior distribution.
The main purpose in this study is the investigation of the importance of the likelihood function selection on the uncertainty analysis using DREAM(zs) algorithm in simulation-optimization model. Accordingly, the quantitative and qualitative simulation model of the groundwater was provided by MODFLOW and MT3DMS, respectively. Subsequently, this model was linked to the DREAM algorithm in MATLAB. In the aforementioned algorithm, the influence of three informal likelihood functions and two formal likelihood functions on the analysis of the uncertainty of model parameters in DREAM(zs) method was studied. The outputs resulting from linking the model and algorithm uncertainty were inserted into the multi-objective particle swarm optimization (MOPSO) algorithm and the most optimized results were derived considering the objective function, which includes minimizing the groundwater level drop and total dissolved solids (TDS).
MATERIALS AND METHODS
The studied region in this research is the Isfahan–Barkhar region, which is located in the central part of Iran. This region has an area of around 1,500 km2 and is one of the largest and most significant regions in the Isfahan province (Figure 1).
Isfahan-Barkhar basin has faced an increase in TDS in most regions during the recent years. Based on current statistics and maps, the total arable land in the studied region is around 70 hectares, which is decreasing due to the water scarcity in the recent years and the high expansion of cities and industrial areas. The most prominent activity in the region is the agricultural activity with an area of around 31,000 hectares. Due to the poor quality of the groundwater in most parts of the region for drinking, the water is supplied from the Zayanderud River to most rural and urban areas.
Since one of the most significant features in groundwater modeling is the parameter of recharging from surface, initially its temporal and local distribution was determined using a Soil and Water Assessment Tool (SWAT).
A quantitative and qualitative simulation model of groundwater was developed based on the available statistics and data from the studied region. In order to achieve this, a GMS (groundwater modeling system) v10.1 model, which included special features such as feasibility of developing a conceptual model and the capability to automatically convert it to numerical model and adaptability with geographic information system (GIS)-based systems, was used. In GMS, MODFLOW and MT3DMS models were used for quantitative and qualitative simulation. Calibration of these models is particularly important before being used in predicting and evaluating system responses to what has not been seen. In this regard, the manual and also automatic calibration methods are considered by the PEST (parameter estimation) model. As already mentioned, DREAM uncertainty method is used. This method, which studies the parametric space with the least number of repetitions, was linked to the simulator in MATLAB by coded link. In DREAM uncertainty algorithm, five types of likelihood functions were used to assess the output of simulation-optimization.
Moreover, to link the simulator to the optimizer, coding in MATLAB was used. Subsequently, the developed simulator models with the capability to calculate the processes and requirements for planning the exploitation of the aquifer were linked to a multi-objective metaheuristic optimizer model in order to optimize the exploitation of the aquifer and the studied water resources system (Kamali & Niksokhan 2017). The decision variables in this optimizer model are pump discharge values in exploited wells. The objectives include minimizing the annual groundwater table changes and minimizing the groundwater qualitative changes; while the constraints include the TDS in wells, the available surface water amount, pump discharge in exploited wells, and limitation related to the water supply. For each set of the aforementioned variables, simulator models of MODFLOW and MT3DMS were recalled and embedded to MOPSO. Subsequently, the general heuristic rules and updating of the variables continue, along with guiding the evolutionary search process and the basic population in the metaheuristic optimizing model for optimizing the exploitation variables and designing, until convergence is reached. Figure 2 demonstrates the research framework.
The closer the residuals to 0, the better the observed values are simulated. Although due to the error of the input data, lack of efficiency in model structure, errors in measuring Oj output, and uncertainty correlated with the correct selection of θ values, the residual values will never be 0. Better match of the observational and simulated flow and the residual value becoming closer to 0 is carried out by regulating the values of parameters. Usually the input data uncertainty and model structure are not considered as the error potential source, which is not realistic for real-world applications (Vrugt et al. 2008, 2009b), and hence developing an inferential method that operates all separated and convenient error resources is quite convenient (Vrugt et al. 2008).
If it is supposed that the error value in Equation (8) is mutually independent (non-correlated) and has a Gaussian distribution with error variance of σ20, the likelihood value of Equation (4) will be likelihood function of L4. Using this formula is convenient, but the assumption of independent error in hydrologic modeling is not realistic.
The AR-1 of error residuals meets the time residual autocorrelation. The R parameter in this model is estimated along with the hydrologic model parameters during calibration. The prior uncertainty range of R parameter (correlation coefficient) is considered to be between 0 and 1 (Schoups & Vrugt 2010).
Parameters of hydraulic conductivity (HC), horizontal anisotropy (HA), storage coefficient (SC), and recharge (RCH) are considered as the input parameters with errors in this research.
F1 and F2 are the objective functions, Ntp is the total number of planning months that is equal to 125, Nj is the total number of model cells that is equal to 15,912, Htj is the water table at tth time step in ith cell, H1j is the water table at the first time step in jth cell, Ctj is the water TDS at tth time step ith and C1j is the water TDS at the first time step in jth cell.
GWtp is the total water pumped from the agricultural wells in the month of tp, tp is the month counter, td is the number of the days in the month of tp, Qk.tp is the pump discharge of the well of k in the month of tp (m3/day), k is the pumped wells number counter, NW is the total number of pumped wells in the region, Ntp is the total number of planned months, Ck.tp is the TDS in the well k in the month tp (mg/lit), and SWtp is the amount of surface water used in the month of tp (m3). Dtp is the water required for agriculture in the month of tp based (m3) and SWmintp is the minimum amount of used surface water in the month of tp (m3), which is equal to 0 (m3). SWmaxtp is the maximum amount of used surface water in the month of tp (m3), which is equal to the total water needed for agriculture in the region (m3). Cmin is the minimum TDS in the used water, which is equal to 0; Cmax is the maximum TDS in the used water, which is equal to 2,000 mg/L. The total water demand in the studied region is 600 MCM provided from the surface and groundwater.
RESULTS AND DISCUSSION
Linking SWAT and MODFLOW
In this study, a consecutive or one-way combination method was used for combining two models of SWAT and MODFLOW. In this method, the construction and calibration of two models of SWAT and MODFLOW are separate from each other. In other words, the results of the SWAT model, such as recharge, are used as the input data in the MODFLOW model. After implementing the SWAT model, groundwater recharge values are converted into the format of input in MODFLOW in GMS along with its temporal and local changes. The SWAT model for recharge values includes a text file in which the time step of recharge values is provided for each HRU (hydrologic response unit) is given in mm/month. In this file, the recharge values for each HRU is as GW-RCH-i, in which i includes each HRU number. From HRULandUseSoilsReport.txt file, the data related to any sub-basin is extracted separately. This data includes number of HRUs and the area of HRU in each sub-basin and other data such as land cover, land slope and soil type in each HRU. According to the above information, the recharge value in each sub-basin is derived from the area occupied by HRUs available in it and their recharge values, which are estimated using recharge formulas. The above-mentioned stages are coded in a computer program in MATLAB that produces recharge value in MODFLOW for any sub-basin in the form of a text file, from the SWAT output files. The recharge value derived from any sub-basin is assigned to its corresponding cells in MODFLOW. The cells covered by any sub-basin are determined by GIS through polygon transfer of SWAT sub-basins to GMS. Using the proposed method, the recharge values derived from the SWAT model are transferred to the MODFLOW model.
In this research, the MODFLOW model in the format of GMS software was used in order to study the water balance condition and the discharge capacity of the aquifer. To calibrate in the steady state, the first period from October 7, 2002 to November 7, 2002 was selected because the water table change is insignificant. In the steady state, the HC and HA parameters were retrieved and, in transient condition, the conductivity parameters and HA and recharge parameters (derived from SWAT) were formulated in 95 monthly time steps from November 7, 2002 to October 7, 2010, and the specific drainage was estimated. The quantitative calibration model was manually set so that the model outputs were adapted with the observational as much as possible. Subsequently, the automatic calibration was carried out using PEST considering the initial values derived for parameters at the end of manual calibration stage and introducing allowed maximum and minimum values for each of the parameters and applying other regulations in this software.
Figure 3(a) demonstrates the calibrated steady-state result for November 7, 2002. Green color demonstrates errors under 0.5 unit, yellow color indicates errors in the range of 0.5 and 1 unit, and red color demonstrates errors larger than 1 unit. The consequences of the monitoring well show that the majority of computed groundwater elevation is within a 0.5 m interval from the observed value. Figure 3(b) demonstrates the validated outcome for the end of the simulation model on March 7, 2013.
Coupling MODFLOW model, DREAM algorithm and MOPSO
In order to read the input set inserted in the DREAM(zs) algorithm, it is necessary to prepare the format of the input files and, after inserting them in DREAM(zs) algorithm and correcting the errors (uncertainty), a file is coded in MATLAB so that MODFLOW model can read them and calculate the proper output. For this purpose, a code was written to separate the aforementioned parameters from the model parameters. Moreover, the code prepares the proper format for being accepted by DREAM(zs). This program reads the inputs from the file with the extensions of MODFLOW model input and converts them to a vector format. This vector categorizes each input and inserts them in a matrix to DREAM(zs). Subsequently, by executing the mentioned file in MATLAB, the program carries out the conversions and drawings automatically, and inserts in DREAM(zs). After returning from DREAM(zs) it converts the file format into the initial format and saves them. After providing the inputs with the proper format for DREAM(zs) algorithm, these inputs are recalled by MODFLOW and inserted in DREAM(zs) algorithm, and its output are the inputs that the model optimizes based on the minimum head difference. Subsequently, the MODFLOW model receives inputs from the previous stage and prepares the outputs. This stage is repeated as many times as needed until convergence is reached. The received output from the previous stage is then recalled by MOPSO algorithm, and ultimately, the optimized head and TDS in each aquifer network cell are reached, considering the desired objective functions. It has to be mentioned that all the items as mentioned earlier and related codes were carried out in MATLAB.
Figure 4 presents the HC, SC, and HA inputs for two replications for which DREAM has predicted. As observed, the head diagram on the right side of these diagrams is the function fitness rate.
Appendix A presents the best input determined by DREAM(zs) algorithm and its difference rate with observed values. What is required is that the value that DREAM(zs) predicts is close to the real value and the main nature of the input parameters is not lost. Among the defined likelihood functions, function No. 1 has applied a higher degree of change compared to the other functions. It has to be mentioned that Appendix A presents the HC and recharge in 100 cells, in order to compare the likelihood functions and variations that are applied to these two parameters.
To show the sensitivity of the parameters toward different likelihood functions, the cumulative distribution of the parameters used for five likelihood functions were drawn against their observed values. Figure 5 shows that the likelihood function has a great influence on the parameters sensitivity and the sensitivity of all parameters is not the same in different likelihood functions. For this reason, the selection of likelihood functions should be carried out with more care due to their significance in parameters values.
Given the above description and the influence of the likelihood function on the sensitivity of the parameters, researchers such as Abebe et al. (2010) and Alazzy et al. (2015) reached a similar conclusion in studying the cumulative distribution of later parameters in the hydrological models, HBV and XAJ-RR, respectively, and noted that not all parameters have the same sensitivity to different likelihood functions.
For most parameters, especially the recharge parameters, likelihood functions No. 4 and No. 5 have a completely different influence of the sensitivity of the parameters compared to the other functions. Cumulative distribution comparison showed that for most of the parameters, cumulative distribution under likelihood functions No. 1–No. 3 are almost the same and these likelihood functions have almost the same influence of the sensitivity of the parameters. The influence of selecting these likelihood functions is therefore insignificant on the parameters.
Results from the drawing of the posterior distribution of the functions showed that the posterior distribution of most parameters for the likelihood functions No. 1–No. 4 are, to a great extent, similar in the parameter range and distribution shape. Moreover, in likelihood function No. 5, the posterior distribution is different from other likelihood functions in the parameter range and distribution shape. Posterior distribution for most parameters by the likelihood function No. 5 is determined to be better and have lesser uncertainty so that the posterior distributions derived by these likelihood functions have normal distributions for most of the parameters. These diagrams show that most of the parameters are better determined by these functions and have lesser uncertainty and higher identification capability. (Appendix B) In addition, the prior distributions diagram was drawn for different parameters, which is similar to the posterior distribution diagram in likelihood function No. 5.
Studying the parameters coefficient of variation
In order to determine the parameters sensitivity degree, coefficient of variation (CV) statistic is used. Lower CV for the parameters shows higher sensitivity of those parameters (He et al. 2010; Pourreza-Bilondi et al. 2013; Shafiei et al. 2014; Alazzy et al. 2015). The mean and coefficient of variation of the used parameters are presented in Table 1. Comparing the parameters in various likelihood functions, it is determined that in likelihood function No. 5, the CV for most parameters is smaller compared to other likelihood functions. This suggests that most parameters are better determined by likelihood function No. 5 and have lower uncertainty, higher sensitivity, and higher identification capability. These results prove those in Appendix B, which are related to the form of posterior distribution functions of these parameters.
|Average .||CV .||Average .||CV .||Average .||CV .||Average .||CV .||Average .||CV .|
|Average .||CV .||Average .||CV .||Average .||CV .||Average .||CV .||Average .||CV .|
According to Table 1, the lower rate CV for recharge parameter and HC suggests that the parameters variation range is smaller compared to the initial range of the parameters, and these parameters are considered as the most sensitive parameters. Results from studying the parameters posterior functions and CV suggested that likelihood functions No. 4 and No. 5 have a relatively larger influence on results of parameters uncertainty analyses compared to other likelihood functions.
The results related to linking the model to algorithm MOPSO are presented in Appendix C. The performance of likelihood functions in the convergence of MOPSO algorithm shows that the likelihood function No. 5 performs better than other likelihood functions.
The ultimate output value of the model for the last 20 replications is presented in Appendix D, and as observed, the performance of likelihood function No. 5 and its speed in reaching convergence is better than other functions.
Since the objective in MOPSO algorithm is to minimize head loss and TDS rate, the percentage of the output head and TDS from simulation-optimization was drawn in DREAM(zs), considering various likelihood functions. As could be observed in Appendix E, the likelihood function No. 4 and No. 5 had the best performance in this regard.
The model output in the simulation-optimization mode with and without the DREAM(zs) algorithm is shown in Appendix F. The maps show the involvement of uncertainty algorithm (elimination of observational errors in model inputs) in simulation-optimization, which leads to an increase in groundwater level by approximately 1 m. The level of groundwater level improvement is about 4 m, considering the observational head level in some parts of the study area. The second objective in this study, which was to minimize TDS, led to the improvement of the parameter status by an average of 745 mg/L. Table 2 presents the statistics of the observed TDS parameters against its optimal value.
|Observed .||Optimum .||Observed .||Optimum .||Observed .||Optimum .|
|Observed .||Optimum .||Observed .||Optimum .||Observed .||Optimum .|
In this research, the influence of three informal and two formal likelihood functions on the estimation of simulation-optimization parameters under DREAM(zs) was assessed. The likelihood function No. 1 is the Nash–Sutcliffe efficiency (NS); the likelihood function No. 2 is a minimum mean square error; the likelihood function No. 3 is a model estimation error variance; and the likelihood function No. 4 is the relation between standard least squares fit and Bayesian inference. In likelihood function No. 5, the sequential dependency of the residual errors is calculated by using first-order AR of residual errors.
Among the defined likelihood functions, the values predicted by function No. 1 were more different than the other likelihood functions. The likelihood function has a great effect on the sensitivity of the parameters and the results showed that the sensitivity of all the parameters is not the same for different likelihood functions. Due to their importance in the parameter values, the choice of the likelihood functions should therefore be considered more carefully. Low values of coefficients of variation for the recharge and HC parameter indicate that the range of the parameter changes is smaller than the initial range of parameters and these parameters are considered as the most sensitive parameters. Functions No. 4 and No. 5 also had a completely different effect on the sensitivity of the parameters compared to other likelihood functions.
Results from the drawing of the posterior distribution of the functions showed that the posterior distribution of most parameters for the likelihood functions No. 1–No. 3 are, to a great extent, similar in the parameter range and distribution shape. The posterior distribution for most of the parameters is determined better by likelihood functions No. 4 and No. 5 and they also have lower uncertainty and higher identification capability. In likelihood function No. 5, compared to other likelihood functions, the CV for most parameters is smaller, suggesting that most parameters are better determined by likelihood function No. 5 and have lower uncertainty, higher sensitivity, and higher identification capability. MOPSO algorithm outputs indicated that the likelihood function No. 5 has a higher speed in reaching convergence. The results of this function also showed that objective functions had a better performance compared to the other likelihood functions. With respect to DREAM(zs) uncertainty, the results of MODFLOW output parameters showed that these parameters were faster in reaching convergence by MOPSO algorithm and likelihood function No. 5 had the highest speed.
The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/ws.2020.003.