Accurate hydraulic conductivity estimates are vital for groundwater evaluation. Usually, interpolations of hydraulic conductivity data are needed to obtain spatial estimates over larger areas, but the results present a high uncertainty which can be reduced by adding a secondary variable in the estimation. In this paper, the influence of the number and spatial configuration of hydraulic conductivity (K) and hydraulic head (HH) data on the estimation of K is evaluated using univariate and bivariate geostatistical-Kalman filter approaches (similar to kriging and cokriging, respectively). A synthetic case based on a transient groundwater flow model is used to generate different numbers, spatial arrays, and data. With these data, variogram models for the univariate and bivariate cases were fitted and used to calculate the corresponding covariance matrices for the Kalman filter. The results show that K estimates are more reliable when HH data is added than when only K is used, independently of the number and distribution of the data, since there is a better agreement between the calculated errors and estimate error variances. HH data provides valuable information only where K is not sampled. This evaluation could support the design of optimal sampling strategies to obtain reliable K estimates.

  • Accurate hydraulic conductivity estimates are essential for groundwater evaluation.

  • A geostatistical–Kalman filter method that includes secondary hydraulic head data is used.

  • The effect of different data numbers and locations on the hydraulic conductivity estimates is evaluated.

  • Estimates are more reliable when hydraulic head data is added, independently of the number and distribution of the data.

Graphical Abstract

Graphical Abstract

In practice, infiltration and aquifer tests are carried out to determine the hydraulic conductivity values that characterize some areas of an aquifer. However, these field tests usually cover only a tiny portion of the entire aquifer extent. Geostatistical techniques have often been used to overcome this limitation, obtaining minimum variance estimates at unsampled sites. These estimates represent an initial spatial configuration of the hydraulic parameter for the construction of conceptual models, the execution of water balances, or as reference fields for groundwater model calibration processes.

Geostatistical methods have been compared with different interpolation methods. For example, Panagopoulos et al. (2006) compared ordinary kriging, inverse distance, and Thiessen polygons; the geostatistical kriging method gave the best results. Similarly, Ghosh & Pekkat (2019) compared kriging, inverse distance weighted, natural neighbor, spline, and trend interpolation methods for the spatial prediction of hydraulic conductivity; they got the best results with kriging even for cases with a highly heterogeneous medium with the additional benefit of providing a measure of the estimate uncertainty, an important feature of geostatistical techniques with respect to traditional interpolation methods. For these cases, a good agreement between the obtained errors and the estimate error variances gives reliability to the selected variogram model.

The hydraulic conductivity can be estimated within an aquifer using data obtained from field or laboratory tests (Ahmad et al. 2021; Chandel et al. 2022). However, these data are usually scarce and characterize only small portions of the total area of an aquifer; additionally, hydraulic conductivity varies by several orders of magnitude in small regions. For these reasons, the spatial configurations of K obtained with traditional interpolation methods present a high uncertainty in most cases. In this sense, bivariate geostatistical techniques (using a more sampled secondary variable) have represented an alternative to improve hydraulic conductivity (K) or transmissivity (T) estimates reducing the estimation uncertainties.

Freixas et al. (2017) revealed preferential flow paths by analyzing cokriging transmissivity estimates using the storage coefficient as a covariate. Also, Patriarche et al. (2005) used two approaches to estimate hydraulic conductivity. In the first, cokriging was used to estimate K (direct approach), and in the second, cokriging was used to estimate transmissivity and K was retrieved dividing T by the well screen length (indirect approach); for both, the specific capacity was the covariate. The best results were obtained by the indirect approach: log () data, and dense T sampling. Ahmed & de Marsily (1987) also used specific capacity as a secondary variable. Frohlich & Kelly (1985), Kupfersberger et al. (1992), El Idrysy & De Smedt (2007), Xu et al. (2017), and Chen et al. (2018) included electrical resistivity as secondary data to get K or T spatial estimations when electrical resistivity and the hydraulic parameter have a good correlation, and the electrical resistivity data is densely sampled. Other secondary variables have been used to estimate T using geostatistical methods, for example, seismic data (Rubin et al. 1992), and slope of the water table (El Idrysy & de Smedt 2007).

As far as we know, Ahmed & de Marsily (1989) were the first to test the use of transmissivity and piezometric measurements simultaneously to estimate transmissivity (T) with cokriging and compared the T estimates with those of a kriging approach. The hydraulic head (HH) and the transmissivity natural logarithm (lnT) were assumed to be related through an equation. Thus, the covariances of lnT, HH, and their cross-covariance were related through a differential equation. A set of Gaussian covariance models that satisfied these differential equations was proposed. The authors tested their approach using a hypothetical example based on 2,000 km2 of a chalk aquifer in northern France. A conditional geostatistical simulation of T was generated based on 72 actual transmissivity measurements. Thirty-five synthetic T and 144 HH measurements were generated using a flow model with the simulated T. The model did not include sources or sinks. For the geostatistical analysis, the HH trend was removed using a simple flow model; the fitted ln T covariance was isotropic, and the covariance of the HH residuals and the cross-covariance of HH and T were anisotropic. For the kriging T estimates, simple kriging was used, whilst residual cokriging was used for the bivariate estimates. The results indicate that the errors of the cokriging estimates were smaller than the kriging ones. However, the mean squared error (MSE) of cokriging was larger than that of kriging (5.79 and 2.4 m2/d, respectively), indicating that the cokriging variance underestimated the errors.

Most work undertaken to estimate K or T using piezometric data has been done to solve the inverse problem for groundwater flow and transport models (Kitanidis & Vomvoris 1983; Xu et al. 2013; Erdal & Cirpka 2016) maybe because the models establish a natural relation between K or T and HH. In the context of stochastic groundwater modeling, ensemble Kalman filter methods have been used to estimate K using HH or drawdown measurements (Panzeri et al. 2015; Xu & Gómez-Hernández 2015; Carrera et al. 2020). Other approaches to estimate saturated hydraulic conductivity in large areas of an aquifer consist of using artificial intelligence algorithms (Gorgij et al. 2018; Yusefzadeh & Nadiri 2021; Singh et al. 2022). However, to our knowledge, none of them perform a sensitivity analysis of the number and spatial distribution of the available data.

Júnez-Ferreira et al. (2020) took up the use of bivariate geostatistical-based techniques to estimate hydraulic parameters within an aquifer. They present univariate, bivariate, and bivariate space-time geostatistical-Kalman filter-based approaches (similar to kriging, cokriging and spatio-temporal kriging) to estimate K in heterogeneous aquifers. For the bivariate case, data were used. The correlation structures between data of the principal () and the auxiliary () variable were determined through geostatistical analyses and the estimates were obtained through a static Kalman filter (that does not use the model equations). Data from a synthetic numerical flow model were used to test the method. A fixed number of spatially uniformly distributed K and data were used to estimate K with the static Kalman filter.

Several authors have pointed out the importance of using reliable variogram estimates to reduce the estimation error in kriging interpolations (for example, Gumiere et al. 2014; Oliver & Webster 2014). The number and spatial locations of available data directly influence the selected variogram model and its parameters and, therefore, the variable estimate. But, small changes in the model function and its parameters can significantly affect the kriging error variances, which are essential to predict the estimation error magnitude. Oliver & Webster (2014) recommend using 100–150 data to obtain reliable variogram estimates. However, this recommended highly dense sampling is not a common situation in many real cases, specifically for hydraulic conductivity. Also, as far as we know, general criteria on the number of data needed to get reliable estimates using cokriging are not available. Therefore, evaluating the estimation method's performance under different numbers and spatial configurations of the available data is advisable. Even though geostatistical methods can give reasonable K estimates, it is essential to evaluate the model performance when the estimates are used for numerical modeling purposes.

This paper presents a sensitivity analysis for the univariate and the bivariate estimation methods proposed by Júnez-Ferreira et al. (2020), evaluating the hydraulic conductivity estimates for different numbers and spatial locations of the available data for the principal and auxiliary variables. These estimates are compared with the known hydraulic conductivity values of a synthetic numerical flow model. This evaluation aims to support the design of optimal sampling strategies (for the measurement of hydraulic head and the execution of pumping tests) oriented to obtain reliable hydraulic conductivity estimates. These actions, among other benefits, could reduce calibration efforts by providing adequate reference fields and increasing the knowledge of the correlation structure for field variables.

The remaining sections of this paper are structured as follows. In the Materials and Methods section, the geostatistical theory and static Kalman filter used in this paper are presented, together with the structure of the covariance matrices for the univariate and bivariate cases. The Case Study section includes the characteristics of the synthetic numerical flow model employed and the datasets used for the estimation of K and its comparisons; it also includes the error statistics used for these evaluations. In the Results and Discussion section, the parameters of the selected variogram models, the evaluation of the K estimates, and simulations of the numerical flow model for the univariate and bivariate cases are shown and discussed. Finally, Conclusions are presented.

The influence of the data number and spatial locations on the hydraulic conductivity estimation was evaluated for univariate and bivariate geostatistical–Kalman filter estimation approaches. The evaluation was done using data generated from a previously built transient numerical flow model. Different number, spatial arrays, and combinations of K and simulated were used for estimating K. Different variogram models for the univariate and bivariate cases are obtained from geostatistical analyses. They are used to obtain the corresponding covariance matrices that the static Kalman filter employs for estimating K. The hydraulic conductivity was estimated by two different approaches:

  • - Univariate estimation (based on the spatial distribution of hydraulic conductivity data only);

  • - Bivariate or cross estimation (hydraulic conductivity as the primary variable and hydraulic head for the last simulation as the secondary variable).

A description of the geostatistical procedure to derive the univariate, and bivariate covariance matrices is presented in the following subsections. Relevant aspects of the static Kalman filter theory and its implementation for the proposed methodology are also included.

Geostatistical theory

Geostatistical analyses were done to get the spatial and cross structures of covariance matrices derived from variograms.

The sample variogram, , for the univariate case is calculated with:
(1)
where is the number of data pairs, and are the tail and head values of the primary variable data (K or ln K being the primary variable depending on the case; the cases are explained in the Case Study section below) is the position i of coordinates , and the separation vector is specified with some direction and distance (lag) tolerance (Deutsch & Journel 1998).
When a variogram model is selected, the covariance matrix of a random field Y(X) can be derived by using the following property for a second-order stationary spatial random field:
(2)
where is the spatial covariance, is the variance, is the Y spatial variogram model, and is the separation vector between two positions of the covariance matrix.
For the bivariate case, the sample cross-variogram for variables HH and Y is used:
(3)
For a second-order stationary process, a cross-covariance and pseudo-cross variogram are related by (Webster & Oliver 2007):
(4)

Static Kalman filter

The static Kalman filter was used for the hydraulic conductivity estimation. It does not use the system model equation and uses only the measurement equations (Herrera 1998), as follows:
(5)
in which the measurement vector is related to the system state through the measurement matrix and the addition of a white Gaussian noise (measurement error). has a covariance matrix (both may change for different measurements; however, in this paper, they are assumed constant and equal to zero), and is a measurement label that gives an order to the measurements.
The discrete Kalman filter state is updated by the following equations:
(6)
(7)
(8)
where and are the state vector estimate and the error covariance matrix estimate using the first r measurements, respectively; is the Kalman gain, and is the identity matrix. and are, respectively, the updated state vector and covariance matrix using measurements.

For the recursive implementation of these equations, a prior state estimate and a prior covariance matrix estimate are required; the latter is obtained from a geostatistical analysis for the case study. After each update, the process is repeated, starting from the last state estimate and its error covariance matrix (Briseño-Ruiz 2012).

For each new sampling data, the Kalman filter modifies the state vector estimate and the covariance matrix estimate using equations 6, 7, and 8. For the univariate case, the available hydraulic conductivity data are used to update the estimate, and for the bivariate case, the hydraulic conductivity and hydraulic head data are used.

In this paper, the state vector is , for the univariate case, and , for the bivariate case, where has components , is a given position, and Y is hydraulic conductivity or its natural logarithm , depending on the test case in question (they are explained in the Case Study section below); HH has components . Y length is , and length is , so the total length is , which is also the number of files and rows of the covariance matrix . Finally, one measurement is processed at a time, so is a zero-row vector but at the measurement position at which its value is 1. For the bivariate case, even when the vector includes and , 's estimate is the only one used. The corresponding subvector of the prior state vector equals the available data mean value of Y or .

For the univariate case, the prior covariance matrix used is:
(9)
where is calculated using Equation (2) with .
For the bivariate case, the prior covariance matrix used is an assemble of the univariate and cross-covariance matrices:
(10)
where is defined analogous to in Equation (9), and
(11)
where is calculated using Equation (4). is the transpose of the matrix .
The formulas proposed by Sichel (Samper Calvete & Carrera Ramírez 1990) were used to back-transform the estimates of natural logarithm data to the original variable:
(12)
(13)
(14)
where = mean value of Z (original variable), = mean value of Y (transformed data), = variance of Y, = estimate of Z, = estimate of Y, = estimate error variance of Y, = estimate error variance of Z.

Evaluation error statistics

The different error statistics used to evaluate the performance of each analyzed case are given below:

Mean error

Average value of the difference between observed and estimated values. Theoretically, the best values for this error statistic are those closer to zero, however these results deserve a deeper analysis since compensation among large positive and negative errors can occur:
(15)
where and are the estimated and observed values at position i, and n is the number of evaluated positions.

Root Mean Square Error (RMSE)

By calculating the square root of the MSE, an average magnitude of the estimation error is obtained.

Standardized Mean Squared Error (SMSE)

This measures the goodness of fit of the variogram model by calculating the average value of the ratio between the squared estimation errors and the estimate error variances. Values closer to 1 represent a good agreement.
(16)

Mean Absolute Percentage Error (MAPE)

This is the percentage average ratio between the absolute errors and the available data. The lower this value, the better estimations. It defines the average size of the errors with respect to the corresponding data.
(17)
The synthetic case study is based on the example presented by Júnez-Ferreira et al. (2020). It consists of a confined aquifer of a 5,000 m × 5,000 m area and a depth of 500 m, with K values ranging from 0.2 to 10 m/d, corresponding to clay to sand ranges, and a constant specific storage coefficient value of (Figure 1(a)). The case study was simulated with a transient flow model on a block-centered finite-difference grid composed of 20 columns and 20 rows (and a grid resolution of 250 m in X and Y directions). The north and south boundary conditions were constant hydraulic head values of 945 m and 950 m, respectively, and the east and west were no-flow. Three fully-penetrating pumping wells were included with flow rates of 1,000 m3/d each. Two years were simulated, with 4-month' time steps, with an initial hydraulic head of 1,000 m for the entire domain. The set of spatially-distributed hydraulic head results for the last simulated time (400 data) was stored, and different subsets were used as data surrogates in the examples (explained in the Dataset section below).
Figure 1

(a) Hydraulic conductivity spatial configuration and pumping well locations for the groundwater flow model, (b) Hydraulic head spatial configuration for the last simulation time.

Figure 1

(a) Hydraulic conductivity spatial configuration and pumping well locations for the groundwater flow model, (b) Hydraulic head spatial configuration for the last simulation time.

Close modal
The groundwater flow equation was solved using the MODFLOW transient flow simulator (Harbaugh et al. 2000). The flow equation for an incompressible or slightly compressible fluid in a saturated porous medium can be expressed by combining Darcy's Law and the continuity equation (Bear 1972; Freeze & Cherry 1979):
(18)
where K is the hydraulic conductivity [L/T], h is the hydraulic head [L]; W represents sources or sinks [L3/T]; is the specific storage coefficient [1/L]; t is the time ; is the divergence operator of a vector field, and is the gradient operator of a scalar field. The solution of the flow equation is obtained by applying the finite differences numerical method.

Datasets

Different sets of K and data (corresponding to the final simulation time results) were used to estimate K and in the modeling grid nodes. The resulting K and estimates were compared with the reference K and values (Figure 1(a) and 1(b), respectively). A flow direction from South to North can be seen imposed by the defined boundary conditions, with lower hydraulic gradients in the North where the larger K values are found. Furthermore, equipotential lines are closer to straight lines in the North half while a zig-zag type behavior is produced in the contrast of low K values in the South.

For the univariate case, subsets of the 400 hydraulic conductivity reference values were used as ‘data’. The analyzed cases use 20, 40, and 60 K data, selected arbitrarily but seeking a spatial distribution close to homogeneous through the entire aquifer, and 20 K data distributed: (a) in the Northern half of the aquifer, and (b) in the Western half of the aquifer. Additionally, six analogous cases using natural logarithm of K data were analyzed.

For the bivariate case, the hydraulic conductivity was estimated using 20, 40, and 60 K data, or their natural logarithm, distributed through the entire aquifer, in combination with 60 and 400 data for the last simulation time (used as the secondary variable), and 20 K data, or their natural logarithm, distributed in the Northern and Western halves, combined with 60 or 400 data. The 400 data cover the entire estimation grid, and the 60 data locations are the same as for the 60 K data.

Univariate estimation

 The state vector length is N = 400, and the initial state is the average value of the available data ( or ) for each case. Then, the covariance matrix for each case has 400 rows (structured as shown in Equation (9)), its components are calculated using the selected variogram model for the ‘available’ hydraulic conductivity data for each case. These variograms are shown in Table 1; note that larger sill values are registered when K values are distributed through the entire aquifer extent.

Table 1

Parameters of the fitted variogram models for the univariate cases

Available data
NuggetSillRange
Kln KModelComponent(m2/d2)(m2/d2)(m)
20 – Spherical K 8.25 2,692.61 
– 20 Spherical ln K 0.33 1,802.78 
40 – Gaussian K 0.90 14.00 3,876.95 
– 40 Gaussian ln K 0.19 0.71 3,012.25 
60 – Spherical K 6.70 2,177.68 
– 60 Spherical ln K 0.90 3,555.60 
20 (N) – Gaussian K 0.53 4.32 2,542.10 
– 20 (N) Gaussian ln K 0.10 2,247.48 
20 (W) – Gaussian K 2.63 1,118.03 
– 20 (W) Gaussian ln K 0.20 1,118.03 
Available data
NuggetSillRange
Kln KModelComponent(m2/d2)(m2/d2)(m)
20 – Spherical K 8.25 2,692.61 
– 20 Spherical ln K 0.33 1,802.78 
40 – Gaussian K 0.90 14.00 3,876.95 
– 40 Gaussian ln K 0.19 0.71 3,012.25 
60 – Spherical K 6.70 2,177.68 
– 60 Spherical ln K 0.90 3,555.60 
20 (N) – Gaussian K 0.53 4.32 2,542.10 
– 20 (N) Gaussian ln K 0.10 2,247.48 
20 (W) – Gaussian K 2.63 1,118.03 
– 20 (W) Gaussian ln K 0.20 1,118.03 

Estimates on the finite-differences grid nodes for all cases were obtained using the Kalman filter and compared with the reference values (the 400 K values of the numerical flow model), and the errors were calculated as the difference between them. Besides the minimum, maximum, and mean error, the RMSE, SMSE, and Mean Absolute Percentage Error (MAPE) are reported in Table 2. The best results for each sampling case are highlighted in bold (in Table 2), with regard to the lowest errors and the agreement between these errors and the estimate error variances. For the cases of 20, 40, and 60 K data distributed through the entire aquifer, the best SMSE values (closer to one) correspond to the lowest variances which gives more confidence to those estimates. For the samples distributed at north and west, the best SMSE values correspond to the largest variances. Although the lower MAPE value is obtained for the log-transformed 60 K data option, the overestimation of the estimate error variances is reflected in the corresponding low SMSE value which reduces the reliability of the selected variogram model.

Table 2

K and ln K estimate errors for the univariate cases

Variogram and estimation data
Min errorMax errorMean errorRMSEMAPE (%)
Kln K(m/d)(m/d)(m/d))(m/d)SMSE
20  − 3.51 3.41 0.04 1.06 0.39 50.13 
– 20 −3.59 3.32 0.07 1.12 0.22 46.13 
40  − 3.67 3.00 − 0.12 1.04 0.50 36.95 
– 40 −3.10 3.70 0.37 1.06 0.06 42.43 
60  − 3.21 1.86 − 0.08 0.77 0.34 19.58 
– 60 −3.42 2.46 0.09 0.81 0.06 15.86 
20 (N) – −3.13 6.45 1.97 3.42 3.02 213.98 
 20 (N) − 3.97 6.54 1.98 3.49 2.65 217.44 
20 (W) – −5.25 4.25 −0.08 2.58 2.63 120.75 
 20 (W) − 6.06 3.44 − 0.72 2.69 2.19 97.12 
Variogram and estimation data
Min errorMax errorMean errorRMSEMAPE (%)
Kln K(m/d)(m/d)(m/d))(m/d)SMSE
20  − 3.51 3.41 0.04 1.06 0.39 50.13 
– 20 −3.59 3.32 0.07 1.12 0.22 46.13 
40  − 3.67 3.00 − 0.12 1.04 0.50 36.95 
– 40 −3.10 3.70 0.37 1.06 0.06 42.43 
60  − 3.21 1.86 − 0.08 0.77 0.34 19.58 
– 60 −3.42 2.46 0.09 0.81 0.06 15.86 
20 (N) – −3.13 6.45 1.97 3.42 3.02 213.98 
 20 (N) − 3.97 6.54 1.98 3.49 2.65 217.44 
20 (W) – −5.25 4.25 −0.08 2.58 2.63 120.75 
 20 (W) − 6.06 3.44 − 0.72 2.69 2.19 97.12 

The best results for each sampling case are in bold.

The best estimates were obtained, as expected, when 60 K data distributed through the entire aquifer extent were available, in particular, with K data without transformation. The RMSE value is the best since it is closer to zero than for the other cases; even though the value for SMSE reflects an overestimation.

For the 20 K data estimates, K is overestimated in the North Central and Southern half regions (the lowest K reference values are located in the latter). The higher estimated error variances are in the Southeast and South-Central regions with values above 4 m2/d2.

For the 40 K data case, K is overestimated in the northern half and the middle regions; the South is better characterized than the 20 K data estimates. Low estimate error variances (between 1.01 and 2.00 m2/d2) are present in a more extended area; on the other hand, in the Southeast region, high values, above 4 m2/d2, are present.

For the 60 K data estimates, K is more similar to the reference configuration (Figure 2(a)). Low estimate error variances (between 1.01 and 2.00 m2/d2) cover almost the entire aquifer, with small areas of estimate error variances between 3.01 and 4.00 m2/d2 in the South (Figure 2(b)).
Figure 2

Estimates using 60 K data (univariate case). (a) Estimation map. (b) Estimation error variance.

Figure 2

Estimates using 60 K data (univariate case). (a) Estimation map. (b) Estimation error variance.

Close modal

For the 20 K (N) data case, K is overestimated in the North Central and in the Southern half, where all the estimates fall between 6.01 and 7.00 m/d due to the lack of data in these regions. The estimated error variances in the Northern half are similar to those of the 40 K data case. However, the Southern half presents a highly extended area with values between 4.01 and 5.00 m2/d2.

For the 20 K (W) data estimates, K is estimated adequately in the West, with a slight overestimation in the South; however, K is not well estimated for the Eastern half (due to the lack of data in this region), where all the estimates fall between 4.01 and 5.00 m/d (Figure 3(a)). The estimated error variances in the western half region are similar to those obtained with the 40 K data case; however, the eastern half presents a highly extended area with values between 2.01 and 3.00 m2/d2 (Figure 3(b)).
Figure 3

Estimates using 20 K (W) data (univariate case). (a) Estimation map. (b) Estimation error variance.

Figure 3

Estimates using 20 K (W) data (univariate case). (a) Estimation map. (b) Estimation error variance.

Close modal

The different hydraulic conductivity estimates were used as input for the K values in the numerical flow model and the simulated results were compared with those of the original model. These ‘errors’ were used to evaluate how well the hydraulic conductivity estimates allow the flow mechanisms of the original model to be reproduced (Table 3; the best simulations for each sampling case are highlighted in bold). The simulations with the lowest errors correspond to the estimation using 60 K data, differences with simulations of the original model (400 spatially distributed data for 6 time-steps) were minimal. As long as fewer K data were used for the estimation, larger errors were obtained. The maximum errors correspond to the case of K data grouped in the North followed by the case of K data grouped in the West.

Table 3

model simulation errors for the univariate K estimates

Variogram and estimation data
Min errorMax errorMean errorRMSEMAPE (%)
Kln K(m)(m)(m)(m)
20 – −0.18 1.43 0.23 0.34 0.02 
 20 − 0.24 1.13 0.22 0.30 0.02 
40  − 0.19 1.18 0.12 0.23 0.01 
– 40 −0.19 1.16 0.15 0.24 0.02 
60  − 0.45 0.33 0.03 0.09 0.01 
– 60 −0.22 0.83 0.07 0.15 0.00 
20 (N)  0.00 2.47 0.8 0.99 0.08 
– 20 (N) 0.00 2.5 0.81 1.01 0.09 
20 (W) – 0.03 2.17 0.62 0.78 0.07 
 20 (W) − 0.02 2.13 0.59 0.74 0.06 
Variogram and estimation data
Min errorMax errorMean errorRMSEMAPE (%)
Kln K(m)(m)(m)(m)
20 – −0.18 1.43 0.23 0.34 0.02 
 20 − 0.24 1.13 0.22 0.30 0.02 
40  − 0.19 1.18 0.12 0.23 0.01 
– 40 −0.19 1.16 0.15 0.24 0.02 
60  − 0.45 0.33 0.03 0.09 0.01 
– 60 −0.22 0.83 0.07 0.15 0.00 
20 (N)  0.00 2.47 0.8 0.99 0.08 
– 20 (N) 0.00 2.5 0.81 1.01 0.09 
20 (W) – 0.03 2.17 0.62 0.78 0.07 
 20 (W) − 0.02 2.13 0.59 0.74 0.06 

The best results for each sampling case are in bold.

The MAPE shows that for all cases the model error represents a minimal fraction of the data at each node; this means that all the estimations are robust enough to be used in the numerical flow model which has low sensitivity.

Bivariate estimation

The state vector length for the bivariate case is twice the size of that for the univariate case because it now includes the 400 s of the final simulation time. The prior estimate values for each subvector are the average data values ‘available’ for K or , respectively. The K and covariance matrices and the cross-covariance matrix are derived using the selected variogram models for each case's K and available’ data (Tables 4 and 5). Each matrix is constructed of 400 rows and 400 columns (as stated in Equations (9, 10 and 11) for the bivariate case).

Table 4

Parameters of the fitted variogram models for the bivariate cases with K distributed through the entire aquifer extent

Available data
NuggetSillRange
Kln KHHCorrModelComponent(m2/d2)(m2/d2)(m)
     K – K 0.34 13.47  
20 – 60 −0.768 Gaussian K – HH – −2.46 3,941.79 
     HH – HH 3.07  
     ln K – ln K 0.75  
– 20 60 −0.859 Spherical ln K – HH – −0.95 3,812.19 
     HH – HH 2.38  
     K – K 0.34 13.47  
20 – 400 −0.768 Gaussian K – HH – −2.43 3,941.79 
     HH – HH 3.1  
     ln K – ln K 0.75  
– 20 400 −0.859 Spherical ln K – HH – −1.05 3,812.19 
     HH – HH 2.08  
     K – K 0.91 14.3  
40 – 60 −0.738 Gaussian K – HH – −2.68 3,990.39 
     HH – HH 3.12  
     ln K – ln K 0.17 0.92  
– 40 60 −0.789 Gaussian ln K – HH – −0.62 4,414.23 
     HH – HH 3.37  
     K – K 0.91 14.3  
40 – 400 −0.738 Gaussian K – HH – −2.66 3,990.39 
     HH – HH 3.15  
     ln K – ln K 0.17 0.93  
– 40 400 −0.789 Gaussian ln K – HH – −0.61 4,478.34 
     HH – HH 3.47  
     K – K 0.78 14.4  
60 – 60 −0.776 Gaussian K – HH – −2.53 4,227.33 
     HH – HH 3.26  
     ln K – ln K 0.09 1.08  
– 60 60 −0.847 Gaussian ln K – HH – −0.74 4,107.15 
     HH – HH 3.18  
     K – K 0.78 14.4  
60 – 400 −0.776 Gaussian K – HH – −2.5 4,227.33 
     HH – HH 3.3  
     ln K – ln K 0.09 1.11  
– 60 400 −0.847 Gaussian ln K – HH – −0.7 4,227.33 
     HH – HH 3.3  
Available data
NuggetSillRange
Kln KHHCorrModelComponent(m2/d2)(m2/d2)(m)
     K – K 0.34 13.47  
20 – 60 −0.768 Gaussian K – HH – −2.46 3,941.79 
     HH – HH 3.07  
     ln K – ln K 0.75  
– 20 60 −0.859 Spherical ln K – HH – −0.95 3,812.19 
     HH – HH 2.38  
     K – K 0.34 13.47  
20 – 400 −0.768 Gaussian K – HH – −2.43 3,941.79 
     HH – HH 3.1  
     ln K – ln K 0.75  
– 20 400 −0.859 Spherical ln K – HH – −1.05 3,812.19 
     HH – HH 2.08  
     K – K 0.91 14.3  
40 – 60 −0.738 Gaussian K – HH – −2.68 3,990.39 
     HH – HH 3.12  
     ln K – ln K 0.17 0.92  
– 40 60 −0.789 Gaussian ln K – HH – −0.62 4,414.23 
     HH – HH 3.37  
     K – K 0.91 14.3  
40 – 400 −0.738 Gaussian K – HH – −2.66 3,990.39 
     HH – HH 3.15  
     ln K – ln K 0.17 0.93  
– 40 400 −0.789 Gaussian ln K – HH – −0.61 4,478.34 
     HH – HH 3.47  
     K – K 0.78 14.4  
60 – 60 −0.776 Gaussian K – HH – −2.53 4,227.33 
     HH – HH 3.26  
     ln K – ln K 0.09 1.08  
– 60 60 −0.847 Gaussian ln K – HH – −0.74 4,107.15 
     HH – HH 3.18  
     K – K 0.78 14.4  
60 – 400 −0.776 Gaussian K – HH – −2.5 4,227.33 
     HH – HH 3.3  
     ln K – ln K 0.09 1.11  
– 60 400 −0.847 Gaussian ln K – HH – −0.7 4,227.33 
     HH – HH 3.3  
Table 5

Parameters of the fitted variogram models for the bivariate cases with K data distributed in the Northern (N) and Western (W) halves of the aquifer

Available data
NuggetSillRange
Kln KHHCorrModelComponent(m2/d2)(m2/d2)(m)
     K – K 2.64  
20 (N) – 60 −0.870 Spherical K – HH – −0.59 1,348.49 
     HH – HH 0.69  
     ln K – ln K 0.07  
– 20 (N) 60 −0.886 Gaussian ln K – HH – −0.09 1,118.03 
     HH – HH 0.57  
     K – K 0.19 6.23  
20 (N) – 400 −0.819 Gaussian K – HH – −0.34 3,503.97 
     HH – HH 2.83  
     ln K – ln K 0.17  
– 20 (N) 400 −0.816 Gaussian ln K – HH – −0.06 3,503.97 
     HH – HH 2.83  
     K – K 14.55  
20 (W) – 60 −0.881 Spherical K – HH – −2.79 4,976.19 
     HH – HH 2.55  
     ln K – ln K 0.96  
– 20 (W) 60 −0.852 Gaussian ln K – HH – −0.82 3,483.45 
     HH – HH 2.23  
     K – K 14.06  
20 (W) – 400 −0.776 Spherical K – HH – −3.02 4,804.63 
     HH – HH 2.6  
     ln K – ln K 0.77  
– 20 (W) 400 −0.819 Gaussian ln K – HH – −0.78 4,406.08 
     HH – HH 2.63  
Available data
NuggetSillRange
Kln KHHCorrModelComponent(m2/d2)(m2/d2)(m)
     K – K 2.64  
20 (N) – 60 −0.870 Spherical K – HH – −0.59 1,348.49 
     HH – HH 0.69  
     ln K – ln K 0.07  
– 20 (N) 60 −0.886 Gaussian ln K – HH – −0.09 1,118.03 
     HH – HH 0.57  
     K – K 0.19 6.23  
20 (N) – 400 −0.819 Gaussian K – HH – −0.34 3,503.97 
     HH – HH 2.83  
     ln K – ln K 0.17  
– 20 (N) 400 −0.816 Gaussian ln K – HH – −0.06 3,503.97 
     HH – HH 2.83  
     K – K 14.55  
20 (W) – 60 −0.881 Spherical K – HH – −2.79 4,976.19 
     HH – HH 2.55  
     ln K – ln K 0.96  
– 20 (W) 60 −0.852 Gaussian ln K – HH – −0.82 3,483.45 
     HH – HH 2.23  
     K – K 14.06  
20 (W) – 400 −0.776 Spherical K – HH – −3.02 4,804.63 
     HH – HH 2.6  
     ln K – ln K 0.77  
– 20 (W) 400 −0.819 Gaussian ln K – HH – −0.78 4,406.08 
     HH – HH 2.63  

Cross-validation was used to evaluate the best variogram model fit from all the analyzed models. As can be seen in Table 6, when the K data are distributed through the entire aquifer extent, the increase in the number of data is almost unnoticeable in the value of the variogram parameters except for the range; on the other hand, changes in the number of K data produce slightly more significant changes in the nugget and sill values. The impact of the number of data in the variogram parameters is larger when the K data are grouped in the North and the West.

Table 6

K and ln K estimate errors for the bivariate cases

Number of data
Min errorMax errorMean errorMAPE (%)
Kln KHH(m/d)(m/d)(m/d)RMSE (m/d)SMSE
20  60 − 3.91 2.67 − 0.07 1.15 0.95 43.00 
– 20 60 −3.67 2.35 −0.08 0.93 0.10 28.86 
20 – 400 −3.7 4.07 −0.02 1.34 0.84 63.95 
– 20 400 −3.79 2.39 −0.17 1.01 0.87 28.28 
40  60 − 3.68 2.57 − 0.14 0.99 0.51 30.09 
– 40 60 −3.26 3.17 0.28 1.01 0.05 30.63 
40 – 400 −3.67 3.07 −0.14 0.51 30.56 
– 40 400 −3.24 3.15 0.29 1.01 0.09 30.97 
60  60 − 3.21 2.08 − 0.08 0.78 0.51 20.47 
– 60 60 −3.24 2.42 0.1 0.84 0.12 18.13 
60 – 400 −3.21 2.08 −0.08 0.78 0.5 20.25 
– 60 400 −3.21 2.38 0.1 0.83 0.12 18.07 
20(N) – 60 −2.96 1.78 2.91 3.88 173.93 
– 20(N) 60 −2.56 6.34 2.06 3.11 3.23 184.08 
20(N) – 400 −2.9 6.01 1.51 2.89 2.34 181.67 
 20(N) 400 − 2.96 6.19 1.56 2.97 1.58 185.67 
20 (W) – 60 −4.35 2.1 −0.64 1.43 0.41 31.69 
– 20 (W) 60 −4.18 3.11 −0.18 1.44 1.26 52.99 
20 (W) – 400 −4.21 2.66 −0.56 1.4 0.46 34.62 
 20 (W) 400 − 3.61 3 − 0.2 1.34 1. 24 43.61 
Number of data
Min errorMax errorMean errorMAPE (%)
Kln KHH(m/d)(m/d)(m/d)RMSE (m/d)SMSE
20  60 − 3.91 2.67 − 0.07 1.15 0.95 43.00 
– 20 60 −3.67 2.35 −0.08 0.93 0.10 28.86 
20 – 400 −3.7 4.07 −0.02 1.34 0.84 63.95 
– 20 400 −3.79 2.39 −0.17 1.01 0.87 28.28 
40  60 − 3.68 2.57 − 0.14 0.99 0.51 30.09 
– 40 60 −3.26 3.17 0.28 1.01 0.05 30.63 
40 – 400 −3.67 3.07 −0.14 0.51 30.56 
– 40 400 −3.24 3.15 0.29 1.01 0.09 30.97 
60  60 − 3.21 2.08 − 0.08 0.78 0.51 20.47 
– 60 60 −3.24 2.42 0.1 0.84 0.12 18.13 
60 – 400 −3.21 2.08 −0.08 0.78 0.5 20.25 
– 60 400 −3.21 2.38 0.1 0.83 0.12 18.07 
20(N) – 60 −2.96 1.78 2.91 3.88 173.93 
– 20(N) 60 −2.56 6.34 2.06 3.11 3.23 184.08 
20(N) – 400 −2.9 6.01 1.51 2.89 2.34 181.67 
 20(N) 400 − 2.96 6.19 1.56 2.97 1.58 185.67 
20 (W) – 60 −4.35 2.1 −0.64 1.43 0.41 31.69 
– 20 (W) 60 −4.18 3.11 −0.18 1.44 1.26 52.99 
20 (W) – 400 −4.21 2.66 −0.56 1.4 0.46 34.62 
 20 (W) 400 − 3.61 3 − 0.2 1.34 1. 24 43.61 

Best options for the different number of available K data are in bold.

Tables 4 and 5 show the correlation between the or and data. The higher correlations on average correspond to the cases when 20 are grouped in the North, followed closely by the cases where 20 K data are grouped in the West. The lower average correlation was for the cases where 40 K data are distributed throughout the extent of the aquifer.

The estimate errors for the bivariate cases are shown in Table 6; the best options for the different number of available K data are highlighted in bold. For the 20 K data distributed through the entire aquifer extent, the lower estimate errors were obtained for one of the cases with higher correlations between K and data (the 20 – 60 data case). The second lowest variances for these cases were obtained for the case of 20 K and 60 HH data, with a better agreement between the calculated errors and the estimate error variances, which makes these estimations more reliable. Using the data over the 20 K univariate cases reduces the RMSE for K by up to 16.96%.

For the bivariate cases that use 40 K data, the best estimates were obtained for 40 K – 60 data followed closely by the 40 K – 400 data (both with lower correlations among the 40 K cases). A marginal improvement in estimations is noticed compared to the univariate cases (the RMSE for K is reduced by up to 4.71% for the lnK cases). In both bivariate cases (without transformation) the SMSE values are also very similar, however the 40 K – 60 HH data are preferable since less data are required for similar estimations.

When 60 K data are available, the best estimates were obtained for the univariate case without transformation; however, the 60 K – 60 and the 60K– 400 HH data cases are similar with even better SMSE values. The 60 K – 60 HH data option was selected as the best since a lower number of HH data gives practically the same results. It is important to note that the RMSE for the 60 lnK univariate case is 1.28% lower than that obtained for the best bivariate case.

For the bivariate cases of 20, 40 and 60 K data distributed through the aquifer, the RMSE values for the best options are similar to the respective univariate cases, however the SMSE values for the former cases are closer to one; additionally the corresponding estimate error variances are lower which provides more confidence to the estimates obtained for the bivariate cases. The MAPE value is improved substantially for the 20 and 40 K data. The contribution of the secondary variable is reduced for a larger amount of data for the primary variable.

With respect to the K data grouped at in the North or West, larger variances are obtained for the bivariate cases, but according to the SMSE there is a better correspondence with the estimate errors. Furthermore, the calculated errors are lower for the bivariate cases with a notable reduction of the MAPE for the best bivariate case of K data distributed in the north, and for the best bivariate case of K data distributed in the west. These results clearly show the major contribution of the secondary variable in those zones where the primary variable has been poorly sampled.

In general, the best estimates were obtained for the bivariate cases with higher correlations, for the 20 K data cases. The worst estimates were produced for the bivariate cases of 20 K data grouped in the North (N) and West (W), with similar correlations for both (Table 5). The best estimates for these cases are obtained when using 20 (N) – 400 and the 20 (W) – 400 data (with the second lower correlations among their respective cases). With respect to the univariate cases using 20 K data grouped in the North (N) and West (W), the RMSE for K is reduced for their respective bivariate cases.

For the 20 -60 data bivariate estimates of K, this parameter is better estimated in the Northwest and Southern regions when compared to the univariate case. Furthermore, the highest estimate error variances (values above 4 m2/d2) can be found in more extended areas in the Southeast and South-Central regions. In general, the estimate error variances are higher for the whole aquifer.

The bivariate estimates of K for the 40 -60 data are very similar to the 40 K univariate estimates, with an overestimation in the northern half and the middle regions. Low estimate error variances (between 1.01 and 2.00 m2/d2) are found over a more extended area when compared to the univariate case, high values above 4 m2/d2 can be found in a small part of the Southeast.

For the 60 K – 60 data estimates, K is similar to the reference configuration (Figure 4(a)), like in the univariate case. Low estimate error variances (between 1.01 and 2.00 m2/d2) cover the aquifer almost entirely (these areas are more extensive than for the univariate case); on the other hand, more significant estimate error variances, between 2.01 and 3.00 m2/d2, can be found in minor spots (Figure 4(b)).
Figure 4

Estimates using 60 K and 60 data (bivariate case). (a) Estimation map. (b) Estimation error variance.

Figure 4

Estimates using 60 K and 60 data (bivariate case). (a) Estimation map. (b) Estimation error variance.

Close modal

Estimates of the bivariate – 400 data characterize the North Central and the middle part of the aquifer in a better manner; on the other hand, K estimates are now between 6.01 and 7.00 m/d in a more reduced area of the South. The estimated error variances are much lower in the Northern region; however, the extreme South presents an extended area with values between 6.01 and 7.00 m2/d2, which are higher than the univariate case.

Finally, the bivariate 20 (W) – 400 data substantially improves the univariate case estimates in the Eastern region; the estimated K values reflect the reference spatial configuration's general pattern due to the second variable (Figure 5(a)). The estimated error variances for almost the entire Western region are extremely low (less than 1.01 m2/d2); however, the Eastern half presents a highly extended area with values higher than 10.01 m2/d2 (Figure 5(b)).
Figure 5

Estimates using 20 (W) data and 400 data (bivariate case). (a) Estimation map. (b) Estimation error variance.

Figure 5

Estimates using 20 (W) data and 400 data (bivariate case). (a) Estimation map. (b) Estimation error variance.

Close modal

For the cases with 20 K data distributed within the entire aquifer extent (with or without using a secondary variable), the minimum RMSE do not correspond to the case where the SMSE values are closer to one; the variance values are large for the cases of the minimum RMSE. On the other hand, for the univariate and bivariate cases using 40 and 60 K data, the minimum RMSE correspond to the best agreements between the calculated errors and the estimate error variances obtained with the selected model (SMSE values closer to one).

For the cases of 20 K data distributed in the North or in the West, the minimum RMSE correspond to the cases of the second best SMSE values.

After the bivariate K estimates were used to simulate the hydraulic head evolution, the model hydraulic head errors were calculated (Table 7). The best simulations for each sampling case are highlighted in bold. The best results for 20 K data distributed through the entire aquifer extent correspond to the 20 – 60 and 20 – 400 cases (the error statistics are very similar) which correspond to the best K estimates and the highest correlations of these cases. The improvement in the simulations concerning the univariate case is notable (the RMSE for the hydraulic head simulation is reduced by up to 56.67%). The 20 – 60 case is considered better since it demands less data for the secondary variable.

Table 7

Model simulation errors for the bivariate K estimates

Variogram and estimation data
Min errorMax errorMean errorRMSEMAPE (%)
Kln KHH(m)(m)(m)(m)
20 – 60 −0.24 1.13 0.22 0.30 0.02 
 20 60 − 0.46 0.52 0.04 0.13 0.01 
20 – 400 −0.08 1.71 0.29 0.46 0.03 
20 400 −0.31 0.59 0.04 0.13 0.01 
40 – 60 −0.38 0.80 0.08 0.18 0.01 
 40 60 − 0.27 0.73 0.07 0.14 0.01 
40 – 400 −0.39 0.83 0.08 0.18 0.01 
– 40 400 −0.27 0.75 0.07 0.15 0.01 
60 – 60 −0.44 0.65 0.07 0.16 0.01 
 60 60 − 0.35 0.26 0.00 0.07 0.00 
60 – 400 −1.01 0.17 −0.02 0.13 0.01 
– 60 400 −0.42 0.25 0.00 0.07 0.00 
20 (N)  60 0.00 2.14 0.68 0.85 0.07 
– 20 (N) 60 0.00 2.46 0.71 0.91 0.07 
20 (N) – 400 0.00 2.46 0.71 0.91 0.07 
– 20 (N) 400 0.00 2.47 0.72 0.92 0.08 
20 (W)  60 − 0.98 0.52 0.02 0.17 0.01 
– 20 (W) 60 −0.15 1.33 0.20 0.33 0.02 
20 (W) – 400 −3.30 0.31 −0.09 0.40 0.02 
– 20 (W) 400 −0.34 0.97 0.14 0.25 0.02 
Variogram and estimation data
Min errorMax errorMean errorRMSEMAPE (%)
Kln KHH(m)(m)(m)(m)
20 – 60 −0.24 1.13 0.22 0.30 0.02 
 20 60 − 0.46 0.52 0.04 0.13 0.01 
20 – 400 −0.08 1.71 0.29 0.46 0.03 
20 400 −0.31 0.59 0.04 0.13 0.01 
40 – 60 −0.38 0.80 0.08 0.18 0.01 
 40 60 − 0.27 0.73 0.07 0.14 0.01 
40 – 400 −0.39 0.83 0.08 0.18 0.01 
– 40 400 −0.27 0.75 0.07 0.15 0.01 
60 – 60 −0.44 0.65 0.07 0.16 0.01 
 60 60 − 0.35 0.26 0.00 0.07 0.00 
60 – 400 −1.01 0.17 −0.02 0.13 0.01 
– 60 400 −0.42 0.25 0.00 0.07 0.00 
20 (N)  60 0.00 2.14 0.68 0.85 0.07 
– 20 (N) 60 0.00 2.46 0.71 0.91 0.07 
20 (N) – 400 0.00 2.46 0.71 0.91 0.07 
– 20 (N) 400 0.00 2.47 0.72 0.92 0.08 
20 (W)  60 − 0.98 0.52 0.02 0.17 0.01 
– 20 (W) 60 −0.15 1.33 0.20 0.33 0.02 
20 (W) – 400 −3.30 0.31 −0.09 0.40 0.02 
– 20 (W) 400 −0.34 0.97 0.14 0.25 0.02 

Best simulations for each sampling case are in bold.

The best simulation for the bivariate cases using 40 K data corresponds to the 40 – 60 case. Significant improvements in simulation results are obtained compared to those in the univariate case (up to 41.67% reduction of the RMSE for hydraulic head simulation), but lower than the best for the 20 K cases.

For the bivariate cases using 60 K data, the best simulations were obtained for the 60 – 60 (with the highest correlation). The advantage of the bivariate cases compared to the simulations with the univariate case K estimates is also clear (the RMSE for hydraulic head simulation is reduced by up to 53.33%).

As for the univariate cases, the MAPE shows that for all the cases the calculated error represents a minimal fraction of the data at each node, which corroborates the robustness of the estimates to be used in the numerical flow model. The major reduction in the MAPE corresponds to the simulation obtained for the best bivariate case with 20 K data grouped in the west (it is reduced 0.05% with respect to the corresponding best univariate case).

Although the log-transformed K data do not help to obtain the best K estimates for the bivariate cases, they provide the best results in the numerical flow model simulations for all the bivariate cases of K data distributed through the entire aquifer extent. This transformation favors obtaining similar absolute values of the bivariate variogram parameters for the and the sills and a larger value of the sill component. With these similar values, a higher weight is given to the data in the estimation of K (without logarithmic transformation, the largest sill value corresponds to the univariate K sill component), allowing more advantage to be taken of the information provided by the secondary variable.

For the cases when the available K data are located in the northern portion of the model domain, the best simulations are obtained for the 20 K (N) – 60 data (which coincides with the best estimates and the second lowest correlation of these cases). The best simulations for the cases where the K data are grouped in the West portion of the domain correspond to the 20 K (W) – 60 data (it has the second-best correlation of these cases). Bivariate cases for data grouped in the North and West offer better simulations than their respective univariate cases, due to the additional information provided by the data in areas without K data. Unlike the K data distributed through the entire aquifer extent cases, the worst simulations among these bivariate cases are obtained when the logarithmic transformation is used. Using 60 instead of 400 data seems to be a better option according to simulation results. The RMSE for hydraulic head simulation is reduced by up to 14.14 and 78.21% for the bivariate K (N) and K (W) cases, respectively.

As expected, K estimates are improved as the spatial coverage of K increases, reducing the contribution of data. In the Kalman filter implementation, the use of K data rapidly reduces the estimate error variance on estimation points; therefore, for bivariate cases, information provided by data is highly redundant in the vicinity of K sampled sites. Therefore, more benefit is obtained from the information provided by the secondary variable when there are extensive areas without data for the primary variable, as occurs with the case of K data grouped in the West; since there is no data for K in the East, the larger the number of data, then the estimates are better in that zone.

However, care must be taken in obtaining the cross-variogram and selecting the initial state to maximize the advantage of using a secondary variable. In this sense, prior to implementing these bivariate techniques it is necessary to evaluate correlation between variables and planning for an adequate sampling scheme that considers hydrogeological knowledge of the study area.

Ahmed & de Marsily (1989) presented the only previous geostatistical study we know of in which cokriging is used to estimate T using HH data (not in the inverse problem context), and the results are compared with kriging T estimates. In this work, we use a static Kalman filter to estimate K, which is closely related to T, and we also use HH data. Both methods are related because they are best linear unbiased estimation methods, and, in both, the spatial structure model is obtained through a geostatistical analysis. Our results confirm that cokriging cross-validation estimate errors are smaller than the kriging errors for the corresponding case; however, in contrast to their results, our results had SMSE values closer to one than the kriging values were. On the other hand, we point out that Ahmed & de Marsily (1989) did not test the T estimate's sensitivity to the data's number and locations.

Other papers estimate hydraulic conductivity using hydraulic head data in the inverse problem context. However, we note that an inverse problem objective is different from ours, so we do not think comparing our results with a paper of that nature would provide insights into our analysis. Recent works that estimate hydraulic conductivity using artificial intelligence also do not perform a sensitivity analysis of the estimates to the number of data and their spatial distribution.

The results show that the use of HH data as a secondary variable in the evaluated geostatistical–Kalman filter estimation method helps to obtain low estimate error variances in the estimation of K; furthermore, from a geostatistical point of view, these results are more reliable since they have a better agreement with the calculated errors from the cross validation of the selected variogram model. However, it is noticed that the contribution of the HH data (as secondary variable) is reduced when the number and coverage of available K data increases. A major contribution of the HH data is achieved in those areas of the aquifer where this variable has been measured and K has been poorly sampled.

The differences in the simulated values of HH for the different spatial configurations of K obtained with the geostatistical–Kalman filter method for the case study are minimal, which implies the robustness of the estimation method to obtain reference fields useful for a groundwater model calibration process.

It is also worth mentioning that the method proposed in Júnez-Ferreira et al. (2020) is faster than the kriging and cokriging methods because the Kalman filter allows processing one data at a time, avoiding inverting a matrix. Processing the test case with 60 K and 400 HH data (the one with the most data) takes a few seconds on a notebook computer with an Intel Core(TM) i76500U CPU (2.5 GHz, 2.6 GHz) and 16GB RAM.

Univariate geostatistical techniques, such as ordinary kriging, represent a standard tool in hydrogeology to obtain low uncertainty estimates of hydraulic conductivity; however, more recently, the use of auxiliary variables has been promoted to improve the results. In this paper, the influence in the estimation of hydraulic conductivity of the number and spatial location of data was evaluated for univariate and bivariate Kalman filter–geostatistical approaches, and the errors of the hydraulic heads simulated with the resulting K estimates.

The use of statistics that measure differences between estimated and reference values is perhaps the most usual manner to evaluate estimation techniques; however, in most field situations, there are no data to validate the estimation results, making synthetic cases an excellent alternative to do the evaluation when introducing and validating new methods.

In the case study presented, it was found that bivariate estimates take advantage of the correlation between the primary and the secondary variables ( and , respectively) to outperform the univariate estimates (the improvement in the estimation for bivariate cases is reduced when a better sampling scheme for the primary variable is followed). Our analysis shows that hydraulic head can provide valuable information where hydraulic conductivity was not sampled.

For those areas where only is available, the bivariate estimated K outperforms the univariate cases, but the obtained estimate error variances are larger than in the univariate case. The larger sill of the bivariate K model variograms can explain these large variances.

The variation in the number of data (the primary variable) produces more changes in the bivariate variogram than the variation of the number of data (the secondary variable). This is because ‘available’ data allow the estimate error variance for and to be reduced to close-to-zero values, and the influence of the secondary variable () data on the estimates is insignificant.

For the bivariate cases with a uniform spatial distribution of the K and data, a higher correlation between K and data does not guarantee better estimates of K. When K is uniformly distributed over the estimation area, the data contributes more to improving the K estimates if the largest sill value of the bivariate variogram model corresponds to the univariate component of . Also, when K is not sampled for the entire estimation area, a higher correlation between the primary and the secondary variable and a good selection of the initial state vector are more relevant.

The best model simulation results for the bivariate cases with data distributed through the entire aquifer extent are obtained using (corresponding to the highest correlations); however, this option does not always offer the best estimates. The cases using 20 K data grouped in the Western half of the domain only (20 and 400 data) are very close to those obtained with the best simulation of the bivariate cases using 20 K data uniformly distributed (20 and 60 data). These results corroborate the advantage of using a secondary variable to improve the estimations of the primary variable at unsampled sites. For the case study, the sampling of K in a fringe from North to South (aligned with the flow direction) takes more advantage of the correlation between K and HH in comparison to the case where K is sampled in a fringe from West to East (perpendicular to the flow direction).

One of the major limitations of this study is that the proposed synthetic case is simple. It consists of a single layer isotropic medium with a constant storage coefficient, and a simple geometry. Also, the measurement errors for K and HH were neglected.

One important improvement of the estimation method would be the inclusion of additional auxiliary variables such as electrical resistivity, specific capacity and water level slopes.

For the case study, the sampling of K data oriented in the flow direction takes more advantage of the HH data located in unsampled areas of the aquifer; this result deserves a deeper analysis in order to optimize the characterization of the hydraulics parameters of an aquifer. Another interesting topic is the estimation of anisotropic K fields.

Future work may include strategies for sampling design (including the hydrogeologic framework) to improve estimates of within an aquifer, considering uncertainty in ‘available’ data, and more complex cases.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ahmad
N.
,
Akbar
M.
,
Qayoom
A.
&
Bintul
M.
2021
Determination of hydraulic conductivity for granular filters based on constriction size and shape parameters
.
Water Supply
21
(
8
),
4121
4129
.
doi: 10.2166/ws.2021.167
.
Ahmed
S.
&
De Marsily
G.
1989
Co-Kriged estimates of transmissivities using jointly water level data
.
Geostatistics
2
,
615
628
.
Bear
J.
1972
Dynamics of Fluids in Porous Media
.
Elsevier Pub. Co
,
New York
.
Briseño-Ruiz
J.-B.
2012
Método para la calibración de modelos estocásticos de flujo y transporte en aguas subterráneas, para el diseño de redes de monitoreo de calidad del agua
.
Master Thesis
,
UNAM
,
México
.
https://doi.org/10.22201/dgpyfe.9786070253164e.2012
.
Carrera
B.
,
Man
C.
&
Papaioannou
I.
2020
Efficient estimation of hydraulic conductivity heterogeneity with non-redundant measurement information
.
GEM – International Journal on Geomathematics
11
15
.
https://doi.org/10.1007/s13137-020-00151-1
.
Chandel
A.
,
Sharma
S.
&
Shankar
V.
2022
Prediction of hydraulic conductivity of porous media using a statistical grain-size model
.
Water Supply
22
(
4
),
4176
4192
.
https://doi.org/10.2166/ws.2022.043
.
Chen
Y. C.
,
Tsai
J. P.
,
Chang
L. C.
,
Chang
P. Y.
&
Lin
H.
2018
Estimating hydraulic conductivity fields in composite fan delta using vertical electrical sounding
.
Water (Switzerland)
10
(
11
),
1
19
.
Deutsch
C. V.
&
Journel
A. G.
1998
Geostatistical Software Library and User's Guide (GSLIB). Oxford University Press, New York
.
El Idrysy
E. H.
&
De Smedt
F.
2007
A comparative study of hydraulic conductivity estimations using geostatistics
.
Hydrogeology Journal
15
(
3
),
459
470
.
Erdal
D.
&
Cirpka
O. a.
2016
Joint inference of groundwater-recharge and hydraulic-conductivity fields from head data using the Ensemble-Kalman filter
.
Hydrology and Earth System Sciences
20
,
555
569
.
https://doi.org/10.5194/hessd-12-5565-2015
.
Freeze
R. A.
&
Cherry
J. A.
1979
Groundwater. Prentice Hall, Englewood Cliffs, New Jersey
.
Freixas
G.
,
Fernàndez-Garcia
D.
&
Sanchez-Vila
X.
2017
Stochastic estimation of hydraulic transmissivity fields using flow connectivity indicator data
.
Water Resources Research
53
(
1
),
602
618
.
Gorgij
A. D.
,
Kisi
O.
,
Moayeri
M. M.
&
Moghaddam
A. A.
2018
Hydraulic conductivity estimation via the AI-based numerical model optimization using the harmony search algorithm
.
Hydrology Research
49
(
5
),
1669
1683
.
https://doi.org/10.2166/nh.2018.147
.
Gumiere
S. J.
,
Lafond
J. A.
,
Hallema
D. W.
,
Périard
Y.
,
Caron
J.
&
Gallichand
J.
2014
Mapping soil hydraulic conductivity and matric potential for water management of cranberry: characterisation and spatial interpolation methods
.
Biosystems Engineering
128
,
29
40
.
Harbaugh
A. W.
,
Banta
E. R.
,
Hill
M. C.
&
McDonald
M. G.
2000
MODFLOW-2000, the U.S. Geological Survey modular ground-water model—user guide to modularization concepts and the ground-water flow process. Tech rep. Open-File Report 00-92. U.S. Department of the Interior, U.S. Geological Survey, Reston, Virginia. 121 pp
.
Herrera
G.
1998
Cost Effective Groundwater Quality Sampling Network Design
.
PhD Thesis
,
University of Vermont
,
USA
.
Júnez-Ferreira
H. E.
,
González-Trinidad
J.
,
Júnez-Ferreira
C. A.
,
Octavio
C.
,
Rovelo
R.
,
Herrera
G. S.
&
Olmos-trujillo
E.
2020
Implementation of the Kalman filter for a geostatistical bivariate spatiotemporal estimation of hydraulic conductivity in aquifers
.
Water
12
,
1
20
.
Kitanidis
P. K.
&
Vomvoris
E. G.
1983
A geostatistical approach to the inverse problem in groundwater modeling (steady-state) and one- dimensional simulations
.
Water Resources Research
20
(
7
),
1003
1020
.
Kupfersberger
H.
,
Blaschke
A. P.
&
Reitinger
J.
1992
Behandlung von Inhomogenitäten in einem Grundwasserleiter. In: Hydrology of Austria. Protecting groundwater in valley and basin areas. Tech. Report No. 2, Technical University of Vienna, Vienna
.
Oliver
M. A.
&
Webster
R.
2014
A tutorial guide to geostatistics: computing and modelling variograms and kriging
.
Catena
113
,
56
69
.
https://doi.org/10.1016/j.catena.2013.09.006
.
Panagopoulos
T.
,
Jesus
J.
,
Antunes
M. D. C.
&
Beltrão
J.
2006
Analysis of spatial interpolation for optimising management of a salinized field cultivated with lettuce
.
European Journal of Agronomy
24
(
1
),
1
10
.
Panzeri
M.
,
Riva
M.
,
Guadagnini
A.
&
Neuman
S. P.
2015
EnKF coupled with groundwater flow moment equations applied to Lauswiesen aquifer, Germany
.
Journal of Hydrology
521
,
205
216
.
Patriarche
D.
,
Castro
M. C.
&
Goovaerts
P.
2005
Estimating regional hydraulic conductivity fields – a comparative study of geostatistical methods
.
Mathematical Geology
37
(
6
),
587
613
.
Rubin
Y.
,
Mavko
G.
&
Harris
J.
1992
Mapping permeability in heterogeneous aquifers using hydro- logic and seismic data
.
Water Resources Research
28
,
1809
1816
.
Samper Calvete
F. J.
&
Carrera Ramírez
J.
1990
Geoestadística. Aplicaciones a la hidrogeología subterranea
.
Centro Internacional de métodos numéricos en ingeniería
,
Barcelona
.
Singh
V. K.
,
Panda
K. C.
,
Sagar
A.
,
Al-Ansari
N.
,
Duan
H.
,
Paramaguru
P. K.
,
Vishwakarma
D. K.
,
Kumar
A.
,
Kumar
D.
,
Kashyap
P. S.
,
Singh
R. M.
&
Elbeltagi
A.
2022
Novel Genetic Algorithm (GA) based hybrid machine learning-pedotransfer Function (ML-PTF) for prediction of spatial pattern of saturated hydraulic conductivity
.
Engineering Applications of Computational Fluid Mechanics
16
(
1
),
1082
1099
.
doi:10.1080/19942060.2022.2071994
.
Xu
T.
&
Gómez-Hernández
J. J.
2015
Inverse sequential simulation: a new approach for the characterization of hydraulic conductivities demonstrated on a non-Gaussian field
.
Water Resources Research
51
(
4
),
2227
2242
.
https://doi.org/10.1002/2014WR016320
.
Xu
T.
,
Gómez-Hernández
J. J.
,
Li
L.
&
Zhou
H.
2013
Parallelized ensemble Kalman filter for hydraulic conductivity characterization
.
Computers and Geosciences
52
,
42
49
.
https://doi.org/10.1016/j.cageo.2012.10.007
.
Xu
Z.
,
Wang
X.
,
Chai
J.
,
Qin
Y.
&
Li
Y.
2017
Simulation of the spatial distribution of hydraulic conductivity in porous media through different methods
.
Mathematical Problems in Engineering
2017
,
10
.
Yusefzadeh
S.
&
Nadiri
A. A.
2021
Estimation hydraulic conductivity via intelligent models using geophysical data
.
Advanced Applied Geology
11
(
2
),
382
404
.
https://doi.org/10.22055/AAG.2020.29223.1970
.
Webster
R.
&
Oliver
M. A.
2007
Geostatistics for Environmental Scientists (Second Ed., Vol. 14). John Wiley & Sons, Chichester
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).