## Abstract

Groundwater as a vital resource for humankind is being debilitated by enormous over-extraction and intensifying contamination. Insightful advancement and protection of this significant resource needs a careful understanding of aquifer parameters. In the present study, the groundwater level was predicted at first, using a hybrid wavelet artificial neural networks and genetic programming (wavelet-ANN-GP) model. The hybrid model results were then evaluated using the performance evaluation criteria including R square, root mean square error (RMSE), mean absolute error and Nash–Sutcliffe efficiency, respectively ranged from 0.81 to 0.97, 0.070 to 4.45, 0.016 to 3.036 and 0.74 to 0.96, which revealed the high applicability of the hybrid model. The groundwater levels were predicted using wavelet-ANN-GP and then entered into the numerical model. Harmony search (HS) was used for the optimization of the numerical model. Hydraulic conductivity (HC) was estimated during the optimization process. Then, the estimated HC was extended throughout the aquifer domain by the empirical Bayesian kriging (EBK) method. Eventually, estimated hydraulic conductivity was compared by defined hydraulic conductivity through the pumping test. The plotted map of the estimated hydraulic conductivity showed about 87.5% conformity to points with distinct hydraulic conductivities obtained from the pumping test. The results proved the applicability of AI-based meta-heuristic optimization models in water resource studies.

## INTRODUCTION

About two billion people around the world satisfy their need for drinking water via groundwater, which is the main fresh water resource on Earth (Richey *et al.* 2015). To protect this vital life-sustaining reserve from the progressively intensifying predicament of aquifer withdrawal (Clarke *et al.* 1996; Wada *et al.* 2010; Konikow 2011; Werner & Gleeson 2012; Richey *et al.* 2015), the increasing occurrence of aquifer contamination (Clarke *et al.* 1996), and the probable influences of climate change (Taylor *et al.* 2013), we must have detailed knowledge of aquifer hydrogeological properties, of which hydraulic conductivity is one of the most important ones.

Aquifer pumping tests are traditionally utilized to attain approximations of hydraulic parameters. Many deterministic and stochastic models have been produced to suit distinctive aquifer settings and pumping (Kruseman & de Ridder 1994; Batu 1998; Li *et al.* 2005). Other methods, for example, borehole flow meters (Paillet 1998; Li *et al.* 2008; Paradis *et al.* 2011), geophysical surveys (Hinnell *et al.* 2010; Capriotti & Li 2015; Singha *et al.* 2015), and experimental estimation techniques (Domenico & Schwartz 1998) were developed to indirectly calculate hydraulic conductivity of the aquifer. Pumping tests are a fundamental aquifer testing method for estimating hydraulic conductivity (Sun & Goltz 2016).

Leading a pumping test is, however, by no means a trivial task; primarily, it costs both money and time to plan and implement a test. Furthermore, there are some circumstances in which an important pumping test cannot be directed. For instance, aquifers with low conductivity will be unable to withstand an expansive adequate pumping rate to initiate quantifiable drawdown at monitoring wells. Finally, there are frequently regulatory and economic obstacles forced by the need to discard the potentially polluted water that was extracted throughout the test. An alternative technique that can be used to assess aquifer hydraulic conductivity while abstaining from pumping tests would be of incredible benefit, of which mathematical models are one (Sun & Goltz 2016).

The last decade has seen a signiﬁcant intensification of the number of scientiﬁc methods used for hydrologic modelling and prediction, involving the most popular ‘data-driven’ methods. Such modelling methods include mathematical equations obtained from an analysis of simultaneous input and output time series that are not from the physical process in the watershed (Solomatine & Ostfeld 2008). Mathematical models are the essential tools of modelling groundwater systems, defining the responses of the aquifer systems for diverse management substitutes. Meanwhile, it is becoming increasingly difficult to disregard the role of artificial intelligence (AI) in hydrological processes forecasting due to its efficiency in complex physical processes modelling based on certain data/information governing the process. There has been a great deal of research on groundwater, utilizing the AI (e.g. Adamowski & Chan 2011; Kisi & Shiri 2012; Fallah-Mehdipour *et al.* 2013; Maheswaran & Khosa 2013a; Moosavi *et al.* 2013b; Nourani *et al.* 2015; Seo *et al.* 2015; Sivapragasam *et al.* 2015).

Nevertheless, the above-mentioned models may not be eligible to determine the best expected results. Combining AI with new optimization algorithms to solve problems may be the best possible method and the harmony search (HS) meta-heuristic algorithm is one of the latest of these.

The HS algorithm introduced an innovative connection between optimization techniques and music. The idea of the new algorithm was born when Zong Woo Geem observed the imaginative procedure of a group of musicians playing together, and the analogy with an optimization technique was expanded by Geem *et al.* (2001). Whereas this novel method is aimed at optimum situations such as minimum cost, maximum benefits, resources management, scheduling etc., there is no study related to its application in aquifer hydraulic parameters estimation, which the present study has tried to achieve.

Briefly, this study attempts to estimate the hydraulic conductivity of an aquifer domain through the optimization of a hybrid artificial intelligence-mathematical model and extending its results throughout the domain, employing the empirical Bayesian kriging (EBK) method.

## METHODS AND MATERIALS

The procedure of the present study includes groundwater level forecasting by a hybrid AI model as the initial step, which will be carried out using wavelet-based artiﬁcial neural network (WANN) and its conjugating to the genetic programming (GP) model (WANN-GP). The second step is the numerical modelling of the aquifer flow system and optimizing its results. The third step is hydraulic conductivity obtained from the water flow governing equations and finally extending the results over the entire aquifer domain using EBK. Figure 1 shows the flowchart of the overall procedure for this study.

### Hybrid wavelet-artificial neural network

Since artiﬁcial intelligence has shown promising results in the modeling and prediction of non-linear hydrological phenomena and in handling large amounts of dynamicity and noise concealed in datasets, hybrid modeling of AI was employed for precise simulation of water levels in piezometers and the elimination of their statistical gaps over a long-term period. For this, the wavelet transform model was linked to an artificial neural network.

WANN is the conjunction model of wavelet decomposition and ANN. In WANN, sub-series obtained using wavelet decomposition are used as inputs to ANN. Details and approximation sub-components attained by wavelet decomposition are utilized as inputs. The WANN can be briefly explained as follows.

Step 1. Wavelet analysis via discrete wavelet transform (DWT) decomposes a time series into details (*D*_{1}, *D*_{2} … *D _{N}*) and approximation (

*A*), where

_{N}*N*is the decomposition level. Water table time series were decomposed into the D1 and D2 details and A1 as approximation in this study. Decomposition levels have been selected according to the number of data for each piezometer (i.e. the data from the water table for each piezometer).

Step 2. ANN was trained and validated by using each sub-component as input to the applied model. Figure 2 shows the schematic flowchart of this step.

### Genetic programming

GP is a kind of AI method that depends on the random iterative searching process to obtain an appropriate relation between input and output. The common structure of this method is a tree shape, representing the expression. Variables, functions, and operators in this structure are situated in the nodes, which are linked together by branches (Docheshmeh Gorgij *et al.* 2016).

However, in this study, the simulated water tables by WANN were used as inputs to the GP model for time series prediction, which can be seen in the flowchart of the study steps (Figure 2). In other words, the simulated water tables of each piezometer have been forecast using the GP model. For this, the Genexpro tool was utilized and the training, testing and forecasting of the future 12-month period of water tables was carried out. The general GP model implementation and general structure used in this study is shown in Figure 3.

### Performance evaluation

^{2}), root mean square error (RMSE), mean absolute error (MAE) and Nash–Sutcliffe efficiency (NSE). The mathematical form of performance criteria is as below: where

*X*and

_{obs}*X*are observation and forecasted values, respectively and

_{est}*n*is the number of data.

### Numerical model

*K*,

_{xx}*K*and

_{yy}*K*show the hydraulic conductivities (L T

_{zz}^{−l}) along the

*x*,

*y*, and

*z*axes;

*h*indicates the potentiometric head (L);

*W*is the volumetric flux per unit volume and indicates sources and/or sinks of water (T

^{−l});

*S*shows the specific storage of the porous material (L

_{s}^{−1}); and

*t*indicates time (T). For solving groundwater flow (Equation (5)), the mass balance should be fulfilled, which suggests that the total flow in and through every cell ought to be equivalent to the capacity inside the cell. The mass balance in a cell is described as below: in which

*Q*indicates the flow rate into the cell (L

_{i}^{3}/T),

*S*(L

_{S}^{−1}) is the specific storage and indicates the water volume that can be injected per unit volume of aquifer material per unit change in head,

*ΔV*shows the cell volume (L

^{3}), and

*Δh*is the change in head over a time interval of

*Δt*.

Solving the governing equation of groundwater flow in the study area, the Modflow code, inserted in GMS (groundwater modelling system) interface, was utilized. GMS is a graphical user interface (GUI), and uses the finite difference method in solving the governing equation of the system flow. The finite difference method is more popular and needs less time to construct the lattice.

GMS version 10 was selected as the numerical model software. It is a comprehensive graphical environment for numerical modelling, a tool in area describing, making a conceptual model, constructing the lattice, geostatistics, and a complicated tool in the graphical visualization of the study area. A lattice was designed with cells in 500 × 500 meter dimension, consisting of 24 rows and 27 columns. Parameters such as basement elevation, surface topography, specific storage, surface incharge, initial water table and so on were dedicated to each cell of the lattice.

Lattice points are spaced horizentally by a distance *Δx* and vertically by a distance *Δy*. The distances *Δx* and *Δy* are the natural units of the lattice. Figure 4 shows a schematic of a lattice in two-dimensional form (Wang & Anderson 1995).

_{0}, y

_{0}) is obtained as below: and similarly for , substituting the

*y*instead of the

*x*in Equation (7).

*i*,

*j*) simplifies as below:

This is the most widely used equation in finite difference solutions for steady state flow modelling (Wang & Anderson 1995). For this study, the numerical model of the study area was run in steady state for one year.

### HS optimization algorithm

Similarly to alternative algorithms, the computational methodology of HS is roused from a heuristic event. Nonetheless, unlike the others, the primary philosophy of HS relies on a musical improvisation process which occurs when a group of musicians searches for a musically satisfying harmony. This procedure was initially applied to engineering optimization problems by Geem & Choi (2007).

Every musician in this adaptation is analogous to a decision variable and the collection of notes in the musicians’ memories is similar to the values of the decision factors. In melodic composition, musicians attempt to discover a musically satisfying harmony by the notes stored in their memories, though a global ideal solution is obtained by the set of values appointed to every design variable in the optimization procedure (Ayvaz 2009).

*i*th harmony vector is shown as which is generated randomly as below: For

*j*= 1, 2 … ,

*n*and

*i*= 1, 2 … , HMS

*r*is a random number between zero and one with a uniform distribution, while

*LB*(

*j*) and

*UB*(

*j*) are the lower and upper boundaries of the

*x*(

*j*) variable, respectively. In this case, the HM matrix is filled as below by the harmony vectors:

Also, the objective function amount is calculated for all harmony vectors of HM. The new harmony generation is started after the HM filling. There are two states of harmony generation. In the first state, the Harmony Memory Consideration Rate (HMCR) and Pitch Adjustment Rate (PAR) rules should be applied, whereas in the second state the new harmony is generated via the random initializing. Figure 5 shows the general form of the HS algorithm.

### Empirical Bayesian kriging

EBK is a geostatistical interpolation technique that automates the most troublesome parts of building a valid kriging model (ESRI 2015). Other kriging techniques need to regulate parameters manually to receive precise results, but EBK ascertains these parameters automatically through a procedure of sub-setting and simulations. On the other hand, EBK varies from other kriging techniques through accounting for the error presented by evaluating the underlying semivariogram. This procedure certainly accepts that the evaluated semivariogram is the true semivariogram for the interpolation region. By not considering the vulnerability of semivariogram estimation, other kriging techniques underestimate the standard errors of prediction. Unlike other kriging techniques (which utilize weighted least squares), the semivariogram parameters in EBK are evaluated using restricted maximum likelihood (REML). There are some advantages of the EBK method (ESRI 2015). It requires minimal interactive modeling, and standard prediction errors are more precise than other kriging techniques. Furthermore, it permits precise predictions of moderately nonstationary data, and is more precise than other kriging strategies for small datasets.

### Study area

The Azarshahr Plain located in Azerbaijan province, northwest of Iran, is one of the Urmia Lake watersheds. Figure 5 illustrates the studied area. Azarshahr Plain is bordered from the west by the salty flat plains (Urmia Lake). Urmia Lake in the northwestern corner of Iran is one of the largest permanent hyper-saline lakes in the world and is the largest lake in the Middle East (Zarghami 2011). The area is highly populated and 100% of its industrial, domestic, and drinking water and 80% of agricultural water are obtained from groundwater sources (Asghari Moghaddam 1991). Yearly mean precipitation of the study region is around 221.2 mm for a long-time period (1982–2009), while the yearly mean evaporation is around 1,500 mm. Azarshahr Chay is the main river in the area, which begins from the Sahand Mountain and seldom discharges into the lake because of losses of percolation and evaporation, as well as the water diversion for the purposes of irrigation (Azerbaijan Territorial Water Association (ATWA) 2009). During the past decade, the study area has faced groundwater depletion and subsequently a reduction in reservoir volume, coinciding with population growth, incredible changes in climate and over-extraction of water resources in the study area, causing environmental hazards. Figure 6 shows the aquifer domain and piezometers location in the study area.

^{2}/day near the Qazi-Jahan village and the lowest amount is about 25 m

^{2}/day in the northwest of Mamqan town. Table 1 provides some data about the wells, and which pumping test has been conducted on them. The hydraulic conductivity has been gained using Equation (11): where

*K*is hydraulic conductivity in meter per day,

*T*is transmissivity in square meters per day, and

*b*is the thickness of the saturated layer in meters.

X | Y | Location name | Well depth (m) | Transmissivity (m^{2}/day) |
---|---|---|---|---|

584,924 | 4179724 | Azarshahr | 63 | 850 |

579918 | 4183397 | Teimoorlou | 72 | 550 |

579992 | 4184140 | Teimoorlou | 60 | 330 |

579854 | 4184804 | Teimoorlou | 36 | 280 |

579736 | 4185516 | Teimoorlou | 20 | 100 |

579609 | 4185742 | Teimoorlou | 35 | 95 |

579698 | 4185164 | Teimoorlou | 35 | 150 |

578832 | 4183276 | Teimoorlou | 50 | 40 |

580059 | 4179833 | Dastjerd | 55 | 40 |

577141 | 4181001 | Qeshlagh | 42 | 130 |

580409 | 4181574 | Gogan | 60 | 300 |

579593 | 4181892 | Gogan | 40 | 870 |

580300 | 4182839 | Gogan | 72 | 340 |

578178 | 4181736 | Gogan | 50 | 170 |

577278 | 4181973 | Gogan | 30 | 80 |

582700 | 4186000 | Goy-Gonbad | 92 | 30 |

X | Y | Location name | Well depth (m) | Transmissivity (m^{2}/day) |
---|---|---|---|---|

584,924 | 4179724 | Azarshahr | 63 | 850 |

579918 | 4183397 | Teimoorlou | 72 | 550 |

579992 | 4184140 | Teimoorlou | 60 | 330 |

579854 | 4184804 | Teimoorlou | 36 | 280 |

579736 | 4185516 | Teimoorlou | 20 | 100 |

579609 | 4185742 | Teimoorlou | 35 | 95 |

579698 | 4185164 | Teimoorlou | 35 | 150 |

578832 | 4183276 | Teimoorlou | 50 | 40 |

580059 | 4179833 | Dastjerd | 55 | 40 |

577141 | 4181001 | Qeshlagh | 42 | 130 |

580409 | 4181574 | Gogan | 60 | 300 |

579593 | 4181892 | Gogan | 40 | 870 |

580300 | 4182839 | Gogan | 72 | 340 |

578178 | 4181736 | Gogan | 50 | 170 |

577278 | 4181973 | Gogan | 30 | 80 |

582700 | 4186000 | Goy-Gonbad | 92 | 30 |

## RESULTS AND DISCUSSION

According to Nourani *et al.* (2009), and considering the data period (10–20 years) of water level in piezometers, two decomposition levels were used in DWT. Thus, the original groundwater time series were decomposed into D1, D2 and A2 where D1 and D2 are details and A2 is an approximation. D1, D2 and A2 were used as input to the ANN. The data number used for ANN varies from 141 to 249, including water levels for a 10–20-year period, whereas 80 and 20% of data were selected for training and testing the ANN, respectively. The Levenberg–Marquardt algorithm was chosen as the training algorithm. Different hidden node numbers were tried and the optimal hidden nodes were found to vary between 2 and 8 for the optimal ANN models. Simulation iterations were finished at 23–25 to achieve the best results. It is noteworthy that piezometer numbers 4, 8 and 14 had one-, one- and three-month gaps, respectively, during the test data period, and WANN covered the gaps and forecast the water levels in the test period.

The simulation results of WANN were imported to the GP model in order to predict the water level of each piezometer for a 12-month ahead period. Twelve-month ahead water table forecasting was carried out by GenexPro Tools version 4.0 for each piezometer (Figure 7). Figure 7 shows the long-term period simulation of water level and the surrounding part of each graph reveals the predicted water level of each piezometer for 12 months ahead; these parts of the foreseen water level were applied for groundwater numerical modelling.

The evaluation of model performance was carried out by statistical criteria (R^{2}, RMSE, MAE and NSE) and the results are shown in Table 2. The RMSE, MAE, NSE and R^{2} values, respectively ranging as 0.070–4.45, 0.049–3.036, 0.74–0.96 and 0.80–0.97, indicate the high accuracy of the GP model in groundwater forecasting.

Piezometer | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | 14 | 19 | 21 | 23 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

R^{2} | 0.97 | 0.92 | 0.9 | 0.88 | 0.9 | 0.81 | 0.87 | 0.83 | 0.81 | 0.96 | 0.94 | 0.8 | 0.84 |

RMSE | 0.070 | 0.237 | 0.104 | 0.710 | 0.076 | 4.45 | 0.387 | 0.556 | 0.622 | 0.268 | 0.358 | 1.164 | 0.461 |

MAE | 0.049 | 0.160 | 0.070 | 0.016 | 0.538 | 3.036 | 0.290 | 0.363 | 0.440 | 0.210 | 0.166 | 0.686 | 0.292 |

NSE | 0.93 | 0.91 | 0.92 | 0.83 | 0.91 | 0.75 | 0.78 | 0.85 | 0.82 | 0.96 | 0.94 | 0.81 | 0.74 |

Piezometer | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | 14 | 19 | 21 | 23 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

R^{2} | 0.97 | 0.92 | 0.9 | 0.88 | 0.9 | 0.81 | 0.87 | 0.83 | 0.81 | 0.96 | 0.94 | 0.8 | 0.84 |

RMSE | 0.070 | 0.237 | 0.104 | 0.710 | 0.076 | 4.45 | 0.387 | 0.556 | 0.622 | 0.268 | 0.358 | 1.164 | 0.461 |

MAE | 0.049 | 0.160 | 0.070 | 0.016 | 0.538 | 3.036 | 0.290 | 0.363 | 0.440 | 0.210 | 0.166 | 0.686 | 0.292 |

NSE | 0.93 | 0.91 | 0.92 | 0.83 | 0.91 | 0.75 | 0.78 | 0.85 | 0.82 | 0.96 | 0.94 | 0.81 | 0.74 |

### Numerical model

Figure 8 shows the results of AI-based numerical model calibration. It reveals that the numerical model has had a good operation and the results of groundwater level simulation are acceptable.

### Hybrid model optimization

*K*is the hydraulic conductivity in the intended cell,

*h*is the hydraulic head,

*x*and

*y*are the cell dimensions,

*W*is the cell discharge or recharge and

*R*is the surface recharge of the cell.

*x*and

*y*dimensions of the cell in the numerical model, since the piezometer is a point and has the same hydraulic head in length and width, the equation becomes: In the current study, the HMCR and PAR are 0.95 and 0.05, respectively. The bandwidth equals to 0.02 and the maximum iteration of program equals to 1,000. The results of the optimization model are presented in Figure 9.

Finally, hydraulic conductivity was obtained via the optimization process and using the EBK method extended to the aquifer domain. Figure 10 depicts the cross validation diagram of predicted hydraulic conductivity for the study area. Also, Table 3 shows the hydraulic conductivity prediction errors for the EBK model. Both the cross validation diagram and Table 3 reveal the high accuracy and low error of the EBK method in the prediction of hydraulic conductivity distribution in aquifer domains that cause an increase in the model's reliability.

Standardized RMSE | Standardized ME | RMSE | ME | Correlation coefficient |
---|---|---|---|---|

0.639 | 0.022 | 0.2533 | 0.0027 | 0.990 |

Standardized RMSE | Standardized ME | RMSE | ME | Correlation coefficient |
---|---|---|---|---|

0.639 | 0.022 | 0.2533 | 0.0027 | 0.990 |

The hydraulic conductivity in the study area is variable as the northern and western parts of the aquifer have lower hydraulic conductivity owing to the presence of salty flats adjacent to Urmia Lake. The presence of coarser deposits juxtaposed to the river increases the hydraulic conductivity. High hydraulic conductivity belongs to the central parts and upstream of the study area based on a generated map of hydraulic conductivity by the present study (Figure 11).

Table 4 shows the calculated hydraulic conductivity via the pumping test and compares it to the obtained corresponding values through the suggested method in the present study. It can be observed that about 87.5% of the calculated values are almost equal to the estimated values from the present study. The differences between calculated and estimated hydraulic conductivity may be caused by the uncertainty in aquifer thickness that can change the obtained results of pumping test implementation.

Pumping test no. | Calculated K via pumping test (m/day)^{a} | Calculated K via the HS and EBK models (m/day)^{a} |
---|---|---|

1 | 15.5 | >16 |

2 | 7.65 | 4–8 |

3 | 5.5 | 4–8 |

4 | 3.75 | <4 |

5 | 4 | <4 |

6 | 2.7 | <4 |

7 | 3.3 | <4 |

8 | 1.8 | <4 |

9 | 1.7 | <4 |

10 | 3.1 | <4 |

11 | 5 | <4 |

12 | 14.5 | 12–16 |

13 | 4.7 | 4–8 |

14 | 3.4 | <4 |

15 | 2.65 | <4 |

16 | 0.3 | <4 |

Pumping test no. | Calculated K via pumping test (m/day)^{a} | Calculated K via the HS and EBK models (m/day)^{a} |
---|---|---|

1 | 15.5 | >16 |

2 | 7.65 | 4–8 |

3 | 5.5 | 4–8 |

4 | 3.75 | <4 |

5 | 4 | <4 |

6 | 2.7 | <4 |

7 | 3.3 | <4 |

8 | 1.8 | <4 |

9 | 1.7 | <4 |

10 | 3.1 | <4 |

11 | 5 | <4 |

12 | 14.5 | 12–16 |

13 | 4.7 | 4–8 |

14 | 3.4 | <4 |

15 | 2.65 | <4 |

16 | 0.3 | <4 |

^{a}K is the hydraulic conductivity (m/day).

## CONCLUSIONS

Precise estimation of the aquifer hydraulic parameters helps custodians in the management of groundwater resources, and in making proper decisions for future plans. Indirect determination of hydraulic parameters could be helpful, especially in situations where direct methods such as pumping test implementation are time-consuming and costly. The current study has attempted to estimate the hydraulic conductivity through the AI based mathematical model optimization by HS meta-heuristic algorithm and extended it over the entire study area, i.e. the aquifer domain. The results of this method were found to be acceptable in comparison to the calculated hydraulic conductivity obtained by the pumping test. The results of the current study showed that the use of the hybrid WANN-GP model can be practical in the prediction of water levels, covers the data gaps and may be a good aid in mathematical models. In addition, EBK illustrated its ability in the prediction of parameters distribution. Generally, the current study indicates the applicability of the HS optimization algorithm and AI hybrid models in the estimation of groundwater levels and aquifer parameters, and may also be used for the approximation of other hydraulic parameters in the aquifer domain.

## REFERENCES

*.*

*.*

*,*

*(*

*,*Vol.

*.*