Accurate prediction of hydrological processes, especially the water budget, is of particular importance in environmental management plans. Conservation of groundwater, a fundamental resource in arid and semi-arid areas, should also be given high priority in development plans. The scope of this study was the prediction of a groundwater budget using artificial intelligence. For this purpose, the Azarshahr Plain aquifer, East Azerbaijan, Iran, was selected because of its heavy dependence on groundwater and the need to know its budget for future plans. The long-term fluctuations of the water table in 13 piezometers were simulated by a wavelet-based artificial neural network (WANN) hybrid model, which also filled the gaps in their records. Then, the modelled water table was predicted for the next 12 months using genetic programming. The simulation and prediction results were assessed with performance evaluation criteria, namely R², root mean squared error, mean absolute error and Nash–Sutcliffe efficiency. Thiessen polygons were then used to construct the predicted unit hydrograph of the study area. The predicted water table from September 2012 to August 2013 showed a depletion of about 0.12 m. Given the area of the Azarshahr Plain aquifer and its average storage coefficient, the aquifer budget will be reduced by about 0.3557 million cubic metres during this period.
Anticipating forthcoming natural events, particularly hydrological processes, is a great challenge for environmental custodians, especially hydrologists. Despite the highly stochastic nature of these processes, the development of models capable of describing such complex phenomena is a growing area of research.
Groundwater, a major source of water supply for domestic, agricultural and industrial users and a main component of the hydrological cycle, plays a vital role in arid and semi-arid areas. In many such areas, groundwater is withdrawn much faster than it is recharged, leading to damaging environmental effects such as falling water levels, drying up of wells, deterioration of water quality, increased pumping costs and reduced well yields (Adamowski & Chan 2011). Effectively managing groundwater, predicting groundwater level fluctuations and quantifying these changes are therefore strategic hydrological issues.
Recent years have seen a significant rise in the number of scientific approaches applied to hydrologic modelling and forecasting, including the popular 'data-based' or 'data-driven' approaches. Such methods derive their mathematical equations not from the physical processes in the watershed but from an analysis of concurrent input and output time series (Solomatine & Ostfeld 2008). Artificial intelligence (AI) in particular has become increasingly important for predicting hydrological processes, owing to its efficiency in modelling complex physical processes from the data that describe them. Numerous studies have modelled groundwater level changes using AI (e.g., Adamowski & Chan 2011; Kisi & Shiri 2012; Fallah-Mehdipour et al. 2013; Maheswaran & Khosa 2013; Moosavi et al. 2013; Nourani et al. 2015; Seo et al. 2015; Sivapragasam et al. 2015).
Although AI is increasingly preferred and many studies have already used it to forecast groundwater level changes, the groundwater budget itself has not been forecast directly with this approach. The small number of studies on groundwater budget modelling via AI shows that this topic deserves attention.
Among the various conceptual and black-box models developed over this period, hybrid AI-based models have been among the most promising for simulating hydrologic processes (Nourani et al. 2014), and wavelet-AI models are an example of these methods. Genetic programming (GP), in turn, is an AI method based on a random iterative search for an appropriate relationship between input and output. Coupling wavelet analysis and GP can give highly precise predictions of hydrological processes; for instance, Nourani et al. (2012) linked wavelet analysis to GP to build a hybrid model that detects seasonality patterns in rainfall–runoff, and the hybrid model proved useful in forecasting runoff.
In this study, an attempt is made to model the variation of the groundwater budget in the Azarshahr Plain aquifer using a coupled wavelet-based artificial neural network (WANN) and GP model. This work differs from previously reported studies in that it emphasizes insight into groundwater budget change and links wavelet analysis and GP for groundwater modelling.
This study develops and applies two hybrid models, a WANN model and a hybrid WANN-GP model, for water budget forecasting in the Azarshahr Plain aquifer, East Azerbaijan, Iran. Statistical performance criteria are used to evaluate the developed models. Figure 1 shows the procedure of the present study.
The Azarshahr Plain is one of the Lake Urmia sub-basins and is located in East Azerbaijan province, northwest Iran. The study area is densely populated; all of its drinking, domestic and industrial water and 80% of its agricultural water are supplied from groundwater resources.
Azarshahr Chay, the main stream in the study area, originates from Sahand Mountain and rarely discharges into Lake Urmia because of percolation and evaporation losses and the diversion of water for irrigation. The average annual precipitation of the study area is about 221.2 mm for the long-term period 1982 to 2009, whereas the annual evaporation is about 1,500 mm.
The alluvial aquifer of the study area has long been known to be productive and has been extensively developed for public and agricultural water supplies (ATWA 2009). However, there is considerable evidence that groundwater in the Lake Urmia watershed is already being exploited faster than the aquifers are recharged (Wada et al. 2010). During the past decade, the study area has experienced groundwater depletion and a consequent reduction in reservoir volume, coinciding with population growth, marked climatic changes and over-extraction of water resources, causing environmental hazards. Figure 2 shows the aquifer domain and piezometer locations in the study area.
The wavelet transform (WT), a contemporary tool of applied mathematics, is a signal processing technique that has shown higher performance than the Fourier transform (FT) and the short-time FT in examining non-stationary signals. WT analysis, developed in the mathematics community in recent decades, appears to be a more effective tool than the FT for studying non-stationary time series (Partal & Kisi 2007). The principal advantage of WTs is their ability to provide information on both the time location and the frequency content of a signal simultaneously, whereas the FT gives only the frequency information.
The WT is implemented in discrete and continuous forms (DWT and CWT). In the CWT, the transform integral must be evaluated numerically at every scale considered, so calculating the wavelet coefficients at all scales is time-consuming and produces huge amounts of data. In other words, the CWT contains redundant and inefficient parts, which are its weak points (Adamowski 2007; Partal 2009), whereas the DWT avoids these drawbacks. Moreover, the DWT is an efficient choice for discrete data.
In the DWT, the original time series is passed through high-pass and low-pass filters (digital filtering), yielding time-scale components. The outputs of this filtering are detail coefficients and an approximation series, obtained with the wavelet algorithm (Zhang & Li 2001). Each time the procedure is repeated on the current approximation, a coarser approximation and a further detail are obtained.
Performing this transform divides the raw data into an approximation (A) and details (D). The approximation, obtained from the low-pass filter, contains the high-scale, low-frequency components of the signal; the details, obtained from the high-pass filter, contain the low-scale, high-frequency components.
Consequently, the DWT was used to decompose the groundwater level time series for the WANN models developed in this study.
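The filter-bank procedure can be illustrated with a minimal pure-Python sketch. It uses the Haar wavelet only because its low-pass and high-pass filters have two taps each; the mother wavelet used in this study is not assumed to be Haar, and the function names are hypothetical. Each level splits the current approximation into a half-length approximation and a half-length detail (Mallat's pyramid), and the inverse step recovers the original series exactly. For record lengths that are not multiples of 2^L, a library such as PyWavelets (`pywt.wavedec`) extends the signal at its boundaries instead of raising an error as this sketch does.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One DWT level: Haar low-pass and high-pass filtering, then
    downsampling by 2. Assumes len(x) is even."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_inverse_step(approx: List[float], detail: List[float]) -> List[float]:
    """Undo one level: upsample and apply the Haar synthesis filters."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Mallat pyramid: repeatedly split the current approximation.
    Returns the coefficients A_L and [D_1, ..., D_L]."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_idwt(approx: List[float], details: List[List[float]]) -> List[float]:
    """Rebuild the signal from A_L and [D_1, ..., D_L], coarsest level first."""
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx


if __name__ == "__main__":
    series = [10.0, 12.0, 11.0, 9.0, 8.0, 8.5, 9.5, 10.5]  # toy water levels
    a2, details = haar_dwt(series, 2)
    print(len(a2), [len(d) for d in details])  # 2 [4, 2]
```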
The WT is well suited to extracting significant and potentially useful information from data in the experimental sciences (predictions, reanalyses, global climate model simulations, etc.). By presenting this information in a readable form, it can be applied to analytical, classification or forecasting problems. In a review of the applications of the WT in hydrologic time series modelling, Sang (2013) highlighted the information that can be drawn from such analysis: characterization and understanding of the multiple temporal scales of hydrologic series, identification of seasonality and trends, and data de-noising. Consequently, the decomposing ability of the WT leads to a better interpretation of hydrological processes (Nason & Sachs 1999; Adamowski 2008; Adamowski et al. 2009; Kisi 2010; Mirbagheri et al. 2010; Sang 2012).
Since AI has shown promise in modelling and forecasting non-linear hydrological processes and in handling the dynamics and noise concealed in large datasets, a hybrid AI model was employed to simulate the water level in the piezometers precisely and to fill the statistical gaps in their long-term records. For this, the WT model was linked to an ANN, producing the WANN.
The WANN couples wavelet decomposition with an ANN: the details and approximations obtained by wavelet decomposition of a time series are used as inputs to the ANN. Figure 3 shows the general procedure of the WANN model used in this study.
In this study, the DWT with Mallat's (1998) algorithm was used to decompose the time series signal, namely the water table fluctuation in each piezometer, which serves as the original signal to be decomposed. The multi-resolution analysis by Mallat's algorithm generates approximations and details for a given signal: the approximation retains the general trend of the original signal, and the details capture its high-frequency components. The original signal is thus broken down into lower-resolution constituents. Nourani et al. (2009) proposed $L = \mathrm{int}[\log_{10}(N)]$ for choosing the number of decomposition levels, where $L$ is the decomposition level and $N$ is the number of time series data.
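The decomposition-level rule can be checked in a few lines. The sketch assumes a base-10 logarithm, which is consistent with the two levels used for the 141 to 249 monthly records reported in the results; `decomposition_level` is a hypothetical helper name.

```python
import math


def decomposition_level(n_samples: int) -> int:
    """Number of DWT levels from L = int[log10(N)] (base 10 assumed)."""
    if n_samples < 1:
        raise ValueError("n_samples must be a positive count")
    return int(math.log10(n_samples))


for n in (141, 249):  # shortest and longest piezometer records (months)
    print(n, decomposition_level(n))  # prints "141 2" then "249 2"
```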
An L-level DWT decomposes a signal x(t) into D1, D2, …, DL and AL, where D1 to DL are details and AL is the approximation. D1, D2, …, DL and AL are used as inputs to the ANN. The second step corresponds to the training and testing phases of the ANN.
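To show how the decomposed components can become ANN inputs, the sketch below computes a Haar multiresolution analysis, in which every component has the same length as the original series and D1 + D2 + A2 reproduces it exactly, and then arranges the components as input rows. Two assumptions are made that the text does not state: the Haar wavelet is used (its level-j approximation is simply the mean over non-overlapping blocks of 2^j samples), and each row [D1(t), D2(t), A2(t)] is paired with the next month's water level x(t+1) as the target. The helper names are hypothetical.

```python
from typing import List, Tuple


def block_mean(x: List[float], width: int) -> List[float]:
    """Replace each non-overlapping block of `width` samples by its mean.
    For the Haar wavelet this is the level-j approximation with width 2**j."""
    out: List[float] = []
    for start in range(0, len(x), width):
        block = x[start:start + width]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def haar_mra(x: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Return ([D1, ..., DL], AL), each as long as x, with
    x[t] == D1[t] + ... + DL[t] + AL[t] for every t."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    details: List[List[float]] = []
    previous = list(x)
    for level in range(1, levels + 1):
        approx = block_mean(x, 2 ** level)
        details.append([p - a for p, a in zip(previous, approx)])
        previous = approx
    return details, previous


def wann_inputs(x: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """ANN input rows [D1(t), ..., DL(t), AL(t)] and targets x(t + 1).
    The one-step-ahead target is an assumption, not taken from the study."""
    details, approx = haar_mra(x, levels)
    rows = [[d[t] for d in details] + [approx[t]] for t in range(len(x) - 1)]
    return rows, x[1:]
```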
The WANN algorithm is summarized as follows:
Step 1: Multilevel wavelet analysis using the DWT decomposes a signal into details (D1, D2, …, DL) and an approximation (AL), where L is the decomposition level. In this study, the water table time series were decomposed into details D1 and D2 and an approximation A2. The decomposition level was selected according to the number of water table data available for each piezometer (Table 1), using the formula $L = \mathrm{int}[\log_{10}(N)]$, where $N$ is the number of data, following Wang & Ding (2003), Partal & Kisi (2007) and Nourani et al. (2009).
Step 2: The ANN is trained and tested with the details and approximation as inputs, and the model performance is evaluated.
ᵃ Number of data for each piezometer (number of months).
GP is an artificial intelligence method based on a random iterative search for an appropriate relationship between input and output. Its programs are commonly represented as trees that express the equation: variables, functions and operators are placed in the nodes, which are linked together by branches.
GP is an evolutionary algorithm based on the Darwinian principles of natural selection and survival of the fittest. Before the algorithm is applied to the data, the user must set a number of GP parameters. The algorithm starts from an initial population of randomly generated equations, formed by randomly combining input variables, random numbers and functions. The functions can include arithmetic operators (plus, minus, multiply and divide), mathematical functions (sin, cos, exp, log), etc., which should be chosen based on some understanding of the process. This population is then subjected to an evolutionary process in which the fitness of each program is evaluated. The programs that best fit the data are selected, recombined through 'crossover' (exchanging parts of their information) and modified through 'mutation' to produce better programs, while programs that fit the data less well are discarded. This evolution is repeated over successive generations and is driven towards symbolic expressions describing the data, which can be scientifically interpreted to derive knowledge about the process being modelled (Sivapragasam et al. 2015).
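The evolutionary loop described above can be sketched as a toy symbolic-regression program in pure Python. It is not the GeneXproTools implementation (GeneXproTools implements gene expression programming, a GP variant with linear chromosomes); the function set, terminal set, truncation selection of the fitter half, 20% mutation rate, depth limit and RMSE fitness are all illustrative assumptions.

```python
import math
import operator
import random

# Function set; division is "protected" so that x / 0 returns 1.0.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
TERMINALS = ["x", 1.0, 2.0]  # the input variable and two constants


def random_tree(rng, depth):
    """Grow a random tree: a terminal or a tuple (operator, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree at input value x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree, data):
    """RMSE on (x, y) pairs; lower is better, non-finite results score inf."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    return math.sqrt(total / len(data)) if math.isfinite(total) else math.inf


def depth(tree):
    """Tree depth; terminals have depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0


def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a path lists child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from nodes(tree[1], path + (1,))
        yield from nodes(tree[2], path + (2,))


def replace(tree, path, new):
    """Copy of tree with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, mum, dad):
    """Subtree crossover: graft a random subtree of dad into mum."""
    path, _ = rng.choice(list(nodes(mum)))
    _, donor = rng.choice(list(nodes(dad)))
    return replace(mum, path, donor)


def mutate(rng, tree):
    """Replace a random subtree with a freshly grown one."""
    path, _ = rng.choice(list(nodes(tree)))
    return replace(tree, path, random_tree(rng, 2))


def evolve(data, pop_size=100, generations=25, max_depth=6, seed=0):
    """Keep the fitter half each generation and refill with offspring."""
    rng = random.Random(seed)
    population = [random_tree(rng, 4) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, data))
        parents = ranked[: pop_size // 2]  # the less fit half is discarded
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            child = crossover(rng, mum, dad)
            if rng.random() < 0.2:
                child = mutate(rng, child)
            children.append(child if depth(child) <= max_depth else mum)
        population = parents + children
    return min(population, key=lambda t: fitness(t, data))


if __name__ == "__main__":
    samples = [(float(x), float(x * x + 1)) for x in range(-5, 6)]
    best = evolve(samples)
    print(best, fitness(best, samples))
```

Because the fitter half is always carried over, the best fitness never gets worse from one generation to the next, and a fixed seed makes the run reproducible.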
In this study, the water tables simulated by the WANN were used as inputs to the GP model for time series prediction, as shown in the flowchart of the study (Figure 1); that is, the simulated water table of each piezometer was entered into the GP model, which then produced the forecast. GeneXproTools was used for the training, testing and 12-month-ahead forecasting of the water tables.
Correlation coefficient (CC), mean squared error, root mean squared error (RMSE), relative absolute error, mean absolute error (MAE), relative squared error and root relative squared error (RRSE) are the fitness functions offered by the GeneXproTools software, and RRSE is the default fitness function for time series prediction. The general GP model implementation and structure shown in Figure 4 were used in this study.
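For reference, the default RRSE criterion compares the model's squared errors with those of a naive model that always predicts the mean of the observations. The sketch below is a plain implementation of that definition; how GeneXproTools converts RRSE into its internal fitness score is not described here and is not reproduced. Since RRSE² = 1 − NSE when both use the observed mean, an RRSE of 0 corresponds to NSE = 1.

```python
import math
from typing import Sequence


def rrse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root relative squared error: sqrt(SSE / SST).
    0 is a perfect fit; 1 is no better than always predicting the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; RRSE is undefined")
    return math.sqrt(sse / sst)
```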
Groundwater budget calculation
After predicting the water level in the piezometers of the aquifer domain for the considered period, Thiessen polygons were drawn to determine the spatial domain represented by each piezometer (Figure 5).
The monthly forecasted water level of the aquifer is computed as the Thiessen-area-weighted average of the piezometer levels:
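Thiessen polygon areas can be approximated without a GIS by rasterizing the domain and assigning every grid cell to its nearest piezometer, which is exactly the nearest-neighbour rule that defines Thiessen polygons. This is an illustrative sketch; the rectangular bounding box, the optional `inside` mask standing in for the real aquifer boundary, the grid resolution and the function name are assumptions.

```python
from typing import Callable, Dict, Tuple


def thiessen_areas(
    points: Dict[str, Tuple[float, float]],
    bounds: Tuple[float, float, float, float],
    nx: int = 200,
    ny: int = 200,
    inside: Callable[[float, float], bool] = lambda x, y: True,
) -> Dict[str, float]:
    """Approximate Thiessen polygon areas by assigning each grid-cell centre
    to its nearest piezometer. bounds = (xmin, xmax, ymin, ymax); `inside`
    masks cells outside the aquifer boundary."""
    xmin, xmax, ymin, ymax = bounds
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    areas = {name: 0.0 for name in points}
    for i in range(nx):
        cx = xmin + (i + 0.5) * dx
        for j in range(ny):
            cy = ymin + (j + 0.5) * dy
            if not inside(cx, cy):
                continue
            nearest = min(
                points,
                key=lambda n: (points[n][0] - cx) ** 2 + (points[n][1] - cy) ** 2,
            )
            areas[nearest] += dx * dy
    return areas
```

A finer grid reduces the discretization error along polygon edges at the cost of run time.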
$$\bar{h}_m = \frac{A_1 h_{1,m} + A_2 h_{2,m} + \cdots + A_N h_{N,m}}{A_1 + A_2 + \cdots + A_N}$$
where $\bar{h}_m$ is the forecasted aquifer water level for month $m$, $A_i$ is the Thiessen polygon area of the $i$th piezometer, $h_{i,m}$ is the forecasted water level of the $i$th piezometer in month $m$, and $N$ is the number of piezometers.
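The area weighting translates directly into code, applied month by month to build the aquifer hydrograph. The function names are hypothetical, and the areas and levels in the example are made-up numbers, not values from Table 3.

```python
from typing import List, Sequence


def weighted_water_level(areas: Sequence[float], levels: Sequence[float]) -> float:
    """Thiessen-weighted mean: sum(A_i * h_i) / sum(A_i)."""
    if len(areas) != len(levels):
        raise ValueError("need exactly one polygon area per piezometer")
    return sum(a * h for a, h in zip(areas, levels)) / sum(areas)


def aquifer_hydrograph(areas: Sequence[float],
                       levels_by_piezometer: Sequence[Sequence[float]]) -> List[float]:
    """Apply the weighting month by month; levels_by_piezometer[i][m] is the
    forecast level of piezometer i in month m."""
    n_months = len(levels_by_piezometer[0])
    return [weighted_water_level(areas, [h[m] for h in levels_by_piezometer])
            for m in range(n_months)]


# Made-up example: two piezometers, polygon areas in km², levels in metres.
print(aquifer_hydrograph([1.0, 3.0], [[10.0, 11.0], [14.0, 15.0]]))  # [13.0, 14.0]
```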
In hydrological studies, the water budget of a groundwater reservoir is often needed when data on the inputs and outputs of the study area, such as precipitation and evaporation, are unavailable. In this situation, the groundwater budget can be evaluated from the fluctuation of the aquifer water table during the given period (here from 2012 to 2013). Knowing the difference between the water table at the beginning and end of the period, $\Delta h$, together with the storage coefficient $S$ and the area of the reservoir $A$, the change in groundwater volume can be calculated as $\Delta V = S \cdot A \cdot \Delta h$. This shows whether the reservoir gained or lost water during the given period.
RESULTS AND DISCUSSION
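As a worked check of the storage relation ΔV = S · A · Δh, the sketch below uses the aquifer area (81.65 km²) and storage coefficient (about 0.036) reported in the results section; the function names are hypothetical. A head change of −0.12 m gives a loss of about 0.353 MCM, while the reported 0.3557 MCM corresponds to a mean decline of about 0.121 m, so the 0.12 m figure is a rounded value.

```python
def storage_change_m3(area_km2: float, storage_coefficient: float,
                      head_change_m: float) -> float:
    """Change in stored groundwater, dV = S * A * dh, in cubic metres.
    A negative value means the aquifer lost water."""
    area_m2 = area_km2 * 1e6
    return storage_coefficient * area_m2 * head_change_m


def implied_head_change_m(area_km2: float, storage_coefficient: float,
                          volume_change_m3: float) -> float:
    """Invert the same relation: dh = dV / (S * A)."""
    return volume_change_m3 / (storage_coefficient * area_km2 * 1e6)


dv = storage_change_m3(81.65, 0.036, -0.12)
print(f"{dv / 1e6:.4f} MCM")  # -0.3527 MCM
dh = implied_head_change_m(81.65, 0.036, -0.3557e6)
print(f"{dh:.3f} m")  # -0.121 m
```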
Following Nourani et al. (2009) and considering the record length (10 to 20 years) of the piezometer water levels, two decomposition levels were used in the DWT. Thus, the original water level time series were decomposed into D1, D2 and A2, where D1 and D2 are details and A2 is the approximation, and D1, D2 and A2 were input to the ANN. The number of data used for the ANN varies from 141 to 249 monthly water levels, covering 10–20 years; 80% and 20% of the data were used for training and testing the ANN, respectively. The Levenberg–Marquardt algorithm was chosen as the training algorithm. Different numbers of hidden nodes were tried, and the optimal number varied between 2 and 8 across the ANN models. Training was stopped after 23 to 25 iterations, which gave the best results. Figure 6 depicts the testing results for the 13 piezometers in the aquifer domain. Piezometers 4, 8 and 14 had gaps of 1, 1 and 3 months, respectively, during the test period; the WANN filled these gaps and simulated the water levels throughout the test period.
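Two data-handling steps in this paragraph can be sketched: an 80/20 split that keeps the most recent months for testing, and filling missing observations with the WANN-simulated values. Whether the study split the records chronologically, and how it rounded the split point, are assumptions; `None` marks a missing month, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def chronological_split(series: Sequence[float], train_fraction: float = 0.8
                        ) -> Tuple[List[float], List[float]]:
    """Split without shuffling: the earliest part trains, the latest part tests."""
    cut = round(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def fill_gaps(observed: Sequence[Optional[float]],
              simulated: Sequence[float]) -> List[float]:
    """Replace missing months (None) with the simulated value for that month."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated series must align month by month")
    return [s if o is None else o for o, s in zip(observed, simulated)]


train, test = chronological_split([float(t) for t in range(141)])
print(len(train), len(test))  # 113 28
```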
The WANN simulation results were imported into the GP model to predict the water level of each piezometer 12 months ahead. This 12-month-ahead forecasting was done with GeneXproTools version 4.0 for each piezometer (Figure 7). Figure 7 shows the long-term simulation of the water level, and the enclosed part of each graph shows the predicted water level of each piezometer for the next 12 months; these predicted values were used for the groundwater budget calculations.
Model performance was evaluated with statistical criteria (R², RMSE, MAE and Nash–Sutcliffe efficiency (NSE)), and the results are shown in Table 2.
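The text does not state how GeneXproTools produces the 12-month-ahead values. One common approach, assumed here, is a sliding window of past values for training and a recursive forecast in which each prediction is fed back as an input for the next month. The `model` argument stands in for an evolved GP expression, and `toy_model` and its coefficients are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window(series: Sequence[float], window: int
                   ) -> List[Tuple[List[float], float]]:
    """Training pairs: the previous `window` values predict the next value."""
    return [(list(series[t - window:t]), series[t])
            for t in range(window, len(series))]


def recursive_forecast(history: Sequence[float],
                       model: Callable[[List[float]], float],
                       window: int, horizon: int = 12) -> List[float]:
    """Forecast `horizon` steps, feeding each prediction back as an input."""
    buffer = list(history[-window:])
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = model(buffer[-window:])
        forecasts.append(next_value)
        buffer.append(next_value)
    return forecasts


def toy_model(window_values: List[float]) -> float:
    """Hypothetical stand-in for an evolved GP expression (two-month window)."""
    return 0.6 * window_values[-1] + 0.4 * window_values[-2]


print(recursive_forecast([1350.2, 1350.0, 1349.8], toy_model, window=2, horizon=3))
```

Errors accumulate with this strategy because later months depend on earlier predictions rather than observations.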
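The four criteria can be computed as below. R² is taken here as the squared Pearson correlation between observed and simulated values, a common convention that is assumed rather than stated in the text; the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate_fit(observed: Sequence[float],
                 simulated: Sequence[float]) -> Dict[str, float]:
    """R² (squared Pearson correlation, assumed), RMSE, MAE and NSE."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    residuals = [s - o for o, s in zip(observed, simulated)]
    sse = sum(r * r for r in residuals)
    sst = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    return {
        "R2": cov * cov / (sst * var_s),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "NSE": 1.0 - sse / sst,
    }
```

For example, simulated values [2, 3, 4, 5] against observations [1, 2, 3, 4] give R² = 1 but NSE = 0.2, because a constant offset does not affect the correlation; reporting NSE alongside R² exposes such bias.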
Clearly, the greater the fluctuation of the water level, the lower the performance of the model. The lowest performance was obtained for piezometers 8, 11, 12 and 21, possibly because of their locations. For example, piezometer No. 8 is located near the road and belongs to a tubing manufacturer that uses groundwater for its production, so its water level oscillates. Piezometers 11 and 12 are near the river and are possibly affected by river fluctuations. Thus, where water levels fluctuate strongly, the model performs with lower accuracy and larger errors. Table 3 shows the predicted water levels for the 13 piezometers and the Thiessen polygon areas used for computing the groundwater budget.
ᵃ No. = piezometer number.
ᵇ H1…H12 = water level from the 1st to the 12th month.
The unit hydrograph was then constructed from the predicted water levels in the piezometers and the Thiessen polygons (Figure 8). The predicted unit hydrograph shows the regular fluctuation of the aquifer water table from September 2012 until August 2013, and Table 4 lists the predicted water table of the aquifer domain during this period. The groundwater level falls in the dry months and rises in the wet months, such as March, April and May.
Applying Equation (7) to the predicted water table of the water budget domain, with an aquifer domain area of 81.65 km² and a storage coefficient of about 0.036 (derived from detailed data on pumping wells and qanat discharges in the Azarshahr Plain collected by the Azerbaijan Territorial Water Association (ATWA) in 2009), the reservoir volume, i.e. the groundwater budget of the study area, is predicted to decrease by about 0.3557 million cubic metres (MCM) during the prediction period (September 2012 to August 2013).
This paper has given an account of the widespread use of AI in the simulation and prediction of environmental processes. A hybrid wavelet–ANN model was used to simulate the water table, and GP was then used to forecast the future water table, in order to determine the changes in the groundwater reservoir of the study area over the forecast period. The performance assessment shows satisfactory results, confirming the accuracy of the hybrid AI models; both the WANN hybrid model and the GP model demonstrated their ability to simulate and predict natural events. The next year's water budget was estimated with the Thiessen polygon method from the area of the aquifer domain, its storage coefficient and the forecasted fluctuation of the aquifer water table. The groundwater reservoir is predicted to lose about 0.356 MCM of its storage during the prediction period, which underlines the importance of monitoring the groundwater resource; the predicted water budget can be taken into account in future environmental plans.