Reliable temporal modelling of groundwater level is significant for efficient water resources management in hydrological basins and for the prevention of possible desertification effects. In this work we propose a stochastic method of temporal monitoring and prediction that can incorporate auxiliary information. More specifically, we model the temporal (mean annual and biannual) variation of groundwater level by means of a discrete time autoregressive exogenous variable (ARX) model. The ARX model parameters and its predictions are estimated by means of the Kalman filter adaptation algorithm (KFAA) which, to our knowledge, is applied for the first time in hydrology. KFAA is suitable for sparsely monitored basins that do not allow for an independent estimation of the ARX model parameters. We apply KFAA to time series of groundwater level values from the Mires basin in the island of Crete. In addition to precipitation measurements, we use pumping data as exogenous variables. We calibrate the ARX model based on the groundwater level for the years 1981 to 2006 and use it to predict the mean annual and biannual groundwater level for recent years (2007–2010). The predictions are validated with the available annual averages reported by the local authorities.
INTRODUCTION
Groundwater is a significant source for the functions of a watershed. It replenishes the surface waters, supports the ecosystems viability and is a primary source for drinking and agricultural purposes. Groundwater level reduction affects the base flow in rivers, evapotranspiration rates, and water quality, as well as human welfare, agricultural and industrial activities (Famiglietti 2008). Examination and modelling of temporal trends of groundwater monitoring wells or of basin averages provide useful information about the aquifer temporal response to different meteorological (e.g. precipitation extremes) or anthropogenic effects (groundwater over-exploitation). Therefore, the modelling of temporal variations of the groundwater level provides a useful management tool in sparsely monitored basins that helps to assess future trends (Margat & Van der Gun 2013).
In the case of temporal estimation it is desirable to formulate a predictive model of groundwater table level that can incorporate various physical variables that determine the groundwater level, such as meteorological data (e.g. precipitation), evapotranspiration, runoff and water usage. A stochastic model is required due to the considerable uncertainty of certain variables (e.g., evapotranspiration), the spatial variability of conditions within the basin, the sparse nature of sampling in space-time and the inadequacy of available measurements. For example, water usage is estimated based on data from the official boreholes, whereas an unspecified number of unregulated boreholes operate in the area. Any physically-based model becomes stochastic once its inputs, parameters or outputs are treated as random functions (Christakos 2000).
For water management purposes, it is important to have information about water table level and especially about the dynamic relationship between water table level and hydro-meteorological variables that affect the water table level variation. Water table fluctuations have been successfully modelled using empirical time series to describe water table dynamics. In several studies linear transfer-function noise models (Knotters & Van Walsum 1997; Van Geer & Zuur 1997; Von Asmuth & Knotters 2004; Yi & Lee 2004; Von Asmuth et al. 2008; Fabbri et al. 2011) have been applied. Additionally, an approach based on model-predefined impulse response function in continuous time (Von Asmuth et al. 2002) has been used to model the dynamic relationship of precipitation surplus and water table level at basin scale (Manzione et al. 2010, 2012). Yet, an empirical model to simulate daily water table levels based on hydro-meteorological variables was proposed by Morgan & Stolt (2009), while an empirical time series predictive tool that incorporates daily precipitation was used to model the fluctuations of the water table depth in an alluvial aquifer in Italy (Calzolari & Ungaro 2012).
Data driven techniques have been widely developed to deal mainly with complex non-linear physical phenomena. However, most of them ignore the physical base of the problem. And yet, in many cases have been successfully applied in groundwater level forecasting e.g. genetic programming (GP) and neural fuzzy inference system (Fallah-Mehdipour et al. 2013), evolutionary polynomial regression (EPR) (Giustolisi et al. 2008), support vector machines (SVM) and wavelet transforms (Suryanarayana et al. 2014), and self-organizing maps (SOM) (Chen et al. 2010). Trained artificial neural network (ΑΝΝ) models have also been applied to model the water table temporal variability at various catchments, e.g. Trichakis et al. (2009, 2011) and Tapoglou et al. (2014), as well as at a single well at the study basin of this work (Daliakopoulos et al. 2005; Tsanis et al. 2008). ANN models in several occasions have been combined with autoregressive exogenous variable (ARX) functions, applied in this work, to enhance groundwater level predictions (e.g. Giustolisi & Simeone 2006; Izady et al. 2013).
One can find advantages and disadvantages in all data driven techniques depending on the modelled phenomenon and the available data characteristics: data size, extreme values, statistical distribution, data value range and time span. The major advantage of machine learning techniques is their capability to model complex non-linear phenomena. The general disadvantage of all the machine learning techniques is related to the loss of physical interpretation and the complexity of their structure, the a priori requirements and computational complexity (Todini 2007). Other drawbacks of ANN, SOM and GP are the calculation of locally optimal values, constants miscalculation and overfitting. The biggest disadvantages of the SVM and Wavelet Transforms lies in the choice of the right kernel (or wavelet) and on the slow training section (Abe 2005). EPR have provided an improved framework of operation compared to the classic machine learning methods, it searches via genetic algorithms for the best form of the function structure and like the other methods requires the dataset to be as large as possible to capture the behaviour of the modelled phenomenon. Finally, the major disadvantage of machine learning techniques is the requirement of extensive data. Small datasets are a drawback especially for the training section as a significant number of parameters need to be determined (Giustolisi & Savic 2006; Giustolisi et al. 2008; Solomatine & Ostfeld 2008; Anguita et al. 2010; Remesan & Mathew 2015).
ARX models, on the other hand, combine simple predictive models with a linear threshold function without hidden layers and a priori requirements. They are less complex than the machine learning methods and under specific conditions, optimal process interpretation, can be equally accurate. Thus, the application of a model that can physically explain the relationship between the groundwater level and hydrological variables is generally preferred to simulate and predict the groundwater level temporal variability. Such a model is the ARX. Furthermore, in this work the available datasets are small. The ARX model embedded in a Kalman filter has an advantage over other data driven techniques when small datasets are available due to the recursive update of the system parameters and of the predictions, conditionally on observed data, as the data are successively processed.
ARX models proposed by Knotters & Bierkens (2000) have also proved to be useful tools for modeling the dynamic relationship between meteorological variables and water table level (Knotters & Bierkens 2001; Webster & Heuvelink 2006). An extension of ARX models involves the regionalised version of the ARX model (Bierkens 2001; Bierkens et al. 2001; Knotters & Bierkens 2002).
This study aims to model the temporal variability of the mean annual groundwater level in a sparsely monitored basin of great interest for the welfare of the local community by exploiting the available field data and maximizing the possible information gain. We adopt the discrete time ARX model for the variation of the water table level (Knotters & Bierkens 2001, 2002). The term exogenous denotes that the model equations incorporate information from other variables besides water table level. In particular, the ARX discrete-time model herein incorporates information about the time variability based on the available time series of groundwater level, precipitation measurements and abstraction rates. The ARX model is embedded in a Kalman filter which enables the estimation of parameters and the prediction of the optimal groundwater level (Ljung 1999).
Similar works that involve ARX model assume prior knowledge of the model parameters from field monitoring, or estimate the parameters initially from available field data, or from calibration at a nearby location, or a different time period (Bierkens 2001; Bierkens et al. 2001; Webster & Heuvelink 2006; Yuan et al. 2008). In the Kalman filter approach proposed herein the ARX model parameters are estimated and optimized using the time series of the water table level. The difference of this approach from other works is that the ARX model parameters and the output of the state equation are simultaneously estimated by means of the Kalman filter adaptation algorithm (KFAA) (Ljung 1999; Lanzi et al. 2006). The initial values of the ARX model parameters are randomly selected due to lack of field information. To our knowledge, this approach is used for the first time in a hydrological application.
The ARX model has been originally used with a time step equal to one day (Bierkens et al. 2001). Herein we apply the model with a considerably larger time step equal to one year and biannually, since we are limited by the available measurements. Hence, we employ a coarse-grained version of the original model which has the same form due to the linearity of the ARX equation. We also include the impact of anthropogenic activities, i.e., pumping data, in the ARX model. As we show below, the model predictions are in very good agreement with the data after the initial period of parameter adaptation. The temporal variation of the mean annual and biannual groundwater level allows us to assess the aquifer's behavior during the last 30 years and its response to precipitation and pumping.
STUDY AREA
Porosity decreases with depth below the surface in the range 0.05–0.12 whereas effective porosity, which is the percentage of pores (interconnected) that are available for fluid flow has been determined to be 0.085 in the Mires Basin. High values of hydraulic conductivity are present in the Mires basin, where it varies between 10 and 120 m/day reflecting the presence of several gravel and sand horizons in the alluvial sequence. Lower values occur in the northern side of the Mires basin, where lower Pleistocene rocks are present. The significant groundwater level drop in the past 30 years (>35 masl) has affected the hydraulic characteristics of the aquifer presenting a 25% average reduction in hydraulic conductivity values.
The basin's aquifer is characterized in hydro-geological reports as unconfined. Aquifer storage coefficients, obtained from pumping tests, in conjunction with the behavior of the water table (free surface existence) suggests, that although heterogeneous and locally confined, the aquifer behaves at the regional scale as an unconfined unit. A detailed hydro-geological description of the Mesara catchment including Mires basin is presented in Varouchakis (2016).
The spatial variability of the groundwater level in the basin has been studied using stochastic and deterministic geostatistical methods (Varouchakis & Hristopulos 2013a), applying non-linear data normalization techniques in combination with Kriging (Varouchakis et al. 2012) and by developing spatial trend models based on topographical variables and physical laws in association with Residual Kriging (Varouchakis & Hristopulos 2013b; Varouchakis et al. 2016a).
A correlation analysis, in terms of correlation coefficient R, shows regarding the annual values that groundwater level and precipitation have an R = 0.58, with p-value: 0.0007, groundwater level and pumping have: R = −0.52, p-value: 0.003 while precipitation and pumping: R = −0.70, p-value: 0.000017. The p-value assesses the hypothesis of no correlation. If p-value is small, i.e. less than 0.05, then the correlation coefficient R is significant. R values denote an important correlation for an interval of 0.5–0.70 and a strong correlation for the interval 0.7–0.9 (Tichy 1993). In addition, the correlation analysis of the sub-annual data provide the following results: groundwater level and precipitation have R = 0.76, p-value: 1 × 10−6, groundwater level and pumping have: R = −0.68, p-value: 1 × 10−6 while precipitation and pumping: R = −0.81, p-value: 1 × 10−6.
A hydrogeological tool (Haitjema & Mitchell-Bruker 2005) has been also applied to the basin (Varouchakis et al. 2016a) to examine if the aquifer is recharge or topography controlled. The equation that describes the tool involved in the numerator the average annual recharge or infiltration rate, i (m/d) and the square of the average distance between surface waters, L (m). In the denominator, a factor m between 8 and 16 for aquifers that are strip like or circular in shape respectively, the (horizontal) aquifer hydraulic conductivity, K (m/d), the aquifer thickness, H (m) and the maximum distance between the average surface water levels and the terrain elevation, d (m). If the ratio is greater than one then the aquifer is topography controlled while if less than 1 then it is recharged controlled. Solving the equation using the basin's hydrogeological data (Varouchakis 2016) the calculated ratio is equal to 0.7 (Varouchakis et al. 2016a) and thus less than 1. This result means that the aquifer is recharge controlled. Therefore, precipitation surplus is affecting the water levels by recharging the aquifer.
METHODS
Background of ARX groundwater level model
The ARX model introduced by Bierkens et al. (2001) and Knotters & Bierkens (2001) is a linear time series model that relates explicitly the precipitation surplus with water table level. The parameters of ARX model are physically motivated (Knotters & Bierkens 2001). At locations where sufficiently long time series of water table levels are available, the ARX parameters are obtained from the model calibration process which minimizes the mean square error between the measured and the modelled values.
Equation (1) has a physical background. It describes the water table variations for discrete time steps assuming zero surface runoff and a linear relation between the water table level and drainage from the groundwater zone to the surface water. Besides precipitation surplus, other variables as well can be incorporated in the model and treated as exogenous (e.g. pumping rates, surface water levels) (Bierkens et al. 2001; Knotters & Bierkens 2001). It is assumed that the precipitation surplus is a global, space-invariant variable. This assumption is reasonable for relatively small areas, such as the Mires basin.
ARX groundwater level model for the Mires basin
We apply the ARX model to simulate the mean annual and biannual groundwater level of the basin. We use a recursive discrete time ARX model, which we embed in a discrete-time Kalman filter to estimate the model parameters and predict the optimal mean annual and biannual groundwater level.
The actual evapotranspiration is an important variable for the model, because it enters in the estimation of the precipitation surplus. The mean annual actual evapotranspiration over the island of Crete is 70% of the mean annual precipitation (Department of Water Resources Management 2000). In areas where the surface elevation is less than 300 metres above sea level it is 75% and in the Mesara valley has been estimated at 65% (Croke et al. 2000). The Mires basin though belongs to the Mesara watershed and it has elevation lower than 300 m. In addition, according to a recent work 75% evapotranspiration was found appropriate for modelling purposes at Mires basin (Varouchakis et al. 2016b).
Kalman filter identification of ARX model
The calculations can be divided into two steps. First, Equation (7) is used to predict the state at time based on measurements up to time . Secondly, at time as the new measurement becomes available the prediction is corrected by means of Equation (10). This yields the optimal linear estimate for the state at time based on the measurements up to time . The matrices F, B, M, Q and R must be known. The matrices F and B, however, are functions of the system parameters. These parameters are not known a priori for the ARX model in the water table level application. Therefore, the Kalman filter cannot be directly applied. The following set of equations describes the KFAA that estimates recursively the parameters and the state equation output of an ARX model (Ljung 1999).
KFAA of ARX model
where is the prediction of and is the (n1) regression vector that represents the gradient of with respect to the parameters . Since the true system parameters are unknown a priori it is assumed that . The estimation algorithm minimizes the mean square prediction error, , which means that Equation (15) is solved for all time steps using the parameters .
In the above equations, is the (nn) parameter error covariance matrix at , is the residual (innovation) covariance during the parameters update process, and is the scalar variance of the innovations in Equation (14), i.e. .
The Kalman filter algorithm is entirely specified by the sequence of data , the gradient , the parameter covariance matrix , the variance of the innovations , the true parameters or an initial guess, and the parameter error covariance matrix . The recursive estimate of the parameters and the output of the ARX model is implemented in the Matlab® programming environment using originally developed code.
For the specific application, is the (41) vector of the parameter estimates (a, b, c and d) of the state Equation (6) at time , is a (41) vector, is the (41) regression vector corresponding to each variable input involved in the state Equation (6) (, unit value for parameter c), the (44) covariance matrix used in the Kalman gain, and is a (44) parameter error covariance matrix.
RESULTS AND DISCUSSION
The modelling of temporal groundwater level fluctuations for the Mires basin aims to analyse the aquifer behaviour in conjunction with physical quantities (precipitation, pumping) that affect the aquifer's water table level. Useful information therefore can be extracted from the relationship of mean annual groundwater level, annual precipitation and annual pumping as well as from biannual groundwater level, precipitation and pumping for future predictions.
The data used for the model calibration span the interval from the hydrological years 1980–1981 to 2005–2006 as during the last periods of the time span the extreme values of the dataset are present. The groundwater level is predicted consecutively for the hydrological years 2006–2007 to 2009–2010 and during time windows of four, three and two years. The model initially is run separately to predict the groundwater level for each year period as the number of time steps is small. Therefore, each time the model is run the optimal parameters of the system at the last time step are calculated. Based on these parameters the groundwater level prediction at the next time is provided considering the corresponding precipitation surplus and pumping. Every prediction is then added consecutively in to the model in order for the estimate at the next time period to be performed. Therefore, the system for each tested time period is updating the optimal parameters. Similarly, the model is run for longer time windows in order for the efficiency of the model to be assessed.
Period . | 2007/08–2009/10 . | 2008/09–2009/10 . | 2006/07–2009/10 . |
---|---|---|---|
3-year window . | 2-year window . | single-year window . | |
MAE (m) | 2.86 | 1.83 | 0.93 |
MARE | 0.25 | 0.16 | 0.09 |
Period . | 2007/08–2009/10 . | 2008/09–2009/10 . | 2006/07–2009/10 . |
---|---|---|---|
3-year window . | 2-year window . | single-year window . | |
MAE (m) | 2.86 | 1.83 | 0.93 |
MARE | 0.25 | 0.16 | 0.09 |
Period . | 2007/08-2009/10 . | 2008/09-2009/10 . | 2006/07-2009/10 . |
---|---|---|---|
3-year window . | 2-year window . | single-year window . | |
MAE (m) | 1.86 | 1.65 | 0.81 |
MARE | 0.17 | 0.14 | 0.07 |
Period . | 2007/08-2009/10 . | 2008/09-2009/10 . | 2006/07-2009/10 . |
---|---|---|---|
3-year window . | 2-year window . | single-year window . | |
MAE (m) | 1.86 | 1.65 | 0.81 |
MARE | 0.17 | 0.14 | 0.07 |
In this work, mean annual values have been used mainly because of pumping data shortage. Modelling of groundwater level at annual scale is useful because it can help to combine the proposed model with climate data models that usually provide information at annual scale (Tsanis et al. 2011; IPCC 2014). In addition, many water resources management reports regarding catchment management refer to actions at annual scale (Chen et al. 2002; Barthel 2011; Bozzola & Swanson 2014).
However, biannual groundwater level and precipitation data (including precipitation surplus) are available, while biannual pumping can be calculated as has been previously explained. Therefore, the proposed model can also be set at a biannual basis. The sub-annual model formation provides a more detailed approach to the groundwater level fluctuations at the basin and a clearer assessment of the proposed model efficiency and applicability.
Period . | 2006/07–2009/10 . | 2006/07–2009/10 . |
---|---|---|
4-year window . | single-year window . | |
MAE (m) | 2.65 | 1.72 |
MARE | 0.27 | 0.20 |
Period . | 2006/07–2009/10 . | 2006/07–2009/10 . |
---|---|---|
4-year window . | single-year window . | |
MAE (m) | 2.65 | 1.72 |
MARE | 0.27 | 0.20 |
Period . | 2006/07–2009/10 . | 2006/07–2009/10 . |
---|---|---|
4-year window . | single-year window . | |
MAE (m) | 1.45 | 0.92 |
MARE | 0.20 | 0.14 |
Period . | 2006/07–2009/10 . | 2006/07–2009/10 . |
---|---|---|
4-year window . | single-year window . | |
MAE (m) | 1.45 | 0.92 |
MARE | 0.20 | 0.14 |
At an annual basis the differences might be small (Tables 1 and 2) but they are still significant for prediction accuracy. The model form with the abstraction term is more accurate in terms of MAE by 54% for the three-year window predictions, 11% for the two-year window predictions and 14% for the single-year window predictions compared to the original model form. On the other hand, at the biannual basis the calibration period (Figures 6 and 7) has a similar behaviour with the calibration period of mean annual levels. The four-year window predictions (Tables 3 and 4) were improved from the presence of the pumping term in the model by 82% while, the single-year window predictions by 84%. Similarly, this applies for the three- and two-year window predictions where an improvement of 75% occur. As observed in Figures 3–7, as the time window becomes shorter the predictions also become more accurate. However, the ARX model with the support of KFAA can provide accurate results for longer time windows. According to the results it is clearly obtained that the pumping term addition in the ARX model improves the model's efficiency and the accuracy of results especially in the longer prediction windows and with the biannual data.
The general aim of this paper is to introduce the KFAA in hydrological applications. The specific goal is to model the Mires basin aquifer temporal response. The ARX model Equation (6) used for the mean annual and biannual groundwater level modelling delivers accurate predictions as evidenced by the low validation errors (Tables 1–4). The model is calibrated for the period 1981–2006 whereas the period 2007–2010 is used as the validating period. The ARX model is embedded in a discrete-time Kalman filter, which is calibrated adaptively up to the period preceding the prediction window. We include in the calibration period of the ARX model the recent extreme groundwater level fluctuations (year 2002–2003) in order to calibrate the model more efficiently. As observed in Figures 3–7 the accuracy of the estimates improves as more of the training data are processed by the method whereas the recent extremes are captured reasonably well by the model.
According to Equation (6), the mean annual groundwater level value estimate at time depends on groundwater level measured at the previous step, rainfall and/or abstraction volume for the specified prediction period , and the estimated parameters that describe the aquifer dynamic response. The optimal parameter values are adaptively updated as new data values become available. The validation results of the two ARX model forms show that the predictions for the years beyond 2006 follows closely the groundwater level trend providing accurate results for the different time windows assessed. In addition, the predictions can be used in place of measurements to calculate the parameters for future predictions.
ARX estimates will be mostly reliable for short time periods up to a year ahead for future predictions, as long term or even short term predictions of rainfall intensity involve high uncertainty. In addition the abstraction volume rate is also uncertain, because it depends on anthropogenic activities that cannot be accurately predicted for longer than a year. The tables of results and the figures show that the incorporation of the abstraction rate in Equation (6) improves the estimates. The improvement might be small in the single year prediction window of annual data, but considerable for biannual and longer prediction periods. Overall, the model adaptation is better when the abstraction rate is applied.
The initial data analysis showed that the precipitation is primarily correlated to the groundwater level while pumping has a lower correlation. However, the declining trend results from the extensive pumping and the poor recharge, while the effect of pumping along with the precipitation on the groundwater level is combined. Therefore, the proposed model is trained for an initial period that includes the extremes of all the variables involved, in order for the parameters of the model to be appropriately determined after every update so as to follow the best possible primary variable trend. The correlation with the groundwater level might be lower compared to precipitation, but the pumping variable helps the model to better capture the level's trend. If the correlation of the groundwater level was favoured to pumping and the model did not capture well the groundwater level with its original form, then the addition of the pumping variable would have more clearly improved the estimations.
The annual precipitation overall has declined at a rate of around 5.5 mm/yr over the past 30 years, while pumping has an increasing trend of 0.22 Mm3/yr. This means that a groundwater deficit of almost 7 Mm3 has been created in the basin and therefore a declining trend exists. The latter value agrees also with the findings of the recent water resources management report of the region of Crete for the area (Special Water Secretariat of Greece 2015).
The strong negative correlation between rainfall and pumping leads to a similar effect to the groundwater level as both affect the groundwater level in the same direction. Increased rainfall and decreased pumping both lead to groundwater level increase, while the opposite also applies. Thus, the original model was trained first to capture the declining trend according to the rainfall variability and then pumping was added which follows closely the opposite of precipitation trend. Therefore, because of this correlation, similar but improved results (effect) are expected from the additional use of the pumping variable, as indeed have been observed. The use of pumping improves the model results by capturing more accurately the groundwater level trend and this is presented clearer in the modelling results with the biannual data. This is also the reason to include other auxiliary variables at the original equation (Bierkens et al. 2001). The falling trend is already captured by the use of the rainfall and the use of a correlated variable, such as pumping, reinforces the predicting capability of the model.
Finally, future ARX predictions should involve mainly reliable rainfall estimations from climate models or from statistical approaches. In addition, improvement of ARX model predictions in longer time windows should involve more data on shorter time scales and knowledge of potential pumping activity of unauthorized wells.
CONCLUSIONS
This work was initially motivated by the dramatic decrease in recent years of groundwater levels in Mesara valley due to overexploitation. In light of this development, the sparse monitoring of the basin and the expected adverse effects of climate change on water resources accurate temporal modeling of the groundwater level is necessary.
We use an ARX time series model to relate mean annual and biannual groundwater level to respective precipitation surplus and abstraction rates. The ARX model is embedded in a Kalman filter under the adaptation algorithm functionality to adaptively estimate the model parameters. After initial considerable fluctuations, the ARX model accurately tracks the level's temporal evolution. The ARX model estimates become progressively more accurate as more data are incorporated, and the model parameters are recursively refined during each update. The recursive nature of the parameter inference procedure implies that the model becomes more accurate as the length of the time series increases. As shown, the groundwater level in the Mires basin exhibits a definite declining trend. The model captures this trend accurately for both time series, captures the level's extremes and accurately predicts the groundwater level for the time periods 2007–2010. A reliable prediction of future groundwater levels for the Mires basin can be used to develop sound management plans for the exploitation of the groundwater resources in the area. For example, the impact of different scenarios for pumping activity and precipitation trends can be investigated. Model predictions should be incorporated with realistic estimates of the water needs of the area, leading to management recommendations such as control of pumping rates, especially during years when the groundwater level is expected to drop.