## Abstract

Groundwater level prediction and forecasting using univariate time series models are useful for effective groundwater management under data limiting conditions. The seasonal autoregressive integrated moving average (SARIMA) models are widely used for modeling groundwater level data as the groundwater level signals possess the seasonality pattern. Alternatively, deseasonalized autoregressive and moving average models (Ds-ARMA) can be modeled with deseasonalized groundwater level signals in which the seasonal component is estimated and removed from the raw groundwater level signals. The seasonal component is traditionally estimated by calculating long-term averaging values of the corresponding months in the year. This traditional way of estimating seasonal component may not be appropriate for non-stationary groundwater level signals. Thus, in this study, an improved way of estimating the seasonal component by adopting a 13-month moving average trend and corresponding confidence interval approach has been attempted. To test the proposed approach, two representative observation wells from Adyar basin, India were modeled by both traditional and proposed methods. It was observed from this study that the proposed model prediction performance was better than the traditional model's performance with R^{2} values of 0.82 and 0.93 for the corresponding wells' groundwater level data.

## INTRODUCTION

Hydrological systems are widely studied using models that describe and relate various physical processes. Conceptual or physical models are the key tools to understand and predict the variable responses of the system (Anderson & Woessner 1992). However, most of the time, the absence of adequate physical parameters and poor understanding of the physical system may lead to poor modeling of the system. Hence, stochastic modeling that preserves the statistical relationship within the past observations is often used to predict/forecast the response of a system. A time series model is an empirical model for stochastically simulating and forecasting the behavior of uncertain hydrologic systems (Chatfield & Weigend 1994; Sannasiraj *et al.* 2004; Kim *et al.* 2005). This kind of model is often useful to model any hydrological system such as groundwater system for groundwater level fluctuation modeling and forecasting where there is limited availability of additional auxiliary data sets.

In general, stochastic time series models can be classified into two kinds: univariate and multivariate models. Univariate models consist of single variable series which can be modeled by autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA) group of models (Young 1999; Ahn 2000; Bidwell 2005; Adamowski & Chan 2011; Pektas & Cigizoglu 2013; Bayer *et al.* 2017), or traditional decomposition-based time series models (Peng & Liu 2000; Yang *et al.* 2009; Lu *et al.* 2013; Moeeni *et al.* 2017). On the other hand, multivariate models involve two or more input variables and their dynamic relationships with the output can be modeled by transfer function-noise (TFN) modeling technique. For example, TFN models predict the groundwater level response with respect to rainfall pulses (Tankersley *et al.* 1993; Bierkens *et al.* 1999; Bidwell 2005; Yu & Chu 2012; Shirmohammadi *et al.* 2013; Mohanasundaram *et al.* 2017). Multiple linear regression (MLR) techniques for predicting groundwater level involve the process of correlating the groundwater levels with one or more input variables which are causing the groundwater level responses (Sahoo & Jha 2015; Seeboonruang 2015; Seo *et al.* 2015). The application of artificial neural networks (ANN) and wavelet decomposition and the combination of ANN and wavelet decomposition models are also widely used for predicting groundwater level signals (Coulibaly *et al.* 2001; Coppola *et al.* 2003; Daliakopoulos *et al.* 2005; Kisi 2010; Mohanty *et al.* 2010; Adamowski & Chan 2011; Sreekanth *et al.* 2011; Maheswaran & Khosa 2013; Maiti & Tiwari 2014; Khaki *et al.* 2015; Nourani *et al.* 2015). However, the accuracy of the prediction and forecasting efficiency of the univariate time series model is mainly dependent on the proper conceptualization, model identification, and parameter estimation. Therefore, time series models for groundwater level signals must be considered for the seasonal component properly as monthly groundwater level signals are mostly influenced by periodicity or seasonality.

Seasonal autoregressive integrated moving average (SARIMA) is one of the varieties of ARIMA models. SARIMA models are especially applied to time series data having a strong seasonality component. Seasonal component is the significant component in the hydrological time series variables such as rainfall, stream flow, and groundwater level time series data. In general, finding suitable ARIMA and SARIMA models for modeling a univariate system is a trial and error process to find the suitable model structure with appropriate model parameters based on Box and Jenkins time series methodology (Box & Jenkins 1976). There have been many studies illustrating the application of SARIMA models for flow modeling and forecasting the flow in river systems (Govindasamy 1991; Mondal & Wasimi 2006; Modarres 2007; Wong *et al.* 2007; Fernández *et al.* 2008; Martins *et al.* 2011). Comparison of ARIMA models against deseasonalized ARMA (DARMA) models for weekly, monthly, and bi-monthly time series of karstic flow showed that DARMA model forecasting performance was better for the bi-monthly and monthly time series flow data, whereas the ARIMA model performed only slightly better than DARMA models for weekly data (Ghanbarpour *et al.* 2012). Similarly, there have been many studies illustrating the application of SARIMA models to predict and forecast the groundwater level fluctuations (Ahn & Salas 1997; Ahn 2000; Montanari *et al.* 2000; Adhikary *et al.* 2012; Shirmohammadi *et al.* 2013).

Paul (2008) developed a univariate time series model to forecast groundwater levels. Two different multiplicative SARIMA models, SARIMA (2,0,0)(3,1,0)_{12} and SARIMA ((1,2,10),0,0)(3,1,0)_{12} were developed. The developed models were compared for their forecasting accuracy in terms of root mean square (RMSE) values. The SARIMA ((1,2,10),0,0)(3,1,0)_{12} model performed better than the SARIMA (2,0,0)(3,1,0)_{12} model. Similarly, Adhikary *et al.* (2012) developed various seasonal ARIMA models and traditional decomposition time series models to fit the weekly groundwater level fluctuation data. Seasonal ARIMA (1,1,1)(1,1,1)_{52} model was selected among other forms of seasonal ARIMA models since the selected model error was free from autocorrelation process. In addition to that, it was proved in their study that the forecasting performance of the chosen seasonal ARIMA (1,1,1)(1,1,1)_{52} model was performing better in terms of mean absolute error (MAE), mean square error (MSE), and maximum absolute error (Max AE) when compared to traditional decomposition models. As the seasonal component is significant in the raw groundwater level signals, the coefficient for fitting the seasonal AR and MA components in SARIMA models perhaps have huge uncertainty in reproducing the seasonal variation in the prediction or validation period.

Decomposition-based models are developed based on decomposing the actual time series signal into trend, seasonal, and random components. Decomposition model applications on groundwater level response have been widely studied, for example, Seeboonruang (2014) studied the groundwater level response against climate and anthropogenic activities by decomposing deep groundwater signal into trend, seasonal, and random components. Eight different methods of trend estimation technique were applied to long-term groundwater levels monitored in the lower Chao Phraya basin in Thailand. Fifth-order polynomial interpolation of trendlines significantly matched with groundwater withdrawal pattern. Similarly, the periodic behavior occurring from detrended residuals was compared with the global climatic variability. They concluded that the magnitude response of deep groundwater in the region was much affected by the anthropogenic activities as compared to the global climatic variability. Similarly, Almedeij & Al Ruwaih (2006) investigated the periodic pattern of groundwater level fluctuation for six monitoring wells located in residential areas in Kuwait. Monthly groundwater level measurements obtained from observation wells were correlated with monthly averaged temperature and monthly totaled rainfall time series data over 10 years. The cyclic pattern of groundwater level together with rainfall and temperature was examined in the frequency domain using periodogram analysis. The trend component of groundwater level series was fitted by simple regression method in their study. Furthermore, they estimated the seasonal component by calculating monthly averages of detrended groundwater level series. However, the traditional way of fitting the trend component and estimating the seasonal component might cause a significant error when the mean and variance in the groundwater level time series data is a heteroscedastic process (varying mean and variance). Therefore, there is a need to consider a proper trend and seasonal estimate in the groundwater level time series data for minimizing the error in real-time prediction and forecasting the groundwater levels using decomposition-based time series models.

Traditional deseasonalization technique for monthly groundwater level data is done by calculating long-term monthly average values for the corresponding months from the time series data. The long-term monthly average values are used as the seasonal component in the traditional decomposition-based models. In general, these seasonal values denote the representative average values from the long-term groundwater level fluctuation data. These seasonal values may be representative average values for the stationary groundwater level fluctuation data as mean and variance is almost constant over time. However, these seasonal values might not be representative average values for non-stationary groundwater level fluctuation data as mean and variance varies over time.

Therefore, in this study, we have developed a novel technique to properly estimate the seasonal component and deseasonalize the time series by adopting a 13-month trend and corresponding confidence interval-based approach. The specific objectives of this study are: (1) to construct the confidence interval for irregularly varying groundwater level signal based on 13-month moving average trend values to estimate the seasonal component and (2) to compare the prediction performance of the proposed DARMA model against the traditional DARMA and SARIMA models.

## METHODOLOGY

### Basic steps in time series modeling

H0: = 0

and the alternate hypothesis is,

H1: < 0

where is the time series value at time t; is the differencing operator; *p* is the number of lagged difference terms; is the autoregressive parameter; is the white noise residual; and *c* is the constant.

Non-stationary data can be corrected by first- or higher-order differencing of the time series data to make them stationary. Once the data are converted to the stationary process, a suitable number of AR and MA model parameters were identified by looking at the autocorrelation function (ACF) and partial autocorrelation function plots (PACF). After identifying the ARMA model structure, the parameter values were estimated by the maximum likelihood technique. Finally, the estimated model residuals were checked for independence, normality, and homoscedastic conditions prior to prediction and forecasting applications during the validation time period.

### Deseasonalized ARMA models

#### Traditional deseasonalization method

where … are non-seasonal autoregressive parameters of order p; … are non-seasonal moving average parameters of order q; is the groundwater level value during the month j and year i; is the estimated mean monthly values corresponding to the month j.

#### Proposed deseasonalization method

The proposed 13-month moving average and corresponding confidence interval-based seasonal estimation and deseasonalization methods are explained in four major steps as follows.

*Step 1. Estimating 13-month trend-cycle term*: The trend component of the time series data was estimated by cell centered 13-month moving average filter. This rolling 13-month moving average filter calculates the trend component at the center cell based on the adjacent six cells' values on both the sides of the center cell.

*Step 2. Constructing confidence interval around the 13-month moving average trend line*: The mean and standard deviation values of each 13-cell sliding window were calculated along with the estimated trend values. The confidence interval was constructed around the 13-month trend line using the estimated mean and standard deviation values. The standard equations of 99% confidence interval were used for constructing the confidence interval around the trend line. The upper bound (Ub) and lower bound (Lb) equations for 99% confidence interval are given as follows: where is mean of the sliding window values; is standard deviation of the sliding window values;

*N*is number of sliding window cells (13 numbers).

*Step 3. Modified deseasonalizing*: Unlike traditional deseasonalizing, the range of the raw time series data used for calculating the seasonal component in modified deseasonalizing were restricted with the bounded regions based on the constructed confidence interval limits (Equations (3) and (4)). As the confidence interval region excludes some of the extremely varying groundwater level values, the smoothed range of the groundwater level data set was only used for calculating the seasonal component. This way of limiting the groundwater level time series data from extreme values appropriately estimates the seasonal component for heteroscedastic groundwater level signals. But, the traditional deseasonalization method considers the entire actual range of groundwater level time series data for calculating the seasonal component, which might be overestimated in the case of highly varying groundwater level conditions. Furthermore, the estimated modified seasonal component was detected from the raw time series data to obtain the modified deseasonalized time series.

*Step 4. Fitting the ARMA model for modified deseasonalized time series data*: The modified deseasonalized time series data were fitted with proper AR and MA models, represented here as MDs-ARMA as follows: where is the monthly average estimate corresponding to the month j from the modified deseasonalization approach.

### SARIMA models

where … are non-seasonal autoregressive parameters; … are non-seasonal moving average parameters; … are seasonal autoregressive parameters; … are seasonal moving average parameters; is the water level value at time t; is the white noise; , are non-seasonal orders; , are seasonal orders; c is a constant; *D* is the order of non-seasonal differencing; and is the order of seasonal differencing.

#### Model information criterion indices

*L*is the log likelihood value;

*m*is the number of parameters in the model;

*n*is the number of observations; and is the residual sum square.

#### Residual test

*n*is sample size;

*m*is the lag length; is the autocorrelation coefficient; and is the Chi-square value with m degrees of freedom.

#### Model performance indices

^{2}) indices were used to assess the developed time series model performances during the validation data period. The RMSE and R

^{2}formulations are given as follows: where

*N*is the sample size; is the observed value; is the forecasted/predicted value; and is the observed mean value.

### Application site: Adyar basin

The Adyar basin is in the northeast coastal part of Tamil Nadu, India (Figure 1). The basin receives rainfall during both southwest (June–September) and northeast monsoons (October–December) with the northeast monsoon contributing more than 65% of the total annual average rainfall. The southwest monsoon rainfall is erratic in nature and summer rainfall is negligible. Long-term annual average rainfall is about 1,315 mm (WRO 2007). Elevation ranges from 183 m in the west to sea level in the eastern part of the basin. Soils in the basin have been classified into clayey, black, red sandy, alluvial, and colluvial soils. Black soils occur in the depressions adjacent to hilly areas in the western part. Alluvial soils occur along the river courses and eastern part of the coastal areas. Major hydrogeology in the basin has been classified as unconsolidated, semi-consolidated, and weathered and fractured rock formations. Groundwater occurs under phreatic and semi-confined conditions in inter-granular pore spaces in sands, sandstones, and in bedding planes and thin fractures in shales (WRO 2007). The Adyar basin river network with the locations of observation wells is shown in Figure 1. Observation wells are mostly concentrated in the eastern part of the Adyar basin.

### Data used

Twenty-nine observation wells monthly depth to water table data were procured from the Institute for Water Studies (IWS), Taramani. However, for brevity, the proposed time series method, MDs-ARMA, along with TDs-ARMA and SARIMA was tested on two observation wells (well-13012 and well-13235) as shown in Figure 1 (as selected wells). The depth to water table data were converted to groundwater level elevation above mean sea level after subtracting the depth to water table values from digital elevation model values of the corresponding well locations in Adyar basin. The total length of the monthly groundwater level data used in this study is about 240 months (*N* = 240) from January 1988 to December 2007. Out of 240 months, the first 180 months' values (3/4th of the total data set) were used to calibrate the times series models and the remaining 60 months' values (1/4th of the total data set) were used to validate the models.

## RESULTS AND DISCUSSION

### Autocorrelation and partial autocorrelation analysis

As the first step in time series modeling, raw time series data were analyzed for the unit root non-stationary process (Figure 2). From the ADF test, it was observed that the null hypothesis for both the wells (well-13012 and well-13235) were accepted with the *p* values of 0.57 and 0.43 at 95% confidence level, respectively. According to ADF test, both the wells (well-13012 and well-13235) groundwater level times series data were implying non-stationarity due to the existence of the unit root process. Therefore, in SARIMA models, a seasonal differencing was done to change the non-stationary series to a stationary series. However, in the case of deseasonalized models, long-term average monthly values subtracted from raw time series (deseasonalized series) data itself changed the non-stationary data into stationary data. Therefore, no further differencing was done in the deseasonalized data as the deseasonalized time series itself was satisfying the stationarity test.

ACF and PACF plots of raw time series data for both the wells show the strong seasonality pattern in the monthly groundwater level time series data (Figures 3 and 4). This is obvious from the fact that the groundwater level response is a function of climatic variables such as rainfall and evapotranspiration which are also seasonal in nature. Therefore, the SARIMA models were chosen with seasonal differencing along with SAR and SMA parameters. The partial autocorrelation was significant at the lag of 1 month for both the wells which indicates that the data need to be modeled with simple non-seasonal AR process with one AR and MA parameters for both SARIMA and DARMA models.

### Deseasonalized time series using traditional and proposed methods

Groundwater level signal along with 13-month moving average trend and the constructed confidence interval for both the wells are shown in Figures 5(a) and 6(a). The extreme variation in the groundwater level signal out of the confidence interval appears during pre-monsoon (June) and post-monsoon (January) seasons within the year and a repeated pattern was observed for all the years in the calibration data set (Figures 5(a) and 6(a)). The seasonal component was estimated by averaging the corresponding monthly values from January to December in the observed groundwater level signals. The resulting 12-month values were repeated for the remaining years during the calibration period. This is shown in Figures 5(b) and 6(b) as the traditional seasonal component. On the other hand, the modified seasonal component was estimated from January to December by averaging the corresponding months using the time series data in which the extremely varying groundwater level signals beyond the confidence interval were replaced with the corresponding boundary values of the confidence interval. The resulting 12-month values were repeated for the remaining years in the calibration period. This is shown in Figures 5(b) and 6(b) as modified seasonal component. As the extremely varying time series data were smoothed by confidence interval values, the resulting estimated modified seasonal component values were relatively lesser in magnitude when compared to traditionally estimated seasonal values (Figures 5(b) and 6(b)).

A significant difference among the traditional and modified seasonal component values was observed. For example, the estimated traditional and modified seasonal values ranged from 3.5 m to 7 m and 4.7 m to 6.4 m, respectively, for well-13012. A similar difference in the computed seasonal values among traditional and modified time series methods was observed for the other well (well-13235) also. Furthermore, the deseasonalized time series for TDs-ARMA and MDs-ARMA were obtained by subtracting the respective estimated traditional and modified seasonal components from the raw groundwater level time series signals for both the wells (Figures 5(c) and 6(c)). As there was a difference among traditional and modified seasonal estimates, there were corresponding changes in the calculated deseasonalized time series data (Figures 5(c) and 6(c)).

### Model identification and parameter estimation

#### SARIMA models

The initial model structure can be guessed based on ACF and PACF plots of the time series data with unique combinations of seasonal and non-seasonal parameters for SARIMA models (Figures 3 and 4). Therefore, the three most suitable SARIMA model forms were identified for each well's time series data, from which the best one was selected based on AIC and BIC values (Table 1). The three identified forms of SARIMA models for both the wells are as follows:

Well no. | Model | AIC | BIC |
---|---|---|---|

13012 | SARIMA(1,0,0)(1,1,0)_{12} | 470.6 | 480.2 |

SARIMA(1,0,1)(1,1,0)_{12} | 470.0 | 482.5 | |

SARIMA(1,0,1)(0,1,1)_{12} | 429.9 | 442.7 | |

13235 | SARIMA(1,0,0)(1,1,0)_{12} | 585.8 | 595.4 |

SARIMA(1,0,1)(1,1,0)_{12} | 587.6 | 600.4 | |

SARIMA(1,0,1)(0,1,1)_{12} | 554.5 | 567.3 |

Well no. | Model | AIC | BIC |
---|---|---|---|

13012 | SARIMA(1,0,0)(1,1,0)_{12} | 470.6 | 480.2 |

SARIMA(1,0,1)(1,1,0)_{12} | 470.0 | 482.5 | |

SARIMA(1,0,1)(0,1,1)_{12} | 429.9 | 442.7 | |

13235 | SARIMA(1,0,0)(1,1,0)_{12} | 585.8 | 595.4 |

SARIMA(1,0,1)(1,1,0)_{12} | 587.6 | 600.4 | |

SARIMA(1,0,1)(0,1,1)_{12} | 554.5 | 567.3 |

Table 1 shows that the AIC and BIC values were comparatively less for the third SARIMA model, SARIMA(1,0,1)(0,1,1)_{12}, when compared to the other two SARIMA models for both the wells. Therefore, the best model was chosen as SARIMA(1,0,1)(0,1,1)_{12} for further parameter estimation and model predictions. The best SARIMA model parameters were estimated using maximum likelihood technique and the estimated values and corresponding statistics are given in Table 2. Statistical t-test and *p*-value for AR and SMA parameters were highly significant at 0.01 alpha level (99% confidence level), whereas one of the MA parameters for well-13235 was not statistically significant. The constant parameters were also not statistically significant for both wells (Table 2). Therefore, it can be concluded that the SARIMA (1,0,1)(0,1,1)_{12} process during the calibration time period was mainly driven by AR and SMA parameters rather than MA and constant parameters.

Well no. | Model | Parameter | Value | Standard error | t-value | p-value |
---|---|---|---|---|---|---|

13012 | SARIMA(1,0,1)(0,1,1)_{12} | Constant | −0.02 | 0.02 | −0.77 | 0.22 |

ϕ_{1} | 0.65 | 0.08 | 8.41 | 0.00 | ||

θ_{1} | 0.20 | 0.08 | 2.41 | 0.01 | ||

ϴ_{12} | −0.79 | 0.05 | −15.00 | 0.00 | ||

13235 | SARIMA(1,0,1)(0,1,1)_{12} | Constant | 0.01 | 0.02 | 0.38 | 0.35 |

ϕ_{1} | 0.80 | 0.05 | 17.04 | 0.00 | ||

θ_{1} | 0.03 | 0.07 | 0.48 | 0.32 | ||

ϴ_{12} | −0.77 | 0.05 | −15.99 | 0.00 |

Well no. | Model | Parameter | Value | Standard error | t-value | p-value |
---|---|---|---|---|---|---|

13012 | SARIMA(1,0,1)(0,1,1)_{12} | Constant | −0.02 | 0.02 | −0.77 | 0.22 |

ϕ_{1} | 0.65 | 0.08 | 8.41 | 0.00 | ||

θ_{1} | 0.20 | 0.08 | 2.41 | 0.01 | ||

ϴ_{12} | −0.79 | 0.05 | −15.00 | 0.00 | ||

13235 | SARIMA(1,0,1)(0,1,1)_{12} | Constant | 0.01 | 0.02 | 0.38 | 0.35 |

ϕ_{1} | 0.80 | 0.05 | 17.04 | 0.00 | ||

θ_{1} | 0.03 | 0.07 | 0.48 | 0.32 | ||

ϴ_{12} | −0.77 | 0.05 | −15.99 | 0.00 |

#### TDs-ARMA model

TDs-ARMA model parameters were estimated using maximum likelihood method and the corresponding statistics are given in Table 3. The *p*-value of autoregressive parameters for both the wells' data was highly significant at 0.01 alpha level (99% confidence level). However, the MA parameters for both the wells show relatively less significance than AR parameters during the calibration period.

Well no. | Model | Parameter | Value | Standard error | t-value | p-value |
---|---|---|---|---|---|---|

13012 | TDs-ARMA(1,1) | Constant | 0.00 | 0.06 | 0.05 | 0.48 |

ϕ_{1} | 0.74 | 0.07 | 10.90 | 0.00 | ||

θ_{1} | 0.15 | 0.09 | 1.62 | 0.05 | ||

13235 | TDs-ARMA(1,1) | Constant | 0.00 | 0.08 | 0.03 | 0.49 |

ϕ_{1} | 0.81 | 0.05 | 16.79 | 0.00 | ||

θ_{1} | 0.01 | 0.07 | 0.12 | 0.45 |

Well no. | Model | Parameter | Value | Standard error | t-value | p-value |
---|---|---|---|---|---|---|

13012 | TDs-ARMA(1,1) | Constant | 0.00 | 0.06 | 0.05 | 0.48 |

ϕ_{1} | 0.74 | 0.07 | 10.90 | 0.00 | ||

θ_{1} | 0.15 | 0.09 | 1.62 | 0.05 | ||

13235 | TDs-ARMA(1,1) | Constant | 0.00 | 0.08 | 0.03 | 0.49 |

ϕ_{1} | 0.81 | 0.05 | 16.79 | 0.00 | ||

θ_{1} | 0.01 | 0.07 | 0.12 | 0.45 |

#### MDs-ARMA model

The AR and MA model coefficients for the MDs-ARMA model were estimated using the maximum likelihood estimation technique using MATLAB software. The estimated AR and MA model parameters for MDs-ARMA model are given in Table 4. The AR coefficients for both the wells were significant at alpha 0.01 level (99% confidence level), but the MA coefficients were significant at 0.05 alpha level and 0.01 alpha level for well-13012 and well-13235, respectively (Table 4). This shows that the modified deseasonalized series model yields significant values for both AR and MA coefficients, unlike the traditional deseasonalized time series model.

Well no. | Model | Parameter | Value | Standard error | t-value | p-value |
---|---|---|---|---|---|---|

13012 | MDs-ARMA(1,1) | Constant | −0.02 | 0.08 | −0.29 | 0.39 |

ϕ_{1} | 0.71 | 0.08 | 8.80 | 0.00 | ||

θ_{1} | 0.23 | 0.10 | 2.33 | 0.01 | ||

13235 | MDs-ARMA(1,1) | Constant | 0.02 | 0.11 | 0.18 | 0.43 |

ϕ_{1} | 0.75 | 0.06 | 11.79 | 0.00 | ||

θ_{1} | 0.17 | 0.08 | 2.13 | 0.02 |

Well no. | Model | Parameter | Value | Standard error | t-value | p-value |
---|---|---|---|---|---|---|

13012 | MDs-ARMA(1,1) | Constant | −0.02 | 0.08 | −0.29 | 0.39 |

ϕ_{1} | 0.71 | 0.08 | 8.80 | 0.00 | ||

θ_{1} | 0.23 | 0.10 | 2.33 | 0.01 | ||

13235 | MDs-ARMA(1,1) | Constant | 0.02 | 0.11 | 0.18 | 0.43 |

ϕ_{1} | 0.75 | 0.06 | 11.79 | 0.00 | ||

θ_{1} | 0.17 | 0.08 | 2.13 | 0.02 |

### Residual analysis for the developed models

A diagnostic check for the calibrated SARIMA, TDs-ARMA, and MDs-ARMA models was done using residual analysis for white noise process (zero mean with unit variance). The Ljung and Box statistical test proves that the overall serial correlation of the modeled error was statistically insignificant for both the wells for all three models.

### Groundwater level prediction and validation by developed models

One-month rolling ahead prediction of the developed time series models along with the observed groundwater level data during the validation period for both the wells are shown in Figure 7(a) and 7(b). In general, it was observed that the decomposition-based ARMA models (TDs-ARMA and MDs-ARMA) predict the groundwater levels better than the SARIMA model. Although the TDs-ARMA and MDs-ARMA model performances were relatively close to each other, the MDs-ARMA model performs comparatively better than the TDs-ARMA model for both the wells.

The improved model performance of the MDs-ARMA model over TDs-ARMA model was mainly due to the improved estimated seasonal component based on 13-month trend and confidence interval approach. However, the traditional way of estimating the seasonal component values were relatively higher than the proposed model thus reducing the overall model performance for the TDs-ARMA model (Figure 7(a) and 7(b)).

SARIMA models are the least performing models for both the wells as they highly over- and under-predict the actual groundwater level signal peaks and troughs during pre-monsoon and post-monsoon seasons (Figure 7). This is mainly due to the fact that the estimated SMA parameters were not able to capture the actual variation in the extreme peaks and troughs of the groundwater level signals. This uncertainty in SMA parameters might have been lower for the homoscedastic (constant mean and constant variance) groundwater level signals as the seasonal variation among the years is perhaps less variable. However, in this study, both the groundwater level signals show a strong heteroscedastic process, the estimated SMA parameters fail to mimic the actual seasonal variation in the groundwater level signals.

Figure 8 shows the predicted and observed groundwater levels on 1:1 diagonal plot for all three models. As well-13012 is located in the highly elevated hard rock terrain part of the Adyar basin, the variation in the groundwater level signal is naturally varying in higher magnitude as the rainfall response in the fractured geological system is highly dynamic. This is the reason for all models predicting the groundwater level signal with comparatively higher deviation from the 1:1 line (Figure 8(a)). However, well-13235 is located in the central part of the basin where the geological condition is highly porous, thus the rainfall response in this region on groundwater level signal is more or less uniform. As a result, the developed time series model predictions were close to the 1:1 line (Figure 8(b)). However, as expected, the MDs-ARMA model predicted values were lying closer to the 1:1 line than the other two models for both the well locations. SARIMA model-predicted values during some periods highly deviated from the diagonal line, thus reducing this model's overall performance significantly compared to TDs-ARMA and MDs-ARMA models (Figure 8).

The calculated RMSE and R^{2} values during the validation period for the developed time series models are shown in Table 5. MDs-ARMA model prediction performance during the validation period was always higher when compared to the other two models. The reduction in RMSE values for the MDs-ARMA model over the TDs-ARMA and SARIMA models was 10 cm to 30 cm and 100 cm to 180 cm, respectively, for both wells (Table 5).

Well | Models | Validation statistics | |
---|---|---|---|

RMSE (m) | R^{2} | ||

13012 | SARIMA | 1.75 | 0.13 |

TDs-ARMA | 0.90 | 0.76 | |

MDs-ARMA | 0.79 | 0.82 | |

13235 | SARIMA | 2.53 | 0.00 |

TDs-ARMA | 0.93 | 0.86 | |

MDs-ARMA | 0.64 | 0.93 |

Well | Models | Validation statistics | |
---|---|---|---|

RMSE (m) | R^{2} | ||

13012 | SARIMA | 1.75 | 0.13 |

TDs-ARMA | 0.90 | 0.76 | |

MDs-ARMA | 0.79 | 0.82 | |

13235 | SARIMA | 2.53 | 0.00 |

TDs-ARMA | 0.93 | 0.86 | |

MDs-ARMA | 0.64 | 0.93 |

## CONCLUSIONS

A unique deseasonalized ARMA model, MDs-ARMA, was developed to assess its prediction ability over traditional SARIMA and TDs-ARMA models in this study. A novel deseasonalizing method was proposed in this study based on 13-month moving average trend and confidence interval method for estimating the seasonal component. These models were tested for two observation wells' groundwater level signals. The groundwater level signal for both wells shows a strong seasonal pattern which was further identified from the ACF and PACF plots. Based on ACF and PACF plots, suitable SARIMA models with different forms were recommended, from which a best SARIMA model (SARIMA (1,0,1)(0,1,1)_{12}) was selected based on AIC and BIC values. Similarly, the deseasonalized time series models (TDs-ARMA and MDs-ARMA) were modeled with one AR and MA parameter as they were already adjusted for the seasonal component.

Maximum likelihood method was adopted to estimate the developed model parameters. Autoregressive parameters were more highly significant than other parameters in all the developed time series models. The calibrated models were then used to predict the groundwater levels in a validation period and corresponding RMSE and R^{2} values were calculated for the respective models. It was observed that the proposed method was performing better than traditional deseasonalized ARMA and SARIMA models. The specific conclusions from this study can be summarized as follows:

Deseasonalized ARMA models consider less number of parameters when compared to SARIMA models as deseasonalized ARMA models were modeled with one AR and MA parameters, whereas SARIMA models were modeled with one extra seasonal MA parameter along with one nonseasonal AR and MA parameter.

Both deseasonalized ARMA models (TDs-ARMA and MDs-ARMA) reproduced the seasonal variation pattern of the actual groundwater level signal during the validation period.

The estimated seasonal component by traditional deseasonalizing approach was comparatively higher than the proposed deseasonalizing approach as the actual variation in the groundwater level signal was used to estimate the long-term monthly average values. This reduces the TDs-ARMA model performances when compared to MDs-ARMA model.

The proposed MDs-ARMA model consistently performed better with both the wells' groundwater level signals, thus, the application of this model is robust across various geological conditions.

In this study, these time series models were evaluated with 1-month rolling ahead prediction method. However, the multi-time step prediction and forecasting performance of the developed models need to be analyzed for long-term forecasting applications. Although univariate time series models are useful in prediction and forecasting of groundwater level signals under data limiting conditions, they cannot be used to predict the response due to climate change and land use change scenarios. Under such conditions, the time series model and their coefficients need to be related to the physical process and parameters to assess the climatic and land use change cases.

## ACKNOWLEDGEMENTS

We would like to acknowledge the State Ground and Surface Water Resources Data Centre, IWS, Taramani, Chennai for providing groundwater level data to carry out this study. We would like to acknowledge the Water Engineering and Management program at Asian Institute of Technology for providing the computing facility. Finally, we would like to thank the reviewers for their valuable comments to improve this manuscript.