On the use of ARIMA models for short-term water tank levels forecasting

In this paper, a statistical study of the time series of water levels measured during 2014 in the water tank of Cesine, Avellino (Italy) is presented. In particular, the ARIMA forecasting methodology is applied to model and forecast the daily water levels. This technique combines the autoregressive and moving average approaches, with the option of differencing the data to make the series stationary. In order to better describe the trend of the water levels in the reservoir over time, three ARIMA models are calibrated, validated and compared: ARIMA (2,0,2), ARIMA (3,1,3) and ARIMA (6,1,6). After a preliminary statistical characterization of the series, the models' parameters are calibrated on the data from the first 11 months of 2014, keeping the last month of data for validating the results. For each model, a graphical comparison with the observed data is presented, together with summary statistics of the residuals and some error metrics. The results are discussed and some further possible applications are highlighted in the conclusions.


INTRODUCTION
Urban water forecasting provides predictions of water needs, allowing water distribution system (WDS) operators to manage production, pump energy and valve regulation.
Demand estimation provides water utilities with a useful predictive tool for monitoring and controlling WDSs.
Water demand, in the specific context of public water supply, is the total amount of water, including water losses to a certain extent, needed to supply customers (private households, public buildings, irrigation of public gardens, sewer cleaning, etc.) within a time interval. A water demand estimation must ensure nodal demands while satisfying water quality and pressure levels across the network. A number of factors influence water demand, including population growth (residential, fluctuating), economic income, industry (size, technology involved, production types), local climate influencing seasonal demand patterns, and price changes. Water demand is strongly related to the water tank level, according to the well-known continuity equation.
Let $V(t)$ be the volume of water inside the tank at time $t$, $h(t)$ the water level inside the tank and $Q_i(t)$ the generic flow rate (for instance, positive if it enters the tank, negative otherwise). The continuity equation and the tank law read, respectively:

$$\frac{dV(t)}{dt} = \sum_i Q_i(t) \qquad (1)$$

$$V(t) = A_t \, h(t) \qquad (2)$$

where $A_t$ is the net surface area of the tank, which has a constant horizontal section. Combining Equations (1) and (2) gives $A_t \, dh(t)/dt = \sum_i Q_i(t)$, so there is a direct relation between the flow rates and the water level.
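As an illustration of the relation between flow rates and water level, the following minimal sketch integrates the combined continuity equation and tank law, i.e. (tank area) x (rate of change of level) = (sum of signed flow rates), with an explicit Euler step. The function name `simulate_level`, the tank area, the time step and the flow values are hypothetical and are not taken from the Cesine dataset.

```python
def simulate_level(h0, area, flows_per_step, dt):
    """Integrate area * dh/dt = sum_i Q_i with the explicit Euler method.

    h0             : initial water level [m]
    area           : constant horizontal section of the tank [m^2]
    flows_per_step : one tuple of signed flow rates [m^3/s] per time step
                     (positive = entering the tank, negative = leaving it)
    dt             : time step [s]
    """
    levels = [h0]
    for flows in flows_per_step:
        dh = sum(flows) * dt / area  # level change over one time step
        levels.append(levels[-1] + dh)
    return levels


# Hypothetical numbers, chosen only to illustrate the sign convention.
area = 200.0   # m^2 (assumption)
dt = 3600.0    # one hour, in seconds (assumption)
flows = [(0.05, -0.03), (0.05, -0.06), (0.05, -0.05)]  # (inflow, outflow)
levels = simulate_level(3.0, area, flows, dt)
print(levels)
```

The level rises when the inflow exceeds the outflow, falls in the opposite case and stays constant when the two balance, which is why recorded levels carry information on the demand.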
In water applications, forecasting, even if complex, is beneficial for many reasons. First of all, investments in water supply systems can be extremely expensive, ranging from millions to hundreds of millions of euros. This implies the need to forecast a scenario that naturally evolves because of the aging of pipelines, water tank degradation (Viccione et al. a), water leakages, sizing, etc. Minimizing investment costs is therefore essential in both the short and the long term when planning new developments or system expansion. Predictions are also relevant in processes for reviewing prices (Herrera et al. ). Secondly, as water is a precious resource to be preserved for environmental and financial reasons, it is of interest to know in advance what the water demand is expected to be in the short term, allowing sustainable exploitation. In addition, short-term forecasting of water use helps to optimize day-to-day utility operations and to plan maintenance schedules (Shabani et al. ).
Several forecasting methods can be adopted in water
In this paper, ARIMA models are introduced as a predictive tool for short-term forecasting of the daily average water tank level in a rural area. This study is motivated by the fact that in Italy, especially in rural towns, a single water tank can be used to serve a small area. For larger towns, the methodology used in this paper can be extended to any number of interconnected tanks, opening the way to studies of the relations and mutual influence between the models applied to each tank. Very often the tanks are equipped with water level meters and more rarely with flow rate meters (Viccione et al. b). Thus, water managers may be interested in making decisions on the basis of the post-processing of water levels recorded in tanks. By predicting short-term water levels, it is possible to prevent overflow discharges in water tanks and to optimize management plans for water distribution, e.g. during drought events.

METHODOLOGY
The Box-Jenkins/ARIMA forecasting model is adopted here. It is among the most popular procedures for time series analysis and forecasting. The order of an ARIMA model is usually denoted by ARIMA(p,d,q), where p is the order of the autoregressive part, d is the order of differencing and q is the order of the moving-average process. The general formula is:

$$\Phi(B)\,(1 - B)^d \, Y_t = \theta(B)\, e_t \qquad (3)$$

in which $Y_t$ is the value of the series observed at time $t$, $B$ is the delay operator ($B\,Y_t = Y_{t-1}$), $\Phi$ are the autoregressive polynomials, $\theta$ are the moving average polynomials and $e_t$ is the difference between the observed value $Y_t$ and the forecast $\hat{Y}_t$ at time $t$.
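The role of the differencing order d can be sketched in a few lines. The example below applies the delay operator d times, i.e. it replaces each value with its change from the previous period, and shows how a forecast made on the differenced series is brought back to a level. The function names and the sample values are hypothetical and serve only as an illustration.

```python
def difference(series, d=1):
    """Apply (1 - B)^d to a series, where B is the delay operator."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series with its period-to-period changes.
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference_one_step(last_level, predicted_change):
    """Turn a one-step forecast of the differenced series (d = 1) into a level."""
    return last_level + predicted_change


levels = [3.10, 3.25, 3.20, 3.40]           # hypothetical daily levels [m]
changes = difference(levels, d=1)            # day-to-day changes
next_level = undifference_one_step(levels[-1], 0.05)  # assumed forecast change
print(changes, next_level)
```

Each differencing pass shortens the series by one value, which is one reason why the number of usable calibration periods can be smaller than the number of recorded days.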

Error metrics
To compare the effectiveness of the proposed models numerically, some error metrics can be adopted. In this paper, the authors use the Mean Percentage Error (MPE), the Coefficient of Variation of the Error (CVE) and the Mean Absolute Scaled Error (MASE), defined as follows. The MPE measures the error distortion, i.e. it describes whether the model overestimates or underestimates the actual data:

$$\mathrm{MPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{Y_t - \hat{Y}_t}{Y_t} \qquad (4)$$

The variation from the actual data in absolute value is given by the CVE, which provides the error dispersion:

$$\mathrm{CVE} = \frac{\sqrt{\dfrac{1}{n-1} \sum_{t=1}^{n} e_t^2}}{\bar{Y}} \qquad (5)$$

where $n$ is the number of periods and $\bar{Y}$ is the mean value of the actual data in the considered time range. The MASE for seasonal time series is computed according to the following formula (Franses ):

$$\mathrm{MASE} = \frac{\dfrac{1}{n} \sum_{t=1}^{n} |e_t|}{\dfrac{1}{n-m} \sum_{t=m+1}^{n} |Y_t - Y_{t-m}|} \qquad (6)$$

where $m$ is the seasonal lag. In this case, according to the features of the dataset that will be presented in the following sections, a simple naïve model based on the assumption that the value of today is the same as that of yesterday is used as the benchmark, i.e. $m = 1$.
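The three metrics can be sketched as follows. The exact expressions are assumptions based on their usual definitions: the MPE is taken as the mean of (observed - forecast)/observed in percent, the CVE as the root of the sum of squared errors divided by (n - 1), normalized by the mean of the observed data, and the MASE as the mean absolute error scaled by the mean absolute error of the naïve "same as m periods ago" forecast computed on the same data. The sample values are hypothetical.

```python
import math


def mpe(actual, forecast):
    """Mean Percentage Error: a positive value means the model underestimates on average."""
    n = len(actual)
    return 100.0 / n * sum((y - f) / y for y, f in zip(actual, forecast))


def cve(actual, forecast):
    """Coefficient of Variation of the Error: error dispersion relative to the mean level."""
    n = len(actual)
    squared = sum((y - f) ** 2 for y, f in zip(actual, forecast))
    mean_actual = sum(actual) / n
    return math.sqrt(squared / (n - 1)) / mean_actual


def mase(actual, forecast, m=1):
    """Mean Absolute Scaled Error against a naive forecast with lag m (m = 1: yesterday's value)."""
    n = len(actual)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / n
    naive_mae = sum(abs(actual[t] - actual[t - m]) for t in range(m, n)) / (n - m)
    return mae / naive_mae


observed = [2.0, 4.0, 5.0, 4.0]   # hypothetical levels [m]
predicted = [2.0, 3.0, 5.0, 5.0]  # hypothetical forecasts [m]
print(mpe(observed, predicted), cve(observed, predicted), mase(observed, predicted))
```

A MASE below one indicates that the model has a lower mean absolute error than the naïve benchmark on the same data.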

DATASET ANALYSIS
This statistical study was applied to the time series of the levels observed in one of the reservoirs of the water supply system of the town of Avellino, Italy, located in the area of Cesine. The data are daily average water levels, expressed in meters, measured in 2014 from January to December. The autocorrelation plot of the reconstructed series, Figure 1(c), shows that the function decreases as the lag increases, so no periodicity can be identified in the data.
In these cases, the forecasting through seasonal determinis-
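The autocorrelation function used to look for periodicity can be computed as in the following minimal sketch, which uses the common estimator normalized by the lag-zero sum of squared deviations. The function name and the sample series are hypothetical; this is not the routine used to produce Figure 1(c).

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # lag-zero sum of squared deviations
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


acf = autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(acf)
```

A periodic series would show peaks of the autocorrelation at multiples of its period, whereas a function that only decays with the lag gives no such indication.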

CALIBRATION OF THE MODEL
In order to better describe the trend of the water levels in the reservoir over time, three ARIMA models were fitted. The coefficients were estimated on the first 333 calibration periods with the statistical software R, using likelihood maximization as the parameter estimation technique.
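To give an idea of what likelihood maximization does, the sketch below treats a much simpler case than the models of this paper: a first-order autoregressive model with Gaussian errors, fitted around the sample mean. Conditioning on the first observation, the likelihood is maximized by the least-squares coefficient, which has a closed form. This is a simplified illustration under these assumptions, not the R routine used by the authors, and the function name and sample series are hypothetical.

```python
def fit_ar1(series):
    """Conditional maximum-likelihood fit of an AR(1) model with Gaussian errors.

    The mean is fixed at the sample mean (a simplification); conditioning on
    the first observation, the likelihood is maximized by the least-squares
    coefficient and by the mean squared residual as error variance.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, n))
    den = sum(x[t - 1] ** 2 for t in range(1, n))
    phi = num / den
    residuals = [x[t] - phi * x[t - 1] for t in range(1, n)]
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    return mean, phi, sigma2


# Hypothetical series, used only to show the call.
mean, phi, sigma2 = fit_ar1([2.0, 1.0, 0.0, -1.0, -2.0])
print(mean, phi, sigma2)
```

For models with moving average terms no closed form is available, and the likelihood is maximized numerically.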
Model ARIMA (2,0,2)
Because of the stationarity of the data, the ARIMA (2,0,2) model does not include differencing. The prediction of the water level in the tank in a given period t, provided by the model, is described by Equation (7).
Model ARIMA (3,1,3)
The ARIMA (3,1,3) model applies first-order differencing to the data. Moreover, both an autoregressive and a moving average term were added. The level in the reservoir predicted by the model at a generic period t is described by Equation (8).
This model, like the previous one, has a forecasting time horizon of one day. Table 5 shows the estimated values of the coefficients of the model and their corresponding standard errors.
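The one-day-ahead prediction of a model with autoregressive and moving average terms can be sketched as a simple recursion: each forecast combines the most recent observed deviations from the mean with the most recent forecast errors. The sketch assumes the sign convention in which moving average terms enter with a plus sign (as in R's `arima`) and starts the recursion with zero past errors. The function name and the coefficients are hypothetical and are not the values estimated in the paper.

```python
def one_step_forecasts(series, mean, ar, ma):
    """One-step-ahead ARMA(p, q) forecasts around a constant mean.

    ar : autoregressive coefficients (lag 1 first)
    ma : moving average coefficients (lag 1 first), entering with a plus sign
    Terms referring to periods before the start of the series are skipped.
    """
    forecasts, errors = [], []
    for t in range(len(series)):
        pred = mean
        for i, coef in enumerate(ar, start=1):
            if t - i >= 0:
                pred += coef * (series[t - i] - mean)
        for j, coef in enumerate(ma, start=1):
            if t - j >= 0:
                pred += coef * errors[t - j]
        forecasts.append(pred)
        errors.append(series[t] - pred)  # e_t = observed - forecast
    return forecasts, errors


# Hypothetical coefficients and data, chosen to keep the arithmetic simple.
forecasts, errors = one_step_forecasts([1.0, 2.0, 3.0], mean=0.0, ar=[0.5], ma=[0.5])
print(forecasts, errors)
```

For a model with first-order differencing, the same recursion would be applied to the day-to-day changes, and each predicted change would then be added to the last observed level.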
Model ARIMA (6,1,6)
In order to provide a three-day forecasting horizon, an ARIMA (6,1,6) model was developed. In the parameter estimation phase, the first three autoregressive parameters ($\varphi_1$, $\varphi_2$, $\varphi_3$) and the first three moving average parameters ($\vartheta_1$, $\vartheta_2$, $\vartheta_3$) were manually set to zero, and the likelihood maximization was performed with the resulting function.
First-order differencing of the series was also carried out. The model forecasting formula is reported in

VALIDATION OF THE MODEL
In order to verify the forecasting abilities of the three proposed models, a validation phase was carried out using, as already explained, the daily water levels recorded in the tank of Cesine in December 2014.
The summary statistics of the forecast error are reported in Table 7. The ARIMA (3,1,3) model, instead, overestimates on average the data observed throughout the validation period, as can be seen in Figure 3(b).
Figure 3(c) shows the comparison between the observed water level and the level forecast by the ARIMA (6,1,6) model. The forecast shows the typical delay, since the information used to predict the series is that of the three previous days.
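Summary statistics of the forecast error of the kind reported for the validation phase can be obtained as in the following sketch, where the error is taken as observed minus forecast. The function name and the sample values are hypothetical and do not reproduce the values of the tables.

```python
import statistics


def error_summary(observed, forecast):
    """Summary statistics of the forecast error (observed - forecast)."""
    errors = [y - f for y, f in zip(observed, forecast)]
    return {
        "mean": statistics.mean(errors),
        "std": statistics.stdev(errors),  # sample standard deviation (n - 1)
        "median": statistics.median(errors),
        "min": min(errors),
        "max": max(errors),
    }


# Hypothetical validation data [m].
summary = error_summary([3.0, 3.5, 4.0], [2.5, 3.5, 4.5])
print(summary)
```

With this sign convention, a negative mean error indicates a model that overestimates the observed levels on average.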

RESULTS AND DISCUSSION
In order to compare the proposed models quantitatively, the forecast errors in the calibration phase (residuals) and in the validation phase (errors) were analysed. The residuals of the ARIMA (2,0,2) model have both mean and median equal to zero, and their standard deviation is particularly low (Table 8). The residuals of the ARIMA (2,0,2) model appear to be symmetrically and regularly distributed (Figure 4(a)). For the other two models, the histograms show slight asymmetries, which are of modest magnitude also when evaluated on the quantile-quantile diagrams of Figure 5. Thus, in general, the forecast errors of the proposed ARIMA models appear to be due only to random fluctuations that are well described by a normal distribution.
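The points of a quantile-quantile diagram against the normal distribution can be computed as in the sketch below, which pairs each ordered residual with the quantile of a normal distribution having the same mean and standard deviation. The plotting positions (i + 0.5)/n are one common choice and are an assumption here, as are the function name and the sample residuals.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pairs (theoretical normal quantile, ordered residual) for a Q-Q diagram."""
    n = len(residuals)
    reference = NormalDist(mean(residuals), stdev(residuals))
    ordered = sorted(residuals)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals [m].
residuals = [-0.2, 0.1, 0.0, 0.2, -0.1]
points = qq_points(residuals)
for theoretical, sample in points:
    print(round(theoretical, 3), sample)
```

Residuals that are close to normally distributed give points lying near the bisector of the diagram, while asymmetries show up as systematic departures in the tails.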
As for the autocorrelation of the residuals (Figure 6), the (2,0,2) and (3,1,3) models present a very low autocorrelation. The (6,1,6) model, instead, seems not to have been able to exploit all the autocorrelation of the series. This result was predictable, since the lowest-lag data (lags 1, 2 and 3) could not be used in order to extend the forecast horizon to the next three days.
In Table 9, the summary statistics of the errors calculated in the validation phase are reported. As expected, the error distributions tend to worsen with respect to the calibration phase. Nevertheless, the mean errors and the standard deviations still give good results.
Finally, in Table 10

CONCLUSIONS
In this paper, the problem of modelling the time series of water levels in a reservoir connected to the supply system of a city has been discussed. In particular, a methodology to model and forecast the daily levels was applied. The results of the modelling were very encouraging, both when graphically comparing the observed level with the forecast one and when analysing the distribution of the forecast error quantitatively. The best forecast results, both in the calibration phase (i.e. during parameter estimation) and in the validation phase, were obtained with an ARIMA (2,0,2) model. A model with first-order differencing of the series, ARIMA (3,1,3), was also tested, but it did not provide clear improvements in the forecast. Both of these models are characterized by a one-day forecasting horizon. To obtain a longer time horizon, an ARIMA (6,1,6) model was implemented. It provided slightly worse forecasts but with a three-day forecast horizon. Therefore, a possible and novel approach could be to implement two models in parallel: the ARIMA (6,1,6) model could be used for rough 'medium-term' (three days ahead) predictions, combined with the ARIMA (2,0,2) model, used to provide short-term (one step ahead, daily) forecasts. The use of these techniques on water data showed excellent potential, suggesting a possible way to integrate current water service monitoring systems with the forecasts obtained with this methodology. The contribution of this paper is to extend the studies on the adoption of ARIMA techniques for this kind of data, dealing with a particular case study and focusing on the methodology for choosing the best time series analysis (TSA) model for the data under study. In the authors' opinion, the proposed approach could give water managers an operational tool, especially in cases where flow rate data are not available.
Further studies will apply a similar approach to the flow rate, in order to explore the potential of these techniques on a different dataset. In addition, for other time series in which hourly data are available, different time bases can be considered, so that seasonal patterns could emerge and be implemented in the models.