## Abstract

In this paper a statistical study on the time series of water levels measured, during 2014, in the water tank of Cesine, Avellino (Italy), is presented. In particular, the autoregressive integrated moving average (ARIMA) forecasting methodology is applied to model and forecast the daily water levels. This technique combines the autoregression and the moving average approaches, with the possibility to differentiate the data, to make the series stationary. In order to better describe the trend, over time, of the water levels in the reservoir, three ARIMA models are calibrated, validated and compared: ARIMA (2,0,2), ARIMA (3,1,3), ARIMA (6,1,6). After a preliminary statistical characterization of the series, the models' parameters are calibrated on the data related to the first 11 months of 2014, in order to keep the last month of data for validating the results. For each model, a graphical comparison with the observed data is presented, together with the calculation of the summary statistics of the residuals and of some error metrics. The results are discussed and some further possible applications are highlighted in the conclusions.

## INTRODUCTION

*V*(

_{i}*t*) be the volume of water inside the tank at time

*t*,

*h*(

*t*) be the water level inside the tank and

*Q*the generic flow rate (for instance positive if it enters the tank, negative otherwise). The continuity equation coupled with the tank law read respectively: where

_{i}*A*is the net surface of the tank of constant horizontal section. From the combination of Equations (1) and (2), it follows that there is a straight relation between flow rates and the water level.

_{t}In water applications, forecasting, even if complex, is beneficial for many reasons. First of all, investments in water supply systems can be extremely expensive, involving millions to hundreds of millions of euros, thus implying the need for forecasting a scenario which is naturally evolving because of the aging of pipelines, water tank degradation (Viccione *et al.* 2019a), water leakages, sizing, etc. Minimizing investment costs is therefore essential either in a short- or long-term interval when planning new developments or system expansion. Predictions are also relevant in processes for reviewing prices (Herrera *et al.* 2010). Secondly, as water is a precious resource to preserve for environmental and financial reasons, it is of interest to know in advance what the water demand is expected to be in the short term, allowing sustainable exploitation. In addition, short-term forecasting of water use helps in optimizing day-to-day utility operations and planning maintenance schedules (Shabani *et al.* 2018).

Several forecasting methods can be adopted in water applications, e.g. per capita (or per customer) model: end-use models which rely on customer behaviour; regression models, e.g. as in Maidment *et al.* (1985), Zhou *et al.* (2000), Bakker *et al.* (2014) and Shabani *et al.* (2017), which are appropriate in the case of time-varying water prices, personal income, and population growth; and time-series/extrapolation models including averaging, trend analysis, exponential smoothing and autoregressive integrated moving average (ARIMA), in which the prediction is based on projecting past trends.

The latter forecasting technique is based on the Time Series Analysis (TSA) approach. These models range from straight averages of time series to relatively highly elaborated statistical models. The ones based on autoregressive integrated moving average techniques, both seasonal (SARIMA) and not (ARIMA), are powerful models able to predict the slope of data evolving in time (Box & Pierce 1970; Chatfield 1975; Box & Jenkins 1976). These techniques have been developed mainly in statistics, and then applied in economics, management, physics, and elsewhere (Milanato 2008; Manganelli & Tajani 2015). Some of the authors have applied the TSA deterministic decomposition and the ARIMA models to acoustical noise in urban areas (Guarnaccia *et al.* 2014a, 2014b, 2017) and in proximity of an airport, also in combination with a non-homogeneous Poisson distribution (Guarnaccia *et al.* 2016). Other applications by some of the authors to electric consumption by public transportation in Sofia, Bulgaria, and to air pollution in Nuevo Leon, Monterrey, Mexico, can be found respectively in Tepedino *et al.* (2015) and in Guarnaccia *et al.* (2014c).

Concerning water demand, water tank levels, inflows and outflows, extrapolation models usually rely on historical data on water use to forecast future trends. Statistical methods such as Multiple Linear Regression (MLR) and ARIMA models are commonly used for short-term urban water demand forecasting. Among the pioneers, Maidment & Miaou (1986) used ARIMA in the daily demand model based on a study of nine American cities. Soon after, Smith (1988) developed a time series model to forecast daily municipal water use in Washington, USA. Wong *et al.* (2010) applied seasonal autoregressive moving series models to analyse the structure of daily urban water consumption in Hong Kong as a function of rainfall. Two autoregressive integrated moving average models, one non-seasonal (daily, ARIMA model) and one seasonal (monthly, SARIMA model), to predict daily and monthly inflows into the Elephant Butte Reservoir, New Mexico, are adopted and discussed in Zamani Sabzi *et al.* (2018). A parallel adaptive weighting strategy of water consumption forecasting for the next 24–48 h, using univariate time series of potable water consumption, is proposed in Sardinha-Lourenço *et al.* (2018). Long-term forecasting of water consumption in the Hohoe municipality, Ghana, is proposed by Amponsah *et al.* (2015), using ARIMA as well. Suhartono and co-workers (Suhartono *et al.* 2018) proposed a hybrid class of methods based on Singular Spectrum Analysis (SSA) decomposition, Time Series Regression (TSR), and ARIMA, known as hybrid SSA-TSR-ARIMA, for water demand forecasting. A comparison among multiple linear (MLR) and nonlinear regression, ARIMA, artificial neural network (ANN) and hybrid wavelet transforms and artificial neural network methods (WA-ANN) for urban water demand prediction in Montreal, Canada, was made by Adamowski *et al.* (2012). Other technique comparisons were made by Mohammadi *et al.* (2005), de Lima *et al.* (2016), between ARIMA and ANN, and by Antunes *et al.* (2018) between machine learning and ARIMA. The latter paper concludes that it is difficult to affirm that one of the two techniques is always better than the other, since in some cases ARIMA performs better than machine learning and vice versa. The main conclusion is that in order to determine which strategy performs better, the peculiarities of the data of the case under study should be taken into account. Moreover, one of the most interesting features of ARIMA techniques is that they require as input only the measured data of the parameter that it is going to be forecast. This is a strong simplification with respect to machine learning approaches, which usually require several parameters in input.

This large literature about water demand and related parameters forecasting shows the high interest from the scientific community. Anyway, reservoir or tank water level prediction is very important in reservoir management, in order to support the decision system. For instance, Ashaary *et al.* (2015), Rani & Parekh (2014) and Valizadeh & El-Shafie (2013) underline the importance of predicting water reservoir levels, implementing interesting models based on ANN models. Nwobi-Okoye & Igboanugo (2013) perform daily water level predictions in a reservoir, comparing ANN and ARIMA techniques, concluding that the accuracy of the ANN model increases with increasing inputs, while the ARIMA model yields the best prediction in the considered dataset.

In this paper, ARIMA models are introduced as predictive tools for short-term daily average water tank level forecasting in a rural area. This study is motivated by the fact that, in Italy, especially in rural towns, a single water tank can be used to serve the small area. For larger towns, the methodology used in this paper can be easily extended to any number of interconnected tanks, opening the way to studies about the relations and the influence between the models applied in each tank. Very often the tanks are equipped with water level meters and more rarely with flow rate meters (Viccione *et al.* 2019b). Thus, the potential interest of water managers can be to take a decision on the basis of the post-processing of recorded water levels in tanks. By predicting short-term water levels, it is possible to prevent overflow discharges in water tanks and to optimize management plans for water distribution, e.g. during drought events.

## METHODOLOGY

*p*,

*d*,

*q*), where

*p*is the order of the autoregressive part,

*d*is the order of the differencing and

*q*is the order of the moving-average process. The general source formula is: in which

*Y*is the value of the series observed at the time

_{t}*t*,

*B*is the delay operator,

*ϕ*are the autoregressive polynomials,

*θ*are the moving average polynomials and

*e*is the difference between the observed value

_{t}*Y*and the forecast at the time

_{t}*t*.

### Error metrics

To have a numerical comparison of the effectiveness of the proposed models, some error metrics can be adopted. In this paper, the authors use the Mean Percentage Error (MPE), the Coefficient of Variation of the Error (CVE) and the Mean Absolute Scaled Error (MASE), defined as follows.

In this case, according to the features of the dataset that will be presented in the following sections, a simple naïve model based on the assumption that the data of today is the same as yesterday is applied, choosing a lag *k* = 1 in the above formula (Hyndman & Athanasopoulos 2013).

## DATASET ANALYSIS

This statistical study was applied on the time series of the levels observed in one of the reservoirs of the water supply system of the town of Avellino, Italy, located in the area of Cesine. Data refer to daily average water levels, expressed in metres, measured in 2014, from January to December. The original dataset was divided into two subsets: the first 333 values were used to perform the calibration and, therefore, the estimation of the model parameters (calibration dataset), the remaining ones were used for the validation phase (validation dataset).

The calibration dataset runs from 2 January 2014 to 30 November 2014. It shows some missing values, due to the occurrence of holidays and other, not identified, events. So, in the first 333 days of the year only 280 measurements were acquired. The last month of 2014, from 1 to 31 December, was used for validation of the model. This subset is made of 31 data: 26 measured and five imputed by calculating the average of the two closest available measurements as already explained above.

The summary statistics of the 280 calibration data (Table 1) show a low value of the standard deviation, if compared with the average of the levels. Thus, the series seems fairly regular over time, without frequent changes of great relevance. Furthermore, since the values of asymmetry and kurtosis are very close to zero, the data can be considered normally distributed, as extracted from a normal, or Gaussian, population.

Sample size . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

280 | 2.73 | 0.64 | 2.7 | 1.4 | 4 | −0.16 | −0.89 |

Sample size . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

280 | 2.73 | 0.64 | 2.7 | 1.4 | 4 | −0.16 | −0.89 |

Then, the calibration dataset was reconstructed by imputing the missing values with the average of the immediately preceding and the immediately following one. Generally, for not-seasonal time series such as the one under study, the imputation technique offers good results since the linear interpolation between two contiguous values takes into account the temporal location of the omitted value with respect to the contiguous ones.

The result of this elaboration is a calibration dataset that consists of 280 observed and 53 reconstructed values. In Figure 1(a) and 1(b) the complete reconstructed time series has been presented, together with the ‘duration plot’ of the data (sorted from the highest to the lowest).

The autocorrelation plot of the reconstructed series, Figure 1(c), shows that the function decreases as the lag increases. So, a periodicity in the data cannot be identified. In these cases, forecasting through a seasonal deterministic model does not seem plausible but the scenario suggests the use of a stochastic technique as the ARIMA model.

In order to complete the characterization of the time series, some common statistical tests have been carried out. In particular, the Ljung–Box (LB) and Box–Pierce (BP) tests were used to verify a possible autocorrelation, while Lee–White–Granger and Terasvirta–Lin–Granger tests were used to check linearity. The results (Table 2) show that the autocorrelation can be considered statistically significant in this case and that a linear process is present. Thus, it seems reasonable to use linear models like those of the ARIMA class.

Test . | . | d
. | p-value
. |
---|---|---|---|

Ljung–Box | 1,366.32 | 20 | <2.2e-16 |

Box–Pierce | 1,389.20 | 30 | <2.2e-16 |

Lee–White–Granger | 1.9282 | 2 | 0.3813 |

Terasvirta–Lin–Granger | 2.2767 | 2 | 0.3203 |

Test . | . | d
. | p-value
. |
---|---|---|---|

Ljung–Box | 1,366.32 | 20 | <2.2e-16 |

Box–Pierce | 1,389.20 | 30 | <2.2e-16 |

Lee–White–Granger | 1.9282 | 2 | 0.3813 |

Terasvirta–Lin–Granger | 2.2767 | 2 | 0.3203 |

Finally, the Augmented Dickey–Fuller (ADF) and Phillips–Perron (PP) tests (Table 3), performed to verify the stationarity, show a tendency to reject the null hypothesis, i.e. the presence of unit roots and the non-stationarity of the process. Thus, the analysed series presents a stationary trend: the mean, variance and autocorrelation does not significantly change during the overall observation time.

Test . | Statistic of the test . | Lag . | p-value
. |
---|---|---|---|

Augmented Dickey–Fuller | −6.4257 | 6 | 0.01 |

Phillips–Perron | −4.4054 | 5 | 0.01 |

Test . | Statistic of the test . | Lag . | p-value
. |
---|---|---|---|

Augmented Dickey–Fuller | −6.4257 | 6 | 0.01 |

Phillips–Perron | −4.4054 | 5 | 0.01 |

## CALIBRATION OF THE MODEL

In order to better describe the trend, over time, of the water levels in the reservoir, three ARIMA models were performed. Coefficients were estimated on the first 333 calibration periods by means of the statistical program R and using likelihood maximization as the technique of parameter estimation.

### Model ARIMA (2,0,2)

*t*, provided by the model, is described by Equation (7):

Table 4, instead, reports the estimated value of the coefficients of the model and their relative standard errors. The value of the intercept (about 2.7 m) can be considered the average value of the water level in the tank during the calibration period.

. | Intercept, [m] . | AR1, . | AR2, . | MA1, . | MA2, . |
---|---|---|---|---|---|

Estimated value | 2.7314 | 1.8491 | −0.8926 | −0.4743 | −0.1794 |

Standard error | 0.0634 | 0.0389 | 0.0354 | 0.0697 | 0.0659 |

. | Intercept, [m] . | AR1, . | AR2, . | MA1, . | MA2, . |
---|---|---|---|---|---|

Estimated value | 2.7314 | 1.8491 | −0.8926 | −0.4743 | −0.1794 |

Standard error | 0.0634 | 0.0389 | 0.0354 | 0.0697 | 0.0659 |

The graphical comparison between the observed data and ARIMA model (2,0,2) (Figure 2(a)) shows how the curve of the predicted data (dotted curve) follows very closely that of the observed ones (solid curve). The time horizon of the forecast is one day. Only the fastest fluctuations of the level are not well foreseen since the model needs, at least, one period of time (in this case one day) to settle down after a sharp change in the slope of the observed signal.

### Model ARIMA (3,1,3)

*t*, is described by Equation (8):

This model, like the previous one, has a forecasting time horizon of one day.

Table 5 shows the estimated value of the coefficients of the model and their relative standard errors.

. | AR1, . | AR2, . | AR3, . | MA1, . | MA2, . | MA3, . |
---|---|---|---|---|---|---|

Estimated value | 1.2350 | 0.2090 | −0.5201 | −0.8703 | −0.5839 | 0.4542 |

Standard error | 0.2225 | 0.3822 | 0.1782 | 0.2154 | 0.2689 | 0.0950 |

. | AR1, . | AR2, . | AR3, . | MA1, . | MA2, . | MA3, . |
---|---|---|---|---|---|---|

Estimated value | 1.2350 | 0.2090 | −0.5201 | −0.8703 | −0.5839 | 0.4542 |

Standard error | 0.2225 | 0.3822 | 0.1782 | 0.2154 | 0.2689 | 0.0950 |

### Model ARIMA (6,1,6)

*φ*

_{1},

*φ*

_{2},

*φ*

_{3}) and the first three moving average parameters (ϑ

_{1}, ϑ

_{2}, ϑ

_{3}) were manually set to zero, performing the likelihood maximization with the new function. A differentiation of order one of the series was also carried out. The model forecasting formula is reported in Equation (9):

Table 6 shows the model coefficients with the relative standard errors.

. | AR4, . | AR5, . | AR6, . | MA4, . | MA5, . | MA6, . |
---|---|---|---|---|---|---|

Estimated value | 0.2594 | −0.0225 | −0.4463 | −0.1590 | 0.2224 | 0.3933 |

Standard error | 0.3138 | 0.3347 | 0.1301 | 0.3416 | 0.3163 | 0.1532 |

. | AR4, . | AR5, . | AR6, . | MA4, . | MA5, . | MA6, . |
---|---|---|---|---|---|---|

Estimated value | 0.2594 | −0.0225 | −0.4463 | −0.1590 | 0.2224 | 0.3933 |

Standard error | 0.3138 | 0.3347 | 0.1301 | 0.3416 | 0.3163 | 0.1532 |

The graphical comparison, Figure 2(c), shows how the curve of ARIMA model (6,1,6) (dot-dashed line) follows quite precisely the one related to the observed data (black line) but the extension of the forecasting horizon from one to three days is paid with a loss of precision. Furthermore, it is possible to observe the typical delay of the expected signal with respect to the observed one. As can be expected this model needs three periods to settle any rapid variations in the series.

## VALIDATION OF THE MODEL

In order to verify the forecasting abilities of the three proposed models, a validation phase was carried out by using, as already explained, the values of the daily water level registered in the tank of Cesine in the month of December 2014.

The summary statistics of the forecast error are reported in Table 7.

ARIMA model . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

(2,0,2) | 0.02 | 0.20 | 0.04 | −0.92 | 0.44 | −2.75 | 11.89 |

(3,1,3) | −0.10 | 0.20 | −0.09 | −1.03 | 0.27 | −2.84 | 12.04 |

(6,1,6) | 0.11 | 0.46 | 0.05 | −0.56 | 1.36 | 0.65 | −0.14 |

ARIMA model . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

(2,0,2) | 0.02 | 0.20 | 0.04 | −0.92 | 0.44 | −2.75 | 11.89 |

(3,1,3) | −0.10 | 0.20 | −0.09 | −1.03 | 0.27 | −2.84 | 12.04 |

(6,1,6) | 0.11 | 0.46 | 0.05 | −0.56 | 1.36 | 0.65 | −0.14 |

Figure 3(a) shows the comparison between the level value observed in December (continuous black line) and the level predicted by ARIMA model (2,0,2) (dotted line). In the first days, the model's delay in anticipating the sharp change in slope in the observed series is manifested. Starting from period 337 the forecast manages to follow very well the trend of the data which present very progressive and not rapid variations in value.

Model (3,1,3), instead, averagely overestimates the data observed throughout the validation period, as can be noticed in Figure 3(b).

Figure 3(c) shows the comparison between the observed water level and the forecast level provided by ARIMA model (6,1,6). The forecast shows the typical delay, since the information extracted to predict the series is that of the three previous days.

## RESULTS AND DISCUSSION

In order to quantitatively compare the proposed models, the analysis of the forecast errors in the calibration phase (residuals) and in the validation phase (errors) has been executed. ARIMA model (2,0,2) has both an average of residuals and a median equal to zero and the standard deviation is particularly low, too (Table 8). Model (3,1,3) confirms the tendency to overestimate the real value of the level showing a negative average and median of the residuals. The asymmetry and kurtosis indices suggest a normal distribution of the residuals for all the three proposed models. ARIMA model (2,0,2) has the lowest mean value of residuals as shown in Table 8. The residuals distribution of model (6,1,6) exhibits the highest standard deviation, even though the mean residual is very low. This worsening in the performances is balanced by the advantage of giving a broader forecasting horizon.

ARIMA model . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

(2,0,2) | 0.00 | 0.15 | 0.00 | −0.52 | 0.52 | −0.22 | 2.18 |

(3,1,3) | −0.12 | 0.15 | −0.12 | −0.63 | 0.38 | −0.15 | 2.03 |

(6,1,6) | −0.01 | 0.57 | 0.02 | −2.40 | 1.56 | −0.44 | 1.30 |

ARIMA model . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

(2,0,2) | 0.00 | 0.15 | 0.00 | −0.52 | 0.52 | −0.22 | 2.18 |

(3,1,3) | −0.12 | 0.15 | −0.12 | −0.63 | 0.38 | −0.15 | 2.03 |

(6,1,6) | −0.01 | 0.57 | 0.02 | −2.40 | 1.56 | −0.44 | 1.30 |

Residuals of ARIMA model (2,0,2) appear to be symmetrically and regularly distributed (Figure 4(a)). As regards the other two models there are slight asymmetries in the histograms, which appear of modest magnitude, also when evaluating the quantile–quantile diagrams of Figure 5.

Figure 5 shows the quantile–quantile diagrams (Q–Q plot) of the residuals for all the three proposed models. This graph compares the cumulative distribution of the observed variable with the cumulative distribution of a normal distribution. If the observed variable has a normal distribution, the points of this joint distribution gather on a diagonal directed from the bottom to the top and from left to right. In the Q–Q plots shown we can see that all three models provide normally distributed forecast errors, i.e. they do not differ significantly from this desired distribution.

Thus, in general the proposed ARIMA models guarantee a forecast error apparently due only to random fluctuations well described by a normal distribution.

As for the autocorrelation of the residuals (Figure 6), models (2,0,2) and (3,1,3) present a very low autocorrelation. Model (6,1,6), instead, seems not to have been able to exploit all the autocorrelation of the series. This result was predictable due to the fact that the lowest lag data (1, 2 and 3) could not be used in order to extend the forecast horizon to the next three days.

In Table 9, the summary statistics of the errors calculated in the validation phase are reported. As expected, the error distributions tend to get worse with respect to the calibration phase. Anyway, the mean errors and the standard deviations still give good results.

ARIMA model . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

(2,0,2) | 0.02 | 0.20 | 0.04 | −0.92 | 0.44 | −2.75 | 11.89 |

(3,1,3) | −0.10 | 0.20 | −0.09 | −1.03 | 0.27 | −2.84 | 12.04 |

(6,1,6) | 0.11 | 0.46 | 0.05 | −0.56 | 1.36 | 0.65 | −0.14 |

ARIMA model . | Mean [m] . | Std dev. [m] . | Median [m] . | Min [m] . | Max [m] . | Skewness . | Kurtosis . |
---|---|---|---|---|---|---|---|

(2,0,2) | 0.02 | 0.20 | 0.04 | −0.92 | 0.44 | −2.75 | 11.89 |

(3,1,3) | −0.10 | 0.20 | −0.09 | −1.03 | 0.27 | −2.84 | 12.04 |

(6,1,6) | 0.11 | 0.46 | 0.05 | −0.56 | 1.36 | 0.65 | −0.14 |

Finally, in Table 10 the values of the average percentage error (MPE), of the error variation coefficient (CVE) and of the MASE are reported, in both the phases. Model (2,0,2) presents the lower average error (MPE) and the lower error variation coefficient (CVE), both in calibration and in validation, compared with the other models. This result confirms that for this time series, it does not seem useful to carry out the differentiation of order one and therefore the preferable model among those proposed to obtain a forecast with a time horizon of one day is ARIMA (2,0,2). Model (6,1,6) shows a worsening of the values of the error metrics offset, basically due to the extension of the forecast horizon.

Phase . | ARIMA model . | MPE . | CVE . | MASE . |
---|---|---|---|---|

Calibration | (2,0,2) | −0.306 | 0.055 | 0.764 |

Calibration | (3,1,3) | −4.959 | 0.070 | 1.123 |

Calibration | (6,1,6) | −1.781 | 0.209 | 3.145 |

Validation | (2,0,2) | 0.278 | 0.070 | 0.858 |

Validation | (3,1,3) | −4.035 | 0.077 | 1.105 |

Validation | (6,1,6) | 4.471 | 0.161 | 2.922 |

Phase . | ARIMA model . | MPE . | CVE . | MASE . |
---|---|---|---|---|

Calibration | (2,0,2) | −0.306 | 0.055 | 0.764 |

Calibration | (3,1,3) | −4.959 | 0.070 | 1.123 |

Calibration | (6,1,6) | −1.781 | 0.209 | 3.145 |

Validation | (2,0,2) | 0.278 | 0.070 | 0.858 |

Validation | (3,1,3) | −4.035 | 0.077 | 1.105 |

Validation | (6,1,6) | 4.471 | 0.161 | 2.922 |

## CONCLUSIONS

In this paper the problem of modelling the time series of water levels in a reservoir connected to the supply system of a city has been discussed. In particular, a methodology to model and forecast the daily levels was applied.

The univariate time series is related to the daily measurements of the water levels, observed during the year 2014, in the reservoir of the Cesine area connected to the network for the drinking water use of the city of Avellino (Italy). Some measurements, corresponding to public holidays, were missing and, so, were reconstructed imputing the average value of the two temporally closest measurements, one previous and one following the missing period. In this way a complete time series of 364 periods was obtained.

Once the time series was statistically characterized, a stochastic modelling of the ARIMA class was proposed. The statistical analysis showed that the series of levels does not present evident periodicity or seasonality, which could be used for the construction of a deterministic or seasonal model. Furthermore, the series was linear and stationary. Then, the dataset was divided into two subgroups: 333 calibration periods, for the estimation of the coefficients of the proposed models, and 31 validation periods, corresponding to the measurements of the levels in December 2014.

The results of the modelling were very encouraging both from comparing, graphically, the observed level with the expected one, and by analysing the distribution of the forecast error from a quantitative point of view. The best forecast results, both in the calibration phase, i.e. during parameter estimation, and in the validation phase, were obtained with an ARIMA model (2,0,2). A model with differentiation of the first-order series, ARIMA (3,1,3), was also tested, however it did not provide clear improvements in the forecast. Both of these models are characterized by a one-day forecasting horizon. To obtain a longer time horizon, an ARIMA model (6,1,6) was implemented. It provided slightly worse forecasts but with a three-day forecast horizon. Therefore, a possible and novel approach could be to implement two models in parallel: the ARIMA (6,1,6) model could be used for ‘medium term’ (three days ahead) rough predictions, combined with the ARIMA (2,0,2) model, used to provide short-term (one step ahead, daily) forecasts. The use of these techniques on water data showed excellent potential, presenting a possible way for integration between the current water service monitoring systems and the forecasts obtained with this methodology. The contribution of this paper is to extend the studies on the adoption of ARIMA techniques on this kind of data, dealing with a particular case study and focusing on the methodology to choose the best TSA model for the data under study. In the authors' opinion, the proposed approach could give water managers a tool to operate, especially in cases where flow rate data are not available.

Further studies will include a similar approach to the flow rate, in order to explore the potentiality of these techniques for a different dataset. In addition, in other time series in which the hourly data are available, different time bases can be considered, so that seasonal patterns can arise and can be implemented in the models.

## REFERENCES

*MATEC Web of Conferences*,

**125**, 05013 (21st International Conference on Circuits, Systems, Communications and Computers (CSCC 2017))