The specialized literature on water demand forecasting indicates that successful forecasting models are based on soft computing approaches such as neural networks, fuzzy systems, evolutionary computing, support vector machines and hybrid models. However, soft computing models are extremely sensitive to sample size and face limitations when modeling long time-series. As an alternative, this work proposes the use of the dynamic time scan forecasting (DTSF) method to predict time-series of water demand in urban supply systems. This method scans a time-series looking for patterns similar to the most recently observed values. The values that follow the selected patterns are used to create the prediction using similarity functions. Compared with soft computing approaches, the DTSF method has very low computational complexity and is well suited to long time-series. The results presented here demonstrate that the proposed method provides similar or better forecasts than soft computing and statistical methods, but at a lower computational cost. Thus, its use for online water demand forecasting is favored.

  • Novel analog-based methodology for forecasting univariate time-series.

  • A fast time-series forecasting methodology for large data sets.

  • The main advantage of this data-oriented method is that its performance generally improves as more data become available.

  • The method has very low computational complexity, thus, its use for online water demand forecasts is favored.

  • There is no best model for predicting daily water demand.

Graphical Abstract


The growing demand for water and the imminent risk of shortages are associated with the increase in urban population, per capita consumption and the irregular distribution of rainfall. In the last 100 years, the world population has increased threefold, while water consumption has increased sixfold (Cirilo 2015). Between 2009 and 2050, the world population is projected to increase by around 2.3 billion inhabitants, from 6.8 to 9.1 billion, while the urban population will increase from 3.4 billion in 2009 to 6.3 billion in 2050, increasing the stress on water availability (WWAP 2012).

Despite all the existing urban water infrastructures, many cities are currently facing water stress. In fact, a large proportion of the world population has been affected by water stress (Vörösmarty et al. 2000; Maddocks et al. 2015). Studies indicate that one quarter (25% ± 4%) of the population in large cities, or 381 ± 55 million people, has water supplies that are stressed (McDonald et al. 2014). This scenario represents a major challenge for providing a sufficient amount of water of adequate quality.

To minimize the effects of water stress, water supply system (WSS) management techniques have been used to maintain a balance between water supply and demand. This balance is achieved by applying operational actions, many of which require the application of statistical tools for water demand forecasting.

The importance of water demand forecasting is reflected in the growing interest among researchers and professionals. Over the past 20 years, the number of articles published on this topic has increased exponentially (Groppo et al. 2019), reflecting the need to develop efficient systems to manage water demand. The literature presents numerous models for urban water demand forecasting using statistical techniques such as linear regression (Campisi-Pinto et al. 2012; Santos & Pereira Filho 2014), nonlinear regression (Adamowski et al. 2012), time-series analysis (Caiado 2010; Huang et al. 2014; Al-Zahrani & Abo-Monasar 2015; Arandia et al. 2016; Rajballie et al. 2022), similarity-based approaches (Alvisi et al. 2007; Bakker et al. 2013; Tian et al. 2016; Gagliardi et al. 2017a; Pacchin et al. 2019), Markov chains (Gagliardi et al. 2017b; Pacchin et al. 2019) and techniques based on Soft Computing.

For water demand forecasting, several Soft Computing methods have been presented and applied to historical water demand series. It is known that these series have stochastic and nonlinear components, making water demand forecasting a complex issue. In this context, Soft Computing methods such as Fuzzy Logic (Firat et al. 2009a; Ambrosio et al. 2019), Neural Computing (Firat et al. 2009b, 2010; Santos & Pereira Filho 2014; Al-Zahrani & Abo-Monasar 2015; Pacchin et al. 2019), Evolutionary Computation (Bai et al. 2014; Romano & Kapelan 2014; Leon et al. 2020; Shirkoohi et al. 2021), Support Vector Machines (Herrera et al. 2010; Brentan et al. 2016; Ambrosio et al. 2019), Random Forests (Chen et al. 2017; Ambrosio et al. 2019), Long Short-Term Memory (Boudhaouia & Wira 2021), Dual-Scale Deep Belief Network (Xu et al. 2018), Continuous Deep Belief Echo State Network (Xu et al. 2019b) and hybrid models (e.g., Nasseri et al. 2011; Adamowski et al. 2012; Campisi-Pinto et al. 2012; Odan & Reis 2012; Huang et al. 2014, 2021, 2022; Tiwari & Adamowski 2015; Guo et al. 2022; Rajballie et al. 2022) have provided more accurate results for urban water demand forecasting. In general, hybrid models are more robust for water demand forecasting than Feed Forward Neural Network (FFNN), Multiple Linear Regression (MLR), Multiple Nonlinear Regression (MNLR) and ARIMA models.

In general, the main factors that impact urban water demand are often difficult to identify using traditional algorithms. For instance, Xu et al. (2019a) applied the energy spectrum (Oshima & Kosuda 1998) and the largest Lyapunov exponent (Tsonis 1992) to examine the main characteristics of water demand time-series. The results indicate that water demand time-series can be characterized as chaotic. Soft Computing methods, without proper pre-processing, can become unstable and produce erroneous results when applied to water demand forecasting (Zhang & Qi 2005).

Simultaneously with the profusion of methods developed by the scientific community, the digitization process has been advancing. Consequently, data acquisition has been increasing, creating a ‘big data’ problem. De Mauro et al. (2014) state that big data represents information assets characterized by high volume, velocity and variety, which require specific technology and analytical methods to turn them into value. Data values can be divided into three groups: values associated with the characteristics of the data set; values associated with the specific technology and analytical methods for manipulating the data; and values associated with the insights, i.e., the knowledge extracted from the data. Therefore, the goal of big data analytics is knowledge discovery from massive data sets (Chen & Han 2016).

Due to the rapid increase in data volume, a large amount of storage is necessary, demanding greater bandwidth and causing high latency in data processing (Caiza et al. 2020), which makes the Industrial Internet of Things (IIoT) a challenge for the current infrastructure (Sabireen & Neelanarayanan 2021). A solution to mitigate this problem is ‘cloud computing’, which, according to Sabireen & Neelanarayanan (2021), constitutes a new component of Industry 4.0. Cloud computing helps address the big data problem by reducing energy consumption in industrial sensor networks and improving security, processing and real-time data storage. In addition to solutions for processing large databases, algorithms with low computational complexity (CC) have been developed (Arnaiz-González et al. 2016; Baldán & Benítez 2019; Sarma et al. 2019; Baldán et al. 2021).

In this scenario, the present work evaluates the feasibility of using a novel, analog-based methodology named dynamic time scan forecasting (DTSF) (Costa et al. 2021) to perform multi-step forecasting in univariate time-series in order to predict short-term (hourly) demand in WSS. This is done by comparing several well-known univariate alternatives in terms of computational efficiency and cost, covering statistical, machine learning and hybrid approaches. As in Costa et al. (2021), this study employed the naïve method (naive 1), the Pattern Sequence-Based Forecasting (PSF) analog approach, the Box–Jenkins time-series approach (SARIMA), the Trigonometric seasonality, Box–Cox transformation, ARMA errors, Trend and Seasonal components model (TBATS), the hybrid model combining Seasonal and Trend decomposition using the Loess filter (STL) with the exponential smoothing method (ETS), STL + ETS, the Hybrid.2 approach, and the NNET.2 approach – an automatic method that employs extreme learning machines (ELM). Additionally, we used the hybrid autoregressive neural network approach with bootstrap (BNNAR), the THETA model and ETS, which obtained the best accuracy among all methods evaluated by Makridakis et al. (2018) in their comparison of numerous statistical and machine learning methods for one-step-ahead forecasting.

Study area

The investigated water supply zone (SZ) is located in the south-central region of a capital city in southeastern Brazil. The analysis period ran from 00:00 h on January 1st, 2015, until 24:00 h on December 31st, 2016. This region is predominantly residential (with different social strata) and hosts a broad variety of businesses. According to information from the responsible concessionaire, the SZ had 35,710 connections, 98,449 households and an average distributed flow of 861.05 L/s in September 2017 (Figure 1). The SZ serves a population of approximately 230,000 people, comparable to a medium-sized city, through two water treatment plants.
Figure 1

The supply zone area (SZ).


The water demand time-series data were monitored using a ‘Data Logger’, a device with nonvolatile memory that acquires data from a wide variety of sensors. The equipment was configured to register the flow rate (L/s) of the water exiting the water treatment plant (WTP-A) at 5 min intervals, and on the macrometer that connects WTP-B to the study area at 15 min intervals. In the analyzed period, missing data comprise days with no data collection due to failures in the ‘Data Logger’ device. Missing data were replaced using the naïve imputation method, chosen for its simplicity.

Prior to the forecasting analysis, the data were aggregated into 1 h intervals using the mean. The final data set showed a mean water flow rate of 533.63 L/s. The forecast horizon was set at 24 h, i.e., 24 steps ahead.
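This aggregation step can be illustrated with a short, hypothetical example (the values and variable names below are invented for illustration; pandas is used only to sketch the idea, not by the original study):

```python
import pandas as pd

# Hypothetical 5 min flow readings (L/s); the real series spans 2015-2016.
idx = pd.date_range("2015-01-01 00:00", periods=12, freq="5min")
flow = pd.Series([530.0 + i for i in range(12)], index=idx, name="flow_Ls")

# Aggregate into 1 h intervals using the mean, as done in this study.
hourly = flow.resample("60min").mean()
print(hourly.iloc[0])  # mean of the twelve 5 min readings: 535.5
```

The 15 min records from WTP-B would be aggregated the same way before merging with the WTP-A series.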

Table 1 presents the descriptive measures of the historical demand series used in this study. The median flow rate for the period analyzed was 522.81 L/s, with a standard deviation of 123.16 L/s and a coefficient of variation of 23.08%.

Table 1

Descriptive measures of the historical demand series

Statistic | Value
Mean | 533.63 L/s
Median | 522.81 L/s
Standard deviation | 123.16 L/s
Minimum | 116.57 L/s
Maximum | 947.18 L/s
First quartile | 438.48 L/s
Third quartile | 629.64 L/s
Coefficient of variation | 23.08%

DTSF method

The DTSF method is based on scan statistics (Glaz et al. 2009), a class of statistical methods aimed at detecting anomalous behavior in databases with temporal, spatial or spatio-temporal components. This approach was originally presented by Joseph Naus (Naus 1965) and adapted for epidemiological surveillance systems using spatial (Kulldorff 1997), temporal and spatio-temporal data (Kulldorff et al. 1998; Kulldorff 2001). Briefly, a scanning window with a pre-fixed geometry scans the data, alternating its position and dimension. A test statistic is calculated for each window configuration. The configuration yielding the highest value of the test statistic represents a potential candidate for anomalous behavior. Statistical inference is obtained using Monte Carlo simulations (Mooney 1997), under the null hypothesis that the data set does not present anomalous data. Detailed information about scan statistics can be found in Glaz et al. (2009).

The DTSF method scans a time-series using a fixed window. The objective is to find historical windows in which the data patterns are similar to the most recent values in the time-series. A test statistic, or similarity statistic, is calculated for each window. In addition, a similarity function is estimated for each window. The similarity function aims to define an equation for mapping historical data in the scanning window to the most recent data in the time-series. Once the most similar historical windows are detected, the respective similarity functions are applied to the subsequent values of the selected windows, thus generating future forecasts of the time-series.

Initially, let the vector $\mathbf{y}_w$ be defined as the last $w$ observations of the time-series,
$$\mathbf{y}_w = (y_{n-w+1}, y_{n-w+2}, \ldots, y_n) \quad (1)$$
where $y_1, y_2, \ldots, y_n$ are observations from a time-series of length $n$ and $w$ also represents the length of the scanning window.
The objective of DTSF includes identifying patterns in the time-series strongly correlated with the vector $\mathbf{y}_w$. Thus, the set of candidate vectors can be written as in the following equation:
$$\mathbf{x}_t = (y_t, y_{t+1}, \ldots, y_{t+w-1}) \quad (2)$$
where $t = 1, 2, \ldots, n - 2w + 1$. In this equation, the upper bound of the time sequence guarantees that the vector $\mathbf{x}_t$ does not overlap with the vector $\mathbf{y}_w$.
The DTSF method is shown in Figure 2. A scanning window of the same number of elements (w) is used to verify the previous values of the time-series.
Figure 2

Illustration of the DTSF method.

The goal of the DTSF method is to provide a time-series prediction. To do so, DTSF identifies the patterns most similar to $\mathbf{y}_w$. Subsequent values referring to the most similar patterns are used as forecast values by applying a similarity function, shown in the following equation:
$$\hat{y}_{n+h} = f_t(y_{t+w-1+h}), \quad h = 1, 2, \ldots \quad (3)$$
where $f_t$, known as the similarity function, is a function that correlates elements of vector $\mathbf{x}_t$ with elements of vector $\mathbf{y}_w$.
A constraint can be imposed on the methodology, such as $t \leq n - 2w + 1$. This constraint ensures that if the most correlated time-series window includes the values immediately prior to the vector $\mathbf{y}_w$, the forecast values will be a function of vector $\mathbf{y}_w$, as shown in the following equation:
$$\hat{y}_{n+h} = f_{n-2w+1}(y_{n-w+h}) \quad (4)$$
As noted in Equations (3) and (4), the forecast values depend on the length of window $w$ and function $f_t$. A parametric proposal for function $f_t$ consists of a linear regression equation of the elements of vector $\mathbf{x}_t$. Thus, the historical values can be similar to the most recent values, except for a scaling factor and an offset factor. By assuming the similarity function to be a linear model, the parameters can be estimated to minimize the sum of squared differences between the elements of vector $\mathbf{y}_w$ and the linear equation $f_t(\mathbf{x}_t) = \hat{\beta}_0 + \hat{\beta}_1 \mathbf{x}_t$. In addition, the similarity statistic can be defined as the model coefficient of determination, $R_t^2$ (Montgomery et al. 2012), shown in the following equation:
$$R_t^2 = 1 - \frac{\sum_{i=1}^{w}\left(y_{n-w+i} - \hat{y}_{n-w+i}\right)^2}{\sum_{i=1}^{w}\left(y_{n-w+i} - \bar{y}_w\right)^2} \quad (5)$$
where $\bar{y}_w$ is the sample mean of vector $\mathbf{y}_w$ and $\hat{y}_{n-w+i}$ is the $i$-th predicted value using the similarity function. It is worth mentioning that $R_t^2$ lies within the unit interval [0, 1]. If $R_t^2 \to 1$, then the estimated values are very close to the observed values, i.e., the past observed values located at time $t$ are similar to the last observed values after scaling and shift correction. The procedure for scanning and detecting similar historical windows is illustrated in Figure 3 using a 24-h window applied to an hourly time-series. The seven most similar historical windows (highest $R^2$) are indicated by rectangles. For each window, a linear regression model is estimated.
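The scan-and-score procedure described above can be sketched in a few lines (an illustrative Python sketch; the authors' actual implementation is the R package DTScanF, and the function and variable names here are invented for illustration):

```python
import numpy as np

def scan_windows(series, w, m):
    """Score every non-overlapping historical window against the last w
    observations using a linear similarity function; return the m best
    tuples (similarity statistic R^2, start index, intercept, slope)."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    target = y[n - w:]                      # vector of the last w observations
    scored = []
    for t in range(n - 2 * w + 1):          # windows that end before the target starts
        window = y[t:t + w]
        slope, intercept = np.polyfit(window, target, 1)  # least-squares linear fit
        fitted = intercept + slope * window
        r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
        scored.append((r2, t, intercept, slope))
    scored.sort(reverse=True)               # highest similarity first
    return scored[:m]
```

On a strongly periodic series (such as hourly water demand), the top-ranked windows are the historical stretches that repeat the most recent daily pattern up to a scale and offset.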
Figure 3

Example of the application of the DTSF method using an hourly water demand time-series. The length of the scanning window is 24 h. A linear model (the similarity function) is estimated for each window. The past seven windows with the highest similarity (according to their $R^2$ values) are indicated by rectangles.

Using the similarity functions, subsequent data from the most similar historical windows are used to generate the forecast values, shown in Figure 4.
Figure 4

Forecasting estimates using similarity functions of the most similar scanning windows, according to the DTSF methodology.


The final predictions are generated using an aggregation function, such as the average or median. The DTSF method requires three parameters: the length of the scanning window, w; the parametric specification of the similarity function (the degree j of the polynomial chosen as the similarity function); and the number of best matches, m (the number of most similar historical windows). To guarantee the high computational speed of the method, low-order polynomial similarity functions (linear, quadratic or cubic) are preferable. The number of best matches can be selected dynamically, for example, using those windows with similarity statistics above a previously defined threshold. In the present study, the number of most similar historical windows was fixed in advance. Briefly, different values for the length of the scanning window and the number of best matches were evaluated, and the combination with the minimum forecasting error was selected.
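Putting the pieces together, the whole procedure (scan, select the m best matches, map each match's subsequent values through its fitted similarity function, aggregate with the median) can be sketched as follows. This is an illustration under the linear-similarity assumption, not the DTScanF implementation:

```python
import numpy as np

def dtsf_forecast(series, w=24, m=7, horizon=24):
    """Minimal DTSF sketch: linear similarity functions, median aggregation."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    target = y[n - w:]
    # Candidate windows must not overlap the target and must be followed
    # by at least `horizon` observations.
    t_max = min(n - 2 * w, n - w - horizon)
    scored = []
    for t in range(t_max + 1):
        window = y[t:t + w]
        slope, intercept = np.polyfit(window, target, 1)
        fitted = intercept + slope * window
        r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
        scored.append((r2, t, intercept, slope))
    scored.sort(reverse=True)
    # One forecast path per selected window: apply its similarity function
    # to the `horizon` values that follow the window.
    paths = [b0 + b1 * y[t + w:t + w + horizon]
             for _, t, b0, b1 in scored[:m]]
    return np.median(paths, axis=0)         # final aggregated forecast
```

Because each iteration fits only a size-w linear regression, the scan is linear in the series length, which is consistent with the low CC values reported for DTSF below.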

An important requirement for the application of the DTSF method is the availability of a large time-series. It is worth mentioning that DTSF is a data-oriented method.

Statistical and machine learning methods used

The main Soft Computing tools used in this work are implemented in the R software. The Nonlinear Autoregressive Neural Network with Bootstrap, Autoregressive Integrated Moving Average, Trigonometric seasonality, Box–Cox transformation, ARMA errors, Trend and Seasonal components, Exponential Smoothing and Theta methods were applied using the forecast package (Hyndman & Khandakar 2008; Hyndman et al. 2017). It is worth mentioning that the Theta model, developed by Assimakopoulos & Nikolopoulos (2000), won the M3 Competition and was used as one of the benchmark methods in the M4 Competition (Makridakis et al. 2020). The hybrid model using the STL filter with exponential smoothing (STL + ETS), and Hybrid.2 (an ensemble of seven different time-series forecasts – arima, tbats, ets, theta, stlm, snaive and nnetar – which are individually fitted and combined into an ensemble mean forecast), were applied using the hybridModel function from the forecastHybrid package (Shaub & Ellis 2019). This package makes it possible to combine several models with equal weights, or with weights based on in-sample errors, and is based on the study by Bates & Granger (1969). The extreme learning machine (ELM) model was implemented using the nnfor package (Kourentzes 2017), which is based on studies by Crone & Kourentzes (2010) and Kourentzes et al. (2014). According to Lendasse et al. (2010), this method was ranked second-best in ESTSP'08 – the 2nd European Symposium on Time Series Prediction. The PSF algorithm (Martinez Alvarez et al. 2011) was implemented using the PSF package (Bokde et al. 2017). The DTSF method was implemented using the DTScanF package (Costa & Mineti 2019).

Forecasts were made for every day, with a 24 h horizon, for each method evaluated. The forecasting errors form a matrix with rows representing days and columns representing forecast horizons. Errors were obtained as the differences between observed and predicted values. This procedure is summarized in the flowchart shown in Figure 5.
Figure 5

Flowchart for analysis and comparison of all evaluated methods.


Cross-validation

The use of cross-validation in time-series is recommended whenever possible (Hyndman 2014). Cross-validation using rolling windows is employed in the present study to evaluate a 24 h forecast horizon. In this procedure, the forecasts are performed by sequentially moving the test values into the training set and changing the origin of the forecasts. For each forecast, the models are recalibrated using all the data available in the training set (Bergmeir & Benítez 2012), as illustrated in Figure 6.
Figure 6

Cross-validation of time-series forecasts using rolling windows. Light gray blocks represent training sets and dark gray blocks represent test sets (Adapted from Bergmeir & Benítez 2012).


According to Tashman (2000), rolling window cross-validation is the preferred procedure to evaluate forecasting performances.
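The rolling-window scheme described above can be sketched as follows (an illustrative Python sketch; `fit_predict` stands in for any of the evaluated models, and the seasonal-naïve forecaster used in the example is only for demonstration):

```python
import numpy as np

def rolling_origin_errors(series, horizon, n_origins, fit_predict):
    """For each forecast origin, train on all data up to the origin and
    forecast the next `horizon` values; return an (n_origins x horizon)
    matrix of errors (observed minus predicted), rows = origins."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    errors = np.empty((n_origins, horizon))
    for i in range(n_origins):
        origin = n - (n_origins - i) * horizon  # test block moves forward each step
        train, test = y[:origin], y[origin:origin + horizon]
        errors[i] = test - fit_predict(train, horizon)
    return errors

# Demonstration with a seasonal-naive forecaster (repeat the last cycle):
seasonal_naive = lambda train, h: train[-h:]
```

Each row of the returned matrix corresponds to one forecast origin, matching the error matrix described in the previous section.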

CC, statistical methods and machine learning

The naïve method was applied at two moments in this study: first, to impute missing data caused by data collection problems in the time-series; second, as the reference for computing the CC, defined here as the time required to train a model and use it for predictions, relative to the corresponding time for the naïve method:
$$\mathrm{CC} = \frac{t_{\mathrm{model}}}{t_{\mathrm{naive}}} \quad (6)$$
where $t_{\mathrm{model}}$ and $t_{\mathrm{naive}}$ are the computational times of the evaluated model and of the naïve method, respectively.

The metrics used to evaluate the accuracy of the candidate models are: the mean square error (MSE), the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). Model Fitting (MF) was also used as a measure of how well the historical data fit the prediction model; it represents the MSE of the model predictions, normalized by the average of the time-series under study.

The evaluation metrics were chosen based on the metrics most frequently applied in the literature review (RMSE, MAPE and MAE); the MF and CC metrics were used by Makridakis et al. (2018). The respective equations are shown below:
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2 \quad (7)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2} \quad (8)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right| \quad (9)$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \quad (10)$$
$$\mathrm{MF} = \frac{\mathrm{MSE}}{\bar{y}} \quad (11)$$
where $y_t$ is the observed value of the water demand at time $t$, $\hat{y}_t$ is the estimated value of demand, $n$ is the forecast horizon and $\bar{y}$ is the mean of the time-series.
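The accuracy metrics translate directly into code; a small sketch follows (MAPE is expressed as a fraction, consistent with the values reported in Table 2, and MF is normalized here by the mean of the observed values passed in, whereas the study normalizes by the mean of the full series):

```python
import numpy as np

def forecast_metrics(obs, pred):
    """MSE, RMSE, MAE, MAPE and MF over a forecast horizon."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = obs - pred
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / obs)),  # fraction, not percentage
        "MF": mse / np.mean(obs),            # MSE normalized by the mean
    }
```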

Computational time was estimated using a computer with a 7th Generation Intel® Core™ i7-7500U CPU @ 3.5 GHz, 16 GB of DDR4 memory and a 64-bit processor.

Figures 7 and 8 show hourly and monthly data, respectively, from the sample period. In Figure 7, it can be observed that water production remains stable throughout the day, varying only at peak times when the cost of electricity is higher. However, a homogeneous dispersion was observed in the early hours of the day. In Figure 8, it can be observed that the lowest water consumption and the lowest dispersions occur in the autumn and winter seasons and in December. July has the lowest median and can be associated with the school vacation period.
Figure 7

Records of hourly average flow rate throughout the day.

Figure 8

Records of monthly average flow rate throughout the two-year sample.

Figure 9 shows the boxplots of the forecast errors of the evaluated models. It can be observed that the values for the second quartile are very close to zero, with the exception of the PSF method. The model with the smallest error dispersion was the naïve method, followed by DTSF. The largest interquartile ranges can be observed in the ETS, STL-ETS, SARIMA and THETA models. The results also show a heavy-tailed distribution with extreme values for these methods.
Figure 9

Boxplots of the forecast errors for the subsequent 24 h of the evaluated models.

Figure 10 shows a wide dispersion of forecast errors for all models tested throughout 2016, with greater dispersion in the summer and autumn months. However, the observed dispersion does not clearly indicate which model performed best over the study period.
Figure 10

Evolution of the median of forecast errors, for the next 24 h, of each of the evaluated models.


Table 2 shows the forecasting results for the evaluated models. From January to May, and again in December, no single model was more accurate than the others. From June to November, the ELM method showed the best results. Regarding CC, the DTSF algorithm was extremely fast compared with the other methods.

Table 2

Evaluation of the forecasting models for an hourly forecast horizon using the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and computational complexity (CC)

Forecast month | NB: RMSE, MAE, MAPE, CC | DTSF: RMSE, MAE, MAPE, CC | ELM: RMSE, MAE, MAPE, CC
Jan 16 209.745 156.491 0.379 1.000 182.011 139.618 0.323 2.230 145.077 111.832 0.281 15,433.794 
Feb 16 183.982 145.982 0.301 1.000 147.250 112.660 0.229 1.036 134.271 107.674 0.210 390.773 
Mar 16 166.761 121.163 0.233 1.000 122.836 90.394 0.177 0.936 123.607 101.783 0.199 735.929 
Apr 16 141.742 102.277 0.185 1.000 123.432 95.721 0.180 1.873 116.152 90.609 0.159 742.174 
May 16 143.721 104.316 0.191 1.000 119.758 88.201 0.168 2.900 95.900 80.430 0.152 1,582.850 
Jun 16 149.416 107.567 0.208 1.000 117.162 85.972 0.170 1.978 98.230 79.438 0.157 1,343.134 
Jul 16 146.528 106.845 0.216 1.000 128.379 97.254 0.198 2.999 101.832 84.061 0.165 3,202.723 
Aug 16 123.719 81.809 0.154 1.000 109.401 77.338 0.150 2.118 90.620 72.558 0.137 2,460.946 
Sep 16 133.974 93.607 0.180 1.000 118.224 90.596 0.172 2.148 95.081 74.738 0.146 4,689.409 
Oct 16 148.850 113.865 0.234 1.000 123.264 93.284 0.195 2.276 107.985 88.033 0.183 3,632.866 
Nov 16 171.804 139.047 0.290 1.000 141.991 108.943 0.232 1.858 115.048 91.792 0.203 3,249.069 
Dec 16 152.384 115.530 0.267 1.000 133.700 96.193 0.224 2.311 103.291 82.606 0.196 4,362.896 
Forecast month | BNNAR: RMSE, MAE, MAPE, CC | SARIMA: RMSE, MAE, MAPE, CC | TBATS: RMSE, MAE, MAPE, CC
Jan 16 151.753 112.491 0.285 2,734.733 214.723 162.299 0.381 1,215.395 174.200 129.920 0.298 1,395.868 
Feb 16 135.642 105.629 0.214 2,215.384 173.828 130.608 0.266 75.517 131.353 102.755 0.212 1,247.589 
Mar 16 115.860 95.741 0.186 2,641.903 155.864 115.328 0.231 83.812 118.280 94.931 0.190 1,837.428 
Apr 16 103.786 78.048 0.138 3,220.659 149.156 112.378 0.217 97.013 132.075 105.810 0.205 1,751.243 
May 16 101.266 79.276 0.144 6,005.786 131.096 98.502 0.193 511.578 109.265 86.394 0.170 2,797.262 
Jun 16 104.291 84.596 0.164 4,219.843 134.915 98.710 0.202 117.171 121.736 92.953 0.193 2,021.754 
Jul 16 107.118 90.339 0.181 6,865.314 135.242 98.178 0.208 182.392 115.083 87.799 0.186 3,333.398 
Aug 16 97.933 81.933 0.154 5,238.267 126.745 92.346 0.189 128.817 106.305 79.481 0.164 2,058.823 
Sep 16 104.829 88.543 0.169 6,203.078 128.884 95.397 0.187 144.644 121.547 91.489 0.179 2,412.095 
Oct 16 108.647 92.146 0.186 7,199.256 144.261 106.244 0.225 213.127 123.633 95.830 0.196 2,652.736 
Nov 16 121.898 97.024 0.213 6,612.746 152.601 113.928 0.250 477.352 132.453 98.412 0.210 2,185.039 
Dec 16 113.015 82.346 0.202 9,574.767 146.398 107.750 0.250 1,103.022 123.433 92.259 0.208 3,094.292 
Forecast month | ETS: RMSE, MAE, MAPE, CC | STL + ETS: RMSE, MAE, MAPE, CC | HYBRID: RMSE, MAE, MAPE, CC
Jan 16 213.707 162.322 0.378 52.817 216.071 166.601 0.385 104.382 176.488 134.700 0.320 1,655.709 
Feb 16 165.098 124.729 0.254 51.728 167.871 127.170 0.268 125.303 146.842 112.945 0.228 1,460.979 
Mar 16 148.331 110.633 0.225 59.732 143.909 111.058 0.222 114.901 125.762 99.579 0.198 1,640.583 
Apr 16 149.849 117.553 0.222 70.576 147.702 117.951 0.223 99.036 119.430 95.936 0.182 1,874.222 
May 16 133.367 108.852 0.208 149.174 117.948 89.126 0.171 163.454 112.123 89.137 0.172 3,266.722 
Jun 16 152.310 120.529 0.232 102.690 126.296 95.176 0.192 107.247 114.996 91.713 0.186 2,158.089 
Jul 16 131.575 98.838 0.204 143.758 121.445 93.172 0.193 178.819 115.819 91.565 0.192 3,481.806 
Aug 16 112.965 82.736 0.165 117.989 104.619 75.636 0.153 115.801 109.975 88.549 0.176 2,551.959 
Sep 16 126.392 94.812 0.181 127.450 116.838 88.206 0.171 179.993 115.912 92.689 0.182 2,704.730 
Oct 16 142.584 105.532 0.214 149.554 138.741 102.991 0.215 171.218 98.067 98.067 0.207 3,066.671 
Nov 16 147.768 113.436 0.242 112.067 148.777 110.931 0.247 139.069 135.237 103.686 0.236 2,451.597 
Dec 16 146.032 110.034 0.253 383.011 141.399 103.414 0.237 187.751 124.390 91.905 0.218 3,203.210 
Forecast month | THETA: RMSE, MAE, MAPE, CC | PSF: RMSE, MAE, MAPE, CC
Jan 16 211.083 159.680 0.370 1.898 179.452 139.340 0.343 826.263     
Feb 16 165.127 125.294 0.253 2.117 161.338 124.413 0.244 738.665     
Mar 16 146.005 108.254 0.219 2.099 138.917 106.329 0.214 827.778     
Apr 16 149.722 113.971 0.221 2.205 127.581 104.753 0.195 971.017     
May 16 132.070 96.776 0.192 3.725 117.770 89.641 0.173 1,660.057     
Jun 16 131.349 95.584 0.197 2.838 128.741 94.023 0.192 1,174.000     
Jul 16 133.194 96.189 0.204 4.570 130.295 96.451 0.207 1,839.220     
Aug 16 118.684 84.174 0.173 2.729 123.583 88.572 0.185 1,304.881     
Sep 16 123.626 90.155 0.174 3.424 133.609 99.580 0.199 1,467.738     
Oct 16 135.324 98.766 0.206 3.575 142.547 108.125 0.232 1,690.579     
Nov 16 143.930 107.266 0.237 2.899 155.052 115.282 0.261 1,338.822     
Dec 16 143.383 105.410 0.241 3.573 141.751 106.136 0.255 1,797.798     
Forecast monthNB
DTSF
ELM
RMSEMAEMAPECCRMSEMAEMAPECCRMSEMAEMAPECC
Jan 16 209.745 156.491 0.379 1.000 182.011 139.618 0.323 2.230 145.077 111.832 0.281 15,433.794 
Feb 16 183.982 145.982 0.301 1.000 147.250 112.660 0.229 1.036 134.271 107.674 0.210 390.773 
Mar 16 166.761 121.163 0.233 1.000 122.836 90.394 0.177 0.936 123.607 101.783 0.199 735.929 
Apr 16 141.742 102.277 0.185 1.000 123.432 95.721 0.180 1.873 116.152 90.609 0.159 742.174 
May 16 143.721 104.316 0.191 1.000 119.758 88.201 0.168 2.900 95.900 80.430 0.152 1,582.850 
Jun 16 149.416 107.567 0.208 1.000 117.162 85.972 0.170 1.978 98.230 79.438 0.157 1,343.134 
Jul 16 146.528 106.845 0.216 1.000 128.379 97.254 0.198 2.999 101.832 84.061 0.165 3,202.723 
Aug 16 123.719 81.809 0.154 1.000 109.401 77.338 0.150 2.118 90.620 72.558 0.137 2,460.946 
Sep 16 133.974 93.607 0.180 1.000 118.224 90.596 0.172 2.148 95.081 74.738 0.146 4,689.409 
Oct 16 148.850 113.865 0.234 1.000 123.264 93.284 0.195 2.276 107.985 88.033 0.183 3,632.866 
Nov 16 171.804 139.047 0.290 1.000 141.991 108.943 0.232 1.858 115.048 91.792 0.203 3,249.069 
Dec 16 152.384 115.530 0.267 1.000 133.700 96.193 0.224 2.311 103.291 82.606 0.196 4,362.896 
BNNAR
SARIMA
TBATS
Jan 16 151.753 112.491 0.285 2,734.733 214.723 162.299 0.381 1,215.395 174.200 129.920 0.298 1,395.868 
Feb 16 135.642 105.629 0.214 2,215.384 173.828 130.608 0.266 75.517 131.353 102.755 0.212 1,247.589 
Mar 16 115.860 95.741 0.186 2,641.903 155.864 115.328 0.231 83.812 118.280 94.931 0.190 1,837.428 
Apr 16 103.786 78.048 0.138 3,220.659 149.156 112.378 0.217 97.013 132.075 105.810 0.205 1,751.243 
May 16 101.266 79.276 0.144 6,005.786 131.096 98.502 0.193 511.578 109.265 86.394 0.170 2,797.262 
Jun 16 104.291 84.596 0.164 4,219.843 134.915 98.710 0.202 117.171 121.736 92.953 0.193 2,021.754 
Jul 16 107.118 90.339 0.181 6,865.314 135.242 98.178 0.208 182.392 115.083 87.799 0.186 3,333.398 
Aug 16 97.933 81.933 0.154 5,238.267 126.745 92.346 0.189 128.817 106.305 79.481 0.164 2,058.823 
Sep 16 104.829 88.543 0.169 6,203.078 128.884 95.397 0.187 144.644 121.547 91.489 0.179 2,412.095 
Oct 16 108.647 92.146 0.186 7,199.256 144.261 106.244 0.225 213.127 123.633 95.830 0.196 2,652.736 
Nov 16 121.898 97.024 0.213 6,612.746 152.601 113.928 0.250 477.352 132.453 98.412 0.210 2,185.039 
Dec 16 113.015 82.346 0.202 9,574.767 146.398 107.750 0.250 1,103.022 123.433 92.259 0.208 3,094.292 
ETS / STL-ETS / HYBRID (for each model: RMSE, MAE, MAPE, computation time)
Jan 16 213.707 162.322 0.378 52.817 216.071 166.601 0.385 104.382 176.488 134.700 0.320 1,655.709 
Feb 16 165.098 124.729 0.254 51.728 167.871 127.170 0.268 125.303 146.842 112.945 0.228 1,460.979 
Mar 16 148.331 110.633 0.225 59.732 143.909 111.058 0.222 114.901 125.762 99.579 0.198 1,640.583 
Apr 16 149.849 117.553 0.222 70.576 147.702 117.951 0.223 99.036 119.430 95.936 0.182 1,874.222 
May 16 133.367 108.852 0.208 149.174 117.948 89.126 0.171 163.454 112.123 89.137 0.172 3,266.722 
Jun 16 152.310 120.529 0.232 102.690 126.296 95.176 0.192 107.247 114.996 91.713 0.186 2,158.089 
Jul 16 131.575 98.838 0.204 143.758 121.445 93.172 0.193 178.819 115.819 91.565 0.192 3,481.806 
Aug 16 112.965 82.736 0.165 117.989 104.619 75.636 0.153 115.801 109.975 88.549 0.176 2,551.959 
Sep 16 126.392 94.812 0.181 127.450 116.838 88.206 0.171 179.993 115.912 92.689 0.182 2,704.730 
Oct 16 142.584 105.532 0.214 149.554 138.741 102.991 0.215 171.218 98.067 98.067 0.207 3,066.671 
Nov 16 147.768 113.436 0.242 112.067 148.777 110.931 0.247 139.069 135.237 103.686 0.236 2,451.597 
Dec 16 146.032 110.034 0.253 383.011 141.399 103.414 0.237 187.751 124.390 91.905 0.218 3,203.210 
THETA / PSF (for each model: RMSE, MAE, MAPE, computation time)
Jan 16 211.083 159.680 0.370 1.898 179.452 139.340 0.343 826.263     
Feb 16 165.127 125.294 0.253 2.117 161.338 124.413 0.244 738.665     
Mar 16 146.005 108.254 0.219 2.099 138.917 106.329 0.214 827.778     
Apr 16 149.722 113.971 0.221 2.205 127.581 104.753 0.195 971.017     
May 16 132.070 96.776 0.192 3.725 117.770 89.641 0.173 1,660.057     
Jun 16 131.349 95.584 0.197 2.838 128.741 94.023 0.192 1,174.000     
Jul 16 133.194 96.189 0.204 4.570 130.295 96.451 0.207 1,839.220     
Aug 16 118.684 84.174 0.173 2.729 123.583 88.572 0.185 1,304.881     
Sep 16 123.626 90.155 0.174 3.424 133.609 99.580 0.199 1,467.738     
Oct 16 135.324 98.766 0.206 3.575 142.547 108.125 0.232 1,690.579     
Nov 16 143.930 107.266 0.237 2.899 155.052 115.282 0.261 1,338.822     
Dec 16 143.383 105.410 0.241 3.573 141.751 106.136 0.255 1,797.798     

The DTSF model was superior to SARIMA and ETS in all evaluated metrics (RMSE, MAE and MAPE) for every month of 2016. The STL-ETS hybrid model was more accurate on the RMSE metric in May, July, August and September. On the MAPE metric, the STL-ETS hybrid model was slightly better in July and September, with values of 19.3% and 17.1%, respectively, against 19.8% and 17.2% for the DTSF model. Compared with the analog-based PSF method, the DTSF model obtained the best MAPE results for all months of 2016. On the RMSE metric, TBATS was more accurate than DTSF in January, February, March, May, July, August, November and December; on the MAPE metric, however, DTSF was more accurate from March to June and from August to October. Compared with the THETA method, DTSF obtained the best results for all months of the year on the RMSE and MAPE metrics, although THETA was more accurate on the MAE metric from July to September and in November.
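The three accuracy metrics used throughout this comparison can be stated compactly. The sketch below (with made-up demand and forecast values, not data from the study) computes RMSE, MAE and MAPE in their standard forms:

```python
import math

def rmse(actual, pred):
    # Root mean square error
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    # Mean absolute error
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    # Mean absolute percentage error, as a fraction (0.193 corresponds to 19.3%)
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

# hypothetical hourly demand and one model's forecast
demand = [100.0, 120.0, 110.0]
forecast = [90.0, 130.0, 115.0]
print(rmse(demand, forecast), mae(demand, forecast), mape(demand, forecast))
```

Because MAPE is scale-free, it is the natural metric for the percentage comparisons made across models in the text.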

Next, a comparative analysis of the DTSF, ELM and BNNAR methods is presented. ELM proved more accurate than both DTSF and BNNAR throughout the second semester of 2016 on the RMSE and MAPE metrics. BNNAR outperformed ELM and DTSF in April and May on the MAPE metric, and in April, May and December on the MAE metric. On the MAE metric, ELM was superior to DTSF and BNNAR in the first quarter of the year.

Using the MF metric, ELM and BNNAR were the two best-fitting models during the study period, followed by DTSF. Results are shown in Table 3 and Figure 11.
Table 3

Forecast performance using Model Fitting metric in 2016

Forecast month | NB | DTSF | ELM | BNNAR | SARIMA | ETS | TBATS | THETA | STL-ETS | HYBRID | PSF
Jan 16 0.3382 0.2102 0.1782 0.1996 0.3162 0.2989 0.1979 0.2895 0.3067 0.2211 0.2493 
Feb 16 0.1786 0.1093 0.0842 0.0947 0.1526 0.1387 0.0999 0.1341 0.1599 0.1085 0.1169 
Mar 16 0.1092 0.0589 0.0595 0.0486 0.1035 0.0968 0.0584 0.0942 0.0857 0.0658 0.0819 
Apr 16 0.0678 0.0572 0.0411 0.0342 0.0905 0.0841 0.0720 0.0920 0.0818 0.0574 0.0616 
May 16 0.0703 0.0542 0.0341 0.0329 0.0705 0.0638 0.0496 0.0734 0.0539 0.0502 0.0564 
Jun 16 0.0861 0.0584 0.0429 0.0433 0.0894 0.0839 0.0774 0.0855 0.0746 0.0659 0.0806 
Jul 16 0.0996 0.0749 0.0430 0.0511 0.0875 0.0772 0.0673 0.0858 0.0687 0.0671 0.0861 
Aug 16 0.0584 0.0498 0.0322 0.0368 0.0749 0.0552 0.0565 0.0658 0.0503 0.0545 0.0751 
Sep 16 0.0723 0.0542 0.0414 0.0438 0.0672 0.0605 0.0616 0.0589 0.0547 0.0566 0.0777 
Oct 16 0.1093 0.0833 0.0661 0.0612 0.1196 0.0942 0.0788 0.0980 0.1026 0.0926 0.1269 
Nov 16 0.1519 0.1208 0.0923 0.1017 0.1398 0.1189 0.0993 0.1263 0.1356 0.1348 0.1730 
Dec 16 0.1626 0.1341 0.0895 0.1207 0.1663 0.1565 0.1025 0.1490 0.1423 0.1326 0.1774 
Figure 11

Forecasting accuracy of the models using Model Fitting metric from 1 January 2016 to 31 December 2016.

Figure 11 shows the accuracy of the models over the study period using the MF metric. The worst performance for all methods occurred in the summer (January, February and December) and spring (September to November) months, while the best performance occurred in the autumn (March to May) and winter (June to August) months, when water demand is lower. The two best models, ELM and BNNAR, performed similarly. DTSF and TBATS alternated as the more accurate of the two throughout the study period.

Table 4 presents the evaluation of the forecasting models for the different seasons of 2016 using the root mean square error, mean absolute error, mean absolute percentage error and MF. During the summer, the period of highest water consumption, no single model achieved the best forecasting performance across all metrics. Nonetheless, ELM achieved the best results during the winter months.

Table 4

Evaluation of the forecasting models for the different seasons in 2016 using the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and model fitting (MF)

Forecast month | Seasons | RMSE | MAE | MAPE | MF
Jan 16 Summer ELM ELM ELM ELM 
Feb 16 Summer TBATS TBATS ELM ELM 
Mar 16 Summer–Autumn BNNAR DTSF DTSF BNNAR 
Apr 16 Autumn BNNAR BNNAR BNNAR BNNAR 
May 16 Autumn ELM BNNAR BNNAR BNNAR 
Jun 16 Autumn–Winter ELM ELM ELM ELM 
Jul 16 Winter ELM ELM ELM ELM 
Aug 16 Winter ELM ELM ELM ELM 
Sep 16 Winter–Spring ELM ELM ELM ELM 
Oct 16 Spring ELM ELM ELM BNNAR 
Nov 16 Spring ELM ELM ELM ELM 
Dec 16 Spring–Summer ELM BNNAR ELM ELM 

Table 5 presents the computational overhead (log10) of the methods used in the present study.

Table 5

Computational overhead for each of the models throughout 2016 (log 10)

Forecast month | DTSF | ELM | BNNAR | SARIMA | ETS | TBATS | THETA | STL-ETS | HYBRID | PSF
Jan 16 0.348 4.188 3.437 3.085 1.723 3.145 0.278 2.019 3.219 2.917 
Feb 16 0.016 2.592 3.345 1.878 1.714 3.096 0.326 2.098 3.165 2.868 
Mar 16 − 0.029 2.867 3.422 1.923 1.776 3.264 0.322 2.060 3.215 2.918 
Apr 16 0.273 2.871 3.508 1.987 1.849 3.243 0.343 1.996 3.273 2.987 
May 16 0.462 3.199 3.779 2.709 2.174 3.447 0.571 2.213 3.514 3.220 
Jun 16 0.296 3.128 3.625 2.069 2.012 3.306 0.453 2.030 3.334 3.070 
Jul 16 0.477 3.506 3.837 2.261 2.158 3.523 0.660 2.252 3.542 3.265 
Aug 16 0.326 3.391 3.719 2.110 2.072 3.314 0.436 2.064 3.407 3.116 
Sep 16 0.332 3.671 3.793 2.160 2.105 3.382 0.534 2.255 3.432 3.167 
Oct 16 0.357 3.560 3.857 2.329 2.175 3.424 0.553 2.234 3.487 3.228 
Nov 16 0.269 3.512 3.820 2.679 2.049 3.339 0.462 2.143 3.389 3.127 
Dec 16 0.364 3.640 3.981 3.043 2.583 3.491 0.553 2.274 3.506 3.255 

DTSF presented computational overhead ranging between −0.029 and 0.477. THETA was the method with the second-lowest computational overhead, with values ranging from 0.278 to 0.660, followed by ETS, with values ranging from 1.714 to 2.583. Both the THETA and ETS methods are used as benchmarks in the M4 Competition. It is worth mentioning that the THETA method relies on a linear decomposition of the time-series into two components: a linear regression model is applied to the first component and an exponential smoothing model to the second. The decomposition and the fitting of both models are extremely fast procedures. In contrast, ELM, BNNAR, HYBRID, TBATS and PSF showed marked increases in computational overhead as new data were incorporated into the forecasts (recalibration of the rolling time window) (Tashman 2000). The computational overhead of these five methods ranged from 2.592 to 4.188, 3.345 to 3.981, 3.165 to 3.542, 3.096 to 3.523 and 2.868 to 3.265, respectively. In December 2016, BNNAR, ELM, HYBRID and PSF showed computational overheads of 3.981, 3.640, 3.506 and 3.255, respectively, compared with 0.364 for DTSF.
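Because Table 5 reports overhead on a log10 scale, inverting the scale makes the gaps concrete. A small sketch, assuming the reported values are log10 of elapsed computation time (the table does not state the unit, so the "seconds" label below is an assumption):

```python
import math

def seconds_from_log10(overhead):
    # Invert the log10 scale used in Table 5 to recover elapsed time
    # (assumed here to be in seconds)
    return 10 ** overhead

def log10_overhead(seconds):
    # Forward direction: elapsed time -> the value reported in Table 5
    return math.log10(seconds)

# January 2016 values from Table 5: DTSF 0.348 vs ELM 4.188
dtsf_time = seconds_from_log10(0.348)  # roughly 2.2
elm_time = seconds_from_log10(4.188)   # roughly 15,400
print(round(dtsf_time, 1), round(elm_time))
```

On this reading, a difference of almost four log10 units corresponds to roughly a factor of 7,000 in elapsed time.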

Figure 12 compares the computational overhead of DTSF with the remaining methods.
Figure 12

Computational overhead of DTSF and the other methods (log 10).

To show the accuracy of the models more clearly, a seven-day moving average was applied to the median of the hourly RMSE. The result of this smoothing is shown in Figure 13, giving a clearer picture of how the models behaved throughout the year. It indicates that, on a weekly horizon, the hourly forecast error of the different models varied considerably throughout the year. Therefore, no single model is consistently better than the others at predicting water demand for every day of the year, as indicated in Groppo et al. (2019). This smoothing, using moving averages, suggests a possible model-switching strategy: the historical trend of a model's smoothed error can be used to define a period in which the forecast of that model is, on average, preferable.
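The seven-day smoothing described above amounts to a trailing moving average over the daily medians. A minimal sketch (the demand-error values are hypothetical, chosen so that the last one is a one-day spike):

```python
def moving_average(series, window=7):
    # Trailing moving average: each output is the mean of the current value
    # and the window-1 values before it; the first window-1 days are skipped.
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# hypothetical daily medians of the hourly RMSE; 21 is a one-day spike
daily_median_rmse = [5, 7, 6, 8, 9, 7, 7, 21]
print(moving_average(daily_median_rmse))
```

Note how the spike is damped: the smoothed series rises from 7.0 to about 9.3 rather than jumping to 21, which is what makes the smoothed error usable as a stable model-switching signal.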
Table 6

Numbers and percentages of best forecasts achieved by each model for the next 24 h in 2016

Model | Number of best predictions | Percentage of best predictions
NB 77 21.45 
DTSF 62 17.27 
ELM 59 16.43 
BNNAR 72 20.06 
ETS 1.39 
SARIMA 11 3.06 
TBATS 17 4.74 
STL-ETS 13 3.62 
PSF 18 5.01 
HYBRID 1.11 
THETA 21 5.85 
Figure 13

Evolution of the median of the RMSE, for the next 24 h, after applying a seven-day moving average for all evaluated models.

After applying the seven-day moving average to the median of the forecast errors, the results shown in Table 6 were obtained. They indicate that, over a one-week horizon with hourly forecasts, the forecast error of the different models varies greatly; therefore, there is no single best model for predicting water demand for every day of the year. The best water demand forecasting model in 2016 was NB, with 21.45% of the best forecasts after applying a weekly moving average. It was followed by the hybrid neural model BNNAR, with 20.06%; the DTSF method, with 17.27%; ELM, with 16.43%; THETA, with 5.85%; PSF, with 5.01%; and the hybrid STL-ETS model, with 3.62%. The statistical methods TBATS, SARIMA and ETS presented 4.74%, 3.06% and 1.39% of the best predictions, respectively. The HYBRID model showed the worst performance, with 1.11%. It is worth noting that the more complex methods were not necessarily more accurate than simpler ones (Makridakis et al. 2018). Dalessandro (2013) shows that computationally less expensive algorithms, even when less accurate at intermediate stages, can define models that perform equally well in big data forecasting. Such algorithms may require more iterations than computationally more expensive ones, but each iteration is much faster; consequently, less expensive algorithms tend to converge much faster while delivering the same precision.
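The best-prediction shares in Table 6 amount to crediting, for each day, the model with the lowest smoothed error and then counting wins. A sketch with hypothetical smoothed errors for three of the eleven models over four days:

```python
from collections import Counter

def best_model_shares(smoothed_errors):
    # smoothed_errors: dict mapping model name -> list of daily smoothed errors.
    # Returns each model's share (%) of days on which it had the lowest error.
    models = list(smoothed_errors)
    n_days = len(next(iter(smoothed_errors.values())))
    wins = Counter(min(models, key=lambda m: smoothed_errors[m][d])
                   for d in range(n_days))
    return {m: 100 * wins[m] / n_days for m in models}

# hypothetical smoothed daily errors (not data from the study)
errors = {"NB":   [5.0, 4.0, 6.0, 5.5],
          "DTSF": [5.2, 3.8, 6.5, 5.1],
          "ELM":  [6.0, 4.5, 5.9, 5.3]}
print(best_model_shares(errors))
```

With the full eleven-model error series, the same tally would reproduce the percentage column of Table 6.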

Table 7 shows the daily frequency with which each method achieved the best forecast performance in each month of 2016. As previously mentioned, NB, DTSF, ELM and BNNAR most frequently achieved the best forecast performance. A daily analysis of the best forecast methods is provided in the Supplementary Material.

Table 7

List by month of the best forecast methods evaluated daily during 2016

Forecast month | Seasons | NB | DTSF | ELM | BNNAR | ETS | SARIMA | TBATS | STL-ETS | PSF | HYBRID | THETA
Jan 16 Summer  18       
Feb 16 Summer 11      
Mar 16 Summer–Autumn 12     
Apr 16 Autumn 18       
May 16 Autumn 9     
Jun 16 Autumn–Winter 8      
Jul 16 Winter 10  
Aug 16 Winter 17       
Sep 16 Winter–Spring 21        
Oct 16 Spring 15     
Nov 16 Spring  10    
Dec 16 Spring–Summer  16     

The method with the best performance in each month is indicated in bold type.

Based on the literature and on the forecast results obtained with different univariate methodologies, it is possible to conclude that algorithms with high computational overhead are often not applicable in large-scale, real-world situations. In this context, DTSF is extremely fast compared with the remaining methods and provides reasonably low prediction errors. The great advantage of this data-oriented method is that, in general, its performance improves as more data become available. Thus, the use and evaluation of DTSF for water demand forecasting is an important and novel contribution of this study.

Furthermore, scanning and scaling previous observations using similarity functions compensates for changes in the mean and in the scale throughout the time-series. Thus, observations recorded under the influence of pipe failures and leakages can still be used for forecasting purposes, provided that the water demand pattern under these circumstances is strongly correlated with the most recent observations.
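The scan-and-scale idea can be illustrated in a few lines. This is only a rough sketch under stated assumptions, not the authors' implementation: the window length, the R²-based similarity and the simple linear scaling are illustrative choices, and the sketch carries forward the values immediately following the matched window.

```python
def dtsf_sketch(series, window, horizon):
    # Regress the most recent `window` values on every historical window,
    # keep the best match (highest R^2), then apply the fitted linear
    # scaling to the `horizon` values that follow the matched window.
    query = series[-window:]
    best = None
    for start in range(len(series) - window - horizon + 1):
        cand = series[start:start + window]
        mx = sum(cand) / window
        my = sum(query) / window
        sxx = sum((x - mx) ** 2 for x in cand)
        sxy = sum((x - mx) * (y - my) for x, y in zip(cand, query))
        syy = sum((y - my) ** 2 for y in query)
        if sxx == 0 or syy == 0:
            continue  # flat window: the fit is degenerate, skip it
        b = sxy / sxx          # slope of the least-squares fit query ~ a + b*cand
        a = my - b * mx        # intercept
        r2 = sxy * sxy / (sxx * syy)  # similarity of the scaled match
        if best is None or r2 > best[0]:
            best = (r2, a, b, start)
    _, a, b, start = best
    follow = series[start + window:start + window + horizon]
    return [a + b * x for x in follow]  # forecast for the next `horizon` steps

# a repeating pattern: the last window [1, 2, 3] matches earlier cycles
print(dtsf_sketch([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3], 3, 1))
```

Because each candidate window is rescaled by the fitted line before comparison, a matching pattern remains useful even when it occurred at a different demand level, which is what allows observations affected by shifts in mean and scale to contribute.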

Results show that, in summer and autumn, no model surpassed the others in terms of forecasting accuracy. From June to November (winter and spring), however, ELM presented the best forecasting performance, although its computational overhead is a major disadvantage. The other neural network-based methods also achieved top results on the RMSE, MAE, MAPE and MF metrics but, as with ELM, their computational overhead increases significantly when cross-validation with sliding windows and recalibration is applied.

In general, the main findings show that no single method predicts water demand with high accuracy throughout the year. Nonetheless, there is a set of methods with near-optimal performance, and each can be further investigated to improve its own results. Alternatively, future work will investigate switching between methods, since each method achieved better forecast accuracy at different parts of the year (see Figures A.1–A.4 in the Supplementary Material).

Finally, forecasting results using data from 2016 indicate that, within a week-long horizon at hourly forecast resolution, errors vary across the different models. Therefore, there is no single best model for predicting daily water demand, and the results support using different models at different times of the year.

The authors thank CNPq and Fapemig for financial support, and Sanitation Company of Minas Gerais (Copasa MG) for providing urban water demand data.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare no conflict of interest.

Alvisi S., Franchini M. & Marinelli A. 2007 A short-term, pattern-based model for water-demand forecasting. Journal of Hydroinformatics 9 (1), 39–50.
Ambrosio J. K., Brentan B. M., Herrera M., Luvizotto E. Jr., Ribeiro L. & Izquierdo J. 2019 Committee machines for hourly water demand forecasting in water supply systems. Mathematical Problems in Engineering 2019, 9765468.
Arandia E., Ba A., Eck B. & McKenna S. 2016 Tailoring seasonal time series models to forecast short-term water demand. Journal of Water Resources Planning and Management 142 (3), 04015067.
Arnaiz-González Á., Díez-Pastor J. F., Rodríguez J. J. & García-Osorio C. 2016 Instance selection of linear complexity for big data. Knowledge-Based Systems 107, 83–95.
Assimakopoulos V. & Nikolopoulos K. 2000 The theta model: a decomposition approach to forecasting. International Journal of Forecasting 16 (4), 521–530.
Bai Y., Wang P., Li C., Xie J. & Wang Y. 2014 A multi-scale relevance vector regression approach for daily urban water demand forecasting. Journal of Hydrology 517, 236–245.
Bakker M., Vreeburg J. H. G., van Schagen K. M. & Rietveld L. C. 2013 A fully adaptive forecasting model for short-term drinking water demand. Environmental Modelling & Software 48, 141–151.
Baldán F. J. & Benítez J. M. 2019 Distributed FastShapelet Transform: a Big Data time series classification algorithm. Information Sciences 496, 451–463.
Baldán F. J., Peralta D., Saeys Y. & Benítez J. M. 2021 SCMFTS: scalable and distributed complexity measures and features for univariate and multivariate time series in Big Data environments. International Journal of Computational Intelligence Systems 14, 186.
Bates J. M. & Granger C. W. J. 1969 The combination of forecasts. Operations Research Quarterly 20 (4), 451–468.
Bergmeir C. & Benítez J. M. 2012 On the use of cross-validation for time series predictor evaluation. Information Sciences 191, 192–213.
Bokde N., Asencio-Cortés G., Martínez-Álvarez F. & Kulat K. 2017 PSF: introduction to R package for pattern sequence based forecasting algorithm. The R Journal 9 (1), 324–333.
Brentan B. M., Luvizotto E., Herrera M., Izquierdo J. & Pérez-García R. 2016 Hybrid regression model for near real-time urban water demand forecasting. Journal of Computational and Applied Mathematics 309, 532–541.
Chen Y. & Han D. 2016 Big data and hydroinformatics. Journal of Hydroinformatics 18 (4), 599–614.
Chen G., Long T., Xiong J. & Bai Y. 2017 Multiple random forests modelling for urban water consumption forecasting. Water Resources Management 31, 4715–4729.
Cirilo J. A. 2015 Water crisis: challenges and overcoming. Revista USP 106, 45–58 (in Portuguese).
Costa M. A. & Mineti L. B. 2019 DTScanF: dynamic time scan forecasting. Zenodo. doi:10.5281/zenodo.2603007.
Costa M. A., Ruiz-Cárdenas R., Mineti L. B. & Prates M. O. 2021 Dynamic time scan forecasting for multi-step wind speed prediction. Renewable Energy 177, 584–595.
De Mauro A., Greco M. & Grimaldi M. 2014 What is Big Data? A consensual definition and a review of key research topics. In: 4th International Conference on Integrated Information, Madrid, Spain. doi:10.13140/2.1.2341.5048.
Firat M., Turan M. E. & Yurdusev M. A. 2009a Comparative analysis of fuzzy inference systems for water consumption time series prediction. Journal of Hydrology 374, 235–241.
Firat M., Yurdusev M. A. & Turan M. E. 2009b Evaluation of artificial neural network techniques for municipal water consumption modeling. Water Resources Management 23, 617–632.
Gagliardi F., Alvisi S., Franchini M. & Guidorzi M. 2017a A comparison between pattern-based and neural network short-term water demand forecasting models. Water Supply 17 (5), 1426–1435.
Gagliardi F., Alvisi S., Kapelan Z. & Franchini M. 2017b A probabilistic short-term water demand forecasting model based on the Markov Chain. Water 9 (7), 507.
Glaz J., Pozdnyakov V. & Wallenstein S. 2009 Scan Statistics: Methods and Applications. Birkhäuser, Boston, MA, USA.
Groppo G. S., Costa M. A. & Libânio M. 2019 Predicting water demand: a review of the methods employed and future possibilities. Water Supply 19 (8), 2179–2198.
Herrera M., Torgo L., Izquierdo J. & Pérez-García R. 2010 Predictive models for forecasting hourly urban water demand. Journal of Hydrology 387, 141–150.
Huang L., Zhang C., Peng Y. & Zhou H. 2014 Application of a combination model based on wavelet transform and KPLS-ARMA for urban annual water demand forecasting. Journal of Water Resources Planning and Management 140, 04014013.
Huang H., Zhang Z. & Song F. 2021 An ensemble-learning-based method for short-term water demand forecasting. Water Resources Management 35, 1757–1773.
Hyndman R. J. 2014 Measuring forecast accuracy. Available at: https://pdfs.semanticscholar.org/af71/3d815a7caba8dff7248ecea05a5956b2a487.pdf (accessed 13 September 2020).
Hyndman R. J. & Khandakar Y. 2008 Automatic time series forecasting: the forecast package for R. Journal of Statistical Software 27 (3), 1–22.
Hyndman R. J., O'Hara-Wild M., Bergmeir C., Razbash S. & Wang E. 2017 Forecasting Functions for Time Series and Linear Models. R package version 8.2. https://pkg.robjhyndman.com/forecast/.
Kourentzes N. 2017 Time Series Forecasting with Neural Networks. R package version 0.9.2. https://cran.r-project.org/web/packages/nnfor/nnfor.pdf.
Kourentzes N., Barrow D. K. & Crone S. F. 2014 Neural network ensemble operators for time series forecasting. Expert Systems with Applications 41, 4235–4244.
Kulldorff M. 1997 A spatial scan statistic. Communications in Statistics – Theory and Methods 26 (6), 1481–1496.
Kulldorff M. 2001 Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society Series A (Statistics in Society) 164, 61–72.
Kulldorff M., Athas W. F., Feurer E. J., Miller B. A. & Key C. R. 1998 Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health 88, 1377–1380.
Lendasse A., Honkela T. & Simula O. 2010 European Symposium on Time Series Prediction. Neurocomputing 73, 1919–1922.
Maddocks A., Young R. S. & Reig P. 2015 Ranking the world's most water-stressed countries in 2040. World Resources Institute (26 August). https://www.wri.org/insights/ranking-worlds-most-water-stressed-countries-2040 (accessed 24 June 2021).
Makridakis S., Spiliotis E. & Assimakopoulos V. 2018 Statistical and machine learning forecasting methods: concerns and ways forward. PLoS ONE 13 (3), e0194889.
Makridakis S., Spiliotis E. & Assimakopoulos V. 2020 The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting 36, 54–74.
Martínez-Álvarez F., Troncoso A., Riquelme J. C. & Aguilar-Ruiz J. S. 2011 Energy time series forecasting based on pattern sequence similarity. IEEE Transactions on Knowledge and Data Engineering 23 (8), 1230–1243.
McDonald R. I., Weber K., Padowski J., Flörke M., Schneider C., Green P. A., Gleeson T., Eckman S., Lehner B., Balk D., Boucher T., Grill G. & Montgomery M. 2014 Water on an urban planet: urbanization and the reach of urban water infrastructure. Global Environmental Change 27, 96–105.
Montgomery D. C., Peck E. A. & Vining G. G. 2012 Introduction to Linear Regression Analysis. John Wiley & Sons, Hoboken, NJ, USA.
Mooney C. Z. 1997 Monte Carlo Simulation. Sage Publications, Thousand Oaks, CA, USA.
Nasseri M., Moeini A. & Tabesh M. 2011 Forecasting monthly urban water demand using Extended Kalman Filter and Genetic Programming. Expert Systems with Applications 38, 7387–7395.
Naus J. I. 1965 The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association 60, 532–538.
Odan F. K. & Reis L. F. R. 2012 Hybrid water demand forecasting model associating artificial neural network with Fourier series. Journal of Water Resources Planning and Management 138, 245–256.
Pacchin E., Gagliardi F., Alvisi S. & Franchini M. 2019 A comparison of short-term water demand forecasting models. Water Resources Management 33, 1481–1497.
Rajballie A., Tripathi V. & Chinchamee A. 2022 Water consumption forecasting models – a case study in Trinidad (Trinidad and Tobago). Water Supply 22 (5), 5434–5447.
Santos C. C. & Pereira Filho A. J. 2014 Water demand forecasting model for the metropolitan area of São Paulo, Brazil. Water Resources Management 28, 4401–4414.
Sarma A., Goyal P., Kumari S., Wani A., Challa J. S., Islam S. & Goyal N. 2019 μDBSCAN: an exact scalable DBSCAN algorithm for Big Data exploiting spatial locality. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER), 23–26 September, Albuquerque, NM, USA.
Shaub D. & Ellis P. 2019 Convenient Functions for Ensemble Time Series Forecasts. R package version 4.2.17. https://cran.r-project.org/package=forecastHybrid.
Tashman L. J. 2000 Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16, 437–450.
Tian D., Martinez C. J. & Asefa T. 2016 Improving short-term urban water demand forecasts with reforecast analog ensembles. Journal of Water Resources Planning and Management 142 (6), 04016008.
Tiwari M. K. & Adamowski J. F. 2015 Medium-term urban water demand forecasting with limited data using an ensemble wavelet-bootstrap machine-learning approach. Journal of Water Resources Planning and Management 141 (2), 04014053.
Tsonis A. A. 1992 Chaos: From Theory to Applications. Springer, New York, USA.
Vörösmarty C. J., Green P., Salisbury J. & Lammers R. B. 2000 Global water resources: vulnerability from climate change and population growth. Science 289 (5477), 284–288.
World Water Assessment Programme (WWAP) 2012 The United Nations World Water Development Report 4: Managing Water under Uncertainty and Risk. UNESCO, Paris, France. https://unesdoc.unesco.org/ark:/48223/pf0000215644_eng (accessed 24 June 2021).
Zhang G. P. & Qi M. 2005 Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160 (2), 501–514.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Supplementary data