Abstract
The specialized literature on water demand forecasting indicates that successful predictive models are based on soft computing approaches such as neural networks, fuzzy systems, evolutionary computing, support vector machines and hybrid models. However, soft computing models are extremely sensitive to sample size, with limitations for modeling long time-series. As an alternative, this work proposes the use of the dynamic time scan forecasting (DTSF) method to predict time-series of water demand in urban supply systems. The method scans a time-series looking for patterns similar to the values observed most recently. The values that precede the selected patterns are used to create the prediction using similarity functions. Compared with soft computing approaches, the DTSF method has very low computational complexity and is indicated for large time-series. Results presented here demonstrate that the proposed method provides similar or improved forecast values compared with soft computing and statistical methods, but at lower computational cost. Thus, its use for online water demand forecasts is favored.
HIGHLIGHTS
A novel analog-based methodology for forecasting univariate time-series.
A fast time-series forecasting methodology for large data sets.
The great advantage of this data-oriented method is that its performance generally improves as the amount of available data grows.
The method has very low computational complexity; thus, its use for online water demand forecasts is favored.
There is no best model for predicting daily water demand.
INTRODUCTION
The growing demand for water and the imminent risk of shortages are associated with the increase in urban population, per capita consumption and irregular distribution of rainfall. In the last 100 years, the world population has increased threefold, while water consumption has increased sixfold (Cirilo 2015). Between 2009 and 2050, the world population is projected to grow by around 2.3 billion inhabitants, from 6.8 to 9.1 billion, while the urban population will increase from 3.4 billion in 2009 to 6.3 billion in 2050, increasing the stress on water availability (WWAP 2012).
Despite all the existing urban water infrastructures, many cities are currently facing water stress. In fact, a large proportion of the world population has been affected by water stress (Vörösmarty et al. 2000; Maddocks et al. 2015). Studies indicate that one quarter (25% ± 4%) of the population in large cities, or 381 ± 55 million people, has water supplies that are stressed (McDonald et al. 2014). This scenario represents a major challenge for providing a sufficient amount of water of adequate quality.
To minimize the effects of water stress, water supply system (WSS) management techniques have been used to maintain a balance between water supply and demand. This balance is achieved by applying operational actions, many of which require the application of statistical tools for water demand forecasting.
The importance of water demand forecasting is reflected in the growing interest among researchers and professionals. Over the past 20 years, the number of articles published on this topic has increased exponentially (Groppo et al. 2019), reflecting the need to develop efficient systems to manage water demand. The literature presents numerous models for urban water demand forecasting using statistical techniques such as linear regression (Campisi-Pinto et al. 2012; Santos & Pereira Filho 2014), nonlinear regression (Adamowski et al. 2012), time-series analysis (Caiado 2010; Huang et al. 2014; Al-Zahrani & Abo-Monasar 2015; Arandia et al. 2016; Rajballie et al. 2022), similarity-based approaches (Alvisi et al. 2007; Bakker et al. 2013; Tian et al. 2016; Gagliardi et al. 2017a; Pacchin et al. 2019), Markov chains (Gagliardi et al. 2017b; Pacchin et al. 2019) and techniques based on Soft Computing.
For water demand forecasting, several Soft Computing methods are presented and applied to historical water demand series. It is known that these series have stochastic and nonlinear components, making water demand forecasting a complex issue. In this context, Soft Computing methods such as Fuzzy Logic (Firat et al. 2009a; Ambrosio et al. 2019), Neural Computing (Firat et al. 2009b, 2010; Santos & Pereira Filho 2014; Al-Zahrani & Abo-Monasar 2015; Pacchin et al. 2019), Evolutionary Computation (Bai et al. 2014; Romano & Kapelan 2014; Leon et al. 2020; Shirkoohi et al. 2021), Support Vector Machines (Herrera et al. 2010; Brentan et al. 2016; Ambrosio et al. 2019), Random Forests (Chen et al. 2017; Ambrosio et al. 2019), Long Short-Term Memory (Boudhaouia & Wira 2021), Dual-Scale Deep Belief Network (Xu et al. 2018), Continuous Deep Belief Echo State Network (Xu et al. 2019b) and hybrid models (e.g., Nasseri et al. 2011; Adamowski et al. 2012; Campisi-Pinto et al. 2012; Odan & Reis 2012; Huang et al. 2014, 2021, 2022; Tiwari & Adamowski 2015; Guo et al. 2022; Rajballie et al. 2022) have provided more accurate results for urban water demand forecasting. In general, hybrid models are more robust for water demand forecasting compared with Feed Forward Neural Network (FFNN), Multiple Linear Regression (RLM), Multiple Nonlinear Regression (MNLR) and ARIMA models.
In general, the main factors that impact urban water demand are often difficult to identify using traditional algorithms. For instance, Xu et al. (2019a) applied the energy spectrum (Oshima & Kosuda 1998) and the largest Lyapunov exponent (Tsonis 1992) to examine the main characteristics of the water demand time-series. Their results indicate that water demand time-series can be represented as chaotic time-series. Soft Computing methods, without proper pre-processing, can become unstable and produce erroneous results if applied to water demand forecasting (Zhang & Qi 2005).
Simultaneously with the profusion of methods developed by the scientific community, the digitization process has been advancing. Consequently, data acquisition has been increasing, creating a 'big data' problem. De Mauro et al. (2014) claim that big data represents information assets characterized by high volume, velocity and variety, which require specific technology and analytical methods to be turned into value. Data values can be divided into three groups: values associated with the characteristics of the data set; values associated with the specific technology and analytical methods used to manipulate the data; and values associated with the insights, i.e., the knowledge extracted from the data. Therefore, the goal of big data analytics is knowledge discovery from massive data sets (Chen & Han 2016).
Due to the rapid increase in data volume, large amounts of storage are necessary, demanding greater bandwidth and incurring high latency in data processing (Caiza et al. 2020), which makes the Industrial Internet of Things (IIoT) a challenge for current infrastructure (Sabireen & Neelanarayanan 2021). A solution to mitigate this problem is 'cloud computing', which, according to Sabireen & Neelanarayanan (2021), constitutes a new component of Industry 4.0. Cloud computing addresses the big data problem by reducing the energy consumption of industrial sensor networks and improving security, processing and real-time data storage. In addition, algorithms with low computational complexity (CC) have been developed to overcome major limitations in processing large databases (Arnaiz-González et al. 2016; Baldán & Benítez 2019; Sarma et al. 2019; Baldán et al. 2021).
In this scenario, the present work evaluates the feasibility of using a novel analog-based methodology named dynamic time scan forecasting (DTSF) (Costa et al. 2021) to perform multi-step forecasting of univariate time-series, in order to predict short-term (hourly) demand in WSS. This is done by comparing several well-known univariate alternatives, covering statistical, machine learning and hybrid approaches, in terms of accuracy and computational cost. As in Costa et al. (2021), this study employed the Naive Bayes (naive 1) method, the Pattern Sequence-based Forecasting (PSF) analog approach, the Box–Jenkins time-series approach (SARIMA), the Trigonometric Box–Cox transformation, ARMA errors, Trend and Seasonal components model (TBATS), the hybrid model combining Seasonal and Trend decomposition using the Loess filter (STL) with the exponential smoothing method (ETS), denoted STL + ETS, the Hybrid.2 approach, and the NNET.2 approach – an automatic method that employs extreme learning machines (ELM). Additionally, we used the hybrid autoregressive neural network approach with bootstrap (BNNAR), the THETA model and ETS, which, according to Makridakis et al. (2018), who compared numerous statistical and machine learning methods for one-step-ahead prediction, achieved the best accuracy among all evaluated methods.
MATERIALS AND METHODS
Study area
The water demand time-series data were monitored using a 'Data Logger' device, equipment with nonvolatile memory that acquires data from a wide variety of sensors. The equipment was configured to register the flow rate (L/s) of the water exiting the water treatment plant (WTP-A) at 5 min intervals, and on the macrometer that connects WTP-B to the study area at 15 min intervals. In the analyzed period, missing data comprise days with no data collection due to failures in the 'Data Logger' device. Missing values were replaced using the Naive Bayes imputation method, chosen for its simplicity.
Prior to the forecasting analysis, the data were aggregated into 1 h intervals using the mean function. The final data set showed a mean water flow rate of 533.63 L/s. The forecast horizon was set at 24 h, i.e., 24 steps ahead.
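The pre-processing pipeline (gap filling followed by hourly aggregation with the mean) can be sketched as follows. The series here is synthetic and the use of pandas with forward-filling is an illustrative assumption; the study itself was carried out in R and used the Naive Bayes imputation method:

```python
import numpy as np
import pandas as pd

# Synthetic 5-min flow readings with a gap, standing in for the data-logger output
idx = pd.date_range("2016-01-01", periods=24, freq="5min")
flow = pd.Series(np.linspace(500.0, 560.0, 24), index=idx)
flow.iloc[10:14] = np.nan  # simulated data-logger failure

# Simple imputation sketch: carry the last observed value forward
flow = flow.ffill()

# Aggregate to 1 h intervals using the mean, as done before forecasting
hourly = flow.resample("1h").mean()
```

After aggregation, `hourly` contains one mean flow value per hour and no missing entries, which is the form the forecasting models receive.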
Table 1 presents the descriptive measures of the historical demand series used in this study. The median flow rate for the period analyzed was 522.81 L/s, with a standard deviation of 123.16 L/s and a coefficient of variation of 23.08%.
Descriptive measures of the historical demand series
Statistic | Value
---|---
Mean | 533.63 L/s
Median | 522.81 L/s
Standard deviation | 123.16 L/s
Minimum | 116.57 L/s
Maximum | 947.18 L/s
First quartile | 438.48 L/s
Third quartile | 629.64 L/s
Coefficient of variation | 23.08%
DTSF method
The DTSF method is based on scan statistics (Glaz et al. 2009), which comprise a class of statistical methods aiming to estimate anomalous behavior in databases with temporal, spatial or spatio-temporal components. This method was originally presented by Joseph Naus (Naus 1965) and adapted for epidemiological surveillance systems using spatial (Kulldorff 1997), temporal and spatio-temporal data (Kulldorff et al. 1998; Kulldorff 2001). Briefly, a scanning window with a pre-fixed geometry scans the data, alternating its position and dimension. A test statistic is calculated for each window configuration. The configuration yielding the highest value for the test statistic represents a potential candidate for anomalous behavior. Statistical inference is obtained using Monte Carlo simulations (Mooney 1997), with the null hypothesis that the data set does not present anomalous data. Detailed information about scan statistics is found in Glaz et al. (2009).
The DTSF method scans a time-series using a fixed window. The objective is to find historical windows in which the data patterns are similar to the most recent values in the time-series. A test statistic, or similarity statistic, is calculated for each window. In addition, a similarity function is estimated for each window. The similarity function aims to define an equation for mapping historical data in the scanning window to the most recent data in the time-series. Once the most similar historical windows are detected, the respective similarity functions are applied to the subsequent values of the selected windows, thus generating future forecasts of the time-series.
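As an illustration of the similarity statistic, the sketch below fits a linear similarity function mapping one hypothetical historical window onto the most recent 24 values and scores it with R². The data are synthetic and the details are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Most recent 24 hourly values (a clean synthetic daily cycle)
recent = 500 + 100 * np.sin(2 * np.pi * np.arange(24) / 24)

# A hypothetical historical window: similar shape, different scale, plus noise
rng = np.random.default_rng(0)
window = 0.9 * recent + 30 + rng.normal(0, 5, 24)

# Similarity function: linear model mapping the window to the recent values
slope, intercept = np.polyfit(window, recent, 1)
fitted = intercept + slope * window

# Similarity statistic: R^2 of the fit (closer to 1 = more similar)
r2 = 1 - np.sum((recent - fitted) ** 2) / np.sum((recent - recent.mean()) ** 2)
```

A window with the same shape as the recent pattern yields an R² close to 1 even when its level or scale differs, since the linear similarity function absorbs those differences.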
Example of the application of the DTSF method using an hourly water demand time-series. The length of the scanning window is 24 h. A linear model (the similarity function) is estimated for each window. The past seven windows with higher similarity (according to their values) are indicated in rectangles.
Forecasting estimates using similarity functions of the most similar scanning windows, according to the DTSF methodology.
The final predictions are generated using an aggregation function, such as the average or the median. The DTSF method requires three parameters: the length of the scanning window, w; the degree of the polynomial chosen as the similarity function, j; and the number of best matches, m (the number of most similar historical windows retained). To guarantee the high computational speed of the method, low-order polynomial similarity functions (linear, quadratic or cubic), which are linear in their parameters, are preferable. The number of best matches can be selected dynamically, for example by retaining those windows whose similarity statistics exceed a previously defined threshold. In the present study, the number of most similar historical windows was fixed in advance: different values for the length of the scanning window and for the number of best matches were evaluated, and the combination with the minimum forecasting error was selected.
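Putting the pieces together, a minimal sketch of the DTSF procedure (scan, score each window by R², apply the m best similarity functions to the values following their windows, aggregate with the median) might look as follows. The parameter names w and m follow the text, h denotes the forecast horizon, and the implementation details are assumptions rather than the DTScanF package itself:

```python
import numpy as np

def dtsf_forecast(series, w=24, m=7, h=24):
    """Sketch of DTSF: find the m historical windows of length w most
    similar (by R^2 of a linear fit) to the latest w values, map each
    window's h subsequent values through its similarity function, and
    return the median of the candidate forecasts."""
    y = np.asarray(series, dtype=float)
    recent = y[-w:]
    scores, candidates = [], []
    for start in range(len(y) - w - h + 1):
        window = y[start:start + w]
        slope, intercept = np.polyfit(window, recent, 1)  # similarity function
        fitted = intercept + slope * window
        r2 = 1 - np.sum((recent - fitted) ** 2) / np.sum((recent - recent.mean()) ** 2)
        scores.append(r2)
        # forecast candidate: the values following the window, mapped
        candidates.append(intercept + slope * y[start + w:start + w + h])
    best = np.argsort(scores)[-m:]          # m best matches
    return np.median(np.asarray(candidates)[best], axis=0)

# Synthetic hourly demand with a clean 24 h cycle
t = np.arange(24 * 14)
demand = 500 + 100 * np.sin(2 * np.pi * t / 24)
forecast = dtsf_forecast(demand, w=24, m=7, h=24)
```

On a perfectly periodic series like this one, the best-matching windows reproduce the next cycle almost exactly; real demand series add noise, which the median aggregation helps damp.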
An important requirement for the application of the DTSF method is the availability of a long time-series, since DTSF is a data-oriented method.
Statistical and machine learning methods used
The main Soft Computing tools used in this work are implemented in the R software. The Nonlinear Autoregressive Neural Network with Bootstrap, Autoregressive Integrated Moving Average, Trigonometric seasonality, Box–Cox transformation, ARMA errors, Trend and Seasonal components, Exponential Smoothing and Theta methods were applied using the forecast package (Hyndman & Khandakar 2008; Hyndman et al. 2017). It is worth mentioning that the Theta model, developed by Assimakopoulos & Nikolopoulos (2000), was the winner of the M3 Competition and was used as one of the benchmark methods in the M4 Competition (Makridakis et al. 2020). The hybrid model using the STL filter with exponential smoothing (STL + ETS), and Hybrid.2 (an ensemble of seven different time-series forecasts – arima, tbats, ets, theta, stlm, snaive and nnetar – which are individually fitted and combined into an ensemble mean forecast), were applied using the hybridModel function from the forecastHybrid package (Shaub & Ellis 2019). This package makes it possible to combine several models with equal weights, or with weights based on in-sample errors, and builds on the study by Bates & Granger (1969). The extreme learning machine (ELM) model was implemented using the nnfor package (Kourentzes 2017), which is based on studies by Crone & Kourentzes (2010) and Kourentzes et al. (2014). According to Lendasse et al. (2010), this method was ranked second-best in ESTSP'08 – the 2nd European Symposium on Time Series Prediction. The PSF algorithm (Martinez Alvarez et al. 2011) was implemented using the PSF package (Bokde et al. 2017), and the DTSF method using the DTScanF package (Costa & Mineti 2019).
Cross-validation
Cross-validation of time-series forecasts using rolling windows. Light gray blocks represent training sets and dark gray blocks represent test sets (adapted from Bergmeir & Benítez 2012).
According to Tashman (2000), rolling window cross-validation is the preferred procedure to evaluate forecasting performances.
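A rolling-window scheme of this kind can be sketched as below; the window sizes are illustrative assumptions (the exact configuration used in the study is not reproduced here):

```python
import numpy as np

def rolling_origin_splits(n, train_size, horizon, step):
    """Yield (train_indices, test_indices) pairs for rolling-window
    cross-validation: the training window slides forward by `step`
    and the test set is always the `horizon` points that follow it."""
    start = 0
    while start + train_size + horizon <= n:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + horizon)
        yield train, test
        start += step

# e.g. two years of hourly data, 30-day training windows, 24 h ahead,
# advancing the origin one day at a time (illustrative values)
splits = list(rolling_origin_splits(n=17520, train_size=720, horizon=24, step=24))
```

Each split evaluates a 24 h forecast from a fresh origin, so every model is scored on many out-of-sample days rather than a single hold-out period.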
CC, statistical methods and machine learning
The metrics used to evaluate the accuracy of the candidate models are: the mean square error (MSE), the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). Model Fitting (MF) was also used as a measure of how well the historical data fit the prediction model; it represents the MSE of the model predictions, normalized by the average of the time-series under study.
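These metrics can be written compactly as follows; note that MF is implemented here literally as described in the text (MSE divided by the series mean), which is an interpretation of that description rather than the authors' code, and MAPE is returned as a fraction, matching the tables:

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Accuracy metrics used to compare the candidate models."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / actual)),  # fraction, e.g. 0.198 = 19.8%
        "MF": mse / np.mean(actual),            # MSE normalized by series mean
    }

metrics = forecast_metrics([100, 200], [110, 190])
```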
Computational time was estimated using a computer with a 7th Generation Intel® Core™ i7-7500U processor @ 3.5 GHz, 16 GB DDR4 memory and an x64-based architecture.
RESULTS AND DISCUSSION
Records of monthly average flow rate throughout the two-year sample.
Boxplots of the forecast errors for the subsequent 24 h of the evaluated models.
Evolution of the median of forecast errors, for the next 24 h, of each of the evaluated models.
Table 2 shows the results of the predictions for the evaluated models. From January to May, and again in December, no single model was more accurate than the others. From June to November, the ELM method showed the best results. Regarding CC, the DTSF algorithm was extremely fast compared with the other methods.
Evaluation of the forecasting models for an hourly forecast horizon using the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and computational complexity (CC)
Forecast month | NB RMSE | NB MAE | NB MAPE | NB CC | DTSF RMSE | DTSF MAE | DTSF MAPE | DTSF CC | ELM RMSE | ELM MAE | ELM MAPE | ELM CC
---|---|---|---|---|---|---|---|---|---|---|---|---
Jan 16 | 209.745 | 156.491 | 0.379 | 1.000 | 182.011 | 139.618 | 0.323 | 2.230 | 145.077 | 111.832 | 0.281 | 15,433.794 |
Feb 16 | 183.982 | 145.982 | 0.301 | 1.000 | 147.250 | 112.660 | 0.229 | 1.036 | 134.271 | 107.674 | 0.210 | 390.773 |
Mar 16 | 166.761 | 121.163 | 0.233 | 1.000 | 122.836 | 90.394 | 0.177 | 0.936 | 123.607 | 101.783 | 0.199 | 735.929 |
Apr 16 | 141.742 | 102.277 | 0.185 | 1.000 | 123.432 | 95.721 | 0.180 | 1.873 | 116.152 | 90.609 | 0.159 | 742.174 |
May 16 | 143.721 | 104.316 | 0.191 | 1.000 | 119.758 | 88.201 | 0.168 | 2.900 | 95.900 | 80.430 | 0.152 | 1,582.850 |
Jun 16 | 149.416 | 107.567 | 0.208 | 1.000 | 117.162 | 85.972 | 0.170 | 1.978 | 98.230 | 79.438 | 0.157 | 1,343.134 |
Jul 16 | 146.528 | 106.845 | 0.216 | 1.000 | 128.379 | 97.254 | 0.198 | 2.999 | 101.832 | 84.061 | 0.165 | 3,202.723 |
Aug 16 | 123.719 | 81.809 | 0.154 | 1.000 | 109.401 | 77.338 | 0.150 | 2.118 | 90.620 | 72.558 | 0.137 | 2,460.946 |
Sep 16 | 133.974 | 93.607 | 0.180 | 1.000 | 118.224 | 90.596 | 0.172 | 2.148 | 95.081 | 74.738 | 0.146 | 4,689.409 |
Oct 16 | 148.850 | 113.865 | 0.234 | 1.000 | 123.264 | 93.284 | 0.195 | 2.276 | 107.985 | 88.033 | 0.183 | 3,632.866 |
Nov 16 | 171.804 | 139.047 | 0.290 | 1.000 | 141.991 | 108.943 | 0.232 | 1.858 | 115.048 | 91.792 | 0.203 | 3,249.069 |
Dec 16 | 152.384 | 115.530 | 0.267 | 1.000 | 133.700 | 96.193 | 0.224 | 2.311 | 103.291 | 82.606 | 0.196 | 4,362.896 |
Forecast month | BNNAR RMSE | BNNAR MAE | BNNAR MAPE | BNNAR CC | SARIMA RMSE | SARIMA MAE | SARIMA MAPE | SARIMA CC | TBATS RMSE | TBATS MAE | TBATS MAPE | TBATS CC
---|---|---|---|---|---|---|---|---|---|---|---|---
Jan 16 | 151.753 | 112.491 | 0.285 | 2,734.733 | 214.723 | 162.299 | 0.381 | 1,215.395 | 174.200 | 129.920 | 0.298 | 1,395.868 |
Feb 16 | 135.642 | 105.629 | 0.214 | 2,215.384 | 173.828 | 130.608 | 0.266 | 75.517 | 131.353 | 102.755 | 0.212 | 1,247.589 |
Mar 16 | 115.860 | 95.741 | 0.186 | 2,641.903 | 155.864 | 115.328 | 0.231 | 83.812 | 118.280 | 94.931 | 0.190 | 1,837.428 |
Apr 16 | 103.786 | 78.048 | 0.138 | 3,220.659 | 149.156 | 112.378 | 0.217 | 97.013 | 132.075 | 105.810 | 0.205 | 1,751.243 |
May 16 | 101.266 | 79.276 | 0.144 | 6,005.786 | 131.096 | 98.502 | 0.193 | 511.578 | 109.265 | 86.394 | 0.170 | 2,797.262 |
Jun 16 | 104.291 | 84.596 | 0.164 | 4,219.843 | 134.915 | 98.710 | 0.202 | 117.171 | 121.736 | 92.953 | 0.193 | 2,021.754 |
Jul 16 | 107.118 | 90.339 | 0.181 | 6,865.314 | 135.242 | 98.178 | 0.208 | 182.392 | 115.083 | 87.799 | 0.186 | 3,333.398 |
Aug 16 | 97.933 | 81.933 | 0.154 | 5,238.267 | 126.745 | 92.346 | 0.189 | 128.817 | 106.305 | 79.481 | 0.164 | 2,058.823 |
Sep 16 | 104.829 | 88.543 | 0.169 | 6,203.078 | 128.884 | 95.397 | 0.187 | 144.644 | 121.547 | 91.489 | 0.179 | 2,412.095 |
Oct 16 | 108.647 | 92.146 | 0.186 | 7,199.256 | 144.261 | 106.244 | 0.225 | 213.127 | 123.633 | 95.830 | 0.196 | 2,652.736 |
Nov 16 | 121.898 | 97.024 | 0.213 | 6,612.746 | 152.601 | 113.928 | 0.250 | 477.352 | 132.453 | 98.412 | 0.210 | 2,185.039 |
Dec 16 | 113.015 | 82.346 | 0.202 | 9,574.767 | 146.398 | 107.750 | 0.250 | 1,103.022 | 123.433 | 92.259 | 0.208 | 3,094.292 |
Forecast month | ETS RMSE | ETS MAE | ETS MAPE | ETS CC | STL-ETS RMSE | STL-ETS MAE | STL-ETS MAPE | STL-ETS CC | HYBRID RMSE | HYBRID MAE | HYBRID MAPE | HYBRID CC
---|---|---|---|---|---|---|---|---|---|---|---|---
Jan 16 | 213.707 | 162.322 | 0.378 | 52.817 | 216.071 | 166.601 | 0.385 | 104.382 | 176.488 | 134.700 | 0.320 | 1,655.709 |
Feb 16 | 165.098 | 124.729 | 0.254 | 51.728 | 167.871 | 127.170 | 0.268 | 125.303 | 146.842 | 112.945 | 0.228 | 1,460.979 |
Mar 16 | 148.331 | 110.633 | 0.225 | 59.732 | 143.909 | 111.058 | 0.222 | 114.901 | 125.762 | 99.579 | 0.198 | 1,640.583 |
Apr 16 | 149.849 | 117.553 | 0.222 | 70.576 | 147.702 | 117.951 | 0.223 | 99.036 | 119.430 | 95.936 | 0.182 | 1,874.222 |
May 16 | 133.367 | 108.852 | 0.208 | 149.174 | 117.948 | 89.126 | 0.171 | 163.454 | 112.123 | 89.137 | 0.172 | 3,266.722 |
Jun 16 | 152.310 | 120.529 | 0.232 | 102.690 | 126.296 | 95.176 | 0.192 | 107.247 | 114.996 | 91.713 | 0.186 | 2,158.089 |
Jul 16 | 131.575 | 98.838 | 0.204 | 143.758 | 121.445 | 93.172 | 0.193 | 178.819 | 115.819 | 91.565 | 0.192 | 3,481.806 |
Aug 16 | 112.965 | 82.736 | 0.165 | 117.989 | 104.619 | 75.636 | 0.153 | 115.801 | 109.975 | 88.549 | 0.176 | 2,551.959 |
Sep 16 | 126.392 | 94.812 | 0.181 | 127.450 | 116.838 | 88.206 | 0.171 | 179.993 | 115.912 | 92.689 | 0.182 | 2,704.730 |
Oct 16 | 142.584 | 105.532 | 0.214 | 149.554 | 138.741 | 102.991 | 0.215 | 171.218 | 98.067 | 98.067 | 0.207 | 3,066.671 |
Nov 16 | 147.768 | 113.436 | 0.242 | 112.067 | 148.777 | 110.931 | 0.247 | 139.069 | 135.237 | 103.686 | 0.236 | 2,451.597 |
Dec 16 | 146.032 | 110.034 | 0.253 | 383.011 | 141.399 | 103.414 | 0.237 | 187.751 | 124.390 | 91.905 | 0.218 | 3,203.210 |
Forecast month | THETA RMSE | THETA MAE | THETA MAPE | THETA CC | PSF RMSE | PSF MAE | PSF MAPE | PSF CC
---|---|---|---|---|---|---|---|---
Jan 16 | 211.083 | 159.680 | 0.370 | 1.898 | 179.452 | 139.340 | 0.343 | 826.263
Feb 16 | 165.127 | 125.294 | 0.253 | 2.117 | 161.338 | 124.413 | 0.244 | 738.665
Mar 16 | 146.005 | 108.254 | 0.219 | 2.099 | 138.917 | 106.329 | 0.214 | 827.778
Apr 16 | 149.722 | 113.971 | 0.221 | 2.205 | 127.581 | 104.753 | 0.195 | 971.017
May 16 | 132.070 | 96.776 | 0.192 | 3.725 | 117.770 | 89.641 | 0.173 | 1,660.057
Jun 16 | 131.349 | 95.584 | 0.197 | 2.838 | 128.741 | 94.023 | 0.192 | 1,174.000
Jul 16 | 133.194 | 96.189 | 0.204 | 4.570 | 130.295 | 96.451 | 0.207 | 1,839.220
Aug 16 | 118.684 | 84.174 | 0.173 | 2.729 | 123.583 | 88.572 | 0.185 | 1,304.881
Sep 16 | 123.626 | 90.155 | 0.174 | 3.424 | 133.609 | 99.580 | 0.199 | 1,467.738
Oct 16 | 135.324 | 98.766 | 0.206 | 3.575 | 142.547 | 108.125 | 0.232 | 1,690.579
Nov 16 | 143.930 | 107.266 | 0.237 | 2.899 | 155.052 | 115.282 | 0.261 | 1,338.822
Dec 16 | 143.383 | 105.410 | 0.241 | 3.573 | 141.751 | 106.136 | 0.255 | 1,797.798
The DTSF model was superior to SARIMA and ETS in all metrics evaluated (RMSE, MAE and MAPE) for all months of 2016. The STL-ETS hybrid model was more accurate based on the RMSE metric for May, July, August and September. In relation to the MAPE metric, the STL-ETS hybrid model was slightly better for the months of July and September, with values of 19.3% and 17.1%, respectively; for the same period, the DTSF model reached 19.8% and 17.2%. When compared with the PSF analog, the DTSF model obtained the best MAPE results for all months of 2016. Regarding the RMSE metric, TBATS was more accurate than DTSF in January, February, March, May, July, August, November and December. However, in relation to the MAPE metric, DTSF was more accurate from March to June and from August to October. Compared with the THETA method, DTSF obtained the best results for all months of the year using the RMSE and MAPE metrics. Regarding the MAE metric, THETA was more accurate from July to September and in November.
Next, a comparative analysis of the DTSF, ELM and BNNAR methods is presented. ELM proved to be more accurate, not only in relation to DTSF but also to BNNAR, for the entire second semester of 2016 for the RMSE and MAPE metrics. BNNAR presented superior results in relation to ELM and DTSF in April and May for the MAPE metric and superior results in April, May and December for the MAE metric. With respect to this metric, ELM was superior to DTSF and BNNAR in the first quarter of the year.
Forecast performance using Model Fitting metric in 2016
Forecast month | NB | DTSF | ELM | BNNAR | SARIMA | ETS | TBATS | THETA | STL-ETS | HYBRID | PSF
---|---|---|---|---|---|---|---|---|---|---|---
Jan 16 | 0.3382 | 0.2102 | 0.1782 | 0.1996 | 0.3162 | 0.2989 | 0.1979 | 0.2895 | 0.3067 | 0.2211 | 0.2493 |
Feb 16 | 0.1786 | 0.1093 | 0.0842 | 0.0947 | 0.1526 | 0.1387 | 0.0999 | 0.1341 | 0.1599 | 0.1085 | 0.1169 |
Mar 16 | 0.1092 | 0.0589 | 0.0595 | 0.0486 | 0.1035 | 0.0968 | 0.0584 | 0.0942 | 0.0857 | 0.0658 | 0.0819 |
Apr 16 | 0.0678 | 0.0572 | 0.0411 | 0.0342 | 0.0905 | 0.0841 | 0.0720 | 0.0920 | 0.0818 | 0.0574 | 0.0616 |
May 16 | 0.0703 | 0.0542 | 0.0341 | 0.0329 | 0.0705 | 0.0638 | 0.0496 | 0.0734 | 0.0539 | 0.0502 | 0.0564 |
Jun 16 | 0.0861 | 0.0584 | 0.0429 | 0.0433 | 0.0894 | 0.0839 | 0.0774 | 0.0855 | 0.0746 | 0.0659 | 0.0806 |
Jul 16 | 0.0996 | 0.0749 | 0.0430 | 0.0511 | 0.0875 | 0.0772 | 0.0673 | 0.0858 | 0.0687 | 0.0671 | 0.0861 |
Aug 16 | 0.0584 | 0.0498 | 0.0322 | 0.0368 | 0.0749 | 0.0552 | 0.0565 | 0.0658 | 0.0503 | 0.0545 | 0.0751 |
Sep 16 | 0.0723 | 0.0542 | 0.0414 | 0.0438 | 0.0672 | 0.0605 | 0.0616 | 0.0589 | 0.0547 | 0.0566 | 0.0777 |
Oct 16 | 0.1093 | 0.0833 | 0.0661 | 0.0612 | 0.1196 | 0.0942 | 0.0788 | 0.0980 | 0.1026 | 0.0926 | 0.1269 |
Nov 16 | 0.1519 | 0.1208 | 0.0923 | 0.1017 | 0.1398 | 0.1189 | 0.0993 | 0.1263 | 0.1356 | 0.1348 | 0.1730 |
Dec 16 | 0.1626 | 0.1341 | 0.0895 | 0.1207 | 0.1663 | 0.1565 | 0.1025 | 0.1490 | 0.1423 | 0.1326 | 0.1774 |
Forecasting accuracy of the models using Model Fitting metric from 1 January 2016 to 31 December 2016.
Figure 11 shows the accuracy of the models over the study period using the MF metric. The worst performance for all methods occurred in the summer (January, February and December) and spring (September to November) months, while the best performance occurred in the autumn (March to May) and winter (June to August) months, when water demand is lower. The two best models, ELM and BNNAR, behaved similarly. DTSF and TBATS alternated as the more accurate of the two throughout the studied period.
Table 4 presents the evaluation of the forecasting models for the different seasons of 2016 using the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and MF. During the summer, the period of highest water consumption, no single model achieved the best forecasting performance across all metrics. Nonetheless, ELM achieved the best results during the winter months.
Evaluation of the forecasting models for the different seasons in 2016 using the root mean square error, mean absolute error, mean absolute percentage error and model fitting
Forecast month | Seasons | RMSE | MAE | MAPE | MF |
---|---|---|---|---|---|
Jan 16 | Summer | ELM | ELM | ELM | ELM |
Feb 16 | Summer | TBATS | TBATS | ELM | ELM |
Mar 16 | Summer–Autumn | BNNAR | DTSF | DTSF | BNNAR |
Apr 16 | Autumn | BNNAR | BNNAR | BNNAR | BNNAR |
May 16 | Autumn | ELM | BNNAR | BNNAR | BNNAR |
Jun 16 | Autumn–Winter | ELM | ELM | ELM | ELM |
Jul 16 | Winter | ELM | ELM | ELM | ELM |
Aug 16 | Winter | ELM | ELM | ELM | ELM |
Sep 16 | Winter–Spring | ELM | ELM | ELM | ELM |
Oct 16 | Spring | ELM | ELM | ELM | BNNAR |
Nov 16 | Spring | ELM | ELM | ELM | ELM |
Dec 16 | Spring–Summer | ELM | BNNAR | ELM | ELM |
Table 5 shows the computational overhead (log10) of each method used in the present study.
Computational overhead for each of the models throughout 2016 (log 10)
Forecast month | DTSF | ELM | BNNAR | SARIMA | ETS | TBATS | THETA | STL-ETS | HYBRID | PSF |
---|---|---|---|---|---|---|---|---|---|---|
Jan 16 | 0.348 | 4.188 | 3.437 | 3.085 | 1.723 | 3.145 | 0.278 | 2.019 | 3.219 | 2.917 |
Feb 16 | 0.016 | 2.592 | 3.345 | 1.878 | 1.714 | 3.096 | 0.326 | 2.098 | 3.165 | 2.868 |
Mar 16 | − 0.029 | 2.867 | 3.422 | 1.923 | 1.776 | 3.264 | 0.322 | 2.060 | 3.215 | 2.918 |
Apr 16 | 0.273 | 2.871 | 3.508 | 1.987 | 1.849 | 3.243 | 0.343 | 1.996 | 3.273 | 2.987 |
May 16 | 0.462 | 3.199 | 3.779 | 2.709 | 2.174 | 3.447 | 0.571 | 2.213 | 3.514 | 3.220 |
Jun 16 | 0.296 | 3.128 | 3.625 | 2.069 | 2.012 | 3.306 | 0.453 | 2.030 | 3.334 | 3.070 |
Jul 16 | 0.477 | 3.506 | 3.837 | 2.261 | 2.158 | 3.523 | 0.660 | 2.252 | 3.542 | 3.265 |
Aug 16 | 0.326 | 3.391 | 3.719 | 2.110 | 2.072 | 3.314 | 0.436 | 2.064 | 3.407 | 3.116 |
Sep 16 | 0.332 | 3.671 | 3.793 | 2.160 | 2.105 | 3.382 | 0.534 | 2.255 | 3.432 | 3.167 |
Oct 16 | 0.357 | 3.560 | 3.857 | 2.329 | 2.175 | 3.424 | 0.553 | 2.234 | 3.487 | 3.228 |
Nov 16 | 0.269 | 3.512 | 3.820 | 2.679 | 2.049 | 3.339 | 0.462 | 2.143 | 3.389 | 3.127 |
Dec 16 | 0.364 | 3.640 | 3.981 | 3.043 | 2.583 | 3.491 | 0.553 | 2.274 | 3.506 | 3.255 |
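The overhead values in Table 5 are runtimes on a log10 scale. A minimal sketch of how such a measurement can be taken (the workload and the timing protocol here are illustrative assumptions, not the authors' exact procedure):

```python
import math
import time

def log10_overhead(fit_and_forecast):
    """Wall-clock time of one fit-and-forecast run on a log10 scale.

    The time unit (seconds) is an assumption for illustration.
    """
    start = time.perf_counter()
    fit_and_forecast()
    elapsed = time.perf_counter() - start
    return math.log10(elapsed)

# Hypothetical stand-in for a model's fit-and-forecast step
overhead = log10_overhead(lambda: sum(i * i for i in range(100_000)))
```

On this scale, a one-unit difference between two methods corresponds to a tenfold difference in runtime.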
DTSF presented the lowest computational overhead, ranging between −0.029 and 0.477. THETA had the second-lowest overhead, with values from 0.278 to 0.660, followed by ETS, with values from 1.714 to 2.583. Both THETA and ETS are used as benchmarks in the M4 Competition. It is worth mentioning that the THETA method relies on a linear decomposition of the time-series into two components: a linear regression model is fitted to the first component and an exponential smoothing model to the second. The decomposition and both model fits are extremely fast procedures. On the other hand, ELM, BNNAR, HYBRID, TBATS and PSF showed marked increases in computational overhead as new data were incorporated into the forecasts (recalibration of the rolling time window) (Tashman 2000). The computational overhead of these five methods ranged from 2.592 to 4.188, 3.345 to 3.981, 3.165 to 3.542, 3.096 to 3.523 and 2.868 to 3.265, respectively. In December 2016, BNNAR, ELM, HYBRID and PSF showed overheads of 3.981, 3.640, 3.506 and 3.255, respectively, against 0.364 for DTSF.
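The Theta decomposition described above can be sketched as follows. This is an illustrative implementation of the classical two-theta-line variant; the fixed smoothing constant `alpha` is an assumption, whereas real implementations estimate it from the data.

```python
import numpy as np

def theta_forecast(y, h, alpha=0.5):
    """Illustrative sketch of the classical Theta method.

    The theta=0 line is the linear regression trend of the series; the
    theta=2 line (2*y minus the trend) doubles the local curvature and
    is extrapolated with simple exponential smoothing (SES). The final
    forecast averages the two extrapolations.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n)
    slope, intercept = np.polyfit(t, y, 1)      # theta = 0: OLS trend
    trend_fc = intercept + slope * np.arange(n, n + h)
    theta2 = 2.0 * y - (intercept + slope * t)  # theta = 2 line
    level = theta2[0]                           # SES: flat forecast at the last level
    for v in theta2[1:]:
        level = alpha * v + (1 - alpha) * level
    return (trend_fc + level) / 2.0
```

Each step is a single linear pass over the series, which is why the method's overhead stays close to that of plain exponential smoothing.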
Number and percentage of best forecasts for each model for the next 24 h in 2016
Model | Number of best predictions | Percentage of best predictions |
---|---|---|
NB | 77 | 21.45 |
DTSF | 62 | 17.27 |
ELM | 59 | 16.43 |
BNNAR | 72 | 20.06 |
ETS | 5 | 1.39 |
SARIMA | 11 | 3.06 |
TBATS | 17 | 4.74 |
STL-ETS | 13 | 3.62 |
PSF | 18 | 5.01 |
HYBRID | 4 | 1.11 |
THETA | 21 | 5.85 |
Evolution of the median of the RMSE, for the next 24 h, after applying a seven-day moving average for all evaluated models.
After applying the seven-day moving average to the median of the forecast errors, the results shown in Table 6 were obtained. They indicate that, over a one-week horizon with hourly forecasts, the forecast error of the different models varies greatly; therefore, there is no single best model for predicting water demand on every day of the year. The best water demand forecasting model in 2016 was NB, with 21.45% of the best forecasts after applying the weekly moving average, followed by the hybrid neural model BNNAR with 20.06%, the DTSF method with 17.27%, ELM with 16.43%, THETA with 5.85%, PSF with 5.01% and the STL-ETS hybrid model with 3.62%. The statistical methods TBATS, SARIMA and ETS obtained 4.74%, 3.06% and 1.39% of the best predictions, respectively. The HYBRID model showed the worst performance, with 1.11%. It is worth noting that the more complex methods were not necessarily more accurate than simpler ones (Makridakis et al. 2018). Dalessandro (2013) shows that computationally cheaper algorithms, even when less accurate in intermediate stages, can define models that perform equally well in big data forecasting. Such algorithms may require more iterations than computationally expensive ones, but each iteration is much faster; consequently, cheaper algorithms tend to converge much faster while reaching the same accuracy.
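The smoothing step above, a trailing seven-day moving average over the daily median of the 24 hourly forecast errors, can be sketched as follows (the array shapes and random data are hypothetical):

```python
import numpy as np

def daily_median_rmse(hourly_rmse):
    """Median of the 24 hourly RMSE values for each day.

    hourly_rmse: array of shape (n_days, 24), one RMSE per forecast hour.
    """
    return np.median(hourly_rmse, axis=1)

def moving_average(series, window=7):
    """Trailing moving average; the first window-1 days are dropped."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# Hypothetical data: 10 days of hourly forecast errors
rng = np.random.default_rng(0)
hourly_rmse = rng.uniform(0.5, 2.0, size=(10, 24))
smoothed = moving_average(daily_median_rmse(hourly_rmse), window=7)
print(smoothed.shape)  # (4,)
```

The median damps occasional hourly outliers before the weekly average smooths day-to-day variation, so model rankings reflect sustained rather than one-off differences.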
Table 7 shows the monthly frequency of the methods with the best daily forecast performance during 2016. As previously mentioned, NB, DTSF, ELM and BNNAR most frequently achieved the best forecast performance. A daily analysis of the best forecast methods is provided in the Supplementary Material.
List by month of the best forecast methods evaluated daily during 2016
Forecast month | Seasons | NB | DTSF | ELM | BNNAR | ETS | SARIMA | TBATS | STL-ETS | PSF | HYBRID | THETA |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Jan 16 | Summer | 4 | 1 | 18 | 1 | |||||||
Feb 16 | Summer | 1 | 5 | 11 | 4 | 5 | 3 | |||||
Mar 16 | Summer–Autumn | 3 | 12 | 3 | 1 | 4 | 4 | 4 | ||||
Apr 16 | Autumn | 6 | 2 | 1 | 18 | 3 | ||||||
May 16 | Autumn | 7 | 4 | 9 | 3 | 2 | 1 | 5 | ||||
Jun 16 | Autumn–Winter | 8 | 6 | 6 | 3 | 1 | 6 | |||||
Jul 16 | Winter | 5 | 1 | 10 | 6 | 1 | 2 | 2 | 1 | 1 | 2 | |
Aug 16 | Winter | 17 | 3 | 2 | 2 | 7 | ||||||
Sep 16 | Winter–Spring | 21 | 4 | 2 | 3 | |||||||
Oct 16 | Spring | 5 | 15 | 4 | 3 | 1 | 1 | 2 | ||||
Nov 16 | Spring | 7 | 10 | 4 | 2 | 4 | 2 | 1 | ||||
Dec 16 | Spring–Summer | 7 | 3 | 16 | 3 | 1 | 1 |
The method with the best performance in each month is indicated in bold type.
CONCLUSIONS
Based on the literature and on the forecast results obtained with different univariate methodologies, it is possible to conclude that algorithms with high computational overhead are often not applicable in large-scale, real-world situations. In this context, DTSF is extremely fast compared with the remaining methods and provides reasonably low prediction errors. The great advantage of this data-oriented method is that, given a large amount of data, its performance generally improves. Thus, the use and evaluation of DTSF for water demand forecasting is an important and novel contribution of this study.
Furthermore, scanning and scaling previous observations using similarity functions compensates for changes in the mean and in the scale throughout the time-series. Thus, observations under the influence of pipe failures and leakages can be used for forecasting purposes, as long as the water demand pattern under these circumstances is strongly correlated with the most recent observations.
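The scan-and-scale idea can be illustrated with a minimal sketch. The window length, the number of matches `k`, and the use of an R² similarity from a linear fit are assumptions for illustration, not the authors' exact DTSF implementation.

```python
import numpy as np

def dtsf_sketch(y, window=24, h=24, k=5):
    """Minimal sketch of the scan-and-scale idea behind DTSF.

    The last `window` observations form the query. Every earlier window
    of the same length is linearly mapped onto the query; the R^2 of
    that fit acts as the similarity function and also compensates for
    changes in mean and scale. The `h` values following each of the k
    best-matching windows are rescaled with the same map and averaged.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    query = y[-window:]
    ss_tot = ((query - query.mean()) ** 2).sum()
    fits = []
    for s in range(n - window - h):
        cand = y[s:s + window]
        slope, intercept = np.polyfit(cand, query, 1)
        resid = query - (intercept + slope * cand)
        r2 = 1.0 - (resid @ resid) / ss_tot
        fits.append((r2, intercept, slope, s))
    fits.sort(reverse=True)  # most similar windows first
    preds = [intercept + slope * y[s + window:s + window + h]
             for _, intercept, slope, s in fits[:k]]
    return np.mean(preds, axis=0)
```

Because the linear map absorbs level and scale shifts, a matched window recorded under a different demand regime can still contribute to the forecast, which is the property exploited above for observations affected by pipe failures and leakages.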
Results show that, in summer and autumn, no model surpassed the others in forecasting accuracy. From June to November (winter and spring), however, ELM presented the best forecasting performance, although its computational overhead is a major disadvantage. More broadly, the neural-network-based methods presented the best results for the RMSE, MAE, MAPE and MF metrics; like ELM, their computational overhead increases significantly when cross-validation with sliding windows and recalibration is applied.
In general, the main findings show that no single method predicts water demand with uniformly high accuracy. Nonetheless, there is a set of methods with near-optimal performance, and each can be further investigated to improve its own results. Alternatively, future work will investigate switching between methods, since each method achieved better forecast accuracy at different parts of the year (see Figures A.1–A.4 in the Supplementary Material).
Finally, forecasting results using data from 2016 indicate that, within a week-long horizon with hourly forecast resolution, errors vary considerably across the different models. Therefore, there is no single best model for predicting daily water demand, and the results support using different models at different times of the year.
ACKNOWLEDGEMENTS
The authors thank CNPq and Fapemig for financial support, and Sanitation Company of Minas Gerais (Copasa MG) for providing urban water demand data.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.