Water resources management is crucial for human well-being and contemporary socio-economic development. However, the increasing use of water has led to various problems that affect its quality and availability. To address these issues, accurate forecasting of water consumption is essential for the optimal operation of water collection, treatment, and distribution systems. This study aims to compare four machine learning methods for predicting daily urban water demand in a Brazilian coastal tourist city (Guaratuba – Paraná). Historical data from the city’s water distribution system, spanning from 2016 to 2019 (1,461 measurements in total), were considered along with meteorological and calendar data to conduct the investigation. Three time series cross-validation approaches were considered for each method, thus totaling 12 evaluation settings. All models were subjected to hyperparameter optimization and evaluated using appropriate performance metrics from the literature. Results demonstrate the importance of using nonlinear models to predict short-term water demand, highlighting the problem’s complexity. From the compared models, multilayer perceptron provided the best results. Finally, regardless of the model, the best results were obtained by applying an expanding window time series cross-validation, indicating that the more historical data available, the better, in this particular case.

  • Machine learning algorithms are appropriate for water demand estimation.

  • Preprocessing allows for better data quality and final predictive results.

  • The addition of historical data showed positive results in the case study.

  • A case study in a coastal touristic city, Guaratuba – Paraná – Brazil is presented.

  • Artificial neural networks provided the best predictions for the case study.

The management of water resources is critical to ensure human well-being and socio-economic development. Throughout the course of history, civilizations have flourished along river banks, underscoring the indispensable role of water to our existence. However, the growing demand for water has created several challenges that compromise the quality and availability of this vital resource (UN-Water 2021). Among the contributing factors to an increase in water demand are population growth, economic development, and changes in consumption patterns. As a consequence, accurate water demand forecasting assumes paramount importance for the optimal operation of water catchment, treatment, and distribution systems (Billings & Jones 2011).

Despite that, water demand forecasting is a complex task due to the nature of the data under analysis and the various factors influencing water consumption. Indeed, several works have been published specifically on this topic, e.g., Kanakoudis & Gonelas (2014). As outlined by Cominola et al. (2023), such factors can be organized into three main categories, namely observable, latent, and external. Observable factors include tangible aspects that are readily apparent and trackable, such as socio-demographic, and house and yard characteristics. Latent factors, on the other hand, are subjective and difficult to measure, such as consumer perception, awareness, habits, and opinions. These factors are often hidden or not immediately apparent, requiring more detailed analysis to determine their impact on water consumption. External factors include climate variables, water prices, and other extrinsic factors that can influence water consumption patterns. Although often beyond the control of the system under consideration, they can still have a significant impact on water usage.

Furthermore, it is essential to acknowledge the intricacies and nuances of each scenario under scrutiny. In coastal tourist cities, for instance, there is a clear positive correlation between the influx of tourists and the observed water consumption levels (Toth et al. 2018). The concentration of tourists in these regions can potentially lead to increased pressure on local water resources, making tourism a significant contributor to the overall water consumption (Gössling et al. 2012). However, few studies have considered water demand forecasting for cities with peculiar fluctuations in population. Almutaz et al. (2012) proposed a probabilistic forecasting model for Mecca, Saudi Arabia, a significant religious center that annually attracts millions of Islamic faith followers. Kofinas et al. (2016) offered a comparative analysis of artificial neural networks and adaptive neuro-fuzzy inference systems, focusing on a Mediterranean tourist resort characterized by arid, hot summers coinciding with substantial tourist influxes. Felfelani & Kerachian (2016) employed various artificial neural network architectures to predict water demand in Mashhad, Iran, a tourist city affected by fluctuations driven by national and religious events.

To address the complexities of water demand forecasting, a wide variety of methods have been proposed. Although there is no single convention, these approaches can be broadly divided into two main categories, namely statistical and machine learning based (Guo et al. 2018). Statistical methods, including time-series analysis and regression models, leverage historical water consumption data to discern trends and patterns. Machine learning methods have become increasingly popular for forecasting, as they can handle complex relationships and adapt to dynamic conditions. Prominent machine learning algorithms frequently employed in water demand forecasting include artificial neural networks and support vector regression. Comprehensive, in-depth reviews on the topic are presented by Donkor (2014), Ghalehkhondabi et al. (2017), de Souza Groppo et al. (2019) and, more recently, by Niknam et al. (2022).

This paper considers the specific scenario of short-term water demand forecasting in a coastal tourist city characterized by significant population variations. To offer some insight into the complexity of the scenario: despite a population of around 38 thousand inhabitants (IBGE 2021), the city experiences peaks of up to one million visitors during New Year’s Eve, as reported by local authorities (Portal da Cidade 2022). Four machine learning methods are taken into account: linear regression (LR), k-nearest neighbors (kNN), support vector regression (SVR), and multilayer perceptron (MLP). Consequently, this research contributes to the growing body of literature on water demand forecasting, focusing on the unique challenges posed by such dynamic scenarios. The experimental setup employed in this study, which covers the two main time series cross-validation approaches, can be used as a reference (testbed) for future research. Finally, the findings may also have practical implications, supporting the development of more accurate water demand forecasting models for water utility companies, and benefiting all stakeholders engaged in water resource management.

In this section, a comprehensive overview of the study area and the data utilized in the research is provided, including its characteristics and the preprocessing techniques employed to enhance its quality. Subsequently, the methods used in the experiments are described, along with details on how the models were trained and evaluated.

Study area and data

The city of Guaratuba is situated on the coast of the state of Paraná, Brazil (Figure 1). According to estimates from IBGE (2021), the city has a population of 37,974 inhabitants. During the summer months (December to March), the city experiences a substantial influx of tourists, which alters water consumption patterns and leads to an increase in per capita dues (Carvalho Junior 2021). For this research, two datasets related to the city’s water distribution system (WDS) were available, but only one was chosen for further analysis. The additional data collected to model these consumption patterns can be classified into two categories: meteorological and calendar data.

Regarding the historical information related to the municipal WDS, the company responsible for providing water supply services to Guaratuba (SANEPAR) shared two types of data: the daily volume of water consumed from reservoirs and the daily volume of water produced in the water treatment plants (WTPs). Both datasets cover a period of four years, from 2016 to 2019. However, only one is utilized for the prediction task, as further discussed in Section 3.1.
Figure 1

Guaratuba highlighted on Brazil’s map.


The meteorological data used in this study were sourced from SIMEPAR, the company responsible for providing the state with meteorological, hydrological, and environmental data. The dataset includes measurements of temperature, radiation, relative humidity, and precipitation, and was collected from May 2015 to November 2020 at a 15-minute frequency. As reported by Billings & Jones (2011), these climatic factors play a significant role in generating the seasonal components of water usage. It is worth noting that these observed values act as an ideal (error-free) meteorological forecast during model development; when the model is employed in real-world scenarios, only forecast estimates of these variables are available.

The calendar data was collected in an attempt to model the city’s influx of tourists. Previous research showed that in 2012, the number of tourists visiting Paraná’s coast throughout the year was 2,597,392 (Carvalho Junior 2021). As such, data regarding holidays and school recesses were gathered. Holidays were selected for Guaratuba, as well as for the two largest nearby cities, Curitiba and Joinville, each with populations exceeding half a million people. The information was acquired from a web API. School recess data was collected considering solely the state of Paraná from the website of the state’s education and sports department (SEED/PR 2022). In addition to summer and winter holidays, recesses during the term period were also taken into account.

Table 1 presents a summary of the raw data utilized in this study. Prior to any analysis, the data was carefully selected, extracted, and transformed into a set of features that are appropriate for the intended use case. Despite the laborious and time-consuming nature of these steps, it is imperative to understand and prepare the raw data in a format suitable for further analysis (Tan et al. 2016). Among the preprocessing steps taken are: selecting the target attribute based on data quality characteristics, feature extraction and transformation, imputing missing values, reducing noise in the target variable, and encoding and normalizing categorical and numerical attributes, respectively.

Table 1

Main characteristics of the data

| Raw data | Category | Measurement frequency | Source |
|---|---|---|---|
| Water produced in the WTPs | Target | 1 day | SANEPAR |
| Water consumed from the reservoirs | Target | 1 day | SANEPAR |
| Temperature | Meteorological | 15 minutes | SIMEPAR |
| Radiation | Meteorological | 15 minutes | SIMEPAR |
| Relative humidity | Meteorological | 15 minutes | SIMEPAR |
| Precipitation | Meteorological | 15 minutes | SIMEPAR |
| Holidays | Calendar | – | – |
| School recesses (Paraná State) | Calendar | – | SEED/PR (2022) |

To impute the missing values (only five were found in the meteorological dataset), the kNN (k-Nearest Neighbors) algorithm with Euclidean distance and k=5 was employed. The influence of each neighbor was weighted by the inverse of its distance. The water demand time series was subjected to noise reduction using the singular spectrum analysis (SSA) technique, which has become a standard tool in the analysis of meteorological and geophysical time series (Zhigljavsky & Golyandina 2020; Zubaidi et al. 2020). The SSA configuration followed the methodology outlined in Hassani (2010). A window length (L) of 560 was chosen to compute the trajectory matrix. This size strikes a balance, being sufficiently large to capture significant patterns while remaining smaller than half of the entire train data (730 data points). This matrix forms the basis for applying singular value decomposition (SVD). Following SVD, meaningful components were identified by interpreting singular vectors, guided by a visual analysis of w-correlations to determine the parameter r. The first 108 eigentriples were selected, representing essential patterns, while the rest were considered noise. The final step involves reconstructing the one-dimensional series using the chosen eigenvectors. Finally, categorical attributes were encoded by converting each label to an integer value, whereas numerical attributes were normalized between 0 and 1.
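The SSA steps described above (embedding, SVD, grouping, and diagonal averaging) can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function name `ssa_smooth` and the toy signal are assumptions, and the choice of how many eigentriples to keep is left to the caller, whereas the study selected them through w-correlation analysis.

```python
import numpy as np

def ssa_smooth(series, window, rank):
    """Reconstruct a 1-D series from its first `rank` SSA eigentriples."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Embedding: trajectory (Hankel) matrix with `window` rows and k columns.
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    # Decomposition: singular value decomposition of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: keep the leading eigentriples, treat the rest as noise.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging: average all matrix entries that map to the same time index.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts

# Toy usage on a noisy sine wave (hypothetical data, not the study's series).
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + 0.3 * rng.standard_normal(t.size)
smooth = ssa_smooth(noisy, window=50, rank=2)
```

With the settings reported in the text, the call would use `window=560` and `rank=108` on the daily water demand series.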

Supervised learning algorithms

Time series forecasting, the objective of this work, can be approached as a regression task and, thus, allows the use of supervised learning methods (Bontempi et al. 2012; Brownlee 2017). In this context, historical time series observations become the target attribute of the dataset, alongside other input features. A model is then generated by exploring the relationship between the set of input variables and the output, which can be used for one-step or multi-step prediction. This section covers the supervised learning methods used to tackle the time series forecasting task of this study.

Linear regression

Linear regression is a widely used method for predicting a real-valued output given a vector of n real-valued inputs (Goodfellow et al. 2016; Murphy 2022). Its popularity stems from its simplicity and interpretability since it assumes the expected value of the output is a linear function of the input. It is noteworthy that despite this linearity assumption, linear regression can be effectively applied to transformed data, allowing for the modeling of nonlinear relationships. The predicted value ($\hat{y}$) is defined as:

$$\hat{y} = w_0 + \sum_{j=1}^{n} w_j x_j$$

where $w_0, w_1, \ldots, w_n$ are the weights or regression coefficients learned during training. These coefficients are adjusted to minimize the cost function, the residual sum of squares (RSS), between the observed targets in the dataset and the ones predicted by the model. It is worth noting that while linear regression is a simple and limited learning algorithm, it is often used as a baseline for comparing other algorithms (Goodfellow et al. 2016).
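As an illustration of how the coefficients can be obtained by minimizing the RSS, the sketch below solves the least-squares problem with NumPy. The function names and the toy data are assumptions made for this example; the study itself relied on scikit-learn.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return [w0, w1, ..., wn] minimizing the residual sum of squares."""
    X = np.asarray(X, dtype=float)
    # Prepend a column of ones so that w0 acts as the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)
    return w

def predict_linear(w, X):
    """Evaluate y_hat = w0 + sum_j w_j * x_j for each row of X."""
    return w[0] + np.asarray(X, dtype=float) @ w[1:]

# Toy usage: points lying exactly on y = 1 + 2x.
w = fit_linear_regression([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
y_hat = predict_linear(w, [[3.0]])
```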

k-Nearest neighbours

The kNN algorithm is a nonparametric technique that is both simple and powerful. Unlike parametric models, this algorithm does not rely on any assumptions about the underlying data distribution, and instead derives its model structure from the training dataset (Murphy 2022). The method stores the inputs X and outputs y of the training set, and when attempting to predict the target value of a new object, it searches for the nearest k instances in the stored set and returns the associated regression target. The returned value is usually the average of the target attribute from the k nearest instances, which can be weighted by the inverse of their distance to the new object, giving more importance to the closest points. The performance of the kNN algorithm is sensitive to the choice of k and the distance metric adopted between instances (Flach 2012). To ensure accurate predictions, it may also be necessary to scale the attributes to a similar range.
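The prediction rule described above can be written compactly. The following is a minimal sketch with Euclidean distance and optional inverse-distance weighting; the function name `knn_predict` and the toy data are assumptions for illustration only.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5, weighted=True):
    """Predict a target as the (optionally distance-weighted) mean of the k nearest targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dist = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    if not weighted:
        return y_train[nearest].mean()
    d = dist[nearest]
    # An exact match has zero distance: return the mean of the matching targets.
    if np.any(d == 0):
        return y_train[nearest][d == 0].mean()
    w = 1.0 / d
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Toy usage: the closer neighbor (x = 3) pulls the weighted estimate towards 30.
X_toy = [[0.0], [1.0], [3.0]]
y_toy = [0.0, 10.0, 30.0]
uniform_pred = knn_predict(X_toy, y_toy, [2.5], k=2, weighted=False)
weighted_pred = knn_predict(X_toy, y_toy, [2.5], k=2, weighted=True)
```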

Support vector regression

Initially developed for classification tasks under the name of support vector machine (SVM) and later generalized for regression, SVR is one of the most elegant regression methods (Cortes & Vapnik 1995; Drucker et al. 1996). The algorithm seeks a hyperplane that fits the training samples while deviating by at most ε from the actual values (Smola & Schölkopf 2004). It does not penalize errors as long as they are smaller than ε, giving the flexibility to define how much error is acceptable in the model. The hyperplane and ε define a region called the ε-insensitive tube, and the support vectors are those data points that lie on its boundary or outside it. Hence, SVR computes a hyperplane whose ε-insensitive tube is as narrow as possible while containing the largest number of training samples, with the error term handled in the constraints of the algorithm (Zhang & O’Donnell 2020). A technique known as the ‘kernel trick’, which implicitly maps the data to a higher dimensional space, is used to capture non-linear relationships between variables. For further mathematical details, references such as Bishop & Nasrabadi (2006) and Murphy (2022) can be consulted.
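The role of ε can be made concrete through the ε-insensitive loss, which is zero inside the tube and grows linearly outside it. The snippet below is a small illustrative sketch; the function name and the example values are assumptions, and it shows only the loss, not the full SVR optimization.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero within the tube, |residual| - epsilon outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Toy usage: the first prediction lies inside the tube and is not penalized.
losses = epsilon_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.5, 0.7], epsilon=0.1)
```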

Multilayer perceptron

MLP is a type of artificial neural network (ANN) that is widely used for supervised learning tasks. ANNs, commonly referred to as ‘neural networks’, are a set of models that are inspired by the structure and function of the biological neural networks that constitute the human brain. The basic building block of an ANN is a computational model of a neuron, which is an information-processing unit whose output y is mathematically described by Equation (1) (Haykin 1998). Neuron inputs are represented by x and are multiplied by its synaptic weights w, with positive (excitatory) or negative (inhibitory) weights. The external threshold $b$, called bias, is multiplied by a fixed input and is responsible for shifting the linear combiner from the origin. The weighted inputs are summed to form the induced local field (or activation potential) ν of the neuron, which is then passed through an activation function $\varphi(\cdot)$. Such a function is responsible for determining whether the neuron will fire or not, that is, if it will produce an output or not.

$$y = \varphi(\nu) = \varphi\left(\sum_{i} w_i x_i + b\right) \qquad (1)$$
However, simple neurons like the perceptron can only solve a limited class of linearly separable problems. To overcome this limitation, networks with many neurons organized in layers can be constructed. The MLP is one such network, which consists of an input layer, one or more hidden layers, and an output layer. Each node, except for the input nodes, is a neuron like the one described above. These networks exhibit a high degree of connectivity (generally fully connected, i.e., each neuron is connected to all neurons of the next layer) and include nonlinear activation functions that are differentiable (such as the logistic function). An algorithm known as backpropagation guides the training process to determine the network parameters, i.e., the weights of the neurons (Rumelhart et al. 1986).
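To make the neuron model and its layered composition concrete, the sketch below computes the output of a single neuron and the forward pass of an MLP with one hidden layer. The function names, the tanh activation, the linear output unit, and the toy weights are assumptions for illustration; training via backpropagation is not shown.

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Single neuron: induced local field v = w . x + b, output phi(v)."""
    v = np.dot(w, x) + b
    return phi(v)

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out, phi=np.tanh):
    """One hidden layer of neurons followed by a linear output unit (regression)."""
    h = phi(W_hidden @ x + b_hidden)   # each row of W_hidden is one hidden neuron
    return float(w_out @ h + b_out)

# Toy usage with hypothetical weights: 2 inputs, 3 hidden neurons, 1 output.
x = np.array([1.0, 2.0])
W_hidden = np.array([[0.5, -0.25], [0.1, 0.2], [-0.3, 0.4]])
b_hidden = np.zeros(3)
w_out = np.array([1.0, -1.0, 0.5])
prediction = mlp_forward(x, W_hidden, b_hidden, w_out, b_out=0.2)
```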

Model selection & evaluation

This section outlines the methodology adopted for selecting and evaluating machine learning models to estimate water demand in Guaratuba – PR. Initially, the preprocessed data was split into three distinct sets: train, validation, and test. The training set was exclusively used for model training, whereas the validation set was utilized to optimize model hyperparameters (i.e., model selection), and the test set to provide an unbiased evaluation of the chosen model (i.e., model evaluation), reproducing a real-world application scenario. To ensure a reasonable split that considers the annual patterns observed in water demand data, the first two years were allocated for training, the third year for validation, and the last year for testing. All models underwent the same training and evaluation process.

Out of the four methods used in this study, only linear regression does not have any hyperparameters. The other methods were fine-tuned using the widely adopted hyperparameter optimization technique known as grid search (Feurer & Hutter 2019). The scoring metric of choice in this study was the root mean square error (RMSE). Issues that could arise from its use, such as the heavy penalty weight assigned to larger errors, are not addressed. The search space for each method’s hyperparameters is provided in Table 2, with a reasonable range chosen for numerical hyperparameters. It is important to note that these values were determined based on literature observation and the authors’ expertise, as the main focus of the research was not on determining optimal parameter values for each and every method. Logarithmic sampling was utilized for some hyperparameters to enable broader exploration. The default values specified by version 1.0.2 of the scikit-learn framework were utilized for all other parameters, with the exception of three MLP parameters: shuffle=False, early_stopping=True, and max_iter=1000.

Table 2

Search space for the machine learning algorithms

| Learning algorithm | Hyperparameter | Search space |
|---|---|---|
| Linear Regression | – | – |
| k-Nearest Neighbors | n_neighbors | {3, 5, …, 29} |
| | weights | {uniform, distance} |
| | metric | {euclidean, manhattan} |
| Support Vector Regression | kernel | {linear, poly, rbf, sigmoid} |
| | C | [0.001, 100], 10 |
| | epsilon | [0.001, 0.1], 10 |
| | gamma | {scale, auto} |
| Multilayer Perceptron | hidden_layer_sizes | {(11,), (13,), …, (23,)} |
| | activation | {tanh, relu} |
| | solver | {lbfgs, sgd, adam} |
| | alpha | [0.01, 1.0], 10 |
| | learning_rate_init | [0.0001, 0.01], 5 |

Note: The symbol indicates that the values are evenly spaced in logarithmic space.
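A grid search over one of these spaces could be set up roughly as follows with scikit-learn. This is a hedged sketch rather than the authors' script: the data are synthetic, `TimeSeriesSplit` is used only as a convenient order-preserving stand-in for the study's validation scheme, and treating the SVR range [0.001, 100] as 10 log-spaced values is an assumption about how the table entry should be read.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

# kNN search space transcribed from Table 2.
knn_grid = {
    "n_neighbors": list(range(3, 30, 2)),      # {3, 5, ..., 29}
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}

# Assumption: 10 values evenly spaced in log space between 0.001 and 100.
svr_c_values = np.logspace(-3, 2, 10)

# Synthetic stand-in data (the study's dataset is not reproduced here).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.standard_normal(200)

search = GridSearchCV(
    KNeighborsRegressor(),
    knn_grid,
    scoring="neg_root_mean_squared_error",   # RMSE, negated so that higher is better
    cv=TimeSeriesSplit(n_splits=3),          # folds that respect temporal order
)
search.fit(X, y)
best_params, best_rmse = search.best_params_, -search.best_score_
```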

Time series cross-validation

When evaluating machine learning models, traditional estimators like k-fold cross-validation are not appropriate for dealing with time series data. This is because these methods assume that observations are independent and identically distributed, which is not valid for time series data where the temporal order in which the values were recorded must be respected (Brownlee 2017). To address this issue, researchers have recently adopted a procedure called time series cross-validation, although naming conventions are still being established (Uber 2019, 2020; Gordeev et al. 2020).

Time series cross-validation involves evaluating a model in various time periods, simulating a real-world application scenario (Hyndman & Athanasopoulos 2021). In this process, there are multiple test sets, and the corresponding training set comprises only observations that occurred prior to those in the test set. The performance metric of interest is computed by averaging the results for each iteration. This outcome can be utilized to evaluate a model, identify optimal parameters, and measure prediction volatility over time (Gordeev et al. 2020). Compared to a single holdout split, this approach provides a more consistent performance estimate, applying cross-validation logic while still respecting the temporal order of the data. Time series cross-validation can be categorized into two types: sliding window and expanding window.

Sliding window is a type of time series cross-validation method in which a fixed-size window moves forward over time, and the model is trained on the data within the window before predicting the next data point. In contrast, expanding window is another type of time series cross-validation method in which the model is trained on all data up to a certain time point before predicting the next data point. The main difference between these two methods is that sliding window only trains on a fixed period of data, whereas expanding window continually increases the amount of training data. Figure 2(a) and 2(b) illustrate the series of training and test sets for both approaches, where the green observations form the training sets, and the red observations form the test sets.
Figure 2

Visualization of the time series cross-validation behaviors with sliding window (a) and expanding window (b). The training set is represented in green, while the test set is in red: (a) Time series cross-validation using a sliding window; (b) Time series cross-validation using an expanding window.


For this study, a total of 12 models were created by utilizing three time series cross-validation configurations for each learning method. The configurations comprised two sliding windows with training window sizes of 365 and 730 days (one and two years, respectively), and one expanding window with an initial size of 730 days (two years), which grew at each iteration. A test window size and sliding step of 7 days were used for all three configurations, since a week was the shortest periodic pattern expected in the daily water demand data. That is, all models provide predictions in a weekly fashion, forecasting water demand for the next 7 days.
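The two windowing schemes can be expressed as a single index generator. The sketch below is illustrative only: the function name `time_series_splits` and its parameters are assumptions, and it yields index ranges rather than fitting any model.

```python
def time_series_splits(n_samples, train_size, test_size=7, step=7, expanding=False):
    """Yield (train_indices, test_indices) pairs that respect temporal order.

    Sliding window: the training set always holds the last `train_size` points.
    Expanding window: the training set starts at 0 and grows by `step` each iteration.
    """
    train_end = train_size
    while train_end + test_size <= n_samples:
        train_start = 0 if expanding else train_end - train_size
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        train_end += step

# Toy usage on 30 observations with a 14-day initial training window.
sliding = list(time_series_splits(30, train_size=14))
expanding = list(time_series_splits(30, train_size=14, expanding=True))
```

Under this sketch, the three configurations in the text correspond to `train_size=365`, `train_size=730`, and `train_size=730` with `expanding=True`, each with the default 7-day test window and step.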

Performance metrics

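Performance metrics provide a summary of the capability of the model that performed the predictions. They are used to monitor and measure the performance of a model during training, selection, and testing. This study used four measures commonly used in the literature, namely, root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination (R²), given by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ represents the target value of the ith observation, $\hat{y}_i$ is the forecasted value, n is the number of predicted values, and $\bar{y}$ is the average of all observed values. Since regression models have continuous output, the metrics are based on some sort of distance between predicted and actual values.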
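For reference, the four metrics can be computed directly with NumPy. This is a minimal sketch using their standard definitions; the function name and the toy values are assumptions, and MAPE is expressed in percent and requires non-zero observed values.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R2 for paired observations and forecasts."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / y))),
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
    }

# Toy usage with hypothetical values.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```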

The present section outlines the outcomes of this study. It begins by discussing the selection of the target attribute, followed by presenting the final dataset used for analysis that resulted from preprocessing. An overview of the performance of the models employed in the research is then provided. Subsequently, an in-depth analysis of the best-performing model from each learning algorithm is presented. Model selection results (optimal hyperparameters and model performance on the validation set) are included in the Supplementary material for ease of reference.

Selection of target attribute

To identify the most suitable predictive/target attribute for the analysis, two time series collected on a daily basis over four years from 2016 to 2019 were explored, comprising a total of 1,461 observations each. These time series represent water produced in the WTP and water consumed in the reservoirs. For ease of reference, the former will be referred to as ‘water produced’ and the latter as ‘water consumed’. Figure 3 depicts water produced in the WTPs and water consumed in the reservoirs of Guaratuba during the four years of available data: 2016 (upper left), 2017 (upper right), 2018 (lower left) and 2019 (lower right). Pearson’s correlation coefficient between the two variables for each year (assuming the same order as before) is 0.84, 0.67, 0.86, and 0.82, respectively. Over all years the correlation is 0.80. It is noteworthy that, as stated by Billings & Jones (2011), the system’s water demand and total water production in a public supply are conceptually equivalent. This means that the analysis is valid regardless of which time series is used. The ultimate choice of the most suitable predictive/target attribute was based on the quality of the data, particularly its completeness and coherence.

While the water produced time series was complete, 117 observations (8%) of water consumed were missing. According to SANEPAR, the company that provided the data, these missing values resulted from measurement errors arising from power outages and communication failures in the reading equipment (Carvalho Junior 2021). Furthermore, the water consumed data contained incoherent values, with 367 samples exceeding 50,000 m³, nearly double the maximum amount of water produced at 28,424 m³. To address these discrepancies and identify the underlying pattern in the time series, water consumed observations that fell outside the minimum and maximum values of the water produced were removed, resulting in the deletion of 369 samples. After this, 33% of the water consumed time series was missing, making any correction attempt via an imputation method difficult.
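The range filter described above can be sketched as follows. The function name and the toy values are assumptions, and the final line only re-derives the reported share of missing data from the counts given in the text (117 originally missing plus 369 removed, out of 1,461 observations).

```python
import numpy as np

def mask_incoherent(consumed, produced):
    """Set consumed values outside the [min, max] range of produced to NaN.

    Returns the masked series and the resulting fraction of missing values.
    """
    consumed = np.asarray(consumed, dtype=float).copy()
    produced = np.asarray(produced, dtype=float)
    lo, hi = np.nanmin(produced), np.nanmax(produced)
    out_of_range = (consumed < lo) | (consumed > hi)
    consumed[out_of_range] = np.nan
    return consumed, float(np.isnan(consumed).mean())

# Toy usage with hypothetical values: one gap already present, one value out of range.
masked, missing_share = mask_incoherent([10.0, np.nan, 55.0, 20.0], [8.0, 12.0, 25.0, 30.0])

# Share of missing data implied by the counts reported in the text.
reported_share = (117 + 369) / 1461
```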
Figure 3

Relationship between the produced and consumed water for Guaratuba (years: 2016–2019).


To measure the strength of association between the two time series, Pearson’s correlation coefficient was computed. The results indicated a strong positive linear relationship with a correlation coefficient of 0.80 over the four-year period (as already discussed). This suggests that both variables exhibit similar underlying patterns, with discrepancies likely arising from water losses. Indeed, research indicates substantial water wastage through leakage and inefficiencies within distribution networks (Gautam et al. 2020). The largest differences appear from August to October 2017. Given the low quality of the water consumed time series, the decision was made to exclude it from further analysis and focus solely on the water produced data as the target variable.

Preprocessing

The raw data underwent various steps to transform it into a cleaned and usable dataset for further analysis. Initially, specific information was extracted from timestamps to generate meaningful features that can establish simple and strong relationships between inputs and outputs (Brownlee 2017). These features include year, month, day, day of the week, weekend, and season of the year. Ordinal attributes were converted to integers, and the encoded data represented a sequence of labels, while binary attributes were represented as integers with only two possible values, 0 or 1. These features are crucial in the learning algorithm’s ability to model the data accurately, especially due to its cyclic patterns.
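The timestamp-derived features could be generated along the following lines with pandas. This is a sketch under stated assumptions: the function name, the integer codes, and the month-to-season mapping (Southern Hemisphere meteorological seasons, with 0 for summer) are illustrative choices, not the authors' exact encoding.

```python
import pandas as pd

# Assumed mapping: 0 = summer (Dec-Feb), 1 = autumn, 2 = winter, 3 = spring.
SEASON_BY_MONTH = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1,
                   6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}

def calendar_features(dates):
    """Build integer-encoded calendar features from a sequence of dates."""
    idx = pd.DatetimeIndex(dates)
    features = pd.DataFrame({
        "year": idx.year.to_numpy(),
        "month": idx.month.to_numpy(),
        "day": idx.day.to_numpy(),
        "day_of_week": idx.dayofweek.to_numpy(),           # Monday = 0, Sunday = 6
        "is_weekend": (idx.dayofweek >= 5).astype(int),     # binary 0/1
        "season": idx.month.map(SEASON_BY_MONTH).to_numpy(),
    })
    features.index = idx
    return features

# Toy usage: a summer Saturday and a winter Wednesday.
features = calendar_features(["2019-01-05", "2019-07-10"])
```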

To account for the significant impact of Carnival on the city’s water demand, a special feature was created for this period. In addition to the official Carnival holidays (which comprise Monday, Tuesday, and Wednesday), the dates for Friday, Saturday, and Sunday before the official days were also included to cover the entire festive period. The local holidays of Guaratuba, Curitiba, and Joinville were merged into a single attribute, as they showed high similarity with a Jaccard similarity coefficient of 0.88. These holiday-related features are binary and represented by an integer variable that takes on values of either 0 (not holiday) or 1 (holiday).
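For reference, the Jaccard similarity coefficient between two sets of holiday dates is the size of their intersection divided by the size of their union. The sketch below uses hypothetical dates; how the coefficient was aggregated across the three cities is not detailed in the text, so this shows only the pairwise definition.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy usage with hypothetical holiday calendars for two cities.
city_a = {"2019-01-01", "2019-04-19", "2019-04-21", "2019-09-07"}
city_b = {"2019-01-01", "2019-04-19", "2019-09-07", "2019-09-08"}
similarity = jaccard(city_a, city_b)   # 3 shared dates out of 5 distinct dates
```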

The meteorological data, which included temperature, radiation, relative humidity, and precipitation, underwent three steps: resampling, imputation, and scaling. First, the 15-minute data was resampled daily by calculating the daily mean and standard deviation for each variable. Next, missing data (only five samples) was imputed using the kNN algorithm with Euclidean distance and k = 5, where neighbors were weighted by the inverse of their distance. Finally, the data was normalized between 0 and 1 to ensure consistency in the analysis.
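The three steps can be chained with pandas and scikit-learn as sketched below. The function name and the synthetic input are assumptions. Note also that `KNNImputer` uses a NaN-aware Euclidean distance, and that the scaler here is fitted on all rows for brevity, whereas the text does not state on which split the normalization was fitted.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

def prepare_weather(raw):
    """Resample 15-minute readings to daily mean/std, impute gaps, scale to [0, 1]."""
    daily = raw.resample("D").agg(["mean", "std"])
    daily.columns = [f"{var}_{stat}" for var, stat in daily.columns]
    # kNN imputation with k = 5 and inverse-distance weighting of neighbors.
    imputed = KNNImputer(n_neighbors=5, weights="distance").fit_transform(daily)
    scaled = MinMaxScaler().fit_transform(imputed)
    return pd.DataFrame(scaled, index=daily.index, columns=daily.columns)

# Toy usage: 10 days of synthetic 15-minute data with one day of missing temperature.
rng = np.random.default_rng(0)
index = pd.date_range("2019-01-01", periods=10 * 96, freq="15min")
raw = pd.DataFrame({
    "temperature": 25 + 3 * rng.standard_normal(index.size),
    "relative_humidity": 80 + 5 * rng.standard_normal(index.size),
}, index=index)
raw.loc["2019-01-04", "temperature"] = np.nan
weather = prepare_weather(raw)
```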

Lastly, SSA was applied to smooth the selected water demand time series. The residuals followed a normal distribution centered at zero, suggesting that the smoothing process effectively reduced noise and that the resulting smoothed curve describes the signal. The final dataset for analysis comprises 18 attributes, including the target variable, with daily observations from 2016 to 2019. Of these attributes, eight are categorical and 10 are numerical. Table 3 summarizes the attributes in the final dataset, with the target variable listed in the last row for easy identification. The overall model performance is examined next. For selected hyperparameters and model performance on the validation set, please refer to the Supplementary material.

Table 3

Attributes that compose the database used for analysis

| Attribute | Description | Type |
| --- | --- | --- |
| temperature_mean | Daily average temperature (°C) | Numerical |
| temperature_std | Daily standard deviation of temperature (°C) | Numerical |
| radiation_mean | Daily average radiation (W/m²) | Numerical |
| radiation_std | Daily standard deviation of radiation (W/m²) | Numerical |
| relative_humidity_mean | Daily average relative humidity (%) | Numerical |
| relative_humidity_std | Daily standard deviation of relative humidity (%) | Numerical |
| precipitation_mean | Daily average precipitation (mm) | Numerical |
| precipitation_std | Daily standard deviation of precipitation (mm) | Numerical |
| year | Year of the record | Numerical |
| month | Month of the record | Categorical |
| day | Day of the record | Categorical |
| day_of_week | Day of the week of the record | Categorical |
| is_weekend | Indicates whether it is a weekend or not | Categorical |
| season | Season of the record | Categorical |
| is_holiday_ctba_gtba_jve | Indicates whether it is a public holiday | Categorical |
| is_carnival | Indicates if it is carnival week | Categorical |
| is_school_recess_pr | Indicates whether it is recess or school break | Categorical |
| water_produced | Water produced by the WTPs of Guaratuba (m³) | Numerical |

Overall performance

The performance of four different machine learning algorithms – LR, kNN, SVR, and MLP – in predicting water demand was assessed. The models were trained using the optimal hyperparameters (as obtained with the validation set) and evaluated using three error metrics, RMSE, MAE, and MAPE. Table 4 summarizes the results obtained on the test set. The table is organized into three sections based on the time series cross-validation method used: expanding window (EW), one-year size sliding window (SW-1Y), and two-year size sliding window (SW-2Y). Performance measures were computed for each cross-validation iteration and averaged across rounds to estimate each model’s performance. Standard deviations are reported on the right side of each metric. The model with the lowest error for each learning algorithm is highlighted in bold for each metric. The results are analyzed from two perspectives: firstly, based on the learning algorithm, and secondly, considering the time-series cross-validation method.

Table 4

Performance of the 12 models in the test set

| Model | RMSE | σ | MAE | σ | MAPE (%) | σ (%) |
| --- | --- | --- | --- | --- | --- | --- |
| EW | | | | | | |
| LR | 2148.36 | 929.992 | 1832.11 | 883.323 | 15.03 | 6.507 |
| kNN | 1866.11 | 972.392 | 1590.67 | 948.388 | 12.30 | 5.078 |
| SVR | 1852.52 | 647.234 | 1525.26 | 544.143 | 12.12 | 4.373 |
| MLP | 1709.70 | 593.104 | 1408.97 | 515.158 | 11.81 | 4.707 |
| SW-1Y | | | | | | |
| LR | 2403.62 | 1010.830 | 2093.66 | 955.716 | 16.01 | 5.430 |
| kNN | 1906.99 | 969.900 | 1618.19 | 954.202 | 12.47 | 5.055 |
| SVR | 2022.69 | 846.400 | 1685.90 | 744.767 | 13.08 | 4.318 |
| MLP | 2142.58 | 927.998 | 1796.94 | 855.901 | 14.11 | 5.841 |
| SW-2Y | | | | | | |
| LR | 2178.25 | 870.311 | 1870.36 | 814.996 | 14.88 | 5.341 |
| kNN | 1968.28 | 1008.130 | 1668.98 | 963.966 | 12.98 | 5.370 |
| SVR | 1943.40 | 783.327 | 1611.67 | 648.252 | 12.67 | 4.558 |
| MLP | 1927.41 | 732.005 | 1620.15 | 630.532 | 13.08 | 4.916 |

With respect to the learning algorithm, MLP performed best, with a test RMSE of 1,709.70 and a test MAE of 1,408.97. That is, the average absolute difference between the actual and predicted daily water demand values is 1,408.97 m³ (note from Figure 3 that the water demand values are as high as 25,000 m³). In ascending order of RMSE, the other methods rank as follows: SVR (RMSE = 1,852.52), kNN (RMSE = 1,866.11), and LR (RMSE = 2,148.36). From a more interpretable perspective, the models achieved MAPE values ranging from 11.81 to 16.01%. Following established standards for predictive model quality, a MAPE within the 10–20% range is considered good, while anything below 10% indicates highly accurate forecasting (Lewis 1982). Consequently, the methods employed are considered sufficient for generating accurate water demand predictions, though there is room for further enhancement.
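For reference, the three error metrics and the MAPE interpretation quoted above can be written as a short sketch. The function names and the observed and predicted series are hypothetical, and only the two MAPE ranges mentioned in the text are encoded.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (here m³)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero targets)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def lewis_label(mape_percent):
    """Label a MAPE value using only the ranges quoted in the text."""
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent <= 20:
        return "good"
    return "outside the ranges quoted in the text"


# Hypothetical daily demand values (m³), for illustration only.
observed = [10000, 12000, 15000, 20000]
predicted = [11000, 12000, 13500, 22000]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
}
best_and_worst = [lewis_label(11.81), lewis_label(16.01)]  # MAPE values from Table 4
```

Because RMSE squares the errors before averaging, it penalizes large deviations more heavily than MAE, which explains why the two metrics can rank models differently.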

The superior performance of MLP, SVR, and kNN over LR, with only marginal differences among the three, can be attributed to their ability to model complex, nonlinear relationships. According to Niknam et al. (2022), the first two algorithms rank among the top machine learning algorithms for this task. In contrast, while kNN offers a good trade-off between performance and complexity, it is not commonly used in the field of water demand, leaving room for further exploration. This method typically requires less data than other algorithms and exhibits stable metrics across all three configurations employed in this analysis. Ultimately, the key limitation of LR is that it is the only learning algorithm evaluated in this study that cannot model nonlinear relationships. Prior research has already emphasized that nonlinear models play an important role in predicting water demand (Adamowski et al. 2012; Vijai & Sivakumar 2018; Bata 2019).

With respect to the time series cross-validation, the analysis of the three configurations underscores the nuanced relationship between model performance and temporal dynamics. The results reveal that models using the expanding window outperformed those using the sliding window. Each learning algorithm had its best model under this configuration, suggesting that, in this specific case, a larger historical dataset leads to improved performance. For instance, the best-performing method using the expanding window was MLP (RMSE = 1,709.70), while LR had the worst performance (RMSE = 2,148.36). For the sliding window with a window size of 365 days, kNN performed the best (RMSE = 1,906.99), and LR performed the worst (RMSE = 2,403.62). All methods, except kNN, had their worst performance in this configuration. Finally, for the sliding window with a window size of 730 days, MLP performed the best (RMSE = 1,927.41), and LR performed the worst (RMSE = 2,178.25).
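The difference between the expanding and sliding window configurations can be made concrete with a small index generator. This is a sketch under stated assumptions: the function name, the 30-day test blocks, and the 12 folds are hypothetical, since the exact fold layout is not described in this section.

```python
def time_series_splits(n_samples, test_size, n_splits, window=None):
    """Yield (train_indices, test_indices) pairs in chronological order.

    window=None -> expanding window: training always starts at index 0.
    window=int  -> sliding window: training keeps only the last `window` points.
    """
    first_test = n_samples - n_splits * test_size
    for i in range(n_splits):
        test_start = first_test + i * test_size
        train_start = 0 if window is None else max(0, test_start - window)
        yield range(train_start, test_start), range(test_start, test_start + test_size)


# 1,461 daily observations (2016 to 2019); fold sizes are assumed values.
expanding = list(time_series_splits(1461, test_size=30, n_splits=12))
sliding_1y = list(time_series_splits(1461, test_size=30, n_splits=12, window=365))
sliding_2y = list(time_series_splits(1461, test_size=30, n_splits=12, window=730))
```

In every fold the training indices precede the test indices, so no future observation leaks into training. Under the expanding window the training set grows with each fold, whereas the sliding windows keep it fixed at 365 or 730 days.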

It is worth noting that in certain situations, simply increasing the amount of historical data available does not necessarily improve a model’s performance. This phenomenon has been observed in previous studies, which have shown that the data can undergo significant shifts in water consumption patterns, such as the adoption of a novel hygiene behavior (Bata 2019). To address this issue, smaller sliding windows for training may be more beneficial as they allow for disregarding old behaviors that are no longer relevant. A prime example of this is the COVID-19 pandemic, which caused significant changes in water consumption patterns. For instance, Lüdtke et al. (2021) found that during the first lockdown in 2020, daily water consumption in northern Germany was 14.3% higher than during the same time period in previous years.

Therefore, it is crucial to note that these results, as with any analysis, should be interpreted with caution, as they may differ depending on the specific location and data collected. For instance, the present study only examines data from the past four years, whereas other studies, such as Adamowski et al. (2012), consider data from nine years. The effects of different time series cross-validation configurations would likely be more pronounced if the data were collected over a longer period. Nevertheless, many studies in the field of water demand forecasting often focus solely on hold-out strategies, neglecting to consider the limitations inherent in these traditional model evaluation techniques. For practitioners contemplating the adoption of cross-validation, it is crucial not to overlook the temporal structure of the data. Failing to account for this aspect may lead to an overestimation of the model’s performance (Wang & Ruf 2022).

Selected models’ performance

The preceding section offered a broad view of the models’ overall performance. To delve deeper, a thorough analysis was conducted for the best model associated with each machine learning algorithm. The results are presented in Figure 4 as box plots, which summarize the absolute error of each model in the test set across different months. Interestingly, all models exhibited their lowest and highest monthly performance in December and September, respectively.
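The monthly breakdown underlying such box plots amounts to grouping absolute errors by calendar month. A minimal sketch follows, with a hypothetical function name and hypothetical observations:

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def monthly_absolute_error(dates, y_true, y_pred):
    """Group |observed - predicted| by calendar month and summarize each group."""
    errors = defaultdict(list)
    for d, t, p in zip(dates, y_true, y_pred):
        errors[d.month].append(abs(t - p))
    return {
        month: {"mean": mean(vals), "median": median(vals), "max": max(vals)}
        for month, vals in sorted(errors.items())
    }


# Hypothetical test-set excerpt (demand in m³).
dates = [date(2019, 9, 1), date(2019, 9, 2), date(2019, 12, 1), date(2019, 12, 2)]
observed = [9000, 9500, 20000, 22000]
predicted = [9400, 9000, 17500, 24000]

summary = monthly_absolute_error(dates, observed, predicted)
```

The per-month mean of these absolute errors is the monthly MAE, which corresponds to the mean marker drawn in each box.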

In December, LR, kNN, SVR, and MLP models had an MAE of 2,438.78, 2,583.26, 1,875.48, and 1,831.90, respectively. The heightened influx of people during the summer season is likely the reason for the difficulty in modeling water demand during this month. This is further supported by the presence of outliers in the subsequent month (January), which has the highest absolute errors, except for the MLP. During these peak summer conditions (December and January), SVR and MLP showcased a notable superiority over LR and kNN, with a MAE difference of up to 751.36. Intriguingly, this is not consistently observed in other months, with instances where kNN outperforms all models (e.g., April and June). One possible explanation for this pattern is that the intricate dynamics of water usage during peak summer may require more sophisticated models for accurate representation, while in other months, simpler models might suffice due to potentially simpler usage patterns.
Figure 4

Box plot illustrating the absolute error (in m³) for each month in the test set (2019), showcasing only the best-performing model from each algorithm. The symbol ° denotes the mean value.

In September, the models performed remarkably well with MAE values of 829.07, 760.91, 1,014.57, and 812.52, in the same order as previously mentioned. The models exhibited greater consistency during this month, with an MAE difference of up to only 253.66. The reason behind this remains unclear as no discernible patterns have been discovered. Furthermore, the month of October also exhibited good overall performance, while the months from March to August fall between the strong performance of September and October and the lower performance observed in December and January. It is crucial, however, to acknowledge the limitations imposed by the limited size of the test dataset, spanning only one year. This limitation hinders a comprehensive understanding of whether these observed trends are sporadic or indicative of a less chaotic water demand during these months.

Despite that, the models’ predictions during September and December were further explored, as illustrated by Figure 5. The central section of the figure depicts the outcomes of the entire test set, while the upper section provides a detailed view of the least accurate month (December), and the lower section zooms in on the month of peak performance (September). The predictions for September reveal a strong weekly seasonality in the water demand data, which repeats four times during the month. However, this pattern is not evident in December. The models encounter challenges in keeping up with the target variable values, with significant errors occurring around the 13th and 18th of the month. Additionally, this is apparent from the 26th, but only for LR and kNN models. These observations emphasize the importance of understanding the specific dynamics of water demand, especially during critical months when external factors like tourism can significantly influence the patterns.

Guaratuba is a seaside city and a popular destination for temporary residents during the summer, especially in December and January when schools are closed, and people flock to the beaches to enjoy the warm weather. As a result, the demand for water reaches its peak during this period, and the lack of available data representing the number of incoming tourists poses a significant constraint to this study. Although calendar attributes have been incorporated to mitigate this limitation, they offer only a broad overview of temporal patterns, underscoring the need for more fine-grained information such as daily tourist arrivals. Kofinas et al. (2016) exemplify in their research the effectiveness of integrating detailed tourist influx data, offering a more nuanced and accurate representation of the underlying dynamics. The integration of such fine-grained information holds the potential to significantly enhance the models’ performance, particularly during the peak summer season.

Finally, Figure 6 presents a comparison between the actual and predicted urban water demand for the test set. The scatter plots depict the accuracy of the predictions, where a higher concentration of points around the line of identity indicates a stronger agreement between observed and predicted data. These findings reinforce the superior predictive capacity of the MLP and SVR models in contrast to kNN and LR. The scores, prominently displayed in the title of each image, are 0.76, 0.72, 0.67, and 0.59 for MLP, SVR, kNN, and LR, respectively. Following the conventional interpretation of such scores reported by Schober et al. (2018), kNN and LR exhibit a moderate correlation (0.40–0.69), while MLP and SVR demonstrate a strong correlation (0.70–0.89). These detailed assessments provide an in-depth comprehension of the models’ performance and contribute to a more knowledgeable evaluation of their predictive capabilities.
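The interpretation bands applied to these scores can be expressed as a simple lookup. The sketch below also shows how a Pearson correlation coefficient between observed and predicted values would be computed; treating the reported score as a Pearson coefficient is an assumption made purely for illustration, as is every name in the code. Only the two bands quoted in the text are encoded.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_band(score):
    """Label a score using only the two bands quoted in the text."""
    if 0.40 <= score < 0.70:
        return "moderate"
    if 0.70 <= score < 0.90:
        return "strong"
    return "outside the ranges quoted in the text"


# Scores reported for the best model of each algorithm.
reported = {"MLP": 0.76, "SVR": 0.72, "kNN": 0.67, "LR": 0.59}
labels = {model: correlation_band(score) for model, score in reported.items()}

# Hypothetical perfectly linear example: the coefficient equals 1.
r_perfect = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```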
Figure 5

Daily forecasts in the test set (2019), showcasing the best-performing model from each algorithm.

Figure 6

Scatter plot comparing observed and predicted water demand values in the test set, showcasing the best-performing model from each algorithm.

This research compared four machine learning algorithms for predicting daily urban water demand in Guaratuba, a Brazilian coastal city with a fluctuating population. The analysis was conducted using a historical dataset containing meteorological, calendar, and water demand information from 2016 to 2019, which underwent preprocessing steps to enhance its quality. To assess the models’ generalization power, three time series cross-validation configurations were employed for each algorithm. To the best of our knowledge, this is the first in-depth evaluation of the effect of such choices in the field of water demand. The results highlighted the importance of using nonlinear models for short-term forecasting, with the multilayer perceptron (MLP) algorithm delivering the most accurate predictions, followed by support vector regression (SVR) and k-nearest neighbors (kNN), and with linear regression (LR) producing the poorest results. Despite producing reliable estimates, the models could probably benefit from other descriptive variables that influence water consumption, especially during the summer when the city experiences a surge in tourism. This recognition emphasizes the importance of tailoring the modeling approach to capture the unique characteristics of each city. For Guaratuba, this could involve integrating variables such as the number of tourist arrivals and hotel occupancy rates. Although information regarding the latter might be difficult to obtain on a daily basis, data regarding the former could be readily obtained from tolls located on the major highways leading to the city. Lastly, it is worth noting that all models produced better forecasts when using time series cross-validation with an expanding window. This not only demonstrates their practical utility but also suggests promising directions for future research.

The authors would like to thank SANEPAR and SIMEPAR for providing the necessary data for this study.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Bata M. 2019 Smart Water: Short-Term Forecasting Application in Water Utilities. Electronic Theses and Dissertations. 7685. https://scholar.uwindsor.ca/etd/7685.
Billings R. B. & Jones C. V. 2011 Forecasting Urban Water Demand. American Water Works Association, Denver, CO.
Bishop C. M. & Nasrabadi N. M. 2006 Pattern Recognition and Machine Learning, Vol. 4. Springer, New York.
Bontempi G., Ben Taieb S. & Borgne Y.-A. L. 2012 Machine learning strategies for time series forecasting. In: European Business Intelligence Summer School. Springer, pp. 62–77.
Brownlee J. 2017 Introduction to Time Series Forecasting with Python: How to Prepare Data and Develop Models to Predict the Future. Machine Learning Mastery.
Carvalho Junior M. 2021 Consumo e perdas no sistema de abastecimento de água de guaratuba, pontal do paraná e matinhos (litoral do paraná). Bachelor Thesis (Environmental and Sanitary Engineering B.S.), UFPR (Universidade Federal do Paraná), Curitiba, Brazil.
Cominola A., Preiss L., Thyer M., Maier H., Prevos P., Stewart R. & Castelletti A. 2023 The determinants of household water consumption: A review and assessment framework for research and practice. npj Clean Water 6 (1), 11.
Cortes C. & Vapnik V. 1995 Support-vector networks. Machine Learning 20 (3), 273–297.
de Souza Groppo G., Costa M. A. & Libânio M. 2019 Predicting water demand: A review of the methods employed and future possibilities. Water Supply 19 (8), 2179–2198.
Drucker H., Burges C. J., Kaufman L., Smola A. & Vapnik V. 1996 Support vector regression machines. In: Advances in Neural Information Processing Systems, Vol. 9. (Mozer, M. C., Jordan, M. I. & Petsche, T., eds.). MIT Press, Cambridge, MA.
Donkor E., Mazzuchi T., Soyer R. & Roberson A. 2014 Urban water demand forecasting: Review of methods and models. Journal of Water Resources Planning and Management 140 (2), 146–159.
Feurer M. & Hutter F. 2019 Hyperparameter optimization. In: Automated Machine Learning. Springer, Cham, pp. 3–33.
Flach P. 2012 Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, Cambridge.
Gautam J., Chakrabarti A., Agarwal S., Singh A., Gupta S. & Singh J. 2020 Monitoring and forecasting water consumption and detecting leakage using an iot system. Water Supply 20 (3), 1103–1113.
Ghalehkhondabi I., Ardjmand E., Young W. A. & Weckman G. R. 2017 Water demand forecasting: Review of soft computing methods. Environmental Monitoring and Assessment 189 (7), 313.
Goodfellow I., Bengio Y. & Courville A. 2016 Deep Learning. MIT Press, Cambridge, MA.
Gordeev D., Singer P., Michailidis M., Müller M. & Ambati S. 2020 Backtesting the Predictability of Covid-19. arXiv preprint arXiv:2007.11411.
Gössling S., Peeters P., Hall C. M., Ceron J.-P., Dubois G., Lehmann L. V. & Scott D. 2012 Tourism and water use: Supply, demand, and security. an international review. Tourism Management 33 (1), 1–15.
Guo G., Liu S., Wu Y., Li J., Zhou R. & Zhu X. 2018 Short-term water demand forecast based on deep learning method. Journal of Water Resources Planning and Management 144 (12), 04018076.
Hassani H. 2010 A brief introduction to singular spectrum analysis. Optimal Decisions in Statistics and Data Analysis 1, 1–11.
Haykin S. 1998 Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall PTR, USA.
Hyndman R. J. & Athanasopoulos G. 2021 Forecasting: Principles and Practice, 3rd edn. OTexts.
IBGE 2021 Censo brasileiro de 2021. Instituto Brasileiro de Geografia e Estatística, Rio de Janeiro, Brazil.
Kanakoudis V. & Gonelas K. 2014 Forecasting the residential water demand, balancing full water cost pricing and non-revenue water reduction policies. Procedia Engineering 89, 958–966. 16th Water Distribution System Analysis Conference, WDSA2014.
Kofinas D., Papageorgiou E., Laspidou C., Mellios N. & Kokkinos K. 2016 Daily multivariate forecasting of water demand in a touristic island with the use of artificial neural network and adaptive neuro-fuzzy inference system. In: 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater). IEEE, pp. 37–42.
Lewis C. D. 1982 Industrial and Business Forecasting Methods: A Practical Guide to Exponential Smoothing and Curve Fitting. Butterworth Scientific, London, UK.
Murphy K. P. 2022 Probabilistic Machine Learning: An Introduction. MIT Press, Cambridge, MA.
Niknam A., Zare H. K., Hosseininasab H., Mostafaeipour A. & Herrera M. 2022 A critical review of short-term water demand forecasting tools—what method should i use? Sustainability 14 (9), 5412.
Portal da Cidade 2022 Virada do ano em guaratuba teve cerca de 1 milhão de pessoas, segundo pm. Portal da Cidade – Guaratuba, Guaratuba, Brazil.
Rumelhart D. E., Hinton G. E. & Williams R. J. 1986 Learning representations by back-propagating errors. Nature 323 (6088), 533–536.
Schober P., Boer C. & Schwarte L. A. 2018 Correlation coefficients: Appropriate use and interpretation. Anesthesia & Analgesia 126 (5), 1763–1768.
SEED/PR 2022 Calendário Escolar. Secretaria Estadual de Educação do Estado do Paraná, Curitiba, Brazil.
Smola A. J. & Schölkopf B. 2004 A tutorial on support vector regression. Statistics and Computing 14 (3), 199–222.
Tan P.-N., Steinbach M. & Kumar V. 2016 Introduction to Data Mining. Pearson Education India.
Uber 2019 Omphalos, Uber’s Parallel and Language-Extensible Time Series Backtesting Tool. Uber Blog, São Paulo, Brazil.
Uber 2020 Building a Backtesting Service to Measure Model Performance at Uber-Scale. Uber Blog, São Paulo, Brazil.
UN-Water 2021 Un World Water Development Report 2021. United Nations Fund for Population Activities, New York, USA.
Vijai P. & Sivakumar P. B. 2018 Performance comparison of techniques for water demand forecasting. Procedia Computer Science 143, 258–266.
Wang W. & Ruf J. 2022 A note on spurious model selection. Quantitative Finance 22 (10), 1797–1800.
Zhang F. & O’Donnell L. J. 2020 Support vector regression. In: Machine Learning. Elsevier, pp. 123–140.
Zhigljavsky A. & Golyandina N. 2020 Singular Spectrum Analysis for Time Series, 2nd edn. Springer, Berlin.
Zubaidi S. L., Ortega-Martorell S., Al-Bugharbee H., Olier I., Hashim K. S., Gharghan S. K., Kot P. & Al-Khaddar R. 2020 Urban water demand prediction for a city that suffers from climate change and population growth: Gauteng province case study. Water (Switzerland) 12, 1–17.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data