ABSTRACT
Water resources management is crucial for human well-being and contemporary socio-economic development. However, the increasing use of water has led to various problems that affect its quality and availability. To address these issues, accurate forecasting of water consumption is essential for the optimal operation of water collection, treatment, and distribution systems. This study aims to compare four machine learning methods for predicting daily urban water demand in a Brazilian coastal tourist city (Guaratuba – Paraná). Historical data from the city’s water distribution system, spanning from 2016 to 2019 (1,461 measurements in total), were considered along with meteorological and calendar data to conduct the investigation. Three time series cross-validation approaches were considered for each method, thus totaling 12 evaluation settings. All models were subjected to hyperparameter optimization and evaluated using appropriate performance metrics from the literature. Results demonstrate the importance of using nonlinear models to predict short-term water demand, highlighting the problem’s complexity. From the compared models, multilayer perceptron provided the best results. Finally, regardless of the model, the best results were obtained by applying an expanding window time series cross-validation, indicating that the more historical data available, the better, in this particular case.
HIGHLIGHTS
Machine learning algorithms are appropriate for water demand estimation.
Preprocessing allows for better data quality and final predictive results.
The addition of historical data showed positive results in the case study.
A case study in a coastal tourist city, Guaratuba – Paraná – Brazil, is presented.
Artificial neural networks provided the best predictions for the case study.
INTRODUCTION
The management of water resources is critical to ensure human well-being and socio-economic development. Throughout the course of history, civilizations have flourished along river banks, underscoring the indispensable role of water to our existence. However, the growing demand for water has created several challenges that compromise the quality and availability of this vital resource (UN-Water 2021). Among the contributing factors to an increase in water demand are population growth, economic development, and changes in consumption patterns. As a consequence, accurate water demand forecasting assumes paramount importance for the optimal operation of water catchment, treatment, and distribution systems (Billings & Jones 2011).
Despite that, water demand forecasting is a complex task due to the nature of the data under analysis and the various factors influencing water consumption. Indeed, several works have been published specifically on this topic, e.g., Kanakoudis & Gonelas (2014). As outlined by Cominola et al. (2023), such factors can be organized into three main categories, namely observable, latent, and external. Observable factors include tangible aspects that are readily apparent and trackable, such as socio-demographic, and house and yard characteristics. Latent factors, on the other hand, are subjective and difficult to measure, such as consumer perception, awareness, habits, and opinions. These factors are often hidden or not immediately apparent, requiring more detailed analysis to determine their impact on water consumption. External factors include climate variables, water prices, and other extrinsic factors that can influence water consumption patterns. Although often beyond the control of the system under consideration, they can still have a significant impact on water usage.
Furthermore, it is essential to acknowledge the intricacies and nuances of each scenario under scrutiny. In coastal tourist cities, for instance, there is a clear positive correlation between the influx of tourists and the observed water consumption levels (Toth et al. 2018). The concentration of tourists in these regions can potentially lead to increased pressure on local water resources, making tourism a significant contributor to the overall water consumption (Gössling et al. 2012). However, few studies have considered water demand forecasting for cities with peculiar fluctuations in population. Almutaz et al. (2012) proposed a probabilistic forecasting model for Mecca, Saudi Arabia, a significant religious center that annually attracts millions of Islamic faith followers. Kofinas et al. (2016) offered a comparative analysis of artificial neural networks and adaptive neuro-fuzzy inference systems, focusing on a Mediterranean tourist resort characterized by arid, hot summers coinciding with substantial tourist influxes. Felfelani & Kerachian (2016) employed various artificial neural network architectures to predict water demand in Mashhad, Iran, a tourist city affected by fluctuations driven by national and religious events.
To address the complexities of water demand forecasting, a wide variety of methods have been proposed. Although there is no single convention, these approaches can be broadly divided into two main categories, namely statistical and machine learning based (Guo et al. 2018). Statistical methods, including time-series analysis and regression models, leverage historical water consumption data to discern trends and patterns. Machine learning methods have become increasingly popular for forecasting, as they can handle complex relationships and adapt to dynamic conditions. Prominent machine learning algorithms frequently employed in water demand forecasting include artificial neural networks and support vector regression. Comprehensive, in-depth reviews on the topic are presented by Donkor (2014), Ghalehkhondabi et al. (2017), de Souza Groppo et al. (2019) and, more recently, by Niknam et al. (2022).
In this paper, the specific scenario of short-term water demand forecasting in a coastal tourist city characterized by significant population variations is considered. To offer some insight into the complexity of the scenario, it is worth reporting that, despite a population of around 38 thousand inhabitants (IBGE 2021), the city experiences peaks of up to one million visitors during New Year’s Eve, as reported by local authorities (Portal da Cidade 2022). Four machine learning methods are taken into account: linear regression (LR), k-nearest neighbors (kNN), support vector regression (SVR), and multilayer perceptron (MLP). Consequently, this research contributes to the growing body of literature on water demand forecasting, focusing on the unique challenges posed by such dynamic scenarios. The experimental setup employed in this study, which covers the two main time series cross-validation approaches, can be used as a reference (testbed) for future research. Finally, the findings may also have practical implications, supporting the development of more accurate water demand forecasting models for water utility companies, and benefiting all stakeholders engaged in water resource management.
MATERIALS AND METHODS
In this section, a comprehensive overview of the study area and the data utilized in the research is provided, including its characteristics and the preprocessing techniques employed to enhance its quality. Subsequently, the methods used in the experiments are described, along with details on how the models were trained and evaluated.
Study area and data
The city of Guaratuba is situated on the coast of the state of Paraná, Brazil (Figure 1). According to estimates from IBGE (2021), the city has a population of 37,974 inhabitants. During the summer months (December to March), the city experiences a substantial influx of tourists, which alters water consumption patterns and leads to an increase in per capita dues (Carvalho Junior 2021). For this research, two datasets related to the city’s water distribution system (WDS) were available, but only one was chosen for further analysis. The additional data collected to model these consumption patterns can be classified into two categories: meteorological and calendar data.
The meteorological data used in this study were sourced from SIMEPAR, the company responsible for providing the state with meteorological, hydrological, and environmental data. The dataset includes measurements of temperature, radiation, relative humidity, and precipitation, and was collected from May 2015 to November 2020 at a 15-minute frequency. As reported by Billings & Jones (2011), these climatic factors play a significant role in generating the seasonal components of water usage. It is worth noting that these observed values act as an ideal meteorological forecast during model development, whereas only estimates of these values can be obtained when the model is employed in real-world scenarios.
The calendar data was collected in an attempt to model the city’s influx of tourists. Previous research showed that in 2012, the number of tourists visiting Paraná’s coast throughout the year was 2,597,392 (Carvalho Junior 2021). As such, data regarding holidays and school recesses were gathered. Holidays were selected for Guaratuba, as well as for the two largest nearby cities, Curitiba and Joinville, each with populations exceeding half a million people. The information was acquired from a web API. School recess data was collected considering solely the state of Paraná from the website of the state’s education and sports department (SEED/PR 2022). In addition to summer and winter holidays, recesses during the term period were also taken into account.
Table 1 presents a summary of the raw data utilized in this study. Prior to any analysis, the data was carefully selected, extracted, and transformed into a set of features that are appropriate for the intended use case. Despite the laborious and time-consuming nature of these steps, it is imperative to understand and prepare the raw data in a format suitable for further analysis (Tan et al. 2016). Among the preprocessing steps taken are: selecting the target attribute based on data quality characteristics, feature extraction and transformation, imputing missing values, reducing noise in the target variable, and encoding and normalizing categorical and numerical attributes, respectively.
Main characteristics of the data
Raw data | Category | Measurement frequency | Source |
---|---|---|---|
Water produced in the WTPs | Target | 1 day | SANEPAR |
Water consumed from the reservoirs | Target | 1 day | SANEPAR |
Temperature | Meteorological | 15 minutes | SIMEPAR |
Radiation | Meteorological | 15 minutes | SIMEPAR |
Relative humidity | Meteorological | 15 minutes | SIMEPAR |
Precipitation | Meteorological | 15 minutes | SIMEPAR |
Holidays | Calendar | – | – |
School recesses (Paraná State) | Calendar | – | (SEED/PR 2022) |
To impute the missing values (only five were found in the meteorological dataset), the kNN (k-Nearest Neighbors) algorithm with Euclidean distance and k=5 was employed. The influence of each neighbor was weighted by the inverse of its distance. The water demand time series was subjected to noise reduction using the singular spectrum analysis (SSA) technique, which has become a standard tool in the analysis of meteorological and geophysical time series (Zhigljavsky & Golyandina 2020; Zubaidi et al. 2020). The SSA configuration followed the methodology outlined in Hassani (2010). A window length (L) of 560 was chosen to compute the trajectory matrix. This size strikes a balance, being sufficiently large to capture significant patterns while remaining smaller than half of the entire train data (730 data points). This matrix forms the basis for applying singular value decomposition (SVD). Following SVD, meaningful components were identified by interpreting singular vectors, guided by a visual analysis of w-correlations to determine the parameter r. The first 108 eigentriples were selected, representing essential patterns, while the rest were considered noise. The final step involves reconstructing the one-dimensional series using the chosen eigenvectors. Finally, categorical attributes were encoded by converting each label to an integer value, whereas numerical attributes were normalized between 0 and 1.
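The basic SSA steps described above (embedding into a trajectory matrix, SVD, keeping the leading eigentriples, and diagonal averaging) can be sketched as follows. This is a minimal illustration that assumes NumPy is available; `ssa_smooth` and the synthetic series are hypothetical and are not the study's code. The study used L = 560 and r = 108, whereas much smaller values are used here so that the example runs on a short series.

```python
import numpy as np

def ssa_smooth(series, window, n_components):
    """Basic SSA: embed, decompose with SVD, keep leading eigentriples, diagonal-average."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = min(n_components, s.size)
    # Rank-r approximation built from the first r eigentriples.
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Diagonal averaging maps the matrix back to a one-dimensional series.
    smooth = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        smooth[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return smooth / counts

# Synthetic example (illustrative data only): a sine wave plus white noise.
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + rng.normal(scale=0.3, size=t.size)
smoothed = ssa_smooth(noisy, window=40, n_components=2)
```

Keeping every component reproduces the input series exactly, so the number of retained eigentriples is what controls how much of the series is treated as noise.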
Supervised learning algorithms
Time series forecasting, the objective of this work, can be approached as a regression task and, thus, allows the use of supervised learning methods (Bontempi et al. 2012; Brownlee 2017). In this context, historical time series observations become the target attribute of the dataset, alongside other input features. A model is then generated by exploring the relationship between the set of input variables and the output, which can be used for one-step or multi-step prediction. This section covers the supervised learning methods used to tackle the time series forecasting task of this study.
Linear regression




k-Nearest neighbours
The kNN algorithm is a nonparametric technique that is both simple and powerful. Unlike parametric models, this algorithm does not rely on any assumptions about the underlying data distribution, and instead derives its model structure from the training dataset (Murphy 2022). The method stores the inputs X and outputs y of the training set, and when attempting to predict the target value of a new object, it searches for the nearest k instances in the stored set and returns the associated regression target. The returned value is usually the average of the target attribute from the k nearest instances, which can be weighted by the inverse of their distance to the new object, giving more importance to the closest points. The performance of the kNN algorithm is sensitive to the choice of k and the distance metric adopted between instances (Flach 2012). To ensure accurate predictions, it may also be necessary to scale the attributes to a similar range.
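The prediction rule described above can be illustrated with a few lines of plain Python. This is a minimal sketch assuming Euclidean distance; `knn_predict` and the toy data are hypothetical and do not reproduce the scikit-learn implementation used in the study.

```python
import math

def knn_predict(train_x, train_y, query, k=3, weighted=True):
    """Predict a target as the (optionally inverse-distance weighted) mean of the k nearest targets."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)
    # An exact match takes all the weight, which avoids dividing by zero.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Toy data (illustrative only): one input feature, one target.
train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 10.0, 20.0, 100.0]
uniform_pred = knn_predict(train_x, train_y, (1.25,), k=2, weighted=False)
weighted_pred = knn_predict(train_x, train_y, (1.25,), k=2, weighted=True)
```

With inverse-distance weighting, the prediction is pulled toward the target of the closer neighbor, whereas the uniform version returns the plain average of the two.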
Support vector regression
Initially developed for classification tasks under the name of support vector machine (SVM) and later generalized for regression, SVR is one of the most elegant regression methods (Cortes & Vapnik 1995; Drucker et al. 1996). The algorithm seeks a function (a hyperplane in feature space) that is as flat as possible while deviating from the actual training values by at most ε (Smola & Schölkopf 2004). It does not penalize errors as long as they are smaller than ε, giving the flexibility to define how much error is acceptable in the model. The hyperplane and ε define a region called the ε-insensitive tube, and the support vectors are those data points that lie on or outside its boundary. Hence, SVR computes a hyperplane whose ε-insensitive tube is as narrow as possible while containing as many training samples as possible, with the error term handled in the constraints of the optimization problem (Zhang & O’Donnell 2020). The ‘kernel trick’, which implicitly maps the data to a higher-dimensional space, is used to capture nonlinear relationships between variables. For further mathematical details, references such as Bishop & Nasrabadi (2006) and Murphy (2022) can be consulted.
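The role of ε can be made concrete with the standard ε-insensitive loss, under which residuals inside the tube cost nothing and residuals outside it are penalized linearly. The snippet below is a minimal sketch with a hypothetical helper name and illustrative numbers.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the tube, linear in the excess outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

# Illustrative values: the first residual (0.05) falls inside the tube,
# the second residual (0.5) exceeds epsilon and is penalized.
losses = epsilon_insensitive_loss([10.0, 10.0], [10.05, 10.5], epsilon=0.1)
```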
Multilayer perceptron



Model selection & evaluation
This section outlines the methodology adopted for selecting and evaluating machine learning models to estimate water demand in Guaratuba – PR. Initially, the preprocessed data was split into three distinct sets: train, validation, and test. The training set was exclusively used for model training, whereas the validation set was utilized to optimize model hyperparameters (i.e., model selection), and the test set to provide an unbiased evaluation of the chosen model (i.e., model evaluation), reproducing a real-world application scenario. To ensure a reasonable split that considers the annual patterns observed in water demand data, the first two years were allocated for training, the third year for validation, and the last year for testing. All models underwent the same training and evaluation process.
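The year-based split described above can be sketched as follows. This is a minimal illustration assuming one record per calendar day from 1 January 2016 to 31 December 2019; `split_by_year` is a hypothetical helper, not the code used in the study.

```python
from datetime import date, timedelta

def split_by_year(days):
    """Chronological split: 2016-2017 for training, 2018 for validation, 2019 for testing."""
    train = [d for d in days if d.year in (2016, 2017)]
    validation = [d for d in days if d.year == 2018]
    test = [d for d in days if d.year == 2019]
    return train, validation, test

# One entry per day over the four years of available data (1,461 days).
days = [date(2016, 1, 1) + timedelta(days=i) for i in range(1461)]
train, validation, test = split_by_year(days)
```

Because the split follows the calendar rather than a random shuffle, every validation and test observation is strictly later than the data used to fit the model.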
Out of the four methods used in this study, only linear regression does not have any hyperparameters. The other methods were fine-tuned using the widely adopted hyperparameter optimization technique known as grid search (Feurer & Hutter 2019). The scoring metric of choice in this study was the root mean square error (RMSE). Issues that could arise from its use, such as the heavy penalty weight assigned to larger errors, are not addressed. The search space for each method’s hyperparameters is provided in Table 2, with a reasonable range chosen for numerical hyperparameters. It is important to note that these values were determined based on literature observation and the author’s expertise, as the main focus of the research was not on determining optimal parameter values for each and every method. Logarithmic sampling was utilized for some hyperparameters to enable broader exploration. The default values specified by version 1.0.2 of the scikit-learn framework were utilized for all other parameters, with the exception of three MLP parameters: shuffle=False, early_stopping=True, and max_iter=1000.
Search space for the machine learning algorithms
Learning algorithm | Hyperparameter | Search space |
---|---|---|
Linear Regression | – | – |
k-Nearest Neighbors | n_neighbors | {3, 5, …, 29} |
 | weights | {uniform, distance} |
 | metric | {euclidean, manhattan} |
Support Vector Regression | kernel | {linear, poly, rbf, sigmoid} |
 | C | [0.001, 100], 10† |
 | epsilon | [0.001, 0.1], 10† |
 | gamma | {scale, auto} |
Multilayer Perceptron | hidden_layer_sizes | {(11,), (13,), …, (23,)} |
 | activation | {tanh, relu} |
 | solver | {lbfgs, sgd, adam} |
 | alpha | [0.01, 1.0], 10† |
 | learning_rate_init | [0.0001, 0.01], 5† |
Note: The symbol † indicates that the values are evenly spaced in logarithmic space.
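To illustrate how such a search space translates into grid search candidates, the sketch below enumerates the SVR grid. It assumes that an entry such as "[0.001, 100], 10†" denotes ten values evenly spaced in logarithmic space between the two bounds; this reading, the `logspace` helper, and the variable names are assumptions made for the example.

```python
import math
from itertools import product

def logspace(low, high, num):
    """Return `num` values evenly spaced in log10 space between low and high (inclusive)."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]

# SVR search space, following the table above.
svr_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": logspace(0.001, 100, 10),
    "epsilon": logspace(0.001, 0.1, 10),
    "gamma": ["scale", "auto"],
}

# Grid search evaluates every combination of the listed values.
candidates = [dict(zip(svr_grid, values)) for values in product(*svr_grid.values())]
```

Under this reading, each candidate would be fitted on the training set and scored by RMSE on the validation set, and the configuration with the lowest score would be retained.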
Time series cross-validation
When evaluating machine learning models, traditional estimators like k-fold cross-validation are not appropriate for dealing with time series data. This is because these methods assume that observations are independent and identically distributed, which is not valid for time series data where the temporal order in which the values were recorded must be respected (Brownlee 2017). To address this issue, researchers have recently adopted a procedure called time series cross-validation, although naming conventions are still being established (Uber 2019, 2020; Gordeev et al. 2020).
Time series cross-validation involves evaluating a model in various time periods, simulating a real-world application scenario (Hyndman & Athanasopoulos 2021). In this process, there are multiple test sets, and the corresponding training set comprises only observations that occurred prior to those in the test set. The performance metric of interest is computed by averaging the results for each iteration. This outcome can be utilized to evaluate a model, identify optimal parameters, and measure prediction volatility over time (Gordeev et al. 2020). Compared to a single holdout split, this approach provides a more consistent performance estimate, applying cross-validation logic while still respecting the temporal order of the data. Time series cross-validation can be categorized into two types: sliding window and expanding window.
Visualization of the time series cross-validation behaviors with sliding window (a) and expanding window (b). The training set is represented in green, while the test set is in red: (a) Time series cross-validation using a sliding window; (b) Time series cross-validation using an expanding window.
For this study, a total of 12 models were created by utilizing three time series cross-validation configurations for each learning method. The configurations comprised two sliding windows with training window sizes of 365 and 730 days (one and two years, respectively), and one expanding window with an initial size of 730 days (two years), which grew at each iteration. A test window size and sliding step of 7 days were used for all three configurations since it was the smallest pattern expected from the daily water demand data. That is, all models provide predictions in a weekly fashion, forecasting water demand for the next 7 days ahead.
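The difference between the sliding and expanding configurations can be sketched with a small index generator. This is a minimal illustration in which `time_series_splits` is a hypothetical helper and the sizes are toy values; how the folds were laid out over the study's data is an assumption.

```python
def time_series_splits(n_samples, initial_train, test_size=7, step=7, expanding=False):
    """Yield (train_indices, test_indices) pairs in which training always precedes testing."""
    train_end = initial_train
    while train_end + test_size <= n_samples:
        # Expanding window keeps all past data; sliding window keeps a fixed-size history.
        train_start = 0 if expanding else train_end - initial_train
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        train_end += step

# Toy example: 30 daily observations, 14-day initial training window, 7-day test window.
sliding = list(time_series_splits(30, 14))
expanding = list(time_series_splits(30, 14, expanding=True))
```

In both variants the test window advances by seven days per iteration; only the start of the training window differs, staying fixed at the first observation in the expanding case.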
Performance metrics
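The models are assessed with the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). The sketch below assumes the usual textbook definitions of these metrics; the function names and sample values are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error (%); assumes no actual value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```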


RESULTS AND DISCUSSION
The present section outlines the outcomes of this study. It begins by discussing the selection of the target attribute, followed by presenting the final dataset used for analysis that resulted from preprocessing. An overview of the performance of the models employed in the research is then provided. Subsequently, an in-depth analysis of the best-performing model from each learning algorithm is presented. Model selection results (optimal hyperparameters and model performance on the validation set) are included in the Supplementary material for ease of reference.
Selection of target attribute
To identify the most suitable predictive/target attribute for the analysis, two time series collected on a daily basis over four years from 2016 to 2019 were explored, comprising a total of 1,461 observations each. These time series represent water produced in the WTP and water consumed in the reservoirs. For ease of reference, the former will be referred to as ‘water produced’ and the latter as ‘water consumed’. Figure 3 depicts water produced in the WTPs and water consumed in the reservoirs of Guaratuba during the four years of available data: 2016 (upper left), 2017 (upper right), 2018 (lower left) and 2019 (lower right). Pearson’s correlation coefficient between the two variables for each year (assuming the same order as before) is 0.84, 0.67, 0.86, and 0.82, respectively. Over all years the correlation is 0.80. It is noteworthy that, as stated by Billings & Jones (2011), the system’s water demand and total water production in a public supply are conceptually equivalent. This means that the analysis is valid regardless of which time series is used. The ultimate choice of the most suitable predictive/target attribute was based on the quality of the data, particularly its completeness and coherence.
Relationship between the produced and consumed water for Guaratuba (years: 2016–2019).
To measure the strength of association between the two time series, Pearson’s correlation coefficient was computed. The results indicated a strong positive linear relationship, with a correlation coefficient of 0.80 over the four-year period. This suggests that both variables exhibit similar underlying patterns, with discrepancies likely arising from water losses. Indeed, research indicates substantial water wastage through leakage and inefficiencies within distribution networks (Gautam et al. 2020). The largest differences appear from August to October 2017. Given the low quality of the water consumed time series, the decision was made to exclude it from further analysis and focus solely on the water produced data as the target variable.
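For reference, Pearson's correlation coefficient between two series can be computed as in the sketch below. The `pearson` helper and the sample values are illustrative and are not the study's data.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Illustrative daily volumes (not real measurements).
produced = [10.0, 12.0, 15.0, 11.0]
consumed = [8.0, 9.5, 12.0, 9.0]
r = pearson(produced, consumed)
```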
Preprocessing
The raw data underwent various steps to transform it into a cleaned and usable dataset for further analysis. Initially, specific information was extracted from timestamps to generate meaningful features that can establish simple and strong relationships between inputs and outputs (Brownlee 2017). These features include year, month, day, day of the week, weekend, and season of the year. Ordinal attributes were converted to integers, and the encoded data represented a sequence of labels, while binary attributes were represented as integers with only two possible values, 0 or 1. These features are crucial to the learning algorithm’s ability to model the data accurately, especially given its cyclic patterns.
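The timestamp-derived features listed above can be extracted as in the following sketch. The helper name, the integer codes, and the month-based Southern Hemisphere season mapping (December to February as summer) are assumptions made for illustration.

```python
from datetime import date

# Assumed integer codes for the seasons: 0 = summer, 1 = autumn, 2 = winter, 3 = spring.
def season_code(month):
    """Map a month to a Southern Hemisphere season code (Dec-Feb = summer)."""
    return (month % 12) // 3

def calendar_features(day):
    """Build the calendar attributes for a single daily record."""
    weekday = day.weekday()  # Monday = 0, ..., Sunday = 6
    return {
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "day_of_week": weekday,
        "is_weekend": int(weekday >= 5),
        "season": season_code(day.month),
    }

features = calendar_features(date(2019, 1, 5))
```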
To account for the significant impact of Carnival on the city’s water demand, a special feature was created for this period. In addition to the official Carnival holidays (which comprise Monday, Tuesday, and Wednesday), the dates for Friday, Saturday, and Sunday before the official days were also included to cover the entire festive period. The local holidays of Guaratuba, Curitiba, and Joinville were merged into a single attribute, as they showed high similarity with a Jaccard similarity coefficient of 0.88. These holiday-related features are binary and represented by an integer variable that takes on values of either 0 (not holiday) or 1 (holiday).
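The Jaccard similarity coefficient between two sets of holiday dates, and the merged binary holiday flag, can be sketched as follows. The dates below are placeholders rather than the actual municipal calendars, and the helper names are hypothetical; how the coefficient was computed across the three cities is not reproduced here.

```python
from datetime import date

def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Placeholder holiday calendars for two cities (illustrative dates only).
city_a = {date(2019, 1, 1), date(2019, 4, 21), date(2019, 9, 7)}
city_b = {date(2019, 1, 1), date(2019, 9, 7), date(2019, 9, 8)}
similarity = jaccard(city_a, city_b)

# Merged attribute: a day is flagged if it is a holiday in any of the calendars.
merged = city_a | city_b

def is_holiday(day):
    return int(day in merged)
```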
The meteorological data, which included temperature, radiation, relative humidity, and precipitation, underwent three steps: resampling, imputation, and scaling. First, the 15-minute data was resampled daily by calculating the daily mean and standard deviation for each variable. Next, missing data (only five samples) was imputed using the kNN algorithm with Euclidean distance and k=5, where neighbors were weighted by the inverse of their distance. Finally, the data was normalized between 0 and 1 to ensure consistency in the analysis.
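The resampling and scaling steps can be sketched in plain Python as follows. The helper names and readings are illustrative, and the use of the population standard deviation is an assumption, since the estimator used in the study is not stated.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

def daily_mean_std(readings):
    """Aggregate (timestamp, value) readings into a per-day (mean, standard deviation)."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    # Population standard deviation is an assumption made for this sketch.
    return {day: (mean(vals), pstdev(vals)) for day, vals in sorted(by_day.items())}

def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative 15-minute temperature readings (not real measurements).
readings = [
    (datetime(2019, 1, 1, 0, 0), 20.0),
    (datetime(2019, 1, 1, 0, 15), 22.0),
    (datetime(2019, 1, 2, 0, 0), 25.0),
    (datetime(2019, 1, 2, 0, 15), 25.0),
]
daily = daily_mean_std(readings)
scaled_means = min_max_scale([stats[0] for stats in daily.values()])
```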
Lastly, SSA was applied to smooth the selected water demand time series. The residuals followed a normal distribution centered at zero, suggesting that the smoothing process effectively reduced noise and that the resulting smoothed curve describes the signal. The final dataset for analysis comprises 18 attributes, including the target variable, with daily observations from 2016 to 2019. Of these attributes, eight were categorical and 10 were numerical. Table 3 provides a summary of the attributes in the final dataset, with the target variable listed at the bottom for easy identification. Subsequently, an examination of the overall model performance is conducted. For selected hyperparameters and model performance on the validation set, please refer to the Supplementary material.
Attributes that compose the database used for analysis
Attribute | Description | Type |
---|---|---|
temperature_mean | Daily average temperature (°C) | Numerical |
temperature_std | Daily standard deviation of temperature (°C) | Numerical |
radiation_mean | Daily average radiation (W/m²) | Numerical |
radiation_std | Daily standard deviation of radiation (W/m²) | Numerical |
relative_humidity_mean | Daily average relative humidity (%) | Numerical |
relative_humidity_std | Daily standard deviation of relative humidity (%) | Numerical |
precipitation_mean | Daily average precipitation (mm) | Numerical |
precipitation_std | Daily standard deviation of precipitation (mm) | Numerical |
year | Year of the record | Numerical |
month | Month of the record | Categorical |
day | Day of the record | Categorical |
day_of_week | Day of the week of the record | Categorical |
is_weekend | Indicates whether it is a weekend or not | Categorical |
season | Season of the record | Categorical |
is_holiday_ctba_gtba_jve | Indicates whether it is a public holiday | Categorical |
is_carnival | Indicates if it is carnival week | Categorical |
is_school_recess_pr | Indicates whether it is recess or school break | Categorical |
water_produced | Water produced by the WTPs of Guaratuba (m³) | Numerical |
Overall performance
The performance of four different machine learning algorithms – LR, kNN, SVR, and MLP – in predicting water demand was assessed. The models were trained using the optimal hyperparameters (as obtained with the validation set) and evaluated using three error metrics, RMSE, MAE, and MAPE. Table 4 summarizes the results obtained on the test set. The table is organized into three sections based on the time series cross-validation method used: expanding window (EW), one-year size sliding window (SW-1Y), and two-year size sliding window (SW-2Y). Performance measures were computed for each cross-validation iteration and averaged across rounds to estimate each model’s performance. Standard deviations are reported on the right side of each metric. The model with the lowest error for each learning algorithm is highlighted in bold for each metric. The results are analyzed from two perspectives: firstly, based on the learning algorithm, and secondly, considering the time-series cross-validation method.
Performance of the 12 models in the test set
Model | RMSE | σ | MAE | σ | MAPE (%) | σ (%) |
---|---|---|---|---|---|---|
EW | ||||||
LR | 2148.36 | 929.992 | 1832.11 | 883.323 | 15.03 | 6.507 |
kNN | 1866.11 | 972.392 | 1590.67 | 948.388 | 12.30 | 5.078 |
SVR | 1852.52 | 647.234 | 1525.26 | 544.143 | 12.12 | 4.373 |
MLP | 1709.70 | 593.104 | 1408.97 | 515.158 | 11.81 | 4.707 |
SW-1Y | ||||||
LR | 2403.62 | 1010.830 | 2093.66 | 955.716 | 16.01 | 5.430 |
kNN | 1906.99 | 969.900 | 1618.19 | 954.202 | 12.47 | 5.055 |
SVR | 2022.69 | 846.400 | 1685.90 | 744.767 | 13.08 | 4.318 |
MLP | 2142.58 | 927.998 | 1796.94 | 855.901 | 14.11 | 5.841 |
SW-2Y | ||||||
LR | 2178.25 | 870.311 | 1870.36 | 814.996 | 14.88 | 5.341 |
kNN | 1968.28 | 1008.130 | 1668.98 | 963.966 | 12.98 | 5.370 |
SVR | 1943.40 | 783.327 | 1611.67 | 648.252 | 12.67 | 4.558 |
MLP | 1927.41 | 732.005 | 1620.15 | 630.532 | 13.08 | 4.916 |
With respect to the learning algorithm, MLP performed best, with a test RMSE of 1,709.70 and a test MAE of 1,408.97. The latter indicates that the average absolute difference between the actual and predicted daily water demand values is 1,408.97 m³; for reference, Figure 3 shows water demand values as high as 25,000 m³. In order of increasing RMSE, the other methods rank as follows: SVR (RMSE = 1,852.52), kNN (RMSE = 1,866.11), and LR (RMSE = 2,148.36). From a more interpretable perspective, the models achieved MAPE values ranging from 11.81 to 16.01%. Following established standards for predictive model quality, a MAPE within the 10–20% range is considered good, while anything below 10% indicates highly accurate forecasting (Lewis 1982). Consequently, the methods employed are considered sufficient for generating accurate water demand predictions, though there is room for further enhancement.
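As an illustration of how the three error metrics and their fold-wise summary can be computed, the following is a minimal sketch. The function names and toy values are hypothetical and are not taken from the study; the use of the sample standard deviation across folds is also an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here m³)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (here m³)."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); assumes no observed value is zero."""
    relative = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(relative) / len(relative)


def summarize_folds(fold_scores):
    """Mean and sample standard deviation of one metric across CV rounds."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


# Toy values for illustration only (not data from the study).
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"MAPE = {mape(observed, predicted):.2f}%")
```

Because RMSE squares the errors before averaging, it penalizes large deviations more heavily than MAE, which is why the two metrics can rank models differently.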
The superior performance of MLP, SVR, and kNN relative to LR, with only marginal differences among the three, can be attributed to their ability to model complex, nonlinear relationships. According to Niknam et al. (2022), the first two algorithms rank among the top machine learning algorithms for this task. In contrast, although kNN offers a good trade-off between performance and complexity, it is not commonly used in the field of water demand, leaving room for further exploration. This method typically requires less data than other algorithms and exhibits stable metrics across all three configurations employed in this analysis. Ultimately, the key limitation of LR is that it is the only learning algorithm evaluated in this study that cannot model nonlinear relationships. Prior research has already emphasized that nonlinear models play an important role in predicting water demand (Adamowski et al. 2012; Vijai & Sivakumar 2018; Bata 2019).
With respect to the time series cross-validation, the analysis of the three configurations underscores the nuanced relationship between model performance and temporal dynamics. The results reveal that models using the expanding window outperformed those using the sliding window. Each learning algorithm had its best model using this configuration, suggesting that, in this specific case, a larger historical dataset leads to improved performance. For instance, the best-performing method using the expanding window was MLP (RMSE = 1,709.70), while LR had the worst performance (RMSE = 2,148.36). For the sliding window with a window size of 365 days, kNN performed the best (RMSE = 1,906.99), and LR performed the worst (RMSE = 2,403.62). All methods, except kNN, had their worst performance in this configuration. Finally, for the sliding window with a window size of 730 days, MLP performed the best (RMSE = 1,927.41), and LR performed the worst (RMSE = 2,178.25).
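The difference between the expanding and sliding window configurations can be sketched as two index generators. This is a minimal illustration under stated assumptions: the function names, the initial training size, and the test block size are hypothetical, and the exact fold layout used in the study may differ.

```python
def expanding_window_splits(n_samples, initial_train_size, test_size):
    """Yield (train, test) index ranges; the training set grows every round."""
    start = initial_train_size
    while start + test_size <= n_samples:
        # Training always begins at index 0, so all past data is kept.
        yield range(0, start), range(start, start + test_size)
        start += test_size


def sliding_window_splits(n_samples, window_size, test_size):
    """Yield (train, test) index ranges; the training set has a fixed length."""
    start = window_size
    while start + test_size <= n_samples:
        # Only the most recent `window_size` observations are used for training.
        yield range(start - window_size, start), range(start, start + test_size)
        start += test_size


# Toy example with 10 observations (illustrative sizes only).
for train, test in expanding_window_splits(10, initial_train_size=4, test_size=2):
    print("EW train:", list(train), "test:", list(test))

for train, test in sliding_window_splits(10, window_size=4, test_size=2):
    print("SW train:", list(train), "test:", list(test))
```

With daily data, setting `window_size` to 365 or 730 would correspond to the SW-1Y and SW-2Y configurations. In both schemes every test index lies after all training indices, so the temporal order of the series is preserved.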
It is worth noting that in certain situations, simply increasing the amount of historical data available does not necessarily improve a model’s performance. This phenomenon has been observed in previous studies, which have shown that the data can undergo significant shifts in water consumption patterns, such as the adoption of a novel hygiene behavior (Bata 2019). To address this issue, smaller sliding windows for training may be more beneficial as they allow for disregarding old behaviors that are no longer relevant. A prime example of this is the COVID-19 pandemic, which caused significant changes in water consumption patterns. For instance, Lüdtke et al. (2021) found that during the first lockdown in 2020, daily water consumption in northern Germany was 14.3% higher than during the same time period in previous years.
Therefore, it is crucial to note that these results, as with any analysis, should be interpreted with caution, as they may differ depending on the specific location and data collected. For instance, the present study only examines data from the past four years, whereas other studies, such as Adamowski et al. (2012), consider data from nine years. The effects of different time series cross-validation configurations would likely be more pronounced if the data were collected over a longer period. Nevertheless, many studies in the field of water demand forecasting often focus solely on hold-out strategies, neglecting to consider the limitations inherent in these traditional model evaluation techniques. For practitioners contemplating the adoption of cross-validation, it is crucial not to overlook the temporal structure of the data. Failing to account for this aspect may lead to an overestimation of the model’s performance (Wang & Ruf 2022).
Selected models’ performance
The preceding section offered a broad view of the models’ overall performance. To delve deeper, a thorough analysis was conducted for the best model associated with each machine learning algorithm. The results are presented in Figure 4 in the form of box plots, which summarize the absolute error of each model in the test set across different months. Interestingly, all models exhibited their lowest and highest monthly performance in December and September, respectively.
Box plot illustrating the absolute error (in m³) for each month in the test set (2019), showcasing only the best-performing model from each algorithm. The symbol ° denotes the mean value.
In September, the models performed remarkably well with MAE values of 829.07, 760.91, 1,014.57, and 812.52, in the same order as previously mentioned. The models exhibited greater consistency during this month, with an MAE difference of up to only 253.66. The reason behind this remains unclear as no discernible patterns have been discovered. Furthermore, the month of October also exhibited good overall performance, while the months from March to August fall between the strong performance of September and October and the lower performance observed in December and January. It is crucial, however, to acknowledge the limitations imposed by the limited size of the test dataset, spanning only one year. This limitation hinders a comprehensive understanding of whether these observed trends are sporadic or indicative of a less chaotic water demand during these months.
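A monthly breakdown of this kind can be obtained by grouping the absolute errors by calendar month before averaging. The sketch below assumes hypothetical inputs (a list of dates with matching observed and predicted values) and toy numbers, not data from the study.

```python
from collections import defaultdict
from datetime import date


def monthly_mae(dates, y_true, y_pred):
    """Return {month number: mean absolute error} for the given forecasts."""
    errors_by_month = defaultdict(list)
    for day, observed, predicted in zip(dates, y_true, y_pred):
        errors_by_month[day.month].append(abs(observed - predicted))
    return {
        month: sum(errors) / len(errors)
        for month, errors in sorted(errors_by_month.items())
    }


# Toy values for illustration only.
days = [date(2019, 9, 1), date(2019, 9, 2), date(2019, 12, 24), date(2019, 12, 25)]
observed = [9000.0, 9200.0, 21000.0, 24000.0]
predicted = [9500.0, 9000.0, 18000.0, 20000.0]

print(monthly_mae(days, observed, predicted))
```

The same grouped error lists could be passed to a plotting library to draw one box per month, as in Figure 4.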
Despite that, the models’ predictions during September and December were further explored, as illustrated by Figure 5. The central section of the figure depicts the outcomes for the entire test set, while the upper section provides a detailed view of the least accurate month (December), and the lower section zooms in on the month of peak performance (September). The predictions for September reveal a strong weekly seasonality in the water demand data, which repeats four times during the month. However, this pattern is not evident in December, when the models struggle to keep up with the target variable values, with significant errors occurring around the 13th and 18th of the month. Large errors are also apparent from the 26th onward, but only for the LR and kNN models. These observations emphasize the importance of understanding the specific dynamics of water demand, especially during critical months when external factors like tourism can significantly influence the patterns.
Guaratuba is a seaside city and a popular destination for temporary residents during the summer, especially in December and January when schools are closed, and people flock to the beaches to enjoy the warm weather. As a result, the demand for water reaches its peak during this period, and the lack of available data representing the number of incoming tourists poses a significant constraint to this study. Although calendar attributes have been incorporated to mitigate this limitation, they offer only a broad overview of temporal patterns, underscoring the need for more fine-grained information such as daily tourist arrivals. Kofinas et al. (2016) exemplify in their research the effectiveness of integrating detailed tourist influx data, offering a more nuanced and accurate representation of the underlying dynamics. The integration of such fine-grained information holds the potential to significantly enhance the models’ performance, particularly during the peak summer season.
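To illustrate why calendar attributes give only a coarse view of tourism, the sketch below derives a few such attributes from a date. The feature names are hypothetical and are not the attributes used in the study; the peak-season flag simply encodes the December and January period described above.

```python
from datetime import date


def calendar_features(day):
    """Derive simple calendar attributes from a date (hypothetical feature set)."""
    return {
        "day_of_week": day.weekday(),        # Monday = 0, Sunday = 6
        "month": day.month,
        "is_weekend": day.weekday() >= 5,    # Saturday or Sunday
        "is_peak_season": day.month in (12, 1),  # summer school holidays
    }


print(calendar_features(date(2019, 12, 25)))
print(calendar_features(date(2019, 9, 7)))
```

Every day in December receives the same `is_peak_season` value regardless of how many visitors actually arrive, which is the gap that a daily tourist-arrival variable would fill.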


Daily forecasts in the test set (2019), showcasing the best-performing model from each algorithm.
Scatter plot comparing observed and predicted water demand values in the test set, showcasing the best-performing model from each algorithm.
CONCLUSIONS
This research compared four machine learning algorithms for predicting daily urban water demand in Guaratuba, a Brazilian coastal city with a fluctuating population. The analysis was conducted using a historical dataset containing meteorological, calendar, and water demand information from 2016 to 2019, which underwent preprocessing steps to enhance its quality. To assess the models’ generalization power, three time series cross-validation configurations were employed for each algorithm. To the best of our knowledge, this is the first in-depth evaluation of the effect of such choices in the field of water demand. The results highlighted the importance of using nonlinear models for short-term forecasting, with the multilayer perceptron (MLP) algorithm delivering the most accurate predictions, followed by support vector regression (SVR) and k-nearest neighbors (kNN), while the linear regression (LR) algorithm produced the poorest results. Despite producing reliable estimates, the models could probably benefit from other descriptive variables that influence water consumption, especially during the summer when the city experiences a surge in tourism. This recognition emphasizes the importance of tailoring the modeling approach to capture the unique characteristics of each city. For Guaratuba, this could involve integrating variables such as the number of tourist arrivals and hotel occupancy rates. Although information regarding the latter might be difficult to obtain on a daily basis, data regarding the former could be readily obtained from tolls located on the major highways leading into the city. Lastly, it is worth noting that all models produced better forecasts when using time series cross-validation with an expanding window. This not only demonstrates their practical utility but also suggests promising directions for future research.
ACKNOWLEDGEMENTS
The authors would like to thank SANEPAR and SIMEPAR for providing the necessary data for this study.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.