Abstract
This paper presents a backpropagation neural network (BPNN) approach based on the sparse autoencoder (SAE) for short-term water demand forecasting. In this method, the SAE is used as a feature learning method to extract useful information from hourly water demand data in an unsupervised manner. After that, the extracted information is employed to optimize the initial weights and thresholds of the BPNN. In addition, to enhance the effectiveness of the proposed method, data reconstruction is implemented to create suitable samples for the BPNN, and the early stopping method is employed to overcome the BPNN overfitting problem. Data collected from a real-world water distribution system are used to verify the effectiveness of the proposed method, and a comparison with the BPNN and other BPNN-based methods which integrate the BPNN with particle swarm optimization (PSO) and the mind evolutionary algorithm (MEA), respectively, is conducted. The results show that the proposed method can achieve fairly accurate and stable forecasts, with a 2.31% mean absolute percentage error (MAPE) and a 320 m³/h root mean squared error (RMSE). Compared with the BPNN, PSO–BPNN and MEA–BPNN models, the proposed method gains MAPE improvements of 5.80, 3.33 and 3.89%, respectively. In terms of the RMSE, promising improvements (i.e., 5.27, 2.73 and 3.33%, respectively) can be obtained.
HIGHLIGHTS
To enhance the performance of the BPNN, the SAE is introduced to extract useful features in an unsupervised manner.
An effective framework which integrates the BPNN with the SAE and early stopping technique is proposed for water demand forecasting.
The proposed method is verified by comparing with the BPNN and similar methods which integrate the BPNN with PSO and the MEA, respectively.
INTRODUCTION
Water demand forecasting is the basis of smart scheduling for water distribution systems. Since prediction accuracy can directly affect the reliability and practicability of management decisions, reliable and accurate forecasts are of significance for effective water management. According to different forecast horizons and forecast frequencies, water demand forecasting can be divided into long-term, medium-term and short-term forecasting (Pacchin et al. 2019). Short-term demand prediction generally forecasts water demand over limited time horizons (e.g., 1 month or 1 day) with a time step ranging from daily to sub-hourly (e.g., 15 or 5 min) (Bárdossy et al. 2009; Tabesh & Dini 2009). In this paper, we focus on hourly water demand forecasting.
Over the past decades, a wide variety of methods have been proposed for water demand forecasting based on different principles (Donkor et al. 2014). Artificial neural networks (ANNs) have long been a research hotspot in this field due to their ability to handle nonlinear data (Ghalehkhondabi et al. 2017). Among these ANNs, the most commonly used type is the backpropagation neural network (BPNN), where a backpropagation algorithm is used for training (Bougadis et al. 2005). Some previous studies have shown that it is possible to yield fairly accurate forecasts using BPNNs to predict short-term water demand (Adamowski & Karapataki 2010; Herrera et al. 2010). Although BPNNs perform well in some cases, they easily fall into local optima due to the randomness of their initial weights and thresholds, which results in poor generalization, especially for complex prediction problems. To address this problem, some studies have adopted optimization algorithms (e.g., the genetic algorithm (GA), particle swarm optimization (PSO) or the mind evolutionary algorithm (MEA)) to optimize the initial weights and thresholds of the BPNN. The results show that the prediction performance of water demand forecasting can improve to different degrees compared with BPNNs without optimization (Pulido-Calvo & Gutierrez-Estrada 2009; Huang et al. 2022). In other words, optimizing its initial weights and thresholds is a feasible way to improve the BPNN's performance. However, previous studies have usually emphasized supervised learning for this optimization, while unsupervised learning has been ignored.
In recent years, many deep learning-based methods have been proposed to address complex prediction problems (Alipanahi et al. 2015; Lv et al. 2015; Hu et al. 2019; Cao et al. 2021). As a kind of machine learning method, deep learning methods can learn more useful features when the machine learning models are constructed with many hidden layers and massive training data, thereby improving prediction accuracy. Given this advantage, deep learning methods have gained great interest in short-term water demand forecasting (Salloom et al. 2021, 2022; Chen et al. 2022; Sharma 2022). The above studies suggest that deep learning methods may be a promising alternative for improving the prediction performance of short-term water demand forecasting. However, these deep learning-based methods usually require professional knowledge during their construction. Unfortunately, the required expert knowledge is not always available, which means that these methods are not general enough to tackle prediction tasks. In addition, deep learning-based methods are usually time-consuming in terms of tuning the related parameters and training the networks due to their much more complex structures.
In this context, a simple method that couples a BPNN with a sparse autoencoder (SAE), named the SAE–BPNN model, is proposed for short-term water demand forecasting in this paper. Note that there is no general method/model that can obtain forecasts with high accuracy and good stability in all cases. Therefore, this study does not pursue the best method/model for prediction problems but seeks to predict short-term water demand in a simple and effective way without losing prediction performance. Thus, the main purposes of this study include the following aspects: (1) investigating the potential of the proposed method in short-term water demand forecasting and (2) examining whether the proposed method can achieve some improvements in prediction performance compared with some similar methods.
The main contributions of this paper are as follows: (1) to enhance the performance of the BPNN, the SAE is introduced to extract useful features in an unsupervised manner; (2) an effective framework which integrates the BPNN with the SAE and the early stopping technique is proposed for water demand forecasting; and (3) the proposed method is verified by comparison with the BPNN and similar methods which integrate the BPNN with PSO and the MEA, respectively.
METHODOLOGY
Data reconstruction
Input variables are important factors that have a considerable impact on the accuracy of short-term forecasting (Arjmand et al. 2020), and input variables that can effectively describe the data features are beneficial in terms of improving the prediction performance. Therefore, it is necessary to reconstruct the hourly water demand data to achieve input variables that can reflect the relationships among data. In general, hourly water demand data have the characteristics of periodicity and short-term correlation. To represent these characteristics, data reconstruction is implemented by using the framework proposed by Huang et al. (2022); the framework is provided in Table 1. After that, the BPNN's inputs and outputs can be obtained.
Component of a sample | Data                | Description
----------------------|---------------------|------------
Input variables       | T(i − j) – T(i − 1) | T(i) denotes the hourly water demand at time i.
                      | D_{1}(i) – D_{n}(i) | D_{n}(i) denotes the hourly water demand at time i for the previous day n.
                      | W_{1}(i) – W_{m}(i) | W_{m}(i) is the hourly water demand at time i on the same day for the previous week m.
Output variables      | T(i)                |
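The reconstruction framework in Table 1 can be sketched as follows (an illustrative Python implementation, assuming 24 readings per day; the helper name and defaults are ours, not the authors' MATLAB code):

```python
import numpy as np

def build_samples(series, j=4, n=4, m=2, steps_per_day=24):
    """Reconstruct an hourly demand series into (input, output) samples.

    Inputs for target T(i): the j previous hours T(i-j)..T(i-1), the same
    hour on each of the n previous days, and the same hour on the same
    weekday of each of the m previous weeks (the Table 1 framework).
    """
    week = 7 * steps_per_day
    start = m * week                      # earliest index with a full history
    X, y = [], []
    for i in range(start, len(series)):
        lags = [series[i - k] for k in range(j, 0, -1)]           # T(i-j)..T(i-1)
        days = [series[i - d * steps_per_day] for d in range(1, n + 1)]
        weeks = [series[i - w * week] for w in range(1, m + 1)]
        X.append(lags + days + weeks)
        y.append(series[i])
    return np.array(X), np.array(y)
```

With j = 4, n = 4 and m = 2, each sample has 10 inputs, matching the 10 BPNN input nodes used in the case study.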
BPNN model
Sparse AE
To date, the SAE has become a popular unsupervised feature learning method because it has a powerful ability to effectively find succinct and highlevel representations in complex data (Xu et al. 2016; Wang et al. 2020).
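A minimal sketch of the SAE's cost function may make the idea concrete (illustrative Python, assuming a sigmoid encoder and a linear decoder as in the MATLAB setup described later; `lam`, `beta` and `rho` play the roles of the L2 weight-regularization coefficient, the sparsity-regularization coefficient and the sparsity proportion from Table 5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_loss(X, W1, b1, W2, b2, lam=0.1, beta=1.0, rho=0.2):
    """Sparse-autoencoder cost: reconstruction error + L2 weight decay
    + a KL-divergence sparsity penalty that pushes each hidden unit's
    mean activation towards the sparsity proportion rho."""
    H = sigmoid(X @ W1 + b1)              # encoder (hidden features)
    X_hat = H @ W2 + b2                   # linear decoder ('purelin')
    mse = np.mean(np.sum((X_hat - X) ** 2, axis=1)) / 2.0
    l2 = (lam / 2.0) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)   # mean activation per unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return mse + l2 + beta * kl
```

Minimizing this cost yields hidden-layer weights that encode succinct features of the unlabeled inputs.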
Implementation of the proposed method
The implementation procedure for the proposed method includes two stages: unsupervised pre-training and supervised fine-tuning. From this point of view, the proposed method can be considered a semi-supervised learning method, and its specific procedure is described as follows:
(1) Unsupervised pre-training
The main goal of this stage is to provide the initial weights and thresholds for the BPNN. The SAE is first trained using sufficient unlabeled training samples. Suppose a training sample of the BPNN is denoted as {x_{1}, x_{2}, … , x_{n}, y}, where {x_{1}, x_{2}, … , x_{n}} are the input variables and y is the output variable. Then, {x_{1}, x_{2}, … , x_{n}} can be used as an unlabeled training sample of the SAE. Once the SAE training process is completed, useful features can be extracted from the unlabeled samples. After that, the extracted features are used as the initial parameters for the BPNN. Since the initial weights and thresholds influence the BPNN's prediction performance, it is crucial to train the SAE well so that meaningful information is successfully extracted. In addition, to make use of the information obtained from the SAE, the size of the SAE's hidden layer must be the same as that of the BPNN's hidden layer.
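The parameter transfer described above can be sketched as follows (illustrative Python; the function name and the small random initialisation of the output layer are our assumptions, since the SAE supplies only the hidden layer's weights and thresholds):

```python
import numpy as np

def init_bpnn_from_sae(sae_W1, sae_b1, n_output, rng=None):
    """Use the trained SAE encoder's weights/thresholds as the BPNN's
    initial hidden-layer parameters; the output layer has no SAE
    counterpart and keeps a small random initialisation."""
    rng = rng or np.random.default_rng(0)
    n_hidden = sae_W1.shape[1]          # SAE and BPNN hidden sizes must match
    return {
        "W1": sae_W1.copy(), "b1": sae_b1.copy(),   # from unsupervised pre-training
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_output)),
        "b2": np.zeros(n_output),
    }
```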
(2) Supervised fine-tuning
This stage aims to find the final parameters for the BPNN. Although the information obtained from the SAE can be used to optimize the BPNN's initial parameters, there is no guarantee that the BPNN will yield good predictions using these parameters. As a consequence, it is still necessary to search for appropriate parameters. For this purpose, the BPNN needs to be trained using sufficient labeled training samples. As part of the proposed method, early stopping, which is an effective method used to avoid the BPNN overfitting problem (Gurbuz et al. 2003; Cheng et al. 2016), is adopted during the BPNN's training. When the training termination condition is satisfied, the BPNN will have suitable parameters and can be used for water demand forecasting.
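The early stopping used during fine-tuning can be sketched generically (a hedged Python illustration; the patience value and epoch budget are placeholders, not the paper's settings):

```python
def train_with_early_stopping(update, val_error, max_epochs=1000, patience=6):
    """Generic early-stopping loop: stop once the validation error has
    not improved for `patience` consecutive epochs, and keep the
    parameters from the best epoch seen so far."""
    best, best_epoch, best_state, wait = float("inf"), 0, None, 0
    for epoch in range(max_epochs):
        state = update(epoch)          # one training epoch; returns parameters
        err = val_error(state)
        if err < best - 1e-12:
            best, best_epoch, best_state, wait = err, epoch, state, 0
        else:
            wait += 1
            if wait >= patience:       # validation error stopped improving
                break
    return best_state, best_epoch
```

Here `update` and `val_error` are hypothetical stand-ins for one BPNN training epoch and its validation-set error.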
Forecasting performance evaluation
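The indicators reported below, the MAPE, RMSE and coefficient of determination R², follow their standard definitions; a minimal Python sketch (the study itself was implemented in MATLAB):

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in %."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean squared error, in the data's units (here m³/h)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```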
CASE STUDY
Data description
The data used in this study were obtained from a real-world water distribution system located in Guigang region, southern China (Huang et al. 2022), and the dataset contains hourly water demand data over 175 days. Therefore, a total of 4,200 observed values are included in the dataset. Based on the framework shown in Table 1, the hourly water demand data were reconstructed to generate proper samples for the BPNN. To obtain more suitable samples, correlation analysis was conducted to determine appropriate values for parameters j, n and m, which are shown in Table 1. The values of j, n and m were finally set to 4, 4 and 2, respectively. After that, a sample set containing 3,864 samples was created for the BPNN. To evaluate the performance of the forecasting methods, the sample set was divided into a training set (including 3,092 samples, approximately 80% of the total samples) and a testing set (772 samples, approximately 20% of the total samples).
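These counts are internally consistent; a quick arithmetic check (an illustrative Python sketch, with variable names of our choosing):

```python
hours = 175 * 24             # 4,200 hourly observations
j, n, m = 4, 4, 2            # windows chosen via correlation analysis
history = m * 7 * 24         # longest look-back: two weeks = 336 hours
samples = hours - history    # every sample needs a full two-week history
n_train, n_test = 3092, 772  # the paper's roughly 80/20 split
assert hours == 4200 and samples == 3864
assert n_train + n_test == samples
```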
Forecasting methods used for comparisons
In this paper, two scenarios were created to evaluate the proposed method. The purpose of Scenario 1 is to verify the effectiveness of the SAE and the superiority of the proposed method over other methods. In Scenario 1, the proposed method was compared with a traditional BPNN and two BPNN-based models, where the initial weights and thresholds of the BPNN were optimized by PSO and the MEA, respectively. Both PSO and the MEA are intelligent optimization algorithms that are usually used to search for optimal values when optimizing BPNNs (Wang et al. 2015; Chang et al. 2020). For convenience, the BPNN models that are based on PSO and the MEA are denoted as the PSO–BPNN model and MEA–BPNN model, respectively. In Scenario 2, to investigate the effectiveness of the samples obtained from data reconstruction, the proposed method was compared with the SAE–BPNN model without data reconstruction. The parameter settings for the forecasting methods are discussed in detail in the following sections.
Scenario 1 parameter settings
A total of four methods are considered in Scenario 1. Note that the forecasting methods in this study were performed using the MATLAB software. Furthermore, all of the methods in Scenario 1 adopted the same samples for prediction. The process used to determine the related parameters is presented as follows:
(1) BPNN model
To utilize the BPNN model, its structure should be specified in advance. The basic structural parameters of the BPNN are the numbers of input nodes, hidden nodes and output nodes. The numbers of input and output nodes equal the numbers of inputs and outputs, respectively; according to the 'Data description' subsection, they were set to 10 and 1. The optimal number of hidden nodes was obtained by using a grid search method. A summary of the parameter settings for the BPNN is listed in Table 2.
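The grid search over the hidden-layer size can be sketched as follows (illustrative Python; `fit` and `evaluate` are hypothetical stand-ins for training a BPNN with a given hidden size and scoring it on held-out data):

```python
def grid_search_hidden_nodes(fit, evaluate, search_range=range(5, 21)):
    """Pick the hidden-layer size with the lowest validation error.
    `fit(h)` trains a model with h hidden nodes; `evaluate(model)`
    returns its error (e.g., MAPE) on held-out data."""
    best_h, best_err = None, float("inf")
    for h in search_range:
        err = evaluate(fit(h))
        if err < best_err:
            best_h, best_err = h, err
    return best_h, best_err
```

The default `search_range` mirrors the [5, 20] interval from Table 2.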
(2) Other methods
Number of input nodes | Number of hidden nodes | Number of output nodes | Search range for optimal number of hidden nodes | Other parameters
----------------------|------------------------|------------------------|-------------------------------------------------|-----------------
10                    | 5                      | 1                      | [5, 20]                                         | Default
To provide a fair comparison, the other BPNN-based methods adopted the same parameters described in the 'BPNN model' subsection. For the PSO–BPNN model, the PSO parameters include two acceleration factors, the number of iterations, the velocity range, the position range and the population size. The first four parameters were determined from published values (Wang et al. 2015) and several trials. Moreover, a grid search method was adopted to obtain the optimal population size according to the minimum MAPE on the test data. The PSO parameters are shown in Table 3.
Acceleration factors c_{1}, c_{2} | Number of iterations | Velocity range | Position range | Population size | Search range for optimal population size
----------------------------------|----------------------|----------------|----------------|-----------------|------------------------------------------
1.49445                           | 10                   | [−1, 1]        | [−5, 5]        | 70              | [20, 100]
In the MEA–BPNN model, the MEA parameters include the number of superior subgroups, number of temporary subgroups, number of iterations and population size. The first three parameters were specified based on published values (Huang et al. 2022) and a trial-and-error procedure. The optimal population size can also be obtained by using the same method as that adopted for PSO. The MEA parameters are listed in Table 4.
Population size | Number of iterations | Number of superior subgroups and temporary subgroups | Search range for optimal population size
----------------|----------------------|-------------------------------------------------------|------------------------------------------
200             | 20                   | (5, 5)                                                | [20, 200]
With regard to the proposed method, the SAE should have the same number of hidden nodes as the BPNN to ensure that the proposed method operates properly. In the MATLAB environment, three hyperparameters, termed the coefficient of the L2 weight regularization, the coefficient of sparsity regularization and the sparsity proportion, are usually used to control the SAE sparsity. To achieve better performance, a grid search method was conducted to obtain the optimal values for the aforementioned parameters based on the minimal MAPE on the test data. The SAE parameters are displayed in Table 5.
Parameter                                                        | Value
-----------------------------------------------------------------|----------
Coefficient of L2 weight regularization                          | 0.1
Coefficient of sparsity regularization                           | 1
Sparsity proportion                                              | 0.2
Transfer function for the decoder                                | purelin
Search range for optimal coefficient of L2 weight regularization | [0.05, 1]
Search range for optimal coefficient of sparsity regularization  | [1, 5]
Search range for sparsity proportion                             | [0.05, 1]
Other parameters                                                 | Default
Scenario 2 parameter settings
In Figure 4, x(i) is the ith data point in the water demand time series.
To provide a fair comparison, the inputs in a sample for the SAE–BPNN model without data reconstruction consist of 10 sequential data points. In other words, the sample size of the SAE–BPNN model without data reconstruction is the same as that of the proposed method. According to the framework shown in Figure 4, a sample dataset including a total of 4,190 samples can be obtained. Following procedures similar to those in Scenario 1, the parameters of the SAE–BPNN model without data reconstruction can be determined. Detailed information about the parameter settings is listed in Table 6.
SAE section | BPNN section
Results and discussion
In this study, a laptop was used to run the MATLAB programs for all the prediction methods. The main specifications of the laptop are as follows: a Core™ i7-7500U CPU at 2.70 GHz and 8.00 GB of RAM.
Table 7 shows the comprehensive comparison results of the forecasting performance. The improvements gained from the comparison between the proposed method and three other BPNNbased methods are provided in Table 8.
Method    | MAPE (%) | RMSE (m³/h) | R²   | Computational load (s) | APE: number (>5%) | APE: maximum (%) | APE: standard deviation
----------|----------|-------------|------|------------------------|-------------------|------------------|------------------------
BPNN      | 2.45     | 338         | 0.93 | 2.17                   | 79                | 10.27            | 1.81
PSO–BPNN  | 2.39     | 329         | 0.91 | 73.13                  | 73                | 10.13            | 1.76
MEA–BPNN  | 2.40     | 331         | 0.92 | 3.98                   | 71                | 9.43             | 1.77
SAE–BPNN  | 2.31     | 320         | 0.91 | 3.57                   | 61                | 8.95             | 1.71
Indicator             | BPNN | PSO–BPNN | MEA–BPNN
----------------------|------|----------|----------
MAPE improvements (%) | 5.80 | 3.33     | 3.89
RMSE improvements (%) | 5.27 | 2.73     | 3.33
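The improvements in Table 8 are relative reductions with respect to each baseline. Recomputing them from the rounded values in Table 7 gives slightly different numbers, since the published figures were presumably derived from the unrounded metrics (an illustrative Python check):

```python
def improvement(baseline, proposed):
    """Relative improvement (%) of the proposed method over a baseline."""
    return 100.0 * (baseline - proposed) / baseline

# MAPE improvements over BPNN, PSO–BPNN and MEA–BPNN from Table 7's
# rounded values (the paper reports 5.80, 3.33 and 3.89%)
print([round(improvement(b, 2.31), 2) for b in (2.45, 2.39, 2.40)])
# → [5.71, 3.35, 3.75]
```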
From Table 7, it is clear that the proposed method has the lowest values in terms of both the MAPE and RMSE, achieving a 2.31% MAPE and a 320 m³/h RMSE. Compared with the BPNN, the proposed method gained a MAPE improvement of 5.80%. In terms of the RMSE, a promising improvement (approximately 5.27%) can also be obtained. These observations demonstrate that the proposed method is effective in improving the performance of the BPNN. One reasonable explanation is that optimizing the BPNN through the SAE can effectively overcome the defects caused by the randomness of the initial weights and thresholds. Similar improvements can also be found in terms of the APE. Compared with the BPNN, the number of APE values exceeding 5% declined significantly, from 79 to 61, when using the proposed method. Furthermore, the maximum APE decreased by 1.32 percentage points, from 10.27 to 8.95%, and the standard deviation of the APE improved by 5.52%. One main reason for these improvements is that the SAE can improve the signal-to-noise ratio to a certain extent. In other words, the SAE can be used for data denoising, which is conducive to removing the impact of noise on modeling. Although the RMSE value obtained from the proposed method is fairly low, as shown in Table 7, there is still room to further reduce uncertainties. In future work, it is suggested that weather variables be taken into consideration during input variable selection. However, collecting and keeping records of weather-related data is expected to be a challenge for water utilities, which would limit the practicality of the proposed method to some extent.
In Table 7, it can be seen that the proposed method performed better than the PSO–BPNN and MEA–BPNN models on the majority of indicators. Compared with the PSO–BPNN and MEA–BPNN models, the proposed method gained MAPE improvements of 3.33 and 3.89%, respectively. In terms of the RMSE, improvements of 2.73 and 3.33%, respectively, can be achieved. These results suggest that the proposed method outperforms the PSO–BPNN and MEA–BPNN models in terms of both forecasting accuracy and stability. Similar information can also be found in the other indicators shown in Table 7. Compared with the BPNN, although the PSO–BPNN and MEA–BPNN models also gained some improvements in the APE, these improvements are inferior to those obtained from the proposed method. The underlying reasons for these findings are mainly as follows: First, unlike the SAE, PSO and the MEA have no ability to reduce data noise, which is the key reason why the proposed method is more effective at reducing the number of extreme APE values. Next, the SAE can effectively extract useful input features and use them as the basis for optimizing the BPNN's thresholds and weights. Finally, PSO and the MEA rely only on their global search ability to obtain optimal thresholds and weights, without input feature extraction.
Unlike on the aforementioned indicators, the BPNN has an advantage over the other three methods in terms of the computational load, mainly because its structure is simpler. While the computational load of the proposed method is slightly higher than that of the BPNN, this has no impact on the practicality of the proposed method because, in our experiments, it took only 3.57 s to predict 772 data points. Furthermore, compared with the PSO–BPNN and MEA–BPNN models, the proposed method took less time to achieve the desired forecasts. Considering that these three methods have the same parameters as the BPNN, it can be inferred that the proposed method is more efficient in searching for the optimal parameters than the PSO–BPNN and MEA–BPNN models.
Similar to the computational load, the BPNN performs best in terms of R², which means that it has the best goodness of fit. However, as shown in Table 7, all R² values for the other BPNN-based methods are greater than 0.90, which indicates that the forecasts obtained from these methods also fit the observed values well. That is to say, it is acceptable and feasible to use these methods for prediction.
Figure 8 shows the comparison results between the proposed method and the SAE–BPNN model without data reconstruction.
CONCLUSIONS
In this paper, a new method that integrates the SAE into the BPNN is proposed for short-term water demand forecasting. In this method, the SAE module conducts feature extraction in an unsupervised learning manner, whereas the BPNN module is used to forecast the water demand. To enhance the proposed method's forecasting performance, data reconstruction is adopted to generate suitable samples for the BPNN module. Hourly water demand data obtained from a real-world water distribution system were used to verify the effectiveness of the proposed method, and comparisons with other similar methods were also considered. The results show that the proposed method has an advantage over the BPNN, PSO–BPNN and MEA–BPNN models in both prediction accuracy and stability. In addition, the findings also show that the proposed method is a promising tool for short-term water demand forecasting in a simple but effective way.
Extracting useful features by unsupervised learning is an advantage of the proposed method over the other methods in this study. However, the proposed method also has some limitations. First, adequate unlabeled samples are required to obtain useful features through unsupervised learning, so the forecasting performance of the proposed method may depend on the availability of unlabeled samples. Next, as a kind of BPNN-based method, the proposed method also requires a large number of training samples. As a consequence, the application of the proposed method may be limited in scenarios where sufficient samples are not available.
ACKNOWLEDGEMENTS
This work is supported in part by the Middle-Aged and Young Teachers' Basic Ability Promotion Project of Guangxi (Grant No. 2021KY0438) and the Natural Science Foundation of Guangxi Province (Grant No. 2022GXNSFAA035582).
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories: https://kdocs.cn/l/cbiLlwiRFAi1.
CONFLICT OF INTEREST
The authors declare there is no conflict.