## Abstract

Short-term (e.g., hourly) urban water consumption (or demand) prediction is of great significance for the optimal operation of intelligent water distribution pump stations. In this study, three single models (autoregressive integrated moving average (ARIMA), back-propagation (BP) neural network and support vector machine (SVM)) and three hybrid models (ensemble empirical mode decomposition (EEMD)-ARIMA, EEMD-BP and EEMD-SVM) were developed and compared in terms of prediction accuracy and application convenience. A 31-day (1 month) hourly flow series from a water distribution division in Shanghai was used for the demonstration case study, of which 30 days of data were used for model training and 1 day for model verification. Finally, the effects of historical data length on the prediction accuracy of the three hybrid models were also analyzed, and the optimal historical data lengths for the three hybrid models were obtained. Results reveal that (1) the mean absolute percentage errors (MAPE) of EEMD-ARIMA, EEMD-BP, EEMD-SVM, ARIMA, BP and SVM are 5.2036, 1.4460, 1.3424, 5.7891, 4.3857 and 3.8470%, respectively; (2) in terms of prediction accuracy and practical convenience, EEMD-SVM performs best among the above six models; (3) the EEMD algorithm is effective for improving the prediction accuracy of the single models; (4) the optimal historical data lengths of EEMD-ARIMA, EEMD-BP and EEMD-SVM are 11, 11 and 10 days, respectively.

## HIGHLIGHTS

Three single models (ARIMA, BP and SVM) and three hybrid models (EEMD-ARIMA, EEMD-BP and EEMD-SVM) were compared for the prediction of hourly water demand.

EEMD-SVM performs best among the six prediction models.

The EEMD algorithm significantly improves prediction accuracy.

The optimal historical data length for intelligent algorithms should be greater than a week.

### Graphical Abstract

## NOMENCLATURE

- ARIMA
autoregressive integrated moving average

- ARMA
autoregressive moving average

- ANN
artificial neural network

- BP
back-propagation

- DNN
deep neural network

- ES
exponential smoothing

- EMD
empirical mode decomposition

- EEMD
ensemble empirical mode decomposition

- IMF
intrinsic mode functions

- LSSVM
least square support vector machine

- MAPE
mean absolute percentage error

- MAE
mean absolute error

- MSE
mean square error

- Res
residual

- RMSE
root mean square error

- RF
random forest

- SVM
support vector machine

- WDN
water distribution networks

## INTRODUCTION

Urban water demand forecasting has a great effect on improving the stability of water supply. Due to the increasing number of water customers, the pressure on the water supply pipeline network has also increased, which, in turn, increases the risk of pipeline bursts and leaks and reduces the sustainability of peak water supply. Short-term water demand forecasting helps quantify the probability of pipeline bursts during the water supply process and identify possible leaks in the pipeline network (Hutton & Kapelan 2015; Brentan *et al.* 2017; Du *et al.* 2021). In addition, the prediction of urban water demand plays an important role in reducing pump station electricity consumption. Urban waterworks consume a great deal of electricity in producing high-quality drinking water for users. In the United States, about 75 billion kWh of electricity are used for water purification and transmission each year, accounting for 4% of the total electricity consumption, at a cost of about 4 billion US dollars (Goldstein & Smith 2002). In China, electricity for water supply accounts for 30–50% of total water production costs (Shu *et al.* 2010). The operating costs of water supply pump stations in the water distribution network constitute the largest expenditure of water supply enterprises worldwide (Zyl *et al.* 2004). When a pump is running, its cost mainly comprises maintenance and electricity consumption; the cost of electrical energy usually accounts for most of the total cost, as the price of electricity has been rising globally (Mala-Jetmarova *et al.* 2017). Researchers proposed and tested an adaptive weighted sum genetic algorithm, which demonstrated the ability to achieve optimal pump scheduling to reduce energy consumption and daily maintenance costs of the water supply system (Abiodun & Ismail 2013).
The problem of optimal pump scheduling has been studied in previous research (de la Perrière *et al.* 2014; Ghaddar *et al.* 2015; Carpitella *et al.* 2019; Luna *et al.* 2019). In the case of the Netherlands, relevant experiments show that the overall energy costs of water distribution networks (WDNs) based on predictive flow control are around 1.7–7.4% lower than those of traditional level-based flow control systems (Bakker *et al.* 2013a). It is crucial to reduce energy consumption in production, especially in the context of China's current ‘peak carbon’ and ‘carbon neutral’ targets, and the optimization of pumping stations in water distribution systems can reduce unnecessary energy consumption and thus carbon emissions from power plants. Short-term water demand forecasting makes a fundamental contribution to the real-time intelligent control and scheduling of pumping stations, and it has far-reaching significance.

There are a number of challenges in building urban water demand forecasting models, as water demand fluctuates over time and future water demand at a given time is affected by previous water demand, demographics, holidays and weather conditions (Donkor *et al.* 2014; Romano & Kapelan 2014). If many factors are considered when modeling, the result is inevitably more model input variables and increased model complexity and calculation time. At the same time, for water utilities, some factors are more difficult to obtain reliably than historical water demand data, causing inconvenience in practical application. It has been shown that reliable predictions can be achieved using historical hourly water demand data as the only input when predicting short-term water demand on an hourly scale (Cutore *et al.* 2008; Bakker *et al.* 2013b). As a result, short-term water demand forecasting is increasingly being investigated in order to better manage urban water supply systems.

At present, the methods that can be used to predict urban water demand fall roughly into two categories: traditional statistical methods and artificial intelligence methods (Mikut & Reischl 2011). Traditional statistical methods include exponential smoothing (ES), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA) and so on (Odan & Reis 2012). However, traditional statistical models assume stationarity and linearity and have low complexity; they are subject to certain restrictions when dealing with nonstationary and nonlinear time-series data, resulting in low prediction accuracy (Shukur & Lee 2015; Zhang *et al.* 2016, 2017; Ma *et al.* 2017; Qin *et al.* 2017). Studies over the past two decades have provided important information on artificial intelligence methods, including random forest (RF), artificial neural network (ANN), support vector machine (SVM) and so on. In view of the nonlinear characteristics of water supply data, some scholars developed an RF regression model to predict daily water demand data in Southwest China (Chen *et al.* 2017). Previous research comparing traditional statistical models and ANN models has found the superiority of the ANN model in processing complex and nonlinear time-series data (Bougadis *et al.* 2005; Piasecki *et al.* 2018; Yin *et al.* 2018). ANN, deep neural network (DNN), RF and least square support vector machine (LSSVM) have been used to predict water demand at different time intervals (1, 12 and 24 h), with R-squared, root mean square error (RMSE), mean square error (MSE) and mean absolute error (MAE) used to compare model performance; the results show that the ANN model performs best at all of these time intervals (Vijai & Sivakumar 2018).
In addition to the ANN, there are also some researchers who have used SVM regression models based on optimization algorithms to predict short-term urban water demand (Candelieri *et al.* 2019; Wu *et al.* 2020b). In some studies, SVM was selected as one of the tools to forecast hourly urban water demand, and the results show that SVM can be used as an appropriate modeling algorithm (Herrera *et al.* 2010).

In fact, when a single algorithm is used for modeling, whether a traditional statistical method or an artificial intelligence method, there are several disadvantages. The traditional statistical model has a fixed functional form and relatively strict assumptions on the sampled data. Although artificial intelligence models can analyze complex and nonlinear data more effectively, they also have defects, such as overfitting and convergence to local optima. When a single model is used for prediction, the original water consumption data are used directly as input without preprocessing. However, water consumption data are generally characterized by nonstationarity, nonlinearity and complexity, making the performance of a single prediction model inaccurate and unstable. An orderly combination of multiple algorithms can exploit their respective advantages and make up for their deficiencies, thereby improving model performance. For complex time series, the ensemble empirical mode decomposition (EEMD) algorithm is an effective processing algorithm; it can decompose the different frequency components of the data to generate relatively stable subsequences, which are more conducive to modeling. Using EEMD to preprocess the data reduces the instability of the original data and increases the possibility of achieving high-accuracy predictions. In addition, the intrinsic mode functions (IMFs) generated by the EEMD decomposition eliminate stochastic volatility, so the prediction effect can be improved. Previous studies have demonstrated that the prediction performance of hybrid models based on empirical mode decomposition (EMD) is better than that of single models in many cases (Lin *et al.* 2012; Wei & Chen 2012; Wang *et al.* 2016).

In this study, the EEMD algorithm was chosen as the preprocessing algorithm for short-term water demand data because of its effectiveness in decomposing complex time series. The ARIMA algorithm, a classical traditional statistical method, is widely used in urban water demand forecasting because of its low computational cost and because it does not require other influencing factors as inputs (Brentan *et al.* 2017; Oliveira *et al.* 2017). Compared with AR, MA and ARMA, ARIMA has the advantage of adding a differencing step, so it can be used for both stationary and nonstationary series, making it applicable to a wider range of problems. Therefore, among the statistical models, we chose the ARIMA model for prediction. A back-propagation (BP) neural network has strong learning ability and high efficiency in data processing, which can meet the needs of urban water demand forecasting (Qi & Chang 2011). In order to compare traditional statistical methods with artificial intelligence methods, we chose a BP neural network, which has the advantage of minimizing the error between the network output and the ideal value during training based on an empirical risk minimization criterion (Ai *et al.* 2022). However, the disadvantage of BP is that it may fall into a local optimum (Rangel *et al.* 2016). SVM can overcome some of the shortcomings of BP, mainly because, when developing the cost function, SVM applies the principle of structural risk minimization rather than the empirical risk minimization principle used by BP (Tripathi *et al.* 2006; Bazrkar & Chu 2022). Therefore, in this paper, both BP and SVM were chosen to compare their prediction performance among artificial intelligence algorithms.

The framework of this paper is organized as follows: the section Methodology describes the details of the proposed algorithms. In the section Results of Demonstration Study, prediction results of different models and the impact of length of historical data on the prediction accuracy are presented. Discussions are stated in the section Discussion. Lastly, the main conclusions are summarized in the section Conclusions.

## METHODOLOGY

Urban hourly water consumption data are generally characterized by nonlinearity, nonstationarity, randomness and complexity. Here, a hybrid prediction methodology based on the EEMD algorithm was developed to improve the prediction accuracy. It consists of three parts: (1) the multi-scale decomposition tool, the EEMD algorithm, decomposes the water consumption data series into several IMFs; (2) ARIMA, BP and SVM models are then used to predict each IMF; (3) the predictions for the IMFs are added together to obtain the final prediction results. The basic framework is shown in Figure 1. In particular, to demonstrate the reliability and efficacy of the EEMD algorithm, three single models (ARIMA, BP and SVM) were first developed to provide benchmark prediction results, and then three hybrid models (EEMD-ARIMA, EEMD-BP and EEMD-SVM) were developed for further improvement. The parameter values of the algorithms involved in this study are summarized in Table 1.
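The three-part framework can be sketched as follows. This is purely structural: the placeholder `decompose` (moving-average scales) and the naive persistence predictor are hypothetical stand-ins for EEMD and the ARIMA/BP/SVM component models, chosen only so the example is self-contained:

```python
import numpy as np

def decompose(series, widths=(3, 9)):
    # Placeholder for EEMD: split the series into components that
    # sum back to the original (here: successive moving-average scales).
    comps, resid = [], np.asarray(series, dtype=float).copy()
    for w in widths:
        smooth = np.convolve(resid, np.ones(w) / w, mode="same")
        comps.append(resid - smooth)   # high-frequency part at this scale
        resid = smooth
    comps.append(resid)                # lowest-frequency 'Res' component
    return comps

def predict_component(comp, horizon):
    # Placeholder single-component model (naive persistence);
    # the paper fits ARIMA/BP/SVM to each component here.
    return np.repeat(comp[-1], horizon)

def hybrid_forecast(series, horizon=24):
    comps = decompose(series)
    # Final prediction = sum of the per-component predictions (step 3).
    return sum(predict_component(c, horizon) for c in comps)

# 720 hourly training points with a 24-h cycle, as in the case study setup
history = 1000.0 + 100.0 * np.sin(np.arange(720) * 2.0 * np.pi / 24.0)
forecast = hybrid_forecast(history, horizon=24)
```

The only property the sketch is meant to convey is that the decomposition is additive, so per-component predictions can simply be summed.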

| Algorithms | Parameters | Single models | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | Res | References |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EEMD | std | Not applicable | 0.2 | | | | | | | | | Zhang *et al.* (2008, 2010); Ren *et al.* (2017) |
| EEMD | NE | Not applicable | 100 | | | | | | | | | Zhang *et al.* (2008, 2010); Ren *et al.* (2017) |
| ARIMA | *p* | 16 | 4 | 7 | 8 | 10 | 4 | 8 | 6 | 9 | 10 | – |
| ARIMA | *d* | 1 | 0 | 0 | 0 | 0 | 4 | 5 | 5 | 5 | 2 | – |
| ARIMA | *q* | 14 | 1 | 0 | 0 | 0 | 10 | 6 | 6 | 3 | 0 | – |
| BP | Number of neurons in input layer | 5 | | | | | | | | | | Guo *et al.* (2018) |
| BP | Number of neurons in hidden layer | 8 | 8 | 9 | 11 | 9 | 10 | 4 | 8 | 13 | 10 | – |
| BP | Training function | Trainlm | | | | | | | | | | – |
| BP | Transfer function | ‘tansig’ for hidden layers, ‘purelin’ for output layers | | | | | | | | | | Guo *et al.* (2018) |
| BP | Epoch | 10,000 | | | | | | | | | | – |
| BP | Performance | 10^{−5} | | | | | | | | | | Chen *et al.* (2017) |
| BP | Training speed | 0.01 | | | | | | | | | | Herrera *et al.* (2010) |
| BP | Momentum parameter | 0.9 | | | | | | | | | | – |
| SVM | -s | 3 | | | | | | | | | | – |
| SVM | -t | 2 | | | | | | | | | | – |
| SVM | -p | 0.01 | | | | | | | | | | – |

A parameter shown with a single value applies to all models/components to which that parameter is applicable.


*Note*: -*s* is the type of SVM, setting to 3 means e-SVR; *-t* is the type of kernel function, setting to 2 means radial basis function; *-p* is the value of the loss function in the e-SVR.

### Empirical mode decomposition (EMD)

EMD is a multi-scale analysis method for nonlinear, nonstationary and complex time-series data (Huang *et al.* 1998). It decomposes the original sequence based on the time-scale characteristics of the data itself and can extract the local information of the feature set of the original data from the time series (Tiwari & Kanungo 2010). Compared with the traditional time-series decomposition method, it has better adaptive characteristics and can effectively decompose the fluctuation of different frequencies in the data step by step to obtain multiple IMFs and residual (Res).

The EMD process for a given time sequence $x(t)$ is as follows:

- (1)
Let $r_0(t) = x(t)$ and $k = 1$;

- (2)
Determine the local extremum points of the sequence $r_{k-1}(t)$;

- (3)
Obtain the upper and lower envelope lines $e_{\max}(t)$ and $e_{\min}(t)$ by interpolating the local maxima and local minima, respectively;

- (4)
Compute the mean envelope $m(t) = (e_{\max}(t) + e_{\min}(t))/2$ and the candidate component $h(t) = r_{k-1}(t) - m(t)$;

- (5)
If $h(t)$ meets the conditions that (1) the number of extreme points (including local maximum points and local minimum points) equals the number of zero crossings or differs from it by 1, and (2) at any point, the average of the upper and lower envelopes is 0, then $h(t)$ is considered to be an IMF: let $\mathrm{IMF}_k(t) = h(t)$, $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k(t)$ and $k = k + 1$; if $h(t)$ is not an IMF, let $r_{k-1}(t) = h(t)$ and return to step (2);

- (6)
Repeat steps (2)–(5) until the residual satisfies the stop condition (e.g., it becomes monotonic), giving $x(t) = \sum_{i=1}^{k} \mathrm{IMF}_i(t) + \mathrm{Res}(t)$.
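A minimal numpy sketch of the sifting loop above is given below. It uses linear envelope interpolation and a fixed number of sifting iterations rather than the cubic splines and formal stopping criteria of Huang *et al.* (1998), but it preserves the defining identity that the IMFs and residual sum back to the original series:

```python
import numpy as np

def local_extrema(x):
    # Indices of local maxima and minima of a 1-D signal (step 2).
    maxima = [i for i in range(1, len(x) - 1) if x[i] >= x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] <= x[i - 1] and x[i] < x[i + 1]]
    return np.array(maxima), np.array(minima)

def sift(r, n_sift=10):
    # Repeatedly subtract the mean envelope (steps 3-4), a fixed number of times.
    h, t = r.copy(), np.arange(len(r))
    for _ in range(n_sift):
        mx, mn = local_extrema(h)
        if len(mx) < 2 or len(mn) < 2:   # too few extrema: residual reached
            return None
        upper = np.interp(t, mx, h[mx])  # upper envelope (linear here)
        lower = np.interp(t, mn, h[mn])  # lower envelope
        h = h - (upper + lower) / 2.0    # subtract the mean envelope
    return h

def emd(x, max_imfs=8):
    # Decompose x into IMFs and a residual with sum(IMFs) + Res == x.
    imfs, r = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        h = sift(r)
        if h is None:                    # stop condition (step 6)
            break
        imfs.append(h)
        r = r - h                        # step 5: remove the extracted IMF
    return imfs, r

t = np.linspace(0.0, 10.0, 500)
signal = np.sin(2 * np.pi * t) + 0.5 * np.sin(10 * np.pi * t) + 0.1 * t
imfs, res = emd(signal)
```

Because each IMF is subtracted from the running residual, the decomposition is exactly additive regardless of how well each sift isolates a mode.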

### Ensemble empirical mode decomposition (EEMD)

Although EMD has been widely successful in applied research, it has the defect of mode mixing (Wang *et al.* 2012), defined as either a single IMF including scales of other modes, or one scale being spread across different IMFs. To address the mode-mixing problem of EMD, EEMD was proposed.

EEMD adds white noise to the original data before applying EMD. Normally, the added white noise is assumed to obey a normal distribution $N(0, \varepsilon^2)$, and the data with white noise added are decomposed by EMD. Because independent white noise realizations are added to perturb the original data before the ensemble average is taken, mode mixing is avoided (Wu & Huang 2009).

The detailed process of EEMD is as follows:

- (1)
Add a white noise series $n_i(t)$ to the original data: $x_i(t) = x(t) + n_i(t)$;

- (2)
Use EMD to decompose the time-series data after adding white noise;

- (3)
Repeat steps (1)–(2) $NE$ times, each time with an independent white noise realization, and take the ensemble mean of the corresponding IMFs of all trials as the final decomposition result.
### Autoregressive integrated moving average

The ARIMA(*p*, *d*, *q*) model consists of three parts: AR is the autoregressive part, and *p* is the order of autoregression; I is the differencing part, and *d* represents the order of differencing required to transform the original nonstationary series into a stationary series; MA is the moving average part, and *q* is the order of the moving average. ARIMA solves the problem of the poor fit of the ARMA model to nonstationary, nonlinear time-series data. In the ARIMA model, it is assumed that the future value of the target variable is a linear function of historical data and errors. In daily water-use cases, there is often a certain correlation between hourly water demands, which is consistent with the assumption of the model, so the model is considered for water demand forecasting in this paper, i.e., it is assumed that the future water demand is linearly correlated with past water demand. The ARMA(*p*, *q*) expression is as follows:

$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$

where $x_t$ is the sample value, $p$ and $q$ are the orders of the model, $\varepsilon_t$ is the current random error interference, and $\varphi_i$ and $\theta_j$ are the parameters. After $d$-order differencing, the ARIMA model can be written compactly as

$$\varphi(B)(1 - B)^d x_t = \theta(B)\varepsilon_t$$

where $B$ is the delay operator; $\varphi(B) = 1 - \varphi_1 B - \cdots - \varphi_p B^p$ is a polynomial of all autoregression coefficients in the ARIMA model; $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ is a polynomial of all moving average coefficients in the ARIMA model; $\varepsilon_t$ represents white noise subject to $N(0, \sigma^2)$; and the autoregressive and moving average terms capture the correlation between earlier and later values of the sequence.

### BP neural network

BP neural network is a feedforward neural network composed of nonlinear transformation units and trained with the error back-propagation algorithm (a gradient descent, supervised learning algorithm). The basic idea is to use gradient search to minimize the error function. The basic process mainly includes: (1) forward propagation of information and (2) backward propagation of error (Wu *et al.* 2020a). The basic structure is one input layer, one output layer and one or more hidden layers (Yaqin *et al.* 2020). It has powerful computational capability and can learn the input–output relationships of samples through training. It is one of the most mature and widely used data mining technologies so far, with broad prospects in classification and prediction.

The structure of the three-layer BP neural network is shown in Figure 2. The input signal is forwarded layer by layer to the output layer through the neurons of the input layer and the hidden layer, with the neurons in each layer affecting those in the next. If the result of the output layer differs from the expected result by a large error, the error is calculated and propagated backwards along the original path, and the connection weights between the input layer and hidden layer neurons are adjusted so that the error decreases along the gradient direction; the process then returns to forward propagation. The training is repeated until the output error falls within the allowable range. The process is shown in Figure 3.
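The forward/backward procedure described above can be sketched with a minimal numpy implementation. The network below uses a tanh (‘tansig’) hidden layer and a linear (‘purelin’) output, trained by full-batch gradient descent on a toy regression task; the data, layer sizes and learning rate are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: learn y = sin(x) on [-pi, pi]
X = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
Y = np.sin(X)

n_in, n_hidden, n_out = 1, 8, 1
W1 = rng.normal(0.0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
lr = 0.05

losses = []
for epoch in range(2000):
    # (1) forward propagation of information
    H = np.tanh(X @ W1 + b1)            # 'tansig' hidden layer
    out = H @ W2 + b2                   # 'purelin' output layer
    err = out - Y
    losses.append(float(np.mean(err ** 2)))

    # (2) backward propagation of error (gradient descent on MSE)
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)  # tanh derivative
    dW1 = X.T @ dH / len(X);  db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Each epoch is one forward pass followed by one weight update along the negative gradient, which is exactly the repeat-until-tolerance loop described in the text.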

### Support vector machine (SVM)

SVM is a machine learning technique proposed by Cortes & Vapnik (1995). It is based on the VC dimension theory of statistical learning theory and the structural risk minimization principle. According to the characteristics of the given data, it seeks the most appropriate trade-off between model complexity and learning ability, so as to obtain the best generalization ability. Compared with the traditional linear model, SVM does not complicate the calculation process. In addition, SVM can reach the global optimum in its search process and is suitable for cases with fewer samples. SVM is divided into two categories: SVC (support vector classification) for solving classification problems and SVR (support vector regression) for solving regression problems (Malik *et al.* 2020). A brief introduction to SVR is as follows. The $\epsilon$-SVR model is formulated as

$$\min_{w, b, \xi, \xi^*} \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - w^{T}\phi(x_i) - b \le \epsilon + \xi_i \\ w^{T}\phi(x_i) + b - y_i \le \epsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases}$$

where $C$ is a penalty parameter to balance the generalization ability and complexity of the model; $\epsilon$ represents the error tolerance of the regression function, which ensures the sparsity of the solution; and $\xi_i$ and $\xi_i^*$ are two slack variables that restrict the upper and lower bounds of the output value. The above optimal model is solved by using the Lagrange function and introducing a kernel function $K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)$.

In this study, the LIBSVM toolbox was used to implement the SVM algorithm (Chang & Lin 2011).

### Evaluation indicators of model performance

In order to compare the predictive performance of different models, the RMSE, the mean absolute percentage error (MAPE) and the correlation coefficient (*R*) were chosen as evaluation indicators:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}, \quad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|, \quad R = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}\sqrt{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}}$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $\bar{y}$ and $\bar{\hat{y}}$ are their means and $n$ is the number of samples. Lower RMSE and MAPE and higher $R$ indicate better prediction performance.
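As a concrete check of these standard definitions, the three indicators can be computed as follows (the observed/predicted values here are illustrative only, not the case-study data):

```python
import numpy as np

def rmse(obs, pred):
    # Root mean square error
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mape(obs, pred):
    # Mean absolute percentage error, in %
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(100.0 * np.mean(np.abs((pred - obs) / obs)))

def corr(obs, pred):
    # Pearson correlation coefficient R
    return float(np.corrcoef(obs, pred)[0, 1])

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.9]
```

For these sample values, RMSE ≈ 0.1323, MAPE ≈ 6.04% and R ≈ 0.993.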

## RESULTS OF THE DEMONSTRATION STUDY

### Study data

An hourly water supply flow series measured from July 1, 2013 to July 31, 2013 for a water supply division of Shanghai was used for the case study, with a total of 744 data points in the entire study period, as shown in Figure 4. The July water demand data were chosen because the forecast is on an hourly scale and has little correlation with seasonal factors; in addition, July is a summer month and covers part of the maximum water demand, which is generally high and highly volatile. Since water management as well as short-term scheduling optimization of pumping stations is considered, the objective of the model is to predict future hourly water demand in real time, with the historical data updated hourly; furthermore, the daily hourly water demand appears to be cyclical in the short term. Therefore, the last 24 hourly data points (July 31, 2013) are used as the actual values for verification of the prediction models. To ensure that the models were fully trained, all the remaining historical data were used as the training set, so the first 720 data points (July 1, 2013–July 30, 2013) are used as the known historical data to train the models.

### Outlier analysis

We used MATLAB to perform an outlier test on the original sequence with the ‘median’ method, which returns ‘true’ for elements more than three scaled MAD from the median. The scaled MAD is defined as $c \cdot \mathrm{median}(|A - \mathrm{median}(A)|)$, where $c = -1/(\sqrt{2}\,\mathrm{erfcinv}(3/2)) \approx 1.4826$. Detected outliers are highlighted in Figure 5. By analyzing the outliers, we found that most of them are water demands in the early morning hours, with a small number occurring in the peak hours. These outliers contribute a certain volatility to the whole series and are consistent with the daily water-use pattern, so we chose to keep them.
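The same ‘median’ rule can be reproduced outside MATLAB; the sketch below mirrors the scaled-MAD definition using scipy's `erfcinv` (the sample data are illustrative, with one obvious spike):

```python
import numpy as np
from scipy.special import erfcinv

def mad_outliers(a):
    # True where an element lies more than 3 scaled MAD from the median,
    # matching MATLAB's isoutlier(a, 'median') definition.
    a = np.asarray(a, float)
    c = -1.0 / (np.sqrt(2.0) * erfcinv(1.5))          # ~= 1.4826
    med = np.median(a)
    scaled_mad = c * np.median(np.abs(a - med))
    return np.abs(a - med) > 3.0 * scaled_mad

data = np.array([10.0, 11.0, 9.5, 10.5, 10.2, 30.0, 9.8])
mask = mad_outliers(data)   # only the 30.0 spike is flagged
```

The constant $c$ makes the scaled MAD a consistent estimator of the standard deviation for normally distributed data, which is why the threshold is expressed as "three scaled MAD".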

### Nonlinear analysis

The linear and nonlinear characteristics of the data can be tested by the BDS test method proposed by Broock *et al.* (1996), that is, the linear regression residual method. Before performing the BDS test, in order to eliminate the linear correlation component in the original series, a linear autoregression of the original series was first fitted with an AR model, and the test was applied to the fitted residuals. The BDS test was then carried out with the aid of Eviews software, where the embedding dimensions were set following Brock's suggestion and *r* was set to 0.7 times the variance of the data in the phase space (Broock *et al.* 1996). As shown in Table 2, the probability (Prob.) of the test statistic is 0.00000 for each embedding dimension, which is less than 0.05 and indicates that the data have nonlinear characteristics.

| Dimensions | BDS statistic | Std. error | z-statistic | Prob. |
|---|---|---|---|---|
| 2 | 0.085532 | 0.004984 | 17.161730 | 0.00000 |
| 3 | 0.137316 | 0.005686 | 24.149430 | 0.00000 |
| 4 | 0.174202 | 0.004877 | 35.716930 | 0.00000 |


### Data normalization

Before modeling, the data were normalized with the MATLAB function mapminmax:

$$y = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}$$

where $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of the output range of *y*, respectively, specified as scalars (−1 and 1 by default); *x* is the matrix you want to process, specified as an *N*-by-*Q* matrix; and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of each row of the input matrix *x*, respectively.
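An equivalent of this row-wise mapping can be written as follows; this is a sketch of the formula above, not MATLAB's actual implementation, and it assumes the default output range [−1, 1]:

```python
import numpy as np

def mapminmax_apply(x, y_min=-1.0, y_max=1.0):
    # Row-wise min-max mapping to [y_min, y_max], analogous to
    # the mapminmax formula: y = (ymax-ymin)*(x-xmin)/(xmax-xmin) + ymin
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

row = np.array([1.0, 2.0, 3.0, 5.0])
y = mapminmax_apply(row)   # endpoints map to -1 and 1 exactly
```

The per-row minima and maxima must be stored if predictions are to be mapped back to the original flow units afterwards, as MATLAB does via the settings structure returned by mapminmax.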

### Prediction results of the single models

First, three commonly used single models (ARIMA, BP and SVM) were applied for prediction. Figure 6 and Table 3 show the predicted results and performance of the three single models. The MAPEs of the three single models were all lower than 10%, which indicates that all three are suitable for hourly water consumption prediction scenarios. Among them, the MAPEs of the BP and SVM algorithms were lower than 5% and better than that of ARIMA, probably because of their nonlinear modeling ability.

| Models | RMSE (m^{3}/h) | MAPE (%) | R (%) |
|---|---|---|---|
| ARIMA | 3,626.1686 | 5.7891 | 88.13 |
| BP | 3,183.7223 | 4.3857 | 91.72 |
| SVM | 3,199.7851 | 3.8470 | 90.18 |


### Prediction results of three hybrid models

In order to further improve the prediction performance, the EEMD algorithm was introduced to develop hybrid prediction models. Based on the above three single models and the EEMD algorithm, three hybrid models (EEMD-ARIMA, EEMD-BP and EEMD-SVM) were developed and compared in the following sections.

### Decomposition results of original data based on EEMD

The MATLAB EEMD toolkit was used to decompose the original hourly water supply flow series. The standard deviation of the added white noise was set to 0.2, and the ensemble number NE was set to 100 (Zhang *et al.* 2008, 2010; Ren *et al.* 2017). The decomposition results are shown in Figure 7. The original water supply flow series was decomposed into eight independent IMF components (IMF1, IMF2, …, IMF8) and a Res component, arranged in order of frequency from high to low. It can be seen from Figure 7 that the periodicity of the eight IMFs increases gradually with decreasing frequency, while their amplitudes decrease successively.

### Prediction results of EEMD-ARIMA

The ARIMA algorithm was used to predict each of the eight IMFs and the Res component obtained by EEMD decomposition. For each component, the first 720 data points were used as input to the ARIMA model to predict the water demand in the next 24 h. The model prediction results were obtained by summing the predictions of the nine components. The ARIMA model parameters (*p*, *d*, *q*) corresponding to the different components are shown in Table 4, and the prediction results are shown in Figure 8. The final prediction results were compared with the actual values (24 data points), and the performance of the EEMD-ARIMA model was evaluated by RMSE, MAPE and R, which were 3,713.1656 m^{3}/h, 5.2036% and 86.36%, respectively.

| Components (IMFs) | *p* | *d* | *q* |
|---|---|---|---|
| IMF1 | 4 | 0 | 1 |
| IMF2 | 7 | 0 | 0 |
| IMF3 | 8 | 0 | 0 |
| IMF4 | 10 | 0 | 0 |
| IMF5 | 4 | 4 | 10 |
| IMF6 | 8 | 5 | 6 |
| IMF7 | 6 | 5 | 6 |
| IMF8 | 9 | 5 | 3 |
| Res | 10 | 2 | 0 |


*Note*: *p* is the order of autoregression, *q* is the order of moving average and *d* is the order of difference.

### Prediction results of EEMD-BP

The BP algorithm was used to predict each of the components obtained by EEMD decomposition, forecasting the water supply flow at hour *t*. The number of input layer neurons was set to 5, which means that the water supply data at hours $t-1, t-2, \ldots, t-5$ are used for prediction (Guo *et al.* 2018). The model is shown in the following formula:

$$\hat{Q}_t = f(Q_{t-1}, Q_{t-2}, Q_{t-3}, Q_{t-4}, Q_{t-5})$$

where $Q_i$ is the water supply flow at hour *i*. The number of hidden layer neurons was determined by the empirical formula $n_h = \sqrt{m + n} + a$, where *m* is the number of input layer neurons, *n* is the number of output layer neurons and *a* is a parameter in the range [1,10]. In this case study, the resulting range of the number of hidden layer neurons was [3,13]. The number of hidden layer neurons for each component was optimized by the enumeration method, and the results are shown in Table 5. Other parameters of the BP algorithm are shown in Table 6. The predicted results are shown in Figure 9. RMSE, MAPE and R of EEMD-BP were 1,036.7634 m^{3}/h, 1.4460% and 98.91%, respectively, all better than those of the single BP model. This shows that EEMD decomposition can improve the prediction performance of the BP algorithm.

| Components | Number of hidden layer neurons |
|---|---|
| IMF1 | 8 |
| IMF2 | 9 |
| IMF3 | 11 |
| IMF4 | 9 |
| IMF5 | 10 |
| IMF6 | 4 |
| IMF7 | 8 |
| IMF8 | 13 |
| Res | 10 |


| Parameters | Value settings |
|---|---|
| Training function | Trainlm |
| Transfer function | ‘tansig’ for hidden layers, ‘purelin’ for output layers |
| Epoch | 10,000 |
| Performance | 10^{−5} |
| Training speed | 0.01 |
| Momentum parameter | 0.9 |


### Prediction results of EEMD-SVM

Some of the parameter settings in the LIBSVM toolbox are shown in Table 7.

| Parameter | -s | -t | -p |
|---|---|---|---|
| Value settings | 3 | 2 | 0.01 |


*Note*: -*s* is the type of SVM, setting to 3 means e-SVR; *-t* is the type of kernel function, setting to 2 means radial basis function; *-p* is the value of the loss function in the e-SVR.

The final prediction result is shown in Figure 10. RMSE, MAPE and R of EEMD-SVM were 892.9561 m^{3}/h, 1.3424% and 99.21%, respectively. First, all three indicators were better than those of the single SVM model, which indicates that EEMD decomposition can dramatically improve the prediction performance of the SVM algorithm. Second, the prediction result of EEMD-SVM was better than that of EEMD-BP.

### Impact of historical data length on model prediction accuracy

In real practice, computation time increases if the historical training data used for the above models are too long; conversely, if the training data are too short, model performance degrades significantly. Therefore, the appropriate length of historical data is an important parameter to be optimized in actual projects. Taking MAPE and RMSE as evaluation indicators, the impacts of historical data length on the prediction performance of the three hybrid models were investigated.
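The evaluation harness used in the following subsections can be sketched as below. MAPE and RMSE follow their standard definitions; the synthetic series and the simple hour-of-day-average predictor are placeholders for the hybrid models, included only to show the structure of the window-length experiment.

```python
import numpy as np

def mape(obs, pred):
    """Mean absolute percentage error, in percent."""
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))

def rmse(obs, pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

rng = np.random.default_rng(0)
# 31 days of synthetic hourly demand (m^3/h) with a daily cycle.
t = np.arange(24 * 31)
hourly = 10_000 + 3_000 * np.sin(2 * np.pi * t / 24) \
         + 100 * rng.standard_normal(t.size)

test_day = hourly[-24:]                   # last day held out for verification
for n_days in (1, 7, 12):                 # candidate training lengths (days)
    train = hourly[-24 * (n_days + 1):-24]
    pred = train.reshape(n_days, 24).mean(axis=0)   # hour-of-day average
    print(n_days, round(mape(test_day, pred), 2), round(rmse(test_day, pred), 1))
```

In the study, the predictor in the loop is one of the EEMD hybrid models, and the scores are tabulated for training lengths of 1 through 12 days.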

### Impact of historical data length on EEMD-ARIMA model performance

In order to find the relationship between training data length and prediction performance, the adjacent 1, 2, 3, …, 11 and 12 days of historical data (training data) were used, respectively, to predict the hourly water consumption (24 values) on July 31. The fitting results are shown in Figure 11, and the variations of MAPE and RMSE are shown in Figure 12. From Figures 11 and 12, we can see that the MAPE and RMSE of the EEMD-ARIMA models decrease gradually as the length of historical data increases. Balancing the requirement to reduce the length of the training data against the accuracy of the prediction model, the optimal historical data length was selected as 11 days.

### Impact of historical data length on EEMD-BP model performance

Similarly, for the EEMD-BP model, the adjacent 1, 2, 3, …, 11 and 12 days of historical data (training data) were used, respectively, to predict the hourly water consumption (24 values) on July 31. The fitting results are shown in Figure 13, and the variations of MAPE and RMSE are shown in Figure 14. From Figures 13 and 14, we can see that the MAPE and RMSE of the EEMD-BP models decrease gradually as the length of historical data increases. Balancing the requirement to reduce the length of the training data against the accuracy of the prediction model, the optimal historical data length was selected as 11 days.

### Impact of historical data length on EEMD-SVM model performance

For the EEMD-SVM model, the adjacent 1, 2, 3, …, 11 and 12 days of historical data (training data) were also used, respectively, to predict the hourly water consumption (24 values) on July 31. The fitting results are shown in Figure 15, and the variations of MAPE and RMSE are shown in Figure 16. From Figures 15 and 16, we can see that the MAPE and RMSE of the EEMD-SVM models generally decrease as the length of historical data increases. Balancing the requirement to reduce the length of the training data against the accuracy of the prediction model, the optimal historical data length was selected as 10 days.

## DISCUSSION

Three single models (ARIMA, BP and SVM) and three hybrid models (EEMD-ARIMA, EEMD-BP and EEMD-SVM) were investigated and compared in this study, as shown in Table 8.

| Forecasting models | RMSE (m^{3}/h) | MAPE (%) | R (%) |
|---|---|---|---|
| ARIMA | 3,626.1686 | 5.7891 | 88.13 |
| BP | 3,183.7223 | 4.3857 | 91.72 |
| SVM | 3,199.7851 | 3.8470 | 90.18 |
| EEMD-ARIMA | 3,713.1656 | 5.2036 | 86.36 |
| EEMD-BP | 1,036.7634 | 1.4460 | 98.91 |
| EEMD-SVM | 892.9561 | 1.3424 | 99.21 |


Based on the results of the above six models, the optimal prediction model can be selected in terms of prediction accuracy and computation time. According to Table 8, EEMD-SVM performed best among the six models, followed by the EEMD-BP, SVM and BP models. First, the prediction performance of the artificial intelligence models (BP and SVM) was superior to that of the linear ARIMA model, which is consistent with the previous literature (Li & Huicheng 2009; Antunes *et al.* 2018). The water consumption process is affected by many kinds of factors and is complicated, nonstationary and nonlinear; machine-learning models can handle these characteristics better than traditional linear models. In addition, the artificial intelligence models (BP and SVM) required less computation time than the linear model (ARIMA). Although their physical mechanisms are still not clear, artificial intelligence prediction models (deep-learning or data-driven models) will be more and more widely used in the future. Second, the EEMD hybrid models outperformed the single models, which reveals that EEMD is an effective algorithm for improving prediction models. This is supported by previous studies on prediction models coupled with EEMD (Lin *et al.* 2012; Wei & Chen 2012; Wang *et al.* 2016). Coupled with EEMD decomposition, the MAPEs of the ARIMA, BP and SVM models decreased by 10.114, 67.029 and 65.105%, respectively. Because suitable models can be selected for the different variation characteristics of the water demand series, decomposition-and-ensemble prediction methodologies still have great potential for improving prediction accuracy. Finally, seasonal factors have little influence on short-term (e.g., hourly) water consumption prediction; for forecasting on hourly scales, the use of historical data alone is sufficient (Cutore *et al.* 2008; Bakker *et al.* 2013b; Guo *et al.* 2018). Therefore, the advantages of ARIMA cannot be exploited, and it is not suitable for this application.
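The quoted MAPE reductions from coupling with EEMD follow directly from the Table 8 values:

```python
# MAPE of each single model and its EEMD-coupled counterpart (from Table 8).
pairs = {"ARIMA": (5.7891, 5.2036),
         "BP": (4.3857, 1.4460),
         "SVM": (3.8470, 1.3424)}

for name, (single, hybrid) in pairs.items():
    reduction = 100.0 * (single - hybrid) / single
    print(f"{name}: MAPE reduced by {reduction:.3f}%")
# prints reductions of 10.114%, 67.029% and 65.105% for ARIMA, BP and SVM
```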

Since the EEMD-SVM model performed best of all the models, we ran it again using longer data to predict hourly water demand for the coming week; the fitting results are shown in Figure 17. The resulting MAPE was 1.24%, again a satisfactory performance. Therefore, the EEMD-SVM model is valuable for future pumping station scheduling and management.

Besides prediction accuracy, the computation time of the prediction algorithm was another factor to consider. To this end, the historical data used for algorithm training were specifically investigated, and the optimal length of the training data was recommended for the different prediction models. The results are shown in Table 9. In addition, we performed partial autocorrelation analysis on the historical data; the results showed that the partial autocorrelation was higher for lag values between 150 and 200 than elsewhere, indicating that the data in that range have a certain influence on prediction accuracy, as shown in Figure 18. The lags in the interval 150–200 bracket one week of hourly data (7 × 24 = 168), which is consistent with a week-long water-use pattern: the water consumption process has two significant modes, a working-day mode and a rest-day mode. From this perspective, the historical data should cover both modes, so the length of the training data should be longer than 7 days (one full cycle). Taking these considerations and the simulation results in Table 9 into account, 10 or 11 days is the optimal training data length for the EEMD hybrid models.
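The week-long cycle can be checked directly: at an hourly sampling interval, one week corresponds to 7 × 24 = 168 lags, which falls inside the reported 150–200 band. A minimal sketch on a synthetic series with a weekday/weekend pattern (the series is an assumption, not the study's data) shows the expected correlation structure:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(24 * 31)
# Daily cycle whose amplitude drops at weekends (a weekly pattern).
weekday = np.where((t // 24) % 7 < 5, 1.0, 0.7)
y = weekday * np.sin(2 * np.pi * t / 24) + 0.05 * rng.standard_normal(t.size)

def acf(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

week_lag = 7 * 24           # 168 hours, inside the 150-200 lag band
print(acf(y, week_lag))     # strong positive: same weekday, same hour
print(acf(y, 84))           # half-week offset, anti-phase daily cycle
```

The study uses the partial autocorrelation (Figure 18), which additionally removes the influence of shorter lags, but the one-week alignment seen here is the same effect.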

| Historical data length (d) | Forecasting models | RMSE (m^{3}/h) | MAPE (%) |
|---|---|---|---|
| 1 | EEMD-ARIMA | 117,554.6544 | 167.1663 |
| | EEMD-BP | 4,282.9784 | 5.9437 |
| | EEMD-SVM | 2,978.1158 | 4.1900 |
| 2 | EEMD-ARIMA | 25,529.9031 | 23.7486 |
| | EEMD-BP | 2,360.8980 | 3.3854 |
| | EEMD-SVM | 1,187.2721 | 1.7300 |
| 3 | EEMD-ARIMA | 7,984.7288 | 11.3634 |
| | EEMD-BP | 2,467.6211 | 3.6033 |
| | EEMD-SVM | 1,292.8952 | 2.0000 |
| 4 | EEMD-ARIMA | 13,003.7349 | 18.1692 |
| | EEMD-BP | 1,920.4515 | 2.4817 |
| | EEMD-SVM | 1,062.3696 | 1.6000 |
| 5 | EEMD-ARIMA | 7,585.2915 | 10.5601 |
| | EEMD-BP | 1,870.5918 | 2.8169 |
| | EEMD-SVM | 1,037.3897 | 1.5100 |
| 6 | EEMD-ARIMA | 6,261.3510 | 6.9679 |
| | EEMD-BP | 1,123.9871 | 1.6762 |
| | EEMD-SVM | 971.2917 | 1.3400 |
| 7 | EEMD-ARIMA | 4,956.9568 | 6.5327 |
| | EEMD-BP | 1,170.9357 | 1.7468 |
| | EEMD-SVM | 945.0464 | 1.4500 |
| 8 | EEMD-ARIMA | 3,459.7802 | 4.5406 |
| | EEMD-BP | 1,345.7225 | 1.6781 |
| | EEMD-SVM | 937.1297 | 1.3900 |
| 9 | EEMD-ARIMA | 3,441.4380 | 4.7515 |
| | EEMD-BP | 1,211.5804 | 1.8909 |
| | EEMD-SVM | 1,058.1079 | 1.5600 |
| 10 | EEMD-ARIMA | 3,473.5999 | 4.4029 |
| | EEMD-BP | 1,320.6420 | 1.8609 |
| | EEMD-SVM | 694.4904 | 1.0000 |
| 11 | EEMD-ARIMA | 2,799.1188 | 3.5437 |
| | EEMD-BP | 938.6738 | 1.3486 |
| | EEMD-SVM | 735.0543 | 1.0400 |
| 12 | EEMD-ARIMA | 4,480.1003 | 6.3668 |
| | EEMD-BP | 1,401.2050 | 1.9151 |
| | EEMD-SVM | 986.4205 | 1.4700 |


## CONCLUSIONS

In this study, the prediction performance of three single models (ARIMA, BP and SVM) and three hybrid models (EEMD-ARIMA, EEMD-BP and EEMD-SVM) was investigated and compared based on three evaluation indicators (RMSE, MAPE and R). It can be concluded that (1) the MAPEs of EEMD-ARIMA, EEMD-BP, EEMD-SVM, ARIMA, BP and SVM are 5.2036, 1.4460, 1.3424, 5.7891, 4.3857 and 3.8470%, respectively. (2) In terms of prediction accuracy and practical convenience, EEMD-SVM performs best among the above six models. (3) The EEMD algorithm is effective for improving prediction accuracy. (4) The optimal historical data lengths of EEMD-ARIMA, EEMD-BP and EEMD-SVM are 11, 11 and 10 days, respectively.

## ACKNOWLEDGEMENTS

We are very grateful to the editors and anonymous reviewers for their insightful suggestions and comments on this paper.

## CONFLICT OF INTEREST STATEMENT

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.