## Abstract

A hybrid model based on the mind evolutionary algorithm is proposed to predict hourly water demand. In the hybrid model, hourly water demand data are first reconstructed to generate samples that effectively represent the characteristics of the time series. Then, the mind evolutionary algorithm is integrated into a back propagation neural network (BPNN) to improve prediction performance. To investigate the potential of the proposed model for hourly water demand forecasting, its prediction performance was evaluated on real hourly water demand data. In addition, the proposed model was compared with a traditional BPNN model and with another hybrid model in which the genetic algorithm (GA) is used to optimize the BPNN. The results show that the proposed model achieves satisfactory prediction performance in hourly water demand forecasting and, on the whole, outperforms all other models in the comparison in both prediction accuracy and stability. These findings suggest that the proposed model can be a novel and effective tool for hourly water demand forecasting.

## HIGHLIGHTS

A unified framework is developed to select input variables for an hourly water demand forecasting model.

Mind evolutionary algorithm, a novel and powerful optimization algorithm, is used to obtain the optimal initial weights and thresholds for the back propagation neural network.

A hybrid model coupling mind evolutionary algorithm and back propagation neural network is proposed to predict hourly water demand.

## INTRODUCTION

Water demand forecasting is of great importance for water resources management and planning in water distribution systems. With the rapid development of smart management, high-accuracy forecasts have become an important topic in water distribution systems. Therefore, a wide variety of models based on different forecast horizons and forecast periodicities have been developed over the past few decades.

Several review papers discuss the various models used for water demand forecasting (Donkor *et al.* 2014; Ghalehkhondabi *et al.* 2017). In general, these models can be categorized into two main groups: conventional models (e.g. auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA)), and artificial intelligence (AI)-based models (e.g. artificial neural networks (ANNs)). Conventional models have the advantages of a simple structure and a low computational load, and they can yield good forecasts when the time series is stationary. However, they often fail to capture the nonlinear relationships in time series data, which limits their application in water demand forecasting. To address this problem, researchers have paid more attention to AI-based models, especially ANNs, which can effectively capture the nonlinear behavior of water demand time series. Jain *et al.* (2001) developed an ANN for weekly peak demand forecasting at the Indian Institute of Technology, Kanpur. Bougadis *et al.* (2005) conducted a comparative study of ANNs, regression models and seven time-series models for short-term municipal water demand forecasting, and found that ANNs performed best. Similar studies can also be found in Adamowski (2008) and Herrera *et al.* (2010). More recently, Banjac *et al.* (2015) used ANNs with an adaptive tuning procedure of model parameters for 24-hour-ahead water demand prediction, and Bata *et al.* (2020) applied nonlinear autoregressive ANNs to forecast 24-hour-ahead and one-week-ahead water demand. All of these applications indicate that ANNs often predict better than conventional models.
However, owing to the complex nonlinearity, high irregularity and multi-scale variability of water demand time series, ANNs alone cannot always handle these issues appropriately, which may lead to poor predictions in some circumstances.

To improve prediction performance, a variety of hybrid models combining ANNs with other models or methods have been developed and widely used in recent years. One way to improve the forecasting performance of ANNs is to apply data preprocessing techniques. Odan & Reis (2012) developed two hybrid models coupling ANNs with Fourier series for 24-hour forecasts and obtained promising results. Tiwari & Adamowski (2013) proposed a hybrid wavelet–bootstrap–neural network model for short-term water demand forecasting and found that it produced significantly more accurate forecasts than other models such as traditional ANNs and ARIMA. More recently, Xu *et al.* (2019) proposed a hybrid model combining chaos theory and a continuous deep belief neural network for daily water demand forecasting. Another way to achieve better predictions is to tune the parameters of ANNs using optimization algorithms such as the genetic algorithm (GA). Many studies using this approach can be found in other fields (Karimi & Yousefi 2012; Mohandes 2012; Wang *et al.* 2017). In the field of water demand forecasting, Pulido-Calvo & Gutierrez-Estrada (2009) predicted daily water demand using a hybrid model consisting of ANNs, GA and fuzzy logic. The results of the aforementioned studies demonstrate the effectiveness of hybrid models in improving prediction performance.

As Arjmand *et al.* (2020) pointed out, input variables and prediction algorithms are two main factors that can significantly affect the accuracy of short-term forecasts. With respect to these two factors, there is still room to further improve the performance of hybrid ANN-based models: (1) although a number of methods have been developed to select input variables, most of them rely on complex computations or algorithms, which makes them inconvenient to implement. Moreover, the input variables selected by these methods often show no obvious regularity and cannot intuitively describe the changing characteristics of water demand time series; and (2) some optimization algorithms (e.g. GA) used to tune the parameters of ANNs are prone to premature convergence or convergence to local minima, which weakens the performance of hybrid models. In this context, a novel hybrid model based on the mind evolutionary algorithm (MEA) is proposed for hourly water demand forecasting. MEA is a novel evolutionary algorithm that has shown strong global search performance (Zhou *et al.* 2014; Wu & Zhang 2019; Zhang *et al.* 2019), but few studies have applied it to water demand forecasting to date. The proposed model is composed of three parts: reconstruction of the hourly water demand series, MEA and a back propagation neural network (BPNN). First, the time series is reconstructed to generate appropriate samples, including inputs and outputs. Then, MEA is adopted to optimize the initial weights and thresholds of the BPNN. Finally, the BPNN with the optimized parameters is applied to prediction. For convenience, the proposed model is referred to as the MEA_BPNN model, while the hybrid model that uses GA to optimize the BPNN is named the GA_BPNN model.

The remainder of the paper is organized as follows. Section 2 outlines the methodology. In Section 3, hourly water demand data obtained from a real-world water distribution system are used to evaluate the prediction performance of the proposed model, followed by comparisons with a traditional BPNN model and the GA_BPNN model. Conclusions are presented in Section 4.

## METHODOLOGY

### Data preprocessing

Prediction models require large amounts of observed data, but in the real world, poor-quality data are often unavoidable. Data quality has a great influence on model performance, so outliers and missing data should be handled before modeling. In this study, missing values and outliers are replaced by the average of the water demand values at the same hour over the previous week.
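A minimal Python sketch of this replacement rule is given below. It interprets "the same hour over the previous week" as the same hour on each of the seven preceding days, and it assumes that missing values and outliers have already been flagged by the caller, since the detection rule is not specified here; the function name is hypothetical.

```python
import numpy as np

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def impute_same_hour_weekly_mean(demand, flagged):
    """Replace flagged hourly values with the mean demand at the same hour
    on the preceding days of the previous week (hypothetical helper).

    demand  : 1-D sequence of hourly water demand
    flagged : boolean mask marking missing values or detected outliers
              (how outliers are detected is an assumption left to the caller)
    """
    cleaned = np.asarray(demand, dtype=float).copy()
    flagged = np.asarray(flagged, dtype=bool)
    # Work forward in time so earlier replacements can feed later ones.
    for i in np.flatnonzero(flagged):
        lags = [i - d * HOURS_PER_DAY for d in range(1, DAYS_PER_WEEK + 1)]
        lags = [k for k in lags if k >= 0]  # use only days that exist
        if not lags:
            continue  # no history at all: leave the value unchanged
        cleaned[i] = cleaned[lags].mean()
    return cleaned
```

Processing flagged hours in time order means that a value repaired earlier can be reused when a later hour at the same time of day is also flagged.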

### Structure of samples for BPNN

According to the classification criteria based on forecast horizons (Donkor *et al.* 2014), hourly water demand forecasting falls into the category of short-term prediction. Two kinds of variables are commonly used for short-term water demand forecasting: water demand data and weather-related variables (e.g. rainfall, temperature, humidity). However, too many variables may reduce the practicability of a model, because reliably collecting and tracking these data in a timely manner is a great challenge for water utilities. Moreover, water demand data themselves reflect the combined effect of multiple factors and implicitly contain the relationship between weather-related information and water demand. That is why some forecast models can achieve reliable forecasts using water demand data as the only inputs (Bougadis *et al.* 2005; Bakker *et al.* 2013). For this reason, hourly water demand data are selected as the only inputs for the BPNN in this study.

Unfortunately, there is still no generally accepted method for determining how many and which water demand data should be selected as inputs for a BPNN. In this study, a unified framework for input selection was developed to handle this issue. The main purpose of this framework is to provide an effective and convenient way to select input variables, without complex mathematical calculations or algorithms. To achieve this objective, hourly water demand data are reconstructed according to the characteristics of hourly water demand series.

Generally, an hourly water demand series displays two notable characteristics. The first is periodicity, which is mainly driven by users' water use behavior at different times (e.g. morning and evening, working days and weekends). It usually manifests as: (1) similar hourly water demand at the same hour on different days; and (2) similar hourly water demand at the same hour on the same day of the week in different weeks. The second characteristic follows the principle of ‘big when near and small when far’: the closer a past observation is in time, the greater its influence on current demand. Based on these analyses, the hourly water demand data were reconstructed. The final data structure of a sample, including inputs and output, is shown in Table 1.

Component of a sample | Data |
---|---|
Input variables | T(i−j) ∼ T(i−1) |
 | D_{1}(i) ∼ D_{n}(i) |
 | W_{1}(i) ∼ W_{m}(i) |
Output variable | T(i) |

Note: *T*(*i*) = hourly water demand at time *i*; *D*_{n}(*i*) = hourly water demand at time *i* on the *n*-th previous day; *W*_{m}(*i*) = hourly water demand at time *i* on the same day of the week in the *m*-th previous week.
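The reconstruction in Table 1 can be sketched as a sliding window over the hourly series. This sketch assumes 24 hours per day and 168 hours per week, orders the inputs as listed for Model#1 in Table 6 (recent hours, then daily lags, then weekly lags), and starts at the first hour for which every lag exists; the function name is hypothetical. With *j* = 4, *n* = 4 and *m* = 3 it yields 11 inputs per sample, matching the 11 input nodes in Table 2.

```python
import numpy as np

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 168


def build_samples(demand, j=4, n=4, m=3):
    """Build (inputs, output) pairs following the Table 1 structure.

    Inputs for the target T(i), in this order:
      T(i-j) ... T(i-1)   the j most recent hours
      D_n(i) ... D_1(i)   the same hour on each of the n previous days
      W_m(i) ... W_1(i)   the same hour and weekday in the m previous weeks
    """
    demand = np.asarray(demand, dtype=float)
    # Earliest target with a complete history (the longest lag dominates).
    start = max(j, n * HOURS_PER_DAY, m * HOURS_PER_WEEK)
    X, y = [], []
    for i in range(start, len(demand)):
        recent = demand[i - j:i]
        daily = [demand[i - k * HOURS_PER_DAY] for k in range(n, 0, -1)]
        weekly = [demand[i - k * HOURS_PER_WEEK] for k in range(m, 0, -1)]
        X.append(np.concatenate([recent, daily, weekly]))
        y.append(demand[i])
    return np.array(X), np.array(y)
```

Applied to 4,200 hourly values, this produces 3,696 samples of 11 inputs each, because the first target needs *W*_{3}(*i*), which lies 504 hours earlier.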

### BPNN model

BPNN is a multilayer feedforward neural network that is commonly used in water demand forecasting (Bougadis *et al.* 2005). In the training phase, a BPNN learns from the training samples using the error back propagation (BP) algorithm, whose key idea is to minimize the mean square error between the network outputs and the expected outputs by gradient descent. A typical BPNN consists of one input layer, one hidden layer and one output layer. Details of developing a BPNN model can be found in many other references.
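A minimal NumPy sketch of a three-layer BPNN trained by back propagation, i.e. full-batch gradient descent on the mean square error, is shown below. The tanh hidden layer, linear output layer, uniform initialization range, learning rate and number of epochs are assumptions for illustration; the study used Matlab with default settings, whose training algorithm may differ.

```python
import numpy as np


def init_params(n_in, n_hidden, rng):
    """Random initial weights and thresholds (biases), as in a plain BPNN."""
    return {
        "W1": rng.uniform(-0.5, 0.5, (n_in, n_hidden)),
        "b1": rng.uniform(-0.5, 0.5, n_hidden),
        "W2": rng.uniform(-0.5, 0.5, (n_hidden, 1)),
        "b2": rng.uniform(-0.5, 0.5, 1),
    }


def forward(p, X):
    """Return hidden activations and network outputs."""
    h = np.tanh(X @ p["W1"] + p["b1"])           # hidden layer
    return h, (h @ p["W2"] + p["b2"]).ravel()    # linear output layer


def mse(p, X, y):
    return float(np.mean((forward(p, X)[1] - y) ** 2))


def train_bp(p, X, y, lr=0.05, epochs=500):
    """Back propagation: gradient descent on the mean square error."""
    n = len(y)
    for _ in range(epochs):
        h, out = forward(p, X)
        err = (out - y)[:, None]                 # output error, shape (n, 1)
        grad_W2 = h.T @ err * 2 / n
        grad_b2 = err.sum(axis=0) * 2 / n
        delta_h = (err @ p["W2"].T) * (1 - h ** 2)   # error propagated through tanh
        grad_W1 = X.T @ delta_h * 2 / n
        grad_b1 = delta_h.sum(axis=0) * 2 / n
        p["W2"] -= lr * grad_W2
        p["b2"] -= lr * grad_b2
        p["W1"] -= lr * grad_W1
        p["b1"] -= lr * grad_b1
    return p
```

Because `init_params` draws the starting point at random, two runs on the same data generally end at different networks, which is the run-to-run variability that motivates optimizing the initial weights and thresholds.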

### GA_BPNN model

GA is a parallel stochastic search algorithm that simulates natural genetic mechanisms and biological evolution. It has been applied in a number of fields (Baker & Ayechew 2003; D'Souza & Simpson 2003; Yao *et al.* 2012). A distinctive feature of GA is that it introduces the biological principle of ‘survival of the fittest’ into parameter optimization. Based on the selected fitness function, individuals are screened through three basic operations (i.e. selection, crossover and mutation): individuals with good fitness values are retained, while those with poor fitness values are eliminated. In this way, new individuals inherit information from the previous generation and tend to outperform it. The three operations are repeated until a termination condition (e.g. a maximum number of iterations) is satisfied.
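The selection, crossover and mutation loop can be sketched as a small real-coded GA that minimizes a cost function. Tournament selection, arithmetic (blend) crossover, Gaussian mutation and elitism are assumed operator choices, because the text does not specify them; the default probabilities, population size and iteration count follow Table 4.

```python
import numpy as np


def ga_minimize(f, dim, pop_size=10, iters=20, pc=0.2, pm=0.1,
                bounds=(-1.0, 1.0), seed=0):
    """Real-coded GA minimizing f; lower cost means higher fitness."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    cost = np.array([f(x) for x in pop])
    for _ in range(iters):
        elite = pop[np.argmin(cost)].copy()
        # Selection: binary tournaments keep the fitter of two random individuals.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((cost[a] < cost[b])[:, None], pop[a], pop[b])
        # Crossover: blend consecutive pairs with probability pc.
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                w = rng.random()
                children[k] = w * parents[k] + (1 - w) * parents[k + 1]
                children[k + 1] = w * parents[k + 1] + (1 - w) * parents[k]
        # Mutation: add Gaussian noise to each gene with probability pm.
        mask = rng.random(children.shape) < pm
        noise = rng.normal(0.0, 0.1 * (hi - lo), children.shape)
        children = np.clip(children + mask * noise, lo, hi)
        # Elitism: the best individual found so far always survives.
        children[0] = elite
        pop = children
        cost = np.array([f(x) for x in pop])
    best = int(np.argmin(cost))
    return pop[best], cost[best]
```

Elitism guarantees that the best cost never gets worse from one generation to the next, but it does not prevent the population from clustering around a local optimum, which is the weakness of GA that the text attributes to prematurity.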

In a traditional BPNN, the initial weights and thresholds are generated randomly, which can easily lead to poor predictions. Therefore, GA is often used to optimize the initial weights and thresholds of the BPNN to improve prediction performance. The flow chart of the GA_BPNN model for hourly water demand forecasting is shown in Figure 1.
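In a hybrid model, each individual of the optimizer must encode every initial weight and threshold of the BPNN. A hypothetical encoding for the 11-5-1 network of Table 2 is sketched below: a flat vector of 11 × 5 + 5 + 5 × 1 + 1 = 66 genes, decoded layer by layer, with the training-set mean square error of the untrained network used as the cost to minimize. The gene order, the tanh hidden layer and this particular cost definition are assumptions; after optimization, the decoded arrays would serve as the starting point for back propagation training.

```python
import numpy as np

N_IN, N_HIDDEN, N_OUT = 11, 5, 1  # network size from Table 2
DIM = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN * N_OUT + N_OUT  # 66 genes


def decode(vector):
    """Split one optimizer individual into BPNN weights and thresholds
    (hypothetical gene order: W1, b1, W2, b2)."""
    v = np.asarray(vector, dtype=float)
    if v.size != DIM:
        raise ValueError(f"expected {DIM} genes, got {v.size}")
    s1 = N_IN * N_HIDDEN
    s2 = s1 + N_HIDDEN
    s3 = s2 + N_HIDDEN * N_OUT
    W1 = v[:s1].reshape(N_IN, N_HIDDEN)
    b1 = v[s1:s2]
    W2 = v[s2:s3].reshape(N_HIDDEN, N_OUT)
    b2 = v[s3:]
    return W1, b1, W2, b2


def cost(vector, X, y):
    """Training-set MSE of the network initialized from `vector`
    (assumed cost: lower cost means higher fitness or score)."""
    W1, b1, W2, b2 = decode(vector)
    out = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
    return float(np.mean((out - y) ** 2))
```

Any optimizer that minimizes a function of a 66-element vector, GA or MEA alike, can be plugged in by passing `lambda v: cost(v, X_train, y_train)` as its objective, where `X_train` and `y_train` are hypothetical names for the training samples.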

### MEA_BPNN model

MEA is an emerging optimization algorithm inspired by the activities of the human mind (Sun *et al.* 1999). It was developed to overcome some drawbacks of GA (e.g. slow convergence and prematurity). Accordingly, MEA retains some basic concepts of GA, such as ‘group’, ‘individual’ and ‘environment’, while introducing new concepts, such as ‘superior subgroup’, ‘temporary subgroup’ and ‘billboard’, to describe its search process. In MEA, all individuals of each generation constitute a group, and the group is divided into a number of subgroups, each classified as either a superior subgroup or a temporary subgroup. The superior subgroups record the information of the winners in the global competition, and the temporary subgroups record the process of the global competition. A ‘billboard’ is an information platform that enables information exchange between individuals and subgroups. There are two kinds of billboards: a global billboard, which posts the information of each subgroup, and local billboards, which post the information of the individuals within each subgroup. In addition, MEA uses the similar-taxis and dissimilation operations in place of the crossover and mutation operations of GA, respectively. The basic procedure of MEA can be described as follows:

- (1)
A population of a given size is randomly generated in the solution space, and the individuals with the highest scores are selected as superior individuals and temporary individuals. Note that the score here corresponds to the fitness value in GA, which represents an individual's adaptability to the environment.

- (2)
Taking these superior individuals and temporary individuals as centers, some new individuals are created around these centers. Subsequently, a number of superior subgroups and temporary subgroups can be obtained.

- (3)
A similar-taxis operation (local competition within each subgroup) is performed until all subgroups mature, that is, until no new winner emerges within any subgroup. The score of the best individual in a subgroup is taken as the score of that subgroup.

- (4)
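The scores of all subgroups are posted on the global billboard. Then, a dissimilation operation (global competition between subgroups) is performed: a superior subgroup whose score is lower than that of a temporary subgroup is replaced by that temporary subgroup, subgroups that lose the competition are abandoned and their individuals released, and new temporary subgroups are generated to keep their number constant.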

- (5)
The above steps are repeated until the number of iterations is satisfied or the score of the optimal individual remains the same. The individual with the highest score in all subgroups is selected as the global optimal individual. The structure diagram of MEA can be seen in Figure 2.
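The five steps above can be sketched in Python as a simplified MEA that minimizes a cost function (score = −cost). This is an interpretation for illustration, not the reference implementation of Sun *et al.* (1999): the Gaussian sampling radius `sigma`, the cap `max_taxis_steps`, the fixed iteration count (without the "score remains the same" stopping rule) and the re-creation of all temporary subgroups at random in every iteration are assumptions. The default population size, subgroup numbers and iteration count follow Table 2.

```python
import numpy as np


def mea_minimize(f, dim, pop_size=200, n_superior=5, n_temporary=5,
                 iters=20, sigma=0.1, bounds=(-1.0, 1.0), seed=0,
                 max_taxis_steps=50):
    """Simplified mind evolutionary algorithm minimizing f (score = -cost)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    group_size = pop_size // (n_superior + n_temporary)

    def similar_taxis(center):
        """Local competition: resample around the subgroup winner until the
        subgroup matures (no new winner emerges)."""
        best, best_score = center, -f(center)
        for _ in range(max_taxis_steps):
            members = np.clip(best + rng.normal(0.0, sigma, (group_size, dim)), lo, hi)
            scores = np.array([-f(x) for x in members])
            k = int(np.argmax(scores))
            if scores[k] <= best_score:
                break  # mature: the winner did not change
            best, best_score = members[k], scores[k]
        return best, best_score  # subgroup score = score of its best individual

    # Steps (1)-(3): random individuals; the best ones seed the subgroups.
    init = rng.uniform(lo, hi, (pop_size, dim))
    order = np.argsort([f(x) for x in init])  # lowest cost (highest score) first
    superior = [similar_taxis(init[k]) for k in order[:n_superior]]
    temporary = [similar_taxis(init[k])
                 for k in order[n_superior:n_superior + n_temporary]]

    for _ in range(iters):
        # Step (4): rank all subgroup scores on the global billboard; the best
        # n_superior become (or stay) superior, the rest are abandoned.
        billboard = sorted(superior + temporary, key=lambda g: g[1], reverse=True)
        superior = [similar_taxis(center) for center, _ in billboard[:n_superior]]
        # Released individuals are replaced by new random temporary subgroups.
        temporary = [similar_taxis(rng.uniform(lo, hi, dim))
                     for _ in range(n_temporary)]

    # Step (5): the global optimum is the best individual of all subgroups.
    best, best_score = max(superior + temporary, key=lambda g: g[1])
    return best, -best_score
```

In this sketch, similar-taxis only ever moves a subgroup winner to a better point, and dissimilation keeps the best subgroups, so the best score is non-decreasing across iterations while the fresh temporary subgroups keep exploring the solution space.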

Similar to GA, MEA is also employed to optimize the initial weights and thresholds for BPNN so as to achieve better forecasts. Figure 3 shows the flow chart of hourly water demand forecasting using the MEA_BPNN model.

### Evaluation of prediction performance

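Two metrics, the root mean square error (RMSE) and the mean absolute percentage error (MAPE), are used to evaluate prediction performance (… *et al.* 2014):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$$

$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|$$

where $y_i$ and $\hat{y}_i$ represent the observed and forecast values at time *i*, respectively, and *n* is the number of observed (or forecast) values.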
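Both metrics translate directly into code. The sketch below assumes that no observed value is zero, since MAPE is undefined in that case; RMSE is returned in the units of the data (m^{3}/h here) and MAPE in percent.

```python
import numpy as np


def rmse(observed, forecast):
    """Root mean square error, in the units of the data."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((o - f) ** 2)))


def mape(observed, forecast):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs((o - f) / o)))
```

For example, observed values of 100 and 200 m^{3}/h with forecasts of 110 and 190 m^{3}/h give an RMSE of 10 m^{3}/h and a MAPE of 7.5%: the absolute errors are equal, but the relative error is larger at the lower demand.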

## CASE STUDY

### Data description

The data used to evaluate the proposed model were collected from a real-world water distribution system in Guigang city, southern China. The hourly water demand data range from January 1, 2018, to June 24, 2018, giving a total of 4,200 observations, which were reconstructed into samples for the BPNN according to the framework in Table 1. A trial-and-error procedure was applied to determine the parameters *j*, *n* and *m* in Table 1, which were finally set to 4, 4 and 3, respectively. This yielded a set of 3,696 samples for the BPNN. To verify the performance of the proposed model, the sample set was divided into three parts: a training set, a validation set and a testing set. The training set contains 2,290 samples (about 62% of the samples), while the validation and testing sets contain 762 (about 21%) and 644 samples (about 17%), respectively. The characteristics of the collected time series can be seen in Figures 4 and 5.
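The sample count follows from the framework in Table 1: the first usable target needs *W*_{3}(*i*), the value three weeks (3 × 168 = 504 h) earlier, so 4,200 − 504 = 3,696 samples remain. The sketch below reproduces this arithmetic and the stated set sizes. Splitting the samples in time order is an assumption, because the text does not say whether they were shuffled; the function name is hypothetical.

```python
# Framework parameters reported for the case study (Table 1 notation).
N_OBS, J, N_DAYS, M_WEEKS = 4200, 4, 4, 3
# The longest lag, W_3(i) = 3 * 168 h back, fixes the first usable target.
n_samples = N_OBS - max(J, 24 * N_DAYS, 168 * M_WEEKS)
n_train, n_val = 2290, 762
n_test = n_samples - n_train - n_val
shares = [round(100 * k / n_samples) for k in (n_train, n_val, n_test)]


def chronological_split(X, y, n_train, n_val):
    """Split samples in time order (hypothetical helper; whether the study
    shuffled the samples is not stated)."""
    a, b = n_train, n_train + n_val
    return (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])


print(n_samples, n_test, shares)  # prints: 3696 644 [62, 21, 17]
```

A chronological split keeps the testing set strictly later than the training data, which mirrors how the model would be used to forecast future demand.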

### Parameter settings for MEA_BPNN model

To implement the MEA_BPNN model effectively, some parameters for MEA and BPNN should be determined in advance. The parameters to be set in MEA include the size of population, and the number of iterations, superior subgroups and temporary subgroups. As for BPNN, the number of nodes in the hidden layer is a key parameter to be specified. In this paper, a BPNN with three layers was used for prediction in all models. The specific parameters for the MEA_BPNN model are given in Table 2.

Parameters for MEA | Value | Parameters for BPNN | Value |
---|---|---|---|
Size of population | 200 | Number of input layer nodes | 11 |
Number of superior subgroups | 5 | Number of hidden layer nodes | 5 |
Number of temporary subgroups | 5 | Number of output layer nodes | 1 |
Number of iterations | 20 | Other parameters | Default |

Note that a trial-and-error method was used to determine suitable parameters for all the models in this paper. Other methods, such as grid search and random search, can also be used to obtain proper parameters.

### Results and analyses

To investigate the prediction performance of the models, the same samples and the BPNN parameters in Table 2 were used to train the BPNN in all models. All models were run in the Matlab software environment on a computer with an Intel Core i7-7500U CPU. The performance of the forecast models was evaluated in three aspects: computational load, prediction stability and prediction accuracy. The elapsed time was used to measure the computational load, while prediction stability and accuracy were measured by RMSE and MAPE, respectively.

#### Comparison between MEA_BPNN model and the traditional BPNN model

Because the initial weights and thresholds are generated randomly, each run of a BPNN gives different results. To evaluate the models more comprehensively, three groups of results were therefore used to investigate the prediction performance, each obtained by running each model ten times. Table 3 shows the performance results for the MEA_BPNN model and the traditional BPNN model. Note that the elapsed time in Table 3 and the subsequent tables is the total time for running each model ten times, while the other two metrics are averages over the ten runs in each group.

Model | Group 1: Elapsed time (s) | Group 1: RMSE (m^{3}/h) | Group 1: MAPE (%) | Group 2: Elapsed time (s) | Group 2: RMSE (m^{3}/h) | Group 2: MAPE (%) | Group 3: Elapsed time (s) | Group 3: RMSE (m^{3}/h) | Group 3: MAPE (%) |
---|---|---|---|---|---|---|---|---|---|
BPNN | 7 | 395 | 2.73 | 8 | 390 | 2.70 | 8 | 383 | 2.67 |
MEA_BPNN | 23 | 350 | 2.47 | 22 | 350 | 2.48 | 26 | 344 | 2.44 |
Improvements (%) | — | 11.3 | 9.5 | — | 10.3 | 8.2 | — | 10.2 | 8.6 |

From Table 3, it can be observed that:

- (1)
The elapsed time for the BPNN model is much shorter than that for the MEA_BPNN model. This is mainly because the MEA_BPNN model adds an optimization stage, in which MEA evaluates many candidate sets of initial weights and thresholds before the BPNN is trained.

- (2)
The MEA_BPNN model has high prediction accuracy (its mean MAPE values range from 2.44% to 2.48%). Compared with the BPNN model, the MEA_BPNN model has lower mean MAPE values, with improvements of 9.5%, 8.2% and 8.6% in the three groups, respectively. This indicates that optimizing the BPNN with MEA can considerably improve prediction accuracy.

- (3)
Similarly, the mean RMSE values of the MEA_BPNN model are lower than those of the BPNN model in each group, with improvements of 11.3%, 10.3% and 10.2%, respectively. These results suggest that the MEA_BPNN model yields more stable results than the BPNN model.
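The "Improvements" row in Table 3 appears to be the relative reduction of each metric with respect to the BPNN model, 100 × (BPNN − MEA_BPNN) / BPNN; this formula is an assumption inferred from the tabulated values. Recomputing it from the rounded group means reproduces four of the six entries, while the other two differ in the last digit (11.4 instead of 11.3 for the RMSE in Group 1, and 8.1 instead of 8.2 for the MAPE in Group 2), presumably because the published percentages were computed from unrounded means.

```python
def improvement(baseline, proposed):
    """Relative reduction (%) of an error metric with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Table 3 group means as (BPNN, MEA_BPNN) pairs.
rmse_pairs = [(395, 350), (390, 350), (383, 344)]
mape_pairs = [(2.73, 2.47), (2.70, 2.48), (2.67, 2.44)]

for name, pairs in (("RMSE", rmse_pairs), ("MAPE", mape_pairs)):
    print(name, [round(improvement(b, p), 1) for b, p in pairs])
# prints:
# RMSE [11.4, 10.3, 10.2]
# MAPE [9.5, 8.1, 8.6]
```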

Figure 6 illustrates an example of good forecasts achieved by the BPNN model and the MEA_BPNN model.

Figure 6 shows that the prediction curve of the MEA_BPNN model is very close to the observed curve except at a few points. In contrast, the curve of the BPNN model deviates significantly from the observed curve, especially at some points during peak hours (e.g. from 6:00 am to 8:00 am or from 6:00 pm to 8:00 pm). This can be explained by the fact that hourly water demand fluctuates more during peak hours, and the BPNN model alone cannot handle this effectively. This further illustrates the superiority of the MEA_BPNN model.

In sum, the MEA_BPNN model outperforms the BPNN model in both prediction stability and accuracy. Although the MEA_BPNN model took about three times as long as the BPNN model, this overhead is negligible in practice: the elapsed time increases by only about 16 s in total for ten runs, while the MAPE improves by more than 8% on average. In other words, the increased time does not weaken the practicability of the proposed model. Based on these findings, the MEA_BPNN model can be an effective alternative for hourly water demand forecasting.

#### Comparisons of different hybrid models

To further evaluate the performance of the proposed model, the MEA_BPNN model was also compared with the GA_BPNN model. The specific parameters for GA are listed in Table 4.

Parameters for GA | Value |
---|---|
Crossover probability | 0.2 |
Mutation probability | 0.1 |
Size of population | 10 |
Number of iterations | 20 |

Table 5 shows prediction performances for different hybrid models. Note that in Table 5 the meanings of metrics are the same as those in Table 3, and results for each group were obtained from running each model ten times.

Model | Group 1: Elapsed time (s) | Group 1: RMSE (m^{3}/h) | Group 1: MAPE (%) | Group 2: Elapsed time (s) | Group 2: RMSE (m^{3}/h) | Group 2: MAPE (%) | Group 3: Elapsed time (s) | Group 3: RMSE (m^{3}/h) | Group 3: MAPE (%) |
---|---|---|---|---|---|---|---|---|---|
MEA_BPNN | 22 | 353 | 2.50 | 25 | 348 | 2.46 | 23 | 349 | 2.44 |
GA_BPNN | 183 | 364 | 2.56 | 199 | 362 | 2.54 | 202 | 369 | 2.57 |

From Table 5, the following observations can be made:

- (1)
The elapsed time of the MEA_BPNN model is much shorter than that of the GA_BPNN model: about 23 s on average for ten runs, compared with about 195 s for the GA_BPNN model. This indicates that the MEA_BPNN model has a lower computational load and is more efficient than the GA_BPNN model in finding the optimal solution.

- (2)
The MEA_BPNN model has the lower RMSE in every group (353, 348 and 349 m^{3}/h, respectively), indicating better prediction stability than the GA_BPNN model in hourly water demand forecasting. Its MAPE values are likewise lower than those of the GA_BPNN model in every group, so more accurate predictions can be obtained with the proposed model. A likely reason why the proposed model is superior to the GA_BPNN model in both prediction accuracy and stability is that MEA was designed to overcome some defects of GA (e.g. a tendency to fall into local optima and premature convergence). As a result, the MEA_BPNN model can find more suitable initial parameters for training the BPNN, and thereby yields more accurate and stable forecasts.

- (3)
In terms of both prediction accuracy and stability, all the hybrid models in Table 5 produced better results than the BPNN model in Table 3. This further indicates that optimizing the initial weights and thresholds with an optimization algorithm can improve the prediction performance of the BPNN.

#### Evaluation of effectiveness for input variable selection

Input variables are an important part of forecast models and have a non-negligible impact on prediction performance, since different input variables produce different results. It is therefore necessary to investigate how different input variables influence the predictions.

Table 6 shows prediction results for different input variables. The results in Table 6 are the average values obtained from running the proposed model ten times using different input variables. The input variables in Model#1 were selected according to the framework in Table 1, while in the other models, the continuous previous water demand data were used as the input variables, which is also a common practice in time series forecasting.

Model | Input variables | RMSE (m^{3}/h) | MAPE (%) |
---|---|---|---|
Model#1 | T(i−4), T(i−3), T(i−2), T(i−1), D_{4}(i), D_{3}(i), D_{2}(i), D_{1}(i), W_{3}(i), W_{2}(i), W_{1}(i) | 347 | 2.47 |
Model#2 | T(i−4), T(i−3), T(i−2), T(i−1) | 736 | 4.87 |
Model#3 | T(i−11) ∼ T(i−1) | 674 | 4.67 |
Model#4 | T(i−24) ∼ T(i−1) | 496 | 3.53 |

It can be seen from Table 6 that Model#1 performed much better than the other models. A likely reason is that the input variables in Model#1 take the characteristics of the water demand series into account and are therefore able to capture its regularities, while the other models fail to do this. The results of Model#2, Model#3 and Model#4 show that prediction performance tends to improve as the number of input variables increases (e.g. the mean MAPE decreases from 4.87% to 3.53%, and the mean RMSE decreases from 736 to 496 m^{3}/h). One reasonable explanation is that more input variables carry more information, which allows the BPNN model to capture more characteristics of the data. Nevertheless, although Model#4 uses many more input variables (24) than Model#1 (11), its results are still clearly inferior. This finding supports the effectiveness of the input variable selection framework proposed in this paper. In practice, too many input variables may introduce redundant information, which is harmful to the performance of forecast models. Consequently, to improve prediction performance, the quality of the input variables matters more than their number.

## CONCLUSIONS

A hybrid model called MEA_BPNN is proposed to forecast hourly water demand in this study. Based on the results, it can be concluded that: (1) the proposed framework for input variable selection is effective in hourly water demand forecasting; (2) the MEA_BPNN model has promising prediction performance in hourly water demand forecasting; and (3) the MEA_BPNN model can significantly improve the performance of the traditional BPNN model, and outperforms the GA_BPNN model in computational load, prediction accuracy and stability.

## ACKNOWLEDGEMENTS

This work was supported by Beibu Gulf University (Grant No. 2019KYQD22) and Middle-aged and Young Teachers' Basic Ability Promotion Project of Guangxi (Grant No. 2021KY0438). Hourly water demand data were provided by Guigang region, Guangxi Province.

## CONFLICTS OF INTEREST/COMPETING INTERESTS

We declare that there is no conflict of interest.

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories: https://pan.baidu.com/share/init?surl=3887oqjkQAtqC_lgcblEaQ (password:1234).

## REFERENCES

*IEEE SMC'99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics*, Vol. 2, IEEE, Piscataway, NJ, USA, pp.