## Abstract

Forecasting short-term water demand is one of the most critical needs of companies operating urban water distribution networks. Water demand has a time series nature, and various factors affect its variations and patterns, which makes it difficult to forecast. In this study, we first implemented hybrid models of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to forecast urban water demand. These models combine a CNN with a simple RNN (CNN-Simple RNN), with the gated recurrent unit (CNN-GRU), and with long short-term memory (CNN-LSTM). Then, we increased the number of CNN channels to achieve higher accuracy. The accuracy of the models increased with the number of CNN channels up to four. The evaluation metrics show that the CNN-GRU model is superior to the other models. Ultimately, the four-channel CNN-GRU model demonstrated the highest accuracy, achieving a mean absolute percentage error (MAPE) of 1.65% for a 24-h forecasting horizon. The effects of the forecast horizon on the accuracy of the results were also investigated. The results show that the MAPE of the four-channel CNN-GRU falls to 1.06% for a 1-h forecast horizon; that is, the error decreases as the forecast horizon shortens.

## HIGHLIGHTS

Time series analysis of water demand using hybrid deep learning models can be a suitable option for short-term forecasting.

Hybrid deep neural networks integrate the advantages of the two classic models of convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

The combination of CNN and RNNs can simultaneously extract appropriate features with the CNN and learn long-term dependencies in the data with the RNNs.

## INTRODUCTION

Urban water distribution networks (WDNs) are among the vital infrastructures of any city. Intelligent operation management is necessary for WDNs to ensure adequate water supply at the desired pressure and quantity for consumers. One of the essential requirements of operation management in water supply networks is to predict short-term water demand in the network at hourly intervals. Many factors affect the quantity of water demand, such as temperature, precipitation, relative humidity, population, network water pressure, water price for various uses, water losses, the method and system of measuring water consumption, household income, yard area, and green space (Arbués *et al.* 2003; Wentz & Gober 2007; Schleich & Hillenbrand 2009; Nauges & Whittington 2010; de Maria André & Carvalho 2014; de Souza Groppo *et al.* 2019).

Water demand forecasting methods can be divided into two general categories of linear and nonlinear methods (Zhang 2001; de Souza Groppo *et al.* 2019). Exponential smoothing, autoregressive integrated moving averages (ARIMA), and multivariate linear regression are linear methods that use univariate or multivariate time series analysis (Adamowski & Karapataki 2010; Adamowski *et al.* 2012; Ristow *et al.* 2021). In this regard, various regression methods were used by Shuang & Zhao (2021) to predict urban water demand. Short-term water demand follows a nonlinear pattern and is affected by many factors. In most of the past studies, statistical methods were less accurate than nonlinear methods for forecasting short-term water demand. These methods are used more for long-term forecasting (Donkor *et al.* 2014; Ghalehkhondabi *et al.* 2017).

Nonlinear regression and artificial neural networks (ANNs) with nonlinear activation functions are nonlinear methods. For example, ANNs were used by Ghiassi *et al.* (2008) to predict water demand. Herrera *et al.* (2010), Peña-Guzmán *et al.* (2016), and Brentan *et al.* (2017) used the support vector machine method. Also, Altunkaynak *et al.* (2005) and Firat *et al.* (2009a) used fuzzy logic. Other ANNs used in past studies to predict urban water demand include the generalized regression neural network, the radial basis function network, the feedforward neural network (Firat *et al.* 2009b), and the extreme learning machine method (Mouatadid & Adamowski 2017). A machine learning (ML) model such as those mentioned above cannot simultaneously perform feature selection and prediction. Moreover, using all the features in ML models, in addition to the problems of collecting data in the real world, increases the computational cost and reduces the model's accuracy. Therefore, to perform feature selection, reduce the computational cost, adjust neural network parameters, and eliminate noise from data, researchers have focused on hybrid methods for predicting water demand (Tiwari & Adamowski 2015; Shirkoohi *et al.* 2021). As mentioned earlier, one of the significant challenges of using ML methods is choosing the appropriate features that directly affect the prediction results. On top of this, these models often pose an overfitting problem as the amount of data increases (Sajjad *et al.* 2020). In deep learning methods, feature selection is performed automatically through many hidden layers, and accuracy usually increases with more data. Accordingly, several deep neural networks (DNNs) have been developed for water demand prediction. For example, Mu *et al.* (2020) used the long short-term memory (LSTM) model to predict hourly and daily water demand in Hefei, China. They obtained better results than support vector regression, random forest, and ARIMA models. 
In another study, Guo *et al.* (2018) used the gated recurrent unit (GRU) model to forecast water demand with a time step of 15 min. In this study, the GRU model was more accurate than the conventional ANN model and the seasonal ARIMA model. Convolutional neural networks (CNNs) are widely used in various fields, including time series, due to their high ability to extract features, and recurrent neural networks (RNNs) can learn time dependency between data. Namdari *et al.* (2023) used a one-dimensional CNN (1D CNN) to forecast short-term urban water demand. They then compared the results with other deep learning models, including simple RNNs, LSTM, GRU, and deep feedforward neural networks (DFNNs). Their results showed that the 1D CNN predicts short-term water demand with higher accuracy than the other models, and that the DFNN model is the least accurate.

Hybrid DNNs can consist of CNNs and RNNs. In hybrid DNNs, CNNs are utilized to capture spatial features, while recurrent models are employed to model temporal features (Sajjad *et al.* 2020). The convolution layers in these models extract the features from the dataset, and the RNNs learn the long dependency between the data. Hybrid DNNs excel in recognizing patterns with both spatial and temporal characteristics. Recently, researchers used these models for prediction in various fields, for example, human activity recognition (Lu *et al.* 2022), petroleum price prediction (Kim & Jang 2023), electric energy forecasting (Wang *et al.* 2023b; Yang *et al.* 2023), estimation of the yield of agricultural products (Wang *et al.* 2023a), wind speed forecasting (Lv *et al.* 2023), and forecasting stock market indices (Song & Choi 2023).

In past studies, researchers have used various methods to predict water demand, but despite much research in this field, the hybrid DNN has received less attention. The water demand follows a time series pattern with a complex and multifaceted structure, and it is challenging to collect all the characteristics affecting water demand in a real-world implementation. On the other hand, despite having a high ability to analyze nonlinear water demand data, ML models are weak in dealing with nonstationary data (Ghalehkhondabi *et al.* 2017). In this study, we implemented hybrid DNNs to forecast short-term urban water demand by combining CNN and RNNs. The combination of CNN and RNNs can simultaneously extract the appropriate features by CNN and learn the time dependency between data by RNNs.

## METHODOLOGY

[*X*_{1}, *X*_{2}, …, *X*_{p}] is the input layer, and [*Y*_{1}, *Y*_{2}, …, *Y*_{n}] are the forecast values for the first hour to the last hour of the time horizon. In the following, after the introduction of DNNs, the implementation method will be explained.

## INTRODUCTION OF DNNS

### 1D CNNS

*et al.* 2021; Qazi *et al.* 2022). In this sense, they are an excellent choice for signal processing and analyzing time series data. Figure 2 shows the details of a 1D CNN.

### Simple RNNs

In this equation, *x*_{t} is the input of the network at time *t*, *h*_{t} is a vector of values called the internal state of the network at time *t*, and *h*_{t−1} refers to a summary of previous network inputs for times before *t*.

### LSTM neural network

### GRU neural network

*et al.* (2014). Two gates are used in the GRU network: update and reset gates. In the update gate, based on the new input and the information in the memory cell related to the past, it decides what information should be kept and what information should be deleted. In the reset gate, based on the new input (at time *t*) and the information in the memory cell, it decides what information should be deleted. The information from the previous time step's memory cell is combined with the remaining information to generate a candidate memory cell. In the last step, the update gate determines the proportion of the candidate memory information and memory cell information from the previous step that will be added to obtain the memory cell vector at time *t*. The memory mentioned is utilized as a hidden state for the time *t* + 1, and by applying an activation function on it, the output at time *t* can be obtained. Figure 5 shows the architecture of the GRU unit.
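The gate logic described above can be sketched for a single scalar hidden unit. This is a minimal illustration, not the configuration used in the study: the weight names in `p` and the sample inputs are hypothetical, and the final interpolation follows one common convention (some libraries swap the roles of `z` and `1 - z`).

```python
import math

def sigmoid(a):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h_prev, p):
    """One scalar GRU update; p maps hypothetical weight names to values."""
    # Update gate: how much of the candidate replaces the old memory.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old memory feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate memory built from the new input and the reset-scaled past.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend previous memory and candidate in the proportion set by z.
    return (1.0 - z) * h_prev + z * h_cand

# Made-up weights and a short made-up input sequence.
params = {k: 0.5 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
h = 0.0
for x in [0.1, 0.4, 0.7]:
    h = gru_step(x, h, params)
```

The returned value serves as the hidden state passed to the next time step, which matches the role of the memory cell in the description above.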

### Hybrid DNNs

Hybrid DNNs leverage water demand data sequences as input to a convolutional layer structure, which performs feature extraction. The output of the convolutional layer is then passed through a flattening layer and fed as input to RNNs, whose final output is the forecast value of the water demand. Following the experimental comparison, it has been observed that this solution effectively combines the benefits of the two traditional models of CNN and RNNs (Xu *et al.* 2021). The CNN layer aids in comprehending the sequential features of the input while the RNN layer learns long-term dependencies between data (Xu *et al.* 2021). Various types of RNNs are used in these models, including Simple RNN, LSTM, and GRU.
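The data flow described above (convolution, flattening, recurrent layer, forecast) can be sketched in plain Python. This is only a toy forward pass under stated assumptions: the filters, the weights `w_x`, `w_h`, `w_out`, and the choice to feed the flattened features one value per step into a scalar simple RNN are all hypothetical and are not taken from the study.

```python
import math

def conv1d_relu(series, kernel, bias=0.0):
    """Slide one filter over the series without padding, then apply ReLU."""
    k = len(kernel)
    return [max(0.0, sum(w * v for w, v in zip(kernel, series[i:i + k])) + bias)
            for i in range(len(series) - k + 1)]

def recurrent_readout(features, w_x, w_h, w_out):
    """Feed the features step by step into a scalar simple RNN."""
    h = 0.0
    for f in features:
        h = math.tanh(w_x * f + w_h * h)
    # A linear readout of the last hidden state gives the forecast.
    return w_out * h

def hybrid_forward(series, kernels, w_x=0.8, w_h=0.5, w_out=1.0):
    """Convolution -> flatten -> recurrent layer -> forecast value."""
    flat = []
    for kernel in kernels:
        flat.extend(conv1d_relu(series, kernel))  # flatten the feature maps
    return flat, recurrent_readout(flat, w_x, w_h, w_out)

demand = [0.30, 0.28, 0.35, 0.50, 0.62, 0.58]  # made-up normalized demand
kernels = [[0.5, 0.5], [-1.0, 1.0]]            # made-up filters of width 2
flat, forecast = hybrid_forward(demand, kernels)
```

In a trained model the filters and recurrent weights would be learned by backpropagation rather than fixed by hand.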

## CASE STUDY

Shiraz City is the capital of Fars province, located in the south of Iran. It is the fourth most populated city in Iran, with a population of about two million people and an area of 1,268 km^{2}. Based on meteorological data, the city has a moderate climate. Its average annual temperature is 19.3 °C, and its annual rainfall is 294 mm. According to the reports of Shiraz Water and Wastewater Company, about 75% of the drinking water in this city is supplied by 180 underground water wells, and 25% is supplied by surface water. Because of the large number of water production sources and pumping stations in the WDN, operating companies need short-term water demand forecasts at hourly intervals. With a proper forecast of short-term water demand, the operating company can deliver water to the customers in sufficient quantity and at the appropriate pressure, and can minimize energy consumption costs through proper planning of pump operations. Saadi Town is a mountainous region in the east of Shiraz. It covers an area of 360 hectares and has a population of over 60,000 people. We considered this section of Shiraz City, which has an independent WDN, as the study area. The dataset used in this study was 34,840 hourly water demand records from 19 May 2016 to 18 May 2022, of which 8,984 data records are missing. The minimum and maximum hourly demands in the dataset are 131.7 and 1,103.2 m^{3}, with an average of 596.49 m^{3} and a standard deviation of 131.7 m^{3}.

## IMPLEMENTATION OF HYBRID DNNS

### Data preprocessing

Data preprocessing is an important step in data mining projects, because quality decisions can only be made when they are based on quality data. In this research, we used hourly water demand from past years. For various reasons, water demand may not be recorded in the dataset at some hours of the day. To fill in a missing value that falls on a working day, we used the average demand in the same hour on the day before and the day after. If the missing value falls on a holiday or on the day before or after a holiday, we used the average demand at the same hour on the same day of the week in the week before and the week after. Because the filled values are close to the most likely actual demand, this method considerably reduces bias in the data and also preserves the seasonal changes in demand.
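The filling rule above can be sketched as follows. This is a minimal illustration under assumptions: the series is a plain list with `None` for missing hours, and `is_special_day` is a hypothetical per-hour flag, supplied by the caller, that is true on holidays and on the days adjacent to them. A value is left as `None` when either reference hour is itself unavailable, which the source does not discuss.

```python
def fill_missing(demand, is_special_day):
    """Fill None entries in an hourly series by averaging reference hours.

    Working days use the same hour one day before and after (24 h offset);
    holidays and their neighbours use one week before and after (168 h).
    """
    filled = list(demand)
    for i, value in enumerate(demand):
        if value is not None:
            continue
        offset = 168 if is_special_day[i] else 24
        before, after = i - offset, i + offset
        if before >= 0 and after < len(demand):
            a, b = demand[before], demand[after]
            if a is not None and b is not None:
                filled[i] = (a + b) / 2.0
    return filled

# Made-up example: three days of hourly data with one missing hour.
series = [float(h) for h in range(72)]
series[30] = None
flags = [False] * 72
filled = fill_missing(series, flags)
```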

Outlier data are data that are significantly different from other data. Their use in ML models causes anomalies and negative bias in the results. We identified outliers using the boxplot. Through this method, we calculate the value of the data's first and third quartiles (*Q*_{1}, *Q*_{3}) and get the difference between them (IQR = *Q*_{3} − *Q*_{1}). Data that are larger than *Q*_{3} + 1.5IQR or smaller than *Q*_{1} − 1.5IQR are considered outliers. In the dataset, 52 hourly records were identified as outliers using the boxplot and removed from the dataset. Then, the deleted values were filled in as in the case of missing data.
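The boxplot rule above can be sketched with the standard library. The quartile convention is an assumption (the `"inclusive"` method of `statistics.quantiles`); the source does not say which convention was used, and other conventions can shift the fences slightly. The sample values are made up.

```python
import statistics

def iqr_outlier_indices(values, k=1.5):
    """Return the positions of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

sample = [10, 12, 11, 13, 12, 11, 10, 12, 100]  # made-up hourly values
outliers = iqr_outlier_indices(sample)
```

The flagged positions can then be set to missing and filled in the same way as other missing records.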

*et al.* 2022). We used Min–Max normalization in this study and converted the dataset to the range of 0–1 according to Equation (2):

$$V' = \frac{V - \min_A}{\max_A - \min_A} \qquad (2)$$

where $\min_A$ and $\max_A$ are the minimum and maximum values of a feature *A*, *V* is the value of the feature, and *V*′ is the value of the normalized feature.
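Min–Max scaling to the range 0–1 can be sketched as below. The function names are hypothetical, and the three sample values simply reuse the minimum, mean, and maximum demand reported in the case study. Fitting the bounds on the training portion only is a common precaution against leakage; the source does not state which portion was used.

```python
def min_max_fit(values):
    """Return the (minimum, maximum) pair used for scaling."""
    return min(values), max(values)

def min_max_scale(values, lo, hi):
    """Map each value to (v - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / span for v in values]

def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

demand = [131.7, 596.49, 1103.2]
lo, hi = min_max_fit(demand)
scaled = min_max_scale(demand, lo, hi)
restored = min_max_invert(scaled, lo, hi)
```

The inverse mapping is needed to report forecast errors in the original demand units.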

### Implementation of single-channel hybrid models

*t* and *t*-lag, and denotes the variance of values at time *t*. We identify the optimal lag by drawing the autocorrelation plot of the water demand data, shown in Figure 6. The blue shaded area in this figure indicates the 95% confidence level, and the vertical lines (correlation values) that extend beyond this area indicate a significant correlation at that lag (Lazzeri 2020). As is evident in this figure, the autocorrelation is highest at a lag of 24. Hence, we select the water demand data from time 1–24 as the feature of the first sample and the values from time 25–48 as the sample's label. We repeat this process until the final data are labeled (Table 1) and used as model input (Namdari *et al.* 2023). Then, we selected the first 80% of the dataset as the training data and the last 20% as the test data.

| Sample | Feature |  |  |  |  | Label |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Q_{1} | Q_{2} | Q_{3} | … | Q_{24} | Q_{25} | Q_{26} | Q_{27} | … | Q_{48} |
| 2 | Q_{2} | Q_{3} | Q_{4} | … | Q_{25} | Q_{26} | Q_{27} | Q_{28} | … | Q_{49} |
| 3 | Q_{3} | Q_{4} | Q_{5} | … | Q_{26} | Q_{27} | Q_{28} | Q_{29} | … | Q_{50} |
| 4 | Q_{4} | Q_{5} | Q_{6} | … | Q_{27} | Q_{28} | Q_{29} | Q_{30} | … | Q_{51} |
| … | … | … | … | … | … | … | … | … | … | … |
| n | Q_{n} | Q_{n+1} | Q_{n+2} | … | Q_{n+23} | Q_{n+24} | Q_{n+25} | Q_{n+26} | … | Q_{n+47} |


### Implementation of multichannel hybrid models

In this study, the weights of the neural networks were updated iteratively by the backpropagation algorithm, using mini-batch gradient descent with a batch size of 256 and the mean absolute error (MAE) as the loss function. The Adam algorithm (Kingma & Ba 2014), which performed better than other adaptive algorithms, was used to adjust the learning rate. The Glorot (Xavier) uniform initializer (Glorot & Bengio 2010) was used to initialize the weights and gave better results than other methods. Using batch normalization in the network layers did not improve the results.
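Two of the training choices above are easy to illustrate without a deep learning library: Glorot uniform initialization and splitting the data into mini-batches of 256. This sketch assumes the standard Glorot bound sqrt(6 / (fan_in + fan_out)); the layer sizes, the seed, and the function names are hypothetical, and the batches are not shuffled because the source does not say whether shuffling was used.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def mini_batches(samples, batch_size=256):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

rng = random.Random(0)                      # fixed seed for repeatability
weights = glorot_uniform(24, 32, rng)       # made-up layer: 24 inputs, 32 units
batch_sizes = [len(b) for b in mini_batches(list(range(600)))]
```

Each mini-batch would contribute one gradient step, with the MAE between forecasts and labels as the quantity being minimized.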

## EVALUATION METRICS

In these equations, *y* is the real value, *ŷ* is the forecast value, and *ȳ* is the mean of the real values of the samples. The *R*^{2} index indicates how well the forecast values match the real values: the closer it is to one, the closer the forecasts are to the real values and the higher the accuracy of the model.
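The five metrics reported in the result tables (RMSE, MSE, *R*^{2}, MAE, and MAPE) can be computed as in the sketch below. It assumes the standard textbook definitions, with *R*^{2} taken as one minus the ratio of the residual sum of squares to the total sum of squares; the function name and the sample values are made up.

```python
import math

def evaluate(y_true, y_pred):
    """Return RMSE, MSE, R2, MAE and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the real value, so real values must be nonzero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"RMSE": math.sqrt(mse), "MSE": mse, "R2": r2,
            "MAE": mae, "MAPE": mape}

metrics = evaluate([100.0, 200.0, 300.0, 400.0],
                   [110.0, 190.0, 310.0, 390.0])
```

Because RMSE is the square root of MSE, the two columns in the result tables carry the same ranking information; MAE and MAPE weight large errors less heavily.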

## RESULTS AND DISCUSSION

In this study, we implemented the 1D CNN model and hybrid DNNs, including CNN-Simple RNN, CNN-LSTM, and CNN-GRU, to forecast short-term urban water demand. Initially, we built the hybrid models with a single CNN channel; Figure 7 shows their architecture. The evaluation results of these models are given in Table 2. The results show that combining CNNs and RNNs (hybrid DNNs) increases the accuracy of water demand forecasts. Among the hybrid models, the single-channel CNN-GRU obtained better results than the other single-channel models. In the next step, we increased the number of 1D CNN channels in the hybrid models, which improved the results of all models compared to their single-channel versions.

| Model | RMSE (test) | RMSE (training) | MSE (test) | MSE (training) | *R*^{2} (test) | *R*^{2} (training) | MAE (test) | MAE (training) | MAPE% (test) | MAPE% (training) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1D CNN | 32.68 | 24.59 | 1,069 | 604.7 | 0.941 | 0.855 | 21.18 | 14.55 | 3.13 | 2.62 |
| CNN-Simple RNN | 31.20 | 23.83 | 974.0 | 567.9 | 0.946 | 0.863 | 19.79 | 13.86 | 2.92 | 2.50 |
| **CNN-GRU** | **28.09** | **22.29** | **788.8** | **496.7** | **0.957** | **0.881** | **16.58** | **12.24** | **2.51** | **2.21** |
| CNN-LSTM | 30.24 | 22.70 | 915.9 | 515.4 | 0.950 | 0.875 | 19.25 | 13.00 | 2.88 | 2.37 |


The evaluation metrics of the CNN-GRU model are better than other models, which is why the values of CNN-GRU rows are in bold.

Table 3 shows the evaluation metrics for the hybrid DNNs with two to five CNN channels. As this table shows, the evaluation metrics of all the hybrid DNNs generally improve as the number of CNN channels increases up to four. For every channel count from two to five, the CNN-GRU has better evaluation metrics than the other hybrid models with the same number of channels. Increasing the number of CNN channels from four to five produced no noticeable change in the results; for this reason, we present the four-channel CNN-GRU as the proposed model. However, the evaluation metrics on the training data are better in the five-channel models than in the four-channel models. Figure 8 shows the architecture of the multichannel CNN-GRU. According to this figure, filters of different sizes have been used in the convolution layers of each CNN channel. This variety of filter sizes can extract more diverse features than single-channel hybrid models and ultimately increases the forecast accuracy. However, the training time of the model increases with the number of CNN channels.

| Model | RMSE (test) | RMSE (training) | MSE (test) | MSE (training) | *R*^{2} (test) | *R*^{2} (training) | MAE (test) | MAE (training) | MAPE% (test) | MAPE% (training) |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-channel CNN-Simple RNN | 28.12 | 23.21 | 792.4 | 538.8 | 0.956 | 0.869 | 16.76 | 13.30 | 2.52 | 2.43 |
| **Two-channel CNN-GRU** | **25.01** | **20.25** | **626.1** | **410.2** | **0.966** | **0.901** | **13.34** | **10.06** | **2.03** | **1.85** |
| Two-channel CNN-LSTM | 27.66 | 21.57 | 765.2 | 465.4 | 0.958 | 0.888 | 16.26 | 11.41 | 2.47 | 2.07 |
| Three-channel CNN-Simple RNN | 25.13 | 20.38 | 634.1 | 415.3 | 0.955 | 0.899 | 13.71 | 10.36 | 2.11 | 1.90 |
| **Three-channel CNN-GRU** | **23.92** | **19.64** | **574.4** | **385.9** | **0.959** | **0.907** | **12.38** | **9.53** | **1.92** | **1.75** |
| Three-channel CNN-LSTM | 25.39 | 20.88 | 647.7 | 436.0 | 0.954 | 0.895 | 14.08 | 10.84 | 2.17 | 1.98 |
| Four-channel CNN-Simple RNN | 24.25 | 19.87 | 588.1 | 394.9 | 0.968 | 0.905 | 12.56 | 9.68 | 1.90 | 1.77 |
| **Four-channel CNN-GRU** | **22.98** | **19.45** | **528.3** | **378.4** | **0.971** | **0.909** | **10.70** | **8.96** | **1.65** | **1.64** |
| Four-channel CNN-LSTM | 24.66 | 20.60 | 608.0 | 424.4 | 0.967 | 0.895 | 12.77 | 10.91 | 1.94 | 2.04 |
| Five-channel CNN-Simple RNN | 24.42 | 19.53 | 597.0 | 381.5 | 0.967 | 0.908 | 12.87 | 9.16 | 1.94 | 1.67 |
| **Five-channel CNN-GRU** | **23.11** | **18.86** | **534.5** | **355.8** | **0.971** | **0.914** | **11.35** | **8.40** | **1.72** | **1.54** |
| Five-channel CNN-LSTM | 24.09 | 19.36 | 580.9 | 374.8 | 0.968 | 0.910 | 12.48 | 9.10 | 1.89 | 1.65 |


The evaluation metrics of the CNN-GRU model are better than other models, which is why the values of CNN-GRU rows are in bold.
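The multichannel idea discussed above, parallel convolution branches with different filter sizes whose outputs are merged before the recurrent layer, can be sketched as follows. The filter widths (2, 3, 5, and 7), the all-ones filter values, and the input window are hypothetical and are not the settings used in the study.

```python
def conv1d_relu(series, kernel):
    """One filter slid over the series without padding, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(w * v for w, v in zip(kernel, series[i:i + k])))
            for i in range(len(series) - k + 1)]

def multichannel_features(series, channel_kernels):
    """Run one convolution branch per channel and concatenate the outputs."""
    merged = []
    for kernel in channel_kernels:
        merged.extend(conv1d_relu(series, kernel))
    return merged

window = [h / 24.0 for h in range(24)]                   # made-up 24-h input
channels = [[1.0] * 2, [1.0] * 3, [1.0] * 5, [1.0] * 7]  # four branches
features = multichannel_features(window, channels)
```

A narrow filter responds to hour-to-hour changes while a wider one summarizes several hours at once, which is one way to read the claim that varied filter sizes yield more diverse features.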

*R*^{2} for these models for different forecast horizons. As is evident, the accuracy of the models increases as the forecast horizon decreases. Also, the hybrid models have better evaluation metrics on all forecast horizons than the 1D CNN model. Up to the 6-h forecast horizon, the hybrid models have almost the same evaluation metrics, but for horizons of more than 6 h, the four-channel CNN-GRU performs better than the other two models.

## CONCLUSIONS

Supplying consumers in urban WDNs with adequate water at the appropriate pressure requires intelligent operation management. One of the requirements of intelligent operation management in WDNs is to forecast short-term water demand at an hourly interval for the next 24 h. In this study, we implemented combined CNN and RNN models to forecast short-term urban water demand. The hybrid models included CNN-Simple RNN, CNN-GRU, and CNN-LSTM. The results showed that combining CNNs and RNNs gives better forecast accuracy than the 1D CNN model (test MAPE of 2.51–2.92% for the single-channel hybrid models versus 3.13% for the 1D CNN). Among the hybrid models, the CNN-GRU is superior to the other models. In addition, increasing the number of CNN channels improved the evaluation metrics, and the four-channel CNN-GRU model had higher accuracy than the other two hybrid models. Thus, we proposed the four-channel CNN-GRU for forecasting water demand. However, it is worth noting that the five-channel models had better evaluation metrics on the training data than the four-channel models. The accuracy of the models increases as the forecast horizon decreases. The four-channel CNN-GRU has almost the same accuracy as the other hybrid models (CNN-Simple RNN and CNN-LSTM) for forecast horizons of up to 6 h, but it is better than them for forecast horizons of more than 6 h. As mentioned, CNNs can extract features and RNNs can learn long-term dependencies between data, and combining these two networks can forecast water demand with higher accuracy than the 1D CNN. Therefore, hybrid DNNs can be suggested for other time series forecasting problems. During this study, we tuned the hyperparameters manually. However, several methods are available for automatic hyperparameter tuning, such as random search, grid search, and Bayesian optimization. These methods may yield models with better accuracy, although training will take longer. 
Reducing the training time of the model while maintaining its accuracy is a challenging task. In general, models that predict future values based on time series have the weakness that if one of the factors that affect water demand changes significantly compared to its past, such as the price of water, it can adversely affect the model's results. Therefore, it is important to consider this issue when using such models.

## ACKNOWLEDGEMENTS

We are grateful to the Shiraz Water and Wastewater Company for providing the water demand data.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES
