## Abstract

Given the importance of forecasting urban water demand (WD), the present study investigated the capability of Gaussian process regression (GPR) using different features as input data. WD can be represented as a function of various climatic, socio-economic, institutional and management factors. Therefore, in the present study, GPR was used to predict daily WD from several socio-economic and climatic variables for Hamadan city, located in the west of Iran. The selected variables comprised daily weather data, namely rainfall (R), maximum temperature (T_{max}), mean temperature (T_{mean}), minimum temperature (T_{min}) and relative humidity (RH), and socio-economic variables, namely average monthly water bill (MWB), population (P), number of households (NH), gross national product (GNP) and inflation rate (I). The results indicated that GPR could predict daily WD accurately in terms of the statistical evaluation criteria. A sensitivity analysis over the climatic and socio-economic data identified the best input structure for water consumption prediction via GPR. Accordingly, the results confirmed the substantial influence of WD with a three-day lag, I, MWB and T_{max} as the input dataset.

## INTRODUCTION

Rapid population growth and industrial development in recent decades have increased water consumption in metropolitan areas all around the world. Sound operation and management of an urban water supply system requires accurate water demand predictions, and the estimation of future urban water demand (WD) is critical for the sustainable planning of regional water supply systems (Zhou *et al.* 2002; Kame'enui 2003; Donkor *et al.* 2014). Accurate forecasts also aid decision making, such as when to implement regulatory water use restrictions in times of water stress or drought (Herrera *et al.* 2010), or when to start drawing from auxiliary supplies (Jain & Ormsbee 2002). Urban water is generally supplied on the basis of operators' experience, although accurate and reliable forecasts of WD help operators provide water more sustainably (Zhou *et al.* 2002). Temperature, precipitation, and past WD are usually the most significant input variables in urban WD forecasting (Bougadis *et al.* 2005; Adamowski *et al.* 2012).

Artificial intelligence methods such as machine learning approaches are black-box models that have been applied in a variety of areas, including water engineering and WD forecasting (e.g., Adamowski & Karapataki 2010; Nasseri *et al.* 2011; Nourani *et al.* 2016; Roushangar *et al.* 2016). These models have proved capable of predicting urban WD at different scales (Jain *et al.* 2001; Altunkaynak *et al.* 2005; Bougadis *et al.* 2005; Yurdusev & Firat 2009; Adamowski & Karapataki 2010; Nasseri *et al.* 2011; Adamowski *et al.* 2012). Zhou *et al.* (2002) developed time series models for daily water consumption in Melbourne, Australia. Artificial neural network models have also been used to model weekly peak demand.

Weather and climate play an important role in WD. Daily and weekly fluctuations in air temperature and precipitation are highly associated with changes in water use, with a negative relationship between water use and precipitation and a positive relationship between water use and temperature (Zhou *et al.* 2000; Miller & Yates 2006; Chang *et al.* 2014; de Souza *et al.* 2015). Because climate is defined as the average or trend in weather over longer time periods, any long-term change in climate is expected to bring changes in WD as well. Advance knowledge of how much WD will change with climate is therefore important information for water managers engaged in long-term climate adaptation planning (Zhou *et al.* 2000; Olmstead 2014; Wang *et al.* 2014).

The present study focused on one-step-ahead prediction of daily urban WD using Gaussian process regression (GPR). The models were developed using WD values of previous days along with socio-economic and climatic features.

## MATERIAL AND METHODS

### Gaussian process regression

Gaussian processes (GP) are a natural generalization of the Gaussian distribution, whose mean and covariance are a vector and a matrix, respectively (Neal 1998). GPR is a useful nonparametric regression tool, since it offers theoretical simplicity and good generalization capability, and it provides probabilistic results (Rasmussen & Williams 2006).

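A GP is defined as a collection of random variables, any finite number of which has a joint multivariate Gaussian distribution. Let *χ* and $\mathcal{Y}$ indicate the domains of inputs and outputs, respectively, from which *n* pairs (*x*_{i}, *y*_{i}) are drawn independently and identically distributed. For regression, assume that $\mathcal{Y} \subseteq \mathbb{R}$; then, a GP on *χ* is specified by a mean function *μ*: *χ* → ℝ and a covariance function *k*: *χ* × *χ* → ℝ.

The observations are modeled as *y* ∼ *f*(*x*) + *ξ*, where *ξ* ∼ *N*(0, *σ*^{2}). The symbol ∼ in statistics means 'is distributed as'. In GPR, for every input *x* there is an associated random variable *f*(*x*), which is the value of the stochastic function *f* at that location. In this study, it is assumed that the observational error *ξ* is normal, independent and identically distributed, with a mean value of zero (*μ*(*x*) = 0) and a variance of *σ*^{2}, and that *f*(*x*) is drawn from the GP on *χ* specified by *k*. That is:

$$Y = (y_1, \ldots, y_n)^{T} \sim N\left(0,\; K + \sigma^{2} I\right) \tag{1}$$

where *K*_{ij} = *k*(*x*_{i}, *x*_{j}), and *I* is the identity matrix.

If there are *n* training data and $n_{*}$ test data, then $K(X, X_{*})$ represents the $n \times n_{*}$ matrix of covariances evaluated at all pairs of training and test data, and this is similarly true for $K(X, X)$, $K(X_{*}, X)$ and $K(X_{*}, X_{*})$; here *X* and *Y* are the vectors of the training data and the training data labels *y*_{i}, and $X_{*}$ is the vector of the test data. Because the distribution in Equation (1) is Gaussian, the conditional distribution of the test labels $Y_{*}$ given the training and test data is also Gaussian, $Y_{*} \mid Y, X, X_{*} \sim N(\mu, \Sigma)$, with

$$\mu = K(X_{*}, X)\left[K(X, X) + \sigma^{2} I\right]^{-1} Y \tag{2}$$

$$\Sigma = K(X_{*}, X_{*}) - K(X_{*}, X)\left[K(X, X) + \sigma^{2} I\right]^{-1} K(X, X_{*}) + \sigma^{2} I \tag{3}$$

A valid covariance function must generate a positive semi-definite covariance matrix *K*, where *K*_{ij} = *k*(*x*_{i}, *x*_{j}). With a known kernel function and degree of noise *σ*^{2}, Equations (2) and (3) are sufficient for inference. The user needs to tune the covariance function, its parameters and the degree of noise suitably during the training process of the GPR models. In the case of GPR with a fixed value of Gaussian noise, a GP model can be trained by applying Bayesian inference, i.e. by maximizing the marginal likelihood. This leads to the minimization of the negative log-posterior:

$$p(\sigma^{2}, k) = \tfrac{1}{2}\, Y^{T}\left(K + \sigma^{2} I\right)^{-1} Y + \tfrac{1}{2} \log\left|K + \sigma^{2} I\right| - \log p(\sigma^{2}) - \log p(k) \tag{4}$$

To acquire the hyperparameters, the partial derivatives of Equation (4) are taken with respect to *σ*^{2} and *k*, and the minimization can be carried out by gradient descent.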

As an illustration, Figure 1 shows the capability of a GPR model to predict a sine wave with an amplitude of 1 and a frequency of 1 kHz over a period of 0.1 s, sampled at 20 kHz. The close agreement between the predicted and actual signal illustrates the accuracy of predictions via GPR.
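The sine-wave illustration can be reproduced in spirit with a few lines of NumPy. The sketch below is a minimal zero-mean GPR with an RBF kernel, computing the standard GP posterior mean and variance; it is not the implementation behind Figure 1. The length scale (0.2 ms), the noise variance (1e-4), the even/odd split of the samples into training and test points, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gpr_predict(x_train, y_train, x_test, length_scale, noise_var):
    """Posterior mean and variance of noisy test labels under a zero-mean GP."""
    k_xx = rbf_kernel(x_train, x_train, length_scale)
    k_sx = rbf_kernel(x_test, x_train, length_scale)
    # Cholesky factor of (K + sigma^2 I); more stable than an explicit inverse.
    chol = np.linalg.cholesky(k_xx + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_sx @ alpha
    v = np.linalg.solve(chol, k_sx.T)
    # k(x, x) = 1 for this kernel, so the prior variance of each test label
    # is 1 + noise_var before conditioning on the training data.
    var = 1.0 + noise_var - np.sum(v * v, axis=0)
    return mean, var


# Signal described in the text: amplitude 1, 1 kHz, 0.1 s, sampled at 20 kHz.
fs, f0, duration = 20_000.0, 1_000.0, 0.1
t = np.arange(int(round(fs * duration))) / fs
signal = np.sin(2.0 * np.pi * f0 * t)

# Assumed split: even samples for training, odd samples for testing.
x_train, y_train = t[::2], signal[::2]
x_test, y_test = t[1::2], signal[1::2]

mean, var = gpr_predict(x_train, y_train, x_test,
                        length_scale=2e-4, noise_var=1e-4)
rmse = float(np.sqrt(np.mean((mean - y_test) ** 2)))
print(f"test RMSE: {rmse:.5f}")
```

The Cholesky-based solve is a common design choice for GPR because the matrix being inverted is symmetric positive definite once the noise term is added to its diagonal.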

### Efficiency criteria

In this study, the total data were separated into calibration and verification sets. Two criteria were selected to measure the performance of the proposed forecasting methods: the root mean square error (RMSE) and the determination coefficient (DC). The RMSE and DC are applied to quantify discrepancies between predicted and observed values (Adamowski & Chan 2011). The RMSE quantifies modeling accuracy and is always non-negative because the errors are squared. It grows from zero for perfect forecasts to large positive values as the differences between forecasts and observations become increasingly large. A high value of DC (up to one) and a small value of RMSE indicate high efficiency of the model. Legates & McCabe (1999) indicated that a model can be sufficiently evaluated by these two statistics. These measures are not oversensitive to extreme values (outliers) and are sensitive to additive and proportional differences between model predictions and observations. Therefore, correlation-based measures (e.g. the DC statistic) can indicate that a model is a good predictor (Legates & McCabe 1999).
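The two criteria can be written down compactly. The sketch below assumes the commonly used definitions, RMSE as the square root of the mean squared error and DC as one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean; the text does not spell out the formulas, so this form of DC is an assumption, and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def determination_coefficient(observed, predicted):
    """DC = 1 - SS_res / SS_tot (assumed definition; equals 1 for a perfect fit)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily WD values in m^3/day, for illustration only.
observed = [120_000.0, 125_000.0, 131_000.0, 128_000.0]
predicted = [121_000.0, 124_000.0, 129_000.0, 129_000.0]

print(round(rmse(observed, predicted), 1))
print(round(determination_coefficient(observed, predicted), 3))
```

Because RMSE carries the units of the target (m^{3}/day here) while DC is dimensionless, the two are usually reported together.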

### Study area and data description

Hamadan city is the capital of Hamadan Province in Iran. At the 2006 census, its population was about 474,000, and it increased to 527,000 in 2010, a rapid increase of about 11%. This population growth has affected WD.

Figure 2 illustrates the water supply scheme of Hamadan. According to the figure, 40 percent of WD is supplied from Ekbatan dam and the small Abshine dam, and the remaining 60 percent is supplied from the wells of Bahar plain. In recent years, the discharge of the wells has decreased. To address the WD issue, a pipeline project with a capacity of 89 million cubic meters (MCM) per year is under construction.

Several socio-economic and climatic factors were used to construct the WD prediction models: daily water consumption (WD), average monthly water bill (MWB), population (P), number of households (NH), inflation rate (I), gross national product (GNP), daily maximum, mean and minimum temperature (T_{max}, T_{mean}, T_{min}), daily precipitation (R) and daily relative humidity (RH). The socio-economic and climatic datasets were obtained from the Water and Waste Water Company (WWW) and the Iran Climate Organization (ICO), respectively. The data cover the period 2010–2014. Table 1 presents the statistics of the data used in this research. The models were trained and verified by GPR, and their performance on the training and testing sets was compared with the observations.

| Parameter | WD (m^{3}/day) | T_{max} (°C) | T_{min} (°C) | T_{mean} (°C) | RH (%) | R (mm) | P (person) | NH | GNP (Rial/P) | I (%) | MWB (Rial/P) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 131,543.8 | 20.3 | 4.3 | 12.3 | 50.6 | 1 | 498,000 | 91,200 | 152,000 | 102 | 45.4 |
| Max. | 157,808 | 39.6 | 23.3 | 29.8 | 99.9 | 40.1 | 527,000 | 99,400 | 191,000 | 208 | 100.2 |
| Min. | 101,068 | −9.9 | −25.5 | −16.9 | 14.1 | 0 | 474,000 | 82,500 | 110,000 | 54 | 14.5 |
| S_{x} | 10,492.1 | 11.3 | 8 | 9.4 | 22.6 | 3.3 | 56,000 | 44,500 | 12,000 | 42 | 6.7 |
| C_{sx} | 8 | 55.8 | 184.4 | 76.3 | 44.7 | 345.9 | 5 | 3 | 2.9 | 6.4 | 6.7 |


### Proposed model

The candidate features were daily min. temperature (T_{min}), daily mean temperature (T_{mean}), daily max. temperature (T_{max}), daily precipitation (R), daily RH, previous days' water consumption (WD_{t-i}), average MWB, population (P), number of households (NH), GNP and inflation rate (I). In order to predict the daily WD, input variables were selected among the features as follows:

$$WD_{t} = f\left(WD_{t-i},\; T_{min},\; T_{mean},\; T_{max},\; R,\; RH,\; MWB,\; P,\; NH,\; GNP,\; I\right) \tag{5}$$

where average MWB represents the average amount of money paid by the customers. The MWB data were obtained as the total money collected by the water authority divided by the number of households (Yurdusev & Firat 2009). Population represents the number of people served by the water authority. The yearly demographic data were obtained from the State Statistical Authority and then applied at the daily scale. Number of households means the number of houses served by the authority. GNP and the inflation rate, abbreviated as GNP and I, are applied at the monthly scale in this study for Hamadan city. The effect of inflation on monetary values such as GNP and the water bill is included in the modeling process to investigate its effect on WD.

On the other hand, short-term water consumption could be related to climatological variables; therefore, these variables were selected as inputs to GPR. Figure 3 plots the climatological variables against the respective water consumption values. It could be inferred from Figure 3 that T_{max} and RH are the variables most strongly correlated with WD. To obtain a suitable outcome, the input datasets must be arranged in such a way as to predict the variable of interest (WD) accurately. Because this process is complex, previously used approaches often failed to predict WD precisely enough to support reliable conclusions. Therefore, in the present research, the GPR method was employed for WD prediction using the input combinations described below.

To develop the prediction models, a total of 1,461 daily data records were collected for the period 2010–2014 for the city of Hamadan in Iran. The dataset was separated into two subsets, the calibration and verification datasets. The calibration dataset included data records between 2009 and 2012, which is 80% of the total data records. For a precise assessment, the models were verified with a testing dataset that was withheld from the training process. The testing dataset consisted of 20% of the total data, recorded in the last year. Altunkaynak *et al.* (2005), Jain *et al.* (2001), Bougadis *et al.* (2005) and Yurdusev & Firat (2009) used almost the same strategy for selection of the training and testing datasets.
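The calibration/verification split described above can be sketched as a chronological cut, with the earliest 80% of the records used for calibration and the most recent 20% held out. The function name and the use of integer day indices as stand-ins for the 1,461 daily records are assumptions for illustration.

```python
def chronological_split(records, calibration_fraction=0.8):
    """Split time-ordered records into calibration and verification sets.

    The records are not shuffled, so the verification set is always the
    most recent part of the series.
    """
    if not 0.0 < calibration_fraction < 1.0:
        raise ValueError("calibration_fraction must lie strictly between 0 and 1")
    cut = int(len(records) * calibration_fraction)
    return records[:cut], records[cut:]


# Stand-in for the 1,461 daily records: one integer index per day.
days = list(range(1461))
calibration, verification = chronological_split(days)
print(len(calibration), len(verification))
```

Keeping the split chronological avoids training on days that come after the days being predicted, which matters for a model that uses lagged WD as input.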

To gain better insight into the data, mutual information (MI) was used as a sensitivity analysis to measure the non-linear dependence among all variables. In probability theory and information theory, the MI of two random variables is a measure of the mutual dependence between the two variables (Roushangar *et al.* 2016). More specifically, it quantifies the ‘amount of information’ (in units such as bits) obtained about one random variable through the other random variable. The concept of MI is intricately linked to that of the entropy of a random variable, a fundamental notion in information theory that defines the ‘amount of information’ held in a random variable (Shannon 1948). Table 2 shows the normalized MI values obtained for the selected dataset, computed between WD_{(t)} (water demand of the day) and its lagged values as well as all climatic and socio-economic features. WD_{(t)} is most strongly related to WD_{(t-1)}. The MI values decrease gradually as the lag grows to seven days; however, all MI values for WD_{(t-i)} remain in an acceptable range. T_{max} and RH are the other features with the highest MI values in Table 2. Among the climatic features R, T_{max} and RH, both T_{max} and RH showed strong dependence with WD, indicating that T_{max} fluctuations directly affect WD. In contrast, variations in precipitation (R) did not affect WD as much as T_{max} and RH did.

| Feature | MI value | Feature | MI value |
|---|---|---|---|
| WD_{(t-1)} | 0.90 | T_{max} | 0.82 |
| WD_{(t-2)} | 0.88 | T_{min} | 0.54 |
| WD_{(t-3)} | 0.85 | RH | 0.85 |
| WD_{(t-4)} | 0.85 | R | 0.34 |
| WD_{(t-5)} | 0.78 | MWB | 0.80 |
| WD_{(t-6)} | 0.72 | P | 0.76 |
| WD_{(t-7)} | 0.69 | NH | 0.45 |
| WD | 1.00 | GNP | 0.31 |
| T_{mean} | 0.64 | I | 0.56 |
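A normalized MI of this kind can be estimated from binned data. The sketch below is one possible estimator, not the procedure used to produce Table 2: equal-width binning, the number of bins, and normalization by the geometric mean of the two entropies are all assumptions, since the text does not state how the values were normalized.

```python
import math
from collections import Counter


def _bin(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y, n_bins=8):
    """MI in bits: I(X; Y) = H(X) + H(Y) - H(X, Y) on binned data."""
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


def normalized_mi(x, y, n_bins=8):
    """MI divided by sqrt(H(X) * H(Y)), so a variable with itself scores 1."""
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    hx, hy = entropy(bx), entropy(by)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return (hx + hy - entropy(list(zip(bx, by)))) / math.sqrt(hx * hy)


# A series compared with itself, and two independent uniform series.
series = [float(i) for i in range(100)]
x_indep = [float(i % 10) for i in range(100)]
y_indep = [float(i // 10) for i in range(100)]
print(normalized_mi(series, series))
print(mutual_information(x_indep, y_indep, n_bins=10))
```

Histogram estimates of MI depend on the bin count, so values from different binning choices are not directly comparable.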


Different input combinations were defined according to the MI values to find the best prediction via GPR. Table 3 shows the input structures selected for the performance analysis stage. Since the data did not all share the same time resolution, the input structures were arranged so as to predict water consumption at the daily scale. For instance, MWB was used as a fixed value for all 30 days in a given month; in the next month, the MWB value changed and was held for the next 30 days, and so on. The input structure of various combinations is shown in Equation (6).
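Holding a monthly value constant over the days of its month can be sketched as a step-wise expansion. The example below uses Gregorian calendar month lengths rather than a fixed 30 days; this choice, the function name and the sample MWB values are assumptions for illustration.

```python
import calendar


def expand_monthly_to_daily(monthly_values, start_year, start_month):
    """Repeat each monthly value once per day of its (Gregorian) month."""
    daily = []
    year, month = start_year, start_month
    for value in monthly_values:
        days_in_month = calendar.monthrange(year, month)[1]
        daily.extend([value] * days_in_month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return daily


# Hypothetical MWB values for January to March 2010.
mwb_monthly = [40.0, 42.5, 45.0]
mwb_daily = expand_monthly_to_daily(mwb_monthly, 2010, 1)
print(len(mwb_daily))
```

The same step-wise expansion applies to the annual series (P and NH) by repeating each value over the days of its year.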

WD and the climatic variables vary at the daily scale, the MWB, I and GNP datasets vary at the monthly scale, and P and NH vary at the annual scale. According to Table 3, Comb.1, 2 and 3 were designed using only the WD dataset of previous time steps. Comb.4 to Comb.12 were determined according to the MI values from Table 2 to take advantage of socio-economic and climatic features along with WD values. Since T_{max}, T_{mean}, RH, I, MWB and P were the most important datasets according to the MI values, they were considered in this stage of modeling. Comb.4, 5, 6 and 7 were defined using previous WD and socio-economic features, while Comb.8 to Comb.12 were established based on all features that had suitable correlation with WD_{(t)}. A total of 12 input combinations were fed into GPR in order to evaluate their efficiency in the performance analysis stage and find the best input structure.

| Model | Selected parameters |
|---|---|
| Comb. 1 | WD_{(t-1)} |
| Comb. 2 | WD_{(t-1)}, WD_{(t-2)} |
| Comb. 3 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)} |
| Comb. 4 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, MWB |
| Comb. 5 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, P |
| Comb. 6 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, MWB, P |
| Comb. 7 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, MWB, P, I |
| Comb. 8 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, T_{max}, RH, MWB |
| Comb. 9 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, T_{max}, RH, P |
| Comb. 10 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, T_{mean}, RH, P |
| Comb. 11 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, T_{max}, RH, P, MWB |
| Comb. 12 | WD_{(t-1)}, WD_{(t-2)}, WD_{(t-3)}, T_{mean}, RH, P, MWB |
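An input combination of this form can be turned into a design matrix by stacking lagged WD values with the chosen features. In the sketch below, the dictionary layout, the function name and the toy numbers are hypothetical, and taking the climatic and socio-economic features at day *t* (rather than at *t*−1) is an assumption, since the text does not specify the time index of these inputs.

```python
# Three of the twelve input structures, written as data (layout is hypothetical).
COMBINATIONS = {
    "Comb. 3": {"lags": 3, "features": []},
    "Comb. 6": {"lags": 3, "features": ["MWB", "P"]},
    "Comb. 11": {"lags": 3, "features": ["T_max", "RH", "P", "MWB"]},
}


def build_inputs(wd, exogenous, lags, features):
    """Return (rows, targets) for one-step-ahead prediction.

    Each row holds WD(t-1), ..., WD(t-lags) followed by the selected
    features at day t; the matching target is WD(t).
    """
    rows, targets = [], []
    for t in range(lags, len(wd)):
        lagged = [wd[t - i] for i in range(1, lags + 1)]
        extras = [exogenous[name][t] for name in features]
        rows.append(lagged + extras)
        targets.append(wd[t])
    return rows, targets


# Toy daily series, for illustration only.
wd = [100.0, 101.0, 102.0, 103.0, 104.0]
exogenous = {
    "T_max": [20.0, 21.0, 22.0, 23.0, 24.0],
    "RH": [50.0, 49.0, 48.0, 47.0, 46.0],
    "P": [498_000.0] * 5,
    "MWB": [45.4] * 5,
}

spec = COMBINATIONS["Comb. 11"]
X, y = build_inputs(wd, exogenous, spec["lags"], spec["features"])
print(X[0], y[0])
```

The first `lags` days are dropped because their lagged inputs are not available.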


## RESULTS AND DISCUSSION

In this study, a kernel-based machine learning approach, namely GPR, was used in order to obtain the best prediction of WD. Applying this approach well requires selecting an appropriate kernel function and tuning the related hyperparameters. A number of kernels have been discussed by researchers, but most studies suggest that the radial basis function (RBF) kernel is effective in machine learning methods for the majority of civil engineering applications (Nourani *et al.* 2016). For a fair comparison of results, an RBF kernel, $k(x_i, x_j) = e^{-\gamma \lVert x_i - x_j \rVert^{2}}$, where *γ* (kernel width) is a kernel-specific parameter, was used with GPR for all models in this study.

In addition to the choice of the kernel function, GPR requires kernel-specific parameters to be set. GPR also requires optimum values of the Gaussian noise (added to the diagonal of the covariance matrix) and of the kernel parameter *γ*. Several approaches, such as a manual method and a grid search method, have been proposed for selecting such user-defined parameters. In the present study, a manual method, which involves carrying out a large number of trials with different combinations of the user-defined parameters, was applied to choose their optimal values. The optimal values were chosen so as to minimize the RMSE and maximize the DC on the test dataset. The resulting optimal values were Noise = 0.4 and *γ* = 0.45.
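The trial-based selection of the noise term and *γ* can be sketched as an exhaustive search over a small grid, scoring each pair by RMSE on held-out data. This is a grid search standing in for the manual trials described above, run on synthetic data; the candidate values, the data and all function names are assumptions for illustration.

```python
import itertools

import numpy as np


def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) between the rows of a and b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def gpr_mean(x_train, y_train, x_test, gamma, noise):
    """Zero-mean GP posterior mean; `noise` is added to the covariance diagonal."""
    k = rbf(x_train, x_train, gamma) + noise * np.eye(len(x_train))
    return rbf(x_test, x_train, gamma) @ np.linalg.solve(k, y_train)


def rmse(observed, predicted):
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def grid_search(x_cal, y_cal, x_val, y_val, gammas, noises):
    """Score every (gamma, noise) pair by held-out RMSE and return the best."""
    scores = {}
    for gamma, noise in itertools.product(gammas, noises):
        pred = gpr_mean(x_cal, y_cal, x_val, gamma, noise)
        scores[(gamma, noise)] = rmse(y_val, pred)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic stand-in data: a noisy sine curve, alternately split.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 60)[:, None]
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(60)
x_cal, y_cal, x_val, y_val = x[::2], y[::2], x[1::2], y[1::2]

best, scores = grid_search(x_cal, y_cal, x_val, y_val,
                           gammas=[0.05, 0.45, 5.0],
                           noises=[0.01, 0.1, 0.4])
print("best (gamma, noise):", best)
```

When the same held-out set is used both to pick the parameters and to report the final scores, the reported scores tend to be optimistic, so a separate validation subset is often carved out of the calibration data for the search.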

The results obtained for GPR are presented in Table 4. According to Table 4, calibration performance increased from Comb.1 to Comb.3, but the results did not reach sufficient accuracy in terms of RMSE and DC. The calibration DC of Comb.3 improved by about 9 percentage points in comparison to Comb.1. The obtained results are illustrated in Figure 4(a). The performance of Comb.4 to Comb.7, which included the socio-economic features, showed a relative improvement. It could be stated that the strong dependence (as calculated via MI) between these features and WD caused the improvement in results. It could be concluded that Comb.6, including WD_{(t-i)} (three-day lag), MWB and population, established the best input structure related to socio-economic features. The calibration DC of Comb.6 was about 4 percentage points higher than that of Comb.3. Predicted values using Comb.6 are presented in Figure 4(b).

The features used in this combination are very important factors in the management of water use in urban areas. MWB, which was applied in six input combinations, is an economic factor that could be considered important in water consumption management. For cities such as Hamadan, which have a drinking water problem, MWB is a key factor that could be used as a management tool to obtain optimum water usage. Population growth is an adverse factor for water consumption: it causes an increase in WD, so population control could be considered another important factor in urban water management.

Adding climatic features further enhanced the performance of GPR. Based on Table 4, T_{max} and RH were the best climate-based features, and the models that included them showed a significant improvement in DC and RMSE. Comb.11, which added climatic features to the inputs of Comb.6, gave the best results among all GPR models, with the highest DC and the lowest RMSE. Its calibration DC was about 2 percentage points higher than that of Comb.6. It could be inferred from the results that maximum temperature and RH are substantial climatological factors affecting water use. When temperature increases or RH decreases, water use consequently increases. According to the present research, these climatic factors, along with the mentioned socio-economic factors, are the important features that must be considered in urban water management. Figure 4(c) shows the GPR results for the best input dataset among all models (Comb.11).

| Input combination | Calibration RMSE (m^{3}/day) | Calibration DC | Verification RMSE (m^{3}/day) | Verification DC |
|---|---|---|---|---|
| Comb. 1 | 3,256.7 | 0.78 | 5,301.6 | 0.702 |
| Comb. 2 | 2,803.2 | 0.834 | 4,603.5 | 0.751 |
| *Comb. 3* | *2,685.6* | *0.873* | *4,760.5* | *0.736* |
| Comb. 4 | 2,425 | 0.864 | 4,422.0 | 0.772 |
| Comb. 5 | 3,037.1 | 0.821 | 4,849.3 | 0.719 |
| *Comb. 6* | *1,786* | *0.91* | *3,955.7* | *0.82* |
| Comb. 7 | 2,087.7 | 0.890 | 4,226.0 | 0.790 |
| Comb. 8 | 1,802 | 0.914 | 3,989.4 | 0.809 |
| Comb. 9 | 2,120.1 | 0.889 | 4,484.6 | 0.797 |
| Comb. 10 | 2,510.1 | 0.869 | 4,923 | 0.741 |
| **Comb. 11** | **1,591.7** | **0.931** | **3,149.3** | **0.833** |
| Comb. 12 | 3,896.5 | 0.826 | 5,486 | 0.714 |


The best combination based on input variables is italicised and the best input is in bold.
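The improvement figures quoted in the discussion (about 9, 4 and 2 percent) appear to correspond to differences in calibration DC between the selected combinations, expressed in percentage points. The sketch below recomputes them from the calibration DC values in Table 4; the reading of "percent" as percentage points of calibration DC is an assumption, and the function name is hypothetical.

```python
# Calibration DC values copied from Table 4.
calibration_dc = {
    "Comb. 1": 0.78,
    "Comb. 3": 0.873,
    "Comb. 6": 0.91,
    "Comb. 11": 0.931,
}


def dc_gain_points(baseline, candidate):
    """Gain in calibration DC from baseline to candidate, in percentage points."""
    gain = calibration_dc[candidate] - calibration_dc[baseline]
    return round(100 * gain, 1)


print(dc_gain_points("Comb. 1", "Comb. 3"))   # lagged WD only: 1 lag vs 3 lags
print(dc_gain_points("Comb. 3", "Comb. 6"))   # adding MWB and P
print(dc_gain_points("Comb. 6", "Comb. 11"))  # adding T_max and RH
```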

## SUMMARY AND CONCLUSION

The present study investigated the effect of different climatic and socio-economic variables on precise and trustworthy WD prediction, which is an important subject in effective and sustainable water resources planning and management in metropolitan areas. The main goal of the present study was to create an adaptive tool that, once trained, can be used directly for operational applications such as water allocation. Building such a tool requires a capable regression approach such as GPR and informative variables such as socio-economic and climatic data.

To investigate the capability of GPR, different input combinations were proposed and trained. The GPR-based models built on the observed dataset were compared and evaluated based on their performance in the training and testing periods. The input combinations were determined based on a sensitivity analysis using MI values. Comparing the performance of the models revealed that Comb.3, based on previous time steps of WD (3-day time lag), Comb.6, based on previous time steps of WD, MWB and P, and Comb.11, which had T_{max} and RH in addition to the inputs of Comb.6, were the best prediction models according to the evaluation criteria. The lowest value of RMSE and the highest value of DC were obtained from Comb.11 via GPR. It could be inferred from the results that socio-economic features such as MWB, population and inflation rate, along with climatic features such as maximum daily temperature and daily RH, are the features that affect daily water consumption. Since such socio-economic features are very important in modeling, they could be used as substantial factors in the control and management of municipal WD. The results also indicated that high temperature and low RH cause water use to increase. Additional work will be required for a complete understanding of these findings, and the use of larger and more diversified datasets may help to confirm the empirical results obtained in this study. For future studies, it is suggested to use water quality and water loss data as independent variables in the prediction of WD to assess the value of these features in modeling via black-box methods.

## REFERENCES

Kame'enui 2003 *Water Demand Forecasting in the Puget Sound Region: Short and Long-term Models*. MS Thesis.

*National Center for Atmospheric Research, Boulder, CO*.

Neal 1998 *Regression and classification using Gaussian process priors*.