In this paper, a modelingidentification approach for the monthly municipal water demand system in Hail region, Saudi Arabia, is developed. This approach is based on an autoregressive integrated moving average (ARIMA) model tuned by the particle swarm optimization (PSO). The ARIMA (p, d, q) modeling requires estimation of the integer orders p and q of the AR and MA parts; and the real coefficients of the model. More than being simple, easy to implement and effective, the PSOARIMA model does not require data preprocessing (original timeseries normalization for artificial neural network (ANN) or data stationarization for traditional stochastic timeseries (STS)). Moreover, its performance indicators such as the mean absolute percentage error (MAPE), coefficient of determination (R^{2}), root mean squared error (RMSE) and average absolute relative error (AARE) are compared with those of ANN and STS. The obtained results show that the PSOARIMA outperforms the ANN and STS approaches since it can optimize simultaneously integer and real parameters and provides better accuracy in terms of MAPE (5.2832%), R^{2} (0.9375), RMSE (2.2111 × 10^{5}m^{3}) and AARE (5.2911%). The PSOARIMA model has been implemented using 69 records (for both training and testing). The results can help local water decision makers to better manage the current water resources and to plan extensions in response to the increasing need.
INTRODUCTION
In arid climate countries such as Saudi Arabia, reliable municipal water demand modeling is crucial in both the medium and long term. In fact, water demand is steeply rising and consequently water utilities are invited to make suitable decisions in order to ensure good planning and management of scarce and limited water resources. Operational decisions are in general based on both the present requirements and the expected demands. Water demand can be viewed as a dynamic system and requires mathematically modeling. The developed models can be useful for many purposes such as scheduling, diagnosis, management, control and forecasting. In this paper, monthly municipal water demand (MMWD) is posed as an identification problem and an optimization procedure has been established to solve it. The outlined model is used for forecasting purposes.
This study is concerned with developing MMWD models for the case study of Hail, Saudi Arabia, taking into account historical records of water consumption as unique explanatory variables. A wide variety of models and methods have been used by researchers and water forecasting practitioners. Water demand forecasting problem solving techniques can roughly be categorized into three classes: (1) timeseries stochastic methods (e.g. Zhou et al. 2000; Mohamed & AlMualla 2010; Almutaz et al. 2013; Ngo & Bros 2013; Mehr & Jha 2013; McNown et al. 2015; Vantuch & Zelinka 2015); (2) artificial neural networks (ANNs) (e.g. Ajbar & Ali 2013; Babel et al. 2014); and (3) swarm intelligence optimization techniques (Chau 2004; Huang et al. 2005; Sun et al. 2012; Aladag et al. 2013; Nazari et al. 2015). According to the forecasting horizon, smallscales (hourly, daily and weekly) approaches have been investigated in, for example, Zhang 2003; Boughadis et al. 2005; Adamowski & Karapataki 2010; Herrera et al. 2010; Adamowski et al. 2012; Adebiyi et al. 2014; AlZahrani 2014; Perea et al. 2015; and Tiwari & Adamowski 2015. For longscales (annual) forecasting horizon, efficient frameworks have been developed and their relative performances measured (e.g. Zhang 2003; Khashei & Bijari 2010; Sun et al. 2012; Almutaz et al. 2013; Babel et al. 2014; Donkor et al. 2014; Nazari et al. 2015; Ouda 2014; Yalcintas et al. 2015). Since this study is concerned only with monthly water demand forecasting (e.g. Billings & Agthe 1998; Firat et al. 2009; Jalalkamali et al. 2011; Nasseri et al. 2011; Sudheer et al. 2011; Yasar et al. 2012; Ajbar & Ali 2013; Aladag et al. 2013; Almutaz et al. 2013; Mehr & Jha 2013; Babel et al. 2014; AkbariAlashti et al. 2015; Beheshti et al. 2015; Bai et al. 2016), this paper presents and analyses a relatively recent stateoftheart. Particular attention is allocated to models (mathematical descriptions of the problem), methods (used to solve the problem) and results (obtained). The study area, the performance measures and the case study particular aspects are mentioned if available. Inspired from Donkor et al. (2014), an annotated reference list of selected papers related to monthly water forecasting problems is included in Table 1.
Reference  Model/ purpose/method  Location  Explanatory variables  Results 

Donkor et al. (2014)  Literature review on urban water demand from 2000 to 2010  –  –  Wide variety of models and methods have been used. Applications differ depending on forecast variable, periodicity and casesensitive 
Yasar et al. (2012)  Application of nonlinear regression (NLR) for forecasting monthly water demand  Adana city, Turkey 
 Water consumption in Adana city will increase from 3.84 million m^{3} in 2009 to 4.99 million m^{3} in 2020. MWD is affected particularly by the total number of subscribers and the atmospheric temperature 
Babel et al. (2014)  Use of future climatic and socioeconomic previously forecasted data for forecasting monthly water demand using ANNs  Bangkok 


Firat et al. (2009)  Evaluation and comparison of ANN techniques for MMWD  Izmir, Turkey 
 The Generalized Regression NN (GRNN) was found to be the best model when using the monthly water bill, the population and the average temperature as the most significant explanatory variables 
Ajbar & Ali (2013)  Use of ANN for predicting monthly municipal water consumption  Mecca city, Saudi Arabia 
 ANN approach is found to be better than the econometric model. Moreover, it particularly captures the effect of the number of visitors since Mecca is a holy town visited by people for religious reasons 
Nasseri et al. (2011)  Forecasting monthly urban water demand using extended Kalman filter (EKF) and genetic programming (GP)  Tehran, Iran  Historic records of monthly water consumption  Using hybrid EKF and GP is suitable for calibrating water demand using historic records of the water consumption itself 
Baheshti et al. (2015)  Forecasting monthly rainfall using the centripetal accelerated particle swarm optimization (CAPSO) combined to ANN  Johor state, Malaysia 
 The use of the hybrid model combining the CAPSO and the ANN increases the accuracy, in particular when the original data are preprocessed 
Buykyildiz et al. (2014)  Estimation of the change in lake water level by artificial intelligence methods including PSOANN, SVR, MLP, RBNN and ANFIS  Lake Beysehir, Turkey 
 SVR model is found to be the best model when compared to other methods. It provides results with a coefficient of determination R^{2} of 0.9988 
Sudheer et al. (2011)  Use of a quantum behaved PSO to determine the SVM coefficients in order to estimate groundwater level R  Rentashintala, India  Historic records  SVMQPSO is better than ANN in terms of RMSE (0.43), R^{2} (0.94) and efficiency (0.84) 
Mehr & Jha (2013)  Analysis of monthly rainfall data using timeseries based approaches: ARIMA and forecasting for the next 12 years for different district towns  Mahanadi River Basin, India  Historic records  The ARIMA(1,0,0)(0,1,1)^{12} is found to be the most suitable for forecasting monthly rainfall for the upcoming 12 years in 38 towns 
Billings & Agthe (1998)  Use of statespace approach versus multipleregression for forecasting monthly urban water demand  USA 
 Mean absolute error of forecast ranges from 7.4 to 14.8%. Statespace is found to be better than the multipleregression model on the studied water demand set 
AkbariAlashti et al. (2015)  Evaluation of developed discrete timeseries in flow forecasting models: ARMAX, ANN, GP  East Azerbaijan province, Iran 
 GP is found to be the more precise model with R^{2} = 0.7 and RMSE = 0.172 
Jalalkamali et al. (2011)  Forecasting groundwater level in a well using neurofuzzy and ANN  Kerman plain, Iran 
 NF and ANN techniques are good choices for predicting groundwater level 
Wang et al. (2016)  Three hybrid stochastic models have been used for predicting available water resources, water demand and water use  Urimqi, China 
 The results are assumed by the authors to help managers and policymakers to have a clear understanding of regional water supply and demand trend as well as the water utilization structure in the future 
Bai et al. (2016)  Grey relational analysis model is used for forecasting monthly reservoir inflow  Anhui, China 
 The proposed approach is more accurate than traditional techniques 
Reference  Model/ purpose/method  Location  Explanatory variables  Results 

Donkor et al. (2014)  Literature review on urban water demand from 2000 to 2010  –  –  Wide variety of models and methods have been used. Applications differ depending on forecast variable, periodicity and casesensitive 
Yasar et al. (2012)  Application of nonlinear regression (NLR) for forecasting monthly water demand  Adana city, Turkey 
 Water consumption in Adana city will increase from 3.84 million m^{3} in 2009 to 4.99 million m^{3} in 2020. MWD is affected particularly by the total number of subscribers and the atmospheric temperature 
Babel et al. (2014)  Use of future climatic and socioeconomic previously forecasted data for forecasting monthly water demand using ANNs  Bangkok 


Firat et al. (2009)  Evaluation and comparison of ANN techniques for MMWD  Izmir, Turkey 
 The Generalized Regression NN (GRNN) was found to be the best model when using the monthly water bill, the population and the average temperature as the most significant explanatory variables 
Ajbar & Ali (2013)  Use of ANN for predicting monthly municipal water consumption  Mecca city, Saudi Arabia 
 ANN approach is found to be better than the econometric model. Moreover, it particularly captures the effect of the number of visitors since Mecca is a holy town visited by people for religious reasons 
Nasseri et al. (2011)  Forecasting monthly urban water demand using extended Kalman filter (EKF) and genetic programming (GP)  Tehran, Iran  Historic records of monthly water consumption  Using hybrid EKF and GP is suitable for calibrating water demand using historic records of the water consumption itself 
Baheshti et al. (2015)  Forecasting monthly rainfall using the centripetal accelerated particle swarm optimization (CAPSO) combined to ANN  Johor state, Malaysia 
 The use of the hybrid model combining the CAPSO and the ANN increases the accuracy, in particular when the original data are preprocessed 
Buykyildiz et al. (2014)  Estimation of the change in lake water level by artificial intelligence methods including PSOANN, SVR, MLP, RBNN and ANFIS  Lake Beysehir, Turkey 
 SVR model is found to be the best model when compared to other methods. It provides results with a coefficient of determination R^{2} of 0.9988 
Sudheer et al. (2011)  Use of a quantum behaved PSO to determine the SVM coefficients in order to estimate groundwater level R  Rentashintala, India  Historic records  SVMQPSO is better than ANN in terms of RMSE (0.43), R^{2} (0.94) and efficiency (0.84) 
Mehr & Jha (2013)  Analysis of monthly rainfall data using timeseries based approaches: ARIMA and forecasting for the next 12 years for different district towns  Mahanadi River Basin, India  Historic records  The ARIMA(1,0,0)(0,1,1)^{12} is found to be the most suitable for forecasting monthly rainfall for the upcoming 12 years in 38 towns 
Billings & Agthe (1998)  Use of statespace approach versus multipleregression for forecasting monthly urban water demand  USA 
 Mean absolute error of forecast ranges from 7.4 to 14.8%. Statespace is found to be better than the multipleregression model on the studied water demand set 
AkbariAlashti et al. (2015)  Evaluation of developed discrete timeseries in flow forecasting models: ARMAX, ANN, GP  East Azerbaijan province, Iran 
 GP is found to be the more precise model with R^{2} = 0.7 and RMSE = 0.172 
Jalalkamali et al. (2011)  Forecasting groundwater level in a well using neurofuzzy and ANN  Kerman plain, Iran 
 NF and ANN techniques are good choices for predicting groundwater level 
Wang et al. (2016)  Three hybrid stochastic models have been used for predicting available water resources, water demand and water use  Urimqi, China 
 The results are assumed by the authors to help managers and policymakers to have a clear understanding of regional water supply and demand trend as well as the water utilization structure in the future 
Bai et al. (2016)  Grey relational analysis model is used for forecasting monthly reservoir inflow  Anhui, China 
 The proposed approach is more accurate than traditional techniques 
It should be concluded from Table 1 that the forecasting obtained results are casesensitive. In fact, the same method does not provide the same results when applied to different case studies. Moreover, the ANN technique seems to be the most used alone or combined to other techniques such as particle swarm optimization (PSO).
In this study, the MMWD forecasting problem is posed as an identification framework where the error between the observed (real) and the forecasted (predicted) water demands is to be minimized. From an optimization point of view, tuning the parameters of an ARIMA(p,d,q) model is known to be hybrid since it involves mixed integerreal decision variables. In fact, the presence of integer variables (the orders of the AR and the MA parts) can lead to a combinatorial explosion, whereas the classical autoregressive integrated moving average (ARIMA) model tuning (with known orders) can present a nonconvex aspect and thus classical solving algorithms can be easily trapped into local minimum solutions.
The present paper proposes a new approach, referred to as PSOARIMA, by mapping the ARIMA(p,d,q) parameters to the particle position components in the PSO paradigm. The decision variables are the integervalued orders of the AR and the MA parts and the realvalued coefficients of the model. The proposed PSO algorithm is new since it handles simultaneously discrete and continuous decision variables. Moreover, the searchspace limits are defined such that the stability of the obtained model is ensured. The proposed algorithm has the capacity of operating in complex searchspaces with global optimization ability compared to gradientdescent algorithms known to be local search methods. The relative performance measures of the proposed approach are first evaluated and then compared with those of ANN and traditional stochastic timeseries (STS) approaches. Based on 69 records of MMWD collected from Water Directorate of Hail Region, first an explanatory analysis including the stationarity test of the timeseries was conducted through the socalled sample autocorrelation function (ACF) study. Then, diverse ANN based approaches and different STSapproaches such as AR and ARIMA were used to model the MMWD. Finally, the progressive development of the PSOARIMA approach was detailed. The PSO technique demonstrates high ability to provide mixed integerreal nearoptimal solution for the ARIMA(p,d,q) forecasting model. By comparing the statistical performance indicators, the PSOARIMA is found to be the most parsimonious in the sense that it allows the identification of both the orders and the coefficients simultaneously. Moreover, it is revealed to be insensitive to forecasting problem irregularities including linearity, convexity and stationarity. Other features of the proposed PSOARIMA approach include the facts that it is easy to understand, simple to program (requires few lines of computer code) and that it operates in a straightforward manner on the original timeseries without needing any preprocessing.
The remainder of the paper is organized as follows: in the following section, the problem of MMWD modeling and identification is formulated; followed by a section briefly discussing three tools for solving the problem, namely, ANN, STS and PSO. In the penultimate section, the practical implementation of the presented methodologies is detailed. Special emphasis is allocated to the PSOARIMA approach. Concluding remarks and recommendations are summarized in the final section.
MMWD MODELINGIDENTIFICATION PROBLEM FORMULATION
In this study, the MMWD is viewed as a dynamic system. Thus, an experimental analysis, referred to as identification, should be performed. A mathematical model is derived from measurements (observations in the case of forecasting problems). For such a purpose, the system inputs and outputs are gathered and a relationship between them has to be performed. According to basic system theory concepts, the dynamic system models are of two types: parametric and nonparametric. The parametric models are considered in this study. The structures of the developed models are known in advance and their synthesis parameters are to be tuned optimally such that an error function is minimized.
The MMWD model is a mathematical description assumed to produce similar behavior as the MMWD dynamic real system. The p previous values (obtained via the delay operator q^{–1}) of the system output, y(t), are fed respectively to the model and to the system itself. The estimated value, , the observed value, y(t) as well as their difference, , are introduced to the model identifier block (Bidwell 2005; Aitmaatallah et al. 2015). This block is intended to estimate the unknown model parameters by minimizing the least squared error function between the system and its model respective outputs.
In this study, the model identifier has been implemented using respectively ANN, AR, ARIMA and PSOARIMA approaches. The following section will describe the theoretical and practical aspects of these methodologies.
PROPOSED METHODOLOGIES
Timeseries
ANNs
The ANN has been widely used as a popular tool for MWD forecasting (e.g. Ajbar & Ali 2013; Babel et al. 2014). It was in some cases used as a unique framework, but in the majority of cases it has been coupled with other techniques, such as wavelettransform and PSO. ANNs are defined as universal and highly flexible approximators for diverse timeseries going from the fields of cognitive science to engineering. The concept of ANN is an attempt to imitate the human brain functioning (Babel et al. 2014). A network of artificial neurons is organized in layers. A collection of inputs is introduced to the process's units (neurons). In each layer, the neurons are fully interconnected via weights. The weighted inputs are then introduced to a transfer function, which converts them to an output. Choosing the ANN architecture, determining the number of neurons and choosing the appropriate activation functions are the most important steps in designing an ANN. Distinct architectures, learning methods and performance can be found elsewhere (e.g. Talib et al. 2008; Firat et al. 2009; Adamowski & Karapataki 2010). In this study, ANNs with different structures have been used in monthly MWD forecasting. Details for this aim will be provided later.
PSO
The PSO algorithm was first introduced by Kennedy and Eberhart in 1995. It is categorized as one of the population based swarm intelligence techniques such as Genetic Algorithms (GA), Ant Colony (AC), etc. Parameters tuning of the algorithm, its variants and its applications can be found elsewhere (e.g. Clerc & Kennedy 2002; Delvalle 2008; Wang et al. 2008; Boubaker et al. 2014; Buyukyildiz et al. 2014). PSO is preferred in many applications for its simplicity, its universal aspect and the fact that it does not require any regularity of the studied problem (continuity, differentiability and convexity). Although it has been used in forecasting electric load (e.g. Huang et al. 2005; Nazari et al. 2015; Saravanan et al. 2015), its use in water demand forecasting remains limited.
are two constant factors, named cognitive and social coefficients
are random numbers in the [0,1] range
is the best previous position of the ith particle
is the best previous position of the whole swarm
is a weighting function, known to ensure balance between searchspace exploration and exploitation (Boubaker et al. 2014)
are respectively the maximum and the minimum value of w_{k}.
From the expertise about the MWD forecasting problem, one can define the searchspace limits. Components having values out of their limits are reset through a confinement mechanism. The first step in using PSO to solve the problem (6) is to adopt a suitable mapping between the particle position in the swarm evolution concept and the vector of the MWD forecasting problem synthesis parameters. Note that for this problem, the decision variables set is casesensitive and that it depends on the nature of the available data, the forecasting horizon and the adopted model structure. More details will be provided when dealing with the progressive development of the chosen models based on PSO.
RESULTS AND DISCUSSION
MMWD is in general known to be affected by socioeconomic variables, such as monthly water bill, population, number of subscribers, gross national product, etc. (Firat et al. 2009), and by climatic variables such as precipitation accumulation, relative humidity, air temperature, etc. Moreover, the historical values of the monthly water consumption are among the explanatory variables that affect the current consumption. Since the socioeconomic and the climatic variables are nonavailable, we are limited in this study to analyzing the monthly MWD as a function of only its previous values. AR, ARIMA, ANN and PSObased approaches will be used in forecasting. The first step consists on studying the stationarity of a TS composed of 69 records of monthly MWD (from January 2010 to September 2015) collected from the General Directorate of Water in Hail Region. As proposed by Box–Jenkins methodology, determining whether the monthly MWDTS is stationary (data set having a constant mean and variance) (Boughadis et al. 2005) is required for modeling STS forecasting using ARIMA models. The ANN models require a preprocessing of the original records in order to normalize them in the interval [0,1]. This process will be discussed in detail when dealing with the ANN approaches. Through a suitable mapping between PSO and the monthly MWD forecasting model, an algorithm to determine both the ARIMA model parameters and orders will be detailed. The simulations carried out in this study are performed using Matlab 2013b package and implemented on an Intel® Core™ i33110 microprocessor with a clock of 2 GHz as frequency and 4 GB of RAM. The relative performances of the proposed PSOARIMA approach were evaluated using the performance indicators introduced respectively in Equations (1)–(4). They are then compared with those of the AR, ARIMA and ANNbased approaches.
Study area
Hail city districts are fed with municipal water using valves operated manually according to a weekly fixed schedule (GDWHR 2004).
Evaluation of ANN techniques for monthly MWD modeling
In this section, various ANN techniques such as feedforward (FF) and radialbasis have been evaluated via numerous performance indicators in modeling future MMWD in Hail region (Leva et al. 2015). The first step in using an ANNbased approach is to optimize the architecture of the ANN such that the relationship between the inputs and the outputs is well captured. For this aim, a number of tests have been conducted using different ANN architectures, activation functions and training algorithms (Adamowski et al. 2012). The inputs of the ANNs considered here are the MMWD of the previous months. The data set is divided into two subsets, training and testing. For all simulations, the first N_{1} = 40 records are used for training and the remaining N_{2} = 29 records are used for model validation. Choosing the ANN architecture is the most consuming time step when using ANNs in prediction. The best architecture is decided here by trial and error procedure. After several trials, the best ANN structure is found to consist of an input layer, two hidden layers and one output layer. The optimal numbers of neurons are found to be, respectively, 60 (for the input layer), 16 (for each one of the two hidden layers) and 10 (for the output layer). Linear activation functions are used for the input and the output layers whereas logarithmicsigmoid activation function is used for the two hidden layers. The BPFFANN has been trained using the Levenberg–Marquardt, the resilient backpropagation and the gradient Powell–Beale algorithms. The last one is found to provide better results than the two others.
We define the order of the BPFFANN as the number of inputs for comparison purpose against other techniques. The maximum number of previous monthly MWD used as inputs is set to six. So, six BFFFANN algorithms are trained and tested. The obtained results are depicted in Table 2.
Model  MAPE (%)  R^{2}  RMSE (m^{3})  AARE(%) 

BPFFANN (1)  10.6534  0.7426  4.3552 × 10^{5}  9.2345 
BPFFANN (2)  11.8427  0.6629  4.9971 × 10^{5}  10.1552 
BPFFANN (3)  6.5671  0.9115  2.5753 × 10^{5}  6.3571 
BPFFANN (4)  11.3592  0.6691  4.7912 × 10^{5}  9.8243 
BPFFANN (5)  12.9392  0.5581  5.7695 × 10^{5}  10.8166 
BPFFANN (6)  17.1062  0.2380  7.6783 × 10^{5}  14.2835 
Model  MAPE (%)  R^{2}  RMSE (m^{3})  AARE(%) 

BPFFANN (1)  10.6534  0.7426  4.3552 × 10^{5}  9.2345 
BPFFANN (2)  11.8427  0.6629  4.9971 × 10^{5}  10.1552 
BPFFANN (3)  6.5671  0.9115  2.5753 × 10^{5}  6.3571 
BPFFANN (4)  11.3592  0.6691  4.7912 × 10^{5}  9.8243 
BPFFANN (5)  12.9392  0.5581  5.7695 × 10^{5}  10.8166 
BPFFANN (6)  17.1062  0.2380  7.6783 × 10^{5}  14.2835 
Exploratory analysis
AR models
The Econometrics toolbox of Matlab has been used as the modeling tool for the ARIMA STS studied in this paper (the AR model is considered as a particular case of ARIMA with q = 0). The selected and calibrated models are used for simulation and forecasting. The ‘arima’ function is used to specify the model and the ‘estimate’ function is used to estimate the model polynomial coefficients. These two functions assume that the model orders (p and q) are determined in advance. The ‘estimate’ function uses the maximum likelihood principle which finds the maximum probability density function corresponding to the optimal values of the model parameters.
Model  MAPE (%)  R^{2}  RMSE (m^{3})  AARE (%)  AIC 

AR(1)  5.7864  0.9017  2.6728 × 10^{5}  5.7260  24.0842 
AR(2)  5.7585  0.9238  2.7115 × 10^{5}  5.4833  24.9739 
AR(3)  5.4007  0.9201  2. 31191 × 10^{5}  5.3800  24.0005 
AR(4)  5.5501  0.9203  2. 4240 × 10^{5}  5.6355  24.0661 
AR(5)  5.6534  0.9209  2. 1412 × 10^{5}  5.3555  24.1219 
Model  MAPE (%)  R^{2}  RMSE (m^{3})  AARE (%)  AIC 

AR(1)  5.7864  0.9017  2.6728 × 10^{5}  5.7260  24.0842 
AR(2)  5.7585  0.9238  2.7115 × 10^{5}  5.4833  24.9739 
AR(3)  5.4007  0.9201  2. 31191 × 10^{5}  5.3800  24.0005 
AR(4)  5.5501  0.9203  2. 4240 × 10^{5}  5.6355  24.0661 
AR(5)  5.6534  0.9209  2. 1412 × 10^{5}  5.3555  24.1219 
From the results of Table 3, it can be observed that the AR(3) model performed better than the other models with the lowest MAPE (5.4007%), the highest R^{2}(0.9201) and the lowest Akaike information criterion (AIC) (24.0005).
ARIMA(p,1,q) models
In order to construct the best ARIMA(p,1,q) model, the orders p and q need to be determined. Many combinations of the couple (p,q) have been tested and evaluated according to the four previously defined performance indicators. The onestepahead forecasting is considered here. Four models along with their coefficients and performance indicators for both training and testing data are shown in Table 4.
Model  Parameters  Performance indicators (training + testing)  

C  a_{1}  a_{2}  a_{3}  b_{1}  MAPE (%)  R^{2}  RMSE (m^{3})  AARE (%)  
ARIMA (1,1,1)  16287  0.6611  –  –  0.3563  6.1491  0.9119  2.5508 × 10^{5}  5.4566 
ARIMA (2,1,1)  21846  −1.0202  −0.0202  –  0.9394  8.3298  0.7744  4.1149 × 10^{5}  9.1475 
ARIMA (3,1,1)  356,581  0.5744  0.3190  −0.2738  −1  7.9462  0.7124  4.6811 × 10^{5}  9.2233 
ARIMA (1,0,1)  346,677  0.8467  –  –  −0.2257  7.0535  0.8962  2.8557 × 10^{5}  7.5595 
Model  Parameters  Performance indicators (training + testing)  

C  a_{1}  a_{2}  a_{3}  b_{1}  MAPE (%)  R^{2}  RMSE (m^{3})  AARE (%)  
ARIMA (1,1,1)  16287  0.6611  –  –  0.3563  6.1491  0.9119  2.5508 × 10^{5}  5.4566 
ARIMA (2,1,1)  21846  −1.0202  −0.0202  –  0.9394  8.3298  0.7744  4.1149 × 10^{5}  9.1475 
ARIMA (3,1,1)  356,581  0.5744  0.3190  −0.2738  −1  7.9462  0.7124  4.6811 × 10^{5}  9.2233 
ARIMA (1,0,1)  346,677  0.8467  –  –  −0.2257  7.0535  0.8962  2.8557 × 10^{5}  7.5595 
PSO–ARIMA(p,d,q) models
Since MMWD in Hail region is derived from a stochastic process known to be highly nonlinear and eventually presenting irregularities, the resulting modelingidentification problem is classified as hybrid and difficult to solve. This problem presents a high level of algorithmic complexity. In such conditions, we have to resort to the PSO technique to address the modeling aspect of future water consumptions underlying on the limited available data (Parsopoulos & Vrahatis 2002).
Swarm model definition
This encoding scheme presents the particularity that it contains, simultaneously, integer and realcoded variables. are realcoded variables, p and q are integer variables. To avoid unnecessary computations, the maximum values for p and q respectively are set to 5. Therefore, once p and q are known, all the coefficients between (respectively between) are set to 0. The searchspace limits are defined in Table 5.
Parameter  Min  Max  Nature  Meaning 

10^{−4} × C  1  35  Real  Scalar constant in the linear timeseries model as defined in Equation (5) 
p,q  1  5  Integer  Degrees of the compound AR and the compound moving average (MA) polynomials 
a_{1}, …, a_{p}  −1  1  Real  Coefficients of the AR polynomial 
b_{1}, …, b_{q}  −1  1  Real  Coefficients of the MA polynomial 
Parameter  Min  Max  Nature  Meaning 

10^{−4} × C  1  35  Real  Scalar constant in the linear timeseries model as defined in Equation (5) 
p,q  1  5  Integer  Degrees of the compound AR and the compound moving average (MA) polynomials 
a_{1}, …, a_{p}  −1  1  Real  Coefficients of the AR polynomial 
b_{1}, …, b_{q}  −1  1  Real  Coefficients of the MA polynomial 
Since we have no idea about the magnitude of the ARIMA model coefficients, the objective function f in Equation (12) has been minimized by the ‘fminunc’ (unconstrained minimization function of the optimization toolbox in Matlab) with different initial guesses (because it is a local gradientbased minimization function). Values of C have been found to range from approximately 10,000 to 350,000 and the coefficients a_{1}, a_{2}, …, a_{p}, b_{1},b_{2}, …, b_{q} are found to be between −1 and 1.
Fitness function evaluation
PSO–ARIMA(p,d,q) procedure for modeling monthly MWD
The PSO–ARIMA(p,d,q) approach for modeling monthly MWD in Hail region is described as follows.
Set the PSO synthesis parameters as follows: w_{max} = 0.95, w_{min} = 0.35, nb_part = 20, c_{1} = c_{2} = 0.75, maxIter_nb = 100.
Load the historical records of the monthly MWD in Hail region from an excel file (y(t), t = 1:69).
Generate a whitenoise signal (e(t), t = 1:69).
Set the iteration index to k = 1.
Initialize the swarm particles randomly within the searchspace limits as defined in Table 5.
For each particle, compute the fitness function as defined in Equation (12) using the N_{1} = 40 records.
Determine the index of the best particle among the swarm and set to the position of this particle.
Set the personal best of each particle to its current position.
Increment the iteration index k = k + 1.
Update the position of each particle according to Equations (8) and (9).
As are integer variables, round them off to the nearest integer values over {1, 2, 3, 4, 5}.
If one of the realcoded variables comes out of the searchspace limits defined in Table 5, confine it by resetting within the searchspace.
Compute the fitness function for each particle according to Equation (12). Compare the current fitness of each particle to its previous personal best fitness. If there is improvement, then set .
Determine the index of the best particle in the swarm and update accordingly.
Test the stopping criterion, if not reached loop to Step 9. If reached, record as the best solution.
Adaptation procedure for the real and integer decision variables
In this subsection, more emphasis on steps (10) and (11) of the PSOARIMA procedure is put forward to clarify the adaptation procedure used to simultaneously handle the real and integer variables of the studied forecasting problem. By reference to Equation (11) for the position vector and to Table 5 for the meaning and limits of the position components, the following example is considered. Assume that at the kth iteration, the decision variables vector of the ith particle is given by X_{k}^{i} = [29 × 10^{4}, 2.11, 0.7, −0.8, 3.2, −1.7, 3.11, 3.92, 0.05, −0.2, 0.17, 4.5 2.73] (this vector is the result of step (10)). Since p and q are integer variables, the second and the eighth related values are roundedoff to the nearest integer numbers. p = roundoff (2.11) = 2 and q = roundoff (3.92) = 4. Consequently, a_{3} = a_{4} = a_{5} = 0 and b_{5} = 0. As a result of the adaptation procedure (output of step (11)), the ith particle position vector is then given by X_{k}^{i} = [29 × 10^{4}, 2, 0.7, −0.8, 0, 0, 0, 4, 0.05, −0.2, 0.17, 4.5, 0].
Numerical results
Performance indicators  

Model  MAPE (%)  RMSE (m^{3})  R^{2}  AARE (%)  
Phase  Training + testing  Training  Testing  Training + testing  Training  Testing  Training + testing  Training  Testing  Training + testing  Training  Testing 
BPFFANN(3)  6.5671  6.4609  6.4977  2.5735 × 10^{5}  2.2320 × 10^{5}  2.3611 × 10^{5}  0.9115  0.9189  0.6346  6.3571  6.2218  7.2207 
STSAR(3)  5.4007  5.4672  5.2009  2.3119 × 10^{5}  2.3232 × 10^{5}  2.3707 × 10^{5}  0.9201  0,5125  0.8582  5.3800  5.3570  5.4473 
STSARIMA(1,0,1)  7.0535  7.2234  8.2817  2.8597 × 10^{5}  2.7733 × 10^{5}  3.2787 × 10^{5}  0.8962  0.8736  0.7036  8.2236  8.3270  7.1179 
STSARIMA(1,1,1)  6.1491  6.0917  7.4320  2.5528 × 10^{5}  2.4678 × 10^{5}  2.7965 × 10^{5}  0.9119  0.8233  0.7958  7.3255  7.8917  7.3576 
PSOARIMA  5.2832  4.8236  5.3217  2.2111 × 10^{5}  1.8932 × 10^{5}  2.2367 × 10^{5}  0.9375  0.9452  0.8289  5.2911  5.0733  5.3319 
Performance indicators  

Model  MAPE (%)  RMSE (m^{3})  R^{2}  AARE (%)  
Phase  Training + testing  Training  Testing  Training + testing  Training  Testing  Training + testing  Training  Testing  Training + testing  Training  Testing 
BPFFANN(3)  6.5671  6.4609  6.4977  2.5735 × 10^{5}  2.2320 × 10^{5}  2.3611 × 10^{5}  0.9115  0.9189  0.6346  6.3571  6.2218  7.2207 
STSAR(3)  5.4007  5.4672  5.2009  2.3119 × 10^{5}  2.3232 × 10^{5}  2.3707 × 10^{5}  0.9201  0,5125  0.8582  5.3800  5.3570  5.4473 
STSARIMA(1,0,1)  7.0535  7.2234  8.2817  2.8597 × 10^{5}  2.7733 × 10^{5}  3.2787 × 10^{5}  0.8962  0.8736  0.7036  8.2236  8.3270  7.1179 
STSARIMA(1,1,1)  6.1491  6.0917  7.4320  2.5528 × 10^{5}  2.4678 × 10^{5}  2.7965 × 10^{5}  0.9119  0.8233  0.7958  7.3255  7.8917  7.3576 
PSOARIMA  5.2832  4.8236  5.3217  2.2111 × 10^{5}  1.8932 × 10^{5}  2.2367 × 10^{5}  0.9375  0.9452  0.8289  5.2911  5.0733  5.3319 
The performance of both training and validation data sets considered together as well as the training and validation data sets considered separately are summarized in Table 6. The results of the best BPFFANN(3), AR(3), STSARIMA(1,0,1) using the original records and the STSARIMA(1,1,1) using the stationarized records are included in the same table for comparison.
As shown in Figure 13, the behavior of the whole swarm along with the iterations is shown respectively for iteration 1, 20, 50 and 100. The proposed PSObased approach drives the swarm to a steady state (for the optimal a_{1} and b_{1}). The remaining optimal parameters behave similarly.
Comparative analysis and concluding remarks
According to the obtained results, it has been shown that the used PSO synthesis parameters are effective for modeling the monthly MWD in Hail region. The set of parameters have been used successfully for other engineering problems such as those presented in Zhang (2003), Huang et al. (2005) and ElTelbany & ElKarmi (2008). The unique exception concerns with c_{1} and c_{2}. In fact, c_{1} = c_{2} = 2.05 (as in the cited papers) conduct to a severe oscillating behavior. In this study they have been set to 0.75. Moreover, the PSObased approach has the advantage of simultaneously handling realcoded and integercoded decision variables. Thus the identification of the AR part order and MA part order of the ARIMA model and the estimation of the polynomial coefficients are performed together in a unique framework. This can save time in design compared to STS and ANNbased approaches. The proposed PSOARIMA approach should be the preferred approach (over ANN and STS) due to its simplicity without needing data preprocessing (data rescaling in ANN and stationarization in STS). The comparative results in Table 6 reveal that the PSObased approach outperforms both ANN and STS approaches in terms of the four computed performance indicators (a MAPE of 5.2832%, an R^{2} of 0.9375, an RMSE of 2.2111 × 10^{5}m^{3} and an AARE of 5.2911%) for the global data sets and for the training and validation data sets considered separately. The testing phase is the one that is more representative of the actual performances of the model in a real application framework and the RMSE is the most significant performance indicator because it represents the objective function to be minimized. The value of the RMSE for the PSOARIMA in the testing phase (2.2111 × 10^{5} m^{3}) is better than the RMSE of the STSAR(3) in the same phase (2.3119 × 10^{5} m^{3}). By computing the relative difference between the two values, it has been found that the PSOARIMA is 4.36% better than the STSAR(3).
In terms of monthly MWD profile tracking, all the developed models present a whole behavior characterized by forecasted values that follow the actual values relatively closely.
CONCLUSIONS AND RECOMMENDATIONS
In this paper, an approach based on an ARIMA model combined to the PSO as an identification tool was developed and applied to the forecasting of onemonthahead municipal water demand in Hail region, Saudi Arabia. The modeling–identification problem included integervalued (model orders) and realvalued (model coefficients) decision variables. It was then considered as a hybrid complex optimization problem for which classical optimization techniques suffer from combinatoric explosion (for the discrete aspect) and high ability to be trapped into local minimum points (for the continuous aspect). The proposed approach has the advantage of optimizing integer and real variables simultaneously. Moreover, it has been shown to operate efficiently in complex searchspaces and converge to nearoptimal solution in a reasonable computation time. Since the PSO is a conceptually stochastic optimization algorithm, 300 runs have been carried out using 69 records of monthly water demand. From the results of the best PSOARIMA run reported in this paper, this approach has been shown to outperform BPFFANN, STSAR and STSARIMA based solutions in terms of four performance indicators namely the MAPE, R^{2}, RMSE and AARE for both training and testing data sets.
More than being relatively better in term of forecasting accuracy, the proposed PSOARIMA approach should be preferred for being insensitive to the forecasting problem irregularities including convexity and differentiability. In addition, the PSOARIMA does not require any data preprocessing (data rescaling in ANN and data stationarity in STS).
Forecasting monthly MWD is classified as mediumrange; it can participate to efficient operation and management of an existing water supply system. Hail water utilities can take benefit from the results of this study to develop efficient plans for optimized system operation and ensure balancing between water need and supply. The developed monthly MWD in Hail region are fundamental in taking decisions in water management issues. Establishing efficient pricing policies, planning new system developments and optimizing sizes and operation procedures of the supply system (pumps, reservoirs) are among these issues. Results obtained for monthly MWD by similar studies (Firat et al. 2009; Babel et al. 2014) demonstrated that using other explanatory variables (such as population statistics, water bill, subscribers’ number, climatic variables, etc.) can improve forecasting results accuracy.
To study the effect of climatic changes on monthly MWD, other ways can be explored. In particular, for Hail region, two periods (summer and winter) will be considered separately. If sufficient metering devices are available, the living levels in different cities of Hail town will be taken into consideration. A future issue will be based on forecasting climatic and socioeconomic variables first and after that use them to forecast water consumption.
ACKNOWLEDGEMENTS
This research project was granted by the Deanship of Scientific Research in Hail University, Saudi Arabia under the number E22 CC.