In this paper, a modeling-identification approach for the monthly municipal water demand system in Hail region, Saudi Arabia, is developed. This approach is based on an auto-regressive integrated moving average (ARIMA) model tuned by the particle swarm optimization (PSO). The ARIMA (p, d, q) modeling requires estimation of the integer orders p and q of the AR and MA parts; and the real coefficients of the model. More than being simple, easy to implement and effective, the PSO-ARIMA model does not require data pre-processing (original time-series normalization for artificial neural network (ANN) or data stationarization for traditional stochastic time-series (STS)). Moreover, its performance indicators such as the mean absolute percentage error (MAPE), coefficient of determination (R2), root mean squared error (RMSE) and average absolute relative error (AARE) are compared with those of ANN and STS. The obtained results show that the PSO-ARIMA outperforms the ANN and STS approaches since it can optimize simultaneously integer and real parameters and provides better accuracy in terms of MAPE (5.2832%), R2 (0.9375), RMSE (2.2111 × 105m3) and AARE (5.2911%). The PSO-ARIMA model has been implemented using 69 records (for both training and testing). The results can help local water decision makers to better manage the current water resources and to plan extensions in response to the increasing need.

In arid climate countries such as Saudi Arabia, reliable municipal water demand modeling is crucial in both the medium and long term. In fact, water demand is steeply rising and consequently water utilities are invited to make suitable decisions in order to ensure good planning and management of scarce and limited water resources. Operational decisions are in general based on both the present requirements and the expected demands. Water demand can be viewed as a dynamic system and requires mathematically modeling. The developed models can be useful for many purposes such as scheduling, diagnosis, management, control and forecasting. In this paper, monthly municipal water demand (MMWD) is posed as an identification problem and an optimization procedure has been established to solve it. The outlined model is used for forecasting purposes.

This study is concerned with developing MMWD models for the case study of Hail, Saudi Arabia, taking into account historical records of water consumption as unique explanatory variables. A wide variety of models and methods have been used by researchers and water forecasting practitioners. Water demand forecasting problem solving techniques can roughly be categorized into three classes: (1) time-series stochastic methods (e.g. Zhou et al. 2000; Mohamed & Al-Mualla 2010; Almutaz et al. 2013; Ngo & Bros 2013; Mehr & Jha 2013; McNown et al. 2015; Vantuch & Zelinka 2015); (2) artificial neural networks (ANNs) (e.g. Ajbar & Ali 2013; Babel et al. 2014); and (3) swarm intelligence optimization techniques (Chau 2004; Huang et al. 2005; Sun et al. 2012; Aladag et al. 2013; Nazari et al. 2015). According to the forecasting horizon, small-scales (hourly, daily and weekly) approaches have been investigated in, for example, Zhang 2003; Boughadis et al. 2005; Adamowski & Karapataki 2010; Herrera et al. 2010; Adamowski et al. 2012; Adebiyi et al. 2014; Al-Zahrani 2014; Perea et al. 2015; and Tiwari & Adamowski 2015. For long-scales (annual) forecasting horizon, efficient frameworks have been developed and their relative performances measured (e.g. Zhang 2003; Khashei & Bijari 2010; Sun et al. 2012; Almutaz et al. 2013; Babel et al. 2014; Donkor et al. 2014; Nazari et al. 2015; Ouda 2014; Yalcintas et al. 2015). Since this study is concerned only with monthly water demand forecasting (e.g. Billings & Agthe 1998; Firat et al. 2009; Jalalkamali et al. 2011; Nasseri et al. 2011; Sudheer et al. 2011; Yasar et al. 2012; Ajbar & Ali 2013; Aladag et al. 2013; Almutaz et al. 2013; Mehr & Jha 2013; Babel et al. 2014; Akbari-Alashti et al. 2015; Beheshti et al. 2015; Bai et al. 2016), this paper presents and analyses a relatively recent state-of-the-art. Particular attention is allocated to models (mathematical descriptions of the problem), methods (used to solve the problem) and results (obtained). The study area, the performance measures and the case study particular aspects are mentioned if available. Inspired from Donkor et al. (2014), an annotated reference list of selected papers related to monthly water forecasting problems is included in Table 1.

Table 1

Annotated reference of selected papers on monthly water demand forecasting

ReferenceModel/ purpose/methodLocationExplanatory variablesResults
Donkor et al. (2014)  Literature review on urban water demand from 2000 to 2010 – – Wide variety of models and methods have been used. Applications differ depending on forecast variable, periodicity and case-sensitive 
Yasar et al. (2012)  Application of nonlinear regression (NLR) for forecasting monthly water demand Adana city, Turkey 
  • - Average monthly water bill

  • - Total subscribership

  • - Temperature

  • - Humidity

  • - Rainfall

  • - Solar radiation

  • - Sunshine duration

  • - Wind speed

  • - Atmospheric pressure

 
Water consumption in Adana city will increase from 3.84 million m3 in 2009 to 4.99 million m3 in 2020. MWD is affected particularly by the total number of subscribers and the atmospheric temperature 
Babel et al. (2014)  Use of future climatic and socio-economic previously forecasted data for forecasting monthly water demand using ANNs Bangkok 
  • - Gross Provincial Product

  • - Population

  • - Number of household connections

  • - Evaporation

  • - Relative humidity

  • - Minimum temperature

  • - Maximum temperature

 
  • - Good prediction accuracy (AARE of 4.76%)

  • - Maximum temperature and evaporation affect the MWD

 
Firat et al. (2009)  Evaluation and comparison of ANN techniques for MMWD Izmir, Turkey 
  • - Average monthly water bill

  • - Population

  • - Number of households

  • - Gross national product

  • - Monthly average temperature

  • - Monthly total rainfall

  • - Monthly average humidity

  • - Inflation rate

 
The Generalized Regression NN (GRNN) was found to be the best model when using the monthly water bill, the population and the average temperature as the most significant explanatory variables 
Ajbar & Ali (2013)  Use of ANN for predicting monthly municipal water consumption Mecca city, Saudi Arabia 
  • - Historic records

  • - Estimated visitors’ distribution

  • - Household income

  • - Persons per household

  • - City population

  • - Maximum monthly temperature

 
ANN approach is found to be better than the econometric model. Moreover, it particularly captures the effect of the number of visitors since Mecca is a holy town visited by people for religious reasons 
Nasseri et al. (2011)  Forecasting monthly urban water demand using extended Kalman filter (EKF) and genetic programming (GP) Tehran, Iran Historic records of monthly water consumption Using hybrid EKF and GP is suitable for calibrating water demand using historic records of the water consumption itself 
Baheshti et al. (2015)  Forecasting monthly rainfall using the centripetal accelerated particle swarm optimization (CAPSO) combined to ANN Johor state, Malaysia 
  • - Month number (1–12)

  • - Rainfall in the last month

  • - Rainfall of the current month in the last year

  • - Rainfall of the month in the last 2 years

  • - Average of the month rainfall in the last 5 years

  • - Average rainfall of the month during the last 10 years

 
The use of the hybrid model combining the CAPSO and the ANN increases the accuracy, in particular when the original data are pre-processed 
Buykyildiz et al. (2014)  Estimation of the change in lake water level by artificial intelligence methods including PSO-ANN, SVR, MLP, RBNN and ANFIS Lake Beysehir, Turkey 
  • - Monthly inflow

  • - Lost flow

  • - Precipitation

  • - Evaporation

  • - Outflow

 
SVR model is found to be the best model when compared to other methods. It provides results with a coefficient of determination R2 of 0.9988 
Sudheer et al. (2011)  Use of a quantum behaved PSO to determine the SVM coefficients in order to estimate groundwater level R Rentashintala, India Historic records SVM-QPSO is better than ANN in terms of RMSE (0.43), R2 (0.94) and efficiency (0.84) 
Mehr & Jha (2013)  Analysis of monthly rainfall data using time-series based approaches: ARIMA and forecasting for the next 12 years for different district towns Mahanadi River Basin, India Historic records The ARIMA(1,0,0)(0,1,1)12 is found to be the most suitable for forecasting monthly rainfall for the upcoming 12 years in 38 towns 
Billings & Agthe (1998)  Use of state-space approach versus multiple-regression for forecasting monthly urban water demand USA 
  • - Average monthly water purchased per household

  • - Average monthly temperature

  • - Monthly precipitation

  • - Marginal price of water

  • - Block rate subsidy

  • - Real income per capita

 
Mean absolute error of forecast ranges from 7.4 to 14.8%. State-space is found to be better than the multiple-regression model on the studied water demand set 
Akbari-Alashti et al. (2015)  Evaluation of developed discrete time-series in flow forecasting models: ARMAX, ANN, GP East Azerbaijan province, Iran 
  • - Precipitation

  • - Previous runoffs

 
GP is found to be the more precise model with R2 = 0.7 and RMSE = 0.172 
Jalalkamali et al. (2011)  Forecasting groundwater level in a well using neuro-fuzzy and ANN Kerman plain, Iran 
  • - Air temperature

  • - Rainfall

  • - Groundwater level

 
NF and ANN techniques are good choices for predicting groundwater level 
Wang et al. (2016)  Three hybrid stochastic models have been used for predicting available water resources, water demand and water use Urimqi, China 
  • - Historical records

 
The results are assumed by the authors to help managers and policy-makers to have a clear understanding of regional water supply and demand trend as well as the water utilization structure in the future 
Bai et al. (2016)  Grey relational analysis model is used for forecasting monthly reservoir inflow Anhui, China 
  • - Historical monthly data

 
The proposed approach is more accurate than traditional techniques 
ReferenceModel/ purpose/methodLocationExplanatory variablesResults
Donkor et al. (2014)  Literature review on urban water demand from 2000 to 2010 – – Wide variety of models and methods have been used. Applications differ depending on forecast variable, periodicity and case-sensitive 
Yasar et al. (2012)  Application of nonlinear regression (NLR) for forecasting monthly water demand Adana city, Turkey 
  • - Average monthly water bill

  • - Total subscribership

  • - Temperature

  • - Humidity

  • - Rainfall

  • - Solar radiation

  • - Sunshine duration

  • - Wind speed

  • - Atmospheric pressure

 
Water consumption in Adana city will increase from 3.84 million m3 in 2009 to 4.99 million m3 in 2020. MWD is affected particularly by the total number of subscribers and the atmospheric temperature 
Babel et al. (2014)  Use of future climatic and socio-economic previously forecasted data for forecasting monthly water demand using ANNs Bangkok 
  • - Gross Provincial Product

  • - Population

  • - Number of household connections

  • - Evaporation

  • - Relative humidity

  • - Minimum temperature

  • - Maximum temperature

 
  • - Good prediction accuracy (AARE of 4.76%)

  • - Maximum temperature and evaporation affect the MWD

 
Firat et al. (2009)  Evaluation and comparison of ANN techniques for MMWD Izmir, Turkey 
  • - Average monthly water bill

  • - Population

  • - Number of households

  • - Gross national product

  • - Monthly average temperature

  • - Monthly total rainfall

  • - Monthly average humidity

  • - Inflation rate

 
The Generalized Regression NN (GRNN) was found to be the best model when using the monthly water bill, the population and the average temperature as the most significant explanatory variables 
Ajbar & Ali (2013)  Use of ANN for predicting monthly municipal water consumption Mecca city, Saudi Arabia 
  • - Historic records

  • - Estimated visitors’ distribution

  • - Household income

  • - Persons per household

  • - City population

  • - Maximum monthly temperature

 
ANN approach is found to be better than the econometric model. Moreover, it particularly captures the effect of the number of visitors since Mecca is a holy town visited by people for religious reasons 
Nasseri et al. (2011)  Forecasting monthly urban water demand using extended Kalman filter (EKF) and genetic programming (GP) Tehran, Iran Historic records of monthly water consumption Using hybrid EKF and GP is suitable for calibrating water demand using historic records of the water consumption itself 
Baheshti et al. (2015)  Forecasting monthly rainfall using the centripetal accelerated particle swarm optimization (CAPSO) combined to ANN Johor state, Malaysia 
  • - Month number (1–12)

  • - Rainfall in the last month

  • - Rainfall of the current month in the last year

  • - Rainfall of the month in the last 2 years

  • - Average of the month rainfall in the last 5 years

  • - Average rainfall of the month during the last 10 years

 
The use of the hybrid model combining the CAPSO and the ANN increases the accuracy, in particular when the original data are pre-processed 
Buykyildiz et al. (2014)  Estimation of the change in lake water level by artificial intelligence methods including PSO-ANN, SVR, MLP, RBNN and ANFIS Lake Beysehir, Turkey 
  • - Monthly inflow

  • - Lost flow

  • - Precipitation

  • - Evaporation

  • - Outflow

 
SVR model is found to be the best model when compared to other methods. It provides results with a coefficient of determination R2 of 0.9988 
Sudheer et al. (2011)  Use of a quantum behaved PSO to determine the SVM coefficients in order to estimate groundwater level R Rentashintala, India Historic records SVM-QPSO is better than ANN in terms of RMSE (0.43), R2 (0.94) and efficiency (0.84) 
Mehr & Jha (2013)  Analysis of monthly rainfall data using time-series based approaches: ARIMA and forecasting for the next 12 years for different district towns Mahanadi River Basin, India Historic records The ARIMA(1,0,0)(0,1,1)12 is found to be the most suitable for forecasting monthly rainfall for the upcoming 12 years in 38 towns 
Billings & Agthe (1998)  Use of state-space approach versus multiple-regression for forecasting monthly urban water demand USA 
  • - Average monthly water purchased per household

  • - Average monthly temperature

  • - Monthly precipitation

  • - Marginal price of water

  • - Block rate subsidy

  • - Real income per capita

 
Mean absolute error of forecast ranges from 7.4 to 14.8%. State-space is found to be better than the multiple-regression model on the studied water demand set 
Akbari-Alashti et al. (2015)  Evaluation of developed discrete time-series in flow forecasting models: ARMAX, ANN, GP East Azerbaijan province, Iran 
  • - Precipitation

  • - Previous runoffs

 
GP is found to be the more precise model with R2 = 0.7 and RMSE = 0.172 
Jalalkamali et al. (2011)  Forecasting groundwater level in a well using neuro-fuzzy and ANN Kerman plain, Iran 
  • - Air temperature

  • - Rainfall

  • - Groundwater level

 
NF and ANN techniques are good choices for predicting groundwater level 
Wang et al. (2016)  Three hybrid stochastic models have been used for predicting available water resources, water demand and water use Urimqi, China 
  • - Historical records

 
The results are assumed by the authors to help managers and policy-makers to have a clear understanding of regional water supply and demand trend as well as the water utilization structure in the future 
Bai et al. (2016)  Grey relational analysis model is used for forecasting monthly reservoir inflow Anhui, China 
  • - Historical monthly data

 
The proposed approach is more accurate than traditional techniques 

It should be concluded from Table 1 that the forecasting obtained results are case-sensitive. In fact, the same method does not provide the same results when applied to different case studies. Moreover, the ANN technique seems to be the most used alone or combined to other techniques such as particle swarm optimization (PSO).

In this study, the MMWD forecasting problem is posed as an identification framework where the error between the observed (real) and the forecasted (predicted) water demands is to be minimized. From an optimization point of view, tuning the parameters of an ARIMA(p,d,q) model is known to be hybrid since it involves mixed integer-real decision variables. In fact, the presence of integer variables (the orders of the AR and the MA parts) can lead to a combinatorial explosion, whereas the classical auto-regressive integrated moving average (ARIMA) model tuning (with known orders) can present a non-convex aspect and thus classical solving algorithms can be easily trapped into local minimum solutions.

The present paper proposes a new approach, referred to as PSO-ARIMA, by mapping the ARIMA(p,d,q) parameters to the particle position components in the PSO paradigm. The decision variables are the integer-valued orders of the AR and the MA parts and the real-valued coefficients of the model. The proposed PSO algorithm is new since it handles simultaneously discrete and continuous decision variables. Moreover, the search-space limits are defined such that the stability of the obtained model is ensured. The proposed algorithm has the capacity of operating in complex search-spaces with global optimization ability compared to gradient-descent algorithms known to be local search methods. The relative performance measures of the proposed approach are first evaluated and then compared with those of ANN and traditional stochastic time-series (STS) approaches. Based on 69 records of MMWD collected from Water Directorate of Hail Region, first an explanatory analysis including the stationarity test of the time-series was conducted through the so-called sample auto-correlation function (ACF) study. Then, diverse ANN- based approaches and different STS-approaches such as AR and ARIMA were used to model the MMWD. Finally, the progressive development of the PSO-ARIMA approach was detailed. The PSO technique demonstrates high ability to provide mixed integer-real near-optimal solution for the ARIMA(p,d,q) forecasting model. By comparing the statistical performance indicators, the PSO-ARIMA is found to be the most parsimonious in the sense that it allows the identification of both the orders and the coefficients simultaneously. Moreover, it is revealed to be insensitive to forecasting problem irregularities including linearity, convexity and stationarity. Other features of the proposed PSO-ARIMA approach include the facts that it is easy to understand, simple to program (requires few lines of computer code) and that it operates in a straightforward manner on the original time-series without needing any pre-processing.

The remainder of the paper is organized as follows: in the following section, the problem of MMWD modeling and identification is formulated; followed by a section briefly discussing three tools for solving the problem, namely, ANN, STS and PSO. In the penultimate section, the practical implementation of the presented methodologies is detailed. Special emphasis is allocated to the PSO-ARIMA approach. Concluding remarks and recommendations are summarized in the final section.

In this study, the MMWD is viewed as a dynamic system. Thus, an experimental analysis, referred to as identification, should be performed. A mathematical model is derived from measurements (observations in the case of forecasting problems). For such a purpose, the system inputs and outputs are gathered and a relationship between them has to be performed. According to basic system theory concepts, the dynamic system models are of two types: parametric and non-parametric. The parametric models are considered in this study. The structures of the developed models are known in advance and their synthesis parameters are to be tuned optimally such that an error function is minimized.

Given N observations of the MMWD, y(t), t = 1:N; The set of observations is divided into two sub-sets, model development and model validation. The first sub-set is composed of N1 records and the second one is composed of the remaining N2 observations. The block diagram of the modeling-identification procedure is shown in Figure 1.
Figure 1

Block diagram of the modeling-identification procedure for the MMWD dynamic system.

Figure 1

Block diagram of the modeling-identification procedure for the MMWD dynamic system.

Close modal

The MMWD model is a mathematical description assumed to produce similar behavior as the MMWD dynamic real system. The p previous values (obtained via the delay operator q–1) of the system output, y(t), are fed respectively to the model and to the system itself. The estimated value, , the observed value, y(t) as well as their difference, , are introduced to the model identifier block (Bidwell 2005; Aitmaatallah et al. 2015). This block is intended to estimate the unknown model parameters by minimizing the least squared error function between the system and its model respective outputs.

In this study, the model identifier has been implemented using respectively ANN, AR, ARIMA and PSO-ARIMA approaches. The following section will describe the theoretical and practical aspects of these methodologies.

The good fit of the forecasted values is measured via the mean absolute percentage error (MAPE), the coefficient of determination (R2), the root mean squared error (RMSE) and the average absolute relative error (AARE) defined as follows:
1
2
3
4
where is the forecasted water demand, yt is the tth actual water demand, is the mean water demand and N is the number of observations.

Time-series

A time-series of municipal water demand is defined as a set of values that occur over a period of time in a certain pattern. Under the assumption that the time-series behavior (pattern) will persist in the future, this approach has been widely used in forecasting water consumption (Mohamed & Al-Mualla 2010; Donkor et al. 2014). A strong tool used for this purpose is the Box-Jenkins methodology (BJM; Araghienajad 2014). Before identifying the model structure and its parameters, a preliminary exploratory study has to be conducted. As the Box–Jenkins methodology is appropriate for stationary time-series of medium to long length, the first step in using this approach is to measure the autocorrelation between successive values. From the so-called sample ACF, it should be easy to define the model order. Univariate ARIMA(p,d,q) model has been demonstrated through various researches to be useful for analysis and forecasting of municipal water demand (Bidwell 2005; Herrera et al. 2010; Herrera et al. 2011; Vafeiadis et al. 2012). If y(t) is the stationary MWD at time t, then its ARIMA model is defined as follows:
5
where C is a constant value, are the coefficients of the autoregressive (AR) part and p is its order; are the coefficients of the moving average part and q is its order. e(t) is a white noise sequence following normal distribution with 0 as mean and 1 as standard deviation (Herrera et al. 2011). In this study, some experiments have been conducted using only the AR component of the ARIMA model. Once the stationarity of the TS is studied, differencing the original TS will be required. The level of differencing is in general determined by examining the ACF plots. If the ACF of the time-series values either cuts off or dies down quickly, the TS can be declared as stationary until this level of differencing. Otherwise, the process of differencing has to be continued until obtaining the stationarity (Ngo & Bros 2013). Once the values of p, d and q are determined, the coefficients , have to be estimated. As proposed in the literature (Araghienajad 2014), the maximum likelihood estimation process as outlined by Box and Jenkins is used in order to estimate the last coefficients. The final step in Box and Jenkins methodology is the diagnostic checking of the model by studying the autocorrelation plots of the residuals. If both the ACFs and the partial auto-correlation function are small, then the model is declared to be adequate, otherwise, the process is repeated by re-adjusting the values of p and q (BJM; Tutunji et al. 2007).

ANNs

The ANN has been widely used as a popular tool for MWD forecasting (e.g. Ajbar & Ali 2013; Babel et al. 2014). It was in some cases used as a unique framework, but in the majority of cases it has been coupled with other techniques, such as wavelet-transform and PSO. ANNs are defined as universal and highly flexible approximators for diverse time-series going from the fields of cognitive science to engineering. The concept of ANN is an attempt to imitate the human brain functioning (Babel et al. 2014). A network of artificial neurons is organized in layers. A collection of inputs is introduced to the process's units (neurons). In each layer, the neurons are fully interconnected via weights. The weighted inputs are then introduced to a transfer function, which converts them to an output. Choosing the ANN architecture, determining the number of neurons and choosing the appropriate activation functions are the most important steps in designing an ANN. Distinct architectures, learning methods and performance can be found elsewhere (e.g. Talib et al. 2008; Firat et al. 2009; Adamowski & Karapataki 2010). In this study, ANNs with different structures have been used in monthly MWD forecasting. Details for this aim will be provided later.

PSO

The PSO algorithm was first introduced by Kennedy and Eberhart in 1995. It is categorized as one of the population based swarm intelligence techniques such as Genetic Algorithms (GA), Ant Colony (AC), etc. Parameters tuning of the algorithm, its variants and its applications can be found elsewhere (e.g. Clerc & Kennedy 2002; Delvalle 2008; Wang et al. 2008; Boubaker et al. 2014; Buyukyildiz et al. 2014). PSO is preferred in many applications for its simplicity, its universal aspect and the fact that it does not require any regularity of the studied problem (continuity, differentiability and convexity). Although it has been used in forecasting electric load (e.g. Huang et al. 2005; Nazari et al. 2015; Saravanan et al. 2015), its use in water demand forecasting remains limited.

In the present study, we intend to take benefit from the features of the PSO to forecast the MWD in Hail region. MWD forecasting is posed as (or transformed into) an optimization problem. This problem is defined as in Equation (6):
6
PSO is based on the concept of gradually evolving a swarm of possible solutions for an optimization problem (Trelea 2003). The potential solution is coded as the position of the ith particle in the d-dimensional search space:
7
Let be the decision variables and k the iteration index of the optimization problem (6). During the search process, each particle in the swarm adjusts its position through a velocity operator defined as:
8
where:
  • are two constant factors, named cognitive and social coefficients

  • are random numbers in the [0,1] range

  • is the best previous position of the ith particle

  • is the best previous position of the whole swarm

  • is a weighting function, known to ensure balance between search-space exploration and exploitation (Boubaker et al. 2014)

  • are respectively the maximum and the minimum value of wk.

The position vector of the ith particle is then given by Equation (9):
9

From the expertise about the MWD forecasting problem, one can define the search-space limits. Components having values out of their limits are reset through a confinement mechanism. The first step in using PSO to solve the problem (6) is to adopt a suitable mapping between the particle position in the swarm evolution concept and the vector of the MWD forecasting problem synthesis parameters. Note that for this problem, the decision variables set is case-sensitive and that it depends on the nature of the available data, the forecasting horizon and the adopted model structure. More details will be provided when dealing with the progressive development of the chosen models based on PSO.

MMWD is in general known to be affected by socioeconomic variables, such as monthly water bill, population, number of subscribers, gross national product, etc. (Firat et al. 2009), and by climatic variables such as precipitation accumulation, relative humidity, air temperature, etc. Moreover, the historical values of the monthly water consumption are among the explanatory variables that affect the current consumption. Since the socioeconomic and the climatic variables are non-available, we are limited in this study to analyzing the monthly MWD as a function of only its previous values. AR, ARIMA, ANN and PSO-based approaches will be used in forecasting. The first step consists on studying the stationarity of a TS composed of 69 records of monthly MWD (from January 2010 to September 2015) collected from the General Directorate of Water in Hail Region. As proposed by Box–Jenkins methodology, determining whether the monthly MWDTS is stationary (data set having a constant mean and variance) (Boughadis et al. 2005) is required for modeling STS forecasting using ARIMA models. The ANN models require a pre-processing of the original records in order to normalize them in the interval [0,1]. This process will be discussed in detail when dealing with the ANN approaches. Through a suitable mapping between PSO and the monthly MWD forecasting model, an algorithm to determine both the ARIMA model parameters and orders will be detailed. The simulations carried out in this study are performed using Matlab 2013b package and implemented on an Intel® Core™ i3-3110 microprocessor with a clock of 2 GHz as frequency and 4 GB of RAM. The relative performances of the proposed PSO-ARIMA approach were evaluated using the performance indicators introduced respectively in Equations (1)–(4). They are then compared with those of the AR, ARIMA and ANN-based approaches.

Study area

Hail region (Saudi Arabia) is located in a semi-arid area (coordinates 27°31′N, 41°41′E) (Figure 2). It has a continental desert climate characterized by hot summers and cold winters. Its climate is also known to be milder than other Saudi cities due to its higher altitude (HAILWIKI). This region has a population of approximately 593,308 as per the population and Household Census 2010's results. Rapid urbanization causing intensive use of water resources leads to chronic water supply problems (Ouda et al. 2014). The demand for municipal water is steeply rising in Hail region (probably as in other Saudi regions) (Chowdhury & Al-Zahrani 2013). The General Water Directorate of Hail region is responsible for the production and distribution of municipal water. Hail region is fed from two principal treatment-pumping stations (Figure 3). Al-Shuqaiq Field is located 25 km from Hail city and produces around 70,000 m3/day of water to feed Hail city (30,000 m3 for approximately 322,000 people). The remaining quantity is fed to about 500 villages in Hail region. Al-Shuqaiq field is composed of 43 wells, a collection and purification station and pumps to send water to Al-Salf reservoir (120,000 m3) and Bzakha reservoir (100,000 m3). Al-Shuqaiq field has been chosen for the low levels of salts in the water and for being the most rich water location in the region. Bzakha was chosen for being the reservoir location because of its high altitude (1,275 m from the sea level). Al Humaima station feeds the old distribution system in Hail city. Around 40,000 m3/day are fed to, respectively, Al-Samra mountain reservoir (20,000 m3) and Uqda reservoir (20,000 m3). Hail is located at an altitude of 900 m above sea level, so water is distributed without pumping (gravity).
Figure 2

Geographic location of Hail region, Saudi Arabia.

Figure 2

Geographic location of Hail region, Saudi Arabia.

Close modal
Figure 3

Municipal water distribution sytem in Hail Region, Saudi Arabia.

Figure 3

Municipal water distribution sytem in Hail Region, Saudi Arabia.

Close modal

Hail city districts are fed with municipal water using valves operated manually according to a weekly fixed schedule (GDWHR 2004).

Evaluation of ANN techniques for monthly MWD modeling

In this section, various ANN techniques such as feed-forward (FF) and radial-basis have been evaluated via numerous performance indicators in modeling future MMWD in Hail region (Leva et al. 2015). The first step in using an ANN-based approach is to optimize the architecture of the ANN such that the relationship between the inputs and the outputs is well captured. For this aim, a number of tests have been conducted using different ANN architectures, activation functions and training algorithms (Adamowski et al. 2012). The inputs of the ANNs considered here are the MMWD of the previous months. The data set is divided into two sub-sets, training and testing. For all simulations, the first N1 = 40 records are used for training and the remaining N2 = 29 records are used for model validation. Choosing the ANN architecture is the most consuming time step when using ANNs in prediction. The best architecture is decided here by trial and error procedure. After several trials, the best ANN structure is found to consist of an input layer, two hidden layers and one output layer. The optimal numbers of neurons are found to be, respectively, 60 (for the input layer), 16 (for each one of the two hidden layers) and 10 (for the output layer). Linear activation functions are used for the input and the output layers whereas logarithmic-sigmoid activation function is used for the two hidden layers. The BP-FF-ANN has been trained using the Levenberg–Marquardt, the resilient back-propagation and the gradient Powell–Beale algorithms. The last one is found to provide better results than the two others.

We define the order of the BP-FF-ANN as the number of inputs for comparison purpose against other techniques. The maximum number of previous monthly MWD used as inputs is set to six. So, six BF-FF-ANN algorithms are trained and tested. The obtained results are depicted in Table 2.

Table 2

Performance indicators (training + testing) of BP-FF-ANN models of MMWD in Hail region

ModelMAPE (%)R2RMSE (m3)AARE(%)
BP-FF-ANN (1) 10.6534 0.7426 4.3552 × 105 9.2345 
BP-FF-ANN (2) 11.8427 0.6629 4.9971 × 105 10.1552 
BP-FF-ANN (3) 6.5671 0.9115 2.5753 × 105 6.3571 
BP-FF-ANN (4) 11.3592 0.6691 4.7912 × 105 9.8243 
BP-FF-ANN (5) 12.9392 0.5581 5.7695 × 105 10.8166 
BP-FF-ANN (6) 17.1062 0.2380 7.6783 × 105 14.2835 
ModelMAPE (%)R2RMSE (m3)AARE(%)
BP-FF-ANN (1) 10.6534 0.7426 4.3552 × 105 9.2345 
BP-FF-ANN (2) 11.8427 0.6629 4.9971 × 105 10.1552 
BP-FF-ANN (3) 6.5671 0.9115 2.5753 × 105 6.3571 
BP-FF-ANN (4) 11.3592 0.6691 4.7912 × 105 9.8243 
BP-FF-ANN (5) 12.9392 0.5581 5.7695 × 105 10.8166 
BP-FF-ANN (6) 17.1062 0.2380 7.6783 × 105 14.2835 

All the models were developed in the same conditions (architecture and training parameters). The unique difference is the number of inputs. The BF-FF-ANN(3) was found to provide more accurate monthly MWD forecasts than the other similar approaches. The best model was a function of the MWD of the three previous months. In fact, it had the lowest value of the MAPE (6.5671%), the highest value of the coefficient of determination R2 (0.9115), and the lowest RMSE (2.5753 × 105m3) and AARE (6.3571%). It is observed that the model's performance indicators degrade when the model order increases. At the beginning, the ANNs were trained using the original monthly MWD. The results were very poor since the activation functions have to operate in the range [0, 1]. Consequently a scaling procedure was used. The original data were divided by 107. We can notice here that this transformation should not change the mean or the data distribution and that other normalization approaches can be used to re-scale the original data. The plot of the best BF-FF-ANN(3) predicted versus observed monthly MWD in Hail region is shown in Figure 4. It can be seen that this ANN-based approach presents high ability to track the consumption trend and seasonality.
Figure 4

Forecasted vs. observed monthly MWD (cubic meter) for Hail region obtained with BP-FF-ANN(3).

Figure 4

Forecasted vs. observed monthly MWD (cubic meter) for Hail region obtained with BP-FF-ANN(3).

Close modal

Exploratory analysis

As mentioned before, predicting a time-series based on a static analysis of a time set is conducted in general through the Box–Jenkins methodology. The condition of stationarity has to be fulfilled (Vantuch & Zelinka 2015). A time-series is said to be stationary if it does not present any trend or seasonality through time (Ngo & Bros 2013). The monthly MWD set (composed of 69 records) presents the autocorrelation function (ACF) depicted in Figure 5.
Figure 5

Autocorrelation function of the original time-series of monthly MWD.

Figure 5

Autocorrelation function of the original time-series of monthly MWD.

Close modal
It is shown that the ACF dies down extremely slowly, then it should be considered as non-stationary. The original TS is then differenced d times until a stationary set is obtained. Usually, one difference of the data set is sufficient. By examining the plot of the ACF function of the differenced TS (see Figure 6), one can observe that it cuts off fairly quickly. The parameter d in the ARIMA(p,d,q) is then set to 1. The size of the differenced TS becomes 68.
Figure 6

Autocorrelation function of the once-differenced time-series of monthly MWD.

Figure 6

Autocorrelation function of the once-differenced time-series of monthly MWD.

Close modal

AR models

The Econometrics toolbox of Matlab has been used as the modeling tool for the ARIMA STS studied in this paper (the AR model is considered as a particular case of ARIMA with q = 0). The selected and calibrated models are used for simulation and forecasting. The ‘arima’ function is used to specify the model and the ‘estimate’ function is used to estimate the model polynomial coefficients. These two functions assume that the model orders (p and q) are determined in advance. The ‘estimate’ function uses the maximum likelihood principle which finds the maximum probability density function corresponding to the optimal values of the model parameters.

The first experiment conducted here is to model the monthly MWD in Hail region using only the pure AR part of ARIMA(p,q,d). The current monthly MWD is defined as a function of time-lagged previous elements. Five different models, respectively of orders 1, 2, 3, 4 and 5, are tested and compared according to the MAPE, R2, RMSE, AARE, and AIC performance indicators. The results are summarized in Table 3 and the plots of the forecasted and the observed monthly MWD is depicted in Figure 7 (only the graph of the best AR model is presented here).
Table 3

Performance indicators (training + testing) of AR models for MMWD in Hail region

ModelMAPE (%)R2RMSE (m3)AARE (%)AIC
AR(1) 5.7864 0.9017 2.6728 × 105 5.7260 24.0842 
AR(2) 5.7585 0.9238 2.7115 × 105 5.4833 24.9739 
AR(3) 5.4007 0.9201 2. 31191 × 105 5.3800 24.0005 
AR(4) 5.5501 0.9203 2. 4240 × 105 5.6355 24.0661 
AR(5) 5.6534 0.9209 2. 1412 × 105 5.3555 24.1219 
ModelMAPE (%)R2RMSE (m3)AARE (%)AIC
AR(1) 5.7864 0.9017 2.6728 × 105 5.7260 24.0842 
AR(2) 5.7585 0.9238 2.7115 × 105 5.4833 24.9739 
AR(3) 5.4007 0.9201 2. 31191 × 105 5.3800 24.0005 
AR(4) 5.5501 0.9203 2. 4240 × 105 5.6355 24.0661 
AR(5) 5.6534 0.9209 2. 1412 × 105 5.3555 24.1219 
Figure 7

Hydrograph of the forecasted vs. observed monthly MWD (cubic meter) for Hail region obtained with STS – AR(3) model.

Figure 7

Hydrograph of the forecasted vs. observed monthly MWD (cubic meter) for Hail region obtained with STS – AR(3) model.

Close modal

From the results of Table 3, it can be observed that the AR(3) model performed better than the other models with the lowest MAPE (5.4007%), the highest R2(0.9201) and the lowest Akaike information criterion (AIC) (24.0005).

ARIMA(p,1,q) models

In order to construct the best ARIMA(p,1,q) model, the orders p and q need to be determined. Many combinations of the couple (p,q) have been tested and evaluated according to the four previously defined performance indicators. The one-step-ahead forecasting is considered here. Four models along with their coefficients and performance indicators for both training and testing data are shown in Table 4.

Table 4

ARIMA (p,1,q) results

ModelParameters
Performance indicators (training + testing)
Ca1a2a3b1MAPE (%)R2RMSE (m3)AARE (%)
ARIMA (1,1,1) 16287 0.6611 – – 0.3563 6.1491 0.9119 2.5508 × 105 5.4566 
ARIMA (2,1,1) 21846 −1.0202 −0.0202 – 0.9394 8.3298 0.7744 4.1149 × 105 9.1475 
ARIMA (3,1,1) 356,581 0.5744 0.3190 −0.2738 −1 7.9462 0.7124 4.6811 × 105 9.2233 
ARIMA (1,0,1) 346,677 0.8467 – – −0.2257 7.0535 0.8962 2.8557 × 105 7.5595 
ModelParameters
Performance indicators (training + testing)
Ca1a2a3b1MAPE (%)R2RMSE (m3)AARE (%)
ARIMA (1,1,1) 16287 0.6611 – – 0.3563 6.1491 0.9119 2.5508 × 105 5.4566 
ARIMA (2,1,1) 21846 −1.0202 −0.0202 – 0.9394 8.3298 0.7744 4.1149 × 105 9.1475 
ARIMA (3,1,1) 356,581 0.5744 0.3190 −0.2738 −1 7.9462 0.7124 4.6811 × 105 9.2233 
ARIMA (1,0,1) 346,677 0.8467 – – −0.2257 7.0535 0.8962 2.8557 × 105 7.5595 

It has been found that the ARIMA (1,1,1) is the most parsimonious model among all the tested models. Moreover, ARIMA(1,1,1) is judged to be adequate even when the non-stationary TS observations are used. Figure 8 illustrates the forecasted against the observed monthly MWD in Hail region obtained by ARIMA(1,1,1). This model is found to perform better than all the other models in terms of the performance indices (lowest MAPE, RMSE and AARE and highest R2). It can be concluded that although it is designed for linear data, the ARIMA model has the ability to capture non-linear behavior and produce relatively good fit for forecasting water demand. Intuitively, ARIMA(1,1,1) has the least number of parameters allowing it to acquire the minimum AIC. Consequently, it should be the best model according to the AIC performance indicator.
Figure 8

Forecasted vs. observed monthly MWD (cubic meter) for Hail region obtained with STS-ARIMA(1,1,1) model.

Figure 8

Forecasted vs. observed monthly MWD (cubic meter) for Hail region obtained with STS-ARIMA(1,1,1) model.

Close modal
Since the original time-series, y(t) is not stationary, it has been differentiated once. The new once-differenced time-series (Z(t) = y(t)−y(t−1)) is found to be stationary (see above under ‘Exploratory analysis’). Considering that Equation (5) operates only on stationary time-series, y(t) has to be replaced by Z(t). Thus, Equation (5) on Z(t) becomes Z(t) = C + a1Z(t−1) + e(t) + b1e(t−1). By substituting Z(t) = y(t)−y(t−1) and Z(t–1) = y(t−1)–y(t−2), the monthly MWD forecasting model is then given by the following equation:
10

PSO–ARIMA(p,d,q) models

Since MMWD in Hail region is derived from a stochastic process known to be highly non-linear and eventually presenting irregularities, the resulting modeling-identification problem is classified as hybrid and difficult to solve. This problem presents a high level of algorithmic complexity. In such conditions, we have to resort to the PSO technique to address the modeling aspect of future water consumptions underlying on the limited available data (Parsopoulos & Vrahatis 2002).

Swarm model definition

The first step in a PSO-based approach is to ensure the mapping between the PSO world and the problem to be solved. As it has been mentioned before, the one-step-ahead value of monthly MWD can be modeled using the ARIMA(p,d,q) Box–Jenkins methodology. The ARIMA(p,d,q) model defined in Equation (5) is adopted here and both the identification and estimation steps are performed by PSO. The position vector is then defined by Equation (11):
11

This encoding scheme presents the particularity that it contains, simultaneously, integer and real-coded variables. are real-coded variables, p and q are integer variables. To avoid unnecessary computations, the maximum values for p and q respectively are set to 5. Therefore, once p and q are known, all the coefficients between (respectively between) are set to 0. The search-space limits are defined in Table 5.

Table 5

Search-space limits

ParameterMinMaxNatureMeaning
10−4 × C 35 Real Scalar constant in the linear time-series model as defined in Equation (5) 
p,q Integer Degrees of the compound AR and the compound moving average (MA) polynomials 
a1, …, ap −1 Real Coefficients of the AR polynomial 
b1, …, bq −1 Real Coefficients of the MA polynomial 
ParameterMinMaxNatureMeaning
10−4 × C 35 Real Scalar constant in the linear time-series model as defined in Equation (5) 
p,q Integer Degrees of the compound AR and the compound moving average (MA) polynomials 
a1, …, ap −1 Real Coefficients of the AR polynomial 
b1, …, bq −1 Real Coefficients of the MA polynomial 

Since we have no idea about the magnitude of the ARIMA model coefficients, the objective function f in Equation (12) has been minimized by the ‘fminunc’ (unconstrained minimization function of the optimization toolbox in Matlab) with different initial guesses (because it is a local gradient-based minimization function). Values of C have been found to range from approximately 10,000 to 350,000 and the coefficients a1, a2, …, ap, b1,b2, …, bq are found to be between −1 and 1.

Fitness function evaluation

Let be the ith particle position, and then the quality of this particle is evaluated through its fitness function. The objective function adopted in this study is the squared error between the observed and the forecasted values for the monthly MWD defined in Equation (12) (Hu 2013):
12
One of the features of the PSO-based approach is that it does not require any computation of the objective function derivatives. The PSO concept is based in particular on the computation of the objective function itself.

PSO–ARIMA(p,d,q) procedure for modeling monthly MWD

The PSO–ARIMA(p,d,q) approach for modeling monthly MWD in Hail region is described as follows.

  1. Set the PSO synthesis parameters as follows: wmax = 0.95, wmin = 0.35, nb_part = 20, c1 = c2 = 0.75, maxIter_nb = 100.

  2. Load the historical records of the monthly MWD in Hail region from an excel file (y(t), t = 1:69).

  3. Generate a white-noise signal (e(t), t = 1:69).

  4. Set the iteration index to k = 1.

  5. Initialize the swarm particles randomly within the search-space limits as defined in Table 5.

  6. For each particle, compute the fitness function as defined in Equation (12) using the N1 = 40 records.

  7. Determine the index of the best particle among the swarm and set to the position of this particle.

  8. Set the personal best of each particle to its current position.

  9. Increment the iteration index k = k + 1.

  10. Update the position of each particle according to Equations (8) and (9).

  11. As are integer variables, round them off to the nearest integer values over {1, 2, 3, 4, 5}.

  12. If one of the real-coded variables comes out of the search-space limits defined in Table 5, confine it by resetting within the search-space.

  13. Compute the fitness function for each particle according to Equation (12). Compare the current fitness of each particle to its previous personal best fitness. If there is improvement, then set .

  14. Determine the index of the best particle in the swarm and update accordingly.

  15. Test the stopping criterion, if not reached loop to Step 9. If reached, record as the best solution.

Adaptation procedure for the real and integer decision variables

In this subsection, more emphasis on steps (10) and (11) of the PSO-ARIMA procedure is put forward to clarify the adaptation procedure used to simultaneously handle the real and integer variables of the studied forecasting problem. By reference to Equation (11) for the position vector and to Table 5 for the meaning and limits of the position components, the following example is considered. Assume that at the kth iteration, the decision variables vector of the ith particle is given by Xki = [29 × 104, 2.11, 0.7, −0.8, 3.2, −1.7, 3.11, 3.92, 0.05, −0.2, 0.17, 4.5 2.73] (this vector is the result of step (10)). Since p and q are integer variables, the second and the eighth related values are rounded-off to the nearest integer numbers. p = round-off (2.11) = 2 and q = round-off (3.92) = 4. Consequently, a3 = a4 = a5 = 0 and b5 = 0. As a result of the adaptation procedure (output of step (11)), the ith particle position vector is then given by Xki = [29 × 104, 2, 0.7, −0.8, 0, 0, 0, 4, 0.05, −0.2, 0.17, 4.5, 0].

Numerical results

The simulations were performed in a Matlab environment using our proper code. The data set was divided into a modeling sub-set (composed of 40 records) and a validation sub-set (composed of the remaining 29 records). The PSO algorithm was conceptually stochastic. For this reason, 300 runs were carried out and the results of the best run are depicted in Table 6. The corresponding best solution is given in Equation (13):
13
Since the optimal values obtained for p and q are equal to 1, all the coefficients are equal to 0. The best model is then expressed as in Equation (14):
14
Table 6

Comparison between the performance indicators of the best PSO-ARIMA, BP-FF-ANN(3) and STS-ARIMA approaches

 Performance indicators
ModelMAPE (%)
RMSE (m3)
R2
AARE (%)
PhaseTraining + testingTrainingTestingTraining + testingTrainingTestingTraining + testingTrainingTestingTraining + testingTrainingTesting
BP-FF-ANN(3) 6.5671 6.4609 6.4977 2.5735 × 105 2.2320 × 105 2.3611 × 105 0.9115 0.9189 0.6346 6.3571 6.2218 7.2207 
STS-AR(3) 5.4007 5.4672 5.2009 2.3119 × 105 2.3232 × 105 2.3707 × 105 0.9201 0,5125 0.8582 5.3800 5.3570 5.4473 
STS-ARIMA(1,0,1) 7.0535 7.2234 8.2817 2.8597 × 105 2.7733 × 105 3.2787 × 105 0.8962 0.8736 0.7036 8.2236 8.3270 7.1179 
STS-ARIMA(1,1,1) 6.1491 6.0917 7.4320 2.5528 × 105 2.4678 × 105 2.7965 × 105 0.9119 0.8233 0.7958 7.3255 7.8917 7.3576 
PSO-ARIMA 5.2832 4.8236 5.3217 2.2111 × 105 1.8932 × 105 2.2367 × 105 0.9375 0.9452 0.8289 5.2911 5.0733 5.3319 
 Performance indicators
ModelMAPE (%)
RMSE (m3)
R2
AARE (%)
PhaseTraining + testingTrainingTestingTraining + testingTrainingTestingTraining + testingTrainingTestingTraining + testingTrainingTesting
BP-FF-ANN(3) 6.5671 6.4609 6.4977 2.5735 × 105 2.2320 × 105 2.3611 × 105 0.9115 0.9189 0.6346 6.3571 6.2218 7.2207 
STS-AR(3) 5.4007 5.4672 5.2009 2.3119 × 105 2.3232 × 105 2.3707 × 105 0.9201 0,5125 0.8582 5.3800 5.3570 5.4473 
STS-ARIMA(1,0,1) 7.0535 7.2234 8.2817 2.8597 × 105 2.7733 × 105 3.2787 × 105 0.8962 0.8736 0.7036 8.2236 8.3270 7.1179 
STS-ARIMA(1,1,1) 6.1491 6.0917 7.4320 2.5528 × 105 2.4678 × 105 2.7965 × 105 0.9119 0.8233 0.7958 7.3255 7.8917 7.3576 
PSO-ARIMA 5.2832 4.8236 5.3217 2.2111 × 105 1.8932 × 105 2.2367 × 105 0.9375 0.9452 0.8289 5.2911 5.0733 5.3319 

The performance of both training and validation data sets considered together as well as the training and validation data sets considered separately are summarized in Table 6. The results of the best BP-FF-ANN(3), AR(3), STS-ARIMA(1,0,1) using the original records and the STS-ARIMA(1,1,1) using the stationarized records are included in the same table for comparison.

The graph of the actual and forecasted monthly MWD is depicted in Figure 9. The convergence characteristics of the optimal PSO-ARIMA structure are depicted respectively in Figures 1013, which reveal that the values of the best fitness function decreases along with a constant value since the iteration 30 (from an optimization process of 100 iterations). Similar behaviors are shown for, respectively, the optimal p, q, a1 and b1. In fact, all these parameters converge to their optimal values reported in the optimal global solution (Equation (14)).
Figure 9

Actual vs. forecasted monthly MWD (cubic meter) in Hail region obtained with PSO-ARIMA(1,1,1) approach.

Figure 9

Actual vs. forecasted monthly MWD (cubic meter) in Hail region obtained with PSO-ARIMA(1,1,1) approach.

Close modal
Figure 10

Convergence of the best RMSE function along with the identification process.

Figure 10

Convergence of the best RMSE function along with the identification process.

Close modal
Figure 11

Convergence of the (a) AR part order p and (b) MA part order q.

Figure 11

Convergence of the (a) AR part order p and (b) MA part order q.

Close modal
Figure 12

Convergence of the AR part coefficient a1 and the MA part coefficient b1.

Figure 12

Convergence of the AR part coefficient a1 and the MA part coefficient b1.

Close modal
Figure 13

Convergence of the whole swarm for a1 and b1: (a) swarm initialization (iteration 1), (b) swarm evolution (iteration 20), (c) swarm evolution (iteration 50) and (d) swarm convergence (iteration 100).

Figure 13

Convergence of the whole swarm for a1 and b1: (a) swarm initialization (iteration 1), (b) swarm evolution (iteration 20), (c) swarm evolution (iteration 50) and (d) swarm convergence (iteration 100).

Close modal

As shown in Figure 13, the behavior of the whole swarm along with the iterations is shown respectively for iteration 1, 20, 50 and 100. The proposed PSO-based approach drives the swarm to a steady state (for the optimal a1 and b1). The remaining optimal parameters behave similarly.

Comparative analysis and concluding remarks

According to the obtained results, it has been shown that the used PSO synthesis parameters are effective for modeling the monthly MWD in Hail region. The set of parameters have been used successfully for other engineering problems such as those presented in Zhang (2003), Huang et al. (2005) and El-Telbany & El-Karmi (2008). The unique exception concerns with c1 and c2. In fact, c1 = c2 = 2.05 (as in the cited papers) conduct to a severe oscillating behavior. In this study they have been set to 0.75. Moreover, the PSO-based approach has the advantage of simultaneously handling real-coded and integer-coded decision variables. Thus the identification of the AR part order and MA part order of the ARIMA model and the estimation of the polynomial coefficients are performed together in a unique framework. This can save time in design compared to STS and ANN-based approaches. The proposed PSO-ARIMA approach should be the preferred approach (over ANN and STS) due to its simplicity without needing data pre-processing (data re-scaling in ANN and stationarization in STS). The comparative results in Table 6 reveal that the PSO-based approach outperforms both ANN and STS approaches in terms of the four computed performance indicators (a MAPE of 5.2832%, an R2 of 0.9375, an RMSE of 2.2111 × 105m3 and an AARE of 5.2911%) for the global data sets and for the training and validation data sets considered separately. The testing phase is the one that is more representative of the actual performances of the model in a real application framework and the RMSE is the most significant performance indicator because it represents the objective function to be minimized. The value of the RMSE for the PSO-ARIMA in the testing phase (2.2111 × 105 m3) is better than the RMSE of the STS-AR(3) in the same phase (2.3119 × 105 m3). By computing the relative difference between the two values, it has been found that the PSO-ARIMA is 4.36% better than the STS-AR(3).

In terms of monthly MWD profile tracking, all the developed models present a whole behavior characterized by forecasted values that follow the actual values relatively closely.

In this paper, an approach based on an ARIMA model combined to the PSO as an identification tool was developed and applied to the forecasting of one-month-ahead municipal water demand in Hail region, Saudi Arabia. The modeling–identification problem included integer-valued (model orders) and real-valued (model coefficients) decision variables. It was then considered as a hybrid complex optimization problem for which classical optimization techniques suffer from combinatoric explosion (for the discrete aspect) and high ability to be trapped into local minimum points (for the continuous aspect). The proposed approach has the advantage of optimizing integer and real variables simultaneously. Moreover, it has been shown to operate efficiently in complex search-spaces and converge to near-optimal solution in a reasonable computation time. Since the PSO is a conceptually stochastic optimization algorithm, 300 runs have been carried out using 69 records of monthly water demand. From the results of the best PSO-ARIMA run reported in this paper, this approach has been shown to outperform BP-FF-ANN, STS-AR and STS-ARIMA based solutions in terms of four performance indicators namely the MAPE, R2, RMSE and AARE for both training and testing data sets.

More than being relatively better in term of forecasting accuracy, the proposed PSO-ARIMA approach should be preferred for being insensitive to the forecasting problem irregularities including convexity and differentiability. In addition, the PSO-ARIMA does not require any data pre-processing (data re-scaling in ANN and data stationarity in STS).

Forecasting monthly MWD is classified as medium-range; it can participate to efficient operation and management of an existing water supply system. Hail water utilities can take benefit from the results of this study to develop efficient plans for optimized system operation and ensure balancing between water need and supply. The developed monthly MWD in Hail region are fundamental in taking decisions in water management issues. Establishing efficient pricing policies, planning new system developments and optimizing sizes and operation procedures of the supply system (pumps, reservoirs) are among these issues. Results obtained for monthly MWD by similar studies (Firat et al. 2009; Babel et al. 2014) demonstrated that using other explanatory variables (such as population statistics, water bill, subscribers’ number, climatic variables, etc.) can improve forecasting results accuracy.

To study the effect of climatic changes on monthly MWD, other ways can be explored. In particular, for Hail region, two periods (summer and winter) will be considered separately. If sufficient metering devices are available, the living levels in different cities of Hail town will be taken into consideration. A future issue will be based on forecasting climatic and socio-economic variables first and after that use them to forecast water consumption.

This research project was granted by the Deanship of Scientific Research in Hail University, Saudi Arabia under the number E22 CC.

Aitmaatallah
O.
Achuthan
A.
Janoyan
K.
Marzocca
P.
2015
Recursive wind speed forecasting based on Hammerstein Auto-Regressive model
.
Appl. Energy
145
,
191
197
.
Ajbar
A.
Ali
E. M.
2013
Prediction of Municipal water production in touristic Mecca City in Saudi Arabia using neural networks
.
J. King Saud. Univ. Eng. Sci.
27
,
83
91
.
Akbari-Alashti
H.
Haddad
O. B.
Marino
M. A.
2015
Evaluation of developed discrete time-series method in flow forecasting models
.
Water Resour. Manage.
29
,
3211
3225
.
Araghienajad
S.
2014
Time series modeling in data-driven modeling: using Matlab
.
Water Res. Environ. Eng.
7
,
12
25
.
Bai
Y.
Xie
J.
Wang
X.
Li
C.
2016
Model fusion approach for monthly reservoir inflow forecasting
.
J. Hydroinform.
18
(
4
),
634
650
.
Beheshti
Z.
Firouzi
M.
Shamsuddin
S. M.
Zibarzani
M.
Yusop
Z.
2015
A new rainfall forecasting model using the CAPSO algorithm and an artificial neural network
.
J. Neur. Comput. Appl.
27
(
8
),
2551
2565
.
Billings
R. B.
Agthe
D. E.
1998
State-space versus multiple regression for forecasting urban water demand
.
J. Water Resour. Plan. Manage.
124
,
113
117
.
BJM
The Box-Jenkins Method: Chapter 470
.
NCSS Statistical Software
,
1
14
.
Boughadis
J.
Adamowski
K.
Diduch
R.
2005
Short-term municipal water demand forecasting
.
J. Hydrol. Proc.
19
,
137
148
.
Buyukyildiz
M.
Tezel
G.
Yilmaz
V.
2014
Estimation of the change in lake water level by artificial intelligence methods
.
Water Resour. Manage.
28
,
4747
4763
.
Chau
K.
2004
A split-step particle swarm optimization algorithm in river stage forecasting
.
J. Hydrol.
346
(
3
),
131
135
.
Chowdhury
C.
Al-Zahrani
M.
2013
Characterizing water resources and trends of sector wise water consumptions in Saudi Arabia
.
J. King Saud Univ. Eng. Sci.
27
(
1
),
68
82
.
DelValle
Y.
2008
Particle swarm optimization: basic concepts, variants and applications in power systems
.
IEEE Trans. Evol. Comp.
12
,
55
72
.
Donkor
E. A.
Mazzuchi
T. A.
Soyer
R.
Roberson
J.
2014
Urban water demand forecasting: review of methods and models
.
J. Water Resour. Plan. Manage.
140
,
146
159
.
GDWHR
2004
Report of the projects between 1425–1433 H. Internal report published by General Directorate of Water in Hail Region, Saudi Arabia
.
HAILWIKI
,
Ha'il
.
Herrera
M.
Torgo
L.
Izquierdo
J.
Perez-Garcia
R.
2010
Predictive models for forecasting hourly urban water demand
.
J. Hydrol.
387
,
141
150
.
Herrera
M.
Garcia
D.
Izquierdo
J.
Perez-Garcia
R.
2011
Municipal water demand forecasting: tools for intervention time series
.
J. Stoch. Anal. Appl.
29
,
998
1007
.
Huang
C. M.
Huang
C. J.
Wang
M. L.
2005
A particle swarm optimization to identifying the ARMAX model for short-term load forecasting
.
IEEE Trans. Pow. Sys.
20
(
2
),
1126
1133
.
Leva
S.
Dolara
A.
Grimaccia
F.
Mussetta
M.
Ogliari
E.
2015
Analysis and validation of 24 hours ahead neural network forecasting of photovoltaic output power
.
Math. Comp. Sim.
131
,
88
100
.
Meher
J.
Jha
R.
2013
Time-series analysis of monthly rainfall data for the Mahanadi River Basin, India
.
J. Sci. Cold Arid Reg.
5
,
73
84
.
Ngo
T. H. D.
Bros
W.
2013
The Box-Jenkins methodology for time series models. J. Stat. Data Analysis, SAS Global Forum, San Francisco, USA, April 28–May 1, paper no. 454
.
Ouda
O. K. M.
Al-Waked
R.
Alshehri
A. A.
2014
Privatization of water-supply services in Saudi Arabia: a unique experience
.
J. Utilit. Policy.
31
,
107
113
.
Perea
R. G.
Poyato
E. C.
Montesinos
P.
Diaz
J. A. R.
2015
Irrigation demand forecasting using artificial neuro-genetic networks
.
Water Resour. Manage.
29
,
5551
5567
.
Saravanan
S.
Nithya
R.
Kannan
S.
Thangara
C.
2015
Forecasting India's electricity consumption using particle swarm optimization
.
Power Electr. Ren. Energy Sys. Lec. Notes Elec. Eng., Springer India.
326
,
843
85l
.
Sudheer
C.
Shrivastava
N. A.
Panigrahi
B. K.
Mathur
S.
2011
Groundwater level forecasting using SVM-QPSO
. In:
Swarm, Evolutionary and Memetic Computing
,
Vol. 7076 of the series Lecture Notes in Computer Science, B. K. Panigrahi et al. (eds)
.
Springer-Verlag
,
Berlin, Heidelberg
. pp.
731
741
.
Tutunji
T.
Molhim
M.
Turki
E.
2007
Mechatronic systems identification using an impulse response recursive algorithm
.
Sim. Model. Prac. Theory.
15
,
970
988
.
Vafeiadis
T.
Papanikolaou
A.
Ilioudis
C.
Charchalakis
S.
2012
Real-time network data analysis using time series models
.
Sim. Model. Prac. Theory
29
,
173
180
.
Vantuch
T.
Zelinka
I.
2015
Evolutionary based ARIMA models for stock price forecasting
. In:
ISCS 2014: Interdisciplinary Symposium on Complex Systems
,
Vol. 14 of the series Emergence, Complexity and Computation, Zelinka I. et al. (eds)
.
Ostrava, Poruba
,
Czech Republic
, pp.
239
247
.
Wang
B.
Tai
N. L.
Zhai
H. Q.
Ye
J.
Zhu
J. D.
Qi
L. B.
2008
A new ARMAX model based on evolutionary algorithm and particle swarm optimization for short-term load forecasting
.
Elec. Power Syst. Res.
78
,
1679
1685
.
Zhou
S. L.
McMahon
T. A.
Walton
A.
Lewis
J.
2000
Forecasting daily urban water demand: a case study of Melbourne
.
J. Hydrol.
236
,
153
164
.