In this paper, a modeling-identification approach for the monthly municipal water demand system in Hail region, Saudi Arabia, is developed. This approach is based on an auto-regressive integrated moving average (ARIMA) model tuned by the particle swarm optimization (PSO). The ARIMA (p, d, q) modeling requires estimation of the integer orders p and q of the AR and MA parts; and the real coefficients of the model. More than being simple, easy to implement and effective, the PSO-ARIMA model does not require data pre-processing (original time-series normalization for artificial neural network (ANN) or data stationarization for traditional stochastic time-series (STS)). Moreover, its performance indicators such as the mean absolute percentage error (MAPE), coefficient of determination (R2), root mean squared error (RMSE) and average absolute relative error (AARE) are compared with those of ANN and STS. The obtained results show that the PSO-ARIMA outperforms the ANN and STS approaches since it can optimize simultaneously integer and real parameters and provides better accuracy in terms of MAPE (5.2832%), R2 (0.9375), RMSE (2.2111 × 105m3) and AARE (5.2911%). The PSO-ARIMA model has been implemented using 69 records (for both training and testing). The results can help local water decision makers to better manage the current water resources and to plan extensions in response to the increasing need.
INTRODUCTION
In arid climate countries such as Saudi Arabia, reliable municipal water demand modeling is crucial in both the medium and long term. In fact, water demand is steeply rising and consequently water utilities are invited to make suitable decisions in order to ensure good planning and management of scarce and limited water resources. Operational decisions are in general based on both the present requirements and the expected demands. Water demand can be viewed as a dynamic system and requires mathematically modeling. The developed models can be useful for many purposes such as scheduling, diagnosis, management, control and forecasting. In this paper, monthly municipal water demand (MMWD) is posed as an identification problem and an optimization procedure has been established to solve it. The outlined model is used for forecasting purposes.
This study is concerned with developing MMWD models for the case study of Hail, Saudi Arabia, taking into account historical records of water consumption as unique explanatory variables. A wide variety of models and methods have been used by researchers and water forecasting practitioners. Water demand forecasting problem solving techniques can roughly be categorized into three classes: (1) time-series stochastic methods (e.g. Zhou et al. 2000; Mohamed & Al-Mualla 2010; Almutaz et al. 2013; Ngo & Bros 2013; Mehr & Jha 2013; McNown et al. 2015; Vantuch & Zelinka 2015); (2) artificial neural networks (ANNs) (e.g. Ajbar & Ali 2013; Babel et al. 2014); and (3) swarm intelligence optimization techniques (Chau 2004; Huang et al. 2005; Sun et al. 2012; Aladag et al. 2013; Nazari et al. 2015). According to the forecasting horizon, small-scales (hourly, daily and weekly) approaches have been investigated in, for example, Zhang 2003; Boughadis et al. 2005; Adamowski & Karapataki 2010; Herrera et al. 2010; Adamowski et al. 2012; Adebiyi et al. 2014; Al-Zahrani 2014; Perea et al. 2015; and Tiwari & Adamowski 2015. For long-scales (annual) forecasting horizon, efficient frameworks have been developed and their relative performances measured (e.g. Zhang 2003; Khashei & Bijari 2010; Sun et al. 2012; Almutaz et al. 2013; Babel et al. 2014; Donkor et al. 2014; Nazari et al. 2015; Ouda 2014; Yalcintas et al. 2015). Since this study is concerned only with monthly water demand forecasting (e.g. Billings & Agthe 1998; Firat et al. 2009; Jalalkamali et al. 2011; Nasseri et al. 2011; Sudheer et al. 2011; Yasar et al. 2012; Ajbar & Ali 2013; Aladag et al. 2013; Almutaz et al. 2013; Mehr & Jha 2013; Babel et al. 2014; Akbari-Alashti et al. 2015; Beheshti et al. 2015; Bai et al. 2016), this paper presents and analyses a relatively recent state-of-the-art. Particular attention is allocated to models (mathematical descriptions of the problem), methods (used to solve the problem) and results (obtained). The study area, the performance measures and the case study particular aspects are mentioned if available. Inspired from Donkor et al. (2014), an annotated reference list of selected papers related to monthly water forecasting problems is included in Table 1.
Reference . | Model/ purpose/method . | Location . | Explanatory variables . | Results . |
---|---|---|---|---|
Donkor et al. (2014) | Literature review on urban water demand from 2000 to 2010 | – | – | Wide variety of models and methods have been used. Applications differ depending on forecast variable, periodicity and case-sensitive |
Yasar et al. (2012) | Application of nonlinear regression (NLR) for forecasting monthly water demand | Adana city, Turkey |
| Water consumption in Adana city will increase from 3.84 million m3 in 2009 to 4.99 million m3 in 2020. MWD is affected particularly by the total number of subscribers and the atmospheric temperature |
Babel et al. (2014) | Use of future climatic and socio-economic previously forecasted data for forecasting monthly water demand using ANNs | Bangkok |
|
|
Firat et al. (2009) | Evaluation and comparison of ANN techniques for MMWD | Izmir, Turkey |
| The Generalized Regression NN (GRNN) was found to be the best model when using the monthly water bill, the population and the average temperature as the most significant explanatory variables |
Ajbar & Ali (2013) | Use of ANN for predicting monthly municipal water consumption | Mecca city, Saudi Arabia |
| ANN approach is found to be better than the econometric model. Moreover, it particularly captures the effect of the number of visitors since Mecca is a holy town visited by people for religious reasons |
Nasseri et al. (2011) | Forecasting monthly urban water demand using extended Kalman filter (EKF) and genetic programming (GP) | Tehran, Iran | Historic records of monthly water consumption | Using hybrid EKF and GP is suitable for calibrating water demand using historic records of the water consumption itself |
Baheshti et al. (2015) | Forecasting monthly rainfall using the centripetal accelerated particle swarm optimization (CAPSO) combined to ANN | Johor state, Malaysia |
| The use of the hybrid model combining the CAPSO and the ANN increases the accuracy, in particular when the original data are pre-processed |
Buykyildiz et al. (2014) | Estimation of the change in lake water level by artificial intelligence methods including PSO-ANN, SVR, MLP, RBNN and ANFIS | Lake Beysehir, Turkey |
| SVR model is found to be the best model when compared to other methods. It provides results with a coefficient of determination R2 of 0.9988 |
Sudheer et al. (2011) | Use of a quantum behaved PSO to determine the SVM coefficients in order to estimate groundwater level R | Rentashintala, India | Historic records | SVM-QPSO is better than ANN in terms of RMSE (0.43), R2 (0.94) and efficiency (0.84) |
Mehr & Jha (2013) | Analysis of monthly rainfall data using time-series based approaches: ARIMA and forecasting for the next 12 years for different district towns | Mahanadi River Basin, India | Historic records | The ARIMA(1,0,0)(0,1,1)12 is found to be the most suitable for forecasting monthly rainfall for the upcoming 12 years in 38 towns |
Billings & Agthe (1998) | Use of state-space approach versus multiple-regression for forecasting monthly urban water demand | USA |
| Mean absolute error of forecast ranges from 7.4 to 14.8%. State-space is found to be better than the multiple-regression model on the studied water demand set |
Akbari-Alashti et al. (2015) | Evaluation of developed discrete time-series in flow forecasting models: ARMAX, ANN, GP | East Azerbaijan province, Iran |
| GP is found to be the more precise model with R2 = 0.7 and RMSE = 0.172 |
Jalalkamali et al. (2011) | Forecasting groundwater level in a well using neuro-fuzzy and ANN | Kerman plain, Iran |
| NF and ANN techniques are good choices for predicting groundwater level |
Wang et al. (2016) | Three hybrid stochastic models have been used for predicting available water resources, water demand and water use | Urimqi, China |
| The results are assumed by the authors to help managers and policy-makers to have a clear understanding of regional water supply and demand trend as well as the water utilization structure in the future |
Bai et al. (2016) | Grey relational analysis model is used for forecasting monthly reservoir inflow | Anhui, China |
| The proposed approach is more accurate than traditional techniques |
Reference . | Model/ purpose/method . | Location . | Explanatory variables . | Results . |
---|---|---|---|---|
Donkor et al. (2014) | Literature review on urban water demand from 2000 to 2010 | – | – | Wide variety of models and methods have been used. Applications differ depending on forecast variable, periodicity and case-sensitive |
Yasar et al. (2012) | Application of nonlinear regression (NLR) for forecasting monthly water demand | Adana city, Turkey |
| Water consumption in Adana city will increase from 3.84 million m3 in 2009 to 4.99 million m3 in 2020. MWD is affected particularly by the total number of subscribers and the atmospheric temperature |
Babel et al. (2014) | Use of future climatic and socio-economic previously forecasted data for forecasting monthly water demand using ANNs | Bangkok |
|
|
Firat et al. (2009) | Evaluation and comparison of ANN techniques for MMWD | Izmir, Turkey |
| The Generalized Regression NN (GRNN) was found to be the best model when using the monthly water bill, the population and the average temperature as the most significant explanatory variables |
Ajbar & Ali (2013) | Use of ANN for predicting monthly municipal water consumption | Mecca city, Saudi Arabia |
| ANN approach is found to be better than the econometric model. Moreover, it particularly captures the effect of the number of visitors since Mecca is a holy town visited by people for religious reasons |
Nasseri et al. (2011) | Forecasting monthly urban water demand using extended Kalman filter (EKF) and genetic programming (GP) | Tehran, Iran | Historic records of monthly water consumption | Using hybrid EKF and GP is suitable for calibrating water demand using historic records of the water consumption itself |
Baheshti et al. (2015) | Forecasting monthly rainfall using the centripetal accelerated particle swarm optimization (CAPSO) combined to ANN | Johor state, Malaysia |
| The use of the hybrid model combining the CAPSO and the ANN increases the accuracy, in particular when the original data are pre-processed |
Buykyildiz et al. (2014) | Estimation of the change in lake water level by artificial intelligence methods including PSO-ANN, SVR, MLP, RBNN and ANFIS | Lake Beysehir, Turkey |
| SVR model is found to be the best model when compared to other methods. It provides results with a coefficient of determination R2 of 0.9988 |
Sudheer et al. (2011) | Use of a quantum behaved PSO to determine the SVM coefficients in order to estimate groundwater level R | Rentashintala, India | Historic records | SVM-QPSO is better than ANN in terms of RMSE (0.43), R2 (0.94) and efficiency (0.84) |
Mehr & Jha (2013) | Analysis of monthly rainfall data using time-series based approaches: ARIMA and forecasting for the next 12 years for different district towns | Mahanadi River Basin, India | Historic records | The ARIMA(1,0,0)(0,1,1)12 is found to be the most suitable for forecasting monthly rainfall for the upcoming 12 years in 38 towns |
Billings & Agthe (1998) | Use of state-space approach versus multiple-regression for forecasting monthly urban water demand | USA |
| Mean absolute error of forecast ranges from 7.4 to 14.8%. State-space is found to be better than the multiple-regression model on the studied water demand set |
Akbari-Alashti et al. (2015) | Evaluation of developed discrete time-series in flow forecasting models: ARMAX, ANN, GP | East Azerbaijan province, Iran |
| GP is found to be the more precise model with R2 = 0.7 and RMSE = 0.172 |
Jalalkamali et al. (2011) | Forecasting groundwater level in a well using neuro-fuzzy and ANN | Kerman plain, Iran |
| NF and ANN techniques are good choices for predicting groundwater level |
Wang et al. (2016) | Three hybrid stochastic models have been used for predicting available water resources, water demand and water use | Urimqi, China |
| The results are assumed by the authors to help managers and policy-makers to have a clear understanding of regional water supply and demand trend as well as the water utilization structure in the future |
Bai et al. (2016) | Grey relational analysis model is used for forecasting monthly reservoir inflow | Anhui, China |
| The proposed approach is more accurate than traditional techniques |
It should be concluded from Table 1 that the forecasting obtained results are case-sensitive. In fact, the same method does not provide the same results when applied to different case studies. Moreover, the ANN technique seems to be the most used alone or combined to other techniques such as particle swarm optimization (PSO).
In this study, the MMWD forecasting problem is posed as an identification framework where the error between the observed (real) and the forecasted (predicted) water demands is to be minimized. From an optimization point of view, tuning the parameters of an ARIMA(p,d,q) model is known to be hybrid since it involves mixed integer-real decision variables. In fact, the presence of integer variables (the orders of the AR and the MA parts) can lead to a combinatorial explosion, whereas the classical auto-regressive integrated moving average (ARIMA) model tuning (with known orders) can present a non-convex aspect and thus classical solving algorithms can be easily trapped into local minimum solutions.
The present paper proposes a new approach, referred to as PSO-ARIMA, by mapping the ARIMA(p,d,q) parameters to the particle position components in the PSO paradigm. The decision variables are the integer-valued orders of the AR and the MA parts and the real-valued coefficients of the model. The proposed PSO algorithm is new since it handles simultaneously discrete and continuous decision variables. Moreover, the search-space limits are defined such that the stability of the obtained model is ensured. The proposed algorithm has the capacity of operating in complex search-spaces with global optimization ability compared to gradient-descent algorithms known to be local search methods. The relative performance measures of the proposed approach are first evaluated and then compared with those of ANN and traditional stochastic time-series (STS) approaches. Based on 69 records of MMWD collected from Water Directorate of Hail Region, first an explanatory analysis including the stationarity test of the time-series was conducted through the so-called sample auto-correlation function (ACF) study. Then, diverse ANN- based approaches and different STS-approaches such as AR and ARIMA were used to model the MMWD. Finally, the progressive development of the PSO-ARIMA approach was detailed. The PSO technique demonstrates high ability to provide mixed integer-real near-optimal solution for the ARIMA(p,d,q) forecasting model. By comparing the statistical performance indicators, the PSO-ARIMA is found to be the most parsimonious in the sense that it allows the identification of both the orders and the coefficients simultaneously. Moreover, it is revealed to be insensitive to forecasting problem irregularities including linearity, convexity and stationarity. Other features of the proposed PSO-ARIMA approach include the facts that it is easy to understand, simple to program (requires few lines of computer code) and that it operates in a straightforward manner on the original time-series without needing any pre-processing.
The remainder of the paper is organized as follows: in the following section, the problem of MMWD modeling and identification is formulated; followed by a section briefly discussing three tools for solving the problem, namely, ANN, STS and PSO. In the penultimate section, the practical implementation of the presented methodologies is detailed. Special emphasis is allocated to the PSO-ARIMA approach. Concluding remarks and recommendations are summarized in the final section.
MMWD MODELING-IDENTIFICATION PROBLEM FORMULATION
In this study, the MMWD is viewed as a dynamic system. Thus, an experimental analysis, referred to as identification, should be performed. A mathematical model is derived from measurements (observations in the case of forecasting problems). For such a purpose, the system inputs and outputs are gathered and a relationship between them has to be performed. According to basic system theory concepts, the dynamic system models are of two types: parametric and non-parametric. The parametric models are considered in this study. The structures of the developed models are known in advance and their synthesis parameters are to be tuned optimally such that an error function is minimized.
The MMWD model is a mathematical description assumed to produce similar behavior as the MMWD dynamic real system. The p previous values (obtained via the delay operator q–1) of the system output, y(t), are fed respectively to the model and to the system itself. The estimated value, , the observed value, y(t) as well as their difference, , are introduced to the model identifier block (Bidwell 2005; Aitmaatallah et al. 2015). This block is intended to estimate the unknown model parameters by minimizing the least squared error function between the system and its model respective outputs.
In this study, the model identifier has been implemented using respectively ANN, AR, ARIMA and PSO-ARIMA approaches. The following section will describe the theoretical and practical aspects of these methodologies.
PROPOSED METHODOLOGIES
Time-series
ANNs
The ANN has been widely used as a popular tool for MWD forecasting (e.g. Ajbar & Ali 2013; Babel et al. 2014). It was in some cases used as a unique framework, but in the majority of cases it has been coupled with other techniques, such as wavelet-transform and PSO. ANNs are defined as universal and highly flexible approximators for diverse time-series going from the fields of cognitive science to engineering. The concept of ANN is an attempt to imitate the human brain functioning (Babel et al. 2014). A network of artificial neurons is organized in layers. A collection of inputs is introduced to the process's units (neurons). In each layer, the neurons are fully interconnected via weights. The weighted inputs are then introduced to a transfer function, which converts them to an output. Choosing the ANN architecture, determining the number of neurons and choosing the appropriate activation functions are the most important steps in designing an ANN. Distinct architectures, learning methods and performance can be found elsewhere (e.g. Talib et al. 2008; Firat et al. 2009; Adamowski & Karapataki 2010). In this study, ANNs with different structures have been used in monthly MWD forecasting. Details for this aim will be provided later.
PSO
The PSO algorithm was first introduced by Kennedy and Eberhart in 1995. It is categorized as one of the population based swarm intelligence techniques such as Genetic Algorithms (GA), Ant Colony (AC), etc. Parameters tuning of the algorithm, its variants and its applications can be found elsewhere (e.g. Clerc & Kennedy 2002; Delvalle 2008; Wang et al. 2008; Boubaker et al. 2014; Buyukyildiz et al. 2014). PSO is preferred in many applications for its simplicity, its universal aspect and the fact that it does not require any regularity of the studied problem (continuity, differentiability and convexity). Although it has been used in forecasting electric load (e.g. Huang et al. 2005; Nazari et al. 2015; Saravanan et al. 2015), its use in water demand forecasting remains limited.
are two constant factors, named cognitive and social coefficients
are random numbers in the [0,1] range
is the best previous position of the ith particle
is the best previous position of the whole swarm
is a weighting function, known to ensure balance between search-space exploration and exploitation (Boubaker et al. 2014)
are respectively the maximum and the minimum value of wk.
From the expertise about the MWD forecasting problem, one can define the search-space limits. Components having values out of their limits are reset through a confinement mechanism. The first step in using PSO to solve the problem (6) is to adopt a suitable mapping between the particle position in the swarm evolution concept and the vector of the MWD forecasting problem synthesis parameters. Note that for this problem, the decision variables set is case-sensitive and that it depends on the nature of the available data, the forecasting horizon and the adopted model structure. More details will be provided when dealing with the progressive development of the chosen models based on PSO.
RESULTS AND DISCUSSION
MMWD is in general known to be affected by socioeconomic variables, such as monthly water bill, population, number of subscribers, gross national product, etc. (Firat et al. 2009), and by climatic variables such as precipitation accumulation, relative humidity, air temperature, etc. Moreover, the historical values of the monthly water consumption are among the explanatory variables that affect the current consumption. Since the socioeconomic and the climatic variables are non-available, we are limited in this study to analyzing the monthly MWD as a function of only its previous values. AR, ARIMA, ANN and PSO-based approaches will be used in forecasting. The first step consists on studying the stationarity of a TS composed of 69 records of monthly MWD (from January 2010 to September 2015) collected from the General Directorate of Water in Hail Region. As proposed by Box–Jenkins methodology, determining whether the monthly MWDTS is stationary (data set having a constant mean and variance) (Boughadis et al. 2005) is required for modeling STS forecasting using ARIMA models. The ANN models require a pre-processing of the original records in order to normalize them in the interval [0,1]. This process will be discussed in detail when dealing with the ANN approaches. Through a suitable mapping between PSO and the monthly MWD forecasting model, an algorithm to determine both the ARIMA model parameters and orders will be detailed. The simulations carried out in this study are performed using Matlab 2013b package and implemented on an Intel® Core™ i3-3110 microprocessor with a clock of 2 GHz as frequency and 4 GB of RAM. The relative performances of the proposed PSO-ARIMA approach were evaluated using the performance indicators introduced respectively in Equations (1)–(4). They are then compared with those of the AR, ARIMA and ANN-based approaches.
Study area
Hail city districts are fed with municipal water using valves operated manually according to a weekly fixed schedule (GDWHR 2004).
Evaluation of ANN techniques for monthly MWD modeling
In this section, various ANN techniques such as feed-forward (FF) and radial-basis have been evaluated via numerous performance indicators in modeling future MMWD in Hail region (Leva et al. 2015). The first step in using an ANN-based approach is to optimize the architecture of the ANN such that the relationship between the inputs and the outputs is well captured. For this aim, a number of tests have been conducted using different ANN architectures, activation functions and training algorithms (Adamowski et al. 2012). The inputs of the ANNs considered here are the MMWD of the previous months. The data set is divided into two sub-sets, training and testing. For all simulations, the first N1 = 40 records are used for training and the remaining N2 = 29 records are used for model validation. Choosing the ANN architecture is the most consuming time step when using ANNs in prediction. The best architecture is decided here by trial and error procedure. After several trials, the best ANN structure is found to consist of an input layer, two hidden layers and one output layer. The optimal numbers of neurons are found to be, respectively, 60 (for the input layer), 16 (for each one of the two hidden layers) and 10 (for the output layer). Linear activation functions are used for the input and the output layers whereas logarithmic-sigmoid activation function is used for the two hidden layers. The BP-FF-ANN has been trained using the Levenberg–Marquardt, the resilient back-propagation and the gradient Powell–Beale algorithms. The last one is found to provide better results than the two others.
We define the order of the BP-FF-ANN as the number of inputs for comparison purpose against other techniques. The maximum number of previous monthly MWD used as inputs is set to six. So, six BF-FF-ANN algorithms are trained and tested. The obtained results are depicted in Table 2.
Model . | MAPE (%) . | R2 . | RMSE (m3) . | AARE(%) . |
---|---|---|---|---|
BP-FF-ANN (1) | 10.6534 | 0.7426 | 4.3552 × 105 | 9.2345 |
BP-FF-ANN (2) | 11.8427 | 0.6629 | 4.9971 × 105 | 10.1552 |
BP-FF-ANN (3) | 6.5671 | 0.9115 | 2.5753 × 105 | 6.3571 |
BP-FF-ANN (4) | 11.3592 | 0.6691 | 4.7912 × 105 | 9.8243 |
BP-FF-ANN (5) | 12.9392 | 0.5581 | 5.7695 × 105 | 10.8166 |
BP-FF-ANN (6) | 17.1062 | 0.2380 | 7.6783 × 105 | 14.2835 |
Model . | MAPE (%) . | R2 . | RMSE (m3) . | AARE(%) . |
---|---|---|---|---|
BP-FF-ANN (1) | 10.6534 | 0.7426 | 4.3552 × 105 | 9.2345 |
BP-FF-ANN (2) | 11.8427 | 0.6629 | 4.9971 × 105 | 10.1552 |
BP-FF-ANN (3) | 6.5671 | 0.9115 | 2.5753 × 105 | 6.3571 |
BP-FF-ANN (4) | 11.3592 | 0.6691 | 4.7912 × 105 | 9.8243 |
BP-FF-ANN (5) | 12.9392 | 0.5581 | 5.7695 × 105 | 10.8166 |
BP-FF-ANN (6) | 17.1062 | 0.2380 | 7.6783 × 105 | 14.2835 |
Exploratory analysis
AR models
The Econometrics toolbox of Matlab has been used as the modeling tool for the ARIMA STS studied in this paper (the AR model is considered as a particular case of ARIMA with q = 0). The selected and calibrated models are used for simulation and forecasting. The ‘arima’ function is used to specify the model and the ‘estimate’ function is used to estimate the model polynomial coefficients. These two functions assume that the model orders (p and q) are determined in advance. The ‘estimate’ function uses the maximum likelihood principle which finds the maximum probability density function corresponding to the optimal values of the model parameters.
Model . | MAPE (%) . | R2 . | RMSE (m3) . | AARE (%) . | AIC . |
---|---|---|---|---|---|
AR(1) | 5.7864 | 0.9017 | 2.6728 × 105 | 5.7260 | 24.0842 |
AR(2) | 5.7585 | 0.9238 | 2.7115 × 105 | 5.4833 | 24.9739 |
AR(3) | 5.4007 | 0.9201 | 2. 31191 × 105 | 5.3800 | 24.0005 |
AR(4) | 5.5501 | 0.9203 | 2. 4240 × 105 | 5.6355 | 24.0661 |
AR(5) | 5.6534 | 0.9209 | 2. 1412 × 105 | 5.3555 | 24.1219 |
Model . | MAPE (%) . | R2 . | RMSE (m3) . | AARE (%) . | AIC . |
---|---|---|---|---|---|
AR(1) | 5.7864 | 0.9017 | 2.6728 × 105 | 5.7260 | 24.0842 |
AR(2) | 5.7585 | 0.9238 | 2.7115 × 105 | 5.4833 | 24.9739 |
AR(3) | 5.4007 | 0.9201 | 2. 31191 × 105 | 5.3800 | 24.0005 |
AR(4) | 5.5501 | 0.9203 | 2. 4240 × 105 | 5.6355 | 24.0661 |
AR(5) | 5.6534 | 0.9209 | 2. 1412 × 105 | 5.3555 | 24.1219 |
From the results of Table 3, it can be observed that the AR(3) model performed better than the other models with the lowest MAPE (5.4007%), the highest R2(0.9201) and the lowest Akaike information criterion (AIC) (24.0005).
ARIMA(p,1,q) models
In order to construct the best ARIMA(p,1,q) model, the orders p and q need to be determined. Many combinations of the couple (p,q) have been tested and evaluated according to the four previously defined performance indicators. The one-step-ahead forecasting is considered here. Four models along with their coefficients and performance indicators for both training and testing data are shown in Table 4.
Model . | Parameters . | Performance indicators (training + testing) . | |||||||
---|---|---|---|---|---|---|---|---|---|
C . | a1 . | a2 . | a3 . | b1 . | MAPE (%) . | R2 . | RMSE (m3) . | AARE (%) . | |
ARIMA (1,1,1) | 16287 | 0.6611 | – | – | 0.3563 | 6.1491 | 0.9119 | 2.5508 × 105 | 5.4566 |
ARIMA (2,1,1) | 21846 | −1.0202 | −0.0202 | – | 0.9394 | 8.3298 | 0.7744 | 4.1149 × 105 | 9.1475 |
ARIMA (3,1,1) | 356,581 | 0.5744 | 0.3190 | −0.2738 | −1 | 7.9462 | 0.7124 | 4.6811 × 105 | 9.2233 |
ARIMA (1,0,1) | 346,677 | 0.8467 | – | – | −0.2257 | 7.0535 | 0.8962 | 2.8557 × 105 | 7.5595 |
Model . | Parameters . | Performance indicators (training + testing) . | |||||||
---|---|---|---|---|---|---|---|---|---|
C . | a1 . | a2 . | a3 . | b1 . | MAPE (%) . | R2 . | RMSE (m3) . | AARE (%) . | |
ARIMA (1,1,1) | 16287 | 0.6611 | – | – | 0.3563 | 6.1491 | 0.9119 | 2.5508 × 105 | 5.4566 |
ARIMA (2,1,1) | 21846 | −1.0202 | −0.0202 | – | 0.9394 | 8.3298 | 0.7744 | 4.1149 × 105 | 9.1475 |
ARIMA (3,1,1) | 356,581 | 0.5744 | 0.3190 | −0.2738 | −1 | 7.9462 | 0.7124 | 4.6811 × 105 | 9.2233 |
ARIMA (1,0,1) | 346,677 | 0.8467 | – | – | −0.2257 | 7.0535 | 0.8962 | 2.8557 × 105 | 7.5595 |
PSO–ARIMA(p,d,q) models
Since MMWD in Hail region is derived from a stochastic process known to be highly non-linear and eventually presenting irregularities, the resulting modeling-identification problem is classified as hybrid and difficult to solve. This problem presents a high level of algorithmic complexity. In such conditions, we have to resort to the PSO technique to address the modeling aspect of future water consumptions underlying on the limited available data (Parsopoulos & Vrahatis 2002).
Swarm model definition
This encoding scheme presents the particularity that it contains, simultaneously, integer and real-coded variables. are real-coded variables, p and q are integer variables. To avoid unnecessary computations, the maximum values for p and q respectively are set to 5. Therefore, once p and q are known, all the coefficients between (respectively between) are set to 0. The search-space limits are defined in Table 5.
Parameter . | Min . | Max . | Nature . | Meaning . |
---|---|---|---|---|
10−4 × C | 1 | 35 | Real | Scalar constant in the linear time-series model as defined in Equation (5) |
p,q | 1 | 5 | Integer | Degrees of the compound AR and the compound moving average (MA) polynomials |
a1, …, ap | −1 | 1 | Real | Coefficients of the AR polynomial |
b1, …, bq | −1 | 1 | Real | Coefficients of the MA polynomial |
Parameter . | Min . | Max . | Nature . | Meaning . |
---|---|---|---|---|
10−4 × C | 1 | 35 | Real | Scalar constant in the linear time-series model as defined in Equation (5) |
p,q | 1 | 5 | Integer | Degrees of the compound AR and the compound moving average (MA) polynomials |
a1, …, ap | −1 | 1 | Real | Coefficients of the AR polynomial |
b1, …, bq | −1 | 1 | Real | Coefficients of the MA polynomial |
Since we have no idea about the magnitude of the ARIMA model coefficients, the objective function f in Equation (12) has been minimized by the ‘fminunc’ (unconstrained minimization function of the optimization toolbox in Matlab) with different initial guesses (because it is a local gradient-based minimization function). Values of C have been found to range from approximately 10,000 to 350,000 and the coefficients a1, a2, …, ap, b1,b2, …, bq are found to be between −1 and 1.
Fitness function evaluation
PSO–ARIMA(p,d,q) procedure for modeling monthly MWD
The PSO–ARIMA(p,d,q) approach for modeling monthly MWD in Hail region is described as follows.
Set the PSO synthesis parameters as follows: wmax = 0.95, wmin = 0.35, nb_part = 20, c1 = c2 = 0.75, maxIter_nb = 100.
Load the historical records of the monthly MWD in Hail region from an excel file (y(t), t = 1:69).
Generate a white-noise signal (e(t), t = 1:69).
Set the iteration index to k = 1.
Initialize the swarm particles randomly within the search-space limits as defined in Table 5.
For each particle, compute the fitness function as defined in Equation (12) using the N1 = 40 records.
Determine the index of the best particle among the swarm and set to the position of this particle.
Set the personal best of each particle to its current position.
Increment the iteration index k = k + 1.
Update the position of each particle according to Equations (8) and (9).
As are integer variables, round them off to the nearest integer values over {1, 2, 3, 4, 5}.
If one of the real-coded variables comes out of the search-space limits defined in Table 5, confine it by resetting within the search-space.
Compute the fitness function for each particle according to Equation (12). Compare the current fitness of each particle to its previous personal best fitness. If there is improvement, then set .
Determine the index of the best particle in the swarm and update accordingly.
Test the stopping criterion, if not reached loop to Step 9. If reached, record as the best solution.
Adaptation procedure for the real and integer decision variables
In this subsection, more emphasis on steps (10) and (11) of the PSO-ARIMA procedure is put forward to clarify the adaptation procedure used to simultaneously handle the real and integer variables of the studied forecasting problem. By reference to Equation (11) for the position vector and to Table 5 for the meaning and limits of the position components, the following example is considered. Assume that at the kth iteration, the decision variables vector of the ith particle is given by Xki = [29 × 104, 2.11, 0.7, −0.8, 3.2, −1.7, 3.11, 3.92, 0.05, −0.2, 0.17, 4.5 2.73] (this vector is the result of step (10)). Since p and q are integer variables, the second and the eighth related values are rounded-off to the nearest integer numbers. p = round-off (2.11) = 2 and q = round-off (3.92) = 4. Consequently, a3 = a4 = a5 = 0 and b5 = 0. As a result of the adaptation procedure (output of step (11)), the ith particle position vector is then given by Xki = [29 × 104, 2, 0.7, −0.8, 0, 0, 0, 4, 0.05, −0.2, 0.17, 4.5, 0].
Numerical results
. | Performance indicators . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Model . | MAPE (%) . | RMSE (m3) . | R2 . | AARE (%) . | ||||||||
Phase . | Training + testing . | Training . | Testing . | Training + testing . | Training . | Testing . | Training + testing . | Training . | Testing . | Training + testing . | Training . | Testing . |
BP-FF-ANN(3) | 6.5671 | 6.4609 | 6.4977 | 2.5735 × 105 | 2.2320 × 105 | 2.3611 × 105 | 0.9115 | 0.9189 | 0.6346 | 6.3571 | 6.2218 | 7.2207 |
STS-AR(3) | 5.4007 | 5.4672 | 5.2009 | 2.3119 × 105 | 2.3232 × 105 | 2.3707 × 105 | 0.9201 | 0,5125 | 0.8582 | 5.3800 | 5.3570 | 5.4473 |
STS-ARIMA(1,0,1) | 7.0535 | 7.2234 | 8.2817 | 2.8597 × 105 | 2.7733 × 105 | 3.2787 × 105 | 0.8962 | 0.8736 | 0.7036 | 8.2236 | 8.3270 | 7.1179 |
STS-ARIMA(1,1,1) | 6.1491 | 6.0917 | 7.4320 | 2.5528 × 105 | 2.4678 × 105 | 2.7965 × 105 | 0.9119 | 0.8233 | 0.7958 | 7.3255 | 7.8917 | 7.3576 |
PSO-ARIMA | 5.2832 | 4.8236 | 5.3217 | 2.2111 × 105 | 1.8932 × 105 | 2.2367 × 105 | 0.9375 | 0.9452 | 0.8289 | 5.2911 | 5.0733 | 5.3319 |
. | Performance indicators . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Model . | MAPE (%) . | RMSE (m3) . | R2 . | AARE (%) . | ||||||||
Phase . | Training + testing . | Training . | Testing . | Training + testing . | Training . | Testing . | Training + testing . | Training . | Testing . | Training + testing . | Training . | Testing . |
BP-FF-ANN(3) | 6.5671 | 6.4609 | 6.4977 | 2.5735 × 105 | 2.2320 × 105 | 2.3611 × 105 | 0.9115 | 0.9189 | 0.6346 | 6.3571 | 6.2218 | 7.2207 |
STS-AR(3) | 5.4007 | 5.4672 | 5.2009 | 2.3119 × 105 | 2.3232 × 105 | 2.3707 × 105 | 0.9201 | 0,5125 | 0.8582 | 5.3800 | 5.3570 | 5.4473 |
STS-ARIMA(1,0,1) | 7.0535 | 7.2234 | 8.2817 | 2.8597 × 105 | 2.7733 × 105 | 3.2787 × 105 | 0.8962 | 0.8736 | 0.7036 | 8.2236 | 8.3270 | 7.1179 |
STS-ARIMA(1,1,1) | 6.1491 | 6.0917 | 7.4320 | 2.5528 × 105 | 2.4678 × 105 | 2.7965 × 105 | 0.9119 | 0.8233 | 0.7958 | 7.3255 | 7.8917 | 7.3576 |
PSO-ARIMA | 5.2832 | 4.8236 | 5.3217 | 2.2111 × 105 | 1.8932 × 105 | 2.2367 × 105 | 0.9375 | 0.9452 | 0.8289 | 5.2911 | 5.0733 | 5.3319 |
The performance of both training and validation data sets considered together as well as the training and validation data sets considered separately are summarized in Table 6. The results of the best BP-FF-ANN(3), AR(3), STS-ARIMA(1,0,1) using the original records and the STS-ARIMA(1,1,1) using the stationarized records are included in the same table for comparison.
As shown in Figure 13, the behavior of the whole swarm along with the iterations is shown respectively for iteration 1, 20, 50 and 100. The proposed PSO-based approach drives the swarm to a steady state (for the optimal a1 and b1). The remaining optimal parameters behave similarly.
Comparative analysis and concluding remarks
According to the obtained results, it has been shown that the used PSO synthesis parameters are effective for modeling the monthly MWD in Hail region. The set of parameters have been used successfully for other engineering problems such as those presented in Zhang (2003), Huang et al. (2005) and El-Telbany & El-Karmi (2008). The unique exception concerns with c1 and c2. In fact, c1 = c2 = 2.05 (as in the cited papers) conduct to a severe oscillating behavior. In this study they have been set to 0.75. Moreover, the PSO-based approach has the advantage of simultaneously handling real-coded and integer-coded decision variables. Thus the identification of the AR part order and MA part order of the ARIMA model and the estimation of the polynomial coefficients are performed together in a unique framework. This can save time in design compared to STS and ANN-based approaches. The proposed PSO-ARIMA approach should be the preferred approach (over ANN and STS) due to its simplicity without needing data pre-processing (data re-scaling in ANN and stationarization in STS). The comparative results in Table 6 reveal that the PSO-based approach outperforms both ANN and STS approaches in terms of the four computed performance indicators (a MAPE of 5.2832%, an R2 of 0.9375, an RMSE of 2.2111 × 105m3 and an AARE of 5.2911%) for the global data sets and for the training and validation data sets considered separately. The testing phase is the one that is more representative of the actual performances of the model in a real application framework and the RMSE is the most significant performance indicator because it represents the objective function to be minimized. The value of the RMSE for the PSO-ARIMA in the testing phase (2.2111 × 105 m3) is better than the RMSE of the STS-AR(3) in the same phase (2.3119 × 105 m3). By computing the relative difference between the two values, it has been found that the PSO-ARIMA is 4.36% better than the STS-AR(3).
In terms of monthly MWD profile tracking, all the developed models present a whole behavior characterized by forecasted values that follow the actual values relatively closely.
CONCLUSIONS AND RECOMMENDATIONS
In this paper, an approach based on an ARIMA model combined to the PSO as an identification tool was developed and applied to the forecasting of one-month-ahead municipal water demand in Hail region, Saudi Arabia. The modeling–identification problem included integer-valued (model orders) and real-valued (model coefficients) decision variables. It was then considered as a hybrid complex optimization problem for which classical optimization techniques suffer from combinatoric explosion (for the discrete aspect) and high ability to be trapped into local minimum points (for the continuous aspect). The proposed approach has the advantage of optimizing integer and real variables simultaneously. Moreover, it has been shown to operate efficiently in complex search-spaces and converge to near-optimal solution in a reasonable computation time. Since the PSO is a conceptually stochastic optimization algorithm, 300 runs have been carried out using 69 records of monthly water demand. From the results of the best PSO-ARIMA run reported in this paper, this approach has been shown to outperform BP-FF-ANN, STS-AR and STS-ARIMA based solutions in terms of four performance indicators namely the MAPE, R2, RMSE and AARE for both training and testing data sets.
More than being relatively better in term of forecasting accuracy, the proposed PSO-ARIMA approach should be preferred for being insensitive to the forecasting problem irregularities including convexity and differentiability. In addition, the PSO-ARIMA does not require any data pre-processing (data re-scaling in ANN and data stationarity in STS).
Forecasting monthly MWD is classified as medium-range; it can participate to efficient operation and management of an existing water supply system. Hail water utilities can take benefit from the results of this study to develop efficient plans for optimized system operation and ensure balancing between water need and supply. The developed monthly MWD in Hail region are fundamental in taking decisions in water management issues. Establishing efficient pricing policies, planning new system developments and optimizing sizes and operation procedures of the supply system (pumps, reservoirs) are among these issues. Results obtained for monthly MWD by similar studies (Firat et al. 2009; Babel et al. 2014) demonstrated that using other explanatory variables (such as population statistics, water bill, subscribers’ number, climatic variables, etc.) can improve forecasting results accuracy.
To study the effect of climatic changes on monthly MWD, other ways can be explored. In particular, for Hail region, two periods (summer and winter) will be considered separately. If sufficient metering devices are available, the living levels in different cities of Hail town will be taken into consideration. A future issue will be based on forecasting climatic and socio-economic variables first and after that use them to forecast water consumption.
ACKNOWLEDGEMENTS
This research project was granted by the Deanship of Scientific Research in Hail University, Saudi Arabia under the number E22 CC.