Identification of monthly municipal water demand system based on autoregressive integrated moving average model tuned by particle swarm optimization

In this paper, a modeling-identification approach for the monthly municipal water demand system in Hail region, Saudi Arabia, is developed. This approach is based on an auto-regressive integrated moving average (ARIMA) model tuned by the particle swarm optimization (PSO). The ARIMA (p, d, q) modeling requires estimation of the integer orders p and q of the AR and MA parts; and the real coefficients of the model. More than being simple, easy to implement and effective, the PSO-ARIMA model does not require data pre-processing (original time-series normalization for artificial neural network (ANN) or data stationarization for traditional stochastic time-series (STS)). Moreover, its performance indicators such as the mean absolute percentage error (MAPE), coefficient of determination (R), root mean squared error (RMSE) and average absolute relative error (AARE) are compared with those of ANN and STS. The obtained results show that the PSO-ARIMA outperforms the ANN and STS approaches since it can optimize simultaneously integer and real parameters and provides better accuracy in terms of MAPE (5.2832%), R (0.9375), RMSE (2.2111 × 10m) and AARE (5.2911%). The PSO-ARIMA model has been implemented using 69 records (for both training and testing). The results can help local water decision makers to better manage the current water resources and to plan extensions in response to the increasing need. doi: 10.2166/hydro.2017.035 s://iwaponline.com/jh/article-pdf/19/2/261/391183/jh0190261.pdf Sahbi Boubaker Department of Electrical Engineering, Community College, University of Hail, Saudi Arabia and Research Unit on Study of Systems and Renewable Energy, Hail City, Saudi Arabia and National College of Engineering of Monastir, Monastir City, Tunisia E-mail: sahbi.boubaker@gmail.com


INTRODUCTION
In arid climate countries such as Saudi Arabia, reliable municipal water demand modeling is crucial in both the medium and long term. In fact, water demand is steeply rising and consequently water utilities are invited to make suitable decisions in order to ensure good planning and management of scarce and limited water resources. Operational decisions are in general based on both the present requirements and the expected demands. Water demand can be viewed as a dynamic system and requires mathematically modeling. The developed models can be useful for many purposes such as scheduling, diagnosis, management, control and forecasting. In this paper, monthly municipal water demand (MMWD) is posed as an identification problem and an optimization procedure has been established to solve it. The outlined model is used for forecasting purposes.
This study is concerned with developing MMWD models for the case study of Hail, Saudi Arabia, taking into account historical records of water consumption as unique explanatory variables. A wide variety of models and methods have been used by researchers and water forecasting practitioners. Water demand forecasting problem solving techniques can roughly be categorized into three classes: (1) time-series stochastic methods (e.g. Zhou et al. The study area, the performance measures and the case study particular aspects are mentioned if available. Inspired from Donkor et al. (), an annotated reference list of selected papers related to monthly water forecasting problems is included in Table 1.
It should be concluded from Table 1 that the forecasting obtained results are case-sensitive. In fact, the same method does not provide the same results when applied to different case studies. Moreover, the ANN technique seems to be the most used alone or combined to other techniques such as particle swarm optimization (PSO).
In this study, the MMWD forecasting problem is posed as an identification framework where the error between the observed (real) and the forecasted (predicted) water demands is to be minimized. From an optimization point of view, tuning the parameters of an ARIMA(p,d,q) model is known to be hybrid since it involves mixed integer-real decision variables. In fact, the presence of integer variables (the orders of the AR and the MA parts) can lead to a combinatorial explosion, whereas the classical auto-regressive integrated moving average (ARIMA) model tuning (with known orders) can present a non-convex aspect and thus classical solving algorithms can be easily trapped into local minimum solutions.
The present paper proposes a new approach, referred to as PSO-ARIMA, by mapping the ARIMA(p,d,q) parameters to the particle position components in the PSO paradigm.  ARIMA(p,d,q) forecasting model. By comparing the statistical performance indicators, the PSO-ARIMA is found to be the most parsimonious in the sense that it allows the identification of both the orders and the coefficients simultaneously. Moreover, it is revealed to be insensitive to forecasting problem irregularities including linearity, convexity and stationarity. Other features of the proposed PSO-ARIMA approach include the facts that it is easy to understand, simple to program (requires few lines of computer code) and that it operates in a straightforward manner on the original time-series without needing any pre-processing.
The remainder of the paper is organized as follows: in the following section, the problem of MMWD modeling and identification is formulated; followed by a section

MMWD MODELING-IDENTIFICATION PROBLEM FORMULATION
In this study, the MMWD is viewed as a dynamic system.
Thus, an experimental analysis, referred to as identification, should be performed. A mathematical model is derived from measurements (observations in the case of forecasting problems). For such a purpose, the system inputs and outputs are gathered and a relationship between them has to be performed. According to basic system theory concepts, In this study, the model identifier has been implemented using respectively ANN, AR, ARIMA and PSO-ARIMA approaches. The following section will describe the theoretical and practical aspects of these methodologies.
The good fit of the forecasted values is measured via the mean absolute percentage error (MAPE), the coefficient of determination (R 2 ), the root mean squared error (RMSE) and the average absolute relative error (AARE) defined as follows: The proposed approach is more accurate than traditional techniques whereŷ t is the forecasted water demand, y t is the tth actual water demand, y is the mean water demand and N is the number of observations.

Time-series
A time-series of municipal water demand is defined as a set of values that occur over a period of time in a certain pattern. Under the assumption that the time-series behavior (pattern) will persist in the future, this approach has been widely used in forecasting water consumption ( where C is a constant value, a 1 , . . . , a p are the coefficients of the autoregressive (AR) part and p is its order; b 1 , . . . , b q are the coefficients of the moving average part and q is its order.

ANNs
The ANN has been widely used as a popular tool for MWD In the present study, we intend to take benefit from the features of the PSO to forecast the MWD in Hail region.
MWD forecasting is posed as (or transformed into) an optimization problem. This problem is defined as in Equation (6): PSO is based on the concept of gradually evolving a swarm of possible solutions for an optimization problem (Trelea ). The potential solution is coded as the position of the ith particle in the d-dimensional search space: Let X 1 , X 2 , . . . , X d be the decision variables and k the iteration index of the optimization problem (6). During the search process, each particle in the swarm adjusts its position through a velocity operator defined as: where: • c 1 , c 2 are two constant factors, named cognitive and social coefficients • rand 1 , rand 2 are random numbers in the [0,1] range •P i k is the best previous position of the ith particle •G i k is the best previous position of the whole swarm • w k ¼ w max À (w max À w min =k max ) × k is a weighting function, known to ensure balance between search-space exploration and exploitation (Boubaker et al. ) • w max and w min are respectively the maximum and the minimum value of w k .
The position vector of the ith particle is then given by Equation (9): From the expertise about the MWD forecasting problem, one can define the search-space limits. Components having values out of their limits are reset through a confinement mechanism. The first step in using PSO to solve the problem (6) is to adopt a suitable mapping between the particle position in the swarm evolution concept and the vector of the MWD forecasting problem synthesis parameters.
Note that for this problem, the decision variables set is case-sensitive and that it depends on the nature of the available data, the forecasting horizon and the adopted model structure. More details will be provided when dealing with the progressive development of the chosen models based on PSO.

RESULTS AND DISCUSSION
MMWD is in general known to be affected by socioeco- They are then compared with those of the AR, ARIMA and ANN-based approaches.

Study area
Hail region (Saudi Arabia) is located in a semi-arid area and Uqda reservoir (20,000 m 3 ). Hail is located at an altitude of 900 m above sea level, so water is distributed without pumping (gravity).
Hail city districts are fed with municipal water using valves operated manually according to a weekly fixed schedule (GDWHR ).   Choosing the ANN architecture is the most consuming time step when using ANNs in prediction. The best architecture is decided here by trial and error procedure. After several trials, the best ANN structure is found to consist of an input layer, two hidden layers and one output layer.
The optimal numbers of neurons are found to be, respectively, 60 (for the input layer), 16 (for each one of the two hidden layers) and 10 (for the output layer). Linear activation functions are used for the input and the output layers whereas logarithmic-sigmoid activation function is used for the two hidden layers. The BP-FF-ANN has been trained using the Levenberg-Marquardt, the resilient backpropagation and the gradient Powell-Beale algorithms.
The last one is found to provide better results than the two others.
We define the order of the BP-FF-ANN as the number of inputs for comparison purpose against other techniques.
The maximum number of previous monthly MWD used as inputs is set to six. So, six BF-FF-ANN algorithms are trained and tested. The obtained results are depicted in Table 2.
All the models were developed in the same conditions Consequently a scaling procedure was used. The original data were divided by 10 7 . We can notice here that this transformation should not change the mean or the data distribution and that other normalization approaches can be used to re-scale the original data. The plot of the best

BF-FF-ANN(3) predicted versus observed monthly MWD
in Hail region is shown in Figure 4. It can be seen that this ANN-based approach presents high ability to track the consumption trend and seasonality.

Exploratory analysis
As mentioned before, predicting a time-series based on a static analysis of a time set is conducted in general through the Box-Jenkins methodology. The condition of stationarity has to be fulfilled (Vantuch & Zelinka ). A time-series is said to be stationary if it does not present any trend or seasonality through time (Ngo & Bros ). The monthly MWD set (composed of 69 records) presents the autocorrelation function (ACF) depicted in Figure 5.
It is shown that the ACF dies down extremely slowly, then it should be considered as non-stationary. The original TS is then differenced d times until a stationary set is obtained. Usually, one difference of the data set is sufficient.
By examining the plot of the ACF function of the differenced TS (see Figure 6), one can observe that it cuts off fairly      Table 3 and the plots of the forecasted and the observed monthly MWD is depicted in Figure 7 (only the graph of the best AR model is presented here).
From the results of Table 3, it can be observed that the AR(3) model performed better than the other models with the lowest MAPE (5.4007%), the highest R 2 (0.9201) and the lowest Akaike information criterion (AIC) (24.0005).

ARIMA(p,1,q) models
In order to construct the best ARIMA(p,1,q) model, the orders p and q need to be determined. Many combinations of the couple (p,q) have been tested and evaluated according to the four previously defined performance indicators.
The one-step-ahead forecasting is considered here. Four models along with their coefficients and performance indicators for both training and testing data are shown in Table 4.
It has been found that the ARIMA (1,1,1) is the most parsimonious model among all the tested models. Moreover, ARIMA(1,1,1) is judged to be adequate even when the nonstationary TS observations are used. Figure 8 illustrates the forecasted against the observed monthly MWD in Hail region obtained by ARIMA(1,1,1). This model is found to perform better than all the other models in terms of the performance indices (lowest MAPE, RMSE and AARE and  highest R 2 ). It can be concluded that although it is designed for linear data, the ARIMA model has the ability to capture non-linear behavior and produce relatively good fit for forecasting water demand. Intuitively, ARIMA(1,1,1) has the least number of parameters allowing it to acquire the minimum AIC. Consequently, it should be the best model according to the AIC performance indicator.
Since the original time-series, y(t) is not stationary, it has been differentiated once. The new once-differenced timeseries (Z(t) ¼ y(t)Ày(tÀ1)) is found to be stationary (see above under 'Exploratory analysis'). Considering that Equation (5) operates only on stationary time-series, y(t) has to be replaced by Z(t). Thus, Equation (5) on Z(t) becomes Z(t) ¼ C þ a 1 Z(tÀ1) þ e(t) þ b 1 e(tÀ1). By substituting Z(t) ¼ y(t)Ày(tÀ1) and Z(t-1) ¼ y(tÀ1)-y(tÀ2), the monthly MWD forecasting model is then given by the following equation: y(t) ¼ 16, 287 þ y(t À 1) À 0:6611(y(t À 1) À y(t À 2)) þ 0:3563e(t À 1) þ e(t) PSO-ARIMA(p,d,q) models Since MMWD in Hail region is derived from a stochastic process known to be highly non-linear and eventually presenting irregularities, the resulting modeling-identification problem is classified as hybrid and difficult to solve. This problem presents a high level of algorithmic complexity.
In such conditions, we have to resort to the PSO technique to address the modeling aspect of future water consumptions underlying on the limited available data (Parsopoulos & Vrahatis ).

Swarm model definition
The first step in a PSO-based approach is to ensure the mapping between the PSO world and the problem to be solved.
As it has been mentioned before, the one-step-ahead value of monthly MWD can be modeled using the ARIMA(p,d,q) Box-Jenkins methodology. The ARIMA(p,d,q) model defined in Equation (5) is adopted here and both the  The position vector is then defined by Equation (11): a 1 a 2 . . . a p a pþ1 . . . a pmax q b 1 This encoding scheme presents the particularity that it contains, simultaneously, integer and real-coded variables.
C, a 1 , a 2 , . . . , a p , b 1 , b 2 . . . b q are real-coded variables, p and q are integer variables. To avoid unnecessary computations, the maximum values for p and q respectively p max , q max are set to 5. Therefore, once p and q are known, all the coefficients between a pþ1 and a pmax (respectively between b qþ1 and b qmax ) are set to 0. The search-space limits are defined in Table 5.
Since we have no idea about the magnitude of the ARIMA model coefficients, the objective function f in Equation (12) has been minimized by the 'fminunc' (unconstrained minimization function of the optimization toolbox in Matlab) with different initial guesses (because it is a local gradient-based minimization function). Values of C have been found to range from approximately 10,000 to 350,000 and the coefficients a 1 , a 2 , …, a p , b 1 ,b 2 , …, b q are found to be between À1 and 1.

Fitness function evaluation
qmax ] be the ith particle position, and then the quality of this particle is evaluated through its fitness function.
The objective function adopted in this study is the squared error between the observed and the forecasted values for the monthly MWD defined in Equation (12) (Hu ): One of the features of the PSO-based approach is that it does not require any computation of the objective function derivatives. The PSO concept is based in particular on the computation of the objective function itself.

PSO-ARIMA(p,d,q) procedure for modeling monthly MWD
The PSO-ARIMA(p,d,q) approach for modeling monthly MWD in Hail region is described as follows.
(2) Load the historical records of the monthly MWD in Hail region from an excel file (y(t), t ¼ 1:69).
(5) Initialize the swarm particles randomly within the search-space limits as defined in Table 5.
(6) For each particle, compute the fitness function as defined in Equation (12) using the N 1 ¼ 40 records.
(7) Determine the index of the best particle among the swarm and setG (i) to the position of this particle.
(8) Set the personal best of each particleP (i) k to its current position.
(12) If one of the real-coded variables comes out of the search-space limits defined in Table 5, confine it by resetting within the search-space.  (13) Compute the fitness function for each particle according to Equation (12). Compare the current fitness of each particle to its previous personal best fitness. If there is improvement, then setP (i) k ¼X (i) k . (14) Determine the index of the best particle in the swarm and updateG (i) accordingly.
(15) Test the stopping criterion, if not reached loop to Step 9. If reached, recordG (i) as the best solution.

Adaptation procedure for the real and integer decision variables
In this subsection, more emphasis on steps (10) and (11) of the PSO-ARIMA procedure is put forward to clarify the adaptation procedure used to simultaneously handle the real and integer variables of the studied forecasting problem. By reference to Equation (11) for the position vector and to Table 5 for the meaning and limits of the position components, the following example is considered. Assume that at the kth iteration, the decision variables vector of the ith particle is given by X k i ¼ [29 × 10 4 , 2.11, 0.7, À0.8, 3.2, À1.7, 3.11, 3.92, 0.05, À0.2, 0.17, 4.5 2.73] (this vector is the result of step (10)). Since p and q are integer variables, the second and the eighth related values are rounded-off to the nearest integer numbers. p ¼ round-off (2.11) ¼ 2 and q ¼ round-off (3.92) ¼ 4. Consequently, a 3 ¼ a 4 ¼ a 5 ¼ 0 and b 5 ¼ 0. As a result of the adaptation procedure (output of step (11)), the ith particle position vector is then given by X k i ¼ [29 × 10 4 , 2, 0.7, À0.8, 0, 0, 0, 4, 0.05, À0.2, 0.17, 4.5, 0].

Numerical results
The simulations were performed in a Matlab environment using our proper code. The data set was divided into a modeling sub-set (composed of 40 records) and a validation subset (composed of the remaining 29 records). The PSO algorithm was conceptually stochastic. For this reason, 300 runs were carried out and the results of the best run are depicted in Table 6. The corresponding best solution is given in Equation (  Since the optimal values obtained for p and q are equal to 1, all the coefficients a 2 , . . . , a 5 and b 2 , . . . , b 5 are equal to 0. The best model is then expressed as in Equation (14): The performance of both training and validation data sets considered together as well as the training and validation data sets considered separately are summarized in Table 6. The results of the best BP-FF-ANN(3), AR(3), STS-ARIMA(1,0,1) using the original records and the STS-ARIMA(1,1,1) using the stationarized records are included in the same table for comparison.
The graph of the actual and forecasted monthly MWD is depicted in Figure 9. The convergence characteristics of the optimal PSO-ARIMA structure are depicted respectively in Figures 10-13, which reveal that the values of the best fitness function decreases along with a constant value since the iteration 30 (from an optimization process of 100 iterations). Similar behaviors are shown for, respectively, the optimal p, q, a 1 and b 1 . In fact, all these parameters converge to their optimal values reported in the optimal global solution (Equation (14)).
As shown in Figure 13, the behavior of the whole swarm along with the iterations is shown respectively for iteration 1, 20, 50 and 100. The proposed PSO-based approach drives the swarm to a steady state (for the optimal a 1 and b 1 ). The remaining optimal parameters behave similarly.

Comparative analysis and concluding remarks
According to the obtained results, it has been shown that the used PSO synthesis parameters are effective for modeling   approach should be the preferred approach (over ANN and STS) due to its simplicity without needing data pre-processing (data re-scaling in ANN and stationarization in STS).
The comparative results in Table 6 reveal that the PSObased approach outperforms both ANN and STS approaches in terms of the four computed performance indicators (a MAPE of 5.2832%, an R 2 of 0.9375, an RMSE of 2.2111 × 10 5 m 3 and an AARE of 5.2911%) for the global data sets and for the training and validation data sets considered separately. The testing phase is the one that is more representative of the actual performances of the model in a real application framework and the RMSE is the most significant performance indicator because it represents the objective function to be minimized. The value of the RMSE for the PSO-ARIMA in the testing phase (2.2111 × 10 5 m 3 ) is better than the RMSE of the STS-AR(3) in the same phase (2.3119 × 10 5 m 3 ). By computing the relative difference between the two values, it has been found that the PSO-ARIMA is 4.36% better than the STS-AR(3).
In terms of monthly MWD profile tracking, all the developed models present a whole behavior characterized by forecasted values that follow the actual values relatively closely.

CONCLUSIONS AND RECOMMENDATIONS
In this paper, an approach based on an ARIMA model combined to the PSO as an identification tool was developed and applied to the forecasting of one-month-ahead municipal water demand in Hail region, Saudi Arabia. The modeling-identification problem included integer-valued (model orders) and real-valued (model coefficients) decision variables. It was then considered as a hybrid complex optimization problem for which classical optimization techniques suffer from combinatoric explosion (for the discrete aspect) and high ability to be trapped into local minimum points (for the continuous aspect). The proposed approach has the advantage of optimizing integer and real variables simultaneously. Moreover, it has been shown to operate efficiently in complex search-spaces and converge to nearoptimal solution in a reasonable computation time. Since the PSO is a conceptually stochastic optimization algorithm, 300 runs have been carried out using 69 records of monthly water demand. From the results of the best PSO-ARIMA run reported in this paper, this approach has been shown to outperform BP-FF-ANN, STS-AR and STS-ARIMA based solutions in terms of four performance indicators namely the MAPE, R 2 , RMSE and AARE for both training and testing data sets.
More than being relatively better in term of forecasting accuracy, the proposed PSO-ARIMA approach should be preferred for being insensitive to the forecasting problem irregularities including convexity and differentiability. In addition, the PSO-ARIMA does not require any data preprocessing (data re-scaling in ANN and data stationarity in STS).
Forecasting monthly MWD is classified as mediumrange; it can participate to efficient operation and To study the effect of climatic changes on monthly MWD, other ways can be explored. In particular, for Hail region, two periods (summer and winter) will be considered separately. If sufficient metering devices are available, the living levels in different cities of Hail town will be taken into consideration. A future issue will be based on forecasting climatic and socio-economic variables first and after that use them to forecast water consumption.