A comparison of five variants of a procedure for the spatial aggregation of synthetic water demand time series is presented. The procedure spatially aggregates hourly synthetic water demand time series that preserve mean and variance at user level, in such a way that the statistics of the spatially aggregated time series (mean, variance, lag-1 temporal correlation, lag-1 temporal covariance) are reproduced. Five different ways of applying the methodology are considered and compared. Application to a case study consisting of the water demands of 21 users highlights that the variants show different levels of effectiveness in reproducing the statistics of interest and different computational burdens, but overall represent a valid tool for the bottom-up generation of synthetic water demand time series.
Users' water demands are the main driver of water distribution systems (WDSs). Their proper characterization thus represents a fundamental prerequisite for setting up a robust and accurate hydraulic model of a WDS. To this end, several models for the generation of synthetic user water demand time series have been proposed in recent decades; these models allow for the characterization of the user water demand even with fine time steps and are typically based either on stochastic processes, like Poisson rectangular pulses (e.g., Buchberger & Wu 1995; Guercio et al. 2001) or Neyman-Scott rectangular pulses (e.g., Alvisi et al. 2003; Alcocer et al. 2006), or on the characterization of the uses of individual devices such as washing machines, toilets, etc., typically found in houses (e.g., Blokker et al. 2010). These models were shown to be very effective in statistically reproducing the observed data at the level of single users and at low temporal aggregation levels, but it is worth remembering that in WDS simulation models the water demands are typically allocated at the nodes of the model by aggregating the consumption of many users adjacent to the node considered, and hourly or semi-hourly time steps are generally used. Thus, it is necessary to transfer the information from one level of spatial-temporal aggregation to another in order to set up synthetic water demand time series that can be effectively used within hydraulic models of real/complex WDSs.
Indeed, several studies (e.g., Alvisi et al. 2003; Moughton et al. 2006) showed that both the simple temporal aggregation of the synthetic series, from, for example, 1-minute to 1-hour time step, and the spatial aggregation from single users to a group of users, performed through a simple sum, can lead to time series which do not reproduce the corresponding observed statistics (mainly variance and temporal covariances at different time-lags). In fact, users' water demand time series can be temporally and spatially correlated since they depend on the users' habits and lifestyle; thus, for example, a hot summer day can lead to very high water consumption of several users during evening hours due to irrigation of gardens, showers, etc. (Bakker et al. 2013). In this respect, Alvisi et al. (2003) highlighted that reproducing the covariances within and among the series to be aggregated (i.e., both temporally and spatially) is a prerequisite in order to properly reproduce the variances of the spatially aggregated series. In fact, the aggregation procedure consists of adding up a certain number of random variables (which represent the users' water demands), and the variance of a random variable that is a sum of random variables is equal to the sum of the variances of the individual random variables plus twice the sum of their covariances.
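The variance identity recalled above can be checked numerically. The following is a minimal sketch, assuming three hypothetical, positively correlated series drawn from a multivariate normal distribution (the covariance matrix, means and sample size are illustrative values, not data from the case study); it shows that the sum of the individual variances alone underestimates the variance of the sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance matrix for three "users" (illustrative values only).
cov_true = np.array([[1.0, 0.6, 0.3],
                     [0.6, 2.0, 0.5],
                     [0.3, 0.5, 1.5]])
x = rng.multivariate_normal(mean=[5.0, 8.0, 6.0], cov=cov_true, size=10_000)

c = np.cov(x, rowvar=False)                  # sample covariance matrix (3 x 3)
var_of_sum = np.var(x.sum(axis=1), ddof=1)   # variance of the aggregated series
sum_of_vars = np.trace(c)                    # sum of the individual variances
sum_of_covs = np.triu(c, k=1).sum()          # sum of covariances over pairs i < j

# Var(sum) = sum of variances + 2 * sum of pairwise covariances
identity_rhs = sum_of_vars + 2.0 * sum_of_covs
print(var_of_sum, identity_rhs, sum_of_vars)
```

With positive covariances, `sum_of_vars` is smaller than `var_of_sum`, which is the underestimation that arises when independently generated series are simply added.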
Summing up, it is important to be able to ‘transfer’ the time series generated at low temporal and spatial aggregation levels (e.g., at 1-minute time step for single users) to higher aggregation levels (e.g., at 1-hour time step for groups of users), since these latter series are those typically utilized within the simulation models for the design and management of WDSs. At the same time, the main statistics (mean, variance and temporal covariance) should be reproduced at these higher aggregation levels since they can have a significant impact on the hydraulic performances of the WDSs, as demonstrated by some recent studies (e.g., Babayan et al. 2005; Filion et al. 2007).
The temporal aggregation problem of time series from minute to hourly time step has been recently addressed by Alvisi et al. (2014b). Thus, in this paper, attention is focused only on the spatial aggregation of hourly water demand time series of single users in order to obtain hourly time series representative of a group of users to be used within a hydraulic simulation model, which can reproduce the main statistics observed at the same spatial aggregation level, i.e., hourly means, variances, covariances and correlations at different time-lags. To this end, taking our cue from Alvisi et al. (2014a), a spatial aggregation procedure based on the method proposed by Iman & Conover (1982) is presented, and five different ways of applying this method are compared. In the following, the underlying idea of the procedure and its five variants are described; the results obtained by their application to the case study consisting of the water demands of 21 users of the WDS of Milford (Ohio) (Buchberger et al. 2003) are discussed; finally, conclusions are presented.
Since the two variances shown in Equation (4) are preserved at the level of single user (this is an assumption; see above), in order to reproduce the (spatial) covariances among the time series of the single users, the spatial cross-correlations have to be additionally preserved and can be obtained by using the method proposed by Iman & Conover (1982) (hereinafter indicated as IC). This method is based on the reordering of the samples of water demands of the single users generated independently of each other (this is an assumption; see above) so as to impose the cross-correlations among them. The method can be schematized as follows.
Let X be the matrix of the generated samples, made up of Nc columns, where each column represents one of the random variables considered (e.g., the water demands of the nus users in the hour h of the day) and Nr rows, where each row represents one realization (e.g., the ndie generated days). Furthermore, let R* be the target (Nc × Nc) cross-correlation matrix calculated by using the observed data and R the (Nc × Nc) cross-correlation matrix calculated by using the generated samples: the generic element ρij of R thus represents the Pearson correlation coefficient between the ith column and the jth column of X (with i,j = 1,2, … ,Nc). The main steps are:
the target cross-correlation matrix R* is written as R* = PPᵀ by using Cholesky factorization (Press et al. 1990), P being a lower triangular matrix and Pᵀ its transpose;
the cross-correlation matrix R is similarly written as R = SSᵀ by using Cholesky factorization (Press et al. 1990);
the matrix X1 = X(PS⁻¹)ᵀ is computed; it is worth noting that the matrix X1 has the same dimensions as X;
the desired matrix X* is created by reordering the entries within each column of the matrix X, so that the ranking of the entries in each column of X* is equal to the ranking of the entries in the corresponding column of X1. In this way, the matrices X* and X1 have the same rank correlation matrix and, as a consequence, similar Pearson correlation matrices.
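The four steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `iman_conover`, the gamma-distributed test data and the target matrix are hypothetical. The sketch also standardizes each column before the transformation, which is an assumption added here so that the transformed matrix has the target Pearson correlations even when the columns have different scales.

```python
import numpy as np

def iman_conover(x, r_target):
    """Reorder each column of x so that its correlation matrix approximates r_target."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)               # standardized copy (assumption)
    p = np.linalg.cholesky(r_target)                       # R* = P P^T
    s = np.linalg.cholesky(np.corrcoef(z, rowvar=False))   # R  = S S^T
    x1 = z @ (p @ np.linalg.inv(s)).T                      # same dimensions as x
    # Rank of every entry within its own column of x1 (0 = smallest).
    ranks = np.argsort(np.argsort(x1, axis=0), axis=0)
    # Place the sorted values of each column of x according to the ranks of x1.
    return np.take_along_axis(np.sort(x, axis=0), ranks, axis=0)

rng = np.random.default_rng(1)
# Three independently generated columns with different scales (hypothetical data).
x = rng.gamma(shape=2.0, scale=[1.0, 3.0, 0.5], size=(5000, 3))
r_target = np.array([[1.0, 0.5, 0.3],
                     [0.5, 1.0, 0.4],
                     [0.3, 0.4, 1.0]])

x_star = iman_conover(x, r_target)
r_before = np.corrcoef(x, rowvar=False)
r_after = np.corrcoef(x_star, rowvar=False)
print(np.round(r_before, 2))
print(np.round(r_after, 2))
```

Because each column of `x_star` is only a permutation of the corresponding column of `x`, the mean and variance of every column are unchanged; only the pairing of values across columns is altered.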
Summing up, imposing the spatial cross-correlations among the single user water demands through the IC method allows reproducing, according to Equations (3) and (4), the hourly means and variances of the spatially aggregated time series.
In order to also preserve the temporal covariances and correlations within the spatially aggregated time series, the IC method is applied again at this latter level by imposing the temporal correlation coefficients from lag-1 to lag-23.
Five different ways of applying the IC procedure for the spatial aggregation of the time series of the single user water demands are considered here; these variants differ in the spatial and temporal cross-correlation coefficients imposed simultaneously on the series to be spatially aggregated, as explained in the subsequent sections. Once the spatially aggregated time series is obtained, the IC method is applied again in order to preserve the temporal covariances and correlations at different time-lags.
The five variants of the procedure compared
In the first variant, called V1, the IC procedure is applied to the independently generated synthetic hourly time series of individual users, which are expected to reproduce mean and variance for each hour h of the day; the target is the preservation of the lag-0 spatial cross-correlations among the water demands of individual users. The time series of the users are then summed up hour by hour, producing the time series of the group of users. Operatively, the IC procedure is applied 24 times, once for each hour h of the day, in order to impose the lag-0 spatial cross-correlations among the water demands of individual users for each hour of the day: the matrix X of the generated samples is thus made up of nus columns, where each column represents the water demand of the jth user (with j = 1:nus) in the hth hour of the day as a random variable, and the corresponding correlation matrix R has dimensions equal to nus × nus.
Once the spatially aggregated time series is produced through this variant, the IC method is applied once more in order to directly impose the temporal correlations from lag-1 to lag-23. To this end, the matrix X is made up of 24 columns, where each column represents the random variable water demand of the spatially aggregated time series in each hour of the day. This last step will be hereinafter indicated as TCas (Temporal Correlation of the aggregated series).
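The V1 workflow followed by the TCas step can be sketched as follows. This is a hedged illustration, not the authors' code: the helper `iman_conover` (with its column standardization), the gamma-distributed demands, the uniform spatial correlation of 0.3 and the temporal target 0.6^|lag| are all hypothetical choices made only to produce a runnable example.

```python
import numpy as np

def iman_conover(x, r_target):
    """Reorder each column of x so that its correlation matrix approximates r_target."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)               # standardized copy (assumption)
    p = np.linalg.cholesky(r_target)
    s = np.linalg.cholesky(np.corrcoef(z, rowvar=False))
    x1 = z @ (p @ np.linalg.inv(s)).T
    ranks = np.argsort(np.argsort(x1, axis=0), axis=0)
    return np.take_along_axis(np.sort(x, axis=0), ranks, axis=0)

def lag1(series):
    """Correlation between consecutive hours of the day, for a (days x 24) matrix."""
    return np.diag(np.corrcoef(series, rowvar=False), k=1)

rng = np.random.default_rng(2)
n_days, n_users, n_hours = 300, 6, 24
# Independently generated user demands: (days, users, hours), hypothetical data.
demand = rng.gamma(2.0, 10.0, size=(n_days, n_users, n_hours))

# Hypothetical targets: lag-0 spatial matrix (same for every hour) and temporal matrix.
r_space = np.full((n_users, n_users), 0.3)
np.fill_diagonal(r_space, 1.0)
hours = np.arange(n_hours)
r_time = 0.6 ** np.abs(np.subtract.outer(hours, hours))

simple_sum = demand.sum(axis=1)                  # aggregation without any reordering

# V1: one IC application per hour of the day, then sum over users.
reordered = np.empty_like(demand)
for h in range(n_hours):
    reordered[:, :, h] = iman_conover(demand[:, :, h], r_space)
aggregated = reordered.sum(axis=1)               # (days, 24)

# TCas: one further IC application on the 24 columns of the aggregated series.
after_tcas = iman_conover(aggregated, r_time)

print(simple_sum.var(axis=0).mean(), aggregated.var(axis=0).mean())
print(lag1(aggregated).mean(), lag1(after_tcas).mean())
```

In this sketch the TCas step only permutes values within each hourly column of the aggregated series, so the hourly means and variances obtained after the spatial step are left untouched while the within-day lag-1 correlations move towards the target.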
In the second variant, called V2, the IC procedure is preliminarily applied to each of the synthetic hourly time series of the single users in order to reproduce the temporal correlations from lag-1 to lag-23, thus obtaining synthetic time series at user level which preserve mean, variance and temporal covariances at different time-lags. Operatively, the IC procedure is thus applied nus times, once for each user: the matrix X of the generated samples is made up of 24 columns, where each column represents the water demand of the considered user in one hour of the day as a random variable, and the corresponding correlation matrix R has dimensions equal to 24 × 24. After that, the variant V2 follows the same operating procedure as V1, that is, the IC procedure is applied 24 times, once for each hour h of the day, to the synthetic hourly time series (which are now expected to reproduce mean, variance and covariance for each hour h of the day) in order to impose the lag-0 cross-correlations, and the time series of the users are then summed up hour by hour, producing the time series of the group of users. Finally, the TCas step is applied (see variant V1).
It is worth noting that this method, depending on the number of users nus involved, may imply the need to estimate and work with a very high number of correlation coefficients (very large correlation matrix). The numerical application will show that imposing such a high number of correlation coefficients can be difficult to achieve. For this reason, two other variants were considered as alternatives to V3.
In the second alternative (to V3), called V5, the water demands are spatially aggregated by applying the same operating procedures as in V3 but in successive steps, each time considering subgroups of the total nus users to be aggregated; as a result, the dimensions of the cross-correlation matrix will be smaller. Practically speaking, the nus users are evenly divided into ng subgroups and for each subgroup V3 is applied in order to obtain the spatially aggregated time series of the subgroup; finally, V3 is applied once again, in this case to the series of ng spatially aggregated subgroups. For example, assuming a number of users nus = 20, these users could be divided into ng = 4 subgroups containing five users each: by applying V3 to each subgroup it is possible to obtain ng = 4 series of ‘partially’ spatially aggregated water demands, which are in turn spatially aggregated, again by applying V3, so as to obtain the time series of the overall aggregate of nus = 20 users; this approach enables us to work with correlation matrices which are smaller than the one that would result if all users were considered together. In this case as well, the TCas step is finally applied (see variant V1) to the spatially aggregated time series.
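The reduction in matrix size obtained with subgroups can be illustrated with simple arithmetic. The sketch below assumes that a V3-type correlation matrix has one row and column per series and per hour of the day, i.e., dimensions (24 · n) × (24 · n) for n series; this sizing is an assumption inferred from the description of V3 as imposing all spatial and temporal cross-correlations from lag-0 to lag-23, and the function name is hypothetical.

```python
def correlation_matrix_size(n_series, n_hours=24):
    """Return (matrix dimension, number of distinct off-diagonal coefficients)."""
    size = n_series * n_hours
    return size, size * (size - 1) // 2

all_users = correlation_matrix_size(20)     # all 20 users handled together
per_subgroup = correlation_matrix_size(5)   # one subgroup of five users
final_stage = correlation_matrix_size(4)    # the four partially aggregated series

print(all_users)      # (480, 114960)
print(per_subgroup)   # (120, 7140)
print(final_stage)    # (96, 4560)
```

Under this assumption, each matrix handled in the stepwise approach contains far fewer coefficients than the single matrix needed when all 20 users are considered together.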
The proposed procedure (featuring the five variants) was applied to the water demands of 21 users of the WDS of Milford (Ohio) (Buchberger et al. 2003); the observation period had a length of 31 days, from 11 May to 10 June 1997. The hourly time series for each of the nus = 21 users to be spatially aggregated were generated by using the procedure described by Alvisi et al. (2014b). Each generated hourly time series representing the water demand of a user has a length of ndie = 90 days, and the generation process, and consequently the spatial aggregation process, was repeated nrep = 500 times. The generated synthetic series reproduce, for each user, the hourly mean and variance of the corresponding observed series. More details about the generation of the time series with these characteristics are provided in Alvisi et al. (2014b).
Application of the variants of the procedure and discussion of the results
In Figure 1, the variances, the lag-1 correlation coefficients before and after the TCas step, and the lag-1 temporal covariances after the TCas step, of the spatially aggregated synthetic series obtained following the application of the five variants are presented and compared with the corresponding observed statistics; in particular, as regards the synthetic series, the statistics obtained for each of the nrep = 500 series are shown, along with the values averaged over the nrep synthetic series.
First, as illustrated in Figure 1(a), the simple sum of the individual user water demands results in an underestimation of the (hourly) variance of the aggregated historical time series, returning a CD of 0.27. This is due to the fact that no spatial cross-correlations have been imposed and preserved, and in particular no spatial cross-correlations at lag-0 for each of the 24 hours; therefore, the sum of the covariances of Equation (3) is close to zero, resulting in an underestimation of the variance of the aggregated data. Clearly, also the lag-1 temporal correlations of the spatially aggregated time series before the TCas step are not reproduced, the users' time series being temporally uncorrelated, whereas following the TCas step, the lag-1 correlations of the aggregated time series are improved; overall, however, the lag-1 temporal covariances of the aggregated time series are not well reproduced (CD = 0.11) due to the low effectiveness in reproducing the variances (see Equation (4)).
Variant V1 involves reordering the users' water demands so as to impose lag-0 spatial cross-correlations among the water demands of the individual users. For example, Figure 2 shows the target cross-correlation matrix R* calculated by using the observed data for hour h = 12 a.m. and the cross-correlation matrix R calculated by using the generated samples before and after applying the Iman & Conover (1982) approach. As can be observed, the approach allows for obtaining generated water demands whose cross-correlation matrix approximates very well the observed one.
Consequently, as shown in Figure 1(b), V1 enables excellent reproduction of the variance of the aggregated historical time series, with a CD of 0.94; as for the previous case, the lag-1 temporal correlations of the aggregated time series before the TCas step are not reproduced, the aggregated users' time series being temporally uncorrelated, whereas following the TCas step, the lag-1 correlations of the aggregated time series are improved, leading, together with the excellent reproduction of the variance (see Equation (4)), to a rather good reproduction of the temporal lag-1 covariances (CD = 0.64).
V2 is similar to V1, but the single users' water demand time series are first reordered in order to impose their own temporal correlations, before the lag-0 cross-correlations are imposed. As can be observed in Figure 1(c), this variant is as effective as the previous one as far as the variance is concerned; on the other hand, thanks to the imposition of the temporal correlations on each user time series, the lag-1 temporal correlations of the aggregated series before the TCas step are better reproduced, and slightly better performance (with respect to V1) in reproducing both the lag-1 correlations (CD = 0.65 vs. CD = 0.55) and covariances (CD = 0.73 vs. CD = 0.64) can be observed also after the application of the TCas step.
V3 simultaneously imposes all the temporal and spatial cross-correlations from lag-0 to lag-23 (see Equation (5)) within and among the different users. As shown in Figure 1(d), V3 enables good reproduction of the variance of the aggregated historical time series, with a CD of 0.72, although this value is lower than the one obtained with V1 and V2; the effectiveness in reproducing the lag-1 temporal correlation of the aggregated series is good before the TCas step (CD = 0.54) and it significantly increases after the TCas step (CD = 0.90), leading to a CD = 0.64 in reproducing the lag-1 covariances, as for V1. V3 is less effective than V1 and V2 in reproducing the variance due to the fact that (1) the variance in each hour depends on the lag-0 spatial cross-correlations and (2) a larger number of correlation coefficients is imposed in V3 than in V1 and V2: in V1 and V2 only spatial cross-correlations at lag-0 are imposed, whereas in V3 all spatial-temporal cross-correlations from lag-0 to lag-23 are imposed. In practical terms, when so many correlation coefficients are imposed simultaneously, the lag-0 spatial cross-correlations are reproduced less accurately than in V1 and V2, and thus the variances of the aggregated historical time series are less well preserved.
V4 simultaneously imposes temporal correlations for every user and lag-0 spatial cross-correlations through the construction of a single banded correlation matrix (see Equation (6)). As shown in Figure 1(e), V4 enables excellent reproduction of the variance of the aggregated historical time series, similar to V1 and V2, with a CD of 0.93; the improvement over V3 in terms of reproducing the variance is understandable considering the smaller number of cross-correlations imposed. However, it performs worse than V3 in reproducing the lag-1 temporal correlation of the aggregated series both before and after the TCas step resulting in a CD = 0.51 for the lag-1 temporal covariance of the aggregated time series, lower than the corresponding ones of V1, V2 and V3.
V5, based on the aggregation of water demands in successive steps, was applied by dividing the nus = 21 users into ng = 3 groups of seven users each. Compared to V3 it involves estimating a smaller number of correlation coefficients for each matrix. As shown in Figure 1(f), V5 enables good reproduction of the variance of the aggregated historical time series, with a CD of 0.88; it is, moreover, highly effective in reproducing the lag-1 temporal correlation of the aggregated series both before and after the TCas step resulting in a CD = 0.83 for the lag-1 temporal covariance of the aggregated time series, the highest value among all the variants.
In short, as summarized in Table 1, where the five variants are ranked according to the effectiveness in reproducing each statistic, V1 and V2, followed by V4, V5 and V3, showed the best results in reproducing the hourly variance of the aggregated historical time series; V5 in turn was shown to be the most effective in reproducing both the lag-1 temporal correlation and lag-1 temporal covariance of the aggregated historical time series after the TCas step, followed by V3 and V2 as far as the lag-1 temporal correlation was concerned and by V2 and V1 for the lag-1 temporal covariance.
|Variance||Lag-1 temporal correlation||Lag-1 temporal covariance||IC method application||Σrank||Total rank|
Finally, in terms of computational burden, it is worth noting that the number of times nIC that the IC method is applied (before the TCas step, which is common to all the variants) can vary significantly from one variant to another, as detailed in Table 1: in V1 the IC method is applied 24 times, once for each hour h of the day, in order to impose the lag-0 spatial cross-correlations among the water demands of individual users, whereas in V2, where the temporal correlations within each user water demand time series are also preliminarily imposed, the IC method is applied nIC = 21 + 24 = 45 times; in V3 and V4 the IC method is applied only once, even if the correlation matrix is very large, and in V5 the IC method is applied three times, considering each time a group of seven users, and once considering the three resulting partially aggregated series, for a total of nIC = 4 times.
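The counts above follow from simple bookkeeping, which can be written out as a short sketch. The function name and argument names are hypothetical; the formulas only restate the description of the variants (24 hourly applications for V1, one additional application per user for V2, a single application for V3 and V4, and one application per subgroup plus one final application for V5).

```python
def ic_applications(n_users, n_groups, n_hours=24):
    """Number of IC applications before the TCas step, per variant."""
    return {
        "V1": n_hours,             # one per hour of the day
        "V2": n_users + n_hours,   # one per user, then one per hour
        "V3": 1,                   # single large correlation matrix
        "V4": 1,                   # single banded correlation matrix
        "V5": n_groups + 1,        # one per subgroup, plus the final aggregation
    }

counts = ic_applications(n_users=21, n_groups=3)
print(counts)  # {'V1': 24, 'V2': 45, 'V3': 1, 'V4': 1, 'V5': 4}
```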
Overall, considering both the effectiveness in reproducing all the statistics and the computational burden, a total ranking of the five variants can be computed using the approach for ranking solutions according to different criteria proposed by Ostfeld et al. (2012), that is, summing up for each solution the rank values of the solution corresponding to each criterion, and identifying the solution which is characterized by the lowest total rank value. In this case, thus, the rank values obtained by each variant were summed up (see Table 1, column ‘Σrank’) and the variant characterized by the lowest total value was identified. V5 is the variant characterized by the best total rank, providing a good trade-off between effectiveness in reproducing the statistics and computational burden.
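The rank-sum selection described above can be sketched as follows. The three alternatives "A", "B", "C" and their rank values are purely illustrative placeholders, not the values of Table 1; each list holds the rank of an alternative under four criteria (1 = best).

```python
# Hypothetical rank values per criterion (placeholders, NOT the Table 1 values).
ranks = {
    "A": [1, 3, 2, 4],
    "B": [2, 1, 1, 2],
    "C": [3, 2, 3, 1],
}

# Sum the ranks of each alternative over all criteria.
totals = {name: sum(values) for name, values in ranks.items()}

# The preferred alternative is the one with the lowest total rank value.
best = min(totals, key=totals.get)
print(totals, best)
```

In this scheme all criteria carry equal weight, so an alternative that is never the best under any single criterion can still obtain the lowest total.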
This paper proposes and compares five different variants of a procedure for the spatial aggregation of synthetic water demand time series, all based on the IC method. The variants differ in the spatial and temporal cross-correlation coefficients imposed on the series to be spatially aggregated and thus in the way the IC method is applied. Application of these variants to a real case involving the water demands of 21 users showed that all of them allowed the statistics of interest of the spatially aggregated historical time series to be reproduced. In particular, all of them provide better results than the simple sum of spatially uncorrelated synthetic time series in terms of ability to preserve the hourly variance of the aggregated data. This confirms the importance of preserving the spatial correlations of individual user water demands in the spatial aggregation procedure. On the other hand, comparing the variants among themselves, it was observed that the effectiveness in reproducing the statistics, and the computational burden, varied from one variant to another. In the specific case concerned, it was observed that the procedure of aggregation based on user subgroups (variant V5) is the one that shows the best trade-off between computational burden and effectiveness in reproducing the main statistics (means, variances, lag-1 temporal correlations and lag-1 temporal covariances) of the aggregated historical time series. In conclusion, the results obtained showed that the procedure in general, and the variant V5 in particular, represent a valid tool for ‘transferring’ the time series generated at low levels of spatial aggregation (individual users) to series relating to groups of users, thereby enabling the main statistics of the corresponding historical time series to be preserved at these higher levels of spatial aggregation.
The authors wish to thank Prof. Steven Buchberger for providing the Milford data that were used in the numerical applications. This study was carried out as part of the PRIN 2012 project ‘Tools and procedures for an advanced and sustainable management of water distribution systems’ and under the framework of Terra & Acqua Tech Laboratory, Axis I activity 1.1 of the POR FESR 2007–2013, project funded by the Emilia-Romagna Regional Council (Italy) (http://fesr.regione.emiliaromagna.it/allegati/comunicazione/la-brochure-dei-tecnopoli).