The effect of different temporal (from seconds to months) and spatial aggregation scales (from individual users to full urban areas) on water demand behavior has been explored to a limited degree. The effort described here extends those works by evaluating the scale effects of residential water consumption in a unique US data set that covers 10,000 households with a 1-gallon (3.79 L) hourly resolution over 2 years. A preliminary data analysis and a sequential Principal Component Analysis (PCA) are carried out to assess the effect of different temporal (weekly, daily, hourly) and spatial aggregation (individual meters and groups of 10, 100 and 1,000 meters) levels on demand. Results show that individual users act very differently from each other, and individual consumer variability is only canceled out when a significant number of households are aggregated. The implications of this finding are assessed from a hydraulic modeling perspective as the spatiotemporal scale of measurements may condition the type of analysis that can be carried out in practice. However, additional work is needed to explore the point at which it may be worth embracing a micro (per fixture/household) or a macro (per node/network) approach for different purposes.

The availability of smart technologies for water supply systems has increased considerably over the last decades. One of the key advances is the emergence of Advanced Metering Infrastructure (AMI), which remotely collects water consumption data with high temporal and spatial resolution (USEPA 2023). Consumer demands condition flow and pressure throughout the network and are integral to modeling studies and real-time operational decisions.

Water demand is stochastic in nature and responsible for most of the variability in pressure or flow within a water supply system (Magini et al. 2008). Individual consumer demand variability becomes relatively less important when aggregating users. The effect of spatial resolution on water demands and network flows has been studied in the past. Transport mains (with few demand nodes that represent aggregated users) maintain more consistent flow rates and have higher correlation than individual consumer withdrawals (e.g., Blokker et al. 2008).

Water demands on a household level behave as sporadic pulses (e.g., Buchberger & Wu 1995; Blokker et al. 2010). These pulses are the result of each user's behavior that is individually specific and independent from other users (Díaz & González 2020). Thus, consumers' demands typically have low auto and cross-correlations (e.g., Filion et al. 2006; Blokker et al. 2008). However, as users are exposed to similar external factors (e.g., weather, work schedules), apparent correlations are seen in flow series that represent aggregated users (Díaz & González 2021, 2022). This aggregation effect permits the common modeling practice of lumping users at a single node in network hydraulic models (e.g., Kang & Lansey 2009; De Oliveira & Boccelli 2021).

In this work, two scales are differentiated: the micro and macro scales. The micro level refers to individual fixtures or households, typically associated with time scales in the order of seconds/minutes or minutes/hours, that are determined by the household end-uses (e.g., taking a shower or washing hands). The macro level corresponds to users/households that are aggregated at a node or networks/subnetworks (sets of nodes) within the system. The time scale for nodal measurements is usually the hydraulic model time step (from several minutes to hours), whereas the network temporal scale can be several minutes, days or even months depending upon application. Table 1 summarizes the micro- and macro-scale definition and properties of water consumption. The boundaries between micro and macro scales are blurry. Scales for nodes and networks/subnetworks are related to population density. For example, a node in an urban network may represent as many consumers as a network or DMA (District Metered Area) in a suburban/rural system. In some other infrastructure systems, the ‘in-between’ (i.e., user aggregation) may be called meso-scale (e.g., Li et al. 2023), but in this work, scales are limited to two categories (micro and macro).

Table 1

Definition and properties of water consumption

| | Micro scale: Fixture | Micro scale: Household | Macro scale: Node | Macro scale: Network |
|---|---|---|---|---|
| Consumer | Individual person | Family | WDS model node | DMA or source inflow (tank, well, WTP) |
| Space | Fixture | Meter | Group of consumers | Network or subnetwork |
| Time | Seconds to minutes | Minutes to hours | WDS hydraulic time step (several minutes to hours) | WDS hydraulic time step or longer depending upon application (several minutes to days or months) |
| Measurement | Special high resolution metering | AMI | AMI aggregation or estimation | Source flow meter |

WDS, Water Distribution System; WTP, Water Treatment Plant.

Macro- and micro-scales have been studied in the past with the understanding that they are two scales representing the same reality. Linking these two domains is challenging, but necessary to understand the possibilities of hydraulic modeling applications. The aim of this work is to assess the scale effects associated with a set of consumption data to identify the type of analysis that is worthwhile to carry out in a subsequent hydraulic modeling application.

Previous works have suggested that aggregation of users impacts correlations and predictive ability. This work attempts to quantify aggregation effects from a large real data set. This novel multi-scale assessment is possible with the unique (large, homogeneous and very complete) data set for a residential area provided by the Oro Valley Water Utility (Arizona, US) and described in the next section. This analysis sheds light on the properties of water flows within a hydraulic network at alternative spatial and temporal scales, laying the groundwork to discuss the implications (usefulness and potential) of data for specific hydraulic applications, such as model construction or leakage detection. In other words, the strategy here adopted will guide the answer to the following questions: What can we do with this data set? Is the spatial/temporal resolution adequate to address a specific application (e.g., leakage detection)? Should we change the measurement strategy/resolution for that purpose? These are relevant questions given the variety of data resolutions and applications that currently coexist in the water industry (Oberascher et al. 2022). This work focuses on the analysis of water consumption. Subsequent modeling and/or forecasting applications are beyond the scope of this paper, but the presented analysis is meaningful to define their potential as discussed in the Implications section.

Study area

Oro Valley is a suburban town located 10 km north of Tucson, Arizona (US) in the western foothills of Santa Catalina Mountains. In 2020, Oro Valley had 47,070 inhabitants (USCB 2023) including a number of winter visitors who reside elsewhere in the summer. The town is relatively affluent with a median household income that is 31% higher than the US median (USCB 2022a).

Oro Valley's water is largely supplied from the local aquifer that is replenished by flow from the Santa Catalina Mountains. The town also has a Colorado river water allocation and reclaims part of its wastewater for irrigation purposes. Even so, they place a significant emphasis on water conservation (Oro Valley 2023). To that end, the Oro Valley Water Utility deployed AMI with the goal of increasing network efficiency and reducing household leakage losses. The utility has commissioned Tetra Tech (2022) to perform a data analytics evaluation with the aim of identifying usage patterns, establishing usage metrics, and creating a digital dashboard for analytical support.

Data set

Oro Valley Water Utility provides water to 20,620 AMI volumetric flow meters within its service area. Hourly consumption is recorded with 1-gallon (3.79 L) resolution. The roughly 18,000 single-family households are analyzed in this study for the period of January 1, 2019, to December 31, 2020. A preliminary analysis of these data carried out by Tetra Tech (2022) recommended discarding about 5,000 meters due to data deficiencies (e.g., duplicate meters, meters missing latitude and longitude data, meters with negative values and meters with gaps in data greater than 10 h), leaving 12,697 meters with high-quality hourly data. Consumption records have been anonymized by the water utility and identified by a meter ID with no spatial referencing.

To assess water use at different temporal scales (weeks, days and hours), the time window is adjusted to work with full weeks: from January 7, 2019, to December 27, 2020 (103 weeks). To keep the records as complete as possible and avoid second-residence or long-absence effects (e.g., winter visitors), meters with more than 20 null consumption days (3 weeks) within the 103-week period were not considered (2,675 in total). Another 22 devices that have exactly 20 null consumption days were removed from the analysis set. This reduction is applied to provide a data set of exactly 10,000 meters that could be conveniently grouped into subsets of 10, 100 and 1,000 units.
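The screening rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the processing pipeline used in the study: the function name `select_meters`, the array layout (hours in rows, meters in columns) and the reading of a "null consumption day" as a day with zero total volume are all assumptions made for the example.

```python
import numpy as np

def select_meters(q, max_null_days=19):
    """Return a boolean mask of the meters (columns) to keep.

    q is an (n_hours, n_meters) array of hourly volumes whose row count is a
    multiple of 24. A "null day" is assumed here to be a day with zero total
    consumption. Meters with 20 or more null days are dropped, which mirrors
    removing those with more than 20 and those with exactly 20.
    """
    n_hours, n_meters = q.shape
    daily_total = q.reshape(n_hours // 24, 24, n_meters).sum(axis=1)
    null_days = (daily_total == 0).sum(axis=0)
    return null_days <= max_null_days

# Synthetic example: 30 days, 3 meters; the second meter is idle for 25 days.
q = np.ones((30 * 24, 3))
q[: 25 * 24, 1] = 0.0
mask = select_meters(q)
```

In this synthetic case only the second meter is excluded.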

This data set is unique for assessing the scale effects of water consumption at demand nodes (i.e., flows at any other location within the system) for several reasons. First, it is large compared to previous studies that rarely exceed a thousand household meters (Cominola et al. 2015; Mazzoni et al. 2022). Second, all households are relatively homogeneous: similarly constructed single-family homes in a mostly residential suburban town. Third, the data set is very complete, with an average of 0.06% of hourly values missing per meter over the study period. Also of note is that the time window includes the dates in which the COVID outbreak took place (assumed April 1, 2020 in this work), so differences in behavior before and during the first months of the pandemic can be assessed.

Oro Valley Water Utility provided two additional pieces of anonymized metadata: the binned lot size [<465 m² (5,000 ft²); between 465 and 745 m² (5,000 and 8,000 ft²); between 745 and 1,394 m² (8,000 and 15,000 ft²); >1,394 m² (15,000 ft²); or unknown] and pool availability (yes, no or unknown) for each property. The lot size is greater than 465 m² (5,000 ft²) for 85% of the households. With respect to pools, 3,099 (∼31%) of the 10,000 analyzed properties have a pool and 5,039 (∼50%) do not (1,862 households have unknown status). Finally, based on usage patterns, Tetra Tech (2022) identified that of the 10,000 homes, 5,091 (∼51%) operate programmed irrigation systems.

To assess the stochastic nature of water consumption, its variability and its associated scale effects, the data above is analyzed in two stages: (1) preliminary data analysis and (2) Principal Component Analysis (PCA). Preliminary analysis examines the average and correlation of and between hourly, daily and weekly consumptions that are representative of short-term, medium-term and long-term variability (as defined in Díaz & González 2022), respectively. PCA is a statistical technique that reduces the data set's dimensionality while retaining as much as possible of the variation present in the original data set. Therefore, by analyzing the relative importance of the Principal Components (PCs) and their role in explaining variance, it is possible to understand demand relationships between consumers and how they might be modeled.

Preliminary data analysis

Hourly demand values for hour $i$ and meter $j$ are represented as $q_{i,j}$, with $i = 1, \dots, n_h$ ($n_h = 17{,}304$ hours) and $j = 1, \dots, n_m$ ($n_m = 10{,}000$ meters). Because few data are missing, no imputation procedure was applied to estimate the missing values. Data analysis included computing the mean and variance of aggregated hourly demands and correlation studies for different time intervals and spatial aggregation levels.

Spatial aggregation: Total consumption and average pattern

From a spatial perspective, the demand series of individual meters ($q_{i,j}$ for hour $i$ and meter $j$) can be aggregated to compute the total hourly consumption ($Q_i$). The aggregated hourly consumption across all $n_m$ meters is computed for every time step as:
$$Q_i = \sum_{j=1}^{n_m} q_{i,j}, \qquad i = 1, \dots, n_h \tag{1}$$
When subperiods are considered (e.g., before and after the COVID outbreak), the limits of time index $i$ are modified accordingly.
The aggregated daily average (i.e., expected) pattern at an hourly time step can be computed by averaging the consumption of each hour of the day ($o = 1, \dots, 24$) over the $n_d$ days of the record:
$$\mu_o = \frac{1}{n_d} \sum_{k=1}^{n_d} Q_{24(k-1)+o} \tag{2}$$
Similarly, the associated standard deviation for a given hour, $o$, can be computed as:
$$\sigma_o = \sqrt{\frac{1}{n_d - 1} \sum_{k=1}^{n_d} \left( Q_{24(k-1)+o} - \mu_o \right)^2} \tag{3}$$
Assuming that the hourly data are normally distributed, the upper and lower limits of the confidence interval at significance level $\alpha$ can be computed from the variance as:
$$CI_o^{\pm} = \mu_o \pm \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) \sigma_o \tag{4}$$
where $\Phi^{-1}$ represents the inverse of the normal Cumulative Distribution Function (CDF). The aggregated daily average pattern (Equation (2)) and its confidence interval (Equation (4)) give an understanding of macro demand behavior equivalent to the average demand of a small town (10,000 households).
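As an illustration of Equations (1)–(4), the sketch below computes the aggregated 24-hour pattern and its normal-theory confidence band from a matrix of hourly meter readings. The function name `daily_pattern`, the synthetic Poisson data and the use of the sample standard deviation (`ddof=1`) are assumptions of this example rather than details confirmed by the study.

```python
import numpy as np
from statistics import NormalDist

def daily_pattern(q, alpha=0.05):
    """Aggregated daily pattern for an (n_hours, n_meters) array.

    n_hours must be a multiple of 24 and the first row must be the first
    hour of a day. Returns mean, std, lower and upper arrays of length 24.
    """
    total = q.sum(axis=1)                  # total hourly consumption, Eq. (1)
    by_day = total.reshape(-1, 24)         # rows = days, columns = hour of day
    mean = by_day.mean(axis=0)             # expected pattern, Eq. (2)
    std = by_day.std(axis=0, ddof=1)       # sample std (assumed), Eq. (3)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean, std, mean - z * std, mean + z * std   # Eq. (4)

# Synthetic stand-in for the metered data: 14 days, 50 meters.
rng = np.random.default_rng(0)
q = rng.poisson(10.0, size=(24 * 14, 50)).astype(float)
mean, std, lower, upper = daily_pattern(q)
```

Restricting the rows of `q` to a subperiod before calling the function corresponds to modifying the limits of the time index.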

Temporal aggregation: Daily consumption and representative hourly consumption for different time windows

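From a temporal perspective, each household's hourly data ($q_{i,j}$ for hour $i$ and meter $j$) can be summed to compute the daily ($d_{k,j}$, $k = 1, \dots, n_d$, with $n_d = 721$ days) and weekly ($w_{l,j}$, $l = 1, \dots, n_w$, with $n_w = 103$ weeks) total withdrawals:
$$d_{k,j} = \sum_{i=24(k-1)+1}^{24k} q_{i,j} \tag{5}$$
$$w_{l,j} = \sum_{i=168(l-1)+1}^{168l} q_{i,j} \tag{6}$$
Averaging the consumption computed in Equations (5) and (6) provides representative hourly consumption rates over different time intervals (days and weeks). Hourly consumption rates can be averaged over the corresponding day ($\bar{q}^{\,d}_{k,j}$; $k = 1, \dots, n_d$) or week ($\bar{q}^{\,w}_{l,j}$; $l = 1, \dots, n_w$) as:
$$\bar{q}^{\,d}_{k,j} = \frac{d_{k,j}}{24} \tag{7}$$
$$\bar{q}^{\,w}_{l,j} = \frac{w_{l,j}}{168} \tag{8}$$
where $\bar{q}^{\,d}_{k,j}$ and $\bar{q}^{\,w}_{l,j}$ are household $j$'s average hourly consumption for day $k$ and week $l$, respectively.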
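The temporal aggregation in Equations (5)–(8) amounts to block sums over 24-hour and 168-hour windows. A minimal sketch follows; the function name `hourly_means` and the array layout (hours in rows, meters in columns, record starting at the first hour of a week) are assumptions of the example.

```python
import numpy as np

def hourly_means(q):
    """Average hourly consumption per day and per week for each meter.

    q is an (n_hours, n_meters) array with n_hours a multiple of 168.
    """
    n_hours, n_meters = q.shape
    if n_hours % 168 != 0:
        raise ValueError("the record must contain whole weeks")
    daily_total = q.reshape(-1, 24, n_meters).sum(axis=1)     # Eq. (5)
    weekly_total = q.reshape(-1, 168, n_meters).sum(axis=1)   # Eq. (6)
    return daily_total / 24.0, weekly_total / 168.0           # Eqs. (7)-(8)

# Two weeks of synthetic readings for three meters.
q = np.arange(2 * 168 * 3, dtype=float).reshape(2 * 168, 3)
day_avg, week_avg = hourly_means(q)
```

Because every week holds exactly seven days, averaging the seven daily values of a week reproduces the weekly value.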

Construction of data matrices

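Remaining manipulations in the preliminary analysis (and subsequent PCA) use raw hourly consumption values ($q_{i,j}$) and hourly averages over daily and weekly windows ($\bar{q}^{\,d}_{k,j}$ and $\bar{q}^{\,w}_{l,j}$, respectively). Different matrices, generally called ‘individual data matrices’ ($\mathbf{X}$), collect each individual meter's consumption over time. Each of the $n_m$ columns corresponds to a meter. Depending upon the temporal scale, $\mathbf{X}$ will have a different number of rows $n$ ($n_h$ for hours, $n_d$ for days or $n_w$ for weeks):
$$\mathbf{X}^{h} = \left[ q_{i,j} \right] \in \mathbb{R}^{n_h \times n_m} \tag{9}$$
$$\mathbf{X}^{d} = \left[ \bar{q}^{\,d}_{k,j} \right] \in \mathbb{R}^{n_d \times n_m} \tag{10}$$
$$\mathbf{X}^{w} = \left[ \bar{q}^{\,w}_{l,j} \right] \in \mathbb{R}^{n_w \times n_m} \tag{11}$$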

Each of these matrices can be manipulated to compute an equivalent ‘group data matrix’ ($\mathbf{G}$) that contains as many columns as the number of groups being considered. As previously noted, households are grouped in this work every 10 meters ($m_g = 10$ units, $n_g = 1{,}000$ groups), 100 meters ($m_g = 100$ units, $n_g = 100$ groups) and 1,000 meters ($m_g = 1{,}000$ units, $n_g = 10$ groups), leading to differently sized matrices. To avoid excessive computation times, one set of randomly identified groups is developed in this work (i.e., meters are aggregated randomly once).

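The group data matrix is computed by summing the demands from the individual data matrix for meters within the group. For example, the group data matrix for hourly demand values and group size $m_g = 10$ can be computed as:
$$G^{h,10}_{i,g} = \sum_{j \in \mathcal{J}_g} X^{h}_{i,j}, \qquad g = 1, \dots, 1{,}000 \tag{12}$$
where $\mathcal{J}_g$ is the set of 10 meters randomly assigned to group $g$.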

Equations similar to Equation (12) can be written for average hourly demands within each day ($\mathbf{G}^{d,m_g}$) and average hourly consumptions within each week ($\mathbf{G}^{w,m_g}$).

Three group data matrices are computed for each temporal level for the three grouping levels $m_g$, i.e., $\mathbf{G}^{m_g}$ for the hourly, daily and weekly data (with temporal superscripts dropped for convenience in the general discussion that follows). For the sake of simplicity, hereafter, both individual ($\mathbf{X}$) and grouped ($\mathbf{G}$) data matrices will be referred to as data matrices $\mathbf{X}$ (with a general $n \times m$ dimension).
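The single random grouping described above can be sketched as a column permutation followed by a block sum. The function name `group_matrix` and the fixed seed are assumptions of this example; the study does not state how its random assignment was generated.

```python
import numpy as np

def group_matrix(x, group_size, seed=0):
    """Sum the columns of x in randomly assigned groups of equal size.

    x is an (n, n_meters) data matrix; n_meters must be divisible by
    group_size. Returns an (n, n_meters // group_size) group data matrix.
    """
    n_rows, n_meters = x.shape
    if n_meters % group_size != 0:
        raise ValueError("group_size must divide the number of meters")
    # One random assignment of meters to groups (drawn once, as in the text).
    order = np.random.default_rng(seed).permutation(n_meters)
    return x[:, order].reshape(n_rows, -1, group_size).sum(axis=2)

# Synthetic example: 5 time steps, 1,000 meters with unit demand.
x = np.ones((5, 1000))
g10 = group_matrix(x, 10)     # 100 groups of 10 meters
g100 = group_matrix(x, 100)   # 10 groups of 100 meters
```

Whatever the assignment, the row totals are preserved, so the aggregated community demand is the same at every grouping level.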

Standardization of data matrices

Matrices are ‘standardized’ in this work per spatial unit (e.g., per meter or group of meters) so that each unit contributes equally to the analysis. Standardization is implemented by taking the difference between the matrix values of a meter/group of meters $s$ ($x_{i,s}$, i.e., $s$-column values in matrix $\mathbf{X}$) and the mean of consumption for that same meter/group of meters ($\bar{x}_s$, i.e., mean of $s$-column values in matrix $\mathbf{X}$). This difference is then divided by the corresponding standard deviation ($\sigma_s$, i.e., standard deviation of $s$-column values in matrix $\mathbf{X}$) or:
$$z_{i,s} = \frac{x_{i,s} - \bar{x}_s}{\sigma_s} \tag{13}$$

Since water consumption is standardized per spatial unit, the mean and standard deviation of each column of the standardized matrix $\mathbf{Z} = [z_{i,s}]$ are equal to 0 and 1, respectively.
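The column-wise standardization of Equation (13) can be sketched as follows. The function name `standardize` and the choice of the sample standard deviation (`ddof=1`) are assumptions of this example; a column with constant consumption would need special handling because its standard deviation is zero.

```python
import numpy as np

def standardize(x):
    """Standardize each column (meter or group) of x to zero mean, unit std."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)   # sample standard deviation (assumed)
    return (x - mean) / std

# Synthetic example: two meters with very different consumption levels.
rng = np.random.default_rng(2)
x = np.column_stack([rng.normal(10, 1, 200), rng.normal(400, 80, 200)])
z = standardize(x)
```

After the transformation both columns carry the same weight, regardless of how much water each meter uses in absolute terms.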

Correlation analysis

As noted in the Introduction, previous works have discussed correlation across users for different temporal and spatial resolution levels. Correlation is explored in this work by computing the variance-covariance matrix of the standardized data matrices ($\mathbf{Z}$, of dimension $n \times m$) as:
$$\mathbf{C} = \frac{1}{n-1} \mathbf{Z}^{T} \mathbf{Z} \tag{14}$$

After standardization, all diagonal entries of $\mathbf{C}$ equal 1, so its off-diagonal entries are correlation coefficients. Since the size of the covariance matrix varies with the spatial unit scale (individual meters, $m = n_m$, or groups of meters, $m = n_g$), it is not straightforward to directly compare correlations across different spatiotemporal scales. The CDF of the upper triangular submatrix of $\mathbf{C}$ (symmetric matrix) is computed to facilitate comparison.
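The comparison device described above, the empirical CDF of the off-diagonal correlations, can be sketched as below. The function name `upper_triangle_cdf` and the synthetic data are assumptions of the example; the normalization by the number of rows minus 1 is chosen to match the sample standard deviation so that the diagonal equals 1.

```python
import numpy as np

def upper_triangle_cdf(x):
    """Covariance of the standardized data and the empirical CDF of its
    strictly upper-triangular entries (pairwise correlations)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    c = z.T @ z / (z.shape[0] - 1)            # Eq. (14)
    rows, cols = np.triu_indices(c.shape[0], k=1)
    values = np.sort(c[rows, cols])
    prob = np.arange(1, values.size + 1) / values.size
    return c, values, prob

# Synthetic example: 300 time steps, 40 independent "meters".
rng = np.random.default_rng(4)
x = rng.gamma(shape=2.0, scale=3.0, size=(300, 40))
c, values, prob = upper_triangle_cdf(x)
```

Plotting `prob` against `values` for each spatiotemporal level gives curves that can be overlaid even though the underlying matrices differ in size.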

Principal Component Analysis (PCA)

PCA linearly transforms data into a new coordinate system where the variation in the data can be described with fewer dimensions by creating new uncorrelated variables that successively maximize variance (Jolliffe & Cadima 2016).

Data matrices and standardization

As in the preliminary data analysis, the PCA transformation is applied to a spatially standardized data matrix ($\mathbf{Z}$) that results from standardizing a data matrix ($\mathbf{X}$) as in Equation (13). In this work, PCA is applied sequentially for successive temporal levels so that the effect of the previous temporal layer is removed as the process progresses. The individual data matrix of average hourly consumption values within each week ($\mathbf{X}^{w}$) is directly used as in the preliminary analysis:
$$X^{w}_{l,j} = \bar{q}^{\,w}_{l,j} \tag{15}$$
Next, the average hourly consumption values within each day are normalized by the corresponding average hourly consumption value within that week to eliminate seasonal effects:
$$\tilde{X}^{d}_{k,j} = \frac{\bar{q}^{\,d}_{k,j}}{\bar{q}^{\,w}_{l,j}}, \qquad l = \left\lfloor \frac{k-1}{7} \right\rfloor + 1 \tag{16}$$
where $\lfloor \cdot \rfloor$ refers to the integer part of its argument, i.e., the average hourly value of each day $k$ is divided by the average hourly value for the corresponding week $l$, like a rolling window. Similarly, raw hourly values are here normalized by the corresponding average hourly value for that day:
$$\tilde{X}^{h}_{i,j} = \frac{q_{i,j}}{\bar{q}^{\,d}_{k,j}}, \qquad k = \left\lfloor \frac{i-1}{24} \right\rfloor + 1 \tag{17}$$
Note that Equations (15)–(17) (individual data matrices) can also be adapted to consider groups ($\mathbf{G}$) by adding the columns of the individual data matrix as in Equation (12). After individual or group data matrices, generally called $\mathbf{X}$, are computed for each temporal and spatial scale, they are standardized as in Equation (13) to obtain the standardized matrix $\mathbf{Z}$. The covariance matrix of $\mathbf{Z}$, here named $\mathbf{C}$, can also be computed as in Equation (14).
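The sequential removal of the previous temporal layer (Equations (15)–(17)) can be sketched as two element-wise divisions. The function name `sequential_levels` is hypothetical, and the example assumes strictly positive daily and weekly averages; days with zero consumption, which do occur in real meter data, would need separate treatment that the text does not describe.

```python
import numpy as np

def sequential_levels(q):
    """Weekly, week-normalized daily and day-normalized hourly matrices.

    q is an (n_hours, n_meters) array with n_hours a multiple of 168 and
    no day or week with zero total consumption (assumption of this sketch).
    """
    n_hours, n_meters = q.shape
    day_avg = q.reshape(-1, 24, n_meters).mean(axis=1)
    week_avg = q.reshape(-1, 168, n_meters).mean(axis=1)
    x_week = week_avg                                   # Eq. (15)
    x_day = day_avg / np.repeat(week_avg, 7, axis=0)    # Eq. (16)
    x_hour = q / np.repeat(day_avg, 24, axis=0)         # Eq. (17)
    return x_week, x_day, x_hour

# Two weeks of strictly positive synthetic readings for four meters.
rng = np.random.default_rng(5)
q = rng.uniform(1.0, 5.0, size=(2 * 168, 4))
x_week, x_day, x_hour = sequential_levels(q)
```

By construction, the normalized daily values average to 1 within each week and the normalized hourly values average to 1 within each day, which is what removes the slower layer before the next PCA.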

PCA transformation

The PCA transformation is defined so that the data in the new coordinate system are a linear transformation of the original data set (Jolliffe 2002). For each matrix $\mathbf{Z}$:
$$\mathbf{T} = \mathbf{Z} \mathbf{P} \tag{18}$$
where $\mathbf{T}$ is an $n \times m$ matrix that represents the new coordinates or scores, computed as the projection of $\mathbf{Z}$ over the new orthonormal coordinate system defined by the $m \times m$ matrix $\mathbf{P}$. $\mathbf{P}$ represents the new space where PCs are selected to maximize the variance that is transferred to $\mathbf{T}$. Each column of $\mathbf{P}$ represents a so-called eigenvector.
According to the definition of eigenvalue and eigenvector (e.g., Larson & Falvo 2009), any diagonalizable square matrix (including the symmetric covariance matrix $\mathbf{C}$) can be expressed in terms of its associated eigenvalues and eigenvectors:
$$\mathbf{C} = \mathbf{P} \boldsymbol{\Lambda} \mathbf{P}^{-1} \tag{19}$$
where $\boldsymbol{\Lambda}$ is a diagonal matrix with the eigenvalues $\lambda_1, \dots, \lambda_m$. Equation (19) can alternatively be written, for each eigenvalue $\lambda_p$ and its eigenvector $\mathbf{p}_p$ (the $p$-th column of $\mathbf{P}$), as:
$$\left( \mathbf{C} - \lambda_p \mathbf{I} \right) \mathbf{p}_p = \mathbf{0} \tag{20}$$
The procedure to compute eigenvalues and eigenvectors begins by computing the eigenvalues as:
$$\det\left( \mathbf{C} - \lambda \mathbf{I} \right) = 0 \tag{21}$$

Equation (21) is the characteristic equation of the matrix that is to be decomposed and can be expanded to a polynomial form. The roots of the polynomial equation are the eigenvalues. Once the eigenvalues are known, the eigenvectors can be computed as the solution of the homogeneous linear system of equations (Equation (20)). Eigenvectors are sorted in descending order of their eigenvalues.

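Once the eigenvalues and eigenvectors are known, the original data and the associated covariance matrix can be reconstructed as (Shlens 2014):
$$\mathbf{Z} = \mathbf{T} \mathbf{P}^{T} \tag{22}$$
$$\mathbf{C} = \mathbf{P} \boldsymbol{\Lambda} \mathbf{P}^{T} \tag{23}$$
The percentage of variance explained by each PC ($EV_p$) can be computed as:
$$EV_p = 100 \, \frac{\lambda_p}{\sum_{r=1}^{m} \lambda_r} \tag{24}$$
As noted, the covariance matrix provides the dependencies across spatial units (e.g., meters). The diagonal of this matrix (i.e., variance) can be expanded from Equation (23) as:
$$C_{s,s} = \sum_{p=1}^{m} \lambda_p P_{s,p}^{2} \tag{25}$$
and the individual weight of each PC to the variance of each meter ($W_{s,p}$) can be computed as:
$$W_{s,p} = \frac{\lambda_p P_{s,p}^{2}}{C_{s,s}} \tag{26}$$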

The percentage of variance explained by each component and the specific contribution of each PC to each spatial unit's variance are useful indicators for the extent to which the model dimensionality can be reduced and give an idea of the singularity of each meter/group. Reducing the dimensionality of the problem (i.e., selecting fewer eigenvalues/eigenvectors) to reconstruct the series and/or its variance is associated with a reduction in the percentage of explained variance.
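The PCA quantities discussed in this section can be sketched with a symmetric eigendecomposition. Note the substitution: instead of solving the characteristic polynomial, the example calls `numpy.linalg.eigh`, a standard numerical routine for symmetric matrices. The function name `pca_from_covariance` and the synthetic data are assumptions of the sketch.

```python
import numpy as np

def pca_from_covariance(x):
    """PCA of the column-standardized data matrix x (rows = time steps)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    c = z.T @ z / (z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(c)        # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                        # projections (scores)
    explained = 100.0 * eigvals / eigvals.sum() # % of variance per PC
    # Row s holds each PC's share of the variance of spatial unit s;
    # the unit variance (diagonal of c) equals 1 after standardization.
    weights = eigvals * eigvecs**2 / np.diag(c)[:, None]
    return z, c, eigvals, eigvecs, scores, explained, weights

# Synthetic example: 200 time steps, 12 units sharing a common signal.
rng = np.random.default_rng(6)
common = rng.normal(size=(200, 1))
x = 50.0 + 10.0 * common + rng.normal(scale=5.0, size=(200, 12))
z, c, eigvals, eigvecs, scores, explained, weights = pca_from_covariance(x)
```

Truncating `scores` and `eigvecs` to their first few columns before multiplying them back gives the reduced-dimension reconstruction, at the cost of the explained variance carried by the discarded components.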

Preliminary data analysis

Spatial and temporal aggregation: Daily consumption and average daily pattern

The average daily water consumption for the 10,000-home sample set is 279.0 gallons/HH/day (1,056 L/HH/day), which is slightly higher than the average consumption in the United States [roughly 205 gallons/HH/day (776 L/HH/day) with an average per capita consumption of 82 gallons (310 L) (USGS 2018) and average number of people per household of 2.5 (USCB 2022b)]. A higher demand is reasonable given most residences are single-family homes (not apartments) in an arid environment. The household demand has significant variability as seen in Figure 1(a). Slightly more than half of the households use less than the US average of roughly 200 gallons/HH/day (757 L/HH/day) while some households use eight times the US average.
Figure 1

Histogram of the average household daily consumption for: (a) all meters, (b) meters without scheduled irrigation system, and (c) meters with scheduled irrigation system.

Using the Tetra Tech (2022) classification of households without and with scheduled irrigation systems, Figure 1(b) and 1(c) show daily consumption histograms for the two meter subsets. For reference, the 4,909 users without programmed irrigation systems consume 177.8 gallons/HH/day on average (672.9 L/HH/day) while the 5,091 customers with programmed irrigation systems use on average 376.7 gallons/HH/day (1,425.9 L/HH/day). Figure 1(c) shows that most of the variability in the data set is associated with outdoor use.

Figure 2(a) shows the average aggregated consumption pattern (Equation (2)) and its 95% confidence interval (Equation (4)) for the full study period (from January 7, 2019, to December 27, 2020) for the 10,000 meters. As often seen in US daily consumption, a pronounced morning peak and a less prominent evening peak occur around 7:00 and 19:00, respectively. The horizontal dashed line corresponds to the average hourly community demand. To obtain the average household consumption (279.0 gallons/day), the community value is multiplied by 24 hours and divided by 10,000 households. The average standard deviation of hourly demand is also included in Figure 2(a) for reference.
Figure 2

Total consumption average hourly pattern and 95% confidence interval: (a) full period (January 7, 2019 – December 27, 2020), (b) before COVID-19 outbreak (April 1, 2019 – December 27, 2019) and (c) after COVID-19 outbreak (April 1, 2020 – December 27, 2020).

Figure 2(b) and 2(c) show the daily withdrawal pattern before and after the COVID-19 outbreak (assumed to begin on April 1, 2020), respectively. Due to confinement and/or mobility restrictions (Díaz et al. 2021), the average household consumption increased [from 287.7 gallons/day (1,089.0 L/day) to 315.9 gallons/day (1,195.8 L/day)] over the comparable time window (271 days). The temporal distribution of use also changed during COVID. Use in the community became more consistent as seen with decreased standard deviations (Equation (3)) and confidence interval widths (Equation (4)) after the COVID outbreak (Figure 2(c), in gray). This is captured by an increase in the average of the hourly means (Equation (2)) and a reduction in the average of the hourly standard deviations (Equation (3)).

Temporal aggregation: Construction and standardization of data matrices

Some of the household consumption variability can be explained by household characteristics (i.e., lot size, pool availability and scheduled irrigation system availability). To complete this assessment, standardized individual data matrices are computed using Equations (9)–(11) and (13) and averaged over meter subsets defined by lot size and by the existence of a pool or scheduled irrigation system (i.e., one average series per subset). Results are presented from weekly to hourly to progressively increase the level of detail.

Figure 3(a) presents the group average (according to the binned lot size) of the standardized average hourly values within each week. This figure shows that values change seasonally over the time window with higher water demands in spring/summer compared to winter, irrespective of the lot size group. During the COVID-19 pandemic in 2020, consumption increased throughout the year. The amplitude of the standardized seasonal variations tracks with lot sizes. Since the data are standardized by meter, residences with larger lot sizes are not consuming less water than small houses in winter; rather, they are using less water relative to their own average consumption, which includes significant irrigation that takes place mainly during summer.
Figure 3

Time series of group averages of standardized water consumption per lot size group for different temporal levels: (a) weekly (for full period – roughly 2 years), (b) daily (for 32 weeks – roughly 8 months), (c) hourly (for 2 weeks) of winter and (d) hourly (for 2 weeks) of summer.

The standardized values for the daily level are plotted in Figure 3(b) for the first 32 weeks of 2019 (roughly 8 months). While the frequency of the time series increases because of the daily periodicity, the trends are similar to the weekly pattern regardless of the lot size group and, as in the weekly data, amplitudes increase with the lot size. To compare summer and winter, hourly patterns are presented. The group averages of the standardized hourly values, zoomed in on two winter and two summer weeks, are shown in Figure 3(c) and 3(d), respectively. The trend in variability by season and lot size is again seen in the plots with the most significant differences associated with use amplitude. All lot sizes have similar hourly patterns with greater summer amplitudes (likely due to irrigation). Also, weekday-weekend differences are more apparent in summer (Figure 3(d)) (weekends correspond to 08/24/2019–08/25/2019 and 08/31/2019–09/01/2019) compared to winter (Figure 3(c)) (weekends correspond to 01/12/2019–01/13/2019 and 01/19/2019–01/20/2019).

Supplementary material, Figure S1 shows the evolution of the group average of standardized consumption values for different time windows for homes with and without pools. Again, amplitudes are larger for households with a pool in a pattern similar to homes with larger lots and have lower variability in winter. As with different lot sizes, the standardized distributions for both groups are similar for the two summer weeks. These tendencies are seen across groups and temporal aggregation levels.

Figure 4 shows the equivalent for houses with/without a scheduled irrigation system. Figure 4(a) and 4(b) plot weekly and daily standardized usage that is consistent with Figure 3(a) and 3(b) and Supplementary material, Figure S1(a) and S1(b). Standardized hourly water use patterns for winter (Figure 4(c)) and summer (Figure 4(d)), however, are different. Households with scheduled irrigation have a more variable hourly pattern and, in response to extensive public education on optimal irrigation timing (City of Tucson 2023), significantly higher morning peaks than homes without programmed irrigation systems. Houses without scheduled irrigation systems also experience seasonal changes correlated with air temperatures.
Figure 4

Time series of group averages of standardized water consumption per scheduled irrigation system availability group for different temporal levels: (a) weekly (for full period – roughly 2 years), (b) daily (for 32 weeks – roughly 8 months), (c) hourly (for 2 weeks) of winter and (d) hourly (for 2 weeks) of summer.

To summarize the standardized consumption findings, water consumption trends are very similar regardless of the lot size or pool availability group. Amplitudes change at consistent times in the daily and weekly patterns, implying that these factors condition the average consumption but not its evolution over time. On the other hand, programmed irrigation systems (Figure 4) affect consumption timing.

Correlation analysis

Figure 5 displays the CDFs of the upper triangle of the variance-covariance matrices of the standardized data for different temporal and spatial aggregation levels (Equation (14)). The CDF for individual meters (Figure 5(a)) demonstrates that the overall correlation across meters is low, with 95% of the values falling below 0.5 for all temporal resolutions. This result is consistent with Blokker et al. (2008) and Filion et al. (2006), who argue that correlation is low at the individual user level. The CDF becomes more vertical around 0 (i.e., lower dependency) at shorter time intervals and user dependencies increase with temporal aggregation over longer time intervals, which is consistent with Magini et al. (2008).
Figure 5

CDF of the upper triangle of the covariance matrix for standardized data matrices when considering weekly, daily and hourly temporal levels and different spatial aggregations: (a) individual meters, (b) every 10 meters, (c) every 100 meters and (d) every 1,000 meters.

Figure 5(b)–5(d) show similar plots for the spatial aggregation levels ($m_g$ = 10, 100 and 1,000). The three curves within each subplot (weekly, daily and hourly) shift towards the right with higher spatial aggregation levels, demonstrating that correlation between user groups increases. Even for hourly data, correlations are low for groups of 1–10 meters but substantially higher when aggregating 100–1,000 users. These results raise caution when developing a hydraulic model and assuming all nodes have the same average patterns (i.e., top-down approach – Blokker et al. 2011). That assumption requires a minimum level of aggregation in the order of hundreds or thousands of users that may not be reached in suburban areas in which nodes may represent tens of residences as discussed in the Implications section.

Principal Component Analysis (PCA)

PCA assesses to what extent it is possible to explain the variation in data with fewer dimensions. PCA is applied here to identify the strength of the relationship among meters at different spatiotemporal aggregation scales. It is first applied to individual meters then groups of meters to assess the spatial aggregation effect.

Individual meters

The scores (i.e., projections over the eigenvector space) for the first three and last two principal components of each temporal analysis level are plotted in Figure 6. Figure 6(a) corresponds to the weekly level results and shows that the first principal components evolve progressively over time. PC1 is intuitively associated with seasonal temperature changes, a major driver of water consumption. Other components cannot naturally be associated with specific phenomena. Of the 10,000 eigenvalues/eigenvectors associated with the analysis, the last two meaningful components correspond to PCs 101 and 102, where 102 is the number of degrees of freedom, computed as the minimum dimension of the data matrix minus 1. Figure 6(a) shows that the first PCs introduce a very different effect compared to the higher-order components, which are minor tweaks to fit the data (i.e., local effects).
Figure 6

Score evolution for the first and last principal components for different temporal levels: (a) weekly (for full period – roughly 2 years), (b) daily (for 32 weeks – roughly 8 months), (c) hourly (for 2 weeks) of winter and (d) hourly (for 2 weeks) of summer.
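The degrees-of-freedom count mentioned for the weekly analysis can be checked with a small sketch. The matrix sizes and the gamma-distributed values below are hypothetical; the only point illustrated is that a column-centered matrix with fewer time steps than meters has at most (number of time steps minus 1) non-zero eigenvalues, and that scores are obtained by projecting the data onto the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weeks, n_meters = 103, 500       # hypothetical: fewer time steps than meters
x = rng.gamma(shape=2.0, scale=1.0, size=(n_weeks, n_meters))   # synthetic data

# Standardize each meter (column): zero mean, unit variance.
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Thin SVD of the standardized matrix: z = u @ diag(s) @ vt.
u, s, vt = np.linalg.svd(z, full_matrices=False)
eigenvalues = s**2 / (n_weeks - 1)  # eigenvalues of the correlation matrix
scores = u * s                      # projection of each week onto each PC

# Centering removes one degree of freedom, so the last eigenvalue is
# numerically zero and only min(n_weeks, n_meters) - 1 PCs are meaningful.
n_meaningful = int(np.sum(eigenvalues > 1e-10 * eigenvalues[0]))
print(n_meaningful)                 # 102 for these sizes
```

Plotting the columns of `scores` against time gives score-evolution curves of the kind shown in Figure 6.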


Figure 6(b) represents the scores on a daily level after weekly effects are removed. PC1's evolution practically mirrors PC2's, indicating that the pattern provided by the first component is partly counterbalanced by the second component. This result shows that no clear tendency exists in the data and several components must be combined to reconstruct an individual user's consumption. Figure 6(c)–6(d) show the winter and summer hourly level PCA scores, respectively, after daily effects are removed. These figures show the ‘usual’ hourly pattern represented by PC1, with morning and evening peaks that are higher in summer and low consumption overnight. PC1 is modulated by the subsequent principal components to account for the individual meter variability. Figure 4(c) and 4(d) show slightly different total use and temporal distribution on weekdays compared to weekends. Weekends tend to have lower consumption and smaller morning peaks regardless of the season. This is reflected in the PC weights, which are also lower on weekends for comparable times (Figure 6(c) and 6(d)).

Figure 7 shows the evolution of the cumulative percentage of explained variance with the number of PCs for all temporal aggregation levels. The percentage of variance explained by PC1 is low (less than 25%) for all temporal windows, demonstrating that additional PCs are required to represent the variance. PC1 explains 23% of the variance in weekly level analysis, but it only explains 4 and 9% of the daily and hourly windows, respectively. These results imply that the pattern is less distinct for daily values that change over the week than for the hourly pattern, which varies during the day.
Figure 7

Percentage of explained variance according to the number of PCs for different temporal levels (weekly, daily, and hourly). Dashed lines represent the maximum number of PCs for the different temporal levels.
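A curve of this type can be produced as sketched below. The data are synthetic (a hypothetical shared seasonal signal with meter-specific weights plus Gaussian noise), and the function name and the 90% threshold are choices made here for illustration only.

```python
import numpy as np

def cumulative_explained_variance(data):
    """Cumulative % of variance explained by successive PCs of standardized data."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    s = np.linalg.svd(z, compute_uv=False)   # singular values, descending
    eigenvalues = s**2
    return 100.0 * np.cumsum(eigenvalues) / eigenvalues.sum()

rng = np.random.default_rng(2)
n_steps, n_meters = 100, 400                 # hypothetical sizes
t = np.arange(n_steps)
seasonal = np.sin(2 * np.pi * t / 52)        # hypothetical shared signal
weights = rng.uniform(0.0, 1.0, n_meters)    # how strongly each meter follows it
data = seasonal[:, None] * weights + rng.normal(0.0, 1.0, (n_steps, n_meters))

cum = cumulative_explained_variance(data)
# Index of the first PC at which the cumulative share reaches 90%, as a count.
pcs_for_90 = int(np.searchsorted(cum, 90.0) + 1)
print(f"PC1 explains {cum[0]:.1f}%; {pcs_for_90} PCs are needed to reach 90%")
```

When the shared signal is weak relative to household noise, PC1 captures only a small share and the curve climbs slowly, which is the behavior reported for individual meters.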

Figure 8 shows a colormap of the percentage of variance explained by each PC for each meter (Equation (26)). For consistency between durations, each row in the colormap represents the weight for the first 103 PCs and visualizes the differences in variance explained by each component relative to the first (and most important) PC. The meters are ordered according to the first PC's magnitude. The predominantly blue color indicates that a PC's weight is low and reinforces the notion that many PCs are required to explain the variability of water consumption on an individual household basis, with a limited pattern in common.
Figure 8

Colormap of the percentage of variance explained by each PC for each meter and different temporal layers: (a) weekly, (b) daily and (c) hourly.
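Equation (26) is not reproduced in this section, so the sketch below relies on an assumed, commonly used definition: for standardized data, the share of a meter's variance carried by a given PC is the eigenvalue times the squared loading of that meter on the PC. The data, sizes and descending row order are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_steps, n_meters = 60, 200                  # hypothetical sizes
common = rng.normal(size=n_steps)            # synthetic shared component
data = 0.5 * common[:, None] + rng.normal(size=(n_steps, n_meters))

z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
u, s, vt = np.linalg.svd(z, full_matrices=False)
eigenvalues = s**2 / (n_steps - 1)

# Assumed definition: each standardized meter has unit variance, and PC k
# contributes eigenvalue_k * loading_jk**2 of it (expressed here in %).
per_meter = 100.0 * (vt.T**2) * eigenvalues  # shape: (n_meters, n_PCs)

# Order the rows (meters) by the weight of PC1; descending order is assumed.
order = np.argsort(per_meter[:, 0])[::-1]
per_meter = per_meter[order]

print(per_meter[:3, :3].round(1))            # first rows/columns of the colormap
print(per_meter.sum(axis=1)[:3].round(1))    # every row sums to 100
```

Passing `per_meter` to an image plot would give a colormap with one row per meter and one column per PC, comparable in layout to Figure 8.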


Combining all households in this analysis could potentially bias the above results. However, Figure 3 and Supplementary material, Figure S1 show that the lot size and pool availability affect the amplitude of water consumption but have little impact on its temporal evolution. Therefore, conducting PCA for lot size or pool availability groupings would lead to similar conclusions. In contrast, scheduled irrigation systems do appear to affect water demand timing (Figure 4), so the sequential PCA analysis was repeated for the two samples without (4,909 households) and with (5,091 households) scheduled irrigation systems.

Similar to Figure 6, Supplementary material, Figures S2 and S3 display the evolution of PCA scores for the first and last principal components within each subset. The daily pattern of PC scores is less erratic in households without programmed irrigation (Supplementary material, Figure S2(b)) than for all homes (Figure 6(b)), and it is most erratic in the group with scheduled irrigation (Supplementary material, Figure S3(b)). This is likely due to the variability of irrigation timing in homes with programmed irrigation, whereas indoor uses are likely more consistent between homes. Households without programmed irrigation adopt manual periodic irrigation (e.g., with a weekly or shorter period) and are therefore likely to be more similar to one another than households with programmed irrigation. These trends are seen through PCA, which represents the periodicity of groups of residences.

Following recommended watering patterns, the morning peak is higher in homes with timed irrigation systems (Supplementary material, Figure S3) compared to residences without them (Supplementary material, Figure S2). Peaks are in general lower in winter (Supplementary material, Figure S2(c) and S3(c)) than in summer (Supplementary material, Figure S2(d) and S3(d)). This indicates that most scheduled irrigation activities take place in the morning and are especially intense in summer, which is consistent with Oro Valley's recommended practice.

The strong PC1 peaks for all days in homes with scheduled irrigation demonstrate its significant influence on demand. Interestingly, households without programmed irrigation in summer (Supplementary material, Figure S2(d)) and with programmed irrigation in winter (Supplementary material, Figure S3(c)) both have strong early morning weekday PC1 peaks. This may suggest that the former homes are watering in some manner early in the day and that many residents in the latter group are not turning off their irrigation systems in the fall. However, PCA can only assess relative consumption, so this hypothesis should be tested considering absolute consumption values and typical practices at the household level. In homes without programmed irrigation, PC1 reflects the decrease and wider distribution of morning demand on weekends and is tweaked by the behavior of the other PCs.

The colormaps (Supplementary material, Figure S4 – no scheduled irrigation system – and Supplementary material, Figure S5 – with scheduled irrigation system) are predominantly blue and demonstrate that a significant number of PCs are needed to explain the variability of household water consumption for these household classes. Table 2 lists the mean and variance of the percentage of variance explained by PC1, PC2 and PC3 for all meters and the subclasses without and with programmed irrigation systems. These values are equivalent to the mean and variance of the first three columns in the colormaps in Figure 8, Supplementary material, Figures S4 and S5.

Table 2

Mean and variance of the percentage of variance explained by PC1, PC2 and PC3 for different temporal scales and types of meters

| Temporal level | PC | All meters (10,000): Mean | All meters (10,000): Var | Meters without scheduled irrigation system (4,909): Mean | Meters without scheduled irrigation system (4,909): Var | Meters with scheduled irrigation system (5,091): Mean | Meters with scheduled irrigation system (5,091): Var |
|---|---|---|---|---|---|---|---|
| Weekly | PC1 | 22.9 | 456.4 | 17.0 | 317.4 | 29.0 | 523.1 |
| Weekly | PC2 | 8.8 | 142.1 | 8.8 | 134.3 | 8.7 | 145.7 |
| Weekly | PC3 | 5.3 | 68.3 | 5.1 | 62.0 | 5.5 | 75.0 |
| Daily | PC1 | 3.9 | 61.8 | 3.8 | 42.6 | 5.5 | 156.8 |
| Daily | PC2 | 2.8 | 30.7 | 1.7 | 17.3 | 3.0 | 23.0 |
| Daily | PC3 | 1.6 | 23.2 | 1.1 | 10.3 | 2.4 | 94.8 |
| Hourly | PC1 | 8.6 | 85.6 | 6.1 | 27.4 | 11.7 | 130.1 |
| Hourly | PC2 | 3.9 | 35.5 | 2.5 | 10.2 | 5.7 | 57.1 |
| Hourly | PC3 | 2.8 | 16.6 | 2.4 | 14.3 | 3.6 | 26.9 |

The mean of PC1 is higher for the scheduled irrigation meters subgroup for all temporal aggregation levels. The variance of PC1 is also considerably higher in this subclass, which is anticipated given the variability in irrigation household schedules. Trends are less clear for PC2 and PC3 at all temporal levels. However, nearly all means and variances of homes with (without) programmed irrigation are larger (smaller) than the all-meter class, which is a weighted combination of the two subclasses.
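The "nearly all" statement can be checked directly against the Table 2 values. The short script below is only a bookkeeping sketch: it counts the rows in which the all-meter value lies between the value for homes without and with programmed irrigation. The values are copied from Table 2; the helper name `bracketed` is introduced here.

```python
# (temporal level, PC): (all meters, without irrigation, with irrigation),
# values copied from Table 2.
means = {
    ("Weekly", "PC1"): (22.9, 17.0, 29.0),
    ("Weekly", "PC2"): (8.8, 8.8, 8.7),
    ("Weekly", "PC3"): (5.3, 5.1, 5.5),
    ("Daily", "PC1"): (3.9, 3.8, 5.5),
    ("Daily", "PC2"): (2.8, 1.7, 3.0),
    ("Daily", "PC3"): (1.6, 1.1, 2.4),
    ("Hourly", "PC1"): (8.6, 6.1, 11.7),
    ("Hourly", "PC2"): (3.9, 2.5, 5.7),
    ("Hourly", "PC3"): (2.8, 2.4, 3.6),
}
variances = {
    ("Weekly", "PC1"): (456.4, 317.4, 523.1),
    ("Weekly", "PC2"): (142.1, 134.3, 145.7),
    ("Weekly", "PC3"): (68.3, 62.0, 75.0),
    ("Daily", "PC1"): (61.8, 42.6, 156.8),
    ("Daily", "PC2"): (30.7, 17.3, 23.0),
    ("Daily", "PC3"): (23.2, 10.3, 94.8),
    ("Hourly", "PC1"): (85.6, 27.4, 130.1),
    ("Hourly", "PC2"): (35.5, 10.2, 57.1),
    ("Hourly", "PC3"): (16.6, 14.3, 26.9),
}

def bracketed(values):
    """True if: without irrigation <= all meters <= with irrigation."""
    all_meters, without, with_irrigation = values
    return without <= all_meters <= with_irrigation

exceptions = [
    (name, key)
    for name, table in (("mean", means), ("variance", variances))
    for key, values in table.items()
    if not bracketed(values)
]
n_total = len(means) + len(variances)
n_ok = n_total - len(exceptions)
print(f"{n_ok} of {n_total} entries are bracketed; exceptions: {exceptions}")
```

With the tabulated values, 16 of the 18 entries satisfy the ordering; the two exceptions are the weekly PC2 mean and the daily PC2 variance.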

This analysis demonstrates that the variability of household water consumption is difficult to explain (i.e., requires many PCs) even for residences that have key characteristics in common (e.g., a scheduled irrigation system). It also suggests that individual residence water consumption models should include a common component with a low or medium weight and other behaviors/tweaks that may be specific to each household.

Since the water consumption of each meter is unpredictable and any possible modeling would be complicated (due to the presence of zero-consumption periods and how to treat them, the high variety of individual patterns, etc.), it is worth asking whether it is truly indispensable to model each household individually for some specific applications. If demand is characterized with the aim of analyzing the state of the hydraulic network, it may be enough to characterize the behavior on a spatially aggregated level. The following subsection analyzes the extent to which statistical properties are simplified by aggregation in the upstream direction. As the availability of a scheduled irrigation system does not seem to influence how the variance is explained, the remainder of this work deals with the whole set of 10,000 meters.

Spatially aggregated users

The previous section highlighted the high number of PCs needed for demand prediction for individual households. The exploratory data analysis showed that, in general, data with a higher level of spatial aggregation have higher correlations. Given these higher correlations in demand, a logical follow-up is to determine whether demands for spatially aggregated users can be predicted with better accuracy.

Table 3 lists the percentage of the variance explained by the first PC for different temporal and spatial aggregation levels. First, the percentage of explained variance increases with the level of spatial aggregation. Next, similar to results in Figure 7, the percentage of variance explained by PC1 is greatest for weekly temporal aggregation followed by hourly and finally daily temporal scales for the four spatial aggregation levels. Thus, for daily and hourly temporal aggregations, a higher spatial aggregation level is needed to reach the same level of PC1's explained variance in the weekly analysis.

Table 3

Percentage of explained variance by PC1 for different temporal scales and spatial aggregation levels

| Temporal level | Individual meters | Every 10 meters | Every 100 meters | Every 1,000 meters |
|---|---|---|---|---|
| Weekly | 22.9 | 56.0 | 91.5 | 98.8 |
| Daily | 3.9 | 10.6 | 47.7 | 88.2 |
| Hourly | 8.6 | 37.9 | 83.2 | 96.4 |

This table shows that reasonable percentages of explained variance (over 90%) for weekly and hourly time scales can be achieved when groups of 1,000 meters are aggregated (and, for the weekly scale, already with groups of 100 meters). Demand aggregation is therefore promising for developing water consumption models to characterize a network's hydraulic behavior, particularly for dense urban systems with a large number of customers served through a node.
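The trend in Table 3 can be reproduced qualitatively with a toy experiment. Everything in the sketch below is assumed for illustration: the synthetic diurnal pattern, the multiplicative gamma noise, the sizes and the helper names. It does not use the Oro Valley data and is not expected to match the tabulated percentages.

```python
import numpy as np

def pc1_explained(data):
    """Percentage of total variance explained by PC1 of standardized data."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    s = np.linalg.svd(z, compute_uv=False)
    return float(100.0 * s[0] ** 2 / np.sum(s**2))

def aggregate(data, group_size):
    """Sum consecutive blocks of `group_size` columns (meters) into groups."""
    n_groups = data.shape[1] // group_size
    trimmed = data[:, : n_groups * group_size]
    return trimmed.reshape(data.shape[0], n_groups, group_size).sum(axis=2)

rng = np.random.default_rng(4)
n_steps, n_meters = 336, 2000                # hypothetical: 2 weeks, hourly
hours = np.arange(n_steps) % 24
# Hypothetical diurnal pattern with a morning and an evening peak.
pattern = (1.0
           + np.exp(-0.5 * ((hours - 7) / 1.5) ** 2)
           + 0.6 * np.exp(-0.5 * ((hours - 19) / 2.0) ** 2))
# Strong multiplicative household noise with unit mean (synthetic).
demand = pattern[:, None] * rng.gamma(shape=0.3, scale=1 / 0.3,
                                      size=(n_steps, n_meters))

shares = {g: pc1_explained(aggregate(demand, g)) for g in (1, 10, 100)}
for g, share in shares.items():
    print(f"groups of {g:>3} meters: PC1 explains {share:.1f}% of the variance")
```

As in the paper's setup, the number of columns shrinks as the group size grows, so part of the increase reflects the smaller number of series; the dominant effect in this toy case is the cancellation of household noise within each group.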

Aggregation and apparent correlation

Results show that consumption/flow series have some commonality depending on the spatial aggregation level. When the spatial aggregation level is low (i.e., micro scale), users have weak relationships among each other, and a significant number of principal components is needed to explain the variance of water consumption. The average consumption pattern is not related to any specific individual series because consumption at each household experiences significant variations. Correlation appears when water consumption is aggregated, as water users share similar external factors and fewer principal components are needed to explain the variability of the flow series.

In other words, random variabilities and individual user differences cancel out when aggregating households (i.e., macro scale), and the average pattern is what remains after the aggregation process. The correlation that appears (i.e., apparent correlation) is a result of causality that is determined by broad user behavior (Díaz & González 2022). For example, consider the morning peak, when most consumers wake, prepare for the day, and leave their homes sometime between 6:30 and 9:00 am. With a small set of users, the distribution of water use may be dispersed over the full 2.5 hours. This effect is seen in the low correlations and explained variance for individual meters and ten-user groups (Figure 5 and Table 3, respectively). As more users are included, the signal becomes more distinct and recognizable at specific times, and water use can be more readily explained. Temporal aggregation will likely have a similar impact. This hypothesis would be better tested with data at shorter time intervals (i.e., below one hour).
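The emergence of apparent correlation can also be illustrated analytically under a deliberately simple additive model. This model is an assumption introduced here, not the one used in this work: each household series is the sum of a shared signal with variance $\sigma_s^2$ and independent household noise with variance $\sigma_e^2$, and two disjoint groups of $n$ households are compared.

```latex
% Assumed toy model: x_i(t) = s(t) + e_i(t), with Var(s) = \sigma_s^2,
% Var(e_i) = \sigma_e^2, and e_i independent of s and of each other.
\begin{align}
\rho_1 &= \frac{\sigma_s^2}{\sigma_s^2 + \sigma_e^2}
  && \text{(correlation between two individual households)} \\
A &= \sum_{i=1}^{n} x_i, \qquad B = \sum_{j=n+1}^{2n} x_j
  && \text{(two disjoint groups of } n \text{ households)} \\
\operatorname{Cov}(A,B) &= n^2 \sigma_s^2, \qquad
\operatorname{Var}(A) = \operatorname{Var}(B) = n^2 \sigma_s^2 + n \sigma_e^2 \\
\rho_n &= \frac{n^2 \sigma_s^2}{n^2 \sigma_s^2 + n \sigma_e^2}
        = \frac{n \rho_1}{1 + (n-1)\rho_1}
\end{align}
% Example with \rho_1 = 0.05:
% \rho_{10} \approx 0.34, \quad \rho_{100} \approx 0.84, \quad \rho_{1000} \approx 0.98
```

Under this assumption, a correlation of only 0.05 between individual households grows to roughly 0.84 between groups of 100 and 0.98 between groups of 1,000, because the shared signal adds coherently while the household noise averages out.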

Micro vs macro scales and demand modeling

The data analysis above suggests that modeling/forecasting water consumption on a customer (micro scale) basis requires a flexible and versatile model that can adapt to each user (e.g., household waking time). On the other hand, modeling/forecasting for aggregated users (hundreds or thousands of users, i.e., macro scale) should be less complex and therefore more accurate.

High resolution demand (micro) models are relatively new. In their literature review, Creaco et al. (2017) identified two types of household demand prediction models. Household models (first type) directly estimate residential consumption (e.g., Buchberger & Wu 1995) using high resolution flow measurements to adjust site-specific statistical process parameters. The second type, known as end-use models, builds household consumption by summing micro-component (end-use/fixture) demands (e.g., SIMDEUM – Blokker et al. 2010). End-use models are based on Monte Carlo simulations that are driven by surveyed/measured/estimated end-use intensity, duration and frequency (IDF) parameters and socioeconomic information.

This work shows that variations appear in the same household at different times and between households at the same time, so a sufficient number of simulations and households are needed to accurately compute an average consumption pattern and its associated variance (Blokker et al. 2011; Díaz & González 2021). Further, model parameters should be specified for each customer using flow measurements or precise inhabitant information. That is, running simulations using ‘off the shelf’ IDF parameters is unlikely to be representative of a specific household that would require adjusted IDF parameters. Thus, without precise site-specific data, end-use models are appropriate to model a significant number of aggregated customers (e.g., 100–1,000). The lack of correlation and significant number of PCs needed according to the analysis presented here suggest that individual residence variability will result in large errors in small samples (10 or less).
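The dependence of the average pattern on the number of simulated households can be illustrated with a toy Monte Carlo end-use simulation. This sketch is not SIMDEUM or any published model: the intensity, duration and frequency values, the uniform start-hour window and the function names are hypothetical placeholders chosen only to show the effect of sample size.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_household_day(rng, events_per_day=8.0):
    """Toy end-use simulation: hourly volumes (L) for one household-day.

    All parameter values are hypothetical, not calibrated IDF parameters.
    """
    hourly = np.zeros(24)
    n_events = rng.poisson(events_per_day)                # frequency
    start_hours = rng.integers(6, 23, size=n_events)      # hours 6 to 22
    durations = rng.exponential(3.0, size=n_events)       # duration, min
    intensities = rng.uniform(4.0, 12.0, size=n_events)   # intensity, L/min
    # Each event's volume is assigned to the hour in which it starts.
    np.add.at(hourly, start_hours, durations * intensities)
    return hourly

def pattern_spread(n_households, n_repeats=50):
    """Mean coefficient of variation of the average hourly pattern."""
    patterns = np.array([
        np.mean([simulate_household_day(rng) for _ in range(n_households)],
                axis=0)
        for _ in range(n_repeats)
    ])
    active = patterns.mean(axis=0) > 0                    # skip empty hours
    cv = patterns[:, active].std(axis=0) / patterns[:, active].mean(axis=0)
    return float(cv.mean())

spread = {n: pattern_spread(n) for n in (10, 100, 1000)}
for n, cv in spread.items():
    print(f"{n:>4} households: CV of the average hourly pattern = {cv:.2f}")
```

In this toy setting the spread of the average pattern falls roughly with the square root of the number of households, so an average computed from about 10 homes remains far noisier than one computed from hundreds.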

Finally, from the data presented here, macro models that are generally statistically based will be useful for 100–1,000 households. More data and analysis are needed for both micro and macro models on spatial aggregations between 10 and 100 households.

Hydraulic modeling scales

The expectations on accurate demands have implications for the temporal and spatial hydraulic modeling scales, particularly for applications like leakage assessment, real-time modeling or water quality analysis. Estimating demands for one hundred or more households may be appropriate for nodes in urban areas, DMAs or small systems. Defining nodal demands in suburban areas, such as in the US, may prove more difficult with 20 or fewer homes being supplied by each node. Modeling is more complex if commercial, industrial and/or public uses (out of the scope of this paper) coexist with residential consumption (Creaco et al. 2017).

While estimating demands is a valuable research direction, understanding the effect of its uncertainty on model predictions will be a key driver for practitioner modeling. Model outputs of concern include pressures for real-time control and leakage management, and water quality for chlorine injection rates. All are functions of nodal demands and their spatial distribution. In an all-pipe hydraulic model, nodal demands may be driven by a small number of users and have high uncertainty. Aggregating nodes (and pipes) increases the number of consumers and reduces demand prediction errors, but aggregation introduces model representation errors. Impacts will vary for hydraulic modeling and local water quality analyses.

Cost-benefit analyses should be carried out to define the scale of interest for researchers/practitioners depending on their specific application. For example, if a hydraulic model is to be calibrated with a limited budget for instrumentation, building the hydraulic model down to each water connection may be impractical. Rather than locating flow meters at the entrance to a few households, which only characterizes what happens in those homes, installing flow meters at strategic positions that aggregate several users may provide a better representation of reality. If leakage is to be assessed (i.e., quantifying the amount of water lost), water balances on hourly or daily data may be sufficient, while leak location may require better demand resolution. As noted, the spatial aggregation level conditions the temporal resolution. Therefore, scale effects should be considered to optimize the instrumentation and monitoring strategy according to economic, technical and/or technological constraints for each application.

Future work

Results from this work show that different types of users coexist in Oro Valley's data set. These differences occur with and without scheduled irrigation systems, but outdoor use clearly complicates water consumption patterns. This work has not attempted to isolate outdoor uses due to the relatively low measurement resolution (1 gallon and 1 hour). Previous works show that higher resolution measurements are needed to separate outdoor from indoor use (e.g., Meyer et al. 2021). Of interest is assessing the similarities among users for only indoor use through PCA in other data sets. Moreover, if data were not anonymized, satellite image processing (e.g., Halipu et al. 2022) or weather metadata (e.g., Xenochristou et al. 2019) could be used to correlate outdoor use to other external factors. Clustering could also be useful to strategically aggregate users (Noiva et al. 2016) in ways that maximize their apparent correlation.

This work assesses the stochastic structure of water consumption on multiple temporal and spatial scales through the analysis of a unique residential data set. A preliminary analysis is conducted to examine the variability in the consumption series. Sequential PCA is then applied to assess the relationship among users at different temporal scales (weekly, daily, and hourly) and to explore the spatial aggregation effect in flow series (individual users vs groups every 10, 100, and 1,000 meters).

In this unique data set, individual household consumption is uncorrelated irrespective of the temporal scale, and correlation (i.e., apparent correlation) grows with spatial aggregation as a result of causality. Thus, modeling/forecasting water consumption per customer (i.e., micro scale) will require a versatile type of model that is to be adjusted to individual users. However, the effort required to build models per customer may be substantial and prohibitive when building a hydraulic model for simulating network flows within a water system. Standard parameters may be enough to get average patterns from such models on a macro scale, but not on a micro scale. Using groups of consumers (rather than individual household water demands) may be useful for some applications (such as hydraulic model construction or leakage assessment) but not for others (e.g., leakage location).

Thus, using this unique data set, water consumption/flow series are shown to have some commonality depending on the spatial aggregation level. Previous studies have discussed the aggregation effect, but the novelty of this work lies in demonstrating that demand series must be aggregated from many (perhaps hundreds or thousands) users/meters to accurately analyze/model/forecast actual demand patterns with common approaches. If aggregation is insufficient, common patterns will not exist because individual users behave randomly, and modeling and forecasting will be compromised. Therefore, the level of aggregation will determine the ability of the model to explain demand variability and may vary depending on the specific modeling needs and application characteristics.

Further research is needed to weigh the impacts of mixed uses on demand aggregation and subsequent modeling. In addition, the potential for improving predictability and understanding of users should be explored through clustering and follow-on statistical analysis. Finally, as AMI becomes more prevalent, other data sets, including those with short reporting intervals, should be examined to better understand water consumption and demand forecasting.

The authors thank Oro Valley Water Utility and Tetra Tech for collaborating with us by providing the data and answering questions about the utility and data. S.D. thanks the Spanish Ministry of Universities for the financial support (CAS21/00392) provided to visit the University of Arizona in Fall 2022 under the Fulbright Program. M.P. thanks the Austrian Marshall Plan Foundation for financially supporting his stay at the University of Arizona in 2022–2023.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Blokker E. J. M., Vreeburg J. H. G., Buchberger S. G. & van Dijk J. C. 2008 Importance of demand modelling in network water quality models: A review. Drinking Water Eng. Sci. 1, 27–38. https://doi.org/10.5194/dwes-1-27-2008.
Blokker E. J. M., Vreeburg J. H. G. & van Dijk J. C. 2010 Simulating residential water demand with a stochastic end-use model. J. Water Resour. Plann. Manage. 136 (1), 19–26. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000002.
Blokker E., Beverloo H., Vogelaar J., Vreeburg J. & van Dijk J. 2011 A bottom-up approach of stochastic demand allocation in a hydraulic network model: A sensitivity study of model parameters. J. Hydroinf. 13 (4), 714–728. https://doi.org/10.2166/hydro.2011.067.
Buchberger S. G. & Wu L. 1995 Model for instantaneous residential water demands. J. Hydraul. Eng. 121 (3), 232–246. https://doi.org/10.1061/(ASCE)0733-9429(1995)121:3(232).
City of Tucson 2023 Seasonal Watering and Landscape Resources. Available from: https://www.tucsonaz.gov/water/landscape (accessed 20 June 2023).
Cominola A., Giuliani M., Piga D., Castelletti A. & Rizzoli A. E. 2015 Benefits and challenges of using smart meters for advancing residential water demand modeling and management: A review. Environ. Modell. Software 72, 198–214. https://doi.org/10.1016/j.envsoft.2015.07.012.
Creaco E., Blokker M. & Buchberger S. 2017 Models for generating household water demand pulses: Literature review and comparison. J. Water Resour. Plann. Manage. 143 (6), 04017013. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000763.
De Oliveira P. J. A. & Boccelli D. L. 2021 Water distribution nodal demand clustering based on network flow measurements. J. Water Resour. Plann. Manage. 147 (12), 04021087. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001485.
Díaz S. & González J. 2020 Analytical stochastic microcomponent modeling approach to assess network spatial scale effects in water supply systems. J. Water Resour. Plann. Manage. 146 (8), 04020065. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001237.
Díaz S. & González J. 2021 Temporal scale effect analysis for water supply systems monitoring based on a microcomponent stochastic demand model. J. Water Resour. Plann. Manage. 147 (5), 04021023. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001352.
Díaz S. & González J. 2022 Short-term variability effect on peak demand: Assessment based on a microcomponent stochastic demand model. Water Resour. Res. 58, e2021WR030532. https://doi.org/10.1029/2021WR030532.
Filion Y. R., Karney B., Moughton L. J., Buchberger S. G. & Adams B. J. 2006 Cross correlation analysis of residential demand in the city of Milford, Ohio. In: Water Distribution System Analysis 8, Cincinnati, Ohio, USA.
Halipu A., Wang X., Iwasaki E., Yang W. & Kondoh A. 2022 Quantifying water consumption through satellite estimation of land use/land cover and groundwater storage changes in a hyper-arid region of Egypt. Remote Sens. 14 (11), 2608. https://doi.org/10.3390/rs14112608.
Jolliffe I. T. 2002 Principal Component Analysis, 2nd edn. Springer Series in Statistics, Springer, USA.
Jolliffe I. T. & Cadima J. 2016 Principal component analysis: A review and recent developments. Philos. Trans. A 374, 20150202. https://doi.org/10.1098/rsta.2015.0202.
Kang D. & Lansey K. 2009 Real-time demand estimation and confidence limit analysis for water distribution systems. J. Hydraul. Eng. 135 (10), 825–837. https://doi.org/10.1061/(ASCE)HY.1943-7900.0000086.
Larson R. & Falvo D. C. 2009 Elementary Linear Algebra. Houghton Mifflin Harcourt Publishing Company, Boston, USA.
Li R., Chester M. V., Hondula D. M., Middel A., Vanos J. K. & Watkins L. 2023 Repurposing mesoscale traffic models for insights into traveler heat exposure. Transp. Res. D: Transp. Environ. 114, 103548. https://doi.org/10.1016/j.trd.2022.103548.
Magini R., Pallavicini I. & Guercio R. 2008 Spatial and temporal scaling properties of water demand. J. Water Resour. Plann. Manage. 134 (3), 276–284. https://doi.org/10.1061/(ASCE)0733-9496(2008)134:3(276).
Mazzoni F., Alvisi S., Blokker M., Buchberger S. G., Castelletti A., Cominola A., Gross M. P., Jacobs H. E., Mayer P., Steffelbauer D. B., Stewart R. A., Stillwell A. S., Tzatchkov V., Yamanaka V. H. A. & Franchini M. 2022 Investigating the characteristics of residential end uses of water: A worldwide review. Water Res. 119500. https://doi.org/10.1016/j.watres.2022.119500.
Meyer B. E., Nguyen K., Beal C. D., Jacobs H. E. & Buchberger S. G. 2021 Classifying household water use events into indoor and outdoor use: Improving the benefits of basic smart meter data sets. J. Water Resour. Plann. Manage. 147 (12), 04021079. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001471.
Noiva K., Fernández J. E. & Wescoat J. L. Jr. 2016 Cluster analysis of urban water supply and demand: toward large-scale comparative sustainability planning. Sustainable Cities Soc. 27, 484–496. https://doi.org/10.1016/j.scs.2016.06.003.
Oberascher M., Rauch W. & Sitzenfrei R. 2022 Towards a smart water city: A comprehensive review of applications, data requirements, and communication technologies for integrated management. Sustainable Cities Soc. 76, 103442. https://doi.org/10.1016/j.scs.2021.103442.
Oro Valley 2023 Water Conservation.
Shlens J. 2014 A tutorial on principal component analysis. ArXiv, https://doi.org/10.48550/arXiv.1404.1100.
Tetra Tech 2022 Water Utility Potable Water Advanced Metering Infrastructure (AMI) Data Analytics Evaluation Model. Volume I: Final Report. Town of Oro Valley, Arizona, USA.
USCB 2022b America's Families and Living Arrangements: 2022. United States Census Bureau.
USCB 2023 Quick facts: Oro Valley town, Arizona. Available from: https://www.census.gov/quickfacts/fact/table/orovalleytownarizona/POP010220#POP010220 (accessed 20 June 2023).
USEPA 2023 Advanced Metering Infrastructure.
USGS 2018 Estimated Use of Water in the United States in 2015. United States Geological Survey. Water Availability and Use Science Program, Virginia, USA.
Xenochristou M., Kapelan Z. & Hutton C. 2019 Using smart demand-metering data and customer characteristics to investigate influence of weather on water consumption in the UK. J. Water Resour. Plann. Manage. 146 (2), 04019073. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001148.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).
