An accurate estimation of residential end uses of water is helpful in developing efficient water systems. If not obtainable through direct metering, this information can be gathered by disaggregating and classifying household-level water-use data. However, most automated techniques require fine-resolution data (e.g., 1 s) and end-use parameters which may be unavailable to water utilities. To fill this gap, this study presents a method for the automated disaggregation and classification of indoor water-use data collected at the 1-min temporal resolution, relying exclusively on the end-use parameter values available in the literature. Specifically, the features of each water-use event detected at the household level are compared against the most common event features for the selected end-use category. The results obtained by testing the method with real data collected at 13 households in two different countries (Italy and the Netherlands) confirm its potential in disaggregating and classifying water end-use events with an average accuracy higher than 90% and an average normalized root-mean-square error lower than 0.06 despite the lack of information about end uses in individual households. This demonstrates that end-use detection is possible even with data whose resolution is closer to that of most commercial water meters.

  • An enhanced method for water end-use classification is presented.

  • The method processes household water-use data collected at the 1-min resolution.

  • Method performance is evaluated with data collected in different spatial contexts.

  • General, literature-based, end-use parameter values are applied to test the method.

  • Method average accuracy is demonstrated to be higher than 90%.

Urbanization, population growth, and climate change are increasingly affecting water availability in many regions of the world (Bouziotas et al. 2015; Cosgrove & Loucks 2015; Wang et al. 2023). In this era of pressing environmental issues and decreasing availability of resources, water utilities are faced with key decisions to ensure the sustainable management of water systems (Goharian & Burian 2018; Li & Song 2023). More specifically, water-system managers have to adopt strategies to cope with water shortage and thus ensure water availability to future generations (Zanfei et al. 2022; Ghamkhar et al. 2023), ranging from water-use restrictions to leakage monitoring and control, a revision of water prices and rates, smart water policies, the introduction of incentives to install water-saving appliances, information, or education (Gleick et al. 2003). However, an accurate estimation of water consumption over space and time – going beyond the limited application of coarse-resolution data (Melville-Shreeve et al. 2021) – is typically needed to cover knowledge gaps (Bastidas Pacheco et al. 2023), implement most of the above strategies, and evaluate their effects (Koop et al. 2021; Surendra & Deka 2022; Zhang et al. 2024).

In the last decades, owing to the diffusion of smart meters and paired software, several studies have been carried out with the aim of investigating the characteristics of residential water-use with fine levels of spatio-temporal detail, i.e., up to the domestic micro-component scale (end-use level) and with hourly to sub-minute temporal resolution (Mazzoni et al. 2023a). Water-use data collected with fine spatio-temporal levels of detail can be applied not only to carry out strategic assessments in the field of short- and long-term planning of water distribution systems (Stewart et al. 2018), but also to test water demand models (e.g., Blokker et al. 2010), develop technologies aimed at saving or reusing water (Agudelo-Vera et al. 2013), or provide feedback to ensure a sustainable behaviour towards water-use (Beal et al. 2011; Cominola et al. 2021).

Despite their wide range of applications, end-use data may not be directly collectable in the field due to the high costs of monitoring tools, practical difficulties in installing smart meters upon each domestic end-use (i.e., tap, shower, etc.), and user rejection (Mazzoni et al. 2021). These limitations have led to the development of several approaches allowing household-level data to be automatically – or semi-automatically – decomposed (disaggregated) from the aggregate signal, and then assigned (classified) to the different end-use categories (thus enabling a large amount of information about the end uses of water to be gathered). Specifically, several techniques for automated end-use disaggregation and classification have been developed in the last two decades (e.g., Mayer et al. 1999; Kowalski & Marshallsay 2003; Fontdecaba et al. 2013; Nguyen et al. 2013; Cominola et al. 2018; Bethke et al. 2021; Mazzoni et al. 2021; Heydari et al. 2022) based on different approaches – such as decision tree algorithms and machine learning algorithms (Yang et al. 2018) – and relying on household-level data collected at different sampling frequencies (Clifford et al. 2018). However, it is worth observing that – on the one hand – most of the methods for water end-use disaggregation and classification can process data collected at high or very high sampling frequency, i.e., 1–10 s (Mayer et al. 1999; Kowalski & Marshallsay 2003; Fontdecaba et al. 2013; Nguyen et al. 2013; Bethke et al. 2021; Attallah et al. 2023), which may not be easily accessible to water utilities due to issues with meter power sources, battery life, or telemetry network capacity (Heydari et al. 2022). On the other hand, only a small group of disaggregation and classification methods have been developed to process data collected with coarser sampling frequency, e.g., 1-min (Cominola et al. 2018; Mazzoni et al. 2021; Heydari et al. 2022). 
Still, validation was typically performed only with synthetically generated water-use data (Cominola et al. 2018) or with a limited amount of data collected in the field at the end-use level (Mazzoni et al. 2021; Heydari et al. 2022). Overall, the following two major difficulties affect the state-of-the-art of automated methods for water end-use disaggregation and classification: (i) the majority of automated disaggregation and classification methods can only process water-use data collected with sampling frequencies that may be unavailable to water utilities; and (ii) independently of their nature (i.e., decision trees versus machine-learning algorithms), these methods typically require the input of end-use parameters or training datasets. On the one hand, end-use parameters may be difficult to gather because of the economic and technical limitations affecting intrusive monitoring. On the other hand, most end-use datasets available in the literature have been obtained through automated disaggregation of household-level data, so confidence in classified events is lower (Attallah et al. 2023).

In light of the above difficulties and challenges, the following research question is considered in this study: Is it possible to automatically perform water end-use disaggregation and classification based on data collected with a temporal resolution which is similar to that of many commercial water meters (i.e., more accessible to water utilities) and with no specific information about end-use parameters (i.e., by exclusively relying on the end-use parameters available in the literature)?

To address this question, the study proposes a revised version of the automated method for end-use disaggregation and classification originally developed by Mazzoni et al. (2021) and applicable to water-use data collected at 1-min temporal resolution. In its original structure, the process relied on detailed information about daily periods of fixture use, which was required as input. Therefore, end-use disaggregation and classification were performed only in relation to specific daily periods for each end-use category, making the process strictly time dependent. Moreover, in the event that the number of daily water-use events relatable to a given end-use category exceeded a daily threshold (required as an input), only the earliest were assigned to the category concerned. Thus, in addition to being time-dependent, end-use classification was not performed by prioritizing those events which were most likely related to a given end-use category (i.e., the characteristics of which were the closest to the reference features for the category selected). It is also worth noting that (i) the method was originally tested with a limited dataset of water-use data from four households located in the same geographical area (northern Italy), with no considerable variations in the number of residents; and (ii) disaggregation and classification were performed by exploiting a set of end-use parameter values obtained by installing smart water meters upon individual fixtures in the above-mentioned household sample, which may be inapplicable on a large scale.

To overcome the practical limitations affecting the applicability of its original version, the revised method relies on a new similarity metric allowing time-independent classification. This is possible by considering those water-use events whose characteristics are the closest to the reference water-use characteristics for each given end-use category. In addition, unlike the original version, the effectiveness of the enhanced method is tested with water-use data collected in two different locations (i.e., Italy versus The Netherlands) and by applying a set of end-use parameter values based on the information available in the literature, with the aim of demonstrating its performance in the event that detailed information on water-use characteristics for individual households is not available. To the authors' knowledge, this study represents the first case in the literature in which (i) end-use disaggregation and classification are applied by exclusively relying on end-use parameters available in the literature; and (ii) the performance of an automated method for the disaggregation and classification of water-use data collected at a coarse resolution (i.e., 1 min) is tested by exploiting real end-use data from different countries.

In the following sections, the layout of the enhanced methodology for automated end-use disaggregation and classification is presented, along with the characteristics of the two end-use databases adopted to test its robustness (Materials and Methods). The most relevant results obtained by applying the enhanced method are then discussed (Results and Discussion). Finally, key findings and the most significant outcomes of the research are outlined (Conclusions).

The enhanced method for water end-use disaggregation and classification

The enhanced method for water end-use disaggregation and classification presented in this paper is a revised version of the rule-based method originally proposed by Mazzoni et al. (2021). The method is applicable to water-use data at 1-min resolution. Overall, the following five categories of indoor water-use are detectable: dishwasher, washing machine, shower, taps, and toilet flush.

The main structure of the method is shown in Figure 1. Similar to the original version, it detects (i.e., disaggregates and classifies) individual water-use events one end-use category at a time, relying on a set of dedicated functions. These functions are applied in a specific order, starting with the detection of automated water-use events (characterized by several water-inflow events per operational programme), and ending with the detection of other indoor water uses. In greater detail, appliance uses (i.e., washing-machine and dishwasher uses) are first searched by applying the function for appliance-use identification; shower events are then detected by means of the function for shower-use identification; finally, residual events – i.e., toilet uses, tap uses, or a combination of them – are detected and classified using a single function, given that these end-use events are generally less evident at 1-min resolution due to their limited duration. At the end of the process, water-use time series are available for each end-use category.

Figure 1. Method layout.

From an operational standpoint, the enhanced method – developed by using the MATLAB R2019a® software – includes a main code (in which household-level water-use data and the end-use parameter values for disaggregation and classification are loaded) and the three abovementioned functions for end-use detection. It is worth noting that, although end-use detection functions are still applied starting with those aimed at detecting electric appliances and ending with those aimed at identifying shower, toilet, and tap uses, end-use detection functions were substantially revised to overcome the limitations affecting the original method applicability, and thus to make the approach transferable to a variety of residential contexts.

The general layout of end-use detection functions is shown in Figure 2. Overall, disaggregation and classification of water-use events of a given end-use category $j$ are performed by comparing the features of each detected water-use event $i$ – mainly volume (but, for some categories, also duration, as detailed in the following) – against: (a) the minimum and maximum allowable values of event features for the selected end-use category $j$ (hereinafter called extreme end-use parameter values) and (b) the features of a reference event for the selected end-use category $j$ (hereinafter called reference end-use parameter values). In greater detail, extreme end-use parameter values are first considered to exclude all those events which are not compatible with the possible water-use events of the selected end-use category $j$. All the events possibly related to the $j$-th end-use category are then counted on a daily basis: if their daily per capita frequency of use $n/N_{inh}$ ($n$ and $N_{inh}$ being the number of counted uses and the number of inhabitants, respectively) is lower than the maximum allowable daily per capita frequency of use $f_j^{max}$ for the category selected, all the daily events are considered as end-use events of the $j$-th category and removed from the household-level water-use time series. Conversely, if their daily per capita frequency of use is greater than the maximum allowable daily per capita frequency of use (i.e., $n > n_j^{max}$, $n_j^{max}$ being $f_j^{max} \cdot N_{inh}$), reference end-use parameter values are considered to quantify the level of similarity between each of the $n$ selected water-use events and the reference event of the end-use category $j$. In this case, only the $n_j^{max}$ most similar events are assigned to the category concerned (and thus removed from the household-level water-use time series), whereas the others are neglected.

Figure 2. General layout of end-use detection functions making up the enhanced method for automated water end-use disaggregation and classification.

Event similarity is assessed through the calculation of the normalized Euclidean distance $ED_{i,j}$ between the features of the reference event for the end-use category $j$ – i.e., reference parameter values $x_{j,c}^{ref}$ (with $c = 1, \dots, C_j$, $C_j$ being the number of features investigated to detect the events of the category concerned) – and the corresponding features $x_{i,c}$ of the given water-use event $i$. From an operational standpoint, normalization is carried out by considering the minimum and the maximum allowable values of each water-use feature for the $j$-th end-use category concerned, i.e., the extreme end-use parameter values $x_{j,c}^{min}$ and $x_{j,c}^{max}$, as indicated in Equation (1).

$$ED_{i,j} = \sqrt{\sum_{c=1}^{C_j} \left( \frac{x_{i,c} - x_{j,c}^{ref}}{x_{j,c}^{max} - x_{j,c}^{min}} \right)^2} \tag{1}$$

It is worth noting that, according to Equation (1), the Euclidean distance of a water-use event $i$ from the reference event of a given end-use category $j$ is zero if all event features coincide with those of the reference event ($x_{i,c} = x_{j,c}^{ref}$), whereas it is maximum when the difference between its features and those of the reference event is maximum, i.e., event features coincide with those defined by extreme parameter values of end-use category $j$ ($x_{i,c} = x_{j,c}^{min}$ or $x_{i,c} = x_{j,c}^{max}$).
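
The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' MATLAB implementation: the function names, the dictionary-based event layout, the shower parameter values, and the choice of rounding the daily cap down are all assumptions made for the example.

```python
import math


def normalized_distance(event, reference, minimum, maximum):
    """Normalized Euclidean distance of an event from the reference event.

    Every argument is a dict keyed by feature name (e.g. "duration", "volume").
    """
    total = 0.0
    for name, ref_value in reference.items():
        span = maximum[name] - minimum[name]  # normalization by the allowed range
        total += ((event[name] - ref_value) / span) ** 2
    return math.sqrt(total)


def select_daily_events(events, reference, minimum, maximum, max_freq, inhabitants):
    """Return the events of one day assigned to one end-use category."""
    # Step 1: discard events outside the extreme parameter values.
    candidates = [
        e for e in events
        if all(minimum[k] <= e[k] <= maximum[k] for k in reference)
    ]
    # Step 2: daily cap for the household (rounding down is an assumption).
    n_max = int(max_freq * inhabitants)
    if len(candidates) <= n_max:
        return candidates
    # Step 3: too many candidates, keep the n_max closest to the reference.
    ranked = sorted(
        candidates,
        key=lambda e: normalized_distance(e, reference, minimum, maximum),
    )
    return ranked[:n_max]


# Hypothetical shower parameters (min, L) and one day of detected events.
minimum = {"duration": 3, "volume": 20}
maximum = {"duration": 15, "volume": 120}
reference = {"duration": 7, "volume": 50}
events = [
    {"duration": 8, "volume": 55},
    {"duration": 4, "volume": 25},
    {"duration": 14, "volume": 110},
    {"duration": 1, "volume": 5},  # outside the extreme values
]
selected = select_daily_events(events, reference, minimum, maximum,
                               max_freq=1.0, inhabitants=2)
```

With these made-up values, three events pass the extreme-value filter but only two are allowed per day, so the two events closest to the reference are retained.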

Besides their general layout, the detailed characteristics of each end-use function making up the enhanced automated method for end-use disaggregation and classification (including end-use detection rules and event features considered) are described in the following, whereas individual flow-charts are shown in Supplementary material, Figures S1–S3.

  • Appliance use (dishwasher, washing machine). Appliance operational cycles (hereinafter called loads) typically include several short inflow events (hereinafter called withdrawals) separated by longer periods during which the machine does not draw water. Withdrawal features (e.g., volume, duration, time distance from the other withdrawals of the same load) differ based on manufacturer, model, and programme. In light of the above, the function for appliance-use detection relies on the following end-use parameters for disaggregation and classification (as detailed in Table 1): (i) withdrawal duration $d$, defined by extreme values $d^{min}$ and $d^{max}$; (ii) withdrawal volume $v$, defined by extreme values $v^{min}$ and $v^{max}$; (iii) number of withdrawals per appliance load $X$, defined by extreme and reference values $X^{min}$, $X^{max}$, and $X^{ref}$; (iv) temporal distance between two subsequent withdrawals $p$, defined by extreme values $p^{min}$ and $p^{max}$; (v) load duration $D$, defined by extreme and reference values $D^{min}$, $D^{max}$, and $D^{ref}$; (vi) load volume $V$, defined by extreme and reference values $V^{min}$, $V^{max}$, and $V^{ref}$; and (vii) maximum daily per capita frequency of appliance use, defined by extreme value $f^{max}$. In greater detail, all water-use events are first considered as possible appliance withdrawals if their duration $d$ and volume $v$ fall within specific thresholds, i.e., $d^{min} \le d \le d^{max}$ and $v^{min} \le v \le v^{max}$. A group of possible withdrawals is then considered as a possible appliance load if their number $X$ and temporal distance $p$ from other possible withdrawals fall within the accepted ranges $X^{min} \le X \le X^{max}$ and $p^{min} \le p \le p^{max}$. In the case of time-overlapping possible loads, or if the number $n$ of daily possible loads is higher than $f^{max} \cdot N_{inh}$ ($N_{inh}$ being the number of inhabitants of the household concerned), only those loads with minimum Euclidean distance $ED$ – calculated as shown in Equation (2) – are classified as appliance use:
    $$ED = \sqrt{\left( \frac{X - X^{ref}}{X^{max} - X^{min}} \right)^2 + \left( \frac{D - D^{ref}}{D^{max} - D^{min}} \right)^2 + \left( \frac{V - V^{ref}}{V^{max} - V^{min}} \right)^2} \tag{2}$$
Table 1. General end-use parameter values for automated disaggregation and classification

| End-use category | End-use parameter | Extreme value (min) | Extreme value (max) | Reference value |
|---|---|---|---|---|
| Dishwasher | Load duration (a) | | | |
| | Load volume (b) | | | |
| | Number of withdrawals per load (a) | | | |
| | Withdrawal duration (a) | | | – |
| | Withdrawal volume (a) | | | – |
| | Time between subsequent withdrawals (a) | | | – |
| | Daily per capita frequency of use (b, c) | – | | – |
| Washing machine | Load duration (a) | | | |
| | Load volume (b) | | | |
| | Number of withdrawals per load (a) | | | |
| | Withdrawal duration (a) | | | – |
| | Withdrawal volume (a) | | | – |
| | Time between subsequent withdrawals (d) | | | – |
| | Daily per capita frequency of use (b, c) | – | | – |
| Shower | Duration (b) | | | |
| | Volume (b) | | | |
| | Duration of flow interruption per use (d) | – | 2 min | – |
| | Daily per capita frequency of use (b, c) | – | | – |
| Toilet | Duration (b) | | | |
| | Volume (b) | | | |
| | Daily per capita frequency of use (b, c) | – | – | |
| Taps | Duration (b) | | | |
| | Volume (b) | | | |
| | Daily per capita frequency of use (b, c) | – | – | |

(a) Parameter values derived from fixture technical handbooks.

(b) Parameter values available in the literature (i.e., derived from the study by Mazzoni et al. 2023a).

(c) Expressed in uses/person/day.

(d) Parameter values based on common-sense observations.

It is worth observing that, although the function for appliance-use detection is the same for dishwashers and washing machines, the parameter values to input are different based on the end-use to detect (as shown in Table 1).
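
As a rough illustration of the first two steps of appliance-use detection (withdrawal filtering and grouping into candidate loads), the Python sketch below may help. It is a simplified sketch under stated assumptions, not the published function: events are assumed to be `(start_minute, duration_min, volume_l)` tuples sorted by start time, the temporal distance between withdrawals is assumed to be measured from the end of one withdrawal to the start of the next, the grouping is a simple greedy pass, and all parameter values are hypothetical. The handling of overlapping loads and the distance-based ranking of Equation (2) are left out.

```python
def find_candidate_loads(events, d_range, v_range, p_range, x_range):
    """Group compatible withdrawals into candidate appliance loads.

    events: list of (start_minute, duration_min, volume_l), sorted by start.
    Each *_range is a (min, max) tuple of extreme parameter values.
    """
    # Keep only events compatible with a single appliance withdrawal.
    withdrawals = [
        e for e in events
        if d_range[0] <= e[1] <= d_range[1] and v_range[0] <= e[2] <= v_range[1]
    ]
    # Chain withdrawals whose idle gap falls within the accepted range.
    groups, current = [], []
    for w in withdrawals:
        if current:
            gap = w[0] - (current[-1][0] + current[-1][1])
            if not (p_range[0] <= gap <= p_range[1]):
                groups.append(current)
                current = []
        current.append(w)
    if current:
        groups.append(current)
    # Keep only groups with an admissible number of withdrawals.
    loads = []
    for g in groups:
        if x_range[0] <= len(g) <= x_range[1]:
            start = g[0][0]
            end = g[-1][0] + g[-1][1]
            loads.append({
                "start": start,
                "withdrawals": len(g),
                "duration": end - start,
                "volume": sum(w[2] for w in g),
            })
    return loads


# Hypothetical one-day event list and dishwasher-like parameter ranges.
events = [(0, 1, 4.0), (10, 1, 3.5), (25, 1, 4.2), (40, 8, 60.0), (300, 1, 4.0)]
loads = find_candidate_loads(events, d_range=(1, 2), v_range=(2.0, 6.0),
                             p_range=(5, 30), x_range=(3, 6))
```

In this example the 8-min, 60-L event is rejected as a withdrawal, the first three short events form one candidate load, and the isolated event at minute 300 is discarded because a single withdrawal does not reach the minimum count.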

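  • Shower use. Shower uses are typically characterized by long (i.e., several-minute) durations and large volumes of water consumed. However, their features depend on the way in which residents have showers (some of whom may temporarily turn off the water during the shower). Therefore, the following end-use parameters are considered for disaggregation and classification of shower events (as detailed in Table 1): (i) shower duration $d_{\mathrm{shower}}$, defined by extreme and reference values $d_{\mathrm{shower}}^{min}$, $d_{\mathrm{shower}}^{max}$, and $d_{\mathrm{shower}}^{ref}$; (ii) shower volume $v_{\mathrm{shower}}$, defined by extreme and reference values $v_{\mathrm{shower}}^{min}$, $v_{\mathrm{shower}}^{max}$, and $v_{\mathrm{shower}}^{ref}$; (iii) maximum duration of flow interruption during shower use, defined by extreme value $I_{\mathrm{shower}}^{max}$; and (iv) maximum daily per capita frequency for shower use, defined by extreme value $f_{\mathrm{shower}}^{max}$. Specifically, all possible daily shower uses are first searched among those uses the features of which fall within specific thresholds, i.e., $d_{\mathrm{shower}}^{min} \le d_{\mathrm{shower}} \le d_{\mathrm{shower}}^{max}$ and $v_{\mathrm{shower}}^{min} \le v_{\mathrm{shower}} \le v_{\mathrm{shower}}^{max}$, and whose maximum duration of water interruption is lower than $I_{\mathrm{shower}}^{max}$. In the event that the number $n$ of daily possible shower uses is greater than $f_{\mathrm{shower}}^{max} \cdot N_{inh}$ ($N_{inh}$ being the number of inhabitants), only those uses related to the lowest Euclidean distance $ED_{\mathrm{shower}}$ – calculated as shown in Equation (3) – are classified as shower uses:
    $$ED_{\mathrm{shower}} = \sqrt{\left( \frac{d_{\mathrm{shower}} - d_{\mathrm{shower}}^{ref}}{d_{\mathrm{shower}}^{max} - d_{\mathrm{shower}}^{min}} \right)^2 + \left( \frac{v_{\mathrm{shower}} - v_{\mathrm{shower}}^{ref}}{v_{\mathrm{shower}}^{max} - v_{\mathrm{shower}}^{min}} \right)^2} \tag{3}$$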
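  • Toilet and tap use. Toilet consumption per flush is generally constant for a given fixture due to the mechanical characteristics of toilet filling systems, whereas tap consumption can vary considerably based on residents' activity. However, as far as duration is concerned, both toilet and tap uses (i.e., the residual water uses to disaggregate and classify) are typically characterized by a minute or sub-minute duration (Mazzoni et al. 2023a), making these uses hard to detect at the 1-min resolution. In light of the above, toilet uses, tap uses, or a combination of them are disaggregated and classified using a single function which is applied after appliance- and shower-use detection. Specifically, the following end-use parameters are considered for disaggregation and classification of toilet and tap use (as detailed in Table 1): (i) toilet-fill duration $d_{\mathrm{toilet}}$, defined by extreme and reference values $d_{\mathrm{toilet}}^{min}$, $d_{\mathrm{toilet}}^{max}$, and $d_{\mathrm{toilet}}^{ref}$; (ii) toilet-flush volume $v_{\mathrm{toilet}}$, defined by extreme and reference values $v_{\mathrm{toilet}}^{min}$, $v_{\mathrm{toilet}}^{max}$, and $v_{\mathrm{toilet}}^{ref}$; (iii) toilet-flush daily per capita frequency of use $f_{\mathrm{toilet}}$, defined by reference value $f_{\mathrm{toilet}}^{ref}$; (iv) tap-event duration $d_{\mathrm{tap}}$, defined by extreme and reference values $d_{\mathrm{tap}}^{min}$, $d_{\mathrm{tap}}^{max}$, and $d_{\mathrm{tap}}^{ref}$; (v) tap-event volume $v_{\mathrm{tap}}$, defined by extreme and reference values $v_{\mathrm{tap}}^{min}$, $v_{\mathrm{tap}}^{max}$, and $v_{\mathrm{tap}}^{ref}$; and (vi) tap-event daily per capita frequency of use $f_{\mathrm{tap}}$, defined by reference value $f_{\mathrm{tap}}^{ref}$. In greater detail, the compatibility between each residual water-use event $i$ (with duration $d_i$ and volume $v_i$) and the characteristics of toilets and taps (i.e., their extreme end-use parameter values) is first evaluated. If the event to classify is compatible with toilet characteristics (i.e., $d_{\mathrm{toilet}}^{min} \le d_i \le d_{\mathrm{toilet}}^{max}$ and $v_{\mathrm{toilet}}^{min} \le v_i \le v_{\mathrm{toilet}}^{max}$) but not with tap characteristics (i.e., $d_i$ outside $[d_{\mathrm{tap}}^{min}, d_{\mathrm{tap}}^{max}]$ or $v_i$ outside $[v_{\mathrm{tap}}^{min}, v_{\mathrm{tap}}^{max}]$), it is classified as a toilet use. By contrast, if the event is compatible with tap characteristics but not with toilet characteristics, it is considered a tap use. If the water-use event is compatible with both toilet and tap characteristics, the similarity between the selected event and the reference event of toilet and tap uses is evaluated for both categories, i.e., Euclidean distances $ED_{i,\mathrm{toilet}}$ and $ED_{i,\mathrm{tap}}$ are calculated, as shown in Equations (4) and (5). The water-use event is then assigned to the end-use category for which the Euclidean distance is minimum (i.e., toilet, if $ED_{i,\mathrm{toilet}} < ED_{i,\mathrm{tap}}$, or tap, if $ED_{i,\mathrm{tap}} < ED_{i,\mathrm{toilet}}$).
    $$ED_{i,\mathrm{toilet}} = \sqrt{\left( \frac{d_i - d_{\mathrm{toilet}}^{ref}}{d_{\mathrm{toilet}}^{max} - d_{\mathrm{toilet}}^{min}} \right)^2 + \left( \frac{v_i - v_{\mathrm{toilet}}^{ref}}{v_{\mathrm{toilet}}^{max} - v_{\mathrm{toilet}}^{min}} \right)^2} \tag{4}$$
    $$ED_{i,\mathrm{tap}} = \sqrt{\left( \frac{d_i - d_{\mathrm{tap}}^{ref}}{d_{\mathrm{tap}}^{max} - d_{\mathrm{tap}}^{min}} \right)^2 + \left( \frac{v_i - v_{\mathrm{tap}}^{ref}}{v_{\mathrm{tap}}^{max} - v_{\mathrm{tap}}^{min}} \right)^2} \tag{5}$$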

It is worth noting that, in the case of residual water-use events not directly compatible with the characteristics of individual toilet or tap uses (i.e., events that fall outside the extreme parameter values of both the toilet and the tap category), events are split in two and assigned to both categories proportionally to their reference daily per capita volumes, i.e., the product between reference volumes per use ($v_{\mathrm{toilet}}^{ref}$ and $v_{\mathrm{tap}}^{ref}$) and per capita frequencies of use ($f_{\mathrm{toilet}}^{ref}$ and $f_{\mathrm{tap}}^{ref}$).
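
The four cases of the toilet and tap function (toilet only, tap only, compatible with both, compatible with neither) can be sketched as follows. This is an illustrative Python sketch rather than the original code: the data layout, the function names, the tie-breaking rule (ties go to taps), and every parameter value are assumptions made for the example.

```python
import math

FEATURES = ("duration", "volume")


def compatible(event, limits):
    """True if every feature lies within the (min, max) extreme values."""
    return all(limits[k][0] <= event[k] <= limits[k][1] for k in FEATURES)


def distance(event, limits, ref):
    """Normalized Euclidean distance from the category reference event."""
    return math.sqrt(sum(
        ((event[k] - ref[k]) / (limits[k][1] - limits[k][0])) ** 2
        for k in FEATURES
    ))


def classify_residual(event, toilet, tap):
    """Return the volume (L) of one residual event assigned to each category."""
    fits_toilet = compatible(event, toilet["limits"])
    fits_tap = compatible(event, tap["limits"])
    if fits_toilet and fits_tap:
        # Compatible with both: the closest reference event wins.
        d_toilet = distance(event, toilet["limits"], toilet["ref"])
        d_tap = distance(event, tap["limits"], tap["ref"])
        fits_toilet, fits_tap = d_toilet < d_tap, d_toilet >= d_tap
    if fits_toilet:
        return {"toilet": event["volume"], "tap": 0.0}
    if fits_tap:
        return {"toilet": 0.0, "tap": event["volume"]}
    # Compatible with neither: split by reference daily per capita volumes.
    w_toilet = toilet["ref"]["volume"] * toilet["freq"]
    w_tap = tap["ref"]["volume"] * tap["freq"]
    share = w_toilet / (w_toilet + w_tap)
    return {"toilet": event["volume"] * share,
            "tap": event["volume"] * (1.0 - share)}


# Hypothetical parameter values (durations in min, volumes in L).
toilet = {"limits": {"duration": (1, 2), "volume": (4.0, 10.0)},
          "ref": {"duration": 1, "volume": 6.0}, "freq": 5.0}
tap = {"limits": {"duration": (1, 3), "volume": (0.5, 8.0)},
       "ref": {"duration": 1, "volume": 2.0}, "freq": 15.0}

only_toilet = classify_residual({"duration": 1, "volume": 9.0}, toilet, tap)
only_tap = classify_residual({"duration": 1, "volume": 1.0}, toilet, tap)
both = classify_residual({"duration": 1, "volume": 5.0}, toilet, tap)
neither = classify_residual({"duration": 4, "volume": 20.0}, toilet, tap)
```

With these made-up parameters, the 5-L event fits both categories but is closer to the toilet reference, while the 4-min, 20-L event fits neither category and is split evenly because both reference daily per capita volumes equal 30 L.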

End-use parameters for automated disaggregation and classification

The enhanced automated methodology requires the input of a set of end-use parameter values – describing end-use features – to disaggregate and classify water-use events. The overall list of the required values is shown in Table 1. As highlighted in Section 2.1, end-use parameter values can be classified as (i) extreme values, i.e., minimum and/or maximum allowable values assumable by event features to consider the selected event as a potential water-use of a given end-use category; or (ii) reference values, i.e., values describing the features of the average and most common water-use events of a given end-use category.

It is worth observing that the values of each end-use parameter may vary across households because of different residents' habits and end-use features. From an operational standpoint, extreme and reference end-use parameter values can be defined specifically for each household (i.e., by relating parameter values to individual household characteristics) or in a general way (i.e., by relating parameter values to the average end-use features and the most common residents' attitude towards water-use). On the one hand, specific parameter values for individual households can be obtained through intrusive monitoring or by submitting water-use surveys to the residents of the households investigated, which typically requires considerable effort and may be inapplicable on a large scale. On the other hand, the use of general parameter values for end-use disaggregation and classification of household water-use data would make the method more applicable and easily transferable, because the exploitation of values already available in the literature would overcome the need to carry out expensive and time-consuming (or even infeasible) procedures for the assessment of specific values.

In light of the above, the automated method for end-use disaggregation and classification was tested in this paper considering general end-use parameter values. This was done to evaluate method transferability to contexts in which detailed information on water-use in individual households (i.e., specific end-use parameter values) is not available. A summary of the end-use general parameter values applied is provided in Table 1. In greater detail, general end-use parameter values are defined, when possible, by referring to the end-use data reported in the literature. Conversely, in the case of parameters the values of which are not available in the literature, data reported in technical handbooks – or derived based on common-sense considerations – are applied.

In greater detail, general values for the end-use parameters already investigated in the literature are obtained by relying on the data reported in the extensive review on residential end uses of water over different spatiotemporal contexts by Mazzoni et al. (2023a). As shown in Equation (6), general parameter values are calculated for each end-use parameter based on the statistical probability-density functions (i.e., PDFs) shown in the above-mentioned study (which were derived from the corresponding empirical PDFs reported in numerous studies published between 1975 and 2023):
$$x_{j,c}^{(q)} = \frac{1}{A} \sum_{a=1}^{A} P_q\left( PDF_{j,c,a} \right) \tag{6}$$
where subscripts $j$, $c$, and $a$ indicate end-use category (e.g., shower), end-use feature (e.g., duration), and study considered in the literature review by Mazzoni et al. (2023a), respectively; $A$ is the total number of studies reporting the empirical PDF for feature $c$ of end-use category $j$ (e.g., shower duration); $PDF_{j,c,a}$ is the statistical PDF derived by Mazzoni et al. (2023a) for feature $c$ of end-use category $j$ based on the empirical PDF reported in study $a$; $P_q(PDF_{j,c,a})$ is the $q$-th percentile of statistical PDF $PDF_{j,c,a}$; and $x_{j,c}^{(q)}$ is the general parameter value required (i.e., general value for feature $c$ of end-use category $j$). From an operational standpoint, reference and extreme end-use parameter values are assigned based on different percentiles of statistical PDFs: specifically, the 50th percentile of each PDF, i.e., the median, is selected for reference values (e.g., shower reference duration $d_{\mathrm{shower}}^{ref}$), whereas the 10th and the 90th percentiles are assigned to the minimum and maximum extreme values, respectively (e.g., shower minimum and maximum duration $d_{\mathrm{shower}}^{min}$ and $d_{\mathrm{shower}}^{max}$). As far as extreme values are concerned, it is worth noting that the 10th and the 90th percentiles are adopted for each distribution instead of more extreme percentiles (e.g., 5th and 95th, or 3rd and 97th) to avoid considering exceptional or anomalous events (i.e., outlier uses) which may be poorly representative of the general and most common characteristics of residential end uses of water. However, with no loss of generality, different thresholds could be considered to obtain the extreme end-use parameter values required by the model. It is also worth noting that, in the case of temporal parameters (e.g., durations, time distances, etc.), percentile values are rounded up to the nearest minute, because the enhanced method disaggregates and classifies water-use data collected at the 1-min resolution and therefore requires parameter values expressed with the same resolution.
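
A minimal sketch of how such general values could be derived is given below, assuming that the general value is the arithmetic mean, across studies, of the chosen percentile of each fitted PDF. The three normal distributions are purely hypothetical stand-ins for the statistical PDFs of the review (which need not be normal), and `general_value` is an illustrative name, not a function from the study.

```python
import math
from statistics import NormalDist, mean


def general_value(quantile_functions, q, temporal=False):
    """Average the q-th percentile over the PDFs fitted to each study.

    quantile_functions: one callable per study mapping a probability in
    (0, 1) to the corresponding feature value.
    temporal: if True, round the result up to a whole minute.
    """
    value = mean(f(q / 100) for f in quantile_functions)
    return math.ceil(value) if temporal else value


# Hypothetical shower-duration PDFs (minutes) fitted to three studies.
studies = [NormalDist(7.0, 2.0), NormalDist(8.0, 3.0), NormalDist(6.0, 2.5)]
quantiles = [dist.inv_cdf for dist in studies]

d_min = general_value(quantiles, 10, temporal=True)  # extreme value (min)
d_ref = general_value(quantiles, 50, temporal=True)  # reference value
d_max = general_value(quantiles, 90, temporal=True)  # extreme value (max)
```

Passing quantile functions rather than fitted parameters keeps the sketch independent of the distribution family used for each study.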

In the case of end-use parameters not yet investigated in the literature, different approaches are applied to obtain general values. On the one hand, in the case of parameters related to electronic appliances (e.g., total load duration $D$), reference is made to technical handbooks. On the other hand, in the case of parameters neither available in the literature nor reported in fixture technical specifications (e.g., parameters depending on individual behaviour), values are assumed based on common-sense considerations. By way of example, a maximum duration of flow interruption equal to 2 min is assumed for shower use ($I_{\mathrm{shower}}^{max}$), in light of the fact that, when people turn off the water during a shower, the flow is typically restarted after a limited period.

Metrics for the evaluation of method performance

The evaluation of the method performance is conducted through the combined use of the two metrics generally applied in similar studies available in the literature (e.g., Cominola et al. 2018): (i) Water Contribution Accuracy (WCA), describing the (percent) accuracy of the performance at the level of aggregate end-use consumption and (ii) Normalized Root-Mean-Square Error (NRMSE), quantifying the tendency of the automated method to over- or underestimate the end-use time series. The formulation of the WCA and NRMSE metrics is shown in Equations (7) and (8):
$$WCA_{h,j} = \left( 1 - \frac{\left| \sum_{t=1}^{T} \hat{y}_{h,j,t} - \sum_{t=1}^{T} y_{h,j,t} \right|}{\sum_{t=1}^{T} Y_{h,t}} \right) \cdot 100 \tag{7}$$
$$NRMSE_{h,j} = \frac{\sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( y_{h,j,t} - \hat{y}_{h,j,t} \right)^2}}{\max_t y_{h,j,t} - \min_t y_{h,j,t}} \tag{8}$$
where subscripts $h$, $j$, and $t$ indicate household, end-use category, and minute, respectively; $T$ is the length (min) of the monitoring period; $Y_{h,t}$ (L) is the aggregate water-use in household $h$ at minute $t$; $y_{h,j,t}$ (L) is the observed water-use of end-use category $j$ in household $h$ at minute $t$; and $\hat{y}_{h,j,t}$ (L) is the water-use disaggregated and classified to end-use category $j$ in household $h$ at minute $t$. Based on the $WCA_{h,j}$ and $NRMSE_{h,j}$ formulation, an accurate performance of the disaggregation and classification model would result in $WCA_{h,j}$ values close to 100% and $NRMSE_{h,j}$ values close to 0.

As reported by Cominola et al. (2018), the use of paired metrics is motivated by the fact that $WCA_{h,j}$ may be poorly representative of disaggregation and classification effectiveness in the case of occasionally activated end uses. Therefore, the introduction of other, less aggregated metrics (e.g., $NRMSE_{h,j}$) can help in better interpreting the results. Finally, it is worth noting that, although metrics $WCA_{h,j}$ and $NRMSE_{h,j}$ are calculated for each end-use category $j$ of household $h$, the overall performance of the methodology in successfully detecting different water end uses can be assessed by averaging these metrics across all households, thus obtaining aggregate performance indicators for different end-use categories (i.e., $\overline{WCA}_j$ and $\overline{NRMSE}_j$). More generally, aggregate indicators describing the average effectiveness of the method in relation to a given household sample may be obtained by averaging metrics $\overline{WCA}_j$ and $\overline{NRMSE}_j$ across all end-use categories.
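
The two metrics can be computed for one household and one end-use category with a sketch like the following. It assumes, as a working hypothesis, that WCA compares total observed and total estimated end-use volumes relative to the total aggregate volume, and that the RMSE is normalized by the range of the observed series; the function names and the short six-minute series are made up for illustration.

```python
import math


def wca(aggregate, observed, estimated):
    """Water Contribution Accuracy (%) for one household and end use."""
    error = abs(sum(estimated) - sum(observed))
    return 100.0 * (1.0 - error / sum(aggregate))


def nrmse(observed, estimated):
    """RMSE normalized by the range of the observed series (assumption)."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    rmse = math.sqrt(squared / len(observed))
    return rmse / (max(observed) - min(observed))


# Hypothetical 1-min series (L) over a six-minute window.
aggregate = [10.0, 0.0, 8.0, 6.0, 0.0, 12.0]   # household-level signal
observed = [8.0, 0.0, 8.0, 0.0, 0.0, 0.0]      # metered shower use
estimated = [8.0, 0.0, 6.0, 0.0, 0.0, 0.0]     # shower use from the method

shower_wca = wca(aggregate, observed, estimated)
shower_nrmse = nrmse(observed, estimated)
```

Averaging `shower_wca` and `shower_nrmse` over households, and then over end-use categories, would give the aggregate indicators discussed above.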

Data application

The performance of the enhanced methodology for end-use disaggregation and classification is evaluated by exploiting real water-use data collected in the field at the end-use level. Specifically, to test the method under different circumstances, two end-use datasets – featuring different locations, end-use characteristics, number of residents and inhabitants' attitudes towards water-use, and including more than 650 full days of water-use monitoring in 13 households – are considered:

  • A first sample of four households located in the peripheral area of Bologna, Italy (hereinafter called Italian dataset), which underwent intrusive monitoring for 6–8 weeks in early 2018. Intrusive monitoring was carried out at each domestic end use by installing Itron® mechanical water meters (with an accuracy of 1 L) paired with data loggers with a pre-set sampling frequency of 1 min. It is worth noting that the end-use data collected at the households of the Italian dataset are the same as those initially used by Mazzoni et al. (2021) to calibrate and validate the original version of the automated method.

  • A second sample of nine households located in the North Holland province, Netherlands (hereinafter called Dutch dataset), which was monitored over periods of varying length (from 2 weeks up to 3 months) between mid-2019 and early 2020. Unlike the Italian dataset, monitoring was carried out at the household level by installing smart water meters with 1-s sampling frequency and 0.1-L accuracy, whereas end-use data were obtained from the household-level water-use time series by (i) automatically segmenting the aggregate water-use time series into individual end-use events; and (ii) manually classifying the unlabelled segmented events based on their features and the results of water-use surveys submitted to residents. Detailed information on the methodology adopted for end-use analysis and the results achieved is available in Mazzoni et al. (2023b). From an operational standpoint, the water-use time series of the Dutch dataset were temporally aggregated to the 1-min resolution before being input to the model, to match the temporal resolution required by the automated method. In addition, all water-use events classified as uncertain or other uses by the analysts (and thus not applicable to the model) were removed from the household-level water-use time series.
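
The temporal aggregation applied to the Dutch dataset (from 1-s to 1-min resolution) can be sketched as below. This is a minimal illustration rather than the procedure actually used: the function name is hypothetical, the series is assumed to start on a whole minute, a trailing partial minute is kept as its own block, and sums are rounded to 0.1 L to mirror the stated meter accuracy.

```python
def aggregate_to_minutes(volumes_1s, samples_per_minute=60):
    """Sum consecutive blocks of 1-s volumes (L) into 1-min volumes (L).

    Rounding to 0.1 L is an assumption that mirrors the meter accuracy
    and removes floating-point noise from the sums.
    """
    return [
        round(sum(volumes_1s[i:i + samples_per_minute]), 1)
        for i in range(0, len(volumes_1s), samples_per_minute)
    ]


# Hypothetical 2-minute record: a 45-s tap use at 0.1 L/s starting at second 30
series_1s = [0.0] * 30 + [0.1] * 45 + [0.0] * 45
series_1min = aggregate_to_minutes(series_1s)
print(series_1min)  # [3.0, 1.5]
```

The example also shows why short events become harder to separate at the coarser resolution: a single 45-s use is spread over two 1-min readings.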

As far as the overall household sample is concerned, an average daily per capita water consumption of about 119 L/person/day is observed (i.e., about 114 L/person/day excluding uncertain/unknown uses of water), as depicted in Figure 3(a). Specifically, showers and toilets account for the largest volumes (i.e., 34 and 32 L/person/day, respectively), followed by taps (26 L/person/day), washing machines (18 L/person/day) and, finally, dishwashers (3 L/person/day). In addition, as shown in Figure 4 (grey lines), a distinct daily profile (i.e., normalized series of hourly water-use values) is observed for each end use of water. On the one hand, toilet and tap use is rather constant throughout the day, although higher values can be observed in the morning and at dinner time (along with lower values at night). On the other hand, shower and appliance use is more frequent at specific times of the day. In the case of showers, two peaks emerge in the morning and the evening, whereas a single peak in appliance use is observed (i.e., at about midnight in the case of dishwashers and at about midday in the case of washing machines). However, despite these average characteristics, different behaviours emerge between the two datasets in terms of both daily per capita end-use water consumption and daily end-use profiles (as shown in Figures S4 and S5 of the Supplementary materials, respectively).
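
A daily profile of the kind described above can be derived from a 1-min end-use series as in the following sketch. The normalization is an assumption (hourly volumes divided by the total volume, so that the profile sums to 1), and the function name and sample data are hypothetical.

```python
def daily_profile(minute_volumes):
    """Return 24 normalized hourly values from a 1-min volume series (L).

    The series is assumed to start at midnight; records longer than one day
    are folded onto a single 24-h cycle.
    """
    hourly = [0.0] * 24
    for i, volume in enumerate(minute_volumes):
        hourly[(i % 1440) // 60] += volume
    total = sum(hourly)
    if total == 0:
        return hourly
    return [h / total for h in hourly]


# Hypothetical one-day shower series: 6 L at 07:30 and 2 L at 19:10
series = [0.0] * 1440
series[7 * 60 + 30] = 6.0
series[19 * 60 + 10] = 2.0

profile = daily_profile(series)
print(profile[7], profile[19])  # 0.75 0.25
```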
Figure 3. (a) Daily per capita end-use volumes and (b) comparison against those obtained by applying the enhanced disaggregation and classification method.

Figure 4. End-use daily profiles (grey lines) and comparison against those obtained by applying the enhanced disaggregation and classification method (blue lines).

The analysis of water-use data also reveals similarities between the two datasets in terms of combined events (i.e., end uses activated simultaneously). In the Italian households, combined events cover about 9.3% of the minutes with water consumption, whereas the corresponding share is 9.0% in the Dutch dataset. The number of minutes is considered instead of the number of events because some end-use events (e.g., toilet and tap uses) typically last less than one minute, and therefore individual events can be hidden by the 1-min temporal resolution. Among all possible combinations of end uses, the majority of these minutes (6.1% and 7.4% of the 1-min readings including water consumption in the Italian and Dutch datasets, respectively) are tied to the simultaneous use of toilets and taps. These combined events can be correctly classified by the method, which supports the detection of simultaneous toilet and tap uses (as highlighted in Section 2.1). Conversely, other combined events (i.e., the residual 3.2% and 1.6% of the 1-min readings including water consumption) are likely to be misclassified. However, this misclassification margin is considered acceptable for method application.
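
The share of consumption minutes covered by combined events can be computed from the end-use time series as sketched below. The function name, the data layout (one 1-min series per end use, all of equal length), and the sample values are hypothetical.

```python
def combined_share(end_use_series):
    """Percent of minutes with water use in which two or more end uses are active.

    end_use_series: dict mapping end-use name -> list of 1-min volumes (L).
    """
    n_minutes = len(next(iter(end_use_series.values())))
    active_counts = [
        sum(1 for series in end_use_series.values() if series[t] > 0)
        for t in range(n_minutes)
    ]
    use_minutes = sum(1 for count in active_counts if count >= 1)
    combined_minutes = sum(1 for count in active_counts if count >= 2)
    if use_minutes == 0:
        return 0.0
    return 100.0 * combined_minutes / use_minutes


# Hypothetical 6-minute record (litres per minute)
example = {
    "toilet": [6, 0, 0, 0, 6, 0],
    "tap": [2, 1, 0, 0, 0, 0],
    "shower": [0, 0, 0, 9, 9, 0],
}

share = combined_share(example)
print(share)  # 50.0
```

Counting minutes rather than events follows the reasoning above: events shorter than one minute cannot be told apart within a single 1-min reading.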

From an operational standpoint, the household-level (i.e., raw) data making up the Italian and the Dutch datasets are input to the model along with the number of inhabitants of each household concerned, whereas the respective end-use time series are used as a benchmark to evaluate the performance of the enhanced method, i.e., to test its effectiveness in disaggregating and classifying end-use events. It is worth noting that, without loss of generality, the proposed methodology could also be applied when the number of inhabitants per household is unavailable, since it may be possible to estimate this parameter from a few weeks of (aggregate) water-use data for the household concerned.

The enhanced methodology is tested by comparing the disaggregated and classified household-level water-use time series against the end-use time series observed in relation to the overall monitoring period. In greater detail, end-use disaggregation and classification are performed by using the set of general end-use parameter values derived as discussed in Section 2.2. The results obtained are reported for the sake of brevity in relation to the overall sample of monitored households (i.e., Italian and Dutch datasets grouped).

The comparison between the classified end-use volumes (and their related profiles) and the baseline end-use data observed in the field is shown in Figures 3 and 4, respectively. Overall, the method proves effective in disaggregating volumes (Figure 3) and reproducing daily profiles (Figure 4) despite (i) the coarse resolution of input household-level data (i.e., 1 min) and (ii) the application of general end-use parameter values derived from the literature or common-sense-based considerations (and thus without using values specifically calibrated for individual households). It is worth noting that the pie charts shown in Figure 3 and the profiles shown in Figure 4 summarize method performance from an aggregate point of view, without providing insight into the possible reasons behind the differences between baseline data and the results of automated classification. For this reason, classification effectiveness is also assessed by evaluating the number of water consumption minutes correctly classified to each end-use category, along with those affected by misclassification. The analysis reveals that, considering the overall household sample: (i) showers are characterized by the lowest misclassification rate, with the majority of minutes including shower uses correctly assigned to the corresponding end-use category; (ii) a higher misclassification rate is observed for toilet flushes and washing-machine uses, with about 21% of minutes including toilet flushes classified as washing-machine withdrawals, and about 28% of minutes including washing-machine withdrawals classified as toilet flushes; (iii) the highest misclassification rate is observed for dishwasher uses, with 34% of minutes including dishwasher uses classified as tap uses.
These results can be explained as follows: (i) the low misclassification rate for showers is due to the fact that the features of this end-use category (e.g., durations and volumes per use) are generally quite different from those of other categories; (ii) the misclassification of toilet flushes and washing-machine uses is likely due to the similarities between the features of these two end uses (as shown in Table 1), along with similarities between the daily frequency of toilet use and the number of washing-machine withdrawals per load; (iii) the misclassification of dishwasher uses as tap uses (which is also reported in similar studies) is due to the similarity between event duration and volume for these two categories.
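
The minute-level misclassification rates discussed above can be computed as in the sketch below. It is a simplification under stated assumptions: each consumption minute carries a single observed label and a single classified label (combined events are ignored), and the function name and the sample labels are hypothetical.

```python
from collections import Counter


def misclassification_rates(observed_labels, classified_labels):
    """Percent of minutes of each observed end use assigned to each classified end use.

    Returns a dict keyed by (observed, classified) pairs; pairs with equal
    labels give the share of correctly classified minutes.
    """
    pair_counts = Counter(zip(observed_labels, classified_labels))
    observed_totals = Counter(observed_labels)
    return {
        (obs, cls): 100.0 * count / observed_totals[obs]
        for (obs, cls), count in pair_counts.items()
    }


# Hypothetical labels for seven consumption minutes
observed_labels = ["toilet"] * 4 + ["shower"] * 3
classified_labels = ["toilet", "toilet", "toilet", "washing machine",
                     "shower", "shower", "shower"]

rates = misclassification_rates(observed_labels, classified_labels)
print(rates[("toilet", "washing machine")])  # 25.0
print(rates[("shower", "shower")])           # 100.0
```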

As far as evaluation metrics are concerned (i.e., WCA and NRMSE), the average values obtained by applying the automated methods are included in the heatmaps shown in Figure 5. Although differences emerge across individual households (with average WCA values ranging from 83.5% to 96.1%), the enhanced methodology results in a total WCA of 91.7% by using general parameter values (Figure 5(a)), with a limited standard deviation (3.6%). This further demonstrates the effectiveness of the enhanced method in disaggregating and classifying water end uses. In fact, the average WCA achieved in this study falls within the range initially obtained by Mazzoni et al. (2021) (i.e., WCA between 90.4% and 95.7%), although the original version of the method was applied only to the Italian sample and by using end-use parameter values specifically calibrated for individual households, or their envelope. Moreover, the average value achieved is higher than the value reported by Cominola et al. (2018) for end-use disaggregation and classification of synthetically generated data at the same temporal resolution. Overall, it is worth noting that an average WCA of nearly 92% is achieved in this study by exploiting general end-use parameter values defined by relying only on the information available in the literature or in fixture technical handbooks. This value is also higher than the corresponding value obtained by applying the original method to the same water-use dataset and the same group of general end-use parameter values (WCA of 87%, as shown in Figure 5(b)).
Therefore, the results obtained demonstrate that, unlike the original method, the enhanced method can provide considerably accurate results at the level of aggregate end-use consumption despite the use of parameter values not specifically calibrated for individual households, thus avoiding extensive surveys or intrusive monitoring in individual households.
Figure 5. Results of the enhanced method for end-use disaggregation and classification in terms of (a) WCA and (c) NRMSE and comparison against the corresponding results provided by the original method version (b, d).

When method performance in relation to individual end-use categories is considered, the most accurate cases are those related to the use of toilets and dishwashers, whereas slightly lower WCA values emerge in the case of washing machine, tap and shower uses, as shown in Figure 5(a). Despite slightly different values based on the end use considered, all categories are characterized by WCA values of at least 90%, meaning that, on average, the difference between baseline and disaggregated volumes is lower than 10% of the total for each end-use category (as observable from Figure 3). Therefore, the WCA values obtained for different end uses demonstrate that the method is capable of effectively performing end-use disaggregation and classification with low variability among end-use categories, while ensuring a generally good similarity between baseline end-use daily profiles and those resulting from method application (Figure 4). In greater detail, the slightly lower WCA obtained in the case of taps and showers could be explained by considering that, on the one hand, tap uses are typically characterized by limited durations (typically equal to, or shorter than, 1 min, as highlighted in Table 1) and consumption, and could overlap in time with other end uses of water (e.g., toilet flushes), making them scarcely detectable at the 1-min resolution. On the other hand, despite longer durations, shower uses can be considerably variable in terms of consumed volumes, making an accurate volume classification by means of general end-use parameter values rather complex.

In addition, the results obtained demonstrate the enhancement of the end-use disaggregation and classification method, the original version of which reveals drops in the WCA for some end-use categories when general end-use parameter values are applied (e.g., lower than 80% in the case of shower events). This reduced accuracy of the original method version is most likely due to a different layout of the original end-use detection functions, which, unlike those of the enhanced method, do not classify water-use events by selecting the events whose characteristics are most similar to the reference of each end-use category.

The above considerations about method performance are also supported by end-use disaggregation and classification results in terms of NRMSE. As indicated in Figure 5(c), method revision and the introduction of reference parameter values for event classification result in a total NRMSE of 0.058, with a standard deviation of 0.014 (although with differences across individual households, i.e., with average NRMSE values ranging from 0.033 to 0.073 based on the household considered). In greater detail, the best method performance is observed in the case of tap and shower uses, whereas a slightly larger NRMSE is observed in the case of toilet, washing machine and, finally, dishwasher uses (Figure 5(c)). The latter value is most likely due to the very limited range of dishwasher flow rates, which reduces the denominator of Equation (8) and thus inflates the NRMSE value. It can also be observed that the NRMSE values are lower than the corresponding values obtained by applying the original method version to the same household sample and the same general end-use parameter dataset (as shown in Figure 5(d)). In addition, although slightly higher than the value reported by Cominola et al. (2018) (i.e., 0.040), the NRMSE results obtained demonstrate an overall limited error of the enhanced method in over- or underestimating end-use time series.

This study presented an enhanced version of the automated method for end-use disaggregation and classification originally proposed by Mazzoni et al. (2021) and applicable to water-use data collected at the 1-min temporal resolution. Unlike the original version, the enhanced method (i) is not time dependent, i.e., water-use events of different end-use categories are not searched only in relation to specific daily periods; and (ii) classifies end-use events not only by comparing the features of the water-use events detected on the aggregate time series against the allowable features for each end-use category, but also by evaluating the similarity with the most common events of the end-use category concerned.

The key implications of this study are summarized below:

  • The results achieved by applying the enhanced method to water-use data collected in two considerably different geographical areas (i.e., Italy and the Netherlands) confirm the method's ability to disaggregate and classify water-use events effectively, with an average total accuracy of about 92% (WCA of 91.7%) and an average normalized root-mean-square error (NRMSE) lower than 0.06. This reveals a good similarity between baseline and disaggregated end-use volumes, along with reduced deviations between the related end-use daily profiles.

  • Unlike other disaggregation and classification methods, results are achieved by using general end-use parameter values not specifically calibrated in relation to individual households (thus reproducing the case in which detailed information about household characteristics and residents' habits is not available). This lays the basis for future method application to large amounts of aggregate water-use data to be processed, without the need to investigate the characteristics of water consumption in individual households.

  • The accuracy of the method in detecting, disaggregating, and classifying residential end uses of water is similar to that obtained in other studies making use of data at the 1-min resolution, but higher than the accuracy resulting from the application of the original method version to the same household sample and the same general end-use parameter dataset. On the one hand, this confirms the effective enhancement of the method. On the other hand, it further demonstrates that accurate end-use disaggregation and classification is possible even with water-use data at the 1-min resolution, which is closer to that of many commercial smart water meters and thus could make water-use data easier to collect and store. Therefore, it is believed that the method could be a valid aid for water utilities to investigate the characteristics of residential end uses of water at scale across different geographical contexts.

In conclusion, it is worth remarking that (i) the current method version is only capable of classifying end-use events into the five main categories of indoor water consumption considered here, i.e., dishwasher, washing machine, shower, taps, and toilet (thus excluding outdoor uses and specific indoor uses, e.g., irrigation, bathtub, air humidifier); and (ii) overlapped events that are not a combination of a toilet and a tap use cannot generally be detected (although it was demonstrated that, at least for the case studies considered here, these events are a minority of all water uses, i.e., less than 5%). In light of these limitations, future research will mainly be aimed at (i) evaluating the possibility of defining parameters and disaggregation rules for additional end-use categories; and (ii) understanding whether the 1-min temporal resolution of water-use data is sufficient to detect and disaggregate every possible combined water-use event (e.g., shower and toilet used simultaneously, or toilet and washing machine) and, if so, adapting the disaggregation and classification method accordingly. Future research will also focus on further enlarging the end-use datasets used as a benchmark, in order to test the performance of the method on additional household samples and in different seasons (i.e., the summer period, which was not considered here, except for one household, due to the general lack of end-use data collected during the summer season and applicable for method validation).

All relevant data are available from an online repository or repositories: https://doi.org/10.5281/zenodo.7937757.

The authors declare there is no conflict.

Agudelo-Vera C., Keesman K., Mels A. & Rijnaarts H. 2013 Evaluating the potential of improving residential water balance at building scale. Water Res. 47 (20), 7287–7299. https://doi.org/10.1016/j.watres.2013.10.040
Attallah N., Horsburgh J. & Bastidas Pacheco C. 2023 An open-source, semisupervised water end-use disaggregation and classification tool. J. Water Res. Plan. Manage. 149 (7), 04023024. https://doi.org/10.1061/JWRMD5.WRENG-5444
Bastidas Pacheco C., Attallah N. & Horsburgh J. 2023 Variability in consumption and end uses of water for residential users in Logan and Providence, Utah, US. J. Water Res. Plan. Manage. 149 (1), 04023024. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001633
Beal C., Stewart R., Spinks A. & Fielding K. 2011 Using smart meters to identify social and technological impacts on residential water consumption. Water Sci. Tech.: Water Supply 11 (5), 527–533. https://doi.org/10.2166/ws.2011.088
Bethke G., Cohen A. & Stillwell A. 2021 Emerging investigator series: Disaggregating residential sector high-resolution smart water meter data into appliance end-uses with unsupervised machine learning. Env. Sci.: Water Res. Tech. 7 (3), 487–503. https://doi.org/10.1039/D0EW00724B
Blokker M., Vreeburg J. & van Dijk J. 2010 Simulating residential water demand with a stochastic end-use model. J. Water Res. Plan. Manage. 136 (1), 19–26. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000002
Bouziotas D., Rozos E. & Makropoulos C. 2015 Water and the city: Exploring links between urban growth and water demand management. J. Hydroinformatics 17 (2), 176–192. https://doi.org/10.2166/hydro.2014.053
Cominola A., Giuliani M., Castelletti A., Rosenberg D. & Abdallah A. 2018 Implications of data sampling resolution on water use simulation, end-use disaggregation, and demand management. Env. Model. Soft. 102, 199–212. https://doi.org/10.1016/j.envsoft.2017.11.022
Cominola A., Giuliani M., Castelletti A., Fraternali P., Herrera Gonzalez S. L., Guardiola Herrero J. C., Novak J. & Rizzoli A. E. 2021 Long-term water conservation is fostered by smart meter-based feedback and digital user engagement. npj Clean Water 4, 29. https://doi.org/10.1038/s41545-021-00119-0
Cosgrove W. & Loucks D. 2015 Water management: Current and future challenges and research directions. Water Resour. Res. 51 (6), 4823–4839. https://doi.org/10.1002/2014WR016869
Fontdecaba S., Sánchez-Espigares J., Marco-Almagro L., Tort-Martorell X., Cabrespina F. & Zubelzu J. 2013 An approach to disaggregating total household water consumption into major end-uses. Water Resour. Manage. 27 (7), 2155–2177. https://doi.org/10.1007/s11269-013-0281-8
Ghamkhar H., Ghazizadeh M. J., Mohajeri S. H., Moslehi I. & Yousefi-Khoshqalb E. 2023 An unsupervised method to exploit low-resolution water meter data for detecting end-users with abnormal consumption: Employing the DBSCAN and time series complexity. Sustain. Cities Soc. 94, 104516. https://doi.org/10.1016/j.scs.2023.104516
Gleick P., Wolff G. & Cushing K. 2003 Waste Not, Want Not: The Potential for Urban Water Conservation in California. Pacific Institute for Studies in Development, Environment and Security, Oakland, California, USA.
Goharian E. & Burian S. J. 2018 Developing an integrated framework to build a decision support tool for urban water management. J. Hydroinformatics 20 (3), 708–727. https://doi.org/10.2166/hydro.2018.088
Heydari Z., Cominola A. & Stillwell A. 2022 Is smart water meter temporal resolution a limiting factor to residential water end-use classification? A quantitative experimental analysis. Environ. Res. 2 (4), 045004. https://doi.org/10.1088/2634-4505/ac8a6b
Koop S., Clevers S., Blokker M. & Brouwer S. 2021 Public attitudes towards digital water meters for households. Sustainability 13 (11), 6440. https://doi.org/10.3390/su13116440
Kowalski M. & Marshallsay D. 2003 A system for improved assessment of domestic water use components. In: Proc. of the 2nd Int. Water Assoc. Conf. on Efficient Use and Manage. of Urban Water Supply. International Water Association, London, UK.
Li J. & Song S. 2023 Urban water consumption prediction based on CPMBNIP. Water Resour. Manage. 37, 5189–5213. https://doi.org/10.1007/s11269-023-03601-1
Mayer P., DeOreo W., Opitz E., Kiefer J., Davis B., Dziegielewski B. & Nelson J. 1999 Residential End Uses of Water. American Water Works Association Research Foundation, Denver, Colorado, USA.
Mazzoni F., Alvisi S., Franchini M., Ferraris M. & Kapelan Z. 2021 Automated household water end-use disaggregation through rule-based methodology. J. Water Res. Plan. Manage. 146 (6), 04021024. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001379
Mazzoni F., Alvisi S., Blokker E. J. M., Buchberger S. G., Castelletti A. F., Cominola A., Gross M. P., Jacobs H. E., Mayer P. W., Steffelbauer R. A., Stillwell A. S., Tzatchkov V., Yamanaka V. H. & Franchini M. 2023a Investigating the characteristics of residential end uses of water: A worldwide review. Water Res. 230, 119500. https://doi.org/10.1016/j.watres.2022.119500
Mazzoni F., Alvisi S., Blokker E. J. M., Buchberger S. G., Castelletti A. F., Cominola A., Gross M. P., Jacobs H. E., Mayer P. W., Steffelbauer R. A., Stillwell A. S., Tzatchkov V., Yamanaka V. H. & Franchini M. 2023b Exploiting high-resolution data to investigate the characteristics of residential water consumption at the end-use level: A Dutch case study. Water Resour. Ind. 23, 100198. https://doi.org/10.1016/j.wri.2022.100198
Mazzoni F., Blokker E. J. M., Alvisi S. & Franchini M. 2023c Water End-Use Disaggregation and Classification Method – "Enhanced Version". Zenodo, Geneva, Switzerland. https://doi.org/10.5281/zenodo.7937757
Melville-Shreeve P., Cotterill S. & Butler D. 2021 Capturing high-resolution water demand data in commercial buildings. J. Hydroinformatics 23 (3), 402–416. https://doi.org/10.2166/hydro.2021.103
Nguyen K., Zhang H. & Stewart R. 2013 Development of an intelligent model to categorise residential water end-use events. J. Hydro-env. Res. 7 (3), 182–201. https://doi.org/10.1016/j.jher.2013.02.004
Stewart R., Nguyen K., Beal C., Zhang H., Sahin O., Bertone E., Vieira E. & Kossieris P. 2018 Integrated intelligent water-energy metering systems and informatics: Visioning a digital multi-utility service provider. Env. Model. Soft. 105, 94–117. https://doi.org/10.1016/j.envsoft.2018.03.006
Surendra H. J. & Deka P. C. 2022 Municipal residential water consumption estimation techniques using traditional and soft computing approach: A review. Water Conserv. Sci. Eng. 7, 77–85. https://doi.org/10.1007/s41101-022-00127-2
Wang M., Fu H., Zhou Z. & Cheng Z. 2023 Modeling the impact of a family structure on household water consumption. AQUA Water Infrastruct. Ecosyst. Soc. 72 (1), 96–110. https://doi.org/10.2166/aqua.2022.166
Zanfei A., Melo-Brentan B., Menapace A. & Righetti M. 2022 A short-term water demand forecasting model using multivariate long short-term memory with meteorological data. J. Hydroinformatics 24 (5), 1053–1065. https://doi.org/10.2166/hydro.2022.055
Zhang J., Savic S., Xu Q., Liu K. & Quang Z. 2024 Poisson rectangular pulse (PRP) model establishment based on uncertainty analysis of urban residential water consumption patterns. Env. Sci. Ecotech. 18, 100317. https://doi.org/10.1016/j.ese.2023.100317

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).
