ABSTRACT
Model-based leakage localisation in water distribution networks requires accurate estimates of nodal demands to correctly simulate hydraulic conditions. While digital water meters installed at household premises can be used to provide high-resolution information on water demands, questions arise regarding the necessary temporal resolution of water demand data for effective leak localisation. In addition, how do temporal and spatial data gaps affect leak localisation performance? To address these research gaps, a real-world water distribution network is first extended with the stochastic water end-use model PySIMDEUM. Then, more than 700 scenarios for leak localisation assessment characterised by different water demand sampling resolutions, data gap rates, leak size, time of day for analysis, and data imputation methods are investigated. Numerical results indicate that during periods with high/peak demand, a fine temporal resolution (e.g., 15 min or less) is required for the successful localisation of leakages. However, regardless of the sampling frequency, leak localisation with a sensitivity analysis achieves a good performance during periods with low water demand (localisation success is on average 95%). Moreover, improvements in leakage localisation might occur depending on the data imputation method selected for data gap management, as it can mitigate random/sudden temporal and spatial fluctuations of water demands.
HIGHLIGHTS
A real-world water distribution network is extended with stochastically generated demands to represent spatial and temporal water demand fluctuations.
The effectiveness of leakage localisation varies throughout the day and depends on the leakage size and the water demand sampling resolution.
Leakage localisation performance improves if the sum of estimated nodal demands matches the actual water demand.
INTRODUCTION
Water losses (or nonrevenue water), defined as the difference between the total water input into a water distribution network and the billed water demand, are estimated to account for approximately 30% of water system input volumes across the world, with country-specific differences that reach peak levels higher than 50% (EPA 2013; EurEau 2017; Liemberger & Wyatt 2019). Besides revenue losses, water losses can generate several undesired cascading effects, including induced water scarcity (Zyoud et al. 2016), an unnecessary increase in energy demand for pumping (Colombo & Karney 2002), and damage to piping infrastructure due to erosion of the pipe beds (Mora-Rodríguez et al. 2013). Therefore, water losses represent a major challenge for the operation of water distribution networks, and accurate water loss management is of the highest interest for operators and authorities (Oberascher et al. 2020).
Water loss management involves the timely detection and localisation of water leakages (Puust et al. 2010). Several leak detection and localisation methods have been developed and demonstrated in the literature, recently fostered by the Battle of the Leakage Detection and Isolation Methods (BattLeDIM; Vrachimis et al. 2022). These include both data-driven (Daniel et al. 2022; Romero-Ben et al. 2022) and model-based methods (Steffelbauer et al. 2022b). While both types of methods achieve comparable results in leak detection, model-based algorithms are so far more reliable for leak localisation, as they account for the spatial structure of a water distribution network by means of a calibrated hydraulic computer model of the drinking water network.
Model-based methods for leak management are further subdivided into calibration-based, sensitivity analysis-based, and classification-based methods (Li et al. 2015; Hu et al. 2021; Wan et al. 2022; Romero-Ben et al. 2023). However, the basic concept is similar for all of them: a numerical model of the real network is first used to simulate hydraulic parameters (e.g., pressure at hydrants). The simulated values are then compared to pressure measurements gathered via sensors distributed in the networks and their difference is used to determine the spatial location of a leakage. An accurately calibrated hydraulic model of the water distribution network is a requirement to accomplish this task, providing a precise illustration of the pressure conditions. Therefore, water demand data from the different water users are required as model input to compute the nodal demands in the hydraulic model. Up to now, mainly quarterly or annual readings from mechanical water meters are downscaled to the desired simulation time step (Oberascher et al. 2022), representing one of the main uncertainties in numerical models (Sanz & Pérez 2014). Furthermore, as a recent review by Mohan Doss et al. (2023) concludes, most existing leak localisation approaches using a hydraulic numerical model rely on the assumption that water demand is deterministic, whereas in reality, water demands can be highly stochastic.
Digital water meters provide detailed information on water demands, as they can measure individual water demand at the household level and at high temporal resolution (e.g., sub-daily), enabling a representation of different water demand patterns throughout the water distribution network. They can thus provide essential data for the improvement of model-based leak detection algorithms. With the increasing deployment of digital water meters, recent research has focused on their application potential at the network scale (Antzoulatos et al. 2020; Farah & Shahrour 2017; Huang et al. 2020; Jun et al. 2021; Spedaletti et al. 2022). As these studies show, high-resolution water demand data improve various aspects of water loss management (e.g., water balance, leakage detection, and leakage localisation) compared to the current state of the art. However, digital water meters can record and transmit water demand data at different temporal resolutions, and a higher resolution also comes with costs and drawbacks (e.g., shorter meter battery life). The following question therefore arises: ‘What temporal resolution (sampling interval) of water demand data is required for an effective leakage localisation in model-based approaches?’ While similar investigations have already been carried out for applications at the household level based on synthetic and measured water demand data (Cominola et al. 2018; Heydari et al. 2022), to the best of the authors' knowledge, the above question remains open for model-based leakage localisation. In this context, it is assumed that the resolution of water demand data, and thus the knowledge of water demands as input parameters of the hydraulic model, influences model-based leakage localisation.
Furthermore, while digital water meters allow the recording of detailed information on water demand, water demand data are subject to temporal and spatial data gaps in practice. For example, wireless communication technologies such as Wireless M-Bus or technologies associated with Low Power Wide Area Networks operate in public frequency bands, and data losses can be expected (Oberascher et al. 2022). High-resolution water demand data also fall under the European General Data Protection Regulation (2016/679/EU 2016), requiring the consumer's consent for the installation and operation of digital water meters. Consequently, the water demand in an area might not be entirely known, and imputation methods are required to replace the missing values.
The detailed objectives of this work can be described as follows:
To investigate the effectiveness of a hydraulic model-based method for leakage localisation subject to high temporal and spatial water demand fluctuations.
To determine an optimal temporal resolution of household water demand data for effective model-based leakage localisation.
To analyse the impact of spatial and temporal data gaps in household water demand data on the leakage localisation performance and identify opportunities for their mitigation with different imputation methods.
To address these research gaps, the hydraulic model of a real-world case study is first extended with synthetic demand data simulated with PySIMDEUM (Steffelbauer et al. 2022a), i.e., the Python implementation of the state-of-the-art stochastic end-use model SIMDEUM (Blokker et al. 2010, 2017). SIMDEUM allows the generation of a unique water demand pattern for each household at a 1 min interval, so that water demands with highly varying spatio-temporal patterns are represented across the case study. These demand patterns are used for the simulation of the observed pressure data in a baseline scenario, providing an ideal setting for the robust evaluation of a model-based method for leakage localisation. Subsequently, the generated high-resolution time series are resampled to different temporal sampling intervals and applied as nodal demands to assess the sensitivity of the model-based leakage localisation to the water demand data resolution. Finally, temporal and spatial data gaps are randomly included in the digital water meter readings, and the effectiveness of different imputation methods to mitigate data gap impacts on leakage localisation is investigated.
MATERIAL AND METHODS
High-resolution water demand generation
To consider fluctuations in water demands during the day, a unique high-resolution water demand pattern is generated for each household in the case study. PySIMDEUM, i.e., the Python implementation of SIMDEUM (Steffelbauer et al. 2022a), is used for this purpose, given its ability to generate stochastic water end uses and water demand patterns (Blokker et al. 2010). In this work, the total water demand of each household is generated by randomly setting its number of occupants (one person, two persons, or family households) and by using the default profile. The default profile is based on household statistics from the Netherlands (e.g., household type, gender, age, employment), survey information about the usage of household appliances (e.g., frequency of use, duration, intensity), and information on the water-using appliances (e.g., signatures of water flow). From this information, a time series of hot and cold-water demand at 1 s resolution is generated using probability functions for each household appliance, and is then aggregated to the household level and to a temporal resolution of 1 min. For more information about the functionality of SIMDEUM and the default user profile, refer to Blokker et al. (2010) and Blokker et al. (2017). The original time series with a temporal resolution of 1 min is assumed to represent the real water demand behaviour and is used to simulate the observed data in a baseline scenario. Later, it is further processed to account for different sampling intervals and data gaps in digital water meter readings.
Water demand scaling at different sampling intervals
Furthermore, different times of day are also examined to analyse the influence of total water demand in relation to the leakage size. Based on the pattern profile, the times of day selected for performance assessment are 03:00 (minimum night flow), 07:45 (morning peak), 19:00 (average daily demand), and 23:00 (evening/night peak).
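The effect of the sampling interval on the demand value seen by the hydraulic model can be illustrated with a minimal sketch. It assumes that a coarser reading is simply the block mean of the 1 min series; the function name `resample_mean` and the demand values are hypothetical and not taken from the study.

```python
from statistics import mean


def resample_mean(series_1min, interval_min):
    """Average a 1-min demand series over consecutive blocks of `interval_min` minutes."""
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    return [
        mean(series_1min[start:start + interval_min])
        for start in range(0, len(series_1min), interval_min)
    ]


# Hypothetical 1 h of household demand in l/s: one 5-min tap event, otherwise zero.
demand_1min = [0.0] * 10 + [0.3] * 5 + [0.0] * 45

for interval in (5, 15, 60):
    resampled = resample_mean(demand_1min, interval)
    # Demand the model would be given for minute 12, which lies inside the tap event.
    print(interval, "min ->", round(resampled[12 // interval], 3), "l/s")
```

In this toy series, the instantaneous demand of 0.3 l/s at minute 12 is reproduced by the 5 min reading but diluted to 0.1 l/s and 0.025 l/s by the 15 min and 1 h readings, respectively.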
Data gaps and data reconstruction
As described in the Introduction, the collected water demand data are subject to temporal and spatial data gaps. In this work, the effect of data gaps is investigated for a sampling interval of 15 min. This interval was chosen because preliminary experiments showed a good leakage localisation performance for all times of day and leakage sizes, so it was assumed that data losses would have a high impact at this sampling resolution. The temporal data losses are assumed to be between 0 and 40% per household, meaning that the number of successfully transmitted data packets is between 96 and 58 per day for a sampling interval of 15 min. Furthermore, the degree of digital water meter penetration varies between 60 and 100%, corresponding to 40% of the households without a digital water meter and full penetration, respectively. Various combinations of spatial and temporal data gaps are thereby randomly implemented.
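A random implementation of both gap types could look like the following sketch. It is an illustration under stated assumptions, not the procedure used in the study: the function `inject_gaps`, the fixed seed, and the choice to drop exactly the given share of readings per metered household are all hypothetical.

```python
import random


def inject_gaps(readings, temporal_gap_rate, penetration, seed=0):
    """Return a copy of `readings` (household id -> list of values) with None for gaps.

    Spatial gaps: only a `penetration` share of households has a digital meter.
    Temporal gaps: each metered household loses a `temporal_gap_rate` share of readings.
    """
    rng = random.Random(seed)
    households = sorted(readings)
    metered = set(rng.sample(households, round(penetration * len(households))))
    gapped = {}
    for household in households:
        values = readings[household]
        if household not in metered:
            # No digital meter installed: the whole day is unknown.
            gapped[household] = [None] * len(values)
            continue
        n_lost = round(temporal_gap_rate * len(values))
        lost = set(rng.sample(range(len(values)), n_lost))
        gapped[household] = [None if i in lost else v for i, v in enumerate(values)]
    return gapped


# Hypothetical day of 15-min readings (96 per day) for 10 households.
day = {f"hh{i}": [0.01] * 96 for i in range(10)}
gapped_day = inject_gaps(day, temporal_gap_rate=0.4, penetration=0.6)
received = [sum(v is not None for v in vals) for vals in gapped_day.values()]
print(sorted(received))
```

With a 40% temporal gap rate, each metered household keeps 58 of its 96 daily readings, which matches the lower bound quoted above, and 4 of the 10 households have no readings at all.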
While an advantage of the above two methods for data imputation is their simplicity, they do not consider the current inflow, which creates a mismatch between the measured inflow and the applied nodal demand in the hydraulic model. Therefore, a third method is developed here and adapted to the specification of the case study using the inflow measurements as a reference value as follows. First, the water demand is measured cumulatively in the considered water distribution network, meaning that the total water demand over a certain period is known even if intermediate (localised) values are missing. Subsequently, the total water demand of each household is scaled in relation to the inflow during this period to fill the missing temporal values. Second, the households are classified into different clusters based on the water demand per billing period to fill the spatial data gaps. Thereby, it is assumed that all households in a cluster have similar behaviour regarding water demand over the day. Under this assumption, the missing water demand patterns of the households are determined by averaging the water demand of the other known households in each cluster. The values are further adjusted to the total water demand of each household and the missing residual quantity, i.e., inflow minus known demand and estimated loss quantity due to leakage and background losses, to comply with the mass balance. The number of clusters should be high enough to cover a wide range of household characteristics, but a higher number of clusters can also cause no real-time values of households to be present in the cluster due to spatial and temporal data losses. Since the aim here is to test the applicability of this approach, four clusters are assumed for simplicity for this case study using the k-means clustering method with the default setting of the Python library scikit-learn (Pedregosa et al. 2011).
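The mass-balance step of this third method can be sketched for a single time step as follows. This is a simplified illustration: cluster labels are assumed to be given (for instance from a prior k-means run), the additional adjustment to each household's total demand per billing period is omitted, and the function `impute_to_inflow` and all numbers are hypothetical.

```python
def impute_to_inflow(demand, cluster, inflow, losses):
    """Fill missing household demands (None) for one time step.

    Missing values start from the mean of the known households in the same
    cluster and are then scaled so that all demands sum to inflow - losses.
    """
    known = {h: d for h, d in demand.items() if d is not None}
    missing = [h for h, d in demand.items() if d is None]
    if not missing:
        return dict(known)

    # Mean known demand per cluster.
    sums, counts = {}, {}
    for h, d in known.items():
        sums[cluster[h]] = sums.get(cluster[h], 0.0) + d
        counts[cluster[h]] = counts.get(cluster[h], 0) + 1
    overall = sum(known.values()) / len(known) if known else 0.0

    # First guess: cluster mean, or the overall mean if the cluster has no known value.
    guess = {
        h: sums[cluster[h]] / counts[cluster[h]] if cluster[h] in counts else overall
        for h in missing
    }

    # Residual quantity that the missing households must account for.
    residual = max(inflow - losses - sum(known.values()), 0.0)
    total_guess = sum(guess.values())
    if total_guess > 0:
        filled = {h: g * residual / total_guess for h, g in guess.items()}
    else:
        filled = {h: residual / len(missing) for h in missing}
    return {**known, **filled}


# Hypothetical values in l/s: inflow includes a 2 l/s leak.
demand = {"A": 0.02, "B": 0.04, "C": None, "D": 0.10, "E": None}
cluster = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
filled = impute_to_inflow(demand, cluster, inflow=2.30, losses=2.0)
print({h: round(d, 4) for h, d in filled.items()})
```

After imputation, the nodal demands plus the estimated losses add up to the measured inflow, which is the property the method is designed to enforce.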
Leakage localisation
Experimental setup
Case study
The hydraulic model of the network is created in EPANET 2.2 (Rossman et al. 2020) and calibrated with data from a measurement campaign run in the summer of 2021. Afterwards, a unique water demand time series is created for each household using PySIMDEUM and assigned to the respective hydraulic model node (see the 160 household nodes in Figure 3). The generated time series has a duration of 4 months: the first 3 months are used for statistical analyses and feature extraction (quarterly readings, clustering and classification of the households, historical mean demand), and the last month is used for selecting a random day for leak localisation performance assessment. The Python package WNTR (Klise et al. 2017) is utilised for the hydraulic simulations.
Sampling interval and data gap scenarios
Table 1 summarises the parameters sampled for building different scenarios for leakage localisation performance assessment, along with their considered range. In total, 140 scenarios are initially simulated to analyse the influence of different sampling intervals of water demands, leakage sizes, and analysis times of day, without any spatial and temporal data gaps. Further, for the investigation of data gaps and data imputation effects, leakage size, sampling interval, and analysis time of day are fixed to 2 l/s (corresponds to the equivalent value of 0.002 m³/s in SI units), 15 min, and morning peak, respectively, whereas the spatial and temporal data gaps are reconstructed with the three different imputation methods detailed above. To reduce the influence of the randomness of data gap implementation on the results, each configuration is simulated 10 times, resulting in an additional 600 scenarios. All simulations are performed on a computer with Windows 10 Enterprise 64-bit as the operating system, with an Intel® Core™ i7-7000 processor with 3.60 GHz, and a working memory of 16.4 GB.
Parameter | Range/values |
---|---|
Leakage size | 1, 2, 3, 4, and 5 l/s |
Sampling interval | 5 min, 15 min, 1 h, 2 h, 4 h, one quarter (av), one quarter scaled to the inflow (avscal) |
Analysis time of day | minimum night flow (03:00), morning peak (07:45), average daily demand (19:00), and evening/night peak (23:00) |
Spatial data gaps | 0, 10, 20, 30, and 40% |
Temporal data gaps | 0, 10, 20, 30, and 40% |
Imputation method | Zero values, historical mean, adapted to inflow |
Note: Parameter names and their range/values are reported.
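The count of 140 initial scenarios follows from the full combination of the first three parameters in Table 1 (5 leakage sizes, 7 sampling intervals, 4 analysis times). A minimal sketch of this enumeration is given below; the variable names and dictionary keys are illustrative only.

```python
from itertools import product

leak_sizes_lps = [1, 2, 3, 4, 5]
sampling_intervals = ["5 min", "15 min", "1 h", "2 h", "4 h", "av", "avscal"]
analysis_times = ["03:00", "07:45", "19:00", "23:00"]

# One scenario per combination, without spatial or temporal data gaps.
scenarios = [
    {"leak_lps": leak, "interval": interval, "time": time}
    for leak, interval, time in product(leak_sizes_lps, sampling_intervals, analysis_times)
]
print(len(scenarios))
```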
Key assumptions
As the focus of this work is on the use of digital water meters for leakage localisation, rather than on leakage simulation or hydraulic model development, the following simplifications are assumed:
There is only one leakage at a time, with perfect detection (i.e., leakages are always detected) and estimation of the leakage size.
A calibrated hydraulic model of the water distribution network is available, without having model uncertainties (e.g., roughness, background losses).
The digital water meters and the pressure sensors at hydrants have no measurement errors.
A regular household water demand is used without considering season-dependent water demands due to, e.g., garden irrigation or swimming pool filling.
Temporal data gaps are equally probable for all digital water meters, without having spatially concentrated losses due to poor radio coverage.
RESULTS AND DISCUSSION
Influence of water demand sampling intervals
In general, the best performances are attained at finer sampling intervals, i.e., 5 and 15 min across all scenarios considered, as they rely on high-resolution knowledge of water demand. The LLS decreases with a coarser sampling interval (e.g., 1–4 h), and the worst performance is shown for the scenarios that rely on quarterly water demand, without considering any spatial and temporal variations. Interestingly, the quarterly water demand scaled to the inflow performs well across all scenarios and has one of the best performances for low (minimum night flow, Figure 5(a)) and medium (average daily demand, Figure 5(c)) water demands. In general, larger leakages can be better localised than smaller leakages for all analysis times and sampling intervals. The friction losses due to increased water flow in the pipes increase with larger leakages, resulting in higher pressure changes and a clearer identification of correct candidate nodes.
In addition, the LLS varies with the analysis time of day and achieves, on average, the best results at the minimum night flow and the worst results at the morning and evening peak. For example, the sampling interval of 15 min has a localisation success of 88% for a leakage size of 2 l/s at the minimum night flow, which decreases to 70% at the evening peak. In general, good results can be achieved at the minimum night flow for all sampling intervals including small leakages. In contrast, it requires a finer sampling interval during the demand peaks to achieve reasonable results.
Interestingly, a finer sampling interval does not always result in a better localisation of the leakages. For example, a sampling interval of 15 min has the poorest performance (except the quarterly water demand) at the analysis point ‘minimum night flow’. This can be explained by the fact that all sampling intervals represent a mean value of the actual water demand over a specific period of time, whereas the pressure measurements are taken at exactly the time of analysis in this work. The real water demand of the case study is 0.19 l/s at 03:00, whereas the measured average water demand is 0.14, 0.10, and 0.20 l/s for sampling intervals of 5 min, 15 min, and 1 h, respectively. Similarly, the sampling interval of 15 min is closest to the real water demand at the morning peak, followed by 1 h and 5 min, and the localisation success rate follows the same order. Consequently, the localisation success is higher if the average measured water demand is closer to the actual water demand at that specific time point.
Influence of data gaps
For the influence of data gaps on leakage localisation, the sampling interval of 15 min is used, as this resolution corresponds to the planned sampling interval of the digital water meters in the considered case study. A leakage size of 2 l/s at the analysis point morning peak is further applied for the analysis, as the LLS shows a high variation in the sampling intervals at this time point and thus a high impact of data gaps on the results is assumed. As a reference value for benchmarking, this scenario has an LLS of 73% without any temporal and spatial data gaps.
These results also indicate that even if complete time series are available, it is favourable to temporally smooth the water demand over several sampling intervals (e.g., with rolling averages) to achieve higher robustness in leakage localisation against random/sudden temporal and spatial fluctuations in the water demand. Conversely, one could average pressure measurements taken at a finer sampling interval to match the sampling interval of the digital water meters, or use the average pressure measurements over multiple time steps.
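As an illustration of such smoothing, the sketch below applies a trailing rolling average to a series of readings. The window length, the function name `rolling_mean`, and the demand values are assumptions made for this example; the text does not prescribe a particular window.

```python
def rolling_mean(values, window):
    """Trailing rolling average; the first entries use the shorter available history."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical 15-min readings in l/s with one sudden spike.
readings = [0.10, 0.12, 0.60, 0.11, 0.09]
smoothed = rolling_mean(readings, window=3)
print([round(v, 3) for v in smoothed])
```

The spike is spread over neighbouring time steps, so a single unusual reading has less influence on the nodal demand applied in the hydraulic model.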
Computational effort, limitations, and outlook
The applied sensitivity analysis compares measured with simulated data (Wan et al. 2022), requiring a hydraulic simulation for the leakage-free scenario as well as one hydraulic simulation for each possible leakage node for each leakage scenario. Due to the fine resolution of the network, 316 hydraulic nodes were selected as possible leakage nodes, resulting in 317 hydraulic simulations with EPANET, whereby the average computational time per leakage scenario was around 90 s. Thus, the sensitivity analysis is computationally efficient compared to other model-based leakage localisation methods, and the computational time strongly correlates with the number of possible leakage nodes.
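The structure of this computation (one leak-free simulation plus one simulation per candidate node, followed by a comparison with the measurements) can be sketched as follows. The sketch uses the Pearson correlation between measured and simulated pressure residuals as the similarity score; the pressure values stand in for hydraulic simulation results and are hypothetical, as are the function names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def rank_candidates(measured, leak_free, leak_sims):
    """Rank candidate leak nodes by correlation of pressure residuals at the sensors."""
    observed = [m - f for m, f in zip(measured, leak_free)]
    scores = {}
    for node, simulated in leak_sims.items():
        expected = [s - f for s, f in zip(simulated, leak_free)]
        scores[node] = pearson(observed, expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pressures (m) at four sensors.
leak_free = [50.0, 48.0, 47.0, 45.0]   # simulation without leak
leak_sims = {                           # one simulation per candidate node
    "N1": [49.0, 47.8, 46.9, 44.9],
    "N2": [49.8, 47.7, 46.2, 44.5],
    "N3": [49.9, 47.9, 46.8, 43.8],
}
measured = [49.79, 47.72, 46.25, 44.48]

ranking = rank_candidates(measured, leak_free, leak_sims)
n_simulations = 1 + len(leak_sims)
print(ranking[0][0], n_simulations)
```

With 316 candidate nodes instead of three, the same scheme requires 1 + 316 = 317 simulations per leakage scenario, which is why the computational time scales with the number of candidate nodes.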
This work is based on the following assumptions, influencing the results of model-based leakage localisation:
First, a perfectly calibrated hydraulic model of the water distribution network and no measurement errors are assumed. However, in reality, the calibrated hydraulic model and the required pressure measurements are subject to uncertainties, potentially increasing the distance between identified leakage regions and the real leakage place (Marzola et al. 2022).
Second, a perfect detection and estimation of the size of simulated leaks was assumed, without considering different characteristics of leaks (e.g., abrupt or incipient). However, since this assumption is directly incorporated into the leakage localisation method, it is expected that the LLS will decrease in case of incorrect estimations of the leakage size.
Third, only one model-based method was applied. As shown by the literature, the results may differ for other techniques (Casillas Ponce et al. 2014), suggesting combining multiple techniques for a more robust leakage localisation.
Fourth, a single household water demand profile was applied, showing similar total water demand patterns throughout the year. A more realistic case study would include heterogeneous water demand profiles, including domestic, commercial, touristic, and agricultural water users, which either increase (e.g., outdoor season-dependent household water end uses, such as garden irrigation or swimming pool filling) or decrease (e.g., constant consumption) seasonal and daily water demand fluctuations. Future research could therefore investigate the effect of different water demand profiles, including their combinations, on the leakage localisation efficiency.
Finally, an equal probability of temporal data gaps was assumed for all digital water meters, whereas data losses would be spatially aggregated due to poor radio coverage and thus differently impacting their performance. This topic could be addressed by future research.
Therefore, the results obtained for perfect conditions in this work can be seen as an upper limit, and the LLS is expected to decrease in reality or when the above assumptions are relaxed. The various uncertainty parameters should be considered together within a comprehensive sensitivity/robustness analysis, and a systematic error propagation still requires further research (Mohan Doss et al. 2023). This will be especially relevant for small leakage sizes, as the pressure fluctuations caused by the leakage are minor compared to the mentioned uncertainties.
Nonetheless, it is clear from the results of this work that the applied nodal demand estimation has a major influence on the effectiveness of the model-based leakage localisation. Therefore, careful coordination is required between the available resolution of the water demand data and the time of analysis for leakage localisation, as the achievable efficiency varies throughout the day. On the other hand, as experiences from the real-world implementation of the digital water meters in the case study showed, the efforts in installing and operating such a system are still quite high (Oberascher et al. 2024). Consequently, sufficient benefits are required to compensate for the initial investment, which depends on the case study (e.g., water shortage, pumping or treatment required) and requires an individual and detailed quantitative assessment.
As part of the above-mentioned smart city project, one or more simultaneous leakages will be simulated in reality to estimate the potential of model-based leakage localisation in a real-world environment and test how the results of this study will change in such settings.
CONCLUSION
Water losses are a major challenge for the operation of water distribution networks, and a timely detection and localisation of leakages is of greatest interest for network operators and municipalities. Hydraulic model-based methods can be applied for leak localisation. They estimate the leak location by minimising the differences between simulated and measured pressure time series. Yet, they require nodal demands as an input for the numerical model. Therefore, high-resolution water meter readings can be utilised to enhance the performance of leakage localisation. However, data from digital water meters can be recorded with different sampling resolutions due to hardware/software constraints and are also subject to temporal (e.g., packet losses during data communication) and spatial (e.g., consumers' agreement for the installation due to privacy regulations) data gaps.
In this work, an existing water distribution network is first extended with synthetic demand data simulated with PySIMDEUM (Steffelbauer et al. 2022a), i.e., the Python implementation of the state-of-the-art stochastic end-use model SIMDEUM (Blokker et al. 2010), to obtain high temporal and spatial variations of water demand across the nodes in the network. Afterwards, artificial leakages with different leakage sizes are implemented at each possible network node, and the candidate region for the following fine search is determined by using a sensitivity-based approach with Pearson correlation. Using these settings, the aim was to address the following questions to support a real-world implementation of an early warning system for leakage detection and localisation: ‘What sampling resolution is required for water demand data recording to achieve an effective leakage localisation?’ and ‘What is the impact of temporal and spatial data gaps?’
Based on the obtained results, the following conclusions can be made in reply to the above questions:
If leak localisation is run at times of day with low water demand, nearly every tested sampling interval shows a high success rate for leakage localisation. For example, there were only minor differences between coarser (e.g., from 1 h to quarterly readings) and finer (e.g., 5–15 min) temporal resolutions at the minimum night flow.
However, a finer temporal resolution (e.g., 15 min or less) is required for the successful localisation of leakages during periods with higher demands or even at peak demands, and the performance improves with the leakage size.
If the sum of applied nodal demand estimations corresponds to the real water demand of the case study, the LLS increases (e.g., from 73% without any data gaps to 81% for a temporal data gap of 20% for a leakage size of 2 l/s). In this context, quarterly readings scaled to the inflow show a good performance also during the day and represent a good alternative in case of missing high-resolution demand data.
Temporal and spatial data gaps of demand data generally decrease the performance with an increasing amount of missing data. However, the choice of a data imputation method strongly influences the result, as two out of the three tested methods show an increase in the LLS even beyond the reference value without any data gaps.
These findings also suggest that temporal averaging of water demand data (e.g., rolling average) is favourable even if complete time series are available to become more robust against random/sudden temporal and spatial fluctuations of water demand magnitudes and patterns.
Conversely, a similar behaviour is expected if the pressure measurements with a finer sampling interval are resampled to the sampling interval of the digital water meters or averaged over multiple timesteps.
FUNDING
This publication was produced as part of the ‘REWADIG’ project. This project was funded by the Climate and Energy Fund and is part of the programme ‘Smart Cities Demo – Boosting Urban Innovation 2020’ (project 884788).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.