Abstract
In this paper, a hybrid leak localization approach in WDNs is proposed, combining both model-based and data-driven modeling. Pressure heads of leak scenarios are simulated using a hydraulic model, and then used to train a machine-learning-based leak localization model. A key element of the methodology is that discrepancies between simulated and measured pressures are accounted for using a dynamically calculated bias correction, based on historical pressure measurements. Data of in-field leak experiments in operational water distribution networks were produced to evaluate our approach on realistic test data. The results show that the leak localization model is able to reduce the leak search region in parts of the network where leaks induce detectable drops in pressure. When this is not the case, the model still localizes the leak but is able to indicate a higher level of uncertainty with respect to its leak predictions.
HIGHLIGHTS
Hydraulic modeling and machine learning were combined in a hybrid WDN leak localization approach.
Data sets of real in-field leak experiments were created and made publicly available, as well as code used.
Experimental leaks were localized with adaptive levels of uncertainty.
INTRODUCTION
Water scarcity has become a major issue in the twenty-first century. Water is a valuable resource, and in many regions of the world water distribution companies struggle to meet the demands of an ever-increasing population. Economic development and rapid urbanization have a significant impact on water usage per person, in terms both of treated water use and of the amount of indirect water in items produced and consumed. Water scarcity negatively affects the health and well-being of urban residents, environmental quality, and socioeconomic development (Bierkens & Wada 2019; He et al. 2021). Reducing the number of people suffering from water scarcity has been put forth as a top priority by the UN's Sustainable Development Goal 6 (van Vliet et al. 2021).
Leakage from water distribution networks (WDNs) is a significant issue for the water supply industry. Leakages cause economic loss, contamination risk, and an excessive environmental burden in relation to water availability and operational energy consumption (Rajeswaran et al. 2018). Losses occur in nearly all WDNs. Typically, 20%–30% of the total water in the supply system is lost, and losses even surpass 40% in some countries. As a result, the reduction of water losses from leaks is prioritized by many countries (Z. Hu et al. 2021). Leakage reduction is a very challenging task due to the complexity of WDNs. WDNs are heterogeneous systems, with components of different age, material and size, several source points and a large number of consumer locations (Zaman et al. 2020).
The search for a leak is divided into two phases. In the leak detection phase, the presence of a leak in the WDN is discovered. In the leak localization (or isolation) phase, the actual location of the leak is isolated. Recent, comprehensive overviews on leak detection and localization techniques can be found in Chan et al. (2018) and Z. Hu et al. (2021), containing exhaustive comparisons between different existing methodologies. It is emphasized that although several studies show promising results in leak detection, obtaining a high leak localization accuracy remains a difficult task. The work presented in our paper focuses on the leak localization phase. Leak localization methods can be classified into transient-based, model-based and data-driven methods.
Transient-based methods rely on transient pressure waves in the presence of a leak opening. Recent examples can be found in Brunone & Ferrante (2001), Shi et al. (2020), Bohorquez et al. (2021) and Wang (2021). Most transient methods cannot be used for real-time leak localization applications (Chan et al. 2018), since these methods rely on complicated simulation models or the deployment of a large number of sensors (Romano et al. 2014).
Model-based methods rely on a numerical hydraulic model for simulating the real network. The model should accurately describe the network, and requires calibration with actual data from the real network. The approximate leakage location is determined by comparing pressure data with their estimation obtained using the hydraulic model. For example, Sophocleous et al. (2019) proposed a model-based leak detection and localization approach, based on optimizing leakage emission coefficients to minimize the difference between simulated and measured pressures and flows. An important assumption of the approach is that the impact of any error in calibration on the modeled outputs following leak detection/localization is minimal or null. Other recent examples of model-based approaches can be found in Abdulshaheed et al. (2018) and Salguero et al. (2019). The complexity of a WDN makes constructing an accurate and robust model unfeasible in real cases. The pipe network topology, sensor placements, flow rates, and pressures can differ in each case, making it impossible to create a perfect model for all scenarios (Kim et al. 2016).
Data-driven methods rely on the collection of historical data to construct a predictive model for leak localization, based on statistical or pattern recognition techniques. Recent work can be found in Quiñones-Grueiro et al. (2018), Navarro et al. (2019) and Zhou et al. (2021). These methods do not require in-depth hydraulic modeling knowledge about the system. However, a main disadvantage of data-driven methods is that a large amount of data is required to train a predictive model as it needs representative data on all possible scenarios (Chan et al. 2018), while pressure data accompanying leak events is often scarce.
Hybrid approaches combine model-based and data-driven approaches to have the strengths of both in one single technique (Zaman et al. 2020). Here, pressure data for the real network can be obtained by the model-based approach initially. A leak localization classifier is then applied to this simulated data, similar to data-driven approaches, but circumventing the need for large amounts of data. A non-exhaustive list of recent works using this approach is given in Soldevila et al. (2017), Zhou et al. (2019) and X. Hu et al. (2021).
As recognized by Chan et al. (2018) and Zaman et al. (2020), today's methodologies are not tested on real system data. For example, some methodologies are only tested on simulated data such as the Hanoi WDN dataset (Quiñones-Grueiro et al. 2018) or in other WDNs where leaks are simulated artificially in software (Zhou et al. 2019; X. Hu et al. 2021). In other works, only a single in-field test per WDN is considered (Soldevila et al. 2017, 2018). Methodologies should always be validated against multiple real test events using in-field data, to prove their practicality. The research community should therefore be directed towards real-field testing. The above publications also share a common problem: code and data used are not publicly available. Crucial information (e.g. demand diagrams used) is missing, making methodological comparisons difficult.
Efforts have been made to construct public datasets for leak localization, such as the LeakDB (Vrachimis et al. 2018) and BattLeDIM (Vrachimis et al. 2020) datasets, which were specifically created to enable benchmarking of competing approaches. Several methodologies were subsequently published and validated on these datasets, e.g. Vrachimis et al. (2021) and Marzola et al. (2022). The highest performance among the participants in the BattLeDIM competition was achieved by Daniel et al. (2022). An overview on the BattLeDIM results has been published by Vrachimis et al. (2022). However, both the LeakDB and the BattLeDIM datasets make use of leaks which have been artificially simulated in software, whereas real in-field tests were not performed.
A first objective of this work is to design a hybrid leak localization approach, combining model-based and data-driven modeling. The approach was specifically designed to enable leak localization in operational WDNs. The simulated representation of a WDN does not always represent its behavior faithfully. More specifically, the authors introduce a way to account for discrepancies between simulated and measured pressures, which is defined as the Time-Windowed Head Bias Correction (TWHBC). The novelty of this approach is that the effect of the many sources of uncertainty in a WDN on the simulated pressure heads is captured implicitly. The sources of uncertainty (e.g. demand errors, pipe roughness errors) do not need to be modeled explicitly, which holds the risk that certain aspects are modeled incorrectly. The TWHBC is used to generate a feature space with a separate classification label for each leak location. Generated features for each class are intrinsically tied to the leak location scenario simulated, corresponding to that class. An elastic-net logistic regression model is then introduced which predicts leak location probabilities based on this dataset.
A second objective of this work is to enable future researchers to benchmark their leak localization approaches on realistic leak test data. Two experimental data measurement campaigns (MC) were carried out for the purpose of this paper. Multiple in-field leaks were created by opening hydrants in operational WDNs. Pressure sensor data was collected during these leak experiments, and made publicly available in two datasets.
A third objective is to evaluate our methodology in two additional, difficult settings for leak localization. In the first setting, the hydraulic model is replaced with an uncalibrated model, which makes the model a less accurate representation of the real system. In the second setting, an extended version of the WDN is considered. In large parts within this extended WDN, measured pressures do not show a discernible difference when a leak occurs compared with the leak-free scenario. Both settings are realistic and frequently occurring complications for leak localization in real WDNs, leading to a more ambiguous relation between the pressure signals and leak locations. Our designed leak localization model should be able to reflect this increased uncertainty in its predictions.
The rest of this article is organized as follows. Section 2 discusses the data collected and the leak localization methodology. In Section 3, the results of the presented methodology are discussed. Section 4 concludes this article.
METHODS
Data
First, the telemetric data is discussed. This collection of data is used to evaluate the effectiveness of our leak localization methodology. The goal is to localize various leaks inside the BK-Town District Metered Area (DMA), which is a large WDN managed by De Watergroep in Belgium.
Pressure data
In-field pressure data was collected using Altecno Data-Safe800 pressure sensors. These are piezoresistive pressure transmitters with an accuracy of ±0.25% FS, which equals ±0.05 bar for a measurement range of 0–20 bar. Data was recorded using a one-minute time step. Since boundary conditions of the hydraulic model (flow meters, pump operations, etc.) were only available at five-minute intervals, the pressure data was sub-sampled to five minutes to match the simulated heads, by discarding time steps for which no simulated head is available. The leak localization models make use of pressure head values rather than raw pressures. For this reason, pressure data was converted to head values by multiplying by a constant conversion factor (i.e., 10.1974, to convert from bar to metre head units) and then adding the corresponding pressure sensor's elevation. The number of pressure sensors used and their locations in the WDN are kept fixed in this work.
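The sub-sampling and bar-to-head conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the sample readings and the 25.0 m sensor elevation are hypothetical; only the conversion factor 10.1974 and the five-minute step come from the text.

```python
BAR_TO_M_HEAD = 10.1974  # conversion factor quoted in the text (bar -> metres of head)


def pressure_to_head(pressure_bar, elevation_m):
    """Convert a pressure in bar to a pressure head in metres at a sensor."""
    return pressure_bar * BAR_TO_M_HEAD + elevation_m


def subsample(readings, step_min=5):
    """Keep only readings that fall on the simulation time step.

    `readings` is a list of (minute_offset, pressure_bar) tuples, one per minute.
    """
    return [(t, p) for t, p in readings if t % step_min == 0]


# Hypothetical one-minute readings for a sensor at 25.0 m elevation.
readings = [(t, 4.0 + 0.01 * t) for t in range(11)]
heads = [(t, pressure_to_head(p, 25.0)) for t, p in subsample(readings)]
```

Here `heads` keeps the readings at minutes 0, 5 and 10, each expressed in metres of head.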
Measurement campaigns
Subfigure label | Leak volume (m3) | Start leak | End leak |
---|---|---|---|
(a) | 5.76 | 2020-12-16 09:05 | 2020-12-16 09:45 |
(b) | 6.36 | 2020-12-16 10:00 | 2020-12-16 10:40 |
(c) | 6.07 | 2020-12-16 10:50 | 2020-12-16 11:30 |
(d) | 5.05 | 2020-12-16 10:50 | 2020-12-16 10:50 |
(e) | 7.24 | 2020-12-17 07:40 | 2020-12-17 08:20 |
(f) | 5.84 | 2020-12-17 08:30 | 2020-12-17 09:10 |
(g) | 7.77 | 2020-12-17 10:15 | 2020-12-17 10:55 |
(h) | 7.29 | 2020-12-17 11:05 | 2020-12-17 11:45 |
(i) | 4.65 | 2020-12-17 11:45 | 2020-12-17 12:35 |
The actual leak volume that was lost during the experiment is given. The intended leak size was 10 m3/h.
Subfigure label | Leak volume (m3) | Start leak | End leak |
---|---|---|---|
(a) | 3.79 | 2020-08-05 09:35 | 2020-08-05 09:55 |
(b) | 3.17 | 2020-08-05 10:12 | 2020-08-05 10:32 |
(c) | 3.27 | 2020-08-05 11:06 | 2020-08-05 11:28 |
(d) | 3.03 | 2020-08-05 13:00 | 2020-08-05 13:20 |
(e) | 3.20 | 2020-08-05 13:39 | 2020-08-05 13:59 |
(f) | 3.11 | 2020-08-05 14:50 | 2020-08-05 15:10 |
(g) | 3.54 | 2020-08-05 15:28 | 2020-08-05 15:48 |
The actual leak volume that was lost during the experiment is given. The intended leak size was 10 m3/h.
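To relate the tabulated volumes to the intended leak size of 10 m3/h, the average flow of an experiment can be derived from its volume and duration. The sketch below is illustrative only; the function name is hypothetical, and the two calls use row (a) of the nine-experiment table and row (a) of the seven-experiment table.

```python
from datetime import datetime


def mean_leak_rate(volume_m3, start, end, fmt="%Y-%m-%d %H:%M"):
    """Return the average leak flow in m3/h over one experiment."""
    seconds = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if seconds <= 0:
        # A row whose start and end coincide cannot be converted to a rate.
        raise ValueError("experiment duration must be positive")
    return volume_m3 / (seconds / 3600.0)


rate_first_table_a = mean_leak_rate(5.76, "2020-12-16 09:05", "2020-12-16 09:45")
rate_second_table_a = mean_leak_rate(3.79, "2020-08-05 09:35", "2020-08-05 09:55")
```

These two rows give average flows of about 8.64 m3/h and 11.37 m3/h, so the realized leaks lie on either side of the intended 10 m3/h.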
Preceding measurement campaigns 1 and 2, an initial calibration measurement campaign was set up in January 2020. Fifteen pressure sensors were used to collect data in normal operating conditions (i.e., without purposely inducing leaks) to calibrate the hydraulic model.
In this work, three distinct settings are considered in which leak experiments are evaluated. In the first setting, the nine leak experiments of measurement campaign 1 are localized with the proposed hybrid leak localization approach, using a calibrated hydraulic WDN model. This setting is labeled as ‘MC1, calibrated’. In the second setting, the same set of leak experiments of measurement campaign 1 is considered, with the hydraulic WDN model replaced by an uncalibrated model. This setting is labeled ‘MC1, uncalibrated’. In the third setting, measurement campaign 2 with its calibrated hydraulic model is considered, labeled as ‘MC2, calibrated’.
Hydraulic modeling
The hydraulic model was built in three stages. In the first stage, a database with all the physical elements of the drinking water network is considered. In the second stage, the connected network is built by identifying link and node objects, and connecting the nodes using the links. In the third and final stage, the connected network is transferred to a hydraulic network by adding boundary conditions (i.e., reservoir data, pump operations, etc.) and customer demand points. This hydraulic information is used to solve a set of equations to obtain a simulated flow and headloss profile through each link element and a head profile for each node element.
The aim of the hydraulic model calibration procedure is to check the validity of the model, correcting for small existing leak losses by redistributing them uniformly over all customer points, adapting the roughness parameters of the pipes, and calibrating the pressures at the feeding points and the related reduction valves. The parameters of the headloss equations, describing the headloss caused by friction in the pipe, depend on the pipe's material, diameter and age. These parameters are calibrated to have them correspond with the actual structural state of the WDN. For all pipes in the network, Hazen–Williams headloss factors are used.
In this work, the use of a calibrated vs an uncalibrated hydraulic model is compared for the leak experiments of measurement campaign 1 (i.e., ‘MC1, calibrated’ vs ‘MC1, uncalibrated’). For the uncalibrated model, the original Hazen–Williams headloss factors are used. For the calibrated model, the calibrated factors are used.
Hydraulic simulation of leak scenarios
Simulated pressure head data was obtained using the Water Network Tool for Resilience (WNTR) (Klise et al. 2017), a Python package designed to simulate and analyze resilience of WDNs. Every leak scenario in each of the hydrants was simulated in WNTR by adding a demand of 10 m3/h to the hydrant considered. Pressure-driven analysis is used instead of demand-driven analysis, since the latter is known to output unrealistic results such as negative nodal pressure heads under the abnormal hydraulic conditions encountered in WDNs in leakage scenarios (Baek et al. 2010). Simulated pressure heads at every pressure sensor location are then obtained per leak scenario. Since different leak scenarios can be simulated independently of each other, the simulations can be parallelized. Leak scenarios for all measurement campaigns were simulated in parallel on a high-performance computing cluster, submitted as MapReduce jobs. For example for measurement campaign 1, there are 360 hydrants and 28 pressure sensors in the WDN considered. Since every hydrant is considered as a potential leak location, 360 sets of 28 pressure head time-series are generated.
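Because each leak scenario is independent, the fan-out over hydrants can be illustrated with a small sketch. It does not call WNTR and does not reproduce the MapReduce setup used on the cluster: `simulate_leak` is a hypothetical placeholder for one hydraulic run, a thread pool stands in for the cluster, and the hydrant and sensor identifiers are invented. The leak demand is converted to m3/s because WNTR stores quantities in SI units.

```python
from concurrent.futures import ThreadPoolExecutor

LEAK_DEMAND_M3_PER_H = 10.0
LEAK_DEMAND_M3_PER_S = LEAK_DEMAND_M3_PER_H / 3600.0  # same leak size in SI flow units


def simulate_leak(hydrant_id, sensor_ids, leak_demand=LEAK_DEMAND_M3_PER_S):
    """Placeholder for one hydraulic run with an extra demand at `hydrant_id`.

    A real implementation would call the hydraulic solver here; this stub only
    returns an empty head time-series per sensor so that the fan-out logic runs.
    """
    return hydrant_id, {sensor: [] for sensor in sensor_ids}


def simulate_all(hydrant_ids, sensor_ids, max_workers=4):
    """Run one independent scenario per hydrant and collect results by hydrant."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: simulate_leak(h, sensor_ids), hydrant_ids)
        return dict(results)


hydrants = [f"H{n}" for n in range(360)]  # hypothetical identifiers
sensors = [f"S{n}" for n in range(28)]
scenarios = simulate_all(hydrants, sensors)
```

The result mirrors the structure described in the text: one entry per hydrant, each holding one head time-series per pressure sensor.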
Feature engineering
The leak classification dataset was constructed by applying multiple consecutive data-processing steps, starting from the simulated pressure head time-series per leak scenario, and measured pressure heads (after conversion from pressures in bar unit to heads in meter unit). The first step is a correction of the simulated pressure heads, which is defined as the Time-Windowed Head Bias Correction (TWHBC). The TWHBC is based on a comparison between simulated and measured heads on the days that precede the day of a leak experiment.
Every source of data used as input for the WDN model is associated with uncertainties (Ricca et al. 2020): demands, pipe roughnesses, node elevations, background leakages, sensor measurements and pump and valve operations. The combined effect of these uncertainties on the simulated pressure heads is captured with the TWHBC, resulting in simulated values which are more representative for the real WDN.
The mathematical formulation for the general case of N leak locations and M pressure sensors is now given. To avoid overburdening the notation, we assume a given time-window (e.g., 04:00–05:00) in the following formulas, without explicitly indicating this time-window. The leak location is indicated by i (1 to N), and the pressure sensor by j (1 to M). The day is indexed by k, with k=K corresponding to the day of the leak for which the TWHBC is calculated. We can then define the head values and residuals in the following list for sensor j and day k:
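$h^{\mathrm{meas}}_{j,k}$ = the measured head,
$h^{\mathrm{sim},0}_{j,k}$ = the leak-free simulated head,
$h^{\mathrm{sim},i}_{j,k}$ = the simulated head for leak location i,
$r_{j,k}$ = $h^{\mathrm{meas}}_{j,k} - h^{\mathrm{sim},0}_{j,k}$ = the head residual corresponding to the leak-free simulation.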
The features of measurement campaigns 1 and 2 were both constructed in the above manner. There are 360 hydrants and 19 pressure sensors in measurement campaign 1. Hence, a 19-dimensional space is constructed, with 360 Gaussian distributions that each represent a hydrant leak candidate. Seven working days preceding the first day of the leak experiments, during which the pressure sensors collected data, are used for calculating the TWHBC. Similarly, a 28-dimensional feature space is created with 1,115 Gaussians for measurement campaign 2. Here, 11 working days are used for the TWHBC. For the features of the measurement campaigns, different time-windowed averages are used (in contrast to windows of one hour in the synthetic example). Every leak experiment of measurement campaign 1 lasted at least 40 minutes. Hence, windows of 40 minutes were used. Similarly, windows of 20 minutes were used for measurement campaign 2.
To create the training data points, each Gaussian is sampled 40 times to obtain a representative number of samples for each hydrant. As a last processing step, features are standardized by subtracting the mean and scaling to unit variance.
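The processing chain described in this section (residuals over preceding days, one Gaussian per leak candidate, 40 samples, standardization) can be sketched as below. This is one plausible reading, not the authors' formulation. It assumes the residual is measured minus leak-free simulated head, that each Gaussian is centred on the leak-scenario head shifted by the mean residual, and that sensors are treated independently (no covariances). All function names and numbers are hypothetical.

```python
import random
import statistics


def residual_stats(measured, simulated_leak_free):
    """Mean and standard deviation of (measured - simulated) per sensor over past days.

    Both inputs are lists over days; each entry lists the windowed head per sensor.
    """
    n_sensors = len(measured[0])
    means, stds = [], []
    for j in range(n_sensors):
        res = [m[j] - s[j] for m, s in zip(measured, simulated_leak_free)]
        means.append(statistics.mean(res))
        stds.append(statistics.stdev(res))
    return means, stds


def sample_features(sim_leak_heads, means, stds, n_samples=40, seed=0):
    """Draw samples from a diagonal Gaussian centred on the bias-corrected heads."""
    rng = random.Random(seed)
    return [
        [rng.gauss(h + mu, sd) for h, mu, sd in zip(sim_leak_heads, means, stds)]
        for _ in range(n_samples)
    ]


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in rows]


# Hypothetical windowed heads (m) for 2 sensors over 3 preceding days.
measured = [[60.2, 55.1], [60.4, 55.0], [60.3, 55.3]]
leak_free = [[60.0, 55.5], [60.1, 55.4], [60.0, 55.6]]
means, stds = residual_stats(measured, leak_free)
samples = sample_features([59.5, 55.2], means, stds)  # one leak candidate
features = standardize(samples)
```

In a full dataset the sampling step would be repeated for every hydrant, and standardization would be applied over the samples of all classes together rather than over a single candidate as in this toy example.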
Leak localization
Formulated as a supervised classification problem, the feature vector to be classified corresponds to the processed pressure head features xj in Equation (4), and the leak hydrant location to the classification label i. Logistic regression is used as the classification algorithm. It is a discriminative, linear model. To prevent the model from overfitting, elastic-net regularization is used (Zou & Hastie 2005), which adds a penalty term to the objective function containing both L1 and L2 regularization. Cross-entropy loss, also called logistic regression loss or log loss, is defined on probability estimates of the classifier. It can be used to evaluate the probability outputs of a classifier instead of its discrete predictions. Our aim is to optimize this cross-entropy loss. The probabilistic densities where a leak might be located are of main interest, instead of only one discrete leak location per leak prediction. However, an important limitation of the classification model is that it is trained to localize single leaks only. It is not designed to localize multiple, simultaneous leaks in the WDN.
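The role of cross-entropy as a score for probabilistic predictions can be illustrated with a small, self-contained computation. The probabilities below are invented, and the helper name is hypothetical; the point is only that a prediction concentrating probability on the true hydrant scores lower (better) than a uniform one.

```python
import math


def cross_entropy(prob_rows, true_labels, eps=1e-15):
    """Average negative log-probability assigned to the true leak location."""
    total = 0.0
    for probs, label in zip(prob_rows, true_labels):
        p = min(max(probs[label], eps), 1.0)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(true_labels)


# Hypothetical predicted probabilities over three candidate hydrants;
# the true leak is at index 0 in both cases.
confident = [[0.8, 0.1, 0.1]]
uncertain = [[1 / 3, 1 / 3, 1 / 3]]
loss_confident = cross_entropy(confident, [0])
loss_uncertain = cross_entropy(uncertain, [0])
```

With N equally likely candidates the loss equals log N, which gives a natural reference level when comparing losses across networks with different numbers of hydrants.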
The classification model is trained on simulated leak scenarios using the hydraulic model, as only a small number of in-field experiments is available for measurement campaigns 1 and 2 (i.e., nine and seven leaks, respectively). Given the small number of test data, the authors evaluate and report the full grid search of the elastic-net hyperparameters (i.e. the elastic-net mixing parameter and the inverse of the regularization strength) for the average cross-entropy loss obtained over all leak experiments per measurement campaign. The predicted leak probabilities of the best hyperparameter combination are then visualized for all leak experiments.
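For orientation, the two hyperparameters named above can be read against the elastic-net objective as parameterized in scikit-learn, written here up to a constant scaling. That the implementation follows exactly this form is an assumption, and the symbols $W$, $b$, $\rho$, $C$ and $S$ are introduced here for illustration only.

```latex
\min_{W,\,b}\;\; C \sum_{n=1}^{S} -\log p_{W,b}\!\left(y_n \mid x_n\right)
  \;+\; \rho\,\lVert W \rVert_1
  \;+\; \frac{1-\rho}{2}\,\lVert W \rVert_F^2
```

Here $S$ is the number of training samples, $\rho \in [0, 1]$ is the mixing parameter ($\rho = 1$ gives a pure L1 penalty, $\rho = 0$ a pure L2 penalty), and $C$ is the inverse of the regularization strength, so smaller values of $C$ regularize more strongly.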
RESULTS AND DISCUSSION
Hyperparameter grid search
Probabilistic leak prediction
The shortest path length between the leak location with the highest predicted probability and the true leak location for every leak experiment is given in Table 3. There is a notable correspondence between the results of Figure 6 and the matching shortest path lengths in Table 3 (column ‘MC1, calibrated’). For example, the cluster of high probability predictions around the true leak location in Figure 6(a) results in a path length of only 0.18 km. The opposite is true in Figure 6(d), resulting in the longest path length among the leak experiments of ‘MC1, calibrated’, equal to 4.96 km.
Subfigure label | Path length (km) MC1, calibrated | Path length (km) MC1, uncalibrated | Path length (km) MC2, calibrated |
---|---|---|---|
(a) | 0.18 | 6.13 | 7.60 |
(b) | 0.71 | 7.31 | 0.98 |
(c) | 0.44 | 0.44 | 12.92 |
(d) | 4.96 | 5.11 | 10.07 |
(e) | 1.83 | 6.54 | 0.94 |
(f) | 1.75 | 5.51 | 1.34 |
(g) | 0.42 | 0.42 | 17.56 |
(h) | 2.56 | 2.56 | n/a |
(i) | 3.38 | 3.40 | n/a |
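The path-length metric reported in Table 3 can be illustrated with a shortest-path computation on a small graph. The sketch assumes that distance is measured along pipes weighted by their length; the four-node network, its lengths and the probabilities are hypothetical.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on an undirected graph given as {node: {neighbour: km}}."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return float("inf")  # target not reachable


# Hypothetical pipe lengths in km between four hydrants.
network = {
    "A": {"B": 0.10, "C": 0.50},
    "B": {"A": 0.10, "C": 0.08, "D": 0.60},
    "C": {"A": 0.50, "B": 0.08, "D": 0.20},
    "D": {"B": 0.60, "C": 0.20},
}
probabilities = {"A": 0.55, "B": 0.25, "C": 0.15, "D": 0.05}
predicted = max(probabilities, key=probabilities.get)  # most probable hydrant
path_km = shortest_path_length(network, predicted, "D")  # true leak at "D"
```

Note that this metric uses only the single most probable hydrant, so it does not reflect how widely the predicted probability is spread over the network.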
Additional leak experiments
Additional leak experiments during the second measurement campaign in the extended version of the WDN are shown in Figure 2. For this extended WDN, it is known that the pressure heads for a large share of the leak scenarios are insensitive to leaks. This aspect is discussed in detail in the next section. Hyperparameter grid search results and predicted leak probabilities are shown in Supplementary Figures 3 and 4, respectively. The leak probabilities show that the uncertainty of the leak localization model varies greatly per leak experiment considered. The same observation holds for predictions using the uncalibrated hydraulic model, shown in Supplementary Figures 1 and 2. By comparing columns ‘MC1, calibrated’ and ‘MC1, uncalibrated’ in Table 3, it can be seen that the use of an uncalibrated hydraulic model results in shortest path lengths which are approximately equally long (experiments c, g, h, and i) or longer (experiments a, b, d, e, and f). Thus, leak localization performance has worsened.
Shortest path lengths for MC2 are given in Table 3, column ‘MC2, calibrated’, with lengths ranging from 0.94 km (experiment e) to 17.56 km (experiment g).
Sensitivity of the extended WDN to leaks
For every hydrant in the WDN, two different leak scenarios are compared. The first scenario is the leak-free situation. The second scenario is when a leak of 10 m3/h is simulated in the hydrant. The pressure head difference (i.e. the head residual) between the two scenarios is calculated for every pressure sensor. The hydrant considered is colored according to the maximum head residual that occurs over the pressure sensors.
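The per-hydrant sensitivity measure described above amounts to a maximum over sensors of the head difference between two simulations. A minimal sketch, with hypothetical sensor names and head values and assuming the absolute difference is used:

```python
def max_head_residual(leak_free_heads, leak_heads):
    """Largest absolute head difference over all sensors for one hydrant scenario."""
    return max(abs(leak_free_heads[s] - leak_heads[s]) for s in leak_free_heads)


# Hypothetical simulated heads (m) at three sensors.
leak_free = {"S1": 60.00, "S2": 55.00, "S3": 58.00}
scenarios = {
    "H1": {"S1": 58.80, "S2": 54.95, "S3": 57.99},  # clear drop at S1
    "H2": {"S1": 60.00, "S2": 55.00, "S3": 57.99},  # barely visible anywhere
}
sensitivity = {h: max_head_residual(leak_free, heads) for h, heads in scenarios.items()}
```

One possible way to decide what counts as detectable is to compare these values with the sensor accuracy of ±0.05 bar, roughly 0.51 m of head using the conversion factor given earlier; that threshold is an assumption made here, not a criterion stated in the text.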
Figure 7 shows that the sensitivity of the pressure sensors to leaks varies greatly, depending on the location of the leak. A leak in the yellow regions does not lead to a detectable pressure head residual in any pressure sensor. In contrast, a leak in the black regions leads to a head residual of considerable magnitude in at least one of the pressure sensors. The subset of the WDN used in measurement campaign 1 is the region in the north of the extended WDN, which has a moderate sensitivity to leaks of 10 m3/h.
CONCLUSION
Prediction results were presented for a novel leak localization methodology, which consists of a combination of the TWHBC and optimizing probabilistic leak predictions. The TWHBC quantifies the uncertainty of simulated pressure heads, resulting from the various sources of modeling errors and uncertainties in the hydraulic WDN model. Features generated after calculating the TWHBC were used to train an elastic-net logistic regression with every leak location labeled as a class. To evaluate the methodology on real in-field leak experiments, leak datasets were created for two measurement campaigns. These were made publicly available as part of this paper.
Using the designed methodology, regions of higher leak probabilities are predicted in which the target leaks induced by the experiments are located. This focus on probabilistic leak search regions is beneficial for further leak localization, for example by technicians using acoustic equipment. Since low probability regions can be excluded, further pinpointing of the leak can happen in a more focused way.
The spatial extent of the leak regions predicted varies in size considerably. Next to a visual evaluation of the leak probability distributions, the leak localization performance was also evaluated numerically by computing the shortest path length between the highest predicted probability and the true leak location. The first setting considered was ‘MC1, calibrated’, using a calibrated hydraulic model to simulate leak scenarios for measurement campaign 1. Shortest path lengths computed over the leak experiments range from 0.18 km to 4.96 km.
More problematic leak localization settings were also evaluated. The second setting considered was ‘MC1, uncalibrated’, where the hydraulic model of MC1 was replaced with an uncalibrated model. Shortest path lengths computed over the leak experiments range from 0.44 km to 7.31 km. Thus, the use of an uncalibrated model leads to a worse localization performance. The third setting is ‘MC2, calibrated’, when a larger WDN is considered in which pressures in large parts of the WDN are insensitive to leaks. Shortest path lengths range from 0.94 km to 17.56 km. In both settings ‘MC1, uncalibrated’ and ‘MC2, calibrated’, the leak localization model is able to indicate higher levels of uncertainty if precise leak localization is not possible.
Reliable WDN management can be achieved only with an accurate, calibrated model. However, the calibration of a hydraulic WDN model is a complex procedure which requires considerable domain expertise (Zanfei et al. 2020). Hence, it is interesting to observe that even when the model is uncalibrated, leak localization is still possible in a number of leak experiments by using the proposed methodology.
The presence of large predicted leak regions for some experiments is also indicative of a limitation of our approach. It relies on differences in pressure heads between leak-free and leaky scenarios. However, a pressure drop (in at least one of the pressure sensors) does not occur for every leak scenario, as visualized in Figure 7. The leak location may be located either close to large pipes or inside of a highly meshed grid, where pressure sensitivity is lower. Another limitation is that a few days of pressure data collection before the actual leak event are needed to calculate the TWHBC.
A final limitation of this work is that only leakages of 10 m3/h were considered. These are easier to locate than background leakages, which are defined as leak losses below 0.5 m3/h (Chan et al. 2018). In a large DMA such as the one considered here, however, a leak of this size is of the same order of magnitude as the short-term variability of the consumption.
Further work could focus on improving the leak localization performance for the problematic settings presented in this work. Future research could also examine additional complications in the leak localization methodology presented. For example, more complex classification strategies using machine learning could be applied to the processed feature data. Further complexities in the TWHBC could be considered, such as including covariances in the pressure sensor uncertainty calculations. In this work, all simulated pressure head time-series were generated strictly before processing the time-series into a leak localization dataset. Active learning for leak classification could be researched to specify which additional pressure head time-series are most interesting to generate based on feedback from a leak classification model trained on pressure heads already generated. For this purpose, the hydraulic models used in this work were also made publicly available. Future work could also focus on improving the proposed methodology to localize multiple simultaneous leaks, since it is currently designed to localize single leaks only. Another open problem is whether a similar leak localization performance for the experiments considered can be achieved with a smaller number of pressure sensors.
ACKNOWLEDGEMENTS
The authors thank T. Van Daele, P. J. Haest and J. Debaenst for their research assistance, and De Watergroep for sharing data. The authors thank HydroScan for setting up LeakRedux™ for the leak detection phase to determine the start and magnitude of the leaks. The work of G. Mazaev and M. Weyns was supported by Research Foundation – Flanders (FWO) under strategic basis research doctoral grants (1S88020N, 1SD8821N). This work was supported through the SmartWaterGrid project, an imec.icon research project funded by imec and Agentschap Innoveren & Ondernemen.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories. The URL for the dataset is: https://doi.org/10.5281/zenodo.7255403. The URL for the source code is: https://github.com/predict-idlab/phys-ml-leak-localization.
CONFLICT OF INTEREST
The authors declare there is no conflict.