In this paper, a hybrid leak localization approach in WDNs is proposed, combining both model-based and data-driven modeling. Pressure heads of leak scenarios are simulated using a hydraulic model, and then used to train a machine-learning-based leak localization model. A key element of the methodology is that discrepancies between simulated and measured pressures are accounted for using a dynamically calculated bias correction, based on historical pressure measurements. Data of in-field leak experiments in operational water distribution networks were produced to evaluate our approach on realistic test data. The results show that the leak localization model is able to reduce the leak search region in parts of the network where leaks induce detectable drops in pressure. When this is not the case, the model still localizes the leak but is able to indicate a higher level of uncertainty with respect to its leak predictions.

  • Hydraulic modeling and machine learning were combined in a hybrid WDN leak localization approach.

  • Data sets of real in-field leak experiments were created and made publicly available, as well as code used.

  • Experimental leaks were localized with adaptive levels of uncertainty.

Graphical Abstract


Water scarcity has become a major issue in the twenty-first century. Water is a valuable resource, and in many regions of the world water distribution companies struggle to meet the demands of an ever-increasing population. Economic development and rapid urbanization have a significant impact on water usage per person, in terms both of treated water use and of the amount of indirect water in items produced and consumed. Water scarcity negatively affects the health and well-being of urban residents, environmental quality, and socioeconomic development (Bierkens & Wada 2019; He et al. 2021). Reducing the number of people suffering from water scarcity has been put forth as a top priority by the UN's Sustainable Development Goal 6 (van Vliet et al. 2021).

Leakage from water distribution networks (WDNs) is a significant issue for the water supply industry. Leakages cause economic loss, contamination risk, and an excessive environmental burden in relation to water availability and operational energy consumption (Rajeswaran et al. 2018). Losses occur in nearly all WDNs. Typically, 20%–30% of the total water in the supply system is lost, and even surpasses 40% in some countries. As a result, the reduction of water losses from leaks is prioritized by many countries (Z. Hu et al. 2021). Leakage reduction is a very challenging task due to the complexity of WDNs. WDNs are heterogeneous systems, with components of different age, material and size, several source points and a large number of consumer locations (Zaman et al. 2020).

The search for a leak is divided into two phases. In the leak detection phase, the presence of a leak in the WDN is discovered. In the leak localization (or isolation) phase, the actual location of the leak is isolated. Recent, comprehensive overviews on leak detection and localization techniques can be found in Chan et al. (2018) and Z. Hu et al. (2021), containing exhaustive comparisons between different existing methodologies. It is emphasized that although several studies show promising results in leak detection, obtaining a high leak localization accuracy remains a difficult task. The work presented in our paper focuses on the leak localization phase. Leak localization methods can be classified into transient-based, model-based and data-driven methods.

Transient-based methods rely on transient pressure waves in the presence of a leak opening. Recent examples can be found in Brunone & Ferrante (2001), Shi et al. (2020), Bohorquez et al. (2021) and Wang (2021). Most transient methods cannot be used for real-time leak localization applications (Chan et al. 2018), since these methods rely on complicated simulation models or the deployment of a large number of sensors (Romano et al. 2014).

Model-based methods rely on a numerical hydraulic model for simulating the real network. The model should accurately describe the network, and requires calibration with actual data from the real network. The approximate leakage location is determined by comparing pressure data with their estimation obtained using the hydraulic model. For example, Sophocleous et al. (2019) proposed a model-based leak detection and localization approach, based on optimizing leakage emission coefficients to minimize the difference between simulated and measured pressures and flows. An important assumption of the approach is that the impact of any error in calibration on the modeled outputs following leak detection/localization is minimal or null. Other recent examples of model-based approaches can be found in Abdulshaheed et al. (2018) and Salguero et al. (2019). The complexity of a WDN makes constructing an accurate and robust model unfeasible in real cases. The pipe network topology, sensor placements, flow rates, and pressures can differ in each case, making it impossible to create a perfect model for all scenarios (Kim et al. 2016).

Data-driven methods rely on the collection of historical data to construct a predictive model for leak localization, based on statistical or pattern recognition techniques. Recent work can be found in Quiñones-Grueiro et al. (2018), Navarro et al. (2019) and Zhou et al. (2021). These methods do not require in-depth hydraulic modeling knowledge about the system. However, a main disadvantage of data-driven methods is that a large amount of data is required to train a predictive model as it needs representative data on all possible scenarios (Chan et al. 2018), while pressure data accompanying leak events is often scarce.

Hybrid approaches combine model-based and data-driven approaches to have the strengths of both in one single technique (Zaman et al. 2020). Here, pressure data for the real network can be obtained by the model-based approach initially. A leak localization classifier is then applied to this simulated data, similar to data-driven approaches, but circumventing the need for large amounts of data. A non-exhaustive list of recent works using this approach is given in Soldevila et al. (2017), Zhou et al. (2019) and X. Hu et al. (2021).

As recognized by Chan et al. (2018) and Zaman et al. (2020), today's methodologies are not tested on real system data. For example, some methodologies are only tested on simulated data such as the Hanoi WDN dataset (Quiñones-Grueiro et al. 2018) or in other WDNs where leaks are simulated artificially in software (Zhou et al. 2019; X. Hu et al. 2021). In other works, only a single in-field test per WDN is considered (Soldevila et al. 2017, 2018). Methodologies should always be validated against multiple real test events using in-field data, to prove their practicality. The research community should therefore be directed towards real-field testing. The above publications also share a common problem: code and data used are not publicly available. Crucial information (e.g. demand diagrams used) is missing, making methodological comparisons difficult.

Efforts have been made to construct public datasets for leak localization, such as the LeakDB (Vrachimis et al. 2018) and BattLeDIM (Vrachimis et al. 2020) datasets, which were specifically created to enable benchmarking of competing approaches. Several methodologies were subsequently published and validated on these datasets, e.g. Vrachimis et al. (2021) and Marzola et al. (2022). The highest performance among the participants in the BattLeDIM competition was achieved by Daniel et al. (2022). An overview on the BattLeDIM results has been published by Vrachimis et al. (2022). However, both the LeakDB and the BattLeDIM datasets make use of leaks which have been artificially simulated in software, whereas real in-field tests were not performed.

A first objective of this work is to design a hybrid leak localization approach, combining model-based and data-driven modeling. The approach was specifically designed to enable leak localization in operational WDNs. The simulated representation of a WDN does not always represent its behavior faithfully. More specifically, the authors introduce a way to account for discrepancies between simulated and measured pressures, which is defined as the Time-Windowed Head Bias Correction (TWHBC). The novelty of this approach is that the effect of the many sources of uncertainty in a WDN on the simulated pressure heads is captured implicitly. The sources of uncertainty (e.g. demand errors, pipe roughness errors) do not need to be modeled explicitly, which holds the risk that certain aspects are modeled incorrectly. The TWHBC is used to generate a feature space with a separate classification label for each leak location. Generated features for each class are intrinsically tied to the leak location scenario simulated, corresponding to that class. An elastic-net logistic regression model is then introduced which predicts leak location probabilities based on this dataset.

A second objective of this work is to enable future researchers to benchmark their leak localization approaches on realistic leak test data. Two experimental data measurement campaigns (MC) were carried out for the purpose of this paper. Multiple in-field leaks were created by opening hydrants in operational WDNs. Pressure sensor data was collected during these leak experiments, and made publicly available in two datasets.

A third objective is to evaluate our methodology in two additional, difficult settings for leak localization. In the first setting, the hydraulic model is replaced with an uncalibrated model, which makes the model a less accurate representation of the real system. In the second setting, an extended version of the WDN is considered. In large parts within this extended WDN, measured pressures do not show a discernible difference when a leak occurs compared with the leak-free scenario. Both settings are realistic and frequently occurring complications for leak localization in real WDNs, leading to a more ambiguous relation between the pressure signals and leak locations. Our designed leak localization model should be able to reflect this increased uncertainty in its predictions.

The rest of this article is organized as follows. Section 2 discusses the data collected and the leak localization methodology. In Section 3, the results of the presented methodology are discussed. Section 4 concludes this article.

Data

First, the telemetric data is discussed. This collection of data is used to evaluate the effectiveness of our leak localization methodology. The goal is to localize various leaks inside the BK-Town District Metered Area (DMA), which is a large WDN managed by De Watergroep in Belgium.

Pressure data

In-field pressure data was collected using Altecno Data-Safe800 pressure sensors, which are piezoresistive pressure transmitters with an accuracy of ±0.25% FS, which equals ±0.05 bar for a measurement range of 0–20 bar. Data was recorded using a one-minute time step. Since boundary conditions of the hydraulic model (flow meters, pump operations, etc.) were only available at five-minute intervals, the pressure data was sub-sampled to five minutes to match the simulated heads, by discarding time steps for which no simulated head is available. The leak localization models make use of pressure head values rather than raw pressures. For this reason, pressure data was converted to head values by multiplying by a constant conversion factor (i.e., 10.1974, to convert from bar to metre head units) and then adding the corresponding pressure sensor's elevation. The number of pressure sensors used and their location in the WDN are kept fixed in this work.
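The sub-sampling and head conversion described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names, the sensor elevation and the readings are hypothetical, and only the conversion factor 10.1974 is taken from the text.

```python
BAR_TO_METRE_HEAD = 10.1974  # conversion factor quoted in the text


def pressure_to_head(pressure_bar, elevation_m):
    """Convert a pressure (bar) to a pressure head (m) at a sensor."""
    return pressure_bar * BAR_TO_METRE_HEAD + elevation_m


def subsample(readings, step_min=5):
    """Keep only readings whose minute offset is a multiple of step_min.

    `readings` is a list of (minute_offset, pressure_bar) pairs recorded at
    a one-minute time step; all other time steps are discarded.
    """
    return [(t, p) for t, p in readings if t % step_min == 0]


# Hypothetical one-minute readings for a sensor at 25.0 m elevation.
readings = [(t, 4.0 + 0.01 * t) for t in range(11)]
heads = [(t, pressure_to_head(p, 25.0)) for t, p in subsample(readings)]
```

Discarding rather than averaging the intermediate readings mirrors the text, which keeps only the time steps for which a simulated head exists.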

Measurement campaigns

Pressure data was collected in two different measurement campaigns. A measurement campaign is defined as a period in which leaks were purposely induced by opening hydrants for a relatively short duration. Hydrants were chosen since they provide an accessible way to induce realistic leaks. Hydrants are also numerous in a WDN, resulting in a representative discretization of the network. An overview of measurement campaign 1 is shown in Figure 1. It consists of 360 hydrants as potential leak locations, with 19 pressure sensors installed. This measurement campaign was conducted in December 2020, in a subset of BK-Town. Its nine leak experiments lasted approximately 40 minutes, and are summarized in Table 1. The intended leak size was 10 m³/h, as this is slightly larger than the detection limit for leaks in this large DMA when using the commercial solution (LeakRedux™), which determines the start, end, and magnitude of the leaks. The actual leak volume lost is also shown in Table 1. An overview of measurement campaign 2 is shown in Figure 2. The WDN considered is a larger, extended version of the WDN in measurement campaign 1. It consists of 1,115 hydrants as potential leak locations, with 28 pressure sensors installed. It was conducted in July and August 2020, in the full BK-Town network, Belgium. Its seven leak experiments lasted approximately 20 minutes and are summarized in Table 2. The intended leak size was also 10 m³/h.
Table 1

Overview of leak experiments of measurement campaign 1

Subfigure label   Leak volume (m³)   Start leak         End leak
(a)               5.76               2020-12-16 09:05   2020-12-16 09:45
(b)               6.36               2020-12-16 10:00   2020-12-16 10:40
(c)               6.07               2020-12-16 10:50   2020-12-16 11:30
(d)               5.05               2020-12-16 10:50   2020-12-16 10:50
(e)               7.24               2020-12-17 07:40   2020-12-17 08:20
(f)               5.84               2020-12-17 08:30   2020-12-17 09:10
(g)               7.77               2020-12-17 10:15   2020-12-17 10:55
(h)               7.29               2020-12-17 11:05   2020-12-17 11:45
(i)               4.65               2020-12-17 11:45   2020-12-17 12:35

The actual leak volume that was lost during the experiment is given. The intended leak size was 10 m3/h.

Table 2

Overview of leak experiments of measurement campaign 2

Subfigure label   Leak volume (m³)   Start leak         End leak
(a)               3.79               2020-08-05 09:35   2020-08-05 09:55
(b)               3.17               2020-08-05 10:12   2020-08-05 10:32
(c)               3.27               2020-08-05 11:06   2020-08-05 11:28
(d)               3.03               2020-08-05 13:00   2020-08-05 13:20
(e)               3.20               2020-08-05 13:39   2020-08-05 13:59
(f)               3.11               2020-08-05 14:50   2020-08-05 15:10
(g)               3.54               2020-08-05 15:28   2020-08-05 15:48

The actual leak volume that was lost during the experiment is given. The intended leak size was 10 m3/h.

Figure 1

Overview of the WDN of measurement campaign 1, with its pressure sensor and hydrant leak locations.

Figure 2

Overview of the extended WDN of measurement campaign 2, with its pressure sensor and hydrant leak locations.


Preceding measurement campaigns 1 and 2, an initial calibration measurement campaign was set up in January 2020. Fifteen pressure sensors were used to collect data in normal operating conditions (i.e., without purposely inducing leaks) to calibrate the hydraulic model.

In this work, three distinct settings are considered in which leak experiments are evaluated. In the first setting, the nine leak experiments of measurement campaign 1 are localized with the proposed hybrid leak localization approach, using a calibrated hydraulic WDN model. This setting is labeled as ‘MC1, calibrated’. In the second setting, the same set of leak experiments are considered in measurement campaign 1, with the hydraulic WDN model replaced by an uncalibrated model. This setting is labeled ‘MC1, uncalibrated’. In the third setting, measurement campaign 2 with its calibrated hydraulic model is considered, labeled as ‘MC2, calibrated’.

Hydraulic modeling

The hydraulic model was built in three stages. In the first stage, a database with all the physical elements of the drinking water network is considered. The connected network is built by identifying link and node objects, and connecting the nodes using the links in the second stage. In the third and final stage, the connected network is transferred to a hydraulic network by adding boundary conditions (i.e., reservoir data, pump operations, etc.) and customer demand points. This hydraulic information is used to solve a set of equations to obtain a simulated flow and headloss profile through each link element and head profile for each node element.

The aim of the hydraulic model calibration procedure is to check the validity of the model, correcting for small existing leak losses by redistributing them uniformly over all customer points, adapting the roughness parameters of the pipes, and calibrating the pressures at the feeding points and the related reduction valves. The parameters of the headloss equations, describing the headloss caused by friction in the pipe, depend on the pipe's material, diameter and age. These parameters are calibrated to have them correspond with the actual structural state of the WDN. For all pipes in the network, Hazen–Williams headloss factors are used.

In this work, the use of a calibrated vs an uncalibrated hydraulic model is compared for the leak experiments of measurement campaign 1 (i.e., ‘MC1, calibrated’ vs ‘MC1, uncalibrated’). For the uncalibrated model, the original Hazen–Williams headloss factors are used. For the calibrated model, the calibrated factors are used.

Hydraulic simulation of leak scenarios

Simulated pressure head data was obtained using the Water Network Tool for Resilience (WNTR) (Klise et al. 2017), a Python package designed to simulate and analyze resilience of WDNs. Every leak scenario in each of the hydrants was simulated in WNTR by adding a demand of 10 m³/h to the hydrant considered. Pressure-driven analysis is used instead of demand-driven analysis, since the latter is known to output unrealistic results such as negative nodal pressure heads under the abnormal hydraulic conditions encountered in WDNs in leakage scenarios (Baek et al. 2010). Simulated pressure heads at every pressure sensor location are then obtained per leak scenario. Since different leak scenarios can be simulated independently of each other, the simulations can be parallelized. Leak scenarios for all measurement campaigns were simulated in parallel on a high-performance computing cluster, submitted as MapReduce jobs. For example, for measurement campaign 1, there are 360 hydrants and 19 pressure sensors in the WDN considered. Since every hydrant is considered as a potential leak location, 360 sets of 19 pressure head time-series are generated.

Feature engineering

The leak classification dataset was constructed by applying multiple consecutive data-processing steps, starting from the simulated pressure head time-series per leak scenario, and measured pressure heads (after conversion from pressures in bar unit to heads in meter unit). The first step is a correction of the simulated pressure heads, which is defined as the Time-Windowed Head Bias Correction (TWHBC). The TWHBC is based on a comparison between simulated and measured heads on the days that precede the day of a leak experiment.

Every source of data used as input for the WDN model is associated with uncertainties (Ricca et al. 2020): demands, pipe roughnesses, node elevations, background leakages, sensor measurements and pump and valve operations. The combined effect of these uncertainties on the simulated pressure heads is captured with the TWHBC, resulting in simulated values which are more representative for the real WDN.

The TWHBC calculation is exemplified on synthetic data for one pressure sensor in Figure 3. On the right, head time-series are shown for day N, from 00:00 to 05:00. Time-windowed averages of one hour are considered, for the simulated (blue line) and measured (grey line) heads. The measured heads represent the head values that are experimentally measured using the pressure sensors. For each hour, there is a discrepancy between the simulated and measured line, designated as the ‘head bias’. This head bias is corrected for by considering day N − 3 to day N − 1 (on the left of the figure). For each hour, the mean and standard deviation of the head bias are calculated over day N − 3 to day N − 1. For example, the mean and standard deviation are equal to 3.50 m and 0.49 m respectively, for hour 04:00–05:00. The head bias of each hour of day N is then corrected for by subtracting the calculated means, shown on the right of the figure. The uncertainty of each correction is given by the standard deviations. For example, the corrected head for hour 04:00–05:00 is now 11.50 m ± 0.49 m. The corresponding, measured head of 11.80 m lies within this interval.
Figure 3

Illustration of the TWHBC calculation with synthetic data. Pressure heads are averaged in windows of one hour in this example. The TWHBC is calculated for day N, based on the head biases of the three days preceding it. The mean and standard deviation of the head biases are calculated per hour.
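The bias correction illustrated in Figure 3 can be sketched for a single sensor and a single time window as follows. This is a minimal sketch under stated assumptions: the function name `twhbc` and all head values are hypothetical (they are not the values plotted in Figure 3), and the head bias is taken as the simulated minus the measured head, which is the sign implied by correcting through subtraction of the mean.

```python
from statistics import mean, stdev


def twhbc(sim_hist, meas_hist):
    """Return (mean, sample std) of the head bias over the preceding days.

    sim_hist and meas_hist hold the window-averaged simulated and measured
    heads (m) of one sensor and one time window, one value per preceding day.
    The head bias is taken as simulated minus measured.
    """
    biases = [s - m for s, m in zip(sim_hist, meas_hist)]
    return mean(biases), stdev(biases)


# Hypothetical values for window 04:00-05:00 on days N-3, N-2 and N-1.
sim_hist = [15.2, 14.9, 15.1]
meas_hist = [12.2, 11.4, 11.1]
bias_mean, bias_std = twhbc(sim_hist, meas_hist)

# Correct the simulated head of day N by subtracting the mean bias;
# the standard deviation is kept as the uncertainty of the correction.
sim_day_n = 15.0
corrected = sim_day_n - bias_mean
```

With these hypothetical numbers the biases are 3.0, 3.5 and 4.0 m, so the corrected head is about 11.5 m with an uncertainty of about 0.5 m.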

After calculating the TWHBC for every pressure sensor, the next step is to construct the leak localization feature space. The feature space is based on the simulated heads of all pressure sensors, corrected by the TWHBC. This is again exemplified with synthetic data in Figure 4. Time-windowed measured heads and corrected simulated heads are shown in Figure 4(a) and Figure 4(b) for two pressure sensors. Corrected simulated heads for two leaks are shown in cyan and red, for hour 04:00–05:00. For clarity, note that the measured heads in grey in the figures are the only experimentally measured heads in the figure; all other heads are thus obtained by using hydraulic simulations. For the two simulated leaks, the heads drop compared with the leak-free values. Note that the uncertainty per sensor is the same for the leak-free and leaky heads (i.e. σ1 for sensor 1, σ2 for sensor 2). The difference in heads, i.e., the head residual, compared with the leak-free heads is indicated. For example, μ2,1 indicates the head residual of leak 2 for sensor 1. The head residual for the actual head measurement is also indicated as xj, with j representing sensor 1 or 2. In Figure 4(c), Gaussian distributions corresponding to the leaks are shown. The mean vectors and covariance matrices of the Gaussian are shown, as well as data points sampled from each distribution. The mean and standard deviations of the Gaussian distributions are obtained from the parameters given in Figure 4(a) and Figure 4(b). The Gaussians have different mean vectors, but they share the same diagonal covariance matrix. The features xj corresponding to the measurements are also plotted. In this example, the measurement would probably be classified as belonging to leak 1.
Figure 4

Pressure sensor feature map obtained after calculation of the TWHBC. In (a) and (b), pressure heads are shown for two sensors, obtained from synthetic data as in Figure 3. For hour 04:00–05:00, two leak scenarios are also shown. In (c) the two Gaussian distributions created for both leak scenarios are shown, including data points sampled from these distributions. The inner and outer ellipses of the Gaussians represent one and two standard deviations, respectively.


The mathematical formulation for the general case of N leak locations and M pressure sensors is now given. To avoid overburdening the notation, we assume a given time-window (e.g., 04:00–05:00) in the following formulas, without explicitly indicating this time-window. The leak location is indicated by i (1 to N), and the pressure sensor by j (1 to M). The day is indexed by k, with k=K corresponding to the day of the leak for which the TWHBC is calculated. We can then define the head values and residuals in the following list for sensor j and day k:

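  • $h^{\mathrm{meas}}_{j,k}$ = the measured head,

  • $h^{\mathrm{sim}}_{0,j,k}$ = the leak-free simulated head,

  • $h^{\mathrm{sim}}_{i,j,k}$ = the simulated head for leak location i,

  • $r_{j,k}$ = $h^{\mathrm{sim}}_{0,j,k} - h^{\mathrm{meas}}_{j,k}$ = the head residual corresponding to the leak-free simulation.

The means of the TWHBC are given by the head residuals corresponding to the leak-free simulation averaged over days k = 1 to k = K − 1:
$$\bar{r}_j = \frac{1}{K-1}\sum_{k=1}^{K-1} r_{j,k} \tag{1}$$
The (sample) standard deviations of the TWHBC are calculated as follows:
$$\sigma_j = \sqrt{\frac{1}{K-2}\sum_{k=1}^{K-1}\left(r_{j,k}-\bar{r}_j\right)^2} \tag{2}$$
The means of the multivariate Gaussians are equal to the debiased head residuals corresponding to the leak simulations at day K:
$$\mu_{i,j} = \left(h^{\mathrm{sim}}_{i,j,K}-\bar{r}_j\right) - \left(h^{\mathrm{sim}}_{0,j,K}-\bar{r}_j\right) = h^{\mathrm{sim}}_{i,j,K} - h^{\mathrm{sim}}_{0,j,K} \tag{3}$$
Note that $\bar{r}_j$ is in fact not needed to calculate this difference, since the leak-free and leaky simulations are debiased using the same value. The head residuals of the actual head measurements at day K are calculated as
$$x_j = h^{\mathrm{meas}}_{j,K} - \left(h^{\mathrm{sim}}_{0,j,K}-\bar{r}_j\right) \tag{4}$$
The features for each leak candidate i are generated through randomly sampling the following multivariate Gaussian with a diagonal covariance matrix:
$$\mathbf{x}^{(i)} \sim \mathcal{N}\left(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}\right), \quad \boldsymbol{\mu}_i = (\mu_{i,1},\ldots,\mu_{i,M})^{\top}, \quad \boldsymbol{\Sigma} = \mathrm{diag}\left(\sigma_1^2,\ldots,\sigma_M^2\right) \tag{5}$$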

The features of measurement campaigns 1 and 2 were both constructed in the above manner. There are 360 hydrants and 19 pressure sensors in measurement campaign 1. Hence, a 19-dimensional space is constructed, with 360 Gaussian distributions that each represent a hydrant leak candidate. Seven working days preceding the first day of the leak experiments, during which the pressure sensors collected data, are used for calculating the TWHBC. Similarly, a 28-dimensional feature space is created with 1,115 Gaussians for measurement campaign 2. Here, 11 working days are used for the TWHBC. For the features of the measurement campaigns, different time-windowed averages are used (in contrast to windows of one hour in the synthetic example). Every leak experiment of measurement campaign 1 lasted at least 40 minutes. Hence, windows of 40 minutes were used. Similarly, windows of 20 minutes were used for measurement campaign 2.

To create the training data points, each Gaussian is sampled 40 times to obtain a representative number of samples for each hydrant. As a last processing step, features are standardized by subtracting the mean and scaling to unit variance.
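The sampling and standardization steps can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names, the residual values and the fixed random seed are assumptions, and standardization is done here with the population standard deviation of each feature column.

```python
import random
from statistics import mean, pstdev


def sample_features(mu, sigma, n_samples, rng):
    """Draw n_samples feature vectors per leak candidate from a Gaussian
    with diagonal covariance.

    mu[i][j] is the head residual of leak candidate i at sensor j and
    sigma[j] is the TWHBC standard deviation of sensor j.
    Returns (features, labels).
    """
    features, labels = [], []
    for i, mu_i in enumerate(mu):
        for _ in range(n_samples):
            features.append([rng.gauss(m, s) for m, s in zip(mu_i, sigma)])
            labels.append(i)
    return features, labels


def standardize(features):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*features))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in features]


# Hypothetical residuals (m) for two leak candidates and two sensors.
mu = [[-0.8, -0.2], [-0.3, -0.9]]
sigma = [0.4, 0.5]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = sample_features(mu, sigma, n_samples=40, rng=rng)
X_std = standardize(X)
```

Because the covariance matrix is diagonal, each sensor dimension can be sampled independently, which is why a per-sensor `gauss` call suffices here.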

Leak localization

Formulated as a supervised classification problem, the feature vector to be classified corresponds to the processed pressure head features xj in Equation (4), and the leak hydrant location to the classification label i. Logistic regression is used as the classification algorithm. It is a discriminative, linear model. To prevent the model from overfitting, elastic-net regularization is used (Zou & Hastie 2005), which adds a penalty term to the objective function containing both L1 and L2 regularization. Cross-entropy loss, also called logistic regression loss or log loss, is defined on probability estimates of the classifier. It can be used to evaluate the probability outputs of a classifier instead of its discrete predictions. Our aim is to optimize this cross-entropy loss. The probabilistic densities where a leak might be located are of main interest, instead of only one discrete leak location per leak prediction. However, an important limitation of the classification model is that it is trained to localize single leaks only. It is not designed to localize multiple, simultaneous leaks in the WDN.

The classification model is trained on simulated leak scenarios using the hydraulic model, as only a small number of in-field experiments is available for measurement campaigns 1 and 2 (i.e., nine and seven leaks, respectively). Given the small number of test data, the authors evaluate and report the full grid search of the elastic-net hyperparameters (i.e. the elastic-net mixing parameter and the inverse of the regularization strength) for the average cross-entropy loss obtained over all leak experiments per measurement campaign. The predicted leak probabilities of the best hyperparameter combination are then visualized for all leak experiments.
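The evaluation criterion, the cross-entropy loss averaged over the leak experiments, can be sketched as follows. The probabilities and hydrant indices are hypothetical, and the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import math


def cross_entropy(probabilities, true_index, eps=1e-15):
    """Log loss of one prediction: minus the log of the probability
    assigned to the true leak location."""
    p = min(max(probabilities[true_index], eps), 1.0)
    return -math.log(p)


def mean_cross_entropy(predictions, true_indices):
    """Average the log loss over all leak experiments."""
    losses = [cross_entropy(p, t) for p, t in zip(predictions, true_indices)]
    return sum(losses) / len(losses)


# Hypothetical predicted probabilities over four hydrants, two experiments.
predictions = [[0.70, 0.10, 0.10, 0.10],
               [0.25, 0.25, 0.25, 0.25]]
true_indices = [0, 2]
loss = mean_cross_entropy(predictions, true_indices)
```

A confident, correct prediction (first experiment) contributes a small loss, while a uniform prediction (second experiment) contributes a larger one, so the criterion rewards probability mass placed on the true hydrant rather than only the top-ranked location.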

Hyperparameter grid search

The cross-entropy loss of the elastic-net logistic regression model was computed for each of the leak experiments of measurement campaign 1 shown in Figure 1. The mean loss over the nine experiments is visualized in Figure 5, for each combination of the L1 ratio (the elastic-net mixing parameter), and the regularization parameter C (the inverse of the regularization strength). The lowest loss is obtained for C = 0.01 and L1 ratio = 0.0. The best model thus only uses L2 regularization, without L1 regularization.
Figure 5

Grid search over the hyperparameters of elastic-net logistic regression, using a calibrated hydraulic model for measurement campaign 1 (‘MC1, calibrated’). The mean cross-entropy loss is computed over nine experimental leaks.


Probabilistic leak prediction

The leak predictions of the logistic regression model with hyperparameters C = 0.01 and L1 ratio = 0.0 are shown in Figure 6. The predicted leak probabilities in every hydrant of the WDN are visualized using a color scale. Darker colors indicate higher leak probabilities. The leak localization model is able to assign higher probabilities to the appropriate regions, since the true hydrant location of every experiment is located within such a region. Depending on the leak experiment, the leak localization region may vary in size considerably. For example, the predictions in Figure 6(c) are closely concentrated around the true leak location. Conversely, the probabilities are spread out more over the hydrants in Figure 6(e).
Figure 6

Leak probabilities predicted for the nine leak experiments of measurement campaign 1, using a calibrated hydraulic model (‘MC1, calibrated’). Higher probabilities are shown in an increasingly darker color. The logistic regression model with hyperparameters C = 0.01, L1 ratio = 0.0 is used.


The shortest path length between the leak location with the highest predicted probability and the true leak location for every leak experiment is given in Table 3. There is a notable correspondence between the results of Figure 6 and the matching shortest path lengths in Table 3 (column ‘MC1, calibrated’). For example, the cluster of high probability predictions around the true leak location in Figure 6(a) results in a path length of only 0.18 km. The opposite is true in Figure 6(d), resulting in the longest path length among the leak experiments of ‘MC1, calibrated’, equal to 4.96 km.
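The path-length metric can be sketched as follows. The text does not state which shortest-path algorithm was used, so Dijkstra's algorithm with pipe lengths as edge weights is an assumption here, and the toy network, hydrant names and probabilities are hypothetical.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on a pipe network given as
    {node: [(neighbour, pipe_length_km), ...]}."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was found already
        for neighbour, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return float("inf")  # target not reachable from source


# Hypothetical four-hydrant network with pipe lengths in km.
graph = {
    "H1": [("H2", 0.5), ("H3", 1.2)],
    "H2": [("H1", 0.5), ("H4", 0.4)],
    "H3": [("H1", 1.2), ("H4", 0.2)],
    "H4": [("H2", 0.4), ("H3", 0.2)],
}
probabilities = {"H1": 0.1, "H2": 0.2, "H3": 0.6, "H4": 0.1}
true_leak = "H2"

# Hydrant with the highest predicted leak probability.
predicted = max(probabilities, key=probabilities.get)
path_km = shortest_path_length(graph, predicted, true_leak)
```

Measuring distance along the pipes rather than in a straight line reflects how far a field crew would have to search along the network.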

Table 3

Shortest path length between the leak location with the highest predicted probability and the true leak location for every leak experiment

Subfigure label    MC1, calibrated (km)    MC1, uncalibrated (km)    MC2, calibrated (km)
(a)                0.18                    6.13                      7.60
(b)                0.71                    7.31                      0.98
(c)                0.44                    0.44                      12.92
(d)                4.96                    5.11                      10.07
(e)                1.83                    6.54                      0.94
(f)                1.75                    5.51                      1.34
(g)                0.42                    0.42                      17.56
(h)                2.56                    2.56                      n/a
(i)                3.38                    3.40                      n/a

Additional leak experiments

The additional leak experiments conducted during the second measurement campaign (MC2) in the extended version of the WDN are shown in Figure 2. For this extended WDN, it is known that the pressure heads are insensitive to leaks for a large share of the leak scenarios. This aspect is discussed in detail in the next section. Hyperparameter grid search results and predicted leak probabilities are shown in Supplementary Figures 3 and 4, respectively. The leak probabilities show that the uncertainty of the leak localization model varies greatly between leak experiments. The same observation holds for predictions using the uncalibrated hydraulic model, shown in Supplementary Figures 1 and 2. Comparing columns ‘MC1, calibrated’ and ‘MC1, uncalibrated’ in Table 3 shows that using an uncalibrated hydraulic model results in shortest path lengths that are approximately equally long (experiments c, g, h, and i) or longer (experiments a, b, d, e, and f). Thus, leak localization performance worsens when the hydraulic model is uncalibrated.

Shortest path lengths for MC2 are given in Table 3, column ‘MC2, calibrated’, with lengths ranging from 0.94 km (experiment e) to 17.56 km (experiment g).
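The comparisons drawn from Table 3 can be reproduced with a few lines of code. The sketch below copies the path lengths from Table 3 and computes the range per setting, as well as which experiments have a longer path length with the uncalibrated model. The 0.05 km tolerance for "approximately equal" is an assumption made for this illustration only.

```python
# Shortest path lengths (km) copied from Table 3; None marks "n/a".
# Tuple order: (MC1 calibrated, MC1 uncalibrated, MC2 calibrated).
table3 = {
    "a": (0.18, 6.13, 7.60),
    "b": (0.71, 7.31, 0.98),
    "c": (0.44, 0.44, 12.92),
    "d": (4.96, 5.11, 10.07),
    "e": (1.83, 6.54, 0.94),
    "f": (1.75, 5.51, 1.34),
    "g": (0.42, 0.42, 17.56),
    "h": (2.56, 2.56, None),
    "i": (3.38, 3.40, None),
}
settings = ("MC1, calibrated", "MC1, uncalibrated", "MC2, calibrated")


def column_range(index):
    """Minimum and maximum path length of one Table 3 column, skipping n/a."""
    values = [row[index] for row in table3.values() if row[index] is not None]
    return min(values), max(values)


ranges = {name: column_range(i) for i, name in enumerate(settings)}

# Hypothetical tolerance below which two path lengths count as equal.
TOLERANCE_KM = 0.05
longer = [k for k, (cal, uncal, _) in table3.items() if uncal - cal > TOLERANCE_KM]
similar = [k for k, (cal, uncal, _) in table3.items() if abs(uncal - cal) <= TOLERANCE_KM]

for name, (low, high) in ranges.items():
    print(f"{name}: {low:.2f} km to {high:.2f} km")
print("longer with uncalibrated model:", longer)
print("approximately equal:", similar)
```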

Sensitivity of the extended WDN to leaks

The sensitivity of the pressure sensors in the extended WDN of measurement campaign 2 is visualized in Figure 7. Simulated pressure heads are considered at one point in time only, which is 2020-08-05, 08:00. The visualization does not change considerably at other moments of the day. Hence, it suffices to consider pressure heads at this time only to evaluate the leak sensitivity of the WDN.
Figure 7

Sensitivity of the pressure sensors in the extended WDN to leaks, at 2020-08-05, 08:00. Every hydrant is colored according to the maximum head residual simulated in any pressure sensor as a result of a leak of 10 m³/h in the hydrant considered.


For every hydrant in the WDN, two different leak scenarios are compared. The first scenario is the leak-free situation. The second scenario is when a leak of 10 m³/h is simulated in the hydrant. The pressure head difference (i.e. the head residual) between the two scenarios is calculated for every pressure sensor. The hydrant considered is colored according to the maximum head residual that occurs over the pressure sensors.
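The sensitivity calculation described above can be sketched as follows. The sketch assumes that the simulated heads per pressure sensor are already available for the leak-free scenario and for one leak scenario per hydrant, and that the residual is taken as an absolute difference. The sensor names, hydrant names, and head values are hypothetical and stand in for hydraulic simulation output.

```python
def max_head_residual(heads_leak_free, heads_with_leak):
    """Largest absolute head difference (m) over all pressure sensors."""
    return max(
        abs(heads_leak_free[sensor] - heads_with_leak[sensor])
        for sensor in heads_leak_free
    )


def sensitivity_map(heads_leak_free, leak_scenarios):
    """Map every hydrant to the maximum head residual caused by a leak there.

    leak_scenarios : dict, hydrant id -> {sensor id -> simulated head (m)}
    """
    return {
        hydrant: max_head_residual(heads_leak_free, heads)
        for hydrant, heads in leak_scenarios.items()
    }


# Hypothetical simulated heads (m) at two pressure sensors.
heads_leak_free = {"S1": 52.0, "S2": 48.5}
leak_scenarios = {
    "H1": {"S1": 51.5, "S2": 48.5},    # clear drop at sensor S1
    "H2": {"S1": 51.75, "S2": 48.25},  # smaller drop at both sensors
    "H3": {"S1": 52.0, "S2": 48.5},    # no sensor reacts to this leak
}

sensitivity = sensitivity_map(heads_leak_free, leak_scenarios)
print(sensitivity)
```

A hydrant such as H3 in this toy example, for which no sensor shows a head residual, corresponds to a location where a leak cannot be distinguished from the leak-free situation on the basis of pressure alone.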

Figure 7 shows that the sensitivity of the pressure sensors to leaks varies greatly, depending on the location of the leak. A leak in the yellow regions does not lead to a detectable pressure head residual in any pressure sensor. In contrast, a leak in the black regions leads to a head residual of considerable magnitude in at least one of the pressure sensors. The subset of the WDN used in measurement campaign 1 is the region in the north of the extended WDN, which has a moderate sensitivity to leaks of 10 m³/h.

Prediction results were presented for a novel leak localization methodology, which combines the TWHBC with optimized probabilistic leak predictions. The TWHBC quantifies the uncertainty of simulated pressure heads that results from the various sources of modeling errors and uncertainties in the hydraulic WDN model. Features generated after calculating the TWHBC were used to train an elastic-net logistic regression model, with every leak location treated as a class. To evaluate the methodology on real in-field leak experiments, leak datasets were created for two measurement campaigns. These were made publicly available as part of this paper.

Using the designed methodology, regions of higher leak probabilities are predicted in which the target leaks induced by the experiments are located. This focus on probabilistic leak search regions is beneficial for further leak localization, for example by technicians using acoustic equipment. Since low probability regions can be excluded, further pinpointing of the leak can happen in a more focused way.
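One way to turn predicted probabilities into a concrete search region is sketched below. It is a hypothetical post-processing step, not part of the methodology evaluated in this paper: hydrants are ranked by predicted probability and added to the region until a chosen share of the probability mass is covered. The function name, the 0.9 threshold, and the toy probabilities are assumptions.

```python
def search_region(probabilities, mass=0.9):
    """Hydrants, in order of decreasing probability, covering at least `mass`.

    probabilities : dict, hydrant id -> predicted leak probability
    mass          : share of total probability to cover (hypothetical choice)
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    region, cumulative = [], 0.0
    for hydrant, p in ranked:
        region.append(hydrant)
        cumulative += p
        if cumulative >= mass:
            break
    return region


# A concentrated prediction gives a small region ...
concentrated = {"H1": 0.85, "H2": 0.1, "H3": 0.03, "H4": 0.02}
# ... whereas a spread-out prediction gives a large one.
spread_out = {"H1": 0.3, "H2": 0.25, "H3": 0.25, "H4": 0.2}

small_region = search_region(concentrated)
large_region = search_region(spread_out)
print(len(small_region), len(large_region))
```

The size of the resulting region is one possible numerical indication of the uncertainty of a prediction: the more the probabilities are spread out, the more hydrants remain to be inspected.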

The spatial extent of the predicted leak regions varies considerably. In addition to a visual evaluation of the leak probability distributions, the leak localization performance was evaluated numerically by computing the shortest path length between the leak location with the highest predicted probability and the true leak location. The first setting considered was ‘MC1, calibrated’, using a calibrated hydraulic model to simulate leak scenarios for measurement campaign 1. Shortest path lengths computed over the leak experiments range from 0.18 km to 4.96 km.

More problematic leak localization settings were also evaluated. The second setting considered was ‘MC1, uncalibrated’, where the hydraulic model of MC1 was replaced with an uncalibrated model. Shortest path lengths computed over the leak experiments range from 0.42 km to 7.31 km (Table 3). Thus, the use of an uncalibrated model leads to worse localization performance. The third setting is ‘MC2, calibrated’, in which a larger WDN is considered where pressures in large parts of the network are insensitive to leaks. Shortest path lengths range from 0.94 km to 17.56 km. In both settings ‘MC1, uncalibrated’ and ‘MC2, calibrated’, the leak localization model is able to indicate higher levels of uncertainty if precise leak localization is not possible.

Reliable WDN management can only be achieved with an accurately calibrated model. However, the calibration of a hydraulic WDN model is a complex procedure which requires considerable domain expertise (Zanfei et al. 2020). Hence, it is interesting to observe that, even when the model is uncalibrated, leak localization is still possible in a number of leak experiments by using the proposed methodology.

The presence of large predicted leak regions for some experiments is also indicative of a limitation of our approach. It relies on differences in pressure heads between leak-free and leaky scenarios. However, a pressure drop (in at least one of the pressure sensors) does not occur for every leak scenario, as visualized in Figure 7. The leak may be located either close to large pipes or inside a highly meshed grid, where pressure sensitivity is lower. Another limitation is that a few days of pressure data collected before the actual leak event are needed to calculate the TWHBC.

A final limitation of this work is that only leakages of 10 m³/h were considered. These are easier to locate than background leakages, which are defined as leak losses below 0.5 m³/h (Chan et al. 2018). In the case of a large DMA, as considered here, they are nevertheless of the same magnitude as the short-term variability of the consumption.

Further work could focus on improving the leak localization performance for the problematic settings presented in this work. Future research could also examine extensions of the leak localization methodology presented. For example, more complex classification strategies using machine learning could be applied to the processed feature data. Further refinements of the TWHBC could be considered, such as including covariances in the pressure sensor uncertainty calculations. In this work, all simulated pressure head time-series were generated strictly before processing the time-series into a leak localization dataset. Active learning for leak classification could be researched to specify which additional pressure head time-series are most interesting to generate, based on feedback from a leak classification model trained on the pressure heads already generated. For this purpose, the hydraulic models used in this work were also made publicly available. Future work could also focus on extending the proposed methodology to localize multiple simultaneous leaks, since it is currently designed to localize single leaks only. Another open problem is whether a similar leak localization performance for the experiments considered can be achieved with a smaller number of pressure sensors.

The authors thank T. Van Daele, P. J. Haest, and J. Debaenst for their research assistance, and De Watergroep for sharing data. The authors thank HydroScan for setting up LeakRedux™ for the leak detection phase to determine the start and magnitude of the leaks. The work of G. Mazaev and M. Weyns was supported by Research Foundation – Flanders (FWO) under strategic basis research doctoral grants (1S88020N, 1SD8821N). This work was supported through the SmartWaterGrid project, an imec.icon research project funded by imec and Agentschap Innoveren & Ondernemen.

All relevant data are available from an online repository or repositories. The URL for the dataset is: https://doi.org/10.5281/zenodo.7255403. The URL for the source code is: https://github.com/predict-idlab/phys-ml-leak-localization.

The authors declare there is no conflict.

Abdulshaheed A., Mustapha F. & Anuar M. 2018 Pipe material effect on water network leak detection using a pressure residual vector method. J. Water Resour. Plann. Manage. 144 (4), 05018006.
Baek C. W., Jun H. D. & Kim J. H. 2010 Development of a PDA model for water distribution systems using harmony search algorithm. KSCE J. Civ. Eng. 14 (4), 613–625.
Bierkens M. F. P. & Wada Y. 2019 Non-renewable groundwater use and groundwater depletion: a review. Environ. Res. Lett. 14 (6), 063002.
Bohorquez J., Simpson A. R., Lambert M. F. & Alexander B. 2021 Merging fluid transient waves and artificial neural networks for burst detection and identification in pipelines. J. Water Resour. Plann. Manage. 147 (1), 04020097.
Brunone B. & Ferrante M. 2001 Detecting leaks in pressurised pipes by means of transients. J. Hydraul. Res. 39 (5), 539–547.
Daniel I., Pesantez J., Letzgus S., Fasaee M. A. K., Alghamdi F., Berglund E., Mahinthakumar G. & Cominola A. 2022 A sequential pressure-based algorithm for data-driven leakage identification and model-based localization in water distribution networks. J. Water Resour. Plann. Manage. 148 (6), 04022025.
He C., Liu Z., Wu J., Pan X., Fang Z., Li J. & Bryan B. A. 2021 Future global urban water scarcity and potential solutions. Nat. Commun. 12 (1), 4667.
Kim Y., Lee S. J., Park T., Lee G., Suh J. C. & Lee J. M. 2016 Robust leak detection and its localization using interval estimation for water distribution network. Comput. Chem. Eng. 92, 1–17.
Marzola I., Mazzoni F., Alvisi S. & Franchini M. 2022 Leakage detection and localization in a water distribution network through comparison of observed and simulated pressure data. J. Water Resour. Plann. Manage. 148 (1), 04021096.
Navarro A., Begovich O., Delgado-Aguiñaga J. & Sanchez J. 2019 Real time leak isolation in pipelines based on a time delay neural network. In: 2019 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), 13–15 November, Ixtapa, Mexico.
Quiñones-Grueiro M., Verde C., Prieto-Moreno A. & Llanes-Santiago O. 2018 An unsupervised approach to leak detection and location in water distribution networks. Int. J. Appl. Math. 28 (2), 283–295.
Rajeswaran A., Narasimhan S. & Narasimhan S. 2018 A graph partitioning algorithm for leak detection in water distribution networks. Comput. Chem. Eng. 108, 11–23.
Ricca H., Patskoski J. & Mahinthakumar G. 2020 Reducing error in water distribution network simulations with field measurements. J. Appl. Water Eng. Res. 8 (1), 15–27.
Romano M., Kapelan Z. & Savić D. A. 2014 Automated detection of pipe bursts and other events in water distribution systems. J. Water Resour. Plann. Manage. 140 (4), 457–467.
Soldevila A., Fernandez-Canti R. M., Blesa J., Tornil-Sin S. & Puig V. 2017 Leak localization in water distribution networks using Bayesian classifiers. J. Process Control 55, 1–9.
Soldevila A., Jensen T. N., Blesa J., Tornil-Sin S., Fernandez-Canti R. M. & Puig V. 2018 Leak localization in water distribution networks using a Kriging data-based approach. In: 2018 IEEE Conference on Control Technology and Applications (CCTA), 21–24 August, Copenhagen, Denmark.
Sophocleous S., Savić D. & Kapelan Z. 2019 Leak localization in a real water distribution network based on search-space reduction. J. Water Resour. Plann. Manage. 145 (7), 04019024.
van Vliet M. T. H., Jones E. R., Flörke M., Franssen W. H. P., Hanasaki N., Wada Y. & Yearsley J. R. 2021 Global water scarcity including surface water quality and expansions of clean water technologies. Environ. Res. Lett. 16 (2), 024020.
Vrachimis S. G., Kyriakou M. S., Eliades D. G. & Polycarpou M. M. 2018 LeakDB: a benchmark dataset for leakage diagnosis in water distribution networks. In: WDSA/CCWI Joint Conference, 23–25 July, Kingston, Ontario, Canada.
Vrachimis S. G., Eliades D. G., Taormina R., Ostfeld A., Kapelan Z., Liu S., Kyriakou M. S., Pavlou P., Qiu M. & Polycarpou M. 2020.
Vrachimis S. G., Timotheou S., Eliades D. G. & Polycarpou M. M. 2021 Leakage detection and localization in water distribution systems: a model invalidation approach. Control Eng. Pract. 110, 104755.
Vrachimis S. G., Eliades D. G., Taormina R., Kapelan Z., Ostfeld A., Liu S., Kyriakou M., Pavlou P., Qiu M. & Polycarpou M. M. 2022 Battle of the leakage detection and isolation methods. J. Water Resour. Plann. Manage. 148 (12), 04022068.
Zaman D., Tiwari M. K., Gupta A. K. & Sen D. 2020 A review of leakage detection strategies for pressurised pipeline in steady-state. Eng. Fail. Analys. 109, 104264.
Zanfei A., Menapace A., Santopietro S. & Righetti M. 2020 Calibration procedure for water distribution systems: comparison among hydraulic models. Water 12 (5), 1421.
Zhou X., Tang Z., Xu W., Meng F., Chu X., Xin K. & Fu G. 2019 Deep learning identifies accurate burst locations in water distribution networks. Water Res. 166, 115058.
Zhou M., Yang Y., Xu Y., Hu Y., Cai Y., Lin J. & Pan H. 2021 A pipeline leak detection and localization approach based on ensemble TL1DCNN. IEEE Access 9, 47565–47578.
Zou H. & Hastie T. 2005 Regularization and variable selection via the elastic net. J. R. Stat. Soc. B. 67 (2), 301–320.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).
