A significant percentage of treated water is lost through leakage in water distribution systems. State-of-the-art leak detection and localization schemes use a hybrid of hydraulic modeling and data-driven techniques. Most of these works, however, focus on single leak detection and localization. In this research, we propose using combined pressure and flow residual data to detect and localize multiple leaks. The proposed approach has two phases: detection and localization. The detection phase uses a hydraulic model to generate combined pressure and flow residuals and a classification algorithm to identify leaks. The localization phase analyzes the pattern of the isolated leaky residuals to localize multiple leaks. To evaluate the performance of the proposed approach, we conducted experiments using the Hanoi Water Network benchmark and a dataset produced following the LeakDB benchmark's dataset preparation procedure. The results for a well-calibrated hydraulic model show that leak detection is 100% accurate while localization is 90% accurate, outperforming minimum night flow in detecting leaks and raw- and residual-based methods in localizing them. The proposed approach performed relatively well when demand and noise uncertainty were introduced. The proposed localization approach is also able to locate two to four simultaneous leaks.

  • A water leak detection and localization (LDL) approach is proposed that combines hydraulic modeling with classification for detection and a statistical approach for localization.

  • Combined residual data of pressure and flow are used to enhance LDL.

  • By separating the detection and localization phases, multiple leaks can be localized.

The urgency of managing challenges due to leakage in pipe networks has grown in recent years because of water shortages caused by recent droughts, increasing demand, and environmental, social, and political pressures (Dighade et al. 2014). The significance of leak detection stems from the fact that many water-scarce and arid countries have limited options for water resources development. Although there is awareness that efficient management of water resources is a growing necessity, non-revenue water due to physical losses such as leakage is still excessive in many cities. A World Bank study estimates global physical water losses at about 32 billion cubic meters each year, half of which occur in developing countries (Bhagat et al. 2019).

One of the greatest challenges in a water distribution system (WDS) is the occurrence of leaks (Li et al. 2020). Reducing leakage volume not only saves water for an ever-growing population, but also potentially avoids, or at least delays, costly expansions of the distribution system through hydraulic works and reduces costs related to energy and environmental losses (Ferrante et al. 2013). Lost revenue due to leakage is estimated at around USD 2.9 billion every year in developing countries alone (Bell 2016). In the case of intermittent supply, which is frequently caused by excessive leakage, the urban poor often suffer most, as they cannot afford proper storage facilities and pumps and often have to buy water from vendors during non-supply hours. Reducing physical losses will make more water available and enable water utilities to increase coverage, improve quality of service, and reduce expansion and distribution costs.

Water leak detection and localization (LDL) is one of the main activities in reducing water leakage volume. Several researchers have proposed software-based methods to help detect and localize leaks as early as possible. Software-based methods rely on algorithms and data-analysis tools to find anomalies, such as leaks, in hydraulic data patterns. In these methods, hydraulic sensors measure time series of key parameters such as pressure, flow rate, and temperature. The central idea of software-based methods is to predict the system state and compare it with current observations in order to find abnormal events such as leakage and contamination. Software-based approaches can be classified into hydraulic-modeling-based and data-driven approaches.

The use of hydraulic modeling in water LDL is customary in many studies (Lah et al. 2018). Advances in hydraulic modeling tools make it easier to predict system states in a WDS under different conditions. A well-calibrated hydraulic WDS model is used as a reference against which subsequent time-step hydraulic data are compared in leakage detection. In model-based WDS analysis, the quantity and quality of the available data are the main factors in building a well-calibrated model that yields better LDL (Kirstein et al. 2021). Leak detection with a hydraulic model is mostly based on either residual analysis between the WDS hydraulic model predictions and the WDS observations (Pudar & Liggett 1992; Pérez et al. 2009, 2010, 2014; Casillas Ponce et al. 2014; Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017) or pattern evaluation of WDS state estimates (Jung & Lansey 2014; Anjana et al. 2015; Jung et al. 2015; Khalilabad et al. 2018). The focus of this paper is on the residual analysis approach.

Data-driven approaches extract meaningful information from time-series data using different statistical (Romano et al. 2009, 2017; Jung & Lansey 2014) and artificial intelligence (Mounce & Machell 2006; Aksela et al. 2009; Mounce et al. 2011; Mounce et al. 2014; Ravichandran et al. 2021) algorithms. These approaches have shown promising results in pattern recognition and anomaly detection (Mounce 2013; Mounce et al. 2014). However, they fail to capture the dynamic nature of a WDS and are not robust to boundary changes such as changes in pump and valve operation (Wu & Liu 2017). Such characteristics of a WDS are usually addressed with a calibrated hydraulic model (Okeya 2018). Hence, the two approaches are combined to obtain better LDL results (Mashford et al. 2012; Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017). Localizing multiple leaks, however, remains an open challenge, as prior works focus only on single leak localization.

Building on previous hybrid approaches, this study proposes a leak detection and localization approach. The approach first generates leaky and non-leaky residuals using the calibrated hydraulic model and feeds them to a statistical binary classifier for training. After training, measured pressure and flow values are compared with the model predictions to generate new residuals, which the classifier labels as leaky or non-leaky. Once a leak is detected, the leaky residuals are further analyzed to find the leak locations. In the proposed approach, nodes whose residuals are outliers over multiple time steps are considered leaking nodes.

To assess the performance of the proposed approach, we conducted five sets of experiments using the Hanoi Water Network benchmark (Fujiwara & Khang 1990) and a realistic dataset produced following the LeakDB benchmark's dataset preparation procedure. Using a well-calibrated hydraulic model, the accuracy of the proposed approach was compared against state-of-the-art LDL approaches, namely minimum night flow (MNF) (Vrachimis & Kyriakou 2018) and hybrid approaches (Soldevila et al. 2016; Carreno-Alvarado et al. 2017), and its robustness to noise and demand uncertainty was evaluated. The results show that for a well-calibrated hydraulic model, leak detection is 100% accurate, while localization is 90% accurate. The comparison shows that the proposed method outperforms MNF in detecting leaks and raw- and residual-based methods in localizing them. The proposed method performed relatively well when demand and noise uncertainty were introduced, with leaks of 5 cm remaining resilient to noise and demand uncertainty. The localization method was also effective in locating two to four simultaneous leaks.

The main contribution of this research is the use of combined pressure and flow residual data to detect and localize multiple leaks. State-of-the-art works focus on the localization of a single leak. In addition, these works use the hybrid approach for localization only, not for detection, which makes localizing multiple leaks from a single test sample challenging. The proposed approach is in line with the state of the art in using a hybrid approach; however, it detects and localizes multiple leaks using pressure and flow residuals.

In order to localize multiple leaks, a method based on residual analysis is proposed. It has two main phases: detection and localization. In the detection phase, we use hydraulic modeling to generate a dataset containing each node's pressure and flow residuals, which represent the system state. The generated dataset is used to build a classifier model, which then classifies new observational residual data as leaky or non-leaky. If the new data is classified as leaky, the localization phase is triggered.

The localization phase uses the residual data to localize leaking nodes. The rationale is that a leaking node has the highest peak residual value among its neighboring nodes. At a given timestamp, a water distribution network could have multiple leaks, and the residual values for leaks at different locations and times could also differ. Hence, in this study, we translate the problem of leak localization into finding local maxima for multiple leak scenarios and a global maximum for a single leak scenario at a given timestamp. To identify the maximum residual values, we propose a dynamic threshold value, τ, that depends on the current residual data. The threshold τ is set to the upper quartile of the local or global residual dataset. All nodes with residual values higher than the threshold are considered potential leaking nodes. Finally, our approach identifies the final candidate leaking node(s) by taking the intersection of the potential leaking nodes at different timestamps within a given period.

The general overview of the approach is shown in Figure 1. Details of the proposed approach along with the description of residual analysis are provided below.

Figure 1

General overview (d1, d2, …, dn = demand patterns for every node; ci = boundary conditions; fi = leaks that are simulated in nodes; vi = measurement noise that could potentially appear in the measured data).


Residual analysis

Our proposed approach is based on residual analysis, a method for anomaly detection in which the residual is calculated as the difference between field-measured hydraulic data and previously established baseline data. The baseline data is generated either from predicted patterns using data-driven approaches or by hydraulic modeling software. Prior works (Pérez et al. 2014; Soldevila et al. 2016) rely on pressure residuals. The residual vector, R, is the difference between the measured pressure at inner nodes where sensors are installed and the estimated pressure at these nodes (see Equation (1)).
$$\mathbf{R}(t) = \mathbf{p}(t) - \hat{\mathbf{p}}(t) \tag{1}$$
where $\mathbf{p}(t)$ is the measured pressure at inner nodes at time $t$, and $\hat{\mathbf{p}}(t)$ is the estimated pressure at these nodes at the same time $t$, obtained using the network model under a leak-free scenario. In this work, the same procedure is followed to calculate the flow residuals.
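A minimal sketch of Equation (1) applied to both pressure and flow follows, assuming the measured and leak-free model values at one time step are plain Python lists ordered by sensor. The helper names `residual_vector` and `combined_features`, and the convention of placing pressure residuals before flow residuals, are illustrative assumptions rather than the paper's implementation.

```python
from typing import List


def residual_vector(measured: List[float], estimated: List[float]) -> List[float]:
    """Equation (1): measured value minus leak-free model estimate, per sensor."""
    if len(measured) != len(estimated):
        raise ValueError("measured and estimated vectors must have the same length")
    return [m - e for m, e in zip(measured, estimated)]


def combined_features(p_meas, p_model, q_meas, q_model):
    """Concatenate pressure and flow residuals at one time step into a single
    feature vector (hypothetical layout: pressure residuals first)."""
    return residual_vector(p_meas, p_model) + residual_vector(q_meas, q_model)


# Three pressure sensors (m) and one flow sensor (l/s) at time t.
features = combined_features([50.0, 48.5, 47.0], [50.0, 49.0, 48.0], [10.5], [10.0])
print(features)  # [0.0, -0.5, -1.0, 0.5]
```

A leak typically shows up as negative pressure residuals and a positive flow residual, which is why the two residual types carry complementary information for the classifier.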

In previous studies, these pressure residuals are compared with a given threshold value to detect leak events. The threshold is calculated by considering the measurement noise and model uncertainty of the given WDS. If the residuals exceed this threshold, a leak is detected and localization is triggered. The choice of threshold affects the quality of detection and localization: setting it too high makes detection miss leak events, while setting it too low results in many false positives (FPs).

In this research, we use a classification algorithm to detect leaks and residual analysis to localize them. In the residual analysis, we use a dynamic threshold value, τ, that depends on the current residual values to identify candidate leaking nodes. In this study, τ is set to the upper quartile of the current residual values.
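The dynamic threshold can be sketched with the standard library as below. This assumes τ is computed on absolute residuals and uses linear interpolation between order statistics (`method="inclusive"`, the same convention as NumPy's default percentile); the paper does not state its quartile convention, and the node IDs and residual values are hypothetical.

```python
from statistics import quantiles


def dynamic_threshold(residuals_at_t):
    """tau = upper quartile (75th percentile) of the absolute residuals
    observed at a single timestamp (quartile convention assumed)."""
    abs_res = [abs(r) for r in residuals_at_t]
    return quantiles(abs_res, n=4, method="inclusive")[2]


# Hypothetical pressure residuals (m) at five nodes for one timestamp.
residuals_t = {"n1": -0.2, "n2": 0.1, "n3": -1.5, "n4": 0.3, "n5": -0.9}
tau = dynamic_threshold(residuals_t.values())
above = sorted(node for node, r in residuals_t.items() if abs(r) >= tau)
print(tau, above)  # 0.9 ['n3', 'n5']
```

Unlike a fixed threshold tuned to noise and model uncertainty, this value is recomputed from whatever residuals are present at each timestamp, so it adapts as the network state changes.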

Detection phase

This phase uses both hydraulic modeling and a data-driven approach to detect leaks. Hydraulic modeling is used to generate residuals that represent the system state under leaky and non-leaky scenarios; the residuals are computed using Equation (1). The generated leaky and non-leaky residuals are then fed to the classifier to build a model. The major difference from previous works is that the proposed approach uses a combination of pressure and flow residuals for training and testing; we conjecture that combining the two residuals will improve leak detection accuracy. The detection phase consists of generating input data, selecting and training the classifier, and detecting leaky and non-leaky residuals.

Data generation

Hydraulic data representing different leak sizes is generated using hydraulic modeling software and used to train the classifiers. To make the data closer to reality, modeling, demand, and measurement uncertainty are taken into consideration during data generation, as represented by vector v in Figure 1. After the hydraulic data is generated, flow and pressure residuals are calculated using Equation (1), and the two residuals are used as features of the classifier input data.

The data generation step shown on the left side of Figure 1 is adopted from previous works (Pudar & Liggett 1992; Pérez et al. 2009, 2014; Casillas Ponce et al. 2014; Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017). The symbols in the figure are input values provided to the model. Demand patterns, d1, d2, …, dn, are adopted from the LeakDB benchmark dataset. The boundary conditions, ci, do not contain pump and valve characteristics because the selected model, the Hanoi Water Network, is a relatively simple network.

Training the classifier

The second step after generating representative leak residuals is training the classifier, which for detection is a binary classifier. The training residuals are labeled with two classes: 0 (no leak) and 1 (leak). The pressure and flow residuals for different leak magnitudes and locations are combined in a single dataset, which is then split for training and testing: 50% of the generated residuals are used for training and the remainder for testing. In this research, a support vector machine (SVM) (scikit-learn developers 2019) is used for classification. SVM was selected because it can handle high-dimensional inputs and, owing to its robustness, can tolerate the noisy residuals that result from demand variation.
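The training step can be sketched with scikit-learn's `SVC` and `train_test_split`, assuming scikit-learn is installed. The synthetic residuals, the three-feature layout (two pressure residuals and one flow residual), and the RBF kernel settings are assumptions for illustration; the paper does not report its SVM hyperparameters.

```python
import random

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def synthetic_residuals(n_per_class=100, seed=0):
    """Toy combined residuals [p_res_a, p_res_b, q_res]: leak-free samples
    scatter around zero, leaky samples show pressure drops and a flow rise.
    The values are made up for illustration only."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)])
        y.append(0)  # 0 = no leak
        X.append([rng.gauss(-2.0, 0.1), rng.gauss(-1.5, 0.1), rng.gauss(1.0, 0.1)])
        y.append(1)  # 1 = leak
    return X, y


def train_leak_classifier(X, y, seed=0):
    """Fit a binary SVM on 50% of the residuals and score it on the rest."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y
    )
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # hyperparameters assumed
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)  # score() is mean accuracy


X, y = synthetic_residuals()
clf, test_accuracy = train_leak_classifier(X, y)
print(f"held-out accuracy: {test_accuracy:.2f}")
print(clf.predict([[-1.9, -1.6, 0.9], [0.05, -0.02, 0.0]]))
```

Stratifying the split keeps the leak/no-leak ratio equal in both halves, which matters when the generated scenarios are imbalanced.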

Detection

After training, the classifier model is given new observational residual data: the difference between field-measured pressure and flow data and the baseline non-leaky pressure and flow data, computed with the same procedure used to generate the training residuals. The classifier model built during training then decides whether the given residual is leaky or non-leaky. If it is found to be leaky, the localization phase is triggered. In our experiments, the testing dataset was used to simulate real observational data.

Localization phase

The localization phase uses a statistical method to find the leaking node(s) after a leak is detected. A leak increases the incoming flow and reduces the pressure downstream. The effect of a leak at one node can be seen at other nodes, especially in looped networks such as the Hanoi network; however, the effect is not the same at every node. The input to the localization phase is the timestamp at which the leak is first detected. The phase adopts the following procedure.

  • First, the absolute value of each residual after the suspected timestamp is taken. Taking the absolute value places the deviations caused by pressure and flow rate on a common, non-negative scale.

  • After taking the absolute value, all residuals are non-negative but differ in magnitude. Nodes neighboring the leaking node show larger residual values than other nodes, and the leaking node in particular shows the highest peak value among its neighbors. Therefore, finding a leaking node translates into finding a local maximum for multiple leak scenarios and a global maximum for a single leak scenario.

  • A node is considered a potential leaking node when its pressure residual value is greater than the threshold value, τ. In our experiments, the threshold is calculated as the upper quartile (75th percentile) of the residual data at a given timestamp, which captures the network situation at that specific time. Because the threshold is recomputed from the residual data at each timestamp, its value can differ between timestamps. The final candidate leaking nodes are obtained by taking the intersection of the potential leaking nodes over all timestamps in a given period, which identifies continuously leaking nodes. These final candidates are the nodes that require the attention of the water authority. Algorithm 1 illustrates the pseudo-code for localization.

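Algorithm 1 Localization Pseudo-code
t0 = starting timestamp (timestamp at which the leak is first detected)
c1, c2, …, cn // candidate node sets, one per timestamp from t0 to Tf
for t = t0, …, Tf do
  τ = upper quartile (75th percentile) of the absolute residuals at t
  for all nodes do
    if |node's residual at t| ≥ τ then
      add node to the candidate set of timestamp t
    end if
  end for
end for
Final candidates = c1 ∩ c2 ∩ … ∩ cn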
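A runnable sketch of Algorithm 1 follows, assuming residuals are given as one dictionary per timestamp (node ID to residual) and that τ is the 75th percentile of absolute residuals with linear interpolation. The function names, node IDs, and residual values are hypothetical.

```python
from functools import reduce
from statistics import quantiles


def potential_leaking_nodes(residuals_at_t):
    """Nodes whose absolute residual at one timestamp is >= tau, where tau is
    the upper quartile of the absolute residuals at that timestamp."""
    abs_res = {node: abs(r) for node, r in residuals_at_t.items()}
    tau = quantiles(abs_res.values(), n=4, method="inclusive")[2]
    return {node for node, r in abs_res.items() if r >= tau}


def localize(residual_series, t0=0):
    """Intersect the per-timestamp candidate sets from t0 to the last
    timestamp. residual_series: list of dicts mapping node ID -> residual."""
    candidate_sets = [potential_leaking_nodes(step) for step in residual_series[t0:]]
    if not candidate_sets:
        return set()
    return reduce(set.intersection, candidate_sets)


# Hypothetical residuals at five nodes over three hourly timestamps.
series = [
    {"2": 0.1, "5": 0.2, "9": -1.4, "12": 1.1, "20": 0.3},
    {"2": 0.2, "5": 0.1, "9": -1.5, "12": 0.3, "20": 1.2},
    {"2": 0.1, "5": -1.3, "9": -1.6, "12": 0.2, "20": 0.3},
]
print(localize(series))  # {'9'}
```

Each timestamp flags two nodes, but only node 9 is flagged at every timestamp, which illustrates how the intersection filters out transient peaks and keeps continuously leaking nodes.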

The dataset for evaluating the proposed approach is prepared based on the Leakage Diagnosis Benchmark (LeakDB) (Vrachimis & Kyriakou 2018), and modeling was done using the Hanoi Water Network benchmark shown in Figure 2. The Hanoi Water Network was chosen mainly because it is simple and includes loops. It has also been used in several similar previous studies (Casillas Ponce et al. 2014; Soldevila et al. 2016; Carreno-Alvarado et al. 2017; Vrachimis & Kyriakou 2018), which facilitates comparison, and LeakDB used this network to generate its datasets. All datasets and experiments in the proposed LDL system are implemented using the Python Water Network Tool for Resilience (WNTR) (Klise et al. 2017).

Figure 2

Hanoi Water Network (Fujiwara & Khang 1990).

Experimental setup

A total of five sets of experiments were conducted to evaluate the proposed approach. Based on their intended goal, the experiments are grouped into three major categories.

Experiment 1: leak detection and localization

In this experiment, the two phases of the proposed approach are implemented and compared with related previous works. The experiment has three sub-experiments.

Experiment 1.1: detection comparisons with the MNF approach

MNF analysis is one of the oldest and most customary ways of detecting leaks in a district metered area. In this work, a modified version of MNF proposed by Eliades & Polycarpou (2012) is used. LeakDB (Vrachimis & Kyriakou 2018) also uses this detection approach as a baseline for comparing newly proposed detection algorithms. A moving window of M consecutive days is defined in order to calculate the MNF. The MNF is the average night flow measured over a given night-time period, and a detection threshold is selected for the time window of M days (see Equation (2)).
(2)
Let l be the day a leakage is detected, i.e., the day on which the condition in Equation (2) holds for a detection threshold ε, which is selected off-line using historical measurements in order to minimize FPs and maximize true positives (TPs). In this experiment, M = 3 days and thresholds ε = 60, 70, 85, and 100 l/s are chosen.
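The exact form of Equation (2) did not survive in this text, so the sketch below assumes one common moving-window formulation: each day's night flow is the mean flow over a fixed set of night hours, and a leak is flagged on the first day whose night flow exceeds the minimum night flow of the preceding M days by more than the detection threshold. The night hours (02:00 to 04:00), the function names, and the leak magnitude are hypothetical.

```python
from statistics import mean


def daily_night_flow(hourly_flow, night_hours=(2, 3, 4)):
    """Mean flow (l/s) over the assumed night hours of each day.
    hourly_flow holds one sample per hour; incomplete trailing days are ignored."""
    days = len(hourly_flow) // 24
    return [mean(hourly_flow[24 * d + h] for h in night_hours) for d in range(days)]


def mnf_detect(hourly_flow, window_days=3, threshold=70.0, night_hours=(2, 3, 4)):
    """Return the first day whose night flow exceeds the minimum night flow
    of the preceding `window_days` days by more than `threshold` (l/s),
    or None if no day does (assumed moving-window MNF rule)."""
    night_flow = daily_night_flow(hourly_flow, night_hours)
    for day in range(window_days, len(night_flow)):
        baseline = min(night_flow[day - window_days:day])
        if night_flow[day] - baseline > threshold:
            return day
    return None


# Six days of hourly inflow; a hypothetical leak adds 80 l/s from day 4 onward.
flow = [100.0] * (24 * 4) + [180.0] * (24 * 2)
for eps in (60.0, 70.0, 85.0, 100.0):
    print(eps, mnf_detect(flow, window_days=3, threshold=eps))
```

In this toy case the 80 l/s leak is caught at thresholds of 60 and 70 l/s but missed at 85 and 100 l/s, mirroring the trade-off between sensitivity and false positives that the threshold choice controls.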

Experiment 1.2: detection comparisons with approaches based on raw and residual data

In this experiment, the proposed combined residual approach is compared with the use of raw and residual pressures. For comparison, we also computed detection results using only raw flow and residual flow data.

Experiment 1.3: single leak localization

This experiment compares the proposed approach with the similar hybrid approaches described in Carreno-Alvarado et al. (2017) and Soldevila et al. (2016). The approach of Soldevila et al. (2016) uses pressure residuals and a K-Nearest Neighbor classifier to localize leaks. Their approach was tested on the Hanoi water network by taking the pressure residuals of two nodes as input features and the 26 nodes as output classes. Because the implementation of Soldevila et al.'s approach was not available, we replicated it following the brief description provided in their paper.

Experiment 2: robustness to uncertainty

To account for noise in pressure measurements and inevitable demand variations, we conducted a second category of experiments that quantifies the robustness of the proposed approach. This is done by introducing demand uncertainty, measurement noise, and both uncertainties together into the model. In this experiment, 10 different leak sizes are created, with leak diameters ranging from 1 to 10 cm in 1 cm increments. During data generation, an uncertainty of 2 or 4% of the average daily base demand is added before simulation. The same amount of Gaussian noise is also added to the pressure and flow measurements to check the robustness of the proposed approach in the presence of noise.
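The uncertainty injection can be sketched as below. The text does not specify the distribution of the demand uncertainty or how the measurement noise is scaled, so this assumes zero-mean Gaussian perturbations whose standard deviation is the uncertainty level (0.02 or 0.04) times the average base demand for demands, and times the reading's magnitude for measurements. The function names and sample values are hypothetical.

```python
import random


def perturb_demands(base_demands, level, rng):
    """Demand uncertainty (assumed form): add zero-mean Gaussian noise whose
    standard deviation is `level` times the average base demand."""
    avg = sum(base_demands) / len(base_demands)
    return [d + rng.gauss(0.0, level * avg) for d in base_demands]


def add_measurement_noise(readings, level, rng):
    """Measurement noise (assumed form): add Gaussian noise with a standard
    deviation of `level` times the magnitude of each reading."""
    return [r + rng.gauss(0.0, level * abs(r)) for r in readings]


rng = random.Random(42)  # fixed seed so scenarios are reproducible
demands = perturb_demands([12.5, 8.0, 20.0, 15.5], level=0.02, rng=rng)
pressures = add_measurement_noise([48.2, 45.7, 51.3], level=0.04, rng=rng)
print(demands, pressures)
```

Perturbing demands before simulation and adding noise to the simulated readings afterwards keeps the two uncertainty sources separate, so each can be applied alone or together as in the "Noise", "Demand", and "Both" cases.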

Experiment 3: multiple leak localization

To evaluate the proposed localization approach when more than one leak occurs at a time, another set of experiments was conducted. In these experiments, four variants of multiple leaks, ranging from two to five simultaneous leaks, are simulated to assess the ability of the proposed approach to localize multiple leaks. For each variant, 20 scenarios are run, and the number of scenarios in which the proposed approach accurately localized all simulated leaks was recorded.

Evaluation metrics

To evaluate the proposed approach, we used the data labels provided with the datasets for calculating the standard classification metrics. The metrics are computed using a confusion matrix. The confusion matrix is composed of TP, FP, true negative (TN), and false negative (FN) values. TP refers to cases where an actual leak is detected as a leak. TN refers to cases where non-leaky data is identified as non-leaky. FP refers to cases where a leak is alarmed while there is no leak. FN refers to cases where a leak is identified as non-leaky. Commonly used classification metrics such as precision, recall, F-measure, and accuracy, which are described below, can be computed using the confusion matrix.

Precision is the ratio of the number of correctly identified leaks to the total number of leaks identified by the detection module. Precision is computed using Equation (3).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$
Recall, also known as the TP rate, measures the proportion of TPs to the actual total number of leaks. Recall is computed using Equation (4).
$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$
F-measure is the harmonic mean of precision and recall. F-measure is computed using Equation (5).
$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$
Accuracy is the rate of correct classifications, i.e., of both leaky and non-leaky samples, over the total number of classified samples; it is defined using Equation (6).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
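Equations (3) to (6) can be computed from confusion-matrix counts as in the sketch below. Returning `None` when a denominator is zero is an assumption made here as one way to represent undefined values; the function name and counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F-measure, and accuracy from confusion-matrix
    counts. A metric whose denominator is zero is returned as None."""

    def ratio(num, den):
        return num / den if den else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    if precision is None or recall is None:
        f_measure = None
    else:
        f_measure = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


print(classification_metrics(tp=45, fp=5, tn=40, fn=10))
print(classification_metrics(tp=0, fp=0, tn=50, fn=50))
```

The second call shows a detector that never raises an alarm: precision and F-measure are undefined even though accuracy is 50%, which is why accuracy alone can be misleading for imbalanced leak data.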

Experiment 1: leak detection and localization

Experiment 1.1: detection comparisons with the MNF approach

 Table 1 shows the precision result of different scenarios for using MNF analysis in comparison with the proposed approach (i.e., the use of combined residuals).

Table 1

Comparison of detection precisions (in percentage)

Detection method / leak diameter (m) | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 | 0.1
Proposed approach | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
MNF, threshold = 60 l/s | 25 | 40 | 40 | 50 | 50 | 50 | 50 | 50 | 50 | 50
MNF, threshold = 70 l/s | – | 33.3 | 66.6 | 66.6 | 100 | 100 | 100 | 100 | 100 | 100
MNF, threshold = 85 l/s | – | – | – | 66.6 | 100 | 100 | 100 | 100 | 100 | 100
MNF, threshold = 100 l/s | – | – | – | – | 66.7 | 100 | 100 | 100 | 100 | 100

As shown in Table 1, the proposed approach detects all leak sizes with 100% precision. The precision of MNF with a threshold of 60 l/s is at most 50%; these low scores are due to many FP reports, especially for small leak diameters ranging from 0.01 to 0.03 m. With a threshold of 70 l/s, large leakages (leak diameters of 0.05 to 0.1 m) were detected with 100% precision, whereas small leakages were detected with low precision or not at all. With higher thresholds (85 and 100 l/s), MNF can detect large leakages, but small leakages remain undetected.

The results in Table 1 suggest that the classifier-based approach is preferable to the MNF analysis approach. The proposed approach does not require a manually selected detection threshold, and different leak sizes are detected by the classifier automatically, mainly because leaky patterns are readily captured from the training dataset.

Experiment 1.2: detection comparisons with approaches based on raw and residual data

To see the effect of using residual data on leak detection, we compared the proposed approach with the approaches that use raw data. For proper comparison, the flow counterparts are also included in the experiment. Table 2 shows the result for leak sizes between 0.01 and 0.1 m.

Table 2

Comparison of input data type accuracy (in percentage)

Leak diameter (m) | Raw flow | Raw pressure | Raw combined | Residual flow | Residual pressure | Residual combined
0.01 | 66.7 | 66.3 | 66.5 | 66.7 | 100 | 100
0.02 | 66.7 | 66.4 | 66.6 | 66.7 | 100 | 100
0.03 | 66.7 | 66.1 | 66.3 | 66.7 | 100 | 100
0.04 | 66.7 | 70.1 | 69.6 | 100 | 100 | 100
0.05 | 66.7 | 70.5 | 70.7 | 100 | 100 | 100
0.06 | 66.7 | 66.1 | 66.3 | 100 | 100 | 100
0.07 | 66.7 | 69.3 | 73.1 | 100 | 100 | 100
0.08 | 100 | 99.4 | 99.6 | 100 | 100 | 100
0.09 | 93.4 | 74.1 | 72.7 | 100 | 100 | 100
0.1 | 66.7 | 66.1 | 66.3 | 100 | 100 | 100

From Table 2, the use of residuals generally outperforms the use of raw data in detection accuracy. In terms of accuracy, adding flow residual data to pressure residual data did not show any improvement. Where flow measurements are scarce, pressure residuals can be used in place of the combined residuals because they give similar results. From an application point of view, this offers a practical solution by reducing reliance on flow measurements.

Experiment 1.3: single leak localization

Carreno-Alvarado et al. (2017) used raw pressure data as input, with the output classes grouped into three zones; the loops in Figure 2 show the three classes. Their work is similar to that of Ferrandez-Gamot et al. (2015) and Soldevila et al. (2016), except that it uses raw pressure data instead of pressure residuals. Hence, we compared our approach with both types of approaches. For this comparison, the experiment uses 24 h of data with a sampling time of 1 h. Table 3 shows the results.

Table 3

Comparison of single leak localization accuracy (in percentage)

No. of input nodes | Proposed approach | Carreno-Alvarado et al. (2017) | Soldevila et al. (2016)
26 nodes | 96.1 | 94.4 | 95
2 nodes | 57.3 | 52.3 | 53.5

In Table 3, values in the first column represent the number of nodes used as input for the classifier. The results show that the proposed approach localizes single leak nodes better than the other two approaches. For all approaches, using the data from all nodes increases the localization accuracy. The result also shows that the use of raw pressure data gives the least accuracy compared to the others.

In Experiment 1, the proposed approach was first evaluated against the threshold-based MNF approach in detecting leaks with diameters varying from 1 to 10 cm. While the proposed approach identified leaks of all sizes, the MNF approach detected only relatively large leaks. The detection accuracy of the combined residuals was then compared with that of raw and residual flow and pressure data: residual-based inputs outperformed raw data, and pressure residuals alone gave detection results similar to the combined residuals (see Table 2). Finally, the proposed approach was compared against raw-based (Carreno-Alvarado et al. 2017) and residual-based (Soldevila et al. 2016) approaches for localizing single leaks, varying the number of nodes whose data was used as classifier input (see Table 3). The proposed approach outperformed both, and all approaches localized single leaks more accurately when data from more nodes was used as input.

Experiment 2: robustness to uncertainty

Experiment 2.1: detection

To test the robustness of leak detection, we conducted two experiments in the presence of 2 and 4% noise and demand uncertainty. For comparison, the same degrees of uncertainty were applied to the MNF detection approach described earlier. Tables 4 and 5 show the F-measure for 2 and 4% uncertainty, respectively. The results show that the proposed approach is robust to demand variation for large leakages (greater than 0.05 m). For small leakages of 0.01 m, however, the approach is not robust to uncertainty.

Table 4

Robustness to 2% uncertainty (F-measure in percentage)

Approaches | Leak diameter 0.01 m (Noise / Demand / Both) | 0.05 m (Noise / Demand / Both) | 0.09 m (Noise / Demand / Both)
Proposed approach | 3.1 / 43.5 / 41.5 | 72.5 / 97.2 / 65.3 | 95.6 / 100 / 95.9
MNF at threshold = 70 l/s | – 20 75 – 20 75 –

For MNF analysis, adding uncertainty gives poor results except for demand variation. The dashes in Tables 4 and 5 denote values for which the F-measure is undefined (zero denominator). Another noticeable fact is that as the degree of uncertainty increases (i.e., from 2 to 4%), the detection score decreases. Large leak sizes greater than 0.05 m, however, are not significantly affected by the increased measurement noise.

Table 5

Robustness to 4% uncertainty (F-measure in percentage)

Approach | Leak diameter 0.01 m (Noise / Demand / Both) | 0.05 m (Noise / Demand / Both) | 0.09 m (Noise / Demand / Both)
Proposed approach 1.2 – 39.2 100 3.6 97.2 100 23.8 
MNF at = 70  – 66.6 75 – 66.6 75 – 

The results of both experiments show that under perfectly modeled conditions, small leaks (diameters below 0.05 m) can be detected. As the degree of uncertainty increases, however, detection accuracy decreases. In contrast, large leaks, greater than 0.05 m, are resistant to noise and demand uncertainty. This is because the residuals produced by small leaks are very small and are easily masked by demand variation and noise. Added noise distorts the pattern in the residual space, which makes accurate classification difficult. When noise and demand uncertainty are combined, the results are worse still (see Table 5).

Experiment 2.2: localization

To assess the impact of uncertainty on localization, we conducted four groups of experiments, each with 20 scenarios. In each scenario, the leaking node was selected at random.

Table 6 shows the results of single leak localization under different uncertainties. In the ideal scenario without uncertainty, the proposed approach correctly localized 90% of the leaking nodes. When uncertainty was introduced, the number of correctly identified leaking nodes decreased. An important observation from this experiment is that localization fails when the leaking node is close to the reservoir (i.e., nodes 2 and 3). Because the leaking nodes were selected randomly during data generation, some scenarios placed the leak at node 2 or 3, and the approach failed to locate the leak in every such case. This is mainly because these nodes are so close to the reservoir that the leak causes only a negligible pressure change, which the approach cannot detect.

Table 6

Precision of single leak localization (each experiment has 20 scenarios)

Experiment | Located | Mislocated | Precision (%)
Only leakage | 18 | 2 | 90
Leakage + 4% noise | 14 | 6 | 70
Leakage + 4% demand | 15 | 5 | 75
Leakage + 4% demand + 4% noise | 13 | 7 | 65
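Because every experiment in Table 6 has 20 scenarios, each row follows from its located count: mislocated equals 20 minus located, and precision is located divided by 20. A short arithmetic check using only the counts reported in Table 6:

```python
SCENARIOS = 20  # each single-leak experiment has 20 scenarios

located = {
    "Only leakage": 18,
    "Leakage + 4% noise": 14,
    "Leakage + 4% demand": 15,
    "Leakage + 4% demand + 4% noise": 13,
}

# Mislocated = scenarios - located; precision = located / scenarios, in percent.
summary = {name: (SCENARIOS - n, 100 * n / SCENARIOS) for name, n in located.items()}
for name, (mislocated, precision) in summary.items():
    print(f"{name}: {mislocated} mislocated, {precision:.0f}% precision")
```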

In a single-leak scenario, the localization phase of the proposed approach returns the node with the global maximum as the solution. Because neighboring nodes can show residual values nearly as high as the actual leaking node, it is preferable to consider those neighbors as well for proper localization.
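The idea of reporting the global maximum together with close-scoring neighbors can be sketched as follows. This is a minimal illustration, assuming each node already has a nonnegative score (for example, a residual magnitude) and that the network adjacency is known; the `ratio` cutoff, the function name and the toy topology are hypothetical and are not taken from the proposed approach.

```python
from typing import Dict, List, Set

def candidate_nodes(scores: Dict[str, float],
                    neighbors: Dict[str, Set[str]],
                    ratio: float = 0.9) -> List[str]:
    """Return the global-maximum node plus neighbors scoring >= ratio * max.

    Assumes nonnegative scores; `ratio` is a hypothetical tuning parameter.
    """
    best = max(scores, key=scores.__getitem__)
    cutoff = ratio * scores[best]
    close = sorted(n for n in neighbors.get(best, set())
                   if scores.get(n, float("-inf")) >= cutoff)
    return [best] + close

# Toy example: node "13" scores highest, and its neighbor "12" is nearly as high.
scores = {"12": 0.95, "13": 1.0, "14": 0.5}
neighbors = {"13": {"12", "14"}}
candidates = candidate_nodes(scores, neighbors)
```

Here `candidates` is `['13', '12']`: node 14 is a neighbor but falls below the cutoff.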

Experiment 2 assessed the effect of demand uncertainty and noise on the overall performance of the proposed LDL approach. The results in Tables 4 and 5 confirm that the best performance is obtained with the proposed approach. With noisy measurements, its accuracy is higher for relatively larger leaks, and it also outperforms the MNF approach for relatively smaller leaks. Under demand uncertainty, the most commonly observed real-world uncertainty, the proposed approach was more resilient for larger leaks and showed overall robustness. For both noise and demand uncertainty, its resilience decreased as uncertainty increased from 2 to 4%, but it remained more robust than MNF analysis in leak detection. In the localization experiments, nodes close to the water supply source did not show pressure discrepancies large enough for the approach to recognize. For nodes located elsewhere, the approach performed well under uncertainty.

Experiment 3: multiple leak localization

In this experiment, multiple leaks are introduced at different nodes simultaneously. Four variants (two, three, four and five simultaneous leaks) are tested, each with 20 scenarios.
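Random placement of simultaneous leaks can be sketched with Python's standard library. The sketch assumes the usual Hanoi numbering, in which node 1 is the reservoir and the junctions are nodes 2 to 32; the function name and seeds are hypothetical. Because nodes are drawn uniformly, nodes 2 and 3 next to the reservoir can appear in a scenario, which is the situation behind the missed leaks discussed below.

```python
import random
from typing import List

# Assumption: Hanoi node 1 is the reservoir and junctions are numbered 2..32.
HANOI_JUNCTIONS = [str(i) for i in range(2, 33)]

def leak_scenarios(n_leaks: int, n_scenarios: int = 20,
                   seed: int = 0) -> List[List[str]]:
    """Draw n_leaks distinct leaking junctions per scenario, uniformly at random."""
    rng = random.Random(seed)
    return [sorted(rng.sample(HANOI_JUNCTIONS, n_leaks), key=int)
            for _ in range(n_scenarios)]

# The four multiple-leak variants, each with 20 scenarios.
variants = {k: leak_scenarios(k, seed=k) for k in (2, 3, 4, 5)}
```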

Table 7 shows the results of the multiple leak localization experiments. The overall accuracy in Table 7 is the weighted average over the 20 scenarios for each number of leaks, i.e., the total number of localized leaks divided by the total number of leaks simulated. In more than half of the scenarios, the proposed approach localizes at least half of the leaks present in the system. For example, with two leaks, the approach identified both in 12 of the 20 scenarios and one of the two in the remaining eight. With five leaks, it localized at least three in 12 of the 20 scenarios. A closer look at the missed leaks shows that they are usually close to the reservoir, i.e., at nodes 2 and 3.
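The weighted average can be checked against the two-leak case reported above (both leaks found in 12 scenarios, one leak found in 8). A minimal sketch, where the function name and the counts dictionary layout are illustrative choices:

```python
from typing import Dict

def overall_accuracy(n_leaks: int, counts: Dict[int, int]) -> float:
    """Weighted accuracy (%); counts maps leaks localized -> number of scenarios."""
    n_scenarios = sum(counts.values())
    localized = sum(k * c for k, c in counts.items())
    return 100.0 * localized / (n_leaks * n_scenarios)

# Two simultaneous leaks: both found in 12 scenarios, one found in the other 8.
two_leaks = overall_accuracy(2, {2: 12, 1: 8})
```

This gives (12 × 2 + 8 × 1) / (20 × 2) = 32/40, i.e., 80%, the value reported for two leaks in Table 7.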

Table 7

Localization of multiple leaks

Leaks present | Localized leaks (number of scenarios): None / One / Two / Three / Four / Five | Overall accuracy (%)
Two – 12    80 
Three – 11   65 
Four – 12  2   67.5 
Five – –  3  – 55 

Experiment 3 demonstrated that, by separating the detection and classification phases, the proposed approach can localize multiple leaks. Prior mixed approaches are unable to locate multiple leaks with their classifiers because they assign only one class at a time. Across the four variants tested, the results show that at least two of three simultaneous leaks in the network can be located. Localization accuracy, however, generally decreases as the number of leaks increases. The number of leaks predicted in each scenario is satisfactorily close to the number simulated, and the discrepancy is mainly due to the random selection of leaking nodes, which includes nodes close to the reservoir.

This research proposed a leak detection and localization approach that uses hydraulic modeling and classification for detection and a statistical approach for localization. Previous hybrid approaches lack the ability to localize multiple leaks, and this research attempts to fill that gap. To this end, experiments on the Hanoi Water Network were grouped into three major categories: comparing the output of the proposed approach with previous works, evaluating its robustness to uncertainty, and evaluating its capacity to localize multiple leaks. The results show that the combined residual approach is robust to demand uncertainty, outperforms the separate raw- and residual-based approaches in localizing single leaks, and can localize multiple leaks occurring in the same time period.

The proposed approach was tested on a model based on a realistic benchmark dataset. This was considered a viable option because access to real-time field data was limited in the city where this study was conducted. For wider uptake, it is recommended to test the approach on real datasets from water distribution companies and other large academic water networks. The final output of the localization phase is a set of candidate leaking nodes (i.e., leaking nodes and their neighbors), which could be further refined using search space reduction techniques such as optimization and statistical process control methods.

The authors deeply thank Dr Agizew Nigussie from the School of Civil and Environmental Engineering, Addis Ababa Institute of Technology, Addis Ababa University and Dr Assefa M. Melesse from Earth System Sciences, Florida International University for their invaluable comments and thorough review of this manuscript.

All relevant data are included in the paper or its Supplementary Information.

Aksela, K., Aksela, M. & Vahala, R. 2009 Leakage detection in a real distribution network using a SOM. Urban Water Journal 6 (4), 279–289.
Anjana, G., Kumar, K. S., Kumar, M. M. & Amrutur, B. 2015 A particle filter based leak detection technique for water distribution systems. Procedia Engineering 119, 28–34.
Bell, C. 2016 The World Bank and the International Water Association to Establish a Partnership to Reduce Water Losses.
Bhagat, S. K., Tiyasha, Welde, W., Tesfaye, O., Tung, T. M., Al-Ansari, N., Salih, S. Q. & Yaseen, Z. M. 2019 Evaluating physical and fiscal water leakage in water distribution system. Water 11 (10), 2091.
Carreno-Alvarado, E. P., Reynoso-Meza, G., Montalvo, I. & Izquierdo, J. 2017 A comparison of machine learning classifiers for leak detection and isolation in urban networks. In: Congress on Numerical Methods in Engineering CMN.
Casillas Ponce, M. V., Garza Castañón, L. E. & Cayuela, V. P. 2014 Model-based leak detection and location in water distribution networks considering an extended horizon analysis of pressure sensitivities. Journal of Hydroinformatics 16 (3), 649–670.
Dighade, R., Kadu, M. & Pande, A. 2014 Challenges in water loss management of water distribution systems in developing countries. International Journal of Innovative Research in Science, Engineering and Technology 3 (6), 13838–13846.
Eliades, D. & Polycarpou, M. M. 2012 Leakage fault detection in district metered areas of water distribution systems. Journal of Hydroinformatics 14 (4), 992–1005.
Ferrandez-Gamot, L., Busson, P., Blesa, J., Tornil-Sin, S., Puig, V., Duviella, E. & Soldevila, A. 2015 Leak localization in water distribution networks using pressure residuals and classifiers. IFAC-PapersOnLine 48 (21), 220–225.
Ferrante, M., Massari, C., Todini, E., Brunone, B. & Meniconi, S. 2013 Experimental investigation of leak hydraulics. Journal of Hydroinformatics 15 (3), 666–675.
Khalilabad, N. M., Mollazadeh, M., Akbarpour, A. & Khorashadizadeh, S. 2018 Leak detection in water distribution system using non-linear Kalman filter. International Journal of Optimization in Civil Engineering 8 (2), 169–180.
Klise, K. A., Hart, D., Moriarty, D., Bynum, M. L., Murray, R., Burkhardt, J. & Haxton, T. 2017 Water Network Tool for Resilience (WNTR) User Manual.
Lah, A. A. A., Dziyauddin, R. A. & Yusoff, N. M. 2018 Localization techniques for water pipeline leakages: a review. In: 2018 2nd International Conference on Telematics and Future Generation Networks (TAFGEN), IEEE, pp. 49–54.
Li, J., Wu, Y. & Lu, C. 2020 Pipeline leak detection using the multiple signal classification-like method. Journal of Hydroinformatics 22 (5), 1321–1337.
Mashford, J., De Silva, D., Burn, S. & Marney, D. 2012 Leak detection in simulated water pipe networks using SVM. Applied Artificial Intelligence 26 (5), 429–444.
Mounce, S. 2013 A comparative study of artificial neural network architectures for time series prediction of water distribution system flow data. In: Machine Learning in Water Systems – AISB Convention.
Mounce, S. R., Mounce, R. B. & Boxall, J. B. 2011 Novelty detection for time series data analysis in water distribution systems using support vector machines. Journal of Hydroinformatics 13 (4), 672–686.
Mounce, S. R., Mounce, R. B., Jackson, T., Austin, J. & Boxall, J. B. 2014 Pattern matching and associative artificial neural networks for water distribution system time series data analysis. Journal of Hydroinformatics 16 (3), 617–632.
Okeya, O. I. 2018 Detection and localisation of pipe bursts in a district metered area using an online hydraulic model. Doctoral Thesis, University of Exeter, UK.
Pérez, R., Puig, V., Pascual, J., Peralta, A., Landeros, E. & Jordanas, L. 2009 Pressure sensor distribution for leak detection in Barcelona water distribution network. Water Science and Technology: Water Supply 9 (6), 715–721.
Pérez, R., Puig, V., Pascual, J., Quevedo, J., Landeros, E. & Peralta, A. 2010 Leakage isolation using pressure sensitivity analysis in water distribution networks: application to the Barcelona case study. IFAC Proceedings 43 (8), 578–584.
Pérez, R., Sanz, G., Puig, V., Quevedo, J., Escofet, M. A. C., Nejjari, F., Meseguer, J., Cembrano, G., Tur, J. M. M. & Sarrate, R. 2014 Leak localization in water networks: a model-based methodology using pressure sensors applied to a real network in Barcelona. IEEE Control Systems Magazine 34 (4), 24–36.
Pudar, R. S. & Liggett, J. A. 1992 Leaks in pipe networks. Journal of Hydraulic Engineering 118 (7), 1031–1046.
Ravichandran, T., Gavahi, K., Ponnambalam, K., Burtea, V. & Mousavi, S. J. 2021 Ensemble-based machine learning approach for improved leak detection in water mains. Journal of Hydroinformatics 23 (2), 307–323.
Romano, M., Kapelan, Z. & Savic, D. 2009 Bayesian-based online burst detection in water distribution systems. In: Integrating Water Systems (J. Boxall & C. Maksimovic, eds.). Taylor and Francis/CRC Press, London, pp. 331–337.
scikit-learn developers 2019 scikit-learn user guide. scikit-learn.
Soldevila, A., Blesa, J., Tornil-Sin, S., Duviella, E., Fernandez-Canti, R. M. & Puig, V. 2016 Leak localization in water distribution networks using a mixed model-based/data-driven approach. Control Engineering Practice 55, 162–173.
Soldevila, A., Fernandez-Canti, R. M., Blesa, J., Tornil-Sin, S. & Puig, V. 2017 Leak localization in water distribution networks using Bayesian classifiers. Journal of Process Control 55, 1–9.
Vrachimis, S. & Kyriakou, M. 2018 LeakDB: a benchmark dataset for leakage diagnosis in water distribution networks. In: Proc. of 1st International WDSA/CCWI Joint Conference 1 (146).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/)