Abstract
A significant percentage of treated water is lost through leakage in water distribution systems. State-of-the-art leak detection and localization schemes use a hybrid of hydraulic modeling and data-driven techniques. Most of these works, however, focus on detecting and localizing a single leak. In this research, we propose to use combined pressure and flow residual data to detect and localize multiple leaks. The proposed approach has two phases: detection and localization. The detection phase uses a hydraulic model to generate combined pressure and flow residuals, which are used to train a classification algorithm that identifies leaks. The localization phase analyzes the pattern of the residuals isolated as leaky to localize multiple leaks. To evaluate the performance of the proposed approach, we conducted experiments using the Hanoi Water Network benchmark and a dataset produced following the dataset preparation procedure of the LeakDB benchmark. The results for a well-calibrated hydraulic model show that leak detection is 100% accurate and localization is 90% accurate. The approach outperforms minimum night flow (MNF) in detecting leaks and raw- and residual-based methods in localizing them. The proposed approach performed relatively well when demand and noise uncertainty were introduced. The proposed localization approach is also able to locate two to four simultaneous leaks.
HIGHLIGHTS
A water leak detection and localization (LDL) approach based on a hybrid of hydraulic modeling, classification, and statistical analysis is proposed.
Combined residual data of pressure and flow are used to enhance LDL.
By separating the detection and localization phases, multiple leaks are localized.
INTRODUCTION
The urgency of managing leakage in pipe networks has grown in recent years. The causes are water shortages from recent droughts and increasing demand, along with environmental, social, and political pressures (Dighade et al. 2014). Leak detection is particularly significant because many water-scarce and arid countries have limited options for developing new water resources. Efficient management of water resources is widely recognized as a growing necessity. Even so, non-revenue water due to physical losses such as leakage remains excessive in many cities. A World Bank study estimates global physical water losses at about 32 billion cubic meters each year, half of which occurs in developing countries (Bhagat et al. 2019).
One of the greatest challenges in a water distribution system (WDS) is the occurrence of leaks (Li et al. 2020). Reducing leakage volume not only saves water to supply a growing population. It can also avoid, or at least delay, costly hydraulic works to expand the distribution system, and it reduces energy and environmental costs (Ferrante et al. 2013). Revenue lost to leakage is estimated at around USD 2.9 billion every year in developing countries alone (Bell 2016). Excessive leakage frequently causes intermittent supply. In that situation the urban poor often suffer most: they cannot afford proper storage facilities and pumps, and often have to buy water from vendors during non-supply hours. Reducing physical losses makes more water available and enables water utilities to increase coverage, improve service quality, and reduce expansion and distribution costs.
Water leak detection and localization (LDL) is one of the main activities in the effort to reduce leakage volume. Researchers have proposed various software-based methods to help detect and localize leaks as early as possible. Software-based methods rely on algorithms and data analysis tools to find anomalies, such as leaks, in hydraulic data patterns. In these methods, hydraulic sensors measure time series of vital parameters such as pressure, flow rate, and temperature. The common principle of software-based methods is to predict the system state and compare it with the currently observed state. Abnormal events such as leakage and contamination show up as differences between the two. Software-based approaches can be classified into hydraulic-modeling-based and data-driven approaches.
The use of hydraulic modeling in water LDL is customary in many studies (Lah et al. 2018). Advances in hydraulic modeling tools make it easier to predict the state of a WDS under different conditions. A well-calibrated hydraulic model of the WDS serves as a reference against which subsequent time-step hydraulic data are compared for leakage detection. In model-based WDS analysis, the quantity and quality of the available data are the main factors in building a well-calibrated model, which in turn yields better LDL (Kirstein et al. 2021). Leak detection with a hydraulic model mostly follows one of two approaches. The first is residual analysis between model predictions and WDS observations (Pudar & Liggett 1992; Pérez et al. 2009, 2010, 2014; Casillas Ponce et al. 2014; Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017). The second is evaluation of patterns in WDS state estimates (Jung & Lansey 2014; Anjana et al. 2015; Jung et al. 2015; Khalilabad et al. 2018). This paper focuses on the residual analysis approach.
Data-driven approaches extract meaningful information from time-series data. They use various statistical algorithms (Romano et al. 2009, 2017; Jung & Lansey 2014) and artificial intelligence algorithms (Mounce & Machell 2006; Aksela et al. 2009; Mounce et al. 2011, 2014; Ravichandran et al. 2021). These approaches have shown promising results in pattern recognition and anomaly detection (Mounce 2013; Mounce et al. 2014). However, they fail to capture the dynamic nature of a WDS and are not robust to boundary changes such as changes in pump and valve operation (Wu & Liu 2017). These characteristics are usually handled with a calibrated hydraulic model (Okeya 2018). Hence, the two approaches are combined to obtain better LDL results (Mashford et al. 2012; Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017). Localizing multiple leaks, however, remains an open challenge, as prior work focuses only on single leak localization.
Building on this hybrid approach, this study proposes a leakage detection and localization approach. The calibrated hydraulic model generates leaky and non-leaky residuals, which are fed to a statistical binary classifier for training. After training, measured pressure and flow values are compared with the baseline non-leaky values to generate new residuals. The classifier then separates these into leaky and non-leaky residuals. Once a leak is detected, the leaky residuals are further analyzed to find the leak locations. Nodes whose residuals are outliers in the residual space over multiple time steps are considered leaking nodes.
To assess the performance of the proposed approach, we conducted five sets of experiments. They use the Hanoi Water Network benchmark (Fujiwara & Khang 1990) and a realistic dataset produced following the dataset preparation procedure of the LeakDB benchmark. With a well-calibrated hydraulic model, the accuracy of the proposed approach was compared against state-of-the-art LDL approaches. These are minimum night flow (MNF) (Vrachimis & Kyriakou 2018) and hybrid approaches (Soldevila et al. 2016; Carreno-Alvarado et al. 2017). Its robustness to noise and demand uncertainty was also evaluated. The results show that, for a well-calibrated hydraulic model, leak detection is 100% accurate and localization is 90% accurate. The comparison with state-of-the-art works shows that the proposed method outperforms MNF in detecting leaks and raw- and residual-based methods in localizing them. The proposed method performed relatively well when demand and noise uncertainty were introduced, with leaks of 5 cm or larger showing resilience to both. The localization method effectively located two to four simultaneous leaks.
The main contribution of this research is the use of combined pressure and flow residual data to detect and localize multiple leaks. State-of-the-art works focus on localizing a single leak. In addition, these works use the hybrid approach for localization only, not for detection, which makes it challenging to localize multiple leaks from a single test sample. The proposed approach also uses a hybrid approach, in line with the state of the art. Unlike prior work, however, it detects and localizes multiple leaks using pressure and flow residuals.
METHODOLOGY
To localize multiple leaks, we propose a method based on residual analysis. The proposed approach has two main phases: detection and localization. In the detection phase, hydraulic modeling generates a dataset containing each node's pressure and flow residuals, which represent the system state. This dataset is used to build a classifier model, which then classifies new observed residual data as leaky or non-leaky. If the new residual data are classified as leaky, the localization phase is triggered.
The localization phase uses the residual data to localize the leaking nodes. The rationale for using residual data for localization is that a leaking node has the highest peak residual value among its neighboring nodes. At a particular timestamp, a water distribution network could have multiple leaks, and the residual values for leaks at different locations and times can also differ. Hence, in this study, we translate leak localization into an extremum search at a given timestamp: local maxima for multiple leak scenarios and the global maximum for a single leak scenario. To identify the maximum residual values, we propose a dynamic threshold, τ, that depends on the current residual data. τ is set to the upper quartile of the local or global residual dataset. All nodes with residual values at or above τ are considered potential leaking nodes. Finally, the final candidate leaking node(s) are the intersection of the potential leaking nodes at different timestamps within a given period.
The general overview of the approach is shown in Figure 1. Details of the proposed approach along with the description of residual analysis are provided below.
Residual analysis
In previous studies, pressure residuals are compared with a predefined threshold value to detect leak events. The threshold is calculated by considering the measurement noise and model uncertainty of the given WDS. If the residuals exceed this threshold, a leak is detected and localization is triggered. The choice of threshold affects the quality of detection and localization. A threshold set too high causes leak events to be missed, while one set too low results in many false positives (FPs).
In this research, we use a classification algorithm to detect leaks and residual analysis to localize them. The residual analysis identifies candidate leaking nodes with a dynamic threshold value, τ, that depends on the current residual values. In this study, τ is set to the upper quartile of the current residual values.
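The residual and dynamic-threshold rules above can be written compactly as follows. This is a sketch in our own notation. Equation (1) is not reproduced in this section, so two things are assumptions: the sign convention (measured value minus leak-free model prediction), and the symbols y_i(k), ŷ_i(k), 𝒩, Q_0.75 and 𝒞.

```latex
% Assumed notation (not from the paper): y_i(k) measured or leak-simulated value at node i,
% time step k; \hat{y}_i(k) leak-free model prediction; \mathcal{N} set of monitored nodes;
% Q_{0.75} the upper quartile; t_0 detection time step; T_f end of the analysis period.
\begin{aligned}
r_i(k) &= y_i(k) - \hat{y}_i(k), \qquad i \in \mathcal{N} \\
\tau(k) &= Q_{0.75}\bigl(\{\, |r_i(k)| : i \in \mathcal{N} \,\}\bigr) \\
\mathcal{C}(k) &= \{\, i \in \mathcal{N} : |r_i(k)| \ge \tau(k) \,\} \\
\mathcal{C}_{\mathrm{final}} &= \bigcap_{k = t_0}^{T_f} \mathcal{C}(k)
\end{aligned}
```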
Detection phase
This phase uses both hydraulic modeling and a data-driven approach to detect leaks. Hydraulic modeling generates residuals that represent the system state under leaky and non-leaky scenarios. The residuals are computed using Equation (1). The generated leaky and non-leaky residuals are then fed to the classifier to build a model. The major difference from previous works is that the proposed approach uses a combination of pressure and flow residuals for training and testing. We conjecture that combining the two residuals improves leak detection accuracy. The detection phase consists of three steps: generating input data, selecting and training the classifier, and detecting leaky and non-leaky residuals.
Data generation
Hydraulic modeling software generates hydraulic data representative of different leak sizes, and these data are used to train the classifiers. To make the data closer to reality, modeling, demand, and measurement uncertainties are taken into account during data generation. This is represented by vector v in Figure 1. After the hydraulic data are generated, flow and pressure residuals are calculated using Equation (1). The two residuals are used as features of the classifier input data.
The data generation step shown on the left side of Figure 1 is adopted from previous works (Pudar & Liggett 1992; Pérez et al. 2009, 2014; Casillas Ponce et al. 2014; Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017). The symbols in the figure are input values provided to the model. The demand patterns, d1, d2, …, dn, are adopted from the LeakDB benchmark dataset. The boundary conditions, ci, do not contain pump and valve characteristics because the selected model, the Hanoi Water Network, is a relatively simple network.
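The residual computation in this step can be sketched in plain Python. It computes the flow and pressure residuals of each sensor and concatenates them into one combined feature row per time step. Several details are assumptions for illustration: the sign convention (measured minus baseline), the dictionary-of-series input format, and the sensor IDs. In a WNTR workflow, such series could come from two simulation runs, one with leaks and one without.

```python
from typing import Dict, List, Sequence


def residuals(measured: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Residual series: measured (or leak-simulated) values minus leak-free baseline values."""
    if len(measured) != len(baseline):
        raise ValueError("measured and baseline series must have the same length")
    return [m - b for m, b in zip(measured, baseline)]


def combined_features(pressure: Dict[str, Sequence[float]],
                      pressure_base: Dict[str, Sequence[float]],
                      flow: Dict[str, Sequence[float]],
                      flow_base: Dict[str, Sequence[float]]) -> List[List[float]]:
    """One feature row per time step: pressure residuals (sorted sensor IDs), then flow residuals."""
    p_res = {sid: residuals(pressure[sid], pressure_base[sid]) for sid in sorted(pressure)}
    q_res = {sid: residuals(flow[sid], flow_base[sid]) for sid in sorted(flow)}
    n_steps = len(next(iter(p_res.values())))
    return [[p_res[sid][k] for sid in p_res] + [q_res[sid][k] for sid in q_res]
            for k in range(n_steps)]


# Hypothetical two-step example: pressure sensors "2" and "3", flow sensor "1".
rows = combined_features(
    pressure={"2": [100.0, 99.0], "3": [98.0, 97.5]},
    pressure_base={"2": [100.0, 100.0], "3": [98.0, 98.0]},
    flow={"1": [5.0, 5.5]},
    flow_base={"1": [5.0, 5.0]},
)
print(rows)  # [[0.0, 0.0, 0.0], [-1.0, -0.5, 0.5]]
```

In the second row the leak lowers both pressures and raises the flow, so the pressure and flow residuals have opposite signs.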
Training the classifier
The second step, after generating representative leak residuals, is training the classifier. The classifier used for detection is binary: the training residuals are labeled with two classes, 0 (no leak) and 1 (leak). The pressure and flow residuals for different leak magnitudes and locations are combined in a single dataset. This dataset is then split for training and testing: 50% of the generated residuals are used for training and the remainder for testing. In this research, a support vector machine (SVM) (scikit-learn developers 2019) is used for classification. SVM was selected because it can handle high-dimensional inputs. Owing to its robustness, it can also cope with noisy residuals that result from demand variation.
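The training and detection steps can be sketched with scikit-learn's `SVC` and `train_test_split`, using the 50/50 split stated above. The following are assumptions for illustration, not settings reported in this study: the RBF kernel (scikit-learn's default), the stratified split, the helper names `train_leak_detector` and `is_leaky`, and the synthetic residuals.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_leak_detector(X, y, seed=0):
    """Fit a binary SVM (0 = no leak, 1 = leak) on half the data; return (model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = SVC(kernel="rbf")  # assumed kernel (scikit-learn's default); not reported in the paper
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)


def is_leaky(clf, residual_row):
    """Classify one new combined residual vector; True would trigger localization."""
    x = np.asarray(residual_row, dtype=float).reshape(1, -1)
    return bool(clf.predict(x)[0] == 1)


# Synthetic illustration only: columns 0-2 are pressure residuals, column 3 a flow residual.
rng = np.random.default_rng(42)
no_leak = rng.normal(0.0, 0.1, size=(100, 4))
# Leaky rows: pressures drop and inflow rises.
leak = rng.normal(0.0, 0.1, size=(100, 4)) + np.array([-1.0, -1.0, -1.0, 1.0])
X = np.vstack([no_leak, leak])
y = np.array([0] * 100 + [1] * 100)
model, test_accuracy = train_leak_detector(X, y)
print(f"test accuracy: {test_accuracy:.2f}")
```

In a real workflow, the feature rows would be the combined residuals from the data generation step rather than random numbers.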
Detection
After training, the classifier model is given new observed residual data. These are the difference between the field-measured pressure and flow data and the baseline non-leaky pressure and flow data, computed with the same procedure used to generate the training residuals. The classifier model built during training then decides whether the given residual is leaky or non-leaky. If the residual is found to be leaky, the localization phase is triggered. In our experiments, the testing dataset was used to simulate real observational data.
Localization phase
The localization phase uses a statistical method to find the leaking node after a leak is detected in the system. A leak increases the incoming flow and reduces the pressure downstream. The effect of a leak at any node can be seen at other nodes, especially in looped networks such as the Hanoi network. However, the effect is not the same at every node. The input for the localization phase is the timestamp at which the leak is first detected. The phase follows the procedure below.
First, the absolute value of each residual from the detection timestamp onward is taken. A leak causes pressures to drop and flow to rise, so their residuals have opposite signs. Taking absolute values places both deviations on the same positive scale.
After this step, all residuals are non-negative, but their magnitudes differ. Nodes near the leaking node show larger residual values than other nodes, and the leaking node itself shows the highest peak value among its neighbors. Therefore, finding a leaking node becomes a search for a local maximum in multiple leak scenarios and for the global maximum in a single leak scenario.
A node is considered a potential leaking node when its pressure residual value is greater than or equal to the threshold τ. In our experiments, τ is calculated as the upper quartile (75th percentile) of the residual data at a given timestamp, which captures the network situation at that specific time. Because τ is recalculated from the residual data at each timestamp, its value may differ between timestamps. The final candidate leaking nodes are those that appear in the list of potential leaking nodes at every timestamp in a given time period. In other words, they are the intersection of those lists. Taking the intersection over a time period helps identify continuously leaking nodes. The resulting candidate leaking nodes require the attention of the water authority. Algorithm 1 gives the pseudo-code for localization.
Algorithm 1 Localization Pseudo-code
t0 = starting timestamp (first detection); Tf = end of the analysis period
c_t // candidate set for time step t
for t = t0, …, Tf do
    τ = upper quartile (75th percentile) of the absolute residuals at t
    c_t = ∅
    for all nodes do
        if node's absolute residual ≥ τ then
            add node to c_t
        end if
    end for
end for
Final Candidate = c_t0 ∩ c_t0+1 ∩ … ∩ c_Tf
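Algorithm 1 can also be expressed as a short, runnable Python function. This is a sketch under stated assumptions. Residuals are supplied as one dictionary per time step, mapping a (hypothetical) node ID to its signed residual. The upper quartile is computed with `statistics.quantiles` using the `"inclusive"` method, which matches NumPy's default linear-interpolation percentile. Other quartile definitions can shift τ slightly.

```python
from statistics import quantiles
from typing import Dict, Mapping, Optional, Sequence, Set


def upper_quartile(values: Sequence[float]) -> float:
    """75th percentile; 'inclusive' matches numpy.percentile's default (linear) method.

    statistics.quantiles needs at least two values on Python < 3.13.
    """
    return quantiles(values, n=4, method="inclusive")[2]


def localize(residuals_per_step: Sequence[Mapping[str, float]]) -> Set[str]:
    """Algorithm 1: intersect the at-or-above-threshold nodes of every time step t0..Tf.

    residuals_per_step[t] maps a node ID to its signed residual at time step t.
    """
    final: Optional[Set[str]] = None
    for step in residuals_per_step:
        # Absolute values put pressure drops and flow increases on the same scale.
        abs_res: Dict[str, float] = {node: abs(r) for node, r in step.items()}
        tau = upper_quartile(list(abs_res.values()))  # dynamic threshold for this step
        candidates = {node for node, r in abs_res.items() if r >= tau}
        # Keep only nodes that stay at or above the threshold at every step.
        final = candidates if final is None else final & candidates
    return final if final is not None else set()


# Hypothetical residuals for eight nodes over two time steps.
steps = [
    {"1": 0.0, "2": -0.1, "3": -0.2, "4": -0.3, "5": -2.0, "6": -0.4, "7": -1.5, "8": -0.1},
    {"1": 0.0, "2": -0.2, "3": -0.1, "4": -0.3, "5": -1.8, "6": -0.9, "7": -0.2, "8": -0.1},
]
print(localize(steps))  # {'5'}
```

Node 7 exceeds τ only at the first step and node 6 only at the second. Only node 5 is at or above τ at both steps, so it survives the intersection.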
EXPERIMENT
The dataset for evaluating the proposed approach is prepared based on the Leakage Diagnosis Benchmark (LeakDB) (Vrachimis & Kyriakou 2018). Modeling was done using the Hanoi Water Network benchmark shown in Figure 2. The Hanoi Water Network was chosen mainly because it is simple and includes loops. It has been used in several similar studies (Casillas Ponce et al. 2014; Soldevila et al. 2016; Carreno-Alvarado et al. 2017; Vrachimis & Kyriakou 2018), which facilitates comparison, and LeakDB also uses this network to generate its datasets. All datasets and experiments for the proposed LDL system were implemented using the Water Network Tool for Resilience (WNTR), a Python package (Klise et al. 2017).
Experimental setup
A total of five sets of experiments were conducted to evaluate the proposed approach. Based on their intended goal, the experiments are grouped into three major categories.
Experiment 1: leak detection and localization
In this experiment, the two phases of the proposed approach are implemented and compared with related previous works. The experiment has three sub-experiments.
Experiment 1.1: detection comparisons with the MNF approach
Experiment 1.2: detection comparisons with approaches based on raw and residual data
In this experiment, the proposed combined residual approach is compared with approaches that use raw or residual pressures. For comparison, we also computed detection results using only raw flow data and only residual flow data.
Experiment 1.3: single leak localization
This experiment compares the proposed approach with the similar hybrid approaches described in Carreno-Alvarado et al. (2017) and Soldevila et al. (2016). The approach of Soldevila et al. (2016) localizes leaks using pressure residuals and a multi-class k-nearest neighbor (KNN) classifier. Their approach was tested on the Hanoi water network, with the pressure residuals of two nodes as input features and the 26 nodes as output classes. Because the implementation of Soldevila et al.’s approach was not available, we replicated it following the brief description provided in their paper.
Experiment 2: robustness to uncertainty
A second category of experiments quantifies the robustness of the proposed approach to measurement noise in pressure and to inevitable demand variations. This is done by introducing demand uncertainty, measurement noise, and both uncertainties together into the model. In this experiment, 10 leak sizes are created, with diameters from 1 to 10 cm in 1 cm increments. During data generation, an uncertainty of 2 or 4% of the average daily base demand is added before simulation. Gaussian noise of the same magnitude (2 or 4%) is also added to the pressure and flow measurements to check the robustness of the proposed approach in the presence of noise.
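The two kinds of uncertainty can be injected as sketched below. The exact noise model is not specified in this section. As an assumption, each percentage (0.02 or 0.04) is read as the standard deviation of zero-mean Gaussian noise, relative to each node's average daily base demand (demand uncertainty) or to each reading (measurement noise). Perturbed demands are clipped at zero. The helper names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def perturb_base_demands(base_demands: Dict[str, float], pct: float,
                         rng: random.Random) -> Dict[str, float]:
    """Demand uncertainty: scale each base demand by (1 + N(0, pct)), clipped at 0 (assumed model)."""
    return {node: max(0.0, d * (1.0 + rng.gauss(0.0, pct)))
            for node, d in base_demands.items()}


def add_measurement_noise(readings: Sequence[float], pct: float,
                          rng: random.Random) -> List[float]:
    """Measurement noise: add zero-mean Gaussian noise with std = pct * |reading| (assumed model)."""
    return [v + rng.gauss(0.0, pct * abs(v)) for v in readings]


rng = random.Random(7)  # fixed seed so a scenario can be reproduced
print(perturb_base_demands({"J2": 10.0, "J3": 20.0}, 0.02, rng))  # hypothetical node IDs and demands
print(add_measurement_noise([50.0, 48.5], 0.02, rng))             # hypothetical pressure readings
```

The perturbed demands would be written back into the hydraulic model before simulation. The noise would be applied to the simulated pressure and flow outputs.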
Experiment 3: multiple leak localization
Another set of experiments evaluates the proposed localization approach when more than one leak is present at a time. Its aim is to assess the approach's ability to localize multiple leaks. Four variants are simulated, with two, three, four, and five simultaneous leaks, and 20 scenarios are run for each variant. For each variant, we recorded the number of scenarios in which the proposed approach accurately localized the simulated leaks at their respective nodes.
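The scenario setup of Experiment 3 can be sketched as follows. The section does not specify how an "accurate" localization was scored. Several details are therefore our assumptions: leak nodes drawn uniformly at random without replacement, the fixed seeds, the placeholder node IDs, and the two scoring rules. Under the exact-match rule, the candidate set must equal the true leak set. Under the relaxed rule, every true leak only needs to be among the candidates.

```python
import random
from typing import List, Sequence, Set


def sample_leak_scenarios(nodes: Sequence[str], n_leaks: int,
                          n_scenarios: int = 20, seed: int = 0) -> List[Set[str]]:
    """Draw n_scenarios sets of n_leaks distinct, randomly chosen leaking nodes."""
    rng = random.Random(seed)
    return [set(rng.sample(list(nodes), n_leaks)) for _ in range(n_scenarios)]


def count_localized(true_sets: Sequence[Set[str]], predicted_sets: Sequence[Set[str]],
                    exact: bool = True) -> int:
    """Count scenarios whose simulated leaks were all localized.

    exact=True requires candidates == true leaks; exact=False only requires every
    true leak to be among the candidates (extra candidates allowed).
    """
    if exact:
        return sum(1 for t, p in zip(true_sets, predicted_sets) if t == p)
    return sum(1 for t, p in zip(true_sets, predicted_sets) if t <= p)


nodes = [f"J{i}" for i in range(2, 33)]  # placeholder junction IDs
for k in range(2, 6):  # two to five simultaneous leaks
    scenarios = sample_leak_scenarios(nodes, k, seed=k)
    print(k, len(scenarios))
```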
Evaluation metrics
To evaluate the proposed approach, we used the labels provided with the datasets to calculate standard classification metrics from a confusion matrix. The confusion matrix consists of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts. TP refers to cases where an actual leak is detected as a leak. TN refers to cases where non-leaky data are identified as non-leaky. FP refers to cases where a leak alarm is raised although there is no leak. FN refers to cases where a leak is identified as non-leaky. Commonly used classification metrics such as precision, recall, F-measure, and accuracy are computed from the confusion matrix.
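The four metrics follow their standard definitions:

- precision = TP/(TP + FP)
- recall = TP/(TP + FN)
- F-measure = 2·precision·recall/(precision + recall)
- accuracy = (TP + TN)/(TP + FP + TN + FN)

The sketch below computes them. As one possible convention (our assumption), it returns None when a denominator is zero, which is one way a metric such as the F-measure becomes undefined. For example, 18 correct localizations and 2 mislocations give a precision of 0.9, as in the leakage-only row of Table 6.

```python
from typing import Dict, Optional


def _ratio(num: float, den: float) -> Optional[float]:
    """num / den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None  # undefined under this convention
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


print(classification_metrics(tp=18, fp=2, tn=0, fn=0))
```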
RESULTS AND DISCUSSION
Experiment 1: leak detection and localization
Experiment 1.1: detection comparisons with the MNF approach
Table 1 compares the precision (%) of MNF analysis at different flow thresholds with that of the proposed approach (i.e., using combined residuals).
Detection method | 0.01 m | 0.02 m | 0.03 m | 0.04 m | 0.05 m | 0.06 m | 0.07 m | 0.08 m | 0.09 m | 0.1 m
---|---|---|---|---|---|---|---|---|---|---
Proposed approach | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
MNF, threshold 60 l/s | 25 | 40 | 40 | 50 | 50 | 50 | 50 | 50 | 50 | 50
MNF, threshold 70 l/s | – | 33.3 | 66.6 | 66.6 | 100 | 100 | 100 | 100 | 100 | 100
MNF, threshold 85 l/s | – | – | – | 66.6 | 100 | 100 | 100 | 100 | 100 | 100
MNF, threshold 100 l/s | – | – | – | – | 66.7 | 100 | 100 | 100 | 100 | 100
As shown in Table 1, the proposed approach detects all leak sizes with 100% precision. The precision of MNF with a 60 l/s threshold does not exceed 50%. The low precision of MNF analysis is due to many FP reports, especially for small leak diameters of 0.01–0.03 m. With a 70 l/s threshold, MNF detected large leaks (diameters of 0.05–0.1 m) with 100% precision. Small leaks, however, were detected with lower precision or not at all. With the higher thresholds in Table 1 (85 and 100 l/s), MNF detects large leaks, but small leaks remain undetected.
The results in Table 1 suggest that the classifier-based approach is preferable to MNF analysis. The proposed approach does not require a detection threshold, and the classifier detects different leak sizes automatically. This is mainly because the classifier learns leak patterns directly from the training dataset.
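For reference, a threshold-based MNF check of the kind compared in Table 1 can be sketched as follows. The exact MNF configuration used in the experiments is not described in this section. The following are therefore assumptions: the night window (02:00, 03:00 and 04:00), hourly sampling of total system inflow in l/s, and the strict "greater than" comparison.

```python
from typing import Iterable, Sequence


def minimum_night_flow(hourly_inflow: Sequence[float],
                       night_hours: Iterable[int] = (2, 3, 4)) -> float:
    """Minimum system inflow (l/s) over the assumed night window (02:00, 03:00, 04:00)."""
    return min(hourly_inflow[h] for h in night_hours)


def mnf_leak_alarm(hourly_inflow: Sequence[float], threshold_lps: float) -> bool:
    """Raise a leak alarm when the minimum night flow exceeds the threshold (l/s)."""
    return minimum_night_flow(hourly_inflow) > threshold_lps


# Hypothetical 24-hour inflow profile with a night-time minimum of 55 l/s.
day = [100.0] * 24
day[2:5] = [58.0, 55.0, 57.0]
leaky_day = [v + 20.0 for v in day]  # a leak adds a constant 20 l/s
for threshold in (60, 70, 85, 100):
    print(threshold, mnf_leak_alarm(day, threshold), mnf_leak_alarm(leaky_day, threshold))
```

The fixed threshold is the weakness this sketch makes visible. A leak that raises the night flow by less than the gap between normal night use and the threshold never triggers an alarm. This matches the "–" entries for small leaks in Table 1.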
Experiment 1.2: detection comparisons with approaches based on raw and residual data
To see the effect of using residual data on leak detection, we compared the proposed approach with approaches that use raw data. For a fair comparison, the flow-based counterparts are also included in the experiment. Table 2 shows the results for leak diameters between 0.01 and 0.1 m.
Leak diameter (m) | Raw flow | Raw pressure | Raw combined | Residual flow | Residual pressure | Residual combined
---|---|---|---|---|---|---
0.01 | 66.7 | 66.3 | 66.5 | 66.7 | 100 | 100
0.02 | 66.7 | 66.4 | 66.6 | 66.7 | 100 | 100
0.03 | 66.7 | 66.1 | 66.3 | 66.7 | 100 | 100
0.04 | 66.7 | 70.1 | 69.6 | 100 | 100 | 100
0.05 | 66.7 | 70.5 | 70.7 | 100 | 100 | 100
0.06 | 66.7 | 66.1 | 66.3 | 100 | 100 | 100
0.07 | 66.7 | 69.3 | 73.1 | 100 | 100 | 100
0.08 | 100 | 99.4 | 99.6 | 100 | 100 | 100
0.09 | 93.4 | 74.1 | 72.7 | 100 | 100 | 100
0.1 | 66.7 | 66.1 | 66.3 | 100 | 100 | 100
Table 2 shows that residuals generally outperform raw data in detection accuracy. In terms of accuracy, adding flow residuals to pressure residuals did not show any improvement. Where flow measurements are scarce, pressure residuals can therefore be used in place of the combined residuals, because they give similar results. From an application point of view, this is a practical advantage, as it reduces reliance on flow measurements.
Experiment 1.3: single leak localization
Carreno-Alvarado et al. (2017) used raw pressure data as input, and their output classes are grouped into three zones. The loops in Figure 2 show the three classes. Their work is similar to that of Ferrandez-Gamot et al. (2015) and Soldevila et al. (2016), except that it uses raw pressure data instead of pressure residuals. Hence, we compared our approach with both types of approaches. For this comparison, the experiment uses 24 h of data sampled at 1 h intervals. Table 3 shows the results.
No. of nodes | Proposed approach | Carreno-Alvarado et al. (2017) | Soldevila et al. (2016)
---|---|---|---
26 nodes | 96.1 | 94.4 | 95
2 nodes | 57.3 | 52.3 | 53.5
In Table 3, the first column gives the number of nodes whose data were used as classifier input. The results show that the proposed approach localizes single leaks better than the other two approaches. For all approaches, using data from all nodes increases localization accuracy. The results also show that using raw pressure data gives the lowest accuracy.
In Experiment 1, the proposed approach was first evaluated against the threshold-based MNF approach in detecting leaks from 1 to 10 cm in size. The proposed approach identified leaks of all sizes, whereas the MNF approach detected only relatively large leaks. Detection with combined residuals was also compared with detection using raw and residual data (Table 2). Residual-based detection outperformed raw-based detection, and pressure residuals alone gave results similar to the combined residuals. Finally, the proposed approach was compared with raw-based (Carreno-Alvarado et al. 2017) and residual-based (Soldevila et al. 2016) approaches for localizing single leaks. In this comparison, the number of nodes whose data were used as classifier input was varied (Table 3). The proposed approach outperformed both, and for all approaches, localization accuracy improved as the number of input nodes increased.
Experiment 2: robustness to uncertainty
Experiment 2.1: detection
To test the robustness of leak detection, we conducted two experiments with 2 and 4% noise and demand uncertainties. For comparison, the same degrees of uncertainty were applied to the MNF detection approach described earlier. Tables 4 and 5 show the F-measure for 2 and 4% uncertainty, respectively. The results show that the proposed approach is robust to demand variation for large leaks (greater than 0.05 m). For small leaks of 0.01 m, however, the approach is not robust to uncertainty.
Approaches | 0.01 m: noise | 0.01 m: demand | 0.01 m: both | 0.05 m: noise | 0.05 m: demand | 0.05 m: both | 0.09 m: noise | 0.09 m: demand | 0.09 m: both
---|---|---|---|---|---|---|---|---|---
Proposed approach | 3.1 | 43.5 | 41.5 | 72.5 | 97.2 | 65.3 | 95.6 | 100 | 95.9
MNF, threshold 70 l/s | 0 | 0 | – | 20 | 75 | – | 20 | 75 | –
For MNF analysis, adding uncertainty gives poor results except for demand variation. The dashes in Tables 4 and 5 denote F-measure values that are undefined. Another noticeable fact is that the detection score decreases as the degree of uncertainty increases (from 2 to 4%). Large leaks (greater than 0.05 m), however, are not significantly affected by the increased measurement noise.
Approaches | 0.01 m: noise | 0.01 m: demand | 0.01 m: both | 0.05 m: noise | 0.05 m: demand | 0.05 m: both | 0.09 m: noise | 0.09 m: demand | 0.09 m: both
---|---|---|---|---|---|---|---|---|---
Proposed approach | 2 | 1.2 | – | 39.2 | 100 | 3.6 | 97.2 | 100 | 23.8
MNF, threshold 70 l/s | 0 | 0 | – | 66.6 | 75 | – | 66.6 | 75 | –
Both experiments show that small leaks (below 0.05 m) can be detected under perfectly modeled conditions. However, detection accuracy drops as the degree of uncertainty increases. In contrast, large leaks (greater than 0.05 m) are resistant to noise and demand uncertainty. This is because the residuals of small leaks are very small and are easily masked by demand variation and noise. Added noise distorts the pattern in the residual space, which makes accurate classification difficult. When noise and demand uncertainty are combined, the results are worse still (see Table 5).
Experiment 2.2: localization
To assess the impact of uncertainty on localization, we conducted four groups of experiments, each with 20 scenarios. In each scenario, the leaking nodes are randomly selected.
Table 6 shows the results of single leak localization under different uncertainties. For an ideal scenario, where there is no uncertainty, the proposed approach detected 90% of the leaking nodes. When the uncertainties are introduced, the number of correctly identified leaking nodes has reduced. An important observation from this experiment is that when the leaking node is closer to the reservoir (i.e., nodes 2 and 3), it provides wrong localization results. Since the leaking nodes were selected randomly in the data generation phase, some leaking nodes were nodes 2 and 3. Therefore, the localization always fails to locate them. This is mainly because the proposed approach could not notice the change in pressure difference as they are very close to the reservoir.
Experiment | Located | Mislocated | Precision (%)
---|---|---|---
Only leakage | 18 | 2 | 90
Leakage + 4% noise | 14 | 6 | 70
Leakage + 4% demand | 15 | 5 | 75
Leakage + 4% demand + 4% noise | 13 | 7 | 65
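The Precision column in Table 6 is simply the number of correctly located leaks divided by the 20 scenarios in each group. The short check below reproduces it from the located and mislocated counts; the function name is ours, chosen for illustration, and is not taken from the authors' code.

```python
# Located / mislocated counts copied from Table 6 (20 single-leak scenarios each).
TABLE6 = {
    "Only leakage": (18, 2),
    "Leakage + 4% noise": (14, 6),
    "Leakage + 4% demand": (15, 5),
    "Leakage + 4% demand + 4% noise": (13, 7),
}

def localization_precision(located, mislocated):
    """Percentage of scenarios in which the leak was placed at the correct node."""
    return 100.0 * located / (located + mislocated)

for experiment, (located, mislocated) in TABLE6.items():
    print(f"{experiment}: {localization_precision(located, mislocated):.0f}%")
```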
In a single-leak scenario, the localization phase of the proposed approach returns the node at the global maximum of the residual pattern as the leak location. For reliable localization, however, it is preferable to also consider neighboring nodes whose residual values are nearly as high as that of the actual leaking node.
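A minimal sketch of this selection rule follows, assuming each node already has a non-negative leak score (for example, the magnitude of its combined residual). The score definition, the toy four-node network and the 0.9 cut-off are illustrative assumptions; the text does not specify how "nearly as high" is thresholded.

```python
def single_leak_candidates(scores, neighbors, ratio=0.9):
    """Return the global-maximum node plus neighbours with comparably high scores.

    scores:    dict node -> non-negative leak score (e.g. residual magnitude)
    neighbors: dict node -> list of adjacent nodes in the pipe network
    ratio:     hypothetical cut-off; a neighbour is kept if its score is
               at least ratio * (maximum score)
    """
    best = max(scores, key=scores.get)  # global maximum of the leak score
    threshold = ratio * scores[best]
    close = [n for n in neighbors.get(best, []) if scores.get(n, 0.0) >= threshold]
    return best, sorted(close)

# Toy four-node pipe path 4-5-6-7 (hypothetical scores).
scores = {4: 0.20, 5: 0.95, 6: 1.00, 7: 0.40}
neighbors = {4: [5], 5: [4, 6], 6: [5, 7], 7: [6]}
print(single_leak_candidates(scores, neighbors))  # -> (6, [5])
```

Node 6 is the global maximum, and its neighbour 5 is kept as a second candidate because its score is within 10% of the peak.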
Experiment 2 assesses the effect of demand uncertainty and noise on the overall performance of the proposed LDL approach. The results in Tables 4 and 5 show that the proposed approach gives the best performance in most of the tested conditions. With noisy measurements, its accuracy is higher for relatively larger leaks, and it also outperforms the MNF approach for relatively smaller leak sizes. Under demand uncertainty, the most common form of uncertainty in practice, the proposed approach was more resilient for larger leaks and was robust to demand uncertainty overall. For both noise and demand uncertainty, its resilience decreased as the uncertainty increased from 2 to 4%, but it remained more robust than MNF analysis in leak detection. In the localization part of the experiment, leaks at nodes closer to the water supply source, which have relatively lower pressures, did not produce pressure discrepancies large enough for the approach to recognize. For leaks at other nodes, localization performed well even under uncertainty.
Experiment 3: multiple leak localization
In this experiment, multiple leaks are introduced at different nodes simultaneously. Four variants are tested, with two, three, four, and five simultaneous leaks, each comprising 20 scenarios.
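The scenario sets can be generated along the following lines. This is a sketch that assumes the usual Hanoi numbering (node 1 is the reservoir, demand junctions are 2 to 32) and uniform random selection of distinct junctions; the function name and seeding are illustrative, since the text only states that leaking nodes were selected at random. Because every junction is equally likely, nodes 2 and 3 (which caused the mislocations in Experiment 2.2) can also be drawn here.

```python
import random

# Assumption: standard Hanoi numbering, with node 1 as the reservoir and
# demand junctions 2..32; leaks are placed only at junctions.
JUNCTIONS = list(range(2, 33))

def make_scenarios(n_leaks, n_scenarios=20, seed=None):
    """Draw n_scenarios sets of n_leaks distinct, randomly chosen leaking junctions."""
    rng = random.Random(seed)
    return [sorted(rng.sample(JUNCTIONS, n_leaks)) for _ in range(n_scenarios)]

# One scenario set per variant (two to five simultaneous leaks).
variants = {k: make_scenarios(k, seed=k) for k in (2, 3, 4, 5)}
near_reservoir = sum(any(n in (2, 3) for n in s) for s in variants[5])
print(f"five-leak scenarios containing node 2 or 3: {near_reservoir} of 20")
```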
Table 7 shows the results of the multiple leak localization experiments. The overall accuracy in Table 7 is the weighted average over the 20 scenarios of each variant, i.e., the total number of localized leaks divided by the total number of leaks present. For every variant, the proposed approach localizes at least half of the leaks present in more than half of the scenarios. For example, with two leaks, the approach localized both leaks in 12 of the 20 scenarios and one of the two in the remaining eight. With five leaks, it localized at least three in 12 of the 20 scenarios. On closer inspection, the missed leaks are usually at nodes close to the reservoir, i.e., nodes 2 and 3.
Leaks present | None localized | One localized | Two localized | Three localized | Four localized | Five localized | Overall accuracy (%)
---|---|---|---|---|---|---|---
Two | – | 8 | 12 | | | | 80
Three | – | 5 | 11 | 4 | | | 65
Four | – | 2 | 4 | 12 | 2 | | 67.5
Five | – | – | 8 | 9 | 3 | – | 55
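Reading the "weighted average" as the number of localized leaks, weighted by how many scenarios fall in each column, divided by the total number of leaks present, the check below reproduces all four values in the Overall accuracy column. Blank and "–" cells are taken as zero scenarios; the function name is illustrative.

```python
# Table 7: for each variant, the number of the 20 scenarios in which
# 0, 1, 2, ... leaks were localized ("–" and blank cells read as 0).
TABLE7 = {
    2: [0, 8, 12],
    3: [0, 5, 11, 4],
    4: [0, 2, 4, 12, 2],
    5: [0, 0, 8, 9, 3, 0],
}

def overall_accuracy(n_leaks, scenario_counts):
    """Localized leaks as a percentage of all leaks present across the scenarios."""
    n_scenarios = sum(scenario_counts)
    localized = sum(k * count for k, count in enumerate(scenario_counts))
    return 100.0 * localized / (n_leaks * n_scenarios)

for n_leaks, counts in TABLE7.items():
    print(f"{n_leaks} leaks: {overall_accuracy(n_leaks, counts)}%")
```

For two leaks, for instance, 8 × 1 + 12 × 2 = 32 of the 40 leaks present were localized, giving 80%.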
Experiment 3 demonstrated that the proposed approach can localize multiple leaks, which is made possible by separating the detection (classification) phase from the localization phase. Prior hybrid approaches cannot locate multiple leaks with their classifiers because they classify one class at a time. In the three-leak variant, at least two of the three leaks were located in 15 of the 20 scenarios. As the number of simultaneous leaks increases, however, localization accuracy generally decreases. Overall, the number of leaks localized in each scenario is reasonably close to the number simulated, and the shortfall is mainly due to the random node selection, which sometimes places leaks at nodes close to the reservoir.
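The contrast with a single-output classifier can be made concrete with a hypothetical rule that keeps every local peak of a per-node leak score instead of only the global maximum. This is not the authors' localization statistic, which this section does not spell out; the scores, the toy network and the threshold are illustrative assumptions.

```python
def local_peak_candidates(scores, neighbors, min_score):
    """Return every node whose score reaches min_score and is not lower than
    any adjacent node's score (a local peak of the leak-score pattern).

    Unlike a single global maximum, this can return several separated nodes,
    one per distinct peak, as multiple-leak localization requires.
    """
    peaks = []
    for node, s in scores.items():
        if s < min_score:
            continue
        if all(s >= scores.get(n, float("-inf")) for n in neighbors.get(node, [])):
            peaks.append(node)
    return sorted(peaks)

# Toy pipe path 10-11-12-13-14-15 with two separated leaks (hypothetical scores).
scores = {10: 0.1, 11: 0.8, 12: 0.3, 13: 0.2, 14: 0.7, 15: 0.1}
neighbors = {10: [11], 11: [10, 12], 12: [11, 13], 13: [12, 14], 14: [13, 15], 15: [14]}
print(local_peak_candidates(scores, neighbors, min_score=0.5))  # -> [11, 14]
```

A global-maximum rule would return only node 11 here; the peak rule also recovers the second leak at node 14.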
CONCLUSION
This research proposed a leak detection and localization approach that uses hydraulic modeling and classification for detection, and a statistical approach for localization. Previous hybrid approaches lack the ability to localize multiple leaks; this research attempts to fill that gap. To this end, experiments were conducted on the Hanoi Water Network in three major categories: comparing the output of the proposed approach with that of previous works, evaluating its robustness to uncertainty, and evaluating its capacity to localize multiple leaks. The results show that the combined residual approach is robust to demand uncertainty, outperforms the separate raw- and residual-based approaches in localizing single leaks, and can localize multiple leaks occurring in the same time period.
The proposed approach was tested on a model based on a realistic benchmark dataset. This was considered a viable option because access to real-time field data was limited in the city where this study was conducted. To support wider uptake of the approach, it is recommended that it be tested on real datasets from water distribution companies and on other large academic water networks. The final output of the localization phase is a set of candidate leaking nodes (i.e., the leaking nodes and their neighbors). These candidates could be further refined using search space reduction techniques such as optimization and statistical process control methods.
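As one example of the statistical process control refinement suggested above, a Shewhart-style check can retain only those candidate nodes whose latest residual falls outside a k-sigma band around its leak-free baseline. This is a hypothetical sketch of a future-work idea, not part of the evaluated approach; the baseline data, the k = 3 limit and the function name are assumptions.

```python
import statistics

def shewhart_filter(candidates, baseline, current, k=3.0):
    """Keep candidate nodes whose current residual lies outside the
    mean +/- k*sigma band of that node's leak-free baseline residuals."""
    kept = []
    for node in candidates:
        mu = statistics.mean(baseline[node])
        sigma = statistics.stdev(baseline[node])  # sample standard deviation
        if abs(current[node] - mu) > k * sigma:
            kept.append(node)
    return kept

# Hypothetical leak-free residual history (m) for two candidate nodes.
baseline = {5: [0.0, 0.1, -0.1, 0.05, -0.05], 6: [0.0, 0.1, -0.1, 0.05, -0.05]}
current = {5: -0.5, 6: -0.1}  # latest residuals (m)
print(shewhart_filter([5, 6], baseline, current))  # -> [5]
```

Here the baseline standard deviation is about 0.08 m, so the 3-sigma band is roughly ±0.24 m: node 5 (−0.5 m) is kept as a candidate, while node 6 (−0.1 m) is indistinguishable from normal variation and is dropped.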
ACKNOWLEDGEMENTS
The authors would like to deeply acknowledge Dr Agizew Nigussie from the School of Civil and Environmental Engineering, Addis Ababa Institute of Technology, Addis Ababa University and Dr Assefa M. Melesse from Earth System Sciences, Florida International University for their invaluable comments and thorough review of this manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.