Abstract
Acoustic sensors are widely deployed to detect hidden leakages in water distribution networks (WDNs). However, few studies have quantitatively examined the dominant leakage acoustic characteristics, which are usually mixed with unknown environmental noises and are further constrained by the sparse deployment of acoustic sensors. In this paper, a comprehensive approach based on acoustic data feature analysis is developed to detect pipe leakages in near real-time via a series of systematic analyses, namely: (1) data quality assessment; (2) feature identification; (3) outlier detection and event classification; and finally (4) near real-time leakage detection. The proposed solution has been tested on two major WDNs in Singapore comprising around 1,000 km of water pipelines and 74 permanently installed hydrophone sensors. The leakage detection results obtained from our case study demonstrate that, by decomposing the original acoustic signal, the dominant leakage acoustic characteristics can be captured in the lower intrinsic mode functions (IMFs) within a frequency range of approximately 100–750 Hz. Systemwide leakage event detection and classification models are subsequently trained and tested on acoustic datasets collected over 13 historical months, where F1-scores of more than 70% are obtained from the emulated near real-time leakage detection analysis.
HIGHLIGHTS
Developed acoustic feature-based methodology for near real-time leakage event detection and classification.
Verified methodology with 74 permanently installed hydrophones in two water supply zones having 1,000 km water pipelines in Singapore.
Achieved F1-score of >70% for leakage event detection and classification analyses on imbalanced acoustic datasets collected over 13 months.
INTRODUCTION
Reducing water losses in water distribution networks (WDNs) is of great significance to control carbon footprints and improve infrastructure sustainability (Wu et al. 2011). While water losses comprise apparent and physical components, it is the latter, due to hidden pipe leakages and/or other anomaly events (Mounce et al. 2010; Guo et al. 2021), which poses the biggest challenge for water utilities. Leakages in water pipelines contribute significantly to the total non-revenue water (NRW) in WDNs (Cassidy et al. 2020), and they can be classified into reported, hidden, and background leaks. Among them, hidden leaks are the most difficult to manage as the unaccounted water does not readily appear at street level, and the leak can remain undetected before becoming a disruptive event to the public.
Over the years, different techniques have been developed to assist field engineers in detecting and localizing hidden leakage events in WDNs. Examples include physics-based hydraulic models that analyse flow and pressure data (Wu et al. 2010, 2021; Pérez et al. 2011; Goulet et al. 2013; Gerard et al. 2016; Gianfredi et al. 2017; Chew et al. 2022), and device-based methods that collect leak acoustic signals in water pipelines (Khulief et al. 2012; Muntakim et al. 2017; Butterfield et al. 2018a, 2018b; Cody et al. 2020; Guo et al. 2021). Other leak detection methodologies include Bayesian updating, coupled with optimal distribution methods (Alawadhi & Tartakovsky 2020), transient test-based techniques (TTBTs) (Meniconi et al. 2021), and trend-change analysis (Xue et al. 2022).
Over the last decade, acoustic sensors have been increasingly deployed by utility companies as part of their 24/7 permanent monitoring or temporary leakage program(s). Typically, the acoustic signal caused by a pipe leakage arises from the complex interaction between the flowing water and the interior of the underground pipe wall(s), which generates random wave signals with both short-term nonstationary and long-term stationary frequency components (Almeida et al. 2014). In the practical field context, the emitted acoustic signals are, however, mixed with a range of environmental noises which can lead to a low signal-to-noise ratio (SNR) (Wen et al. 2004; Guo et al. 2021). Thus, it is imperative to first pre-process raw acoustic signals, and then leverage the processed data to detect possible hidden leakages underground. Examples of acoustic data pre-processing techniques involve decomposing waveform profiles into intrinsic mode functions (IMFs) via empirical mode decomposition (EMD) (Norden et al. 1998), or its improved version, ensemble empirical mode decomposition (EEMD) (Wu & Huang 2009; Luukko et al. 2016).
RELATED STUDIES
On the whole, leakage detection in WDNs via acoustic signal data analysis can be broadly grouped into two main categories, namely: (a) traditional signal processing-based methods and (b) data-driven methods using machine learning. Table 1 summarizes the key methods and findings reported from previous notable studies in the current literature.
Ref. | Study type | Pipeline length | No. of sensors | Detection methods | Key findings |
---|---|---|---|---|---|
Chatzigeorgiou (2010) | Laboratory set-up | 1.5 m | 1 | Used hydrophone to investigate simulated leaks of different sizes | Leak acoustic power was found to be maximum at leak source, and significant at upstream of leak source |
Khulief et al. (2012) | Laboratory set-up | Not given | 1 | | Leak size directly affected leak frequency band |
Amin & Ghazali (2015) | Laboratory set-up | 4.15 m | 1 | Applied EEMD to acoustic signals | Achieved 99.7% detection accuracy |
Muntakim et al. (2017) | Controlled field test | 100 m | 2 | Applied cross-correlation to analyse leaks of varying sizes between two sensors | Detected leaks of around 1.6 L/s between two sensors up to 176 m |
Butterfield et al. (2018a, 2018b) | Laboratory set-up | 26 m | 5 | Combined random forest algorithm with acoustic signal features for developing leak shape prediction algorithm | Predicted the leak shape without any information on leak area, and leak flow rate |
Kang et al. (2018) | Controlled field test | 1.28 km | 6 | Applied ensemble CNN-SVM and graph-based method | Achieved 99.3% detection accuracy and localized leaks to < 3 m error |
Cody et al. (2018) | Laboratory set-up | 15 m | 1 | Applied singular spectrum analysis (SSA) to extract leakage acoustics | Leaks led to observable spikes in signal's spectral amplitudes |
Cody et al. (2020) | Laboratory set-up | 20 m | 1 | Applied CNN and variational autoencoder models | Achieved 97.2% detection accuracy |
Kothandaraman et al. (2020) | Laboratory set-up | 40 m | 1 | Combined cross-correlation and EMD | Achieved > 95% detection accuracy |
Mark et al. (2020) | Controlled field test | 120 km | 305 | Monitored acoustic signals in terms of their relative changes in observed noises (dB) and signal frequency | Achieved lead-time detection between 1 day and 1 year, with detected localization distance ranging between 20 and 50 m |
Zhang et al. (2020) | Controlled field test | 120 km | 305 | Combined statistical analysis, CUSUM, and Kalman filter | Minimized false positive rate with 12 h lead-time detection |
Guo et al. (2021) | Controlled field test | 270 m | 6 | Applied time-frequency CNN model to handle different SNR conditions | Achieved 99.0% detection accuracy |
Ning et al. (2021) | Laboratory set-up (Gas pipeline) | 6.93 m | 1 | Combined EEMD and RF algorithm to handle different SNR conditions | Achieved > 99.0% detection accuracy with varying sampling distances within set-up |
Ravichandran et al. (2021) | Full-scale field test | Not given | 300 | Trained multi-strategy ensemble learning (MEL) model with 48 acoustic-based features | Reduced quantity of false positive events by an order of magnitude |
CNN-SVM, convolutional neural network-support vector machine; CUSUM, cumulative sum; RF, random forest.
While numerous experimental works have been performed to develop signal processing-based methods for detecting hidden leaks in underground pipes, they mainly deployed a high density of acoustic sensors to effectively capture the leak acoustic signals within a near spatial distance range. For example, Mark et al. (2020) reported that accelerometers could detect leaks within 20–50 m for their case study analysis in Adelaide. Similar findings have also been shared by Zhang et al. (2020). While both field studies/investigations may represent the actual operations of real-world WDNs, as compared with lab-scale or experimental studies, the high sensor density requirement is not likely to represent long-term acoustic monitoring in most large-scale WDNs, where sensors are typically installed permanently and sparsely, as exemplified in our subsequent case study analyses.
For data-driven modelling using machine learning methods, despite its well-known effectiveness, an important pre-condition is to have sufficiently large and balanced acoustic datasets in terms of the total number of leakage and non-leakage data records for effective model training. However, such data prerequisites may not be met in well-managed WDNs, where the number of historical leakage records is likely to be less than that of the non-leakage category. In addition, it is worth noting that most of the well-documented acoustic data-driven studies for leak detection in WDNs (Kang et al. 2018; Cody et al. 2020; Guo et al. 2021) were mainly carried out under small-scale experimental or controlled field tests with sufficient deployment of acoustic sensors (see Table 1). To handle the data constraint issue for leak detection using acoustic data, Ravichandran et al. (2021) recently developed a novel multi-strategy ensemble learning (MEL) approach that exploited 48 acoustics-based features which were extracted from historical power spectral density (PSD) and time-series profiles. While the authors demonstrated the effectiveness of MEL in reducing the total number of detected false positives by an order of magnitude, their analysis deployed a large number of acoustic sensors, which increased the likelihood of detecting the pipe leaks as compared with the context of sparse deployment of acoustic sensors for the same practical purpose.
To mitigate the requirement of large acoustic datasets for effective model training, the need to identify useful acoustic data features for leak detection has since garnered some research interest. For example, Kothandaraman et al. (2020) coupled conventional cross-correlation analysis with EMD to detect and localize pipe leakages underground, and Ning et al. (2021) recently developed a useful framework that couples EEMD and random forest algorithm to classify the different types of leaks. While these studies have quantitatively demonstrated the usefulness of EMD or EEMD pre-processing methods to remove ambient noises embedded in the acoustic signals, we again highlight that they were mainly collected under controlled lab-scale experiments where the frequency of the environmental noises (e.g., blowing fans, pump noise) can be identified easily and removed/filtered to extract the most useful leakage acoustic frequency. It is therefore still a major engineering challenge to process complex real-world acoustic signals to best identify useful leakage acoustic characteristics for detecting leaks in WDNs.
To address the above-discussed challenges, and shortcomings from previously reported research studies, for leak detections in large-scale WDNs having permanently and sparsely installed acoustic sensors, this technical paper aims to develop an effective acoustic feature-based methodology by encompassing: (1) data quality assessment, (2) data decomposition, (3) features identifications, (4) model training for outlier detection and classification, and (5) near real-time leakage detection. In collaboration with PUB, Singapore's National Water Agency, our proposed approach has since been verified with historical acoustic signal datasets collated over 13 months between 2019 and 2020 for large-scale operational WDNs with a total pipe length of 1,000 km in Singapore.
DATA DESCRIPTION
A spectrogram (see Figure 1(b)) comprises two dimensions where the x-axis represents time (seconds), while the y-axis represents the frequency (Hz) of the acoustic signal. An additional third dimension, as represented by normalized colour intensity values in decibels (dB), quantifies the signal strength at a specific frequency value, where brighter colours represent a higher power range and darker colours a lower one. The PSD profile (see Figure 1(c)) describes the power density distribution of the same signal over its frequency range, where its x-axis represents the signal's frequency (Hz) and the y-axis represents the corresponding power density (dB/Hz) values across the frequency values.
METHOD FORMULATION
Data Quality Assessment: Perform data quality assessment on all acoustic data files, and remove those having ‘bad’ data quality.
Features Identification: Decompose acoustic data files of ‘good’ data quality into various IMF components to generate useful acoustic data features for performing leak detection analysis.
Outlier Detection: Perform outlier detection for each acoustic sensor using the generated acoustic data features.
Leakage Event Classification: Classify the detected outliers from all acoustic stations into leakage event clusters.
Near real-time event detection: Trained detection model is then deployed to detect hidden pipe leakage occurrences in the near real-time (NRT) context.
Data quality assessment
During the operations of WDNs in the practical field context, a mixture of unknown environmental noises, caused by connected pumps, valves, pipe bends, road traffic, and weather conditions, is expected to be embedded in the collected acoustic signals. It is also common that permanently installed sensors do not function correctly in the field at all times, hence resulting in numerical errors being introduced into the recorded acoustic readings over time.
Data quality issue | Issue description | Rectification measure |
---|---|---|
Missing Data | Missing acoustic data files at defined datetime(s) | None |
Constant Signal | Zero or constant amplitude values for a given datetime | Exclude from analysis |
Clipped Signal | Waveform profile is being clipped as only amplitude values within the known upper and lower bounds for a given bit depth can be recorded | Remove clipped component |
Drifted Signal | Signal amplitude values are not centred along the zero axis | Zero-centering of signal values |
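The checks in the table above could be automated along the following lines. This is a minimal sketch rather than the authors' implementation: the function names `assess_quality` and `zero_centre`, the drift tolerance `drift_tol`, and the order in which the checks are applied are all assumptions made for illustration. The default bit depth of 16 matches the .WAV files described in the case study.

```python
from statistics import fmean


def assess_quality(samples, bit_depth=16, drift_tol=1.0):
    """Label one waveform with the first data quality issue found.

    samples   : list of integer amplitude values (empty if the file is missing)
    bit_depth : bit depth of the recording, which fixes the clipping bounds
    drift_tol : hypothetical tolerance on the mean amplitude, in sample units
    """
    if not samples:
        return "missing"
    if min(samples) == max(samples):
        return "constant"  # zero or constant amplitude: exclude from analysis
    upper = 2 ** (bit_depth - 1) - 1
    lower = -(2 ** (bit_depth - 1))
    if any(s >= upper or s <= lower for s in samples):
        return "clipped"  # amplitude reached the recordable bounds
    if abs(fmean(samples)) > drift_tol:
        return "drifted"  # signal is not centred on the zero axis
    return "good"


def zero_centre(samples):
    """Rectify a drifted signal by subtracting its mean amplitude."""
    mean = fmean(samples)
    return [s - mean for s in samples]
```

A file labelled "drifted" can be passed through `zero_centre` before further processing, whereas "constant" files are dropped, in line with the rectification measures listed in the table.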
Feature identification
IMF-1 is expected to embed the most ‘noises’ which correspond to the higher frequency range, as shown in Figure 5(b). Thus, this component is likely to be least useful to capture the leakage acoustic signal.
Between IMF-10 and IMF-15 components, their decomposed waveform signals (see Figure 5(k)–5(p)) are unlikely to be useful due to the gradual shifting of the signal amplitude values from the zero axis, as shown. This is further evident in Figure 6(k)–6(p) where there are no observable and significant spikes in their corresponding power density values.
As the leakage signals in underground water pipelines are expected to reside in the lower frequency values (Chatzigeorgiou 2010), the resulting PSD profiles from IMF-2 to IMF-6 (see Figure 6(c)–6(g)) are likely to be useful due to observable spikes occurring in their respective computed power density profiles along the lower frequency ranges (0–1,000 Hz).
Physically, it is expected that the acoustic power caused by the leakage condition will be greater than that of the baseline non-leakage condition. To reasonably approximate this leakage acoustic power, it is imperative to first identify the key leakage acoustic data features which can be derived from the decomposed data signals, namely, (1) specific IMF component(s) and (2) dominant frequency range, which can best represent the leakage acoustic characteristics. We thus propose a power feature that models the signal acoustic power computed using a specific IMF component (and its subsequent PSD profile) for a defined frequency range after performing the EEMD step. The general steps to compute this power feature for any given IMF component are as follows: (1) convert the IMF waveform profile into its corresponding PSD and (2) sum the power density values in the derived PSD profile over a defined frequency range using a simple trapezoidal rule.
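Step (2) of this computation can be sketched as follows. The sketch assumes that the PSD of the chosen IMF has already been estimated (step (1)) and is available as two equal-length lists of frequencies and power density values in linear units. The function name `band_power` and its arguments are hypothetical and are not taken from the original study.

```python
def band_power(freqs, psd, f_lo, f_hi):
    """Integrate a PSD profile over [f_lo, f_hi] with the trapezoidal rule.

    freqs : frequencies in Hz, sorted in ascending order
    psd   : power density value at each frequency
    """
    # Keep only the PSD points that fall inside the requested band.
    points = [(f, p) for f, p in zip(freqs, psd) if f_lo <= f <= f_hi]
    # Sum the area of each trapezoid between neighbouring points.
    return sum(
        0.5 * (p0 + p1) * (f1 - f0)
        for (f0, p0), (f1, p1) in zip(points, points[1:])
    )


# Toy PSD profile on a 100 Hz grid (illustrative values only).
freqs = [0, 100, 200, 300, 400]
psd = [1.0, 1.0, 2.0, 2.0, 1.0]
power_100_300 = band_power(freqs, psd, 100, 300)
```

Because only grid points inside the band are used, the result depends on the frequency resolution of the PSD estimate; a finer grid gives a closer approximation of the area under the curve.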
Leakage acoustic component identification
To develop a general methodology, a solution workflow process is thus proposed to programmatically identify the specific IMF component(s) which can best capture the dominant leakage characteristics/frequency for any historical leakage event reported in the modelled WDN. The details are as follows:
- i. For each modelled leakage event, we first identify the acoustic sensor located nearest to the reported event location within the modeller-defined pipeline distance range, as typically measured in metres (m). If this condition is met, proceed with the subsequent steps.
- ii. Leak PSD: Derive the time-averaged PSD profile for selected timestamps (e.g., 2 am–4 am) of the individual IMF component(s) pertaining to the historical leakage date(s).
- iii. Non-Leak PSD: Using the same IMF component(s), derive their time-averaged PSD profile for the identical timestamps pertaining to the baseline non-leakage date(s).
- iv. For every selected IMF from the previous steps (ii–iii), compute the level of similarity between the associated Leak PSD and Non-Leak PSD profiles using: (a) cross-correlation, as defined in Equation (1); and (b) coefficient of determination, as defined in Equation (2). If no frequency lags are considered, the cross-correlation in Equation (1) essentially represents the conventional correlation coefficient value in statistical analysis. In both equations, N is the total number of data points within the PSD profile, and the two series being compared are the Leak PSD and Non-Leak PSD profiles.
Leakage and non-leakage pipe conditions are thus differentiated via the highest dissimilarity between the Leak PSD and Non-Leak PSD profiles, which is quantified by lower cross-correlation and coefficient of determination values for a selected IMF component. To avoid a large number of enumerations, we consider IMFs 2–4 as the arbitrary starting selection. If IMF-4 can quantitatively be shown to have limited observable leakage signals, then the remaining higher-order IMFs are most unlikely to be useful to capture the leakage condition.
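The two similarity metrics can be illustrated with a short sketch. It assumes the zero-lag case, in which the cross-correlation reduces to the conventional correlation coefficient, and it assumes the common definition of the coefficient of determination, 1 − SS_res/SS_tot, with the Non-Leak PSD taken as the reference profile. The original equations are not reproduced here, so both the function names and the choice of reference profile are assumptions.

```python
from math import sqrt
from statistics import fmean


def zero_lag_correlation(x, y):
    """Conventional correlation coefficient between two PSD profiles."""
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def r_squared(reference, candidate):
    """Coefficient of determination of `candidate` against `reference`."""
    mean_ref = fmean(reference)
    ss_res = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Toy profiles: the "leak" profile has the same shape but twice the power.
non_leak_psd = [1.0, 2.0, 3.0, 4.0]
leak_psd = [2.0, 4.0, 6.0, 8.0]
cc = zero_lag_correlation(leak_psd, non_leak_psd)
r2 = r_squared(non_leak_psd, leak_psd)
```

In this toy case the correlation coefficient stays at 1 because the shape is unchanged, while the coefficient of determination drops sharply because the amplitudes differ. The two metrics therefore respond to different kinds of dissimilarity.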
Each IMF covers a unique frequency range in which the Leak PSD has a total acoustic power (area under the curve) greater than that of the Non-Leak PSD; in the selected example, IMF-2 shows the greatest difference, in the frequency range of around 400–600 Hz.
The Leak PSD and Non-Leak PSD profiles obtained using IMF-4 are very similar, attaining very high cross-correlation and coefficient of determination values, hence confirming that IMF-4 is likely to be the least useful to capture any dominant acoustic leakage characteristics in the modelled system.
With increasing IMF order, the cross-correlation and coefficient of determination values consistently increase, hence indicating a higher level of similarity between the Leak PSD and Non-Leak PSD profiles at higher IMFs. Consequently, this observation suggests that lower IMF components are generally more useful to capture the dominant acoustic leakage characteristics.
Leakage acoustic frequency identification
Upon identifying a dominant IMF data feature that may best represent the leakage condition, we then proceed to identify a common frequency range associated with the identified data feature via the following solution steps:
- i.
- ii.
- iii.
Locate a critical frequency value () within that has the largest associated positive value.
The IMF(s) and frequency range identified for the dominant leakage acoustic characteristics are further leveraged to generate the proposed IMF-based power feature values, which are then used for subsequent model training and NRT analysis (see Figure 2) for leak detection. The resulting model performances obtained from using this feature will then be compared against those of two other common acoustic power features, namely the total signal power and the minimum power density. Note that the total signal power is computed by directly converting the waveform (without applying EEMD beforehand) into its corresponding PSD profile, followed by summing the power density values across its entire frequency range. The minimum power density is simply estimated by locating the minimum power density value of the PSD profile produced from the original waveform without applying EEMD during the data processing step.
Outlier detection
Outlier detection is first performed at the local station level by using audio files from the minimum night flow (MNF) hours between 2 am and 4 am daily, when environmental noises are expected to be at a minimum. The proposed outlier detection method depends on a defined reference time-window which considers the historical leakage event date, the modeller-defined lead-time and lagged-time from the reported event date, and a selected total time-window size. All proposed time windows are measured in days (e.g., 24 h, 72 h) for the purpose of leakage detections in WDNs modelling and analysis. The selected lead-time and lagged-time values for any distribution network are based on the modeller's level of understanding of the network's operational complexity and communications with the utility company. In our subsequent case study analyses, the lead-time and lagged-time are set as 3 days and 1 day, respectively, based on the agreed recommendations with our collaborator, PUB, Singapore's National Water Agency.
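As an illustration of how such a reference time-window might be constructed, the sketch below assumes that the lead-time counts days before the reported event date and the lagged-time counts days after it, so that 3 days and 1 day give a 5-day window including the event date itself. The function name `reference_window` is hypothetical.

```python
from datetime import date, timedelta


def reference_window(event_date, lead_days=3, lag_days=1):
    """List the calendar days in the reference time-window of one event.

    The window spans `lead_days` before the reported event date, the
    event date itself, and `lag_days` after it (an assumed convention).
    """
    start = event_date - timedelta(days=lead_days)
    total_days = lead_days + 1 + lag_days
    return [start + timedelta(days=k) for k in range(total_days)]


# Example: a leakage event reported on 16 August 2019.
window = reference_window(date(2019, 8, 16))
```

For the example event, the window runs from 13 August 2019 to 17 August 2019, and only the audio files recorded between 2 am and 4 am on those days would be treated as belonging to the leak period.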
Leakage event classification
After the outliers are detected for each individual sensor station (i), they are then clustered and classified into systemwide leakage events by using a set of user-defined temporal proximity criteria as follows.
Basic Criterion 1 (BC-1): Total number of stations having detected outliers on any random day must be greater than the minimum number of stations having outliers.
Basic Criterion 2 (BC-2): Total number of detected outliers between 2 am and 4 am for each sensor station on the same random day must be greater than the minimum number of outliers at each station.
Advanced Criterion (AC): Total number of detected outliers from all stations must be greater than the minimum number of outliers.
Time Gap Criterion (TGC): Maximum time difference between two adjacent detected outliers within each event cluster must be less than the maximum allowable time gap.
Near real-time leakage event detection
To enable near real-time leakage event detection, a systemwide detection model is first trained for a given set of acoustic sensor stations. Several key model parameters must be tuned during training to optimize the model's detection performance at the systemwide level, following the steps listed below.
Pre-select different combinations of the four criteria thresholds (for BC-1, BC-2, AC, and TGC) and m to optimize the detection model's F1-score during model training.
For each combination of the model parameters, perform checks for BC-1 and BC-2 daily. Note that at the starting date of the training phase, the system's current cluster size is set as 0.
On a daily basis during the model training period, both BC-1 and BC-2 must be fulfilled, followed by adding the total number of detected outliers from all possible stations into the current cluster size. If either BC-1 or BC-2 is not fulfilled on any day, the AC and TGC are then checked against the accumulated cluster. Once this condition is met, the cluster size is reset to 0 on the following day to continue the detection analysis.
At the end of the model training phase, classify the identified event cluster(s) into true positive (TP) or false positive (FP) event(s) accordingly by using the defined reference time-window for each of the historical leakage event(s) being modelled during the training phase. The model's resulting performance scores are then computed using Equations (9)–(11) at the systemwide level.
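The daily clustering logic described above can be sketched as follows. This is an illustrative reading of the criteria, not the authors' code: it works at daily resolution, treats BC-2 as applying to every station that reported at least one outlier, discards a cluster that fails AC or TGC, and closes any open cluster at the end of the period. All function and parameter names are hypothetical.

```python
def _passes_ac_tgc(days, size, min_total, max_gap_days):
    """AC: enough outliers in the cluster. TGC: no gap between days too long."""
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return size > min_total and all(g < max_gap_days for g in gaps)


def classify_events(daily_counts, min_stations, min_per_station,
                    min_total, max_gap_days):
    """Group daily outlier counts into systemwide event clusters.

    daily_counts : chronological list of (day_index, {station: n_outliers})
    Returns a list of clusters, each a list of day indices.
    """
    events, days, size = [], [], 0  # cluster size starts at 0
    for day, counts in daily_counts:
        active = [c for c in counts.values() if c > 0]
        bc1 = len(active) > min_stations                # BC-1
        bc2 = all(c > min_per_station for c in active)  # BC-2
        if bc1 and bc2:
            days.append(day)
            size += sum(active)
            continue
        # BC-1 or BC-2 failed: test the accumulated cluster, then reset.
        if days and _passes_ac_tgc(days, size, min_total, max_gap_days):
            events.append(days)
        days, size = [], 0
    # Close any cluster still open at the end of the period.
    if days and _passes_ac_tgc(days, size, min_total, max_gap_days):
        events.append(days)
    return events


# Toy record for two stations over five days (illustrative values only).
record = [
    (0, {"A": 2, "B": 2}),
    (1, {"A": 3, "B": 2}),
    (2, {"A": 0, "B": 0}),
    (3, {"A": 2, "B": 2}),
    (4, {}),
]
clusters = classify_events(record, min_stations=1, min_per_station=1,
                           min_total=5, max_gap_days=2)
```

With these toy thresholds, days 0 and 1 form one event cluster of nine outliers, while the isolated activity on day 3 is rejected by the AC because its four outliers do not exceed the minimum of five.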
CASE STUDY
Description of WDN systems
A total of 236 historical leakage events, ranging between 1 August 2019 and 31 August 2020, were reported for both modelled WDNs. Due to their size scale, there was generally a low probability for each historical leakage event to be situated close to a permanently installed hydrophone, especially since the events do not occur daily and the number of installed hydrophones is less than the total number of historical events. Tables 3 and 4 summarize the list of hydrophones deployed in both zones having historical leakage event(s) reported to be within an arbitrary 450 m, or less, pipeline distance from the nearest station, hence at least 90% of the total reported events in both zones were located more than 450 m away from the nearest hydrophone station(s). Note that the threshold pipeline distance is again a user-defined parameter. Finally, Table 5 summarizes the details of the audio files collected across all hydrophones in both zones for the above listed period.
Reported leakage date | Nearest model junction | Pipeline distance (m) | Nearest hydrophone |
---|---|---|---|
8/16/2019 | J_M1 | 70 | STN 39 |
9/9/2019 | J_M2 | 419 | STN 36 |
9/10/2019 | J_M3 | 174 | STN 41 |
9/11/2019 | J_M4 | 72 | STN 30 |
3/10/2020 | J_M5 | 90 | STN 32 |
8/5/2020 | J_M6 | <100 |
Reported leakage date | Nearest model junction | Pipeline distance (m) | Nearest hydrophone |
---|---|---|---|
8/1/2019 | J_M1 | 202 | STN_38 |
2/5/2020 | J_M2 | 44 | STN_38 |
8/21/2019 | J_M3 | 400 | STN_22 |
8/29/2019 | J_M4 | 46 | STN_13 |
1/9/2020 | J_M5 | 36 | STN_16 |
4/5/2020 | J_M6 | 263 | STN_48-B |
5/10/2020 | J_M7 | 87 | STN_28 |
5/3/2020 | J_M8 | 54 | STN_17 |
6/17/2020 | J_M9 | 418 | STN_44 |
7/1/2020 | J_M10 | 399 | STN_37 |
8/11/2020 | J_M11 | 320 | STN_14 |
8/20/2020 | J_M12 | 285 | STN_20 |
Detail | Zone-1 | Zone-2 |
---|---|---|
Historical date range | 1 August 2019–31 August 2020 | 1 August 2019–31 August 2020 |
Type of pipe material | Ductile iron | Ductile iron |
Pipe diameter (min / avg / max) | 50 mm / 198 mm / 2,200 mm | 15 mm / 820 mm / 1,400 mm |
Pipe length (min / avg / max) | 0.03 m / 23.4 m / 6,014.5 m | 0.025 m / 23.9 m / 2,089.4 m |
Pipe roughness (min / avg / max) | 80 / 125 / 140 | 70 / 115 / 150 |
Total quantity of hydrophones | 27 | 47 |
Total quantity of .WAV files | 156,734 | 232,026 |
Total quantity (a) of .WAV files from MNF hours | 21,734 | 25,518 |
Bit depth of .WAV files | 16 | 16 |
Sampling rates (Hz) | 2,048–8,192 | 2,048–8,192 |
Time length of .WAV files (s) | 6.0–30.0 | 6.0–30.0 |
No. of channels | 1 (mono) | 1 (mono) |
(a) After accounting for missing data quantity and undergoing data quality assessment.
Features identification
The reported leakage events, as summarized in Tables 3 and 4, are then analysed to identify the key leakage acoustic features, namely: (1) dominant IMF component and (2) leakage frequency range.
For each of the listed cases in Tables 3 and 4, we adopt a total time-window of 5 days that encompasses the reported leakage date while assuming a lead-time and lagged-time of 3 days and 1 day, respectively. Hence, a total of 5 Leak PSD profiles are generated to represent the pipe leakage condition, where each profile is obtained by considering the MNF hours of 2 am to 4 am for each assumed leak day within the defined period. The single Non-Leak PSD profile is then derived by averaging across the remaining days between 1 August 2019 and 31 August 2020 that fall outside of the leak window period, subject to audio file availability and data quality assessment.
Dominant IMF
- i. The cross-correlation metric, as computed using Equation (1), confirms that IMF-4 has the most similar characteristics between the Leak PSD and Non-Leak PSD profiles. As shown in Figure 13(a), most of the acoustic power values generated using IMF-4 have cross-correlation values greater than 99%. On the contrary, IMF-2 results in the highest data count with cross-correlation values less than 99%. Thus, a cross-correlation value of 99% may be set as the threshold value for the subsequent near real-time detection analysis by determining if a specific IMF component, having a cross-correlation value less than 99%, represents a hidden leak condition. Since there is a reasonable data count for IMF-3 having cross-correlation values less than 99%, the IMF-2 component will be compared with that of IMF-3 during the later model training and NRT analyses.
- ii. For the coefficient of determination metric, computed using Equation (2), the data distribution for IMFs 2–4 under each leak type is more scattered across the different bins of the histogram distribution analysis, as shown in Figure 13(b). This indicates that the coefficient of determination is generally less useful than the cross-correlation to differentiate between leakage and non-leakage conditions.
Leakage frequency range
Systemwide leakage event classification results
Model training and evaluation using PIMFs
Systemwide leakage event detection and classification analyses are then performed for the historical leakage events reported between 1 August 2019 and 31 August 2020 by using 100–750 Hz range, coupled with either the IMF-2 or IMF-3 component. In both modelled WDNs, the 13-month period is first divided into 75% for model training (T) and the remaining 25% to emulate the near real-time (NRT) context for the detection analysis. Readers are referred to Table 6 which summarizes the details for the model's T and NRT analyses in the respective zones.
Detail | Zone-1 | Zone-2 |
---|---|---|
Period for model training (T) | 1 August 2019–15 May 2020 | 1 August 2019–15 May 2020 |
Period for near real-time (NRT) analysis | 16 May 2020–31 August 2020 | 16 May 2020–31 August 2020 |
Total historical leakage events (T phase) | 28 | 148 |
Total historical leakage events (NRT phase) | 10 | 50 |
Total quantityᵃ of .WAV files (2 am–4 am, T phase) | 14,051 | 16,759 |
Total quantityᵃ of .WAV files (2 am–4 am, NRT phase) | 7,683 | 8,759 |
ᵃ After accounting for missing data files and undergoing data quality assessment.
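The chronological split described above can be checked with a short sketch. It only uses the dates given in the table; the function name `split_days` is a hypothetical helper introduced here for illustration and is not part of the original study.

```python
from datetime import date, timedelta

# Dates taken from the table above; everything else is illustrative.
START = date(2019, 8, 1)
TRAIN_END = date(2020, 5, 15)   # last day of the model training (T) phase
END = date(2020, 8, 31)         # last day of the emulated NRT phase


def split_days(start, train_end, end):
    """Split the study period chronologically into (T days, NRT days)."""
    n_total = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_total)]
    train = [d for d in days if d <= train_end]
    nrt = [d for d in days if d > train_end]
    return train, nrt


train_days, nrt_days = split_days(START, TRAIN_END, END)
train_share = len(train_days) / (len(train_days) + len(nrt_days))
print(len(train_days), len(nrt_days), round(100 * train_share, 1))
# prints: 289 108 72.8
```

Counted in calendar days, the T phase covers roughly 73% of the period, which is close to the nominal 75%/25% division stated in the text. The split is strictly chronological, so no NRT-phase data can influence the training step.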
Using IMF-2 and IMF-3 with the common frequency range of 100–750 Hz, we further iterate the model parameters of , m, , and , while maintaining . On a daily basis, the model collates the detected outliers between 2 am and 4 am across all acoustic sensor stations in the respective zones. The resulting system cluster size is then represented by the sum of all detected outliers over a period that continues until either BC-1 or BC-2 is no longer met for the model's T phase, followed by checking against the AC and TGC criteria (see Figure 11).
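The temporal clustering idea can be illustrated with a minimal sketch. The BC-1, BC-2, AC and TGC criteria are defined in Figure 11 and are not reproduced here, so the sketch replaces them with a simple stand-in rule: a day extends the current cluster if the systemwide outlier count reaches `min_daily_outliers`, and a cluster is kept only if it spans at least `min_days` consecutive days. The function name, both thresholds and the sample counts are assumptions made for illustration only.

```python
from datetime import date, timedelta


def build_event_clusters(daily_outliers, min_daily_outliers=1, min_days=3):
    """Group consecutive days with enough outliers into event clusters.

    daily_outliers maps a date to the systemwide outlier count for the
    2 am-4 am window. Returns a list of (start, end, cluster_size) tuples,
    where cluster_size is the sum of outliers over the cluster period.
    """
    clusters = []
    current = None  # [start, end, size] of the cluster being grown
    for day in sorted(daily_outliers):
        count = daily_outliers[day]
        active = count >= min_daily_outliers
        contiguous = (current is not None
                      and day - current[1] == timedelta(days=1))
        if active and contiguous:
            current[1] = day          # extend the running cluster
            current[2] += count
        else:
            if current is not None:   # close the running cluster
                clusters.append(tuple(current))
            current = [day, day, count] if active else None
    if current is not None:
        clusters.append(tuple(current))
    # Stand-in for the acceptance checks: keep clusters that last long enough.
    return [c for c in clusters if (c[1] - c[0]).days + 1 >= min_days]


# Made-up daily counts, for illustration only.
counts = {
    date(2020, 2, 20): 0, date(2020, 2, 21): 2, date(2020, 2, 22): 3,
    date(2020, 2, 23): 1, date(2020, 2, 24): 0, date(2020, 2, 25): 4,
    date(2020, 2, 26): 1,
}
print(build_event_clusters(counts))
```

With these made-up counts, the three-day run from 21 to 23 February forms one cluster of size 6, while the two-day run from 25 to 26 February is discarded because it is shorter than `min_days`.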
In Zone-1 (see Table 7 and Figure 15), the values of 2.3, 10, 3, and 1 for m and the other three model parameters, respectively, using the IMF-2 component give the best balance between the model's T and NRT phases, with respective F1-scores of 74.0% and 72.3%, while limiting the maximum cluster period to 8 days (21 February 2020 to 28 February 2020). Other combinations of m values can result in higher values (80–100%), but those scenarios generally produce larger event cluster periods (≥14 days), which are not operationally practical.
When using IMF-3 component for Zone-1, the best values obtained from the model's T and NRT phases are less than 70% as summarized in Table 7, though the maximum cluster period obtained from the model's T phase is only 6 days. Hence, in terms of leakage detection performance, IMF-2 is likely to be a more conservative data feature choice.
Among the top 3 model configurations for Zone-1, the NRT analysis using 5 days is most useful to achieve higher values. This thus verifies our initial selection of days to detect the historical leakage events, and subsequently computing the average reference power for all stations to maximize the detection rate of all reported leaks during the model's T phase.
For a given combination of , , and , larger m values tend to reduce the model's capability to detect the leakage events due to reducing scores with increasing m values, as illustrated in Figures 15 and 16. The same outcome is also expected for increasing , , and/or values as the criteria for forming event clusters become more stringent, hence reducing the likelihood of leakage detection via temporal clustering.
For Zone-2, due to the larger quantity of historical leakage events reported, the model's resulting values exceed 80% (see Table 8) for the top 3 model configurations. While their performance scores may be considered optimal, the overall findings from Zone-2 may not be insightful for analysing other generic well-managed WDNs, which are likely to report a lower quantity of leakage events over time.
Rank | Phase | IMF | Optimal model parameters | No. TP events | No. FP events | Maximum cluster period (days) | Precision | Recall | F1-score |
---|---|---|---|---|---|---|---|---|---|
1 | Tᵃ | 2 | 10, 2.3, 3, 1 | 9 | 3 | 8 | 75.0% | 73.1% | 74.0% |
1 | NRTᵇ | 2 | 10, 2.3, 3, 1 | 23 | 11 | 3 | 67.6% | 60.0% | 63.6% |
1 | NRT | 2 | 10, 2.3, 3, 1 | 20 | 2 | 5 | 90.9% | 60.0% | 72.3% |
1 | NRT | 2 | 10, 2.3, 3, 1 | 14 | 0 | 7 | 100% | 50.0% | 66.7% |
1 | T | 3 | 10, 1.3, 5, 1 | 13 | 9 | 6 | 59.1% | 80.8% | 68.3% |
1 | NRT | 3 | 10, 1.3, 5, 1 | 19 | 3 | 3 | 86.4% | 70.0% | 77.3% |
1 | NRT | 3 | 10, 1.3, 5, 1 | 10 | 0 | 5 | 100% | 50.0% | 66.7% |
1 | NRT | 3 | 10, 1.3, 5, 1 | 4 | 0 | 7 | 100% | 40.0% | 57.1% |
2 | T | 2 | 3, 1.4, 5, 1 | 19 | 14 | 8 | 57.6% | 84.6% | 68.5% |
2 | NRT | 2 | 3, 1.4, 5, 1 | 13 | 6 | 3 | 68.4% | 50.0% | 57.8% |
2 | NRT | 2 | 3, 1.4, 5, 1 | 8 | 0 | 5 | 100% | 50.0% | 66.7% |
2 | NRT | 2 | 3, 1.4, 5, 1 | 3 | 0 | 7 | 100% | 30.0% | 46.2% |
2 | T | 3 | 7, 2.1, 3, 1 | 13 | 11 | 9 | 54.2% | 84.6% | 66.1% |
2 | NRT | 3 | 7, 2.1, 3, 1 | 29 | 18 | 3 | 61.7% | 80.0% | 69.7% |
2 | NRT | 3 | 7, 2.1, 3, 1 | 23 | 6 | 5 | 79.3% | 70.0% | 74.4% |
2 | NRT | 3 | 7, 2.1, 3, 1 | 16 | 1 | 7 | 94.1% | 60.0% | 73.3% |
3 | T | 2 | 5, 2.3, 3, 1 | 14 | 12 | 8 | 53.8% | 76.9% | 63.3% |
3 | NRT | 2 | 5, 2.3, 3, 1 | 23 | 11 | 3 | 67.6% | 60.0% | 63.6% |
3 | NRT | 2 | 5, 2.3, 3, 1 | 20 | 2 | 5 | 90.9% | 60.0% | 72.3% |
3 | NRT | 2 | 5, 2.3, 3, 1 | 14 | 0 | 7 | 100% | 50.0% | 66.7% |
3 | T | 3 | 7, 2.3, 3, 1 | 12 | 11 | 8 | 52.2% | 84.6% | 64.5% |
3 | NRT | 3 | 7, 2.3, 3, 1 | 29 | 18 | 3 | 61.7% | 80.0% | 69.7% |
3 | NRT | 3 | 7, 2.3, 3, 1 | 23 | 6 | 5 | 79.3% | 70.0% | 74.4% |
3 | NRT | 3 | 7, 2.3, 3, 1 | 16 | 1 | 7 | 94.1% | 60.0% | 73.3% |
ᵃ T – Model training step (28 historical leakage events).
ᵇ NRT – near real-time analysis (10 historical leakage events).
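The three percentage columns of the Zone-1 table above are numerically consistent with precision, recall and F1-score computed in the usual way from the TP and FP event counts. The sketch below reproduces two rows as a check. Recall is taken as reported, because the number of missed events is not listed in this section; the function names are hypothetical helpers.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); NaN when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def f1_score(p, r):
    """Harmonic mean of precision and recall; NaN when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else float("nan")


# (phase, TP events, FP events, reported recall) for the rank-1 IMF-2
# configuration: the T row and the NRT row with the 5-day setting.
rows = [("T", 9, 3, 0.731), ("NRT", 20, 2, 0.600)]
for phase, tp, fp, recall in rows:
    p = precision(tp, fp)
    print(phase, f"{100 * p:.1f}%", f"{100 * f1_score(p, recall):.1f}%")
# prints:
# T 75.0% 74.0%
# NRT 90.9% 72.3%
```

Both rows match the tabulated values, which supports reading the last column as the F1-score that the text uses to rank the configurations.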
Rank | Phase | IMF | Optimal model parameters | No. TP events | No. FP events | Maximum cluster period (days) | Precision | Recall | F1-score |
---|---|---|---|---|---|---|---|---|---|
1 | Tᵃ | 2 | 3, 1.2, 7, 1 | 35 | 2 | 9 | 94.6% | 81.0% | 87.3% |
1 | NRTᵇ | 2 | 3, 1.2, 7, 1 | 52 | 1 | 3 | 98.1% | 83.6% | 90.3% |
1 | NRT | 2 | 3, 1.2, 7, 1 | 39 | 0 | 5 | 100% | 83.6% | 91.1% |
1 | NRT | 2 | 3, 1.2, 7, 1 | 26 | 0 | 7 | 100% | 72.1% | 83.8% |
1 | T | 3 | 3, 3.4, 4, 1 | 31 | 3 | 6 | 91.2% | 73.0% | 81.1% |
1 | NRT | 3 | 3, 3.4, 4, 1 | 29 | 0 | 3 | 100% | 65.6% | 79.2% |
1 | NRT | 3 | 3, 3.4, 4, 1 | 17 | 0 | 5 | 100% | 54.1% | 70.2% |
1 | NRT | 3 | 3, 3.4, 4, 1 | 8 | 0 | 7 | 100% | 45.9% | 62.9% |
2 | T | 2 | 3, 1.0, 8, 1 | 31 | 1 | 9 | 96.9% | 72.3% | 82.8% |
2 | NRT | 2 | 3, 1.0, 8, 1 | 34 | 1 | 3 | 97.1% | 78.7% | 86.9% |
2 | NRT | 2 | 3, 1.0, 8, 1 | 19 | 0 | 5 | 100% | 54.1% | 70.2% |
2 | NRT | 2 | 3, 1.0, 8, 1 | 9 | 0 | 7 | 100% | 45.9% | 62.9% |
2 | T | 3 | 5, 3.2, 4, 1 | 31 | 3 | 6 | 91.2% | 72.3% | 80.6% |
2 | NRT | 3 | 5, 3.2, 4, 1 | 33 | 1 | 3 | 97.1% | 82.0% | 88.9% |
2 | NRT | 3 | 5, 3.2, 4, 1 | 18 | 0 | 5 | 100% | 65.6% | 79.2% |
2 | NRT | 3 | 5, 3.2, 4, 1 | 8 | 0 | 7 | 100% | 45.9% | 62.9% |
3 | T | 2 | 3, 1.3, 7, 1 | 35 | 1 | 8 | 94.6% | 73.0% | 82.4% |
3 | NRT | 2 | 3, 1.3, 7, 1 | 42 | 1 | 3 | 97.7% | 78.7% | 87.2% |
3 | NRT | 2 | 3, 1.3, 7, 1 | 27 | 0 | 5 | 100% | 73.8% | 84.9% |
3 | NRT | 2 | 3, 1.3, 7, 1 | 16 | 0 | 7 | 100% | 49.2% | 65.9% |
3 | T | 3 | 5, 3.4, 4, 1 | 30 | 3 | 6 | 90.9% | 72.3% | 80.5% |
3 | NRT | 3 | 5, 3.4, 4, 1 | 29 | 0 | 3 | 100% | 65.6% | 79.2% |
3 | NRT | 3 | 5, 3.4, 4, 1 | 17 | 0 | 5 | 100% | 54.1% | 70.2% |
3 | NRT | 3 | 5, 3.4, 4, 1 | 8 | 0 | 7 | 100% | 45.9% | 62.9% |
ᵃ T – Model training step (148 historical leakage events).
ᵇ NRT – near real-time analysis (50 historical leakage events).
Model performance comparison using different acoustic power data features
The model performances are subsequently compared for the respective zones using the different data features proposed. Tables 9 and 10 summarize the optimized model configurations which can attain the highest possible balance for the computed values from the combined T and NRT phases, while attaining a reasonable size in the model's T phase. Note that for all model runs is again kept at 24 h. In each of the zones, we compare the detection results obtained from using different power features, where the following observations can be derived:
with IMF-2 component can detect leakage events with more than 70.0% score, especially for Zone-1. In Table 9 for Zone-1, the resulting values from the NRT analysis using and , even with their optimal model configuration(s), are lower than that of .
For Zone-2, all power features result in optimal values for the model's T and NRT phases, as shown in Table 10. However, for the same reasons cited previously, the present results are unlikely to provide useful insights for well-managed WDNs.
The average lead-times for detecting the reported leakage events across all model configurations in Zone-1 and Zone-2 are 0.9–1.0 days and 0.6 days, respectively, prior to the actual leakage events reported by PUB's current technique in practice. At the same time, the average lead-times also appear to be closer to one another when using for both zones, as compared with that of and .
Power feature | Phase | Optimal model parameters | No. TP events | No. FP events | Maximum cluster period (days) | Precision | Recall | F1-score | Avg. lead-time (days) |
---|---|---|---|---|---|---|---|---|---|
 | Tᵃ | 10, 2.3, 3, 1 | 9 | 3 | 8 | 75.0% | 73.1% | 74.0% | 0.9 |
 | NRTᵇ | 10, 2.3, 3, 1 | 20 | 2 | 5 | 90.9% | 60.0% | 72.3% | 0.7 |
 | T | 10, 1.0, 13, 1 | 12 | 6 | 6 | 66.7% | 76.9% | 71.4% | 1.1 |
 | NRT | 10, 1.0, 13, 1 | 0 | 0 | 3, 5, 7 | NA | 0.0% | NA | NA |
 | T | 7, 1.4, 3, 1 | 11 | 20 | 7 | 35.5% | 65.4% | 46.0% | 1.2 |
 | NRT | 7, 1.4, 3, 1 | 33 | 20 | 7 | 62.3% | 70.0% | 65.9% | 0.8 |
ᵃ T – Model training step (28 historical leakage events).
ᵇ NRT – near real-time analysis (10 historical leakage events).
Power feature | Phase | Optimal model parameters | No. TP events | No. FP events | Maximum cluster period (days) | Precision | Recall | F1-score | Avg. lead-time (days) |
---|---|---|---|---|---|---|---|---|---|
 | Tᵃ | 3, 1.2, 7, 1 | 35 | 2 | 9 | 94.6% | 81.0% | 87.3% | 0.8 |
 | NRTᵇ | 3, 1.2, 7, 1 | 39 | 0 | 5 | 100% | 83.6% | 91.1% | 0.8 |
 | T | 3, 1.6, 7, 1 | 39 | 4 | 9 | 90.7% | 91.2% | 91.0% | 1.2 |
 | NRT | 3, 1.6, 7, 1 | 16 | 0 | 3 | 100% | 49.2% | 65.9% | 0.4 |
 | T | 3, 3.8, 2, 1 | 38 | 2 | 9 | 95.0% | 83.9% | 89.1% | 0.8 |
 | NRT | 3, 3.8, 2, 1 | 56 | 1 | 3 | 98.2% | 93.4% | 95.8% | 0.1 |
ᵃ T – Model training step (148 historical leakage events).
ᵇ NRT – near real-time analysis (50 historical leakage events).
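As a rough illustration of how an average lead-time such as those tabulated above could be computed, the sketch below assumes that the lead-time of an event is the number of days between the first day the model flags its cluster and the date the leak was reported. That definition, the function name and the sample dates are assumptions made here; the exact bookkeeping used in the study is not stated in this section.

```python
from datetime import date


def average_lead_time(events):
    """Mean lead-time in days over (first_detection_date, reported_date) pairs."""
    leads = [(reported - detected).days for detected, reported in events]
    return sum(leads) / len(leads) if leads else float("nan")


# Made-up detection and report dates, for illustration only.
sample_events = [
    (date(2020, 6, 1), date(2020, 6, 2)),    # detected 1 day before the report
    (date(2020, 6, 10), date(2020, 6, 10)),  # detected on the reporting day
    (date(2020, 7, 4), date(2020, 7, 6)),    # detected 2 days before the report
]
print(average_lead_time(sample_events))  # prints: 1.0
```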
DISCUSSIONS
The summarized detection results in Tables 9 and 10 highlight the level of effectiveness in our proposed acoustic feature-based approach to detect hidden leak events in both Zone-1 (industrial activities) and Zone-2 (mixture of commercial and domestic activities) with an average 70% F1-score and 1-day lead-time. To ensure that the approach can generally be extended to other WDNs having different operational configurations, we bring forth two major discussions as follows:
Overall applicability of the proposed model approach: In the context of near real-time analysis, the proposed acoustic feature-based approach can be applied by operators to detect and classify hidden leaks in the modelled WDN, even under the constraints of sparse acoustic sensors deployment. For example, on a daily basis by using the proposed protocol in Figure 11, operators will first initiate the outlier detection analysis during MNF hours (2 am–4 am) by using acoustics data at the individual station level and accumulate the total detected outliers across all detected acoustic sensors to form a single systemwide event cluster that grows over time. If the same event cluster lasts for a minimum number of consecutive days (e.g., 3 days), as demonstrated in Figures 15 and 16 for the respective zones, then the field maintenance team may proceed to investigate the likely location(s) of the detected hidden leaks in the system by searching the pipes near to the sensor station(s) which detect the outliers over the same number of consecutive days. In addition, we note that the adopted assumptions for the average reference acoustic power (e.g., minimum 30% normalized power) and acoustic frequency range (e.g., 100–750 Hz) in our current study are generally tunable by the operators, as based on their level of quantitative understanding of the modelled system's operational complexity.
Overall generality of the proposed model approach: The proposed feature-based approach should also work in all contexts having deployed hydrophone stations in the underground pipes. Due to the direct submersion of the hydrophones into the flowing waters within the underground pipes, the generated acoustic signals are expected to be independent of the pipe material that traditionally can affect the speed of sound travelling through the material. Hence, the focus is to iterate on the optimal frequency range, independent of the type of pipe material, that can best benefit the leakage detection analysis for any modelled WDN. As for pipe flow, that can be associated with the pipe diameter, we have quantitatively demonstrated, from our case study analyses, the general usefulness of using recorded acoustic signals from MNF hours (2 am–4 am) to optimize, to the best possible extent, the leak detection results. Importantly, we again note that our proposed method is generic and flexible which enables modellers to fine tune the different model parameters (e.g., reference acoustic power, leakage frequency range) for any given acoustic dataset from the modelled system to ensure adequately accurate leak detection results during the model training phase.
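To make the tunable frequency range more concrete, the following sketch estimates the share of a signal's spectral power that falls within a chosen band, using a plain discrete Fourier transform. It is a deliberate simplification: it does not perform the IMF decomposition used in the study, and the function names, the synthetic test tones and the default band limits passed as `f_lo` and `f_hi` are illustrative assumptions.

```python
import cmath
import math


def band_power_fraction(samples, fs, f_lo=100.0, f_hi=750.0):
    """Fraction of non-DC spectral power inside [f_lo, f_hi] Hz (plain DFT)."""
    n = len(samples)
    in_band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):       # positive-frequency bins, DC skipped
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if f_lo <= freq <= f_hi:
            in_band += power
    return in_band / total if total else 0.0


def tone(freq, fs=2048, n=256):
    """Synthetic sine wave used only to exercise the function."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


# A 304 Hz tone lies inside the 100-750 Hz band; a 48 Hz tone lies outside it.
print(band_power_fraction(tone(304), fs=2048) > 0.99)
print(band_power_fraction(tone(48), fs=2048) < 0.01)
```

Changing `f_lo` and `f_hi` is the kind of adjustment an operator might make when tuning the band to a different network. The direct DFT is quadratic in the number of samples, so an FFT routine would be the practical choice for full-length recordings.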
CONCLUSIONS
This paper presents a generalized acoustic feature-based approach to detect and classify leakage events in large-scale WDNs having permanently installed acoustic sensors in sparse locations of the large networks. The proposed approach consists of a series of systematic analytical components, namely: (1) data quality assessment; (2) data decomposition; (3) features identification; (4) model training for leakage event detection and classification; and (5) near real-time leakage detection.
In collaboration with PUB, Singapore's National Water Agency, the proposed approach is verified with large-scale WDNs in Singapore having around 1,000 km of underground water pipelines installed with 74 permanently installed hydrophones in sparse locations of the large networks. Multiple historical leakage events were found to be situated close to independent hydrophones within 450 m pipeline distance, where a common leakage frequency range of 100–750 Hz embedded within the IMF-2 data feature is found to best represent the dominant leakage acoustic characteristics in the modelled WDNs. At the subsequent systemwide leakage event classification step, our proposed temporal-based clustering analysis can detect and classify reported leakage events with an overall F1-score of more than 70%, while emulating the near real-time (NRT) analysis with an average leakage detection lead-time of 0.5 days.
ACKNOWLEDGEMENTS
This research is supported by the Singapore National Research Foundation under its Competitive Research Programme (CRP) (Water) and administered by PUB (PUB-1804-0087), Singapore’s National Water Agency.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.
REFERENCES
Author notes
Currently Software Engineer at UBS Singapore.