Abstract
Water loss from leaking pipes represents a substantial loss of revenue as well as environmental and public health concerns. Leak location is normally identified by placing sensors either side of the leak and recording and analysing the leak noise. The leak noise contains information about the leak's characteristics, including its shape. Whilst a tool which non-invasively provides information about a leak's shape from the leak noise would be useful for water industry practitioners, no tool currently exists. This study evaluates the effect of various leak shapes on the vibration signal and presents a unique methodology for predicting the leak shape from the vibration signal. An innovative signal processing technique which utilises the machine learning method random forest classifiers is used in combination with a number of signal features in order to develop a leak shape prediction algorithm. The results demonstrate a robust methodology for predicting leak shape at several leak flow rates within several backfill types, providing a useful tool for water companies to assess leak repair based on leak shape.
INTRODUCTION


Although pressure management provides a useful technique for reducing leakage levels, the only way to completely remove water losses occurring due to the presence of leaks is by locating and repairing leaks. A common method to do this is through leak noise correlation (Puust et al. 2010). As water discharges from a leak, turbulence around the leak hole is created which transmits a signal in the form of vibration and acoustic waves, along the pipe wall and through the fluid (Papastefanou 2011). This is known as a leaks vibro-acoustic emission (VAE) signal. Sensors (usually accelerometers or hydrophones) are placed either side of the leak, recording and analysing the leak's VAE. The signals are then cross-correlated in order to identify the location of the leak. It has been noted that a number of factors influence a leak signal including the pipe material, backfill, flow rate and leak shape (Hunaidi & Chu 1999; Muggleton & Brennan 2004; Pal 2008; Butterfield et al. 2017a, 2017b).
Leaks commonly studied in pipelines in terms of VAE signals are normally one of three types: round holes (Wassaf et al. 1985; Pal 2008; Butterfield et al. 2016a, 2016b, 2017a, 2017b); artificial leaks from fire hydrants (Almeida et al. 2014; Gao et al. 2015); and slits (Wassaf et al. 1985; Pal 2008). However, a wider variety of leak shapes can occur in WDS, including pin holes, joint leaks, circumferential slits and longitudinal slits (UKWIR 2008), amongst others. Although there is a large variety of leak shapes in pipelines, which is likely to influence the accuracy of leak noise correlation, there has been limited study investigating the effects of the leak shape on the VAE signal. Brunner & Barbezat (2007) simulated round holes of different diameters and subsequently found differences between the power spectra of different leak sizes. Wassaf et al. (1985) compared the acoustic emission responses of circular holes and rectangular slits, showing that the shape influenced the signals frequency spectrum. The use of standpipes to create artificial leaks (Hunaidi & Chu 1999; Ferrante et al. 2011; Butterfield et al. 2017a, 2017b) is also common in the literature, but these are not representative of real leaks in WDS. Pal (2008) compared the signals of artificial leaks created by fire hydrants, with joint leaks made by loosening the nuts on a flange plate and split leaks. They found that the frequency spectrum of a leak was strongly governed by its shape and size. However, the leak flow rates were not controlled in this study and therefore were different for each leak shape. As the leak flow rate has been shown to have such a strong influence on the leak signal (Butterfield et al. 2017a, 2017b), any assessment of a leak shape requires a good experimental methodology which controls the leak flow rate between shapes and thus isolating the effect of leak shape on the leak signal.
Although the aforementioned studies provide a useful insight into the effect of leak shapes on the VAE signal, the leak shapes studied are uncommon on plastic pipes in real WDS (Water Services Association of Australia 2012) and therefore have limited representation of real leaks in real WDS. The majority of leaks in plastic pipes actually occur due to leaky joints (Water Services Association of Australia 2012; Tayefi 2014), which is typically due to contamination of the joint when the pipe is being installed. This is especially true with electrofusion and butt fusion joints (Tayefi 2014). However, there have been no studies investigating the VAE signal produced by leaky electrofusion joints in laboratory conditions. This highlights a significant research gap as leaks representative of ‘real leaks’ in plastic pipe are seldom compared and there has been little comparison of leak shapes.
A study by Sun et al. (2016) analysed the effect of varying leak areas for round holes in gas pipes. Their study went one step further showing that the area of the hole could be predicted from the VAE signal in combination with Support Vector Machine (SVM). Although they only investigated round holes, the classification of a leak shape is a useful tool. The ability to distinguish between leak shapes would allow prioritisation of leak repair by fixing those leaks more likely to grow. Despite the potential shown in gas pipes by Sun et al. (2016), a method to identify leak shapes with VAE in WDS does not currently exist. In this paper a novel medium density polyethylene (MDPE) pipe rig was developed and used to simulate various leak shapes at different sizes and flow rates. A comparison of the leak signals is made and then a methodology for predicting the leak shape from only the VAE signal in combination with a random forest machine learning model is presented. This paper therefore presents a new method for predicting leak shape in WDS from vibration signals.
LEAK SHAPE PREDICTION
Due to the wide variety of variables influencing a leak signal, a machine learning approach to leak shape prediction may provide the most robust results. Any model relies on the derivation of a number of signal features which describe the leak and therefore enable the prediction of leak shape. Leak VAE signals have been shown to differ in both time and frequency domains (Ahadi & Bakhtiar 2010; Butterfield et al. 2016a, 2016b) and therefore time-frequency based features could provide useful features. Feature extraction methods such as the wavelet transform and empirical mode decomposition (EMD) provide good time-frequency resolution. EMD derives the intrinsic mode functions (IMFs) through the following procedure:
- 1.
Locate signal extrema
.
- 2.
Calculate upper and low envelope connecting the minima (cf. minima) and maxima (cf. maxima),
(cf.
by interpolating (spline interpolation).
- 3.
Calculate mean between lower and upper envelopes,
.
- 4.
Subtract mean obtaining modulated oscillation [24],
.
- 5.
Apply stopping criteria (Mandic et al. 2013). If
satisfies stopping criteria, let
become
.
- 6.
Subtract the new IMF from signal
, so
.
- 7.
Sift until IMF calculated in step 5 becomes a monotonic function.
where indicates Gaussian noise and
represents the leak signal.
The RMS of the raw time-domain signal was described by Butterfield et al. (2017a, 2017b) to correlate well with increasing leak flow rate. Shannon entropy following local mean decomposition was used by Sheng et al. (2016) to predict bearing condition and Sun et al. (2016) used Shannon entropy of individual IMFs in order to predict leak aperture in gas pipes. The maximum and mean dB of a signal's power spectral density was used by Chen et al. (2007) to describe leak flow rate in gas pipes. Prime & Shevitz (1996) found that the fundamental frequency varied due to cracks in beams. Spectral shape methods such as kurtosis, skewness (Kakur & Jurecka n.d.) and Crest factor (Pachoud et al. 2009) as well as signal descriptors such as the standard deviation and signal power (Picone 1993) have been used in speech and audio recognition/detection type problems and have shown to be useful features. A similar set of features was used by Butterfield et al. (2017a, 2017b) for the quantification of leak flow rates in plastic pipe.
Crucial to the task of predicting leak shape is use of pattern recognition algorithms. Random forest (models) have been shown encouraging results in identifying patterns in speech (Su et al. 2007) and a similar approach may recognise patterns in leak VAE signals. Random forest (RF) models are a machine learning method that can be considered as an ensemble of many decision trees, where each decision tree is trained to optimally split the data into separate classes (Breiman 2001). Each decision tree is provided with a random subset of the data for training and identifies the best split between classes based on a small subset of features. Therefore each individual decision tree alone is a weak classifier, however good performance, scalability and generalisation can be obtained by combining all decision trees in the ensemble. Class prediction on unseen test data is then obtained from a majority vote from the ensemble. A probability can also be calculated of how certain the model is in the prediction.
METHODOLOGY
Experimental setup
The state-of-the-art LiVE (leaks in viscoelastics) pipe rig at the University of Sheffield, UK is used in this study. The rig consists of a 26 m long MDPE, 63 mm diameter pipe loop. A schematic of the pipe rig is shown in Figure 1. Water is supplied to the pipe rig using a 3.5 kW variable speed pump (Wilo, Burton-on-Trent, UK) from an upstream reservoir (0.95 m3 by volume). Water passes a magnetic flow meter (Flow Systems 91DE) recording system flow rate. System pressure is measured with two pressure sensors (Gems Plainville 2200) located upstream and downstream of the leak, recording at a sample rate of 2,000 Hz.
Schematic of the pipe rig (not to scale). Adapted from Butterfield et al. (2016a, 2016b).
Schematic of the pipe rig (not to scale). Adapted from Butterfield et al. (2016a, 2016b).
A 5.5 m long ‘test section’ is located in the middle of the pipe rig (indicated between points e and g in Figure 1). This section of pipe is removable at two flange plates and was used in order to create leaks of different shapes and sizes. Round holes measuring various diameters and longitudinal slits were drilled using standard drill bits. Two leaky electrofusion joints of different sizes were created by excavating a small void from the pipe wall before welding two pipe sections together. This ensured that a gap measuring the size of the void was present and water could discharge through the remaining hole. A schematic is shown in Figure 2 in order to illustrate this process. The leak shapes were all sized in order to have equivalent leak areas of ∼10, ∼16, ∼24 and ∼32 mm2. Details of the exact dimensions of the leak shapes investigated are shown in Table 1 along with the total number of simulations per leak flow rate. Ranging in area size, a total of four round holes, four longitudinal slits and two leaky electrofusion joints were tested. Each leak shape was drilled into a separate length of straight pipe (measuring 5.5 m in length) and inserted into the ‘test section’. Photographs of the leak shapes are shown in Figure 3.
Leak shape and sizes
Area (mm2) . | Flow (L/min) . | # of Slit samples . | # of Round samples . | # of Electro samples . | Total samplesa . |
---|---|---|---|---|---|
9.62–10 | 39 | 20 | 20 | 0 | 40 |
44 | 20 | 20 | 0 | 40 | |
47 | 20 | 20 | 0 | 40 | |
49 | 20 | 20 | 0 | 40 | |
56 | 0 | 0 | 0 | 0 | |
15.9–16 | 39 | 20 | 20 | 20 | 60 |
44 | 20 | 20 | 20 | 60 | |
47 | 20 | 20 | 20 | 60 | |
49 | 20 | 20 | 20 | 60 | |
56 | 20 | 20 | 20 | 60 | |
23.76–24 | 39 | 20 | 20 | 0 | 40 |
44 | 20 | 20 | 0 | 40 | |
47 | 20 | 20 | 0 | 40 | |
49 | 20 | 20 | 0 | 40 | |
56 | 20 | 20 | 0 | 40 | |
32–33.18 | 39 | 20 | 20 | 20 | 60 |
44 | 20 | 20 | 20 | 60 | |
47 | 20 | 20 | 20 | 60 | |
49 | 20 | 20 | 20 | 60 | |
56 | 20 | 20 | 20 | 60 | |
Totals | 380 | 380 | 200 | 960 |
Area (mm2) . | Flow (L/min) . | # of Slit samples . | # of Round samples . | # of Electro samples . | Total samplesa . |
---|---|---|---|---|---|
9.62–10 | 39 | 20 | 20 | 0 | 40 |
44 | 20 | 20 | 0 | 40 | |
47 | 20 | 20 | 0 | 40 | |
49 | 20 | 20 | 0 | 40 | |
56 | 0 | 0 | 0 | 0 | |
15.9–16 | 39 | 20 | 20 | 20 | 60 |
44 | 20 | 20 | 20 | 60 | |
47 | 20 | 20 | 20 | 60 | |
49 | 20 | 20 | 20 | 60 | |
56 | 20 | 20 | 20 | 60 | |
23.76–24 | 39 | 20 | 20 | 0 | 40 |
44 | 20 | 20 | 0 | 40 | |
47 | 20 | 20 | 0 | 40 | |
49 | 20 | 20 | 0 | 40 | |
56 | 20 | 20 | 0 | 40 | |
32–33.18 | 39 | 20 | 20 | 20 | 60 |
44 | 20 | 20 | 20 | 60 | |
47 | 20 | 20 | 20 | 60 | |
49 | 20 | 20 | 20 | 60 | |
56 | 20 | 20 | 20 | 60 | |
Totals | 380 | 380 | 200 | 960 |
aThe number of samples for all leak shapes.
Demonstration of the void created before welding the electrofusion joint.
Photographs of leaks: (a) round hole; (b) longitudinal slit; (c) electrofusion joint.
Photographs of leaks: (a) round hole; (b) longitudinal slit; (c) electrofusion joint.
The leaks discharged into a 0.5 × 0.5 × 0.5 m cubic box. In order to simulate the influence of backfill type on the leak signal, the backfill type was changed for each test. The cubic box was filled with 5–12 mm diameter pea gravel backfill in accordance with British Standards for backfill of plastic pipe (BSI 1973) and therefore represents a standard external porous media. The alternative backfill types were geotextile fabric and completely submerging the pipe in water. The geotextile fabric was also used by Fox (2016) and represents a constrained external porous media. Photographs of the external media types are shown in Figure 4. Five different leak flow rates (approximately 39–40, 44–45, 47–48, 49–51 and 56–57 L/min were used in the simulations. These were kept the same across all leak shapes and area sizes by controlling system pressure with the downstream gate valve. Therefore, each leak shape is assessed on three backfill types at five different leak flow rates. The pump was run continuously whilst acquiring data for each simulation, but was turned off when replacing the test section and varying backfill. The wave speed in the pipe rig is estimated to be 347 m/s using theoretical calculations (Almeida et al. 2014).
Photographs of leaks in different backfill: (a) gravel media; (b) submerged; (c) geotextile fabric.
Photographs of leaks in different backfill: (a) gravel media; (b) submerged; (c) geotextile fabric.
Signal processing
The leak's VAE signal was recorded at 2,500 Hz using an accelerometer (PCB Piezotronics 393B12, sensitivity 10 V/g) placed approximately 30 cm away from the leak. This sensor was powered by a current source power unit (Dytran Instruments type 4102C). Signals were digitised using a data acquisition unit (National Instruments cDAQ) and imported into Matlab. Signals were then pre-processed with a 4th Order Butterworth bandpass filter set at 10–1,000 Hz. Each signal sample was measured with the accelerometer for 5 seconds.
Due to physical phenomena within the pipe rig, 20 samples from each simulated leak shape, area size and leak flow rate were taken to examine the variation between samples. It was found that there was little variation between samples, suggesting repeatable results. Table 1 shows the aggregate of the number of samples and simulations carried out over all the flow rates. The smallest of the area sizes for each leak shape has fewer simulations as they were too small for the highest flow rate of 56–57 L/min to be achieved. Table 2 lists the features extracted from each leak VAE signal used in this paper.
Derived features from the VAE signal
Feature no. . | Name . |
---|---|
1–6 | RMS of IMFs1-6 |
7–12 | Shannon entropy of IMFs 1–6 |
13 | Shannon entropy of whole signal |
14 | RMS of whole signal |
15 | Mean dB of power spectral density |
16 | Maximum dB of power spectral density |
17 | Minimum dB of power spectral density |
18 | Standard deviation |
19 | Signal power |
20 | Fundamental frequency |
21 | Spectral flux |
22 | Kurtosis |
23 | Skewness |
24 | Crest factor |
Feature no. . | Name . |
---|---|
1–6 | RMS of IMFs1-6 |
7–12 | Shannon entropy of IMFs 1–6 |
13 | Shannon entropy of whole signal |
14 | RMS of whole signal |
15 | Mean dB of power spectral density |
16 | Maximum dB of power spectral density |
17 | Minimum dB of power spectral density |
18 | Standard deviation |
19 | Signal power |
20 | Fundamental frequency |
21 | Spectral flux |
22 | Kurtosis |
23 | Skewness |
24 | Crest factor |
The RF in this paper consists of 1,000 decision trees and the entropy splitting criterion was used to measure the quality of a split. The number of features considered when looking for the best decision tree split and the maximum depth of each decision tree was determined by hyperparameter tuning using 5-fold cross-validation on the training data.
Another five-fold cross-validation was used for splitting the training and test datasets, also known as nested cross-validation. In this outer five-fold cross-validation data are split into five equally sized sections and the model is trained on four sections (i.e. 80% of the data) and tested on the remainder section (i.e. 20%). Four additional identical models are each independently trained using a sum of four sections worth of data but each model is tested using a different remaining final section. This way all the data were used in training and testing but no individual model is trained on its testing data. The average test accuracy of all five model results in an accuracy score that is almost completely unbiased of how the data were split up into training and testing sets (Varma & Simon 2006). A process flow diagram of the whole experimental, signal processing and machine learning programme is shown in Figure 5.
RESULTS
Characteristics of leaks with different shapes
VAE signals from the three different leak shapes of equivalent leak area are plotted in Figure 6 at ∼40 L/min in the frequency domain as a representation of the ratio of leak:no-leak in order to fully demonstrate the contribution of the leak shape to the leak signal. The contribution of the leak noise to the received signal is at frequencies >63 Hz for all leak shapes. The leak signal has a wide spectral range which differs depending on leak shape. The highest amplitude signals appear between 250 and 570 Hz for all leak shapes. Between frequencies 200 and 300 Hz, leak signals are very similar for all leak shapes. The round holes tended to have a slightly wider spectral range compared to the other two leak shapes. The electrofusion joint had the lowest amplitude signals, whereas the round holes were consistently the highest amplitude at frequencies >250 Hz. The round hole and longitudinal slit observed a steady decline in amplitude at frequencies >570 Hz, although this was more rapid for the longitudinal slit. The electrofusion joint, however, decreases amplitude rapidly at frequencies >430 Hz. The electrofusion joint and longitudinal slit actually became markedly similar at frequencies of 560 Hz.
Ratio of leak:no-leak for comparing different leak shapes at ∼40 L/min leak flow rate.
Ratio of leak:no-leak for comparing different leak shapes at ∼40 L/min leak flow rate.
Feature extraction of leak signals
The signal processing method involved the use of 24 different features which were extracted from the signal in time and frequency domains. All leak shapes and sizes were decomposed via EEMD generating individual IMFs. These IMFs were transposed in to the frequency domain via the Fourier transform for comparative purposes and the frequency spectrums of the first six IMFs are presented in Figure 7.
Comparison of leak shapes at individual IMFs following EEMD. All leak flow rates are set between 47 and 49 L/min and equivalent hole area is 32–33.18 mm2.
Comparison of leak shapes at individual IMFs following EEMD. All leak flow rates are set between 47 and 49 L/min and equivalent hole area is 32–33.18 mm2.
All IMFs represent signals of low frequency (<800 Hz). The highest frequency signals within this range are located within the first IMF (IMF1), but this was distinctly low amplitude. Frequency decreases as IMF number increases. The comparison between leak shapes reveals differing frequency distributions across different IMFs depending on leak shape. IMF1 is mainly dominated by signals from the round hole which has the widest spectral band of all the IMFs, the electrofusion joint is largely within IMFs 2, 3 and 4 whilst the longitudinal slit is dominant in IMF5 and 6. The electrofusion joint appears to have power in IMF2, 3 and 4 whilst the longitudinal slit has more power in IMF5. It is therefore possible that an analysis of the IMFs could provide information on the leak shape.
Classifying leak shape
Shape classification
Only the 24 features derived from the accelerometer signal were used as inputs to each model, e.g. the models were not told the leak shape or the leak flow rate. All leak shapes were initially divided up by their leak area, creating five separate datasets: 10, 16, 24 and 33 mm2; and All (all leak areas used and the model not told the leak area size). As the leak shape corresponds to a matching leak area of another shape, the effect of leak shape is isolated. As electrofusion joint data were only available for datasets of leak areas 16 and 33 mm2, the datasets 10 and 24 mm2 contained fewer input data.
The performances of the models in classifying leak shape by individual areas is shown in the form of confusion matrixes in Figure 8. The confusion matrices are further divided by leak area size, the 10 mm2 dataset (Figure 8(a)), 16 mm2 dataset (Figure 8(b)), 24 mm2 dataset (Figure 8(c)), 33 mm2 dataset (Figure 8(d)) and All leak area sizes (Figure 8(e)). The model was able to classify leak shape for all areas at all leak flow rates within all backfill types to >81%. Generally, it appeared that classification accuracy was notably higher for round holes compared to the other leak shapes. In the case of the 24 mm2, the model correctly classified 99% accuracy. An investigation into the ‘All’ dataset (Figure 8(e)) suggests that the round holes also achieved the highest prediction accuracy at 90%, followed by the electrofusion joint at 81%. The worst performance in this dataset was the longitudinal slit with a classification accuracy of 75%.
Figure 8(e) demonstrates the average classification accuracy for each leak area, each leak shape and each leak flow rate when using the ‘All’ dataset. Note, this is not the averaged-output of the model in Figure 8, it is the individual subset results for the model using the ‘All’ dataset. An investigation into the performance of the model for different leak areas within the ‘All’ dataset shows increased average classification accuracy with the 24 mm2 dataset (98%) (Table 3). However, this may be due to a smaller dataset as electrofusion joints are not included. Despite the fact that there were five different leak flow rates studied, the model was able to classify the leak shape independent of leak flow rate at >82% accuracy for all leak shapes at all leak areas.
Classification accuracies of the ‘All’ dataset
Dataset (mm2) . | Training classification accuracy (%) . | Testing classification accuracy (%) . |
---|---|---|
10 | 93 | 87 |
16 | 94 | 90 |
24 | 100 | 98 |
33 | 97 | 90 |
All | 91 | 82 |
Dataset (mm2) . | Training classification accuracy (%) . | Testing classification accuracy (%) . |
---|---|---|
10 | 93 | 87 |
16 | 94 | 90 |
24 | 100 | 98 |
33 | 97 | 90 |
All | 91 | 82 |
Figure 9 demonstrates the breakdown of the performance of the RF model using the ‘All’ dataset for each leak shape by leak area (Figure 9(a)), leak flow rate (Figure 9(b)) and backfill type (Figure 9(c)). For all leak areas studied, shape classification accuracy is greatest for round holes (>90% accuracy) (Figure 9(a)). The model performs well at 10 and 24 mm2 (>85%), but again this may be due to the fact that these leak areas do not include the electrofusion joint data and therefore there is a smaller dataset. The breakdown of individual leak flow rates for the ‘All’ dataset shows that there is no observable trend between leak flow rate and shape prediction accuracy (Figure 9(b)). However, the round holes tended to have greater consistency in prediction accuracy at all flow rates. The electrofusion joint performed comparably to the round hole at the lower flow rates, whilst accuracy dropped to <68% at the higher leak flow rates. Generally, prediction of slits shape was similar at all leak flow rates, between 68% and 78%. The breakdown of the performance of the models within individual backfill types shows that at the lowest leak flow rates, classification of leak shape performed better on the gravel media (Figure 9(c)). However, prediction accuracy was improved under geotextile fabric at the highest leak flow rates. The submerged backfill tended to perform worst at the mid to high range leak flow rates.
Classification accuracy by different subsets: (a) leak shape by shape area; (b) leak shape by leakage rate; (c) leakage rate by media type.
Classification accuracy by different subsets: (a) leak shape by shape area; (b) leak shape by leakage rate; (c) leakage rate by media type.
Feature importance
Figure 10 ranks feature importance of the RF model by ranking the use of the 24 different features input into the model. This is broken down by the ‘All’ dataset into individual leak areas and all leak areas. It was found that the most important feature to the model was the RMS of IMF1. This was true no matter what the leak area and the leaks shape. The model also found other features useful, however the degree to which the model found these features important depended on whether the model was predicting based on individual leak areas of the ‘All’ dataset.
Effect of different backfill types on classification accuracy
In real WDS, a variety of backfill types exist and the backfill has been shown to influence the leak signal. The effect of backfill type on model performance was evaluated by training and testing on individual backfill types rather than all backfill types. The results for this are shown in Table 4. Evidently, backfill type has a large impact on the performance of leak shape prediction. Overall, training on only one type of backfill and testing the model on a separate backfill type had a largely negative impact on model performance. The worst performance appeared to be training the model on gravel but then testing on submerged data. Therefore, either backfill type should be known or the model needs to be trained and tested on more backfill types.
Model accuracy on non-trained media
Trained on . | Tested on . | Testing accuracy (%) . |
---|---|---|
Gravel only | Geotextile | 41 |
Submerged | 21 | |
Geotextile only | Gravel | 36 |
Submerged | 32 | |
Submerged only | Gravel | 25 |
Geotextile | 46 |
Trained on . | Tested on . | Testing accuracy (%) . |
---|---|---|
Gravel only | Geotextile | 41 |
Submerged | 21 | |
Geotextile only | Gravel | 36 |
Submerged | 32 | |
Submerged only | Gravel | 25 |
Geotextile | 46 |
DISCUSSION
Influence of leak shape on leak signal characteristics
Leak shape was found to influence leak signal amplitude and frequency when the leak flow rate was isolated by controlling system pressure (Figure 6). Although there has been limited investigation in the literature, this study is coherent with existing literature study that the leak shape influences the leak signal (Pal 2008). However, across the whole shape spectrum (Figure 6), it is difficult to separate leak shapes based on frequency alone although this may be possible if just comparing electrofusion joints and round holes. Differences in the spectrum of leak shape were also identified when the signal was divided into different frequency bands using EEMD, where the power of the signal in each IMF differed depending on each leak shape (Figure 7). This appears to suggest that leak signals differ in both time and frequency domains, also shown by previous authors (Ahadi & Bakhtiar 2010; Butterfield et al. 2016a, 2016b). Differing signal spectra due to changes in leak shape may be due to varying jet angles (Ferrante et al. 2013) as the leak discharges the hole. In turn, this will likely create varying turbulence regimes around the leak hole specific for that leak shape, where the signal is created (Papastefanou 2011). Therefore, this study has experimentally determined that leak shape is another key variable in determining leakage behaviour in addition to leak flow rate and leak area (Cassa & Van Zyl 2011; Ferrante 2012) and these results can better inform the design of leak noise correlators.
Model performance and feature importance
The model presented herein demonstrates that it is possible to predict leak shape to high classification accuracy at all the leak flow rates and within all the backfill types studied (>80%). The use of this model provides practitioners with a tool to predict leak shape. Due to the fact that certain leak shapes have time- and pressure-dependent growth (Ferrante 2012; Fox 2016), knowledge of the leak shape will allow for prioritisation of leak repair. Moreover, the tool provides an opportunity for water companies to collect more data about the shapes of leaks, and thus this information can be linked to further parameters (such as pipe failure mode).
The RMS of IMF1 was found to be the most important feature when classifying leak shape (Figure 10). The Fourier spectrum of this feature demonstrates that the amplitude within this IMF is dependent on leak shape (Figure 7). Further investigation into this feature is demonstrated in Figure 11, showing the individual RMS of IMF1 for each leak shape and leak area. For all leak shapes, it appears generally possible to distinguish between all shapes (of all leak area) individually using this feature. However, in some cases it becomes difficult due to overlap between leak shapes, especially for the electrofusion joint at 32 mm2. This highlights the necessity for a machine learning based tool to provide the best separation between signal features. Leak signal RMS has previously been shown to correlate well with an increase leak flow rate (Chen et al. 2007; Kaewwaewnoi et al. 2007; Papastefanou 2011) and therefore describes information about the leak signal. It was also found by Sun et al. (2016) that the RMS of each IMF could provide a good descriptor of the leak area. It is therefore logical that the IMF most related to the leak signal and the RMS of this would provide a good method of classifying the leak signal. However, this feature represents the higher frequency content (350–650 Hz) and when measuring further away from the leak these frequency bands would normally be attenuated due to the pipe acting like a low pass filter (de Almeida et al. 2015). Therefore, this feature will become less effective when moving sensors further away from the leak.
RMS of IMF1 for all leak shapes of different leak areas at 44.7 L/min. RH = round hole; LS = longitudinal slit; and EJ = electrofusion joint.
RMS of IMF1 for all leak shapes of different leak areas at 44.7 L/min. RH = round hole; LS = longitudinal slit; and EJ = electrofusion joint.
Influence of backfill type
The effect of pipe backfill was explored by altering the parameters of the model, training and test on differing combinations of backfill type (Table 4). Backfill type was found to strongly influence the accuracy of the RF model, most likely because the leak signal is strongly influenced by the surrounding backfill (Muggleton & Brennan 2004; Butterfield et al. 2016a, 2016b). Further investigation into the effect of backfill on the leak signal is shown in Figure 12 when the leak flow rate is consistent between backfill types (∼47 L/min is shown). Similar frequency components between ∼250 and 570 Hz were found for all backfill types. After 570 Hz, there was a drop in signal amplitude for all backfill types and this is in agreement with the results shown in Figure 12. However, the gravel backfill remained at a higher amplitude at frequencies >570 Hz, followed by geotextile and then submerged backfill types. Both the gravel and geotextile fabric are similar in that they are a constrained media type (Fox 2016), which will have an impact on the water jet as it leaves the leak hole. However, as is not possible to achieve good prediction results when training on gravel but testing on geotextile fabric, these results suggest that these media types play a differing role in impacting the generation or transmission of the leak noise. The submerged pipe had the lowest amplitude signals (Figure 12) but only at frequencies >570 Hz. A similar effect of backfill was noted by Muggleton & Brennan (2004) who found smaller signal attenuation in the submerged pipe compared to a pipe in soil, however there was a disappearance of the leak signal in the submerged pipe. These results demonstrate that the model is more robust when trained on more backfill types, and future work should include a wider range of conditions such as soil saturation, fluidisation and further types which have all been shown to influence leak-media hydraulics (van Zyl et al. 2013; Fox 2016). Moreover, the impact of these aforementioned variables is still a major research gap in terms of leak VAE signals.
Leak spectrum of a 12 × 2 mm longitudinal slit at 47 L/min leak flow rate under three media types compared to a no-leak scenario.
Leak spectrum of a 12 × 2 mm longitudinal slit at 47 L/min leak flow rate under three media types compared to a no-leak scenario.
Study limitations
Whilst the tool provided herein provides a tool for water companies to prioritise leak repair, it is not without limitations. A key weakness of this study is the fact that leak shape prediction was undertaken approximately close to the leak. In real-world conditions, it is unlikely that any measurements would take place within such close proximity to the leak. In fact, accelerometers/hydrophones are normally placed on or in nearby fittings, such as valves and hydrants at some distance away. The value of the model has not been tested at further distances, and this remains a key element of future work. However, a number of developments exist in other areas such as pipeline robotics (Chatzigeorgiou et al. 2014), which will allow a sensor to travel to a position next to a leak and the system developed here can possibly be integrated into these tools. This study also only addresses a limited number of leak shapes, flow rates, sizes and shapes under only three different backfill types. In real systems, the variety of leaks under varying conditions is huge, and therefore the validity of this system to real world leaks is not known. However, the study has shown that it is possible to differentiate between the leak shapes studied, independent of leak flow rate, leak area and backfill type.
CONCLUSIONS
The research presented herein has demonstrated that the leak's VAE signal contains enough information within it to predict the leak shape. A unique experimental investigation used high quality experimental data from various leak shapes of several leak areas at five leak flow rates. Leak shape was found to be a significant factor influencing the leak signal. Twenty-four features were derived from the leak signal and in combination with a random forest model it was possible to predict leak shape to a relatively high accuracy. It was also found that the external backfill had a strong impact on the classification accuracy, but training on all backfill types provided a more robust tool with higher classification accuracy. While the variety of leaks under varying conditions is huge, and therefore the validity of this system to real world leaks is not known, the proposed technique provided in this paper demonstrates that it is possible to predict the leak shape independent of leak area, leak flow rate and backfill type. Therefore, this investigative study is the first to demonstrate that there is enough information within a leak signal in order to predict the leak shape.
ACKNOWLEDGEMENTS
The authors would like to acknowledge and thank Northumbrian Water, Severn Trent Water, Thames Water Utilities, Scottish Water and the EPSRC under grant number EP/G037094/1 for their valued contribution to this research.