Abstract
The main purpose of this study was to investigate whether machine learning can be used to detect leak sounds in the field. A method for detecting water leaks was developed in which time series of sound data are visualised as recurrence plots and used as input to a convolutional neural network (CNN). In collaboration with a pipeline restoration company, 20 acoustic datasets were recorded by two sensors at each of 10 leak sites. The detection ability of the constructed CNN model was tested on these 20 cases using the hold-out method: 19 showed a balanced accuracy of more than 70%, of which 15 showed more than 80%.
HIGHLIGHTS
We are introducing a next-generation leak detection technique.
We are targeting the analysis of actual leaks, not simulated ones.
We visualised the inherent characteristics of water leak sound.
This study introduces leak detection techniques through artificial intelligence technology.
The leak detection model proposed in this study has been proven to have sufficient reliability.
INTRODUCTION
Globally, capital expenditure on supplying drinking water was approximately US$90 billion in 2011. Almost half of this was spent on water distribution networks, including constructing new water networks and rehabilitating existing ones. According to the municipal water capex by category, capital expenditure on rehabilitation (34.5%) already exceeds that on new networks (13.0%) (Global Water Intelligence 2010). The situation is similar in Japan: of the total assets of the water supply system, water pipes account for about 65% of the economic value. Furthermore, it is estimated that 14.8% of water pipes have surpassed their durable life (40 years). The ratio of ageing pipes that need to be replaced, which was 6% in 2006, is increasing every year, while the rate of pipeline renewal has been falling steadily (Ministry of Health Labour & Welfare 2018). Furthermore, this figure is expected to exceed 20% in 10 years and 40% in 20 years.
The deterioration and ageing of water pipes are the main causes of water leaks. If such leaks are left unrepaired for a long time, sinkholes could occur due to ground loss and cause major incidents. Therefore, leaks should be discovered and fixed at an early stage. However, since water pipes are buried underground, it is difficult to locate leaks until direct damage appears on the ground.
The sound of leaking water varies with the material of the pipe, water pressure, diameter, distance and so forth. Inspectors need to become familiar with the sound propagation characteristics based on the material of the pipes and also any external noise, such as the sound of electric motors in vending machines. They also require experience to be able to estimate the location of a leak and to differentiate water leak sounds. Therefore, they often attend courses on leak detection prior to actual investigation (Ministry of Health Labour & Welfare 2006). However, this approach, which may involve listening at night when there is less noise, is burdensome for inspectors. For these reasons, it is difficult for skilled inspectors to pass down technical expertise to younger inspectors.
This paper starts with a review of previous studies on leak detection techniques used in water distribution systems (WDSs) that have been published at international congresses and by various researchers.
Khulief et al. (2012) studied the detection of leaks by measuring the leakage sound inside a water pipe. However, the technique was applicable to a single pipeline, and although the probability of detection is higher, only a limited area can be investigated at one time.
Seyoum et al. (2017) described a leak detection technique using the same principle as an app that identifies the name of a song or artist. The leak detection model in their study was divided into Feed mode and Detection mode, and the software makes the final decision in Detection mode. This is a sophisticated technique that uses software to detect leaks. However, it is designed for households, and is not suitable for detecting leaks in a place with many obstacles.
Finally, Fuad et al. (2019) reviewed multiple leak detection methods, including sensors that measure sound intensity (amplitude) and methods that estimate the leak location from the delay time between two sensors. However, leak detection that depends on sound intensity cannot easily distinguish a leak sound from other, similar sounds.
As indicated above, many studies use sound data to detect water leakage, but sound is not the only clue for detecting leakage. Leaks can also be inferred from changes in pressure and flow. However, pressure and flow can change greatly throughout a pipe network, and the more complex the network, the harder it is to locate the leak. Although sound is not completely free from such influences, the sound becomes clearer closer to the leak point, which is a useful characteristic. Therefore, the present study uses sound data acquired by sensors for leak detection.
When such data is acquired, it is not immediately known whether the sound is from a water leak or is background noise. Just as people listening to a saxophone or flute can distinguish the two instruments, experts must listen for any differences between water leak sounds and background noise.
In a previous study, Nam et al. (2019) investigated whether the differences between water leak sounds and background noise could be quantified and verified prior to the full-scale detection of water leaks, and considered what the differences were. It was found that the pattern of a specific sound was expressed as having deterministic properties (Fujimoto & Iokibe 2000), and it was shown that leak sounds had stronger deterministic properties than normal or background noise.
In the present study, time series data with deterministic properties are converted into recurrence plots (RPs), a two-dimensional representation, so that the characteristics of actual water leakage can be visualised. In addition, a convolutional neural network (CNN), a deep learning technique, is trained on the visualised RPs.
The effectiveness of cross-validation (CV) in machine learning is beyond the scope of this paper. CV is a data resampling method used to assess the generalisation ability of predictive models and to prevent overfitting (Berrar 2019). Before performing CV, which requires a lot of time and effort, it is important to check whether a model can be trained properly on these data at all. This study used actual leakage sound data, but there has been little research on how to process and utilise such data. Therefore, this study mainly addresses some fundamental questions that arise when applying the hold-out method in machine learning.
METHODS
How to collect actual leak sounds
This section explains the collection of experimental data, including actual leak sounds. The leak sounds recorded by sensors in the field were obtained in collaboration with a pipeline restoration company. The procedure for measuring leak sounds and background noise was as follows: (1) the company's engineer visited the place where a leak was suspected at the request of the waterworks bureau; (2) the engineer then installed leak sensors on a fire hydrant or gate valve as close as possible to the leak site; (3) the sound (leak sound) was recorded for 3 min; (4) the leak was fixed; and then (5) the sound (background noise) was re-recorded on the same day at the same location (Figure 1). The measuring device (AQUASCAN 620 L, Gutermann) used a sampling frequency of 10,000 Hz, 16-bit resolution and WAV format.
Figure 1. Method of measuring acoustic data (MD: measuring distance from leak point to sensor).
The water pipelines analysed in this study were made of five materials. Figure 2 shows the 10 leak sites in this study and each pipe material. The amount of water that leaked was measured as follows: (1) the spilled water was collected in a plastic bag for 10–60 s; (2) the amount (mass) of water collected was measured; and (3) the mass was converted to a leak rate in L/min. (The temperature of the water was assumed to be 4 °C, at which the density of water reaches its maximum of 1 kg/L.)
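The conversion in step (3) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name and the example mass and duration are hypothetical, and only the density of 1 kg/L comes from the text.

```python
def leak_rate_l_per_min(mass_kg: float, duration_s: float,
                        density_kg_per_l: float = 1.0) -> float:
    """Convert a collected water mass into a leak rate in L/min.

    density_kg_per_l defaults to 1 kg/L, the value assumed in the text
    (water at 4 degrees C).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    volume_l = mass_kg / density_kg_per_l   # mass -> volume
    return volume_l * 60.0 / duration_s     # per second -> per minute


# Hypothetical example: 0.36 kg collected over 30 s is about 0.72 L/min.
rate = leak_rate_l_per_min(mass_kg=0.36, duration_s=30.0)
```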
Figure 2. Example of field experiment: LP (lead pipe), TDP (T-type ductile iron pipe), VSP (vinyl lining steel pipe), VP (unplasticized vinyl chloride pipe), PP (polyethylene pipe).
Visualisation of measurement sounds using a recurrence plot
The RP and the effect of RP size on the training process were as follows.
Figure 3. Sound visualisation by converting time series data into RP (embedding dimension m: 3, delay time τ: 1).
The matrix on the right in Figure 3 is the RP reconstructed from the time series data on the left. If i = 2 and j = 2, the corresponding entry of the distance matrix is 0, because it is the distance between the i-th and j-th embedded vectors, which coincide when i = j. This is the smallest value that can occur in an RP, and it is displayed on the plane in black.
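A minimal sketch of this construction is given below, using the embedding dimension m = 3 and delay time τ = 1 from Figure 3. The Euclidean norm and the function names are assumptions made for illustration, since the text does not state which norm was used.

```python
import math


def embed(series, m=3, tau=1):
    """Time-delay embedding: vector i is (u[i], u[i+tau], ..., u[i+(m-1)*tau])."""
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this m and tau")
    return [tuple(series[i + k * tau] for k in range(m))
            for i in range(n_vectors)]


def distance_matrix(series, m=3, tau=1):
    """Unthresholded recurrence plot: pairwise distances between embedded vectors.

    The Euclidean distance is an assumption of this sketch.
    """
    vectors = embed(series, m, tau)
    return [[math.dist(a, b) for b in vectors] for a in vectors]


# A short periodic signal: entries on the diagonal (i = j) are 0, and so are
# entries whose indices differ by one full period.
rp = distance_matrix([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
```

Small distances would be drawn dark and large distances light, which is why a regular signal produces a regular texture in the plot.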
In this study, each RP measured 64 on both the horizontal and vertical axes. Each 1-s recording (10,000 samples at a sampling frequency of 10,000 Hz) was divided into 100 segments, and the RP of each segment was resized to 64 × 64 (using bicubic interpolation) to reduce the computation time of machine learning. Consequently, the 3-min time series of background noise and water leak sounds yielded 36,000 RPs per sensor (180 s × 100 × 2), and the number of RPs for Areas 1 to 10 in Figure 2 totalled 720,000 (36,000 × 10 areas × 2 sensors). As an example, Figure 4 shows RPs that visualise background noise (left) and water leak sounds (right) recorded by sensor A in Area 4 (m: 3, τ: 1).
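The segment counts quoted above can be checked with a short calculation. The constant and function names are illustrative; the numbers are those given in the text, and the bicubic resizing step is not reproduced here.

```python
SAMPLING_HZ = 10_000        # sampling frequency of the recorder
RECORDING_S = 180           # 3 min per recording
SEGMENTS_PER_SECOND = 100   # each 1-s block is divided into 100 segments
SOUND_TYPES = 2             # water leak sound and background noise
AREAS = 10
SENSORS_PER_AREA = 2


def split_into_segments(samples, segment_length):
    """Cut a sample list into consecutive, non-overlapping segments."""
    n_segments = len(samples) // segment_length
    return [samples[k * segment_length:(k + 1) * segment_length]
            for k in range(n_segments)]


samples_per_segment = SAMPLING_HZ // SEGMENTS_PER_SECOND            # 100
rps_per_sensor = RECORDING_S * SEGMENTS_PER_SECOND * SOUND_TYPES    # 36,000
rps_total = rps_per_sensor * AREAS * SENSORS_PER_AREA               # 720,000
```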
The RP size used in the training process was determined as follows. The size of one RP created for the training process in the CNN model in this study is 64 × 64. Changing the RP size has two effects: (1) as the amount of time series data stored in one RP increases, the total number of RPs decreases; and (2) as the time spent generating one RP increases, the amount of data to be learned by the CNN model decreases. The decrease in the total number of RPs also affects the batch size. To ensure stable training of the CNN model, the number of iterations must exceed a certain level. If the batch size is increased, the training process can be completed quickly, but one epoch may end before the CNN model has fully learned the characteristics of the input data. With the model proposed in this study, converting about 3 min of time series data to RPs takes about 1 min, whereas training and testing the CNN using 64 × 64 RPs took about 20 min. In addition, even though the amount of data to be learned decreased, a larger RP size was found to require a longer time. Accordingly, the RP size was selected to ensure that there were enough iterations while minimising the time required. For most of the samples, the model began to recognise the characteristics of the input data after around 100 iterations. In addition, since the learning curves showed no clear rise or fall around 200 iterations, it is recommended to adjust the batch size to provide about 200 iterations.
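The relationship between batch size and the number of iterations in one epoch can be sketched as follows. The batch size actually used is not stated in the text; the figure of 32,400 training RPs is an assumption obtained by taking 90% of the 36,000 RPs of one sensor.

```python
import math


def iterations_per_epoch(n_train: int, batch_size: int) -> int:
    """Number of parameter updates in one pass over the training data."""
    return math.ceil(n_train / batch_size)


def batch_size_for(n_train: int, target_iterations: int) -> int:
    """Batch size that gives roughly target_iterations updates per epoch."""
    return max(1, math.ceil(n_train / target_iterations))


n_train = 32_400                          # assumed: 90% of 36,000 RPs
batch = batch_size_for(n_train, 200)      # 162
iters = iterations_per_epoch(n_train, batch)  # 200
```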
Meanwhile, to eliminate the noise (white noise) contained in the sounds, the recordings were passed through a low-pass filter before the RPs were created. Given that the frequency band effective for detecting water leaks using sensing data on water pipes was shown to be 1,500 Hz or less in a previous study (Kawamura et al. 2016), a low-pass filter that cuts frequencies over 1,500 Hz was adopted.
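The text does not specify the filter design, so the following is only one possible sketch: a windowed-sinc FIR low-pass filter with a Hamming window. The number of taps and the function names are assumptions; the 1,500 Hz cutoff and the 10,000 Hz sampling frequency come from the text.

```python
import math


def lowpass_fir_taps(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc low-pass FIR coefficients (Hamming window), unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / fs_hz              # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response, truncated to num_taps samples.
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Convolve signal with taps, keeping only fully overlapping outputs."""
    n_taps = len(taps)
    out = []
    for i in range(len(signal) - n_taps + 1):
        acc = 0.0
        for j in range(n_taps):
            acc += taps[j] * signal[i + n_taps - 1 - j]
        out.append(acc)
    return out


taps = lowpass_fir_taps(cutoff_hz=1500.0, fs_hz=10000.0)
```

With these settings, a 500 Hz tone passes almost unchanged while a 3,000 Hz tone is strongly attenuated.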
Model for detecting the presence of water leaks using the CNN model
The structure of the CNN model was chosen as follows. Various tools such as CNN, AutoEncoder and recurrent neural networks have been proposed. This study constructed learning models using CNN, which has demonstrated strong performance in many fields, with RPs as training data (Hatami et al. 2018). CNN is a model (Figure 5) that uses a neural network with layers called the convolution layer and the pooling layer. With CNN, the input data of the convolutional layer is called the input feature map; the output data is called the output feature map; and the input and output data together are defined as the feature map.
Figure 4. Two types of RPs reconstructing time series data of Area 4_sensor A in Figure 2. (The number above RP represents the time interval, e.g. 0.02–0.03 means between 0.02 and 0.03 s).
There were four convolutional layers in this study, and the configuration based on the shape notation is as follows: 1) first layer: input (1, 64, 64), kernel (8, 3, 3), output (8, 31, 31), stride 2; 2) second layer: input (8, 31, 31), kernel (8, 8, 3, 3), output (8, 15, 15), stride 2; 3) third layer: input (8, 15, 15), kernel (8, 8, 3, 3), output (8, 7, 7), stride 2; and 4) fourth layer: input (8, 7, 7), kernel (8, 8, 3, 3), output (8, 5, 5), stride 1 (images from the first to fourth layers are shown in Figure 6). Moreover, when the stride 2 convolution process is performed on the RP image having an odd size, no padding process is performed. (The right end and the bottom end are truncated.) Finally, global average pooling, which takes into account the entire size of the input layer, was applied.
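The output sizes listed above follow from the usual formula for a convolution without padding, floor((input − kernel) / stride) + 1. The sketch below only checks the shapes and illustrates global average pooling on plain nested lists; the function names are illustrative and it is not the authors' implementation.

```python
def conv_output_size(size: int, kernel: int, stride: int) -> int:
    """Output width of an unpadded convolution; leftover rows/columns are dropped."""
    return (size - kernel) // stride + 1


# (kernel size, stride) for the four convolutional layers described in the text.
LAYERS = [(3, 2), (3, 2), (3, 2), (3, 1)]


def feature_map_sizes(input_size=64, layers=LAYERS):
    sizes = [input_size]
    for kernel, stride in layers:
        sizes.append(conv_output_size(sizes[-1], kernel, stride))
    return sizes


def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list) to its mean value."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_maps]


print(feature_map_sizes())  # [64, 31, 15, 7, 5]
```

After the fourth layer, global average pooling would turn the (8, 5, 5) output into 8 values, one per channel.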
Meanwhile, the initial kernel values of the convolutional layers in this study are randomly generated. The weights are then updated so as to converge on the correct answer as training progresses. This means that even if the same training data are used, the results may differ from run to run. The difference in balanced accuracy at the end of learning is not large, but to increase the reliability of the results, the same process was performed five times and the results are reported as an average value.
The fully connected layer is the layer in which every output of the pooling layer is connected to every unit (Duchi et al. 2011). Its output is passed through a sigmoid activation function, so the actual output of the CNN is a value between 0 and 1. A threshold of 0.5 is then applied: outputs no larger than 0.5 are classified as 0, and outputs larger than 0.5 are classified as 1. The model was trained with Adam, an adaptive learning rate optimisation algorithm that computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients (Kingma & Ba 2015). The advantage of using Adam to optimise the parameters is that performance equivalent to other methods can be achieved without detailed hyperparameter tuning.
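The output stage and a single Adam update can be sketched as follows. The hyperparameter defaults are those suggested by Kingma & Ba; whether the study used them is not stated, and the function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(z: float, threshold: float = 0.5) -> int:
    """1 (leak) if the sigmoid output exceeds the threshold, otherwise 0."""
    return 1 if sigmoid(z) > threshold else 0


def adam_step(params, grads, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count.

    Returns the updated parameters and the updated moment estimates.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first moment (mean)
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second moment
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

Because each parameter is divided by the square root of its own second-moment estimate, the effective step size differs from parameter to parameter.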
To avoid ‘learning with future data and evaluating with past data’ when dividing the data, the split was made chronologically rather than by shuffling: 90% of the data were used as training data and the remaining 10% as test data. This is a form of the hold-out method, one of the simplest of the various data resampling strategies: in its general form, it randomly samples some cases from the learning set for the test set, while the remaining cases constitute the training set (Berrar 2019).
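A 90/10 hold-out split that keeps the samples in order can be sketched as below. The function name is illustrative. Which end of the recording becomes the test block depends only on how the caller orders the list; the function cuts the list without shuffling it.

```python
def holdout_split(samples, train_fraction=0.9):
    """Split an ordered list into a training block and a test block."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


# Illustrative: 36,000 RP indices for one sensor.
train_idx, test_idx = holdout_split(list(range(36_000)))
```

In this example the training block holds 32,400 indices and the test block 3,600, and the two blocks share no sample.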
The confusion matrix in Table 1 was used to calculate the accuracy and balanced accuracy of the learning model (Deng et al. 2016). The accuracy is given by (TP + TN)/(TP + FP + FN + TN), and the balanced accuracy can be expressed by the general formula 1/2 (TP/all_P + TN/all_N) (all_P = TP + FN, all_N = FP + TN). In addition, positive in Table 1 corresponds to ‘when there is a water leak,’ while negative corresponds to ‘no water leaks.’ Balanced accuracy was used to assess the detection ability of the model.
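The two formulas above translate directly into code. The counts in the example are hypothetical and were chosen so that the classes are imbalanced, which is where the two measures diverge.

```python
def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """(TP + TN) / (TP + FP + FN + TN)."""
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """1/2 (TP / all_P + TN / all_N), with all_P = TP + FN and all_N = FP + TN."""
    all_p = tp + fn
    all_n = fp + tn
    if all_p == 0 or all_n == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / all_p + tn / all_n)


# Hypothetical counts: 10 leak samples, 90 no-leak samples.
acc = accuracy(tp=5, fn=5, fp=0, tn=90)            # 0.95
bal = balanced_accuracy(tp=5, fn=5, fp=0, tn=90)   # 0.75
```

Here half of the leaks are missed, which accuracy largely hides because the no-leak class dominates, while balanced accuracy weights both classes equally.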
Table 1. Confusion matrix
Actual class | Predicted positive | Predicted negative |
---|---|---|
Positive | TP (true positive) | FN (false negative) |
Negative | FP (false positive) | TN (true negative) |
RESULTS AND DISCUSSION
The RPs of background noise (no water leaks) in Figure 4 are close to white noise and their shapes tend not to have regular features (weak deterministic properties). In contrast, it can be qualitatively determined that the RPs of water leak sounds exhibit shapes with regular features, such as a mesh or honeycomb (strong deterministic properties).
On the other hand, the difference between the RPs in Figure 4 is clear and can be sufficiently distinguished without a CNN model. The leak sounds in this study were recorded by a sensor installed as close as possible to the leak point. Since the location of the leak was known in advance, the sensor could be installed where the sound could be clearly recorded. However, in practice, the exact leak point is not known, and so the sensor is likely to be placed further away. Furthermore, it is also likely to be difficult for a person to detect the characteristics of RP. Therefore, this study used the CNN model to develop a leak detection technique.
Figure 7 shows the changes in the loss function (the average recognition error, measured by cross entropy) and the recognition accuracy (the average balanced accuracy) during the learning process for ‘background noise and water leak sounds.’ As the number of training iterations increases, the value of the loss function decreases and, conversely, the recognition accuracy improves. This finding indicates that the parameters were updated and optimised appropriately by the learning process using the Adam optimiser.
Figure 7. Changes in loss function and recognition accuracy broken down by training process.
The learning process differed depending on the material of the water pipeline where the leak occurred. In Figure 7, panel (a) depicts the learning process in LP, (b) in TDP, (c) in VSP, (d) in VP and (e) in PP. With the metals used in (a) and (b), the loss function decreases rapidly and the recognition accuracy increases sharply, while the values in (c), (d) and (e), which are for non-metallic materials, decrease or increase at a much slower pace.
Moreover, the two groups, metal and non-metallic, displayed a clear difference not only in the speed of their learning processes but also in the absolute value of their respective recognition accuracies. In the case of metals, some runs already showed a balanced accuracy of 80% or higher at an early point, when the iterations approached 100, whereas the non-metallic materials only displayed a recognition accuracy close to 80% at the end of their training process. Further, in some cases, such as Area 9_B, the balanced accuracy remained low, in the 60% range.
Of course, since the amount of leakage in Area 9 is 0.001 L/min, the lowest value among all areas, the low recognition accuracy there is not caused by the material of the pipeline alone. It was also found that the balanced accuracy of the non-metallic group in Areas 5, 6, 7 and 8, where the leakage was larger than in Area 9, did not exceed 90% at any point in training. This trend was confirmed by the average balanced accuracy in the verification process in Table 2, which lists the test results obtained with the trained model. In addition, since the balanced accuracy in Table 2 was close to 80% after one epoch, it was judged that one epoch was sufficient for this model. Specifically, because there is little variation in the sound data, there is little merit in increasing the number of epochs.
Table 2. Results of demonstration experiments in the field
Area/sensor | Amount of water leak (L/min) | Measuring distance (m) | Material of water pipeline | Diameter (mm) | Average of balanced accuracy (%) |
---|---|---|---|---|---|
1/A | 2.98 | 1.8 | LP | 20 | 93.51 |
1/B | 2.98 | 12.8 | LP | 20 | 99.87 |
2/A | 20.1 | 2.5 | LP | 13 | 93.32 |
2/B | 20.1 | 23.5 | LP | 13 | 99.62 |
3/A | 34.4 | 24.3 | TDP | 150 | 80.39 |
3/B | 34.4 | 113.0 | TDP | 150 | 85.84 |
4/A | 61.1 | 21.9 | TDP | 75 | 99.43 |
4/B | 61.1 | 23.3 | TDP | 75 | 99.82 |
5/A | 0.72 | 2.8 | VSP | 25 | 70.97 |
5/B | 0.72 | 66.8 | VSP | 25 | 82.6 |
6/A | 0.87 | 4.9 | VSP | 25 | 81.9 |
6/B | 0.87 | 10.2 | VSP | 25 | 78.8 |
7/A | 0.72 | 0.8 | VP | 13 | 81.6 |
7/B | 0.72 | 13.7 | VP | 13 | 87.6 |
8/A | 27.1 | 7.6 | VP | 25 | 84.44 |
8/B | 27.1 | 25.9 | VP | 25 | 86.02 |
9/A | 0.001 | 3.1 | PP | 25 | 74.63 |
9/B | 0.001 | 12.0 | PP | 25 | 66.02 |
10/A | 1.72 | 0.4 | PP | 20 | 79.54 |
10/B | 1.72 | 18.1 | PP | 20 | 92.04 |
The average of balanced accuracy in this table is the verification result using the ‘trained model’ and ‘new set of data (remaining 10%)’ in Figure 5.
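The summary counts reported for the 20 cases can be reproduced from the values in Table 2. The sketch below copies those values; the metal/non-metallic grouping follows the discussion of Figure 7 (LP and TDP as metal, VSP, VP and PP as non-metallic), and the variable names are illustrative.

```python
# (area/sensor, pipe material, average balanced accuracy in %) from Table 2.
RESULTS = [
    ("1/A", "LP", 93.51), ("1/B", "LP", 99.87),
    ("2/A", "LP", 93.32), ("2/B", "LP", 99.62),
    ("3/A", "TDP", 80.39), ("3/B", "TDP", 85.84),
    ("4/A", "TDP", 99.43), ("4/B", "TDP", 99.82),
    ("5/A", "VSP", 70.97), ("5/B", "VSP", 82.6),
    ("6/A", "VSP", 81.9), ("6/B", "VSP", 78.8),
    ("7/A", "VP", 81.6), ("7/B", "VP", 87.6),
    ("8/A", "VP", 84.44), ("8/B", "VP", 86.02),
    ("9/A", "PP", 74.63), ("9/B", "PP", 66.02),
    ("10/A", "PP", 79.54), ("10/B", "PP", 92.04),
]
METAL = {"LP", "TDP"}   # grouping used in the discussion of Figure 7

above_70 = sum(1 for _, _, acc in RESULTS if acc > 70.0)   # 19
above_80 = sum(1 for _, _, acc in RESULTS if acc > 80.0)   # 15

metal = [acc for _, mat, acc in RESULTS if mat in METAL]
non_metal = [acc for _, mat, acc in RESULTS if mat not in METAL]
mean_metal = sum(metal) / len(metal)
mean_non_metal = sum(non_metal) / len(non_metal)
```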
CONCLUSIONS
Leak detection is crucial in the rehabilitation of water networks, and capital expenditure on maintaining WDSs is expected to keep increasing. However, with limited resources, waterworks bureaus are constantly trying to develop more efficient methods of detecting leaks. This study attempted to develop a fully automatic detection technique using machine learning by visualising the sound of actual water leaks in water pipes using RPs. A model for detecting the presence of water leaks using a CNN was constructed by using the visualised images as training data.
The major findings were as follows. (1) By expressing background noise (no water leaks) and water leak sounds using RPs, the difference in the deterministic properties of both sounds was clarified. (2) It was found that the background noise RP is white noise with shapes that tend not to have regular features, while the RPs of water leak sounds show shapes with regular features, such as a mesh or honeycomb. (3) When the detection ability of the constructed CNN model was tested using actual leaks, 19 of the 20 cases showed an accuracy of more than 70%, of which 15 showed more than 80%.
Regarding the direction of future research, the leak detection technique proposed here is not yet complete and ready for application, for the following reason. Training and verification data came from the same recordings. The hold-out method ensured that the data used to verify the proposed model were different from the data used for training. However, all the data for a given model were generated in one location: the sounds recorded in one area were used both to train the model and to verify its performance for that same area. Accordingly, the model may have overfitted to a single site. As a result, a model trained in one area may not attain the same high recognition accuracy when used in another area. This problem can be addressed by performing CV, and the detection model will be improved through CV in a future study.
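One way such a CV could be organised is to hold out one whole area at a time, so that the test data always come from a site the model has not seen. This is a hypothetical sketch: the text does not specify the CV scheme, and the function name and grouping by area are assumptions.

```python
def leave_one_area_out(area_ids):
    """Yield (held_out_area, train_indices, test_indices) for each area.

    area_ids[i] is the area label of sample i.
    """
    for held_out in sorted(set(area_ids)):
        train = [i for i, a in enumerate(area_ids) if a != held_out]
        test = [i for i, a in enumerate(area_ids) if a == held_out]
        yield held_out, train, test


# Illustrative labels: six samples from three areas.
folds = list(leave_one_area_out([1, 1, 2, 2, 3, 3]))
```

Each fold trains on all areas but one and tests on the remaining area, which directly measures how well a model transfers to another site.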
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.