Water leak detection based on convolutional neural network using actual leak sounds and the hold-out method

The main purpose of this study was to investigate whether machine learning can be used to detect leak sounds in the field. A method for detecting water leaks was developed using a convolutional neural network (CNN), with recurrence plots that visualise the time series as input data. In collaboration with a pipeline restoration company, 20 acoustic datasets of leak sounds were recorded by sensors at 10 leak sites. The detection ability of the constructed CNN model was tested using the hold-out method for the 20 cases: 19 showed more than 70% accuracy, of which 15 showed more than 80%.


INTRODUCTION
Globally, capital expenditure on supplying drinking water was approximately US$90 billion in 2011. Almost half of this was spent on water distribution networks, including constructing new water networks and rehabilitating existing ones. According to the municipal water capex by category, capital expenditure on rehabilitation (34.5%) already exceeds that on new networks (13.0%) (Global Water Intelligence ). The situation is similar in Japan: of the total assets of the water supply system, water pipes account for about 65% of the economic value. Furthermore, it is estimated that 14.8% of water pipes have surpassed their durable life (40 years). The ratio of ageing pipes that need to be replaced, which was 6% in 2006, is increasing every year, while the rate of pipeline renewal has been falling steadily (Ministry of Health Labour & Welfare ).
Furthermore, this figure is expected to exceed 20% in 10 years and 40% in 20 years.
The deterioration and ageing of water pipes are the main causes of water leaks. If such leaks are left unrepaired for a long time, sinkholes could occur due to ground loss and cause major incidents. Therefore, leaks should be discovered and fixed at an early stage. However, since water pipes are buried underground, it is difficult to locate leaks until direct damage appears on the ground.
The sound of leaking water varies with the material of the pipe, water pressure, diameter, distance and so forth.
Inspectors need to become familiar with the sound propagation characteristics based on the material of the pipes and also any external noise, such as the sound of electric motors in vending machines. They also require experience to be able to estimate the location of a leak and to differentiate water leak sounds. Therefore, they often attend courses on leak detection prior to actual investigation (Ministry of Health Labour & Welfare ). However, this approach, which may involve listening at night when there is less noise, is burdensome for inspectors. For these reasons, it is difficult for skilled inspectors to pass down technical expertise to younger inspectors.
This paper starts with a review of previous studies on leak detection techniques used in water distribution systems (WDSs) that have been published at international congresses and by various researchers. Khulief et al. () studied the detection of leaks by measuring the leakage sound inside a water pipe. However, the technique was applicable to a single pipeline, and although the probability of detection is higher, only a limited area can be investigated at one time.
Seyoum et al. () described a leak detection technique using the same principle as an app that identifies the name of a song or artist. The leak detection model in their study was divided into Feed mode and Detection mode, and the software makes the final decision in Detection mode. This is a sophisticated technique that uses software to detect leaks. However, it is designed for households, and is not suitable for detecting leaks in a place with many obstacles.
Finally, Fuad et al. () reviewed multiple leak detection methods, including a sensor that measures sound intensity (amplitude), and estimating the leak location using the delay time between two sensors. However, leak detection that depends on sound intensity cannot easily distinguish between a leak sound and a similar other sound.
As indicated above, many studies use sound data to detect water leakage, but sound is not the only clue for detecting leakage. Leaks can also be inferred from changes in pressure and flow. However, pressure and flow can change greatly throughout a pipe network, and the more complex the network, the harder it is to locate the leak.
Although sound is not completely free from such influences, the sound becomes clearer closer to the leak point, which is a useful characteristic. Therefore, the present study uses sound data acquired by sensors for leak detection.
When such data is acquired, it is not immediately known whether the sound is from a water leak or is background noise. Just as people listening to a saxophone or flute can distinguish the two instruments, experts must listen for any differences between water leak sounds and background noise.
In a previous study, Nam et al. () investigated whether the differences between water leak sounds and background noise could be quantified and verified prior to the full-scale detection of water leaks, and considered what the differences were. It was found that the pattern of a specific sound was expressed as having deterministic properties (Fujimoto & Iokibe ), and it was shown that leak sounds had stronger deterministic properties than normal or background noise.
In the present study, time series data (which have deterministic properties) are converted into recurrence plots (RPs), a two-dimensional representation, and the characteristics of actual water leakage are visualised. In addition, a convolutional neural network (CNN) model, which is a deep learning technology, is trained on the visualised RPs.
On the other hand, it is beyond the scope of this paper to discuss the effectiveness of cross-validation (CV) in machine learning. CV is a data resampling method to assess the generalisation ability of predictive models and to prevent overfitting (Berrar ). Before performing CV, which requires a lot of time and effort, it is important to conduct a basic review of whether these data are trained normally. This study used actual leakage sound data, but there has been little research on how to process and utilise such data. Therefore, this study mainly addresses some fundamental questions when applying the hold-out method in machine learning.

How to collect actual leak sounds
This section explains the collection of experimental data, including actual leak sounds. The leak sounds recorded by sensors in the field were obtained in collaboration with a pipeline restoration company. The procedure for measuring leak sounds and background noise was as follows: (1) the company's engineer visited the place where a leak was suspected at the request of the waterworks bureau; (2) the engineer then installed leak sensors on a fire hydrant or gate valve as close as possible to the leak site; (3) the sound (leak sound) was recorded for 3 min; (4) the leak was fixed; and then (5) the sound (background noise) was re-recorded on the same day at the same location (Figure 1).
The measuring device (AQUASCAN 620 L, Gutermann) used a sampling frequency of 10,000 Hz, 16-bit resolution and WAV format.
The water pipelines analysed in this study were made of five materials. Figure 2 shows the 10 leak sites in this study and each pipe material. The amount of water that leaked was measured as follows: (1) the spilled water was collected in a plastic bag for 10-60 s; (2) the amount (mass) of water collected was measured; and (3) the mass was converted into a leak rate (L/min). (The temperature of the water was assumed to be 4 °C, at which the density of water is at its maximum, 1 kg/L.)
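The conversion from a collected mass to a leak rate can be sketched as follows. This is only an illustration of the arithmetic: the function name and the example reading (0.5 kg collected in 30 s) are hypothetical, and the density of 1 kg/L is the assumption stated in the text.

```python
def leak_rate_l_per_min(mass_kg, collection_time_s, density_kg_per_l=1.0):
    """Convert a collected water mass into a leak rate in L/min.

    density_kg_per_l defaults to 1 kg/L, the value assumed in the text.
    """
    if collection_time_s <= 0:
        raise ValueError("collection time must be positive")
    volume_l = mass_kg / density_kg_per_l          # mass -> volume
    return volume_l * 60.0 / collection_time_s     # per second -> per minute


# Hypothetical reading (not a measured value): 0.5 kg collected in 30 s.
example_rate = leak_rate_l_per_min(0.5, 30.0)
```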

Visualisation of measurement sounds using a recurrence plot
The RP used in this study is an un-thresholded RP (UTRP), in which each element is the distance between two embedded vectors:
$$D_{i,j} = \lVert X_i - X_j \rVert$$
where i is plotted on the horizontal axis and j on the vertical axis, and where X is an embedding vector. UTRPs are able to reflect delicate boundaries between data in results and are also useful for reverse-inferring appropriate threshold values. Accordingly, the time series data were visualised on a plane using the UTRP method in order to express in detail the characteristics of frequencies (input data) that change sensitively in units of Hertz. Figure 3 shows how to convert time series data into RP.
When comparing $X_i$ and $X_j$ obtained from two systems of the same dimension, the vector embedded from each time series can be constructed as follows (y: time series data, m: embedding dimension, τ: delay time):
$$X_i = \left(y_i,\; y_{i+\tau},\; \ldots,\; y_{i+(m-1)\tau}\right)$$
The matrix on the right in Figure 3 is the RP reconstructed from the time series data on the left. If i = 2 and j = 2, the distance becomes 0 from the calculation of $D_{2,2} = \lVert X_2 - X_2 \rVert$. This corresponds to the smallest value among the calculated values in one RP, and is displayed on the plane in black.
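The construction of an un-thresholded RP can be sketched as below. Two points are assumptions rather than details confirmed by the text: the embedded vector is taken to be the usual delay embedding (y_i, y_{i+τ}, ..., y_{i+(m-1)τ}), and the norm is taken to be Euclidean. The function names and the toy series are hypothetical, and indices are 0-based.

```python
import math


def embed(y, m, tau):
    """Delay-embed a series: X_i = (y[i], y[i+tau], ..., y[i+(m-1)*tau]).

    Assumed form of the embedding (standard delay embedding).
    """
    n_vectors = len(y) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this m and tau")
    return [[y[i + k * tau] for k in range(m)] for i in range(n_vectors)]


def unthresholded_rp(y, m=3, tau=1):
    """Return the distance matrix D[i][j] = ||X_i - X_j||.

    The Euclidean norm is an assumption; no threshold is applied.
    """
    vectors = embed(y, m, tau)
    return [[math.dist(xi, xj) for xj in vectors] for xi in vectors]


# Toy periodic series (made-up values, not measured data).
series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
rp = unthresholded_rp(series, m=3, tau=1)
```

In this toy case the diagonal is zero because each vector is compared with itself, and off-diagonal zeros appear wherever the series repeats, which is the kind of regular structure a periodic signal leaves in an RP.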
In this study, the size of the horizontal axis and vertical axis of one RP was 64. Each 1-s block of 10,000 samples (sampling frequency of 10,000 Hz) was divided into 100 segments, and the RP of each segment was resized (using bicubic interpolation) to 64 × 64 to reduce the computation time of machine learning. Consequently, the 3 min of background noise and 3 min of water leak sounds from one sensor became 36,000 (180 s × 100 × 2) RPs, and the number of RPs for Areas 1 to 10 in Figure 2 totalled 720,000 (36,000 × 10 areas × 2 sensors). As an example, Figure 4 shows RPs that visualise background noise (left) and water leak sounds (right) using sensor A in Area 4 (m: 3, τ: 1).
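The segment and RP counts above can be checked with a short sketch. It assumes that the 100 segments per second are consecutive and non-overlapping, which the text does not state explicitly; the names are hypothetical, and the bicubic resizing step is left out.

```python
SAMPLING_RATE_HZ = 10_000    # from the text
SEGMENTS_PER_SECOND = 100    # each 1-s block is divided into 100
RECORDING_SECONDS = 180      # 3 min per recording


def split_into_segments(samples, segment_length):
    """Cut a sample sequence into consecutive, non-overlapping segments."""
    n_segments = len(samples) // segment_length
    return [samples[k * segment_length:(k + 1) * segment_length]
            for k in range(n_segments)]


segment_length = SAMPLING_RATE_HZ // SEGMENTS_PER_SECOND   # samples per segment
one_second = list(range(SAMPLING_RATE_HZ))                 # placeholder samples
segments = split_into_segments(one_second, segment_length)

rps_per_recording = RECORDING_SECONDS * SEGMENTS_PER_SECOND
rps_per_sensor = rps_per_recording * 2     # leak sound + background noise
total_rps = rps_per_sensor * 10 * 2        # 10 areas x 2 sensors
```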
RPs were created after applying a low-pass filter to the sounds. Given that the frequency band effective for detecting water leaks using sensing data on water pipes was shown to be 1,500 Hz or less in a previous study (Kawamura et al. ), a low-pass filter that cuts frequencies over 1,500 Hz was adopted.
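A low-pass filter with a 1,500 Hz cutoff at a sampling frequency of 10,000 Hz could look like the sketch below. The filter design is not given in the text, so the windowed-sinc FIR design, the Hamming window and the 101 taps are all assumptions chosen for illustration, as are the function names and the two test tones.

```python
import math


def lowpass_fir_coefficients(cutoff_hz, sampling_rate_hz, num_taps=101):
    """Design a windowed-sinc low-pass FIR filter (Hamming window).

    num_taps should be odd so that the filter has a centre tap.
    """
    fc = cutoff_hz / sampling_rate_hz      # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * fc               # limit of the sinc term at x = 0
        else:
            ideal = math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]        # normalise to unit gain at 0 Hz


def apply_fir(samples, taps):
    """Filter a signal, keeping only outputs where the taps fully overlap.

    The taps are symmetric, so this sliding dot product equals convolution.
    """
    n_out = len(samples) - len(taps) + 1
    return [sum(t * samples[i + k] for k, t in enumerate(taps))
            for i in range(n_out)]


fs = 10_000
taps = lowpass_fir_coefficients(1_500, fs)
# Synthetic unit-amplitude tones (not measured data): one below, one above the cutoff.
tone_500 = [math.sin(2.0 * math.pi * 500 * n / fs) for n in range(1000)]
tone_4000 = [math.sin(2.0 * math.pi * 4000 * n / fs) for n in range(1000)]
passed_peak = max(abs(v) for v in apply_fir(tone_500, taps))
blocked_peak = max(abs(v) for v in apply_fir(tone_4000, taps))
```

With this design the 500 Hz tone should pass almost unchanged while the 4,000 Hz tone is strongly attenuated; a different filter type would shift the exact figures.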
Model for detecting the presence of water leaks using the CNN model
The structure of the CNN model was chosen as follows. Various tools such as CNN, AutoEncoder and recurrent neural networks have been proposed. This study constructed learning models using CNN, which has demonstrated strong performance in many fields, with RPs as training data (Hatami et al. ). CNN is a model (Figure 5) that uses a neural network with layers called the convolutional layer and the pooling layer. With CNN, the input data of the convolutional layer is called the input feature map; the output data is called the output feature map; and the input and output data together are defined as the feature map.
There were four convolutional layers in this study, and the configuration based on the shape notation is as follows:
Meanwhile, the initial kernel values of the convolutional layers in this study are randomly generated. Then, the weights are updated so as to converge on the correct answer as the training progresses. This means that even if the data used for training are the same, the results may differ each time.
The difference in balanced accuracy at the end of learning is not large, but in order to increase the reliability of the results, the same process was performed five times and expressed as an average value.
In addition, in order to evaluate whether training was being performed properly, cross entropy was used as the loss function:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
where p denotes the classification target, and q the output of the model. The smaller the H value, the closer the output is to the correct answer; and the larger the value, the farther away from the correct answer. In other words, the fact that H decreases as the iteration progresses means that training is performed normally.
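The behaviour of H can be illustrated with a small sketch for the two-class case (leak or no leak). The binary form of the cross entropy, the averaging over samples and the clipping constant are assumptions made for this illustration, and the labels and outputs are made-up values.

```python
import math


def binary_cross_entropy(targets, outputs, eps=1e-12):
    """Mean cross entropy H(p, q) for 0/1 targets p and outputs q in (0, 1)."""
    total = 0.0
    for p, q in zip(targets, outputs):
        q = min(max(q, eps), 1.0 - eps)    # clip to avoid log(0)
        total += -(p * math.log(q) + (1 - p) * math.log(1.0 - q))
    return total / len(targets)


labels = [1, 0, 1, 0]    # 1 = leak, 0 = no leak (made-up)
h_early = binary_cross_entropy(labels, [0.6, 0.4, 0.5, 0.5])   # uncertain outputs
h_late = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])    # confident, correct outputs
```

Here h_late is smaller than h_early, matching the statement that H falls as the outputs move closer to the targets.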
The fully connected layer refers to the layer connected to all outputs through the pooling layer (Duchi et al. ).
With 0.5 as the threshold, the final stage was designed so that values no larger than 0.5 are output as 0, and values larger than 0.5 are output as 1. The fully connected layers use the sigmoid function as the activation function before the threshold is applied, so the actual output of the CNN is a value between 0 and 1.
To avoid 'learning with future data and evaluating with past data' when dividing the data, the earlier 90% of the data (past data) was used as training data and the later 10% (future data) was used as test data. Moreover, the hold-out method was used. This is one of the simplest of the various data division methods.
In the confusion matrix in Table 1, positive corresponds to 'when there is a water leak,' while negative corresponds to 'no water leaks.' Balanced accuracy was used to assess the detection ability of the model.
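The thresholding and the evaluation metric can be sketched as follows. Balanced accuracy is computed here with its standard definition, the mean of the true-positive rate and the true-negative rate, which is assumed to match the definition used with Table 1. The function names and the example outputs are hypothetical.

```python
def to_label(sigmoid_output, threshold=0.5):
    """Map a sigmoid output to 0 (no leak) or 1 (leak).

    Values no larger than the threshold give 0; larger values give 1.
    """
    return 1 if sigmoid_output > threshold else 0


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of the true-positive rate and the true-negative rate."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Made-up outputs for three leak RPs followed by three background RPs.
truth = [1, 1, 1, 0, 0, 0]
outputs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
predictions = [to_label(v) for v in outputs]
score = balanced_accuracy(truth, predictions)
```

In this example two of three leak RPs and two of three background RPs are classified correctly, so the score is 2/3. Because both classes are weighted equally, the metric stays meaningful when the numbers of leak and background RPs differ.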

RESULTS AND DISCUSSION
The RPs of background noise (no water leaks) in Figure 4 are close to white noise and their shapes tend not to have regular features (weak deterministic properties). In contrast, it can be qualitatively determined that the RPs of water leak sounds exhibit shapes with regular features, such as a mesh or honeycomb (strong deterministic properties).
On the other hand, the difference between the RPs in [...]
However, in practice, the exact leak point is not known, and so the sensor is likely to be placed further away. Furthermore, it is also likely to be difficult for a person to detect the characteristics of RPs. Therefore, this study used the CNN model to develop a leak detection technique.
Of course, since the amount of leakage in Area 9 is 0.001 L/min, which is the lowest value among all areas, the low recognition accuracy is not only caused by the material of the pipeline. It was also found that the balanced accuracy of the non-metallic group, Areas 5, 6, 7 and 8, which experienced some leakage, did not exceed 90% throughout all training processes. This trend was confirmed when looking at the average balanced accuracy in the verification process in Table 2, which lists the test results when using the trained model. In addition, since the balanced accuracy in Table 2 [...]
[...] a model for detecting the presence of water leaks using a CNN was constructed by using the visualised images as training data.
The major findings were as follows.
(1) By expressing background noise (no water leaks) and water leak sounds using RPs, the difference in the deterministic properties of both sounds was clarified.
(2) It was found that the background [...]
Regarding the direction of future research, the leak detection technique proposed here is not yet complete and ready for application, for the following reasons. Data from the same location were used for training and verification. The hold-out method was used for verifying the proposed model using data that were different from the data used for training. However, all the data were generated in one location: during training, the sounds recorded in one area were also used to verify the performance of the model for the same area. Accordingly, the model may have suffered from overfitting, where training was concentrated on only one sample. As a result, a model trained in one area may not attain the same high recognition accuracy when used in another area. This problem can be addressed by performing CV. The detection model will be improved through CV in a future study.
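One way such a CV could be organised is sketched below. The text only states that CV will be performed; grouping the folds by area, so that each area is held out in turn, is an assumption made for this illustration, and the function name and the area labels are hypothetical.

```python
def leave_one_area_out(area_ids):
    """Yield (held_out_area, train_indices, test_indices) for each distinct area.

    area_ids[i] is the area in which sample i was recorded.
    """
    for held_out in sorted(set(area_ids)):
        train = [i for i, a in enumerate(area_ids) if a != held_out]
        test = [i for i, a in enumerate(area_ids) if a == held_out]
        yield held_out, train, test


# Hypothetical labels: the area each RP came from (two RPs per area).
area_of_each_rp = [1, 1, 2, 2, 3, 3]
folds = list(leave_one_area_out(area_of_each_rp))
```

Each fold trains on every area except one and tests on the area left out, so the test score reflects how the model behaves at a location it has not seen.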

DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.