ABSTRACT
Aging urban water supply pipelines are facing significant leakage issues. Ground penetrating radar (GPR) is an effective non-destructive method for leakage localization, but its image analysis mainly relies on interpreter experience, leading to uncertainty and inefficiency. To address this issue, an unsupervised learning method based on GPR data attributes is proposed for automatic leakage detection. First, velocity and energy attributes are extracted from B-Scan profiles to differentiate wet areas caused by leakage from normal areas. Then, the density-based spatial clustering of applications with noise (DBSCAN) method is applied to automatically classify these two types of areas. The proposed method was evaluated on an experimental platform with constant water pressure and pre-drilled leakage points. Experiments show that the velocity and energy attributes decrease by 25.2% and 26.6%, respectively, after leakage. Using velocity attributes yields better results, and suggestions for DBSCAN hyperparameters are provided. Feature importance analysis indicates that two-way-travel-time, with a score of 46.1, and the energy attribution index, with a score of 32.7, significantly influence leakage identification. The method can also estimate the affected leakage area with an error not exceeding the spacing of three measurement lines, and it is applicable across various underground conditions without needing well-labeled training data.
HIGHLIGHTS
An unsupervised pipeline leak detection method for ground penetrating radar (GPR) data is proposed.
The velocity and energy attributes of GPR profiles are extracted for classification.
Suggestions for hyperparameter selection in density-based spatial clustering of applications with noise clustering are provided.
The estimated extent of the leakage area deviates from the excavated extent by no more than three survey-line spacings on either side.
Accurate identification of leakage areas is achieved with small samples.
INTRODUCTION
Water supply pipelines are among the most important underground facilities in a city, and their aging and corrosion become increasingly severe over time, which may lead to leakage or bursts. According to the water distribution network bulletin, the global fresh water loss rate ranges from 5% (in European countries) to 50% (in developing countries), with 30% of the losses attributable to pipeline leakage (Plath et al. 2014; De Coning & Mouton 2020; Adraoui et al. 2024). The occurrence and development of underground pipeline leakage are inevitable. If not detected and addressed promptly, prolonged water erosion will cause further damage to the pipeline itself and the surrounding environment. Therefore, to minimize water loss, it is essential to use non-destructive testing techniques to detect early leaks and accurately locate defects.
In current water distribution systems (WDS), various pipeline inspection devices or techniques are used. These advanced devices or techniques can be classified into three categories based on their detection principles: vision-based, acoustic and vibration-based, and electromagnetic and radio frequency-based (Hao et al. 2012; Romero-Ben et al. 2023). Vision-based devices such as closed-circuit television (Wang et al. 2021) and sewer scanner evaluation (Haurum & Moeslund 2020) obtain health status by capturing images along the pipeline. They are widely used in drainage pipes but their application in water supply pipelines is limited by pipe diameter and water pressure, and they have drawbacks such as being destructive, affecting water quality, being difficult to operate, and having high application costs. Acoustic and vibration-based devices, such as leak noise correlators, listening sticks, and sonar, use acoustic emission as the core technology (Brennan et al. 2019). These devices are also widely used but are sensitive to environmental noise and are unsuitable for non-metallic pipelines, deep pipelines, or cases with small leakage points. Electromagnetic and radio frequency-based methods include magnetic flux leakage (Ege & Coramik 2018), eddy current, water mirror, and electromagnetic-based ground penetrating radar (GPR). The combination of multiple devices to overcome the limitations of single-device applications has also emerged as a new trend (Cataldo et al. 2017; Hu et al. 2023; Liu et al. 2024), providing higher detection efficiency and accuracy.
GPR is increasingly valued for leak detection in WDS due to its high resolution and ease of operation. Compared with the most advanced technologies, GPR is less limited by application scenarios, and by selecting the appropriate antenna frequency, it can identify pipelines or leaks of any material and depth. Numerous field studies have confirmed the feasibility of using GPR to detect pipeline leaks. Initially, Tong (1993) evaluated the impact of various pipeline materials on pipeline localization. Hunaidi & Giamou (1998) conducted field tests and first proposed that leaking water could wash away the soil around the pipeline, forming voids. Leakage can significantly change the dielectric constant of the medium and slow down the propagation speed of electromagnetic waves. Cheung & Lai (2019) noticed that considerable reverberation appears in the radargrams when the leaked water spreads downward, and that such multiple reflections can serve as a fingerprint of leakage. Scaled-down laboratory tests with higher central frequency GPR and smaller setups have also been employed to investigate the potential for detecting leakage. Perturbation patterns caused by leakage in a metallic and a PVC buried pipe were studied by Lai et al. (2016), and the signatures of leakage in different time periods after leakage were identified. Amran et al. (2018) observed the GPR patterns collected with leakage point diameters of 1/4, 1/2, and 3/4, respectively, and found that a smaller hole corresponds to larger water pressure and more reverberation in the radargrams. Numerical modeling can also be used to study how leakage alters GPR patterns. Dong et al. (2012) proposed a finite element model to simulate an artificial leakage and found that the distinct hyperbola of the pipe in the GPR profile was disturbed by saturated soil.
In addition to basic signal processing methods to clearly display reflections of geological anomalies, advanced technologies, such as microwave tomographic inversion (Guo et al. 2023), 3D models based on interpretation of GPR images (Ocaña-Levario et al. 2015; Feng et al. 2023), and variance filters (Ocaña-Levario et al. 2018), have also been introduced to improve leak signal imaging or locate leaks in 3D space.
Based on the analysis results of the above-mentioned techniques, highly skilled operators can promptly identify anomalous signals and assess their relevance to actual pipeline leaks. However, for most ordinary operators, identifying leak-related anomalous signals from a large number of GPR images is still challenging. Even experienced experts can sometimes make false positive and false negative errors due to subjective judgment or prolonged visual fatigue. Therefore, the intelligent interpretation of GPR data based on machine learning is gaining increasing attention. This involves using methods such as parabolic fitting or Hough transform (HT) to identify hyperbolic reflection signals in B-Scan images and accurately calculate the target diameter and position (Zhou et al. 2018; Kumar et al. 2024). Attribute analysis techniques (Ayala-Cabrera et al. 2013) are then used to extract key indicators for classification, followed by classification and prediction of leak and normal signals using algorithms such as support vector machines, Gaussian classification, and decision trees (Liu et al. 2020; Zhao et al. 2023). Moreover, deep learning algorithms capable of automatic feature extraction can be used for leak detection, such as one-dimensional convolutional neural networks for signal recognition and two-dimensional convolutional neural networks for B-Scan image recognition.
Currently, intelligent methods based on machine learning or deep learning for identifying anomalies in radar images have been successfully implemented in tasks such as concrete pavement damage detection, tunnel lining inspection, and moisture damage detection (Zhang et al. 2020; Qin et al. 2021; Xu et al. 2021). However, these methods typically require a large amount of labeled data to train the models and face three typical challenges in water pipeline leak detection tasks: (1) due to the significant variation in leakage characteristics under different overburden conditions, once the trained model is applied to a new environment, the recognition accuracy may decrease due to insufficient generalization. (2) The downshifting of the pipeline reflection signal caused by leakage can easily be confused with other pipelines, affecting identification accuracy. (3) Most importantly, the amount of data for leakage signals may be very small. To address these issues, this paper proposes an unsupervised learning method based on GPR data attributes and density-based spatial clustering of applications with noise (DBSCAN) clustering. First, velocity and energy attributes are introduced to quantitatively describe the radar profiles. Then, the DBSCAN clustering algorithm, which does not require labeled data, is used to detect anomalies based on the attribute differences between a set of adjacent radar profiles, and to intelligently cluster survey lines with similar characteristics into dry or wet areas. Combined with feature importance analysis methods, the proposed method was validated on a test platform and in real field conditions. It successfully identified leaks in pipelines with known construction locations based on a small amount of field data.
The structure of this paper is as follows: Section 2 provides an explanation of all the methodologies involved in this study, Section 3 provides a specific description of the experimental setup and data preprocessing, Section 4 presents the experimental results and related discussions, and the conclusion is presented in Section 5.
METHODOLOGY
GPR inverts the subsurface structure according to the electrical differences in underground media and displays the corresponding reflection signals on profile images. Water pipeline leaks cause the underground media to become heterogeneous, and the GPR profiles affected by leaking water are characterized by phase axis discontinuities, deflection wave distortions, and changes in reflection wave frequency. In wet areas, various anomalous reflection signals appear in the radar profile. These signals are distinctly different from unaffected signals, and the leak location can be determined by comparing adjacent survey line profiles. The fundamental reason for this phenomenon is that the dielectric constant of the soil around the pipeline is altered by water, and the electromagnetic wave velocity and reflection energy closely related to the dielectric constant also change.
This paper introduces velocity and energy attributes to quantitatively describe the GPR profile, and extracts the two attributes using random Hough transform (RHT) and statistical calculation methods, respectively. Once these two attributes are obtained, DBSCAN can be used to cluster the area to which the survey line belongs based on the density of data points.
Velocity attribute and random Hough transform
In the hyperbolic travel-time model of the pipeline reflection, x0 is the horizontal distance between the coordinate origin and the pipeline target, t0 is the two-way-travel-time of the hyperbolic vertex when the GPR is located directly above the pipeline, and v is the speed at which electromagnetic waves travel in the soil medium.
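For reference, a commonly used point-reflector travel-time model that is consistent with these symbols can be written as follows. It is a sketch under the assumption that the pipe radius is neglected, and it may differ in form from the equation used by the authors. The symbols x (antenna position along the survey line) and t (recorded two-way-travel-time) are introduced here for illustration.

```latex
t(x) = \sqrt{t_0^{2} + \frac{4\,(x - x_0)^{2}}{v^{2}}}
\quad\Longleftrightarrow\quad
t^{2} = \frac{4}{v^{2}}\,x^{2} - \frac{8 x_0}{v^{2}}\,x + \left(t_0^{2} + \frac{4 x_0^{2}}{v^{2}}\right)
```

The second form is linear in the three coefficients of x², x, and 1, so three points on a hyperbola are enough to solve for x0, t0, and v under this model.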
(1) Filtering: filter out the clutter and highlight the hyperbolic signal of the pipeline.
(2) Image binarization: set a fixed threshold and binarize the B-scan image. The threshold is generally set to 128. Pixels with gray values greater than the threshold are set to white, while those below the threshold are set to black.
(3) Opening operation: opening is a filter based on geometric (morphological) operations. An erosion operation is first performed on the image, followed by a dilation operation. This process removes isolated points and burrs without changing the main features of the image.
(4) Sample weighted difference: a sample weighted difference is performed on the grayscale values. Values greater than 255 are then set to 0 (black pixel) and values less than 255 are set to 255 (white pixel), where Di denotes the grayscale of any point after the weighted difference. This process preserves the structural information of the original reflection signal while thinning the lines, thereby reducing the number of data points and the computational complexity of RHT. The calculation process of RHT is shown in Figure 2.
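The idea behind RHT can be illustrated with a minimal sketch. It assumes the point-reflector model t² = t0² + 4(x − x0)²/v² (pipe radius neglected), which is an assumption made here rather than a form confirmed by the source. Three randomly chosen edge points determine one candidate (x0, t0, v), and candidates vote in a quantised accumulator. The function names, the bin sizes in `res`, and the synthetic data are all hypothetical.

```python
import math
import random
from collections import Counter


def solve_3x3(A, y):
    """Solve a 3x3 linear system with Cramer's rule; return None if singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(A)
    if abs(d) < 1e-12:
        return None
    sol = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = y[r]
        sol.append(det(m) / d)
    return sol


def hyperbola_from_points(pts):
    """Recover (x0, t0, v) from three (x, t) samples.

    Assumed model: t^2 = a*x^2 + b*x + c with a = 4/v^2, b = -8*x0/v^2,
    c = t0^2 + 4*x0^2/v^2.
    """
    A = [[x * x, x, 1.0] for x, _ in pts]
    y = [t * t for _, t in pts]
    sol = solve_3x3(A, y)
    if sol is None:
        return None
    a, b, c = sol
    if a <= 0:
        return None
    t0_sq = c - b * b / (4 * a)
    if t0_sq <= 0:
        return None
    return -b / (2 * a), math.sqrt(t0_sq), 2 / math.sqrt(a)


def randomized_hough(points, n_iter=2000, res=(0.05, 0.1, 0.002), seed=0):
    """Vote for quantised (x0, t0, v) cells and return the winning cell centre."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_iter):
        params = hyperbola_from_points(rng.sample(points, 3))
        if params is None:
            continue
        votes[tuple(round(p / r) for p, r in zip(params, res))] += 1
    if not votes:
        return None
    best, _ = votes.most_common(1)[0]
    return tuple(k * r for k, r in zip(best, res))


# Synthetic example (made-up values): x0 = 1.0 m, t0 = 12.0 ns, v = 0.10 m/ns,
# plus a few clutter points that do not lie on the hyperbola.
true_x0, true_t0, true_v = 1.0, 12.0, 0.10
pts = [(0.05 * i, math.sqrt(true_t0 ** 2 + 4 * (0.05 * i - true_x0) ** 2 / true_v ** 2))
       for i in range(41)]
pts += [(0.3, 5.0), (1.7, 30.0), (0.9, 20.0), (1.2, 3.0), (0.1, 14.0)]
estimate = randomized_hough(pts)
```

Because each vote needs only three points, thinning the binarized lines as described in step (4) directly reduces the number of candidate triples that have to be sampled.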
Energy attribute
DBSCAN clustering
DBSCAN is a density-based unsupervised clustering algorithm (Ester et al. 1996) that, unlike methods such as K-means, does not require the number of clusters to be specified in advance, making it highly suitable for situations where multiple parallel pipelines with varying leakage conditions may exist underground. In cases where the soil or underground medium is heterogeneous, the leakage patterns of pipelines may exhibit different characteristics. DBSCAN can detect clusters of arbitrary shapes, not limited to spherical or elliptical forms, thus effectively identifying leaks. Additionally, DBSCAN is insensitive to noise, and by selecting appropriate hyperparameters, noise can be removed without being classified into any clusters, making it suitable for GPR data classification in complex environments.
DBSCAN represents the distribution and aggregation of samples through two parameters: the neighborhood radius ε and the minimum number of points MinPts within the neighborhood. The number of objects obtained within the ε-neighborhood is defined as density. When the density exceeds a specified threshold, all these objects form a cluster. Taking the set D(x1, x2,…, xm) as an example, the clustering process can be defined as follows:
(1) ε-neighborhood: the ε-neighborhood of an object xi is the set of objects in D whose distance from xi does not exceed ε.
(2) Core object: an object whose ε-neighborhood contains at least MinPts other objects is called a core object.
(3) Directly density-reachable: if an object xi is within the ε-neighborhood of xj, and xj is a core object, then xi is directly density-reachable from xj.
(4) Density-reachable: given a chain of objects p1, p2,…, pn in which each pi+1 is directly density-reachable from pi, if p1 = xi and pn = xj, then object xj is density-reachable from object xi.
(5) Density-connected: for objects xi and xj, if there is a core object xk such that both xi and xj are density-reachable from xk, then xi is density-connected to xj.
DBSCAN randomly selects an unclassified object and searches its ε-neighborhood. If the number of points in the ε-neighborhood is less than MinPts, the object is marked as a noise object; otherwise, a new cluster is started from it and expanded through density-reachable objects. Clustering is complete once all core objects have been marked. Objects that are density-connected are assigned to the same cluster, so the dataset is divided into one or more categories.
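The definitions above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the authors' code. It makes two assumptions: the ε-neighborhood count includes the object itself (the convention used by common libraries), and objects are visited in index order rather than at random. The sample points are made-up normalized (two-way-travel-time, velocity) pairs.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be re-labelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core object, keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical normalized attribute pairs: four "dry" profiles, three "wet"
# profiles, and one isolated profile between the two groups.
samples = [(0.17, 0.68), (0.15, 0.70), (0.20, 0.65), (0.18, 0.72),
           (0.93, 0.08), (0.90, 0.10), (0.95, 0.05),
           (0.55, 0.40)]
labels = dbscan(samples, eps=0.25, min_pts=3)
```

With these made-up points, the two dense groups form separate clusters and the isolated point is left as noise, which mirrors how a boundary survey line can remain unassigned.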
The advantage of using DBSCAN for pipeline leakage identification is that unsupervised learning has no dependence on labels and is well suited to processing large-scale data. Once samples are obtained, the analysis can proceed immediately without a labeling step, which reduces the data cost compared with supervised learning. Moreover, it automatically extracts key features through clustering without the need for manual selection, thereby reducing the influence of subjective factors. Furthermore, the proposed method is considerably more environmentally friendly than excavation methods. It differs from existing equipment-based methods (such as infrared imaging, which is unsuitable for cold water pipes, and electromagnetic induction, which is not suitable for non-metallic pipes) in its strong adaptability and robustness to complex environments and different types of pipelines. Compared to methods that solely use GPR, the proposed approach combines the DBSCAN unsupervised clustering algorithm with GPR to enable rapid detection, eliminating the need for experience-based human judgment to discern leakage conditions and delineate the range of leakage areas.
DATA SOURCES
Experimental setup
The survey lines are arranged in an orthogonal grid as shown in Figure 3(c), with 35 GPR survey lines parallel to the ductile iron pipe, referred to as X-n (n = 1, 2, …, 35), and 25 GPR survey lines perpendicular to the ductile iron pipe, referred to as Y-m (m = 1, 2, …, 25). The distance between two adjacent lines is 0.1 m. A 5 mm diameter leak hole was pre-drilled at the intersection of survey lines X-18 and Y-17, and the other end of the ductile iron pipe was sealed to ensure that water could only leak out from the hole. Data collection was carried out using the MALA ProEx Basic system and a 500 MHz shielded antenna. Each trace consisted of 256 sampling points, stacked 4 times at a sampling frequency of 7,000 MHz.
Data collection and preprocessing
Data were collected before and after the leakage, resulting in a total of 120 GPR survey lines. The dense grid arrangement of the survey lines was designed to comprehensively observe and demonstrate the changes in GPR data before and after the leakage. Since this study focuses on pipelines with known positions and depths, in field leakage detection tasks GPR data are collected once longitudinally along the pipeline and then 3–5 more times along lines parallel to the pipeline at 0.1 m intervals. With the GPR moving at 1.0 m/s, a single pass along a 1-km-long pipeline takes 16–17 min, so field collection is faster and less dense than in the laboratory.
Before collecting data without leakage, soil sampling was conducted to ensure that the ground medium was completely dry. After the first phase of data collection without leakage, the valve opening was increased and the pressure gauge reading was maintained at a stable 0.35 MPa. The leakage lasted for 90 min to ensure that the affected area had fully expanded, and data collection was then performed. After collection, the valve was closed and the soil was excavated layer by layer while the survey line numbers within the wet soil area were recorded as the manual leakage detection results. Once the overlying soil was completely dry, it was backfilled layer by layer, with the site compacted every 0.2 m of soil. This collection process was repeated three times.
The raw signals were preprocessed by Dewow, subtract direct current (DC) shift, time zero correction, and average xy-filter successively. Dewow and DC correction are used to adjust waveform drift. Time zero correction eliminates invalid signals and corrects the position of surface waves. Average xy-filter suppresses pulse interference and smooths the data. Since the pipeline reflection signals are distinguishable, traditional gain processing was not implemented to avoid signal distortion.
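Two of these steps can be sketched on a single trace as follows. This is a simplified illustration, not the filter implementation of any particular processing package. The 31-sample dewow window and the function names are assumptions, and time zero correction and the average xy-filter are omitted.

```python
def remove_dc_shift(trace):
    """Subtract the trace mean so the signal oscillates around zero."""
    mean = sum(trace) / len(trace)
    return [s - mean for s in trace]


def dewow(trace, window=31):
    """Subtract a centred running mean to suppress slow, low-frequency drift.

    The window length (in samples) is an assumed value; the window is
    truncated at both ends of the trace.
    """
    half = window // 2
    out = []
    for i, s in enumerate(trace):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        out.append(s - sum(trace[lo:hi]) / (hi - lo))
    return out


# Toy trace with a constant offset of 5 (made-up values).
raw = [5.0, 6.0, 4.0, 5.0, 7.0, 3.0, 5.0, 5.0]
centred = remove_dc_shift(raw)
drift_free = dewow(centred, window=3)
```

Both operations are linear and leave the relative timing of reflections unchanged, which is consistent with the decision to avoid gain processing that could distort the signal.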
RESULTS
Artificial diagnostic results
Upon opening the valve, there was a leakage under the pipeline water pressure, resulting in changes in the profiles from Y-13 to Y-21. The hyperbolic curves of the pipeline reflection signals moved significantly downward, particularly for the survey line Y-17 corresponding to the leakage point, where the shape of the hyperbolic curves became distorted and asymmetrical compared to before the leakage. This is attributed to the higher dielectric constant and conductivity of the moist soil, which polarizes and delays the reflected waves. The scouring effect of water also altered the soil distribution, causing the subsurface medium to no longer be homogeneous and resulting in echo reflection waves. This change occurred gradually along the 25 survey lines in the Y direction. As shown in Figure 4(b), the further the survey line is from the leakage point, the smaller the change in radar reflection patterns. When the measuring line is 0.8 m away from the leakage point, corresponding to measuring line Y-9, there is almost no difference in the image before and after the leakage. Through hyperbolic curve fitting, the electromagnetic wave velocity after the leakage was approximately 0.076 m/ns, which decreased by 27.6% compared to before the leakage. Another change was the interruption of the horizontal stripes around 5 ns in Y-17, replaced by several distinct reflection waves. This phenomenon occurred only at the survey line corresponding to the leakage point, possibly indicating reflection signals from a sinkhole.
Figures 4(c) and 4(d) present radar profiles of X-12 to X-24, totaling 7 survey lines. In Figure 4(c), continuous black stripes are observed at 11 ns on the vertical axis for X-16, X-18, and X-20 (highlighted in red boxes), indicating reflection signals from the ductile iron pipe that are absent in other survey lines. In Figure 4(d), the black stripes of X-16, X-18, and X-20 show discontinuous downward shifts (highlighted in red boxes). This is because the soil moisture content around the leakage point decreases with increasing distance. Furthermore, bending of stripes is observed at X-14 and X-22, indicating influence from leaking water flow. Anomalous diffraction waves are present from 7 to 10 ns in X-20, possibly indicative of cavities formed by erosion from leaking water flow (highlighted in blue boxes).
Comparing radar images before and after the leakage, significant differences are evident between the two measurement results of Y-13 to Y-21 and X-14 to X-22. Thus, these areas are confirmed to be influenced by leaking water flow. The on-site excavation results are shown in Figure 4(e). The leakage area is centered around the intersection of Y-17 and X-18, forming a hollow area with a radius of about 0.2 m and a moist area with a radius of about 0.4 m. To avoid redundant description, subsequent experimental analyses focus exclusively on survey line Y-m.
Cluster results
Velocity clustering
Subsequently, both the two-way-travel-time and velocity were normalized to the range of 0 to 1. After performing DBSCAN analysis with parameters ε = 0.25 and MinPts = 3, the 25 profiles were classified into two clusters, as shown in Figure 5(b). Each cluster exhibited data points with similar two-way-travel-times and velocities. The cluster center of Category 1 was (0.170, 0.678), corresponding to a two-way-travel-time of 11.49 ns and an electromagnetic wave velocity of 0.107 m/ns. The cluster center of Category 2 was (0.932, 0.080), with a two-way-travel-time of 15.95 ns and an electromagnetic wave velocity of 0.080 m/ns, indicating a velocity decrease of 25.2%. Category 1 corresponded to profiles in dry areas, and the electromagnetic wave velocity at its cluster center is similar to the manual detection results. In contrast, the significantly reduced electromagnetic wave velocity at the center of Category 2 indicated profiles from moist areas. Additionally, Y-14 was labeled as noise and was not assigned to either category. Based on excavation results after the experiment, Y-14 was found to be located precisely at the moisture boundary, making it difficult to determine whether this profile belonged to a leaking or non-leaking area.
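The normalization and the reported velocity decrease can be checked with a short sketch. The source states only that attributes are scaled to the range 0 to 1, so min-max scaling is an assumption here, and the function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def relative_change_percent(before, after):
    """Signed change from `before` to `after`, in percent of `before`."""
    return (after - before) / before * 100.0


# Cluster-centre velocities reported in the text (m/ns).
velocity_change = relative_change_percent(0.107, 0.080)

# Made-up two-way-travel-times (ns), only to show the scaling.
scaled = min_max_normalize([11.0, 13.5, 16.0])
```

Scaling both attributes to a common range matters because DBSCAN uses a single ε for all dimensions, so an attribute with a larger numeric range would otherwise dominate the distance.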
Energy clustering
Similarly, taking the mean energy Ē and EAI of the GPR profiles as the calculation objects, the results before and after leakage are plotted as functions of the survey line, as shown in Figure 5(c). The curves of Ē and EAI before leakage fluctuate at 380 M ± 50 M and 170 M ± 30 M, respectively. After leakage, the Ē curve exhibits pronounced fluctuations, with the lowest point at Y-18 and the highest point at Y-23, while the EAI curve shows intense reflection energy at Y-16. Near the leakage points, the soil is saturated with water, resulting in significant fluctuations in reflection energy, contrasting sharply with other dry areas. After leakage, there is an enhanced reflection of electromagnetic wave energy. Energy deviates from the mean in Y-13 to Y-18, attributed to downward diffraction waves, thus highlighting the spherical spreading effect.
The normalized energy attributes were analyzed using DBSCAN, and the results are shown in Figure 5(d). The 25 GPR patterns were classified into two clusters. Each cluster contains data points with similar mean energy and energy attributes. The first cluster center is located at (0.297, 0.699), corresponding to an Ē of 601.8 M and an EAI of 168.4 M, characterized by weak energy reflection and balanced energy distribution. The second cluster center is located at (0.828, 0.156), with an Ē of 442.0 M and an EAI of 297.5 M, representing a decrease of 26.6% and an increase of 76.7%, respectively. It exhibits strong energy reflection and unstable energy distribution characteristics. Considering the experimental environment, the first group includes survey lines Y-1 to Y-12 and Y-19 to Y-25, which are far from the leakage points, while the second group comprises survey lines Y-13 to Y-18, which are close to the leakage points. The results indicate that DBSCAN can classify soil areas based on energy attributes.
Threshold selection
DBSCAN includes two important parameters, ε and MinPts. ε determines the distance of attribute values, while MinPts determines the minimum number of survey lines in a cluster. Setting different thresholds for these two parameters will yield different classification results. Larger MinPts and smaller ε imply stricter classification criteria. In other words, only when the number of highly similar survey lines meets the threshold can they be clustered together, while the rest are considered noise data and ignored.
Discussions
Experimental results demonstrate that the proposed method effectively separates wet and dry areas and correctly identifies leakage areas with an error margin of less than 0.5 m. However, using different attributes as classification criteria yields slightly different results, with some differences observed in determining wet area boundaries. Manual interpretation identifies Y-13 to Y-21 as the leakage area, while the classification based on velocity attributes places the leakage area between Y-15 and Y-19, and according to energy attributes, it is between Y-13 and Y-18. Therefore, the unilateral error in estimating the range of the leakage area does not exceed three survey lines. From the perspective of the regional center survey line, the velocity-based classification results are closer to reality. This does not imply that velocity attributes are suitable for leakage detection in any environment. For example, electromagnetic waves experience severe attenuation when propagating through water media, resulting in weak reflection wave energy that prevents the identification of hyperbolas, thus making it difficult to obtain velocity characteristics through hyperbola fitting. Additionally, velocity attributes are challenging to obtain from profiles in the X-direction. In such cases, choosing energy attributes rather than velocity attributes can yield more effective results.
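The unilateral error and the center-line comparison quoted above follow from simple arithmetic on the survey line ranges, as the sketch below shows. The line ranges and the 0.1 m spacing come from the text; the function names are hypothetical.

```python
LINE_SPACING_M = 0.1  # distance between adjacent survey lines in the test platform


def unilateral_errors(reference, estimate, spacing=LINE_SPACING_M):
    """Per-side boundary error between two (first_line, last_line) ranges."""
    start_err = abs(estimate[0] - reference[0])
    end_err = abs(estimate[1] - reference[1])
    return {
        "lines": (start_err, end_err),
        "metres": (round(start_err * spacing, 3), round(end_err * spacing, 3)),
    }


def centre_line(line_range):
    """Mid-point of a (first_line, last_line) range, in survey line numbers."""
    return (line_range[0] + line_range[1]) / 2


manual = (13, 21)    # Y-13 to Y-21, manual interpretation
velocity = (15, 19)  # Y-15 to Y-19, velocity-based clustering
energy = (13, 18)    # Y-13 to Y-18, energy-based clustering

velocity_err = unilateral_errors(manual, velocity)
energy_err = unilateral_errors(manual, energy)
```

Under this reading, the velocity-based range is off by two lines on each side but shares its center (Y-17) with the manual result, whereas the energy-based range is exact on one side, off by three lines on the other, and centered at 15.5.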
The sensitivity of classification can be adjusted by modifying ε and MinPts. According to experimental findings, radar images collected before and after leakage often exhibit approximately 25% differences in velocity or energy attributes. Specifically, the velocity decreased by 27.6% in the manual detection results. In the proposed method, the velocity decreased by 25.2%, and the mean energy decreased by 26.6%. Moreover, under conditions of high-density line arrangements, radar images with anomalous leakage signals are not isolated. Therefore, setting ε to 0.25 and MinPts to 3 can achieve good classification results. These suggestions are not absolute, and in practical leak detection applications, slightly adjusting the thresholds may achieve better classification results.
Feature importance ranking
The importance of each feature was evaluated using violin plots and the Shapley additive explanations (SHAP) method (Lundberg & Lee 2017). Violin plots display the distribution and probability density of data, where the x-axis represents sample states including normal and leakage, and the y-axis represents feature values. In violin plots, the white dot represents the median of the feature values, the thick black lines denote the upper and lower quartiles, the thin black lines represent the 95% confidence interval, and the outer shape of the violin outlines the kernel density estimation, reflecting the distribution probability. The SHAP method is a decomposition algorithm used to explain the predictions of a machine learning model. It is based on Shapley values from cooperative game theory, aiming to provide a score for each feature that indicates the magnitude of its impact on a specific prediction, thus quantifying the contribution of each feature to the model.
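The Shapley value underlying SHAP can be made concrete with a small exact computation. This sketch is a toy illustration of the game-theoretic definition only. It is not the authors' model or the SHAP library, which relies on approximations for realistic feature counts. The feature names and the payoff numbers in `toy_value` are made up.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value(frozenset) -> float`.

    Each feature's score is its marginal contribution averaged over all
    coalitions of the other features, weighted by |S|! (n - |S| - 1)! / n!.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: two additive terms plus one interaction term."""
    score = 0.0
    if "twtt" in coalition:
        score += 3.0
    if "velocity" in coalition:
        score += 1.0
    if "twtt" in coalition and "eai" in coalition:
        score += 2.0  # interaction, shared equally between the two features
    return score


phi = shapley_values(["twtt", "eai", "velocity"], toy_value)
```

In this toy game the scores sum to the payoff of the full feature set, which is the property that lets such scores be read as each feature's share of a prediction.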
Figure 8 illustrates the distribution of data for survey line Y-m. Among the features, EAI, electromagnetic wave propagation velocity, and two-way-travel-time of hyperbolic signal vertices exhibit significant distribution differences between the two sample classes. This is characterized by minimal overlap in intervals and the presence of a threshold that can distinguish between leakage and non-leakage survey lines. For instance, in Figure 8(c), EAI values for normal samples range from 1.3 to 1.5 M, while for leakage samples, they range from 2.2 to 2.8 M, indicating more pronounced energy fluctuations in survey lines affected by leakage water. In Figure 8(e), the median of normal samples is 0.111 m/ns, while the median of leakage samples is 0.083 m/ns, a decrease of 25.2%. In Figure 8(f), the two-way-travel-time for normal samples ranges from 10.8 to 11.4 ns with a concentrated data distribution and narrow confidence interval, whereas for leakage samples, it ranges from 13.8 to 16.0 ns with a dispersed data distribution and wider confidence interval, indicating a noticeable lag.
However, in Figure 8(a) and 8(b), there is considerable overlap in intervals between the two sample classes, particularly with normal sample energy averages appearing approximately uniformly distributed, making it difficult to differentiate between the two classes based on energy mean attributes. In Figure 8(d), there is no significant difference in the frequency distribution range between the two sample classes, which both range from 280 to 320 MHz. The distribution of normal samples is more concentrated, which also indicates that the data similarity of each survey line before leakage is high. Analysis of each feature using the SHAP method ranks their importance as shown in Table 1, with two-way-travel-time > EAI > velocity > mean energy > main frequency > mean amplitude. Two-way-travel-time is highly correlated with electromagnetic wave velocity, validating the effectiveness of using velocity and energy attributes for classifying leakage areas.
Features | Two-way-travel-time | Velocity | Main frequency | EAI | Mean energy | Mean amplitude |
---|---|---|---|---|---|---|
Score | 46.1 | 11.2 | 4.8 | 32.7 | 6.9 | 4.4 |
Field tests
Data collection was conducted using the same equipment and parameters as described in Section 3. The initial survey line is 1 m away from the valve, and each subsequent survey line is 0.3 m apart, totaling 14 survey lines. All survey lines intersected the pipelines perpendicularly with a length of 8 m and are named YT-p (p = 1, 2, …, 14). RHT was used to solve for the velocity of each profile and the two-way-travel-time of the hyperbolic vertex, and DBSCAN was then used to cluster these features. Based on prior experience, clustering was performed with ε set to 0.25 and MinPts set to 4, yielding the results shown in Figure 9(b). A total of 42 hyperbolas were obtained from 14 survey lines. Except for 3 hyperbolas that were considered noise, the remaining 39 hyperbolas were classified into 4 categories. Among them, Category 3 corresponds to the steel pipe, with a two-way-travel-time of 10.25 ns and a velocity of 0.110 m/ns at the cluster center. Category 2 corresponds to the cast iron pipe, where YT-1 to YT-7 and YT-9 to YT-14 exhibit some differences but do not reach the set distance threshold and are hence classified into the same category. The two-way-travel-time of the cluster center is 12.91 ns, and the velocity is 0.098 m/ns. Category 1 and Category 4 correspond to the self-stressing pipe, with Category 1 having a two-way-travel-time of 17.07 ns and a velocity of 0.082 m/ns, and Category 4 having a two-way-travel-time of 15.00 ns and a velocity of 0.110 m/ns. The decrease in electromagnetic wave velocity after leakage is 25.5%. The significant decrease in velocity for Category 1 indicates an anomaly in the pipeline, with the moist area identified between survey lines YT-8 and YT-14 and YT-10 showing the greatest deviation. Therefore, it is speculated that survey line YT-10 of the self-stressing pipe is the leakage location.
After confirmation by the staff at the base, the valve of the self-stressing pipe was opened, and the leakage point was found to be 3.83 m from the valve, whereas YT-10 was 3.70 m from the valve. The difference of 0.13 m is smaller than the spacing between adjacent survey lines, which supports the generalization ability of the proposed method.
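The figures quoted in this field test can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the inputs are the survey geometry, the leak position, and the cluster-center velocities reported above.

```python
# Values taken from the field-test description above.
FIRST_LINE_OFFSET_M = 1.0  # YT-1 is 1 m from the valve
LINE_SPACING_M = 0.3       # spacing between adjacent survey lines
LEAK_OFFSET_M = 3.83       # confirmed leakage point, measured from the valve


def survey_line_offset(p):
    """Distance of survey line YT-p from the valve, in metres (hypothetical helper)."""
    return FIRST_LINE_OFFSET_M + (p - 1) * LINE_SPACING_M


def relative_decrease(normal, leaking):
    """Fractional drop of an attribute from the normal to the leaking state."""
    return (normal - leaking) / normal


yt10_offset = survey_line_offset(10)
localization_error = abs(LEAK_OFFSET_M - yt10_offset)

# Cluster-center velocities of Category 4 (normal) and Category 1 (wet), in m/ns.
velocity_drop = relative_decrease(0.110, 0.082)

# Survey line closest to the confirmed leak among YT-1 ... YT-14.
nearest_line = min(range(1, 15), key=lambda p: abs(survey_line_offset(p) - LEAK_OFFSET_M))

print(f"YT-10 offset: {yt10_offset:.2f} m")
print(f"Localization error: {localization_error:.2f} m")
print(f"Velocity decrease: {velocity_drop * 100:.1f}%")
print(f"Nearest survey line: YT-{nearest_line}")
```

With a 0.3 m line spacing, YT-10 is the survey line closest to the confirmed leak, so the residual offset is bounded by the sampling interval of the survey rather than by the clustering step.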
CONCLUSIONS
To address the issues with existing GPR data processing techniques, a method is proposed for extracting GPR profile attributes and performing unsupervised classification to identify water pipeline leaks. Compared to techniques that rely on equipment for detection under restrictive conditions, this method is not affected by the engineering environment and is not limited by factors such as pipeline material, diameter, and burial depth. As it does not require specifying the number or shape of clusters, the proposed method remains applicable in complex scenarios with multiple interfering targets and parallel pipelines, reducing the impact of subjective human factors. Additionally, it achieves accurate classification without large amounts of labeled training data, which removes the dependency of current supervised learning methods on labels and reduces data costs. The current progress of this research is summarized as follows:
(1) The velocity and energy attributes of GPR profiles can be used as indicators for pipeline leak detection, with velocity attributes generally yielding better classification results.
(2) The numerical differences in the velocity and energy attributes of the GPR profile before and after leakage are 25.2 and 26.6%, respectively. Based on this, it can be determined that ε for DBSCAN should be close to the magnitude of the attribute change, which is 0.25 in this case. Furthermore, based on the number of B-Scan images that may be affected at the boundary between the leakage and normal areas on a cross-section, it is determined that MinPts should be greater than 2, which is 3 in this case.
(3) The proposed method can also estimate the range of the leakage area, with a one-sided error of no more than the spacing of three measurement lines, which is 0.3 m in this study.
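The hyperparameter guidance in item (2) can be illustrated with a minimal sketch. Several things here are assumptions rather than the procedure used in this study: the feature rows are hypothetical values chosen to resemble the reported cluster centers, each feature is scaled by a dry reference value so that a 25% attribute change corresponds to a distance of about 0.25, and the small pure-Python DBSCAN stands in for a library implementation such as scikit-learn's `DBSCAN(eps=..., min_samples=...)`.

```python
import math

NOISE = -1


def relative_features(rows, reference):
    """Scale each feature by a reference value (assumed normalization, not from the paper)."""
    return [tuple(value / ref for value, ref in zip(row, reference)) for row in rows]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise."""
    n = len(points)
    # Each neighbourhood includes the point itself, so min_pts counts the point too.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point, keep expanding
    return labels


# Hypothetical (two-way-travel-time in ns, velocity in m/ns) rows for one pipe.
rows = [
    (15.00, 0.110), (14.90, 0.111), (15.10, 0.109), (15.05, 0.110),  # dry-like
    (17.07, 0.082), (17.00, 0.081), (17.15, 0.082), (16.95, 0.080),  # wet-like
    (22.00, 0.140),                                                  # isolated outlier
]
dry_reference = (15.00, 0.110)  # assumed reference for scaling

labels = dbscan(relative_features(rows, dry_reference), eps=0.25, min_pts=3)
print(labels)
```

In this toy data the dry-like and wet-like groups are separated by slightly more than 0.25 in the scaled feature space, so ε = 0.25 keeps them apart while MinPts = 3 rejects the isolated point as noise. A noticeably larger ε would merge the two groups, which is the behaviour the guidance on ε is meant to avoid.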
Machine learning techniques can also be integrated with other data sources, which is part of our ongoing work. For example, noise meters can be installed at valves or in sewer wells to monitor pipelines, and intelligent algorithms can analyze the noise data to identify leakage segments, followed by precise localization of leakage points using GPR. Despite promising progress, this study has some limitations. The selection of clustering features relies on prior experience. While the physical significance of the hyperparameters in the clustering method has been clarified, the values of ε and MinPts still need to be determined through preliminary experiments. Ambiguous wet boundaries and unknown pipeline information may cause multiple noise points to be misclassified as a single pipeline. The lack of effective non-destructive validation methods can also result in misjudgments regarding the layout of underground pipelines, increasing the excavation workload. Moreover, the chosen features are based on radar waveforms, so the areas represented by each category cannot be understood intuitively from the clustering results and further analysis is required. Future work should incorporate hydraulic models for feature extraction and further reduce human involvement in the identification process, which may lead to more accurate results.
ACKNOWLEDGEMENTS
We gratefully acknowledge the support from the Postdoctoral Fellowship Program of CPSF (Grant No. GZC20241517), the National Key Research and Development Program (Grant No. 2022YFF06069003-03), and the ZJU-ZCCC Institute of Collaborative Innovation (Grant No. ZDJG2021009).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.