ABSTRACT
Urban water supply and drainage systems are a crucial component of urban infrastructure, directly affecting residents' livelihoods and industrial production. The normal operation of water supply and drainage pipelines is of great significance for conserving water resources and preventing water pollution. However, owing to characteristics such as deep burial, diverse materials, and extensive lengths, defect detection is exceptionally complex. Traditional detection methods used in practice, such as ground excavation and destructive testing, typically require shutting down the pipeline; this process is time-consuming and labor-intensive, often resulting in significant economic losses. This paper proposes an effective technique for detecting defects in water supply and drainage pipelines. The method captures images of the inner walls of the pipelines and then applies an artificial intelligence large-scale model (grounded language-image pre-training, GLIP) together with a You Only Look Once version 5 (YOLOv5) model to detect defects within them. The experimental results show that GLIP demonstrates impressive detection performance in zero-shot scenarios, while YOLOv5 performs well on the existing dataset. By combining the two models, we achieve a balance between fast, flexible detection and high precision, making the approach both practical and efficient for real-world applications.
HIGHLIGHTS
This study combines GLIP and YOLOv5 for efficient and accurate pipeline defect detection.
It applies zero-shot learning for rapid defect identification without prior training.
It provides non-destructive testing to minimize environmental impacts.
It enhances early defect detection in urban water systems.
It offers a globally applicable solution for sustainable water infrastructure management.
INTRODUCTION
Water supply pipelines are crucial for daily life and industrial activities, as they ensure the delivery of clean water. Imperfections in these pipelines can lead to significant water loss (Taiwo et al. 2023) and introduce pollutants that harm human health (Lee & Schwab 2005). Similarly, sewage systems are critical, and leaks can contaminate groundwater (Reynolds & Barrett 2003), posing health risks and complicating water treatment.
Table 1 | Test results of the GLIP and YOLOv5 models

| Metric | Results based on the GLIP model | | | | Results based on the YOLOv5 model | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Rupture | Corrosion | Leakage | Seepage | Rupture | Corrosion | Leakage | Seepage |
| Precision | 0.222 | 0.728 | 0.275 | 0.008 | 0.331 | 0.887 | 0.456 | 0.855 |
| Recall | 0.176 | 0.738 | 0.259 | 0.074 | 0.286 | 0.891 | 0.750 | 0.333 |
| F1 score | 0.196 | 0.734 | 0.267 | 0.077 | 0.307 | 0.889 | 0.567 | 0.479 |
| mAP50 | – | – | – | – | 0.296 | 0.933 | 0.648 | 0.417 |
| mAP50-95 | – | – | – | – | 0.0858 | 0.742 | 0.444 | 0.329 |
The automated identification of faults in these systems is therefore critical. Closed-circuit television (CCTV) inspection is a popular method due to its safety and the intuitive nature of the resulting imagery. Traditional approaches for defect detection using CCTV data range from pattern recognition (Guo et al. 2009) to advanced deep learning techniques such as convolutional neural networks (CNNs) (Zhao et al. 2023).

In the field of water pipeline defect detection, several studies and applications based on deep learning models have been developed. For instance, Shen et al. (2023) proposed an improved object detection algorithm, the enhanced feature extraction (EFE)-Single Shot MultiBox Detector (SSD), which strengthens feature extraction by adding the Receptive Field Block (RFB_s) module and an improved Efficient Channel Attention (ECA) mechanism, and addresses the imbalance of positive and negative samples during training by replacing cross-entropy loss with focal loss.

The You Only Look Once version 5 (YOLOv5) model has also been applied in multiple pipeline defect detection tasks. Wang et al. (2023) introduced a pipeline defect detection model based on an improved YOLOv5s algorithm, which incorporates convolution modules and Grouped Spatial Convolution (GSConv) to simplify the model structure while integrating the Convolutional Block Attention Module (CBAM) attention mechanism; as a result, it significantly improves detection accuracy and speed, achieving a mean average precision (mAP) of 80.5% and a detection speed of 75 frames per second (FPS). Hu et al. (2022) applied the YOLOv3 network model to the identification and localization of sewage pipeline defects based on a self-designed pipeline inspection robot system. Moreover, Chen et al. (2024) proposed a cascaded deep learning approach in 2022, combining YOLOv5 and pre-trained Vision Transformer (ViT) models, which performed well in detecting and classifying pipeline defects. Chen et al. (2024) also introduced a novel cascaded deep learning model using the Swin Transformer backbone YOLOv5 (SwinYv5) for object detection and a cross-residual CNN (CR-CNN) for quantifying pipeline defects.
Although deep learning offers high accuracy and the ability to discern the geometric features of defects, enabling the simultaneous detection and classification of multiple pipeline defects, models based on the YOLO architecture still struggle to detect small objects because of their limited size and low visibility, and because large-scale training datasets are lacking. Future improvements in YOLO-based pipeline defect detection can be achieved by introducing more sophisticated small-object detection methods and by creating large-scale datasets that cover a variety of defect types and sizes.
This paper proposes a method for detecting defects in water supply and drainage pipelines based on the grounded language-image pre-training (GLIP) model and the YOLOv5 model. The defect detection utilizes CCTV-collected images of pipeline defects, which include structural defects (corrosion, misalignment, foreign object intrusion, and leakage) and functional defects (sediment, scaling, blockages, and floating debris), encompassing a total of eight types of defects. The GLIP model focuses on evaluating the detection success rate without the need for dataset fine-tuning, assessing the transferability of this approach and thereby demonstrating the effectiveness of using pre-trained models for defect detection. The advantages of this work are as follows:
(1) Rich real-world data: The dataset collected independently includes abundant real-world scenario data, enhancing the generalization capability of the model.
(2) Strong zero-shot performance: The GLIP pre-trained model demonstrates excellent detection capability in zero-shot scenarios, achieving efficient defect detection without requiring extensive training on task-specific data.
(3) Combining GLIP and YOLOv5: By integrating the GLIP and YOLOv5 models, this method exhibits outstanding detection accuracy and speed, making it highly practical and promising for application.
PIPELINE DEFECT DETECTION METHODS
The GLIP pre-trained model
The GLIP pre-trained model is a unified model for object detection and grounding tasks (Li et al. 2022) and offers several advantages for object detection: (1) compared with previous supervised models (such as the Fast region-based convolutional neural network (RCNN) and Dynamic Head (Dai et al. 2021)), GLIP demonstrates superior detection performance in both zero-shot and fine-tuned settings; (2) with only one-shot training, GLIP-L is competitive with fully supervised Dynamic Head models; and (3) GLIP can perform all downstream tasks without changing the model parameters.
For pipeline defect detection, the advantages of GLIP are as follows:
(1) GLIP's robust zero-shot performance allows the model to recognize targets of untrained categories, which is highly beneficial in scenarios where it is impractical to collect extensive data for every type of defect.
(2) GLIP is more accessible for front-line pipeline maintenance workers who may have no background in artificial intelligence or deep learning, as specific tasks can be executed with GLIP without the need to adjust model parameters.
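GLIP casts detection as phrase grounding: the category names are concatenated into a single text prompt, and detected regions are aligned to the character spans of those names. The sketch below illustrates only this prompt construction; `build_glip_prompt` is a hypothetical helper for illustration, not part of the GLIP API.

```python
def build_glip_prompt(classes):
    """Join class names into a GLIP-style caption and record the
    character span each class occupies, so that grounded regions
    can be mapped back to a class index."""
    # Names separated by " . ", mirroring how detection categories
    # are typically flattened into one caption for grounding.
    prompt = " . ".join(classes) + " ."
    spans = {}
    pos = 0
    for name in classes:
        spans[name] = (pos, pos + len(name))
        pos += len(name) + len(" . ")
    return prompt, spans


prompt, spans = build_glip_prompt(["rupture", "corrosion", "leakage", "seepage"])
# Each recorded span points back at exactly one defect category.
```

In a zero-shot run, swapping in a new defect name only changes this prompt string; no model parameters are touched, which is what makes the workflow accessible to non-specialists.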
The YOLOv5 model
The YOLOv5 model is a single-stage object detection algorithm that predicts multiple class confidences and bounding boxes simultaneously from the feature maps of the entire image. Unlike two-stage object detection models (such as RCNN, Fast RCNN, Faster RCNN, and Mask RCNN), which first propose regions of interest (ROIs) and then perform classification, the YOLO algorithm treats object detection as a single-stage problem, predicting bounding boxes and their associated class probabilities from the entire image at once. This approach makes YOLO significantly faster than two-stage models, albeit with a slight trade-off in accuracy, and its flexibility allows fine-tuning on specific datasets to achieve better performance.
The YOLOv5 model belongs to the family of compound-scaled object detection models. It uses a fully convolutional network to process images, dividing them into a grid where each cell is responsible for detecting objects within that cell. YOLOv5 uses CSPDarknet as its backbone network, PANet as its neck network, and YOLO layers as its head. This design reduces the model's parameters and floating-point operations (FLOPs). These components enhance the flow of information and the utilization of low-level features in end-to-end training, thereby increasing the accuracy of multi-scale predictions and localization. YOLOv5 also uses the Generalized Intersection over Union (GIoU) loss function and a weighted non-maximum suppression (NMS) process to obtain the optimal bounding boxes.
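The GIoU loss mentioned above extends plain IoU with a penalty based on the smallest box enclosing both candidates, which keeps the loss informative even when predicted and ground-truth boxes do not overlap. Below is a minimal sketch of the GIoU computation for two axis-aligned boxes, written for illustration rather than taken from YOLOv5's implementation; boxes are assumed to have positive area.

```python
def giou(box_a, box_b):
    """GIoU for boxes given as (x1, y1, x2, y2) corner coordinates.
    Returns a value in (-1, 1]: 1 for identical boxes, negative for
    well-separated ones."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    iou = inter / union
    # Smallest enclosing box C; GIoU subtracts the fraction of C
    # not covered by the union.
    c_area = ((max(ax2, bx2) - min(ax1, bx1))
              * (max(ay2, by2) - min(ay1, by1)))
    return iou - (c_area - union) / c_area
```

The corresponding loss used during training is simply `1 - giou(pred, target)`, which is minimized when the predicted box coincides with the target.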
In this study, the YOLOv5 model is used for binary detection tasks because it balances classification accuracy with high detection efficiency. The ViT is employed as a cascaded model for pipeline defect classification. This combination of a pre-trained YOLOv5 object detection model and a ViT image classification architecture is suitable for offline analysis of pipeline inspection data, effectively balancing detection precision and efficiency. By leveraging existing pre-trained models, this method reduces the need for large amounts of specific defect training data and model tuning, thus achieving rapid analysis of pipeline inspection data.
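The detect-then-classify cascade described above can be expressed in a few lines: the detector proposes candidate defect regions, each confident region is cropped, and the crop is passed to the classifier. The sketch below uses stand-in callables for both stages; `detect` and `classify` are placeholders for illustration, not the actual YOLOv5 or ViT interfaces.

```python
def cascade_inspect(image, detect, classify, conf_thresh=0.5):
    """Cascade inspection: a binary 'defect / no defect' detector
    proposes boxes with confidences; each confident box is cropped
    and handed to a classifier that names the defect type.

    image: 2-D pixel array (list of rows); detect(image) yields
    ((x1, y1, x2, y2), confidence) pairs; classify(crop) returns
    a defect label.
    """
    results = []
    for (x1, y1, x2, y2), conf in detect(image):
        if conf < conf_thresh:
            continue  # discard low-confidence proposals
        crop = [row[x1:x2] for row in image[y1:y2]]
        results.append(((x1, y1, x2, y2), classify(crop)))
    return results
```

Because the two stages communicate only through crops and labels, either model can be swapped (e.g., replacing the detector with GLIP in zero-shot mode) without touching the rest of the pipeline.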
RESULTS OF THE PIPELINE DEFECT DETECTION AND DISCUSSION
Results of the GLIP pre-trained model
In zero-shot scenarios, where the model has not been specifically trained for the task, the GLIP method performs well in detecting foreign object intrusion, scaling, leakage, corrosion, cracks, and floating debris and can correctly identify misalignment and sediment. This indicates that GLIP can identify defects without being trained on specific data. However, as shown in Figure 1(a), the detection boxes marked two parts of the pipeline but did not correctly distinguish the gap at the misaligned section. Additionally, while the GLIP model successfully detected sediment targets, it did not capture all instances of sediment. This suggests that further training or fine-tuning is needed to improve its accuracy for pipeline defect detection. To address this limitation and enhance detection performance, we employed YOLOv5, which was specifically trained on our dataset to ensure a more comprehensive identification of sediment and other defects. By combining the strengths of both models, we were able to achieve faster detection with GLIP and more precise results with YOLOv5.
Results of the YOLOv5 model
The YOLOv5 model achieved commendable performance under conditions of sparse samples or indistinct features in the existing dataset. Training was conducted using an Intel(R) Xeon(R) Silver 4210R CPU, 128 GB of RAM, and an NVIDIA V100s GPU with 32 GB of memory. The final training and detection results for the four types of pipeline defects (rupture, corrosion, leakage, and seepage) are shown in Figure 2.
The F1 confidence curve illustrates the F1 scores for various categories at different confidence thresholds. As shown in Figure 3, it is evident that the corrosion category performs exceptionally well within the high-confidence range, with an F1 score approaching 0.9. In contrast, the rupture and seepage categories exhibit relatively lower F1 scores, indicating that the model's precision and recall for these categories need further optimization.
The precision–recall curve provides insights into the model's precision at different recall rates. It can be observed that the corrosion category achieves very high precision and recall, demonstrating the best performance. However, the rupture and seepage categories have lower precision and recall. This suggests that there are certain challenges in detecting these categories, which may require a larger dataset or a more sophisticated model to improve detection performance for these categories.
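The F1 scores reported for each category follow directly from the precision and recall values as their harmonic mean. A small sketch of that computation, which reproduces the tabulated scores (e.g., 0.887 precision and 0.891 recall for YOLOv5's corrosion class give an F1 of about 0.889):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when
    both inputs are 0 to avoid division by zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Checking against the YOLOv5 corrosion and leakage entries:
corrosion_f1 = f1_score(0.887, 0.891)  # ~0.889
leakage_f1 = f1_score(0.456, 0.750)    # ~0.567
```

Because the harmonic mean is dominated by the smaller operand, the low recall for seepage (0.333) pulls its F1 down to 0.479 despite the high 0.855 precision, which is exactly the pattern visible in the precision-recall curves.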
Comparison of test results
Due to the limited size of our dataset, the four defect types with the most image samples were selected for testing and validation. The test results are shown in Table 1.
The comparative analysis indicates that YOLOv5 achieves better overall results. The YOLOv5 model demonstrates strong performance, particularly for corrosion, where both precision and recall are high; seepage detection also attains high precision, although its recall remains low. However, due to the limited size of our dataset, the model tends to overfit, reducing its ability to generalize to new, unseen data. This overfitting is especially noticeable in the lower precision and recall for rupture and seepage defects.
In addition, the pre-trained GLIP model exhibits robust performance in zero-shot scenarios, effectively detecting a wide range of defects without specific training on the dataset, which indicates strong transferability and generalization capabilities. With dedicated training on the current dataset, the GLIP model is likely to achieve superior performance: its inherent strength in feature extraction and defect recognition suggests that it could overcome the limitations that YOLOv5 faces due to the constrained dataset size.
CONCLUSION
This paper proposes a method for detecting defects in water supply and drainage pipelines based on the GLIP pre-trained model and YOLOv5, with experimental validation conducted on a limited-size dataset. The GLIP model, with its robust zero-shot detection capabilities, was used for rapid defect identification during the data collection phase. This allowed for efficient detection without the need for extensive model retraining. However, to further enhance the accuracy and performance of defect detection, we fine-tuned YOLOv5 on our dataset. By combining these two models, we were able to achieve a balance between fast, flexible detection and high precision, making our approach both practical and efficient for real-world applications.
With over 500,000 km of sewage pipelines needing maintenance in China alone (Fan et al. 2024), the widespread adoption of this method could benefit 440 million people who rely on urban water supply and drainage systems (Wang et al. 2021). Additionally, it can protect the surrounding soil and groundwater from pollution. Timely repair of water pipeline defects using this method is crucial for public health, the stability of industrial production, and the sustainable use of water resources. Future work will focus on expanding the dataset and further optimizing the GLIP model to improve detection accuracy.
FUNDING
This work was supported by the Water Conservancy Science and Technology Innovation Project of Guangdong Province (No. 2022-03) and the Shenzhen Science and Technology Program (Grant No. GJHZ20210705141403009).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.