ABSTRACT
Water companies use closed-circuit television (CCTV) to inspect the condition of sewage pipes. The reports generated by surveyors help companies to plan the maintenance and rehabilitation of these pipes. A surveyor needs to record the water level at the start of every survey and at any point where the level changes significantly. Recording the water level provides insight into the cross-sectional area being surveyed and highlights any underlying issues with the pipe. An abrupt change in water level can indicate a poor pipe gradient, a build-up of debris, or even hidden structural damage. However, manually recorded water levels are often unreliable because of factors such as surveyor experience, camera angle, light conditions, and pipe shape. In this paper, we discuss and compare six methods for the automated estimation of water levels in sewage pipes. Using the segmentation masks extracted with DeepLabv3 as inputs to an Extra Trees regressor achieved the most accurate results. To compare the techniques objectively, mean absolute error (MAE), root mean square error (RMSE), and max error were used as evaluation metrics.
HIGHLIGHTS
Identifies a method to estimate water level in sewage pipes with an average error of ∼5.7%.
Highlights the subjectivity and variability of human-reported water levels during sewer surveys.
Compares six methodologies used in the study to automate the estimation of water level.
Highlights data quality issues and the significance of accurately labelled data for computer vision tasks.
INTRODUCTION
Closed-circuit television (CCTV) cameras are commonly used for inspecting the condition of wastewater pipes. The process requires remotely controlled cameras to be passed through pipes and the collected footage to be annotated by trained surveyors. The resulting reports are critical to the effective maintenance and rehabilitation of sewer networks. In the UK, to ensure uniform and high-quality surveys, footage is annotated according to the industry-wide standards specified in the Manual of Sewer Condition Classification (MSCC) (WRc plc 2013).
Figure 1. Example CCTV images. Images (a) and (b) show a distorted pipe wall, a bellied pipe shape, and the presence of deposits. Images (c) and (d) show regular pipes with water present. The water levels recorded by the surveyors, from left to right, are 50, 25, 50, and 35%, respectively.
Despite the importance of water level estimation to the survey process, little literature exists on its automated identification, even given the growing interest in applying AI to sewer surveys. Haurum et al. (2020) compared the effects of different standards for labelling water levels. Using a convolutional neural network (CNN) to estimate water level, they achieved micro-F1 scores of 40 and 79% on two versions of the Danish sewer inspection standards. These results showed promise for application, although the authors noted significant noise and subjectivity in the dataset.
Tomperi et al. (2022) developed cost-effective soft sensors that can be installed at frequent measurement points, addressing the data inconsistency that arises when expensive sensors are installed at only a few key locations. The sensors were mounted under the cover of a sewer manhole. The distance to the bottom of the pipe was measured initially; the sensor then provided the distance to the water surface, and the two distances together gave the water level in the pipe. The study assessed the method's accuracy, reliability, and limitations, highlighting its value as a cost-effective solution for monitoring sewer systems in smaller, resource-constrained communities. The year-long research showed that mounting the sensors properly requires great care and that the changing conditions inside a sewer manhole can make it difficult to record accurate data (Tomperi et al. 2022).
Ji et al. (2020) investigated three camera-based techniques for measuring sewage flow rate: deep learning, image processing, and direct visual inspection. MATLAB was used for the image processing to detect water levels. The detection rates of boundary lines for image processing and direct inspection were 12 and 53%, respectively. Ji et al. (2020) concluded that deep learning showed substantial promise to outperform conventional sensors, with 100% water level detection.
Although the studies mentioned above show promising results, the challenges of data quality, real-time monitoring, environmental robustness, and integration with existing systems highlight the need for continued research. Given the constraints on budget and resource allocation, companies may hesitate to invest in low-cost sensors due to various concerns like maintainability, long-term reliability, and scalability. Hence, it is important to focus on research that can leverage existing data and infrastructure. While CNNs show great promise in water level estimation, they can be computationally expensive and require large amounts of labelled data to avoid overfitting when applied to varied datasets. This paper focuses on studying the cross-sectional areas of sewage pipes to effectively isolate the water region from irrelevant backgrounds, ensuring clear inputs to the regressor. The primary goal of the study is to enhance the accuracy and reliability of water level estimation by leveraging traditional image processing and machine learning techniques.
In this study, we examine six methods for estimating water levels in sewer pipes: machine learning applied to feature descriptors, estimation from geometric features, machine learning applied to image texture (explored in two ways, using super-pixels and using a cell-wise grid), CNN regression, and a segmentation model followed by a regressor.
METHODOLOGY
The study was conducted over an extended period, during which the available datasets changed. It began with images extracted from historic South West Water (SWW) surveys, each labelled by a surveyor. This dataset consisted of 1,044 images containing water and provided the foundation for exploring the research objectives. Later, the WRC dataset became publicly available and widely recognised as a standard. It was adopted for subsequent analysis to align the study with the current benchmark dataset, and it consisted of 1,009 images of sewage pipes containing water. The images from both datasets were modified as required to suit the different methods used throughout the study. These modifications are explained in the ‘Results and Discussion’ section.
Feature descriptor method
Figure 2. Image (a) is a sample image from inside a pipe with water present. Images (b) and (c) visualise, respectively, the histogram of oriented gradients (HOG) features and the local binary patterns (LBP) extracted from image (a).
Geometric method
Figure 3. Image (b) is a visual representation of the theoretical approach used to predict the water level in sewage pipe image (a) with the geometric method. The circle is detected from the pipe wall, and the tangential lines highlight the waterline in the pipe. The intersection of the waterlines with the circle forms a segment, the area of which can be used to estimate the water level percentage in the pipe. Image (c) shows the result of applying the geometric method to image (a). The yellow lines are the Hough lines, the pink circle is the circle identified by the method, and the purple line is the line extended to intersect the circle. The surveyor's labelled water level is 15%, and the water level estimated by the geometric method is 28% (∼30%).
Figure 4. Flowchart outlining the process of water level prediction in an image using the geometric method along with super-pixels.
Due to noise introduced by faults and imperfections in the pipe, we found that deriving water levels from edges alone was unreliable. We therefore investigated the use of super-pixels to simplify the image and refine the edges between water and pipe, on which the method relies. Super-pixels segment an image into relatively homogeneous and spatially compact regions using similarity measures based on perceptual features (Achanta et al. 2010). Once the waterline and the pipe wall have been detected, their intersection can be used to estimate the fraction of the pipe occupied by water and hence the water level in the pipe.
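The last step of the geometric method, converting a detected circle and waterline into a percentage, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a circular pipe, a single horizontal waterline, and image coordinates in which rows increase downwards. The function names, and the rounding to 5% steps (suggested by the labels quoted in the text), are assumptions made for the example. Both a depth-based and an area-based percentage are returned because the two differ for a circular cross-section.

```python
import math


def segment_area_fraction(radius: float, depth: float) -> float:
    """Fraction of a circle's area lying below a horizontal chord.

    `depth` is the water depth measured up from the lowest point of the circle.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    h = min(max(depth, 0.0), 2.0 * radius)  # clamp the depth to the pipe
    # Circular-segment area: r^2 * acos((r - h) / r) - (r - h) * sqrt(2rh - h^2)
    root = math.sqrt(max(2.0 * radius * h - h * h, 0.0))
    area = radius ** 2 * math.acos((radius - h) / radius) - (radius - h) * root
    return area / (math.pi * radius ** 2)


def water_level_estimates(centre_y: float, radius: float, waterline_y: float):
    """Return (depth %, area %) for a horizontal waterline in image coordinates.

    Rows grow downwards, so the bottom of the pipe sits at centre_y + radius.
    """
    depth = (centre_y + radius) - waterline_y
    depth = min(max(depth, 0.0), 2.0 * radius)
    depth_pct = 100.0 * depth / (2.0 * radius)
    area_pct = 100.0 * segment_area_fraction(radius, depth)
    return depth_pct, area_pct


def round_to_step(value: float, step: int = 5) -> int:
    """Snap an estimate to the nearest reporting step (assumed here to be 5%)."""
    return int(round(value / step) * step)


if __name__ == "__main__":
    depth_pct, area_pct = water_level_estimates(centre_y=100, radius=50, waterline_y=125)
    print(depth_pct, round(area_pct, 2), round_to_step(28))
```

In practice the circle centre, radius, and waterline position would come from the circle and Hough line detection described above, and a sloping or broken waterline would need additional handling.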
Texture-based method
Hypothesising that the greatest visual difference between pipe and water was the surface of each material, we explored the use of texture descriptors, classifying regions of the image based on the presence of water.
Texture-based methods with super-pixels: A classifier was trained on the super-pixel patches of images to predict the presence of water in individual patches. These super-pixels were generated using the SLIC segmentation algorithm (Achanta et al. 2012), and classification was performed using the histogram of their LBP descriptors as feature vectors (256 bins) (see Figure 5). Having classified all segments in an image as water or pipe, the water level was inferred by calculating the proportion of segments containing water.
Cell-wise method for water level prediction: In this approach, the image was split into a uniform grid of cells (8 × 8 cells). Like the previous method, the presence of water was classified in each cell, based on histograms of their LBP descriptors (see Figure 6). The output of the classifier applied to each patch was then provided to a regressor to predict the overall water level in the image.
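The building blocks shared by the two texture-based variants can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code used in the study: the image is a 2-D list of grey-level intensities, the LBP variant is the basic 8-neighbour form, and the helper names are hypothetical. The classifier and regressor themselves are omitted, and super-pixel generation with SLIC is not shown.

```python
def lbp_codes(gray):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D intensity list."""
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(gray), len(gray[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            centre = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    total = sum(hist) or 1  # avoid division by zero for empty patches
    return [count / total for count in hist]


def split_into_cells(image, grid=8):
    """Split a 2-D list into grid x grid cells, returned in row-major order."""
    height, width = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cells.append([row[x0:x1] for row in image[y0:y1]])
    return cells


def water_percentage(is_water_flags):
    """Percentage of regions (super-pixels or cells) classified as water."""
    return 100.0 * sum(1 for flag in is_water_flags if flag) / len(is_water_flags)
```

For the super-pixel variant, `water_percentage` mirrors the proportion rule described above. For the cell-wise variant, the 64 per-cell classifier outputs would instead be passed to a regressor.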
Figure 5. Application of SLIC segmentation to extract super-pixels from an image (steps shown left to right). The method predicted a water level of 20%, and the water level labelled by the surveyor is 25%.
Figure 6. Cell-wise method for water level prediction. Image (a) is the original image on which the prediction is made, and image (b) shows how the image is divided uniformly into 8 × 8 cells. Image (c) shows the cells that the classifier predicted to contain water. The water level predicted by the cell-wise method for image (a) is 20%, whereas the water level labelled by the surveyor is 25%.
CNN regression
Figure 7. Image (a) is a sample image from the SWW dataset. Image (b) shows the class activation map (CAM) extracted from image (a). The predicted water level is 10%.
DeepLabv3 – segmentation of images to extract a mask followed by a regressor
Figure 8. Image (a) is a sample image from the WRC dataset, and image (b) is the segmentation mask extracted from that image. The method correctly identified a water level of 25%.
The performance of the proposed methodologies was evaluated using a range of metrics to ensure a comprehensive analysis. Root mean square error (RMSE) and mean absolute error (MAE) were used to assess the accuracy and average deviation of the predictions, while max error captured the single largest prediction error made by each method. These metrics are reported in Table 1 and discussed in the ‘Results and Discussion’ section.
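For reference, the three metrics can be computed as in the short sketch below. The function names are chosen for this example and the sample values are made up purely to exercise the code; they are not results from the study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labelled and predicted levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared difference; penalises large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def max_error(y_true, y_pred):
    """Largest single absolute difference across all predictions."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    labelled = [25, 50, 10, 35]    # illustrative water levels in %
    predicted = [20, 50, 20, 30]   # illustrative predictions in %
    print(mean_absolute_error(labelled, predicted))
    print(root_mean_square_error(labelled, predicted))
    print(max_error(labelled, predicted))
```

Because RMSE squares each error before averaging, a method with a few large misses can have a higher RMSE than another method with a similar MAE.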
Table 1. Performance metrics for each alternative water level detection method applied to CCTV images
Method | Avg. error (%) | Max error (%) | RMSE (%) |
---|---|---|---|
Feature descriptor method | 7.22 | 85 | 10.83 |
Geometric method | 20.00 | 100 | 27.46 |
Texture-based method along with super-pixels | 10.71 | 65 | 15.65 |
Cell-wise method for water level prediction | 5.70 | 35 | 8.00 |
CNN regression | 5.69 | 40 | 8.34 |
DeepLabv3 – segmentation of images to extract a mask followed by a regressor | 5.68 | 30 | 8.41 |
RESULTS AND DISCUSSION
Table 1 shows large variations in the effectiveness of the methods explored. The ‘DeepLabv3 – segmentation of images to extract a mask followed by a regressor’ method gave the lowest average error (5.68%) and the lowest maximum error (30%), although the cell-wise method (8.00%) and CNN regression (8.34%) achieved a slightly lower RMSE than its 8.41%.
The methods mentioned above were applied to images extracted from historic SWW surveys, each of which had previously been labelled by a surveyor as part of standard maintenance practice.
Figure 9. Imbalance in the SWW dataset. The figure shows the distribution of label counts before and after the data validation exercise, together with the count of labels assigned by our survey team.
Figure 10. Distribution of water level percentages in the WRC dataset, showing the significant imbalance in the data.
The feature descriptor method performed well on low water levels (5, 10, and 15%) but could not accurately predict higher water levels. This may be attributed to the large imbalance of data labels shown in Figure 9, with more labelled data available for lower water levels. Other contributory factors may include image quality, camera orientation, and, in many cases, the visually similar texture of pipe walls and water.
The geometric method appeared to be a logical technique for measuring water levels; in practice, however, the presence of tide lines, the texture of deposits, or text within an image made it difficult to obtain adequate waterline marks. Because any given CCTV image contains a large number of edges, identifying the correct waterlines is challenging. Restricting the candidates to plausible lines, using gradients and intersections with the pipe circumference, proved too unreliable for practical use. Another issue arose in pipes with a very smooth texture and a similar colour to the water, which restricts the number of edges that can be detected within the image and further limits the reliability of this method.
The texture-based method struggled to distinguish between pipe wall and water within the pipe due to the similarities in appearance. These similarities often resulted in overestimations of water level. To improve the performance of the texture-based method, we aggregated texture descriptors into histograms, for both binary and Gray code formats. This approach was computationally efficient and yielded consistent results for features extracted by LBP, highlighted by the lower maximum error seen in Table 1.
For the CNN regressor, we first trained on only the publicly available WRC dataset, split 65–15–20% into training, validation, and test sets, respectively. However, because of the large imbalance in the WRC dataset shown in Figure 10, the regressor was not robust when predicting extreme water levels. To overcome this challenge, and because CNNs require large datasets for training, the validated SWW data and the publicly available WRC dataset were combined to train the CNN regressor. Insufficient data at higher water level percentages remained an issue, however. As a result, the outcomes of this experiment were comparable to those obtained from the WRC dataset alone using CNN regression, as presented in Table 1.
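A 65–15–20% split of this kind can be produced with a seeded shuffle, as in the sketch below. This is an assumption about how such a split might be done, not a description of the authors' procedure; the function name and seed are illustrative, and integer percentages are used to avoid floating-point rounding in the cut points.

```python
import random


def split_indices(n_items: int, seed: int = 0, train_pct: int = 65, val_pct: int = 15):
    """Shuffle item indices and cut them into train, validation, and test lists."""
    if train_pct + val_pct >= 100:
        raise ValueError("train_pct + val_pct must leave room for a test set")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(1009)
    print(len(train), len(val), len(test))
```

A plain shuffle does nothing about the label imbalance noted above, so rare high water levels may be unevenly spread across the three sets.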
The DeepLabv3 (Chen et al. 2017) model was trained on the publicly available WRC dataset to segment water in the images. After screening for light conditions, image quality, and blurriness, only 405 water level images from the WRC dataset were used, with the water regions in each image labelled as segments. The trained DeepLabv3 model was able to identify waterlines accurately, which had proved challenging for the methods described above. The masks highlight the relevant features within an image, making it easier for the regressor to focus on the important regions. Despite the imbalance in the labelled data, this method performed best, with the lowest maximum error and a low average error, as shown in Table 1.
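The text does not list which quantities are derived from each mask before regression, so the sketch below is purely illustrative. It assumes a binary mask stored as rows of 0/1 values and summarises it with three hypothetical features that a regressor could consume; neither the feature choice nor the function name comes from the study.

```python
def mask_features(mask):
    """Summarise a binary water mask (rows of 0/1 values) as a small feature list.

    Returns [water_fraction, water_height_fraction, widest_row_fraction]:
    the share of pixels marked as water, the share of image height at or below
    the topmost water row, and the share of the widest water row.
    """
    height = len(mask)
    width = len(mask[0])
    row_counts = [sum(1 for value in row if value) for row in mask]
    water_fraction = sum(row_counts) / (height * width)
    water_rows = [y for y, count in enumerate(row_counts) if count > 0]
    if water_rows:
        # Rows grow downwards, so the first water row marks the waterline.
        water_height_fraction = (height - water_rows[0]) / height
    else:
        water_height_fraction = 0.0
    widest_row_fraction = max(row_counts) / width
    return [water_fraction, water_height_fraction, widest_row_fraction]


if __name__ == "__main__":
    example_mask = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 1, 1],
    ]
    print(mask_features(example_mask))
```

Feature vectors of this kind, one per image, could then be fitted against the labelled water levels with a tree-ensemble regressor such as scikit-learn's `ExtraTreesRegressor`.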
High-quality, accurately labelled images are crucial for training models because they enable the model to learn from complex patterns, features, relationships, pixel intensities, textures, and colours, ultimately leading to better predictions. Low-resolution images can obscure essential details, making it harder for the model to identify key features. The subjective nature of recording water levels, as illustrated in Figure 9, can mislead models, causing them to learn incorrect associations between features and actual values. The data used for this analysis were not originally collected to develop machine learning or AI models. Some images are either too dark or too bright because the camera focused on the water or the pipe walls, which biases the model's predictions according to lighting conditions. Additionally, the presence of text in varying positions and formats could cause the models to mistake the text for a critical feature. Blurry images, caused by camera movement, improper focus, or water splashes on the lens, further degrade image quality. These factors collectively impact data quality. However, using DeepLabv3 (Chen et al. 2017) for segmentation improved matters by allowing the model to ignore noise in images, although water tide marks or settled deposits may still complicate accurate water level prediction. The DeepLabv3 segmentation model followed by a regressor generalises better from the labelled training data and hence performs best at predicting extreme as well as average water levels.
Ji et al. (2020) concluded that good accuracy in waterline detection is achievable even with a small quantity of data, and, as previously stated by Haurum et al. (2020), deep learning methods have the potential to estimate water level provided that a good, balanced dataset is available. This study shows that the water level in sewage pipes can be estimated accurately given training data of sufficient quality and quantity, and that each method can be refined further to make better predictions. Furthermore, data augmentation can be applied to capture masks at different camera angles.
CONCLUSION AND FUTURE WORK
This study describes preliminary work investigating a variety of methods for detecting water levels in sewage pipelines. Of the methodologies described for predicting water level in sewage pipes, segmenting images with DeepLabv3 to extract masks, followed by Extra Trees regression, produced the most reliable findings, with an average error of 5.68%. CNN regression showed a similar performance. An error of this size is acceptable for practical application, and the approach has great potential for predicting water levels in CCTV sewer surveys. These results were achieved despite the subjective nature of water level estimation and the imbalanced nature of our datasets.
Although CNN regression and the DeepLabv3 (Chen et al. 2017) segmentation method performed similarly, their processing times differ greatly. CNN regression is faster than DeepLabv3 segmentation, but it tends to predict a higher percentage than the true water level. In addition, irrespective of the imbalance in the dataset, the combination of DeepLabv3 segmentation and the regressor showed strong potential for accurately predicting higher water levels.
The available datasets were highly imbalanced between high and low water level percentages, which makes it difficult for the models to be robust. Pipe images with no water present sometimes contained tide marks or deposits, which the model identified as waterlines. This made it challenging to train the models for a 0% water level, and the models tended to emphasise tide marks rather than the water present in the pipes. We tried adding blank masks for 0% water levels to the training data; however, no significant difference was seen in the estimation.
ACKNOWLEDGEMENTS
We thank Radenko Danilovic, Kelly Mackey, and Raveena Murali, our team members, for their guidance and support.
FUNDING
This work was supported by the UKRI Future Leaders Fellowship scheme. Grant title: Full automation of sewer CCTV surveys [Grant number: MR/V024655/1].
DATA AVAILABILITY STATEMENT
South West Water data cannot be made publicly available; readers should contact the corresponding author for details. The WRC dataset is publicly available for use here: https://ukwir2021.my.site.com/spring/s/uservoice/a0JNz0000005dAPMAY/ofwat-innovation-challenge-artificial-intelligence-and-sewers (note that a free “Spring” account will be required for access).
CONFLICT OF INTEREST
The authors declare there is no conflict.