ABSTRACT
The rapid growth of urban areas has increased the need for efficient and precise leak detection in water distribution networks. Automatic detection methods based on deep learning are an active research topic. In this paper, a methodology that combines deep learning and data imaging is proposed. The framework employs pressure monitoring data and rests on three pillars: (1) the generation of a comprehensive dataset, encompassing one year of leak-free demand data derived from Fourier series analysis and monitored pressure under normal and leak conditions; (2) the transformation of pressure time series into images using kriging interpolation; and (3) the establishment of a convolutional neural network (CNN) and the evaluation of its anomaly identification performance. The effectiveness of the proposed methodology is assessed on different image sets under various leak conditions. The findings show that the method produces dependable and effective leak detection results, with the deep learning model achieving a true positive rate (TPR) of 98% and an area under the curve (AUC) of 94%. This study provides valuable information for strategic action planning and the enhancement of water loss management protocols, especially where water utilities and regulatory authorities are grappling with limited budgets and diminishing revenues.
HIGHLIGHTS
Generating a dataset with monitoring pressure data under normal and leak conditions.
Utilizing data imaging technology to convert discrete data into continuous pseudocolor spatial pressure distribution images.
Developing a leakage detection model for water distribution networks using VGG16, a convolutional neural network.
Comparing the performance of the proposed model with other deep learning models.
INTRODUCTION
Water loss in water distribution networks (WDNs) is a significant issue in water resource management. Serious leakage incidents not only cause substantial water and energy wastage but also trigger secondary hazards, such as the intrusion of external pollutants and bacteria through leak holes (Karim et al. 2003; LeChevallier et al. 2003; Fontanazza et al. 2015), and even ground collapse (Guo et al. 2013). According to statistics, the annual global volume of water loss amounts to approximately 126 billion cubic meters, translating to a financial loss of around US $39 billion (Liemberger & Wyatt 2018). Therefore, accurate and advanced leakage identification technologies are crucial for saving valuable water resources and fostering the sustainable development of society.
Numerous factors contribute to the occurrence of leakages, including poor pipe quality, pipes exceeding their service life, and changes in mechanical conditions caused by, for example, subway construction. Moreover, WDNs are typically buried underground and span vast geographical areas, which makes them vulnerable to structural integrity failure (Van Leuven 2011). Consequently, many minor leaks remain undetected over short periods, complicating the assessment of both the timing and location of a leak. To enhance the performance of leakage identification technologies, two primary categories of anomaly detection methods are extensively employed in the field (Li et al. 2015). The first category is hardware-based and focuses on the use of physical devices to detect discrepancies. The second category relies on software, which is designed to analyze and interpret data to identify anomalies (Valizadeh et al. 2009).
Many technologies rely on specialized hardware equipment, such as leak noise loggers (Muggleton et al. 2006), leak noise correlators (Guo et al. 2021), and infrared photography (Fahmy & Moselhi 2010). These technologies utilize distinct physical phenomena to accurately detect and locate leaks in WDNs. However, their application to large-scale WDNs is limited by significant challenges, including the high costs of the equipment, the labor-intensive nature of the operations, and the generally low efficiency of these methods when deployed on a large scale (Romano et al. 2011).
With the rapid advancement of supervisory control and data acquisition (SCADA) systems, the integration of advanced computer technologies such as machine learning for analyzing and processing monitoring data is becoming increasingly prevalent (Zhou et al. 2019). SCADA systems can control and monitor key hydraulic data of WDNs in real time, such as node pressure and demand, providing critical data support for abnormal event identification technologies. Compared to flow meters, pressure sensors are more cost-effective and capable of providing instantaneous data (Sun et al. 2020), which enables the collection of more comprehensive information on changes in operating conditions. Therefore, this study concentrates on analyzing variations in node pressure and their spatial distribution to enhance the accuracy of leakage identification.
However, the ability of SCADA systems to provide rich spatiotemporal information is limited by the vast scale and complex topology of WDNs and by the limited number of monitoring points. Zhou et al. (2022) leveraged graph signal processing to reconstruct slow-varying components, which improves estimation accuracy and provides unknown nodal pressures. To further visualize the spatially continuous pressure variations in WDNs, this study employs spatial interpolation methods to generate continuous spatial data from point samples. Li & Heap (2008) categorized spatial interpolation methods into non-geostatistical interpolation methods, geostatistical interpolation methods, and combined methods, briefly describing a total of 38 methods. Subsequently, Li & Heap (2011) summarized over 70 spatial interpolation methods/sub-methods in environmental science, identifying inverse distance weighting (IDW), ordinary kriging (OK), and ordinary co-kriging (OCK) as the most frequently compared methods. According to the results of 53 comparative studies, OK demonstrated a favorable application effect.
Data-driven leak detection methodologies utilizing machine learning techniques have undergone extensive investigation, primarily concentrating on the analysis of pressure (Yu et al. 2023), flow (Moors et al. 2018), and transient wave data (Zhou et al. 2020). However, these approaches predominantly excel in detecting severe leaks, such as pipe bursts, or in identifying abnormal events within relatively simplistic WDNs. To effectively surmount the constraints inherent in machine learning, it is imperative to develop a novel approach characterized by: (1) the capacity to analyze complex features, (2) robustness sufficient to withstand uncertainties such as variability in water consumption and monitoring noise, (3) scalability to accommodate extensive detection datasets, (4) adaptability to WDNs of diverse scales and configurations, and (5) the ability for autonomous learning with reduced dependence on manual design.
Deep learning, an artificial intelligence (AI) approach, has undergone a significant resurgence since 2006. This revival has been driven by the development of new algorithms that project input spaces into progressively lower-dimensional latent representations in a hierarchical manner. These algorithms do not require domain expertise or human supervision and can automatically extract complex patterns and relationships, and their performance significantly surpasses that of conventional anomaly detection methods (Pang et al. 2022). Recent research has leveraged deep learning to enhance leak detection capabilities in WDNs, demonstrating the effectiveness of this methodology. Romano et al. (2014) utilized AI techniques to forecast pressure and flow signal values and applied statistical methods to analyze signals from multiple and distinct District-Metered Areas (DMAs). These DMAs can be dynamically recalibrated as conditions in the WDN change. However, the effectiveness of this approach is influenced by the size of the DMA: as the scale of the network increases, sensitivity to anomalous events decreases. Rajabi et al. (2023) introduced a Conditional Deep Convolutional Generative Adversarial Network (CDCGAN) to convert images of node demand into corresponding pressure images. To differentiate between normal and leakage images, a threshold was set based on one year of normal operational monitoring data and the 3σ (three-sigma) principle. A leak is identified in the WDN when the similarity between the real-time monitoring images and the pressure images predicted by the CDCGAN falls below the threshold for five consecutive time steps. However, this threshold-based method is not updated in real time to reflect weekly and seasonal variations, which can diminish its accuracy.
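The threshold logic attributed above to Rajabi et al. (2023) can be illustrated with a minimal sketch. It assumes that the 3σ bound is the mean of the normal similarity scores minus three sample standard deviations; the function names and all score values are hypothetical and are not taken from that study.

```python
from statistics import mean, stdev

def three_sigma_threshold(normal_scores):
    """Lower alarm bound: mean minus three sample standard deviations (assumed form)."""
    return mean(normal_scores) - 3 * stdev(normal_scores)

def first_alarm(scores, threshold, consecutive=5):
    """Return the index at which the score has stayed below the threshold
    for `consecutive` steps in a row, or None if that never happens."""
    run = 0
    for i, score in enumerate(scores):
        run = run + 1 if score < threshold else 0
        if run >= consecutive:
            return i
    return None

# Hypothetical similarity scores observed under normal operation
normal = [0.95, 0.96, 0.94, 0.95, 0.97, 0.95, 0.96, 0.94]
threshold = three_sigma_threshold(normal)

# Hypothetical real-time scores: one isolated dip, then a sustained drop
stream = [0.95, 0.80, 0.96, 0.80, 0.79, 0.78, 0.80, 0.79, 0.95]
alarm_index = first_alarm(stream, threshold)
```

In this sketch the isolated dip at the second step resets the counter and raises no alarm, whereas the sustained drop does. Because the threshold is computed once from a fixed set of normal scores, it cannot follow weekly or seasonal drift, which is the limitation noted above.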
Inspired by the aforementioned studies on leak detection, this paper aims to delve into the complex relationships between hydraulic parameters and attributes of operational conditions based on a deep learning model. The core of this study is to extract deep features from pressure distribution images under varying conditions to enhance the accuracy of leak detection methods. This framework integrates geographic information of urban WDNs with monitored pressure data, and encodes these data into images using the Kriging interpolation method. These spatial pressure distribution images, along with derived labels, are fed into a convolutional neural network (CNN)-based backbone model. The model then extracts image features and identifies anomalous images indicative of leaks, thereby facilitating more accurate and efficient leak detection.
MATERIALS AND METHODS
The proposed method is a novel leak detection method based on data imaging techniques and a deep learning model. Different operating conditions produce different spatial distributions of monitoring data, and abnormal events are identified by classifying images according to these differences. Section 2.1 introduces the framework structure of the proposed leakage detection method. Section 2.2 introduces the generation of training data. Section 2.3 introduces the data imaging technology based on Kriging interpolation. Section 2.4 introduces the backbone network of the leak detection based on visual geometry group 16 (VGG16).
Leakage detection framework based on VGG16
Generation of hydraulic parameters
The framework proposed in this paper is data-driven, so the amount of information stored in the dataset has a significant influence on the performance of the deep learning models. The dataset consists of pressure monitoring data for two operating conditions: normal operation and various leak scenarios.
Generation of node demand under normal conditions
Flow and pressure are two important hydraulic indexes reflecting the operating state of the network. Under normal conditions, the variation of node water consumption or pressure data consists of daily, weekly, and seasonal patterns plus other uncertain factors, as expressed by Equation (1).
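As an illustration of this decomposition, the following minimal sketch superimposes daily, weekly, and seasonal sinusoids and a random term to build one year of demand multipliers at a 5-minute step. It is not Equation (1): the amplitudes, the number of harmonics, and the Gaussian noise level are all hypothetical choices made for the example.

```python
import math
import random

STEPS_PER_DAY = 288  # 24 h at a 5-minute time step

def demand_multiplier(step, rng, noise_std=0.02):
    """Dimensionless demand multiplier at a 5-minute step index.

    All amplitudes below are hypothetical, chosen only for illustration.
    """
    day = step / STEPS_PER_DAY  # elapsed time in days
    daily = 0.30 * math.sin(2 * math.pi * day) + 0.10 * math.sin(4 * math.pi * day)
    weekly = 0.05 * math.sin(2 * math.pi * day / 7)
    seasonal = 0.10 * math.sin(2 * math.pi * day / 365)
    noise = rng.gauss(0.0, noise_std)  # stands in for the uncertain factors
    return max(0.0, 1.0 + daily + weekly + seasonal + noise)

rng = random.Random(0)  # fixed seed so the series is reproducible
year = [demand_multiplier(k, rng) for k in range(365 * STEPS_PER_DAY)]
```

Multiplying such a series by each node's base demand would give a leak-free demand pattern that can be passed to a hydraulic simulator.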
Generation of node pressure under normal and leak conditions
Although flow rate is more sensitive to leakage, pressure data are more likely to reveal the actual leak zone because pressure sensors far outnumber inlet flowmeters. When a leak occurs, the pressure response differs from sensor to sensor depending on each sensor's position. Therefore, including leak data can make the detection result more accurate. The core step in simulating a leak is to add a hole at the midpoint of a pipe and to vary the size of the hole. Because every pipe is assumed to be equally likely to leak, each pipe is treated as a potential leak pipe, and Table 3 lists the parameters for simulating the potential leak scenarios.
Parameter | Normal | Leak |
---|---|---|
Time step | 5 min | 5 min |
Duration | 365 days | 7 days |
Number of leak scenarios per pipe | – | 20 (1/20 DN–DN) |
Number of potential leak pipes | – | 905 |
Time of leak occurrence | – | 1–4 am |
Total number of samples | 105,120 | 868,800 |
Note: DN represents the nominal diameter of the leak pipe, measured in millimeters (mm).
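The sample totals in the table can be checked with simple arithmetic, sketched below. Two points are inferences rather than statements from the source: that each leak scenario contributes 48 samples (four hours at a 5-minute step), and that the 20 hole diameters are evenly spaced from DN/20 to DN. The DN200 pipe is a hypothetical example.

```python
import math

TIME_STEP_MIN = 5
steps_per_day = 24 * 60 // TIME_STEP_MIN   # 288 samples per day
normal_samples = 365 * steps_per_day       # one leak-free year

n_pipes, n_sizes = 905, 20
n_scenarios = n_pipes * n_sizes            # one scenario per pipe and hole size
leak_samples = 868_800                     # total reported in the table
samples_per_scenario = leak_samples // n_scenarios  # inferred, not stated in the source

def hole_areas_m2(dn_mm, n_sizes=20):
    """Circular hole areas (m^2) for diameters DN/20, 2*DN/20, ..., DN.

    Even spacing of the diameters is an assumption.
    """
    diameters_m = [dn_mm / 1000 * k / n_sizes for k in range(1, n_sizes + 1)]
    return [math.pi * d ** 2 / 4 for d in diameters_m]

areas = hole_areas_m2(200)  # hypothetical DN200 pipe
```

The normal total of 105,120 follows directly from 365 days at 288 samples per day, and the leak total divides exactly into 18,100 scenarios of 48 samples each.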
Hyperparameter | Optimal choice |
---|---|
Optimizer | SGD |
Learning rate | 0.001 |
Batch size | 64 |
Epochs | 200 |
Learning rate decay | 0.2 |
Momentum | 0.6 |
Weight decay (L2 regularization) | 0.01 |
Decay milestones | [60,120,160] |
Gamma | 0.2 |
Loss function | Cross entropy loss |
Model | TPR | FPR | F1 | AUC |
---|---|---|---|---|
VGG16 | 0.98 | 0.01 | 0.94 | 0.94 |
VGG13 | 0.83 | 0.00 | 0.89 | 0.91 |
ResNet18 | 0.80 | 0.00 | 0.89 | 0.91 |
SqueezeNet | 0.67 | 0.00 | 0.79 | 0.82 |
Transforming pressure data to images
To mine the information contained in the pressure data more deeply, the pressure data were transformed into images and a deep learning model was used to extract image features. The data imaging method used in this paper is kriging interpolation, a popular spatial interpolation method that estimates the values of the primary variable at unsampled locations to form a continuous spatial field (Li & Heap 2008).
The basic estimator of this method is a weighted average of the sampled data, as given in Equation (3). Because spatial variability does not increase linearly with distance, the semivariance, γ(h), is estimated by Equation (4), and exponential, Gaussian, and spherical models are used to quantify the spatial variability.
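A minimal sketch of ordinary kriging with an exponential semivariogram model may help make the weighted-average estimator concrete. It uses the textbook formulation, in which the weights solve a linear system of semivariances subject to the constraint that they sum to one; it is not necessarily the exact form of Equations (3) and (4). The sensor coordinates, pressure values, sill, and range are hypothetical, and the coarse 8 × 8 grid stands in for a full-resolution image.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, range_m=800.0):
    """Exponential semivariogram model gamma(h); sill and range are hypothetical."""
    return sill * (1.0 - np.exp(-3.0 * h / range_m))

def ordinary_kriging(xy, z, targets, variogram):
    """Estimate z at each target point as a weighted average of the samples."""
    n = len(z)
    # Semivariances between every pair of sample points
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    a = np.ones((n + 1, n + 1))   # last row/column enforce sum(weights) == 1
    a[:n, :n] = variogram(d)
    a[n, n] = 0.0
    # Semivariances between the samples and each target point
    d0 = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=2)
    b = np.ones((n + 1, targets.shape[0]))
    b[:n, :] = variogram(d0)
    solution = np.linalg.solve(a, b)  # weights plus one Lagrange multiplier per target
    weights = solution[:n, :]
    return weights.T @ z, weights

# Hypothetical sensor coordinates (m) and pressure readings (m of head)
sensors = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0],
                    [1000.0, 1000.0], [500.0, 400.0]])
pressure = np.array([32.1, 30.4, 31.0, 29.2, 30.8])

# Coarse 8 x 8 grid covering the sensors
gx, gy = np.meshgrid(np.linspace(0, 1000, 8), np.linspace(0, 1000, 8))
grid = np.column_stack([gx.ravel(), gy.ravel()])
field, weights = ordinary_kriging(sensors, pressure, grid, exponential_variogram)
image = field.reshape(gx.shape)
```

With a zero nugget, as assumed here, the estimator reproduces the measured value at each sensor location. The resulting array would then be mapped to a colour scale to obtain a pressure image.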
Leak detection based on VGG16
CASE STUDY AND RESULTS
Description of the case study
Establishment of dataset
There are many methods to simulate leakages, such as the artificial reservoir method (Ang & Jowitt 2006), the emitter method (Giustolisi et al. 2008), and the additional node demand method (Shao et al. 2019). The simulation method adopted in this paper is to add a leak hole in the middle of the pipe and to compute the hydraulic operating parameters with the Water Network Tool for Resilience (WNTR) (Klise et al. 2018). WNTR is an open-source Python package designed to help water utilities simulate and analyze the resilience of WDNs, and it is compatible with EPANET 2.00.12 and EPANET 2.2. The simulation parameters are listed in Table 2.
Pressure data under different leakage scenarios are also important for training VGG16. It is assumed that each of the 905 pipes is equally likely to leak, and the size of the leak hole represents the severity of the leak. Table 1 shows the leak start time, the duration of the leak, the degree of leak, and other simulation parameters.
VGG16 training
The experiments were conducted on a computer with an NVIDIA GeForce GTX 1660 SUPER GPU and a Windows 11 64-bit operating system, using CUDA 11.8, PyTorch 2.1.2, and Python 3.9. The OpenCV package was the main image-processing software. For this experiment, pressure data under the two operating conditions were combined to form a dataset, and the kriging interpolation method was then used to generate the pressure images, with a uniform size of 224 × 224. All these images compose an image set. A realistic image set is imbalanced, with a much larger proportion of normal images than abnormal ones. Therefore, 5,500 normal images and 480 abnormal images were randomly and repeatedly selected from the set for model optimization and then divided into training, validation, and test sets in proportions of 60, 20, and 20%, respectively.
Hyperparameter optimization provides the identification model with the best learning rate and weight-update settings, so that the model attains the lowest loss and the highest performance. By comparing the validation loss of the VGG16 model under different settings, the final hyperparameters were selected; they are shown in Table 2.
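The 60/20/20 division of the imbalanced image set can be sketched as follows. Splitting each class separately (a stratified split) is an assumption made here so that every subset keeps the same normal-to-abnormal ratio; the source does not state how the split was performed. The file names and the function name are hypothetical.

```python
import random

def stratified_split(items_by_class, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split each class into train/val/test so all subsets keep the class ratio."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)  # randomize within the class before cutting
        n_train = round(fractions[0] * len(items))
        n_val = round(fractions[1] * len(items))
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits

# Hypothetical file names; label 0 = normal, label 1 = leak
images = {
    0: [f"normal_{i}.png" for i in range(5500)],
    1: [f"leak_{i}.png" for i in range(480)],
}
splits = stratified_split(images)
```

Under this assumption the test set holds 1,196 images, of which 1,100 are normal and 96 are abnormal.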
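To show how the listed decay milestones and gamma act on the learning rate, the sketch below evaluates a milestone-based step decay in plain Python. It assumes the rate is multiplied by gamma once at each milestone epoch, which is the behaviour of schedulers such as PyTorch's `MultiStepLR`, and that the "learning rate decay" and "gamma" entries of the hyperparameter table refer to the same factor of 0.2. The dictionary keys and function name are illustrative only.

```python
from bisect import bisect_right

# Values copied from the hyperparameter table; key names are illustrative
HYPERPARAMS = {
    "optimizer": "SGD",
    "learning_rate": 0.001,
    "batch_size": 64,
    "epochs": 200,
    "momentum": 0.6,
    "weight_decay": 0.01,
    "milestones": [60, 120, 160],
    "gamma": 0.2,
}

def learning_rate(epoch, hp=HYPERPARAMS):
    """Step decay: multiply the base rate by gamma once per milestone already reached."""
    passed = bisect_right(hp["milestones"], epoch)
    return hp["learning_rate"] * hp["gamma"] ** passed

schedule = [learning_rate(e) for e in range(HYPERPARAMS["epochs"])]
```

Under these assumptions the rate is 0.001 for epochs 0 to 59, 0.0002 up to epoch 119, 0.00004 up to epoch 159, and 0.000008 for the remaining epochs.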
Performance evaluation
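The TPR, FPR, and F1 values reported for each model follow from the confusion matrix of the binary classifier. The sketch below uses the standard definitions, with leak images as the positive class. The counts are hypothetical: they were chosen for illustration on a test set of 1,100 normal and 96 leak images and are not the confusion matrix of the study. AUC is omitted because it requires the classifier scores rather than the counts.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics with 'leak' as the positive class."""
    tpr = tp / (tp + fn)          # true positive rate (recall)
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "F1": f1}

# Hypothetical counts, for illustration only
metrics = detection_metrics(tp=94, fp=11, tn=1089, fn=2)
```

With these illustrative counts, the metrics evaluate to a TPR of about 0.979, an FPR of 0.01, and an F1 of about 0.935. The example also shows that, on an imbalanced set, a handful of false alarms can lower F1 noticeably even when the FPR is small.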
DISCUSSION
Compared with other related research methods, the leakage identification method proposed in this paper, based on a deep learning model, exhibits superior accuracy. This enhancement is attributed to the integration of spatiotemporal information from pressure monitoring data. By employing spatial interpolation techniques, limited real-time monitoring data are transformed into a continuous spatial distribution image of pressure. A well-established CNN model then extracts spatial distribution and change characteristics from these pressure images, identifying anomalies through the deployment of a specific sliding time window.
This method offers significant advantages over traditional hydraulic models due to its minimal need for preliminary tasks, such as model establishment and validation. By utilizing basic network geographic information, extensive time series data of normal operational pressure, and limited sequences of abnormal pressure, it enables the analysis of continuous spatial pressure distributions. This not only enhances real-time monitoring but also improves adaptability. In contrast, methods based on machine learning algorithms – such as Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and clustering algorithms – typically require manual configuration of feature extraction parameters. This requirement can complicate the process of learning complex features and impede the attainment of high performance.
Deep neural networks, as demonstrated by empirical evidence, tend to achieve better recognition outcomes. Rajabi et al. (2023) utilized a Conditional Deep Convolutional Generative Adversarial Network (CDCGAN) to produce grayscale maps of pressure distributions under normal conditions and compared the structural similarity index (SSIM) between real-time images and predicted images against a threshold. However, the threshold of this method is determined from the normal images in the database and cannot be updated to follow the latest pressure fluctuation pattern as seasons change; the reported recognition accuracy of this method is 70%. Li et al. (2022) developed a leak detection model based on ResNet18, an architecture designed to mitigate issues related to gradient vanishing and network depth. In our study, we conducted a comparative analysis of the VGG16 model and the ResNet18 model under identical hyperparameters. The results indicate that the TPR of the VGG16 model was 0.98, compared with a TPR of 0.80 for the ResNet18 model.
In addition to these comparisons, we also evaluated our deep learning model against other architectures such as SqueezeNet and VGG13. Our results consistently showed the superiority of the VGG16 model in leakage detection tasks, highlighting its robustness and reliability.
Despite the demonstrated advantages, our approach has not yet been tested on more complex or large-scale networks. Future work will focus on extending the application of this method to expansive urban networks and evaluating its performance under varied environmental conditions. This ongoing research will also explore the integration of real-time adaptive mechanisms to update the model according to seasonal changes in pressure patterns, thereby enhancing its accuracy and reliability.
In summary, the proposed deep learning-based leakage detection method not only outperforms traditional hydraulic models and other machine learning approaches but also offers significant potential for further optimization and application in diverse pipe network scenarios. Future research will aim to address current limitations and expand the applicability of this approach, ensuring its effectiveness in more complex and larger-scale WDNs.
CONCLUSION
The rapid advancement of SCADA systems has necessitated enhanced capabilities for the automatic detection and identification of pressure change signals at various monitoring points. However, traditional machine learning methods fall short in analyzing the deeper relationships within data, which curtails the accuracy of anomaly detection. In response to this limitation, this study proposes an advanced leakage detection approach for WDNs utilizing the VGG16 model. This method projects input spaces into progressively lower-dimensional latent representations hierarchically, eliminating the need for domain expertise or human supervision. Consequently, it can autonomously extract complex patterns and relationships, enhancing the detection process.
The essence of this method is to convert the original pressure data into images and to employ a deep learning model to accurately identify leakage events in the network. This approach effectively mitigates noise interference and demonstrates robustness. Additionally, the method for recognizing pipe network pressure images via deep learning models can be readily adapted to other large-scale pipe networks. Compared to VGG13, ResNet18, and SqueezeNet, this method exhibits superior recognition performance.
Future research can focus on exploring the impact of varying the number and location of pressure sensors on data imaging and evaluating how different imaging results affect the accuracy of image-based anomaly identification. Furthermore, semantic segmentation could be utilized to visualize the decision-making process of the deep learning model, thereby facilitating the extraction of leakage location information in WDN.
ACKNOWLEDGEMENT
This work was supported by the National Key Research and Development Program of China (2023YFC3208102, 2022YFF0606905), and the Ningbo Science and Technology Plan Project (2023Z057).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.