This research article presents a data-driven approach for detecting bursts in water distribution networks (WDNs). The framework combines spatiotemporal information from monitored pressure data with an unsupervised learning model. The approach comprises three stages: (1) benchmark dataset acquisition, (2) spatiotemporal information analysis, and (3) burst detection model construction. First, a benchmark dataset representing normal operation is obtained from historical data by a clustering algorithm. Second, spatiotemporal features, namely distance and shape features, are extracted from multi-moment time windows across multiple sensors. Third, burst detection is performed with the isolation forest technique. A WDN is used to evaluate the performance of the method. Results show that the method can effectively detect bursts.

  • Burst detection method based on the unlabeled monitoring data.

  • Multiple consecutive moments of data.

  • Spatial data information for multiple meters.

  • Two dimensions of data features.

  • The burst detection method trained with monitoring data.

Urbanization, climate change, and global water scarcity pose threats to water supply security (Butler et al. 2014). Water distribution networks (WDNs) are among the main infrastructures of cities and support urban development and the livelihood of inhabitants (Gong et al. 2014; Oliker & Ostfeld 2014; Yan et al. 2019). The safe, efficient, and economic operation of WDNs is essential in modern society (Jensen & Jerez 2018). Water loss is primarily caused by bursts and leaks in WDNs, which often occur in aging and deteriorating infrastructure. For example, China lost approximately 7.85 × of water in 2017 (MOHURD 2017). Bursts occur at high to moderate flow rates, and their impact can persist for a few hours when reported or potentially several days when unreported. Leaks, in contrast, are characterized by lower flow rates and longer durations and are often found throughout WDNs. Background leakage refers to the small leaks at service connections that generate flow rates too low to be physically detected (Farley & Trow 2003). In addition, burst events may shorten the life span of water supply infrastructure and cause environmental problems (Xu et al. 2014) and other social losses, such as floods and road collapses (Fox et al. 2016; Laucelli et al. 2016, 2017). Timely burst detection and mitigation of the associated hazards is therefore a crucial research task in WDNs. A burst event consists of unawareness, detection, location, and repair periods (Mounce et al. 2010; Wu & Liu 2017), so the effectiveness of the burst detection method directly affects the duration of the event. Identifying bursts in a timely and effective manner shortens the duration of burst events and reduces the volume of water lost (Tornyeviadzi et al. 2023).

Previous research on burst detection has focused on transient-based, model-based, and data-driven detection methods. The transient-based methods detect bursts by analyzing monitoring data in the time or frequency domain (Datta & Sarkar 2016). One technique involves directly detecting transient negative pressure waves caused by bursts. Burst detection is then completed by analyzing the differences between the simulation results of the transient model and the transient measurement data (Colombo et al. 2009; Srirangarajan et al. 2013; Lee et al. 2016). However, this approach requires collecting large amounts of high-frequency data, which can be expensive to transmit. Moreover, transient signals due to bursts can be masked by background noise and other events. Another technique is to analyze the negative pressure propagation and reflection processes caused by the burst (Mpesha et al. 2001; Gong et al. 2013, 2014). However, actual WDNs are complex and include valves, tanks, and other components, any one of which may severely attenuate transients. So far, most tests have been carried out in simple WDNs under strictly controlled experimental conditions (Wu & Liu 2017). It is still difficult to apply the transient-based method in actual WDNs, especially in large and complex ones.

The model-based methods (Meseguer et al. 2014; Jensen & Jerez 2019; Sophocleous et al. 2019; Li et al. 2021; Zhang et al. 2021a, 2021b) in previous studies detect bursts by analyzing the correlation between the estimates given by the hydraulic model and measurements obtained from the Supervisory Control and Data Acquisition (SCADA) system. The performance of the methods is influenced by the availability and accuracy of hydraulic models (Savic et al. 2009; Sanz et al. 2016; Menapace et al. 2018; Huang et al. 2022). Reliable hydraulic models consist of hydraulic model construction and hydraulic model calibration. The hydraulic model calibration involves verifying the unmeasurable parameters, including nodal demand, pipe roughness, and pipe diameter, based on measurable parameters such as pipe length, pipe flow rate, and nodal head. Hydraulic models contain the errors in model construction, nodal demand uncertainties, and measurement noise (Blesa & Perez 2018). Furthermore, high computational costs also hinder the application of model-based methods (Romero et al. 2022).

The new era of artificial intelligence and big data has resulted in the development of innovative methods for creating sustainable management modes (Savic 2019; Wu et al. 2021). With sufficient acquisition equipment, a widely studied approach is to analyze the measured data collected by the SCADA system to detect bursts (Hu et al. 2021). Data-driven approaches that detect bursts from large amounts of data are highly effective (Zaman et al. 2020) and do not rely on the accuracy of a hydraulic model. These approaches can be categorized as classification methods, prediction–classification methods, statistical methods, and unsupervised clustering methods (Wu & Liu 2017; Hu et al. 2021). Classification methods such as artificial neural networks (ANNs) (Mounce et al. 2014; Pérez-Pérez et al. 2021), graph neural networks (Zanfei et al. 2022), convolutional neural networks (Shukla & Piratla 2020), and k-nearest neighbor (Bermúdez et al. 2020; Tariq et al. 2021) distinguish bursts from normal data. Prediction–classification methods aim to detect bursts by evaluating the residual errors between predicted and actual values against certain evaluation criteria. Examples include ANN (Mounce et al. 2002, 2010; Romano et al. 2014; Fang et al. 2019), long short-term memory (Wang et al. 2020), linear Kalman filter (Ye & Fenner 2011), polynomial functions based on weighted least squares (Ye & Fenner 2014), and random forest (Huang et al. 2018; Zhang et al. 2021a). The accuracy of these approaches depends on the accuracy of the prediction or classification model (Hu et al. 2021). Statistical approaches detect bursts based on statistical process control instead of prediction or classification models. Widely used examples include Western Electric Company (WEC) rules (Ahn & Jung 2019), the Hotelling control chart (Palau et al. 2012; Hashim et al. 2020), the cumulative sum method (Ahn & Jung 2019), and the exponentially weighted moving average (Nam et al. 2019). However, an inappropriate assumption about the data distribution in statistical methods degrades performance (Wu & Liu 2017). Specifically, the distribution of the monitoring data is assumed to be Gaussian. Nevertheless, statistical methods can serve as an important step within other detection methods to improve burst detection performance. In addition, unsupervised clustering algorithms allow features, such as cosine distances (Wu et al. 2018), to be extracted from available data for burst detection.

The aforementioned classification and prediction–classification methods are based on the supervised binary classification models. An important prerequisite for the successful application of the aforementioned algorithm is to have a large amount of labeled data. The major limitation is that not all the accurately labeled burst data are available. In future research, the application of an unsupervised model is promising and worth investigating (Wu & Liu 2017). In addition, the statistical approach can serve as an important step in other detection methods and be combined with other methods to detect bursts (Wu & Liu 2017; Hu et al. 2021).

Burst events will result in different pressure variation amplitudes of multiple sensors. These events also result in a sustained reduction of pressure over a period of time. Consequently, pressure data from different sensors are temporally and spatially connected. On the one hand, in the previous research, the burst is detected by a single time step (Wang et al. 2020). On the other hand, monitoring data information from multiple sensors is more reliable than a single sensor. The correlation features of the monitoring data from multiple pressure sensors are extracted to detect pipe bursts (Wu et al. 2018; Xu et al. 2020). The method for extracting spatiotemporal correlation pressure information can improve the accuracy of burst detection.

In recent years, district metering areas (DMAs) have been widely adopted for the control and management of leakage as well as burst in WDNs. WDNs are typically divided into smaller areas called DMAs, which are convenient for independent metering (Moors et al. 2018; Zhang et al. 2019). However, burst detection is difficult due to unpredictable variations in consumer demand, measurement noise, and seasonal effects. The DMA-based burst detection method in this article was investigated to fill the aforementioned gaps. The main idea of the method is to extract pressure data features and detect burst by using an unsupervised model. The major three contributions are as follows: (1) the normal pressure benchmark dataset was established on the basis of historical monitoring data to provide a benchmark data basis for timely data analysis; (2) the data features of the temporal and spatial information were extracted to maximize the distinction between normal and burst data; (3) an unsupervised burst detection model based on multidimensional spatiotemporal information has been developed to improve the effectiveness of burst detection. The case network will allow the demonstration and analysis of the overall burst detection strategy from a hydraulic perspective.

The proposed method is illustrated in Figure 1. Based on the monitoring pressure data from the sensors in the network, burst detection proceeds through three stages: normal benchmark dataset acquisition, spatiotemporal information analysis, and burst detection based on the isolation forest (IF) machine learning model. These stages are described in detail in the benchmark dataset acquisition, spatiotemporal information analysis of the pressure data, and burst detection model sections. The benchmark dataset is obtained from historical data by the clustering algorithm to eliminate anomalous data as much as possible, so that it largely reflects the characteristics of normal data and provides the data basis for the next stage. Thereafter, a spatiotemporal information analysis of the pressure data is conducted to calculate data features in both the shape and distance dimensions based on time windows and sensor groups. The two features can distinguish burst data from normal data. Finally, the data features are fed into the IF model for training to determine whether the near-time data correspond to burst events.
Figure 1

Research methodology for burst detection.


Benchmark dataset acquisition

The benchmark data are the monitoring pressure data filtered to exclude burst data based on the distribution characteristics of historical monitoring data. Pressure changes serve as the main basis for burst monitoring. During the burst events, the monitoring data values are smaller than those of normal moments and substantially deviate from the distribution of normal pressure data. Given this feature, the benchmark data are calculated through clustering. In addition, the historical monitoring pressure data of each pressure sensor in WDNs are periodic, meaning that pressure data from the same pressure sensor at the same moment on different days are similar (Wang et al. 2020). These data can be classified into one dataset for clustering and obtaining the benchmark pressure data at that moment. Each time moment for each pressure sensor corresponds to a benchmark pressure data point.

The normal data cluster has a high density, indicating a concentrated distribution, while burst data are dispersed and have uneven density. The density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm can adapt to datasets with different distribution patterns, allowing for the identification of low-density and high-density clusters in case of uneven dataset distribution (Schubert et al. 2017). In this research, two parameters – Eps and MinPts – were set for the selected DBSCAN clustering algorithm on the basis of a controlled proportion β of the data size in the data cluster with the maximum internal density in the total data. Specifically, the two parameters mainly describe the tightness of the sample distribution in neighborhoods. Eps denotes the threshold for the neighborhood distance of one point, while MinPts indicates the threshold for the number of data points within Eps. The data points within the Eps of one data point are determined to form a cluster, with the single data point as the core point. A large value for Eps, when MinPts is not changed, will cause most data to be clustered in the same cluster. Moreover, a small value for Eps splits a cluster and causes data in the same cluster to be marked as outliers. However, a large number of core data points will be found when MinPts is considerably small.

The acquisition process of the benchmark data is carried out through the following steps:

  • (a)

    The historical monitoring data are collected from the pressure data acquired by the SCADA system.

  • (b)

    The two-dimensional data points are clustered for the sake of clustering feasibility and operation efficiency. In this research, the number of time window steps for clustering was set as two, including the data from two consecutive moments.

  • (c)

    The initial values for Eps and MinPts are set. If the selected data point is a core point, the data points within its Eps neighborhood form a cluster.

  • (d)

    If the selected data point is not a core point, another data point is selected.

  • (e)

    Steps (c) and (d) are repeated until all points are processed.

  • (f)

    The resulting cluster division is evaluated against β. If β is met, the procedure ends; otherwise, Eps and MinPts in (c) are reset, and steps (c)–(f) are repeated.

Based on the clustering results from the previous steps, the centroid of the two-dimensional data in the high-density cluster is calculated as the benchmark pressure data of pressure sensor j at the corresponding moments. Subsequently, the benchmark pressure data vector of pressure sensor j at all moments is determined. By the same reasoning, the benchmark pressure data for all pressure sensors at all moments can be obtained, resulting in the benchmark dataset $B$:
$$B_j = \left[b_j^{1}, b_j^{2}, \ldots, b_j^{96}\right]^{T}$$
(1)
$$B = \left[B_1, B_2, \ldots, B_n\right]$$
(2)
where $b_j^{1}$ and $b_j^{2}$ represent the benchmark pressure of sensor j in the first and second time steps, respectively, and $B_j$ denotes the benchmark data vector of pressure sensor j. In this research, the time step is set to 15 min, totaling 96 steps within 1 day, so the dimension of $B_j$ is 96. $n$ stands for the number of pressure sensors, and $B$ is a 96 × $n$ matrix.
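The acquisition steps can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `dbscan` and `benchmark_point` are hypothetical, the largest cluster is used as a stand-in for the cluster with the maximum internal density, and only Eps is adjusted when β is not met, whereas the procedure described above resets both Eps and MinPts. The default values mirror the settings reported in the results section.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # noise for now; may later join as a border point
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:   # j is a core point, so keep expanding
                seeds.extend(reach)
    return labels


def benchmark_point(points, eps=0.05, min_pts=10, beta=(0.80, 0.90),
                    eps_step=0.01, max_iter=100):
    """Centroid of the largest cluster once its share of the data lies within beta."""
    for _ in range(max_iter):
        labels = dbscan(points, eps, min_pts)
        sizes = {}
        for lab in labels:
            if lab != -1:
                sizes[lab] = sizes.get(lab, 0) + 1
        main = max(sizes, key=sizes.get) if sizes else None
        share = sizes[main] / len(points) if sizes else 0.0
        if beta[0] <= share <= beta[1]:
            members = [p for p, lab in zip(points, labels) if lab == main]
            return tuple(sum(axis) / len(members) for axis in zip(*members))
        # Assumption: too small a share widens the neighborhood, too large narrows it.
        eps += eps_step if share < beta[0] else -eps_step
        if eps <= 0:
            break
    raise ValueError("no Eps value met the target proportion beta")
```

Each input point is a pair of pressures from two consecutive time steps on one day, so the returned centroid gives the two benchmark values for those time steps at one sensor.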

Spatiotemporal information analysis of the pressure data

Temporal information extraction of the pressure data

The monitoring pressure data at a single moment provide limited information in WDNs. In this research, burst detection was achieved by combining the current pressure data with data from previous moments and analyzing them in terms of time windows. This approach makes effective use of temporal pressure information. The temporal information of the monitoring pressure data from multiple pressure sensors is analyzed as follows:
  • (1)
    The monitoring dataset and previously determined benchmark dataset are divided into time windows using a time window step of s. The multi-time pressure data are then classified into the same time window:
    formula
    (3)
    formula
    (4)
    formula
    (5)
    where is the pressure data of the pressure sensor k at the sth step point on the first day; represents the time window pressure data vector at the sth step point on the first day, including s pressure data points; and denotes the time window data matrix of the pressure sensor k on the ith day:
    formula
    (6)
    formula
    (7)
where is the benchmark data of the pressure sensor k at the first step point, represents the time window benchmark pressure data vector of the pressure sensor k at the first step point, and is the time window benchmark pressure data matrix of the pressure sensor k. Figures 2 and 3 display the time window division process at the time window step of s = 3 using the time window division for the dataset of sensor k as an example.
Figure 2

Schematic of the time window division of pressure data for sensor k.

Figure 3

Schematic of the time window division of benchmark pressure data for sensor k.

  • (2)

    Data analysis is performed on the time axis for the monitoring pressure data window of the pressure sensor k at time j and the corresponding benchmark pressure data time window. The features distinguishing burst data from normal data are extracted. In this research, the distance and shape features of time windows at time j are calculated.

  • (3)

    Step (2) is repeated to acquire the time window data features of all pressure sensors at all moments on all days in the historical and latest monitoring datasets from the SCADA system.
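The time window division in step (1) can be sketched as follows. The helper name `time_windows` is hypothetical, and the sketch assumes that windows do not cross day boundaries, so the first window of a day ends at the sth step, as in the description of the window vector above.

```python
def time_windows(day_values, s):
    """Split one day's readings into overlapping windows of s consecutive steps.

    The first window ends at step s and the last at the final step, so a day
    with T readings yields T - s + 1 windows.
    """
    return [day_values[j - s:j] for j in range(s, len(day_values) + 1)]


# One day of 15-min readings has 96 steps; with s = 3 this gives 94 windows.
windows = time_windows(list(range(1, 97)), 3)
print(len(windows), windows[0], windows[-1])
```

Applying the same function to a sensor's benchmark vector gives the benchmark windows that each monitoring window is compared against.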

The time window features of the monitoring pressures with and without burst vary to a certain extent, requiring the extraction of features that distinguish the presence or absence of burst from the time window. In this research, the distance and shape features of time windows have been innovatively and comprehensively investigated. This task differs from the single-moment characterization in previous studies (Zhang et al. 2021a), as it extracts data features from two perspectives in terms of time windows. The two features can distinguish the presence or absence of burst events through monitoring data, with a greater feature value indicating a higher possibility of burst. The main calculation formulas are presented as follows:
formula
(8)
formula
(9)
formula
(10)
where denotes the distance feature of the sensor k within the jth time window on the ith day, represents the shape feature of the pressure sensor k within the jth time window on the ith day, and is the ratio of the pressure data vector of pressure sensor k within the jth time window on the ith day to the benchmark pressure data vector within the jth time window.
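To make the two features concrete, the sketch below computes illustrative stand-ins for one monitoring window and its benchmark window. The exact expressions of Equations (8)–(10) are not reproduced here, so both definitions are assumptions: the distance feature is taken as the Euclidean distance between the two windows, and the shape feature as the spread (population standard deviation) of the element-wise ratio of monitored to benchmark pressure. Both are zero when the window matches the benchmark and grow as the window departs from it.

```python
import math
from statistics import pstdev

def distance_feature(window, bench):
    """Assumed distance feature: Euclidean distance to the benchmark window."""
    return math.dist(window, bench)

def shape_feature(window, bench):
    """Assumed shape feature: spread of the monitored-to-benchmark pressure ratio."""
    ratios = [p / b for p, b in zip(window, bench)]
    return pstdev(ratios)


bench = [28.5, 28.5, 28.5]          # hypothetical benchmark window (m)
normal = [28.5, 28.5, 28.5]         # window that matches the benchmark
burst = [28.5, 27.9, 27.2]          # hypothetical window with a sustained pressure drop
print(distance_feature(normal, bench), shape_feature(normal, bench))
print(distance_feature(burst, bench), shape_feature(burst, bench))
```

A uniform offset of the whole window changes the assumed distance feature but not the assumed shape feature, which is one reason two complementary features can be useful.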

Spatial information extraction of pressure data

To achieve burst detection within DMAs, monitoring data from multiple pressure sensors are combined. Sensors in such monitoring networks are grouped based on their distance to reduce the dimensionality of the data feature and improve the method efficiency. The number of groups and the average number of pressure sensors are determined based on the specific WDNs. Closer pressure sensors are grouped together to form the same group.

Burst detection model

IF model

Burst events cause the spatiotemporal information features to deviate from their normal values. This article adopted the IF algorithm developed by Liu et al. (2008) to analyze the isolation degree of the spatiotemporal information features from the pressure data. The algorithm constructs an ensemble of binary tree structures called isolation trees (iTrees). The core idea of the IF algorithm is to randomly cut a data space with a hyperplane, resulting in two data subspaces. This process is repeated until only one data point exists in each subspace. High-density clusters must be cut many times before their points are isolated, while points in low-density regions are isolated after only a few cuts. In Figure 4, the data point located in a densely distributed area required 11 hyperplanes to be isolated, while the data point located in a sparsely distributed area required only four hyperplanes. Details of the IF algorithm can be found in the study by Liu et al. (2008).
Figure 4

2D diagram of the IF algorithm.
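The isolation idea can be illustrated with a simplified, dependency-free sketch. It is not the full algorithm of Liu et al. (2008): the function names are hypothetical, each tree is grown lazily along the branch that contains the query point, and the path-length correction for unresolved leaves and the normalized anomaly score are omitted. The default of 300 trees follows the setting reported in the results section; the subsample size of 64 is an assumption.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=12):
    """Count the random axis-parallel cuts needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                        # no cut is possible along this dimension
        return depth
    cut = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the cut as the query point.
    same_side = [row for row in data if (row[dim] < cut) == (point[dim] < cut)]
    return path_length(point, same_side, rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=300, sample_size=64, seed=0):
    """Average isolation depth over many random trees; smaller means more anomalous."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(point, sample, rng)
    return total / n_trees


rng = random.Random(1)
features = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]  # synthetic feature pairs
print(mean_path_length((0.0, 0.0), features))   # point inside the dense region
print(mean_path_length((8.0, 8.0), features))   # point far from the dense region
```

In this sketch, a feature vector far from the bulk of the data is separated after fewer cuts than one inside the dense region, which is the property the detection model relies on.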


Evaluation criteria

The true-positive rate (TPR), false-positive rate (FPR) and detection time (DT) describe the effectiveness and timeliness of the method proposed in this article. The monitoring rule followed in this research is that the near-time data window eigenvector is inputted into the burst detection model. This model outputs the burst status at the current moment and judges whether a burst event has occurred according to the number of continuous monitoring moments (here taken as three), as shown in Figure 5:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
(11)
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
(12)
$$\mathrm{DT} = \frac{1}{N}\sum_{i=1}^{N} t_i$$
(13)
where TP is the number of cases in which a burst is predicted and a burst actually occurred, FP is the number of cases in which a burst is predicted but no burst occurred, TN is the number of cases in which no burst is predicted and no burst occurred, FN is the number of cases in which no burst is predicted but a burst actually occurred, $N$ represents the number of burst events, and $t_i$ represents the number of time steps used to detect burst event i.
Figure 5

Rules of the burst detection method.
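The detection rule and the evaluation criteria can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the standard confusion-matrix definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN), with DT taken as the mean number of time steps needed to detect each burst event.

```python
def confirm_events(flags, n_consecutive=3):
    """Raise an alarm at a step only if the model flagged the last n_consecutive steps."""
    alarms, run = [], 0
    for flagged in flags:
        run = run + 1 if flagged else 0
        alarms.append(run >= n_consecutive)
    return alarms

def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)

def mean_detection_time(steps_per_event):
    """Average number of time steps needed to detect each burst event."""
    return sum(steps_per_event) / len(steps_per_event)


# Model output per time step: 1 = flagged as anomalous, 0 = normal.
print(confirm_events([0, 1, 1, 1, 0, 1]))
print(rates(tp=56, fp=3, tn=97, fn=4))
print(mean_detection_time([1, 2, 3]))
```

Requiring several consecutive flagged moments trades a slightly longer detection time for fewer alarms triggered by isolated noisy readings.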

In this study, a benchmark pipe network C-town (Creaco et al. 2014) was chosen, with 388 nodes, 429 pipes, one water pond, seven reservoirs, and five pump stations. The C-town WDNs were formed by mixing a tree topology and a cyclic structure, as shown in Figure 6. This article focused on DMA1, which is located near pump station S1 and subjected to high water supply pressure and a low elevation, resulting in a relatively large pressure. DMA1 included 132 nodes, 153 pipes, and 10 pressure sensors, with a water supply flow of 118 L/s.
Figure 6

WDNs: (a) C-Town and (b) DMA1 in C-town WDNs.


Two monitoring pressure datasets were mainly included: a benchmark dataset and a dataset for model training and testing. The former specifically referred to the 1,000-day historical monitoring dataset. The latter consisted of a historical monitoring dataset and a latest monitoring pressure database. The training dataset was composed of 1,200-day pressure data, while the test dataset consisted of 400-day pressure data.

EPANET 2.2 was used to perform the simulation analysis of hydraulic data to facilitate the setting of working conditions under different external noises and bursts. The simulation involved several steps. First, noise disturbance was added to the initial water demand at each node to generate the 1 day water demand. Second, the burst data were mainly correlated with the position, flow rate , and occurrence time of burst events, and the pressure data at each monitoring point were generated. Third, noise disturbance was added to the pressure data to generate the pressure data of burst events. Finally, the aforementioned steps were repeated to generate multiday pressure data. In this context, indicates the random change in users' daily water consumption caused by weather, temperature, and other factors (Wu et al. 2016). The actual water consumption is determined by multiplying by the initial water consumption. denotes the measurement errors during the pressure acquisition process. This monitoring pressure is obtained by adding to EPANET-simulated pressure. This is because can mask pressure drops by bursts, making detection more difficult (Xu et al. 2020). In this article, both and follow a normal distribution (Xu et al. 2020).
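The two noise steps can be sketched as below; the hydraulic simulation itself is outside the scope of the sketch. The function names and the standard deviations are hypothetical, and the sketch assumes that the demand noise is a normally distributed multiplier centered on 1 and that the pressure noise is a zero-mean normal term added to the simulated pressure.

```python
import random

def noisy_demands(base_demands, sigma_demand, rng):
    """Scale each nodal demand by a random factor drawn around 1 (assumed mean)."""
    return [d * rng.gauss(1.0, sigma_demand) for d in base_demands]

def noisy_pressures(simulated_pressures, sigma_pressure, rng):
    """Add zero-mean (assumed) measurement noise to each simulated pressure."""
    return [p + rng.gauss(0.0, sigma_pressure) for p in simulated_pressures]


rng = random.Random(42)
demands = noisy_demands([1.2, 0.8, 2.5], sigma_demand=0.05, rng=rng)          # L/s, illustrative
pressures = noisy_pressures([28.5, 46.2, 29.6], sigma_pressure=0.1, rng=rng)  # m, illustrative
print(demands)
print(pressures)
```

In the workflow described above, the perturbed demands would be passed to the hydraulic simulation, and the pressure noise would be applied to its output at each monitoring point.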

Results and discussion

For the method to be practical and applicable, it is crucial to thoroughly explore the user-defined parameters mentioned earlier and determine values that achieve a useful level of burst detection in WDNs. In actual monitoring pressure data, normal data points far outnumber burst data points. Therefore, the parameter β in the DBSCAN algorithm was set to 80–90%. The initial values for Eps and MinPts were set to 0.05 and 10, respectively, and were then adjusted in increments of 0.01 and 1, respectively, until β fell within the desired range. The number of iTrees in the IF algorithm was set to 300. As an example, the historical pressure dataset used to obtain the benchmark data for pressure sensor J204, which contains the monitoring data at two consecutive moments, is shown in Table 1. Figure 7 displays the clustering analysis of the historical pressure dataset of J204. The benchmark data were acquired by calculating the centroid of the data in the high-density cluster. Twelve benchmark data points could be obtained from Figure 7. The data at each moment were obtained by clustering until the benchmark data of all pressure sensors, listed in Table 2, were obtained.
Table 1

Pressure dataset for obtaining the benchmark dataset at sensor J204 (unit: m)

| Days | 1 | 2 | 3 | 4 | 5 | … | 998 | 999 | 1,000 |
|---|---|---|---|---|---|---|---|---|---|
| Time step 1 | 28.390 | 28.666 | 28.556 | 28.512 | 28.527 | … | 28.592 | 28.646 | 28.652 |
| Time step 2 | 28.651 | 28.654 | 28.413 | 28.434 | 28.566 | … | 28.545 | 28.366 | 28.415 |
Table 2

Benchmark database for all sensors in DMA1 WDNs (unit: m)

| | J39 | J204 | … | J417 | … | J7 | J156 |
|---|---|---|---|---|---|---|---|
| Time step 1 | 28.390 | 46.203 | … | 29.600 | … | 46.298 | 43.673 |
| Time step 2 | 28.651 | 46.580 | … | 29.631 | … | 46.546 | 43.555 |
| Time step 3 | 28.422 | 46.459 | … | 29.612 | … | 46.734 | 43.746 |
| Time step 4 | 28.537 | 47.252 | … | 29.633 | … | 47.121 | 43.778 |
| Time step 5 | 28.318 | 47.293 | … | 29.672 | … | 47.175 | 43.965 |
| Time step 6 | 28.341 | 47.374 | … | 29.691 | … | 46.956 | 43.677 |
| Time step 7 | 28.419 | 48.925 | … | 30.651 | … | 47.147 | 43.915 |
| … | … | … | … | … | … | … | … |
| Time step 93 | 31.365 | 54.177 | … | 39.844 | … | 51.209 | 45.634 |
| Time step 94 | 31.359 | 54.085 | … | 39.972 | … | 51.602 | 45.787 |
| Time step 95 | 31.576 | 54.292 | … | 40.126 | … | 51.694 | 45.754 |
| Time step 96 | 31.517 | 54.393 | … | 40.471 | … | 51.873 | 45.892 |
Figure 7

Schematic of the clustering distribution for the six time windows at sensor J204.

The main features of 10 pipe burst events in one training and test dataset are displayed in Table 3. The 10 sensors are divided into three groups according to the sensor grouping principle outlined in Section 2.2.2. The first group comprised J39, J204, and J40; the second group included J366, J408, and J417; and the third group consisted of J7, J179, J161, and J156. Figures 8 and 9 display the time-dependent changes in the shape feature sf and the distance feature df, respectively, of three pressure sensors in events 5 and 6. The features varied from one group to another. When a pipe burst event occurred, the two features increased suddenly and remained elevated for a certain time, which distinguished burst events from normal conditions. The two features thus reflect the differences between pipe burst and normal events from different perspectives.
  • (1)

    Detection effect analysis under different time window lengths

Figure 8

Schematic of sf of three pressure sensors in 7 days.

Figure 9

Schematic of df of three pressure sensors in 7 days.

Table 3

Information of the selected pipe burst events

Start date
Burst event 1 9:15 12:00 J189 
Burst event 2 2:00 5:30 J180 16 
Burst event 3 27 13:45 17:15 J188 10 
Burst event 4 29 16:30 19:15 J4 17 
Burst event 5 34 9:00 12:45 J1157 19 
Burst event 6 36 16:15 20:00 J337 21 
Burst event 7 58 8:15 11:30 J226 24 
Burst event 8 133 8:15 20:00 J226 24 
Burst event 9 136 1:45 2:45 J376 19 
Burst event 10 142 13:30 16:00 J432 
Figure 10 compares the burst detection performance under four noise conditions and different s values (the number of time steps included in the time window). The s value corresponding to the optimal TPR was found to be within the candidate interval of 6–13. Figure 11 exhibits the DT under the four noise conditions and different time window steps s. The DT progressively increased with the increase in s, and the timeliness of burst detection gradually declined. The intervals of and s were 3–30 L/s (3–25% of the total flow rate in the DMAs) and 2–45, respectively. The external pressure and flow noise conditions are noise 1: ; noise 2: ; noise 3: ; and noise 4: . Each external noise consists of variables and , where the corresponding values for the four noises are , , , ;, ; and , .
Figure 10

TPR and FPR for the different s values with four noise conditions.

Figure 11

DT for the different s values with four noise conditions.

Table 4 lists the optimal time window steps for the test dataset under different noise conditions. Figure 12 illustrates the burst detection rates under different burst flow rates and s values, where the intervals of were 3–6, 6–9, and 9–12, and s ranged from 2 to 40. The TPR initially increased slowly with the increase in s and then declined rapidly, which was consistent with the trend displayed in Figure 10. This variation can be explained as follows. Under a large s value, the window used for detecting a burst contains more past moments, so the data at previous moments have a greater influence on the detection result while the data at the current moment have a smaller influence, which reduces detection effectiveness. Moreover, the detection performance on the test set was better for bursts with a large flow rate than for bursts with a small flow rate, as shown in Figure 12. The minor pressure fluctuation caused by a burst with a small flow rate might be masked by the noise. Figure 13 displays the s values corresponding to the optimal burst detection performance under different noise conditions and burst flow rates, which fall into the aforementioned candidate interval. In addition, 20 schemes correspond to the optimal detection performance at each under each noise condition, and the TPR, FPR, and DT corresponding to the optimal detection performance are exhibited in Table 5.
  • (2)

    Detection performance analysis under single and multiple pressure sensors

Figure 12

Burst detection performance for different s values and burst flow rates under four noise conditions.

Figure 13

Value of s corresponding to the best performance at different burst flow rates.

Table 4

Optimal s intervals for the testing dataset with four noise conditions

| Noise condition | s | TPR (%) | FPR (%) | DT (15 min) |
|---|---|---|---|---|
| Noise 1 | 10 | 93.33 | 2.95 | 1.67 |
| Noise 1 |  | 93.33 | 2.87 | 1.70 |
| Noise 1 | 11 | 93.33 | 2.99 | 1.76 |
| Noise 1 | 12 | 93.33 | 2.97 | 1.98 |
| Noise 1 | 13 | 93.33 | 2.99 | 2.00 |
| Noise 2 |  | 83.33 | 4.95 | 3.00 |
| Noise 2 |  | 81.67 | 3.92 | 3.3 |
| Noise 2 |  | 81.67 | 4.44 | 3.45 |
| Noise 2 | 10 | 81.67 | 4.52 | 3.53 |
| Noise 2 | 11 | 81.67 | 2.91 | 3.83 |
| Noise 3 |  | 85.00 | 2.77 | 2.77 |
| Noise 3 |  | 85.00 | 2.80 | 2.85 |
| Noise 3 |  | 85.00 | 2.86 | 2.9 |
| Noise 3 | 10 | 85.00 | 2.89 | 3.12 |
| Noise 3 | 11 | 85.00 | 2.94 | 3.10 |
| Noise 4 |  | 76.67 | 4.83 | 3.85 |
| Noise 4 | 10 | 76.67 | 2.94 | 4.48 |
| Noise 4 |  | 75.00 | 5.38 | 4.12 |
| Noise 4 |  | 75.00 | 2.84 | 4.40 |
| Noise 4 |  | 75.00 | 2.86 | 4.45 |
Table 5

Best performance at each burst flow rate for the different noise conditions

| Case | TPR (%) | FPR (%) | DT (15 min) |
|---|---|---|---|
| Case 1 | 65.00 | 2.36 | 6.37 |
| Case 2 | 38.33 | 2.45 | 8.33 |
| Case 3 | 40.00 | 2.19 | 8.47 |
| Case 4 | 38.33 | 2.44 | 8.35 |
| Case 5 | 83.33 | 2.66 | 3.55 |
| Case 6 | 56.67 | 4.72 | 6.25 |
| Case 7 | 65.00 | 2.54 | 6.06 |
| Case 8 | 50.00 | 2.57 | 7.15 |
| Case 9 | 96.67 | 2.89 | 2.05 |
| Case 10 | 71.67 | 4.83 | 4.32 |
| Case 11 | 81.67 | 2.74 | 4.16 |
| Case 12 | 66.67 | 4.72 | 5.15 |
| Case 13 | 100.00 | 2.85 | 0.8 |
| Case 14 | 81.67 | 2.92 | 4.38 |
| Case 15 | 88.33 | 2.86 | 2.47 |
| Case 16 | 78.33 | 2.88 | 4.78 |
| Case 17 | 100.00 | 2.85 | 0.8 |
| Case 18 | 86.67 | 2.99 | 2.73 |
| Case 19 | 96.67 | 3.11 | 2.13 |
| Case 20 | 85.00 | 4.90 | 2.92 |
Figure 14 compares the burst detection performance achieved with one, two, and multiple pressure sensors under the four noise conditions, where the burst flow rate ranged from 3 to 30 L/s (3–25% of the total inflow rate in the DMAs). Under the four external noise conditions, the detection performance achieved with multiple pressure sensors was 21.66, 18.34, 18.33, and 16.67% higher than that attained with two pressure sensors, and 21.66, 18.34, 18.33, and 16.67% higher than that attained with a single pressure sensor, respectively. Moreover, two pressure sensors achieved better detection than a single pressure sensor, with increases of 3.34, 1.66, 13.34, and 16.66%, respectively. Figure 15 shows the detection performance for bursts at various flow rates and numbers of pressure sensors. Because the method relies on monitoring data, its detection performance depends largely on the number of sensors.
Figure 14

Burst detection performance of the different numbers of sensors in DMA1.

Figure 15

Burst detection performance of the different numbers of sensors at various flow rates.


The method employed in this study detects sudden pipe burst events in WDNs by capturing changes in pressure data. However, it cannot detect pre-existing background leaks with very low flow rates. The monitoring data from each pressure sensor, obtained through the SCADA system, were used to extract data features for pipe burst detection. Pipe bursts can be identified efficiently and promptly when the extracted features clearly distinguish burst data from normal data. Combining the spatiotemporal correlations of the pressure amplifies the burst signal to a certain extent (Zhang et al. 2021b). However, when a single pressure sensor is used, the pressure drop caused by a pipe burst can easily be masked by noise, which lowers the accuracy of burst detection. To address this issue, multiple pressure sensors were integrated so that low detection accuracy caused by the failure of a single sensor is avoided. Overall, using multi-time data from multiple pressure sensors improves burst detection accuracy.

In this research, an innovative approach for pipe burst detection was proposed. The approach rapidly extracts the data features that reveal burst events from large amounts of data and detects bursts based on those features. The framework is as follows. The spatiotemporal features of the time windows from different pressure sensor groups are extracted, using benchmark data estimated from historical data as the reference. The IF-based unsupervised learning model is then used to detect bursts. The pressure monitoring data within a time window of a certain length are used to monitor bursts across the whole DMA, leading to satisfactory detection as indicated by the three indexes used in this article.
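The feature extraction step of this framework can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance feature is taken to be the Euclidean distance between a pressure window and the benchmark window, and the shape feature is taken to be one minus the Pearson correlation between the two. The function names, the pressure values, and the choice of these two measures are all hypothetical.

```python
import math
from typing import List, Sequence


def euclidean_distance(window: Sequence[float], benchmark: Sequence[float]) -> float:
    """Distance feature: how far the window lies from the benchmark profile."""
    return math.sqrt(sum((w - b) ** 2 for w, b in zip(window, benchmark)))


def shape_dissimilarity(window: Sequence[float], benchmark: Sequence[float]) -> float:
    """Shape feature: 1 - Pearson correlation (0 for matching shapes, up to 2 when inverted)."""
    n = len(window)
    mean_w = sum(window) / n
    mean_b = sum(benchmark) / n
    cov = sum((w - mean_w) * (b - mean_b) for w, b in zip(window, benchmark))
    var_w = sum((w - mean_w) ** 2 for w in window)
    var_b = sum((b - mean_b) ** 2 for b in benchmark)
    if var_w == 0.0 or var_b == 0.0:
        return 0.0  # a flat profile carries no shape information (assumed convention)
    return 1.0 - cov / math.sqrt(var_w * var_b)


def window_features(pressures: List[List[float]], benchmarks: List[List[float]],
                    t: int, s: int) -> List[float]:
    """Features of the s-step window ending at index t, concatenated over all sensors."""
    if t - s + 1 < 0:
        raise ValueError("window extends before the start of the series")
    features: List[float] = []
    for series, reference in zip(pressures, benchmarks):
        window = series[t - s + 1 : t + 1]
        ref_window = reference[t - s + 1 : t + 1]
        features.append(euclidean_distance(window, ref_window))
        features.append(shape_dissimilarity(window, ref_window))
    return features


# Two sensors, six time steps; the burst lowers pressure in the last two steps.
benchmark = [[30.0, 30.2, 30.4, 30.6, 30.8, 31.0],
             [28.0, 28.1, 28.2, 28.3, 28.4, 28.5]]
burst = [[30.0, 30.2, 30.4, 30.6, 29.8, 29.6],
         [28.0, 28.1, 28.2, 28.3, 27.9, 27.8]]
normal_features = window_features(benchmark, benchmark, t=5, s=4)
burst_features = window_features(burst, benchmark, t=5, s=4)
print(normal_features)
print(burst_features)
```

In this toy case the burst window moves away from the benchmark in both distance and shape at every sensor, which is the kind of separation an unsupervised detector can then exploit.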
Making full use of the spatiotemporal information from multiple sensors and of data from multiple moments supports timely and efficient burst detection (Zhang et al. 2021b). Moreover, detecting bursts promptly and minimizing damage is crucial, particularly when they occur at night or in rural areas (Xu et al. 2020).

The proposed method can rapidly detect burst events and thereby help minimize the damage caused by pipe bursts. It makes extensive use of spatiotemporal information and achieved TPRs of up to 93.33, 83.33, 85.00, and 76.67% for the four noise cases, with time window lengths ranging from 6 to 13 time steps. The main findings based on the case study are as follows:

  • 1.

    The pressure change patterns during a burst event are distinctly different from those under normal conditions. On this basis, the proposed method extracts the essential features in space and time, providing a viable solution for burst detection.

  • 2.

    The proposed method effectively detected bursts based on the time window data of each pressure sensor in the DMA. A comparison of single, two, and multiple pressure sensors under different time window lengths shows that multiple pressure sensors with time window lengths in a certain interval achieve the best detection performance. This approach overcomes the low reliability and detection rate associated with single-time and single-sensor data in the relevant literature.

  • 3.

    The proposed method, which uses unlabeled pressure monitoring data obtained in the DMA, has promising prospects and guiding significance for practical engineering. Moreover, it avoids sole reliance on flowmeters and hydraulic model accuracy and overcomes the lack of data with burst and nonburst labels in practical engineering.
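How an isolation forest can flag burst-like windows without labels can be sketched in a few lines. The code below is a simplified, pure-Python illustration of the isolation forest idea (Liu et al. 2008), not the model configuration used in this study; the class name `TinyIsolationForest`, the two-feature toy data, and all parameter values are assumptions for illustration. In practice a library implementation such as scikit-learn's `IsolationForest` would normally be used.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


class IsolationTree:
    """One randomly grown tree; anomalies tend to end up in shallow leaves."""

    def __init__(self, points: List[Point], depth: int, max_depth: int, rng: random.Random):
        self.size = len(points)
        self.split_dim: Optional[int] = None
        self.split_val = 0.0
        self.left: Optional["IsolationTree"] = None
        self.right: Optional["IsolationTree"] = None
        if depth >= max_depth or self.size <= 1:
            return
        # Only dimensions with some spread can be split.
        dims = [d for d in range(len(points[0]))
                if min(p[d] for p in points) < max(p[d] for p in points)]
        if not dims:
            return
        dim = rng.choice(dims)
        lo = min(p[dim] for p in points)
        hi = max(p[dim] for p in points)
        self.split_dim = dim
        self.split_val = rng.uniform(lo, hi)
        self.left = IsolationTree([p for p in points if p[dim] < self.split_val],
                                  depth + 1, max_depth, rng)
        self.right = IsolationTree([p for p in points if p[dim] >= self.split_val],
                                   depth + 1, max_depth, rng)

    def path_length(self, x: Point, depth: int = 0) -> float:
        if self.split_dim is None:
            return depth + c_factor(self.size)
        child = self.left if x[self.split_dim] < self.split_val else self.right
        return child.path_length(x, depth + 1)


class TinyIsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees: List[IsolationTree] = []
        self.norm = 1.0

    def fit(self, data: List[Point]) -> "TinyIsolationForest":
        n = min(self.sample_size, len(data))
        if n < 2:
            raise ValueError("need at least two training points")
        max_depth = math.ceil(math.log2(n))
        self.norm = c_factor(n)
        self.trees = [IsolationTree(self.rng.sample(data, n), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; values near 1 indicate likely anomalies."""
        mean_path = sum(tree.path_length(x) for tree in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / self.norm)


# Unlabeled toy data: (distance, shape) features of 200 normal windows plus one burst-like window.
data_rng = random.Random(1)
windows = [[abs(data_rng.gauss(0.10, 0.03)), abs(data_rng.gauss(0.05, 0.02))]
           for _ in range(200)]
windows.append([1.7, 1.9])
forest = TinyIsolationForest(n_trees=100, seed=0).fit(windows)
burst_score = forest.score([1.7, 1.9])
normal_score = forest.score([0.10, 0.05])
print(round(burst_score, 2), round(normal_score, 2))
```

No burst or nonburst labels are passed to `fit`; the burst-like window stands out only because random splits isolate it in far fewer steps than the densely clustered normal windows.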

Because the data-driven method depends heavily on sensor monitoring data, its accuracy depends on the sensor arrangement scheme, which is its main limitation. To address this, the combination of model-based and data-driven methods will be investigated in the future. Currently, it is difficult to distinguish among abnormal events caused by pipe bursts, heavy water use by large consumers, and operational changes in the pipe network. Further research on advanced detection and identification methods will aim to separate these events in more detail. In addition, the proposed method uses steady-state pressure data, so it may not perform as well in real-time processing as transient-based methods. To improve its timeliness, we plan to couple transient and steady-state data in future studies. Another issue beyond the scope of this study is burst localization. Future research will focus on accurate pressure-based burst localization after a burst has been detected.

This work was supported by the National Key Research and Development Program of China (2022YFF06069004), National Natural Science Foundation of China (52070167), and Zhejiang Provincial Natural Science Foundation of China (LHY22E080003).

All relevant data are available from an online repository or repositories. See https://github.com/zhangxiangqiu/burst-detection.

The authors declare there is no conflict.

Ahn J. & Jung D. 2019 Hybrid statistical process control method for water distribution pipe burst detection. Journal of Water Resources Planning and Management 145 (9), 06019008.
Bermúdez J. R., López-Estrada F., Besançon G., Torres L. & Santos-Ruiz I. 2020 Leak diagnosis approach for water distribution networks based on a k-NN classification algorithm. IFAC-PapersOnLine 53 (2), 16651–16656.
Blesa J. & Perez R. 2018 Modelling uncertainty for leak localization in water networks. IFAC-PapersOnLine 51 (24), 730–735.
Butler D., Farmani R., Fu G., Ward S., Diao K. & Astaraie-Imani M. 2014 A new approach to urban water management: safe and sure. Procedia Engineering 89, 347–354.
Colombo A. F., Lee P. & Karney B. W. 2009 A selective literature review of transient-based leak detection methods. Journal of Hydro-Environment Research 2 (4), 212–227.
Datta S. & Sarkar S. 2016 A review on different pipeline fault detection methods. Journal of Loss Prevention in the Process Industries 41, 97–106.
Fang Q., Zhang J., Xie C. & Yang Y. 2019 Detection of multiple leakage points in water distribution networks based on convolutional neural networks. Water Science and Technology – Water Supply 19 (8), 2231–2239.
Farley M. & Trow S. 2003 Losses in Water Distribution Networks: A Practitioner's Guide to Assessment, Monitoring and Control. IWA Publishing, London.
Fox S., Shepherd W., Collins R. & Boxall J. 2016 Experimental quantification of contaminant ingress into a buried leaking pipe during transient events. Journal of Hydraulic Engineering 142 (1), 04015036.
Gong J. Z., Lambert M. F., Simpson A. R. & Zecchin A. C. 2013 Single-event leak detection in pipeline using first three resonant responses. Journal of Hydraulic Engineering 139 (6), 645–655.
Gong J. Z., Lambert M. F., Simpson A. R. & Zecchin A. C. 2014 Frequency response diagram for pipeline leak detection: comparing the odd and even harmonics. Journal of Water Resources Planning and Management 140 (1), 65–74.
Hu Z. K., Chen B. Q., Chen W. L., Tan D. B. & Shen D. T. 2021 Review of model-based and data-driven approaches for leak detection and location in water distribution systems. Water Science and Technology – Water Supply 21 (7), 3282–3306.
Huang P. J., Zhu N. F., Hou D. B., Chen J. Y., Xiao Y., Yu J., Zhang G. X. & Zhang H. J. 2018 Real-time burst detection in district metering areas in water distribution system based on patterns of water demand with supervised learning. Water 10 (12), 1765.
Huang L. F., Du K., Guan M. T., Huang W., Song Z. G. & Wang Q. 2022 Combined usage of hydraulic model calibration residuals and improved vector angle method for burst detection and localization in water distribution systems. Journal of Water Resources Planning and Management 148 (7), 04022034.
Jensen H. A. & Jerez D. J. 2019 A Bayesian model updating approach for detection-related problems in water distribution networks. Reliability Engineering and System Safety 185, 100–112.
Laucelli D., Romano M., Savic D. & Giustolisi O. 2016 Detecting anomalies in water distribution networks using EPR modelling paradigm. Journal of Hydroinformatics 18 (3), 409–427.
Laucelli D. B., Simone A., Berardi L. & Giustolisi O. 2017 Optimal design of district metering areas for the reduction of leakages. Journal of Water Resources Planning and Management 143 (6), 04017017.
Lee S. J., Lee G., Suh J. C. & Lee J. M. 2016 Online burst detection and location of water distribution systems and its practical applications. Journal of Water Resources Planning and Management 142 (1), 04015033.
Li X., Chu S. P., Zhang T. Q. & Yu T. C. 2021 Leakage localization using pressure sensors and spatial clustering in water distribution systems. Water Science and Technology – Water Supply 22 (1), 1020–1034.
Liu F. T., Ting K. M. & Zhou Z. 2008 Isolation forest. In: Paper Presented at 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy.
Menapace A., Avesani D., Righetti M., Bellin A. & Pisaturo G. 2018 Uniformly distributed demand EPANET extension. Water Resources Management 32 (6), 2165–2180.
Meseguer J., Mirats-Tur J. M., Cembrano G., Puig V., Quevedo J., Pérez R., Sanz G. & Ibarra D. 2014 A decision support system for on-line leakage localization. Environmental Modelling and Software 60, 331–345.
MOHURD (Ministry of Housing and Urban-Rural Development, PRC) 2017 China Urban Construction Statistical Yearbook. China Statistical Publishing House, Beijing.
Moors J., Scholten L., van der Hoek J. & den Besten J. 2018 Automated leak localization performance without detailed demand distribution data. Urban Water Journal 15 (2), 116–123.
Mounce S. R., Day A. J., Wood A. S., Khan A., Widdop P. D. & Machell J. 2002 A neural network approach to burst detection. Water Science and Technology 45 (4–5), 237–246.
Mounce S. R., Boxall J. B. & Machell J. 2010 Development and verification of an online artificial intelligence system for detection of bursts and other abnormal flows. Journal of Water Resources Planning and Management 136 (3), 309–318.
Mounce S. R., Mounce R. B., Jackson T., Austin J. & Boxall J. B. 2014 Pattern matching and associative artificial neural networks for water distribution system time series data analysis. Journal of Hydroinformatics 16 (3), 617–632.
Mpesha W., Chaudhry M. H. & Gassman S. L. 2001 Leak detection in pipes by frequency response method. Journal of Hydraulic Engineering 127 (2), 134–147.
Oliker N. & Ostfeld A. 2014 A coupled classification – evolutionary optimization model for contamination event detection in water distribution systems. Water Research 51, 234–245.
Palau C. V., Arregui F. J. & Carlos M. 2012 Burst detection in water networks using principal component analysis. Journal of Water Resources Planning and Management 138 (1), 47–54.
Pérez-Pérez E., López-Estrada F. R., Valencia-Palomo G., Torres L., Puig V. & Mina Antonio J. 2021 Leak diagnosis in pipelines using a combined artificial neural network approach. Control Engineering Practice 107, 104677.
Romano M., Kapelan Z. & Savic D. A. 2014 Automated detection of pipe bursts and other events in water distribution systems. Journal of Water Resources Planning and Management 140 (4), 457–467.
Romero L., Blesa J., Puig V. & Cembrano G. 2022 Clustering-learning approach to the localization of leaks in water distribution networks. Journal of Water Resources Planning and Management 148 (4), 04022003.
Sanz G., Perez R., Kapelan Z. & Savic D. 2016 Leak detection and localization through demand components calibration. Journal of Water Resources Planning and Management 142 (2), 04015057.
Savic D. 2019 Artificial Intelligence – how can water planning and management benefit from?
Savic D. A., Kapelan Z. S. & Jonkergouw P. M. R. 2009 Quo vadis water distribution model calibration? Urban Water Journal 6 (1), 3–22.
Schubert E., Sander J., Ester M., Kriegel H. P. & Xu X. W. 2017 DBSCAN revisited, revisited – why and how you should (still) use DBSCAN. ACM Transactions on Database Systems 42 (3), 1–12.
Shukla H. & Piratla K. 2020 Leakage detection in water pipelines using supervised classification of acceleration signals. Automation in Construction 117 (4), 103256.
Sophocleous S., Savic D. & Kapelan Z. 2019 Leak localization in a real water distribution network based on search-space reduction. Journal of Water Resources Planning and Management 145 (7), 04019024.
Srirangarajan S., Allen M., Preis A., Iqbal M., Lim H. B. & Whittle A. J. 2013 Wavelet-based burst event detection and localization in water distribution systems. Journal of Signal Processing Systems for Signal Image and Video Technology 72 (1), 1–16.
Tornyeviadzi H. M. & Seidu R. 2023 Leakage detection in water distribution networks via 1D CNN deep autoencoder for multivariate SCADA data. Engineering Applications of Artificial Intelligence 122, 106062.
Wang X. T., Guo G. C., Liu S. M., Wu Y. P., Xu X. Y. & Smith K. 2020 Burst detection in district metering areas using deep learning method. Journal of Water Resources Planning and Management 146 (6), 04020031.
Wu Y. P., Liu S. M., Wu X., Liu Y. F. & Guan Y. S. 2016 Burst detection in district metering areas using a data driven clustering algorithm. Water Research 100, 28–37.
Wu Y. P., Liu S. M., Smith K. & Wang X. T. 2018 Using correlation between data from multiple monitoring sensors to detect bursts in water distribution systems. Journal of Water Resources Planning and Management 144 (2), 04017084.
Wu J., Wang Z. & Dong L. 2021 Prediction and analysis of water resources demand in Taiyuan City based on principal component analysis and BP neural network. AQUA – Water Infrastructure, Ecosystems and Society 70 (8), 1272–1286.
Xu Q., Liu R., Chen Q. & Li R. 2014 Review on water leakage control in distribution networks and the associated environmental benefits. Journal of Environmental Sciences 26 (5), 955–961.
Xu W. R., Zhou X., Xin K. L., Boxall J., Yan H. X. & Tao T. 2020 Disturbance extraction for burst detection in water distribution networks using pressure measurements. Water Resources Research 56 (5), e2019WR025526.
Yan H., Wang Q., Wang J., Xin K., Tao T. & Li S. 2019 A simple but robust convergence trajectory controlled method for pressure driven analysis in water distribution system. Science of the Total Environment 659, 983–994.
Ye G. L. & Fenner R. A. 2011 Kalman filtering of hydraulic measurements for burst detection in water distribution systems. Journal of Pipeline Systems Engineering and Practice 2 (1), 14–22.
Ye G. L. & Fenner R. 2014 Weighted least squares with expectation-maximization algorithm for burst detection in U.K. water distribution systems. Journal of Water Resources Planning and Management 140 (4), 417–424.
Zaman D., Tiwari M. K., Gupta A. K. & Sen D. 2020 A review of leakage detection strategies for pressurised pipeline in steady state. Engineering Failure Analysis 109, 104264.
Zanfei A., Menapace A., Bretan B. M., Righetti M. & Herrera M. 2022 Novel approach for burst detection in water distribution systems based on neural networks. Sustainable Cities and Society 86, 104090.
Zhang K., Yan H., Zeng H., Xin K. & Tao T. 2019 A practical multi-objective optimization sectorization method for water distribution network. Science of the Total Environment 656, 1401–1412.
Zhang X. Q., Long Z. H., Yao T., Zhou H., Yu T. C. & Zhou Y. C. 2021a Real-time burst detection based on multiple features of pressure data. Water Science and Technology – Water Supply 22 (2), 1474–1491.
Zhang T. Q., Li X., Chu S. P. & Shao Y. 2021b Parameter determination and performance evaluation of time-series-based leakage detection method. Urban Water Journal 18 (9), 750–760.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).