Frequent burst events in water distribution systems cause severe water loss and other environmental issues such as contamination and carbon emissions. The availability of massive monitored data has facilitated the development of data-driven burst detection methods. This paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in district metering areas (DMAs). The sliding window is used to create flow subsequence libraries for all time points of a day using a historical data set and thereafter the clustering–reconstruction analysis is conducted to obtain flow pattern libraries and reconstruction error subsequences. The threshold vector is determined by the detection matrix extracted from the reconstruction error subsequences at each time point. At the detection stage, the new flow subsequence is created and its reconstruction version is obtained based on the flow pattern library at the same time point. The new detection vector is extracted and compared with the threshold vector to identify bursts. The proposed method is applied to two real-world DMAs and its detection performance is demonstrated and compared with two previous methods. The proposed method is proven to be effective in detecting burst events with fewer false alarms.

  • The flow subsequences clustering–reconstruction analysis method is proposed for burst detection.

  • The impact of parameterization on the detection performance is discussed.

  • The selection of the length of the historical data set directly affects the detection performance.

  • The proposed method performs better than the cumulative sum and shape similarity-based methods on two real-world DMA cases.

Water is one of the most important necessities for human survival and development. However, many countries in the world have been facing severe shortages of water because of rapid population growth and urbanization, economic growth, environmental pollution, and climate change. Water distribution systems (WDSs), as a critical infrastructure, play a vital role in safely delivering drinking water to customers. However, pipe bursts occur frequently in WDSs because of pipe ageing, inadequate construction quality, or lack of maintenance, which causes a significant amount of water loss (Guo et al. 2021). In addition, pipe bursts may threaten public health, increase carbon emissions, and disrupt normal water supply (Wang et al. 2022). Hence, developing effective and efficient pipe burst detection methods is a proactive means to reduce water losses and improve the management of water resources.

District metering areas (DMAs) and supervisory control and data acquisition (SCADA) systems are effective ways for controlling leakage in WDSs (Du et al. 2023). With the availability of real-time hydraulic data such as pipe flows and nodal pressures collected by installed sensors in DMAs, data-driven methods have been increasingly studied and developed for burst detection (Wu et al. 2022; Nimri et al. 2023). The data-driven methods developed in previous studies can be classified into three categories: prediction-classification method, statistical process control (SPC)-based methods, and similarity-based methods. In the prediction-classification methods, the first step is developing prediction models; then the residuals between the observed values and the corresponding predicted values are further analyzed to identify anomalies (Li et al. 2022). Various prediction models have been studied such as the artificial neural network (Romano et al. 2014; Wang et al. 2020), the Kalman filter (Ye & Fenner 2011), and the support vector machine (Mounce et al. 2011). Furthermore, different detection threshold-setting methods have been studied using statistical theory (Wang et al. 2020) or the probabilistic approach (Hutton & Kapelan 2015) based on the residuals of the training data set. The key issue of prediction-classification methods is how to select the input features of the prediction model for improving the prediction accuracy, which directly affects the detection performance (Wu & Liu 2017; Guo et al. 2018). In addition, different threshold-setting strategies based on the prediction residuals also affect the detection performance (Wang et al. 2020).

SPC-based methods, as simple but effective burst detection methods, use control charts with a set of control limits to identify the abnormal fluctuation in the hydraulic data, which have been widely studied. The most common SPC-based methods in the previous studies include Western Electrical Company rules (Jung et al. 2015), exponential weighted moving average (Claudio et al. 2015), Hoteling T2 control chart (Palau et al. 2012), and cumulative sum (CUSUM) (Jung et al. 2015). However, SPC-based methods do not achieve a promising detection performance in real-world monitoring data because the assumption of Gaussian distribution of the data may not be satisfied (Jung et al. 2015; Wu & Liu 2020).

Unlike the prediction-classification and SPC-based methods that detect bursts by determining whether a single hydraulic data point is an outlier, similarity-based methods detect bursts based on the similarity analysis of time series subsequences. Wu et al. (2016) transformed the flow time series data from multiple sensors to detection matrixes for each time point of a day in which each vector includes data from different sensors for the same monitoring day. Vectors with low similarity were identified as outliers from all vectors in the same matrix by clustering analysis. The method was proved to be able to effectively detect relatively large bursts and have a low false-positive rate (FPR). Hereafter, Wu et al. (2018) further transformed the normalized flow time series data from multiple sensors to a detection matrix in which each vector includes data from different sensors at the same time point, and then the similarity between vectors is quantified using the cosine distance. Compared with the previous study (Wu et al. 2016), this method not only requires less historical data but also achieves a lower FPR. The preceding two methods are mainly based on the temporal varying correlation between the data from multiple sensors and have been proven to be effective on multi-inlet and multi-outlet DMAs. However, many DMAs in water utilities have a single inlet and only inflow is monitored, which severely limits the real application of these methods.

There are only a few similarity-based methods that focus on the time series data from a single sensor. Mounce et al. (2014) proposed the pattern matching technique based on the similarity between burst-induced flow subsequences for burst detection. The key step is to populate the burst event library, which requires a large amount of burst event records and corresponding monitoring data. Huang et al. (2018) applied the dynamic time warping method to quantify the similarity between different daily flow series and built the Random Forest classification model using a large training data set with both normal flow subsequences and burst-induced flow sequences to identify bursts. Both methods (Mounce et al. 2014; Huang et al. 2018) require large amounts of flow data under burst conditions to train their models. A limitation of their practical application is that rare and incomplete burst event records in practical scenarios make the amount of data under burst conditions much less than under normal conditions. Therefore, it is more practical to develop burst detection methods solely based on a large amount of flow data under normal conditions. Wu & Liu (2020) proposed the subsequence library construction method solely using a large amount of normal flow data. Bursts are detected by analyzing the shape similarity of real-time flow subsequences with flow subsequences in the corresponding library. The method presents a good detection performance on flow data sets generated by a hydraulic model. However, the method has a limited ability for the detection of relatively small bursts from the real-world monitoring flow data with high uncertainty.

To take advantage of the availability of large amounts of flow data sets under normal conditions, this paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in DMAs. Flow subsequence libraries are created using a sliding window method for all time points of a day. The clustering–reconstruction analysis is conducted for each flow subsequence library, to obtain the flow pattern library and reconstruction error subsequences. Detection vectors are extracted from reconstruction error subsequences at the same time point to form the detection matrix, which is then used to determine the threshold vector. The reconstruction versions of new flow subsequences at the detection stage are obtained based on the corresponding flow pattern library. New detection vectors are extracted and compared with the corresponding threshold vector to identify bursts. The proposed method is applied to two real-world DMAs and its detection performance is demonstrated and compared with two previous methods.

Flow time series collected from a DMA under normal conditions exhibit time-varying patterns but follow some certain diurnal patterns with varying fluctuations. The main reason is that the number of consumers in a DMA changes very little over an extended period. Their water usage behaviors are relatively stable and influenced by factors such as seasons, holidays, and weather. When bursts occur in a DMA, the flow subsequences may deviate significantly from flow subsequences under normal conditions. Based on these analyses, this paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in DMAs. The proposed burst detection method consists of four steps: (1) flow subsequence library creation; (2) clustering and reconstruction analysis for the flow subsequence library; (3) detection threshold determination; and (4) burst identification. The schematic diagram of the proposed method is shown in Figure 1 and more details are further described in the following sections.
Figure 1

Schematic diagram of the proposed method.

Figure 1

Schematic diagram of the proposed method.

Close modal

Flow subsequence library creation

In this method, historical flow time series under normal conditions are required for the flow subsequence library creation. To obtain the historical flow time series under normal conditions, a data preprocessing technique is chosen for deleting the burst-induced data from the raw historical flow time series. For time point t of the day, the average value μt and the standard deviation σt are calculated based on all data from that time point for different days from the raw historical flow time series. The upper limit for the time point t is then determined as μt + 3σt. Any data from time point t above the upper limit are substituted by the average value μt.

The sliding window method is then used to generate flow subsequences from historical flow time series under normal conditions. The length of the sliding window is defined as l and the sliding step is set to 1 day. For time point t of d-th day, flow data at that point and l − 1 flow data ahead of that point form a flow subsequence, defined as follows:
(1)
where represents the flow subsequence at time point t of d-th day, and is the flow data at time point t of d-th day. The sliding window moves across the time series and flow subsequence can be generated for time point t of different days. All the flow subsequences at the same time point of different days form a flow subsequence library. The aforementioned process is repeated to form flow subsequence libraries for each time point, as follows.
(2)
where represents the flow subsequence library at time point t, N is the number of days contained in the historical time series, and is the number of time points of a day. For example, if the sampling interval of the sensor is 15 min, is 96.

Clustering and reconstruction

For the flow subsequence library at time point t, clustering analysis is conducted to obtain the corresponding flow pattern library, which consists of all the cluster centers. The k-means ++ algorithm (Arthur & Vassilvitskii 2007) is a powerful unsupervised method for solving cluster problems such as identifying and classifying types of random processes with different labels. It is chosen to cluster the flow subsequences into a given number of clusters. The K-means ++ algorithm involves the following steps:

  • (1)
    A flow subsequence is randomly selected as the first initialized cluster center from the flow subsequence library , represented as follows:
    (3)
  • (2)
    The similarity between the flow subsequence and the cluster center is quantified based on the Euclidean distance and it is calculated as follows:
    (4)

A smaller distance means a higher similarity. For the flow subsequence , the smallest distance from the existing cluster centers is defined as , and the probability that is selected as the next cluster is calculated as follows:
(5)

The next cluster center is selected using the Roulette method based on the probability of each flow subsequence.

  • (3)

    Step (2) is repeated until the initialized cluster center is obtained. The variable nc is the number of clusters.

  • (4)

    For the flow subsequence , its similarity with each cluster center is calculated, respectively. is assigned to the k-th cluster defined as Ck if it has the highest similarity with the cluster center .

  • (5)
    The cluster centers are updated as follows:
    (6)
    where is the number of flow subsequences contained in the cluster Ck.
  • (6)
    Steps 4 and 5 are repeated until the cluster centers no longer change. The obtained cluster centers represent the specific flow patterns and constitute the flow pattern library at time point t, represented as follows:
    (7)

After the clustering analysis is completed for , the reconstruction version of the flow sequence can be determined based on the flow pattern library , as follows:
(8)
where is the i-th flow pattern in the . The meaning is that the reconstructed flow subsequence, defined as , is the flow pattern in the with the minimal distance to , represented as follows:
(9)

After the clustering and reconstruction processes are repeated for each flow subsequence library, the flow subsequences and the corresponding reconstruction versions are further analyzed for the determination of the detection threshold.

Detection threshold matrix determination

A reconstruction error subsequence is obtained by an original flow subsequence and the corresponding reconstructed flow subsequence, defined as follows:
(10)
where represents the reconstruction error subsequence at time point t of d-th day. The last few elements in a reconstruction error subsequence are extracted to form a detection vector, defined as follows:
(11)
where represents the detection subsequence at time point t of d-th day and Nds is the number of elements in the detection vector. All detection vectors at time point t of different days form a detection matrix, defined as follows:
(12)
where represents the detection matrix at time point t. Each column in the detection matrix consists of all reconstruction errors at the same time point for different days. For each column, a detection threshold is determined as the value associated with the given percentile of the reconstruction error distribution. Here the threshold percentile is defined as p. The detection thresholds obtained by each column of the detection matrix form a threshold vector, defined as follows:
(13)
where represents the threshold vector at time point t. All the threshold vectors form the threshold matrix, defined as follows:
(14)
where TS represents the threshold matrix.

Burst identification

At the burst identification stage, the real-time collected flow data at time point t and l − 1 historical flow data ahead of that time point form a new flow subsequence, defined as . The corresponding reconstructed flow subsequence, defined as , is obtained based on the flow pattern library . The new reconstruction error subsequence is obtained by and , defined as follows:
(15)
The new detection vector consists of the last elements in , defined as follows:
(16)
The burst is identified based on the comparison of with the threshold vector at the same time point t in the threshold matrix TS, shown as follows:
(17)
(18)
where is the number of elements in the new detection vector that are larger than the corresponding elements in the threshold vector. The time point t is identified as under burst status if the value of is equal to . Otherwise, the time point t is identified as under normal status. The value of determines the length of detection vectors and threshold vectors. It also directly determines the minimum time difference between the time when a burst event is first detected and the start time of that burst event. If is set to 1, an alarm is triggered if the last element in a new reconstruction error subsequence is greater than the corresponding threshold. Under this condition, the first alarm is expected to be triggered instantaneously after a new burst occurs. However, the number of false alarms may increase and this affects the reliability of the method, as burst events always result in consecutive abnormal flow data. Many previous burst detection methods trigger the alarm when they detect two or more consecutive outliers (Wu et al. 2018; Wang et al. 2020). is set to 3 in the proposed method. The burst is identified and an alarm is triggered if the new detection vector (last three elements in the new reconstruction error subsequence) at time point t is larger than the corresponding threshold vector. Under this condition, the first alarm is expected to be triggered in three time steps after a new burst occurs.

Performance metrics

This study evaluates the detection performance of the proposed method using five metrics, including detection probability (DP), FPR, Recall, Precision, and F1 score (Taormina & Galelli 2018; Wu & Liu 2020). DP is an event-based metric, defined as the percentage of the number of successfully detected burst events to the total number of burst events. FPR is defined as the percentage of the number of time points wrongly classified as under burst status to the total number of time points that are actually under normal status. Recall is the ratio of the number of time points correctly classified as under burst status to the total number of time points that are actually under burst status. Precision is the ratio of the number of time points correctly classified as under burst status to the total number of time points classified as under burst status. F1 score is defined as the harmonic mean of Recall and Precision. These metrics are calculated as follows:
(19)
(20)
(21)
(22)
(23)
where is the total number of successfully detected burst events; is the number of total burst events; FP is the number of false positives; TN is the number of true negatives; TP is the number of true positives; and FN is the number of false negatives.

Case description

The flow data used in this study come from two real-life DMAs in the UK. Both DMAs have a single inlet and the flow is monitored and transmitted every 15 min. DMA-1 has about 1,048 consumers and the average inflow is about 10.2 L/s. The flow time series data set of DMA-1 was collected over 145 days, from 31 October 2016 to 25 March 2017. DMA-2 has about 1,861 consumers and the average inflow is about 12.5 L/s. The flow time series data set of DMA-2 was collected over 485 days, from 1 January 2015 to 30 April 2016. To overcome the impact of data quality on the performance of the proposed detection method, original flow time series data sets are preprocessed by removing outliers and filling missing values by interpolation.

Data set division and synthetic burst generation

The preprocessed flow time series data set is divided into the historical data set and the detection data set. The historical data set is used to construct flow pattern libraries and determine the detection threshold matrix. The detection data set contains both burst-induced flow time series and normal flow time series, and it is used to test the detection performance. Table 1 lists the lengths and date ranges of the historical data sets and the detection data sets for DMA-1 and DMA-2, respectively. The first 100 days of flow data are set as the historical data set and the remaining 45 days of data are set as the detection data set for DMA-1. In addition, to investigate the impact of the length of historical data set on the detection performance, four different lengths of historical data sets are designed for DMA-2, including 184, 245, 306, and 365 days, and they have the same end date. Their detection performances are compared using the same detection data set containing 121 days of data from 1 January 2016 to 30 April 2016.

Table 1

Detailed description of the data set division for DMA-1 and DMA-2

CaseHistorical data setDetection data set
DMA-1 100 days (10/31/2016–2/8/2017) 45 days (2/9/2017–3/25/2017) 
DMA-2 184 days (7/1/2015–12/31/2015) 121 days (1/1/2016–4/30/2016) 
245 days (5/1/2015–12/31/2015) 
306 days (3/1/2015–12/31/2015) 
365 days (1/1/2015–12/31/2015) 
CaseHistorical data setDetection data set
DMA-1 100 days (10/31/2016–2/8/2017) 45 days (2/9/2017–3/25/2017) 
DMA-2 184 days (7/1/2015–12/31/2015) 121 days (1/1/2016–4/30/2016) 
245 days (5/1/2015–12/31/2015) 
306 days (3/1/2015–12/31/2015) 
365 days (1/1/2015–12/31/2015) 

An adequate number of burst events are essential for the evaluation of detection performance of the proposed method. However, burst event records are rare in real-world DMAs. Many previous studies chose to simulate burst events using the hydraulic model (Zhou et al. 2019; Xu et al. 2020) or directly added the additional flow to the original flow data (Wang et al. 2020; Wu & Liu 2020). In this study, burst events are generated by adding various additional flows to the original flow data in the detection data set. Three key factors are considered, including burst start time, duration, and burst flow interval (the percentage of average flow). Eight different burst start times are designated, including 0:00, 3:00, 6:00, 9:00, 12:00, 15:00, 18:00, and 21:00. The burst duration is set to 3 h. Seven burst flow intervals are designated, including (4%, 7%), (7%, 10%), (10%, 13%), (13%, 16%), (16%, 19%), (19%, 22%), and (22%, 25%). For each burst start time, 10 values are randomly sampled from each burst flow interval, to generate 70 synthetic burst events. Therefore, a total of 560 synthetic burst events are artificially generated.

Impact of parameterization on detection performance

There are three parameters in the proposed method, including threshold percentile p, length of flow subsequence l, and the number of clusters nc. In this section, we evaluate the detection performance of the proposed method by exploring the impact of parameterization on five metrics.

Threshold percentile p

The length of flow subsequence and the number of clusters are set to 36 and 10, respectively, for both the DMAs. The value of the parameter threshold percentile p ranges from 91 to 99%, with a constant increment of 2%. Figure 2 shows the variation of DP, FPR, Recall, Precision, and F1 score with different values of p. The DPs for both DMAs decrease linearly when p changes from 91 to 97% and then sharply decrease when p is 99% (Figure 2(a) and 2(c)). The maximum values of the DPs for DMA-1 and DMA-2 are 94.82 and 89.92%, respectively. The minimum values of the DPs for DMA-1 and DMA-2 are 77.85 and 63.92%, respectively. The FPRs for both DMAs decrease linearly with the value of p (Figure 2(a) and 2(c)). The maximum values of the DPs for DMA-1 and DMA-2 are 6.67 and 6.53%, respectively. The minimum values of DP for DMA-1 and DMA-2 are 1.06 and 0.37%, respectively. The Recall values for DMA-1 and DMA-2 decrease from 0.77 and 0.71 to 0.47 and 0.38, respectively, as the value of p increases from the minimum value. Contrariwise, the Precision values for DMA-1 and DMA-2 increase from 0.47 and 0.24 to 0.77 and 0.75, respectively (Figure 2(b) and 2(d)).
Figure 2

Variations of DP, FPR, Recall, Precision, and F1 score with different threshold percentiles p for two cases: (a,b) DMA-1; (c,d) DMA-2.

Figure 2

Variations of DP, FPR, Recall, Precision, and F1 score with different threshold percentiles p for two cases: (a,b) DMA-1; (c,d) DMA-2.

Close modal

The variation of DP, FPR, Recall, and Precision values illustrates that there is a trade-off between the number of detected burst events and the number of false alarms obtained by different threshold percentiles. A lower value of threshold percentile detects a large number of synthetic burst events (high DP and Recall), but at the cost of the increase of false positives (high FPR and low Precision). A higher value of threshold percentile achieves a low number of false positives (low FPR and high Precision), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low DP and Recall). The best value of threshold percentile is determined by analyzing the variation of F1 score with the value of p. The F1 scores for both the DMAs increase up to a maximum value and then decrease as the value of p increases from 91 to 99% (Figure 2(b) and 2(d)). The best value of threshold percentile is equal to 97% for both DMAs, which achieves the maximum value of F1 score. From the perspective of engineering application, the value of threshold percentile can also be determined according to the specific needs of water utility. Water utility can choose a lower value of threshold percentile for a DMA with the high leakage rate, because the increase of the number of false alarms is acceptable compared with the number of detected burst events. For the well-managed DMA, a higher value of threshold percentile can be chosen to decrease the number of false alarms, which can lighten the burden on workers and reduce the capital expenditure.

Length of flow subsequence l

The value of the parameter length of flow subsequence l ranges from 8 to 36 with a constant increment of 4 and from 48 to 96 with a constant increment of 24. The values of DP, FPR, Recall, Precision, and F1 score under different numbers of clusters (p = 97%) are averaged, respectively, to analyze the variation of five metrics with the value of the l, as shown in Figure 3. The DP and FPR for DMA-1 increase from 77.42 and 1.48% to 84.60 and 2.52%, respectively, as the value of l increases from 8 to 20. The DP varies slightly, but the FPR decreases slightly to 2.25% as l changes from 20 to 36 (Figure 3(a)). The DP and FPR for DMA-2 increase from 56.93 and 0.77% to 72.70 and 1.67%, respectively, as the value of l increases from 8 to 24. The DP continues to increase slightly to 75.2%, but the FPR decreases slightly to 1.58% as l changes from 24 to 36 (Figure 3(c)). The DPs and FPRs for both DMAs vary slightly for the value of l greater than 36. As shown in Figure 3(b) and 3(d), low Recall and high Precision are obtained for DMA-1 and DMA-2 under very low values of l (8 and 12) and high Recall and low Precision are obtained under relatively high values of l (24 to 36). The Recall and Precision values vary slightly for both DMAs under very high values of l (greater than 48).
Figure 3

Variations of DP, FPR, Recall, Precision, and F1 score with different lengths of flow subsequence l for two cases: (a,b) DMA-1; (c,d) DMA-2.

Figure 3

Variations of DP, FPR, Recall, Precision, and F1 score with different lengths of flow subsequence l for two cases: (a,b) DMA-1; (c,d) DMA-2.

Close modal

The variations of the DP, FPR, Recall, and Precision values illustrates that there is a trade-off between the number of detected burst events and the number of false alarms obtained under different lengths of flow subsequence. A relatively high value of l detects a large number of burst events (high DP and Recall), but at the cost of the increase of false positives (high FPR and low Precision). A lower value of l achieves a low number of false positives (low FPR and high Precision), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low DP and Recall). Thus, the best value of length of flow subsequence is determined by analyzing the variation of F1 score with the value of l. The F1 scores for both DMAs initially increase sharply and then vary slightly as the value of l increases from 8 to 96. In addition, excessively high values of l (i.e., 48, 72, and 96) are not recommended because of the little variations for five metrics under these values of l. Conversely, excessively high values of l may result in the significant increase of computation load for clustering analysis owing to the excessively long flow subsequence. It also requires high data integrity at the real-time detection stage, yet missing values and outliers are common in the monitored data set of the real-world DMAs. Hence, the best value of length of flow subsequence is equal to 36 for both DMAs.

The number of clusters nc

The parameter number of clusters nc determines the number of specific flow patterns contained in a flow pattern library. To analyze its impact on the detection performance, different values of nc are designed for DMA-1 and DMA-2 considering the length of the historical data sets. Five different values of nc (i.e., 2, 4, 6, 8 and 10) are set for 100 days of historical data set in the case of DMA-1. Seven different values of nc (i.e., 4, 6, 8, 10, 15, 20, and 30) are set for 306 days of historical data set in the case of DMA-2. The length of flow subsequence and threshold percentile are set to 36 and 97% for the two cases. Figure 4 shows the variations of DP, FPR, Recall, Precision, and F1 score with different values of nc for the two cases. The DPs for DMA-1 and DMA-2 increase from 80.36 and 68.75% to 87.98 and 79.64% with the increase of the value of nc, respectively (Figure 4(a) and 4(c)). The FPR values for DMA-1 and DMA-2 increase slightly from 2.29 and 1.46% to 2.60 and 1.94%, respectively (Figure 4(a) and 4(c)). The Recall values for both DMAs also increase as the value of nc increases from the minimum value. The Precision under high values of nc is greater than that under low values of nc (Figure 4(b) and 4(d)).
Figure 4

Variations of DP, FPR, Recall, Precision, and F1 score with different numbers of clusters nc for two cases: (a) DMA-1; and (b) DMA-2.

Figure 4

Variations of DP, FPR, Recall, Precision, and F1 score with different numbers of clusters nc for two cases: (a) DMA-1; and (b) DMA-2.

Close modal

The variations of DP, FPR, Recall, and Precision values illustrate that there is a trade-off between the number of detected burst events and the number of false alarms obtained by different numbers of clusters. A higher value of nc detects a large number of burst events (high DP and Recall), but at the cost of the increase of the number of false positives (high FPR and low Precision). Conversely, a lower value of nc achieves a low number of false positives (low FPR and high Precision), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low DP and Recall). The best value of nc is determined by analyzing the variation of F1 score. The F1 scores for both DMA-1 and DMA-2 gradually increase to the maximum value and then decrease as the value of nc increases (Figure 4(b) and 4(d)). Thus, the best value of nc is equal to 6 and 10 for DMA-1 and DMA-2, respectively. From the perspective of engineering application, a low value of nc can be chosen for well-managed DMAs to reduce the false alarms, while a high value of nc can be chosen for poorly managed DMAs to increase the sensitivity to burst events.

Impact of the length of historical data set

It is meaningful to study how the detection performance is affected by the lengths of historical data sets in the proposed method, which can guide water utilities in choosing appropriate lengths of historical data sets for clustering analysis. The case of DMA-2 is used to conduct the study. Four different lengths of historical data sets are designated and they are detailed in Table 1. The values of DP, FPR, Recall, Precision, F1 score under different values of l and values of nc (p = 97%) are averaged for each historical data set to analyze the variation of detection performance with the length of historical data set. The DP and FPR decrease gradually from 74.7 and 3.17% to 58.64 and 0.45%, respectively, as the length of historical data set increases from 184 to 365 days (Figure 5(a)). The Recall value decreases from 0.54 to 0.45 and the Precision increases from 0.33 to 0.68 with the increase of the length of historical data set (Figure 5(b)).
Figure 5

Variations of five metrics with different lengths of historical data set for DMA-2: (a) DP and FPR; (b) Recall, Precision, and F1.

Figure 5

Variations of five metrics with different lengths of historical data set for DMA-2: (a) DP and FPR; (b) Recall, Precision, and F1.

Close modal

The results illustrate there is a trade-off between the number of detected bursts and the number of false alarms obtained by different lengths of historical data sets. A longer length of historical data set achieves a low number of false positives (low FPR and high Precision), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low DP and Recall). The main reason is that the historical data set with a longer length contains more flow subsequences, which increases the number of normal flow patterns. Flow subsequences actually under normal conditions in the detection data set are more easily identified as under normal status. The best length of the historical data set is determined by analyzing the variations of F1 score. The F1 score increases up to a maximum value and then decreases as the length of historical data set increases from 184 to 365 days (Figure 5(b)). The best length of historical data set is equal to 306 days, which achieves the maximum value of the F1 score. From the perspective of engineering application, the length of the historical data set can be determined according to specific needs of water utilities. Water utilities can choose a longer length of historical data set for a well-managed DMA and a shorter length of historical data set for a poorly managed DMA with frequent burst events.

Comparison with other detection methods

To test the detection performance of the proposed method, the CUSUM method and the shape similarity-based (SSB) method are chosen for the comparison. The main reasons are as follows. The CUSUM method, the SSB method, and our proposed method all aim to detect bursts purely by analyzing features of monitoring data using the flow time series data set. The CUSUM method focuses on the feature analysis of a single flow data point while both the SSB method and our proposed method focus on the feature analysis of flow subsequences. The CUSUM method detects burst events by analyzing the magnitude of monitoring flow data, which was described in detail by Jung et al. (2015). In the CUSUM method, the appropriate values for the two parameters including the user-defined reference value K and the user-defined decision interval H are determined (K = 2, H = 0.1). The SSB method detects burst events by analyzing the shape similarity of flow subsequences, which was described in detail by Wu & Liu (2020). In the SSB method, the values of three parameters including length of subsequence l, subsequence shift s, and fraction of total historical days p are determined based on the study of Wu & Liu (2020) (l = 5, s = 1, p = 0.05). Tables 2 and 3 present the values of DP, FPR, Recall, Precision, and F1 score obtained by the proposed method, the CUSUM method, and the SSB method for DMA-1 and DMA-2, respectively. The proposed method achieves higher DP, Recall, Precision, and F1 score and lower FPR compared with the CUSUM method for both DMAs. This means that the proposed method can detect a large number of burst events with a limited number of false alarms. Although the proposed method leads to slightly high FPR compared with the SSB method, it achieves higher DP, Recall, Precision, and F1 score values. The mean DP obtained by the proposed method is larger than 80%, but the mean DP obtained by the SSB method for the two DMAs is only about 62.5%. This means that the SSB method achieves fewer false alarms, but at the cost of the significant decrease of sensitivity to bursts. Therefore, it is concluded that the proposed method performs better than the CUSUM method and the SSB method.

Table 2

Performance of the proposed method, the CUSUM method, and the SSB method obtained for DMA-1

MethodDP (%)FPR (%)RecallPrecisionF1
Proposed method 87.85 2.52 0.61 0.65 0.63 
CUSUM method 70.71 3.56 0.43 0.55 0.48 
SSB method 61.78 1.99 0.41 0.62 0.50 
MethodDP (%)FPR (%)RecallPrecisionF1
Proposed method 87.85 2.52 0.61 0.65 0.63 
CUSUM method 70.71 3.56 0.43 0.55 0.48 
SSB method 61.78 1.99 0.41 0.62 0.50 
Table 3

Performance of the proposed method, the CUSUM method, and the SSB method obtained for DMA-2

MethodDP (%)FPR (%)RecallPrecisionF1
Proposed method 79.11 1.52 0.52 0.50 0.51 
CUSUM method 56.78 6.47 0.51 0.19 0.27 
SSB method 63.57 1.40 0.43 0.47 0.45 
MethodDP (%)FPR (%)RecallPrecisionF1
Proposed method 79.11 1.52 0.52 0.50 0.51 
CUSUM method 56.78 6.47 0.51 0.19 0.27 
SSB method 63.57 1.40 0.43 0.47 0.45 

Impact of the uncertainty of burst events

The preceding sections have focused on the overall DP for the total number of synthetic burst events. Furthermore, another key aspect is how the ability to detect bursts changes with the start time and flow interval of synthetic bursts. Tables 4 and 5 present the number of successfully detected synthetic burst events under different start times and flow intervals for the burst events for DMA-1 and DMA-2, respectively. The total number of synthetic events for each start time and flow interval are 10.

For the DMA-1 (Table 4), the number of successfully detected synthetic burst events increases significantly with the burst flow interval. A total of 46 out of 80 synthetic burst events are detected with a DP of 57.5% when the burst flow interval is (4%, 7%). Nearly all synthetic burst events can be detected when the burst flow is above 16% of average daily flow. Overall, the detectability is relatively low for burst events below 7% of the average daily flow. More specifically, the number of successfully detected synthetic burst events and their flow intervals change with different burst start times. When the burst start time is 00:00, 03:00, and 06:00, nearly all synthetic burst events are successfully detected, illustrating the good ability to detect bursts. When the start time is 09:00, 15:00, and 18:00, most of the bursts can be detected with a DP of 86%. However, the ability is significantly limited for bursts below 7% of average daily flow. When the start time is 12:00 and 21:00, the DP further decreases to 75% and the ability is significantly limited for bursts below 10% of average daily flow.

Table 4

The number of successfully detected burst events under different start times and flow intervals of bursts for DMA-1

Start timeFlow interval (percentage of average daily flow) (%)
Sum
4–77–1010–1313–1616–1919–2222–25
00:00 10 10 10 10 10 10 10 70 
03:00 10 10 10 10 10 10 10 70 
06:00 10 10 10 10 10 67 
09:00 10 10 10 10 61 
12:00 10 10 10 54 
15:00 10 10 10 10 10 60 
18:00 10 10 60 
21:00 10 10 10 52 
Sum 46 61 72 76 79 80 80 494 
Start timeFlow interval (percentage of average daily flow) (%)
Sum
4–77–1010–1313–1616–1919–2222–25
00:00 10 10 10 10 10 10 10 70 
03:00 10 10 10 10 10 10 10 70 
06:00 10 10 10 10 10 67 
09:00 10 10 10 10 61 
12:00 10 10 10 54 
15:00 10 10 10 10 10 60 
18:00 10 10 60 
21:00 10 10 10 52 
Sum 46 61 72 76 79 80 80 494 

For the DMA-2 (Table 5), the number of successfully detected synthetic burst events also increases significantly with the burst flow interval. Only 36 out of 80 synthetic burst events are detected when the flow interval is (4%, 7%). A total of 75 out of 80 synthetic burst events are detected when the flow interval is (19%, 22%). Overall, the detection ability is significantly limited for small bursts below 7% of average daily flow. In addition, the number of successfully detected synthetic burst events and their flow intervals also change with different burst start times. When the burst start time is 00:00 and 03:00, almost all synthetic burst events can be detected, illustrating the good detection ability. However, when the burst start time is 06:00, the number of detected synthetic burst events is lowest with the DP of 54% and the detection ability is limited for bursts below 13% of the average daily flow. Moreover, when the burst start time is 09:00, 12:00, 15:00, and 18:00, the detection ability is significantly limited for bursts below 10% of average daily flow and is relatively low for bursts below 13% of average daily flow. When the burst start time is 21:00, bursts above 7% of the average daily flow can be easily detected with the high DP (90%).

Table 5

The number of successfully detected burst events under different start times and flow intervals of bursts for DMA-2

Start timeFlow interval (percentage of average daily flow) (%)
Sum
4–77–1010–1313–1616–1919–2222–25
00:00 10 10 10 10 10 10 69 
03:00 10 10 10 10 10 10 69 
06:00 38 
09:00 10 10 10 51 
12:00 10 10 53 
15:00 10 47 
18:00 45 
21:00 10 10 10 10 63 
Sum 36 49 61 68 71 75 75 435 
Start timeFlow interval (percentage of average daily flow) (%)
Sum
4–77–1010–1313–1616–1919–2222–25
00:00 10 10 10 10 10 10 69 
03:00 10 10 10 10 10 10 69 
06:00 38 
09:00 10 10 10 51 
12:00 10 10 53 
15:00 10 47 
18:00 45 
21:00 10 10 10 10 63 
Sum 36 49 61 68 71 75 75 435 

The results obtained from DMA-1 and DMA-2 illustrate that burst events with higher flow are more easily detected and the minimum flow of detectable bursts changes with different burst start times. Flow subsequences under bursts with higher flow are more dissimilar with normal flow patterns and lead to larger reconstruction errors, which makes the detection vectors larger than the threshold vectors. In addition, a threshold vector changes at different time points of a day. A larger threshold vector decreases the sensitivity to small bursts and makes detectable burst flow higher.

To take advantage of the availability of large amounts of flow data set under normal conditions, this paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection. The basis of this method is creating flow subsequence libraries using the historical flow time series under normal conditions so that the model can accurately generate normal flow patterns for identifying bursts. Most of the previous burst detection methods, such as the prediction-classification method (Romano et al. 2014; Wan et al. 2023) and SPC-based method (Jung et al. 2015), also require the historical flow time series under normal conditions to build their models. However, incomplete burst records make it difficult to directly remove the historical burst events from the raw historical flow data set. Data preprocessing techniques such as SPC have been chosen for dealing with the issue of incomplete burst records (Romano et al. 2014). The assumption behind the proposed method in this paper is that only fewer large burst events exist in the historical flow data set and they can be removed by the data preprocessing technique to obtain the flow time series under normal conditions. The limitation of this assumption is that the generated flow subsequence library may contain some flow subsequences under small bursts in practical scenarios, as it is slightly difficult to remove all burst-induced data from the raw historical data set becaue of the incomplete burst records. The potential consequence of this limitation is that the proposed method may not effectively and reliably detect small burst events in practical application.

Another limitation is that the detection performance of the proposed method for burst events is tested on synthetic bursts. Synthetic bursts are artificially generated by adding different additional flows to the original flow time series. However, real-world burst events exhibit greater complexity and variability, as they may occur at any time of the day and their burst flows vary with different burst locations and crack sizes. The duration of real-world burst events is also longer than the simulated durations. To make synthetic burst events complex in our study, eight different time points of the day (representing peaks and troughs in the flow time series) are chosen as random start times of burst events and the burst flow of each synthetic burst is randomly sampled from a large range of burst flow intervals (4 ∼ 25% of average daily flow). Future work should test the performance of the proposed method by using real-world burst events.

This paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in DMAs. The proposed method is based on the temporal correlation analysis of the flow time series data set, different from the burst detection method focusing on a single flow data point. The main idea of the proposed method is that historical normal flow subsequences are clustered to obtain normal flow patterns, which are subsequently used to reconstruct flow subsequences. Flow subsequences under bursts lead to higher reconstruction errors than normal flow subsequences. The proposed method is applied to two real-world DMAs and its detection performance is compared with the CUSUM method and the SSB method. The impact of parameterization on detection performance is analyzed to guide the value setting of the parameters included in the proposed method. The impact of the length of historical data set and the uncertainty of bursts are also discussed. According to the results obtained from two cases, the following conclusions are drawn:

  • (1)

    The proposed method is proven to be effective in detecting burst events with fewer number of false alarms. The proposed method achieves a better detection performance than the CUSUM method and the SSB method based on the values of DP, FPR, Recall, Precision, and F1 score.

  • (2)

    The length of the historical data set and the parameters in the proposed method including the threshold percentile, the length of flow subsequence, and the number of clusters lead to a trade-off between the number of detected burst events and false alarms. The determination of these parameters should take into account the preference of water utilities for high burst detection ability or fewer false alarms in engineering application.

  • (3)

    The detectability of the proposed method varies with different start times and flow intervals of bursts due to the variation of the threshold vector at different time points of a day. The minimum flow of detected burst events is determined for different burst start times.

In future work, DMAs with more complicated flow patterns can be collected to test the detection performance. The flow pattern library at each time point could be adaptively updated to further improve the detection performance after some false alarms are raised.

This work was supported by the National Natural Science Foundation of China (52079016, 52122901).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Arthur
D.
&
Vassilvitskii
S.
2007
k-means + +: The Advantages of Careful Seeding
.
Society for Industrial and Applied Mathematics
,
New Orleans, Louisiana
, pp.
1027
1035
.
Claudio
K.
,
Couallier
V.
,
Leclerc
C.
,
Le Gat
Y.
,
Litrico
X.
&
Saracco
J.
2015
Detecting leaks through AMR data analysis
.
Water Supply
15
(
6
),
1368
1372
.
Du
K.
,
Li
J.
,
Xu
W.
,
Liu
Z.
&
Zheng
F.
2023
Topology-distance-based clustering method for water distribution network partitioning
.
AQUA – Water Infrastructure, Ecosystems and Society
72
(
11
),
2186
2198
.
Guo
G.
,
Liu
S.
,
Wu
Y.
,
Li
J.
,
Zhou
R.
&
Zhu
X.
2018
Short-term water demand forecast based on deep learning method
.
Journal of Water Resources Planning and Management
144
(
12
),
04018076
.
Guo
G.
,
Liu
S.
,
Jia
D.
,
Wang
S.
&
Wu
X.
2021
Simulation of a leak's growth process in water distribution systems based on growth functions
.
AQUA – Water Infrastructure, Ecosystems and Society
70
(
4
),
521
536
.
Li
Z.
,
Zhang
C.
,
Liu
H.
,
Zhang
C.
,
Zhao
M.
,
Gong
Q.
&
Fu
G.
2022
Developing stacking ensemble models for multivariate contamination detection in water distribution systems
.
Science of the Total Environment
828
,
154284
.
Mounce
S. R.
,
Mounce
R. B.
&
Boxall
J. B.
2011
Novelty detection for time series data analysis in water distribution systems using support vector machines
.
Journal of Hydroinformatics
13
(
4
),
672
686
.
Mounce
S. R.
,
Mounce
R. B.
,
Jackson
T.
,
Austin
J.
&
Boxall
J. B.
2014
Pattern matching and associative artificial neural networks for water distribution system time series data analysis
.
Journal of Hydroinformatics
16
(
3
),
617
632
.
Nimri
W.
,
Wang
Y.
,
Zhang
Z.
,
Deng
C.
&
Sellstrom
K.
2023
Data-driven approaches and model-based methods for detecting and locating leaks in water distribution systems: A literature review
.
Neural Computing and Applications
35
,
11611
11623
.
Palau
C. V.
,
Arregui
F. J.
&
Carlos
M.
2012
Burst detection in water networks using principal component analysis
.
Journal of Water Resources Planning and Management
138
(
1
),
47
54
.
Romano
M.
,
Kapelan
Z.
&
Savić
D. A.
2014
Automated detection of pipe bursts and other events in water distribution systems
.
Journal of Water Resources Planning and Management
140
(
4
),
457
467
.
Taormina
R.
&
Galelli
S.
2018
Deep-learning approach to the detection and localization of cyber-physical attacks on water distribution systems
.
Journal of Water Resources Planning and Management
144
(
10
),
04018065
.
Wan
X.
,
Farmani
R.
&
Keedwell
E.
2023
Gradual leak detection in water distribution networks based on multistep forecasting strategy
.
Journal of Water Resources Planning and Management
149
(
8
),
04023035
.
Wang
X.
,
Guo
G.
,
Liu
S.
,
Wu
Y.
,
Xu
X.
&
Smith
K.
2020
Burst detection in district metering areas using deep learning method
.
Journal of Water Resources Planning and Management
146
(
6
),
04020031
.
Wang
C.
,
Xu
Q.
,
Qiang
Z.
&
Zhou
Y.
2022
Research on pipe burst in water distribution systems: Knowledge structure and emerging trends
.
AQUA – Water Infrastructure, Ecosystems and Society
71
(
12
),
1408
1424
.
Wu
Y.
&
Liu
S.
2020
Burst detection by analyzing shape similarity of time series subsequences in district metering areas
.
Journal of Water Resources Planning and Management
146
(
1
),
04019068
.
Wu
Y.
,
Liu
S.
,
Smith
K.
&
Wang
X.
2018
Using correlation between data from multiple monitoring sensors to detect bursts in water distribution systems
.
Journal of Water Resources Planning and Management
144
(
2
),
04017084
.
Wu
Z. Y.
,
Chew
A.
,
Meng
X.
,
Cai
J.
,
Pok
J.
,
Kalfarisi
R.
,
Lai
K. C.
,
Hew
S. F.
&
Wong
J. J.
2022
Data-driven and model-based framework for smart water grid anomaly detection and localization
.
AQUA – Water Infrastructure, Ecosystems and Society
71
(
1
),
31
41
.
Xu
W.
,
Zhou
X.
,
Xin
K.
,
Boxall
J.
,
Yan
H.
&
Tao
T.
2020
Disturbance extraction for burst detection in water distribution networks using pressure measurements
.
Water Resources Research
56
(
5
),
e2019WR025526
.
Ye
G.
&
Fenner
R. A.
2011
Kalman filtering of hydraulic measurements for burst detection in water distribution systems
.
Journal of Pipeline Systems Engineering and Practice
2
(
1
),
14
22
.
Zhou
X.
,
Tang
Z.
,
Xu
W.
,
Meng
F.
,
Chu
X.
,
Xin
K.
&
Fu
G.
2019
Deep learning identifies accurate burst locations in water distribution networks
.
Water Research
166
,
115058
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).