## ABSTRACT

Frequent burst events in water distribution systems cause severe water loss and other environmental issues such as contamination and carbon emissions. The availability of massive monitored data has facilitated the development of data-driven burst detection methods. This paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in district metering areas (DMAs). The sliding window is used to create flow subsequence libraries for all time points of a day using a historical data set and thereafter the clustering–reconstruction analysis is conducted to obtain flow pattern libraries and reconstruction error subsequences. The threshold vector is determined by the detection matrix extracted from the reconstruction error subsequences at each time point. At the detection stage, the new flow subsequence is created and its reconstruction version is obtained based on the flow pattern library at the same time point. The new detection vector is extracted and compared with the threshold vector to identify bursts. The proposed method is applied to two real-world DMAs and its detection performance is demonstrated and compared with two previous methods. The proposed method is proven to be effective in detecting burst events with fewer false alarms.

## HIGHLIGHTS

The flow subsequences clustering–reconstruction analysis method is proposed for burst detection.

The impact of parameterization on the detection performance is discussed.

The selection of the length of the historical data set directly affects the detection performance.

The proposed method performs better than the cumulative sum and shape similarity-based methods on two real-world DMA cases.

## INTRODUCTION

Water is one of the most important necessities for human survival and development. However, many countries in the world have been facing severe shortages of water because of rapid population growth and urbanization, economic growth, environmental pollution, and climate change. Water distribution systems (WDSs), as a critical infrastructure, play a vital role in safely delivering drinking water to customers. However, pipe bursts occur frequently in WDSs because of pipe ageing, inadequate construction quality, or lack of maintenance, which causes a significant amount of water loss (Guo *et al.* 2021). In addition, pipe bursts may threaten public health, increase carbon emissions, and disrupt normal water supply (Wang *et al.* 2022). Hence, developing effective and efficient pipe burst detection methods is a proactive means to reduce water losses and improve the management of water resources.

District metering areas (DMAs) and supervisory control and data acquisition (SCADA) systems are effective ways for controlling leakage in WDSs (Du *et al.* 2023). With the availability of real-time hydraulic data such as pipe flows and nodal pressures collected by installed sensors in DMAs, data-driven methods have been increasingly studied and developed for burst detection (Wu *et al.* 2022; Nimri *et al.* 2023). The data-driven methods developed in previous studies can be classified into three categories: prediction-classification method, statistical process control (SPC)-based methods, and similarity-based methods. In the prediction-classification methods, the first step is developing prediction models; then the residuals between the observed values and the corresponding predicted values are further analyzed to identify anomalies (Li *et al.* 2022). Various prediction models have been studied such as the artificial neural network (Romano *et al.* 2014; Wang *et al.* 2020), the Kalman filter (Ye & Fenner 2011), and the support vector machine (Mounce *et al.* 2011). Furthermore, different detection threshold-setting methods have been studied using statistical theory (Wang *et al.* 2020) or the probabilistic approach (Hutton & Kapelan 2015) based on the residuals of the training data set. The key issue of prediction-classification methods is how to select the input features of the prediction model for improving the prediction accuracy, which directly affects the detection performance (Wu & Liu 2017; Guo *et al.* 2018). In addition, different threshold-setting strategies based on the prediction residuals also affect the detection performance (Wang *et al.* 2020).

SPC-based methods, as simple but effective burst detection methods, use control charts with a set of control limits to identify the abnormal fluctuation in the hydraulic data, which have been widely studied. The most common SPC-based methods in the previous studies include Western Electrical Company rules (Jung *et al.* 2015), exponential weighted moving average (Claudio *et al.* 2015), Hoteling *T*^{2} control chart (Palau *et al.* 2012), and cumulative sum (CUSUM) (Jung *et al.* 2015). However, SPC-based methods do not achieve a promising detection performance in real-world monitoring data because the assumption of Gaussian distribution of the data may not be satisfied (Jung *et al.* 2015; Wu & Liu 2020).

Unlike the prediction-classification and SPC-based methods that detect bursts by determining whether a single hydraulic data point is an outlier, similarity-based methods detect bursts based on the similarity analysis of time series subsequences. Wu *et al.* (2016) transformed the flow time series data from multiple sensors to detection matrixes for each time point of a day in which each vector includes data from different sensors for the same monitoring day. Vectors with low similarity were identified as outliers from all vectors in the same matrix by clustering analysis. The method was proved to be able to effectively detect relatively large bursts and have a low false-positive rate (*FPR*). Hereafter, Wu *et al.* (2018) further transformed the normalized flow time series data from multiple sensors to a detection matrix in which each vector includes data from different sensors at the same time point, and then the similarity between vectors is quantified using the cosine distance. Compared with the previous study (Wu *et al.* 2016), this method not only requires less historical data but also achieves a lower *FPR*. The preceding two methods are mainly based on the temporal varying correlation between the data from multiple sensors and have been proven to be effective on multi-inlet and multi-outlet DMAs. However, many DMAs in water utilities have a single inlet and only inflow is monitored, which severely limits the real application of these methods.

There are only a few similarity-based methods that focus on the time series data from a single sensor. Mounce *et al.* (2014) proposed the pattern matching technique based on the similarity between burst-induced flow subsequences for burst detection. The key step is to populate the burst event library, which requires a large amount of burst event records and corresponding monitoring data. Huang *et al.* (2018) applied the dynamic time warping method to quantify the similarity between different daily flow series and built the Random Forest classification model using a large training data set with both normal flow subsequences and burst-induced flow sequences to identify bursts. Both methods (Mounce *et al.* 2014; Huang *et al.* 2018) require large amounts of flow data under burst conditions to train their models. A limitation of their practical application is that rare and incomplete burst event records in practical scenarios make the amount of data under burst conditions much less than under normal conditions. Therefore, it is more practical to develop burst detection methods solely based on a large amount of flow data under normal conditions. Wu & Liu (2020) proposed the subsequence library construction method solely using a large amount of normal flow data. Bursts are detected by analyzing the shape similarity of real-time flow subsequences with flow subsequences in the corresponding library. The method presents a good detection performance on flow data sets generated by a hydraulic model. However, the method has a limited ability for the detection of relatively small bursts from the real-world monitoring flow data with high uncertainty.

To take advantage of the availability of large amounts of flow data sets under normal conditions, this paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in DMAs. Flow subsequence libraries are created using a sliding window method for all time points of a day. The clustering–reconstruction analysis is conducted for each flow subsequence library, to obtain the flow pattern library and reconstruction error subsequences. Detection vectors are extracted from reconstruction error subsequences at the same time point to form the detection matrix, which is then used to determine the threshold vector. The reconstruction versions of new flow subsequences at the detection stage are obtained based on the corresponding flow pattern library. New detection vectors are extracted and compared with the corresponding threshold vector to identify bursts. The proposed method is applied to two real-world DMAs and its detection performance is demonstrated and compared with two previous methods.

## METHODOLOGY

### Flow subsequence library creation

In this method, historical flow time series under normal conditions are required for the flow subsequence library creation. To obtain the historical flow time series under normal conditions, a data preprocessing technique is chosen for deleting the burst-induced data from the raw historical flow time series. For time point *t* of the day, the average value *μ _{t}* and the standard deviation

*σ*are calculated based on all data from that time point for different days from the raw historical flow time series. The upper limit for the time point

_{t}*t*is then determined as

*μ*+ 3

_{t}*σ*. Any data from time point

_{t}*t*above the upper limit are substituted by the average value

*μ*.

_{t}*l*and the sliding step is set to 1 day. For time point

*t*of

*d*-th day, flow data at that point and

*l*− 1 flow data ahead of that point form a flow subsequence, defined as follows:where represents the flow subsequence at time point

*t*of

*d*-th day, and is the flow data at time point

*t*of

*d*-th day. The sliding window moves across the time series and flow subsequence can be generated for time point

*t*of different days. All the flow subsequences at the same time point of different days form a flow subsequence library. The aforementioned process is repeated to form flow subsequence libraries for each time point, as follows.where represents the flow subsequence library at time point

*t*,

*N*is the number of days contained in the historical time series, and is the number of time points of a day. For example, if the sampling interval of the sensor is 15 min, is 96.

### Clustering and reconstruction

For the flow subsequence library at time point *t*, clustering analysis is conducted to obtain the corresponding flow pattern library, which consists of all the cluster centers. The *k*-means ++ algorithm (Arthur & Vassilvitskii 2007) is a powerful unsupervised method for solving cluster problems such as identifying and classifying types of random processes with different labels. It is chosen to cluster the flow subsequences into a given number of clusters. The K-means ++ algorithm involves the following steps:

- (1)
- (2)

The next cluster center is selected using the Roulette method based on the probability of each flow subsequence.

- (3)
Step (2) is repeated until the initialized cluster center is obtained. The variable

*nc*is the number of clusters. - (4)
For the flow subsequence , its similarity with each cluster center is calculated, respectively. is assigned to the

*k*-th cluster defined as*C*if it has the highest similarity with the cluster center ._{k} - (5)
- (6)

*i*-th flow pattern in the . The meaning is that the reconstructed flow subsequence, defined as , is the flow pattern in the with the minimal distance to , represented as follows:

After the clustering and reconstruction processes are repeated for each flow subsequence library, the flow subsequences and the corresponding reconstruction versions are further analyzed for the determination of the detection threshold.

### Detection threshold matrix determination

*t*of

*d*-th day. The last few elements in a reconstruction error subsequence are extracted to form a detection vector, defined as follows:where represents the detection subsequence at time point

*t*of

*d*-th day and

*N*is the number of elements in the detection vector. All detection vectors at time point

_{ds}*t*of different days form a detection matrix, defined as follows:where represents the detection matrix at time point

*t*. Each column in the detection matrix consists of all reconstruction errors at the same time point for different days. For each column, a detection threshold is determined as the value associated with the given percentile of the reconstruction error distribution. Here the threshold percentile is defined as

*p*. The detection thresholds obtained by each column of the detection matrix form a threshold vector, defined as follows:where represents the threshold vector at time point

*t*. All the threshold vectors form the threshold matrix, defined as follows:where

*TS*represents the threshold matrix.

### Burst identification

*t*and

*l*− 1 historical flow data ahead of that time point form a new flow subsequence, defined as . The corresponding reconstructed flow subsequence, defined as , is obtained based on the flow pattern library . The new reconstruction error subsequence is obtained by and , defined as follows:

*t*in the threshold matrix TS, shown as follows:where is the number of elements in the new detection vector that are larger than the corresponding elements in the threshold vector. The time point

*t*is identified as under burst status if the value of is equal to . Otherwise, the time point

*t*is identified as under normal status. The value of determines the length of detection vectors and threshold vectors. It also directly determines the minimum time difference between the time when a burst event is first detected and the start time of that burst event. If is set to 1, an alarm is triggered if the last element in a new reconstruction error subsequence is greater than the corresponding threshold. Under this condition, the first alarm is expected to be triggered instantaneously after a new burst occurs. However, the number of false alarms may increase and this affects the reliability of the method, as burst events always result in consecutive abnormal flow data. Many previous burst detection methods trigger the alarm when they detect two or more consecutive outliers (Wu

*et al.*2018; Wang

*et al.*2020). is set to 3 in the proposed method. The burst is identified and an alarm is triggered if the new detection vector (last three elements in the new reconstruction error subsequence) at time point

*t*is larger than the corresponding threshold vector. Under this condition, the first alarm is expected to be triggered in three time steps after a new burst occurs.

### Performance metrics

*DP*),

*FPR*,

*Recall*,

*Precision*, and

*F*1 score (Taormina & Galelli 2018; Wu & Liu 2020).

*DP*is an event-based metric, defined as the percentage of the number of successfully detected burst events to the total number of burst events.

*FPR*is defined as the percentage of the number of time points wrongly classified as under burst status to the total number of time points that are actually under normal status. Recall is the ratio of the number of time points correctly classified as under burst status to the total number of time points that are actually under burst status. Precision is the ratio of the number of time points correctly classified as under burst status to the total number of time points classified as under burst status.

*F*1 score is defined as the harmonic mean of Recall and Precision. These metrics are calculated as follows:

## CASE STUDY

### Case description

The flow data used in this study come from two real-life DMAs in the UK. Both DMAs have a single inlet and the flow is monitored and transmitted every 15 min. DMA-1 has about 1,048 consumers and the average inflow is about 10.2 L/s. The flow time series data set of DMA-1 was collected over 145 days, from 31 October 2016 to 25 March 2017. DMA-2 has about 1,861 consumers and the average inflow is about 12.5 L/s. The flow time series data set of DMA-2 was collected over 485 days, from 1 January 2015 to 30 April 2016. To overcome the impact of data quality on the performance of the proposed detection method, original flow time series data sets are preprocessed by removing outliers and filling missing values by interpolation.

### Data set division and synthetic burst generation

The preprocessed flow time series data set is divided into the historical data set and the detection data set. The historical data set is used to construct flow pattern libraries and determine the detection threshold matrix. The detection data set contains both burst-induced flow time series and normal flow time series, and it is used to test the detection performance. Table 1 lists the lengths and date ranges of the historical data sets and the detection data sets for DMA-1 and DMA-2, respectively. The first 100 days of flow data are set as the historical data set and the remaining 45 days of data are set as the detection data set for DMA-1. In addition, to investigate the impact of the length of historical data set on the detection performance, four different lengths of historical data sets are designed for DMA-2, including 184, 245, 306, and 365 days, and they have the same end date. Their detection performances are compared using the same detection data set containing 121 days of data from 1 January 2016 to 30 April 2016.

Case . | Historical data set . | Detection data set . |
---|---|---|

DMA-1 | 100 days (10/31/2016–2/8/2017) | 45 days (2/9/2017–3/25/2017) |

DMA-2 | 184 days (7/1/2015–12/31/2015) | 121 days (1/1/2016–4/30/2016) |

245 days (5/1/2015–12/31/2015) | ||

306 days (3/1/2015–12/31/2015) | ||

365 days (1/1/2015–12/31/2015) |

Case . | Historical data set . | Detection data set . |
---|---|---|

DMA-1 | 100 days (10/31/2016–2/8/2017) | 45 days (2/9/2017–3/25/2017) |

DMA-2 | 184 days (7/1/2015–12/31/2015) | 121 days (1/1/2016–4/30/2016) |

245 days (5/1/2015–12/31/2015) | ||

306 days (3/1/2015–12/31/2015) | ||

365 days (1/1/2015–12/31/2015) |

An adequate number of burst events are essential for the evaluation of detection performance of the proposed method. However, burst event records are rare in real-world DMAs. Many previous studies chose to simulate burst events using the hydraulic model (Zhou *et al.* 2019; Xu *et al.* 2020) or directly added the additional flow to the original flow data (Wang *et al.* 2020; Wu & Liu 2020). In this study, burst events are generated by adding various additional flows to the original flow data in the detection data set. Three key factors are considered, including burst start time, duration, and burst flow interval (the percentage of average flow). Eight different burst start times are designated, including 0:00, 3:00, 6:00, 9:00, 12:00, 15:00, 18:00, and 21:00. The burst duration is set to 3 h. Seven burst flow intervals are designated, including (4%, 7%), (7%, 10%), (10%, 13%), (13%, 16%), (16%, 19%), (19%, 22%), and (22%, 25%). For each burst start time, 10 values are randomly sampled from each burst flow interval, to generate 70 synthetic burst events. Therefore, a total of 560 synthetic burst events are artificially generated.

## RESULTS AND DISCUSSION

### Impact of parameterization on detection performance

There are three parameters in the proposed method, including threshold percentile *p*, length of flow subsequence *l*, and the number of clusters *nc*. In this section, we evaluate the detection performance of the proposed method by exploring the impact of parameterization on five metrics.

#### Threshold percentile *p*

*p*ranges from 91 to 99%, with a constant increment of 2%. Figure 2 shows the variation of

*DP*,

*FPR*,

*Recall*,

*Precision*, and

*F*1 score with different values of

*p*. The

*DP*s for both DMAs decrease linearly when

*p*changes from 91 to 97% and then sharply decrease when

*p*is 99% (Figure 2(a) and 2(c)). The maximum values of the

*DP*s for DMA-1 and DMA-2 are 94.82 and 89.92%, respectively. The minimum values of the

*DP*s for DMA-1 and DMA-2 are 77.85 and 63.92%, respectively. The

*FPR*s for both DMAs decrease linearly with the value of

*p*(Figure 2(a) and 2(c)). The maximum values of the

*DP*s for DMA-1 and DMA-2 are 6.67 and 6.53%, respectively. The minimum values of

*DP*for DMA-1 and DMA-2 are 1.06 and 0.37%, respectively. The

*Recall*values for DMA-1 and DMA-2 decrease from 0.77 and 0.71 to 0.47 and 0.38, respectively, as the value of

*p*increases from the minimum value. Contrariwise, the

*Precision*values for DMA-1 and DMA-2 increase from 0.47 and 0.24 to 0.77 and 0.75, respectively (Figure 2(b) and 2(d)).

The variation of *DP*, *FPR*, *Recall*, and *Precision* values illustrates that there is a trade-off between the number of detected burst events and the number of false alarms obtained by different threshold percentiles. A lower value of threshold percentile detects a large number of synthetic burst events (high *DP* and *Recall*), but at the cost of the increase of false positives (high *FPR* and low *Precision*). A higher value of threshold percentile achieves a low number of false positives (low *FPR* and high *Precision*), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low *DP* and *Recall*). The best value of threshold percentile is determined by analyzing the variation of *F*1 score with the value of *p*. The *F*1 scores for both the DMAs increase up to a maximum value and then decrease as the value of *p* increases from 91 to 99% (Figure 2(b) and 2(d)). The best value of threshold percentile is equal to 97% for both DMAs, which achieves the maximum value of *F*1 score. From the perspective of engineering application, the value of threshold percentile can also be determined according to the specific needs of water utility. Water utility can choose a lower value of threshold percentile for a DMA with the high leakage rate, because the increase of the number of false alarms is acceptable compared with the number of detected burst events. For the well-managed DMA, a higher value of threshold percentile can be chosen to decrease the number of false alarms, which can lighten the burden on workers and reduce the capital expenditure.

#### Length of flow subsequence *l*

*l*ranges from 8 to 36 with a constant increment of 4 and from 48 to 96 with a constant increment of 24. The values of

*DP*,

*FPR*,

*Recall*,

*Precision*, and

*F*1 score under different numbers of clusters (

*p*= 97%) are averaged, respectively, to analyze the variation of five metrics with the value of the

*l*, as shown in Figure 3. The

*DP*and

*FPR*for DMA-1 increase from 77.42 and 1.48% to 84.60 and 2.52%, respectively, as the value of

*l*increases from 8 to 20. The

*DP*varies slightly, but the

*FPR*decreases slightly to 2.25% as

*l*changes from 20 to 36 (Figure 3(a)). The

*DP*and

*FPR*for DMA-2 increase from 56.93 and 0.77% to 72.70 and 1.67%, respectively, as the value of

*l*increases from 8 to 24. The

*DP*continues to increase slightly to 75.2%, but the

*FPR*decreases slightly to 1.58% as

*l*changes from 24 to 36 (Figure 3(c)). The

*DPs*and

*FPRs*for both DMAs vary slightly for the value of

*l*greater than 36. As shown in Figure 3(b) and 3(d), low

*Recall*and high

*Precision*are obtained for DMA-1 and DMA-2 under very low values of

*l*(8 and 12) and high

*Recall*and low

*Precision*are obtained under relatively high values of

*l*(24 to 36). The

*Recall*and

*Precision*values vary slightly for both DMAs under very high values of

*l*(greater than 48).

The variations of the *DP*, *FPR*, *Recall*, and *Precision* values illustrates that there is a trade-off between the number of detected burst events and the number of false alarms obtained under different lengths of flow subsequence. A relatively high value of *l* detects a large number of burst events (high *DP* and *Recall*), but at the cost of the increase of false positives (high *FPR* and low *Precision*). A lower value of *l* achieves a low number of false positives (low *FPR* and high *Precision*), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low *DP* and *Recall*). Thus, the best value of length of flow subsequence is determined by analyzing the variation of *F*1 score with the value of *l*. The *F*1 scores for both DMAs initially increase sharply and then vary slightly as the value of *l* increases from 8 to 96. In addition, excessively high values of *l* (i.e., 48, 72, and 96) are not recommended because of the little variations for five metrics under these values of *l*. Conversely, excessively high values of *l* may result in the significant increase of computation load for clustering analysis owing to the excessively long flow subsequence. It also requires high data integrity at the real-time detection stage, yet missing values and outliers are common in the monitored data set of the real-world DMAs. Hence, the best value of length of flow subsequence is equal to 36 for both DMAs.

#### The number of clusters *nc*

*nc*determines the number of specific flow patterns contained in a flow pattern library. To analyze its impact on the detection performance, different values of

*nc*are designed for DMA-1 and DMA-2 considering the length of the historical data sets. Five different values of

*nc*(i.e., 2, 4, 6, 8 and 10) are set for 100 days of historical data set in the case of DMA-1. Seven different values of

*nc*(i.e., 4, 6, 8, 10, 15, 20, and 30) are set for 306 days of historical data set in the case of DMA-2. The length of flow subsequence and threshold percentile are set to 36 and 97% for the two cases. Figure 4 shows the variations of

*DP*,

*FPR*,

*Recall*,

*Precision*, and

*F*1 score with different values of

*nc*for the two cases. The

*DP*s for DMA-1 and DMA-2 increase from 80.36 and 68.75% to 87.98 and 79.64% with the increase of the value of

*nc*, respectively (Figure 4(a) and 4(c)). The

*FPR*values for DMA-1 and DMA-2 increase slightly from 2.29 and 1.46% to 2.60 and 1.94%, respectively (Figure 4(a) and 4(c)). The

*Recall*values for both DMAs also increase as the value of

*nc*increases from the minimum value. The

*Precision*under high values of

*nc*is greater than that under low values of

*nc*(Figure 4(b) and 4(d)).

The variations of *DP*, *FPR*, *Recall*, and *Precision* values illustrate that there is a trade-off between the number of detected burst events and the number of false alarms obtained by different numbers of clusters. A higher value of *nc* detects a large number of burst events (high *DP* and *Recall*), but at the cost of the increase of the number of false positives (high *FPR* and low *Precision*). Conversely, a lower value of *nc* achieves a low number of false positives (low *FPR* and high *Precision*), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low *DP* and *Recall*). The best value of *nc* is determined by analyzing the variation of *F*1 score. The *F*1 scores for both DMA-1 and DMA-2 gradually increase to the maximum value and then decrease as the value of *nc* increases (Figure 4(b) and 4(d)). Thus, the best value of *nc* is equal to 6 and 10 for DMA-1 and DMA-2, respectively. From the perspective of engineering application, a low value of *nc* can be chosen for well-managed DMAs to reduce the false alarms, while a high value of *nc* can be chosen for poorly managed DMAs to increase the sensitivity to burst events.

### Impact of the length of historical data set

*DP*,

*FPR*,

*Recall*,

*Precision*,

*F*1 score under different values of

*l*and values of

*nc*(

*p*= 97%) are averaged for each historical data set to analyze the variation of detection performance with the length of historical data set. The

*DP*and

*FPR*decrease gradually from 74.7 and 3.17% to 58.64 and 0.45%, respectively, as the length of historical data set increases from 184 to 365 days (Figure 5(a)). The

*Recall*value decreases from 0.54 to 0.45 and the

*Precision*increases from 0.33 to 0.68 with the increase of the length of historical data set (Figure 5(b)).

The results illustrate there is a trade-off between the number of detected bursts and the number of false alarms obtained by different lengths of historical data sets. A longer length of historical data set achieves a low number of false positives (low *FPR* and high Precision), but at the cost of the decrease of the number of detected burst events and the increase of false negatives (low *DP* and *Recall*). The main reason is that the historical data set with a longer length contains more flow subsequences, which increases the number of normal flow patterns. Flow subsequences actually under normal conditions in the detection data set are more easily identified as under normal status. The best length of the historical data set is determined by analyzing the variations of *F*1 score. The *F*1 score increases up to a maximum value and then decreases as the length of historical data set increases from 184 to 365 days (Figure 5(b)). The best length of historical data set is equal to 306 days, which achieves the maximum value of the *F*1 score. From the perspective of engineering application, the length of the historical data set can be determined according to specific needs of water utilities. Water utilities can choose a longer length of historical data set for a well-managed DMA and a shorter length of historical data set for a poorly managed DMA with frequent burst events.

### Comparison with other detection methods

To test the detection performance of the proposed method, the CUSUM method and the shape similarity-based (SSB) method are chosen for the comparison. The main reasons are as follows. The CUSUM method, the SSB method, and our proposed method all aim to detect bursts purely by analyzing features of monitoring data using the flow time series data set. The CUSUM method focuses on the feature analysis of a single flow data point while both the SSB method and our proposed method focus on the feature analysis of flow subsequences. The CUSUM method detects burst events by analyzing the magnitude of monitoring flow data, which was described in detail by Jung *et al.* (2015). In the CUSUM method, the appropriate values for the two parameters including the user-defined reference value *K* and the user-defined decision interval H are determined (*K* = 2, *H* = 0.1). The SSB method detects burst events by analyzing the shape similarity of flow subsequences, which was described in detail by Wu & Liu (2020). In the SSB method, the values of three parameters including length of subsequence *l*, subsequence shift *s*, and fraction of total historical days *p* are determined based on the study of Wu & Liu (2020) (*l* = 5, *s* = 1, *p* = 0.05). Tables 2 and 3 present the values of *DP*, *FPR*, *Recall*, *Precision*, and *F*1 score obtained by the proposed method, the CUSUM method, and the SSB method for DMA-1 and DMA-2, respectively. The proposed method achieves higher *DP*, *Recall*, *Precision*, and *F*1 score and lower *FPR* compared with the CUSUM method for both DMAs. This means that the proposed method can detect a large number of burst events with a limited number of false alarms. Although the proposed method leads to slightly high *FPR* compared with the SSB method, it achieves higher *DP*, *Recall*, *Precision*, and *F*1 score values. The mean *DP* obtained by the proposed method is larger than 80%, but the mean *DP* obtained by the SSB method for the two DMAs is only about 62.5%. This means that the SSB method achieves fewer false alarms, but at the cost of the significant decrease of sensitivity to bursts. Therefore, it is concluded that the proposed method performs better than the CUSUM method and the SSB method.

Method . | DP (%)
. | FPR (%)
. | Recall
. | Precision
. | F1
. |
---|---|---|---|---|---|

Proposed method | 87.85 | 2.52 | 0.61 | 0.65 | 0.63 |

CUSUM method | 70.71 | 3.56 | 0.43 | 0.55 | 0.48 |

SSB method | 61.78 | 1.99 | 0.41 | 0.62 | 0.50 |

Method . | DP (%)
. | FPR (%)
. | Recall
. | Precision
. | F1
. |
---|---|---|---|---|---|

Proposed method | 87.85 | 2.52 | 0.61 | 0.65 | 0.63 |

CUSUM method | 70.71 | 3.56 | 0.43 | 0.55 | 0.48 |

SSB method | 61.78 | 1.99 | 0.41 | 0.62 | 0.50 |

Method . | DP (%)
. | FPR (%)
. | Recall
. | Precision
. | F1
. |
---|---|---|---|---|---|

Proposed method | 79.11 | 1.52 | 0.52 | 0.50 | 0.51 |

CUSUM method | 56.78 | 6.47 | 0.51 | 0.19 | 0.27 |

SSB method | 63.57 | 1.40 | 0.43 | 0.47 | 0.45 |

Method . | DP (%)
. | FPR (%)
. | Recall
. | Precision
. | F1
. |
---|---|---|---|---|---|

Proposed method | 79.11 | 1.52 | 0.52 | 0.50 | 0.51 |

CUSUM method | 56.78 | 6.47 | 0.51 | 0.19 | 0.27 |

SSB method | 63.57 | 1.40 | 0.43 | 0.47 | 0.45 |

### Impact of the uncertainty of burst events

The preceding sections have focused on the overall *DP* for the total number of synthetic burst events. Furthermore, another key aspect is how the ability to detect bursts changes with the start time and flow interval of synthetic bursts. Tables 4 and 5 present the number of successfully detected synthetic burst events under different start times and flow intervals for the burst events for DMA-1 and DMA-2, respectively. The total number of synthetic events for each start time and flow interval are 10.

For the DMA-1 (Table 4), the number of successfully detected synthetic burst events increases significantly with the burst flow interval. A total of 46 out of 80 synthetic burst events are detected with a *DP* of 57.5% when the burst flow interval is (4%, 7%). Nearly all synthetic burst events can be detected when the burst flow is above 16% of average daily flow. Overall, the detectability is relatively low for burst events below 7% of the average daily flow. More specifically, the number of successfully detected synthetic burst events and their flow intervals change with different burst start times. When the burst start time is 00:00, 03:00, and 06:00, nearly all synthetic burst events are successfully detected, illustrating the good ability to detect bursts. When the start time is 09:00, 15:00, and 18:00, most of the bursts can be detected with a *DP* of 86%. However, the ability is significantly limited for bursts below 7% of average daily flow. When the start time is 12:00 and 21:00, the *DP* further decreases to 75% and the ability is significantly limited for bursts below 10% of average daily flow.

Start time . | Flow interval (percentage of average daily flow) (%) . | Sum . | ||||||
---|---|---|---|---|---|---|---|---|

4–7 . | 7–10 . | 10–13 . | 13–16 . | 16–19 . | 19–22 . | 22–25 . | ||

00:00 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 70 |

03:00 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 70 |

06:00 | 8 | 9 | 10 | 10 | 10 | 10 | 10 | 67 |

09:00 | 4 | 8 | 9 | 10 | 10 | 10 | 10 | 61 |

12:00 | 4 | 5 | 7 | 8 | 10 | 10 | 10 | 54 |

15:00 | 3 | 7 | 10 | 10 | 10 | 10 | 10 | 60 |

18:00 | 5 | 8 | 9 | 9 | 9 | 10 | 10 | 60 |

21:00 | 2 | 4 | 7 | 9 | 10 | 10 | 10 | 52 |

Sum | 46 | 61 | 72 | 76 | 79 | 80 | 80 | 494 |

Start time . | Flow interval (percentage of average daily flow) (%) . | Sum . | ||||||
---|---|---|---|---|---|---|---|---|

4–7 . | 7–10 . | 10–13 . | 13–16 . | 16–19 . | 19–22 . | 22–25 . | ||

00:00 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 70 |

03:00 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 70 |

06:00 | 8 | 9 | 10 | 10 | 10 | 10 | 10 | 67 |

09:00 | 4 | 8 | 9 | 10 | 10 | 10 | 10 | 61 |

12:00 | 4 | 5 | 7 | 8 | 10 | 10 | 10 | 54 |

15:00 | 3 | 7 | 10 | 10 | 10 | 10 | 10 | 60 |

18:00 | 5 | 8 | 9 | 9 | 9 | 10 | 10 | 60 |

21:00 | 2 | 4 | 7 | 9 | 10 | 10 | 10 | 52 |

Sum | 46 | 61 | 72 | 76 | 79 | 80 | 80 | 494 |

For the DMA-2 (Table 5), the number of successfully detected synthetic burst events also increases significantly with the burst flow interval. Only 36 out of 80 synthetic burst events are detected when the flow interval is (4%, 7%). A total of 75 out of 80 synthetic burst events are detected when the flow interval is (19%, 22%). Overall, the detection ability is significantly limited for small bursts below 7% of average daily flow. In addition, the number of successfully detected synthetic burst events and their flow intervals also change with different burst start times. When the burst start time is 00:00 and 03:00, almost all synthetic burst events can be detected, illustrating the good detection ability. However, when the burst start time is 06:00, the number of detected synthetic burst events is lowest with the *DP* of 54% and the detection ability is limited for bursts below 13% of the average daily flow. Moreover, when the burst start time is 09:00, 12:00, 15:00, and 18:00, the detection ability is significantly limited for bursts below 10% of average daily flow and is relatively low for bursts below 13% of average daily flow. When the burst start time is 21:00, bursts above 7% of the average daily flow can be easily detected with the high *DP* (90%).

Start time . | Flow interval (percentage of average daily flow) (%) . | Sum . | ||||||
---|---|---|---|---|---|---|---|---|

4–7 . | 7–10 . | 10–13 . | 13–16 . | 16–19 . | 19–22 . | 22–25 . | ||

00:00 | 9 | 10 | 10 | 10 | 10 | 10 | 10 | 69 |

03:00 | 9 | 10 | 10 | 10 | 10 | 10 | 10 | 69 |

06:00 | 2 | 4 | 5 | 6 | 6 | 8 | 7 | 38 |

09:00 | 2 | 4 | 7 | 8 | 10 | 10 | 10 | 51 |

12:00 | 5 | 5 | 6 | 8 | 9 | 10 | 10 | 53 |

15:00 | 2 | 3 | 7 | 8 | 8 | 9 | 10 | 47 |

18:00 | 2 | 4 | 7 | 8 | 8 | 8 | 8 | 45 |

21:00 | 5 | 9 | 9 | 10 | 10 | 10 | 10 | 63 |

Sum | 36 | 49 | 61 | 68 | 71 | 75 | 75 | 435 |

Start time . | Flow interval (percentage of average daily flow) (%) . | Sum . | ||||||
---|---|---|---|---|---|---|---|---|

4–7 . | 7–10 . | 10–13 . | 13–16 . | 16–19 . | 19–22 . | 22–25 . | ||

00:00 | 9 | 10 | 10 | 10 | 10 | 10 | 10 | 69 |

03:00 | 9 | 10 | 10 | 10 | 10 | 10 | 10 | 69 |

06:00 | 2 | 4 | 5 | 6 | 6 | 8 | 7 | 38 |

09:00 | 2 | 4 | 7 | 8 | 10 | 10 | 10 | 51 |

12:00 | 5 | 5 | 6 | 8 | 9 | 10 | 10 | 53 |

15:00 | 2 | 3 | 7 | 8 | 8 | 9 | 10 | 47 |

18:00 | 2 | 4 | 7 | 8 | 8 | 8 | 8 | 45 |

21:00 | 5 | 9 | 9 | 10 | 10 | 10 | 10 | 63 |

Sum | 36 | 49 | 61 | 68 | 71 | 75 | 75 | 435 |

The results obtained from DMA-1 and DMA-2 illustrate that burst events with higher flow are more easily detected and the minimum flow of detectable bursts changes with different burst start times. Flow subsequences under bursts with higher flow are more dissimilar with normal flow patterns and lead to larger reconstruction errors, which makes the detection vectors larger than the threshold vectors. In addition, a threshold vector changes at different time points of a day. A larger threshold vector decreases the sensitivity to small bursts and makes detectable burst flow higher.

## DISCUSSION

To take advantage of the availability of large amounts of flow data set under normal conditions, this paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection. The basis of this method is creating flow subsequence libraries using the historical flow time series under normal conditions so that the model can accurately generate normal flow patterns for identifying bursts. Most of the previous burst detection methods, such as the prediction-classification method (Romano *et al.* 2014; Wan *et al.* 2023) and SPC-based method (Jung *et al.* 2015), also require the historical flow time series under normal conditions to build their models. However, incomplete burst records make it difficult to directly remove the historical burst events from the raw historical flow data set. Data preprocessing techniques such as SPC have been chosen for dealing with the issue of incomplete burst records (Romano *et al.* 2014). The assumption behind the proposed method in this paper is that only fewer large burst events exist in the historical flow data set and they can be removed by the data preprocessing technique to obtain the flow time series under normal conditions. The limitation of this assumption is that the generated flow subsequence library may contain some flow subsequences under small bursts in practical scenarios, as it is slightly difficult to remove all burst-induced data from the raw historical data set becaue of the incomplete burst records. The potential consequence of this limitation is that the proposed method may not effectively and reliably detect small burst events in practical application.

Another limitation is that the detection performance of the proposed method for burst events is tested on synthetic bursts. Synthetic bursts are artificially generated by adding different additional flows to the original flow time series. However, real-world burst events exhibit greater complexity and variability, as they may occur at any time of the day and their burst flows vary with different burst locations and crack sizes. The duration of real-world burst events is also longer than the simulated durations. To make synthetic burst events complex in our study, eight different time points of the day (representing peaks and troughs in the flow time series) are chosen as random start times of burst events and the burst flow of each synthetic burst is randomly sampled from a large range of burst flow intervals (4 ∼ 25% of average daily flow). Future work should test the performance of the proposed method by using real-world burst events.

## CONCLUSIONS

This paper proposes the flow subsequences clustering–reconstruction analysis method for burst detection in DMAs. The proposed method is based on the temporal correlation analysis of the flow time series data set, different from the burst detection method focusing on a single flow data point. The main idea of the proposed method is that historical normal flow subsequences are clustered to obtain normal flow patterns, which are subsequently used to reconstruct flow subsequences. Flow subsequences under bursts lead to higher reconstruction errors than normal flow subsequences. The proposed method is applied to two real-world DMAs and its detection performance is compared with the CUSUM method and the SSB method. The impact of parameterization on detection performance is analyzed to guide the value setting of the parameters included in the proposed method. The impact of the length of historical data set and the uncertainty of bursts are also discussed. According to the results obtained from two cases, the following conclusions are drawn:

- (1)
The proposed method is proven to be effective in detecting burst events with fewer number of false alarms. The proposed method achieves a better detection performance than the CUSUM method and the SSB method based on the values of

*DP*,*FPR*,*Recall*,*Precision*, and*F*1 score. - (2)
The length of the historical data set and the parameters in the proposed method including the threshold percentile, the length of flow subsequence, and the number of clusters lead to a trade-off between the number of detected burst events and false alarms. The determination of these parameters should take into account the preference of water utilities for high burst detection ability or fewer false alarms in engineering application.

- (3)
The detectability of the proposed method varies with different start times and flow intervals of bursts due to the variation of the threshold vector at different time points of a day. The minimum flow of detected burst events is determined for different burst start times.

In future work, DMAs with more complicated flow patterns can be collected to test the detection performance. The flow pattern library at each time point could be adaptively updated to further improve the detection performance after some false alarms are raised.

## ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (52079016, 52122901).

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.