Pipe bursts are a major cause of water loss in water distribution systems. This study proposes a real-time burst detection method that combines multiple data features over multiple time steps. The method sets burst thresholds in three dimensions according to different moments at a specific monitoring point, and achieves burst identification based on a classification model. First, three data features of historical pressure, namely, absolute pressure value, predicted deviation value, and pressure variation value, are scored at each time step based on the Western Electric Company rules. The scores represent different degrees of abnormality. Then, the scores corresponding to the three features are used as input of the decision tree classification model. The trained model is used for detecting burst events. Results show that this method achieves 99.56% detection accuracy, indicating that it is effective for burst detection. The proposed method outperformed the single-feature-based method and provides good results in water distribution systems.

  • A data-driven real-time burst detection method using three different pressure data features, namely, absolute pressure value, predicted deviation value, and pressure variation value.

  • Three stages, namely setting the abnormal recognition thresholds in each feature at each moment separately, scoring real-time pressure feature data, and finally training the decision tree model to form an effective burst recognition model.

Graphical Abstract


Water distribution systems (WDS), the lifeblood of cities, play a vital role in the process of delivering clean drinking water from water plants to users with sufficient pressure and safe quantity (Gong et al. 2014; Ostfeld et al. 2014; Yan et al. 2019). IWA/AWWA standard water balance methodology defines pipe bursts as leakage of high flow rate but short duration (AWWA 2009). Pipe bursts mainly cause water loss and a series of far-reaching secondary hazards (Bakker et al. 2012), such as intrusion of pollutants into the pipe network, road surface collapse, and traffic delay (Berardi et al. 2008; Fox et al. 2016; Qi et al. 2018b). Burst duration depends on awareness, location, and repair periods (Mounce & Boxall 2010; Bakker et al. 2012). Therefore, timely and efficient detection of pipe burst accidents indirectly helps water utilities shorten the burst period and reduce the volume of water lost.

With the wide application of supervisory control and data acquisition (SCADA) systems, data-driven approaches are used in problems to improve the water supply safety of WDS (Soldevila et al. 2016; Sun et al. 2020; Li et al. 2021; Nam et al. 2021). This study focuses on pipe burst detection as the main research content. Real-time burst detection using pressure/flow data from hydraulic monitoring sensors is currently possible (Wu & Liu 2020). Generally, pipe burst leads to excessive deviation from other observations. In other words, continuous sudden changes in measurements may represent the occurrence of abnormal events. Mining useful information from a large number of data and analyzing features of monitoring data are effective for event detection (Romano et al. 2014).

In the context of burst detection, data-driven approaches have been widely studied and are proven effective in real-life burst detection. These methods are classified into three categories based on various techniques, namely, classification method, prediction–classification method, and statistical method (Perelman & Ostfeld 2013; Wu & Liu 2017). The classification method acquires experience from the historical data and then judges whether or not the input data are abnormal (Mounce & Machell 2006; Aksela et al. 2009). The prediction–classification method uses the normal hydraulic data to predict the data of the following time period and implement classification by evaluating the difference between predicted and actual observed values. The statistical method identifies a suspected change in the process output by the utilization of control charts (Palau et al. 2012; Jung & Lansey 2015; Jung et al. 2015; Loureiro et al. 2016). The statistical methods are relatively simple in computation and have become one of the most widely used methods for burst detection (Palau et al. 2012). The data-driven methods are considered to be promising tools to detect pipe bursts.

Three strategies are widely adopted to boost the performance of burst detection methods. The first strategy tries to make full use of the combination of multiple data features to detect the pipe burst (Mounce et al. 2010; Bakker et al. 2012; Bakker et al. 2013; Ye & Fenner 2014; Wu & Liu 2017; Wang et al. 2020). In the water distribution network, a pipe burst can typically trigger responses on multiple sensors. The combined use of multiple hydraulic data features from sensors in the time series would be better than a single feature. This strategy can strengthen the difference between hydraulic monitoring data under normal and abnormal conditions and thus obtain good performance for burst detection. The second strategy is to set different burst threshold values at different moments of a specific monitoring point. The pressure data fluctuate from day to day and, at the same time on different days, stay within a certain range, which leads to multiple thresholds at different times of the day (Wang et al. 2020). The third strategy aims to use the information from multiple continuous time steps to identify burst events. Previous research identified burst occurrence based on a single time step (Wang et al. 2020). However, a pipe burst is an accumulated water loss process, as it usually persists for some time before it can be repaired. This indicates that the event can be identified by analyzing the hydraulic data from continuous time steps (Aggarwal 2016; Huang et al. 2018; Wang et al. 2020). An approach that simultaneously fuses multiple pressure data features at continuous time steps has not been well explored.

To fill these knowledge gaps, this paper focuses on the combined use of multiple data features at multiple continuous time steps to identify burst events on the basis of the prediction–classification concept. Three effective indicators, namely the residual between the predicted data and observed data, absolute pressure, and pressure variation value, are used to characterize the burst event. The developed method comprises three stages: setting the abnormal recognition thresholds in each feature at each moment separately, evaluating real-time pressure feature data through Western Electric Company (WEC) rules (Hagos et al. 2016), and training the decision tree model to form an effective burst recognition model. The developed method is applied to a realistic case to verify the effectiveness and the conclusions are summarized at the end of the article.

In accordance with the hydraulic model of a WDS, data-driven techniques extract features from monitoring data in pressure patterns when burst accidents occur in the distribution system. This approach is adopted because the pressure data may drop significantly and last for a period of time in the event of a pipe burst (Wu & Liu 2017). The occurrence of a pipe burst in the water distribution system can be detected by matching the extracted features with real-time monitoring data. The structure of the proposed method for detecting pipe bursts is shown in Figure 1.

Figure 1

Burst detection flowchart.


Figure 1 shows the process of pipe burst identification. Three features, including the deviation value of predicted pressure from the actual pressure, absolute pressure, and pressure variation value, are extracted from the collected monitoring data. In addition, in accordance with a series of previous measurements, the WEC rules are used to rate the three features, as described in the ‘Scoring evaluation’ section. The specific process of obtaining three-feature extraction is described in detail in the ‘Burst feature extraction’ section. Finally, the decision tree is used as the binary classification model to predict the sample and implement classification.

Burst feature extraction

The data at a monitoring point change over the course of the day. In detail, a regular pattern is observed, and in the absence of a burst in the WDS the changes are nonrandom fluctuations (Huang et al. 2018). To strengthen the distinction between normal data and abnormal data, one of the innovations of this article is to detect pipe bursts in three dimensions. The three features are as follows.

Absolute pressure values

When burst occurs in a WDS, it causes a dramatic change in pressure at some monitoring points as shown in Figure 2(a), out of the range of normal fluctuations (Ma et al. 2016). Therefore, the absolute pressure values can be used to judge whether or not a burst has occurred.

Figure 2

Schematic diagrams of three features under normal conditions and when a pipe burst event occurs: (a) absolute pressure value; (b) pressure variation value; (c) deviation value of pressure prediction.


Pressure variation value

The pressure variation value refers to the difference between the pressure value at the current moment and the previous moment (Misiunas et al. 2005; Bakker et al. 2014; Li et al. 2015). It reflects the degree and trend of pressure change. The pressure variation value at every specific time under normal working conditions is in accordance with the normal distribution. In the case of a pipe burst, the pressure change value at a certain moment far exceeds the change value under normal conditions as shown in Figure 2(b). Hence, this feature can be used to distinguish between normal and burst events.

Deviation value of pressure prediction

The deviation value of pressure prediction is the deviation between the predicted pressure value and the actual monitored data. Bursts can be identified by classifying normal and abnormal conditions by the pressure prediction deviation (Ye & Fenner 2011; Wang et al. 2020). On the premise that the accuracy of the prediction model satisfies the requirements, the deviation between the predicted value and the measured value can provide strong evidence for the detection of pipe burst accidents as shown in Figure 2(c). In such cases, sudden changes in the underlying data records can be considered anomalous events. The computation of this feature relies on an efficient pressure time-series prediction model, which will be discussed in the following section.
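The three features described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_features` is hypothetical, and the sign convention for the deviation (observed minus predicted) is an assumption inferred from the negative deviations reported when pressure drops.

```python
from typing import List, Sequence, Tuple


def extract_features(
    observed: Sequence[float], predicted: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (absolute pressure, pressure variation, prediction deviation) per step.

    The first sample is skipped because it has no previous value to difference.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    features = []
    for t in range(1, len(observed)):
        absolute = observed[t]                     # absolute pressure value
        variation = observed[t] - observed[t - 1]  # current minus previous moment
        deviation = observed[t] - predicted[t]     # assumed sign: observed - predicted
        features.append((absolute, variation, deviation))
    return features
```

Each returned tuple corresponds to one time step and is the raw input to the scoring stage.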

Figure 3

Monitoring network of water supply system: (a) schematic of G city's water supply network; (b) schematic diagram of the distribution of pressure monitoring points in the research area.


Scoring evaluation

The WEC rules are decision rules in statistical process control for detecting out-of-control or nonrandom conditions in measured data on the Shewhart control chart. The WEC rules are easy to implement without parameters to be estimated (Hagos et al. 2016). The studies confirmed that the WEC had a good effect in real-time burst detection (Huang et al. 2018) and outperformed multivariate statistical methods with respect to detection probability (AWWA 2009; Jung et al. 2015). The rules attempt to distinguish unnatural patterns from natural patterns based on the following criteria:

Rule 1. Any single measurement is beyond the ±3σ-limit from the centerline.

Rule 2. Two out of three consecutive measurements are beyond the ±2σ-limit on the same side of the centerline.

Rule 3. Four out of five consecutive measurements are beyond the ±1σ-limit on the same side of the centerline.

Rule 4. Eight consecutive measurements are on the same side of the centerline.

The term ±nσ-limit refers to the limits lying n standard deviations σ above and below the centerline, which is the mean μ. Each rule counts only measurements that fall on the same side of the centerline. The WEC rules consider, at most, the eight most recent measurements, so a series of previous measurements can inform the decision. In addition, a time-varying threshold can be used to detect abnormal patterns; thus, μ and σ are estimated separately for each time point.

This paper uses different residual thresholds at different times. The measured pressure data, recorded every 5 min, form the original time series. The mean and standard deviation of the absolute pressure and the pressure variation value at each time point (288 moments per day) are calculated from the historical pressure time series. In addition, the fitting residuals from the training of the support vector regression (SVR) prediction model are used as samples for statistical analysis, and their mean and standard deviation are calculated.
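A minimal sketch of the per-moment statistics is given below. It assumes the series starts at 00:00, contains only complete days, and has no gaps; the function name `slot_statistics` and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

SLOTS_PER_DAY = 288  # 24 h sampled every 5 min


def slot_statistics(
    series: Sequence[float], slots_per_day: int = SLOTS_PER_DAY
) -> List[Tuple[float, float]]:
    """Return (mean, sample standard deviation) for each time-of-day slot."""
    n_days = len(series) // slots_per_day
    if n_days < 2:
        raise ValueError("need at least two full days of data")
    stats = []
    for slot in range(slots_per_day):
        # Collect the value observed at this time of day on every historical day.
        values = [series[day * slots_per_day + slot] for day in range(n_days)]
        stats.append((mean(values), stdev(values)))
    return stats
```

The same routine can be applied to the absolute pressure series and to the pressure variation series to obtain a separate centerline and σ for each of the 288 moments.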

This study evaluates the pressure data at each time step by using the rules and takes the number of satisfied criteria as a score. The three features of the pressure data are each scored by the rules, so three scores are generated at each time step. If no criterion is satisfied, then the score is 0. If all criteria are satisfied, then the score is 4. The score therefore quantifies the level of anomaly: a higher score indicates a higher probability that a pipe burst has occurred in this dimension. Compared with a criterion based on a single-time threshold, the WEC rules can effectively use multi-time data to provide a basis for the judgement at the current time.
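The scoring step can be sketched as below. The code assumes the standard Western Electric limits (3σ, 2σ, 1σ, and the centerline itself), exposed through the `limits` parameter so other limits can be substituted; it also assumes Rule 1 is checked on the current measurement only. The names `wec_score` and `standardize` are hypothetical.

```python
from typing import List, Sequence


def standardize(values: Sequence[float], means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Convert measurements to z-values using the mean and std of their own time slot."""
    return [(x - m) / s for x, m, s in zip(values, means, stds)]


def _same_side_count(z: Sequence[float], limit: float) -> int:
    """Largest number of values lying beyond +limit or beyond -limit."""
    above = sum(1 for v in z if v > limit)
    below = sum(1 for v in z if v < -limit)
    return max(above, below)


def wec_score(z: Sequence[float], limits=(3.0, 2.0, 1.0, 0.0)) -> int:
    """Count how many of the four rules are satisfied at the current time step.

    z holds standardized values, oldest first; z[-1] is the current step.
    """
    l1, l2, l3, l4 = limits
    score = 0
    if len(z) >= 1 and abs(z[-1]) > l1:                        # Rule 1
        score += 1
    if len(z) >= 3 and _same_side_count(z[-3:], l2) >= 2:      # Rule 2
        score += 1
    if len(z) >= 5 and _same_side_count(z[-5:], l3) >= 4:      # Rule 3
        score += 1
    if len(z) >= 8 and _same_side_count(z[-8:], l4) == 8:      # Rule 4
        score += 1
    return score
```

Standardizing each measurement with the μ and σ of its own time slot is what lets a single set of limits act as a time-varying threshold.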

Decision tree to realize the pipe burst detection

The scores of the three features are obtained as described in the previous sections. A decision tree trained on these scores is adopted for pipe burst detection. The decision tree is a nonparametric supervised machine-learning method (Song & Lu 2015). The input of this model is the scoring result calculated in the ‘Scoring evaluation’ section, and the output is whether or not a pipe burst has occurred.

The Iterative Dichotomiser 3 (ID3) algorithm is a basic greedy algorithm for data classification (Khedr et al. 2016). This paper uses a decision tree model based on the ID3 algorithm to identify the accident (Cios & Liu 1992). The core of the ID3 algorithm is to apply the information gain criterion to select a feature at each node of the decision tree and to construct the tree recursively. Starting from the root node, the method calculates the information gain of all candidate features for the node, selects the feature with the largest information gain as the feature of the node, and creates child nodes from the different values of that feature. The same procedure is then applied recursively to each child node until the information gain of all features is very small or no feature remains to be selected; finally, a decision tree is obtained.
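The feature-selection step of ID3 can be illustrated with a short sketch. It computes the Shannon entropy and information gain for discrete score features; the function names and the toy data in the usage note are illustrative assumptions, not taken from the study.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting on every distinct feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows: Sequence[Tuple], labels: Sequence[Hashable]) -> Tuple[int, List[float]]:
    """Return the index of the feature with the largest gain, and all gains."""
    n_features = len(rows[0])
    gains = [information_gain([r[i] for r in rows], labels) for i in range(n_features)]
    return max(range(n_features), key=gains.__getitem__), gains
```

For example, calling `best_feature([(0, 0, 0), (0, 2, 0), (2, 2, 1), (2, 0, 0)], [0, 0, 1, 1])` selects the first feature, because splitting on it separates the two classes completely.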

Since this study uses the abnormality of three features to distinguish between pipe bursts and normal events in multiple dimensions, it needs to rely on a decision tree classification model. The decision tree model is easy to implement, efficient and has low time complexity.

Time-series prediction model

As mentioned previously, the deviation value of pressure prediction is a critical feature for pipe burst detection. The computation of this feature relies on an efficient pressure time-series prediction model. In this paper, the support vector machine (SVM) model is adopted. SVM is a basic supervised learning model in machine learning (Mounce & Boxall 2010). For time-series prediction, its regression form, SVR, is used to predict the next value from previous observations (Kim 2003).

The performance of the SVR model is largely determined by the kernel function. The kernel function determines the mapping of the samples to the high-dimensional space and embodies the core idea of SVR. In this study, the radial basis function (RBF) kernel is selected, with two adjustable parameters, namely gamma and cost. The parameter gamma, which is always positive, controls how far the influence of a single training sample reaches: the larger the gamma, the shorter that reach and, typically, the larger the number of support vectors. The cost parameter represents the penalty for errors. The larger the value, the greater the penalty for error, and the smaller the fitting error on the training set. A grid search over gamma and cost exhaustively evaluates all parameter combinations within a limited range; five different gammas (from to ) and seven cost parameters (from to ) are considered. The accuracy estimation process uses ten-fold cross-validation. In accordance with the accuracy evaluation result, the parameter combination corresponding to the model with the smallest error is selected for the final model (Mounce et al. 2011).
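For reference, the standard forms of the RBF kernel and of the ε-insensitive SVR objective show where gamma and cost enter. These are textbook definitions rather than expressions given in this paper; the symbols γ (gamma), C (cost), ε, ξ_i, ξ_i^*, w, b and φ follow the usual notation and are assumptions here.

```latex
K(\mathbf{x}_i,\mathbf{x}_j)
  = \exp\!\left(-\gamma\,\lVert \mathbf{x}_i-\mathbf{x}_j\rVert^{2}\right),
  \qquad \gamma > 0

\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}}\;
  \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + C\sum_{i=1}^{n}\left(\xi_i+\xi_i^{*}\right)
\quad\text{s.t.}\quad
\begin{cases}
  y_i-\mathbf{w}^{\top}\phi(\mathbf{x}_i)-b \le \varepsilon+\xi_i,\\[2pt]
  \mathbf{w}^{\top}\phi(\mathbf{x}_i)+b-y_i \le \varepsilon+\xi_i^{*},\\[2pt]
  \xi_i,\ \xi_i^{*}\ge 0 .
\end{cases}
```

Because the kernel value decays with squared distance at rate γ, a larger gamma makes each sample's influence more local, while a larger C weights the slack terms more heavily and so fits the training set more tightly.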

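The precision of the prediction model influences the predicted deviation value, further affecting this burst detection method. The mean absolute percentage error (MAPE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination ($R^2$) are the statistical metrics used to evaluate the prediction accuracy:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{1}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2} \tag{2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{3}$$
$$R^{2} = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}} \tag{4}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the average of the observed values, and $n$ is the number of samples. A good prediction model has small values of the MAPE, MSE, and MAE and an $R^2$ close to 1.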
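The four prediction metrics can be computed with a few lines of code. The sketch below uses their standard definitions, which are assumed to match those intended in Equations (1)–(4); the function name `prediction_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def prediction_metrics(observed: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAPE (in %), MSE, MAE and R2 using their standard definitions."""
    n = len(observed)
    errors = [y - p for y, p in zip(observed, predicted)]
    mape = 100.0 / n * sum(abs(e / y) for e, y in zip(errors, observed))
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    y_bar = sum(observed) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in observed)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "MSE": mse, "MAE": mae, "R2": r2}
```

Note that MAPE is undefined when an observed value is zero, which is not a concern for positive pressure readings.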
When a burst detection method is used in real WDSs, water companies focus on detection accuracy (DA). For a binary classifier, accuracy is composed of two parts, namely, sensitivity and specificity. The sensitivity indicates the ability to detect actual bursts, and specificity is the ability to accurately exclude normal data from being classified as bursts (Song & Lu 2015; Wu & Liu 2017). Evidently, sensitivity has a negative correlation with specificity; that is, a higher sensitivity with more bursts detected has a greater number of false alarms (i.e., lower specificity). To quantify the two aspects of accuracy, true positive rate (TPR) and false positive rate (FPR) are calculated, as follows:
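$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \tag{5}$$
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}} \tag{6}$$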
where true positive (TP) is the detection of an actual burst; false positive (FP) refers to an inaccurately classified burst; true negative (TN) refers to an accurately identified normal situation; and false negative (FN) is the situation in which an actual burst is undetected. In addition, detection accuracy (DA) is introduced to evaluate the effectiveness of this method:
$$\mathrm{DA} = \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} \tag{7}$$

A good detection method must maintain a small FPR even at the expense of reducing TPR within an acceptable range, considering that bursts occur rarely in a WDS (Wu et al. 2016).
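The three detection metrics follow directly from the confusion-matrix counts. The sketch below assumes the usual definitions of TPR, FPR, and accuracy; the function name and the counts in the usage note are purely illustrative.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return TPR, FPR and detection accuracy (DA) as fractions in [0, 1]."""
    tpr = tp / (tp + fn)                   # share of actual bursts that were detected
    fpr = fp / (fp + tn)                   # share of normal moments flagged as bursts
    da = (tp + tn) / (tp + fp + tn + fn)   # share of all moments classified correctly
    return {"TPR": tpr, "FPR": fpr, "DA": da}
```

With made-up counts such as `detection_metrics(tp=90, fp=5, tn=995, fn=10)`, the DA is dominated by the many normal moments, which is why TPR and FPR are reported alongside it.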

This study used the three metrics to evaluate the effectiveness and accuracy of this burst detection approach. TP rate (TPR), FP rate (FPR), and DA assessed the effectiveness of the classification method.

Description

The developed method is applied to a realistic network (where the burst data are real and not synthetic) with field data. The network of G city is located in the south of China, Guangdong province. The city has established the supervisory control and data acquisition (SCADA) system, collecting pressure data of the pipe network and analyzing the real-time water supply status.

Figure 3(a) and 3(b) respectively show a schematic of G city's entire water distribution system and pressure monitoring points within the research area. In this area, 13 pressure meters upload the measurements every five minutes. The proposed method was tested on one pressure meter within its detection range, for 340 days from January to December in 2018. It is impossible for all pressure meters to detect the occurrence of pipe bursts at the same time and every pressure meter has its monitoring range (Qi et al. 2018a).

The two types of data set are normal data without burst events and detection data including both normal events and burst events. Part of the normal data is used for feature extraction to obtain the thresholds at each moment, and the remaining data are used to build a decision tree model for burst detection. There is no significant difference in water consumption between holidays and working days in this area, so this difference is not considered when the historical pressure statistics are calculated. The proposed data-driven burst detection approach has been tested, verified, and illustrated on real-life pipe burst cases.

Feature extraction and scoring evaluation

Burst detection is achieved by multiple features of the pressure data instead of only one feature. In the next sections, the actual pressure data of a pressure monitoring point are used for detailed analysis. In addition, the results (TPR and FPR) of burst detection in the real case are presented.

Deviation value of pressure prediction

The actual monitoring data are used in the G network as the data set to construct the SVR model. The trained model is used to predict the pressure data at five-minute time steps throughout the day. Table 1 shows the part of the model input data and pressure at the moment to be predicted.

Table 1

Part of the model input data of SVR model

| Time | March 1 (MPa) | Previous time (MPa) | February 28 (previous day) (MPa) | February 22 (previous seven days) (MPa) |
|---|---|---|---|---|
| 00:00 | 0.1839 | 0.1909 | 0.1539 | 0.1767 |
| 00:05 | 0.1597 | 0.1839 | 0.1250 | 0.1688 |
| 00:10 | 0.1437 | 0.1597 | 0.1258 | 0.1561 |
| 00:15 | 0.1450 | 0.1437 | 0.1221 | 0.1538 |
| 00:20 | 0.1456 | 0.1450 | 0.1200 | 0.1471 |
| 00:25 | 0.1456 | 0.1456 | 0.1052 | 0.1507 |
| 00:30 | 0.1450 | 0.1456 | 0.0971 | 0.1533 |
| 00:35 | 0.1414 | 0.1450 | 0.0969 | 0.1583 |
| 00:40 | 0.1453 | 0.1414 | 0.1000 | 0.1579 |
| 00:45 | 0.1456 | 0.1453 | 0.1042 | 0.1584 |

The measured values of the previous time, the corresponding pressure data at the same time of the previous day, and the previous seven days are used as input to the model, and the pressure data at the time point T of prediction are used as the model output (Wang et al. 2020), considering the trend and periodicity of historical data. For time-series data, comparing the adjacent data in the same period of time is necessary. Thus, the value at time T has a strong dependence on a few moments prior to time point T (Table 2).

Table 2

Prediction and inputs of the SVR model

| Prediction | Model input |
|---|---|
| Pressure data at time point T | T − 5 min, T − (24 h) × 1, T − (24 h) × 7 |

Specifically, if the pressure data at 0:00 on March 1 are to be predicted, then the model inputs are the measurements at 23:55 on February 28, 0:00 on February 28, and 0:00 on February 22. Table 1 shows part of the model input data of the SVR model. The optimal parameters of the SVR model are searched within a certain range. The selected cost is 1, gamma is 0.25, and the number of support vectors is 1,205. The trained model is used to predict the data from 0:00 to 23:55 on March 1 under normal conditions. The pressure curve is not smooth and fluctuates within a certain range because of the complicated conditions under which the data are collected in the actual case. The comparison of the one-day forecast with the actual measurements is shown in Figure 4.
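Assembling the model inputs can be sketched as below. The code assumes that the "previous time" is the preceding 5-minute sample, as the values in Table 1 suggest, and that the series has no gaps; the function name `build_samples` is hypothetical.

```python
from typing import List, Sequence, Tuple

SLOTS_PER_DAY = 288  # 5-minute samples per day


def build_samples(
    series: Sequence[float], slots_per_day: int = SLOTS_PER_DAY
) -> List[Tuple[List[float], float]]:
    """Return (inputs, target) pairs for every step that has a full week of history.

    inputs = [value one step earlier, value one day earlier, value seven days earlier]
    """
    week = 7 * slots_per_day
    samples = []
    for t in range(week, len(series)):
        inputs = [series[t - 1], series[t - slots_per_day], series[t - week]]
        samples.append((inputs, series[t]))
    return samples
```

The resulting pairs can be passed to any regression model; the first usable target is the sample taken exactly seven days after the start of the series.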

Figure 4

Prediction result and measured data on January 19 (normal condition).


Pressure prediction is the basis for providing the characteristics of the prediction deviation required by the method (Wang et al. 2020). Table 3 shows that the accuracy of the prediction result is reliable. According to the results of SVR, the mean value and the standard deviation of the pressure prediction deviation from the normal historical data are , respectively.

Table 3

Performance of the models on testing data

| Model | MAPE (%) | MSE | MAE | R² (%) |
|---|---|---|---|---|
| SVR | 3.91 | 0.006967 | 0.00528 | 93.236 |

Absolute pressure and pressure variation value

The burst threshold is set at each time point for each feature. Specifically, the mean value and standard deviation of each feature at each moment are calculated from actual historical pressure measurements over a period of time. Figures 5(a) and 6(a) separately plot the mean (centerline) of the two features and the limits that lie multiple standard deviations on either side of the mean at each moment of a day (288 time points). Figures 5(b) and 6(b) show the density distributions at different times, which are the basis for determining the multi-time threshold. The peak in Figure 5(a) indicates that the variance of the pressure variation data at approximately 14:00 is much greater than at other times; that is, the pressure variation data are relatively dispersed at that time.

Figure 5

(a) Representative Shewhart control charts (24 h) of pressure variation value, and (b) distribution at different times (0:00, 6:00, 12:00, and 18:00).


Figure 6

(a) Representative Shewhart control charts (24 h) of absolute pressure value, and (b) distribution at different times (0:00, 6:00, 12:00, and 18:00).


Scoring evaluation results for three features

Based on the WEC rules, the three features of the pressure data at each moment are evaluated by scoring. The pressure at a normal time (January 19, 9:05) and at an abnormal time (January 20, 9:05) is considered, and the values of the three features at that time and the preceding continuous time steps are shown in Table 4. Figure 7 shows the absolute pressure curve and the pressure variation curve over a period of time. Figure 7(b) shows that the absolute pressure value started to decrease significantly at 8:30, but the pipe burst started at 9:05. Therefore, a single feature cannot effectively detect a burst. This study combines multiple features over multiple time steps to analyze burst events.

Table 4

(a) Data feature at 09:05 on January 19 and previous continuous time steps; (b) data feature at 09:05 on January 20 and previous continuous time steps

| (a) Time (January 19) | Pressure measurement | Deviation value of pressure prediction | Pressure variation |
|---|---|---|---|
| 08:30 | 0.1179 | −0.001495 | −0.001477 |
| 08:35 | 0.1135 | −0.005044 | −0.004359 |
| 08:40 | 0.1141 | −0.001148 | 0.0005934 |
| 08:45 | 0.1111 | −0.004570 | −0.003000 |
| 08:50 | 0.1168 | 0.003538 | 0.005695 |
| 08:55 | 0.1104 | −0.007436 | −0.006453 |
| 09:00 | 0.1108 | −0.002292 | 0.0004838 |
| 09:05 | 0.1126 | −0.001352 | 0.001797 |

| (b) Time (January 20) | Pressure measurement | Deviation value of pressure prediction | Pressure variation |
|---|---|---|---|
| 08:30 | 0.1028 | −0.01267 | −0.01050 |
| 08:35 | 0.09371 | −0.001266 | −0.009110 |
| 08:40 | 0.09926 | −0.003484 | 0.005547 |
| 08:45 | 0.09198 | −0.01189 | −0.007282 |
| 08:50 | 0.08824 | −0.01273 | −0.003734 |
| 08:55 | 0.08449 | −0.01601 | −0.003750 |
| 09:00 | 0.08127 | −0.01957 | −0.003219 |
| 09:05 | 0.07648 | −0.02567 | −0.004789 |

Figure 7

Curve of two statistical features in the normal and abnormal time: (a) pressure change between 7:30 and 10:30 on January 19; (b) absolute pressure between 7:30 and 10:30 on January 20.


For normal moments (January 19, 9:05), the three features cannot satisfy any evaluation criterion. Therefore, the evaluation score of the three features in this study is 0. With regard to the abnormal moment, the absolute pressure satisfies the third and fourth criteria. Moreover, the deviation value of pressure prediction satisfies the second and fourth criteria, but the pressure variation value does not satisfy any criteria. Therefore, the evaluation scores of the pressure value, the predicted residual value, and the pressure change value at this time are 2, 2, and 0, respectively. It can be seen that the evaluation score of abnormal time is higher than that of normal time, which is also the basis for classification in this study.

The score evaluation results of the three pressure data features at a single time step are used as sample inputs, and the binary classification result at that time step is used as the output to train a decision tree model. A total of 70% of the samples are used as the training data set, and 30% are used as the validation data set. Figure 8 shows the decision tree model of burst detection trained with the scores of the three features. The trained model is used to determine whether there is a pipe burst event in the water distribution system by using the real-time score evaluation results for the three pressure data features as model input.

Figure 8

Decision tree model for burst detection.


Taking a burst event as an example, Figure 9 shows the difference in pressure variation under normal conditions and in the case of a pipe burst event. Figures 9(a) and 9(c) show the daily normal pressure change, and Figure 9(b) shows the pipe burst accident that occurred from 9:00 to 18:00. When a pipe burst occurs, the actual pressure data and simulated pressure data exceed the normal fluctuation range, and the pressure at the monitoring points drops sharply. To explain the model, the normal moment (January 19, 9:05) and the abnormal moment (January 20, 9:05) shown in Figure 7 are used as examples. The predicted deviation value score is 0 on January 19, 9:05; thus, it is judged to be less than 1 at the first-level tree node, and the sample is assigned to the left subset. At the second-level tree node, the absolute pressure score is 0, which is less than 2; thus, the sample is again assigned to the left subset. At the end node of the classification model, the data at this moment are therefore judged to be normal. In contrast, on January 20, 9:05, the predicted deviation value score is 2, which is greater than 1. Therefore, at the first-level tree node, the sample is assigned to the right subset. At the second-level tree node, the absolute pressure score is 2, and the sample is assigned to the right subset. Finally, the judgment at the end tree node is that a pipe burst occurs at this moment.

Figure 9

Curve of pressure change on three days: (a) pressure change on January 19 without burst; (b) pressure change on January 20 including pipe burst events; (c) pressure change on January 21 without burst.

Figure 10 summarizes the TPR and FPR in an actual case. Using the three features together as input was more effective than using any single feature and gave better detection performance (TPR = 96.65%, FPR = 0.35%, and DA = 99.56%). The abnormal moments that went undetected probably arise because the burst flow may be relatively small and the burst occurs on a small-diameter pipe far from the monitoring points. The single-feature methods based on the absolute pressure value (TPR = 35.65%, FPR = 0.37%), prediction deviation (TPR = 83.48%, FPR = 0.35%), and pressure variation (TPR = 4.35%, FPR = 0.33%) are all less accurate than the three-feature method.
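For readers who want to reproduce such figures from their own detection results, the following Python sketch computes the three indicators from confusion-matrix counts. It assumes the standard definitions (TPR as the share of burst moments detected, FPR as the share of normal moments flagged, DA as the share of all moments classified correctly); the function name `detection_metrics` and the counts are hypothetical and are not taken from the case study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (TPR, FPR, DA) as fractions from confusion-matrix counts."""
    tpr = tp / (tp + fn)                   # detected bursts / all burst moments
    fpr = fp / (fp + tn)                   # false alarms / all normal moments
    da = (tp + tn) / (tp + fp + tn + fn)   # correct decisions / all moments
    return tpr, fpr, da

# Hypothetical counts, chosen only to illustrate class imbalance.
tpr, fpr, da = detection_metrics(tp=5, fp=3, tn=897, fn=95)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}, DA = {da:.2%}")
```

With these hypothetical counts, DA stays above 90% although only 5% of the burst moments are caught, which suggests reading DA together with TPR and FPR whenever normal moments far outnumber burst moments.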

Figure 10

TPR and FPR of the actual burst event in the case of single feature and three features.

In this real-time detection method, the prediction deviation plays the greatest role, and the pressure variation value has only a slight effect. According to the node information of the model in this realistic case, the absolute pressure value and the predicted deviation value dominate the final classification result. By contrast, the pressure variation value score carries too little information to serve as an evident basis for the judgment. For the two moments shown in Figure 7, the pressure variations over the eight consecutive moments considered for the normal moment and for the abnormal moment are not remarkably different in Figure 7(a), and they fluctuate within μ ± 2σ. The absolute pressure values, however, differ considerably in Figure 7(b): at the abnormal moment the absolute pressure value is much lower than at the normal moment. The reason is that the pressure variation increases significantly only at the beginning of the pipe burst, when the pressure drops sharply.

This detection method focuses on consecutive moments rather than on the single time point at which the pipe burst occurs. Furthermore, it uses multiple data features for burst detection. The results show that the multi-feature method detects bursts much better than any single-feature method. The features differ in their importance for identifying pipe bursts; specifically, pressure variation is less important than the two other features. In addition, the effectiveness of the method is affected by the location of the monitoring points (Aral et al. 2010; Hagos et al. 2016). The good monitoring performance in this case is closely related to the optimized placement of the monitoring points.

This study proposes a data-driven real-time pipe burst detection method for water supply networks. Following a prediction–classification approach, an SVR model is established to predict the time series of network pressure. In the classification stage, data from multiple time steps are combined on the basis of statistical process control, and a decision tree model is used for burst identification.

  1. The results show that combining multiple features remarkably improves the detection performance. The three features in the real-time detection method are absolute pressure value, deviation value of pressure prediction, and pressure variation value. Compared with models that use each of the three features separately, the proposed method improved the detection effect to a large extent, and the detection accuracy of the multi-feature method reached 99.56%. In addition, each feature plays a different role: prediction deviation is the most important feature, and pressure variation contributes the least.

  2. In the stage of scoring data features based on statistical control rules, the data features of multiple time points are combined. Burst detection can thus be treated as a continuous identification of outliers. To determine whether a pipe burst occurs at the current moment, the data features of the current moment are considered together with those of several preceding consecutive moments. The scores of the data at these eight consecutive moments indicate, to a certain extent, the probability of a pipe burst.

  3. When identifying a pipe burst event, different burst thresholds are selected at different times. This is more accurate than applying the same threshold at all times because time-specific thresholds follow the actual pressure data more closely.

  4. The method in this study achieved timely burst detection (within 15 min) using online data. The method was applied to actual pressure data from a pipe network, and desirable identification results were obtained.
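Points 2 and 3 above can be sketched together in Python: control limits are estimated separately for each time of day, and the last eight consecutive values are then checked against the four classic Western Electric zone rules. This is a sketch under stated assumptions, not the scoring scheme of this study: the function names, the 5-min time keys, the pressure values, and the use of the number of violated rules as a stand-in score are all hypothetical.

```python
from statistics import mean, stdev

def time_of_day_limits(history):
    """Map each time-of-day key to (mu, sigma) of one feature's past values."""
    return {t: (mean(values), stdev(values)) for t, values in history.items()}

def z_scores(window, limits):
    """Standardise (time_key, value) pairs with the limits of their own time of day."""
    return [(value - limits[t][0]) / limits[t][1] for t, value in window]

def western_electric_violations(z):
    """Count how many of the four classic Western Electric rules are violated."""
    def same_side(points, k, limit):
        # At least k points beyond +limit, or at least k points beyond -limit.
        return (sum(p > limit for p in points) >= k
                or sum(p < -limit for p in points) >= k)
    rules = [
        same_side(z[-1:], 1, 3.0),                   # 1 point beyond 3 sigma
        same_side(z[-3:], 2, 2.0),                   # 2 of 3 beyond 2 sigma
        same_side(z[-5:], 4, 1.0),                   # 4 of 5 beyond 1 sigma
        len(z) >= 8 and same_side(z[-8:], 8, 0.0),   # 8 in a row on one side
    ]
    return sum(rules)

# Hypothetical history: three past days per time step (mu = 30, sigma = 1).
times = ["08:30", "08:35", "08:40", "08:45", "08:50", "08:55", "09:00", "09:05"]
history = {t: [29.0, 30.0, 31.0] for t in times}
limits = time_of_day_limits(history)

# Hypothetical pressure values at eight consecutive moments.
normal_day = [30.2, 29.8, 30.1, 29.9, 30.2, 29.8, 30.1, 29.9]
burst_day = [30.1, 29.9, 30.0, 28.5, 27.5, 26.8, 26.5, 26.4]

normal_score = western_electric_violations(z_scores(list(zip(times, normal_day)), limits))
burst_score = western_electric_violations(z_scores(list(zip(times, burst_day)), limits))
print(normal_score, burst_score)
```

Because each value is standardised with the mean and standard deviation of its own time of day, a pressure that is ordinary at one hour can still be flagged at another, which is the idea behind the time-specific thresholds in point 3.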

Future work should focus on the connection between different sensors. Subsequent research could improve burst detection by using information from multiple meters.

The present research is funded by the National Science and Technology Major Projects for Water Pollution Control and Treatment (2017ZX07201003), the National Key Research and Development Program of China (No. 2016YFC0400601), and the National Natural Science Foundation of China (No. 51761145022).

All relevant data are available from an online repository or repositories: https://github.com/zhangxiangqiu/detection.

Aggarwal, C. C. 2016 Outlier Analysis, 2nd edn. Springer, Cham, Switzerland.
Aksela, K., Aksela, M. & Vahala, R. 2009 Leakage detection in a real distribution network using a SOM. Urban Water Journal 6 (4), 279–289.
Aral, M. M., Guan, J. B. & Maslia, M. L. 2010 Optimal design of sensor placement in water distribution networks. Journal of Water Resources Planning and Management 136 (1), 5–18.
AWWA 2009 Water Audits and Loss Control Programs, 3rd edn. American Water Works Association, Denver, CO, USA.
Bakker, M., Vreeburg, J. H. G., Rietveld, L. C. & van de Roer, M. 2012 Reducing customer minutes lost by anomaly detection? In: 14th Water Distribution Systems Analysis Conference, 24–27 September, Adelaide, South Australia, Australia.
Bakker, M., Vreeburg, J. H. G., van Schagen, K. M. & Rietveld, L. C. 2013 A fully adaptive forecasting model for short-term drinking water demand. Environmental Modelling & Software 48, 141–151.
Bakker, M., Trietsch, E. A., Vreeburg, J. H. G. & Rietveld, L. C. 2014 Analysis of historic bursts and burst detection in water supply areas of different size. Water Science and Technology - Water Supply 14 (6), 1035–1044.
Berardi, L., Giustolisi, O., Kapelan, Z. & Savic, D. A. 2008 Development of pipe deterioration models for water distribution systems using EPR. Journal of Hydraulic Engineering 10 (2), 113–126.
Fox, S., Shepherd, W., Collins, R. & Boxall, J. 2016 Experimental quantification of contaminant ingress into a buried leaking pipe during transient events. Journal of Hydraulic Engineering 142 (1), 04015036.
Gong, J. Z., Zecchin, A. C., Simpson, A. R. & Lambert, M. F. 2014 Frequency response diagram for pipeline leak detection: comparing the odd and even harmonics. Journal of Water Resources Planning and Management 140 (1), 65–74.
Hagos, M., Jung, D. & Lansey, K. E. 2016 Optimal meter placement for pipe burst detection in water distribution systems. Journal of Hydroinformatics 18 (4), 741–756.
Huang, P. J., Zhu, N. F., Hou, D. B., Chen, J. Y., Xiao, Y., Yu, J., Zhang, G. X. & Zhang, H. J. 2018 Real-time burst detection in district metering areas in water distribution system based on patterns of water demand with supervised learning. Water 10 (12), 1765.
Jung, D. H. & Lansey, K. 2015 Water distribution system burst detection using a nonlinear Kalman filter. Journal of Water Resources Planning and Management 141 (5), 04014070.
Khedr, A. E., Idrees, A. M. & El Seddawy, A. I. 2016 Enhancing Iterative Dichotomiser 3 algorithm for classification decision tree. Wiley Interdisciplinary Reviews - Data Mining and Knowledge Discovery 6 (2), 70–79.
Kim, K.-j. 2003 Financial time series forecasting using support vector machines. Neurocomputing 55 (1–2), 307–319.
Li, R., Huang, H. D., Xin, K. L. & Tao, T. 2015 A review of methods for burst/leakage detection and location in water distribution systems. Water Science and Technology - Water Supply 15 (3), 429–441.
Li, X., Chu, S., Zhang, T., Yu, T. & Shao, Y. 2021 Leakage localization using pressure sensors and spatial clustering in water distribution systems. Water Supply, ws2021219. https://doi.org/10.2166/ws.2021.219.
Loureiro, D., Amado, C., Martins, A., Vitorino, D., Mamade, A. & Coelho, S. T. 2016 Water distribution systems flow monitoring and anomalous event detection: a practical approach. Urban Water Journal 13 (3), 242–252.
Ma, J., Sun, L., Wang, H., Zhang, Y. & Aickelin, U. 2016 Supervised anomaly detection in uncertain pseudoperiodic data streams. ACM Transactions on Internet Technology 16 (1), 4.
Misiunas, D., Lambert, M., Simpson, A. & Olsson, G. 2005 Burst detection and location in water distribution networks. Water Supply 5 (3–4), 71–80.
Mounce, S. R., Boxall, J. B. & Machell, J. 2010 Development and verification of an online artificial intelligence system for detection of bursts and other abnormal flows. Journal of Water Resources Planning and Management 136 (3), 309–318.
Mounce, S. R., Mounce, R. B. & Boxall, J. B. 2011 Novelty detection for time series data analysis in water distribution systems using support vector machines. Journal of Hydroinformatics 13 (4), 672–686.
Nam, Y. W., Arai, Y., Kunizane, T. & Koizumi, A. 2021 Water leak detection based on convolutional neural network (CNN) using actual leak sounds and the hold-out method. Water Supply, ws2021109. https://doi.org/10.2166/ws.2021.109.
Ostfeld, A., Oliker, N. & Salomons, E. 2014 Multiobjective optimization for least cost design and resiliency of water distribution systems. Journal of Water Resources Planning and Management 140 (12), 04014037.
Palau, C. V., Arregui, F. J. & Carlos, M. 2012 Burst detection in water networks using principal component analysis. Journal of Water Resources Planning and Management 138 (1), 47–54.
Perelman, L. & Ostfeld, A. 2013 Bayesian networks for source intrusion detection. Journal of Water Resources Planning and Management 139 (4), 426–432.
Qi, Z. X., Zheng, F. F., Guo, D. L., Maier, H. R., Zhang, T. Q., Yu, T. C. & Shao, Y. 2018a Better understanding of the capacity of pressure sensor systems to detect pipe burst within water distribution networks. Journal of Water Resources Planning and Management 144 (7), 04018035.
Qi, Z. X., Zheng, F. F., Guo, D. L., Zhang, T. Q., Shao, Y., Yu, T. C., Zhang, K. J. & Maier, H. R. 2018b A comprehensive framework to evaluate hydraulic and water quality impacts of pipe breaks on water distribution systems. Water Resources Research 54 (10), 8174–8195.
Romano, M., Kapelan, Z. & Savić, D. A. 2014 Automated detection of pipe bursts and other events in water distribution systems. Journal of Water Resources Planning and Management 140 (4), 457–467.
Soldevila, A., Blesa, J., Tornil-Sin, S., Duviella, E., Fernandez-Canti, R. M. & Puig, V. 2016 Leak localization in water distribution networks using a mixed model-based/data-driven approach. Control Engineering Practice 55, 162–173.
Song, Y. Y. & Lu, Y. 2015 Decision tree methods: applications for classification and prediction. Shanghai Archives of Psychiatry 27 (2), 130–135.
Wang, X. T., Guo, G. C., Liu, S. M., Wu, Y. P., Xu, X. Y. & Smith, K. 2020 Burst detection in district metering areas using deep learning method. Journal of Water Resources Planning and Management 146 (6), 04020031.
Wu, Y. P. & Liu, S. M. 2020 Burst detection by analyzing shape similarity of time series subsequences in district metering areas. Journal of Water Resources Planning and Management 146 (1), 04019068.
Wu, Y. P., Liu, S. M., Wu, X., Liu, Y. F. & Guan, Y. S. 2016 Burst detection in district metering areas using a data driven clustering algorithm. Water Research 100, 28–37.
Yan, H., Wang, Q., Wang, J., Xin, K., Tao, T. & Li, S. 2019 A simple but robust convergence trajectory controlled method for pressure driven analysis in water distribution system. Science of the Total Environment 659, 983–994.
Ye, G. L. & Fenner, R. A. 2011 Kalman filtering of hydraulic measurements for burst detection in water distribution systems. Journal of Pipeline Systems Engineering and Practice 2 (1), 14–22.
Ye, G. & Fenner, R. A. 2014 Weighted least squares with expectation maximization algorithm for burst detection in UK water distribution systems. Journal of Water Resources Planning and Management 140 (4), 417–424.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).