Abstract
Real-time leakage detection based on pressure and flow data has become increasingly essential for water distribution systems (WDSs). Recent data-driven leakage detection approaches have largely focused on burst detection, characterised by sudden outflow or sudden pressure drops, and have not addressed the detection of gradual leakage events, which show no sudden change and can cause greater water loss. This study proposes an online leakage detection system based on the exponential weighted moving average (EWMA)-enhanced Tukey method to help monitor gradual leakage events in WDSs. The proposed online system comprises three main parts: data pre-processing, the online detection sub-system, and the parameter updating sub-system. It is based on lightweight and powerful statistical tools without complex model construction. The effectiveness of the proposed system is demonstrated on leakage datasets under various real-world scenarios, including gradual leakages and bursts. The results showed that the proposed EWMA-enhanced Tukey method could detect gradual leakage events quickly while generating few false alarms. The proposed method is computationally efficient and able to handle non-stationary behaviour automatically.
HIGHLIGHTS
This paper proposes a leakage detection method that focuses on gradual leakage events.
The proposed leakage detection method can adapt to the time-varying characteristics of flow monitoring data, such as demand variation caused by weather-related issues.
The proposed method is robust to data noise and can detect leakage events without generating many false alarms.
INTRODUCTION
In modern society, water is distributed within well-established water distribution networks (WDNs). From the first pipes in Crete around 3,500 years ago to today's complex pipeline systems, the water distribution system (WDS) has become an indispensable part of human life. Although the freshwater available for direct human consumption accounts for less than 1% of the Earth's water resources (Romano 2012), the demand for water resources continues to grow due to urbanisation, population growth, economic growth, etc. (Gupta & Kulat 2018). Furthermore, high financial and environmental costs are involved in infrastructure construction, power and chemical investment for water treatment, energy costs for pumping water, etc. (Lambert 2002). Therefore, it is becoming more and more important to reduce water loss in all aspects. The International Water Association (IWA) (Farley 2003) has defined water loss as the sum of real and apparent losses plus unbilled authorised consumption, and among them, leakage in pipelines was identified as the primary contributor to water loss. Furthermore, UK Water Industry Research (UKWIR) has identified leakage as one of its strategic priorities and has raised the question, ‘How will we achieve zero leakage in a sustainable way by 2050?’ Therefore, it is vital to develop methods to support this strategy and to reduce leakage as much as possible in WDSs.
In recent years, leakage detection for WDSs based on data-driven methods has received increasing attention. With the development of hydraulic sensor technology and data acquisition systems, it has become possible to monitor a WDS in real time using pressure and flow monitoring devices that are permanently installed in the pipeline system. Abundant data sources have become available to represent the complex condition of the real-world system. However, these data are usually too numerous and complicated for humans to handle (Wu & Liu 2017). Thus, data-driven methods are needed to automatically extract valuable information and detect patterns from these large datasets. Moreover, unlike model-based methods (Mohammed et al. 2021) that require a well-calibrated hydraulic model, data-driven strategies do not require specific in-depth knowledge about the WDS and are more suitable for online control.
Various data-driven methods have been studied and developed to detect burst events in WDSs and reduce the time to awareness (Wan et al. 2022). With real-time flow or pressure monitoring data available from the Supervisory Control and Data Acquisition (SCADA) system, data-driven methods can be used to mine the historical data and find a model that represents the condition of a WDS. Given this representation of the distribution system's normal behaviour, a new leakage event can be identified when the system's behaviour is substantially different from its normal state. Therefore, prediction models can be used to learn the patterns in historical flow or pressure data and provide a prediction for reference. Mounce et al. (2002, 2007) introduced a burst detection system based on an artificial neural network (ANN). In addition, the Kalman filter (KF) (Ye & Fenner 2011), nonlinear KF (Jung & Lansey 2015), support vector regression (SVR) (Mounce et al. 2011), long short-term memory (LSTM) model (Wang et al. 2020), and other prediction models (Bakker et al. 2014; Ye & Fenner 2014; Karray et al. 2016) have been explored for burst detection in WDSs.
Statistical process control (SPC) charts, with a set of control limits, provide intuitive and cost-effective tools to monitor and display the unusual behaviour of a process. Claudio et al. (2015) applied an exponential weighted moving average (EWMA) model to detect a leakage event in a DMA equipped with automated meter reading (AMR), and the awareness time is approximately 1 week after its occurrence. Borges et al. (2017) applied the Western Electric Company (WEC) rules for pipe burst detection, but the detection probability (DP) can be as low as 40%. Jung et al. (2015) compared the performance of three univariate and three multivariate SPC methods for burst detection in WDSs with consistent system operation. The result showed at least one false alarm per day, which is not promising. In their later research, Ahn & Jung (2019) proposed a hybrid approach that combined the results generated from WEC rules and the cumulative sum (CUSUM) method, but the detection time (DT) for burst events is at least 5 h. Therefore, the accuracy and applicability of the conventional SPC method are still not promising.
Traditional burst detection methods that aim to capture a sudden change in the data may not be suitable for gradual leakage detection. The task changes from detecting sudden flow increases to detecting a trend shift. The most common method used in the literature for leakage detection is the Shewhart method, which only considers the current measurement and does not retain any memory of the historical data (Kadri et al. 2016). Thus, the Shewhart method is not very effective at detecting gradual leakage events that cause small or moderate shifts in the process mean. The EWMA chart was proposed by Roberts (2000) to mitigate the shortcomings of the Shewhart chart by incorporating the information from past measurements. EWMA takes an exponentially weighted average of all prior data, so that it uses all the available information according to its recency and can sense small changes in the process mean. Therefore, EWMA is a more suitable method for monitoring small or moderate shifts in the process mean (Nam et al. 2019).
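For reference, the standard textbook form of the EWMA statistic and its control limits can be written as below. The symbols are generic chart notation rather than values from this study: \lambda is the smoothing constant, L is the limit width, and \mu_0 and \sigma are the in-control mean and standard deviation of the monitored variable x_t.

```latex
z_t = \lambda x_t + (1-\lambda)\, z_{t-1}, \qquad z_0 = \mu_0, \qquad 0 < \lambda \le 1

\text{control limits:}\quad
\mu_0 \pm L\,\sigma \sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]}
```

Setting \lambda = 1 gives z_t = x_t, which recovers the memoryless Shewhart chart; smaller values of \lambda place more weight on past measurements and narrow the limits accordingly.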
In addition, Wu & Liu (2017) provided a review of the data-driven methods for burst detection, and they pointed out that any burst detection or leakage detection method developed must be able to work in an online environment. The data come in sequence, and a decision is also required to be made in sequence. Moreover, the information of new data should be learned by the model in real time so that the model can adapt to the time-varying characteristic of data. For example, in the northern hemisphere, the consumers’ demand in summer is generally higher than in winter. Thus, the control limits set in summer should be automatically different from winter. In addition, the information that comes from leakage events should be considered separately.
To address issues mentioned above, this paper proposes an online detection framework to adapt to the time-varying condition and provides an early warning system to detect gradual leak events. The proposed online system comprises three main parts: data pre-processing, the online detection sub-system, and the parameter updating sub-system. In the step of data pre-processing, data differencing and data de-seasonalisation are used to transform the raw monitoring data for the online detection phase. Then, a robust statistic-based approach called the EWMA-enhanced Tukey test is proposed to assess the sequential data and detect gradual leakage events in an online manner. The proposed approach is based on lightweight and powerful statistical tools without complex model construction. A rolling time window is used to adapt to the time-varying characteristics of the monitoring data. The parameter updating sub-system provides an information interaction to inform the online detection sub-system to absorb information and update parameters according to the system condition. The main contributions of this paper are:
1. Based on the EWMA method, this paper proposes a leakage detection method that focuses on gradual leakage events;
2. Based on the use of data differencing, the proposed leakage detection method can adapt to the time-varying characteristics of flow monitoring data, such as the demand increase and decrease caused by weather-related issues;
3. Based on the use of robust statistics and the Tukey test, this paper proposes a leakage detection method that is robust to data noise and can detect leakage events without generating many false alarms;
4. This paper proposes an online detection framework that can analyse data in sequential order and automatically update model parameters, selectively, based on the information received.
Leakage datasets under various real-world scenarios are used in this research to evaluate the applicability and effectiveness of the proposed online leakage detection method. Various real-life scenarios are considered, such as daily patterns, weekly patterns, seasonal behaviour, etc. One year of real-time monitoring flow data has been used to detect gradual leakages and bursts within the WDS. Leakage events with different increasing rates and amplitude have been generated from the benchmark model to demonstrate the early warning capability and the limit of the proposed detection system.
METHODOLOGY
Data pre-processing
In summary, the trend and periodicity of the flow monitoring data pose a great challenge, and data pre-processing is essential to the performance of SPC-based leakage detection methods. Therefore, the identification and removal of periodic trends and seasonal effects are conducted at this stage to prepare the data for SPC testing.
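As an illustration of how differencing removes a periodic component, the following is a minimal sketch under stated assumptions. The weekly lag, the function name `seasonal_difference`, and the synthetic sinusoidal demand are all hypothetical choices made for this example; only the 5-min sampling interval comes from the dataset description.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60 // 5           # 288 samples per day at a 5-min interval
SAMPLES_PER_WEEK = 7 * SAMPLES_PER_DAY   # 2016 samples; the weekly lag is an assumption


def seasonal_difference(flow, lag=SAMPLES_PER_WEEK):
    """Return flow[t] - flow[t - lag]; the first `lag` samples are dropped."""
    flow = np.asarray(flow, dtype=float)
    if lag <= 0 or lag >= flow.size:
        raise ValueError("lag must be positive and shorter than the series")
    return flow[lag:] - flow[:-lag]


# Synthetic example (not study data): a purely periodic daily demand pattern.
t = np.arange(3 * SAMPLES_PER_WEEK)
demand = 160.0 + 40.0 * np.sin(2 * np.pi * t / SAMPLES_PER_DAY)

# The periodic component cancels, leaving a residual that is close to zero.
residual = seasonal_difference(demand)
```

Because the lag is a whole number of daily cycles, the periodic part cancels, while a slowly growing leak superimposed on the demand would survive differencing as a positive offset.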
Online detection based on EWMA-enhanced Tukey method
Parameter updating
The online detection system allows for real-time information selection through parameter updating. Three parameters should be set for this method: threshold scalar, outlier tolerance, and online window length. These parameters can be selected based on empirical experience or intuition, and the robustness of the method to these parameters will be discussed in the next section. The information provided by online detection is used for parameter updating. For every newly collected data point, online detection flags whether or not it belongs to a leakage event. It should be noted that a single anomaly is very likely to be caused by consumers’ behaviour or noise. Thus, it may not be the best indicator of the occurrence of a leakage event. In this study, a parameter called outlier tolerance N is used to provide a buffer for leakage detection, which means that the alarm is triggered only when N consecutive outliers are detected.
If the system remains healthy, the new information is used to update the model, and old information outside the time window is discarded. A rolling time window with length l is used to ensure that the data used for calculating the statistics are stable and unbiased, and the rolling time window strategy provides a dynamic threshold that can adapt to the time-varying behaviour of the flow monitoring data. If a leakage event is detected, the detection system flags it and does not use the flagged data for parameter updating until there are no more alarms or the algorithm is notified that the leakage has been repaired.
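The interaction between the threshold scalar k, the outlier tolerance N, and the rolling window can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class name `OnlineTukeyDetector` and its methods are hypothetical, the input is assumed to be already pre-processed and EWMA-smoothed, the window is counted in samples rather than days, and every flagged sample (not only those after an alarm) is withheld from the window.

```python
from collections import deque

import numpy as np


class OnlineTukeyDetector:
    """Tukey-fence detector with selective updating (illustrative sketch)."""

    def __init__(self, k=2.5, n_tolerance=4, window=20):
        self.k = k                            # threshold scalar for the fences
        self.n_tolerance = n_tolerance        # consecutive outliers needed for an alarm
        self.history = deque(maxlen=window)   # rolling window of healthy samples
        self.consecutive = 0

    def warm_up(self, values):
        """Seed the rolling window with leak-free data."""
        self.history.extend(float(v) for v in values)

    def update(self, value):
        """Process one sample; return True while an alarm is active."""
        if not self.history:
            raise RuntimeError("call warm_up() before update()")
        q1, q3 = np.percentile(list(self.history), [25, 75])
        iqr = q3 - q1
        lower, upper = q1 - self.k * iqr, q3 + self.k * iqr
        if value < lower or value > upper:
            # Outliers are counted but kept out of the window.
            self.consecutive += 1
        else:
            # Healthy samples reset the counter and refresh the window;
            # the deque drops the oldest sample automatically.
            self.consecutive = 0
            self.history.append(float(value))
        return self.consecutive >= self.n_tolerance


detector = OnlineTukeyDetector(k=2.5, n_tolerance=4, window=20)
detector.warm_up(list(range(10)) * 2)
alarms = [detector.update(100.0) for _ in range(4)]  # alarm only on the 4th outlier
```

Keeping flagged samples out of the window is what stops a growing leak from inflating the quartiles and masking itself.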
Performance evaluation
Three criteria are used in this study to evaluate the performance of the proposed leakage detection system, which are DP, the number of false alarms (NF), and the DT.
In order to demonstrate the capability of the proposed online detection system in dealing with the varying property of flow data caused by weather factors, the number of false alarms is calculated on a yearly basis. A good leakage detection method should maximise the DP while minimising the number of false detections.
The DT is also an important criterion because it reflects how quickly the method can respond to an abnormal event, especially for gradual leakage events. The DT is defined as the elapsed time from the start of the leakage event to the time when the event is first detected.
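The three criteria can be computed from a list of alarm times and a list of known leakage events, as in the sketch below. The function name `evaluate` and the convention of counting every alarm outside an event window as one false alarm are assumptions made for this example; times are plain numbers (for instance hours) to keep it self-contained.

```python
def evaluate(alarm_times, events):
    """Return (DP, NF, detection times) for alarms against known events.

    alarm_times: times at which an alarm was raised
    events: list of (start, end) tuples for the true leakage events
    """
    detection_times = []
    for start, end in events:
        hits = [a for a in alarm_times if start <= a <= end]
        # DT is the elapsed time from event start to the first alarm inside it.
        detection_times.append(min(hits) - start if hits else None)

    detected = sum(dt is not None for dt in detection_times)
    dp = detected / len(events) if events else float("nan")

    # An alarm that falls inside no event window counts as a false alarm.
    nf = sum(
        not any(start <= a <= end for start, end in events)
        for a in alarm_times
    )
    return dp, nf, detection_times


dp, nf, dts = evaluate(
    alarm_times=[5, 22, 30, 200],
    events=[(10, 50), (100, 140)],
)
```

In this toy input the first event is detected 12 time units after it starts, the second event is missed, and the alarms at 5 and 200 fall outside both events.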
RESULTS AND DISCUSSION
Description of the study area
The monitoring system of the L-Town network contains 1 tank water level sensor, 3 flow sensors, 33 pressure sensors, and 82 automated meter reading (AMR) devices in Area C. It provides a dataset of pressure and flow monitoring time series and contains multiple leakage scenarios under varying conditions. It should be noted that the smart meters in Area C can provide valuable information on the consumers’ behaviour. However, because most water companies do not readily have these measurements, the information provided by smart meters is not used in this study. Therefore, this paper focuses on detecting leakage based on flow data only. Flow sensors are located downstream of water sources, as shown in Figure 4. In addition, flow data are largely independent of changes in system operation controls (such as pump operations) and are more closely related to the consumers’ behaviour (Jung & Lansey 2015). Based on the inlet and outlet flow monitoring data, leakage can be detected by estimating subarea demands. The total number of monitoring stations depends on the number of inlets and outlets of the DMA. In this paper, three flow sensors are used to detect leakages.
Dataset generation
Leak | Type | Leak diameter (mm) | Peak leak volume (m3/h) | Peak leak volume of average demand | Start time | Peak time | End time | Leak growth volume (m3/h per day) | Leak growth volume of average demand |
---|---|---|---|---|---|---|---|---|---|
1 | Gradual | 20.11 | 37.97 | 23.55% | 1 May 09:20 | 12 May 16:05 | 17 May 09:20 | ≈3.366 | 2.09% |
2 | Gradual | 17.39 | 28.40 | 17.61% | 20 Jun. 15:45 | 6 Jul. 15:45 | 10 Jul. 10:25 | ≈1.775 | 1.10% |
3 | Burst | 20.00 | 24.98 | 15.49% | 3 Aug. 07:00 | 3 Aug. 07:00 | 3 Aug. 11:00 | ≈24.96 | 15.48% |
4 | Gradual | 22.92 | 49.32 | 30.59% | 28 Aug. 10:35 | 10 Sep. 02:45 | 15 Sep. 17:30 | ≈3.884 | 2.41% |
5 | Gradual | 19.04 | 34.06 | 21.12% | 06 Oct. 02:35 | 10 Nov. 02:35 | 15 Nov. 13:35 | ≈0.973 | 0.60% |
6 | Burst | 20.00 | 24.61 | 15.26% | 15 Dec. 13:00 | 15 Dec. 13:00 | 15 Dec. 17:00 | ≈24.61 | 15.26% |
The longer the duration between the start time and the peak time, the slower the growth of the leak, which makes the leak more difficult to detect. In order to demonstrate the limit of the proposed method, 40 other datasets have been generated. Each dataset contains two gradual leakage events with the same leakage size and the same leakage duration, one occurring in summer and one in winter. Leaks with different diameters and different growth durations are generated for each dataset. After the leakage scenarios were inserted into the distribution network, monitoring data were generated with a sampling interval of 5 min.
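The tabulated growth values appear consistent with dividing the peak leak volume by the time between leak start and leak peak. This reading is inferred from the numbers in the leak table and is not stated explicitly. Two worked checks, with durations taken from the start and peak times listed there:

```latex
\begin{align*}
\text{Leak 1:}\quad & \Delta t = 11\ \text{d}\ 6\ \text{h}\ 45\ \text{min} \approx 11.28\ \text{days},
  & \frac{37.97\ \text{m}^3/\text{h}}{11.28\ \text{days}} &\approx 3.366\ \text{m}^3/\text{h per day} \\
\text{Leak 2:}\quad & \Delta t = 16\ \text{days},
  & \frac{28.40\ \text{m}^3/\text{h}}{16\ \text{days}} &= 1.775\ \text{m}^3/\text{h per day}
\end{align*}
```

Leak 2 takes longer to reach a smaller peak, so its daily growth is roughly half that of leak 1, which is why it is expected to be harder to detect.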
Application results
Parameter estimation
Figure 6 shows the DP and NF (per year) of the proposed method with different parameter combinations. Threshold scalar k determines the width of the threshold band, and a larger k means that a wider range of data is considered normal. As shown in Figure 6, with the same settings for N and l, a larger k increases the probability of undetected leakage events but can reduce the number of false alarms. Outlier tolerance N represents the size of the buffer for raising a leakage alarm. A larger N indicates greater tolerance for outliers, which reduces the probability of false alarms but may delay detection. Online window length l determines the length of historical data considered at each time step for the statistics calculation, so that the threshold can adapt to the time-varying characteristics of flow data caused by environmental factors (such as weather change). If the window length is too short, there are not enough data samples for accurate statistics estimation, but if the window length is too long, the stationarity assumption can easily be violated. Furthermore, the three parameters influence each other. Therefore, these three parameters need to be tuned to achieve a relatively high DP, a low number of false alarms per year, and a short DT.
From Figure 6, it can be observed that the proposed method is robust to the threshold scalar and outlier tolerance settings for a given rolling window length. For example, when the window length equals 10 days or 20 days, the DP is maintained at 100% regardless of the values of k and N (within a certain range). In most parameter settings, the number of false alarms is less than five per year. Based on the sensitivity analysis, two parameter sets achieve 100% DP and zero false alarms at the same time: (1) k = 2.5, N = 4, l = 20 and (2) k = 2.75, N = 6, l = 10. Table 2 shows the final detection results for Dataset 0 based on these two parameter sets. Parameter set (1) outperforms parameter set (2) in DT for all gradual leakage events but is slower for burst detection. Since this paper aims to develop an early warning system for gradual leakage, DT for gradual leakage events is one of the primary considerations for the proposed method. Therefore, the EWMA-enhanced Tukey method's parameters k, N, and l were set to 2.5, 4, and 20, respectively.
Leak | Type | DT (Parameter set 1) | DT (Parameter set 2) |
---|---|---|---|
1 | Gradual | 3 d 3 h 40 min | 5 d 55 min |
2 | Gradual | 3 d 14 h | 3 d 16 h 25 min |
3 | Burst | 3 h 50 min | 1 h 55 min |
4 | Gradual | 2 d 18 h 40 min | 3 d 18 h 10 min |
5 | Gradual | 9 d 3 h | 9 d 3 h 10 min |
6 | Burst | 2 h 45 min | 2 h |
Comparison between EWMA-based method and Shewhart-based method
Conventional Shewhart-based method or three-sigma rule only uses the last data sample to make decisions and does not have any memory of previous data (Kadri et al. 2016). In contrast, EWMA-based monitoring charts take into account the historical information by using a weighted average of past observations, which makes it more suitable for detecting gradual anomalies. In order to compare the proposed EWMA-enhanced Tukey method to the commonly-used three-sigma rule (Dunn 2019) in detecting gradual leakage events, the Shewhart method has been extended with the proposed online framework using the same pre-processing stage and the parameter updating sub-system.
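The difference in memory between the two charts can be illustrated on a noise-free, slowly growing signal. The sketch below uses standard textbook forms of the three-sigma rule and the EWMA chart with its asymptotic limit; it does not include the Tukey fences, pre-processing, or parameter updating of the proposed system. The smoothing constant `lam = 0.2`, the unit standard deviation, and the ramp slope are arbitrary assumptions chosen for illustration.

```python
import math


def first_alarm_shewhart(xs, sigma=1.0, width=3.0):
    """Index of the first sample outside +/- width * sigma, or None."""
    for t, x in enumerate(xs):
        if abs(x) > width * sigma:
            return t
    return None


def first_alarm_ewma(xs, sigma=1.0, width=3.0, lam=0.2):
    """Index of the first EWMA value outside its asymptotic limit, or None."""
    limit = width * sigma * math.sqrt(lam / (2.0 - lam))
    z = 0.0  # in-control mean assumed to be zero
    for t, x in enumerate(xs):
        z = lam * x + (1.0 - lam) * z
        if abs(z) > limit:
            return t
    return None


# A slow ramp standing in for a gradual leak (0.01 sigma per sample).
ramp = [0.01 * t for t in range(1000)]
shewhart_t = first_alarm_shewhart(ramp)
ewma_t = first_alarm_ewma(ramp)
```

Because averaging shrinks the variance of the monitored statistic, the EWMA limit is much tighter than the three-sigma limit, and the ramp crosses it well before a single sample exceeds three sigma.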
Table 3 shows the detection results of the proposed method and the online version of the Shewhart method. As shown in Table 3, the Shewhart method detects fewer leakage events than the proposed EWMA-enhanced Tukey method and detects them considerably later. The proposed method successfully detects all the leakage events within a short duration. All burst events are detected, and all gradual leakage events except leak No. 5 are detected within 4 days. The DT of leak No. 5 is around 9 days. This is because the daily growth rate of leak No. 5 is about 0.6% of the average water demand, which is a very small increase. Table 3 also shows the leakage flow rate at the time each leakage is first detected. The proposed method detects all gradual leakage events before they reach 4 m3/h, whereas the Shewhart method raises alarms only when the leakage flow rate exceeds 15 m3/h. The results show the superiority of the EWMA-based method for detecting gradual leakage events.
Leak | Type | Peak leak volume (m3/h) | DT (EWMA-enhanced Tukey) | LF (EWMA-enhanced Tukey) | DT (Shewhart) | LF (Shewhart) |
---|---|---|---|---|---|---|
1 | Gradual | 37.97 | 3.16 days | 3.35 | 7.03 days | 15.16 |
2 | Gradual | 28.40 | 3.61 days | 1.43 | 13.57 days | 20.42 |
3 | Burst | 24.98 | 3.83 h | 24.98 | – | – |
4 | Gradual | 49.32 | 2.78 days | 2.37 | 7.61 days | 17.83 |
5 | Gradual | 34.06 | 9.16 days | 2.32 | – | – |
6 | Burst | 24.61 | 2.75 h | 24.61 | – | – |
– means the detection method fails to detect the leakage event.
LF means the amount of leakage flowrate when first being detected (m3/h).
Comparison between EWMA-enhanced Tukey method with or without data differencing
Data differencing is one of the important parts of the proposed leakage detection method. Seasonal differencing can effectively reduce the influence of global trends in the data caused by weather factors. Water usage increases from spring to summer and decreases from summer to winter gradually. One difficulty for gradual leakage detection is that the global trend could be easily confused with the increasing trend caused by gradual leakage. In order to satisfy the assumption of stationarity (or as close as possible), data differencing is used in this paper to eliminate the global trend in the flow data while preserving the increasing trend caused by gradual leakages. To evaluate the usefulness of this procedure, methods without data differencing and methods with data differencing were compared.
Comparison between EWMA-enhanced Tukey method with or without robust statistics
Another important part of the proposed methodology is the adoption of robust statistics. The use of robust statistics in the data transformation and the use of the Tukey method in the threshold setting stage enhance the robustness of the proposed online leakage detection method. Traditional methods (Jung et al. 2015; Ahn & Jung 2019; Nam et al. 2019) use statistics such as the mean and standard deviation, which are sensitive to noise and outliers, leading to unsatisfactory detection performance. The proposed method therefore uses robust statistics, such as the median, which are far less affected by noise.
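The sensitivity of the mean and standard deviation to a single outlier, compared with the median, can be seen in a small numerical sketch. The data values are made up for illustration, and the scaled median absolute deviation (`mad_scale`) is one common robust scale estimate chosen here as an assumption; the text above names only the median.

```python
import numpy as np


def mad_scale(x):
    """Median absolute deviation, scaled to match sigma for Gaussian data."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))


# Illustrative values only: eight ordinary readings, then one spurious spike.
clean = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3])
spiked = np.append(clean, 60.0)

summary = {
    "mean": (np.mean(clean), np.mean(spiked)),
    "std": (np.std(clean), np.std(spiked)),
    "median": (np.median(clean), np.median(spiked)),
    "mad": (mad_scale(clean), mad_scale(spiked)),
}
```

The single spike pulls the mean up by several units and inflates the standard deviation many times over, while the median is unchanged and the scaled MAD moves only slightly, so thresholds built on the robust pair stay close to where they were.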
Detection ability
In order to show the robustness of the proposed method, all six leakage events in Table 1 have been shuffled randomly in the dataset. The order of the six leakage events and the start time of each leakage event have been shuffled randomly three times. Table 4 shows the detection results for the three shuffled datasets. According to Table 4, the proposed algorithm maintained 100% DP while the Shewhart-based method missed several leakage events in each dataset. The proposed algorithm raised no false alarms except for one in shuffled dataset 2, while the Shewhart-based method raised none. Furthermore, for the leakage events detected by both methods, the proposed method was quicker than the Shewhart-based method in every case.
Shuffled dataset | NF per year (EWMA-enhanced Tukey) | NF per year (Shewhart) | Leak | Type | Leak growth volume of average demand | DT (EWMA-enhanced Tukey) | DT (Shewhart) |
---|---|---|---|---|---|---|---|
1 | 0 | 0 | 1 | Burst | 15.26% | 1 h | 1.83 h |
 |  |  | 2 | Gradual | 2.41% | 4.73 days | 8.80 days |
 |  |  | 3 | Burst | 15.48% | 2.75 h | – |
 |  |  | 4 | Gradual | 0.60% | 20.12 days | – |
 |  |  | 5 | Gradual | 2.09% | 5.18 days | 10.28 days |
 |  |  | 6 | Gradual | 1.10% | 8.56 days | – |
2 | 1 | 0 | 1 | Gradual | 2.41% | 3.73 days | 7.69 days |
 |  |  | 2 | Burst | 15.26% | – | – |
 |  |  | 3 | Gradual | 2.09% | 5.01 days | 8.05 days |
 |  |  | 4 | Gradual | 0.60% | 9.05 days | – |
 |  |  | 5 | Burst | 15.48% | 0.92 h | 2.25 h |
 |  |  | 6 | Gradual | 1.10% | 8.60 days | – |
3 | 0 | 0 | 1 | Burst | 15.48% | 1.5 h | – |
 |  |  | 2 | Gradual | 2.09% | 4.10 days | 9.01 days |
 |  |  | 3 | Gradual | 0.60% | 25.13 days | – |
 |  |  | 4 | Burst | 15.26% | 1.58 h | – |
 |  |  | 5 | Gradual | 1.10% | 8.48 days | – |
 |  |  | 6 | Gradual | 2.41% | 5.85 days | 8.69 days |
– means the detection method fails to detect the leakage event.
From Tables 3 and 4, it can be observed that the method needs a longer time to detect a small leakage event. For example, the No. 5 leakage event has the longest growth duration and a relatively small magnitude, making it the most difficult to detect. When the amplitude of a leakage event becomes too small, it can easily be confused with normal consumption behaviour or variation caused by weather, and the leakage goes undetected. To demonstrate the limit of the proposed method, another 40 datasets containing leakage events with different diameters and different growth durations have been used. The growth duration is defined as the duration between the leak start time and the leak peak time. Table 5 shows the detection results for those 40 datasets. Each dataset contains two leakage events with the same growth duration and amplitude, one occurring in summer and one in winter.
Leak diameter (mm) | Peak leak magnitude () | DT, proposed method (d = 10) | DT, proposed method (d = 15) | DT, proposed method (d = 20) | DT, proposed method (d = 25) | DT, Shewhart-based method (d = 10) | DT, Shewhart-based method (d = 15) | DT, Shewhart-based method (d = 20) | DT, Shewhart-based method (d = 25) |
---|---|---|---|---|---|---|---|---|---|
13 | 10.63 | 4.82 | – | – | – | – | – | – | – |
14 | 12.32 | 4.78 | 6.77 | – | – | – | – | – | – |
15 | 14.14 | 4.75 | 6.76 | – | – | – | – | – | – |
16 | 16.08 | 4.67 | 6.76 | 19.73 | – | 13.75 | – | – | – |
17 | 18.14 | 4.62 | 6.76 | 19.73 | – | 13.73 | – | – | – |
18 | 20.33 | 2.96 | 5.74 | 12.86 | – | 13.73 | 14.77 | – | – |
19 | 22.64 | 2.96 | 4.82 | 6.76 | – | 12.78 | 14.77 | – | – |
20 | 25.06 | 2.96 | 4.78 | 6.76 | 26.69 | 12.78 | 19.73 | – | – |
21 | 27.61 | 2.95 | 4.78 | 6.76 | 26.69 | 6.75 | 19.73 | – | – |
22 | 30.27 | 2.88 | 4.77 | 6.76 | 12.86 | 6.72 | 19.73 | – | – |
– means the detection method fails to detect the leakage event.
d means the time duration it takes for the leak to grow to its maximum value.
Table 5 shows that the DT is affected by both leakage magnitude and leakage growth rate. As the leakage magnitude becomes smaller, detection is delayed, and eventually the leak becomes undetectable. This is because the leakage volume is small and is regarded as normal water consumption by the algorithm. Furthermore, as the leakage growth duration becomes longer, detection is also delayed, and eventually the method fails to raise the alarm. This is because the growing trend caused by the leakage is too small and eventually becomes indistinguishable from the growing trend caused by weather factors. In most cases, the detection algorithm could raise the alarm at least 13 days before the leakage reaches its maximum level.
For comparison, the detection results based on the traditional Shewhart method are also presented in Table 5. It can be observed that the traditional method failed to detect most of the gradual leakage events, especially when the growth of the leakage is relatively slow. In addition, where both methods raise an alarm, the Shewhart-based method is considerably slower than the proposed method, by roughly 4 to 15 days.
CONCLUSIONS
Gradual leakage events are more challenging to detect than burst events because of their long-term influence and inconspicuous behaviour. This study proposed a novel EWMA-enhanced Tukey method to detect gradual leakage events in real time for water distribution systems. The method uses simple, computationally lightweight, yet powerful and robust statistics to detect leakage events in the WDS. First, the raw monitoring data were pre-processed using data differencing and data transformation techniques. Then, the transformed data were processed with the EWMA-enhanced Tukey method. Based on the results of the online leakage detection, parameter updating was proposed to update the model automatically and ensure that the proposed method can process streaming data in an online manner.
The proposed method has been successfully applied to a case study with a year of monitoring data and proved effective in real-time monitoring. The 1-year monitoring dataset contains four gradual leakage events and two burst events, and the detection results showed that the proposed method detected all leakage events. All leakage events were detected with a short DT, and no false alarms were generated over the year, which shows the promise of this method.
The comparison analysis showed that the enhancements to the method had a significant impact on the detection performance. Data differencing proved effective in dealing with seasonal behaviour in the consumers’ demand, and it ensured that the proposed algorithm could be applied in real time. Robust statistics proved helpful in reducing the influence of data uncertainties and the number of false alarms.
There is no leakage detection method that could detect all leaks of any magnitude. Therefore, in this paper, the detection ability was tested for the proposed method. In this study, leakage events that continuously happened for weeks have been successfully detected with leak diameter larger than 20 mm or growth duration less than around 20 days.
It should be noted that even though the proposed methodology has shown promising behaviour in real-time gradual leakage detection, real-life monitoring data may contain more challenging uncertainties, such as missing data, changes in system operation, and changes in human behaviour caused by pandemics such as Covid-19. This will be studied in our future research. In addition, the proposed detection algorithm could fail if the leakage event is too small or develops too slowly, and a method with higher accuracy could be developed in future research.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.