Abstract
In this work, we focus on the detection of leaks occurring in district metered areas (DMAs). Such leaks are observable as time-related deviations from zone patterns over days or weeks. While they are detectable given enough time, the huge cost of water lost to an undetected leak makes the main challenge finding them as soon as possible, when the deviation from the zone pattern is still small. Using our collected observational data, we investigate the appearance of leaks and discuss the performance of several machine learning (ML) anomaly detectors in detecting them. We test a diverse set of eight anomaly detectors, each based on a different ML algorithm, on nine scenarios containing leaks and anomalies of various kinds. The proposed approach is very effective at quickly (within hours) identifying the presence of a leak, with a limited number of false positives.
HIGHLIGHTS
We focus on the detection of leaks and anomalies occurring in district metered areas (DMAs).
We use machine learning anomaly detection algorithms on hourly inflow, loss, consumption and pressure data.
We test the proposed approach on nine scenarios and show its good performance, potentially finding leaks within hours, with a limited number of false positives.
INTRODUCTION
The growing human population, especially in urban areas, creates many new challenges for water distribution system maintenance, as growing demand requires systems to be more efficient and to limit water losses. Major contributing factors are leakages and bursts in pipe networks, which occur between water treatment and delivery to customer locations. As quoted in Mamlook & Al-Jayyousi (2003) and Beuken et al. (2008), water loss occurs in almost all water networks and ranges from 3 to 7% in developed countries to more than 50% in undeveloped ones. While this creates an obvious economic issue and is a major concern for water delivery companies, water loss also raises environmental and sustainability concerns, and potentially even energy, health and safety issues (Colombo & Karney 2002). In recent years, there has been a significant amount of research concerning leak management in water delivery systems (WDS), as seen in reviews (Puust et al. 2010; Xu et al. 2014).
Water leak management consists of: leak detection, localisation and repair (Islam et al. 2011); this paper is focused on the first of those issues. While a large burst in a pipe network may sometimes be easily detected, e.g. by reported flooding or when it causes a sudden pressure drop in the WDS, small leakages may stay undetected for days or even weeks. WDS are commonly segmented into zones, or district metered areas (DMAs).
Data-based leak detection in DMAs has typically been based on an inlet meter and pressure sensors. The inlet meter provides frequent (e.g. hourly) information about the water inflow into the DMA, while pressure sensors provide information from selected points within the DMA – often a single measure at the inlet. Inflow analysis methods (Buchberger & Nadimpalli 2004; Rahmat et al. 2017) are typically applied to such data by system operators. One of the most significant approaches is the analysis of minimum night flow (MNF) (Farley & Trow 2003; Liemberger & Farley 2004; Alkasseh et al. 2013), based on the observation that nightly DMA consumption is much lower than during the daytime, which makes leakages or pipeline bursts easier to observe. An approach called BABE (Bursts and Background Estimates), proposed in Lambert (2007), uses both inflow information and annual loss data. Inflow is often used in conjunction with data from pressure sensors, as in the fuzzy approach presented in Islam et al. (2011), the multi-scale neural networks proposed in Hu et al. (2021) or the statistical, time series-based anomaly detection (AD) approach of Wu & He (2021).
Wider use of smart meters, which are able to provide frequent data from every single end-point of the water network, has resulted in a significant number of works on using such data to detect post-meter leaks (leaks within the internal network of the consumer). Example approaches include the use of individual periods of null consumption and minimum night usage to detect client leaks (Boudhaouia & Wira 2018) or building a user usage profile (Abate et al. 2019). Data from smart meters can also be employed for DMA leak detection: they allow for calculating the DMA's joint consumption and using it along with the inflow values to obtain the DMA water loss value. The problem of detecting leaks using a smart meter system is presented in Farah & Shahrour (2017); other example approaches include the pressure-driven balance model proposed in Yu et al. (2021) and graph partitioning methods (Rajeswaran et al. 2018).
In this work, we study the problem of leak detection using DMA monitoring data. We use hourly data of DMA inflow, total water consumption (computed from a smart meter grid) and a small number of pressure sensors (1–3, depending on the particular DMA configuration). These hourly data vectors form the input to the detection algorithm, while the output is a binary value indicating that a leak is detected. Such detection can easily be integrated into the monitoring software (e.g. through a dashboard notification for the DMA operator). The use of hourly DMA data in this scenario differs from the more typical scenario of MNF analysis, which may require three or more data points to detect consistent growth in loss values, meaning that at least 72 h have to pass before the leak can be detected. As we show, the usage of hourly data can lead to a much quicker reaction time.
Our proposition for detecting leaks is to detect the anomalies they cause in the DMA monitoring time series data. By treating leak detection as an AD problem, we can use many well-researched machine learning (ML) algorithms, which have been successfully applied in other domains. ML methods have already been applied to leak detection: in Farah & Shahrour (2017), a probability density function was applied to hourly water consumption on the customer level to detect local leaks. A self-supervised leak detector (SSLD) was proposed in Blázquez-García et al. (2021); the method is based on differences from normal system behaviour in hourly inflow data. An interesting approach is proposed by the authors of Sadeghioon et al. (2018), who use AD methods on pressure and temperature monitoring data for the pipeline. However, a typical approach is to use a physical pressure simulation model, such as an EPANET simulation in Mashhadi et al. (2021) or Fan et al. (2021). Compared to this approach, our method is simpler, easier to apply and less computationally expensive. In addition, it can be applied to DMAs with a limited number of pressure sensors.
The main challenge to be expected when applying general AD methods to leak detection is the complex nature of the input data. The most effective approaches to AD are based on ML, i.e. learning typical patterns from the data and detecting outliers as non-conforming to those patterns. This usually requires a long history of stationary data for model learning. In contrast, DMA monitoring data are heterogeneous (e.g. inflow or loss data differ in nature from pressure data), complex (e.g. hourly and daily variations, irregular users) and frequently changing in character (due to, e.g., maintenance and management operations). Due to those difficulties, the performance of AD methods in a leak detection role is an open question.
In this paper, we present an experimental analysis of applying eight algorithms that represent the current state of the art of AD to the detection of two distinct classes of leaks: a build-up leak and a spike leak. We use a dataset of eight scenarios analysed and confirmed by experts in three different DMAs. In addition to leaks, we investigate anomalous situations resulting from pipeline maintenance. We show that the proposed approach is a promising method of leak detection, with an ability to capture a majority of tested leaks within the first 24 h.
METHODS
We focus on time series AD, the task of which is to identify patterns in time series data that do not correspond to a well-defined notion of a normal or typical behaviour (Chandola et al. 2009).
Our detection scenario is based on the observation that DMA time series data are not stationary, i.e. their statistical properties may change over time. A moment of such change is often visible as a distinct anomaly in the data, e.g. a sharp change in pressure readings, inflow or loss values. If such an event is spotted by an operator, it is investigated or sometimes ignored, e.g. when it results from a planned maintenance task. Typically, anomalous readings last for some time, from a few hours to even days, until they stabilise. However, the new ‘normal’ DMA state is often distinctly different from the state before the anomaly, which may correspond to differences in the mean values of pressures or loss, in their variance, or even in the presence (or absence) of a subset of pressure sensors. This indicates the need to retrain AD models so that they work in a time-localised region of the DMA data, i.e. between what the operator defines as a new normal state (e.g. after the previous leak is repaired) and the discovery of a new anomaly (reported by the AD and being investigated).
Time-localised AD for leak detection
Given a set of DMA time regions $Z$, our data are time series $x^z_t \in \mathbb{R}^{d_z}$, where $t = 1, 2, \ldots$ are hourly timestamps and $d_z$ is the number of raw measured data streams and/or derived features in the DMA region $z \in Z$. In all cases considered in this paper, $x^z_t$ includes values of zone inflow and aggregated consumption, as well as the hourly water loss value, computed as the difference between DMA inflow and consumption. Some cases also include data from one or more pressure sensors – minimum, average and maximum values over hourly intervals.
Given a set of training vectors representing typical DMA behaviour, an anomaly detector is a function $D : \mathbb{R}^{d_z} \to \mathbb{R}$ that for any input data vector returns a value of the detection statistic (DS). High values of the DS indicate the abnormality of a data vector compared to the statistical properties of the training set. Given some value of a detection threshold $\theta$, an anomaly is detected in the DMA region $z$ at time $t$ if $D(x^z_t) > \theta$.
Training set
Given a DMA region $z$, a detector $D$ starts to process DMA data from time $t_0$. Vectors from $x^z_{t_0}$ up to $x^z_{t_0+n-1}$ form a training set $T^z = \{x^z_{t_0}, \ldots, x^z_{t_0+n-1}\}$, where $n$ is the training set size. An anomaly can then be detected at any time $t \geq t_0 + n$.
Detection accuracy metric
An anomaly in a dataset is labelled by its anomaly time $t_a$, associated with the moment when the anomaly begins. The (first) moment of detection by an algorithm is denoted as $t_d$. The difference between the detection and anomaly times, $\Delta t = t_d - t_a$, is the detection accuracy metric used to estimate the performance of the detector. Since this value can be negative (indicating a detection before the anomaly time) or positive, when it is minimised, e.g. in the case of detector parameter selection, its absolute value $|\Delta t|$ is used.
Anomaly threshold
Given a training set $T^z$ for a DMA region $z$ and a detector $D$, first the maximum and minimum values of the DS over the training set, $DS_{\max}$ and $DS_{\min}$, respectively, are computed. The anomaly threshold is estimated as $\theta = DS_{\min} + c\,(DS_{\max} - DS_{\min})$, where $c$ is a scaling parameter of the detection algorithm. Larger values of the scaling parameter may be used to lower the probability of false alarms, especially for small training sets. In our experiments, a value of $D(x^z_t) > \theta$ corresponds to detecting an anomaly.
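The following minimal sketch illustrates this time-localised detection scheme, using PyOD's $k$-NN detector as an example model. The helper name `detect_anomalies`, the variable names and the default parameter values are illustrative assumptions, and the threshold rule follows the formula reconstructed above.

```python
# Minimal sketch of the time-localised detection scheme, assuming PyOD's
# k-NN detector and the threshold rule theta = DS_min + c * (DS_max - DS_min).
import numpy as np
from pyod.models.knn import KNN

def detect_anomalies(X, n_train, c_scale=1.0):
    """X: (T, d) array of hourly feature vectors for one DMA region.
    The first n_train rows form the training set; the rest are monitored."""
    X_train, X_monitor = X[:n_train], X[n_train:]

    detector = KNN(n_neighbors=5)          # proximity-based detector
    detector.fit(X_train)

    ds_train = detector.decision_scores_   # DS over the training set
    ds_min, ds_max = ds_train.min(), ds_train.max()
    theta = ds_min + c_scale * (ds_max - ds_min)  # anomaly threshold

    ds_monitor = detector.decision_function(X_monitor)
    flags = ds_monitor > theta             # True where an anomaly is detected
    # index (relative to X) of the first detection, or None if no detection
    first = n_train + int(np.argmax(flags)) if flags.any() else None
    return flags, first
```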
Parameter selection
The detection performance of every algorithm depends on its internal parameters, e.g. the number of neighbours $k$ for a $k$-NN detector, and on the threshold scaling parameter $c$, common to all detectors. In order to select the parameters in the most objective and unbiased manner, for every scenario, parameter values are determined based on data from the other scenarios, without access to data from the currently tested scenario. This approach follows two assumptions: on the one hand, a number of examples of DMA leaks are available; on the other hand, the parameters of the detectors should be general, meaning that all leaks can be detected (a detector is not tuned for a particular class or type of leaks). In the presented experiments, we require that a chosen set of detector parameters works for both classes of leaks in our scenarios.

More formally, the parameters of a detector are estimated by performing a grid search using a leave-one-scenario-out approach: given a parameter candidate $p$ from the searched parameter grid (i.e. the Cartesian product of the per-parameter value lists), $p$ is evaluated by computing its averaged detection score. The averaged detection score is computed by performing a detection experiment on the set of remaining scenarios (all scenarios except the currently tested one) and averaging the absolute values of their detection accuracy. The best parameters are chosen by minimising the averaged detection score among all parameter candidates.
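A hedged sketch of this leave-one-scenario-out selection is given below. The helper `run_detection(scenario, params)` is hypothetical: it stands for running a detector with the given parameters on one scenario and returning the detection accuracy $t_d - t_a$ in hours, or `None` if nothing is detected; the `miss_penalty` value used to penalise missed detections is our assumption, not specified in the paper.

```python
# Sketch of leave-one-scenario-out grid search for detector parameters.
from itertools import product
import numpy as np

def select_parameters(scenarios, grid, run_detection, miss_penalty=1000):
    # All parameter candidates: Cartesian product of the per-parameter lists.
    candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
    best = {}
    for held_out in scenarios:                       # scenario under test
        remaining = [s for s in scenarios if s is not held_out]

        def avg_score(params):
            scores = [run_detection(s, params) for s in remaining]
            # Average absolute detection accuracy; penalise missed detections.
            return np.mean([abs(x) if x is not None else miss_penalty
                            for x in scores])

        best[held_out] = min(candidates, key=avg_score)
    return best

# Example grid (illustrative values only):
# grid = {'n_neighbors': [3, 5, 10], 'c_scale': [1.0, 1.5, 2.0]}
```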
Detectors
We have chosen eight AD algorithms for our experiments. These algorithms include both well-known and recent methods and represent a diverse set of approaches to AD problems with regard to both assumptions and detector complexity (a sketch of how they can be instantiated follows the list):
1. $k$-nearest neighbours ($k$-NN) (Angiulli & Pizzuti 2002) and local outlier factor (LOF) (Breunig et al. 2000) detectors are examples of proximity-based detectors, where the abnormality of an example depends on the distance from its neighbours in the feature space.
2. Isolation forest (IF) (Liu et al. 2012) is an ensemble approach which works on the principle of randomly choosing features and generating ensembles of binary trees, measuring the abnormality of examples by the length of their paths in the trees.
3. One-class support vector machine (OCSVM) (Schölkopf et al. 2001) is a kernel-based approach based on the principle of finding a maximal margin hyperplane separating the dataset from the origin after mapping data points into a high-dimensional feature space (using a kernel function).
4. AutoEncoder (AE) (Charu 2019) is a neural-network, reconstruction-based approach, where an NN model is used to encode and then reconstruct a dataset and the abnormality of examples depends on the value of the reconstruction error.
5. Principal component analysis (PCA) (Shyu et al. 2003) is a subspace-based approach where the abnormality score of an example is obtained as the sum of its projected distances onto eigenvectors with small or large eigenvalues.
6. Unsupervised outlier detection using empirical cumulative distribution functions (ECOD) (Li et al. 2022) and copula-based outlier detection (COPOD) (Li et al. 2020) are examples of probabilistic approaches which first estimate the distribution of the data and then estimate example abnormalities based on their tail distributions across dimensions.
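As a concrete illustration, all eight detectors can be instantiated via the PyOD library (Zhao et al. 2019) used in our experiments. The hyperparameter values below are placeholders rather than the selected ones, and some argument names (notably for the AutoEncoder) differ between PyOD versions.

```python
# Illustrative construction of the eight detectors with PyOD; hyperparameter
# values are placeholders, not the ones selected by the grid search.
from pyod.models.knn import KNN
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ocsvm import OCSVM
from pyod.models.auto_encoder import AutoEncoder
from pyod.models.pca import PCA
from pyod.models.ecod import ECOD
from pyod.models.copod import COPOD

detectors = {
    'k-NN': KNN(n_neighbors=5, method='largest'),  # also 'mean' or 'median'
    'LOF': LOF(n_neighbors=20),
    'IFOREST': IForest(n_estimators=100),
    'OCSVM': OCSVM(kernel='rbf'),
    # Four hidden layers were used in the paper; the layer-size argument
    # name varies across PyOD versions, so defaults are used here.
    'AE': AutoEncoder(),
    'PCA': PCA(),
    'ECOD': ECOD(),                                 # nonparametric
    'COPOD': COPOD(),                               # nonparametric
}
```

All of these share PyOD's common interface (`fit`, `decision_scores_`, `decision_function`), so the detection and threshold logic sketched earlier applies to each of them unchanged.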
Detection scenarios
The scenarios were selected from leak and anomaly events discovered during the routine maintenance of a WDS in one Polish city. The events were taken from three DMAs (denoted as ‘zone A’, ‘zone B’, ‘zone C’) and numbered accordingly (e.g. A-1, A-2, B-1). Experts' knowledge and consultations with WDS operators were used to select the DMAs, the events within the zones, their starting times and the AD targets $t_a$. In order to test the performance of the detectors in the absence of unusual events in the data, one additional scenario, marked as N-1, was prepared.
Scenario A-1
Scenario A-1, a leak that grows bigger as the break in the pipeline enlarges under water pressure, which results in increasing loss and decreasing pressure values. The loss plot is computed as the difference between zone inflow and the sum of consumptions of individual customers. The pressure plot shows the hourly average and minimum of the sensor readout. The vertical dashed line denotes the target for anomaly detectors ($t_a$) set by the experts. Note that in this case the target was set by the experts 9 days earlier than the leak was originally detected, as this is the recommended behaviour of a leak detection system.
Scenario A-2
Scenario A-2, a slowly increasing leak resulting in a consistent, growing trend in the water loss. The presented leak followed the one in scenario A-1.
Scenario A-3
Scenario A-3, a dynamically increasing leak resulting in a consistent, growing trend in the water loss and a small but noticeable pressure drop.
Scenario A-4
Scenario A-4, a sharp drop followed by a spike in DMA water loss values. The most probable cause of this anomaly is the maintenance of the pipeline.
Scenario A-5
Scenario A-5, a challenging case of two anomalies: a build-up and a spike leak one after another.
Scenarios A-6 and B-1
Scenarios A-6 (top plot) and B-1 (bottom plot): an anomaly resulting from pipeline maintenance in two connected DMAs, with a sharp drop in minimum pressure values in the pipe connecting them.
Scenario C-1
Scenario C-1, a break in the pipeline located in the DMA, indicated by an increase in DMA loss values and a decrease in values on one of the pressure sensors, lasting for a long period of time.
The visible pressure values come from a sensor located in the provider's pipes (the highest pressure value, with values consistently above 6,000 mbar), a sensor located after the pressure reducer and a third one located deep within the DMA. The first two sensors do not show any major changes, which indicates that the anomaly is located within the DMA, far from the source.
Scenario N-1
Scenario N-1 contains no confirmed leaks or other unusual events and is used to test the detectors for false positives.
EXPERIMENTS AND RESULTS
Experiments
Experiments were implemented in Python 3.9 using the NumPy, SciPy, pandas and Matplotlib libraries, as well as the PyOD library (Zhao et al. 2019). For every scenario, the length of the training set was set to half of the period between its first timestamp $t_0$ and the timestamp of the anomaly $t_a$. Only the moment of the first detection for each detector is evaluated.
Features
The data features consisted of hourly values of DMA consumptions, raw inflow and loss (the difference between inflow and consumptions), as well as the minimum, maximum and mean values of pressures from all DMA pressure sensors. This set of features was chosen as one of the best after initial experiments.
In every scenario, the features were standardised by subtracting the mean and dividing by the standard deviation, both estimated on the training set. Features with zero variance in the training set were removed.
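A short sketch of this per-scenario preprocessing is given below; the function name and array layout are illustrative assumptions.

```python
# Sketch of per-scenario preprocessing: standardise features using statistics
# from the training window only, then drop zero-variance columns.
import numpy as np

def preprocess(X, n_train):
    """X: (T, d) hourly features (consumptions, inflow, loss, pressure stats).
    Only the first n_train rows (the training window) define the statistics."""
    mu = X[:n_train].mean(axis=0)
    sigma = X[:n_train].std(axis=0)
    keep = sigma > 0                      # remove zero-variance features
    return (X[:, keep] - mu[keep]) / sigma[keep]
```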
Parameters
The parameters searched in our experiments were as follows:

1. For every detector, the same range of the threshold scaling parameter $c$ was searched.
2. $k$-NN: the number of neighbours $k$; three approaches to outlier score estimation were tested: the distance to the $k$-th neighbour and both the average and the median distance to all $k$ neighbours.
3. LOF: the number of neighbours.
4. IF: the size of the ensemble.
5. OCSVM: radial basis function (RBF) kernel, with the kernel parameter $\gamma$ and margin parameter $\nu$.
6. AE: four hidden layers, with the numbers of neurons per layer, the batch size and the learning rate as parameters.
7. PCA: the number of components is estimated using the heuristics described in Minka (2000).
8. ECOD, COPOD: the methods are nonparametric.
Results
Table 1 | Results of detection experiments with all features

| Detector | A-1 | A-2 | A-3 | A-4 | A-5 | A-6 | B-1 | C-1 | Avg |
|---|---|---|---|---|---|---|---|---|---|
| COPOD | – | – | – | – | – | – | – | – | – |
| ECOD | – | – | – | x | – | x | x | – | – |
| k-NN | – | – | – | – | – | – | – | – | – |
| LOF | – | – | – | – | – | – | – | – | – |
| PCA | – | – | – | – | – | – | – | – | – |
| OCSVM | – | – | – | – | – | – | – | – | – |
| AE | – | – | – | – | – | – | – | – | – |
| IFOREST | – | – | – | x | x | x | x | – | – |
| Avg | – | – | – | – | – | – | – | – | |

(Numeric entries, rendered as images in the source, are shown as ‘–’.)
Values in the table represent the detection accuracy, i.e. the difference between the actual and detected starting times of an anomaly (in hours), with ‘x’ denoting no detection. The last row and last column present averages of absolute values.
Detection results for experimental scenarios with annotated responses of individual detectors. Values near the detector name are the detection accuracy scores. (a) A1, (b) A2, (c) A3.
Detection results for experimental scenarios with annotated responses of individual detectors. Values near the detector name are the detection accuracy scores. (a) A4, (b) A5.
Detection results for experimental scenarios with annotated responses of individual detectors. Values near the detector name are the detection accuracy scores. (a) A6, (b) B1, (c) C1.
Our scenarios can be divided into four ‘types’ of events:
1. Scenarios A-1, A-2 and A-3 are examples of typical leaks resulting from breaks in the pipeline. This type of leak is usually detected through analysis of MNF, which requires a minimum of two or three values from consecutive days. In this context, leak detection in less than 24 h can be considered a good result compared to the MNF analysis. A majority of tested detectors achieved this result, with only three higher detection score values. In scenarios A-1 and A-3, the detection time of almost all detectors was considerably lower. Scenario A-2 proved to be challenging, which is indicated by its high mean detection score compared to other scenarios. The relatively late response of most detectors in this scenario may result from a large variance in the values of loss and pressure in the training set. As a result, some detectors reacted only to the strong changes in the trend visible in Figure 8(b) after about 4 and 7 days.
2. Scenarios A-4 and A-5 are examples of rapid, huge losses which cannot be detected by MNF analysis. In the case of A-4, a majority of detectors reacted to the sudden drop in loss values 2 h before the actual leak; the early activation of the COPOD detector can be considered a false positive. Scenario A-5 was clearly a challenge for half of the detectors, as indicated by its second-worst mean detection score. However, looking at Figure 9(b), it seems that the detectors that activated early were triggered by rising loss values. Since rising losses are also an indication of leaks in scenarios A-1, A-2 and A-3, it can be expected that algorithms trained to detect both kinds of leaks may be sensitive to such anomalies.
3. Scenarios A-6 and B-1 are an example of the same event observed in two connected DMAs. The event was a result of pipeline maintenance works but exhibited clearly anomalous characteristics, with sharp changes in both loss and pressure values. Interestingly, while in the B-1 scenario all detectors captured the event within 1 h, in A-6 a majority of detectors reacted to a sharp spike in loss values 25 h earlier. Only the COPOD detector reacted almost identically in both scenarios, which may indicate that it was triggered by changes in pressure values rather than loss (notice that both DMAs share a pressure sensor at the point of their connection). The difference in performance may result from the fact that the B-1 DMA contains more water meters and has significantly higher raw inflow; therefore, the DMA consumption and loss series have lower variance.
4. Scenario C-1 is an example of a confirmed break in the pipeline resulting in a sharp spike in loss values and a drop in minimum and average pressures. A majority of detectors activated within 2 h, which can be considered an acceptable result. Interestingly, the ECOD and COPOD detectors, which share similarities in their design, acted differently: one activated early while the other activated late.
When considering the detection scores presented in Table 1, the PCA, $k$-NN, AE, COPOD and LOF detectors were on average able to capture anomalies within 24 h. On the other hand, the ECOD and IFOREST detectors performed relatively poorly with regard to their mean score.
Two detectors, ECOD and IFOREST, were not able to detect anomalies in multiple scenarios, which may indicate their low sensitivity.
Activation times of the COPOD detector are visibly different from those of the remaining algorithms, while its mean detection score is fourth among the tested methods. When considering averaging the scores of multiple detectors in some form of ensemble learning, this diversity makes COPOD a valid candidate for such an ensemble.
Figure 12 presents the detection probability of three example detectors in the A-3 scenario. In this example, the probability function estimated by the COPOD detector seems to be primarily correlated with average pressure values, while the PCA and LOF functions follow the changes in both the pressure and loss functions of the DMA. A comparison of the detection probabilities of all detectors and scenarios reveals that this is a common pattern: the responses of the COPOD and ECOD detectors share similarities and are less correlated with loss values than the responses of the remaining detectors.
Detection results for scenario A-3 with estimated probability and an annotated moment of activation of three example detectors. The probability of the PCA and LOF detectors is correlated with the loss function; therefore, at some point it becomes saturated. The COPOD detector probability seems to depend more on pressure values. (a) COPOD detector, 0 h difference; (b) PCA detector, 1 h difference; (c) LOF detector, 94 h difference.
Detection results for scenario N-1 testing the detectors for false positives (FP). Vertical lines denote every case of detection.
Discussion
Results indicate that on average, a large subset of anomaly detectors captures both the gradually growing and the sudden DMA leaks within the first 24 h. Compared to the MNF analysis which requires 2 or 3 days, the overall response time of anomaly detectors should be considered short.
Despite these promising results, the presented scenarios show the complexity of the leak detection problem in hourly data. Hourly consumption and pressure data show significant variance. The nature of the anomalies themselves is also varied, which makes it difficult to describe and classify them. It is even more difficult as the cases of actual, confirmed leakages, which may constitute training data for ML methods, are relatively rare. In addition, anomalies resulting from both leakages and other events may occur directly after each other or coexist, as in scenario A-5.
Considering the complexity of DMA hourly data and the lack of training examples, one of the major problems of using anomaly detectors for detecting DMA leaks is their parameterisation, i.e. finding parameters that will allow for accurate detection while keeping the number of false alarms low. Since our approach to parameter selection involves averaging detection accuracy scores over several example scenarios (see Section 2.1.4), we can treat the aggregated score of the best parameter set as a measure of expected detector performance. The results of parameter selection are presented in Table 2. Comparing these scores with the final results in Table 1, it can be concluded that they are a good estimation of detector performance, especially with regard to the best (PCA) and the worst (ECOD, IFOREST) detectors. They are, however, not a good estimation of per-scenario performance, which is to be expected, since the score for a scenario is estimated using the remaining scenarios. The results for scenario N-1 indicate that the FPR of the detectors with the proposed parameterisation scheme is relatively low for the majority of algorithms tested. In practice, the detector parameters, and in particular the detection threshold, are manually adjusted for most DMAs, which helps to keep the FPR low.
Results of parameter selection described in Section 2.1.4
(The table's numeric entries – one averaged score per detector/scenario pair for the detectors COPOD, ECOD, k-NN, LOF, PCA, OCSVM, AE and IFOREST across scenarios A-1 to C-1, with row and column averages – were rendered as images in the source.)
The table presents the averaged scores of the detectors over training scenarios (using the leave-one-scenario-out approach) for the best set of parameters, which were used in the final experiment. Values in the table can be treated as a measure of the expected performance of a detector in a given scenario. Note that the value for a given scenario/detector pair is estimated, i.e. it is computed without access to that scenario's data.
Possible approaches to better detector parameterisation include: extending the set of training scenarios with examples where no anomalies exist (such examples are more common than ones with anomalies present, and their inclusion may lower the number of false positives); allowing for periodic detector retraining; and employing an ensemble of multiple detectors which vote for the final score, as in the sketch below.
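One simple realisation of the ensemble idea, given as a hedged sketch below, min–max normalises each detector's DS using its training-window statistics and averages the normalised scores. The function name and the normalisation choice are illustrative assumptions, not a method evaluated in this paper.

```python
# Sketch of an ensemble vote: normalise each detector's DS to [0, 1] using
# its own training-window range, then average across detectors.
import numpy as np

def ensemble_score(ds_train_list, ds_monitor_list):
    """Each list holds one array per detector: DS over training / monitoring."""
    combined = []
    for ds_tr, ds_mon in zip(ds_train_list, ds_monitor_list):
        lo, hi = ds_tr.min(), ds_tr.max()
        combined.append((ds_mon - lo) / (hi - lo + 1e-12))  # min-max normalise
    return np.mean(combined, axis=0)  # averaged 'vote' across detectors
```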
Regarding the problem of feature selection, the set of features in our experiments was chosen as a result of initial experiments. Example alternative candidate sets included an extended set with additional features characterising missing values in the hourly consumptions of individual DMA sensors. The incompleteness of data results from physical constraints related to the acquisition process, e.g. the loss of packets transmitted over the radio, which results in underestimated DMA consumption values that must be corrected with data imputation. However, compared to the feature set used in the experiments, the extended set was on average worse with regard to both the mean scenario scores in Table 1 and the mean detector scores. Another example was a reduced set including only the DMA loss and raw inflow values as well as the minimum and average DMA input pressures; this set was also worse than the chosen one with regard to both mean scenario and mean detector scores.
CONCLUSIONS
The goal of our experiments was to test the performance of anomaly detectors applied to detecting leaks in hourly DMA loss and pressure data. We focused on two types of leaks: gradually growing breaks resulting in a rise in DMA losses over an extended period, and sudden leaks resulting in sharp changes in loss and/or pressure values. We used eight unique datasets with examples of anomalies and leaks, collected by analysing the annual data of real DMAs in Poland, and eight representative state-of-the-art (SOA) anomaly detectors.
Our results suggest that, on average, anomaly detectors can detect both types of leaks in less than 24 h, and sometimes within 1–2 h of the incident. This is a promising result when compared with MNF analysis, which usually requires data from 2 or 3 days. On the other hand, the parameterisation of detectors is challenging due to the variance in hourly DMA data and the small number of example incidents which can be used as training data.
The main topics of future work will be: improving the parameterisation of detectors, examining their performance in a scenario where incidents occur one after another, and the classification of detected leaks.
ACKNOWLEDGEMENTS
This work has been partially supported by the Polish National Centre for Research and Development grant POIR.01.01.01-00-1414/20-00, ‘Intelligence Augumentation Ecosystem for analysts of water distribution networks’.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.