This work proposes a reliable leakage detection methodology for water distribution networks (WDNs) using efficient machine-learning strategies. We analyze pressure measurements from pumps in district metered areas (DMAs) in Stockholm, Sweden, considering a residential DMA of the water distribution network. Our proposed methodology combines unsupervised learning (K-means and cluster validation techniques) with supervised learning (learning vector quantization algorithms). The proposed learning strategies have low complexity, and the numerical experiments show the potential of machine-learning strategies for leakage detection in monitored WDNs. Specifically, our experiments show that the proposed learning strategies obtain correct classification rates of up to 93.98%.

  • Leakage detection in water distribution networks using efficient machine-learning strategies.

  • We analyze pressure measurements from pumps in district-metered areas in Stockholm, Sweden, where we consider a monitored subarea of the water distribution network.

  • Our proposal can be applied to leakage detection scenarios where we have access to water pressure measurements at different points of the WDN.

The usage of pipelines and pipe networks for transporting water and other fluids has evolved continuously over the past century, and these technological enhancements have made this mode of transport more reliable (Lawal 2001). Despite the several advantages of pressurized pipelines, see Sharma & Maheshwari (2017), pipeline and pipe networks need to operate in a secure and sustainable manner, which is challenged by frequent leaks and bursts. Early detection is one of the most suitable strategies to minimize the loss of resources.

Leakage detection solutions for water distribution networks (WDNs) have been the subject of research for more than two decades (Gupta & Kulat 2018; Zaman et al. 2020). Since then, as detailed by Li et al. (2015), several techniques have been developed, including hardware (acoustic and non-acoustic solutions) and software (numerical and non-numerical modelling solutions) methods. Moreover, the work by Chan et al. (2018) has reviewed current intelligent technologies focusing on non-numerical modelling solutions, such as the machine-learning strategies of support vector machines (SVMs), neural networks, and convolutional neural networks.

In the context of hardware methods, the recent survey by Moubayed et al. (2021) summarized the state-of-the-art strategies for water leakage detection, including ground radar and acoustic solutions such as reflectometry and piezoelectric sensors. Furthermore, the works by Lai et al. (2016) and Senin et al. (2019) have extensively studied the ground-penetrating radar method. Moreover, the works by Papadopoulou et al. (2008) and Moubarak et al. (2011) have comprehensively studied the reflectometry and piezoelectric sensor solutions, respectively.

Within the machine-learning strategies context, the recent studies by Soldevila et al. (2017) and Xing & Sela (2019) have confirmed the efficiency of leakage detection modelling based on pressure analysis and machine-learning techniques. Furthermore, Vrachimis et al. (2022) have summarized several results obtained in the emerging framework of the Battle of the Leakage Detection and Isolation Methods (BattLeDIM). In addition, Marzola et al. (2022) have proposed a solution for the BattLeDIM problem based on the analysis of observed data and hydraulics simulations.

As can be verified in Belka et al. (2018) and Sousa et al. (2019), there have been numerous successful prototype-based solutions for diverse anomaly detection applications, such as condition monitoring of electrical motors. On the other hand, the performance of these algorithms strongly depends on the pre-specified number of prototypes (Biehl et al. 2016).

Moreover, Villmann et al. (2017) have summarized the state-of-the-art prototype-based models and have discussed that solutions formulated using prototypes can provide more understandable results than the ones formulated using SVMs and deep learning schemes. Therefore, instead of using robust non-linear solutions, we assess the state-of-the-art prototype-based models to investigate machine-learning strategies for leakage detection in WDNs.

Motivated by those mentioned successful cases, our goal is to propose a reliable leakage detection solution that demands low complexity to analyze pressure measurements acquired from WDNs in municipal areas. Specifically, our approach is a non-numerical modelling solution for detecting leakage in WDNs through the analysis of observed water pressure data using low-complexity learning strategies.

To achieve our goal, we design representative sets with a reduced number of prototypes for generating a compact and realistic dataset for fault detection/classification of a monitored water distribution network. Specifically, we first cluster the observed water pressure data into understandable subgroups; in the following, we train prototypes to represent the generated subgroups; finally, we use the trained prototypes to process operational condition predictions for newly observed water pressure data.

Within the context of the prototype-based models, we propose low-complexity strategies based on both unsupervised and supervised learning. For the unsupervised method, we use the conventional K-means and cluster validation techniques. For the supervised method, we use learning vector quantization (LVQ) classifiers. Specifically, we determine the number of prototypes through a clustering and cluster validation procedure per class label, which can determine an adequate number of prototypes to obtain representative subsets of the input data. Then, we fine-tune the prototypes of these generated subgroups using LVQ classifiers.

Moreover, since our solution does not require hydraulic modelling, we are agnostic to it and only water pressure measurements are of our interest. Therefore, as a software-based solution, our proposal can be applied to leakage detection scenarios where we have access to water pressure measurements at different points of the WDN. To this end, we analyze water pressure measurements from pumps in district-metered areas (DMAs) in Stockholm, Sweden. To evaluate our solution, we consider a monitored subarea of the WDN. Our numerical experiments show that the proposed learning strategies are able to obtain correct classification rates of up to 93.98%.

Dataset description

In this study, we consider a set of real water pressure measurements from the water and wastewater company of Stockholm, Sweden (SVOA, Stockholm Vatten och Avfall). In this dataset, we analyze the observed water pressure data in four selected pumping stations collected from January 2018 to March 2019. These stations are located in a DMA of the WDN. The DMA corresponds to a residential area that has a total population of 70,250 people. Moreover, there are no tanks or reservoirs in the monitored area. Figure 1 shows the approximate positions of the pumping stations in the DMA. Due to a privacy agreement with SVOA, we do not identify these stations or reveal the DMA network. Hence, we generically label these pumps as A, H, K, and S.
Figure 1

Approximate positions of the pumping stations in the DMA.

The dataset represents the pumping stations operating in normal and faulty (presence of leakage) working conditions; these conditions are distinguished through a maintenance report provided by the water company. Since leakage detection is an anomaly detection problem, the majority of observations correspond to normal conditions, and this imbalance is shown in Table 1.

Table 1

Number of observed days per working condition

Condition      2018    2019    Total
Normal          324      70      394
Leakage          34      20       54
Total           358      90      448

In the dataset, the hydraulic data are stored for entire days of acquisition with a 1-min sampling frequency. For the raw database, we denote by ‘sample’ a pressure signal stored as a vector of 1,440 components for each station and for each day. During the aforementioned period, there are 7 days with excessive missing values and, as a consequence, we remove these samples from the analysis. Therefore, there are 448 available observed days to build the prediction model, and the total number of samples is denoted by $N = 448$.

Let $\mathbf{p}_{j,n}$ denote the vector of 1,440 pressure measurements from pump $j \in \{A, H, K, S\}$ during the nth day. Then, the sample during the nth day is denoted by the matrix $\mathbf{X}_n = [\mathbf{p}_{A,n}\;\; \mathbf{p}_{H,n}\;\; \mathbf{p}_{K,n}\;\; \mathbf{p}_{S,n}]$, which has 1,440 rows and four columns.

Figure 2 shows the stored pressure time series at each pump station separated by label. Note that the normal and faulty (presence of leakage) operating conditions are correlated at the selected stations. Therefore, for effective data analysis, features must be extracted from the raw data by mapping the observations to a set of feature vectors.
Figure 2

Stored daily time series for each pump and working conditions (in metres of water column).

Feature extraction

To generate suitable feature vectors that represent the proposed engineering application, we apply a canonical discriminant function on the original time series vectors to obtain linear combinations of the projected time series vectors (known as canonical variables). Further explanation of canonical analysis is given in Rencher (1992).

The procedure for building the feature vectors comprises the following steps: (i) define the acquisition period and the sampling rate; (ii) read the pressure signals from a selected station; (iii) separate the samples according to their corresponding labels, such as normal and leakage; (iv) calculate the within-group W and between-group B scatter matrices (see definition in the Section ‘Cluster validation techniques’); (v) obtain the eigenvector $\mathbf{a}$, which is the eigenvector associated with the largest eigenvalue of the matrix $\mathbf{W}^{-1}\mathbf{B}$; (vi) obtain the projected data $\mathbf{y} = \mathbf{X}\mathbf{a}$, which is a projection of the original data, by applying the inner product between $\mathbf{a}$ and the raw data matrix $\mathbf{X}$; (vii) repeat the steps (ii) to (vi) for the remaining pump stations; (viii) finally, concatenate every projected data vector as:
$$\mathbf{D} = \left[\mathbf{y}_{A}\;\; \mathbf{y}_{H}\;\; \mathbf{y}_{K}\;\; \mathbf{y}_{S}\right]$$
(1)

In summary, the treated dataset D is comprised of 448 four-dimensional-labelled feature vectors, in which the attribute values represent the canonical values obtained from the most representative canonical function.
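The per-pump projection of steps (iv)–(vi) can be sketched as below; a minimal numpy sketch, where the function name `canonical_projection` and the toy data are illustrative, not the paper's exact implementation:

```python
import numpy as np

def canonical_projection(X, labels):
    """Project samples onto the leading canonical discriminant direction.

    X: (n_samples, n_features) raw signals for one pump.
    labels: (n_samples,) class per sample (e.g. 0 = normal, 1 = leakage).
    Returns the 1-D canonical variable for every sample.
    """
    mean_all = X.mean(axis=0)
    W = np.zeros((X.shape[1], X.shape[1]))  # within-group scatter
    B = np.zeros_like(W)                    # between-group scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        W += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        B += len(Xc) * (diff @ diff.T)
    # Leading eigenvector of W^{-1} B (largest eigenvalue).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(W) @ B)
    a = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return X @ a  # canonical variable per sample

# Toy check: two well-separated classes in 5-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(4, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
z = canonical_projection(X, y)
```

On well-separated classes, the projected class means end up far apart relative to the within-class spread, which is exactly the separation the canonical variable is meant to expose.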

In addition to the effective sample representation, we also investigate the existing data imbalance through sampling tuning. For this particularly challenging task, we want to measure the impact of reducing the dominance of the normal-condition samples. We hypothesize that, by adjusting the trade-off between sample quality representation (pressure signal sampling) and label equilibrium, we can further improve the recognition rates of supervised classifiers trained with the selected dataset.

For this task, we modified the sampling rate and increased the number of leakage cases by the respective gain factors {3×, 5×, 15×} (e.g. a 3-min sampling frequency for the 3× gain factor). These variants of the dataset are described in Table 2. Note that the first column (variable p) shows the decrease in the signal quality representation, whereas the last column shows the equilibrium rate between the number of normal samples, N_N, and leakage samples, N_L.

Table 2

Database setups used in this study

Dataset           p       N_S     N_N     N_L     N_L/N_N (%)
Original          1,440   448     394     54      13.71
3 × Leakage       480     556     394     162     41.12
5 × Leakage       288     664     394     270     68.53
3 × N + 15 × L    96      1,992   1,182   810     68.53
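The construction of a variant such as 3 × Leakage can be sketched as follows; `make_variant` is a hypothetical helper (assuming numpy), where the subsampling step and replication factor mirror Table 2:

```python
import numpy as np

def make_variant(samples, labels, step, leak_gain):
    """Build a dataset variant: subsample each daily signal by `step`
    and replicate leakage samples `leak_gain` times."""
    coarse = [s[::step] for s in samples]       # e.g. 1-min -> 3-min sampling
    out_x, out_y = [], []
    for s, y in zip(coarse, labels):
        reps = leak_gain if y == "L" else 1     # oversample the minority class
        out_x.extend([s] * reps)
        out_y.extend([y] * reps)
    return np.array(out_x), np.array(out_y)

# Toy numbers mirroring Table 2: 394 normal + 54 leakage daily signals.
signals = [np.zeros(1440)] * 448
labels = ["N"] * 394 + ["L"] * 54
X3, y3 = make_variant(signals, labels, step=3, leak_gain=3)
```

With step 3 and gain 3 this reproduces the 3 × Leakage row of Table 2: 480-component signals, 556 samples total, of which 162 are leakage.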

Prototype-based models

Prototype-based models are recognized in machine learning for their potential to explicitly represent observations (Biehl et al. 2016). Prototypes are reference vectors used to represent subsets of the input data in terms of dissimilarity (distance) measures. As a consequence, it is possible to directly compare input data using prototypes. The prototypes compete to represent data regions, and their positions are updated during training, which can be unsupervised (e.g. clustering methods) or supervised (e.g. LVQ classifiers). For this reason, in the Section ‘Cluster validation techniques’ we present the relevant literature on cluster validation techniques, and in the Section ‘LVQ classifier techniques’ we introduce pertinent improvements on LVQ classifier techniques.

Cluster validation techniques

Techniques for cluster validation are used a posteriori to evaluate the results of a given clustering algorithm. However, each cluster validation index has its own set of assumptions to quantify the groups’ cohesion and separation. Hence, the final results (e.g. the most adequate number of groups to generate representative subsets) may vary across the chosen techniques. In the following, we give some necessary definitions for the clustering techniques.

We denote $K$ as the number of clusters, $K^{*}$ as the most suitable number of clusters according to a given cluster validation technique, $\bar{\mathbf{x}}$ as the centroid of the input data matrix $\mathbf{X}$, $N_k$ as the number of objects in cluster $C_k$, $\mathbf{c}_k$ as the centroid of cluster $C_k$, and $\mathbf{x}_i^{(k)}$ as the $i$th feature vector, $i = 1, \ldots, N_k$, belonging to the cluster $C_k$. We also denote by $N$ the total number of feature vectors.

  • (i)
    The Davies–Bouldin (DB) index (Davies & Bouldin 1979): it is a function of the ratio of the within-cluster scatter to the between-cluster separation, using the clusters’ centroids. Initially, we need to compute the scatter within the $k$th cluster and the separation between the $k$th and $l$th clusters, $s_k$ and $d_{kl}$, respectively, which are defined as:
    $$s_k = \frac{1}{N_k}\sum_{i=1}^{N_k}\left\lVert \mathbf{x}_i^{(k)} - \mathbf{c}_k \right\rVert, \qquad d_{kl} = \left\lVert \mathbf{c}_k - \mathbf{c}_l \right\rVert,$$
    (2)
    where ∥·∥ is the Euclidean norm. Finally, the DB index is defined as:
    $$\mathrm{DB}(K) = \frac{1}{K}\sum_{k=1}^{K}\max_{l \neq k}\frac{s_k + s_l}{d_{kl}}.$$
    (3)

The value of K leading to the smallest $\mathrm{DB}(K)$ is chosen as the $K^{*}$, i.e., $K^{*} = \arg\min_{K} \mathrm{DB}(K)$.

  • (ii) The Dunn index (Dunn 1973): it is a function defined as:
    $$\mathrm{Dunn}(K) = \frac{\min_{k \neq l}\, d(C_k, C_l)}{\max_{m}\, \mathrm{diam}(C_m)},$$
    (4)
    where $d(C_k, C_l) = \min_{\mathbf{x} \in C_k,\, \mathbf{z} \in C_l} d(\mathbf{x}, \mathbf{z})$ and $\mathrm{diam}(C_m) = \max_{\mathbf{x}, \mathbf{z} \in C_m} d(\mathbf{x}, \mathbf{z})$, with d(·,·) denoting a dissimilarity function between vectors. Note that while $d(C_k, C_l)$ is a measure of the separation between clusters $C_k$ and $C_l$, $\mathrm{diam}(C_m)$ is a measure of the dispersion of the data within the cluster $C_m$. The value of K resulting in the largest $\mathrm{Dunn}(K)$ is chosen as the $K^{*}$, i.e., $K^{*} = \arg\max_{K} \mathrm{Dunn}(K)$.
  • (iii) The Calinski–Harabasz (CH) index (Calinski & Harabasz 1974): it is a function defined as:
    $$\mathrm{CH}(K) = \frac{\operatorname{trace}(\mathbf{B}_K)/(K-1)}{\operatorname{trace}(\mathbf{W}_K)/(N-K)},$$
    (5)
    where $\mathbf{B}_K$ is the between-group scatter matrix for the data partitioned into K clusters, and $\mathbf{W}_K$ is the within-group scatter matrix for the data clustered into K clusters. The trace(·) denotes the trace operator. The value of K resulting in the largest CH(K) is chosen as the $K^{*}$, i.e., $K^{*} = \arg\max_{K} \mathrm{CH}(K)$.
  • (iv) The Silhouettes (Sil) index (Rousseeuw 1987): it is a function defined as:
    $$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},$$
    (6)
    $$\mathrm{Sil}(K) = \frac{1}{N}\sum_{i=1}^{N} s(i),$$
    (7)
    where a(i) represents the average dissimilarity of the ith feature vector to all other vectors within the same cluster, and b(i) denotes the lowest average dissimilarity of the ith feature vector to any other cluster of which it is not a member. The silhouettes can be calculated with any dissimilarity metric, such as the Euclidean or Manhattan distances. The value of K producing the largest Sil(K) is chosen as the $K^{*}$, i.e., $K^{*} = \arg\max_{K} \mathrm{Sil}(K)$.
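The four indices above can be sketched in numpy as follows; the function names are illustrative, each one operating on a data matrix `X` and a label vector `lab`, with Euclidean distance as the dissimilarity:

```python
import numpy as np

def davies_bouldin(X, lab):
    """DB index, Equations (2)-(3): lower is better."""
    ks = np.unique(lab)
    cents = np.array([X[lab == k].mean(axis=0) for k in ks])
    s = np.array([np.linalg.norm(X[lab == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    return np.mean([max((s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                        for j in range(len(ks)) if j != i)
                    for i in range(len(ks))])

def dunn(X, lab):
    """Dunn index, Equation (4): higher is better."""
    groups = [X[lab == k] for k in np.unique(lab)]
    sep = min(np.linalg.norm(a - b)                     # closest cross-cluster pair
              for i, A in enumerate(groups) for B in groups[i + 1:]
              for a in A for b in B)
    diam = max(np.linalg.norm(a - b)                    # widest within-cluster pair
               for A in groups for a in A for b in A)
    return sep / diam

def calinski_harabasz(X, lab):
    """CH index, Equation (5): higher is better."""
    N, ks = len(X), np.unique(lab)
    mean_all = X.mean(axis=0)
    tB = sum(len(X[lab == k]) * np.sum((X[lab == k].mean(axis=0) - mean_all) ** 2)
             for k in ks)
    tW = sum(np.sum((X[lab == k] - X[lab == k].mean(axis=0)) ** 2) for k in ks)
    return (tB / (len(ks) - 1)) / (tW / (N - len(ks)))

def silhouette_mean(X, lab):
    """Mean silhouette, Equations (6)-(7): higher is better."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = []
    for i in range(len(X)):
        own = (lab == lab[i])
        own[i] = False
        a = D[i, own].mean()                            # cohesion within own cluster
        b = min(D[i, lab == k].mean()                   # nearest other cluster
                for k in np.unique(lab) if k != lab[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

# Two tight, well-separated clusters: DB is small, the other three are large.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
lab = np.array([0] * 20 + [1] * 20)
```

On this toy partition the indices agree, which is the behaviour the majority-voting scheme relies on; they may disagree on less separated data, hence the vote.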

LVQ classifier techniques

LVQ classifiers are a family of algorithms for statistical pattern classification introduced in the late 1980s (Kohonen 1988), which led to the proposal of several variants. The main advantages of LVQ methods are their flexibility and intuitiveness because they are established on the notion that samples belonging to distinct labels are separated among data regions.

For the LVQ classifiers presented in this section, we make the following definitions. Let us consider a set of training input–output samples $\{(\mathbf{x}_t, y_t)\}_{t=1}^{N}$, where $\mathbf{x}_t$ denotes the tth input sample and $y_t$ denotes its corresponding class label. Note that $y_t$ is a categorical variable, which assumes only one out of L values in the finite set $\{1, 2, \ldots, L\}$.

For the family of LVQ classifiers, we have $K > L$, i.e., the number of prototypes ($K$) is higher than the number of classes ($L$). As a consequence, different prototypes may share the same label. Given a set of labelled prototype vectors $\{(\mathbf{w}_k, y_k)\}_{k=1}^{K}$, the class assignment for a new input sample $\mathbf{x}$ is based on the decision criterion that the class of $\mathbf{x}$ must be the same as the class of $\mathbf{w}_c$, where $c = \arg\min_{k} d(\mathbf{x}, \mathbf{w}_k)$, in which d(·,·) denotes a dissimilarity measure specific to the extension of LVQ and c is the index of the nearest prototype among the K available ones.

Some relevant LVQ algorithms are discussed below in chronological order. We first outline the original algorithm, LVQ1 (Kohonen 1988), which does not have a cost function to ensure convergence to the optimal solution. The following two algorithms, LVQ2.1 and LVQ3 (Kohonen 1988), present improvements to obtain higher convergence speed. Then, the generalized LVQ (GLVQ) (Sato & Yamada 1995) is the first one to propose a cost function, whereas the relevance LVQ (RLVQ) (Bojer et al. 2001) is the pioneer to use a distance learning approach, which also learns the relevance of each feature. Finally, the generalized relevance LVQ (GRLVQ) (Hammer & Villmann 2002) and the locally generalized relevance LVQ (LGRLVQ) include improvements by manipulating distance learning with the GLVQ cost function (Hammer et al. 2005).

In a nutshell, the LVQ variants LVQ1, LVQ2.1, LVQ3, and RLVQ are heuristic solutions, whereas the variants GLVQ, GRLVQ, and LGRLVQ present cost functions that guarantee convergence. Further explanation of these LVQ variants is given in Nova & Estévez (2014).
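As a minimal illustration of the nearest-prototype decision rule and the original LVQ1 update, the following sketch (assuming numpy; the toy data and one-prototype-per-class setup are illustrative, not the paper's configuration) pulls the winning prototype toward same-class samples and pushes it away otherwise:

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, lr=0.05, epochs=30):
    """LVQ1 (Kohonen 1988): attract the winning prototype on a correct
    match, repel it on a mismatch."""
    W = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x, label in zip(X, y):
            c = np.argmin(np.linalg.norm(W - x, axis=1))   # nearest prototype wins
            sign = 1.0 if proto_labels[c] == label else -1.0
            W[c] += sign * lr * (x - W[c])
    return W

def lvq_predict(X, W, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return proto_labels[np.argmin(dists, axis=1)]

# Toy problem: two Gaussian classes, one prototype per class, roughly
# initialized (in the paper, K-means provides this initialization).
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
protos = np.array([[0.5, 0.5], [2.5, 2.5]])
proto_labels = np.array([0, 1])
W = lvq1_train(X, y, protos, proto_labels)
preds = lvq_predict(X, W, proto_labels)
```

The cost-function variants (GLVQ, GRLVQ, LGRLVQ) replace this heuristic update with gradient steps on an explicit objective, which is what guarantees their convergence.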

Summary

Table 3 summarizes the techniques used to compose the methodology, along with their characteristics and purposes.

Table 3

Summary of the exploited techniques

Technique                         Characteristic          Application
Canonical discriminant function   Feature extraction      Reduce redundant information
K-means                           Unsupervised learning   Clustering
DB index                          Unsupervised learning   Cluster validation technique
Dunn index                        Unsupervised learning   Cluster validation technique
LVQ1                              Supervised learning     Classification
LVQ2.1                            Supervised learning     Classification
LVQ3                              Supervised learning     Classification
RLVQ                              Supervised learning     Classification
GLVQ                              Supervised learning     Classification
GRLVQ                             Supervised learning     Classification
LGRLVQ                            Supervised learning     Classification

In this section, we evaluate the proposed methodology to find the optimal number of prototypes and their positions for the two types of classes existing in the available dataset, whose labels are represented as N (normal) and L (leakage).

We begin our analysis by validating the canonical discriminant function, which is the technique we use to extract relevant information from the raw pressure time series. Considering the scenario where all samples are available to be extracted, we obtain the class separation as shown in Figure 3. In this figure, the normal and leakage samples are represented by blue and orange dots. As observed in the scatter plots for each attribute (the pumps), the proposed strategy is able to generate desirable separations between the contrasting states.
Figure 3

Label separation obtained if we use all the samples of the original dataset to calculate the matrix $\mathbf{W}^{-1}\mathbf{B}$. Note that here the blue and orange colours represent the normal and leakage samples. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2023.054.

For each classifier, 100 independent runs of training and testing are carried out. For each run, the four steps of the proposed methodology are executed: (i) division of the dataset into training (80%) and validation (20%) sets; (ii) canonical discriminant analysis of the training set and projection of the validation set (see description in the Section ‘Feature extraction’); (iii) determination of $K^{*}$ and the prototypes’ positions via the application of clustering and cluster validity techniques per data class; and (iv) LVQ training and testing. At the end of each run, the accuracy rate of each classifier is determined.

Specifically, in the third step we run the K-means algorithm 10 independent times and keep the execution that produces the lowest mean squared quantization error. We repeat this procedure for the number of prototypes ranging from 2 to 10 to obtain the $K^{*}$ per class suggested by each cluster validation technique defined in the Section ‘Cluster validation techniques’. Finally, we define the $K^{*}$ per class using majority voting among the suggested values.
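The restart-and-select procedure above can be sketched as follows; a simplified sketch assuming numpy, where farthest-point seeding stands in for the unspecified K-means initialization and a single inline Davies–Bouldin-style index replaces the full majority vote across indices:

```python
import numpy as np

def kmeans(X, K, rng, iters=50):
    """Minimal K-means; returns labels and mean squared quantization error."""
    C = [X[rng.integers(len(X))]]
    for _ in range(K - 1):                              # greedy farthest-point seeds
        d = np.min(np.linalg.norm(X[:, None] - np.array(C)[None], axis=2), axis=1)
        C.append(X[np.argmax(d)])
    C = np.array(C)
    for _ in range(iters):                              # Lloyd iterations
        lab = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=2), axis=1)
        C = np.array([X[lab == k].mean(axis=0) if np.any(lab == k) else C[k]
                      for k in range(K)])
    err = np.mean(np.min(np.linalg.norm(X[:, None] - C[None], axis=2) ** 2, axis=1))
    return lab, err

def db_score(X, lab):
    """Inline Davies-Bouldin-style score (smaller is better)."""
    ks = np.unique(lab)
    cents = np.array([X[lab == k].mean(axis=0) for k in ks])
    s = np.array([np.linalg.norm(X[lab == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    return np.mean([max((s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                        for j in range(len(ks)) if j != i)
                    for i in range(len(ks))])

def suggest_K(X, rng, K_range=range(2, 11), restarts=10):
    """Per K, keep the best of 10 restarts (lowest quantization error),
    then pick the K minimizing the validation index."""
    scores = {}
    for K in K_range:
        lab, _ = min((kmeans(X, K, rng) for _ in range(restarts)),
                     key=lambda r: r[1])
        scores[K] = db_score(X, lab)
    return min(scores, key=scores.get)

# Three well-separated blobs: the procedure should suggest K* = 3.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(m, 0.3, (30, 2)) for m in [(0, 0), (6, 0), (0, 6)]])
K_star = suggest_K(X, rng)
```

In the paper's full procedure, this scoring step is repeated for each of the four indices and the per-class $K^{*}$ is taken by majority vote over their suggestions.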

The histograms of the suggested $K^{*}$ per class of each cluster validation technique along the 100 independent runs are shown in Figure 4. From this figure, we can verify that the Dunn index is the option with the largest dispersion (and, consequently, the least reliable), while the Silhouettes and CH indices consistently provide similar suggestions for the $K^{*}$ per class.
Figure 4

Histograms of $K^{*}$ per label according to each cluster validity technique. (a) Original: N class; (b) 3 × Leakage: N class; (c) 5 × Leakage: N class; (d) 3 × N + 15 × L: N class; (e) Original: L class; (f) 3 × Leakage: L class; (g) 5 × Leakage: L class; and (h) 3 × N + 15 × L: L class.

The frequency distribution of the suggested $K^{*}$ per class resulting from the majority voting scheme along the 100 independent runs is shown in Table 4. From this table, it can be seen that a single setup is selected in all 100 runs for most of the datasets. For the 3 × Leakage dataset, the dominant setup is selected in 96 runs, with two alternative setups selected in one and three runs, respectively. It is worth emphasizing that, with so few prototypes, the following classification procedures emulate severely resource-limited scenarios and, consequently, present low computational cost.

Table 4

Distribution of the suggested optimal number of prototypes per class

Dataset           Counts per suggested setup [N class, L class]
Original          [100,100]   [0,0]   [0,0]   [0,0]   [0,0]
3 × Leakage       [96,100]    [0,0]   [1,0]   [0,0]   [3,0]
5 × Leakage       [100,100]   [0,0]   [0,0]   [0,0]   [0,0]
3 × N + 15 × L    [100,100]   [0,0]   [0,0]   [0,0]   [0,0]

The statistical performance of each LVQ-based classifier is shown in Figure 5. A closer look at these metrics reveals a small increase in the maximum classification rates for the most complex LVQ variants, where we obtain up to 93.98% when using the LGRLVQ. For the minimum classification rates, we observe that the LVQ2.1 has the worst performance. In addition, we verify that the LVQ1 has the lowest dispersion in terms of the standard deviation of the rates.
Figure 5

Classification accuracies obtained by the LVQ algorithms: (a) maximum rates, (b) minimum rates, (c) mean rates, and (d) standard deviation rates.

To further test our model, we also compare the LVQ variants by investigating the results obtained from the F1 score (see Figure 6(a)). We observe that, for scenarios with a small number of prototypes, a substantial increase in the maximum classification results is achieved when using more complex LVQ schemes. However, such complex LVQ schemes lead to an undesired decrease in the minimum classification rates. Among the most complex LVQ variants, the GRLVQ has the least depreciation of the minimum classification rates and achieves classification rates of up to 91.73% on the original dataset. Therefore, we consider it the preferable LVQ-based classifier.
Figure 6

Boxplots of the different LVQ-based classifier results: (a) F1 score (%) rates obtained by the evaluated LVQ classifiers and (b) GRLVQ relevance vector weights.

A major characteristic of relevance-oriented modifications of LVQ models (such as RLVQ, GRLVQ and LGRLVQ) is that we can inspect the attributes’ relevance weights to obtain a direct notion of which pumps have more influence on the classifiers’ performance. Accordingly, the last aspect we highlight relates to the relevance vector weights obtained after the GRLVQ training (see Figure 6(b)). These empirical observations reveal that the first attribute (Pump A) is the most important one, and that the pumps’ relevances tend toward equilibrium as the data imbalance is reduced.

In this work, we proposed a non-numerical modelling method for water leakage detection in WDNs through the analysis of observed pressure data by means of machine-learning strategies. To evaluate our solution, we considered water pressure measurements from pumps in a residential DMA of the WDN of Stockholm, Sweden.

We proposed low-complexity machine-learning strategies for leakage detection. Specifically, our strategies used techniques from both unsupervised and supervised learning methods. For the numerical experiments of our proposed solution, we used a real dataset from a DMA in Stockholm, Sweden.

The numerical experiments showed the potential benefits of using machine-learning strategies in the leakage detection of monitored WDNs. Specifically, we obtained classification rates of up to 93.98% when using the LGRLVQ algorithm. Among the compared algorithms, the GRLVQ had the least depreciation of the minimum values of the F1 score. Moreover, the GRLVQ showed promising maximum classification accuracies (e.g. 91.73% on the original dataset) while computing the importance of each pump. Regarding the importance of the considered pumps, the GRLVQ revealed that Pump A was the most significant for the training of our machine-learning-based solution.

Therefore, since our solution does not require hydraulic modelling, we showed the possibility of leakage detection without either modelling the hydraulic system or knowing particular information about the network architecture. Such benefits make our proposed leakage detection algorithm suitable for real-world scenarios where measurements are available but little prior knowledge about them exists.

When a higher level description of the WDN is available, it is possible to use such knowledge about the network architecture to apply clustering methods aiming to divide the DMA and reduce the search area for the localization of the predicted leakages. Therefore, our machine-learning strategies can be extended and support solutions formulated by hydraulic modelling.

An important aspect to highlight is the amount of data required to properly build the predictive system. We acknowledge that scenarios with insufficient training data could lead to significantly misleading outcomes when anomalous behaviours in DMAs are analyzed. Therefore, we analyzed the total amount of collected data (15 months) that the SVOA company shared with us to train our machine-learning strategies. From the water utility side, the company may continuously collect new observations to increase the reliability of the predictive system.

For future works, we aim to investigate federated learning strategies to obtain reliable data analysis while preserving the privacy of the information collected at the pumps for the same fault diagnosis task. This is of high importance when dealing with sensitive and critical information, such as water supply and pump locations. Moreover, the federated model would be capable of monitoring an entire WDN and distinguishing different DMAs. This would turn the problem into not only a classification but also a preliminary localization (yet at a region level) problem.

The authors would like to thank the partial financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) – Finance Code 001, Grant 88887.155782/2017-00, CNPq Proc. 313151/2020-2 and FUNCAP Grant PS-0186-00103.01.00/21. The authors would like to thank the Mistra InfraMaint Program for the financial support. The authors also thank Stockholm Vatten och Avfall company, Stockholm, Sweden, for providing the data used in this study.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Biehl, M., Hammer, B. & Villmann, T. 2016 Prototype-based models in machine learning. Wiley Interdisciplinary Reviews: Cognitive Science 7 (2), 92–111.
Bojer, T., Hammer, B., Schunk, D. & Von Toschanowitz, K. T. 2001 Relevance determination in learning vector quantization. European Symposium on Artificial Neural Networks 1, 271–276.
Calinski, T. & Harabasz, J. 1974 A dendrite method for cluster analysis. Communications in Statistics-Theory and Methods 3 (1), 1–27.
Davies, D. L. & Bouldin, D. W. 1979 A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1 (2), 224–227.
Gupta, A. & Kulat, K. D. 2018 A selective literature review on leak management techniques for water distribution system. Water Resources Management 32 (10), 3247–3269.
Hammer, B. & Villmann, T. 2002 Generalized relevance learning vector quantization. Neural Networks 15 (8–9), 1059–1068.
Hammer, B., Strickert, M. & Villmann, T. 2005 On the generalization ability of GRLVQ networks. Neural Processing Letters 21 (2), 109–120.
Kohonen, T. 1988 An introduction to neural computing. Neural Networks 1 (1), 3–16.
Lai, W. W. L., Chang, R. K. W., Sham, J. F. C. & Pang, K. 2016 Perturbation mapping of water leak in buried water pipes via laboratory validation experiments with high-frequency ground penetrating radar (GPR). Tunnelling and Underground Space Technology 52, 157–167.
Lawal, M. O. 2001 Historical development of the pipeline as a mode of transportation. Geograph Bull 43 (2), 91–99.
Li, R., Huang, H., Xin, K. & Tao, T. 2015 A review of methods for burst/leakage detection and location in water distribution systems. Water Science and Technology: Water Supply 15 (3), 429–441.
Marzola, I., Mazzoni, F., Alvisi, S. & Franchini, M. 2022 Leakage detection and localization in a water distribution network through comparison of observed and simulated pressure data. Journal of Water Resources Planning and Management 148 (1), 04021096.
Moubarak, P. M., Ben-Tzvi, P. & Zaghloul, M. E. 2011 A self-calibrating mathematical model for the direct piezoelectric effect of a new MEMS tilt sensor. IEEE Sensors Journal 12 (5), 1033–1042.
Moubayed, A., Sharif, M., Luccini, M., Primak, S. & Shami, A. 2021 Water leak detection survey: challenges & research opportunities using data fusion & federated learning. IEEE Access 9, 40595–40611.
Nova, D. & Estévez, P. A. 2014 A review of learning vector quantization classifiers. Neural Computing and Applications 25 (3), 511–524.
Papadopoulou, K. A., Shamout, M. N., Lennox, M., Mackay, D., Taylor, A. R., Turner, J. T. & Wang, X. 2008 An evaluation of acoustic reflectometry for leakage and blockage detection. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 222 (6), 959–966.
Rencher, A. C. 1992 Interpretation of canonical discriminant functions, canonical variates, and principal components. The American Statistician 46 (3), 217–225.
Rousseeuw, P. J. 1987 Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, 53–65.
Sato, A. & Yamada, K. 1995 Generalized learning vector quantization. Advances in Neural Information Processing Systems 8, 423–429.
Senin, S. F., Jaafar, M. S. & Hamid, R. 2019 Locating underground water pipe leakages via interpretation of ground penetrating radar signals. International Journal of Engineering & Technology 8 (2), 72–77.
Sharma, S. K. & Maheshwari, S. 2017 A review on welding of high strength oil and gas pipeline steels. Journal of Natural Gas Science and Engineering 38, 203–217.
Soldevila, A., Fernandez-Canti, R. M., Blesa, J., Tornil-Sin, S. & Puig, V. 2017 Leak localization in water distribution networks using Bayesian classifiers. Journal of Process Control 55, 1–9.
Sousa, D. P., Barreto, G. A., Cavalcante, C. C. & Medeiros, C. 2019 LVQ-type classifiers for condition monitoring of induction motors: a performance comparison. In: Vellido, A., Gibert, K., Angulo, C. & Guerrero, J. D. M. (eds), International Workshop on Self-Organizing Maps. Springer, Cham, pp. 130–139.
Vrachimis, S. G., Eliades, D. G., Taormina, R., Kapelan, Z., Ostfeld, A., Liu, S., Kyriakou, M., Pavlou, P., Qiu, M. & Polycarpou, M. M. 2022 Battle of the leakage detection and isolation methods. Journal of Water Resources Planning and Management 148 (12).
Zaman, D., Tiwari, M. K., Gupta, A. K. & Sen, D. 2020 A review of leakage detection strategies for pressurised pipeline in steady-state. Engineering Failure Analysis 109, 104264.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).