This article presents a model-based fusion classifier technique to estimate the approximate position of a leak in a water distribution network (WDN). The technique uses residuals obtained by comparing pressure measurements with the nominal behavior estimated from a mathematical model. These residuals are analyzed in a classification task that associates them with an approximate leak location. The classification task is performed by three different classifiers: the K-Nearest Neighbors algorithm, the Multilayer Perceptron, and the Decision Tree classifier. The outputs of these classifiers are combined using two ensemble/fusion methods, Majority Voting and Naive Bayes fusion, to improve the accuracy of the leak position estimate. Results from a benchmark problem comprising 126 nodes, 8 valves, 2 tanks, 2 reservoirs, 3 pumps, and 168 pipes, subject to four variable demand patterns, illustrate the performance improvement of the ensemble methods over the individual classifiers. Additionally, to create a more realistic scenario, the performance of the proposed leak localization scheme is evaluated in a scenario perturbed by noise and by uncertainties in the demand estimation.

  • A fusion classifier technique for more accurate leak localization in water distribution networks.

  • Model-based fusion classifier technique integrating KNN, Multilayer Perceptron, and Decision Tree algorithms for precise leak localization in water networks.

  • Improved leak position estimation through Majority Voting and Naive Bayes fusion methods.

  • Evaluated under realistic simulated conditions with noise and demand estimation uncertainties.

  • Shows improvements over individual classifiers.

Water loss management is an issue facing water suppliers around the world because of its potential safety, economic, and environmental consequences. Leaks occur as water infrastructure deteriorates: corrosion, material fatigue, and joint failures degrade pipes over time. In addition, high operating pressures and hydraulic transients intensify these problems by inducing dynamic stresses that accelerate crack propagation (Jara-Arriagada & Stoianov 2024). To mitigate water loss, pressure management has been widely implemented in water distribution networks (WDNs), demonstrating that regulating pressure effectively reduces leakage (Meniconi et al. 2024). Moreover, increased water consumption by end users causes greater pressure fluctuations and places additional stress on service lines, increasing the probability of pipe damage (Meniconi et al. 2022). Besides that, soil movement, temperature fluctuations, and inadequate installation practices also contribute to leakage in water networks (Rezaei et al. 2015). Consequently, leakage rates between 30 and 50% are common in water distribution systems (World Water Assessment Programme United Nations and UN-Water 2009; Puig et al. 2016).

In this regard, leak detection and localization methods based on different approaches have been proposed. One of these approaches is model-based methods, in which nodal pressure and flow measurements, hydraulic models, and different estimation methods are used to perform online leak detection and localization. These methods formulate the leak localization problem under different design considerations and mathematical assumptions. Pudar & Liggett (1992) address leak localization as a least-squares parameter estimation problem. Pérez et al. (2011, 2014) propose a sensitivity analysis approach, in which the differences between the pressures in the nominal state and in leak scenarios are stored in a matrix and analyzed to localize the leak. However, these methods do not consider uncertainties in demand estimation and measurement disturbances, which are present in real applications. Valizadeh et al. (2009) and Leu & Bui (2016), in contrast, treat leak localization as a classification task. These approaches are considered data-driven, as they only require experimental data without the need for mathematical models. However, their performance depends strongly on the training process, and in real applications, datasets that cover all possible faults can be difficult to obtain.

Along these lines, Puig et al. (2016) and Sarrate et al. (2014) propose a mixed model-based/data-driven method, in which a residual generated from the nominal state and the fault condition is analyzed using classification techniques. Unlike purely data-driven methods, this approach does not require datasets covering all possible faults, since the model-based scheme can generate the potential faults. However, finding a classifier that handles all test data or guarantees correct labeling can be difficult, because demand patterns, leak size, uncertainty level, network structure, and operation vary widely, and a single classifier is generally unable to manage such variability. Another approach, proposed by Vrachimis et al. (2021), uses a priori information about the system to improve the accuracy of model-based leak detection and isolation; it integrates sensor measurements within an optimization framework to prelocalize a potential leak. Furthermore, in 2020, as part of the CCWI/WDSA conference, the BattLeDIM competition was held to evaluate methods for detecting and locating leaks in a simulated water network based on a real system. Participants used algorithms based on techniques such as machine learning and statistical analysis. Vrachimis et al. (2022) show that, although effective solutions emerged, only 50% of the maximum score was achieved, highlighting the room for improving these methodologies. This suggests that fusing several methods for leak localization, rather than relying on a single approach, may yield more robust results and better address the complexities of real-world systems.

Recent pattern classification techniques combine several classifiers and fuse their decisions to obtain a result that outperforms each single classifier (Mangai et al. 2010). The literature presents applications of fusion classifier methods in multiple fields, such as finance (Tsai 2014), anatomy (Heckemann et al. 2006), earth sciences (Tsai 2014), and manufacturing (Mar et al. 2011), among others. Additionally, experimental studies demonstrate that collecting and combining the outputs of multiple classifiers reduces generalization error (Rokach 2014).

To improve the general performance of the mixed model-based/data-driven approach, this paper incorporates a fusion scheme into it to deal with the leak localization problem. The proposed classifier comprises a Decision Tree classifier, a Multilayer Perceptron (MLP), and the K-Nearest Neighbors (KNN) algorithm, combined through two different fusion rules: Majority Voting (MV) and Naive Bayes (NB) combination, which are compared on data from a case study to determine the more appropriate rule for the application. The hydraulic model of the case study corresponds to Network 1, used to evaluate algorithm performance in the contest 'The Battle of the Water Sensor Networks' (BWSN) (Ostfeld et al. 2008). The performance of the proposed leak localization scheme is evaluated in a scenario perturbed by noise and uncertainties in the demand estimation. The training process uses residual signals derived from a leak-free baseline or from a system with preexisting leaks, enabling the method to locate newly emerging leaks rather than those already present. Neither multiple simultaneous leaks nor successive (non-concurrent) leaks are considered.

Leak localization methods have improved significantly in recent years, but their performance under real-world conditions still faces important challenges. Traditional model-based approaches, although theoretically sound, can be ineffective when dealing with practical issues such as changes in water demand, sensor noise, and inaccuracies in hydraulic models. Data-driven methods, on the other hand, require large training datasets covering all possible leak scenarios, which are difficult to obtain. Hybrid approaches attempt to combine the strengths of both strategies, but most still rely on a single classifier, which limits their ability to handle the complexity and variability of leak patterns across different network configurations. This study proposes a fusion of classifiers within a hybrid model-based/data-driven framework. The main contributions of this method are as follows: (i) it combines three classifiers (Decision Tree, MLP, and KNN) using both MV and NB fusion to exploit their complementary strengths; (ii) it increases robustness by generating synthetic leak scenarios, reducing dependence on large training datasets; and (iii) it achieves superior performance in benchmark tests under noisy and uncertain conditions. In these tests, the fusion approach (in particular, majority voting) outperforms single-classifier methods, which suggests that it is well suited to networks with high variability. Its flexible design supports future extensions to multiple simultaneous leaks and dynamic network conditions, a step toward real-world implementation in water distribution systems.

The rest of the paper is organized as follows: The Preliminaries section presents the WDN modeling, the number of sensors considered, and the placement procedure used to define the instrumentation scheme for the proposed leakage localization algorithm. The Methods section describes the proposed fusion scheme for leak localization in WDS. The Results section evaluates the performance of the proposed fusion scheme through benchmark problem results. The Discussion section analyzes these findings in context. Finally, the Conclusions section summarizes the main contributions of this study.

This section introduces the hydraulic simulation scheme, leak model, sensor number, and placement methods considered in this study.

Governing equations

The flow rate and pressure of a WDN can be analyzed with different modeling approaches, such as inertial, transient, static, and quasi-static models. Inertial and transient models are described by partial differential equations, whereas the static model analyzes steady states through a system of algebraic equations. Static models can be extended over a simulation period by chaining static simulations in time with different boundary conditions (quasi-static models). In general, static and quasi-static models are used in practice, since daily network management (analysis, design, and operation) can be addressed with them. Furthermore, transient effects dissipate quickly, and transient models may be computationally expensive (Cabrera & Vela 2013).

The static network analysis is based on the mass (1) and energy (2) equations:

$$\sum_{j \in \mathcal{P}_i} Q_{ij} - q_i = 0, \qquad i = 1, \dots, N \tag{1}$$

where $\mathcal{P}_i \subseteq \{1, \dots, P\}$ is the set of pipes connected to node $i$, $P$ is the total number of pipes, $q_i$ is the flow demand at node $i$, $Q_{ij}$ is the flow rate that pipe $j$ delivers to node $i$ (negative when the flow leaves the node), and $N$ is the total number of nodes. The energy equation, in turn, accounts for the energy balances, work, power, and machines that interact with the flowing fluid in the WDN. For pipe $j$ starting at node $i$, it can be expressed as:
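$$H_i - H_j^{\mathrm{end}} = r_j\,Q_j\,\lvert Q_j \rvert^{\,n-1} + m_j\,Q_j\,\lvert Q_j \rvert \tag{2}$$

where $n$ is a flow exponent that depends on the friction equation, $r_j$ is the friction (resistance) coefficient, $m_j$ is the local (minor) loss coefficient, $Q_j$ is the flow rate in pipe $j$, and $H_i$ and $H_j^{\mathrm{end}}$ are the head at node $i$ and the head at the end of pipe $j$, respectively (Rossman et al. 2002). The head added by pumps is described by a power law:

$$h_{G,j} = -\omega^{2}\left(h_0 - r_{\mathrm{p}}\left(\frac{Q_j}{\omega}\right)^{n_{\mathrm{p}}}\right) \tag{3}$$

where $h_{G,j}$ is the head loss across pump $j$ (the negative of the head gain), $h_0$ is the shutoff head, $\omega$ is the relative speed setting, and $r_{\mathrm{p}}$ and $n_{\mathrm{p}}$ are the pump curve coefficients (Rossman et al. 2002).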
The traditional modeling of water networks assumes that demands are fixed and independent of pressure, i.e., the demands drive the network behavior. This demand-driven approach gives accurate results when network pressures are sufficient. However, under low-pressure conditions, such as fault scenarios, consumers do not always receive their requested demand, and a demand-driven simulation may even return negative pressures. In this context, works such as Giustolisi et al. (2008), Klise et al. (2017), and Todini (2010) propose a Pressure-Driven Demand (PDD) simulation to obtain a more realistic representation of the WDS in fault scenarios. Considering a leak scenario in a network comprising pipes with unknown flow rates, nodes with unknown heads (internal nodes), and nodes with known heads (e.g., tank levels), the PDD simulation can be described by the following system of equations based on (1) and (2):
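$$
\begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \mathbf{Q} \\ \mathbf{H} \end{bmatrix}
=
\begin{bmatrix} -\mathbf{A}_{10}\,\mathbf{H}_0 \\ \boldsymbol{\Phi}(\mathbf{H})\,\mathbf{q} \end{bmatrix}
\tag{4}
$$

where $\mathbf{H}_0 \in \mathbb{R}^{N_0}$ is the column vector of known nodal heads ($N_0$ nodes with fixed head), $\mathbf{Q} \in \mathbb{R}^{P}$ and $\mathbf{H} \in \mathbb{R}^{N}$ are the column vectors of unknown pipe flow rates and unknown nodal heads, respectively, and $\mathbf{q} \in \mathbb{R}^{N}$ is the vector of requested nodal demands. $\mathbf{A}_{11}$ is a diagonal matrix of order $P \times P$ whose elements for pipes are defined by the head losses as:

$$A_{11}(j,j) = r_j\,\lvert Q_j \rvert^{\,n-1} + m_j\,\lvert Q_j \rvert, \qquad j = 1, \dots, P \tag{5}$$

while the elements associated with pumps are defined as:

$$A_{11}(k,k) = -\omega^{2}\left(\frac{h_0}{Q_k} - r_{\mathrm{p}}\,\omega^{-n_{\mathrm{p}}}\,Q_k^{\,n_{\mathrm{p}}-1}\right) \tag{6}$$

with $k$ ranging over the links that contain a pump, and $h_0$, $\omega$, $r_{\mathrm{p}}$, and $n_{\mathrm{p}}$ the pump parameters of (3). The network topology is described by the general incidence matrix $\bar{\mathbf{A}}_{12} \in \mathbb{R}^{P \times (N + N_0)}$, defined as follows:

$$
\bar{A}_{12}(j,i) =
\begin{cases}
-1 & \text{if the flow in pipe } j \text{ leaves node } i, \\
0 & \text{if pipe } j \text{ is not connected to node } i, \\
1 & \text{if the flow in pipe } j \text{ enters node } i.
\end{cases}
\tag{7}
$$

The general incidence matrix can be partitioned as $\bar{\mathbf{A}}_{12} = [\,\mathbf{A}_{12} \;\; \mathbf{A}_{10}\,]$, where $\mathbf{A}_{12} \in \mathbb{R}^{P \times N}$ relates the pipes to the nodes with unknown head and $\mathbf{A}_{10} \in \mathbb{R}^{P \times N_0}$ relates them to the nodes with known head. For clarity in the notation, $\mathbf{A}_{21} = \mathbf{A}_{12}^{\top}$ and $\mathbf{A}_{01} = \mathbf{A}_{10}^{\top}$. Additionally, $\boldsymbol{\Phi}(\mathbf{H}) = \operatorname{diag}\big(\varphi(p_1), \dots, \varphi(p_N)\big)$ is a diagonal matrix defined according to the following PDD relationship (Klise et al. 2017):

$$
\varphi(p_i) =
\begin{cases}
0 & p_i \le P_0, \\
\left(\dfrac{p_i - P_0}{P_f - P_0}\right)^{1/2} & P_0 < p_i < P_f, \\
1 & p_i \ge P_f,
\end{cases}
\tag{8}
$$

where $p_i$ is the pressure head at node $i$ (nodal head minus elevation), $P_0$ is the minimum service pressure required for supplying the demand, and $P_f$ is the pressure above which the consumer receives the full requested demand. The uniqueness of the solution of system (4) requires at least one node with known head.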
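The PDD relationship in (8) can be sketched in a few lines. The function below returns the fraction of the requested demand that a node receives: zero at or below the minimum pressure, the square root of the normalized pressure excess in between, and one at or above the nominal pressure. This is a minimal Python sketch assuming pressure heads in metres; the function names and the example thresholds (0 m and 20 m) are illustrative assumptions, not values used in this study.

```python
import math


def pdd_factor(p: float, p0: float, pf: float) -> float:
    """Fraction of the requested demand delivered at pressure head p (square-root PDD law)."""
    if pf <= p0:
        raise ValueError("pf must be greater than p0")
    if p <= p0:
        return 0.0  # below the minimum pressure: no supply
    if p >= pf:
        return 1.0  # above the nominal pressure: full requested demand
    return math.sqrt((p - p0) / (pf - p0))  # partial supply in between


def delivered_demand(pressures, requested, p0=0.0, pf=20.0):
    """Apply the diagonal PDD factor node by node: delivered = factor(p_i) * requested_i."""
    return [pdd_factor(p, p0, pf) * q for p, q in zip(pressures, requested)]


# Three nodes with negative, intermediate, and sufficient pressure heads (m)
print(delivered_demand([-1.0, 5.0, 72.0], [0.01, 0.01, 0.01]))
```

Because the factor multiplies the requested demand, a node keeps its full demand under normal operation and loses supply only when a fault drags its pressure below the nominal value.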
In the model, leaks can be added at any junction or tank. A pipe break is modeled by splitting the pipe into two sections, adding a node at the split point, and assigning that node a leak with an area large enough to drain the pipe. The leak demand is defined by the leak law proposed in Crowl & Louvar (2001):

$$q_L = C_d\,A\,\sqrt{2g}\;h_L^{\alpha} \tag{9}$$

where $C_d$ is the discharge coefficient, $A$ is the leak area, $g$ is the gravitational acceleration, and $h_L$ is the pressure head at the leak. According to Torricelli's law, the exponent $\alpha$ is 0.5, although it can also be treated as a model parameter. Other methods for estimating the flow at the leak node can be found in Schwaller & van Zyl (2015).
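As a worked example of the orifice leak law in (9), the sketch below computes the leak discharge Cd · A · sqrt(2g) · h^α for a circular orifice. The discharge coefficient of 0.75 and the helper name `leak_flow` are assumptions for illustration; the orifice diameters (0.005 m and 0.05 m) and the 72 m average pressure head are taken from the case study described later.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def leak_flow(diameter_m: float, head_m: float, cd: float = 0.75, alpha: float = 0.5) -> float:
    """Leak discharge (m^3/s) through a circular orifice: cd * A * sqrt(2 g) * head^alpha."""
    if head_m <= 0.0:
        return 0.0  # no outflow without a positive pressure head
    area = math.pi * diameter_m ** 2 / 4.0  # orifice area from its diameter
    return cd * area * math.sqrt(2.0 * G) * head_m ** alpha


# Smallest and largest orifice diameters considered, at the 72 m average pressure head
for d in (0.005, 0.05):
    print(f"d = {d} m -> q_L = {1000 * leak_flow(d, 72.0):.3f} L/s")
```

Since the discharge scales with the orifice area, a tenfold increase in diameter multiplies the leak flow by one hundred, which is why the training set spans a wide range of leak magnitudes.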

Sensor number and placement

The performance of a monitoring system is highly dependent on available measurements. Before implementing leak localization methods, selecting optimal sensor numbers and positions is fundamental. Although installing sensors at every network node would provide maximum monitoring performance, this approach results in economically unfeasible instrumentation costs (Sarrate et al. 2014).

Placing M sensors among N candidate nodes (M < N) by exhaustive search requires evaluating every combination, a number that grows combinatorially with the network size and becomes computationally infeasible for moderate-sized WDNs (Gamboa-Medina & Reis 2017). Sarrate et al. (2014) propose a two-stage approach: (1) clustering techniques group similar fault signatures to eliminate correlated sensor locations, reducing redundancy and setting the maximum number of sensors equal to the number of clusters; (2) the placement is formulated as an optimization problem that maximizes monitoring performance.
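To make the growth of the exhaustive search concrete, the sketch below counts the candidate placements, which equal the binomial coefficient C(N, M). The helper name is hypothetical; the 126 candidate nodes correspond to the benchmark network used later in this study. Each configuration would also require evaluating the isolation metric, which is what makes exhaustive search costly as M and N grow.

```python
from math import comb


def n_placements(n_nodes: int, n_sensors: int) -> int:
    """Number of distinct ways to place n_sensors sensors among n_nodes candidate nodes."""
    return comb(n_nodes, n_sensors)


# 126 candidate nodes, as in the benchmark network
for m in (3, 5, 10):
    print(m, n_placements(126, m))
```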

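The sensitivity matrix approach (Pudar & Liggett 1992; Casillas et al. 2015; Puig et al. 2016) captures the pressure deviations caused by leaks:

$$
\mathbf{S} =
\begin{bmatrix}
s_{11} & \cdots & s_{1N} \\
\vdots & \ddots & \vdots \\
s_{N1} & \cdots & s_{NN}
\end{bmatrix},
\qquad
s_{ij} = \frac{\hat{h}_i - h_i^{f_j}}{f_j}
\tag{10}
$$

where $\hat{h}_i$ is the nominal head pressure at node $i$, $h_i^{f_j}$ is the pressure at node $i$ under a leak at node $j$ of magnitude $f_j$, and $N$ is the total number of nodes. Each column $\mathbf{s}_j$ of $\mathbf{S}$ is the fault signature of a leak at node $j$.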
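A minimal NumPy sketch of how the sensitivity matrix could be assembled, assuming the leak-free heads and one leak simulation per node have already been computed with the hydraulic model. Each entry is the nominal head minus the head under the leak, divided by the leak magnitude, so pressure drops give positive values. The function name, the array layout, and the toy numbers are illustrative assumptions.

```python
import numpy as np


def sensitivity_matrix(h_nominal, h_leak, leak_sizes) -> np.ndarray:
    """Build S with s_ij = (h_nominal_i - h_leak_ij) / f_j.

    h_nominal : shape (N,), leak-free pressure head at each node
    h_leak    : shape (N, N), column j holds all nodal heads with a leak of size f_j at node j
    leak_sizes: shape (N,), leak magnitude f_j used in each simulation
    """
    h_nominal = np.asarray(h_nominal, dtype=float)
    h_leak = np.asarray(h_leak, dtype=float)
    leak_sizes = np.asarray(leak_sizes, dtype=float)
    # Broadcasting: subtract every leak column from the nominal vector, then divide column j by f_j
    return (h_nominal[:, None] - h_leak) / leak_sizes[None, :]


# Toy 3-node example (hypothetical heads in m, leak magnitude 0.01 for every simulation)
h0 = [50.0, 48.0, 45.0]
h_leaks = [[49.0, 49.5, 49.8],
           [47.6, 46.0, 47.5],
           [44.9, 44.5, 42.0]]
S = sensitivity_matrix(h0, h_leaks, [0.01, 0.01, 0.01])
print(S)
```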
In this study, the sensor placement configuration is encoded as a binary vector $\mathbf{q} \in \{0,1\}^N$, where $q_i = 1$ marks node $i$ as instrumented (pressure measured) and $q_i = 0$ indicates a non-instrumented node. This binary representation defines the diagonal matrix $\mathbf{D}_q = \operatorname{diag}(\mathbf{q})$, which is used to compute the isolability measure:

$$\phi_{kj}(\mathbf{q}) = \frac{\mathbf{r}_k^{\top}\,\mathbf{D}_q\,\mathbf{s}_j}{\lVert \mathbf{D}_q\,\mathbf{r}_k \rVert \; \lVert \mathbf{D}_q\,\mathbf{s}_j \rVert} \tag{11}$$

where $\mathbf{r}_k$ is the residual vector of leak scenario $k$ and $\mathbf{s}_j$ is the $j$th column of the sensitivity matrix (10), i.e., the signature of a leak at node $j$. Equation (11) evaluates the system's ability to distinguish between leak scenarios from the angular separation between residual vectors in the reduced measurement space; the leak in scenario $k$ is assigned to the node $\hat{j}_k = \arg\max_j \phi_{kj}(\mathbf{q})$. The optimization problem minimizes the error metric:

$$\min_{\mathbf{q}} \; \varepsilon(\mathbf{q}) = \frac{1}{m} \sum_{k=1}^{m} e_k(\mathbf{q}) \quad \text{s.t.} \quad \sum_{i=1}^{N} q_i = s_n, \quad \mathbf{q} \in \{0,1\}^N \tag{12}$$

where $\mathbf{q}$ is the binary sensor placement vector, $\varepsilon(\mathbf{q})$ is the average isolation error over the leak scenarios, computed as the mean of the binary indicators $e_k(\mathbf{q})$ (0 if the leak in scenario $k$ is correctly isolated and 1 otherwise), $m$ is the total number of potential leak scenarios considered in the analysis, and $s_n \le N$ is the required number of sensors, $N$ being the total number of network nodes. The optimization problem is solved using Genetic Algorithms (Casillas et al. 2015; Walters & Savic 2024).
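The isolability measure (11) and the error metric (12) can be sketched for a given placement vector: residuals and signatures are restricted to the instrumented nodes, each scenario is assigned to the signature with the largest cosine similarity, and the fraction of wrongly isolated scenarios is returned. This is a hypothetical helper written for illustration; it assumes the residuals of the m scenarios and the fault signatures are stored column-wise in NumPy arrays.

```python
import numpy as np


def isolation_error(q, residuals, signatures, true_nodes) -> float:
    """Average isolation error of a binary sensor-placement vector q.

    q          : shape (N,), 1 = instrumented node, 0 = not instrumented
    residuals  : shape (N, m), column k = residual vector of leak scenario k
    signatures : shape (N, N), column j = fault signature of a leak at node j
    true_nodes : shape (m,), index of the leaking node in each scenario
    """
    mask = np.asarray(q, dtype=bool)
    r = np.asarray(residuals, dtype=float)[mask, :]   # rows of measured nodes only (D_q r)
    s = np.asarray(signatures, dtype=float)[mask, :]  # rows of measured nodes only (D_q s)
    # Normalize columns, guarding against zero-norm columns
    r_n = r / np.maximum(np.linalg.norm(r, axis=0, keepdims=True), 1e-12)
    s_n = s / np.maximum(np.linalg.norm(s, axis=0, keepdims=True), 1e-12)
    cosines = r_n.T @ s_n                  # (m, N): cosine between scenario k and signature j
    isolated = np.argmax(cosines, axis=1)  # most similar signature per scenario
    errors = isolated != np.asarray(true_nodes)  # e_k = 1 when isolation fails
    return float(errors.mean())
```

With a single sensor every residual collapses to a scalar, all cosines become equal, and most scenarios cannot be told apart, which is the situation the optimization in (12) is meant to avoid.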

The model-based leak localization problem aims to compare the mathematical model that describes the theoretical behavior of the WDS with pressure measurements taken by distributed sensors on the network structure. This comparison generates a residual (R) containing information that allows symptom extraction and fault isolation (Isermann 2006).

To perform the leak localization task in the WDS, the residuals are obtained from the differences between the pressure $\hat{\mathbf{H}}$ estimated by the hydraulic simulation of a leak-free scenario and the pressure measurements $\mathbf{H}$, i.e., $\mathbf{R} = \hat{\mathbf{H}} - \mathbf{H}$. Additionally, the model usually assumes that all demands $\mathbf{q}$ occur at the nodes; for this reason, nodes are considered information sources. To obtain a more realistic scenario, the measurements are considered to be perturbed by noise, uncertainties in the demand patterns, and leaks of different magnitudes, which can appear at any node.

In a WDS, the occurrence of a leak affects different surrounding nodes and impacts the global network’s performance. Therefore, associating a leak symptom with a specific node to determine the leak position is not a straightforward task. In this study, the selected method is based on a mixed model-based/data-driven isolation approach, as in Puig et al. (2016), where symptom extraction and leak localization are formulated as a classification task. Following this approach, a classification method needs to map the relationship between the computed symptoms and fault indicators previously generated offline through a training process (Ferrandez-Gamot et al. 2015). This is represented as a set of residual data:
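$$\mathcal{D} = \left\{ (\mathbf{r}_k, y_k) \right\}_{k=1}^{K} \tag{13}$$

where $\mathbf{r}_k \in \mathbb{R}^{s_n}$ is the pressure residual of sample $k$, $s_n$ is the number of sensors, and $y_k \in \Omega = \{\omega_1, \dots, \omega_c\}$ is the class indicating the corresponding leak node. The mapping function is defined as:

$$f : \mathbb{R}^{s_n} \rightarrow \Omega \tag{14}$$

so that, for any object $\mathbf{r} \in \mathbb{R}^{s_n}$, the classifier output is a class label $f(\mathbf{r}) = \omega \in \Omega$. The training process is critical for classifier performance. Since representative real data are limited, the PDD simulation is used to obtain the training dataset.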

The main objective of the classification task is to efficiently identify which of the network nodes can be associated with the fault symptom. In a WDS, each node represents an information source, and each node has different temporal demand patterns. In general, training classifiers on a vast amount of data can be impractical and inefficient. Recent trends in classifier design propose fusion techniques to improve performance. In fusion techniques, a set of classifiers is combined to provide a better and less biased output than a single classifier. Moreover, a fusion scheme can be efficient, since each classifier can be trained on a smaller subset of the data and the outputs combined later (Mangai et al. 2010).

The proposed classifier structure is illustrated in Figure 1, where the mixed model-based/data-driven method is complemented by a fusion scheme. In this configuration, well-known classification methods such as the KNN algorithm (Zhu & Nandi 2015), an MLP (Ghate & Dudul 2010), and decision tree classifiers (Zhu & Nandi 2015) are implemented. The leak localization method is based on the online computation of pressure residuals obtained from the difference between measurements provided by pressure sensors installed in the WDN and the estimated ones from a hydraulic simulation considering a leak-free scenario.
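A minimal scikit-learn sketch of the three base classifiers, configured with the tuning values reported in Table 1. Two points are assumptions: Table 1 leaves the number of neighbors blank, so `n_neighbors=5` is only a placeholder, and "Hidden layers 200" is read as a single hidden layer of 200 neurons. The toy residual data are synthetic and only exercise the API; in the method described here, the inputs would be simulated pressure residuals at the sensor nodes and the labels the leaking nodes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_base_classifiers(n_neighbors: int = 5, random_state: int = 0) -> dict:
    """Return the three base classifiers of the fusion scheme (KNN, MLP, decision tree)."""
    return {
        # n_neighbors is a placeholder: the value used in the study is not reported
        "knn": KNeighborsClassifier(n_neighbors=n_neighbors, metric="euclidean"),
        # one hidden layer of 200 units is an interpretation of "Hidden layers 200"
        "mlp": MLPClassifier(hidden_layer_sizes=(200,), activation="tanh",
                             max_iter=1500, random_state=random_state),
        "tree": DecisionTreeClassifier(criterion="gini", max_depth=100,
                                       random_state=random_state),
    }


# Hypothetical toy data: 3 leak classes, each with a distinct mean residual at 3 sensors
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]])
X = np.vstack([c + 0.05 * rng.standard_normal((40, 3)) for c in centers])
y = np.repeat([0, 1, 2], 40)

models = build_base_classifiers()
for name, clf in models.items():
    clf.fit(X, y)
    print(name, clf.score(X, y))
```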
Figure 1

General leak location scheme.


The possible ways of combining classifiers depend on the output type of the individual classifiers (Kuncheva 2014). In the leak localization problem, the classifier output is a crisp label representing a network node. To evaluate the performance of fusion schemes in the leak localization problem, two fusion techniques are presented: MV and NB combination.

In MV, the decision is obtained by counting how many classifiers in the pool output each label. A class is selected when: (i) all classifiers agree on it (unanimous decision), (ii) it receives more than half of the votes (simple majority), or (iii) it receives the highest number of votes, regardless of whether this exceeds half of the total (plurality vote) (Mangai et al. 2010). The NB combination, in turn, is a decision rule based on conditional probabilities under the assumption of mutually independent classifiers (Kuncheva 2014).

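Let the label output of the $i$th classifier be the binary vector $[d_{i,1}, \dots, d_{i,c}]^{\top} \in \{0,1\}^{c}$, $i = 1, \dots, L$, where $L$ is the number of classifiers. In the MV decision (plurality vote), class $\omega_k$ is selected if:

$$\sum_{i=1}^{L} d_{i,k} = \max_{j = 1, \dots, c} \sum_{i=1}^{L} d_{i,j} \tag{15}$$

where $c$ is the number of classes. If the $i$th classifier selects class $\omega_j$, then $d_{i,j} = 1$, and $d_{i,j} = 0$ otherwise. The classical analysis of MV accuracy assumes an odd number of classifiers with independent outputs (Kuncheva 2014); with more than two classes, however, an odd number of classifiers does not rule out ties (for example, when all three classifiers predict different nodes), so a tie-breaking rule is still needed.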
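The plurality rule in (15) reduces to counting votes. The sketch below is a minimal Python version; the tie-breaking policy (return a designated label, for instance the prediction of the most accurate base classifier, when it is among the tied labels, otherwise the first tied label encountered) is an assumption, since the text does not state how ties are resolved.

```python
from collections import Counter


def majority_vote(labels, tie_break=None):
    """Plurality vote over crisp labels produced by L classifiers.

    labels    : sequence of predicted node labels, one per classifier
    tie_break : optional label returned when it is among the labels sharing the top count
    """
    counts = Counter(labels)  # preserves first-seen order of labels
    top = max(counts.values())
    winners = [lab for lab, c in counts.items() if c == top]
    if len(winners) == 1:
        return winners[0]
    # Tie: prefer the designated label if it is tied for first, else the first tied label seen
    return tie_break if tie_break in winners else winners[0]


print(majority_vote([12, 12, 40]))                # two of three classifiers agree
print(majority_vote([12, 40, 75], tie_break=75))  # three-way tie resolved by the designated label
```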
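On the other hand, the NB method assumes mutually independent classifiers, allowing the following representation (Kuncheva 2014):

$$P(\mathbf{s} \mid \omega_k) = \prod_{i=1}^{L} P(s_i \mid \omega_k) \tag{16}$$

where $\mathbf{s} = [s_1, \dots, s_L]^{\top}$ is the vector of crisp labels assigned to a sample $\mathbf{x}$ by the $L$ classifiers, $P(s_i \mid \omega_k)$ is the probability that classifier $D_i$ labels $\mathbf{x}$ in class $s_i \in \Omega$ when its true class is $\omega_k$, and $\Omega = \{\omega_1, \dots, \omega_c\}$ is the set of class labels. Then, the posterior probability needed to label sample $\mathbf{x}$ is:

$$P(\omega_k \mid \mathbf{s}) = \frac{P(\omega_k) \prod_{i=1}^{L} P(s_i \mid \omega_k)}{P(\mathbf{s})}, \qquad k = 1, \dots, c \tag{17}$$

In a practical implementation of NB on a dataset, $P(\omega_k)$ and $P(s_i \mid \omega_k)$ are estimated from the training data as $N_k / N_{\mathrm{tr}}$ and $cm^{i}_{k,s_i} / N_k$; dropping the factors common to all classes, the support for class $\omega_k$ can be calculated as:

$$\mu_k(\mathbf{x}) \propto \frac{1}{N_k^{\,L-1}} \prod_{i=1}^{L} cm^{i}_{k,s_i}, \qquad k = 1, \dots, c \tag{18}$$

where $\mathbf{CM}^{i}$ is the $c \times c$ confusion matrix obtained by applying classifier $D_i$ to the training dataset, whose entry $cm^{i}_{k,s}$ counts the training samples of true class $\omega_k$ labeled $\omega_s$ by $D_i$, $N_k$ is the number of training samples of class $\omega_k$, and $N_{\mathrm{tr}}$ is the total number of training samples.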
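A minimal NumPy sketch of the NB combination: for every candidate class, the entries of each classifier's confusion matrix in the column of the label it predicted are multiplied, and the product is divided by the class count raised to L - 1. Logarithms are used to avoid underflow with many classes. The function name and the optional additive smoothing (which prevents a single zero count from vetoing a class) are assumptions for illustration. Rows are true classes and columns are predicted classes, the same convention as scikit-learn's `confusion_matrix`.

```python
import numpy as np


def nb_combine(predictions, confusion_matrices, smoothing: float = 0.0) -> int:
    """Naive Bayes combination of crisp labels from L classifiers.

    predictions        : sequence of L predicted class indices s_i
    confusion_matrices : L arrays of shape (c, c); entry [k, s] counts training samples
                         of true class k that classifier i labeled as s
    smoothing          : optional additive constant applied to every count (assumption)
    Returns the class index k maximizing prod_i cm_i[k, s_i] / N_k**(L - 1).
    """
    cms = [np.asarray(cm, dtype=float) + smoothing for cm in confusion_matrices]
    n_classifiers = len(cms)
    n_k = cms[0].sum(axis=1)  # training samples per true class (row sums)
    log_support = -(n_classifiers - 1) * np.log(np.maximum(n_k, 1e-300))
    for cm, s in zip(cms, predictions):
        log_support += np.log(np.maximum(cm[:, s], 1e-300))  # column of the predicted label
    return int(np.argmax(log_support))


# Two-class toy example with three classifiers (10 training samples per class)
cm1 = [[8, 2], [1, 9]]
cm2 = [[7, 3], [2, 8]]
cm3 = [[9, 1], [5, 5]]
print(nb_combine([0, 0, 1], [cm1, cm2, cm3]))
```

Unlike majority voting, this rule weighs each vote by how reliable the corresponding classifier was on the training data for that particular label.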

To illustrate the performance of the proposed Fusion Classifier schemes to deal with the leak localization in WDS, this section presents different fault scenarios generated by a PDD simulation of a real-world case study. In addition, the proposed fusion scheme performance is compared with each individual classifier.

The hydraulic model of the case study corresponds to Network 1, used to evaluate algorithm performance in the contest 'The Battle of the Water Sensor Networks' (BWSN) (Ostfeld et al. 2008). This network consists of 126 nodes, eight valves, two tanks, two reservoirs, three pumps, and 168 pipes whose nominal diameter (ND) ranges from 203.2 to 609.6 mm. The average pressure in the distribution network is 72 mWG (meters of water column, approximately 706 kPa), as illustrated in Figure 2 (Klise et al. 2017).
Figure 2

Pressure in network under study [mWG].

Figure 3 shows the global expected demand of the network over 96 hours, computed as the sum of the demands at all junctions.
Figure 3

Global expected demand in network under study.

Before implementing the fusion scheme, the number of sensors and their positions must be defined. First, to determine the number of sensors, the clustering techniques fuzzy c-means (FCM), evidential c-means (ECM), and kernel fuzzy c-means (KFCM) are used to assess the similarity between the fault signatures provided by the sensitivity matrix. Since the natural clusters are not known in advance, the clustering algorithms are executed for different numbers of clusters, and the quality of each partition is evaluated through validity indexes. In the case study, the number of clusters is varied iteratively from 3 to 10. The cluster validity indexes considered are the Xie-Beni index for FCM (Miyamoto et al. 2008), the ECM validity index proposed in Masson & Denœux (2008) for ECM, and the trace of the covariance matrix for KFCM (Girolami 2002). For each method, the appropriate number of clusters is the one that minimizes its index over the considered range. Figure 4 illustrates the number of clusters obtained with each method: two of the three methods agree that the appropriate number of clusters is three. Given that the number of clusters is three, the number of sensors to be employed is also three.
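For reference, the Xie-Beni index used with FCM divides the membership-weighted compactness of the partition by n times the smallest squared distance between cluster centres, so lower values indicate compact, well-separated clusters. The sketch below computes it from memberships and centres produced by any FCM implementation (not included); the function name and the fuzzifier default m = 2 are assumptions. Selecting the number of clusters would then amount to running FCM for c = 3, ..., 10 and keeping the c with the lowest index.

```python
import numpy as np


def xie_beni(X, U, V, m: float = 2.0) -> float:
    """Xie-Beni validity index (lower is better).

    X : (n, d) data points, e.g. fault signatures
    U : (c, n) fuzzy membership matrix, each column summing to 1
    V : (c, d) cluster centres
    m : fuzzifier used by fuzzy c-means
    """
    X, U, V = (np.asarray(a, dtype=float) for a in (X, U, V))
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # (c, n) squared distances
    compactness = ((U ** m) * d2).sum()
    centre_d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(centre_d2, np.inf)  # ignore the distance of a centre to itself
    separation = X.shape[0] * centre_d2.min()
    return float(compactness / separation)


# Two well-separated groups with a hard (0/1) partition
X_toy = [[0, 0], [0, 1], [10, 0], [10, 1]]
U_toy = [[1, 1, 0, 0], [0, 0, 1, 1]]
V_toy = [[0, 0.5], [10, 0.5]]
print(xie_beni(X_toy, U_toy, V_toy))
```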
Figure 4

Validity index of clustering techniques.

Once the number of clusters is defined, the sensor placement problem is addressed as an optimization task, as in Casillas et al. (2015). Building on this framework, a genetic algorithm (GA) with roulette-wheel selection, uniform recombination with a crossover rate chosen in [0, 1], and bit-flip mutation is implemented to find a near-optimal sensor placement (Man et al. 2001). Figure 5 shows the hard clustering obtained with the KFCM algorithm and the sensor positions found by the GA, configured with a crossover rate of 0.7, a population of 30 individuals, and a stopping criterion of 100 generations.
Figure 5

Hard classification of fault signatures and sensor position in Network under study.


It is important to point out that the population (sensor locations) in this optimization method can be configured by the user to exclude nodes where sensor installation may not be feasible in a real-world network due to practical constraints. This can be achieved by defining a restricted search space in the algorithm, where only eligible nodes are considered during the optimization process. Constraints can be incorporated by assigning a predefined list of feasible locations or by introducing penalty functions that discourage the selection of non-viable nodes.
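The GA configuration described above, together with a restricted search space, can be sketched as follows. This is a compact illustration, not the implementation used in the study: the repair step that keeps exactly `n_sensors` ones after crossover and mutation, the default mutation rate of 1/(number of eligible nodes), the separately tracked best solution, and the `eligible` argument for excluding infeasible nodes are all assumptions. `cost` can be any function returning a value in [0, 1], such as the average isolation error of (12).

```python
import random


def ga_sensor_placement(cost, n_nodes, n_sensors, eligible=None, pop_size=30,
                        generations=100, crossover_rate=0.7, mutation_rate=None, seed=0):
    """Search a 0/1 placement vector with exactly n_sensors ones that minimizes cost(q).

    cost     : callable, 0/1 list of length n_nodes -> value in [0, 1]
    eligible : node indices where a sensor may be installed (default: every node)
    """
    rng = random.Random(seed)
    allowed = list(range(n_nodes)) if eligible is None else sorted(set(eligible))
    allowed_set = set(allowed)
    if n_sensors > len(allowed):
        raise ValueError("more sensors than eligible nodes")
    pm = 1.0 / len(allowed) if mutation_rate is None else mutation_rate

    def from_indices(idx):
        q = [0] * n_nodes
        for i in idx:
            q[i] = 1
        return q

    def repair(q):
        # Restore exactly n_sensors ones, using eligible positions only
        on = [i for i in allowed if q[i]]
        off = [i for i in allowed if not q[i]]
        rng.shuffle(on)
        rng.shuffle(off)
        while len(on) > n_sensors:
            off.append(on.pop())
        while len(on) < n_sensors:
            on.append(off.pop())
        return from_indices(on)

    pop = [from_indices(rng.sample(allowed, n_sensors)) for _ in range(pop_size)]
    best = min(pop, key=cost)
    n_parents = pop_size + pop_size % 2  # even number so parents can be paired
    for _ in range(generations):
        # Roulette-wheel selection: probability proportional to 1 - cost
        weights = [1.0 - cost(q) + 1e-9 for q in pop]
        parents = rng.choices(pop, weights=weights, k=n_parents)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene is exchanged with probability 0.5
                swap = [rng.random() < 0.5 for _ in range(n_nodes)]
                a, b = ([y if s else x for x, y, s in zip(a, b, swap)],
                        [x if s else y for x, y, s in zip(a, b, swap)])
            children += [a, b]
        # Bit-flip mutation on eligible genes, followed by the cardinality repair
        pop = [repair([1 - g if i in allowed_set and rng.random() < pm else g
                       for i, g in enumerate(q)])
               for q in children[:pop_size]]
        best = min([best] + pop, key=cost)  # keep the best placement found so far
    return best
```

Restricting `eligible` implements the "predefined list of feasible locations" option; a penalty term added to `cost` would be the alternative mentioned in the text.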

Classifiers training

The design of the classifiers in the fusion scheme requires a prior offline training stage. This stage is essential because the availability of representative data is a necessary condition for obtaining a good classifier. However, the data available in a real WDN can be limited by the amount of leak event information stored by the operating agency. One way to obtain a complete training set is to use a hydraulic model (Soldevila et al. 2017), such as the WNTR simulator. A scheme similar to that of Figure 1 is employed for data generation, where the real network is replaced by a simulation disturbed by measurement noise, leaks of different magnitudes, and a percentage of uncertainty in the demand estimates, as illustrated in Figure 6. To obtain a realistic scenario, the pressure measurements are assumed to be perturbed by Gaussian noise of ±5% with respect to the mean value of the pressure measurements, and the demand estimation is assumed to have an uncertainty of 5%. Additionally, the leaks are generated at different positions along the pipes, with leak orifice diameters ranging from 0.005 m to 0.05 m.
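The perturbation step of the data-generation scheme can be sketched independently of the hydraulic simulator. The sketch assumes the simulated heads at the sensor nodes are already available as arrays (for example, exported from WNTR runs, which are not reproduced here). Two interpretations are assumptions: demand uncertainty is modeled as a uniform multiplier in [1 - u, 1 + u], and the "±5%" measurement noise is taken as zero-mean Gaussian noise whose standard deviation is 5% of the mean measured pressure. The function names are hypothetical.

```python
import numpy as np


def perturb_demands(demands, uncertainty=0.05, rng=None):
    """Scale each nodal demand by a random factor in [1 - uncertainty, 1 + uncertainty]."""
    rng = np.random.default_rng() if rng is None else rng
    demands = np.asarray(demands, dtype=float)
    return demands * (1.0 + rng.uniform(-uncertainty, uncertainty, size=demands.shape))


def noisy_residuals(h_estimated, h_measured, noise_level=0.05, rng=None):
    """Residuals r = h_estimated - (h_measured + noise) at the sensor nodes.

    Noise is zero-mean Gaussian with standard deviation noise_level * mean(h_measured).
    """
    rng = np.random.default_rng() if rng is None else rng
    h_measured = np.asarray(h_measured, dtype=float)
    sigma = noise_level * abs(h_measured.mean())
    noisy = h_measured + rng.normal(0.0, sigma, size=h_measured.shape)
    return np.asarray(h_estimated, dtype=float) - noisy


rng = np.random.default_rng(1)
print(perturb_demands([1.0, 2.0, 3.0], rng=rng))
print(noisy_residuals([72.0, 70.5, 68.0], [71.2, 70.1, 67.9], rng=rng))
```

In a full pipeline, the perturbed demands would feed the leak simulation that plays the role of the real network, while the leak-free estimate keeps the nominal demands, so the residuals carry both the leak signature and the modeling mismatch.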
Figure 6

General scheme of data generation.


In the fusion schemes, the tuning parameters of each algorithm must be defined: the number of nearest neighbors for the KNN algorithm; the number of hidden layers and the maximum number of iterations for the MLP classifier; and the split criterion and maximum depth for the decision tree. It is also important to consider that estimation uncertainties and measurement noise affect the performance of the classification methods, so poor performance may be obtained despite a good tuning of the classifiers. To smooth out the effect of the noise and uncertainties considered in the test scenarios, the confusion matrix information is accumulated over a time window, or time horizon (Ferrandez-Gamot et al. 2015). The $j$th column of the column-normalized confusion matrix contains, for each node $i$, the probability that the leak is at node $i$ when the classifier predicts that it is at node $j$, according to the information available at time instant $t$. The columns selected by the predictions within the time window are summed, and the node with the largest accumulated value is taken as the most probable leak position. To define the tuning parameters of each algorithm, test simulations with different leaks are run for several candidate parameter values. The values that yield the best performance are presented in Table 1.
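The time-window smoothing can be sketched as follows: the confusion matrix is normalized by columns so that column j holds P(true node i | predicted node j), the columns selected by the last few predictions are summed, and the node with the largest score is returned. The function names are hypothetical; the window of five samples matches the test setup described later (10-min sampling, last five measurements), and rows are assumed to be true nodes and columns predicted nodes, as in scikit-learn's `confusion_matrix`.

```python
import numpy as np


def column_normalize(cm):
    """Convert counts (rows = true node, cols = predicted node) to P(true i | predicted j)."""
    cm = np.asarray(cm, dtype=float)
    col_sums = cm.sum(axis=0, keepdims=True)
    return np.divide(cm, col_sums, out=np.zeros_like(cm), where=col_sums > 0)


def windowed_location(predictions, cm_normalized, window=5):
    """Sum the columns of the last `window` predictions and return the most probable node."""
    recent = list(predictions)[-window:]
    scores = cm_normalized[:, recent].sum(axis=1)  # repeated predictions add up
    return int(np.argmax(scores))


cm = column_normalize([[8, 1, 0],
                       [2, 7, 1],
                       [0, 2, 9]])
# A noisy stream of per-sample predictions; the window favors node 1
print(windowed_location([0, 1, 1, 2, 1], cm))
```

Summing probability columns rather than counting raw votes lets a prediction that the classifier often confuses with a neighbor still contribute evidence to that neighbor.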

Table 1

Tuning parameters of classification algorithms

Algorithm | Parameter | Value
KNN | Neighbors number | 
 | Distance metric | Euclidean
MLP | Activation function | Hyperbolic tangent (tanh)
 | Maximum iterations | 1,500
 | Hidden layers | 200
Decision Tree Classifier | Criterion | Gini
 | Maximum depth | 100

To demonstrate the effectiveness of the fusion scheme in the case study, we present the results obtained from 1,000 distinct tests. In each test, a leak can appear at any node, with a leak orifice diameter ranging from 0.005 m to 0.05 m. The pressure measurements are affected by Gaussian noise with an amplitude of ±10% relative to the mean value of the pressure residuals, and the nodal pressure estimations have an uncertainty of ±10%. An example of the measurements obtained at the sensors is presented in Figure 7, where a leak is introduced at t = 84 h (dotted red line). The sampling time of the sensor measurements is 10 min, the time window used to smooth the noise comprises the last five measurements, and the total simulation time is 96 h. Furthermore, Figure 8 shows representative results of the classification algorithms.
Figure 7

Pressure measurements at nodes 22, 75, and 115 in the case study.

Figure 8

Representative results of the classification algorithms.

The results illustrated in Figure 9 show that, among the evaluated classifiers, the Voting Classifier is the most effective in terms of accuracy, precision, and F1-score, with an accuracy of 0.80, a precision of 0.72, and an F1-score of 0.84. Its mean distance of 7.05 indicates that its predictions lie closer to the leak points. The Naive Bayes Fusion was also evaluated, but it was less effective than the Voting Classifier: its classification metrics did not reach those of the Voting Classifier, and its predictions were farther from the leak points, with a mean distance of 8.53. Among the individual classifiers, the Decision Tree performed best. Although it does not match the Voting Classifier across all metrics, the Decision Tree captures patterns in the data and provides a good approximation of leak locations, with an accuracy of 0.76, a precision of 0.68, an F1-score of 0.81, and a mean distance of 7.58. The performance metrics were calculated as outlined in the scikit-learn documentation on model evaluation (scikit-learn developers 2024).
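A sketch of how these metrics could be computed with scikit-learn. Two choices are assumptions, since the text does not specify them: the multiclass averaging mode for precision and F1 (macro averaging is used here, and the choice changes the reported values) and the definition of the mean distance, taken here as the Euclidean distance between the coordinates of the predicted and true leak nodes, in whatever unit the coordinates use. The helper name `evaluate` and the coordinate dictionary are hypothetical.

```python
import math

from sklearn.metrics import accuracy_score, f1_score, precision_score


def evaluate(y_true, y_pred, coords, average="macro"):
    """Accuracy, precision, F1-score, and mean distance between predicted and true leak nodes.

    coords  : dict node -> (x, y); the distance unit is that of the coordinates
    average : averaging mode for multiclass precision and F1 (assumed, not stated in the text)
    """
    mean_dist = sum(math.dist(coords[t], coords[p]) for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
        "mean_distance": mean_dist,
    }


coords_toy = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (6.0, 8.0)}
print(evaluate([0, 1, 2, 2], [0, 1, 1, 2], coords_toy))
```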
Figure 9

Comparison of classifier performance metrics and mean distance.


The results of the fusion classifier schemes for leak localization in WDNs show that the Voting Classifier outperforms the other classifiers in terms of accuracy, precision, and F1-score. Its performance (accuracy of 0.80, precision of 0.72, and F1-score of 0.84) suggests that combining multiple classifiers can improve leak localization. Moreover, its mean distance of 7.05 indicates that the Voting Classifier's predictions lie closer to the actual leak locations.
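
Majority (hard) voting over the three base classifiers can be sketched with scikit-learn's `VotingClassifier`. This is an assumption about the implementation: the section does not state which library or hyperparameters were used, so `n_neighbors`, the MLP layer size, and the synthetic residual generator below are hypothetical stand-ins for the model-based pressure residuals.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_residuals(n_per_class=50, n_nodes=4, noise=0.05, seed=0):
    """Hypothetical data: a leak at node k mainly lowers the residual at sensor k."""
    rng = np.random.default_rng(seed)
    signatures = -np.eye(n_nodes)                      # one sensitivity vector per leak node
    y = np.repeat(np.arange(n_nodes), n_per_class)
    size = rng.uniform(0.5, 1.0, size=len(y))          # random leak magnitude
    X = size[:, None] * signatures[y] + rng.normal(0.0, noise, (len(y), n_nodes))
    return X, y


def build_voting_classifier(seed=0):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    mlp = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed))
    tree = DecisionTreeClassifier(random_state=seed)
    # voting="hard": each base classifier casts one vote and the most voted node wins
    return VotingClassifier([("knn", knn), ("mlp", mlp), ("tree", tree)], voting="hard")


X, y = make_synthetic_residuals()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = build_voting_classifier().fit(X_tr, y_tr)
accuracy = float(np.mean(clf.predict(X_te) == y_te))
```

With `voting="hard"`, a sample on which all three classifiers disagree is resolved by scikit-learn in ascending label order, so the tie-breaking rule is worth keeping in mind when many candidate leak nodes are close together.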

In contrast, the Naive Bayes Fusion classifier performs worse than the Voting Classifier: its lower classification metrics and larger mean distance of 8.53 indicate that its predictions are less accurate and lie farther from the true leak points. This lower performance may be due to the choice of tuning parameters, which could be analyzed further in future studies.
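
For reference, one common formulation of Naive Bayes fusion (Kuncheva 2014) treats the labels output by the base classifiers as conditionally independent given the true leak node and estimates each classifier's likelihoods from its confusion matrix on validation data. The sketch below follows that formulation with a Laplace smoothing parameter `alpha`, a parameter of the kind whose tuning can affect performance; it is not necessarily the exact variant used in this study, and the function names and toy data are hypothetical.

```python
import numpy as np


def fit_nb_fusion(y_val, base_preds_val, n_classes, alpha=1.0):
    """Estimate Naive Bayes fusion tables from validation data.

    y_val: (n_samples,) true leak-node labels.
    base_preds_val: (n_classifiers, n_samples) labels output by each base classifier.
    Returns log priors (K,) and log P(classifier i outputs s | true node k), shape (L, K, K).
    """
    y_val = np.asarray(y_val)
    base_preds_val = np.asarray(base_preds_val)
    counts = np.bincount(y_val, minlength=n_classes).astype(float)
    log_priors = np.log((counts + alpha) / (counts.sum() + alpha * n_classes))
    tables = []
    for preds in base_preds_val:
        cm = np.zeros((n_classes, n_classes))
        np.add.at(cm, (y_val, preds), 1.0)   # confusion matrix: rows = true, cols = output
        tables.append(np.log((cm + alpha) / (counts[:, None] + alpha * n_classes)))
    return log_priors, np.stack(tables)


def predict_nb_fusion(log_priors, log_tables, base_preds):
    """Fused label: argmax_k [log P(k) + sum_i log P(s_i | k)]."""
    base_preds = np.asarray(base_preds)
    scores = np.repeat(log_priors[:, None], base_preds.shape[1], axis=1)
    for table, preds in zip(log_tables, base_preds):
        scores += table[:, preds]            # (K, n_samples) log-likelihood contribution
    return scores.argmax(axis=0)


y_val = [0, 0, 1, 1, 2, 2]
val_preds = [[0, 0, 1, 1, 2, 2],   # classifier A: always right on validation data
             [0] * 6,              # classifier B: always outputs node 0
             [0] * 6]              # classifier C: always outputs node 0
log_p, log_t = fit_nb_fusion(y_val, val_preds, n_classes=3)
fused = predict_nb_fusion(log_p, log_t, [[2, 1], [0, 0], [0, 0]])
```

In this toy case, classifiers B and C are uninformative, so the fused decision follows the reliable classifier A and returns nodes 2 and 1, whereas a plain majority vote would return node 0 for both samples. The quality of the confusion-matrix estimates, and hence the validation set size and `alpha`, therefore matters more here than in majority voting.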

The Decision Tree classifier, although less effective than the Voting Classifier, outperformed the other individual classifiers, achieving an accuracy of 0.76, a precision of 0.68, and an F1-score of 0.81, and providing a good approximation of leak locations. Given these results, a tree-based ensemble such as Random Forest could be explored as a potential approach to improve accuracy.
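
A first experiment along these lines could compare a single Decision Tree with a Random Forest under cross-validation. The sketch uses scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier`; the synthetic residual signatures, the number of nodes and sensors, and `n_estimators=200` are hypothetical, so the resulting scores say nothing about the case-study network.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_nodes, n_per_class, n_sensors = 6, 40, 3

# Hypothetical sensitivity of each sensor's residual to a leak at each node
signatures = -rng.uniform(0.2, 1.0, size=(n_nodes, n_sensors))
y = np.repeat(np.arange(n_nodes), n_per_class)
leak_size = rng.uniform(0.5, 1.0, size=(len(y), 1))
X = signatures[y] * leak_size + rng.normal(0.0, 0.05, size=(len(y), n_sensors))

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
# Mean 5-fold cross-validated accuracy (folds are stratified for classifiers)
cv_accuracy = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
```

A Random Forest averages many trees trained on bootstrap samples and random feature subsets, which typically reduces the variance of a single tree; whether that translates into better leak localization would have to be checked on the actual residual data.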

These results highlight the importance of selecting an appropriate classifier and tuning its parameters to the specific characteristics of the WDN. Additionally, incorporating uncertainty and noise in the pressure measurements during the data generation phase made the problem harder, but it also made it possible to assess the robustness of each classifier under realistic operating conditions.
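
Parameter tuning of the kind discussed above can be automated with cross-validated grid search. The following sketch tunes the KNN classifier with scikit-learn's `GridSearchCV`, scoring by macro F1-score; the parameter grid, the noisy synthetic residuals, and the choice of KNN as the example are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_knn(X, y, k_values=(1, 3, 5, 7, 9), cv=5):
    """Grid search over the number of neighbours and the vote weighting."""
    pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
    grid = {"knn__n_neighbors": list(k_values), "knn__weights": ["uniform", "distance"]}
    search = GridSearchCV(pipe, grid, cv=cv, scoring="f1_macro")
    return search.fit(X, y)


# Hypothetical noisy residuals: 5 leak nodes, 4 sensors, 40 samples per node
rng = np.random.default_rng(2)
signatures = -rng.uniform(0.2, 1.0, size=(5, 4))
y = np.repeat(np.arange(5), 40)
X = signatures[y] + rng.normal(0.0, 0.1, size=(len(y), 4))

search = tune_knn(X, y)
best_k = search.best_params_["knn__n_neighbors"]
```

Because noise is part of the data-generation step, repeating the search at several noise levels is one way to check how sensitive the selected parameters are.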

A key factor influencing the performance of the proposed method is the simulation time. Longer simulation times provide more data to train the classifiers and can potentially improve the accuracy of leak localization. This benefit comes at the cost of increased computational effort, which could affect overall system performance, especially in real-time applications. Future work could therefore explore the trade-off between simulation time and the quality of the leak estimation results, so that the technique remains effective in both practical and time-sensitive settings.

This study presents a mixed model-based fusion approach to locate leaks in WDNs, focusing on the identification of new leaks. Three classifiers are used: the KNN algorithm, the MLP, and the Decision Tree classifier. Their outputs are combined through two fusion techniques, the Voting Classifier and Naive Bayes Fusion. The results underscore the benefits of fusion methods over individual classifiers. The Voting Classifier emerges as the most effective, achieving an accuracy of 0.80, an F1-score of 0.84, and a mean distance of 7.05 from the actual leak location. Although Naive Bayes Fusion yields slightly lower metrics, the results confirm that model fusion enhances both accuracy and robustness in leak localization. Two main directions are proposed for future work: first, refining the design of the Naive Bayes Fusion method to improve its performance; and second, extending the evaluation framework to scenarios with multiple simultaneous leaks, to better reflect real-world operating conditions.

We gratefully acknowledge the financial support and facilities provided by the Tecnológico Nacional de México (TecNM), TecNM Campus Zacatecas Norte, TecNM Campus Chihuahua, and the Universidad Autónoma de Zacatecas (UAZ). This study was published with the support of the Instituto de Innovación y Competitividad of the Secretaría de Innovación y Desarrollo Económico del Estado de Chihuahua.

1

Pressure differences between leak-free and leak-affected states.

2

Network 1 is a real network whose location is hidden so that no participant gains an advantage in the contest.

This work was published with the support of the Instituto de Innovación y Competitividad of the Secretaría de Innovación y Desarrollo Económico of the State of Chihuahua, México.

All authors declare that this manuscript is original and has not been submitted to another journal for simultaneous consideration, nor has it been previously published in any language or format. The study is presented in full, without being divided into separate parts, and the results are honestly reported, without fabrication, falsification, or inappropriate data manipulation. Only open-source software was used during the research, and all authors agreed with the designation of the corresponding author and the established authorship order.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict of interest.

Cabrera, E. & Vela, A. F. (2013) Improving Efficiency and Reliability in Water Distribution Systems. Heidelberg, Germany: Springer Science & Business Media B.V.
Crowl, D. A. & Louvar, J. F. (2001) Chemical Process Safety: Fundamentals with Applications. New Jersey, USA: Pearson Education.
Ferrandez-Gamot, L., Busson, P., Blesa, J., Tornil-Sin, S., Puig, V., Duviella, E. & Soldevila, A. (2015) Leak localization in water distribution networks using pressure residuals and classifiers. IFAC-PapersOnLine 48 (21), 220–225.
Gamboa-Medina, M. M. & Reis, L. F. R. (2017) Sampling design for leak detection in water distribution networks. In: Procedia Engineering, Vol. 186. Elsevier, pp. 460–469. XVIII International Conference on Water Distribution Systems, WDSA2016, Cartagena, Colombia.
Ghate, V. N. & Dudul, S. V. (2010) Optimal MLP neural network classifier for fault detection of three phase induction motor. Expert Systems with Applications 37 (4), 3468–3481.
Girolami, M. (2002) Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks 13, 780–784.
Giustolisi, O., Savic, D. & Kapelan, Z. (2008) Pressure-driven demand and leakage simulation for water distribution networks. Journal of Hydraulic Engineering 134 (5), 626–635.
Heckemann, R. A., Hajnal, J. V., Aljabar, P., Rueckert, D. & Hammers, A. (2006) Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage 33 (1), 115–126.
Isermann, R. (2006) Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance. Heidelberg, Germany: Springer Science & Business Media.
Jara-Arriagada, C. & Stoianov, I. (2024) Pressure-induced fatigue failures in cast iron water supply pipes. Engineering Failure Analysis 155, 107731.
Klise, K., Hart, D., Moriarty, D., Bynum, M. L., Murray, R., Burkhardt, J. & Haxton, T. M. (2017) Water Network Tool for Resilience (WNTR) User Manual. Technical report, United States Environmental Protection Agency.
Kuncheva, L. I. (2014) Combining Pattern Classifiers: Methods and Algorithms, 2nd edn. New Jersey, USA: John Wiley & Sons.
Leu, S. S. & Bui, Q. N. (2016) Leak prediction model for water distribution networks created using a Bayesian network learning approach. Water Resources Management 30, 2719–2733.
Man, K.-F., Tang, K. S. & Kwong, S. (2001) Genetic Algorithms: Concepts and Designs. London, UK: Springer Science & Business Media.
Mangai, U., Samanta, S., Das, S. & Chowdhury, P. (2010) A survey of decision fusion and feature fusion strategies for pattern classification. IETE Technical Review 27 (4), 293.
Mar, N. S., Yarlagadda, P. K. & Fookes, C. (2011) Design and development of automatic visual inspection system for PCB manufacturing. Robotics and Computer-Integrated Manufacturing 27, 949–962.
Masson, M. H. & Denœux, T. (2008) ECM: an evidential version of the fuzzy C-means algorithm. Pattern Recognition 41 (4), 1384–1397.
Meniconi, S., Maietta, F., Alvisi, S., Capponi, C., Marsili, V., Franchini, M. & Brunone, B. (2022) Consumption change-induced transients in a water distribution network: laboratory tests in a looped system. Water Resources Research 58 (10), e2021WR031343.
Meniconi, S., Rubin, A., Tirello, L., Doro, A., Brunone, B. & Capponi, C. (2024) Mapping pressure surge source in urban water networks: integrating low- and high-frequency pressure data with an illustrative real case study. Water Resources Research 60 (8), e2023WR036773.
Miyamoto, S., Ichihashi, H. & Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Berlin, Heidelberg, Germany: Springer.
Ostfeld, A., Uber, J. G., Salomons, E., Berry, J. W., Hart, W. E., Phillips, C. A., Watson, J.-P., Dorini, G., Jonkergouw, P., di Pierro, F., Khu, S.-T., Savic, D., Eliades, D. G., Polycarpou, M. M., Ghimire, S. R., Barkdoll, B. D., Gueli, R., Huang, J. J., McBean, E. A., James, W., Krause, A., Leskovec, J., Isovitsch, S., Xu, J., Guestrin, C., VanBriesen, J., Small, M., Fischbeck, P., Preis, A., Propato, M., Piller, O., Trachtman, G. B., Wu, Z. Y. & Walski, T. (2008) The battle of the water sensor networks (BWSN): a design challenge for engineers and algorithms. Journal of Water Resources Planning and Management 134 (6), 556–568.
Pérez, R., Puig, V., Pascual, J., Quevedo, J., Landeros, E. & Peralta, A. (2011) Methodology for leakage isolation using pressure sensitivity analysis in water distribution networks. Control Engineering Practice 19, 1157–1167.
Pérez, R., Sanz, G., Puig, V., Quevedo, J., Cuguero Escofet, M., Nejjari, F., Meseguer, J., Cembrano, G., Mirats Tur, J. & Sarrate, R. (2014) Leak localization in water networks: a model-based methodology using pressure sensors applied to a real network in Barcelona. IEEE Control Systems 34, 24–36.
Pudar, R. S. & Liggett, J. A. (1992) Leaks in pipe networks. Journal of Hydraulic Engineering 118, 1031–1046. Reston, VA, USA: American Society of Civil Engineers.
Puig, V., Duviella, E., Soldevila, A., Fernandez-Canti, R., Blesa, J. & Tornil-Sin, S. (2016) Leak localization in water distribution networks using a mixed model-based/data-driven approach. Control Engineering Practice 55, 162–173.
Rezaei, H., Ryan, B. & Stoianov, I. (2015) Pipe failure analysis and impact of dynamic hydraulic conditions in water supply networks. Procedia Engineering 119, 253–262.
Rokach, L. (2014) Pattern Classification Using Ensemble Methods. New Jersey, USA: World Scientific.
Rossman, L. A., Woo, H., Tryby, M., Shang, F., Janke, R. & Haxton, T. (2002) Manual del usuario de EPANET 2.2. Washington, DC, USA: US Environmental Protection Agency (EPA).
Sarrate, R., Blesa, J. & Nejjari, F. (2014) Clustering techniques applied to sensor placement for leak detection and location in water distribution networks. In: 22nd Mediterranean Conference on Control and Automation, 2014. Palermo, Italy: IEEE, pp. 109–114.
Schwaller, J. & van Zyl, J. V. (2015) Modeling the pressure-leakage response of water distribution systems based on individual leak behavior. Journal of Hydraulic Engineering 141 (5), 04014089.
scikit-learn developers (2024) Scikit-learn model evaluation documentation. (Online; accessed 1 January 2024).
Soldevila, A., Fernandez-Canti, R. M., Blesa, J., Tornil-Sin, S. & Puig, V. (2017) Leak localization in water distribution networks using Bayesian classifiers. Journal of Process Control 55, 1–9.
Todini, E. (2010) A more realistic approach to the “extended period simulation” of water distribution networks. In: Advances in Water Supply Management. London, UK: Taylor & Francis.
Tsai, C.-F. (2014) Combining cluster analysis with classifier ensembles to predict financial distress. Information Fusion 19, 40–49.
Valizadeh, S., Moshiri, B. & Salahshoor, K. (2009) Leak detection in transportation pipelines using feature extraction and KNN. Proceedings of the International Conference on Computer Engineering and Applications 2, 348–351.
Vrachimis, S. G., Timotheou, S., Eliades, D. G. & Polycarpou, M. M. (2021) Leakage detection and localization in water distribution systems: a model invalidation approach. Control Engineering Practice 110, 104755.
Vrachimis, S. G., Eliades, D. G., Taormina, R., Kapelan, Z., Ostfeld, A., Liu, S., Kyriakou, M., Pavlou, P., Qiu, M. & Polycarpou, M. M. (2022) Battle of the leakage detection and isolation methods. Journal of Water Resources Planning and Management 148 (12), 04022068.
Walters, G. A. & Savic, D. A. (1996) Recent applications of genetic algorithms to water system design. WIT Transactions on Ecology and the Environment 18, 151–160. London, UK.
World Water Assessment Programme United Nations and UN-Water (2009) Water in a Changing World, Vol. 1. Earthscan.
Zhu, Z. & Nandi, A. K. (2015) Automatic Modulation Classification: Principles, Algorithms and Applications. Chichester, UK: John Wiley & Sons.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).