Accurate and rapid leak localization in water distribution networks is extremely important as it prevents further loss of water and reduces water scarcity. A framework for identifying relevant leak event parameters such as the leak location, leakage area, and start time is presented in this paper. Firstly, the proposed data-driven methodology consists of acquiring pressure data at nodes in the network through hydraulic simulations by randomly changing the leak event initial conditions (leak location, area, and start time). Pressure uncertainties are added to the sensor measurements in order to make the problem more realistic. Secondly, the acquired data are then used to train, test, and validate a machine learning model in order to predict the relevant parameters. The random forest and the histogram-based gradient boosting machine learning algorithms are investigated and compared for the leak detection problem. The proposed approach with the histogram-based gradient boosting algorithm shows high accuracy in predicting the true leak location.

  • A machine learning-based framework for water network leak localization is presented.

  • ML models are trained with data generated by the WNTR simulator.

  • Histogram-based gradient boosting outperforms the random forests algorithm.

  • HGB achieved a leak node search space reduction of 92% with added pressure uncertainty.

  • High correlation between pressure measurements and leak area (R2 = 75%) and leak start time (R2 = 97%).

Water distribution networks (WDN) play a significant role in multiple aspects of today's society, such as water supply, agriculture, farming, and energy production, making them crucial for the everyday life of every individual. Efficient and reliable water systems are very expensive and require a great amount of maintenance to function properly. Clean water is distributed using WDNs that need to be designed for sustainable usage while satisfying the needs of every end user. Quality and efficiency standards need to be very high to minimize the chance of leakage, which results in an estimated 126 billion cubic meters of water lost per year, with an associated cost of 39 billion USD annually (Liemberger & Wyatt (2019)). Between 20 and 30%, and in some networks up to 50%, of the water in a WDN is lost due to the slow and steady degradation of the pipe material, as described in the work by Kang et al. (2018). The biggest challenge in preventing water leakages is detecting the exact location of the leak, so that a course of action can be taken in a timely manner to prevent large financial losses as well as losses of valuable resources. Current leakage detection systems are also prone to faults that can lead to false alarms, which cause additional loss of resources.

Many leakage-detecting techniques have been used and optimized through the years. Leak detection methods can be broadly divided into active and passive approaches, based on the paper published by Puust et al. (2010). In an active approach, the leaks are detected by measuring acoustic signals, pressure signals, and vibrations, and by analyzing flow data. The passive approach, on the other hand, relies on real-life monitoring and inspection of the WDN. Active approaches can be further divided into transient-based, model-based, and data-driven methods. Transient-based leak detection methods are based on measurements of transient pressure signals that occur in the network (Colombo et al. (2009)). The model-based approach uses mathematical modeling to simulate the network state of operation (Perez et al. (2011)). In this approach, model calibration must be conducted to ensure an accurate reflection of the actual states in the real WDN. In summary, model-based approaches involve the construction of the mathematical model, calibration of the hydraulic model, detection of the leak by comparison of the model state and the actual state of the real system, and finally leak localization.

In data-driven approaches, leaks can be detected via statistical and signal-processing analyses of the acquired data, as described in the work by Cody et al. (2020). The advantage of this approach is that it does not require the construction of a complex mathematical model. Rather, it relies on detecting outlier values caused by abnormal events in the WDN (Geelen et al. (2019)). The generic framework for data-driven approaches consists of extracting relevant information from monitoring data, which can be historical or live-updated data. The data are preprocessed and transformed using different strategies to remove outliers or erroneous values in order to facilitate subsequent analyses. The next step is to choose a suitable leak detection strategy from among feature set classification, prediction classification, unsupervised clustering, and statistical methods. The feature set classification method consists of training a simple classification model with a machine learning algorithm to separate leak events from the normal operation data. The downside of this strategy is that it requires a large training dataset including numerous known leakage events in order to achieve high precision of the classification model.

Several published papers tackled the problem of predicting the existence of a leak in the WDN such as in the work by Fan et al. (2021) where an autoencoder artificial neural network (AEANN) is used to determine whether a leak is present in the WDN. The AEANN has an input layer of 11 nodes, each for a node of the WDN on which data are trained, and several hidden layers with an encoding layer of three nodes ending with 11 decoding nodes on the final layer. The employed strategy consists of training the model on data from the WDN without a leak and then passing input data with a leak to the model. The model outputs a reconstruction error that is compared to a threshold value indicating the presence of the leak. The accuracy of the described model depends on the compression ratio of the training data, demand uncertainty, and leak size that need to be optimized to maximize its performance.

The paper by Hu et al. (2021) proposed a leakage detection model based on density-based spatial clustering of applications with noise (DBSCAN) and a multi-scale fully convolutional network (MFCN) (DBSCAN-MFCN) to detect leakage in a WDN. The DBSCAN is used to divide the pipes into different areas according to the leakage characteristics of the pipes. The MFCN determines the leakage location based on the residuals of the pressure and flow measurements. The accuracy of the proposed method was a 78, 28, and 72% improvement over support vector machine (SVM), K-nearest neighbors (KNN), and naive Bayes classifier (NBC)-based approaches, respectively.

In the work by Ravichandran et al. (2021), a binary classifier was used to identify the leak and no-leak cases using acoustic signals. The features have been extracted from the acoustic signals, such as power spectral density and time-series data. The data have been collected from multiple cities across North America. A Multi-Strategy Ensemble Learning (MEL) using a Gradient Boosting Tree (GBT) was used as a classification model, which provides better precision in comparison with other classification models such as KNN and ANN. More improvements have been made by combining a number of GBT classifiers in a parallel ensemble method called bagging. The proposed methodology achieved a significant reduction in false-positive events.

The paper by Zhou et al. (2019) presents a hybrid intelligent method for pipeline leak detection. Firstly, signal denoising and reconstruction are performed based on local mean decomposition (LMD), and a kernel principal component analysis (KPCA) method is used for feature dimension reduction. Subsequently, a K-means algorithm is used for the clustering of various operating modes. A cascade support vector data description (Cas-SVDD) was used for pipeline leak detection. The Cas-SVDD is compared with a sequential support vector data description (S-SVDD), and the results point to a reduction of the false-positive rate. Beyond detecting whether a leak is present, which is an already well-documented problem, the next challenge is to locate the exact position in the WDN where the leak occurred.

A random forest (RF) classifier is used in the work by Lučin et al. (2021) to detect the node in which the leak happened. This paper was a continuation of the work by Grbčić et al. (2020), where an RF classifier was used to find the injection source of a pollutant in the WDN. In the work by Lučin et al. (2021), the RF classifier was trained on a large number of simulation results to predict the node with the occurring leakage. Every simulation result in the dataset was obtained using the EPANET hydraulic simulation software with a known leak location. The RF model requires a great amount of data for its training, but this problem was solved by farming a large dataset on a supercomputer. The RF model was trained on five different datasets, ranging in size from 100,000 to 500,000 simulations, and the accuracy of the results is documented to be in the range of 86–98% in terms of true positive rate. It should be noted that the accuracy of the model greatly depends on the size of the WDN and the amount of data provided for training.

In the work by Kang et al. (2018), the solution to the problem of detecting a leak and localizing it was presented in the form of an ensemble model in which a one-dimensional convolutional neural network coupled with a support vector machine (1D-CNN-SVM) detects the leak and a graph-based local search algorithm determines the position of the leak. The proposed ensemble architecture combines the outputs of the CNN and SVM to achieve high accuracy. This particular architecture was chosen to provide a more heterogeneous ensemble, in which model diversity makes the errors less correlated than in homogeneous classifier ensembles. This technique also combines a probabilistic model with a non-probabilistic model. The trained ensemble method scored very high on all three evaluation metrics, 0.993, 0.982, and 0.998, respectively. The signals classified as leakage signals are then fed into the localization algorithm as input to determine the location of the leakage.

In the work by Mashhadi et al. (2021), an investigation of the use of machine learning methods for leak localization in WDN was performed, where localization of the leakage was based on the creation of hydraulic zones. In this technique, each zone consisted of several sensors to measure the water supply variations and the water pressure. This methodology was used to investigate the capacity of six machine learning methods used to localize the leaks in the WDN. The models that were tested are three supervised methods, namely logistic regression (LR), decision tree (DT), and RF, and two unsupervised methods, namely hierarchical classification and a combination of principal component analysis (PCA) and K-means classification methods. Finally, the performance of an ANN was tested. The presented results illustrate excellent performances by the supervised methods for detecting leaks, resulting in 100% accuracy for the RF and LR, followed by 98% accuracy for the DT. An accuracy of 100% was achieved by the ANN model, but some difficulties were noticed when training the unsupervised methods due to overlapping clusters.

The issue of sensor placement and data collection was tackled in the work by Fan & Yu (2021), where a novel ML framework named clustering-then-localization semi-supervised learning (CtL-SSL) was developed to wrap several stages of the traditional ML training process in one solution. This framework uses the topological features of the WDN and its leakage characteristics for partition and sensor placement, and subsequently utilizes the monitoring data for detection and localization of the leak. The framework was applied to two WDNs and achieved 95% detection and 83% localization accuracy. The WDN partition is based on the leaking behaviors of the WDN junctions. The leakage characteristics are defined based on features extracted from non-leaking data with unsupervised ML models such as PCA and autoencoders (AE). The K-means method is proposed for the zone partition of the WDN. The leakage localization is implemented as an ML-based classifier that identifies the partition zone in which the leak occurred.

In the context of sensor placement, the paper by Tanyimboh (2021) shows the results of a study conducted on two benchmark WDNs, using two different genetic algorithms (GA), to investigate the impact of redundant binary codes on the performance of GA. The study finds that different mapping schemes between the genotype and phenotype spaces can lead to varying solutions and that mapping schemes that improve diversity in the population of candidate solutions achieve better results. The results suggest that mapping schemes that promote diversity in the population can lead to better results and provide guidance for the handling of redundant binary codes.

An extensive analysis was conducted in the work by Djebedjian (2021) to rank a great number of common optimization tools for WDNs. Two performance metrics have been created to evaluate algorithms that reach optimal solutions (g_cost = 1) in the literature. The first metric measures global performance based on one run, considering the number of generations and function evaluations needed to reach the optimal solution. The second metric, which is more realistic, measures the average global performance over multiple runs, taking into account the reliability and deviation of results from the optimal solution. The results in the literature indicate that the FDE algorithm has a superior global performance compared to other algorithms, with higher reliability in handling medium-sized WDNs. The FDE algorithm can reach the optimal solution faster, with fewer function evaluations, than other algorithms.

The article presented by Shende (2019) discusses the optimization of WDN design using a meta-heuristic optimization technique. The optimization problem involves balancing hydraulic requirements and cost. A new approach based on the Simple Benchmarking Algorithm (SBA), called SiBANET, is introduced, which uses the hydraulic solver EPANET 2.0 and is able to handle discrete constraints effectively. The results of the analysis show that SiBANET is a robust algorithm with faster convergence and lower CPU time compared to other intelligent optimization algorithms (IOAs). The study focuses on single-objective optimization of WDN pipe sizing but suggests further research on optimizing complex looped networks with multiple objectives.

In this paper, a novel machine learning-based framework for determining the leak position, leak area, and leakage start time in a WDN is presented and investigated. The methodology is based on the work done by Lučin et al. (2021), where high accuracy was achieved for predicting a leak event using an RF classification algorithm in a simple water distribution network. The proposed framework described in this paper is mainly focused on presenting the improvements in accuracy achieved by a histogram-based gradient boosting (HGB) classification algorithm in comparison to the RF model presented by Lučin et al. (2021). In addition, multiple regression HGB and RF models are trained to estimate the leak area and leak start time parameters during a rupture event. The framework utilizes the Water Network Tool for Resilience (WNTR) hydraulic simulator with a leak model for massive data farming and an HGB model for determining all the relevant leak scenario parameters. The accuracy of HGB is compared with the RF algorithm. The framework was tested on a benchmark WDN with 126 possible leak nodes. Furthermore, pressure uncertainties were incorporated into the leak scenarios in order to make the WDN flow conditions closer to a real-world situation.

Data farming

In this section, the process of the generation of synthetic data is defined. The network used for data farming is Network 1 and can be seen in Figure 1. This network consists of 126 nodes and 168 pipes, with 1 source, 2 tanks, 2 pumps, and 8 valves. The pressure sensor nodes, i.e. the sensor layout, were specifically chosen through a preliminary analysis, as presented in section 3.1. All nodes have a set steady demand value, while 60% of all nodes have an unsteady demand pattern assigned to them with multipliers.

Table 1

Sensor layout names paired with the optimized positions of the sensors as investigated in Ostfeld et al. (2008)

Sensor layout name | Nodes
A | 17, 21, 68, 79, 122
B | 10, 31, 45, 83, 118
C | 17, 31, 45, 83, 126
D | 126, 30, 118, 102, 34
E | 126, 30, 102, 118, 58

The leakage scenarios were simulated using the Python package WNTR (Klise et al. 2017), which is designed to simulate and analyze the resilience of WDNs. The package provides a flexible and easy-to-use framework that allows changes to the network structure and operations, along with simulation of disruptive incidents and recovery actions. One of the benefits that the WNTR package provides is the functionality to explicitly add a leak to a specific node. The package offers two different simulators, namely the EpanetSimulator and the WNTRSimulator. The data in this paper were obtained using the WNTRSimulator with pressure-driven flow in the network. The WNTRSimulator is built upon the same mathematical model as the EpanetSimulator but offers additional functionalities. The leak model implemented in the WNTRSimulator is defined as:
(1)
d = C_d · A · p^α · √(2/ρ)
where d is the leak demand (m³/s), C_d is the dimensionless discharge coefficient, which is set to 0.75 (turbulent flow in pipes) (Lambert 2001), A is the area of the leakage hole (m²), α is the dimensionless leak exponent (set as 0.5) (Lambert 2001), p is the gauge pressure (Pa), and ρ is the density of the fluid (kg/m³).
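As a minimal sketch, the leak model of Equation (1) can be evaluated in pure Python; the paper's simulations rely on WNTR's internal implementation, and the water density value used here is an assumption for illustration:

```python
import math

def leak_demand(area, pressure, Cd=0.75, alpha=0.5, rho=998.2):
    """Leak demand (m^3/s) following the WNTR leak model of Equation (1).

    area     -- leak hole area A (m^2)
    pressure -- gauge pressure p (Pa)
    Cd       -- dimensionless discharge coefficient (0.75, turbulent flow)
    alpha    -- dimensionless leak exponent (0.5)
    rho      -- fluid density (kg/m^3); 998.2 for water at 20 degC (assumed)
    """
    if pressure <= 0.0:
        return 0.0  # no outflow without positive gauge pressure
    return Cd * area * pressure ** alpha * math.sqrt(2.0 / rho)

# e.g. a 1 cm^2 hole under roughly 30 m of head (~294,300 Pa)
q = leak_demand(area=1e-4, pressure=294_300)
```

With these values the leak demand comes out on the order of 1.8 L/s, which gives a feel for the magnitudes involved in the farmed scenarios.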
Machine learning models described in this paper (RF, HGB) require a large amount of data to predict the correct location of the leak, area of leakage, and start time of the leak. The data farming procedure was conducted on the BURA supercomputer of the Center for Advanced Computing and Modelling, University of Rijeka. The loss/break occurs on a randomly selected pipe by creating a temporary node in the middle of the pipe and then adding the leak conditions to that temporary node, as opposed to the implementation in the work by Lučin et al. (2021), where the leak can occur only at the already defined nodes of the network. This was done in accordance with the assumption that it is more probable that the rupture event will happen somewhere near the middle of the pipe. The WNTR hydraulic simulator allows adding leaks only at water distribution system junctions; hence, the temporary node methodology is employed. The gathered data are formatted as tabular data in which each row represents a single hydraulic simulation in which a single rupture event occurs. Each row consists of pressure values for a given sensor location at a specific time. Each simulation was conducted over a time period (T) of 24 h with a hydraulic and report time step (Δt) of 5 min. This results in a total of 1,445 features per simulation, as shown in the following equation.
(2)
N_f = N_s · (T/Δt) + N_s
where N_f represents the number of features and N_s is the number of sensors. The T and Δt values are expressed in seconds, and the additional N_s = 5 features represent the initial sensor measurements at 00:00 h. To ensure that our model is able to handle a wide range of possible leak scenarios, we selected the initial hydraulic conditions randomly and uniformly. This approach allowed us to cover a broad range of possible leaks throughout the entire network, as we included all pipes as potential leak candidates during the data farming process.
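The feature count of Equation (2) can be checked with a few lines of Python:

```python
# Feature count per simulation, following Equation (2):
# N_f = N_s * (T / dt) + N_s
N_SENSORS = 5       # pressure sensors in layout A
T = 24 * 60 * 60    # simulation duration in seconds (24 h)
DT = 5 * 60         # hydraulic/report time step in seconds (5 min)

# the extra N_SENSORS features are the initial readings at 00:00 h
n_features = N_SENSORS * (T // DT) + N_SENSORS
print(n_features)  # 1445
```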

Machine learning algorithms

A comparison in prediction accuracy is conducted between two ML algorithms. The trained algorithms, in this case, are the RF algorithm and the histogram-based gradient boosting algorithm. These algorithms are used based on their proven success in tackling high-dimensional dataset problems as described in the papers by Lučin et al. (2021) and Ravichandran et al. (2021). In the mentioned paper by Lučin et al. (2021), it is shown that an RF model is very successful in predicting the leak node in a simple WDN as the tree-based methodology aids in preventing overfitting in such feature-dense datasets. On the other hand, good performance was achieved by a GBT model in a time-series based dataset by Ravichandran et al. (2021) which led to assuming that a gradient boosting approach would be justifiable. Adopting the techniques described in this paper brings several advantages. Two of the most relevant for this paper are robustness to outliers in the dataset and the ability to handle high-dimensional data efficiently. Since a degree of noise is added during the data farming process, there may be outliers present that can negatively impact the trained model's ability to generalize. However, the RF and HGB algorithms are able to divide the feature space into smaller regions, which makes them more resistant to outliers. Additionally, these tree-based models use less memory and computational resources compared to other methods, while still being able to quickly process and make predictions on new data points, which is important for large datasets and tasks where prediction speed is a concern. Both the presented models can be used for both classification and regression tasks.

Random forest algorithm

The RF algorithm was introduced by Breiman (2001) as an ensemble algorithm that uses multiple decision trees (DTs), which are constructed in the training process using random subsets of the data. The algorithm uses the bagging method to train each DT in order to de-correlate them from each other. The most important hyper-parameter for the training of the RF algorithm is the number of DTs. Training more decision trees can produce greater precision in classification or regression, although the possibility of overfitting the model must be taken into account. The final prediction of the model is obtained by majority vote over the predictions of the trained DTs in the ensemble. The RF implementation in the Python machine learning module scikit-learn 0.21.3 was used (Pedregosa et al. 2011).
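A minimal sketch of training an RF classifier with scikit-learn is given below; the synthetic stand-in data, class count, and n_estimators value are illustrative only and are not the paper's dataset or tuned hyper-parameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative stand-in for the farmed dataset: 300 "simulations",
# 20 pressure features, 4 candidate leak nodes as class labels.
X = rng.normal(size=(300, 20))
y = rng.integers(0, 4, size=300)
X[np.arange(300), y] += 3.0  # shift one feature per class to make them separable

# n_estimators (the number of DTs) is the key hyper-parameter
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# The final prediction is the majority vote across the ensemble's trees
pred = clf.predict(X[:5])
```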

Histogram gradient boosting

The HGB algorithm is a variation of the well-known gradient boosting (GB) algorithm (Bentéjac et al. 2021), which is widely used for a vast number of different ML classification and regression tasks. These algorithms, including AdaBoost, belong to a set of models called boosting algorithms, whose primary purpose is converting weak learners into strong ones. Boosting methods focus on sequentially adding and training new weak learners to correct the errors of the previously added weak learner. The most commonly used weak learners are decision trees. The HGB algorithm was developed to resolve the biggest flaw of the GB algorithm, which is its long training time on big datasets. This problem is solved by discretizing, or binning, the continuous input variables into a few hundred unique values. The most important hyper-parameter, in this case, is the learning rate of the algorithm. Great focus was directed towards algorithm optimization with multiple rounds of hyper-parameter tuning. The HGB implementation in the Python machine learning module scikit-learn 0.21.3 was used (Pedregosa et al. 2011).

Training methodology

Both RF and HGB models were trained to perform the classification task, which is the prediction of the node in which the leak happened, and the regression tasks, which include the prediction of the area of leakage and the start time of the leakage. For each classification and regression task, a different model with a different combination of hyper-parameters had to be trained. The models were subjected to multiple rounds of random hyper-parameter search and one round of exhaustive search over all combinations in the narrowed space to find the best-fitting set of hyper-parameters. No preprocessing was conducted on the generated data; raw data were used. In Figure 2, a flowchart of the different stages of the framework can be seen.
Figure 1

Configuration of the WDN Network 1 benchmark for leak localization, with the sensor nodes in layout A described in Table 1. The red dots indicate the sensor positions within the WDN, while the blue dots indicate the positions of the tanks and the reservoir.

Figure 2

Data farming with hydraulic simulations and machine learning model training for classification and regression. ID denotes the node id, and A denotes the leak area.


The classification models were trained using two methodologies. To compute the top picks metric described in section 2.5, a classical train-and-test approach was used: the classification model was first trained on a random 84% split of the dataset and then tested on the remaining 16% of the simulations to generate a set of predictions of the true leak node. In addition, a cross-validation approach was required to report the mean and standard deviation of the accuracy and thereby assess the robustness of the model. For the classification model, a stratified K-fold cross-validation method was used to guarantee an optimal spread of all the node classes.

The regression models, on the other hand, were tested only using a cross-validation technique. A stratified shuffle validation split (k = 5) with enhanced randomness was used for the regression models to ensure that the data were optimally and equally distributed during cross-validation. To produce the described folds, the StratifiedKFold and StratifiedShuffleSplit functions implemented in scikit-learn 0.21.3 were used. The use of such strategies is particularly important, as the dataset is complex in terms of the variability of each hydraulic simulation, with several randomly chosen parameters that need to be calculated during each simulation. Each fold must be as representative of the overall dataset as possible to ensure the reliability of the results. The enhanced randomness refers to using a random number generator with a high degree of unpredictability to shuffle the data rows before splitting them into folds. This helps to further reduce bias and improve the uniformity of the folds. We believe that this approach provides a reliable method for evaluating the performance of the models, as the enhanced randomness reduces the degree of overfitting. It is important to note that cross-validation is a tool used to evaluate the performance of the model and does not directly influence the outcome of the analysis.
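The split strategies described above can be sketched as follows; the toy labels stand in for the real simulation dataset:

```python
import numpy as np
from sklearn.model_selection import (StratifiedKFold, StratifiedShuffleSplit,
                                     train_test_split)

# Toy stand-in labels: 100 samples over 4 balanced "leak node" classes
X = np.zeros((100, 3))
y = np.repeat(np.arange(4), 25)

# Classical 84/16 train/test split used for the top-picks evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.16, random_state=0)

# Stratified K-fold keeps every node class represented in each fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(skf.split(X, y))

# Shuffled stratified splits (k = 5) re-randomize the rows before each split
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
splits = list(sss.split(X, y))
```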

Each row in the tabular dataset is composed of 1,445 features in the input data as described in Equation (2), while the output data consist of values derived from the simulation result.

Pressure uncertainty

To simulate a more realistic WDN scenario, a pressure measurement uncertainty was added to randomly selected network nodes. In Figure 3, the flowchart of the algorithm for adding pressure uncertainty to the data is presented. For each network node, a random Boolean value is generated; when it is true, a random relative deviation is drawn and a new pressure value is assigned to the node in question by perturbing the original pressure value accordingly (Figure 3). Given the algorithm presented in Figure 3, the number of nodes with added pressure measurement uncertainty is around 50%, since the algorithm adds uncertainty based on a Boolean value that can be either true or false.
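One plausible reading of the procedure is sketched below. The roughly 50% Boolean gate follows Figure 3, but the multiplicative perturbation form p · (1 + ε) and the deviation value used in the example are assumptions for illustration, not the paper's exact equation:

```python
import random

def add_pressure_uncertainty(pressures, delta, seed=None):
    """Perturb roughly half of the node pressures, mimicking Figure 3.

    For each node a random Boolean decides whether uncertainty is added.
    The multiplicative form p * (1 + eps), eps ~ U(-delta, delta), is an
    assumption -- the paper's exact perturbation equation is not shown here.
    """
    rng = random.Random(seed)
    noisy = []
    for p in pressures:
        if rng.random() < 0.5:  # random Boolean: ~50% of nodes get uncertainty
            eps = rng.uniform(-delta, delta)
            noisy.append(p * (1.0 + eps))
        else:
            noisy.append(p)
    return noisy

# 1,000 identical readings perturbed by at most 1%
noisy = add_pressure_uncertainty([50.0] * 1000, delta=0.01, seed=42)
```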
Figure 3

Injecting uncertainty into pressure data for randomly selected nodes.


The previously described algorithms are then trained on two datasets farmed with two different deviations from the original pressure. The final results of the obtained models are compared with the results obtained by training an HGB classifier and regressor on the same amount of data but without any pressure noise.

Machine learning model accuracy metrics

The accuracy of the classification and regression models is expressed through two different metrics. To measure the performance of the classification model on multilabel data, the accuracy_score function in scikit-learn 0.21.3 was used. The metric is defined in the following equation as
(3)
accuracy(y, ŷ) = (1/n_samples) · Σ_{i=1}^{n_samples} 1(ŷ_i = y_i)
where ŷ_i is the ML-predicted value of the i-th sample and y_i is the true value of the same sample, while n_samples defines the total number of samples predicted by the ML model.

Furthermore, due to the great number of output labels and the complexity of the problem, an additional classification accuracy metric was used, utilizing the scikit-learn 0.21.3 predict_proba function. First, each node's probability of being the true leak node is obtained from the function; the nodes are then ranked by probability and the top n nodes are extracted. If the true value is within the top n extracted ML-predicted values, the prediction is labeled a True Positive. The value of n was varied as 1, 3, 5, and 10; when n = 1, the metric is identical to the one defined in Equation (3).
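The top-n metric can be sketched on top of a predict_proba-style probability matrix; the toy probabilities and node labels below are illustrative:

```python
import numpy as np

def top_n_accuracy(proba, classes, y_true, n):
    """Fraction of samples whose true label is among the n most probable classes.

    proba   -- (n_samples, n_classes) array, e.g. from clf.predict_proba(X)
    classes -- label of each probability column, e.g. clf.classes_
    y_true  -- true label per sample
    """
    classes = np.asarray(classes)
    # column indices of the n highest-probability classes per row
    top = np.argsort(proba, axis=1)[:, -n:]
    hits = [y in classes[row] for row, y in zip(top, y_true)]
    return float(np.mean(hits))

# Toy example: 3 samples, 3 candidate leak nodes (labels 10, 20, 30)
proba = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.2, 0.7]])
classes = np.array([10, 20, 30])
y_true = [10, 30, 30]
```

Here the top-1 accuracy is 2/3 (the second sample's true node ranks second), while the top-2 accuracy is 1.0, which mirrors how the metric relaxes as n grows.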

Subsequently, for the regression models, the R2 score metric was used. The R2 score or the coefficient of determination is a measure that provides the goodness of fit of a model. It is a statistical measure of how well the regression line approximates the actual data or its mean as defined in the following equation:
(4)
R² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)²
where the numerator Σ(y_i − ŷ_i)² is the variation of the model from the actual data (the residual sum of squares) and the denominator Σ(y_i − ȳ)² is the variation of the actual data from its mean (the total sum of squares). In the context of the problem investigated within this research, the R² score determines how well the measured pressure values at all sensors at the specified time correlate with the leak area and leak start time.
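Equation (4) translates directly into a short function; the sample values are for illustration only:

```python
def r2_score_simple(y_true, y_pred):
    """Coefficient of determination per Equation (4): R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # SS_res: variation of the model predictions from the actual data
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # SS_tot: variation of the actual data from its mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

r2 = r2_score_simple([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

For these toy values, SS_res = 0.10 and SS_tot = 5.0, giving R² = 0.98; a perfect fit gives R² = 1.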

In this section, the prediction accuracy of the classification and regression models used (RF and HGB) is analyzed. Firstly, the choice of the sensor layout is assessed through comparison with several other sensor layouts taken from the previous literature (Ostfeld et al. 2008).

Secondly, the results of hyper-parameter tuning of both RF and HGB are shown. Furthermore, an investigation of RF and HGB algorithms trained with data without added pressure uncertainty is presented.

Finally, the impact of the added pressure uncertainty on the final prediction results is investigated. It is important to mention that the classification and regression accuracy are expressed through the metrics defined in subsection 2.5.

Sensor layout selection

To determine the best sensor layout for the network, five different sensor layouts in Network 1 are considered. The sensor layouts are taken from the paper by Ostfeld et al. (2008) and comprise the top five most effective layouts with five sensors, as given in the mentioned paper. The network is considered calibrated, and its demand pattern, pressure pattern, and valve configurations are as in Ostfeld et al. (2008). In Figure 4, the different sensor layouts are named A, B, C, D, and E.

To determine the best layout for the task ahead, a preliminary analysis was made with the RF algorithm on 20,000 farmed simulation samples without pressure uncertainty. A simple train-and-test methodology was used, in which the model was trained on 84% and tested on 16% of the provided data. For each sensor configuration under test, a small dataset was gathered and a simple RF model without hyper-parameter tuning was trained on it. Based on the training and testing results of this simple model, the most promising of the five sensor placements was selected for further investigation. The goal was to determine which sensor layout yields the most useful extraction of information, i.e. the percentage of the top 3, 5, and 10 picks containing the true leak node.
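The screening step above can be sketched as follows, with a synthetic dataset standing in for one sensor layout's 20,000 farmed samples; the 84/16 split and top-3 metric follow the text:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one sensor layout's farmed pressure dataset.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=5,
                           n_redundant=0, n_classes=8, random_state=0)
# 84/16 train/test split, as in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.16, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)
top3 = rf.classes_[np.argsort(proba, axis=1)[:, -3:]]  # 3 most probable nodes
top3_acc = float(np.mean([yt in row for yt, row in zip(y_te, top3)]))
print(f"top-3 accuracy: {top3_acc:.2f}")
```

Repeating this for each candidate layout and comparing the top-n scores picks the layout to carry forward.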

As seen in Figure 4, the best sensor layout is layout A (see Figure 1 for the sensor positions within the WDN), with the sensors placed in nodes 17, 21, 68, 79, and 122, since the ML model was most accurate for the top 3, top 5, and top 10 predictions. Sensor layout A was originally proposed by Berry et al. (2006), and this layout was used to obtain all the training data throughout this paper and to further investigate the performance of the RF and HGB algorithms.
Figure 4

Prediction results obtained by training RF classifier on a dataset with different sensor layouts.


HGB and RF hyper-parameter tuning

The hyper-parameter search was conducted in several rounds in order to find an optimal combination of hyper-parameters (Tables 2 and 3). The process started with several rounds of random hyper-parameter selection using the RandomizedSearchCV class from the model_selection module, which evaluates a model with a random set of hyper-parameters using cross-validation on a small subset of the dataset. Each training iteration produces a new set of hyper-parameters, and this technique considerably narrowed down the search space to allow a finer search.

After the search space was reduced, a more detailed parameter optimization was conducted with the GridSearchCV class, which performs a cross-validation procedure for every combination of values in a user-defined range for each hyper-parameter. For each parameter in the best set produced by RandomizedSearchCV, a range of values was selected around that value, and a GridSearchCV procedure was conducted over these ranges. This resulted in the hyper-parameter sets used in the rest of this paper.

The search was conducted once for each model on a 100,000-simulation dataset without added uncertainty. Since the procedure is very resource-consuming, it was run in parallel on multiple cores of the BURA supercomputer; the resulting hyper-parameters for the HGB model are given in Table 2. Both algorithms were primarily tuned to increase the accuracy of the true leak node prediction, i.e. multilabel classification.

The same hyper-parameter search process was used to find the optimal combination for the RF model; the resulting parameters are given in Table 3.

HGB and RF prediction results without pressure uncertainty

The results presented in this section are produced by RF and HGB models trained on ideal data, i.e. without pressure sensor uncertainty. The purpose of this comparison is to observe how much accuracy declines once uncertainty is added to the collected data and to assess which ML algorithm is better suited to the given problem in an ideal setting. Tables 4 and 5 show the results for the HGB and RF algorithms, respectively. It can be observed that the HGB-trained model outperforms the RF model by almost 30% when trained on one million instances of data. The presented results are obtained by evaluating the models with a Stratified KFold cross-validation procedure using five folds. The maximum number of farmed instances was one million simulations, but the training process was conducted starting from 100,000 instances in order to observe the learning curve and get a sense of the upcoming plateau in the models' learning.

Table 2

Optimized hyper-parameters for the HGB model

Parameter name       Parameter value
Learning rate        0.15
L2 regularization    17
Loss                 Categorical crossentropy
Max bins             156
Max depth            
Max iter             429
Max leaf nodes       184
Min sample leaf      75
Table 3

Optimized hyper-parameters for the RF model

Parameter name         Parameter value
Max depth              444
Max leaf nodes         457
Min sample leaf        
Min samples split      65
Number of estimators   481
Table 4

HGB prediction results without added pressure uncertainty

Inputs      Top 1 (%)   Top 3 (%)   Top 5 (%)   Top 10 (%)
100k        85.03       90.35       94.12       98.30
300k        83.25       88.69       92.23       95.97
500k        84.63       90.07       93.57       97.07
1 million   85.59       91.18       94.59       97.98
Table 5

RF prediction results without added pressure uncertainty

Inputs      Top 1 (%)   Top 3 (%)   Top 5 (%)   Top 10 (%)
100k        22.58       21.99       22.56       22.60
300k        35.86       34.50       34.87       35.33
500k        43.57       42.23       42.76       43.25
1 million   57.17       56.11       56.74       57.50

The results show that the HGB algorithm has greater success in predicting the leaking node, as it is able to process a greater amount of data in a shorter period of time thanks to its binning method, which creates its own categories of features. The top nodes results are depicted in Figure 5(a) and 5(b). Furthermore, the mean accuracy and standard deviation of the Stratified KFold cross-validation for both HGB and RF are shown in Figure 6(a) and 6(b), respectively. The validation curve shows that the HGB prediction is more robust, as the standard deviation of the model's repeated prediction is smaller than that of RF.
Figure 5

Top nodes probabilities for (a) HGB model and (b) RF model without pressure uncertainty.


HGB and RF prediction results with pressure uncertainty

In this subsection, a comparative analysis is conducted for the results obtained with a pressure measurement uncertainty of 5% and subsequently with an uncertainty of 10%. Every model is incrementally evaluated using a cross-validation technique (KFold for classification and Stratified Shuffle Split for regression models), with the total number of input data being one million. For the evaluation of the regression models, the mean of the R2 scores over all folds is used to indicate performance; for the classification models, the mean accuracy score over all folds is used, as described in subsection 2.5. As expected, both the classification and regression HGB and RF models show a decline in accuracy due to the introduced uncertainty in the pressure measurement.

5% pressure uncertainty

In Figure 7(a) and 7(b), the top node results for the HGB and RF models with pressure uncertainty are shown. The obtained results (also presented in Tables 6 and 7) show that the model trained with the HGB algorithm greatly outperforms the RF model. When model training and testing (86 and 14% split) is done with 1 million input data, the HGB model achieves an accuracy (the predicted node being the actual leak node) of around 71%. The decrease in accuracy is noticeable when compared to the less realistic case without pressure uncertainty incorporated into the model (presented in Table 4), where the accuracy is around 86%.
Table 6

Probabilities percentages for top nodes of the HGB model with 5% added uncertainty

Percentage of total inputs   Top 1 (%)   Top 3 (%)   Top 5 (%)   Top 10 (%)
                             57.68       65.75       69.37       75.75
                             66.60       72.35       75.62       80.39
                             67.82       73.25       76.61       81.61
10                           69.38       75.06       78.46       83.63
20                           69.52       75.14       78.48       83.90
30                           70.30       75.85       79.45       84.5
50                           70.87       76.69       80.39       85.67
60                           70.78       76.66       80.39       85.72
80                           70.63       76.62       80.37       85.76
100                          70.98       77.06       80.80       86.05
Table 7

Probabilities percentages for top nodes of the RF model with 5% added uncertainty

Percentage of total inputs   Top 1 (%)   Top 3 (%)   Top 5 (%)   Top 10 (%)
                             7.81        15.56       20.12       30.87
                             11.77       20.72       27.06       37.72
                             12.75       22.15       28.16       40.33
10                           13.8        23.82       30.06       42.51
20                           15.02       24.76       31.13       43.14
30                           15.13       24.54       30.66       42.79
50                           15.54       24.87       31.20       43.22
60                           15.55       24.76       30.82       42.86
80                           15.60       25.03       31.17       42.99
100                          15.48       24.88       30.93       42.72
Figure 6

Stratified KFold cross-validation accuracy mean with standard deviation for (a) HGB model and (b) RF model with no pressure uncertainty added.


Furthermore, the increase in accuracy is also apparent when rankings of the top 3, 5, and 10 nodes are considered. When 100% of the farmed data is used for model training and testing (86 and 14% split), there is an 86% certainty that the true leak node will be among the top 10 predicted nodes (presented in Figure 7(a) and Table 6). In other words, when the top 10 ranking of predicted nodes is considered, the search space for the true leak node has been narrowed down by 92%.
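The quoted 92% figure follows from simple arithmetic, since the reduction depends only on the ratio of the top-k list size to the number of candidate leak nodes; the candidate count of 126 used here is an assumption for illustration, the actual value being fixed by the Network 1 topology:

```python
# Assumed candidate node count, for illustration only; the paper's exact value
# is determined by the Network 1 model.
n_candidates = 126
top_k = 10
reduction = 1 - top_k / n_candidates   # fraction of nodes eliminated from the search
print(f"search space reduction: {reduction:.0%}")
```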

Figure 8(a) and 8(b) shows the validation curve of the cross-validated R2 score for the leak area for the HGB and RF models, respectively. The HGB algorithm achieved a higher R2 score than the RF algorithm; however, the discrepancy is not as apparent as for the classification leak localization problem. Furthermore, increasing the farmed data used in the stratified shuffle split cross-validation procedure clearly improves the R2 score of the leak area value for both algorithms. However, the HGB curve shows a plateauing trend, approaching an R2 score of 75% at 1 million input data (100% of the total input).
Figure 7

Probabilities for the top nodes of the (a) HGB model and (b) RF model with 5% added uncertainty. The percentages in the caption indicate the percentage of the total inputs that have been used for cross-validation.

Figure 9(a) and 9(b) shows the leak start time R2 score validation curves for both the HGB and RF models, respectively. It is apparent that there is a strong correlation between the pressure measurements at the five sensors and the leak start time, since the maximum value of the stratified shuffle split score is around 97% for both algorithms.
Figure 8

R2 score validation curve for the (a) HGB model and (b) RF model with 5% added uncertainty for the leak area.


10% pressure uncertainty

The HGB and RF results with 10% pressure uncertainty are shown in Figure 10(a) and 10(b), respectively, and in more detail in Tables 8 and 9, respectively. Generally, the obtained results show that both the classification and regression prediction accuracy are reduced by introducing pressure uncertainty, but not dramatically. For the leak node prediction of the HGB model, a considerable drop in precision can be observed when adding the first 5% of uncertainty: around 15% of accuracy is lost (see Tables 4 and 6) for the top 1 probability at 1 million input simulations. However, when comparing the model with 5% uncertainty to that with 10% pressure uncertainty, the accuracy is diminished by a negligible amount.
Table 8

Probabilities percentages for top nodes of the HGB model with 10% added uncertainty

Percentage of total inputs   Top 1 (%)   Top 3 (%)   Top 5 (%)   Top 10 (%)
                             60.75       66.62       70.06       74.93
                             67.37       72.43       75.58       80.56
                             68.87       73.98       77.06       81.98
10                           69.89       74.9        77.90       82.89
20                           69.75       75.21       78.55       83.46
30                           69.82       75.29       78.76       84.06
50                           70.27       75.97       79.48       84.54
60                           70.52       76.21       79.71       84.83
80                           70.40       76.32       79.86       85.01
100                          70.34       76.39       80.10       85.38
Table 9

Probabilities percentages for top nodes of the RF model with 10% added uncertainty

Percentage of total inputs   Top 1 (%)   Top 3 (%)   Top 5 (%)   Top 10 (%)
                             8.812       16.68       21.12       30.68
                             12.02       20.70       25.97       37.04
                             12.51       21.98       27.33       37.70
10                           13.44       22.51       28.69       39.55
20                           14.40       23.59       29.43       40.65
30                           15.09       24.21       29.71       40.51
50                           15.25       24.23       29.99       41.03
60                           15.26       24.19       29.91       40.91
80                           15.40       24.23       29.72       40.77
100                          15.32       24.21       29.83       40.61
Figure 9

R2 score validation curve for the (a) HGB model and (b) RF model with 5% added uncertainty for the leak start time.

On the other hand, the precision of the RF model dropped by a considerable amount (more than 40%) after training on data with either 5% or 10% uncertainty. Furthermore, the HGB model is also superior to the RF model in the correlation of the regression parameters, the leak area and leak start time, as seen in Figures 11 and 12, respectively.
Figure 10

Probabilities for the top nodes of the (a) HGB model and (b) RF model with 10% added uncertainty. The percentages in the caption indicate the percentage of the total inputs that have been used for cross-validation.

Figure 11

R2 score validation curve for (a) HGB model and (b) RF model with 10% added uncertainty for the leak area.

Figure 12

R2 score validation curve for (a) HGB model and (b) RF model with 10% added uncertainty for the leak start time.


Proposed framework limitations

Although the proposed methodology yields good results, it has limitations. One known limitation is that data farming and training of the predictive models require a great amount of computing resources, i.e. it is impractical to apply this framework on a conventional machine without spending a great amount of time on farming the required data. Based on the slope of the validation curves of some parameters (Figures 11 and 12), it can be reasonably expected that adding more simulation results to the dataset would further improve model accuracy up to a point. However, there is a trade-off between the size of the dataset and the computational resources required to process it. Additionally, it is important to monitor the performance of the model on an independent test set to ensure that any improvements in accuracy on the validation set are not due to overfitting. Therefore, careful consideration should be given to increasing the size of the dataset, as the additional data may not lead to significant improvements in model accuracy or may run into computational limitations.

A proposed solution would be to segment the process into smaller, more manageable rounds of farming and training, where after each farming cycle a training portion of the farmed data is used to train the model. This process can limit the probability that the model is trained on redundant data and thus avoid overfitting. A further strategy is to bias the random leak node selection toward a specific set of nodes to increase the generalization capacity of the model on the test set.

For larger WDNs, to reduce the total computational effort of the framework, sub-zones of the network could be created, with each sub-zone having a uniquely trained ML model for leak localization that only includes the nodes located within that same sub-zone. Optimal sensor positions should in this case be further explored for each sub-zone to reduce cost and maximize data quality. Djebedjian (2021) provides a comprehensive guide to available optimization techniques that can be used in a case study such as this one. Alternatively, WDN sub-zone sensor optimization can be conducted using a genetic algorithm (GA) as proposed by Shende (2019).

In this research, an ML-based framework was developed in order to localize a leak within a water distribution network and to investigate the correlation between pressure measurements with the leak event start time and leak area. The framework consisted of massively parallel data farming of leak events with the hydraulic simulator WNTR, acquiring pressure values at sensors and then training and testing an ML algorithm (HGB and RF, both tuned for multilabel classification) in order to solve the leak localization problem. The study suggests several key takeaways:

  • The presented methodology can be used to assess the quality of pressure sensor positions within a WDN. The sensor layout that yields the highest accuracy of true leak node localization can be considered optimal. Hence, this methodology and the used metrics could be used to define a combinatorial optimization problem for optimal sensor placement for leak localization.

  • The HGB algorithm clearly outperforms RF for the leak localization problem and is computationally efficient, which can be attributed to its technique of binning continuous input variables (pressure in this case). However, both algorithms can be used to determine the correlation between sensor measurements and the prediction accuracy of leak event parameters.

  • Increasing the uncertainty of the pressure measurements decreases the leak localization accuracy; however, the HGB algorithm achieved a leak node search space reduction of 92% under both 5% and 10% uncertainty, i.e. the chance that the true leak node is among the 10 nodes predicted by the ML model is 86%.

  • Through an ML regression analysis, a high correlation is found between the pressure sensor measurements (with uncertainty of 5% and 10%) and both the leak area (R2 = 75%) and the leak start time (R2 = 97%).

This study could serve as a foundation for future research, which includes coupling the HGB leak localization model with simulation-calibration models that can determine the leak location with higher certainty thanks to the search space narrowed down by the ML model. Furthermore, the methodology could be used along with clustering methods that determine specific sub-zones of the WDN based on hydraulic characteristics, in order to build an ML model for each sub-zone, with each model specialized in recognizing anomalous events in its sub-zone. The specialized models can then be used as an ensemble rather than one monolithic model predicting anomalies over the whole WDN. This approach could also result in a more generalized prediction algorithm that is more robust to noise and outlier events. Another benefit of training smaller, more specialized models is the ability to use them as pre-trained models for different WDN pressure data, which could reduce the training time and farming effort when applying the methodology to other networks.

The authors acknowledge the support of the Center of Advanced Computing and Modelling at the University of Rijeka for providing computing resources.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Bentéjac C., Csörgő A. & Martínez-Muñoz G. 2021 A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review 54, 1937–1967.

Berry J., Hart W., Phillips C. A. & Watson J. P. 2006 A facility location approach to sensor placement optimization. In: 8th Annual Symposium on Water Distribution Systems Analysis, Cincinnati, OH.

Breiman L. 2001 Random forests.

Cody R. A., Dey P. D. & Narasimhan S. 2020 Linear prediction for leak detection in water distribution networks. Journal of Pipeline Systems Engineering and Practice 11, 1–16.

Colombo F. A., Lee P. & Karney W. B. 2009 A selective literature review of transient-based leak detection methods. Journal of Hydro-Environment Research 2, 212–227.

Fan X. & Yu X. 2021 An innovative machine learning based framework for water distribution network leakage detection and localization. Structural Health Monitoring 0, 1–19.

Fan X., Zhang X. & Yu X. 2021 Machine learning model and strategy for fast and accurate detection of leaks in water supply network. Journal of Infrastructure Preservation and Resilience 2, 556–568.

Geelen C., Yntema D., Molenaar J. & Keesman K. 2019 Monitoring support for water distribution systems based on pressure sensor data. Water Resources Management 33, 3339–3353.

Grbčić L., Lučin I., Kranjčević L. & Družeta S. 2020 Water supply network pollution source identification by random forest algorithm. Journal of Hydroinformatics 22, 1521–1535.

Hu X., Han Y., Yu B., Geng Z. & Fan J. 2021 Novel leakage detection and water loss management of urban water supply network using neural network. Journal of Cleaner Production 278, 556–568.

Kang J., Park Y.-J., Lee J., Wang S.-H. & Eom D.-S. 2018 Novel leakage detection by ensemble CNN-SVM and graph-based localization in water distribution systems. IEEE Transactions on Industrial Electronics 65, 4279–4289.

Klise K. A., Bynum M., Moriarty D. & Murray R. 2017 A software framework for assessing the resilience of drinking water systems to disasters with an example earthquake case study. Environmental Modelling & Software 95, 420–431.

Lambert A. 2001 What do we know about pressure-leakage relationships in distribution systems. In: IWA Conference on Systems Approach to Leakage Control and Water Distribution System Management.

Liemberger R. & Wyatt A. 2019 Quantifying the global non-revenue water problem. Water Supply 19, 831–837.

Lučin I., Lučin B., Čarija Z. & Sikirica A. 2021 Data-driven leak localization in urban water distribution networks using big data for random forest classifier. Mathematics 9, 672.

Mashhadi N., Shahrour I., Attoue N., El Khatabi J. & Aljer A. 2021 Use of machine learning for leak detection and localization in water distribution systems. Smart Cities 4, 1293–1315.

Ostfeld A., Uber J. G., Salomons E., Berry J. W., Hart W. E., Phillips C. A., Watson J. P., Dorini G., Jonkergouw P., Kapelan Z., di Pierro F., Khu S., Savic D., Eliades D., Ghimire S., Barkdoll B., Gueli R., Huang J., McBean E., James W., Krause A., Leskovec J., Isovitsch S., Xu J., Guestrin C., VanBriesen J., Small M., Fishbeck P., Preis A., Propato M., Piller O., Trachtman G., Wu Z. & Walski T. 2008 The battle of the water sensor networks (BWSN): a design challenge for engineers and algorithms. Journal of Water Resources Planning and Management 134, 556–568.

Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M. & Duchesnay E. 2011 Scikit-learn: machine learning in Python. The Journal of Machine Learning Research 12, 2825–2830.

Perez R., Puig V., Pascual J., Quevedo J., Landeros E. & Jordanas L. 2011 Methodology for leakage isolation using pressure sensitivity analysis in water distribution networks. Control Engineering Practice 19, 1157–1167.

Puust R., Kapelan Z., Savic D. A. & Koppel T. 2010 A review of methods for leakage management in pipe networks. Urban Water Journal 7, 25–45.

Ravichandran T., Gavahi K., Ponnambalam K., Burtea V. & Mousavi J. 2021 Ensemble-based machine learning approach for improved leak detection in water mains. Journal of Hydroinformatics 23, 307–323.

Zhou M., Zhang Q., Liu Y., Sun X., Cai Y. & Pan H. 2019 An integration method using Kernel principal component analysis and cascade support vector data description for pipeline leak detection with multiple operating modes. Processes 7, 648.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).