Abstract
Leak detection and location in water distribution systems (WDSs) are of utmost importance for reducing water loss, yet they remain a major challenge for water utility companies. Researchers have therefore proposed a multitude of methods to detect leaks in WDSs, among which model-based and data-driven approaches have found particularly widespread use. In this paper, we review both approaches and classify the techniques they use according to their leak detection methods. Model-based approaches require highly calibrated hydraulic models, and their accuracy is sensitive to modeling and measurement uncertainties. Data-driven approaches, by contrast, do not require an in-depth understanding of the WDS, but they tend to produce high false positive rates. Furthermore, neither approach can handle anomalous variations caused by unexpected water demands.
HIGHLIGHTS
A comprehensive review of model-based and data-driven approaches for leak detection.
Generic framework of each technique is summarized and compared.
Future directions for improving the performance of leak detection are presented.
INTRODUCTION
Detecting and locating leaks are crucial for reducing water loss in water distribution systems (WDSs) and realizing sustainable water usage. The global volume of non-revenue water (NRW) is estimated at 126 billion cubic meters per year, and the value of the water lost amounts to USD 39 billion annually (Liemberger & Wyatt 2019). In addition to wasting water, leaks can contaminate water supplies and adversely affect human health (Kouchi et al. 2017), or damage public infrastructure such as roads. The amount of water lost through a leak depends on the time elapsed between its occurrence and its detection by the water utility company (Bakker et al. 2014a). Water leakages in WDSs are typically classified into reported, unreported, and background types (Lambert 1994). Reported leakages can usually be detected very quickly via complaint hotlines, which enables quick repair. Unreported or background leakages, however, may continue for much longer before they are discovered, which significantly increases the amount of water lost. Therefore, to reduce water loss, the leak duration (Water Research Centre (WRc) 1994) must be minimized.
A large number of leak detection and location techniques have been applied by researchers, and it is difficult to fit all of them into a single classification. Figure 1 summarizes the different leakage management and detection methods reviewed in the current literature (Puust et al. 2010; Li et al. 2015; Adedeji et al. 2017; AL-Washali et al. 2020). Based on the technique used, leak detection methods may be broadly divided into active and passive approaches (Puust et al. 2010). In a passive approach, leaks are detected via in-situ visual inspection or monitoring; in an active approach, leaks are detected by analyzing measured signals such as acoustic signals, vibrations, pressure, and flow data. Because the passive approach cannot be used for continuous leak monitoring and typically involves long testing times, the active approach is much more commonly used. Active approaches can be further divided into transient-based, model-based, and data-driven approaches. Transient-based leak detection is performed by measuring transient pressure signals (Colombo et al. 2009). Model-based approaches typically use mathematical equations to represent the operating scenarios of a WDS (Almandoz et al. 2005), and the location of a leak may be approximated by comparing measurements with model estimates (Young & Tych 1996). In data-driven approaches, leaks are detected by applying signal processing or statistical analysis to the acquired data, which does not require a profound understanding of the WDS (Mounce et al. 2002). Owing to technological advances, smart sensors with wired and wireless communication have been widely used in leak detection (Adedeji et al. 2017), making it possible to detect and locate leaks in a WDS using real-time pressure, flow, consumer demand, or acoustic data.
Different leakage management and detection methods in the current literature.
Puust et al. (2010) provided a comprehensive review of leakage management methods and divided them into three categories: leakage assessment (quantifying the amount of water lost), leakage detection (detecting leakage hotspots), and leakage control (effectively controlling current and future leakage levels). Other reviews focus on one or more of these topics: assessment of water losses (Gupta & Kulat 2018; AL-Washali et al. 2020), leakage detection (Colombo et al. 2009; Li et al. 2015; Datta & Sarkar 2016; Abdulshaheed et al. 2017; Adedeji et al. 2017; Wu & Liu 2017; Chan et al. 2019; Zaman et al. 2020), and condition assessment of water pipes (Liu & Kleiner 2013; Dawood et al. 2020; Hawari et al. 2020). These reviews summarize and classify leak detection methods from different perspectives, such as transient-based methods (Colombo et al. 2009), hardware- and software-based methods (Li et al. 2015; Zaman et al. 2020), various pipeline fault detection methods (vibration analysis, pulse echo methodology, acoustic techniques, negative pressure wave, support vector machine, interferometric fiber sensor, and filter diagonalization method) (Datta & Sarkar 2016), pressure-based methods (Abdulshaheed et al. 2017), externally and internally based methods (Adedeji et al. 2017), data-driven methods (Wu & Liu 2017), and current and proposed intelligent methods (Chan et al. 2019). More details on these articles can be found in Table 1. Because Colombo et al. (2009) have already reviewed and classified transient-based methods, this paper discusses model-based and data-driven methods in detail.
Summary of current reviews about leakage detection and location methods
| Reference | Focus | Classification | Remarks |
|---|---|---|---|
| Colombo et al. (2009) | Transient-based leak detection methods | Transient leak detection; inverse-transient analysis; frequency-domain techniques; direct transient analysis | Comprehensive review of transient-based methods; field testing needed |
| Li et al. (2015) | Burst/leakage detection and location | Hardware-based methods; software-based methods | Hardware-based methods: higher detection accuracy; software-based methods: timelier |
| Datta & Sarkar (2016) | Pipeline fault detection methods | Blockage detection techniques; leakage detection techniques | Acoustic reflectometry found most suitable |
| Abdulshaheed et al. (2017) | Pressure-based leakage detection methods | Inverse-transient analysis method; transient steady-state method; transient damping method; inverse resonance method; pressure-flow deviation method; negative pressure wave method; pressure residual vector method | Hydraulic leak detection is less costly and responds faster than other leak detection approaches |
| Adedeji et al. (2017) | Leakage detection and localization | Externally based methods; internally based methods; leak localization; utilization of wireless sensor networks | Practical application of these techniques to large-scale WDNs is still a major concern |
| Wu & Liu (2017) | Data-driven approaches for burst detection | Classification methods; prediction-classification methods; statistical methods | More comprehensive performance evaluation needed |
| Chan et al. (2019) | Leakage detection | Current technologies; intelligent methodologies | Achieving higher leak localization accuracy remains difficult |
| Zaman et al. (2020) | Steady-state leakage detection strategies | Hardware-based methods; software-based methods; hybrid leak detection techniques | Hybrid techniques provide better leak detection accuracy and fewer false alarms |
In this review, we provide a detailed description of model-based and data-driven approaches for leak detection. We will classify the techniques that fall under these categories according to their respective leak detection method and summarize the generic processes used by them. We will then compare a variety of techniques in terms of their performance, analyze their weaknesses, and suggest new directions for future research.
MODEL-BASED APPROACHES
In model-based leak detection techniques, a hydraulic model of the WDS is used to simulate its operating states (Pérez et al. 2011). Once constructed, the model must be calibrated so that it accurately reflects the actual operating states of the WDS. Model-based approaches involve four key steps: (1) construction of a hydraulic model, (2) calibration of the hydraulic model, (3) leak detection, and (4) leak localization. The first three steps are very similar in most model-based approaches, whereas the fourth depends on the leak localization strategy, specifically on the type of data analyzed and the localization method. Figure 2 illustrates the generic framework of model-based approaches as a generic algorithm that captures the essential features of many of these methods, and Table 2 summarizes and compares a number of model-based leak detection techniques.
(1) Construction of the hydraulic model
Summary of and comparison among model-based approaches
| Reference | Technique adopted | Data required | Category |
|---|---|---|---|
| Pérez et al. (2011, 2014a, 2014b); Casillas et al. (2012, 2015); Salguero et al. (2019) | Pressure sensitivity matrix | Pressure | Sensitivity matrix-based approaches |
| Jiménez-Cabas et al. (2018); Geng et al. (2019) | Flow sensitivity matrix | Flow | Sensitivity matrix-based approaches |
| Zhang et al. (2016) | Multiclass support vector machine | Pressure/flow | Mixed model-based/data-driven approaches |
| Soldevila et al. (2016, 2017) | k-nearest neighbors; Bayesian classifiers | Pressure | Mixed model-based/data-driven approaches |
| Xie et al. (2019) | k-means clustering; linear classifier | Pressure | Mixed model-based/data-driven approaches |
| Rayaroth & Sivaradje (2019) | Random bagging classifier | Pressure | Mixed model-based/data-driven approaches |
| Zhou et al. (2019a) | Fully linear DenseNet | Pressure | Mixed model-based/data-driven approaches |
| Hu et al. (2021) | Multiscale neural networks | Pressure/flow | Mixed model-based/data-driven approaches |
| Wu et al. (2010, 2012); Nasirian et al. (2013); Sophocleous et al. (2019) | Genetic algorithm | Pressure | Optimization-calibration approaches |
| Sanz et al. (2016) | Least squares (LS); geographically allocated demand parameters | Demand | Optimization-calibration approaches |
| Hajibandeh & Nazif (2018) | Multi-objective ant colony optimization | Pressure/demand | Optimization-calibration approaches |
| Goulet et al. (2013); Moser et al. (2017) | Error-domain model falsification | Demand | Error-domain model falsification-based approaches |
| Shao et al. (2019a) | Time-series-based leak-scenario falsification | Pressure/flow | Error-domain model falsification-based approaches |
| Jensen & Jerez (2019) | Bayesian model updating approach; multi-level Markov chain Monte Carlo algorithm | Demand | Error-domain model falsification-based approaches |
In this step, a hydraulic model of the WDS, the foundation of all model-based approaches, is constructed to simulate the system's actual operating states. Software packages used for this purpose include EPANET, LOOP, CADRE Flow, Pipe Flow Expert, Synergi Pipeline Simulator, InfoWorks WS, WaterGEMS, and NextGen Simulation Suite (Zaman et al. 2020).
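To make the idea of a hydraulic model concrete, the sketch below computes steady-state nodal heads for a toy branched pipeline in which a single reservoir feeds nodes in series. It is not EPANET or any of the packages listed above: it assumes SI units, the Hazen-Williams head-loss formula, no minor losses, and no elevation differences, and the `Pipe` class and `nodal_heads` function are hypothetical names. A leak can be represented as an additional demand at the leaking node.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    length_m: float    # pipe length L (m)
    diameter_m: float  # internal diameter D (m)
    hw_c: float        # Hazen-Williams roughness coefficient C (dimensionless)


def hazen_williams_headloss(q_m3s: float, pipe: Pipe) -> float:
    """Friction head loss (m) for flow q (m^3/s), SI form of Hazen-Williams."""
    return 10.67 * pipe.length_m * abs(q_m3s) ** 1.852 / (
        pipe.hw_c ** 1.852 * pipe.diameter_m ** 4.87
    )


def nodal_heads(reservoir_head_m: float, pipes: list[Pipe],
                demands_m3s: list[float]) -> list[float]:
    """Hydraulic head at each node of a series pipeline fed by one reservoir.

    pipes[i] joins node i to node i + 1 (node 0 is the reservoir) and
    demands_m3s[i] is the withdrawal at node i + 1. A leak can be modeled
    by adding its outflow to the demand of the leaking node.
    """
    if len(pipes) != len(demands_m3s):
        raise ValueError("need exactly one demand per pipe")
    heads = []
    head = reservoir_head_m
    for i, pipe in enumerate(pipes):
        # Continuity: pipe i carries every demand located downstream of it,
        # flowing away from the reservoir.
        flow = sum(demands_m3s[i:])
        head -= hazen_williams_headloss(flow, pipe)
        heads.append(head)
    return heads
```

Real networks contain loops, so packages such as EPANET solve the coupled mass and energy equations iteratively; the series case is used here only because the flow in each pipe follows directly from the downstream demands.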
(2) Calibrating the hydraulic model
The parameters of a WDS hydraulic model include the nodal head, water consumption, pipe length, pipe diameter, and pipe roughness coefficients. Compared to parameters such as the nodal head and pipe length, the pipe diameter, pipe roughness, and water consumption at a demand node are the most uncertain input variables in the simulation model because they are not typically directly measurable. Therefore, they require calibration (Walski 1983).
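Calibration adjusts uncertain parameters until simulated values match measurements. The sketch below calibrates one such parameter, the Hazen-Williams roughness coefficient C of a single pipe with known length and diameter, by minimizing the sum of squared differences between simulated and measured head losses. The golden-section search, the search bounds of 60 to 160, and the function names are illustrative assumptions rather than a method taken from the cited studies; practical calibration adjusts many roughness, diameter, and demand parameters at once.

```python
import math


def hw_headloss(q, length, diameter, c):
    """SI Hazen-Williams head loss (m) for flow q (m^3/s) in one pipe."""
    return 10.67 * length * q ** 1.852 / (c ** 1.852 * diameter ** 4.87)


def calibrate_hw_c(flows, measured_headloss, length, diameter,
                   c_lo=60.0, c_hi=160.0, tol=1e-6):
    """Estimate C by golden-section search on the sum of squared residuals."""

    def sse(c):
        return sum((hw_headloss(q, length, diameter, c) - h) ** 2
                   for q, h in zip(flows, measured_headloss))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = c_lo, c_hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = sse(x1), sse(x2)
    while b - a > tol:
        if f1 < f2:   # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = sse(x1)
        else:         # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = sse(x2)
    return (a + b) / 2.0
```

Because the simulated head loss decreases monotonically with C, the squared-error objective has a single minimum within the bounds, which is what makes a simple bracketing search such as golden-section appropriate here.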
(3) Leak detection






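A common way to raise a leak alarm in model-based schemes is to compare measurements with the estimates of the calibrated model and to flag residuals that exceed a threshold. The sketch below assumes per-sensor thresholds and, to suppress isolated spikes, requires the violation to persist for a number of consecutive time steps; the persistence rule and the `detect_leak` name are assumptions for illustration.

```python
import numpy as np


def detect_leak(measured, simulated, thresholds, persistence=3):
    """Return the first time index at which a leak alarm is raised, or None.

    measured, simulated: arrays of shape (T, n_sensors), e.g. pressure heads (m)
    thresholds: shape (n_sensors,), maximum tolerated |residual| per sensor
    persistence: number of consecutive violating steps required to alarm
    """
    residuals = np.asarray(measured, float) - np.asarray(simulated, float)
    # A time step is suspicious if any sensor's residual exceeds its threshold.
    violating = np.any(np.abs(residuals) > np.asarray(thresholds, float), axis=1)
    run = 0
    for t, flag in enumerate(violating):
        run = run + 1 if flag else 0
        if run >= persistence:
            return t  # alarm is raised at the end of the persistent run
    return None
```

Larger thresholds or longer persistence windows reduce false alarms at the cost of detecting leaks later, which works against minimizing the leak duration.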
(4) Leak localization
After the leak alarm has been sounded, it is necessary to locate the leak. Depending on the method used for this purpose, model-based techniques may be classified as sensitivity matrix-based approaches, mixed model-based/data-driven approaches, optimization-calibration approaches, and error-domain model falsification methods.
Sensitivity matrix-based approaches
Sensitivity matrix-based approaches rely on pressure measurements and sensitivity analysis of the distribution network, exploiting the interdependence of all operating parameters (Salguero et al. 2019). The sensitivity matrix, formulated mathematically by Pudar & Liggett (1992), was first used for leak detection and location by Pérez et al. (2009) in the distribution network of Barcelona. Similar studies followed (Pérez et al. 2011, 2014a, 2014b, 2016; Quevedo et al. 2011; Casillas et al. 2012, 2015; Meseguer et al. 2014, 2015; Sala & Kołakowski 2014; Steffelbauer et al. 2014; Rosich et al. 2015; Blesa & Pérez 2018; Salguero et al. 2019). The general flow chart of the sensitivity matrix-based approaches is shown in Figure 3. Although leak detection can be performed very efficiently under ideal conditions using such a sensitivity matrix, uncertainties in nodal demands and measurement noise degrade the performance of these methods (Blesa & Pérez 2018). Researchers have therefore used various methods to reduce the effects of hydraulic model uncertainty, such as structuring the pressure residuals (Pérez et al. 2014b; Rosich et al. 2015), assessing accuracy before locating leaks (Pérez et al. 2014a), quantifying the uncertainty in demands (Pérez et al. 2016; Blesa & Pérez 2018) and the influence of noise (Steffelbauer et al. 2014), considering pipe materials (Abdulshaheed et al. 2018), using random fluctuations to quantify the normal pressure variations at each monitored node (Kang & Lansey 2014), simplifying the calculation of the sensitivity of network elements (Salguero et al. 2019), and using a specific signature minimally affected by leak magnitude to locate leaks (Casillas et al. 2015). In addition, the sensitivity matrix of pipe flows has been applied for leak detection and location in WDSs (Jiménez-Cabas et al. 2018; Geng et al. 2019).
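The core of these approaches can be sketched in a few lines. Below, each column of the sensitivity matrix is approximated by finite differences, simulating a nominal leak at one candidate node at a time, and candidate nodes are ranked by the normalized (cosine) correlation between the measured pressure residual vector and each column. The finite-difference construction, the cosine score, and the function names are assumptions for illustration; the cited studies differ in how they build the matrix and compare residuals.

```python
import numpy as np


def sensitivity_matrix(p_nominal, p_leak_by_node, leak_flow):
    """Finite-difference sensitivity matrix S of shape (n_sensors, n_nodes).

    p_nominal: (n_sensors,) simulated sensor pressures without a leak
    p_leak_by_node: (n_nodes, n_sensors) simulated sensor pressures with a
        leak of size leak_flow placed at each candidate node in turn
    """
    p_nominal = np.asarray(p_nominal, float)
    p_leak = np.asarray(p_leak_by_node, float)
    return (p_leak - p_nominal).T / leak_flow


def locate_leak(residual, S):
    """Return (best node, scores) ranking nodes by cosine similarity."""
    r = np.asarray(residual, float)  # measured minus leak-free simulated pressures
    norms = np.linalg.norm(S, axis=0) * np.linalg.norm(r)
    # Columns (or residuals) with zero norm get a score of zero.
    scores = (S.T @ r) / np.where(norms == 0, np.inf, norms)
    return int(np.argmax(scores)), scores
```

In practice `p_leak_by_node` would come from repeated runs of the calibrated hydraulic model. Normalizing the score makes the ranking insensitive to leak size, so, under the linear approximation, a single nominal leak flow suffices to build the matrix.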
Mixed model-based/data-driven approaches
Building on sensitivity matrix-based approaches, researchers have proposed mixed model-based/data-driven approaches (Ferrandez-Gamot et al. 2015; Soldevila et al. 2016, 2017; Zhang et al. 2016) in which leak detection is formulated as a classification problem. The WDS hydraulic model is used to generate pressure residual matrices corresponding to a variety of leakage scenarios, which are then used to train a classifier; the trained classifier is finally used to detect and locate leaks (Soldevila et al. 2016). The general flow chart of the mixed model-based/data-driven approaches is shown in Figure 4. Researchers have used a variety of classification methods, such as statistical classifiers (Ferrandez-Gamot et al. 2015), the k-nearest neighbor (k-NN) classifier (Soldevila et al. 2016), Bayesian classification (Soldevila et al. 2017; Fereidooni et al. 2020), multiclass support vector machines (Zhang et al. 2016), Fisher discriminant analysis (Romero-Tapia et al. 2018), convolutional neural networks (CNNs) (Javadiha et al. 2019), linear classifiers (Xie et al. 2019), a random bagging classifier (Rayaroth & Sivaradje 2019), deep learning identification (Zhou et al. 2019a), random forest and decision tree algorithms (Fereidooni et al. 2020), and multiscale fully convolutional networks (Hu et al. 2021). The leak location whose theoretical residuals most closely resemble the actual residuals is taken as the most probable one, but uncertainties must also be taken into account during leak localization (Blesa & Pérez 2018).
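The classification step can be illustrated with a k-NN classifier trained on simulated residuals. In the sketch below, a hypothetical linear sensitivity matrix stands in for the hydraulic model when generating training residuals for several leak sizes at each candidate node, with Gaussian noise added to mimic measurement error. Normalizing residuals to unit length before computing distances, the value k = 5, and all names are assumptions, not the settings of Soldevila et al. (2016).

```python
from collections import Counter

import numpy as np


def build_training_set(S, leak_sizes, noise_std, n_repeats, rng):
    """Simulate labelled residuals, one class per candidate leak node.

    S: (n_sensors, n_nodes) hypothetical linear stand-in for the hydraulic model
    """
    X, y = [], []
    for node in range(S.shape[1]):
        for size in leak_sizes:
            for _ in range(n_repeats):
                residual = size * S[:, node] + rng.normal(0.0, noise_std, S.shape[0])
                X.append(residual)
                y.append(node)
    return np.array(X), np.array(y)


def knn_predict(X_train, y_train, x, k=5):
    """Predict the leak node of residual x by majority vote of k neighbours."""

    def unit(v):
        # Normalize so the decision depends on residual direction, not leak size.
        n = np.linalg.norm(v, axis=-1, keepdims=True)
        return v / np.where(n == 0, 1.0, n)

    d = np.linalg.norm(unit(X_train) - unit(np.asarray(x, float)), axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest.tolist()).most_common(1)[0][0]
```

The training set covers every candidate node, which mirrors the reliance of these methods on the hydraulic model: leak scenarios that the model cannot simulate realistically will be poorly represented in the classifier.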
In the aforementioned studies, the leak detection performance of the mixed approach was improved mainly by improving the classifier (Soldevila et al. 2017; Zhou et al. 2019a). For example, Soldevila et al. (2017) found that a Bayesian classifier yielded higher accuracy than the k-NN classifier used by Soldevila et al. (2016) (for H = 24, 91.59% versus 81.47%). Comparisons with DenseNet and a traditional fully linear neural network demonstrated that deep learning (a fully linear DenseNet) can effectively narrow the potential burst district down to one or several pipes with good robustness and applicability (Zhou et al. 2019a). The hybrid method combining AI algorithms and hydraulic relations developed by Fereidooni et al. (2020) outperformed existing methods (decision tree, k-NN, random forest, and Bayesian network). Leak detection performance can also be improved by optimizing the placement of pressure sensors, using methods such as a feature selection algorithm (Soldevila et al. 2019), a genetic algorithm (Zhou et al. 2019a), and the shuffled frog leaping algorithm (Rayaroth & Sivaradje 2019).
Optimization-calibration approaches
Optimization-calibration treats leak detection as an inverse (parameter identification) problem: observed values at monitoring points are used to derive the unknown leakage parameters (namely, the location and intensity of the leaks) of a hydraulic model (Covas & Ramos 2001). Pudar & Liggett (1992) were the first to use this approach for leak detection in pipe networks, but their method was unsuitable for steady-state problems. Wu et al. (2010) successfully used inverse analysis to detect and locate leaks; however, their method tends to converge to local optima because of the excessively large solution space, which reduces the accuracy of model calibration. The general flow chart of the optimization-calibration approaches is shown in Figure 5. Walski et al. (2014) provided practical suggestions to help users collect the right quality and quantity of data and interpret the results when running genetic algorithms to locate leaks and incorrectly closed valves. A series of similar methods have been proposed for the detection of pipe bursts (Sanz et al. 2016) and leakages (Kapelan et al. 2003; Wang et al. 2012; Nasirian et al. 2013; Hajibandeh & Nazif 2018; Zhang et al. 2018; Righetti et al. 2019; Sophocleous et al. 2019; Blocher et al. 2020; Moasheri & Jalili-Ghazizadeh 2020). Some researchers identified leaks (i.e., background leaks) before model calibration (Wu et al. 2010; Sophocleous et al. 2015), whereas others took the opposite approach (Sanz et al. 2016; Berglund et al. 2017; Sophocleous et al. 2019). Researchers have used various optimization methods, such as genetic algorithms (Wu et al. 2010; Wang et al. 2012; Nasirian et al. 2013; Sophocleous et al. 2019), least squares (LS) (Sanz et al. 2016), linear and mixed-integer programming (Berglund et al. 2017), artificial immune systems (Eryiğit 2017), ant colony optimization (Hajibandeh & Nazif 2018), particle swarm optimization (Zhang et al. 2018; Righetti et al. 2019), and the imperialist competitive algorithm (Moasheri & Jalili-Ghazizadeh 2020). The performance of differential evolution (DE) algorithms depends on the distance metric used to compute the objective function (Steffelbauer & Fuchs-Hanusch 2016), as well as on the ordering of the leakage positions in the parameter space (Steffelbauer et al. 2017). Because this inverse problem is ill-posed, many combinations of decision variables can lead to the same result (Jensen & Jerez 2019). The methods mentioned above are mainly based on the steady state; based on the same principle, many optimization methods based on transient states have also been proposed (Liggett & Chen 1994; Colombo et al. 2009; Covas & Ramos 2010; Soares et al. 2011; Kim 2016).
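The optimization-calibration idea can be reduced to minimizing the squared mismatch between measured and simulated pressures over the leak location and intensity. In the sketch below, a hypothetical linearized model predicts pressures as the nominal values plus the leak flow times a sensitivity column; because this model is linear in the leak flow, the optimal non-negative flow at each candidate node has a closed form, and the search over locations is simply exhaustive. This exhaustive least-squares search stands in for the metaheuristics (genetic algorithms, particle swarm optimization, and so on) used in the cited studies, which become necessary when the model is nonlinear and the candidate space is large. The function name is hypothetical.

```python
import numpy as np


def calibrate_leak(measured, p_nominal, S):
    """Return (node, leak_flow, sse) that best explain the measurements.

    Hypothetical linear model: predicted = p_nominal + leak_flow * S[:, node],
    where S has shape (n_sensors, n_nodes).
    """
    d = np.asarray(measured, float) - np.asarray(p_nominal, float)
    best = None
    for j in range(S.shape[1]):
        s = S[:, j]
        denom = float(s @ s)
        # Closed-form least-squares leak flow, clipped at zero (no negative leaks).
        q = max(0.0, float(s @ d) / denom) if denom > 0 else 0.0
        sse = float(np.sum((d - q * s) ** 2))
        if best is None or sse < best[2]:
            best = (j, q, sse)
    return best
```

If two candidate nodes have nearly parallel sensitivity columns, their squared errors are nearly equal, which is a small-scale illustration of the non-uniqueness noted above.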
Error-domain model falsification-based approaches
To reduce the effects of modeling and measurement uncertainties on the efficacy of leak detection, some researchers have employed error-domain model falsification for WDSs (Goulet et al. 2013; Moser et al. 2015, 2016, 2017, 2018). Such methods can describe the uncertainty between model predictions and measured values using only a small amount of information. In this approach, the modeling and measurement uncertainties at each location are explicitly represented, and prior knowledge is used to define bounds for each parameter and construct the set of possible scenarios. Each scenario corresponds to a set of parameter values that describes a system state, and the scenario set covers all possible states. The model parameters (leak location and intensity) are then calibrated through error-domain model falsification, which compares predicted and measured values. This approach does not require the error structure of model predictions to be fully defined: one simply falsifies the model instances (i.e., leakage scenarios) for which the difference between predicted and measured values exceeds the maximum plausible error, which is obtained by combining measurement and modeling uncertainties.
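The falsification logic can be sketched as follows. Each scenario (here, a candidate leak node and leak size) has a vector of predicted sensor values, and any scenario whose prediction differs from a measurement by more than the maximum plausible error at that sensor is discarded. The published method derives these thresholds from the combined probability distribution of modeling and measurement errors; simply adding two per-sensor worst-case bounds, as done here, is a simplifying assumption, as are the function and variable names.

```python
import numpy as np


def falsify(measured, predictions, model_uncertainty, measurement_uncertainty):
    """Return the scenarios that survive falsification.

    measured: (n_sensors,) measured values
    predictions: dict mapping scenario -> (n_sensors,) predicted values
    model_uncertainty, measurement_uncertainty: per-sensor maximum plausible
        absolute errors (assumed bounds, combined here by simple addition)
    """
    meas = np.asarray(measured, float)
    bound = (np.asarray(model_uncertainty, float)
             + np.asarray(measurement_uncertainty, float))
    candidates = []
    for scenario, pred in predictions.items():
        # A scenario survives only if it is plausible at every sensor.
        if np.all(np.abs(np.asarray(pred, float) - meas) <= bound):
            candidates.append(scenario)
    return candidates
```

The result is a set of candidate scenarios rather than a single best fit; when the measurements cannot discriminate between scenarios, for example for small leaks or with few sensors, several candidates survive.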

Model falsification-based leak detection was first proposed by Goulet et al. (2013), who also demonstrated a method for optimizing sensor placement in a WDS; however, this method performs poorly with small leaks. Model falsification has since been widely adopted for leak detection (Moser et al. 2015, 2016, 2017; Jensen & Jerez 2019; Shao et al. 2019a). The general flow chart of the error-domain model falsification-based approaches is shown in Figure 6. Later studies significantly improved on the leak detection performance of earlier methods. For example, Moser et al. (2015) proposed a method that significantly reduced the leak detection time. Moser et al. (2016) extended the model falsification-based leak detection method to electrical networks to create example cases for studying leak detection strategies. Jensen & Jerez (2019) performed steady and unsteady analyses of a hydraulic model using their method. Some researchers have also considered the effects of nodal demand uncertainties on leak detection performance: Moser et al. (2016) provided approximate estimates of nodal demand to reduce the effects of their uncertainty, while Shao et al. (2019a) combined real-time measurements and model predictions in a time-series-based leak-scenario falsification method that selects correlation coefficient thresholds. By accounting for modeling and measurement uncertainties, model falsification has shown promise for leak detection. Nonetheless, these methods have weaknesses; for example, they are insensitive to small leaks (Goulet et al. 2013; Moser et al. 2018; Shao et al. 2019a), and they perform poorly when the WDS contains relatively few sensors (Jensen & Jerez 2019).
Generic framework of error-domain model falsification-based approaches.
DATA-DRIVEN APPROACHES
In data-driven approaches, leaks are detected via statistical and signal processing analyses of acquired data (Cody et al. 2020a). These approaches are especially promising for WDSs with a large number of sensors because they do not require the construction of complex WDS hydraulic models, which makes them insensitive to structural or operational complexities (Wu & Liu 2017). Simply put, these techniques search the data for values that deviate from the usual patterns recorded in the pipeline system and that may be caused by abnormal events such as leaks (Geelen et al. 2019). Data-driven techniques typically use flow or pressure data, but they may also use end-user water demands. Figure 7 illustrates the generic framework of data-driven approaches, and Table 3 summarizes and compares a variety of these methods. In most cases, a data-driven technique is performed in two steps: (1) data acquisition, preprocessing, and transformation, and (2) application of a leak detection strategy.
(1) Data acquisition, preprocessing, and transformation
Summary of data-driven approaches
| Reference | Technique adopted | Data required | Category |
|---|---|---|---|
| Kang et al. (2018) | One-dimensional convolutional neural network; support vector machine | Wave velocity | Feature set classification methods |
| Sun et al. (2020) | Linear discriminant analysis; neural networks | Pressure | Feature set classification methods |
| Zhou et al. (2019b) | Kernel principal component analysis; cascade support vector data description | Pressure | Feature set classification methods |
| Soldevila et al. (2019) | Kriging spatial interpolation; Bayesian reasoning | Pressure | Feature set classification methods |
| Cody et al. (2020a) | Linear prediction | Acoustic signals | Feature set classification methods |
| Xu et al. (2020) | Discrete Fourier transform; isolation forest techniques | Pressure | Feature set classification methods |
| Cody et al. (2020b) | Convolutional neural network; variational autoencoder | Acoustic signals | Feature set classification methods |
| Romano et al. (2011) | Artificial neural network | Pressure/flow | Prediction-classification methods |
| Ye & Fenner (2011) | Linear Kalman filter | Pressure/flow | Prediction-classification methods |
| Bakker et al. (2014b) | Adaptive forecasting model and deviation analysis | Demand/pressure | Prediction-classification methods |
| Mounce et al. (2011) | Support vector regression | Pressure/flow | Prediction-classification methods |
| Ye & Fenner (2014) | Polynomial function based on weighted least squares with the EM algorithm | Flow | Prediction-classification methods |
| Hutton & Kapelan (2015) | Probabilistic demand forecasting model | Demand | Prediction-classification methods |
| Jung & Lansey (2015) | Nonlinear Kalman filter | Demand | Prediction-classification methods |
| Karray et al. (2016) | Predictive Kalman filter | Wave velocity | Prediction-classification methods |
| Wang et al. (2020) | Deep learning recurrent neural networks | Flow | Prediction-classification methods |
| Laucelli et al. (2016) | Evolutionary polynomial regression | Pressure/flow | Prediction-classification methods |
| Huang et al. (2018) | Dynamic time warping; supervised learning | Demand/flow | Prediction-classification methods |
| Romano et al. (2014) | Bayesian inference system | Pressure/flow | Prediction-classification methods |
| Jung et al. (2015) | Statistical process control | Pressure/flow | Statistical approaches |
| Palau et al. (2012) | Principal component analysis; statistical process control | Flow | Statistical approaches |
| Loureiro et al. (2016) | Modified statistical process control | Flow | Statistical approaches |
| Quiñones-Grueiro et al. (2018) | Principal component analysis; periodic transformation; vector extension | Flow | Statistical approaches |
| Nam et al. (2019) | Principal component analysis; standardized exponentially weighted moving average (EWMA) | Pressure/flow | Statistical approaches |
| Wu et al. (2016) | Clustering | Flow | Unsupervised clustering techniques |
| Wu et al. (2018) | Clustering; cosine distance | Flow | Unsupervised clustering techniques |
| Geelen et al. (2019) | Feature-based clustering | Pressure | Unsupervised clustering techniques |
| Lu & Sela (2019) | Time series data mining | Pressure transients | Unsupervised clustering techniques |
The primary objective of data preprocessing is to remove erroneous values from the time series and to handle missing values, so as to facilitate subsequent analyses. The acquired measurements must be processed before they are used for leak analysis to ensure that they are suitable for the selected algorithm. Variability and uncertainty in the measurement data can be mitigated to an extent by preprocessing and transforming the data (Zaman et al. 2020). The preprocessing and transformation methods are selected according to the leak detection method and the data properties.
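The sketch below illustrates a typical preprocessing step under simple assumptions: readings outside a plausible physical range are treated as sensor errors, and short gaps are filled by linear interpolation (one of the gap-filling options discussed later in this paper). The function name `clean_series` and its parameters `lo`, `hi`, and `max_gap` are hypothetical choices for illustration, not values taken from the reviewed studies.

```python
import math

def clean_series(values, lo, hi, max_gap=3):
    """Flag implausible or missing readings, then fill short interior gaps by
    linear interpolation. Longer gaps and gaps at the ends are left as None.

    values  : regularly sampled readings (None or NaN marks a missing sample)
    lo, hi  : plausible physical range; readings outside it are treated as errors
    max_gap : longest run of missing samples that will be interpolated
    """
    # Step 1: replace missing (None/NaN) and out-of-range readings with None
    x = [None if v is None or math.isnan(v) or not lo <= v <= hi else float(v)
         for v in values]
    out = list(x)
    i, n = 0, len(x)
    while i < n:
        if x[i] is not None:
            i += 1
            continue
        # Step 2: find the end j of this run of missing samples
        j = i
        while j < n and x[j] is None:
            j += 1
        gap = j - i
        # Step 3: interpolate only gaps bounded on both sides and short enough
        if i > 0 and j < n and gap <= max_gap:
            left, right = x[i - 1], x[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                out[k] = left + frac * (right - left)
        i = j
    return out
```

For example, `clean_series([50.0, 51.0, float('nan'), 53.0, 500.0, 55.0], lo=0, hi=150)` returns `[50.0, 51.0, 52.0, 53.0, 54.0, 55.0]`: both the NaN and the implausible 500.0 reading are replaced by interpolated values.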
(2) Leak detection strategy
Data-driven techniques can be classified into four groups according to their leak detection strategy: feature set classification methods, prediction-classification methods, statistical methods, and unsupervised clustering methods.
Feature set classification methods
Simple classification methods such as ANNs (Caputo & Pelagagge 2003; Mounce & Machell 2006; Mounce et al. 2014) and self-organizing map ANNs (Aksela et al. 2009) can be used to construct classifiers that distinguish leaks from normal data. However, these classifiers must be trained with large amounts of normal and anomalous (pipe burst or leak) data, which are often very difficult to obtain (Wu & Liu 2017). A number of studies have sought to overcome the lack of training samples. For example, Tao et al. (2014) used a multi-level neural network for leak detection, in which the first level determines whether a leak has occurred and the second level determines its location and intensity. Soldevila et al. (2019) used Kriging spatial interpolation to obtain the pressure data at all nodes and stored data from normal operations in a database for use as historical data. However, these methods assume a moving variance of leakage regions; because the leakage events obtained from the model are irregular in nature, this assumption of non-stationarity is difficult to satisfy.
Many researchers have used the feature sets of anomalous events to train classifiers that distinguish between normal and anomalous events (Chan et al. 2019). Feature extraction is key to the success of feature set classification methods. To improve the accuracy of leak detection and localization, various methods have been used to extract leak features, including the wavelet transform (Ahadi & Bakhtiar 2010; Lu et al. 2016; Bentoumi et al. 2017; Kumar et al. 2017; Li et al. 2018; Rai & Kim 2021), singular spectrum analysis (Cody et al. 2018), Markov chains (Liu et al. 2019a), association rules (Harmouche & Narasimhan 2020), principal component analysis (Liu et al. 2019b; Zhou et al. 2019b; Hashim et al. 2020), joint time-frequency analysis (Zan et al. 2014), linear prediction (Cody & Narasimhan 2020; Cody et al. 2020a; Guo et al. 2020), CNNs (Kang et al. 2018; Cody et al. 2020b), matched-field processing (Wang et al. 2021), transfer matrix extraction (Shi et al. 2020), and the discrete Fourier transform (Xu et al. 2020). These techniques often use acoustic signals (Kumar et al. 2017; Cody et al. 2018, 2020a, 2020b; Kang et al. 2018; Li et al. 2018; Cody & Narasimhan 2020), transient pressure waves (Bohorquez et al. 2020; Kim 2020; Wang et al. 2021), or pressure monitoring data (Sun et al. 2020; Xu et al. 2020) for leak detection. Various classification techniques are then used to classify the extracted features, including artificial neural networks (Kumar et al. 2017; Li et al. 2018; Manzi et al. 2019; Bohorquez et al. 2020), support vector machines (Cody et al. 2018; Kang et al. 2018; Liu et al. 2019a, 2019b; Zhou et al. 2019b; Hashim et al. 2020), random forests (Guo et al. 2020), Gaussian mixture models (Cody et al. 2020a; Cody & Narasimhan 2020; Rai & Kim 2021), decision tree and naïve Bayes classifiers (El-Zahab et al. 2018), neural networks and linear discriminant analysis (Sun et al. 2020), and isolation forests (Xu et al. 2020).
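As a minimal illustration of the feature set classification workflow (feature extraction followed by classification), the Python sketch below uses the fraction of energy in equal-width bands of the discrete Fourier transform as a handcrafted feature set and a nearest-centroid classifier. The band-energy features, the `n_bands` parameter, and the nearest-centroid rule are simplifying assumptions chosen for brevity; the reviewed studies use richer features and classifiers such as SVMs, random forests, or CNNs.

```python
import numpy as np

def band_energy_features(signal, n_bands=4):
    """Fixed feature set: fraction of spectral energy in each of n_bands
    equal-width bands of the one-sided DFT power spectrum."""
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2  # remove the mean (DC) first
    energy = np.array([band.sum() for band in np.array_split(power, n_bands)])
    return energy / energy.sum()  # normalize so features do not depend on amplitude

class NearestCentroidClassifier:
    """Assign each feature vector to the class whose mean training vector is closest."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every sample to every class centroid
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dist.argmin(axis=1)]
```

In this sketch a leak is assumed to add high-frequency energy to the signal, so leak and normal segments separate in the upper bands; real acoustic or pressure signatures are considerably less clean.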
Traditional fixed-feature extraction methods depend on the quality of user-defined, handcrafted features (Kang et al. 2018) and require significant signal preprocessing, parameter tuning, and training (Harmouche & Narasimhan 2020). Some newer methods have achieved good results; for example, CNNs can be more effective for classifying time-series signals (Kang et al. 2018). Shorter segments of signals reconstructed by linear prediction can achieve accuracy similar to that obtained with longer segments of the raw time series, which enables the identification of relatively small changes in acoustic signatures due to leaks (Harmouche & Narasimhan 2020); similarly, subtle burst signals can be detected in normally noisy pressure data (Xu et al. 2020). Therefore, better feature extraction and classification methods can improve leak detection performance.
Prediction-classification methods
These methods consist of two stages: prediction and classification. The prediction stage estimates the expected WDS data under normal conditions. Techniques such as ANNs (Mounce et al. 2002, 2003; Bougadis et al. 2005; Romano et al. 2011), the Kalman filter (KF) (Ye & Fenner 2011; Jung & Lansey 2015), Bayesian inference systems (Poulakis et al. 2003; Romano et al. 2014), support vector regression (Mounce et al. 2011), evolutionary polynomial regression (EPR) (Laucelli et al. 2016), and long short-term memory (LSTM) networks (Wang et al. 2020) are widely used for predicting WDS data. The classification stage then identifies leaks by comparing predictions with measurements: measurements that deviate significantly from their predicted values may indicate a leakage event.
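The two stages can be illustrated with a deliberately simple predictor. In the Python sketch below (an assumption for illustration, not one of the reviewed models), the prediction stage is the mean value of each time slot over historical leak-free days, and the classification stage flags a slot when its residual exceeds `k` historical standard deviations. A more capable predictor such as an ANN, KF, or LSTM network would replace `fit_profile`; the threshold `k` is a hypothetical tuning parameter.

```python
from statistics import mean, pstdev

def fit_profile(history):
    """Prediction stage. history: list of leak-free days, each a list of
    measurements at the same time slots (e.g. 24 hourly inflows).
    Returns the per-slot mean (the prediction) and standard deviation."""
    slots = list(zip(*history))
    return [mean(s) for s in slots], [pstdev(s) for s in slots]

def classify(day, predicted, spread, k=3.0, min_spread=1e-6):
    """Classification stage: flag slots whose residual (measured minus
    predicted) exceeds k standard deviations of the historical data."""
    return [abs(x - p) > k * max(s, min_spread)
            for x, p, s in zip(day, predicted, spread)]
```

Because the threshold scales with the historical spread of each slot, slots with naturally variable demand (e.g. morning peaks) tolerate larger residuals than quiet night-time slots.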
Prediction-classification methods must predict the future operating states of the WDS, such as short-term variations in flow and pressure distributions (Romano et al. 2011, 2014; Wang et al. 2020), as well as variations in water demand (Mounce et al. 2011; Bakker et al. 2014b). False positives may occur if these predictions do not account for anomalous water demands. Prediction-classification methods also require a large amount of historical data. A few methods have been proposed to reduce the false positive rate of these techniques and improve their accuracy. Hutton & Kapelan (2015) proposed comparing measurements with predictions expressed as probability distributions, while Jung & Lansey (2015) used a nonlinear KF to identify the operating state of the system. Karray et al. (2016) found that the data compression problem can be solved using a predictive KF. Wang et al. (2020) used LSTM to predict WDS flows and demonstrated that it outperformed conventional methods in terms of learning performance and accuracy. However, their dataset covered only a few months, and they considered only diurnal and weekly variations in flow, without accounting for seasonal variations. Therefore, this method requires further validation of its ability to detect small pipe bursts and leaks.
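The KF-based variants can be illustrated with a scalar linear Kalman filter. The sketch below assumes a random-walk state model for a single flow or pressure signal and flags a sample when its normalized innovation (the measurement minus the one-step prediction, divided by the innovation standard deviation) exceeds `k`. The noise variances `q` and `r` and the threshold `k` are hypothetical values that would need tuning; the filters used by Ye & Fenner (2011) and Jung & Lansey (2015) are more elaborate.

```python
import math

def kalman_innovation_flags(z, q=0.01, r=1.0, p0=1.0, k=3.0):
    """Scalar linear Kalman filter with a random-walk state model.

    z  : measurements (e.g. DMA inflow at a fixed sampling interval)
    q  : process noise variance; r: measurement noise variance
    p0 : initial state variance (the state starts at the first measurement)
    Returns one boolean per sample: True when |innovation| / sqrt(S) > k.
    """
    x, p = float(z[0]), p0
    flags = []
    for zk in z:
        # Predict: a random walk keeps the state and inflates its variance by q
        x_pred, p_pred = x, p + q
        # Innovation (prediction error) and its variance S
        nu = zk - x_pred
        s = p_pred + r
        flags.append(abs(nu) / math.sqrt(s) > k)
        # Update with the Kalman gain
        gain = p_pred / s
        x = x_pred + gain * nu
        p = (1.0 - gain) * p_pred
    return flags
```

Flagged samples are still used in the update in this sketch; skipping the update for flagged samples would prevent a sustained leak from being gradually absorbed into the state estimate.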
In the methods above, leak detection is performed by analyzing the difference between the measured and predicted values. Some researchers have also explored other approaches, for example, prediction and classification using adaptive inflow approximations and fault detection (Eliades & Polycarpou 2012), EPR (Laucelli et al. 2016), and dynamic time warping (DTW) (Huang et al. 2018). The EPR approach enables on-line data from affordable pressure and flow sensors to be used to reproduce the behavior of a WDS. Because DTW acts on pattern anomalies rather than point anomalies, this approach has a very low false positive rate; however, it can only be used to detect pipe bursts.
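DTW compares the shapes of two time series while allowing them to be locally stretched or shifted in time, which is why it suits pattern anomalies. The sketch below is the textbook DTW distance with an absolute-difference local cost; how Huang et al. (2018) combined DTW with supervised learning is not reproduced here, and comparing a day's inflow pattern with a reference pattern is only one assumed use.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between sequences a and b,
    using |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

A time-stretched copy of a pattern yields a small DTW distance even though a point-by-point comparison would differ; for example, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is `0.0`.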
Statistical approaches
Most data-driven techniques include a modeling process (either prediction or classification). Prediction-classification methods rely on historical data from normal operations to produce predictions, while simple classification methods must be trained using large quantities of historical data on anomalous conditions. The accuracy of both approaches depends on the accuracy of the prediction or classification model, and erroneous judgments will occur if the model is inaccurate (Wu & Liu 2017; Chan et al. 2019). In contrast, statistical approaches do not require prediction or classification models, as they detect leaks simply by applying statistical analyses to the acquired data (Palau et al. 2012; Jung et al. 2015). Statistical process control (SPC) is a key technique for this class of methods; it uses control charts (graphical analysis tools with a set of control limits) to identify outliers in the monitored data (i.e. leak-induced changes in pressure or flow). The construction of residual control charts is critical for SPC-based leak monitoring (Wu & Liu 2017). SPC-based leak detection methods have been widely adopted (Palau et al. 2012; Jung et al. 2015; Lee et al. 2016; Loureiro et al. 2016; Quiñones-Grueiro et al. 2018; Nam et al. 2019; Diao et al. 2020), and Gaussian mixture models have also been used for leak detection (Fagiani et al. 2016).
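A minimal Shewhart-type control chart is sketched below, assuming the limits are estimated as the mean plus or minus `L` sample standard deviations of an in-control baseline (e.g. residuals or night flows from leak-free weeks). The choice `L = 3` is the conventional three-sigma rule, not a value taken from the reviewed studies, and practical SPC applications often estimate sigma from moving ranges instead.

```python
from statistics import mean, stdev

def control_limits(baseline, L=3.0):
    """Lower control limit, center line, and upper control limit estimated
    from in-control (leak-free) baseline data."""
    center, sigma = mean(baseline), stdev(baseline)
    return center - L * sigma, center, center + L * sigma

def out_of_control(samples, lcl, ucl):
    """Indices of samples that fall outside the control limits."""
    return [i for i, x in enumerate(samples) if x < lcl or x > ucl]
```

Applied to the residuals of a prediction model rather than to raw measurements, the same two functions give a simple residual control chart.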
Control charts are graphical tools with built-in control limits that are used to analyze the current system state and determine whether it is normal (in control). Several SPC techniques, including the Western Electric Company (WEC) rules, the Hotelling T² control chart, the cumulative sum (CUSUM) chart, and the exponentially weighted moving average (EWMA) chart, have found practical use in WDS leak detection (Palau et al. 2012; Jung et al. 2015).
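The EWMA and CUSUM charts mentioned above can be written in a few lines. The Python sketch below assumes that the in-control mean `mu0` and standard deviation `sigma` are known from leak-free data; `lam`, `L`, `k`, and `h` are common textbook choices (e.g. k = 0.5 and h = 5 for the tabular CUSUM), not parameters reported in the cited studies. Only the upper-sided CUSUM is shown, on the assumption that a leak increases the monitored inflow.

```python
import math

def ewma_alarms(x, mu0, sigma, lam=0.2, L=3.0):
    """EWMA chart: z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Returns indices where |z_t - mu0| exceeds the time-varying limit
    L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 t)))."""
    z, alarms = mu0, []
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1 - lam) * z
        limit = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu0) > limit:
            alarms.append(t - 1)  # report the 0-based sample index
    return alarms

def cusum_upper_alarms(x, mu0, sigma, k=0.5, h=5.0):
    """Upper tabular CUSUM: C_t = max(0, x_t - (mu0 + k*sigma) + C_{t-1}).
    Returns indices where C_t exceeds the decision interval h*sigma."""
    c, alarms = 0.0, []
    for t, xt in enumerate(x):
        c = max(0.0, xt - (mu0 + k * sigma) + c)
        if c > h * sigma:
            alarms.append(t)
    return alarms
```

For a sustained step increase of two standard deviations starting at sample 10, both charts first signal at sample 13 with these settings, whereas every sample stays inside the three-sigma Shewhart limits; this is why EWMA and CUSUM are preferred for small, persistent shifts such as background leaks.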
Statistical approaches have also been combined with other techniques to improve leak detection performance. For example, Palau et al. (2012) reduced data redundancy by using principal component analysis to project the flow data onto a lower dimension, while Loureiro et al. (2016) used robust statistical metrics to account for the asymmetrical behavior of flow data patterns. Lee et al. (2016) combined the statistical approach with the wavelet transform to solve the problem of slow response, Quiñones-Grueiro et al. (2018) proposed data preprocessing methods based on the dynamic characteristics of pipe networks, and Nam et al. (2019) combined principal component analysis, K-means clustering, and sensitivity analysis to optimize sensor configurations. These unsupervised leak detection methods are simple and do not require the collection of historical data. Although statistical methods have shown promise for leak detection, they are deficient in certain aspects. For example, the method of Loureiro et al. (2016) fails to account for unexpected water demands or the demands of large users, the method of Quiñones-Grueiro et al. (2018) is quite time-consuming, and the method of Nam et al. (2019) does not account for the influence of noise on its results. In practice, SPC is more often used as a component of other methods, either as a module or as a processing step; for example, SPC can be used for data preprocessing (Wu & Liu 2017).
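The combination of principal component analysis with SPC, as in Palau et al. (2012), can be sketched as a PCA model of normal operation monitored with a Hotelling T² statistic. In the Python sketch below, each row is assumed to be one observation (for example, a day of hourly inflows), the number of retained components is a hypothetical choice, and the alarm limit is an empirical quantile of T² on the training data rather than the F-distribution-based limit used in classical multivariate SPC.

```python
import numpy as np

class PCAT2Monitor:
    """PCA model of leak-free data with a Hotelling T^2 alarm statistic."""

    def fit(self, X, n_components=2, quantile=0.99):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0, ddof=1)
        self.std_[self.std_ == 0] = 1.0          # guard against constant columns
        Z = (X - self.mean_) / self.std_          # standardize each variable
        # Principal directions from the SVD of the standardized training data
        _, s, vt = np.linalg.svd(Z, full_matrices=False)
        self.loadings_ = vt[:n_components].T
        self.score_var_ = s[:n_components] ** 2 / (len(X) - 1)
        # Empirical alarm limit: the given quantile of T^2 on the training data
        self.limit_ = np.quantile(self.t2(X), quantile)
        return self

    def t2(self, X):
        """Hotelling T^2: squared scores weighted by the inverse score variances."""
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        scores = Z @ self.loadings_
        return np.sum(scores ** 2 / self.score_var_, axis=1)

    def alarms(self, X):
        return self.t2(X) > self.limit_
```

T² only measures deviations within the retained principal subspace; deviations orthogonal to it are usually monitored with a second statistic such as the squared prediction error, which is omitted here for brevity.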
Unsupervised clustering techniques
Similar to statistical approaches, unsupervised clustering techniques allow leaks to be detected using only the features of the available data. In this class of methods, flow (Wu et al. 2016, 2018) or pressure (Lu & Sela 2019; Geelen et al. 2019) data are grouped by similarity through clustering analysis, and the resulting clusters are then used to detect anomalous events. These methods are unsupervised because clustering algorithms do not require a priori information about leakage scenarios (Chan et al. 2019). Their guiding principle is to divide the WDS or its data into a number of distinct clusters and then identify leaks using a separate strategy.
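The two-step principle (cluster first, then apply a separate rule to flag leaks) can be sketched with plain k-means on daily flow profiles. In the Python sketch below, the farthest-first initialization and the second-stage rule, which flags members of clusters holding less than `min_frac` of the days, are hypothetical choices for illustration; they are not the procedures of Wu et al. (2016, 2018) or Geelen et al. (2019).

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    X = np.asarray(X, dtype=float)
    centroids = [X[0]]
    for _ in range(1, k):
        # next seed: the point farthest from all seeds chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assignment step: nearest centroid for every observation
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # update step: move each centroid to the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def small_cluster_flags(X, k=3, min_frac=0.1):
    """Second stage: flag observations that fall in unusually small clusters."""
    labels, _ = kmeans(X, k)
    sizes = np.bincount(labels, minlength=k)
    return sizes[labels] < min_frac * len(labels)
```

With profiles from regular weekdays and weekends plus one day of abnormally high flow, the abnormal day ends up in its own one-member cluster and is the only day flagged.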
Unsupervised clustering techniques are mainly used to detect pipe bursts in district metering areas (DMAs) with many inlets and outlets. Using measurements taken at the many outlets of a DMA, unsupervised clustering techniques can detect large numbers of short-duration pulses without a preceding data selection process (Wu et al. 2016). Although these methods do not require a prediction process, they still require a large amount of historical data to generate the time series. Moreover, they cannot account for water demand variations caused by weather or season. To overcome these issues, Wu et al. (2018) used cosine distances to evaluate the differences among flow data and exploited the temporally varying correlation among flow sensors to reduce the false positive rate caused by periodic variations in water demand, weather, and season. However, this method still cannot account for unexpected water demands. Furthermore, compared with the earlier method of Wu et al. (2016), it does not improve the true positive rate and only slightly lowers the false positive rate. Instead of flow data, Lu & Sela (2019) used the behavior of pressure transients to reveal recurrent and consistent patterns; by analyzing high-frequency pressure signals from distributed sensors, their approach provides a fast and efficient way to discover hidden information in WDSs. Geelen et al. (2019) found that feature-based clustering performed best for detecting recurring pressure anomalies, with F1-scores of 92% and 94% for the 2013 and 2017 datasets, respectively.
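The cosine distance used by Wu et al. (2018) compares the direction of two flow vectors rather than their magnitude. The short Python sketch below assumes the vectors hold simultaneous readings from several DMA flow sensors (the values are hypothetical) and shows a useful property of the measure: a uniform scaling of all flows, such as a general rise in demand, leaves the distance at zero, whereas a change at a single sensor does not.

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v); 0 when the vectors are proportional."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical simultaneous readings from three flow sensors (L/s)
reference = [10.0, 20.0, 30.0]
print(cosine_distance(reference, [12.0, 24.0, 36.0]))  # uniform 20% rise
print(cosine_distance(reference, [10.0, 20.0, 40.0]))  # change at one sensor only
```

The first call prints a value that is zero up to floating-point rounding, while the second prints a small positive distance.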
PERFORMANCE ASSESSMENT
Assessment metrics
The detection time (DT) is the time elapsed between the beginning of a leak and its detection; because leak damage increases in proportion to DT, it is an important metric for leak detection methods. Note that leak simulations were not always performed to assess leak detection: some studies used the records of water utility companies to calculate DT, while others used synthesized data to estimate the detection performance of the leak detection system. Consequently, the DT values reported for some methods are not entirely accurate, and the true values could be higher (Mounce et al. 2010).
In addition to the TPR, FPR, and DT, the minimum detectable leak size is an important metric for leak detection methods, complementing the TPR and FPR. Some techniques can only detect large leaks (e.g. pipe bursts), whereas others are designed to detect very small or background leaks. If the test dataset contains a large number of small known leaks, the TPR of a method can be very low even if it is effective in detecting large leaks. Conversely, a method that can detect small leaks may have a very high FPR. For leak localization methods, localization accuracy is the most important metric of quality.
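The TPR, FPR, and DT can be computed from labeled test data as in the Python sketch below. It assumes per-interval labels (for example, one label per hour) and counts true and false positives per interval; event-based counting, in which each leak counts once, is another common convention. The function name and arguments are hypothetical.

```python
def detection_metrics(truth, alarms, events, hours_per_step=1.0):
    """truth, alarms: per-interval booleans (leak present / alarm raised).
    events: (start, end) interval indices of known leaks, end exclusive.
    Returns TPR, FPR, and the detection time of each event in hours
    (None if the event was never detected)."""
    pairs = list(zip(truth, alarms))
    tp = sum(1 for t, a in pairs if t and a)
    fn = sum(1 for t, a in pairs if t and not a)
    fp = sum(1 for t, a in pairs if a and not t)
    tn = sum(1 for t, a in pairs if not t and not a)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    # DT: time from leak start to the first alarm raised during the leak
    dts = []
    for start, end in events:
        first = next((i for i in range(start, end) if alarms[i]), None)
        dts.append(None if first is None else (first - start) * hours_per_step)
    return tpr, fpr, dts
```

For a leak spanning intervals 2 to 4 that is first alarmed at interval 3, with one false alarm among five leak-free intervals, the sketch returns a TPR of 2/3, an FPR of 0.2, and a DT of 1 h.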
The average topological distance (ATD) reflects leak localization accuracy: it is the average, over the tested leak scenarios, of the minimum distance between the leaking node and the candidate nodes selected by the localization method. Currently, most leak localization methods have accuracies only on the order of meters, so ancillary methods are still needed to determine the exact leak location.
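The ATD can be computed as in the Python sketch below, which assumes that the distance between two nodes is topological, that is, the number of pipes on the shortest path, found by breadth-first search. The adjacency-list representation and the function names are hypothetical; using pipe lengths with Dijkstra's algorithm would give a distance in meters instead.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search over the pipe network; returns the number of pipes
    on the shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_topological_distance(adj, scenarios):
    """scenarios: list of (true leak node, candidate nodes from the localization
    method). Takes, for each scenario, the minimum distance from the true leak
    node to any candidate, and averages over scenarios."""
    total = 0
    for leak_node, candidates in scenarios:
        dist = hop_distances(adj, leak_node)
        total += min(dist[c] for c in candidates)
    return total / len(scenarios)
```

An ATD of zero means that, in every scenario, the true leak node was among the candidates returned by the localization method.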
Assessment and analysis
A good leak detection method should have a high TPR and a low FPR. When evaluating DT, it is important to distinguish between different types of leak development. For bursts with high volumetric flow rates, rapid detection is essential. For leaks that grow gradually in size, for example due to corrosion, the flow starts small and the cumulative loss is initially low; therefore, a larger DT is less critical for this type of leak. However, it is difficult to satisfy all of these requirements simultaneously. Table 4 summarizes the performance of some model-based and data-driven approaches. Although a few methods have low FPRs, they are flawed in other ways, such as high DTs (Romano et al. 2014; Jung & Lansey 2015), low TPRs (Wu et al. 2016, 2018), or low localization accuracy (Cody et al. 2020a). Moreover, the methods that seem to perform well were tested either only on synthetic data (Karray et al. 2016; Cody et al. 2020b) or only at night (Huang et al. 2018; Kang et al. 2018). For example, the method proposed by Huang et al. (2018) showed high FPRs during peak water demand periods (4:15–8:00 and 18:15–24:00) but performed well (TPR = 98.3%, FPR = 6.7%) at night (2:15–4:00).
Performance of some model-based and data-driven approaches
Reference | Classification accuracy (%) | Detection accuracy (%) | Classification time (h) | Error rate (%) | Category |
---|---|---|---|---|---|
Rayaroth & Sivaradje (2019) | 99 | 99 | 1 | 1 | Mixed model-based/data-driven approaches |
Hajibandeh & Nazif (2018) | 94 | 94 | 1.34 | 6 | Optimization-calibration approaches |
Kang et al. (2018) | 88 | 90 | 2 | 10 | Feature set classification methods |

Reference | TPR (%) | FPR (%) | DT (h) | Leakage flow (%) | Category |
---|---|---|---|---|---|
Mounce & Machell (2006) | 75 | 0 | – | 2–10 | Feature set classification methods |
Mounce et al. (2010) | – | 15 | – | – | Prediction-classification methods |
Ye & Fenner (2014) | – | 4 | 0.25 | 10–50 | |
Eliades & Polycarpou (2012) | 94.5 | 0 | 235 | 1.5–10 | |
Bakker et al. (2014a) | – | – | 0.08–0.16 | >20 | |
Romano et al. (2014) | 92 | 8 | 1 | 11 | |
Huang et al. (2018) | 98.3 | 6.7 | – | 10–20 | |
Wang et al. (2020) | 94.82 | 0.2 | 10 | 2.79–13.51 | |
Loureiro et al. (2016) | 80 | 10 | ≥0.5 | – | Statistical approaches |
Wu et al. (2016) | 71.43 | 0.61 | – | 13.30 | Unsupervised clustering techniques |
Wu et al. (2018) | 71.43 | 0.4 | – | 13.3–23.1 | |
Some researchers have tried to improve detection performance by using a larger time window (Loureiro et al. 2016), improving the prediction and classification methods (Wang et al. 2020), or increasing the number of sensors in the WDS (Rayaroth & Sivaradje 2019; Sun et al. 2020). The most effective ways to improve detection performance are to use longer time series of monitoring data and to adopt new methods that improve prediction and classification accuracy. Increasing the number of sensors in the WDS improves detection accuracy (Sun et al. 2020) and also decreases DT (Rayaroth & Sivaradje 2019). Therefore, the number of installed sensors should be maximized, and their placement should also be optimized.
Some pipe burst/leak localization methods can determine only the approximate area of the leakage (Tao et al. 2014) or have low localization accuracy (Pérez et al. 2014b; Cody et al. 2020a). Other methods have been tested only during low-noise periods (Kang et al. 2018). Zhou et al. (2019a) incorporated deep learning in a mixed model-based/data-driven method and were able to accurately locate bursts in single and multiple pipes.
DISCUSSION
Weaknesses of model-based approaches
All model-based approaches require a hydraulic model of the WDS, regardless of the leak detection method used. This model must be accurately calibrated so that its pressure and flow predictions realistically reflect the operating states of the WDS. Model-based approaches therefore require a large amount of data for calibration, which greatly increases the computational cost of this process (Pérez et al. 2014a). Furthermore, the model must be reconstructed and recalibrated whenever the topology of the pipe network changes (Kang & Lansey 2011). These tasks are difficult for water utility companies because they can only be performed by experts.
Another weakness of model-based approaches is the uncertainty of model parameters, such as the state of the pipes, which leak detection methods assume to be invariant. However, some WDS parameters vary as the pipes age; for instance, roughness coefficients may increase, or pipe diameters may decrease (Adedeji et al. 2017; Chan et al. 2019). Model-based approaches are also affected by uncertainties in nodal demands, because nodal demands cannot be measured (Blesa & Pérez 2018). Because WDS hydraulic models are simplifications of real systems, the real-time operating states they simulate will deviate from reality to some extent (Moser et al. 2015). In addition, the measured pressures and flows may contain errors due to sensor measurement uncertainty (Moser et al. 2017). These uncertainties degrade the leak detection and localization accuracy of model-based approaches.
To improve the leak detection performance of model-based approaches, some methods, such as the formal Bayesian approach (Shao et al. 2019b), have been used to estimate real-time nodal demands.
Optimizing sensor placement in a WDS is another effective way to improve leak detection performance (Berry et al. 2003; Vitkovsky et al. 2003; Kapelan et al. 2005; Shastri & Diwekar 2005). In general, sensor placement is determined by the selected leak detection method; therefore, no single sensor configuration is optimal for all WDSs (a configuration might be optimal for one method but not for others) (Soldevila et al. 2019). For this reason, sensor placement in a WDS is usually performed by experts. Studies on sensor configuration have used a variety of optimization methods, for example, genetic algorithms (Zhou et al. 2019a) and the shuffled frog leaping algorithm (Rayaroth & Sivaradje 2019). The use of new classification methods (Zhou et al. 2019a; Fereidooni et al. 2020) or optimization methods can also improve the performance of various methods. At present, model-based approaches have been validated on simple networks using synthetic data and on field networks with real data under controlled conditions (Zaman et al. 2020).
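As a simple stand-in for the metaheuristics mentioned above, sensor placement can be framed as a coverage problem and solved greedily. The Python sketch below assumes a hypothetical mapping from each candidate sensor node to the set of leak scenarios it could detect (in practice such a mapping might come from a sensitivity analysis of the calibrated hydraulic model) and repeatedly adds the node that detects the most not-yet-covered scenarios. This greedy heuristic is not the genetic algorithm or shuffled frog leaping algorithm used in the cited studies, and it ignores localization quality.

```python
def greedy_sensor_placement(detects, budget):
    """detects: dict mapping candidate node -> set of detectable leak scenarios.
    Picks up to `budget` nodes greedily by marginal coverage gain.
    Returns (chosen nodes, set of covered scenarios)."""
    chosen, covered = [], set()
    for _ in range(budget):
        remaining = [n for n in detects if n not in chosen]
        if not remaining:
            break
        # node that adds the largest number of newly detectable scenarios
        best = max(remaining, key=lambda n: len(detects[n] - covered))
        if not detects[best] - covered:
            break  # no candidate improves coverage any further
        chosen.append(best)
        covered |= detects[best]
    return chosen, covered
```

Ties are broken by the order of the candidates in `detects`, and the loop stops early once no remaining node adds coverage, so fewer than `budget` sensors may be returned.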
Weaknesses of data-driven approaches
Data-driven techniques show promising applicability for leak detection in pipeline systems equipped with an ample number of monitoring sensors (Zaman et al. 2020). Although a profound understanding of the WDS is not required for data-driven techniques, these techniques exhibit high FPRs due to uncertainties in the monitored data and their intrinsic limitations, which pose difficulties for effective decision-making (Wu & Liu 2017).
In data-driven approaches, leak detection is formulated as an anomaly detection problem that may be solved by data mining (Mounce et al. 2014). The performance of these methods depends on the quality of the monitored data. Because data can be lost during acquisition, detecting leaks with a data-driven technique may become impossible owing to insufficient historical data (Romano et al. 2014). Owing to financial limitations, the coverage of sensors and telecommunication networks in a WDS is usually limited (Wu & Liu 2017). Furthermore, sensor anomalies, communication issues, and noise often result in poor-quality data, which must consequently be vetted by experts. The monitoring data also tend to vary periodically (by day, week, season, or year), and leak-induced anomalies are often obscured by unsteady monitored data (Wu et al. 2018). Although a few methods have been proposed to reduce variations caused by different working days (Ye & Fenner 2014), water demands can also vary seasonally, which leads to unstable daily and weekly data series. Because night-time flows usually follow a consistent trend, they may be used to improve the detection rate and accuracy of data-driven methods (Chan et al. 2019).
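A minimum night flow (MNF) check is one simple way to exploit the regularity of night-time flows, and is sketched below in Python. The night window (hourly readings at 2:00, 3:00, and 4:00), the seven-day baseline, and the 20% increase threshold are hypothetical values chosen for illustration, not recommendations from the cited studies.

```python
from statistics import median

def minimum_night_flows(days, night_hours=(2, 3, 4)):
    """days: list of 24-element lists of hourly flows (index = hour of day).
    Returns the minimum flow within the night window for each day."""
    return [min(day[h] for h in night_hours) for day in days]

def rising_mnf_flags(mnf, baseline_days=7, rel_increase=0.2):
    """Flag a day when its minimum night flow exceeds the median of the
    preceding `baseline_days` values by more than `rel_increase`."""
    flags = []
    for i, value in enumerate(mnf):
        if i < baseline_days:
            flags.append(False)  # not enough history to form a baseline
            continue
        baseline = median(mnf[i - baseline_days:i])
        flags.append(value > (1.0 + rel_increase) * baseline)
    return flags
```

Because the baseline slides forward, a persistent leak is absorbed into it once leaky nights make up most of the window; freezing the baseline at a known leak-free period avoids this.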
Pressure measurements tend to be noisy, which limits their applicability in leak detection. Techniques such as wavelet analysis (Romano et al. 2014) and the discrete Fourier transform (Xu et al. 2020) can be used to suppress signal noise and extract information from the raw data. Some researchers have used interpolation methods to recover lost data (e.g. linear interpolation (Romano et al. 2014), auto-regressive integrated moving average (Mounce et al. 2010), and Kriging spatial interpolation (Soldevila et al. 2019)), which may resolve issues caused by insufficient historical data. Although these techniques are reasonably easy to implement, interpolation inherently increases data uncertainty, which may reduce the reliability of the leak detection results. Mathematical preprocessing is vital for data-driven methods, and a diverse range of techniques, such as CNNs (Kang et al. 2018), linear prediction (Cody et al. 2020a), and clustering (Geelen et al. 2019), have been used. Therefore, combining different methods may improve the leak detection performance of data-driven approaches.
CONCLUSION
In this paper, we have reviewed model-based and data-driven approaches for WDS leak detection and location, and classified the techniques that fall under these approaches according to their leak detection methods. Although these approaches are promising, they are not yet mature, and the current technology is far from ideal (Zaman et al. 2020). Model-based approaches include sensitivity matrix-based approaches, mixed model-based/data-driven approaches, optimization-calibration approaches, and error-domain model falsification. Data-driven approaches include feature set classification methods, prediction-classification methods, statistical methods, and unsupervised clustering methods. The generic processes of these methods were summarized and encapsulated in a generic algorithm. Model-based approaches are capable of detecting and locating leaks but require calibrated hydraulic models and optimized sensor placement, and their results are highly sensitive to modeling and measurement errors. Data-driven methods do not require a profound understanding of the WDS, as they involve only statistical or signal processing analyses of the acquired data. However, they require large quantities of data and are sensitive to data loss, anomalous sensor data, communication issues, and noise. Furthermore, fluctuations in water demand affect their leak detection performance. A data-driven approach is more appropriate when a large amount of historical data can be obtained from a real network, whereas model-based methods are preferable when data are scarce and a hydraulic model of the network is easy to obtain.
Model-based and data-driven approaches each have their own strengths and weaknesses. Some researchers have combined two or more approaches to improve leak detection performance; these combined methods are called ‘hybrid leak detection techniques’ (Zaman et al. 2020). Some AI-based techniques have also demonstrated excellent leak detection performance (Zhou et al. 2019a; Sun et al. 2020; Wang et al. 2020; Xu et al. 2020). Therefore, incorporating new methods into one or more of the key processes of current leak detection methods may improve their performance. In addition, multiple methods can be combined to compensate for individual shortcomings. For example, because model-based and data-driven approaches both involve classification processes, model-based approaches may be able to ‘borrow’ the most effective classifiers from data-driven approaches, and vice versa. Statistical techniques may also be applied to one or more steps of many leak detection techniques. In the future, the ‘hybridization’ of multiple techniques to overcome their individual weaknesses could significantly improve leak detection performance and lead to new, promising leak detection techniques. The economic level of leakage (ELL) is defined as ‘the level at which it costs more to reduce leakage further than to produce that water from an alternative source’ (House of Lords 2006). Given the cost of pinpointing hardware and the computation and data acquisition costs of data-driven and modeling approaches, the concept of ELL remains highly relevant.
FUNDING
This research was funded by The National Key R&D Program of China (grant number 2019YFC0408805) and Key Technology Application and Demonstration of Water Conservation Society Innovation Pilot in Jinhua, Zhejiang (grant number SF-201801).
ACKNOWLEDGEMENTS
We thank our colleagues in the laboratory for their constructive suggestions. We also thank the anonymous reviewers and members of the editorial team for their constructive comments.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.