## ABSTRACT

Leakage in water distribution networks (WDNs) not only leads to serious water loss and pipe contamination but also affects residents’ daily water use. Accurate localization of leaks in WDNs is important for conserving water resources and reducing economic losses. However, traditional optimization and verification methods for leak detection do not take into account factors such as modeling errors and pressure monitoring point errors, so the estimated leak location deviates from the actual one. To address these issues, this article presents a WDN leakage probability prediction method based on Bayesian inference. The method converts the prior probability of leakage events into a posterior probability. Then, using the posterior probability density function, the uncertainty of modeling and measurement errors in the hydraulic simulation model is quantified. The probability distribution of leakage pipes and leakage quantity within the leakage area is calculated, allowing the leak location and the corresponding leak magnitude to be predicted. The experimental results indicate that the prediction model can detect unknown quantities of leakage events. By collecting multiple sets of leakage data, it is possible to accurately predict the location and quantity of leaks, enhancing the efficiency of leakage detection in large-scale water supply networks and providing decision-making assistance for water utilities.

## HIGHLIGHTS

Presenting a method for predicting leakage probability based on probability statistics, which calculates the leakage probability of each leakage event and infers the most likely leakage event to occur.

Reducing the adverse effects of observation and modeling errors through multiple data collection, achieving multi-point leakage detection in pipe networks.

## INTRODUCTION

The water supply network is an essential infrastructure in modern urban construction, playing a crucial role in urbanization and people's daily lives. With the accelerating urban modernization in China, the urban population has increased dramatically. Urban water demand has grown accordingly, leading to rapid development in the construction and management of large-scale water supply pipes. However, as urban water supply networks continue to expand in scale and their service time increases, the leakage detection and management level of China's water supply industry still lags behind, and the leakage rate remains high. Addressing water leakage is therefore essential to leakage control in large-scale water supply networks. According to statistics, approximately 60 million cubic meters of water are lost daily due to leakage in the water supply networks in Asia. This volume could meet the water demand of 230 million people, and the loss amounts to over 14 billion US dollars per year. According to the ‘China Urban and Rural Construction Statistical Yearbook 2021’ released by the Ministry of Housing and Urban–Rural Development of the People's Republic of China (Zhu 2022), the total municipal water supply in urban and county areas of China was 79.53 billion cubic meters, with a leakage volume of 9.408 billion cubic meters and a total leakage rate of 11.83%. Among the leakage data released by 687 cities nationwide, most cities have kept their leakage rates between 10 and 20%. A total of 109 cities have achieved a leakage rate below 10%, while 25 cities have a leakage rate above 30%, with some even exceeding 50%. Addressing water leakage in large-scale urban water supply networks has become an urgent challenge for the water supply industry both domestically and internationally.

In the past few decades, researchers both domestically and internationally have proposed a series of leak detection techniques, continuously exploring key technologies for addressing leaks in water pipe networks. Currently, the commonly used methods for leak localization can be divided into acoustic signal-based and pressure signal-based methods. The acoustic signal method centers on the cross-correlation technique, which involves placing sensors at both ends of a pipe for leak detection. Due to the complex environment of actual leakage situations and the influence of low signal-to-noise ratios, it is sometimes difficult to collect leakage signals effectively (Chen *et al.* 2011). To solve such problems, researchers often use wave decomposition and transformation techniques to reduce or eliminate the noise in leakage signals. Brennan *et al.* (2019) demonstrated, using polarity coincidence correlation on simulated and measured data, that an accurate time delay estimate can be obtained by retaining only the zero crossings of the noisy data. The results indicated that severe clipping has no significant effect on the estimated leakage location, which is important for the development of leakage noise correlators. Cody *et al.* (2020) applied linear prediction (LP) techniques to the detection and localization of leaks in water distribution pipelines. This data-driven method extracts features from the linear prediction coefficients that represent the underlying acoustic signals. The results showed that shorter segments of LP-reconstructed signals can achieve accuracy similar to that of longer segments of raw time series, which is a key advantage for long-term online implementation. Gao *et al.* (2006) compared the ability of various cross-correlation-based delay estimators to locate leakage in plastic pipes. The experimental results indicated that random noise contributes little to the measurement error compared with the time delay estimation resolution imposed by the low-pass filtering characteristics of the pipe. However, small-scale leak signals are often submerged in the background noise, making them difficult to detect.

The pressure signal method centers on constructing a hydraulic model to simulate the operation of the pipeline network and obtaining simulated monitoring data under different leakage conditions. By minimizing the difference between the measured and simulated values at the pressure and flow monitoring points in the pipeline network, the inverse problem based on the steady-state model is formulated and solved, thereby achieving leakage localization (Pérez *et al.* 2011). This method was first proposed by Pudar & Liggett (1992), who defined the leakage detection problem as a least squares parameter estimation problem. Zhang (2013) simulated pipe network leakage by calculating the orifice flow rate, optimized the leakage locations and node emitter coefficients using the Cuckoo algorithm, and then validated the effectiveness of the method on two actual pipe network cases. Du (2014) utilized the first-order second moment method to quantify the normal fluctuation range of pressure values and transformed leakage localization into an optimization and verification problem similar to nodal flow calibration. An objective function was constructed by minimizing the difference between observed and simulated values to achieve leakage localization and leakage alarm in pipeline networks. Furthermore, Wu *et al.* (2019) introduced a pressure-related leakage detection method. This approach constructs an objective function of the leakage node positions and their corresponding emitter coefficients and solves it using a genetic algorithm (GA) to achieve leak detection and localization in pipeline networks. Berglund *et al.* (2017) compared the pressure changes between simulated normal operating conditions and actual leaking pipeline networks. They then determined the linear combination of individual simulated leaks and used linearly constrained minimization to estimate the leakage amount. In Sanz *et al.* (2016), the deviation between the calibrated demands and the expected demands indicated the presence of leaks and their approximate locations. The demand nodes are grouped based on the sensitivity of the monitoring points to changes in nodal demands. Hajibandeh & Nazif (2018) considered operating in areas with lower pressure, where the likelihood of low leakage is higher, and took into account the probability of the corresponding leakage occurring during the detection process. Sophocleous *et al.* (2019) adopted a two-phase approach: (1) search space reduction (SSR) and (2) leak detection and location (LDL). During the SSR phase, the number of decision variables and the range of their possible values are reduced while attempting to retain the optimal solution. Subsequently, the size and area of the leak are detected in the LDL phase.

In addition to the methods above, some researchers have employed probabilistic and statistical methods to locate leakage events from pressure signals. Poulakis *et al.* (2003) proposed a Bayesian system identification method to estimate the most probable leakage events (leakage location and amount) and the uncertainties of these estimates based on flow test data. The experimental results showed that a model with an error of less than 5% is able to correctly identify the leakage locations in the pipeline network. Lei *et al.* (2020) investigated the impact of node flow uncertainty on the leakage localization ability of water supply networks. They built hydraulic models for the demand distribution of node flow under different conditions. The results indicated that, under the same leakage scenario, the decomposed model is more efficient in leakage localization than the uniform model. Díaz *et al.* (2018) proposed a state estimation-based leakage detection method that evaluates leakage events by analyzing a single hydraulic state and detects significant leakage incidents. The experimental results indicated that the probability calculation method of the measurement-estimate joint binary distribution is more efficient than the sampling method of normalized residual calculation. Jerez *et al.* (2021) proposed a Bayesian model updating method for locating pollution sources in water distribution networks (WDNs). By combining Bayesian model updating techniques with water quality simulation models, the most reliable location of the pollution source can be determined, along with information about the pollution source. Jensen & Jerez (2020) investigated the hydraulic characterization and evaluation of large-scale water supply pipe networks in the presence of uncertainty. They defined the categories of hydraulic models using connectivity events, calculated the posterior probabilities of each hydraulic model category, and estimated the most probable connectivity scenarios of the water supply pipe network.

Traditional methods for locating leaks do not consider errors in WDN modeling and pressure monitoring points. If errors are considered, the observed values from simulation modeling will not match the actual values, making it impossible to obtain an optimal solution. This results in the leak location deviating from the true location (Zhang 2013). Modeling errors are the errors in the roughness coefficients of pipes and the node demand in the actual network compared to the hydraulic model. Measurement errors are the differences between the node pressures in the hydraulic model and the pressures at the corresponding monitoring points in the actual network. Therefore, the key issue in the leakage probability prediction model based on the hydraulic model lies in quantifying the hydraulic model error and addressing the deviation in the positioning of leakage caused by modeling and measurement errors. In the preliminary work of this study, we investigated a method for identifying leakage areas based on virtual areas. A leakage area identification model was constructed to identify virtual leakage areas and estimate the magnitude of leakage in each area using the improved Gray Wolf Optimization algorithm (Fang *et al.* 2023). For the previously identified virtual leakage area, this paper proposes a probability prediction method for leakage in a WDN based on Bayesian inference. First, the prior probability of leakage events is transformed into a posterior probability, using the posterior probability density function (PDF) to quantify the uncertainty associated with modeling errors and measurement errors in hydraulic simulation models. Then, the probability distributions for leakage segments and associated leakage volumes within the leakage area are calculated. Finally, the method predicts the locations of leakage segments and corresponding leakage volumes, enhancing the efficiency of leakage detection in large-scale water supply networks. This provides valuable support for decision-making in water supply enterprises.

## METHODS

When an actual water supply network leaks, a significant problem in leak probability prediction is how to simulate the actual operation of the network using a hydraulic model built for normal operating conditions (Berglund *et al.* 2017). Hydraulic modeling is used to simulate the behavior of water flow in hydraulic systems. Although the data do not change when modeling at a single point in time, measurement errors and modeling errors arise continuously as modeling is conducted at different time points. Therefore, by utilizing Bayesian inference based on probability statistics, we can reduce these errors through multiple rounds of data collection. Previous research has shown that a leakage area identification model based on optimization methods can recognize virtual leakage areas and estimate the magnitude of leakage in each virtual area (Fang *et al.* 2023). On that basis, a probability estimate for each leakage event is obtained using Bayesian inference, and all simulated leakage events are ranked. The event with the highest probability is considered the leakage event most likely to have occurred in the WDN, thereby achieving accurate leakage prediction. The proposed leakage localization framework based on Bayesian inference consists of two phases: the initialization process and the posterior probability calculation.

### Phase I: initialization process

Given the total leakage *Q* in the pipe network, the leakage is simulated within the intervals and ranges over which the water demand continuously varies. The total leakage is the sum of the leakages at the individual leakage points:

$$Q = \sum_{i=1}^{S} q_i$$

where *Q* represents the total leakage of the pipe network, $q_i$ represents the leakage at the *i*th leakage point, *S* represents the number of leaks in the pipe network, and *S* is less than or equal to the total number of network nodes *N*. If there is only one leak in the pipe network, *S* equals 1. In that case, the leakage amount and the leakage position are each varied over a change interval, the maximum leakage amount at a node is less than or equal to *Q*, and *N* represents the total number of leakage nodes in the pipe network.
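The initialization can be pictured as building a list of candidate single-leak events, one for each combination of node and leakage amount. The following is a minimal sketch under stated assumptions: the function name `enumerate_single_leak_events`, the fixed step size, and the node labels are illustrative and are not taken from the study.

```python
from itertools import product

def enumerate_single_leak_events(nodes, q_max, step):
    """Return (node, leak_amount) candidates for a single leak (S = 1).

    Leak amounts run from `step` up to at most `q_max` in increments of `step`.
    The discretisation step is an assumption made for this sketch.
    """
    if step <= 0 or q_max <= 0:
        raise ValueError("step and q_max must be positive")
    n_steps = int(q_max / step + 1e-9)  # number of whole steps that fit in q_max
    amounts = [round(step * k, 10) for k in range(1, n_steps + 1)]
    return list(product(nodes, amounts))

# Hypothetical example: three candidate nodes, leak amounts of 2.5 to 10 L/s.
events = enumerate_single_leak_events(nodes=[1, 2, 3], q_max=10.0, step=2.5)
```

Each tuple in `events` is one candidate leakage event that a later step would simulate and score.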

### Phase II: posterior probability calculation

Bayesian inference describes unknown variables through probability distributions. Such a distribution is an estimate of the unknown variables based on existing knowledge. The flexibility and generality of Bayesian inference enable it to handle quite complex problems, and it can accommodate models with many parameters and complex multilevel probability specifications.

In the leakage hydraulic model, we use leakage amount and leakage location to describe the leakage events. Therefore, different leakage positions and leakage amounts constitute a series of parameter sets. For multi-point leakage, the leakage locations and the leakage amounts in each parameter set are column vectors whose dimension equals the number of leakage points. For pressure monitoring points, assume there are *S* monitoring points and that each monitoring point has been measured *T* times when leakage occurs. The pressure measurements then form a matrix of size *S* × *T*, whose element (*i*, *j*) is the pressure measured at the *i*th monitoring point at the *j*th time.

The error term is an *S* × *T* matrix composed of the errors *e*(*γ*) between observed data and simulated data from pressure monitoring points under each operating condition. The normalizing constant $c_2$ ensures that the posterior PDF integrates to 1 over the entire parameter space. Since the prior distribution is a constant, $c_2$ includes this constant and is determined during the normalization process, ultimately not affecting the posterior probability.

Thus, we obtain the posterior probability of the optimal estimate of the model parameters. Formula (12) is used to rank the model parameter sets: the sets with the highest posterior probability are taken as the optimal values, representing the most probable locations and corresponding amounts of leakage.
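The ranking step can be illustrated with a short sketch. It is not the likelihood used in the study: it assumes independent, zero-mean Gaussian pressure errors with a known standard deviation `sigma` and a uniform prior over the candidate events, and the names `posterior_over_events` and `shifted` are hypothetical.

```python
import math

def posterior_over_events(observed, simulated, sigma):
    """Posterior probability of each candidate leakage event.

    observed:  S x T nested list of measured pressures.
    simulated: list of M nested lists (each S x T), one per candidate event.
    sigma:     assumed standard deviation of the pressure error.
    """
    log_likes = []
    for sim in simulated:
        # Sum of squared errors over all monitoring points and times.
        sse = sum(
            (s - o) ** 2
            for sim_row, obs_row in zip(sim, observed)
            for s, o in zip(sim_row, obs_row)
        )
        log_likes.append(-0.5 * sse / sigma ** 2)
    peak = max(log_likes)  # subtract the peak to keep the exponentials stable
    weights = [math.exp(v - peak) for v in log_likes]
    total = sum(weights)   # normalisation; a constant prior cancels here
    return [w / total for w in weights]

# Toy data: 2 monitoring points, 3 measurements, 3 candidate events.
observed = [[30.0, 29.8, 30.1], [25.0, 25.2, 24.9]]

def shifted(delta):
    """Hypothetical simulated pressures: the observations offset by delta."""
    return [[p + delta for p in row] for row in observed]

simulated = [shifted(0.05), shifted(0.5), shifted(-1.0)]
posterior = posterior_over_events(observed, simulated, sigma=0.2)
best_event = max(range(len(posterior)), key=posterior.__getitem__)
```

Because the prior is constant, it drops out when the weights are normalized, which mirrors the role of the normalizing constant described above. The candidate whose simulated pressures lie closest to the observations receives the highest posterior probability.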

## EXPERIMENT AND RESULT ANALYSIS

To assess the reliability of the proposed model, we conducted experiments on the NET3 benchmark model network, which is publicly available from the Rossman team. In previous work, a method for identifying leakage areas based on virtual partitioning was studied. The hydraulic model was used to simulate leakage in the pipe network, establish rules for virtual partitioning, reduce the search space of the model, and identify abnormal areas, providing predictive targets for leakage probability prediction (Fang *et al.* 2023).

We focus on leakage probability prediction within virtual areas I and II. Single-point leakage and multi-point leakage scenarios are simulated separately, along with the variation and pattern of node water demand. In Figure 2, panels (a) and (b) show variations in node water demand (L/s), while the remaining set represents time multipliers (dimensionless). The time interval for all sets is 1 hour, during which the values remain constant: the water demand is unchanged within each interval and equals the product of the base water demand and the time multiplier for that period. It is worth noting that nodes 15, 123, 203, and 35 in the network have different demand patterns, which are not detailed here.
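The demand rule described above (base demand multiplied by the time multiplier of the current 1-hour interval) can be sketched as follows. The function name `demand_at`, the multiplier values, and the wrap-around of the pattern after its last entry are assumptions made for illustration.

```python
def demand_at(base_demand, multipliers, t_hours, step_hours=1.0):
    """Piecewise-constant demand: base demand times the multiplier of the current step.

    The pattern is assumed to repeat once its last multiplier has been used.
    """
    index = int(t_hours // step_hours) % len(multipliers)
    return base_demand * multipliers[index]

pattern = [0.8, 1.0, 1.5, 1.2]  # hypothetical dimensionless time multipliers
d = demand_at(base_demand=2.0, multipliers=pattern, t_hours=2.5)  # third interval
```

Here a base demand of 2.0 L/s at hour 2.5 falls in the third interval, so it is scaled by the third multiplier.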

During our experiment, we simulate two types of leakage scenarios: single-point leakage and multi-point leakage. Simulated single-point leakage occurs in virtual area I, which includes 14 water consumption nodes, 17 pipes, 1 water source, and 1 water pump, among other physical components. Simulated multi-point leakage occurs in virtual area II, which includes 22 water nodes, 25 pipes, 1 water source, 1 water tank, 1 water pump, and other physical components.

The number of monitoring points *S* = 2; they are located at nodes 7 and 14 in Figure 3(a). In the WDN, multi-point leakage events can be simulated by adding multiple single-point leakage events. Figure 3(b) shows a simulation of two leakages occurring at nodes 6 and 17, with leakage magnitudes of 10 and 5 L/s, respectively. The number of monitoring points *S* = 2; they are located at nodes 8 and 14 in Figure 3(b) (Hu *et al.* 2021).

The simulated pressure values under leakage conditions are used as the actual field pressure observations. The leak probability prediction model quantifies the uncertainties of modeling errors and measurement errors in the hydraulic simulation model. It then calculates the probability distribution of leakage nodes and leak quantities within the leakage area and searches for potential leakage events. Finally, by invoking the EPANET dynamic link library, any node in the pipe network can be selected as the leakage node to simulate leakage conditions occurring in an actual pipe network.

### Prediction of single-point leakage probability

Because the actual WDN differs from the hydraulic model, random errors are added to the model's predictions to simulate the differences arising from the modeling process and from the sensor measurements. Specifically, the random errors *α*, *β*, and *γ* are introduced, which respectively represent the roughness coefficient error of the pipe network model, the error in node water demand, and the error at pressure monitoring points. Each error term is assumed to follow a zero-mean uniform distribution; the bounds of these distributions represent the magnitude of the model errors, expressed as a percentage of the actual WDN values.
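The error model can be sketched as a relative perturbation drawn from a zero-mean uniform distribution. This is an illustration only: the helper `perturb`, the 5% bound applied to all three error terms, and the numerical values are assumptions.

```python
import random

def perturb(values, bound, rng):
    """Scale each value by (1 + e), with e drawn uniformly from [-bound, +bound]."""
    return [v * (1.0 + rng.uniform(-bound, bound)) for v in values]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
roughness = [100.0, 110.0, 120.0]    # hypothetical pipe roughness coefficients
demands = [1.5, 2.0, 0.7]            # hypothetical node demands (L/s)
pressures = [30.2, 28.9]             # hypothetical monitored pressures (m)

noisy_roughness = perturb(roughness, 0.05, rng)  # alpha: roughness coefficient error
noisy_demands = perturb(demands, 0.05, rng)      # beta: node water demand error
noisy_pressures = perturb(pressures, 0.05, rng)  # gamma: pressure monitoring error
```

Repeating the draw for each data collection gives independent error realizations, which is what allows multiple collections to average out their effect.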

First, data are collected once to calculate the posterior probability. The impact of the roughness coefficient error and the node water demand error on the prediction results is shown in Figure 4.

Figure 4(a) and 4(b) show, respectively, the probability distribution of each leakage event after adding the roughness coefficient error and the node water demand error. The leakage probability is expressed as the posterior probability based on Bayesian inference. The experimental results indicate that node 3 has the highest probability of leakage, with the probability of leakage at other nodes being significantly lower. Nodes 6–8, which are topologically close to node 3, have a higher probability of leakage than the remaining nodes. The closer a node is to the leakage point, the higher its leakage probability. This is because the leakage at node 3 caused a decrease in pressure at the neighboring nodes. Therefore, even with a 5% roughness coefficient error and node water demand error added, the model is still able to accurately predict the location of the leakage.

### Prediction of multi-point leakage probability

When leakages occur in the water supply network, the actual number of leaks often cannot be determined in advance. Even when the number of leaks is known, there may be thousands of possible combinations of leaks, resulting in a large computational burden and low accuracy. Therefore, based on the single-point leakage probability prediction model, we investigate a multi-point leakage prediction method. The total leakage of the pipe network is assumed to be known, since it can be calculated from the pressure of the network (Maghrebi *et al.* 2014). As network leakage increases, the pressure decreases nonlinearly, but within a small range the relationship is approximately linear (Sophocleous *et al.* 2019). In the WDN, multi-point leakage events can be simulated by adding multiple single-point leakage events, and the location of the leakage has a greater influence on pressure fluctuations than the amount of leakage.

A *K*-value decrement method is used to achieve leakage prediction, where *K* represents the number of leakage points in the pipe network. When the value of *K* decreases to 0, all leakage incidents in the pipe network have been inspected, and the *K* leak locations are output.

In phase II, the total network leakage is added to an arbitrary node. The EPANET toolkit is then called to perform hydraulic analysis and obtain simulated data for the leakage model. The posterior probability of each leakage combination is calculated within the Bayesian framework, and the combinations are ranked to obtain the leakage events with the highest probability of leakage.
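One possible reading of the *K*-value decrement loop is a greedy search that locates one leak per iteration and decrements *K* until it reaches 0. The sketch below is an interpretation, not the authors' implementation: `k_decrement_search` and `score_candidates` are hypothetical names, and the scoring callable stands in for the hydraulic simulation and the Bayesian ranking step.

```python
def k_decrement_search(k, candidates, score_candidates):
    """Pick k leak locations one at a time.

    score_candidates(found, remaining) must return a dict mapping each
    remaining candidate node to its posterior probability, given the leaks
    already found. It is a placeholder for simulation plus Bayesian scoring.
    """
    found = []
    remaining = list(candidates)
    while k > 0 and remaining:
        scores = score_candidates(found, remaining)
        best = max(remaining, key=lambda node: scores[node])
        found.append(best)       # record the most probable remaining location
        remaining.remove(best)
        k -= 1                   # one fewer leak left to locate
    return found

# Toy scorer with fixed, made-up probabilities (not results from the study).
toy_scores = {6: 0.7, 17: 1.0, 8: 0.1, 10: 0.2}

def toy_scorer(found, remaining):
    return {node: toy_scores[node] for node in remaining}

leaks = k_decrement_search(2, toy_scores, toy_scorer)
```

In practice the scorer would re-run the hydraulic model with the already located leaks in place, so that later iterations explain only the remaining pressure deviation.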

As shown in Figure 3(b), the linear combination of individual leaks can simulate multiple leakage scenarios in the pipe network. In this simulation, two leaks occur at nodes 6 and 17, with leakage rates of 5 and 10 L/s, respectively. The number of monitoring points *S* = 2, with pressure monitoring points located at nodes 8 and 14. With a 5% measurement error added, statistical methods are used, at intervals of one time unit, to quantify the measurement errors through multiple data collections. Table 1 reports the results for a multi-point leakage event in the simulated WDN. The results of 10 independent experiments under this leakage event were collected. In each independent experiment, the three possible leakage events with the highest probabilities were selected based on the ranking of leakage probabilities. The experimental results indicate that, among the ten independent experiments, the leakage event with the highest predicted probability occurred at node 17 in eight experiments. The predicted results of the remaining two independent experiments are nodes 18 and 20 (both nodes are located near node 17).

| Experiment | Rank 1 ID | Rank 1 P | Rank 2 ID | Rank 2 P | Rank 3 ID | Rank 3 P |
|---|---|---|---|---|---|---|
| One | 17 | 1 | 10 | 0.21 | 9 | 0.1 |
| Two | 17 | 1 | 7 | 0.38 | 8 | 0.09 |
| Three | 17 | 1 | 7 | 0.38 | 8 | 0.09 |
| Four | 17 | 1 | 10 | 0.21 | 9 | 0.1 |
| Five | 18 | 1 | 7 | 0.38 | 16 | 0.1 |
| Six | 17 | 1 | 6 | 0.69 | 8 | 0.09 |
| Seven | 17 | 1 | 10 | 0.21 | 9 | 0.1 |
| Eight | 17 | 1 | 10 | 0.21 | 9 | 0.1 |
| Nine | 17 | 1 | 6 | 0.69 | 8 | 0.09 |
| Ten | 20 | 1 | 10 | 0.21 | 16 | 0.1 |
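The tally reported for Table 1 can be reproduced with a few lines. The list below simply transcribes the top-ranked node ID of each of the ten experiments from the table; the variable names are illustrative.

```python
from collections import Counter

# Top-ranked node ID in each of the ten experiments (rank 1 entries of Table 1).
top_ranked = [17, 17, 17, 17, 18, 17, 17, 17, 17, 20]

tally = Counter(top_ranked)                  # how often each node ranks first
hit_rate = tally[17] / len(top_ranked)       # share of experiments that pick node 17
```

Node 17 is ranked first in eight of the ten experiments, and nodes 18 and 20 account for one experiment each.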

In this study, we also compare our method with the leak localization approaches proposed by Lijuan *et al.* (2012) and Sophocleous *et al.* (2019), which utilize optimization algorithms. Each method underwent 10 independent experiments, with the runtime representing the duration required for a single experiment (conducted on an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz, 16 GB RAM). The results are presented in Table 2.

| Condition | Method | Single-point correct count | Single-point run time | Multi-point correct count | Multi-point run time |
|---|---|---|---|---|---|
| No error added | Bayesian | 10 | 3 s | 10 | 9 s |
| No error added | GA | 10 | 705 s | – | – |
| Error added | Bayesian | 10 | 4 s | 8 | 10 s |
| Error added | GA | 7 | 550 s | – | – |

As shown in Table 2, for the single-point leakage scenario, both Bayesian inference and the GA localize the leakage in all ten trials when no error is added. However, the GA's runtime (705 s) is far longer than that of Bayesian inference (3 s). When errors are introduced, Bayesian inference still accurately localizes the leakage in a short time (4 s), whereas the GA not only takes much longer (550 s) but also fails to correctly localize the leakage in three out of ten trials. For the multi-point leakage scenario, Bayesian inference accurately localizes the leakages in all ten trials when no error is added. When errors are introduced, Bayesian inference fails to correctly localize the leakages twice, whereas the GA is unable to localize multi-point leakages at all. Therefore, Bayesian inference not only achieves leakage localization but also effectively handles the impact of errors.

## CONCLUSIONS

We build a leak prediction model based on Bayesian inference, achieving single-point and multi-point leak prediction within the leakage region. First, a leakage prediction model is constructed based on Bayesian inference to calculate the posterior probability of each leakage event. Second, to account for the uncertainties in the roughness coefficient and the water demand, modeling errors and measurement errors are added. Finally, for multi-point leakage in the pipe network, multiple sets of data are collected to solve the predictive model. The experimental results indicate that multiple data collections based on statistical methods improve the accuracy and reliability of leakage prediction. With multiple sets of pressure data, the leakage location can be predicted stably, and leakage events can be investigated one by one to achieve multi-point leakage probability prediction. This can provide water utility staff with more precise guidance.

This paper explores the feasibility of Bayesian methods in multi-point leakage detection but does not delve into scenarios with numerous leakage points, such as five or ten leakage points. Future research should further investigate the accuracy of this method in cases with numerous leakage points and its accuracy in larger WDNs.

## ACKNOWLEDGEMENTS

This work is supported by the funds: National Key Research and Development Project No.2023YFC3807704; Scientific Research Program of Anhui Provincial Department of Education No.2022AH010018.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Study of Hydraulic Model Calibration and Leakage Location of Water Distribution System*.

*Journal of Civil Engineering and Urbanism*, **4**(3), 322–327.

*Research on Leakage Detection Via Hydraulic Model Calibration in Water Distribution Systems*. MS thesis.

*Study on Leak Detection and Localization of Urban Water Distribution Networks Based on Data-Driven Approaches*. MS thesis.