## Abstract

The water loss rates in Brazil are still very concerning and cause huge financial, social, and environmental damages. Although great advances have been made in acoustic leak detection methods, some of them are intrusive or at least demand convenient access points to the pipe's surface. Furthermore, some methods are expensive and require a highly experienced and qualified operator. Thus, the goal of this paper is to establish a benchmark using a simplistic analytical model for the ground surface analysis for leaks location, simplifying the search process carried out in water networks, and requiring minimal system interference. For leak location prediction, the proposed method considered two seismic wave paths – the compressional body waves and the Rayleigh superficial waves. The traveltime curves of the compressional waves are calculated as a function of the surface wave's experimental information. Monte Carlo Simulations were carried out to test each model's precision considering uncertainties related to the input parameters of the problem. At last, through a parametric study, the two seismic events were combined to obtain better accuracy in predicting the leakage epicenter location. In the results, the maximum absolute error found was 15 cm, and the minimum improvement obtained with the parametric study was 27.4%.

## HIGHLIGHTS

The model provides a good prediction of the leaks’ location.

The method simplifies the search process for leaks in the water distribution system.

The methodology requires easy instrumentation.

The research applies a robust method for the statistical treatment of uncertainties associated with experimental data.

### Graphical Abstract

## INTRODUCTION

The high rate of water losses existing in the distribution networks of supply companies is a topic of great relevance nowadays. According to the National Sanitation Information System (SNIS), approximately 40% of treated water is lost in the distribution process in Brazil (SNIS 2022). Such waste is concerning, especially considering that water is a finite resource, and impacts the environmental, human, social, and economic development.

To combat the severe real losses in the water distribution networks, many techniques for detecting and locating leaks in underground pipelines are being developed and constantly improved. Vibroacoustic methods, for instance, which use the vibroacoustic signal emitted by the leaks, have stood out and proved to be efficient in this scenario (Tsutiya 2008; Adnan *et al.* 2015).

However, although promising, some of these methods are invasive and require changes in the system's operation or downtime in it, like smart-balls technology (El-Zahab & Zayed 2019). Other methods, such as acoustic correlators, at least demand convenient access points with direct access to the pipe's surface, like hydrants and valves, for positioning the sensors (Scussel *et al.* 2021), being sometimes unfeasible in cities with little infrastructure. Furthermore, some current vibroacoustic methods are expensive due to the technology employed and require a highly experienced and qualified personnel to process and interpret the collected data (Yu *et al.* 2021), and others are only effective when used in metallic pipelines, due to material-induced attenuation of other pipe's types (El-Zahab & Zayed 2019; Scussel *et al.* 2021).

Therefore, the goal of this article is to propose an analytical vibroacoustic model to predict water leaks’ location in an underground pipeline through the ground surface analysis, which does not require interference and downtime in the water distribution system. The method requires minimal operator experience and training since it automates the leak's location by providing the distance between a sensor and the epicenter of the leak, besides its instrumentation is relatively cheap and portable.

The proposed model considers two seismic paths to the leak's signal propagation to the soil surface and predicts the leak's location assuming uncertainties in the parameters of the problem, and processing the signals collected by a mesh of sensors on the ground surface. The effect of random variables – which in this case are the Rayleigh wave velocity, the dynamic Poisson's rate and the depth of the pipe – on the random output variables is investigated.

The uncertainties of this model are linked to the variability in its parameters, estimations errors, modeling simplifications and hypotheses, and some lack of knowledge about the entire physical phenomenon that is studied, i.e. epistemic uncertainties (Soize 2013, 2017). A geological environment is a complex multiparameter system. The nature of the soil and the state in which it finds itself are the primary factors that influence its behavior. Thus, in a geophysical study, there will always be a significant degree of uncertainty related to the properties of the environment that is under investigation due to the morphological variations of its profile. Furthermore, the unavoidable measurement errors also introduce randomness into the results of a real experiment. Thus, for a better interpretation of the results, with known levels of confidence and with minimal loss of information about the investigated object, it is necessary to implement a statistical analysis (Troyan & Kiselev 2010). Statistical inference will both describe the experimental observations and verify the prediction quality of the proposed analytical model to describe, in a simplified way, the phenomenon.

Through the proposed computational model, the Monte Carlo (MC) simulation evaluates the propagation of uncertainties of the random parameters and quantifies how much the results are influenced by them (Kroese *et al.* 2011). In the MC simulation, a wide variety of scenarios is created using a deterministic method and several values of the parameters of the model according to their known Probability Density Functions (PDF). At that point, this amount of information is combined through statistical analysis to obtain the response of the random system (Kroese *et al.* 2011). The large number of random samples created converges to the true probabilistic results, i.e. the most likely scenarios. This simulation technique is widely used for stochastic calculation due to its conceptual and algorithmic simplicity and great outcomes, although it usually requires a high computational cost, because, in general, the method requires many samples to get a satisfactory approximation, which leads to a long runtime (Shonkwiler & Mendivil 2009).

## ANALYTICAL MODEL

Any excitation on the ground surface or even inside the massif propagates in the medium through seismic waves, which are a kind of elastic waves – since they depend on the elastic properties of the material of the medium through which they are traveling (Hunter & Crow 2012). These waves are classified according to their propagation characteristics. The waves generated internally in the soil are known as body waves – such as compressional waves or shear waves – and those of superficial propagation are known as surface waves – such as Rayleigh waves and Love waves (Kramer 1996).

The second path considered is represented in Figure 1 by the blue dashed line, in which the compressional wave propagates up to the surface boundary, called epicenter, and then is converted into a Rayleigh wave along the ground surface and arrives at the sensors. The existing Interfaces along the propagation paths are responsible for the transformation and decomposition process of the seismic waves. Rayleigh waves, in turn, propagate on the free surface of the ground, in a retrograde elliptical movement (Figure 2(b)), causing vertical displacements in the propagation plane. The amplitude of the displacements of the Rayleigh waves decreases exponentially as the distance traveled increases (Filho 2002; Pinto 2016), and its propagation velocity is generally less than that of P waves.

The path to be taken along the pipe wall by a fluid-borne wave generated by a negative pressure in the pipe's internal fluid, which radiates to the ground by body waves (Scussel *et al.* 2021), was not considered, as well as the shear waves and the several other possibilities due to reflection, diffraction and refraction in the soil massif and on its surface. Thus, only two wave paths are treated in the analytical modeling of this work – the compressional body waves and the Rayleigh superficial waves – which will greatly contribute to the composition of the signal collected by the sensors on the ground surface.

*A*and

*B*is estimated from the peak in the Cross-Correlation Function (CCF) between the two measured signals placed at points

*A*and

*B*, respectively. The CCF is given by (Scussel

*et al.*2021):where and indicates the inverse Fourier transform, and denotes the Cross Power Spectral Density (CPSD) between the signals.

*F*and

_{R}*F*are the contribution percentages of Rayleigh and compressional waves in the final signal, respectively. In this work, a parametric study will be performed to obtain the combinations of

_{P}*F*and

_{P}*F*coefficients, as a function of the distance of the sensors, that provide a better accuracy in predicting the leakage epicenter location.

_{R}If the distance between the pairs of sensors is large enough so that there is a significant change in the waves’ contribution in each sensor or if the massif is highly heterogeneous, generating a large number of internal interfaces for reflection and refraction, the correlation of the signals will become complex and the hypotheses adopted in the models will not represent reality so well, thus generating a prediction error concerning the location of the leak.

## EXPERIMENTAL TESTS

### Characterization of uncertainties of the parameters of the model

In the MC simulation, the probabilistic distribution of the output variable is obtained through the proposed mathematical model, which describes the problem, and with the random samples of the parameters of the model, which are previously characterized by a PDF. The PDF determines the probability of a variable to assume a given value, thus allowing greater accuracy in describing the uncertainties associated with the parameters used (Rezvani & Bolduc 2014). In the problem approached in this work, there are three parameters with associated uncertainties: the Rayleigh wave velocity (*V _{R}*), the velocities ratio (), function of Poisson's ratio, and the depth of the pipe (

*h*), since the output predictions of the model are sensitive to variances of these input parameters.

*c*) is a function of frequency cycles (Jongmans & Demanet 1993):

As an example, Figure 3(a) shows the experimental dispersion curve associated with a pair of receivers on the ground surface. Moreover, the heterogeneities in the geomaterial also change the propagation velocity of the waves, since this parameter depends on the properties of the medium in which it travels.

The PDF of the Rayleigh wave velocity (*V _{R}*) was obtained from the experimental samples collected. All experimental tests were carried out in a controlled space, as shown in Figure 4(b). The experimental setup was composed of a plywood box of size 1.6 m × 1.6 m × 0.8 m, filled with compacted local soil, and the inner walls were covered with a 10 mm-thick soft polyester mantle.

To obtain a set of samples of the surface wave velocity in the box, 10 accelerometers were distributed on the surface of the soil and fixed in steel screws, covering a mesh of dots spaced 0.2 m apart, as shown in Figure 4(a). An instrumented hammer was used as dynamic source, generating an impulse signal on a central screw on the ground surface. Figure 3(b) presents the coherence obtained between the superficial excitation and the signal measured by the sensors around.

*V*,

_{P}*V*, and the ratio

_{S}*V*, as presented below and discussed by Filho (2002), where

_{R}/V_{S}*V*is the shear wave:

_{S}*υ*, of the soil used. These dynamic mechanical properties of the soils are influenced by the morphological characteristics of the geomaterial and its physical state – degrees of compaction (GC), internal humidity (

*w*), homogeneity, and others. As shown by Ohsaki & Iwasaki (1973) through seismic explorations results for a wide variety of soils, the dynamic Poisson's ratio is always positive. Thus, with a positive finite support for the parameter of [0.25–0.40] and expectation

*μ*= 0.35, by the Maximum Entropy Principle, the maximally unpresumptive distribution consistent with the data is a truncated exponential distribution (Udwadia 1989; Cunha 2020), as shown in Figure 6(a). To include the sanitation companies’ uncertainty associated to the installation depth of the underground pipes, a variation of Δ

*h*= 10% and a uniform distribution for the parameter

*h*were adopted in this work. (Figure 6(b)).

At the core of any MC method is a random number generator that must be able to produce a large stream of random variables that are independent and distributed according to a pre-established Probability Distribution Function (Kroese *et al.* 2011). In this work, for the random variables generation used in the simulations, the Inverse Transform Method was used. So, a random variable X with a cumulative distribution function F is generated by drawing *U* ∼ *U*(0,1) and setting *X* = *F*^{−1}(U) (Kroese *et al.* 2011).

### Experimental setup and procedures

Signal acquisition was carried out with seven simultaneous channels at a sampling frequency of 12.8 kHz, during 30 s, using the LMS Scadas XS system and uniaxial piezoelectric accelerometers PCB 333B. The input signal is a white noise in the range [0–6,400] Hz, generated in Python and amplified by a sound amplifier. Wang *et al.* (2017) define leakage as a continuous seismic microsource that emits a random vibration, with constant statistical parameters over time. Moreover, for these applications of leak detection and location in underground pipes, the leak spectrum can be simulated by a white noise if the ratio of high- to low-frequency cut-off frequencies of the bandwidth is small (10) (Scussel *et al.* 2021).

In the tests, the excitation amplitude was performed with a root mean square (rms) acceleration value ranging from 0.1 to 2.0 m/s^{2}. The lowest signal amplitude was defined for the signal-to-noise ratio (SNR) at around 0 dB and the highest amplitude was limited by the electromechanical actuator, applying a current value close to its maximum working current.

For signal processing, for identifying the time delays between the signals of each pair of selected sensors, the CCF between the samples was used. But previously, as mentioned the signals collected were filtered with a band-pass filter in the range of [300–900] Hz, since as shown in Figure 3(b), it is a band with good coherence between the signals from the surface. It is important to point out that the loss of coherence in the other frequency intervals is related to the natural characteristics of soil damping, which attenuate the high frequencies more effectively of the source of excitation (Proença *et al.* 2022), and also with the background noises that focus more on mid-high frequency ranges (El-Zahab & Zayed 2019).

## RESULTS AND DISCUSSION

*Y*whenever :

For the analysis, three layouts of sensors and six different input excitation amplitudes were used. The MC simulation was carried out through 20,000 random samples for each case.

^{2}. These statistical results take into account the obtained values for all 11 pairs of sensors. The mean of the 11 predictions is calculated in each scenario. The statistical results, from both proposed analytical equations, provided a good prediction of the leak location in all cases evaluated. Taking into account the most likely value of the PDF, the absolute error obtained was 15 cm for the first model and 9 cm for the second one in this case.

In the analysis, it was observed that the coefficients *F _{R}* and

*F*are a function of the distance of the sensors in relation to the position of the epicenter of the leak. As the distance increases,

_{P}*F*decreases. The rate of reduction of

_{R}*F*will be better studied in the future, as well as its dependence on the depth h.

_{R}*F*and

_{R}*F*coefficients obtained in the parametric study were used in other cases. Figure 11 shows the results of leak location prediction obtained with the combination of the seismic events, for the three different layouts of sensors considered and with rms acceleration value equal to 1.0 m/s

_{P}^{2}too. With the combination of events, the results of predictions improved, in the worst case, by 27.4%, compared to the predictions of the isolated models, of the same cases. Taking into account the most likely value of PDF, the maximum absolute error obtained was 8 cm.

The three tested arrangements were promising in locating the leak epicenter. As shown in Figure 11, with the layouts A1, A2, and A3, the model obtained predictions with a mean absolute error of approximately 4, 7, and 8 cm, respectively. This variation observed between the arrangements is related to the increase in the relative spacing between some pairs of sensors and not the absolute distance of a sensor to the epicenter. Furthermore, it is worth noting that the method achieved good results in locating a leak in a plastic pipe, which is already a difficulty for many existing vibroacoustic methods.

^{2}. Figure 12 shows the boxplot of the MC distribution for the six excitations investigated. It was observed that for the last two excitation amplitudes the predictions were a little worse. This could be due to reflections in the box since the input signals have a lot of energy and could not be completely attenuated until the walls.

The level of outliers is related to the SNR. The background noise was responsible for generating the observed discrepant predictions, which comprised 2.73% of the samples for *A*_{rms} = 0.1 m/s^{2}, to 0.82% of the samples for *A*_{rms} = 2 m/s^{2} (70% variation between scenarios). Moreover, it was observed that the data dispersivity – interquartile ranges and inter-whiskers ranges – varied only 15% with the change in the amplitude of the excitation signal, while the range of outliers varied by 40%. It is also important to add that all existing outliers are considered mild outliers since they are data outside the inner fences but inside the outer fences.

Outside the filtered band of [300–900] Hz, the results are not good for an excitation signal with *A*_{rms} < 0.5 m/s^{2}. It is known that the excitation signal that propagates through the massif has a lower frequency content, with an exponential reduction in high frequencies due to the natural characteristics of soil damping (Proença *et al.* 2022). For the cases investigated in this work, there is a loss of coherence between the signals of the pairs of sensors, at frequencies above 1 kHz (Figure 3(b)), which compromises the correlation of the signals and generates a bigger error in predicting the location of the leak.

For a case with a real water leakage, the soil properties will change considerably in the vicinity of the water flow. The considerable variation in impedances will generate strong reflections that would also reduce the model's accuracy. Combining several pairs of sensors is a strategy to reduce the impact of heterogeneities; however, future tests must also be carried out to better investigate and discuss the quality of results for these cases.

## CONCLUSIONS

In this article, an analytical model was developed to simplify the search process of leaks in water distribution networks, through ground surface analysis. The proposed method considers two seismic paths to the leak's signal propagation to the soil surface – a direct path taken by compressional body waves and a path taken by Rayleigh superficial waves from the epicenter of the leak. To obtain the probabilistic water leak location, geometrical relationships, the delay of the leak's signal measured by pairs of sensors on the ground surface, and statistical inference are used.

Given the model's assumptions, first, velocity samples of the superficial wave (*V*_{r}) are collected experimentally to fit its empirical PDF, then conservative distributions to the Poisson's ratio (*υ*) and to the buried pipe's depth (*h*) are assumed considering the Maximum Entropy Principle. At last, the propagation of uncertainties of these geophysical parameters is addressed using the MC simulation.

In this work, experimental tests were carried out in a controlled environment to verify the validity and accuracy of the method. Its performance was analyzed for each propagation path separately and then for their combination. The statistical results, from both isolated analytical equations, provided a good prediction of the leak location in the cases investigated. Taking into account the most likely value of the PDF, the maximum absolute error obtained was 15 cm for the first seismic event and 9 cm for the second one. In addition, the results attested to the validity of calculating the compressional wave (P) traveltime curves as a function of the Rayleigh wave information, thus avoiding the need to perform intrusive seismic tomography in the massif, like cross-hole, for this worked case.

At last, through a parametric study, the two seismic events were combined, which generated a good improvement in the prediction results. In the worst case, the results of the leak's location improved 27.4%, compared to the predictions of the isolated models, of the same cases. Taking into account the most likely value of the PDF, the maximum absolute error obtained was 8 cm, which is less than half the minimum distance between two sensors. In the study, it was also observed that the coefficients *F _{R}* and

*F*are a function of the distance of the sensors in relation to the position of the epicenter of the leak. Therefore, the rate of reduction of

_{P}*F*will be better studied in the future, as well as its dependence on the depth

_{R}*h*.

Thus, it has been demonstrated that the method proposed has advantages over many other current acoustic methods because it does not require system interference and downtime; furthermore, it does not require a highly qualified operator either, since it automates the leak's location by providing the distance between the sensor and the epicenter of the leak.

It is known that in soils with not-so-controlled conditions and in the presence of a real water leakage there will be more reflective events, which were not accounted for in the model, but that influence it. However, this does not invalidate the proposed method, they only reduce its accuracy. Combining several pairs of accelerometers in a mesh of sensors in different positions over the pipe is already a strategy to try to reduce the impact of these heterogeneity effects. Therefore, future tests must also be carried out to better investigate and discuss the quality of the results for these cases.

## FUNDING

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Previsão numérica do comportamento dinâmico da barragem de Breapampa no Peru (Numerical Prediction of the Dynamic Behavior of the Breapampa dam in Peru)*

*Master's Thesis*

*Modelagem sísmica de ondas elásticas e migração reversa no tempo em meios transversalmente isotrópicos (Seismic Modelling of Elastic Waves and Reverse Time Migration in Transversely Isotropic Media)*

*Doctoral Thesis*

*Determinação experimental do amortecimento de um solo: Desenvolvimento de uma ferramenta de aplicação laboratorial (Experimental Determination of Soil Damping: Development of an Application Tool Laboratory)*