This study investigates rapid dynamic pressure variations in water distribution networks due to critical incidents such as pipe bursts and valve operations. We developed and implemented a machine learning (ML)-based methodology that surpasses traditional slow cycles of pressure data acquisition, facilitating the efficient capture of transient phenomena. Employing the Orion ML library, which features advanced algorithms including long short-term memory dynamic threshold, autoencoder with regression, and time series anomaly detection using generative adversarial networks, we engineered a system that dynamically adjusts data acquisition frequencies to enhance the detection and analysis of anomalies indicative of system failures. The system's performance was extensively tested using a pilot-scale water distribution network across diverse operational conditions, yielding significant enhancements in detecting leaks, blockages, and other anomalies. The effectiveness of this approach was further confirmed in real-world settings, demonstrating its operational feasibility and potential for integration into existing water distribution infrastructures. By optimizing data acquisition based on learned data patterns and detected anomalies, our approach introduces a novel solution to the conventionally resource-intensive practice of high-frequency monitoring. This study underscores the critical role of advanced ML techniques in water network management and explores future possibilities for adaptive monitoring systems across various infrastructural applications.

  • Developed a machine learning (ML)-based system to dynamically monitor pressure in water networks.

  • Utilized advanced Orion ML library algorithms for real-time anomaly detection.

  • Enhanced operational efficiency by optimizing data acquisition rates.

  • Demonstrated effectiveness in pilot and real-world water distribution settings.

  • Integrated Orion ML library with advanced algorithms to improve infrastructure resilience.

In water distribution networks, events such as pipe bursts and valve operations manifest through the dynamics of pressure waves. However, the slow cycles of pressure data acquisition, relative to the rapid propagation of pressure waves, pose challenges in capturing these dynamics effectively. Ongoing research has been directed toward overcoming this limitation in actual water distribution systems (WDSs). For instance, Choi et al. (2015) analyzed the complexities of pressure monitoring in water supply systems, particularly challenges arising from valve operations. They identified the constraints of existing supervisory control and data acquisition (SCADA) systems in recording rapid transient events and stressed the importance of strategic data sampling locations and intervals. Furthermore, Starczewska et al. (2015) examined the occurrence and impacts of transient events on water networks, highlighting the need for a thorough analysis of transient phenomena within complex network configurations.

Considering the increasing complexities in monitoring and managing WDSs, recent research has significantly enhanced our understanding and capabilities. Dai et al. (2024a) conducted a comparative assessment of global sensitivity approaches to elucidate the uncertainty in water resources models, revealing crucial insights into the constraints and possibilities of current hydrological models under varying parameters. Furthermore, Dai et al. (2024b) introduced a novel two-step Bayesian network-based process for sensitivity analysis in complex nitrogen reactive transport modeling, providing a new perspective on managing the intricacies of nutrient transport in water networks. Complementing this research, Dai et al. (2023) undertook experimental and numerical studies on the mechanisms of ground collapse resulting from underground drainage pipe leakage, which directly impacts the structural integrity and operational reliability of urban water systems. Additionally, Hu et al. (2023) have provided advanced techniques for enhancing defect feature purification in multilabel sewer defect classification, pushing the boundaries of technology in precisely identifying infrastructure anomalies. These studies collectively underscore the critical role of integrating advanced analytical methodologies and the latest machine learning (ML) techniques to improve the robustness and accuracy of anomaly detection and management in water distribution networks. Based on these recent advancements, this study proposes an innovative method to dynamically adjust data acquisition frequency based on ML-driven insights, thereby addressing both immediate and long-term challenges in water network management.

Recent studies underscore the operational benefits of event detection in water distribution networks. High-frequency, transient-flow pressure data facilitate the precise and cost-efficient identification of system conditions, such as leaks, variations in wave speed, blockages, and trapped air pockets. These detection methods typically utilize inverse transient analysis (ITA). The contributions of Colombo et al. (2009), Duan et al. (2011, 2014), Ferrante et al. (2014), and Vítkovský et al. (2007) to leak detection, as well as the research by Covas & Ramos (2010), Kim (2011), and Kim et al. (2014) on identifying wave speeds, blockages, and air pockets, are significant in this context. To detect such parameters in a pipeline system using the ITA technique, a transient flow must first be generated intentionally; however, doing so can lead to a severe hydraulic accident in the WDS. Therefore, attempts have been made to inject a small pressure signal into the pipeline network and record the reflected pressure signal without high risk (Brunone et al. 2021; Lee et al. 2021). Recent advancements have shown the effectiveness of high-resolution pressure sensors for accurate leak localization (Levinas et al. 2021). Furthermore, time-series analysis using multiple pressure sensors has been utilized to improve leakage detection in WDSs.

Transient-flow events commonly occur in WDSs owing to the periodic operation of the hydraulic components. However, these events have rarely been observed by data acquisition systems in the WDSs, owing to the low frequency of pressure sampling. Starczewska et al. (2015) argued that the current regulation of WDSs in the UK for pressure monitoring at 15-min intervals is not fast enough to capture changes in pressure due to transient flow. The authors collected high-frequency (100 Hz) pressure data from various points in a real WDS, and the number of severe transient-flow events was ascertained, which could not be determined via low-frequency data collection. Choi et al. (2015) conducted a similar study. They reported a valve-induced transient event in a real WDS with a 1-s time interval (1 Hz). This event cannot be recorded with the existing SCADA system of real WDSs, which records the pressure signal every 1 min (1/60 Hz).

Thus, transient events, which can be used for the diagnosis of WDSs, cannot be observed with low-frequency data acquisition but can be observed with high-frequency sampling. However, few papers have proposed appropriate data sampling frequencies for WDSs. Ye & Fenner (2014) investigated the appropriate sampling interval of flow data for burst alarms in a WDS. In that study, various data sampling intervals of the flow rate were applied to observe the impact on the accuracy of burst detection. The results indicated that burst events with long durations can be detected even with low-frequency data by applying an adaptive Kalman filter algorithm. Recent field tests have further validated the importance of high-frequency data acquisition for accurate pipeline monitoring (Brunone et al. 2024). However, there have been no studies on the appropriate data sampling frequency for observing transient-flow events in WDSs.

In response to these challenges, this study introduces a device engineered to dynamically adjust its data acquisition frequency based on detected network events. We developed and implemented an unsupervised ML algorithm to facilitate this dynamic adjustment. The efficacy of this device and the ML algorithm's performance were rigorously tested in a pilot-scale water distribution network experimental setup, as well as in actual operational networks.

Time-series anomaly detection using the Orion ML library

In this study, we utilized the Orion ML library, which was developed by the AI Laboratory at the Massachusetts Institute of Technology. This library is specifically tailored for the unsupervised detection of anomalies in time-series data and includes a suite of automated ML tools designed to handle a diverse array of datasets – from spacecraft telemetry signals to soil moisture levels and urban traffic patterns (Alnegheimish et al. 2022).

Anomaly detection in our study is defined as the process of identifying unexpected changes in water pressure that may signal critical failures, such as pipe bursts or abrupt valve closures within a water distribution network. These incidents are among the most detrimental that can occur in these systems.

The ML models provided by Orion follow several key steps. Initially, an ML algorithm is trained to recognize patterns within the data. Following the training phase, the model constructs a predictive time series of values derived from these learned patterns. This predictive series is then compared against actual observed data. Any significant deviations between the model's predictions and the actual data series are identified as anomalies.
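The train, predict, and compare steps described above can be illustrated with a deliberately simplified sketch. It does not use Orion: a moving-average predictor stands in for the learned model, the fixed threshold (mean plus k standard deviations of the training errors) is an assumption made for illustration, and all function names and pressure values are hypothetical.

```python
from statistics import mean, pstdev

def moving_average_forecast(series, window):
    """Predict each sample as the mean of the preceding `window` samples."""
    return [mean(series[i - window:i]) for i in range(window, len(series))]

def detect_anomalies(train, test, window=5, k=4.0):
    """Return indices in `test` whose prediction error exceeds a threshold
    derived from the (assumed event-free) training series."""
    # Step 1: "train" by characterising prediction errors on normal data.
    train_pred = moving_average_forecast(train, window)
    train_err = [abs(a - p) for a, p in zip(train[window:], train_pred)]
    threshold = mean(train_err) + k * pstdev(train_err)

    # Step 2: build the predicted series for the data under test.
    test_pred = moving_average_forecast(test, window)

    # Step 3: compare predictions with observations and flag large deviations.
    return [i + window
            for i, (a, p) in enumerate(zip(test[window:], test_pred))
            if abs(a - p) > threshold]

# Hypothetical pressure values (arbitrary units): small ripple, then a sudden drop.
normal = [3.00, 3.02, 3.01, 2.99, 3.00, 3.01, 3.02, 3.00, 2.99, 3.01, 3.00, 3.02]
observed = [3.00, 3.01, 3.02, 3.00, 2.99, 3.01, 2.40, 2.45, 3.00, 3.01]
flagged = detect_anomalies(normal, observed)
```

In this toy series the sudden drop at index 6 is flagged, while the preceding ripple is not. A real pipeline would replace the moving average with a trained model and the fixed threshold with a more robust rule.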

ML models

Our study utilizes ML models from the Orion library, which have been shown to outperform traditional autoregressive integrated moving average (ARIMA) techniques for processing time-series data (Wong et al. 2022). The employed models include the long short-term memory (LSTM) dynamic threshold, autoencoder with regression (AER), and time-series anomaly detection using generative adversarial networks (TadGAN).

The LSTM dynamic threshold method, specifically utilized by Hundman et al. (2018) for detecting anomalies in spacecraft, employs LSTM networks to dynamically adjust thresholds for pinpointing anomalies. This method combines the robust capabilities of LSTM networks with a non-parametric dynamic thresholding approach to achieve precise anomaly detection. Furthermore, the effectiveness of this technique has been validated with real spacecraft data, illustrating various strategies to enhance system performance under actual operational conditions.
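The thresholding idea can be sketched as follows. This is a simplified reading of the non-parametric rule described by Hundman et al. (2018), not Orion's implementation: it omits error smoothing and anomaly pruning, assumes non-negative prediction errors, and uses hypothetical function names and error values. Candidate thresholds of the form mean + z·std are scored by how much removing the flagged errors reduces the mean and spread, penalised by the number of flagged points and flagged sequences.

```python
from statistics import mean, pstdev

def count_runs(flags):
    """Count contiguous runs of True values (anomalous sequences)."""
    runs, previous = 0, False
    for flag in flags:
        if flag and not previous:
            runs += 1
        previous = flag
    return runs

def dynamic_threshold(errors, z_values=range(2, 11)):
    """Pick eps = mu + z * sigma that best separates large errors from the rest.

    Returns None when no candidate threshold flags any error.
    """
    mu, sigma = mean(errors), pstdev(errors)
    if mu == 0 or sigma == 0:
        return None  # constant or all-zero errors: nothing to separate
    best_eps, best_score = None, float("-inf")
    for z in z_values:
        eps = mu + z * sigma
        flags = [e > eps for e in errors]
        kept = [e for e in errors if e <= eps]
        n_flagged = sum(flags)
        if n_flagged == 0:
            continue
        # Relative drop in mean and spread once flagged errors are removed...
        gain = (mu - mean(kept)) / mu + (sigma - pstdev(kept)) / sigma
        # ...penalised by how many points and sequences were flagged.
        score = gain / (n_flagged + count_runs(flags) ** 2)
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps

# Hypothetical prediction errors: a flat baseline with one short burst.
errors = [0.01] * 20 + [0.5] * 3 + [0.01] * 20
eps = dynamic_threshold(errors)
```

Because the threshold is recomputed from the errors in each window, it adapts to the local noise level instead of relying on a single fixed cut-off.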

The AER model integrates the strengths of both prediction- and reconstruction-based approaches, employing a joint objective function to train an autoencoder with a regression component. This design enables the generation of both reconstruction- and prediction-based anomaly scores. Wong et al. (2022) introduced this innovative architecture to enhance anomaly detection in time-series data by overcoming the limitations inherent in existing methodologies. The AER model yields more precise anomaly scores by executing bidirectional predictions and reconstructions concurrently. Their research further investigates various methods for combining prediction- and reconstruction-based scores, demonstrating that such integration significantly improves the performance of anomaly detection systems.

TadGAN utilizes generative adversarial networks (GANs) to reconstruct time-series data and detect anomalies through contextual error assessment. Introduced by Geiger et al. (2020), TadGAN represents a cutting-edge GAN-based framework for anomaly detection that employs both Generators and Critics, integrated with LSTM networks, to effectively capture and reconstruct time-series distributions. This architecture benefits from cycle consistency loss, notably enhancing its ability to detect anomalies by accurately reconstructing time-series data.

Hyperparameters

The Orion ML library facilitates the construction of ML pipelines that integrate time-series data preprocessing techniques with anomaly detection methods and ML models, aiming to optimize classification performance through the ideal combination of hyperparameters. Particularly, the ‘interval’ and ‘window_size_portion’ hyperparameters were identified as having a significant impact on the classification performance in our experimental settings.

The ‘interval’ parameter dictates the frequency at which signal preprocessing is executed. The data for validation were collected at a rate of 100 Hz, and since the Orion library is not configured to process such high-frequency data directly, we preprocessed this data into a Unix time format compatible with the Orion library.
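The exact conversion is not spelled out here. One plausible reading, shown purely as an assumption, is that each 100-Hz sample is assigned a synthetic integer timestamp spaced by the chosen interval, which yields the two-column (timestamp, value) layout that Orion expects for an input signal. The function name and sample values below are hypothetical.

```python
def to_timestamped_rows(samples, interval=21600, start=0):
    """Pair each sample with a synthetic integer timestamp.

    Assumption: consecutive samples are spaced `interval` pseudo-seconds
    apart, so sub-second data fit an integer Unix-time axis.
    """
    return [(start + i * interval, float(value))
            for i, value in enumerate(samples)]

# Three hypothetical pressure samples taken 0.01 s apart.
rows = to_timestamped_rows([3.0, 3.1, 2.9], interval=28800)
```

Under this assumption the interval hyperparameter and the timestamp spacing must agree, otherwise the preprocessing step would aggregate or pad samples.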

The ‘window_size_portion’ is critical in determining the accuracy with which the model interprets the discrepancy between predicted and actual time-series values. Anomalies are identified based on data within the specified window size, and the ‘window_size_portion’ represents the proportion of the window size relative to the total data size. A smaller ratio suggests a narrower window, enhancing the detection of rapidly evolving intricate patterns, whereas a larger ratio facilitates the observation of broader trends. However, an excessively small window size can result in false positives (FPs) by incorrectly classifying non-existent events, underscoring the necessity of selecting an appropriate ‘window_size_portion’.
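To make the proportion concrete, the sketch below converts a window_size_portion into a window length in samples and in seconds. It assumes the window is simply the portion multiplied by the total number of samples, as described above; the helper name is hypothetical, and the 50,000-sample, 100-Hz figures match the tuning experiment reported later in this paper.

```python
def window_size(total_samples, portion):
    """Window length in samples for a given window_size_portion."""
    return max(1, round(total_samples * portion))

TOTAL_SAMPLES = 50_000   # 500 s of data at 100 Hz
SAMPLE_RATE_HZ = 100

for portion in (0.01, 0.13, 0.23, 0.33):
    samples = window_size(TOTAL_SAMPLES, portion)
    print(f"portion {portion}: {samples} samples, {samples / SAMPLE_RATE_HZ:.0f} s")
```

For example, a portion of 0.01 corresponds to a 500-sample (5-s) window, whereas 0.33 spans 16,500 samples (165 s).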

Other hyperparameters, which appeared to have minimal impact on the classification performance of our experimental data, were maintained at their default settings in the Orion library; for example, the Adam optimizer was used with a batch size of 64. The number of training epochs, however, had to be reduced because of computational constraints and operational requirements. The number of training epochs directly affects computation time, and given the operational exigencies of real-world water distribution networks, the response time – from the occurrence of an event to its detection – must be within 1 h. Consequently, both training and event detection needed to be completed within this timeframe.

Evaluation

The effectiveness of the ML algorithms was assessed using three key metrics: recall, F1-score, and relative execution time. Prioritizing recall is essential due to the critical nature of detecting true events within water distribution networks. Recall is defined as the ratio of actual positive instances correctly identified by the model to the total actual positives, which is calculated as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP (true positive) denotes the number of correct positive predictions made by the model, and FN (false negative) denotes the number of positive instances that the model incorrectly predicted as negative. High recall is crucial as it indicates a lower rate of FNs – which is vital in operational contexts where failing to detect actual events could lead to severe consequences. Although FPs are undesirable, they are considered less detrimental in this scenario, thus highlighting the importance of recall.
The F1-score, the second metric of interest, provides a balanced measure of an algorithm's precision (the ability to label only truly positive samples) and recall. The F1-score is calculated using the formula:
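$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$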
with precision given by:
$$\text{Precision} = \frac{TP}{TP + FP}$$
where FP represents the number of incorrect positive predictions made by the model. The F1-score, as the harmonic mean of precision and recall, is especially useful in scenarios where there is a need to strike a balance between the accuracy of detection and the imperative to avoid overlooking real events.
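A short worked example may help fix the definitions of recall, precision, and the F1-score. The counts below are hypothetical and are not taken from the experiments in this study.

```python
def recall(tp, fn):
    """Share of actual events that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Share of detections that correspond to actual events."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical outcome: 12 true events, 9 detected, 1 false alarm.
tp, fn, fp = 9, 3, 1
print(f"recall={recall(tp, fn):.3f}, "
      f"precision={precision(tp, fp):.3f}, "
      f"F1={f1_score(tp, fp, fn):.3f}")
```

With these counts, recall is 0.750, precision is 0.900, and the F1-score is about 0.818, showing how missed events pull the F1-score down even when false alarms are rare.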
Relative execution time is a crucial metric for measuring the efficiency of ML algorithms by quantifying both the training and inference times, consolidated into a single indicator. Under specific hyperparameter settings, this metric facilitates the comparison of the cumulative execution time of a model against a designated high-performing baseline; the baseline is established as an index for performance comparison. The calculation for relative execution time is defined by the following formula:
$$\text{Relative execution time} = \frac{T_{\text{model}}}{T_{\text{baseline}}}$$
where $T_{\text{model}}$ represents the execution time for the current model and hyperparameter configuration, and $T_{\text{baseline}}$ is the execution time for the baseline model, which is optimally configured. For example, the LSTM dynamic threshold model with an interval of 28,800 and a window size portion of 0.01 serves as the benchmark, with its relative execution time denoted as 1.
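The ratio can be computed as in the following sketch. The helper names are hypothetical, and the timings in the example are invented for illustration; they are not measurements from this study.

```python
import time

def measure_seconds(task):
    """Wall-clock time of one call to `task` (e.g. training plus detection)."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start

def relative_execution_time(t_model, t_baseline):
    """Execution time of a configuration relative to the baseline (baseline = 1)."""
    if t_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return t_model / t_baseline

# Hypothetical timings: baseline run 40 s, candidate configuration 60 s.
ratio = relative_execution_time(60.0, 40.0)  # 1.5 times the baseline
```

A value below 1 indicates a configuration faster than the baseline, and a value above 1 a slower one, which makes results comparable across devices with different absolute speeds.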

This metric is particularly significant in the typical contexts of water distribution networks, which may involve outdoor or underground settings, where using high-specification computing devices is impractical. Relative execution time offers insights into the operational feasibility of ML algorithms, ensuring they are precise in anomaly detection and operationally efficient. Such efficiency is indispensable for systems requiring real-time analysis and implementation in edge-computing scenarios.

Pilot-scale WDS

For parameter tuning and performance evaluation of the Orion algorithms in our research, as well as for optimizing implementation in actual water distribution networks, we utilized the extensive testing facilities of the Water Industry Cluster in Daegu Metropolitan City. The experimental network comprises two reservoirs, 20 junctions, 21 pipelines, and one pump, with a total pipe length of 1,428.38 m. The network incorporates four different types of pipe materials: polyvinyl chloride, ductile iron, polyethylene, and steel. Table 1 summarizes the key specifications of this pilot-scale water distribution network. Figure 1 depicts a schematic of the experimental WDS used in this study. Water is supplied by RUpstream via a centrifugal pump, achieving an average flow rate between 769.4 and 773.9 m³/h.
Table 1

Pipe properties of the target water network

Pipe ID  Node 1  Node 2  Diameter (mm)  Length (m)  Material
P1 RUpstream N1 300 64.50 DCIP 
P2 N1 N2 300 15.94 DCIP 
P3 N2 N3 300 70.05 PVC 
P4 N3 N4 300 43.33 PE 
P5 N4 N5 300 218.59 PVC 
P6 N5 N6 300 134.26 PVC 
P7 N5 N8 300 261.34 PVC 
P8 N6 300 135.08 PVC 
P9 N7 300 68.73 PVC 
P10 N7 300 21.09 PE 
P11 PDAQ 300 29.45 PE 
P12 PDAQ N8 300 22.32 DCIP 
P13 B8 N9 300 27.98 DCIP 
P14 N9 N10 300 50.36 PVC 
P15 N10 N11 300 50.25 SP 
P16 N11 N12 300 77.24 PVC 
P17 N12 N13 300 34.11 PVC 
P18 N13 N14 300 26.77 PE 
P19 N14 N15 300 70.00 SP 
P20 N15 RDownstream 300 6.99 PVC 

SP, steel pipe; DCIP, ductile iron pipe; PVC, polyvinyl chloride; PE, polyethylene.

Figure 1

Schematic of the experimental WDS.


To facilitate event generation within the network, two devices were installed at designated Points A and B. Point A, fitted with a 50-mm ball valve connected to the 300-mm water main, is positioned 51.77 m away from the pressure data acquisition point. This setup is suited to the direct detection of significant hydraulic events. Conversely, Point B, equipped with a 15-mm ball valve on a fire hydrant, is located 141.59 m from the data acquisition site, making it suitable for capturing smaller-scale events from a distance.

The experimental data were gathered at a site 801.02 m downstream from the upstream reservoir (Point of DAQ). To ensure the precision of data acquisition, high-accuracy pressure sensors were employed. These sensors, Model PXJ409-1.0 MGI from Omega Engineering Inc., feature a measurement range of 0–1.0 MPa and an accuracy of 0.08%. Data acquisition was facilitated by the NI-9253 module from National Instruments Inc., capable of capturing pressure and flow data at a sampling rate of up to 1,000 Hz. Additionally, a software routine was developed in LabVIEW to configure and manage the data acquisition system.

Field validation

To evaluate the applicability of the developed event detection algorithm in actual water distribution networks, data acquisition devices were installed within the network of City D, with data collected as per the operational requirements. Table 2 details the pipe properties of the target network, and Figure 2 illustrates a schematic of the installation sites of the proposed event detection devices and the locations of recorded events. The network provides water to a designated district metered area through a 400-mm-diameter pipeline P1. Connected to P1 is a 200-mm-diameter pipeline P2, which, in turn, connects to a 100-mm-diameter pipeline P3, delivering water to a large apartment complex. Measurement Site A was established along the main water supply route to key consumers, while Measurement Site B was positioned along the continuation of pipeline P1 to the 200-mm-diameter pipeline P4.
Table 2

Pipe properties of the target network

Pipe ID  Node 1  Node 2  Diameter (mm)  Length (m)  Material
P1 N1 N2 400 257.63 DCIP 
P2 N2 Site A 200 94.13 DCIP 
P3 Site A N3 100 47.10 DCIP 
P4 N2 N4 400 172.01 DCIP 
P5 N4 Site B 200 18.77 DCIP 

DCIP, ductile iron pipe.

Figure 2

Schematic overview of the actual water distribution network considered for field validation.


In this study, the sensors installed at each site within the actual water distribution network were outfitted with a broad temperature compensation range of −40 to 105 °C, rendering them highly suitable for environments experiencing significant diurnal and seasonal temperature fluctuations. The employed pressure sensor, Model SPT-I2 from Prignitz Inc., boasts a measurement range of 0–1.0 MPa and an accuracy of 0.5%.

Detection of simple events in the pilot-scale water distribution network

Figure 3(a) presents a comparative analysis of the pressure behaviors under normal operations and specific event conditions at Points A and B within the simulated water distribution network. During standard, event-free operations, the network's pressure displays a periodic sinusoidal wave pattern, primarily due to pump activity.
Figure 3

(a) Time series of pressure waves and (b) ML algorithm detection results for valve-induced events in the pilot-scale water distribution network.


At Point A, a controlled event was simulated by the rapid opening and subsequent slow closing of a 50-mm ball valve, which induced two pronounced rapid pressure drops, effectively demonstrating the transient-flow characteristic of a rupture event. However, the pressure wave behavior resulting from the valve closure was less discernible. Given the potential for serious safety incidents or pipe damage within the 300-mm network when a 50-mm valve is suddenly closed, the valve was intentionally closed slowly to mitigate the generation of large pressure waves. The distinct pressure waves prompted by valve-induced ruptures contrast sharply with the sinusoidal waves produced by the pumps, indicating that the detection of such anomalies could be facilitated by setting specific thresholds. Notably, the pressure wave behavior associated with the valve opening was recorded twice during the 130-s interval, specifically from 13.32 to 28.467 s and from 74.32 to 89.45 s, each instance triggered by the rapid activation of the valve.

At Point B, operations analogous to those at Point A were conducted, with events initiated by the rapid opening and closing of a smaller 15-mm ball valve. Although the rapid valve operations were intended to generate more severe events, they induced only minor pressure fluctuations at Point B, which are difficult to distinguish visually from the continuous background of pump-induced sinusoidal waves.

The dynamics of the pressure waves resulting from valve operations at Point B were recorded during four distinct events within the 130-s test period. Specifically, a rapid opening of the valve from 45.23 to 56.32 s triggered one pressure wave, while a subsequent rapid closing of the already-open valve from 67.13 to 78.21 s resulted in another. Furthermore, another rapid valve opening from 90.32 to 101.40 s and a rapid closing from 116.23 to 127.32 s each produced additional pressure waves.

We employed the LSTM dynamic threshold model to assess the detectability of rupture events at Points A and B. The settings for the performance evaluation were configured as follows: the interval and the window_size_portion were set to 86,000 s and 0.33, respectively, with all other hyperparameters remaining at their default values. Figure 3(b) displays the outcomes of the event detection. The actual event periods are marked with blue shading, while the red shading indicates the periods of anomalies detected by the dynamic threshold model. At Point A, the model recognized the distinct pressure waves resulting from valve operations, identifying the first event from 14.78 to 25.85 s and the second event from 75.45 to 88.30 s, consistent with the actual timings of the valve operations. Although the rupture events at Point B were visually subtle, the model detected the first valve opening event from 45.68 to 54.43 s, the subsequent valve closing from 68.89 to 77.40 s, and the next valve opening from 91.87 to 101.89 s. However, the model failed to recognize the final valve-closing event. The rapid opening and closing of the 15-mm valve at Point B, which coincided with a peak of the pump-induced sinusoidal wave, posed detection challenges, possibly because the configured interval was significantly longer than the actual input data interval. Furthermore, the default hyperparameter settings, such as the window size, might not have been optimal for capturing the specific characteristics of the anomalies targeted in our study.

Model selection and hyperparameter tuning

To select an optimal model and fine-tune its hyperparameters for field applications, continuous events were simulated within the pilot-scale water distribution network. Figure 4 illustrates the pressure time-series from these experiments. Approximately 50,000 data points were collected for 500 s at a frequency of 100 Hz. The initial 200 s, during which no events occurred in the network and no valve operations were conducted, provided normal signals used for model training. Between 200 and 500 s, 12 events were induced at Point A through periodic valve operations.
Figure 4

Pressure time series from continuous event experiments conducted in the pilot-scale WDS for model selection and hyperparameter tuning.


The primary objective was to identify the most suitable ML model and optimal hyperparameter settings based on the experimental outcomes, with specific attention given to the interval and window_size_portion. Table 3 provides a detailed summary of the model and hyperparameter combinations tested in the experiments. We evaluated 48 unique combinations, assessing each one's performance based on recall, F1-score, and relative execution time. These metrics were crucial for determining each combination's effectiveness in accurately detecting events while also considering execution efficiency and speed.
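The count of 48 follows from crossing the three models with the four interval values and the four window_size_portion values listed in Table 3. The enumeration can be sketched as below; the variable names are illustrative, and the evaluation of each combination is left out.

```python
from itertools import product

MODELS = ("LSTM dynamic threshold", "AER", "TadGAN")
INTERVALS = (21600, 28800, 32400, 36000)
WINDOW_SIZE_PORTIONS = (0.01, 0.13, 0.23, 0.33)

# Full factorial grid: every model with every interval and window portion.
grid = [
    {"model": model, "interval": interval, "window_size_portion": portion}
    for model, interval, portion in product(MODELS, INTERVALS, WINDOW_SIZE_PORTIONS)
]

print(len(grid))  # 3 models x 4 intervals x 4 portions
```

Each entry in the grid would then be trained on the event-free segment and scored on recall, F1-score, and relative execution time.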

Table 3

Combinations of models and hyperparameters (interval and window_size_portion) considered in our experiments

Model  Interval  Window_size_portion
LSTM dynamic threshold 21,600 0.01 
AER 28,800 0.13 
TadGAN 32,400 0.23 
 36,000 0.33 

Table 4 summarizes the average values of the evaluation metrics for each model, and Figure 5 displays the results of the evaluations for each combination. The LSTM dynamic threshold model demonstrated superior performance with an interval of 28,800 and a window_size_portion of 0.13, achieving a recall of 0.917 and an F1-score of 0.959. In contrast, TadGAN exhibited protracted computational times and lower classification effectiveness. While the AER model was faster than the LSTM dynamic threshold model, its classification performance was less satisfactory. Other hyperparameter adjustments yielded no significant gains in performance or speed beyond this best-performing combination. The characteristics of our acquired water pressure data seem to align better with the prediction-based LSTM dynamic threshold model, in contrast to the reconstruction-based approaches utilized by AER or TadGAN. For practical field deployment, we opted for the combination that delivered the best results to ensure effective classification while reducing response times to within 1 h by decreasing the number of training epochs from 35 to 4.
Table 4

Summary of the average value of the evaluation metrics

Model  Recall  F1-score  Relative execution time
AER 0.188 ± 0.252 0.215 ± 0.234 0.6 ± 0.1 
LSTM dynamic threshold 0.688 ± 0.162 0.802 ± 0.114 1.2 ± 0.1 
TadGAN 0.365 ± 0.172 0.416 ± 0.166 13.2 ± 3.3 
Figure 5

Visualization of evaluation results with different combinations of models and hyperparameters.


Field validation: assessing operational performance under real-world conditions

Following the results of model selection and hyperparameter tuning from the pilot-scale water distribution tests, we employed the combination of ML model and hyperparameters that demonstrated the best performance to monitor event phenomena in an actual water distribution network. Figure 6 illustrates the algorithmic structure used for dynamic data acquisition and anomaly detection within the network. The pressure time-series data collected during the initial 5 min were established as the baseline for normal conditions and were utilized to train the ML algorithm. This algorithm was subsequently applied to detect anomalies in the pressure behavior of the network over the next hour, analyzing data in 5-min intervals. Datasets identified as abnormal during these intervals were stored as high-frequency time-series data, whereas datasets deemed normal were recorded as 1-s averages to enhance data storage efficiency. Data from the final 5 min, categorized as normal, were repurposed as training data to facilitate fault evaluation in the network during the subsequent 1-h period.
Figure 6

Algorithm structure for dynamic data acquisition and anomaly detection in water distribution networks.
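The storage logic of this acquisition cycle can be sketched as follows. This is an illustration under stated assumptions, not the deployed implementation: `train` and `is_anomalous` are hypothetical callables standing in for the ML model, the most recent block classified as normal is assumed to become the next training set, and the toy blocks in the example are far shorter than real 5-min, 100-Hz blocks.

```python
from statistics import mean

def downsample_to_1s(block, rate_hz=100):
    """Replace each second of raw samples with its average."""
    return [mean(block[i:i + rate_hz]) for i in range(0, len(block), rate_hz)]

def process_cycle(blocks, train, is_anomalous, rate_hz=100):
    """Process one cycle: blocks[0] trains the model, the rest are screened.

    Returns the stored records and the block to train on in the next cycle.
    """
    model = train(blocks[0])
    stored, next_training_block = [], blocks[0]
    for block in blocks[1:]:
        if is_anomalous(model, block):
            # Keep the full-resolution signal for later transient analysis.
            stored.append(("high_frequency", block))
        else:
            # Normal data: keep only 1-s averages to save storage.
            stored.append(("1s_average", downsample_to_1s(block, rate_hz)))
            next_training_block = block
    return stored, next_training_block

# Toy example with a 2-Hz "sensor" and a stub detector (both hypothetical).
def train(block):
    return mean(block)

def is_anomalous(model, block):
    return max(abs(x - model) for x in block) > 0.3

blocks = [[3.0, 3.0, 3.0, 3.0], [3.0, 3.1, 3.0, 3.1], [3.0, 2.2, 2.5, 3.0]]
stored, next_block = process_cycle(blocks, train, is_anomalous, rate_hz=2)
```

In the toy run the second block is stored as averages and the third, which contains a pressure drop, is kept at full resolution.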

Figure 7 displays the pressure time series derived from an actual water distribution network, captured using a data acquisition system that operated as depicted in Figure 6. Throughout a 24-h monitoring period, several events were detected, including significant pressure drops and spikes, along with various other disturbances. These events, discernible only in high-frequency data and typically associated with pressure waves traveling at speeds between 1,000 and 1,400 m/s, provide valuable insights into the impact of such disturbances on the water distribution network.
Figure 7

Pressure time series acquired from an actual water distribution network using the developed algorithm.
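
A short calculation illustrates why such events are visible only in high-frequency data. The sketch below computes how far a pressure wave travels between two consecutive samples, using the wave-speed range quoted above, the standard 1 s acquisition cycle, and the 100-Hz rate used for the event records; the function name is hypothetical.

```python
def metres_per_sample(wave_speed_m_s: float, sampling_hz: float) -> float:
    """Distance a pressure wave travels between two consecutive samples."""
    return wave_speed_m_s / sampling_hz


for speed in (1000.0, 1400.0):
    coarse = metres_per_sample(speed, 1.0)     # standard 1 s cycle
    fine = metres_per_sample(speed, 100.0)     # 100-Hz acquisition
    print(f"{speed:.0f} m/s: {coarse:.0f} m per sample at 1 Hz, "
          f"{fine:.0f} m per sample at 100 Hz")
```

At 1 Hz the wave front moves 1,000 to 1,400 m between samples, so a transient can pass a sensor between two readings; at 100 Hz it moves only 10 to 14 m.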

Figure 8 displays the high-frequency pressure data time series from Sites A and B, which were identified as containing events. The data are presented in both their raw form, captured at a 100-Hz data acquisition frequency, and as a moving average to facilitate trend analysis. Site A, strategically located just before a major water consumption area, is particularly susceptible to sudden pressure changes. These fluctuations are often driven by pump operations, which adjust according to varying demand levels. In this locale, the water is conveyed through pipes with smaller diameters (100 mm), which are inadequate to fully satisfy the demand, resulting in frequent alterations in flow rate. This dynamic contributes to the movements of the pressure waves as illustrated in Figure 8. At Site B, the algorithm successfully identified events characterized by sudden pressure drops, confirming that these rapid changes in pressure generated distinct pressure waves.
Figure 8

High-frequency pressure time series of detected events at Sites A and B.
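
The moving average used alongside the raw 100-Hz signal can be computed as in the sketch below. The window length is not stated in the text, so the value used here is purely illustrative, and the sample values are synthetic.

```python
def moving_average(samples, window):
    """Simple trailing moving average over `window` consecutive samples.

    Returns len(samples) - window + 1 values, one per full window."""
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    averaged = [running / window]
    for i in range(window, len(samples)):
        # Slide the window: add the newest sample, drop the oldest.
        running += samples[i] - samples[i - window]
        averaged.append(running / window)
    return averaged


# Synthetic example with an illustrative 3-sample window.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(smoothed)
```

A longer window gives a smoother trend line but blurs the sharp fronts of the pressure waves, which is why the raw record is shown together with the averaged one.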


In this study, we developed and implemented a method that uses ML algorithms to detect pressure waves generated by anomalies such as pipe bursts and valve operations in water distribution networks. We evaluated candidate ML models and hyperparameter combinations in experiments on a pilot-scale network and selected the most effective combination. To validate the applicability of the methodology, we installed and operated a device in an actual water distribution network that varied its data acquisition cycle according to the outputs of the ML model. This approach enabled the detection of pressure-wave behaviors that are difficult to discern with the standard 1 s pressure acquisition cycle of the network. Direct application of the developed pressure acquisition device in real water distribution networks is expected to provide operational insights that are useful for transient-flow analysis and related applications. The methodology also improves data storage efficiency: data are kept at the standard acquisition frequency during periods without detectable events, and high-frequency records are retained only when events are detected.

Despite these advancements, the lack of a clear standard for modifying the data acquisition frequency based on the detection or absence of network events indicates a pressing need for further development of refined methodologies. Given the limitations of current pressure acquisition systems within water distribution networks, our study highlights the imperative of advancing and fine-tuning our ML-based anomaly detection system. This technology holds particular promise for regions prone to transient pressure disturbances and frequent pipeline ruptures – events that conventional systems often fail to detect. Implementing our system in such settings could reveal the root causes of previously undetected events, potentially leading to significant enhancements in network management and emergency response protocols. Future research will focus on establishing practical prediction intervals based on actual operational data – crucial for optimizing our methodology and ensuring its effective application in real-world settings.

This work was supported by the Korea Planning & Evaluation Institute of Industrial Technology funded by the Ministry of the Interior and Safety (MOIS, Korea; Project Name: Development of water quality platform to prevent with tap/drinking water accidents/Project Number: 20025188).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).