The timely and robust detection of anomalies is essential for resilient and secure operations of critical water infrastructures against operational faults or malicious actions. However, real-world systems exhibit diverse and evolving spatiotemporal relationships among their components, posing an intricate challenge in anomaly detection. This study proposes a Hybrid Dynamic graph neural network that jointly maps long- and short-term spatiotemporal relationships in multivariate data streams. Those relationships are encoded via a hybrid graph, comprising an optimally learned static subgraph for persistent relationships and a complementary dynamic subgraph for dynamically shifting relationships. Additionally, an attention mechanism captures time-varying relational importances and shifts the model's focus towards significant relationships, while minimising contributions of less importance to the final outputs. The proposed architecture is showcased through a synthetic case study of a water distribution system with multivariate data streams from both on-site and soft sensors, and shows strong detection and localisation accuracy against all scenarios of operational faults and malicious actions explored. A comparative analysis with an equivalent static graph model indicates that the addition of the hybrid dynamic component enhances detection accuracy and reduces false alarm rates through more robust characterisation of behaviours, thus allowing actionable insights for more resilient and secure operations.

  • This study introduced Hybrid Dynamic GNN to capture long- and short-term relationships in complex multivariate systems.

  • Dynamically adjusted attention mechanisms to varying relationships’ significance.

  • Reliable detection and explanation both operational faults and malicious actions in complex, stochastically driven systems.

  • Comparative analysis suggests reduced false alarms and increased detection accuracy over static GNN in identical scenarios.

Water infrastructures, as critical entities for the well-being and security of societies, are expected to continuously sustain and overcome any stressor that may hinder integrity of their services. This constitutes an increasingly complex task, as the current (and future) operational landscape of the sector is characterised by a plethora of disruptive events, including typical operational faults (e.g., pipe bursts and asset malfunctions), and intentional malicious actions (e.g., sabotages and cyber–physical attacks). In the unremitting quest for resilience, albeit the merits of strategic planning (see e.g., Dimas et al. 2022; Moraitis et al. 2023), the prompt real-time detection of events is crucial to prevent escalation and cascading of impacts through timely interventions. At its core, anomaly detection, also referred to as outlier detection, relies on the formulation of a benchmark normal profile for a system, to characterise any new behaviour as either normal or abnormal (Patcha & Park 2007). Although conceptually straightforward, in practice anomaly detection is challenged by many factors (Chandola et al. 2009), both intrinsic (i.e., technique-specific) and extrinsic (i.e., domain or case-specific).

With a focus on the water sector, real-time anomaly detection is identified as a key capability for mature and insightful Water Digital Twins (SWAN Digital Twin Work Group 2022) – a point in the Digital Water horizon emerging from the new data landscape and smart hydroinformatics (Makropoulos & Savić 2019; Sarni et al. 2021). However, the field of anomaly detection has a vast amount of branching techniques that propagate from their early statistical form in the seminal works of Newcomb (1886) and Edgeworth (1887), and expand to the deep learning era. There exist literature over the suite of anomaly detection techniques, both wide (see e.g., Patcha & Park 2007; Chandola et al. 2009) and type-specific (e.g., deep learning (Pang et al. 2021) and graph neural networks (GNNs) (Wang & Yu 2022), detailing their merits and drawbacks for a variety of applications. Notably, there are the following five key challenges that overarch the anomaly detection applications:

  • CH.1: Water systems, like other infrastructures (e.g., power grids and communication networks), are dynamic by nature, with complex spatiotemporal relationships and multimodal ‘normal’ behaviours.

  • CH.2: The new data landscape is characterised by larger and more heterogeneous data streams (Makropoulos & Savić 2019; Savić 2021).

  • CH.3: The difference between normal and anomalous behaviour is obscured by the complex cyber–physical architecture of systems – and even concealed under bogus data in the case of cyber–physical attacks.

  • CH.4: It is hard to attain complete, accurate, and representative labelled datasets of past events (Chandola et al. 2009; Mounce et al. 2011; Wangen 2019).

  • CH.5: The majority of anomaly detection techniques are focused on a specific application or a specific problem formulation (Chandola et al. 2009).

Anomaly detection in the water sector is utilised predominantly in the following two distinct applications: (a) leakage detection or (b) attack detection, yet never jointly. The early formulation for both applications was through model-based approaches, with Carpentier & Cohen (1991) comparing on-site measurements to a calibrated static model to detect leakages in the Vaugirard network in Paris, and Amin et al. (2013) proposing hydrodynamic modelling as a reference to detect false-data injection attacks in open channel sensors. Later, Housh & Ohar (2018) proposed the use of EPANET (Rossman 2000) for a model-based attack detector in water distribution systems. Despite promising results and recent advances in cyber–physical modelling of water systems (Taormina et al. 2019; Nikolopoulos et al. 2020; Murillo et al. 2023), attaining well-calibrated models of real systems is not trivial. Moreover, the uncertainty bound of their estimations is expanded by time-evolving system parameters (e.g. pipe roughness) (Chan et al. 2019), as well as the variability and stochastic nature of the underlying water demands (Kossieris & Makropoulos 2018) that drive the systems.

In recent years there has been extensive work in anomaly detection, catalysed by relevant research ‘battles’ (Taormina et al. 2018; Vrachimis et al. 2022), and machine learning (ML) techniques are increasingly proposed for leakage (Chan et al. 2019) and attack (Tuptuk et al. 2021) detection, utilising both supervised and unsupervised approaches. A bottleneck of supervised ML techniques is their need for labelled training datasets, thus favouring the use of unsupervised techniques.

A common drawback among classical ML approaches is that their representation learning aims to decipher short- and long-term temporal relationships and often overlooks or lacks the mechanisms to explicitly account for combined spatiotemporal relationships. A field of ML that explicitly accounts for those mechanisms is that of GNNs, a deep learning graph representation technique especially potent for anomaly detection (Wang & Yu 2022). GNNs jointly infer the spatiotemporal relationships through the adjacency matrix (a.k.a. connectivity matrix), which can be either user-defined or self-learned by the model, and either static or dynamic; with the most common combination being that of user-defined static graphs. The latter is the case for the prediction-based graph anomaly detection introduced by Tsiami & Makropoulos (2021) for the water sector, via temporal graph convolutional neural networks (TGCNs). In their work, system interconnections were represented as a skeletonised graph of the network, derived by the on-site sensors' location. Although the actual spatial connectivity is a meaningful direct relationship, systems often exhibit secondary and possibly unperceived relations. This yields a major challenge to the wide-spread implementation of GNNs, as their performance is highly affected by the representational quality of the underlying graph (Zhu et al. 2021; Wang & Yu 2022).

To maximise understanding and representation of underlying mechanisms, GNNs can be coupled with metric-based, neural or direct graph structure learning (GSL) methods (Zhu et al. 2021; Chen & Wu 2022), and jointly learn the optimal parameters and graph structure, in an end-to-end manner. A proven technique in this process is ‘node embedding’, which, in simple terms, provides a unique ‘summary’ of each node, learnt from its corresponding data per se – instead of requiring human expertise to characterise it. In practice, the embedding vectors build a map of meaningful representations of each node's characteristics and patterns in a model-readable format, and allow the identification of ‘similar’ nodes as well as the generalisation of ML models to unseen examples. Chen et al. (2020) proposed node embeddings to derive an adjacency matrix optimised for the downstream prediction task, and linearly combine it with an original graph to formulate the final static connectivity for a two-layered graph convolutional network (GCN). In the absence of an initial graph, one may be derived, for example, through kNN (k-Nearest Neighbour) graph constructor based on cosine similarity. Deng & Hooi (2021) took the idea a step further and suggested that the final static graph can be optimally derived solely from the similarity between node embeddings, restricted to a predefined set of candidate relationships if prior knowledge exists. The proposed graph deviation network (GDN) utilises the learned sensor embeddings in both the GSL and the downstream graph attention network (GAT) (Velickovic et al. 2018), to represent the different behaviours of sensors in a static graph, and enhance the anomaly detection of attacks. GAT introduce the concept of attention for graph networks, which allow the adjustment of the convolution parameters, i.e., the degree of association between connected nodes, and thus weight the importance of each pairwise connection in the static graph. However, real-world systems are rarely – if ever – static in their relational interconnections, raising questions over the possible existence of a threshold in the ‘optimal’ representability of static graphs.

Water systems, like power grids, communications, and other systems, are multimodal, i.e., they exhibit multiple operational schemes. For example, a distribution system may interchangeably rely on tanks and pressure-boosting stations to sustain pressure levels in a district metered area (DMA), diversify daily the percentage of supply among its water treatment plants, and so on. Although the spatial connectivity of assets remains practically static, the systems' operational connectivity continuously alternates. Dynamic GNNs (DGNNs) have recently emerged to capture both discrete (Seo et al. 2018) and continuous (Trivedi et al. 2019) evolutions of structural information, and have become the state-of-the-art approach, with several models proposed (Yang et al. 2024). Examples for anomaly detection applications include the AddGraph (Zheng et al. 2019), which utilises GCN and gated recurrent units (GRUs) to learn spatiotemporal relationships of dynamic graphs' snapshots, and NetWalk (Yu et al. 2018), which proposed dynamic clustering of clique embedding outputs, i.e., sets of tightly associated node groups, to detect anomalies based on the derived network representation.

Challenged by alternating relationships between multivariate data streams, the problem of dynamic graph anomaly detection in heterogeneous systems is largely under-explored (Kim et al. 2022), and, notably, never addressed in regard to the water systems. So far, there is still no approach proposed that explicitly considers the multimodal behaviour of the systems and the dynamic spatiotemporal relationships emerging from the evolution of their operational connectivity, without loss of potentially strong, long-lasting relationships.

In this vein, the objective of this work is the development of a robust unsupervised solution for anomaly detection in multivariate data streams (CH.2 and CH.4), able to properly adjust to the underlying dynamic spatiotemporal relationships and multimodal behaviours of any complex, cyber–physical system (CH.1, CH.3, and CH.5).

Inspired by the above-mentioned GNN works, and in response to the identified challenges, this work proposes a hybrid dynamic graph neural network (HDGNN) architecture that jointly represents dynamically shifting and persistent structural relationships in multivariate data streams, through a learned hybrid graph. As the term ‘hybrid’ suggests, the generated graph consists of a learned static subgraph, to encode persistent relationships as permanent edges, and a dynamic subgraph, which encodes additional instantaneous relationships in the system as temporary edges in each timestep. The hybrid graph is passed downstream through a masked attention mechanism layer of the HDGNN to update the features of the vertices based on the information aggregation of the assigned neighbourhood for each timestep. The HDGNN outputs predict the baseline performance of the system and, compared to the supervisory control and data acquisition (SCADA) system measurements, characterise normal or anomalous behaviours at node level.

The novelty of this work lies in the fact that, unlike traditional methods that focus on static relationships, the HDGNN jointly captures both persistent and dynamic relationships in a hybrid graph network. This new approach enables the model to effectively capture the complexities and multimodal behaviours inherent in water systems. Moreover, the proposed architecture utilises mechanisms that allow for better characterisation of spatiotemporal dynamics and varying degrees of relationships, thus enhancing the expressive power of the model and leading to nuanced anomaly detections. Last, but not least, with the aim of building a practical solution for the sector applicable to multiple cases, the proposed prediction-based anomaly detection scheme is adoptable without the need for labelled datasets or previous knowledge over optimal graph structures in the system.

In this section, we present the proposed HDGNN model and its basic components, namely: (1) hybrid dynamic graph construction, to jointly encode into a single graph the persistent and temporal structural relationships (edges) between measured entities (vertices) of a system; (2) dynamic masked attention for graphs, to enforce the learned connectivity on embeddings-aware attention mechanisms and generate custom dynamic attention patterns per graph instance, i.e., dynamic graph attention; and (3) stacked fully connected output, to sequentially aggregate vertices' features based on the dynamic attention patterns, apply linear and non-linear transformations to the extracted features, and derive the predictions of the vertices' behaviour. The HDGNN is then coupled downstream with an anomaly scoring system that compares model predictions and actual system measurements to characterise anomalous and normal vertex behaviours, in an end-to-end workflow.

Hereafter, and for simplicity, the term sensor refers to any component (tangible or not) and its measurements derived from soft or on-site sensors, simulation, naïve, or statistical models, while the term system refers to any isolated or interconnected network, such as interdependent critical entities, which the entire set of sensors measures.

Preprocessing

An important step in ML techniques is feature scaling, which improves convergence and stability of the model by transforming sensors' features into a standardised scale. This reduces scale-driven dominance of sensors and ensures a more balanced representation of their influence, through various normalisation and standardisation techniques. Given the problem addressed is focused on multivariate timeseries, we hereby utilise z-score normalisation, transforming inputs into zero-mean and unit-variance features.

Let denote the set of system sensors that measures number of variables over a period T, i.e., produces features sets . The features are normalised via , where is the features' mean and , their standard deviation over period T, for each sensor i.

Hybrid dynamic graph construction

Real-world systems are rarely static, and often exhibit multimodal behaviour, i.e., different vertices are connected via different relationships at different times t, as the system adapts to the most recent conditions. Yet, systems often tend to also retain a certain degree of ‘deterministic’ relationships, especially in relation to source points (e.g., water treatment plants and power plants) or central operational hubs (e.g., storage tanks and transmission substations) at a local or global scale. The goal of the hybrid dynamic graph construction module is to formulate a common graph that reflects both static and dynamic relationships, in a two-step approach. In the first step, graph node embeddings are utilised to extract an optimal static graph that reflects the learned persistent relational structure among system components, without any prior knowledge or initial graph required. The second step seeks to augment the static graph with dynamic vertices derived by sensors' short-term correlation, which reflects the strongest relations emerging in the given period. The fusion of the two creates the hybrid dynamic graph used to represent the sensors' relationships at every timestep t.

As mentioned, the first step is to learn an optimal static subgraph. For a system measured by a set of sensors S, let denote a static subgraph that connects a set of vertices, with , through a set of edges. The graph can be represented by its adjacency matrix that contains all edges . The unweighted adjacency matrix is synthesised as , where indicates a connection between the vertices and , and indicates lack of pairwise connection. Among the possible sets of edges of a given size, there shall exist an optimal set that best represents the vertices' correlation, as the optimal graph . Based on network homophily (i.e., similar vertices are more likely to be connected), the optimal connections will also be represented in the form of similarity between the graph vertices embeddings . Hence, by leveraging this property, the model can map the sensors' behaviours into low-dimensional space through a mapping function : and subsequently learn the optimal graph based on the embeddings' pairwise similarity. To be more precise, the model learns a set of internal parameters that, used in the mapping function f, will yield the embedding vectors for each sensor. With no prior knowledge over the optimal graph structure, the optimisation is with respect to the downstream prediction tasks, i.e., the HDGNN learns a graph structure that minimises the difference between measurements and predictions. To derive the similarity of the embeddings, various metrics can be used, including radial basis functions kernel, dot product, cosine similarity, or the weighted cosine similarity (Chen et al. 2020). Without loss of generality, this work hereby utilises the cosine similarity metric to derive the similarity score , as in the following equation.
(1)
The optimal edges can be then defined as the connections between vertices and that satisfy a similarity criterion. This may be expressed in the form of a similarity threshold, above which the vertices are assigned a connection, or via a sparsification operation to learn a sparse neighbourhood for each vertex based on its embeddings' similarity, such as the function (Norcliffe-Brown et al. 2018). The later function ranks the embeddings' similarity scores for every vertex pairwise to a vertex and selects only the set of K-highest values per vertex. This allows the construction of an optimal static subgraph with constraint size , while ensuring a common degree of representativeness for each vertex through a K-sized neighbourhood. Based on this function, the static adjacency matrix is learnt as follows:
(2)

The above-mentioned outcome represents the learned size-constrained optimal connections within the system via the corresponding static subgraph .

For the second step of GSL, the goal is to augment the statistic subgraph with a set of temporary edges that reflect the latest system interconnections and can be described as a dynamic subgraph of the system. Let denote the dynamic subgraph at timestep t, connecting a set of vertices , through a set of temporary edges . To represent the dynamic changes in the system based on network homophily, the dynamic edges shall exist between vertices that exhibit similar behaviours in a given time window and are not already a part of the static graph. The pairwise relationship of sensors can thus be implied through their monotonic relationship, not in a sense of causality but of similarity. To characterise the strength and direction of the monotonic relationships in a time window w, and without preconception over linearity, the Spearman rank correlation coefficient is calculated as for each sensor pair, based on its features’ sequence in the sliding window , converted to ranks through a function . For each timestep t, the unweighted adjacency matrix can be formulated based on the satisfaction of a threshold criterion, signifying a connection between every pair whose correlation is above a defined threshold.
(3)

An indicative threshold criterion, and a good starting point, is , which suggests a strong positive correlation between sensors, or , which reflects both negative and positive strong correlations within the selected time window w.

Subsequently, and for each timestep the hybrid dynamic graph of the system is synthesised as the union of its static and dynamic subgraphs, while the adjacency matrix is derived as the addition of the two adjacency matrices, i.e., , and contains self-edges . Although the hybrid dynamic graph is derived solely from sensor-centred information, pairwise relationships may be enforced or restricted based on expert judgement, to modify the final graph. Experts may enforce or restrict connections within specific subsystems according to their expert judgement, or even utilise this capability to restrict connections with sensors known to be prone to failure, inaccuracies, or vulnerability-induced attacks.

Dynamic masked attention for graphs

GNNs obtain their expressive power and representativeness by leveraging the relational information encoded within the system graph. Central to their operation is the fusion of information between neighbouring vertices, which enables the extraction of higher-level representations of the vertices' features based on the characteristics of their neighbourhood. In the derived hybrid graph, the neighbourhood of each vertex can be defined at each timestep t as , i.e., all the vertices connected to it. However, the presence of connections between vertices does not inherently imply the uniform significance of the represented relationships; rather, it simply signifies their existence. To attend to significant relationships while minimising the contribution of less relevant neighbouring sensors, we utilise a graph attention layer, as conceptualised by Velickovic et al. (2018) and later enhanced through sensors' embeddings (Deng & Hooi 2021). Due to the dynamic nature of the hybrid graph, vertices can dynamically enter or exit each other's neighbourhood over time, adding to the complexity of the process. To adaptively adjust the attention mechanism to the varying significance of relationships of a neighbourhood in a dynamic context, the graph attention layer masks the raw attention weights prior to their normalisation for each timestep t. In that manner, any raw attention learned for the pairwise feature extraction of vertices is applied only in the presence of a connection and normalised based on the characteristics of the respective neighbourhood – yielding higher or lower attention at each different graph occurrence. Similarly to the dynamic subgraph construction process, the HDGNN uses a sliding window to perform feature extraction in both space and time, for each sensor's sequence in the time window . Note that the sliding window can be of different size than the one used for the construction of the dynamic graph.

Let be a set of learnable weights for a shared linear transformation over the features of vertices , and a vector of learnable raw attentional weights used in the attention mechanism. In every timestep t, the set of raw attention coefficients for every possible vertex pair is derived as follows:
(4)
where is the concatenation operator, and selu (Klambauer et al. 2017) is the Scaled Exponential Linear Unit (SELU), a non-linear activation function that induces self-normalisation. SELU activation allows for faster and more stable convergence and is defined for any variable as follows:
(5)
The matrix of raw attention coefficients resembles an attention mechanism for a fully connected graph, where every vertex would attend to the attributes of every vertex of the system, ignoring the learnt connections of the hybrid graph. To dynamically adjust the attention mechanism to the derived relationships at each timestep, the HDGNN enforces the relationship structure to the raw attention coefficients through masking. The raw attention is transformed according to each vertex's neighbourhood captured in the hybrid graph through the following masking operation:
(6)
After the dynamic masking of the raw attention coefficients, the function is applied to normalise the coefficients across all vertices and compute the attention coefficients .
(7)
Through the function, all connections that do not belong in the hybrid graph at timestep t receive a zero-attention coefficient, and are thus not considered in the feature extraction of vertex . Every other vertex is assigned an attention coefficient that reflects its importance to the feature extraction of based on the dynamically varying spatial and temporal characteristics of its neighbourhood, i.e., the vertices' connectivity and their respective features in the given time window. The aggregated feature representation per vertex is then computed as the linear combination of its attended neighbourhood's features’ sequence after applying non-linearity, e.g., via the following rectifier activation function:
(8)
The feature representation reflects the contribution of the neighbourhood's information to the vertex. To combine the attention-based information with the learnt behavioural patterns of the represented vertex , the final output of the layer is computed as the Hadamard product between the embeddings and the aggregated feature representation as follows:
(9)

Stacked fully connected output

The purpose of the last layer of the HDGNN is to transform the learned sequence representations from the masked attention layer into meaningful task-specific outputs, and specifically into predictions of the sensors' behaviour at time t. This is achieved through a stacked fully connected output layer (also known as stacked dense layer) that sequentially performs linear and non-linear transformations to derive increasingly complex representations and map them into the output vertices features . The general structure of an L-layered process can be described as the sequence of L transformations, as follows:
(10)
where denote the learnable weights, are the bias neurons and , the activation function of each layer L. Without loss of generality, the proposed HDGNN utilises an at least two-layered output, with the rectifier activation function as the non-linear transformer between the stacked layers. Nevertheless, the selection of the activation function, the number of layers, the hidden units in each layer, and so on, are hyperparameters of the model, which are adjusted according to the problem's specificities. In addition to the typical hyperparameters’ fine-tuning, the accuracy of the HDGNN predictions in respect to each case-specific dataset can also be improved via the adjustment of the static subgraph connectivity size via the parameter K, the rolling window w, and threshold for the correlation-driven construction of the dynamic subgraph and the time window for feature extraction in the masked attention layer.

The output of the HDGNN is the prediction vector that represents the normalised expected features of each sensor's behaviour at time t, based on the learnt relationships captured in the hybrid graph and the attended feature sequences of its neighbourhood in the time window .

Anomaly detection

The goal of the HDGNN model is to perform one step ahead forecast of the system's behaviour based on the sensors' features in a rolling time window. To make the HDGNN predictions human-interpretable and meaningful in the context of comparison against the SCADA measurements, their representation to the original scale per sensor can be attained by reversing the normalisation process, i.e., as . To reduce noise and better represent the underlying behavioural patterns, the sensors' predictions can be passed through a smoothing filter, such as the Savitzky–Golay filter (Savitzky & Golay 1964) that fits low-degree polynomials to local data points or the moving average filter that computes the average of local data points in a sliding window. Particularly in the case of the moving average, to avoid losing the importance of most recent trends predicted, a weighted moving average filter is preferred, which assigns lower weights to the older data points. The weights for each backwards moving timestep j in a rolling window of size is derived as follows:
(11)
where corresponds to the latest predictions at time t, and to the oldest. The filtering processes, although not necessary for the direct detection of anomalies, facilitate easier interpretation of the predicted system performance and enable actionable insights in the context of real-time monitoring.
The HDGNN-derived predictions are then compared against the sensors' measurements, to detect deviating behaviours and thus alarm for potentially occurring anomalies in the system. The prediction errors for each sensor can be derived either in the original scale of measurements or for the normalised , as the difference between measurements and predictions. To infer the existence of anomalies in the data streams, the standard score (a.k.a. z-score) of the prediction error against the distribution of errors from the testing period can be estimated as follows:
(12)
where and are, respectively, the mean and the standard deviation of errors for each sensor in the testing period. The standard score thus measures in terms of standard deviation the distance between the errors of the latest predictions from the mean of the error distribution of past predictions, derived from the testing set. Thus, new anomalies can be detected as events that drive the model to yield errors that deviate from their normal distribution, through a threshold criterion. The process of defining the threshold criterion depends on the available datasets, i.e., if labelled datasets are available or not. In the case of available labelled datasets to be used as holdout, the threshold can be defined as an optimisation problem such that it includes all identified events and minimises the false-positive ratio. In the absence of labelled data, the threshold can be derived as the maximum or an upper quintile of the standard score of the prediction errors in the holdout set, such that no or little false-positive anomaly identifications are expected.

However, a disruptive event can cause the system to deviate in numerous sensors, posing challenges to their explainability. To provide more valuable insight and pinpoint the source of anomalies, the proposed anomaly detection scheme suggests an expected causal association between the detected anomaly event and the sensor that exhibits the highest deviation score .

A synthetic water infrastructure

A key maturity pillar of Digital Twins for water is their ability to provide insights over the operational state of the system, including the detection of anomalous behaviours (SWAN Digital Twin Work Group 2022). In this vein, the HDGNN was initially conceptualised and subsequently deployed as the latest building block for a proof-of-concept Water Digital Twin (WDT) of a synthetic real-world system. The physical counterpart of the WDT resembled a modified water distribution network of C-Town, a benchmark network of an anonymised real-world system (Ostfeld et al. 2012). To account for real-world conditions and the inevitable pipe deterioration, fissures are assigned across all pipes, following the leakage modelling framework presented by Chela et al. (2023). Hence, the synthetic system exhibits hydraulic behaviours past the ‘optimal design’ of the original benchmark, and under the effect of continuous, dynamically varying leakages, with a mean monthly background leakage ratio of ∼10%. Moreover, the system is driven by a stochastically generated demand timeseries synthesised with the anySim R package (Tsoukalas et al. 2020). The recreated synthetic consumption behaviours preserve the auto-correlation structure of historical patterns and their lag-0 cross-correlation, and lead the system to exhibit synthetic, yet realistic operational behaviours.

In total, the water distribution system comprises five DMAs served by a single water source, i.e., reservoir R1 and operated through 11 pumps, four valves, and seven tanks and a total of 52 on-site sensors. It is supervised by a central SCADA system, which regulates operations via automated controls. The system is hypothesised as not yet equipped with smart-water metering devices at the consumers' endpoints, while their consumptions are measured monthly and recorded to the utility's database.

The historical data of the case study refer to an unlabelled dataset of a 2-year operational window, with 15-minutes measuring frequency and includes measurements from 52 on-site sensors and five soft sensors that estimate the fine-scale patterns of background leakages at DMA level, briefly introduced in Supplementary material, Appendix B.

Case study scenarios

Utilising the SCADA and soft-sensor data, this case study comprises six indicative scenarios, to explore the effectiveness of anomaly detection against a diverse array of scenarios that range from operational faults (e.g., pipe bursts and asset malfunctions), to intentional malicious actions (e.g., cyber–physical attacks). Each scenario spans a 2-week period past the historical period. The scenario designs also accommodate additional evaluation insights, presented along the scenario descriptions as follows:

  • Sc1: An event-free period under business-as-usual conditions. This scenario serves as a test to false alarms per se, and thus evaluation of the operational applicability of the anomaly detection, as high rates and long false-positive detections can lead to the ‘crying wolf’ effect, i.e., scepticism or disregard when real emergency occurs.

  • Sc2: A motivated and skilled attacker leverages knowledge over system operations and tank refill automations via data breaches that have gone unnoticed. The attacker thus manipulates the system by injecting realistic bogus data timeseries of the level sensor of the main system tank, i.e., Tank T1 in DMA1. The attack occurs during the refill process with the bogus data indicating an increasing water level above 4.5 m (following a polynomial shape). This leads the control system to close Pump PU2 and maintain connection to the source only via Pump PU1. The attack occurs on Day 7 at 15:45, with a total duration of 8 h. This scenario can help evaluate the anomaly detection against sophisticated attacks concealed as normal operations. It can also help evaluate the ability of the algorithm to ‘escape’ strong autoregressive dynamics, and thus, by deduction, infer meaningful spatiotemporal relationships in the system.

  • Sc3: Occurrence of asset malfunction, and specifically of the actuator of Pump PU4, located at the inlet of DMA3. After Tank T3 level reaches 5.3 m, the SCADA sends an actuation command to turn off Pump PU4, yet the control is not accomplished due to a fault in the servo-electric motor. This leads Pump PU4 to remain open, pushing water in the DMA and causing Tank T3 to overflow. This event occurs in the second day of Week 2 at 23:45 and has a total duration of 15 h. This scenario helps evaluate the capacity of the anomaly algorithm to detect deviations from the control rules in-place and separate the source of deviation among the faulty behaviour and its cascading effects.

  • Sc4: A threat actor has gained access to the local PLC that oversees the operation of the pressure-boosting station at the inlet of DMA1 and maliciously activates the unscheduled Pump PU3. The attack occurs at 23:45 of the 10th day and has a total duration of 15 h. This scenario helps evaluate the ability of the anomaly detection algorithm to deal with assets that have not been operated within the training period, and the subsequent effects they may have in the system behaviour.

  • Sc5: Aging and material defects had weakened the structural integrity of Pipe P937, located in the middle of DMA2, leading to a sudden burst on the fifth day at 12:30. The event lasts for a total of 20 h, before utility staff applies clamp-on repair couplings to stop the leakage, i.e., no isolation is required, and downstream flow is not interrupted. This scenario can help evaluate the detection capacity of the algorithm in events that occur in non-monitored assets of the system (i.e., outside the graph), and thus its ability to indirectly infer anomalous behaviours solely from the effects of the events.

  • Sc6: In the last scenario a combination of disruptive events occurs, and more specifically a cyber–physical attack against a pump that leads to a pipe rapture. An attacker maliciously activates Pump PU5 located at the inlet of DMA3 for a short time, causing a pressure surge. The excessive stress to downstream pipeline leads to a rapture in Pipe P794. The malicious pump activation occurs on Day 7 at 17:30, with a simultaneous occurrence of the pipe rapture. The pump is activated for 30 min while the subsequent leakage lasts 12 h, before clamp-on repair couplings are applied. This scenario helps evaluate the algorithm's adaptability against combinatorial simultaneous events and test its ability to continue detecting anomalous behaviours even after their cause ends.

The HDGNN used in this case study learns a static subgraph that allows each sensor to be steadily associated with approximately 25% of the network (i.e., ), while the dynamic subgraph in each timestep is derived from the correlation between sensors' readings in the last hour (i.e., ). The feature extraction for the attention processes is performed with a sliding window of 4 h (i.e., ). Both the HDGNN predictions and errors are smoothed via weighted moving average filter of the last hour (i.e., ). The anomaly detection threshold for each sensor is derived using an unlabeled holdout set of the last 5 months (20%) of the historical dataset.

The following paragraphs present the results of anomaly detection across each scenario, showcasing the applicability of the proposed HDGNN under a diverse set of disruptive events, and evaluating the performance of the case formulated HDGNN. In summary, the proposed scheme was able to detect all the events, with the swiftest anomaly detection time being 0 h (i.e., instantly detect the event) and the latest being 3 h, for the malfunction of a pump's actuator. The duration of false alarms prior to events never exceeded 1hr, while post-event false alarms not associated to known events last a maximum of 2.5 h. The longest false alarm associated with the extension of a true alarm past the event duration was 8.5 h. In terms of explainability, the HDGNN was able to localise the affected assets, while in the case of pipe bursts the localisation was more generally inferred through the corresponding soft sensor. To explore the contribution of the additional dynamic subgraph component, Supplementary material, Appendix A provides a comparative analysis of a prediction-based anomaly detection with a static GNN for the case study dataset and scenarios.

The HDGNN results for each scenario are presented as follows in two parts; the first part is the performance of the event detection and the second, the localisation of the events' sources.

Sc1: Event-free operations

In the event-free scenario, a total of three false alarms were issued, as seen in Figure 1 (left). In total, the false alarms ratio in this scenario corresponds to ∼0.68% of the 2-week period, with the longer false alarm lasting 1.75 h.
Figure 1

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left) and true-negative (lower right) predictions using HDGNN for Sc1.

Figure 1

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left) and true-negative (lower right) predictions using HDGNN for Sc1.

Close modal

By examining the highest anomaly scores in each alarm, the sources are identified, respectively, for each as potential issues in (i) the level of Tank T7, (ii) the inlet pressure in Pipe PU1, and (iii) the flow passing through Valve V4. In the case of longest alarm issued, the HDGNN predicts a higher level of Tank T7 and a decrease with slower rates – possibly due to an unusually increased consumption rate in DMA4. The other two alarms are issued instantaneously, because of error deviations that marginally exceeded the defined thresholds, with no obvious trends of behavioural deviations in the HDGNN predictions. Nevertheless, in this case study the anomaly threshold is deterministic, and alarms are activated regardless of the surpass magnitude.

Sc2: Cyber–physical attack of tank level manipulation

In the case of the cyber–physical attack against the level sensor of Tank T1 and the subsequent deactivation of Pump PU2, an alarm is issued at 17:45pm, 2 h after the attack began, with a detection accuracy of 75%. The alarm lasts 1 h after the attack ends, as seen in Figure 2 (left). In this period, three additional alarms are raised, not associated with any known event. The total false alarm ratio, including the detection past the event duration, corresponds to ∼0.8%.
Figure 2

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc2.

Figure 2

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc2.

Close modal
The highest anomaly score throughout the true-positive detection corresponds to Tank T1, except for one step in which the highest anomaly occurs in the flow sensor of Pump PU2. The behavioural deviation between predicted and SCADA reported tank levels during the attack can be seen in Figure 3.
Figure 3

SCADA measurements and smoothed HDGNN predictions for the water level of Tank T1 in Sc2.

Figure 3

SCADA measurements and smoothed HDGNN predictions for the water level of Tank T1 in Sc2.

Close modal

At the early steps of the attack, the HDGNN predicts a similar performance that is reversed soon after to indicate a tank emptying process throughout the attack period – and in contrast to the tank sensor measurements. Past the attack duration, and after receiving the true level of the tank, the system continues to detect anomalous behaviour in the tank level for 1 h.

Sc3: Asset malfunction

In this scenario the anomaly detector issues an alarm 3 h after the asset malfunction caused Pump PU4 to remain open. Subsequently, the anomalous behaviour of the system is intermittently detected, while the alarm continuous past the repair of the servo-electric motor, as illustrated in Figure 4 (left). As in the previous scenarios, three false alarms are also issued in the period. The detection accuracy of HDGNN in this scenario is ∼69%, while the false alarm ratio is ∼0.9%, including the alarm continuing past the actual event.
Figure 4

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left) and true-negative (lower right) predictions using HDGNN for Sc3.

Figure 4

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left) and true-negative (lower right) predictions using HDGNN for Sc3.

Close modal
With respect to the true-positive detection, the highest anomaly scores alternate between the outlet pressure sensor located downstream of Pumps PU5 and PU4, and the flow sensor of PU4, with approximately 60% of the alarm instances associated with the later sensor per se. The HDGNN-predicted flow and outlet pressure of the malfunctioned Pump PU4 against the SCADA measurements can be seen in Figure 5. The HDGNN predictions during the asset malfunction indicate that the system was expected to have lower to no flow coming from the pump, and subsequently also lower pressure at the outlet.
Figure 5

SCADA measurements and smoothed HDGNN predictions for flow (upper) and the outlet pressure (lower) of Pump PU4 in Sc3.

Figure 5

SCADA measurements and smoothed HDGNN predictions for flow (upper) and the outlet pressure (lower) of Pump PU4 in Sc3.

Close modal
In terms of the side effect of the asset malfunction, i.e., the overflow of Tank T3, there are no major deviations between predicted and measured levels, as shown in Figure 6. This suggests that the HDGNN considers it a ‘normal’ behaviour under the given conditions driven by the anomalous behaviour of Pump PU4.
Figure 6

SCADA measurements and smoothed HDGNN predictions for the water level of Tank T3 in Sc3.

Figure 6

SCADA measurements and smoothed HDGNN predictions for the water level of Tank T3 in Sc3.

Close modal

Sc4: Cyber–physical attack with malicious activation of unscheduled pump

In the case of the cyber–physical attack against the unscheduled Pump PU1 near the source-point, the anomaly detection issued an alarm simultaneously to the attack occurrence, with a detection accuracy of 100%. The alarm extends 4.5 h after the attack ends, as seen in Figure 7 (left). In this scenario period, a total of five false alarms are raised, leading to a false alarm ratio of 2.75%.
Figure 7

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc4.

Figure 7

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc4.

Close modal
With respect to the true-positive detection, the highest anomaly scores are those of Pump PU3 for the entire duration of the attack, with the HDGNN predicting a steadily deactivated pump in contrast to the SCADA flow readings, as shown in Figure 8 (upper). Past the malicious event duration, the identified root cause alters and indicates a suspected anomalous behaviour of a parallel pump, i.e., Pump PU2, as shown in Figure 8 (lower).
Figure 8

SCADA measurements and smoothed HDGNN predictions for flow of Pumps PU2 (upper) and PU3 (lower) in Sc4.

Figure 8

SCADA measurements and smoothed HDGNN predictions for flow of Pumps PU2 (upper) and PU3 (lower) in Sc4.

Close modal

Sc5: Pipe burst

For the sudden pipe burst occurring in Pipe P937, located in the middle of DMA2, the algorithm was able to detect an anomalous behaviour and issue an alarm 1.25 h after the burst, while the alarm extends 8.5 h after the pipe is fixed. The detection accuracy in this scenario is 93.75%. In total, five false alarms are raised, with the last three occurring after the incident, as shown in Figure 9 (left). This corresponds to a false alarm ratio of ∼4.5%, with the longest false alarm not associated with the known event having a duration of 2.5 h.
Figure 9

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc5.

Figure 9

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc5.

Close modal
In the true-positive detection, the algorithm almost continuously indicates deviations in terms of the leakage flow in DMA2. The background leakage soft sensor for DMA2 indicates a decrease of the background leakages – due to the decrease of average zone pressure caused by the burst. Conversely, and in contrast to the SCADA measurements, the HDGNN predictions indicate that the system is experiencing increased leakages in DMA2, as seen in Figure 10.
Figure 10

Soft-sensor (SCADA) measurements and smoothed HDGNN predictions for the background leakage flow of DMA2 in Sc5.

Figure 10

Soft-sensor (SCADA) measurements and smoothed HDGNN predictions for the background leakage flow of DMA2 in Sc5.

Close modal
In this anomaly detection event, some isolated instances of larger anomaly scores indicate anomalous behaviours occurring at the level of Tanks T4, located in the same DMA and affected by the pipe burst, and T7. After the pipe repair, the anomaly localisation alternates between system assets. In that post-event alarm, the largest duration is associated with the flow of Pump PU6 at the inlet of DMA2, as the algorithm predicts higher flowrates, as shown in Figure 11.
Figure 11

SCADA measurements and smoothed HDGNN predictions for the flow of Pump PU6 in Sc5.

Figure 11

SCADA measurements and smoothed HDGNN predictions for the flow of Pump PU6 in Sc5.

Close modal

Sc6: Pipe burst caused by malicious activation of pump

For the combination of events of this scenario, i.e., the cyber–physical attack and the cascade effect of a pipe burst, the anomaly detection issued an alarm simultaneously to the attack occurrence, as illustrated in Figure 12 (left), and intermittently continued throughout the duration, with a detection accuracy of 87.5%. The alarm lasted 2 h after the repair. In this scenario, a total of three false alarms are raised, leading to an overall false alarm ratio of 1.56% for this scenario.
Figure 12

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc6.

Figure 12

(Left) System-wide actual state against detected state by HDGNN, and (right) corresponding confusion matrix that summarises true-positive (upper left box), false-negative (upper right), false-positive (lower left), and true-negative (lower right) predictions using HDGNN for Sc6.

Close modal
At the start of the event, a large deviation is detected to the flow of Pump PU5 targeted by the attacker, but it is ranked second. The highest anomaly scores in the beginning of the true-positive detection are those of the DMA3 soft sensor, for a duration of 1.25 h. As illustrated in Figure 13, the HDGNN predicts higher leakage rates than the ones reported by the corresponding sensor.
Figure 13

SCADA measurements and smoothed HDGNN predictions for the background leakage flow of DMA3 in Sc6.

Figure 13

SCADA measurements and smoothed HDGNN predictions for the background leakage flow of DMA3 in Sc6.

Close modal
Subsequently, the anomaly score ranking alternates and steadily points to the tank level of Tank T3, located in the same DMA. As seen in Figure 14, the HDGNN initially matches the reported SCADA measurements, but after the tank level drops to 3 m, it predicts an increase instead of the steady decrease reported. This behavioural deviation between prediction and measurements aptly captures that the DMA inflow does not reach the tank, as expected with the activation of the automated refill protocol when Tank T3 reaches 3 m, and water is pumped in the DMA through Pump PU5.
Figure 14

SCADA measurements and smoothed HDGNN predictions for the water level of Tank T3 in Sc6.

Figure 14

SCADA measurements and smoothed HDGNN predictions for the water level of Tank T3 in Sc6.

Close modal

With respect to the entire scenario set, the HDGNN exhibited a detection accuracy of ∼86.5% for both operational faults and malicious actions and a false alarm ratio of ∼1.9%, under stochastically driven system behaviours. The comparative analysis to a static GNN in the same case study, presented in Supplementary material, Appendix A, indicates the positive effects of jointly accounting for dynamic and static relationships, on both the detection accuracy (+14%) and the reduction of false alarms (−68%).

The proposed HDGNN demonstrates robust detection performance under a diverse set of disruptive scenarios that include operational faults, malicious actions, and their combination. The algorithm was able to detect all the events occurring in the synthetic water distribution system, with a detection accuracy of ∼86.5% across all cases. At the same time, the HDGNN issued a small number of false alarms, with consistent rates across the event-free and the five disruptive scenarios, with an overall false alarm ratio of ∼1.9%. The HDGNN was able to directly associate anomalies with the affected assets per se in all the disruptive scenarios, showing strong localisation capabilities. The results indicate the algorithms robustness in detecting and localising system anomalies without any previous knowledge of specific failure patterns, and under realistic, stochastic system behaviours.

This work also demonstrated the HDGNN's ability to independently identify meaningful spatiotemporal relationships and adapt to operational changes dynamically, without relying on strong autoregressive patterns. This allows the HDGNN to properly generalise to unseen operational patterns and behaviours, and better identify contextual anomalies, i.e., infer if the system behaviour is normal under a given set of conditions, and abnormal under another.

When compared to more ‘traditional’, yet robust, approaches, the HDGNN also exhibits improved performance, exceeding the accuracy, false alarm ratio, and localisation capabilities of a static GNN in detecting complex cyber–physical events under realistic, highly variable conditions. Such conditions may be often found in the real world yet not often encountered in literature, in which detection performance is showcased with respect to either leakages (Vrachimis et al. 2022) or malicious events (Taormina 2017), with the underlying water systems operating under ‘fairly regular’ demand patterns with ‘no significant variations’, thus raising questions over the retainment of performance under real-world conditions (Tsiami & Makropoulos 2021). As such, the performance evaluation and comparative analysis can be deemed sufficiently representative of real-world conditions, while under less variable or more ‘typical’ consumption behaviours, HDGNN is expected to be equally good – if not better. The additional positive effects from coupling static with dynamic relationships are also aligned with the key findings of relevant literature over the advantage of dynamic GNNs to capture stochastic spatiotemporal relationships over the traditional ‘deterministic’ approaches (Kim et al. 2022). To the best of our knowledge, the proposed HDGNN is the first approach in our sector to explicitly consider the multimodal behaviour of systems and the dynamic spatiotemporal relationships emerging from the evolution of their operational connectivity, without loss of potentially strong, long-lasting relationships. Moreover, the proposed architecture goes beyond the combination of static and dynamic relationships by also accounting for their time-varying importance (i.e., the existence of connections does not imply a priori a steady or uniform importance) as it considers the significance of each relationship under both the existing system state and the unique characteristics of each sensor captured via embedding vectors. Hence, it continuously maps and ‘evaluates’ both the static and the dynamic relationships of a system, as they evolve through time. A key advantage of the proposed HDGNN, regardless of its application, is that its underlying graph connectivity can span any combination between a static fully connected (i.e., ) to a fully dynamic graph (i.e., ). This allows the HDGNN to adjust its internal relationship dynamics by relying more on static or dynamic connections, per the problem's needs. To this end, there are both practical and conceptual evidences towards its adoption as a robust, adaptive approach.

While the proposed approach demonstrates promising results in anomaly detection, it is important to acknowledge certain limitations. As every other unsupervised learning, the training of the HDGNN demonstrated in this case study builds on the principle that abnormal instances in the datasets are much rarer than normal, thus no labels are required to train the model into representing the ‘normal’ behaviour. However, if this assumption does not hold true, and training data are not properly curated, the anomaly detection will generate high rates of false alarms, and lead to the ‘cry wolf’ effect. An equally important aspect in the same vein is that of the smoothing process parameterisation, and specifically the time window for the weighted moving average filtering, where the trade-off between low false alarms and swift detection of actual events should be accounted for.

Another limitation and challenge in anomaly detection is the ability to detect and localise two or more events that occur simultaneously. Although not a limitation of the HDGNN per se, the downstream anomaly detection scheme utilised in this work can associate the system anomaly to a single point in the system based on the magnitude of the resulting deviation. This may create a gap in detection, where the anomaly detection system cannot automatically alert for a simultaneous event until the most critical issue is resolved.

An overarching limitation in anomaly detection is the ability to localise events when those occur in unmonitored parts of the system, and subsequently outside the HDGNN graph. Although the use of soft sensors, as illustrated in this case study, can help better identify and localise such events, further downstream schemes could be pursued to deduct more spatially specific information and allow for swifter decision making at the operational level.

This paper introduced a HDGNN architecture as an advancement in the expressive power of GNNs for multivariate data streams of complex systems, particularly in the water sector. The proposed architecture is the first in the sector to jointly address both persistent and dynamically shifting spatiotemporal relationships among system components, allowing a more detailed and realistic representation of multimodal behaviours and, thus, a more robust inference of contextual anomalies; a great challenge in the anomaly detection domain. This capability becomes more pivotal under the operational complexity emerging from the expanding cyber–physical water system architecture and – in parallel – the evolving perplexity of cyber–physical attacks. Notably, the HDGNN was able to perform robust anomaly detection on both cyber–physical attacks and asset malfunctions, under the same configuration and training, proving that its expressive power can capture a wide range of unseen system behaviours. Especially since this capability emerged without the use of labelled dataset, it signifies its practical potential for a broad range of applications without the need for multiple instances or application-specific modifications. This can be of special interest to water companies, as they could derive a single, fine-tuned instance of the HDGNN of their system and use it across departments. Moreover, the HDGNN architecture requires no prior knowledge regarding both the static and the dynamic subgraphs in the system, and can be used at any scale (from DMA to system wide). Moreover, the HDGNN architecture allows for a wide range of potential system connectivities, ranging from fully static to fully dynamic graphs, thus being able to adapt to each case.

Besides the underlying conceptual advancement that makes it more suitable for complex water systems, the HDGNN also exhibits enhanced performance, with respect to ‘traditional’ static approaches. The advantage of hybrid over static connectivity appears in all the desired traits of the anomaly detection, i.e., the detection accuracy, the localisation, the contextual inference, and the false alarm ratio – which in practice can lead to the ‘crying wolf effect’. Notably, the performance and comparative analysis of the HDGNN were evaluated under realistic boundary conditions, thus yielding more realistic insights. The case study presented in this work provides a synthetic, yet realistic benchmark cyber–physical water system that mimics real-world dynamics, including background leakages and stochastically generated water demands. The dataset can aid future researchers in evaluating the performance of relevant applications in conditions that more closely represent real-world variability and stochasticity – thus ensuring relevance and scalability to larger real systems. This will be crucial in tackling the remaining and future challenges in GNNs’ expressive power and anomaly detection for complex systems.

Future direction could include enhancing the proposed anomaly detection scheme to detect multiple simultaneous anomalies. To this end, new approaches need to be explored employing either simplistic methodologies, e.g., set a second threshold above which parallel alarms are issued, or more advanced ML-based approaches to indicate if the resulting anomalies belong to the same or different events. Moreover, the HDGNN scheme can be combined with parallel modules to allow the detection of anomalies outside the graph, i.e., the cyber layer. As a point towards future research, such a scheme could build on DT technologies to simulate multiple what-if scenarios of asset failures based on the most up-to-date reference model of the system. Those simulation results can then be used as ‘fingerprints’ for each asset's failure and help identify the most probable cause of the identified anomaly.

Nevertheless, the HDGNN anomaly detection scheme, explored under both typical operational faults (i.e., pipe bursts and asset malfunctions) and intentional malicious actions (i.e., cyber–physical attacks), has already yielded promising results in terms of its detection capacity for contextual anomalies and robust explainability, surpassing those of deterministic static GNN. We argue that the proposed joint exploration of static and dynamic relationships can help better map the varying behavioural patterns of complex real-world systems and enhance the performance of graph-based deep learning models used in prediction and anomaly detection tasks.

The research work was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the ‘Third Call for HFRI PhD Fellowships’ (Fellowship Number: 6349).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Amin
S.
,
Litrico
X.
,
Sastry
S. S.
&
Bayen
A. M.
(
2013
)
Cyber security of water scada systems-part II: Attack detection using enhanced hydrodynamic models
,
IEEE Transactions on Control Systems Technology, IEEE
,
21
(
5
),
1679
1693
.
Carpentier
P.
&
Cohen
G.
(
1991
)
State estimation and leak detection in water distribution networks
,
Civil Engineering Systems
,
8
(
4
),
247
257
.
Chan
T. K.
,
Chin
C. S.
&
Zhong
X.
(
2019
)
Review of current technologies and proposed intelligent methodologies for water distributed network leakage detection
,
IEEE Access, IEEE
,
6
(
c
),
78846
78867
.
Chandola
V.
,
Banerjee
A.
&
Kumar
V.
(
2009
)
Anomaly detection: A survey
,
ACM Computing Surveys
,
41
(
3
),
1
58
.
Chela
N. P.
,
Moraitis
G.
&
Makropoulos
C.
(
2023
) '
Modelling leakage and optimizing PRV settings for NRW reduction
',
2th World Congress on Water Resources and Environment (EWRA 2023) ‘Managing Water-Energy-Land-Food Under Climatic, Environmental and Social Instability’
.
Thessaloniki, Greece
.
Chen
Y.
&
Wu
L.
(
2022
)
Graph neural networks: Graph structure learning
,
Graph Neural Networks: Foundations, Frontiers, and Applications
,
297
321
.
Chen
Y.
,
Wu
L.
&
Zaki
M. J.
(
2020
)
Iterative deep graph learning for graph neural networks: Better and robust node embeddings
,
Advances in Neural Information Processing Systems
.
2020-Decem(NeurIPS)
, 33, 19314–19326.
Deng
A.
&
Hooi
B.
(
2021
) '
Graph Neural Network-Based Anomaly Detection in Multivariate Time Series
’,
35th AAAI Conference on Artificial Intelligence, AAAI 2021
, Vol.
5A
, pp.
4027
4035
.
Dimas
P.
,
Nikolopoulos
D.
&
Makropoulos
C.
(
2022
)
Simulation framework for pipe failure detection and replacement scheduling optimization
,
Environmental Sciences Proceedings, 21, 37. MDPI, Basel Switzerland
.
Edgeworth
F. Y.
(
1887
)
XLI. On discordant observations
,
The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science
,
23
(
143
),
364
375
.
Kim
H.
,
Lee
B. S.
,
Shin
W. Y.
&
Lim
S.
(
2022
)
Graph anomaly detection with graph neural networks: current status and challenges
,
IEEE Access
,
10
,
111820
111829
.
Klambauer, G., Unterthiner, T., Mayr, A. & Hochreiter, S. (2017) Self-Normalizing Neural Networks, Advances in Neural Information Processing Systems, 30, 971–980.
Makropoulos
C.
&
Savić
D. A.
(
2019
)
Urban hydroinformatics: Past, present and future
,
Water (Switzerland)
,
11
(
10
), 1959–1976.
Moraitis
G.
,
Sakki
G. K.
,
Karavokiros
G.
,
Nikolopoulos
D.
,
Tsoukalas
I.
,
Kossieris
P.
&
Makropoulos
C.
(
2023
)
Exploring the cyber-physical threat landscape of water systems: a socio-Technical modelling approach
,
Water (Switzerland)
,
15
(
9
), 1687.
Mounce
S. R.
,
Mounce
R. B.
&
Boxall
J. B.
(
2011
)
Novelty detection for time series data analysis in water distribution systems using support vector machines
,
Journal of Hydroinformatics
,
13
(
4
),
672
686
.
Murillo
A.
,
Taormina
R.
,
Tippenhauer
N. O.
&
Galelli
S.
(
2023
)
High-fidelity cyber and physical simulation of water distribution systems. II: Enabling cyber-Physical attack localization
,
Journal of Water Resources Planning and Management
,
149
(
5
), 04023010.
Nikolopoulos
D.
,
Moraitis
G.
,
Bouziotas
D.
,
Lykou
A.
,
Karavokiros
G.
&
Makropoulos
C.
(
2020
)
Cyber-physical stress-testing platform for water distribution networks
,
Journal of Environmental Engineering
,
146
(
7
),
04020061
.
Norcliffe-Brown
W.
,
Vafeias
E.
&
Parisot
S.
(
2018
)
Learning conditioned graph structures for interpretable visual question answering
,
Advances in Neural Information Processing Systems
,
2018-Decem(NeurIPS)
, 31,
8334
8343
.
Ostfeld
A.
,
Salomons
E.
,
Ormsbee
L.
,
Uber
J. G.
,
Bros
C. M.
,
Kalungi
P.
,
Burd
R.
,
Zazula-Coetzee
B.
,
Belrain
T.
,
Kang
D.
,
Lansey
K.
,
Shen
H.
,
McBean
E.
,
Yi Wu
Z.
,
Walski
T.
,
Alvisi
S.
,
Franchini
M.
,
Johnson
J. P.
,
Ghimire
S. R.
,
Barkdoll
B. D.
,
Koppel
T.
,
Vassiljev
A.
,
Kim
J. H.
,
Chung
G.
,
Yoo
D. G.
,
Diao
K.
,
Zhou
Y.
,
Li
J.
,
Liu
Z.
,
Chang
K.
,
Gao
J.
,
Qu
S.
,
Yuan
Y.
,
Prasad
T. D.
,
Laucelli
D.
,
Vamvakeridou Lyroudia
L. S.
,
Kapelan
Z.
,
Savic
D.
,
Berardi
L.
,
Barbaro
G.
,
Giustolisi
O.
,
Asadzadeh
M.
,
Tolson
B. A.
&
McKillop
R.
(
2012
)
Battle of the water calibration networks
,
Journal of Water Resources Planning and Management
,
138
(
5
),
523
532
.
Pang
G.
,
Shen
C.
,
Cao
L.
&
Hengel
A. V. D.
(
2021
)
Deep learning for anomaly detection: A review
,
ACM Computing Surveys
,
54
(
2
),
1
36
.
Rossman
L. a.
(
2000
)
EPANET Programmer's Toolkit
,
US Environmental Protection Agency. Office of Research and Development, Washington, D.C., USA
. pp.
1
74
.
Sarni
W.
,
White
C.
,
Webb
R.
,
Cross
K.
&
Glotzbach
R.
(
2021
)
Digital Water – Industry Leaders Chart the Transformation Journey
.
London, United Kingdom
:
Digital Water
.
Savić
D.
(
2021
)
Digital water developments and lessons learned from automation in the car and aircraft industries
,
Engineering
, 9, 35-41.
Savitzky
A.
&
Golay
M. J. E.
(
1964
)
Smoothing and differentiation of data by simplified least squares procedures
,
Analytical Chemistry
,
36
(
8
),
1627
1639
.
Seo
Y.
,
Defferrard
M.
,
Vandergheynst
P.
&
Bresson
X.
(
2018
) '
Structured Sequence Modeling with Graph Convolutional Recurrent Networks
’,
Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13–16, 2018, Proceedings, Part I 25
, pp.
362
373
.
SWAN Digital Twin Work Group
. (
2022
)
Digital Twin Readiness Guide: Applying SWAN's Digital Twin Architecture to the Water Industry
.
Taormina
R.
(
2017
)
BATtle of the Attack Detection ALgorithms (BATADAL) Annual Water Distribution Systems Analysis Symposium
, pp.
1
8
.
Taormina
R.
,
Galelli
S.
,
Tippenhauer
N. O.
,
Salomons
E.
,
Ostfeld
A.
,
Eliades
D. G.
,
Aghashahi
M.
,
Sundararajan
R.
,
Pourahmadi
M.
,
Banks
M. K.
,
Brentan
B. M.
,
Campbell
E.
,
Lima
G.
,
Manzi
D.
,
Ayala-Cabrera
D.
,
Herrera
M.
,
Montalvo
I.
,
Izquierdo
J.
,
Luvizotto
E.
,
Chandy
S. E.
,
Rasekh
A.
,
Barker
Z. A.
,
Campbell
B.
,
Shafiee
M. E.
,
Giacomoni
M.
,
Gatsis
N.
,
Taha
A.
,
Abokifa
A. A.
,
Haddad
K.
,
Lo
C. S.
,
Biswas
P.
,
Pasha
M. F. K.
,
Kc
B.
,
Somasundaram
S. L.
,
Housh
M.
&
Ohar
Z.
(
2018
)
The battle of the attack detection algorithms: disclosing cyber attacks on water distribution networks
,
Journal of Water Resources Planning and Management
,
144 (8), 04018048. doi:10.1061/(ASCE)WR.1943-5452.0000969
.
Taormina
R.
,
Galelli
S.
,
Douglas
H. C.
,
Tippenhauer
N. O.
,
Salomons
E.
&
Ostfeld
A.
(
2019
)
A toolbox for assessing the impacts of cyber-physical attacks on water distribution systems
,
Environmental Modelling and Software
,
112
,
46
51
.
Trivedi
R.
,
Farajtabar
M.
,
Biswal
P.
&
Zha
H.
(
2019
) '
Dyrep: Learning representations over dynamic graphs
’,
International Conference on Learning Representations
.
Tuptuk
N.
,
Hazell
P.
,
Watson
J.
&
Hailes
S.
(
2021
)
A systematic review of the state of cyber-security in water systems
,
Water
,
13
(
1
),
81
.
CRC Press
.
Velickovic
P.
,
Cucurull
G.
,
Casanova
A.
,
Romero
A.
,
Lio
P.
&
Bengio
Y.
(
2018
) '
Graph Attention Networks
’,
International Conference on Learning Representations
.
Vrachimis
S. G.
,
Eliades
D. G.
,
Taormina
R.
,
Kapelan
Z.
,
Ostfeld
A.
,
Liu
S.
,
Kyriakou
M.
,
Pavlou
P.
,
Qiu
M.
&
Polycarpou
M. M.
(
2022
)
Battle of the leakage detection and isolation methods
,
Journal of Water Resources Planning and Management
,
148
(
12
),
04022068
.
Wang, S. & Yu, P. S. (2022) Graph Neural Networks in Anomaly Detection. In: Wu, L., Cui, P., Pei, J., Zhao, L. (eds) Graph Neural Networks: Foundations, Frontiers, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-16-6054-2_26.
Wangen, G. (2019). Quantifying and Analyzing Information Security Risk from Incident Data. In: Albanese, M., Horne, R., Probst, C. (eds) Graphical Models for Security. GraMSec 2019. Lecture Notes in Computer Science, vol 11720. Springer, Cham. https://doi.org/10.1007/978-3-030-36537-0_7
.
Yang, L., Chatelain, C. & Adam, S. (2024) Dynamic graph representation learning with neural networks: A survey. IEEE Access, 12, 43460–43484.
Yu
W.
,
Cheng
W.
,
Aggarwal
C. C.
,
Zhang
K.
,
Chen
H.
&
Wang
W.
(
2018
). '
NetWalk: A flexible deep embedding approach for anomaly detection in dynamic networks
’,
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, pp.
2672
2681
.
Zheng
L.
,
Li
Z.
,
Li
J.
,
Li
Z.
&
Gao
J.
(
2019
). '
Addgraph: Anomaly detection in dynamic graph using attention-based temporal GCN
',
IJCAI International Joint Conference on Artificial Intelligence
.
2019-Augus
, pp.
4419
4425
.
Zhu
Y.
,
Xu
W.
,
Zhang
J.
,
Du
Y.
,
Zhang
J.
,
Liu
Q.
,
Yang
C.
&
Wu
S.
(
2021
)
A Survey on Graph Structure Learning: Progress and Opportunities
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data