Abstract
Water cyber-physical systems (CPSs) experience anomalies from cyber-physical attacks as well as conventional physical and operational failures (e.g., pipe leaks/bursts). Rapidly distinguishing the failure event at hand from other possible failure events is therefore necessary to take prompt emergency and recovery actions and, in turn, strengthen the system's resilience. This paper investigated the performance of machine learning classification models – support vector machine (SVM), random forest (RF), and artificial neural networks (ANNs) – in differentiating and identifying failure events that can occur in a water distribution network (WDN). Datasets for model features related to tank water levels, nodal pressure, and water flow of pumps and valves were produced using hydraulic model simulation (WNTR and epanetCPA tools) for the C-Town WDN under pipe leaks/bursts, cyber-attacks, and physical attacks. The evaluation of accuracy, precision, recall, and F1-score for the three models in failure type identification showed that their performance varied with the specific failure types and data noise levels. Based on the findings, this study discussed insights into building a framework consisting of multiple classification models, rather than relying on a single best-performing model, for the reliable classification and identification of failure types in WDNs.
HIGHLIGHTS
Further investigation for anomaly detection machine learning models in identifying a specific failure type is needed for WDN resilience.
Machine learning models showed reliable failure identification performance.
The models’ performance varied with the failure types and data noise levels.
The models produced misclassification between different failure events that produced similar hydraulic responses.
Insights into a framework with multiple classification models were discussed to improve the reliable failure identification of WDNs.
INTRODUCTION
Numerous smart meters, sensors, and data acquisition systems are being used to monitor and autonomously control WDNs (water distribution networks) owing to the technological advances of the past couple of decades, particularly the development of affordable sensors and universal internet access (Wu et al. 2022). The smart system deploys Information and Communication Technologies (ICTs), which have received attention as a means to achieve efficiency, sustainability, livability, and resilience goals in urban water management (Mutchek & Williams 2014). ICT employs a complex architecture including sensors, communication, programmable logic controllers (PLCs), actuators, remote terminal units (RTUs), and data and control servers – called a supervisory control and data acquisition (SCADA) system (Nazir et al. 2021). Transforming conventional WDNs into water CPSs (Cyber-Physical Systems) with SCADA has supported real-time monitoring, data collection, and remote control to improve the system's operational efficiency and capabilities related to rapid, accurate failure detection and timely recovery actions, which, in turn, strengthens system resilience (Walsby 2013; Mutchek & Williams 2014; Shin et al. 2018).
However, the water CPSs have become more susceptible to cyber and physical attacks. Water infrastructure, such as wastewater treatment facilities and WDNs, is one of the most targeted by cybercriminals since it is essential to the sustainable growth of modern society. In 2000, a former employee of the wastewater treatment facility in Maroochy, Australia, changed the pumps' operation by maliciously sending the incorrect command, causing the wastewater to overflow and produce an unpleasant odor (Ramotsoela et al. 2018). Similarly, in Georgia, USA, a drinking water system was physically attacked in 2013; the attacker gained physical access to the system and changed the fluorine and chlorination settings (Do et al. 2017). It is also reported that the water sector had the third largest number of cyber-physical incidents among critical infrastructures (Clark et al. 2017).
Numerous studies have introduced detection models and algorithms for operational anomalies from cyber-physical attacks in WDNs. For example, Housh & Ohar (2018) used a simulation-model-based approach for cyber-physical attack detection for WDNs. Abokifa et al. (2019) used a combination of ANN (artificial neural network) and principal component analysis (PCA) for the real-time detection of attacks in WDNs. Taormina & Galelli (2018) employed deep learning autoencoders (AEs) with a threshold for reconstruction error that can detect cyber-attacks. Tsiami & Makropoulos (2021) introduced the algorithm of graph convolutional neural network considering the temporal and spatial relationships of SCADA data to improve the detection of cyber-physical attack events. Housh et al. (2022) proposed a semi-supervised detection algorithm with dimensionality reduction followed by support vector data description (SVDD), which does not require labeled attack datasets in its training to consider real-world applications. Brentan et al. (2021) introduced a two-step process for cyber-attack detection, which includes a fast Independent Component Analysis algorithm to separate multiple sensors data into the individual component (flow, pressure, and tank water level) and a statistical control algorithm (abrupt change point detection algorithm) to detect changes in control variables due to the attacks. A more detailed description of the approaches can be found in the Battle of the Attack Detection Algorithm (BATADAL) (Taormina et al. 2018).
However, the operational anomalies in the WDNs can be caused by not only cyber-physical failures due to attacks or malfunctions but also conventional failures/disruptions such as significant pipe leaks or bursts. In this regard, the first step to rapidly take emergency or recovery actions against the WDN disruptions – i.e., strengthen the resilience of WDNs – is the rapid identification of failure events that caused operational anomalies (Shin et al. 2018). However, while previous studies have tested their models and algorithms for a specific failure type, they have made few efforts to distinguish and identify a failure type during a WDN disruption from other failure types – which can occur in a WDN or water CPS. For example, the approaches introduced in BATADAL were evaluated for the detection of cyber-attack events only.
For conventional physical failures and disruptions, a common way to detect pipe leaks or bursts is monitoring minimum night flow in district metered areas (DMAs) (Amoatey et al. 2021). DMAs are hydraulically independent sectors of a WDN, which typically have inlet flow meters and pressure sensors to monitor pipe leakage. The analysis of minimum night flow for a DMA using probabilistic approaches or machine learning models, considering minimal human activities during the night, has suggested the effective detection of background pipe leakage or bursts. For example, Głomb et al. (2023) investigated the performance of multiple machine learning anomaly detectors in the rapid and accurate detection of pipe leaks using the data of DMAs' water consumption, inflow, and pressure. In addition, the analysis of acoustic sensor data from a DMA is also used to detect and localize pipe leaks (Xue et al. 2020). Siddique et al. (2023) used an acoustic emission scalogram combined with a deep learning algorithm (convolution neural network) to diagnose pipe conditions.
Similarly, Nam et al. (2019) proposed hybrid PCA and exponentially weighted moving average (EWMA) for the detection and isolation monitoring of the pipe burst. Mashhadi et al. (2021) discussed the use of machine learning algorithms for leak detection and localization in WDNs. Fan et al. (2021) used ANN (supervised) and AE (unsupervised) algorithms for leak detection. Ahmad et al. (2023) used a novel vulnerability index and 1-D convolutional neural network for pipe leak and size detection. Here, the acoustic emission hit feature was used for pipe leak detection. Asghari et al. (2023) employed machine learning-based transient analysis for leak detection, which substitutes complex inefficient optimization algorithms with machine learning models. In this context, further investigation is needed into how well the data-driven algorithms and models perform in identifying a specific failure type from multiple failure types that can occur in water CPSs. The rapid identification of the failure type will help the system manager quickly implement the response and recovery actions to return to the normal operating conditions of WDNs (Shin et al. 2020).
Other infrastructure sectors have investigated the classification of failure events in their systems using data-driven models. Anwar et al. (2015) used different machine learning models to differentiate cyber-attacks from physical faults in a smart electrical grid. Patil et al. (2019) used and compared RF (random forest), SVM (support vector machine), K-nearest neighbor (KNN) and Bagging Tree to classify sensor faults and cyber-attacks in smart buildings. Hashim et al. (2020) used PCA and multiclass SVM for detecting and identifying faults (leakage and equipment malfunction) in nonresidential building water pipes. Nazir et al. (2021) used KNN and SVM for multiclass classification as supervised learning and unsupervised AE for detecting anomalies in IT operations in WDNs. However, to the best of our knowledge, less attention has been paid to distinguishing and identifying failure types for WDNs with CPSs.
Also, the data-driven models in the previous studies were trained, validated, and tested using clean datasets under the assumption of faultless sensor monitoring. Real-world sensor measurements, however, contain faults and noise (El-Zahab & Zayed 2019). These alterations appear either uniformly or unevenly in the dataset. Uniform noise in the dataset makes it challenging to identify anomalies (Abokifa et al. 2019), which leads to misinterpretations of WDN failures. Therefore, it is crucial to assess the model's robustness to outliers brought on by measurement noise, which may not always signify failure and has not yet been fully investigated for WDNs under cyber-physical failures.
Thus, this study evaluates machine learning classification models to differentiate and identify the failure types among cyber-physical attacks and conventional disruptions (pipe leaks/bursts) using datasets including noise, with the following question: can the machine learning classification models that have been used to detect WDN's anomalies from a specific type of failure also differentiate and identify a failure event from other possible failure events? The contributions of this study include providing insights into advancing data-driven models for identifying different failure types in WDNs. This will help rapid emergency and recovery actions to WDN disruptions from cyber-physical attacks and conventional physical failures and, in turn, enhance the WDN's resilience.
METHODS
The process of testing the performance of classification models in WDN failure type identification.
Datasets for WDN failures
Study of the WDN
Characterization of C-town WDN failures
This study considered three types of WDN disruptions – i.e., conventional disruptions (pipe leaks/bursts), cyber-attacks, and physical attacks. Collecting real-world data balanced between normal operational and failure states is a challenge. This is because cyber-physical attacks in WDNs are rare, despite their growing risk, and the resulting lack of failure data (an unbalanced dataset) can affect the performance of failure identification models (Dogo et al. 2020). Thus, as also considered in Taormina et al. (2018), the datasets for 29 variables for failure type identification were created through the simulation of hydraulic models – i.e., Water Network Tool for Resilience (WNTR) and epanetCPA (Klise et al. 2018; Taormina et al. 2019). WNTR is an open-source Python-based model that runs on the EPANET engine. This study used WNTR to generate datasets for conventional disruption scenarios (pipe leaks/bursts) by pressure-driven analysis. WNTR was simulated iteratively, changing the parameters and leak nodes, to obtain the operational data (e.g., nodal pressure, tank water level, water flow, and pump status). The epanetCPA, an object-oriented MATLAB toolbox based on EPANET, was used for simulating both cyber and physical attacks in the C-town WDN. The epanetCPA tool can simulate the interactions between the WDN's physical components (e.g., tanks, pumps, and valves) and cyber components (PLC and SCADA), which provides the flexibility to design the cyber and physical attack scenarios that can occur in WDNs. Datasets consisting of 388 pipe leak, 128 cyber-attack, and 72 physical attack records were created for training the failure identification models.
Conventional disruption scenarios




Cyber-attack scenarios
Cyber-attack scenarios for water CPSs are presented by Taormina et al. (2017, 2019). In this study, each cyber-attack scenario was simulated for a duration of 6 days (144 h). A total of 128 datasets for the cyber-attack scenarios described in Table 1 were generated for training and testing the machine learning models for failure identification. Cyber-attack scenario 1 in Table 1 represents an attack on the communication link between the tank water level sensors and the PLCs. For example, the water level reading of a tank is manipulated to be constantly higher than a threshold level, regardless of its actual condition, which directs the PLC to switch off pumps and close valves. Similarly, the same attack is simulated for the other components (T2:PLC3, T3:PLC4, T4:PLC6, T5:PLC7, and T7:PLC9) for 96, 108, 120, and 132 h.
Cyber-attack scenario specification
No. | Scenario | Attacked components | Duration (h) |
---|---|---|---|
1 | Communication between tank water level and PLC | T1 and PLC2; T2 and PLC3; T3 and PLC4; T4 and PLC6; T5 and PLC7; T7 and PLC9 | 96, 108, 120, 132 |
2 | Modification of control logic of PLC (switches the pump intermittently) | PLC1 and Pump 1 and 2; PLC3 and Pump 4 and 5; PLC3 and Pump 6 and 7; PLC5 and Pump 8; PLC5 and Pump 10 and 11 | 96, 108, 120, 132 |
3 | Denial of Service (DOS): connection link between PLCs | PLC2 and PLC1; PLC4 and PLC3; PLC9 and PLC5; PLC6 and PLC3; PLC7 and PLC5 | 96, 108, 120, 132 |
4 | Replay attacks in Scenario 1 | T1 and PLC2, and SCADA; T2 and PLC3, and SCADA; T3 and PLC4, and SCADA; T4 and PLC6, and SCADA; T5 and PLC7, and SCADA; T7 and PLC9, and SCADA | 96, 108, 120, 132 |
5 | Replay attacks in Scenario 2 | PLC1 and Pump 1 and 2, and SCADA; PLC3 and Pump 4 and 5, and SCADA; PLC3 and Pump 6 and 7, and SCADA; PLC5 and Pump 8, and SCADA; PLC5 and Pump 10 and 11, and SCADA | 96, 108, 120, 132 |
6 | Replay attacks in Scenario 3 | PLC2 and PLC1, and SCADA; PLC4 and PLC3, and SCADA; PLC9 and PLC5, and SCADA; PLC6 and PLC3, and SCADA; PLC7 and PLC5, and SCADA | 96, 108, 120, 132 |
In scenario 2, the control logic of PLCs is manipulated, resulting in intermittent switching on/off of pumps. This attack scenario was also carried out for different components and duration, as shown in Table 1.
Scenario 3 is a Denial-of-Service (DOS) attack that was designed for the PLCs. In this scenario, the PLC fails to receive the updated tank water level data and keeps the pump on. Scenarios 4, 5, and 6 apply a replay attack to scenarios 1, 2, and 3, respectively: they hide the attacks by deliberately replaying data recorded under normal conditions, so that the WDN appears to be operating normally. The cyber-attack scenarios presented above are implemented as attacks on the cyber assets of WDNs. It is also noted that similar conditions to these attack scenarios (including the physical attacks in the following section) can arise from malfunctions or failures in sensors or actuators.
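The count of 128 cyber-attack datasets can be reproduced from Table 1 by crossing each target group with the four listed durations and then doubling for the replay variants. The sketch below is only a bookkeeping illustration: the target labels are shorthand transcriptions of Table 1 (an assumption of this sketch, not identifiers from epanetCPA), and the function name is hypothetical.

```python
from itertools import product

# Durations (h) listed in Table 1.
DURATIONS_H = [96, 108, 120, 132]

# Target groups per base scenario, transcribed from Table 1.
# The string labels are illustrative shorthand, not epanetCPA identifiers.
BASE_SCENARIOS = {
    1: ["T1-PLC2", "T2-PLC3", "T3-PLC4", "T4-PLC6", "T5-PLC7", "T7-PLC9"],
    2: ["PLC1-PU1/2", "PLC3-PU4/5", "PLC3-PU6/7", "PLC5-PU8", "PLC5-PU10/11"],
    3: ["PLC2-PLC1", "PLC4-PLC3", "PLC9-PLC5", "PLC6-PLC3", "PLC7-PLC5"],
}


def enumerate_cyber_runs():
    """List one record per (scenario, target, duration) combination."""
    runs = []
    for number, targets in BASE_SCENARIOS.items():
        for target, duration in product(targets, DURATIONS_H):
            # Base scenario (1-3) without concealment.
            runs.append({"scenario": number, "target": target,
                         "duration_h": duration, "replay": False})
            # Scenarios 4-6 repeat scenarios 1-3 with a replay attack.
            runs.append({"scenario": number + 3, "target": target,
                         "duration_h": duration, "replay": True})
    return runs


runs = enumerate_cyber_runs()
print(len(runs))
```

The same cross product applied to Table 2 (9 pumps, 4 durations, 2 actions) gives the 72 physical attack datasets.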
Physical attack scenarios
This study considered a physical attack as physically breaching the system's control and directly altering the system's operation – e.g., covertly changing the pump status (on/off) against the control rule. Table 2 summarizes the specification of physical attack scenarios. Altogether 72 datasets, each spanning 6 days (144 h) of hourly interval data, were created using epanetCPA in the C-town WDN.
Physical attack scenarios
No. | Scenario | Components | Duration (h) |
---|---|---|---|
1 | Turn on the pump | Pump 1, Pump 2, Pump 4, Pump 5, Pump 6, Pump 7, Pump 8, Pump 10, Pump 11 | 96, 108, 120, 132 |
2 | Turn off the pump | Pump 1, Pump 2, Pump 4, Pump 5, Pump 6, Pump 7, Pump 8, Pump 10, Pump 11 | 96, 108, 120, 132 |
Normal operation scenario
WNTR simulation was carried out with a pressure-driven analysis model for a 6-month period to produce the normal condition dataset (4,320 h). Data of 1-h intervals were used for training the machine learning models for failure identification. The simulation used the C-town WDN with the default base demand and demand patterns.
Generation of data noise
In this study, the training datasets for normal and failure conditions were obtained through simulation of the hydraulic models (WNTR and epanetCPA), which assumes no noise in the sensor data. In practice, however, WDN sensor data contain measurement errors and noise. In this regard, this study additionally tested the machine learning models' performance in failure type identification against different noise levels. Gaussian noise was added to the continuous features of the datasets. For every observation of the clean signal, a noise value was randomly drawn from a zero-mean Gaussian distribution and added to the value obtained through the hydraulic model simulation (Abokifa et al. 2019). The noisy datasets were produced using standard deviations from 0 to 0.3 in steps of 0.05, with 0 denoting entirely clean data and 0.3 denoting the highest noise level.
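The noise procedure described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the sample values, the seed, and the function name are hypothetical, and whether the standard deviation applies to raw or normalized feature values is not stated in the text, so the sketch simply adds noise to whatever values it is given.

```python
import random

# Standard deviations from 0 to 0.3 in steps of 0.05, as described in the text.
NOISE_LEVELS = [round(0.05 * k, 2) for k in range(7)]


def add_gaussian_noise(values, sigma, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added.

    sigma == 0 returns the clean signal unchanged.
    """
    if sigma == 0:
        return list(values)
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical hourly tank water levels; not taken from the study's data.
clean = [2.1, 2.4, 2.9, 3.3, 3.0]
noisy_by_level = {sigma: add_gaussian_noise(clean, sigma) for sigma in NOISE_LEVELS}

for sigma, series in noisy_by_level.items():
    print(sigma, [round(v, 3) for v in series])
```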
Failure identification models
Identifying and differentiating the type of a failure event among the ones that can occur in a WDN is a multiclass classification problem. In this regard, this study adopted three supervised machine learning models – ANN, SVM, and RF, which have been widely used for WDN anomaly detection (Jain et al. 1996; Breiman 2001; Widodo & Yang 2007).
ANN
The ANN is a supervised machine learning model built as a network of fully connected layers of neurons. Its basic structure comprises the input, hidden, and output layers (Fan et al. 2021). Links connect the nodes of adjacent layers, and the network becomes deeper as hidden layers are added. The ANN model can represent complicated nonlinear relations by increasing the number of neurons and hidden layers, which can yield good accuracy at the expense of high computational requirements and a risk of overfitting (Fan et al. 2021).
In this study, the ANN architecture has one input layer with 29 features that comprise hourly interval pressure data from nodes and tanks, two hidden layers with 76 neurons each, and an output layer providing the probabilistic classification of the normal state and three disrupted states from pipe leaks, cyber-attacks, and physical attacks in the C-town WDN. The rectified linear unit (ReLU) was selected as the activation function in the hidden layers (Agarap 2018). The values in the output layer are scaled using the SoftMax function to reflect the probabilities of normal, cyber-attack, physical attack, and pipe leak/burst events, which sum to 1. Instead of simply dividing each output value by the total, SoftMax applies the exponential function first, which emphasizes higher values and suppresses lower ones. In contrast to linear regression, the SoftMax function accommodates multiple classes, which supports multiclass classification (Qi et al. 2017).
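As a worked illustration of the SoftMax scaling described above, the sketch below converts four output-layer values into class probabilities and compares the result with plain division by the total. The logit values and the class ordering are hypothetical and chosen only to show the effect.

```python
import math

# Class order is an assumption of this sketch.
CLASSES = ["normal", "pipe leak/burst", "cyber-attack", "physical attack"]


def softmax(logits):
    """Exponentiate each value and normalize so the outputs sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 0.5, 3.0, 2.0]  # hypothetical output-layer values
probs = softmax(logits)
linear = [z / sum(logits) for z in logits]  # simple division by the total

predicted = CLASSES[probs.index(max(probs))]
print(predicted)
print([round(p, 3) for p in probs])
print([round(p, 3) for p in linear])
```

With these values the largest output receives a noticeably larger share under SoftMax than under plain normalization, which is the "highlight higher values" behavior the text refers to.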
The ANN model was built using Keras (Kim et al. 2022) Dense layers, with the weights and biases initialized automatically. The Keras Dense layer was selected because of its simplicity and fast iteration, even in complex models. A Sequential model was used, which constructs a feed-forward deep neural network by adding layers one on top of another. The model was compiled with stochastic gradient descent (SGD) optimizer to minimize the loss function, which was set to 0.9. The learning rate was set to 0.01. Training data were normalized with a standard scaler. One-hot encoding was applied, which converts each categorical class label into multiple class columns and assigns a binary value of 0 or 1 to each column. Seventy-five per cent of the data was used for training and 25% for testing, with stratified sampling; a batch size of 48 and 100 epochs were used for training.
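The two preprocessing steps mentioned above, standard scaling and one-hot encoding, can be sketched without any machine learning library. This is an illustrative reimplementation, not the study's code: the class ordering and the sample pressure values are assumptions, and the scaler uses the population standard deviation, which is how scikit-learn's `StandardScaler` behaves.

```python
import statistics

# Class order is an assumption of this sketch.
CLASSES = ["normal", "pipe leak/burst", "cyber-attack", "physical attack"]


def standard_scale(column):
    """Rescale one feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(x - mean) / std for x in column]


def one_hot(label, classes=CLASSES):
    """Encode a class label as a 0/1 vector with a single 1."""
    return [1 if label == c else 0 for c in classes]


pressures = [41.2, 39.8, 40.5, 35.1, 42.0]  # hypothetical nodal pressures
scaled = standard_scale(pressures)
encoded = one_hot("cyber-attack")

print([round(v, 3) for v in scaled])
print(encoded)
```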
SVM
SVM is also a supervised machine learning model that has been mainly used for classification, regression, and outlier detection (Pedregosa et al. 2011). SVM can handle a very large number of features, making it efficient in complex classification tasks (Widodo & Yang 2007). In the SVM model, each data point is represented as a point in an n-dimensional space (where n is the number of features), with each feature's value being the value of a specific coordinate. Classification is then performed by identifying the hyper-plane that best separates the classes. Kernel functions map the original dataset, which may not be linearly separable, into a higher-dimensional space in which it can be separated linearly. The three most commonly used kernels are linear, polynomial, and radial basis function (RBF). SVM kernels involve two main parameters – i.e., the regularization parameter C and the kernel coefficient gamma (γ) (Sunkad & Soujanya 2016). Parameter C balances the misclassification of training samples against the simplicity of the decision surface. A small C smooths the decision surface, whereas a larger C attempts to classify every training sample correctly. The gamma parameter defines how far the influence of a single training sample reaches. The RBF kernel was selected with hyperparameters C and gamma equal to 10 and 0.07, respectively. A combined dataset for the normal and disrupted states from pipe leaks, cyber-attacks, and physical attacks was created and divided into a 75% training set and a 25% test set to train and test the model for failure identification. Stratified sampling was conducted so that each failure class was represented proportionally in the training and test datasets.
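To make the role of gamma concrete, the sketch below evaluates the standard RBF kernel, K(x, y) = exp(−γ‖x − y‖²), with the gamma value reported above. The feature vectors are hypothetical two-dimensional points; the study's actual inputs have 29 features.

```python
import math

GAMMA = 0.07  # kernel coefficient reported in the text


def rbf_kernel(x, y, gamma=GAMMA):
    """Similarity of two feature vectors under the RBF kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical (already scaled) feature vectors.
reference = [0.0, 0.0]
near = [1.0, 1.0]
far = [5.0, 5.0]

print(rbf_kernel(reference, reference))
print(rbf_kernel(reference, near))
print(rbf_kernel(reference, far))
```

The kernel value decays from 1 toward 0 as points move apart; a larger gamma makes that decay faster, so each training sample influences a smaller neighborhood.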
RF
RF is a supervised machine learning model that has been used as both a classifier and a regressor (Breiman 2001). RF generates decision trees by random data sampling, obtains a prediction from each tree, and selects the final result by voting (Breiman 2001). It also provides the importance of each feature for classification and regression, which assists in feature selection (Hasan et al. 2016). RF is one of the popular classification models because of its fast execution, small number of tuning parameters, ability to estimate the generalization error, and applicability to high-dimensional datasets (Cutler et al. 2012). In this study, the number of trees in the forest (n_estimators) was set to 10, and entropy was selected as the criterion to measure the quality of a split. The combined failure dataset was split into 75% training and 25% test sets. Samples were normalized with a standard scaler before training and testing.
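The entropy criterion mentioned above scores a candidate split by how much it reduces class uncertainty (information gain). The sketch below shows that calculation for a single split; the label lists are hypothetical, and a real RF repeats this over many features, thresholds, and bootstrapped trees.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_child_entropy = (len(left) / n) * entropy(left) \
        + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child_entropy


# Hypothetical node containing two failure classes in equal numbers.
parent = ["leak"] * 4 + ["cyber"] * 4

# A split that separates the classes perfectly.
clean_gain = information_gain(parent, ["leak"] * 4, ["cyber"] * 4)

# A split that leaves both children as mixed as the parent.
mixed = ["leak", "leak", "cyber", "cyber"]
useless_gain = information_gain(parent, mixed, mixed)

print(clean_gain, useless_gain)
```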
Evaluation indicators
In general, the performance of data-driven models can be represented using four outcome types: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) (Sokolova & Lapalme 2009). A desirable model detects the failure types without missing failure conditions while keeping false alarms low. In this regard, the performance of the machine learning models in failure type identification was evaluated using the indicators of accuracy, precision, recall, and F1-score derived from the confusion matrix (Sokolova & Lapalme 2009). These indicators are bounded between 0 and 1. The value 0 indicates poor performance, while the value 1 implies the best performance. The mathematical representation of the indicators is provided in Equations (3)–(6).
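The four indicators can be computed per class from a multiclass confusion matrix using their standard definitions: precision = TP/(TP + FP), recall = TP/(TP + FN), F1 = 2·precision·recall/(precision + recall), and accuracy = correct predictions/all predictions. The sketch below assumes rows are true classes and columns are predicted classes; the counts are hypothetical and are not the study's results.

```python
def per_class_metrics(matrix, labels):
    """Return overall accuracy and per-class precision, recall, and F1.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    size = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(size)) - tp  # predicted k, truly other
        fn = sum(matrix[k]) - tp                          # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


labels = ["normal", "leak", "cyber", "physical"]
# Hypothetical counts for illustration only.
matrix = [
    [50, 0, 0, 0],
    [2, 45, 3, 0],
    [0, 0, 28, 4],
    [0, 0, 8, 10],
]

accuracy, report = per_class_metrics(matrix, labels)
print(round(accuracy, 3))
for label in labels:
    print(label, {name: round(v, 2) for name, v in report[label].items()})
```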
RESULTS AND DISCUSSION
Hydraulic response to failure events
Time variation of Tank 2 water levels during normal and disrupted conditions.
It is also noted from Figure 3 that the cyber and physical attack events produced similar hydraulic responses (emptying of the tank) over time, even though they were different attack scenarios. The rapid and significant drop in the tank water level could result either from the cyber-attack manipulating tank water level readings, which leads to the closure of valve 2 and the deactivation of the pumps, or from the physical attack turning off the main pump directly. Based on similar results, Taormina et al. (2017) also noted the importance of identifying the cause or approach of cyber-physical attacks as well as detecting the anomalous conditions of the system due to the attacks. In this context, it is considered that the simulation of hydraulic models (WNTR and epanetCPA) produced appropriate datasets containing both different and similar hydraulic responses of the C-town WDN under disruptive events, which are used to test the failure identification performance of the target machine learning models in the following section.
Failure type identification performance
The confusion matrix for the accuracy of (a) SVM, (b) RF, and (c) ANN in failure identification.
The higher rate of misclassification between cyber-attack and physical attack events is attributed to the similarity in hydraulic response of the WDN from different failure processes between the two types of attacks. In a physical attack scenario designed in this study, the attacker physically gains access to the pumps, runs them continuously, or shuts them down when they are active. Similar results can also be achieved during a cyber-attack via an attack on communication, denial of service, and attack on PLCs. For instance, in a denial-of-service attack, the PLC is unable to receive updated water levels from the tank, causing the pump to run or stop for an extended period. These different scenarios can produce similar hydraulic performance; however, their resolution may require different options to recover the disrupted system. In this regard, the SVM, RF, and ANN models need to be trained and improved further to distinguish and identify the failure types that have similar hydraulic responses from different failure events, especially which require different approaches for emergency and recovery actions to improve WDN resilience.
Table 3 presents the precision, recall, and F1-score of each model based on the confusion matrix (Figure 4). It is seen that all failure types except physical attacks were identified with higher performance scores. The three models showed higher performance in identifying conventional disruptions with pipe leaks, based on the F1 scores, compared to the disruptions from cyber and physical attacks. As described above, this occurred due to the misclassification of the models between cyber and physical attack events, some of which produced similar hydraulic responses in the C-town WDN (Figure 3). The F1-score for cyber-attack identification was 0.87/0.84/0.83 for SVM/RF/ANN models, whereas the F1-score for physical attack identification was 0.72/0.68/0.57, which showed a higher rate of misclassification with physical attack events. Similarly, SVM and RF had accuracy of 0.88 and ANN had accuracy of 0.84 in failure type identification.
Performance value obtained from SVM, RF, and ANN
Failure class | Model | Precision | Recall | F1-score |
---|---|---|---|---|
Normal | SVM | 0.90 | 0.98 | 0.94 |
Normal | RF | 0.93 | 0.99 | 0.96 |
Normal | ANN | 0.70 | 1.00 | 0.82 |
Pipe leaks/bursts | SVM | 0.99 | 0.93 | 0.96 |
Pipe leaks/bursts | RF | 0.99 | 0.97 | 0.98 |
Pipe leaks/bursts | ANN | 1.00 | 0.90 | 0.95 |
Cyber-attack | SVM | 0.82 | 0.92 | 0.87 |
Cyber-attack | RF | 0.82 | 0.85 | 0.84 |
Cyber-attack | ANN | 0.83 | 0.84 | 0.83 |
Physical attack | SVM | 0.86 | 0.62 | 0.72 |
Physical attack | RF | 0.74 | 0.63 | 0.68 |
Physical attack | ANN | 0.98 | 0.41 | 0.57 |
Overall, the evaluation of the model's performance with accuracy, precision, recall, and F1-score suggested that SVM and RF models had superior performance in distinguishing and identifying overall failure types, compared to ANN. However, it should be noted that the three machine learning models had different performances depending on the failure types. It is considered that the different performances are attributed to the impacts of various factors such as models' algorithms (the degree of linearity/nonlinearity between input and output variables), selected features and their scaling, failure event specifications, and training data size (Ahsan et al. 2021; Zhang et al. 2021; Umoh et al. 2022). For example, SVM is more sensitive to feature scaling than RF, while ANN can benefit from scaled features to accelerate convergence (Ahsan et al. 2021). In addition, ANN shows good performance with large training datasets in capturing complex relationships of the features, while RF and SVM are more effective in training limited or smaller datasets (Zhang et al. 2021). In this regard, it would be suggested to couple multiple machine learning models in a single framework to differentiate different failure types, rather than relying on a single best-performing model.
Given the superiority of supervised learning in multiclass classification, the three machine learning models, especially SVM and RF, had reliable performance in identifying the failure types. However, the machine learning model's performance can vary significantly based on the placement of the sensors or selection of the features. The monitoring sensors need to be installed strategically based on feature selection or optimization based on model performance, and the model's parameters must be optimized for global monitoring and detection of diverse failures in WDN.
In addition, a real-world challenge in training and testing machine learning or other data-driven models is obtaining cyber-physical failure datasets that are balanced against the datasets of normal operating conditions, which can affect the failure identification performance of the models in this study (Fan et al. 2021). Failure events due to cyber-physical attacks in WDNs occur rarely compared to periods of normal operation and other conventional disruptive events (e.g., pipe leaks). Thus, imbalanced datasets of system performance under normal and disrupted conditions can impair the classification performance of the machine learning models in a real WDN. In this regard, incorporating synthetic data produced using hydraulic simulation models (e.g., WNTR and epanetCPA) into imbalanced datasets can be a way to improve the performance of the machine learning models in failure type identification.
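As a rough illustration of the balancing step, the sketch below counts how many synthetic records per class would be needed to bring each minority class up to a chosen share of the majority class. The function `synthetic_top_up`, the parameter `target_ratio`, and the class counts are all hypothetical; the synthetic records themselves would come from hydraulic simulation runs and are not generated here.

```python
from collections import Counter


def synthetic_top_up(labels, target_ratio=1.0):
    """Return {class: number of synthetic samples to add}.

    Each class is topped up to target_ratio times the size of the largest
    class. Classes already at or above that size need no additions.
    """
    counts = Counter(labels)
    target = int(round(target_ratio * max(counts.values())))
    return {c: max(0, target - n) for c, n in counts.items()}


# Hypothetical monitoring record: attacks are far rarer than normal operation.
labels = (["normal"] * 900 + ["leak"] * 80
          + ["cyber"] * 15 + ["physical"] * 5)
needed = synthetic_top_up(labels, target_ratio=0.5)
```

Here each failure class would be topped up to 450 records, half the size of the normal class. The appropriate ratio is a modelling choice and would need to be checked against held-out real data.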
Failure identification under data noise
Figure 5. Effect of data noise on models' performance in failure type identification.
As seen in Figure 5, SVM showed relatively higher accuracy than RF and ANN across the data noise levels overall. However, its performance dropped consistently with increasing data noise, and its accuracy eventually fell below that of ANN, which was the least sensitive of the three models to the data noise level. Considering the trend in RF performance in Figure 5, RF was expected to be less sensitive to data noise at high noise levels than SVM and ANN. This implies that the three models can perform differently in failure type identification depending on the noise level of the training data. In practice, sensing data contains noise at various levels from sources such as sensor ageing, malfunction, and miscalibration; communication disruptions; traffic; and human error (Rousso et al. 2023). As observed in Figure 5, the best-performing model can vary with the level of data noise. Therefore, it is suggested not to rely on a single best-performing model for distinguishing and identifying a failure event, but rather to integrate the results from multiple models for more reliable failure type identification across different levels of data noise.
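A noise sensitivity test of this kind can be set up by perturbing the feature values before training or testing. The sketch below assumes zero-mean Gaussian noise whose standard deviation is a fraction of each feature's own standard deviation; this noise model, the function `add_gaussian_noise`, and the sample rows are assumptions made for illustration and may differ from the procedure used in this study.

```python
import random
import statistics


def add_gaussian_noise(rows, noise_level, seed=0):
    """Return a noisy copy of rows (a list of equal-length feature lists).

    noise_level is the noise standard deviation expressed as a fraction of
    each feature's population standard deviation (assumed noise model).
    """
    rng = random.Random(seed)  # fixed seed keeps experiments repeatable
    columns = list(zip(*rows))
    sigmas = [statistics.pstdev(col) for col in columns]
    return [
        [value + rng.gauss(0.0, noise_level * sigma)
         for value, sigma in zip(row, sigmas)]
        for row in rows
    ]


# Hypothetical records: tank level (m), nodal pressure (m), pump flow (L/s).
rows = [
    [3.2, 41.0, 55.1],
    [3.0, 40.2, 54.8],
    [2.7, 38.9, 60.3],
    [2.9, 39.5, 58.0],
]
noisy = add_gaussian_noise(rows, noise_level=0.1, seed=1)
```

Repeating the training and testing of each model over a range of `noise_level` values would give an accuracy-versus-noise curve per model, which is the kind of comparison summarized in Figure 5.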
CONCLUSIONS
Resilience-based strategies that aim to minimize disruptions from a failure event and enable rapid recovery have received great attention in recent years for designing, operating, and managing critical infrastructure, including WDNs. In this regard, a smart systems approach with ICT-based sensors and controllers has been adopted in such infrastructure to manage disruptive events effectively and resiliently. However, despite its potential benefits, the smart systems approach leaves infrastructure more exposed to subtle cyber threats. This can increase the complexity of detecting and identifying failure types and further exacerbate the vulnerability to both cyber and conventional failure events, raising concerns about infrastructure services. Thus, the first reactive step in securing the resilience of smart infrastructure systems is the rapid detection and identification of a failure event, followed by emergency and recovery actions. Anomaly detection and localization are also critical steps in responding to a disruptive event with emergency actions. However, when a failure event occurs, the challenge is how to distinguish the actual failure event from the other potential failure events that could occur in the infrastructure system.
In this regard, this study investigated the performance of three supervised machine learning models – SVM, RF, and ANN – in identifying failure types among cyber-physical attacks and conventional physical disruptions (pipe leaks/bursts). The models were trained and tested using datasets of 29 features related to tank water levels, nodal pressure, and flow status of pumps and valves under three types of WDN failures, i.e., pipe leaks/bursts, cyber-attacks, and physical attacks. Overall, the three models showed reliable performance in identifying the failure types. However, their performance varied with the specific failure type, and no single model was consistently superior for all failure types. In addition, testing the three models with data noise showed a decrease in their failure type identification performance, and the extent of that decrease differed among the classification models and the levels of data noise. Thus, the use of multiple classification models, rather than reliance on a single best-performing model, is recommended to improve the capability of WDNs to distinguish and identify a failure event among different potential failure events.
In addition, the classification models produced a higher rate of misclassification between cyber-attack and physical attack events, owing to the similar hydraulic responses of the C-Town WDN to these different failure events. Thus, a failure type identification framework with multiple classification models needs to be designed to distinguish failure events that produce similar hydraulic responses but require different emergency and recovery options during system disruptions. These results offer insights into building a data-driven analytics framework for the reliable classification and identification of failure types in real-world WDNs.
The findings of this study will contribute to improving the capability of WDNs to rapidly and reliably differentiate and identify failure types and, in turn, to find adequate emergency and recovery options for each failure event. The findings and insights of this study may also be extended to distinguishing and identifying contamination events (e.g., malicious contaminant injection or accidental contamination intrusion). However, applying machine learning classification models for reliable failure identification raises the following challenges for future work. First, the SVM, RF, and ANN models produced misclassification between cyber-attack and physical attack events, owing to the similar hydraulic responses of the C-Town WDN to different failure types, which can require different approaches to emergency and recovery options. Thus, further studies are needed on data-driven models that characterize and differentiate failure types producing similar hydraulic responses.
Second, this study considered three supervised classification models (SVM, RF, and ANN) and three failure types (conventional pipe leaks/bursts, cyber-attacks, and physical attacks). More diverse data-driven (machine learning and deep learning) models can be tested to identify various failure types, including attacks that maliciously open a fire hydrant, contamination, and mechanical failures, under various specific failure scenarios. In addition, the failure identification performance of the models can be examined further for failures of different sizes, locations (e.g., proximity to critical storage tanks or reservoirs), severity, and timing, and for various types of conventional disruptive events and operational failures caused not only by cyber-physical attacks but also by malfunctions/errors in the cyber and physical assets of WDNs.
Third, the SVM, RF, and ANN models in this study showed reliable performance in the presence of data noise. However, their performance varied as the level of data noise increased. Thus, further investigation of their performance using real-world datasets (e.g., with missing data or poor sensor data quality) is suggested, which can increase their practical applicability.
Fourth, the water sector is reported as the third most frequently targeted by cyber-physical attacks among critical infrastructure systems. Considering its interconnection with sectors such as energy/power systems (which have the largest number of incidents), the vulnerability of water systems to cyber-physical attacks is relatively high. Therefore, future studies can be directed toward understanding cascading failures between interdependent infrastructure systems due to cyber-physical attacks.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.