This paper addresses the problem of fault diagnosis in potable water supply networks and proposes two different fault diagnosis approaches to deal with it. The first is a model-based approach exploiting a priori information regarding the physical/temporal relations existing among the measured variables in the monitored system. It provides fault detection and isolation capabilities by means of the residuals generated from these measured variables and their estimations. This a priori information is provided by the topology and the physical relations between the elements constituting the system. Alternatively, the second approach relies on a data-driven solution that exploits the spatial and temporal relationships present in the acquired data streams in order to detect and isolate faults. Relationships between data streams are modelled using sequences of linear dynamic time-invariant models, whose estimated coefficients are used to feed a hidden Markov model. Afterwards, a cognitive method based on a functional graph representation of the system isolates the fault, if one is present. Finally, the performance of the two approaches is compared using the Barcelona water supply network. The results are successful and complementary, which suggests that integrating both approaches could improve on the results achieved by each one separately.
INTRODUCTION
Water networks are complex large-scale systems that need sophisticated supervisory and control schemes in order to maintain a certain level of performance when faults occur. To deal with this problem, a fault detection and isolation (FDI) system capable of detecting and isolating these faults (or events) is highly desirable, since it helps the operators to identify the actual event occurring in the water network. The FDI problem applied to water networks has been extensively studied from various perspectives and at different levels (see, e.g., Lees 2000; Colombo & Karney 2002; Misiunas et al. 2006). On the one hand, at the district metered area (DMA) level, many FDI approaches have addressed the problem of leak/burst detection and isolation (see, e.g., Mounce et al. 2009, 2011; Mounce & Boxall 2010; Wu et al. 2010; Bicik et al. 2011; Palau et al. 2011; Perez et al. 2011, 2014; Xia et al. 2011; Romano et al. 2013, 2014a, 2014b; Sanchez-Fernandez et al. 2015; Veldman de Roo et al. 2015). Sensor data validation and reconstruction exploiting the temporal redundancy of the sensor measurements has also been addressed in several works (see, e.g., Prescott & Ulanicki 2001; Filion et al. 2007; Quevedo et al. 2010; Eliades & Polycarpou 2012; Farley et al. 2012; Cugueró-Escofet et al. 2016). The problem of water quality monitoring concerning contamination event detection has also been extensively addressed (see, e.g., Eliades & Polycarpou 2010). Moreover, regarding DMA monitoring (either for leaks or water quality), there is a related problem involving optimal sensor placement in order to maximize the performance of the FDI algorithms applied. This problem has been studied separately both for leak detection and location (see, e.g., Ostfeld & Salomons 2004; Krause et al. 2008; Pérez et al. 2010; Wu 2011; Casillas et al. 2013; Cugueró-Escofet et al. 2015) and for quality monitoring (see, e.g., Eliades & Polycarpou 2010).
On the other hand, at the water supply level (i.e., the network connecting the water potabilization plants with the water distribution tanks) less research has been carried out (see, e.g., Ragot & Maquin 2006; Quevedo et al. 2014). Water supply networks, also referred to as trunk main systems, are regional networks used to supply water to the cities and villages of a certain region. This kind of network can be analysed using a flow-driven model, that is, using linear mass balance relations, unlike water distribution networks, which are generally modelled using pressure-driven models involving non-linear, non-explicit relations. The use of mass balance relations for modelling regional water supply networks is appropriate because an actuator is typically installed in each pipe, which establishes its flow. The energy balances could also be formulated in this case, but this would add complexity that is not needed, since the goal is to establish analytical redundancy relations (ARRs) between flow sensors. This is the case, e.g., in Quevedo et al. (2014), where the problem of sensor data validation and reconstruction (which is addressed for DMA networks in Quevedo et al. (2010) exploiting the temporal redundancy of the sensor measurements) has been extended to water supply networks, considering combined temporal/spatial redundancy models. In Ragot & Maquin (2006), a model-based FDI approach is applied to the water network of Nancy, a city in the north-eastern French department of Meurthe-et-Moselle, in order to detect faults in the sensors.
The present paper also moves towards the FDI application to a water supply network by proposing two different fault diagnosis approaches: a model-based approach using a priori information of the system, i.e., the physical relation between its elements, and a data-driven approach, which is able to exploit a priori information about the network topology to perform fault diagnosis but does not require any additional information about the physical models of the water supply network. According to the literature, model-based approaches rely on the concept of analytical redundancy (Blanke et al. 2006), which is based on the use of software sensors, i.e., models using available sensor historic records in order to estimate the desired sensor measurement, as an alternative to hardware-based approaches, which rely on the use of extra hardware sensors. Although hardware redundancy is desirable in critical elements, the use of the latter in large-scale water networks may be dramatically expensive because of the installation, calibration and maintenance actions to be performed on the system when considering this approach.
The fault diagnosis problem in critical infrastructure systems, such as potable water supply networks (PWSNs), involves answering some questions common to general fault diagnosis problems, such as whether there is a fault affecting the system (fault detection stage) or which is the actual faulty element in this system (fault isolation stage). Sometimes it is also important to know the magnitude of the fault in order to decide its importance and the corresponding actions to be taken. The novelty of this paper is not only to compare two well-accepted and promising general purpose fault diagnosis methods (one model-based, the other data-based), but also to determine the main features of each method and the best way to combine them in order to optimize the overall performance at both the fault detection and fault isolation stages when considering PWSNs. Specifically, the Barcelona PWSN is used as the case study in this work. In ideal situations, the use of a model obtained from the physical relations, as considered in the first approach, should lead to the optimal solution. However, analytical models may be affected by several practical issues, such as the potential uncertainty on the model parameters (e.g., actual tank surface), the difficulty of keeping a well-calibrated on-line model due to frequent network topology changes (caused by, e.g., new elements like tanks added or blocked pipes resulting from maintenance operations) and common changes in the consumers’ demand behaviour, which are hard to determine in real-time operation. Hence, a data-based approach, as suggested in the second method, is also a useful and effective alternative to the use of analytical models obtained from the physical/temporal relations existing in the network.
The structure of the paper is as follows: the next section presents both the FDI model-based method combining both spatial and time series (TS) models, and the data-based approach based on a cognitive fault diagnosis system (CFDS) method exploiting hidden Markov models (HMMs). The case study, based on the Barcelona PWSN, is presented next. This is followed by a section in which fault isolation results obtained by each methodology are presented, compared and discussed. Finally, conclusions and ongoing work are outlined.
FAULT DIAGNOSIS METHODOLOGY
The methodologies presented here aim to detect and isolate faults of different kinds appearing in PWSNs, as discussed later in the section ‘Fault scenarios’. These may well represent actual common hydraulic faults that jeopardize the performance of water networks, e.g., leaks, bursts or sensor communication faults, as further detailed in the same section. Generally, in order to apply these methodologies, the set of faults to be addressed should be defined beforehand. This allows the generation of the set of relations or data-based models able to detect and isolate the specified faults.
Method I: fault diagnosis based on PTPR
Residual generation
Spatial consistency residuals
TS residuals
FDI scheme
Method II: fault diagnosis system based on the cognitive approach
The considered CFDS method is based on the ability to characterize the functional dependencies among the streams of acquired data, where each functional dependency models the temporal and spatial relationships between a pair of data streams. The main characteristics of the CFDS are the ability to work without any a priori information about the physical models of the system and the possibility of isolating the potential faults by exploiting a functional graph representation of the system. Details about the considered CFDS can be found in Alippi et al. (2013). In this application scenario, we considered the possibility of including the a priori information about the topology of the water supply network. In fact, the physical phenomenon of the water flow induces a causality among the respective acquired data streams, allowing those relationships in which this causality principle does not hold to be discarded.
Functional relationships in G are modelled either by a linear time-invariant (LTI) dynamic system or by a sequence of LTI dynamic systems following the HMM hypothesis (i.e., the Markov memoryless property of stochastic processes). Among the wide range of LTI dynamic systems, we focus on single-input single-output (SISO) models such as autoregressive with exogenous input (ARX) models, autoregressive moving average with exogenous input (ARMAX) models or output error (OE) models (Ljung 1999) in their predictive form, i.e., parametrized in $\theta$. Here, $\theta$ represents the parameter vector of the considered predictive models, while p represents the cardinality of $\theta$.
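As an illustration of how the parameter vector of one such SISO relationship could be extracted, the following minimal sketch estimates an ARX model by ordinary least squares on consecutive batches of data. It is not the implementation of Alippi et al. (2013): the function names `fit_arx` and `theta_sequence`, the model orders and the batch length are assumptions introduced here for illustration only.

```python
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Least-squares estimate of a SISO ARX(na, nb) one-step predictor.

    Assumed model form:
        y[k] = -a1*y[k-1] - ... - a_na*y[k-na] + b1*u[k-1] + ... + b_nb*u[k-nb]
    Returns the parameter vector [a1..a_na, b1..b_nb] (cardinality na + nb).
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    n0 = max(na, nb)
    rows = []
    for k in range(n0, len(y)):
        past_y = [-y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)
    phi = np.array(rows)  # regression matrix, one row per predicted sample
    theta, *_ = np.linalg.lstsq(phi, y[n0:], rcond=None)
    return theta

def theta_sequence(u, y, batch, na=2, nb=2):
    """Estimate one parameter vector per consecutive, non-overlapping batch."""
    return np.array([fit_arx(u[s:s + batch], y[s:s + batch], na, nb)
                     for s in range(0, len(y) - batch + 1, batch)])

# Synthetic, noise-free first-order system used only to exercise the code.
rng = np.random.default_rng(0)
u = rng.normal(size=600)
y = np.zeros(600)
for k in range(1, 600):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1]

theta = fit_arx(u, y, na=1, nb=1)
thetas = theta_sequence(u, y, batch=200, na=1, nb=1)
```

In a scheme of this kind, a sequence such as `thetas` would be the input to the change detection stage, so that a change in the relationship between two streams shows up as a change in the distribution of the estimated parameters.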
Once a change is detected by one of the HMM-based change detection tests (CDTs), the cognitive fault diagnosis layer is activated to isolate the fault within the system. The basic idea of this cognitive isolation mechanism is as follows: when a fault affects a sensor, all the relationships connected to that sensor should be affected by this change. Hence, by looking at the likelihood of all the relationships in V, we are able to identify the sensor of the system that has been affected by the fault, and thus to isolate it. Details about the cognitive level presented here can also be found in Algorithm 2 of Alippi et al. (2013).
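The isolation idea described above can be sketched in a few lines. This is a deliberate simplification and not Algorithm 2 of Alippi et al. (2013): it replaces the likelihood analysis with a binary changed/unchanged flag per relationship, and the function name `isolate_sensor` and the example relationships are hypothetical.

```python
def isolate_sensor(relationships, changed):
    """Return the sensors whose incident relationships have all changed.

    relationships: iterable of (sensor_a, sensor_b) pairs in the dependency graph.
    changed: iterable of pairs flagged as changed by the detection stage.
    """
    rels = [frozenset(r) for r in relationships]
    changed = {frozenset(r) for r in changed}
    sensors = sorted(set().union(*rels)) if rels else []
    candidates = []
    for s in sensors:
        incident = [r for r in rels if s in r]
        # A faulty sensor is expected to disturb every relationship it takes part in.
        if incident and all(r in changed for r in incident):
            candidates.append(s)
    return candidates

# Hypothetical graph over three streams of the Orioles subsystem.
relationships = [("iOrioles", "xd175LOR"),
                 ("c175LOR", "xd175LOR"),
                 ("iOrioles", "c175LOR")]
changed = [("iOrioles", "xd175LOR"), ("iOrioles", "c175LOR")]
suspects = isolate_sensor(relationships, changed)
```

Here only `iOrioles` has all of its relationships flagged, so it is the single candidate. When several sensors satisfy the condition the sketch returns all of them, and a likelihood-based ranking would be needed to choose among them.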
CASE STUDY
Barcelona PWSN
Tanks: d175LOR, d200CGY, d268CGY, d361CGY
Actuators with flow sensors: iOrioles, iCanGuey1d2, iCanGuey2, iCanGuey3
Demands with flow sensors: c175LOR, c200CGY, c268CGY, c361CGY
Level sensors: xd175LOR, xd200CGY, xd268CGY, xd361CGY
Residual definition
In Figures 8 and 9, $q_{in}$, $q_{out}$ and $y$ are the incoming tank flow, consumer demand and tank level, respectively, and $q_{inm}$, $q_{outm}$ and $y_m$ are the corresponding measured values. The corresponding discrete-time model equations, including the considered faults, are as follows:
where y(k) is the actual tank level, is the measured tank level, is the actual demand flow, is the measured demand flow, is the actual input tank flow, is the set-point pump flow, is the measured input flow, is the multiplicative fault signal component related to element c, is the additive fault signal component related to element c, with time profile and behaviour, is the sampling time and A is the cylindrical tank surface.
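To make the role of the tank surface and the sampling time concrete, the following sketch computes a spatial consistency residual for a single cylindrical tank. The exact model equations of the paper are not reproduced in this text, so the sketch assumes the standard discrete-time mass balance, level(k+1) = level(k) + (dt / area) · (inflow(k) − outflow(k)). The function name, the variable names and the numerical values are illustrative assumptions.

```python
import numpy as np

def spatial_residual(y_m, q_in_m, q_out_m, dt, area):
    """Residual between measured level and its mass-balance prediction.

    Assumed model: y(k+1) = y(k) + dt / area * (q_in(k) - q_out(k)).
    Entry i of the result compares y_m[i + 1] with the prediction from sample i.
    """
    y_m = np.asarray(y_m, dtype=float)
    q_in_m = np.asarray(q_in_m, dtype=float)
    q_out_m = np.asarray(q_out_m, dtype=float)
    predicted = y_m[:-1] + dt / area * (q_in_m[:-1] - q_out_m[:-1])
    return y_m[1:] - predicted

# Illustrative values: 100 m^2 tank, 1 h sampling, constant flows in m^3/s.
dt, area, n = 3600.0, 100.0, 10
q_in = np.full(n, 0.02)
q_out = np.full(n, 0.01)
y = 1.0 + dt / area * np.cumsum(q_in - q_out)
y = np.concatenate(([1.0], y[:-1]))  # level at sample k, starting from 1 m

r_ok = spatial_residual(y, q_in, q_out, dt, area)

# Additive offset on the measured input flow from sample 5 onwards.
q_in_faulty = q_in.copy()
q_in_faulty[5:] += 0.005
r_fault = spatial_residual(y, q_in_faulty, q_out, dt, area)
```

In the fault-free case the residual stays at zero, while the flow sensor offset shifts it by −dt/area × 0.005 = −0.18 m from the faulty sample onwards. With measurement noise, the residual would instead be compared against a detection threshold.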
Furthermore, for each input and output with periodic behaviour of the i-th tank subsystem, a TS HW model can be derived and the following ARRs may be obtained:
From residuals (21), (26)–(28) and Equations (17)–(20), the theoretical fault signature matrix (FSM) for the subsystems considered (Figures 8 and 9) is presented in Table 1. In this matrix, the sensitivity of each residual to each fault is indicated by a 0 (non-sensitive) or a 1 (sensitive) in the corresponding element, so that each column provides the signature of one fault. The ordinal index i is assigned to each tank subsystem as follows: i = 1 for d175LOR, i = 2 for d200CGY, i = 3 for d268CGY and i = 4 for d361CGY. Moreover, the spatial consistency residuals are used for fault detection here, since the spatial consistency residual of each tank subsystem i is sensitive to all the faults considered within that subsystem, while the TS residuals are employed for fault isolation purposes.
. | fym . | fqoutm . | fqinm . | fp . | fym1 . | fqoutm1 . | fqinm1 . | fp1 . | fym2 . | fqoutm1 . | fqinm2 . | fp2 . | fym3 . | fqoutm3 . | fqinm3 . | fp3 . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
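Isolation with a fault signature matrix can be sketched as a column lookup: the binary pattern of activated residuals is compared with each column of the matrix. The example below uses the 4 × 4 block of Table 1 that corresponds to a single tank subsystem. The function name `isolate` and the exact-match rule are assumptions made for illustration, and the FDI scheme of the paper may apply a different matching criterion.

```python
import numpy as np

# Single-subsystem block of Table 1: rows are residuals, columns are faults.
FAULTS = ["fym", "fqoutm", "fqinm", "fp"]
FSM = np.array([
    [1, 1, 1, 1],   # spatial consistency residual
    [1, 0, 0, 1],   # TS residuals (three rows)
    [0, 1, 0, 0],
    [0, 0, 1, 1],
])

def isolate(signature):
    """Return the faults whose FSM column equals the observed signature."""
    sig = np.asarray(signature)
    if not sig.any():
        return []  # no residual activated: nothing detected
    return [f for j, f in enumerate(FAULTS) if np.array_equal(FSM[:, j], sig)]

demand_sensor = isolate([1, 0, 1, 0])
pump = isolate([1, 1, 0, 1])
unknown = isolate([0, 1, 1, 0])
```

Because the four columns of this block are all different, each observed signature that matches a column points to exactly one fault. A signature that matches no column, such as the last one, is left unexplained by this simple rule.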
Fault scenarios
The Barcelona PWSN simulator allows the introduction of faults of different kinds in distinct elements of this water network. Here, faults of freezing, offset and drift nature are considered:
Freezing: and for
Offset: and for
Drift: and for
Moreover, the faults considered are either of abrupt or incipient nature, as defined by their time profile as follows:
where is the constant characterizing the evolution of the corresponding fault and is the time instant when the fault occurs. The faults presented in this section are meant to be generic, but may well represent actual common hydraulic faults occurring in water networks, e.g., leaks (which may be represented by offset/drift abrupt/incipient faults), bursts (which may be represented by offset abrupt faults) or sensor communication faults (which may be represented by freezing abrupt faults).
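The fault types and time profiles described in this section can be emulated on a measured signal as follows. The defining equations are not reproduced in this text, so the forms used here are assumptions chosen to match the verbal description: a step profile for abrupt faults, a saturating exponential with rate `alpha` for incipient faults, a constant bias for offset, a linearly growing bias for drift, and a signal held at its value at the fault instant for freezing. All names are illustrative.

```python
import numpy as np

def time_profile(n, k0, kind="abrupt", alpha=0.1):
    """Fault time profile in [0, 1); it is zero before the fault instant k0."""
    k = np.arange(n)
    if kind == "abrupt":
        return (k >= k0).astype(float)
    if kind == "incipient":
        # Assumed form: grows smoothly from 0 towards 1 after k0.
        return 1.0 - np.exp(-alpha * np.maximum(k - k0, 0))
    raise ValueError(f"unknown profile: {kind}")

def inject(x, k0, fault, magnitude=0.0, kind="abrupt", alpha=0.1):
    """Return a copy of signal x affected by the given fault from sample k0."""
    x = np.asarray(x, dtype=float)
    k = np.arange(len(x))
    beta = time_profile(len(x), k0, kind, alpha)
    if fault == "offset":
        return x + beta * magnitude
    if fault == "drift":
        return x + beta * magnitude * np.maximum(k - k0, 0)
    if fault == "freezing":
        return (1.0 - beta) * x + beta * x[k0]
    raise ValueError(f"unknown fault: {fault}")

x = np.arange(10.0)
offset = inject(x, 4, "offset", magnitude=2.0)
frozen = inject(x, 4, "freezing")
drift = inject(x, 4, "drift", magnitude=0.5)
slow = time_profile(10, 4, "incipient", alpha=0.5)
```

With an abrupt profile the freezing fault holds the signal at `x[4]` from sample 4 onwards, whereas an incipient profile blends gradually from the true signal to the frozen value.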
Different fault scenarios are defined in order to test and compare the methods presented here, all including random normally distributed measurement noise of full scale. The dataset considered to implement these fault scenarios lasts for seven months, with a sampling period T = 1 hour and a fault appearing at in different elements of the Barcelona PWSN subsystems considered (Figures 8 and 9, respectively):
iOrioles pump sensor in Orioles subsystem
c175LOR demand sensor in Orioles subsystem
iCanGuey1d2 pump sensor in Can Guey subsystem
iCanGuey2 pump sensor in Can Guey subsystem
iCanGuey3 pump sensor in Can Guey subsystem
c268CGY demand sensor in Can Guey subsystem
c361CGY demand sensor in Can Guey subsystem
The way these faults affect the hydraulic network components is represented in the models (19) and (20). These fault scenarios are part of a fault benchmark used in the framework of the Seventh Framework Program European project iSense (ref. FP7-ICT-2009-6) as a collaborative dataset provided by the Polytechnic University of Catalonia to be used by all the partners involved. The parametrization of the faults involved in this benchmark is depicted in Table 2.
Id. | Type of fault | Magnitude | PTPR detection delay [# of samples] | PTPR isolation delay [# of samples] | PTPR FP [%] | PTPR FN [%] | PTPR Iso. [%] | CFDS detection delay [# of samples] | CFDS isolation delay [# of samples] | CFDS FP [%] | CFDS FN [%] | CFDS Iso. [%] |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Offset abr. iOrioles | 10% MFD | 2 | 4 | 0 | 7.34 | 22.22 | 4 | 4 | 7.44 | 94.52 | 5.48 |
2 | Offset abr. iOrioles | 25% MFD | 2 | 2 | 0 | 2.37 | 79.16 | 3 | 3 | 11.57 | 4.11 | 95.89 |
3 | Offset inc. iOrioles | 10% MFD | 12 | 23 | 0 | 3.04 | 4.16 | 35 | 35 | 12.40 | 82.19 | 17.81 |
4 | Offset inc. iOrioles | 25% MFD | 9 | 13 | 0 | 4.53 | 52.77 | 16 | 16 | 0.00 | 27.40 | 72.60 |
5 | Drift abr. iOrioles | 1% MFD | 9 | 13 | 0 | 2.71 | 73.61 | 8 | 8 | 13.22 | 10.96 | 89.04 |
6 | Drift abr. iOrioles | 10% MFD | 3 | 3 | 0 | 2.78 | 84.72 | 4 | 4 | 1.65 | 5.48 | 94.52 |
7 | Drift inc. iOrioles | 1% MFD | 12 | 23 | 0 | 3.54 | 59.72 | 18 | 18 | 9.92 | 24.66 | 75.34 |
8 | Drift inc. iOrioles | 10% MFD | 7 | 7 | 0 | 4.09 | 77.77 | 4 | 4 | 8.26 | 8.22 | 91.78 |
9 | Offset abr. c175LOR | 10% MFD | 1 | 3 | 0 | 0.02 | 65.27 | 3 | 3 | 0.00 | 4.11 | 95.89 |
10 | Offset abr. c175LOR | 25% MFD | 1 | 3 | 0 | 0.02 | 65.27 | 1 | 1 | 0.00 | 1.37 | 98.63 |
11 | Offset inc. c175LOR | 10% MFD | 10 | 21 | 0 | 0.29 | 34.72 | 47 | 47 | 0.00 | 64.38 | 35.62 |
12 | Offset inc. c175LOR | 25% MFD | 7 | 11 | 0 | 0.13 | 58.33 | 65 | 65 | 0.00 | 89.04 | 10.96 |
13 | Drift abr. c175LOR | 1% MFD | 7 | 7 | 0 | 0.11 | 59.72 | 33 | 33 | 0.00 | 45.21 | 54.79 |
14 | Drift abr. c175LOR | 10% MFD | 2 | 4 | 0 | 0.04 | 63.88 | 4 | 4 | 0.00 | 5.48 | 94.52 |
15 | Drift inc. c175LOR | 1% MFD | 10 | 15 | 0 | 0.25 | 52.77 | 39 | 39 | 0.00 | 53.42 | 46.58 |
16 | Drift inc. c175LOR | 10% MFD | 5 | 7 | 0 | 0.11 | 59.72 | 8 | 8 | 38.02 | 10.96 | 89.04 |
17 | Freezing abr. iOrioles | – | 7 | 12 | 0 | 7.22 | 8.33 | 17 | 17 | 10.74 | 68.49 | 31.51 |
18 | Freezing inc. iOrioles | – | 19 | 36 | 0 | 7.52 | 5.55 | 3 | 3 | 22.31 | 73.97 | 26.03 |
19 | Freezing abr. c175LOR | – | 11 | 14 | 0 | 16.79 | 6.94 | 33 | 33 | 0.00 | 45.21 | 54.79 |
20 | Freezing inc. c175LOR | – | 59 | – | 0 | 92.84 | 0 | 58 | 58 | 4.13 | 79.45 | 20.55 |
21 | Offset abr. iCanGuey1d2 | 15% MFD | 2 | 4 | 0 | 1.37 | 39.73 | 21 | 22 | 0 | 30.56 | 69.44 |
22 | Offset inc. iCanGuey1d2 | 15% MFD | 9 | 26 | 0 | 12.33 | 8.22 | 32 | 50 | 0 | 69.44 | 30.56 |
23 | Drift abr. iCanGuey1d2 | 15% MFD | 3 | 4 | 0 | 2.74 | 93.15 | 5 | 11 | 0 | 15.28 | 84.72 |
24 | Freezing abr. iCanGuey1d2 | – | 3 | 8 | 0 | 9.59 | 71.23 | 12 | 21 | 0 | 29.17 | 70.83 |
25 | Offset abr. iCanGuey2 | 15% MFD | 2 | – | 0 | 1.37 | – | 14 | 29 | 0 | 40.28 | 59.72 |
26 | Offset inc. iCanGuey2 | 15% MFD | 8 | 26 | 0 | 12.33 | 2.74 | 32 | 53 | 0 | 73.61 | 26.39 |
27 | Drift abr. iCanGuey2 | 15% MFD | 3 | 4 | 0 | 2.74 | 93.15 | 7 | 9 | 0 | 12.50 | 87.50 |
28 | Freezing abr. iCanGuey2 | – | 3 | 4 | 0 | 5.48 | 67.12 | 11 | 12 | 0 | 16.67 | 83.33 |
29 | Offset abr. c361CGY | 15% MFD | 2 | 4 | 0 | 1.37 | 93.15 | 3 | 3 | 0 | 4.17 | 95.83 |
30 | Offset inc. c361CGY | 15% MFD | 9 | 11 | 0 | 10.96 | 82.19 | 10 | 12 | 0 | 16.67 | 83.33 |
31 | Drift abr. c361CGY | 15% MFD | 3 | 4 | 0 | 2.74 | 93.15 | 3 | 3 | 0 | 4.17 | 95.83 |
32 | Freezing abr. c361CGY | – | 10 | 12 | 0 | 34.25 | 10.96 | 9 | 11 | 0 | 15.28 | 84.72 |
33 | Offset abr. c268CGY | 15% MFD | 2 | 4 | 0 | 1.37 | 93.15 | 1 | 1 | 0 | 1.39 | 98.61 |
34 | Offset inc. c268CGY | 15% MFD | 8 | 11 | 0 | 9.59 | 80.82 | 3 | 9 | 0 | 12.50 | 87.50 |
35 | Drift abr. c268CGY | 15% MFD | 3 | 4 | 0 | 2.74 | 93.15 | 2 | 2 | 0 | 2.78 | 97.22 |
36 | Freezing abr. c268CGY | – | 10 | 12 | 0 | 56.16 | 6.85 | 3 | 11 | 0 | 15.28 | 84.72 |
37 | Offset abr. iCanGuey3 | 15% MFD | 2 | 4 | 0 | 1.37 | 63.01 | 5 | 7 | 0 | 9.72 | 90.28 |
38 | Offset inc. iCanGuey3 | 15% MFD | 8 | 28 | 0 | 28.77 | 12.33 | 28 | 29 | 0 | 40.28 | 59.72 |
39 | Drift abr. iCanGuey3 | 15% MFD | 3 | 4 | 0 | 2.74 | 93.15 | 4 | 4 | 0 | 5.56 | 94.44 |
40 | Freezing abr. iCanGuey3 | – | 3 | 10 | 0 | 26.03 | 64.38 | 9 | 13 | 0 | 18.06 | 81.94 |
MFD, maximum flow/demand.
Methods setting
The HMM-based CDT uses ARX linear models for the extraction of the parameters . In the case of Orioles subsystem (Figure 8), the relationship patterns among the measured tank level, the input flow and the measured demands are modelled. The dependency graph is learned by considering all the binary relationships with autocorrelation greater or equal to . The result is the graph presented in Figure 4 with . In the case of Can Guey subsystem (Figure 9), we only considered those relationships compatible with the causality given by the water flow phenomenon (i.e., those in Figure 2(a)) and selected those having autocorrelation greater or equal to , where the higher threshold is due to the increased complexity of the network. Under faulty conditions, the considered relationships will exhibit changes that will depend on the kind and magnitude of the fault introduced, thus their monitoring is useful for fault isolation purposes.
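The graph learning step can be sketched as a threshold on a pairwise dependency measure, optionally restricted to the pairs allowed by the network topology. The sketch below uses the zero-lag Pearson correlation as a stand-in for the correlation measure mentioned in the text, and the threshold of 0.9 is a hypothetical value chosen only to run the example. Neither is taken from the paper.

```python
import numpy as np
from itertools import combinations

def learn_graph(streams, threshold, allowed=None):
    """Keep the pairs of streams whose absolute correlation reaches the threshold.

    streams: dict mapping a sensor name to a 1-D array of samples.
    allowed: optional set of frozenset pairs compatible with the water flow
             causality; when given, all other pairs are discarded.
    """
    edges = []
    for a, b in combinations(sorted(streams), 2):
        if allowed is not None and frozenset((a, b)) not in allowed:
            continue
        rho = np.corrcoef(streams[a], streams[b])[0, 1]
        if abs(rho) >= threshold:
            edges.append((a, b))
    return edges

# Synthetic streams: two linearly related signals and one unrelated noise signal.
rng = np.random.default_rng(1)
t = np.arange(200)
base = np.sin(2 * np.pi * t / 24)
streams = {
    "iOrioles": base,
    "xd175LOR": 2.0 * base + 1.0,
    "c175LOR": rng.normal(size=200),
}
edges = learn_graph(streams, threshold=0.9)
no_edges = learn_graph(streams, threshold=0.9, allowed=set())
```

Passing an `allowed` set plays the role of the topological prior used for the Can Guey subsystem: relationships that contradict the direction of the water flow are never evaluated.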
Regarding the PTPR initialization, the first 13 days of data are used as training dataset to identify the model parameters, the next 13 days are used as validation dataset to obtain the corresponding fault detection threshold and the rest of the data is used as test dataset. Regarding the HMMs, also a total of 26 days are used for training and validation purposes: 23 and 25 days are used for training the Orioles and the Can Guey cases, respectively, while the remaining days are used to compute the threshold for detection and validation (Alippi et al. 2013). The orders of ARX models have been chosen by means of a validation procedure. The log-likelihood window length has been set to and the batch size has been set to and for the Orioles and Can Guey subsystems, respectively.
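With a sampling period of 1 hour, the 13-day training and validation windows used for the PTPR method correspond to 312 samples each. The sketch below shows the split and one simple way of deriving a detection threshold from fault-free validation residuals. The maximum-absolute-value rule and the `margin` parameter are assumptions, since the text does not specify how the threshold is computed.

```python
import numpy as np

SAMPLES_PER_DAY = 24  # sampling period T = 1 hour

def split_dataset(x, train_days=13, val_days=13):
    """Split a series into training, validation and test segments, in that order."""
    n_train = train_days * SAMPLES_PER_DAY
    n_val = val_days * SAMPLES_PER_DAY
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]

def detection_threshold(val_residuals, margin=1.0):
    """Assumed rule: largest absolute fault-free residual, scaled by a margin."""
    return margin * float(np.max(np.abs(val_residuals)))

series = np.arange(30 * SAMPLES_PER_DAY)  # 30 days of hourly samples
train, val, test = split_dataset(series)
threshold = detection_threshold([0.1, -0.3, 0.2])
```

A residual in the test segment would then be flagged as faulty whenever its absolute value exceeds `threshold`.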
Figures of merit
The numerical results are presented by means of different figures of merit. The first stage in fault diagnosis deals with faultless vs. faulty situation discrimination. The performance achieved in this fault detection stage is measured by the next indices:
Detection delay: Number of samples needed by the fault diagnosis method to detect a certain fault.
False positives (FP): Percentage of test dataset faultless samples (i.e., not affected by a certain fault) that are determined as faulty by the fault detection method. FP corresponds to false alarms in FDI terminology (see Blanke et al. 2006).
False negatives (FN): Percentage of test dataset faulty samples (i.e., affected by a certain fault) that are determined as faultless by the fault detection method within the 72 samples (i.e., 3 days) after a fault is produced. FN corresponds to missed alarms in FDI terminology (see Blanke et al. 2006).
Moreover, the second stage in fault diagnosis involves fault isolation and classification abilities of the FDI method. These may be quantified by different figures of merit, which are defined as follows:
Isolation delay: Number of samples needed by the fault diagnosis method to isolate a certain fault.
Isolation index: Percentage of test dataset faulty samples (i.e., affected by a certain fault) that are properly isolated within the 72 samples (i.e., 3 days) after a fault is produced, considering a certain fault scenario.
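The figures of merit defined above can be computed from two boolean series, as in the following sketch. It assumes that the series starts at the beginning of the test dataset, that every sample before the fault instant `k0` is faultless, and that delays are counted as the number of samples elapsed since `k0`. These conventions and the function name are assumptions, not definitions taken from the paper.

```python
import numpy as np

def figures_of_merit(alarm, isolated_ok, k0, window=72):
    """Detection and isolation indices for one fault scenario.

    alarm[k]: True if sample k is flagged as faulty by the detection stage.
    isolated_ok[k]: True if the correct faulty element is reported at sample k.
    k0: index of the first faulty sample (assumed greater than zero).
    """
    alarm = np.asarray(alarm, dtype=bool)
    isolated_ok = np.asarray(isolated_ok, dtype=bool)

    def first_true_offset(flags):
        idx = np.flatnonzero(flags[k0:])
        return int(idx[0]) if idx.size else None  # None: never raised

    post = slice(k0, k0 + window)
    return {
        "detection_delay": first_true_offset(alarm),
        "isolation_delay": first_true_offset(isolated_ok),
        "FP": 100.0 * alarm[:k0].mean(),          # false alarms before the fault
        "FN": 100.0 * (~alarm[post]).mean(),      # missed alarms within the window
        "Iso": 100.0 * isolated_ok[post].mean(),  # correctly isolated samples
    }

n, k0 = 100, 20
alarm = np.zeros(n, dtype=bool)
alarm[k0 + 3:] = True          # detected 3 samples after the fault
isolated_ok = np.zeros(n, dtype=bool)
isolated_ok[k0 + 6:] = True    # correctly isolated 6 samples after the fault
merit = figures_of_merit(alarm, isolated_ok, k0)
```

In this synthetic scenario the FN rate is 3 out of 72 samples and the isolation index is 66 out of 72 samples.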
RESULTS
On the one hand, Table 2 shows that the PTPR method generally achieves lower detection and isolation delays than the CFDS method for the pump sensors iOrioles (e.g., faults Id. 3, 4), iCanGuey1d2 and iCanGuey2 (e.g., faults Id. 21, 22, 26) and iCanGuey3 (e.g., faults Id. 37, 38), and for the demand sensor c175LOR (e.g., faults Id. 11, 12, 13, 15, 19). The exceptions, where the CFDS method achieves similar or better delays, are some faults affecting the iOrioles pump sensor (e.g., faults Id. 1, 2, 5, 6, 8), the c175LOR demand sensor (e.g., fault Id. 10), the c268CGY and c361CGY demand sensors (e.g., faults Id. 29 to 36), freezing incipient faults affecting the Orioles subsystem (e.g., faults Id. 18, 20) and, for isolation delay only, the iCanGuey2 pump sensor (fault Id. 25). The PTPR method also achieves generally better FP and FN rates, with some exceptions regarding freezing faults (i.e., faults Id. 20, 32, 36, 40) for FN rates. This overall detection and isolation behaviour leads to a generally quicker and more reliable diagnosis of the faulty component by the PTPR method. On the other hand, the CFDS method achieves better isolation rates in general (with the exception of faults Id. 1, 12, 13, 15, 23, 24, 27, where the PTPR method provides similar or better results, depending on the scenario considered), which makes it useful to confirm the isolated fault occurring in the system. It is worth noting that the CFDS method provides the performance in Table 2 without assuming any a priori information about the physical model of the system. In the case of faults with an increasing profile, the detection delays of the CFDS method are worse than those obtained with the PTPR method. This is because the HMM-based CDT is more sensitive to abrupt changes in the parameter distribution. Also, the CFDS method is generally characterized by higher FP index values. The reason for this behaviour is two-fold: nominal state approximation and process time invariance.
First, the nominal model is estimated during an initial training phase that, in principle, could lead to inaccurate models, i.e., model bias due to, e.g., an incorrect selection of the family of models, the lack of enough data for training or the fact that training data do not excite the whole dynamics of the process. This undesired model bias tends to induce FP detections in the testing phase. Second, the process under monitoring could be intrinsically time-varying and not follow the Markov assumption. This leads to FP detection induced by an estimated model which is not able to fully describe the process.
DISCUSSION
In light of the results in the previous section, both FDI methods present satisfactory performance for the fault scenarios considered and show complementary features, which suggests that integrating them could improve the overall fault diagnosis. Specifically, the PTPR method generally obtains lower detection and isolation delays, as well as better FP and FN rates. Hence, it is a good choice for early and reliable isolation of the faults appearing in the system, while the CFDS method generally obtains better isolation rates, which allows reliable confirmation of the fault being detected and isolated. Regarding the benefits of each method, on the one hand, PTPR is based on physical models describing normal behaviour and does not need data from all the possible fault scenarios to perform the fault diagnosis, in contrast to the CFDS method. On the other hand, the main drawback of the PTPR approach is the deep knowledge of the model structure and parameters required to apply this methodology successfully, which is not needed by the CFDS method since it is a data-based approach. These facts further motivate the integration of both methods for FDI, taking advantage of the strengths of each one.
CONCLUSIONS
This work applies and compares two well-accepted general purpose fault diagnosis methods (one model-based, the other data-based) on a real PWSN located in the Barcelona area. Most of the works in the literature addressing fault diagnosis in water networks have treated the problem at the distribution level, but not at the supply level. However, water supply networks have characteristics which allow the application of techniques that cannot be applied to distribution networks. The first method is built upon a model-based approach exploiting a priori information regarding the physical/temporal relations which exist between the measured variables within the monitored system, while the second aims at characterizing and detecting changes in the probabilistic pattern sequence of the data coming from this system. Some enhancements of the two approaches considered are introduced, such as the use of TS residuals, which are not generally considered in classical residual-based fault diagnosis schemes, where physical residuals are traditionally used. Successful results have been achieved by both methods, and their complementary behaviour suggests that an integrated usage could improve the results achieved by each one separately. These results have been obtained using heterogeneous types of faults in representative subsystems within the network under study. Future work on the Barcelona PWSN includes additional analysis of the fault diagnosis results achieved, considering uncertainty such as noise and modelling/measurement errors derived from the examination of the system under study.
ACKNOWLEDGEMENTS
This work has been partially funded by the Spanish Government (Ministerio de Economía y Competitividad) and FEDER through the Projects ECOCIS (Ref. DPI2013-48243-C2-1-R) and HARCRICS (Ref. DPI2014-58104-R), and EFFINET grant FP7-ICT-2012-318556 of the European Commission.