Water quality management is key to guaranteeing the safety of drinking water for users. It relies on disinfection techniques, such as chlorination, applied to the drinking water network to prevent the growth of microorganisms present in the water. The continuous monitoring of water quality parameters is fundamental to assess the sanitary conditions of the drinking water and to detect unexpected events. The whole process rests on the assumption that the information retrieved from quality sensors is totally reliable, but because these chemical sensors are complex to calibrate and maintain, several factors affect the accuracy of the raw data collected. Consequently, decisions might be taken on an unreliable basis. Therefore, this work presents a data analytics monitoring methodology based on temporal and spatial models to determine whether a sensor is detecting a real change in water quality parameters or is actually providing inconsistent information due to some malfunction. On average, the methodology detected a sensor problem 12.4 days before the fault was reported by the water utility's expert, who relies on knowledge accumulated through visual analysis. The proposed methodology has been satisfactorily tested on the Barcelona drinking water network.
INTRODUCTION
One of the main tasks of the water utilities (WU) is to transport and supply drinking water to users through water distribution systems (WDS). Two of the WU's main areas of concern are the operations department, which manages the hydraulic infrastructure (e.g., pumping stations, reservoirs, pipes, etc.), and the water quality control department, which manages drinking water safety. Furthermore, different legal frameworks regulate the quality of the drinking water supplied.
Water quality monitoring and control management programmes involve several tasks. As detailed in Bartram & Ballance (1996), such tasks are monitoring network design (e.g., which parameters have to be measured, how often, etc.), laboratory work (e.g., chemical analysis, laboratory tests, etc.) and analytical quality assurance (e.g., production of reliable data), among other elements.
There are several techniques to treat the water in WDS and keep it safe for human consumption. One common disinfection technique is the chlorination of water. This process consists of injecting chlorine or its derivatives into the water. The injected chlorine is consumed (i.e., by chemical reaction) in the WDS due to two main factors (Powell et al. 2000): on the one hand, reactions in the bulk water, caused, for example, by the presence of organic content or by decay of the initial chlorine concentration under the physical conditions (e.g., temperature); on the other hand, reactions at the pipe wall with the biofilm (a group of microorganisms adhered to the pipes' surface).
The WU monitors the water quality parameters with on-line water quality sensors (multi-parametric and single-parametric) installed along the water transport and distribution networks. The most common water quality parameters monitored on-line are conductivity, temperature, pH and chlorine. Other interesting parameters, such as total organic carbon (TOC), are well-known indicators of water quality. Moreover, laboratory analyses of water samples taken from different points of the network are essential to analyse biological and chemical components unobserved by the on-line sensors, or even to contrast them against on-line observations.
Quality sensors require a specific calibration planning prescribed by the manufacturer depending on the sensor model to guarantee the reliability of the observations. Moreover, a preventive maintenance planning (e.g., bimonthly or quarterly) is also specified by the manufacturer to preserve data reliability.
Even when preventive planning is applied, these quality sensors can be affected by several problems, such as the ones listed in Table 1. Thus, corrective planning is always required to solve these unexpected problems affecting the sensors' reliability.
Cause | Consequence |
---|---|
Communication problem | Data gap |
Loss of sensitivity | Flat signal or slow drift down |
Electronic malfunction | Noise and peaks in the signal |
Miscalibration of the sensor | Offsets affecting the real value |
There has been significant research on detecting and avoiding intended and unintended injection of hazardous substances in WDSs to guarantee drinking water safety. Several works have studied the detection of water contamination events. In Byer & Carlson (2005), different contaminants introduced into tap water were detected by measuring pH, turbidity, conductivity, TOC and chlorine, with the detection limit for each magnitude set at three standard deviations above its average. In Hou et al. (2014), a probabilistic principal component analysis method using UV-Vis spectrometers was detailed to detect contaminant injection into WDS. In Eliades et al. (2014), a model-based approach considering the chlorine input injection was used to compute bounds to compare with the sensors' measurements. In Hall et al. (2007), a benchmark of a set of sensors from different manufacturers measuring distinct quality parameters was presented, allowing analysis and comparison of their sensitivity to the presence of various contaminants. In Hart et al. (2010), operational data and water quality data were combined to reduce the false positive rate in quality event detection. In Ba & McKenna (2015), different change-point detection algorithms were applied to the residuals of an autoregressive model. Sensor placement is also an important topic for improving quality monitoring while reducing operational costs, as discussed in Rathi & Gupta (2014). In Nejjari et al. (2012), the hydraulic model and simulation software are proposed to detect and localize abnormal water quality parameters in the WDS.
Model-based approaches, such as in Eliades et al. (2014) and Nejjari et al. (2012), have the main drawback that the performance depends directly on the water network model's accuracy. Moreover, due to the complex behaviour of the water parameters, it is unfeasible to develop an accurate physical model to describe the water quality dynamics. Hence, data-driven approaches are very interesting in this case and therefore widely used.
In addition, a major drawback, in general, of the existing approaches to detect drinking water quality events is that they are based on the assumption that data gathered from these sensors are accurate and precise. However, as we have pointed out, raw data from quality sensors might not be ready for analysis or to draw solid conclusions. Unreliable water quality information is a serious problem for the WU in the quest to guarantee a water supply that assures the users' health.
Hence, the main motivation of this work is to provide a data analytics methodology for monitoring quality sensors and events applicable to drinking water networks, such as those mentioned before.
The contributions of this work are two-fold. On the one hand, this work provides a methodology to get a solid information basis, discarding unreliable data, to improve the decision-making of the WU in water quality management. On the other hand, a set of indicators is provided that allow improvement of preventive planning, reducing the number of expensive corrective actions.
This work is focused on the application of the proposed methodology to solve the problem of quality events and unreliable sensors' detection in a real WU with on-line monitored water quality parameters. In particular, the proposed methodology has been satisfactorily tested on the Barcelona drinking water network.
CASE STUDY
The case study, used to illustrate the proposed methodology for monitoring quality sensors and events, is based on the Barcelona drinking water network. The Barcelona drinking water network is a complex WDS of over 4,600 km that supplies drinking water to 218 demand sectors. In this WDS, thousands of sensors are installed throughout the network to know with precision the hydraulic state of the network in order to control and manage it efficiently. In addition, quality sensors and analysers are installed to handle the water quality control.
The tank collects water to satisfy the three demand sectors. A chlorination process is continuously undertaken in this tank based on an actuator (chlorine injection), a chlorine analyser and some reference given by the WU's operators. At the entrance to each demand sector, a multi-parameter water quality sensor is installed to monitor and control the quality of the supplied water.
The WU collects hourly observations from multi-parameter sensors and 15-minute observations from chlorine analysers. The parameters observed are temperature, conductivity, pH and chlorine. The single-parameter sensors measure chlorine.
The water quality data collected are analysed by the experts using visualization software to check for any existing quality event or sensor problem. Another software system allows the experts to compare field samples analysed in the laboratory with the data collected from the sensors.
The methodology, presented next, has been inspired by the knowledge of the experts to analyse, check and even forecast problems in the water quality system.
METHODOLOGY
This section describes and analyses the procedure followed to obtain a robust decision regarding the two monitoring objectives. As discussed before, the first objective is to detect changes in the water quality parameters that can compromise the safety of the water supplied, and the second objective is to discriminate whether the problem detected is a real change in the water quality parameters or has been generated by unreliable observations due to some of the problems presented in Table 1.
Data pipeline
Note that calibration and validation stages use independent data sets to avoid common problems when fitting a model (e.g., over-fitting).
As mentioned above, in the pre-processing stage, we first remove the outliers from the hourly observations y(t) collected by the WU. We define an outlier as any observation lying more than three interquartile ranges (IQRs) above the third quartile.
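The outlier rule stated above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the quartile interpolation method and the sample chlorine values are assumptions introduced here.

```python
from statistics import quantiles


def remove_upper_outliers(values, k=3.0):
    """Drop observations lying more than k IQRs above the third quartile."""
    # Quartile interpolation method is an assumption; the text does not specify one.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    upper_limit = q3 + k * (q3 - q1)
    return [v for v in values if v <= upper_limit]


# Hypothetical hourly chlorine readings (mg/L) containing one spike.
chlorine = [0.52, 0.55, 0.53, 0.54, 0.56, 0.51, 0.55, 2.90, 0.54]
cleaned = remove_upper_outliers(chlorine)
```

Note that, following the definition in the text, only the upper tail is trimmed; abnormally low readings are left for the detection models to handle.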
TSM
ANNs have been widely used in modelling time series in water networks, for example, water demands (Wu et al. 2014). In Palani et al. (2008), ANNs were used to learn existent linear and non-linear relationships between factors from water quality data in order to forecast these variables. In Sun (2013), ANN models were developed to predict groundwater level changes using a set of predictors: previous precipitation, terrestrial water storage change and maximum and minimum temperatures. In Valipour et al. (2013), the goal was to forecast the inflow to Dez Dam reservoir using ANN, auto regressive moving average (ARMA) and auto regressive integrated moving average (ARIMA) models. The authors identified that the ARMA and ARIMA models performed better to forecast the inflow for the next 12 months, and ANN models performed better to forecast the next 5 years. In Valipour (2016), three different structures of ANN, the non-linear autoregressive neural network (NARNN), the non-linear input–output and the NARNN with exogenous input, were compared to forecast the precipitation in Gilan to detect drought and wet year alarms.
SM
Two spatial relations are considered in this methodology: the predecessor rule (PD) and the divergence measure (DV).
This residual is first evaluated in normal conditions (without faults) to establish a threshold. Hence, we can compare the residual computed on-line against the threshold to detect a divergence between a reference sensor r and a spatially linked sensor s.
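The thresholding step can be illustrated with a short sketch. The exact residual definition is not reproduced in this section, so the code assumes the absolute difference between the reference sensor r and the linked sensor s, and assumes the threshold is the largest residual seen in fault-free data; both choices, and all names and sample values, are hypothetical.

```python
def residuals(reference, sensor):
    """Absolute difference between paired observations (assumed residual form)."""
    return [abs(r - s) for r, s in zip(reference, sensor)]


def calibrate_threshold(reference_normal, sensor_normal):
    """Threshold taken as the largest residual in fault-free data (an assumption)."""
    return max(residuals(reference_normal, sensor_normal))


def divergence_flags(reference, sensor, threshold):
    """True wherever the on-line residual exceeds the calibrated threshold."""
    return [res > threshold for res in residuals(reference, sensor)]


# Hypothetical chlorine readings (mg/L): fault-free period, then on-line data.
threshold = calibrate_threshold([0.60, 0.62, 0.61, 0.59], [0.55, 0.58, 0.57, 0.54])
flags = divergence_flags([0.60, 0.61, 0.60], [0.56, 0.50, 0.42], threshold)
```

A quantile of the fault-free residuals, rather than the maximum, would give a tighter threshold at the cost of more false alarms.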
Fault diagnosis
Using any of the proposed models alone, it will only be possible to detect that something unexpected, based on the historical knowledge, has occurred. However, it will not be possible to distinguish whether the problem is a sensor fault or a quality event.
In particular, the Holt–Winters TSM, MV and ANN models are able to detect unexpected changes in the quality parameters signal, but they do not allow determination of whether the change is a real change in the water quality parameters or is actually due to inaccurate data collected from a sensor affected by some problem. Hence, spatial information is required to contrast the events detected against additional information provided by other related sensors.
Thus, the SM, DV and PD are considered to provide this additional information.
Furthermore, some of the temporal models presented in the previous section are redundant with respect to our goals. For instance, the MV, Holt–Winters and ANN models all detect abrupt changes in the behaviour of the quality measurement signal. A comparison among them will be presented in order to select the one with the best detection performance. Analogously, the spatial models, PD and DV, track the dissimilarity with respect to other spatially related sensors. Again, after comparing them, the one that provides the best detection performance is selected.
Combining all this knowledge, a fault diagnosis scheme is developed to interpret the combination of the results and provide the key indicators to the WU to improve and anticipate the sensors' maintenance operations.
The states are characterized in Table 2 as a function of the activation of model-based tests, except the calibration state, which is clearly known by the WU maintenance department.
PD | ANN | | Cause |
---|---|---|---|
1 | 1 | 0 | Distribution sensor fault |
1 | 0 | 0 | Distribution sensor fault |
0 | 1 | 1 | Quality event |
0 | 0 | 0 | Normal state |
As detailed in Table 2, a sensor is in the normal state when none of the tests is active. A quality event is diagnosed when the PD test is not active and the ANN test is active. When the PD test is active, a sensor fault is diagnosed, regardless of the ANN test.
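The decision logic described for Table 2 can be written as a small function. This is a sketch of the rule as stated in the text, using only the PD and ANN test outcomes; the function name is hypothetical, and the calibration state (known directly to the WU maintenance department) is left out.

```python
def diagnose(pd_active: bool, ann_active: bool) -> str:
    """Map the PD and ANN test outcomes to the diagnosed state of Table 2."""
    if pd_active:
        # A spatial divergence indicates a sensor fault, regardless of the ANN test.
        return "Distribution sensor fault"
    if ann_active:
        # A temporal change with no spatial divergence is a real quality event.
        return "Quality event"
    return "Normal state"


# Evaluate every combination of the two tests.
states = {(pd, ann): diagnose(pd, ann) for pd in (True, False) for ann in (True, False)}
```

Checking the spatial test first reflects the priority given in the text: a divergence from the related sensors overrides whatever the temporal model reports.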
RESULTS
In this section, results based on the Barcelona case study, previously detailed, are presented to show the performance of the methodology proposed in this work.
The data used to generate the results come from the multi-parametric (chlorine, pH, temperature and conductivity) sensors (0794, 0795 and 0801), the chlorine analyser X127701D and the events reported by the WU experts to the maintenance department (detailed in Figure 2).
The historical data of events allow us to analyse the performance of our diagnosis approach. The performance measures selected are the anticipation in days and the false alarm rate.
A 1-year data set has been divided into three independent subsets: a training set (1 month of data) is used to calibrate the models, a validation set (15 days) is used to analyse how the model generalizes with new data, and finally, a test set (7 months) is used to show the performance of each model detailed in the methodology. We assume that the training and validation sets have no events in order to characterize the system in a normal state (i.e., without faults).
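A chronological split of this kind can be sketched as follows. The start date, the constant signal and the function name are hypothetical placeholders; only the 1-month training and 15-day validation lengths come from the text, with one month approximated here as 30 days.

```python
from datetime import datetime, timedelta


def chronological_split(samples, train_end, valid_end):
    """Split (timestamp, value) pairs into consecutive, non-overlapping subsets."""
    train = [(t, y) for t, y in samples if t < train_end]
    valid = [(t, y) for t, y in samples if train_end <= t < valid_end]
    test = [(t, y) for t, y in samples if t >= valid_end]
    return train, valid, test


start = datetime(2014, 1, 1)  # hypothetical start date, not taken from the paper
# Sixty days of placeholder hourly observations.
samples = [(start + timedelta(hours=h), 0.5) for h in range(24 * 60)]
train, valid, test = chronological_split(
    samples,
    train_end=start + timedelta(days=30),  # 1 month of training data
    valid_end=start + timedelta(days=45),  # followed by 15 days of validation data
)
```

Splitting by time rather than at random keeps each subset contiguous, so the validation and test sets only contain observations later than those used for calibration.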
This particular scenario was detected and fixed by the WU as follows. On 21 January 2014, the water quality data analyst, in a routine check, detected a slow chlorine decay in the 0794 sensor compared with the other two sensors (0795 and 0801) and with the transport sensor (X127701D). Once the problem was discovered, the water quality analyst reported the event to the maintenance department. Then, on 22 January, a maintenance technician made a readjustment in order to recalibrate the sensor. Due to this operation, the 0794 sensor showed an abrupt increase in chlorine and a simultaneous decrease in pH for 2 days (22 to 24 January), after which it converged again.
The PD model detects a divergence between the sensor 0794 and the transport analyser X127701D from 15 January onwards. As can be seen in Figure 8(b), there is a sequence of two solid blocks: the first, from 15 to 21 January, corresponds to the degradation fault, and the second, from 22 to 24 January, to the maintenance operation.
The DV and PD models perform in a similar way: both detect divergence between spatially related sensors and are able to detect a drift fault. The HW and ANN models detect abrupt changes but not a drift fault. The MV model is the least sensitive model at detecting extreme events, the peaks being caused by the maintenance operation.
No. | Start detection | End detection | Event reported | Anticipation (days) |
---|---|---|---|---|
1 | 2014-04-06 | 2014-04-22 | 2014-04-23 | 16 |
2 | 2014-04-27 | 2014-05-01 | | |
3 | 2014-05-02 | 2014-05-11 | | |
4 | 2014-06-11 | 2014-06-13 | 2014-06-16 | 5 |
5 | 2014-07-13 | 2014-08-19 | 2014-08-19 | 37 |
6 | 2014-08-22 | 2014-08-25 | | |
No. | Start detection | End detection | Event reported | Anticipation (days) |
---|---|---|---|---|
1 | 2014-03-11 | 2014-03-16 | 2014-03-17 | 6 |
2 | 2014-04-04 | 2014-04-07 | | |
3 | 2014-05-24 | 2014-05-25 | 2014-05-27 | 3 |
4 | 2014-05-30 | 2014-06-09 | 2014-06-04 | 5 |
5 | 2014-06-20 | 2014-06-26 | | |
6 | 2014-07-18 | 2014-08-10 | 2014-08-11 | 23 |
7 | 2014-08-27 | 2014-09-04 | 2014-09-02 | 5 |
8 | 2014-09-12 | 2014-09-21 | 2014-09-23 | 10 |
No. | Start detection | End detection | Event reported | Anticipation (days) |
---|---|---|---|---|
1 | 2014-03-16 | 2014-03-18 | | |
2 | 2014-03-28 | 2014-04-07 | 2014-04-07 | 9 |
3 | 2014-04-11 | 2014-04-27 | 2014-04-28 | 17 |
4 | 2014-05-09 | 2014-05-09 | | |
5 | 2014-05-22 | 2014-05-24 | | |
6 | 2014-05-28 | 2014-06-01 | | |
7 | 2014-06-02 | 2014-06-14 | 2014-06-16 | 13 |
8 | 2014-07-31 | 2014-08-03 | | |
9 | 2014-08-05 | 2014-08-10 | | |
10 | 2014-08-11 | 2014-08-25 | | |
11 | 2014-09-04 | 2014-09-05 | | |
12 | 2014-09-12 | 2014-09-25 | | |
13 | 2014-09-28 | 2014-09-29 | | |
14 | 2014-10-01 | 2014-10-11 | | |
The rows with a blank in the event reported column, apparently false alarms, have two causes. On the one hand, the tables show only reported events, not planned maintenance operations (information not available). Thus, some events detected by our approach were fixed during maintenance operations before being detected and reported by the WU experts. For instance, Figure 13 shows a decay of the 0801 chlorine signal from the end of July until the end of August. At the end of August, an abrupt rise in chlorine is caused by a planned maintenance operation. On the other hand, false alarms occur due to the tight thresholds considered. For instance, the 0801 sensor fault detected on 2014-05-09 lasts 1 day only (it does not appear in the bottom colour bar of Figure 13 because of its short width).
With the approach presented in this work, sensor problems were detected, on average, 12.4 days before the fault was reported by the WU expert using knowledge accumulated through visual analysis.
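The 12.4-day figure is consistent with the mean of the anticipation values listed in the three detection tables, assuming the average is taken over the twelve detections that have a matching reported event. A quick check:

```python
from statistics import mean

# Anticipation (days) for every detection with a reported event, in table order.
anticipation_days = [
    16, 5, 37,            # first detection table
    6, 3, 5, 23, 5, 10,   # second detection table
    9, 17, 13,            # third detection table
]

average_anticipation = mean(anticipation_days)  # 149 / 12
print(round(average_anticipation, 1))  # prints 12.4
```

Detections with no reported event are excluded from this average, since no anticipation can be computed for them.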
CONCLUSIONS
This paper has proposed a methodology to detect water quality changes based on multi-parametric sensors. It has been shown that it is not possible, looking at the different tests separately, to distinguish between a sensor fault and an actual quality event. A fault diagnosis algorithm has been developed that is able to distinguish between water quality events and problems affecting the sensors, such as loss of sensitivity.
This approach has been applied to the Barcelona water network, and the results obtained show that the methodology is able to anticipate the detection of problems in chlorine sensors compared with the visual analysis applied by WU experts. Hence, the proposed approach improves water quality control management and reduces corrective maintenance actions. As future research, it is planned to integrate the hydraulic model into the methodology in order to reduce its uncertainty, and to extend the methodology to predict the degradation of the sensors and to plan maintenance according to the sensors' health.
ACKNOWLEDGEMENTS
This work is partially supported by AGAUR Doctorat Industrial 2013-DI-041. The authors also wish to thank the company Aigües de Barcelona for the support received in the development of this work. This work has been funded by the Spanish Government (MINECO) through the project CICYT ECOCIS (ref. DPI2013-48243-C2-1-R), and by MINECO and FEDER through the project CICYT HARCRICS (ref. DPI2014-58104-R).