Abstract

Placing fixed water quality monitoring stations in a water distribution system can greatly improve the security of the system through prompt detection of poor water quality. If a harmful substance is injected into a water distribution system, large populations can be put at risk of exposure to the contaminant, and prompt detection reduces the number of people put at risk. However, to protect against a wide variety of possible contaminants, a water quality monitoring station must identify contamination by recognizing anomalous changes in a suite of surrogate water quality indicators (chlorine, pH, etc.). This work places water quality monitoring stations within the water distribution system at the locations that best detect contamination events via surrogate water quality signals. Networks of water quality monitoring stations are designed to minimize two quantities under uncertain water quality conditions: the population affected prior to contamination event detection, and the expected number of false positive detections. Solutions generated in this study are compared to solutions designed via classical detection methods. Results show that sensor networks designed without considering detection via surrogate water quality parameters have higher false positive detection rates.

INTRODUCTION

Within a water distribution system (WDS), water quality is monitored and maintained at or near the sources, e.g. water treatment plants and desalination plants. Downstream water quality is then assumed to be satisfactory, provided that the pipe network is closed and secure and that the inlet water quality is adequate for the hydraulic travel times between the sources and the consumers. In a system that is perfectly understood and perfectly secured, the assumption that water quality management at the inlet is sufficient may be valid. However, in real-world WDSs the rate of water quality deterioration is not perfectly known, and network intrusions and/or backflows can dramatically reduce water quality within a WDS. An intrusion of a harmful substance can expose downstream populations to ingestion of the contaminant and to subsequent infection or poisoning.

Placing water quality monitoring stations (WQMSs), or sensors, within a water distribution system has been shown to improve a managing authority's ability to detect contamination and to reduce the size of the population put at risk. Monitoring a large number of water quality parameters is beneficial. However, budget, space, and accessibility constraints limit the number of parameters a WQMS can measure. Therefore, this study does not attempt to detect the presence of a specific contaminant. Instead, it identifies a contamination by observing anomalous readings in a small suite of water quality parameters that respond to the presence of a contaminant.

Typically, the performance of an early warning system (EWS) composed of water quality sensors is highly sensitive to the locations at which sensors are placed. Initial studies in WQMS placement identified the best locations according to demand-based coverage, and assumed that water upstream from a WQMS was also 'safe' (Lee & Deininger 1992; Kumar et al. 1997). Later methods explicitly placed sensors to detect contamination intrusion events (Kessler et al. 1998). These methods defined a maximum allowable volume of contaminated water delivered prior to detection, termed the 'level of service'. A network of WQMSs was then designed to meet the prescribed level of service. More advanced optimization methods were later applied to identify WQMS locations from water quality simulation results, including mixed integer programming (MIP), genetic algorithms, and greedy heuristic algorithms. Ostfeld & Salomons (2004) proposed a genetic algorithm to determine WQMS locations and evaluated each WQMS network against a large number of randomly generated contamination events. Because of this broad evaluation, the resulting solutions are expected to perform well against any single contamination event. Berry et al. (2006) cast the sensor placement problem as an MIP to minimize the mass consumed following a contamination event. Numerous teams attempted to solve the sensor placement problem during the Battle of the Water Sensor Networks (BWSN) (Ostfeld et al. 2008). In the BWSN, networks of 5 and 20 sensors were placed to minimize (1) the time required to detect a contamination event, (2) the population affected prior to detection, and (3) the mass delivered prior to detection, and to maximize (4) the likelihood of detecting a contamination event.
Methods used in the BWSN included genetic algorithms, mixed integer programming, a greedy algorithm, a demand-based heuristic, an engineering strawman approach, and population-based heuristic evolutionary algorithms. The greedy algorithm proved highly effective and was further studied for sensor placement on large WDSs (Krause et al. 2008). Krause et al. (2008) structured the objective as a submodular function of the sensor network that reports the population 'protected' rather than the population affected. Doing so allowed the proposed greedy algorithm to solve the sensor placement problem efficiently to near optimality, even on large water distribution systems. Later studies evaluated sensor performance assuming 'imperfect' detection (Berry et al. 2009), incorporated risk-based objective functions for sensor placement (Weickgenannt et al. 2010), and considered uncertainties in the sensor placement problem (Xu et al. 2010; Comboul & Ghanem 2013).
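The greedy strategy described above can be illustrated with a minimal Python sketch. It is not the implementation of Krause et al. (2008). It assumes two hypothetical precomputed inputs: a table `impact[e][j]`, giving the population affected by event e before a sensor at candidate location j detects it, and `baseline[e]`, giving the population affected if event e is never detected. The objective is the mean population 'protected', which is monotone and submodular. For such objectives, greedy selection is known to achieve at least a (1 − 1/e) fraction of the optimal value, where e here denotes Euler's number.

```python
def population_protected(sensors, impact, baseline):
    """Mean population protected over all events for a set of sensor locations.

    impact[e][j]: population affected by event e before a sensor at j detects it
    baseline[e]:  population affected by event e if it is never detected
    (both are hypothetical, precomputed from water quality simulations)
    """
    total = 0.0
    for e, row in enumerate(impact):
        # The earliest-detecting sensor in the set determines the impact
        affected = min([baseline[e]] + [row[j] for j in sensors])
        total += baseline[e] - affected
    return total / len(impact)


def greedy_placement(impact, baseline, n_sensors):
    """Add, one at a time, the location with the largest marginal gain."""
    candidates = set(range(len(impact[0])))
    chosen = []
    for _ in range(n_sensors):
        current = population_protected(chosen, impact, baseline)
        best = max(
            sorted(candidates),
            key=lambda j: population_protected(chosen + [j], impact, baseline) - current,
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen


impact = [[10, 50, 80], [90, 20, 80], [90, 85, 30]]  # 3 events x 3 locations
baseline = [100, 100, 100]
chosen = greedy_placement(impact, baseline, n_sensors=2)  # -> [1, 2]
```

Each step re-evaluates only the marginal gain of each remaining candidate. This is why the approach scales to large networks when the impact table is available.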

In parallel, research has examined the task of detecting anomalous changes in water quality measurements to identify contamination in the system. Numerous algorithms have been developed and tested to identify true contamination events from realistic water quality data, including:

- basic incremental outlier detection algorithms and linear filters (McKenna et al. 2008);
- artificial neural networks (Perelman et al. 2012; Arad et al. 2013);
- multi-variate classifiers (Oliker & Ostfeld 2014);
- a model-based event detection algorithm incorporating variable contamination event injection concentrations and durations, as well as uncertain consumer demands (Yang & Boccelli 2016b).

Event detection studies have also considered system-wide event detection algorithms (EDAs), in which water quality signals from different locations are integrated to better identify true positives and reduce false positives (Koch & McKenna 2011; Oliker et al. 2016; Yang & Boccelli 2017). Integrating the water quality signals across multiple sensors improved the performance of event detection algorithms and has provided the best-performing methods for contamination event detection to date. However, no studies have attempted to place water quality sensors so that they perform best with respect to the water quality parameters actually observed at those locations.

Budget constraints are expected to limit the number of sensors that can be deployed, and accordingly, water quality data will be sparsely distributed throughout the WDS. As shown above, this issue has catalyzed a large amount of research on how best to place WQMSs within a WDS. However, most previous work has placed sensors based on the presence of a contaminant. It has not considered the response of the water quality parameters that a WQMS would observe due to that contaminant. This limitation may lead to a large gap between the expected and the true performance of a WQMS used for contamination detection: a location that is best for detecting a contaminant may not be best for detecting anomalous water quality signals.

This study aims to determine how WQMS locations influence the performance of an early warning system for contamination event detection when WQMSs observe surrogate water quality signals. The method allows the system design phase to incorporate the effects of background water quality uncertainty into the contamination detection task. Placing sensors under deterministic operating conditions with conservative contaminants has provided strong baselines for sensor placement. However, the resulting systems may not perform well in real-world scenarios where uncertainties are present. As shown in Figures 1 and 2, following injection the contaminant itself is often propagated relatively evenly throughout the WDS. In contrast, the water quality signals can vary throughout the WDS as a function of operational conditions and system design. By incorporating background water quality uncertainty into the sensor placement task, we expect sensors to be placed at network locations whose water quality signals are most indicative of a contamination and least sensitive to changes in background water quality.

Figure 1

Likelihood of contaminant arriving at a node following the simulation of 1,000 random contamination events (instant input of 5 kg in the WDS).

Figure 2

Water quality signals observed at selected locations following a contaminant intrusion at the grey star within the Net3 WDS.

METHODOLOGY

Water quality simulations were performed with EPANET-MSX (Shang et al. 2007). EPANET-MSX uses the hydraulic simulation results calculated by EPANET 2 (Rossman 2000). It then solves the multi-species chemical equilibrium and reaction equations throughout the simulated water distribution system over a defined simulation time. Nicotine was used as the contaminant injected into the network when optimizing WQMS locations. To analyze the sensitivity of the WQMS placements to the water quality model used for optimization, a second water quality model was applied in which a different contaminant, Parathion, was injected into the network. Water quality monitoring stations were assumed to monitor three water quality parameters: free chlorine, pH, and alkalinity. The reaction and equilibrium equations governing intrusions of Nicotine and Parathion were taken from Yang & Boccelli (2016a) and Schwartz et al. (2014), respectively, and are described in the sections below.

Water quality simulations

For both water quality models, chlorine input was defined at a source location to simulate chlorination in the water distribution system. Preliminary simulations showed that chlorine concentrations throughout the network were cyclical in time. They were also correlated with the pumping schedules, which change the hydraulic regime of the WDS. To reduce the influence of these cycles and allow background chlorine levels to reach a relatively steady state, water quality simulations were first run for 12 days. In the baseline model, each contamination event was defined as an instantaneous injection of 5 kg of contaminant at any node in the network, at any time between the 15th and 16th day of the simulation. To reduce the computational burden, the water quality state at the end of the 12-day simulation was saved and used to initialize a 5-day simulation. To further test the solutions developed using the baseline Nicotine model, a sensitivity analysis was performed with the contamination injection duration set to 12 hours. A 15-minute water quality time step was used for all water quality simulations.

Note that the water quality models for the two contaminants differ: the Nicotine model uses a two-species second order decay model, whereas the Parathion model uses a first order decay model. This difference leads to discrepancies in the background chlorine levels of the two models. Nicotine was therefore considered the main contaminant of interest in this study. The Parathion injections were used in a sensitivity analysis to evaluate the sensor networks against a contaminant and a water quality model that were not used during the optimization phase.
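The baseline event definition and the warm-start bookkeeping can be sketched in Python as follows. This is a minimal illustration under stated assumptions:

- the node IDs are hypothetical placeholders;
- "between the 15th and 16th day" is read as the interval from 15 to 16 days of simulated time;
- injection times are snapped to the 15-minute water quality time step.

Each event time is converted into a time within the 5-day simulation restarted from the saved day-12 state.

```python
import random

TIMESTEP_S = 15 * 60        # 15-minute water quality time step
DAY_S = 24 * 3600
RESTART_S = 12 * DAY_S      # the 5-day run starts from the saved day-12 state


def sample_events(node_ids, n_events, mass_kg=5.0, seed=0):
    """Hypothetical baseline event generator: one instantaneous injection of
    mass_kg at a random node, at a random time step between day 15 and day 16."""
    rng = random.Random(seed)
    steps_per_day = DAY_S // TIMESTEP_S
    events = []
    for _ in range(n_events):
        t_abs = 15 * DAY_S + rng.randrange(steps_per_day) * TIMESTEP_S
        events.append({
            "node": rng.choice(node_ids),
            "mass_kg": mass_kg,
            "t_abs_s": t_abs,                  # time on the full 17-day timeline
            "t_restart_s": t_abs - RESTART_S,  # time within the restarted 5-day run
        })
    return events


events = sample_events(["J1", "J2", "J3", "J4"], n_events=400)
```

Saving the day-12 state means that the 12-day spin-up is computed once, not once per contamination event.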

Nicotine model

Nicotine was chosen as a contaminant because its presence elicits a response primarily in chlorine. Because pH and alkalinity do not respond strongly, this primarily chlorine-based response is expected to produce a weak multi-variate signal. For designing a network of WQMSs, a weak signal is advantageous: a system able to detect a weakly responsive contaminant would be expected to readily detect other, more responsive contaminants. The 12-day simulation allowed the pH and chlorine levels to come to equilibrium within the WDS. The background pH equilibrium equations are shown below (Equations (1)–(4)), followed by the reaction equations for chlorine, alkalinity, and pH in response to the presence of Nicotine (Equations (5)–(8)). Table 1 presents the reaction rate coefficients for chlorine decay due to background dissolved organic carbon and Nicotine. For a more detailed discussion of the equilibrium models, reaction equations, and rate coefficient estimation, the reader is directed to Yang & Boccelli (2016a).
formula
(1)
 
formula
(2)
 
formula
(3)
 
formula
(4)
 
formula
(5)
 
formula
(6)
 
formula
(7)
 
formula
(8)
Table 1

Nicotine model reaction rate coefficients

Reaction coefficient Value  
 0.0015 
 0.0045 
 0.028 
 0.239 

Parathion model

Parathion was chosen as a second contaminant to test the sensitivity of the sensor network to a contaminant and a water quality model that were 'unknown' during the optimization phase. In contrast to Nicotine, Parathion injections affect the background pH and alkalinity signals. Combined with its effect on the background chlorine signal, Parathion is expected to produce a stronger contamination signal in the surrogate water quality parameters. Parathion is also more toxic: Nicotine has an LD50 of roughly 9.7 mg/kg (Mayer 2014), whereas Parathion is highly toxic, with an LD50 of 2 mg/kg (Schwartz et al. 2014). Parathion should therefore present a case in which the EWS receives a strong, combined signal from all water quality parameters. However, given its low LD50, the system will require especially prompt detection to limit the population affected. For a single EDA, the two contaminants also require the EDA to be parameterized so that it recognizes either contamination signal while maintaining a low false positive detection rate for both the Nicotine and Parathion contaminations.

Table 2

Parathion model reaction rate coefficients

Parameter name Value Description 
  Reaction rate coefficient between PA and HOCl 
  Hydrolysis rate coefficient for PA 
  – 
  – 
  Hydrolysis rate coefficient for PAO 
  – 
  – 
  Reaction rate coefficient between PA and OCl 
  Hydrolysis rate coefficient between PAO and OCl 
  – 
  First order chlorine decay rate 
For a more detailed description of the water quality model used for the Parathion injections, the reader is referred to Schwartz et al. (2014), and for a more thorough description of the Parathion reaction kinetics the reader is directed to Duirk et al. (2009). Reaction rate coefficient values are presented in Table 2. 
formula
(9)
 
formula
(10)
 
formula
(11)
 
formula
(12)
 
formula
(13)

Event detection algorithm

To detect contamination, a simple multi-variate incremental algorithm (termed MINC herein) was employed, as presented in McKenna et al. (2008). A WQMS was assumed to observe and record perfectly the levels of the water quality parameters modeled in the EPANET-MSX simulation. The MINC algorithm uses a moving window to calculate the mean and standard deviation of each water quality parameter's measurements. These statistics are used to normalize the most recent water quality measurement, as shown in the equation below:
$$Z_{p,t} = \frac{x_{p,t} - \mu_{p}}{\sigma_{p}} \tag{14}$$

where $Z_{p,t}$ and $x_{p,t}$ are the normalized water quality signal and the actual water quality measurement for parameter p at time t. $\mu_p$ and $\sigma_p$ are the mean and standard deviation of water quality parameter p, respectively, calculated across the moving window. The set of all normalized water quality signals is input into the equation below to calculate the del value used to identify an anomalous water quality reading.
formula
(15)
where P is the number of water quality signals being observed. The del value is computed from the mean Z value of each (pth) water quality parameter across the current moving window. For each data point, the MINC algorithm compares the magnitude of the del value with a predefined threshold del value, δ. Any measurement whose del value exceeds the threshold is considered anomalous. Figure 3 provides a schematic of the EDA operation, which maps the normalized water quality signals to a single value.
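The moving-window normalization of Equation (14) and the threshold test can be sketched in Python as follows. This is a simplified illustration, not the MINC code of McKenna et al. (2008). It makes three assumptions:

- the window holds the W measurements preceding the current one;
- a small floor is applied to the standard deviation to avoid division by zero;
- the P normalized signals are combined into a single del value by their Euclidean norm. This combination is a stand-in, because the exact form of Equation (15) is not reproduced here.

```python
import math
import statistics


def del_series(signals, window, sd_floor=1e-9):
    """signals: dict parameter -> list of measurements on a common time grid.
    Returns one del value per time step (None until a full window exists)."""
    n = len(next(iter(signals.values())))
    dels = [None] * n
    for t in range(window, n):
        z_sq = 0.0
        for series in signals.values():
            past = series[t - window:t]                  # moving window, excludes t
            mu = statistics.fmean(past)
            sd = max(statistics.pstdev(past), sd_floor)
            z = (series[t] - mu) / sd                    # Equation (14)
            z_sq += z * z
        dels[t] = math.sqrt(z_sq)                        # ASSUMED combination
    return dels


def detections(dels, threshold):
    """Time steps whose del value exceeds the threshold del value."""
    return [t for t, d in enumerate(dels) if d is not None and d > threshold]


chlorine = [1.0 if i % 2 == 0 else 1.1 for i in range(20)] + [0.5]  # sudden drop
ph = [7.5] * 21
dels = del_series({"chlorine": chlorine, "pH": ph}, window=6)
alarms = detections(dels, threshold=3.0)
```

In this toy series the regular chlorine oscillation yields del values near 1. The drop at the last step yields a del value of about 11, so only that step is flagged.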
Figure 3

Schematic time series of network water quality data showcasing the normalized chlorine, pH, and alkalinity signals (a) and the combined del measurement signal (b).

SPSA algorithm

Water quality signals showed high spatial variability throughout the WDS. As a result, assigning a single window size and a single del threshold to all sensors would bias WQMSs toward locations that performed well for those particular EDA parameters. To overcome this issue, the optimal window size and del threshold were approximated for each potential WQMS location. These values were then assigned to the EDA when evaluating solutions during the optimization phase. The simultaneous perturbation stochastic approximation (SPSA) algorithm (Spall 1998) was used to approximate the optimal window size and del threshold. Starting from an initial guess of the optimal parameters, the SPSA algorithm iteratively perturbs all decision variables simultaneously (in this case the window size W and the del threshold δ). It then evaluates the effect of the perturbation on the loss function value. Using only two loss function evaluations per iteration, the SPSA algorithm approximates the local gradient of the loss function and computes the variable values for the next iteration. Only a brief description of the SPSA algorithm is provided here; for a thorough description the reader is directed to Spall (1998). Parameters used for the SPSA algorithm are provided in Table 3, taken from Spall (1998). A simple heuristic (Equation (16)) was defined as the loss function; it minimizes both the population affected prior to contamination event detection and the false positive detection rate.
formula
(16)
where the score is evaluated for location i as a function of the window size W and the del threshold δ. The score combines two normalized terms:

- the mean population affected given a WQMS at location i (a function of W and δ), normalized by the maximum value of the population affected;
- the mean number of false positives calculated across a suite of contamination events given a WQMS at location i (also a function of W and δ), normalized by the maximum possible average number of false positive detections (one false positive for each contamination event).

The false positive term enters as a multiplier on the population affected term. This multiplier ensures that the false positive rate is minimized while the affected population is minimized, and it approaches 1 as the mean number of false positives approaches zero.
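The SPSA iteration can be sketched in Python as below. This is a generic implementation of the Spall (1998) update with the commonly recommended gain exponents α = 0.602 and γ = 0.101. The gain constants a, c, and A are illustrative and are not the values used in this study. The quadratic `toy_loss` is a hypothetical stand-in for the Equation (16) loss. In practice, the window size W would also need to be rounded to a whole number of time steps before each loss evaluation, which this sketch omits.

```python
import random


def spsa_minimize(loss, theta0, iterations=200, a=0.1, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` by simultaneous perturbation stochastic approximation."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step-size gain sequence
        c_k = c / (k + 1) ** gamma       # perturbation gain sequence
        # Bernoulli +/-1 perturbation applied to all variables at once
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations per iteration, independent of dimension
        diff = loss(plus) - loss(minus)
        grad = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


def toy_loss(theta):
    """Hypothetical stand-in loss with its minimum at W = 12, delta = 0.8."""
    w, delta = theta
    return (w - 12.0) ** 2 + (delta - 0.8) ** 2


theta = spsa_minimize(toy_loss, [5.0, 2.0], a=1.0)
```

The cost per iteration is two loss evaluations regardless of how many parameters are tuned. This matters here because each loss evaluation requires running the EDA over a suite of contamination events.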
Table 3

Optimization scenarios prescribed for each network

Contaminant type | Network condition
Nicotine intrusion events | Deterministic background water quality
Nicotine intrusion events | Uncertain background water quality
Nicotine intrusion events | Nicotine signal for detection
Nicotine and Parathion intrusion events | Uncertain background water quality

Objective functions

Water quality monitoring station networks were designed to minimize the consequences of a contamination event in a water distribution system under background water quality uncertainty, while best identifying contamination events via surrogate water quality parameters. Accordingly, the optimizer minimized two objectives: (1) the expected population affected prior to detection of a contamination event in excess of a predefined allowable population affected, and (2) the mean number of false positive detections calculated for each contamination event. The population affected prior to detection is computed as shown below (Ostfeld et al. 2008):
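$$M_{i,t}^{e} = \varphi \, \Delta t \sum_{\tau=1}^{t} c_{i,\tau}^{e} \, \alpha_{i,\tau} \tag{17}$$

$$PA_{t}^{e} = \sum_{i} \Phi\left( \beta \log_{10} \frac{M_{i,t}^{e}}{W \cdot D_{50}} \right) \mathrm{Pop}_{i} \tag{18}$$

The terms in Equation (17) are defined as follows:

- $M_{i,t}^{e}$ is the cumulative mass delivered to node i up to time t of contamination event e;
- φ is the daily per capita consumption rate (2 L/day);
- Δt is the simulation time step;
- $c_{i,\tau}^{e}$ is the concentration of contaminant at node i and time τ of contamination event e;
- $\alpha_{i,\tau}$ is an instantaneous demand parameter (the instantaneous demand divided by the average daily demand of node i).

The terms in Equation (18) are defined as follows:

- $PA_{t}^{e}$ is the cumulative population affected by event e at time t;
- Φ is the standard normal cumulative distribution function;
- β is the probit slope parameter (0.34);
- W is the mean weight of an individual (70 kg);
- $D_{50}$ is the dose of contaminant expected to affect 50% of those exposed;
- $\mathrm{Pop}_{i}$ is the population served at node i, estimated assuming that a single consumer uses 300 L of water per day.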
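Equations (17) and (18) can be evaluated with a short Python sketch. It assumes the BWSN form of the dose-response model, with concentrations in mg/L and the time step expressed in days so that φΔt is in litres. It also assumes that nodes receiving no contaminant contribute no affected population. The default $D_{50}$ of 9.7 mg/kg is the Nicotine value quoted in this paper. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def population_served(avg_demand_l_per_day, per_capita_l_per_day=300.0):
    """Population at a node, assuming 300 L of water per consumer per day."""
    return avg_demand_l_per_day / per_capita_l_per_day


def population_affected(conc, alpha, pop, dt_days, phi=2.0, beta=0.34,
                        weight=70.0, d50=9.7):
    """Cumulative population affected at the last time step supplied.

    conc[i][t]:  contaminant concentration (mg/L) at node i, time step t
    alpha[i][t]: instantaneous demand / average daily demand at node i
    pop[i]:      population served at node i
    """
    total = 0.0
    for c_i, a_i, p_i in zip(conc, alpha, pop):
        # Equation (17): cumulative ingested mass (mg) per consumer
        mass = phi * dt_days * sum(c * a for c, a in zip(c_i, a_i))
        if mass > 0.0:
            # Equation (18): probit dose-response on dose per body weight
            total += norm_cdf(beta * math.log10(mass / (weight * d50))) * p_i
    return total


# 4 steps of 0.25 day at 1 mg/L gives 2 mg ingested; with D50 = 2/70 mg/kg
# the dose equals D50, so half of the 100 people at the node are affected.
half = population_affected([[1.0] * 4], [[1.0] * 4], [100.0], 0.25, d50=2.0 / 70.0)
```

Truncating `conc` and `alpha` at the detection time step gives the population affected prior to detection.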

To incorporate uncertainty in the background water quality, the objective function originally proposed by Babayan et al. (2005) to optimize WDS design under demand uncertainty was amended to account for background water quality variability and the population affected following a contamination event. Babayan et al. (2005) transformed a stochastic chance-constrained optimization problem into an equivalent deterministic problem. They did so by calculating the standard deviation of WDS junction pressures due to junction demand uncertainty. This standard deviation was then used to characterize the effect of demand uncertainty and as a 'safety factor' in system design. For example, suppose pressures are normally distributed and a junction's mean pressure must exceed a minimum threshold pressure by two standard deviations. This requirement satisfies a 95% chance constraint, because the probability of falling more than two standard deviations below the mean is only about 2.3%.

This approach was applied herein, but with chlorine input concentrations in place of network demands as the uncertain quantity. Chlorine input concentrations were generated by Monte Carlo sampling from a uniform distribution within ±30% of the deterministic model's input parameters, and each resulting chlorine input level was simulated. For each contamination event, the sampled chlorine inputs were used to simulate the fate of the contaminant and the response of chlorine, pH, and alkalinity. The same simulations were used to compute, with EPANET-MSX, the cumulative population affected by the contamination event over the simulation period. During optimization, the water quality signals resulting from each chlorine input condition were passed through the EDA, and the corresponding population affected was computed. The standard deviation of the population affected prior to detection for each contamination event was then computed to provide a 'safety factor' in the EWS design. Equation (19) below shows the resulting objective function. It returns the amount by which the population affected exceeds the allowable population affected, after the allowable value has been reduced by the 'safety factor'.
$$F_{PA}(S) = \frac{1}{E} \sum_{e=1}^{E} \max\left(0,\; PA_{e}^{det}(S) + \lambda \, \sigma_{e}(S) - PA_{allow}\right) \tag{19}$$

The terms in Equation (19) are defined as follows:

- $F_{PA}(S)$ is the population affected greater than the threshold allowable affected population for solution S;
- E is the number of contamination events evaluated;
- $PA_{allow}$ is the allowed population affected by a single contamination event;
- λ is the factor of safety used to define the level of system robustness (λ = 0 gives a deterministic solution; λ = 3 covers approximately 99.7% of the potential affected population levels);
- $\sigma_{e}(S)$ is the standard deviation of the affected population as a function of background chlorine (for a specific EDA);
- $PA_{e}^{det}(S)$ is the population affected prior to detection of contamination event e under deterministic conditions.
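The Monte Carlo safety factor can be sketched in Python as follows. The sketch assumes that the objective averages, over events, the non-negative excess of the deterministic population affected plus λ standard deviations over the allowable population. It also assumes that the sample standard deviation is used; neither detail is stated explicitly in the text. The chlorine inputs are sampled uniformly within ±30% of their deterministic values. The function names are hypothetical, and the population affected for each sample would come from the EPANET-MSX simulations passed through the EDA.

```python
import random
import statistics


def sample_chlorine_inputs(base_inputs, n_samples, spread=0.30, seed=0):
    """Monte Carlo chlorine inputs, uniform within +/-spread of each base value."""
    rng = random.Random(seed)
    return [[rng.uniform((1.0 - spread) * c, (1.0 + spread) * c) for c in base_inputs]
            for _ in range(n_samples)]


def robust_excess(pa_det, pa_samples, pa_allow, lam):
    """Excess of the deterministic PA plus lam standard deviations over PA_allow."""
    sigma = statistics.stdev(pa_samples)
    return max(0.0, pa_det + lam * sigma - pa_allow)


def pa_objective(events, pa_allow, lam):
    """events: list of (deterministic PA, list of PA values under sampled chlorine)."""
    return sum(robust_excess(d, s, pa_allow, lam) for d, s in events) / len(events)


inputs = sample_chlorine_inputs([2.0, 1.5], n_samples=50)  # two hypothetical sources
events = [(100.0, [90.0, 100.0, 110.0]), (20.0, [20.0, 20.0, 20.0])]
deterministic = pa_objective(events, pa_allow=50.0, lam=0.0)  # -> 25.0
robust = pa_objective(events, pa_allow=50.0, lam=3.0)         # -> 40.0
```

Raising λ penalizes sensor networks whose detection performance is sensitive to background chlorine, even when their deterministic performance is acceptable.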

Minimizing the PA objective (Equation (19)) requires that a sensor network promptly detect contaminations originating from locations, and at times, that place large populations at risk of exposure. Conversely, contamination events with more benign input locations and times are proportionally less important to detect promptly. With a limited number of WQMSs, there is an inherent tradeoff between quickly detecting contamination events and detecting all possible contamination events. Minimizing the population affected objective implicitly considers this tradeoff. It prioritizes quick detection of contamination events that rapidly affect large populations, and it allows longer detection times for contamination events that take longer to affect large populations.

To maximize the dependability of the sensor network, the sensors were placed at the locations that provide the most robust and reliable water quality signals. This was accomplished by minimizing the affected population objective above (Equation (19)) while simultaneously minimizing the expected number of false positive detections for each contamination event, as shown below:
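$$\overline{FP}(S) = \frac{1}{E} \sum_{e=1}^{E} FP_{e}(S) \tag{20}$$

$$FP_{e}(S) = \sum_{i \in S} C\left( \{\Delta_{i,t}^{e}\}_{t < t_{e}},\; \delta_{i} \right) \tag{21}$$

The terms in Equations (20) and (21) are defined as follows:

- $\overline{FP}(S)$ is the mean number of false positives reported by solution S;
- E is the total number of contamination events evaluated;
- $FP_{e}(S)$ is the number of false positives reported by S during contamination event e;
- C(·) is a simple Boolean count function. It takes the time series of del values $\Delta_{i,t}^{e}$ calculated at WQMS i and the threshold del value $\delta_i$, and reports the number of times that the WQMS computes a del value greater than the threshold before the contamination intrusion takes place;
- $t_e$ is the input time of contamination event e.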
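The false positive count described above can be sketched in Python as follows. The data layout is a hypothetical assumption: `del_by_event[e][i]` holds the del time series at location i for event e, `thresholds[i]` holds the per-location del threshold found by SPSA, and time is indexed in water quality time steps.

```python
def count_exceedances(dels, threshold, t_inject):
    """Count function C: del values above threshold before the injection time."""
    return sum(1 for t, d in enumerate(dels) if t < t_inject and d > threshold)


def mean_false_positives(solution, del_by_event, thresholds, inject_times):
    """Mean number of false positives of a sensor set across all events."""
    total = 0
    for e, t_e in enumerate(inject_times):
        total += sum(count_exceedances(del_by_event[e][i], thresholds[i], t_e)
                     for i in solution)
    return total / len(inject_times)


del_by_event = [
    {"A": [0.5, 4.0, 0.2, 9.0], "B": [3.5, 0.1, 0.1, 8.0]},  # event 0, injected at t = 3
    {"A": [0.1, 0.1, 0.1, 0.1], "B": [5.0, 0.1, 0.1, 0.1]},  # event 1, injected at t = 1
]
thresholds = {"A": 3.0, "B": 3.0}
fp_both = mean_false_positives(["A", "B"], del_by_event, thresholds, [3, 1])  # -> 1.5
fp_a = mean_false_positives(["A"], del_by_event, thresholds, [3, 1])          # -> 0.5
```

Every added sensor can only add to this count. The objective therefore penalizes sensors that contribute false alarms without reducing the population affected.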

Optimization

An efficient noisy multi-objective messy genetic algorithm (GA) was employed to identify the set of network junctions that best detect contamination events. The GA used adaptive population sizing and generational re-sampling of the contamination event suite to improve its efficiency. Messy GA operators were chosen to allow the GA to add or remove sensors according to system performance. Unlike previous WQMS placement studies, no fixed number of WQMSs, maximum number of allowable WQMSs, or solution cost objective was defined to constrain or minimize the number of WQMSs placed in the network. Instead, minimizing the expected number of false positive detections caused the optimization to remove sensors that added false positive detections without reducing the population affected. Adaptive population sizing, as in Kollat & Reed (2006), improved search efficiency. It also 're-stimulated' the search by initializing new populations throughout the GA run, each sized according to the current Pareto optimal set. Generational re-sampling was incorporated to improve the ability of solutions to generalize and to perform well under uncertain contamination event characteristics. During each generation of the GA, candidate solutions were evaluated against a suite of randomly generated contamination events, and the mean performance across all contamination events was reported. This method is sensitive to the size of the contamination event suite. By the law of large numbers, only a sufficiently large suite approaches the true expected performance (i.e. the mean over all possible contamination events); a small suite may not characterize it adequately. In that case, the generated solutions are likely to perform poorly against contamination events not included in the suite (overfitting).
Using a large contamination event suite leads to better solution generalization. However, it greatly increases the computational burden, because every solution must be evaluated against a large number of contamination events. Prior to the optimization phase, a suite of 400 contamination events was randomly generated. During each generation of the GA, 100 contamination events were randomly sampled from these 400 and used for evaluation. This method permitted a relatively small evaluation suite in each GA generation, while the re-sampling operator continuously exposed the GA to a diverse set of contamination events.
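The generational re-sampling step can be sketched in Python as below. The sketch assumes that each generation draws its 100 events without replacement from the fixed pool of 400; the text does not say whether sampling is with or without replacement. The integers are hypothetical stand-ins for pre-generated contamination events.

```python
import random


def resampled_suites(pool, n_generations, suite_size=100, seed=0):
    """Yield one evaluation suite per GA generation, drawn without replacement
    from a fixed pool of pre-generated contamination events."""
    rng = random.Random(seed)
    for _ in range(n_generations):
        yield rng.sample(pool, suite_size)


pool = list(range(400))                      # stand-ins for 400 pre-generated events
suites = list(resampled_suites(pool, n_generations=50))
```

Each generation costs only 100 event evaluations per solution. Over many generations, however, nearly all 400 events are seen, which discourages solutions that overfit a single suite.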

Messy GAs (Goldberg et al. 1989) consist of two distinct phases: the primordial phase and the juxtapositional phase. During the primordial phase, all possible combinations of variables (up to some number of variables, k) are enumerated and evaluated. The initial population of the GA is then generated by sampling these combinations according to their relative performance. This initializes the GA with a population 'primed' with high-performing variable combinations (building blocks). In this study, k was set to one, so the primordial phase evaluated every single sensor location. For each sensor location, the SPSA algorithm was run before the performance of a sensor at that location was evaluated. A single value was required to score each sensor location before building blocks could be sampled into the GA's initial population. The heuristic loss function, Equation (16), provided this score for each single-sensor location.
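Seeding the initial population from single-sensor building blocks (k = 1) can be sketched in Python as below. The text states only that building blocks are sampled according to their relative performance. The rank-based weights 1/(rank + 1), the random number of sensors per solution, and the location names are therefore hypothetical choices.

```python
import random


def primed_population(scores, pop_size, max_sensors=5, seed=0):
    """scores: location -> single-sensor loss from Equation (16) (lower is better).
    Builds an initial population whose sensors are drawn with rank-based weights."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)               # best location first
    weights = [1.0 / (rank + 1) for rank in range(len(ranked))]
    population = []
    for _ in range(pop_size):
        n = rng.randint(1, max_sensors)                   # hypothetical solution length
        picks = rng.choices(ranked, weights=weights, k=n)
        population.append(sorted(set(picks)))             # drop duplicate locations
    return population


scores = {"A": 0.1, "B": 0.5, "C": 0.9, "D": 1.2}         # hypothetical losses
population = primed_population(scores, pop_size=200)
```

Better-scoring locations appear more often in the starting population, but every location keeps a non-zero chance of being included.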

During the juxtapositional phase, the GA operates similarly to a traditional GA. It evaluates a population of solutions, samples solutions according to their relative performance, crosses and mutates solutions, and places the newly generated solutions into the population of the next generation. In a messy GA, however, crossover is replaced with a 'cut and splice' operator. This operator either cuts two solutions at a point within each solution string and exchanges the portions of the strings around the cut points, or splices the two solutions by joining them end to end. In this study, the probability that a solution was cut was 0.9, and the probability that it was spliced was 0.1. Following cut and splice, mutation redefined, added, or removed one value of a solution string (i.e. one WQMS within the solution). The mutation rate was set according to the size of the current population, such that a single mutation was expected in each generation. Equal probabilities were assigned to adding, removing, or redefining a value. A dominance-based tournament of size 2 was used to select solutions from the current GA population. Tournament participants were compared to the current non-dominated set of solutions. A participant that was non-dominated with respect to that set was declared the winner. In the event of a tie (both solutions 'win' or both 'lose'), a sharing function with a sharing radius of 0.25 was used to calculate the density of nearby solutions. The solution with the lowest local density was then chosen as the tournament winner.
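The cut-and-splice and mutation operators can be sketched in Python for variable-length lists of sensor locations. Cut is applied with probability 0.9 and splice otherwise, matching the probabilities given above. The following details are assumptions: duplicate locations are collapsed after recombination, mutation never removes the last remaining sensor, and the junction IDs are hypothetical.

```python
import random


def cut_and_splice(a, b, rng, p_cut=0.9):
    """Variable-length recombination of two sensor lists (messy GA style)."""
    if rng.random() < p_cut:
        i = rng.randint(0, len(a))           # independent cut points, so
        j = rng.randint(0, len(b))           # offspring lengths can change
        c1, c2 = a[:i] + b[j:], b[:j] + a[i:]
    else:                                    # splice: join end to end
        c1, c2 = a + b, b + a
    # ASSUMPTION: repeated sensor locations are collapsed, keeping order
    return list(dict.fromkeys(c1)), list(dict.fromkeys(c2))


def mutate(solution, candidates, rng):
    """Add, remove, or redefine one sensor with equal probability."""
    s = list(solution)
    op = rng.choice(("add", "remove", "redefine"))
    free = [c for c in candidates if c not in s]
    if op == "add" and free:
        s.append(rng.choice(free))
    elif op == "remove" and len(s) > 1:      # ASSUMPTION: keep at least one sensor
        s.pop(rng.randrange(len(s)))
    elif op == "redefine" and s and free:
        s[rng.randrange(len(s))] = rng.choice(free)
    return s


rng = random.Random(1)
child1, child2 = cut_and_splice(["J1", "J2", "J3"], ["J4", "J5"], rng)
mutant = mutate(child1, [f"J{k}" for k in range(1, 11)], rng)
```

Because the two cut points are chosen independently, offspring can contain more or fewer sensors than their parents. This is what lets the GA vary the number of WQMSs without an explicit constraint.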

Figure 4 provides a schematic overview of the entire methodology.

Figure 4

Schematic overview of the entire proposed methodology.

CASE STUDIES

The proposed methodology was tested on two distinct WDSs: the EPANET Net3 network (Rossman 2000), composed of 92 junctions, 2 reservoirs, 3 tanks, 2 pumps, and 117 pipes; and the Ky7 network (Jolly et al. 2014), composed of 481 junctions, 1 reservoir, 3 tanks, 1 pump, and 603 pipes. The Net3 network has primarily one-directional flow from the sources towards the consumers, while the Ky7 network has extensive mixing and multi-directional flow in the center of the WDS and a more branched layout along the periphery. For each WDS, chlorine was input at the source with a varying concentration. Figure 5 shows the two WDSs, their chlorine input locations, and the daily average network-wide chlorine concentrations following the 12-day simulations. For the Net3 network, the mean chlorine concentration across all junctions was 2.55 mg/L with a standard deviation of 0.32 mg/L; for the Ky7 network, the mean was 2.374 mg/L with a standard deviation of 0.497 mg/L.

Figure 5

Plot of the Net3 and Ky7 networks and respective chlorine input locations. For each junction the darkness represents the average chlorine concentration after the initial 12-day simulation.

For each network, the optimization scenarios defined in Table 4 were conducted. Following optimization, a final evaluation was conducted against 1,000 newly generated contamination events. Each contamination event could take place at any junction in the network, at any time of the day, with a total mass of 0 to 5 kg injected instantaneously into the system. Each proposed WQMS network was exposed to the contamination events, and the population affected prior to detection was recorded. If the affected population was less than 0.5% of the total population served, the solution was deemed to 'pass' that event; otherwise it 'failed.' Solutions were compared based on the percentage of the 1,000 contamination events that they 'passed.' For the previously developed solutions, the EDA parameters calculated using the SPSA algorithm were assigned to the sensors for evaluation. Supplementary data, including all Matlab files and EPANET files, can be found at: https://www.dropbox.com/sh/8iitq658g547t1o/AABH61Ak1LHR3nn_k4GTucpWa?dl=0.
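The final evaluation protocol reduces to a pass rate and a per-event false positive rate, as in the Figure 6 caption. The sketch below assumes uniform sampling of the event junction, start time, and injected mass (the text gives only the ranges). It takes the simulated affected populations and false alarm counts as inputs, because the water quality simulation itself is not shown. `ContaminationEvent`, `sample_events`, and `evaluate_network` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ContaminationEvent:
    junction: str
    start_s: float      # injection time, seconds after midnight
    mass_kg: float      # total mass, injected instantaneously


def sample_events(junctions: Sequence[str], n: int, rng: random.Random,
                  max_mass_kg: float = 5.0) -> List[ContaminationEvent]:
    """Random events: any junction, any time of day, 0 to max_mass_kg (uniform assumed)."""
    return [ContaminationEvent(rng.choice(junctions),
                               rng.uniform(0.0, 86_400.0),
                               rng.uniform(0.0, max_mass_kg))
            for _ in range(n)]


def evaluate_network(affected: Sequence[float], false_positives: Sequence[int],
                     total_population: float,
                     allowable_fraction: float = 0.005) -> Tuple[float, float]:
    """Return (% of events passed, false positives per contamination event).

    affected[e] is the population affected before detection in event e, taken
    from a water quality simulation (not shown); false_positives[e] is the
    number of false alarms raised during that event's simulation.
    """
    if not affected or len(affected) != len(false_positives):
        raise ValueError("need one affected/false-positive value per event")
    limit = allowable_fraction * total_population
    passed = sum(1 for pop in affected if pop < limit)   # 'pass': below 0.5%
    return 100.0 * passed / len(affected), sum(false_positives) / len(false_positives)
```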

Table 4

SPSA parameters

Parameter Value 
 20 
 0.602 
 0.101 
 
 2.5 
 
 
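For reference, a generic SPSA loop in the form described by Spall (1998) is sketched below. The values 0.602 and 0.101 in Table 4 coincide with the gain-sequence exponents α and γ recommended by Spall (1998). Mapping them to `alpha` and `gamma` here, and the roles assumed for the remaining parameters (`a`, `c`, `A`), are assumptions, because the parameter symbols in Table 4 are not recoverable. The loss function stands in for the per-sensor EDA objective and is hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def spsa_minimize(loss: Callable[[Sequence[float]], float],
                  theta0: Sequence[float], iterations: int,
                  a: float, c: float, A: float,
                  alpha: float = 0.602, gamma: float = 0.101,
                  lower: Optional[Sequence[float]] = None,
                  upper: Optional[Sequence[float]] = None,
                  seed: int = 0) -> List[float]:
    """Simultaneous perturbation stochastic approximation (minimization)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha      # step-size gain
        c_k = c / (k + 1) ** gamma          # perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # Bernoulli +/-1
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations estimate every gradient component at once.
        diff = loss(plus) - loss(minus)
        g_hat = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, g_hat)]
        if lower is not None and upper is not None:
            # Project back into the feasible box (e.g. positive window sizes).
            theta = [min(max(t, lo), hi) for t, lo, hi in zip(theta, lower, upper)]
    return theta
```

In this setting `theta` would hold the window size and del threshold; a practical implementation would also need to round the window size to an integer before running the EDA.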

RESULTS

The proposed optimization method (proposed GA) successfully developed networks of WQMSs for contamination event detection using surrogate water quality parameters. Water quality monitoring stations were placed at locations that reduced the expected population affected while simultaneously reducing the expected number of false positive detections. These two objectives also proved effective in governing the number of sensors placed in the network. Traditionally, sensor placement has been framed as a network coverage problem, in which more sensors yield better event detection performance. Because sensors also produce false positive detections, more sensors do not necessarily translate into better system performance. Although more sensors are likely to improve the detection capabilities of a contamination early warning system, they are also susceptible to more false positive detections caused by non-contamination anomalies in the water quality signals, which reduces the dependability of the contamination signal. Increased false positive detections may be mitigated through proper EDA parameterization; however, they remain an important factor to consider in the design of an early warning system. In general, the solutions that provided the fewest false detections had the fewest sensors.
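A back-of-the-envelope illustration of why more sensors can mean more false alarms: if each sensor independently raised a false alarm during an event with probability p, the network would raise at least one false alarm with probability 1 - (1 - p)^n, and n·p false alarms on average. Independence is a simplifying assumption (sensors exposed to the same background chlorine variation will be correlated), and p = 0.02 is an arbitrary illustrative value, not a result of this study.

```python
def network_false_alarm_probability(p_sensor: float, n_sensors: int) -> float:
    """P(at least one false alarm) when each sensor alarms independently with p_sensor."""
    return 1.0 - (1.0 - p_sensor) ** n_sensors


def expected_false_alarms(p_sensor: float, n_sensors: int) -> float:
    """Expected number of false alarms (linearity of expectation; no independence needed)."""
    return n_sensors * p_sensor


if __name__ == "__main__":
    p = 0.02   # illustrative per-sensor false alarm probability, not from the study
    for n in (1, 5, 10, 20):
        print(f"{n:2d} sensors: P(>=1 false alarm) = "
              f"{network_false_alarm_probability(p, n):.3f}, "
              f"expected false alarms = {expected_false_alarms(p, n):.2f}")
```

With p = 0.02, going from 1 to 20 sensors raises the chance of at least one false alarm from 0.02 to about 0.33, which is why an objective that penalizes false positives also tends to limit the number of sensors.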

Final evaluation of the solutions developed herein and in previous studies showed discrepancies in performance between the solutions designed in this study and classical solutions developed for detection of a conservative contaminant (TEVA-SPOT and the Ohar et al. (2015) solutions). In many cases, the classical solutions provided the best performance with respect to the fraction of contamination events detected before an allowable population was affected; however, these solutions led to unreliable false positive detection rates, often around 0.2. When evaluated against Nicotine (the same contaminant used during optimization), the solutions developed without considering background chlorine uncertainty performed roughly equal to the classical solutions, as shown in Figure 6(a), indicating that variability in background water quality leads to false positive detections. In this work, only the solutions developed with consideration of background water quality variability achieved reliable false positive detection rates (Figure 6(a)–(c)). When a second contaminant (Parathion) was considered, the systems generally performed worse (Figure 6(b)). As expected, incorporating both contaminants into the optimization phase improved solution quality for both Nicotine and Parathion intrusions. However, caution should be taken in considering the generalizability of the sensor networks, given that the two contaminants use different water quality models. The increase in false positive detection rates observed in the Parathion evaluations is attributed to differences in baseline water quality levels, which are expected to arise from the characteristics of the Parathion water quality model rather than from background water quality variability. Figure 7 presents the sensor layouts together with the performance of each layout.

Figure 6

Evaluation of all proposed and classical Net3 solutions. The false positive detection rate is calculated as the number of false positives detected per contamination event. The allowable affected population threshold is set to 0.5% of the total population served by the network.

Figure 7

Schematic solutions computed for the Net3 network. For each point, the respective sensor networks are presented within the WDS as the colored points. Please refer to the online version of this paper to see this figure in color: http://dx.doi.org/10.2166/hydro.2018.162.

The SPSA algorithm proved highly efficient in determining a good parameterization of the del threshold value and window size. The receiver operating characteristic (ROC) curves in Figure 8 were developed by varying the del threshold from 10% to 200% of the del value assigned by the SPSA algorithm, in 10% increments (i.e. 10%, 20%, 30%, …, 190%, 200%). The true positive detection ratio (y-axis) was then plotted against the false positive detection ratio (x-axis) for each del threshold. The ROC plots show the difficulty of contamination event detection in WDSs. Ideally, a ROC curve has an area beneath it of 1, with the optimal points located in the upper left corner of the plot (a true positive detection ratio of 1 with a false positive detection ratio of 0). In many cases, the SPSA parameterization corresponds to a point far from this ideal; however, it is a 'near-optimal' point on the respective ROC curve.
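The ROC construction described above can be sketched as a threshold sweep. The sketch assumes that the EDA raises an alarm when a monitored statistic exceeds the del threshold, that the true positive ratio is computed over contamination events and the false positive ratio over background (event-free) periods, and that the area under the curve is computed with the trapezoid rule after anchoring the curve at (0, 0) and (1, 1). `roc_sweep` and `auc` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]   # (threshold scale, FPR, TPR)


def roc_sweep(event_stats: Sequence[float], background_stats: Sequence[float],
              del_spsa: float) -> List[Point]:
    """Sweep the del threshold over 10%, 20%, ..., 200% of the SPSA value."""
    points = []
    for step in range(1, 21):
        scale = step / 10
        thr = scale * del_spsa
        tpr = sum(s > thr for s in event_stats) / len(event_stats)
        fpr = sum(s > thr for s in background_stats) / len(background_stats)
        points.append((scale, fpr, tpr))
    return points


def auc(points: Sequence[Point]) -> float:
    """Trapezoidal area under the (FPR, TPR) curve, anchored at (0, 0) and (1, 1)."""
    xy = sorted({(0.0, 0.0), (1.0, 1.0), *((f, t) for _, f, t in points)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))
```

Raising the threshold can only lower both ratios, so sorting the points by false positive ratio traces the curve from the strictest to the most permissive threshold.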

Similar results were observed for the Ky7 network (Figure 9), where sensors were again placed to best detect a Nicotine contamination event from the response in chlorine, pH, and alkalinity. Again, the solutions developed to incorporate water quality uncertainty performed best when evaluated against random contamination events with potentially uncertain water quality. The solutions developed to detect only Nicotine performed worst; the solutions developed from deterministic surrogate water quality parameters performed better than the Nicotine-only case, but worse than the case incorporating background water quality uncertainty. Unlike the Net3 case study, in the Ky7 network the deterministic optimization case produced a larger number of potential solutions than the uncertain optimization case. Many of the solutions developed in the deterministic case performed quite well during evaluation relative to the uncertain solutions. Interestingly, only the deterministic optimization produced solutions with high false positive detection rates (>0.5). This indicates that the deterministic optimization can develop sensor networks that result in a lower population affected than the uncertain case; however, these solutions are highly sensitive to background water quality uncertainty. Figure 10 presents the sensor layouts calculated for the Ky7 network alongside each layout's performance.

Figure 8

Receiver operating characteristic (ROC) curves for the sensor networks generated on the Net3 network. Window A presents the solutions generated with an initial window size of 150 and an initial del threshold of 2 supplied to the SPSA algorithm. Window B presents the solutions generated with an initial window size of 80 and an initial del threshold of 3 supplied to the SPSA algorithm. Window C presents the solutions generated for Parathion and Nicotine detection with an initial window size of 80 and an initial del threshold of 3 supplied to the SPSA algorithm.

Figure 9

Evaluation results of the Ky7 solutions for detection of a Nicotine injection.

Figure 10

Schematic solutions computed for the Ky7 network. For each point, the respective sensor networks are presented within the WDS as the colored points. Please refer to the online version of this paper to see this figure in color: http://dx.doi.org/10.2166/hydro.2018.162.

ROC curves were generated for the Ky7 solutions in the same way as for the Net3 solutions. In contrast to the Net3 network, the ROC curves for the Ky7 solutions were considerably less favorable, and the EDA parameters chosen using the SPSA algorithm were often located at sub-optimal points along the curve. Interestingly, the ROC curves show that the parameters assigned using the SPSA algorithm are often located at the worst possible parameterizations: points that provide the worst performance with respect to the likelihood of false positive detection with zero benefit to the likelihood of contamination event detection (Figure 11). As a secondary analysis, the EDA parameters were varied and evaluated with respect to the affected population, as shown in Figure 12. This analysis showed the effectiveness of the SPSA method, as the SPSA parameterizations were best able to span the entire solution space. All other assigned parameterizations cluster solutions within specific regions of the solution space; for example, a del value of ½ the SPSA value leads to worse detection performance, and a del value of ¼ the SPSA value leads to a large number of false positive detections regardless of the sensor layout. Increasing the SPSA del value by a factor of 1.5 or 2 leads to worse event detection performance with little benefit in the false positive detection rate. The discrepancy between Figures 11 and 12 is believed to be due to the non-trivial relationship between ROC evaluation measures, event detection likelihoods, and the evaluation measures used in sensor network design. Care should be taken in evaluating event detection algorithms with standard ROC curves, as optimal performance with respect to a ROC curve may not lead to optimal performance with respect to contamination event detection metrics.

Figure 11

Receiver operating characteristic (ROC) curves for the sensor networks generated on the Ky7 network. Each plot corresponds to a specific Ky7 sensor network developed under conditions of uncertain water quality.

Figure 12

Secondary evaluation of the Ky7 EDA parameterization with respect to the allowable population affected threshold.

A number of sensitivity analyses were performed on the Net3 solutions to determine how solutions developed under the naïve conditions of this study would perform in more realistic scenarios. Accordingly, all solutions were evaluated under three conditions: a continuous 12-hour contaminant injection with a total mass equal to that of the baseline instantaneous injection; consumer demands randomly sampled within ±15% of the deterministic model's demand level for each instantaneous contamination event; and an EDA observing only chlorine and pH water quality data. The results of these evaluations are shown in Figure 13.
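The scenario modifications used in the first two sensitivity analyses can be generated as sketched below, assuming an independent uniform demand multiplier per junction in [0.85, 1.15] and a constant-rate injection; the function names are hypothetical. For a 5 kg event spread over 12 hours, the constant rate is about 6.9 g/min.

```python
import random
from typing import Dict


def perturb_demands(base_demand: Dict[str, float], rng: random.Random,
                    spread: float = 0.15) -> Dict[str, float]:
    """Scale each junction demand by an independent factor from U(1 - spread, 1 + spread)."""
    return {j: d * rng.uniform(1.0 - spread, 1.0 + spread)
            for j, d in base_demand.items()}


def continuous_rate_mg_per_min(total_mass_kg: float, duration_h: float = 12.0) -> float:
    """Constant injection rate delivering the same total mass as the instantaneous baseline."""
    return total_mass_kg * 1e6 / (duration_h * 60.0)   # kg -> mg, hours -> minutes
```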

Figure 13

Post-optimization evaluation of sensor networks under variable contamination event and EDA conditions. Plot (a) shows the performance of the sensor networks when evaluated against contamination events injected continuously for 12 hours. Plot (b) shows the effect of demand uncertainty. Plot (c) shows the effect of evaluating the solutions using only the chlorine and pH signals.

These sensitivity analyses provide valuable insight into the proposed systems. First, the detection performance of the sensor networks generally decreases when considering longer contamination events with lower mass input rates, and when considering uncertain demands. In particular, uncertain demand conditions greatly increase the number of false positive detections for all solutions, and they are important to consider when using surrogate water quality parameters for event detection. Second, using only the chlorine and pH water quality signals greatly reduces the false positive detection rates of all solutions. Simplifying the EDA to use as few water quality data streams as possible can reduce system complexity and improve performance; however, it may limit detection to a smaller set of reactive contaminants.

CONCLUSIONS

The simulations herein provide an initial investigation into the influence of uncertain water quality on the placement of water quality monitoring stations. Using surrogate water quality parameters to identify a contamination event combines the signal-processing task of detecting the presence of a contaminant with the sensor placement task of best exposing sensors to potential contamination events. The sensor networks developed herein were optimized to best detect a contamination event under conditions of background water quality uncertainty.

The proposed method successfully placed sensors to best detect contamination events. Minimizing the population affected in excess of an allowable affected population, while minimizing the expected number of false positive detections per contamination event, provided efficient sensor solutions for contamination event detection. Because no constraint was imposed on the number of sensors within a sensor network, the proposed GA placed a sensor only if it reduced the affected population with a minimal increase in the potential for false positive detections. This framing of the sensor placement problem helps ensure that only the most efficient sensors are placed in the water distribution system.

Recent developments in event detection in water distribution systems reflect a desire to reduce the rate of false positive detections produced by event detection algorithms. Identifying the best locations for water quality monitoring stations by explicitly considering the water quality signal observed at each location in the WDS was shown here to provide lower false positive detection rates and strong performance with respect to the population affected by the contamination, even when employing a basic local EDA. However, using surrogate water quality signals to place sensors in a WDS makes the design heavily dependent on the fidelity of the water quality model used. In this study, the solutions developed using the Nicotine water quality model outperformed previous benchmark solutions, deterministic solutions, and contaminant-based sensor placements; and although solution performance degrades when evaluated against a second water quality model, the same trends are observed.

As a preliminary study in EDA-based sensor placement, this work has numerous limitations that suggest directions for future work. First, of the three water quality parameters chosen, pH and alkalinity are highly correlated, as can be observed in Figure 3. In the future, it would be beneficial to retain only one of pH or alkalinity and employ a third measure that is not correlated with free chlorine or pH, or to simplify the model and use only free chlorine and pH (or alkalinity), a combination that showed strong performance for Nicotine detection (Figure 13). Second, the EDA used herein is quite naïve; state-of-the-art integrated system-wide EDAs, or more intelligent algorithms that distinguish water quality anomalies caused by normal operation from 'unknown' anomalies (Hart & McKenna 2009; Romano et al. 2014), should be incorporated into the proposed framework. Given the modularity of the proposed method, any event detection algorithm can be used for the event detection task. An integrated system-wide EDA used during the sensor placement phase would be expected to place sensors at locations that better detect contamination with reduced false positive rates given the observed water quality signal, since such an EDA can recognize false positive events caused by normal operation.

Third, in this study sensor networks were designed and EDA parameters were prescribed based only on instantaneous contamination event simulations. It would be valuable to consider variable contamination event characteristics in future work, specifically for prescribing EDA parameters; considering longer-duration contaminations may improve system-wide event detection performance when contaminants do not travel through the WDS as a clear 'pulse.' Fourth, uncertainty in consumer demands should be integrated into the contamination event simulations and the system design phase to ensure that the sensor networks are robust to variable network hydraulics, and thus to variable contaminant transport times. Demand uncertainty can easily be integrated into the proposed methodology through the proposed objective function (Equation (19)) by simulating uncertain demand realizations alongside uncertain chlorine input realizations. Lastly, new objectives can be incorporated into the optimization framework for additional design criteria. For example, an additional objective could be introduced to best locate a contamination event following detection, integrating the contamination event detection and localization problems into the sensor network design phase. Given the widespread availability of high-performance computing, fully integrating event detection and event localization within the sensor placement task, even with complex water quality simulations and the incorporation of uncertainties, has become more feasible.

ACKNOWLEDGEMENTS

This study was supported by the United States–Israel Binational Science Foundation (BSF), by the Technion Funds for Security Research, by the joint Israeli Office of the Chief Scientist (OCS), Ministry of Science, Technology and Space (MOST), and by the German Federal Ministry of Education and Research (BMBF), under project no. 02WA1298.

REFERENCES

Arad, J., Housh, M., Perelman, L. & Ostfeld, A. 2013 A dynamic thresholds scheme for contaminant event detection in water distribution systems. Water Res. 47, 1899–1908. doi:10.1016/j.watres.2013.01.017.
Babayan, A., Kapelan, Z., Savic, D. & Walters, G. 2005 Least-cost design of water distribution networks under demand uncertainty. J. Water Resour. Plan. Manag. 131, 375–382.
Berry, J., Hart, W. E., Phillips, C. A., Uber, J. G. & Watson, J.-P. 2006 Sensor placement in municipal water networks with temporal integer programming models. J. Water Resour. Plan. Manag. 132, 218–224. doi:10.1061/(ASCE)0733-9496(2006)132:4(218).
Berry, J., Carr, R. D., Hart, W. E., Leung, V. J., Phillips, C. A. & Watson, J.-P. 2009 Designing contamination warning systems for municipal water networks using imperfect sensors. J. Water Resour. Plan. Manag. 135, 253–263. doi:10.1061/(ASCE)0733-9496(2009)135:4(253).
Comboul, M. & Ghanem, R. 2013 Value of information in the design of resilient water distribution sensor networks. J. Water Resour. Plan. Manag. 139, 449–455. doi:10.1061/(ASCE)WR.1943-5452.0000259.
Goldberg, D., Korb, B. & Deb, K. 1989 Messy genetic algorithms: motivation, analysis, and first results. Complex Syst. 3, 493–530.
Hart, D. B. & McKenna, S. A. 2009 CANARY user's manual, version 4.1. Sandia National Laboratories. US Environmental Protection Agency.
Jolly, M. D., Lothes, A. D., Bryson, L. S. & Ormsbee, L. 2014 Research database of water distribution system models. J. Water Resour. Plan. Manag. 140, 410–416. doi:10.1061/(ASCE)WR.1943-5452.0000352.
Kessler, A., Sinai, G. & Ostfeld, A. 1998 Detecting accidental contaminations in municipal water networks. J. Water Resour. Plan. Manag. doi:10.1061/(ASCE)0733-9496(1998)124:4(192).
Koch, M. W. & McKenna, S. A. 2011 Distributed sensor fusion in water quality event detection. J. Water Resour. Plan. Manag. 137, 10–19. doi:10.1061/(ASCE)WR.1943-5452.0000094.
Krause, A., Leskovec, J., Guestrin, C., VanBriesen, J. & Faloutsos, C. 2008 Efficient sensor placement optimization for securing large water distribution networks. J. Water Resour. Plan. Manag. doi:10.1061/(ASCE)0733-9496(2008)134:6(516).
Kumar, A., Kansal, M. L. & Arora, G. 1997 Identification of monitoring stations in water distribution system. J. Environ. Eng. doi:10.1061/(ASCE)0733-9372(1997)123:8(746).
McKenna, S. A., Wilson, M. & Klise, K. A. 2008 Detecting changes in water quality data. J. Am. Water Works Assoc. 100, 74–85.
Oliker, N. & Ostfeld, A. 2014 Minimum volume ellipsoid classification model for contamination event detection in water distribution systems. Environ. Model. Softw. 57, 1–12. doi:10.1016/j.proeng.2014.02.141.
Oliker, N., Ohar, Z. & Ostfeld, A. 2016 Spatial event classification using simulated water quality data. Environ. Model. Softw. doi:10.1016/j.envsoft.2015.11.013.
Ostfeld, A. & Salomons, E. 2004 Optimal layout of early warning detection stations for water distribution systems security. J. Water Resour. Plan. Manag. doi:10.1061/(ASCE)0733-9496(2004)130:5(377).
Ostfeld, A., Uber, J. G., Salomons, E., Berry, J. W., Hart, W. E., Phillips, C. A., Watson, J.-P., Dorini, G., Jonkergouw, P., Kapelan, Z., di Pierro, F., Khu, S.-T., Savic, D., Eliades, D., Polycarpou, M., Ghimire, S. R., Barkdoll, B. D., Gueli, R., Huang, J. J., McBean, E. A., James, W., Krause, A., Leskovec, J., Isovitsch, S., Xu, J., Guestrin, C., VanBriesen, J., Small, M., Fischbeck, P., Preis, A., Propato, M., Piller, O., Trachtman, G. B., Wu, Z. Y. & Walski, T. 2008 The Battle of the Water Sensor Networks (BWSN): a design challenge for engineers and algorithms. J. Water Resour. Plan. Manag. 134, 556–568. doi:10.1061/(ASCE)0733-9496(2008)134:6(556).
Perelman, L., Arad, J., Housh, M. & Ostfeld, A. 2012 Event detection in water distribution systems from multivariate water quality time series. Environ. Sci. Technol. doi:10.1021/es3014024.
Romano, M., Kapelan, Z. & Savic, D. 2014 Automated detection of pipe bursts and other events in water distribution systems. J. Water Resour. Plan. Manag. 140 (4), 457–467.
Rossman, L. 2000 EPANET 2 Users Manual. US EPA Water Supply and Water Resources Division, Cincinnati.
Shang, F., Uber, J. & Rossman, L. 2007 EPANET Multi-Species Extension User's Manual. Risk Reduction Engineering Laboratory, US Environmental Protection Agency, Cincinnati, USA.
Spall, J. C. 1998 Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans. Aerosp. Electron. Syst. doi:10.1109/7.705889.
Weickgenannt, M., Kapelan, Z., Blokker, M. & Savic, D. A. 2010 Risk-based sensor placement for contaminant detection in water distribution systems. J. Water Resour. Plan. Manag. 136, 629–636.
Xu, J., Johnson, M. P., Fischbeck, P. S., Small, M. J. & VanBriesen, J. M. 2010 Robust placement of sensors in dynamic water distribution systems. Eur. J. Oper. Res. 202, 707–716. doi:10.1016/j.ejor.2009.06.010.
Yang, X. & Boccelli, D. L. 2017 Integrated systemwide model-based event detection algorithm. J. Water Resour. Plan. Manag. 143, 04017047. doi:10.1061/(ASCE)WR.1943-5452.0000801.
Yang, X. & Boccelli, D. L. 2016a Dynamic water-quality simulation for contaminant intrusion events in distribution systems. J. Water Resour. Plan. Manag. 142. doi:10.1061/(ASCE)WR.1943-5452.0000674.
Yang, X. & Boccelli, D. L. 2016b Model-based event detection for contaminant warning systems. J. Water Resour. Plan. Manag. 142 (11), 04016048.