A novel methodology to sectorize water supply networks (WSNs) depending on a main transmission line is presented in this paper. The methodology is based on concepts derived from the social network theory and graph theory (namely, community detection and shortest path respectively); and also on a multi-objective optimization process by means of agent swarm optimization (ASO). A series of energy, operative, and economic criteria are optimized in this process. The core idea is to form sectors over the distribution network based on communities found using a community detection algorithm (Walktrap). The methodology is flexible and enables the technical staff in water utilities to make decisions at different stages. It has been tested by generating four feasible solutions over a portion of a real WSN.

## ABBREVIATIONS AND NOTATION

- ASPV
accumulated shortest path value

- ASO
agent swarm optimization

- BFS
breadth first search

- DN
distribution network

*E*elevation

*Q*flow/demand

*H*head

head loss

minimum pressure target

nodal power

operational power

penalty cost

*P*pressure

resilience index

- SP
shortest path

specific weight of water

standard deviation of pressure

total available power

- TN
trunk network

- SPV
shortest path value

- WSN
water supply network

## INTRODUCTION

Water supply network (WSN) segmentation and/or sectorization can be described as a technique for enhancing the operative control of WSNs, including non-revenue water management, pressure control, and chemical quality monitoring. The technique implies the definition of areas that can be partially isolated from the rest of the network. Such isolation is carried out by closing some installed valves (depending on their current conditions), installing new valves, or cutting some pipes. Each of the resulting sectors is then fed by a single (or several) entrances equipped with a flowmeter, which enables the inflow to be permanently monitored. Since the 1980s, when the first sectorization implementation was reported in British cities (Farley 2010), the technique has become very popular, especially among Latin American and European water utilities.

Despite the benefits associated with the deployment of a sectorization layout, there are also some drawbacks. From the outset, a pressure drop can be indicated as the most relevant drawback. When some pipes are closed, the general available flow area is reduced and this increases the head loss due to pipe friction, which translates into an overall pressure drop. Although a pressure reduction can be positive (given its relation with reductions in background leakage flow, frequency of appearance of new bursts and breaks, and domestic consumption (Thornton & Lambert 2005, 2006), an excessive reduction in pressure can also be translated into supply incapacity. Moreover, a sectorization project implementation also implies the investment of economic resources which may sometimes be scarce.

Based upon the above aspects, it can be stated that a good sectorization layout has to find a good compromise among a series of positive and negative aspects. The number of possible layouts may be large, depending on the WSN extension and its looping configuration. In some cities with small and/or highly branched WSNs, it is feasible to define sectors empirically (based only on network maps) and verify them through hydraulic simulation in any hydraulic simulation package; however, in more complex contexts (the case of most large cities) with a very large number of options, the use of mathematical and computational tools becomes necessary.

Since the publication of the *District Metering Areas: Guidance Notes* by members of the Water Loss Task Force of the International Water Association (Morrison *et al.* 2007), several methodologies have been proposed to tackle the problem of WSN sectorization. Most of these studies have addressed the problem using graph theory and energy criteria (Tzatchkov *et al.* 2008; Di Nardo & Di Natale 2011; Di Nardo *et al.* 2014; Alvisi & Franchini 2014) in combination with multi-agent systems (Izquierdo *et al.* 2011; Herrera *et al.* 2012; Hajebi *et al.* 2013), optimization approaches (Gomes *et al.* 2012; Di Nardo *et al.* 2013; Hajebi *et al.* 2014; De Paola *et al.* 2014), and, very recently, with concepts derived from social network theory (Diao *et al.* 2013; Campbell *et al.* 2014a; Giustolisi & RidoIfi 2014).

Sectorization can be addressed in two different manners depending on the topology of the WSNs. When WSNs rely on several sources within the network, sectors can be set around the sources. In this case, sectors may count on one (or several) exclusive sources. However, in other contexts, sources are far from users, and supply depends on a main transmission line (trunk network).

This work describes a flexible methodology tailored to the latter type of network. The methodology encompasses a disaggregation of the WSN into two components, namely, a trunk network (TN) and a distribution network (DN). Disaggregation is carried out in order to separate the TN pipes from the DN pipes, so the TN pipes are not included within the sectors, reducing any likely negative impact of the sectorization implementation over the DN, and enhancing leakage control (as leakage/flow monitoring is focused on the pipes of the DN where the frequency of leakage appearance is higher). The sectors that meet the established size constraints are defined over the DN. Finally, the entrance (or entrances) to each sector is determined by means of an optimization process with agent swarm optimization (ASO) (Montalvo *et al.* 2014). The water utility staff must take decisions based on their experience in three steps of the process: (1) definition of the trunk network; (2) selection of the sector size; and (3) selection of a feasible solution. These decisions define the final outcome of the whole process.

This paper is an extension of the contribution presented at the WDSA 2014 (Campbell *et al.* 2014b).

## GRAPH THEORY APPLIED TO WATER SUPPLY NETWORKS’ SECTORIZATION

Graphs can be described as structures topologically formed by nodes interconnected by links (also known as edges – in this paper, the two terms are used interchangeably). Graph theory is a branch of mathematics dedicated to the study of such structures. The topology of the WSNs coincides with the topology of their associated graphs. Therefore, WSNs can be abstracted as graphs, and the algorithms and concepts belonging to the domain of the graph theory can be implemented over them. In this work, the shortest path (SP) concept is implemented to define the TN, and a community detection algorithm is implemented to define structural communities which are used to define sectors.

The SP between nodes in a given network, or the path with minimum number of vertices, together with community detection, are two of the concepts most broadly explored within the domain of the graph and social network theory. By applying the SP concept over a digraph (a graph with the direction of links initially set) it is possible to determine the frequency of all the possible paths in a graph. The information contained about the pipes of a given WSN can be incorporated in the edges of its graph, including the direction of flow in the pipes, in order to define the directions of the edges on the graph. Evidently, to know the flow direction, a hydraulic simulation is required. When such a hydraulic simulation (e.g., with EPANET) is carried out in the scenario of maximal demand (the point where the demand pattern curve is higher), the flow direction on each pipe corresponds to the one of the worst scenarios.

Once the directions of the edges on the graph are defined, it is possible to calculate a so-called shortest path value (SPV) between each pair of nodes. This is carried out through the widely known breadth first search (BFS) algorithm (Moore 1959) which performs a recursive search of every reachable node from a given starting node. Given a source node (any node and not only the water sources), BFS systematically explores all the vertices of the graph to find the subset of reachable nodes from that source. The distance (in terms of number of vertices) from the source to the entire set of reachable nodes is then computed. Finally, a tree is formed with the source node as the root and followed by all reachable nodes.

It is important to highlight the fact that in the case of unweighted graphs, all the edges are assumed to have the same weight (1), and thus, the SP between any pair of nodes is assessed solely on the number of nodes.

The SP matrix enables ranking the nodes on their importance in the graph. A given node is considered more important the greater the number of nodes it can reach; this corresponds to the sum of every row in the former matrix. In this work, such a sum is called accumulated shortest path value (ASPV). This value can also be transferred to the links by assigning to each link a value that is equal to the ASPV on its corresponding downstream node + 1. Note that 1 is added in order to include the final node of every path, which always has 0 SPV. The greater the ASPV on a given link (or pipe), the greater its importance within the graph, and therefore, in order to maximize the reliability of the WSN, the given pipe should never be included between any sector. As the ASPV of some of the pipes might be considerably greater than the ASPV in other pipes, it is rescaled by dividing the corresponding value of each pipe by the greatest value among all the pipes. In this way, the ASPV of the pipe connected to the main source would have a value equal to 1, and the pipes connected to the extreme nodes should have values near 0.

It is important to note that if the ASPV is used alone, some pipes with a small diameter and flow can be included among the TN if they are able to deliver water to the largest pipes. Nevertheless, if the ASPV is used as a factor that multiplies the flow that is conducted through each pipe, instead of an individual criterion, only pipes with large diameter and flow values are included in the TN. We note such a multiplication by ASPV*.

As a final outcome, a hierarchy of pipes is formed with the sources as roots. If there is more than one source, the pipes connected to each source are expected to be more important in the hierarchy depending upon the number of nodes that each is able to reach.

WSNs in most large cities are formed by blocks of looped networks connected to other blocks and to a TN (using just a few pipes). The pipes of the TN are also expected to be less frequent and to have the highest ASPV* values, whereas the pipes of the DN are expected to be more frequent and to have lower ASPV* (near 0 in the case of pipes that are connected to a node that only receives water). Based on this idea, in this paper, we propose the use of analysis of frequency to determine which lines are part of the TN and which are part of the DN. To do so, a frequency table is constructed with a very large number of ASPV* breaks, in order to obtain the smallest possible intervals. The relative frequency and percentage of cumulative frequency are then calculated. When the relative frequency and the cumulative frequency are plotted in a Pareto diagram it shows how there is a large percentage of pipes with low ASPV* (approximately 80% of the pipes). It is also evident that among the pipes with less frequent ASPV* (the remaining 20%), there are noticeable changes in the tendency of their intervals. These change points can be used as a segregation criterion and will produce several alternative scopes of the TN, from which the technical staff of the water utility can select the most convenient option. For example, the scope can be limited to the set of pipes that, if removed, cause supply problems. In a more general way, the scope may include all the pipes with values that are significantly less frequent.

Original network | Feasible solution 1 | Feasible solution 2 | ||
---|---|---|---|---|

Parameters | Values | Functions | Values | Values |

0.48 | 30.68 (current value: 0.30) | 0.65 (current value: 0.47) | ||

20,524.04 | 2210.97 (current value: 22735.02) | 86.73 (current value: 20610.77) | ||

6.98 | 3.30 (current value: 10.28) | 0.029 (current value: 6.95) | ||

0.00 | 259.59 | 0.00 | ||

0 | 2 | 2 | ||

0 | 5 | 5 |

Original network | Feasible solution 1 | Feasible solution 2 | ||
---|---|---|---|---|

Parameters | Values | Functions | Values | Values |

0.48 | 30.68 (current value: 0.30) | 0.65 (current value: 0.47) | ||

20,524.04 | 2210.97 (current value: 22735.02) | 86.73 (current value: 20610.77) | ||

6.98 | 3.30 (current value: 10.28) | 0.029 (current value: 6.95) | ||

0.00 | 259.59 | 0.00 | ||

0 | 2 | 2 | ||

0 | 5 | 5 |

The idea of marking, distinguishing, and/or decoupling the TN from the DN has been previously proposed in works related to the sectorization of WSNs (e.g., Di Nardo & Di Natale 2011; Diao *et al.* 2013; Campbell *et al.* 2014b; Hajebi *et al.* 2014). Some of these studies have suggested the use of parameters associated with the hydraulics of the network (flow, head loss, diameters), or the use of a measure known as *edge betweenness*, which characterizes the centrality of the pipes throughout the network (Fortunato 2010). Using only the edge betweenness parameter has a drawback: the pipes connected to the tanks would have lower values for this parameter, making the process more complex. Using only flow as the criterion to distinguish the TN from the DN may include among the TN several pipes that are delimited for a reduced area (not so relevant for the hydraulics of the system), while the use of diameters may work well on a logical design of WSNs. However, in several real situations, it is possible to find over-dimensioned or sub-dimensioned pipes. Based on all this, the ASPV proposed here can be considered a more robust and accurate parameter to distinguish the TN from the DN.

Figure 2(b)–2(d) show three alternative scopes of the TN. Figure 2(b) includes all the pipes with values of cumulative frequency below 85%. This corresponds to the less strict scenario. Figure 2(c) corresponds to the first change in the interval values. The removal of any pipe in this subset of pipes would cut the water supply for at least a portion of the network. Finally, Figure 2(d) represents the second change of interval values, this being the stricter scenario. The removal of any of the pipes in this subset would cut the water supply in the whole network.

## COMMUNITY DETECTION ALGORITHM

As mentioned above, the sectorization of WSNs depending on a TN must be approached in a different manner than the sectorization of WSNs where the sectors are established around individual water sources. In the first case, it is convenient to sectorize based on the topology of the network, which, as described below, is related to the concept of community detection in social networks.

Within graph theory, clustering techniques (or subdivision of sets of data into highly similar groups separated by few connections) are of major interest. These techniques can be subdivided into two subgroups: flat techniques (where the number of subdivisions is a priori defined (*k-means*, *clara,* spectral clustering) and an unrelated set of clusters is produced) and those techniques where there is no initial number of divisions (producing a hierarchy of related clusters – see Manning *et al.* (2008)). The main difference between community detection and clustering is that, while communities in graphs are related, explicitly or implicitly, to the concept of edge density (number of edges inside a cluster or community versus the number of edges that connect different communities), in data clustering, clusters are sets of multidimensional vectors that are ‘close’ to each other in distance or similarity (Newman & Girvan 2004; Newman 2006a; Fortunato 2010).

One of the most widely known algorithms to detect communities in graphs is known as *hierarchical clustering*. This algorithm works based on the construction of clusters following a hierarchical structure. It can be classified among the unsupervised machine learning methods, and is widely used for classification and pattern recognition purposes. The idea behind this method is to build pairwise groups that successively merge with other groups until all the nodes of the graph are part of the same group (when a bottom-up strategy is followed); or groups successively subdivide until all the clusters are formed by one cluster or singleton (when a top-down strategy is followed). All merges or separations may be depicted through a dendrogram, which is a diagram in an X-Y plan, where the X axis contains all the observations or vectors, and the Y axis contains a scale that establishes a numerical value for each partition. One of the advantages of this technique, with respect to other clustering techniques, is the fact that it is not mandatory to initially set a number of groups on which the data must be subdivided. In this sense, there are several indicators that can be used to determine the ‘best partition’. The most widely known is the silhouette width index proposed by Rousseeuw (1987).

Despite the advantage that is represented by the fact that it is not necessary to initially predefine a number of clusters, the methods to find the most appropriate partitioning might not be suited to all situations. Most recent community detection algorithms tackle this problem by setting some quality partitioning indexes, the modularity index being the most broadly known (Newman 2006b).

In the fields of mathematics and probability theory, random walks are defined as stochastic processes where the position of a given particle at a certain instant relies solely on its position at a previous instant and on a random variable, which determines its subsequent direction and step length.

The Walktrap algorithm proposed by Pons & Lapaty (2006) is based on the idea that random walks in a graph tend to get ‘trapped’ into densely connected parts corresponding to communities. If random walks of a given length are preformed over a graph, the information in the *P* matrix (probability matrix, transition matrix, substitution matrix, or Markov matrix) about the probabilities to go from node *i* to node *j* in *k* steps is sufficiently long to gather enough information about the topology of the network.

*k*partitioning and corresponds to the distance described above.

It is important to point out two aspects: first, this function depends only on each cluster and its minimization does not require information about other sectors; second, the method follows a greedy strategy; thus, each pair of adjacent sectors is merged and the variation is computed. The merging that induces the lowest is selected as the new partition.

*m*is the number of edges,

*n*is the number of nodes, and

*H*is the height of the dendrogram. In terms of comparison with other community detection algorithms, the tests performed by Orman

*et al.*(2011) rank it as the second best community detection algorithm, just after the Infomap algorithm (Rosvall & Bergstrom 2008). The results obtained by Savić

*et al.*(2012) when comparing it with three community detection algorithms – namely, Newman-Girvan (Newman & Girvan 2004), label propagation (Raghavan

*et al.*2007), and greedy modularity optimization (Clauset

*et al.*2004) – show the advantages of the algorithm in terms of community quality and evolutionary stability.

It is important to note that the number of partitions in the resulting dendrogram can be significantly large (equal to the number of nodes), and so a manual selection is not feasible. To deal with this, a maximum threshold of pipe length per sector is defined. The threshold should be defined by the water utility staff based upon the experience in dealing with the leakage events in the network and the background leakage rate. For example, if the bursts frequency and the background leakage flow are high, then the pipe length per sector must be smaller than in the case where these parameters are lower. In this regard, GIZ *et al.* (2011) recommend 4–30 km of pipe length as an appropriate sector size range.

Once the threshold is defined, a loop is applied to compare all the partitions, starting from the best partition of the dendrogram and going either up or down, as necessary. The first partition where the length of all the sectors is under the maximum threshold value is selected. Partitions close to the best partition are stronger in terms of communitarian structure, this being the reason why the first partition that satisfies the established subdivision criterion must be selected.

In the previously described searching process, some small communities (communities that are not worth setting as sectors) can be further merged with larger sectors. Taking this into consideration, the maximum threshold can be slightly varied (lowered) so that, if a further merger affects a large sector, the final pipe length does not violate the maximum length constraint. For the above example, the maximal size constraint was set at 6 km.

### Merging small communities

A candidate pipe, as explained later, is a pipe that can be assigned as a boundary pipe between two given sectors, or assigned as a sector entrance.

*MSL*stands for minimum size length and stand for candidate pipe

*i*.

## MULTI-OBJECTIVE OPTIMIZATION

Once the list of candidate pipes is obtained, the next step corresponds to the selection of the entrance of each sector. As explained above, the pressure drop caused by closing some pipes is proportional to the number of those pipes. Nevertheless, when a candidate pipe remains open, a flowmeter must be positioned on it to control the flow. Setting more candidate pipes as closed pipes than sector entrances is preferred for two reasons: first, the smaller the number of entrances to each sector, the greater the accuracy in controlling inlet flow and, therefore, the greater the probability of detecting a new leak; on the other hand, the cost of installing a flowmeter in economic and operative terms is expected to be higher than the cost of installing a boundary valve (when it is necessary to install a new valve).

The problem is addressed as a multi-objective optimization problem with five objectives: (objective 1) the minimization of the deviation , of the resilience index proposed by Todini (2000); (objective 2) the minimization of the increase of operational power ; (objective 3) the minimization of the variation of standard deviation of the pressure between the network nodes; (objective 4) a penalty cost , for the nodes that are unable to meet a minimum pressure target ; and finally (objective 5) the minimization of the cost associated with the purchase of new valves and flow meters . The first two objectives can be considered as energy criteria, the next two as operative criteria, and the fifth objective as an economic criterion. The optimization problem is solved using the ASO algorithm proposed by Montalvo *et al.* (2014). Agent swarm optimization is a novel paradigm that exploits swarm intelligence and borrows some ideas from multi-agent based systems. It is aimed at supporting decision-making processes by solving multi-objective optimization problems. ASO offers robustness through a framework where various population-based algorithms co-exist. In addition, agents are endowed with sets of problem-dependent specific rules with two clear objectives, namely fine-tuning the agent behavior to specialize in the optimization problem being tackled, while, at the same time, reducing the search space, thus enabling decisions to be produced with increased reliability and within a reasonable time frame.

Each of the former functions was included as it was considered that they fairly represent the most important aspects that need to be considered for the sectorization of a middling or large WSN. However, as explained below, there are still some other important aspects that should be taken into consideration.

In the following sections the objectives are described.

### Objective 1: variation of the resilience index

The measures the capacity of a network to overcome pipe failures and still be able to satisfy a minimum pressure restriction.

*:*head of a tank ; : flow of a pump ; : head of a pump .

*et al.*(2014) (Equation (10)): where : after sectorization; : before sectorization.

### Objective 2: minimization of the variation of the operational power

### Objective 3: minimization of variation of the standard deviation of pressure between the network nodes

Pressure uniformity in WSNs plays an important role in the reduction of . More pressure-uniform WSNs are expected to be more efficient in energy terms, and to some extent, in terms of leakage management as well. The second aspect is related to the control of excessive pressures. When pressure is reduced, a reduction in the background leakage flow is to be expected. However, leakage control by means of pressure reduction has to be carried out with extreme precaution as it might also be translated into supply incapacity (Morrison *et al.* 2007).

### Objective 4: penalty cost for deficit of nodal pressure

*m*); : minimun pressure target ; : pressure in node .

In the equation above, *n* represents the number of nodes where the pressure constraint is not satisfied.

A symbolic value of 0.8 monetary units/m was assumed for .

### Objective 5: economic criteria

As described in the Introduction, when performing sectorizations, water utilities require some investment for the purchase and installation of boundary valves (if the existing valves are in poor condition or there are no valves) and flowmeters. Normally, valves are cheaper than flowmeters. In some countries (especially developing countries), they are completely buried, with a small empty cylinder giving access. Conversely, flowmeters are more expensive and must be located in special chambers. Such chambers need enough room to install the flowmeter as well as a bypass, pressure gauge, and at least three control valves (two for flowmeter maintenance and one for the bypass). The chambers also require enough room for a person to be able to perform maintenance. In a real situation, the construction of a flowmeter chamber disturbs normal city life as it interrupts traffic, especially when chambers have to be located over the TN pipes. This type of impact can be considered as an externality and is not normally taken into consideration. As this methodology isolates the TN from the DN, the impact described above is significantly reduced.

### Limitations of the optimization model

It is very important to stress that although this optimization scheme may work to conveniently address the problem of sectorization, the results cannot be considered as completely optimized. A reduction of the network pressure not only reduces the background leakage flow, but also the frequency of new pipe breaks and bursts (Pearson *et al.* 2005; Fantozzi & Lambert 2007). This is very important, as a 28–80% reduction in break-bursts can be translated into a reduction of annual repair costs that is usually far greater than the value of the water saved (Thornton & Lambert 2006). A pressure reduction is also expected to have an effect on domestic consumption (Lambert & Thorton 2005). Therefore, such effects need to be considered among the optimization functions.

It is also very important to note that the optimal point of leakage reduction varies according to the cost of water production and transportation; therefore, this point must be considered as a restriction of the optimization model.

Based on these two aspects, a more complete optimization model must initially define the economic level of leakage reduction, and based on that level, estimate the required pressure reduction and the set of feasible alternative sectorization layouts. We are currently working on this new model.

## APPLICATION TO AN EXAMPLE NETWORK

Original network | Solution 1 | Solution 2 | Solution 3 | ||
---|---|---|---|---|---|

Parameters | Values | Functions | Values | Values | Values |

0.55 | 12.53 | 14.93 | 16.52 | ||

24788.61 | 171.58 | 161.77 | 173.76 | ||

5.47 | 0.511 | 1.20 | 0.51 | ||

0 | 0 | 0 | 0 | ||

0 | 6 | 6 | 6 | ||

0 | 15 | 15 | 15 |

Original network | Solution 1 | Solution 2 | Solution 3 | ||
---|---|---|---|---|---|

Parameters | Values | Functions | Values | Values | Values |

0.55 | 12.53 | 14.93 | 16.52 | ||

24788.61 | 171.58 | 161.77 | 173.76 | ||

5.47 | 0.511 | 1.20 | 0.51 | ||

0 | 0 | 0 | 0 | ||

0 | 6 | 6 | 6 | ||

0 | 15 | 15 | 15 |

## CONCLUSIONS

This paper tackles the problem of WSN sectorization depending on a TN by using a novel methodology based on the concepts of SP derived from graph theory and the Walktrap community detection algorithm derived from social network theory, and a multi-objective optimization process using ASO. The methodology provides at different steps, feasible solutions, from where the technical staff of the water utilities may select the most convenient solution. To address the definition of the TN in a WSN (where it is not yet defined) a hierarchy of pipes, according to their importance for the supply, is established based on the ASPV* coefficient proposed here. This criterion is more robust than the use of a single characteristic of the pipes because it considers the flow magnitude and the direction of each pipe in the most critical scenario. Such a hierarchy of pipes also enables the water utilities to define the scope of the TN based on the characteristics of each context. Once the TN is defined, it is detached from the DN, in order not to include it among the sectorization, thus, ensuring the reliability of the WSN.

The Walktrap algorithm creates a structural hierarchy of communities in the DN, from where a partition can be selected out of a set of obtained partitions, depending on the given context. The advantage behind the use of a community detection algorithm is related to the fact that it reveals the community structure that is expected to be found in real WSNs, which is normally formed by blocks of looped subnetworks separated from other blocks by a few links. It also guarantees homogeneity throughout the sectors, since it is expected that each block is also formed by similar nodes. Finally, when sectors are defined, the TN is reincorporated and a set of candidate pipes is established. This set is formed by the pipes that separate sectors, and those that separate sectors from the TN. From this set, the pipes that are established as entrances to each sector are defined. This definition is carried out by means of a multi-objective optimization process performed with ASO. In this process, reliability, operative, and economic criteria are optimized. Once again, the process does not provide a single solution, but a set of feasible solutions, from where the staff of the water utility can select the most convenient solution.

Despite the good results obtained by the application of the proposed process, the results cannot be considered completely optimized, as some aspects are still unconsidered. The first two are the effect of the pressure reduction over the frequency of new pipe break-bursts and over a reduction of domestic consumption. The third aspect is the economic level of leak reductions, which must be evaluated before any exploration of sectorization layouts.

This analysis is carried out in the most critical scenario in order to give a high level of importance to the reliability of the network. Therefore, once a specific sectorization layout (or a few options) is selected it is necessary to perform extended period simulations.