## Abstract

This paper presents a novel method using a clustering, detection, and optimization model to devise a solution of fully automatic partitioning in a water distribution system (WDS). First, the Black Hole Clustering Algorithm is employed to divide the WDS into different partitions. Second, two types of outliers are eliminated by multistage iterative processes including traverse, k-Nearest neighbor, and the Warshall algorithm. Finally, the boundary conditions of the partitions are optimized by a Non-dominated Sorting Porcellio Scaber Algorithm to minimize the number of boundary pipes required to balance pressures and reduce leakages. Seven WDSs are employed as case studies to verify the practicability of the method. The Open Water Analytics toolbox is applied to code the hydraulic calculation program. The result demonstrates that average pressure and leakage cost decreases after optimization.

## HIGHLIGHTS

A novel partition method based on monitors pressure sensitivity analysis was presented.

Partition errors were examined by a new outliers searching model.

The minimum number of valves, pressure uniformity function, and leakage uniformity function were employed to optimized WDSs.

## INTRODUCTION

Fossil energy is one of the most important resources for humans to survive and develop. About 4% of the yearly total energy power is consumed via the water distribution system (WDS) in developed nations (Pasha & Lansey 2014). Additionally, up to 15% of water from a WDS is lost in developed nations and this loss is even higher, 35%, in developing nations (Zhang *et al.* 2016). In China, the leakage rate of water was more than 27% (Laiyun 2015). Reducing leakage can save costs and improve the quality, efficiency, and safety of the water supply, promoting socio-economic development. In order to reduce leakage, it is essential to reconstruct and extend old urban WDSs.

The best approach to reconstruction efforts is to determine the optimal partitions in a WDS. Partition means to divide the WDS into different zones, which is widely applied in many cities around the world, and many studies have presented the methods of optimal partition problems.

Malcolm and Farley presented the District Metered Area (DMA) model in 1989. In this model, WDS can be divided into different partitions. In each partition, pressure-reducing valves (PRVs) are installed. Control of these valves can reduce leakage (Ulanicki *et al.* 2003). However, this is a static model with only one partition solution for any time. With the development of sensor equipment, massive data is more easily acquired. Data analysis technology can be applied in WDS to create an energy-saving condition (Wu & Rahman 2017). An optimal partition can convert the static to a dynamic model by using sensor equipment. Zhang and Shen presented a channel networks model, which is a dynamic embedded model (Zhang & Shen 2007). The layers of partitions are determined and then analyzed layer by layer to set the boundaries of valves and sensors. According to the sensor data, valves that can be opened or closed in a WDS are gradually optimized. Finally, the state of the steady and unsteady flow can be obtained.

Due to limited conditions of economy and manpower, a real WDS cannot directly be used to divide from the start, thus simulation software should be used to precheck the effectiveness of the partition model. EPANET 2.0 is a calculator for the hydraulic model from the U.S. Environmental Protection Agency that can be used for extended period simulation (EPS) (Rossman 2000) and is able to conveniently process real-time hydraulic simulation. EPANET 2.0 can estimate the conditions of WDSs if the model meets any hydraulic limiting requirements (Price & Ostfeld 2016) and can also be re-developed with the Visual studio or Matlab platform. For instance, Porteau 4.0, a modification of the EPANET software, has been developed, which can partition a WDS into zones and optimize pipes, valves, and tanks (Gilbert *et al.* 2017).

Nowadays, one-key generation technologies are widely developed. A lot of time and manpower can be saved because of the qualities of easy use of the personal computer. Although significant efforts have been made to develop partitioning technologies in WDS, few can realize fully automatic partitioning. Here, a fully automatic partitioning method in WDS is presented to expedite partition speed for engineers. In this method, engineers only need to input a WDS file into a mini software to solve the program.

The full-automatic partitioning methodology is described in the methodology section. A Black Hole Clustering Algorithm was firstly applied to a WDS in the partitioning algorithm subsection. To eliminate generated outliers due to false clustering, three detection methods including traverse, k-Nearest neighbor, and the Warshall model were applied to remove outliers in the outliers detection subsection. Furthermore, the optimization algorithm subsection showed that the Non-dominated Sorting and Porcellio Scaber algorithms were combined to solve a multi-objective optimal problem in WDS. In the case studies section, seven WDSs were employed to verify the practicability of the method, and the results are presented. The conclusion and improvement of this work are presented in the conclusion section.

## METHODOLOGY

### Partitioning algorithm

The Black Hole Clustering Algorithm (BHCA) was used to divide the WDS into different partitions for the first time. This method was presented by Hatamlou (2013), and references the form of black holes in space. A black hole is created when a giant planet collapses. Its gravitation is so intense that nothing can escape if it enters within the Schwarzschild radius. In other words, anything that enters this radius will disappear forever.

*l*, the maximum number of iterations are set to

*I*, and the star size-population size

*P*is manually set. Next, nodes in WDS weights are considered as variables. In this application, the pressure differences at monitor points between leakage and normal states serve as the stars (weights). The difference equation can be simply expressed as follows:where is the pressure difference of the monitor point

*j*between node

*i*leakage and the normal amount at time

*t*; is the pressure of the monitor point

*j*at time

*t*when node

*i*is experiencing leakage; is the pressure of the monitor point

*j*at time

*t*when node

*i*is normally functioning and not leaking.

*i*weights at time

*t*;

*m*and

*n*are the number of nodes and monitor points, respectively.

Next, the average of the matrix value was calculated as: , where *T* is the total time.

*c*can be randomly generated, which is similar to the matrix where each value range is greater than the minimum pressure difference and less than the maximum pressure difference in matrix . The formalism of this matrix is as follows:where

*l*represents the number of stars, or the WDS partitions number;

*n*is the number of monitor points; and is a vector of the partition center

*i*values.

*i*weight and the center of partition (). is a diagonal matrix including

*n*elements whose diagonal elements are , where is the degree of membership of in matrix . Bezdec (1981) defined the calculation of by following equation:where

*m*is the power exponent ().

For each node, the least distance value is selected in Equation (6), and allocated to divide the node into partitions with respect to the minimum membership degree [].

*p*groups objective function and standard of clustering results are solved. Searching the objective function for the minimum solution in the population size of groups provides the local optimal solution. Then, the black holes with a Schwarzschild radius

*R*can be represented as:

Next, calculate each distance between the partition centers in groups and the local optimal solution (Farzadnia & Shirazi 2017). If a center distance is greater than the Schwarzschild radius *R* of the black hole, the stars (partition centers) around it are absorbed. Meanwhile, new stars will be regenerated, randomly.

Next, the objective function minimum solution is recalculated and the center location of partitions is updated until reaching the maximum iterations *I*. Finally, the best optimal solution with the least objective function value is obtained. WDS is divided into different partitions using Equation (6). The process of clustering is illustrated in Figure 1.

### Outliers detection

After clustering, WDS is divided into *l* partitions. However, some outliers exist in each partition. For instance, as shown in Figure 2, a few nodes belonging to partition 1 are falsely allocated into the zone of partition 2, but are not connected with other nodes of partition 2. These nodes are called as the first outliers. In the same way, monitor points are also assumed as first outliers that are falsely allocated to partition 0 (temporary partition).

To eliminate the first outliers, the connectivity of all of nodes should be determined. To do this, a partition is selected and the connectivity of all nodes are traversed in this zone. If a node is not connected with any other nodes in this zone, this information will be recorded in the database and its partition index is then set to 0 (temporary partition). Second, other partitions are then selected and the nodes are traversed until all partitions are iterated. Then, all nodes in partition 0 are allocated to adjacent zones (excluding partition 0) so that nodes of partition 0 are connected to a partition with nodes with more connections than the other partitions. In this way, the first outliers are eliminated.

As shown in Figure 3, the partition index of two outliers is first set to 0. The connection states of the nodes are then analyzed. Outliers 1 and 2 are connected by two nodes of partition 1 and the zero node of partition 2. Two nodes are allocated to partition 1 because the number of connected nodes in partition 1 is greater than node in partition 2.

Although the first outliers are eliminated and the partition becomes integrated, nodes of each partition are still not completely connected. Nodes may be isolated because the clustering algorithm does not consider coordinates including the horizontal position (x, y) and the elevation (z) as weights, considered as second outliers, as illustrated in Figure 4.

These outliers are eliminated in two steps:

In the first step, a kind of K-Nearest neighbor (KNN) detection algorithm is employed (Valizadeh *et al.* 2009), which is a distance detection method based on a few outliers distance where the outliers are far from others. The advantage of KNN is that the distributed method is integrated, making the proximity measure of outliers more easily achieved than by using other statistical distributions methods.

*N*is equal to 4, four nearest distances, 198.20 m, 484.16 m, 431.32 m, and 596.33 m are selected. According to Equation (10), is calculated as follows:

d_{1j}
. | Distance (m) . |
---|---|

2 | 596.33 |

3 | 789.64 |

4 | 491.32 |

5 | 484.14 |

6 | 198.20 |

d_{1j}
. | Distance (m) . |
---|---|

2 | 596.33 |

3 | 789.64 |

4 | 491.32 |

5 | 484.14 |

6 | 198.20 |

(427.50 m) belongs to the boundaries; thus, node 1 is not a second outlier.

When all nodes in every partition have been searched, the acquired outliers of partitions index are set to 0.

In the second step, although all outlier nodes have been searched, a few nodes of each partition remain not connected. To address this, the Warshall algorithm (Gao 2014) is employed to judge the connectivity of nodes at each partition.

The Warshall algorithm was presented by Warshall (1962), and is an efficient way to judge undirected graph connectivity. In this algorithm, the reachability of nodes at each partition is expressed by an adjacent matrix, in which rows and columns are used to represent the node numbers. The elements are equal to 1 if two nodes are connected or 0 if they are not connected. This method starts from the first column diagonal element and finds all nodes connected with it in this column, recording the states of connection. By traversing the diagonal element for all columns, an array can be obtained. Next, starting from this array of rows, all connected nodes are found and all the subsets are deleted until the array elements do not change.

An example of an undirected graph is shown in Figure 5.

*A*of Figure 5 can be defined as follows:where is the connection between nodes

*i*and

*j*. For example, is equal to 1 because nodes 3 and 4 are connected. is equal to 0 because nodes 3 and 5 are not connected.

Then, the connected states of the undirected graph are calculated as shown in Figure 6.

After judging the connectivity of nodes, the maximum number of connected nodes in a zone is retained for each partition and the other partition index values are set to 0. If a partition has two maximum connected nodes subsets, only the first is retained. Here, 1, 2, 3, and 4 nodes are retained and 5, 6, 7, and 8 of the partition number are set to 0 in Figure 6.

When two steps of the second outliers detection are completed, the first outlier detection process is repeated. The multistage iterative process is performed until no outliers can be detected. The outlier detection process is illustrated in Figure 7.

### Optimization algorithm

After outlier detection, WDS is divided into different partitions without any abnormal nodes, which can be made sure by checking the connectivity of each partition. Next, it is important to decide whether the connection pipes between two partitions, as shown in Figure 8, are open or closed to balance the pressure of the network and reduce leakage. This can be implemented by using valves on these pipes. To solve this problem, a novel method that combines two algorithms into a Non-dominated Sorting Porcellio Scaber Algorithm (NSPSA) was developed as a multi-objective function optimization method.

The Non-dominated Sorting (NS) algorithm is a lower complexity procedure than other approaches (Deb *et al.* 2002). NS requires two parameters to be calculated including the domination count whose solutions dominate *p* and the domination solutions , which is dominated by variable *p*. The Porcellio Scaber Algorithm (PSA) is a new method to solve optimization problems, and is inspired by the observation of two survival rules of a species of woodlice, aggregation and propensity (Zhang & Li 2017). PSA is like a genetic algorithm (GA), which is a kind of group optimization algorithm. But it contains some new features compared with GA: one is the optimal swarm solutions for searching in the solution space without crossover and mutation process, which can avoid getting into local optimum and is easy to implement and does not have many parameters to be adjusted.

In this analysis, three objective functions are employed to optimize WDS for balancing pressure and reducing leakage.

*et al.*2017):where is the connection pipe at partition

*i*when the valve is open, or ; is the sum of the WDS partition boundary pipes; is the sum of the open connection pipes.

*n*is the number of nodes; is the pressure of node

*i*at time

*t*; is the minimum of all node pressures at time

*t*; is the average pressure of a node at time

*t*; is the pressure uniformity value.

*i*,

*j*(connecting with pipe ), respectively.

*t*; is the water level of tank

*i*; and are the minimum and maximum, respectively, of all tank water levels at time

*t*.

For better balanced pressure, we hope to obtain the best solution of under the best solution and . Thus, NSPSA is applied to solve for the minimum values of three objectives, and the process is as follows:

In the first step, parameters were all initialized:

*G*for Porcellio Scabers

*X*randomly, and if the state of the pipe is opened or closed. For instance, if

*G*is 3 and the connection pipes are 4, the variable matrix

*X*is:where is the value of pipes

*j*for population

*i*.

In the second step, were solved using Equations (14) and (19), and the pressure solved by hydraulic calculation for calculating and . Finally, compare the boundaries of and . If they meet the requirement, retain and , or set them to Inf (infinity).

In the third step, a non-dominated sort (Srinivas & Deb 1994) was done:

Then, find all the individuals of whose functions are better than the other individuals in the population and save those into set and mark as rank 1. Next, traverse and find the for each number *i*. Traverse each of all the individual and then execute . If , save it into set *Q*. Then, = , and repeat this process until is a null set.

Because all function values in (1) are less than (2), (3), is equal to 0 and (2), (3) are combined to the and .

Then, mark each element of (1) to rank 1. Similarly, (2) is less than (3) but larger than (1), so is equal to 1 and . (3) is larger than (1), (2), so is equal to 2.

is not a null set; traverse . and = 1−1 = 0, so .

Mark element of (2) to rank 2. -1 = 2-1 = 1, reject it. = and traverse . -1-1 = 2-1-1 = 0, so and mark element of (2) to rank 3. = , and traverse .

Then, is a null set, so *Q* is a null set. = and is null.

Finally, (1), (2) and (3) are ranked as 1, 2, and 3.

*m*.

*p*is a function to map the fitness of Porcellio Scaber to an action strength. The equation

*p*is as follows:where is a random vector of the same dimension as

Then, combine half of the old population with half of the new population, which are all half of higher sort elements.

Finally, update the PSA and execute step (2) to (5) until reaching the maximum iteration *I*.

The process of clustering is illustrated in Figure 9.

## CASE STUDY

A WDS case study of Exa7 (Bentley Systems 2013), shown in Figure 10, was employed to verify the practicability of the above method. The network has 381 junctions (including nine SCADA pressure monitor points), 469 pipes, three pumps, and one tank. The average water usage is 5,220 m^{3}/h. The leakage discharge is set to 1% of the nodal demand if there is node demand or 3% of the average demand of 156.60 m^{3}/h, equal to 4.70 m^{3}/h. The WDS partition index *l* is 10. The time interval of hydraulic calculation is set to one hour in EPS and the maximum time is set to 24th hour.

The system is as follows: CPU is i7-3410 m; RAM is DDR3-16G; ROM is ssd-256G; OS is Windows 10, and Matlab-2016a, and Open Water Analytics (OWA) toolbox is used.

Multistage iterative full-automatic partitioning method in WDS of Exa7 is performed as follows:

- 1.The Open Water Analytics (OWA) toolbox (McDonnell
*et al.*2007) is used for hydraulic calculations. This is an EPANET classlib of Matlab developed by the KIOS Research Center for Intelligent Systems and Networks, University of Cyprus. First, monitor points are selected based on the pressure sensitivity analysis model. In Exa7 WDS, we can select monitors via the method of Zhang*et al.*2016 for a reference. It will be concisely illustrated in this section. Hydraulic analysis of the normal state is simulated in WDS and pressures are obtained in each node. Then, increase water demand of a node as a leakage state and hydraulic analysis is processed. Each node of water demand is applied in the same way to obtain all leakage states. After calculation, pressures in each leakage state can be solved. Next, difference of unit pressure values can be solved as follows:where*p*and_{i}*p′*are the pressures of node_{i}*i*in the normal and leakage states, respectively;*p*and_{k}*p’*are the pressures of node_{k}*k*in the normal and leakage states, respectively.

*x*

_{ik}_{-}are standardized by:where

*x′*is the standardization result of

_{ik}*x*; is the average of all the unit pressure differences between node k and others; is the standard deviation of node k.

_{ik}*x’’*is obtained. Finally, euclidean distance is employed to select monitors from value of smallest to biggest following Equation (26). If adjacent monitor points exist, only select the best one.where is the euclidean distance;

_{ik}*N*is the number of nodes.

Second, the initial pressures of monitor points are calculated in the normal state by OWA. The leakage discharge value is added to the demand of a node (without monitor points for difference inspection) to simulate the leakage state, and the pressures of monitor points are hydraulically calculated again. Then, the difference of the two results is determined as the pressure difference of this node. Similarly, the leakage of other nodes can be simulated and the pressure differences are calculated. These results can then be used to construct a difference matrix (372 rows and 9 columns). In the same way, all time difference matrices can be calculated. These can be used to calculate the difference average value across time. The matrix of Equation (2) can be obtained.

- 2.
The matrix (10 rows and 9 columns) of the partition center position is randomly produced. The BHCA is used to divide WDS into 10 partitions. The power exponent

*m*is set to 2 in this case. The result of clustering is shown in Figure 11(a). - 3.
Outliers exist in each partition. Thus, two kinds of outliers should be eliminated. First, a partition is selected and the connectivity of all nodes is searched for the first outliers. Then, the other partitions are searched. When all partitions have been searched, the first outlier partition index can be set to zero. Then, the first outliers are allocated to partitions that are connected with them. These nodes connected it number in a partition are greater than the other partitions shown in Figure 11(b). Secondly, the node distance at each partition is evaluated to search for second outliers. The KNN model is used to detect outliers that are far away from groups, and the k-nearest value

*N*is set to 10. Then, the node partition index is set to 0. Next, the connectivity of nodes at each partition is evaluated by Warshall algorithm. If a partition is not connected, the largest connected node number group is retained and the other group index values are set to 0, as shown in Figure 11(c). Finally, the first outlier is subjected to the multistage iterative function until all outliers are removed. - 4.
After outlier detection, this case is divided into different partitions without abnormal nodes, as shown in Figure 11(d). Next, the connection pipes between two partitions were determined, as shown in Figure 12. The number of connection pipes is 35.

Then, the NSPSA model is used to optimize WDS. Three equations (Equations (14)–(16)) are designed to minimize the number of boundary pipes, and make the pressure and leakage balances. The range of solutions was limited by two constraint conditions, the range of node pressure was set to 10 m and 50 m, and the range of water level of tank was set to 10 m and 25 m. The maximum iteration *I* was set to 100 and the update weight is 0.8. The solution set of the three equations are shown in Figure 13 and the top ten results are shown in Table 2. The run time of this WDS file is 3,468.69s (contained cluster time 24.68s and optimized time 3,444.01s).

Connection pipes . | Pressure uniformity . | Leakage uniformity . |
---|---|---|

11 | 444.42 | 23.65 |

13 | 441.30 | 23.63 |

13 | 440.57 | 23.65 |

13 | 440.57 | 23.65 |

15 | 436.36 | 23.64 |

14 | 438.72 | 23.63 |

15 | 437.85 | 23.63 |

Connection pipes . | Pressure uniformity . | Leakage uniformity . |
---|---|---|

11 | 444.42 | 23.65 |

13 | 441.30 | 23.63 |

13 | 440.57 | 23.65 |

13 | 440.57 | 23.65 |

15 | 436.36 | 23.64 |

14 | 438.72 | 23.63 |

15 | 437.85 | 23.63 |

The best solution of the number of connection pipes was selected to minimize the leakage uniformity equation. Thus, the point: (11, 444.42, 23.65) should be considered as the best solution, as shown in Table 3. Pressure reducing valves (PRV) were used to improve pressure management. The pressure and leakage uniformity were reduced by 162.60 and 3.35 respectively compared to the values before (607.03, 27.01), where are calculated by opening all of the boundary pipes.

Partitions . | Nodes . | Pipes . | Connection pipes . | Connection pipes (open) . |
---|---|---|---|---|

1 | 56 | 67 | 8 | 2 |

2 | 81 | 93 | 16 | 6 |

3 | 28 | 31 | 6 | 1 |

4 | 40 | 46 | 9 | 5 |

5 | 64 | 68 | 7 | 2 |

6 | 31 | 34 | 5 | 2 |

7 | 6 | 7 | 3 | 1 |

8 | 37 | 48 | 10 | 3 |

9 | 14 | 14 | 3 | 1 |

10 | 26 | 29 | 3 | 1 |

Total | 383 | 437 | 70/2 = 35 | 24/2 = 12 |

Partitions . | Nodes . | Pipes . | Connection pipes . | Connection pipes (open) . |
---|---|---|---|---|

1 | 56 | 67 | 8 | 2 |

2 | 81 | 93 | 16 | 6 |

3 | 28 | 31 | 6 | 1 |

4 | 40 | 46 | 9 | 5 |

5 | 64 | 68 | 7 | 2 |

6 | 31 | 34 | 5 | 2 |

7 | 6 | 7 | 3 | 1 |

8 | 37 | 48 | 10 | 3 |

9 | 14 | 14 | 3 | 1 |

10 | 26 | 29 | 3 | 1 |

Total | 383 | 437 | 70/2 = 35 | 24/2 = 12 |

At the end of this model, 24 hour periods are selected to calculate the node average pressure and the pipe average leakage of each partition. The results are compared with those before partition optimization, as illustrated in Figures 14 and 15.

Seven example WDSs were used to test the method and demonstrate that it is universally feasible. The studied WDSs include three Kentucky (KY) systems (Jolly *et al.* 2014) from the Kentucky statewide WDS database, KY3, KY5, and KY11. The fourth and fifth are BWSN1 and BWSN2, developed by Ostfeld *et al.* 2008. The sixth is an example system distributed with EPANET called Net3 (Hwang & Lansey 2017). The final one is a case study of City F in China. Monitor points selection is based on a pressure sensitivity analysis model according to Amin *et al.* 2013 except for BWSN2, which used data of Zhengyi Wu in Ostfeld *et al.* 2008. The parameters of these WDSs are listed in Table 4 and the optimized setting values are:

WDS . | Nodes . | Pipes . | Tanks . | Reservoirs . | Pumps . | Monitors . |
---|---|---|---|---|---|---|

KY3 | 273 | 349 | 3 | 3 | 5 | 10 |

KY5 | 423 | 496 | 3 | 4 | 11 | 10 |

KY11 | 805 | 846 | 28 | 1 | 23 | 10 |

BWSN1 | 126 | 168 | 2 | 1 | 2 | 10 |

BWSN2 | 12,523 | 14,822 | 2 | 2 | 4 | 20 |

Net3 | 92 | 117 | 3 | 2 | 2 | 10 |

City-F | 920 | 1,032 | 0 | 1 | 6 | 10 |

WDS . | Nodes . | Pipes . | Tanks . | Reservoirs . | Pumps . | Monitors . |
---|---|---|---|---|---|---|

KY3 | 273 | 349 | 3 | 3 | 5 | 10 |

KY5 | 423 | 496 | 3 | 4 | 11 | 10 |

KY11 | 805 | 846 | 28 | 1 | 23 | 10 |

BWSN1 | 126 | 168 | 2 | 1 | 2 | 10 |

BWSN2 | 12,523 | 14,822 | 2 | 2 | 4 | 20 |

Net3 | 92 | 117 | 3 | 2 | 2 | 10 |

City-F | 920 | 1,032 | 0 | 1 | 6 | 10 |

Different WDSs need to set on different parameters. In Ky3, is set to 35 m and 100 m. The range of water levels of three tanks are set to [42.88, 48.98], [44.88, 52.50] and [28.69, 39.36] m, respectively. In Ky5, is set to 30 m and 150 m. The range of water levels of three tanks are set to [17.58, 25.18], [7.18, 17.85], and [8.19, 18.85] m. In Ky11, is set to 40 m and 200 m. The range of water levels of the 28 tanks are set to [0.10, 4.67], [0.00, 3.96], [30.07, 39.21], [12.22, 20.15], [1.87, 11.92], [3.80, 10.50], [13.83, 20.54], [28.02, 35.64], [18.22, 21.27], [22.38, 30.00], [0.20, 9.34], [33.89, 38.46], [30.74, 39.89], [25.99, 33.61], [24.47, 30.56], [0.13, 8.66], [41.03, 51.69], [0.22, 7.54], [18.65, 26.57], [34.33, 43.17], [21.19, 25.76], [0.07, 8.00], [18.91, 25.01], [32.94, 39.65], [15.70, 21.80], [12.18, 19.19], [14.47, 27.28] and [6.35, 10.92] m, respectively. In BWSN1, is set to 15 m and 510 m. The range of water levels of two tanks are set to [0, 9.75] and [0, 12.77] m. In BWSN2, is set to 5 m and 100 m. The range of water levels of tanks are set to [0, 7.17] and [1.22, 5.29] m. In Net3, is set to 10 m and 150 m. The range of water levels of three tanks are set to [0, 9.78], [1.98, 12.28], and [1.22, 10.82] m. The number of partitions is then set to 5. In City F, is set to 10 m and 40 m and there are no tanks in WDS.

The corresponding topological graphs are shown in Figure 16. The best optimized results for each WDS are given in Table 5 and illustrated in Figure 17.

WDS . | Partitions . | Connection pipes . | Connection pipes (open) . | Pressure uniformity . | Leakage uniformity . | Cluster time (s) . | Optimized time (s) . | Total time (s) . |
---|---|---|---|---|---|---|---|---|

KY3 | 10 | 22 | 15 | 0.18 | 1.03 | 11.48 | 111.65 | 123.13 |

KY5 | 10 | 27 | 21 | 0.11 | 10.75 | 17.54 | 134.27 | 151.81 |

KY11 | 10 | 13 | 12 | 0.02 | 0.53 | 70.32 | 1,340.21 | 1,410.53 |

BWSN1 | 5 | 14 | 5 | 2.66 | 0.18 | 131.28 | 1,713.03 | 1,844.31 |

BWSN2 | 10 | 94 | 40 | 31.44 | 260.35 | 2,883.54 | 163,666.67 | 166,550.21 |

Net3 | 5 | 12 | 8 | 2.32 | 2.07 | 5.66 | 33.95 | 39.61 |

City-F | 10 | 36 | 29 | 0.13 | 0.07 | 71.15 | 1,229.85 | 1,301.00 |

WDS . | Partitions . | Connection pipes . | Connection pipes (open) . | Pressure uniformity . | Leakage uniformity . | Cluster time (s) . | Optimized time (s) . | Total time (s) . |
---|---|---|---|---|---|---|---|---|

KY3 | 10 | 22 | 15 | 0.18 | 1.03 | 11.48 | 111.65 | 123.13 |

KY5 | 10 | 27 | 21 | 0.11 | 10.75 | 17.54 | 134.27 | 151.81 |

KY11 | 10 | 13 | 12 | 0.02 | 0.53 | 70.32 | 1,340.21 | 1,410.53 |

BWSN1 | 5 | 14 | 5 | 2.66 | 0.18 | 131.28 | 1,713.03 | 1,844.31 |

BWSN2 | 10 | 94 | 40 | 31.44 | 260.35 | 2,883.54 | 163,666.67 | 166,550.21 |

Net3 | 5 | 12 | 8 | 2.32 | 2.07 | 5.66 | 33.95 | 39.61 |

City-F | 10 | 36 | 29 | 0.13 | 0.07 | 71.15 | 1,229.85 | 1,301.00 |

The results show that the pressure and leakage uniformity values at each partition for each specific case are reduced and partitions success in division. In other words, energy and economic losses are decreased due to the balanced pressure and reduced leakage. Thus, the multistage iterative fully automatic partitioning method is a practicable method for WDSs control and management.

## CONCLUSION

A multistage iterative methodology with fully automatic partitioning is presented. In this methodology, BHCA is employed to divide WDS into different partitions. First and second outliers are eliminated by multistage iterative methods using traverse, KNN, and Warshall algorithms. Finally, the boundaries of each partition are optimized by NSPSA. The methodology is verified by case studies of real WDSs. The results indicate that the method is a practicability way to partition WDSs, balance pressures and reduce leakages.

Although it can automatically process a WDS file for partition, there are limits to this methodology. Monitor points are selected by pressure sensitivity analysis model, where it might be necessary to manually exclude monitor points before running the program. The second outliers are detected by application of the Warshall algorithm, and outliers are allocated in the clusters with the largest number of nodes, but some nodes might be set to an unfitted zone, which also might change the pressure sensitivity variations of the monitors. Future work should develop a new way to select monitors that meets the requirements above. And second oulier detection methods should be improved to adjust pressure sensitivity variations.

## ACKNOWLEDGEMENT

This work was supported by the Shenzhen Science and Technology Plan Program (KJYY20170413170501147) and the Major Science and Technology Program for Water Pollution Control and Treatment (2014ZX07406-003).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.