Partitioning water distribution networks (WDNs) into district metered areas offers benefits including reduced nonrevenue water and simplified pressure management. However, current research in this field tends to narrowly focus on the topological relationships among nodes, often overlooking the influence of pressure reducing valves (PRVs) and pump stations on clustering results. To address this limitation, this study introduces a topology-distance-based clustering (TDBC) method that enhances the accuracy of partitioning by explicitly considering the impact of PRVs and pump stations. In the TDBC method, the Floyd algorithm is initially employed to construct a topological distance matrix that quantifies the degree of node connectivity. By amplifying topological distances for links including PRVs and pump stations, their effect on clustering results is incorporated. Subsequently, nodes are clustered using the K-means algorithm based on the resulting topology-distance matrix. The proposed TDBC approach is applied to four network cases, and its outcomes are compared with those of two traditional methods. The comparative analysis indicates that the TDBC algorithm achieves precise partitioning results for networks incorporating PRVs or pump stations, while ensuring a harmonious balance between modularity and the uniformity of the partitioning results, even in networks with complex structures and highly interconnected loops.

  • Incorporation of PRVs and pump stations: The TDBC method introduces a novel approach by explicitly incorporating the influence of pressure reducing valves (PRVs) and pump stations into the clustering process.

  • Topological distance-based clustering: The TDBC method shifts from traditional geographical proximity-based methods to a topological distance-based approach.

  • Improved clustering uniformity.

Treated water, being an indispensable resource for human survival, is commonly distributed to customers through networks (Alamanos 2021; Li & Yu 2022; Tufa & Abate 2022). However, with the expansion of urban areas, water distribution networks (WDNs) are also growing in size, presenting significant management challenges (Albarakati et al. 2021; Liu & Kang 2021; Zhang et al. 2022). It is widely acknowledged that partitioning WDNs into district metered areas (DMAs) offers several benefits (Saldarriaga et al. 2019). These benefits include reducing nonrevenue water, simplifying pressure management, facilitating rapid detection of burst pipes, and isolating districts to prevent contamination (Bui et al. 2020; Diao 2021). Therefore, the partitioning of WDNs into smaller sub-networks (i.e., DMAs) has become a prominent and extensively studied topic (Nardo et al. 2017; Creaco & Haidar 2019).

The manual partitioning WDNs into DMAs presents challenges due to the complex network topology, especially in large-scale systems with significant looping. Hence, extensive research has been dedicated to automating the creation of DMAs over the past few decades. Notably, Nardo & Natale (2011) and Nardo et al. (2013, 2020) have been significant contributors to utilizing optimization techniques based on graph theory for automating the design of DMAs. Additionally, Deuerlein (2008) and Perelman & Ostfeld (2011) developed topological clustering tools for WDN analysis; however, these were not specifically designed for WDN partitioning. In contrast, Liu & Han (2018) presented a three-step methodology employing spectral clustering and graph theory for effective DMA design. Building upon this progress, Liu & Lansey (2020) introduced a multi-phase DMA design method to address the partitioning of complex WDNs. More recently, Creaco et al. (2022) introduced a novel multi-objective genetic algorithm with linear programming for efficient WDN partitioning.

Diao et al. (2013) pioneered the concept of community-based partitioning for WDNs, utilizing modularity as a key indicator. Subsequently, researchers in WDN partitioning have devoted considerable attention to identifying community structures, characterized by strong internal connections, and assessing modularity (Bui et al. 2020). However, it has been observed that maximizing modularity may lead to the formation of impractically large clusters (Zhang et al. 2017). Hence, the consideration of other metrics, such as uniformity, has become indispensable. For instance, Liu et al. (2018) compared different methods for creating DMAs and highlighted the balance of partitions as a critical indicator. Additionally, Zhang et al. (2021) proposed a modified modularity approach to represent elevation uniformity within DMAs. Recently, Creaco et al. (2023) introduced a novel modularity formulation and algorithm for the optimal clustering of dual WDNs, ensuring uniform properties and improved hydraulic performance. Nevertheless, it is noteworthy that while the aforementioned studies have significantly contributed to WDN partitioning, they focus solely on the topological relationships between pipes and nodes. None of them takes into account existing pressure reducing valves (PRVs) and pump stations, despite these components directly influencing the partitioning results. Therefore, an essential challenge remains in incorporating PRVs and pump stations into the automatic partitioning process.

Detecting communities in complex networks using unsupervised machine learning, especially the K-means clustering algorithm, constitutes a noteworthy research field due to its simplicity, efficiency, and adaptability in defining the desired number of clusters. For instance, Li et al. (2019) integrated K-means with density and canopy clustering, demonstrating superior accuracy and visualization capabilities for social and biological networks. Zhou et al. (2020) introduced the CPMK-Means algorithm, addressing convergence and time complexity issues and outperforming other algorithms across various metrics. In a different domain, Chaudhary & Singh (2021) employed K-means clustering to analyze the impact of COVID-19, uncovering hidden community structures among countries. In the context of WDN partitioning research, Hajebi et al. (2013) applied the K-means algorithm to geographically cluster the network and employed a multi-agent system for boundary node adjustments based on hydrological constraints. Nonetheless, further validation is necessary to ascertain its effectiveness in larger WDNs.

In summary, DMA design involves two primary phases: clustering, which forms DMAs based on network connectivity and topology; and sectorization, which optimizes the placement of flow meters and valves for network performance and cost minimization. This study specifically focuses on the first phase of network clustering and presents an innovative topology-distance-based clustering (TDBC) method for WDN partitioning. Compared to existing methods, the significant contribution of this research lies in its explicit integration of PRVs and pump stations during the clustering process, aiming to achieve a more accurate DMA design.

Specifically, PRVs and pump stations are indispensable components in a WDN: PRVs lower high water pressure to protect infrastructure and conserve water, while pump stations elevate pressure to ensure a consistent flow, especially in areas with elevation differences. As a result, these elements naturally define water supply zone boundaries and should be integrated into the DMA design of a WDN. To explicitly account for their impact on automatic network partitioning, the proposed TDBC method introduces topological distance to quantify node connectivity and amplifies the distance for links containing PRVs and pump stations. Nodes are then clustered using the K-means algorithm based on this topology distance matrix, ensuring that networks can be divided according to the specified clustering number.

The remaining sections of the paper are as follows: Section 2 starts by presenting the schematic of this research work. It is then followed by a description of the proposed TDBC method. Additionally, the modularity-based fast greedy clustering (MFGC) method is briefly introduced. Section 3 provides information about the four networks used in this study. In Section 4, the application results of three different methods to the four networks are compared and analyzed. Section 5 summarizes the main findings of this study.

The schematic of this research work is illustrated in Figure 1. As can be observed from this figure, based on four networks, the study compares three clustering methods: the traditional geographical clustering-based partitioning (GCBP) method proposed by Hajebi et al. (2013), the TDBC method developed in this paper, and the MFGC method presented by Diao et al. (2013). The objective is to investigate their difference in clustering mechanisms and assess their effectiveness in networks containing PRVs and pump stations. Additionally, a performance comparison is conducted based on the metrics of modularity and uniformity.
Figure 1

The schematic of this research work.

Figure 1

The schematic of this research work.

Close modal

In this study, the GCBP and TDBC methods were developed and executed within the MATLAB environment, while the MFGC method was specifically implemented using the Python language. Additionally, network files sourced from EPANET (Rossman 2000) were employed to construct the adjacency matrix, thereby facilitating the creation of the topology distance matrix and the successful execution of the MFGC method.

The proposed TDBC method

Figure 2 illustrates the implementation steps of the proposed TDBC method. Step 1 uses the Floyd algorithm to construct the topological distance matrix, where each element represents the count of pipes in the shortest path between nodes. Therefore, this matrix effectively captures the degree of connectivity between nodes in a network. Additionally, to consider the impact of PRVs on clustering results, the topological distance of pipes containing PRVs or pump stations is assigned a value of 10, while others have a distance of 1. This is indicated by the red numbers for the topological distance between node 1 and node 5, considering the presence of PRVs. In step 2, the K-means clustering algorithm partitions the network based on this matrix, aiming to cluster nodes with strong connectivity for an automated partitioning process. The Floyd and K-means clustering algorithms will be introduced as follows.
Figure 2

The implementation steps of the proposed TDBC method.

Figure 2

The implementation steps of the proposed TDBC method.

Close modal

The Floyd algorithm, a fundamental technique in graph theory and network analysis, efficiently computes the shortest paths between all pairs of vertices in a graph, even when negative edge weights are present (Hougardy 2010). This versatility makes it a valuable choice for addressing various real-world problems in this study. The key advantages of using the Floyd algorithm are as follows: Firstly, it can efficiently determine all-pairs shortest paths in a single run, making it particularly advantageous for scenarios requiring computation of multiple shortest paths simultaneously. Secondly, the algorithm's simplicity and ease of implementation render it accessible to researchers and developers with varying levels of expertise. Thirdly, the Floyd algorithm's ability to handle graphs with negative edge weights, provided no negative cycles are present, enhances its applicability in real-world models involving negative weights. For further implementation details of the Floyd algorithm, please refer to Zhao & Zhao (2017).

The K-means algorithm is a widely used clustering technique for partitioning a given dataset into K distinct clusters. It operates iteratively by minimizing the total intra-cluster variance, ensuring convergence to a stable solution. Based on the constructed topological distance matrix, the K-means algorithm treats the nodes in a WDN as data points in a multi-dimensional space. By assigning each data point to the cluster with the nearest mean, it naturally lends itself to partitioning WDNs. This is attributed to its ability to cluster nodes with smaller topological distances, resulting in tightly connected nodes within each cluster while maintaining sparse connectivity between different clusters. Consequently, the incorporation of the topological distance matrix renders the K-means algorithm a valuable tool for effectively achieving this objective in WDN partitioning tasks.

One of the key advantages of the K-means algorithm lies in its simplicity and efficiency. It efficiently handles large datasets and can be easily implemented, making it a practical choice for various clustering tasks. Additionally, the algorithm's straightforward interpretation empowers users to gain insights into the clustering process and make informed decisions based on the resulting clusters. In this study, we adopt the approach proposed by Arthur & Vassilvitskii (2007) for selecting initial centroids, as it has been proven to yield more stable and improved clustering results compared to the standard K-means method.

The MFGC method

The MFGC algorithm has gained widespread adoption as a prominent technique for clustering networks and partitioning them into communities, with its primary objective being the optimization of the modularity measure that assesses the quality of the resulting partition (Newman 2004). The modularity value can be computed as follows:
formula
(1)
formula
(2)
formula
(3)
formula
(4)
where m is the total number of edges; Aυω is an element of the adjacency matrix of the network (Aυω = 1 if vertices υ and ω are connected; otherwise, Aυω = 0); kυ and kω are the degrees of nodes υ and ω, denoting the number of edges connected to each respective node; δ(cυ, cω) equals 1 when they both belong to the same community, otherwise, it is 0. Equation (1) can be considered a scoring function that evaluates the actual proportion of intracommunity edges against its expected value in the randomized network. The MFGC method operates through iterative processes of merging or splitting nodes to maximize the modularity score, which reflects the density of edges within communities compared to those between communities.

Diao et al. (2013) were among the pioneers in introducing the MFGC algorithm for WDN partitioning. The rationale behind their work stems from the goal of achieving clustering results for WDNs where nodes within each cluster are tightly interconnected, while connections between different clusters are sparse. This arrangement is expected to be advantageous as it allows for the implementation of DMAs with a reduced number of flow meters and valves, leading to more efficient water management. Furthermore, as WDNs expand in size due to the growth of residential communities, they naturally exhibit community-like properties. Hence, the application of the MFGC algorithm to detect and partition communities within WDNs is justified. Further implementation details for the MFGC algorithm can be found in Diao et al. (2013).

As depicted in Figure 3, four networks were employed as case studies to facilitate diverse investigative purposes. The first network, referred to as the simple network, was an artificially constructed WDN consisting of merely eight nodes with a straightforward structure. Nonetheless, a PRV was deliberately introduced to examine the algorithm's ability to accurately partition the nodes, duly accounting for the presence of this PRV. On the other hand, the C-town network encompassed numerous pump stations, which showcased distinctive partitioning characteristics. This particular network offered a valuable opportunity to assess the effectiveness of the proposed algorithm in scenarios where pump stations are present. To elaborate, the C-town network was hypothetically established by Ostfeld et al. (2012) for model calibration and was subsequently employed as the benchmark network in the 2014 WDSA Conference's Battle of Background Leakage Assessment for Water Networks (Giustolisi et al. 2016).
Figure 3

Layout of four networks used as case studies.

Figure 3

Layout of four networks used as case studies.

Close modal

The Anytown and rural networks represented a class of networks with complex structures and highly interconnected loops, or they were large-scale complex DMAs requiring further partitioning. These two networks were selected to test the algorithm performance in terms of modularity and uniformity when applied to complex network partitioning scenarios. The Anytown distribution system, conceived by Walski et al. (1987) for a Battle of the Networks competition aimed at enhancing analysis methods, is a hypothetical system. The rural network, on the other hand, represents a hydraulic model of an actual network derived from a piped irrigation system. It was initially employed by Marchi et al. (2014) to assess the performance of various evolutionary algorithms in optimizing WDSs. For comprehensive network information, including the EPANET files for both the C-town and Anytown networks, please refer to the Kentucky Water Resources Research Institute's website. Detailed information about the rural network can be found in Marchi et al. (2014).

Comparison with the traditional GCBP method

This section compares the GCBP method with the newly proposed TDBC method on the simple and C-town networks, investigating clustering mechanisms and partitioning effectiveness with PRVs and pump stations. Both methods use the K-means clustering algorithm, but GCBP relies on geographical coordinates, while TDBC uses the topological distance matrix.

Partitioning results for the simple network

As shown in Figure 4, both the GCBP and proposed TDBC methods aim to partition the eight nodes into two clusters. However, the GCBP method yields erroneous partitioning results for node 4 and node 8. Specifically, in Figure 4(a), nodes 1, 2, 3, and 8 are erroneously grouped together, while nodes 5, 6, 7, and 4 are also incorrectly grouped together. This is because node 8 is spatially closer to node 3, and similarly, node 4 is closer to node 6. Consequently, this indicates that relying solely on the geographical information of the network nodes cannot lead to correct clustering results.
Figure 4

Partitioning results from two methods applied to the simple network.

Figure 4

Partitioning results from two methods applied to the simple network.

Close modal

In contrast, the proposed TDBC method clusters node 4 with nodes 1, 2, and 3, and node 8 with nodes 5, 6, and 7, as depicted in Figure 4(b). The rationale behind the results can be attributed to the direct connections between node 4 and node 2, the direct connection between node 8 and node 7, as well as the presence of the PRV between these two clusters. Therefore, this indicates that the proposed TDBC method takes into account the pipe connectivity, as well as the presence of PRVs to achieve accurate partitioning results.

To gain insights into the fundamental principles of each method, visual depictions of the distance relationships between nodes derived from the GCBP and proposed TDBC methods are presented in Figure 5. For the GCBP approach, as indicated by Figure 5(a), distance relationships are established based on the geographical coordinates associated with each node. On the other hand, for the proposed TDBC method, as indicated by Figure 5(b), the relationships are formed using the topological distance matrix provided in Figure 2. Note that the connectivity relationship is established with consideration of the clustering center centroid. Specifically, in Figure 5(b), the clustering centroid is positioned to approximate the midpoint between node 6 and node 7. As a result, no connection is depicted between node 6 and node 7, despite the fact that these nodes are merely one topological distance apart.
Figure 5

Distance relationships between nodes derived from the geographical coordinates (a) and topology distance matrix (b).

Figure 5

Distance relationships between nodes derived from the geographical coordinates (a) and topology distance matrix (b).

Close modal

As illustrated by Figure 5(a), the traditional GCBP method prioritizes nodes with shorter geographical distances for clustering, such as nodes 4 and 6, as well as nodes 3 and 8. However, Figure 5(b) displays the proposed TDBC algorithm, which clusters nodes based on closer topological distances, such as node 2 with node 4 and node 7 with node 8. This distinction highlights that the proposed method focuses on the level of connectivity between nodes rather than their geographical distances. In conclusion, the comparative analysis reveals distinct clustering mechanisms, with the TDBC method outperforming in achieving accurate network partitioning. This is attributed to its incorporation of nodal connectivity and the presence of PRVs, effectively improving clustering accuracy.

Partitioning results for C-town network

Figure 6 shows the partitioning results of four methods: the manual partitioning method (Figure 6(a)), the proposed TDBC methods with and without the magnified topological distances for pump stations (Figure 6(b) and 6(d)), and the traditional GCBP method (Figure 6(c)). As depicted in Figure 6(a), the existence of pump stations within the network facilitates the division into five distinct subareas, making manual partitioning straightforward. Hence, the partitioning result obtained from the manual method serves as a valuable benchmark for assessing partitioning results obtained by other methods. Figure 6(b) and 6(d) compare the TDBC method with different topological distance assignments for pump stations (388 and 1, respectively), confirming the effectiveness of magnified topological distances in achieving more accurate results. The value of 388 corresponds to the total number of nodes in the network.
Figure 6

Partitioning results from four approaches applied to the C-town network.

Figure 6

Partitioning results from four approaches applied to the C-town network.

Close modal

Figure 6(b) reveals that the proposed TDBC method produces identical partitioning results to the manual method, demonstrating its effectiveness in handling real-world networks, including pump stations. On the other hand, Figure 6(d) indicates less accurate results for the proposed TDBC method without magnified topological distances for pump stations. This observation further validates the rationale behind the TDBC method, which emphasizes the importance of magnifying topological distances of pump stations to achieve more precise network partitioning.

Additionally, Figure 6(c) demonstrates that the traditional GCBP method produces impractical partitioning results. This is evident from the fact that the GCBP method groups together a large number of nodes that lack connections to each other. This is a result of the method's sole reliance on the geographical locations of nodes, with no consideration for node connections or network topology. Consequently, the resulting clusters significantly deviate from the manual method's results. This emphasizes the infeasibility of the traditional GCBP method for effectively clustering complex networks with pump stations.

Comparison with the MFGC method

This section compares the proposed TDBC method with the MFGC method on the Anytown network and the rural network, using modularity and uniformity metrics. The aim is to understand the capabilities and limitations of these two clustering methods.

Partitioning results for Anytown network

Figure 7 displays the partitioning results obtained from the proposed TDBC method and the MFGC method on the Anytown network, considering different numbers of clusters ranging from two to five. The TDBC method is depicted in Figure 7(a)–7(d), while the MFGC method is shown in Figure 7(e)–7(h). As can be seen, despite using the same number of clusters, the two algorithms yield divergent cluster patterns. Notably, the clustering results generated by the MFGC method (Figure 7(e)–7(h)) demonstrate merging relationships, where smaller communities combine to form larger ones. Conversely, the clustering results obtained from the proposed TDBC method do not exhibit such merging relationships. This disparity underscores the fundamental differences in the clustering mechanisms employed by the proposed TDBC method and the MFGC approach. In order to further assess the performance difference between these two methods, the metrics of modularity and uniformity are employed for evaluating the partitioning results, as discussed below.
Figure 7

Partitioning results from the TDBC (a–d) and MFGC (e–h) methods applied to the Anytown network.

Figure 7

Partitioning results from the TDBC (a–d) and MFGC (e–h) methods applied to the Anytown network.

Close modal
Figure 8 presents the modularity values for different clusters obtained by the proposed TDBC method and the MFGC method. From this figure, it is observed that when partitioning the network into four or five clusters, the MFGC method exhibits higher modularity values, indicating stronger internal connectivity. However, in the case of two or three clusters, the proposed TDBC algorithm outperforms the MFGC method, suggesting a more logical and cohesive partition structure. Besides, the MFGC approach attains the maximum modularity value of 0.375 with four clusters, while the proposed TDBC method achieves the maximum modularity value of 0.374 with three clusters. Therefore, the application of the proposed TDBC method proves to be advantageous when aiming to partition the network into a relatively small number of clusters.
Figure 8

Modularity values for different clusters using two clustering methods.

Figure 8

Modularity values for different clusters using two clustering methods.

Close modal
Figure 9 provides a comparison of node uniformity across four clusters obtained from the two methods, allowing for an evaluation of their performance in achieving uniformity. In this context, ‘node uniformity’ refers to the distribution of node counts among the clusters. The figure incorporates visual representations such as the maximum (top bar), minimum (bottom bar), 25th and 75th percentiles (box top and bottom edges), median (line in the box), and mean (cross). Note that in addition to modularity, achieving similar numbers of nodes across different clusters is a crucial objective, as stated previously.
Figure 9

Node uniformity across different clusters for two different methods.

Figure 9

Node uniformity across different clusters for two different methods.

Close modal

Figure 9 illustrates the superior performance of the proposed TDBC algorithm compared to the MFGC method in achieving greater node uniformity across the four clustering scenarios. Specifically, the box plots representing the TDBC algorithm exhibit smaller interquartile ranges, thereby indicating better uniformity in node counts across the clusters. In contrast, the MFGC method displays larger box plots, suggesting greater discrepancies in node counts among the clusters. Additionally, as the number of clusters decreases, the box plots of the MFGC method gradually increase in size, indicating a decrease in node count uniformity. However, the proposed TDBC algorithm consistently maintains smaller box plots across different cluster numbers, thereby demonstrating its ability to maintain balanced node counts.

Furthermore, Figure 9(a) provides a noteworthy observation when the network is partitioned into two clusters. The box plot of the proposed TDBC algorithm shows a straight line, indicating an equal number of nodes in both clusters. Conversely, the MFGC method demonstrates poorer performance in terms of node uniformity, as evidenced by one cluster containing 7 nodes while the other cluster contains 15 nodes, exhibiting a significant difference. This stark contrast serves as compelling evidence for the significant advantage of the proposed TDBC method in achieving balanced node counts during the clustering process. To shed light on the inferior performance of the MFGC method in terms of node uniformity, Figure 10 presents a comparative analysis of the clustering mechanisms employed by the proposed TDBC method and the MFGC method.
Figure 10

Comparison of the clustering mechanism of the MFGC (a) and TDBC (b) methods.

Figure 10

Comparison of the clustering mechanism of the MFGC (a) and TDBC (b) methods.

Close modal

As depicted in Figure 10, the MFGC and TDBC methods demonstrate fundamentally different clustering mechanisms, with MFGC relying on the consolidation of small communities for clustering, while the TDBC method achieves clustering through the generation of new centroids. Figure 10(a) shows that the MFGC method achieves a final partitioning scheme comprising two clusters by merging four clusters through three rounds of community merging. Specifically, C5 is initially merged with C1, resulting in a community comprising six nodes. Subsequently, C3 is merged with C4, leading to a community with nine nodes. Finally, these two merged communities are combined to form a larger community consisting of 15 nodes. It is important to note that throughout this process, the C2 community does not participate in any community merging, resulting in an uneven distribution of nodes among the final two communities. This discrepancy can be attributed to the mechanism of the MFGC method, which prioritizes the maximization of the current modularity Q without taking node uniformity into account. Consequently, as the desired number of clusters decreases, there is a tendency to merge large communities with other large communities, while the size of small communities remains unchanged. This ultimately leads to nonuniformity among nodes within clusters, highlighting the prioritization of modularity at the expense of node uniformity.

As depicted in Figure 10(b), the TDBC method, built upon the K-means algorithm, achieves clustering by generating dispersed cluster centroids. This implies that the clustering process is executed autonomously for each partition, and the formation of larger communities does not necessitate the amalgamation of smaller ones. Consequently, this avoids the issue of community merging that is evident in the MFGC method. As a result, the TDBC method ensures a more balanced distribution of node counts among the clustering partitions. In addition, the proposed TDBC method utilizes topological distance for node clustering, resulting in autonomous grouping of nodes with strong connections within their respective clusters. This aspect contributes to the achievement of high modularity in the obtained clustering results.

Application results for rural network

Figure 11 shows the modularity values obtained by the proposed TDBC method and the MFGC method when applied to the rural network. The modularity values are calculated for clusters ranging from 2 to 15. From Figure 11, it is evident that the proposed TDBC method achieves results that are very close to those of the MFGC method in terms of modularity values across different cluster sizes. It is worth noting that the MFGC method achieves a maximum modularity value of 0.821 with 13 clusters, while the proposed TDBC method attains a maximum modularity value of 0.808 with 12 clusters. Generally, the MFGC method performs slightly better than the proposed TDBC method for larger cluster sizes. However, for smaller cluster sizes, such as two clusters, the proposed TDBC method exhibits a higher modularity value compared to the MFGC algorithm.
Figure 11

Modularity comparisons for different clusters using two methods to the rural network.

Figure 11

Modularity comparisons for different clusters using two methods to the rural network.

Close modal

Overall, the proposed TDBC method consistently achieves favorable outcomes in terms of modularity, with all values surpassing 0.3, despite modularity (Q) not being the optimization criterion. According to Newman & Girvan (2004), a modularity value greater than 0.3 indicates satisfactory petitioning results with significant community structure. Therefore, these results suggest that the proposed TDBC method effectively partitions the network into clusters that align with the community structure, characterized by strong internal connections.

To further investigate the performance of the algorithms, Figure 12 presents a comparison of node uniformity between two methods with different cluster numbers (5, 10, and 15 partitions). As depicted in the figure, the MFGC method yields larger box plots for all three clustering scenarios, indicating greater variations in number of nodes among the clusters. Besides, it is noteworthy that when the network is partitioned into five clusters, two significant outliers are observed, indicating the presence of both excessively large and small cluster sizes within that particular clustering scheme. Specifically, as illustrated in Figure 13(a), cluster 1 (blue) exhibits an excessively large node size, whereas cluster 5 (yellow) displays an undersized node population. This discrepancy arises from the gradual merging process, where cluster 1 continuously incorporates smaller ones, resulting in an increasing number of nodes. In contrast, cluster 5 maintains its original size throughout the merging process, leading to a substantial disparity in node counts among the clusters.
Figure 12

Node uniformity comparison for two methods applied to the rural network with different clusters.

Figure 12

Node uniformity comparison for two methods applied to the rural network with different clusters.

Close modal
Figure 13

Clustering results of the rural network from the MFGC (a) and TDBC (b) methods.

Figure 13

Clustering results of the rural network from the MFGC (a) and TDBC (b) methods.

Close modal

In contrast, the proposed TDBC method consistently generates smaller and more consistent box plots across the three clustering schemes, implying more comparable number of nodes among the clusters, as indicated by Figure 13(b). These findings indicate that the TDBC algorithm is capable of producing clustering schemes that are more uniformly distributed across various partition scenarios. This adaptability is achieved by updating the centers for each clustering implementation, thereby avoiding situations with excessively large or small cluster sizes. Consequently, the proposed TDBC method ensures more balanced clustering sizes and mitigates the issue of extreme disparities observed in the proposed MFGC method. In summary, within the framework of networks possessing intricate structures, the TDBC method, when compared to the MFGC method, produces clustering patterns with comparable modularity, yet it showcases less variation in node count across the clusters. This suggests that the TDBC method holds substantial promise as an approach to enhance clustering performance within such networks.

This paper presents a novel TDBC method for automatic partitioning of WDNs, which serves as the first step in forming DMAs. The motivation behind this method arises from the oversight in previous studies, where the incorporation of the PRVs and pump stations into the clustering process was neglected, despite their significant impact on the partitioning results. The proposed approach involves two key steps: firstly, constructing a topological distance matrix, and secondly, employing K-means clustering for automated partitioning, wherein the topological distance for links containing PRVs or pump stations is magnified. Comparative evaluations are conducted among the results obtained from three methods, including the traditional GCBP method, the proposed TDBC method, and the MFGC method, across four distinct networks, and the main findings are summarized.

  • (1)

    The proposed TDBC algorithm achieves correct partitioning results for networks that include PRVs or multiple pump pumps. However, the traditional GCBP method proves to be impractical when applied to these types of networks, as evidenced by the erroneous partitioning results observed in both the cases of a simple network and the C-town network.

  • (2)

    The new TDBC method differs significantly from the GCBP method in its partitioning approach. Instead of using geographical proximity, the TDBC method uses topological distance to assess node connectivity. This shift leads to partitioning results with greater modularity values, even though modularity is not the main optimization goal. This is confirmed by the clustering results for Anytown and rural networks, where modularity surpassed 0.3. These results confirm TDBC's effectiveness in creating well-structured clusters.

  • (3)

    The TDBC algorithm outperforms the MFGC method in terms of clustering uniformity, even though it may have slightly lower modularity values. This distinction is especially evident when dealing with fewer clusters. The MFGC method has a tendency to merge smaller communities to create larger ones, prioritizing modularity at the expense of uniformity. Conversely, the TDBC algorithm executes the clustering independently for each implementation. This guarantees that the establishment of larger communities does not require merging smaller ones, thus reducing the occurrence of significantly varying cluster sizes and enhancing uniformity within the clusters.

In conclusion, the proposed TDBC can serve as a valuable alternative for automatically partitioning WDNs. This is attributed to its explicit consideration of the impact of components such as valves and pumps on the network partitioning, coupled with its ability to maintain a balance between community attributes and the uniformity of partitioning results. However, the validation of the proposed TDBC method in this study is confined to relatively small networks, a limitation requiring future research attention. Furthermore, given its successful application in addressing PRVs and pump stations in DMA design, there is significant potential to expand the method's scope to encompass pipes crossing rivers or administrative districts. This expansion, involving the incorporation of pseudo-PRVs on these pipes, represents an important avenue for future research.

This work was funded by the National Natural Science Foundation of China (52260011).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Alamanos
A.
2021
Water resources planning under climate and economic changes in Skiathos island, Aegean
.
Journal of Water Supply: Research and Technology – AQUA
70
(
7
),
1085
1093
.
Albarakati
A.
,
Tassaddiq
A.
&
Kale
Y.
2021
Evaluation of the vulnerability in water distribution systems through targeted attacks
.
Journal of Water Supply: Research and Technology – AQUA
70
(
8
),
1257
1271
.
Arthur
D.
&
Vassilvitskii
S.
2007
k-means + +: The advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007)
.
Creaco
E.
,
Zheng
F.
&
Pezzinga
G.
2022
Minimum transport-driven algorithm for water distribution network partitioning
.
Journal of Water Supply: Research and Technology – AQUA
71
(
1
),
120
138
.
Creaco
E.
,
Giudicianni
C.
&
Mottahedin
A.
2023
Improved community detection for WDN partitioning in the dual topology based on segments and valves
.
Journal of Hydroinformatics
25 (4), 1341–1357.
Deuerlein
J. W.
2008
Decomposition model of a general water supply network graph
.
Journal of Hydraulic Engineering
134
(
6
),
822
832
.
Diao
K.
2021
Towards resilient water supply in centralized control and decentralized execution mode
.
Journal of Water Supply: Research and Technology – AQUA
70
(
4
),
449
466
.
Diao
K.
,
Zhou
Y.
&
Rauch
W.
2013
Automated creation of district metered area boundaries in water distribution systems
.
Journal of Water Resources Planning and Management
139
(
2
),
184
190
.
Giustolisi
O.
,
Berardi
L.
,
Laucelli
D.
,
Savic
D.
&
Kapelan
Z.
2016
Operational and tactical management of water and energy resources in pressurized systems: competition at WDSA 2014
.
Journal of Water Resources Planning & Management
142
(
5
),
C4015002
.
Hajebi
S.
,
Barrett
S.
,
Clarke
A.
&
Clarke
S.
2013
Multi-agent simulation to support water distribution network partitioning
.
European Simulation and Modelling Conference
2013
,
163
168
.
Hougardy
S.
2010
The Floyd-Warshall algorithm on graphs with negative cycles
.
Information Processing Letters
110
(
8
),
279
281
.
Li
J.
&
Yu
R.
2022
A study on villages of the planning of water infrastructure planning: 25 villages in Shaanxi Province as an example
.
Journal of Water Supply: Research and Technology – AQUA
72
(
1
),
19
31
.
Li
M.
,
Wang
H.
,
Long
H.
,
Xiang
J.
,
Wang
B.
,
Xu
J.
&
Yang
J.
2019
Community detection and visualization in complex network by the Density-Canopy-Kmeans algorithm and MDS embedding
.
IEEE Access
7
,
120616
120625
.
Liu
J.
&
Han
R.
2018
Spectral clustering and multicriteria decision for design of district metered areas
.
Journal of Water Resources Planning and Management
144
(
5
), 04018013.
Liu
J.
&
Kang
Y.
2021
Segment-based resilience response and intervention evaluation of water distribution systems
.
Journal of Water Supply: Research and Technology – AQUA
71
(
1
),
100
119
.
Liu
J.
&
Lansey
K. E.
2020
Multiphase DMA design methodology based on graph theory and many-objective optimization
.
Journal of Water Resources Planning and Management
146
(
8
), 04020068.
Marchi
A.
,
Dandy
G.
,
Wilkins
A.
&
Rohrlach
H.
2014
Methodology for comparing evolutionary algorithms for optimization of water distribution systems
.
Journal of Water Resources Planning and Management
140
(
1
),
22
31
.
Nardo
A. D.
,
Natale
M. D.
,
Santonastaso
G. F.
&
Venticinque
S.
2013
An automated tool for smart water network partitioning
.
Water Resources Management
27
(
13
),
4493
4508
.
Nardo
A. D.
,
Natale
M. D.
,
Giudicianni
C.
,
Santonastaso
G.
,
Tzatchkov
V.
&
Varela
J.
2017
Economic and energy criteria for district meter areas design of water distribution networks
.
Water
9
(
7
),
463
.
Nardo
A. D.
,
Natale
M. D.
,
Mauro
A. D.
,
Díaz
E. M.
,
Garcia
J. A. B.
,
Santonastaso
G. F.
&
Tuccinardi
F. P.
2020
An advanced software to design automatically permanent partitioning of a water distribution network
.
Urban Water Journal
17
(
3
),
259
265
.
Newman
M. E. J.
2004
Fast algorithm for detecting community structure in networks
.
Physical Review E
69
(
6
),
066133
.
Newman
M. E. J.
&
Girvan
M.
2004
Finding and evaluating community structure in networks
.
Physical Review E
69
(
2
),
026113
.
Ostfeld
A.
,
Salomons
E.
,
Ormsbee
L.
,
Uber
J. G.
&
Bros
C. M.
2012
Battle of the water calibration networks
.
Journal of Water Resources Planning and Management
138
(
5
),
523
532
.
Perelman
L.
&
Ostfeld
A.
2011
Topological clustering for water distribution systems analysis
.
Environmental Modelling & Software
26
(
7
),
969
972
.
Rossman
L. A.
2000
EPANET 2 User's Manual
.
U.S. EPA
,
Cincinnati
.
Saldarriaga
J.
,
Bohorquez
J.
,
Celeita
D.
,
Vega
L.
,
Paez
D.
,
Savic
D.
,
Dandy
G.
,
Filion
Y.
,
Grayman
W.
&
Kapelan
Z.
2019
Battle of the water networks district metered areas
.
Journal of Water Resources Planning and Management
145
(
4
), 04019002.
Tufa
G.
&
Abate
B.
2022
Assessment of accessibility and hydraulic performance of the water distribution system of Ejere Town
.
Journal of Water Supply: Research and Technology – AQUA
71
(
4
),
577
592
.
Walski
T. M.
,
Brill
E. D.
Jr.
,
Gessler
J.
,
Goulter
I. C.
,
Jeppson
R. M.
&
Lansey
K.
1987
Battle of networks models: epilogue. Journal of water resources planning and management 113 (2), 191–203
.
Zhang
Q.
,
Wu
Z. Y.
,
Zhao
M.
,
Qi
J.
,
Huang
Y.
&
Zhao
H.
2017
Automatic partitioning of water distribution networks using multiscale community detection and multiobjective optimization
.
Journal of Water Resources Planning and Management
143
(
9
), 04017057.
Zhang
T.
,
Yao
H.
,
Chu
S.
,
Yu
T.
&
Shao
Y.
2021
Optimized DMA partition to reduce background leakage rate in water distribution networks
.
Journal of Water Resources Planning and Management
147
(
10
), 04021071.
Zhang
C.
,
Liu
H.
,
Pei
S.
,
Zhao
M.
&
Zhou
H.
2022
Multi-objective operational optimization toward improved resilience in water distribution systems
.
Journal of Water Supply: Research and Technology – AQUA
71
(
5
),
593
607
.
Zhao
L.
&
Zhao
J.
2017
Comparison study of three shortest path algorithm
. In
2017 International Conference on Computer Technology, Electronics and Communication (ICCTEC)
.
Zhou
Z.
,
Xiao
Z.
&
Deng
W.
2020
Improved community structure discovery algorithm based on combined clique percolation method and K-means algorithm
.
Peer-to-Peer Networking and Applications
13
(
6
),
2224
2233
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).