ABSTRACT
Online monitoring is increasingly essential for the effective management and operation of urban sewer systems, yet resource limitations necessitate careful planning of sensor deployment. This study aims to address the impact of time lags on monitoring point selection in urban drainage systems using unsupervised machine learning techniques. A novel method is introduced to determine the optimal number and placement of sensors in manholes, using cluster analysis informed by simulated time-series data. The proposed methodology involves two sequential stages: the first stage clusters time-series data based on morphology similarity using the time-lagged cross-correlation (TLCC) coefficient, which measures the temporal alignment between datasets. The second stage further refines these clusters by considering magnitude similarity, employing dynamic time warping distance to quantify shape-based similarities and improve clustering accuracy. The proposed approach allows for flexible threshold adjustments to accommodate specific engineering requirements, enabling the design of monitoring strategies tailored to a predetermined number of locations. Furthermore, the study explores the impact of rainfall intensity on sensor placement, providing actionable guidance for sewer managers to improve monitoring efficiency and address urban water management challenges.
HIGHLIGHTS
A novel bilayer iterative clustering approach is proposed to optimize urban drainage monitoring.
Time-lagged cross-correlation and dynamic time warping improve clustering precision.
The methodology is validated using both a hypothetical network and a real-world case study.
Monitoring strategies address resource limitations and adapt to varying rainfall intensities.
INTRODUCTION
The Internet of Things (IoT) has become a vital tool for managing urban drainage systems (UDS), providing real-time monitoring that enables the early detection of potential risks and a comprehensive understanding of sewer conditions (Lee et al. 2016; Edmondson et al. 2018). IoT systems also facilitate the accumulation of long-term dynamic data, which support the evaluation and optimization of sewer network operations (Kang et al. 2012; Bai et al. 2021). To obtain reliable and accurate monitoring data, it is crucial to optimize the placement of monitoring points, which has become a key area of focus (Schilperoort et al. 2008; Banik et al. 2015). However, limited human and financial resources often constrain the deployment of monitoring systems (Meijer et al. 2018; Vonach et al. 2018). Thus, the challenge lies in achieving effective monitoring with the fewest possible monitoring points (Banik et al. 2017; Sambito & Freni 2021).
Optimal sensor placement has been extensively studied in related fields, including contamination detection in water distribution networks, surface water monitoring, and flood forecasting (Rathi & Gupta 2014a; Fattoruso et al. 2015; Rathi et al. 2016; Adedoja et al. 2018; Jiang et al. 2020). Meta-heuristic algorithms, such as ant colony optimization, genetic algorithms, and particle swarm optimization, are frequently employed to identify optimal sensor locations (Afshar et al. 2015; Gong et al. 2022; Harif et al. 2023). While these algorithms can navigate complex search spaces to identify near-optimal solutions, they often face challenges related to the computational burden, subjectivity, and reduced robustness in large-scale systems (Rathi & Gupta 2014b; Sharma & Kumar 2022).
Clustering techniques have emerged as an efficient alternative to reduce computational complexity. For example, Perelman and Ostfeld (2012) proposed clustering network structures to identify pollution sources and place monitoring points at cluster centers, followed by optimization within each cluster. This method improves efficiency and provides valuable insights into network topology (Guo et al. 2018; Cardoso et al. 2021). Similarly, information theory has been applied to evaluate the information value and redundancy of potential monitoring points, enhancing decision-making and reducing computational costs (Khorshidi et al. 2018; Brentan et al. 2021). These approaches demonstrate the value of predetermining potential sensor locations before applying optimization algorithms.
Despite advancements in related fields, optimizing monitoring locations in UDS remains underexplored. Existing studies often leverage data-driven machine learning techniques, such as cluster analysis, to analyze time-series data generated by monitoring systems (Li et al. 2020; Li 2021). For instance, Guo et al. (2018) used fuzzy clustering to select monitoring points based on simulated scenarios, while Wang et al. (2023) integrated hydraulic and water quality attributes with decision-making factors to refine clustering methods. However, in UDS, rainfall can experience delays as it flows through pipes and nodes (Talei & Chua 2012; Zhang et al. 2023). Traditional monitoring strategies often overlook these time lags, but accounting for them is vital to accurately tracking the system's dynamic changes and enabling effective responses. Moreover, existing studies on clustering time-series data lack controllability and cannot achieve fine-grained control over the clustering degree in line with the actual engineering needs.
Cluster analysis relies on similarity measures to group objects, making the choice of similarity metric critical (Basaran & Günes 2016). Traditional measures, such as Euclidean distance and Pearson's correlation coefficient, are commonly used but are unsuitable for time-series data with time lags or gaps (Popivanov & Miller 2002; Gontijo et al. 2020; Li et al. 2022). The dynamic time warping (DTW) algorithm addresses these limitations by aligning time-series data flexibly, accounting for temporal delays between measurement points (Dai et al. 2014; Lee et al. 2020). Additionally, time-lagged cross-correlation (TLCC) offers a robust method to quantify alignment and similarity between sequences (Tal et al. 2011; Tóth & Balogh 2012). Both TLCC and DTW distances allow for a more accurate identification of time differences between monitoring points, thus optimizing sensor placement.
This study aims to address the impact of time lags on monitoring point selection in UDS using unsupervised machine learning techniques. A novel bilayer iterative clustering method was developed to analyze the similarity relationships among monitoring points while optimizing their number and placement. The methodology was validated on a virtual sewer network and applied to a real-world case study in Ningbo, China, demonstrating its effectiveness in enhancing monitoring strategies for UDS.
METHODOLOGY
Overview of the proposed framework
(1) Data collection and model development: Data on the pipe network, topography, rainfall, and other relevant factors are collected to develop a drainage network model of the study area using simulation software (InfoWorks ICM).
(2) Model calibration and simulation: The model is calibrated and validated using limited existing monitoring data, and simulations are conducted to generate time-series data at all nodes under various rainfall conditions.
(3) Threshold customization: Setting appropriate thresholds is crucial for refining the clustering process. The outer threshold, based on the Pearson correlation coefficient, determines the similarity of time series in terms of morphology. The inner threshold, derived from the Euclidean distance formula, further refines the clusters by accounting for differences in magnitude. Both thresholds are adjustable, allowing them to be tailored to the specific characteristics of the data and the goals of the analysis.
(4) Morphology similarity clustering: Time series are clustered based on morphology similarity using the TLCC coefficient as the similarity measure. TLCC quantifies the alignment of temporal patterns between two time series while accounting for time delays caused by flow dynamics. The first-layer threshold is applied to group nodes into clusters that exhibit similar temporal patterns, ensuring an appropriate level of clustering resolution.
(5) Magnitude similarity clustering: Each morphology-based cluster is further refined by clustering for magnitude similarity, employing DTW distance as the similarity measure. DTW aligns time series by stretching or compressing segments to minimize differences, enabling the identification of nodes with similar magnitude profiles, even if timing variations exist. The second-layer threshold is applied during this stage to further refine the clustering process.
(6) Selection of monitoring points: Representative nodes from each refined cluster are selected as optimal monitoring locations based on their ability to best represent the cluster's overall characteristics. The selection process prioritizes nodes with the smallest average distance to all other points within the cluster, ensuring that the chosen locations are centrally located. Additionally, the quality of the clustering is assessed using the silhouette coefficient (SC), with nodes from clusters that exhibit higher silhouette scores being favored.
One should be aware that the order of conducting morphology and magnitude similarity clustering hinges on the particular data set and monitoring goal, and different orders will result in different clustering outcomes. This study exemplifies the case of first conducting morphology similarity clustering and then conducting magnitude similarity clustering. In this case, the clustering results will emphasize the morphology characteristics of the samples, and the magnitude similarity will contribute to further refining the clustering outcomes.
Development of the clustering model
The clustering methodology serves as a powerful tool for analyzing multivariate data by grouping objects based on their similarity or distance. Objects within the same cluster exhibit high similarity, while significant differences exist between clusters. Hierarchical clustering, a classic partitioning method, constructs a hierarchical structure of class similarities, enabling the clear visual identification of divisions. Compared to other clustering techniques, hierarchical clustering offers distinct advantages, such as not requiring a predefined number of clusters, flexible definitions for distance and similarity, and fewer constraints.
In this study, a bilayer iterative clustering approach is proposed to identify monitoring locations with precision and flexibility. The methodology consists of two successive stages: the first stage clusters time-series data based on morphological similarity, while the second stage refines these clusters by considering magnitude similarity. In the first stage, the TLCC coefficient is used to evaluate morphological similarity, grouping time series into initial clusters. The second stage further subdivides these clusters using the DTW distance as a measure of similarity. Agglomerative hierarchical clustering is employed throughout, with the distance between two clusters taken as the maximum distance between their elements (complete linkage) to ensure cohesiveness.
To ensure stable and reliable clustering, the process incorporates iterative analysis under specific threshold constraints. The workflow begins by calculating the TLCC coefficient between all pairs of time series in the dataset, forming a similarity matrix. Using hierarchical clustering, the time series are grouped into clusters (Mi, where i = 1, 2, …, m) based on an outer threshold δ1. Within each cluster, the similarity of time series falls within the threshold δ1, while the similarity between clusters exceeds this threshold. These clusters are then further refined in the inner layer, where the DTW distance is used as the similarity measure. Applying an inner threshold δ2, each outer cluster Mi is subdivided into Ni smaller clusters (i = 1, 2, …, m). The total number of clusters is determined as $k = \sum_{i=1}^{m} N_i$.
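The two-layer workflow above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes both similarity measures have already been turned into symmetric distance matrices (a morphology distance for the outer layer, a magnitude distance for the inner layer), and the function names `complete_linkage` and `bilayer_clusters` are hypothetical.

```python
def complete_linkage(dist, members, threshold):
    """Agglomerative clustering of `members` (row indices into `dist`).

    Two clusters are merged only while the largest pairwise distance
    between their elements stays at or below `threshold`.
    """
    clusters = [[i] for i in members]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # the closest pair is already too far apart
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


def bilayer_clusters(shape_dist, magnitude_dist, delta1, delta2):
    """Outer layer on morphology distance, inner layer on magnitude distance."""
    outer = complete_linkage(shape_dist, range(len(shape_dist)), delta1)
    final = []
    for group in outer:
        # Each outer cluster is subdivided independently of the others.
        final.extend(complete_linkage(magnitude_dist, group, delta2))
    return outer, final
```

The total number of clusters is simply `len(final)`. Because the inner layer never merges nodes from different outer clusters, two nodes with similar magnitudes but different morphology stay separated.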
The thresholds δ1 and δ2 are adjustable to meet specific engineering requirements, allowing for tailored clustering refinement. Smaller thresholds result in more detailed clusters but increase the total number of clusters. These thresholds are selected based on the following principles:
(1) Outer threshold (δ1): This is determined by referencing the Pearson correlation coefficient's rank order (e.g., Table 1). Strong positive correlations are typically used to select δ1, ensuring finer clustering of morphology similarity.
(2) Inner threshold (δ2): This is derived from a maximum allowable amplitude difference (d) among monitoring indices with similar morphology but differing magnitudes. Using the Euclidean distance formula, δ2 is calculated as $\delta_2 = \sqrt{n d^2} = d\sqrt{n}$, where n is the time-series dimension. Smaller values of d result in finer clustering by emphasizing magnitude similarity.
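As a quick consistency check on the inner threshold: if two series differ by exactly d at every one of the n time steps, their Euclidean distance is d√n. The threshold pairs reported later in the paper (δ2 = 0.4386 for df = 0.02 m3/s and δ2 = 6.5795 for dw = 0.3 m) agree with this relation for n = 481. The value of n is inferred from those numbers and is not stated in the text.

```latex
\delta_2 = \sqrt{\sum_{t=1}^{n} d^{2}} = d\sqrt{n},
\qquad
0.02\,\sqrt{481} \approx 0.4386,
\qquad
0.3\,\sqrt{481} \approx 6.5795 .
```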
Reference for the threshold δ1 selection
Degree | Positive correlation | Negative correlation |
---|---|---|
Very weak or no correlation | 0.8–1.0 | 1.0–1.2 |
Weak correlation | 0.6–0.8 | 1.2–1.4 |
Moderate correlation | 0.4–0.6 | 1.4–1.6 |
Strong correlation | 0.2–0.4 | 1.6–1.8 |
Extremely strong correlation | 0.0–0.2 | 1.8–2.0 |
Time-lagged cross-correlation analysis
Selecting an appropriate similarity measure is essential for accurately clustering time-series data. Metrics such as the correlation coefficient are commonly used to quantify the degree of similarity, with higher values indicating greater similarity. Pearson's correlation coefficient, a widely adopted measure of morphology similarity, is particularly effective for capturing the direction of change in time-series patterns. It accounts for minor local variations in morphology and eliminates the need for prior data normalization. However, Pearson's correlation coefficient is insufficient for monitoring indicators in UDS due to the presence of time lags inherent in water transport processes.
To address this limitation, the cross-correlation function is used in this study to calculate morphology similarity while accounting for time lags. The time lag between two time series is identified at the point where the correlation reaches its maximum value. This maximum correlation value, considered a lag-dependent Pearson correlation coefficient, reflects the degree of similarity between the two time series. Specifically, one series is shifted by a time lag p relative to the other.
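The lag search described above can be sketched in a few lines. This is an illustrative version under stated assumptions: both series have the same length and sampling interval, the search range `max_lag` is a hypothetical parameter, and the conversion to a distance as 1 − r is an assumption suggested by the 0 to 2 ranges in the δ1 reference table rather than something the text states.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant series: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def tlcc(x, y, max_lag):
    """Return (best_r, best_lag) over shifts -max_lag..max_lag.

    A positive lag p means y trails x by p steps: x[t] is compared with y[t + p].
    Only the overlapping part of the two series is used at each shift.
    """
    best_r, best_lag = -1.0, 0
    for p in range(-max_lag, max_lag + 1):
        if p >= 0:
            xs, ys = x[:len(x) - p], y[p:]
        else:
            xs, ys = x[-p:], y[:len(y) + p]
        r = pearson(xs, ys)
        if r > best_r:
            best_r, best_lag = r, p
    return best_r, best_lag


def tlcc_distance(x, y, max_lag):
    """Assumed morphology distance: 1 - (maximum lagged correlation)."""
    return 1.0 - tlcc(x, y, max_lag)[0]
```

For a downstream hydrograph that is a delayed copy of an upstream one, the plain Pearson coefficient can be low or even negative, while `tlcc` recovers a correlation near 1 at the lag equal to the travel time.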


DTW for similarity measurement
Comparison of Euclidean distance and DTW. (a) Misaligned time series, (b) Euclidean distance mapping, (c) DTW alignment, and (d) DTW warping path.
Euclidean distance compares points at identical time indices, so two series with the same shape but offset in time can appear dissimilar (Figure 2(b)). To address this limitation, DTW offers a more flexible approach by allowing elastic shifts along the time axis. DTW accommodates both global phase corrections and local distortions, such as stretched or compressed segments. This algorithm aligns two time series optimally and provides a distance measure to quantify their similarity. As shown in Figure 2(c), DTW achieves an alignment where the distance between corresponding points accurately reflects the true similarity of the time series.
While the classical DTW algorithm effectively handles local time shifts, it can occasionally produce unnatural alignments, known as pathological warping. To address this, a modified DTW algorithm is used in this study. This version introduces constraints on the warping path by limiting the total number of connections during optimization. These constraints reduce the likelihood of incorrect alignments while maintaining flexibility in similarity measurement. The maximum allowable warping path length is used as the time lag between the two time series.
For further details on this modified DTW algorithm, refer to Zhang's work (2017). This enhancement ensures that DTW remains a reliable tool for analyzing time-series data in urban drainage monitoring strategies.
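To make the idea of a constrained warping path concrete, the sketch below implements DTW with a Sakoe-Chiba band, which caps how far apart in time two matched points may be. This is a different and simpler constraint than the path-length limit described above, and it is used here only as a stand-in; the `window` parameter and the squared point cost are assumptions.

```python
import math


def dtw_distance(x, y, window=None):
    """DTW distance with squared point costs and an optional band constraint.

    `window` limits |i - j| along the warping path; None means unconstrained.
    With window=0 and equal lengths the result equals the Euclidean distance.
    """
    n, m = len(x), len(y)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are filled in.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # x[i-1] matched again
                                 cost[i][j - 1],      # y[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])
```

Because the square root of summed squared differences is used, the result is on the same scale as a Euclidean distance, which is convenient when the inner threshold δ2 is itself derived from the Euclidean formula.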
Optimization of monitoring sites
Clustering analysis provides a systematic approach to determine monitoring locations by grouping nodes based on their time series similarity. Varying the similarity threshold δ produces different clustering outcomes. For each threshold δ, monitoring locations are identified as the nodes within each cluster that have the smallest average distance to all other nodes in the cluster.
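The selection rule in the previous paragraph (the node with the smallest average distance to the other nodes in its cluster) can be sketched as below. The function name and the list-of-lists distance matrix are illustrative assumptions; a singleton cluster is represented by its only node.

```python
def select_representatives(dist, clusters):
    """Return one representative node index per cluster.

    `dist` is a symmetric distance matrix and `clusters` is a list of lists of
    node indices. The representative minimizes the mean distance to the other
    members of its own cluster.
    """
    reps = []
    for members in clusters:
        if len(members) == 1:
            reps.append(members[0])
            continue

        def mean_dist(i, members=members):
            return sum(dist[i][j] for j in members if j != i) / (len(members) - 1)

        reps.append(min(members, key=mean_dist))
    return reps
```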
The SCk value ranges from −1 to 1, with higher values indicating better clustering quality. This metric is used to determine the most suitable number of clusters.
Although increasing the number of clusters may reduce the average distance among time series within each cluster, excessive clustering often leads to diminishing returns due to marginal effects. To balance clustering quality and practicality, a maximum threshold for the number of clusters (kmax) is predefined. The SCk is then calculated for candidate cluster numbers ranging from 2 to kmax. The k value corresponding to the highest SCk is selected as the optimal number of clusters.
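The search over candidate cluster numbers can be illustrated with the standard silhouette definition computed from a precomputed distance matrix. The paper's own SC formula is not reproduced in this text, so the sketch assumes the usual form s = (b − a) / max(a, b), with singleton clusters contributing 0; the helper `best_partition` and its input format are hypothetical.

```python
def silhouette_coefficient(dist, labels):
    """Mean silhouette value for a partition, given a distance matrix."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # singleton cluster: s = 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist[i][j] for j in other) / len(other)
                for key, other in groups.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def best_partition(dist, partitions):
    """`partitions` maps k to a label list; return the k with the highest SC."""
    return max(partitions, key=lambda k: silhouette_coefficient(dist, partitions[k]))
```

In use, `partitions` would hold one labelling for each k from 2 to kmax, and the returned k is the candidate with the highest score.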
In real-world datasets, anomalous time series, such as those caused by sewer surcharges in specific regions, can complicate clustering. When the chosen k is inappropriate, these anomalies may be misclassified with regular time series, leading to a significant drop in SCk. Properly isolating anomalous time series into separate clusters or grouping them together enhances clustering quality and ensures meaningful results. By using SC as a quantitative metric, this approach effectively identifies and addresses such anomalies, improving the reliability of monitoring strategies.
METHOD VALIDATION THROUGH A HYPOTHETICAL NETWORK
Hypothetical network configuration and simulation
Clustering results and analysis
Flow rate time-series clustering results. (a) Morphology similarity clustering results (Clustering Step 1). (b–g) Magnitude similarity clustering results (Clustering Step 2) based on results from Clustering Step 1: M1–M6. (h) Bilayer clustering results.
These six clusters, each representing a distinct morphology, were then used as the basis for inner-layer clustering based on magnitude similarity. The inner threshold was set to δ2 = 0.4386, which corresponds to a maximum allowable difference in flow rate amplitude of df = 0.02 m3/s. Four clusters (M1, M3, M4, and M6) exceeded this threshold and were each subdivided into two clusters. The bilayer clustering approach therefore produced 10 clusters in total (Figure 5(b)–5(g)), with the maximum magnitude difference within each cluster below δ2.
Water level time series clustering results. (a) Morphology similarity clustering results (Clustering Step 1); (b–d) Magnitude similarity clustering results (Clustering Step 2) based on results from Clustering Step 1: M1-M3. (e) Bilayer clustering results.
This example demonstrates the effectiveness of the proposed method in distinguishing between the complex behaviors of flow rate and water level indices in sewer networks.
Identification of key monitoring points
The location of monitoring sensors. (a) Flow rate and (b) water level.
For flow rate monitoring, using thresholds δ1 = 0.2 and δ2 = 0.4386 (df = 0.02 m3/s), 10 monitoring locations were identified, covering both main and branch pipes (Figure 7(a)). These locations effectively capture flow rate conditions across the study area at this threshold level while avoiding redundancy.
For water level monitoring, thresholds δ1 = 0.2 and δ2 = 6.5795 (dw = 0.3 m) also resulted in 10 monitoring locations (Figure 7(b)). However, only 60% of these locations overlapped with those for flow rate monitoring. Water level monitoring points were more evenly distributed and concentrated in the main pipe.
The geographic distribution of representative nodes revealed that clustering is influenced by upstream–downstream relationships and pipeline structures. This highlights the need to consider specific monitoring objectives, such as overflow prediction, flood risk assessment, or system optimization, when determining monitoring locations. Different monitoring indicators may require separate or combined analyses to ensure optimal placement.
Evaluation of controllable clustering performance
Adjusting the inner and outer thresholds allows for control over the granularity of clustering, enabling the selection of an optimal balance between precision and practical constraints. The number of clusters is influenced by monitoring requirements and economic limitations.
Final number of clusters and evaluation index SCk adjusted by outer and inner thresholds. (a) Flow rate and (b) water level.
The monitoring scheme's validity was assessed using SCk. Figure 10 illustrates that increasing cluster numbers does not always improve SCk; instead, it follows a decreasing trend if clustering becomes unreasonable. For flow rate data, SCk is highly sensitive to δ1, declining when df < 0.03 m3/s before rising. The maximum SCk of 0.793 occurs at δ1 = 0.2 and df = 0.04 m3/s, with eight clusters identified. For water level data, SCk is less influenced by δ1. The maximum SCk of 0.575 is achieved with seven clusters at δ1 = 0.1 and dw = 0.4 m. These results underscore the importance of adjusting thresholds to balance precision with practical requirements, ensuring optimal clustering and monitoring configurations.
This evaluation ensures that the selected clustering configuration maximizes SCk while meeting project-specific monitoring needs and device limitations. The number of clusters corresponds to the number of required sensors, enabling the identification of optimal monitoring locations through rigorous clustering analysis.
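The trade-off described in this section, maximizing SCk while respecting the number of available sensors, can be expressed as a small selection step over a grid of threshold pairs. The sketch below is illustrative only: the `Candidate` record, the function name, and the idea of a hard sensor budget `max_sensors` are assumptions, and any grid values passed to it would come from running the clustering at each (δ1, d) pair.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    delta1: float     # outer threshold
    d: float          # maximum allowable amplitude difference
    n_clusters: int   # number of clusters, i.e. sensors required
    sc: float         # silhouette coefficient of the resulting partition


def choose_configuration(candidates: List[Candidate],
                         max_sensors: int) -> Optional[Candidate]:
    """Return the feasible candidate with the highest SC, or None if none fits."""
    feasible = [c for c in candidates if c.n_clusters <= max_sensors]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.sc)
```

A softer variant could rank candidates by SC first and then report how many sensors each requires, leaving the final choice to the decision-maker.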
REAL-WORLD APPLICATION: BL COMMUNITY
Study area characteristics and data sources
Study area information. (a) Geographical location and (b) land-use information.
To establish a hydraulic model for the drainage system, data on rainfall, sewer network specifications, and hydrological and hydraulic parameters were integrated. Rainfall data were collected using an L3 tipping bucket rain gauge recording at 1-min intervals. During the monitoring period (July 30 to December 31, 2020), rainfall occurred on 35 days, totaling 408 mm. Design rainfall events for return periods of P = 0.5, 5, and 50 years were calculated using the Ningbo rainstorm intensity formula (Zhejiang Provincial Department of Housing and Urban-Rural Development 2020). Rainfall event characteristics and statistics are summarized in Table 2.
Part of rainfall events
Rainfall events | Aug. 27, 18:30–20:30 | Sept. 11, 9:20–10:20 | Sept. 14, 11:00–18:00 | P = 0.5 (2 h) | P = 5 (2 h) | P = 50 (2 h) |
---|---|---|---|---|---|---|
Duration of rainfall (h) | 2 | 10 | 7 | 2 | 2 | 2 |
Accumulated precipitation (mm) | 15 | 21.6 | 31 | 31.9 | 74.8 | 117.7 |
Peak rainfall intensity (mm/min) | 0.8 | 0.6 | 0.4 | 1.1 | 2.6 | 4.1 |
Flow rate and water level data were gathered during rainfall events via on-site monitoring. Network data, including manhole elevations, pipe dimensions, and invert elevations, were obtained in CAD format from the 2009 pipeline census by the Zhenhai Planning, Surveying, and Design Institute. Hydrological and hydraulic parameters such as land use, vegetation coverage, and surface slopes were also provided by the institute. Manning's roughness coefficient for the drainage pipes was determined based on regional pipe characteristics.
Development of the hydraulic model
The case study hydraulic model was constructed in InfoWorks ICM, incorporating pipeline and manhole data from CAD files. The pipeline network was simplified to represent main and branch pipelines, excluding manholes on road edges. The pipeline topology was carefully reviewed to ensure accuracy.
The model covered a 7.56-hectare area, including 316 sub-catchments, 282 manholes, 279 drainage pipes, one pumping station, and six outlets. The network spans approximately 5,587.8 m, with pipe diameters ranging from 200 to 400 mm (Figure 10). Monitoring points were strategically arranged, focusing on manholes with branch access. These points, marked as blue nodes, include 43 nodes and 44 pipes, with flow monitoring directions indicated by blue pipes. Surface runoff parameters were assigned based on actual land-use conditions, and a fixed percentage runoff model was employed. Confluence dynamics were analyzed using the SWMM model, leveraging detailed land-use data.
Initial model parameters were set based on site-specific conditions and prior studies. The runoff coefficient, confluence factor, initial loss, and Manning's roughness were then adjusted iteratively. Calibration used the rainfall event of August 27, 2020 (15.0 mm over 2 h), and validation used the event of September 11, 2020 (21.6 mm over 10 h); both events provided flow data for two representative nodes (Node 1 and Node 2). Located near the midstream and downstream sections of the pipeline system, these two nodes were taken as representative of the overall behavior of the network.
Results of model calibration and verification
Node number | Calibration (Aug. 27): R2 | Calibration (Aug. 27): NSE | Verification (Sept. 11): R2 | Verification (Sept. 11): NSE |
---|---|---|---|---|
Node 1 | 0.88 | 0.87 | 0.78 | 0.77 |
Node 2 | 0.92 | 0.90 | 0.88 | 0.87 |
Calibration results from data collected on August 27, 2020: (a) Node 1 and (b) Node 2. Validation results from data collected on September 11, 2020: (c) Node 1 and (d) Node 2.
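For readers who want to reproduce the goodness-of-fit metrics in the calibration table, the sketch below computes the Nash-Sutcliffe efficiency (NSE) and R2 from paired observed and simulated series. It assumes that R2 is the squared Pearson correlation, which is common in hydrological calibration but is not confirmed by the text; the function names are illustrative.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mo, ms = sum(observed) / n, sum(simulated) / n
    sxy = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((s - ms) ** 2 for s in simulated)
    return sxy * sxy / (sxx * syy)
```

The two metrics are complementary: a simulation with a constant bias can still score R2 = 1 while its NSE drops, which is one reason both are usually reported together.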
Optimization of monitoring point allocation
Final cluster analysis results and evaluation metrics (based on data from September 14th). (a) Flow rate clusters with the SCk index. (b) Water level clusters with the SCk index. Optimal monitoring locations. (c) Flow rate (δ1 = 0.65, df = 0.005 m3/s). (d) Water level (δ1 = 0.2, dw = 0.1 m).
For the flow rate indicator, the optimal clustering quality was achieved at δ1 = 0.65 and df = 0.005 m3/s, yielding a silhouette coefficient (SCk) of 0.922 with seven monitoring points. For the water level indicator, the best results were obtained at δ1 = 0.2 and dw = 0.1 m, with SCk = 0.666 and 14 monitoring points. Different thresholds change how finely the nodes are clustered; no single setting is inherently right or wrong, so thresholds can be selected to match specific engineering requirements.
The optimal monitoring point arrangements are depicted in Figure 12(c) and 12(d). The pipes marked in different colors represent those belonging to the same flow rate cluster, and triangles mark monitoring locations. The flow rate monitoring scheme reveals that nodes within the same cluster are spatially dispersed yet maintain similar positions relative to upstream and downstream pipe network sections. Conversely, the water level monitoring scheme requires twice as many points, with locations differing significantly due to the strong influence of topography.
Monitoring points are selected to represent the overall pipe network performance while allowing for targeted investigations. Clusters demonstrate process line similarities with slight variations within thresholds, making it sufficient to monitor one point per cluster.
However, financial constraints often limit monitoring at all optimal points. Decision-makers must adjust thresholds to balance clustering outcomes and SCk values, tailoring monitoring schemes to practical limitations. The methodology proposed in this study provides a structured approach to optimizing monitoring point selection under various constraints, aiding decision-making in complex scenarios.
Comparative analysis of rainfall scenarios
Monitoring stormwater runoff during rainfall provides critical insights into the overall runoff volume and water quality in UDS. However, runoff characteristics, such as quantity and quality, can vary significantly across drainage nodes due to factors like rainfall intensity and the intervals between events. Therefore, rainfall variability must be carefully considered when selecting monitoring locations.
Optimal monitoring scheme of different rainfall levels (δ1 = 0.25, df = 0.01 m3/s). (a) September 14th, (b) P = 0.5, (c) P = 5 and (d) P = 50.
The results indicate that higher rainfall return periods require more monitoring points, with adjustments or additions to the configuration used for lower return periods. This relationship can be explained by the greater variability in runoff characteristics, such as higher flow volumes, more intense pollutant transport, and spatially heterogeneous runoff patterns, associated with heavier rainfall events (Saharia et al. 2021; Zhou et al. 2021). During such extreme events, additional monitoring points are necessary to capture the variability in both flow and water quality across the drainage network. Importantly, monitoring points optimized for smaller rainfall events can still provide valuable data during more intense rainfall, ensuring flexibility and reliability in the monitoring strategy.
The location of flow rate monitoring points under different rainfall events (total of 10 sensors). (a) September 14th (δ1 = 0.15, df = 0.01 m3/s, SCk = 0.877). (b) P = 0.5 (δ1 = 0.15, df = 0.02 m3/s, SCk = 0.814). (c) P = 5 (δ1 = 0.45, df = 0.01 m3/s, SCk = 0.898). (d) P = 50 (δ1 = 0.45, df = 0.015 m3/s, SCk = 0.836).
Table 4 highlights the consistency of flow monitoring points across different rainfall events. A high degree of overlap (50%) is observed among the monitoring points identified for the September 14th rainfall event and those for events with return periods of P = 0.5 and P = 5. Similarly, strong consistency exists between events with return periods of P = 5 and P = 50. However, the agreement drops to 30% when considering all three events (P = 0.5, 5, and 50) collectively.
The percentage of coincident flow rate monitoring points for the different rainfall events
Rainfall events | Sept. 14 | P = 0.5 | P = 5 | P = 50 |
---|---|---|---|---|
Sept. 14 | 100 | 50 | 50 | 40 |
P = 0.5 | 50 | 100 | 30 | 30 |
P = 5 | 50 | 30 | 100 | 50 |
P = 50 | 40 | 30 | 50 | 100 |
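The percentages in the table above can be read as the share of monitoring points that two schemes have in common. A minimal sketch, assuming both schemes contain the same number of sensors (10 in this comparison) and that nodes are identified by hypothetical integer IDs:

```python
def overlap_percent(points_a, points_b):
    """Percentage of monitoring points in scheme A that also appear in scheme B."""
    a, b = set(points_a), set(points_b)
    return 100.0 * len(a & b) / len(a)


def overlap_matrix(schemes):
    """Pairwise overlap table for a dict mapping event name to node IDs."""
    names = list(schemes)
    return {
        (p, q): overlap_percent(schemes[p], schemes[q])
        for p in names for q in names
    }
```

With equal scheme sizes the matrix is symmetric and its diagonal is 100, matching the layout of the table.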
While additional experiments would offer valuable insights, the proposed method provides a strong foundation for optimizing the placement of monitoring points under various rainfall scenarios. We recommend that future research, when feasible, incorporates real-world rainfall data to validate and refine the monitoring network design, enhancing the practical applicability of the strategy.
CONCLUSIONS
This paper introduces a novel approach based on iterative clustering to determine optimal locations for monitoring sites in UDS. The method utilizes model simulations to analyze the temporal dynamics of all nodes under varying rainfall conditions. To address the influence of time delays on node-specific time-series data, the analysis incorporates TLCC coefficients and DTW distances within a bilayer clustering framework. By adjusting the threshold values, the method achieves a balance between clustering refinement and engineering requirements, enabling a practical and accurate determination of monitoring points.
The application of this approach to both an illustrative scenario and a case study demonstrates that the placement of monitoring points is influenced by the chosen indicators. Clusters of nodes tend to form in similar upstream or downstream segments of the pipe network, reflecting the spatial organization of the system. For water level observations, clusters are confined to distinct regions, requiring a greater number of monitoring points to achieve optimal clustering accuracy. Notably, monitoring points identified under normal rainfall conditions retain their effectiveness during more intense rainfall events, highlighting their adaptability.
However, practical constraints, such as equipment limitations and rainfall variability, necessitate a careful weighting of different rainfall events to achieve a judicious monitoring arrangement. These findings underscore the utility of the proposed approach in developing efficient systems for managing pipe networks and monitoring urban drainage, offering valuable insights for improved urban water infrastructure planning.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the staff from the Zhenhai Planning, Surveying, and Design Institute for their assistance in the field sensor location survey and installation.
FUNDING
This study was supported by the National Key Research and Development Program of China (2022YFC3203200) and the Key Research and Development Program of Ningbo (2023Z216).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.