ABSTRACT
Clustering analysis of small watersheds is an effective tool for identifying the similarity of runoff generation and concentration. In this paper, 545 small watersheds in the hilly areas of Shandong Province were investigated, and 12 indicators representing their climate and subsurface characteristics were selected to identify communities based on hydrological similarity. We further analyzed the hydrological connections among the small watersheds within each community using three indicators (network mean, centrality, and k-core). Finally, the clustering results were evaluated on the basis of the small watershed flood peak modulus. The results of this complex network method indicate that the study area contained six large communities and nine small communities. The community-clustering results were reasonable and showed the interconnectedness of the watersheds within each community. The three network indicators adequately described the degree of similarity, the representativeness of the watersheds, and the spatial scales of similar hydrological features. This method should be helpful for addressing the issue of parameter transplantation in ungauged watersheds and implementation of a flood risk management strategy.
HIGHLIGHTS
Small watershed classification is conducted in a complex network approach.
Network evaluation metrics were used to represent hydrological linkages between small watersheds in the community.
The flood modulus of the small watersheds was calculated using the inference equation method for reasonableness verification.
INTRODUCTION
Accurate hydrological forecasting can effectively prevent flooding and reduce disaster losses (Das et al. 2016; Das et al. 2021). At present, due to the lack of hydrological monitoring stations or the short time to build stations in the domestic hilly areas, resulting in limited or no hydrological data, it is difficult to satisfy the requirements of parameter rate fixing for flood forecasting models, and the changing conditional of the subsurface aggravate the difficulty of hydrological forecasting in the ungauged area (Zhu et al. 2020). Therefore, hydrological forecasting in the ungauged area has always been one of the difficult problems of hydrological research (Merz & Blöschl 2004). The physical attribute similarity method is one of the main methods for hydrological similarity research in small watersheds (Wu et al. 2023), numerous studies have proved that the physical attribute similarity method can effectively identify hydrologically similar watersheds and reveal the spatial distribution of hydrological characteristics of small watersheds (Sellami et al. 2014; Gong et al. 2021), thus solving the problem of hydrological forecasting in ungauged watersheds and contributing to the implementation of flood risk management strategies.
At present, physical attribute similarity methods mainly include the watershed clustering method and hydrological similarity evaluation (Sun et al. 2023). Many scholars have applied cluster analysis (Fan & Liu 2015) to the identification and selection of similar sub-watersheds. Commonly used clustering algorithms include K-means algorithm (Xue 2016), fuzzy clustering (Choubin et al. 2017), etc. Yi et al. (2014) implemented and verified the clustering of sub-watersheds based on self-organizing map and the hierarchical clustering method, followed by parameter shifting for information-poor areas using the Hydrologiska Byråns Vattenbalansavdelning model. Yang et al. (2022) sorted 64 typical sub-watersheds in hilly areas into 11 groups of similar watersheds based on the physical similarity method using the k-means clustering algorithm and carried out hydrological-model parameter-transfer experiments. Mayer et al. (2014) classified the U.S. portion of the Great Lakes basin using cluster analysis. Barbhuiya et al. (2023) classified watersheds based on the physical attribute similarity approach using the K-means algorithm and evaluated donor watersheds in each group based on the performance of the GR4J hydrologic model. However, these clustering algorithms have limitations: (a) they do not reflect the degree of similarity between watersheds; (b) they do not quantify the role of sub-watersheds in each cluster; and (c) they do not provide information on the connectivity between all pairs of watersheds.
Hydrological similarity evaluation is defined as the use of a hydrological similarity metric to measure the degree of similarity within a collection of similar features in two watersheds. Many scholars have established a variety of hydrological similarity evaluation indexes, such as degree of connectedness (Deng et al. 2006), Euclidean distance (Zhu et al. 2020), etc. Li et al. (2018) used the root-mean-square error of the flow-duration curve as the similarity index and found that the hydrological responses of similar basins are also similar in nature. Wang et al. (2021) constructed a physical hydrological similarity index based on climatic, subsurface, and hydrological characteristics; it can effectively identify similar watersheds, to some extent reflecting hydrological response characteristics. Liang et al. (2023) constructed a hydrological similarity assessment index between basins to quantify the degree of hydrological similarity between nested basins in Fujian Province. The hydrological similarity assessment captures the degree of similarity between basins but not the homogeneity of categories between selected similar basins.
The community-detection algorithm combines the advantages of watershed clustering with those of similarity evaluation: it can construct a complex network based on the hydrological similarity between sub-watersheds, and it identifies communities within the network. In recent years, community-detection algorithms for complex networks have been applied to hydrological studies. Sun et al. (2018) have used a community-detection algorithm to classify hydrological objects such as precipitation grids. Yang et al. (2019) based on the indicators of daily mean streamflow, areal precipitation, potential evaporation, and temperature, a community-detection algorithm was used to classify 242 catchments in the United States and to identify catchments with similar flood seasonality; Halverson & Fleming (2015) used one to classify catchments based on inter-annual hydrographs. Deepthi & Sivakumar (2023) explored how the concept of complex networks can be used to assess the importance of individual sites in a hydrologic network. However, there are fewer studies that combine community-detection algorithms with physical attribute characteristics of watersheds to analyze the hydrological similarity and classify sub-watersheds in hilly areas.
In this paper, the concept of a complex network is introduced, based on the physical attribute characteristics of watersheds, combining the watershed hydrological similarity evaluation method with the community-detection algorithm, and using three kinds of network metrics, namely, average degree, centrality, and k-core, to analyze the hydrological similarity of 545 small watersheds of the hilly areas in Shandong Province with lack of information, and to realize the division of communities, and to explain in detail the hydrological connectivity between sub-watersheds within each community. The clustering rationality was finally verified using the flood peak modulus.
METHOD
Similarity evaluation methods










Complex network approaches
A complex network is a special kind of network structure in which the objects of a complex system are treated as nodes and the relationships between the objects as edges. Its properties may include self-organization, self-similarity, attractors, the small-world property, and the scale-free property (Scarsoglio et al. 2013). A conceptual diagram of a complex network is shown in Figure S1. In this paper, nodes represent sub-watersheds and edges represent the degree of similarity between them.
Louvain community-detection algorithm
Community detection (Fortunato 2010) in graphs is usually conducted with clustering algorithms that divide graph nodes into different communities according to how tightly connected or similar they are; connections between nodes belonging to different communities are sparse or dissimilar (Clauset et al. 2004). Figure S2 shows a community structure, where nodes within the same circle form a community.




The flow of the Louvain algorithm is shown in Table S1. The Louvain algorithm is a greedy optimization algorithm and may fall into local optima. Therefore, the algorithm should be executed several times and the result with the largest degree of modularity chosen as the final community division result.
Construction of complex network
Based on the physical features, the similarity calculated using the similarity evaluation method (Section 2.1) is used as the similarity metric when constructing a complex network, which is unique to this study. Connections are added between two nodes only if the similarity between the two basins exceeds a set threshold for connectivity, and the value of the similarity is used as the weight of the edge. In this study, it is also necessary to determine the optimal connection threshold to ensure that the best clustering results can be obtained. Different connection thresholds may lead to different results in the division of the network as well. Moreover, because the modularity increases as the number of connections in the network decreases, the modularity between different networks cannot be used for comparison. Therefore, the difference between the modularity of the actual network and the typical modularity for a given connection threshold, i.e., the incremental modularity, is utilized instead when comparing the clustering of networks with different connection thresholds and determining the optimal connection threshold (Yang et al. 2019).
Network evaluation indicator
Network averaging



Centrality
Centrality describes the role of a node in a community network (Landherr et al. 2010). In this study, we focus on whether there is a node with high centrality in the community that can represent the overall node characteristics. Therefore, we evaluate the centrality of the nodes in the sub-networks of each community. Figure S3 shows an example diagram of the degree of network centrality.



k-Core
The k-core is essentially a set of densely connected nodes (Carmi et al. 2007). The process of determining the k-core can be divided into the following steps: first, remove all nodes that have only one connected edge; second, remove all nodes that have only two connected edges; and so on in turn for all integers i < k. The remaining nodes and their edges form a ‘core’ of degree k. An example of i-shells and a k-core nucleus is given in Figure S4.
Study area
Shandong Province is located in the northeastern part of the North China Plain coastal area, to the east of the Taihang Mountains. The province's land area of about 155,800 km2. Shandong Province has a complex topographic structure: the terrain of the central region is mountainous; the southwest and northwest are low-lying and flat; the east has gentle rolling hills, forming the skeleton of the mountainous hills, plains, and basins of the staggering terrain in the middle of the situation. Mount Tai is located in the middle; its main peak is 1,532.7 m above sea level, the highest point in the province. The Yellow River Delta is generally 2–10 m above sea level and is the lowest point in the province. The geomorphology of the territory is complex. The mountains account for about 15% of the province's area. They are mainly in the central part of Shandong, a localized area of southwestern Shandong, and the eastern part of the province; their average elevation is about 700 m. Hills account for about 16% of the province's area, mainly in the eastern and southwestern Lu regions; their average elevation is 100–500 m. Shandong Province has a well-developed water system. The province is crossed by the Huaihe, Yellow, Haihe, Xiaoqing, Jiaodong, and other rivers. The lakes are mainly concentrated in the southern part of the Luzhong Hills and the Lunan Plain. Shandong Province has a warm temperate monsoon climate with concentrated precipitation, rain and heat in the same season, a short spring and autumn, and a long winter and summer. The average annual temperature is 11–14°C. The average precipitation is 676.5 mm and the average natural runoff is 22.29 billion m3.
General topography of typical hilly areas of Shandong Province, including the research area.
General topography of typical hilly areas of Shandong Province, including the research area.
RESULTS AND DISCUSSION
Determining the similarity metric
On the basis of related studies (Yi et al. 2014; Mayer et al. 2014; Xue 2016), watershed area, average watershed slope, watershed shape factor, river-network density, main-channel specific drop, main-channel length, river curvature, topographic index (Jin et al. 2017), normalized difference vegetation index (NDVI), curve number (CN) value, and multi-year average maximum 3-h and 6-h rainfall were selected as similarity metrics. The CN value mainly reflects the land use and soil type of the watershed (Al-Ghobari et al. 2020), and the multi-year average maximum 3-h and 6-h rainfall can reflect the watershed rainfall characteristics. Please refer to Table S2 for the calculation table of CN values in Shandong Province; on the basis of determining the CN values corresponding to different soil types and different land use types, determine the CN value of each small watershed in accordance with the area-weighted average method (Xue 2016). Table S3 shows the maximum, minimum, and average values of each similarity metric in all sub-watersheds. Table 1 shows the calculated similarities between selected sub-watersheds (due to the large number of watersheds, only some of the watershed similarities are shown in the table).
Calculated similarity between sub-watersheds in the study area (show some watersheds)
. | Yunliang . | Zhaili . | Tongtian . | Yingwen . | Mata . | Xizhou . | Duanmengli . | … . | Shanma . |
---|---|---|---|---|---|---|---|---|---|
Yunliang | 1.00 | 0.59 | 0.59 | 0.52 | 0.54 | 0.46 | 0.42 | … | 0.55 |
Zhaili | 0.59 | 1.00 | 0.54 | 0.54 | 0.40 | 0.43 | 0.37 | … | 0.47 |
Tongtian | 0.59 | 0.54 | 1.00 | 0.52 | 0.66 | 0.50 | 0.57 | … | 0.69 |
Yingwen | 0.52 | 0.54 | 0.52 | 1.00 | 0.42 | 0.45 | 0.31 | … | 0.44 |
Mata | 0.54 | 0.40 | 0.66 | 0.42 | 1.00 | 0.48 | 0.58 | … | 0.62 |
Xizhou | 0.46 | 0.43 | 0.50 | 0.45 | 0.48 | 1.00 | 0.49 | … | 0.72 |
Duanmengli | 0.42 | 0.37 | 0.57 | 0.31 | 0.58 | 0.49 | 1.00 | … | 0.63 |
… | … | … | … | … | … | … | … | … | … |
Shanma | 0.55 | 0.47 | 0.69 | 0.44 | 0.62 | 0.72 | 0.63 | … | 1.00 |
. | Yunliang . | Zhaili . | Tongtian . | Yingwen . | Mata . | Xizhou . | Duanmengli . | … . | Shanma . |
---|---|---|---|---|---|---|---|---|---|
Yunliang | 1.00 | 0.59 | 0.59 | 0.52 | 0.54 | 0.46 | 0.42 | … | 0.55 |
Zhaili | 0.59 | 1.00 | 0.54 | 0.54 | 0.40 | 0.43 | 0.37 | … | 0.47 |
Tongtian | 0.59 | 0.54 | 1.00 | 0.52 | 0.66 | 0.50 | 0.57 | … | 0.69 |
Yingwen | 0.52 | 0.54 | 0.52 | 1.00 | 0.42 | 0.45 | 0.31 | … | 0.44 |
Mata | 0.54 | 0.40 | 0.66 | 0.42 | 1.00 | 0.48 | 0.58 | … | 0.62 |
Xizhou | 0.46 | 0.43 | 0.50 | 0.45 | 0.48 | 1.00 | 0.49 | … | 0.72 |
Duanmengli | 0.42 | 0.37 | 0.57 | 0.31 | 0.58 | 0.49 | 1.00 | … | 0.63 |
… | … | … | … | … | … | … | … | … | … |
Shanma | 0.55 | 0.47 | 0.69 | 0.44 | 0.62 | 0.72 | 0.63 | … | 1.00 |
Before community delineation, the correlation analysis of each similarity indicator was carried out using the Person correlation coefficient (Yang et al. 2022), and the results are shown in Figure S5. It can be seen that the correlation between slope and stream ratio drop, topographic index, elevation, and CN value is stronger, and the rest of the indicators are more independent.
The sources of the data used in this paper are digital elevated model data from the Advanced Land Observing Satellite (ALOS, launched in 2006) satellite phased-array L-band synthetic aperture radar (purchased); NDVI from the Resources and Environmental Science Data Center (http://www.resdc.cn/); land use and soil subsurface data from the results of the Shandong Province Flash Flood Hazard Analysis and Assessment Project; rainfall data from the Hydrological Center of Shandong Province.
Distribution of watershed communities
Results of watershed clustering
It is generally accepted that two basins are similar if their similarity is greater than 0.6 (Qi et al. 2007). Considering the network complexity and other factors, the connectivity threshold was determined after several tests; at this connectivity threshold, the incremental modularity reaches its maximum value, i.e.,
. Points (watersheds) with similarity greater than 0.65 are filtered as input conditions to the Louvain algorithm to get the final community segmentation, similarity values as weights for edges. The complex network consisted of 545 nodes and 2,239 edges.
Meanwhile, the results of the Louvain algorithm delineation were briefly compared with those obtained from K-means algorithm clustering (Figure S6). It was found that the distribution of each category of watersheds was more dispersed in the results of K-means, and the categories of some watersheds were unreasonable, which might be due to the hard clustering property of K-means algorithm and its sensitivity to noisy data, and the K-means clustering could not reflect the degree of similarity among watersheds. On the other hand, the Louvain algorithm is based on the similarity between watersheds to classify communities, which can fully reflect the similarity between watersheds, retain the stronger relationship, and filter out the weaker relationship. In conclusion, the Louvain clustering algorithm is more advantageous than the K-means algorithm, and it can represent the hydrological connection between watersheds more clearly and provide a deeper explanation through the network indicators.
Watershed characteristics of different communities
Distribution of basic characteristics of communities in each basin (Note: ‘high’, ‘relatively high’, ‘medium’, ‘relatively low’, ‘low’ in the figure represent the distribution of the indicator values of the community watershed among all the sub-watersheds).
Distribution of basic characteristics of communities in each basin (Note: ‘high’, ‘relatively high’, ‘medium’, ‘relatively low’, ‘low’ in the figure represent the distribution of the indicator values of the community watershed among all the sub-watersheds).
Communities 2 and 3 exhibit only a few differences. Both are in hilly areas with higher elevations and slopes; their soil types are dominated by sandy loam and sandy clay loam; in terms of land use, both are dominated by arable land with a small amount of woodland and grassland. However, the degree of vegetation cover in Community 2 is slightly lower than in Community 3. Therefore, Community 2 has a higher CN value, with much weaker water permeability, making it more susceptible to flooding.
Community 4 and Community 5 are relatively similar geographically. Although both are located in hilly areas with low elevations and gentle slopes, the elevation and slopes of Community 5 are in general slightly lower than those of Community 4. In terms of land use types, both Communities 4 and 5 are predominantly arable land and townships, with a relatively small proportion of forested and grassed areas. In terms of soil types, Community 4 is predominantly sandy clay loam, sandy loam, and clay loam, whereas Community 5 is predominantly sandy clay loam and clay loam, with a relatively weak permeability. Because of the large proportion of township land in the two communities, the flooding damage within their boundaries would be severe in the event of extreme rainfall.
Community 6 is at a low elevation and comprises low-slope river valleys, lakesides, sea inlets, and other types of plains. The shape factor of its watershed is generally large, and runoff is slow to change. The community is dominated by towns and cropland, with a larger CN and a larger percentage of town area. It is not prone to flash floods, but people living here should be alert to the occurrence of inundation.
Analysis of network indicators
Network averaging
The average degree of the sub-networks generated by each large community is illustrated in Table 2. Community 5 has the lowest average degree of the network, which indicates that there are fewer connections between nodes in this community. This means that the homogeneity of the conditions of the watersheds in terms of runoff generation and concentration is low. Two factors may explain this result. First, the sub-watersheds in Community 5 belong to the transition area of the hilly plains, which has a relatively complex geography. Second, this paper introduced the weights of the edges, making the connections between some nodes stronger and reducing those between other nodes, thus leading to a decrease in the average degree of the sub-network.
Average degree of the sub networks induced in each large community
Community . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . |
---|---|---|---|---|---|---|
Average degree of community sub-network | 7.23 | 7.9 | 6.7 | 9.5 | 6.0 | 11.4 |
Community . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . |
---|---|---|---|---|---|---|
Average degree of community sub-network | 7.23 | 7.9 | 6.7 | 9.5 | 6.0 | 11.4 |
Community 6 has the highest average degree of the network, which may be because it is located in plain-like areas such as river valleys, lakesides, and estuaries, with relatively simple topographic structures and higher homogeneity of catchment runoff generation and concentration conditions between watersheds. Consequently, there are more pairs of basins with similar catchment runoff generation and concentration.
Centrality
Large community centrality (olive green indicates the watershed with the highest degree, red indicates the watershed with the highest betweenness, and orange indicates that the watershed has the highest centrality of both kinds).
Large community centrality (olive green indicates the watershed with the highest degree, red indicates the watershed with the highest betweenness, and orange indicates that the watershed has the highest centrality of both kinds).
k-Core
Rationality analysis based on flood peak modulus




Flood peak modulus results: (a) flood peak modulus distribution in sub-catchments and (b) distribution of flood peak modulus by sub-basin within six large communities.
Flood peak modulus results: (a) flood peak modulus distribution in sub-catchments and (b) distribution of flood peak modulus by sub-basin within six large communities.
Figure 6(b) is a scatter plot of the distribution of flood peak modulus in each small watershed within the six large communities. It can be seen that there is a certain degree of aggregation of flood peak modulus in small watersheds within the same community. The average slope of the watershed, the specific drop of the river channel, and the rainfall are the main indicators affecting the flood peak modulus; therefore, from Community 1 to Community 6, the flood peak modulus in small watersheds shows a declining trend with the decrease of the slope and the specific drop. There are some small watersheds with similar flood moduli that belong to different communities, e.g., some members of Communities 1, 2, and 3. This is because the selected similarity indexes defining the communities also include CN value, vegetation index, river curvature, and river density, which have little correlation with the indicators influencing flood peak modulus.
CONCLUSION
In this paper, 545 typical sub-watersheds in hilly areas of Shandong Province were classified using the Louvain community-detection algorithm. The hydrological significance of the three network metrics was further explored, and a rationality analysis was carried out based on the computed sub-watershed flood peak modulus. The main conclusions drawn were as follows:
(a) The 545 watersheds were divided into a total of 6 large communities and 9 small communities, with the small communities showing a high degree of uniqueness. The various communities differed considerably in certain characteristics. Therefore, this method can be used to identify the similarity of runoff generation and concentration in small watersheds and provides a new tool for similarity analysis studies in other small watersheds of the same type.
(b) The network-averaging results reflected the degree of homogeneity within the same community; the centrality measures allowed for the selection of watersheds that were representative of the characteristics of the community as a whole; and the k-core helped to identify the extent of the areas in a community with the most similar conditions for the runoff generation and concentration. This facilitates the implementation of flood risk management strategies for watersheds.
(c) The next step could be to focus on the application of these results, such as the parameter transplantation of ungauged watersheds. The degree of influence of the different indicators on the flow production and catchment processes in the various watersheds should also be considered and used as a basis for setting the weighting coefficients of the indicators.
ACKNOWLEDGEMENTS
The authors wish to gratefully acknowledge the financial assistance from the Natural Science Foundation of Shandong Province (ZR2020ME249), National Natural Science Foundation of China (42301046), and the other anonymous reviewer whose comments significantly improved the quality of this paper.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.