ABSTRACT
To address the spatial heterogeneity of ecohydrological features in the Ziya River Basin, this study constructed a regional index system based on the entropy weight method and proposed a probability-density-driven Gaussian mixture model (GMM) clustering method. Using the 1 km × 1 km grid of the Ziya River Basin as the calculation unit, indicators related to underlying surface characteristics, human activities, and hydrological and ecological features were selected, and the GMM clustering method was used to calculate the posterior probability values for each unit. The weights of the indicators are as follows: elevation 0.263, slope 0.218, precipitation 0.053, temperature 0.048, relative humidity 0.055, runoff 0.224, normalised difference vegetation index (NDVI) 0.030, and gross domestic product (GDP) per capita 0.108. Based on the posterior probabilities, the basin is divided into five ecohydrological regions. The zoning results reflect the similarities and differences in the ecohydrological situation of the basin. In the Kruskal–Wallis test, all eight indicators gave p = 0.000 < 0.05, indicating that the distribution of each indicator differs among the region types. This study provides a minimum management unit for basin water resource management departments, which can help conserve resources and improve social benefits.
HIGHLIGHT
In this paper, the Gaussian mixture model clustering method based on probability density was used to calculate the posterior probability values for each unit, which utilised the 1 km × 1 km grid of the Ziya River Basin as the calculating unit and selected indicators related to underlying surface characteristics, human activities, and hydrological and ecological characteristics.
INTRODUCTION
In recent decades, problems such as water shortages, the uneven distribution of water resources, and poor ecological conditions have become serious (Noori et al. 2021; Naderian et al. 2024; Yan et al. 2022). Understanding the differences in the ecology and spatial distribution of water resources across an entire watershed is a prerequisite for ecological and water resource management. The spatial distribution of ecological and hydrological elements in a watershed varies considerably, and observations at a single point cannot represent a whole area, which poses great difficulties for current research. To comprehensively understand the distribution patterns of water resources in the region, it is necessary to conduct ecohydrological zoning, clarify the scope of each region, and provide a theoretical basis for developing ‘adaptive’ water resource management programmes and measures in each zone (Richter et al. 1997).
Ecohydrological zoning, rooted in an objective analysis of river basin ecohydrology, elucidates the similarities and differences within ecohydrological systems and assesses the effects of human activities on these processes (Bai et al. 2023). It is also the basis for developing ecological protection strategies, the optimal allocation of water resources, and minimum management units. Currently, ecohydrological zoning studies divide the study area according to the area controlled by the region, watershed, or hydrological station. Most division methods adopt the traditional clustering method, which judges similarities and differences based on distance before dividing the different ecohydrological zones. For example, Mosley (1981) divided 90 regions on the North Island and 84 regions on the South Island of New Zealand into five types of zoning using a clustering method based on regional ecological and water resource characteristics. Wolock et al. (2004) divided the 50 states of the United States into 43,931 sub-watersheds, each approximately 200 km2 in area, and partitioned them using a geographic information system and the traditional clustering method. Nathan & McMahon (1990) divided southern Australia into 184 units with unit areas ranging from 1 to 250 km2. They used a combination of methods based on cluster analysis, multiple regression, and principal component analysis to make zonal predictions. Feng et al. (2013) divided the Haihe River Basin into 1,399 sub-watersheds using remote sensing data and spatial distribution maps of climatic features and used principal component analysis and K-means clustering to partition the sub-basins into hydrological zones. Audebert et al. (2024) used the traditional clustering method to construct a framework for global ecological zone classification with biological temperature, annual precipitation, and potential evapotranspiration rate as the core indexes.
The above studies have the following problems: (1) The division unit is large: at present, the ‘surface’ management with a wide range, uniformity, and large scale can no longer meet the needs of the management department and the society, and it is time-consuming and costly. Point-to-point, refined, and personalised management will become a management method highly recommended by today's water resource management departments. For example, grid-based urban management, mainly through collecting information from each district, can achieve point-to-point, accurate management. (2) The partitioning method is somewhat limited; the current study mainly focused on the traditional clustering method, which is unsuitable for all watersheds. Its flexibility is low and can only fit data with circular clusters, which is quite different from the actual data distribution. The output result is relatively rigid and cannot output the probability value, which belongs to ‘hard clustering’.
Based on the concept of urban grid management, this study collected the data of each grid that affected ecological and hydrological processes, integrated the information, and partitioned the integrated data of each grid. Management tools and methods can then be tailored to each grid according to local conditions, enabling personalised management and improving management efficiency. The Gaussian mixture model (GMM) provides a finer grained analysis by assigning a posterior probability to each sample. Combining multiple Gaussian distributions allows the GMM to fit data distributions of any shape. It is widely used in the power industry, testing industry, and other research fields; however, it has not yet been applied to ecohydrological zoning. In addition, the classical method of significance analysis is analysis of variance (ANOVA); however, ANOVA assumes that the data follow a normal distribution, whereas the actual data do not necessarily do so. Therefore, this study chooses a test that does not require the assumption of a normal distribution, that is, the Kruskal–Wallis (K–W) test.
This study comprehensively considers the intrinsic connections within ecological and hydrological systems, grounded in their mutual feedback mechanisms. It integrates the entire chain of ecological, meteorological, hydrological, and human activity factors. The problems of low accuracy, limited adaptability, and large segmentation units in current segmentation methods are solved. This research is conducted based on the ecological environment and hydrological characteristics of the Ziya River Basin, which are influenced by human activities. This study introduces clustering and Gaussian distribution theory from statistics into the analysis of ecological and hydrological zoning. The results provide a theoretical foundation and scientific basis for managing the ecological environment and water resources.
STUDY AREA AND DATA SOURCES
Overview of the study area
The Ziya River is a tributary of the southern Haihe River Basin. There are two major tributaries in the upper reaches: the Hutuo River (north) and Fuyang River (south). The river basin starts from Taihang Mountains in the west, extends to the Bohai Sea in the east, and reaches the Zhangwei River in the south. It spans Shanxi and Hebei provinces, as well as Tianjin City, covering a geographical area between latitude 36°03′N and 39°35′N and longitude 112°20′E and 117°50′E, with a total watershed area of about 46,800 km2.
The Hutuo River originates in Fanshi County, Xinzhou City, Shanxi Province. It has a total length of 587 km and a watershed area of 25,200 km2. It has a narrow river channel and enters the canyon to the Gangnan Reservoir, which is relatively rich in hydraulic resources. The Fuyang River originates at the southern foot of Busan in the Fengfeng Mining District, Handan City, at the eastern foot of Taihang Mountain. It has a total length of 413 km and a watershed area of 14,800 km2, located in the Hebei Province.
The general topography of the Ziya River Basin is high in the west and low in the east, featuring mountainous terrain in the northwest of the basin, which is located in Shanxi Province. The eastern and central parts of the basin are plains located in Hebei Province. The landform is interspersed with medium and low mountains and hills. Mountainous areas account for approximately 67% of the total basin area, and plains account for approximately 33%. The water system map of the Ziya River Basin is shown in Figure 1.
Data sources
Spatial distribution map of hydrometeorological characteristic factors.
All of the above data cover the period 2010–2019; they were resampled to 1 km × 1 km in a geographic information system (GIS) and converted to the Albers projection.
METHODOLOGY
Construction of the indicator system
Selection of indicators
Indicator selection is the foundation and key to zoning, and whether the indicators are chosen reasonably has a decisive impact on the zoning results. Therefore, the construction of an ecohydrological zoning index system must follow certain guiding principles (Yin 2006). The selection principles are as follows:
(1) Completeness: The selected indicators should reflect the ecological, spatial, and hydrological characteristics of the watershed as much as possible, including topography, ecological environment, hydrometeorology, and human activities; (2) ease of accessibility: the selection of indicators should consider the difficulty of collecting information; and (3) representativeness: the selected indicators should be characteristic of the watershed. The eight ecohydrological zoning indicators selected in this study were elevation, slope, NDVI, precipitation, air temperature, relative humidity, runoff, and per capita GDP.
(1) Topographic features: The topographic elevation and average slope values of the watershed directly affect the growth and distribution of vegetation and the rainfall production and catchment processes. The elevation and slope data for each grid were extracted as topographic feature values from the DEM remote-sensing image map.
(2) Meteorological characteristics: Precipitation is the most important source of runoff, and the relative humidity and average temperature can reflect the degree of evaporation. The above climatic characteristics directly affect vegetation growth; therefore, this study considered precipitation, relative humidity, and average air temperature as the characteristic values of climate to reflect the wet and dry degrees of the basin climate. Based on the spatial interpolation of the measured data, the precipitation, relative humidity, and mean temperature on the spatial grid were inferred as meteorological eigenvalues.
(3) Vegetation characteristics: The NDVI directly reflects the degree of vegetation cover, which affects the hydrological processes of the basin, such as runoff generation and concentration, and is the most important factor affecting its ecological and hydrological processes. The NDVI value is the multiyear average value for each grid.
(4) Hydrological characteristics: The runoff process includes runoff generation and concentration, which are jointly affected by precipitation, vegetation, topography, the underlying surface, and other factors; therefore, runoff is the output of many factors acting together and reflects the entire ecohydrological process.
(5) Human activities: Generally, places with high per capita GDP have more human activities and a greater impact on the ecological environment. Most ecosystem problems caused by water shortages and the degradation of the water environment are due to excessive human development and utilisation (Cai et al. 2010). GDP per capita is an important factor affecting differences in ecohydrological characteristics. This study used GDP per capita to represent the strength of human activities.
Indicator weighting
Indicator weighting refers to assigning each indicator a weight according to its degree of importance. The entropy method (Agarwal et al. 2016) was used to determine the degree of dispersion of the indicators. The higher the degree of dispersion, the greater the weight. In this study, the entropy weight method is used to assign weights to the eight selected indicators; it determines the weights from the information in the indicator data themselves, thus avoiding the interference of subjectivity. Replacing the commonly used subjective weighting method with this objective method makes the evaluation results more authentic and credible.
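The weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration of the commonly used form of the entropy weight method, not necessarily the exact variant applied in this study; the function name `entropy_weights` and the small `sample` table are hypothetical, and min-max scaling before computing the shares is an assumption.

```python
import math


def entropy_weights(rows):
    """Entropy weights for the columns of a table (rows = units, columns = indicators)."""
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0                      # avoid 0/0 for a constant column
        scaled = [(v - lo) / span for v in column]   # min-max scaling (assumed)
        total = sum(scaled)
        if total == 0.0:
            entropy = 1.0                            # constant column carries no information
        else:
            shares = [s / total for s in scaled]
            entropy = -sum(p * math.log(p) for p in shares if p > 0.0) / math.log(n)
        divergences.append(1.0 - entropy)            # more dispersion -> larger value
    total_divergence = sum(divergences)              # zero only if every column is constant
    return [d / total_divergence for d in divergences]


# Hypothetical data: column 0 is highly concentrated, column 1 is evenly spread,
# column 2 is constant.
sample = [
    (0.0, 1.0, 5.0),
    (0.0, 2.0, 5.0),
    (0.0, 3.0, 5.0),
    (1.0, 4.0, 5.0),
]
weights = entropy_weights(sample)
print([round(w, 3) for w in weights])
```

The weights sum to one, the constant column receives zero weight, and the column whose values are most unevenly distributed receives the largest weight.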
Ecohydrological zoning
Gaussian mixture clustering principle
GMM clustering models the data as a linear combination of multiple normal distribution functions and is an unsupervised clustering algorithm (Iwata 2023; Jiang et al. 2023). The GMM is a probabilistic model that can be learned quickly by maximising the likelihood function with the expectation–maximisation (EM) algorithm; it constructs the best-fitting mixture of multidimensional Gaussian distributions for the input data set (VanderPlas 2016). The EM algorithm was first proposed by Dempster et al. (1977). Its core contribution is to decompose a complex maximum likelihood estimation into alternating expectation (E) steps and parameter optimisation (M) steps, providing a general iterative framework for hidden variable models.



The EM algorithm consists of the following two steps:
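In the standard textbook formulation, the mixture density and the two EM steps can be written as follows. The notation is an assumption made here for illustration and may differ from the authors' own: $\pi_k$, $\mu_k$, and $\Sigma_k$ are the weight, mean, and covariance of component $k$, $\gamma_{ik}$ is the posterior probability that sample $x_i$ belongs to component $k$, and $N$ is the number of samples.

```latex
% Mixture density with K Gaussian components
p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1

% E step: posterior probability that sample x_i belongs to component k
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M step: update the parameters from the posteriors
N_k = \sum_{i=1}^{N} \gamma_{ik}, \quad
\pi_k = \frac{N_k}{N}, \quad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, x_i, \quad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}
```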
The Gaussian distribution function is isotropic, that is, the covariance matrix is diagonal, and the simplified model is adapted to partitions of different shapes and sizes.
The specific steps of the GMM clustering algorithm are as follows:
(1) Initialise the parameters: Set the number of Gaussian distribution functions, K, to five, the maximum number of iterations to 100, and the initial parameters to randomly selected sample values. K was set to five because the scatter plot shows that the data divide naturally into five distinct clusters and the silhouette coefficient is highest when the number of classes is five.
(2) Step E is performed to calculate the posterior probability that each sample belongs to a partition.
(3) Step M is performed to update the parameters according to the posterior probability.
(4) Determine whether the maximum number of iterations has been reached or if the change in the results obtained before and after the two iterations is less than the set error; if either condition is met, stop the iteration.
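The four steps above can be sketched as a small pure-Python routine. This is an illustrative sketch rather than the implementation used in the study: it assumes diagonal covariance matrices, uses the change in log-likelihood as the stopping quantity, and is run on a hypothetical two-cluster toy data set with K = 2 instead of the eight indicators and K = 5. The names `fit_gmm`, `log_density`, and `init_means` are hypothetical.

```python
import math
import random


def log_density(x, mean, var):
    """Log of a Gaussian density with diagonal covariance (per-dimension variances)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_gmm(data, k, init_means=None, max_iter=100, tol=1e-6, seed=0):
    """Fit a diagonal-covariance GMM by EM; return (weights, means, variances, posteriors)."""
    n, d = len(data), len(data[0])
    # (1) Initialise: means from sample values, shared variances, equal weights.
    if init_means is None:
        init_means = random.Random(seed).sample(data, k)
    means = [list(m) for m in init_means]
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, 1e-6) for j in range(d)]
    variances = [spread[:] for _ in range(k)]
    weights = [1.0 / k] * k
    previous = -math.inf
    posteriors = []
    for _ in range(max_iter):
        # (2) E step: posterior probability of each component for each sample.
        posteriors, log_likelihood = [], 0.0
        for x in data:
            logs = [math.log(weights[c]) + log_density(x, means[c], variances[c])
                    for c in range(k)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            posteriors.append([math.exp(v - norm) for v in logs])
            log_likelihood += norm
        # (3) M step: update weights, means, and variances from the posteriors.
        for c in range(k):
            nk = max(sum(p[c] for p in posteriors), 1e-12)
            weights[c] = max(nk / n, 1e-12)
            for j in range(d):
                means[c][j] = sum(p[c] * x[j] for p, x in zip(posteriors, data)) / nk
                variances[c][j] = max(
                    sum(p[c] * (x[j] - means[c][j]) ** 2
                        for p, x in zip(posteriors, data)) / nk,
                    1e-6,  # floor keeps a component from collapsing onto one point
                )
        # (4) Stop when the log-likelihood change falls below the tolerance.
        if abs(log_likelihood - previous) < tol:
            break
        previous = log_likelihood
    return weights, means, variances, posteriors


# Hypothetical toy data: two well-separated groups of four points each.
toy = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2)]
_, _, _, post = fit_gmm(toy, k=2, init_means=[toy[0], toy[4]])
labels = [max(range(2), key=lambda c: p[c]) for p in post]
print(labels)
```

Each row of `post` is the soft assignment of one sample, and taking the most probable component per row gives the hard partition label. Fixing `init_means` here only makes the toy run deterministic; leaving it as `None` draws random samples as in step (1).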
Entropy weight method principle
K–W test principle
The K–W test is a nonparametric statistical method (Gao & Chen 2011) for testing whether the medians of three or more independent samples differ significantly. It is a rank-based, nonparametric counterpart of one-way ANOVA. The basic idea of the K–W test is to pool all the data, rank the pooled data, and calculate the rank sum of each sample; the rank sums are then used to determine whether the medians differ significantly.
(2) Calculate the p-value and make a decision based on the K–W test statistic.
Generally, the null hypothesis is that there is no significant difference between the data medians. If p < α (usually 0.05), the null hypothesis is rejected; if p > α, the null hypothesis is accepted.
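The ranking idea behind the test can be illustrated with a short pure-Python sketch. It computes the usual H statistic with average ranks and the standard tie correction, and a chi-square p-value using a closed form that only holds for an even number of degrees of freedom (for example, df = 4 for five partitions). The function names and the three small groups are hypothetical, and this is not the software used in the study.

```python
import math


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j for tied values
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(values) for r, values in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)  # zero only if all values are identical
    return h / correction


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical example: three groups, so df = 3 - 1 = 2.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_h(groups)
p_value = chi2_sf_even(h_stat, df=len(groups) - 1)
print(round(h_stat, 3), round(p_value, 4))
```

For these toy groups the rank sums are 6, 15, and 24, giving H = 7.2 and a p-value below 0.05, so the null hypothesis of equal medians would be rejected.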
Zoning steps
To better understand the ecohydrological situation of the Ziya River Basin and grasp the spatial distribution characteristics of ecohydrology in the basin, a 1 km × 1 km grid was used as the division unit, and a geographic information system and Gaussian clustering algorithm were used to realise the division of ecohydrological zones and draw a map of the basin's ecohydrological zones. The specific steps are:
(1) Division unit: The Ziya River Basin covers an area of approximately 46,800 km2. We divide it into 47,890 grids of 1 km × 1 km, which can enhance the accuracy of partitioning, improve the efficiency of subsequent management, and reduce the risk of fragmentation.
(2) Assignment unit: Using the above grid as the assignment unit, select the indicators that comprehensively reflect the characteristics of the underlying surface, human activities, and hydrological and ecological features. Interpolate the multiyear average data of each indicator to each grid and assign eight groups of eigenvalues to the 47,890 grids.
(3) Normalisation: The indicators have different units and magnitudes and cannot be compared directly. Min-max normalisation is therefore applied to remove the units and create dimensionless data for subsequent processing.
(4) Seeking weights: Based on the differences and similarities in the ecological hydrology of each grid, the entropy weight method was used to determine the objective weights of the indicators.
(5) Gaussian mixture clustering partitioning: Calculate the parameter values for steps E and M of the Gaussian mixture algorithm and iterate until the convergence condition is reached. According to the output of the Gaussian mixture clustering algorithm, the study area was divided into several ecohydrological partitions, and the probability of each sample belonging to each partition was output to determine the category of each division unit and draw the partition map.
(6) K–W test: According to the principle of the K–W test, determine whether the distribution of each eigenvalue differs significantly among the partitions. A further pairwise test was conducted to determine the differences between groups.
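As a small illustration of the normalisation in step (3), the sketch below rescales one indicator to the range 0 to 1. The function name `min_max_normalise` is hypothetical, and the three elevation values are simply the partition means for Types 2, 4, and 0 reported later in Table 3, reused here as example input rather than as grid data.

```python
def min_max_normalise(column):
    """Rescale a list of values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Example input in metres (illustrative values only).
elevation = [39.23, 780.31, 1236.70]
normalised = min_max_normalise(elevation)
print([round(v, 3) for v in normalised])
```

After this step every indicator is dimensionless and spans the same range, so indicators measured in metres, millimetres, or yuan can be combined in one feature vector.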
RESULTS AND DISCUSSION
Weighting of indicators
The eight selected indicators were assigned weights according to the entropy value method, and the weighting results are listed in Table 1.
Results of the entropy weight method
Indicator | Elevation | Slope | Precipitation | Temperature | Humidity | Runoff | NDVI | GDP per capita |
---|---|---|---|---|---|---|---|---|
Weight | 0.263 | 0.218 | 0.053 | 0.048 | 0.055 | 0.224 | 0.030 | 0.108 |
Partitioning results and characterisation
Line chart of posterior probability values for each classification model.
Mean values of posterior probabilities for each classification
Type | Type 0 | Type 1 | Type 2 | Type 3 | Type 4 |
---|---|---|---|---|---|
Type 0 | 0.98 | 0.00 | 0.00 | 0.01 | 0.01 |
Type 1 | 0.00 | 0.94 | 0.04 | 0.00 | 0.02 |
Type 2 | 0.00 | 0.02 | 0.97 | 0.00 | 0.00 |
Type 3 | 0.01 | 0.00 | 0.00 | 0.95 | 0.03 |
Type 4 | 0.00 | 0.00 | 0.00 | 0.01 | 0.98 |
Results of ecohydrological zoning by the Gaussian mixed clustering method.
Based on the ecohydrological zoning map of the Ziya River Basin, the ecohydrological characteristics and spatial distribution of the watershed in each zoning area are as follows (Table 3):
(1) The Type 0 ecohydrological subarea is located in the north-western part of the Ziya River Basin. Its average elevation is the highest of all types, at 1,236.7 m, and its average slope is relatively steep at 11.3%, so it is a mountainous area. Elevation is concentrated between 1,000 and 2,000 m, and the area within this range accounts for 61.1% of the total area. This subarea has few human activities, a lower GDP, low precipitation of 480.1 mm, and a low temperature of 8.64 °C. It is located on the leeward slope of the Taihang Mountains, where vegetation coverage is low, with an NDVI of only 0.71. Consequently, the interception effect of vegetation is minimal, and the average daily runoff is the lowest of all types, at only 0.74 mm. The area of Type 0 is approximately 7,200 km2, accounting for 15.42% of the entire Ziya River Basin.
(2) The Type 1 ecohydrological subarea is located in the southern part of the Ziya River Basin. It features a lower elevation, a gentler slope, more human activities, and the highest GDP per capita of all types, at 37,600 yuan. GDP per capita is concentrated in the range of 20,000–50,000 yuan, and the area within this range accounts for 76.6% of the total area. This region has the highest temperature of 14.17 °C, relatively high precipitation of 514.4 mm, the lowest vegetation coverage (NDVI of only 0.68), less retention of runoff, and the highest average daily runoff of 13.82 mm. The area of this subarea is approximately 4,700 km2, accounting for 10.06% of the entire Ziya River Basin, which is the smallest proportion.
(3) The Type 2 ecohydrological subarea is located in the eastern part of the Ziya River Basin. It has the lowest average elevation and average slope of all types, at 39.23 m and 1.68%, respectively, and is a plain area. Slopes are concentrated between 1.5 and 2.3%, and the area within this range accounts for 53.8% of the total area. This region has the lowest average annual precipitation at 471.7 mm, but the NDVI is relatively high at 0.76, and the average daily runoff is fairly high at 11.8 mm. The area of this subarea is approximately 13,200 km2, accounting for 28.27% of the entire Ziya River Basin.
(4) The Type 3 ecohydrological subarea is located in the west-central part of the Ziya River Basin, with a high average elevation of 1,186.64 m and the largest average slope of all types, at 19.71%. There are few human activities, and climatic conditions such as precipitation and temperature are favourable. This region is located on the windward slope of the Taihang Mountains and has a relatively humid climate. However, its vegetation coverage is the highest of all types, so the vegetation interception effect is large and the runoff is only 2.33 mm. Runoff is concentrated in the range of 1–5.5 mm, and the area within this range accounts for 53.3% of the total area. The area of this subarea is approximately 6,000 km2, accounting for 12.85% of the entire Ziya River Basin.
(5) The Type 4 ecohydrological subarea is located in the middle of the Ziya River Basin and is a transition zone between the mountainous and plain areas. The average elevation is 780.31 m, the average slope is 12.66%, and the average annual precipitation is the largest of all types, at 521.13 mm. Precipitation is concentrated between 500 and 550 mm, and the area within this range accounts for 66.0% of the total area. Although precipitation is high, vegetation coverage is also fairly high (NDVI of 0.75), so the vegetation retention effect is significant. In addition, there are many human activities in this area; therefore, the average daily runoff, at 9.72 mm, is not the highest. The area of this subarea is approximately 15,600 km2, accounting for 33.40% of the entire Ziya River Basin.
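The area shares quoted for the five types can be checked with a few lines of arithmetic. The sketch below assumes that the tabulated areas of 0.72, 0.47, 1.32, 0.60, and 1.56 are in units of 10^4 km2, which is the reading under which they sum to roughly the 46,800 km2 basin area; the variable names are hypothetical.

```python
# Partition areas in km2, reading the tabulated values as multiples of 10^4 km2 (assumption).
areas_km2 = {
    "Type 0": 7200,
    "Type 1": 4700,
    "Type 2": 13200,
    "Type 3": 6000,
    "Type 4": 15600,
}

total_km2 = sum(areas_km2.values())
shares = {name: round(100.0 * area / total_km2, 2) for name, area in areas_km2.items()}

print(total_km2)
for name, share in shares.items():
    print(name, share)
```

The rounded areas sum to 46,700 km2, and the resulting shares reproduce the percentages given in the text (15.42, 10.06, 28.27, 12.85, and 33.40%).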
Mean eigenvalues within ecohydrological subgroups
Type | Elevation (m) | Slope (%) | Precipitation (mm) | Temperature (°C) | Humidity (%) | NDVI | Runoff (mm) | GDP per capita (ten thousand yuan) | Area (10^4 km2) |
---|---|---|---|---|---|---|---|---|---|
0 | 1,236.70 | 11.30 | 480.1 | 8.64 | 0.53 | 0.71 | 0.74 | 2.05 | 0.72 |
1 | 138.02 | 2.67 | 514.4 | 14.17 | 0.57 | 0.68 | 13.82 | 3.76 | 0.47 |
2 | 39.23 | 1.68 | 471.7 | 13.64 | 0.58 | 0.76 | 11.80 | 2.90 | 1.32 |
3 | 1,186.64 | 19.71 | 512.7 | 7.63 | 0.56 | 0.79 | 2.33 | 2.79 | 0.60 |
4 | 780.31 | 12.66 | 521.1 | 12.02 | 0.55 | 0.75 | 9.72 | 3.57 | 1.56 |
Zoning reasonableness analysis
Significance test analysis
Eigenvalue | Elevation | Slope | Precipitation | Temperature | Humidity | NDVI | Runoff | GDP per capita |
---|---|---|---|---|---|---|---|---|
Test statistic | 36,374.8 | 30,669.2 | 22,463.2 | 38,021.2 | 23,979.7 | 6,912.0 | 23,283.4 | 12,630.8 |
Degrees of freedom | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Test p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Note: Significance level is 0.05.
Based on the K–W principle, the assumptions are:
H0: The median of each eigenvalue is the same for each classification.
H1: The median eigenvalue differs for each classification.
The test was evaluated using the p-value. If p < 0.05, the null hypothesis H0 was rejected, indicating that the distribution of the eigenvalue was not the same for each classification.
After testing, all eigenvalues gave p = 0.000 < 0.05, so the null hypothesis H0 was rejected, indicating that the distributions of all eight indicators differ among the types. Pairwise differences between the types were then tested, and all results gave p = 0.000 < 0.05, indicating that each eigenvalue differs significantly between partitions; this quantitatively confirms significant differences in the data between the partitions.
A comparison of the GMM clustering map with the traditional K-means clustering map (Figure 9) shows that the GMM partitions are less fragmented. A further quantitative comparison is given in Table 5: for each algorithm, 40 mean and variance values were calculated (eight indicators in each of the five partitions). For many indicators, the mean under the GMM algorithm is smaller than that under the K-means algorithm, and the degree of dispersion under the GMM algorithm is significantly lower than that under the K-means algorithm, indicating that the GMM results are more concentrated, the fluctuation within each group is small, and the data are relatively stable. This indicates that the Gaussian mixture clustering method can adapt well to research units with higher accuracy and finer divisions (Ren et al. 2020).
Mean and variance statistics for GMM algorithm and K–M algorithm under different classifications
Mean (variance) | Type 0, GMM | Type 0, K–M | Type 1, GMM | Type 1, K–M | Type 2, GMM | Type 2, K–M | Type 3, GMM | Type 3, K–M | Type 4, GMM | Type 4, K–M |
---|---|---|---|---|---|---|---|---|---|---|
Elevation | 1,236.7 (1,024,129.94) | 1,488.5 (1,177,163.83) | 138.02 (28,477.53) | 77.63 (26,239.11) | 39.23 (1,537.86) | 108.71 (68,499.32) | 1,186.64 (1,327,621.91) | 993.05 (609,337.24) | 780.31 (707,809.91) | 846.31 (619,390.42) |
Slope | 11.3 (207.85) | 21.15 (187.33) | 2.67 (10.52) | 2.26 (17.8) | 1.68 (1.47) | 2.65 (22.16) | 19.71 (213.02) | 8.73 (100.21) | 12.66 (185.42) | 16.68 (181.25) |
Precipitation | 480.14 (27,620.91) | 495.05 (28,130.57) | 514.4 (25,114.59) | 478.33 (24,947.58) | 471.7 (16,743.46) | 492.45 (28,777.42) | 512.72 (22,778.61) | 501.4 (30,437.12) | 521.13 (30,377.31) | 530.79 (29,447.86) |
Temperature | 8.64 (23.44) | 7.24 (28.53) | 14.17 (11.49) | 13.47 (19.13) | 13.64 (12.77) | 13.72 (17.44) | 7.63 (34.05) | 9.85 (27.96) | 12.02 (28.43) | 12.13 (27.48) |
Humidity | 0.53 (0.03) | 0.55 (0.03) | 0.57 (0.01) | 0.57 (0.02) | 0.58 (0.03) | 0.58 (0.03) | 0.56 (0.02) | 0.53 (0.02) | 0.55 (0.02) | 0.55 (0.01) |
NDVI | 0.71 (0.12) | 0.77 (0.1) | 0.68 (0.13) | 0.74 (0.14) | 0.76 (0.13) | 0.72 (0.13) | 0.79 (0.08) | 0.71 (0.11) | 0.75 (0.11) | 0.78 (0.09) |
Runoff | 0.74 (1.72) | 1.76 (12.23) | 13.82 (124.46) | 5.09 (37.12) | 11.8 (208.8) | 18.23 (142.39) | 2.33 (12.66) | 2.12 (20.54) | 9.72 (148.37) | 11.22 (141.65) |
GDP per capita | 2.05 (1.14) | 2.26 (5.34) | 3.76 (13.98) | 3.26 (9.08) | 2.9 (7.49) | 3.27 (12.32) | 2.79 (7.12) | 2.48 (4.52) | 3.57 (11.87) | 3.83 (11.98) |
Area | 0.72 | 0.68 | 0.47 | 0.99 | 1.32 | 1.14 | 0.6 | 0.96 | 1.56 | 0.89 |
Note: Values in parentheses are the variances of the corresponding indicator.
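The per-partition statistics in the table above are of the following kind: for each partition label, the mean and variance of one indicator over the grids assigned to that label. The sketch below shows the calculation on hypothetical values; the function name `cluster_stats` is illustrative, and the use of the population variance is an assumption, since the text does not state which variance estimator was used.

```python
from statistics import fmean, pvariance


def cluster_stats(values, labels):
    """Return {label: (mean, population variance)} of one indicator per partition."""
    grouped = {}
    for value, label in zip(values, labels):
        grouped.setdefault(label, []).append(value)
    return {label: (fmean(vs), pvariance(vs)) for label, vs in sorted(grouped.items())}


# Hypothetical indicator values for four grids and their partition labels.
values = [1.0, 3.0, 10.0, 14.0]
labels = [0, 0, 1, 1]
stats = cluster_stats(values, labels)
print(stats)
```

Running the same function on the labels produced by two different clustering algorithms gives directly comparable mean and variance pairs for each indicator.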
Because the GMM outputs probability values, it avoids hard clustering, which is an advantage over traditional clustering methods. The comparison of the partition results shows that the GMM clustering results have fewer fragmented patches, smaller variance, and higher concentration, so the method is more suitable for dividing areas into smaller units. The ecohydrological indicators selected here are similar to the index systems of Audebert et al. (2024) and Liu et al. (2021), and the index weighting method is consistent with Yin (2006). This study revealed the similarities and differences in the ecohydrological characteristics of the Ziya River Basin, which can provide a minimum management unit for the water resource management department of the Ziya River Basin and a theoretical basis for solving problems such as the transfer of hydrological data to areas of the basin with no data. In this study, the GMM was used for partitioning to improve the fitness of clustering and to enhance the robustness of the application (Lysaght & Stockwood 1996).
CONCLUSIONS
(1) In this study, eight indicators were selected from the basic geography, hydrometeorology, ecological environment, human activities, and other data of the Ziya River Basin to construct the basin ecohydrological zoning index system, and the entropy method was used to assign weights to the eight indicators. The weights were as follows: elevation 0.263, slope 0.218, precipitation 0.053, temperature 0.048, relative humidity 0.055, runoff 0.224, NDVI 0.030, and GDP per capita 0.108.
(2) Taking the 1 km × 1 km grid as the dividing unit and applying the probability-density-based Gaussian mixture model clustering algorithm, the 47,890 dividing units in the Ziya River Basin were classified into five types of ecohydrological zones.
(3) The K–W nonparametric test was applied to the zones, and a significant difference was observed in the medians of the data among the zones (p = 0.000 < 0.05). Further pairwise testing revealed significant differences between every pair of categories (p = 0.000 < 0.05), indicating that the eigenvalues differ among the zones and confirming the significance of the partitioning.
On the basis of ecohydrological zoning, ecological regions such as wetlands and water conservation forests can be identified, which are crucial for maintaining regional ecological balance. This information can inform the development of differentiated water allocation schemes. For example, water conservation and groundwater protection are emphasised in arid areas, while flood control and water quality management are emphasised in humid areas. Besides, according to the ecological and hydrological characteristics of different zones, the restoration plan should be formulated, such as afforestation and returning farmland to forest in the ecological and hydrological zones of mountainous areas with serious soil erosion. The study of ecohydrological zoning can break the traditional single-factor management and realise the ‘water–ecology–society’ coordination. It provides spatial visualisation tools that reduce the costs associated with management trial and error. This approach balances resource utilisation and ecological protection and helps the ‘double carbon’ goal and ecological civilisation construction.
Gaussian mixture clustering has relatively high computational complexity, especially when dealing with high-dimensional data or large-scale datasets. The iterative process of the EM algorithm may require significant computation time and tends to have slow convergence. Besides, Gaussian mixture clustering is quite sensitive to noise and outliers. Outliers can significantly affect the parameter estimation of the Gaussian distributions, potentially leading to inaccurate clustering results. The deficiency of the Gaussian mixture clustering method provides a direction for further research in the future.
FUNDING
The financial support for this work was provided by the National Natural Science Foundation of China (Grant no. 52239004).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.