Abstract
River water is an important source of drinking water in the Northwestern New Territories of Hong Kong, so monitoring its quality is essential for local residents. In this study, a mixed multivariate analysis method was used to lower monitoring costs by optimizing the layout of water quality monitoring stations. To this end, a five-year data set of over 36,000 observations was evaluated. Cluster analysis was used to categorize the monitoring stations into three groups. Factor analysis was used to assess three latent factors that predominantly influence river water quality: anthropogenic pollution, seawater intrusion and geological processes, and the nitrification process. A spatial pattern was plotted from the three latent factor scores, and six redundant monitoring stations were identified from this pattern. Finally, discriminant analysis was used to extract seven significant parameters. The results showed that the surface water monitoring program for the watercourses in the Northwestern New Territories (Hong Kong) could be adjusted by reducing the number of monitoring stations to 18 and the number of measured chemical parameters to seven, which would still ensure the detection of water quality while reducing cost.
HIGHLIGHTS
Studying the spatial variation of water resources with a mixed multivariate statistical method.
Performing cluster analysis for three latent factors that affect river water quality.
Establishing an evaluation system to identify redundant monitoring stations.
Extracting seven significant parameters for measuring contamination conditions.
Proposing a more reasonable monitoring station layout that ensures the detection of water quality while reducing cost.
INTRODUCTION
River water is one of the vital water resources. However, unlike groundwater and lakes, it is vulnerable to pollution: anthropogenic activities and natural processes can easily degrade the quality of surface water and impair its usability. Accordingly, regular water quality monitoring is essential to control water pollution, monitor water quality in river basins (Simeonov et al. 2003), and interpret the temporal and spatial variations in water quality (Dixon & Chiswell 1996; Singh et al. 2004). In Hong Kong, for instance, the Northwestern New Territories is one of the most polluted regions. The Hong Kong Environmental Protection Department (Hong Kong EPD) has established 24 water quality monitoring stations in the Northwestern New Territories and performs routine water quality monitoring to protect the water resource.
Multivariate statistical techniques such as cluster analysis (CA), factor analysis (FA), and discriminant analysis (DA) are commonly used to categorize raw data because they can analyze multiple related factors. For instance, they have been applied to characterize river water quality, allowing a better understanding of its temporal and spatial variations, and to identify discriminant parameters that are useful in optimizing monitoring networks (Simeonov et al. 2003; Shrestha & Kazama 2007).
Prior research has shown that multivariate statistical methods can extract meaningful information from a data set. FA and CA were applied to assess the impacts of human activities on spatial variations in the water quality of 19 rivers (Wang et al. 2007). The usefulness of CA, FA, and DA for interpreting complex data sets, assessing water quality, identifying pollution factors, and understanding temporal and spatial variations has been demonstrated for effective river water quality management (Kowalkowski et al. 2006; Shrestha & Kazama 2007; Venkastesharaju et al. 2010; Juahir et al. 2011; Samsudin et al. 2011; Wang et al. 2012). CA and DA were used to assess temporal and spatial variations in water quality, and DA extracted the significant parameters responsible for most variations in river water quality (Zhou et al. 2007). CA, FA, and DA were used to reduce the number of monitoring stations and measured chemical parameters in order to lower monitoring costs (Wang et al. 2014). Moreover, multivariate statistical techniques have been applied to assess groundwater and lake water (Yang et al. 2010; Lu et al. 2012). The CA, FA, and DA methods used in the current study have thus been shown to be highly useful tools for extracting key information from complex water quality data sets.
In the present study, large data sets obtained during a five-year (2009–2013) monitoring program were subjected to CA, FA, and DA to extract latent information about the similarities or dissimilarities among the monitoring sites. Isaac & Siddiqui (2022) applied the water quality index (WQI) and multivariate statistical techniques to fourteen water quality parameters collected quarterly for one year in Agra, Uttar Pradesh, India. However, reported work suggests that sampling four times a year is not enough to capture the actual situation well. Considering the influences of temporal differences, we focus on determining redundant monitoring stations and identifying the water quality variables responsible for spatial variations in water quality. The WQI has been applied to categorize water quality, which is useful for conveying the quality of water to the people and policy makers in the area concerned (Ghoderao et al. 2022). We also focus on testing the validity of the results in spatial DA. Previous research has shown that spatial landscape patterns produce differences in the concentrations of TCB and TP, and that these concentrations can serve, to some degree, as main indicators of water quality in the recipient rivers and streams (Chang et al. 2022).
MATERIAL
Monitoring area and sampling
The Indus River is one of the largest rivers in the area, with a total length of about 49 km and a catchment area of 43 km². The Beas River, a major branch of the Indus, flows from Lam Tsuen Country Park and covers an area of 20 km². The Ganges River, which originates in Wo Ken Shan, has a smaller area of 10 km². Yuen Long Creek is around 60 km long and covers an area of 27 km². With an area of 44.3 km², the 50-km long Kam Tin River passes through the urban areas of Kam Tin and Yuen Long. All these four watercourses in the Yuen Long Basin flow into the inner Deep Bay via concrete channels.
The Hong Kong EPD has collected water quality data from 24 monitoring sites, covering a wide range of the 13 inland watercourses.
Monitored parameters and data pretreatment
The data for 24 water quality monitoring sites, consisting of 48 water quality parameters monitored monthly over five years (2009–2013), were obtained from the Hong Kong EPD (2009–2013). Of these 48 parameters, 25 were selected for the present analysis based on their sampling continuity at all the selected monitoring sites. The selected parameters included electrical conductivity (EC), pH, dissolved oxygen (DO), temperature (TEMP), chemical oxygen demand (COD), five-day biochemical oxygen demand (BOD5), ammonia-nitrogen, total Kjeldahl nitrogen (TKN), nitrate nitrogen, total phosphorus (TP), Escherichia coli (E. coli), fecal coliforms (F. coli), total solids (TS), total suspended solids (TSS), sulfide, fluoride (F), arsenic (As), aluminum (Al), iron (Fe), copper (Cu), chromium (Cr), manganese (Mn), lead (Pb), nickel (Ni), and zinc (Zn). All the water quality parameters are expressed in milligram/liter, except pH, EC (μS·cm−1), TEMP (°C), E. coli (cfu/100 ml) and F. coli (cfu/100 ml). The basic statistics of the five-year data set (36,000 observations) on river water quality are summarized in Table 1.
Table 1. Descriptive statistics of the water quality parameters
Parameters | Mean | SD | SE | Minimum | Maximum |
---|---|---|---|---|---|
TEMP | 24.77 | 4.82 | 0.13 | 12.20 | 37.10 |
pH | 7.54 | 0.47 | 0.01 | 6.30 | 10.10 |
EC | 782.35 | 2,630.91 | 69.33 | 22.00 | 29,410.00 |
TS | 580.65 | 1,932.69 | 50.93 | 30.00 | 24,000.00 |
TSS | 30.84 | 80.87 | 2.13 | 0.25 | 1,100.00 |
DO | 7.22 | 2.34 | 0.06 | 1.50 | 18.60 |
COD | 19.32 | 30.27 | 0.80 | 1.00 | 420.00 |
BOD5 | 13.00 | 27.80 | 0.73 | 0.05 | 240.00 |
TKN | 4.56 | 7.55 | 0.20 | 0.03 | 100.00 |
![]() | 0.86 | 1.56 | 0.04 | 0.00 | 40.00 |
![]() | 3.43 | 6.36 | 0.17 | 0.00 | 84.00 |
TP | 0.75 | 1.14 | 0.03 | 0.01 | 12.00 |
As | 2.94 | 3.76 | 0.10 | 0.50 | 39.00 |
Sulfide | 0.03 | 0.13 | 0.00 | 0.01 | 4.02 |
F | 0.29 | 0.17 | 0.00 | 0.10 | 1.20 |
Zn | 43.83 | 77.68 | 2.05 | 5.00 | 1,500.00 |
Ni | 2.45 | 3.77 | 0.10 | 0.50 | 63.00 |
Mn | 163.14 | 255.52 | 6.73 | 5.00 | 3,100.00 |
Pb | 4.20 | 10.91 | 0.29 | 0.00 | 190.00 |
Fe | 694.51 | 907.54 | 23.92 | 25.00 | 15,000.00 |
Cr | 1.01 | 2.61 | 0.07 | 0.50 | 67.00 |
Cu | 5.20 | 10.19 | 0.27 | 0.50 | 190.00 |
Al | 222.80 | 299.91 | 7.90 | 25.00 | 3,900.00 |
Escherichia coli | 180,196.93 | 538,185.64 | 14,182.44 | 0.50 | 6,800,000.00 |
Fecal coliforms | 441,387.30 | 1,549,019.78 | 40,820.26 | 0.50 | 40,000,000.00 |
Most multivariate statistical methods require that variables conform to a normal distribution. The normality of each variable must therefore be checked, by analyzing kurtosis and skewness, before multivariate statistical analysis (Johnson & Wichern 1992; Lattin et al. 2003; Papatheodorou et al. 2006). The original data had kurtosis values ranging from −0.727 to 633.359 and skewness values ranging from −0.332 to 22.449, indicating with 95% confidence that the distributions were far from normal. Since most of the kurtosis or skewness values were greater than zero, the original data were log-transformed (Kowalkowski et al. 2006; Papatheodorou et al. 2006). After log-transformation, the kurtosis and skewness values ranged from −1.261 to 2.978 and from −1.757 to 1.18, respectively. However, the distributions of the log-transformed TS, sulfide, and Cr remained non-normal, so these parameters were excluded from the subsequent analysis. For CA, all log-transformed variables were also z-scale standardized (the mean and variance were set to zero and one, respectively) to minimize the effects of different units and variances and to render the data dimensionless (Liu et al. 2003; Singh et al. 2004).
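The pretreatment steps above can be sketched in a few lines. This is a minimal illustration only, not the authors' software: the base-10 logarithm, the population (rather than sample) standard deviation, and the `raw` concentrations are all assumptions made here for the example.

```python
import math
from statistics import mean, pstdev

def skewness(xs):
    """Moment-based skewness (population form; an assumption of this sketch)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores zero."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0

def log_transform(xs):
    """Base-10 log transform; requires strictly positive values."""
    return [math.log10(x) for x in xs]

def z_scale(xs):
    """Standardize to mean zero and unit variance."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

# Hypothetical right-skewed concentrations for one parameter
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 15.0, 40.0, 240.0]
logged = log_transform(raw)
scaled = z_scale(logged)

print("skewness raw / logged:", skewness(raw), skewness(logged))
print("kurtosis raw / logged:", excess_kurtosis(raw), excess_kurtosis(logged))
```

On data like this, the log transform pulls in the long right tail, which is why skewness drops after transformation; the z-scaled values are what a distance-based method such as CA would then consume.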
METHODOLOGY
CA is an unsupervised pattern recognition method that divides a large group of cases into smaller groups, or clusters, of relatively similar cases that are dissimilar to the cases in other groups. Hierarchical CA, the most common approach, starts with each case in a separate cluster and joins the clusters step by step until only one cluster remains (Lattin et al. 2003; McKenna 2003). The Euclidean distance usually provides an index of the similarity between two samples, and a distance can be represented by the difference between the transformed values of the samples (Otto 1998). In this study, hierarchical CA was performed on the standardized data using Ward's method with squared Euclidean distances as the measure of similarity. The method uses an analysis of variance (ANOVA) approach to calculate the distances between clusters, minimizing the sum of squares of any two possible clusters at each step. Both temporal and spatial variations in water quality were determined from hierarchical CA using the linkage distance (Wunderlin et al. 2001; Simeonov et al. 2003; Singh et al. 2004; Astel et al. 2006; Kowalkowski et al. 2006; Shrestha & Kazama 2007).
FA, another multivariate technique, yields the general relationship between measured water quality parameters by elucidating multivariate patterns that may help to simplify and classify the original data. It can be used to determine the spatial and temporal distribution of the resultant factors and to interpret them, which may yield insight into the main processes that govern the distribution of water quality parameters. First, the raw data were standardized and made dimensionless. Second, the correlation coefficient matrix, eigenvalues, and eigenvectors were determined to yield the covariance matrix. Finally, the data were transformed into factors, and only factors with eigenvalues exceeding 1 were retained in this study (Reyment & Joreskog 1993). The contribution of each factor (factor score) at each monitoring station was computed and depicted spatially.
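To make the Ward merging rule concrete, the following pure-Python sketch agglomerates clusters by always merging the pair whose union adds the least to the within-cluster sum of squares. It is an illustration under stated assumptions, not the software used in the study: the site labels, the two standardized values per site, and the function names are hypothetical.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2

def merge(a, b):
    """Union of two clusters with a size-weighted centroid."""
    na, nb = len(a["members"]), len(b["members"])
    centroid = [(na * x + nb * y) / (na + nb)
                for x, y in zip(a["centroid"], b["centroid"])]
    return {"members": a["members"] + b["members"], "centroid": centroid}

def ward_clusters(points, labels, k):
    """Agglomerate until k clusters remain; returns lists of member labels."""
    clusters = [{"members": [lab], "centroid": list(p)}
                for lab, p in zip(labels, points)]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]

# Hypothetical z-scaled values for six sites (two parameters each)
labels = ["S1", "S2", "S3", "S4", "S5", "S6"]
points = [(-1.2, -1.0), (-1.0, -1.1), (0.0, 0.1),
          (0.1, -0.1), (1.1, 1.2), (1.0, 0.9)]
groups = ward_clusters(points, labels, 3)
print(sorted(groups))
```

The naive pairwise search is quadratic per merge, which is acceptable for a couple of dozen monitoring sites; a library routine would be the practical choice for larger data sets.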
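The retention rule described above can be written compactly. The notation below is introduced here for illustration and is not taken from the source: $z_j$ is the $j$-th standardized variable, $f_k$ the $k$-th factor, $a_{jk}$ its loading, $R$ the $p \times p$ correlation matrix, and $\lambda_k$ its eigenvalues.

```latex
\begin{aligned}
z_j &= \sum_{k=1}^{m} a_{jk} f_k + e_j, \qquad j = 1, \dots, p, \\
R\, v_k &= \lambda_k v_k, \qquad \sum_{k=1}^{p} \lambda_k = p, \\
\text{retain factor } k &\iff \lambda_k > 1, \\
\text{share of variance of factor } k &= \frac{\lambda_k}{p} \times 100\%.
\end{aligned}
```

Because each standardized variable contributes one unit of variance, an eigenvalue above 1 means the factor explains more than a single original variable. As a check against the loadings table in the Results, which lists 22 parameters and a first eigenvalue of 14.41, the share is $14.41 / 22 \approx 65.5\%$, in agreement with the tabulated percentage of variance.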



In this study, DA was performed on original data using the standard and backward stepwise modes to evaluate the spatial variations in water quality. The best discriminant functions for each mode were constructed considering the quality of the classification matrix and the number of parameters.
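Once DA has produced a linear classification function for each group, a new sample is assigned to the group whose function gives the highest score. The sketch below illustrates only this assignment step; the coefficients, the constants, and the two-parameter setup are hypothetical values chosen for the example, not results from this study.

```python
def classify(sample, functions):
    """Score a sample with each group's linear classification function
    and return the best group together with all scores."""
    scores = {}
    for group, f in functions.items():
        scores[group] = f["constant"] + sum(
            coef * sample[name] for name, coef in f["coef"].items()
        )
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical classification functions for groups A, B, and C
functions = {
    "A": {"constant": -1.0, "coef": {"EC": 0.5, "TP": 0.2}},
    "B": {"constant": -3.0, "coef": {"EC": 1.0, "TP": 1.0}},
    "C": {"constant": -8.0, "coef": {"EC": 1.5, "TP": 2.5}},
}

# Hypothetical (transformed) measurements for two samples
clean_sample = {"EC": 1.0, "TP": 0.5}
polluted_sample = {"EC": 3.0, "TP": 3.0}

print(classify(clean_sample, functions))
print(classify(polluted_sample, functions))
```

In the backward stepwise mode, the same scoring applies but only the retained parameters carry coefficients, which is what allows fewer parameters to be measured in routine monitoring.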
RESULTS AND DISCUSSION
Temporal similarity and period grouping

It can be concluded that monthly data reflect local water quality better than quarterly data. Figure 2 shows that the temporal patterns of water quality were not fully consistent with either the four seasons or the dry/wet seasons. Water quality analysis therefore requires data collected at least once a month to ensure accuracy.
Spatial similarity and site grouping

Dominating water quality factor and patterns

Table 2. Rotated component loadings of the three factors with eigenvalues greater than one, together with their eigenvalues, percentage of variance, and cumulative percentage of variance in the FA
Parameters | Factor 1 | Factor 2 | Factor 3 |
---|---|---|---|
TEMP | 0.52 | 0.18 | 0.74 |
pH | 0.02 | −0.23 | 0.69 |
EC | 0.51 | 0.54 | 0.31 |
TS | 0.84 | 0.43 | −0.19 |
TSS | −0.84 | −0.28 | 0.38 |
DO | 0.83 | 0.52 | 0.11 |
COD | 0.84 | 0.49 | 0.02 |
BOD5 | 0.77 | 0.58 | 0.15 |
TKN | −0.60 | 0.21 | 0.56 |
![]() | 0.76 | 0.58 | 0.14 |
![]() | 0.68 | 0.62 | 0.20 |
TP | 0.03 | 0.87 | 0.07 |
As | 0.82 | 0.00 | 0.36 |
Sulfide | 0.84 | 0.50 | −0.03 |
F | 0.69 | 0.57 | 0.09 |
Zn | 0.25 | 0.77 | −0.33 |
Ni | 0.88 | 0.17 | −0.02 |
Mn | 0.29 | 0.84 | −0.23 |
Pb | 0.81 | 0.50 | 0.04 |
Fe | 0.89 | −0.10 | 0.22 |
Cr | 0.87 | 0.40 | 0.05 |
Cu | 0.86 | 0.44 | 0.00 |
Eigenvalue | 14.41 | 2.36 | 1.85 |
Percent of variance | 65.50 | 10.72 | 8.42 |
Cumulative variance (%) | 65.50 | 76.22 | 84.64 |
Major factor score pattern of 24 monitoring stations.
Spatial variations in water quality
Spatial DA was performed on the original data set of 22 parameters after the sites had been classified into the three major groups (A, B, and C) obtained through CA. The site groups were the dependent variable, and the measured parameters constituted the independent variables. The discriminant functions (DFs) and classification matrices (CMs) obtained from the standard and backward stepwise modes of DA are shown in Tables 3 and 4. The standard DA mode constructed DFs using 22 parameters. However, the backward stepwise DA showed that EC, , TP, As, Ni, Fe, and F. coli were the discriminant parameters in spatial variation, with 90.4% correct assignations for the three site groups (Table 5). Thus, the spatial DA results suggested that only seven parameters, i.e., EC, , TP, As, Ni, Fe, and F. coli, were needed to account for most of the expected spatial variations in water quality.
Table 3. Wilks' lambda and chi-square test of DA of spatial variation of water quality
Mode | Test of function(s) | R | Wilks' lambda | Chi-square | p level |
---|---|---|---|---|---|
Standard | 1 | 0.900 | 0.093 | 3,391.609 | 0.000 |
Standard | 2 | 0.733 | 0.560 | 828.344 | 0.000 |
Backward stepwise | 1 | 0.876 | 0.108 | 3,196.171 | 0.000 |
Backward stepwise | 2 | 0.707 | 0.607 | 716.639 | 0.000 |
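The chi-square values in this table appear consistent with Bartlett's approximation for testing Wilks' lambda, shown below as a hedged sketch. The source does not state which formula the software applied, and the inputs are assumptions made here: $N = 1{,}440$ samples (the total of the classification matrix), $p$ discriminating variables, and $g = 3$ groups.

```latex
\chi^2 \approx -\left[ N - 1 - \frac{p + g}{2} \right] \ln \Lambda
```

For the standard mode ($p = 22$, $\Lambda = 0.093$), this gives $-(1440 - 1 - 12.5)\,\ln 0.093 \approx 1426.5 \times 2.375 \approx 3{,}388$, close to the tabulated 3,391.609 given that $\Lambda$ is rounded to three decimals. For the backward stepwise mode ($p = 7$, $\Lambda = 0.108$), it gives $1434 \times 2.226 \approx 3{,}192$ against the tabulated 3,196.171.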
Table 4. Classification function coefficients for DA of spatial variation
Parameters | Standard mode: A | Standard mode: B | Standard mode: C | Backward stepwise mode: A | Backward stepwise mode: B | Backward stepwise mode: C |
---|---|---|---|---|---|---|
TEMP | 119.056 | 117.096 | 122.896 | | | |
pH | 1,890.406 | 1,926.923 | 1,921.663 | | | |
EC | 36.417 | 40.743 | 40.369 | 24.536 | 28.305 | 28.329 |
TSS | −1.068 | −2.879 | −3.841 | | | |
DO | −40.907 | −39.121 | −39.505 | | | |
COD | 2.640 | 4.041 | 5.784 | | | |
BOD5 | −21.665 | −21.704 | −21.387 | | | |
TKN | −20.251 | −21.483 | −21.823 | | | |
![]() | −1.690 | 0.037 | 0.571 | 2.636 | 4.505 | 4.836 |
![]() | 20.226 | 19.990 | 22.191 | | | |
TP | 0.851 | 6.783 | 7.771 | −14.531 | −9.120 | −5.824 |
As | −38.406 | −36.820 | −38.773 | −7.135 | −4.620 | −7.034 |
F | −89.199 | −90.628 | −89.879 | | | |
Zn | 16.902 | 16.177 | 17.093 | | | |
Ni | 25.312 | 21.205 | 27.540 | −20.065 | −25.898 | −18.528 |
Mn | 24.957 | 24.775 | 24.260 | | | |
Pb | −32.943 | −33.213 | −32.147 | | | |
Fe | 43.198 | 52.260 | 51.758 | 34.513 | 41.679 | 40.917 |
Cu | −4.672 | −3.119 | −2.583 | | | |
Al | 14.433 | 13.065 | 12.646 | | | |
E. coli | 4.340 | 4.082 | 4.096 | | | |
F. coli | 5.331 | 7.910 | 8.626 | 7.467 | 9.272 | 10.632 |
Constant | −1,061.939 | −1,122.534 | −1,127.033 | −88.898 | −14.057 | −117.496 |
Table 5. Classification matrix for DA of spatial variation
Monitoring sites | Percent correct | Assigned by DAᵃ: A | Assigned by DAᵃ: B | Assigned by DAᵃ: C |
---|---|---|---|---|
Standard mode | | | | |
A | 93.0 | 279 | 19 | 2 |
B | 86.2 | 18 | 362 | 40 |
C | 93.3 | 0 | 48 | 672 |
Total | 91.2 | 297 | 429 | 714 |
Backward stepwise mode | | | | |
A | 92.3 | 277 | 20 | 3 |
B | 85.7 | 19 | 360 | 41 |
C | 91.7 | 0 | 60 | 660 |
Total | 90.4 | 296 | 440 | 704 |
ᵃ Checked by cross-validation method.
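The "percent correct" column of a classification matrix follows directly from the counts: the diagonal entry of each row divided by the row total. The short sketch below recomputes it from the standard-mode counts in the classification matrix above; the function and variable names are ours, introduced for illustration.

```python
def percent_correct(matrix):
    """Per-group and overall percentage of correctly assigned samples.
    matrix[true_group][assigned_group] holds the sample counts."""
    per_group = {
        g: 100.0 * row[g] / sum(row.values()) for g, row in matrix.items()
    }
    n_correct = sum(row[g] for g, row in matrix.items())
    n_total = sum(sum(row.values()) for row in matrix.values())
    return per_group, 100.0 * n_correct / n_total

# Standard-mode counts: rows are true groups, columns are groups assigned by DA
standard = {
    "A": {"A": 279, "B": 19, "C": 2},
    "B": {"A": 18, "B": 362, "C": 40},
    "C": {"A": 0, "B": 48, "C": 672},
}

per_group, overall = percent_correct(standard)
print({g: round(v, 1) for g, v in per_group.items()}, round(overall, 1))
```

The row totals (300, 420, and 720 samples) sum to 1,440, and the rounded results reproduce the standard-mode percentages listed in the table (93.0, 86.2, 93.3, and 91.2 overall).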
Based on the above results, backward stepwise DA proved to be a valuable tool for recognizing the discriminant parameters in spatial variations of surface water quality. It is also essential to strengthen the monitoring accuracy of EC, , TP, As, Ni, Fe, and F. coli so that variations can be clearly identified in the future. Furthermore, compared with the other two groups, the pollution in group C was relatively serious and should be controlled.
CONCLUSIONS
In this case study, different multivariate statistical methods were used to assess spatial variations in the water quality of watercourses in the Northwestern New Territories, Hong Kong, while taking the influences of temporal differences into consideration. Hierarchical CA grouped the 12 months into two periods (the first and second periods) and classified the 24 sampling sites into three groups (A, B, and C) based on the similarity of their water quality characteristics. The temporal and spatial similarities and groupings could facilitate the design of an optimal future monitoring strategy that decreases the monitoring frequency, the number of sampling stations, and the corresponding costs for the Northwestern New Territories. Moreover, according to significance tests, DA gave good spatial results with strong discriminatory ability. DA substantially reduced the amount of data required for the three groups of monitoring sites, because it used only seven parameters (EC, , TP, As, Ni, Fe, and F. coli) for the spatial analysis and produced 90.4% correct assignations. Water quality is influenced by color, hardness, and EC. The pH values in the original data suggest that color is similar across these waters, because factories now neutralize acidic and alkaline sewage before discharge; it is therefore reasonable to exclude color. The remaining factors, hardness and EC, are those that we screened out. Therefore, DA allowed a reduction in the dimensionality of the large data set and indicated a few significant parameters responsible for large variations in water quality, which could reduce the number of sampling parameters.
This study illustrates that multivariate statistical methods are an excellent exploratory tool for interpreting complex water quality data sets and for understanding spatial variations while considering the influences of temporal differences, which makes them useful and effective for water quality management. In addition, the water quality analysis of all the monitoring stations identified the monitoring points that can be optimized. Reducing the testing cost while preserving the effectiveness of water quality testing is of economic value.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (No. 12101604).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.