With rapid urbanization and industrialization in Hubei, assessing water quality and identifying key influencing factors are crucial for lake conservation. This study utilized remote sensing and machine learning to analyze water quality parameters, including the permanganate index, total phosphorus (TP), and turbidity, in Liangzi Lake (Hubei's second-largest lake) from 2019 to 2022. A spatial quantification model and statistical analysis were employed to assess human activity intensity at different buffer scales (1,000–4,000 m) and identify key influencing factors. Results showed significant seasonal and annual variations in water quality, with the highest pollution in autumn and the lowest in winter. Pollution levels decreased from 2019 to 2020 but gradually increased from 2020 to 2022, possibly related to changes in human activities during the COVID-19 pandemic. Spatially, the Gaotang sub-lake had the highest permanganate index and TP pollution, while the Manjiang sub-lake had the highest turbidity. Precipitation and livestock density were the primary factors, accounting for 26.7–30.8% and 12.9–17.6% of water quality variation, respectively. At the 1,000 m and 2,000–4,000 m buffer zones, industrial output and population density were the dominant human activity factors influencing water quality. This study provides practical insights for targeted lake management strategies and environmental protection efforts.

  • Remote sensing and machine learning were used to investigate the lake water quality.

  • Random forest outperformed the backpropagation neural network in water quality inversion accuracy.

  • Water quality parameters exhibited significant seasonal and inter-annual variations.

  • Key factors include precipitation, industry, population, and livestock density.

  • Targeted water management strategies were proposed based on buffer zone impacts.

Lake water quality is crucial for ecological balance, water resource management, and biodiversity conservation. Accurately assessing pollution levels and identifying key influencing factors are essential for effective lake environmental protection (Yang et al. 2020; Wu et al. 2021). Water quality fluctuations are influenced by both natural factors and human activities (Li et al. 2023b; Tong et al. 2023). However, with increasing population growth and intensified land use changes, human activities have become the primary driving force. Pollution occurs through two main pathways: point source pollution (industrial and domestic wastewater discharge) and non-point source pollution (agricultural fertilizers, livestock manure, and urban runoff) (Grizzetti et al. 2008). These pollutants contribute to lake eutrophication, cyanobacterial blooms, and aquatic habitat degradation, ultimately threatening ecosystems and drinking water safety. Therefore, understanding the spatiotemporal variations in lake water quality and its responses to human activities is essential for pollution control and sustainable water resource management (de Oliveira et al. 2017).

Three methods are commonly used to study the response of lake water quality to human activities. The first method uses mechanistic models, such as Soil and Water Assessment Tool (SWAT) and AnnAGNPS (Yasarer et al. 2016), to simulate the impact of human activities on lake water quality. It is widely applied in non-point source pollution studies (Niroula et al. 2023). This method provides realistic simulation results and captures the dynamic changes in water quality. However, it requires extensive data and some key parameters are difficult to obtain, making it highly sensitive to parameter variations (Jeung et al. 2023). The second method is the pollution load method, which quantitatively evaluates the impact of point and non-point source pollution on the water environment, including applications such as equivalent standard pollution load (Zhou et al. 2020) and output coefficient models (Cai et al. 2018; Zhang et al. 2020; Tong et al. 2022). The advantage of this method is that it simplifies data acquisition and allows for an intuitive quantification of the pollutant contributions. However, it lacks mechanistic analysis of pollution sources and water quality changes, making it difficult to effectively capture dynamic changes in water quality. The third method involves calculating the human activity intensity index based on an evaluation index system. This method comprehensively considers factors such as population (Chen et al. 2016), agricultural activities, socioeconomic development (Li et al. 2023a), and land use structure (Brown et al. 2017; Kim et al. 2022), and establishes a regression model to quantify the impact of these factors on water quality. The primary advantages of this method are the convenience of data acquisition, its ability to reflect human activity impacts on lake water quality from multiple dimensions, and its suitability for large-scale regional water quality assessments. 
Although regression analysis cannot reveal causal relationships and this method neglects system complexity, it is operationally simple and integrates easily with spatial analysis techniques, making it an effective tool for assessing the relationship between lake water quality and human activities.

Acquiring water quality data is a fundamental prerequisite for studying the response of lake water quality to human activities. Real-time monitoring at designated points has become the primary method for tracking water quality changes (Huang et al. 2021; Mokarram et al. 2022). However, this method has significant limitations: although it provides data with high temporal resolution, the selection of monitoring sites is often constrained by cost and accessibility. Particularly for large water bodies, the limited number of monitoring points makes it difficult to comprehensively reflect the spatial heterogeneity of water quality (Li et al. 2022). Additionally, real-time monitoring equipment is susceptible to environmental factors (such as water currents and biological interference), which may introduce local biases in the data and result in inaccurate assessments of overall water quality conditions. In contrast, remote sensing technology significantly reduces the cost of acquiring water quality parameters while providing a comprehensive view of the target area. Through its extensive spatial coverage, it avoids water quality assessment errors caused by insufficient monitoring points. Unlike traditional field monitoring, which relies on dense sampling points and manual collection, remote sensing can achieve relatively high-precision retrieval of water quality parameters with smaller datasets, offering an efficient alternative for large-scale water quality studies. Currently, water quality parameters such as chlorophyll-a (Neil et al. 2019; Dang et al. 2023), turbidity (Sun et al. 2021; Zhou et al. 2021), and total phosphorus (TP) (Gao et al. 2015; Xiong et al. 2022) have been successfully retrieved, providing robust data support for water environment protection.

Among the sensors available, the multispectral imager on Sentinel-2 is considered the most suitable for inland water remote sensing (Kim et al. 2022). Its high spatial resolution (10–60 m) and high temporal resolution (2–5 days) offer adequate conditions for inland water quality monitoring. For the choice of inversion models, machine learning algorithms, which better capture complex nonlinear relationships among input features and are more robust to atmospheric correction errors, have become increasingly popular in inland water remote sensing. These algorithms include backpropagation (BP) neural networks, random forests, and support vector machines (Kim et al. 2022; Jiang et al. 2023). Additionally, traditional inversion methods such as physical models (e.g., Hydrolight) (Rodero et al. 2021), semi-analytical algorithms (e.g., Quasi-Analytical Algorithm (QAA)) (Chen et al. 2022a), statistical models (e.g., band ratio methods) (Ha et al. 2017), and hybrid approaches (e.g., Look-Up Table (LUT)) (Khattab & Merkel 2014) have been widely employed for remote sensing inversion of water quality parameters, each demonstrating unique advantages in different application scenarios.

Human activities exhibit distinct spatial characteristics, making population, social, and economic statistical data increasingly important in studying the relationship between humans and the natural environment. However, these data are typically aggregated by administrative units (such as cities or counties), which do not fully align with the spatial scope of geographic data. This mismatch creates difficulties in integration, making it challenging to accurately reflect the complex interactions between human activities and the natural environment (Halpern et al. 2015; Kovarik & van Beynen 2015). Spatialization of socioeconomic data can effectively resolve these issues. This process involves mapping statistical data onto spatial grid cells with a certain resolution based on the potential spatiotemporal distribution characteristics of socioeconomic data, thereby simulating the geographic distribution of these data (De Bono & Mora 2014). Spatialization methods based on land use data are commonly used for spatializing statistics such as population and gross domestic product (GDP) (Zhang et al. 2020; Zhang et al. 2022). Therefore, integrating regional population, livestock quantity, and other socioeconomic statistical data with basic geographic unit data through spatialization techniques can provide strong support for improving the accuracy of human activity quantification and also aid in the detailed analysis of the impact of human activities on lake water quality.

This study proposes a novel quantitative framework to analyze the response of water quality to human activities. The research is structured as follows: First, it employs remote sensing inversion techniques to assess water quality from a planar perspective; second, it uses an evaluation index system-based method and a spatial quantification model to evaluate human activity intensity; finally, it applies redundancy analysis (RDA) to identify the key human activity factors driving water quality changes in Liangzi Lake. This study provides a scientific basis for understanding the mechanisms by which human activities impact lake water quality, contributing to the protection and management of lake ecosystems.

This study focuses on three core modules – planar quantification of lake water quality parameters, spatial quantification of human activity intensity, and identification of water quality driving factors – to systematically reveal the impact mechanisms of human activities on lake water quality. First, in the planar quantification of water quality parameters, the study utilized Sentinel-2 multispectral remote sensing imagery, processed into Level-2A reflectance data via Sen2Cor atmospheric correction. Water quality inversion models tailored to Liangzi Lake were constructed using BP neural networks and random forest (RF) algorithms. These models learn the relationship between reflectance (from single bands and multi-band combinations) and water quality parameters, and the higher-accuracy model was selected to generate the spatiotemporal distribution of key water quality parameters in Liangzi Lake.

Second, in the spatial quantification of human activity intensity, the study selected eight indicators to reflect human activity characteristics around Liangzi Lake. Some indicators, derived from statistical panel data, were transformed into spatial data by constructing grids and integrating land use data, using spatial interpolation and weight allocation methods. The remaining indicators were directly obtained as spatial data. Subsequently, the average values of these indicators within 1,000, 2,000, 3,000, and 4,000 m buffer zones were calculated to quantify human activity intensity across different buffer ranges.

Finally, in the identification of water quality driving factors, the study integrated water quality parameters with 11 influencing factors (including human activity and meteorological factors) and employed RDA to assess their impact on water quality. Multicollinear variables were removed prior to analysis, followed by the generation of ordination plots using Canoco software. In these plots, arrow length and direction indicate the contribution and correlation of factors, respectively, enabling the identification of key driving factors (e.g., agriculture or urbanization impacts) within different buffer zones. Figure 1 illustrates the methodological framework of the study.
Figure 1

Schematic of the research route.


Study area

Liangzi Lake (30°5′N–30°18′N, 114°21′E–114°39′E) is situated across Wuhan and Ezhou cities in China, encompassing a watershed area of 3,265 km² within the Yangtze River Basin. It is the second-largest lake in Hubei Province and is listed in the Asia Wetland Conservation Directory. Liangzi (LZ) Lake has a vast lake area, typically divided into five sub-lake regions (Figure 2): Zhangqiao (ZQ) Lake, Niushan (NS) Lake, Gaotang (GT) Lake, Manjiang (MJ) Lake, and Qianjiang (QJ) Lake (Xu et al. 2018). Its catchment area is extensive, mainly encompassing rural areas with a low level of urbanization. Historically, long-term fish farming using net cages led to a decrease in the lake's self-purification capacity. Although all fish farming activities have now ceased, the lake's self-purification capacity is recovering slowly, and the natural ecological chain remains imbalanced. Additionally, as a transboundary lake, Liangzi Lake faced the issue of ‘one lake, two standards’ before 2020: the Ezhou section followed Class III water quality standards, while the Wuhan section adhered to Class II standards. This discrepancy in management and standards has increased the difficulty of managing Liangzi Lake.
Figure 2

Schematic of the study area.


In recent years, due to the development of primary industries such as aquaculture and livestock farming around the lake, as well as the growth of manufacturing and supply industries, Liangzi Lake has experienced environmental issues such as eutrophication, non-point source pollution, and a reduction in lake area. According to previous studies, the main sources of pollutants in Liangzi Lake include upstream river input, surface runoff input, and human activities (such as industrial, agricultural, and livestock activities) in buffer zones (Sun et al. 2021; Zhou et al. 2021).

Data sources

Remote sensing data

Sentinel-2 satellite data are sourced from the European Copernicus Space Program and can be freely accessed via the Copernicus Data Hub (https://scihub.copernicus.eu/). In this study, Sentinel-2 Level-1C data were downloaded and processed into Level-2A reflectance images using the Sen2Cor processor (version 2.9), with its correction effectiveness for water bodies previously validated (Sola et al. 2018; Warren et al. 2019; Kim et al. 2022). To standardize resolution, the original images were resampled to 10 m through interpolation, ensuring pixel consistency.

The study selected 44 cloud-free remote sensing images of the study area from January 2019 to February 2023 (acquisition dates detailed in Table S1), covering the spring, summer, autumn, and winter seasons of the region.

Water quality data

This study considered the following water quality parameters: dissolved oxygen (DO), turbidity (Turb), permanganate index (CODMn), ammonia nitrogen (NH3-N), TP, total nitrogen (TN), and chlorophyll-a (Chla). The data, spanning January 2019 to March 2023, were recorded by four national automatic surface water quality monitoring stations in Wuhan and obtained from the China Environmental Monitoring Center. The geographical locations of these stations are shown in the right panel of Figure 1, with detailed coordinates provided in Table S2. Water quality data were updated every four hours and were matched to the Sentinel-2 observations (overpass around 11:00 AM) by selecting measurements taken between 8:00 AM and 12:00 noon.

Socioeconomic and meteorological data

The socioeconomic data used in this study (e.g., fertilizer application rates, grain production, and livestock numbers) were primarily derived from the statistical yearbooks of Hubei Province and the province's national economic and social development bulletins. These data were aggregated at the district or county level. For indicators lacking district- or county-level data, city-wide figures were adjusted proportionally according to area. For the meteorological data, wind speed was obtained from the National Climatic Data Center (station ID: 57494, Wuhan Station), while temperature (TEM) and precipitation (PRE) data for the period 2019–2022 were sourced from the National Earth System Science Data Center (http://www.geodata.cn).

Water quality remote sensing inversion model

Single-band reflectance from Sentinel-2 imagery was used to create various band combinations (Table 1). These combinations were analyzed for correlation with measured water quality data. The calculation method is outlined in Supplementary Information S1. The 10 band combinations with the highest correlations were selected. BP neural network and RF models were then applied to link them with water quality parameters. The dataset was split randomly: 70% for training and 30% for accuracy validation. Model performance was assessed using three metrics: coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE). The more accurate inversion model was chosen to evaluate the spatiotemporal distribution of key water quality parameters in Liangzi Lake. Seasonal or annual means were calculated using all valid pixels within the respective period (a season or a year).
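The random 70/30 split and the three accuracy metrics can be illustrated with a minimal standard-library sketch. The function names, the fixed random seed, and the sample values are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption because the text does not state which definition was used.

```python
import math
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r2_score(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical measured vs. retrieved CODMn values (mg/L), for illustration only.
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 4.0, 4.5, 6.0]
print(r2_score(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```

In practice the same three functions would be applied separately to the training and validation subsets of each parameter, which is how the training-set and test-set columns of an accuracy table are usually filled.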

Table 1

Band combination modes

| Band combination | Formula |
| --- | --- |
| Single-band | bi |
| | ln(bi) |
| Two-band | bi/bj |
| | bi − bj |
| | bi + bj |
| | bi/(bi − bj) |
| | ((bi − bj)/(bi + bj))² |
| Three-band | bi/(bj + bk) |
| | bi/(bj − bk) |
| Black and Odorous Water Index (Yao et al. 2019) | (Rrs(560) − Rrs(665))/(Rrs(490) + Rrs(560) + Rrs(665)) |
| Water Cleanliness Index (Li et al. 2019) | [formula not recovered] |
| Enhanced Vegetation Index (Huete et al. 2002) | 2.5(Rrs(842) − Rrs(665))/(Rrs(842) + 6Rrs(665) − 7.5Rrs(490) + 1) |
| Chl-a Index (Ogashawara & Li 2019) | Rrs(740)/Rrs(665) − Rrs(740)/Rrs(705) |

Note: i, j, and k denote Sentinel-2 bands (i, j, k = 2, 3, 4, 5, 6, 7, 8, 8A, 9, 11, 12); λ2, λ3, and λ4 denote the central wavelengths of the second, third, and fourth bands of Sentinel-2, respectively.
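The screening step described above (build band combinations, correlate each with the measured parameter, keep the best-correlated ones) might be sketched as follows. Only the single-band, logarithm, ratio, and sum combinations are generated here; the band names, reflectance values, and the synthetic turbidity series are hypothetical, and ranking by the absolute Pearson coefficient is an assumption about how "highest correlation" was judged.

```python
import math
from itertools import combinations, permutations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def band_features(bands):
    """bands: dict mapping band name -> reflectance values (one per sample)."""
    features = {}
    for name, values in bands.items():
        features[name] = list(values)
        features[f"ln({name})"] = [math.log(v) for v in values]
    for i, j in permutations(bands, 2):  # ratios depend on the order of bands
        features[f"{i}/{j}"] = [a / b for a, b in zip(bands[i], bands[j])]
    for i, j in combinations(bands, 2):  # sums do not
        features[f"{i}+{j}"] = [a + b for a, b in zip(bands[i], bands[j])]
    return features


def top_features(bands, target, k=10):
    """Names of the k combinations with the largest absolute correlation."""
    features = band_features(bands)
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], target)),
                    reverse=True)
    return ranked[:k]


# Hypothetical reflectance samples; the target is built from b4/b3 on purpose.
bands = {"b3": [0.05, 0.06, 0.04, 0.07, 0.05],
         "b4": [0.02, 0.05, 0.03, 0.03, 0.04]}
target = [10.0 * b / a for a, b in zip(bands["b3"], bands["b4"])]
print(top_features(bands, target, k=3))
```

Because the synthetic target is a linear function of b4/b3, that ratio ranks first; with real data the ranking would be computed per water quality parameter over all eleven bands.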

The BP neural network is a multi-layer model trained by error backpropagation. It excels in nonlinear mapping, making it popular for water quality remote sensing (He et al. 2021; Chen et al. 2022b). This study used a three-layer BP neural network. The learning rate was set to 0.015 after optimization tests. Other parameter details are in Table S3. The RF model consists of multiple decision trees (Jiang et al. 2023); by combining their predictions, it improves accuracy and reduces overfitting, which makes it effective for diverse datasets. In this study, the RF model used 100 trees, a number selected experimentally. Other parameters were kept at their default values.

Construction and quantification of human activity intensity

Human activities that stress water environment systems mainly include industrial activities, agricultural activities, and human living activities. Based on literature review and expert consultation, eight indicators were selected to represent human activity intensity: grain yield per unit of cultivated land (GYUCL), fertilizer load on cultivated land (FLCL), livestock density (LD), average industrial output per unit area (AIOUA), population density (PD), road density (RD), nighttime light intensity (NLI), and impervious surface ratio (ISR).

The raw data (GYUCL, FLCL, LD, and AIOUA) are socioeconomic panel data, requiring spatial processing to reveal their distribution patterns. The processing method is as follows: First, a 30 m × 30 m grid was created within the study area, and land use data (derived from 335,709 Landsat images processed by Yang and Huang using Google Earth Engine (Yang & Huang 2021)) were integrated using the Union tool in ArcGIS. Next, the proportion of different land use types in each grid cell was calculated. Finally, these proportions were multiplied by the weights corresponding to each indicator to obtain the spatial adjustment coefficients.

The weight of each land use type depends on its correlation with human activity intensity, population distribution, production functions (e.g., grain or industrial output), and residential suitability. These factors collectively determine the weight differences in the spatialization process. The weights are obtained through methods such as literature review, expert scoring, and field surveys. The detailed information on the spatial land use weights of the above four socioeconomic indicators is shown in Table S4.

The spatial calculations for GYUCL, FLCL, LD, and AIOUA can be found in the following equations:
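$$\mathrm{GYUCL}_j = P_{1j} \times \sum_{i=1}^{n} \left( M_{1i} \times A_i \right) \tag{1}$$
$$\mathrm{FLCL}_j = P_{2j} \times \sum_{i=1}^{n} \left( M_{2i} \times A_i \right) \tag{2}$$
$$\mathrm{LD}_j = P_{3j} \times \sum_{i=1}^{n} \left( M_{3i} \times A_i \right) \tag{3}$$
$$\mathrm{AIOUA}_j = P_{4j} \times \sum_{i=1}^{n} \left( M_{4i} \times A_i \right) \tag{4}$$
where GYUCLj (t/km²), FLCLj (t/km²), LDj (heads/km²), and AIOUAj (10⁸ yuan/km²) are the GYUCL, FLCL, LD, and AIOUA of the jth grid cell, respectively; P1j (t/km²), P2j (t/km²), P3j (heads/km²), and P4j (10⁸ yuan/km²) are the GYUCL, FLCL, LD, and AIOUA in the district or county where the jth grid cell is located, respectively; M1i, M2i, M3i, and M4i are the weights of GYUCL, FLCL, LD, and AIOUA for the ith land use type (Table S4), respectively; Ai is the proportion of the area of the ith land use type within the grid cell to the total area of the grid cell; and n represents the total number of land use types, set to 6 in this study.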
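The spatialization of a district-level statistic onto grid cells can be sketched as below. The sketch assumes that the grid value equals the district value multiplied by the spatial adjustment coefficient, i.e. the weighted sum of land use proportions in the cell, as the variable definitions suggest. The land use class names, the weights, and the district value are hypothetical; the actual weights are those of Table S4.

```python
def adjustment_coefficient(area_fractions, weights):
    """Weighted sum of land use area fractions within one grid cell.

    area_fractions: dict land use type -> share of the cell area (A_i)
    weights:        dict land use type -> indicator weight (M_i)
    """
    missing = set(area_fractions) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for land use types: {sorted(missing)}")
    return sum(weights[lu] * share for lu, share in area_fractions.items())


def grid_value(district_value, area_fractions, weights):
    """Assumed form: district-level value scaled by the adjustment coefficient."""
    return district_value * adjustment_coefficient(area_fractions, weights)


# Hypothetical livestock density weights for six land use types (not Table S4).
ld_weights = {"cropland": 0.3, "forest": 0.1, "grassland": 0.5,
              "water": 0.0, "impervious": 0.1, "bare": 0.0}

# One 30 m x 30 m cell: half cropland, a quarter grassland, a quarter water.
cell = {"cropland": 0.5, "grassland": 0.25, "water": 0.25}

# Hypothetical district-level livestock density of 200 heads/km2.
ld = grid_value(200.0, cell, ld_weights)
print(ld)
```

The same function would be reused for GYUCL, FLCL, and AIOUA by swapping in the corresponding district value and weight set.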
The remaining four indicators (PD, RD, NLI, and ISR) are available directly as spatial data and require no additional spatial processing. PD data were sourced from the LandScan global population dataset for the years 2019–2022, available on the official website https://landscan.ornl.gov/, and vector-clipped to the study area. RD was computed with the Line Density analysis tool in ArcGIS, and the results were linearly normalized. NLI was collected from the National Earth System Science Data Center, part of the National Science & Technology Infrastructure of China (http://www.geodata.cn), and vector-clipped to the study area (Chen et al. 2021). ISR is calculated from land use information and reflects the proportion of impervious surfaces (i.e., hardened areas where rainwater cannot infiltrate, such as buildings and roads) (Zhang & Xu 2021):
$$\mathrm{ISR}_j = \frac{S_{bj}}{S_j} \tag{5}$$
where ISRj is the ISR of the jth grid cell, Sbj (km²) is the area of impervious surfaces within the jth grid cell, and Sj (km²) is the total area of the jth grid cell.

The buffer zone is the land area extending outward from the lakeshore, which forms its inner boundary, to a set distance (1,000, 2,000, 3,000, and 4,000 m in this study). As a transitional area between the lake and the surrounding land, this region is subject to agricultural, urban construction, transportation, and other activities, whose impacts on lake water quality, hydrological processes, and ecological health are non-negligible. The value of each human activity indicator within a buffer zone was calculated by averaging the grid-based indicator values over all grid cells in the zone, in order to reflect the overall characteristics of human activity intensity in the region.
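A minimal sketch of the buffer averaging step is given below. It assumes that each land grid cell carries a precomputed distance to the lakeshore and that every buffer starts at the shoreline, so the 2,000 m buffer contains the 1,000 m buffer. The cell list and indicator values are hypothetical.

```python
def buffer_means(cells, distances=(1000, 2000, 3000, 4000)):
    """Mean indicator value of all land cells within each buffer distance.

    cells: iterable of (distance_to_shore_m, indicator_value) pairs.
    Cells with distance 0 are treated as lake surface and skipped.
    """
    cells = list(cells)
    means = {}
    for limit in distances:
        values = [value for dist, value in cells if 0 < dist <= limit]
        means[limit] = sum(values) / len(values) if values else float("nan")
    return means


# Hypothetical cells: (distance to lakeshore in m, population density).
cells = [(500, 10.0), (1500, 20.0), (2500, 30.0), (3500, 40.0), (5000, 99.0)]
means = buffer_means(cells)
print(means)
```

The cell at 5,000 m lies outside every buffer and therefore does not contribute to any of the four means.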

Redundancy analysis

Water quality changes are also susceptible to the influence of meteorological factors, such as TEM, PRE, and wind speed (WIN). For example, PRE affects water quality through runoff and dilution processes, TEM regulates chemical and biological activities in water bodies, and wind speed influences lake mixing and resuspension processes. Therefore, in addition to analyzing the impacts of human activity factors on water quality, this study also included these three meteorological factors. However, it is important to note that the study focused on Liangzi Lake as a whole and did not delve into specific conditions of individual sub-lake areas. This is due to the study period being four years and the analysis being based on annual data, which did not provide enough samples for correlation and RDA for individual sub-lake areas.

In this study, the response variable matrix comprises three water quality parameters – CODMn, TP, and Turb – which were selected due to their relatively high retrieval accuracy. The explanatory variable matrix includes 11 influencing factors, consisting of 3 meteorological variables and 8 indicators representing the intensity of human activities. RDA was conducted for water quality in relation to natural factors and human activity factors within 1,000, 2,000, 3,000, and 4,000 m buffer zones. The method for delineating buffer zones of each sub-lake area is detailed in Supplementary Information S2.

To avoid the problem of overlapping information among explanatory variables, we first used variance inflation factor (VIF) analysis to identify and remove variables with strong collinearity. Removing these redundant variables helps ensure the reliability of subsequent statistical analysis. After filtering out highly collinear variables, we applied RDA to explore how the remaining explanatory variables relate to water quality parameters. RDA is a multivariate statistical method that can reveal how multiple environmental and human activity factors jointly influence water quality (Israels 1984; Ding et al. 2016). This approach aligns with our study's goal of identifying the most influential driving forces.
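The VIF screening can be sketched with the standard library alone, using the identity that the VIF of the jth variable equals the jth diagonal element of the inverse of the correlation matrix of the explanatory variables. The removal threshold of 10 is a common rule of thumb and an assumption here, since the text does not state the cutoff used; the variable values are hypothetical.

```python
import math


def correlation_matrix(columns):
    """columns: list of equal-length lists, one per explanatory variable."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def drop_collinear(named_columns, threshold=10.0):
    """Repeatedly drop the variable with the largest VIF above the threshold."""
    kept = dict(named_columns)
    while len(kept) > 1:
        values = vif(list(kept.values()))
        worst = max(range(len(values)), key=values.__getitem__)
        if values[worst] <= threshold:
            break
        del kept[list(kept)[worst]]
    return list(kept)


# Hypothetical annual values; NLI is constructed to nearly duplicate PD.
factors = {"PD": [1.0, 2.0, 3.0, 4.0],
           "NLI": [1.1, 1.9, 3.0, 4.1],
           "PRE": [1.0, -1.0, -1.0, 1.0]}
print(drop_collinear(factors))
```

Dropping one variable at a time and recomputing the VIFs, rather than removing every variable above the threshold at once, avoids discarding both members of a collinear pair.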

RDA was performed and ordination plots were produced using Canoco software to quantify the relationships between the explanatory variables and the water quality parameters. This process enables the evaluation of factors potentially affecting water quality, with emphasis placed on explanatory variables whose arrows are longer and aligned with the direction of the water quality parameters (Zhao et al. 2015).

Performance evaluation of inversion models

During the correlation analysis, it was found that DO, TN, and Chla exhibited weak correlations with remote sensing image reflectance, making them difficult to effectively retrieve using remote sensing data. Considering both modeling accuracy and practical applicability, this study excluded these three parameters from the remote sensing inversion models, which may to some extent limit a comprehensive depiction of lake water quality. Therefore, the models were constructed based solely on the parameters with stronger correlations to remote sensing data: CODMn, TP, and Turb.

The inversion accuracy results are shown in Table 2. For both the training and testing sets, the coefficients of determination (R²) for CODMn, TP, and Turb retrieved with RF exceeded those obtained with the BP neural network. On the testing set, RF achieved an R² of 0.92 for CODMn, 0.86 for TP, and 0.76 for Turb. The MAE and RMSE values under the RF model were lower than those under the BP model in all cases except the TP testing set, where they were equal, demonstrating the superior inversion performance of RF. Hence, the RF-based water quality inversion values were selected to explore the relationship between water quality and human activities. The better performance of the RF model in this study is possibly due to its stronger resistance to overfitting and its advantage in handling high-dimensional nonlinear data.

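Table 2

Comparison of water quality inversion model accuracy

| Index | Inversion model | R² (training set) | R² (test set) | MAE (training set) | MAE (test set) | RMSE (training set) | RMSE (test set) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CODMn | RF | 0.93 | 0.92 | 0.286 | 0.355 | 0.417 | 0.462 |
| | BP | 0.76 | 0.79 | 0.611 | 0.646 | 0.741 | 0.853 |
| TP | RF | 0.91 | 0.86 | 0.002 | 0.005 | 0.003 | 0.006 |
| | BP | 0.80 | 0.79 | 0.004 | 0.005 | 0.006 | 0.006 |
| Turb | RF | 0.76 | 0.76 | 5.404 | 6.310 | 11.50 | 8.833 |
| | BP | 0.50 | 0.57 | 9.183 | 8.321 | 16.37 | 12.11 |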

Figure 3 presents the optimal inversion results of the RF model. All three water quality parameter inversion models share a common feature: at low concentrations, the measured and predicted data points clustered near the 1:1 line, while at relatively high concentrations, the data points were more dispersed. This indicates that the RF model retrieved low concentrations more accurately than high concentrations. When CODMn > 8.0 mg/L, TP > 0.07 mg/L, or Turb > 78 NTU, data points fell below the 1:1 line, indicating potential underestimation of water pollution. This discrepancy may be related to factors such as the limited number of high-concentration samples, the reduced sensitivity of remote sensing reflectance at high pollution levels, or the limited fitting capability of the model for extreme values.
Figure 3

Validation results of water quality inversion accuracy. (N represents the sample size.)


Spatiotemporal distribution of water quality parameters

Intra-annual variability

Figures 4, S1, and S2 show the spatial distribution of CODMn, TP, and Turb in Liangzi Lake across different seasons from 2019 to 2022, as retrieved by the RF-based water quality inversion model. Based on the valid pixels in these three figures, Table S5 presents the spatial averages of the water quality parameters for the entire lake area and each sub-lake area across seasons over the 4 years. The maximum seasonal ratios of CODMn, TP, and Turb in the entire lake area within a year were 2.2, 2.3, and 2.5, respectively, all observed in 2019 (note that spring values were not calculated for this year, so the actual ratios might be higher). The minimum seasonal ratios were 1.6, 1.4, and 1.1, observed in 2021, 2020, and 2020, respectively, indicating significant inter-annual differences in seasonal water quality fluctuations across Liangzi Lake.
Figure 4

Spatial distribution of CODMn in Liangzi Lake across different seasons.


These three water quality parameters exhibit significant seasonal variation. Overall, CODMn, TP, and Turb were highest in autumn (September–November), followed by summer (June–August), and lowest in spring (March–May) or winter (December–February of the following year). Only CODMn and TP deviated slightly from this pattern in certain years; for example, CODMn was higher in summer than in autumn in 2020 and 2022, and TP showed a similar trend in 2022. This seasonal variation pattern may be influenced by the combined effects of climatic conditions and human activities. For example, increased rainfall and the input of agricultural non-point source pollution in summer and autumn may lead to elevated nutrient concentrations, while in winter, lower temperatures and weakened biological activity are conducive to water quality improvement. In addition, seasonal differences in human activities such as tourism, aquaculture, and sewage discharge may also affect the spatial and temporal distribution of pollutants.
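The season definitions above can be turned into a small grouping routine. The sketch below assigns each observation to a season, with January and February counted toward the winter that began in the previous December, and computes a within-year seasonal ratio. Taking that ratio as the highest seasonal mean divided by the lowest is an assumption, as is averaging per-image lake means rather than all valid pixels; the records are hypothetical.

```python
from collections import defaultdict

SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def season_key(year, month):
    """Return (season_year, season); Jan-Feb belong to the previous year's winter."""
    season = SEASON_BY_MONTH[month]
    if month <= 2:
        year -= 1
    return year, season


def seasonal_means(records):
    """records: iterable of (year, month, value), e.g. lake-wide means per image."""
    groups = defaultdict(list)
    for year, month, value in records:
        groups[season_key(year, month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def seasonal_ratio(means, year):
    """Assumed definition: highest seasonal mean divided by the lowest."""
    values = [mean for (y, _), mean in means.items() if y == year]
    return max(values) / min(values)


# Hypothetical lake-wide CODMn means (mg/L) for individual image dates.
records = [(2020, 4, 3.0), (2020, 7, 4.0), (2020, 7, 5.0),
           (2020, 10, 6.0), (2020, 12, 3.0), (2021, 1, 2.0)]
means = seasonal_means(records)
print(means, seasonal_ratio(means, 2020))
```

In this example the January 2021 record is averaged with December 2020, so the 2020 winter mean is 2.5 mg/L.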

Figure 5(a) shows the multi-year seasonal means of the spatially averaged water quality parameters for each sub-lake area, which further reveal long-term patterns across seasons and locations. Seasonal variation was apparent in all sub-lake areas, with trends similar to those of the entire lake: summer and autumn values were notably higher than those in spring and winter. Spatially, for CODMn and TP, GT Lake consistently showed higher values than the other sub-lakes in all seasons, while QJ Lake and NS Lake had the lowest values. For Turb, NS Lake consistently had the lowest values in all seasons, with the other sub-lakes showing slightly higher levels. Overall, GT Lake had the highest pollution levels throughout the year and NS Lake the lowest, with the remaining three sub-lakes falling in between. GT Lake's consistently high pollution levels may be associated with intensive agricultural activities (such as high grain cultivation density and fertilizer application rates) and a relatively high population density (PD) within its buffer zone.
Figure 5

Spatial averages of water quality parameters across different seasons (a) and years (b) in Liangzi Lake and its sub-lake regions.


Inter-annual variability

Beyond the significant seasonal variations, inter-annual changes may more directly reflect the impact of human activities on water quality. Figures 6, S3, and S4 show the annual average spatial distributions of CODMn, TP, and Turb for Liangzi Lake from 2019 to 2022 based on remote sensing inversion. As shown in Figure 5(b), the lowest values for all three parameters occurred in 2020, with values decreasing and then increasing over the 4 years. One possible explanation is that the severe COVID-19 pandemic in 2020 reduced human activities, which may have improved water quality; as human activities resumed after the pandemic was brought under control, the water quality parameters might have increased accordingly. The change in water quality parameters between 2020 and 2022 was substantial: the annual averages in 2022 were 1.07 times (CODMn), 1.50 times (TP), and 2.47 times (Turb) those in 2020, indicating that Turb had the greatest inter-annual variability, followed by TP, with CODMn showing the least.
Figure 6

Spatial distribution of CODMn in Liangzi Lake across different years.


The inter-annual trends of water quality parameters in the five sub-lake areas were generally consistent with those of the entire Liangzi Lake. After 2020, NS Lake exhibited the smallest changes in the three water quality parameters among the sub-lake areas, consistent with its having the lowest pollution levels. In contrast, GT Lake showed the greatest increase and the highest pollution levels.

Additionally, the spatial distribution maps of water quality parameters derived from remote sensing inversion (Figures 6, S3, and S4) show relatively higher values in lake branches and shallow nearshore areas. This is likely due to their geographical location and ecological characteristics, which make them more susceptible to external pollution sources. With limited environmental capacity, these areas have weaker dilution and self-purification abilities, leading to more severe water pollution. This pattern is consistent with observed environmental conditions and supports the effectiveness of the water quality inversion model used in this study. Water quality in these areas can be improved by controlling pollution sources and establishing ecological buffer zones.

Human activity intensity status in the buffer zones

Figure 7 shows the spatial distribution of eight human activity intensity indicators within the 4,000 m buffer zone around Liangzi Lake from 2019 to 2022. Based on the spatial quantification values of these indicators within 1,000, 2,000, 3,000, and 4,000 m buffer zones (Table S6), Table 3 details the trends, differences among different buffer zones, and the sub-lake areas with maximum and minimum values. With the exception of GYUCL, which showed little change, and FLCL, which declined, the remaining six indicators (LD, AIOUA, PD, RD, NLI, and ISR) exhibited varying degrees of increase across different-sized buffer zones around all sub-lake areas. This suggests that agricultural production around Liangzi Lake has become more efficient, relying on fewer fertilizers to maintain the same yield, while industrial and urban development is accelerating, leading to changes in land use and increased environmental pressure.
Table 3

Summary of changes and spatial variations in human activity intensity indicators across buffer zones

| Indicator | Trend of change over the years | Differences across buffer zones | Max indicator area | Min indicator area |
| --- | --- | --- | --- | --- |
| GYUCL | No significant change | Consistent across all buffer zones | GT | ZQ and NS |
| FLCL | Decreasing over the years | The use of chemical fertilizers decreased in the nearer buffer zones (1,000 m, 2,000 m) while maintaining a relatively high usage level in the farther buffer zones (3,000 m, 4,000 m). | GT | ZQ and NS |
| LD | Increasing over the years | Consistent across all buffer zones | No significant differences | No significant differences |
| AIOUA | Significant decline in 2020, followed by a continuous increase until 2022 | For NS and GT, the AIOUA increased progressively as the buffer distance expanded from 1,000 to 4,000 m. Conversely, for ZQ, MJ, and QJ, the AIOUA was highest within the 1,000 m buffer zone. | NS | GT |
| PD | In all buffer zones of ZQ, GT, and QJ, as well as the farther buffer zones of NS (3,000 and 4,000 m), there was a noticeable decline in 2020, followed by a slight increase. In the closer buffer zones of NS (1,000 and 2,000 m) and all buffer zones of MJ, there was a rapid increase in 2020, followed by a continued slight increase. | Except for GT, in the other sub-lake areas, PD was lower in the closer buffer zones (1,000 m, 2,000 m) and higher in the farther buffer zones (3,000 m, 4,000 m). | GT | QJ |
| RD | Increasing over the years | The RD increased as the buffer distance increased from 1,000 to 4,000 m. Temporary roads were possibly added in the buffer zones of NS and QJ in 2020. | NS | QJ and GT |
| NLI | Rapid annual growth | The NLI in all sub-lake areas, except for ZQ, was relatively weaker within the 1,000 m buffer zone, but significantly stronger within the 2,000, 3,000, and 4,000 m buffer zones. | NS | ZQ |
| ISR | Nearly unchanged or showing very slight growth | For NS and GT, the ISR increased as the buffer distance increased from 1,000 to 4,000 m. In contrast, the opposite trend occurred in ZQ, MJ, and QJ. | MJ | ZQ |

Note. GYUCL, FLCL, LD, AIOUA, PD, RD, NLI, and ISR denote grain yield per unit of cultivated land, fertilizer load on cultivated land, livestock density, average industrial output per unit area, population density, road density, nighttime light intensity, and impervious surface ratio, respectively; ZQ, NS, GT, MJ, and QJ are short for Zhangqiao sub-lake, Niushan sub-lake, Gaotang sub-lake, Manjiang sub-lake, and Qianjiang sub-lake.

Figure 7

Spatial distribution of human activity intensity in the full buffer zone of Liangzi Lake across different years: (a) GYUCL, (b) FLCL, (c) LD, (d) AIOUA, (e) PD, (f) RD, (g) NLI, and (h) ISR.


Additionally, half of the indicators (FLCL, PD, RD, and NLI) showed higher values in larger buffer zones, indicating that human activities were primarily concentrated in the peripheral areas of the lake rather than the nearshore regions. This trend may be due to urbanization and industrial development occurring further from the lake, while nearshore areas were designated as low-intensity or non-development zones for water quality and ecological protection.

Notably, AIOUA and PD showed a significant decrease in 2020, followed by a rebound. One possible explanation is the prolonged lockdowns around Liangzi Lake during the COVID-19 pandemic, which presumably reduced industrial production and the number of migrant workers, thereby lowering AIOUA and PD.

Impact of human activities in buffer zones on lake water quality

Based on the VIF values, FLCL, WIN, and ISR were identified as redundant variables among the 11 explanatory variables. Removing these redundant variables helps reduce multicollinearity, enhancing the explanatory power and stability of the RDA model while preventing redundant information from interfering with the identification of primary influencing factors. The remaining eight explanatory variables were used in the RDA. Figure 8 reflects the correlations between the matrix of human activity intensity factors, meteorological factors, and the ordination axes at different buffer zone scales (1,000, 2,000, 3,000, 4,000 m).
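The VIF screening described above can be illustrated with a short sketch. It relies on the standard identity that the VIF of variable j, 1/(1 − R_j²), equals the j-th diagonal element of the inverse of the correlation matrix of the explanatory variables. The function names and the three toy variables are hypothetical, and the cutoff of 10 mentioned in the comment is a common rule of thumb, not a threshold stated in this section.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Toy data: x3 is almost a copy of x1, while x2 is uncorrelated with both.
x1 = [-1.0, 1.0, -1.0, 1.0]
x2 = [-1.0, -1.0, 1.0, 1.0]
x3 = [-0.8, 0.8, -1.2, 1.2]
vifs = vif([x1, x2, x3])
print([round(v, 1) for v in vifs])  # [26.0, 1.0, 26.0]; values above ~10 are often flagged
```

In this toy case the near-duplicate pair (x1, x3) receives a high VIF, so one of the two would be dropped before the ordination, while the uncorrelated x2 keeps the minimum possible value of 1.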
Figure 8

RDA ordination plot at (a) 1,000 m, (b) 2,000 m, (c) 3,000 m, and (d) 4,000 m scales.


Table S7 quantitatively presents the RDA results for different buffer zones (1,000, 2,000, 3,000, 4,000 m), including the percentage of total variance explained by each explanatory variable and statistical significance (p-values). The factors influencing water quality changes did not vary significantly across different buffer zone scales. Except for the 1,000 m buffer zone, where the composition of influential factors (PRE, AIOUA, and LD) slightly differed, the top three contributing factors were identical across the other three buffer zone scales, namely PD, PRE, and LD.
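To make the idea of "percentage of variance explained by an explanatory variable" concrete, the following sketch computes the marginal (single-variable) effect in an RDA with standardized responses. With only one predictor, the constrained variance of each standardized response equals its squared correlation with that predictor, so the explained share is the mean of the squared correlations. Whether Table S7 reports marginal or conditional effects is not stated in this section, so this is an illustrative assumption; the function name and the data are hypothetical.

```python
from statistics import mean


def marginal_explained(x, responses):
    """Share of total standardized response variance explained by x alone."""
    mx = mean(x)
    xc = [v - mx for v in x]
    sxx = sum(v * v for v in xc)
    shares = []
    for y in responses:
        my = mean(y)
        yc = [v - my for v in y]
        syy = sum(v * v for v in yc)
        sxy = sum(a * b for a, b in zip(xc, yc))
        # Each standardized response has unit variance, so its explained
        # share is the squared Pearson correlation with x.
        shares.append(sxy * sxy / (sxx * syy))
    return sum(shares) / len(shares)


# Toy example: the first response is a linear function of x (r = 1),
# the second is uncorrelated with x (r = 0).
x = [1, 2, 3, 4]
responses = [[2, 4, 6, 8], [1, -1, -1, 1]]
share = marginal_explained(x, responses)
print(share)  # 0.5
```

A full RDA with several predictors would instead regress all responses on all predictors jointly and decompose the fitted values into ordination axes; the sketch covers only the one-predictor case.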

PRE and LD were consistently the primary factors influencing water quality changes across all buffer zones, explaining 26.7–30.8% and 12.9–17.6% of the water quality variation, respectively. According to the correlation analysis results (Figures S5 and S6), PRE showed a significant negative correlation with CODMn, TP, and Turb in all buffer zones, particularly with TP (r = −0.68, p < 0.001). Increased PRE may have facilitated the dispersion and dilution of organic pollutants, nutrients, and suspended particles in the water, thereby improving water quality. LD (representing livestock and poultry farming activities) exhibited a significant positive correlation with TP and Turb in multiple buffer zones, indicating that livestock and poultry farming has a notable impact on the eutrophication and turbidity of the water body. Based on these findings, the dilution effect of PRE could be enhanced by optimizing rainwater management (e.g., adding rainwater collection facilities), and waste management in livestock and poultry farming could be strengthened to reduce pollution.
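For readers less familiar with the statistics quoted above, this is a minimal sketch of how a Pearson correlation coefficient and its t statistic (with n − 2 degrees of freedom) are obtained. The paired samples are hypothetical values in arbitrary units and are not taken from the study's data.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 d.o.f.)."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired samples: a precipitation index and a TP index.
pre = [1, 2, 3, 4, 5]
tp = [5, 3, 4, 1, 2]
r = pearson_r(pre, tp)
t = t_statistic(r, len(pre))
print(round(r, 2), round(t, 2))  # -0.8 -2.31
```

The p-value is then read from a Student's t distribution with n − 2 degrees of freedom, which is why the same r can be significant for a large sample and not for a small one.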

AIOUA (representing industrial activities) had a particularly strong influence on water quality in the 1,000 m buffer zone, explaining 28.4% of the water quality variation. In this zone, AIOUA was significantly negatively correlated with CODMn (r = −0.55, p < 0.05). This result may reflect efficient wastewater treatment, stringent pollution control policies, and the characteristics of local industrial activities, which might have led to lower CODMn levels in lake areas associated with high industrial activity. The influence of AIOUA gradually diminished as the buffer zone expanded, possibly because industrial activities were primarily concentrated in nearshore areas, while contributions from agriculture and urbanization became dominant in larger buffer zones.

While AIOUA was a major influencing factor only within the 1,000 m buffer zone, PD replaced it as the primary influencing factor in the 2,000, 3,000, and 4,000 m buffer zones, explaining 25.0–35.7% of the water quality variation. PD showed a significant positive correlation with CODMn and TP, especially with CODMn (r = 0.70–0.78, p < 0.001). PD reflects the broader impacts of domestic sewage discharge, land-use change (such as urbanization), and infrastructure development. Domestic sewage typically carries organic matter, nutrients (such as nitrogen and phosphorus), and pathogens, which directly affect water quality indicators such as CODMn and TP. As the buffer zone expands, the cumulative effects of PD on water quality become stronger, making PD the primary influencing factor in larger buffer zones. This possibly stems from rising PD during urbanization, which increases domestic sewage discharge. Land-use changes, such as the conversion of farmland to residential or commercial areas, add impervious surfaces and accelerate runoff pollution. Infrastructure development can also disrupt the hydrological cycle and worsen the accumulation of organic matter and nutrients, thereby degrading water quality.

In summary, the main factors influencing water quality and their strengths vary across different buffer zone scales in Liangzi Lake. In formulating water quality management strategies, it is essential to consider these primary influencing factors at different spatial scales to implement more effective measures for water quality protection. Detailed water quality management strategies within the buffer zones of Liangzi Lake are presented in Table 4.

Table 4

Summary of main influencing factors and management strategies for each buffer zone

| Buffer zone | Main influencing factors | Management strategies |
| --- | --- | --- |
| 1,000 m | PRE, AIOUA, and LD | a. Industrial control: Strengthen wastewater treatment, and promote clean production; b. Livestock management: Regulate density, and promote ecological farming; c. Runoff management: Build rainwater collection systems, and establish vegetative buffers |
| 2,000 m, 3,000 m, 4,000 m | PD, PRE, and LD | a. Sewage treatment: Enhance collection and treatment, and prevent direct discharge; b. Land use planning: Optimize planning, and promote green infrastructure; c. Livestock management: Promote efficient wastewater treatment technologies; d. Precipitation management: Establish rainwater management systems, and enhance natural filtration |

Comparison of water quality evaluation methods

To validate the effectiveness of the remote sensing-based water quality assessment (referred to as planar assessment) used in this study, a comparison was made with the water quality assessment based on monitoring stations (referred to as point assessment). Four monitoring stations are distributed across NS Lake, QJ Lake, and ZQ Lake (Figure 1). In the point assessment, NS Lake's water quality was represented by the average of water parameters from monitoring station S1, ZQ Lake's by S4, and QJ Lake's by the average of S2 and S3. For the planar assessment, all pixels of water quality retrieval within the corresponding sub-lake area were extracted and averaged. As shown in Figure S7, the planar assessment values for the three water quality parameters were generally higher than the point assessment values. This may be because remote sensing inversion covers the entire water body and is more sensitive to localized pollution, whereas monitoring stations primarily reflect conditions at specific locations. Additionally, uncertainties in the remote sensing inversion model itself may lead to overestimation of certain water quality parameters. Therefore, while the results suggest that traditional monitoring station-based assessments may underestimate water pollution levels, the planar assessment values should still be interpreted with caution.
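The two averaging schemes compared above can be sketched as follows. The station-to-sub-lake mapping follows the text (S1 for NS, S2 and S3 for QJ, S4 for ZQ); the function names, the flat pixel list representation, and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Station-to-sub-lake assignment as described in the text.
STATIONS = {"NS": ["S1"], "QJ": ["S2", "S3"], "ZQ": ["S4"]}


def point_assessment(station_values):
    """Average the station readings assigned to each sub-lake."""
    return {lake: mean(station_values[s] for s in ids)
            for lake, ids in STATIONS.items()}


def planar_assessment(pixel_values, pixel_lakes):
    """Average all valid retrieved pixels that fall inside each sub-lake.

    pixel_values: retrieved values, with None for invalid pixels.
    pixel_lakes:  sub-lake label of each pixel, in the same order.
    """
    grouped = {}
    for value, lake in zip(pixel_values, pixel_lakes):
        if value is not None:
            grouped.setdefault(lake, []).append(value)
    return {lake: mean(vals) for lake, vals in grouped.items()}


# Hypothetical TP values (mg/L).
stations = {"S1": 0.02, "S2": 0.03, "S3": 0.05, "S4": 0.04}
pixels = [0.02, 0.04, None, 0.05, 0.05, 0.06]
labels = ["NS", "NS", "NS", "QJ", "QJ", "ZQ"]

point = point_assessment(stations)
planar = planar_assessment(pixels, labels)
for lake in STATIONS:
    print(lake, round(point[lake], 3), round(planar[lake], 3))
```

Because the planar mean includes every valid pixel, a few high nearshore values can raise it above the station-based mean even when the station reading itself matches the pixel at its location.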

Additionally, water quality parameters were rated according to the ‘Environmental Quality Standards for Surface Water (GB 3838–2002)’ issued by the former State Environmental Protection Administration (now the Ministry of Ecology and Environment). The standard classifies water quality into five categories, with Class I being the best and Class V the worst. Since Turb is not classified in this standard, only the evaluation results for CODMn and TP were compared. As shown in Table S8, the point assessment often assigned a lower (i.e., better) class than the planar assessment; this occurred more frequently for TP, reaching up to 20%. This further suggests that a limited number of monitoring stations may not fully capture the actual pollution conditions of the water body, and incorporating remote sensing-based water quality monitoring can improve the accuracy of pollution assessments. Previous studies have also pointed out that the limited spatial coverage of monitoring stations often fails to capture the spatial heterogeneity of water pollution, while remote sensing technology, by providing high-resolution continuous data, can significantly improve the accuracy of water quality assessments (Gholizadeh et al. 2016; Jaywant & Arif 2024).
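A simple way to express such a class rating in code is a threshold lookup. The sketch below uses the upper limits commonly tabulated for GB 3838–2002 (permanganate index, and TP for lakes and reservoirs); these limits and the function name are supplied here for illustration and should be verified against the standard itself before any real use.

```python
# Upper limits (mg/L) for Classes I-V as commonly tabulated for GB 3838-2002.
# TP limits are the lake/reservoir values. Assumed here; verify before use.
LIMITS = {
    "CODMn": [2.0, 4.0, 6.0, 10.0, 15.0],
    "TP": [0.01, 0.025, 0.05, 0.1, 0.2],
}
CLASSES = ["I", "II", "III", "IV", "V"]


def classify(parameter, value):
    """Return the best class whose upper limit is not exceeded."""
    for label, limit in zip(CLASSES, LIMITS[parameter]):
        if value <= limit:
            return label
    return "worse than V"


# Hypothetical sub-lake means from the two assessment methods.
print(classify("TP", 0.045))  # III
print(classify("TP", 0.06))   # IV
print(classify("CODMn", 3.5))  # II
```

Because the class boundaries are narrow for TP, a small difference between point and planar means can move a sub-lake across a boundary, which is one reason class disagreements may be more frequent for TP than for CODMn.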

However, it should be noted that part of the difference between the two methods may also stem from the accuracy of the retrieval model. For example, Chebud et al. (2012) used Landsat Thematic Mapper (TM) data and an artificial neural network to monitor phosphorus, turbidity, and chlorophyll-a, achieving a high correlation (R² > 0.95). Similarly, Elhag et al. (2019) employed Sentinel-2 data to retrieve chlorophyll-a, nitrate, and turbidity in Wadi Baysh Dam Lake, Saudi Arabia, with R² values reaching 0.94–0.96. In comparison, the RF model in this study achieved R² values of 0.92 for CODMn, 0.86 for TP, and 0.76 for turbidity. While these results are competitive, they are slightly lower, particularly for turbidity, which may be influenced by the optical complexity of Liangzi Lake (e.g., interference from suspended solids and colored dissolved organic matter). Future research could focus on improving the retrieval model to enhance its accuracy.

Qualitative analysis of factors affecting water quality in sub-lake areas

In the previous section, due to data limitations, only a quantitative analysis of the factors affecting the water quality of the entire lake was conducted. However, due to differences in geographical location and surrounding environmental conditions, the water quality varies significantly among the sub-lake areas within the lake. Therefore, this section conducted a qualitative analysis of the reasons for water quality differences in each sub-lake area by considering the socioeconomic and natural geographic characteristics of their surrounding buffer zones, aiming to provide a more comprehensive understanding of the driving factors behind water quality changes.

GT Lake exhibited the highest levels of CODMn and TP. This may be attributed to the intensive agricultural activities and relatively high PD within its buffer zone. As shown in Table S6, the GT buffer zone had the highest values for GYUCL, FLCL, and PD, which collectively contributed to the increased input of organic matter and nutrients, leading to organic pollution and eutrophication in the lake area. To mitigate this problem, sustainable agricultural practices (such as reducing fertilizer use and adopting precision agriculture techniques) can be promoted, and population-related pollution can be managed by improving sewage treatment systems.

The water quality of NS Lake was relatively good, with both CODMn and TP at relatively low levels. This is closely related to the predominance of industrial production, the lower intensity of agricultural activities, and the strict control of industrial wastewater discharge within the NS buffer zone. The NS area had the highest values for AIOUA, RD, and NLI, reflecting its high level of industrialization. Industrial activities may not have led to higher pollution because of effective wastewater treatment systems and strict environmental regulations. Additionally, NS Lake has been designated as a fishery resource protection area and an emergency backup water source for Wuhan, resulting in stricter environmental protection measures. NS Lake is also isolated from the other sub-lake areas by the NS dike, and its smaller area reduces the impact of wind and waves, which may explain its low Turb.

MJ Lake experienced frequent boat traffic and had a large number of docks, making its ISR value the highest. The frequent waterborne activities may have contributed to the increased water pollution in this lake area. To reduce pollution, the impact of waterborne activities can be managed by restricting boat traffic, enforcing strict emission standards, and adding interception and filtration measures.

The buffer zones of ZQ and QJ Lakes had relatively low levels of development. In the ZQ buffer zone, indicators such as GYUCL, FLCL, NLI, and ISR had the lowest values, while the PD value in the QJ buffer zone was the lowest. This indicates that these areas had relatively few socioeconomic activities and low development intensity, contributing to their relatively good water quality. The low pollution level may stem from the reduction of anthropogenic pollution sources and the filtration of runoff by natural buffers (such as wetlands or vegetation).

This study utilized water quality remote sensing inversion models and response analysis methods to examine the spatiotemporal distribution characteristics of water quality in Liangzi Lake from 2019 to 2022 and to assess the impact of human activities on water quality changes. The findings demonstrate that TP, CODMn, and Turb were the water quality parameters with the highest inversion accuracy. Among the models tested, the RF model consistently outperformed the BP neural network, particularly in the inversion of CODMn, where the coefficient of determination (R²) reached 0.92. A possible reason for the RF model's superior performance is its resistance to overfitting and its ability to handle high-dimensional nonlinear data.

The primary factors influencing water quality within buffer zones of Liangzi Lake were identified as PRE, industrial activities (AIOUA), PD, and livestock and poultry farming density (LD). PRE and LD were found to have the most significant impact on water quality across all buffer zones, explaining 26.7 to 30.8% and 12.9 to 17.6% of the variation in water quality, respectively. While AIOUA had a significant impact within the 1,000 m buffer zone, its influence diminished as the buffer zone expanded. This may be related to the concentrated spatial distribution and limited diffusion range of industrial pollution sources. To address this, industrial enterprises within the 1,000 m buffer zone can be required to install efficient wastewater treatment equipment, and vegetative buffer strips can be established along the lake to intercept runoff pollutants. Conversely, PD exerted a more substantial impact on water quality in the larger buffer zones (2,000, 3,000, and 4,000 m), particularly showing a strong positive correlation with CODMn and TP. This may reflect the cumulative effect of domestic sewage and non-point source pollution from densely populated areas as distance increases. To address this, wastewater treatment plants can be upgraded and urban wetlands can be constructed within the 2,000–4,000 m buffer zones.

The study also revealed marked seasonal and inter-annual variations in water quality, with higher parameter values in summer and autumn and lower values in spring and winter. Poor water quality in summer and autumn may result from the combined effects of rainfall-driven non-point source pollution, enhanced biological activity due to high temperatures, and peak agricultural and tourism activities. Water quality improved in 2020, possibly due to a temporary reduction in human activities during the pandemic. As restrictions were lifted and activities resumed, water quality subsequently declined. Furthermore, the water quality parameters were notably higher in lake branches and shallow nearshore areas, likely due to their geographical and ecological characteristics, which make them more susceptible to external pollution sources. Nearshore vegetative buffer strips should be strengthened, and pollution source emissions from tributaries should be controlled to reduce pollutant input.

This study underscores the complex interplay between human activities and meteorological factors on the water quality of Liangzi Lake across different spatial scales. The findings highlight the need for water quality management strategies that account for the spatial distribution and intensity of key influencing factors.

This research was supported by funding from the National Key R&D Program of China (2023YFC3205600).

M.W., Y.Q., and B.G. developed the methodology. Y.Q., Y.B., and R.G. performed material preparation, data collection, and analysis. M.W. and Y.Q. wrote the original draft. M.W. supported funding acquisition. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Brown
C. J.
,
Jupiter
S. D.
,
Albert
S.
,
Klein
C. J.
,
Mangubhai
S.
,
Maina
J. M.
,
Mumby
P.
,
Olley
J.
,
Stewart-Koster
B.
,
Tulloch
V.
&
Wenger
A.
(
2017
)
Tracing the influence of land-use change on water quality and coral reefs using a Bayesian model
,
Scientific Reports
,
7
,
10
.
Chebud
Y.
,
Naja
G. M.
,
Rivero
R. G.
&
Melesse
A. M.
(
2012
)
Water quality monitoring using remote sensing and an artificial neural network
,
Water Air and Soil Pollution
,
223
,
4875
4887
.
Chen
Q.
,
Mei
K.
,
Dahlgren
R. A.
,
Wang
T.
,
Gong
J.
&
Zhang
M.
(
2016
)
Impacts of land use and population density on seasonal surface water quality using a modified geographically weighted regression
,
Science of The Total Environment
,
572
,
450
466
.
Chen
Z. Q.
,
Yu
B. L.
,
Yang
C. S.
,
Zhou
Y. Y.
,
Yao
S. J.
,
Qian
X. J.
,
Wang
C. X.
,
Wu
B.
&
Wu
J. P.
(
2021
)
An extended time series (2000–2018) of global NPP-VIIRS-like nighttime light data from a cross-sensor calibration
,
Earth System Science Data
,
13
,
889
906
.
Chen
M.
,
Xiao
F.
,
Wang
Z.
,
Feng
Q.
,
Ban
X.
,
Zhou
Y.
&
Hu
Z.
(
2022a
)
An improved QAA-based method for monitoring water clarity of Honghu lake using landsat TM, ETM+ and OLI data
,
Remote Sensing
,
14
,
3798
.
Chen
Z.
,
Dou
M.
,
Xia
R.
,
Li
G. Q.
&
Shen
L. S.
(
2022b
)
Spatiotemporal evolution of chlorophyll-a concentration from MODIS data inversion in the middle and lower reaches of the Hanjiang River, China
,
Environmental Science and Pollution Research
,
29
,
38143
38160
.
Dang
X. Y.
,
Du
J.
,
Wang
C.
,
Zhang
F. F.
,
Wu
L.
,
Liu
J. P.
,
Wang
Z.
,
Yang
X.
&
Wang
J. X.
(
2023
)
A hybrid chlorophyll a estimation method for oligotrophic and mesotrophic reservoirs based on optical water classification
,
Remote Sensing
,
15
,
22
.
De Bono
A.
&
Mora
M. G.
(
2014
)
A global exposure model for disaster risk assessment
,
International Journal of Disaster Risk Reduction
,
10
,
442
451
.
Ding
J.
,
Jiang
Y.
,
Liu
Q.
,
Hou
Z.
,
Liao
J.
,
Fu
L.
&
Peng
Q.
(
2016
)
Influences of the land use pattern on water quality in low-order streams of the Dongjiang River basin, China: a multi-scale analysis
,
Science of The Total Environment
,
551
,
205
216
.
Gao
Y. N.
,
Gao
J. F.
,
Yin
H. B.
,
Liu
C. S.
,
Xia
T.
,
Wang
J.
&
Huang
Q.
(
2015
)
Remote sensing estimation of the total phosphorus concentration in a large lake using band combinations and regional multivariate statistical modeling techniques
,
Journal of Environmental Management
,
151
,
33
43
.
Grizzetti
B.
,
Bouraoui
F.
&
De Marsily
G.
(
2008
)
Assessing nitrogen pressures on European surface water
,
Global Biogeochemical Cycles
,
22
,
14
.
Halpern
B. S.
,
Frazier
M.
,
Potapenko
J.
,
Casey
K. S.
,
Koenig
K.
,
Longo
C.
,
Lowndes
J. S.
,
Rockwood
R. C.
,
Selig
E. R.
,
Selkoe
K. A.
&
Walbridge
S.
(
2015
)
Spatial and temporal changes in cumulative human impacts on the world's ocean
,
Nature Communications
,
6
,
7
.
Huang
J. C.
,
Zhang
Y. J.
,
Bing
H. J.
,
Peng
J.
,
Dong
F. F.
,
Gao
J. F.
&
Arhonditsis
G. B.
(
2021
)
Characterizing the river water quality in China: recent progress and on-going challenges
,
Water Research
,
201
,
14
.
Huete
A.
,
Didan
K.
,
Miura
T.
,
Rodriguez
E. P.
,
Gao
X.
&
Ferreira
L. G.
(
2002
)
Overview of the radiometric and biophysical performance of the MODIS vegetation indices
,
Remote Sensing of Environment
,
83
,
195
213
.
Israels
A. Z.
(
1984
)
Redundancy analysis for qualitative variables
,
Psychometrika
,
49
,
331
346
.
Jiang
Y. Z.
,
Kong
J. L.
,
Zhong
Y. L.
,
Zhang
J. Y.
,
Zheng
Z. J.
,
Wang
L. Z.
&
Liu
D. M.
(
2023
)
The optimal method for water quality parameters retrieval of urban river based on machine learning algorithms using remote sensing images
,
International Journal of Remote Sensing
,
45
(
19–20
),
7297
7317
.
Kim Y. W., Kim T., Shin J., Lee D. S., Park Y. S., Kim Y. & Cha Y. (2022) Validity evaluation of a machine-learning model for chlorophyll a retrieval using Sentinel-2 from inland and coastal waters, Ecological Indicators, 137, 15.
Li J., Li J., Zhu L., Shen Q., Dai H. & Zhu Y. (2019) Remote sensing identification and validation of urban black and odorous water in Taiyuan city, Journal of Remote Sensing, 23, 773–784.
Li S. J., Chen F. F., Song K. S., Liu G., Tao H., Xu S. Q., Wang X., Wang Q. & Mu G. Y. (2022) Mapping the trophic state index of eastern lakes in China using an empirical model and Sentinel-2 imagery data, Journal of Hydrology, 608, 13.
Li J. Z., Zheng Z. B., Liu G., Chen N., Lei S. H., Du C., Xu J., Li Y., Zhang R. F. & Huang C. (2023a) Estimating effects of natural and anthropogenic activities on trophic level of inland water: analysis of Poyang Lake Basin, China, with Landsat-8 observations, Remote Sensing, 15, 21.
Li S. J., Xu S. Q., Song K. S., Kutser T., Wen Z. D., Liu G., Shang Y. X., Lyu L., Tao H., Wang X., Zhang L. L. & Chen F. F. (2023b) Remote quantification of the trophic status of Chinese lakes, Hydrology and Earth System Sciences, 27, 3581–3599.
Mokarram M., Pourghasemi H. R., Huang K. & Zhang H. C. (2022) Investigation of water quality and its spatial distribution in the Kor River basin, Fars province, Iran, Environmental Research, 204, 9.
Neil C., Spyrakos E., Hunter P. D. & Tyler A. N. (2019) A global approach for chlorophyll-a retrieval across optically complex inland waters based on optical water types, Remote Sensing of Environment, 229, 159–178.
Niroula S., Wallington K. & Cai X. M. (2023) Addressing data challenges in riverine nutrient load modeling of an intensively managed agro-industrial watershed, Journal of the American Water Resources Association, 13, 213–225.
Sola I., García-Martín A., Sandonís-Pozo L., Álvarez-Mozos J., Pérez-Cabello F., González-Audícana M. & Llovería R. M. (2018) Assessment of atmospheric correction methods for Sentinel-2 images in Mediterranean landscapes, International Journal of Applied Earth Observation and Geoinformation, 73, 63–76.
Sun X. H., Liu J. Q., Wang J. R., Tian L. Q., Zhou Q. & Li J. (2021) Integrated monitoring of lakes’ turbidity in Wuhan, China during the COVID-19 epidemic using multi-sensor satellite observations, International Journal of Digital Earth, 14, 443–463.
Tong S. L., Li W. P., Chen J., Xia R., Lin J. Y., Chen Y. & Xu C. Y. (2023) A novel framework to improve the consistency of water quality attribution from natural and anthropogenic factors, Journal of Environmental Management, 342, 10.
Warren M. A., Simis S. G., Martinez-Vicente V., Poser K., Bresciani M., Alikas K., Spyrakos E., Giardino C. & Ansper A. (2019) Assessment of atmospheric correction algorithms for the Sentinel-2A MultiSpectral Imager over coastal and inland waters, Remote Sensing of Environment, 225, 267–289.
Wu Z. S., Lai X. J. & Li K. Y. (2021) Water quality assessment of rivers in Lake Chaohu Basin (China) using water quality index, Ecological Indicators, 121, 8.
Xu X., Huang X. L., Zhang Y. L. & Yu D. (2018) Long-term changes in water clarity in Lake Liangzi determined by remote sensing, Remote Sensing, 10, 15.
Yang J. & Huang X. (2021) The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019, Earth System Science Data, 13, 3907–3925.
Yang X., Cui H. B., Liu X. S., Wu Q. G. & Zhang H. (2020) Water pollution characteristics and analysis of Chaohu Lake basin by using different assessment methods, Environmental Science and Pollution Research, 27, 18168–18181.
Yao Y., Shen Q., Zhu L., Gao H., Cao H., Han H., Sun J. & Li J. (2019) Remote sensing identification of urban black-odor water bodies in Shenyang city based on GF-2 image, Journal of Remote Sensing, 23, 230–242.
Yasarer L. M. W., Sinnathamby S. & Sturm B. S. M. (2016) Impacts of biofuel-based land-use change on water quality and sustainability in a Kansas watershed, Agricultural Water Management, 175, 4–14.
Zhang X. S. & Xu Z. J. (2021) Functional coupling degree and human activity intensity of production-living-ecological space in underdeveloped regions in China: case study of Guizhou Province, Land, 10, 13.
Zhang X. D., Wang X. D., Zhou Z. X., Li M. W. & Jing C. F. (2022) Spatial quantitative model of human activity disturbance intensity and land use intensity based on GF-6 image, empirical study in southwest mountainous county, China, Remote Sensing, 14, 17.
Zhao J., Lin L. Q., Yang K., Liu Q. X. & Qian G. R. (2015) Influences of land use on water quality in a reticular river network area: a case study in Shanghai, China, Landscape and Urban Planning, 137, 20–29.
Zhou L., Zhang X., Liu L. X., Chen T., Liu X., Ren Y. F., Zhang H. R., Li X. D. & Ao T. Q. (2020) An approach to evaluate non-point source pollution in an ungauged basin: a case study in Xiao'anxi River Basin, China, Water Supply, 20, 3646–3657.
Zhou Q., Wang J. R., Tian L. Q., Feng L., Li J. & Xing Q. G. (2021) Remotely sensed water turbidity dynamics and its potential driving factors in Wuhan, an urbanizing city of China, Journal of Hydrology, 593, 14.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).