ABSTRACT
With rapid urbanization and industrialization in Hubei, assessing water quality and identifying key influencing factors are crucial for lake conservation. This study utilized remote sensing and machine learning to analyze water quality parameters, including the permanganate index, total phosphorus (TP), and turbidity, in Liangzi Lake (Hubei's second-largest lake) from 2019 to 2022. A spatial quantification model and statistical analysis were employed to assess human activity intensity at different buffer scales (1,000–4,000 m) and identify key influencing factors. Results showed significant seasonal and annual variations in water quality, with the highest pollution in autumn and the lowest in winter. Pollution levels decreased from 2019 to 2020 but gradually increased from 2020 to 2022, possibly related to changes in human activities during the COVID-19 pandemic. Spatially, the Gaotang sub-lake had the highest permanganate index and TP pollution, while the Manjiang sub-lake had the highest turbidity. Precipitation and livestock density were the primary factors, accounting for 26.7–30.8% and 12.9–17.6% of water quality variation, respectively. At the 1,000 m and 2,000–4,000 m buffer zones, industrial output and population density were the dominant human activity factors influencing water quality. This study provides practical insights for targeted lake management strategies and environmental protection efforts.
HIGHLIGHTS
Remote sensing and machine learning were used to investigate the lake water quality.
Random forest outperformed the backpropagation neural network in water quality inversion accuracy.
Water quality parameters exhibited significant seasonal and inter-annual variations.
Key factors include precipitation, industry, population, and livestock density.
Targeted water management strategies were proposed based on buffer zone impacts.
INTRODUCTION
Lake water quality is crucial for ecological balance, water resource management, and biodiversity conservation. Accurately assessing pollution levels and identifying key influencing factors are essential for effective lake environmental protection (Yang et al. 2020; Wu et al. 2021). Water quality fluctuations are influenced by both natural factors and human activities (Li et al. 2023b; Tong et al. 2023). However, with increasing population growth and intensified land use changes, human activities have become the primary driving force. Pollution occurs through two main pathways: point source pollution (industrial and domestic wastewater discharge) and non-point source pollution (agricultural fertilizers, livestock manure, and urban runoff) (Grizzetti et al. 2008). These pollutants contribute to lake eutrophication, cyanobacterial blooms, and aquatic habitat degradation, ultimately threatening ecosystems and drinking water safety. Therefore, understanding the spatiotemporal variations in lake water quality and its responses to human activities is essential for pollution control and sustainable water resource management (de Oliveira et al. 2017).
Three methods are commonly used to study the response of lake water quality to human activities. The first method uses mechanistic models, such as Soil and Water Assessment Tool (SWAT) and AnnAGNPS (Yasarer et al. 2016), to simulate the impact of human activities on lake water quality. It is widely applied in non-point source pollution studies (Niroula et al. 2023). This method provides realistic simulation results and captures the dynamic changes in water quality. However, it requires extensive data and some key parameters are difficult to obtain, making it highly sensitive to parameter variations (Jeung et al. 2023). The second method is the pollution load method, which quantitatively evaluates the impact of point and non-point source pollution on the water environment, including applications such as equivalent standard pollution load (Zhou et al. 2020) and output coefficient models (Cai et al. 2018; Zhang et al. 2020; Tong et al. 2022). The advantage of this method is that it simplifies data acquisition and allows for an intuitive quantification of the pollutant contributions. However, it lacks mechanistic analysis of pollution sources and water quality changes, making it difficult to effectively capture dynamic changes in water quality. The third method involves calculating the human activity intensity index based on an evaluation index system. This method comprehensively considers factors such as population (Chen et al. 2016), agricultural activities, socioeconomic development (Li et al. 2023a), and land use structure (Brown et al. 2017; Kim et al. 2022), and establishes a regression model to quantify the impact of these factors on water quality. The primary advantages of this method are the convenience of data acquisition, its ability to reflect human activity impacts on lake water quality from multiple dimensions, and its suitability for large-scale regional water quality assessments. 
Although the limitations of this method include the inability of regression analysis to reveal causal relationships and its neglect of system complexity, it is operationally simple and easily integrates with spatial analysis techniques, making it an effective tool for assessing the relationship between lake water quality and human activities.
Acquiring water quality data is a fundamental prerequisite for studying the response of lake water quality to human activities. Real-time monitoring at designated points has become the primary method for tracking water quality changes (Huang et al. 2021; Mokarram et al. 2022). However, this method has significant limitations: although real-time monitoring can provide data with high temporal resolution, the selection of monitoring sites is often constrained by cost and accessibility. Particularly for large water bodies, the limited number of monitoring points makes it difficult to comprehensively reflect the spatial heterogeneity of water quality (Li et al. 2022). Additionally, real-time monitoring equipment is susceptible to environmental factors (such as water currents and biological interference), which may introduce local biases in the data, resulting in inaccurate assessments of overall water quality conditions. In contrast, remote sensing technology significantly reduces the cost of acquiring water quality parameters while providing a comprehensive view of the target area. Through its extensive spatial coverage, it avoids water quality assessment errors caused by insufficient monitoring points. Unlike traditional field monitoring, which relies on dense sampling points and manual collection, remote sensing can achieve relatively high-precision retrieval of water quality parameters with smaller datasets, offering an efficient alternative for large-scale water quality studies. Currently, water quality parameters such as chlorophyll-a (Neil et al. 2019; Dang et al. 2023), turbidity (Sun et al. 2021; Zhou et al. 2021), and total phosphorus (TP) (Gao et al. 2015; Xiong et al. 2022) have been successfully retrieved, providing robust data support for water environment protection.
Among the sensors available, the multispectral imager on Sentinel-2 is considered the most suitable for inland water remote sensing (Kim et al. 2022). Its high spatial resolution (10–60 m) and high temporal resolution (2–5 days) offer adequate conditions for inland water quality monitoring. For the choice of inversion models, machine learning algorithms, which better capture complex nonlinear relationships among input features and are more robust to atmospheric correction errors, have become increasingly popular in inland water remote sensing. These algorithms include backpropagation (BP) neural networks, random forests, and support vector machines (Kim et al. 2022; Jiang et al. 2023). Additionally, traditional inversion methods such as physical models (e.g., Hydrolight) (Rodero et al. 2021), semi-analytical algorithms (e.g., Quasi-Analytical Algorithm (QAA)) (Chen et al. 2022a), statistical models (e.g., band ratio methods) (Ha et al. 2017), and hybrid approaches (e.g., Look-Up Table (LUT)) (Khattab & Merkel 2014) have been widely employed for remote sensing inversion of water quality parameters, each demonstrating unique advantages in different application scenarios.
Human activities exhibit distinct spatial characteristics, making population, social, and economic statistical data increasingly important in studying the relationship between humans and the natural environment. However, these data are typically aggregated by administrative units (such as cities or counties), which do not fully align with the spatial scope of geographic data. This mismatch creates difficulties in integration, making it challenging to accurately reflect the complex interactions between human activities and the natural environment (Halpern et al. 2015; Kovarik & van Beynen 2015). Spatialization of socioeconomic data can effectively resolve these issues. This process involves mapping statistical data onto spatial grid cells with a certain resolution based on the potential spatiotemporal distribution characteristics of socioeconomic data, thereby simulating the geographic distribution of these data (De Bono & Mora 2014). Spatialization methods based on land use data are commonly used for spatializing statistics such as population and gross domestic product (GDP) (Zhang et al. 2020; Zhang et al. 2022). Therefore, integrating regional population, livestock quantity, and other socioeconomic statistical data with basic geographic unit data through spatialization techniques can provide strong support for improving the accuracy of human activity quantification and also aid in the detailed analysis of the impact of human activities on lake water quality.
This study proposes a novel quantitative framework to analyze the response of water quality to human activities. The research is structured as follows: First, it employs remote sensing inversion techniques to assess water quality from a planar perspective; second, it uses an evaluation index system-based method and a spatial quantification model to evaluate human activity intensity; finally, it applies redundancy analysis (RDA) to identify the key human activity factors driving water quality changes in Liangzi Lake. This study provides a scientific basis for understanding the mechanisms by which human activities impact lake water quality, contributing to the protection and management of lake ecosystems.
MATERIALS AND METHODS
This study focuses on three core modules – planar quantification of lake water quality parameters, spatial quantification of human activity intensity, and identification of water quality driving factors – to systematically reveal the impact mechanisms of human activities on lake water quality. First, in the planar quantification of water quality parameters, the study utilized Sentinel-2 multispectral remote sensing imagery, processed into Level-2A reflectance data via Sen2Cor atmospheric correction. Water quality inversion models tailored to Liangzi Lake were constructed using BP neural networks and random forest (RF) algorithms. These models learn the relationship between reflectance (from single and multi-band combinations) and water quality parameters, and the higher-accuracy model was selected to generate the spatiotemporal distribution of key water quality parameters in Liangzi Lake.
Second, in the spatial quantification of human activity intensity, the study selected eight indicators to reflect human activity characteristics around Liangzi Lake. Some indicators, derived from statistical panel data, were transformed into spatial data by constructing grids and integrating land use data, using spatial interpolation and weight allocation methods. The remaining indicators were directly obtained as spatial data. Subsequently, the average values of these indicators within 1,000, 2,000, 3,000, and 4,000 m buffer zones were calculated to quantify human activity intensity across different buffer ranges.
Study area
In recent years, due to the development of primary industries such as aquaculture and livestock farming around the lake, as well as the growth of manufacturing and supply industries, Liangzi Lake has experienced environmental issues such as eutrophication, non-point source pollution, and a reduction in lake area. According to previous studies, the main sources of pollutants in Liangzi Lake include upstream river input, surface runoff input, and human activities (such as industrial, agricultural, and livestock activities) in buffer zones (Sun et al. 2021; Zhou et al. 2021).
Data sources
Remote sensing data
Sentinel-2 satellite data are sourced from the European Copernicus Space Program and can be freely accessed via the Copernicus Data Hub (https://scihub.copernicus.eu/). In this study, Sentinel-2 Level-1C data were downloaded and processed into Level-2A reflectance images using the Sen2Cor processor (version 2.9), with its correction effectiveness for water bodies previously validated (Sola et al. 2018; Warren et al. 2019; Kim et al. 2022). To standardize resolution, the original images were resampled to 10 m through interpolation, ensuring pixel consistency.
The study selected 44 cloud-free remote sensing images of the study area from January 2019 to February 2023 (acquisition dates detailed in Table S1), covering the spring, summer, autumn, and winter seasons of the region.
Water quality data
This study selected seven water quality parameters: dissolved oxygen (DO), turbidity (Turb), permanganate index (CODMn), ammonia nitrogen (NH3-N), TP, total nitrogen (TN), and chlorophyll-a (Chla). Data spanning January 2019 to March 2023 were obtained from the China Environmental Monitoring Center and were monitored by four national surface water quality automatic monitoring stations in Wuhan. The geographical locations of these stations are shown in the right panel of Figure 1, with detailed coordinates provided in Table S2. Water quality data were updated every four hours and were matched to the Sentinel-2 overpass at around 11:00 AM by selecting measurements taken between 8:00 AM and 12:00 PM.
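The time matching described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `match_to_overpass`, the sample readings, and the choice to average the readings that fall inside the window are all assumptions (the text only states that measurements from the morning window were selected).

```python
from datetime import date, datetime, time
from statistics import mean


def match_to_overpass(records, image_dates, start=time(8, 0), end=time(12, 0)):
    """Average station readings taken between `start` and `end` on each image date.

    records: list of (datetime, value) pairs from one monitoring station.
    image_dates: set of dates with a usable Sentinel-2 image.
    Averaging inside the window is an assumption made for this sketch.
    """
    matched = {}
    for stamp, value in records:
        if stamp.date() in image_dates and start <= stamp.time() <= end:
            matched.setdefault(stamp.date(), []).append(value)
    return {day: mean(values) for day, values in matched.items()}


# Hypothetical four-hourly turbidity readings (illustrative values only).
records = [
    (datetime(2019, 1, 5, 4, 0), 10.0),
    (datetime(2019, 1, 5, 8, 0), 12.0),
    (datetime(2019, 1, 5, 12, 0), 14.0),
    (datetime(2019, 1, 5, 16, 0), 20.0),
    (datetime(2019, 1, 6, 8, 0), 30.0),  # no image on this date, so it is ignored
]
image_dates = {date(2019, 1, 5)}
matched = match_to_overpass(records, image_dates)
```

With four-hourly updates, only the 8:00 and 12:00 readings on an image date fall inside the window, so each matched value rests on at most two station measurements.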
Socioeconomic and meteorological data
The socioeconomic data used in this study (e.g., fertilizer application rates, grain production, and livestock numbers) were primarily derived from the statistical yearbooks of Hubei Province and the province's national economic and social development bulletins. These data were aggregated at the district or county level. For indicators lacking district- or county-level data, city-wide figures were adjusted proportionally according to area. For the meteorological data, wind speed was obtained from the National Climatic Data Center (station ID: 57494, Wuhan Station), while temperature (TEM) and precipitation (PRE) data for the period 2019–2022 were sourced from the National Earth System Science Data Center (http://www.geodata.cn).
Water quality remote sensing inversion model
Single-band reflectance from Sentinel-2 imagery was used to create various band combinations (Table 1). These combinations were analyzed for correlation with measured water quality data. The calculation method is outlined in Supplementary Information S1. The 10 band combinations with the highest correlations were selected. BP neural network and RF models were then applied to link them with water quality parameters. The dataset was split randomly: 70% for training and 30% for accuracy validation. Model performance was assessed using three metrics: coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE). The more accurate inversion model was chosen to evaluate the spatiotemporal distribution of key water quality parameters in Liangzi Lake. Seasonal or annual means were calculated using all valid pixels within the respective period (a season or a year).
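The random 70/30 split and the three accuracy metrics named above can be sketched in plain Python. The helper names, the fixed random seed, and the sample values are assumptions for illustration; R2 is computed here as 1 − SS_res/SS_tot, which may differ from the definition used in the study (for example, a squared correlation coefficient).

```python
import math
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and split it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r2_score(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Illustrative values only: three exact predictions and one error of 2 units.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
train, test = train_test_split(list(range(10)))
```

Because RMSE squares each residual, the single large error in this toy example raises RMSE (1.0) more than MAE (0.5), which is why reporting both metrics is informative.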
Band combination modes
Band combination | Formula |
---|---|
Single-band | bi |
Single-band | ln(bi) |
Two-band | bi/bj |
Two-band | bi − bj |
Two-band | bi + bj |
Two-band | bi/(bi − bj) |
Two-band | ((bi − bj)/(bi + bj))² |
Three-band | bi/(bj + bk) |
Three-band | bi/(bj − bk) |
Black and Odorous Water Index (Yao et al. 2019) | (Rrs(560) − Rrs(665))/(Rrs(490) + Rrs(560) + Rrs(665)) |
Water Cleanliness Index (Li et al. 2019) | ![]() |
Enhanced Vegetation Index (Huete et al. 2002) | 2.5(Rrs(842) − Rrs(665))/(Rrs(842) + 6Rrs(665) − 7.5Rrs(490) + 1) |
Chl-a Index (Ogashawara & Li 2019) | (Rrs(740)/Rrs(665) − Rrs(740)/Rrs(705)) |
Note: i, j, and k represent the bands of Sentinel-2 (i, j, k = 2, 3, 4, 5, 6, 7, 8, 8A, 9, 11, 12); λ2, λ3, and λ4 represent the central wavelengths of the second, third, and fourth bands of Sentinel-2, respectively.
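As an illustration of how band combinations can be generated and ranked by their correlation with a measured parameter, the sketch below builds only a subset of the combinations in the table (single band, logarithm, ratio, and difference) and ranks them by absolute Pearson correlation. The function names, the reflectance values, and the use of the absolute Pearson coefficient as the ranking criterion are assumptions; the calculation actually used is described in Supplementary Information S1.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def band_features(bands):
    """Build single-band, log, ratio, and difference features.

    bands: dict mapping a band label (e.g. "3") to per-sample reflectances.
    """
    feats = {}
    for i, bi in bands.items():
        feats[f"b{i}"] = list(bi)
        feats[f"ln(b{i})"] = [math.log(v) for v in bi]
    for i, j in permutations(bands, 2):
        feats[f"b{i}/b{j}"] = [a / b for a, b in zip(bands[i], bands[j])]
        feats[f"b{i}-b{j}"] = [a - b for a, b in zip(bands[i], bands[j])]
    return feats


def top_features(feats, target, k=10):
    """Return the k feature names with the largest absolute correlation."""
    ranked = sorted(feats, key=lambda name: abs(pearson(feats[name], target)),
                    reverse=True)
    return ranked[:k]


# Hypothetical reflectances for two bands and a target built from their ratio.
bands = {"3": [0.02, 0.04, 0.03, 0.05], "4": [0.01, 0.01, 0.03, 0.02]}
target = [3 * a / b for a, b in zip(bands["3"], bands["4"])]
feats = band_features(bands)
best = top_features(feats, target, k=1)
```

Because the toy target is an exact multiple of the b3/b4 ratio, that feature ranks first; with real station data the ranking would differ for each parameter.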
The BP neural network is a multi-layer model trained by error backpropagation. It excels in nonlinear mapping, making it popular for water quality remote sensing (He et al. 2021; Chen et al. 2022b). This study used a three-layer BP neural network. The learning rate was set to 0.015 after optimization tests. Other parameter details are in Table S3. The RF model consists of multiple decision trees (Jiang et al. 2023). It boosts accuracy and reduces overfitting by combining the predictions of its trees, which makes it effective for diverse datasets. In this study, the RF model used 100 trees, a number selected experimentally. Other parameters stayed at default values.
Construction and quantification of human activity intensity
Human activities that stress water environment systems mainly include industrial activities, agricultural activities, and human living activities. Based on literature review and expert consultation, eight indicators were selected to represent human activity intensity: grain yield per unit of cultivated land (GYUCL), fertilizer load on cultivated land (FLCL), livestock density (LD), average industrial output per unit area (AIOUA), population density (PD), road density (RD), nighttime light intensity (NLI), and impervious surface ratio (ISR).
The raw data (GYUCL, FLCL, LD, and AIOUA) are socioeconomic panel data, requiring spatial processing to reveal their distribution patterns. The processing method is as follows: First, a 30 m × 30 m grid was created within the study area, and land use data (derived from 335,709 Landsat images processed by Yang and Huang using Google Earth Engine (Yang & Huang 2021)) were integrated using the Union tool in ArcGIS. Next, the proportion of different land use types in each grid cell was calculated. Finally, these proportions were multiplied by the weights corresponding to each indicator to obtain the spatial adjustment coefficients.
The weight of each land use type depends on its correlation with human activity intensity, population distribution, production functions (e.g., grain or industrial output), and residential suitability. These factors collectively determine the weight differences in the spatialization process. The weights are obtained through methods such as literature review, expert scoring, and field surveys. The detailed information on the spatial land use weights of the above four socioeconomic indicators is shown in Table S4.
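The weighting step can be sketched as follows. The land use classes, the weights, and the county total are hypothetical (the actual weights are in Table S4), and the final allocation of a county-level total in proportion to each cell's coefficient is an assumption about how the adjustment coefficients are applied, not a step stated in the text.

```python
def adjustment_coefficient(proportions, weights):
    """Weighted sum of the land use proportions in one grid cell."""
    return sum(share * weights.get(land_use, 0.0)
               for land_use, share in proportions.items())


def allocate_total(county_total, cells, weights):
    """Distribute a county-level total over grid cells.

    Assumption: each cell receives a share proportional to its coefficient.
    """
    coeffs = [adjustment_coefficient(cell, weights) for cell in cells]
    total = sum(coeffs)
    if total == 0:
        return [0.0 for _ in cells]
    return [county_total * c / total for c in coeffs]


# Hypothetical weights for one indicator and three 30 m grid cells.
weights = {"cropland": 0.8, "built_up": 0.2, "water": 0.0}
cells = [
    {"cropland": 1.0},                   # pure cropland
    {"cropland": 0.5, "built_up": 0.5},  # mixed cell
    {"water": 1.0},                      # open water receives nothing
]
allocated = allocate_total(1300.0, cells, weights)
```

In this toy case the coefficients are 0.8, 0.5, and 0.0, so the cells receive about 800, 500, and 0 units and the county total is preserved.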
The buffer zone refers to the land area extending outward from the lakeshore as the inner boundary by different distances (set as 1,000, 2,000, 3,000, and 4,000 m in this study). As a transitional area between the lake and the surrounding land, this region is subject to agricultural, urban construction, transportation, and other activities, whose impacts on lake water quality, hydrological processes, and ecological health are non-negligible. The values of human activity indicators within each buffer zone were calculated by averaging all grid-based indicators in the zone, in order to reflect the overall characteristics of human activity intensity in the region.
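A minimal sketch of the buffer averaging is given below, assuming each grid cell already carries its distance to the lakeshore (in a GIS workflow this would come from a buffer or distance operation). The function name and the cell values are hypothetical, and the buffers are treated as cumulative, that is, each zone spans from the shoreline out to its stated distance.

```python
from statistics import mean

BUFFER_DISTANCES_M = (1000, 2000, 3000, 4000)


def buffer_means(cells, distances=BUFFER_DISTANCES_M):
    """Average an indicator over all cells within each buffer distance.

    cells: list of (distance_to_shore_m, indicator_value) pairs.
    Returns a dict mapping buffer distance to the mean value, or None
    when no cell falls inside that buffer.
    """
    result = {}
    for limit in distances:
        inside = [value for dist, value in cells if 0 <= dist <= limit]
        result[limit] = mean(inside) if inside else None
    return result


# Hypothetical cells; the last one lies beyond the widest buffer.
cells = [(500, 2.0), (1500, 4.0), (2500, 6.0), (3500, 8.0), (4500, 100.0)]
means = buffer_means(cells)
```

Because the zones are nested, a value that rises with buffer distance indicates that the added outer ring has higher intensity than the area closer to the shore.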
Redundancy analysis
Water quality changes are also susceptible to the influence of meteorological factors, such as TEM, PRE, and wind speed (WIN). For example, PRE affects water quality through runoff and dilution processes, TEM regulates chemical and biological activities in water bodies, and wind speed influences lake mixing and resuspension processes. Therefore, in addition to analyzing the impacts of human activity factors on water quality, this study also included these three meteorological factors. However, it is important to note that the study focused on Liangzi Lake as a whole and did not delve into specific conditions of individual sub-lake areas. This is due to the study period being four years and the analysis being based on annual data, which did not provide enough samples for correlation and RDA for individual sub-lake areas.
In this study, the response variable matrix comprises three water quality parameters – CODMn, TP, and Turb – which were selected due to their relatively high retrieval accuracy. The explanatory variable matrix includes 11 influencing factors, consisting of 3 meteorological variables and 8 indicators representing the intensity of human activities. RDA was conducted for water quality in relation to natural factors and human activity factors within 1,000, 2,000, 3,000, and 4,000 m buffer zones. The method for delineating buffer zones of each sub-lake area is detailed in Supplementary Information S2.
To avoid the problem of overlapping information among explanatory variables, we first used variance inflation factor (VIF) analysis to identify and remove variables with strong collinearity. Removing these redundant variables helps ensure the reliability of subsequent statistical analysis. After filtering out highly collinear variables, we applied RDA to explore how the remaining explanatory variables relate to water quality parameters. RDA is a multivariate statistical method that can reveal how multiple environmental and human activity factors jointly influence water quality (Israels 1984; Ding et al. 2016). This approach aligns with our study's goal of identifying the most influential driving forces.
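The collinearity screening can be illustrated with a plain-Python VIF calculation. It relies on the identity that the VIF of variable j equals the j-th diagonal element of the inverse correlation matrix. The variable values are hypothetical, and the cut-off of 10 mentioned in the comments is a common rule of thumb rather than a threshold stated in the text.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        m = sum(col) / n
        dev = [v - m for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Hypothetical explanatory variables: x2 is almost exactly 2 * x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
vifs = vif([x1, x2, x3])
# Under the assumed rule of thumb (VIF > 10), x1 and x2 would be flagged.
flagged = [i for i, v in enumerate(vifs) if v > 10]
```

In practice one of each flagged pair is dropped and the VIFs are recomputed, since removing a single variable changes the values for all the others.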
RDA was implemented and the ordination plots were produced in Canoco to quantify the relationships between the explanatory variables and the water quality parameters. This process enables evaluation of the factors potentially affecting water quality, with emphasis on explanatory variables whose longer arrows align with the direction of the water quality parameters (Zhao et al. 2015).
RESULTS
Performance evaluation of inversion models
During the correlation analysis, it was found that DO, TN, and Chla exhibited weak correlations with remote sensing image reflectance, making them difficult to effectively retrieve using remote sensing data. Considering both modeling accuracy and practical applicability, this study excluded these three parameters from the remote sensing inversion models, which may to some extent limit a comprehensive depiction of lake water quality. Therefore, the models were constructed based solely on the parameters with stronger correlations to remote sensing data: CODMn, TP, and Turb.
The inversion accuracy results are shown in Table 2. For both the training and testing sets, the coefficients of determination (R2) for CODMn, TP, and Turb inverted using RF were superior to those obtained using the BP neural network. Notably, on the testing set RF achieved R2 values of 0.92 for CODMn, 0.86 for TP, and 0.76 for Turb. The MAE and RMSE values under the RF model were lower than or equal to those under the BP model (the two models tied only for TP on the testing set), demonstrating the superior inversion performance of RF. Hence, the RF-based water quality inversion values were selected to explore the relationship between water quality and human activities. The RF model performed better in this study, possibly due to its stronger resistance to overfitting and its advantage in handling high-dimensional nonlinear data.
Comparison of water quality inversion model accuracy
Index | Inversion model | R2 (training set) | R2 (test set) | MAE (training set) | MAE (test set) | RMSE (training set) | RMSE (test set) |
---|---|---|---|---|---|---|---|
CODMn | RF | 0.93 | 0.92 | 0.286 | 0.355 | 0.417 | 0.462 |
CODMn | BP | 0.76 | 0.79 | 0.611 | 0.646 | 0.741 | 0.853 |
TP | RF | 0.91 | 0.86 | 0.002 | 0.005 | 0.003 | 0.006 |
TP | BP | 0.80 | 0.79 | 0.004 | 0.005 | 0.006 | 0.006 |
Turb | RF | 0.76 | 0.76 | 5.404 | 6.310 | 11.50 | 8.833 |
Turb | BP | 0.50 | 0.57 | 9.183 | 8.321 | 16.37 | 12.11 |
Validation results of water quality inversion accuracy. (N represents the sample size.)
Spatiotemporal distribution of water quality parameters
Intra-annual variability
Spatial distribution of CODMn in Liangzi Lake across different seasons.
The three water quality parameters exhibit significant seasonal variation. Overall, CODMn, TP, and Turb were highest in autumn (September–November), followed by summer (June–August), and lowest in spring (March–May) or winter (December–February of the following year). Only CODMn and TP deviated slightly from this pattern in certain years; for example, CODMn was higher in summer than in autumn in 2020 and 2022, and TP showed a similar trend in 2022. This seasonal pattern may be influenced by the combined effects of climatic conditions and human activities. For example, increased rainfall and the input of agricultural non-point source pollution in summer and autumn may lead to elevated nutrient concentrations, while in winter, lower temperatures and weakened biological activity are conducive to water quality improvement. In addition, seasonal differences in human activities such as tourism, aquaculture, and sewage discharge may also affect the spatial and temporal distribution of pollutants.
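The season definitions above can be expressed as a small grouping routine. This is a sketch under stated assumptions: the function names and sample values are hypothetical, and winter is labeled with the year in which it begins, following the "December–February of the following year" convention in the text.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def season_of(day):
    """Return (season_year, season) for a date.

    January and February are assigned to the winter that began in the
    previous calendar year.
    """
    season = SEASON_BY_MONTH[day.month]
    year = day.year - 1 if day.month in (1, 2) else day.year
    return year, season


def seasonal_means(observations):
    """Average (date, value) observations within each (year, season)."""
    groups = defaultdict(list)
    for day, value in observations:
        groups[season_of(day)].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical lake-wide mean values on three image dates.
observations = [
    (date(2019, 12, 10), 4.0),
    (date(2020, 1, 15), 6.0),
    (date(2020, 10, 1), 9.0),
]
result = seasonal_means(observations)
```

Under this convention the January 2020 image contributes to winter 2019, so each winter mean draws on images from two calendar years.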
Spatial averages of water quality parameters across different seasons (a) and years (b) in Liangzi Lake and its sub-lake regions.
Inter-annual variability
Spatial distribution of CODMn in Liangzi Lake across different years.
The inter-annual trends of water quality parameters in the five sub-lake areas were generally consistent with those of the entire Liangzi Lake. After 2020, NS Lake exhibited the smallest changes in the three water quality parameters compared to other sub-lake areas, indicating it had the lowest pollution levels. In contrast, GT Lake showed the greatest increase and the highest pollution levels.
Additionally, the spatial distribution maps of water quality parameters derived from remote sensing inversion (Figures 6, S3, and S4) show relatively higher values in lake branches and shallow nearshore areas. This is due to their geographical location and ecological characteristics, making them more susceptible to external pollution sources. With limited environmental capacity, these areas have weaker dilution and self-purification abilities, leading to more severe water pollution. This finding is consistent with actual environmental phenomena and further confirms the effectiveness of the water quality inversion model used in this study. To mitigate pollution in these areas, water quality can be improved by controlling pollution sources and establishing ecological buffer zones.
Human activity intensity status in the buffer zones
Summary of changes and spatial variations in human activity intensity indicators across buffer zones
Indicator | Trend of change over the years | Differences across buffer zones | Max indicator area | Min indicator area |
---|---|---|---|---|
GYUCL | No significant change | Consistent across all buffer zones | GT | ZQ and NS |
FLCL | Decreasing over the years | The use of chemical fertilizers decreased in the nearer buffer zones (1,000 m, 2,000 m) while maintaining a relatively high usage level in the farther buffer zones (3,000 m, 4,000 m). | GT | ZQ and NS |
LD | Increasing over the years | Consistent across all buffer zones | No significant differences | No significant differences |
AIOUA | Significant decline in 2020, followed by a continuous increase until 2022 | For NS and GT, the AIOUA increased progressively as the buffer distance expanded from 1,000 to 4,000 m. Conversely, for ZQ, MJ, and QJ, the AIOUA was highest within the 1,000 m buffer zone. | NS | GT |
PD | In all buffer zones of ZQ, GT, and QJ, as well as the farther buffer zones of NS (3,000 and 4,000 m), there was a noticeable decline in 2020, followed by a slight increase. In the closer buffer zones of NS (1,000 and 2,000 m) and all buffer zones of MJ, there was a rapid increase in 2020, followed by a continued slight increase. | Except for GT, in the other sub-lake areas, PD was lower in the closer buffer zones (1,000 m, 2,000 m) and higher in the farther buffer zones (3,000 m, 4,000 m). | GT | QJ |
RD | Increasing over the years | The RD increased as the buffer distance increased from 1,000 to 4,000 m. Temporary roads were possibly added in the buffer zones of NS and QJ in 2020. | NS | QJ and GT |
NLI | Rapid annual growth | The NLI in all sub-lake areas, except for ZQ, was relatively weaker within the 1,000 m buffer zone, but significantly stronger within the 2,000, 3,000, and 4,000 m buffer zones. | NS | ZQ |
ISR | Nearly unchanged or showing very slight growth | For NS and GT, the ISR increased as the buffer distance increased from 1,000 to 4,000 m. In contrast, the opposite trend occurred in ZQ, MJ, and QJ. | MJ | ZQ |
Note. GYUCL, FLCL, LD, AIOUA, PD, RD, NLI, and ISR denote grain yield per unit of cultivated land, fertilizer load on cultivated land, livestock density, average industrial output per unit area, population density, road density, nighttime light intensity, and impervious surface ratio, respectively; ZQ, NS, GT, MJ, and QJ are short for Zhangqiao sub-lake, Niushan sub-lake, Gaotang sub-lake, Manjiang sub-lake, and Qianjiang sub-lake.
Spatial distribution of human activity intensity in the full buffer zone of Liangzi Lake across different years: (a) GYUCL, (b) FLCL, (c) LD, (d) AIOUA, (e) PD, (f) RD, (g) NLI, and (h) ISR.
Additionally, half of the indicators (FLCL, PD, RD, and NLI) showed higher values in larger buffer zones, indicating that human activities were primarily concentrated in the peripheral areas of the lake rather than the nearshore regions. This trend may be due to urbanization and industrial development occurring further from the lake, while nearshore areas were designated as low-intensity or non-development zones for water quality and ecological protection.
Notably, AIOUA and PD decreased significantly in 2020 and then rebounded. One possible explanation is the prolonged lockdowns around Liangzi Lake during the COVID-19 pandemic, which presumably reduced industrial production and the number of migrant workers, thereby lowering AIOUA and PD.
Impact of human activities in buffer zones on lake water quality
RDA ordination plot at (a) 1,000 m, (b) 2,000 m, (c) 3,000 m, and (d) 4,000 m scales.
Table S7 quantitatively presents the RDA results for different buffer zones (1,000, 2,000, 3,000, and 4,000 m), including the percentage of total variance explained by each explanatory variable and its statistical significance (p-value). The factors influencing water quality changes did not vary significantly across buffer zone scales. The top three contributing factors were identical at the 2,000, 3,000, and 4,000 m scales (PD, PRE, and LD); only the 1,000 m buffer zone differed slightly, with PRE, AIOUA, and LD as its top three.
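As a rough illustration of what a "percentage of variance explained" means for a single explanatory variable, the sketch below computes the share of total response variance captured when one predictor is used alone (a simple, or marginal, effect). Whether Table S7 reports simple or conditional effects is not stated here, so this choice is an assumption; the function name and the sample values are hypothetical and are not data from this study.

```python
from statistics import fmean


def marginal_explained_share(x, responses):
    """Share of total response variance explained by one predictor used alone.

    x         -- one explanatory variable (e.g. PRE), one value per sample
    responses -- list of response columns (e.g. CODMn, TP, Turb) on comparable scales
    With a single predictor, the constrained variance of an RDA equals the sum
    of the regression sums of squares of each response on x.
    """
    x_mean = fmean(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    ss_reg = 0.0
    ss_tot = 0.0
    for y in responses:
        y_mean = fmean(y)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        ss_reg += sxy ** 2 / sxx  # regression sum of squares for this response
        ss_tot += sum((yi - y_mean) ** 2 for yi in y)
    return ss_reg / ss_tot


# Hypothetical values, for illustration only.
pre = [1.0, 2.0, 3.0, 4.0, 5.0]
codmn = [5.0, 4.0, 3.0, 2.0, 1.0]   # exactly linear (decreasing) in pre
tp = [1.0, -1.0, 0.0, -1.0, 1.0]    # uncorrelated with pre
share = marginal_explained_share(pre, [codmn, tp])
print(f"Share of response variance explained by pre: {share:.1%}")
```

In a full RDA with several predictors, the shares attributed to individual variables depend on the order or conditioning used, which is why dedicated ordination software is normally preferred over a hand-rolled calculation like this one.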
PRE and LD were consistently the primary factors influencing water quality changes across all buffer zones, explaining 26.7–30.8% and 12.9–17.6% of the water quality variation, respectively. Combined with the correlation analysis results (Figures S5 and S6), PRE showed a significant negative correlation with CODMn, TP, and Turb in all buffer zones, particularly with TP (r = −0.68, p < 0.001). The increase in PRE may have facilitated the dispersion and dilution of organic pollutants, nutrients, and suspended particles in the water, thereby improving water quality. LD (representing livestock and poultry farming activities) exhibited a significant positive correlation with TP and Turb in multiple buffer zones, indicating that livestock and poultry farming activities have a notable impact on the eutrophication and turbidity of the water body. Based on this finding, the dilution effect of PRE can be enhanced by optimizing rainwater management (e.g., adding rainwater collection facilities), and waste management of LD-related livestock and poultry farming activities can be strengthened to reduce pollution.
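The correlation coefficients and significance levels quoted above can be illustrated with a minimal sketch. It computes Pearson's r and, as a stand-in for whatever significance test the authors actually used (not specified in this section), a two-sided permutation p-value. The monthly precipitation and TP values are hypothetical.

```python
import random
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often a shuffled pairing gives |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical monthly precipitation (mm) and TP (mg/L), for illustration only.
pre = [20, 35, 50, 80, 120, 160, 210, 180, 110, 70, 40, 25]
tp = [0.060, 0.058, 0.052, 0.047, 0.040, 0.035,
      0.028, 0.033, 0.043, 0.049, 0.055, 0.061]

r = pearson_r(pre, tp)
p = permutation_p_value(pre, tp)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

A negative r with a small p-value corresponds to the dilution pattern described in the text; it does not by itself establish that precipitation causes the lower concentrations.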
AIOUA (representing industrial activities) had a particularly strong influence on water quality in the 1,000 m buffer zone, explaining 28.4% of the water quality variation. In this zone, AIOUA was significantly negatively correlated with CODMn (r = −0.55, p < 0.05). This result may reflect efficient wastewater treatment, stringent pollution control policies, and the characteristics of the industries involved, which might have led to lower CODMn levels in lake areas associated with high industrial activity. The influence of AIOUA gradually diminished as the buffer zone expanded, possibly because industrial activities were primarily concentrated in nearshore areas, while in larger buffer zones contributions from agriculture and urbanization gradually became dominant.
While AIOUA was identified as a major influencing factor only within the 1,000 m buffer zone, PD replaced it as the primary influencing factor in the 2,000, 3,000, and 4,000 m buffer zones, explaining 25.0–35.7% of the water quality variation. PD showed a significant positive correlation with CODMn and TP, especially with CODMn (r = 0.70–0.78, p < 0.001). PD reflects the broader impacts of domestic sewage discharge, land-use changes (such as urbanization), and infrastructure. Domestic sewage typically carries organic matter, nutrients (such as nitrogen and phosphorus), and pathogens, which directly affect water quality indicators like CODMn and TP. As the buffer zone expands, the cumulative effects of PD on water quality become stronger, making PD the primary influencing factor in larger buffer zones. This may be because rising PD during urbanization increases domestic sewage discharge. Land-use changes, such as the conversion of farmland into residential or commercial areas, add impervious surfaces and accelerate runoff pollution. Infrastructure development can also disrupt the hydrological cycle, worsening the buildup of organic matter and nutrients and thus degrading water quality.
In summary, the main factors influencing water quality and their strengths vary across different buffer zone scales in Liangzi Lake. In formulating water quality management strategies, it is essential to consider these primary influencing factors at different spatial scales to implement more effective measures for water quality protection. Detailed water quality management strategies within the buffer zones of Liangzi Lake are presented in Table 4.
Summary of main influencing factors and management strategies for each buffer zone
Buffer zone . | Main influencing factors . | Management strategies . |
---|---|---|
1,000 m | PRE, AIOUA, and LD | |
2,000 m, 3,000 m, and 4,000 m | PD, PRE, and LD | |
DISCUSSION
Comparison of water quality evaluation methods
To validate the effectiveness of the remote sensing-based water quality assessment (referred to as planar assessment) used in this study, a comparison was made with the water quality assessment based on monitoring stations (referred to as point assessment). Four monitoring stations are distributed across NS Lake, QJ Lake, and ZQ Lake (Figure 1). In the point assessment, NS Lake's water quality was represented by the average of water parameters from monitoring station S1, ZQ Lake's by S4, and QJ Lake's by the average of S2 and S3. For the planar assessment, all pixels of water quality retrieval within the corresponding sub-lake area were extracted and averaged. As shown in Figure S7, the planar assessment values for the three water quality parameters were generally higher than the point assessment values. This may be because remote sensing inversion covers the entire water body and is more sensitive to localized pollution, whereas monitoring stations primarily reflect conditions at specific locations. Additionally, uncertainties in the remote sensing inversion model itself may lead to overestimation of certain water quality parameters. Therefore, while the results suggest that traditional monitoring station-based assessments may underestimate water pollution levels, the planar assessment values should still be interpreted with caution.
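The two averaging schemes described above can be sketched as follows. The grid, mask, and station values are hypothetical placeholders (a real workflow would read a retrieval raster and a sub-lake polygon), and `None` stands for pixels without a valid retrieval, for example because of cloud cover.

```python
from statistics import fmean


def planar_mean(grid, mask):
    """Mean of all valid retrieved pixels that fall inside the sub-lake mask."""
    values = [
        value
        for grid_row, mask_row in zip(grid, mask)
        for value, inside in zip(grid_row, mask_row)
        if inside and value is not None
    ]
    return fmean(values)


def point_mean(station_values):
    """Mean of the in-situ values from the stations assigned to a sub-lake."""
    return fmean(station_values)


# Hypothetical CODMn retrieval (mg/L) on a 3 x 3 grid; None marks no-data pixels.
codmn_grid = [
    [4.0, 4.2, None],
    [4.4, 5.0, 5.4],
    [None, 4.6, 4.8],
]
# True where the pixel belongs to the sub-lake of interest.
sub_lake_mask = [
    [True, True, True],
    [True, True, True],
    [False, True, False],
]
# Hypothetical readings at two stations in the same sub-lake.
stations = [4.1, 4.3]

planar = planar_mean(codmn_grid, sub_lake_mask)
point = point_mean(stations)
print(f"planar = {planar:.2f} mg/L, point = {point:.2f} mg/L, "
      f"difference = {planar - point:.2f} mg/L")
```

In this toy case the planar mean is higher because the pixels with the largest values lie away from the stations, which mirrors the explanation offered in the text without confirming it for Liangzi Lake.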
Additionally, water quality parameters were rated according to the ‘Environmental Quality Standards for Surface Water (GB 3838–2002)’ issued by the former State Environmental Protection Administration (now the Ministry of Ecology and Environment). The standard classifies water quality parameters into five categories, with Class I being the best and Class V the worst. Since Turb is not explicitly classified in this standard, only the evaluation results for CODMn and TP were compared. As shown in Table S8, the point assessment ratings were often lower than those of the planar assessment, with a higher frequency observed for TP, reaching up to 20%. This further suggests that a limited number of monitoring stations may not fully capture the actual pollution conditions of the water body, and incorporating remote sensing-based water quality monitoring can improve the accuracy of pollution assessments. Previous studies have also pointed out that the limited spatial coverage of monitoring stations often fails to capture the spatial heterogeneity of water pollution, while remote sensing technology, by providing high-resolution continuous data, can significantly improve the accuracy of water quality assessments (Gholizadeh et al. 2016; Jaywant & Arif 2024).
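For readers who want to reproduce the rating step, the following sketch assigns a class to a CODMn or TP value. The limits are those commonly cited for GB 3838-2002 (permanganate index, and TP for lakes and reservoirs) and should be checked against the official text before use; the function and dictionary names are hypothetical.

```python
from bisect import bisect_left

# Upper limits (mg/L) for Classes I to V, as commonly cited for GB 3838-2002.
# Assumption: TP limits are the lake/reservoir values, not the river values.
CLASS_LIMITS = {
    "CODMn": (2.0, 4.0, 6.0, 10.0, 15.0),
    "TP": (0.01, 0.025, 0.05, 0.1, 0.2),
}
CLASS_NAMES = ("I", "II", "III", "IV", "V", "worse than V")


def classify(parameter, value):
    """Return the class whose upper limit is the first one not exceeded by value."""
    limits = CLASS_LIMITS[parameter]
    return CLASS_NAMES[bisect_left(limits, value)]


# Hypothetical point and planar values for one sub-lake and season.
print(classify("TP", 0.045))  # point assessment
print(classify("TP", 0.060))  # planar assessment
```

Comparing the two returned classes for each sub-lake and period yields the kind of rating mismatch frequency summarized in Table S8.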
However, it should be noted that part of the difference between the two methods may also stem from the accuracy of the retrieval model. For example, Chebud et al. (2012) used Landsat Thematic Mapper (TM) data and an artificial neural network to monitor phosphorus, turbidity, and chlorophyll-a, achieving a high correlation (R2 > 0.95). Similarly, Elhag et al. (2019) employed Sentinel-2 data to retrieve chlorophyll-a, nitrate, and turbidity in Wadi Baysh Dam Lake, Saudi Arabia, with R2 values reaching 0.94–0.96. In comparison, the RF model in this study achieved R2 values of 0.92 for CODMn, 0.86 for TP, and 0.76 for turbidity. While these results are competitive, they are slightly lower, particularly for turbidity, which may be influenced by the optical complexity of Liangzi Lake (e.g., interference from suspended solids and colored dissolved organic matter). Future research could focus on improving the retrieval model to enhance its accuracy.
Qualitative analysis of factors affecting water quality in sub-lake areas
In the previous section, data limitations restricted the quantitative analysis of influencing factors to the lake as a whole. However, because of differences in geographical location and surrounding environmental conditions, water quality varies significantly among the sub-lake areas. This section therefore presents a qualitative analysis of the reasons for water quality differences in each sub-lake area, considering the socioeconomic and natural geographic characteristics of the surrounding buffer zones, to provide a more comprehensive understanding of the driving factors behind water quality changes.
The GT Lake exhibited the highest levels of CODMn and TP. This may be attributed to the intensive agricultural activities and relatively high PD within its buffer zone. As shown in Table S6, the GT buffer zone had the highest values for GYUCL, FLCL, and PD, which collectively contributed to the increased input of organic matter and nutrients, leading to organic pollution and eutrophication in the lake area. To mitigate this problem, sustainable agricultural practices (such as reducing fertilizer use and adopting precision agriculture techniques) can be promoted, and population-related pollution can be managed by improving sewage treatment systems.
The water quality of NS Lake was relatively good, with both CODMn and TP at relatively low levels. This is closely related to the predominance of industrial production, the lower intensity of agricultural activities, and the strict control of industrial wastewater discharge within the NS buffer zone. The NS area had the highest values for AIOUA, RD, and NLI, reflecting its high level of industrialization. Industrial activities may not have led to higher pollution because of effective wastewater treatment systems and strict environmental regulations. Additionally, NS Lake has been designated as a fishery resource protection area and an emergency backup water source for Wuhan, resulting in stricter environmental protection measures. NS Lake is also isolated from the other sub-lake areas by the NS dike, and its smaller area reduces the impact of wind and waves, which may explain its low Turb.
MJ Lake experienced frequent boat traffic and had a large number of docks, making its ISR value the highest. The frequent waterborne activities may have contributed to the increased water pollution in this lake area. To reduce pollution, the impact of waterborne activities can be managed by restricting boat traffic, enforcing strict emission standards, and adding interception and filtration measures.
The buffer zones of ZQ and QJ Lakes had relatively low levels of development. In the ZQ buffer zone, indicators such as GYUCL, FLCL, NLI, and ISR had the lowest values, while the PD value in the QJ buffer zone was the lowest. This indicates that these areas had relatively few socioeconomic activities and low development intensity, contributing to their relatively good water quality. The low pollution level may stem from the reduction of anthropogenic pollution sources and the filtration of runoff by natural buffers (such as wetlands or vegetation).
CONCLUSION
This study utilized water quality remote sensing inversion models and response analysis methods to examine the spatiotemporal distribution characteristics of water quality in Liangzi Lake from 2019 to 2022 and to assess the impact of human activities on water quality changes. The findings demonstrate that TP, CODMn, and Turb were the water quality parameters with the highest inversion accuracy. Among the models tested, the RF model consistently outperformed the BP neural network, particularly in the inversion of CODMn, where the coefficient of determination (R2) reached 0.92. A possible reason for the RF model's superior performance is its resistance to overfitting and its ability to handle high-dimensional nonlinear data.
The primary factors influencing water quality within buffer zones of Liangzi Lake were identified as PRE, industrial activities (AIOUA), PD, and livestock and poultry farming density (LD). PRE and LD were found to have the most significant impact on water quality across all buffer zones, explaining 26.7 to 30.8% and 12.9 to 17.6% of the variation in water quality, respectively. While AIOUA had a significant impact within the 1,000 m buffer zone, its influence diminished as the buffer zone expanded. This may be related to the concentrated spatial distribution and limited diffusion range of industrial pollution sources. To address this, industrial enterprises within the 1,000 m buffer zone can be required to install efficient wastewater treatment equipment, and vegetative buffer strips can be established along the lake to intercept runoff pollutants. Conversely, PD exerted a more substantial impact on water quality in the larger buffer zones (2,000, 3,000, and 4,000 m), particularly showing a strong positive correlation with CODMn and TP. This may reflect the cumulative effect of domestic sewage and non-point source pollution from densely populated areas as distance increases. To address this, wastewater treatment plants can be upgraded and urban wetlands can be constructed within the 2,000–4,000 m buffer zones.
The study also revealed marked seasonal and inter-annual variations in water quality, with higher parameter values observed in summer and autumn and lower values in spring and winter. Poor water quality in summer and autumn may result from the combined effects of rainfall-driven non-point source pollution, enhanced biological activity due to high temperatures, and peak agricultural and tourism activities. Water quality improved in 2020, possibly due to a temporary reduction in human activities during the pandemic. As restrictions were lifted and activities resumed, water quality subsequently declined. Furthermore, the water quality parameters were notably higher in lake branches and shallow nearshore areas, likely due to their geographical and ecological characteristics, which make them more susceptible to external pollution sources. Nearshore vegetative buffer strips should be strengthened, and pollutant emissions from tributaries should be controlled to reduce pollutant input.
This study underscores the complex interplay between human activities and meteorological factors on the water quality of Liangzi Lake across different spatial scales. The findings highlight the need for water quality management strategies that account for the spatial distribution and intensity of key influencing factors.
FUNDING
This research was supported by funding from the National Key R&D Program of China (2023YFC3205600).
AUTHOR CONTRIBUTIONS
M.W., Y.Q., and B.G. developed the methodology. Y.Q., Y.B., and R.G. performed material preparation, data collection, and analysis. M.W. and Y.Q. wrote the original draft. M.W. supported funding acquisition. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.