Abstract
Freshwater lakes are facing increasingly serious water quality problems. Remote sensing is an effective tool for monitoring the spatiotemporal distribution of chromophoric dissolved organic matter (CDOM), a biochemical indicator of water quality. In this study, the Gradient Boosting Regression Tree (GBRT) model and Sentinel-2A/B imagery were combined to estimate low CDOM concentrations (0.003 m−1 < aCDOM(440) < 1.787 m−1) in Xin'anjiang Reservoir, an important drinking water resource in Zhejiang Province, China, providing CDOM distributions and dynamics at high spatial (10 m) and temporal (5 day) resolutions. The environmental factors that may affect CDOM spatiotemporal patterns and dynamics were analyzed using Sentinel-2 image-observed data from 2018. Results showed that CDOM in the reservoir exhibited a clear increasing gradient from its transition and lacustrine zones to the riverine zones, indicating that the rivers carried a substantial load of organic matter to the lake. Precipitation may increase CDOM concentrations with a delayed effect, while in the short term it may decrease CDOM concentrations through rainwater dilution. We also found that the correlations between CDOM and water temperature, air pressure, and wind speed were very low, indicating that these factors may not have significant impacts on CDOM variations in the reservoir. This study demonstrated that the GBRT model and Sentinel-2 imagery have the potential to accurately monitor CDOM spatiotemporal variations in reservoirs with low CDOM concentrations, which advances our understanding of the relations between dissolved organic matter and its coupled environmental factors in river-reservoir systems.
HIGHLIGHTS
Low-concentration reservoir CDOM (chromophoric dissolved organic matter) can be estimated by using Sentinel-2 images and machine learning.
The GBRT (Gradient Boosting Regression Tree) method performed much better than other traditional and machine learning methods.
The satellite observed CDOM variations demonstrated some correlations to the upstream hydrological and meteorological conditions of the reservoir.
INTRODUCTION
Freshwater lakes and reservoirs in high population density regions are facing increasingly serious water-quality problems, which are caused by human activities and climate change (Li et al. 2017; Mushtaq & Lala 2017). As one of the water quality indicators, chromophoric dissolved organic matter (CDOM) is the biologically active component of dissolved organic matter (DOM). CDOM absorbs up to 90% of the underwater solar radiation in 400–500 nm (Belanger et al. 2008), protecting underwater ecosystems from harmful UV radiation exposure (Stedmon et al. 2000). However, CDOM also has a few negative effects on the processes of drinking water treatment (Zhang et al. 2011), such as reducing the effectiveness of oxidants and disinfectants and producing undesirable disinfection by-products during oxidation processes (Baghoth et al. 2011). Understanding the sources, concentration, and cycling of CDOM in freshwaters is important for managing aquatic resources and predicting the outcomes of environmental change (Olmanson et al. 2020). As a result, monitoring CDOM in reservoirs and studying its spatiotemporal distribution are very important for effective water quality evaluation and drinking water conservation (Zhang et al. 2011).
Compared with the traditional approach of collecting field samples and measuring CDOM absorption coefficients in the lab with spectrophotometers, remote sensing techniques can monitor CDOM variations over a large scale. Traditional remote sensing models for estimating CDOM include empirical, semi-analytical, spectral matching, and matrix inversion models, among others (Kutser et al. 2005a; Brezonik et al. 2015; Ruescas et al. 2018). However, these models do not perform well for complex waters, particularly inland waters with low CDOM concentrations (Chen et al. 2019). Zhu et al. (2014) compared 15 CDOM retrieval methods using data from Lake Huron and found that most methods did not provide reliable estimates of CDOM levels, and that even the best ones tended to underestimate at high CDOM levels and overestimate at low CDOM levels. Artificial neural networks (ANNs) have also been used for CDOM inversion since the last century (Heddam 2014) and show potential for complex inland waters. Recently, machine learning-based methods have also been used for CDOM inversion and have obtained accurate results (Ruescas et al. 2018). Many machine learning methods are now available, such as Random Forest Regression (RFR), Gradient Boosting Regression Tree (GBRT), Support Vector Machine Regression (SVR), and Gaussian Process Regression (GPR) (Ruescas et al. 2018; Shah et al. 2019), but which one is most suitable for low-concentration CDOM in freshwater lakes and reservoirs remains unknown. GBRT has recently been used in machine learning studies (Deng et al. 2019) and has been reported to be more robust than other machine learning methods such as Random Forest and Support Vector Machine (Yang et al. 2020). GBRT can fit complex nonlinear relationships without prior data transformation or elimination of outliers (Elith et al. 2008). Therefore, in this study, we tested the GBRT model against several other machine learning methods, and our results showed that GBRT performed best among the tested models.
Some land-oriented satellite sensors, such as the Landsat series, have fine spatial resolutions (30 m for Landsat 8) and have shown acceptable accuracy for inland CDOM estimation. Landsat 8-based band-ratio models have been proposed for monitoring low CDOM concentrations (CDOM absorption coefficients at 440 nm ranging from 0.066 to 1.242 m−1) (Chen et al. 2019). However, the temporal resolution (16 days) of the Landsat series is too low for complex inland waters, which often vary strongly over short periods (Toming et al. 2016). When clouds cover a region on the satellite overpass days, high-quality Landsat images of that region may be unavailable for several months. Therefore, researchers have paid more attention to Sentinel-2 sensors, which have higher spatial (10 m) and temporal (5 days) resolutions (Chen et al. 2017b). Sentinel-2 sensors have been widely applied to monitor CDOM concentrations (Toming et al. 2016; Xu et al. 2018; Al-Kharusi et al. 2020). Thus, in this study, Sentinel-2 imagery was selected to estimate CDOM concentrations. The bands of Sentinel-2A/B and their spatial resolutions are shown in Table 1.
Table 1. Spectral bands for the Sentinel-2A/B sensors
Band number | Sentinel-2A central wavelength (nm) | Sentinel-2A bandwidth (nm) | Sentinel-2B central wavelength (nm) | Sentinel-2B bandwidth (nm) | Spatial resolution (m) |
---|---|---|---|---|---|
1 | 443.9 | 27 | 442.3 | 45 | 60 |
2 | 496.6 | 98 | 492.1 | 98 | 10 |
3 | 560.0 | 45 | 559 | 46 | 10 |
4 | 664.5 | 38 | 665 | 39 | 10 |
5 | 703.9 | 19 | 703.8 | 20 | 20 |
6 | 740.2 | 18 | 739.1 | 18 | 20 |
7 | 782.5 | 28 | 779.7 | 28 | 20 |
8 | 835.1 | 145 | 833 | 133 | 10 |
8a | 864.8 | 33 | 864 | 32 | 20 |
9 | 945.0 | 26 | 943.2 | 27 | 60 |
10 | 1,373.7 | 75 | 1,376.9 | 76 | 60 |
11 | 1,613.7 | 143 | 1,610.4 | 141 | 20 |
12 | 2,202.4 | 242 | 2,185.7 | 238 | 20 |
The goals of this study are (1) exploring the ability of the GBRT model and Sentinel-2 images to retrieve low-concentration CDOM in Xin'anjiang Reservoir; (2) identifying the possible environmental factors that may affect CDOM spatiotemporal patterns of the reservoir. To achieve the goals, field measurements were conducted in Xin'anjiang Reservoir from April to October 2018. The Sentinel-2 image-observed results demonstrated the CDOM spatiotemporal variations, and hydrological and meteorological data of the reservoir in 2018 were used to study the couplings between CDOM and water temperature, river flow, rainfall, and other environmental factors.
DATA AND METHODS
Study site
Xin'anjiang Reservoir (also known as Lake Qiandaohu) lies in the western part of Zhejiang Province and the southern part of Anhui Province, China (Figure 1). It is the third major river system in Anhui Province, and its discharge forms the largest river in Zhejiang Province, the Xin'anjiang River (or Qiantang River) (Wang et al. 2012). The reservoir has a water area of 580 km², 1,078 islands each larger than 2,500 m², a maximal water volume of 178.4 × 10⁸ m³, and a basin area of 10,480 km² (Wu et al. 2015). The recorded mean annual air temperature is 17 °C, and the mean annual precipitation is 1,637 mm (1961–2014). Owing to its high water clarity and good water quality, Xin'anjiang Reservoir is a key drinking water source for China's Yangtze River Delta Region, serving a surrounding population of at least ten million (Xin et al. 2007). However, short-term algal blooms have appeared in the lake since the 1990s (Zhang et al. 2014), periodically degrading its water quality.
Figure 1. (a) Study site map showing the field sampling points, locations of meteorological stations and hydrological stations, three main inflow rivers, and five sub-regions (C, NE, NW, SE, and SW) in Xin'anjiang Reservoir. (b) A Sentinel-2 raw image acquired on April 9, 2018, showing the land cover in the watershed of the Xin'anjiang Reservoir.
According to its geographical and ecological function, the Xin'anjiang Reservoir consists of a riverine zone, a transition zone and a lacustrine zone (Liu et al. 2014). The concentration ranges of chlorophyll, total suspended matter and total phosphorus in Xin'anjiang Reservoir in October 2018 were 2.402–13.165 μg/L, 0.476–3.430 mg/L, and 0.009–0.112 mg/L, respectively (Zeng et al. 2020). In previous studies, the reservoir was divided into five sub-regions to reveal nutrient status in different aquatic environments (Li et al. 2017). In this study, to better analyze temporal and spatial variations of CDOM, we also divided the reservoir into the northeastern (NE, the riverine zone), northwestern (NW, the riverine zone), southwestern (SW, the riverine zone), central (C, the transition zone) and southeastern areas (SE, the lacustrine zone), see Figure 1.
Field measurement

Hydrological and meteorological data
The monthly flow data of the Xin'anjiang River in 2018 were collected from two hydrological stations, Tunxi and Yuliang (Figure 1), through the Taihu basin hydrological information service system (http://data.cma.cn/). Meteorological data, including rainfall, air temperature, wind speed, and humidity, were obtained from the Chun'an meteorological station (Figure 1) and downloaded from the China Meteorological Data-Sharing Service System.
Sentinel-2A/B acquisition and preprocessing
Gradient Boosting Regression Tree
As a non-linear model, GBRT can fit complex nonlinear relationships and does not need prior data transformation or elimination of outliers (Elith et al. 2008). The GBRT model uses CART regression trees as weak learners and is built over multiple iterations. In each iteration, the newly generated regression tree fits the error of the trees built so far, which is the biggest difference from Random Forest, where the regression trees are independent and the training results of different trees are not further optimized. Each iteration takes a gradient descent step along the negative gradient of the loss function, which makes the loss decline. In general, each iteration of the GBRT model produces a weak learner whose accuracy is not high, but the integration of weak learners can achieve higher accuracy (Natekin & Knoll 2013; Rokach 2016; Kuang et al. 2018). The predicted value is the weighted sum of the predictions of all regression trees (see Figure 2).
The GBRT model has several important parameters: (a) the maximum depth of each weak learner (generally no more than five), (b) the maximum number of weak learners, and (c) the learning rate (a higher learning rate means a stronger correction in each iteration and makes the model more complex). Appropriate parameter values therefore need to be chosen to improve the performance of the model.
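The residual-fitting idea described above can be illustrated with a minimal sketch. This is not the implementation used in the study: it assumes a squared-error loss, a single input feature, and one-split regression stumps in place of full CART trees, and the function names (`fit_stump`, `fit_gbrt`, `predict`) and the toy data are hypothetical.

```python
# Minimal gradient boosting for regression (illustrative sketch only).
# Assumptions: squared-error loss, one input feature, depth-1 stumps.

def fit_stump(xs, residuals):
    """Find the single split that minimises the squared error of the residuals."""
    best = None
    # The largest x is excluded so that the right branch is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left value, right value)


def fit_gbrt(xs, ys, n_trees=50, learning_rate=0.1):
    """Each new stump fits the residuals (negative gradient) of the current ensemble."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        trees.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, trees, learning_rate


def predict(model, x):
    """Prediction = initial value + weighted sum of all stump outputs."""
    base, trees, lr = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in trees)


# Synthetic toy data (not measurements from the study): y = x^2.
xs = [i / 10 for i in range(1, 11)]
ys = [x ** 2 for x in xs]
model = fit_gbrt(xs, ys)

mean_y = sum(ys) / len(ys)
mse_mean = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_model = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_mean, mse_model)
```

The number of stumps and the learning rate correspond to parameters (b) and (c) above; a deeper tree in place of the stump would correspond to parameter (a).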
Other alternative machine learning algorithms
Machine-learning methods have been used to simulate the nonlinear relationships between CDOM concentration and remote sensing reflectance. They are mainly classified into tree-based algorithms, neural networks, and kernel methods. Many previous studies have confirmed that neural networks can predict CDOM absorption from ocean color (Kishino et al. 2005; Chen et al. 2017a), but they require large training datasets and carry the risk of over-fitting and of being trapped in a local minimum (Zhan et al. 2001). In this work, four nonlinear regression machine-learning algorithms were tested and compared: RFR (Random Forest Regression), GBRT, SVR (Support Vector Machine Regression), and GPR (Gaussian Process Regression). RFR and GBRT are representatives of the decision-tree family, while SVR and GPR are representatives of kernel methods.
ALGORITHM DEVELOPMENT
Algorithm architecture
Band-ratio algorithms involving Rrs in the blue, green, and red domains have been widely used for CDOM remote sensing in freshwater lakes (Kutser et al. 2005b; Mannino et al. 2014; Joshi et al. 2017). In addition, band-ratio models are less affected by the uncertainty of atmospheric correction than single-band reflectance (Stramska & Stramski 2005; Cherukuru et al. 2016). In this study, all combinations of the Sentinel-2 band ratios among B2 (Rrs490), B3 (Rrs560), and B4 (Rrs665) were tested as inputs of the GBRT model, and the optimal combination was determined in terms of four indicators: coefficient of determination (R2), root mean squared error (RMSE), Bias, and mean absolute percentage error (MAPE).
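The four indicators can be sketched as follows. The text does not give their formulas, so the definitions below are assumptions: R2 is taken as 1 − SS_res/SS_tot, Bias as the mean of estimated minus measured values, and MAPE as the mean absolute relative error in percent. The function names and the three sample values are hypothetical.

```python
import math

# Assumed definitions of the four indicators (not confirmed by the source).

def r2(measured, estimated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def rmse(measured, estimated):
    """Root mean squared error, in the units of the inputs (m^-1 for aCDOM)."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / n)


def bias(measured, estimated):
    """Mean signed error, assumed to be estimated minus measured."""
    return sum(e - m for m, e in zip(measured, estimated)) / len(measured)


def mape(measured, estimated):
    """Mean absolute percentage error, in percent."""
    n = len(measured)
    return 100 * sum(abs(e - m) / m for m, e in zip(measured, estimated)) / n


# Illustrative values only, not data from the study.
measured = [0.5, 1.0, 1.5]
estimated = [0.6, 0.9, 1.5]
print(r2(measured, estimated), rmse(measured, estimated),
      bias(measured, estimated), mape(measured, estimated))
```

If R2 was instead computed as the squared Pearson correlation, the values would differ whenever the estimates are biased or mis-scaled.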
The 50 field-measured samples were randomly divided into two independent datasets in a proportion of approximately 3:1, a common ratio in machine learning (Shah et al. 2019), so the GBRT training and validation datasets contained 37 and 13 field-measured samples, respectively. We also used 16 samples collected on April 19 and June 18 for image validation, because Sentinel-2 acquired images on those two days. The image-derived spectra were input into the GBRT model and the model outputs were validated against the 16 field-measured CDOM concentrations. Note that 23 samples were collected during these two field campaigns, but seven were not used because of cloud cover in the images. The optimal number of base learners was 50, the learning rate was 0.1, and the maximum depth of each learner was 5; the other parameters were left at the model defaults.
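A random split of this kind can be sketched as below. The helper name `split_samples`, the fixed seed, and the use of integer sample IDs are assumptions for illustration; the study's actual random assignment is not given.

```python
import random

def split_samples(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 50 * 0.75 = 37.5 -> 37
    return shuffled[:n_train], shuffled[n_train:]


sample_ids = list(range(50))        # stand-ins for the 50 field samples
train_ids, validation_ids = split_samples(sample_ids)
print(len(train_ids), len(validation_ids))
```

With 50 samples and a 3:1 proportion, truncating 37.5 gives the 37/13 division reported above.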
Comparison to other CDOM retrieval algorithms
RESULTS AND DISCUSSION
Field measured CDOM and above-surface spectra
At many previously studied sites, CDOM concentrations in inland waters spanned wide ranges (Griffin et al. 2011; Kutser 2012); for example, aCDOM(440) ranged from 0.6 to 19.4 m−1 in 15 Minnesota lakes (Brezonik et al. 2005). Compared with these lakes, CDOM concentrations in Xin'anjiang Reservoir were relatively low in 2018. The field-measured aCDOM(440) ranged from 0.1 to 1.78 m−1 with a mean of 0.7 m−1. Our field data also showed seasonal variations: the measured aCDOM(440) in April was within 0.1–0.99 m−1 (mean 0.39 m−1), whereas in October the range was 0.7–1.8 m−1 (mean 1.14 m−1), demonstrating that CDOM concentrations were higher in autumn than in early spring. The CDOM concentrations measured during our four field trips are shown in Table 2.
Table 2. Descriptive statistics of the field-measured aCDOM(440) (m−1) in Xin'anjiang Reservoir, 2018
Date | Min | Max | Mean | SD | Samples |
---|---|---|---|---|---|
Apr. 9 | 0.248 | 0.994 | 0.609 | 0.291 | 10 |
Apr. 19 | 0.103 | 0.506 | 0.252 | 0.116 | 15 |
June 18 | 0.597 | 1.162 | 0.796 | 0.196 | 8 |
Oct. 27 | 0.710 | 1.788 | 1.142 | 0.339 | 17 |
All | 0.103 | 1.788 | 0.713 | 0.442 | 50 |
The measured remote sensing reflectance of the Xin'anjiang Reservoir is shown in Figure 3 and exhibits the typical spectral signatures of complex inland freshwater (Toming et al. 2016; Xu et al. 2018). The low absorption and the scattering of chlorophyll and carotene in phytoplankton produced the peaks at 570 nm (Gurlin et al. 2011; Liu et al. 2011). The small reflectance valleys at 670 nm may be caused by the absorption maximum of chlorophyll-a (Beatriz Juarez et al. 2008). The chlorophyll fluorescence peak at around 700 nm, associated with the minimum absorption of algal pigments and pure water, was not prominent in Xin'anjiang Reservoir (Chen et al. 2020).
Model assessment and comparison
To determine the optimal inputs of the GBRT model, all possible band-ratio combinations were examined; the statistical results are shown in Table 3. Over-fitting occurred when B2/B3 and B4/B3 were used as inputs, which led the model to perform poorly on the validation dataset. In terms of the four indicators (R2, RMSE, Bias, and MAPE), B2/B3 and B4/B2 were selected to retrieve CDOM concentrations. The best GBRT model yielded R2 = 0.92 and 0.95, MAPE = 0.3% and 18%, RMSE = 0.16 and 0.13 m−1, and Bias = −0.0015 and −0.01 m−1 for the training and validation datasets, respectively. The GBRT-retrieved CDOM was compared with the field-measured CDOM (see Figure 4(a) and 4(b)). Sixteen samples drawn from the training and validation datasets were further compared with the aCDOM(440) derived from two quasi-synchronous (within 3 hours) Sentinel-2 images (see Figure 4(c)). The B2/B3 and B4/B2 algorithm applied to the Sentinel-2 images produced acceptable accuracy (R2 = 0.77, RMSE = 0.1 m−1, Bias = 0.062 m−1, and MAPE = 19.0%), indicating that the Sentinel-2 GBRT model is suitable for retrieving CDOM concentrations in Xin'anjiang Reservoir. At lower CDOM concentrations (<0.37 m−1), however, there is still some overestimation. At extremely low concentrations the spectral response to CDOM becomes weaker, which means that CDOM absorption at short wavelengths may be masked by Chl-a and SPM absorption. At the same time, the weaker CDOM absorption signal reaching the satellite sensor may be seriously affected by noise.
Table 3. Modeling aCDOM(440) using different Sentinel-2 band combinations
Input | Dataset | R2 | MAPE | RMSE (m−1) | Bias (m−1) |
---|---|---|---|---|---|
B2/B3, B4/B2 | training | 0.92 | 0.3% | 0.16 | −0.0015 |
B2/B3, B4/B2 | validation | 0.95 | 18% | 0.13 | −0.0121 |
B2/B3, B4/B3 | training | 0.97 | 0.01% | 0.01 | 0.0013 |
B2/B3, B4/B3 | validation | 0.06 | 72% | 0.48 | −0.1026 |
B4/B3, B4/B2 | training | 0.78 | 0.59% | 0.28 | −0.0047 |
B4/B3, B4/B2 | validation | 0.61 | 46% | 0.29 | −0.0172 |
B2/B3, B4/B2, B4/B3 | training | 0.910 | 0.44% | 0.21 | −0.0034 |
B2/B3, B4/B2, B4/B3 | validation | 0.71 | 36% | 0.24 | −0.0058 |
Figure 4. Estimating CDOM based on a GBRT model and Sentinel-2 band ratios B2/B3 and B4/B2: the field-measured versus the model-estimated CDOM for (a) the training dataset and (b) the validation dataset, and (c) validating the model using 16 field samples, which were collected at the same time (within 3 hours) as two Sentinel-2 images.
In this study, two models developed in previous studies were tested with our in situ data from Xin'anjiang Reservoir. We found that neither the logarithmic model (R2 = 0.0009, RMSE = 0.4578 m−1, see Figure 5(a)) nor the polynomial model (R2 = 0.1125, RMSE = 0.4389 m−1, see Figure 5(b)) was able to retrieve low CDOM concentrations in Xin'anjiang Reservoir, suggesting that the relation between low CDOM concentrations and Rrs in the reservoir may be too complex for such simple models. Whether the GBRT model can be generalized to other low-concentration reservoirs requires more supporting data. We also compared GBRT with the Random Forest (RF), Support Vector Machine (SVM), and Gaussian Process (GP) regression models. Band ratios B2/B3 and B4/B2 were used as inputs in all models. The statistical results are shown in Table 4. Of the kernel methods, SVM did not perform well for either the training (R2 = 0.36, MAPE = 70.1%) or the validation dataset (R2 = 0.25, MAPE = 65.7%), and GP over-fitted the training data and failed on the validation dataset (R2 = 0.02, MAPE = 82.0%). Of the decision-tree methods, GBRT's R2 was 25% higher and its MAPE 62.1% lower than those of the RF model for the validation dataset.
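The 25% and 62.1% figures can be reproduced from the validation rows of Table 4, assuming they are relative changes with the RF values as the baseline. The helper name `relative_change_percent` is hypothetical.

```python
def relative_change_percent(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# Validation-set values reported in Table 4 (RFR versus GBRT).
r2_gain = relative_change_percent(0.76, 0.95)      # increase in R2
mape_drop = -relative_change_percent(47.5, 18.0)   # decrease in MAPE
print(round(r2_gain, 1), round(mape_drop, 1))
```

Both numbers are relative changes, not differences in percentage points; the absolute MAPE difference is 29.5 percentage points.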
Table 4. Comparison of four representative machine learning algorithms
Algorithm | Dataset | R2 | MAPE | RMSE (m−1) | Bias (m−1) |
---|---|---|---|---|---|
SVR | Training | 0.36 | 70.1% | 0.37 | −0.031 |
SVR | Validation | 0.25 | 65.7% | 0.39 | −0.047 |
GPR | Training | 0.99 | 0.10% | 0.01 | 0.002 |
GPR | Validation | 0.02 | 82.0% | 0.77 | −0.51 |
RFR | Training | 0.50 | 47.3% | 0.22 | 0.006 |
RFR | Validation | 0.76 | 47.5% | 0.31 | 0.002 |
GBRT | Training | 0.92 | 0.3% | 0.16 | −0.002 |
GBRT | Validation | 0.95 | 18% | 0.13 | −0.01 |
Spatiotemporal variations of CDOM
The trained GBRT model was applied to 12 Sentinel-2 images from 2018 to estimate CDOM concentrations in Xin'anjiang Reservoir (see results in Figure 6). To further analyze the spatiotemporal variations of CDOM, the monthly mean aCDOM(440) in the five sub-regions was estimated from the valid pixels in each region (Figure 7). The results showed that the estimated monthly mean CDOM concentrations of the entire reservoir in 2018 were low, ranging from 0.07 to 1.90 m−1 with a mean of 0.62 m−1. Estimated monthly mean CDOM concentrations in the different sub-regions showed similar trends (see Figure 8(a)). They fluctuated from January to April (0.51–0.61 m−1), increased from May (0.57 m−1) to October (0.76 m−1), and became lower in November (0.55 m−1) and December (0.54 m−1). The satellite-observed seasonal variations of CDOM in Xin'anjiang Reservoir were consistent with some previous observations in Mälaren boreal lakes (Kutser 2012). In Xin'anjiang Reservoir, CDOM concentrations dropped significantly in August (0.59 m−1), which may have been caused by the heavy rainfall that occurred shortly before the image acquisition time. This drop can be interpreted as the consequence of dilution by the large amount of freshwater brought by the rainfall (Shi et al. 2018; Lyu et al. 2020). After the algal bloom, CDOM concentrations increased sharply, indicating that phytoplankton degradation was likely a major source of CDOM under bloom conditions (Zhao et al. 2009). In summer, there is a pressing need to monitor and control the water quality of Xin'anjiang Reservoir to prevent algae outbreaks. The highest image-observed CDOM concentrations, in autumn 2018 (see the September and October images in Figure 6), were consistent with the field-measured results (see Table 2). The CDOM-rich water present during the autumn was possibly caused by the occurrence of the algal bloom (Zhang et al. 2014).
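The per-region averaging over valid pixels can be sketched as follows. How invalid pixels (for example cloud or land) were flagged is not described in the text; the sketch assumes they are stored as NaN, and the function name, flat pixel list, and toy values are hypothetical.

```python
import math

def regional_mean(cdom_pixels, region_labels, region):
    """Mean aCDOM(440) over the valid (non-NaN) pixels of one sub-region."""
    valid = [v for v, label in zip(cdom_pixels, region_labels)
             if label == region and not math.isnan(v)]
    # Return NaN when a region has no valid pixel in the image.
    return sum(valid) / len(valid) if valid else math.nan


# Toy image flattened to a list; NaN marks an invalid pixel (assumption).
cdom_pixels = [0.5, math.nan, 0.7, 0.6, 0.8]
region_labels = ["SE", "SE", "SE", "NW", "NW"]

se_mean = regional_mean(cdom_pixels, region_labels, "SE")
nw_mean = regional_mean(cdom_pixels, region_labels, "NW")
print(se_mean, nw_mean)
```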
Figure 6. Monthly estimated CDOM concentrations observed from Sentinel-2 images in Xin'anjiang Reservoir in 2018.
Figure 7. Estimated monthly mean aCDOM(440) derived from Sentinel-2 images in five sub-regions of Xin'anjiang Reservoir. The red lines represent the mean values of the entire reservoir. All subfigures have the same labels and axes as the subfigure of January.
Figure 8. (a) The image-estimated and field-measured CDOM temporal variations (smoothed) in five sub-regions as well as the entire Xin'anjiang Reservoir, (b) monthly total precipitation in Xin'anjiang Reservoir in 2018, and (c) monthly mean inflow rate from the Xin'anjiang River in 2018.
Based on the image-observed CDOM distribution in Xin'anjiang Reservoir, its monthly estimated average aCDOM(440) in five sub-regions decreased in the following order: SE 0.656 ± 0.132 m−1, Central 0.636 ± 0.124 m−1, SW 0.635 ± 0.156 m−1, NE 0.605 ± 0.151 m−1, and NW 0.575 ± 0.113 m−1. CDOM concentrations in SE (lacustrine zone) and C (transition zone) regions were typically higher than those in the NW, NE, and SW regions (riverine zone). It is known that the watershed runoff and rainfall would bring aquatic CDOM and terrestrial humic-like substances into the reservoir (Zhang et al. 2011; Zhou et al. 2016), causing a large amount of CDOM to be carried to and pooled in lacustrine and transition zones through three rivers (Xin'anjiang River, Fuqiangxi River, and Wuqiangxi River), which then resulted in relatively high CDOM concentrations in lacustrine and transition zones.
Relations between CDOM and hydrological and meteorological factors
Some previous studies have indicated that increases in monthly mean precipitation and inflow rate may increase lacustrine CDOM concentrations (Zhou et al. 2016). However, in this study, CDOM concentrations in Xin'anjiang Reservoir peaked in September (0.79 m−1) and October (0.76 m−1) 2018, not in the typical rainy months from May to July (see Figure 8(a)–8(c)). Photochemical degradation and photobleaching are known to be important CDOM removal mechanisms (Moran et al. 2000; Tzortziou et al. 2007; Helms et al. 2008; Yoshida et al. 2018). For example, a 12-day experiment conducted in Lake Taihu found that the CDOM absorption coefficients aCDOM(355) and aCDOM(280) decreased by 29.8% and 20.8%, respectively, when CDOM was exposed to natural solar radiation (Zhang et al. 2009). In 2018, Xin'anjiang Reservoir experienced high-intensity UV-B radiation during the hot and wet season (June to August), which would significantly increase photobleaching in the surface water and thereby partly remove the CDOM introduced by precipitation and inflow. In addition, precipitation has a short-term effect on CDOM, in that a large amount of rainwater may dilute the CDOM already in the lake water (Zhou et al. 2015). Therefore, CDOM concentrations estimated in Xin'anjiang Reservoir were not especially high during the rainy season in 2018. In comparison, the higher CDOM concentrations in September (0.79 m−1) and October (0.76 m−1) may have several causes: (a) after the high-precipitation months, surface runoff and groundwater with abundant soil organic matter continued to release CDOM gradually into the water, (b) solar radiation became weaker and hence slowed the photobleaching of CDOM, and (c) during the autumn, more and more deciduous leaves decayed and directly increased the organic matter content in soil and water. The Xin'anjiang CDOM peaks in September and October may therefore reflect a lag effect of the rainfall during the wet season. A delayed effect of rainfall can also be observed in the image-derived CDOM concentrations in March and April, which may be related to the relatively large rainfall in February.
The correlations between the satellite-derived CDOM concentrations and hydrological and meteorological factors (temperature, air pressure, and wind) were analyzed. The R2 between CDOM concentrations and water temperature, air pressure, and wind speed were 0.25, 0.001 and 0.008 respectively, indicating that these factors may not have significant impacts on CDOM variations in the Xin'anjiang Reservoir.
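The text does not state how R2 was computed for these comparisons. One common choice, assumed here, is the squared Pearson correlation coefficient between the CDOM series and each factor. The function names and the two toy series below are illustrative only and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def r_squared(x, y):
    """Squared Pearson correlation (assumed definition of R2 here)."""
    return pearson_r(x, y) ** 2


# Synthetic series: one perfectly linear pair and one uncorrelated pair.
factor = [1.0, 2.0, 3.0, 4.0]
linear_response = [2.0, 4.0, 6.0, 8.0]
unrelated_response = [1.0, -1.0, -1.0, 1.0]
print(r_squared(factor, linear_response), r_squared(factor, unrelated_response))
```

A value near zero, as reported for air pressure and wind speed, only rules out a linear relationship; a lagged or nonlinear dependence would not be captured by this statistic.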
Model uncertainty
Due to the very large area of the lake and limited sampling cost, it was difficult to complete a full-coverage sampling of the entire lake within a couple of days and then repeat such full sampling at different seasons, so the full-coverage data were aggregated from the different sub-regions and times. The inversion model based on such aggregated data may work well for the entire lake over a year-long period, but may cause uncertainty for a specific sub-region at a specific time. The aggregated model tends to highlight the global status but eliminate the regional details. Figure 8(a) shows the image-estimated and the field-measured mean CDOM concentrations over the five sub-regions in different months. For the entire lake and some sub-regions (SE and C on 9 April and 18 June), the estimated and measured mean values matched very well within 0.4–0.8 m−1; however, in some regions (SW, C, NE, and NW on 19 April and 27 October), some relatively small (0.2–0.4 m−1) and large (0.8–1.2 m−1) observations were ‘averaged’ by the aggregated model, hence introducing uncertainties. We suggest that for estimating the regional details, region-oriented specific models should be separately made.
CONCLUSIONS
A remote sensing model was developed to estimate low-concentration CDOM in Xin'anjiang Reservoir, China, using a GBRT algorithm and Sentinel-2 images (two band ratios, blue/green and red/blue, as the inputs). In the image validation, the model performance was acceptable, with RMSE = 0.1 m−1 and MAPE = 19.0%. The GBRT model was then applied to observe CDOM variations in Xin'anjiang Reservoir in 2018. CDOM exhibited a clear increasing gradient from the transition and lacustrine zones to the riverine zones, indicating that the rivers carried a substantial load of organic matter to the reservoir. In 2018, the highest CDOM concentrations were observed in September (0.79 m−1) and October (0.76 m−1) rather than in the months with high precipitation and inflow rates. We also found that the correlations between CDOM concentrations and water temperature, air pressure, and wind speed were very low (R2 = 0.25, 0.001, and 0.008, respectively), indicating that these factors may not have significant impacts on CDOM variations in Xin'anjiang Reservoir.
Since the field measurements and observations were conducted within only one year, other environmental factors and events may substantially or occasionally change CDOM and water quality conditions, and hence the GBRT model remains to be further improved in the future. Nevertheless, this study demonstrated that the GBRT model and Sentinel-2 imagery have the potential to accurately monitor CDOM spatiotemporal variations in reservoirs with low CDOM concentrations, thereby advancing our understanding of the relations between dissolved organic matter and its coupled environmental factors in river, lake, and reservoir systems.
ACKNOWLEDGEMENTS
This study was supported by the National Key R&D Program of China (2017YFB0503902) and the National Natural Science Foundation of China (No. 41971373, 41876031).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.