Remotely observed variations of reservoir low concentration chromophoric dissolved organic matter and its response to upstream hydrological and meteorological conditions using Sentinel-2 imagery and Gradient Boosting Regression Tree

Freshwater lakes are facing increasingly serious water quality problems. Remote sensing techniques are effective tools for monitoring the spatiotemporal distribution of chromophoric dissolved organic matter (CDOM), a biochemical indicator of water quality. In this study, the Gradient Boosting Regression Tree (GBRT) model and Sentinel-2A/B imagery were combined to estimate low CDOM concentrations (0.003 m⁻¹ < aCDOM(440) < 1.787 m⁻¹) in Xin’anjiang Reservoir, an important drinking water resource in Zhejiang Province, China, providing CDOM distributions and dynamics with high spatial (10 m) and temporal (5 day) resolutions. The environmental factors that may affect CDOM spatiotemporal patterns and dynamics were analyzed using Sentinel-2 image-observed data from 2018. Results showed that CDOM in the reservoir exhibited a clear increasing gradient from the transition and lacustrine zones to the riverine zones, indicating that the rivers carried a substantial load of organic matter into the lake. Precipitation may increase CDOM concentrations with a delayed effect, while it may also decrease CDOM concentrations in the short term through rainwater dilution. We also found that the correlations between CDOM and water temperature, air pressure, and wind speed were very low, indicating that these factors may not have significant impacts on CDOM variations in the reservoir. This study demonstrated that the GBRT model and Sentinel-2 imagery have the potential to accurately monitor CDOM spatiotemporal variations in reservoirs with low CDOM concentrations, which advances our understanding of the relations between dissolved organic matter and its coupled environmental factors in river-reservoir systems.


INTRODUCTION
Freshwater lakes and reservoirs in high population density regions are facing increasingly serious water-quality problems, which are caused by human activities and climate change (Li et al. ; Mushtaq & Lala ). As one of the water quality indicators, chromophoric dissolved organic matter (CDOM) is the biologically active component of dissolved organic matter (DOM). CDOM absorbs up to 90% of the underwater solar radiation in 400-500 nm (Belanger et al. ), protecting underwater ecosystems from harmful UV radiation exposure (Stedmon et al. ). However, CDOM also has a few negative effects on the processes of drinking water treatment (Zhang et al. ), such as reducing the effectiveness of oxidants and disinfectants and producing undesirable disinfection by-products during oxidation processes (Baghoth et al. ). Understanding the sources, concentration, and cycling of CDOM in freshwaters is important for managing aquatic resources and predicting the outcomes of environmental change (Olmanson et al. ). As a result, monitoring CDOM in reservoirs and studying its spatiotemporal distribution are very important for effective water quality evaluation and drinking water conservation (Zhang et al. ).
Compared with the traditional methods of collecting field samples and measuring CDOM's absorption coefficients in the lab using spectrophotometers, remote sensing techniques can monitor CDOM variations over a large scale. The traditional remote sensing models for estimating CDOM include empirical, semi-analytical, spectral matching, and matrix inversion approaches, among others (Kutser et al. a; Brezonik et al. ; Ruescas et al. ). However, these models do not perform well for complex waters, particularly for inland waters with low-concentration CDOM (Chen et al. ). Zhu et al. compared 15 CDOM retrieval methods using data from Lake Huron and found that most methods did not provide reliable estimates of CDOM levels, and even those that provided the best estimates tended to underestimate at high CDOM levels and overestimate at low CDOM concentrations (Zhu et al. ). Artificial neural networks (ANNs) have also been used for CDOM inversion since the last century (Heddam ), and they show potential for dealing with complex inland waters.
Recently, machine learning-based methods have also been used for CDOM inversion and have obtained accurate results (Ruescas et al. ). Many machine learning-based methods are now available, such as random forest regression (RFR) (Shah et al. ), but which one is most suitable for low-concentration CDOM in freshwater lakes and reservoirs remains unknown. GBRT was recently proposed for machine learning (Deng et al. ), and it has been shown to be more robust than, and preferable to, other machine learning methods such as Random Forest and Support Vector Machine (Yang et al. ). GBRT can fit complex nonlinear relationships without requiring prior data transformation or the elimination of outliers (Elith et al. ). Therefore, in this study, we tested the GBRT model against several other machine learning methods, and our results showed that GBRT performed best among the tested models.
Some land-oriented satellite sensors, such as the Landsat series, have fine spatial resolutions (30 m for Landsat 8) and have demonstrated acceptable accuracy for inland CDOM estimation. Some Landsat 8-based band-ratio models have been proposed for monitoring low CDOM concentrations (absorption coefficients of CDOM at 440 nm ranging from 0.066 to 1.242 m⁻¹) (Chen et al. ). However, the temporal resolution (16 days) of the Landsat series satellites is too low for complex inland waters, which often vary strongly over short periods (Toming et al. ). When cloud cover coincides with the satellite overpass, it can be difficult to obtain high-quality Landsat images of a region for several months. Therefore, researchers have paid more attention to the Sentinel-2 sensors, which have higher spatial (10 m) and temporal (5 days) resolutions (Chen et al. b

Study site
Xin'anjiang Reservoir (also known as Lake Qiandaohu) lies in the western part of Zhejiang Province and the southern part of Anhui Province, China (Figure 1). It is the third major river system in Anhui Province, and its discharge forms the largest river in Zhejiang Province, the Xin'anjiang River (or Qiantang River) (Wang et al. ). The reservoir has a water area of 580 km², 1,078 islands each larger than 2,500 m², a maximum water volume of 178.4 × 10⁸ m³, and a basin area of 10,480 km² (Wu et al. ). The recorded mean annual air temperature is 17 °C, and the mean annual precipitation is 1,637 mm. Due to its high water clarity and good water quality, Xin'anjiang Reservoir is a key drinking water source for China's Yangtze River Delta Region, serving a surrounding population of at least ten million (Xin et al. ). However, short-term algal blooms have appeared in the lake since the 1990s (Zhang et al. ), periodically degrading its water quality.
In previous studies, the reservoir was divided into five sub-regions according to its geographical and ecological functions to reveal the nutrient status in different aquatic environments (Li et al. ). In this study, to better analyze the temporal and spatial variations of CDOM, we also divided the reservoir into the northeastern (NE, riverine zone), northwestern (NW, riverine zone), southwestern (SW, riverine zone), central (C, transition zone), and southeastern (SE, lacustrine zone) areas (see Figure 1).

Field measurement
Four sampling field trips were conducted in Xin'anjiang Reservoir from April to October 2018, covering spring (twice in April), summer (June), and autumn (October) (see Figure 1), and 50 samples were obtained in total (see Table 1). Water samples were collected at the surface (about 10 cm below) using a standard water-fetching bottle and then preserved in amber bottles (polypropylene, 250 mL) at ambient water temperature before being delivered to the laboratory for analysis within 24 h. The measured water quality parameters included the concentrations of CDOM, COD, TP, suspended substances, and chlorophyll-a. At each sampling location, the above-surface water spectra were measured using an ASD (Analytical Spectral Devices, Inc.) FieldSpec® spectroradiometer (wavelength range 325-1,075 nm with a 1 nm interval) (ASD User Guide, https://www.malvernpanalytical.com). Three in-situ optical properties were measured, including the above-surface radiance. To minimize uncertainties, ten spectra were measured and the median one was selected for calculating the remote sensing reflectance (Rrs(λ), sr⁻¹), with the surface reflectance factor ρ = 0.028.
In the laboratory, water samples were filtered through a GF/F glass microfiber membrane (0.45 μm) under low pressure (<5 atm). The filtered sample was used to measure the CDOM absorption coefficient aCDOM(λ) with a Cary-100 spectrophotometer (Cary-100 User Guide, https://www.agilent.com/cs/library/usermanuals/public/1972_7000.pdf) with a Milli-Q baseline correction. aCDOM(λ) is calculated from A(λ), the Cary-measured CDOM absorbance, and the path length (1 cm) of the cuvette used.
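The conversion from measured absorbance to absorption coefficient can be sketched as follows. This is a minimal sketch assuming the commonly used relation aCDOM(λ) = 2.303 × A(λ) / pathlength, with the path length expressed in metres; it is an assumption rather than the exact formula applied here, and it omits any baseline or scattering correction. The function name `cdom_absorption` and the absorbance value are hypothetical.

```python
import math


def cdom_absorption(absorbance, pathlength_m=0.01):
    """Convert a base-10 absorbance A(lambda) (dimensionless) into a
    Napierian absorption coefficient a_CDOM(lambda) in m^-1.

    Assumes a_CDOM = ln(10) * A / pathlength, i.e. about 2.303 * A / pathlength.
    """
    if pathlength_m <= 0:
        raise ValueError("pathlength must be positive")
    return math.log(10) * absorbance / pathlength_m


# Hypothetical absorbance reading at 440 nm in a 1 cm (0.01 m) cuvette
a440 = cdom_absorption(0.003, pathlength_m=0.01)
print(f"aCDOM(440) = {a440:.3f} m^-1")
```

With a 1 cm cuvette, an absorbance of only a few thousandths already corresponds to an aCDOM(440) within the range reported for this reservoir, which illustrates why low-CDOM waters are demanding to measure.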
In this study, CDOM concentration is parameterized by its absorption coefficient at 440 nm, aCDOM(440) (Zhu et al. ).
In order to use Sentinel-2A/B data to derive aCDOM(440), the field spectra of Rrs were resampled to the Sentinel-2 bands using their spectral response functions (Chen et al. b).
where Rrs(Bk) is the simulated Rrs for the k-th (k = 2, 3, 4) band of Sentinel-2A/B, which is summed from λm to λn for the k-th band, and Rrs_field are the field-measured spectra.
gives the earth surface reflectance (Rt). Then, a 5 × 5 mean filter was applied to reduce the image uncertainty.
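The band resampling of the field spectra described above can be sketched as a response-weighted average. This is a minimal sketch that assumes the usual form Rrs(Bk) = Σ SRF(λ) Rrs_field(λ) / Σ SRF(λ), with the sums running from λm to λn on a uniform wavelength grid; the function name `resample_to_band`, the spectrum, and the response values are hypothetical and are not the published Sentinel-2 response curves.

```python
def resample_to_band(wavelengths, rrs_field, srf):
    """Spectral-response-weighted mean of a field spectrum over one band.

    All three inputs are equal-length sequences on the same uniform
    wavelength grid; srf is zero outside the band limits.
    """
    if not (len(wavelengths) == len(rrs_field) == len(srf)):
        raise ValueError("inputs must have the same length")
    weight_sum = sum(srf)
    if weight_sum == 0:
        raise ValueError("spectral response is zero everywhere")
    return sum(s * r for s, r in zip(srf, rrs_field)) / weight_sum


# Hypothetical coarse example around the green band (B3, near 560 nm)
wavelengths = [540, 550, 560, 570, 580]        # nm
rrs_field = [0.010, 0.012, 0.014, 0.012, 0.010]  # sr^-1, made-up spectrum
srf = [0.0, 0.5, 1.0, 0.5, 0.0]                # made-up response weights

rrs_b3 = resample_to_band(wavelengths, rrs_field, srf)
print(f"Rrs(B3) = {rrs_b3:.4f} sr^-1")
```

In practice the same function would be called once per band (k = 2, 3, 4) with the 1 nm field spectrum and the corresponding response curve.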
Finally, the remote sensing reflectance (Rrs, sr⁻¹) was calculated using Lr(θ, φ), the surface radiance at zenith angle θ and azimuth angle φ, and Ed, the downwelling irradiance. The two variables were estimated using the Hydrolight (Ver.
The weighted sum of the prediction results of each regression tree is the predicted value (see Figure 2).
In the GBRT model, n is the number of weak learners, θ is the coefficient (reducing overfitting), f is the weak learner, and F is the
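The weighted-sum structure of GBRT can be illustrated with a small from-scratch sketch. It assumes the usual additive form F(x) = f0 + θ Σ f_i(x) with squared-error loss, uses one-split regression trees (stumps) as the weak learners, and applies a single constant θ as the shrinkage coefficient; it is not the implementation used in this study. All function names are hypothetical, and the data are synthetic stand-ins for the two band ratios (blue/green, red/blue) and aCDOM(440).

```python
import math


def fit_stump(X, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # no feature has two distinct values: constant learner
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, thr, lmean, rmean = best
    return lambda row: lmean if row[j] <= thr else rmean


def fit_gbrt(X, y, n=100, theta=0.1):
    """Boost n stumps on squared-error residuals; theta shrinks each learner."""
    f0 = sum(y) / len(y)  # initial constant prediction
    pred = [f0] * len(y)
    learners = []
    for _ in range(n):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        f = fit_stump(X, residuals)
        learners.append(f)
        pred = [pi + theta * f(row) for pi, row in zip(pred, X)]
    return f0, theta, learners


def predict(model, row):
    """F(x) = f0 + theta * (sum of the weak learners' predictions)."""
    f0, theta, learners = model
    return f0 + theta * sum(f(row) for f in learners)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data: (blue/green, red/blue) band ratios -> aCDOM(440)
X = [(0.6 + 0.05 * i, 0.3 + 0.02 * ((7 * i) % 10)) for i in range(20)]
y = [2.0 * math.exp(-2.0 * bg) + 0.5 * rb for bg, rb in X]

model = fit_gbrt(X, y)
mse_baseline = mse(y, [sum(y) / len(y)] * len(y))
mse_model = mse(y, [predict(model, row) for row in X])
print(f"baseline MSE = {mse_baseline:.5f}, GBRT MSE = {mse_model:.5f}")
```

A small θ makes each tree contribute only a fraction of its fit, so more learners are needed but the ensemble is less prone to overfitting. A production workflow would more likely rely on an established library implementation with deeper trees and held-out validation.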

Comparison to other CDOM retrieval algorithms
In order to test the advantages of GBRT for estimating low-concentration CDOM in a freshwater environment, two previous traditional models were also compared. The first was a logarithmic model (see Equation (6)). The other was a unary quadratic polynomial Landsat-8 model (see Equation (7)) applied in a reservoir in Brazil, where aCDOM(440) ranges from 0.644 m⁻¹ to 1.413 m⁻¹ (Alcantara et al. ).
The second model is more comparable with our model in terms of water type and CDOM range.
where B2, B3, and B4 are the bands of Landsat 8. The parameters A, B, and C need to be re-calibrated for different study sites.
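Re-calibrating a quadratic model for a new site amounts to a least-squares fit of A, B, and C to local match-up data. The sketch below assumes a model of the form aCDOM(440) = A·x² + B·x + C, where x is a band-derived predictor such as a band ratio; which bands enter x is an assumption here, and the function name `fit_quadratic` and all numbers are hypothetical.

```python
def fit_quadratic(x, y):
    """Least-squares A, B, C for y ~ A*x**2 + B*x + C via the normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented normal-equation matrix for the unknowns (C, B, A)
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


# Hypothetical band-ratio values and noise-free synthetic aCDOM(440) values
ratio = [0.5, 0.8, 1.0, 1.3, 1.6]
a_cdom = [2.0 * r ** 2 - 3.0 * r + 1.5 for r in ratio]

A, B, C = fit_quadratic(ratio, a_cdom)
print(f"A = {A:.3f}, B = {B:.3f}, C = {C:.3f}")
```

With real match-ups the fit would not be exact, and the recalibrated coefficients should be checked against independent samples before the model is compared with other approaches.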

RESULTS AND DISCUSSION
Field measured CDOM and above-surface spectra
The measured aCDOM(440) ranged from 0.1 to 1.78 m⁻¹ with a mean of 0.7 m⁻¹. Our field-measured data also showed seasonal variations: the measured aCDOM(440) in April was within 0.1-0.99 m⁻¹ (mean 0.39 m⁻¹), but in October the range changed to 0.7-1.8 m⁻¹ (mean 1.14 m⁻¹), demonstrating that CDOM concentrations were higher in autumn than in early spring. The measured CDOM concentrations during our four field trips are shown in Table 2.
The measured remote sensing reflectance of the Xin'anjiang Reservoir is shown in Figure 3.

Model assessment and comparison
To determine the optimal inputs of the GBRT model, all possible band-ratio combinations were examined, and the statistical results are shown in Table 3.
In this study, two models developed in previous studies were tested with our in-situ data from Xin'anjiang Reservoir. We found that the logarithmic model (R² = 0.0009,

Z. Zhang et al. | Estimating reservoir CDOM using Sentinel-2 imagery and GBRT Water Supply | in press | 2021 Corrected Proof
with the field measured results (see Table 2). The CDOM-rich water present during the autumn was possibly caused by the occurrence of the algal bloom (Zhang et al. ).
Based on the image-observed CDOM distribution in by 29.8% and 20.8% when CDOM was exposed to natural solar radiation (Zhang et al. 

Model uncertainty
Due to the very large area of the lake and limited sampling cost, it was difficult to complete a full-coverage sampling of the entire lake within a couple of days and then repeat such full sampling at different seasons, so the full-coverage data were aggregated from the different sub-regions and times.
The inversion model based on such aggregated data may work well for the entire lake over a year-long period, but may cause uncertainty for a specific sub-region at a specific time. The aggregated model tends to highlight the global status but eliminate the regional details. Figure 8( suggest that for estimating the regional details, region-oriented specific models should be built separately.

CONCLUSIONS
A remote sensing model was developed to estimate low-concentration CDOM in Xin'anjiang Reservoir, China, using a GBRT algorithm and Sentinel-2 images (two band ratios, blue/green and red/blue, as the input). The GBRT model and Sentinel-2 imagery have the potential to accurately monitor CDOM spatiotemporal variations in reservoirs with low CDOM concentrations, hence advancing our understanding of the relations between dissolved organic matter and its coupled environmental factors in river, lake, and reservoir systems.