Evaluation of the Crosta method for the retrieval of water quality parameters from remote sensing data in the Pearl River estuary

In recent decades, many algorithms have been developed for the retrieval of water quality parameters from remotely sensed data. However, these algorithms are specific to a certain geographical area and cannot be applied to other areas. In this study, feature-orientated principal component (PC) selection, based on the Crosta method and using Landsat Thematic Mapper (TM) imagery, was carried out for the retrieval of water quality parameters (i.e., total suspended sediment concentration (TSM) and chlorophyll a (Chla)). The results show that the feature-orientated PC for TSM, based on the Crosta method, agreed well with the MERIS-based TSM product for eight Landsat TM images. However, the Chla information selected using the feature-orientated PC agreed poorly with the MERIS-based Chla product. The accuracy of the atmospheric correction method and of the MERIS product may be the main factors influencing the accuracy of the TSM and Chla information identified from the Landsat TM images using the Crosta method. The findings of this study should be helpful for retrieving the spatial distribution of TSM from the long-term historical Landsat image archive without coincident ground measurements. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
doi: 10.2166/wqrj.2020.024

Feng Gao
School of Resources and Environment, Shanxi University of Finance and Economics, Taiyuan 030006, China

Feng Gao, Yunpeng Wang (corresponding author)
State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China
E-mail: wangyp@gig.ac.cn

Yuanzhi Zhang
Nanjing University of Information Science & Technology, Nanjing 210044, China

This article has been made Open Access thanks to the kind support of CAWQ/ACQE (https://www.cawq.ca).


INTRODUCTION
In recent years, studies around the world have shown that in-situ data collected at discrete stations can be paired with remotely sensed data to retrieve water quality parameters. Many algorithms have been developed using different kinds of methods (Miller & …), ranging from linear relationships between satellite remote sensing reflectance and field measurements of water quality parameters (e.g., total suspended sediment concentration (TSM) and chlorophyll a (Chla)) in estuarine and inland waters to region-specific algorithms, explicitly developed for coastal waters, using in-situ measured reflectance and TSM or Chla.
For example, Xi & Zhang () established an empirical two-band model using the ratio of remote sensing reflectance at 629 and 671 nm to retrieve the TSM concentration in the Pearl River estuary (PRE). In addition, they used a MERIS image and in-situ remote sensing reflectance to map the distribution of the TSM in the PRE. Xing et al. () used in-situ remote sensing reflectance and a combination of Hyperion bands using an exponential regression model to estimate the TSM concentration in the PRE, and they obtained a good performance. However, all these algorithms have no uniform model, because they lack a physical foundation, and most of them are therefore geographically dependent and cannot be applied to other areas (Zhang et al. ).
In the field of geology, the Crosta method has been used to extract remote sensing alteration information for mineral exploration. The idea of this method is to take advantage of the information contained in Landsat Thematic Mapper (TM) imagery for mineral exploration purposes. The Crosta method is essentially principal component analysis (PCA). PCA is conducted on an appropriate band combination, chosen according to the specific spectral characteristics of different minerals in different spectral ranges, with the aim of selecting the principal component (PC) that carries the information on a specific mineral. The Crosta method was proposed in 1989 and successfully applied to extract iron oxide and clay (argillic) alteration anomalies from Landsat TM data (Crosta & Moore ). Each PC obtained by PCA of Landsat TM data often has a certain geological significance that it does not share with any other component, meaning that each PC has unique characteristics. For example, the criterion for judging the PC of iron-stained minerals is based on the four bands TM1, TM3, TM4 and TM5. In the eigenvector of this PC, the sign of the TM3 coefficient should be opposite to that of the TM1 and TM4 coefficients, and it is generally the same as that of the TM5 coefficient. Based on these criteria, the PC containing the iron-staining information can be selected; this PC can be called the iron-staining anomaly PC. Crosta uses these spectral characteristics and combinations of several TM bands to diagnose remote sensing alteration information for iron exploration (Crosta & Moore ).
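The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes PCA on the band covariance matrix (the source does not state whether covariance or correlation was used), a band order of (TM1, TM3, TM4, TM5), and the hypothetical function names `crosta_pca` and `iron_oxide_pc`.

```python
import numpy as np

def crosta_pca(bands):
    """PCA on a (n_bands, n_pixels) reflectance array.

    Returns eigenvalues in descending order and eigenvectors as rows,
    so eigvecs[k] holds the band loadings of PC k+1.
    Assumption: the covariance matrix (not the correlation matrix) is used.
    """
    cov = np.cov(bands)                      # band-by-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvals[order], eigvecs[:, order].T

def iron_oxide_pc(loadings):
    """Index of the first PC meeting the iron-staining sign criterion.

    Criterion from the text: the TM3 sign is opposite to TM1 and TM4
    and the same as TM5. Assumed band order: (TM1, TM3, TM4, TM5).
    Returns None if no PC qualifies.
    """
    for k, (tm1, tm3, tm4, tm5) in enumerate(loadings):
        if tm3 * tm1 < 0 and tm3 * tm4 < 0 and tm3 * tm5 > 0:
            return k
    return None
```

Because the criterion compares signs within one eigenvector, it is unaffected by the arbitrary overall sign that an eigen-decomposition assigns to each eigenvector.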
By the end of 2012, the Landsat TM instrument had obtained a large amount of data on the Earth's surface over 28 years (Wu et al. ). The image archive collected by the Landsat TM is historically unique, and it provides excellent opportunities for us to monitor and analyze the long-term spatiotemporal dynamics of the Earth's surface parameters, like TSM and Chla.
The objective of this study is to use Landsat TM imagery to evaluate the Crosta method for the retrieval of water quality parameters (e.g., TSM and Chla) in the PRE. Firstly, the spectral characteristics of TSM and Chla in the visible and near-infrared (NIR) bands are described. Secondly, appropriate Landsat TM bands for diagnosing the TSM and Chla water components are selected. Thirdly, the effect of combining eigenvectors with the reflectance values of spectral bands on PC images is analyzed. Finally, an analysis of the correlation between the feature-orientated PCs and the corresponding MERIS products (TSM and Chla) is presented.

Study area
The Pearl River is the second largest river in China and is considered one of the most complicated fluvial networks in the world. It comprises three major tributaries (the western, northern and eastern rivers) and other small rivers, draining into the PRE, which occupies an area of ∼17,200 km². It plays a vital role in supplying fresh water to the large cities in the Pearl River Delta region, such as Macau, Hong Kong, Zhuhai, and Guangzhou. The annual suspended sediment load of the river is 88.7 Mt/y. For the western river, the annual average water discharge is 2,281 × 10⁸ m³/y and the annual suspended sediment discharge is 6,567 × 10⁴ t/y. The annual average water and suspended sediment discharges for the northern river are 449 × 10⁸ m³/y and 864 × 10⁴ t/y, respectively, while those for the eastern river are 234 × 10⁸ m³/y and 236 × 10⁴ t/y, respectively (Wu et al. ). The total water and sediment discharges of these three major rivers account for more than 80% and 95% of the total load entering the sea, respectively. … Table 1.
In this study, atmospheric correction was performed on the Landsat TM images using the FLAASH atmospheric correction model in the ENVI 5.3.1 software, and the pixel surface reflectance was retrieved for all Landsat bands. Because in-situ measurements were limited, the results of the atmospheric correction were not analyzed or evaluated in this study.
The MERIS sensor is designed mainly for ocean and coastal water remote sensing, with 15 narrow spectral bands in the range of 390-1,040 nm and a revisit period of one to three days. The eight bands centered at 412, 442, 490, 510, 560, 620, 665 and 708 nm were used, along with a neural network, to derive the MERIS level 2 products (Kratzer et al. ). The MERIS FR level 2 products were projected into Geographic Lat/Lon (WGS84). The sample points that matched the feature-orientated PC were extracted using the BEAM 4.8 software (http://www.brockmann-consult.de/cms/web/beam/welcome). The average PC values matching the sample points were extracted using a 3 by 3 window on the specific PC images. Detailed information on the bands and spectral range can be found in Table 2. A single MERIS FR level 2 product has many bands, … These datasets are treated as the ground measurements and are used to validate the performance of the feature-orientated PCs.
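The 3 by 3 window averaging mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the BEAM or ENVI procedure: the sample point is assumed to be already converted to a (row, column) index in a co-registered PC image, and the handling of image edges (clipping) and of masked pixels (NaN values ignored) is an assumption, since the source does not describe it. The name `window_mean` is hypothetical.

```python
import numpy as np

def window_mean(image, row, col, size=3):
    """Mean of the size-by-size window centred on (row, col).

    Assumptions: the window is clipped at the image edges, and NaN
    pixels (e.g. land or cloud masks) are ignored in the average.
    """
    half = size // 2
    r0, r1 = max(row - half, 0), min(row + half + 1, image.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, image.shape[1])
    return float(np.nanmean(image[r0:r1, c0:c1]))
```

Averaging over a small window rather than reading a single pixel reduces the effect of small geolocation mismatches between the Landsat TM and MERIS grids.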

Spectral characteristics of TSM and Chla
The water components TSM and Chla have unique spectral characteristics in the electromagnetic spectrum, and water reflectance is highly variable over the visible and NIR spectrum …
Algebraically, for p original variables, … This sequence continues for all p PCs. The first few PCs will tend to contain (or explain) a large percentage of the total variance and may be used to describe multivariate patterns or variance in water quality. Often these patterns are related to specific sources of contamination.
… should be significantly larger than those of TM5 and TM7.
The criterion for judging the PC of Chla is that the TM1 and TM3 coefficients of the PC should be significantly larger than those of the other bands, and the sign of the TM1 coefficient should generally be opposite to that of the TM3 coefficient.
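The Chla criterion can be written as a simple test on one eigenvector. The sketch below is illustrative only: the text does not quantify "significantly larger", so the `ratio` threshold is a hypothetical parameter, coefficients are compared by absolute value (an assumption), and the function name `is_chla_pc` is invented for this example.

```python
def is_chla_pc(loadings, ratio=2.0):
    """Check the Chla criterion on one PC's eigenvector.

    loadings: dict mapping band name (e.g. 'TM1') to its coefficient.
    ratio: hypothetical threshold standing in for 'significantly larger';
           the source gives no numeric value.
    """
    tm1, tm3 = loadings["TM1"], loadings["TM3"]
    # Absolute coefficients of every band other than TM1 and TM3
    others = [abs(v) for band, v in loadings.items() if band not in ("TM1", "TM3")]
    dominant = min(abs(tm1), abs(tm3)) >= ratio * max(others, default=0.0)
    # TM1 and TM3 must dominate and carry opposite signs
    return dominant and tm1 * tm3 < 0
```

For example, `is_chla_pc({"TM1": 0.7, "TM2": 0.1, "TM3": -0.68, "TM4": 0.15})` passes, whereas the same loadings with a positive TM3 coefficient fail the sign condition.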

Statistical analysis and accuracy assessment
The correlation between the feature-orientated PCs and the MERIS-based TSM and Chla, which serve as ground truth in this study, was analyzed using Pearson's correlation coefficient. A p-value below 0.0001 was taken to indicate a statistically significant correlation between the feature-orientated PC and the satellite-based TSM or Chla.
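For reference, the two accuracy statistics used in this section, Pearson's correlation coefficient and the root mean square error, can be computed as in the minimal sketch below. It uses the standard textbook definitions; the function names are hypothetical, the p-value is not computed, and applying the RMSE to predicted values expressed in the same units as the MERIS product is an assumption about how the statistic is used here.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

The coefficient of determination reported with the scatter plots corresponds to `pearson_r(x, y) ** 2` when a simple linear fit is used.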
The root mean square error (RMSE, Equation (1)) … Table 3. In Table 3 …
The knowledge of the spectral characteristics of TSM and Chla in the Landsat TM bands was used to define and select the PCs containing the spectral information due to these water components.
As shown in Table 3, … An image data set (Figure 3(a)) and PC1 subset … TM3 and the very high reflectance value in the sub-scene of Figure 3(a), the pixel in the crosshair appears bright (Figure 3(b)). This means that the TSM concentration of the pixel in the crosshair shown in Figure 3(a) is very high.
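The link between eigenvector coefficients, band reflectance and pixel brightness in a PC image can be made concrete with a small sketch. A pixel's value on a PC is the sum over bands of the coefficient times the band value, so large positive coefficients combined with high reflectance give a bright pixel. Subtracting the band means before weighting is an assumption about the PCA implementation, and all numbers in the usage note are hypothetical.

```python
def pc_score(loadings, reflectance, band_means):
    """Value of one pixel on one PC.

    Computed as the sum over bands of loading * (reflectance - band mean).
    Assumption: the PCA is mean-centred; all inputs share the same band order.
    """
    return sum(a * (r - m) for a, r, m in zip(loadings, reflectance, band_means))
```

With hypothetical loadings of 0.5 on every band and band means of 0.1, a turbid pixel with reflectance 0.3 in every band has a positive score and appears bright, while a clear-water pixel with reflectance 0.05 has a negative score and appears dark.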
A second pixel, selected in the sub-scene (Figure 4(a)) and PC1 image (Figure 4(b)) of Landsat TM on Oct. 17, 2003, is presented in Figure 4. … Figure 3. Thus, in the resulting PC1 image, the pixel in the crosshair appears dark (Figure 4(b)). In this case, the TSM concentration of the pixel is lower than that of the pixel shown in Figure 3.
… the results of the Crosta method directly. The results show that there is a reliable correlation between PC3 and the MERIS Chla product (R² = 0.74, p-value < 0.0001, RMSE = 2.88 mg/m³) (Figure 6(b)).
Considering the distribution of PC1 (Figure 2 …) …
Overall, the feature-orientated PC (TSM) can effectively