Abstract
The coarse spatial resolutions of satellite-based soil moisture (SM) products restrict their applications at smaller spatial scales. In this study, the monthly European Space Agency Climate Change Initiative SM data (ESA CCI SM) from 2000 to 2016 was downscaled from 25- to 1-km resolution in the Taihu Lake Basin, a typical humid area with complex terrain and land uses. The normalized difference vegetation index (NDVI) and land surface temperature (LST) were used as auxiliary data. The regional monthly mean ESA CCI SM values were classified into low value (0.24–0.30 m3m–3), mid-value (0.30–0.33 m3m–3) and high value (0.33–0.39 m3m–3) months by the K-means clustering algorithm. The linear (multiple linear regression) and non-linear (support vector machine) downscaling models were compared. In addition, whether building downscaling models based on wetness conditions could improve the accuracies was tested. Results showed that without considering wetness conditions, the linear method was slightly better than the non-linear method. However, linear models constructed based on wetness conditions performed the best, which demonstrated that wetness conditions should be considered in the downscaling process. Results of this study would improve the accuracies in downscaling satellite-based SM data, facilitating their applications at regional scales.
HIGHLIGHTS
The quality of ESA CCI SM was evaluated in the Taihu Lake Basin.
Linear and non-linear downscaling methods were compared.
Wetness conditions should be considered in the downscaling process.
INTRODUCTION
As a key factor modulating the water cycles and energy exchanges of terrestrial ecosystems, soil moisture (SM) is critical in various research fields and applications, such as water resource assessment (Renzullo et al. 2014; Zhang et al. 2016a), natural hazard modeling and monitoring (Ray et al. 2010; Ponziani et al. 2011), weather forecasting (Dirmeyer et al. 2016; Tuttle & Salvucci 2016), nutrient cycling (Zhu et al. 2018), and drought detection (Bolten et al. 2010; Wang et al. 2016; Liao et al. 2020). Therefore, obtaining reliable SM information (e.g. with high accuracy, good spatial coverage, and high resolution) has attracted wide interest (Jaeger & Seneviratne 2010; Dorigo et al. 2017).
Traditionally, SM data over large spatial scales have been acquired through ground-based measurements, for example the International Soil Moisture Network (ISMN) (Dorigo et al. 2011), the Global Soil Moisture Data Bank (Robock et al. 2000), the U.S. SCAN network (Bitar et al. 2012) and the Chinese Ecosystem Research Field Observational Stations Network (An et al. 2016; Liu et al. 2018). These sources provide point-based long-term SM data over large areas. However, the number of ground observation stations is limited, which makes them incapable of capturing spatial heterogeneities. The ground-based SM datasets are also restricted by their observation frequencies. The development of remote sensing technology makes it possible to obtain large-scale, all-day, and all-weather SM data. Optical remote sensing of SM is susceptible to atmospheric conditions and the geographical environment, whereas microwave remote sensing, with its longer wavelength and strong penetrability, can achieve all-day and all-weather observation. Depending on the sensor, microwave remote sensing can be divided into active microwave, passive microwave and the combination of both. Many microwave SM datasets are available, such as Environmental Satellite (ENVISAT)-Advanced SAR (ASAR) (Pathe et al. 2009), Meteorological Operational Satellite Program (METOP)-Advanced SCATterometer (ASCAT) (Albergel et al. 2009), passive microwave-based Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) (Kawanishi et al. 2003), Advanced Microwave Scanning Radiometer 2 (AMSR-2) (Li et al. 2004), and Soil Moisture Ocean Salinity (SMOS) (Kerr et al. 2002). Soil Moisture Active Passive (SMAP) (Knipper et al. 2017) and European Space Agency Climate Change Initiative (ESA CCI) (Liu et al. 2011; Dorigo et al. 2015) also provide products that combine active and passive microwave observations. However, the spatial resolutions of these datasets are generally coarse (several tens of kilometers). Therefore, they are incapable of capturing the spatial variation of SM at smaller spatial scales.
Thus, it is necessary to generate high-resolution SM data for research and applications at smaller scales. SM usually shows high spatio-temporal variability since it is affected by factors such as topography, pedology, vegetation, and meteorology (Crow et al. 2012; Petropoulos et al. 2015). In most existing downscaling methods, high-resolution data (e.g. from Optical/TIR sensors) are used as auxiliary variables to refine the coarse satellite-based SM data. This is because SM has been reported to correlate with factors such as land surface temperature (LST) and vegetation index (VI), which can be obtained from Optical/TIR remote sensing (Piles et al. 2016; Knipper et al. 2017). In addition, elevation, slope, precipitation and other factors have been taken into consideration in some studies (Montzka et al. 2018; Zhao et al. 2018b).
Different approaches have been applied in downscaling satellite-based SM data. They can be divided into linear and non-linear categories. Among the linear models, the polynomial-fitting method was developed based on the ‘universal triangle’ space between LST and VI. Examples of this kind of approach include Zhao et al. (2018a), who downscaled the AMSR2 L3 SM data, and Piles et al. (2016), who proposed a linear regression between the SMOS-derived SM and Moderate Resolution Imaging Spectroradiometer (MODIS) optical and infrared datasets. However, other studies argued that the relationships between SM and auxiliary factors are complex due to fluctuating terrain, complex meteorology and other conditions, and that non-linear relationships should therefore be properly modeled to perform reliable downscaling. With the development of machine learning, non-linear methods have been used in SM downscaling. For example, Srivastava et al. (2013) compared artificial neural network (ANN), support vector machine (SVM), relevance vector machine, and generalized linear models for downscaling SMOS SM data, and found that the ANN performed the best; Liu et al. (2018) compared Classification and Regression trees, K-nearest Neighbors, Bayesian, and Random Forests in downscaling the European Space Agency Climate Change Initiative soil moisture (ESA CCI SM) from 25- to 1-km over the three provinces of northeast China, and found that Random Forests yielded the best performance. In areas with different geology, topography, soil and land use backgrounds, relationships between SM and downscaling factors can vary in time and space, and thus the performances of linear and non-linear models can be inconsistent.
The temporal and spatial variations of SM are affected by a great number of environmental factors that exhibit high spatio-temporal variability across scales. Among them, wetness conditions have been reported to have a large influence on SM spatial patterns and their major controlling factors. Previous studies proposed that under wet soil conditions topography was the dominant factor controlling the direction of water movement, thereby affecting the spatial distribution of SM, while soil properties determined water holding capacity and thus had primary control of SM distribution under dry conditions (Grayson et al. 1997; Penna et al. 2013). This indicates that the response of SM to its influencing factors can differ under different wetness conditions. However, wetness conditions have usually been neglected in current SM downscaling research. Most studies established linear or non-linear relationships in the downscaling process based on the whole time series of satellite-based SM datasets.
In arid and semi-arid areas with simple terrain, the satellite-based SM products can reflect the variation of SM better than in areas with complex terrain and vegetation cover. This is because a single vegetation type and a less undulating surface interfere less with the sensor signals. Many SM downscaling studies have focused on these arid and semi-arid areas (Knipper et al. 2017; Tagesson et al. 2018). In humid areas such as the Taihu Lake Basin, the complex terrain and land cover reduce the precision of the satellite-based SM products, which increases the challenges in the downscaling process (Mascaro et al. 2010; Liu et al. 2018).
Based on the above insights, this study aimed to evaluate whether wetness conditions and non-linearity should be considered in downscaling the ESA CCI SM in a humid area with complex terrain, the Taihu Lake Basin. First, linear and non-linear methods were applied to downscale the 25-km monthly ESA CCI SM to 1-km in the Taihu Lake Basin from 2000 to 2016, and their accuracies were compared. Second, based on the monthly mean ESA CCI SM of the study area, the whole period (204 months) was divided into three categories (low value, mid-value and high value months). To consider the wetness conditions in the downscaling, the linear and non-linear methods were used to downscale the SM under each of these three categories. On this basis, the importance of wetness conditions and non-linearity was discussed.
MATERIALS AND METHODS
Study area and ground-based observations
The Taihu Lake Basin is located on the east coast of China and covers an area of approximately 36,895 km2 (Figure 1(a) and 1(b)). Plains are the dominant terrain in this basin, accounting for about two-thirds of the total area. The mountains and hills in the western and southwestern parts make up about one-sixth of the total area, and water bodies account for the remaining one-sixth. The average elevation is about 36 m above sea level, ranging from –23 to 1,481 m. The basin has a typical subtropical monsoon climate, with an annual average temperature of 14–18 °C and annual cumulative precipitation of 1,000–1,400 mm. Rainfall is unevenly distributed throughout the year and is mainly concentrated in the summer and fall.
Figure 1. (a) Digital elevation model and locations of ground-based observation stations, (b) location of Taihu Lake Basin, (c) grid cells of the ESA CCI SM in Taihu Lake Basin.
Data sources
ESA CCI SM data
The ESA has produced three kinds of SM datasets: active SM data, passive SM data and combined active-passive microwave data. In this study, the ESA CCI SM Version 04.2 dataset (www.esa-soilmoisture-cci.org) was used as the original data in light of its data integrity and stability. The ESA CCI SM dataset has a daily temporal resolution and a 25-km spatial resolution, and covers 1978 to 2016 (still being updated). At the global scale, the accuracy of this dataset was reported to be acceptable when validated against ground-based observations, with average Spearman correlation coefficient and unbiased root-mean-square difference values of 0.46 and 0.04 m3 m–3, respectively (Dorigo et al. 2015). Monthly SM data were calculated from the daily data and used in this study. There were several reasons for using this temporal scale. First, the daily CCI SM contained gaps due to the satellite scanning paths, so the daily data could not completely cover the study area. Second, the monthly data were more valuable for monitoring long-term hydrological processes. Finally, averaging the daily data to the monthly scale partly eliminated outliers and reduced errors.
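The daily-to-monthly averaging described above can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: the function name `monthly_means`, the sample values, and the choice to skip missing days (represented as `None`) are all assumptions.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Average daily SM values per (year, month), skipping gaps (None)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily:
        if value is None:  # no satellite overpass on this day
            continue
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical daily SM values (m3 m-3) for one grid cell.
daily = [
    (date(2000, 1, 1), 0.34),
    (date(2000, 1, 2), None),
    (date(2000, 1, 3), 0.36),
    (date(2000, 2, 1), 0.30),
]
result = monthly_means(daily)
```

A month with no valid daily value simply does not appear in the result, so gaps in coverage stay visible instead of being filled silently.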
MODIS products
The MODIS spectral reflectance data (MOD09Q1) were used to calculate the normalized difference vegetation index (NDVI). MOD09Q1 is an 8-day composite product with a spatial resolution of 250 m. First, the MODIS Reprojection Tool (MRT, National Aeronautics and Space Administration, USA) was used for mosaicking, re-projection and band information extraction. Then the data clipping and format conversion were performed in ArcGIS 10.2 (the Environmental Systems Research Institute, Redlands, California). From the extracted band information, the NDVI was calculated using the formula [NDVI = (NIR–R)/(NIR + R)], where NIR and R are the near-infrared and red spectral reflectance, respectively. The NDVI data were smoothed and denoised with the Savitzky-Golay (S-G) filtering method of TIMESAT3.2 (Jönsson & Eklundh 2004), which was used to reconstruct the NDVI time series. The 1-km resolution daily LST was derived from the MOD11A1 product. The processing of the LST data was similar to that of the NDVI data. All MODIS datasets from 2000 to 2016 were obtained from NASA's Earth Observing System (http://reverb.echo.nasa.gov/). The NDVI and LST datasets were aggregated to monthly temporal resolution and resampled to 1- and 25-km spatial resolutions using the nearest neighbor algorithm. In this study, Pearson correlation analysis was performed between the measured SM at six ground stations and the NDVI and LST of the stations’ corresponding grids. The same analysis was performed between the ESA CCI SM and the NDVI and LST of the six stations’ corresponding grids.
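As an illustration of the NDVI formula given above, the following sketch computes NDVI from a pair of reflectance values. The function name `ndvi`, the sample reflectances, and the convention of returning `None` for missing or zero-sum inputs are assumptions made for this example and are not taken from the study.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - R) / (NIR + R), or None when it is undefined."""
    if nir is None or red is None:
        return None
    denominator = nir + red
    if denominator == 0:
        return None  # avoid division by zero for empty or masked pixels
    return (nir - red) / denominator


# Hypothetical (NIR, red) surface reflectance pairs.
samples = [(0.45, 0.05), (0.30, 0.10), (0.02, 0.04)]
values = [ndvi(nir, red) for nir, red in samples]
```

Dense vegetation gives values close to 1, while a pixel whose red reflectance exceeds its near-infrared reflectance, as in the third sample, gives a negative NDVI.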
Digital elevation model and meteorological data
The Digital Elevation Model (DEM) data with spatial resolution of 30-m was obtained from Shuttle Radar Topography Mission (SRTM, http://srtm.csi.cgiar.org/srtmdata/). In order to keep the spatial resolution consistent with the downscaled SM, the 30-m DEM data was resampled to 1-km. The DEM data was converted into ASCII format to facilitate the interpolation of meteorological data. The air temperature and precipitation data were obtained from 753 meteorological stations across China from the China Meteorological Science Data Sharing Service Network. The ANUSPLINE (Hutchinson 1991, 1998) was applied to spatially interpolate the data and obtain the regional temperature and precipitation maps for the study area from 2000 to 2016, with a spatial resolution of 1-km and a temporal resolution of 8 days. Wang et al. (2017) showed that temperature and precipitation interpolations obtained using ANUSPLINE can explain 90 and 77% of the spatial variance. Based on the 8-day interval interpolated meteorological data, the monthly, seasonal and annual temperature and precipitation data were calculated for further analysis.
Downscaling approach
Linear method and SVM









The SVM method introduced by Vapnik (1995) was used as the non-linear downscaling method in this study. In the application of SVM, the selection of the kernel function and its parameters is critical (Horn et al. 2018). Kernel functions are applied to convert a low-dimensional non-linear problem into a high-dimensional linear problem. Many hydrological studies have successfully applied this method (Zhu et al. 2017; Liang et al. 2018). Based on previous experience with SVM, the radial basis function (RBF) was selected as the kernel function. The resampled 25-km NDVI, LST, NDVI², LST² and NDVI * LST were used as the input data, while the 25-km ESA CCI SM was the output of the SVM. The models were trained on these data. Once the models were built, the finer resolution (1-km) NDVI, LST, NDVI², LST² and NDVI * LST were input into them to compute the SM at 1-km resolution, which achieved the purpose of downscaling. In this study, LibSVM, a simple, practical, fast and effective software package for SVM pattern recognition and regression (Hsu & Lin 2002), was used for the SVM computation. The SVM parameters comprised the penalty parameter c, the width parameter g, and the variable p of the RBF kernel function. The choice of these parameters directly affected the model output. In order to minimize errors in SM prediction, the particle swarm optimization algorithm combined with K-fold cross validation was used to determine these parameters. This approach is referred to as ‘LibSVM’ in the following text.
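The fit-at-coarse-scale, predict-at-fine-scale workflow described above can be sketched with an ordinary least-squares regression on the five predictors named in the text. This is an assumed illustration only: the exact regression equation of the study is not reproduced here, the SVM step is not shown, the coefficients and sample values are synthetic, and dividing LST by 10 is merely a convenience to keep the normal equations well conditioned.

```python
def features(ndvi, lst_c):
    """Predictor vector [1, NDVI, LST, NDVI^2, LST^2, NDVI*LST].

    LST (deg C) is divided by 10 only for numerical conditioning.
    """
    t = lst_c / 10.0
    return [1.0, ndvi, t, ndvi * ndvi, t * t, ndvi * t]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows, targets):
    """Ordinary least squares through the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, ndvi, lst_c):
    """Apply fitted coefficients to one pixel's NDVI and LST."""
    return sum(b * f for b, f in zip(beta, features(ndvi, lst_c)))


# Synthetic 25-km training samples generated from made-up coefficients.
true_beta = [0.40, -0.10, -0.03, 0.05, 0.002, 0.01]
coarse = [(n / 10.0, float(t)) for n in (2, 4, 6, 8) for t in (5, 15, 25, 35)]
rows = [features(n, t) for n, t in coarse]
sm_coarse = [sum(b * f for b, f in zip(true_beta, r)) for r in rows]

beta = fit(rows, sm_coarse)
# Apply the coarse-scale model to a hypothetical 1-km pixel.
sm_fine = predict(beta, 0.55, 22.0)
```

The key design point is that the coefficients are estimated only from 25-km predictor and SM pairs and are then reused unchanged on the 1-km predictors, which assumes that the relationship is scale-invariant.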
Clustering segmentation algorithm
K-means clustering algorithm was used to classify the spatial mean of monthly ESA CCI SM in the study area from 2000 to 2016. During the whole study period, monthly mean SM values were classified as relatively low value, mid-value and high value. Then, the linear method and SVM were applied to downscale SM under different wetness conditions. These two corresponding approaches were named as ‘K-means linear’ and ‘K-means LibSVM’, respectively, in the following text.
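A one-dimensional K-means classification of monthly regional means into three wetness classes might look like the sketch below. It is a simplified illustration: the monthly values are made up, and the deterministic initialization from evenly spaced positions in the sorted data is an assumption, since the initialization used in the study is not stated.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster scalar values with Lloyd's algorithm; returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic start: centres taken at evenly spaced sorted positions.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    labels = [min(range(k), key=lambda i: abs(v - centres[i])) for v in values]
    return centres, labels


# Hypothetical regional monthly mean SM values (m3 m-3).
monthly_sm = [0.25, 0.26, 0.28, 0.31, 0.32, 0.315, 0.35, 0.37, 0.38]
centres, labels = kmeans_1d(monthly_sm)
```

Here label 0 corresponds to low value months, 1 to mid-value months and 2 to high value months, after which a separate downscaling model can be fitted to the months in each class.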
Validation
In this study, different indices were adopted to evaluate the accuracies of the downscaled SM. The coefficient of determination (R2) was used to evaluate the consistency between the in situ data and the corresponding pixel values of the downscaled SM. The root mean square error (RMSE) was used to measure the deviation between the in situ data and the downscaled SM. Linear regressions were used to evaluate the relationships between the in situ SM and the downscaled SM.
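The two validation indices can be computed as in the sketch below. It assumes that R2 is the squared Pearson correlation, which equals the coefficient of determination of a simple linear regression between the two series; the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Squared Pearson correlation (R2 of a simple linear regression)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical in situ and downscaled SM values (m3 m-3).
in_situ = [0.30, 0.32, 0.35, 0.28]
downscaled = [0.31, 0.33, 0.36, 0.29]
```

In this example the downscaled series is offset from the observations by a constant 0.01 m3 m–3, so R2 is 1 while RMSE is 0.01 m3 m–3, which shows why the two indices are reported together.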
RESULTS
General description of ESA CCI SM
During the study period (2000–2016), the regional mean ESA CCI SM fluctuated between 0.241 and 0.389 m3 m–3, and the mean value was 0.316 m3 m–3 (Figure 2(a)). The high value months of ESA CCI SM mainly occurred in January, February and December, and the lowest value month occurred in August (Figure 2(b)). Precipitation was mainly concentrated in June–August, whereas the monthly mean ESA CCI SM had lower values (0.30–0.32 m3 m–3) from June to August (Figure 2(b)). At the monthly scale, the SM therefore did not reach its highest value of the year in the rainy season (June–August). This can be attributed to two reasons. First, owing to the high temperature and plant growth, evapotranspiration was high during the rainy season. Second, precipitation events during the rainy season were typically short (lasting less than 1 day) and intense. Under these circumstances, surface and subsurface runoff occurred and less precipitation was retained in soils. Therefore, at the monthly scale, soils may not have remained consistently moist. In contrast, the SM was higher in autumn and winter, when precipitation was moderate, because evapotranspiration was relatively weak and precipitation could be retained well in soils. For example, although precipitation in August was greater than in January, April and November, the monthly mean ESA CCI SM of the study area in August was lower than in the other three months (Figure 2(c)).
Figure 2. (a) Time series variation of regional monthly mean ESA CCI SM, monthly total precipitation, and monthly average temperature in Taihu Lake Basin from 2000 to 2016. (b) Monthly mean ESA CCI SM and monthly total precipitation from 2000 to 2016. The error bar stands for the standard deviation. (c) The 8-day interval precipitation, average temperature and monthly mean ESA CCI SM of January, April, August and November (representing winter, spring, summer and autumn, respectively). (d) Classification results of monthly ESA CCI SM (means of the study area) using the K-means clustering algorithm.
To investigate the importance of wetness conditions in the downscaling process, the original ESA CCI SM in the study period was processed using the K-means clustering method (Figure 2(d)). The monthly mean SM of the study area varied from 0.24 to 0.30 m3m–3, from 0.30 to 0.33 m3m–3 and from 0.33 to 0.39 m3m–3, defined as low value, mid-value and high value months, respectively. In general, ESA CCI SM data did not fit well with the ground-based monitoring data (Figure 3). Most of the stations had R2 values below 0.20 and RMSE values above 0.050 m3 m–3.
Figure 3. Scatterplots, linear regression equations, RMSE values, and R2 values between original ESA CCI SM and ground-based in situ SM observations.
Comparing the original and downscaled SM
Using the four downscaling approaches (Linear, K-means linear, LibSVM and K-means LibSVM), SM maps at 1-km resolution were obtained for 2000 to 2016. The SM downscaled using the linear and LibSVM approaches did not agree well with the original ESA CCI SM (R2 < 0.07) (Figure 4(a) and 4(c)). When wetness conditions were considered, the K-means linear and K-means LibSVM downscaled results fitted the original ESA CCI SM data better (R2 > 0.49) (Figure 4(b) and 4(d)). However, the RMSE values were similar among these four approaches (from 0.037 to 0.042).
Figure 4. Scatterplots of original ESA CCI SM and (a) Linear SM; (b) K-means linear SM; (c) LibSVM SM; (d) K-means LibSVM SM.
The mean value of the original ESA CCI SM was higher than those of the downscaled SM, while the mean values of the four downscaled SM datasets were similar (Figure 5). The largest standard deviation (0.027) was found in the original ESA CCI SM, followed by K-means linear SM (0.022), K-means LibSVM SM (0.020), linear SM (0.016) and LibSVM SM (0.014) (Figure 5). A map of the mean ESA CCI SM (from 2000 to 2016) was compared with maps of mean SM generated using the four downscaling approaches (Figures 1(c) and 6). These five maps have similar spatial patterns. However, in the mean ESA CCI SM map, data did not completely cover the whole study area due to the coarse spatial resolution and the large water body (Taihu Lake) (Figure 1(c)). In contrast, the spatial coverage of the four downscaled SM maps was visually more reasonable. The SM of all low value, mid-value and high value months was averaged for the original ESA CCI SM, K-means linear SM and K-means LibSVM SM (Figure 7). Under the same wetness conditions, the downscaled maps reflected the spatial changes and captured the details better.
Figure 5. Comparison of multi-year average SM obtained from the original ESA CCI SM and four downscaling methods (Linear, K-means linear, LibSVM and K-means LibSVM) in the Taihu Lake Basin. The error bar is the standard deviation.
Validation
The validation results of the SM data downscaled by the different methods are shown in Table 1 and Figure 8. In the validation, stations 58346, 58245 and 58356 performed better than the other stations. In terms of R2, the K-means linear method had the highest average value (0.355) against the ground-based monitoring data, the Linear method ranked second (R2 = 0.242), followed by the K-means LibSVM method (R2 = 0.230), and the LibSVM method had the lowest value (R2 = 0.200). The average RMSE of the K-means linear approach was the lowest (0.032 m3 m–3) among the four downscaling methods. In addition, on average all four methods increased the R2 values and decreased the RMSE values relative to the validation of the original ESA CCI SM (R2 = 0.125, RMSE = 0.054 m3 m–3). These results indicate that all four methods can improve the accuracy of the SM data, among which the K-means linear method performed the best.
Table 1. Validation results of original ESA CCI SM and different downscaling methods with ground-based SM stations
In situ station | ESA CCI SM (original) R2 | ESA CCI SM (original) RMSE | Linear SM R2 | Linear SM RMSE | LibSVM SM R2 | LibSVM SM RMSE | K-means linear SM R2 | K-means linear SM RMSE | K-means LibSVM SM R2 | K-means LibSVM SM RMSE |
---|---|---|---|---|---|---|---|---|---|---|
58245 | 0.201 | 0.049 | 0.311 | 0.035 | 0.269 | 0.036 | 0.352 | 0.034 | 0.286 | 0.036 |
58252 | 0.117 | 0.057 | 0.135 | 0.035 | 0.098 | 0.040 | 0.320 | 0.030 | 0.247 | 0.032 |
58269 | 0.002 | 0.050 | 0.221 | 0.036 | 0.150 | 0.037 | 0.268 | 0.033 | 0.109 | 0.040 |
58342 | 0.285 | 0.056 | 0.201 | 0.036 | 0.130 | 0.038 | 0.385 | 0.032 | 0.212 | 0.037 |
58346 | 0.093 | 0.032 | 0.271 | 0.026 | 0.340 | 0.027 | 0.426 | 0.023 | 0.298 | 0.028 |
58356 | 0.056 | 0.081 | 0.316 | 0.045 | 0.214 | 0.047 | 0.379 | 0.042 | 0.228 | 0.073 |
Note: The unit of RMSE is m3 m–3.
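The per-method averages quoted in the text can be reproduced from the station values in Table 1. The sketch below copies those values in station order (58245, 58252, 58269, 58342, 58346, 58356); the dictionary layout and variable names are choices made for this example.

```python
# R2 and RMSE per station, copied from Table 1 (RMSE in m3 m-3).
table1 = {
    "ESA CCI SM": {
        "R2": [0.201, 0.117, 0.002, 0.285, 0.093, 0.056],
        "RMSE": [0.049, 0.057, 0.050, 0.056, 0.032, 0.081],
    },
    "Linear": {
        "R2": [0.311, 0.135, 0.221, 0.201, 0.271, 0.316],
        "RMSE": [0.035, 0.035, 0.036, 0.036, 0.026, 0.045],
    },
    "LibSVM": {
        "R2": [0.269, 0.098, 0.150, 0.130, 0.340, 0.214],
        "RMSE": [0.036, 0.040, 0.037, 0.038, 0.027, 0.047],
    },
    "K-means linear": {
        "R2": [0.352, 0.320, 0.268, 0.385, 0.426, 0.379],
        "RMSE": [0.034, 0.030, 0.033, 0.032, 0.023, 0.042],
    },
    "K-means LibSVM": {
        "R2": [0.286, 0.247, 0.109, 0.212, 0.298, 0.228],
        "RMSE": [0.036, 0.032, 0.040, 0.037, 0.028, 0.073],
    },
}


def mean(values):
    return sum(values) / len(values)


# Average each index over the six stations for every method.
summary = {
    name: {index: mean(values) for index, values in indices.items()}
    for name, indices in table1.items()
}
best_by_r2 = max(summary, key=lambda name: summary[name]["R2"])
lowest_rmse = min(summary, key=lambda name: summary[name]["RMSE"])
```

Both `best_by_r2` and `lowest_rmse` point to the K-means linear method, in line with the ranking reported in the validation results.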
DISCUSSION
Assessment of ESA CCI SM with ground-based station data
In passive microwave remote sensing, the emissivity and permittivity of soil are derived from the brightness temperature received by the microwave sensor, whereas in active microwave remote sensing the permittivity of the soil is retrieved from the backscattering coefficient of the surface. Therefore, active microwave can perform better in densely vegetated areas and passive microwave can perform better in sparsely vegetated areas. The ESA CCI SM v04.2 is a combination of both active and passive datasets (Gruber et al. 2019). At the global scale, 596 in situ stations were used to validate the ESA CCI SM and acceptable accuracy was achieved (mean R = 0.46 and mean RMSE = 0.04 m3 m–3) (Dorigo et al. 2015). The majority of the ISMN stations are distributed in North America and Europe. The evaluation of ESA CCI SM in many other regions relies mainly on local in situ data. For example, Ikonen et al. (2015) assessed the ESA CCI SM against the Sodankylä local in situ SM observation network and found that the average correlation between the daily ESA CCI SM and in situ data was 0.479 and the average RMSE was 0.039 m3 m–3; An et al. (2016) used national agricultural meteorological data to validate the ESA CCI SM data in China and found that R2 values were 0.014–0.561 and RMSE values were 0.006–0.096 m3 m–3; in northeast China, Liu et al. (2018) reported regional mean R2 and RMSE values of 0.183 and 0.074 m3 m–3, respectively.
The relatively low R2 values and high RMSE values between ESA CCI SM and ground-based station data can be attributed to the following reasons. First, due to the coarse resolution of ESA CCI SM, a pixel always contains complex land cover. This may result in a proportional difference between in situ measurements and ESA CCI SM domain grid estimates (Peng et al. 2015; Gruber et al. 2019). Second, the detection depth of ESA CCI SM is not consistent with the installation depth of the in situ SM sensors. The microwave sensors can only detect SM within 3–5 cm of the earth's surface, while the in situ observations monitor SM at 10 cm depth or even deeper. This may lead to additional uncertainties when evaluating ESA CCI SM (Brocca et al. 2011; Dorigo et al. 2017). Third, the monitoring times of ESA CCI SM and the in situ observations are not consistent. The monitoring time of the local SM observations varies, whereas the ESA CCI SM monitoring time is fixed at 00:00 UTC. Although averaging the CCI SM data from daily to monthly reduced some errors, the inconsistency in daily monitoring time would still create errors and uncertainties in the validation.
In this study, the validation results of ESA CCI SM were generally comparable to those reported in previous studies. Cropland and urban and built-up areas are the main land cover types in the Taihu Lake Basin, and the validation R2 values ranged from 0.002 to 0.285 across the six ground-based stations. The average R2 was 0.125, which was lower than the mean values for the whole of China and for northeast China found by An et al. (2016) and Liu et al. (2018). The RMSE values ranged from 0.032 to 0.081 m3 m–3 and the mean RMSE was 0.054 m3 m–3, which was lower than those of the whole of China (RMSE = 0.063 m3 m–3) and northeast China (RMSE = 0.074 m3 m–3) (An et al. 2016; Liu et al. 2018). The ESA CCI SM data were averaged from daily to monthly to improve data quality, which could eliminate outliers and reduce errors. Ikonen et al. (2015) also found that averaging ESA CCI SM data from daily to weekly significantly improved the correlation with in situ observations at Sodankylä and reduced the RMSE.
Quality of the downscaled SM data
In this study, the downscaled results of all four methods were better than the original data (Figure 8). The validation results with ground-based observations showed that the R2 values improved and RMSE values reduced, compared with validation results of the original ESA CCI SM (R2 = 0.125, RMSE = 0.054 m3 m–3). The downscaled results also showed the spatial details better (Figures 6 and 7). Among these four methods, the K-means linear method performed best.
Figure 6. Spatial distributions of multi-year average downscaled SM using the approaches of (a) Linear SM, (b) K-means linear SM, (c) LibSVM SM, (d) K-means LibSVM SM.
Figure 7. Spatial distributions of representative maps of original ESA CCI SM and downscaled SM under different wetness conditions: original ESA CCI SM (a) low value, (b) mid-value and (c) high value months; K-means linear downscaled SM (d) low value, (e) mid-value and (f) high value months; K-means LibSVM downscaled SM (g) low value, (h) mid-value and (i) high value months. Each map is the average over all low value, mid-value or high value months, respectively.
Figure 8. Comparison of the validation results of original ESA CCI SM and different downscaling methods. Box plots of (a) R2 and (b) RMSE, showing the mean, median, 25th and 75th percentiles, and maximum and minimum values.
However, the agreement between the downscaled SM and the in situ observations remained limited. This could be attributed to the following reasons. First, the spatial distributions of the downscaled SM and their correlations with the in situ SM measurements are restricted by the accuracy of the original ESA CCI SM dataset (Tagesson et al. 2018). Second, land cover and soil properties have great influences on satellite-based SM products. Previous studies found that microwave penetration depth was affected by soil texture and canopy (Casa et al. 2013; Sawut et al. 2014). Consequently, the diverse land cover and soil texture in the Taihu Lake Basin would cause the microwave signal to penetrate to different depths. Third, the ground-based monitoring data in this study were recorded as relative humidity (%). Errors and uncertainties may arise during the unit conversion (from relative humidity to volumetric soil water content) because the field capacity and soil bulk density datasets have substantial uncertainty (An et al. 2016; Liu et al. 2018).
Linear vs. non-linear models in downscaling
The choice between linear and non-linear models depends on the relationships between the SM and the ancillary factors used for downscaling. In this study, the linear models performed slightly better than the non-linear models. This could probably be attributed to two reasons. First, the relationships between the SM and the ancillary factors may be linear in this study area. Significant Pearson correlations (p < 0.05) were observed between the observed SM and NDVI/LST at the six ground stations. In addition, the Pearson correlations between ESA CCI SM and NDVI/LST were significant at p < 0.05 at three of these six stations. Second, the SVM model could not substantially improve the downscaling accuracy because there is no established principle or method for determining the best kernel function. This could lead to over-fitting and thus increase the uncertainty. In some cases, the SVM does not obtain optimal training and prediction results, which means it is not always better than linear regression (Twarakavi et al. 2009; Kovačević et al. 2010; Zhu et al. 2017). In previous studies, many scholars downscaled satellite-based SM products well via linear models (Brocca et al. 2010; Piles et al. 2016; Knipper et al. 2017), whereas non-linear models were also used successfully in other cases (Srivastava et al. 2013; Liu et al. 2018; Zhao et al. 2018b). Hence, it is necessary to explore the relationship between the SM and the ancillary factors before downscaling, because linear and non-linear models have different applicability in different regions.
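The linearity check described above can be sketched as follows. The Pearson coefficient is computed directly, and a permutation test is substituted for the parametric p-value so that the sketch needs only the Python standard library. This is not necessarily the test used in this study. The series are synthetic and the function names are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)

# Synthetic monthly series for one hypothetical station (NOT data from the study)
rng = random.Random(42)
ndvi = [0.2 + 0.5 * rng.random() for _ in range(48)]
sm = [0.25 + 0.15 * v + rng.gauss(0, 0.01) for v in ndvi]

r = pearson_r(ndvi, sm)
p = permutation_p_value(ndvi, sm)
print(f"r = {r:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A significant correlation of this kind supports a linear model only as a first screening. Inspecting the residuals of the fitted model would be needed to rule out curvature.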
Considering wetness conditions in downscaling
In this study, the K-means clustering algorithm was used to classify the ESA CCI SM into low value, mid-value and high value months. Numerous previous studies have used the K-means clustering algorithm for classification in soil and hydrology research. For example, Twarakavi et al. (2010) determined the ideal number of soil hydraulic classes using the K-means clustering algorithm; Zhang et al. (2016b) reported that combining K-means clustering of kriged soil texture, pedotransfer functions and model inversion performed well in parameterizing the vadose zone hydraulic properties; Lai et al. (2018) also demonstrated the advantage of K-means clustering in characterizing the spatial distribution of soil hydraulic parameters.
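The classification step can be illustrated with a minimal one-dimensional K-means (Lloyd's algorithm) written with the Python standard library. The quantile-based initialisation, the function name and the monthly values are assumptions made for illustration. They are not the settings or the data used in this study.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Lloyd's algorithm on scalars.
    Returns (centroids in ascending order, labels with 0 = lowest cluster)."""
    ordered = sorted(values)
    # Deterministic initialisation at evenly spaced quantiles (an assumption)
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every value
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Relabel so that class 0/1/2 corresponds to low/mid/high centroids
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {j: pos for pos, j in enumerate(order)}
    return [centroids[j] for j in order], [rank[lab] for lab in labels]

# Hypothetical regional monthly mean SM values in m3 m-3 (NOT data from the study)
monthly_sm = [0.25, 0.27, 0.29, 0.31, 0.32, 0.30, 0.35, 0.37, 0.34, 0.26, 0.33, 0.38]
centroids, labels = kmeans_1d(monthly_sm, k=3)

names = ["low value", "mid-value", "high value"]
for value, label in zip(monthly_sm, labels):
    print(f"{value:.2f} -> {names[label]}")
```

Because the input is a single variable, the resulting classes are contiguous ranges of SM, which is consistent with reporting each class as an interval of regional monthly mean values.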
In this study, approaches that considered wetness conditions in downscaling generally outperformed those that did not. Therefore, building SM downscaling models for different wetness conditions seems necessary (Figure 8 and Table 1). This is because both the major controlling factors of SM and the responses of SM to those factors differ under different wetness conditions. For example, in wet periods topography has a great influence on SM distribution, whereas in dry periods SM distribution depends primarily on soil properties (Grayson et al. 1999). Therefore, a single linear or non-linear model cannot adequately represent the relationship between the SM and the auxiliary downscaling factors over the whole time series. It is essential to consider wetness conditions in downscaling satellite-based SM. However, few studies have taken this into account.
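The difference between a single model and wetness-specific models can be sketched with synthetic data. The example fits a multiple linear regression of SM on NDVI and LST once for all months and once per wetness class, then compares the RMSE. All coefficients, noise levels and function names are hypothetical. The comparison is in-sample, where the class-wise fit can never be worse than the single fit. An independent validation, such as the in situ comparison in this study, is needed to show a real gain.

```python
import random

def fit_mlr(rows, targets):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 + ... via normal equations."""
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    # Augmented system [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def rmse(pred, obs):
    return (sum((a - b) ** 2 for a, b in zip(pred, obs)) / len(obs)) ** 0.5

rng = random.Random(0)
# Hypothetical (intercept, NDVI slope, LST slope) per class, NOT fitted values from the study
true_coef = {"low": (0.30, 0.05, -0.002), "mid": (0.30, 0.08, -0.001), "high": (0.33, 0.10, 0.000)}
data = {}
for cls, coef in true_coef.items():
    rows = [(rng.uniform(0.2, 0.8), rng.uniform(5.0, 35.0)) for _ in range(60)]  # NDVI, LST in deg C
    sm = [predict(coef, r) + rng.gauss(0, 0.005) for r in rows]
    data[cls] = (rows, sm)

# One model for all months
all_rows = [r for rows, _ in data.values() for r in rows]
all_sm = [s for _, sm in data.values() for s in sm]
global_coef = fit_mlr(all_rows, all_sm)
global_rmse = rmse([predict(global_coef, r) for r in all_rows], all_sm)

# One model per wetness class, predictions concatenated in the same order as all_sm
class_pred = []
for rows, sm in data.values():
    class_coef = fit_mlr(rows, sm)
    class_pred += [predict(class_coef, r) for r in rows]
class_rmse = rmse(class_pred, all_sm)

print(f"single-model RMSE = {global_rmse:.4f}, class-wise RMSE = {class_rmse:.4f}")
```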
CONCLUSIONS
Although the original remote sensing SM has a wide coverage area, its coarse spatial resolution cannot meet the requirements of high-precision analysis. Therefore, downscaling is necessary. In this study, the MODIS NDVI and LST were applied as auxiliary data to downscale the ESA CCI SM from 25- to 1-km resolution via linear and non-linear models in Taihu Lake Basin. Compared with the linear model, the non-linear model did not substantially improve the downscaling accuracy. In addition, this study tested whether considering wetness conditions when building SM downscaling models could improve the downscaling accuracy. When linear or non-linear downscaling models were constructed specifically for low value, mid-value and high value months, the accuracies of the downscaled SM were clearly improved. This indicates that wetness conditions should be considered in downscaling satellite-based SM products in Taihu Lake Basin, which is a typical humid area with complex terrain and land use.
ACKNOWLEDGEMENTS
This study was financially supported by the National Natural Science Foundation of China (41971117), Key Research Plans of Frontier Sciences, Chinese Academy of Sciences (QYZDB-SSW-DQC038), and the Youth Innovation Promotion Association, Chinese Academy of Sciences (2020317).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.