Abstract
Soil moisture (SM) is a vital variable controlling water and energy exchange between the atmosphere and the land surface. Spatiotemporally continuous SM information is urgently needed for large-scale meteorological and hydrological applications. The penalized least square regression based on the discrete cosine transform (DCT-PLS) method performs poorly when the missing data are not evenly distributed in the original data set. To address this weakness, this study proposes an in situ observation-combined DCT-PLS (ODCT-PLS) to reconstruct missing values of daily surface SM from the Climate Change Initiative program of the European Space Agency (ESA CCI). The reconstruction of ESA CCI SM data in the Xiliaohe River Basin from 2013 to 2020 showed that the SM reconstructed by ODCT-PLS was in better agreement with in situ soil moisture than that reconstructed by DCT-PLS, with the average correlation coefficient (CORR) increasing by 0.3636, the average root mean squared error (RMSE) decreasing by 0.0109 m3/m3 and the average BIAS decreasing by 0.0047 m3/m3. Compared with the original ESA CCI SM, DCT-PLS and ODCT-PLS can both restore the spatial variation of SM in the study area. The reconstruction method proposed in our study provides a valuable alternative for reconstructing three-dimensional geophysical datasets with spatially or temporally continuous data gaps.
HIGHLIGHTS
This paper proposed a new reconstruction method based on the measured observation data and the DCT-PLS method.
This paper utilized the measured observation data by using the CDF matching method.
The new method proposed in this paper can be applied to reconstruct three-dimensional geophysical datasets whose data gaps are spatiotemporally continuous.
INTRODUCTION
Soil moisture (SM) has been widely recognized as a key variable in the climate system, given its important role in controlling the exchange of water and energy between the land surface and atmosphere (Long et al. 2019; Wang et al. 2019a, 2019b; Yue et al. 2019). As a result, SM has been used in a wide range of applications, including drought monitoring, flood forecasting (Wanders et al. 2014) and water resource management (Dobriyal et al. 2012). A long time series of spatiotemporally continuous SM data can help to understand meteorological and hydrological processes. Therefore, reliable, accurate and complete SM information is urgently needed (Zhang et al. 2021a).
As an important environmental variable, SM can be obtained from in situ observations or retrieved from satellite remote sensing observations (Dirmeyer et al. 2004). In situ SM observations provide reliable and accurate estimates at the point scale and can measure SM in different layers, from the top layer to deeper layers. However, they also have disadvantages and limitations such as the small spatial scale and the high cost (Dorigo et al. 2017). In the last few decades, satellite remote sensing has been considered an effective and powerful tool for retrieving surface SM (usually at the depth of 0–5 cm) (Zeng et al. 2015). In particular, microwave remote sensing is recognized as the most promising method to monitor SM owing to its capability to work 24 h a day without being limited by weather conditions, its ability to penetrate clouds, vegetation and soil, and the physical relationship between SM and the dielectric constant (Moran et al. 2004). There are a number of global remote sensing SM products retrieved by microwave sensors from various satellite research and application institutions, such as the Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the Soil Moisture Active Passive (SMAP), the Soil Moisture and Ocean Salinity (SMOS) of the European Space Agency (ESA), the Advanced Microwave Scanning Radiometer 2 (AMSR2) of Japan and the Microwave Radiation Imager (MWRI) of China (Han et al. 2015; Zeng et al. 2015; Xu & Frey 2021). However, the temporal coverages of the satellite products mentioned above are much too short to satisfy the needs of many hydrometeorological applications (Jin & Henderson 2011; Fan et al. 2020; Zhang et al. 2021b). To handle this challenge, the Climate Change Initiative of the European Space Agency (ESA CCI) produced a complete (temporal coverage of more than 40 years) and consistent global SM dataset by blending active and passive microwave products, the so-called ESA CCI SM product (Liu et al. 2011; Albergel et al. 2012; Dorigo et al. 2017).
Although the ESA CCI SM data set can provide a long-term SM data product on a global scale, one of the limitations is that there are data gaps in it due to satellite orbit changes, radio frequency interference (RFI) and physical limitations of satellite sensors. The data gaps limit the practical application of ESA CCI SM products, especially in fields that need spatially complete and temporally continuous SM data, such as climate simulation, drought monitoring and water resources management (Llamas et al. 2020).
Several methods have been proposed to fill the gaps of satellite-derived data sets. From the perspective of dimension, these methods can be divided into two categories: remote sensing image reconstruction methods based on spatial variation and time series reconstruction methods based on temporal change. The remote sensing image reconstruction methods mainly include inverse distance weighting (IDW) (Fan et al. 2016; Yan et al. 2020), kriging (Pesquer et al. 2011) and the point estimation model of biased sentinel hospitals-based area disease estimation (PBSHADE) (Xu et al. 2013). These approaches usually assume that the original part and the missing part of the image have the same or similar spatial variability and generally take spatial autocorrelation and spatial heterogeneity into account. The time series reconstruction methods use temporal change information to interpolate the missing values, for example simple exponential smoothing (SES) (Gardner 2006) and the autoregressive integrated moving average (ARIMA) (Yozgatligil et al. 2013). However, these time series reconstruction methods may perform poorly when the temporal coverage of the original time series is shorter than the shortest period of the target variable or when the spatial correlation of the target variable is stronger than the temporal correlation. In addition, machine learning algorithms such as random forest (RF), support vector machine (SVM) and artificial neural network (ANN) have been widely used to fill data gaps in remotely sensed data in recent years (Zhang et al. 2021a). However, a model based on a machine learning algorithm is hardly transferable, because it depends largely on the selection of input variables and on the relationship between the input variables and the target variable. That is to say, the structure of the model is closely tied to the input variables, the target variable, the study area and the study period.
For large geophysical data sets, it is very important to use spatiotemporal variation information to estimate missing values. Wang et al. (2012) proposed a penalized least square regression based on the discrete cosine transform (DCT-PLS) method to fill the data gap of global SM data sets. The DCT-PLS method can make full use of the full three-dimensional SM information in time and space to fill the data gap and can obtain reliable estimation when the missing values are evenly distributed in the original data set. However, the gaps in many remotely sensed SM data sets are not distributed uniformly (Garcia 2010; Dembele et al. 2019). Taking ESA CCI SM in the Xiliaohe River Basin as an example, remotely sensed SM of the whole study area is continuously missing in the non-growing season of vegetation leading to the loss of full three-dimensional information of SM on the time and space scales. Moreover, when the continuous gap of the remote sensing data exceeds the range of spatiotemporal autocorrelation of SM, the result of reconstruction only using the original data will inevitably deviate from the actual value.
Fortunately, the in situ SM observations can make up for the missing three-dimensional information in space and time owing to their temporal integrity and spatial representativeness, despite the fact that they are measured at the point scale. Therefore, this study attempts to use the three-dimensional information provided by the measured SM data to reconstruct the ESA CCI SM product based on DCT-PLS when the missing values are not distributed uniformly. The proposed method is named the in situ observation-combined penalized least square regression based on the discrete cosine transform (ODCT-PLS). The main idea of the ODCT-PLS method is to supplement the remotely sensed data with the three-dimensional information provided by the bias-corrected in situ observations before filling the data gaps, so as to obtain a more accurate and reliable reconstruction.
MATERIALS
Study area
Data
Spatial distribution of ratio of ESA CCI SM data gaps during 2013–2020.
The in situ soil moisture observations (SM_O) used in this study are from the National Meteorological Administration and are measured at six depths (10, 20, 30, 40, 50 and 60 cm) as volumetric water content (m3/m3) with a daily temporal resolution. SM at the top layer (0–10 cm) is used to assess the performance of the two reconstruction methods.
Time series of ESA CCI SM and in situ soil moisture observations of Huolinguole, Alukeerqin and Naiman in 2019.
METHODS
Bias correction



The CDF adjustment process is as follows:
Draw the CDF curves of SM_S and SM_O, respectively, and divide the curve into ten segments with 0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 as the nodes.
Perform segmented linear regression on each segment to obtain the corresponding linear regression equations.
Each corresponding linear regression equation is applied to SM_O according to the range of the CDF of SM_O, and the adjusted SM_O_bc is obtained.
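The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it joins the percentile nodes of SM_O and SM_S with straight lines (linear interpolation between nodes) instead of fitting a least-squares regression within each segment, and the function names `fit_cdf_matching` and `apply_cdf_matching` are hypothetical.

```python
import numpy as np

# Percentile nodes 0, 10, ..., 100 that split each CDF into ten segments.
NODES = np.arange(0, 101, 10)


def fit_cdf_matching(sm_o, sm_s, nodes=NODES):
    """Return the percentile nodes of SM_O and SM_S on days where both are valid."""
    sm_o = np.asarray(sm_o, dtype=float)
    sm_s = np.asarray(sm_s, dtype=float)
    valid = ~np.isnan(sm_o) & ~np.isnan(sm_s)
    o_nodes = np.percentile(sm_o[valid], nodes)
    s_nodes = np.percentile(sm_s[valid], nodes)
    return o_nodes, s_nodes


def apply_cdf_matching(sm_o, o_nodes, s_nodes):
    """Map SM_O onto the SM_S distribution segment by segment (gives SM_O_bc).

    Assumption: each segment is the straight line joining adjacent nodes.
    Values outside the fitted range are clamped to the end nodes by np.interp.
    """
    sm_o = np.asarray(sm_o, dtype=float)
    sm_o_bc = np.full(sm_o.shape, np.nan)
    ok = ~np.isnan(sm_o)
    sm_o_bc[ok] = np.interp(sm_o[ok], o_nodes, s_nodes)
    return sm_o_bc
```

In this sketch the nodes would be fitted on the period where SM_S is available and then applied to SM_O over the whole record, including the days on which SM_S is missing.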
DCT-PLS
DCT-PLS was originally developed for the automatic smoothing of incomplete multidimensional datasets. With some adjustment, it can therefore fill missing data in spatiotemporal geophysical datasets. Penalized least squares regression is a thin plate spline smoothing method for one-dimensional data, which balances fidelity to the data against the roughness of the mean function. Garcia (2010) proved that penalized least squares regression can be expressed in terms of the DCT, which represents data as a sum of cosine functions oscillating at different frequencies. Since the DCT can be multidimensional, penalized least squares regression based on the DCT can be extended to multidimensional datasets. The DCT-PLS algorithm is briefly introduced below; for the mathematical details, see Garcia (2010).









The optimal s value is automatically selected with the goal of minimizing GCV. In our process of data reconstruction, the minimum GCV value is 0.0041 and the corresponding optimal s value is 6.348 × 10^−4.
The DCT-PLS method can make full use of the spatiotemporal variation information of remote sensing data and achieves good reconstruction results when the missing values are randomly and uniformly distributed in the original remote sensing data. However, as mentioned in the Data section, the gaps in SM_S in our study are continuous in time. As a result, SM_S reconstructed using DCT-PLS (SM_DCTPLS) easily loses the temporal variation information of SM, especially the continuously low values in the non-growing seasons. That is to say, the direct application of DCT-PLS for reconstruction may lead to the overestimation of SM_S, especially in the period when data are missing.
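To make the gap-filling idea concrete, the following Python sketch iterates a DCT-domain smoother over an array with missing values, in the spirit of the general formulation of Garcia (2010). It is an illustration under stated assumptions, not the procedure used in this study: the smoothing parameter `s` is fixed by the caller instead of being selected by GCV, the number of iterations is fixed, and the function name `dctpls_fill` is hypothetical.

```python
import numpy as np
from scipy.fft import dctn, idctn


def dctpls_fill(y, s=1.0, n_iter=100):
    """Fill NaNs in an N-D array with a DCT-based penalized least squares smoother.

    Assumptions: fixed smoothing parameter s (no GCV search) and binary weights
    (1 for observed values, 0 for missing values).
    """
    y = np.asarray(y, dtype=float)
    w = np.isfinite(y).astype(float)
    y0 = np.where(w > 0, y, 0.0)

    # Eigenvalues of the second-order difference operator in the DCT-II basis,
    # summed over all axes (time and space are treated alike).
    lam = np.zeros(y.shape)
    for axis, n in enumerate(y.shape):
        shape = [1] * y.ndim
        shape[axis] = n
        lam = lam + (2.0 - 2.0 * np.cos(np.pi * np.arange(n) / n)).reshape(shape)
    gamma = 1.0 / (1.0 + s * lam ** 2)  # low-pass filter in the DCT domain

    # Start from the mean of the observed values, then iterate:
    # z <- IDCT( gamma * DCT( w * (y - z) + z ) )
    z = np.full(y.shape, y0[w > 0].mean())
    for _ in range(n_iter):
        z = idctn(gamma * dctn(w * (y0 - z) + z, norm="ortho"), norm="ortho")
    return z
```

Because the filter only sees the observed values, a long gap that covers a persistently low season is filled from the higher values on either side of it, which is the overestimation mechanism described above.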
ODCT-PLS
To overcome the shortcoming of DCT-PLS, this study proposes an ODCT-PLS approach to reconstruct SM_S, which makes full use of the time series information of the observed SM_O and can achieve good results when there are chunks of temporally or spatially continuous missing values in SM_S. The main steps of ODCT-PLS are as follows: first, the CDF matching method is adopted to match the observation data with the remote sensing data. Second, for each station, the missing remote sensing data of the pixel where the station is located are supplemented with the matched observation data. Finally, the remaining missing values of the supplemented remote sensing data are reconstructed using DCT-PLS, and the reconstruction result is called SM_ODCTPLS. The specific procedures are as follows:
Use the CDF matching method to establish piecewise linear equations between SM_O and SM_S in the period where SM_S data are complete.
Apply the piecewise linear equations to SM_O in the period when SM_S is missing, and add the resulting SM_O_bc to SM_S to obtain SM_SO.
Reconstruct SM_SO using DCT-PLS.
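The procedure above can be sketched in Python as follows. The array layout (time, rows, columns), the station-to-pixel lookup `station_rc` and both function names are assumptions made for illustration. The gap-filling routine is passed in as `fill_fn`, so any DCT-PLS implementation can be used for the last step.

```python
import numpy as np


def merge_station_obs(sm_s, station_rc, sm_o_bc):
    """Build SM_SO by filling missing SM_S values at station pixels with SM_O_bc.

    sm_s       : array (time, rows, cols), NaN where the satellite value is missing
    station_rc : list of (row, col) pixel indices, one per station (hypothetical layout)
    sm_o_bc    : array (time, n_stations) of bias-corrected station series
    """
    sm_so = np.array(sm_s, dtype=float, copy=True)
    sm_o_bc = np.asarray(sm_o_bc, dtype=float)
    for k, (r, c) in enumerate(station_rc):
        pixel = sm_so[:, r, c]  # view into sm_so for this station's pixel
        gap = np.isnan(pixel) & ~np.isnan(sm_o_bc[:, k])
        pixel[gap] = sm_o_bc[gap, k]  # valid satellite values are left untouched
    return sm_so


def odctpls(sm_s, station_rc, sm_o_bc, fill_fn):
    """Reconstruct SM_SO with a caller-supplied gap filler (for example DCT-PLS)."""
    return fill_fn(merge_station_obs(sm_s, station_rc, sm_o_bc))
```

Only the gaps at station pixels are filled before reconstruction; all other pixels still rely on the gap filler, now anchored by the station series during periods when the whole scene is missing.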
Evaluation metrics











The value range of the first part on the right side of the equation is [−1,1], which quantifies the correlation between S and R. The value range of the middle part is [0,1], which quantifies the similarity of the mean values of S and R. The value range of the last part is also [0,1], which quantifies the similarity of the variances of S and R and reaches its optimal value of 1 if the variances of S and R are equal. The value range of UIQI is [−1,1]; the closer the value is to 1, the more similar S and R are (Yozgatligil et al. 2013) and the better the reconstruction method preserves spatial heterogeneity.
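For reference, the metrics can be computed as in the Python sketch below. It uses the standard definitions of RMSE, Pearson correlation and mean difference, and the three-factor form of UIQI described above. The sign convention of BIAS (S minus R) and the function names are assumptions, since the defining equations are not reproduced here.

```python
import numpy as np


def _paired(s, r):
    """Flatten both inputs and keep only positions where both are finite."""
    s = np.asarray(s, dtype=float).ravel()
    r = np.asarray(r, dtype=float).ravel()
    ok = np.isfinite(s) & np.isfinite(r)
    return s[ok], r[ok]


def rmse(s, r):
    s, r = _paired(s, r)
    return float(np.sqrt(np.mean((s - r) ** 2)))


def corr(s, r):
    s, r = _paired(s, r)
    return float(np.corrcoef(s, r)[0, 1])


def bias(s, r):
    # Assumed sign convention: estimate minus reference.
    s, r = _paired(s, r)
    return float(np.mean(s - r))


def uiqi(s, r):
    """Product of the correlation, mean-similarity and variance-similarity terms."""
    s, r = _paired(s, r)
    cov = np.mean((s - s.mean()) * (r - r.mean()))
    sd_s, sd_r = s.std(), r.std()
    correlation = cov / (sd_s * sd_r)
    mean_sim = 2 * s.mean() * r.mean() / (s.mean() ** 2 + r.mean() ** 2)
    var_sim = 2 * sd_s * sd_r / (sd_s ** 2 + sd_r ** 2)
    return float(correlation * mean_sim * var_sim)
```

In this sketch `uiqi` would be applied to a reconstructed image and the corresponding SM_S image on the same date, whereas `rmse`, `corr` and `bias` would be applied to paired time series at a station or pixel.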
RESULTS
Performance of bias correction of in situ SM observations
Table 1 shows the RMSE, CORR (p < 0.01) and BIAS between the in situ SM observations (SM_O) and the remote sensing data (SM_S) at all stations, before and after bias correction, during the study period (2013–2020). At every station the RMSE after bias correction is smaller than before correction, and it stays below 0.06 m3/m3. Apart from Balihan, Gangzi and other stations whose CORR was already greater than 0.5 before correction, the CORR increased after correction at most stations, and more than half of the stations had a CORR greater than 0.5 after correction. Except for the stations of Naiman, Kezuozhong, Zhalute and Fuhe, the absolute value of BIAS decreased after correction to around 0.02 m3/m3, and the 'underestimation' of the remote sensing data relative to the measured data before correction was substantially reduced at Wengniute, Gangzi, Shuangliao, Keshiketeng, Linxi, Balinyou and other stations. In general, the bias correction method works better for pixels with a large difference between the mean values of SM_O and SM_S, such as those at the Huolinguole and Kailu stations, where it significantly reduces the RMSE and BIAS between the measured data and the remote sensing data. This is exactly the purpose of applying the bias correction method to the measured data in this study. Therefore, the CDF matching method meets the requirements of this study: it corrects the measured data so that they can supplement the time series information of the remote sensing data.
Statistic comparison of SM_O and SM_O_bc against SM_S
Station name | SM_O RMSE | SM_O CORR | SM_O BIAS | SM_O_bc RMSE | SM_O_bc CORR | SM_O_bc BIAS |
---|---|---|---|---|---|---|
Huolinguole | 0.155 | 0.403 | 0.115 | 0.049 | 0.390 | 0.021 |
Alukeerqin | 0.040 | 0.472 | 0.020 | 0.037 | 0.610 | 0.019 |
Naiman | 0.049 | 0.360 | 0.000 | 0.042 | 0.363 | 0.018 |
Aohan | 0.068 | 0.389 | −0.036 | 0.045 | 0.406 | 0.018 |
Ningcheng | 0.062 | 0.215 | −0.025 | 0.048 | 0.251 | 0.018 |
Wengniute | 0.081 | 0.527 | −0.074 | 0.039 | 0.559 | 0.019 |
Chifeng | 0.055 | −0.015 | 0.039 | 0.053 | −0.338 | −0.017 |
Kailu | 0.102 | 0.250 | 0.052 | 0.046 | 0.311 | 0.016 |
Tongliao | 0.050 | 0.476 | −0.017 | 0.042 | 0.502 | 0.017 |
Balihan | 0.041 | 0.636 | −0.022 | 0.037 | 0.627 | 0.016 |
Keerqin | 0.065 | 0.503 | −0.049 | 0.038 | 0.504 | 0.017 |
Gangzi | 0.079 | 0.676 | −0.070 | 0.033 | 0.658 | 0.018 |
Shuangliao | 0.089 | 0.424 | −0.074 | 0.049 | 0.506 | 0.019 |
Keshiketeng | 0.071 | 0.451 | −0.064 | 0.037 | 0.445 | 0.018 |
Linxi | 0.073 | 0.659 | −0.063 | 0.034 | 0.641 | 0.017 |
Balinyou | 0.087 | 0.647 | −0.080 | 0.035 | 0.635 | 0.018 |
Kezuozhong | 0.081 | 0.187 | −0.012 | 0.059 | 0.285 | 0.020 |
Shebotu | 0.058 | 0.574 | 0.022 | 0.044 | 0.548 | 0.017 |
Balinzuo | 0.060 | 0.759 | −0.036 | 0.032 | 0.726 | 0.018 |
Zhalute | 0.041 | 0.725 | 0.008 | 0.032 | 0.681 | 0.013 |
Fuhe | 0.053 | 0.387 | −0.006 | 0.040 | 0.375 | 0.019 |
Bayaertuhushuo | 0.113 | 0.709 | −0.108 | 0.036 | 0.676 | 0.017 |
Note: p < 0.01.
Time series of SM_O, SM_O_bc and SM_S of Huolinguole, Alukeerqin and Naiman in 2019.
Among these three sites, Huolinguole has the smallest ratio of missing data, the largest RMSE and BIAS between SM_O and SM_S, and the best correction effect. In general, however, the effect of bias correction has little relation to the percentage of missing data; that is, the correction does not improve as the percentage of missing data decreases. On the one hand, the piecewise linear equations are established and the evaluation indicators are calculated only in the period of non-missing data. On the other hand, for stations with a small systematic error and a high correlation between SM_O and SM_S, bias correction may introduce additional uncertainties.
Performance of data reconstruction for ESA CCI SM
Performance of data reconstruction against observation SM
Statistics comparison of SM_DCT-PLS and SM_ODCT-PLS against SM_O
Station name | SM_DCT-PLS RMSE | SM_DCT-PLS CORR | SM_DCT-PLS BIAS | SM_ODCT-PLS RMSE | SM_ODCT-PLS CORR | SM_ODCT-PLS BIAS |
---|---|---|---|---|---|---|
Huolinguole | 0.124 | 0.176 | −0.064 | 0.114 | 0.554 | −0.069 |
Alukeerqin | 0.044 | 0.083 | −0.001 | 0.032 | 0.599 | −0.010 |
Naiman | 0.075 | 0.110 | 0.034 | 0.061 | 0.477 | 0.028 |
Aohan | 0.101 | 0.150 | 0.072 | 0.085 | 0.514 | 0.063 |
Ningcheng | 0.100 | 0.273 | 0.059 | 0.100 | 0.392 | 0.066 |
Wengniute | 0.088 | 0.348 | 0.081 | 0.086 | 0.556 | 0.080 |
Chifeng | 0.057 | −0.159 | −0.029 | 0.055 | 0.231 | −0.034 |
Kailu | 0.095 | 0.019 | −0.012 | 0.081 | 0.422 | −0.020 |
Tongliao | 0.093 | −0.067 | 0.052 | 0.064 | 0.620 | 0.041 |
Balihan | 0.048 | 0.545 | 0.029 | 0.048 | 0.674 | 0.035 |
Keerqin | 0.102 | 0.203 | 0.079 | 0.093 | 0.591 | 0.078 |
Gangzi | 0.092 | 0.607 | 0.083 | 0.095 | 0.694 | 0.088 |
Shuangliao | 0.108 | 0.299 | 0.092 | 0.108 | 0.511 | 0.097 |
Keshiketeng | 0.094 | 0.069 | 0.086 | 0.082 | 0.556 | 0.077 |
Linxi | 0.112 | 0.144 | 0.097 | 0.092 | 0.680 | 0.083 |
Balinyou | 0.124 | 0.152 | 0.112 | 0.103 | 0.689 | 0.097 |
Kezuozhong | 0.100 | 0.108 | 0.045 | 0.094 | 0.355 | 0.051 |
Shebotu | 0.067 | 0.195 | 0.007 | 0.046 | 0.664 | −0.008 |
Balinzuo | 0.097 | 0.351 | 0.075 | 0.078 | 0.741 | 0.063 |
Zhalute | 0.054 | 0.360 | 0.014 | 0.037 | 0.719 | 0.005 |
Fuhe | 0.093 | −0.133 | 0.052 | 0.070 | 0.533 | 0.041 |
Bayaertuhushuo | 0.136 | 0.392 | 0.129 | 0.127 | 0.678 | 0.122 |
Note: p < 0.01.
Time series of SM_DCT-PLS, SM_ODCT-PLS and SM_O of Huolinguole, Alukeerqin and Naiman in 2019.
From the time series of SM_DCT-PLS, SM_ODCT-PLS and SM_O of Huolinguole, Alukeerqin and Naiman in 2019 (Figure 5), all the CORR values between SM_ODCT-PLS and SM_O are above 0.55 and higher than those between SM_DCT-PLS and SM_O, indicating that SM_ODCT-PLS is more strongly correlated with SM_O than SM_DCT-PLS is. Because SM_S was continuously missing in January, February and December of 2019 (Figure 3), DCT-PLS overestimates SM in those months, since its reconstruction relies on the remaining data, which are the higher values of the year. In contrast, ODCT-PLS integrates the time series information of both SM_S and SM_O, so the reconstruction error caused by temporally continuous gaps is effectively avoided by using more complete and valid data.
The above analyses show that SM_ODCT-PLS has a smaller mean error and a stronger correlation with SM_O than SM_DCT-PLS. Therefore, ODCT-PLS not only retains the spatiotemporal three-dimensional information of the remote sensing data but also reduces the 'overestimation' that arises where data are missing continuously in time. More generally, large chunks of gaps in remote sensing data can lead to either overestimation or underestimation, depending on the characteristics of the geographical phenomenon and on the periods or regions in which the data are missing.
Performance of data reconstruction against remote sensing SM
Spatial distribution of (a) SM_S, (b) SM_DCT-PLS and (c) SM_ODCT-PLS on July 31, 2019.
Box plots of the UIQI of SM_DCT-PLS and SM_ODCT-PLS against SM_S from 2013 to 2020.
Spatial distribution of RMSE, CORR and BIAS of SM_DCT-PLS (left) and SM_ODCT-PLS (right) against SM_S.
DISCUSSION
Performance analysis of ODCT-PLS
A previous study showed that the pattern of data gaps affects the performance of reconstruction (Kong et al. 2014). As an optimization-based estimation method that depends on the temporal and spatial information of the original data, DCT-PLS performs well when reconstructing large three-dimensional geophysical data whose missing values are distributed uniformly (Zhang et al. 2016) or in a single dimension (Fredj et al. 2016). However, the performance of DCT-PLS is greatly reduced when there is a large chunk of missing data in the original data to be reconstructed (Kong et al. 2014; Liu et al. 2020). For instance, Liu et al. (2020) evaluated the trend differences of LST time series before and after the reconstruction of MODIS LST data sets using DCT-PLS and found obvious trend differences in areas with more missing data, indicating that a large amount of missing data may lead to obvious trend uncertainties such as overestimation or underestimation. In this study, ESA CCI SM in the Xiliaohe River Basin is missing over the whole study area for a continuous period. Since the missing values were concentrated in January, February and December, when SM was low in the Xiliaohe River Basin, SM_DCT-PLS overestimates the actual SM because of the influence of the non-missing data before and after the period where data were missing. Whether the values estimated using DCT-PLS overestimate or underestimate the actual values is not known in advance; it depends on the type of geographical event, the period of the missing values and the region of the specific study.
The ODCT-PLS approach can effectively avoid this overestimation or underestimation by using matched observation data (SM_O_bc) to supplement the time series information of the remote sensing data. The SM estimated using ODCT-PLS has a smaller deviation from, and a greater correlation with, the observed SM at the stations than the SM estimated by DCT-PLS. The SM images reconstructed by DCT-PLS and ODCT-PLS have strong spatial consistency with the remote sensing SM images, suggesting that both methods retain the spatial heterogeneity of SM well. These results illustrate that the ODCT-PLS approach overcomes the limitation of DCT-PLS when there are large chunks of missing data in the original remote sensing data, so it can achieve good reconstruction results even for remote sensing data with gaps that are continuous in space and time.
Possible influence of the temporal and spatial distribution of observation data on the effect of ODCT-PLS
Yang et al. (2018) reconstructed the SM data retrieved from Fengyun 3B remote sensing data using different methods and found that DCT-PLS could only obtain good filling results in homogeneous areas. Fredj et al. (2016) used DCT-PLS to fill the gaps of coastal ocean surface currents under different missing-data scenarios and found that the validity of DCT-PLS depends on the characteristics of the surrounding ocean currents. In other words, the validity of DCT-PLS depends on the spatiotemporal representativeness of the original data sets. A better approach may be to take the 'temporal and spatial distribution patterns' (Gong & Si 2013; Yi 2013; Tan et al. 2020) of geographical data as a priori information for the interpolation or reconstruction of remote sensing data.
In this study, the innovation of the proposed ODCT-PLS approach for gap-filling three-dimensional remote sensing data lies mainly in the use of in situ SM observations and the improvement of the DCT-PLS method. In previous studies, measured data were generally used as the reference for the bias correction of remote sensing data, whereas in this study the measured data are matched to the remote sensing data and used as a supplement that provides some temporal and spatial distribution information. That is to say, the effectiveness of ODCT-PLS also depends on the temporal integrity and spatial representativeness of the in situ observations. Temporal integrity ensures that the piecewise linear equations can be established and the observation data matched in the non-missing period of the remote sensing data, and spatial representativeness ensures the reconstruction quality in the period when all the original remote sensing data in the study area are missing. In other words, good performance of ODCT-PLS requires that the observation sites be distributed randomly or distributed uniformly according to soil types in the study area. Moreover, the ODCT-PLS approach performs well only if the in situ observations are temporally complete, especially during the periods in which the remote sensing data are missing.
CONCLUSIONS
This study proposes an ODCT-PLS approach to reconstruct the ESA CCI SM data, which uses observed SM data to supplement the time series information of remote sensing data. The CDF matching method is used to reduce the systematic deviation between the site-measured SM and the remotely sensed SM, and the matched site observations are then used to supplement the time series information of the remote sensing SM. The reconstruction of the ESA CCI SM data in the Xiliaohe River Basin from 2013 to 2020 showed that the SM reconstructed by ODCT-PLS was in better agreement with in situ SM observations than that reconstructed by DCT-PLS, with the average CORR increasing by 0.3636, the average RMSE decreasing by 0.0109 m3/m3 and the average BIAS decreasing by 0.0047 m3/m3. Compared with the original ESA CCI SM, DCT-PLS and ODCT-PLS can both restore the spatial variation of SM in the study area. The study demonstrated that the ODCT-PLS approach can better retain the spatiotemporal information of remote sensing data and can also reduce the overestimation in gap-filling caused by the continuous lack of remote sensing data. It can be a valuable alternative for reconstructing three-dimensional geophysical datasets with spatially or temporally continuous data gaps.
ACKNOWLEDGEMENTS
We are grateful for the financial support provided by the National Key Research and Development Program of China (No. 2019YFC1510601) and the National Natural Science Foundation of China (No. 42071040 and No. U2243203).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.