Soil moisture (SM) is a vital variable controlling water and energy exchange between the atmosphere and land surface. Spatiotemporally continuous SM information is urgently needed for large-scale meteorological and hydrological applications. Because the penalized least square regression based on the discrete cosine transform (DCT-PLS) method performs poorly when the missing data are not evenly distributed in the original data set, this study proposes an in situ observation-combined DCT-PLS (ODCT-PLS) method to reconstruct missing values of daily surface SM from the Climate Change Initiative program of the European Space Agency (ESA CCI). The reconstruction of ESA CCI SM data in the Xiliaohe River Basin from 2013 to 2020 showed that the SM reconstructed by ODCT-PLS agreed better with in situ SM than that reconstructed by DCT-PLS, with the average correlation coefficient (CORR) increasing by 0.3636, the average root mean squared error (RMSE) decreasing by 0.0109 m3/m3 and the average BIAS decreasing by 0.0047 m3/m3. Compared with the original ESA CCI SM, both DCT-PLS and ODCT-PLS can restore the spatial variation of SM in the study area. The proposed reconstruction method provides a valuable alternative for reconstructing three-dimensional geophysical datasets with spatially or temporally continuous data gaps.

  • This paper proposed a new reconstruction method based on the measured observation data and the DCT-PLS method.

  • This paper utilized the measured observation data by using the CDF matching method.

  • The proposed method can be applied to reconstruct three-dimensional geophysical datasets whose data gaps are spatiotemporally continuous.

Soil moisture (SM) has been widely recognized as a key variable in the climate system, given its important role in controlling the exchange of water and energy between the land surface and atmosphere (Long et al. 2019; Wang et al. 2019a, 2019b; Yue et al. 2019). As a result, SM has been used in a wide range of applications, including drought monitoring, flood forecasting (Wanders et al. 2014) and water resource management (Dobriyal et al. 2012). A long time series of spatiotemporally continuous SM data can help to understand meteorological and hydrological processes. Therefore, reliable, accurate and complete SM information is urgently needed (Zhang et al. 2021a).

As an important environmental variable, SM can be obtained from in situ observations or retrieved from satellite remote sensing observations (Dirmeyer et al. 2004). In situ SM observations provide reliable and accurate estimates at the point scale and can measure SM in different layers, from the top layer to deeper layers. However, they also have limitations, such as their small spatial scale and high cost (Dorigo et al. 2017). In the last few decades, satellite remote sensing has been considered an effective and powerful tool for retrieving surface SM (usually at a depth of 0–5 cm) (Zeng et al. 2015). In particular, microwave remote sensing is recognized as the most promising method for monitoring SM owing to its capability to work 24 h a day regardless of weather conditions, its ability to penetrate clouds, vegetation and soil, and the physical relation between SM and the dielectric constant (Moran et al. 2004). A number of global remote sensing SM products retrieved by microwave sensors are available from various satellite research and application institutions, such as the Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the Soil Moisture Active Passive (SMAP), the Soil Moisture and Ocean Salinity (SMOS) of the European Space Agency (ESA), the Advanced Microwave Scanning Radiometer 2 (AMSR2) of Japan and the Microwave Radiation Imager (MWRI) of China (Han et al. 2015; Zeng et al. 2015; Xu & Frey 2021). However, the temporal coverage of these satellite products is much too short to satisfy the needs of many hydrometeorological applications (Jin & Henderson 2011; Fan et al. 2020; Zhang et al. 2021b). To handle this challenge, the Climate Change Initiative of the European Space Agency (ESA CCI) produced a complete (temporal coverage of more than 40 years) and consistent global SM dataset by blending active and passive microwave products, the so-called ESA CCI SM product (Liu et al. 2011; Albergel et al. 2012; Dorigo et al. 2017).

Although the ESA CCI SM data set can provide a long-term SM data product on a global scale, one of the limitations is that there are data gaps in it due to satellite orbit changes, radio frequency interference (RFI) and physical limitations of satellite sensors. The data gaps limit the practical application of ESA CCI SM products, especially in fields that need spatially complete and temporally continuous SM data, such as climate simulation, drought monitoring and water resources management (Llamas et al. 2020).

Several methods have been proposed to fill the gaps in satellite-derived data sets. In terms of dimension, these methods fall into two categories: remote sensing image reconstruction methods based on spatial variation and time series reconstruction methods based on temporal change. The remote sensing image reconstruction methods mainly include inverse distance weighting (IDW) (Fan et al. 2016; Yan et al. 2020), kriging (Pesquer et al. 2011) and the point estimation model of biased sentinel hospitals-based area disease estimation (PBSHADE) (Xu et al. 2013). These approaches usually assume that the original part and the missing part of the image have the same or similar spatial variability and generally take spatial autocorrelation and spatial heterogeneity into account. The time series reconstruction methods use temporal change information to interpolate the missing values, such as simple exponential smoothing (SES) (Gardner 2006) and the autoregressive integrated moving average (ARIMA) (Yozgatligil et al. 2013). However, these time series reconstruction methods may perform poorly when the temporal coverage of the original time series is shorter than the shortest period of the target variable or when the spatial correlation of the target variable is stronger than its temporal correlation. In addition, machine learning algorithms such as random forest (RF), support vector machine (SVM) and artificial neural network (ANN) have been widely used to fill data gaps in remotely sensed data in recent years (Zhang et al. 2021a). However, models based on machine learning algorithms are hardly transferable because they largely depend on the selection of input variables and on the relationship between input variables and target variables. That is to say, the structure of the model is closely tied to the input variables, the target variable, the study area and the study period.

For large geophysical data sets, it is very important to use spatiotemporal variation information to estimate missing values. Wang et al. (2012) proposed a penalized least square regression based on the discrete cosine transform (DCT-PLS) method to fill the data gap of global SM data sets. The DCT-PLS method can make full use of the full three-dimensional SM information in time and space to fill the data gap and can obtain reliable estimation when the missing values are evenly distributed in the original data set. However, the gaps in many remotely sensed SM data sets are not distributed uniformly (Garcia 2010; Dembele et al. 2019). Taking ESA CCI SM in the Xiliaohe River Basin as an example, remotely sensed SM of the whole study area is continuously missing in the non-growing season of vegetation leading to the loss of full three-dimensional information of SM on the time and space scales. Moreover, when the continuous gap of the remote sensing data exceeds the range of spatiotemporal autocorrelation of SM, the result of reconstruction only using the original data will inevitably deviate from the actual value.

Fortunately, in situ SM observations can supply the missing three-dimensional information in space and time owing to their temporal integrity and spatial representativeness, despite being measured at the point scale. Therefore, this study attempts to use the three-dimensional information provided by the measured SM data to reconstruct the ESA CCI SM product based on DCT-PLS when the missing values are not distributed uniformly. The proposed method is named the in situ observation-combined penalized least square regression based on the discrete cosine transform (ODCT-PLS). The main idea of the ODCT-PLS method is to supplement the remotely sensed data with the three-dimensional information provided by the bias-corrected in situ observations before filling the data gaps, so as to obtain a more accurate and reliable reconstruction.

Study area

The Xiliaohe River Basin is located between 116°32′–124°30′E and 41°05′–45°13′N in Northeast China, covering an area of approximately 13,700 km2. The main area is in Inner Mongolia Autonomous Region, with some small marginal areas in Jilin, Liaoning and Hebei provinces (Figure 1). The basin is surrounded by mountains in the north, west and south, and is adjacent to the Liaohe River Plain in the east. The terrain gradually descends from west to east, with an altitude of 82–2,054 m. The main stream in the basin flows from west to east and finally into the Liaohe River, with a length of 312.69 km. The main tributaries include the Xilamulunhe River, the Jiaolaihe River and the Laohahe River (Figure 1). The region has a temperate continental climate, which is mainly characterized by drought, little rain, annual evaporation exceeding the water supply, an annual average temperature of 5.0–6.5 °C, annual precipitation of 300–400 mm and an annual sunshine duration of 2,800–3,100 h. It is the intersection and transition zone between the traditional agricultural area and the animal husbandry area and is sensitive to climatic and ecological changes.
Figure 1

Location and topography of the study area.


Data

The remotely sensed soil moisture dataset (SM_S) used in our study is obtained from ESA CCI (http://www.esasoilmosture-cci.org/). The ESA CCI SM product is composed by combining satellite SM records of multiple active and passive microwave sensors in a collaborative manner with the daily temporal resolution, a spatial resolution of 0.25° and a unit of m3/m3. In this study, the ESA CCI SM dataset from 2013 to 2020 is utilized to evaluate the performance of ODCT-PLS and DCT-PLS in filling the data gap. According to Figure 2, the ratio of ESA CCI SM data gaps in the Xiliaohe River Basin of 2013–2020 is between 0.36 and 0.51, showing a spatial trend of decreasing from northwest to southeast.
Figure 2

Spatial distribution of ratio of ESA CCI SM data gaps during 2013–2020.
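
The gap ratio mapped in Figure 2 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the SM_S record is held in a NumPy array with time on the first axis and missing retrievals stored as NaN; the function name `gap_ratio` is hypothetical and not part of the original study.

```python
import numpy as np

def gap_ratio(sm_cube):
    """Fraction of missing days per pixel.

    sm_cube: array of shape (time, lat, lon) with NaN marking data gaps
    (an assumed layout, not the ESA CCI file format itself).
    """
    sm_cube = np.asarray(sm_cube, dtype=float)
    return np.isnan(sm_cube).mean(axis=0)

# Tiny illustrative cube: 4 days, 1 x 2 pixels
demo_cube = np.array([[[0.20, 0.15]],
                      [[np.nan, 0.16]],
                      [[np.nan, 0.17]],
                      [[0.22, 0.18]]])
demo_ratio = gap_ratio(demo_cube)   # one ratio per pixel
```

Applied to the full 2013–2020 record, a map of this quantity would correspond to the per-pixel ratio described in the text.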


The in situ soil moisture observations (SM_O) used in this study are from the National Meteorological Administration and are measured at six depths (10, 20, 30, 40, 50 and 60 cm) as volumetric water content (m3/m3) with a daily temporal resolution. SM at the top layer (0–10 cm) is used to assess the performance of the two reconstruction methods.

The ratio of data gaps in ESA CCI SM at Huolinguole, Alukeerqin and Naiman stations decreased successively, being 0.50, 0.46 and 0.40, respectively. Time series of in situ SM observations and ESA CCI SM at these three stations of the year 2019 are shown in Figure 3. It can be seen that the missing data of SM_S were mainly in January–April and November–December for 1 year. In other words, there are temporally continuous data gaps in ESA CCI SM. Moreover, data gaps at three stations showed similar temporal distribution, which means a large area of values in ESA CCI SM is missing during the non-growing season. In summary, there are both temporal and spatial chunks of missing values in ESA CCI SM.
Figure 3

Time series of ESA CCI SM and in situ soil moisture observations of Huolinguole, Alukeerqin and Naiman in 2019.


Bias correction

The cumulative probability distribution function (CDF) matching method is a nonlinear technique that is commonly used to correct deviations between data sets. It is known to work well for pixels where the measured SM at the station differs greatly from the satellite remote sensing SM (Ji et al. 2020). In this study, the CDF matching method was used to correct the deviation between SM_O and SM_S, so that SM_O after bias correction (SM_O_bc) has the same or a similar value range and cumulative probability distribution as SM_S. The CDF matching technique is expressed as follows:
$$\mathrm{cdf}_{S}(x') = \mathrm{cdf}_{O}(x) \tag{1}$$
where $\mathrm{cdf}_{S}$ and $\mathrm{cdf}_{O}$ represent the CDF of SM_S and SM_O, respectively, $x$ represents SM_O and $x'$ represents SM_O_bc.

The CDF adjustment process is as follows:

  • Draw the CDF curves of SM_S and SM_O, respectively, and divide the curve into ten segments with 0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 as the nodes.

  • Perform segmented linear regression on each segment to obtain the corresponding linear regression equations.

  • Each corresponding linear regression equation is applied to SM_O according to the range of the CDF of SM_O, and the adjusted SM_O_bc is obtained.
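
The three steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes paired daily series stored as NumPy arrays, pairs the two series by rank so that values with the same cumulative probability are regressed against each other, and falls back to a simple mean shift for degenerate segments. The function names `fit_cdf_matching` and `apply_cdf_matching` are hypothetical.

```python
import numpy as np

def fit_cdf_matching(sm_o, sm_s, nodes=np.arange(0, 101, 10)):
    """Fit one straight line per CDF segment, mapping SM_O onto SM_S.

    sm_o, sm_s: paired 1-D series from the period where SM_S is available.
    Returns the segment edges (in SM_O units) and one (slope, intercept)
    row per segment.
    """
    sm_o = np.asarray(sm_o, dtype=float)
    sm_s = np.asarray(sm_s, dtype=float)
    ok = ~np.isnan(sm_o) & ~np.isnan(sm_s)
    # Sorting each series pairs values that share the same cumulative probability.
    o_sorted = np.sort(sm_o[ok])
    s_sorted = np.sort(sm_s[ok])
    edges = np.percentile(o_sorted, nodes)   # segment nodes at the 0th..100th percentiles
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = (o_sorted >= lo) & (o_sorted <= hi)
        if seg.sum() >= 2 and np.ptp(o_sorted[seg]) > 0:
            slope, intercept = np.polyfit(o_sorted[seg], s_sorted[seg], 1)
        else:
            # Degenerate segment: fall back to a plain mean shift (assumption).
            slope, intercept = 1.0, s_sorted.mean() - o_sorted.mean()
        coeffs.append((slope, intercept))
    return edges, np.array(coeffs)

def apply_cdf_matching(sm_o, edges, coeffs):
    """Apply the segment-wise lines to SM_O to obtain SM_O_bc (NaN stays NaN)."""
    sm_o = np.asarray(sm_o, dtype=float)
    idx = np.searchsorted(edges, sm_o, side="right") - 1
    idx = np.clip(idx, 0, len(coeffs) - 1)   # values outside the fitted range use the end segments
    return coeffs[idx, 0] * sm_o + coeffs[idx, 1]

# Synthetic check: SM_O is a biased, rescaled copy of SM_S
rng = np.random.default_rng(0)
sm_s_demo = rng.uniform(0.10, 0.25, 500)
sm_o_demo = 1.5 * sm_s_demo + 0.05
edges_demo, coeffs_demo = fit_cdf_matching(sm_o_demo, sm_s_demo)
sm_o_bc_demo = apply_cdf_matching(sm_o_demo, edges_demo, coeffs_demo)
```

Because the fitted lines are stored separately from the data, they can be estimated on days when SM_S exists and then applied to SM_O on days when SM_S is missing.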

DCT-PLS

DCT-PLS was originally developed for the automatic smoothing of incomplete multidimensional datasets. After adjustment, it can therefore fill missing data in spatiotemporal geophysical datasets. Penalized least squares regression is a thin plate spline smoothing method for one-dimensional data, which balances the fidelity to the data against the roughness of the mean function. Garcia (2010) proved that the penalized least squares regression can be expressed by the DCT, which represents data as a sum of cosine functions oscillating at different frequencies. Since the DCT can be multidimensional, the penalized least squares regression based on the DCT can be extended to multidimensional datasets. The DCT-PLS algorithm is briefly introduced below; for the mathematical details, refer to Garcia (2010).

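The objective of the DCT-PLS method is to minimize $F(\hat{X})$:
$$F(\hat{X}) = \left\| W^{1/2} \circ (\hat{X} - X) \right\|^{2} + s \left\| \nabla^{2} \hat{X} \right\|^{2} \tag{2}$$
where $X$ is the spatiotemporal dataset with missing data, $\hat{X}$ is its reconstructed estimate and $W$ is a binary array of the same size, whose value is 0 at the locations of missing values and 1 otherwise. $\|\cdot\|$ is the Euclidean norm. $\nabla^{2}$ and $\circ$ represent the Laplacian operator and the element-wise product, respectively. $s$ is a positive scalar controlling smoothness: as $s$ increases, the smoothness of $\hat{X}$ also increases. By rewriting Equation (2) with the type-2 DCT and its inverse transform (IDCT), $\hat{X}$ can be easily calculated as follows:
$$\hat{X} = \mathrm{IDCT}\left(\Gamma \circ \mathrm{DCT}\left(W \circ (X - \hat{X}) + \hat{X}\right)\right) \tag{3}$$
where $\Gamma$ is the three-dimensional filtering tensor:
$$\Gamma_{i_1, i_2, i_3} = \left[1 + s\left(\sum_{j=1}^{3}\left(2 - 2\cos\frac{(i_j - 1)\pi}{n_j}\right)\right)^{2}\right]^{-1} \tag{4}$$
where $i_j$ represents the $i$th element along the $j$th dimension, and $n_j$ represents the size of $X$ along this dimension.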
According to Equations (3) and (4), the DCT-PLS model relies on the selection of the smoothing parameter s. Since a high s value leads to the loss of high-frequency information, a very small value (≈0) is needed to limit the influence of smoothing when filling the missing data. However, too low an s value leads to overfitting of the original data. Therefore, an s value that produces the best estimate of the original data and predicts the missing data well should be adopted. Such a value can be estimated by the generalized cross validation (GCV) method:
(5)
(6)

The optimal s value is automatically selected with the goal of minimizing GCV. In our process of data reconstruction, the minimum GCV value is 0.0041 and the corresponding optimal s value is 6.348 × 10⁻⁴.

The DCT-PLS method can make full use of the spatiotemporal variation information of remote sensing data and achieves good reconstruction results when the missing values are randomly and uniformly distributed in the original remote sensing data. However, as mentioned in the Data section, SM_S in our study has temporally continuous gaps. Therefore, SM_S reconstructed using DCT-PLS (SM_DCT-PLS) can easily lose the temporal variation information of SM, especially the continuously low values in the non-growing seasons. That is to say, the direct application of DCT-PLS for reconstruction may lead to the overestimation of SM, especially in the periods when data are missing.
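
The fixed-point form of the DCT-PLS update can be illustrated with a short sketch. It is a simplification under stated assumptions, not the full algorithm of Garcia (2010): s is fixed by the caller instead of being selected by GCV, no robust weighting is used, gaps are stored as NaN, and the iteration starts from the mean of the observed values. The function name `dct_pls_fill` is hypothetical; `dctn` and `idctn` are the SciPy multidimensional type-2 DCT and its inverse.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_pls_fill(x, s, n_iter=100):
    """Fill NaN gaps in an N-D array by iterating the DCT-PLS update with fixed s."""
    x = np.asarray(x, dtype=float)
    w = np.isfinite(x).astype(float)       # W: 1 where observed, 0 where missing
    x0 = np.where(w > 0, x, 0.0)           # gaps set to 0; they are masked out by W anyway

    # Filtering tensor: sum over dimensions of (2 - 2 cos((i_j - 1) pi / n_j))
    lam = np.zeros(x.shape)
    for axis, n in enumerate(x.shape):
        shape = [1] * x.ndim
        shape[axis] = n
        lam = lam + (2.0 - 2.0 * np.cos(np.pi * np.arange(n) / n)).reshape(shape)
    gamma = 1.0 / (1.0 + s * lam ** 2)

    # Initial guess (assumption): observed values, with gaps set to the observed mean
    x_hat = np.where(w > 0, x0, x0[w > 0].mean())
    for _ in range(n_iter):
        residual = w * (x0 - x_hat) + x_hat
        x_hat = idctn(gamma * dctn(residual, norm="ortho"), norm="ortho")
    return x_hat

# Synthetic cube (time, lat, lon) with a seasonal cycle and 20% random gaps
t, i, j = np.meshgrid(np.arange(24), np.arange(6), np.arange(6), indexing="ij")
truth_demo = 0.2 + 0.05 * np.sin(2 * np.pi * t / 24) + 0.01 * i / 5
rng = np.random.default_rng(0)
gaps_demo = rng.random(truth_demo.shape) < 0.2
cube_demo = np.where(gaps_demo, np.nan, truth_demo)
filled_demo = dct_pls_fill(cube_demo, s=1.0)
```

With this plain iteration, gaps update very slowly as s approaches 0, because the filter then approaches the identity and the missing entries are simply carried over from one iteration to the next. The sketch is therefore meant to show the structure of the update rather than to reproduce the reconstruction with the optimal s reported in the text.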

ODCT-PLS

To overcome the shortcoming of DCT-PLS, this study proposes an ODCT-PLS approach to reconstruct SM_S, which makes full use of the time series information of the observed SM_O and can achieve good results when there are temporal or spatial chunks of missing values in SM_S. The main steps of ODCT-PLS are as follows. First, the CDF matching method is adopted to match the observation data with the remote sensing data. Second, for each station, the missing remote sensing data of the pixel where the station is located are supplemented with the matched observation data. Finally, the remaining gaps in the supplemented remote sensing data are reconstructed using DCT-PLS, and the reconstruction result is called SM_ODCT-PLS. The specific procedures are as follows:

  • The CDF matching method was used to establish piecewise linear equations for SM_O and SM_S in the period where data of SM_S are complete.

  • The piecewise linear equations were applied to SM_O in the period when SM_S was missing, and the resulting SM_O_bc was inserted into SM_S at the station pixels to obtain SM_SO.

  • Reconstruct the SM_SO using DCT-PLS.
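
The second step, building SM_SO from SM_S and the bias-corrected station series, can be sketched as below. The array layout (time, row, column), the NaN convention for gaps and the names `merge_station_series` and `station_pixels` are assumptions made for illustration; the subsequent DCT-PLS reconstruction of SM_SO is not shown here.

```python
import numpy as np

def merge_station_series(sm_s, station_pixels, sm_o_bc):
    """Insert bias-corrected station series into the gaps of their pixels.

    sm_s:           array (time, row, col) with NaN marking data gaps
    station_pixels: dict mapping station name -> (row, col) of its pixel
    sm_o_bc:        dict mapping station name -> 1-D series of length time
    Returns a new array (SM_SO); the input SM_S is left unchanged.
    """
    sm_so = np.array(sm_s, dtype=float, copy=True)
    for name, (row, col) in station_pixels.items():
        series = np.asarray(sm_o_bc[name], dtype=float)
        pixel = sm_so[:, row, col]                    # view into sm_so
        fill = np.isnan(pixel) & ~np.isnan(series)    # only where SM_S is missing
        pixel[fill] = series[fill]
    return sm_so

# Tiny illustrative case: 4 days, 2 x 2 pixels, one station at pixel (0, 1)
sm_s_demo = np.full((4, 2, 2), 0.20)
sm_s_demo[:2, 0, 1] = np.nan      # gap at the station pixel
sm_s_demo[0, 1, 1] = np.nan       # gap at a pixel without a station
sm_so_demo = merge_station_series(
    sm_s_demo, {"A": (0, 1)}, {"A": [0.10, 0.11, 0.50, 0.50]}
)
```

Existing satellite values are never overwritten in this sketch, so the station data only add information where SM_S has none; gaps at pixels without a station are left for the DCT-PLS step.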

Evaluation metrics

To quantitatively assess the effect of bias correction and data reconstruction, three evaluation metrics were employed: root mean squared error (RMSE), relative bias (BIAS) and correlation coefficient (CORR). The equations for RMSE, BIAS and CORR are given below:
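$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - R_i\right)^{2}} \tag{7}$$
$$\mathrm{BIAS} = \frac{1}{n}\sum_{i=1}^{n}\left(E_i - R_i\right) \tag{8}$$
$$\mathrm{CORR} = \frac{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)\left(R_i - \bar{R}\right)}{\sqrt{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^{2}}} \tag{9}$$
where $n$ is the number of samples; $E_i$ represents SM_O_bc in bias correction evaluation and the estimated ESA CCI SM in gap-filling method evaluation in grid $i$; $R_i$ represents SM_S in bias correction evaluation and the reference SM (original ESA CCI SM or SM_O) in gap-filling method evaluation in grid $i$; and $\bar{E}$ and $\bar{R}$ are the average values of $E_i$ and $R_i$, respectively.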
In addition, the universal image quality index (UIQI) is used to quantify the spatial consistency of SM_DCT-PLS and SM_ODCT-PLS with SM_S in the study area. The formula for calculating UIQI of the two images is defined as follows:
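$$\mathrm{UIQI} = \frac{4\,\sigma_{SR}\,\bar{S}\,\bar{R}}{\left(\sigma_S^{2} + \sigma_R^{2}\right)\left(\bar{S}^{2} + \bar{R}^{2}\right)} \tag{10}$$
where $S$ and $R$ are the reconstructed remote sensing image and the original remote sensing image at the same time, respectively; $\bar{S}$ and $\bar{R}$ are the mean values of $S$ and $R$, respectively; $\sigma_S^{2}$ and $\sigma_R^{2}$ are the variances of $S$ and $R$, respectively; and $\sigma_{SR}$ is the covariance between $S$ and $R$. Equation (10) is equal to a product of three terms:
$$\mathrm{UIQI} = \frac{\sigma_{SR}}{\sigma_S\,\sigma_R}\cdot\frac{2\,\bar{S}\,\bar{R}}{\bar{S}^{2} + \bar{R}^{2}}\cdot\frac{2\,\sigma_S\,\sigma_R}{\sigma_S^{2} + \sigma_R^{2}} \tag{11}$$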

The value range of the first term on the right side of the equation is [−1,1]; it quantifies the correlation between S and R. The value range of the middle term is [0,1]; it quantifies the similarity of the mean values of S and R. The value range of the last term is also [0,1]; it quantifies the similarity of the variances of S and R and takes its optimal value of 1 if $\sigma_S = \sigma_R$. The value range of UIQI is [−1,1]. The closer the value is to 1, the more similar S and R are (Yozgatligil et al. 2013) and the better the reconstruction method preserves the spatial heterogeneity.
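
The metrics described in this section can be sketched in a few lines. The sketch makes two assumptions that are not confirmed by the text: BIAS is taken as the mean difference between estimate and reference, and samples where either series is NaN are skipped. The function names are hypothetical.

```python
import numpy as np

def _paired(est, ref):
    """Flatten both inputs and keep only samples valid in both (assumption)."""
    est = np.asarray(est, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    ok = ~np.isnan(est) & ~np.isnan(ref)
    return est[ok], ref[ok]

def rmse(est, ref):
    e, r = _paired(est, ref)
    return np.sqrt(np.mean((e - r) ** 2))

def bias(est, ref):
    # Assumed definition: mean of (estimate - reference)
    e, r = _paired(est, ref)
    return np.mean(e - r)

def corr(est, ref):
    e, r = _paired(est, ref)
    return np.corrcoef(e, r)[0, 1]

def uiqi(s_img, r_img):
    """Universal image quality index between two images of the same date."""
    s, r = _paired(s_img, r_img)
    ms, mr = s.mean(), r.mean()
    vs, vr = s.var(), r.var()
    cov = np.mean((s - ms) * (r - mr))
    return 4.0 * cov * ms * mr / ((vs + vr) * (ms ** 2 + mr ** 2))

# Small illustrative inputs
est_demo = np.array([1.0, 2.0, 5.0])
ref_demo = np.array([1.0, 2.0, 3.0])
img_demo = np.array([[0.10, 0.15], [0.20, 0.25]])
```

For identical images the three factors of UIQI all equal 1, so `uiqi(img_demo, img_demo)` evaluates to 1 up to rounding, which is a convenient sanity check.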

Performance of bias correction of in situ SM observations

Table 1 shows the RMSE, CORR (p < 0.01) and BIAS of the in situ SM observations at all stations before (SM_O) and after (SM_O_bc) bias correction against the remote sensing data (SM_S) during the study period (2013–2020). At every station, the RMSE after bias correction is smaller than that before correction, and all values are below 0.06. At stations such as Balihan and Gangzi, where CORR was already greater than 0.5 before correction, CORR changed little or decreased slightly, whereas CORR increased at most of the remaining stations, and more than half of the stations had CORR greater than 0.5 after correction. Except for the stations of Naiman, Kezuozhong, Zhalute and Fuhe, the absolute value of BIAS did not increase after correction and in most cases decreased markedly, to around 0.02. The 'underestimation' of the measured data relative to the remote sensing data before correction was significantly reduced at Wengniute, Gangzi, Shuangliao, Keshiketeng, Linxi, Balinyou and other stations. In general, the bias correction method works better for pixels with a large difference between the mean values of SM_O and SM_S, such as at Huolinguole and Kailu stations, where it significantly reduces the RMSE and BIAS between measured data and remote sensing data. This is exactly the purpose of applying the bias correction method to the measured data in this study. Therefore, the CDF matching method meets the requirements of this study for correcting the measured data and supplementing the time series information of the remote sensing data.

Table 1

Statistic comparison of SM_O and SM_O_bc against SM_S

| Station name | SM_O RMSE | SM_O CORR | SM_O BIAS | SM_O_bc RMSE | SM_O_bc CORR | SM_O_bc BIAS |
|---|---|---|---|---|---|---|
| Huolinguole | 0.155 | 0.403 | 0.115 | 0.049 | 0.390 | 0.021 |
| Alukeerqin | 0.040 | 0.472 | 0.020 | 0.037 | 0.610 | 0.019 |
| Naiman | 0.049 | 0.360 | 0.000 | 0.042 | 0.363 | 0.018 |
| Aohan | 0.068 | 0.389 | −0.036 | 0.045 | 0.406 | 0.018 |
| Ningcheng | 0.062 | 0.215 | −0.025 | 0.048 | 0.251 | 0.018 |
| Wengniute | 0.081 | 0.527 | −0.074 | 0.039 | 0.559 | 0.019 |
| Chifeng | 0.055 | −0.015 | 0.039 | 0.053 | −0.338 | −0.017 |
| Kailu | 0.102 | 0.250 | 0.052 | 0.046 | 0.311 | 0.016 |
| Tongliao | 0.050 | 0.476 | −0.017 | 0.042 | 0.502 | 0.017 |
| Balihan | 0.041 | 0.636 | −0.022 | 0.037 | 0.627 | 0.016 |
| Keerqin | 0.065 | 0.503 | −0.049 | 0.038 | 0.504 | 0.017 |
| Gangzi | 0.079 | 0.676 | −0.070 | 0.033 | 0.658 | 0.018 |
| Shuangliao | 0.089 | 0.424 | −0.074 | 0.049 | 0.506 | 0.019 |
| Keshiketeng | 0.071 | 0.451 | −0.064 | 0.037 | 0.445 | 0.018 |
| Linxi | 0.073 | 0.659 | −0.063 | 0.034 | 0.641 | 0.017 |
| Balinyou | 0.087 | 0.647 | −0.080 | 0.035 | 0.635 | 0.018 |
| Kezuozhong | 0.081 | 0.187 | −0.012 | 0.059 | 0.285 | 0.020 |
| Shebotu | 0.058 | 0.574 | 0.022 | 0.044 | 0.548 | 0.017 |
| Balinzuo | 0.060 | 0.759 | −0.036 | 0.032 | 0.726 | 0.018 |
| Zhalute | 0.041 | 0.725 | 0.008 | 0.032 | 0.681 | 0.013 |
| Fuhe | 0.053 | 0.387 | −0.006 | 0.040 | 0.375 | 0.019 |
| Bayaertuhushuo | 0.113 | 0.709 | −0.108 | 0.036 | 0.676 | 0.017 |

Note: p < 0.01.

According to the time series of SM_O, SM_O_bc and SM_S at Huolinguole, Alukeerqin and Naiman stations in 2019 (Figure 4), the difference between SM_O and SM_S decreased after bias correction. At the Huolinguole station, the RMSE between SM_O and SM_S and that between SM_O_bc and SM_S are 0.08 and 0.04, respectively, indicating that bias correction reduces the RMSE between observed SM and remote sensing SM; the corresponding BIAS values are 0.08 and 0.02, indicating that the 'overestimation' of observed SM relative to remote sensing SM decreases after bias correction, which can be seen directly in the time series at the Huolinguole station (Figure 4); the corresponding CORR values are 0.59 and 0.57, indicating that the correlation between observed SM and remote sensing SM decreases slightly after bias correction, which is attributed to the information loss caused by piecewise linear fitting. At the Alukeerqin station, the CORR between SM_O and SM_S and that between SM_O_bc and SM_S are 0.02 and 0.03, respectively, indicating that the correlation between observed SM and remote sensing SM is improved after bias correction; the corresponding BIAS values are 0.02 and 0.01 and the RMSE values are 0.02 and 0.03, indicating that the overestimation of observed SM relative to remote sensing SM decreases and the average error increases slightly after bias correction. At the Naiman station, the CORR values between SM_O and SM_S and between SM_O_bc and SM_S are 0.35 and 0.35, the RMSE values are 0.00 and 0.04 and the BIAS values are −0.00 and 0.02, respectively. Among these three stations, the remote sensing SM at the Naiman station has the highest data integrity and the lowest correlation with observed SM. As can be seen from the time series at the Naiman station, the BIAS between SM_O and SM_S is positive in some periods and negative in others, while SM_O_bc synthesizes the information of both SM_O and SM_S. Therefore, the RMSE and BIAS between SM_O_bc and SM_S are larger than those between SM_O and SM_S, indicating that bias correction slightly increases the average error between observed SM and remote sensing SM at this station.
Figure 4

Time series of SM_O, SM_O_bc and SM_S of Huolinguole, Alukeerqin and Naiman in 2019.


Among these three sites, Huolinguole has the largest ratio of missing data, the largest RMSE and BIAS between SM_O and SM_S, and the best correction effect. In general, the effect of bias correction has little relation to the percentage of missing data; that is, the effect of bias correction does not improve as the percentage of missing data decreases. On the one hand, the piecewise linear equations are established and the evaluation indicators are calculated only in the period of non-missing data. On the other hand, for stations with a small systematic error and a high correlation between SM_O and SM_S, bias correction may introduce additional uncertainties.

Performance of data reconstruction for ESA CCI SM

Performance of data reconstruction against observation SM

Table 2 shows an overall comparison of the SM estimated using DCT-PLS and ODCT-PLS against the observed SM. The CORR (p < 0.01) between SM_ODCT-PLS and SM_O is higher than the corresponding CORR between SM_DCT-PLS and SM_O at all stations, illustrating that ODCT-PLS can significantly improve the correlation between the estimated SM and the observed SM. The RMSE between SM_ODCT-PLS and SM_O is smaller than or equal to that between SM_DCT-PLS and SM_O at all stations except Gangzi, illustrating that ODCT-PLS generally estimates SM more accurately than DCT-PLS. At 10 of the 22 sites, the absolute BIAS between SM_DCT-PLS and SM_O is smaller than that between SM_ODCT-PLS and SM_O. This is largely caused by the offset between the overestimation and the underestimation of SM_O by SM_DCT-PLS (Figure 5) rather than by a better performance of DCT-PLS.
Table 2

Statistics comparison of SM_DCT-PLS and SM_ODCT-PLS against SM_O

| Station name | RMSE (SM_DCT-PLS vs SM_O) | CORR (SM_DCT-PLS vs SM_O) | BIAS (SM_DCT-PLS vs SM_O) | RMSE (SM_ODCT-PLS vs SM_O) | CORR (SM_ODCT-PLS vs SM_O) | BIAS (SM_ODCT-PLS vs SM_O) |
|---|---|---|---|---|---|---|
| Huolinguole | 0.124 | 0.176 | −0.064 | 0.114 | 0.554 | −0.069 |
| Alukeerqin | 0.044 | 0.083 | −0.001 | 0.032 | 0.599 | −0.010 |
| Naiman | 0.075 | 0.110 | 0.034 | 0.061 | 0.477 | 0.028 |
| Aohan | 0.101 | 0.150 | 0.072 | 0.085 | 0.514 | 0.063 |
| Ningcheng | 0.100 | 0.273 | 0.059 | 0.100 | 0.392 | 0.066 |
| Wengniute | 0.088 | 0.348 | 0.081 | 0.086 | 0.556 | 0.080 |
| Chifeng | 0.057 | −0.159 | −0.029 | 0.055 | 0.231 | −0.034 |
| Kailu | 0.095 | 0.019 | −0.012 | 0.081 | 0.422 | −0.020 |
| Tongliao | 0.093 | −0.067 | 0.052 | 0.064 | 0.620 | 0.041 |
| Balihan | 0.048 | 0.545 | 0.029 | 0.048 | 0.674 | 0.035 |
| Keerqin | 0.102 | 0.203 | 0.079 | 0.093 | 0.591 | 0.078 |
| Gangzi | 0.092 | 0.607 | 0.083 | 0.095 | 0.694 | 0.088 |
| Shuangliao | 0.108 | 0.299 | 0.092 | 0.108 | 0.511 | 0.097 |
| Keshiketeng | 0.094 | 0.069 | 0.086 | 0.082 | 0.556 | 0.077 |
| Linxi | 0.112 | 0.144 | 0.097 | 0.092 | 0.680 | 0.083 |
| Balinyou | 0.124 | 0.152 | 0.112 | 0.103 | 0.689 | 0.097 |
| Kezuozhong | 0.100 | 0.108 | 0.045 | 0.094 | 0.355 | 0.051 |
| Shebotu | 0.067 | 0.195 | 0.007 | 0.046 | 0.664 | −0.008 |
| Balinzuo | 0.097 | 0.351 | 0.075 | 0.078 | 0.741 | 0.063 |
| Zhalute | 0.054 | 0.360 | 0.014 | 0.037 | 0.719 | 0.005 |
| Fuhe | 0.093 | −0.133 | 0.052 | 0.070 | 0.533 | 0.041 |
| Bayaertuhushuo | 0.136 | 0.392 | 0.129 | 0.127 | 0.678 | 0.122 |

Note: p < 0.01.

Figure 5

Time series of SM_DCT-PLS, SM_ODCT-PLS and SM_O of Huolinguole, Alukeerqin and Naiman in 2019.


For the time series of SM_DCT-PLS, SM_ODCT-PLS and SM_O at Huolinguole, Alukeerqin and Naiman in 2019 (Figure 5), the CORR values between SM_ODCT-PLS and SM_O are all above 0.55 and higher than those between SM_DCT-PLS and SM_O, indicating that SM_ODCT-PLS is more closely related to SM_O than SM_DCT-PLS is. Because SM_S was continuously missing in January, February and December of 2019 (Figure 3), DCT-PLS overestimates SM in these months, since its reconstruction is based on the remaining data, which are relatively high within the year. In contrast, SM_ODCT-PLS integrates the time series information of both SM_S and SM_O, so that the reconstruction error caused by temporally continuous missing data is effectively avoided by using more complete and valid data.

The above analyses show that SM_ODCT-PLS has a smaller mean error and a stronger correlation with SM_O than SM_DCT-PLS. Therefore, ODCT-PLS can not only retain the spatiotemporal three-dimensional information of remote sensing data but also reduce the 'overestimation' in periods of temporally continuous missing data. Note that chunks of data gaps in remote sensing data may lead to either overestimation or underestimation, depending on the characteristics of the geographical phenomenon and on the periods or regions where the data are missing.

Performance of data reconstruction against remote sensing SM

Figure 6 shows the spatial distribution of SM_S, SM_DCT-PLS and SM_ODCT-PLS on July 31, 2019. The original ESA CCI SM ranges between 0.1001 and 0.2507, while SM_DCT-PLS and SM_ODCT-PLS range between 0.1002 and 0.2500, and their UIQI is greater than 0.99. All three images show a spatial trend of increasing SM from northwest to southeast. Figure 7 shows the box plots of the UIQI of SM_DCT-PLS and SM_ODCT-PLS against SM_S from 2013 to 2020. The UIQI values of SM_DCT-PLS and SM_ODCT-PLS are both greater than 0.99, indicating that DCT-PLS and ODCT-PLS both retain the statistical characteristics and spatial variation trend of the original remote sensing image well.
Figure 6

Spatial distribution of (a) SM_S, (b) SM_DCT-PLS and (c) SM_ODCT-PLS on July 31, 2019.

Figure 7

Box plots of the UIQI of SM_DCT-PLS and SM_ODCT-PLS against SM_S from 2013 to 2020.

Figure 8 shows the spatial distribution of the RMSE, CORR and BIAS of SM_DCT-PLS and SM_ODCT-PLS against SM_S. The RMSE and absolute BIAS are both less than 0.001, while the CORR is greater than 0.99. Both DCT-PLS and ODCT-PLS are thus able to make full use of the three-dimensional information of the original remote sensing data, and ODCT-PLS does not improve the correlation between the estimated SM and SM_S relative to DCT-PLS.
Figure 8

Spatial distribution of RMSE, CORR and BIAS of SM_DCT-PLS (left) and SM_ODCT-PLS (right) against SM_S.


Performance analysis of ODCT-PLS

A previous study showed that the pattern of data gaps affects reconstruction performance (Kong et al. 2014). As an optimization-based estimation method that depends on the temporal and spatial information of the original data, DCT-PLS performs well when reconstructing large three-dimensional geophysical data sets whose missing data are distributed uniformly (Zhang et al. 2016) or along a single dimension (Fredj et al. 2016). However, its effectiveness is greatly reduced when the original data contain large chunks of missing data (Kong et al. 2014; Liu et al. 2020). For instance, Liu et al. (2020) evaluated the differences in LST time series trends before and after reconstructing MODIS LST data sets with DCT-PLS and found obvious trend differences in areas with more missing data, indicating that a large amount of missing data may lead to obvious trend uncertainties such as overestimation or underestimation. In this study, the ESA CCI SM in the Xiliaohe River Basin is entirely missing over the study area for continuous periods. Since the missing values are concentrated in January, February and December, when SM in the Xiliaohe River Basin is low, SM_DCT-PLS overestimates the actual SM because of the influence of the non-missing data before and after the gaps. In general, whether the values estimated by DCT-PLS overestimate or underestimate the actual values depends on the type of geographical phenomenon, the period of the missing values and the region of a specific study.
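
To make the mechanism concrete, the sketch below implements the core DCT-PLS fixed-point iteration in the form given by Garcia (2010), reduced to one dimension and written with the standard library only. It is an illustration under stated assumptions rather than the implementation used in this study: the smoothing parameter `s` is fixed by hand instead of being selected automatically, the function names are hypothetical, and the test series is a synthetic seasonal curve with evenly spaced gaps, the favourable case described above.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C x is the DCT of x, C^T z its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * k * (i + 0.5) / n)
                     for i in range(n)])
    return rows

def dct_pls_1d(y, s=1.0, n_iter=100):
    """Fill NaN gaps in y by penalized least squares solved in the DCT domain."""
    n = len(y)
    C = dct_matrix(n)
    w = [0.0 if math.isnan(v) else 1.0 for v in y]  # weight 0 marks a gap
    known = [v for v in y if not math.isnan(v)]
    mean = sum(known) / len(known)
    y0 = [mean if math.isnan(v) else v for v in y]  # finite placeholder in gaps
    z = list(y0)                                    # initial guess
    # Low-pass filter built from the eigenvalues of the second-difference operator.
    gamma = [1.0 / (1.0 + s * (2.0 - 2.0 * math.cos(math.pi * k / n)) ** 2)
             for k in range(n)]
    for _ in range(n_iter):
        # Known points are pulled toward the data; gaps keep the current estimate.
        r = [w[i] * (y0[i] - z[i]) + z[i] for i in range(n)]
        coef = [gamma[k] * sum(C[k][i] * r[i] for i in range(n))
                for k in range(n)]
        z = [sum(C[k][i] * coef[k] for k in range(n)) for i in range(n)]
    return z

# Synthetic seasonal series (low at both ends) with every third value missing.
n = 60
truth = [0.2 - 0.1 * math.cos(2.0 * math.pi * (i + 0.5) / n) for i in range(n)]
gappy = [float("nan") if i % 3 == 1 else v for i, v in enumerate(truth)]
filled = dct_pls_1d(gappy)
max_err = max(abs(filled[i] - truth[i]) for i in range(n) if i % 3 == 1)
print(f"max error at evenly spaced gaps: {max_err:.2e}")
```

Because each gap value is determined only by smoothness and the surrounding known values, the same iteration has nothing but the neighbouring periods to draw on when an entire season is missing, which is the situation discussed in this section.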

The ODCT-PLS approach can effectively avoid such overestimation or underestimation by using matched observation data (SM_O_bc) to supplement the time series information of the remote sensing data. The SM estimated by ODCT-PLS has a smaller deviation from, and a greater correlation with, the SM observed at stations than that estimated by DCT-PLS. The SM images reconstructed by DCT-PLS and ODCT-PLS both have strong spatial consistency with the remote sensing SM images, suggesting that both methods retain the spatial heterogeneity of SM well. These results show that ODCT-PLS overcomes the limitation of DCT-PLS when the original remote sensing data contain large chunks of missing data, so it can achieve good reconstruction results even for remote sensing data with gaps that are continuous in space and time.

Possible influence of the temporal and spatial distribution of observation data on the effect of ODCT-PLS

Yang et al. (2018) reconstructed SM data retrieved from Fengyun 3B remote sensing data using different methods and found that DCT-PLS produced good filling results only in homogeneous areas. Fredj et al. (2016) used DCT-PLS to fill gaps in coastal ocean surface currents under different missing-data scenarios and found that the validity of DCT-PLS depends on the characteristics of the surrounding ocean currents. In other words, the validity of DCT-PLS depends on the spatiotemporal representativeness of the original data sets. A better approach may be to take the ‘temporal and spatial distribution patterns’ (Gong & Si 2013; Yi 2013; Tan et al. 2020) of geographical data as a priori information for the interpolation or reconstruction of remote sensing data.

In this study, the innovation of the proposed ODCT-PLS approach for gap-filling three-dimensional remote sensing data lies mainly in the application of in situ SM observations to improve the DCT-PLS method. In previous studies, measured data have generally been used as reference data for the bias correction of remote sensing data, whereas in this study the measured data are matched to the remote sensing data and used as a supplement that provides temporal and spatial distribution information. Consequently, the effectiveness of ODCT-PLS also depends on the temporal integrity and spatial representativeness of the in situ observations. Temporal integrity ensures that the piecewise linear equations can be established and the observation data matched in the period when remote sensing data are available, and spatial representativeness ensures the quality of the reconstruction in periods when all the original remote sensing data in the study area are missing. In other words, good performance of ODCT-PLS requires that the observation sites be distributed randomly, or uniformly according to soil type, in the study area. Moreover, ODCT-PLS performs well only if the in situ observations are temporally complete, especially during the gaps in the remote sensing data.
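
The piecewise linear matching mentioned above can be sketched as follows: percentile nodes are fitted on the period in which both data sources exist, and in situ values are then mapped onto the satellite climatology by linear interpolation between those nodes. This is a generic illustration of piecewise linear CDF matching, not the authors' exact procedure; the percentile set, the function names and the sample values are assumptions made for the example.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def fit_cdf_matching(obs_overlap, sat_overlap,
                     probs=(0, 5, 10, 25, 50, 75, 90, 95, 100)):
    """Percentile nodes of both series over the period where both are available."""
    so, ss = sorted(obs_overlap), sorted(sat_overlap)
    return ([percentile(so, p) for p in probs],
            [percentile(ss, p) for p in probs])

def apply_cdf_matching(value, obs_nodes, sat_nodes):
    """Map one in situ value to the satellite scale, segment by segment."""
    j = 0
    # Find the segment containing the value; values beyond the last node
    # are extrapolated along the last segment.
    while j < len(obs_nodes) - 2 and value > obs_nodes[j + 1]:
        j += 1
    x0, x1 = obs_nodes[j], obs_nodes[j + 1]
    y0, y1 = sat_nodes[j], sat_nodes[j + 1]
    if x1 == x0:  # degenerate segment: no slope can be defined
        return 0.5 * (y0 + y1)
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

# Hypothetical overlap period in which the satellite SM is a compressed,
# shifted version of the in situ SM.
obs = [0.10, 0.30, 0.20, 0.25, 0.15, 0.35, 0.05, 0.40]
sat = [0.5 * o + 0.1 for o in obs]
obs_nodes, sat_nodes = fit_cdf_matching(obs, sat)

# An in situ value from a day without satellite data, mapped to the satellite scale.
matched = apply_cdf_matching(0.22, obs_nodes, sat_nodes)
print(matched)
```

The fit uses only the overlap period, which is why gaps in the in situ record during that period weaken the matching, while the mapping itself is applied on the days when the satellite data are missing.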

This study proposes an ODCT-PLS approach for reconstructing the ESA CCI SM data, in which observed SM data supplement the time series information of the remote sensing data. The CDF matching method is used to reduce the systematic deviation between the site-measured SM and the remotely sensed SM, and the matched site observations are then used as the supplement. The reconstruction of the ESA CCI SM data in the Xiliaohe River Basin from 2013 to 2020 showed that the SM reconstructed by ODCT-PLS agreed better with the in situ SM observations than that reconstructed by DCT-PLS, with the average CORR increasing by 0.3636, the average RMSE decreasing by 0.0109 m3/m3 and the average BIAS decreasing by 0.0047 m3/m3. Compared with the original ESA CCI SM, DCT-PLS and ODCT-PLS can both restore the spatial variation of SM in the study area. The study demonstrates that ODCT-PLS can better retain the spatiotemporal information of remote sensing data and can reduce the overestimation in gap-filling caused by continuously missing remote sensing data. It can be a valuable alternative for reconstructing three-dimensional geophysical data sets with spatially or temporally continuous data gaps.

We are grateful for the financial support provided by the National Key Research and Development Program of China (No. 2019YFC1510601) and the National Natural Science Foundation of China (No. 42071040 and No. U2243203).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Albergel C., De Rosnay P., Gruhier C., Munoz-Sabater J., Hasenauer S., Isaksen L., Kerr Y. & Wagner W. 2012 Evaluation of remotely sensed and modelled soil moisture products using global ground-based in situ observations. Remote Sensing of Environment 118, 215–226.
Dembele M., Oriani F., Tumbulto J., Mariethoz G. & Schaefli B. 2019 Gap-filling of daily streamflow time series using Direct Sampling in various hydroclimatic settings. Journal of Hydrology 569, 573–586.
Dirmeyer P. A., Guo Z. C. & Gao X. 2004 Comparison, validation, and transferability of eight multiyear global soil wetness products. Journal of Hydrometeorology 5 (6), 1011–1033.
Dobriyal P., Qureshi A., Badola R. & Hussain S. A. 2012 A review of the methods available for estimating soil moisture and its implications for water resource management. Journal of Hydrology 458, 110–117.
Dorigo W., Wagner W., Albergel C., Albrecht F., Balsamo G., Brocca L., Chung D., Ertl M., Forkel M., Gruber A., Haas E., Hamer P. D., Hirschi M., Ikonen J., De Jeu R., Kidd R., Lahoz W., Liu Y. Y., Miralles D., Mistelbauer T., Nicolai-Shaw N., Parinussa R., Pratola C., Reimer C., Van Der Schalie R., Seneviratne S. I., Smolander T. & Lecomte P. 2017 ESA CCI soil moisture for improved earth system understanding: state-of-the art and future directions. Remote Sensing of Environment 203, 185–215.
Fan Z., Li J. & Deng M. 2016 An adaptive inverse-distance weighting spatial interpolation method with the consideration of multiple factors. Geomatics and Information Science of Wuhan University 41 (6), 842–847.
Fan Y., Qiu J., Dong J., Zhang X. & Wang D. 2020 Error characteristics of microwave soil moisture products based on triple collocation and its spatial-temporal pattern. Remote Sensing Technology and Application 35 (1), 85–96.
Fredj E., Roarty H., Kohut J. & Lai J. 2016 Fast gap filling of the coastal ocean surface current in the seas around Taiwan. In: OCEANS Conference.
Garcia D. 2010 Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis 54 (4), 1167–1178.
Gardner E. 2006 Exponential smoothing: the state of the art – part II. International Journal of Forecasting 22 (4), 637–666.
Gong X. & Si Y. 2013 Comparison of subsequence pattern matching methods for financial time series. In: 2013 9th International Conference on Computational Intelligence and Security (CIS). pp. 154–158.
Han X., Duan S., Tang R., Liu H. & Li Z. 2015 Comparison of AMSR-E soil moisture product and ground-based measurement over agricultural areas in China. In: 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 673–676.
Ji X., Li Y., Luo X., He D., Guo R., Wang J., Bai Y., Yue C. & Liu C. 2020 Evaluation of bias correction methods for APHRODITE data to improve hydrologic simulation in a large Himalayan basin. Atmospheric Research 242, 104964.
Jin H. & Henderson B. 2011 Towards a daily soil moisture product based on incomplete time series observations of two satellites. In: 19th International Congress on Modelling and Simulation (MODSIM2011). pp. 1959–1965.
Kong L. H., Xia M. Y., Liu X. Y., Chen G. S., Gu Y., Wu M. Y. & Liu X. 2014 Data loss and reconstruction in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems 25 (11), 2818–2828.
Liu Y. Y., Parinussa R. M., Dorigo W. A., De Jeu R. A. M., Wagner W., Van Dijk A. I. J. M., McCabe M. F. & Evans J. P. 2011 Developing an improved soil moisture dataset by blending passive and active microwave satellite-based retrievals. Hydrology and Earth System Sciences 15 (2), 425–436.
Liu H., Lu N., Jiang H., Qin J. & Yao L. 2020 Filling gaps of monthly Terra/MODIS daytime land surface temperature using discrete cosine transform method. Remote Sensing 12 (3), 361.
Llamas R. M., Guevara M., Rorabaugh D., Taufer M. & Vargas R. 2020 Spatial Gap-Filling of ESA CCI satellite-derived soil moisture based on geostatistical techniques and multiple regression. Remote Sensing 12 (4), 665.
Long D., Bai L., Yan L., Zhang C., Yang W., Lei H., Quan J., Meng X. & Shi C. 2019 Generation of spatially complete and daily continuous surface soil moisture of high spatial resolution. Remote Sensing of Environment 233, 111364.
Moran M. S., Peters-Lidard C. D., Watts J. M. & McElroy S. 2004 Estimating soil moisture at the watershed scale with satellite-based radar and land surface models. Canadian Journal of Remote Sensing 30 (5), 805–826.
Pesquer L., Cortes A. & Pons X. 2011 Parallel ordinary kriging interpolation incorporating automatic variogram fitting. Computers & Geosciences 37 (4), 464–473.
Tan Z., Wang Z. & Hu H. 2020 Research on trend similarity measurement method of time series. Computer Engineering and Application 56 (10), 94–99.
Wanders N., Karssenberg D., De Roo A., De Jong S. M. & Bierkens M. F. P. 2014 The suitability of remotely sensed soil moisture for improving operational flood forecasting. Hydrology and Earth System Sciences 18 (6), 2343–2357.
Wang G., Garcia D., Liu Y., De Jeu R. & Dolman A. J. 2012 A three-dimensional gap filling method for large geophysical datasets: application to global satellite soil moisture observations. Environmental Modelling & Software 30, 139–142.
Wang C., Fu B., Zhang L. & Xu Z. 2019a Soil moisture-plant interactions: an ecohydrological review. Journal of Soils and Sediments 19 (1), 1–9.
Xu X. & Frey S. 2021 Validation of SMOS, SMAP, and ESA CCI soil moisture over a Humid Region. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 10784–10793.
Xu C., Wang J., Hu M. & Li Q. 2013 Interpolation of missing temperature data at meteorological stations using P-BSHADE. Journal of Climate 26 (19), 7452–7463.
Yan J., Duan X., Zheng W., Liu Y., Deng Y. & Hu Z. 2020 An adaptive IDW algorithm involving spatial heterogeneity. Geomatics and Information Science of Wuhan University 45 (1), 97–104.
Yang X., Zhang C., Cui Z., Yu F., Wang J. & Han Y. 2018 Filling method for soil moisture based on BP neural network. Journal of Applied Remote Sensing 12 (4), 042806.
Yi L. 2013 Approach to analyze time series similarity pattern mining based on Haar. Intelligence Computation and Evolutionary Computation 180, 287–296.
Yozgatligil C., Aslan S., Iyigun C. & Batmaz I. 2013 Comparison of missing value imputation methods in time series: the case of Turkish meteorological data. Theoretical and Applied Climatology 112 (1–2), 143–167.
Yue J., Tian J., Tian Q., Xu K. & Xu N. 2019 Development of soil moisture indices from differences in water absorption between shortwave-infrared bands. ISPRS Journal of Photogrammetry and Remote Sensing 154, 216–230.
Zeng J., Li Z., Chen Q., Bi H., Qiu J. & Zou P. 2015 Evaluation of remotely sensed and reanalysis soil moisture products over the Tibetan Plateau using in situ observations. Remote Sensing of Environment 163, 91–110.
Zhang G., Hao Z., Zhu S., Zhou C. & Hua J. 2016 Missing data reconstruction and evaluation of retrieval precision for AMSR2 soil moisture. Transactions of the Chinese Society of Agricultural Engineering 32 (20), 137–143.
Zhang L., Liu Y., Ren L., Teuling A. J., Zhang X., Jiang S., Yang X., Wei L., Zhong F. & Zheng L. 2021a Reconstruction of ESA CCI satellite-derived soil moisture using an artificial neural network technology. Science of the Total Environment 782, 146602.
Zhang Q., Yuan Q., Li J., Wang Y., Sun F. & Zhang L. 2021b Generating seamless global daily AMSR2 soil moisture (SGD-SM) long-term products for the years 2013–2019. Earth System Science Data 13 (3), 1385–1401.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).