To better understand the discrepancies in evapotranspiration (ET) simulations between ET models, we intercompared four models in China: Priestley–Taylor Jet Propulsion Laboratory (PT-JPL), Penman–Montieth–Leuning Version 2 (PML-V2), Sigmoid Generalized Complementary Function (SGCF), Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC). Data from 18 flux sites were used to evaluate the model performance at daytime (when incident shortwave radiation is greater than 20 W/m2) scales. To compare more fairly, we took the intersection of the outputs from four models for the analyses in the main text. All models yielded acceptable results, with PML-V2 or SGCF performing best at most sites. The average coefficient of determination and root mean square error among all sites of LE (latent heat of ET) were 0.72 and 51.71 W/m2 for PT-JPL, 0.80 and 46.65 W/m2 for PML-V2, 0.79 and 41.13 W/m2 for SGCF, 0.70 and 51.09 W/m2 for METRIC. PT-JPL and PML-V2 underestimated ET at most sites, whereas SGCF overestimated, potentially due to uncertainties in the vegetation indices and ET constraint parameters. Compared to measurements, PT-JPL underestimated the proportion of transpiration to evapotranspiration (0.81 versus 0.59), while PML-V2 overestimated (0.81 versus 0.90). Furthermore, all models performed best in the semi-arid zone dominated by grassland sites.

  • This study evaluated the performance of four ET models that are seldomly compared together at 18 flux sites.

  • The flux sites are distributed across various biomes and climate zones in China.

  • All models yielded acceptable results, with PML-V2 and SGCF outperforming the others.

  • The ET partitioning results of PT-JPL and PML-V2 were quite different.

  • All models performed the best in the semi-arid zone.

Evapotranspiration (ET), which mainly consists of soil evaporation (E) and vegetation transpiration (T), is a vital element in the soil–vegetation–atmosphere continuum, linking the water cycle, energy cycle, and carbon cycle. Accurate measurements and prediction of ET are important for drought and climate change monitoring, agricultural practices, water management, and weather forecasting (Chen et al. 2014; Fisher et al. 2017). However, ET is recognized as one of the most challenging components in the hydrologic cycle to be measured and simulated due to its complicated dependence on subsurface and climatic conditions. ET measurements can be obtained from micro-meteorological methods, and eddy covariance (EC) systems have been widely used over the past three decades. In addition, the development of remote sensing technology has also promoted the application of some ET methods since various land surface variables (e.g. leaf area index (LAI), normalized difference vegetation index (NDVI), and land surface temperature (LST)) retrieved from optical–thermal satellites can be used to drive ET models (Melo et al. 2021; Bai 2023).

ET estimation involves the complexity of the subsurface and the uncertainty of measuring at different spatial and temporal scales. Based on different assumptions and theories, six main types of ET models have been developed and applied to estimate ET (Hu et al. 2018; Liu et al. 2022): (1) combination-type models; (2) radiation-based models; (3) surface energy balance models; (4) vegetation index (VI)-radiative surface temperature models; (5) complementary relationship models; (6) data-driven models. Combination-type models are mainly based on the Penman–Monteith (PM) approach (Monteith 1965), which combines aerodynamic theories and energy balance principles. The representative models are Penman–Montieth–Leuning (PML) and improved PML models (e.g. PML-V2) (Leuning et al. 2008; Zhang et al. 2019), which focus on the physical quantification of canopy conductance and have great biophysical implications. Radiation-based models are mainly the simplifications of the PM formula, requiring relatively less meteorological and radiometric information to estimate ET (Priestley & Taylor 1972). Priestley–Taylor Jet Propulsion Laboratory (PT-JPL) is a typical radiation-based model, scaling potential ET to actual ET by parametrically constraining the ET components (Fisher et al. 2008). Surface energy balance models are concentrated on calculating the sensible heat flux, with the latent heat flux obtained as a residual term in the energy balance. Mapping ET at High Resolution with Internalized Calibration (METRIC) and Two Source Energy Balance (TSEB) are representatives of single-source energy balance and TSEB models, respectively (Kustas & Norman 1999; Allen et al. 2007). Unlike METRIC, TSEB characterizes the energy balance in the canopy and soil layers separately (Jingyi et al. 2024). VI-radiative surface temperature models are built based on the PT equation and the triangular or trapezoidal spatial relationship between VI and LST. One typical model is the Trapezoid Interpolation Model (TIM), obtaining actual ET by using the trapezoidal spatial relationship between LST and VI to adjust the PT-based potential ET algorithm. It is simple and has advantages in estimating ET over large areas (Jiang & Islam 2001). Complementary relationship models originally describe the complementary relationship between actual ET, apparent potential ET, and potential ET. Various forms of complementary relationships have been developed now, and all of them have been proven to be effective methods that do not require a comprehensive understanding of surface properties. For example, the Advection-Aridity approach describing linear complementary relationships has been widely used since its simplicity, and methods describing non-linear complementary relationships are also increasingly accepted (e.g. Sigmoid Generalized Complementary Function (SGCF)) (Brutsaert 1979; Han & Tian 2018). Data-driven models consider fewer physical mechanisms, directly associating ET with a variety of forcing data. Random forests, artificial neural networks, and so many other machine learning methods have been widely used now (Hu et al. 2021).

All models mentioned above differ in terms of parameterization, complexity of structure, and level of input data, leading to significant differences in model simulation results (Chen et al. 2014; Yang et al. 2016). Therefore, to select the appropriate ET models meeting different research objectives, it is necessary to understand how different models perform in specific land cover types and climatic regions, and model comparison is the most direct and effective way to do this (Bai 2023). For example, Ershadi et al. (2014) compared four ET models based on 20 FLUXNET towers, finding that none of the models outperformed the others under various biomes and that the ensemble average of the models produced the best overall estimates of ET across the towers. Yan & Gao (2021) compared five complementary relationship-based models and found that the structurally simple model was superior in the absence of calibration. Hu et al. (2021) made a comparison between physical-based, data-driven, and hybrid modeling approaches at 49 flux sites worldwide, concluding that data-driven models outperformed the others. Peddinti & Kisekka (2022) evaluated three surface energy balance models using flux site data and found that the two-source model performed better than the single-source models. Although a considerable amount of model comparison work has been undertaken, there are still some drawbacks to intercomparison and assessment of model performance. First, the existing work focuses on comparisons between models of the same type or variants of the same model (Yang et al. 2015; Bhattarai et al. 2016; Yan & Gao 2021; Jiang & Liu 2022; Peddinti & Kisekka 2022). Second, the ET models are usually assessed using flux sites located in America or Europe (Ershadi et al. 2014; Bhattarai et al. 2016; Chen et al. 2020; Bu et al. 2021; Hu et al. 2021; Melo et al. 2021).

In this study, we intercompared four models (PT-JPL, PML-V2, SGCF, and METRIC) that are seldomly compared together at 18 flux sites located in China, and the main aims of this study are to (1) assess the capacity of models to characterize seasonal variation; (2) evaluate the ability of multi-source models (PT-JPL and PML-V2) partitioning ET; (3) compare model performance across ecosystems and climate regions.

ET models

The four models are PT-JPL, PML-V2, SGCF, and METRIC. PT-JPL is a radiation-based model proposed by Fisher et al. (2008) based on the PT equation (Priestley & Taylor 1972). It uses ecophysiological, atmospheric, and soil constraints to scale the potential LE to actual LE, having a relatively simple structure and low input requirements. PML-V2 (Gan et al. 2018; Zhang et al. 2019) is a combination-type model based on the PM approach (Monteith 1965), combining aerodynamic theories and energy balance principles. It partitions ET into three components and integrates a canopy conductance model that couples carbon and water fluxes, having a robust physical foundation. SGCF (Han et al. 2012; Han & Tian 2018) is a complementary relationship-based model that describes the relationship between the ratio of actual ET to Penman potential ET () and the proportion of the radiation term in (). It does not require a comprehensive understanding of the surface properties, but only the input of meteorological data. METRIC (Allen et al. 2007) is a surface energy balance model, internally calibrated by ground-based reference ET (), and calculates latent heat flux as a residual of the surface energy balance, relying on the input of thermal infrared imagery and optical imagery of vegetation. Since these four models are typical of their respective categories and few studies have combined them for model comparisons, we selected them for our study. More detailed information and formulas can be found below.

PT-JPL model

Total ET is divided into canopy transpiration (, W/m2), soil evaporation (, W/m2) and interception evaporation (, W/m2) as:
(1a)
(1b)
(1c)
(1d)
where is the latent heat flux (W/m2), L is the latent heat evaporation coefficient (2.49 × 106 W/ m2/mm), is net absorbed radiation (W/m2), is the net radiation absorbed by the canopy (W/m2), is the net radiation absorbed by the soil (W/m2), and G is the soil heat flux (W/m2). (The subscript p is to distinguish from the PT coefficient in the SGCF) is an empirical coefficient to indicate the evaporation characteristics under different environmental conditions, initially set constant at 1.26, and needs to be calibrated at each site (Zhang et al. 2017). is the slope of the curve relating saturation water vapor pressure to temperature (kPa/K), and is the psychrometric constant (kPa/K). is the relative surface wetness, is the green fraction of the canopy that is actively transpiring, is plant temperature constraint, is plant moisture constraint, is the empirical factor used as a proxy for soil water stress. These constraint parameters are defined as:
(1e)
(1f)
(1g)
(1h)
(1i)
(1j)
(1k)
where is relative humidity, is the fraction of photosynthesis active radiation () absorbed by green vegetation cover, is the fraction of PAR intercepted by green vegetation cover, is the factional total vegetation cover, is the enhanced vegetation index, is the optimum plant growth temperature (°C), and is the water vapor pressure deficit (kPa). Among them, , , , , and are factors to be calibrated.

PML-V2 model

The PML-V2 expressed and as:
(2a)
where and are the available energy absorbed by the canopy and soil respectively (W/m2), is the density of air (1.25 kg/m3), is the specific heat of air at constant pressure (1,005 J/kg/K), is the water vapor pressure deficit of the air (kPa), is the aerodynamic conductance (m/s). The expression for is as follows:
(2b)
(2c)
where k is the extinction coefficient of visible radiation, is the initial slope of the light response curve to assimilation rate (μmol CO2 (μmol PAR)−1), is the initial slope of the CO2 response curve to assimilation rate (μmol m−2 s−1 (μmol m−2 s−1)−1), is the flux density of PAR (μmol m−2 s−1), is atmospheric CO2 concentration (μmol mol−1), fixed with 380 μmol mol−1, is the maximum photosynthetic rate obtained when both and are saturated (μmol m−2 s−1), is the maximum catalytic capacity of Rubisco per unit leaf area (, μmol m−2 s−1):
(2d)
where is the value of when , is temperature. a, and b are temperature coefficients given as 0.031 and 0.115, respectively.
is the ratio of soil evaporation to the equilibrium rate corresponding to the energy absorbed at the soil surface, originally set as a constant (Leuning et al. 2008). Here, we used the parameterization scheme that expresses the f as the ratio of 32 days of antecedent precipitation and soil equilibrium evaporation (Zhang et al. 2010; Gan et al. 2018):
(2e)
where is soil equilibrium evaporation (mm/day), P is precipitation (mm/day).
PML-V2 estimates using the modified rainfall interception model of Gash (van Dijk & Bruijnzeel 2001), which can be expressed as:
(2f)
(2g)
(2h)
where is the ratio of average evaporation rate over average precipitation intensity storms (unitless), is the fractional area covered by intercepting leaves (unitless), P is the daily precipitation (mm d−1), is the reference threshold rainfall amount if the canopy is wet (mm d−1), is the canopy rainfall storage capacity (mm d−1), is the water storage capacity per unit leaf area (mm), and is the specific ratio of average evaporation rate over average rainfall intensity during storms per unit canopy cover (unitless).

SGCF model

Han & Tian (2018) updated the minimum and maximum limits of based on the work of Han et al. (2012) and the derived boundary conditions. The SGCF can be written as:
(3a)
(3b)
(3c)
(3d)
(3e)
where and are maximum and minimum boundaries respectively. and have been suggested for a daily time scale since the function and model results are not sensitive to the and (Zhou et al. 2020). is a variable that corresponds to . and b are the PT coefficient (The subscript s is to distinguish from the parameter in the PT-JPL) and asymmetric parameter separately, both of which need to be adjusted under different environmental conditions (Han & Tian 2018; Wang et al. 2022).

METRIC model

The main formulas in METRIC are as follows:
(4a)
(4b)
(4c)
where H is the sensible heat flux (W/m2), is surface albedo, is incoming shortwave radiation (W/m2), and are incoming and outcoming longwave radiation (W/m2), is broadband surface thermal emissivity, and is aerodynamic resistance (s/m).
The model assumes a linear relationship between near-surface temperature gradients () and to reduce the errors caused by uncertainty in temperature measurement and definition:
(4d)
where a and b are parameters empirically determined in the image based on the different energy distributions of the selected cold and hot pixels. The cold pixel is the pixel with high vegetation cover and adequate irrigation, having the lowest and highest NDVI, while the hot pixel is the pixel in bare soil or desert areas with the highest and lowest NDVI. The exhaustive search algorithm (ESA) introduced by Bhattarai et al. (2017) was used for selecting cold and hot pixels, which can capture the best possible case for cold and hot pixels by not setting the search range before the start of the search process.

In this study, the code for all models was written in Python, and the code for the METRIC was derived from the open-source version pyMETRIC (https://github.com/hectornieto/pyMETRIC) built by Héctor Nieto.

Tower-based in situ measurements

To verify the performance of the four models in China, data from 18 flux sites were obtained from FLUXNET (https://fluxnet.org/data) (Pastorello et al. 2020) and the National Tibetan Plateau Science Data Centre (https://data.tpdc.ac.cn) (Liu et al. 2011, 2013, 2018; Jia et al. 2012; Xu et al. 2015; Guo et al. 2020; Xu & Liu 2021), covering eight ecosystems, including barrens (BAR), grasslands (GRA), croplands (CRO), wetlands (WET), deciduous broadleaf forests (DBF), evergreen needle forests (ENF), evergreen broadleaf forests (EBF) and mixed forests (MF). Based on the dryness (defined as the ratio of potential evaporation to precipitation and characterized by the dryness index ), China is divided into arid (), semi-arid (), semi-humid (), and humid zones () (https://www.resdc.cn), then the sites were categorized into four types of sites belonging to arid, semi-arid, semi-humid, and humid zones according to their geographic locations. The distribution and specific information of sites and climate zones are shown in Supplementary material, Figure S1 and Table 1. The time series variation of precipitation and LAI for each site is illustrated in Figure 1 and Supplementary material, Figure S2.
Table 1

The description of 18 flux sites used in this study

SiteLand coveraLongitude (°N)Latitude (°E)Elevation (m)EC height (m)DurationClimate types
HZZ BAR 100.32 38.77 1,731 2.85 2015–2021 Arid 
DM CRO 100.37 38.86 1,556 4.5 2015–2021 Arid 
HL CRO 115.79 40.36 480 3.5 2013–2019 Semi-humid 
MY CRO 117.32 40.63 350 10.66 2008–2010 Semi-humid 
YIK CRO 100.41 38.86 1,519 2.81 2007–2011 Arid 
HHL DBF 101.13 41.99 874 22 2014–2021 Arid 
SDQ DBF 101.14 42.00 873 2014–2021 Arid 
DHSb EBF 112.54 23.17 300 28 2003–2005 Humid 
QYZb ENF 115.06 26.74 111 39 2003–2005 Humid 
CBSb MF 128.10 42.40 738 30 2003–2005 Humid 
AR GRA 100.46 38.05 3,033 3.5 2014–2021 Semi-arid 
CL GRA 123.51 44.58 171 2007–2010 Semi-arid 
DXb GRA 91.07 30.50 4,333 2.2 2004–2005 Semi-arid 
DSL GRA 98.94 38.84 3,739 4.5 2014–2021 Semi-arid 
DLb GRA 116.28 42.05 1,350 2009–2010 Semi-arid 
HBA GRA 101.18 37.37 3,160 2.2 2002–2004 Semi-arid 
YAK GRA 100.24 38.01 4,148 3.2 2017–2021 Semi-arid 
HBSb WET 101.33 37.61 3,190 2.5 2003–2005 Semi-arid 
SiteLand coveraLongitude (°N)Latitude (°E)Elevation (m)EC height (m)DurationClimate types
HZZ BAR 100.32 38.77 1,731 2.85 2015–2021 Arid 
DM CRO 100.37 38.86 1,556 4.5 2015–2021 Arid 
HL CRO 115.79 40.36 480 3.5 2013–2019 Semi-humid 
MY CRO 117.32 40.63 350 10.66 2008–2010 Semi-humid 
YIK CRO 100.41 38.86 1,519 2.81 2007–2011 Arid 
HHL DBF 101.13 41.99 874 22 2014–2021 Arid 
SDQ DBF 101.14 42.00 873 2014–2021 Arid 
DHSb EBF 112.54 23.17 300 28 2003–2005 Humid 
QYZb ENF 115.06 26.74 111 39 2003–2005 Humid 
CBSb MF 128.10 42.40 738 30 2003–2005 Humid 
AR GRA 100.46 38.05 3,033 3.5 2014–2021 Semi-arid 
CL GRA 123.51 44.58 171 2007–2010 Semi-arid 
DXb GRA 91.07 30.50 4,333 2.2 2004–2005 Semi-arid 
DSL GRA 98.94 38.84 3,739 4.5 2014–2021 Semi-arid 
DLb GRA 116.28 42.05 1,350 2009–2010 Semi-arid 
HBA GRA 101.18 37.37 3,160 2.2 2002–2004 Semi-arid 
YAK GRA 100.24 38.01 4,148 3.2 2017–2021 Semi-arid 
HBSb WET 101.33 37.61 3,190 2.5 2003–2005 Semi-arid 

aBAR, barrens; CRO, croplands; DBF, deciduous broadleaf forests; EBF, evergreen broadleaf forests; ENF, evergreen needleleaf forests; GRA, grasslands; MF, mixed forests; WET, wetlands.

bFlux sites without the observations of G.

Figure 1

LAI and precipitation time series at eight representative sites. Green dots represent the original LAI time series, green lines represent the processed LAI time series by the Savitzky-Golay filter (LAI_SG) method, and blue vertical bars denote precipitation. BAR, barrens; CRO, croplands; DBF, deciduous broadleaf forests; EBF, evergreen broadleaf forests; ENF, evergreen needleleaf forests; GRA, grasslands; MF, mixed forests; WET, wetlands. The superscript a represents stations with missing precipitation data. (For the data of the remaining ten sites, see Supplementary material, Figure S2).

Figure 1

LAI and precipitation time series at eight representative sites. Green dots represent the original LAI time series, green lines represent the processed LAI time series by the Savitzky-Golay filter (LAI_SG) method, and blue vertical bars denote precipitation. BAR, barrens; CRO, croplands; DBF, deciduous broadleaf forests; EBF, evergreen broadleaf forests; ENF, evergreen needleleaf forests; GRA, grasslands; MF, mixed forests; WET, wetlands. The superscript a represents stations with missing precipitation data. (For the data of the remaining ten sites, see Supplementary material, Figure S2).

Close modal

These sites provide heat flux and meteorological data. The data used here include LE, H, Rn, G, air temperature, wind speed, relative humidity (RH), atmospheric pressure, and incident shortwave radiation. Since Rn is the basis of model simulation, all models utilized observed values except for METRIC (see Section 2.1 for specific formulas). For missing observations within a short period (less than 2 hours), use the linear interpolation to complete, otherwise use the mean daily variation method (Falge et al. 2001). In addition, the energy non-closure issue is inherent in EC systems (Twine et al. 2000), so we utilized the Bowen ratio () energy closure method (Twine et al. 2000) to minimize the impact of energy imbalances. For sites without observed data of G, energy closure was not applied, and these sites have been labeled in Table 1.

In this study, we used only daytime data to drive the model. Daytime is defined as the time when incident shortwave radiation is greater than 20 W/m2, and this criterion can help filter out the periods with relatively large flux observation errors (i.e. early morning, late evening, and nighttime) (Ershadi et al. 2014; Melo et al. 2021). The results of the PT-JPL, PML-V2, and SGCF were finally aggregated to the daytime scale, and the instantaneous scale outcomes of the METRIC were upscaled to the daytime scale using the evaporative fraction (EF) method (Brutsaert & Sugita 1992).

Remote sensing-based measurements

NDVI and enhanced vegetation index (EVI) were used to produce constraint factors , , and in PT-JPL, and NDVI was also used to choose cold and hot pixels in METRIC. LAI was utilized for the energy division in PT-JPL and PML-V2. These vegetation indices and LST were obtained from The Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6.1. The combination of MOD13A1 and MYD13A1 produced the 8-day NDVI and EVI data with a spatial resolution of 500 m. LAI was extracted from the MCD15A3H, which has a 4-day temporal resolution and 500 m spatial resolution. The MOD11_L2 is a near-instantaneous LST dataset with a spatial resolution of 1 km. Then these data were uniformly resampled to 500 m. Specifically, METRIC needs to be run on a regional scale, so we circled a 10 km × 10 km area around each site in Google Earth as the extent of the data download. For the other three models running at the point scale, the linear interpolation and Savitzky-Golay filter (Savitzky & Golay 1964) were applied to obtain vegetation indices on a continuous daily scale of each site, and then all these three models used the same values of vegetation indices for their parameterization, as well as . The vegetation indices data were processed and downloaded on the Google Earth Engine (GEE) (https://code.earthengine.google.com/), which is a cloud-based platform for spatial data analysis. The LST data were downloaded from Earthdata Search (https://search.earthdata.nasa.gov/), and then processed by Python.

Statistical analysis

Model performance was evaluated by four statistical metrics including coefficient of determination (R2), root mean square error (RMSE), percentage bias (PBIAS), and Nash–Sutcliffe efficiency coefficient (NSE) as follows:
(5a)
(5b)
(5c)
(5d)
where and mean simulated and observed values respectively. N is the number of samples. From to 1, represents the best model result.

Model calibration and validation

The empirical parameters that need to be calibrated for the three models (i.e. PT-JPL, PML-V2, and SGCF) are shown in Table 2. To obtain optimal parameters for specific environmental conditions, the value of the RMSE between observations and simulations for LE was minimized at each site using the simulated annealing (SA) algorithm, which starts from a certain high initial temperature and finds the global optimal solution of the objective function randomly with the decreasing temperature parameter (Metropolis et al. 1953; Kirkpatrick et al. 1983). In addition, all three models were calibrated at each site individually. The available data from each site were first divided equally into two parts, then each part served as the calibration and validation sets, respectively, and the validation sets from the two parts were eventually combined for model evaluation (Gan et al. 2018; Zhang et al. 2019).

Table 2

The calibration ranges of model parameters

ModelParametersUnitRangeReference
PT-JPL  Unitless (0.5, 2) Fisher et al. (2008); Zhang et al. (2017)  
 (0, 1) 
 (0, 1.5) 
 (0.9, 1.1) 
 (−0.048, −0.025) 
 (−0.06, −0.04) 
  (5, 40) 
PML-V2   (10, 120) Gan et al. (2018); Zhang et al. (2019)  
  (0.01, 0.07) 
  (0.01, 0.07) 
 unitless (2, 20) 
  (0.01, 0.3) 
 unitless (0.01, 0.3) 
SGCF  unitless (0.8, 1.5) Han & Tian (2018); Wang et al. (2020); Wang et al. (2022); Zhou et al. (2020)  
 (0.6, 20) 
ModelParametersUnitRangeReference
PT-JPL  Unitless (0.5, 2) Fisher et al. (2008); Zhang et al. (2017)  
 (0, 1) 
 (0, 1.5) 
 (0.9, 1.1) 
 (−0.048, −0.025) 
 (−0.06, −0.04) 
  (5, 40) 
PML-V2   (10, 120) Gan et al. (2018); Zhang et al. (2019)  
  (0.01, 0.07) 
  (0.01, 0.07) 
 unitless (2, 20) 
  (0.01, 0.3) 
 unitless (0.01, 0.3) 
SGCF  unitless (0.8, 1.5) Han & Tian (2018); Wang et al. (2020); Wang et al. (2022); Zhou et al. (2020)  
 (0.6, 20) 

Seasonal dynamics of predicted LE and measured LE

To assess the ability of the models to characterize the seasonal dynamics of LE, the simulated and observed daytime LE are illustrated in Figure 2 and Supplementary material, Figure S3. The observations of LE (EC measurements) are represented by hollow circles, and the simulations of LE are characterized by lines with different colors. Since the METRIC could not produce a continuous time series, its simulations were just added to the graph as points (characterized by red pentagrams). The evaluation metrics (R2, RMSE, PBIAS, and NSE) presented in each figure were calculated by the models using all available data points of each site. To present the seasonal variation of LE in more detail, Figure 2 only displays the changes in LE over 2 years for the sites. To avoid overlapping of lines, the left (PT-JPL and PML-V2) and right panels (SGCF and METRIC) show the simulation results of the four models, respectively. As shown in Figure 2 and Supplementary material, Figure S3, the simulations of four models were in good agreement with the EC measurements at all sites except for a BAR site (HZZ) and two DBF sites (HHL and SDQ). At the BAR site, the PML-V2 and SGCF weakened the volatility of the growing season (determined based on LAI (LAI > 0.2LAImax), mostly from April to October), and underestimated LE. At the DBF sites, the observed LE had a relatively long high-value duration, while neither the PT-JPL nor the SGCF reflected this feature. Overall, PT-JPL had a relatively high capability of modeling seasonal variation, but it tended to overestimate LE in the non-growing season at CRO sites (DM and HL) and DBF sites (HHL and SDQ). The PML-V2 could characterize changes in LE well during the growing season while failing to show fluctuations during the non-growing season. This phenomenon is even more significant at the alpine station AR, where from November 2014 to March 2015, the LE simulated by PML-V2 was almost zero. The SGCF had an overly smooth growth and decline rate of LE, which could not display fluctuating changes in LE well. Furthermore, it overestimated LE before and after the growth stage, making the period of the growing period longer than in reality, while the PML-V2 did the opposite.
Figure 2

Seasonal dynamics of observed LE and predicted LE for the data available period at eight representative sites. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient. (For the data of the remaining ten sites, see Supplementary material, Figure S3).

Figure 2

Seasonal dynamics of observed LE and predicted LE for the data available period at eight representative sites. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient. (For the data of the remaining ten sites, see Supplementary material, Figure S3).

Close modal

ET partitioning results of PT-JPL and PML-V2

The ET component fractions of PT-JPL and PML-V2 at each site are shown in Figure 3. T/ET (or ) ranged from 0.12 to 0.65 for PT-JPL, and from 0.57 to 0.98 for PML-V2, with mean values of 0.36 and 0.75, respectively. Compared to PML-V2, PT-JPL obtained lower and higher . The ET partitioning results of the two models varied considerably, so we referred to the observed data involved in other literature to evaluate the ability of the two models to divide the ET. We extracted simulations for the same dates as the literature and calculated the mean values for all dates to compare with the observations. The observed data in the literature were all from the Daman isotope ground observation station, and the results are shown in Table 3. The mean values of the observed data, PT-JPL and PML-V2 were 0.81, 0.59, and 0.90, respectively. The statistics indicated that PT-JPL tended to underestimate , while PML did the opposite.
Table 3

Average for observations and two models at the same periods

ReferenceObservationsPT-JPLPML-V2
Yang et al. (2015)  0.85 0.36 0.93 
Yang et al. (2018)  0.87 0.68 0.85 
Chen et al. (2021)  0.71 0.72 0.92 
ReferenceObservationsPT-JPLPML-V2
Yang et al. (2015)  0.85 0.36 0.93 
Yang et al. (2018)  0.87 0.68 0.85 
Chen et al. (2021)  0.71 0.72 0.92 
Figure 3

The simulated ET component fractions of PT-JPL and PML-V2 at 18 sites. is canopy transpiration, is soil evaporation, is interception evaporation.

Figure 3

The simulated ET component fractions of PT-JPL and PML-V2 at 18 sites. is canopy transpiration, is soil evaporation, is interception evaporation.

Close modal

Overall model skills

As the input data for each model is different, the number of model outputs varied. The PT-JPL, PML-V2, SGCF, and METRIC produced 26,354, 26,226, 25,383, and 393 daytime data points, respectively. The performance for models with the total data points of each site is shown in Figure 4. The correlation coefficients for the four models were greater than 0.7 at most sites, and compared to PT-JPL and METRIC, PML-V2 and SGCF have a smaller degree of dispersion. The average R2 and RMSE of LE at 18 flux sites were 0.72 and 49.66 W/m2 for PT-JPL, 0.73 and 50.03 W/m2 for PML-V2, 0.8 and 43.38 W/m2 for SGCF, 0.67 and 51.10 W/m2 for METRIC. In addition, according to the calculated model evaluation metrics, there was no significant difference between the results of the calibration set and validation set, with differences in R2 of less than 0.01 and RMSE of less than 0.8 W/m2. More specific information is available in the Supplementary materials, including the performance of four models with total data points at 18 sites, the calibration parameters of each site, and the differences in performance between the calibration and validation sets at each site (Supplementary material, Tables S1–S3).
Figure 4

Taylor diagrams of estimated daytime LE (W/m2) during the whole research period for four models at 18 flux sites.

Figure 4

Taylor diagrams of estimated daytime LE (W/m2) during the whole research period for four models at 18 flux sites.

Close modal
To conduct the model comparisons more fairly, the results shown in the main text below were generated from data commonly owned by each model, with a total of 386 data points. Figure 5 illustrates the scatter plots of daytime LE predicted by four models with LE measured at 18 flux sites. The R2 and RMSE of LE were 0.77 and 49.21 W/m2 for PT-JPL, 0.81 and 45.54 W/m2 for PML-V2, 0.85 and 44.60 W/m2 for SGCF, 0.81 and 47.49 W/m2 for METRIC. Among these four models, only the SGCF overestimated the LE with a PBIAS of 13.01%, while the other three models had PBIAS of −8.25, −6.70, and −2.34%, respectively. In addition, the NSE of all models was greater than 0.75, indicating a high level of model credibility. In general, the overall performance of the four models was satisfactory.
Figure 5

The scatter plots of daytime LE predicted by four models with LE measured at 18 flux sites.

Figure 5

The scatter plots of daytime LE predicted by four models with LE measured at 18 flux sites.

Close modal
In addition to exploring the overall performance of each model, we also explored the model performances at individual towers with the data shown in Figure 5. Figure 6 presents the statistical parameters for four models at each site and the average at all sites. The R2 of PT-JPL ranged from 0.22 to 0.94, with a mean value of 0.72. The average RMSE, PBIAS, and NSE were 51.71 W/m2, −12.20%, and 0.37, respectively. The R2 under PML-V2 varied from 0.38 to 0.97, with a mean value of 0.80. The average RMSE, PBIAS, and NSE were 45.65 W/m2, −9.46%, and 0.53, respectively. The R2 for SGCF was in the range of 0.47–0.91, with a mean value of 0.79. The average RMSE, PBIAS, and NSE were 41.13 W/m2, 12.69%, and 0.63, respectively. The R2 for METRIC had a range of 0.26–0.98, with a mean value of 0.70. The average RMSE, PBIAS, and NSE were 51.09 W/m2, −2.53%, and 0.33, respectively. PT-JPL and PML-V2 tended to underestimate LE at most of the sites, while SGCF overestimated them, and METRIC did not exhibit a significant trend. The results demonstrated that none of the models consistently outperformed the others at all flux sites. At most sites, it was either PML or SGCF that had the highest R2 and lowest RMSE. On average, PML and SGCF performed slightly better than PT-JPL and METRIC.
Figure 6

Statistical parameters for four models at each site and the average at all sites. BAR, Barrens; CRO, Croplands; DBF, Deciduous broadleaf forests; EBF, Evergreen broadleaf forests; ENF, Evergreen needleleaf forests; GRA, Grasslands; MF, Mixed forests; WET, Wetlands. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient.

Figure 6

Statistical parameters for four models at each site and the average at all sites. BAR, Barrens; CRO, Croplands; DBF, Deciduous broadleaf forests; EBF, Evergreen broadleaf forests; ENF, Evergreen needleleaf forests; GRA, Grasslands; MF, Mixed forests; WET, Wetlands. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient.

Close modal

Model skills under different land covers and climate types

The sites were categorized according to the land cover types and the model performance is shown in Figure 7. At the BAR site, all models performed poorly and only PML-V2 presented relatively reasonable results, with R2, RMSE, PBIAS, and NSE of 0.38, 24.83 W/m2, −7.63%, and 0.36, respectively. Contrary to the BRA site, all models performed well at the GRA sites, and PML-V2 had an overall better performance than the other models, with average R2, RMSE, PBIAS, and NSE of 0.83, 34.70 W/m2, −5.59%, and 0.69, respectively. At the CRO sites, METRIC outperformed the other models, with an average R2, RMSE, PBIAS, and NSE of 0.82, 46.23 W/m2, −3. 61%, and 0.73, respectively. At two DBF sites, all models significantly underestimated LE, and PT-JPL performed the worst with average R2, RMSE, PBIAS, and NSE of 0.61, 122.65 W/m2, −52.41%, and −1.15, respectively. In general, PML-V2 and SGCF showed better behavior than the other two models at all forest sites, and PT-JPL performed worse than METRIC at the DBF sites. At the WET site, PT-JPL showed the best performance, followed by SGCF, METRIC, and PML-V2.
Figure 7

Model skills under different land covers and climate zones.

Figure 7

Model skills under different land covers and climate zones.

Close modal

In the arid zone, PML-V2 obtained the highest average R2 (0.78) and relatively high RMSE (61.14 W/m2), followed by METRIC, having average R2, RMSE, PBIAS, and NSE of 0.77, 63.59 W/m2, −19.08%, and 0.10, respectively. SGCF yielded the lowest average RMSE (51.69 W/m2) and absolute PBIAS (8.03%). PT-JPL had the worst performance, with average R2, RMSE, PBIAS, and NSE of 0.63, 79.57 W/m2, −21.69%, and −0.16, respectively. All models performed well in the semi-arid zone, with an average R2 greater than 0.75, RMSE less than 39.0 W/m2, and NSE greater than 0.78. SGCF had the highest R2 (0.87) and lowest RMSE (36.14 W/m2) among four models. In the semi-humid zone, PML-V2 and SGCF exhibited better performance than PT-JPL and METRIC, with an average R2 greater than 0.75 and RMSE less than 51.0 W/m2. In the humid zone, PML-V2 and SGCF still outperformed PT-JPL and METRIC, with R2 of 0.85, and 0.76, respectively. METRIC performed the worst with average R2, RMSE, PBIAS, and NSE of 0.54, 61.48 W/m2, 16.80%, and −0.12, respectively.

Uncertainties of input data and data process

The overall performance of the four models was satisfactory, but estimation errors could not be avoided. Therefore, we first analyzed the potential causes of errors during the research process: (1) Energy non-closure is an inherent problem in EC systems, six flux stations involved in this study did not have soil heat flux observations, which meant that they were unable to conduct energy closure procedures. Although some studies have shown that the results without energy closure are similar to those with, this could still be a source of uncertainty (Cao et al. 2021; Melo et al. 2021). For the other 12 flux sites, the soil heat flux is usually measured below the surface (5–10 cm depth), while energy closure typically requires soil heat flux at the soil surface, and errors will be passed on in the energy closure procedure if corrections are not made for heat storage between the soil heat flux panel and the soil surface (Heusinkveld et al. 2004; Gan & Gao 2015). (2) The spatial resolution of the remote sensing data used was 500 m, while the footprint range of the EC system depends on the reference height of each station, and the mismatch between the EC source area of the surface observations and the pixel size of the remote sensing data can cause errors (Palmer et al. 2020), which was not considered in this study. (3) To obtain a continuous time series of vegetation indices at the site scale, LAI and NDVI were interpolated and smoothed. According to Figure 2, the smoothing process reduced the peaks of the LAI, which could be attributed to the fact that the SG filter smoothed VI variations within the local data window and underestimated the local extreme points, ultimately leading to the underestimation of LAI, as well as NDVI. (4) Reasonable parameter settings may go beyond the given parameter calibration ranges since the ranges were borrowed from other literature at the daily scale, while this study was conducted at the daytime scale, which might in turn lead to estimation errors.

To validate inference (4), we tested PT-JPL and PML-V2 using all data points at 18 sites. Specifically, the range of sensitive parameters in PT-JPL ( (0.5, 2.5), (0, 2), (0.5, 2.5)) and PML-V2 ( (0, 180), (0, 0.2), (0, 0.2)) were expanded while the range of insensitive parameters was kept unchanged, then the model results were obtained by re-calibration parameterization, and model results are shown in Supplementary material, Table S4. Compared with the original model results, it was found that adjusting the parameter range would significantly improve the model performance at some sites that performed poorly under the original parameter conditions (HHL and SDQ). Taking the HHL station as an example, the R2 of the PT-JPL and PML-V2 simulation results had been improved by 0.1 and 0.04, respectively, and the RMSE had decreased by 38.07 and 22.66 W/m2 respectively. While the parameter adjustments had little impact on the originally better-performing sites, most sites had R2 and RMSE variations in the range of 0.01–0.03 and 0.3–5 W/m2, respectively. In addition, the insensitive parameter values did not change significantly from the original (see Supplementary material, Table S5). Whereas the sensitive parameters had a relatively wide range of variation, for instance, in PT-JPL varied in the range of −0.4 to 0.4 at most sites, and in PML-V2 varied in the range of −65 to 65 μmol m−2 s−1. Overall, the poor model performance at some of the sites was partly due to unreasonable parameter range settings.

Model performance

PML-V2 or SGCF performed the best at the majority of sites, with R2 ranging from 0.38 (0.28, statistical results for all data points) to 0.97 (0.90), and 0.47 (0.40) to 0.91 (0.92), respectively. PML-V2 is based on the combination of energy balance and a mass transfer method that incorporates conductance factors, having a solid biophysical foundation. And its favorable performance could be attributed to this. Compared to the other three models, the input data for SGCF are all ground-based observations with relatively small uncertainties, which may be one of the reasons for its better performance (Gan et al. 2021). However, the simulations of SGCF exhibited a systematic bias, overestimating LE at 13 sites. Some existing studies presented similar results, but the reasons for the overestimation of LE by SGCF have not been elucidated (Wang et al. 2020; Yan & Gao 2021). We found some possible reasons for this phenomenon. First, the application of the complementary principle is based on the premise that the humidity conditions at the land surface can be expressed by atmospheric conditions, which requires sufficient feedback between the land surface and the atmosphere to eventually reach an equilibrium state (Han & Tian 2020; Wang et al. 2021). Morton (1983) suggested that the complementary theory does not apply to short time scales since it takes time to reach equilibrium between the land surface and the atmosphere. Although the applicability of the SGCF on various time scales has been demonstrated, it may still be one of the uncertainties contributing to estimation errors (Wang et al. 2021; Yan & Gao 2021). Second, several studies have shown that the parameters of CR-based models (e.g. and ) are not fixed over the year, usually changing with the state of vegetation growth and aridity (Gan et al. 2021). For example, Gan et al. (2021) evaluated a generalized complementary relationship ET model (GCR) at 60 flux sites around the world, finding that the key parameter () of the model was strongly correlated with atmospheric conditions, varying by 0.3–0.4 in a year, and model performance would be improved if parameter calibration was undertaken. Whereas in this study, both and b were fixed throughout the year, which might have led to bias in the estimation. In addition, the boundary values and for were used as constant values 0 and 1, while the boundary values are only close to 0 and 1 even under completely dry and completely wet conditions, which may be another source of estimation inaccuracy (Han & Tian 2018; Zhou et al. 2020). Third, the SGCF simulation had a relatively smooth change in LE (Figure 3), and the model set the RMSE as the calibration target. So to match the peaks, the entire simulation curve was lifted accordingly, eventually resulting in an overestimation of the rest of the period. In addition, based on the above reasons, the problems presented by SGCF in modeling seasonal dynamics could be rationally explained. SGCF had overly smooth LE growth and decline rates, failing to display fluctuations. This may be because the simulated results from SGCF are based on the assumption that the surface and the atmosphere have reached a steady state while the actual observed data have not. Furthermore, the overestimation of SGCF in the non-growing season and slightly underestimation in the growing season might be due to the lack of consideration of seasonal changes in vegetation cover.

Apart from the overestimation of SGCF, it is also worth noting that PML-V2 and PT-JPL underestimated LE at 11 and 12 sites, respectively. For PT-JPL, vegetation indices (LAI, NDVI, and EVI) are key constraints. First, LAI determines the distribution of available energy between soil and vegetation (energy is divided according to vegetation cover), thus determining potential transpiration. Second, NDVI and EVI are involved in the process of limiting potential transpiration to actual transpiration by biophysical constraint parameters. The underestimation of vegetation indices may result in a lower allocation of available energy to the canopy and the lower canopy transpiration constraint parameters (, , etc.), ultimately leading to the underestimation of (Kustas & Norman 1999; Yang et al. 2016). In this study, vegetation indices were underestimated due to the inherent flaws of the interpolation and smoothing methods, which could somewhat explain the underestimation of PT-JPL. PML-V2 simulated LE well in the growing season and underestimated LE in the non-growing season. Soil evaporation in the non-growing season is dominant, so for most sites (located in areas with low vegetation cover, or where vegetation growth is significantly seasonal), the underestimation of LE could be attributed to the underestimation of . use (soil evaporation constrain parameter, expressed as the ratio of cumulative P over the previous 32 days to cumulative ) to indirectly characterize soil moisture (equivalent to soil water content, i.e. SWC) status. Although is simple and easy to use, it has some drawbacks. First, assigns the same weight to daily P and . This may lead to the response of soil moisture to P not being captured since the effect of P on soil moisture diminishes over time after precipitation events (Zhang et al. 2010; Gan et al. 2018; Bai 2023), ultimately resulting in the underestimation of . Second, ignores the impact of snowfall and snowmelt. On the one hand, solid precipitation may not have been recorded during the non-growing season, leading to low observed precipitation. On the other hand, snowmelt changes the soil moisture state and then participates in the evaporation process. However, this part of evaporation is neglected. In addition, the sublimation of ice and snow is not taken into account. These reasons could explain why PML-V2 underestimated LE. For example, AR is an alpine site that exhibits significant seasonality, with simulated LE close to zero in the non-growing season, while DHS and QYZ sites located in warm and humid regions presented favorable simulation results in the non-growing season (as shown in Figure 2).

Based on the discussion above, the significant discrepancy between the ET partitioning results of PT-JPL and PML-V2 could also be explained. Since ground-based ET partitioning data are generally not widely available, it is difficult to judge the partitioning results. According to the site data (Daman station) and model outputs from other literature, in this study, PT-JPL tended to overestimate and , while PML-V2 did the opposite. The underestimation of , and overestimation of and by PT-JPL could be found in other literature (Luo et al. 2022; Bai 2023). The reasons for the underestimation of have been mentioned above. As for the overestimation of and , the reasons are as following. uses atmospheric moisture conditions (, expressed as ) to indirectly reflect the effect of soil moisture on soil evaporation, which may bring about inaccurate estimates of due to the decoupling of soil moisture conditions and atmospheric moisture conditions on daily and shorter time scales (Fisher et al. 2008; Gan et al. 2019). Similarly, the constraint parameter for (, expressed as ) uses RH to reflect the state of the P, ignoring the fact that RH and P are usually decoupled. Therefore, the current and equations may overestimate soil moisture and canopy interception, which in turn leads to overestimation of and . And then the overestimation of directly contributes to the underestimation of since they are tightly coupled through in the model (Luo et al. 2022; Bai 2023). The underestimation of in PML-V2 is the main reason for the overestimation of , as explained above. As for in PML-V2, most of the sites selected for this study are located in arid and semi-arid zones with low precipitation and low vegetation cover. Canopy interception will be relatively small in such an environment, so it may be reasonable that many sites have interception evaporation close to zero. In a word, PML-V2 tended to underestimate in the non-growing season, and consequently showed low values and high values.

Neither PT-JPL nor PML-V2 constrains soil evaporation directly through actual soil moisture content, and indirectly constraining soil evaporation through meteorological variables may introduce significant uncertainty in the simulation results. Based on the above analysis, we infer that the soil evaporation constraint parameter is the critical factor that causes PT-JPL to overestimate and PML-V2 to underestimate . To verify this speculation, we selected four stations with soil moisture observations and explored the trends between the soil evaporation constraint parameters and soil moisture status on the daily ((a)–(d) in Figure 8) and monthly scales ((e)–(h) in Figure 8). Where the soil evaporation constraint parameters are in PT-JPL and in PML-V2, respectively. Soil moisture status is characterized by the normalized soil moisture, expressed as . The daily-scale data were derived from the average of daily half-hourly observations, and the monthly-scale data were derived from the average of daily-scale data. To better show the details of the variation between soil moisture state and soil evaporation constraint parameters at daily scales, (a)–(d) in Figure 8 only present the variation over the two years for each of the four sites. Since the monthly-scale changes are clearer and more regular, (e)–(h) in Figure 8 present the trends for each of the four sites over the period of data availability. The results indicated that and soil moisture had a relatively consistent trend, which may be attributed to the precipitation data used and the water balance principle applied in the parameter. However, on both the daily and monthly scales, lagged behind the rise in soil moisture during phases of rising soil moisture. Whereas, during phases of declining soil moisture, preceded the decline in soil moisture. This might lead to the underestimation of in the non-growing season as well as the early and late growing season, which in turn led to the underestimation of . The results from DM, DSL, and HBS sites could support this statement. Different from , varied considerably and was not consistent with soil moisture. On the daily scale, was quite variable, probably due to the large variations of RH and VPD, suggesting that the use of atmospheric conditions to reflect the state of soil moisture would bring a large uncertainty. On the monthly scale, could not show seasonal characteristics, so it certainly failed to reflect the actual soil moisture status. According to the results of DSL and HBS, had been fluctuating above and below 0.7, and its overestimation might lead to the continuous overestimation of , eventually resulting in an overestimation of . Overall, the speculation (i.e. soil evaporation constraint parameters are key factors contributing to the uncertainty in estimating for PT-JPL and PML-V2) is somewhat proven by a comparative analysis of changing trends between the soil evaporation constraint parameters and soil moisture status.
Figure 8

The trends between the soil evaporation constraint parameters (i.e. in PT-JPL and in PML-V2) and soil moisture status (expressed as ) on the daily and monthly scales. (a)–(d) represent daily time scales, and (e)–(h) represent monthly time scales.

Figure 8

The trends between the soil evaporation constraint parameters (i.e. in PT-JPL and in PML-V2) and soil moisture status (expressed as ) on the daily and monthly scales. (a)–(d) represent daily time scales, and (e)–(h) represent monthly time scales.

Close modal

Model performance varied across land cover types. All models performed well under GRA sites, with R2 of 0.84, 0.83, 0.87, and 0.78, respectively, which may be due to the more homogeneous subsurface of GRA sites. METRIC outperformed the other models at the CRO sites, possibly owing to its internal calibration process and the fact that it was built specifically for croplands. The underestimation of LE in PT-JPL and PML-V2 might be due to that they did not take into account the fact that crops require a lot of irrigation during growth. All models underestimated LE at two DBF sites (HHL and SDQ), especially the other three models except for SGCF, where the PBIAS was even less than −50%, and this also contributed to a decrease in the average or overall performance of the models. These two sites are located in arid zones, where vegetation will have deep root systems that can utilize deeper soil moisture or even groundwater to maintain growth, but no model took the influences of groundwater into account during ET modeling (Chen et al. 2020). Combined with the underestimation of LAI mentioned above, LE was ultimately significantly underestimated.

Across climate zones, the models did not exhibit a characteristic decline in model performance with increasing aridity. Almost all models achieved the best results in the semi-arid zone, while in the other zones, model performance was variable. Except for METRIC, all models performed the worst in the arid zone, and both PT-JPL and PML-V2 performed second only to the semi-arid zone in the humid zone. However, this does not assert that the model performs better with increasing moisture since the sites dominating each region have different land cover types, and the model performance seems to be more influenced by land cover types.

Some studies have also shown that temperature-based models are more suitable for water-limited areas with partial vegetation cover, and conductance-based models work best for both water-stressed and energy-limited conditions for vegetated areas (Chen & Liu 2020). METRIC is a temperature-based model that uses LST to calculate H, and thus the energy balance residual LE, having an average R2 of 0.77, 0.72, 0.71, and 0.54 from the arid zone to the humid zone, respectively. The performance of METRIC decreases accordingly as the vegetation cover increases and the degree of water limitation decreases, which to some extent corroborates the statement mentioned above. PML-V2 is a conductance-based model that uses vegetation information obtained from optical remote sensing to calculate stomatal or canopy conductance, obtaining the highest R2 of 0.85 in the wet zone dominated by forest stations with high LAI and did not show a trend of decreasing performance with increasing dryness. The overall performance of PML-V2 seems not to validate that it performs better in water-limited and energy-limited areas. Therefore, when specific to a model or certain application scenarios, the general laws may become inapplicable.

In this study, combing remotely sensed data and site observations, four terrestrial ET models based on different theories were intercompared at 18 flux sites distributed in China. PML-V2 and SGCF had the best performance at most sites, with mean R2 of 0.80 (0.73, statistical results for all data points) and 0.79 (0.80), respectively. All models performed well in the semi-arid zone dominated by the GRA sites and did not show a decline in model performance with increasing aridity. PT-JPL and PML-V2 underestimated LE at most sites, while SGCF did the opposite. The ET partitioning results of PT-JPL and PML-V2 differed considerably. PT-JPL tended to overestimate and because of the unreasonable parameterization scheme of and . PML-V2 underestimated soil evaporation in the non-growing season due to the inherent shortcomings of its constraint parameter , and then resulted in the underestimation of . Overall, the performance of each model varied between sites, ecosystems, and climate zones, and the discrepancies in the results could be attributed to the uncertainties of the input data and inherent flaws in the model structure. Since the ET partitioning results still have a large uncertainty, more in-depth research is needed in the future.

This work was supported by the Key Project of the National Natural Science Foundation of China under Grant No. 42330506 and the General Program of the National Science Foundation of China under Grant No. 42071054. This work used meteorological and EC data acquired and shared by the National Tibetan Plateau Science Data Centre (https://data.tpdc.ac.cn) and the FLUXNET community (https://fluxnet.org/data/download-data).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Allen
R. G.
,
Tasumi
M.
&
Trezza
R.
2007
Satellite-based energy balance for mapping evapotranspiration with internalized calibration (METRIC)—Model
.
Journal of Irrigation and Drainage Engineering
133
(
4
),
380
394
.
Bai
P.
2023
Comparison of remote sensing evapotranspiration models: Consistency, merits, and pitfalls
.
Journal of Hydrology
617
,
128556
.
Bhattarai
N.
,
Shaw
S. B.
,
Quackenbush
L. J.
,
Im
J.
&
Niraula
R.
2016
Evaluating five remote sensing based single-source surface energy balance models for estimating daily evapotranspiration in a humid subtropical climate
.
International Journal of Applied Earth Observation and Geoinformation
49
,
75
86
.
Bhattarai
N.
,
Quackenbush
L. J.
&
Im
J.
2017
A new optimized algorithm for automating endmember pixel selection in the SEBAL and METRIC models
.
Remote Sensing of Environment
196
,
178
192
.
Brutsaert
W.
&
Sugita
M.
1992
Application of self-preservation in the diurnal evolution of the surface-energy budget to determine daily evaporation
.
Journal of Geophysical Research-Atmospheres
97
(
D17
),
18377
18382
.
Cao
M.
,
Wang
W.
,
Xing
W.
,
Wei
J.
,
Chen
X.
,
Li
J.
&
Shao
Q.
2021
Multiple sources of uncertainties in satellite retrieval of terrestrial actual evapotranspiration
.
Journal of Hydrology
601
,
126642
.
Chen
Y.
,
Xia
J.
,
Liang
S.
,
Feng
J.
,
Fisher
J. B.
,
Li
X.
,
Li
X.
,
Liu
S.
,
Ma
Z.
,
Miyata
A.
,
Mu
Q.
,
Sun
L.
,
Tang
J.
,
Wang
K.
,
Wen
J.
,
Xue
Y.
,
Yu
G.
,
Zha
T.
,
Zhang
L.
,
Zhang
Q.
,
Zhao
T.
,
Zhao
L. D.
&
Yuan
W.
2014
Comparison of satellite-based evapotranspiration models over terrestrial ecosystems in China
.
Remote Sensing of Environment
140
,
279
293
.
Chen
H.
,
Zhu
G.
,
Zhang
K.
,
Bi
J.
,
Jia
X.
,
Ding
B.
,
Zhang
Y.
,
Shang
S.
,
Zhao
N. D.
&
Qin
W.
2020
Evaluation of evapotranspiration models using different LAI and meteorological forcing data from 1982 to 2017
.
Remote Sensing
12
(
15
),
2473
.
Chen
H.
,
Huang
J. J.
,
McBean
E.
&
Singh
V. P.
2021
Evaluation of alternative two-source remote sensing models in partitioning of land evapotranspiration
.
Journal of Hydrology
597
,
126029
.
Ershadi
A.
,
McCabe
M. F.
,
Evans
J. P.
,
Chaney
N. W.
&
Wood
E. F.
2014
Multi-site evaluation of terrestrial evaporation models using FLUXNET data
.
Agricultural and Forest Meteorology
187
,
46
61
.
Falge
E.
,
Baldocchi
D.
,
Olson
R.
&
Anthoni
P.
2001
Gap filling strategies for long term energy flux data sets
.
Agricultural and Forest Meteorology
107
(
1
),
71
77
.
Fisher
J. B.
,
Melton
F.
,
Middleton
E.
,
Hain
C.
, Anderson, M., Allen, R., McCabe, M. F., Hook, S., Baldocchi, D., Townsend, P. A., Kilic, A., Tu, K., Miralles, D. D., Perret, J., Lagouarde, J.-P., Waliser, D., Purdy, A. J., French, A., Schimel, D., Famiglietti, J. S., Stephens, G. & Wood, E. F.
2017
The future of evapotranspiration: Global requirements for ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources
.
Water Resources Research
53
(
4
),
2618
2626
.
Gan
R.
,
Zhang
Y.
,
Shi
H.
,
Yang
Y.
,
Eamus
D.
, Cheng, L., Chiew, F. H. S. & Yu, Q.
2018
Use of satellite leaf area index estimating evapotranspiration and gross assimilation for Australian ecosystems
.
Ecohydrology
11
(
5
),
e1974
.
Gan
G.
,
Kang
T.
,
Yang
S.
,
Bu
J.
,
Feng
Z.
&
Gao
Y.
2019
An optimized two source energy balance model based on complementary concept and canopy conductance
.
Remote Sensing of Environment
223
,
243
256
.
Gan
G.
,
Liu
Y.
,
Chen
D.
&
Zheng
C.
2021
Investigation of a nonlinear complementary relationship model for monthly evapotranspiration estimation at global flux sites
.
Journal of Hydrometeorology
22
(
10
),
2645
2658
.
Guo
A.
,
Liu
S.
,
Zhu
Z.
,
Xu
Z.
,
Xiao
Q.
, Ju, Q., Zhang, Y. & Yang, X.
2020
Impact of lake/reservoir expansion and shrinkage on energy and water vapor fluxes in the surrounding area
.
Journal of Geophysical Research-Atmospheres
125
,
20
.
Heusinkveld
B. G.
,
Jacobs
A. F. G.
,
Holtslag
A. A. M.
&
Berkowicz
S. M.
2004
Surface energy balance closure in an arid region: Role of soil heat flux
.
Agricultural and Forest Meteorology
122
(
1–2
),
21
37
.
Jia
Z.
,
Liu
S.
,
Xu
Z.
,
Chen
Y.
&
Zhu
M.
2012
Validation of remotely sensed evapotranspiration over the Hai River Basin, China
.
Journal of Geophysical Research: Atmospheres
117
(
D13
),
n/a
n/a
.
Jingyi
B.
,
Guojing
G.
,
Jiahao
C.
,
Yanxin
S.
,
Mengjia
Y.
, Yanchun Gao, Domingo, D., López-Ballesteros, A., Migliavacca, M., El-Madany, T. S., Gentine, P., Jingfeng X. & Garcia, M.
2024
Dryland evapotranspiration from remote sensing solar-induced chlorophyll fluorescence: Constraining an optimal stomatal model within a two-source energy balance model
.
Remote Sensing of Environment
303
,
113999
.
Kirkpatrick
S.
,
Gelatt
C. D.
&
Vecchi
M. P.
1983
Optimization by simulated annealing
.
Science
220
(
4598
),
671
680
.
Liu
S. M.
,
Xu
Z. W.
,
Wang
W. Z.
,
Jia
Z. Z.
,
Zhu
M. J.
,
Bai
J.
&
Wang
J. M.
2011
A comparison of eddy-covariance and large aperture scintillometer measurements with respect to the energy balance closure problem
.
Hydrology and Earth System Sciences
15
(
4
),
1291
1306
.
Liu
S.
,
Li
X.
,
Xu
Z.
,
Che
T.
,
Xiao
Q.
, Ma, M., Liu, Q., Jin, R., Guo, J., Wang, L., Wang, W., Qi, Y., Li, H., Xu, T., Ran, Y., Hu, X., Shi, S., Zhu, Z., Tan, J., Zhang, Y. & Ren, Z.
2018
The heihe integrated observatory network: A basin-Scale land surface processes observatory in China
.
Vadose Zone Journal
17
(
1
),
1
21
.
Liu
Y.
,
Qiu
G.
,
Zhang
H.
,
Yang
Y.
,
Zhang
Y.
, Wang, Q., Zhao, W., Jia, L., Ji, X., Xiong, Y., Yan, C., Ma, N., Han, S. & Cui, Y.
2022
Shifting from homogeneous to heterogeneous surfaces in estimating terrestrial evapotranspiration: Review and perspectives
.
Science China Earth Sciences
65
(
2
),
197
214
.
Melo
D. C. D.
,
Anache
J. A. A.
,
Borges
V. P.
,
Miralles
D. G.
,
Martens
B.
, Fisher, J. B., Nobrega, R. L. B., Moreno, A., Cabral, O. M. R., Rodrigues, T. R., Bezerra, B., Silva, C. M. S., Neto, A. A. M., Moura, M. S. B., Marques, T. V., Campos, S., Nogueira, J. S., Rosolem, R., Souza, R. M. S., Antonino, A. C. D., Holl, D., Galleguillos, M., Perez-Quezada, J. F., Verhoef, A., Kutzbach, L., Lima, J. R. S., Souza, E. S., Gassman, M. I., Perez, C. F., Tonti, N., Posse, G., Rains, D., Oliveira, P. T. S. & Wendland, E.
2021
Are remote sensing evapotranspiration models reliable across south American ecoregions?
Water Resources Research
57
(
11
),
e2020WR028752
.
Metropolis
N.
,
Rosenbluth
A. W.
,
Rosenbluth
M. N.
,
Teller
A. H.
&
Teller
E.
1953
Equation of state calculations by fast computing machines
.
Journal of Chemical Physics
21
(
6
),
1087
1092
.
Monteith
J. L.
1965
Evaporation and environment
.
Symposia of the Society for Experimental Biology
19
,
205
234
.
Palmer
A. R.
,
Ezenne
G. I.
,
Choruma
D. J.
,
Gwate
O.
,
Mantel
S. K.
&
Tanner
J. L.
2020
A comparison of three models used to determine water fluxes over the Albany Thicket, Eastern Cape, South Africa
.
Agricultural and Forest Meteorology
288
,
107984
.
Pastorello
G.
,
Trotta
C.
,
Canfora
E.
,
Chu
H.
,
Christianson
D.
, Cheah, Y.-W., Poindexter, C., Chen, J., Elbashandy, A., Humphrey, M., Isaac, P., Polidori, D., Ribeca, A., van Ingen, C., Zhang, L., Amiro, B., Ammann, C., Arain, M. A., Ardö, J., Arkebauer, T., Arndt, S. K., Arriga, N., Aubinet, M., Aurela, M., Baldocchi, D., Barr, A., Beamesderfer, E., Marchesini, L. B., Bergeron, O., Beringer, J., Bernhofer, C., Berveiller, D., Billesbach, D., Black, T. A., Blanken, P. D., Bohrer, G., Boike, J., Bolstad, P. V., Bonal, D., Bonnefond, J.-M., Bowling, D. R., Bracho, R., Brodeur, J., Brümmer, C., Buchmann, N., Burban, B., Burns, S. P., Buysse, P., Cale, P., Cavagna, M., Cellier, P., Chen, S., Chini, I., Christensen, T. R., Cleverly, J., Collalti, A., Consalvo, C., Cook, B. D., Cook, D., Coursolle, C., Cremonese, E., Curtis, P. S., D'Andrea, E., da Rocha, H., Dai, X., Davis, K. J., De Cinti, B., de Grandcourt, A., De Ligne, A., De Oliveira, R. C., Delpierre, N., Desai, A. R., Di Bella, C. M., di Tommasi, P., Dolman, H., Domingo, F., Dong, G., Dore, S., Duce, P., Dufrêne, E., Dunn, A., Dušek, J., Eamus, D., Eichelmann, U., ElKhidir, H. A. M., Eugster, W., Ewenz, C. M., Ewers, B., Famulari, D., Fares, S., Feigenwinter, I., Feitz, A., Fensholt, R., Filippa, G., Fischer, M., Frank, J., Galvagno, M., Gharun, M., Gianelle, D., Gielen, B., Gioli, B., Gitelson, A., Goded, I., Goeckede, M., Goldstein, A. H., Gough, C. M., Goulden, M. L., Graf, A., Griebel, A., Gruening, C., Grünwald, T., Hammerle, A., Han, S., Han, X., Hansen, B. U., Hanson, C., Hatakka, J., He, Y., Hehn, M., Heinesch, B., Hinko-Najera, N., Hörtnagl, L., Hutley, L., Ibrom, A., Ikawa, H., Jackowicz-Korczynski, M., Janouš, D., Jans, W., Jassal, R., Jiang, S., Kato, T., Khomik, M., Klatt, J., Knohl, A., Knox, S., Kobayashi, H., Koerber, G., Kolle, O., Kosugi, Y., Kotani, A., Kowalski, A., Kruijt, B., Kurbatova, J., Kutsch, W. L., Kwon, H., Launiainen, S., Laurila, T., Law, B., Leuning, R., Li, Y., Liddell, M., Limousin, J.-M., Lion, M., Liska, A. J., Lohila, A., López-Ballesteros, A., López-Blanco, E., Loubet, B., Loustau, D., Lucas-Moffat, A., Lüers, J., Ma, S., Macfarlane, C., Magliulo, V., Maier, R., Mammarella, I., Manca, G., Marcolla, B., Margolis, H. A., Marras, S., Massman, W., Mastepanov, M., Matamala, R., Matthes, J. H., Mazzenga, F., McCaughey, H., McHugh, I., McMillan, A. M. S., Merbold, L., Meyer, W., Meyers, T., Miller, S. D., Minerbi, S., Moderow, U., Monson, R. K., Montagnani, L., Moore, C. E., Moors, E., Moreaux, V., Moureaux, C., Munger, J. W., Nakai, T., Neirynck, J., Nesic, Z., Nicolini, G., Noormets, A., Northwood, M., Nosetto, M., Nouvellon, Y., Novick, K., Oechel, W., Olesen, J. E., Ourcival, J.-M., Papuga, S. A., Parmentier, F.-J., Paul-Limoges, E., Pavelka, M., Peichl, M., Pendall, E., Phillips, R. P., Pilegaard, K., Pirk, N., Posse, G., Powell, T., Prasse, H., Prober, S. M., Rambal, S., Rannik, Ü., Raz-Yaseef, N., Reed, D., de Dios, V. R., Restrepo-Coupe, N., Reverter, B. R., Roland, M., Sabbatini, S., Sachs, T., Saleska, S. R., Sánchez-Cañete, E. P., Sanchez-Mejia, Z. M., Schmid, H. P., Schmidt, M., Schneider, K., Schrader, F., Schroder, I., Scott, R. L., Sedlák, P., Serrano-Ortíz, P., Shao, C., Shi, P., Shironya, I., Siebicke, L., Šigut, L., Silberstein, R., Sirca, C., Spano, D., Steinbrecher, R., Stevens, R. M., Sturtevant, C., Suyker, A., Tagesson, T., Takanashi, S., Tang, Y., Tapper, N., Thom, J., Tiedemann, F., Tomassucci, M., Tuovinen, J.-P., Urbanski, S., Valentini, R., van der Molen, M., van Gorsel, E., van Huissteden, K., Varlagin, A., Verfaillie, J., Vesala, T., Vincke, C., Vitale, D., Vygodskaya, N., Walker, J. P., Walter-Shea, E., Wang, H., Weber, R., Westermann, S., Wille, C., Wofsy, S., Wohlfahrt, G., Wolf, S., Woodgate, W., Li, Y., Zampedri, R., Zhang, J., Zhou, G., Zona, D., Agarwal, D., Biraud, S., Torn, M. & Papale, D.
2020
The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data
.
Scientific Data
7
(
1
),
225
.
Priestley
C. H. B.
&
Taylor
R. J.
1972
On the assessment of surface heat flux and evaporation using large-scale parameters
.
Monthly Weather Review
100
(
2
),
81
92
.
Savitzky
A.
&
Golay
M. J. E.
1964
Smoothing and differentiation of data by simplified least squares procedures
.
Analytical Chemistry
36
(
8
),
1627
1639
.
Twine
T. E.
,
Kustas
W. P.
&
Norman
J. M.
2000
Correcting eddy-covariance flux underestimates over a grassland
.
Agricultural and Forest Meteorology
103
(
3
),
279
300
.
Wang
L.
,
Tian
F.
,
Han
S.
&
Wei
Z.
2020
Determinants of the asymmetric parameter in the generalized complementary principle of evaporation
.
Water Resources Research
56
(
9
),
26570
26570
.
Wang
L. M.
,
Han
S. J.
&
Tian
F. Q.
2021
At which timescale does the complementary principle perform best in evaporation estimation?
Hydrology and Earth System Sciences
25
(
1
),
375
386
.
Wang
L. M.
,
Tian
F. Q.
,
Han
S. J.
,
Cui
T.
,
Meng
X. H.
&
Hu
H. C.
2022
Determination of the asymmetric parameter in complementary relations of evaporation in alpine grasslands of the Tibetan Plateau
.
Journal of Hydrology
605
,
127306
.
Xu
Z.
&
Liu
S.
2021
Multi-scale surface flux and meteorological elements observation dataset in the Hai River Basin (Miyun site-automatic weather station) (2008-2010). https://dx.doi.org/10.3972/haihe.001.2013.db
Xu
Z.
,
Tan
J.
,
Zhang
Y.
,
Li
X.
,
Liu
S.
&
Che
T.
2015
HiWATER: Dataset of hydrometeorological observation network (an observation system of meteorological elements gradient of A'rou Superstation, 2014). https://dx.doi.org/10.3972/hiwater.248.2015.db
Yang
Y. T.
,
Long
D.
,
Guan
H. D.
,
Liang
W.
,
Simmons
C.
&
Batelaan
O.
2015
Comparison of three dual-source remote sensing evapotranspiration models during the MUSOEXE-12 campaign: Revisit of model physics
.
Water Resources Research
51
(
5
),
3145
3165
.
Yang
Z. S.
,
Zhang
Q.
,
Yang
Y.
,
Hao
X. C.
&
Zhang
H. L.
2016
Evaluation of evapotranspiration models over semi-arid and semi-humid areas of China
.
Hydrological Processes
30
(
23
),
4292
4313
.
Yang
Y.
,
Qiu
J.
,
Zhang
R.
,
Huang
S.
,
Chen
S.
, Wang, H., Luo, J. & Fan, Y.
2018
Intercomparison of three two-source energy balance models for partitioning evaporation and transpiration in semiarid climates
.
Remote Sensing
10
(
7
),
1149
.
Zhang
Y.
,
Leuning
R.
,
Hutley
L. B.
,
Beringer
J.
,
McHugh
I.
&
Walker
J. P.
2010
Using long-term water balances to parameterize surface conductances and calculate evaporation at 0.05 degrees spatial resolution
.
Water Resources Research
46
,
5
.
Zhang
Y.
,
Kong
D.
,
Gan
R.
,
Chiew
F. H. S.
,
McVicar
T. R.
,
Zhang
Q.
&
Yang
Y.
2019
Coupled estimation of 500m and 8-day resolution global evapotranspiration and gross primary production in 2002–2017
.
Remote Sensing of Environment
222
,
165
182
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Supplementary data