ABSTRACT
To better understand the discrepancies in evapotranspiration (ET) simulations between ET models, we intercompared four models in China: Priestley–Taylor Jet Propulsion Laboratory (PT-JPL), Penman–Montieth–Leuning Version 2 (PML-V2), Sigmoid Generalized Complementary Function (SGCF), Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC). Data from 18 flux sites were used to evaluate the model performance at daytime (when incident shortwave radiation is greater than 20 W/m2) scales. To compare more fairly, we took the intersection of the outputs from four models for the analyses in the main text. All models yielded acceptable results, with PML-V2 or SGCF performing best at most sites. The average coefficient of determination and root mean square error among all sites of LE (latent heat of ET) were 0.72 and 51.71 W/m2 for PT-JPL, 0.80 and 46.65 W/m2 for PML-V2, 0.79 and 41.13 W/m2 for SGCF, 0.70 and 51.09 W/m2 for METRIC. PT-JPL and PML-V2 underestimated ET at most sites, whereas SGCF overestimated, potentially due to uncertainties in the vegetation indices and ET constraint parameters. Compared to measurements, PT-JPL underestimated the proportion of transpiration to evapotranspiration (0.81 versus 0.59), while PML-V2 overestimated (0.81 versus 0.90). Furthermore, all models performed best in the semi-arid zone dominated by grassland sites.
HIGHLIGHTS
This study evaluated the performance of four ET models that are seldomly compared together at 18 flux sites.
The flux sites are distributed across various biomes and climate zones in China.
All models yielded acceptable results, with PML-V2 and SGCF outperforming the others.
The ET partitioning results of PT-JPL and PML-V2 were quite different.
All models performed the best in the semi-arid zone.
INTRODUCTION
Evapotranspiration (ET), which mainly consists of soil evaporation (E) and vegetation transpiration (T), is a vital element in the soil–vegetation–atmosphere continuum, linking the water cycle, energy cycle, and carbon cycle. Accurate measurements and prediction of ET are important for drought and climate change monitoring, agricultural practices, water management, and weather forecasting (Chen et al. 2014; Fisher et al. 2017). However, ET is recognized as one of the most challenging components in the hydrologic cycle to be measured and simulated due to its complicated dependence on subsurface and climatic conditions. ET measurements can be obtained from micro-meteorological methods, and eddy covariance (EC) systems have been widely used over the past three decades. In addition, the development of remote sensing technology has also promoted the application of some ET methods since various land surface variables (e.g. leaf area index (LAI), normalized difference vegetation index (NDVI), and land surface temperature (LST)) retrieved from optical–thermal satellites can be used to drive ET models (Melo et al. 2021; Bai 2023).
ET estimation involves the complexity of the subsurface and the uncertainty of measuring at different spatial and temporal scales. Based on different assumptions and theories, six main types of ET models have been developed and applied to estimate ET (Hu et al. 2018; Liu et al. 2022): (1) combination-type models; (2) radiation-based models; (3) surface energy balance models; (4) vegetation index (VI)-radiative surface temperature models; (5) complementary relationship models; (6) data-driven models. Combination-type models are mainly based on the Penman–Monteith (PM) approach (Monteith 1965), which combines aerodynamic theories and energy balance principles. The representative models are Penman–Montieth–Leuning (PML) and improved PML models (e.g. PML-V2) (Leuning et al. 2008; Zhang et al. 2019), which focus on the physical quantification of canopy conductance and have great biophysical implications. Radiation-based models are mainly the simplifications of the PM formula, requiring relatively less meteorological and radiometric information to estimate ET (Priestley & Taylor 1972). Priestley–Taylor Jet Propulsion Laboratory (PT-JPL) is a typical radiation-based model, scaling potential ET to actual ET by parametrically constraining the ET components (Fisher et al. 2008). Surface energy balance models are concentrated on calculating the sensible heat flux, with the latent heat flux obtained as a residual term in the energy balance. Mapping ET at High Resolution with Internalized Calibration (METRIC) and Two Source Energy Balance (TSEB) are representatives of single-source energy balance and TSEB models, respectively (Kustas & Norman 1999; Allen et al. 2007). Unlike METRIC, TSEB characterizes the energy balance in the canopy and soil layers separately (Jingyi et al. 2024). VI-radiative surface temperature models are built based on the PT equation and the triangular or trapezoidal spatial relationship between VI and LST. One typical model is the Trapezoid Interpolation Model (TIM), obtaining actual ET by using the trapezoidal spatial relationship between LST and VI to adjust the PT-based potential ET algorithm. It is simple and has advantages in estimating ET over large areas (Jiang & Islam 2001). Complementary relationship models originally describe the complementary relationship between actual ET, apparent potential ET, and potential ET. Various forms of complementary relationships have been developed now, and all of them have been proven to be effective methods that do not require a comprehensive understanding of surface properties. For example, the Advection-Aridity approach describing linear complementary relationships has been widely used since its simplicity, and methods describing non-linear complementary relationships are also increasingly accepted (e.g. Sigmoid Generalized Complementary Function (SGCF)) (Brutsaert 1979; Han & Tian 2018). Data-driven models consider fewer physical mechanisms, directly associating ET with a variety of forcing data. Random forests, artificial neural networks, and so many other machine learning methods have been widely used now (Hu et al. 2021).
All models mentioned above differ in terms of parameterization, complexity of structure, and level of input data, leading to significant differences in model simulation results (Chen et al. 2014; Yang et al. 2016). Therefore, to select the appropriate ET models meeting different research objectives, it is necessary to understand how different models perform in specific land cover types and climatic regions, and model comparison is the most direct and effective way to do this (Bai 2023). For example, Ershadi et al. (2014) compared four ET models based on 20 FLUXNET towers, finding that none of the models outperformed the others under various biomes and that the ensemble average of the models produced the best overall estimates of ET across the towers. Yan & Gao (2021) compared five complementary relationship-based models and found that the structurally simple model was superior in the absence of calibration. Hu et al. (2021) made a comparison between physical-based, data-driven, and hybrid modeling approaches at 49 flux sites worldwide, concluding that data-driven models outperformed the others. Peddinti & Kisekka (2022) evaluated three surface energy balance models using flux site data and found that the two-source model performed better than the single-source models. Although a considerable amount of model comparison work has been undertaken, there are still some drawbacks to intercomparison and assessment of model performance. First, the existing work focuses on comparisons between models of the same type or variants of the same model (Yang et al. 2015; Bhattarai et al. 2016; Yan & Gao 2021; Jiang & Liu 2022; Peddinti & Kisekka 2022). Second, the ET models are usually assessed using flux sites located in America or Europe (Ershadi et al. 2014; Bhattarai et al. 2016; Chen et al. 2020; Bu et al. 2021; Hu et al. 2021; Melo et al. 2021).
In this study, we intercompared four models (PT-JPL, PML-V2, SGCF, and METRIC) that are seldomly compared together at 18 flux sites located in China, and the main aims of this study are to (1) assess the capacity of models to characterize seasonal variation; (2) evaluate the ability of multi-source models (PT-JPL and PML-V2) partitioning ET; (3) compare model performance across ecosystems and climate regions.
METHODOLOGY AND DATA
ET models
The four models are PT-JPL, PML-V2, SGCF, and METRIC. PT-JPL is a radiation-based model proposed by Fisher et al. (2008) based on the PT equation (Priestley & Taylor 1972). It uses ecophysiological, atmospheric, and soil constraints to scale the potential LE to actual LE, having a relatively simple structure and low input requirements. PML-V2 (Gan et al. 2018; Zhang et al. 2019) is a combination-type model based on the PM approach (Monteith 1965), combining aerodynamic theories and energy balance principles. It partitions ET into three components and integrates a canopy conductance model that couples carbon and water fluxes, having a robust physical foundation. SGCF (Han et al. 2012; Han & Tian 2018) is a complementary relationship-based model that describes the relationship between the ratio of actual ET to Penman potential ET () and the proportion of the radiation term in
(
). It does not require a comprehensive understanding of the surface properties, but only the input of meteorological data. METRIC (Allen et al. 2007) is a surface energy balance model, internally calibrated by ground-based reference ET (
), and calculates latent heat flux as a residual of the surface energy balance, relying on the input of thermal infrared imagery and optical imagery of vegetation. Since these four models are typical of their respective categories and few studies have combined them for model comparisons, we selected them for our study. More detailed information and formulas can be found below.
PT-JPL model






























PML-V2 model































SGCF model











METRIC model




In this study, the code for all models was written in Python, and the code for the METRIC was derived from the open-source version pyMETRIC (https://github.com/hectornieto/pyMETRIC) built by Héctor Nieto.
Tower-based in situ measurements





The description of 18 flux sites used in this study
Site . | Land covera . | Longitude (°N) . | Latitude (°E) . | Elevation (m) . | EC height (m) . | Duration . | Climate types . |
---|---|---|---|---|---|---|---|
HZZ | BAR | 100.32 | 38.77 | 1,731 | 2.85 | 2015–2021 | Arid |
DM | CRO | 100.37 | 38.86 | 1,556 | 4.5 | 2015–2021 | Arid |
HL | CRO | 115.79 | 40.36 | 480 | 3.5 | 2013–2019 | Semi-humid |
MY | CRO | 117.32 | 40.63 | 350 | 10.66 | 2008–2010 | Semi-humid |
YIK | CRO | 100.41 | 38.86 | 1,519 | 2.81 | 2007–2011 | Arid |
HHL | DBF | 101.13 | 41.99 | 874 | 22 | 2014–2021 | Arid |
SDQ | DBF | 101.14 | 42.00 | 873 | 8 | 2014–2021 | Arid |
DHSb | EBF | 112.54 | 23.17 | 300 | 28 | 2003–2005 | Humid |
QYZb | ENF | 115.06 | 26.74 | 111 | 39 | 2003–2005 | Humid |
CBSb | MF | 128.10 | 42.40 | 738 | 30 | 2003–2005 | Humid |
AR | GRA | 100.46 | 38.05 | 3,033 | 3.5 | 2014–2021 | Semi-arid |
CL | GRA | 123.51 | 44.58 | 171 | 2 | 2007–2010 | Semi-arid |
DXb | GRA | 91.07 | 30.50 | 4,333 | 2.2 | 2004–2005 | Semi-arid |
DSL | GRA | 98.94 | 38.84 | 3,739 | 4.5 | 2014–2021 | Semi-arid |
DLb | GRA | 116.28 | 42.05 | 1,350 | 4 | 2009–2010 | Semi-arid |
HBA | GRA | 101.18 | 37.37 | 3,160 | 2.2 | 2002–2004 | Semi-arid |
YAK | GRA | 100.24 | 38.01 | 4,148 | 3.2 | 2017–2021 | Semi-arid |
HBSb | WET | 101.33 | 37.61 | 3,190 | 2.5 | 2003–2005 | Semi-arid |
Site . | Land covera . | Longitude (°N) . | Latitude (°E) . | Elevation (m) . | EC height (m) . | Duration . | Climate types . |
---|---|---|---|---|---|---|---|
HZZ | BAR | 100.32 | 38.77 | 1,731 | 2.85 | 2015–2021 | Arid |
DM | CRO | 100.37 | 38.86 | 1,556 | 4.5 | 2015–2021 | Arid |
HL | CRO | 115.79 | 40.36 | 480 | 3.5 | 2013–2019 | Semi-humid |
MY | CRO | 117.32 | 40.63 | 350 | 10.66 | 2008–2010 | Semi-humid |
YIK | CRO | 100.41 | 38.86 | 1,519 | 2.81 | 2007–2011 | Arid |
HHL | DBF | 101.13 | 41.99 | 874 | 22 | 2014–2021 | Arid |
SDQ | DBF | 101.14 | 42.00 | 873 | 8 | 2014–2021 | Arid |
DHSb | EBF | 112.54 | 23.17 | 300 | 28 | 2003–2005 | Humid |
QYZb | ENF | 115.06 | 26.74 | 111 | 39 | 2003–2005 | Humid |
CBSb | MF | 128.10 | 42.40 | 738 | 30 | 2003–2005 | Humid |
AR | GRA | 100.46 | 38.05 | 3,033 | 3.5 | 2014–2021 | Semi-arid |
CL | GRA | 123.51 | 44.58 | 171 | 2 | 2007–2010 | Semi-arid |
DXb | GRA | 91.07 | 30.50 | 4,333 | 2.2 | 2004–2005 | Semi-arid |
DSL | GRA | 98.94 | 38.84 | 3,739 | 4.5 | 2014–2021 | Semi-arid |
DLb | GRA | 116.28 | 42.05 | 1,350 | 4 | 2009–2010 | Semi-arid |
HBA | GRA | 101.18 | 37.37 | 3,160 | 2.2 | 2002–2004 | Semi-arid |
YAK | GRA | 100.24 | 38.01 | 4,148 | 3.2 | 2017–2021 | Semi-arid |
HBSb | WET | 101.33 | 37.61 | 3,190 | 2.5 | 2003–2005 | Semi-arid |
aBAR, barrens; CRO, croplands; DBF, deciduous broadleaf forests; EBF, evergreen broadleaf forests; ENF, evergreen needleleaf forests; GRA, grasslands; MF, mixed forests; WET, wetlands.
bFlux sites without the observations of G.
LAI and precipitation time series at eight representative sites. Green dots represent the original LAI time series, green lines represent the processed LAI time series by the Savitzky-Golay filter (LAI_SG) method, and blue vertical bars denote precipitation. BAR, barrens; CRO, croplands; DBF, deciduous broadleaf forests; EBF, evergreen broadleaf forests; ENF, evergreen needleleaf forests; GRA, grasslands; MF, mixed forests; WET, wetlands. The superscript a represents stations with missing precipitation data. (For the data of the remaining ten sites, see Supplementary material, Figure S2).
LAI and precipitation time series at eight representative sites. Green dots represent the original LAI time series, green lines represent the processed LAI time series by the Savitzky-Golay filter (LAI_SG) method, and blue vertical bars denote precipitation. BAR, barrens; CRO, croplands; DBF, deciduous broadleaf forests; EBF, evergreen broadleaf forests; ENF, evergreen needleleaf forests; GRA, grasslands; MF, mixed forests; WET, wetlands. The superscript a represents stations with missing precipitation data. (For the data of the remaining ten sites, see Supplementary material, Figure S2).
These sites provide heat flux and meteorological data. The data used here include LE, H, Rn, G, air temperature, wind speed, relative humidity (RH), atmospheric pressure, and incident shortwave radiation. Since Rn is the basis of model simulation, all models utilized observed values except for METRIC (see Section 2.1 for specific formulas). For missing observations within a short period (less than 2 hours), use the linear interpolation to complete, otherwise use the mean daily variation method (Falge et al. 2001). In addition, the energy non-closure issue is inherent in EC systems (Twine et al. 2000), so we utilized the Bowen ratio () energy closure method (Twine et al. 2000) to minimize the impact of energy imbalances. For sites without observed data of G, energy closure was not applied, and these sites have been labeled in Table 1.
In this study, we used only daytime data to drive the model. Daytime is defined as the time when incident shortwave radiation is greater than 20 W/m2, and this criterion can help filter out the periods with relatively large flux observation errors (i.e. early morning, late evening, and nighttime) (Ershadi et al. 2014; Melo et al. 2021). The results of the PT-JPL, PML-V2, and SGCF were finally aggregated to the daytime scale, and the instantaneous scale outcomes of the METRIC were upscaled to the daytime scale using the evaporative fraction (EF) method (Brutsaert & Sugita 1992).
Remote sensing-based measurements
NDVI and enhanced vegetation index (EVI) were used to produce constraint factors ,
, and
in PT-JPL, and NDVI was also used to choose cold and hot pixels in METRIC. LAI was utilized for the energy division in PT-JPL and PML-V2. These vegetation indices and LST were obtained from The Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6.1. The combination of MOD13A1 and MYD13A1 produced the 8-day NDVI and EVI data with a spatial resolution of 500 m. LAI was extracted from the MCD15A3H, which has a 4-day temporal resolution and 500 m spatial resolution. The MOD11_L2 is a near-instantaneous LST dataset with a spatial resolution of 1 km. Then these data were uniformly resampled to 500 m. Specifically, METRIC needs to be run on a regional scale, so we circled a 10 km × 10 km area around each site in Google Earth as the extent of the data download. For the other three models running at the point scale, the linear interpolation and Savitzky-Golay filter (Savitzky & Golay 1964) were applied to obtain vegetation indices on a continuous daily scale of each site, and then all these three models used the same values of vegetation indices for their parameterization, as well as
. The vegetation indices data were processed and downloaded on the Google Earth Engine (GEE) (https://code.earthengine.google.com/), which is a cloud-based platform for spatial data analysis. The LST data were downloaded from Earthdata Search (https://search.earthdata.nasa.gov/), and then processed by Python.
Statistical analysis




Model calibration and validation
The empirical parameters that need to be calibrated for the three models (i.e. PT-JPL, PML-V2, and SGCF) are shown in Table 2. To obtain optimal parameters for specific environmental conditions, the value of the RMSE between observations and simulations for LE was minimized at each site using the simulated annealing (SA) algorithm, which starts from a certain high initial temperature and finds the global optimal solution of the objective function randomly with the decreasing temperature parameter (Metropolis et al. 1953; Kirkpatrick et al. 1983). In addition, all three models were calibrated at each site individually. The available data from each site were first divided equally into two parts, then each part served as the calibration and validation sets, respectively, and the validation sets from the two parts were eventually combined for model evaluation (Gan et al. 2018; Zhang et al. 2019).
The calibration ranges of model parameters
Model . | Parameters . | Unit . | Range . | Reference . |
---|---|---|---|---|
PT-JPL | ![]() | Unitless | (0.5, 2) | Fisher et al. (2008); Zhang et al. (2017) |
![]() | (0, 1) | |||
![]() | (0, 1.5) | |||
![]() | (0.9, 1.1) | |||
![]() | (−0.048, −0.025) | |||
![]() | (−0.06, −0.04) | |||
![]() | ![]() | (5, 40) | ||
PML-V2 | ![]() | ![]() | (10, 120) | Gan et al. (2018); Zhang et al. (2019) |
![]() | ![]() | (0.01, 0.07) | ||
![]() | ![]() | (0.01, 0.07) | ||
![]() | unitless | (2, 20) | ||
![]() | ![]() | (0.01, 0.3) | ||
![]() | unitless | (0.01, 0.3) | ||
SGCF | ![]() | unitless | (0.8, 1.5) | Han & Tian (2018); Wang et al. (2020); Wang et al. (2022); Zhou et al. (2020) |
![]() | (0.6, 20) |
Model . | Parameters . | Unit . | Range . | Reference . |
---|---|---|---|---|
PT-JPL | ![]() | Unitless | (0.5, 2) | Fisher et al. (2008); Zhang et al. (2017) |
![]() | (0, 1) | |||
![]() | (0, 1.5) | |||
![]() | (0.9, 1.1) | |||
![]() | (−0.048, −0.025) | |||
![]() | (−0.06, −0.04) | |||
![]() | ![]() | (5, 40) | ||
PML-V2 | ![]() | ![]() | (10, 120) | Gan et al. (2018); Zhang et al. (2019) |
![]() | ![]() | (0.01, 0.07) | ||
![]() | ![]() | (0.01, 0.07) | ||
![]() | unitless | (2, 20) | ||
![]() | ![]() | (0.01, 0.3) | ||
![]() | unitless | (0.01, 0.3) | ||
SGCF | ![]() | unitless | (0.8, 1.5) | Han & Tian (2018); Wang et al. (2020); Wang et al. (2022); Zhou et al. (2020) |
![]() | (0.6, 20) |
RESULTS
Seasonal dynamics of predicted LE and measured LE
Seasonal dynamics of observed LE and predicted LE for the data available period at eight representative sites. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient. (For the data of the remaining ten sites, see Supplementary material, Figure S3).
Seasonal dynamics of observed LE and predicted LE for the data available period at eight representative sites. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient. (For the data of the remaining ten sites, see Supplementary material, Figure S3).
ET partitioning results of PT-JPL and PML-V2




Average for observations and two models at the same periods
Reference . | Observations . | PT-JPL . | PML-V2 . |
---|---|---|---|
Yang et al. (2015) | 0.85 | 0.36 | 0.93 |
Yang et al. (2018) | 0.87 | 0.68 | 0.85 |
Chen et al. (2021) | 0.71 | 0.72 | 0.92 |
Reference . | Observations . | PT-JPL . | PML-V2 . |
---|---|---|---|
Yang et al. (2015) | 0.85 | 0.36 | 0.93 |
Yang et al. (2018) | 0.87 | 0.68 | 0.85 |
Chen et al. (2021) | 0.71 | 0.72 | 0.92 |
The simulated ET component fractions of PT-JPL and PML-V2 at 18 sites. is canopy transpiration,
is soil evaporation,
is interception evaporation.
The simulated ET component fractions of PT-JPL and PML-V2 at 18 sites. is canopy transpiration,
is soil evaporation,
is interception evaporation.
Overall model skills
Taylor diagrams of estimated daytime LE (W/m2) during the whole research period for four models at 18 flux sites.
Taylor diagrams of estimated daytime LE (W/m2) during the whole research period for four models at 18 flux sites.
The scatter plots of daytime LE predicted by four models with LE measured at 18 flux sites.
The scatter plots of daytime LE predicted by four models with LE measured at 18 flux sites.
Statistical parameters for four models at each site and the average at all sites. BAR, Barrens; CRO, Croplands; DBF, Deciduous broadleaf forests; EBF, Evergreen broadleaf forests; ENF, Evergreen needleleaf forests; GRA, Grasslands; MF, Mixed forests; WET, Wetlands. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient.
Statistical parameters for four models at each site and the average at all sites. BAR, Barrens; CRO, Croplands; DBF, Deciduous broadleaf forests; EBF, Evergreen broadleaf forests; ENF, Evergreen needleleaf forests; GRA, Grasslands; MF, Mixed forests; WET, Wetlands. R2, coefficient of determination; RMSE, root mean square error; PBIAS, percentage bias; NSE, Nash–Sutcliffe efficiency coefficient.
Model skills under different land covers and climate types
In the arid zone, PML-V2 obtained the highest average R2 (0.78) and relatively high RMSE (61.14 W/m2), followed by METRIC, having average R2, RMSE, PBIAS, and NSE of 0.77, 63.59 W/m2, −19.08%, and 0.10, respectively. SGCF yielded the lowest average RMSE (51.69 W/m2) and absolute PBIAS (8.03%). PT-JPL had the worst performance, with average R2, RMSE, PBIAS, and NSE of 0.63, 79.57 W/m2, −21.69%, and −0.16, respectively. All models performed well in the semi-arid zone, with an average R2 greater than 0.75, RMSE less than 39.0 W/m2, and NSE greater than 0.78. SGCF had the highest R2 (0.87) and lowest RMSE (36.14 W/m2) among four models. In the semi-humid zone, PML-V2 and SGCF exhibited better performance than PT-JPL and METRIC, with an average R2 greater than 0.75 and RMSE less than 51.0 W/m2. In the humid zone, PML-V2 and SGCF still outperformed PT-JPL and METRIC, with R2 of 0.85, and 0.76, respectively. METRIC performed the worst with average R2, RMSE, PBIAS, and NSE of 0.54, 61.48 W/m2, 16.80%, and −0.12, respectively.
DISCUSSION
Uncertainties of input data and data process
The overall performance of the four models was satisfactory, but estimation errors could not be avoided. Therefore, we first analyzed the potential causes of errors during the research process: (1) Energy non-closure is an inherent problem in EC systems, six flux stations involved in this study did not have soil heat flux observations, which meant that they were unable to conduct energy closure procedures. Although some studies have shown that the results without energy closure are similar to those with, this could still be a source of uncertainty (Cao et al. 2021; Melo et al. 2021). For the other 12 flux sites, the soil heat flux is usually measured below the surface (5–10 cm depth), while energy closure typically requires soil heat flux at the soil surface, and errors will be passed on in the energy closure procedure if corrections are not made for heat storage between the soil heat flux panel and the soil surface (Heusinkveld et al. 2004; Gan & Gao 2015). (2) The spatial resolution of the remote sensing data used was 500 m, while the footprint range of the EC system depends on the reference height of each station, and the mismatch between the EC source area of the surface observations and the pixel size of the remote sensing data can cause errors (Palmer et al. 2020), which was not considered in this study. (3) To obtain a continuous time series of vegetation indices at the site scale, LAI and NDVI were interpolated and smoothed. According to Figure 2, the smoothing process reduced the peaks of the LAI, which could be attributed to the fact that the SG filter smoothed VI variations within the local data window and underestimated the local extreme points, ultimately leading to the underestimation of LAI, as well as NDVI. (4) Reasonable parameter settings may go beyond the given parameter calibration ranges since the ranges were borrowed from other literature at the daily scale, while this study was conducted at the daytime scale, which might in turn lead to estimation errors.
To validate inference (4), we tested PT-JPL and PML-V2 using all data points at 18 sites. Specifically, the range of sensitive parameters in PT-JPL ( (0.5, 2.5),
(0, 2),
(0.5, 2.5)) and PML-V2 (
(0, 180),
(0, 0.2),
(0, 0.2)) were expanded while the range of insensitive parameters was kept unchanged, then the model results were obtained by re-calibration parameterization, and model results are shown in Supplementary material, Table S4. Compared with the original model results, it was found that adjusting the parameter range would significantly improve the model performance at some sites that performed poorly under the original parameter conditions (HHL and SDQ). Taking the HHL station as an example, the R2 of the PT-JPL and PML-V2 simulation results had been improved by 0.1 and 0.04, respectively, and the RMSE had decreased by 38.07 and 22.66 W/m2 respectively. While the parameter adjustments had little impact on the originally better-performing sites, most sites had R2 and RMSE variations in the range of 0.01–0.03 and 0.3–5 W/m2, respectively. In addition, the insensitive parameter values did not change significantly from the original (see Supplementary material, Table S5). Whereas the sensitive parameters had a relatively wide range of variation, for instance,
in PT-JPL varied in the range of −0.4 to 0.4 at most sites, and
in PML-V2 varied in the range of −65 to 65 μmol m−2 s−1. Overall, the poor model performance at some of the sites was partly due to unreasonable parameter range settings.
Model performance
PML-V2 or SGCF performed the best at the majority of sites, with R2 ranging from 0.38 (0.28, statistical results for all data points) to 0.97 (0.90), and 0.47 (0.40) to 0.91 (0.92), respectively. PML-V2 is based on the combination of energy balance and a mass transfer method that incorporates conductance factors, having a solid biophysical foundation. And its favorable performance could be attributed to this. Compared to the other three models, the input data for SGCF are all ground-based observations with relatively small uncertainties, which may be one of the reasons for its better performance (Gan et al. 2021). However, the simulations of SGCF exhibited a systematic bias, overestimating LE at 13 sites. Some existing studies presented similar results, but the reasons for the overestimation of LE by SGCF have not been elucidated (Wang et al. 2020; Yan & Gao 2021). We found some possible reasons for this phenomenon. First, the application of the complementary principle is based on the premise that the humidity conditions at the land surface can be expressed by atmospheric conditions, which requires sufficient feedback between the land surface and the atmosphere to eventually reach an equilibrium state (Han & Tian 2020; Wang et al. 2021). Morton (1983) suggested that the complementary theory does not apply to short time scales since it takes time to reach equilibrium between the land surface and the atmosphere. Although the applicability of the SGCF on various time scales has been demonstrated, it may still be one of the uncertainties contributing to estimation errors (Wang et al. 2021; Yan & Gao 2021). Second, several studies have shown that the parameters of CR-based models (e.g. and
) are not fixed over the year, usually changing with the state of vegetation growth and aridity (Gan et al. 2021). For example, Gan et al. (2021) evaluated a generalized complementary relationship ET model (GCR) at 60 flux sites around the world, finding that the key parameter (
) of the model was strongly correlated with atmospheric conditions, varying by 0.3–0.4 in a year, and model performance would be improved if parameter calibration was undertaken. Whereas in this study, both
and b were fixed throughout the year, which might have led to bias in the estimation. In addition, the boundary values
and
for
were used as constant values 0 and 1, while the boundary values are only close to 0 and 1 even under completely dry and completely wet conditions, which may be another source of estimation inaccuracy (Han & Tian 2018; Zhou et al. 2020). Third, the SGCF simulation had a relatively smooth change in LE (Figure 3), and the model set the RMSE as the calibration target. So to match the peaks, the entire simulation curve was lifted accordingly, eventually resulting in an overestimation of the rest of the period. In addition, based on the above reasons, the problems presented by SGCF in modeling seasonal dynamics could be rationally explained. SGCF had overly smooth LE growth and decline rates, failing to display fluctuations. This may be because the simulated results from SGCF are based on the assumption that the surface and the atmosphere have reached a steady state while the actual observed data have not. Furthermore, the overestimation of SGCF in the non-growing season and slightly underestimation in the growing season might be due to the lack of consideration of seasonal changes in vegetation cover.
Apart from the overestimation of SGCF, it is also worth noting that PML-V2 and PT-JPL underestimated LE at 11 and 12 sites, respectively. For PT-JPL, vegetation indices (LAI, NDVI, and EVI) are key constraints. First, LAI determines the distribution of available energy between soil and vegetation (energy is divided according to vegetation cover), thus determining potential transpiration. Second, NDVI and EVI are involved in the process of limiting potential transpiration to actual transpiration by biophysical constraint parameters. The underestimation of vegetation indices may result in a lower allocation of available energy to the canopy and the lower canopy transpiration constraint parameters (,
, etc.), ultimately leading to the underestimation of
(Kustas & Norman 1999; Yang et al. 2016). In this study, vegetation indices were underestimated due to the inherent flaws of the interpolation and smoothing methods, which could somewhat explain the underestimation of PT-JPL. PML-V2 simulated LE well in the growing season and underestimated LE in the non-growing season. Soil evaporation in the non-growing season is dominant, so for most sites (located in areas with low vegetation cover, or where vegetation growth is significantly seasonal), the underestimation of LE could be attributed to the underestimation of
.
use
(soil evaporation constrain parameter, expressed as the ratio of cumulative P over the previous 32 days to cumulative
) to indirectly characterize soil moisture (equivalent to soil water content, i.e. SWC) status. Although
is simple and easy to use, it has some drawbacks. First,
assigns the same weight to daily P and
. This may lead to the response of soil moisture to P not being captured since the effect of P on soil moisture diminishes over time after precipitation events (Zhang et al. 2010; Gan et al. 2018; Bai 2023), ultimately resulting in the underestimation of
. Second,
ignores the impact of snowfall and snowmelt. On the one hand, solid precipitation may not have been recorded during the non-growing season, leading to low observed precipitation. On the other hand, snowmelt changes the soil moisture state and then participates in the evaporation process. However, this part of evaporation is neglected. In addition, the sublimation of ice and snow is not taken into account. These reasons could explain why PML-V2 underestimated LE. For example, AR is an alpine site that exhibits significant seasonality, with simulated LE close to zero in the non-growing season, while DHS and QYZ sites located in warm and humid regions presented favorable simulation results in the non-growing season (as shown in Figure 2).
Based on the discussion above, the significant discrepancy between the ET partitioning results of PT-JPL and PML-V2 could also be explained. Since ground-based ET partitioning data are generally not widely available, it is difficult to judge the partitioning results. According to the site data (Daman station) and model outputs from other literature, in this study, PT-JPL tended to overestimate and
, while PML-V2 did the opposite. The underestimation of
, and overestimation of
and
by PT-JPL could be found in other literature (Luo et al. 2022; Bai 2023). The reasons for the underestimation of
have been mentioned above. As for the overestimation of
and
, the reasons are as following.
uses atmospheric moisture conditions (
, expressed as
) to indirectly reflect the effect of soil moisture on soil evaporation, which may bring about inaccurate estimates of
due to the decoupling of soil moisture conditions and atmospheric moisture conditions on daily and shorter time scales (Fisher et al. 2008; Gan et al. 2019). Similarly, the constraint parameter for
(
, expressed as
) uses RH to reflect the state of the P, ignoring the fact that RH and P are usually decoupled. Therefore, the current
and
equations may overestimate soil moisture and canopy interception, which in turn leads to overestimation of
and
. And then the overestimation of
directly contributes to the underestimation of
since they are tightly coupled through
in the model (Luo et al. 2022; Bai 2023). The underestimation of
in PML-V2 is the main reason for the overestimation of
, as explained above. As for
in PML-V2, most of the sites selected for this study are located in arid and semi-arid zones with low precipitation and low vegetation cover. Canopy interception will be relatively small in such an environment, so it may be reasonable that many sites have interception evaporation close to zero. In a word, PML-V2 tended to underestimate
in the non-growing season, and consequently showed low
values and high
values.


















The trends between the soil evaporation constraint parameters (i.e. in PT-JPL and
in PML-V2) and soil moisture status (expressed as
) on the daily and monthly scales. (a)–(d) represent daily time scales, and (e)–(h) represent monthly time scales.
The trends between the soil evaporation constraint parameters (i.e. in PT-JPL and
in PML-V2) and soil moisture status (expressed as
) on the daily and monthly scales. (a)–(d) represent daily time scales, and (e)–(h) represent monthly time scales.
Model performance varied across land cover types. All models performed well under GRA sites, with R2 of 0.84, 0.83, 0.87, and 0.78, respectively, which may be due to the more homogeneous subsurface of GRA sites. METRIC outperformed the other models at the CRO sites, possibly owing to its internal calibration process and the fact that it was built specifically for croplands. The underestimation of LE in PT-JPL and PML-V2 might be due to that they did not take into account the fact that crops require a lot of irrigation during growth. All models underestimated LE at two DBF sites (HHL and SDQ), especially the other three models except for SGCF, where the PBIAS was even less than −50%, and this also contributed to a decrease in the average or overall performance of the models. These two sites are located in arid zones, where vegetation will have deep root systems that can utilize deeper soil moisture or even groundwater to maintain growth, but no model took the influences of groundwater into account during ET modeling (Chen et al. 2020). Combined with the underestimation of LAI mentioned above, LE was ultimately significantly underestimated.
Across climate zones, the models did not exhibit a characteristic decline in model performance with increasing aridity. Almost all models achieved the best results in the semi-arid zone, while in the other zones, model performance was variable. Except for METRIC, all models performed the worst in the arid zone, and both PT-JPL and PML-V2 performed second only to the semi-arid zone in the humid zone. However, this does not assert that the model performs better with increasing moisture since the sites dominating each region have different land cover types, and the model performance seems to be more influenced by land cover types.
Some studies have also shown that temperature-based models are more suitable for water-limited areas with partial vegetation cover, and conductance-based models work best for both water-stressed and energy-limited conditions for vegetated areas (Chen & Liu 2020). METRIC is a temperature-based model that uses LST to calculate H, and thus the energy balance residual LE, having an average R2 of 0.77, 0.72, 0.71, and 0.54 from the arid zone to the humid zone, respectively. The performance of METRIC decreases accordingly as the vegetation cover increases and the degree of water limitation decreases, which to some extent corroborates the statement mentioned above. PML-V2 is a conductance-based model that uses vegetation information obtained from optical remote sensing to calculate stomatal or canopy conductance, obtaining the highest R2 of 0.85 in the wet zone dominated by forest stations with high LAI and did not show a trend of decreasing performance with increasing dryness. The overall performance of PML-V2 seems not to validate that it performs better in water-limited and energy-limited areas. Therefore, when specific to a model or certain application scenarios, the general laws may become inapplicable.
CONCLUSION
In this study, combing remotely sensed data and site observations, four terrestrial ET models based on different theories were intercompared at 18 flux sites distributed in China. PML-V2 and SGCF had the best performance at most sites, with mean R2 of 0.80 (0.73, statistical results for all data points) and 0.79 (0.80), respectively. All models performed well in the semi-arid zone dominated by the GRA sites and did not show a decline in model performance with increasing aridity. PT-JPL and PML-V2 underestimated LE at most sites, while SGCF did the opposite. The ET partitioning results of PT-JPL and PML-V2 differed considerably. PT-JPL tended to overestimate and
because of the unreasonable parameterization scheme of
and
. PML-V2 underestimated soil evaporation in the non-growing season due to the inherent shortcomings of its constraint parameter
, and then resulted in the underestimation of
. Overall, the performance of each model varied between sites, ecosystems, and climate zones, and the discrepancies in the results could be attributed to the uncertainties of the input data and inherent flaws in the model structure. Since the ET partitioning results still have a large uncertainty, more in-depth research is needed in the future.
ACKNOWLEDGEMENTS
This work was supported by the Key Project of the National Natural Science Foundation of China under Grant No. 42330506 and the General Program of the National Science Foundation of China under Grant No. 42071054. This work used meteorological and EC data acquired and shared by the National Tibetan Plateau Science Data Centre (https://data.tpdc.ac.cn) and the FLUXNET community (https://fluxnet.org/data/download-data).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.