Hydrological models and remote sensing evapotranspiration (ET) models usually are used to estimate regional ET. This study aims to integrate the advantages of both the models to simulate the daily ET processes. A compromise between these two methodologies is represented by improving the optimization of the hydrological model on the basis of a new probability optimal ET series, which is produced by a data assimilation scheme combining sparse remote estimates and continuous modeling of regional ETs. The distributed time-variant gain hydrological model (DTVGM) and a two-layer remote sensing ET model are chosen. First, the DTVGM is optimized by maximizing the Nash–Sutcliffe efficiency of daily streamflow in the Shahe River basin, and simulates the daily hydrological processes of 1999–2007. For improving the accuracy of continuous ET simulation, the DTVGM is further optimized by dual objective functions composed of the assimilated ETs and observed outlet discharge. The results show that the accuracy of the DTVGM-based daily ETs is improved after the dual optimization, and the mean absolute percentage error between the DTVGM-based ETs and the measured ETs in the study area is reduced by 5.84%. The integrated method is proved better, and improves the hydrology modeling accuracy.

## INTRODUCTION

Evapotranspiration (ET), one of the most important hydrological processes, including soil evaporation and vegetation transpiration, is the key link between the land surface energy and water balances (*WBs*) (Norman *et al.* 2003; Anderson *et al.* 2007; Galleguillos *et al.* 2011; Long & Singh 2013). Currently, the regional ET in a river basin is normally estimated by the following three methods: empirical formulae, hydrological process models, and remote sensing ET models (RSTMs), each of which is insufficient to obtain precise and continuous regional ET. The associated empirical formulae mainly include the *WB* method, water heat coupling equations, and the complementary relationship method. The first two ET methods are suitable for calculating regional ET values over long time scales within closed basins, but with low precision (Wilson *et al.* 2001; Cammalleri *et al.* 2010; Wang & Dickinson 2012). The complementary relationship between actual and potential ET is not steady, and is tightly related to weather conditions (Szilagyi & Jozsa 2008), thus the relationship explaining the evaporation paradox is also unsteady (Liu *et al.* 2006). Therefore, regional ET estimates based on the relationship between actual and potential ET also contain uncertainties.

Hydrological models are good tools for simulating and analyzing the existence, movement, and distribution of water in a particular basin (Werth *et al.* 2009). Usually, the river basin is divided into a number of hydrological response units for hydrological process simulation. As the simulation error is cumulative in the simulation process, the ETs simulated by the hydrological models have low precision (Rakovec *et al.* 2012). Furthermore, poor representation of the dynamic underlying surface changes and the observation errors in the model input data lead to additional uncertainties in hydrological models (Immerzeel & Droogers 2008). Although hydrological models could be optimized based on the runoff observation, the optimization only refers to the *WB* mechanism, not the surface energy balance. The efficient optimization of ET processes simulation is affected by the lack of reliable observed ET.

The RSTMs can estimate the surface ET with sufficient accuracy, and have become important tools for estimating regional ET (Mauser & Schädlich 1998; Bastiaanssen *et al.* 2005; Kalma *et al.* 2008; Mercado *et al.* 2009). The applicability of RSTMs based on the surface energy balance is restricted by the lack of high-frequency and high-resolution thermal data (Baroncini *et al.* 2008). Therefore, continuous ET processes are difficult to monitor when using RSTMs alone (Cammalleri & Ciraolo 2012).

For practical applications, effective estimates of time-continuous daily, monthly, and yearly ET processes at the river basin scale are especially important for water resource management, climate change studies, and agricultural planning (Peters-Lidard *et al.* 2011). The RSTMs have the advantages of producing regional ETs with high spatial resolution and accuracy, but with poor temporal resolution, while hydrological models are the most effective means of obtaining time-continuous ETs. Thus, a new method could be developed to take advantage of the complementary properties of RSTMs and hydrological models, allowing both high precision and time-continuous ET estimates.

Currently, relatively few studies have considered such a combination (Pipunic *et al.* 2008). Irmak & Kamble (2009) used the 18 daily ETs from the SEBAL (surface energy balance algorithm for land) remote sensing model as observations to optimize the parameters of the SWAP (soil, water, atmosphere and plant) model based on the genetic algorithm, and improved the accuracy of the ET process simulation. However, the number of ET observation values was very small, and the optimization did not consider the observed outlet runoff; therefore, there were limited improvements in the results, and uncertainties remained in the SWAP simulation. Additionally, remote sensing ET is essentially simulated data, not actual data, so it is not suitable to be used as observations to optimize the hydrological simulation directly. The method will bring new uncertainties. Han *et al.* (2012) used data assimilation (DA) technology to combine remote sensing ETs and hydrological model-based ETs, while Parada & Liang (2008) demonstrated that the DA-based results have an optimal probability. Therefore, the DA-based ETs are better suited and more reliable for constructing the objective functions, when optimizing the hydrological models, than remotely sensed ETs.

This study aims to propose an integrated ET simulation approach integrating the advantages of the remote sensing and the hydrological model, by improving the hydrological model optimization. The optimization is not only based on the runoff observation, but also on a new probability optimal ET series which is produced by a DA scheme combining sparse remote estimates and continuous modeling of regional ET. The daily ETs in the Shahe River basin, Beijing, for typical days between 1999 and 2007, are estimated by a two-layer RSTM proposed by Zhang *et al.* (2007), using Landsat TM/ETM + data. The daily ETs in the Shahe River basin, Beijing, were also simulated by a chosen hydrological model, a distributed time-variant gain hydrology model (DTVGM) proposed by Xia *et al.* (2005). DTVGM has a small computational cost and has been applied successfully in the Haihe River basin, Huaihe River basin, and other areas in China (Li *et al.* 2010). A widely used DA algorithm, the ensemble Kalman filter (EnKF), is used to assimilate the regional ET series from the RSTM and DTVGM. The DTVGM is optimized by dual objective functions composed of the assimilated ETs and observed outlet discharge, and the optimized DTVGM is used to produce continuous daily ETs with higher precision. The rest of the paper is structured as follows. The next section describes the basic framework along with the data, models, and methods used. This is followed by a section presenting the simulated results and analysis. The paper closes with a discussion and conclusions.

## DATA AND METHODOLOGY

### Study area

The Shahe River basin (Figure 1) is located between 40°00′– 40°30′N and 115°50′– 116°20′E in Beijing, China, and has a drainage area of approximately 1,125 km^{2}, 75% of which is mountainous. Land cover types include woodland, grassland, water bodies, developed land, and farmland. Most woodland is located in the upstream regions of the basin, while the developed land and farmland are primarily distributed in downstream regions. Woodland accounts for the largest proportion, covering approximately 47.5% of the total basin area; farmland is the second largest land cover type, accounting for 34.3% of the total area. The Shahe River flows from northwest to southeast, and the entire river basin is located in a warm climatic zone with a semi-humid and semi-arid climate. The winter is dry and cold; there is low rainfall in the spring and heavy rainfall in the summer. The average annual temperature from 1961 to 2007 was 11.5 °C, ranging from −4.1 °C in January to 25.7 °C in July. The average annual precipitation is 548 mm, approximately 80% of which falls during the period of June to September.

### Data

The data in this study mainly include satellite images, meteorological measurements, hydrological observations, and some geographic information system (GIS) data. Eighteen cloud-free Landsat images from 1999 to 2007 were chosen from the US Geological Survey (USGS) data center (http://glovis.usgs.gov/). Based on image registration, radiometric correction and some necessary preprocessing of the initial images, the vegetation cover, surface albedo, surface temperature, and some other necessary parameters were estimated following the methods of Liang (2000) and Qin *et al.* (2001).

Hydrological observation data, including runoff, rainfall, water levels, and reservoir management information, were collected from two precipitation stations and six hydrological stations distributed within the basin. Meteorological measurements, including solar radiation, air temperature, vapor pressure, and wind speed at the Beijing meteorological station were obtained from the Chinese Meteorological Administration.

In addition, GIS data including a digital elevation model (DEM) with spatial resolution of 30 m, a 1:10,000 land use map, 1:100,000 soil type map, and 1:250,000 drainage map of the study area were obtained from the Science Data Center of Resources and Environment, Chinese Academy of Sciences (CAS). All GIS data sets were converted into ESRI grid format with the same grid size and map projection. Table 1 shows the types of data used in the study.

Data | Items | Sources |
---|---|---|

Hydrologic and meteorological data | Rainfall and runoff | Hydrologic Statistical Yearbook |

Wind speed | Chinese Meteorological Administration | |

Air temperature | ||

Sunshine hours | ||

Humidity | ||

Remote sensing data | Landsat data | USGS: http://glovis.usgs.gov |

GIS data | DEM (30 m) | Science Data Center of Resources and Environment, CAS |

1:10,000 land use map | ||

1:100,000 soil classification map | ||

1:250,000 drainage map |

Data | Items | Sources |
---|---|---|

Hydrologic and meteorological data | Rainfall and runoff | Hydrologic Statistical Yearbook |

Wind speed | Chinese Meteorological Administration | |

Air temperature | ||

Sunshine hours | ||

Humidity | ||

Remote sensing data | Landsat data | USGS: http://glovis.usgs.gov |

GIS data | DEM (30 m) | Science Data Center of Resources and Environment, CAS |

1:10,000 land use map | ||

1:100,000 soil classification map | ||

1:250,000 drainage map |

### Methodology

In this study, a new integrated approach is developed to optimize the DTVGM for accurate daily ETs, based on the assimilated ETs and observed outlet discharge. The DTVGM is optimized under two different scenarios, and the flowchart of the methodology is shown in Figure 2. First, the DTVGM is calibrated and validated using the shuffled complex evolution-University of Arizona (SCE-UA) algorithm (Duan *et al.* 1994) with the Nash–Sutcliffe efficiency (*NSE*) (Gupta *et al.* 2009) of runoff as the objective function, then the model outputs the daily ETs in the Shahe River basin as Scenario 1 in Figure 2. The DTVGM-simulated daily ETs and the RSTM-based ETs are assimilated by EnKF. The root mean square error (RMSE) between the assimilated ETs and the simulated ETs, denoted as *RMSE*_{ET}, is utilized as another objective function to further optimize the constructed DTVGM, combined with the *NSE*, as Scenario 2 in Figure 2. The new integrated approach following Scenario 2 will produce continuous daily ETs with high accuracy.

#### Distributed time-variant gain model

*et al.*2005). DTVGM integrates the nonlinear systems approach and distributed hydrological simulation technology, can represent the nonlinear relationship between rainfall and runoff under the influence of human activities, and includes an excellent representation of dynamic underlying surface changes. The model has been successfully applied in the Haihe River and Heihe River basins in China (Li

*et al.*2010). Briefly, the DTVGM includes multiple components of hydrological analysis and modeling, such as distributed input data processing, a runoff generation model for each hydrological response unit, and a flow-routing model between adjacent concentration belts. The ET process simulation by DTVGM includes calculations of potential ET (

*E*) and actual ET (

_{p}*E*). Several methods can be adopted to simulate potential ET, such as the Penman–Monteith formula, the Priestley–Taylor formula and the Hargreaves formula. The Hargreaves formula is a potential ET empirical formula on the basis of the infiltration mechanism (Hargreaves & Samani 1985), which provided a better method with good agreement with Penman–Monteith in the Beijing area, and has the advantage of fewer input parameters (Luo & Rong 2007). Thus, the Hargreaves formula is utilized to calculate the

_{a}*E*for the DTVGM: where

_{p}*T*is the average temperature [°C],

*R*is total shortwave radiation [MJ/(m

_{s}^{−2}·day)], and is the latent heat of water vaporization,

*λ*

*=*2 .45 MJ/kg.

*E*can be expressed as a function of

_{a}*E*and soil moisture (

_{p}*W*): where

*P*is the precipitation [mm],

*W*is the soil moisture content [mm],

*WM*is the saturated soil moisture content [mm],

*KAW*is a coefficient between 0 and 1,

*KET*is a linear or nonlinear function of

*P*/

*ET*, and

_{p}*f*() is a linear or nonlinear function (Xia

*et al.*2005; Song

*et al.*2012).

*et al.*1994), which combines the direction-searching of deterministic, non-numerical methods and the robustness of stochastic, non-numerical methods. It adopts competition evolution theory, concepts of controlled random search, the complex shuffling method, and downhill simplex procedures to obtain a global optimal estimation (Mariani & Coelho 2011). The objective function of DTVGM is defined in Equation (3). In the optimization carried out during Scenario 1 (see Figure 2), the model is calibrated by maximizing the

*NSE*of daily outlet discharge. The

*NSE*is defined as: where

*Q*and

_{obs,i}*Q*are the observed and simulated values of daily outlet discharge at the date

_{sim,i}*i*, respectively. is the average observed value, and

*n*is the number of observations.

*RMSE*

_{ET}constructed by ET, as well as the

*NSE*constructed by daily outlet discharge. The

*RMSE*

_{ET}is defined as follows: where the ET

_{DA,i}is the ET assimilated via the DA approach on date

*i*, ET

_{DTVGM,i}is the ET simulated by the DTVGM on date

*i*, and

*m*is the number of assimilated ETs.

*RMSE*

_{ET}while ensuring

*NSE*> 0.75. Based on the objective functions for

*NSE*and

*RMSE*

_{ET}, the SCE-UA algorithm further optimizes the DTVGM for simulating a new series of continuous daily ETs with high accuracy. The

*WB*coefficient is used to evaluate the agreement between the simulated and observed runoff volume in the two scenarios. The

*WB*is closer to 1, the effect of

*WB*simulation is better (Song

*et al.*2012):

In the DTVGM, there are nearly 50 parameters, and some parameters are not important for the output response of interest. Therefore, the eight important or sensitive parameters listed in Table 2 are selected and all the parameters are uniformly distributed in the ranges (Li *et al.* 2010; Song *et al.* 2012).

Parameter | Description | Range | Optimization result | |
---|---|---|---|---|

Scenario 1 | Scenario 2 | |||

g_{1} | Time-variant gain factor, related to surface runoff generation | [0.01, 1.0] | 0.231 | 0.220 |

g_{2} | Time-variant gain factor, related to soil moisture content | [1.0, 5.0] | 2.68 | 2.711 |

Kr | Storage-outflow coefficient related to interflow runoff generation | [0.01, 1.0] | 0.182 | 0.195 |

fc | Stable infiltration rate | [0.10, 5.0] | 1.19 | 1.203 |

KAW | Coefficient for actual ET | [0.01, 1.0] | 0.584 | 0.510 |

RoughRss | Roughness coefficient of Manning's formula | [0.001, 0.1] | 0.0099 | 0.010 |

Wmi | Minimum soil moisture storage | [0.01, 0.40] | 0.044 | 0.052 |

WC | Upper layer saturated soil moisture storage | [0.1, 1.0] | 0.411 | 0.402 |

Parameter | Description | Range | Optimization result | |
---|---|---|---|---|

Scenario 1 | Scenario 2 | |||

g_{1} | Time-variant gain factor, related to surface runoff generation | [0.01, 1.0] | 0.231 | 0.220 |

g_{2} | Time-variant gain factor, related to soil moisture content | [1.0, 5.0] | 2.68 | 2.711 |

Kr | Storage-outflow coefficient related to interflow runoff generation | [0.01, 1.0] | 0.182 | 0.195 |

fc | Stable infiltration rate | [0.10, 5.0] | 1.19 | 1.203 |

KAW | Coefficient for actual ET | [0.01, 1.0] | 0.584 | 0.510 |

RoughRss | Roughness coefficient of Manning's formula | [0.001, 0.1] | 0.0099 | 0.010 |

Wmi | Minimum soil moisture storage | [0.01, 0.40] | 0.044 | 0.052 |

WC | Upper layer saturated soil moisture storage | [0.1, 1.0] | 0.411 | 0.402 |

#### Two-layer RSTM

*et al.*2007). The PCACA has the operational advantage that only single-angle remote sensing data are required, and these can be provided by most satellite data. Additionally, the layered energy-separation algorithm based on the Bowen-ratio energy balance method can reduce the uncertainties in surface energy partitioning (Zhang

*et al.*2007, 2008). The PCACA follows the trapezoid method and the Bowen-ratio energy balance method to partition surface temperature, surface albedo, and net radiation of mixed pixels and to estimate soil evaporation and vegetation transpiration. The layered energy-separation algorithm is used to calculate the Bowen-ratio

*β*, defined as the ratio of sensible heat flux to latent heat flux, for soil and vegetation (expressed as

*β*and

_{s}*β*, respectively) (Zhang

_{v}*et al.*2008). The layered energy-separation algorithm is mainly based on the relationship between the water deficit index and the potential ET (Moran

*et al.*1994; Zhang

*et al.*2007). By combining the energy separation algorithm with the

*T*space, the soil Bowen ratio,

_{m}– f*β*, and vegetation Bowen ratio,

_{s}*β*, are calculated. Then, the net radiation at the soil surface,

_{v}*R*, and at the vegetation surface,

_{sn}*R*, can be calculated based on the surface energy balance theory. Finally, soil evaporation (

_{vn}*λE*) and vegetation transpiration (

_{s}*λE*) are retrieved via the Bowen-ratio energy balance method as shown in Equation (6). If the energy exchange between vegetation and bare soil is negligible, the latent evaporation

_{v}*λE*of a mixed pixel can be described as a linear combination of

*λE*and

_{s}*λE*, as shown in Equation (7). The evaporative fraction Λ can be calculated by Equation (8). The daily ET is estimated by the evaporative fraction method (Bastiaanssen

_{v}*et al.*1998; Tasumi

*et al.*2005; Galleguillos

*et al.*2011) as shown in Equation (9): where

*λ*is evaporation latent heat in MJ·m

^{−3}and

*λ*= (2.501 − 0.02361

*T*

_{0}) × 10

^{3}MJ·kg

^{−1}in this study (Tasumi

*et al.*2005), and

*T*

_{0}is the average daily temperature in °C.

*Rn*is the daily net radiation, which is equal to the difference between net shortwave radiation and net long wave radiation (Zhan

_{daily}*et al.*2011a).

#### Data assimilation

The EnKF is utilized here for the ET DA. The EnKF is a direct observer assimilation method, and overcomes the linearization issue through a Monte Carlo approach (Reichle 2008). An ensemble of parallel model runs is generated for the same time period with the EnKF, and because the EnKF combines both observation and simulation errors the results have the ‘optimal’ probability (Crow & Wood 2003). Thus, the EnKF is employed to assimilate the RSTM-based daily ETs and the DTVGM-based daily ETs for obtaining a high-quality daily ET distribution. The DA method can reduce the uncertainty of ET estimates, so as to produce a more reliable ET time series. Thus, the DTVGM can be further optimized based on the assimilated daily ETs. The DA process adopted in this study is summarized as follows.

*X*is defined as the DTVGM-based daily ETs optimized in Scenario 1 (Figure 2), with a sample size

^{b}*n*. Each member of

*X*has a perturbation noise. Simultaneously, each RSTM-based daily ET has added a perturbation noise as the member of observed variables ensemble

^{b}*Y*. The perturbation is a Gaussian white noise with covariance

*R*. The assimilated variable

*X*(the daily ET) is updated as follows: where

^{a}*H*( ) is a nonlinear operator that relates

*X*to

*Y*, with superscript

*T*denoting the matrix transpose, and is equal to a unit matrix in this study, because both background and observed data are the daily ET. The difference between the observed and predicted values (

*Y*−

*H*(

*X*)) is weighted by the Kalman gain

^{b}*K*. The term

*P*represents the error covariance of the members of

^{b}*X*. For the steps above, if

^{b}*X*meets the error requirements, the expectation of these assimilated members can represent the daily ETs.

^{a}The proposed DA method is key to the effectiveness of further optimization. Statistically, a larger number of ensemble members in the DA method results in an ensemble mean and covariance which is closer to reality. However, large numbers rapidly increase the computational cost. Thus, it is desirable to use the minimum number of ensemble members while still obtaining satisfactory estimates of the ensemble mean and covariance. We conducted an assimilation experiment to determine the minimum number of ensemble members for accurate ET simulation, so as to optimize the computing power available for the subsequent assimilation experiments.

The ET observation errors for an observation data set are defined as the mean absolute percentage error (MAPE) between the RS-based ET and large aperture scintillometer (LAS)-based ET, and the covariance R of ET in the DA process is equal to the square of the MAPE. DA over the experiment period is performed repeatedly using different ensemble sizes from 10 to 500 members at steps of 10. The relative RMSE and MAPE are calculated to determine the optimal number of ensemble members. Figure 3 plots the number of ensemble members against RMSE and MAPE values for different ET outputs. The value corresponding to 0 ensembles is the RMSE value between the observations and full open-loop outputs for reference. As the decrease in RMSE and MAPE is minimal for more than 100 ensembles, the ensemble size of 100 members is chosen as adequate for carrying out the remaining assimilation experiments.

As the ensemble members and observation errors are fixed, the ETs are calculated using the DA process with different model errors, varying from 0 to 0.08 at steps of 0.001. When the decrease in RMSE and MAPE is minimal, the corresponding DA-based ETs are the required data. We select July 1, 1997 as an example. When the DA process as shown in Figure 4 stops, where the model error is 0.05, the corresponding ET is the optimal result.

## RESULTS

### ET estimated by RSTM and DTVGM

The time-continuous daily ETs from 1999 to 2007 in the 28 sub-basins are simulated by the DTVGM, optimized in Scenario 1 (Figure 2), and the daily ETs in the 18 cloud-free days are retrieved for each pixel by the RSTM. We take the ET simulation on July 9, 2002, as an example. Figure 5 shows the spatial distribution of daily ET simulated by the RSTM and DTVGM, respectively.

Observations from the Xiaotangshan flux observation station and Miyun station are used to validate the accuracy of remote sensing ET retrievals. The Xiaotangshan flux observation station is located in the No.1 sub-basin of the Shahe River basin, and the Miyun station is located close to the Shahe River basin and has the same geographic characteristics as those of the Shahe River basin. The test site of the flux observation station is 1 km long from north to south, and 0.5 km east to west wide. This scale is suitable for the verification of the Landsat data retrieval. Three sets of LAS are placed on three 4-meter-high towers, respectively; the LAS optical path length is 900 to 1,500 meters. The instrument data of LAS with a large optical path can be used as the data source of a sub-basin ET interpolation calculation. Based on the LAS measurement, the regional ET in the Beijing area estimated by MODIS data was validated (Jia *et al.* 2010). The results demonstrated that the proposed validation method based on LAS observation data was feasible. The RMSE and MAPE of estimated monthly and daily ET were 13.75, 0.91 mm and 22.79%, 18.61%, respectively (Jia *et al.* 2010).

In this study, the RSTM-based daily ETs are based on Landsat TM/ETM + data with 30 m spatial resolution, while the eddy correlation measurement system and LAS have an optical path over several kilometers (Solignac *et al.* 2009); thus, the impact of spatial scale on ET verification could be ignored (Marx & Kunstmann 2008; Pauwels *et al.* 2008). In order to match the time and space scales for the output of RSTM and DTVGM, the instantaneous observed flux is converted into daily ET data, and the RSTM-based daily ETs in the pixels covering the Xiaotangshan flux station are averaged by area. The results (Figure 6) show that the absolute percentage errors of the RSTM-based daily ETs over 18 days are between 1.5% and 11.1%, the MAPE is approximately 7.1%, the RMSE is approximately 0.77 mm and the correlation coefficient is 0.97. The RSTM-based daily ETs in some pixels also are compared with the observed data from the Miyun station, yielding a MAPE of 13%, RMSE of 1.45 mm, and correlation coefficient of 0.95. Therefore, the accuracy of the RSTM-based ETs is high.

The DTVGM-based daily runoff is validated with the observed outlet discharge from the hydrological stations located in the No.1 sub-basin of the Shahe River basin. A comparison between simulations and observations is shown in Figure 7, revealing that the *NSE* of the daily runoff simulation is 0.758, which indicates the simulation is good but not perfect (Zhan *et al.* 2011b). The *WB* coefficient in Scenario 1 is 1.208. Compared to the measured ETs from the Xiaotangshan station, the correlation coefficient of the simulated daily ETs in the No.1 sub-basin is 0.59 (Figure 5(b)), and the MAPE is nearly 25%. Thus, the DTVGM-based ET simulation is poor when the DTVGM is optimized only by *NSE* constructed using the outlet discharge of the whole basin. The DTVGM still needs to be improved or further optimized.

### Improvement of the DTVGM optimization

The boundaries of the sub-basins are often different to the boundaries of the grids in satellite data. In this study, the pixel-to-pixel RSTM simulation is based on the 30 × 30 m^{2} grid mode, whereas the DTVGM is based on the sub-basin mode. The RSTM-based daily ETs are aggregated into 28 sub-basins to maintain spatial consistency between the hydrological simulation and the remote sensing retrievals. Then, the EnKF-based DA is used to improve the accuracy of RSTM-based ETs and DTVGM-based ETs. The comparisons between RSTM-based ETs, DTVGM-based Ets, and DA-based ETs are shown in Table 3. The average MAPE between DTVGM-based ETs and RSTM-based ETs is approximately 12.3%, while the average MAPE between DA-based ETs and RSTM-based ETs is 4.5%. This shows that the DA-based ETs are closer to the RSTM-based ET values than the DTVGM-based values.

Date | DTVGM | DA | RSTM | ||
---|---|---|---|---|---|

Daily ET (mm) | Relative error (%) | Daily ET (mm) | Relative error (%) | Daily ET (mm) | |

1999-7-1 | 2.335 | −6.040 | 2.321 | −5.404 | 2.202 |

1999-12-24 | 0.293 | 4.870 | 0.299 | 2.922 | 0.308 |

2000-4-30 | 2.489 | −11.365 | 2.385 | −6.711 | 2.235 |

2000-12-10 | 0.247 | 16.835 | 0.272 | 8.418 | 0.297 |

2001-4-1 | 2.522 | −20.843 | 2.256 | −8.098 | 2.087 |

2001-6-4 | 3.049 | −22.548 | 2.695 | −8.320 | 2.488 |

2001-9-24 | 1.354 | 3.355 | 1.375 | 1.856 | 1.401 |

2001-12-13 | 0.299 | 2.922 | 0.305 | 0.974 | 0.308 |

2002-3-19 | 2.233 | −19.348 | 1.983 | −5.986 | 1.871 |

2002-5-22 | 2.345 | −14.614 | 2.139 | −4.545 | 2.046 |

2002-7-9 | 2.345 | 10.769 | 2.533 | 3.615 | 2.628 |

2002-10-13 | 1.421 | −14.875 | 1.299 | −5.012 | 1.237 |

2002-11-14 | 0.602 | 8.926 | 0.639 | 3.328 | 0.661 |

2002-12-16 | 0.268 | 8.844 | 0.286 | 2.721 | 0.294 |

2003-2-2 | 0.449 | −15.424 | 0.406 | −4.370 | 0.389 |

2003-4-7 | 1.109 | −20.413 | 0.958 | −4.017 | 0.921 |

2003-5-25 | 2.275 | −16.547 | 2.021 | −3.535 | 1.952 |

2007-10-3 | 1.615 | −2.475 | 1.585 | −0.571 | 1.576 |

MAPE | 12.278 | 4.467 |

Date | DTVGM | DA | RSTM | ||
---|---|---|---|---|---|

Daily ET (mm) | Relative error (%) | Daily ET (mm) | Relative error (%) | Daily ET (mm) | |

1999-7-1 | 2.335 | −6.040 | 2.321 | −5.404 | 2.202 |

1999-12-24 | 0.293 | 4.870 | 0.299 | 2.922 | 0.308 |

2000-4-30 | 2.489 | −11.365 | 2.385 | −6.711 | 2.235 |

2000-12-10 | 0.247 | 16.835 | 0.272 | 8.418 | 0.297 |

2001-4-1 | 2.522 | −20.843 | 2.256 | −8.098 | 2.087 |

2001-6-4 | 3.049 | −22.548 | 2.695 | −8.320 | 2.488 |

2001-9-24 | 1.354 | 3.355 | 1.375 | 1.856 | 1.401 |

2001-12-13 | 0.299 | 2.922 | 0.305 | 0.974 | 0.308 |

2002-3-19 | 2.233 | −19.348 | 1.983 | −5.986 | 1.871 |

2002-5-22 | 2.345 | −14.614 | 2.139 | −4.545 | 2.046 |

2002-7-9 | 2.345 | 10.769 | 2.533 | 3.615 | 2.628 |

2002-10-13 | 1.421 | −14.875 | 1.299 | −5.012 | 1.237 |

2002-11-14 | 0.602 | 8.926 | 0.639 | 3.328 | 0.661 |

2002-12-16 | 0.268 | 8.844 | 0.286 | 2.721 | 0.294 |

2003-2-2 | 0.449 | −15.424 | 0.406 | −4.370 | 0.389 |

2003-4-7 | 1.109 | −20.413 | 0.958 | −4.017 | 0.921 |

2003-5-25 | 2.275 | −16.547 | 2.021 | −3.535 | 1.952 |

2007-10-3 | 1.615 | −2.475 | 1.585 | −0.571 | 1.576 |

MAPE | 12.278 | 4.467 |

In the further optimization (Scenario 2 in Figure 2), the calibrated DTVGM from Scenario 1 is optimized by the SCE-UA with the two objective functions *NSE* and *RMSE*_{ET} calculated by Equations (2) and (3). Figure 8 shows the continuous daily average ETs for the Shahe River basin after further optimization during 1999–2007, indicating that most DTVGM-based daily ETs in April and May are less than 1 mm if calculated following Scenario 1, which is not consistent with observations. After further optimization in Scenario 2 the simulation is improved, and further optimization additionally reduces the deviation between the simulations and the assimilated ETs.

The observed data from the Xiaotangshan station are chosen to validate the ET simulation by further optimization. The sub-basin that contains this station is flat with relatively homogeneous geographical characteristics, and the ET values at each pixel within this sub-basin are considered as equivalent to each other, so each observed ET value could be compared to the average ET value for sub-basin No.1. There are observed latent heat flux data available at Xiaotangshan station for 10 days during 2002–2007. The corresponding 10 daily ETs of the No.1 sub-basin simulated by DTVGM following Scenarios 1 and 2 are shown in Table 4, revealing that the simulated ETs from the DTVGM, optimized only with the objective function *NSE*, have a large error compared with the observed ETs. After further optimization based on *NSE* and *RMSE*_{ET}, the daily ET simulation of DTVGM is significantly improved, and the MAPE between the DTVGM-based ETs and LAS-based ETs is reduced by 5.84%.

Date | DTVGM_ Scenario 1 | DTVGM_ Scenario 2 | Observation | ||
---|---|---|---|---|---|

Daily ETs (mm) | Relative error (%) | Daily ETs (mm) | Relative error (%) | Daily ETs (mm) | |

2002-9-6 | 2.405 | −14.360 | 2.350 | −11.745 | 2.103 |

2003-7-20 | 2.220 | 16.226 | 2.550 | 3.774 | 2.650 |

2003-12-9 | 0.407 | 16.598 | 0.450 | 7.787 | 0.488 |

2004-7-18 | 3.252 | 17.671 | 3.550 | 10.127 | 3.950 |

2004-10-8 | 1.622 | −13.031 | 1.567 | −9.199 | 1.435 |

2005-8-8 | 3.707 | −18.246 | 3.395 | −8.293 | 3.135 |

2006-2-25 | 0.307 | 4.063 | 0.350 | −9.375 | 0.320 |

2006-7-10 | 3.228 | 8.321 | 3.558 | −1.051 | 3.521 |

2006-12-7 | 0.612 | −9.677 | 0.595 | −6.631 | 0.558 |

2007-8-1 | 3.220 | 18.066 | 3.550 | 9.669 | 3.930 |

MAPE | 13.600 | 7.760 |

Date | DTVGM_ Scenario 1 | DTVGM_ Scenario 2 | Observation | ||
---|---|---|---|---|---|

Daily ETs (mm) | Relative error (%) | Daily ETs (mm) | Relative error (%) | Daily ETs (mm) | |

2002-9-6 | 2.405 | −14.360 | 2.350 | −11.745 | 2.103 |

2003-7-20 | 2.220 | 16.226 | 2.550 | 3.774 | 2.650 |

2003-12-9 | 0.407 | 16.598 | 0.450 | 7.787 | 0.488 |

2004-7-18 | 3.252 | 17.671 | 3.550 | 10.127 | 3.950 |

2004-10-8 | 1.622 | −13.031 | 1.567 | −9.199 | 1.435 |

2005-8-8 | 3.707 | −18.246 | 3.395 | −8.293 | 3.135 |

2006-2-25 | 0.307 | 4.063 | 0.350 | −9.375 | 0.320 |

2006-7-10 | 3.228 | 8.321 | 3.558 | −1.051 | 3.521 |

2006-12-7 | 0.612 | −9.677 | 0.595 | −6.631 | 0.558 |

2007-8-1 | 3.220 | 18.066 | 3.550 | 9.669 | 3.930 |

MAPE | 13.600 | 7.760 |

The optimization results of parameters following the two scenarios are given in Table 2, and the values of *WB* and *NES* in different years of 1999–2007 are shown in Table 5. In order to make the comparison more clear, the daily runoff and rainfall processes based on the observation and simulation are plotted in Figure 9. It can be seen that the fitting effect of runoff simulation and observation is good in most years. The values of *NES* in the two scenarios are greater than 0.75, which generally meet the accuracy requirements of the hydrological simulation (Zhan *et al.* 2011b). The *NES* becomes 0.780 in Scenario 2 from 0.758 in Scenario 1. The percentage of the *NES* in each year greater than 0.75 is 55.6% in Scenario 1, while the percentage is 77.8% in Scenario 2. It indicates that Scenario 2 has a certain improvement in the simulation for daily runoff processes. The difference of *WB* in the two scenarios is obvious. The *WB* in Scenario 2 is 1.081, which is better than 1.206 in Scenario 1. That is to say, the relative error of *WB* in Scenario 1 is more than 20%, while the relative error in Scenario 2 is less than 10%. In Scenario 1, the percentage of *WB* relative error in each year less than 20% is 33.3%, while in Scenario 2 the percentage rises to 88.9%. It can be said that the further optimization in Scenario 2 is more significant for error control of the *WB* simulation. Thus, the result in Scenario 2 indicates an effective simulation for both ET processes and runoff processes.

Year | DTVGM_ Scenario 1 | DTVGM_ Scenario 2 | ||
---|---|---|---|---|

WB | NES | WB | NES | |

1999 | 1.251 | 0.667 | 1.122 | 0.710 |

2000 | 0.838 | 0.807 | 0.882 | 0.888 |

2001 | 1.236 | 0.787 | 1.106 | 0.788 |

2002 | 1.235 | 0.784 | 1.128 | 0.843 |

2003 | 1.506 | 0.438 | 1.225 | 0.521 |

2004 | 1.270 | 0.720 | 1.028 | 0.821 |

2005 | 1.191 | 0.899 | 1.112 | 0.880 |

2006 | 1.202 | 0.722 | 1.108 | 0.750 |

2007 | 1.128 | 0.946 | 1.018 | 0.949 |

1999–2007 | 1.206 | 0.758 | 1.081 | 0.780 |

Year | DTVGM_ Scenario 1 | DTVGM_ Scenario 2 | ||
---|---|---|---|---|

WB | NES | WB | NES | |

1999 | 1.251 | 0.667 | 1.122 | 0.710 |

2000 | 0.838 | 0.807 | 0.882 | 0.888 |

2001 | 1.236 | 0.787 | 1.106 | 0.788 |

2002 | 1.235 | 0.784 | 1.128 | 0.843 |

2003 | 1.506 | 0.438 | 1.225 | 0.521 |

2004 | 1.270 | 0.720 | 1.028 | 0.821 |

2005 | 1.191 | 0.899 | 1.112 | 0.880 |

2006 | 1.202 | 0.722 | 1.108 | 0.750 |

2007 | 1.128 | 0.946 | 1.018 | 0.949 |

1999–2007 | 1.206 | 0.758 | 1.081 | 0.780 |

## DISCUSSION AND CONCLUSIONS

In this study, the uncertainty in the DA process has been minimized as much as possible by determining the optimal ensemble size (100 members), observation error, and model error in the DA experiment. This is a key step in improving the optimization of the DTVGM on the basis of a DA-based ET series, when adopting an integrated approach that achieves a compromise between the DTVGM and RSTM in a DA scheme combining sparse remote sensing estimates and continuous modeling of daily ETs. The EnKF DA algorithm plays an important role and ensures the DA-based ETs used for validation are reliable. The DA-based ETs are used to further optimize the DTVGM when combined with the observed outlet discharge, so as to obtain accurate continuous daily ETs.

The proposed integrated approach provides a new method for minimizing the model uncertainties using DA-based daily ETs. The new methodology performs well in producing a continuous simulation of daily ETs, and will provide a practical solution for accurate, continuous estimates of ETs at basin and daily scales. The case study of the Shahe River basin indicates that the new method can more accurately simulate regional ET continuously. The MAPE between the DTVGM-based ETs (optimized only following Scenario 1 in Figure 2) and observed ETs at Xiaotangshan station is 13.8%, but the MAPE is reduced to 6.4% after further optimization based on DA-based ETs. The hydrological model with the further optimization in this paper can estimate ET with high accuracy and continuous time series for improving researches such as water resource management, climate change studies, agricultural planning, and so on. The new optimized ET model not only follows the surface energy balance but also meets the regional *WB*, and has a more perfect water thermal coupling mechanism. The study will further enrich the content of ET estimation disciplines and provide a scientific basis for better understanding of the laws of regional water cycles.

Although the ET simulation accuracy is improved by the further optimization, there are some limitations. First, the verification of ET is still a relatively difficult problem due to a lack of measurement conditions in some areas. Thus there is no spatially distributed ET data to verify the accuracy of the ET simulation by remote sensing and hydrological modeling. Additionally, the accuracy of this integrated method is closely related to the number of remotely sensed observations, which are usually temporally sparse due to the lack of high-frequency and high-resolution thermal data. We only obtained 18 high-quality TM/ETM + images during 1999–2007 from the USGS website, thus the simulation efficiency could still be improved further. Future work will focus on the construction of remote sensing observation data over a longer time series so as to further optimize the method.

In addition, the fitting effect of runoff simulation and observation is good in most years except 2003. The performance of the less effective simulation in 2003 shows that the observed discharge of rivers in the flood season (June to September) is smaller, which is inconsistent with rainfall–runoff relationship in the natural state. Under the condition of underlying surface change, the hydrological model produces the same runoff process caused by the same rainfall process generally. It is shown in Figure 9 that the precipitation is lower than usual in the spring and summer of 2003 when the crop water requirement is the largest of the year. It can be inferred that the water storage measure is used to reduce the annual drought and achieve the supply and demand balance of agricultural water. This indicates the distributed hydrological model also needs to take into account the impact of human activities more comprehensively, especially the human activities that are more obvious in the basin.

## ACKNOWLEDGEMENTS

This work was partially supported by the National Key Basic Research Program of China (973 Program) (Grant No. 2015CB452701), the National Natural Science Foundation of China (41271003, 41401042), Anhui Provincial Natural Science Foundation (1508085QD69) and China Postdoctoral Science Foundation (2014M550823). The authors are grateful to the reviewers for the help and thought-provoking comments.