## Abstract

Estimation of actual evapotranspiration (ET) and diagnosis of its controlling factors contribute to addressing water scarcity challenges in arid regions. To detect the impacts of vegetation, precipitation, and evaporation capacity on ET, a monthly ET model based on these three factors was proposed and applied in the upstream and midstream of the Heihe River basin, an arid and semi-arid basin in northwest China. Parameter sensitivity analysis, partial correlation analysis, and factor analysis were used to assess the controlling factors of ET. Results demonstrated that the proposed ET model can reconstruct monthly ET processes with satisfactory performance, and that precipitation, evaporation capacity, and vegetation are the controlling factors of ET in the study areas. The proposed ET model, built on the three controlling factors, could be applicable to estimating ET in arid regions.

## INTRODUCTION

Actual evapotranspiration (ET) is a critical focus of hydrological cycle research, especially in arid regions, which are the most likely places to suffer water shortages and eco-environmental problems. In arid regions, ET is the major water loss and can make a distinct difference to hydrological processes. Studies show that ET consumes almost 90% of precipitation, or even equals it, in arid watersheds (Zhang *et al.* 2001; Wu *et al.* 2005). However, it is still very difficult to monitor ET in the field with current equipment or techniques, so estimation of ET is very important. Many methods have been developed to assess ET (Droogers 2000; Drexler *et al.* 2004; Courault *et al.* 2005; Gao *et al.* 2011; El Tahir *et al.* 2012; Rudd & Kay 2016), for instance, remote sensing technology, the Bowen ratio energy balance method, water balance models, and the complementary relationship method. Areal ET has been estimated by the CRAE, AA, and GG models in basins with different climatic conditions, and their performance in drier regions is poorer than in humid regions (Xu & Singh 2005). Other models such as the Penman, Penman–Monteith, Wright–Penman, Blaney–Criddle, radiation balance, and Hargreaves models have been applied to calculate ET in different regions, and the results illustrate that different models are applicable to different basins (DehghaniSanij *et al.* 2004). The GG model and the Makkink model estimated ET best out of seven models in Germany (Xu & Chen 2005). However, in the Haihe River basin in China, the AA model outperforms the GG model (Gao *et al.* 2011). The nonlinear complementary relationship method is applicable in data-scarce regions like the Tibetan Plateau (Ma *et al.* 2015). In the Blue Nile region, regional ET in the wet season was estimated by the SEBAL model based on MODIS data (El Tahir *et al.* 2012). ET was also computed by remote sensing-based methods in arid regions (Li *et al.* 2012).

Although many models have been developed, ET estimation remains difficult. Most models are hard to use because they require complicated observations to force the model or validate its parameters, while available measurements in arid and semi-arid regions are particularly scarce (Allen *et al.* 2011; Sun *et al.* 2011). In addition, most methods involve large uncertainties that undermine the reliability of their estimates, especially remote sensing-based models. Empirical models have two advantages for estimating ET: they are flexible in the data they use, and they are not constrained by a physical mechanism. Because ET is closely related to its controlling factors, statistical models built on those factors offer a new way to estimate ET in arid regions. Eagleman (1971) described the relationship between the relative evapotranspiration (i.e., ET/*E*_{0}, where *E*_{0} is evaporation capacity) and soil moisture with a cubic equation, and the model provided satisfactory estimates. Wang *et al.* (2007) built a multi-variable function between ET and net radiation, vegetation index, and temperature, which could estimate regional or global ET accurately with remote sensing data. Sun *et al.* (2011) and Feng *et al.* (2012) observed that precipitation, vegetation, and evaporation capacity are closely related to ET, especially in arid and semi-arid regions.

Selection of factors is important when using statistical methods to estimate ET. Including the controlling factors of ET could greatly enhance model performance. Stephenson (1998) reported that vegetation is closely related to ET. Factors like vegetation, soil water, precipitation, and evaporation capacity, which are closely linked to hydrological processes, are likely to be more useful for solving real-world water challenges. Moreover, these data are more likely to be available and meaningful for estimating ET. Zhang *et al.* (2001) suggested that plant-related water controls ET in dry conditions while ET depends on climate, soil, and vegetation in intermediate conditions. The objectives of this study are: (1) to propose a possible relationship between ET and its possible controlling factors to estimate ET in arid and semi-arid regions and (2) to evaluate the reliability of these factors.

## CASE STUDY

### Study area

As the second largest inland river in China, the Heihe River was chosen as a typical case study for assessing ET in an arid and semi-arid region (Li *et al.* 2012). Due to intensive anthropogenic interference and water shortage, the eco-environment is fragile in this basin. Interestingly, the conditions of climate and vegetation are quite different in its upstream and midstream basins. Precipitation, evaporation capacity, and vegetation are chosen as possible controlling factors to estimate ET due to the limitation of data in the basin.

The Heihe River mainly supplies water to Gansu and Qinghai Provinces and the Inner Mongolia Autonomous Region. However, the basin has suffered severe water scarcity and even droughts in recent decades (Feng *et al.* 2014; Qiu *et al.* 2016). In the study, the Yingluoxia sub-basin (YLX), the Zhengyixia sub-basin (ZYX), and the basin containing the two sub-basins (YLX&ZYX) were chosen as the three study areas (see Figure 1). The whole study area is nearly 34,000 km^{2}. The annual average rainfall in YLX is about 300–400 mm, and the pan evaporation is over 800 mm. In ZYX, the annual average rainfall is about 100–200 mm while pan evaporation is close to 1,400 mm.

As the headwater of the Heihe River basin, YLX is located in the upstream with mountain topography. Vegetation cover in the sub-basin is relatively abundant with forest, shrub, and grassland (Yin *et al.* 2015). In contrast, ZYX is located in the Hexi Corridor in the midstream. It is a main agricultural base and an economic center in the Heihe River basin, and it accounts for the dominant share of water consumption in the whole basin. Because it holds almost 94% of the basin's population (Yin *et al.* 2015), ZYX is intensively disturbed by anthropogenic activities. Artificial oases and sparse grassland cover the midstream. The land use types (see Figure 2) are listed in Table 1; the data were provided by the Cold and Arid Regions Science Data Center at Lanzhou, China (http://westdc.westgis.ac.cn). The proportion of vegetation cover (forest and grassland) in YLX is double that in ZYX, and farmland differs greatly between the two sub-basins.

| Land use type | YLX (%) | ZYX (%) |
|---|---|---|
| Farmland | 0.5 | 21.0 |
| Forestland | 2.7 | 1.9 |
| Grassland | 68.0 | 33.4 |
| Water body | 3.2 | 3.4 |
| Settlement place | 0.1 | 1.0 |
| Unused land | 25.5 | 39.3 |


### Data

In this study, precipitation, evaporation capacity, and vegetation were applied to build the ET model. Precipitation and pan evaporation at hydrological stations (see Figure 1) were collected from the Hydrology and Water Resources Survey Bureau of Gansu Province. Here, pan evaporation was employed to indicate evaporation capacity. Leaf area index (LAI) can represent vegetation characteristics, so ‘Heihe 1 km LAI production’ was downloaded from the Cold and Arid Regions Science Data Center at Lanzhou, China (http://westdc.westgis.ac.cn/). The product ‘Heihe 1 km monthly ET data’, obtained by calculation of the improved ET-Watch model based on multi-source remote sensing data (Wu *et al.* 2012), was also selected from this center as no available observed ET data could be found. The study period was from January 2007 to December 2012. The calibration period was 2007–2010 and validation was 2011–2012.

## METHODS

### ET model

… evaporation capacity (*E*_{0}), precipitation (*P*), and vegetation (*LAI*), respectively (Sun *et al.* 2011). The relationship was explored in over 100 arid and semi-arid regions, and the applicable equation was determined with a nonlinear function (Feng *et al.* 2012), in which *k*_{1} (mm^{−1}), *k*_{2}, and *k*_{3} are parameters and *a* (mm) is the constant.

… (Zhang *et al.* 2001; Sun *et al.* 2011). Another reason that these three factors are chosen is that related data are available in most regions, although with a sparse resolution in some regions. To find an applicable equation to further estimate ET in arid and semi-arid regions, a new equation is proposed from the relationship in Equation (1) with different possible linear or nonlinear combinations, in which *a*_{1}, *a*_{2}, *a*_{3} (mm), *a*_{4} (mm^{−1}), *a*_{5}, *a*_{6}, and *a*_{7} (mm^{−1}) are parameters and *b* (mm) is the constant. This model was employed to estimate ET in the study areas. Here, the SCE-UA algorithm (Duan *et al.* 1994) was applied to determine the parameters.
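As a minimal sketch only, the parameter units given above are consistent with a model that contains every linear and product term of *P*, *E*_{0}, and *LAI*. The assignment of each parameter to a term below is an assumption inferred from those units and from the sensitivity rankings reported in the Results; it is not the authors' stated equation. The function name `et_model` and the sample values are hypothetical.

```python
def et_model(p, e0, lai, a, b):
    """Monthly ET (mm) from an ASSUMED full-interaction form.

    p, e0 : monthly precipitation and pan evaporation (mm)
    lai   : leaf area index (dimensionless)
    a     : seven parameters a1..a7 (term order is an assumption)
    b     : constant (mm)
    """
    a1, a2, a3, a4, a5, a6, a7 = a
    return (a1 * p + a2 * e0 + a3 * lai      # single-factor terms
            + a4 * p * e0                    # a4 in mm^-1
            + a5 * p * lai + a6 * e0 * lai   # two-factor terms with LAI
            + a7 * p * e0 * lai              # a7 in mm^-1
            + b)


# Hypothetical parameter values, chosen only to exercise the function.
example = et_model(10.0, 100.0, 2.0,
                   a=[0.1, 0.2, 3.0, 0.001, 0.01, 0.02, 0.0], b=1.0)
```

Under this assumed form every term has units of mm, which is why *a*_{3} and *b* carry mm and *a*_{4} and *a*_{7} carry mm^{−1}. A calibration routine such as SCE-UA would then search over `a` and `b`.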

The Nash–Sutcliffe efficiency (*NS*) (Nash & Sutcliffe 1970), relative error (*RE*), and normalized root-mean-square error (*NRMSE*) are employed to evaluate the performance of the proposed model. In the indicator formulas, *ET*_{i} is the measured ET at time step *i*, and the other two quantities involved are the simulated ET at time step *i* and the average measured ET during the whole study period.
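The three indicators can be sketched as follows. *NS* follows the standard Nash–Sutcliffe definition. The forms used for *RE* (bias of the simulated total relative to the measured total) and *NRMSE* (root-mean-square error divided by the mean measured ET) are common conventions assumed here, since several variants exist; the function names are hypothetical.

```python
import math


def nash_sutcliffe(obs, sim):
    """NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def relative_error(obs, sim):
    """Assumed form: (total simulated - total measured) / total measured."""
    return (sum(sim) - sum(obs)) / sum(obs)


def nrmse(obs, sim):
    """Assumed form: RMSE normalized by the mean of the measured series."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / (sum(obs) / len(obs))


# Toy series (not data from the study): every simulated value is 1 too high.
measured = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (nash_sutcliffe(measured, simulated),
          relative_error(measured, simulated),
          nrmse(measured, simulated))
```

With these conventions a perfect simulation gives *NS* = 1, *RE* = 0, and *NRMSE* = 0, and a negative *RE* means the total ET is underestimated.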

### Partial correlation analysis

… *et al.* 2008). The partial correlation coefficient between variables 1 and 2 is computed after eliminating the effects of variables 3 and 4. In this study, partial correlation analysis is applied to validate the controlling capacity of the three factors for ET and the relationships among them.
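A minimal sketch of the standard recursive definition of partial correlation is given below: the first-order coefficient removes one variable, and the second-order coefficient is obtained by applying the same formula to first-order coefficients. The function name `partial_corr` and the example matrix are hypothetical; the matrix is not data from the study.

```python
import math


def partial_corr(r, i, j, controls):
    """Partial correlation between variables i and j given `controls`.

    r is a symmetric matrix (list of lists) of simple correlation
    coefficients. Uses the recursion
        r_ij.kS = (r_ij.S - r_ik.S * r_jk.S)
                  / sqrt((1 - r_ik.S^2) * (1 - r_jk.S^2)),
    where S is the set of remaining control variables.
    """
    if not controls:
        return r[i][j]
    k, rest = controls[-1], controls[:-1]
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Hypothetical correlation matrix: four variables, all pairwise r = 0.5.
r = [[1.0 if a == b else 0.5 for b in range(4)] for a in range(4)]
first_order = partial_corr(r, 0, 1, [2])      # variables 1 and 2 given 3
second_order = partial_corr(r, 0, 1, [2, 3])  # variables 1 and 2 given 3 and 4
```

In the toy matrix the coefficient shrinks each time a correlated variable is controlled for, which is the same behavior used in the Results to separate the independent effect of each factor from the interactions among them.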

### Factor analysis

… *LAI*, *P*, and *E*_{0} are all considered as the original variables. The model is:

$$X_j = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jp}F_p + \varepsilon_j$$

where *X*_{j} and *F*_{i} are the *j*^{th} original variable and the *i*^{th} common factor, respectively, *p* is the number of common factors, and *ε*_{j} is the random variation of *X*_{j}. *a*_{ji} is the factor loading, which indicates the correlation between *X*_{j} and *F*_{i}. If *a*_{ji} > 0.3, it is significant. If *a*_{ji} > 0.5, it is very significant (Panagopoulos 2014). The communality of *X*_{j} could be calculated as $\sum_{i=1}^{p} a_{ji}^{2}$, which illustrates the contribution of all common factors to the original variable, and the cumulative variance of *F*_{i} is $\sum_{j} a_{ji}^{2}$.

## RESULTS

### Simulated results

The variations of *P*, *LAI*, *E*_{0}, and measured ET presented similar fluctuations during the study period; the variation in YLX is shown in Figure 3. The figure also shows that the differences among the variables are much larger at the crests than at the troughs, particularly the difference between *E*_{0} and ET.

Here, the period of 2009–2012 was selected for calibration and 2007–2008 for validation. Figure 4 shows the measured ET and the ET simulated by the model; the simulated process matched the measured one satisfactorily. However, some simulated peak values were underestimated while others were overestimated. The mismatched peak values may be ascribed to the uncertainties of the measured ET data (Wu *et al.* 2012). Interestingly, the overestimated ET roughly followed the variations of the ET/*E*_{0} ratio in Figure 5, particularly in YLX. The phenomenon may be due to intensive human disturbances (Qiu *et al.* 2016). Additionally, Figure 4 shows that performance in YLX&ZYX and ZYX was better than in YLX.

In Table 2, *NS* was 0.90 or higher in every case except the calibration in YLX (0.88). *RE* in YLX&ZYX for validation and in YLX for calibration was 7% and −12%, respectively, while the absolute values of the other *RE*s were all within 2%. *NRMSE* in all three basins was below 0.3. All the indicators showed satisfactory performance of ET simulation, implying the applicability of the ET model in the study areas. The estimation of ET by the ET model in this study is more accurate than that by the evaporative fraction model in the middle reaches of the Heihe River basin (Li *et al.* 2012).

| Study area | Calibration *NS* | Calibration *RE* | Calibration *NRMSE* | Validation *NS* | Validation *RE* | Validation *NRMSE* |
|---|---|---|---|---|---|---|
| YLX&ZYX | 0.94 | −1% | 0.19 | 0.91 | 7% | 0.23 |
| YLX | 0.88 | −12% | 0.25 | 0.90 | −2% | 0.29 |
| ZYX | 0.91 | −1% | 0.21 | 0.92 | 1% | 0.25 |


Table 3 shows the proposed ET models for all study areas. The parameter of the *P*·*E*_{0}·*LAI* term was zero in all three study areas. The parameter of the *P*·*E*_{0} term in ZYX was also zero.

| Study area | ET model |
|---|---|
| YLX&ZYX | |
| YLX | |
| ZYX | |


The proposed ET model contributes to reconstructing ET processes with satisfying performances in the Heihe River basin. Due to water shortages, in many basins, intensive human activities intervene in utilization of the river flow and groundwater (Peng *et al.* 2016; Qiu *et al.* 2016), resulting in difficulties to simulate the processes of streamflow. Thus, rebuilding ET processes is of significance in unfolding the mechanism of hydrological processes. By means of the possible controlling factors related to the hydrological regime, the ET model provides a new approach to study hydrological processes in arid regions.

### Parameters’ sensitivity analysis

A simple method based on local parameter techniques (Qiu *et al.* 2013), by adding linear increments to the parameter estimated while keeping other parameters unchanged, was used to analyze the sensitivity of parameters in the proposed ET model. Parameters from *a*_{1} to *a*_{6} were estimated and *NS* acted as the performance indicator (Table 4).

| Study area | Increment of 0.1 | Increment of 0.01 | Increment of 0.001 | Increment of 0.0001 |
|---|---|---|---|---|
| YLX&ZYX | a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3} | a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3} | a_{4} ≫ a_{2} > a_{6} > a_{3} > a_{5} ≈ a_{1} | a_{4} ≫ a_{1} ≈ a_{2} ≈ a_{3} ≈ a_{5} ≈ a_{6} |
| YLX | a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3} | a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3} | a_{4} ≫ a_{3} > a_{2} > a_{1} > a_{5} > a_{6} or a_{4} ≫ a_{2} > a_{6} > a_{1} > a_{3} > a_{5} | a_{4} ≫ a_{1} ≈ a_{2} ≈ a_{3} ≈ a_{5} ≈ a_{6} |
| ZYX | a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3} | a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3} | a_{4} ≫ a_{2} > a_{6} > a_{1} ≈ a_{5} > a_{3} | a_{4} ≫ a_{2} > a_{6} > a_{1} ≈ a_{5} > a_{3} |


The results showed that the sensitivity of the parameters had the same ranking, a_{4} ≫ a_{2} > a_{6} > a_{5} > a_{1} > a_{3}, when the increment was 0.1 or 0.01. The parameter a_{4} was always much more sensitive than the others, indicating the strong effect of the *P*·*E*_{0} term on ET. When the increment was 0.1 or 0.01, the parameters related to *E*_{0} were always the most sensitive, followed by those related to *P*, and finally those related to *LAI*. However, when the increment decreased to 0.001, the sensitivity of a_{1}, a_{3}, and a_{5} changed in the study areas: the sensitivity of a_{1} was almost the same as that of a_{5}, and both were less sensitive than a_{3} in YLX&ZYX but more sensitive than a_{3} in ZYX. When the increment was 0.0001, the sensitivity of the parameters other than a_{4} was almost the same in YLX&ZYX and YLX (Table 4).
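The one-at-a-time procedure described above can be sketched as follows. The sensitivity measure used here, the absolute change in *NS* after adding the increment to one parameter, is an assumption, since the exact measure of Qiu *et al.* (2013) is not given in the text. The function names and the toy linear model are hypothetical and stand in for the ET model.

```python
def nash_sutcliffe(obs, sim):
    """Standard Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def oat_sensitivity(simulate, params, increment, obs):
    """Absolute change in NS when each parameter alone is incremented."""
    base = nash_sutcliffe(obs, simulate(params))
    changes = []
    for k in range(len(params)):
        perturbed = list(params)      # keep the other parameters unchanged
        perturbed[k] += increment
        changes.append(abs(nash_sutcliffe(obs, simulate(perturbed)) - base))
    return changes


# Toy stand-in for the ET model: y = slope * x + intercept.
x = [1.0, 2.0, 3.0]
obs = [2.0 * xi + 1.0 for xi in x]


def toy_model(params):
    slope, intercept = params
    return [slope * xi + intercept for xi in x]


changes = oat_sensitivity(toy_model, [2.0, 1.0], 0.1, obs)
# Parameter indices ordered from most to least sensitive.
ranking = sorted(range(len(changes)), key=lambda k: changes[k], reverse=True)
```

Repeating the call with increments of 0.1, 0.01, 0.001, and 0.0001 would give one ranking per increment, in the manner of Table 4.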

### Control factors

A three-common factor model for factor analysis was developed and the results are shown in Table 5. All original variables have one very significant factor loading at least. In YLX&ZYX, *F*_{1}, *F*_{2}, and *F*_{3} have the closest relationship with *LAI*, *E*_{0}, and *P*, respectively. In YLX, *F*_{1} has a significant relationship with *LAI*, *F*_{1} and *F*_{2} with *P*, and *F*_{2} with *E*_{0}. In ZYX, *F*_{1}-*LAI*, *F*_{2}-*P* and *F*_{1}-*E*_{0} have the most remarkable relationships. All factor loadings of *F*_{3} are not significant in YLX or ZYX, but are significant in YLX&ZYX. All communalities are larger than 0.680, indicating that the three common factors could explain the three original variables well. The same conclusion can be found by the cumulative variance.

| Variable | YLX&ZYX *F*_{1} | YLX&ZYX *F*_{2} | YLX&ZYX *F*_{3} | YLX&ZYX communality | YLX *F*_{1} | YLX *F*_{2} | YLX *F*_{3} | YLX communality | ZYX *F*_{1} | ZYX *F*_{2} | ZYX *F*_{3} | ZYX communality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *LAI* | **0.685** | 0.402 | 0.439 | 0.823 | **0.835** | 0.356 | 0.021 | 0.825 | **0.757** | −0.402 | −0.263 | 0.804 |
| *P* | 0.466 | 0.408 | **0.665** | 0.825 | **0.746** | **0.518** | −0.189 | 0.860 | 0.361 | **−0.739** | −0.066 | 0.680 |
| *E*_{0} | 0.366 | **0.716** | 0.358 | 0.774 | 0.418 | **0.763** | −0.032 | 0.757 | **0.832** | −0.296 | 0.024 | 0.780 |
| Cumulative variance | 0.820 | 0.841 | 0.763 | | 1.428 | 0.977 | 0.037 | | 1.396 | 0.795 | 0.074 | |


*Note:* Very significant factor loadings (*a*_{ji} > 0.5) are denoted in bold.
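The communalities and cumulative variances in Table 5 follow from the loadings as sums of squares: across the factors for each variable, and across the variables for each factor. A minimal check for YLX&ZYX is sketched below; the variable names are hypothetical and the loadings are copied from Table 5.

```python
# Factor loadings for YLX&ZYX from Table 5, ordered (F1, F2, F3).
loadings = {
    "LAI": [0.685, 0.402, 0.439],
    "P": [0.466, 0.408, 0.665],
    "E0": [0.366, 0.716, 0.358],
}

# Communality of a variable: sum of its squared loadings over all factors.
communalities = {name: sum(a * a for a in row) for name, row in loadings.items()}

# Variance attributed to a factor: sum of squared loadings over all variables.
factor_variance = [sum(row[i] ** 2 for row in loadings.values()) for i in range(3)]
```

Because the loadings are rounded to three decimals, the recomputed values agree with the table only to within about 0.001.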

The correlation coefficients between ET and the different variables (see Figure 6) show that *E*_{0} always had the closest relation with ET in all the study areas. *P* and *LAI* had a similar relation with ET in YLX&ZYX and YLX, while *LAI* had a much closer relation with ET than *P* in ZYX. Figure 6 demonstrates that ET was closely associated with *P*, *E*_{0}, and *LAI* in the study areas. Partial correlation coefficients between ET and the three factors were then calculated to estimate the independent effect of each factor (see Figure 7). The partial correlation coefficients were much smaller than the corresponding correlation coefficients, especially for *P* and *LAI*. Comparison of Figures 6 and 7 makes it clear that the three factors interacted with one another, especially *P* and *LAI*. In all three study areas, independent *E*_{0} had the strongest effect on ET, followed by independent *LAI*, and finally independent *P*, demonstrating the controlling roles of the three factors.

## DISCUSSION

During the study period, the proposed ET model simulated monthly ET with good performance. The ET model is easy to operate and its data requirements are easy to meet. Another advantage of the model is its flexibility and extensibility: possible controlling factors suited to a particular region can be added to the model to enhance its performance. Thus, the model is worth generalizing to arid regions.

The main flaw of the ET model is that it requires measured ET data for calibration, as do most hydrological models. Another deficiency is its lack of a physical basis. Considering the controlling factors and the nonlinear relationships among them in the model mitigates this problem to some degree.

The reason for the three factors considered in the study, rather than other factors, is that these three factors are not only possible controlling factors of ET but are also closely linked to hydrological processes. ET processes based on these factors have the potential to reveal hydrological processes in study areas. Although it is difficult to tell the dominant factor for ET in the study areas by the sensitivity analysis of the model parameters and the partial correlation analysis, it can be determined that *E*_{0}, *P*, and *LAI* are the controlling factors for ET in the Heihe River basin in the monthly time scale.

In YLX&ZYX, the total amount of *P* (1,478 mm) during the study period is nearly equal to that of ET (1,473 mm); in YLX, the amount of *P* (1,975 mm) is slightly larger than that of ET (1,900 mm). In ZYX, however, the situation is the opposite: the amount of ET (1,334 mm) is much larger than that of *P* (986 mm). That *P* is smaller than ET in ZYX implies that water availability in the basin derives from sources well beyond *P*. In fact, most water sources in ZYX come from the upper midstream. For YLX, although *P* provides most of the water, other sources like snowmelt water contribute to its water availability. Potential evaporation in ZYX is larger than in YLX, but ET in YLX is larger than in ZYX, which is attributed to the larger water availability of YLX. However, ET in both YLX and ZYX is much less than their potential evaporation, indicating water shortages in the study areas. Although *P* here does not denote the total water availability, and it is not known whether the requirement of *E*_{0} is met, the two factors exert critical controls on ET in the study areas.
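The comparison above can be restated as a simple balance between total *P* and total ET over the study period. The sketch below only uses the totals quoted in the text; the variable names are hypothetical.

```python
# Totals over the study period in mm, as quoted in the text: (P, ET).
totals = {
    "YLX&ZYX": (1478, 1473),
    "YLX": (1975, 1900),
    "ZYX": (986, 1334),
}

balance = {
    area: {"p_minus_et": p - et, "et_over_p": et / p}
    for area, (p, et) in totals.items()
}
```

In ZYX, ET exceeds *P* by 348 mm, so ET is about 1.35 times *P*, whereas in YLX and YLX&ZYX the ratio stays just below 1. A ratio above 1 can only be sustained by water that does not fall as local precipitation, which is consistent with the explanation given above.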

According to Figure 1 and Table 1, about 70% of YLX is covered by vegetation and located near rivers. About half of ZYX is covered and the rest is largely unused; of the covered part, almost half is farmland, most of which is irrigated and consumes large amounts of water (Nian *et al.* 2013). Thus, vegetation cover in the study areas is closely related to ET.

## CONCLUSIONS

ET is the most important form of water loss in arid regions, and diagnosis of its controlling factors may help achieve better water resources management. An ET model based on precipitation, evaporation capacity, and vegetation, with parameters optimized by SCE-UA, was proposed and applied to simulate ET in the upper and middle regions of the Heihe River basin. The controlling factors of ET were diagnosed by parameter sensitivity analysis, partial correlation analysis, and factor analysis. The following conclusions can be drawn:

The proposed ET model with the three factors can reconstruct the actual ET in the upstream and midstream of the Heihe River basin properly. Additionally, the ET model is flexible to be applicable to other arid regions by adding more controlling factors to the model.

Evaporation capacity, precipitation, and vegetation are the control factors on ET in the upstream and midstream of the Heihe River basin.

## ACKNOWLEDGEMENTS

This study was financially supported by the State Key Program of the National Natural Science Foundation of China (91647202). Sincere thanks are due to two reviewers for their very helpful comments and suggestions.

## REFERENCES
