Evapotranspiration is a key variable for hydrologic, climatic, agricultural, and environmental studies. Because no direct measurement methods are both economical and technically easy to implement, evapotranspiration is estimated mainly with empirical and regression models and with machine learning algorithms that use conventional meteorological variables. The FAO-56 Penman-Monteith equation is recognized worldwide as the most accurate equation for estimating reference evapotranspiration (ETo). However, it requires many climatic variables, which makes its application questionable in regions with limited ground-based climate data. This note covers the following:
- a summary of empirical and semi-empirical equations and their data requirements, together with the problems associated with these models (transferability and data quality);
- an overview of regression models;
- the potential of machine learning algorithms in regression tasks;
- trends in reference evapotranspiration studies;
- recommendations on topics that future research should address to further improve the performance and generalization of the available models.

The terminology of evapotranspiration is often dispersed across the academic literature, and this note uses it consistently across the theoretical and practical fields. The goal of this note is to provide some perspective to stimulate discussion.

  • An overview of trends in ETo studies is presented.

  • The main limitation of FAO-56 Penman-Monteith is the large number of meteorological variables required.

  • There is a wide variety of empirical equations for ETo estimation.

  • The application of machine learning algorithms is increasing due to their high performance for ETo estimation.

  • Some aspects of ETo estimation methods are discussed, and recommendations are given.

Graphical Abstract


An overwhelming volume of scientific literature exists on evaporation, transpiration, and evapotranspiration (ET). An immense number of academic articles have been published on ET estimation with empirical and regression models and machine learning algorithms.

The Scopus (Elsevier) database gives an idea of the scale:
- For the period 1800–2021, 72,899 online documents (articles, book chapters, conference proceedings, reports, dissertations, etc.) contain the keyword ‘reference evapotranspiration’.
- Combining the keywords ‘estimation models’ and ‘reference evapotranspiration’ yields 31,427 results for the period 1966–2021.
- Combining ‘machine learning’ and ‘reference evapotranspiration’ yields 3,128 documents for the period 1971–2021.
- Combining ‘hybrid data-driven machine learning techniques’ and ‘reference evapotranspiration’ returns 557 documents for the period 1998–2021.

This simple analysis clearly indicates that the scientific community still devotes considerable effort to improving and calibrating measurement techniques, empirical and model-based estimation methods, and artificial intelligence-based methods. These methods serve to measure and estimate ET at different temporal frequencies and spatial scales. A clear shift is also noticeable, away from estimating reference evapotranspiration (ETo) with classical empirical equations and models and towards artificial intelligence-based methods. Figure 1 shows the temporal evolution of the number of ETo articles published in the past decade in the Web of Science (WoS) Core Collection with ‘reference evapotranspiration’ in the title.
In the past decade, the total number of publications reached 804. The annual number of ETo studies increased remarkably, almost tripling by 2020, with a mean of 79 articles per year. This rising trend reflects a growing interest among scientists in ETo studies.

Figure 1

Temporal evolution of the number of studies recorded in Web of Science (WoS) mentioning reference evapotranspiration in the title. The dotted line represents the trendline.


When ET has to be determined within a project, the question arises of which method to apply. The literature on ET is so extensive that proposing even a partial review is practically impossible in this context, and analyzing it is time consuming and costly. To circumvent this, the idea arose to write a communication that can guide the selection of the most suitable approach for a given study. This note is based on a detailed analysis of the literature published in the last decade and available in the WoS database. The synthesized information should be a useful tool for water and climate researchers and practitioners who need ETo. The goal of this note is not to arrive at any particular truth, but rather to stimulate lively discussion.

ET combines evaporation from the land surface and transpiration from plants, both crucial processes in the hydrologic cycle. An accurate prediction of ET is essential for estimating the water budget (Equation (1)). It is equally essential for managing water-related environmental systems, e.g., in agricultural, meteorological, and hydrological practice. However, measuring ET is complex and expensive. Lysimeters and eddy covariance systems measure actual evapotranspiration directly and with high precision, but their installation and maintenance costs are relatively high. Other methods to estimate ET include the water balance, pan evaporation, and remote sensing techniques.
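$$\Delta S = P + SN + GW_{in} - GW_{out} - Q - ET \qquad (1)$$
where ΔS is the change in water storage over a specified period of time (Δt), P the precipitation, SN the snowmelt, GWin and GWout the groundwater inflow and outflow, Q the discharge, and ET the evapotranspiration. All terms are accumulated over Δt.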
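The water balance method mentioned above estimates ET by solving this balance for ET, which can be sketched in Python. This minimal illustration assumes the balance takes the form ΔS = P + SN + GWin − GWout − Q − ET, with every term expressed as a depth (mm) accumulated over the same period Δt. The function name is hypothetical.

```python
def et_from_water_balance(p, sn, gw_in, gw_out, q, delta_s):
    """Return ET (mm) as the residual of the water balance over one period Δt.

    Assumes ΔS = P + SN + GW_in - GW_out - Q - ET, all terms in mm over Δt.
    """
    return p + sn + gw_in - gw_out - q - delta_s


# Example: 100 mm rain, no snowmelt, 10 mm groundwater inflow, 5 mm outflow,
# 40 mm discharge and a 15 mm storage increase leave 50 mm for ET.
print(et_from_water_balance(100.0, 0.0, 10.0, 5.0, 40.0, 15.0))  # 50.0
```

Because ET is obtained as a residual, any error in the other terms propagates directly into the ET estimate.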
Because the field methods are complex, measurements of evapotranspiration are spatially and temporally scarce. As an alternative, crop evapotranspiration (ETc) is commonly calculated from ETo. ETo can be defined as the evapotranspiration from a hypothetical reference crop (grass) with a height of 0.12 m, a surface resistance of 70 s m−1, and an albedo of 0.23 (Allen et al. 1998). ETc is obtained by multiplying ETo by a crop-specific coefficient (kc). In most cases, ETo is calculated with mathematical models that take climatological variables as input.

The scientific community has adopted the FAO-56 Penman-Monteith equation (Equation (2)) as the standard method for estimating ETo, and it has been found suitable for most climate conditions. The method is physically based and requires several weather variables, such as air temperature, relative humidity, solar radiation, and wind speed. The procedure for calculating the Penman-Monteith equation is documented in Allen et al. (1998).
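$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u)} \qquad (2)$$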
where ETo is the reference crop evapotranspiration (mm day−1), Rn the net radiation (MJ m−2 day−1), G the soil heat flux (MJ m−2 day−1), γ the psychrometric constant (kPa °C−1), es the saturation vapor pressure (kPa), ea the actual vapor pressure (kPa), Δ the slope of the saturation vapor pressure-temperature curve (kPa °C−1), T the average air temperature (°C), and u the mean wind speed at 2 m (m s−1).
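Equation (2) translates directly into code, as in the Python sketch below. The sketch assumes daily inputs in the units listed above. It uses the standard FAO-56 expressions for the saturation vapor pressure, its mean over Tmax and Tmin, and the slope Δ. Computing Rn, G, and γ, which require radiation, elevation, and soil data, is left to the caller. The function names are illustrative and do not come from any particular library.

```python
import math


def saturation_vapor_pressure(t):
    """Saturation vapor pressure e°(T) in kPa for an air temperature T in °C."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def mean_saturation_vapor_pressure(tmax, tmin):
    """es (kPa) as the mean of e°(Tmax) and e°(Tmin), as recommended in FAO-56."""
    return (saturation_vapor_pressure(tmax) + saturation_vapor_pressure(tmin)) / 2.0


def slope_svp_curve(t):
    """Slope Δ (kPa °C-1) of the saturation vapor pressure curve at T (°C)."""
    return 4098.0 * saturation_vapor_pressure(t) / (t + 237.3) ** 2


def fao56_pm_eto(rn, g, gamma, es, ea, delta, t, u2):
    """Daily ETo (mm day-1) from the FAO-56 Penman-Monteith equation.

    rn, g in MJ m-2 day-1; gamma, delta in kPa °C-1; es, ea in kPa;
    t in °C; u2 = mean wind speed at 2 m in m s-1.
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative mid-latitude summer day (Tmax 21.5 °C, Tmin 12.3 °C).
t_mean = (21.5 + 12.3) / 2.0
es = mean_saturation_vapor_pressure(21.5, 12.3)
delta = slope_svp_curve(t_mean)
print(round(fao56_pm_eto(13.28, 0.0, 0.0666, es, 1.409, delta, t_mean, 2.078), 2))
```

With these illustrative inputs (Rn = 13.28 MJ m−2 day−1, G = 0, γ = 0.0666 kPa °C−1, ea = 1.409 kPa, u = 2.078 m s−1), the result is about 3.9 mm day−1.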

Currently, the global network of eddy covariance flux measurement sites, FLUXNET (www.fluxnet.org), is key to generating micrometeorological data (e.g., ET) for most terrestrial regions and biomes of the world across different climates. However, the network density is very low in the Global South, a region only roughly defined by latitude. Moreover, many remote regions have never had a weather station installed. Therefore, approaches such as the FAO-56 Penman-Monteith method are still needed to carry out integrated water resources research and management in those areas.

The main limitation of the FAO-56 Penman-Monteith method for calculating ETo is that many weather stations worldwide do not measure the full set of required climatic variables. The quality of the measured weather data is another problem. Data from weather instruments and sensors are not free from flaws, such as low reliability (solar radiation), intermittent errors, and questionable quality (relative humidity and wind speed). Temperature is the variable least prone to faulty sensor readings, and it is widely and easily available in many regions of the world.

Allen et al. (1998) established guidelines for applying the FAO-56 Penman-Monteith equation under limited data conditions, typically when solar radiation (Rs), relative humidity (RH), or wind speed (u) data are missing:
- When local values are missing, solar radiation and wind speed values from nearby weather stations with similar topography and climatic conditions can be used.
- As a second option, solar radiation can be calculated with the Hargreaves radiation formula as a function of the minimum and maximum temperatures.
- When relative humidity data are lacking, the actual vapor pressure (ea) can be estimated by assuming that the dew point temperature (Tdew) equals the minimum temperature (Tmin).
- When wind speed data are missing, the FAO-56 Penman-Monteith equation can be applied using the global average wind speed of 2 m s−1.
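The fallbacks above for solar radiation, humidity, and wind speed can be combined in a small helper, sketched below in Python. The sketch is illustrative. The adjustment coefficient kRs (about 0.16 for interior and 0.19 for coastal locations in FAO-56) is taken as an input. So is the extraterrestrial radiation Ra, which is computed from latitude and day of year. The function names are hypothetical.

```python
import math

DEFAULT_U2 = 2.0  # global average wind speed at 2 m (m s-1) used when wind data are missing


def saturation_vapor_pressure(t):
    """Saturation vapor pressure (kPa) at air temperature t (°C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def hargreaves_rs(tmax, tmin, ra, krs=0.16):
    """Solar radiation Rs (MJ m-2 day-1) from the temperature range and extraterrestrial radiation Ra."""
    return krs * math.sqrt(tmax - tmin) * ra


def fill_missing(tmax, tmin, ra, rs=None, ea=None, u2=None, krs=0.16):
    """Return (rs, ea, u2), replacing only the values that are missing (None)."""
    if rs is None:
        rs = hargreaves_rs(tmax, tmin, ra, krs)
    if ea is None:
        ea = saturation_vapor_pressure(tmin)  # assumes Tdew ≈ Tmin
    if u2 is None:
        u2 = DEFAULT_U2
    return rs, ea, u2


print(fill_missing(26.6, 14.8, 40.6))
```

Take illustrative interior-site inputs of Tmax = 26.6 °C, Tmin = 14.8 °C, and Ra = 40.6 MJ m−2 day−1. The helper then gives Rs ≈ 22.3 MJ m−2 day−1 and ea ≈ 1.68 kPa, with u set to 2 m s−1. Measured values that are supplied are passed through unchanged.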

Uncertainties in field measurements of meteorological variables can be large, mainly because of instrument calibration, installation, operation, and maintenance. Guidelines for the quality control of weather data have been established to address this issue. Meek & Hatfield (1994) and Allen (1996) developed screening rules and instructions for deciding when data or sensors should be scrutinized.
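The Python sketch below illustrates screening by flagging physically implausible daily records. The checks and the 5 % tolerance on clear-sky solar radiation (Rso) are hypothetical illustrations in the spirit of such rules. They are not the published criteria of Meek & Hatfield (1994) or Allen (1996), and the record layout and flag names are assumptions.

```python
def screen_daily_record(rec):
    """Return plausibility flags for one daily record.

    rec is a dict with keys tmax, tmin (°C), rh_mean (%), u2 (m s-1),
    rs and rso (measured and clear-sky solar radiation, MJ m-2 day-1).
    """
    flags = []
    if rec["tmin"] > rec["tmax"]:
        flags.append("tmin_gt_tmax")
    if not 0.0 <= rec["rh_mean"] <= 100.0:
        flags.append("rh_out_of_range")
    if rec["u2"] < 0.0:
        flags.append("negative_wind")
    # Rs should be non-negative and should not exceed the clear-sky envelope Rso
    # (the 5 % tolerance is a hypothetical choice)
    if rec["rs"] < 0.0 or rec["rs"] > 1.05 * rec["rso"]:
        flags.append("rs_implausible")
    return flags


good = {"tmax": 25.0, "tmin": 12.0, "rh_mean": 65.0, "u2": 2.1, "rs": 20.0, "rso": 26.0}
print(screen_daily_record(good))  # []
```

Flagged records would then be scrutinized, corrected, or excluded before ETo is computed.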

Continuous monitoring at weather stations is another issue. Worldwide, many stations have been abandoned or dismantled, although many FLUXNET stations have been installed during the past decade. Overall, spatially and temporally (long-term) extensive weather data are commonly lacking and data quality is often uncertain, both of which limit the application of the FAO-56 Penman-Monteith equation.

Lysimeters and fully equipped weather stations for estimating ETo with the FAO-56 Penman-Monteith method are lacking. Empirical equations that require fewer weather variables are therefore pivotal for hydrological, ecohydrological, and biometeorological studies and applications, and a large body of literature on such equations for ETo estimation is available. Based on their data requirements, the available equations can be grouped into temperature-, radiation-, and mass transfer-based methods (see Supplementary Table). Most ETo equations have been developed for specific atmospheric conditions and for a particular temporal scale, such as hourly, daily, or monthly.

Hupet & Vanclooster (2001) demonstrated that coarse temporal sampling of meteorological variables (the time-aggregation effect) tends to overestimate ETo. This highlights the importance of finer monitoring resolutions. Some academic efforts have aimed to adapt empirical equations from coarser to finer resolutions. For example, Pereira & Pruitt (2004) and Chang et al. (2019) modified the original monthly Thornthwaite temperature-based equation to estimate daily ETo.
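The original monthly Thornthwaite method is an example of a temperature-based equation and can be sketched in Python as follows. The sketch assumes the commonly cited form, with an annual heat index I, an empirical exponent a, and a day-length and month-length adjustment. It omits the separate treatment usually applied to mean temperatures above about 26.5 °C. It is not the daily modification of Pereira & Pruitt (2004) or Chang et al. (2019), and the function name is illustrative.

```python
def thornthwaite_monthly(temps, day_length_hours=None, days_in_month=None):
    """Monthly potential ET (mm) from 12 mean monthly air temperatures (°C), original Thornthwaite form.

    Months with T <= 0 °C get zero. If day lengths N (h) and month lengths d (days) are
    given, each value is adjusted by (N / 12) * (d / 30); otherwise unadjusted values are returned.
    """
    heat_index = sum((t / 5.0) ** 1.514 for t in temps if t > 0)  # annual heat index I
    if heat_index == 0:
        return [0.0] * len(temps)
    a = 6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2 + 1.792e-2 * heat_index + 0.49239
    n_hours = day_length_hours or [12.0] * len(temps)
    n_days = days_in_month or [30.0] * len(temps)
    pet = []
    for t, hours, days in zip(temps, n_hours, n_days):
        unadjusted = 16.0 * (10.0 * t / heat_index) ** a if t > 0 else 0.0
        pet.append(unadjusted * (hours / 12.0) * (days / 30.0))
    return pet
```

With a constant 20 °C in every month, each unadjusted monthly value is roughly 74 mm. Because the method needs only temperature, the aggregation to months discussed above is built into it.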

The Supplementary Table lists the data requirements of each empirical and semi-empirical equation. In this way, it serves as a guide for users to identify the optimal methods for the weather data available to them.

As shown in the Supplementary Table, the empirical models for estimating ETo were developed with meteorological variables from surface weather stations. They therefore intrinsically reflect the local conditions under which they were formulated. Some models work well in areas with similar climatological and environmental conditions, but they can perform poorly when tested under other climatic conditions. This makes the transferability of models to other areas or time periods uncertain, that is, their use beyond the spatial and temporal bounds of their underlying data. Except for the FAO-56 Penman-Monteith method, attempts to transfer ETo models across geographic locations have failed, and transferable models remain elusive. Empirical models will always have to be calibrated to the local conditions where they are applied. Models require a tradeoff between prediction bias and variance (homogenization versus non-transferability). For applications and decision making (e.g., irrigation systems, catchment water balance), preference should clearly be given to estimation models with high accuracy.

A relatively simple and widely applied calibration method recalibrates the coefficients of an equation by 10-fold cross-validation. The procedure is as follows:
- The complete dataset is randomly divided into 10 groups of approximately equal size.
- The coefficients are computed using nine of the groups as a training set and validated with the remaining group.
- This is repeated 10 times, and the coefficients with the lowest test error are retained as the new coefficients.

A second method is rougher but simpler. The calibration ratio (cr) is derived by dividing the measured variable by the estimated variable (Equation (3)), and all estimated data are multiplied by the average cr value (Equation (4)). The performance of this calibration can also be tested by cross-validation.

$$cr = \frac{V}{V_o} \qquad (3)$$

$$V_{cal} = \overline{cr}\,V_o \qquad (4)$$

where Vcal is the calibrated variable, V the measured value, Vo the estimated variable, and $\overline{cr}$ the average calibration ratio of the data considered.
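Both calibration routes can be sketched in Python. The sketch assumes a hypothetical one-coefficient model, ETo ≈ a·x, where x stands for whatever predictor the empirical equation uses. Folds are assigned by a seeded shuffle, and the coefficient from the fold with the lowest test error is retained, as in the procedure above. All names are illustrative.

```python
import math
import random


def kfold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_coefficient(x, y):
    """Least-squares coefficient a of the hypothetical model y = a * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def rmse(obs, est):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def recalibrate_kfold(x, y, k=10, seed=0):
    """Fit on k-1 folds, test on the held-out fold; keep (a, rmse) with the lowest test RMSE."""
    best = None
    for test_idx in kfold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(x)) if i not in held_out]
        a = fit_coefficient([x[i] for i in train], [y[i] for i in train])
        err = rmse([y[i] for i in test_idx], [a * x[i] for i in test_idx])
        if best is None or err < best[1]:
            best = (a, err)
    return best


def mean_calibration_ratio(measured, estimated):
    """Average of cr = V / Vo over all data (Equation (3))."""
    ratios = [v / vo for v, vo in zip(measured, estimated)]
    return sum(ratios) / len(ratios)


def apply_calibration(estimated, mean_cr):
    """Vcal = mean(cr) * Vo for every estimate (Equation (4))."""
    return [mean_cr * vo for vo in estimated]
```

Selecting the single best fold follows the procedure described above. Averaging the fold coefficients is another option a user might consider.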

In recent decades, the application of regression models and machine learning (ML) has advanced rapidly, and the scientific community has adopted these techniques for different purposes in hydrology (Lange & Sippel 2020). As noted by Jing et al. (2019), a large and growing body of work applies evolutionary computational models to ETo estimation.

In general terms, regression models use observational records to quantify the relationship between a target variable (also called the dependent variable) and a set of independent variables (also called covariates). Classic examples are multiple linear regression, Bayesian regression, robust regression, and multivariate adaptive regression splines. Depending on the underlying algorithm, ML can perform supervised or unsupervised learning. It then builds statistical models and identifies trends and patterns for data analysis and forecasting. ML algorithms such as artificial neural networks (ANN), support vector machines (SVM), and adaptive neuro-fuzzy inference systems (ANFIS) learn implicitly from the input data. They can provide accurate predictions without having been explicitly programmed for the task. The most widely used regression models are briefly described below.

The most widely applied regression model is multiple linear regression (MLR), a statistical approach for modeling the linear relationship between explanatory (independent) and response (dependent) variables. The main assumption of MLR is that the relationship between the dependent and independent variables is linear. It also assumes that the independent variables are not significantly correlated with each other. MLR can be considered an extension of simple (one-predictor) linear regression, typically fitted by ordinary least squares (OLS), because it involves more than one explanatory variable (Eberly 2007).
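A dependency-free Python sketch of MLR fitted by OLS is given below; it solves the normal equations by Gaussian elimination. In an ETo application, the predictor columns might be, for instance, Tmax, Tmin, and Rs. That choice, the function names, and the synthetic check are assumptions for illustration. For real work, a numerical library's least-squares routine would be the more robust choice.

```python
def fit_mlr(X, y):
    """OLS coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp.

    X: list of rows (p predictor values per observation); y: list of responses.
    Solves the normal equations (A^T A) b = A^T y by Gaussian elimination with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    M = [ata[i] + [aty[i]] for i in range(k)]  # augmented matrix [A^T A | A^T y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def predict_mlr(b, row):
    """Prediction b0 + sum(bi * xi) for one row of predictors."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))
```

On noise-free synthetic data generated as y = 1 + 2·x1 + 0.5·x2, the fit recovers the coefficients [1, 2, 0.5] up to rounding error.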

Multivariate adaptive regression splines (MARS) is a non-parametric nonlinear regression technique. It models the dependence of a response variable on one or more explanatory variables. Rather than approximating the data with a single function over the whole domain, this kind of non-parametric model combines several simple functions, usually low-order polynomials. Each function is defined on a sub-region of the domain (piecewise parametric fitting). In MARS, these are piecewise linear basis functions joined at knots.

MARS is attractive because it can approximate complex nonlinear relationships from the data without assuming in advance the type of nonlinearity present. The model-building algorithm includes mechanisms for selecting the relevant explanatory variables, and the resulting model is relatively easy to interpret and apply. Finally, its parameters can be estimated quickly and efficiently (Friedman 1991).
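The basic building block of MARS is the hinge function max(0, x − t), with knot t. The Python sketch below fits a single hinge term by least squares and picks the knot that minimizes the residual sum of squares. This is only the core of MARS. The full algorithm of Friedman (1991) adds mirrored hinge pairs and their products in a forward pass. It then prunes terms in a backward pass using generalized cross-validation. The names are illustrative.

```python
def hinge(x, knot):
    """MARS basis function max(0, x - knot)."""
    return max(0.0, x - knot)


def fit_single_hinge(x, y):
    """Fit y ≈ b0 + b1 * max(0, x - t) by least squares, trying each observed x as knot t.

    Returns (sse, t, b0, b1) for the knot with the smallest residual sum of squares.
    """
    n = len(x)
    best = None
    for t in sorted(set(x))[:-1]:  # a knot at max(x) would make the basis all zeros
        h = [hinge(xi, t) for xi in x]
        h_mean, y_mean = sum(h) / n, sum(y) / n
        s_hh = sum((hi - h_mean) ** 2 for hi in h)
        s_hy = sum((hi - h_mean) * (yi - y_mean) for hi, yi in zip(h, y))
        b1 = s_hy / s_hh
        b0 = y_mean - b1 * h_mean
        sse = sum((yi - b0 - b1 * hi) ** 2 for hi, yi in zip(h, y))
        if best is None or sse < best[0]:
            best = (sse, t, b0, b1)
    return best
```

For data that stay flat up to x = 5 and then rise with slope 2, the search selects the knot t = 5 and recovers b0 = 3 and b1 = 2.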

Robust regression (RB) assigns a weight to each data point to counter the extreme sensitivity of OLS estimates to outliers. Weighting is done automatically and iteratively through a process called ‘iteratively reweighted least squares’. In the first iteration, each point is assigned equal weight, and the model coefficients are estimated by OLS. At subsequent iterations, the weights are recomputed so that points farther from the previous iteration's predictions receive lower weights. The coefficients are then recomputed by weighted least squares. The process continues until the coefficient estimates converge within a specified tolerance (Khoshravesh et al. 2017).
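The iteratively reweighted least squares loop described above can be sketched in Python for a straight-line model. The weight function is an assumption. The sketch uses Tukey's bisquare with the conventional tuning constant 4.685, together with a median-absolute-deviation estimate of the residual scale, which is one common choice among several. The names are illustrative.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def robust_line_irls(x, y, c=4.685, tol=1e-8, max_iter=50):
    """Robust straight-line fit by iteratively reweighted least squares (Tukey bisquare weights)."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))  # first iteration: equal weights, i.e. OLS
    for _ in range(max_iter):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        scale = statistics.median(abs(r) for r in resid) / 0.6745  # MAD-based robust scale
        if scale < 1e-12:  # (near-)perfect fit: nothing left to down-weight
            break
        # points farther from the previous fit get lower weight; beyond c * scale, zero weight
        w = [(1.0 - (r / (c * scale)) ** 2) ** 2 if abs(r) < c * scale else 0.0 for r in resid]
        new_b0, new_b1 = weighted_line(x, y, w)
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:  # coefficients converged within the tolerance
            break
    return b0, b1
```

On data that follow y = 2x + 1 with one gross outlier, OLS tilts the slope strongly. The reweighted fit instead returns close to the underlying intercept and slope, because the outlier's weight drops to zero.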

In simple terms, Bayesian regression (BR) treats the unknown parameter θ as a random variable with a probability distribution π(θ), called the prior distribution. It updates this distribution with the data y = (y1, y2, …, yn) through a statistical model described by a density function l(y|θ), called the likelihood function. The prior distribution expresses the beliefs about the parameter before the data are examined. Given the observed data y, Bayes' theorem combines the prior distribution with the data to update these beliefs, which yields the posterior distribution π(θ|y) ∝ l(y|θ) π(θ).

When the prior and likelihood are normal, the posterior is a compromise between prior and data, weighted by their variances. There are two cases (Khoshravesh et al. 2017):
- If the variance of the prior is smaller than that of the sample data, a higher weight is assigned to the prior.
- If it is larger, a higher weight is assigned to the sample data.
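The variance weighting described above can be illustrated with the simplest conjugate case: a normal prior on a single parameter θ and normally distributed observations with known variance. The Python sketch below is a hypothetical illustration rather than a full Bayesian regression, which applies the same kind of update to the whole vector of regression coefficients. Note that the data side is weighted by the variance of the sample mean, that is, the sample variance divided by n.

```python
def normal_posterior(prior_mean, prior_var, data, sample_var):
    """Posterior mean and variance of θ for a normal prior N(prior_mean, prior_var)
    and n observations y_i ~ N(θ, sample_var) with known variance."""
    n = len(data)
    y_bar = sum(data) / n
    prior_precision = 1.0 / prior_var  # small prior variance -> large weight on the prior
    data_precision = n / sample_var    # variance of the sample mean is sample_var / n
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * y_bar)
    return post_mean, post_var
```

Take a prior N(4, 1) and four observations equal to 6 with variance 4. Prior and data then carry equal weight, and the posterior mean is 5 with variance 0.5. Shrinking the prior variance pulls the result towards 4, and inflating it pulls the result towards 6.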

The most popular ML algorithms used for prediction are described briefly below, in the same way as the regression models above. ANNs are computational tools that emulate the functioning of neural networks in biological systems. They extract the relationship between the inputs and outputs of a process without explicit knowledge of its physical nature: the input signal is propagated through the network until an output signal is obtained. Developing an ANN-based model is generally divided into training, validation, and testing. The architecture of an ANN has three parts:
- an input layer, where data are introduced to the ANN;
- one or more hidden layers, where the data are processed;
- an output layer, where the results for the given inputs are provided.

The advantage of the neural approach lies in the possibility of improving the performance criteria by modifying the network architecture (Lange & Sippel 2020).
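A minimal ANN can be written without any library, which shows the forward pass through the layers and the training loop. The Python sketch below rests on stated assumptions: one hidden layer of tanh units, a linear output, full-batch gradient descent on the mean squared error, and a toy target. It has no validation or test split, which a real ETo model would need. The names and hyperparameters are illustrative.

```python
import math
import random


def train_mlp(X, y, n_hidden=4, lr=0.1, epochs=2000, seed=1):
    """Train a one-hidden-layer network (tanh hidden units, linear output) on squared error.

    Uses full-batch gradient descent; returns (predict, losses), where losses holds
    the mean squared error measured at the start of each epoch, before the update.
    """
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(xs):
        # input layer -> hidden layer (tanh) -> linear output layer
        h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], xs)) + b1[j]) for j in range(n_hidden)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    losses = []
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        sq_err = 0.0
        for xs, target in zip(X, y):
            h, out = forward(xs)
            err = out - target
            sq_err += err * err
            g_b2 += 2.0 * err
            for j in range(n_hidden):
                g_w2[j] += 2.0 * err * h[j]
                # back-propagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
                dz = 2.0 * err * w2[j] * (1.0 - h[j] ** 2)
                g_b1[j] += dz
                for i in range(n_in):
                    g_w1[j][i] += dz * xs[i]
        losses.append(sq_err / n)
        b2 -= lr * g_b2 / n
        for j in range(n_hidden):
            w2[j] -= lr * g_w2[j] / n
            b1[j] -= lr * g_b1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * g_w1[j][i] / n

    def predict(xs):
        return forward(xs)[1]

    return predict, losses
```

For a toy target such as y = x² on [0, 1], use `X = [[i / 10.0] for i in range(11)]` and `y = [x[0] ** 2 for x in X]`. The recorded loss then falls well below its initial value. Changing `n_hidden` is the simplest way to experiment with the architecture.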

SVMs are widely used for classification and regression problems in ML. For classification, an SVM separates the data by class with a separating line, called a hyperplane. Unlike in regression, it also creates a safety boundary on both sides of the hyperplane by maximizing the margin. For regression, SVM models find the linear regression function that best approximates the output vector within a given error tolerance. The advantage of SVM models lies in their flexibility in defining how much error is acceptable and in their ability to yield an appropriate line (or hyperplane in higher dimensions) that fits the data (Kecman 2005).
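The error tolerance of SVM regression is usually expressed through the ε-insensitive loss, which ignores residuals smaller than ε. The Python sketch below evaluates that loss and the primal objective of a linear SVR, 0.5‖w‖² + C Σ max(0, |yᵢ − (w·xᵢ + b)| − ε). It only scores a candidate (w, b). An actual fit would minimize this objective with a dedicated solver, often in kernel form. The names and default values are illustrative.

```python
def eps_insensitive_loss(residual, eps):
    """Zero inside the ±eps tube around the regression function, linear outside it."""
    return max(0.0, abs(residual) - eps)


def linear_svr_objective(w, b, X, y, C=1.0, eps=0.1):
    """Primal objective of linear SVR: 0.5*||w||^2 + C * sum of eps-insensitive losses."""
    regularization = 0.5 * sum(wi * wi for wi in w)
    data_term = sum(
        eps_insensitive_loss(yi - (sum(wi * xi for wi, xi in zip(w, row)) + b), eps)
        for row, yi in zip(X, y)
    )
    return regularization + C * data_term
```

Consider w = [1], b = 0, and the points (1, 1.05) and (2, 2.5). The first residual lies inside the 0.1 tube and costs nothing, while the second costs 0.4, so the objective is 0.5 + 0.4 = 0.9.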

Finally, random forest (RF) is a popular and effective algorithm based on model aggregation and used for tasks such as classification, regression, and forecasting. RF builds many relatively uncorrelated decision trees from bootstrap samples, and these trees operate as an ensemble. At each split point during tree construction, it also selects a random subset of input features (columns or variables). In classification, each tree returns a class prediction, and the class with the most votes becomes the model prediction. In regression tasks such as ETo estimation, the predictions of the individual trees are averaged (Breiman 2001).
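The Python sketch below shows the two sources of randomness in RF, bootstrap sampling of the rows and a random subset of features at each split, together with averaging for regression. This is a teaching sketch, not Breiman's reference implementation. Trees are grown greedily by minimizing the squared error, and depth is capped. The parameter names and defaults are assumptions.

```python
import random


def _sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _best_split(rows, y, features):
    """Return (sse, feature, threshold) of the split minimizing the squared error, or None."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows})[:-1]:  # keeps both sides non-empty
            left = [yi for r, yi in zip(rows, y) if r[f] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[f] > threshold]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    return best


def build_tree(rows, y, depth, max_features, rng):
    """Grow a regression tree; leaves hold the mean target, inner nodes are (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 2:
        return sum(y) / len(y)
    n_features = len(rows[0])
    features = rng.sample(range(n_features), min(max_features, n_features))  # random feature subset per split
    split = _best_split(rows, y, features)
    if split is None:  # the sampled features are constant in this node
        return sum(y) / len(y)
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [y[i] for i in left], depth - 1, max_features, rng),
            build_tree([rows[i] for i in right], [y[i] for i in right], depth - 1, max_features, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(rows, y, n_trees=50, max_depth=4, max_features=1, seed=0):
    """Fit an RF regressor; returns a function that averages the predictions of all trees."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample drawn with replacement
        trees.append(build_tree([rows[i] for i in idx], [y[i] for i in idx], max_depth, max_features, rng))
    return lambda row: sum(predict_tree(tree, row) for tree in trees) / len(trees)
```

The number of trees (`n_trees`) must be set by the user, as noted in Table 1. Increasing it smooths the averaged prediction.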

The application of any of the models will depend on the objective to be achieved, the relationship between the variables in the dataset, and also on the capacity and expertise of the user who develops and implements the model. Some of the main advantages and disadvantages of these models are presented in Table 1.

Table 1

Overview of advantages and disadvantages of regression models and machine learning algorithms

Estimation model | Advantages | Disadvantages
MLR | Adequate for small datasets; simple to understand and interpret | Linearity assumption; sensitive to outliers
RB | Improves performance when the dataset presents heteroscedasticity and outliers | When the underlying assumptions of the classic method (OLS) hold, RB performs worse than OLS
BR | Fast data processing | Less accurate when collinearity exists
MARS | Fast data processing; simple to understand and interpret; flexible in capturing the shape of functions | The high degree of flexibility can result in overfitting
ANN | Powerful in identifying complex non-linear relationships | Large datasets are required to achieve good performance; overfitting
SVM | Powerful in identifying complex non-linear relationships; robust to outliers | Requires considerable processing time; performance depends on the selection of the kernel function; risk of overfitting
RF | Powerful in identifying complex non-linear relationships; harder to overfit | Poor performance with small datasets; the number of decision trees must be set; low model interpretability

MLR, multiple linear regression; RB, robust regression; BR, Bayesian regression; MARS, multivariate adaptive regression splines; ANN, artificial neural networks; SVM, support vector machine; RF, random forests.

In the past decade, the number of studies assessing the performance of empirical equations, ML algorithms, and regression models for ETo estimation has increased considerably (e.g., Table 2). The following facts emerge from these studies:
1. Most studies used the FAO-56 Penman-Monteith model as the reference for performance assessment.
2. Studies applied original, modified, and locally adapted equations.
3. The performance ranking of the models varied between studies, mainly depending on geographic location.
4. Most of the ETo models developed are site specific.
5. Various combinations of input variables were tested to identify the ML models requiring the fewest weather variables, and these ML models outperformed the empirical equations under all climatic conditions.
6. Most of the regression models also performed well.

Table 2

List of performance studies of empirical, machine learning, and regression models for the estimation of ETo

Source | Number of empirical models* | Machine learning models | Regression models | Country (environment type)
Landeras et al. (2008) | 11 | ANN | – | Spain (subatlantic environment)
Tabari et al. (2012) | 13 | SVM, ANFIS | MLR, MNLR | Iran (semi-arid environment)
Tabari et al. (2013) | 32 | – | – | Iran (humid environment)
Khoshravesh et al. (2017) | – | – | MFP, RB, BR | Iran (arid environment)
Mehdizadeh et al. (2017) | 17 | GEP, SVM | MARS | Iran (mainly arid and semi-arid environment)
Djaman et al. (2019) | 35 | – | – | New Mexico – USA (semi-arid environment)
Farzanpour et al. (2019) | 21 | – | – | Iran (semi-arid environment)
Muhammad et al. (2019) | 31 | – | – | Peninsular Malaysia (tropical environment)
Celestin et al. (2020) | 33 | – | – | Hexi Corridor – China (arid environment)
Chen et al. (2020) | – | ANN, SVM, RF | – | Northeast Plain – China (subtropical monsoon environment)
dos Santos Farias et al. (2020) | – | ANN, SVM | CB, SW | Brazil (humid and semi-arid environment)
Ferreira & da Cunha (2020) | – | ANN, RF, XGBoost | – | Brazil (sub-humid environment)
Pinos et al. (2020) | 22 | ANN | MARS | Ecuador (super-humid environment)
Tikhamarine et al. (2020) | – | ANN, SVM | – | Algeria (Mediterranean environment)

ANFIS, adaptive neuro-fuzzy inference system; ANN, artificial neural networks; BR, Bayesian regression; CB, cubist regression; GEP, gene expression programming; MARS, multivariate adaptive regression splines; MFP, multivariate fractional polynomial; MLR, multiple linear regression; MNLR, multiple non-linear regression; RB, robust regression; RF, random forest; SVM, support vector machine models; SW, stepwise regression; XGBoost, extreme gradient boosting. Asterisk (*) means that the FAO-56 Penman-Monteith model is included.

Do you know a water or climate scientist who denies the importance of precise ET estimation? The FAO-56 Penman-Monteith model is well known to perform well in estimating ETo and to serve as a good proxy for lysimeter or eddy covariance measurements. However, the model is not free of errors. In the absence of lysimeter or eddy covariance data, is the FAO-56 Penman-Monteith method a valid reference? A ‘double bias’ can be expected in studies of estimation models. The first comes from comparing the FAO-56 Penman-Monteith model with lysimeter or eddy covariance measurements, and the second from comparing the estimation model with the FAO-56 Penman-Monteith model. Why are accurate estimations so important? Under- or overestimation of ETo at the aggregated scale (e.g., Pinos et al. 2020) has consequences for water management. It can cause water supply problems for agriculture and human consumption and introduce errors into water-cycle modeling.

Why can models not be generalized? Empirical equations require local calibration, which is site specific and prevents generalization because of the high variance between locations. Regression models are developed to be site specific and are sensitive to changes in the ranges and dynamics of the input data, i.e., to application at other locations. ML models cannot be transferred because their main data processing is developed implicitly as a black box, with unknown internal workings. To complicate things further, almost no studies publish their data in open access repositories. Their results therefore cannot be validated, or reused by the scientific community in new studies covering large scales such as regions, countries, or biomes.

In summary, the past few decades have seen an explosion of research on ETo estimation methods, yet scientific progress has remained somewhat stagnant. This note briefly explains the main components and assumptions of estimation models and presents an extended compilation of relevant existing models.

In the absence of field measurements, empirical models are helpful for estimating ETo from routine meteorological variables, but they require local calibration. To minimize this calibration effort, studies should aim to derive estimation approaches that cover large spatial scales such as regions, countries, or biomes. To achieve this goal, studies should publish the weather data they use in open access repositories. The results of surveys usually conducted at the local scale could then be generalized to larger areas of interest. ML algorithms are increasingly applied to estimate ETo from weather variables. Since ML models act as black boxes, future studies should address how to make these relatively accurate models transferable.

At present, many studies compare ETo estimation models against the FAO-56 Penman-Monteith equation, but few compare model results with lysimeter or eddy covariance measurements. The quality of the input data is key to estimation with empirical, regression, and machine learning models, and the quality of direct measurements is key to their validation. Furthermore, heterogeneity in the performance-based ranking of ETo models can lead to ambiguous interpretations.

All relevant data are included in the paper or its Supplementary Information.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
