Abstract
In this study, a hybrid model combining the least square support vector machine with the gamma test (LSSVM-GT) is proposed for estimating daily ETo under the arid conditions of Zahedan station, Iran. The gamma test was used to select the best input vectors for the models. The ETo estimated by the LSSVM-GT model with RBF, linear, and polynomial kernels was compared with that of two other hybrid approaches, ANN-GT and ANFIS-GT, and with empirical equations. The gamma test revealed that minimum and maximum air temperature and wind speed are the most important climate variables. The LSSVM model performed better than the ANFIS and ANN models when the same meteorological input variables were used. In addition, the LSSVM, ANFIS, and ANN models all performed better than the empirical Blaney–Criddle and Hargreaves–Samani equations. The RMSE, MAE, and R2 for the best input vector by LSSVM were 0.13 mm day−1, 0.10 mm day−1, and 0.99, respectively. For 95% of the predicted values, the relative absolute error stayed below about 8.4%, 9.4%, and 24% for the LSSVM, ANN, and ANFIS models, respectively. Based on the comparison of overall performance, the developed LSSVM-GT approach can provide high-precision ETo predictions in arid regions of Iran.
NOMENCLATURE
- ET
Evapotranspiration
- ETo
Reference evapotranspiration
- ETc
Crop evapotranspiration
- SVMs
Support vector machines
- LSSVM
Least square support vector machine
- GT
Gamma test
- RBF
Radial basis function
- ANN
Artificial neural network
- ANFIS
Adaptive neuro fuzzy inference system
- FAO
Food and Agriculture Organization
- ETo(PM)
Calculated ETo by Penman–Monteith equation
- ETo(HS)
Calculated ETo by Hargreaves–Samani equation
- ETo(BC)
Calculated ETo by Blaney–Criddle equation
- RMSE
Root mean square error
- MAE
Mean absolute error
- R2
Coefficient of determination
- AARE
Average absolute relative error
- TS
Threshold statistics
- KKT
Karush–Kuhn–Tucker
- Rn
Net radiation
- G
Soil heat flux density
- T
Mean air temperature
- U2
Average wind speed at 2 m height
- Tmean
Mean air temperature
- es
Saturation vapor pressure
- ea
Actual vapor pressure
- P
Mean annual percentage of daytime hours
- Ra
Water equivalent of extraterrestrial radiation
- Tmin
Minimum temperature
- Tdew
Dew point temperature
- Tmax
Maximum temperature
- N
Number of data
- Oi
Observed value
Average of observed values
- Pi
Predicted value
Average of predicted values
- REt
Relative error in predicted values at time t
- Yx
Number of computed ETo
- a and b
Equation parameters in Blaney–Criddle equation
- x
Input data
- y
Output data
- |…|
Euclidean distance
- p
Number of near neighbors
pth nearest neighbors to xi
value for each vector xi
Output value related with
- Γ
Gamma statistic value
- A
Line gradient
Output variance
- r
Noise
- Var(r)
Variance of r parameter
- δ(p)
Mean square distance to the pth nearest neighbors of the input vectors
- ф(x)
Non-linear function
- w
The m-dimensional weight vector
- MSE
Minimum mean square error
- φ
Mapping function that maps x into the m-dimensional feature vector
Lagrange multipliers
- b
Bias
- ei
Slack variables
- i
Input layer
- Ok
Output at the node k of the output layer
- wji
Controller for the strength connection between the input nodes i and the hidden node j
- Ii
Input value
- Vj
Hidden value to node j of the hidden layer
- g2
Activation function for the output layer
- wjk
Controller for the strength connection between the hidden node j and the output node k.
- tpk
Target output
- Ep
Total error in ANN network
- zpk
Output of ANN
- vpk
Error of output unit k of data pattern p
Membership degree of x in Ai set
Normalized membership degree of i rule
- ci, bi, and ai
Membership function of ANFIS
Membership degree of y in Bi set
- ri, qi, and pi
Adaptive parameters of the ANFIS
- D
Slope of the saturation vapor pressure function
- c
Psychrometric constant
INTRODUCTION
Evapotranspiration (ET) calculation is essential for quantifying crop water requirements (Lovelli et al. 2008), particularly in arid regions such as the southeast of Iran. Irrigation engineers need to calculate crop water requirements, especially in agricultural regions, to obtain a satisfactory yield and to estimate other components of the water balance and system design (Kisi 2008). Accurate prediction of ET therefore plays an important role in the optimal utilization of water resources. Measuring ETo using a lysimeter is a direct and relatively accurate method, but it is expensive, time-consuming, and limited in application. Therefore, ET is usually determined by means of reference evapotranspiration (ETo) in the agricultural sector (Mehdizadeh et al. 2017).
The Food and Agriculture Organization (FAO) introduced the combination equation of Penman–Monteith for estimating ETo modified by Allen et al. (1998) (FAO-56 PM) as the reference method for ETo estimates. This approach is commonly used throughout the world, and has been proven to precisely estimate the ETo in different climates (Kişi & Öztürk 2007; Jain et al. 2008; Kisi 2008; Doğan 2009; Marti et al. 2010; Traore et al. 2010). It is very difficult to formulate a simple equation that can create accurate estimates under different climate conditions (Tabari et al. 2013).
During the last decades, several models such as the artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and genetic programming (GP) have been studied for their reliability in estimating ETo as a function of climate variables. Recently, a new simulation model, the support vector machine (SVM), has emerged as a data-driven technique for complex and practical studies (Liong & Sivapragasam 2002). The SVM is a powerful model for non-linear classification, function estimation, and density estimation, and its training reduces to a convex optimization problem (Yu et al. 2006). SVMs are advanced machine learning models based on structural risk minimization (SRM), which minimizes the expected error of a learning model and reduces the problem of overfitting (Yu et al. 2006). The SVM is well suited to pattern recognition and classification problems and can be applied to regression problems by introducing an alternative loss function. However, finding the final SVM model can be computationally very difficult because it requires solving a set of non-linear equations (Niazi et al. 2008). Thus, the least square support vector machine (LSSVM) is recommended as a modified formulation of the SVM, which leads to a set of linear equations instead of a non-linear programming problem (Niazi et al. 2008).
Many researchers have studied the ANN and ANFIS models for estimating ETo (e.g., Kumar et al. 2002; Trajkovic 2005; Kişi & Öztürk 2007; Kim & Kim 2008; Traore et al. 2010). Recently, wide application of the SVM model has been reported in hydrological engineering (Pai & Hong 2007; Behzad et al. 2009; Khemchandani & Chandra 2009; Misra et al. 2009; Kalteh 2013). A few past studies have used the LSSVM model for estimating ETo. Kisi & Cimen (2009), Torres et al. (2011), Kisi (2013), Samui & Dixon (2012), and Tezel & Buyukyildiz (2015) investigated the accuracy of SVM and ANN models for predicting ETo and evaporation, and their comparisons revealed that the SVM could be successfully used in modeling the ETo process. Samui (2011) used the LSSVM regression model for predicting evaporation losses in reservoirs. The results showed the LSSVM to be a robust model for predicting evaporation from a reservoir. Tabari et al. (2013) compared the ANFIS and SVM models with empirical equations, including Blaney–Criddle, Makkink, Turc, Priestley–Taylor, Hargreaves, and Ritchie, for estimating crop evapotranspiration (ETc) of potato when weather or lysimeter data were not complete enough to apply the FAO method. The results confirmed that the SVM and ANFIS models could provide more accurate ETc estimates. Shrestha & Shukla (2015) used SVM and ANN models for predicting the crop coefficient (Kc) and ETc of bell pepper and watermelon using a lysimeter dataset. The SVM model was superior to the ANN, and its improved accuracy makes it useful for deriving Kc and ETc from available hydro-climatic data. Mehdizadeh et al. (2017) investigated the SVM for estimating ETo in Iran. The model inputs were selected based on the parameters used in 16 empirical equations. The SVM performed better than those empirical equations.
ET is a non-linear and complex process because its estimation requires a large number of meteorological variables, such as maximum air temperature, minimum air temperature, dew point temperature, relative humidity, solar radiation, and wind speed. Because selecting unsuitable variables can prevent the simulation model from reaching the optimal solution, the proper selection of input variables is a challenging and vital problem. There are several methods for reducing the number of input variables and selecting the effective ones. In this study, the gamma test (GT) was used as an advanced method for optimal selection of input variables. The GT estimates the minimum mean square error (MSE) that can be achieved when modeling unseen data with any continuous non-linear model. Remesan et al. (2008) evaluated the ability of the GT to select variables for building suitable non-linear models for estimating radiation, and determined the number of data points needed to build a reliable model with the M-test. In this research, the GT was used to select the best input combination from the climate variables that have the most effect on daily ETo.
The LSSVM model is a soft computing technique that has recently gained importance in different applications such as ETo estimation. Hybrid models for ETo estimation have become very popular, but the hybridization of input selection techniques with machine learning has not yet been explored. Consequently, in this research, a new model is developed to estimate daily ETo by hybridizing the LSSVM and GT models. The main objectives of this study are: (1) to examine the Blaney–Criddle and Hargreaves–Samani equations against the FAO-56 PM as the reference equation, using weather data from Zahedan synoptic station, located in an arid climate in Iran; (2) to select the best input combinations for the LSSVM, ANN, and ANFIS models using the GT method; (3) to investigate the capability of the LSSVM model with different kernels, including RBF, linear, and polynomial, for modeling ETo; (4) to evaluate the performance of the ANFIS and ANN models in predicting daily ETo in an arid area; and (5) to compare the performances of the LSSVM, ANFIS, ANN, and climate-based models.
MATERIALS AND METHODS
Model parameter selection using gamma test


The gamma test estimates what proportion of the variance of y is caused by the stochastic variable r and what proportion is caused by the unknown function f. The test is based on the distance between pairs of points: if two points are close together in input space, their corresponding outputs should also be close together in output space. Any remaining difference between the outputs is attributed to noise (Karimaldini et al. 2012).
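The idea can be illustrated with a minimal sketch of the standard near-neighbour form of the gamma test, written in Python rather than the MATLAB used in this study. It assumes the usual definitions: δ(p) is the mean squared distance to the pth nearest neighbour, γ(p) is half the mean squared difference of the corresponding outputs, and Γ and A are the intercept and gradient of the regression line of γ on δ. The function name `gamma_test`, the choice of ten neighbours, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def gamma_test(X, y, p_max=10):
    """Return (Gamma, gradient A, Vratio) for input vectors X and outputs y."""
    m = len(X)
    # For every point, keep its p_max nearest neighbours as (squared distance, index).
    neighbours = []
    for i in range(m):
        d = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(m) if j != i
        )
        neighbours.append(d[:p_max])
    deltas, gammas = [], []
    for p in range(p_max):
        # delta(p): mean squared distance to the p-th nearest neighbour.
        deltas.append(sum(neighbours[i][p][0] for i in range(m)) / m)
        # gamma(p): half the mean squared output difference to that neighbour.
        gammas.append(
            sum((y[neighbours[i][p][1]] - y[i]) ** 2 for i in range(m)) / (2 * m)
        )
    # Least-squares line gamma = Gamma + A * delta; the intercept estimates Var(r).
    d_mean = sum(deltas) / p_max
    g_mean = sum(gammas) / p_max
    sxx = sum((d - d_mean) ** 2 for d in deltas)
    sxy = sum((d - d_mean) * (g - g_mean) for d, g in zip(deltas, gammas))
    A = sxy / sxx
    Gamma = g_mean - A * d_mean
    y_mean = sum(y) / m
    var_y = sum((v - y_mean) ** 2 for v in y) / m
    return Gamma, A, Gamma / var_y


# Synthetic check (not station data): smooth function plus noise of variance 0.01.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(300)]
y = [math.sin(3 * a) + 0.5 * b + random.gauss(0, 0.1) for a, b in X]
Gamma, A, vratio = gamma_test(X, y)
print(Gamma, A, vratio)
```

On such data Γ should land near the noise variance, and Vratio (Γ divided by the output variance) indicates how much of the output variance no smooth model could explain. Comparing Γ across candidate input vectors is what ranks them.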






Least square support vector machine
The LSSVM was introduced by Suykens & Vandewalle (1999). The LSSVM formulation has the same constraints as the SVM model but is computationally more efficient. Training requires solving a set of linear equations instead of the quadratic programming problem of the classical SVM model (Khemchandani & Chandra 2009). The LSSVM model effectively reduces the complexity of the algorithm and uses all training data for solving the optimization problem (Suykens & Vandewalle 2000).
The procedure of the LSSVM regression algorithm is illustrated in Figure 1.
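The linear system behind LSSVM regression can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code, and it assumes the standard formulation in which the bias b and the Lagrange multipliers α solve a bordered system built from the kernel matrix K plus I/γ. The RBF kernel is taken as exp(−‖u − v‖²/σ²), although some texts divide by 2σ² instead. The function names and the synthetic one-dimensional data are hypothetical. The values γ = 22.78 and σ2 = 6.25 are borrowed from Table 4 only to have concrete numbers.

```python
import math


def rbf_kernel(u, v, sigma2):
    # K(u, v) = exp(-||u - v||^2 / sigma2); the scaling convention is an assumption.
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma2)


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def lssvm_fit(X, y, gamma, sigma2):
    """Return (alpha, b) from the LSSVM linear system for regression."""
    n = len(X)
    # The first row and column carry the equality constraint sum(alpha) = 0.
    M = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the squared-error penalty
        M.append(row)
    z = solve(M, [0.0] + list(y))
    return z[1:], z[0]


def lssvm_predict(X_train, alpha, b, x, sigma2):
    return sum(a * rbf_kernel(xi, x, sigma2) for a, xi in zip(alpha, X_train)) + b


# Synthetic one-dimensional example (not ETo data).
X = [[i / 10.0] for i in range(21)]
y = [math.sin(2 * x[0]) for x in X]
gamma, sigma2 = 22.78, 6.25
alpha, b = lssvm_fit(X, y, gamma, sigma2)
fitted = [lssvm_predict(X, alpha, b, x, sigma2) for x in X]
print(b, fitted[:3])
```

Because every training point receives a multiplier, the solution is not sparse. This is the sense in which the LSSVM uses all training data, and each training residual equals α divided by γ.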
Artificial neural networks
Adaptive neuro fuzzy inference system
In this model, the main training algorithm is error back-propagation. Using a gradient descent algorithm, error signals are propagated back towards the input layer and its nodes, and the model parameters are adapted (Riahi-Madvar et al. 2009). A Gaussian membership function was used for designing the ANFIS networks. The number of membership functions for each variable was determined through trial and error.
Empirical equations of ETo estimation
Evaluation criteria


Case study
A precise measurement of meteorological variables is an important issue in ETo studies. Thus, it is necessary to investigate the accuracy of meteorological data. In this paper, evaluation of meteorological data was conducted using recommendations in guidelines of FAO-56 PM (Allen et al. 1998) and ASCE reports (Allen 1996).
The FAO-56 PM method was proposed as the international standard method for estimating ETo. It is used for evaluating the results of other models when lysimeter measurements are not available (Terzi et al. 2006; Kişi & Öztürk 2007; Zanetti et al. 2007; Jain et al. 2008; Doğan 2009; Marti et al. 2010; Traore et al. 2010; Kisi 2013). Similarly, the FAO-56 PM model was used for evaluating the results of the LSSVM, ANN, and ANFIS models in this study.
Modeling framework
In this study, six climate variables measured by the Meteorological Organization of Iran, namely Tmin, Tmax, Tdew, RHmean, n, and U2, are used for training and testing the LSSVM, ANN, and ANFIS models. Combining these variables creates 63 different combinations of input variables, and 30 additional combinations are created by using solar radiation (Rs) instead of sunshine hours (n), for example, I1: ETo = f(Tmin), …, I63: ETo = f(Tmin, Tmax, RHmean, n, U2, Tdew).
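The count of 63 follows from taking every non-empty subset of the six variables (2^6 − 1). A short Python sketch makes this concrete. The enumeration order here is an assumption and need not match the authors' numbering of I1 to I63.

```python
from itertools import combinations

variables = ["Tmin", "Tmax", "Tdew", "RHmean", "n", "U2"]

# Every non-empty subset of the six variables is one candidate input vector.
input_vectors = [
    subset
    for size in range(1, len(variables) + 1)
    for subset in combinations(variables, size)
]

print(len(input_vectors))   # 2**6 - 1 = 63
print(input_vectors[0])     # ('Tmin',)
print(input_vectors[-1])    # all six variables
```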
In previous studies on ETo, a trial and error approach was used for input variable selection. Modeling all 93 combinations by trial and error is time-consuming. Moreover, the literature offers no practical guidance on which input vectors should be used to develop robust expert models for ETo prediction. To address this shortcoming, the GT technique was used in this study to determine the best input vector for developing non-linear models of ETo and to find the variables that most affect ETo estimation. The GT is thus combined with expert models to develop hybrid prediction models. MATLAB software was used for developing the computational programs and the learning and simulation algorithms.
In this study, three kernel functions, linear, radial basis function (RBF), and polynomial, were used for the LSSVM model. The model has a regularization parameter γ and, for the RBF kernel, a kernel parameter σ2, whose values must be determined during calibration to achieve the maximum performance of the LSSVM model. The γ parameter specifies the trade-off between minimizing the fitting error and the smoothness of the estimated function. These parameters have no predetermined values and therefore must be determined separately for each input combination. For this purpose, an exponential sequence of candidate values was used for each parameter. A grid search with ten-fold cross-validation was used to find the best pair of values. In addition, a large number of trials were run to determine the best parameters c and d of the polynomial kernel.
The models were trained with all candidate combinations of these parameters, and the combination that produced the smallest error was selected. The regularization and kernel parameters were supplied together with the training input matrix (combinations of meteorological variables) and the training output (ETo values calculated with the FAO-56 PM equation), and the bias values were then determined. Modeling was then performed using the parameters selected in the previous stage and the input matrix of the selected training data to predict the desired output values. In this way, the k-fold algorithm is used not only for parameter optimization of the LSSVM but also for training the expert models. The flowchart of the modeling methodology is shown in Figure 5.
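The tuning procedure described above can be sketched as a generic grid search with k-fold cross-validation. This Python sketch is an illustration under stated assumptions: the folds are contiguous blocks, the score is the mean validation RMSE, and the candidate values are a hypothetical exponential sequence (the sequences actually used are not given here). To stay self-contained, the demonstration tunes a single penalty `lam` of a toy one-parameter ridge model instead of γ and σ2 of an LSSVM. The `grid` dictionary accepts several parameters in the same way.

```python
from itertools import product


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


def cv_rmse(fit, predict, X, y, params, k=10):
    """Mean validation RMSE of one parameter setting over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train], **params)
        sq = [(predict(model, X[i]) - y[i]) ** 2 for i in fold]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / k


def grid_search(fit, predict, X, y, grid, k=10):
    """Return (best_params, best_score) over the Cartesian product of grid values."""
    names = list(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_rmse(fit, predict, X, y, params, k)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Toy model (hypothetical): ridge-penalised slope through the origin.
def fit_ridge(X, y, lam):
    return sum(a * b for a, b in zip(X, y)) / (sum(a * a for a in X) + lam)


def predict_ridge(w, x):
    return w * x


X = [i / 10.0 for i in range(1, 41)]
y = [2.0 * x for x in X]                                # noise-free synthetic data
grid = {"lam": [2.0 ** e for e in range(-5, 6, 2)]}     # hypothetical exponential grid
best_params, best_score = grid_search(fit_ridge, predict_ridge, X, y, grid)
print(best_params, best_score)
```

On noise-free data the smallest penalty wins, as expected. With real data the selected values depend on the noise level and on how the folds are drawn.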
RESULTS AND DISCUSSION
The whole dataset was divided into two parts, with 75% selected for training and 25% for testing. The data from 1982–1997 were used for training the models and the remaining data (1998–2003) were used to test them.
Empirical equations result
To evaluate the performance of the climate-based models, the ETo values computed with the Blaney–Criddle and Hargreaves–Samani equations are compared with those of the FAO-56 PM model. The results of the statistical analysis of these empirical equations against the FAO-56 PM model are given in Table 1. Based on the results, the ETo predicted by Blaney–Criddle (ETo(BC)) matches the ETo calculated by the FAO-56 PM model better, with lower errors (RMSE = 2.03 mm/day and MAE = 1.78 mm/day), than that of the Hargreaves–Samani (ETo(HS)) model. Thus, the estimation error of the Hargreaves–Samani equation was higher than that of the Blaney–Criddle equation at this station.
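As an illustration of this kind of comparison, the sketch below computes ETo with the Hargreaves–Samani equation in its commonly cited FAO-56 form, ETo = 0.0023 · Ra · (Tmean + 17.8) · (Tmax − Tmin)^0.5, with Ra expressed as water equivalent in mm/day, and scores estimates with RMSE and MAE. Whether this exact parameterization was used in the study is an assumption, and all input numbers are made up for the example rather than taken from Zahedan station.

```python
import math


def hargreaves_samani(tmin, tmax, ra):
    """ETo (mm/day) from daily Tmin, Tmax (deg C) and Ra (mm/day water equivalent)."""
    tmean = (tmax + tmin) / 2.0
    return 0.0023 * ra * (tmean + 17.8) * math.sqrt(tmax - tmin)


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Illustrative day (made-up values).
eto_hs = hargreaves_samani(tmin=10.0, tmax=26.0, ra=12.0)
print(round(eto_hs, 3))

# Illustrative reference and estimate series (made-up values).
reference = [1.0, 2.0, 3.0]
estimate = [1.0, 2.0, 5.0]
print(rmse(reference, estimate), mae(reference, estimate))
```

RMSE is never smaller than MAE for the same pair of series, which gives a quick sanity check when reading error tables.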
Table 1. The performance statistics of empirical models
Phase | Performance statistic | Blaney–Criddle | Hargreaves–Samani |
---|---|---|---|
Calibration | RMSE | 2.03 | 2.58 |
Calibration | MAE | 1.78 | 1.79 |
Calibration | Equation | ![]() | ![]() |
Test | RMSE | 0.49 | 1.23 |
Test | MAE | 0.38 | 1.02 |
Valipour et al. (2017) showed that the Blaney–Criddle equation is the best model for estimating ETo relative to FAO-56 PM in arid regions. The statistics in Table 1 clearly show that the accuracy of the Blaney–Criddle and Hargreaves–Samani equations increases after calibration against the FAO-56 PM equation. The ETo values computed by the empirical equations and their calibration equations are given in Figure 6. Figure 7 compares the daily ETo values estimated by the empirical models with those obtained from the FAO-56 PM model. As seen in Figure 7, both empirical models tend to underestimate the ETo(PM) values in the arid climate of Zahedan. This result is consistent with the findings of many researchers in arid areas, such as Mohawesh (2010), Sabziparvar & Tabari (2010), Raziei & Pereira (2013), and Ngongondo et al. (2013).
The estimated ETo values with calibrated Hargreaves–Samani (top) and Blaney–Criddle (bottom) equations in the testing phase.
GT input selection results
The Spearman correlation matrix between the input variables and the output is given in Table 2. The results showed that ETo(PM) was strongly correlated with minimum, maximum, and dew point air temperature and with solar radiation. When plants are exposed to hot air, water is released from their open stomata and transpiration increases (Crawford et al. 2012). ETo has a very strong correlation with the radiation parameters at Zahedan station. Mean relative humidity was strongly and negatively correlated with ETo(PM), which demonstrates that relative humidity has an inverse relationship with ETo. Notably, the maximum temperature is negatively correlated with RHmean: as the maximum temperature increases, the mean relative humidity decreases. This is expected because at higher temperatures more water is lost from the Earth's surface and from plant cells to the atmosphere owing to the low humidity of the atmosphere (Edoga & Suzzy 2008). Thus, the rate of ETo is low when the air is cool, cloudy, and humid, and high in hot, sunny, and dry conditions. The effect of RHmean on ETo(PM) was greater than that of RHmin and RHmax. Wind speed was also positively but only moderately correlated with daily ETo. This may be due to the non-linear effect of wind speed on daily ETo and the complex nature of the aerodynamic effects on ET (Vanderlinden et al. 2004).
Table 2. Correlation matrix between input and output variables
Parameter | Tmin | Tmax | RHmin | RHmax | Tdew | RHmean | U2 | Rs | n | ETo(PM) |
---|---|---|---|---|---|---|---|---|---|---|
Tmin | 1 | 0.77 | 0.25 | 0.45 | 0.99 | −0.2 | 0.33 | 0.48 | 0.18 | 0.74 |
Tmax | 0.77 | 1 | −0.16 | 0.27 | 0.75 | −0.51 | 0.01 | 0.69 | 0.48 | 0.84 |
RHmin | 0.25 | −0.16 | 1 | 0.8 | 0.28 | 0.34 | 0.17 | −0.17 | −0.32 | −0.08 |
RHmax | 0.45 | 0.27 | 0.8 | 1 | 0.49 | 0.14 | 0.02 | 0.1 | −0.1 | 0.21 |
Tdew | 0.99 | 0.75 | 0.28 | 0.49 | 1 | −0.16 | 0.33 | 0.45 | 0.15 | 0.72 |
RHmean | −0.2 | −0.51 | 0.34 | 0.14 | −0.16 | 1 | −0.04 | −0.7 | −0.63 | −0.62 |
U2 | 0.33 | 0.01 | 0.17 | 0.02 | 0.33 | −0.04 | 1 | 0.05 | −0.07 | 0.41 |
Rs | 0.48 | 0.69 | −0.17 | 0.1 | 0.45 | −0.7 | 0.05 | 1 | 0.88 | 0.83 |
n | 0.18 | 0.48 | −0.32 | −0.1 | 0.15 | −0.63 | −0.07 | 0.88 | 1 | 0.6 |
ETo(PM) | 0.74 | 0.84 | −0.08 | 0.21 | 0.72 | −0.62 | 0.41 | 0.83 | 0.6 | 1 |
Therefore, the correlation results indicate that the radiation parameters and air temperature are the most important factors influencing daily ETo at Zahedan station. In addition, the high correlation within groups of climate variables, such as (1) Tmin, Tmax, and Tdew and (2) Rs and n, shows that a combination of these parameters can offer a good estimate of ETo.
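For readers who want to reproduce a rank correlation of this kind, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks, the usual definition when ties are present. The helper names and the small input lists are illustrative assumptions; no station data are used.

```python
def ranks(values):
    """Average ranks (1-based), sharing the mean rank among tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(a), ranks(b))


# A monotonic but non-linear relation still gives a rank correlation of 1.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
print(ranks([10, 20, 20, 30]))
```

Because it works on ranks, the coefficient captures monotonic relations but can still understate dependencies that are non-monotonic, which is one reason a model-free test such as the GT can rank variables differently.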
In this study, the best combinations of the input dataset were determined with the GT to assess their influence on ETo modeling in an arid area. All 93 combinations of input variables were examined, and the best combinations were identified as those with the lowest gamma values. The combinations with the smallest gamma values are given in Table 3. These input combinations were evaluated with the LSSVM, ANFIS, and ANN models in the present study. In Table 3, the input vectors I11 and I12 include the same climate inputs required for the Blaney–Criddle and Hargreaves–Samani equations, respectively. As shown, the I1 input vector, with the parameters Tmax, Tdew, RHmean, Rs, and U2, had the lowest values of gamma and Vratio, although its results were close to those of the I2, …, I7 input vectors. The input vectors I11 and I12 had higher values of gamma and Vratio than the other input vectors. The Vratio, gradient, and standard error of Γ represent the accuracy and complexity of the model that should be developed (Karimaldini et al. 2012).
Table 3. The gamma test results for the selected input combinations
Input vectors | Γ | Vratio | Gradient | Standard error |
---|---|---|---|---|
(I1): Tmax, Tdew, RHmean, Rs, and U2 | 1.42 × 10−3 | 5.69 × 10−3 | 0.08 | 1.09 × 10−4 |
(I2): Tmin, Tmax, Tdew, Rs, and U2 | 1.43 × 10−3 | 5.73 × 10−3 | 0.08 | 1.18 × 10−4 |
(I3): Tmin, Tmax, RHmean, Rs, and U2 | 1.55 × 10−3 | 6.19 × 10−3 | 0.07 | 0.99 × 10−4 |
(I4): Tmin, Tmax, Tdew, RHmean, Rs, and U2 | 1.56 × 10−3 | 6.34 × 10−3 | 0.1 | 1.04 × 10−4 |
(I5): Tmax, Tdew, Rs, and U2 | 1.73 × 10−3 | 6.93 × 10−3 | 0.11 | 0.7 × 10−4 |
(I6): Tmax, Tmin, Rs, and U2 | 1.9 × 10−3 | 7.6 × 10−3 | 0.11 | 0.7 × 10−4 |
(I7): Tmax, RHmean, Rs, and U2 | 2.18 × 10−3 | 8.72 × 10−3 | 0.09 | 0.55 × 10−4 |
(I8): Tmax, Rs, and U2 | 3.03 × 10−3 | 12.12 × 10−3 | 0.13 | 0.65 × 10−4 |
(I9): Tmin, Tmax, RHmean, U2, and n | 7.52 × 10−3 | 30.1 × 10−3 | 0.08 | 2.93 × 10−4 |
(I10): Tmax, Tdew, RHmean, U2, and n | 7.45 × 10−3 | 29.8 × 10−3 | 0.08 | 2.49 × 10−4 |
(I11): Tmin, Tmax, Ra, and n | 26.15 × 10−3 | 104.62 × 10−3 | 0.77 | 4.33 × 10−4 |
(I12): Tmin, Tmax, and Ra | 26.34 × 10−3 | 105.35 × 10−3 | 0.73 | 5.07 × 10−4 |
The GT results confirmed that the RHmean variable was more effective than the RHmin and RHmax parameters for estimating daily ETo at Zahedan station in an arid environment. Although wind speed had only a moderate correlation with ETo in Table 2, Table 3 shows that it is the most effective variable after temperature (without elimination of any RH and Rs variables) and should be included in the input vectors of the models. Hence, according to the GT, the best results are attained when wind speed and temperature variables are both included in the input vectors. The difference between the results of Tables 2 and 3 may be due to the strongly non-linear relations among the climate variables. When the solar radiation variable was eliminated from the input vectors, the best GT results were obtained with combinations that include the sunshine hours variable. This is because of the strong relation between solar radiation and sunshine hours, with a correlation coefficient of 0.88 (Table 2).
LSSVM, ANN, ANFIS hybrid with GT model results
The LSSVM model with the RBF, polynomial, and linear kernels was implemented in three separate MATLAB programs for predicting daily ETo. The ANFIS and ANN models were developed using the same training and test datasets as the LSSVM model. The input vectors selected as the best combinations from the gamma test results (I1, I2, …, I10), together with two input vectors containing the same variables as the Blaney–Criddle and Hargreaves–Samani equations (I11 and I12), were run through these codes, and the proper model structure was then determined for each input vector. The performance criteria of all models for estimating ETo in the training and testing phases are summarized in Table 4. The results show that the LSSVM model with the RBF kernel (RBF-LSSVM) consistently outperformed the LSSVM with the polynomial and linear kernels. The regularization parameter (γ = 22.78) and the RBF kernel parameter (σ2 = 6.25) were determined by ten-fold cross-validation. The RBF is a robust kernel that is well suited to limiting the computational cost of training and improving the generalization of the estimates. The RBF kernel can be used successfully with the LSSVM model for estimating climate-related processes (such as evapotranspiration) that are naturally non-linear (Mirzavand et al. 2015).
Table 4. The performance of the three applied models for estimating ETo
Input vector | Model | Training R2 | Training RMSE | Training MAE | Testing R2 | Testing RMSE | Testing MAE | Structure |
---|---|---|---|---|---|---|---|---|
I1 | RBF-LSSVM1 | 0.99 | 0.12 | 0.09 | 0.99 | 0.13 | 0.10 | γ = 22.78, σ2 = 6.25 |
I1 | Polynomial-LSSVM1 | 0.99 | 0.15 | 0.11 | 0.99 | 0.14 | 0.11 | γ = 8.06 |
I1 | Linear-LSSVM1 | 0.95 | 1.11 | 1.20 | 0.95 | 1.14 | 1.20 | γ = 8 |
I1 | ANFIS1 | 0.99 | 0.49 | 0.52 | 0.99 | 0.26 | 0.30 | NMF = 10 |
I1 | ANN1 | 0.99 | 0.14 | 0.17 | 0.99 | 0.10 | 0.13 | NNHL = 10 |
I2 | RBF-LSSVM2 | 0.99 | 0.12 | 0.09 | 0.99 | 0.13 | 0.10 | γ = 22.78, σ2 = 6.25 |
I2 | Polynomial-LSSVM2 | 0.99 | 0.15 | 0.11 | 0.99 | 0.15 | 0.12 | γ = 8.06 |
I2 | Linear-LSSVM2 | 0.95 | 1.08 | 1.17 | 0.95 | 1.14 | 1.19 | γ = 8 |
I2 | ANFIS2 | 0.99 | 0.24 | 0.29 | 0.99 | 0.38 | 0.42 | NMF = 10 |
I2 | ANN2 | 0.99 | 0.09 | 0.13 | 0.99 | 0.10 | 0.13 | NNHL = 12 |
I3 | RBF-LSSVM3 | 0.99 | 0.12 | 0.09 | 0.99 | 0.13 | 0.10 | γ = 22.78, σ2 = 6.25 |
I3 | Polynomial-LSSVM3 | 0.99 | 0.15 | 0.11 | 0.99 | 0.14 | 0.11 | γ = 8.06 |
I3 | Linear-LSSVM3 | 0.95 | 1.13 | 1.22 | 0.95 | 1.14 | 1.19 | γ = 8 |
I3 | ANFIS3 | 0.99 | 0.48 | 0.52 | 0.99 | 0.23 | 0.27 | NMF = 10 |
I3 | ANN3 | 0.99 | 0.20 | 0.23 | 0.99 | 0.14 | 0.16 | NNHL = 5 |
I4 | RBF-LSSVM4 | 0.99 | 0.12 | 0.08 | 0.99 | 0.13 | 0.10 | γ = 22.78, σ2 = 6.25 |
I4 | Polynomial-LSSVM4 | 0.99 | 0.15 | 0.11 | 0.99 | 0.14 | 0.11 | γ = 8.06 |
I4 | Linear-LSSVM4 | 0.95 | 1.11 | 1.20 | 0.95 | 1.14 | 1.20 | γ = 0.09 |
I4 | ANFIS4 | 0.98 | 0.64 | 0.69 | 0.99 | 0.34 | 0.37 | NMF = 10 |
I4 | ANN4 | 0.99 | 0.12 | 0.15 | 0.99 | 0.09 | 0.12 | NNHL = 14 |
I5 | RBF-LSSVM5 | 0.99 | 0.13 | 0.09 | 0.99 | 0.13 | 0.10 | γ = 22.78, σ2 = 6.25 |
I5 | Polynomial-LSSVM5 | 0.99 | 0.16 | 0.12 | 0.99 | 0.15 | 0.12 | γ = 8.06 |
I5 | Linear-LSSVM5 | 0.95 | 1.07 | 1.17 | 0.95 | 1.14 | 1.20 | γ = 8 |
I5 | ANFIS5 | 0.99 | 0.44 | 0.47 | 0.99 | 0.43 | 0.46 | NMF = 10 |
I5 | ANN5 | 0.99 | 0.12 | 0.16 | 0.99 | 0.12 | 0.15 | NNHL = 7 |
I6 | RBF-LSSVM6 | 0.99 | 0.14 | 0.09 | 0.99 | 0.14 | 0.10 | γ = 22.78, σ2 = 6.25 |
I6 | Polynomial-LSSVM6 | 0.99 | 0.16 | 0.12 | 0.99 | 0.16 | 0.12 | γ = 8.06 |
I6 | Linear-LSSVM6 | 0.95 | 1.10 | 1.19 | 0.95 | 1.13 | 1.19 | γ = 8 |
I6 | ANFIS6 | 0.99 | 0.19 | 0.25 | 0.99 | 0.17 | 0.21 | NMF = 10 |
I6 | ANN6 | 0.99 | 0.11 | 0.15 | 0.99 | 0.13 | 0.16 | NNHL = 12 |
I7 | RBF-LSSVM7 | 0.99 | 0.16 | 0.11 | 0.99 | 0.17 | 0.13 | γ = 22.78, σ2 = 6.25 |
I7 | Polynomial-LSSVM7 | 0.99 | 0.19 | 0.14 | 0.99 | 0.18 | 0.14 | γ = 1.01 |
I7 | Linear-LSSVM7 | 0.95 | 1.10 | 1.20 | 0.95 | 1.20 | 1.26 | γ = 8 |
I7 | ANFIS7 | 0.99 | 0.31 | 0.36 | 0.99 | 0.21 | 0.26 | NMF = 11 |
I7 | ANN7 | 0.99 | 0.27 | 0.31 | 0.99 | 0.18 | 0.21 | NNHL = 15 |
I8 | RBF-LSSVM8 | 0.99 | 0.18 | 0.13 | 0.99 | 0.19 | 0.14 | γ = 22.78, σ2 = 6.25 |
I8 | Polynomial-LSSVM8 | 0.98 | 0.21 | 0.15 | 0.99 | 0.20 | 0.14 | γ = 1.01 |
I8 | Linear-LSSVM8 | 0.94 | 1.14 | 1.24 | 0.95 | 1.22 | 1.28 | γ = 8 |
I8 | ANFIS8 | 0.99 | 0.22 | 0.27 | 0.99 | 0.20 | 0.25 | NMF = 10 |
I8 | ANN8 | 0.99 | 0.15 | 0.20 | 0.99 | 0.25 | 0.29 | NNHL = 9 |
I9 | RBF-LSSVM9 | 0.98 | 0.20 | 0.13 | 0.98 | 0.21 | 0.14 | γ = 22.78, σ2 = 6.25 |
I9 | Polynomial-LSSVM9 | 0.98 | 0.23 | 0.15 | 0.98 | 0.23 | 0.17 | γ = 1.01 |
I9 | Linear-LSSVM9 | 0.94 | 1.08 | 1.16 | 0.95 | 0.97 | 1.07 | γ = 8 |
I9 | ANFIS9 | 0.97 | 0.28 | 0.35 | 0.98 | 0.19 | 0.28 | NMF = 10 |
I9 | ANN9 | 0.98 | 0.19 | 0.28 | 0.98 | 0.16 | 0.23 | NNHL = 11 |
I10 | RBF-LSSVM10 | 0.98 | 0.20 | 0.13 | 0.98 | 0.21 | 0.14 | γ = 22.78, σ2 = 6.25 |
I10 | Polynomial-LSSVM10 | 0.98 | 0.22 | 0.15 | 0.98 | 0.23 | 0.16 | γ = 1.01 |
I10 | Linear-LSSVM10 | 0.94 | 1.12 | 1.19 | 0.95 | 0.95 | 1.05 | γ = 8 |
I10 | ANFIS10 | 0.97 | 0.29 | 0.36 | 0.98 | 0.19 | 0.29 | NMF = 10 |
I10 | ANN10 | 0.98 | 0.25 | 0.32 | 0.98 | 0.14 | 0.23 | NNHL = 9 |
I11 | RBF-LSSVM11 | 0.87 | 0.58 | 0.44 | 0.89 | 0.63 | 0.49 | γ = 22.78, σ2 = 6.25 |
I11 | Polynomial-LSSVM11 | 0.86 | 0.60 | 0.47 | 0.89 | 0.63 | 0.50 | γ = 1.01 |
I11 | Linear-LSSVM11 | 0.85 | 1.12 | 1.31 | 0.89 | 1.21 | 1.39 | γ = 0.008 |
I11 | ANFIS11 | 0.87 | 0.97 | 1.17 | 0.89 | 1.17 | 1.39 | NMF = 11 |
I11 | ANN11 | 0.85 | 0.90 | 1.12 | 0.89 | 0.97 | 1.19 | NNHL = 1 |
I12 | RBF-LSSVM12 | 0.86 | 0.58 | 0.45 | 0.89 | 0.63 | 0.49 | γ = 22.78, σ2 = 6.25 |
I12 | Polynomial-LSSVM12 | 0.85 | 0.60 | 0.47 | 0.89 | 0.63 | 0.50 | γ = 1.01 |
I12 | Linear-LSSVM12 | 0.84 | 1.21 | 1.40 | 0.88 | 1.29 | 1.46 | γ = 0.02 |
I12 | ANFIS12 | 0.86 | 0.94 | 1.14 | 0.89 | 1.15 | 1.36 | NMF = 10 |
I12 | ANN12 | 0.85 | 0.85 | 1.07 | 0.89 | 0.90 | 1.10 | NNHL = 10 |
NMF, number of membership functions; NNHL, number of neurons in the hidden layer.
As shown in Table 4, the RBF-LSSVM4 model, with maximum, minimum, and dew point air temperature, mean relative humidity, solar radiation, and wind speed as inputs, had the best performance among the LSSVM models in the testing phase (RMSE = 0.13 mm day−1, MAE = 0.10 mm day−1, and R2 = 0.99). The RBF-LSSVM1, RBF-LSSVM2, RBF-LSSVM3, and RBF-LSSVM5 models gave the same testing-phase results as RBF-LSSVM4. Among the ANFIS models, ANFIS6, with maximum and minimum air temperature, solar radiation, and wind speed as inputs, had the best accuracy (RMSE = 0.17 mm day−1, MAE = 0.21 mm day−1, and R2 = 0.99). It was followed by ANFIS8 (RMSE = 0.2 mm day−1, MAE = 0.25 mm day−1, and R2 = 0.99), which uses only three climate variables: maximum air temperature, solar radiation, and wind speed. Among the ANN models, ANN4 performed best for daily ETo estimation (RMSE = 0.09 mm day−1, MAE = 0.12 mm day−1, and R2 = 0.99), followed by ANN2 (RMSE = 0.1 mm day−1, MAE = 0.13 mm day−1, and R2 = 0.99). Overall, RBF-LSSVM4 showed the best performance across input vectors and models in the training and testing phases, although the differences between the RBF-LSSVM4 and ANN4 results were small. Therefore, the best estimation of ETo is achieved when all measured climate variables are used. The daily ETo estimates from the RBF-LSSVM4, ANFIS4, and ANN4 models in the testing phase are shown as a line graph in Figure 8. The ETo estimates of the RBF-LSSVM4 model closely follow the corresponding ETo(PM) values. The ANFIS4 and ANN4 models underestimate some ETo values, but overall the differences between the models in estimating ETo(PM) from the I4 input vector were small.
The estimated ETo values by RBF-LSSVM4, ANFIS4, and ANN4 models in the testing phase.
Comparisons of the ETo values estimated by the RBF-LSSVM, ANFIS, and ANN models with the ETo computed by FAO-56 PM are given in Figures 9–11. Based on the performance statistics (RMSE, MAE, and R2), the RBF-LSSVM mostly estimated ETo more accurately than the other models. All of the RBF-LSSVM models overestimated the daily FAO-56 PM ETo; a similar result was reported by Tabari et al. (2012) for a semi-arid area of Iran. The scatter plots show that four models, RBF-LSSVM1, RBF-LSSVM2, RBF-LSSVM3, and RBF-LSSVM4 (each with RMSE = 0.13 mm day−1, MAE = 0.10 mm day−1, and R2 = 0.99 in the testing phase, Table 4), were better than the other models for ETo estimation. The ETo estimates from these input vectors are closer to the corresponding FAO-56 PM values than those of the other models and follow the same trend. Thus, eliminating the relative humidity variable from the input vectors does not decrease the precision of daily ETo estimation or the performance of the models.
The estimated ETo values by RBF-LSSVM model for all selected input vectors from gamma test.
The estimated ETo values by ANN model for all selected input vectors from gamma test.
The estimated ETo values by ANFIS model for all selected input vectors from gamma test.
Although the best result was obtained from RBF-LSSVM4, which uses all the climate variables required by FAO-56 PM, performance remained good when mean relative humidity and minimum temperature were eliminated, as in the RBF-LSSVM6 and RBF-LSSVM8 models. Thus, if not all climate variables are available, daily ETo in arid areas can still be estimated with high accuracy from the three parameters Tmax, Rs, and U2.
Based on the results, the ANN model was more accurate than the ANFIS model in both the training and testing phases for all input vectors except I8. The results of the ANN, ANFIS, polynomial-LSSVM, and linear-LSSVM models were also in good agreement with ETo(PM), and the performance of these models was similar. Therefore, the selection of one model over the others depends on the available climatic data. The linear-LSSVM12 model (with the same input climate variables as Hargreaves–Samani) had the highest error (RMSE = 1.46 mm day−1 and MAE = 1.29 mm day−1) and the lowest R2 (0.88), yet its performance was still acceptable. Overall, the RBF-LSSVM model showed superior performance over the ANN and ANFIS models for predicting daily ETo. The RBF-LSSVM model therefore appears well suited to modeling non-linear processes such as evapotranspiration. These results are in line with the studies of Ramedani et al. (2014), Kisi (2013), Okkan & Serbes (2012), and Deng et al. (2011). It is worth noting that, with a suitable choice of kernel, a dataset that is non-separable in the original input space can become separable in the feature space, which yields non-linear algorithms (Bray & Han 2004).
Based on their overall accuracy, the applied models can be ranked in the order RBF-LSSVM, ANN, polynomial-LSSVM, ANFIS, and linear-LSSVM. Comparison of Tables 2 and 4 indicates that the LSSVM, ANFIS, and ANN models given the same input variables as the empirical equations (I11 and I12) performed better than the corresponding calibrated Blaney–Criddle and Hargreaves–Samani equations. The Hargreaves–Samani equation had the worst accuracy for daily ETo estimation at Zahedan station.
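The three statistics used to rank the models in Table 4 can be computed as in the minimal Python sketch below. The function names and the sample values are illustrative assumptions, not data from this study, and R2 is taken here as the squared Pearson correlation, which is one common definition and may differ from the one used to build the table.

```python
import numpy as np

def rmse(observed, estimated):
    """Root mean square error, in the units of the inputs (e.g. mm/day)."""
    observed, estimated = np.asarray(observed, float), np.asarray(estimated, float)
    return float(np.sqrt(np.mean((estimated - observed) ** 2)))

def mae(observed, estimated):
    """Mean absolute error, in the units of the inputs."""
    observed, estimated = np.asarray(observed, float), np.asarray(estimated, float)
    return float(np.mean(np.abs(estimated - observed)))

def r_squared(observed, estimated):
    """Squared Pearson correlation (assumed definition of R2)."""
    r = np.corrcoef(observed, estimated)[0, 1]
    return float(r ** 2)

# Hypothetical daily ETo values (mm/day): reference vs. model estimate.
eto_pm = np.array([4.0, 6.5, 8.2, 9.1, 5.3])
eto_model = np.array([4.1, 6.3, 8.4, 9.0, 5.6])

print("RMSE:", rmse(eto_pm, eto_model))
print("MAE: ", mae(eto_pm, eto_model))
print("R2:  ", r_squared(eto_pm, eto_model))
```

For any set of errors, MAE cannot exceed RMSE, so comparing the two is a quick sanity check when reading or transcribing tabulated values.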
Error distribution analysis
In addition to the average estimation error, the distribution of the estimation error is important for judging the applicability of a model to daily ETo prediction. The error distributions in the training and testing phases at different threshold levels for the RBF-LSSVM4, ANFIS4, and ANN4 models are shown in Figure 12. Figure 12(a) shows that, in the training phase, about 95% of the values predicted by the RBF-LSSVM4, ANN4, and ANFIS4 models had relative absolute errors below 9.2%, 11.1%, and 64.5%, respectively. Figure 12(b) shows that, in the testing phase, about 95% of the values estimated by the RBF-LSSVM4, ANN4, and ANFIS4 models had relative absolute errors below 8.4%, 9.4%, and 24%, respectively. Thus, from the error distribution viewpoint, the AARE statistic shows the potential of the RBF-LSSVM4 and ANN4 models in comparison with the ANFIS4 model for estimating ETo. The RBF-LSSVM model was more reliable for simulating daily ETo than the ANN and ANFIS models. Therefore, the RBF-LSSVM model can be used effectively to improve ETo estimates and, in turn, irrigation water requirement predictions.
Error distributions of (a) training and (b) testing phases of three models for estimating ETo at Zahedan station.
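The error-distribution analysis can be reproduced along the following lines. This is a minimal sketch that assumes the usual definitions: the relative absolute error is 100·|estimate − reference|/|reference|, the threshold statistic at level x is the percentage of predictions with a relative absolute error no larger than x%, and AARE is the mean of the relative absolute errors. The function names and sample values are hypothetical.

```python
import numpy as np

def relative_abs_error(observed, estimated):
    """Relative absolute error of each prediction, in percent."""
    observed, estimated = np.asarray(observed, float), np.asarray(estimated, float)
    return 100.0 * np.abs(estimated - observed) / np.abs(observed)

def threshold_statistic(observed, estimated, level):
    """Percentage of predictions whose relative absolute error is <= level (%)."""
    rae = relative_abs_error(observed, estimated)
    return float(100.0 * np.mean(rae <= level))

def aare(observed, estimated):
    """Average absolute relative error, in percent."""
    return float(np.mean(relative_abs_error(observed, estimated)))

def error_at_coverage(observed, estimated, coverage=95.0):
    """Relative absolute error (%) below which `coverage` percent of predictions fall."""
    return float(np.percentile(relative_abs_error(observed, estimated), coverage))

# Hypothetical reference and estimated ETo values (mm/day).
eto_ref = np.array([5.0, 5.0, 5.0, 5.0, 10.0])
eto_est = np.array([5.1, 4.8, 5.5, 5.0, 9.0])

print("TS at 5%: ", threshold_statistic(eto_ref, eto_est, 5.0))
print("TS at 20%:", threshold_statistic(eto_ref, eto_est, 20.0))
print("AARE:     ", aare(eto_ref, eto_est))
print("95% bound:", error_at_coverage(eto_ref, eto_est, 95.0))
```

Reading a figure such as "95% of predictions within 8.4%" corresponds to `error_at_coverage(..., 95.0)` returning about 8.4 for that model.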
Considering the results of all the predictive models in this paper, it can be concluded that the LSSVM, ANFIS, and ANN models can all be employed successfully for estimating ETo. ANNs are universal approximators that are useful for finding irregularities within a set of patterns. They provide an analytical alternative to conventional approaches when the diversity of the dataset is very large or when relationships between variables are difficult to explain adequately with conventional techniques (Chang et al. 2004; Riahi-Madvar et al. 2011). The major capability of the ANFIS model is that it combines the power of fuzzy logic systems with the numerical power of ANNs in various modeling processes using several rules (Sayed et al. 2003). The LSSVM is highly flexible in modeling non-linear relationships and is able to remove non-support vectors from the model, which leads to fast training and low computational cost. The LSSVM is thus able to choose the main support vectors in the training process. This allows the model to avoid over-fitting and leads to better generalization than the ANFIS and ANN models (Zhou et al. 2009). In addition, the LSSVM always finds a global minimum because it solves a convex optimization problem (Quej et al. 2017). Hence, the LSSVM model appears to have a robust theoretical background that makes it more dependable than the ANN and ANFIS models (Noori et al. 2015).
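The convexity argument can be made concrete: for regression, the KKT conditions of the LSSVM reduce to one linear system in the bias b and the multipliers α, so training amounts to a single solve with a unique solution. The sketch below illustrates this with NumPy under stated assumptions: the RBF kernel is taken as exp(−‖x − z‖²/σ2), the data are synthetic, and the function names are hypothetical. The γ and σ2 values are those listed for the RBF-LSSVM models in Table 4, used here only for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix, assuming the form exp(-||a - b||^2 / sigma2)."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LSSVM regression system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return (b, alpha)."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

# Synthetic stand-in for three climate inputs and a smooth target (not station data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(80, 3))
y = 0.3 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * X[:, 2]

gamma, sigma2 = 22.78, 6.25
b, alpha = lssvm_fit(X, y, gamma, sigma2)
train_pred = lssvm_predict(X, b, alpha, X, sigma2)
print("training RMSE:", float(np.sqrt(np.mean((train_pred - y) ** 2))))
```

In this formulation the regularization constant γ trades training error against smoothness, while σ2 sets the kernel width; both would normally be tuned, for example by cross-validation.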
CONCLUSIONS
In this study, the potential of the LSSVM model for estimating FAO-56 PM reference evapotranspiration was evaluated in comparison with ANFIS and ANN models, each hybridized with the gamma test. The three models were trained and tested on the climatic dataset of the Zahedan synoptic station, Iran, to estimate ETo(PM). Their results were compared with those of the Blaney–Criddle and Hargreaves–Samani equations. The gamma test was used to select suitable combinations of input variables for the non-linear estimator models. It identified the main variables that affect ETo and provided the best input vectors, which are consistent with the physical basis of ETo processes.
The results of the gamma test showed that the four input vectors (1) Tmin, Tmax, Tdew, RHmean, Rs, U2; (2) Tmax, Tdew, RHmean, Rs, U2; (3) Tmin, Tmax, RHmean, Rs, U2; and (4) Tmin, Tmax, Tdew, Rs, U2 were better than the other input combinations for estimating ETo at Zahedan station. These input vectors had smaller gamma values than the other combinations. Based on the gamma test results, air temperature and wind speed are the main variables affecting ETo processes, since eliminating either of them increased the gamma value of the input vector. Thus, including these variables in the input vectors increases the accuracy of the models in estimating daily ETo.
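The way a gamma value rises when an informative variable is dropped can be illustrated with a small sketch of the gamma test. It assumes the standard near-neighbour formulation: for k = 1..p, the mean squared input distance to the k-th nearest neighbour, δ(k), is paired with half the mean squared output difference, γ(k), and the intercept of the regression line of γ on δ is the gamma statistic. The function name, the choice p = 10, and the synthetic data are assumptions, not the configuration used in this study.

```python
import numpy as np

def gamma_test(X, y, p=10):
    """Return (gamma_statistic, slope) from the near-neighbour regression."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between input vectors.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :p]
    rows = np.arange(n)
    delta = np.empty(p)
    gamma = np.empty(p)
    for k in range(p):
        nb = neighbours[:, k]               # index of the (k+1)-th nearest neighbour
        delta[k] = np.mean(d2[rows, nb])
        gamma[k] = 0.5 * np.mean((y[nb] - y) ** 2)
    slope, intercept = np.polyfit(delta, gamma, 1)
    return float(intercept), float(slope)

# Synthetic example: the output depends on both inputs plus noise of variance 0.01.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 3.0, size=(800, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=800)

gamma_full, _ = gamma_test(X, y)            # both inputs available
gamma_reduced, _ = gamma_test(X[:, :1], y)  # second input eliminated
print("gamma with both inputs:", gamma_full)
print("gamma with one input:  ", gamma_reduced)
```

The statistic estimates the variance of the output that cannot be explained by a smooth function of the chosen inputs, so the reduced input vector yields the larger value.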
The performance of the LSSVM model with the radial basis function (RBF) kernel and of the ANN model was superior to that of the polynomial-kernel LSSVM and ANFIS models. Comparison of the RBF-LSSVM, ANFIS, and ANN models with the empirical equations showed the superiority of these models for ETo prediction in arid areas. None of the empirical equations gave good results, and all showed considerable error in comparison with the RBF-LSSVM, ANFIS, and ANN models. For 95% of the values predicted by the RBF-LSSVM and ANN models, the threshold of relative absolute error was very low, about 8.4% and 9.4%, respectively, whereas it was high for the values predicted by ANFIS, about 24%. The RBF-LSSVM model had a 30% error for 100% of predicted values, while the ANN and ANFIS models had 39% and 117% errors, respectively. The performance of the RBF-LSSVM model was therefore superior to that of the ANN and ANFIS models. Since the RBF-LSSVM model worked well in an arid area, it is expected to work well in other climates.
Estimator models such as ANN, ANFIS, and LSSVM are not yet commonly applied to irrigation scheduling in Iran. The LSSVM model developed in this study will be useful for precisely calculating daily ETo and accurately quantifying crop water use at both farm and regional scales. Hence, it can be used to develop effective irrigation plans for the conservation of water resources.