Accurate estimation of evapotranspiration is vitally important for water resources management and environmental protection. This study investigated the accuracy of models integrating a genetic algorithm and a support vector machine (GA-SVM) for simulating daily reference evapotranspiration (ET0) from climatic variables. The developed GA-SVM models were tested against the ET0 calculated by the Penman–Monteith FAO-56 (PMF-56) equation in a semi-arid environment of Qilian Mountain, northwest China. Eight models were developed using different combinations of daily climatic data including maximum air temperature (Tmax), minimum air temperature (Tmin), wind speed (U2), relative humidity (RH), and solar radiation (Rs). The accuracy of the models was evaluated using root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (r). The results indicated that the ET0 estimated by the GA-SVM models agreed closely with that obtained by the PMF-56 equation in the semi-arid mountain environment. The model with the input combination of Tmin, Tmax, U2, RH, and Rs had the smallest RMSE and MAE as well as the highest r (0.995) among the models. The GA-SVM models also proved superior to the support vector machine (SVM) models and feed-forward artificial neural network models for simulating ET0.
INTRODUCTION
Evapotranspiration is the transfer of water from the Earth's surface to the atmosphere through evaporation and transpiration. It is a vital component of the hydrological cycle and of water balance computation, and it affects water resources management and planning (Traore et al. 2010; Shiri et al. 2013; Xing et al. 2016). Accurate estimation of evapotranspiration is crucial for water resource management, weather and climate studies, hydrology, reliable irrigation design, and determination of the water budget, especially in water-short regions such as arid and semi-arid areas, where water resources are under severe threat from overexploitation (Huo et al. 2013; Chatzithomas & Alexandris 2015; Shiri et al. 2015).
Evapotranspiration can be quantified either experimentally or mathematically. It can be measured directly with instruments; however, direct measurement is difficult, expensive, and time-consuming (Pour Ali Baba et al. 2013; Ma et al. 2015; Zhang et al. 2015). The mathematical approach has to be preceded by the estimation of reference evapotranspiration (ET0) (Xu et al. 2015). Many empirical and semi-empirical models have been developed to estimate ET0 from meteorological data. Among these methods, the Penman–Monteith FAO-56 (PMF-56) equation has been recommended by the Food and Agriculture Organization of the United Nations (FAO) as the standard equation for estimating ET0. The PMF-56 equation has become the benchmark against which all other ET0 models are compared. It is physically based and requires a large number of climatic variables, such as daily minimum and maximum temperature, relative humidity, solar radiation, and wind speed. However, such weather data are often incomplete or unavailable in many regions, which limits the effective use of the PMF-56 model (Kumar et al. 2002; Cobaner 2011).
ET0 is an open, nonlinear, complex, and dynamic phenomenon because it depends on the interaction of several climatic elements (Luo et al. 2014). Thus, it is difficult to derive a definite equation that expresses all the related physical processes. As an alternative to conventional approaches, the artificial neural network (ANN) is highly suitable for modeling nonlinear processes (He et al. 2014; Deo & Şahin 2015; Si et al. 2015; Wen et al. 2015a). Many researchers have applied ANNs for estimating ET0 (Traore et al. 2010; Marti et al. 2011; Laaboudi et al. 2012; Yassin et al. 2016). These studies indicated that the ANN models were superior in modeling ET0 to conventional methods such as Hargreaves, Priestley–Taylor, and some empirical and semi-empirical equations (Kumar et al. 2002; Landeras et al. 2008; Huo et al. 2012; Wen et al. 2015b). However, neural networks have some disadvantages, such as slow training, the need for a large amount of training data, and a tendency to get stuck in local minima (Principe et al. 2000). The support vector machine (SVM) is a learning machine based on statistical learning theory and the structural risk minimization principle, and it has been successfully applied to modeling nonlinear systems (Shiri et al. 2014; Feng et al. 2015). Given the same training conditions, SVM provides more dependable and better performance than ANN (Gill et al. 2006; Çimen & Kisi 2009; Yoon et al. 2011). Over the last decade, SVM models have been used in a very wide range of applications to solve hydrological problems (Chou et al. 2010; Tan et al. 2012; Kalra et al. 2013; Wen et al. 2015b). Recently, researchers began to employ SVM for ET0 modeling. Kişi & Cimen (2009) discussed the potential of SVM models in estimating ET0 in central California, USA. The results showed that the SVM model could be built in as a module for estimating ET0 values in a hydrological model.
Kişi (2013) tested the capacities of a least square support vector machine (LSSVM) for modeling ET0. It was found that LSSVM performed better than the ANN models and the empirical models in simulating ET0 processes. Lin et al. (2012) established an SVM-based model for estimating daily pan evaporation and compared it with an ANN-based model. They found that the SVM was superior to ANN in modeling evaporation. Tabari et al. (2012, 2013) studied the potential of SVM for estimating ET0 in a highland semi-arid environment in Iran. The results demonstrated that the SVM models achieved better ET0 estimations than the regression- and climate-based models. Wen et al. (2015b) evaluated the application of SVM to model daily ET0 with limited climatic data in an extremely arid region in northwest China. They concluded that the SVM could provide better performance than ANN, Priestley–Taylor, Hargreaves, and Ritchie. These studies indicated that SVM could be applied to estimate ET0, with relatively better performance than ANN in simulating the ET0 process. Despite these excellent features, the use of SVM in ET0 modeling is limited because users must define a number of parameters appropriately (Liu et al. 2011). The estimation accuracy and efficiency of the SVM depend on the hyper-parameters being set correctly. Selecting the most appropriate parameter values is therefore a critical problem in applying SVM, since it affects the performance of the model. The genetic algorithm (GA), a general-purpose optimization and search approach based on a direct analogy to Darwinian natural selection and genetics in biological systems, can be used to generate appropriate solutions to optimization and search problems. Thus, the GA could be used to select appropriate SVM parameters (Abdullah et al. 2015). In recent research, GA has been applied to optimize SVM parameters in different fields (Pourbasheer et al. 2009; Liu & Jiao 2011; Chen et al. 2016); however, applications of the GA-support vector machine to modeling ET0 are limited (Shiri et al. 2011; Tao et al. 2015; Liu et al. 2016). Therefore, an integrated genetic algorithm and support vector machine (GA-SVM) model, in which GA was used to optimize the parameters of the SVM, was applied to modeling the daily ET0 in this paper. Furthermore, few studies have been conducted under the climatic conditions of the semi-arid environment of Qilian Mountain.
The main purpose of this study was to investigate the accuracy of GA-SVM models for estimating daily ET0 with various combinations of daily meteorological data, including minimum air temperature (Tmin), maximum air temperature (Tmax), wind speed (U2), relative humidity (RH), and solar radiation (Rs), in the semi-arid environment of Qilian Mountain, northwest China. In addition, the conventional grid algorithm-based SVM model and the ANN model were investigated for comparison.
MATERIALS AND METHODS
SVM
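For orientation, a common textbook form of ε-insensitive support vector regression with a radial basis function (RBF) kernel is sketched below. The RBF kernel is an assumption made here because the models are tuned through the parameters C and γ; the symbols ε, ξ, α, and φ are standard notation introduced for illustration and are not necessarily the exact formulation used in this study.

```latex
% Primal problem: C trades off flatness against errors larger than epsilon
\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)
\quad \text{s.t.} \quad
\begin{cases}
y_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i \\
w^{\top}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ge 0
\end{cases}

% Resulting regression function and RBF kernel with width parameter gamma
f(x) = \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b,
\qquad
K(x_i, x) = \exp\!\left( -\gamma \lVert x_i - x \rVert^{2} \right)
```

In this form, C controls the penalty on estimation errors and γ controls the width of the kernel, which is why these two parameters are the natural targets for optimization.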
GA
GA is an adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics developed by Holland (1975). The procedure of GA simulates the processes of selection, crossover, and mutation to retain superior solutions and to generate progressively better offspring, moving the solutions toward the optimum of the objective function. Selection chooses excellent chromosomes in the population for reproduction (Gao & Hou 2016). The fitter the chromosome, the more likely it is to be selected. Crossover randomly chooses a locus between two chromosomes and exchanges the segments to create two offspring. Mutation randomly flips some bits in a chromosome. In this paper, the parameters C and γ of SVM were optimized by GA.
The construction of SVM model based on GA optimization parameters (GA-SVM) is described below.
Step 1. Encode the SVM parameters:
The SVM parameters C and γ are directly coded to form chromosomes.
Step 2. Generate a random initial population:
Randomly generate an initial population of chromosomes which represent the SVM parameters C and γ.
Step 3. Evaluate fitness:
Step 4. Genetic operators:
The real-valued GA uses selection, crossover, and mutation operators to generate the offspring of the existing population. Excellent chromosomes are selected from the population according to their fitness to yield offspring in the next generation. The roulette wheel selection principle is applied to choose chromosomes for reproduction. For crossover, single-point crossover at a randomly chosen locus is adopted to exchange genes between two chromosomes. The mutation operation follows the crossover operation and determines whether a chromosome should be mutated in the next generation.
Step 5. Stop condition:
If the stop condition is satisfied, the optimization will stop and return the best parameters C and γ. Otherwise, go back to step 3.
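The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `ga_optimize`, the population size, the rates, the search bounds, and the fixed number of generations as stop condition are all hypothetical, and `toy_fitness` is a stand-in for a real fitness measure (for example, one derived from the cross-validated error of an SVM trained with a given C and γ).

```python
import random

def ga_optimize(fitness, bounds, pop_size=20, generations=50,
                crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Real-valued GA with roulette wheel selection, single-point crossover, mutation.

    fitness: callable mapping a chromosome [C, gamma] to a positive score
             (higher is better). pop_size is assumed to be even.
    bounds:  list of (low, high) pairs, one per gene.
    """
    rng = random.Random(seed)
    # Steps 1-2: chromosomes directly encode (C, gamma); start from a random population.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Step 3: evaluate the fitness of every chromosome.
        scores = [fitness(ch) for ch in pop]
        # Step 4a: roulette wheel selection, probability proportional to fitness.
        parents = rng.choices(pop, weights=scores, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]  # copy so the parents are left untouched
            if rng.random() < crossover_rate:
                # Step 4b: single-point crossover at a random locus.
                cut = rng.randrange(1, len(bounds))
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        # Step 4c: mutation resamples a gene uniformly within its bounds.
        for ch in children:
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    ch[i] = rng.uniform(lo, hi)
        pop = children
        # Remember the best chromosome seen so far.
        best = max(pop + [best], key=fitness)
    # Step 5: stop condition (assumed here: a fixed number of generations).
    return best

def toy_fitness(ch):
    # Hypothetical surrogate that peaks at C = 10, gamma = 1.
    c, gamma = ch
    return 1.0 / (1.0 + (c - 10.0) ** 2 / 100.0 + (gamma - 1.0) ** 2)

best_c, best_gamma = ga_optimize(toy_fitness, bounds=[(0.1, 100.0), (0.01, 10.0)])
```

With only two genes, single-point crossover always cuts between C and γ, so the two offspring simply exchange their γ values.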
ANN
ANN is a massively parallel distributed information processing system that has certain performance characteristics resembling the biological neural networks of the human brain. A neural network is characterized by its architecture, which represents the pattern of connection between nodes, its method of determining the connection weights, and its activation function. The most commonly used neural network structure is the feed-forward hierarchical architecture. The customary ANN architecture is composed of three layers: an input layer, a hidden layer, and an output layer. Many theoretical and experimental works have shown that a single hidden layer is sufficient for ANNs to approximate any complex nonlinear function. The Levenberg–Marquardt training algorithm was used to train the ANN model in our research. The sigmoid and linear activation functions were used for the hidden and output node(s), respectively.
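The forward pass of such a three-layer network, with sigmoid hidden nodes and a linear output node, can be sketched as follows. The function names and the example weights are hypothetical, and training (Levenberg–Marquardt in this study) is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Single-hidden-layer feed-forward network with one linear output node.

    x:        list of inputs (e.g. scaled Tmax and Tmin)
    w_hidden: one weight row per hidden neuron
    b_hidden: one bias per hidden neuron
    w_out:    one weight per hidden neuron
    b_out:    output bias
    """
    # Hidden layer: weighted sum of the inputs passed through the sigmoid.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: linear combination of the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical 2-2-1 network: zero hidden weights give hidden activations of 0.5.
y = forward([0.3, 0.7],
            w_hidden=[[0.0, 0.0], [0.0, 0.0]], b_hidden=[0.0, 0.0],
            w_out=[1.0, 1.0], b_out=0.1)
```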
PMF-56 equation
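For reference, the FAO-56 Penman–Monteith equation for daily reference evapotranspiration is commonly written as below. The symbol definitions follow the usual FAO-56 notation and are given here as a reminder; how the individual terms were derived from the measured variables in this study is not stated.

```latex
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

Here ET_0 is the reference evapotranspiration (mm day−1), R_n the net radiation at the crop surface (MJ m−2 day−1), G the soil heat flux density (MJ m−2 day−1), T the mean daily air temperature at 2 m height (°C), u_2 the wind speed at 2 m height (m s−1), e_s and e_a the saturation and actual vapour pressure (kPa), Δ the slope of the vapour pressure curve (kPa °C−1), and γ the psychrometric constant (kPa °C−1). Note that this γ is unrelated to the SVM parameter γ.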
Model performance criteria
The performance of the models developed in this research was assessed using three standard statistical criteria: the coefficient of correlation (r), RMSE, and mean absolute error (MAE). The r measures the degree to which two variables are linearly related. RMSE and MAE provide different types of information on the predictive capabilities of the model. The RMSE emphasizes the goodness-of-fit at high ET0 values, whereas the MAE yields a more balanced perspective of the goodness-of-fit across moderate estimation errors.
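The three criteria can be computed as in the following sketch, which uses the standard definitions. The function names and the small example series are hypothetical and chosen only to show that a single large error raises the RMSE more than the MAE.

```python
import math

def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error between observed and estimated values."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)

def pearson_r(obs, est):
    """Pearson correlation coefficient between observed and estimated values."""
    n = len(obs)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in est))
    return cov / (sd_o * sd_e)

# Hypothetical series (mm/day): three perfect estimates and one error of 2.
pmf56 = [1.0, 2.0, 3.0, 4.0]
model = [1.0, 2.0, 3.0, 6.0]
print(rmse(pmf56, model), mae(pmf56, model), pearson_r(pmf56, model))
```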
CASE STUDY
Study area and data
The daily climatic data employed in this study were composed of minimum air temperature (Tmin), maximum air temperature (Tmax), wind speed (U2), relative humidity (RH), and solar radiation (Rs). The data were divided into two sets: the 549 records from June 1, 2009 to December 31, 2010 (about 60% of the total data) were used for training the models, and the remaining 365 records from January 1, 2011 to December 31, 2011 (about 40%) were used for testing. Using more than a full year of data in the identification period enabled inclusion of the various hydrological conditions that are observable during different seasons of the year. In this way the model became robust for the different hydrological conditions that prevail in the total time series (Kişi 2006). The statistical parameters of the daily climatic data are shown in Table 1. The ET0 values in the training period ranged from 0.01 to 4.60 mm/day; however, the maximum ET0 in the testing period was 5.11 mm/day, which lies outside the training range and may cause difficulties in estimating the high ET0 values.
Climatic data and the PMF-56 ET0 | Minimum | Maximum | Mean | Std | SK |
---|---|---|---|---|---|
Tmin (°C) | |||||
All | −24.70 | 15.90 | −2.89 | 9.63 | −0.22 |
Training | −23.60 | 15.90 | −2.53 | 9.45 | −0.15 |
Testing | −24.70 | 13.10 | −3.43 | 9.88 | −0.30 |
Tmax (°C) | |||||
All | −21.00 | 29.00 | 7.98 | 10.60 | −0.26 |
Training | −21.00 | 28.20 | 8.28 | 10.40 | −0.23 |
Testing | −17.90 | 29.00 | 7.52 | 10.90 | −0.29 |
RH (%) | |||||
All | 18.52 | 98.10 | 59.59 | 18.94 | 0.11 |
Training | 19.75 | 98.10 | 61.30 | 18.49 | 0.06 |
Testing | 18.52 | 97.49 | 57.01 | 19.33 | 0.22 |
U2 (m s−1) | |||||
All | 0.46 | 2.08 | 1.16 | 0.26 | 0.07 |
Training | 0.47 | 2.08 | 1.15 | 0.26 | 0.15 |
Testing | 0.46 | 2.02 | 1.16 | 0.27 | −0.05 |
Rs (MJ m−2 day−1) | |||||
All | 0.00 | 12.59 | 4.50 | 3.48 | 0.43 |
Training | 0.00 | 12.59 | 4.39 | 3.50 | 0.47 |
Testing | 0.00 | 12.06 | 4.65 | 3.45 | 0.37 |
PMF-56 ET0 (mm day−1) | |||||
All | 0.01 | 5.11 | 1.44 | 1.25 | 0.73 |
Training | 0.01 | 4.60 | 1.39 | 1.23 | 0.69 |
Testing | 0.01 | 5.11 | 1.51 | 1.28 | 0.77 |
Std, standard deviation; SK, skewness.
Model development
The selection of appropriate input variables is important for SVM model development since it provides the basic information on the system being modeled. In the current study, different input combinations of daily climatic data including Tmax, Tmin, U2, RH, and Rs were used as inputs to estimate the ET0 obtained using the PMF-56 equation. Input 1 was designed as a temperature-based model; the other input structures were formed by adding wind speed, solar radiation, and relative humidity, singly or together, to the input 1 combination. The eight input combinations evaluated in the present study were: (1) Tmax and Tmin; (2) Tmax, Tmin, and U2; (3) Tmax, Tmin, and Rs; (4) Tmin, Tmax, and RH; (5) Tmin, Tmax, U2, and RH; (6) Tmin, Tmax, U2, and Rs; (7) Tmin, Tmax, RH, and Rs; (8) Tmin, Tmax, U2, RH, and Rs. The GA-SVM, SVM, and ANN models were trained and tested for each combination.
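The eight input combinations can be written down as a small lookup table, which makes it easy to build the feature vector for any model from a daily record. This is only an organizational sketch; the names `INPUT_COMBINATIONS` and `select_inputs` and the example record are hypothetical.

```python
# Input combinations (1)-(8) as listed above; keys are the combination numbers.
INPUT_COMBINATIONS = {
    1: ("Tmax", "Tmin"),
    2: ("Tmax", "Tmin", "U2"),
    3: ("Tmax", "Tmin", "Rs"),
    4: ("Tmin", "Tmax", "RH"),
    5: ("Tmin", "Tmax", "U2", "RH"),
    6: ("Tmin", "Tmax", "U2", "Rs"),
    7: ("Tmin", "Tmax", "RH", "Rs"),
    8: ("Tmin", "Tmax", "U2", "RH", "Rs"),
}

def select_inputs(record, combo_id):
    """Return the feature vector of one daily record for a given combination."""
    return [record[name] for name in INPUT_COMBINATIONS[combo_id]]

# Hypothetical daily record, for illustration only.
day = {"Tmin": -2.5, "Tmax": 8.3, "U2": 1.15, "RH": 61.3, "Rs": 4.4}
features = select_inputs(day, 3)  # Tmax, Tmin, Rs
```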
RESULTS AND DISCUSSION
The parameters C and γ of SVM were optimized by GA. Performance statistics of the GA-SVM models for PMF-56 ET0 for the training and testing periods are given in Table 2. The statistical indices did not differ substantially between the training and testing periods.
Model | Input | C | γ | Training r | Training RMSE (mm/day) | Training MAE (mm/day) | Testing r | Testing RMSE (mm/day) | Testing MAE (mm/day) |
---|---|---|---|---|---|---|---|---|---|
GA-SVM1 | 1 | 5 | 9.64 | 0.940 | 0.422 | 0.294 | 0.948 | 0.424 | 0.311 |
GA-SVM2 | 2 | 3.01 | 1.32 | 0.959 | 0.353 | 0.247 | 0.972 | 0.314 | 0.241 |
GA-SVM3 | 3 | 4.13 | 8.81 | 0.975 | 0.273 | 0.151 | 0.990 | 0.201 | 0.147 |
GA-SVM4 | 4 | 15.11 | 0.53 | 0.948 | 0.390 | 0.287 | 0.955 | 0.396 | 0.298 |
GA-SVM5 | 5 | 1.68 | 1.24 | 0.963 | 0.331 | 0.234 | 0.971 | 0.316 | 0.241 |
GA-SVM6 | 6 | 0.69 | 2.61 | 0.977 | 0.263 | 0.143 | 0.993 | 0.163 | 0.124 |
GA-SVM7 | 7 | 3.88 | 8.07 | 0.985 | 0.213 | 0.113 | 0.991 | 0.175 | 0.132 |
GA-SVM8 | 8 | 29.13 | 0.27 | 0.980 | 0.249 | 0.137 | 0.995 | 0.138 | 0.106 |
Considering all models in the testing periods, the RMSE ranged from 0.138 to 0.424 mm/day, the MAE from 0.106 to 0.311 mm/day, and r from 0.948 to 0.995. The r values of all models were higher than 0.94, pointing to a strong relation between estimated and PMF-56 ET0 values, and the RMSE and MAE values did not exceed 0.424 and 0.311 mm/day, respectively, indicating good estimates. All the GA-SVM models thus demonstrated a high generalization capacity, with relatively low error and high correlation, and a high accuracy for estimating PMF-56 ET0. GA-SVM8, whose inputs were Tmin, Tmax, U2, RH, and Rs, had the smallest RMSE (0.138 mm/day) and MAE (0.106 mm/day) as well as the highest r (0.995) in the testing periods; therefore, it was selected as the best-fit GA-SVM model to estimate the PMF-56 ET0. The GA-SVM8, GA-SVM7, GA-SVM6, and GA-SVM3 models performed similarly: their RMSE and MAE values differed little, and all r values were very close to unity. They were found to be better than the GA-SVM1, GA-SVM2, GA-SVM4, and GA-SVM5 models in the modeling of PMF-56 ET0. For practical uses, the GA-SVM8, GA-SVM7, GA-SVM6, and GA-SVM3 models had good accuracy in PMF-56 ET0 modeling, and the choice among them should depend on the available meteorological data. Although the GA-SVM1 model, with only minimum and maximum air temperature as inputs, had the highest errors, its performance was still good. The GA-SVM1 model can therefore be used where only air temperature data are available. This is especially relevant in mountain areas, where reliable data sets of solar radiation, relative humidity, and wind speed are limited.
Comparing the different GA-SVM models shows that adding either solar radiation or wind speed as an input significantly improved on the accuracy of GA-SVM1. The GA-SVM3 and GA-SVM2 models, which add solar radiation and wind speed to the inputs, respectively, improved r, RMSE, and MAE by 4.4%, 52.6%, and 52.7% and by 2.5%, 25.9%, and 22.5%, respectively, in comparison to the GA-SVM1 model in the testing periods (Table 2). The reduction of RMSE and MAE and the increase of r obtained by adding relative humidity were smaller than those obtained from solar radiation and wind speed. Although some studies reported that wind speed was more effective for estimating ET0 (Popova et al. 2006; Traore et al. 2010; Cobaner 2011), the results of this study show that solar radiation is the most effective climatic variable for modeling ET0 with high accuracy in this semi-arid mountain environment.
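The relative improvements over GA-SVM1 can be recomputed directly from the testing-period statistics in Table 2. The sketch below assumes the improvement is expressed as a percentage of the GA-SVM1 value; the function name `improvement_pct` is hypothetical.

```python
def improvement_pct(baseline, value, higher_is_better=False):
    """Percentage improvement of `value` relative to `baseline`."""
    change = (value - baseline) if higher_is_better else (baseline - value)
    return 100.0 * change / baseline

# Testing-period statistics from Table 2: (r, RMSE in mm/day, MAE in mm/day).
testing = {
    "GA-SVM1": (0.948, 0.424, 0.311),  # Tmax, Tmin
    "GA-SVM2": (0.972, 0.314, 0.241),  # + wind speed
    "GA-SVM3": (0.990, 0.201, 0.147),  # + solar radiation
    "GA-SVM4": (0.955, 0.396, 0.298),  # + relative humidity
}

base_r, base_rmse, base_mae = testing["GA-SVM1"]
improvements = {}
for name in ("GA-SVM2", "GA-SVM3", "GA-SVM4"):
    r, rmse_value, mae_value = testing[name]
    improvements[name] = (
        round(improvement_pct(base_r, r, higher_is_better=True), 1),
        round(improvement_pct(base_rmse, rmse_value), 1),
        round(improvement_pct(base_mae, mae_value), 1),
    )
    print(name, improvements[name])
```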
In order to assess the ability of the GA-SVM model relative to the grid algorithm-based SVM model and the feed-forward ANN model, eight SVM models and eight feed-forward ANN models were developed for PMF-56 ET0 modeling using the same input combinations (1)–(8) as the GA-SVM models. The appropriate model structure was determined for each input combination; the SVM and ANN models were then tested, and the results were compared by means of the performance statistics.
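A grid algorithm for the SVM parameters amounts to an exhaustive search over a predefined set of (C, γ) pairs. The following sketch assumes a grid spaced in powers of two, which is a common convention but is not stated in the text; `grid_search` and `toy_error` are hypothetical names, and `toy_error` stands in for the cross-validated error of an SVM trained with the given parameters.

```python
import math

def grid_search(objective, c_exponents, gamma_exponents):
    """Exhaustive search over C = 2**i and gamma = 2**j; returns (error, C, gamma)."""
    best = None
    for ce in c_exponents:
        for ge in gamma_exponents:
            c, gamma = 2.0 ** ce, 2.0 ** ge
            err = objective(c, gamma)
            # Keep the pair with the lowest error seen so far.
            if best is None or err < best[0]:
                best = (err, c, gamma)
    return best

def toy_error(c, gamma):
    # Hypothetical error surface with its minimum at C = 1, gamma = 16.
    return math.log2(c) ** 2 + (math.log2(gamma) - 4.0) ** 2

best_err, best_c, best_gamma = grid_search(toy_error, range(-5, 6), range(-5, 6))
```

Unlike the GA, the grid can only return points that lie on the grid, so its resolution bounds how close the selected parameters can come to the optimum.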
The parameters C and γ of SVM were optimized by the grid algorithm, and performance statistics of the SVM models for PMF-56 ET0 for the training and testing periods are given in Table 3.
Model | Input | C | γ | Training r | Training RMSE (mm/day) | Training MAE (mm/day) | Testing r | Testing RMSE (mm/day) | Testing MAE (mm/day) |
---|---|---|---|---|---|---|---|---|---|
SVM1 | 1 | 0.57 | 48.50 | 0.943 | 0.412 | 0.282 | 0.947 | 0.433 | 0.321 |
SVM2 | 2 | 0.57 | 5.28 | 0.961 | 0.342 | 0.237 | 0.969 | 0.329 | 0.251 |
SVM3 | 3 | 0.33 | 16 | 0.975 | 0.275 | 0.153 | 0.989 | 0.206 | 0.150 |
SVM4 | 4 | 1 | 5.73 | 0.953 | 0.374 | 0.268 | 0.951 | 0.404 | 0.308 |
SVM5 | 5 | 0.57 | 5.29 | 0.967 | 0.313 | 0.216 | 0.965 | 0.347 | 0.262 |
SVM6 | 6 | 1 | 9.19 | 0.980 | 0.245 | 0.124 | 0.992 | 0.179 | 0.129 |
SVM7 | 7 | 1 | 16 | 0.987 | 0.201 | 0.106 | 0.988 | 0.211 | 0.153 |
SVM8 | 8 | 0.58 | 9.19 | 0.986 | 0.205 | 0.104 | 0.994 | 0.148 | 0.114 |
Considering the testing periods, the RMSE values ranged from 0.148 to 0.433 mm/day, the MAE values from 0.114 to 0.321 mm/day, and the r values from 0.947 to 0.994. The low RMSE and MAE values and the high r values imply good performance of the SVM models for PMF-56 ET0 modeling. Among the SVM models, SVM8, SVM7, SVM6, and SVM3 were found to be better than the SVM1, SVM2, SVM4, and SVM5 models in the modeling of PMF-56 ET0. As in the case of the GA-SVM models, SVM8, whose inputs were Tmin, Tmax, U2, RH, and Rs, had the smallest RMSE (0.148 mm/day) and MAE (0.114 mm/day) and the highest r (0.994) in the testing periods; therefore, it was the best-fit SVM model for estimating the PMF-56 ET0.
The identification of the ANN architecture is the most important aspect of the modeling, since an inappropriate architecture may lead to under-fitting, over-fitting, and computational overload. In the current research, the optimal number of neurons in the hidden layer was identified using a trial and error procedure by varying the number of hidden neurons from 2 to 20, and the network architecture with the minimum MSE was selected. The final ANN architecture and the performance statistics of each model are shown in Table 4. In the testing periods, the RMSE values of the ANN models varied from 0.176 to 0.482 mm/day, the MAE values from 0.136 to 0.349 mm/day, and the r values from 0.930 to 0.992. ANN8, ANN7, ANN6, and ANN3 had similar performance, with small differences between their RMSE, MAE, and r values. They were found to be better than the ANN1, ANN2, ANN4, and ANN5 models in the modeling of PMF-56 ET0. Similar to the GA-SVM and SVM models, the ANN8 model with inputs of Tmin, Tmax, U2, RH, and Rs had the best performance (RMSE = 0.176 mm/day, MAE = 0.136 mm/day, and r = 0.992) among the ANN models.
Model | Input | Structure | Training r | Training RMSE (mm/day) | Training MAE (mm/day) | Testing r | Testing RMSE (mm/day) | Testing MAE (mm/day) |
---|---|---|---|---|---|---|---|---|
ANN1 | 1 | 2-12-1 | 0.939 | 0.439 | 0.315 | 0.941 | 0.473 | 0.341 |
ANN2 | 2 | 3-4-1 | 0.959 | 0.352 | 0.259 | 0.967 | 0.361 | 0.280 |
ANN3 | 3 | 3-11-1 | 0.971 | 0.301 | 0.174 | 0.986 | 0.245 | 0.174 |
ANN4 | 4 | 3-13-1 | 0.952 | 0.380 | 0.289 | 0.930 | 0.482 | 0.349 |
ANN5 | 5 | 4-6-1 | 0.964 | 0.330 | 0.243 | 0.968 | 0.347 | 0.266 |
ANN6 | 6 | 4-4-1 | 0.974 | 0.280 | 0.176 | 0.990 | 0.224 | 0.169 |
ANN7 | 7 | 4-6-1 | 0.982 | 0.232 | 0.156 | 0.985 | 0.232 | 0.155 |
ANN8 | 8 | 5-11-1 | 0.984 | 0.223 | 0.141 | 0.992 | 0.176 | 0.136 |
Comparing the performance criteria of the GA-SVM models (Table 2) with those of the SVM (Table 3) and ANN models (Table 4), it was observed that all three model types generally gave low values of RMSE and MAE as well as high r. The GA-SVM, SVM, and ANN models all performed well in PMF-56 ET0 modeling. However, the comparison revealed that every GA-SVM model performed slightly better in the testing periods than the corresponding SVM and ANN models. The GA-SVM model produced a lower RMSE and MAE as well as a higher r, and was therefore the best according to the criteria. The SVM model had the second best performance, and the ANN model was the worst of the approaches investigated in this study.
The total PMF-56 ET0 obtained from the estimated ET0 values was also considered for comparison because of its importance in water balance calculation and in water resources planning and management. The total estimated ET0 amounts in the testing periods are given in Table 5, which shows that all models underestimated the total PMF-56 ET0 value. The GA-SVM8, SVM8, and ANN8 models, whose inputs were Tmin, Tmax, U2, RH, and Rs, estimated the total PMF-56 ET0 value of 551.19 mm as 538.96 mm, 537.94 mm, and 536.54 mm, an underestimation of 2.22%, 2.40%, and 2.66%, respectively. Among these three models, the GA-SVM8 model had the best estimate (−2.22%) and the SVM8 model the second best (−2.40%).
Input | GA-SVM total ET0 (mm) | GA-SVM relative error (%) | SVM total ET0 (mm) | SVM relative error (%) | ANN total ET0 (mm) | ANN relative error (%) |
---|---|---|---|---|---|---|
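The relative errors of the total ET0 follow from the totals themselves. The sketch below assumes the relative error is the difference between the estimated and the PMF-56 total, expressed as a percentage of the PMF-56 total; the function name `relative_error_pct` is hypothetical.

```python
def relative_error_pct(estimated_total, reference_total):
    """Relative error (%) of an estimated total against the reference total."""
    return 100.0 * (estimated_total - reference_total) / reference_total

PMF56_TOTAL = 551.19  # total PMF-56 ET0 in the testing period (mm)

# Totals estimated with input combination (8), in mm.
totals = {"GA-SVM8": 538.96, "SVM8": 537.94, "ANN8": 536.54}

errors = {name: round(relative_error_pct(total, PMF56_TOTAL), 2)
          for name, total in totals.items()}
print(errors)  # negative values indicate underestimation
```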
1 | 507.64 | −7.90 | 502.39 | −8.85 | 486.14 | −11.80 |
2 | 519.51 | −5.75 | 517.90 | −6.04 | 495.39 | −10.12 |
3 | 529.13 | −4.00 | 528.85 | −4.05 | 524.70 | −4.81 |
4 | 509.35 | −7.59 | 520.98 | −5.48 | 516.94 | −6.21 |
5 | 521.64 | −5.36 | 520.88 | −5.50 | 504.69 | −8.44 |
6 | 528.21 | −4.17 | 528.35 | −4.14 | 513.22 | −6.89 |
7 | 533.74 | −3.17 | 535.93 | −2.77 | 525.42 | −4.68 |
8 | 538.96 | −2.22 | 537.94 | −2.40 | 536.54 | −2.66 |
Overall, the GA-SVM, SVM, and ANN models all gave good prediction performance and can be used to provide accurate and reliable PMF-56 ET0 estimates. The results suggested that the GA-SVM models were superior to the SVM and ANN models in PMF-56 ET0 modeling.
Although this study sought the best model among the GA-SVM, SVM, and ANN models, the results show that the more climate variables are used to train the models, the more accurate the PMF-56 ET0 estimates become. In ungauged regions, climate data are limited; however, temperature data are easy to access and process, and in this situation the GA-SVM, SVM, and ANN models with only minimum and maximum air temperature as inputs can all be used to estimate PMF-56 ET0.
CONCLUSIONS
The objective of this paper was to investigate the accuracy of the integrated GA-SVM model, in which GA was used to optimize the parameters of the SVM, for modeling daily reference evapotranspiration (ET0) from climatic variables. The developed GA-SVM models were tested against the ET0 calculated by the PMF-56 equation in a semi-arid environment of Qilian Mountain, northwest China. Eight models were developed using different combinations of daily climatic data including maximum air temperature (Tmax), minimum air temperature (Tmin), wind speed (U2), relative humidity (RH), and solar radiation (Rs). The results showed that the GA-SVM models estimated ET0 successfully. The GA-SVM model whose input combination included Tmin, Tmax, U2, RH, and Rs had the best accuracy for the estimation of PMF-56 ET0.
In order to assess the ability of the GA-SVM model relative to the conventional grid algorithm-based SVM model and the feed-forward ANN model, eight SVM models and eight feed-forward ANN models were developed for comparison using the same input combinations as the GA-SVM models. The results showed that the proposed GA-SVM model was more accurate than the grid algorithm-based SVM model and the ANN model in modeling the PMF-56 ET0. This study suggests that the GA-SVM model provides accurate PMF-56 ET0 estimates and can be successfully applied in a semi-arid mountain area where evapotranspiration measurements or the complete climatic data for applying the PMF-56 method are often not available.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (Grant Nos. 31370466, 41522102, 41571031), the technology innovation team of China Academy of Science (Cross and Cooperation), the Foundation for Excellent Youth Scholars of CAREERI, CAS, and the China Postdoctoral Science Foundation (Grant No. 2015M572620). The authors wish to thank the anonymous reviewers for reading the manuscript and for their suggestions and critical comments.