## Abstract

Estimating evapotranspiration is essential for calculating crop water requirements. The Penman–Monteith (P–M) method is the most commonly used. This study attempts to simplify the application of P–M by using apparent temperature (AT) as a substitute for meteorological parameters. Genetic programming (GP) was used to model paddy crop evapotranspiration for six stations in Tamil Nadu, India, with two input sets. In the first, the inputs were mean temperature, wind speed, relative humidity and antecedent evapotranspiration. In the second, the inputs were AT, an agglomeration of meteorological parameters, and antecedent evapotranspiration. The GP model using AT proved capable of predicting evapotranspiration better than the P–M-based model, indicating that AT reflects the effects of other meteorological parameters and can be used in estimating evapotranspiration. As with any other data-driven technique, the GP training dataset should include the lowest and highest values of the modelling parameters if the model developed is to be robust.

## HIGHLIGHTS

Simplify the application of the Penman–Monteith method by proposing apparent temperature (AT) as a substitute for meteorological parameters.

Genetic programming (GP) has been used to model the evapotranspiration of paddy crops.

The GP model with AT as the input parameter is capable of accurately predicting evapotranspiration.

Data-driven models like GP must have all the expected range of values in the training set data to develop a robust model.


## INTRODUCTION

Water scarcity and non-point source pollution in agricultural areas are worldwide issues. Increasing demand for food has driven increases in food production through irrigation and improved fertilizer application (Cao *et al.* 2015). The shortage of water resources has become a crucial constraint on the growth of irrigation. Objective assessment of crop water requirements, and of the related impacts of crop production on water quality, is a trusted way to encourage efficient and sustainable water resource use in agriculture (Xinchun *et al.* 2018). Conventionally, crop water requirement is estimated by finding the evapotranspiration (ET) during the entire crop growth period. ET is also used in many other domains, including hydrology, climatology, ecology and water management (Alexandris & Proutsos 2020; Jahanfar *et al.* 2020).

Of the different ET estimation methods, empirical equations – e.g. Blaney–Criddle, Hargreaves, Penman–Monteith (P–M), Turc and Thornthwaite – are in common use. All involve one or more meteorological parameters. The Hargreaves and Thornthwaite methods are temperature-based (Trajkovic *et al.* 2019, 2020). The P–M method, on the other hand, includes energy exchange and latent heat flux parameters. P–M is the most widely adopted method worldwide, and many professional bodies and organizations, including the Food and Agriculture Organization (FAO), recommend its use (Trajkovic & Gocic 2021).

Many publications deal directly with the use of the P–M method and compare its performance with that of other methods – e.g., Nikam *et al.* 2014; Pandey *et al.* 2014; Jadhav *et al.* 2015; da Cunha *et al.* 2017; Lang *et al.* 2017; Chowdhury *et al.* 2017; Hafeez & Khan 2018. Efforts have also been made to simplify the equation and study its performance in limited data scenarios. Trajkovic *et al.* (2011) estimated the errors arising when ET is estimated in the absence of some weather parameters and determined the minimum weather data needed to estimate ET to acceptable levels. They concluded that minimum and maximum temperatures and wind speed were the minimum requirements for FAO-56 ET estimation in humid climates. Djaman *et al.* (2017) evaluated the FAO-56 P–M method using two of Valiantzas' equations and four others for estimating reference ET with limited data across Tanzania and southwestern Kenya. Quej *et al.* (2019) compared seven temperature-based models and a standardized reference ET equation for the Yucatan Peninsula, Mexico. They concluded that the uncalibrated P–M method, using temperature alone, produced better results than the FAO-56-based ET method. Xie & Wang (2020) compared 10 potential ET models and their attribution analyses for 10 drainage basins in China, using daily observed meteorological variables at 2,267 stations as the models’ input. Sensitivity analysis revealed wind speed and sunshine duration as the two main factors responsible for the decreasing ET trend. Yeh (2017) estimated ET using a limited number of parameters from the Tainan weather station in Taiwan.

In this study, apparent temperature (AT) is proposed for modelling ET, as a way of simplifying the P–M method in limited data scenarios. AT is a 'feels-like' temperature that results from the combined effects of air temperature, wind speed and relative humidity (RH). Although AT is defined as the temperature equivalent perceived by humans, it is expected that such effects also influence plant growth and hence ET. Sivapragasam *et al.* (2017) studied the influence of AT and other weather parameters on BOD removal by *Lemna minor* using GP-based mathematical modelling. Vanitha *et al.* (2017) modelled the BOD removal performance of a constructed wetland under the influence of RH and AT using GP. Sivapragasam & Natarajan (2021) compared the trends of apparent and actual air temperature to assess climate change for five stations in Tamil Nadu, India. This study attempts to develop a GP model for ET estimation with two different input parameter sets. In case 1, mean temperature (*T*_{mean}), wind speed (*u*), RH and antecedent ET (ETA) are the inputs, and ET estimated from P–M is the output. In case 2, only AT and ETA are inputs, again with ET estimated from P–M as the output.
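The text does not state which AT formula was used. Purely as an illustration of how air temperature, wind speed and RH combine into one value, the sketch below assumes the non-radiation form of Steadman's apparent temperature as published by the Australian Bureau of Meteorology. The function names, and the choice of this formula, are assumptions and not necessarily the authors' method.

```python
import math


def vapour_pressure_hpa(temp_c, rh_percent):
    """Water vapour pressure (hPa) from air temperature (°C) and relative humidity (%)."""
    return rh_percent / 100.0 * 6.105 * math.exp(17.27 * temp_c / (237.7 + temp_c))


def apparent_temperature(temp_c, rh_percent, wind_ms):
    """Apparent temperature (°C), Steadman non-radiation form (assumed formula).

    temp_c     : dry-bulb air temperature (°C)
    rh_percent : relative humidity (%)
    wind_ms    : wind speed (m/s)
    """
    e = vapour_pressure_hpa(temp_c, rh_percent)
    return temp_c + 0.33 * e - 0.70 * wind_ms - 4.00


if __name__ == "__main__":
    # Hypothetical warm, humid day with a light breeze.
    print(round(apparent_temperature(30.0, 60.0, 2.0), 2))
```

Under this formula, higher humidity raises AT and stronger wind lowers it, which is the sense in which AT agglomerates the three parameters.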

## STUDY AREA AND DATA

Ramanathapuram, which has a tropical climate, is in southern Tamil Nadu, India. The district lies between 8 and 19 m above mean sea level; the average elevation is taken as 11 m, which is that of the district headquarters. The average monthly maximum temperature ranges between 29.2 and 37.8 °C, and the minimum between 19.5 and 24.8 °C. The annual precipitation is about 912 mm, with an average of 122.7 mm in summer and 67.4 mm in winter. The highest and lowest temperatures are observed in May and January, respectively.

| Location | Latitude (°N) | Longitude (°E) | Elevation (m) |
|---|---|---|---|
| Thelichatanallur | 9.5562 | 78.5625 | 45 |
| Pamboor | 9.4767 | 78.5633 | 19 |
| Sirakikottai | 9.4995 | 78.6853 | 2 |
| Mangudi | 9.7863 | 78.4375 | 77 |
| Thayamangalam | 9.6804 | 78.6087 | 45 |
| Maravamangalam | 9.7644 | 78.6406 | 95 |


## ET METHODS

### Blaney–Criddle method

### Hargreaves method

*ET* is in mm/day; *Ra* (MJ/m^{2}/day) is the extra-terrestrial solar radiation, and *T*_{max} and *T*_{min} are the maximum and minimum daily air temperatures (°C).
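A minimal sketch of the Hargreaves calculation with the symbols defined above is given below. It assumes the commonly published FAO-56 form of the equation, takes the mean temperature as the average of *T*_{max} and *T*_{min}, and uses the factor 0.408 to convert *Ra* from MJ/m^{2}/day to its evaporation equivalent in mm/day. The function name and these choices are assumptions, since the exact form used in the study is not shown here.

```python
import math


def hargreaves_et(ra_mj, t_max, t_min):
    """Hargreaves ET (mm/day), assumed FAO-56 form.

    ra_mj : extra-terrestrial solar radiation Ra (MJ/m^2/day)
    t_max : maximum daily air temperature (°C)
    t_min : minimum daily air temperature (°C)
    """
    if t_max < t_min:
        raise ValueError("t_max must not be lower than t_min")
    t_mean = (t_max + t_min) / 2.0   # assumption: mean of the daily extremes
    ra_mm = 0.408 * ra_mj            # MJ/m^2/day -> mm/day of evaporation
    return 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(t_max - t_min)


if __name__ == "__main__":
    # Hypothetical inputs for a tropical station.
    print(round(hargreaves_et(38.0, 30.0, 20.0), 2))
```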

### P–M method

*ET* is the reference evapotranspiration (mm/day);

*Δ* is the slope of the vapour pressure curve (kPa/°C);

*R*_{n} is the crop surface net radiation (MJ/m^{2}/day);

*G* is the soil heat flux density (MJ/m^{2}/day);

*T* is the air temperature at 2 m height (°C);

*u*_{2} is the wind speed at 2 m (m/s);

*e*_{s} is the saturation vapour pressure (kPa);

*e*_{a} is the actual vapour pressure (kPa); and

*γ* is the psychrometric constant (kPa/°C).
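The symbols listed above match those of the FAO-56 form of the P–M equation, so a short sketch of that form is given here for orientation. It is assumed, not confirmed, that this is the exact expression used in the study; the function name and the sample inputs are illustrative only.

```python
def penman_monteith_et(delta, r_n, g, t, u2, e_s, e_a, gamma):
    """Reference ET (mm/day) from the FAO-56 Penman-Monteith equation (assumed form).

    ET = [0.408*delta*(Rn - G) + gamma*(900/(T + 273))*u2*(es - ea)]
         / [delta + gamma*(1 + 0.34*u2)]
    """
    radiation_term = 0.408 * delta * (r_n - g)
    aerodynamic_term = gamma * (900.0 / (t + 273.0)) * u2 * (e_s - e_a)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


if __name__ == "__main__":
    # Illustrative daily values (not taken from the study's stations).
    et = penman_monteith_et(delta=0.122, r_n=13.28, g=0.14, t=16.9,
                            u2=2.078, e_s=1.997, e_a=1.409, gamma=0.0666)
    print(round(et, 2))
```

The first term in the numerator carries the energy-exchange contribution and the second the aerodynamic contribution, which is why P–M needs more inputs than the temperature-based methods.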

## METHODOLOGY AND TOOL

GP, an evolutionary algorithm, was used to develop the ET models. GP operates on parse trees to approximate the equation that best describes the relationship between the output and the input variables. It starts from an initial population of randomly generated programmes (equations), each a random combination of input variables, random numbers and functions. The functions can include arithmetic operators (plus, minus, multiply, divide), mathematical functions (sin, cos, exp, log) and logical functions (OR, AND), and must be chosen appropriately on the basis of some understanding of the process. This population of potential solutions is subjected to an evolutionary process in which the ‘fitness’ of each programme (a measure of how well it solves the problem) is evaluated. The programmes that best fit the data are selected to exchange part of their information and produce better programmes through ‘crossover’ and ‘mutation’, processes that mimic natural reproduction; programmes that fit the data less well are discarded. Crossover exchanges parts of the best programmes with each other, reproduction copies a programme unchanged into the next generation, and mutation randomly alters programmes to create new ones (Koza 1992). Before applying the algorithm, the user sets the GP parameters, e.g., population size, number of generations, and crossover and mutation probabilities. The evolutionary process is repeated over successive generations and driven towards symbolic expressions that describe the data and can be interpreted scientifically to derive knowledge about the process being modelled.
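The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the GP software or settings used in the study: programmes are nested tuples, only arithmetic functions are allowed, fitness is the RMSE, and the input names `AT` and `ETA`, the population size, the mutation rate and the size cap are all hypothetical choices.

```python
import math
import operator
import random


def safe_div(a, b):
    """Protected division so that evolved programmes never divide by zero."""
    return a / b if abs(b) > 1e-9 else 1.0


FUNCTIONS = [operator.add, operator.sub, operator.mul, safe_div]  # arithmetic only
TERMINALS = ["AT", "ETA"]  # hypothetical input names


def random_tree(rng, depth):
    """Grow a random parse tree; tuples are function nodes, leaves are inputs or constants."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS) if rng.random() < 0.7 else round(rng.uniform(-5, 5), 2)
    return (rng.choice(FUNCTIONS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, row):
    """Evaluate a parse tree for one sample (a dict of input values)."""
    if isinstance(tree, tuple):
        func, left, right = tree
        return func(evaluate(left, row), evaluate(right, row))
    return row[tree] if isinstance(tree, str) else tree


def fitness(tree, rows, targets):
    """RMSE of a programme; non-finite scores are treated as infinitely bad."""
    total = 0.0
    for row, target in zip(rows, targets):
        diff = evaluate(tree, row) - target
        total += diff * diff
    score = math.sqrt(total / len(rows))
    return score if math.isfinite(score) else math.inf


def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node in a tree."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, parent, donor):
    """Copy a random subtree of the donor into a random position in the parent."""
    target = rng.choice(list(paths(parent)))
    source = rng.choice(list(paths(donor)))
    return replace_node(parent, target, get_node(donor, source))


def mutate(rng, tree):
    """Replace a random subtree with a freshly generated one."""
    return replace_node(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))


def evolve(rows, targets, pop_size=60, generations=30, mutation_rate=0.1, max_nodes=60, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, rows, targets))
        survivors = population[: pop_size // 2]  # less fit programmes are discarded
        children = []
        while len(survivors) + len(children) < pop_size:
            parent, donor = rng.choice(survivors), rng.choice(survivors)
            child = crossover(rng, parent, donor)
            if rng.random() < mutation_rate:
                child = mutate(rng, child)
            # Oversized children are replaced by an exact copy of the parent (reproduction).
            children.append(child if len(list(paths(child))) <= max_nodes else parent)
        population = survivors + children
    return min(population, key=lambda t: fitness(t, rows, targets))
```

Calling `evolve(rows, targets)` with `rows` as a list of `{"AT": ..., "ETA": ...}` dictionaries and `targets` as the corresponding ET values returns the best tree found, which can be scored with `fitness`. Keeping the better half of each generation means the best RMSE never gets worse from one generation to the next.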

Since the ET process is not expected to depend on logarithmic, exponential or trigonometric functions, simple arithmetic functions are used for modelling.

*y*_{a} is the actual ET, *y*_{f} is the predicted ET, and *N* is the total number of samples.
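These symbols are the ones used in the usual root mean square error, RMSE = sqrt((1/*N*) Σ (*y*_{a} − *y*_{f})^{2}), which is the performance measure reported in the results. A small sketch of that calculation follows; the function name is illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted ET values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(ya - yf) ** 2 for ya, yf in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical daily ET values (mm/day).
    print(round(rmse([5.2, 6.1, 4.8], [5.0, 6.4, 4.9]), 3))
```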

## RESULTS AND DISCUSSION

GP was applied to the Thelichatanallur dataset. Of the equations it generated, the one with the lowest RMSE was selected as the best and was also used as a benchmark for the other stations nearby. The optimum GP parameters adopted to generate the equations for cases 1 and 2 are given in Table 2. The population size was in the range 100–1,000 and the number of children between 100 and 500. Variations in the other parameters had little impact on the results. The best GP equation was obtained after 1,787 generations for case 1 and after 3,228 generations for case 2.

| Parameter | Value |
|---|---|
| Subtree Mutation Probability | 0.05 |
| Maximum Subtree Mutation Size | 15 |
| Constant Mutation Probability | 0.05 |
| Constant Mutation Extent | 5 |
| BroodSelection | True |
| BroodSize | 2 |
| Swap mutation rate | 0.05 |
| Crossover rate | 0.4 |
| Reduce mutation rate | 0.05 |
| Self crossover | 0.05 |
| Subtree Mutation Probability | 0.05 |
| Maximum Subtree Mutation Size | 15 |
| Population size | 500 |
| Number of children | 250 |


Table 3 compares the performance of the models generated for cases 1 and 2.

| Station | Entire dataset, case 1 | Entire dataset, case 2 | Training, case 1 | Training, case 2 | Validation, case 1 | Validation, case 2 |
|---|---|---|---|---|---|---|
| Thelichatanallur | 1.57 | 1.08 | 1.59 | 1.09 | 1.52 | 1.02 |
| Pamboor | 1.41 | 1.16 | 1.35 | 1.18 | 1.59 | 1.09 |
| Sirakikottai | 1.44 | 1.19 | 1.37 | 1.22 | 1.60 | 1.09 |
| Mangudi | 1.19 | 1.20 | 1.16 | 1.26 | 1.27 | 0.99 |
| Thayamangalam | 2.81 | 2.71 | 3.18 | 3.06 | 1.11 | 1.21 |
| Maravamangalam | 3.06 | 2.91 | 3.47 | 3.29 | 1.19 | 1.26 |


This model clearly indicates the influence of both antecedent ET and AT. Antecedent ET accounts implicitly for the antecedent meteorological conditions, which interact in a complex way with the current meteorological conditions, as indicated by the product of ETA and AT. The lower RMSE of case 2 compared to case 1 also suggests that AT accounts for other meteorological parameters, such as sunshine hours and net radiation, apart from being an agglomeration of temperature, wind speed and RH. The ET values from the case 2 model range from 2.69 to 7.46 mm/day for Thelichatanallur, Pamboor, Sirakikottai and Mangudi, and from 2.98 to 9.57 mm/day for Thayamangalam and Maravamangalam.

An attempt was made to improve the model's accuracy for Thayamangalam and Maravamangalam. Two approaches were considered: (a) developing a model using Thayamangalam data and validating it for Maravamangalam, i.e., developing a separate model for stations where ET has a higher range (0–20 mm/day); and (b) developing a model using data from Thelichatanallur and Thayamangalam and validating it for the other four stations, i.e., a single model for all stations. Since the inputs using AT (case 2) give better accuracy, only that case was considered for improving the model. The model developed from the first approach was applied to determine ET for Maravamangalam. Table 4 compares the results from this equation with the actual ET values.

| Station | Training, original model (case 2) | Training, modified model (case a) | Validation, original model (case 2) | Validation, modified model (case a) |
|---|---|---|---|---|
| Thayamangalam | 3.06 | 1.92 | 1.21 | 1.28 |
| Maravamangalam | 3.29 | 2.08 | 1.26 | 1.32 |


Equation (8) does not have the complex interaction of AT and ETA shown by Equation (7), indicating that the processes governing higher and lower ETs differ in nature. *ET*_{0} ranges from 4.54 to 19.37 mm/day for Maravamangalam, and from 4.20 to 19.67 mm/day for Thayamangalam.

| Station | Training, original model (case 2) | Training, modified model (case b) | Validation, original model (case 2) | Validation, modified model (case b) |
|---|---|---|---|---|
| Thelichatanallur | 1.09 | 1.13 | 1.02 | 1.05 |
| Pamboor | 1.18 | 1.12 | 1.09 | 1.06 |
| Sirakikottai | 1.22 | 1.14 | 1.09 | 1.04 |
| Mangudi | 1.26 | 1.24 | 0.99 | 1.04 |
| Thayamangalam | 3.06 | 1.93 | 1.21 | 1.24 |
| Maravamangalam | 3.29 | 2.08 | 1.26 | 1.32 |


As can be seen, antecedent *ET* appears as *ETA*^{3} in Equation (9), in contrast to *ETA*^{2} in both Equations (7) and (8), and there is no term indicating the complex interaction of *ETA* and *AT*. However, since *ETA* contains the antecedent meteorological conditions implicitly, *ETA*^{3} can be assumed to explain the model's ability to represent both lower and higher values of *ET*. According to the case b model, *ET* ranges from 3.88 to 8.77 mm/day for Thelichatanallur, Pamboor, Sirakikottai and Mangudi, and from 3.90 to 19.60 mm/day for Thayamangalam and Maravamangalam.

The study's results are consistent with those reported by others. For instance, Guven *et al.* (2008) predicted daily ET using GP for five stations in Southern California and reported error percentages of 4, 0, 1.32, 3.81 and 2.56%. In this study, the error percentages based on the mean actual and predicted ET values for Thelichatanallur, Pamboor, Sirakikottai, Mangudi, Thayamangalam and Maravamangalam are 1.61, 0.7, 0.56, 1.2, 1.35 and 1.85%, respectively, which are of the same order as those obtained by Guven *et al.* (2008). GP has thus been shown to perform robustly, and the model can predict ET accurately when AT and ETA are taken into consideration. Table 6 shows the mean values of ET for the period August to November in both 2018 and 2019.

| Station | Mean ET 2018, actual (mm/day) | Mean ET 2018, predicted (mm/day) | Mean ET 2019, actual (mm/day) | Mean ET 2019, predicted (mm/day) |
|---|---|---|---|---|
| Thelichatanallur | 5.21 | 5.24 | 5.21 | 5.33 |
| Pamboor | 5.52 | 5.49 | 5.22 | 5.32 |
| Sirakikottai | 5.54 | 5.51 | 5.25 | 5.34 |
| Mangudi | 5.58 | 5.55 | 5.10 | 5.25 |
| Thayamangalam | 5.75 | 5.67 | 7.94 | 7.84 |
| Maravamangalam | 5.75 | 5.67 | 8.47 | 8.29 |


Deviation between actual and predicted mean ET is marginal for all six stations, indicating that GP can predict ET using AT and ETA.
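To make "marginal" concrete, the sketch below computes the percentage deviation between the actual and predicted seasonal means listed in Table 6, per station and per year. The function and variable names are illustrative. How the error percentages quoted earlier were computed is not spelled out, so these per-year figures need not match them.

```python
# Mean ET values (mm/day) from Table 6, as (actual, predicted) pairs.
MEAN_ET = {
    "Thelichatanallur": {"2018": (5.21, 5.24), "2019": (5.21, 5.33)},
    "Pamboor": {"2018": (5.52, 5.49), "2019": (5.22, 5.32)},
    "Sirakikottai": {"2018": (5.54, 5.51), "2019": (5.25, 5.34)},
    "Mangudi": {"2018": (5.58, 5.55), "2019": (5.10, 5.25)},
    "Thayamangalam": {"2018": (5.75, 5.67), "2019": (7.94, 7.84)},
    "Maravamangalam": {"2018": (5.75, 5.67), "2019": (8.47, 8.29)},
}


def percent_deviation(actual, predicted):
    """Absolute deviation of the predicted mean from the actual mean, in percent."""
    return abs(predicted - actual) / actual * 100.0


def deviations(table):
    """Percentage deviation for every station and year in the table."""
    return {
        station: {year: percent_deviation(*pair) for year, pair in years.items()}
        for station, years in table.items()
    }


if __name__ == "__main__":
    for station, years in deviations(MEAN_ET).items():
        print(station, {year: round(value, 2) for year, value in years.items()})
```

With the Table 6 values, every station-year deviation computed this way stays below 3%.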

These models, being empirical, will not provide an exact understanding of the physics of the process involved, which is a complex aerodynamic process. Since the parameters are interconnected, however, the complexity of some terms can be understood to indicate process complexity. Data-driven models like GP also depend crucially on the quality of the input data: a sound model must be developed from a training dataset covering the full expected range of the variable.
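One simple way to act on this point is to check, before applying a model to a new station, whether its values fall inside the range seen during training. The helper below is a hypothetical sketch of such a check, with made-up sample values, and is not part of the study's procedure.

```python
def outside_training_range(train_values, new_values):
    """Return the new values that lie outside the [min, max] range of the training data."""
    if not train_values:
        raise ValueError("training data must not be empty")
    low, high = min(train_values), max(train_values)
    return [v for v in new_values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical ET values (mm/day): a model trained on a low-ET station
    # is asked to predict for a station with much higher ET.
    training_et = [2.7, 4.1, 5.6, 7.4]
    new_station_et = [4.2, 9.6, 19.7]
    print(outside_training_range(training_et, new_station_et))
```

A non-empty result signals that the model would be extrapolating, which is the situation that motivated retraining on combined low-ET and high-ET data.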

## CONCLUSIONS

GP was used to model ET at six weather stations in Tamil Nadu, India. Two different input sets were considered. In case 1, mean temperature, wind speed, RH and antecedent ET were the inputs, while in case 2, AT and antecedent ET were the inputs. The RMSE between the actual and predicted values is lower for case 2 than case 1 for both training and validation datasets. Furthermore, the datasets for stations with high and low ET values were combined to derive a single GP equation for all six. Since this combination yielded a training set covering all possible ET ranges, the resulting GP equation had a much lower RMSE than that derived from a dataset from a single station. The mean predicted ETs for the periods considered in 2018 and 2019 are very similar to the actual means. The predicted mean ET values range from 5.24 to 5.67 mm/day in 2018, and from 5.25 to 8.29 mm/day in 2019.

Analysis of the GP-evolved model indicates a complex interaction of antecedent ET and meteorological parameters. It is difficult to explain the exact nature of the physical process involved, but the complexity of the terms in the model does give some indication of the complexity of the process. It appears that AT also implicitly reflects the effect of other meteorological parameters, including sunshine hours and net radiation. Thus, AT appears to be a suitable replacement for meteorological parameters like mean temperature, humidity and wind speed, as well as a more meaningful parameter for modelling potential ET.

## ACKNOWLEDGEMENT

The authors wish to thank Dr K. Selvarani and Dr S. Vanitha of Kalasalingam Academy of Research and Education for their support in carrying out this work.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.