## Abstract

The estimation of evaporation at the field as well as the regional level is required for the efficient planning and management of water resources. In the present study, artificial neural network (ANN) and multiple linear regression (MLR)-based models were developed to estimate pan evaporation on the basis of one day-lagged rainfall ($P_{t-1}$), one day-lagged relative humidity ($RH_{t-1}$), current day maximum temperature ($T_{max}$) and minimum temperature ($T_{min}$). These were selected as the most effective parameters on the basis of cross-correlation. The performance of the models was evaluated using the correlation coefficient ($r$), root-mean-square error (RMSE) and Nash–Sutcliffe efficiency (coefficient of efficiency, CE) during the calibration and validation periods. Based on the comparison, the ANN model (4-9-1), with sigmoid as the activation function and Levenberg–Marquardt as the learning algorithm, was selected as the best performing model among all ANN models. The values of $r$, CE and RMSE for the training and validation periods were 0.885, 0.785 and 1.00 mm/day and 0.889, 0.782 and 1.01 mm/day, respectively, for the ANN model (4-9-1). The corresponding values for the selected MLR model were 0.835, 0.698 and 1.19 mm/day and 0.866, 0.750 and 1.15 mm/day, respectively. Based on the sensitivity analysis, $RH_{t-1}$ was identified as the most effective parameter, followed by $P_{t-1}$, $T_{max}$ and $T_{min}$. The developed model can be utilized as an alternative for the estimation of evaporation at the regional level with limited input data.

## INTRODUCTION

Evaporation is a major diffusive process of the hydrological cycle in which water changes from the liquid state into the vapor state through the transfer of heat energy. It plays a vital role in water resource planning and development in arid and semi-arid climatic regions (Shirsath & Singh 2010). The management of scarce water resources is important for sustainable crop production (Shirgure & Rajput 2012). In a hot climate, evaporation from rivers, canals and open-water bodies accounts for an important share of water loss. Evaporation losses are significant even in humid areas, but the cumulative mean precipitation over these areas can mask them, so they are not ordinarily recognized except during rainless periods. The estimation of evaporation is required in water balance computations, irrigation management, crop yield forecasting, river flow forecasting and ecosystem modeling in hydrology, agronomy, forestry and land resource planning (Terzi & Keskin 2005).

Efficient irrigation scheduling relies on a systematic approach in which evapotranspiration is estimated from evaporation. Farm irrigation systems, as well as water resource development projects, are also designed using the long-term mean value of evaporation together with the magnitude and variation of evaporation losses (Shiri *et al.* 2011). There is a growing demand for evaporation data for studies of surface water and energy fluxes, especially those addressing the impacts of global warming (Xu & Singh 2005). The accurate estimation of evaporation is fundamental for the effective management of water resources. Therefore, the need for reliable models for quantifying evaporation losses from increasingly scarce water resources is greater than ever before (Tabari *et al.* 2010).

Obtaining appropriate and consistent evaporation measurements over long periods, whether by direct or indirect methods, remains challenging for researchers. Direct measurement uses the United States Weather Bureau (USWB) Class A pan evaporimeter and eddy correlation techniques, whereas indirect methods estimate evaporation from meteorological variables. The most widely used crop monitoring and forecasting models of the Food and Agriculture Organization (FAO) are based on evaporation estimates (Gommes 1998). Yihdego & Webb (2017) evaluated the seasonal differences between measured pan evaporation and estimated evaporation for Lake Burrumbeet, Australia, and showed that evaporation is fully radiation driven and the effect of wind is minimal. Although various empirical formulas are available, their performance is not satisfactory because of the complex nature of the evaporation process and the non-availability of input data. Therefore, the estimation of evaporation from easily available meteorological parameters is a good alternative.

Evaporation is a complex and nonlinear phenomenon, as it depends on several interacting meteorological factors such as temperature, humidity, wind speed and bright sunshine hours (Xu & Singh 1998, 2001). Unfortunately, reliable estimates of evaporation are extremely difficult to obtain because of complex interactions between the components of the soil–plant–atmosphere system. This is partly because a wide range of data types and expertise is required (Goyal *et al.* 2014). Many conventional models, such as empirical, regression-based and conceptual models, have been developed to estimate evaporation from a large quantity of data, but they have produced less accurate results. Models developed from meteorological data, such as multiple regression models, involve empirical relationships that account to some extent for local conditions. Therefore, most of these models may give reliable results when applied to climatic conditions similar to those for which they were developed. Without local or regional calibration, however, their use under greatly different climatic conditions may give results that differ considerably. No single model is universally adequate under all climatic conditions, so it is difficult to select the most appropriate evaporation model for a given region.

Climate-based models (Priestley & Taylor 1972), regression models (Kisi 2009), artificial neural network (ANN) models (Sudheer *et al.* 2002) and adaptive neural-based fuzzy inference systems (Guven & Kisi 2011) have been used for estimating daily evaporation. Among these, the recently developed soft computing techniques such as ANN, fuzzy logic and decision-tree algorithms (ASCE 2000), with different network structures and combinations of meteorological variables, have been applied effectively by many researchers for modeling evaporation. ANN models have been found to be useful and efficient, particularly in problems for which the characteristics of the processes are difficult to describe using physical equations (Zhang *et al.* 1998). Neural network applications have spread rapidly because their functional characteristics provide many advantages over traditional analytical approaches (ASCE 2000). An ANN model requires significantly less input data than a similar conventional mathematical model, since variables remain fixed from one simulation to another. An ANN can quickly present sensitive responses to tiny input changes in a dynamic environment (Moghaddamnia *et al.* 2009). Seifi & Riahi (2018) estimated reference evapotranspiration using the least square support vector machine gamma test (LSSVM-GT) model with radial basis function, linear and polynomial kernels, ANN-GT, the adaptive neuro-fuzzy inference system (ANFIS-GT) and empirical equations, and found that the LSSVM-GT approach is highly capable of providing favorable predictions with high precision in arid regions.

The present study addresses the following objectives: (a) development of the relationship between the independent and dependent variables using the correlation coefficient and cross-correlation techniques to select the best performing input variables, (b) development of ANN and multiple linear regression (MLR)-based evaporation estimation models, (c) qualitative and quantitative performance evaluation of the developed models and (d) sensitivity analysis to determine the most effective parameter.

## MATERIALS AND METHODS

### Study area

The present study was carried out in Roorkee, which has an area of 811 ha and lies in the Haridwar District of Uttarakhand, India, as shown in Figure 1. The Upper Ganga Canal divides the city into two distinct parts, which enhances the beauty of the city. The city is located at a latitude of 29°51′N and a longitude of 77°53′E, at an elevation of 274 m above mean sea level. It receives an average annual rainfall of 1,032 mm. The average temperatures in January and June are 13.8 and 32.2 °C, respectively, giving an annual temperature difference of 18.4 °C. The highest recorded temperature was 45.5 °C (on 9 May 1956) and the lowest was 3.3 °C (on 26 January 1964). The temperature ranges from 5 to 20 °C in winter (December to March), 25 to 40 °C in summer (April to June) and 20 to 40 °C in the rainy season (July to September). The humidity ranges from 30% to 100%. The soil is alluvial soil of the Ganga plain, derived from the soft dolomitic rocks of the Himalayas. Owing to its location away from any major water body and its proximity to the Himalayas, Roorkee has an extreme and erratic continental climate, with an average annual wind speed of 4.9 m/s (http://nihroorkee.gov.in/location.html; accessed 27 September 2018). Daily rainfall ($P$), minimum temperature ($T_{min}$), maximum temperature ($T_{max}$) and relative humidity (RH) data from January 2001 to December 2013 were collected from NIH (2018), Roorkee Observatory.

### Artificial neural network

An ANN is an information processing structure that consists of several interconnected processing elements called nodes, which are analogous to the neurons of the human brain. Each node combines some inputs and produces an output, which is then transmitted to many different locations, including other nodes (Affandi *et al.* 2008). Synapses are the structures where weight values are stored. An ANN model can compute complex nonlinear problems that may be too difficult to represent by conventional mathematical equations (McCulloch & Pitts 1943). An ANN model with the feedforward neural network (FFNN) architecture was used in the present study to estimate the daily pan evaporation.

#### Feedforward neural network

$x_i$ ($i = 1, 2, 3, \ldots, n$) = net input, $w_i$ ($i = 1, 2, 3, \ldots, n$) = respective weights.
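As a hedged illustration of how a single node combines its inputs, the sketch below forms the weighted sum of inputs $x_i$ with weights $w_i$ and passes it through a log-sigmoid function. The bias term, the function names and the sample numbers are assumptions made for illustration; they are not values from the study.

```python
import math


def log_sigmoid(z):
    """Logistic (log-sigmoid) activation, mapping a real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(x, w, bias=0.0):
    """Weighted sum of the inputs x_i with weights w_i, then log-sigmoid.

    The bias term is an assumption; the text only names x_i and w_i.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    net = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return log_sigmoid(net)


# Hypothetical normalized inputs and weights (not taken from the study).
x = [0.2, 0.7, 0.5, 0.4]
w = [0.1, -0.3, 0.8, 0.05]
print(node_output(x, w))
```

In a feedforward network, the outputs of one layer of such nodes become the inputs of the next layer, with no connections running backwards.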

#### Network architecture

### Multiple linear regressions

Multiple linear regression (MLR) models the relationship between a dependent variable $y$ and two or more independent variables by fitting a linear equation to observed data. A value of the dependent variable $y$ is associated with each set of values of the independent variables. The regression equation of $y$ can be written as follows:

$$y = m_1 x_1 + m_2 x_2 + \cdots + m_n x_n + C$$

where $y$ is the response variable or the dependent variable, $x_1, x_2, \ldots, x_n$ are the independent variables, $m_1, m_2, \ldots, m_n$ are the regression coefficients and $C$ is a constant.

When there is a linear relationship between the dependent variable and the independent variables, MLR is the standard method for estimating the response of a dependent variable to several independent variables.
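A minimal sketch of how the coefficients of such a regression equation might be obtained by ordinary least squares is given below. It solves the normal equations with plain Gaussian elimination; the function names and the synthetic data are assumptions for illustration, and the study does not state which software or solver was used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_mlr(rows, y):
    """Least-squares fit of y = m_1 x_1 + ... + m_n x_n + C.

    Returns ([m_1, ..., m_n], C) from the normal equations X'X b = X'y.
    """
    design = [list(r) + [1.0] for r in rows]  # last column carries C
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return beta[:-1], beta[-1]


# Synthetic check: data generated from y = 2*x1 - x2 + 3 (illustrative only).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 * a - b + 3 for a, b in rows]
coeffs, const = fit_mlr(rows, y)
print(coeffs, const)
```

Because the synthetic data are noise-free, the fit recovers the generating coefficients up to rounding error; with real meteorological data the residuals would be non-zero.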

### Data formulation

The daily observed data of relative humidity (RH), rainfall ($P$), minimum temperature ($T_{min}$) and maximum temperature ($T_{max}$) for the current day and previous days from January 2001 to December 2013 (4,787 daily records) were explored to estimate the current day pan evaporation. Of the 4,787 records, records 1 to 3,286 (62.50%) were used for calibration, while records 3,287 to 4,787 (37.50%) were used for validation. This split was selected by trial and error.
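The chronological split described above can be sketched as follows. The function name is hypothetical and the integer indices merely stand in for the daily records; only the record numbers 3,286 and 4,787 come from the text.

```python
def chronological_split(records, n_calibration):
    """Split an ordered sequence into calibration and validation blocks.

    The first n_calibration records go to calibration and the rest to
    validation, with no shuffling, so the time order is preserved.
    """
    if not 0 < n_calibration < len(records):
        raise ValueError("n_calibration must lie strictly inside the record")
    return records[:n_calibration], records[n_calibration:]


# Placeholder indices 1..4,787 stand in for the daily records.
records = list(range(1, 4788))
calibration, validation = chronological_split(records, 3286)
print(len(calibration), len(validation))
```

Keeping the blocks contiguous means the validation period lies entirely after the calibration period, which suits a time series in which consecutive days are correlated.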

### Normalization of input data

$R_i$ is the real value applied to neuron $i$, $N_i$ is the subsequent normalized value calculated for neuron $i$, $\mathrm{Min}_i$ is the minimum value of all values applied to neuron $i$ and $\mathrm{Max}_i$ is the maximum value of all values applied to neuron $i$.
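The sketch below shows one common way to normalize an input series from its minimum and maximum. It assumes the usual min-max form $N_i = (R_i - \mathrm{Min}_i)/(\mathrm{Max}_i - \mathrm{Min}_i)$, which maps values to the interval [0, 1]; the exact scaling used in the study is not reproduced in this text, and the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using N = (R - Min) / (Max - Min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(normalized, lo, hi):
    """Invert the scaling to recover values in the original units."""
    return [n * (hi - lo) + lo for n in normalized]


# Hypothetical daily maximum temperatures in deg C (not from the study).
t_max = [18.5, 25.0, 31.2, 40.1, 22.3]
scaled = min_max_normalize(t_max)
print(scaled)
```

The inverse function is included because network outputs produced on the normalized scale have to be converted back to mm/day before they are compared with observations.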

### Performance evaluation of the developed ANN model and the MLR model

The qualitative and quantitative evaluation of the developed models was performed to judge the goodness of fit between observed and estimated values. The qualitative evaluation is based on the visual comparison of graphs of the observed and estimated evaporation values. The quantitative performance of the models during calibration and validation was evaluated using three statistical parameters, namely the correlation coefficient ($r$), root-mean-square error (RMSE) and coefficient of efficiency (CE).

#### Coefficient of efficiency

Chiew *et al.* (1993) classified the CE into three categories: perfectly acceptable simulation (CE >0.90), acceptable simulation (CE between 0.60 and 0.90) and unacceptable simulation (CE <0.60).
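The three categories above can be expressed as a small helper, sketched here. The function name is hypothetical, and assigning the exact boundary values 0.60 and 0.90 to the "acceptable" band is an assumption, since the text does not say which side they fall on.

```python
def classify_ce(ce):
    """Label a CE value using the three bands quoted from Chiew et al. (1993)."""
    if ce > 0.90:
        return "perfectly acceptable"
    if ce >= 0.60:
        return "acceptable"
    return "unacceptable"


# CE values reported in this study for the ANN (4-9-1) and MLR models.
for label, ce in [("ANN calibration", 0.785), ("ANN validation", 0.782),
                  ("MLR calibration", 0.698), ("MLR validation", 0.750)]:
    print(label, classify_ce(ce))
```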

#### Root-mean-square error

#### Correlation coefficient

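The correlation coefficient ($r$) is an indicator of the degree of closeness between observed and predicted values. If the observed and predicted values are completely independent, the correlation coefficient will be zero. It is given by the following equation:

$$r = \frac{\sum_{j=1}^{n} (y_j - \bar{y})(y_{ej} - \bar{y}_e)}{\sqrt{\sum_{j=1}^{n} (y_j - \bar{y})^2 \, \sum_{j=1}^{n} (y_{ej} - \bar{y}_e)^2}}$$

where $y_j$ is the predicted variable, $y_{ej}$ is the observed variable, $\bar{y}$ is the mean of the predicted values, $\bar{y}_e$ is the mean of the observed values and $n$ is the number of observations.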
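The three indices named in this section can be sketched in a few lines. The RMSE and CE functions use their standard textbook definitions (CE as the Nash–Sutcliffe efficiency), which is an assumption because those equations are not reproduced in this text; the function names and the sample values are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient r between observed and predicted."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the data (mm/day here)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nash_sutcliffe(observed, predicted):
    """CE = 1 - (sum of squared errors) / (sum of squares about observed mean)."""
    mo = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mo) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical pan evaporation values in mm/day (not from the study).
observed = [2.0, 3.5, 5.0, 6.5, 4.0]
predicted = [2.4, 3.1, 5.3, 6.0, 4.2]
print(correlation_coefficient(observed, predicted))
print(rmse(observed, predicted))
print(nash_sutcliffe(observed, predicted))
```

A model that always predicts the observed mean scores CE = 0, so a positive CE indicates that the model does better than that baseline, while $r$ alone can stay high even when predictions are systematically biased.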

## RESULTS AND DISCUSSION

### Selection of input vector

The cross-correlation analysis between the dependent and independent variables was carried out to select the significant input vector. To identify the input vector, a detailed cross-correlation analysis of the following variables was done.

- (i) Daily rainfall ($P$) values with daily evaporation values.
- (ii) Daily maximum ($T_{max}$) and minimum temperature ($T_{min}$) values with daily evaporation values.
- (iii) Daily relative humidity (RH) values with daily evaporation values.

The cross-correlation graph was plotted between the inputs used in the analysis and pan evaporation. It was observed that the daily rainfall ($P$) and RH values are negatively correlated with the evaporation values (Figures 3(a), 3(b), 4(a) and 4(b)).

The correlation analysis was also carried out to find the exact lag between the dependent and independent variables. The correlation coefficients were estimated between daily pan evaporation and the input parameters, namely daily rainfall ($P$), daily relative humidity (RH), daily maximum temperature ($T_{max}$) and daily minimum temperature ($T_{min}$). The maximum positive correlation coefficient between $T_{max}$ and evaporation was 0.72, whereas that between $T_{min}$ and evaporation was 0.53. Negative correlation coefficients of −0.030 and −0.610 were found between evaporation and rainfall ($P$) and relative humidity (RH), respectively, as shown in Table 1. A negative correlation means that evaporation decreases with an increase in rainfall and RH.

| S. no. | Data | Correlation coefficient with evaporation |
|---|---|---|
| 1 | Rainfall (mm) | −0.03 |
| 2 | Relative humidity (%) | −0.61 |
| 3 | Maximum temperature (°C) | 0.72 |
| 4 | Minimum temperature (°C) | 0.53 |
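A lagged correlation of the kind used to choose the input lags can be sketched as below. The function names and the short synthetic series are assumptions for illustration only; the series is constructed so that evaporation depends on the previous day's humidity.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlation(driver, target, lag):
    """Correlate target(t) with driver(t - lag) for a non-negative lag in days."""
    if lag < 0 or lag >= len(target):
        raise ValueError("lag must satisfy 0 <= lag < series length")
    if lag == 0:
        return pearson_r(driver, target)
    return pearson_r(driver[:-lag], target[lag:])


# Hypothetical series: evaporation responds to humidity one day later.
rh = [80.0, 60.0, 75.0, 50.0, 90.0, 65.0, 70.0, 55.0]
evap = [4.0] + [10.0 - 0.1 * h for h in rh[:-1]]
for lag in range(3):
    print(lag, round(lagged_correlation(rh, evap, lag), 3))
```

Scanning several lags and keeping the one with the largest absolute correlation is one simple way to decide whether the current-day or previous-day value of a variable should enter the input vector.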


The selected input vector comprised one day-lagged rainfall ($P_{t-1}$), one day-lagged relative humidity ($RH_{t-1}$), current day maximum temperature ($T_{max}$) and current day minimum temperature ($T_{min}$), as shown in the following equation. There was one node in the output layer, which estimates evaporation $E(t)$.

$$E(t) = f\left(P_{t-1},\ RH_{t-1},\ T_{max},\ T_{min}\right)$$
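Assembling this input vector from daily series can be sketched as follows. The function name and the three-day sample record are hypothetical; the pairing of previous-day rainfall and RH with current-day temperatures follows the description in the text.

```python
def build_samples(rain, rh, t_max, t_min, evap):
    """Pair each E(t) with [P(t-1), RH(t-1), Tmax(t), Tmin(t)].

    The first day is dropped because it has no previous-day rainfall or RH.
    """
    n = len(evap)
    if not all(len(s) == n for s in (rain, rh, t_max, t_min)):
        raise ValueError("all series must have the same length")
    inputs, targets = [], []
    for t in range(1, n):
        inputs.append([rain[t - 1], rh[t - 1], t_max[t], t_min[t]])
        targets.append(evap[t])
    return inputs, targets


# Hypothetical three-day record (values are illustrative only).
rain = [0.0, 12.5, 0.0]
rh = [55.0, 90.0, 70.0]
t_max = [34.0, 29.0, 32.0]
t_min = [21.0, 20.0, 19.5]
evap = [6.1, 3.2, 4.8]
inputs, targets = build_samples(rain, rh, t_max, t_min, evap)
print(inputs)
print(targets)
```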

### Development of ANN- and MLR-based models

One day-lagged rainfall ($P_{t-1}$), one day-lagged RH ($RH_{t-1}$), daily $T_{max}$ and daily $T_{min}$ were taken as the independent variables and pan evaporation was taken as the dependent variable for the MLR model. The regression equation for the present study takes the following form:

$$E(t) = m_1 P_{t-1} + m_2 T_{max} + m_3 T_{min} + m_4 RH_{t-1} + C$$

The regression coefficients $m_1$, $m_2$, $m_3$ and $m_4$ were found from the regression analysis between the dependent variable and the independent variables. The coefficient between evaporation and daily rainfall was $m_1 = -0.010$; the coefficients between daily evaporation and daily maximum and minimum temperature were $m_2 = 0.199$ and $m_3 = 0.008$, respectively; and the coefficient between evaporation and daily RH was $m_4 = -0.044$. From the regression analysis, it was observed that rainfall and RH were negatively related to evaporation. With these coefficients, the multiple linear equation becomes:

$$E(t) = -0.010\,P_{t-1} + 0.199\,T_{max} + 0.008\,T_{min} - 0.044\,RH_{t-1} + C$$

For the ANN model, the number of neurons in the hidden layer was explored from 1 to 10 by a trial-and-error procedure. The transfer functions of the hidden and output layers were log-sigmoid and pure linear, respectively, in the training of the ANN model.

### Performance evaluation of ANN and MLR models

On the basis of the performance of the ANN models during the calibration and validation periods, the EVAP9 model with the structure ANN (4-9-1), with sigmoid as the activation function and Levenberg–Marquardt as the learning algorithm, was selected as the best performing model among all the ANN models (Table 2). Its values of $r$, CE and RMSE were 0.885, 0.785 and 1.001 mm/day for the training period and 0.889, 0.782 and 1.005 mm/day for the validation period. The corresponding values for the MLR model were 0.835, 0.698 and 1.187 mm/day for training and 0.866, 0.750 and 1.152 mm/day for validation. The ANN structure 4-7-1, with seven neurons in the hidden layer, also gave good results, and the differences between these two structures were negligible. When the model was iterated further with more than 10 neurons in the hidden layer, its performance fluctuated (decreasing and then increasing), which might have led to overfitting. The estimates of the best ANN structure (4-9-1), with nine neurons in the hidden layer, and of the MLR model for daily pan evaporation at Roorkee during calibration and validation were plotted against the observed daily pan evaporation, as depicted in Figures 5–10. The scatter plots and graphs of the best ANN model, the MLR model and the observed evaporation during calibration and validation demonstrate the potential of the developed ANN model over the MLR model in the estimation of daily pan evaporation. The results of the calibration and validation of the best ANN and MLR models in terms of the various statistical indices are presented in Table 3.

| Model no. | ANN structure | Calibration $r$ | Calibration CE | Calibration RMSE (mm/day) | Validation $r$ | Validation CE | Validation RMSE (mm/day) |
|---|---|---|---|---|---|---|---|
| EVAP1 | 4-1-1 | 0.875 | 0.766 | 1.05 | 0.885 | 0.774 | 1.02 |
| EVAP2 | 4-2-1 | 0.879 | 0.773 | 1.03 | 0.888 | 0.782 | 1.01 |
| EVAP3 | 4-3-1 | 0.880 | 0.776 | 1.02 | 0.889 | 0.781 | 1.01 |
| EVAP4 | 4-4-1 | 0.882 | 0.780 | 1.02 | 0.889 | 0.783 | 1.00 |
| EVAP5 | 4-5-1 | 0.882 | 0.779 | 1.02 | 0.889 | 0.783 | 1.00 |
| EVAP6 | 4-6-1 | 0.883 | 0.781 | 1.01 | 0.889 | 0.782 | 1.01 |
| EVAP7 | 4-7-1 | 0.884 | 0.783 | 1.01 | 0.889 | 0.782 | 1.01 |
| EVAP8 | 4-8-1 | 0.883 | 0.782 | 1.01 | 0.890 | 0.784 | 1.00 |
| EVAP9 | 4-9-1 | 0.885 | 0.785 | 1.00 | 0.889 | 0.782 | 1.01 |
| EVAP10 | 4-10-1 | 0.882 | 0.780 | 1.01 | 0.882 | 0.786 | 0.99 |


| Model | Calibration $r$ | Calibration RMSE (mm/day) | Calibration CE | Validation $r$ | Validation RMSE (mm/day) | Validation CE |
|---|---|---|---|---|---|---|
| ANNEVAP4 (4-9-1) | 0.885 | 1.00 | 0.785 | 0.889 | 1.01 | 0.782 |
| MLR EVAP | 0.835 | 1.19 | 0.698 | 0.856 | 1.15 | 0.750 |


### Sensitivity analysis

To check the effect of each input parameter on pan evaporation, a sensitivity analysis was performed using the selected ANN (4-9-1) model. The effectiveness of each parameter was judged by omitting the parameters one at a time from the selected four-input model, as shown in Table 4. These reduced models were run in the ANN with the sigmoid activation function, the Levenberg–Marquardt learning algorithm and nine neurons in the hidden layer. The model without $RH_{t-1}$ had the lowest values of $r$ (0.78) and CE (0.723) during training. Similar values of $r$ and CE were observed during the testing period. A higher RMSE was observed for the model with one day-lagged rainfall ($P_{t-1}$), current day maximum temperature ($T_{max}$) and current day minimum temperature ($T_{min}$), that is, the model omitting $RH_{t-1}$, than for the model with all variables.

| Input variables | Calibration $r$ | Calibration RMSE (mm/day) | Calibration CE | Validation $r$ | Validation RMSE (mm/day) | Validation CE |
|---|---|---|---|---|---|---|
| $P_{t-1}$, $RH_{t-1}$, $T_{max}$, $T_{min}$ | 0.885 | 1.00 | 0.785 | 0.889 | 1.01 | 0.782 |
| $RH_{t-1}$, $T_{max}$, $T_{min}$ | 0.810 | 1.26 | 0.763 | 0.742 | 1.18 | 0.615 |
| $P_{t-1}$, $T_{max}$, $T_{min}$ | 0.780 | 1.30 | 0.723 | 0.729 | 1.26 | 0.737 |
| $P_{t-1}$, $RH_{t-1}$, $T_{min}$ | 0.854 | 1.12 | 0.743 | 0.786 | 1.28 | 0.715 |
| $P_{t-1}$, $RH_{t-1}$, $T_{max}$ | 0.860 | 1.16 | 0.743 | 0.792 | 1.18 | 0.745 |
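One way to turn the leave-one-out results into a ranking is sketched below, using the calibration-period $r$ values from Table 4. Ranking by the drop in calibration $r$ is an assumed criterion, since the text does not state which index was used to order the variables; the names in the code are hypothetical.

```python
# Calibration-period r from Table 4: full model and each leave-one-out model.
FULL_MODEL_R = 0.885
R_WITHOUT = {
    "P_t-1": 0.810,   # model with RH_t-1, T_max, T_min
    "RH_t-1": 0.780,  # model with P_t-1, T_max, T_min
    "T_max": 0.854,   # model with P_t-1, RH_t-1, T_min
    "T_min": 0.860,   # model with P_t-1, RH_t-1, T_max
}


def rank_by_r_drop(full_r, r_without):
    """Order inputs by how much calibration r falls when each one is omitted."""
    drops = {name: full_r - r for name, r in r_without.items()}
    return sorted(drops, key=drops.get, reverse=True)


ranking = rank_by_r_drop(FULL_MODEL_R, R_WITHOUT)
print(ranking)  # ['RH_t-1', 'P_t-1', 'T_max', 'T_min']
```

Under this criterion the order matches the one reported in the study, with one day-lagged RH first; ranking by another index or by the validation period need not give the same order.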


## CONCLUSIONS

The estimation of evaporation at the regional level is cumbersome with limited meteorological data because of the complex nature of the evaporation process. It can be performed readily using an ANN with the available measured meteorological parameters. Evaporation is highly correlated with current day minimum and maximum temperatures as compared with one day-lagged rainfall and RH. The ANN model with a single hidden layer of nine neurons was the best of the several architectures tested using the log-sigmoid activation function and the Levenberg–Marquardt learning algorithm. The selected ANN model showed greater capability for the estimation of evaporation than the MLR-based model, which is attributed to its higher optimizing capacity to reach the desired result with fewer input parameters under validation. The model efficiency and correlation coefficient of the ANN model were higher than those of the MLR model during calibration and validation, whereas the RMSE was higher for the MLR model. One day-lagged RH is the most sensitive variable, followed by one day-lagged rainfall, current day maximum temperature and current day minimum temperature. The results of the study are of great importance for the planning and management of irrigation projects.