Abstract
A study was carried out to develop and evaluate the performance of different machine learning (ML) models for predicting reference evapotranspiration (ET0). The models included multiple linear regression (MLR), least square-support vector machine (LS-SVM), artificial neural networks (ANNs) and adaptive neuro-fuzzy inference system (ANFIS). The daily meteorological data for 50 years (1970–2019) were used to estimate ET0 using FAO-ET calculator. The FAO-ET calculator was compared with ML models to investigate the best-fit ML model for predicting ET. Thereafter, ET predicted by the best-fit ML model was compared with satellite (Moderate Resolution Imaging Spectroradiometer – MODIS) ET, which was finally mapped to a larger landscape (over entire Punjab and Haryana). Modeling of ET0 was best performed through LS-SVM followed by ANN2, ANN1, ANFIS10, ANFIS2, MLR and ANFIS9 models. Among developed models, coefficient of determination (R2) value varied from 0.800 to 0.998, being highest (0.998) under LS-SVM model. MODIS overestimated ET when compared with LS-SVM having R2 and root mean square error (RMSE) values of 0.73 and 3.95 mm, respectively. After applying the bias correction factor, R2 and RMSE were 0.74 and 1.19 mm, respectively. The ML and satellite-based ET estimation would be useful for timely water budgeting to manage the water scarcity problems from local to regional levels.
HIGHLIGHTS
The study region is experiencing an acute decline in groundwater resources, so it becomes imperative to manage water resources based on evapotranspiration.
The traditional methods are not time-efficient, but machine learning and remote sensing techniques may solve this problem.
The applications of machine learning models for estimating ET and use of satellite-derived ET are limited in the present study region.
INTRODUCTION
The evapotranspiration (ET), which is one of the major components of the hydrologic cycle (Eslamian et al. 2012; Callañaupa Gutierrez et al. 2021; Duhan et al. 2021; Singh et al. 2022a, 2022b), is a nonlinear and complex phenomenon, affected by numerous factors involving micrometeorological variables, soil properties, crop physiognomies and field management practices (Allen et al. 1998; Zotarelli et al. 2010). Reference evapotranspiration (ET0) is the most widely used variable to work out the crop water requirement, irrigation scheduling, hydrological water balance, simulating crop yields and designing of irrigation systems (Zhang et al. 2015; Satpute et al. 2021). ET0 is the amount of water lost from a continuous stretch of vegetation for given climatic conditions when there is no lack of water (Gangopadhyay et al. 1966). So, its precise estimation becomes highly significant to manage the available irrigation water, particularly in regions of water scarcity.
ET0 can be calculated using microweather methods based on energy balance and vapor/mass flux transfer approaches or indirectly using empirical techniques (Penman 1948; Thornthwaite 1948; Blaney & Criddle 1950; Monteith 1965; Priestley & Taylor 1972; Hargreaves & Samani 1982). However, direct methods are time and cost effective when compared with indirect approaches, which require only site-specific micrometeorological data. The FAO 56 PM method (Allen et al. 1998) has been validated against lysimeter data under different climatic conditions of the world and found superior to other empirical models for ET0 estimation. According to Suleiman & Hoogenboom (2007), the PM method can enhance irrigation efficiency under humid climate of Georgia, in the coastal and mountainous areas. George & Raghuwanshi (2012) identified FAO24 RD method as closely related tothe PM method for the humid site. While evaluating the performance of 18 ET0 models against the PM method for the humid subtropical ecosystem of North-East India, Pandey et al. (2016) reported IRMAK3, TURC and FAO24-Blaney-Criddle methods as alternatives for PM method. Duhan et al. (2021) concluded that PM-ET0 is closely related to pan evaporation (ETpan), and it is a good indicator of ETpan in a semi-arid region of central India. Vishwakarma et al. (2022) reported that the PM method outperformed against ETpan among 30 models in humid and subtropical regions of India.
The drawback of PM method is the large number of input variables for ET estimation (Feng et al. 2017; Fan et al. 2018).
The recently developed machine learning (ML) models are more accurate when compared with empirical models under different environmental conditions (Reis et al. 2019; Gonzalez del Cerro et al. 2020), as a nonlinear and complex phenomenon of ET0 can be easily simulated using ML models. ML is a subsection of artificial intelligence (AI) that uses prior experience (training of data) to learn and apply it to execute new tasks. As AI maps nonlinear relations among input and output variables, it improves the prediction accuracy (Fan et al. 2018; Chia et al. 2020a). The applications of ML models for forecasting ET have strongly increased in the recent past (Feng et al. 2017; Chia et al. 2020b, 2021; Duhan et al. 2021). Different ML algorithms being used for predicting ET0 globally include adaptive neuro-fuzzy neural network (Keshtegar et al. 2018), least square-support vector machine: LS-SVM (Reis et al. 2019), fuzzy logic (Malik & Kumar 2015), multiple-layer perception neural network (Seifi & Riahi 2020), relevance vector machine (Bachour et al. 2016), multivariate regression spline (Mehdizadeh et al. 2017) and many more. The high precision, robustness and ability to handle big data motivate the use of artificial neural network (ANN) models (Govindaraju 2000). Likewise, SVM is a powerful method for classification and regression, which tries to decrease the least square errors (Duhan et al. 2021). SVM models include LS-SVM (Tabari et al. 2012), least square-support vector regression (LS-SVR) and SVM based on kernel-like polynomial (SVM-polynomial) and radial basis (SVM-radial).
According to Reis et al. (2019), ANN gives better estimates of ET0 when compared with the empirical methods such as the original Hargreaves-Samani method in the semi-arid area of Brazil using temperature data only. The LS-SVM model is more efficient in comparison to ANN and Adaptive Neuro-Fuzzy Inference System (ANFIS) models for predicting daily ET0 in arid conditions (Zahedan station) of Iran (Seifi & Riahi 2020). According to Dou & Yang (2018), ANN is better than ANFIS and extreme learning machine models for predicting ET0 in cropland and grassland ecosystems. Nema et al. (2017) explored the ability of ANN to predict the monthly ET in subhumid climatic region of Dehradun, India. The fuzzy genetic followed by LS-SVR model is the best model for ETpan estimation in Dongting Lake Basin of China (Wang et al. 2017). Granata (2019) reported the M5P regression tree as the best model followed by SVR and Random Forest in a subtropical humid climate of Florida, USA.
In recent years, ET estimation has improved using remote sensing techniques, mainly in agricultural and hydrological fields. The researchers have compared satellite-based global moderate resolution imaging spectroradiometer (MODIS) ET (MOD16, Mu et al. 2011) product with lysimeter-based ground ET (Knipper et al. 2017; Duhan et al. 2021; Rodrigues & Braga 2021). The estimation error is reduced after applying bias correction in NASA POWER data for estimating ET0, and NASA POWER reanalysis products are a suitable choice to estimate ET0 in an area where most of the meteorological parameters are not available (Rodrigues & Braga 2021). However, the satellite-based ET estimation, their use and application with the ground-based observations of ET are restricted in the Indian context. Furthermore, the development and application of ML approaches/models for ET estimation are limited and the information on the topic is still incomplete, particularly in the study region. The application of ML and satellite-based models may be useful to estimate the ET0 at a farm and regional scale which may be used for the development of irrigation plans and managing crop water requirements.
To the best of our knowledge, no attempt has been made to test the ability of ML models for the prediction of ET0 and its comparison with satellite-based ET in the semi-arid subtropical climate of Indian Punjab. Keeping the above information in view, the present study was undertaken with the following objectives: (i) to calculate ET0 using the FAO-ET calculator for the semi-arid subtropical climate of Indian Punjab, (ii) to develop ANFIS, ANN, LS-SVM and MLR models for predicting ET0, (iii) to identify the best-fit ML model from among LS-SVM, ANN, ANFIS and MLR for predicting ET0 in the study area, (iv) to understand the similarity between satellite-based ET (MODIS satellite) and ET estimated by best-fit ML model and (v) finally, to map the ET at a larger landscape (over entire Punjab and Haryana) from satellite using correction bias factors.
STUDY AREA
MATERIALS AND METHODS
Details of data
Meteorological data
The daily micrometeorological data of past 50 years (1970–2019) for maximum air temperature (Tmax, °C), minimum air temperature (Tmin, °C), maximum relative humidity (RHmax, %), minimum relative humidity (RHmin, %), wind speed (m/s) and a number of sunshine hours (h/day) were obtained from the meteorological observatory of PAU, Ludhiana. The monthly values were obtained by averaging the daily values of each calendar month.
Satellite data and processing
The satellite data of MODIS product of 500-m resolution (MODIS-MOD16A2 L4) was downloaded from the earth data website (https://earthdata.nasa.gov/) on an 8-day interval basis from January 2001 to December 2019, which was processed (i.e. ET/8) to get daily average values of each month for the comparison with meteorological data-based ET. The spatial variation of ET from satellite data were seen on an annual average basis before and after corrections using average monthly ET data.
Methodology
The description of PM method (Allen et al. 1998) may be found in previous research works (Jhajharia et al. 2012; Darshana et al. 2013; Vishwakarma et al. 2022). The performance of developed models was carried out using root mean square error (RMSE), R2, NASH and normalized mean square error (NMSE), which is explained in Table 1. The MATLAB language was used for the development of the models in this study.
Performance indicators used in the present study
Sr. No. . | Performance indicator . | Equation . | Range . |
---|---|---|---|
1 | Root mean square error (RMSE) | ![]() | 0 to ∞ |
2 | Normalized root mean square error (NRMSE) | ![]() | 0 to ∞ |
3 | Nash–Sutcliffe model efficiency (NASH) | ![]() | 0 to + ∞ |
4. | Coefficient of determination (R2) | ![]() | 0 and 1 |
Sr. No. . | Performance indicator . | Equation . | Range . |
---|---|---|---|
1 | Root mean square error (RMSE) | ![]() | 0 to ∞ |
2 | Normalized root mean square error (NRMSE) | ![]() | 0 to ∞ |
3 | Nash–Sutcliffe model efficiency (NASH) | ![]() | 0 to + ∞ |
4. | Coefficient of determination (R2) | ![]() | 0 and 1 |
Note: ET0,obs = observed ET0, ET0,cal = modeled ET0, N = number of observations, = mean of calculated ET0,
= mean of observed ET0 and
= mean of modeled ET0.
Brief description of the ML models used
The brief description of models (i.e. ANN, LS-SVM, ANFIS and MLR) used are given in the following subsections.
Artificial neural network
ANN is a classical supervised learning method (Maier & Dandy 2000) that simulates the capability of the human brain to detect patterns using trial and error procedures. The architecture of ANN consists of three layers, namely the input layer in which data are imported into the network, the hidden layer(s) in which data are processed and the output layer in which the results are generated (Duhan & Pandey 2015; Patil & Deka 2016), and ANN network can be trained by altering the weight values between elements. To produce the results, the nonlinear functions between hidden and output layers were used. The input data are processed using nonlinear functions at hidden and output layers to produce the results. The most popularly used neural networks, i.e. feedforward and log-sigmoid (a nonlinear function) are used in the present study. For the training of ANN, there are many algorithms available such as Levenberg-Marquardt (LM) back propagation (BR), Bayesian regulation BR and one-step secant BR algorithms. From these training algorithms, LM and BR were found superior in prior studies (Deo & Sahin 2015; Duhan & Pandey 2015; Kisi et al. 2015). Therefore, in the present study, ANN model was trained using BR and LM algorithms. The number of neurons was identified using the trial and error procedure in the hidden layer. The ‘logsig’ for the hidden layer and ‘purelin’ functions for the output layer was used as an activation function in the present study. The details of the ANN model may be found in the research work of Duhan & Pandey (2015).
Adaptive neuro-fuzzy inference system
ANFIS model (Jang 1993) is a kind of ANN model based on an unsupervised fuzzy inference system (FIS), develops a nonlinear relationship between inputs and outputs and reduces the randomness and nontransparency error which is the main problem in the ANN method. The premise parameters are identified using gradient descent and linear consequent parameter using the least square method in the present study. The learning algorithm tunes all the consequence parameters and membership function (MF) parameters. The selection of FIS among the grid partitioning, subtractive clustering and fuzzy c-means clustering is a critical step for the development of the ANFIS model with a given dataset. Detail information about ANFIS may be found in Jang (1993). In this study, FIS was generated through grid partition and subtracted clustering methods to assess their impacts on the capability of ANFIS model for the calculation of monthly ET. To generate FIS in a grid partitioning algorithm, Gaussian and trapezoidal functions with constant and linear MF type output were adopted and trial and error procedures were adopted to determine its numbers.
The subtracted clustering method of FIS generation was established using accept ratio = 0.5, a reject ratio = 0.15, squash-factor = 1.25 and the range of influence = 0.5 for the development of the models. The developed models using both the FIS were trained using hybrid and BR training algorithms with 1,000 iterations and which provides minimum errors was selected as the best algorithm. Then, the ANFIS models were tested using performance statistics. The architectures of the developed ANFIS models are shown in Table 2.
Description of ANFIS models using different FIS and MF types
Models’ name . | Fuzzy Inference System (FIS) . | Membership function . | Output MF type . | Algorithm used . |
---|---|---|---|---|
ANFIS1 | Grid partition | Gaussian | Linear | Hybrid |
ANFIS2 | Linear | Back propagation | ||
ANFIS3 | Constant | Hybrid | ||
ANFIS4 | Constant | Back propagation | ||
ANFIS5 | Trapezoidal | Constant | Hybrid | |
ANFIS6 | Constant | Back propagation | ||
ANFIS7 | Linear | Hybrid | ||
ANFIS8 | Linear | Back propagation | ||
ANFIS9 | Subtracted Clustering | – | – | Back propagation |
ANFIS10 | – | – | Hybrid |
Models’ name . | Fuzzy Inference System (FIS) . | Membership function . | Output MF type . | Algorithm used . |
---|---|---|---|---|
ANFIS1 | Grid partition | Gaussian | Linear | Hybrid |
ANFIS2 | Linear | Back propagation | ||
ANFIS3 | Constant | Hybrid | ||
ANFIS4 | Constant | Back propagation | ||
ANFIS5 | Trapezoidal | Constant | Hybrid | |
ANFIS6 | Constant | Back propagation | ||
ANFIS7 | Linear | Hybrid | ||
ANFIS8 | Linear | Back propagation | ||
ANFIS9 | Subtracted Clustering | – | – | Back propagation |
ANFIS10 | – | – | Hybrid |
Least square-support vector machine
The LS-SVM (Suykens & Vandewalle 1999) is a least square version of support vector machines (SVMs) and is used for classification and regression analysis based on the supervised learning method (Duhan & Pandey 2015). It is a kernel-based learning method including linear, polynomial, sigmoid and radial basis function (RBF). For the present study, the RBF kernel is selected because it represents the hydro-meteorological processes effectively from another kernel and was identified as the best kernel among the above in the past hydro-meteorological studies (Debnath & Takahashi 2004; Cheng-Ping et al. 2011; Duhan & Pandey 2015; Seifi & Riahi 2020). The ‘gam’ and the squared kernel are the main parameters required for the development of the LS-SVM model which is acquired using RBF kernel type and grid search optimization algorithm. The grid search method is used for the determination of RBF kernel width (sig2) and parameter game which is required for the development of the model using RBF kernel. The optimal value for sig2 and gam are 19.56 and 3,425.78, respectively. Based on size of the data, 10-fold cross-validation (when the data is less than or equal to 300) and leave-one-out cross-validation (when the data is greater than 300) were used for tuning the parameters (De Brabanter et al. 2011). For cross-validation, the mean square error (MSE) is applied as a loss function. More details about the model may be found in the research work of Duhan & Pandey (2015).
Bias correction techniques used in the present study
The aim of the bias correction method is to use the probability of exceedance of monthly satellite ET values and match this probability with the observed LS-SVM ET value. The following steps were followed for bias corrections:
Step 1. Compute empirical probability of exceedance curve Pobs(T) from LS-SVM ET values for a particular month.
Step 2. Compute empirical probability of exceedance curve Psatellite(t) from satellite ET for the same period and same month as from LS-SVM ET values for a particular month from Step 1.
Step 3. Apply bias correction approach, i = 1, number of observations:
- (a)
Select a given value of Tsatellite.
- (b)
For this particular value of Tsatellite, compute its probability of exceedance from the curve derived in Step 2.
Using the probability obtained in Step 3(b), go to the probability of exceedance curve Pobs(T) and assume Psatellite(t) = Pobs(T)
- (c)
Using the probability Pobs(T), compute Tobs.
- (d)
Do this for all months.
Step 4. Repeat steps 2–3.
Model development
The monthly meteorological data of Tmax, Tmin, RHmax, RHmin, wind speed (WS) and n were used for the calibration and validation of different models as an input parameter and estimated PM-ET as a target parameter for the development of all models. The data from January 1970 to December 2004 (around 70%) are used for training the model and remaining data from January 2005 to December 2019 are used for testing the model for all the techniques used in this study.
Performance evaluation of developed models
The performance and accuracy of the developed models were evaluated using statistical indicators like coefficient of determination (R2, Nagelkerke 1991), Nash–Sutcliffe model efficiency (NASH, Nash & Sutcliffe 1970), RMSE (Huffman 1997) and NMSE (Poli & Cirillo 1993). The model which gives the highest value of R2 and NASH, and lowest values of RMSE and NMSE was considered as the best model. The mathematical equations of the statistical indicators are shown in Table 1.
Comparison of ET obtained from best-fit ML model with satellite ET
The monthly average MODIS ET was then compared with the ET obtained from best-fit identified ML model. A scatter plot and performance indices were drawn/estimated to understand the similarity and accuracy between models. After that, a bias correction technique adopted by Duhan et al. (2021) was applied to reduce the error in MODIS ET. Using the developed bias correction factor (Δ), the ET was predicted at a larger landscape (over entire Punjab and Haryana) for the year 2021.
RESULTS AND DISCUSSION
Preliminary analysis
Location of stations and characteristics of daily meteorological variables during 1970–2019
Station name . | Lat. (N) . | Long. (E) . | Alt., amsl (m) . | Tmax (°C) . | Tmin (°C) . | RHmax (%) . | RHmin (%) . | Wind speed (m/s) . | No. of sunshine (h/day) . | ET0 (mm/day) . |
---|---|---|---|---|---|---|---|---|---|---|
PAU Ludhiana | 30°54′ | 75°48′ | 247 | 5.2–47.2 | 1.6–44.4 | 15–100 | 3–99 | 0–4.2 | 0–14 | 0.6–11.5 |
Station name . | Lat. (N) . | Long. (E) . | Alt., amsl (m) . | Tmax (°C) . | Tmin (°C) . | RHmax (%) . | RHmin (%) . | Wind speed (m/s) . | No. of sunshine (h/day) . | ET0 (mm/day) . |
---|---|---|---|---|---|---|---|---|---|---|
PAU Ludhiana | 30°54′ | 75°48′ | 247 | 5.2–47.2 | 1.6–44.4 | 15–100 | 3–99 | 0–4.2 | 0–14 | 0.6–11.5 |
Monthly variation of climatic variables during 1970–2019 in the study area.
Performance evaluation of developed models
Statistical performance of potential evapotranspiration (PET) estimates using different models versus PM-ET on monthly basis during the calibration (1970–2004) and validation (2005–2019) period
Models’ name . | Algorithm used . | Model no. . | R2 . | NASH . | RMSE . | NMSE . | ||||
---|---|---|---|---|---|---|---|---|---|---|
Cali. . | Vali. . | Cali. . | Vali. . | Cali. . | Vali. . | Cali. . | Vali. . | |||
MLR | 0.9817 | 0.9878 | 0.9817 | 0.9808 | 0.255 | 0.3116 | 0.0182 | 0.0290 | ||
SVM | Radial | 0.9962 | 0.9981 | 0.9962 | 0.9981 | 0.1158 | 0.0798 | 0.0038 | 0.0094 | |
ANN | Levenberg-Marquardt | ANN1 | 0.9950 | 0.9936 | 0.995 | 0.9932 | 0.1331 | 0.1503 | 0.005 | 0.0068 |
ANN | Bayesian regularization | ANN2 | 0.9959 | 0.9957 | 0.9959 | 0.9957 | 0.1213 | 0.1202 | 0.0041 | 0.0043 |
ANFISGP, Gaussion, linear | Hybrid | ANFIS1 | 1 | 0.0940 | 1 | 0.5769 | 0.0004 | 3.902 | 0 | 4.5515 |
ANFISGP, Gaussion, linear | Back propagation | ANFIS2 | 0.9857 | 0.9793 | 0.9845 | 0.9749 | 0.2345 | 0.2893 | 0.0154 | 0.025 |
ANFISGP, Gaussion, constant | Hybrid | ANFIS3 | 0.9984 | 0.7950 | 0.9984 | 0.7831 | 0.0755 | 0.8494 | 0.0016 | 0.2157 |
ANFISGP, Gaussion, constant | Back propagation | ANFIS4 | 0.2886 | 0.3042 | 0.5405 | 0.3187 | 2.3404 | 2.0945 | 1.5369 | 1.3113 |
ANFISGP, Trapz, constant | Hybrid | ANFIS5 | 0.9939 | 0.8690 | 0.9939 | 0.8334 | 0.1475 | 0.7445 | 0.0061 | 0.1657 |
ANFISGP, Trapz, constant | Back propagation | ANFIS6 | 0.9494 | 0.9189 | 0.9484 | 0.9117 | 0.4283 | 0.542 | 0.0515 | 0.0878 |
ANFISGP, Trapz, linear | Hybrid | ANFIS7 | 0.9996 | 0.5213 | 0.9996 | 0.4902 | 0.0355 | 0.9456 | 0.0004 | 2.7603 |
ANFISGP, Trapz, linear | Back propagation | ANFIS8 | 0.9408 | 0.9129 | 0.9401 | 0.9039 | 0.4616 | 0.5654 | 0.0598 | 0.0956 |
ANFISSubtracted Clustering | Back propagation | ANFIS9 | 0.9853 | 0.9833 | 0.9629 | 0.9718 | 0.3634 | 0.3062 | 0.037 | 0.0280 |
ANFISSubtracted Clustering | Hybrid | ANFIS10 | 0.9963 | 0.9865 | 0.9963 | 0.9828 | 0.1141 | 0.2394 | 0.0037 | 0.0171 |
Models’ name . | Algorithm used . | Model no. . | R2 . | NASH . | RMSE . | NMSE . | ||||
---|---|---|---|---|---|---|---|---|---|---|
Cali. . | Vali. . | Cali. . | Vali. . | Cali. . | Vali. . | Cali. . | Vali. . | |||
MLR | 0.9817 | 0.9878 | 0.9817 | 0.9808 | 0.255 | 0.3116 | 0.0182 | 0.0290 | ||
SVM | Radial | 0.9962 | 0.9981 | 0.9962 | 0.9981 | 0.1158 | 0.0798 | 0.0038 | 0.0094 | |
ANN | Levenberg-Marquardt | ANN1 | 0.9950 | 0.9936 | 0.995 | 0.9932 | 0.1331 | 0.1503 | 0.005 | 0.0068 |
ANN | Bayesian regularization | ANN2 | 0.9959 | 0.9957 | 0.9959 | 0.9957 | 0.1213 | 0.1202 | 0.0041 | 0.0043 |
ANFISGP, Gaussion, linear | Hybrid | ANFIS1 | 1 | 0.0940 | 1 | 0.5769 | 0.0004 | 3.902 | 0 | 4.5515 |
ANFISGP, Gaussion, linear | Back propagation | ANFIS2 | 0.9857 | 0.9793 | 0.9845 | 0.9749 | 0.2345 | 0.2893 | 0.0154 | 0.025 |
ANFISGP, Gaussion, constant | Hybrid | ANFIS3 | 0.9984 | 0.7950 | 0.9984 | 0.7831 | 0.0755 | 0.8494 | 0.0016 | 0.2157 |
ANFISGP, Gaussion, constant | Back propagation | ANFIS4 | 0.2886 | 0.3042 | 0.5405 | 0.3187 | 2.3404 | 2.0945 | 1.5369 | 1.3113 |
ANFISGP, Trapz, constant | Hybrid | ANFIS5 | 0.9939 | 0.8690 | 0.9939 | 0.8334 | 0.1475 | 0.7445 | 0.0061 | 0.1657 |
ANFISGP, Trapz, constant | Back propagation | ANFIS6 | 0.9494 | 0.9189 | 0.9484 | 0.9117 | 0.4283 | 0.542 | 0.0515 | 0.0878 |
ANFISGP, Trapz, linear | Hybrid | ANFIS7 | 0.9996 | 0.5213 | 0.9996 | 0.4902 | 0.0355 | 0.9456 | 0.0004 | 2.7603 |
ANFISGP, Trapz, linear | Back propagation | ANFIS8 | 0.9408 | 0.9129 | 0.9401 | 0.9039 | 0.4616 | 0.5654 | 0.0598 | 0.0956 |
ANFISSubtracted Clustering | Back propagation | ANFIS9 | 0.9853 | 0.9833 | 0.9629 | 0.9718 | 0.3634 | 0.3062 | 0.037 | 0.0280 |
ANFISSubtracted Clustering | Hybrid | ANFIS10 | 0.9963 | 0.9865 | 0.9963 | 0.9828 | 0.1141 | 0.2394 | 0.0037 | 0.0171 |
Note: Higher values of R2 and NASH and lower values of RMSE and NMSE give the best performing model.
Scatter plots between predicted ET0 using machine learning models and the corresponding PM-ET0 during the calibration period (January 1970–December 2004).
Scatter plots between predicted ET0 using machine learning models and the corresponding PM-ET0 during the calibration period (January 1970–December 2004).
Scatter plots between predicted ET0 using machine learning models and the corresponding PM-ET0 during the validation period (January, 2005–December, 2019).
Scatter plots between predicted ET0 using machine learning models and the corresponding PM-ET0 during the validation period (January, 2005–December, 2019).
The ANN models were developed using BR and LM algorithms and the results showed that both the models performed well during training and testing procedures. But BR algorithm (R2 = 0.9959 and 0.9936; NASH = 0.9959 and 0.9932; RMSE = 0.1213 and 0.1503 and NMSE = 0.0041 and 0.0068) was found little superior than LM algorithm (R2 = 0.9817 and 0.9957; NASH = 0.9817 and 0.9957; RMSE = 0.255 and 0.1202 and NMSE = 0.0182 and 0.0043) during calibration and validation periods.
For the development of ANFIS model, in the grid partition method of FIS (ANFIS1 to ANFIS8), trapezoidal and Gaussian membership functions with constant and linear MF type output were used to generate FIS. It was found that all the models have high efficiency during training, but the model performance was deteriorating in some models (ANFIS1, ANFIS4 and ANFIS7) during testing. By comparing the results of Gaussian FIS of ANFIS model (ANFIS1 to ANFIS4), it was observed that BR algorithm with linear output (ANFIS2) followed by hybrid algorithm with constant output (ANFIS3) gives better results out of the different combination of membership functions outputs (i.e. linear and constant) and algorithms (i.e. BR and hybrid). ANFIS4 model is the worst performer among the above models during both calibration and validation periods (R2 = 0.2886 and 0.3042; NASH = 0.5405 and 0.3187; RMSE = 2.3404 and 2.0945 and NMSE = 1.5369 and 1.3113). It is also worth mentioning here that ANFIS1 is the highest performing model among the selected models during calibration but the performance deteriorated in the validation period. The results of trapezoidal MF of ANFIS model (ANFIS5 to ANFIS8) indicated that all the combination gives a better performance during calibration and validation except the hybrid algorithm with linear output (ANFIS7).
Among developed ANFIS models using different combinations of grid partition method (ANFIS1 to ANFIS8) showed that Gaussian MF with back propagation algorithm with linear output (ANFIS2) is a good choice to predict monthly ET0 among Gaussian and Trapezoidal MFs with different combinations of MFs outputs (i.e. linear and constant) and algorithms (i.e. BR and hybrid). Furthermore, the applied MF may affect the performance of grid partitioning-based ANFIS models. The results of ANFIS models (ANFIS1 to ANFIS10) using grid partition and subtracted clustering method of FIS showed that the models developed using clustering method outperformed grid partition method during training and testing in terms of R2, NMSE, RMSE and NASH values. This finding is similar to the study conducted by Dou & Yang (2018) for the ET estimation in different terrestrial ecosystems.
It is worth mentioning here that some models performed well during calibration but not during validation. Therefore, the best model was selected based on validation results. It is also seen from Table 4 that the best-identified model in the study area is LS-SVM, followed by ANN2, ANN1, ANFIS10, ANFIS2, MLR and ANFIS9 for monthly ET0 prediction in a semi-arid subtropical climate of Indian Punjab.
Furthermore, to check the significance of the LS-SVM model among other models used in the present study, Student's t-test as suggested by Santafe et al. (2015) was performed to see the statistical significance of R2 obtained for various algorithms (Table 4) with respect to LS-SVM. The validation results were found to be statistically significant at the p-value of 0.01 (significance level of 1%). Single-factor analysis of variance test also showed statistical significance with p-value of 0.01 (Table 5).
Analysis of variance (ANOVA) for significance analysis
Source of variation . | Sum of squares . | Degrees of freedom . | Mean square . | F-value . | p-Value . |
---|---|---|---|---|---|
Between groups | 0.232 | 1 | 0.232 | 7.08 | 0.01 |
Within groups | 1.63 | 50 | 0.03 |
Source of variation . | Sum of squares . | Degrees of freedom . | Mean square . | F-value . | p-Value . |
---|---|---|---|---|---|
Between groups | 0.232 | 1 | 0.232 | 7.08 | 0.01 |
Within groups | 1.63 | 50 | 0.03 |
Figures 4 and 5 show the scatter plots of models that performed well during calibration and validation, respectively. It is seen from Figure 4 that all the models overestimated the ET0 in the present study region. The R2 ranged between 0.94 and 0.998 during the training period and the highest R2 (0.998) was given by the ANFIS3 model. The scatter plots of the ANFIS10 (R2 = 0.987), LS-SVM (R2 = 0.996), ANN2 (R2 = 0.996), ANN1 (R2 = 0.995) and ANFIS5 (R2 = 0.994) models represent almost the equal values of R2 (Figure 4). During validation (Figure 5), R2 varied between 0.998 and 0.80 among the developed models. The highest R2 (0.998) was given by the LS-SVM model followed by ANN2 (R2 = 0.995), ANN1 (R2 = 0.994), ANFIS10 (R2 = 0.987), MLR (R2 = 0.987), ANFIS9 (R2 = 0.983), ANFIS2 (R2 = 0.979), ANFIS6 (R2 = 0.919), ANFIS8 (R2 = 0.913), ANFIS5 (R2 = 0.869) and ANFIS3 (R2 = 0.795) during the testing period.
Comparison of ET between satellite and best-fit identified ML model
Monthly bias correction factor (Δ) for conversion of MODIS PET data to corrected PET data (mm/day)
Jan . | Feb . | Mar . | April . | May . | June . | July . | August . | September . | Oct . | Nov . | Dec . |
---|---|---|---|---|---|---|---|---|---|---|---|
2.77 | 3.04 | 3.455 | 3.575 | 2.89 | 1.035 | 0.385 | 0.485 | 0.105 | 1.205 | 1.965 | 2.375 |
Jan . | Feb . | Mar . | April . | May . | June . | July . | August . | September . | Oct . | Nov . | Dec . |
---|---|---|---|---|---|---|---|---|---|---|---|
2.77 | 3.04 | 3.455 | 3.575 | 2.89 | 1.035 | 0.385 | 0.485 | 0.105 | 1.205 | 1.965 | 2.375 |
Correlation between ET from satellite and best-identified LS-SVM model before (a) and after (b) bias corrections.
Correlation between ET from satellite and best-identified LS-SVM model before (a) and after (b) bias corrections.
Spatial variations of ET from satellite before (a) and after (b) bias corrections.
Spatial variations of ET from satellite before (a) and after (b) bias corrections.
CONCLUSION
The present research demonstrated the ability of ML algorithms (MLR, ANN, LS-SVM and ANFIS) for modeling and predicting the monthly ET0 in semi-arid subtropical climate of Indian Punjab. From the results, it is found that during calibration, the best performing model was ANFIS10 followed by LS-SVM, ANN2, ANN1, ANFIS2, MLR and ANFIS9 for monthly ET0 prediction in the study area. However, during validation period, LS-SVM model was identified as the best model followed by ANN2, ANN1, ANFIS10, ANFIS2, MLR and ANFIS9 for monthly ET0 prediction. Among the subtractive clustering and grid partition algorithm of FIS of ANFIS model, subtractive clustering algorithm outperformed when compared with grid partition algorithm. It can be concluded that LS-SVM model is a better choice due to their robustness and flexibility. Furthermore, the satellite overestimated ET in comparison to best-fit identified ML model (LS-SVM) and the performance of MODIS product for the ET estimation increased after applying the correction factor (RMSE reduces to 1.18 from 3.9 mm). The application of ML-based models proves to be useful for handling big data and precise estimates of ET in lowest possible time. This type of study is essential and useful for agricultural scientists and policymakers to analyze the water budget and to cope up with the water scarcity problems at the local to regional levels at different temporal and spatial scales.
ACKNOWLEDGEMENTS
The authors acknowledge the Head of the Department of Climate Change & Agricultural Meteorology, PAU Ludhiana, for providing the desired meteorology datasets.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.